[jira] [Commented] (HBASE-7643) HFileArchiver.resolveAndArchive() race condition may lead to snapshot data loss

2013-02-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13571002#comment-13571002
 ] 

Hudson commented on HBASE-7643:
---

Integrated in HBase-0.94-security-on-Hadoop-23 #11 (See 
[https://builds.apache.org/job/HBase-0.94-security-on-Hadoop-23/11/])
HBASE-7643 HFileArchiver.resolveAndArchive() race condition may lead to 
snapshot data loss (Revision 1438973)

 Result = FAILURE
mbertozzi : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/backup/HFileArchiver.java
* 
/hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/backup/TestHFileArchiving.java


 HFileArchiver.resolveAndArchive() race condition may lead to snapshot data 
 loss
 ---

 Key: HBASE-7643
 URL: https://issues.apache.org/jira/browse/HBASE-7643
 Project: HBase
  Issue Type: Bug
Affects Versions: hbase-6055, 0.96.0
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
Priority: Blocker
 Fix For: 0.96.0, 0.94.5

 Attachments: HBASE-7653-p4-v0.patch, HBASE-7653-p4-v1.patch, 
 HBASE-7653-p4-v2.patch, HBASE-7653-p4-v3.patch, HBASE-7653-p4-v4.patch, 
 HBASE-7653-p4-v5.patch, HBASE-7653-p4-v6.patch, HBASE-7653-p4-v7.patch


  * The master has an HFile cleaner thread (responsible for cleaning 
 the /hbase/.archive dir)
  ** /hbase/.archive/table/region/family/hfile
  ** if a family, region, or table directory is empty, the cleaner removes it
  * The master can archive files (from another thread, e.g. DeleteTableHandler)
  * The region can archive files (from another server/process, e.g. compaction)
 The simplified file archiving code looks like this:
 {code}
 HFileArchiver.resolveAndArchive(...) {
   // ensure that the archive dir exists
   fs.mkdir(archiveDir);
   // move the file to the archive
   success = fs.rename(originalPath/fileName, archiveDir/fileName);
   // if the rename fails, delete the file without archiving
   if (!success) fs.delete(originalPath/fileName);
 }
 {code}
 Since there is no synchronization between HFileArchiver.resolveAndArchive() 
 and the cleaner run (different process, thread, ...), you can end up 
 moving a file into a directory that no longer exists.
 {code}
 fs.mkdir(archiveDir);
 // The HFileCleaner chore starts at this point
 // and removes the archive directory that we just ensured was present.
 // The rename will now fail because the parent directory is missing.
 success = fs.rename(originalPath/fileName, archiveDir/fileName);
 {code}
 The problem with deleting the file without archiving it is that if a 
 snapshot or a cloned table relies on that file being present, you lose data.
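
To make the failure mode concrete, here is a minimal single-threaded sketch that replays the interleaving above on a local filesystem. It rests on several assumptions: java.nio.file stands in for the Hadoop FileSystem, the names ArchiveRaceSketch, moveAfterCleanerRan and sourceSurvivesFailedMove are hypothetical, and the cleaner is simulated by deleting the empty directory inline rather than from another thread.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Illustrative only: local files stand in for HDFS; all names are hypothetical. */
final class ArchiveRaceSketch {

    /** Replays mkdir, simulated cleaner pass, then move; returns true if the move succeeded. */
    static boolean moveAfterCleanerRan(Path source, Path archiveDir) throws IOException {
        // Step 1: ensure that the archive directory exists.
        Files.createDirectories(archiveDir);
        // Simulated cleaner: the directory is empty, so it gets removed.
        Files.delete(archiveDir);
        try {
            // Step 2: the move fails because the target's parent directory is gone.
            Files.move(source, archiveDir.resolve(source.getFileName()));
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    /** Runs the scenario in a temp dir; true if the move failed and the source file is still there. */
    static boolean sourceSurvivesFailedMove() {
        try {
            Path root = Files.createTempDirectory("archive-race");
            Path source = Files.write(root.resolve("hfile"), new byte[] {1, 2, 3});
            Path archiveDir = root.resolve("archive");
            boolean moved = moveAfterCleanerRan(source, archiveDir);
            return !moved && Files.exists(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("source survives failed move: " + sourceSurvivesFailedMove());
    }
}
```

In this sketch the file is still in its original location after the failed move. The data loss only happens when the caller reacts to the failure by deleting the source.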
 Possible solutions:
  * Create a ZooKeeper lock to notify the master ("Hey, I'm archiving 
 something, wait a bit")
  * Add an RS -> Master call that lets the master remove the files, avoiding 
 this kind of situation
  * Avoid removing empty directories from the archive if the table exists or 
 is not disabled
  * Add a try/catch around the fs.rename
 The last one, the easiest one, looks like:
 {code}
 for (int i = 0; i < retries; ++i) {
   // ensure the archive directory is present
   fs.mkdir(archiveDir);
   // ---- possible race ----
   // try to archive the file
   success = fs.rename(originalPath/fileName, archiveDir/fileName);
   if (success) break;
 }
 {code}
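
The retry loop above can be sketched as self-contained Java. This is not the committed patch: java.nio.file stands in for the Hadoop FileSystem, and the names ArchiveRetrySketch, archiveWithRetries and archivesTempFile are assumptions made for illustration. The one deliberate choice is that the sketch never deletes the source file when every attempt fails.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Illustrative only: local files stand in for HDFS; all names are hypothetical. */
final class ArchiveRetrySketch {

    /**
     * Recreates archiveDir and retries the move up to 'retries' times.
     * Returns true once the move succeeds; never deletes the source file.
     */
    static boolean archiveWithRetries(Path source, Path archiveDir, int retries) {
        for (int i = 0; i < retries; ++i) {
            try {
                // Ensure the archive directory is present (no-op if it already exists).
                Files.createDirectories(archiveDir);
                // A concurrent cleaner could remove archiveDir right here.
                Files.move(source, archiveDir.resolve(source.getFileName()));
                return true;
            } catch (IOException e) {
                // The move failed (for example, the directory vanished); try again.
            }
        }
        return false;
    }

    /** Archives a temp file and reports whether it ended up in the archive directory. */
    static boolean archivesTempFile() {
        try {
            Path root = Files.createTempDirectory("archive-retry");
            Path source = Files.write(root.resolve("hfile"), new byte[] {1, 2, 3});
            Path archiveDir = root.resolve("archive").resolve("family");
            boolean ok = archiveWithRetries(source, archiveDir, 3);
            return ok && Files.exists(archiveDir.resolve("hfile")) && !Files.exists(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("archived: " + archivesTempFile());
    }
}
```

Retrying narrows the race window but does not close it: the cleaner can still win every attempt, so the caller has to decide what a false return means. Leaving the file in place is the conservative choice for snapshots.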

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7643) HFileArchiver.resolveAndArchive() race condition may lead to snapshot data loss

2013-01-26 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563643#comment-13563643
 ] 

Lars Hofhansl commented on HBASE-7643:
--

You going to commit [~mbertozzi]?



[jira] [Commented] (HBASE-7643) HFileArchiver.resolveAndArchive() race condition may lead to snapshot data loss

2013-01-26 Thread Matteo Bertozzi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563648#comment-13563648
 ] 

Matteo Bertozzi commented on HBASE-7643:


committed to trunk and 0.94, thanks guys for the review!



[jira] [Commented] (HBASE-7643) HFileArchiver.resolveAndArchive() race condition may lead to snapshot data loss

2013-01-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563675#comment-13563675
 ] 

Hudson commented on HBASE-7643:
---

Integrated in HBase-TRUNK #3806 (See 
[https://builds.apache.org/job/HBase-TRUNK/3806/])
HBASE-7643 HFileArchiver.resolveAndArchive() race condition may lead to 
snapshot data loss (Revision 1438972)

 Result = FAILURE
mbertozzi : 
Files : 
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/backup/HFileArchiver.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/backup/TestHFileArchiving.java




[jira] [Commented] (HBASE-7643) HFileArchiver.resolveAndArchive() race condition may lead to snapshot data loss

2013-01-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563683#comment-13563683
 ] 

Hudson commented on HBASE-7643:
---

Integrated in HBase-0.94 #782 (See 
[https://builds.apache.org/job/HBase-0.94/782/])
HBASE-7643 HFileArchiver.resolveAndArchive() race condition may lead to 
snapshot data loss (Revision 1438973)

 Result = FAILURE
mbertozzi : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/backup/HFileArchiver.java
* 
/hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/backup/TestHFileArchiving.java




[jira] [Commented] (HBASE-7643) HFileArchiver.resolveAndArchive() race condition may lead to snapshot data loss

2013-01-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563691#comment-13563691
 ] 

Hudson commented on HBASE-7643:
---

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #377 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/377/])
HBASE-7643 HFileArchiver.resolveAndArchive() race condition may lead to 
snapshot data loss (Revision 1438972)

 Result = FAILURE
mbertozzi : 
Files : 
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/backup/HFileArchiver.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/backup/TestHFileArchiving.java




[jira] [Commented] (HBASE-7643) HFileArchiver.resolveAndArchive() race condition may lead to snapshot data loss

2013-01-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563733#comment-13563733
 ] 

Hudson commented on HBASE-7643:
---

Integrated in HBase-0.94-security #99 (See 
[https://builds.apache.org/job/HBase-0.94-security/99/])
HBASE-7643 HFileArchiver.resolveAndArchive() race condition may lead to 
snapshot data loss (Revision 1438973)

 Result = FAILURE
mbertozzi : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/backup/HFileArchiver.java
* 
/hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/backup/TestHFileArchiving.java




[jira] [Commented] (HBASE-7643) HFileArchiver.resolveAndArchive() race condition may lead to snapshot data loss

2013-01-25 Thread Jesse Yates (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562883#comment-13562883
 ] 

Jesse Yates commented on HBASE-7643:


Looks good to me too. Annoying that this isn't covered by the master not doing 
recursive delete of the directory (which fails if there are things 
underneath)... grr, race conditions. Thanks matteo!



[jira] [Commented] (HBASE-7643) HFileArchiver.resolveAndArchive() race condition may lead to snapshot data loss

2013-01-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13561500#comment-13561500
 ] 

Hadoop QA commented on HBASE-7643:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12566268/HBASE-7653-p4-v6.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified tests.

{color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 
2.0 profile.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4156//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4156//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4156//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4156//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4156//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4156//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4156//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4156//console

This message is automatically generated.


[jira] [Commented] (HBASE-7643) HFileArchiver.resolveAndArchive() race condition may lead to snapshot data loss

2013-01-24 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13562136#comment-13562136
 ] 

Lars Hofhansl commented on HBASE-7643:
--

Patch looks good. Curious about the RETRIES=6, how did you come up with 6? 
This should be a very rare condition, so I would think something like 3 is better.

Looking at all the deletingXXXWithoutArchiving methods: should we rename all of these to 
deletingXXXWithoutArchivingForTests? These should only ever be called during 
the tests, right?

 HFileArchiver.resolveAndArchive() race condition may lead to snapshot data 
 loss
 ---

 Key: HBASE-7643
 URL: https://issues.apache.org/jira/browse/HBASE-7643
 Project: HBase
  Issue Type: Bug
Affects Versions: hbase-6055, 0.96.0
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
Priority: Blocker
 Fix For: 0.96.0, 0.94.5

 Attachments: HBASE-7653-p4-v0.patch, HBASE-7653-p4-v1.patch, 
 HBASE-7653-p4-v2.patch, HBASE-7653-p4-v3.patch, HBASE-7653-p4-v4.patch, 
 HBASE-7653-p4-v5.patch, HBASE-7653-p4-v6.patch


  * The master has an hfile cleaner thread (that is responsible for cleaning 
 the /hbase/.archive dir)
  ** /hbase/.archive/table/region/family/hfile
  ** if the family/region/family directory is empty the cleaner removes it
  * The master can archive files (from another thread, e.g. DeleteTableHandler)
  * The region can archive files (from another server/process, e.g. compaction)
 The simplified file archiving code looks like this:
 {code}
 HFileArchiver.resolveAndArchive(...) {
   // ensure that the archive dir exists
   fs.mkdir(archiveDir);
   // move the file to the archiver
   success = fs.rename(originalPath/fileName, archiveDir/fileName);
   // if the rename failed, delete the file without archiving
   if (!success) fs.delete(originalPath/fileName);
 }
 {code}
 Since there's no synchronization between HFileArchiver.resolveAndArchive() 
 and the cleaner run (different process, thread, ...) you can end up in the 
 situation where you are moving something into a directory that doesn't exist.
 {code}
 fs.mkdir(archiveDir);
 // HFileCleaner chore starts at this point
 // and the archiveDirectory that we just ensured to be present gets removed.
 // The rename at this point will fail since the parent directory is missing.
 success = fs.rename(originalPath/fileName, archiveDir/fileName)
 {code}
 The bad thing about deleting the file without archiving is that if you have a 
 snapshot or a cloned table that relies on that file being present, 
 you lose data.
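 The interleaving above can be reproduced deterministically on a local filesystem. What follows is a minimal sketch, not the HBase code: it uses java.nio.file as a stand-in for the Hadoop FileSystem API, and the class name ArchiveRaceDemo and the method renameFailsAfterCleaner are hypothetical names chosen for illustration. The "cleaner" step is simulated inline instead of running in a separate thread.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class ArchiveRaceDemo {
    /**
     * Replays the problematic interleaving step by step.
     * Returns true when the rename failed and the source file is still in place.
     */
    static boolean renameFailsAfterCleaner(Path root) throws IOException {
        // Layout assumed for the demo: <root>/table/region/family/hfile
        Path original = Files.createDirectories(root.resolve("table/region/family"));
        Path hfile = Files.createFile(original.resolve("hfile"));
        Path archiveDir = root.resolve(".archive/table/region/family");

        // Step 1: the archiver ensures the archive directory exists.
        Files.createDirectories(archiveDir);

        // Step 2: the cleaner runs in between and removes the still-empty directory.
        Files.delete(archiveDir);

        // Step 3: the archiver's rename fails because the parent directory is gone.
        try {
            Files.move(hfile, archiveDir.resolve("hfile"));
            return false; // the rename unexpectedly succeeded
        } catch (IOException e) {
            // At this point the original code would delete the file without archiving it.
            return Files.exists(hfile);
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("archive-race");
        System.out.println("rename failed: " + renameFailsAfterCleaner(root));
    }
}
```

 The point of the sketch is the catch branch: the source file is still intact when the rename fails, so deleting it there is what turns a transient race into data loss.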
 Possible solutions
  * Create a ZooKeeper lock, to notify the master (Hey I'm archiving 
 something, wait a bit)
  * Add an RS -> Master call to let the master remove files and avoid this 
 kind of situation
  * Avoid removing empty directories from the archive if the table exists or 
 is not disabled
  * Add a try catch around the fs.rename
 The last one, the easiest one, looks like:
 {code}
 for (int i = 0; i < retries; ++i) {
   // ensure archive directory to be present
   fs.mkdir(archiveDir);
   // --- possible race here ---
   // try to archive file
   success = fs.rename(originalPath/fileName, archiveDir/fileName);
   if (success) break;
 }
 {code}
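 The retry idea can be sketched as runnable code. This is a hedged illustration rather than the committed patch: it again uses java.nio.file in place of the Hadoop FileSystem API, the names ArchiveWithRetry, archive and DEFAULT_RETRIES are hypothetical, and the value 3 is the retry count mentioned in the discussion of this issue. Unlike the original fallback, the sketch never deletes the source file when archiving fails.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

public class ArchiveWithRetry {
    // Hypothetical constant; 3 is the retry count suggested in the discussion.
    static final int DEFAULT_RETRIES = 3;

    /**
     * Moves source into archiveDir, recreating archiveDir before every attempt.
     * Returns false if all attempts failed; the source file is left untouched.
     */
    static boolean archive(Path source, Path archiveDir, int retries) throws IOException {
        Path target = archiveDir.resolve(source.getFileName());
        for (int i = 0; i < retries; ++i) {
            // Ensure the archive directory is present (no-op if it already exists).
            Files.createDirectories(archiveDir);
            // A concurrent cleaner may remove archiveDir right here.
            try {
                Files.move(source, target);
                return true;
            } catch (NoSuchFileException e) {
                // If the source itself is missing, retrying cannot help.
                if (!Files.exists(source)) {
                    throw e;
                }
                // Otherwise the archive directory vanished; the next pass recreates it.
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("archive-retry");
        Path source = Files.createFile(root.resolve("hfile"));
        Path archiveDir = root.resolve(".archive/table/region/family");
        System.out.println("archived: " + archive(source, archiveDir, DEFAULT_RETRIES));
    }
}
```

 Returning false instead of deleting leaves the decision to the caller, which keeps files referenced by a snapshot or a cloned table recoverable.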

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7643) HFileArchiver.resolveAndArchive() race condition may lead to snapshot data loss

2013-01-24 Thread Matteo Bertozzi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13562143#comment-13562143
 ] 

Matteo Bertozzi commented on HBASE-7643:


I wanted a while (true) just to be sure, but unless you're spinning like in the 
test, a single retry should be fine. I'll change it to 3.

Some of the deletingXXXWithoutArchiving() methods are used in the main code, but mostly 
to delete empty directories from the original location.


[jira] [Commented] (HBASE-7643) HFileArchiver.resolveAndArchive() race condition may lead to snapshot data loss

2013-01-24 Thread Matteo Bertozzi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13562251#comment-13562251
 ] 

Matteo Bertozzi commented on HBASE-7643:


I noticed that the deletingXXXWithoutArchiving() methods are private, so they are not used by tests.


[jira] [Commented] (HBASE-7643) HFileArchiver.resolveAndArchive() race condition may lead to snapshot data loss

2013-01-24 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13562269#comment-13562269
 ] 

Lars Hofhansl commented on HBASE-7643:
--

Sorry, what I meant to say is that the conditions for these to be called only 
occur in tests.


[jira] [Commented] (HBASE-7643) HFileArchiver.resolveAndArchive() race condition may lead to snapshot data loss

2013-01-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13562312#comment-13562312
 ] 

Hadoop QA commented on HBASE-7643:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12566425/HBASE-7653-p4-v7.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified tests.

{color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 
2.0 profile.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   org.apache.hadoop.hbase.TestZooKeeper

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4173//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4173//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4173//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4173//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4173//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4173//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4173//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4173//console

This message is automatically generated.


[jira] [Commented] (HBASE-7643) HFileArchiver.resolveAndArchive() race condition may lead to snapshot data loss

2013-01-24 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13562461#comment-13562461
 ] 

Lars Hofhansl commented on HBASE-7643:
--

+1 on v7.
Mr. [~jesse_yates], wanna have a look too?


[jira] [Commented] (HBASE-7643) HFileArchiver.resolveAndArchive() race condition may lead to snapshot data loss

2013-01-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13560939#comment-13560939
 ] 

Hadoop QA commented on HBASE-7643:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12566153/HBASE-7653-p4-v5.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified tests.

{color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 
2.0 profile.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 findbugs{color}.  The patch appears to introduce 3 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   org.apache.hadoop.hbase.TestLocalHBaseCluster

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4149//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4149//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4149//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4149//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4149//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4149//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4149//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4149//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4149//console

This message is automatically generated.


[jira] [Commented] (HBASE-7643) HFileArchiver.resolveAndArchive() race condition may lead to snapshot data loss

2013-01-23 Thread Matteo Bertozzi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13561143#comment-13561143
 ] 

Matteo Bertozzi commented on HBASE-7643:


Committed p4-v5 to the snapshot branch, to get more coverage (Jenkins, test 
rig, ...).
I'll commit it to trunk in a couple of days if everything is fine and there are 
no objections.


[jira] [Commented] (HBASE-7643) HFileArchiver.resolveAndArchive() race condition may lead to snapshot data loss

2013-01-23 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13561355#comment-13561355
 ] 

Lars Hofhansl commented on HBASE-7643:
--

Feel free to commit to 0.94 as well.

 HFileArchiver.resolveAndArchive() race condition may lead to snapshot data 
 loss
 ---

 Key: HBASE-7643
 URL: https://issues.apache.org/jira/browse/HBASE-7643
 Project: HBase
  Issue Type: Bug
Affects Versions: hbase-6055, 0.96.0
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
Priority: Blocker
 Fix For: 0.96.0, 0.94.5

 Attachments: HBASE-7653-p4-v0.patch, HBASE-7653-p4-v1.patch, 
 HBASE-7653-p4-v2.patch, HBASE-7653-p4-v3.patch, HBASE-7653-p4-v4.patch, 
 HBASE-7653-p4-v5.patch


