[jira] [Commented] (HBASE-11218) Data loss in HBase standalone mode
[ https://issues.apache.org/jira/browse/HBASE-11218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14012128#comment-14012128 ] Andrew Purtell commented on HBASE-11218: Let me look at what might be needed to put back into 0.98 tomorrow.

Data loss in HBase standalone mode
--
Key: HBASE-11218
URL: https://issues.apache.org/jira/browse/HBASE-11218
Project: HBase
Issue Type: Bug
Reporter: Liu Shaohui
Assignee: Liu Shaohui
Fix For: 0.99.0
Attachments: 11218v2.098.txt, HBASE-11218-trunk-v1.diff, HBASE-11218-trunk-v2.diff

Data loss in HBase standalone mode.

*How to reproduce it*
# Start HBase in standalone mode.
# Create a table using the hbase shell.
# Scan '.META.'; you will find data in the meta table.
# Kill the HBase process with the -9 option.
# Start HBase again.
# Scan '.META.'; you will find nothing in the meta table.

*There are three main reasons.*
# FSDataOutputStream.sync should call flush() if the underlying wrapped stream is not Syncable. See HADOOP-8861.
# writeChecksum is true in the default LocalFileSystem, and the ChecksumFSOutputSummer buffers the data, so the WAL edits are not written to the OS's filesystem immediately by the sync method. Those edits will be lost during regionserver failover.
# The MiniZooKeeperCluster deletes the old ZooKeeper data at startup, which may cause data loss in the meta table. The failover procedure is: split the previous root regionserver's hlog - assign root - split the previous meta regionserver's hlog - assign meta - split all other regionservers' hlogs - assign other regions. If there is no data in ZooKeeper, we get null for the root regionserver and then assign the root table. Some data in the root table may be lost because some of root's WAL edits have not been split and replayed. The same applies to the meta table.

I finished the patch for 0.94 and am working on the patch for trunk. Suggestions are welcome.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
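Reasons 1 and 2 share a mechanism: after a sync, the WAL edits can still sit in a user-space buffer, where a `kill -9` discards them. The following is a minimal sketch in plain Java (no Hadoop dependency) of the fallback HADOOP-8861 asks for, namely flushing when the wrapped stream cannot sync. `SyncableSink`, `SyncingStream` and `hsync()` here are hypothetical stand-ins, not Hadoop's `Syncable` or `FSDataOutputStream`, and the `BufferedOutputStream` only plays the role of the buffering `ChecksumFSOutputSummer`.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class SyncFallbackSketch {

    /** Hypothetical stand-in for a stream that can force its data to durable storage. */
    interface SyncableSink {
        void hsync() throws IOException;
    }

    /** Hypothetical wrapper: sync() uses hsync() when available, otherwise falls back to flush(). */
    static class SyncingStream extends FilterOutputStream {
        SyncingStream(OutputStream wrapped) {
            super(wrapped);
        }

        void sync() throws IOException {
            if (out instanceof SyncableSink) {
                ((SyncableSink) out).hsync();
            } else {
                // Without this flush, bytes buffered in 'out' do not reach the
                // underlying sink until the buffer fills, so a killed process loses them.
                out.flush();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // 'disk' represents what the OS has actually received.
        ByteArrayOutputStream disk = new ByteArrayOutputStream();
        // The buffered layer stands in for the checksum summer's internal buffer.
        SyncingStream wal = new SyncingStream(new BufferedOutputStream(disk, 4096));

        wal.write("edit-1".getBytes(StandardCharsets.UTF_8));
        System.out.println("before sync: " + disk.size() + " bytes on disk");
        wal.sync();
        System.out.println("after sync:  " + disk.size() + " bytes on disk");
    }
}
```

Run as written, it reports 0 bytes before `sync()` and 6 after. Note that the fallback only hands the bytes to the OS and does not fsync them, so on a real local filesystem it would protect against a killed process, as in the reproduction steps, but not against a machine crash.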
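Reason 3 is an ordering problem. The toy model below is a hypothetical illustration, not HBase master code; `assignRoot` and its arguments are invented for this sketch. It splits the previous root server's log before assigning root only when ZooKeeper still names that server. With the ZooKeeper data wiped, the location is null and root comes online without the edits still sitting in the old log.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FailoverOrderSketch {

    /**
     * Hypothetical model of master failover for the root region.
     *
     * @param rootServerFromZk server that last hosted root, as read from ZooKeeper (null if ZK data was wiped)
     * @param unsplitLogs      per-server WAL edits that have not been split yet
     * @return the edits present in the root region at the moment it is assigned
     */
    static List<String> assignRoot(String rootServerFromZk, Map<String, List<String>> unsplitLogs) {
        List<String> rootContents = new ArrayList<>();
        if (rootServerFromZk != null) {
            // Step 1: split the previous root server's log so its edits are replayed...
            List<String> edits = unsplitLogs.remove(rootServerFromZk);
            if (edits != null) {
                rootContents.addAll(edits);
            }
        }
        // Step 2: ...then assign root. With a null location, step 1 is skipped,
        // so root is assigned without the edits that are still in the old log.
        return rootContents;
    }

    public static void main(String[] args) {
        Map<String, List<String>> logs = new LinkedHashMap<>();
        logs.put("rs1", Arrays.asList("root-edit-1"));
        // ZooKeeper still knows rs1 hosted root: the edit is replayed before assignment.
        System.out.println("ZK intact: " + assignRoot("rs1", new LinkedHashMap<>(logs)));
        // ZooKeeper data wiped at startup: the location is null and the edit is skipped.
        System.out.println("ZK wiped:  " + assignRoot(null, new LinkedHashMap<>(logs)));
    }
}
```

The same reasoning applies one step later to meta, which is why the report treats the ZooKeeper wipe at startup as a cause of data loss rather than a harmless reset.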
[jira] [Commented] (HBASE-11218) Data loss in HBase standalone mode
[ https://issues.apache.org/jira/browse/HBASE-11218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14012244#comment-14012244 ] Hudson commented on HBASE-11218: SUCCESS: Integrated in HBase-0.98 #320 (See [https://builds.apache.org/job/HBase-0.98/320/]) HBASE-11218 Data loss in HBase standalone mode; REVERT (stack: rev 111ebaaea3bc70888ac27c09d0c8433f66df13e0) * hbase-server/src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java * hbase-server/src/main/java/org/apache/hadoop/hbase/zookeeper/MiniZooKeeperCluster.java * hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java * hbase-server/src/main/java/org/apache/hadoop/hbase/fs/HFileSystem.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMasterCommandLine.java * hbase-server/src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java * hbase-server/src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java
[jira] [Commented] (HBASE-11218) Data loss in HBase standalone mode
[ https://issues.apache.org/jira/browse/HBASE-11218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14012359#comment-14012359 ] Hudson commented on HBASE-11218: FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #301 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/301/]) HBASE-11218 Data loss in HBase standalone mode; REVERT (stack: rev 111ebaaea3bc70888ac27c09d0c8433f66df13e0) * hbase-server/src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java * hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMasterCommandLine.java * hbase-server/src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java * hbase-server/src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java * hbase-server/src/main/java/org/apache/hadoop/hbase/zookeeper/MiniZooKeeperCluster.java * hbase-server/src/main/java/org/apache/hadoop/hbase/fs/HFileSystem.java
[jira] [Commented] (HBASE-11218) Data loss in HBase standalone mode
[ https://issues.apache.org/jira/browse/HBASE-11218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14012522#comment-14012522 ] Andrew Purtell commented on HBASE-11218: Further work can be found on HBASE-11272
[jira] [Commented] (HBASE-11218) Data loss in HBase standalone mode
[ https://issues.apache.org/jira/browse/HBASE-11218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14013246#comment-14013246 ] Hudson commented on HBASE-11218: SUCCESS: Integrated in HBase-0.98-on-Hadoop-1.1 #302 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/302/]) HBASE-11272 Backport HBASE-11218 (Data loss in HBase standalone mode) to 0.98 (apurtell: rev 13c7ff669532e18ebf5261734b98c51a4ad69c6b) * hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java * hbase-server/src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java * hbase-server/src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java * hbase-server/src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMasterCommandLine.java * hbase-server/src/main/java/org/apache/hadoop/hbase/fs/HFileSystem.java * hbase-server/src/main/java/org/apache/hadoop/hbase/zookeeper/MiniZooKeeperCluster.java
[jira] [Commented] (HBASE-11218) Data loss in HBase standalone mode
[ https://issues.apache.org/jira/browse/HBASE-11218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14013274#comment-14013274 ] Hudson commented on HBASE-11218: SUCCESS: Integrated in HBase-0.98 #321 (See [https://builds.apache.org/job/HBase-0.98/321/]) HBASE-11272 Backport HBASE-11218 (Data loss in HBase standalone mode) to 0.98 (apurtell: rev 13c7ff669532e18ebf5261734b98c51a4ad69c6b) * hbase-server/src/main/java/org/apache/hadoop/hbase/fs/HFileSystem.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMasterCommandLine.java * hbase-server/src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java * hbase-server/src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java * hbase-server/src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java * hbase-server/src/main/java/org/apache/hadoop/hbase/zookeeper/MiniZooKeeperCluster.java * hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
[jira] [Commented] (HBASE-11218) Data loss in HBase standalone mode
[ https://issues.apache.org/jira/browse/HBASE-11218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14011092#comment-14011092 ] Hadoop QA commented on HBASE-11218: ---
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12647120/HBASE-11218-trunk-v2.diff against trunk revision .
ATTACHMENT ID: 12647120
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 12 new or modified tests.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:red}-1 javadoc{color}. The javadoc tool appears to have generated 1 warning messages.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:red}-1 release audit{color}. The applied patch generated 32 release audit warnings (more than the trunk's current 0 warnings).
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100
{color:green}+1 site{color}. The mvn site goal succeeds with this patch.
{color:green}+1 core tests{color}. The patch passed unit tests in .
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/9621//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9621//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9621//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9621//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9621//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9621//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9621//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9621//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9621//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9621//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9621//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/9621//console This message is automatically generated.
[jira] [Commented] (HBASE-11218) Data loss in HBase standalone mode
[ https://issues.apache.org/jira/browse/HBASE-11218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14011369#comment-14011369 ] stack commented on HBASE-11218: --- Committed to master. Let me commit to 0.98 too. Fixed the javadoc warning in trunk commit (was not from this patch). Added addendum updating doc on local fs dataloss to say we don't lose data in 0.98.4 and 1.0.0. Thanks for the patch [~liushaohui]
[jira] [Commented] (HBASE-11218) Data loss in HBase standalone mode
[ https://issues.apache.org/jira/browse/HBASE-11218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14011458#comment-14011458 ] Hudson commented on HBASE-11218: FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #300 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/300/]) HBASE-11218 Data loss in HBase standalone mode (Liu Shaohui) (stack: rev 86ab435b8cd4d77ad4b90cd43fd5acd8579b60a4) * hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java * hbase-server/src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java * hbase-server/src/main/java/org/apache/hadoop/hbase/fs/HFileSystem.java * hbase-server/src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java * hbase-server/src/main/java/org/apache/hadoop/hbase/zookeeper/MiniZooKeeperCluster.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMasterCommandLine.java * hbase-server/src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java
[jira] [Commented] (HBASE-11218) Data loss in HBase standalone mode
[ https://issues.apache.org/jira/browse/HBASE-11218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14011577#comment-14011577 ] Hudson commented on HBASE-11218: SUCCESS: Integrated in HBase-TRUNK #5149 (See [https://builds.apache.org/job/HBase-TRUNK/5149/]) HBASE-11218 Data loss in HBase standalone mode (Liu Shaohui) (stack: rev 78f7cd450fe7ad5ed4b4b9634c7499e65968476f) * hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMasterCommandLine.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentListener.java * hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java * hbase-server/src/main/java/org/apache/hadoop/hbase/zookeeper/MiniZooKeeperCluster.java * hbase-server/src/main/java/org/apache/hadoop/hbase/fs/HFileSystem.java * hbase-server/src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java * hbase-server/src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java * hbase-server/src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java HBASE-11218 Data loss in HBase standalone mode (Liu Shaohui) -- DOC ADDENDUM (stack: rev b4a2d607a30344c37ed361cc8ffefe9159df01f0) * src/main/docbkx/configuration.xml
[jira] [Commented] (HBASE-11218) Data loss in HBase standalone mode
[ https://issues.apache.org/jira/browse/HBASE-11218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14011594#comment-14011594 ] Andrew Purtell commented on HBASE-11218: Thanks for applying to 0.98
[jira] [Commented] (HBASE-11218) Data loss in HBase standalone mode
[ https://issues.apache.org/jira/browse/HBASE-11218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14011633#comment-14011633 ] Hudson commented on HBASE-11218: SUCCESS: Integrated in HBase-0.98 #319 (See [https://builds.apache.org/job/HBase-0.98/319/]) HBASE-11218 Data loss in HBase standalone mode (Liu Shaohui) (stack: rev 86ab435b8cd4d77ad4b90cd43fd5acd8579b60a4) * hbase-server/src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java * hbase-server/src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java * hbase-server/src/main/java/org/apache/hadoop/hbase/zookeeper/MiniZooKeeperCluster.java * hbase-server/src/main/java/org/apache/hadoop/hbase/fs/HFileSystem.java * hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java * hbase-server/src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMasterCommandLine.java
[jira] [Commented] (HBASE-11218) Data loss in HBase standalone mode
[ https://issues.apache.org/jira/browse/HBASE-11218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14012007#comment-14012007 ] Ted Yu commented on HBASE-11218:
Was the following compilation error (on hadoop-1) related to this JIRA?
{code}
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.5.1:compile (default-compile) on project hbase-server: Compilation failure
[ERROR] /home/jenkins/jenkins-slave/workspace/HBase-0.98-on-Hadoop-1.1/hbase-server/src/main/java/org/apache/hadoop/hbase/fs/HFileSystem.java:[87,8] cannot find symbol
[ERROR] symbol  : method setWriteChecksum(boolean)
[ERROR] location: class org.apache.hadoop.fs.FileSystem
{code}
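The error says the hadoop-1 `org.apache.hadoop.fs.FileSystem` has no `setWriteChecksum(boolean)`. One generic way to keep a single source tree compiling against both Hadoop lines is to look the method up reflectively and skip the call when it is absent. This is a sketch under that assumption, not what the HBASE-11218 patch or its 0.98 follow-up actually did; `OptionalSetterSketch`, `NewerFs` and `OlderFs` are hypothetical stand-ins so the example runs without Hadoop on the classpath.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

class OptionalSetterSketch {
    /**
     * Calls target.methodName(boolean) if such a public method exists.
     * Returns true if the call was made, false if the method is absent
     * (for example on an older library version). No compile-time reference
     * to the method is needed.
     */
    static boolean invokeIfPresent(Object target, String methodName, boolean value) {
        try {
            Method m = target.getClass().getMethod(methodName, boolean.class);
            m.invoke(target, value);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        } catch (IllegalAccessException | InvocationTargetException e) {
            throw new RuntimeException("Failed to call " + methodName, e);
        }
    }

    // Hypothetical stand-ins for two library versions (not Hadoop's classes).
    public static class NewerFs {
        public boolean writeChecksum = true;
        public void setWriteChecksum(boolean v) { writeChecksum = v; }
    }

    public static class OlderFs { }

    public static void main(String[] args) {
        NewerFs newer = new NewerFs();
        boolean called = invokeIfPresent(newer, "setWriteChecksum", false);
        System.out.println(called + " " + newer.writeChecksum);
        System.out.println(invokeIfPresent(new OlderFs(), "setWriteChecksum", false));
    }
}
```

The trade-off is that a missing method is silently skipped, so on hadoop-1 the checksum buffering would stay in place; logging a `false` return would at least make that visible.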
[jira] [Commented] (HBASE-11218) Data loss in HBase standalone mode
[ https://issues.apache.org/jira/browse/HBASE-11218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14009987#comment-14009987 ] stack commented on HBASE-11218:
[~liushaohui] Could you do the same trick on the RS side too when it is the local fs (as Enis suggests above)? Thanks for doing the test comparisons. I will commit this later today unless there are objections. We can look into fixing pseudo-distributed mode on the local fs in a new issue?
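The "same trick" presumably means turning off checksum writing when the WAL lives on the local fs. Why that matters, together with the HADOOP-8861 point from the description, can be shown with a toy Java model of the write path. Everything in it is hypothetical: `Syncable` is a local stand-in, `ChunkBufferingStream` only imitates the chunk buffering the description attributes to `ChecksumFSOutputSummer`, and the 512-byte chunk size is an arbitrary choice.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

class WalWritePathSketch {
    /** Local stand-in for a "Syncable" contract (hypothetical, not Hadoop's interface). */
    interface Syncable {
        void sync() throws IOException;
    }

    /**
     * Toy checksumming stream: it forwards data only in whole chunks, so a
     * partial chunk stays in memory even after flush().
     */
    static class ChunkBufferingStream extends OutputStream {
        private final OutputStream out;
        private final byte[] chunk;
        private int used = 0;

        ChunkBufferingStream(OutputStream out, int chunkSize) {
            this.out = out;
            this.chunk = new byte[chunkSize];
        }

        @Override
        public void write(int b) throws IOException {
            chunk[used++] = (byte) b;
            if (used == chunk.length) {
                out.write(chunk, 0, used); // checksum computation omitted
                used = 0;
            }
        }
        // flush() is inherited from OutputStream and is a no-op here.
    }

    /** sync() on a wrapper; flushFallback=true models the behaviour HADOOP-8861 asks for. */
    static void sync(OutputStream wrapped, boolean flushFallback) throws IOException {
        if (wrapped instanceof Syncable) {
            ((Syncable) wrapped).sync();
        } else if (flushFallback) {
            wrapped.flush();
        }
        // otherwise nothing happens and the data stays in user-space buffers
    }

    /**
     * Writes one edit, syncs, and returns how many bytes reached the "disk".
     * The stream is deliberately never closed, to mimic a kill -9 right after sync.
     */
    static int bytesOnDiskAfterSync(int editSize, boolean writeChecksum, boolean flushFallback) {
        ByteArrayOutputStream disk = new ByteArrayOutputStream();
        try {
            OutputStream stream = writeChecksum
                    ? new ChunkBufferingStream(disk, 512)
                    : new BufferedOutputStream(disk);
            stream.write(new byte[editSize]);
            sync(stream, flushFallback);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return disk.size();
    }

    public static void main(String[] args) {
        System.out.println(bytesOnDiskAfterSync(100, false, false)); // 0: sync did nothing
        System.out.println(bytesOnDiskAfterSync(100, false, true));  // 100: flush fallback works
        System.out.println(bytesOnDiskAfterSync(100, true, true));   // 0: partial chunk held back
        System.out.println(bytesOnDiskAfterSync(600, true, true));   // 512: only the full chunk
    }
}
```

In this model only the combination of a flush fallback and no checksum buffering gets the whole edit out of the process. Reaching the OS is enough to survive a killed process, though not a machine crash, which matches the distinction drawn later in the thread between loss caused by HBase and loss caused by hardware.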
[jira] [Commented] (HBASE-11218) Data loss in HBase standalone mode
[ https://issues.apache.org/jira/browse/HBASE-11218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14010023#comment-14010023 ] Enis Soztutar commented on HBASE-11218:
bq. I think the data loss in pseudo distributed mode caused by hardware failure is accepted, data loss caused by the HBase's implement is not accepted and we should fix it.
I think there is a misunderstanding here. I was referring to the same local file system issue. The fix in the patch will only fix it for local mode deployments, not for pseudo-distributed mode deployments.
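Enis's distinction suggests keying the fix on the filesystem that holds the WAL rather than on the deployment mode, since a pseudo-distributed regionserver on `file:` has the same buffering problem as standalone mode. A minimal sketch of such a check, assuming the decision is made from the WAL filesystem URI; `ChecksumPolicySketch` and `shouldDisableWriteChecksum` are hypothetical, and where it would be called (HFileSystem, HRegionFileSystem, or regionserver startup) is left open, as in the discussion.

```java
import java.net.URI;

class ChecksumPolicySketch {
    /**
     * Hypothetical policy: turn off client-side checksum writing whenever the
     * WAL filesystem is the local one, regardless of deployment mode.
     * Scheme-less URIs return false here; a real implementation would first
     * resolve them against the default filesystem.
     */
    static boolean shouldDisableWriteChecksum(URI walFsUri) {
        return "file".equalsIgnoreCase(walFsUri.getScheme());
    }

    public static void main(String[] args) {
        // Standalone and pseudo-distributed on the local fs are treated alike.
        System.out.println(shouldDisableWriteChecksum(URI.create("file:///tmp/hbase")));
        // HDFS-backed deployments keep their checksums.
        System.out.println(shouldDisableWriteChecksum(URI.create("hdfs://namenode:8020/hbase")));
    }
}
```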
[jira] [Commented] (HBASE-11218) Data loss in HBase standalone mode
[ https://issues.apache.org/jira/browse/HBASE-11218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14010758#comment-14010758 ] Liu Shaohui commented on HBASE-11218:
[~enis] Sorry, it's my mistake. I agree that we should fix this issue in pseudo-distributed mode deployments.
[jira] [Commented] (HBASE-11218) Data loss in HBase standalone mode
[ https://issues.apache.org/jira/browse/HBASE-11218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14008526#comment-14008526 ] Liu Shaohui commented on HBASE-11218:
[~nkeywal] Here is the comparison done on my dev machine:
||case||trunk||HBASE-11218||
|small tests|6m32.725s|6m34.124s|
|medium tests|91m23.238s|93m6.703s|
|large tests|154m58.474s|150m42.585s|
The increase in test time is very small.
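To compare the table against Nicolas Liochon's 10% bound, the durations can be converted to seconds and turned into relative changes. `TestTimeComparison` is a hypothetical helper that assumes the `<minutes>m<seconds>s` format used in the table.

```java
class TestTimeComparison {
    /** Parses durations such as "6m32.725s" into seconds. */
    static double toSeconds(String duration) {
        int m = duration.indexOf('m');
        double minutes = Double.parseDouble(duration.substring(0, m));
        double seconds = Double.parseDouble(duration.substring(m + 1, duration.length() - 1));
        return minutes * 60 + seconds;
    }

    /** Relative change in percent from the baseline (trunk) to the candidate (patched). */
    static double percentChange(String baseline, String candidate) {
        double base = toSeconds(baseline);
        return (toSeconds(candidate) - base) / base * 100.0;
    }

    public static void main(String[] args) {
        String[][] rows = {
            {"small", "6m32.725s", "6m34.124s"},
            {"medium", "91m23.238s", "93m6.703s"},
            {"large", "154m58.474s", "150m42.585s"},
        };
        for (String[] r : rows) {
            System.out.printf("%-6s %+.2f%%%n", r[0], percentChange(r[1], r[2]));
        }
    }
}
```

By this arithmetic the patch adds about 0.36% to the small tests and 1.89% to the medium tests, while the large tests ran about 2.75% faster, so every row is well inside 10%. With one run per case, differences this small may well be run-to-run noise.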
[jira] [Commented] (HBASE-11218) Data loss in HBase standalone mode
[ https://issues.apache.org/jira/browse/HBASE-11218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14008527#comment-14008527 ] Liu Shaohui commented on HBASE-11218:
[~enis]
{quote}
in pseudo distributed mode, where there is an actual RS, but still using the local fs, there still will be data loss with this patch from my understanding
{quote}
Yes, but I think data loss in pseudo-distributed mode caused by hardware failure is acceptable, whereas data loss caused by HBase's implementation is not, and we should fix it.
[jira] [Commented] (HBASE-11218) Data loss in HBase standalone mode
[ https://issues.apache.org/jira/browse/HBASE-11218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14008238#comment-14008238 ] stack commented on HBASE-11218:
I like the idea of adding the config to the hbase fs.
[jira] [Commented] (HBASE-11218) Data loss in HBase standalone mode
[ https://issues.apache.org/jira/browse/HBASE-11218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14006966#comment-14006966 ] Nicolas Liochon commented on HBASE-11218:
I'm curious, why is it an issue for you? On the patch, will this slow down the tests?
[jira] [Commented] (HBASE-11218) Data loss in HBase standalone mode
[ https://issues.apache.org/jira/browse/HBASE-11218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14006980#comment-14006980 ] Liu Shaohui commented on HBASE-11218:
[~nkeywal]
{quote}
I'm curious, why is it an issue for you?
{quote}
We use standalone HBase in our onebox test environment. In this environment we start the application and the services it depends on (e.g., HBase, ZooKeeper) on a single machine, then run many end-to-end logic and failure tests. So it is important to keep data durable to some degree. Data loss in standalone mode gives our users a poor impression of HBase.
{quote}
On the patch, will this slow down the tests?
{quote}
I will do a comparison and post the data later.
[jira] [Commented] (HBASE-11218) Data loss in HBase standalone mode
[ https://issues.apache.org/jira/browse/HBASE-11218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007395#comment-14007395 ] stack commented on HBASE-11218:
Patch looks good to me. I like the way the patch skirts the issue that causes us data loss. Why do you think it will slow things down, [~nkeywal]?
[jira] [Commented] (HBASE-11218) Data loss in HBase standalone mode
[ https://issues.apache.org/jira/browse/HBASE-11218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007456#comment-14007456 ] Nicolas Liochon commented on HBASE-11218:
I'm afraid of the added syncs... But I understand the need.
[jira] [Commented] (HBASE-11218) Data loss in HBase standalone mode
[ https://issues.apache.org/jira/browse/HBASE-11218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007461#comment-14007461 ] stack commented on HBASE-11218:
If only we had a robot that could run the patch against trunk so we could compare before and after.
[jira] [Commented] (HBASE-11218) Data loss in HBase standalone mode
[ https://issues.apache.org/jira/browse/HBASE-11218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007466#comment-14007466 ] Nicolas Liochon commented on HBASE-11218:
To be clearer, I'm +1 if there is no test time increase, or if it is limited. 10% would be OK...
[jira] [Commented] (HBASE-11218) Data loss in HBase standalone mode
[ https://issues.apache.org/jira/browse/HBASE-11218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007846#comment-14007846 ] Enis Soztutar commented on HBASE-11218:
In pseudo-distributed mode, where there is an actual RS but it still uses the local fs, there will still be data loss with this patch, from my understanding. Is that correct? Should we configure the local fs in HRegionFileSystem?