[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229358#comment-13229358 ] Hudson commented on HBASE-5179: --- Integrated in HBase-TRUNK #2679 (See [https://builds.apache.org/job/HBase-TRUNK/2679/]) HBASE-5179 Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler (Revision 1300194) Result = SUCCESS stack : Files : * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/MasterServices.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/CreateTableHandler.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestRSKilledWhenMasterInitializing.java Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.2 Attachments: 5179-90.txt, 5179-90v10.patch, 5179-90v11.patch, 5179-90v12.patch, 5179-90v13.txt, 5179-90v14.patch, 5179-90v15.patch, 5179-90v16.patch, 5179-90v17.txt, 5179-90v18.txt, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-92v17.patch, 5179-v11-92.txt, 5179-v11.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, Errorlog, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v12.patch, hbase-5179v17.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228558#comment-13228558 ] Hudson commented on HBASE-5179: --- Integrated in HBase-0.94 #30 (See [https://builds.apache.org/job/HBase-0.94/30/]) HBASE-5179 Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler (Revision 1300222) Result = SUCCESS stack : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/HMaster.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/MasterServices.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/handler/CreateTableHandler.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestRSKilledWhenMasterInitializing.java Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.2 Attachments: 5179-90.txt, 5179-90v10.patch, 5179-90v11.patch, 5179-90v12.patch, 5179-90v13.txt, 5179-90v14.patch, 5179-90v15.patch, 5179-90v16.patch, 5179-90v17.txt, 5179-90v18.txt, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-92v17.patch, 5179-v11-92.txt, 5179-v11.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, Errorlog, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v12.patch, hbase-5179v17.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228999#comment-13228999 ] Hudson commented on HBASE-5179: --- Integrated in HBase-TRUNK-security #137 (See [https://builds.apache.org/job/HBase-TRUNK-security/137/]) HBASE-5179 Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler (Revision 1300194) Result = FAILURE stack : Files : * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/MasterServices.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/CreateTableHandler.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestRSKilledWhenMasterInitializing.java Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.2 Attachments: 5179-90.txt, 5179-90v10.patch, 5179-90v11.patch, 5179-90v12.patch, 5179-90v13.txt, 5179-90v14.patch, 5179-90v15.patch, 5179-90v16.patch, 5179-90v17.txt, 5179-90v18.txt, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-92v17.patch, 5179-v11-92.txt, 5179-v11.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, Errorlog, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v12.patch, hbase-5179v17.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13206615#comment-13206615 ] chunhui shen commented on HBASE-5179: - @ram We didn't find any problem now with the patch in 0.90 version testing, but as the comment by Stack in HBASE-5270, maybe we could use a new method, so i'm not sure whether to change the solution Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.94.0, 0.90.7, 0.92.1 Attachments: 5179-90.txt, 5179-90v10.patch, 5179-90v11.patch, 5179-90v12.patch, 5179-90v13.txt, 5179-90v14.patch, 5179-90v15.patch, 5179-90v16.patch, 5179-90v17.txt, 5179-90v18.txt, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-92v17.patch, 5179-v11-92.txt, 5179-v11.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, Errorlog, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v12.patch, hbase-5179v17.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13206074#comment-13206074 ] ramkrishna.s.vasudevan commented on HBASE-5179: --- @Chunhui Any updates on this testing. We have tried to use it in our testing cluster and till now no specific problems. If you can provide your feedback also it would be helpful in deciding if we can put this in next 0.90 RC. Any way for trunk we have HBASE-5270. @Ted What is your opinion on including this in next 0.90 RC? Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.94.0, 0.90.7, 0.92.1 Attachments: 5179-90.txt, 5179-90v10.patch, 5179-90v11.patch, 5179-90v12.patch, 5179-90v13.txt, 5179-90v14.patch, 5179-90v15.patch, 5179-90v16.patch, 5179-90v17.txt, 5179-90v18.txt, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-92v17.patch, 5179-v11-92.txt, 5179-v11.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, Errorlog, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v12.patch, hbase-5179v17.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13206120#comment-13206120 ] Zhihong Yu commented on HBASE-5179: --- I think this patch should be included in the new RC. Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.94.0, 0.90.7, 0.92.1 Attachments: 5179-90.txt, 5179-90v10.patch, 5179-90v11.patch, 5179-90v12.patch, 5179-90v13.txt, 5179-90v14.patch, 5179-90v15.patch, 5179-90v16.patch, 5179-90v17.txt, 5179-90v18.txt, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-92v17.patch, 5179-v11-92.txt, 5179-v11.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, Errorlog, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v12.patch, hbase-5179v17.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13195478#comment-13195478 ] chunhui shen commented on HBASE-5179: - @ram Yes, we will test the 90patch after Chinese spring holidays. Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.94.0, 0.90.7, 0.92.1 Attachments: 5179-90.txt, 5179-90v10.patch, 5179-90v11.patch, 5179-90v12.patch, 5179-90v13.txt, 5179-90v14.patch, 5179-90v15.patch, 5179-90v16.patch, 5179-90v17.txt, 5179-90v18.txt, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-92v17.patch, 5179-v11-92.txt, 5179-v11.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, Errorlog, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v12.patch, hbase-5179v17.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13195429#comment-13195429 ] ramkrishna.s.vasudevan commented on HBASE-5179: --- @Chunhui Are you using 0.90.x in your cluster? If so, Can you test this patch before we commit it into 0.90? We have done some testing and next week after Chinese spring holidays we will do some more testing. If we are confident then we can commit this and will prevent Stack from searching us with hammer :) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.94.0, 0.90.7, 0.92.1 Attachments: 5179-90.txt, 5179-90v10.patch, 5179-90v11.patch, 5179-90v12.patch, 5179-90v13.txt, 5179-90v14.patch, 5179-90v15.patch, 5179-90v16.patch, 5179-90v17.txt, 5179-90v18.txt, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-92v17.patch, 5179-v11-92.txt, 5179-v11.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, Errorlog, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v12.patch, hbase-5179v17.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13192208#comment-13192208 ] Zhihong Yu commented on HBASE-5179: --- I created HBASE-5270. Let's integrate patch v18 to 0.90 for this JIRA. Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.94.0, 0.90.6, 0.92.1 Attachments: 5179-90.txt, 5179-90v10.patch, 5179-90v11.patch, 5179-90v12.patch, 5179-90v13.txt, 5179-90v14.patch, 5179-90v15.patch, 5179-90v16.patch, 5179-90v17.txt, 5179-90v18.txt, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-92v17.patch, 5179-v11-92.txt, 5179-v11.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, Errorlog, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v12.patch, hbase-5179v17.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13191630#comment-13191630 ] Hadoop QA commented on HBASE-5179: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12511589/5179-90v18.txt against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/844//console This message is automatically generated. Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.94.0, 0.90.6, 0.92.1 Attachments: 5179-90.txt, 5179-90v10.patch, 5179-90v11.patch, 5179-90v12.patch, 5179-90v13.txt, 5179-90v14.patch, 5179-90v15.patch, 5179-90v16.patch, 5179-90v17.txt, 5179-90v18.txt, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-92v17.patch, 5179-v11-92.txt, 5179-v11.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, Errorlog, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v12.patch, hbase-5179v17.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13191678#comment-13191678 ] Zhihong Yu commented on HBASE-5179: --- What do we do with 5179-92v17.patch ? Test harness may not be ready in 0.92 for the (future) trunk patch to be applied. Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.94.0, 0.90.6, 0.92.1 Attachments: 5179-90.txt, 5179-90v10.patch, 5179-90v11.patch, 5179-90v12.patch, 5179-90v13.txt, 5179-90v14.patch, 5179-90v15.patch, 5179-90v16.patch, 5179-90v17.txt, 5179-90v18.txt, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-92v17.patch, 5179-v11-92.txt, 5179-v11.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, Errorlog, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v12.patch, hbase-5179v17.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13191954#comment-13191954 ] stack commented on HBASE-5179: -- Reviewing 0.92v17 isDeadServerInProgress is a new public method in ServerManager but it does not seem to be used anywhere. Does isDeadRootServerInProgress need to be public? Ditto for meta version. This method param names are not right 'definitiveRootServer'; what is meant by definitive? Do they need this qualifier? Is there anything in place to stop us expiring a server twice if its carrying root and meta? What is difference between asking assignment manager isCarryingRoot and this variable that is passed in? Should be doc'd at least. Ditto for meta. I think I've asked for this a few times -- onlineServers needs to be explained... either in javadoc or in comment. This is the param passed into joinCluster. How does it arise? I think I know but am unsure. God love the poor noob that comes awandering this code trying to make sense of it all. It looks like we get the list by trawling zk for regionserver znodes that have not checked in. Don't we do this operation earlier in master setup? Are we doing it again here? Though distributed split log is configured, we will do in master single process splitting under some conditions with this patch. Its not explained in code why we would do this. Why do we think master log splitting 'high priority' when it could very well be slower. Should we only go this route if distributed splitting is not going on. Do we know if concurrent distributed log splitting and master splitting works? Why would we have dead servers in progress here in master startup? Because a servershutdownhandler fired? This patch is different to the patch for 0.90. Should go into trunk first with tests, then 0.92. Should it be in this issue? This issue is really hard to follow now. Maybe this issue is for 0.90.x and new issue for more work on this trunk patch? This patch needs to have the v18 differences applied. Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.94.0, 0.90.6, 0.92.1 Attachments: 5179-90.txt, 5179-90v10.patch, 5179-90v11.patch, 5179-90v12.patch, 5179-90v13.txt, 5179-90v14.patch, 5179-90v15.patch, 5179-90v16.patch, 5179-90v17.txt, 5179-90v18.txt, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-92v17.patch, 5179-v11-92.txt, 5179-v11.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, Errorlog, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v12.patch, hbase-5179v17.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13190894#comment-13190894 ] stack commented on HBASE-5179: -- Here is some review of v17. logDirExists should be called getLogDir or getLogDirIfExists? Why is the below done up in ServerManager rather than down in ServerShutdownHandler? I'd think it'd make more sense therein? Perhaps even inside in the MetaServerShutdownHandler or whatever its called? Not a biggie. Just asking. Can we talk more about what there are: + * @param onlineServers onlined servers when master starts Are these servers that have checked in between master start and the call to processFailover. Could other servers come between the making of the list we pass into processFailover and the running of the processFailover code? Should this be a 'live' list? Or, I see that we are actually getting rid of the 'live' list of online servers to replace it w/ this static one in these lines: {code} - } else if (!serverManager.isServerOnline(regionLocation.getServerName())) { + } else if (!onlineServers.contains(regionLocation.getServerName())) { {code} Why do we do this? Could this block of be code be done out in a nice coherent method rather than inline here in HMaster: {code} +// Check zk for regionservers that are up but didn't register +for (String sn : this.regionServerTracker.getOnlineServerNames()) { + if (!this.serverManager.isServerOnline(sn)) { +HServerInfo serverInfo = HServerInfo.getServerInfo(sn); +if (serverInfo != null) { + HServerInfo existingServer = serverManager + .haveServerWithSameHostAndPortAlready(serverInfo + .getHostnamePort()); + if (existingServer == null) { +// Not registered; add it. +LOG.info(Registering server found up in zk but who has not yet ++ reported in: + sn); +// We set serverLoad with one region, it could differentiate with +// regionserver which is started just now +HServerLoad serverLoad = new HServerLoad(); +serverLoad.setNumberOfRegions(1); +serverInfo.setLoad(serverLoad); +this.serverManager.recordNewServer(serverInfo, true, null); + } +} else { + LOG.warn(Server + sn + + found up in zk, but is not a correct server name); +} + } +} {code} Can we say more about why we are doing this? And how do we know that it did not just start? Because if it has more than 0 regions, it must have been already running? {code} +if (rootServerLoad != null rootServerLoad.getNumberOfRegions() 0) { + // If rootServer is online not start just now, we expire it + this.serverManager.expireServer(rootServerInfo); +} {code} It looks like the processing of the meta server duplicates code from the processing of the root server. Can we have instead the duplicated code out in a method? Is that possible? Then pass in args for whether root or meta to process? Regards the hard coded wait of 1000ms when looking for dir to go away, see '13.4.1.2.2. Sleeps in tests' in http://hbase.apache.org/book.html#hbase.tests... it seems like we should wait less than 1000 going by the reference guide. DeadServer class looks much better. I can't write a test for a patch on 0.90 and this v17 won't work for trunk at all. The trunk patch will be very different which is a problem because then I have no confidence my trunk unit test applies to this patch that is to be applied on 0.90 branch. Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v10.patch, 5179-90v11.patch, 5179-90v12.patch, 5179-90v13.txt, 5179-90v14.patch, 5179-90v15.patch, 5179-90v16.patch, 5179-90v17.txt, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-92v17.patch, 5179-v11-92.txt, 5179-v11.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, Errorlog, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v12.patch, hbase-5179v17.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch,
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13190904#comment-13190904 ] Zhihong Yu commented on HBASE-5179: --- Thanks for the comments, Stack. Here is what Chunhui had to say (with minor grammatical corrections): {noformat} Because 0.90 branch doesn't record the startcode of metaserver or rootserver when we call getMetaLocation or getRootLocation, the patch for 90 is more complex than the patch for trunk. If there is unit test case, it would be perfect. But I didn't succeed doing that because the zk node of RS will be deleted immediately, not 3mins later in simulated RS termination. {noformat} Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v10.patch, 5179-90v11.patch, 5179-90v12.patch, 5179-90v13.txt, 5179-90v14.patch, 5179-90v15.patch, 5179-90v16.patch, 5179-90v17.txt, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-92v17.patch, 5179-v11-92.txt, 5179-v11.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, Errorlog, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v12.patch, hbase-5179v17.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13190446#comment-13190446 ] Zhihong Yu commented on HBASE-5179: --- Ran test suite in 0.92 over the latest 5179-92v17.patch All tests passed. Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v10.patch, 5179-90v11.patch, 5179-90v12.patch, 5179-90v13.txt, 5179-90v14.patch, 5179-90v15.patch, 5179-90v16.patch, 5179-90v17.txt, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-92v17.patch, 5179-v11-92.txt, 5179-v11.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, Errorlog, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v12.patch, hbase-5179v17.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13189783#comment-13189783 ] Hadoop QA commented on HBASE-5179: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12511256/5179-90v16.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/816//console This message is automatically generated. Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v10.patch, 5179-90v11.patch, 5179-90v12.patch, 5179-90v13.txt, 5179-90v14.patch, 5179-90v15.patch, 5179-90v16.patch, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-v11-92.txt, 5179-v11.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, Errorlog, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v12.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13189842#comment-13189842 ] Zhihong Yu commented on HBASE-5179: --- Patch v16 makes sense. Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v10.patch, 5179-90v11.patch, 5179-90v12.patch, 5179-90v13.txt, 5179-90v14.patch, 5179-90v15.patch, 5179-90v16.patch, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-v11-92.txt, 5179-v11.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, Errorlog, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v12.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13189946#comment-13189946 ] Hadoop QA commented on HBASE-5179: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12511281/5179-90v17.txt against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/817//console This message is automatically generated. Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v10.patch, 5179-90v11.patch, 5179-90v12.patch, 5179-90v13.txt, 5179-90v14.patch, 5179-90v15.patch, 5179-90v16.patch, 5179-90v17.txt, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-v11-92.txt, 5179-v11.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, Errorlog, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v12.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13190002#comment-13190002 ] Zhihong Yu commented on HBASE-5179: --- Patch v17 for 0.90 passed unit tests. Got a strange complaint about TestLruBlockCache. But in org.apache.hadoop.hbase.io.hfile.TestLruBlockCache.txt: {code} --- Test set: org.apache.hadoop.hbase.io.hfile.TestLruBlockCache --- Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.152 sec {code} Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v10.patch, 5179-90v11.patch, 5179-90v12.patch, 5179-90v13.txt, 5179-90v14.patch, 5179-90v15.patch, 5179-90v16.patch, 5179-90v17.txt, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-v11-92.txt, 5179-v11.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, Errorlog, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v12.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13190299#comment-13190299 ] gaojinchao commented on HBASE-5179: --- Do you want to add any new case? Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v10.patch, 5179-90v11.patch, 5179-90v12.patch, 5179-90v13.txt, 5179-90v14.patch, 5179-90v15.patch, 5179-90v16.patch, 5179-90v17.txt, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-92v17.patch, 5179-v11-92.txt, 5179-v11.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, Errorlog, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v12.patch, hbase-5179v17.patch, hbase-5179v17.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13190298#comment-13190298 ] gaojinchao commented on HBASE-5179: --- This is V16 test result in branch90, number discriberesult 1 a new cluster startup ok 2 restart a cluster ok 3 No region serve crash ok 4 After Meta region server registered, and then crashed ok 5 After Meta/root region server registered, and then crashed ok 6 After Hmaster crashed and Meta/root region server crashed. Hmaster and region server start at same time.0k 7 After Hmaster crashed and Meta/root region server crashed. Hmaster startok Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v10.patch, 5179-90v11.patch, 5179-90v12.patch, 5179-90v13.txt, 5179-90v14.patch, 5179-90v15.patch, 5179-90v16.patch, 5179-90v17.txt, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-92v17.patch, 5179-v11-92.txt, 5179-v11.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, Errorlog, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v12.patch, hbase-5179v17.patch, hbase-5179v17.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13190301#comment-13190301 ] Hadoop QA commented on HBASE-5179: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12511358/5179-92v17.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/821//console This message is automatically generated. Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v10.patch, 5179-90v11.patch, 5179-90v12.patch, 5179-90v13.txt, 5179-90v14.patch, 5179-90v15.patch, 5179-90v16.patch, 5179-90v17.txt, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-92v17.patch, 5179-v11-92.txt, 5179-v11.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, Errorlog, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v12.patch, hbase-5179v17.patch, hbase-5179v17.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13190314#comment-13190314 ] Zhihong Yu commented on HBASE-5179: --- Due to a missing check against null, previous patch v17 would result in: {code} 2012-01-20 21:00:08,370 FATAL [Master:0;192.168.0.11,52564,1327122004379] master.HMaster(1489): Unhandled exception. Starting shutdown. java.lang.NullPointerException at org.apache.hadoop.hbase.master.MasterFileSystem.logDirExists(MasterFileSystem.java:533) at org.apache.hadoop.hbase.master.HMaster.waitUntilNoLogDir(HMaster.java:712) at org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:598) at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:530) at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:348) {code} Will attach working version. Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v10.patch, 5179-90v11.patch, 5179-90v12.patch, 5179-90v13.txt, 5179-90v14.patch, 5179-90v15.patch, 5179-90v16.patch, 5179-90v17.txt, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-92v17.patch, 5179-v11-92.txt, 5179-v11.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, Errorlog, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v12.patch, hbase-5179v17.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13190328#comment-13190328 ] Hadoop QA commented on HBASE-5179: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12511367/5179-92v17.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/823//console This message is automatically generated. Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v10.patch, 5179-90v11.patch, 5179-90v12.patch, 5179-90v13.txt, 5179-90v14.patch, 5179-90v15.patch, 5179-90v16.patch, 5179-90v17.txt, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-92v17.patch, 5179-v11-92.txt, 5179-v11.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, Errorlog, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v12.patch, hbase-5179v17.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13190331#comment-13190331 ] Hadoop QA commented on HBASE-5179: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12511362/hbase-5179v17.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 javadoc. The javadoc tool appears to have generated -143 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 82 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.mapreduce.TestImportTsv org.apache.hadoop.hbase.mapred.TestTableMapReduce org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/822//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/822//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/822//console This message is automatically generated. Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v10.patch, 5179-90v11.patch, 5179-90v12.patch, 5179-90v13.txt, 5179-90v14.patch, 5179-90v15.patch, 5179-90v16.patch, 5179-90v17.txt, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-92v17.patch, 5179-v11-92.txt, 5179-v11.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, Errorlog, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v12.patch, hbase-5179v17.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13188983#comment-13188983 ] chunhui shen commented on HBASE-5179: - @Jinchao I think v12 has consider the above cases, could you help to test this patch in your cluster ? I have tried to write an unit test case, but the simulated kill would delete EphemeralNode on zk when Regionserver Thread exit, so the above case is hard to appear through unit test case. Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v10.patch, 5179-90v11.patch, 5179-90v12.patch, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-v11-92.txt, 5179-v11.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v12.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13189028#comment-13189028 ] Hadoop QA commented on HBASE-5179: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12511103/hbase-5179v12.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 javadoc. The javadoc tool appears to have generated -143 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 82 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/810//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/810//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/810//console This message is automatically generated. Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v10.patch, 5179-90v11.patch, 5179-90v12.patch, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-v11-92.txt, 5179-v11.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v12.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13189039#comment-13189039 ] Hadoop QA commented on HBASE-5179: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12511103/hbase-5179v12.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 javadoc. The javadoc tool appears to have generated -143 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 82 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.io.hfile.TestLruBlockCache Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/811//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/811//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/811//console This message is automatically generated. Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v10.patch, 5179-90v11.patch, 5179-90v12.patch, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-v11-92.txt, 5179-v11.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v12.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13189057#comment-13189057 ] Zhihong Yu commented on HBASE-5179: --- @Chunhui: If you can upload v11 to review board, publish it then upload v12, we would be able to see the differences. Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v10.patch, 5179-90v11.patch, 5179-90v12.patch, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-v11-92.txt, 5179-v11.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v12.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13189066#comment-13189066 ] Zhihong Yu commented on HBASE-5179: --- https://reviews.apache.org/r/3545 was created with patch v11, then updated with patch v12. Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v10.patch, 5179-90v11.patch, 5179-90v12.patch, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-v11-92.txt, 5179-v11.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v12.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13189140#comment-13189140 ] gaojinchao commented on HBASE-5179: --- @Chunhui In 90v12, Maybe below code(metaServerInfo/rootServerInfo) has some nullexception? // MetaServer is may being processed as dead server. Before assign meta, // we need to wait until its log is splitted. waitUntilNoLogDir(metaServerInfo.getServerName()); if (!this.serverManager.isDeadMetaServerInProgress()) { this.assignmentManager.assignMeta(); } Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v10.patch, 5179-90v11.patch, 5179-90v12.patch, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-v11-92.txt, 5179-v11.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v12.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13189148#comment-13189148 ] gaojinchao commented on HBASE-5179: --- @chunhui I only verify the branch90 version. I don't have the cluster for branch92(that needs a few day, we are planing to setup some test cluster.) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v10.patch, 5179-90v11.patch, 5179-90v12.patch, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-v11-92.txt, 5179-v11.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v12.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13189193#comment-13189193 ] Hadoop QA commented on HBASE-5179: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12511134/Errorlog against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/812//console This message is automatically generated. Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v10.patch, 5179-90v11.patch, 5179-90v12.patch, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-v11-92.txt, 5179-v11.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, Errorlog, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v12.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13189203#comment-13189203 ] Zhihong Yu commented on HBASE-5179: --- @Jinchao: So C3S31 was carrying .META., right ? I need to add DEBUG log into waitUntilNoLogDir() which would give us: 1. serverName 2. log directory path 3. how long waitUntilNoLogDir() waited. Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v10.patch, 5179-90v11.patch, 5179-90v12.patch, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-v11-92.txt, 5179-v11.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, Errorlog, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v12.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13189239#comment-13189239 ] Zhihong Yu commented on HBASE-5179: --- @Stack: Thanks for your review. Looks like we still have some work to really solve this issue. After that we can align naming to your liking. w.r.t. local log splitting, to my knowledge there is no priority in distributed log splitting. The rationale is to make log splitting of .META. server quick. Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v10.patch, 5179-90v11.patch, 5179-90v12.patch, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-v11-92.txt, 5179-v11.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, Errorlog, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v12.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13189245#comment-13189245 ] stack commented on HBASE-5179: -- So your thinking is that we'd go local in case a distributed log splitting is already going on? That would work for the case where say ten servers went down around same time and the tenth queued had meta on it. But then, what if there is no distributed log splitting going on? Master-only log splitting would be slower than distributed splitting? Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v10.patch, 5179-90v11.patch, 5179-90v12.patch, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-v11-92.txt, 5179-v11.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, Errorlog, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v12.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13189270#comment-13189270 ] Zhihong Yu commented on HBASE-5179: --- Maybe you're suggesting adding functionality to distributed log splitting which allows us to know the length of queue. That would be a new feature which would go in a separate JIRA. Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v10.patch, 5179-90v11.patch, 5179-90v12.patch, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-v11-92.txt, 5179-v11.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, Errorlog, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v12.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13189277#comment-13189277 ] Zhihong Yu commented on HBASE-5179: --- {code} 2012-01-19 11:10:35,847 INFO org.apache.hadoop.hbase.zookeeper.RegionServerTracker: RegionServer ephemeral node deleted, processing expiration [C3S31,20020,1326987900492] 2012-01-19 11:10:35,850 DEBUG org.apache.hadoop.hbase.master.ServerManager: Added=C3S31,20020,1326987900492 to dead servers, submitted shutdown handler to be executed, root=true, meta=false 2012-01-19 11:10:35,851 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Splitting logs for C3S31,20020,1326987900492 ... 2012-01-19 11:12:37,003 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Archived processed log hdfs://C3S31:9000/hbase/.logs/C3S31,20020,1326987900492/C3S31%3A20020.1326987901100 to hdfs://C3S31:9000/hbase/.oldlogs/C3S31%3A20020.1326987901100 2012-01-19 11:12:37,004 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: hlog file splitting completed in 1137 ms for hdfs://C3S31:9000/hbase/.logs/C3S31,20020,1326987900492 2012-01-19 11:12:37,004 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Server C3S31,20020,1326987900492 was carrying ROOT. Trying to assign. ... 2012-01-19 11:12:37,382 INFO org.apache.hadoop.hbase.master.HMaster: .META. assigned=2, rit=false, location=C3S32:20020 2012-01-19 11:12:37,382 INFO org.apache.hadoop.hbase.master.HMaster: Master startup proceeding: master failover 2012-01-19 11:12:37,383 WARN org.apache.hadoop.hbase.master.AssignmentManager: Overwriting 1028785192 on serverName=C3S32,20020,1326987400439, load=(requests=0, regions=0, usedHeap=0, maxHeap=0) 2012-01-19 11:12:37,384 WARN org.apache.hadoop.hbase.master.AssignmentManager: Overwriting 70236052 on serverName=C3S32,20020,1326987400439, load=(requests=0, regions=0, usedHeap=0, maxHeap=0) 2012-01-19 11:12:37,393 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Reassigning 0 region(s) that C3S31,20020,1326987900492 was carrying (skipping 0 regions(s) that are already in transition) 2012-01-19 11:12:37,393 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Finished processing of shutdown of C3S31,20020,1326987900492 {code} From above log, we can see that .META. was assigned to C3S32 and log splitting was for C3S31. So was there data loss ? Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v10.patch, 5179-90v11.patch, 5179-90v12.patch, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-v11-92.txt, 5179-v11.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, Errorlog, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v12.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13189326#comment-13189326 ] Zhihong Yu commented on HBASE-5179: --- In patch v10: {code} +this.fileSystemManager.splitLog(metaServerInfo.getServerName()); +this.serverManager.expireServer(metaServerInfo); {code} In latest patch, fileSystemManager.splitLog() is gone. I think what might have happened was that waitUntilNoLogDir() returned too soon because the log splitting was carried out in ShutdownHandler which barely got a chance of doing the split. Patch v13 adds DEBUG logging in waitUntilNoLogDir() so that we know whether the log dir exists upon its entry and how long waitUntilNoLogDir() takes before returning. Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v10.patch, 5179-90v11.patch, 5179-90v12.patch, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-v11-92.txt, 5179-v11.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, Errorlog, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v12.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13189333#comment-13189333 ] stack commented on HBASE-5179: -- Why would there be dataloss? Was C3S32 carrying .META.? Even if it was, it looks like the log split before .META. assign (and this issue is about concurrent master failover going on anyways?) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v10.patch, 5179-90v11.patch, 5179-90v12.patch, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-v11-92.txt, 5179-v11.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, Errorlog, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v12.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13189338#comment-13189338 ] Zhihong Yu commented on HBASE-5179: --- If my assumption @ 19/Jan/12 19:38 is true, we should re-introduce this.fileSystemManager.splitLog() Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v10.patch, 5179-90v11.patch, 5179-90v12.patch, 5179-90v13.txt, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-v11-92.txt, 5179-v11.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, Errorlog, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v12.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13189339#comment-13189339 ] Hadoop QA commented on HBASE-5179: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12511158/5179-90v13.txt against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/813//console This message is automatically generated. Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v10.patch, 5179-90v11.patch, 5179-90v12.patch, 5179-90v13.txt, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-v11-92.txt, 5179-v11.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, Errorlog, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v12.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13189578#comment-13189578 ] chunhui shen commented on HBASE-5179: - @Jinchao {code} gaojinchao added a comment - 19/Jan/12 16:40 when region server checkin, I killed meta/root region. I found splitHlog is after meta was assigned. So I think may has another problem. tomorrow continue to dig . {code} Because this test uses v11patch without waitUntilNoLogDir(), so it is the problem . Jinchao, could you test again with 90v13.patch. Thanks. Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v10.patch, 5179-90v11.patch, 5179-90v12.patch, 5179-90v13.txt, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-v11-92.txt, 5179-v11.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, Errorlog, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v12.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13189585#comment-13189585 ] Hadoop QA commented on HBASE-5179: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12511216/5179-90v14.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/814//console This message is automatically generated. Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v10.patch, 5179-90v11.patch, 5179-90v12.patch, 5179-90v13.txt, 5179-90v14.patch, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-v11-92.txt, 5179-v11.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, Errorlog, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v12.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13189584#comment-13189584 ] chunhui shen commented on HBASE-5179: - @Jinchao Could you test again with 90v14patch? Thanks much! Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v10.patch, 5179-90v11.patch, 5179-90v12.patch, 5179-90v13.txt, 5179-90v14.patch, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-v11-92.txt, 5179-v11.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, Errorlog, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v12.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13189595#comment-13189595 ] gaojinchao commented on HBASE-5179: --- In my test case, I kill meta/root at same time. when master start to assign root/meta region , it should finish split hlogs. Today I will give a detail test report about 90V14. Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v10.patch, 5179-90v11.patch, 5179-90v12.patch, 5179-90v13.txt, 5179-90v14.patch, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-v11-92.txt, 5179-v11.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, Errorlog, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v12.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13189600#comment-13189600 ] gaojinchao commented on HBASE-5179: --- no problem. before I start to verify the patch. I will review it. Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v10.patch, 5179-90v11.patch, 5179-90v12.patch, 5179-90v13.txt, 5179-90v14.patch, 5179-90v15.patch, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-v11-92.txt, 5179-v11.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, Errorlog, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v12.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13189607#comment-13189607 ] Hadoop QA commented on HBASE-5179: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12511218/5179-90v15.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/815//console This message is automatically generated. Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v10.patch, 5179-90v11.patch, 5179-90v12.patch, 5179-90v13.txt, 5179-90v14.patch, 5179-90v15.patch, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-v11-92.txt, 5179-v11.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, Errorlog, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v12.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13189641#comment-13189641 ] gaojinchao commented on HBASE-5179: --- @chunhui The first test case failed, we start a cluster, the patch is split a new region server's Hlog. 2012-01-20 00:34:39,462 INFO org.mortbay.log: Started SelectChannelConnector@0.0.0.0:20010 2012-01-20 00:34:39,462 DEBUG org.apache.hadoop.hbase.master.HMaster: Started service threads 2012-01-20 00:34:40,158 INFO org.apache.hadoop.hbase.master.ServerManager: Registering server=C3S32,20020,1327037679721, regionCount=0, userLoad=false 2012-01-20 00:34:40,296 INFO org.apache.hadoop.hbase.master.ServerManager: Registering server=C3S33,20020,1327037679059, regionCount=0, userLoad=false 2012-01-20 00:34:40,488 INFO org.apache.hadoop.hbase.master.ServerManager: Registering server=C3S31,20020,1327037679673, regionCount=0, userLoad=false 2012-01-20 00:34:40,962 INFO org.apache.hadoop.hbase.master.ServerManager: Waiting on regionserver(s) count to settle; currently=3 2012-01-20 00:34:42,462 INFO org.apache.hadoop.hbase.master.ServerManager: Finished waiting for regionserver count to settle; count=3, sleptFor=3000 2012-01-20 00:34:42,463 INFO org.apache.hadoop.hbase.master.ServerManager: Exiting wait on regionserver(s) to checkin; count=3, stopped=false, count of regions out on cluster=0 2012-01-20 00:34:42,463 INFO org.apache.hadoop.hbase.master.HMaster: --sleep 60s- 2012-01-20 00:35:42,469 INFO org.apache.hadoop.hbase.master.MasterFileSystem: Log folder hdfs://C3S31:9000/hbase/.logs/C3S31,20020,1327037679673 belongs to an existing region server 2012-01-20 00:35:42,470 INFO org.apache.hadoop.hbase.master.MasterFileSystem: Log folder hdfs://C3S31:9000/hbase/.logs/C3S32,20020,1327037679721 belongs to an existing region server 2012-01-20 00:35:42,470 INFO org.apache.hadoop.hbase.master.MasterFileSystem: Log folder hdfs://C3S31:9000/hbase/.logs/C3S33,20020,1327037679059 belongs to an existing region server 2012-01-20 00:35:42,504 INFO org.apache.hadoop.hbase.catalog.CatalogTracker: Failed verification of -ROOT-,,0 at address=C3S32:20020; org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: Region is not online: -ROOT-,,0 2012-01-20 00:36:42,610 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown. java.lang.RuntimeException: Timed out waiting to finish splitting log for C3S32,20020,1327037679721 at org.apache.hadoop.hbase.master.HMaster.waitUntilNoLogDir(HMaster.java:578) at org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:478) at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:422) at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:283) 2012-01-20 00:36:42,613 INFO org.apache.hadoop.hbase.master.HMaster: Aborting 2012-01-20 00:36:42,613 DEBUG org.apache.hadoop.hbase.master.HMaster: Stopping service threads 2012-01-20 00:36:42,613 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server on 2 Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v10.patch, 5179-90v11.patch, 5179-90v12.patch, 5179-90v13.txt, 5179-90v14.patch, 5179-90v15.patch, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-v11-92.txt, 5179-v11.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, Errorlog, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v12.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13188413#comment-13188413 ] Zhihong Yu commented on HBASE-5179: --- {code} + * Dead servers under processing by the ServerShutdownHander. Map of + * ServerType to set of being processed dead servers + */ + private final MapServerType, SetServerName deadServersUnderProcessing = new HashMapServerType, SetServerName(); {code} The second line of javadoc should read: ' to set of dead servers being processed' Please wrap long line. I think we should pre-create HashSet's for the three ServerType's in DeadServer ctor. This way we can omit runtime checks such as: {code} +if (deadNormalServersBeingProcessed == null) { + deadNormalServersBeingProcessed = new HashSetServerName(); + deadServersUnderProcessing.put(ServerType.NORMAL, + deadNormalServersBeingProcessed); +} {code} {code} + * @param assuredRootServer true if it's dead server carrying root certainly. + * @param assuredMetaServer true if it's dead server carrying meta certainly. {code} How about naming the parameters definitiveRootServer and definitiveMetaServer ? (Change certainly to definitely). We should use low value for hbase.master.wait.on.regionservers.timeout so that the new test case doesn't take too long. Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13188599#comment-13188599 ] Lars Hofhansl commented on HBASE-5179: -- v11 lgtm +1 Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-v11.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13188607#comment-13188607 ] Hadoop QA commented on HBASE-5179: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12511011/5179-v11.txt against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 javadoc. The javadoc tool appears to have generated -143 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 82 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat org.apache.hadoop.hbase.mapred.TestTableMapReduce org.apache.hadoop.hbase.mapreduce.TestImportTsv Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/805//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/805//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/805//console This message is automatically generated. Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-v11.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13188661#comment-13188661 ] Hadoop QA commented on HBASE-5179: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12511020/5179-v11-92.txt against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/807//console This message is automatically generated. Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-v11-92.txt, 5179-v11.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13188686#comment-13188686 ] Hadoop QA commented on HBASE-5179: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12511030/5179-90v10.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/809//console This message is automatically generated. Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v10.patch, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-v11-92.txt, 5179-v11.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13188737#comment-13188737 ] Zhihong Yu commented on HBASE-5179: --- @Stack: Do you want to take a look at the v11 patches (3 of them) ? Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v10.patch, 5179-90v11.patch, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-v11-92.txt, 5179-v11.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13188861#comment-13188861 ] Zhihong Yu commented on HBASE-5179: --- Tests for 0.90 passed for 5179-90v11.patch Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v10.patch, 5179-90v11.patch, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-v11-92.txt, 5179-v11.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13188938#comment-13188938 ] gaojinchao commented on HBASE-5179: --- +1, Good job! Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v10.patch, 5179-90v11.patch, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-v11-92.txt, 5179-v11.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13188939#comment-13188939 ] gaojinchao commented on HBASE-5179: --- @chunhui Do you want me to test this patch in our cluster ? Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v10.patch, 5179-90v11.patch, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-v11-92.txt, 5179-v11.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13188956#comment-13188956 ] stack commented on HBASE-5179: -- Oh, one other thought is we could break up this issue some. For example, servers coming before a master finishes processing failover could be an issue unto itself. Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v10.patch, 5179-90v11.patch, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-v11-92.txt, 5179-v11.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13188954#comment-13188954 ] stack commented on HBASE-5179: -- Regards v11 new splitLog method, I don't get this justification Zhihong: {code} What I am thinking is that maybe we should split currentMetaServer's log in a non-distributed fashion because the splitting is of high priority. {code} Is the thought that local splitting will run faster? Is this true? areDeadServersInProgress method name should match the other method names so it should be areDeadServersBeingProcessed (minor). Ditto these methods, isDeadRootServerInProgress, etc. Whats the difference between InProgress and BeingProcessed? We also seem to have active voice UnderProcessing going on. Should be consistent? Do these need to be public? Seem like only used in same package by master. Should the zk callback be up and operating before the master comes completely on line? The knownServers in HMaster, are heartbeating servers that have come in before the master came on line? That seems like an important fix. I'm now a little confused as to the scope of this patch. The Jinchao descriptions above on how to reproduce pathological situations I get. It'd be great to do these up as a unit tests. I'm not sure which of Jinchao descriptions apply to TRUNK as opposed to 0.90. Any chance of getting a list of scenarios this patch is supposed to fix? If we had that, I'd be up for writing unit tests for TRUNK at least (I think it has sufficient primitives mocking up Jinchao descriptions w/o need of a cluster). That said, this patch and the discussion above in this issue is uncovering critical stuff; thanks for all the work lads. Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v10.patch, 5179-90v11.patch, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-v11-92.txt, 5179-v11.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13188979#comment-13188979 ] chunhui shen commented on HBASE-5179: - @all I think this issue we should consider following cases at least: 1.RS is dead and its zk node exists when master initializing 2.RS is restarted when master initializing (there are two nodes for the same RS on zk) For the case 2, we need consider the following cases: 2.1 RS is restarted before this.serverManager.waitForRegionServers() 2.2 RS is restarted between this.serverManager.waitForRegionServers() and HMaster#assignRootAndMeta We should also find carryingMeta and carringRoot is not accurate when master initializing. Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v10.patch, 5179-90v11.patch, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-v11-92.txt, 5179-v11.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187863#comment-13187863 ] Zhihong Yu commented on HBASE-5179: --- For hbase-5179v9.patch: {code} + if (currentMetaServer != null + this.serverManager.isServerOnline(currentMetaServer)) { +// Current meta server is dead, we first split its log and then expire {code} Is the above right ? {code} +this.fileSystemManager.splitLog(currentMetaServer); expireIfOnline(currentMetaServer); {code} The splitLog() would end up calling: {code} public void splitLog(final ListServerName serverNames) throws IOException { {code} If distributedLogSplitting is true, log splitting would be in progress upon return from splitLog(). We need to either disable distributedLogSplitting or introduce synchronization mechanism for above. Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, hbase-5179.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13188207#comment-13188207 ] chunhui shen commented on HBASE-5179: - @Zhihong {code}+ if (currentMetaServer != null + this.serverManager.isServerOnline(currentMetaServer)) { +// Current meta server is dead, we first split its log and then expire {code} I think it's right, could you talk about the found problem? If distributedLogSplitting is true, the following code will be called {code} if (distributedLogSplitting) { splitLogManager.handleDeadWorkers(serverNames); splitTime = EnvironmentEdgeManager.currentTimeMillis(); splitLogSize = splitLogManager.splitLogDistributed(logDirs); splitTime = EnvironmentEdgeManager.currentTimeMillis() - splitTime; } {code} and {code} /** * The caller will block until all the log files of the given region server * have been processed - successfully split or an error is encountered - by an * available worker region server. This method must only be called after the * region servers have been brought online. * * @param logDirs * @throws IOException * if there was an error while splitting any log file * @return cumulative size of the logfiles split */ public long splitLogDistributed(final ListPath logDirs) throws IOException {code} So, it will block until completing splitlog Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, hbase-5179.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13188231#comment-13188231 ] Zhihong Yu commented on HBASE-5179: --- w.r.t. first question: if this.serverManager.isServerOnline() returns true, should currentMetaServer be considered dead ? For the second question, splitLogDistributed() would block. That's fine. What I am thinking is that maybe we should split currentMetaServer's log in a non-distributed fashion because the splitting is of high priority. Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, hbase-5179.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13188248#comment-13188248 ] chunhui shen commented on HBASE-5179: - @Zhihong For the first question: this.serverManager.isServerOnline() returns true and catalogTracker.verifyMetaRegionLocation(timeout) returns false, if the meta server is really dead and we assign meta without splitting log, causing meta data loss. Through the following code: {code} if (!catalogTracker.verifyRootRegionLocation(timeout)) { ServerName currentRootServer = this.catalogTracker.getRootLocation(); if (expireIfOnline(currentRootServer)) { // We are expiring this server. The processing of expiration will assign // root so don't do it here. expiredServer = currentRootServer; } else { // Root was not on an online server when we failed verification this.assignmentManager.assignRoot(); } private boolean expireIfOnline(final ServerName sn) { if (sn == null) return false; if (!this.serverManager.isServerOnline(sn)) return false; LOG.info(Forcing expiration of + sn); this.serverManager.expireServer(sn); return true; } {code} We could find currentRootServer is considered dead, with the same condition, could we consider currentMetaServer dead? Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, hbase-5179.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13188251#comment-13188251 ] chunhui shen commented on HBASE-5179: - For the second question: I think we could introducing a new function of forced non-distributed splitlog in MasterFileSystem Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, hbase-5179.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13188260#comment-13188260 ] Zhihong Yu commented on HBASE-5179: --- From serverManager.expireServer(): {code} if (carryingRoot || carryingMeta) { this.services.getExecutorService().submit(new MetaServerShutdownHandler(this.master, {code} Since expireIfOnline() is called after this.fileSystemManager.splitLog(currentMetaServer), MetaServerShutdownHandler would be queued after log splitting. Would this pose potential issue ? Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, hbase-5179.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13188266#comment-13188266 ] chunhui shen commented on HBASE-5179: - @Zhihong when matser is initializing, carryingMeta is always false, so expiring currentMetaServer would only call ServerShutdownHandler not MetaServerShutdownHandler However, I find another problem, carryingRoot is also always false when initializing in trunk(in 90-branch it's true), so we should also this.fileSystemManager.splitLog(currentRootServer) before expire currentRootServer Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, hbase-5179.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13188269#comment-13188269 ] Zhihong Yu commented on HBASE-5179: --- @Chunhui: Thanks for the investigation. Please base your TRUNK patch on testing result using 0.92.0 RC4. I think we can add boolean parameters (for carryingRoot and carryingMeta) to serverManager.expireServer() and expireIfOnline() so that we don't need to call fileSystemManager.splitLog() separately. The goal is to make the code maintainable while achieving correctness. Just a suggestion. Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, hbase-5179.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13188293#comment-13188293 ] chunhui shen commented on HBASE-5179: - @Zhihong: In v10patch for trunk, I introduce the above comment, please review it. Thanks. Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13188303#comment-13188303 ] Zhihong Yu commented on HBASE-5179: --- I like patch v10. Do you have time to verify in real cluster ? We can discuss names of variables later. Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13188305#comment-13188305 ] Hadoop QA commented on HBASE-5179: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12510958/hbase-5179v10.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 javadoc. The javadoc tool appears to have generated -143 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 82 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.replication.TestReplicationPeer org.apache.hadoop.hbase.mapreduce.TestImportTsv org.apache.hadoop.hbase.mapred.TestTableMapReduce org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/804//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/804//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/804//console This message is automatically generated. Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13188310#comment-13188310 ] chunhui shen commented on HBASE-5179: - I will submit the test case later. Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13188309#comment-13188309 ] chunhui shen commented on HBASE-5179: - @Zhihong: I think I could add one unit test case for this issue through setting conf, for example, ServerManager#waitForRegionServers() will wait timeout = this.master.getConfiguration(). getLong(hbase.master.wait.on.regionservers.timeout, 4500); So I could increment the time of timeout in test case to reappear the issue. Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13186770#comment-13186770 ] gaojinchao commented on HBASE-5179: --- @chunhui Maybe it has a problem. the number of shutdownhandler thread pool is 3(default), If there are more than 3 deadserver is processing. we will wait forever. Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, hbase-5179.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13186781#comment-13186781 ] chunhui shen commented on HBASE-5179: - @Jinchao I think it is another problem. If we restart general three RS, and then kill META server, The first three ServerShutdownHandler will wait meta region, however METAServerShutdownHandler will not be processed because shutdownhandler thread pool is full until one ServerShutdownHandler is finished. So it exists a forever wait. Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, hbase-5179.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13186786#comment-13186786 ] gaojinchao commented on HBASE-5179: --- @chunhui Regarding to a normal flow. METAServerShutdownHandler use different thread pool. only init flow, scome cases we can't distinguish meta region server. Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, hbase-5179.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13186804#comment-13186804 ] chunhui shen commented on HBASE-5179: - @Jinchao So, we need ensure dead meta server (which is consider as a general server) is processed by SSH when master initializing? Otherwise, one of meta-data loss or waiting forever must happen? Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, hbase-5179.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13186928#comment-13186928 ] gaojinchao commented on HBASE-5179: --- In patch v7, Can we replace process expired server to public void splitLog(final String serverName)? Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, hbase-5179.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187378#comment-13187378 ] chunhui shen commented on HBASE-5179: - @Jinchao process expired server is introduced from trunk, if we just only splitLog, the regions of RS will not be assigned because it is considerd a known server. For the possible waiting forever, what about changed as retrying and throwing exception? Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, hbase-5179.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187406#comment-13187406 ] Hadoop QA commented on HBASE-5179: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12510799/5179-90v8.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/785//console This message is automatically generated. Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, hbase-5179.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187415#comment-13187415 ] gaojinchao commented on HBASE-5179: --- @chunhui Does this code make sense to you ? if (!this.serverManager.isDeadMetaServerInProgress()) { if(metaServerInfo != null){ this.fileSystemManager.splitLog(metaServerInfo.getServerName()); } this.assignmentManager.assignMeta(); assigned++; } Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, hbase-5179.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187417#comment-13187417 ] gaojinchao commented on HBASE-5179: --- +1 v8 , we need some test cases to verify in real cluster. Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, hbase-5179.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187424#comment-13187424 ] Zhihong Yu commented on HBASE-5179: --- @Chunhui: Thanks for your persistence. For 5179-90v8.patch, three sets are introduced into DeadServer. Can we use one Map in place of the 3 sets ? MapString, ServerType where ServerType is an enum marking NORMAL, ROOT and META servers. isDeadRootServerInProgress() would require a traversal of the Map. So this is just a suggestion. For MasterFileSystem.existLogs(), maybe logDirExists() is a better name ? Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, hbase-5179.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187430#comment-13187430 ] Hadoop QA commented on HBASE-5179: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12510802/hbase-5179v8.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 javadoc. The javadoc tool appears to have generated -145 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 83 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.mapreduce.TestImportTsv org.apache.hadoop.hbase.mapred.TestTableMapReduce org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/786//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/786//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/786//console This message is automatically generated. Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, hbase-5179.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187433#comment-13187433 ] Zhihong Yu commented on HBASE-5179: --- @Chunhui: Can you publish TRUNK patch v8 on review board ? Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, hbase-5179.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187448#comment-13187448 ] Zhihong Yu commented on HBASE-5179: --- {code} +int waitTime = conf.getInt(hbase.master.assignMeta.timeout, +6); +for (int i = 0; metaServerInfo != null; i++) { + if (!this.fileSystemManager.existLogs(metaServerInfo.getServerName())) { {code} I think hbase.master.meta.assignment.timeout would be more appropriate. metaServerInfo.getServerName() can be stored in a variable before entering the for loop (minor). {code} + if (i = waitTime / 1000) { +throw new RuntimeException( +Wait META server to finish splitting log for a time out); + } + sleep(1000); {code} I think the time accounting above is inaccurate because we don't know how long the sleep() call has taken. Also, Timed out waiting for .META. server to finish splitting log would be better exception message. Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, hbase-5179.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187452#comment-13187452 ] Zhihong Yu commented on HBASE-5179: --- {code} + public ListString getOnlineServersAsName() throws KeeperException { +return ZKUtil.listChildrenNoWatch(watcher, watcher.rsZNode); + } {code} A better method name would be getOnlineServerNames. Good job, Chunhui. Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, hbase-5179.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187468#comment-13187468 ] chunhui shen commented on HBASE-5179: - @Jinchao {code} if (!this.serverManager.isDeadMetaServerInProgress()) { if(metaServerInfo != null){ this.fileSystemManager.splitLog(metaServerInfo.getServerName()); } this.assignmentManager.assignMeta(); assigned++; } {code} Thanks for above code, I think it's fine and is used in v9patch to be simpler. Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, hbase-5179.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187472#comment-13187472 ] chunhui shen commented on HBASE-5179: - @Zhihong I don't know the review board address. Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, hbase-5179.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187473#comment-13187473 ] Hadoop QA commented on HBASE-5179: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12510811/5179-90v9.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/789//console This message is automatically generated. Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, hbase-5179.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187476#comment-13187476 ] Zhihong Yu commented on HBASE-5179: --- Jinchao's code may introduce intricacy in TRUNK where log splitting may be distributed. In that case we cannot assign .META. upon return from fileSystemManager.splitLog() Review board is at https://reviews.apache.org You should select hbase as Repository and / as Base Directory. The diff should be patch generated against TRUNK. Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, hbase-5179.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187515#comment-13187515 ] Hadoop QA commented on HBASE-5179: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12510813/hbase-5179v9.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 javadoc. The javadoc tool appears to have generated -144 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 84 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster org.apache.hadoop.hbase.replication.TestReplication Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/790//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/790//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/790//console This message is automatically generated. Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, hbase-5179.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13186655#comment-13186655 ] chunhui shen commented on HBASE-5179: - I aggree with the meta-data-loss issue. In the gaojinchao's comment {code} When master starts, As current flow, RS has three part: 1.Some has registered in Hmaster by heatbeat report 2.Some is dead server being processed by ssh 3.Some is waiting zk is expired(default session timeout is 3 minutes) {code} For the 3, I think we can do as the following: 1.master first fail verifying metaLocation. 2.judge whether RS is processing by SSH 3.if 2 not, expire this RS 3 is done in Trunk, but not in 0.90 Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, hbase-5179.patch, hbase-5179v5.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13186668#comment-13186668 ] Hadoop QA commented on HBASE-5179: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12510660/hbase-5179v6.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/767//console This message is automatically generated. Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, hbase-5179.patch, hbase-5179v5.patch, hbase-5179v6.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13186669#comment-13186669 ] chunhui shen commented on HBASE-5179: - In 90-branch, the logic of carrying root and crarrying meta are not accurate as the trunk when expiring server. and root server's startcode is not recorded in ZK node, So I think it is a little disordered for solving the meta-data-loss issue. Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, hbase-5179.patch, hbase-5179v5.patch, hbase-5179v6.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13186675#comment-13186675 ] gaojinchao commented on HBASE-5179: --- @chunhui I agree with you. In branch 90, I want to add a flag that marks SSH finish split Hlog. If all of Dead servers had split Hlog, Loss data should be quite rare. Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, hbase-5179.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13186676#comment-13186676 ] Hadoop QA commented on HBASE-5179: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12510663/hbase-5179v7.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 javadoc. The javadoc tool appears to have generated -145 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 82 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.mapreduce.TestImportTsv org.apache.hadoop.hbase.mapred.TestTableMapReduce org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat org.apache.hadoop.hbase.master.TestSplitLogManager Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/768//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/768//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/768//console This message is automatically generated. Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, hbase-5179.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13186679#comment-13186679 ] Zhihong Yu commented on HBASE-5179: --- +1 on patch v7 for TRUNK. Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, hbase-5179.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss
[ https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13186687#comment-13186687 ] chunhui shen commented on HBASE-5179: - @Zhihong @Jinchao @Ram In patch v5 for 90-branch, Before assigning rootAndMeta, I add a checking whether meta or root server is being processed as dead. And only assign rootAndMeta if not. what do you think about its availability? Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss Key: HBASE-5179 URL: https://issues.apache.org/jira/browse/HBASE-5179 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.2 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.92.0, 0.94.0, 0.90.6 Attachments: 5179-90.txt, 5179-90v2.patch, 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, hbase-5179.patch, hbase-5179v5.patch, hbase-5179v6.patch, hbase-5179v7.patch If master's processing its failover and ServerShutdownHandler's processing happen concurrently, it may appear following case. 1.master completed splitLogAfterStartup() 2.RegionserverA restarts, and ServerShutdownHandler is processing. 3.master starts to rebuildUserRegions, and RegionserverA is considered as dead server. 4.master starts to assign regions of RegionserverA because it is a dead server by step3. However, when doing step4(assigning region), ServerShutdownHandler may be doing split log, Therefore, it may cause data loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira