[jira] [Issue Comment Edited] (HBASE-5137) MasterFileSystem.splitLog() should abort even if waitOnSafeMode() throws IOException
[ https://issues.apache.org/jira/browse/HBASE-5137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183424#comment-13183424 ]

ramkrishna.s.vasudevan edited comment on HBASE-5137 at 1/10/12 5:54 PM:

Committed to 0.92 also.
Thanks for the review Stack. Thanks to Ted for the patch and review.

was (Author: ram_krish):
Committed to 0.92 also.

MasterFileSystem.splitLog() should abort even if waitOnSafeMode() throws IOException

Key: HBASE-5137
URL: https://issues.apache.org/jira/browse/HBASE-5137
Project: HBase
Issue Type: Bug
Affects Versions: 0.90.4
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Fix For: 0.92.0, 0.90.6
Attachments: 5137-trunk.txt, HBASE-5137.patch, HBASE-5137.patch

I am not sure whether this bug has already been raised in JIRA.
In our test cluster we had a scenario where the RS had gone down and ServerShutdownHandler started splitLog. But because HDFS was down, the waitOnSafeMode check threw an IOException.
{code}
try {
  // If FS is in safe mode, just wait till out of it.
  FSUtils.waitOnSafeMode(conf,
    conf.getInt(HConstants.THREAD_WAKE_FREQUENCY, 1000));
  splitter.splitLog();
} catch (OrphanHLogAfterSplitException e) {
{code}
We catch the exception
{code}
} catch (IOException e) {
  checkFileSystem();
  LOG.error("Failed splitting " + logDir.toString(), e);
}
{code}
So the HLog split itself did not happen. We encountered about 4 regions, recently split on the crashed RS, that were lost.
Can we abort the Master in such scenarios? Please suggest.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
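The failure mode described in this issue (an IOException from waitOnSafeMode() is caught and logged, so the split never runs and nothing stops the master) can be illustrated with a minimal, self-contained sketch. It is not the committed patch and does not use HBase classes: SplitLogAbortSketch, Step, failingWith() and abort() are hypothetical names chosen for illustration, and the event list stands in for logging.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the master's log-splitting path. This is NOT HBase code.
class SplitLogAbortSketch {

  // Hypothetical hook standing in for FSUtils.waitOnSafeMode() or splitter.splitLog().
  interface Step {
    void run() throws IOException;
  }

  final List<String> events = new ArrayList<>();
  boolean aborted = false;

  // Behaviour reported in the issue: the IOException is logged and then dropped.
  void splitLogSwallowing(Step waitOnSafeMode, Step splitLog) {
    try {
      waitOnSafeMode.run();
      splitLog.run();
      events.add("split completed");
    } catch (IOException e) {
      events.add("logged: Failed splitting (" + e.getMessage() + ")");
    }
  }

  // Behaviour requested in the issue: log the failure, then abort the master.
  void splitLogAborting(Step waitOnSafeMode, Step splitLog) {
    try {
      waitOnSafeMode.run();
      splitLog.run();
      events.add("split completed");
    } catch (IOException e) {
      events.add("logged: Failed splitting (" + e.getMessage() + ")");
      abort("Failed splitting hlog files");
    }
  }

  // Hypothetical abort hook; a real master would begin shutting down here.
  void abort(String why) {
    aborted = true;
    events.add("abort: " + why);
  }

  // Builds a step that always fails, simulating HDFS being unavailable.
  static Step failingWith(String message) {
    return () -> { throw new IOException(message); };
  }

  public static void main(String[] args) {
    Step hdfsDown = failingWith("HDFS unavailable");
    Step split = () -> { };

    SplitLogAbortSketch before = new SplitLogAbortSketch();
    before.splitLogSwallowing(hdfsDown, split);
    System.out.println("swallowing: aborted=" + before.aborted + " " + before.events);

    SplitLogAbortSketch after = new SplitLogAbortSketch();
    after.splitLogAborting(hdfsDown, split);
    System.out.println("aborting:   aborted=" + after.aborted + " " + after.events);
  }
}
```

In the swallowing variant the caller cannot tell that the split was skipped, which matches the reported symptom of regions being lost after the shutdown handler carried on. The aborting variant makes the failure visible to whatever drives the handler; whether abort is the right response in every case is the question the issue raises.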
[jira] [Issue Comment Edited] (HBASE-5137) MasterFileSystem.splitLog() should abort even if waitOnSafeMode() throws IOException
[ https://issues.apache.org/jira/browse/HBASE-5137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13182111#comment-13182111 ]

Zhihong Yu edited comment on HBASE-5137 at 1/7/12 10:04 PM:

Nicolas might know the reason for introducing hbase.hlog.split.failure.retry.interval parameter

was (Author: zhi...@ebaysf.com):
Nicolas might know the reason for introducing hbase.hlog.split.failure.retry.interval parameter
Please provide a patch for 0.92 and TRUNK which adds check for retrySplitting in the following if statement (line 220):
{code}
if (!checkFileSystem()) {
{code}

MasterFileSystem.splitLog() should abort even if waitOnSafeMode() throws IOException

Key: HBASE-5137
URL: https://issues.apache.org/jira/browse/HBASE-5137
Project: HBase
Issue Type: Bug
Affects Versions: 0.90.4
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Attachments: 5137-trunk.txt, HBASE-5137.patch