[jira] [Updated] (HBASE-5137) MasterFileSystem.splitLog() should abort even if waitOnSafeMode() throws IOException

2012-01-10 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5137:
--

Fix Version/s: 0.90.6
   0.92.1

Committed to 0.90 and trunk.  Do we need to commit in 0.92 also?

 MasterFileSystem.splitLog() should abort even if waitOnSafeMode() throws 
 IOException
 

 Key: HBASE-5137
 URL: https://issues.apache.org/jira/browse/HBASE-5137
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.1, 0.90.6

 Attachments: 5137-trunk.txt, HBASE-5137.patch, HBASE-5137.patch


 I am not sure whether this bug has already been raised in JIRA.
 In our test cluster we had a scenario where the RS had gone down and 
 ServerShutdownHandler started with splitLog.
 But as HDFS was down, the waitOnSafeMode check threw an IOException.
 {code}
 try {
   // If FS is in safe mode, just wait till out of it.
   FSUtils.waitOnSafeMode(conf,
     conf.getInt(HConstants.THREAD_WAKE_FREQUENCY, 1000));
   splitter.splitLog();
 } catch (OrphanHLogAfterSplitException e) {
 {code}
 We catch the exception
 {code}
 } catch (IOException e) {
   checkFileSystem();
   LOG.error("Failed splitting " + logDir.toString(), e);
 }
 {code}
 So the HLog split itself did not happen. We found that about 4 regions that 
 had recently been split on the crashed RS were lost.
 Can we abort the Master in such scenarios? Please suggest.
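
 To make the request concrete, here is a minimal sketch of the behavior being asked for: abort when the safe-mode check itself fails, instead of only logging. It is not the attached patch. The names SplitLogAbortSketch, Abortable, IoStep and RecordingMaster are hypothetical stand-ins and not HBase classes; waitOnSafeMode and the split are passed in as plain callbacks so the example runs without HDFS.

```java
import java.io.IOException;

public class SplitLogAbortSketch {

    /** Hypothetical stand-in for the master's abort hook (not the HBase interface). */
    public interface Abortable {
        void abort(String why, Throwable cause);
    }

    /** Hypothetical stand-in for any step that may fail with an IOException. */
    public interface IoStep {
        void run() throws IOException;
    }

    /** Records whether abort() was called, so the behavior can be observed. */
    public static class RecordingMaster implements Abortable {
        public boolean aborted = false;
        public String reason = null;

        @Override
        public void abort(String why, Throwable cause) {
            aborted = true;
            reason = why;
        }
    }

    /**
     * Runs the safe-mode check and then the split.
     * Returns true only if the split ran to completion.
     */
    public static boolean splitLog(IoStep waitOnSafeMode, IoStep split, Abortable master) {
        try {
            waitOnSafeMode.run();
        } catch (IOException e) {
            // Assumption: if the file system cannot even be queried, the split
            // cannot run, so abort rather than continue with unsplit logs.
            master.abort("Could not check HDFS safe mode before log split", e);
            return false;
        }
        try {
            split.run();
            return true;
        } catch (IOException e) {
            // Mirrors the existing handling quoted above: log and carry on.
            System.err.println("Failed splitting: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        RecordingMaster master = new RecordingMaster();
        boolean done = splitLog(
            () -> { throw new IOException("HDFS is down"); },
            () -> System.out.println("splitting logs"),
            master);
        System.out.println("split done=" + done + ", master aborted=" + master.aborted);
    }
}
```

 In this sketch the split callback is never invoked once the safe-mode check fails, so the caller cannot go on to assign regions as if the logs had been replayed.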

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5137) MasterFileSystem.splitLog() should abort even if waitOnSafeMode() throws IOException

2012-01-07 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5137:
--

Attachment: HBASE-5137.patch

 MasterFileSystem.splitLog() should abort even if waitOnSafeMode() throws 
 IOException
 

 Key: HBASE-5137
 URL: https://issues.apache.org/jira/browse/HBASE-5137
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Attachments: HBASE-5137.patch






[jira] [Updated] (HBASE-5137) MasterFileSystem.splitLog() should abort even if waitOnSafeMode() throws IOException

2012-01-07 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5137:
--

Attachment: 5137-trunk.txt

Suggested patch for TRUNK.

 MasterFileSystem.splitLog() should abort even if waitOnSafeMode() throws 
 IOException
 

 Key: HBASE-5137
 URL: https://issues.apache.org/jira/browse/HBASE-5137
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Attachments: 5137-trunk.txt, HBASE-5137.patch






[jira] [Updated] (HBASE-5137) MasterFileSystem.splitLog() should abort even if waitOnSafeMode() throws IOException

2012-01-07 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5137:
--

Attachment: HBASE-5137.patch

Patch for 0.90 addressing Ted's comment about adding braces.  It does not 
handle the interrupted exception, though.
@Ted
Please check if it is OK.

 MasterFileSystem.splitLog() should abort even if waitOnSafeMode() throws 
 IOException
 

 Key: HBASE-5137
 URL: https://issues.apache.org/jira/browse/HBASE-5137
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Attachments: 5137-trunk.txt, HBASE-5137.patch, HBASE-5137.patch

