[ https://issues.apache.org/jira/browse/HBASE-7836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jeffrey Zhong updated HBASE-7836: --------------------------------- Attachment: hbase-7836_v2.patch {quote} testWorkerAbort(org.apache.hadoop.hbase.master.TestDistributedLogSplitting): none of the following counters went up in 80000 milliseconds - tot_wkr_task_resigned, tot_wkr_task_err, tot_wkr_final_transition_failed, tot_wkr_task_done, tot_wkr_preempt_task {quote} This is due to we don't handle FSHDFSUtils.recoverFileLease java.nio.channels.ClosedByInterruptException. When we get this exception, we still call {code}FSDataOutputStream out = fs.append(p);{code}, that causes one extra min wait and then fails the test case due to timeout. Below are related log traces: {code} 2013-04-01 14:45:04,735 DEBUG [SplitLogWorker-10.11.2.103,58161,1364852631051] util.FSHDFSUtils(95): Failed fs.recoverLease invocation, java.io.IOException: Call to localhost/127.0.0.1:58147 failed on local exception: java.nio.channels.ClosedByInterruptException, trying fs.append instead 2013-04-01 14:45:04,735 DEBUG [SplitLogWorker-10.11.2.103,58161,1364852631051] util.FSHDFSUtils(100): trying fs.append for hdfs://localhost:58147/user/jzhong/hbase/.logs/10.11.2.103,58161,1364852631051/10.11.2.103%2C58161%2C1364852631051.1364852632043 with java.io.IOException: Call to localhost/127.0.0.1:58147 failed on local exception: java.nio.channels.ClosedByInterruptException ... 2013-04-01 14:46:04,737 WARN [SplitLogWorker-10.11.2.103,58161,1364852631051] regionserver.SplitLogWorker$1(124): log splitting of hdfs://localhost:58147/user/jzhong/hbase/.logs/10.11.2.103,58161,1364852631051/10.11.2.103%2C58161%2C1364852631051.1364852632043 failed, returning error java.io.IOException: Failed to open hdfs://localhost:58147/user/jzhong/hbase/.logs/10.11.2.103,58161,1364852631051/10.11.2.103%2C58161%2C1364852631051.1364852632043 for append at org.apache.hadoop.hbase.util.FSHDFSUtils.recoverFileLease(FSHDFSUtils.java:126) at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.getReader(HLogSplitter.java:743) at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:436) at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:397) at org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:111) at org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:274) at org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:195) at org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:162) at java.lang.Thread.run(Thread.java:680) Caused by: java.io.IOException: Call to localhost/127.0.0.1:58147 failed on local exception: java.nio.channels.ClosedByInterruptException at org.apache.hadoop.ipc.Client.wrapException(Client.java:1144) at org.apache.hadoop.ipc.Client.call(Client.java:1112) at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229) …. {code} I fixed other test failures in the new patch. Thanks, -Jeffrey > Create a new "replay" command so that recovered edits won't mess up normal > coprocessing & metrics > ------------------------------------------------------------------------------------------------- > > Key: HBASE-7836 > URL: https://issues.apache.org/jira/browse/HBASE-7836 > Project: HBase > Issue Type: Sub-task > Reporter: Jeffrey Zhong > Assignee: Jeffrey Zhong > Fix For: 0.95.0 > > Attachments: hbase-7836_v1.patch, hbase-7836_v2.patch > > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira