[
https://issues.apache.org/jira/browse/HBASE-6416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jie Huang updated HBASE-6416:
-----------------------------
Attachment: hbase-6416.patch
Yes. The problem can be reproduced in my test environment. And here attaches
one fix for this bug to solve that *hanging around* issue.
According to the jstack result, the most suspicious threads are those
"pool-1-thread-*" threads (which are not daemon). I guess it is caused by the
following ExecutionException, which leads to an incomplete FutureTask.
{code}
for(int i=0; i<hbiFutures.size(); i++) {
WorkItemHdfsRegionInfo work = hbis.get(i);
Future<Void> f = hbiFutures.get(i);
try {
f.get();
} catch(ExecutionException e) {
LOG.warn("Failed to read .regioninfo file for region " +
work.hbi.getRegionNameAsString(), e.getCause());
}
}
{code}
The simple and quick fix is to shutdown the ExecutorService finally in exec()
function.
Now, in my test environment, the process will be terminated with that NPE after
applied this patch.
> hbck dies on NPE when a region folder exists but the table does not
> -------------------------------------------------------------------
>
> Key: HBASE-6416
> URL: https://issues.apache.org/jira/browse/HBASE-6416
> Project: HBase
> Issue Type: Bug
> Reporter: Jean-Daniel Cryans
> Fix For: 0.96.0, 0.94.3
>
> Attachments: hbase-6416.patch
>
>
> This is what I'm getting for leftover data that has no .regioninfo
> First:
> {quote}
> 12/07/17 23:13:37 WARN util.HBaseFsck: Failed to read .regioninfo file for
> region null
> java.io.FileNotFoundException: File does not exist:
> /hbase/stumble_info_urlid_user/bd5f6cfed674389b4d7b8c1be227cb46/.regioninfo
> at
> org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1822)
> at
> org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1813)
> at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:544)
> at
> org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:187)
> at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:456)
> at
> org.apache.hadoop.hbase.util.HBaseFsck.loadHdfsRegioninfo(HBaseFsck.java:611)
> at
> org.apache.hadoop.hbase.util.HBaseFsck.access$2200(HBaseFsck.java:140)
> at
> org.apache.hadoop.hbase.util.HBaseFsck$WorkItemHdfsRegionInfo.call(HBaseFsck.java:2882)
> at
> org.apache.hadoop.hbase.util.HBaseFsck$WorkItemHdfsRegionInfo.call(HBaseFsck.java:2866)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> {quote}
> Then it hangs on:
> {quote}
> 12/07/17 23:13:39 INFO util.HBaseFsck: Attempting to handle orphan hdfs dir:
> hdfs://sfor3s24:10101/hbase/stumble_info_urlid_user/bd5f6cfed674389b4d7b8c1be227cb46
> 12/07/17 23:13:39 INFO util.HBaseFsck: checking orphan for table null
> Exception in thread "main" java.lang.NullPointerException
> at
> org.apache.hadoop.hbase.util.HBaseFsck$TableInfo.access$100(HBaseFsck.java:1634)
> at
> org.apache.hadoop.hbase.util.HBaseFsck.adoptHdfsOrphan(HBaseFsck.java:435)
> at
> org.apache.hadoop.hbase.util.HBaseFsck.adoptHdfsOrphans(HBaseFsck.java:408)
> at
> org.apache.hadoop.hbase.util.HBaseFsck.restoreHdfsIntegrity(HBaseFsck.java:529)
> at
> org.apache.hadoop.hbase.util.HBaseFsck.offlineHdfsIntegrityRepair(HBaseFsck.java:313)
> at org.apache.hadoop.hbase.util.HBaseFsck.onlineHbck(HBaseFsck.java:386)
> at org.apache.hadoop.hbase.util.HBaseFsck.main(HBaseFsck.java:3227)
> {quote}
> The NPE is sent by:
> {code}
> Preconditions.checkNotNull("Table " + tableName + "' not present!",
> tableInfo);
> {code}
> I wonder why the condition checking was added if we don't handle it... In any
> case hbck dies but it hangs because there are some non-daemon hanging around.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira