[ https://issues.apache.org/jira/browse/HBASE-18309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16294049#comment-16294049 ]
Chia-Ping Tsai commented on HBASE-18309:
----------------------------------------

I observed the NPE in the log.
{code}
2017-12-17 08:53:01,584 INFO  [6ff31ba4b7ce,35583,1513500588019_Chore_1] hbase.ScheduledChore(181): Chore: ReplicationMetaCleaner was stopped
Exception in thread "OldWALsCleaner-1" Exception in thread "OldWALsCleaner-0" java.lang.NullPointerException
	at org.apache.hadoop.hbase.master.cleaner.LogCleaner.deleteFile(LogCleaner.java:166)
	at org.apache.hadoop.hbase.master.cleaner.LogCleaner.lambda$createOldWalsCleaner$0(LogCleaner.java:127)
	at java.lang.Thread.run(Thread.java:748)
java.lang.NullPointerException
	at org.apache.hadoop.hbase.master.cleaner.LogCleaner.deleteFile(LogCleaner.java:166)
	at org.apache.hadoop.hbase.master.cleaner.LogCleaner.lambda$createOldWalsCleaner$0(LogCleaner.java:127)
	at java.lang.Thread.run(Thread.java:748)
{code}
If the thread is interrupted while blocked in {{pendingDelete.take()}}, {{context}} is still null, so the {{context.setResult(succeed)}} call in the finally block throws the NPE.
{code}
while (true) {
  CleanerContext context = null;
  boolean succeed = false;
  boolean interrupted = false;
  try {
    context = pendingDelete.take();
    if (context != null) {
      FileStatus toClean = context.getTargetToClean();
      succeed = this.fs.delete(toClean.getPath(), false);
    }
  } catch (InterruptedException ite) {
    // It's most likely from configuration changing request
    if (context != null) {
      LOG.warn("Interrupted while cleaning oldWALs " + context.getTargetToClean()
          + ", try to clean it next round.");
    }
    interrupted = true;
  } catch (IOException e) {
    // fs.delete() fails.
    LOG.warn("Failed to clean oldwals with exception: " + e);
    succeed = false;
  } finally {
    context.setResult(succeed); // NPE here: context may still be null
    if (interrupted) {
      // Restore interrupt status
      Thread.currentThread().interrupt();
      break;
    }
  }
}
{code}

> Support multi threads in CleanerChore
> -------------------------------------
>
>                 Key: HBASE-18309
>                 URL: https://issues.apache.org/jira/browse/HBASE-18309
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: binlijin
>            Assignee: Reid Chan
>             Fix For: 3.0.0, 2.0.0-beta-1
>
>         Attachments: HBASE-18309.master.001.patch,
> HBASE-18309.master.002.patch, HBASE-18309.master.004.patch,
> HBASE-18309.master.005.patch, HBASE-18309.master.006.patch,
> HBASE-18309.master.007.patch, HBASE-18309.master.008.patch,
> HBASE-18309.master.009.patch, HBASE-18309.master.010.patch,
> HBASE-18309.master.011.patch, HBASE-18309.master.012.patch,
> space_consumption_in_archive.png
>
>
> There is only one thread in LogCleaner to clean oldWALs, and in our big
> cluster we found this was not enough. The number of files under oldWALs
> reached the max-directory-items limit of HDFS and caused region servers to
> crash, so we used multiple threads for the LogCleaner and the crash has not
> happened any more.
> What's more, currently there is only one thread iterating the archive
> directory; we could use multiple threads to clean sub-directories in
> parallel to speed it up.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
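An obvious guard is to call {{setResult}} only when a context was actually taken from the queue. Below is a minimal standalone sketch of that pattern, using a simplified stand-in for {{CleanerContext}} and a simulated delete instead of {{fs.delete()}}; the names {{OldWalsCleanerSketch}} and {{runOnce}} are hypothetical and the actual patch for HBASE-18309 may fix it differently.

```java
import java.util.concurrent.LinkedBlockingQueue;

public class OldWalsCleanerSketch {
    // Hypothetical stand-in for the real CleanerContext.
    public static class CleanerContext {
        private volatile boolean result;
        public void setResult(boolean r) { this.result = r; }
        public boolean getResult() { return result; }
    }

    // One iteration of the cleaner loop; simulateDeleteSuccess
    // stands in for the real fs.delete(...) call.
    public static boolean runOnce(LinkedBlockingQueue<CleanerContext> pendingDelete,
                                  boolean simulateDeleteSuccess) {
        CleanerContext context = null;
        boolean succeed = false;
        boolean interrupted = false;
        try {
            context = pendingDelete.take();
            succeed = simulateDeleteSuccess;
        } catch (InterruptedException ite) {
            // take() was interrupted; context may never have been assigned.
            interrupted = true;
        } finally {
            // Null guard: only report a result if we actually took a context.
            if (context != null) {
                context.setResult(succeed);
            }
            if (interrupted) {
                // Restore interrupt status.
                Thread.currentThread().interrupt();
            }
        }
        return succeed;
    }

    public static void main(String[] args) {
        LinkedBlockingQueue<CleanerContext> q = new LinkedBlockingQueue<>();
        CleanerContext c = new CleanerContext();
        q.offer(c);
        System.out.println(runOnce(q, true));   // prints true
        System.out.println(c.getResult());      // prints true
        // Interrupted case: empty queue, interrupt flag set. take() throws
        // immediately, context stays null, and no NPE occurs.
        Thread.currentThread().interrupt();
        System.out.println(runOnce(q, false));  // prints false
        System.out.println(Thread.interrupted()); // prints true (flag was restored)
    }
}
```

Note that the interrupt status is restored in the finally block either way, so a caller (or the thread pool) can still observe the interruption after the null-guarded cleanup.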