[ https://issues.apache.org/jira/browse/HBASE-18309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16294049#comment-16294049 ]

Chia-Ping Tsai commented on HBASE-18309:
----------------------------------------

I observed the NPE in the log.
{code}
2017-12-17 08:53:01,584 INFO  [6ff31ba4b7ce,35583,1513500588019_Chore_1] hbase.ScheduledChore(181): Chore: ReplicationMetaCleaner was stopped
Exception in thread "OldWALsCleaner-1" Exception in thread "OldWALsCleaner-0" java.lang.NullPointerException
        at org.apache.hadoop.hbase.master.cleaner.LogCleaner.deleteFile(LogCleaner.java:166)
        at org.apache.hadoop.hbase.master.cleaner.LogCleaner.lambda$createOldWalsCleaner$0(LogCleaner.java:127)
        at java.lang.Thread.run(Thread.java:748)
java.lang.NullPointerException
        at org.apache.hadoop.hbase.master.cleaner.LogCleaner.deleteFile(LogCleaner.java:166)
        at org.apache.hadoop.hbase.master.cleaner.LogCleaner.lambda$createOldWalsCleaner$0(LogCleaner.java:127)
        at java.lang.Thread.run(Thread.java:748)
{code}

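If the thread is interrupted while it is blocked in pendingDelete.take(), context is still null when the finally block runs, so context.setResult(succeed) throws the NPE. The context != null checks in the try and catch blocks don't help: take() never returns null, and when it throws, context has not been assigned yet.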
{code}
    while (true) {
      CleanerContext context = null;
      boolean succeed = false;
      boolean interrupted = false;
      try {
        context = pendingDelete.take();
        if (context != null) {
          FileStatus toClean = context.getTargetToClean();
          succeed = this.fs.delete(toClean.getPath(), false);
        }
      } catch (InterruptedException ite) {
        // It's most likely from configuration changing request
        if (context != null) {
          LOG.warn("Interrupted while cleaning oldWALs " +
              context.getTargetToClean() + ", try to clean it next round.");
        }
        interrupted = true;
      } catch (IOException e) {
        // fs.delete() fails.
        LOG.warn("Failed to clean oldwals with exception: " + e);
        succeed = false;
      } finally {
        context.setResult(succeed);  // NPE here: context is still null if take() was interrupted
        if (interrupted) {
          // Restore interrupt status
          Thread.currentThread().interrupt();
          break;
        }
      }
    }
{code}
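For reference, a minimal sketch of the obvious fix: only call setResult in finally when a context was actually dequeued. This is a simplified, self-contained stand-in, not the actual LogCleaner code. CleanerContext, the Deleter interface (standing in for fs.delete(path, false)) and runDemo() are hypothetical, and the queue carries plain path strings instead of FileStatus objects.

{code:java}
import java.io.IOException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicReference;

public class OldWalsCleanerSketch {

  /** Hypothetical stand-in for fs.delete(path, false). */
  @FunctionalInterface
  interface Deleter {
    boolean delete(String path) throws IOException;
  }

  /** Hypothetical, simplified stand-in for LogCleaner's CleanerContext. */
  static final class CleanerContext {
    private final String target;
    private volatile Boolean result; // null until a worker reports back

    CleanerContext(String target) { this.target = target; }
    String getTargetToClean() { return target; }
    void setResult(boolean succeed) { this.result = succeed; }
    Boolean getResult() { return result; }
  }

  /** Worker loop with the null guard; exits cleanly when interrupted. */
  static void runCleaner(BlockingQueue<CleanerContext> pendingDelete, Deleter fs) {
    while (true) {
      CleanerContext context = null;
      boolean succeed = false;
      boolean interrupted = false;
      try {
        // take() blocks and never returns null; if it is interrupted,
        // it throws before context is assigned.
        context = pendingDelete.take();
        succeed = fs.delete(context.getTargetToClean());
      } catch (InterruptedException ie) {
        interrupted = true;
      } catch (IOException e) {
        System.err.println("Failed to clean oldWALs with exception: " + e);
        succeed = false;
      } finally {
        // Only report a result for a context that was actually dequeued.
        if (context != null) {
          context.setResult(succeed);
        }
      }
      if (interrupted) {
        Thread.currentThread().interrupt(); // restore interrupt status
        break;
      }
    }
  }

  /** Feeds one worker a deletable file and a failing one, then interrupts it. */
  static String runDemo() throws InterruptedException {
    BlockingQueue<CleanerContext> queue = new LinkedBlockingQueue<>();
    AtomicReference<Throwable> uncaught = new AtomicReference<>();
    Thread worker = new Thread(() -> runCleaner(queue, path -> {
      if (path.endsWith("b")) {
        throw new IOException("simulated delete failure");
      }
      return true;
    }), "OldWALsCleaner-0");
    worker.setUncaughtExceptionHandler((t, e) -> uncaught.set(e));
    worker.start();

    CleanerContext a = new CleanerContext("/oldWALs/a");
    CleanerContext b = new CleanerContext("/oldWALs/b");
    queue.put(a);
    queue.put(b);
    long deadline = System.currentTimeMillis() + 5_000;
    while ((a.getResult() == null || b.getResult() == null)
        && System.currentTimeMillis() < deadline) {
      Thread.sleep(10);
    }
    // The worker is idle in (or about to re-enter) take(): the case that used to NPE.
    worker.interrupt();
    worker.join(5_000);
    return "a=" + a.getResult() + " b=" + b.getResult()
        + " alive=" + worker.isAlive() + " uncaught=" + uncaught.get();
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println(runDemo());
  }
}
{code}

Running main() should print a=true b=false alive=false uncaught=null: the failed delete is reported as false, and the interrupt ends the worker cleanly instead of killing it with an NPE. The sketch also moves the interrupt handling out of finally, since a break inside finally would silently discard any exception still in flight from the try block.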

> Support multi threads in CleanerChore
> -------------------------------------
>
>                 Key: HBASE-18309
>                 URL: https://issues.apache.org/jira/browse/HBASE-18309
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: binlijin
>            Assignee: Reid Chan
>             Fix For: 3.0.0, 2.0.0-beta-1
>
>         Attachments: HBASE-18309.master.001.patch, 
> HBASE-18309.master.002.patch, HBASE-18309.master.004.patch, 
> HBASE-18309.master.005.patch, HBASE-18309.master.006.patch, 
> HBASE-18309.master.007.patch, HBASE-18309.master.008.patch, 
> HBASE-18309.master.009.patch, HBASE-18309.master.010.patch, 
> HBASE-18309.master.011.patch, HBASE-18309.master.012.patch, 
> space_consumption_in_archive.png
>
>
> There is only one thread in LogCleaner to clean oldWALs, and in our big
> cluster we found this is not enough. The number of files under oldWALs
> reached the max-directory-items limit of HDFS and caused region servers to
> crash, so we used multiple threads for LogCleaner and the crash has not
> happened since.
> What's more, currently there's only one thread iterating the archive
> directory; we could use multiple threads to clean subdirectories in
> parallel to speed it up.
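A minimal sketch of the second idea (cleaning subdirectories of the archive in parallel) using a ForkJoinPool. It is an illustration under assumptions, not the patch's code: it works on the local filesystem through java.nio.file instead of HDFS, and CleanDirTask, the deletable predicate (standing in for the cleaner delegates' decision) and the ".old" naming used in runDemo() are all hypothetical.

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ParallelDirCleanerSketch {

  /** Cleans one directory; returns true if everything under it was deleted. */
  static final class CleanDirTask extends RecursiveTask<Boolean> {
    private final Path dir;
    private final Predicate<Path> deletable; // hypothetical stand-in for the cleaner delegates

    CleanDirTask(Path dir, Predicate<Path> deletable) {
      this.dir = dir;
      this.deletable = deletable;
    }

    @Override
    protected Boolean compute() {
      List<Path> entries;
      try (Stream<Path> s = Files.list(dir)) {
        entries = s.collect(Collectors.toList());
      } catch (IOException e) {
        return false; // could not list: treat as not cleaned
      }
      // Fork one subtask per subdirectory so siblings are cleaned in parallel.
      List<CleanDirTask> subtasks = new ArrayList<>();
      for (Path p : entries) {
        if (Files.isDirectory(p)) {
          CleanDirTask t = new CleanDirTask(p, deletable);
          t.fork();
          subtasks.add(t);
        }
      }
      boolean allDeleted = true;
      // Meanwhile, delete this directory's own files on the current thread.
      for (Path p : entries) {
        if (Files.isDirectory(p)) {
          continue;
        }
        try {
          if (deletable.test(p)) {
            Files.delete(p);
          } else {
            allDeleted = false; // file is still needed: keep it
          }
        } catch (IOException e) {
          allDeleted = false;
        }
      }
      // Join the subtasks; remove a subdirectory only if it was fully emptied.
      for (CleanDirTask t : subtasks) {
        if (!t.join()) {
          allDeleted = false;
          continue;
        }
        try {
          Files.delete(t.dir);
        } catch (IOException e) {
          allDeleted = false;
        }
      }
      return allDeleted;
    }
  }

  static String runDemo() throws IOException {
    Path root = Files.createTempDirectory("oldWALs");
    Path sub1 = Files.createDirectories(root.resolve("sub1"));
    Path sub2 = Files.createDirectories(root.resolve("sub2"));
    Path sub3 = Files.createDirectories(sub2.resolve("sub3"));
    Files.createFile(sub1.resolve("a.old"));
    Files.createFile(sub1.resolve("b.keep"));
    Files.createFile(sub2.resolve("c.old"));
    Files.createFile(sub3.resolve("d.old"));

    ForkJoinPool pool = new ForkJoinPool(4);
    boolean fullyCleaned;
    try {
      fullyCleaned = pool.invoke(
          new CleanDirTask(root, p -> p.getFileName().toString().endsWith(".old")));
    } finally {
      pool.shutdown();
    }
    // The root itself is never deleted; only its contents are cleaned.
    return "fullyCleaned=" + fullyCleaned
        + " a.old=" + Files.exists(sub1.resolve("a.old"))
        + " b.keep=" + Files.exists(sub1.resolve("b.keep"))
        + " sub2=" + Files.exists(sub2);
  }

  public static void main(String[] args) throws IOException {
    System.out.println(runDemo());
  }
}
{code}

main() should print fullyCleaned=false a.old=false b.keep=true sub2=false: sub2 (and its nested sub3) is emptied and removed, while sub1 is kept because b.keep is not deletable. Each task forks its subdirectories first and deletes its own files while those subtasks run.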



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
