[ 
https://issues.apache.org/jira/browse/HBASE-12533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14219087#comment-14219087
 ] 

Jeffrey Zhong commented on HBASE-12533:
---------------------------------------

I think I found the root cause of this issue, and I think it is a serious one. 
Below is the culprit:

{code}
  public String prepareBulkLoad(final TableName tableName) throws IOException {
    try {
      return table.coprocessorService(SecureBulkLoadProtos.SecureBulkLoadService.class,
          EMPTY_START_ROW,
          LAST_ROW,
 ...
{code}

prepareBulkLoad is fired against every data region (the call spans 
EMPTY_START_ROW through LAST_ROW), so it creates as many staging folders as 
the bulk-loaded table has regions, while we only use the first one.

That is why so many staging folders are left behind. 
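
To make the fan-out concrete, here is a small self-contained model of it. This 
is not HBase code: StagingFanOutSketch, prepareOnRegion and the random 
directory names are hypothetical stand-ins. The only assumption is that each 
region's endpoint creates its own staging directory when it receives a prepare 
request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

public class StagingFanOutSketch {
    // Hypothetical stand-in for the staging root: records every directory "created".
    static final List<String> stagingDirs = new ArrayList<>();

    // Hypothetical model of one region's endpoint handling a prepare request.
    // Assumption: each request creates a fresh staging directory.
    static String prepareOnRegion() {
        String dir = "/tmp/hbase-staging/" + UUID.randomUUID();
        stagingDirs.add(dir);
        return dir;
    }

    // Models a call spanning the whole key range: every region gets the request,
    // but the caller keeps only the first result.
    static String prepareOnAllRegions(int regionCount) {
        String first = null;
        for (int i = 0; i < regionCount; i++) {
            String dir = prepareOnRegion();
            if (first == null) {
                first = dir;
            }
        }
        return first;
    }

    // Models a call restricted to a single region: one request, one directory.
    static String prepareOnOneRegion() {
        return prepareOnRegion();
    }

    public static void main(String[] args) {
        int regionCount = 100;
        prepareOnAllRegions(regionCount);
        System.out.println("all regions: created=" + stagingDirs.size()
            + " unused=" + (stagingDirs.size() - 1));

        stagingDirs.clear();
        prepareOnOneRegion();
        System.out.println("one region:  created=" + stagingDirs.size()
            + " unused=" + (stagingDirs.size() - 1));
    }
}
```

In this model a table with 100 regions leaves 99 unused staging directories 
per bulk load, while a request sent to a single region leaves none.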

There are also a couple of bugs in SecureBulkLoadEndpoint#cleanupBulkLoad: 
1) it fires the same request to all data regions; 2) it first tries to create 
an already existing folder and then deletes it. Both cause many unnecessary 
NameNode (NN) operations.


> staging directories are not deleted after secure bulk load
> ----------------------------------------------------------
>
>                 Key: HBASE-12533
>                 URL: https://issues.apache.org/jira/browse/HBASE-12533
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.98.6
>         Environment: CDH5.2 + Kerberos
>            Reporter: Andrejs Dubovskis
>            Assignee: Jeffrey Zhong
>
> We use secure bulk load heavily in our environment, and it worked without 
> problems for some time. But last week I found that clients hang while 
> calling *doBulkLoad*.
> After some investigation I found that HDFS holds more than 1,000,000 
> directories in the /tmp/hbase-staging directory.
> When the directory's contents were purged, the load process ran successfully.
> According to the [hbase 
> book|http://hbase.apache.org/book/ch08s03.html#hbase.secure.bulkload] 
> {code}
> HBase manages creation and deletion of this directory.
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
