[ https://issues.apache.org/jira/browse/HBASE-18084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16018801#comment-16018801 ]
Yu Li edited comment on HBASE-18084 at 5/21/17 12:41 PM:
---------------------------------------------------------

bq. if the initial batch contains large directory

But what if not, sir? Let me say more about my case. The current cleaning logic uses a depth-first algorithm, while the archive dir hierarchy looks like:
{noformat}
/hbase/archive/data
  - namespace
    - table
      - region
        - CF
          - files
{noformat}
By the time we reach one leaf directory, get its file list, and clean it, flushing is still ongoing, and the new files will only be included when we iterate the other directories later. In our case the output of "hadoop fs -count", ordered by space usage (descending), looks like:
{noformat}
        2043       686999     770527133663895 /hbase/archive/data/default/pora_6_feature_queue
        2049      3430815     470358930247550 /hbase/archive/data/default/pora_6_feature
       17101       704476     100740814980772 /hbase/archive/data/default/mainv3_ic
       14251       495293      79161730247206 /hbase/archive/data/default/mainv3_main_result_b
       14251       893144      71121202187220 /hbase/archive/data/default/mainv3_main_result_a
        2045        79223      51098022268522 /hbase/archive/data/default/pora_log_wireless_search_item_pv_queue
        2001       123332      49075201291122 /hbase/archive/data/default/mainv3_main_askr_queue_a
        2001        65030      45649351359151 /hbase/archive/data/default/mainv3_main_askr_queue_b
{noformat}
And we have many small directories like:
{noformat}
          13            6              173403 /hbase/archive/data/default/b2b-et2mainse_tisplus_tisplus_IdleFishPool_askr
           3            1              253497 /hbase/archive/data/default/b2b-et2mainse_tisplus_tisplus_buyoffer_searcher_askr
          17           17            15635421 /hbase/archive/data/default/b2b-et2mainse_tisplus_tisplus_cloud_wukuang_askr
          13            6            56062313 /hbase/archive/data/default/b2b-et2mainse_tisplus_tisplus_common_search_askr
           5            2             1165298 /hbase/archive/data/default/b2b-et2mainse_tisplus_tisplus_company_askr
          11            9             1196774 /hbase/archive/data/default/b2b-et2mainse_tisplus_tisplus_content_search_askr
{noformat}
So the largest 3 directories take 1.3PB while the whole archive directory takes 1.8PB, and the largest
directory names start with "p". If we used the greedy algorithm, we might choose {{mainv3_main_askr_queue_a}}, which has 123k files, to clean, while {{pora_6_feature_queue}} is still being flushed into at speed. In the worst case we cannot reach the largest dir for a long time. I agree that this depends on the real case, but in our case the simple method in the current patch works well, while I'm not sure whether the newly suggested approach will do (smile). Since the patch here is already applied online, how about letting it in and opening another JIRA to implement and verify the new approach with the greedy algo? [~tedyu]
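The ordering difference discussed above can be shown with a small self-contained sketch. The class and method names below are hypothetical (this is not HBase code); the byte counts are taken from the "hadoop fs -count" output quoted in the comment, largest four directories only. Dictionary order visits the {{mainv3_*}} directories before the two huge {{pora_*}} ones, while space-descending order, as the patch proposes, reaches the biggest consumers first:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class, not HBase code: contrasts the current
// dictionary-order traversal with the proposed space-descending one.
public class CleanOrderDemo {
  // Space consumed per archive subdirectory (bytes), from the
  // "hadoop fs -count" output quoted above (largest four only).
  static final Map<String, Long> DIR_SPACE = new LinkedHashMap<>();
  static {
    DIR_SPACE.put("pora_6_feature_queue", 770527133663895L);
    DIR_SPACE.put("pora_6_feature",       470358930247550L);
    DIR_SPACE.put("mainv3_ic",            100740814980772L);
    DIR_SPACE.put("mainv3_main_result_b",  79161730247206L);
  }

  // Current behaviour: subdirectories are visited in dictionary order.
  static List<String> dictionaryOrder() {
    List<String> dirs = new ArrayList<>(DIR_SPACE.keySet());
    Collections.sort(dirs);
    return dirs;
  }

  // Proposed behaviour: largest space consumers are visited first.
  static List<String> spaceDescendingOrder() {
    List<String> dirs = new ArrayList<>(DIR_SPACE.keySet());
    dirs.sort(Comparator.comparingLong(DIR_SPACE::get).reversed());
    return dirs;
  }

  public static void main(String[] args) {
    System.out.println("dictionary order:       " + dictionaryOrder());
    System.out.println("space-descending order: " + spaceDescendingOrder());
  }
}
```

Note that a real implementation would have to measure usage itself (e.g. via an aggregate subtree walk, which is what `hadoop fs -count` does), and that measurement is stale the moment flushes continue, which is part of the trade-off debated here.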
> Improve CleanerChore to clean from directory which consumes more disk space
> ---------------------------------------------------------------------------
>
>                 Key: HBASE-18084
>                 URL: https://issues.apache.org/jira/browse/HBASE-18084
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Yu Li
>            Assignee: Yu Li
>         Attachments: HBASE-18084.patch, HBASE-18084.v2.patch
>
> Currently CleanerChore cleans directories in dictionary order, rather than starting from the directory with the largest space usage. When data abnormally accumulates to some huge volume in the archive directory, the cleaning speed might not be enough.
> This proposal is another improvement, working together with HBASE-18083, to resolve our online issue (archive dir consumed more than 1.8PB SSD space)

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)