[jira] [Comment Edited] (HDFS-12506) Ozone: ListBucket is too slow

Nandakumar (JIRA) Wed, 20 Sep 2017 09:20:26 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-12506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16173430#comment-16173430
 ]


Nandakumar edited comment on HDFS-12506 at 9/20/17 4:19 PM:
------------------------------------------------------------

+1 for [~xyao]'s idea; I was thinking along the same lines.
One small change, though: prefix the volume or bucket name with '#'.
For a volume:
/#v1
For a bucket:
/v1/#b1
Keys can stay stored as they are now.

With this layout we can list volumes without iterating over 
buckets, and list buckets without iterating over keys.

Something like
{code}
/#v1
/#v2
/#v3
/v1/#b1
/v1/#b2
/v2/#b1
/v3/#b1
/v1/b1/k1
/v2/b2/k2
{code}
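
To illustrate why the '#' prefix helps, here is a minimal sketch, not the actual KSM code. It assumes the DB iterates keys in byte order (so '#' sorts before letters and digits) and that volume and bucket names never contain '#'. The sorted list {{DB}} and the helpers {{prefix_scan}}, {{list_volumes}} and {{list_buckets}} are hypothetical names made up for this example.

```python
import bisect

# Hypothetical in-memory stand-in for ksm.db: a sorted list of string keys,
# using the entries from the example above.
DB = sorted([
    "/#v1", "/#v2", "/#v3",
    "/v1/#b1", "/v1/#b2", "/v2/#b1", "/v3/#b1",
    "/v1/b1/k1", "/v2/b2/k2",
])


def prefix_scan(db, prefix):
    """Seek to prefix, then iterate until the first key that lacks it."""
    out = []
    i = bisect.bisect_left(db, prefix)
    while i < len(db) and db[i].startswith(prefix):
        out.append(db[i])
        i += 1
    return out


def list_volumes(db):
    # All volume entries share the "/#" prefix, so no bucket or key is visited.
    return [k[len("/#"):] for k in prefix_scan(db, "/#")]


def list_buckets(db, volume):
    # All buckets of a volume share "/<volume>/#", so no key is visited.
    prefix = "/" + volume + "/#"
    return [k[len(prefix):] for k in prefix_scan(db, prefix)]
```

Each listing visits only the entries it returns (plus one lookahead), so the cost depends on the number of volumes or buckets and not on the number of keys.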







> Ozone: ListBucket is too slow
> -----------------------------
>
>                 Key: HDFS-12506
>                 URL: https://issues.apache.org/jira/browse/HDFS-12506
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ozone
>            Reporter: Weiwei Yang
>            Priority: Blocker
>              Labels: ozoneMerge
>
> I generated 3 million keys in Ozone and ran the {{listBucket}} command to get a 
> list of buckets under a volume:
> {code}
> bin/hdfs oz -listBucket http://15oz1.fyre.ibm.com:9864/vol-0-15143 -user wwei
> {code}
> This call took over *15 seconds* to finish. The problem is caused by the 
> inflexible structure of the KSM DB. Right now {{ksm.db}} stores keys as 
> follows:
> {code}
> /v1/b1
> /v1/b1/k1
> /v1/b1/k2
> /v1/b1/k3
> /v1/b2
> /v1/b2/k1
> /v1/b2/k2
> /v1/b2/k3
> /v1/b3
> /v1/b4
> {code}
> Keys are sorted in natural order, so to list the buckets under a volume, e.g. 
> /v1, we need to seek to /v1 and then iterate and filter keys, which 
> ends up scanning all keys under volume /v1. The problem with this design 
> is that we have no efficient way to locate all buckets without scanning 
> the keys.
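
The scan described in the issue can be sketched roughly as below. This is only an illustration under assumptions, not the real KSM implementation: {{DB}} is a hypothetical sorted list standing in for {{ksm.db}} with the current layout, and {{list_buckets_by_scan}} is a made-up helper that also counts how many entries it has to visit.

```python
import bisect

# Hypothetical in-memory stand-in for ksm.db with the current layout,
# using the entries from the issue description.
DB = sorted([
    "/v1/b1", "/v1/b1/k1", "/v1/b1/k2", "/v1/b1/k3",
    "/v1/b2", "/v1/b2/k1", "/v1/b2/k2", "/v1/b2/k3",
    "/v1/b3", "/v1/b4",
])


def list_buckets_by_scan(db, volume):
    """Return (buckets, entries_visited) for the current layout."""
    prefix = "/" + volume + "/"
    buckets, visited = [], 0
    i = bisect.bisect_left(db, prefix)   # seek to the volume
    while i < len(db) and db[i].startswith(prefix):
        visited += 1
        rest = db[i][len(prefix):]
        if "/" not in rest:              # bucket entry, not a key
            buckets.append(rest)
        i += 1
    return buckets, visited
```

For the ten entries above, the scan visits all ten to find four buckets. With millions of keys under a volume, the filtered-out key entries dominate the cost.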



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

