[ https://issues.apache.org/jira/browse/HDFS-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hairong Kuang updated HDFS-985:
-------------------------------

    Attachment: iterativeLS_yahoo1.patch

This patch addressed Suresh's comments except for comments 3, 5, and 6:
1. rename lastReturnedName to startAfter;
2. rename PathPartialListing to DirectoryListing.
In addition, I defined the config property dfs.ls.limit and its default value
as constants, and added comments to DistributedFileSystem#listStatus explaining
that the operation is no longer atomic.

> HDFS should issue multiple RPCs for listing a large directory
> -------------------------------------------------------------
>
>                 Key: HDFS-985
>                 URL: https://issues.apache.org/jira/browse/HDFS-985
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.22.0
>
>         Attachments: iterativeLS_yahoo.patch, iterativeLS_yahoo1.patch
>
>
> Currently HDFS issues one RPC from the client to the NameNode for listing a
> directory. However, some directories are large, containing thousands or
> millions of items. Listing such a large directory in one RPC has a few
> shortcomings:
> 1. The list operation holds the global fsnamesystem lock for a long time, thus
> blocking other requests. If a large number (like thousands) of such list
> requests hit the NameNode in a short period of time, the NameNode will be
> significantly slowed down. Users end up noticing longer response times or lost
> connections to the NameNode.
> 2. The response message is uncontrollably big. We observed a response as big
> as 50 MB when listing a directory of 300 thousand items. Even with the
> optimization introduced in HDFS-946, which may cut the response by 20-50%,
> the response size will still be on the order of 10 MB.
> I propose to implement directory listing using multiple RPCs. Here is the
> plan:
> 1. Each getListing RPC has an upper limit on the number of items returned.
> This limit could be configurable, but I am thinking of setting it to a fixed
> number like 500.
> 2. Each RPC additionally specifies a start position for this listing request.
> I am thinking of using the last item of the previous listing RPC as an
> indicator. Since the NameNode stores all items in a directory as a sorted
> array, it can use the last item to locate the start item of this listing even
> if the last item was deleted between the two consecutive calls. This has the
> advantage of avoiding duplicate entries on the client side.
> 3. The return value additionally specifies whether the whole directory is done
> listing. If the client sees a false flag, it will continue to issue another
> RPC.
> This proposal will change the semantics of large directory listing in the
> sense that listing is no longer an atomic operation if a directory's content
> is changing while the listing operation is in progress.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
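The NameNode side of steps 1 to 3 of the proposal can be sketched as below. This is a minimal Python illustration under stated assumptions, not the code in iterativeLS_yahoo1.patch: the `DirectoryListing` dataclass, its `partial_listing` and `has_more` fields, and the `get_listing` signature are hypothetical; a plain sorted list stands in for the NameNode's sorted child array; and Python string ordering stands in for the NameNode's name comparison. The `has_more` flag is simply the inverse of the "done" flag described in step 3.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List


# Hypothetical stand-in for the value returned by one getListing RPC;
# field names are illustrative, not Hadoop's API.
@dataclass
class DirectoryListing:
    partial_listing: List[str]  # at most `limit` names, in sorted order
    has_more: bool              # False once the end of the directory is reached


def get_listing(children: List[str], start_after: str, limit: int) -> DirectoryListing:
    """Return up to `limit` names strictly greater than `start_after`.

    `children` must be sorted, mirroring the NameNode's sorted child array.
    The start position is found by binary search on the name itself, so the
    lookup still works if `start_after` was deleted between two calls.
    An empty `start_after` means "start from the first entry".
    """
    start = bisect_right(children, start_after)
    page = children[start:start + limit]
    return DirectoryListing(page, start + len(page) < len(children))


# Usage: page through a directory two names at a time.
names = ["a", "b", "c", "d", "e"]
first = get_listing(names, "", 2)        # ['a', 'b'], has_more=True
names.remove("b")                        # "b" is deleted between the two RPCs
second = get_listing(names, first.partial_listing[-1], 2)
print(second.partial_listing, second.has_more)  # ['c', 'd'] True
```

Because the next page starts at the first name greater than the last one returned, rather than at a numeric offset, deleting `b` between calls neither skips `c` nor returns `a` again.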
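The client side of step 3, which keeps issuing getListing RPCs until the directory is fully listed, might look like the following sketch. It is hypothetical Python, not DistributedFileSystem#listStatus: `fetch` stands for one getListing RPC to the NameNode, the `DirectoryListing` fields are illustrative, and the in-memory `fake_fetch` (limit of 2 names per call) exists only to make the example runnable.

```python
from dataclasses import dataclass
from typing import Callable, List


# Hypothetical result of one getListing RPC (illustrative field names).
@dataclass
class DirectoryListing:
    partial_listing: List[str]
    has_more: bool


# fetch(path, start_after) stands for a single getListing RPC.
Fetch = Callable[[str, str], DirectoryListing]


def list_status(fetch: Fetch, path: str) -> List[str]:
    """Collect a whole directory by repeating getListing until has_more is False.

    Not atomic: entries created or deleted between RPCs may or may not appear.
    """
    result: List[str] = []
    start_after = ""  # empty name: begin at the first entry
    while True:
        listing = fetch(path, start_after)
        result.extend(listing.partial_listing)
        # Stop when the server reports no more entries; also stop on an
        # empty page so a misbehaving server cannot make us loop forever.
        if not listing.has_more or not listing.partial_listing:
            return result
        # The last name returned becomes the start position of the next RPC.
        start_after = listing.partial_listing[-1]


# Demo with an in-memory stand-in for the NameNode.
children = sorted(["f3", "f1", "f5", "f2", "f4"])
calls: List[str] = []


def fake_fetch(path: str, start_after: str) -> DirectoryListing:
    calls.append(start_after)
    rest = [c for c in children if c > start_after]
    return DirectoryListing(rest[:2], len(rest) > 2)


print(list_status(fake_fetch, "/big/dir"))  # ['f1', 'f2', 'f3', 'f4', 'f5']
print(len(calls))                           # 3
```

Each call holds the (simulated) server for only one bounded page, which is the point of the proposal; the price, as the description notes, is that the assembled result is not an atomic snapshot of a directory that changes mid-listing.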