[jira] Updated: (HDFS-985) HDFS should issue multiple RPCs for listing a large directory
[ https://issues.apache.org/jira/browse/HDFS-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated HDFS-985:
-------------------------------
    Hadoop Flags: [Incompatible change, Reviewed]  (was: [Reviewed])

> HDFS should issue multiple RPCs for listing a large directory
> -------------------------------------------------------------
>
>                 Key: HDFS-985
>                 URL: https://issues.apache.org/jira/browse/HDFS-985
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.22.0
>
>         Attachments: directoryBrowse_0.20yahoo.patch, directoryBrowse_0.20yahoo_1.patch, directoryBrowse_0.20yahoo_2.patch, iterativeLS_trunk.patch, iterativeLS_trunk1.patch, iterativeLS_trunk2.patch, iterativeLS_trunk3.patch, iterativeLS_trunk4.patch, iterativeLS_yahoo.patch, iterativeLS_yahoo1.patch, testFileStatus.patch
>
> Currently HDFS issues one RPC from the client to the NameNode to list a directory. However, some directories are so large that they contain thousands or even millions of items. Listing such a directory in one RPC has a few shortcomings:
> 1. The list operation holds the global fsnamesystem lock for a long time, blocking other requests. If a large number (thousands) of such list requests hit the NameNode in a short period of time, the NameNode slows down significantly. Users end up seeing longer response times or lost connections to the NameNode.
> 2. The response message is uncontrollably big. We observed a response as large as 50 MB when listing a directory of 300 thousand items. Even with the optimization introduced in HDFS-946, which may cut the response by 20-50%, the response size would still be on the order of tens of megabytes.
> I propose to implement directory listing using multiple RPCs. Here is the plan:
> 1. Each getListing RPC has an upper limit on the number of items returned. This limit could be configurable, but I am thinking of setting it to a fixed number like 500.
> 2. Each RPC additionally specifies a start position for the listing request. I am thinking of using the last item of the previous listing RPC as the indicator. Since the NameNode stores all items in a directory as a sorted array, it can use that last item to locate the start of the next listing even if the item is deleted between the two consecutive calls. This has the advantage of avoiding duplicate entries on the client side.
> 3. The return value additionally indicates whether the whole directory has been listed. If the client sees a false flag, it continues to issue another RPC.
> This proposal changes the semantics of large directory listing: listing is no longer an atomic operation if the directory's contents change while the listing operation is in progress.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
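The three-point plan above amounts to cursor-based pagination over the NameNode's sorted child array. Below is a minimal illustrative sketch of how the client loop and the server-side resume-after-cursor lookup could fit together; all names here (`PartialListing`, `listPartial`, `listAll`, `LIST_LIMIT`) are hypothetical and stand in for the actual HDFS RPC types, and a `TreeSet` stands in for the NameNode's sorted array:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical stand-in for the per-RPC return value from the proposal:
// one page of entries plus a flag saying whether more remain.
class PartialListing {
    final List<String> entries;   // up to LIST_LIMIT names, sorted
    final boolean hasMore;        // false once the directory is exhausted
    PartialListing(List<String> entries, boolean hasMore) {
        this.entries = entries;
        this.hasMore = hasMore;
    }
}

class DirListing {
    static final int LIST_LIMIT = 500;  // fixed per-RPC cap from the proposal

    // Stand-in for the NameNode's sorted array of directory items.
    private final TreeSet<String> dir = new TreeSet<>();

    void add(String name) { dir.remove(name); dir.add(name); }
    void remove(String name) { dir.remove(name); }

    // "Server" side: return up to LIST_LIMIT entries strictly after startAfter.
    // Because the items are sorted, the resume point is found even if
    // startAfter itself was deleted between two calls, so the client never
    // sees duplicates.
    PartialListing listPartial(String startAfter) {
        List<String> page = new ArrayList<>();
        for (String name : dir.tailSet(startAfter, false)) {
            if (page.size() == LIST_LIMIT) {
                return new PartialListing(page, true);
            }
            page.add(name);
        }
        return new PartialListing(page, false);
    }

    // "Client" side: keep issuing RPCs until hasMore is false, resuming each
    // call from the last entry of the previous page.
    List<String> listAll() {
        List<String> all = new ArrayList<>();
        String cursor = "";  // empty string sorts before any entry name
        while (true) {
            PartialListing page = listPartial(cursor);
            all.addAll(page.entries);
            if (!page.hasMore) {
                return all;
            }
            cursor = page.entries.get(page.entries.size() - 1);
        }
    }
}
```

As the proposal notes, the result of `listAll` is not an atomic snapshot: entries added or removed behind the cursor during the loop may be missed or included, but no entry is ever returned twice.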
[jira] Updated: (HDFS-985) HDFS should issue multiple RPCs for listing a large directory
[ https://issues.apache.org/jira/browse/HDFS-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HDFS-985:
-------------------------------
      Resolution: Fixed
    Hadoop Flags: [Reviewed]
          Status: Resolved  (was: Patch Available)

I've just committed this.
[jira] Updated: (HDFS-985) HDFS should issue multiple RPCs for listing a large directory
[ https://issues.apache.org/jira/browse/HDFS-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HDFS-985:
-------------------------------
          Status: Patch Available  (was: Open)
[jira] Updated: (HDFS-985) HDFS should issue multiple RPCs for listing a large directory
[ https://issues.apache.org/jira/browse/HDFS-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HDFS-985:
-------------------------------
          Status: Open  (was: Patch Available)
[jira] Updated: (HDFS-985) HDFS should issue multiple RPCs for listing a large directory
[ https://issues.apache.org/jira/browse/HDFS-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HDFS-985:
-------------------------------
    Attachment: iterativeLS_trunk4.patch

This fixes a bug in the TestFileStatus unit test.
[jira] Updated: (HDFS-985) HDFS should issue multiple RPCs for listing a large directory
[ https://issues.apache.org/jira/browse/HDFS-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HDFS-985:
-------------------------------
    Attachment: directoryBrowse_0.20yahoo_2.patch

This patch is synced with the Yahoo 0.20 security branch.
[jira] Updated: (HDFS-985) HDFS should issue multiple RPCs for listing a large directory
[ https://issues.apache.org/jira/browse/HDFS-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HDFS-985:
-------------------------------
    Attachment: iterativeLS_trunk3.patch

The log4j change was not intended. This patch removes it.
[jira] Updated: (HDFS-985) HDFS should issue multiple RPCs for listing a large directory
[ https://issues.apache.org/jira/browse/HDFS-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HDFS-985:
-------------------------------
    Attachment: iterativeLS_trunk3.patch

This patch incorporates Suresh's review comments. It throws FileNotFoundException when the directory being listed is deleted, and it adds an aspectJ test for this case. Comment 3 needs to be fixed in MapReduce, which I will do later.
[jira] Updated: (HDFS-985) HDFS should issue multiple RPCs for listing a large directory
[ https://issues.apache.org/jira/browse/HDFS-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HDFS-985:
-------------------------------
          Status: Patch Available  (was: Open)
[jira] Updated: (HDFS-985) HDFS should issue multiple RPCs for listing a large directory
[ https://issues.apache.org/jira/browse/HDFS-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HDFS-985:
-------------------------------
    Attachment: directoryBrowse_0.20yahoo_1.patch

This patch fixes the indentation problem and returns null if the target directory is deleted before the full listing has been fetched.
[jira] Updated: (HDFS-985) HDFS should issue multiple RPCs for listing a large directory
[ https://issues.apache.org/jira/browse/HDFS-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HDFS-985:
-------------------------------
    Attachment: iterativeLS_trunk2.patch

iterativeLS_trunk2.patch fixes a bug in TestFileStatus.java.

> HDFS should issue multiple RPCs for listing a large directory
> -------------------------------------------------------------
>
>                 Key: HDFS-985
>                 URL: https://issues.apache.org/jira/browse/HDFS-985
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.22.0
>
>         Attachments: directoryBrowse_0.20yahoo.patch, iterativeLS_trunk.patch, iterativeLS_trunk1.patch, iterativeLS_trunk2.patch, iterativeLS_yahoo.patch, iterativeLS_yahoo1.patch, testFileStatus.patch
>
> Currently HDFS issues one RPC from the client to the NameNode to list a directory. However, some directories are large, containing thousands or millions of items. Listing such a large directory in one RPC has a few shortcomings:
> 1. The list operation holds the global fsnamesystem lock for a long time, blocking other requests. If a large number (thousands, say) of such list requests hit the NameNode in a short period of time, the NameNode is significantly slowed down; users end up noticing longer response times or lost connections to the NameNode.
> 2. The response message is uncontrollably big. We observed a response as big as 50 MB when listing a directory of 300 thousand items. Even with the optimization introduced in HDFS-946, which may cut the response by 20-50%, the response size will still be on the order of 10 megabytes.
> I propose to implement directory listing using multiple RPCs. Here is the plan:
> 1. Each getListing RPC has an upper limit on the number of items returned. This limit could be configurable, but I am thinking of setting it to a fixed number like 500.
> 2. Each RPC additionally specifies a start position for the listing request. I am thinking of using the last item of the previous listing RPC as the indicator. Since the NameNode stores all items in a directory as a sorted array, it can use the last item to locate the start item of the next listing even if that item is deleted between the two consecutive calls. This has the advantage of avoiding duplicate entries at the client side.
> 3. The return value additionally specifies whether the whole directory has been listed. If the client sees a false flag, it will continue to issue another RPC.
> This proposal changes the semantics of large directory listing: listing is no longer an atomic operation if the directory's content is changing while the listing operation is in progress.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
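The three-step plan above amounts to cursor-based pagination: page size cap, a "start after" cursor resolved against the sorted directory array, and a continuation signal. A minimal self-contained sketch of that loop (all names here are illustrative, not the actual HDFS RPC API, and the simulated "NameNode" is just a sorted array):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the iterative listing described above.
// listPage() plays the NameNode role; listAll() is the client loop.
public class IterativeListing {
    static final int LIMIT = 500;  // per-RPC cap on returned items

    // Return up to LIMIT entries strictly after `startAfter` from a
    // sorted directory. binarySearch locates the insertion point, so the
    // cursor still works even if that entry was deleted between calls.
    static String[] listPage(String[] sorted, String startAfter) {
        int from = 0;
        if (startAfter != null) {
            int idx = Arrays.binarySearch(sorted, startAfter);
            from = idx >= 0 ? idx + 1 : -idx - 1;
        }
        int to = Math.min(from + LIMIT, sorted.length);
        return Arrays.copyOfRange(sorted, from, to);
    }

    // Client loop: keep issuing "RPCs" until a short page signals the end.
    public static List<String> listAll(String[] dir) {
        List<String> result = new ArrayList<>();
        String cursor = null;
        while (true) {
            String[] page = listPage(dir, cursor);
            result.addAll(Arrays.asList(page));
            if (page.length < LIMIT) break;   // whole directory listed
            cursor = page[page.length - 1];   // last item is the next cursor
        }
        return result;
    }
}
```

Because the cursor is a name rather than an index, a concurrent delete of the last returned entry shifts the insertion point but never causes an entry to be returned twice, which is the duplicate-avoidance property claimed in point 2.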
[jira] Updated: (HDFS-985) HDFS should issue multiple RPCs for listing a large directory

Hairong Kuang updated HDFS-985:
-------------------------------
    Attachment: directoryBrowse_0.20yahoo.patch

This patch fixes the UI bug when browsing a directory from the web in the Yahoo! 0.20 branch.
[jira] Updated: (HDFS-985) HDFS should issue multiple RPCs for listing a large directory

Hairong Kuang updated HDFS-985:
-------------------------------
    Attachment: iterativeLS_trunk1.patch
[jira] Updated: (HDFS-985) HDFS should issue multiple RPCs for listing a large directory

Hairong Kuang updated HDFS-985:
-------------------------------
    Attachment: (was: iterativeLS_trunk.patch)
[jira] Updated: (HDFS-985) HDFS should issue multiple RPCs for listing a large directory

Hairong Kuang updated HDFS-985:
-------------------------------
    Attachment: iterativeLS_trunk.patch

This patch is for trunk and fixes a web UI bug when displaying a directory structure.
[jira] Updated: (HDFS-985) HDFS should issue multiple RPCs for listing a large directory

Hairong Kuang updated HDFS-985:
-------------------------------
    Attachment: iterativeLS_trunk.patch

Here is the patch for the trunk.
[jira] Updated: (HDFS-985) HDFS should issue multiple RPCs for listing a large directory

Hairong Kuang updated HDFS-985:
-------------------------------
    Attachment: testFileStatus.patch

This patch fixes a bug in TestFileStatus.java.
[jira] Updated: (HDFS-985) HDFS should issue multiple RPCs for listing a large directory

Hairong Kuang updated HDFS-985:
-------------------------------
    Attachment: iterativeLS_yahoo1.patch

This patch addresses Suresh's comments except for comments 3, 5, and 6:
1. renamed lastReturnedName to startAfter;
2. renamed PathPartialListing to DirectoryListing.
In addition, I defined the config property dfs.ls.limit and its default value as constants, and added a comment to DistributedFileSystem#listStatus explaining that the operation is no longer atomic.
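Pulling together this comment and the proposal, the renamed DirectoryListing return type would need to carry a page of entries plus a continuation signal, with the last entry of the page serving as the startAfter cursor for the next RPC. A hedged sketch of that shape (field and method names are illustrative and may not match the patch; entries are simplified to strings):

```java
// Hypothetical shape of the DirectoryListing RPC return value:
// one page of a directory plus the count of entries still remaining.
public class DirectoryListing {
    private final String[] partialListing;   // this page of entries
    private final int remainingEntries;      // entries left after this page

    public DirectoryListing(String[] partialListing, int remainingEntries) {
        this.partialListing = partialListing;
        this.remainingEntries = remainingEntries;
    }

    public String[] getPartialListing() { return partialListing; }
    public int getRemainingEntries()    { return remainingEntries; }

    // The client keeps issuing RPCs while entries remain.
    public boolean hasMore() { return remainingEntries > 0; }

    // Cursor ("startAfter") for the next RPC: the last entry of this page.
    public String lastName() {
        return partialListing.length == 0
            ? null : partialListing[partialListing.length - 1];
    }
}
```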
[jira] Updated: (HDFS-985) HDFS should issue multiple RPCs for listing a large directory

Hairong Kuang updated HDFS-985:
-------------------------------
    Attachment: iterativeLS_yahoo.patch

A patch for review. I am sorry that this patch is generated against the 0.20 Yahoo! branch, because I did not have time to work on the trunk first; but I will certainly work on a patch against the trunk before resolving this issue.

This patch makes a few minor changes to the proposal in the jira description:
1. The upper limit of each getListing RPC is made configurable, with a default value of 1000. This configuration property is undocumented for now. One reason to have it is that it eases writing unit tests; also, I still need to conduct more experiments at a large scale to decide on the right default number.
2. A getListing RPC returns the number of remaining entries to be listed instead of a flag indicating whether there are more to list. The number of remaining entries provides a heuristic for the initial size of the status array at the client side, reducing unnecessary allocation/deallocation in most cases.

Unit tests are added to TestFileStatus to check whether multiple RPCs work correctly.
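The sizing heuristic in point 2 above can be sketched in a few lines: after the first RPC, the reported remaining count lets the client allocate the full result buffer once instead of growing it page by page. Names here are illustrative, not the patch's actual API:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch of the client-side preallocation heuristic: size the result
// list from the first page plus the server-reported remaining count.
public class ListingBuffer {
    static List<String> presized(String[] firstPage, int remaining) {
        // one allocation covering the expected final size, avoiding
        // repeated ArrayList growth as later pages arrive
        List<String> all = new ArrayList<>(firstPage.length + remaining);
        Collections.addAll(all, firstPage);
        return all;
    }
}
```

The count is only a heuristic: entries created or deleted between RPCs can make the final size differ, so the list must still be growable.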