[ https://issues.apache.org/jira/browse/HDFS-13398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ajay Sachdev updated HDFS-13398: -------------------------------- Attachment: HDFS-13398.002.patch > Hdfs recursive listing operation is very slow > --------------------------------------------- > > Key: HDFS-13398 > URL: https://issues.apache.org/jira/browse/HDFS-13398 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs > Affects Versions: 2.7.1 > Environment: HCFS file system where HDP 2.6.1 is connected to ECS > (Object Store). > Reporter: Ajay Sachdev > Assignee: Ajay Sachdev > Priority: Major > Fix For: 2.7.1 > > Attachments: HDFS-13398.001.patch, HDFS-13398.002.patch, > parallelfsPatch > > > The hdfs dfs -ls -R command is sequential in nature and is very slow for a > HCFS system. We have seen around 6 mins for 40K directory/files structure. > The proposal is to use multithreading approach to speed up recursive list, du > and count operations. > We have tried a ForkJoinPool implementation to improve performance for > recursive listing operation. > [https://github.com/jasoncwik/hadoop-release/tree/parallel-fs-cli] > commit id : > 82387c8cd76c2e2761bd7f651122f83d45ae8876 > Another implementation is to use Java Executor Service to improve performance > to run listing operation in multiple threads in parallel. This has > significantly reduced the time to 40 secs from 6 mins. > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org