[ 
https://issues.apache.org/jira/browse/HADOOP-14698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16436296#comment-16436296
 ] 

Jason Cwik commented on HADOOP-14698:
-------------------------------------

As mentioned above in 
https://issues.apache.org/jira/browse/HADOOP-14698?focusedCommentId=16107552&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16107552
 the current threading model only parallelizes work at the leaf nodes.  In deep 
or wide tree structures, the enumeration itself can take a significant amount 
of time, especially with other FileSystem implementations such as S3A or other 
object store connectors.  I started a patch in HDFS-13398 to address this 
(especially for the `ls` and `du` commands), but it could likely be combined 
with this effort to parallelize the FsShell module in general.

So far, we've tried two approaches.  The first simply creates another executor 
in the base class and enqueues the child operations in processPaths.  The 
second uses ForkJoinPool (FJP) to crawl the tree and process subtrees in 
parallel.  Currently, we have FJP working with `ls` and `du`, but not with 
other operations.  I think FJP is the best route, since it lets us do things 
like wait to delete a directory until all of its children have been deleted.  
Doing this properly, however, might require a significant refactoring of the 
whole FsShell module to implement the correct ForkJoinTask structure.
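
To make the FJP idea concrete, here is a minimal sketch of the task shape. It 
is not the HDFS-13398 patch: it walks a local directory with java.nio.file 
instead of the Hadoop FileSystem API, and the names TreeCrawl, DuTask, 
DeleteTask, diskUsage and deleteTree are hypothetical, chosen only for 
illustration.  Each directory forks one subtask per child directory and joins 
them before finishing, which is what lets a delete wait for its children.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;
import java.util.concurrent.RecursiveTask;

// Hypothetical illustration only; not part of FsShell.
public class TreeCrawl {

    /** du-style task: sums file sizes, forking one subtask per child directory. */
    static final class DuTask extends RecursiveTask<Long> {
        private final Path dir;
        DuTask(Path dir) { this.dir = dir; }

        @Override
        protected Long compute() {
            long total = 0;
            List<DuTask> subtasks = new ArrayList<>();
            try (DirectoryStream<Path> children = Files.newDirectoryStream(dir)) {
                for (Path child : children) {
                    if (Files.isDirectory(child, LinkOption.NOFOLLOW_LINKS)) {
                        DuTask task = new DuTask(child);
                        task.fork();                 // crawl the subtree in parallel
                        subtasks.add(task);
                    } else {
                        total += Files.size(child);  // leaf work done inline
                    }
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            for (DuTask task : subtasks) {
                total += task.join();                // combine subtree results
            }
            return total;
        }
    }

    /** Delete task: a directory is removed only after all child tasks have joined. */
    static final class DeleteTask extends RecursiveAction {
        private final Path dir;
        DeleteTask(Path dir) { this.dir = dir; }

        @Override
        protected void compute() {
            List<DeleteTask> subtasks = new ArrayList<>();
            try (DirectoryStream<Path> children = Files.newDirectoryStream(dir)) {
                for (Path child : children) {
                    if (Files.isDirectory(child, LinkOption.NOFOLLOW_LINKS)) {
                        DeleteTask task = new DeleteTask(child);
                        task.fork();
                        subtasks.add(task);
                    } else {
                        Files.delete(child);
                    }
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            for (DeleteTask task : subtasks) {
                task.join();                         // wait for every child subtree
            }
            try {
                Files.delete(dir);                   // directory is now empty
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    static long diskUsage(Path root) {
        return ForkJoinPool.commonPool().invoke(new DuTask(root));
    }

    static void deleteTree(Path root) {
        ForkJoinPool.commonPool().invoke(new DeleteTask(root));
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("fjp-demo");
        Path sub = Files.createDirectories(root.resolve("a/b"));
        Files.write(root.resolve("f1"), new byte[3]);
        Files.write(sub.resolve("f2"), new byte[5]);
        System.out.println(diskUsage(root));   // prints 8
        deleteTree(root);
        System.out.println(Files.exists(root)); // prints false
    }
}
```

One caveat with this shape: the tasks block on I/O, which ForkJoinPool is not 
tuned for by default, so a real implementation against S3A or another object 
store would probably need a dedicated pool with a larger parallelism setting.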

Thoughts?


> Make copyFromLocal's -t option available for put as well
> --------------------------------------------------------
>
>                 Key: HADOOP-14698
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14698
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Andras Bokor
>            Assignee: Andras Bokor
>            Priority: Major
>         Attachments: HADOOP-14698.01.patch, HADOOP-14698.02.patch, 
> HADOOP-14698.03.patch, HADOOP-14698.04.patch, HADOOP-14698.05.patch, 
> HADOOP-14698.06.patch, HADOOP-14698.07.patch, HADOOP-14698.08.patch
>
>
> After HDFS-11786, copyFromLocal and put are no longer identical.
> I do not see any reason not to add the new feature to put as well.
> Having the two commands differ makes them harder to understand and use 
> from the user's point of view.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

