[ 
https://issues.apache.org/jira/browse/HADOOP-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12618547#action_12618547
 ] 

Doug Cutting commented on HADOOP-3873:
--------------------------------------

Okay, sounds like a reasonable use case.

Your initial description sounded like you intended to count the files copied as 
the job runs, and terminate it when it crosses a limit.  That would be tricky, 
and is perhaps not what you meant anyway.  Rather, all we need to do to 
implement this is to count bytes and files as files are listed in the client 
before the job is created.  If that's all you mean, then +1, this seems like a 
fine feature.

The implementation would be much cleaner if listStatus accepted a 
StatusFilter.  Then the filter can count bytes and files and stop returning new 
files once its limit is exceeded.  The existing code would hardly change.
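
A rough sketch of what such a filter might look like follows. Everything in it 
is hypothetical: StatusFilter, LimitFilter, the Status stand-in and this 
listStatus helper are illustrative names only, not existing Hadoop APIs, and 
"stop once the limit is exceeded" is interpreted here as rejecting the first 
file that would cross either limit and every file after it.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: every type here is a hypothetical stand-in, not Hadoop's API. */
final class LimitSketch {

    /** Minimal stand-in for a listed file: its path and its length in bytes. */
    static final class Status {
        final String path;
        final long length;

        Status(String path, long length) {
            this.path = path;
            this.length = length;
        }
    }

    /** Hypothetical filter that a listing call could consult for each entry. */
    interface StatusFilter {
        boolean accept(Status status);
    }

    /** Counts accepted files and bytes; rejects everything once a limit would be crossed. */
    static final class LimitFilter implements StatusFilter {
        private final long maxFiles;
        private final long maxBytes;
        private long files;
        private long bytes;
        private boolean exhausted;

        LimitFilter(long maxFiles, long maxBytes) {
            this.maxFiles = maxFiles;
            this.maxBytes = maxBytes;
        }

        @Override
        public boolean accept(Status status) {
            // Assumed policy: the first file that would cross a limit closes the filter.
            if (exhausted || files + 1 > maxFiles || bytes + status.length > maxBytes) {
                exhausted = true;
                return false;
            }
            files++;
            bytes += status.length;
            return true;
        }

        long fileCount() { return files; }

        long byteCount() { return bytes; }
    }

    /** Stand-in for a filtered listing: keeps only the entries the filter accepts. */
    static List<Status> listStatus(List<Status> all, StatusFilter filter) {
        List<Status> kept = new ArrayList<>();
        for (Status s : all) {
            if (filter.accept(s)) {
                kept.add(s);
            }
        }
        return kept;
    }
}
```

Since the counting happens while the client lists files, the job would only 
ever be given the accepted files, so nothing needs to be stopped mid-copy.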

> DistCp should have an option for limiting the number of files/bytes being 
> copied
> --------------------------------------------------------------------------------
>
>                 Key: HADOOP-3873
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3873
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: tools/distcp
>            Reporter: Tsz Wo (Nicholas), SZE
>
> A single DistCp command may potentially copy a huge number of files/bytes.  
> In such a case, DistCp will run for a long time and there is no way to stop 
> it nicely.  It would be good if DistCp had an option to limit the number of 
> files/bytes being copied.  Once the limit is reached, DistCp will terminate 
> and return success.  All files copied are guaranteed to be good and there are 
> no partially copied files.

