[ https://issues.apache.org/jira/browse/HADOOP-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12618535#action_12618535 ]
Tsz Wo (Nicholas), SZE commented on HADOOP-3873:
------------------------------------------------
>This sounds rather ad-hoc. What is the use case?
One use case is backing up a number of directories, say /user1/data,
/user2/data, /user3/data, etc., during off-peak hours every day. Each of these
directories may contain a large number of files/bytes. If we simply run distcp,
it cannot finish copying everything within a single day.
Also, since DistCp currently copies files sequentially, files in /user1/data
will be copied first, and the other users will be unhappy.
If distcp supported a limit option, we could do something like
distcp /user1/data limit 100GB, 1000000 files
distcp /user2/data limit 100GB, 1000000 files
...
These commands would be executed every day. Suppose /user1/data contains 5 files
as follows:
/user1/data/file1 50GB
/user1/data/file2 50GB
/user1/data/file3 50GB
/user1/data/file4 50GB
/user1/data/file5 50GB
Then distcp will copy file1 and file2 on the first day. On the second day,
since file1 and file2 already exist at the destination, distcp will copy
file3 and file4. User1 can expect all files to be copied within 3 days.
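
The day-by-day behaviour in the example above can be sketched as a small
simulation. This is not DistCp code: select_files and simulate_daily_runs are
hypothetical names, and the sketch assumes that a file is copied only if the
running total stays at or below the limit, which the proposal does not spell
out.

```python
GB = 1024 ** 3


def select_files(files, already_copied, byte_limit, file_limit):
    """Pick the next batch of whole files without exceeding either limit.

    Assumption: a file that would push the total past a limit is left for
    a later run, so no file is ever copied partially.
    """
    batch = []
    total_bytes = 0
    for path, size in files:
        if path in already_copied:
            continue  # already present at the destination, skip it
        if len(batch) + 1 > file_limit or total_bytes + size > byte_limit:
            break  # stop before a limit would be exceeded
        batch.append(path)
        total_bytes += size
    return batch


def simulate_daily_runs(files, byte_limit, file_limit):
    """Run the limited copy once per day until everything is copied."""
    copied = set()
    days = []
    while len(copied) < len(files):
        batch = select_files(files, copied, byte_limit, file_limit)
        if not batch:
            break  # next file alone exceeds the limit; avoid looping forever
        copied.update(batch)
        days.append(batch)
    return days


# The five 50GB files from the example, with a 100GB / 1000000-file limit.
files = [(f"/user1/data/file{i}", 50 * GB) for i in range(1, 6)]
days = simulate_daily_runs(files, byte_limit=100 * GB, file_limit=1_000_000)
for day, batch in enumerate(days, start=1):
    print(f"day {day}: {batch}")
```

Under these assumptions the simulation needs three runs: two files on each of
the first two days and the remaining file on the third.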
> DistCp should have an option for limiting the number of files/bytes being
> copied
> --------------------------------------------------------------------------------
>
> Key: HADOOP-3873
> URL: https://issues.apache.org/jira/browse/HADOOP-3873
> Project: Hadoop Core
> Issue Type: New Feature
> Components: tools/distcp
> Reporter: Tsz Wo (Nicholas), SZE
>
> A single DistCp command may potentially copy a huge number of files/bytes.
> In such a case, DistCp will run for a long time and there is no way to stop
> it nicely. It would be good if DistCp had an option to limit the number of
> files/bytes being copied. Once the limit is reached, DistCp will terminate
> and return success. All files copied are guaranteed to be good and there are
> no partially copied files.