[jira] [Commented] (HDFS-8828) Utilize Snapshot diff report to build diff copy list in distcp

2016-05-12 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15282142#comment-15282142
 ] 

Mingliang Liu commented on HDFS-8828:
-

Thanks [~jingzhao], [~yzhangal] and [~jnp] for your prompt reply. I filed 
[HDFS-10397] to track the effort.

> Utilize Snapshot diff report to build diff copy list in distcp
> --
>
> Key: HDFS-8828
> URL: https://issues.apache.org/jira/browse/HDFS-8828
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: distcp, snapshots
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Fix For: 2.8.0
>
> Attachments: HDFS-8828.001.patch, HDFS-8828.002.patch, 
> HDFS-8828.003.patch, HDFS-8828.004.patch, HDFS-8828.005.patch, 
> HDFS-8828.006.patch, HDFS-8828.007.patch, HDFS-8828.008.patch, 
> HDFS-8828.009.patch, HDFS-8828.010.patch, HDFS-8828.011.patch
>
>
> Some users reported huge time cost to build file copy list in distcp. (30 
> hours for 1.6M files). We can leverage snapshot diff report to build file 
> copy list including files/dirs which are changes only between two snapshots 
> (or a snapshot and a normal dir). It speed up the process in two folds: 1. 
> less copy list building time. 2. less file copy MR jobs.
> HDFS snapshot diff report provide information about file/directory creation, 
> deletion, rename and modification between two snapshots or a snapshot and a 
> normal directory. HDFS-7535 synchronize deletion and rename, then fallback to 
> the default distcp. So it still relies on default distcp to building complete 
> list of files under the source dir. This patch only puts creation and 
> modification files into the copy list based on snapshot diff report. We can 
> minimize the number of files to copy. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-8828) Utilize Snapshot diff report to build diff copy list in distcp

2016-05-12 Thread Jitendra Nath Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15282125#comment-15282125
 ] 

Jitendra Nath Pandey commented on HDFS-8828:


+1 to the proposal. '-delete' and '-diff' should have been mutually exclusive, 
but instead of throwing an exception ignoring "-delete" sounds ok to prevent 
existing apps from failing.

> Utilize Snapshot diff report to build diff copy list in distcp
> --
>
> Key: HDFS-8828
> URL: https://issues.apache.org/jira/browse/HDFS-8828
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: distcp, snapshots
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Fix For: 2.8.0
>
> Attachments: HDFS-8828.001.patch, HDFS-8828.002.patch, 
> HDFS-8828.003.patch, HDFS-8828.004.patch, HDFS-8828.005.patch, 
> HDFS-8828.006.patch, HDFS-8828.007.patch, HDFS-8828.008.patch, 
> HDFS-8828.009.patch, HDFS-8828.010.patch, HDFS-8828.011.patch
>
>
> Some users reported huge time cost to build file copy list in distcp. (30 
> hours for 1.6M files). We can leverage snapshot diff report to build file 
> copy list including files/dirs which are changes only between two snapshots 
> (or a snapshot and a normal dir). It speed up the process in two folds: 1. 
> less copy list building time. 2. less file copy MR jobs.
> HDFS snapshot diff report provide information about file/directory creation, 
> deletion, rename and modification between two snapshots or a snapshot and a 
> normal directory. HDFS-7535 synchronize deletion and rename, then fallback to 
> the default distcp. So it still relies on default distcp to building complete 
> list of files under the source dir. This patch only puts creation and 
> modification files into the copy list based on snapshot diff report. We can 
> minimize the number of files to copy. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-8828) Utilize Snapshot diff report to build diff copy list in distcp

2016-05-12 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15282121#comment-15282121
 ] 

Yongjun Zhang commented on HDFS-8828:
-

Thanks [~liuml07] and [~jingzhao], +1 on the proposed change.


> Utilize Snapshot diff report to build diff copy list in distcp
> --
>
> Key: HDFS-8828
> URL: https://issues.apache.org/jira/browse/HDFS-8828
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: distcp, snapshots
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Fix For: 2.8.0
>
> Attachments: HDFS-8828.001.patch, HDFS-8828.002.patch, 
> HDFS-8828.003.patch, HDFS-8828.004.patch, HDFS-8828.005.patch, 
> HDFS-8828.006.patch, HDFS-8828.007.patch, HDFS-8828.008.patch, 
> HDFS-8828.009.patch, HDFS-8828.010.patch, HDFS-8828.011.patch
>
>
> Some users reported huge time cost to build file copy list in distcp. (30 
> hours for 1.6M files). We can leverage snapshot diff report to build file 
> copy list including files/dirs which are changes only between two snapshots 
> (or a snapshot and a normal dir). It speed up the process in two folds: 1. 
> less copy list building time. 2. less file copy MR jobs.
> HDFS snapshot diff report provide information about file/directory creation, 
> deletion, rename and modification between two snapshots or a snapshot and a 
> normal directory. HDFS-7535 synchronize deletion and rename, then fallback to 
> the default distcp. So it still relies on default distcp to building complete 
> list of files under the source dir. This patch only puts creation and 
> modification files into the copy list based on snapshot diff report. We can 
> minimize the number of files to copy. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-8828) Utilize Snapshot diff report to build diff copy list in distcp

2016-05-12 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15282112#comment-15282112
 ] 

Jing Zhao commented on HDFS-8828:
-

Agree with [~liuml07]. This option change looks like incompatible, and the 
proposed fix sounds good to me.

> Utilize Snapshot diff report to build diff copy list in distcp
> --
>
> Key: HDFS-8828
> URL: https://issues.apache.org/jira/browse/HDFS-8828
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: distcp, snapshots
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Fix For: 2.8.0
>
> Attachments: HDFS-8828.001.patch, HDFS-8828.002.patch, 
> HDFS-8828.003.patch, HDFS-8828.004.patch, HDFS-8828.005.patch, 
> HDFS-8828.006.patch, HDFS-8828.007.patch, HDFS-8828.008.patch, 
> HDFS-8828.009.patch, HDFS-8828.010.patch, HDFS-8828.011.patch
>
>
> Some users reported huge time cost to build file copy list in distcp. (30 
> hours for 1.6M files). We can leverage snapshot diff report to build file 
> copy list including files/dirs which are changes only between two snapshots 
> (or a snapshot and a normal dir). It speed up the process in two folds: 1. 
> less copy list building time. 2. less file copy MR jobs.
> HDFS snapshot diff report provide information about file/directory creation, 
> deletion, rename and modification between two snapshots or a snapshot and a 
> normal directory. HDFS-7535 synchronize deletion and rename, then fallback to 
> the default distcp. So it still relies on default distcp to building complete 
> list of files under the source dir. This patch only puts creation and 
> modification files into the copy list based on snapshot diff report. We can 
> minimize the number of files to copy. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-8828) Utilize Snapshot diff report to build diff copy list in distcp

2016-05-12 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15282098#comment-15282098
 ] 

Mingliang Liu commented on HDFS-8828:
-

Hi,

I'm aware that  {{-delete}} and {{-diff}} options are mutually exclusive. I'm 
thinking this patch makes the existing applications (or scripts) that work just 
fine with both {{-delete}} and {{-diff}} options stop performing because of the 
{{java.lang.IllegalArgumentException: Diff is valid only with update options}} 
exception. That's said, this change is backward incompatible.

Do you think it makes sense to ignore the {{-delete}} option, given {{-diff}} 
option, instead of exiting the program? Along with that, we can print a warning 
message saying that "Diff is valid only with update options, and -delete option 
is ignored". Ping [~jnp] for more input.

> Utilize Snapshot diff report to build diff copy list in distcp
> --
>
> Key: HDFS-8828
> URL: https://issues.apache.org/jira/browse/HDFS-8828
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: distcp, snapshots
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Fix For: 2.8.0
>
> Attachments: HDFS-8828.001.patch, HDFS-8828.002.patch, 
> HDFS-8828.003.patch, HDFS-8828.004.patch, HDFS-8828.005.patch, 
> HDFS-8828.006.patch, HDFS-8828.007.patch, HDFS-8828.008.patch, 
> HDFS-8828.009.patch, HDFS-8828.010.patch, HDFS-8828.011.patch
>
>
> Some users reported huge time cost to build file copy list in distcp. (30 
> hours for 1.6M files). We can leverage snapshot diff report to build file 
> copy list including files/dirs which are changes only between two snapshots 
> (or a snapshot and a normal dir). It speed up the process in two folds: 1. 
> less copy list building time. 2. less file copy MR jobs.
> HDFS snapshot diff report provide information about file/directory creation, 
> deletion, rename and modification between two snapshots or a snapshot and a 
> normal directory. HDFS-7535 synchronize deletion and rename, then fallback to 
> the default distcp. So it still relies on default distcp to building complete 
> list of files under the source dir. This patch only puts creation and 
> modification files into the copy list based on snapshot diff report. We can 
> minimize the number of files to copy. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-8828) Utilize Snapshot diff report to build diff copy list in distcp

2015-08-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14707441#comment-14707441
 ] 

Hudson commented on HDFS-8828:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #294 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/294/])
HDFS-8828. Utilize Snapshot diff report to build diff copy list in distcp. 
(Yufei Gu via Yongjun Zhang) (yzhang: rev 
0bc15cb6e60dc60885234e01dec1c7cb4557a926)
* 
hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/CopyListing.java
* 
hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/SimpleCopyListing.java
* 
hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpSync.java
* 
hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptions.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DiffInfo.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java
* 
hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestOptionsParser.java


 Utilize Snapshot diff report to build diff copy list in distcp
 --

 Key: HDFS-8828
 URL: https://issues.apache.org/jira/browse/HDFS-8828
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: distcp, snapshots
Reporter: Yufei Gu
Assignee: Yufei Gu
 Fix For: 2.8.0

 Attachments: HDFS-8828.001.patch, HDFS-8828.002.patch, 
 HDFS-8828.003.patch, HDFS-8828.004.patch, HDFS-8828.005.patch, 
 HDFS-8828.006.patch, HDFS-8828.007.patch, HDFS-8828.008.patch, 
 HDFS-8828.009.patch, HDFS-8828.010.patch, HDFS-8828.011.patch


 Some users reported huge time cost to build file copy list in distcp. (30 
 hours for 1.6M files). We can leverage snapshot diff report to build file 
 copy list including files/dirs which are changes only between two snapshots 
 (or a snapshot and a normal dir). It speed up the process in two folds: 1. 
 less copy list building time. 2. less file copy MR jobs.
 HDFS snapshot diff report provide information about file/directory creation, 
 deletion, rename and modification between two snapshots or a snapshot and a 
 normal directory. HDFS-7535 synchronize deletion and rename, then fallback to 
 the default distcp. So it still relies on default distcp to building complete 
 list of files under the source dir. This patch only puts creation and 
 modification files into the copy list based on snapshot diff report. We can 
 minimize the number of files to copy. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8828) Utilize Snapshot diff report to build diff copy list in distcp

2015-08-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14707317#comment-14707317
 ] 

Hudson commented on HDFS-8828:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #291 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/291/])
HDFS-8828. Utilize Snapshot diff report to build diff copy list in distcp. 
(Yufei Gu via Yongjun Zhang) (yzhang: rev 
0bc15cb6e60dc60885234e01dec1c7cb4557a926)
* 
hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptions.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestOptionsParser.java
* 
hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DiffInfo.java
* 
hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpSync.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java
* 
hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/CopyListing.java
* 
hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/SimpleCopyListing.java


 Utilize Snapshot diff report to build diff copy list in distcp
 --

 Key: HDFS-8828
 URL: https://issues.apache.org/jira/browse/HDFS-8828
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: distcp, snapshots
Reporter: Yufei Gu
Assignee: Yufei Gu
 Fix For: 2.8.0

 Attachments: HDFS-8828.001.patch, HDFS-8828.002.patch, 
 HDFS-8828.003.patch, HDFS-8828.004.patch, HDFS-8828.005.patch, 
 HDFS-8828.006.patch, HDFS-8828.007.patch, HDFS-8828.008.patch, 
 HDFS-8828.009.patch, HDFS-8828.010.patch, HDFS-8828.011.patch


 Some users reported huge time cost to build file copy list in distcp. (30 
 hours for 1.6M files). We can leverage snapshot diff report to build file 
 copy list including files/dirs which are changes only between two snapshots 
 (or a snapshot and a normal dir). It speed up the process in two folds: 1. 
 less copy list building time. 2. less file copy MR jobs.
 HDFS snapshot diff report provide information about file/directory creation, 
 deletion, rename and modification between two snapshots or a snapshot and a 
 normal directory. HDFS-7535 synchronize deletion and rename, then fallback to 
 the default distcp. So it still relies on default distcp to building complete 
 list of files under the source dir. This patch only puts creation and 
 modification files into the copy list based on snapshot diff report. We can 
 minimize the number of files to copy. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8828) Utilize Snapshot diff report to build diff copy list in distcp

2015-08-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14707347#comment-14707347
 ] 

Hudson commented on HDFS-8828:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #1024 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/1024/])
HDFS-8828. Utilize Snapshot diff report to build diff copy list in distcp. 
(Yufei Gu via Yongjun Zhang) (yzhang: rev 
0bc15cb6e60dc60885234e01dec1c7cb4557a926)
* 
hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestOptionsParser.java
* 
hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/SimpleCopyListing.java
* 
hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpSync.java
* 
hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptions.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DiffInfo.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/CopyListing.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java
* 
hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java


 Utilize Snapshot diff report to build diff copy list in distcp
 --

 Key: HDFS-8828
 URL: https://issues.apache.org/jira/browse/HDFS-8828
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: distcp, snapshots
Reporter: Yufei Gu
Assignee: Yufei Gu
 Fix For: 2.8.0

 Attachments: HDFS-8828.001.patch, HDFS-8828.002.patch, 
 HDFS-8828.003.patch, HDFS-8828.004.patch, HDFS-8828.005.patch, 
 HDFS-8828.006.patch, HDFS-8828.007.patch, HDFS-8828.008.patch, 
 HDFS-8828.009.patch, HDFS-8828.010.patch, HDFS-8828.011.patch


 Some users reported huge time cost to build file copy list in distcp. (30 
 hours for 1.6M files). We can leverage snapshot diff report to build file 
 copy list including files/dirs which are changes only between two snapshots 
 (or a snapshot and a normal dir). It speed up the process in two folds: 1. 
 less copy list building time. 2. less file copy MR jobs.
 HDFS snapshot diff report provide information about file/directory creation, 
 deletion, rename and modification between two snapshots or a snapshot and a 
 normal directory. HDFS-7535 synchronize deletion and rename, then fallback to 
 the default distcp. So it still relies on default distcp to building complete 
 list of files under the source dir. This patch only puts creation and 
 modification files into the copy list based on snapshot diff report. We can 
 minimize the number of files to copy. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8828) Utilize Snapshot diff report to build diff copy list in distcp

2015-08-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14707587#comment-14707587
 ] 

Hudson commented on HDFS-8828:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #283 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/283/])
HDFS-8828. Utilize Snapshot diff report to build diff copy list in distcp. 
(Yufei Gu via Yongjun Zhang) (yzhang: rev 
0bc15cb6e60dc60885234e01dec1c7cb4557a926)
* 
hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java
* 
hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestOptionsParser.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DiffInfo.java
* 
hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptions.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpSync.java
* 
hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/SimpleCopyListing.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java
* 
hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/CopyListing.java


 Utilize Snapshot diff report to build diff copy list in distcp
 --

 Key: HDFS-8828
 URL: https://issues.apache.org/jira/browse/HDFS-8828
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: distcp, snapshots
Reporter: Yufei Gu
Assignee: Yufei Gu
 Fix For: 2.8.0

 Attachments: HDFS-8828.001.patch, HDFS-8828.002.patch, 
 HDFS-8828.003.patch, HDFS-8828.004.patch, HDFS-8828.005.patch, 
 HDFS-8828.006.patch, HDFS-8828.007.patch, HDFS-8828.008.patch, 
 HDFS-8828.009.patch, HDFS-8828.010.patch, HDFS-8828.011.patch


 Some users reported huge time cost to build file copy list in distcp. (30 
 hours for 1.6M files). We can leverage snapshot diff report to build file 
 copy list including files/dirs which are changes only between two snapshots 
 (or a snapshot and a normal dir). It speed up the process in two folds: 1. 
 less copy list building time. 2. less file copy MR jobs.
 HDFS snapshot diff report provide information about file/directory creation, 
 deletion, rename and modification between two snapshots or a snapshot and a 
 normal directory. HDFS-7535 synchronize deletion and rename, then fallback to 
 the default distcp. So it still relies on default distcp to building complete 
 list of files under the source dir. This patch only puts creation and 
 modification files into the copy list based on snapshot diff report. We can 
 minimize the number of files to copy. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8828) Utilize Snapshot diff report to build diff copy list in distcp

2015-08-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14707603#comment-14707603
 ] 

Hudson commented on HDFS-8828:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #2221 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2221/])
HDFS-8828. Utilize Snapshot diff report to build diff copy list in distcp. 
(Yufei Gu via Yongjun Zhang) (yzhang: rev 
0bc15cb6e60dc60885234e01dec1c7cb4557a926)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestOptionsParser.java
* 
hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/SimpleCopyListing.java
* 
hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DiffInfo.java
* 
hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpSync.java
* 
hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptions.java
* 
hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/CopyListing.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java


 Utilize Snapshot diff report to build diff copy list in distcp
 --

 Key: HDFS-8828
 URL: https://issues.apache.org/jira/browse/HDFS-8828
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: distcp, snapshots
Reporter: Yufei Gu
Assignee: Yufei Gu
 Fix For: 2.8.0

 Attachments: HDFS-8828.001.patch, HDFS-8828.002.patch, 
 HDFS-8828.003.patch, HDFS-8828.004.patch, HDFS-8828.005.patch, 
 HDFS-8828.006.patch, HDFS-8828.007.patch, HDFS-8828.008.patch, 
 HDFS-8828.009.patch, HDFS-8828.010.patch, HDFS-8828.011.patch


 Some users reported huge time cost to build file copy list in distcp. (30 
 hours for 1.6M files). We can leverage snapshot diff report to build file 
 copy list including files/dirs which are changes only between two snapshots 
 (or a snapshot and a normal dir). It speed up the process in two folds: 1. 
 less copy list building time. 2. less file copy MR jobs.
 HDFS snapshot diff report provide information about file/directory creation, 
 deletion, rename and modification between two snapshots or a snapshot and a 
 normal directory. HDFS-7535 synchronize deletion and rename, then fallback to 
 the default distcp. So it still relies on default distcp to building complete 
 list of files under the source dir. This patch only puts creation and 
 modification files into the copy list based on snapshot diff report. We can 
 minimize the number of files to copy. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8828) Utilize Snapshot diff report to build diff copy list in distcp

2015-08-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14707540#comment-14707540
 ] 

Hudson commented on HDFS-8828:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2240 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2240/])
HDFS-8828. Utilize Snapshot diff report to build diff copy list in distcp. 
(Yufei Gu via Yongjun Zhang) (yzhang: rev 
0bc15cb6e60dc60885234e01dec1c7cb4557a926)
* 
hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/SimpleCopyListing.java
* 
hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/CopyListing.java
* 
hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptions.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestOptionsParser.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DiffInfo.java
* 
hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpSync.java
* 
hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java


 Utilize Snapshot diff report to build diff copy list in distcp
 --

 Key: HDFS-8828
 URL: https://issues.apache.org/jira/browse/HDFS-8828
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: distcp, snapshots
Reporter: Yufei Gu
Assignee: Yufei Gu
 Fix For: 2.8.0

 Attachments: HDFS-8828.001.patch, HDFS-8828.002.patch, 
 HDFS-8828.003.patch, HDFS-8828.004.patch, HDFS-8828.005.patch, 
 HDFS-8828.006.patch, HDFS-8828.007.patch, HDFS-8828.008.patch, 
 HDFS-8828.009.patch, HDFS-8828.010.patch, HDFS-8828.011.patch


 Some users reported huge time cost to build file copy list in distcp. (30 
 hours for 1.6M files). We can leverage snapshot diff report to build file 
 copy list including files/dirs which are changes only between two snapshots 
 (or a snapshot and a normal dir). It speed up the process in two folds: 1. 
 less copy list building time. 2. less file copy MR jobs.
 HDFS snapshot diff report provide information about file/directory creation, 
 deletion, rename and modification between two snapshots or a snapshot and a 
 normal directory. HDFS-7535 synchronize deletion and rename, then fallback to 
 the default distcp. So it still relies on default distcp to building complete 
 list of files under the source dir. This patch only puts creation and 
 modification files into the copy list based on snapshot diff report. We can 
 minimize the number of files to copy. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8828) Utilize Snapshot diff report to build diff copy list in distcp

2015-08-20 Thread Yufei Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14705464#comment-14705464
 ] 

Yufei Gu commented on HDFS-8828:


Thank you very much for committing, [~yzhangal]. 
Thank you very much for review, [~jingzhao].

 Utilize Snapshot diff report to build diff copy list in distcp
 --

 Key: HDFS-8828
 URL: https://issues.apache.org/jira/browse/HDFS-8828
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: distcp, snapshots
Reporter: Yufei Gu
Assignee: Yufei Gu
 Fix For: 2.8.0

 Attachments: HDFS-8828.001.patch, HDFS-8828.002.patch, 
 HDFS-8828.003.patch, HDFS-8828.004.patch, HDFS-8828.005.patch, 
 HDFS-8828.006.patch, HDFS-8828.007.patch, HDFS-8828.008.patch, 
 HDFS-8828.009.patch, HDFS-8828.010.patch, HDFS-8828.011.patch


 Some users reported huge time cost to build file copy list in distcp. (30 
 hours for 1.6M files). We can leverage snapshot diff report to build file 
 copy list including files/dirs which are changes only between two snapshots 
 (or a snapshot and a normal dir). It speed up the process in two folds: 1. 
 less copy list building time. 2. less file copy MR jobs.
 HDFS snapshot diff report provide information about file/directory creation, 
 deletion, rename and modification between two snapshots or a snapshot and a 
 normal directory. HDFS-7535 synchronize deletion and rename, then fallback to 
 the default distcp. So it still relies on default distcp to building complete 
 list of files under the source dir. This patch only puts creation and 
 modification files into the copy list based on snapshot diff report. We can 
 minimize the number of files to copy. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8828) Utilize Snapshot diff report to build diff copy list in distcp

2015-08-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14705113#comment-14705113
 ] 

Hudson commented on HDFS-8828:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8328 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8328/])
HDFS-8828. Utilize Snapshot diff report to build diff copy list in distcp. 
(Yufei Gu via Yongjun Zhang) (yzhang: rev 
0bc15cb6e60dc60885234e01dec1c7cb4557a926)
* 
hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestOptionsParser.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/SimpleCopyListing.java
* 
hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/CopyListing.java
* 
hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpSync.java
* 
hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java
* 
hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptions.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DiffInfo.java


 Utilize Snapshot diff report to build diff copy list in distcp
 --

 Key: HDFS-8828
 URL: https://issues.apache.org/jira/browse/HDFS-8828
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: distcp, snapshots
Reporter: Yufei Gu
Assignee: Yufei Gu
 Fix For: 2.8.0

 Attachments: HDFS-8828.001.patch, HDFS-8828.002.patch, 
 HDFS-8828.003.patch, HDFS-8828.004.patch, HDFS-8828.005.patch, 
 HDFS-8828.006.patch, HDFS-8828.007.patch, HDFS-8828.008.patch, 
 HDFS-8828.009.patch, HDFS-8828.010.patch, HDFS-8828.011.patch


 Some users reported huge time cost to build file copy list in distcp. (30 
 hours for 1.6M files). We can leverage snapshot diff report to build file 
 copy list including files/dirs which are changes only between two snapshots 
 (or a snapshot and a normal dir). It speed up the process in two folds: 1. 
 less copy list building time. 2. less file copy MR jobs.
 HDFS snapshot diff report provide information about file/directory creation, 
 deletion, rename and modification between two snapshots or a snapshot and a 
 normal directory. HDFS-7535 synchronize deletion and rename, then fallback to 
 the default distcp. So it still relies on default distcp to building complete 
 list of files under the source dir. This patch only puts creation and 
 modification files into the copy list based on snapshot diff report. We can 
 minimize the number of files to copy. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)