[jira] [Commented] (HADOOP-18739) Parallelize concatenation of distcp chunks of separate files in CopyCommitter
[ https://issues.apache.org/jira/browse/HADOOP-18739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17732986#comment-17732986 ] Abhay Yadav commented on HADOOP-18739: -- Hi [~ste...@apache.org], could you please review the PR? > Parallelize concatenation of distcp chunks of separate files in CopyCommitter > - > > Key: HADOOP-18739 > URL: https://issues.apache.org/jira/browse/HADOOP-18739 > Project: Hadoop Common > Issue Type: Improvement > Components: tools/distcp >Reporter: Abhay Yadav >Priority: Trivial > Labels: pull-request-available > > While copying a folder containing large files consisting of multiple distcp > chunks, copy committer synchronously picks chunks of each file and > concatenates them. This part can be improved by parallelizing the > concatenation of distcp chunks of separate files. We are able to save 2-3 > minutes while copying a folder of 100 GB containing 20 files of 5GB size with > this improvement. > Contributing a patch for this. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-18739) Parallelize concatenation of distcp chunks of separate files in CopyCommitter
[ https://issues.apache.org/jira/browse/HADOOP-18739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17722159#comment-17722159 ] ASF GitHub Bot commented on HADOOP-18739: - hadoop-yetus commented on PR #5640: URL: https://github.com/apache/hadoop/pull/5640#issuecomment-1545689229 :confetti_ball: **+1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 46s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 2s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 2s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 35m 33s | | trunk passed | | +1 :green_heart: | compile | 0m 29s | | trunk passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1 | | +1 :green_heart: | compile | 0m 24s | | trunk passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09 | | +1 :green_heart: | checkstyle | 0m 27s | | trunk passed | | +1 :green_heart: | mvnsite | 0m 29s | | trunk passed | | +1 :green_heart: | javadoc | 0m 32s | | trunk passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1 | | +1 :green_heart: | javadoc | 0m 24s | | trunk passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09 | | +1 :green_heart: | spotbugs | 0m 55s | | trunk passed | | +1 :green_heart: | shadedclient | 23m 9s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 0m 20s | | the patch passed | | +1 :green_heart: | compile | 0m 21s | | the patch passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1 | | +1 :green_heart: | javac | 0m 21s | | the patch passed | | +1 :green_heart: | compile | 0m 18s | | the patch passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09 | | +1 :green_heart: | javac | 0m 18s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | +1 :green_heart: | checkstyle | 0m 14s | | the patch passed | | +1 :green_heart: | mvnsite | 0m 20s | | the patch passed | | +1 :green_heart: | javadoc | 0m 19s | | the patch passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1 | | +1 :green_heart: | javadoc | 0m 17s | | the patch passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09 | | +1 :green_heart: | spotbugs | 0m 46s | | the patch passed | | +1 :green_heart: | shadedclient | 23m 12s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 14m 47s | | hadoop-distcp in the patch passed. | | +1 :green_heart: | asflicense | 0m 33s | | The patch does not generate ASF License warnings. | | | | 106m 52s | | | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5640/4/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/5640 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | Linux a9606d1a53b1 4.15.0-206-generic #217-Ubuntu SMP Fri Feb 3 19:10:13 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / 5204e688c8ed3000cbbbc3d087a546e49d9b6f97 | | Default Java | Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5640/4/testReport/ | | Max. process+thread count | 540 (vs. ulimit of 5500) | | modules | C: hadoop-tools/hadoop-distcp U: hadoop-tools/hadoop-distcp | | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5640/4/console | | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 | | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org | This message was automatically generated. > Paralleliz
[jira] [Commented] (HADOOP-18739) Parallelize concatenation of distcp chunks of separate files in CopyCommitter
[ https://issues.apache.org/jira/browse/HADOOP-18739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17722156#comment-17722156 ] ASF GitHub Bot commented on HADOOP-18739: - hadoop-yetus commented on PR #5640: URL: https://github.com/apache/hadoop/pull/5640#issuecomment-1545674155 :confetti_ball: **+1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 35s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 32m 43s | | trunk passed | | +1 :green_heart: | compile | 0m 26s | | trunk passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1 | | +1 :green_heart: | compile | 0m 27s | | trunk passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09 | | +1 :green_heart: | checkstyle | 0m 30s | | trunk passed | | +1 :green_heart: | mvnsite | 0m 30s | | trunk passed | | +1 :green_heart: | javadoc | 0m 35s | | trunk passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1 | | +1 :green_heart: | javadoc | 0m 28s | | trunk passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09 | | +1 :green_heart: | spotbugs | 0m 58s | | trunk passed | | +1 :green_heart: | shadedclient | 20m 8s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 0m 21s | | the patch passed | | +1 :green_heart: | compile | 0m 22s | | the patch passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1 | | +1 :green_heart: | javac | 0m 22s | | the patch passed | | +1 :green_heart: | compile | 0m 20s | | the patch passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09 | | +1 :green_heart: | javac | 0m 20s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | -0 :warning: | checkstyle | 0m 15s | [/results-checkstyle-hadoop-tools_hadoop-distcp.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5640/3/artifact/out/results-checkstyle-hadoop-tools_hadoop-distcp.txt) | hadoop-tools/hadoop-distcp: The patch generated 16 new + 16 unchanged - 0 fixed = 32 total (was 16) | | +1 :green_heart: | mvnsite | 0m 23s | | the patch passed | | +1 :green_heart: | javadoc | 0m 20s | | the patch passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1 | | +1 :green_heart: | javadoc | 0m 20s | | the patch passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09 | | +1 :green_heart: | spotbugs | 0m 48s | | the patch passed | | +1 :green_heart: | shadedclient | 20m 37s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 15m 13s | | hadoop-distcp in the patch passed. | | +1 :green_heart: | asflicense | 0m 38s | | The patch does not generate ASF License warnings. | | | | 99m 16s | | | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5640/3/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/5640 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | Linux efdc110e1164 4.15.0-206-generic #217-Ubuntu SMP Fri Feb 3 19:10:13 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / 16559fc875c0b13ada3e64583b06f3439b34d588 | | Default Java | Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5640/3/testReport/ | | Max. process+thread count | 560 (vs. ulimit of 5500) | | modules | C: hadoop-tools/hadoop-distcp U: hadoop-tools/hadoop-distcp | | Console output | https://ci
[jira] [Commented] (HADOOP-18739) Parallelize concatenation of distcp chunks of separate files in CopyCommitter
[ https://issues.apache.org/jira/browse/HADOOP-18739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17721660#comment-17721660 ] ASF GitHub Bot commented on HADOOP-18739: - hadoop-yetus commented on PR #5640: URL: https://github.com/apache/hadoop/pull/5640#issuecomment-1543494792 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 37s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 1s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 1s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 32m 53s | | trunk passed | | +1 :green_heart: | compile | 0m 31s | | trunk passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1 | | +1 :green_heart: | compile | 0m 30s | | trunk passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09 | | +1 :green_heart: | checkstyle | 0m 32s | | trunk passed | | +1 :green_heart: | mvnsite | 0m 34s | | trunk passed | | +1 :green_heart: | javadoc | 0m 37s | | trunk passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1 | | +1 :green_heart: | javadoc | 0m 29s | | trunk passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09 | | +1 :green_heart: | spotbugs | 1m 0s | | trunk passed | | +1 :green_heart: | shadedclient | 20m 21s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 0m 23s | | the patch passed | | +1 :green_heart: | compile | 0m 23s | | the patch passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1 | | +1 :green_heart: | javac | 0m 23s | | the patch passed | | +1 :green_heart: | compile | 0m 20s | | the patch passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09 | | +1 :green_heart: | javac | 0m 20s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | -0 :warning: | checkstyle | 0m 17s | [/results-checkstyle-hadoop-tools_hadoop-distcp.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5640/2/artifact/out/results-checkstyle-hadoop-tools_hadoop-distcp.txt) | hadoop-tools/hadoop-distcp: The patch generated 16 new + 12 unchanged - 0 fixed = 28 total (was 12) | | +1 :green_heart: | mvnsite | 0m 23s | | the patch passed | | +1 :green_heart: | javadoc | 0m 20s | | the patch passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1 | | +1 :green_heart: | javadoc | 0m 20s | | the patch passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09 | | +1 :green_heart: | spotbugs | 0m 46s | | the patch passed | | +1 :green_heart: | shadedclient | 19m 54s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 15m 7s | | hadoop-distcp in the patch passed. | | +1 :green_heart: | asflicense | 0m 38s | | The patch does not generate ASF License warnings. | | | | 99m 40s | | | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5640/2/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/5640 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | Linux a17d7dd747a9 4.15.0-206-generic #217-Ubuntu SMP Fri Feb 3 19:10:13 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / cc15540cdaadd307cf9e8ea8c12e5706e5b0ed72 | | Default Java | Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5640/2/testReport/ | | Max. process+thread count | 699 (vs. ul
[jira] [Commented] (HADOOP-18739) Parallelize concatenation of distcp chunks of separate files in CopyCommitter
[ https://issues.apache.org/jira/browse/HADOOP-18739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17721566#comment-17721566 ] ASF GitHub Bot commented on HADOOP-18739: - hadoop-yetus commented on PR #5640: URL: https://github.com/apache/hadoop/pull/5640#issuecomment-1542925543 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 34s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 1s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 1s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 33m 5s | | trunk passed | | +1 :green_heart: | compile | 0m 31s | | trunk passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1 | | +1 :green_heart: | compile | 0m 29s | | trunk passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09 | | +1 :green_heart: | checkstyle | 0m 32s | | trunk passed | | +1 :green_heart: | mvnsite | 0m 34s | | trunk passed | | +1 :green_heart: | javadoc | 0m 37s | | trunk passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1 | | +1 :green_heart: | javadoc | 0m 30s | | trunk passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09 | | +1 :green_heart: | spotbugs | 0m 59s | | trunk passed | | +1 :green_heart: | shadedclient | 20m 34s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 0m 23s | | the patch passed | | +1 :green_heart: | compile | 0m 22s | | the patch passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1 | | +1 :green_heart: | javac | 0m 22s | | the patch passed | | +1 :green_heart: | compile | 0m 20s | | the patch passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09 | | +1 :green_heart: | javac | 0m 20s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | -0 :warning: | checkstyle | 0m 16s | [/results-checkstyle-hadoop-tools_hadoop-distcp.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5640/1/artifact/out/results-checkstyle-hadoop-tools_hadoop-distcp.txt) | hadoop-tools/hadoop-distcp: The patch generated 16 new + 12 unchanged - 0 fixed = 28 total (was 12) | | +1 :green_heart: | mvnsite | 0m 23s | | the patch passed | | +1 :green_heart: | javadoc | 0m 21s | | the patch passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1 | | +1 :green_heart: | javadoc | 0m 19s | | the patch passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09 | | -1 :x: | spotbugs | 0m 50s | [/new-spotbugs-hadoop-tools_hadoop-distcp.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5640/1/artifact/out/new-spotbugs-hadoop-tools_hadoop-distcp.html) | hadoop-tools/hadoop-distcp generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) | | +1 :green_heart: | shadedclient | 20m 0s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 14m 35s | | hadoop-distcp in the patch passed. | | +1 :green_heart: | asflicense | 0m 38s | | The patch does not generate ASF License warnings. | | | | 99m 39s | | | | Reason | Tests | |---:|:--| | SpotBugs | module:hadoop-tools/hadoop-distcp | | | Null passed for non-null parameter of java.util.concurrent.CompletionService.submit(Runnable, Object) in org.apache.hadoop.tools.mapred.CopyCommitter.concatFileChunks(Configuration) At CopyCommitter.java:of java.util.concurrent.CompletionService.submit(Runnable, Object) in org.apache.hadoop.tools.mapred.CopyCommitter.concatFileChunks(Configuration) At CopyCommitter.java:[line 284] | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5640/1/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/5640 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclien
[jira] [Commented] (HADOOP-18739) Parallelize concatenation of distcp chunks of separate files in CopyCommitter
[ https://issues.apache.org/jira/browse/HADOOP-18739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17721553#comment-17721553 ] ASF GitHub Bot commented on HADOOP-18739: - developersarm opened a new pull request, #5640: URL: https://github.com/apache/hadoop/pull/5640 ### Description of PR While copying a folder containing large files consisting of multiple distcp chunks, copy committer synchronously picks chunks of each file and concatenates them. This is very slow. As part of this PR, we are parallelising the concatenation of distcp chunks of separate files with a default thread pool of size 10. ### How was this patch tested? * Existing unit tests are passing which are testing both failure and success scenarios. * Tested the patch on distributed cloud setup by running a distcp job for copying a folder of size 100GB having 20 files of size 5GB from one cluster to another. Noticed an improvement of 2 mins in latency. ### For code changes: - [x] Does the title or this PR starts with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')? - [ ] Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation? - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)? - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, `NOTICE-binary` files? > Parallelize concatenation of distcp chunks of separate files in CopyCommitter > - > > Key: HADOOP-18739 > URL: https://issues.apache.org/jira/browse/HADOOP-18739 > Project: Hadoop Common > Issue Type: Improvement > Components: tools/distcp >Reporter: Abhay Yadav >Priority: Trivial > > While copying a folder containing large files consisting of multiple distcp > chunks, copy committer synchronously picks chunks of each file and > concatenates them. This part can be improved by parallelizing the > concatenation of distcp chunks of separate files. We are able to save 2-3 > minutes while copying a folder of 100 GB containing 20 files of 5GB size with > this improvement. > Contributing a patch for this. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org