[ https://issues.apache.org/jira/browse/HADOOP-16018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16738348#comment-16738348 ]
Kai Xie commented on HADOOP-16018:
----------------------------------

After fixing the compilation issue in the branch-2-003 patch, Jenkins started to hang during the distcp unit tests.

The Jenkins unit test [log|https://builds.apache.org/job/PreCommit-HADOOP-Build/15755/artifact/out/patch-unit-hadoop-tools_hadoop-distcp.txt] says:
{code:java}
[WARNING] Corrupted STDOUT by directly writing to native stream in forked JVM 1. See FAQ web page and the dump file /testptch/hadoop/hadoop-tools/hadoop-distcp/target/surefire-reports/2019-01-09T03-06-03_867-jvmRun1.dumpstream
{code}
which may be the cause of the hang.

I ran the tests locally and they all pass.

> DistCp won't reassemble chunks when blocks per chunk > 0
> --------------------------------------------------------
>
>                 Key: HADOOP-16018
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16018
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: tools/distcp
>    Affects Versions: 3.2.0, 2.9.2
>            Reporter: Kai Xie
>            Assignee: Kai Xie
>            Priority: Major
>             Fix For: 3.0.4, 3.2.1, 3.1.3
>
>         Attachments: HADOOP-16018-002.patch, HADOOP-16018-branch-2-002.patch,
> HADOOP-16018-branch-2-002.patch, HADOOP-16018-branch-2-003.patch,
> HADOOP-16018-branch-2-003.patch, HADOOP-16018.01.patch
>
>
> I was investigating why hadoop-distcp-2.9.2 won't reassemble chunks of the
> same file when blocks per chunk has been set > 0.
> In CopyCommitter::commitJob, this logic prevents chunks from being
> reassembled if blocksPerChunk is equal to 0:
> {code:java}
> if (blocksPerChunk > 0) {
>   concatFileChunks(conf);
> }
> {code}
> Then in CopyCommitter's ctor, blocksPerChunk is initialised from the config:
> {code:java}
> blocksPerChunk = context.getConfiguration().getInt(
>     DistCpOptionSwitch.BLOCKS_PER_CHUNK.getConfigLabel(), 0);
> {code}
>
> But here the config key DistCpOptionSwitch.BLOCKS_PER_CHUNK.getConfigLabel()
> always returns an empty string, because the switch is constructed without a
> config label:
> {code:java}
> BLOCKS_PER_CHUNK("",
>     new Option("blocksperchunk", true, "If set to a positive value, files"
>         + "with more blocks than this value will be split into chunks of "
>         + "<blocksperchunk> blocks to be transferred in parallel, and "
>         + "reassembled on the destination. By default, <blocksperchunk> is "
>         + "0 and the files will be transmitted in their entirety without "
>         + "splitting. This switch is only applicable when the source file "
>         + "system implements getBlockLocations method and the target file "
>         + "system implements concat method"))
> {code}
> As a result, blocksPerChunk falls back to the default value 0, which
> prevents the chunks from being reassembled.
>
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
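
For illustration only, the failure mode described in the issue can be reproduced without Hadoop on the classpath. This is a minimal sketch, not the actual DistCp code: ConfigLabelDemo, the Switch enum, the Map-backed getInt helper and the label "distcp.blocks.per.chunk" are all hypothetical stand-ins, and it assumes the job configuration holds no entry under the empty key.

```java
import java.util.HashMap;
import java.util.Map;

class ConfigLabelDemo {

    // Hypothetical stand-in for DistCpOptionSwitch: each switch carries a config label.
    enum Switch {
        BROKEN(""),                        // empty label, as described in the issue
        FIXED("distcp.blocks.per.chunk");  // assumed label, for illustration only

        private final String configLabel;

        Switch(String configLabel) {
            this.configLabel = configLabel;
        }

        String getConfigLabel() {
            return configLabel;
        }
    }

    // Simplified stand-in for Configuration.getInt(name, defaultValue).
    static int getInt(Map<String, String> conf, String name, int defaultValue) {
        String value = conf.get(name);
        return value == null ? defaultValue : Integer.parseInt(value.trim());
    }

    // Mirrors the lookup in the CopyCommitter constructor.
    static int readBlocksPerChunk(Map<String, String> conf, Switch sw) {
        return getInt(conf, sw.getConfigLabel(), 0);
    }

    // Mirrors the guard in commitJob.
    static boolean wouldConcat(int blocksPerChunk) {
        return blocksPerChunk > 0;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(Switch.FIXED.getConfigLabel(), "4"); // user asked for 4 blocks per chunk

        int broken = readBlocksPerChunk(conf, Switch.BROKEN);
        int fixed = readBlocksPerChunk(conf, Switch.FIXED);

        System.out.println("empty label: blocksPerChunk=" + broken + ", concat=" + wouldConcat(broken));
        System.out.println("named label: blocksPerChunk=" + fixed + ", concat=" + wouldConcat(fixed));
    }
}
```

In this sketch the lookup through the empty label misses the stored value and yields the default 0, so the concat guard is never entered, whereas the lookup through a non-empty label returns 4 and the guard passes.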