[ https://issues.apache.org/jira/browse/HADOOP-16018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16742672#comment-16742672 ]
Hadoop QA commented on HADOOP-16018:
------------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red} 1m 51s{color} | {color:red} Docker failed to build yetus/hadoop:a716388. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HADOOP-16018 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12954908/HADOOP-16018-branch-2-004.patch |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/15785/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |

This message was automatically generated.

> DistCp won't reassemble chunks when blocks per chunk > 0
> --------------------------------------------------------
>
> Key: HADOOP-16018
> URL: https://issues.apache.org/jira/browse/HADOOP-16018
> Project: Hadoop Common
> Issue Type: Bug
> Components: tools/distcp
> Affects Versions: 3.2.0, 2.9.2
> Reporter: Kai Xie
> Assignee: Kai Xie
> Priority: Major
> Fix For: 3.0.4, 3.2.1, 3.1.3
>
> Attachments: HADOOP-16018-002.patch, HADOOP-16018-branch-2-002.patch,
> HADOOP-16018-branch-2-002.patch, HADOOP-16018-branch-2-003.patch,
> HADOOP-16018-branch-2-003.patch, HADOOP-16018-branch-2-004.patch,
> HADOOP-16018-branch-2-004.patch, HADOOP-16018-branch-2-004.patch,
> HADOOP-16018.01.patch
>
>
> I was investigating why hadoop-distcp-2.9.2 won't reassemble chunks of the
> same file when blocks per chunk has been set > 0.
> In CopyCommitter::commitJob, this check skips reassembly whenever
> blocksPerChunk is 0:
> {code:java}
> if (blocksPerChunk > 0) {
>   concatFileChunks(conf);
> }
> {code}
> Then in CopyCommitter's ctor, blocksPerChunk is initialised from the config:
> {code:java}
> blocksPerChunk = context.getConfiguration().getInt(
>     DistCpOptionSwitch.BLOCKS_PER_CHUNK.getConfigLabel(), 0);
> {code}
>
> But here the config key DistCpOptionSwitch.BLOCKS_PER_CHUNK.getConfigLabel()
> always returns an empty string, because the enum constant is constructed
> without a config label:
> {code:java}
> BLOCKS_PER_CHUNK("",
>     new Option("blocksperchunk", true, "If set to a positive value, files"
>         + "with more blocks than this value will be split into chunks of "
>         + "<blocksperchunk> blocks to be transferred in parallel, and "
>         + "reassembled on the destination. By default, <blocksperchunk> is "
>         + "0 and the files will be transmitted in their entirety without "
>         + "splitting. This switch is only applicable when the source file "
>         + "system implements getBlockLocations method and the target file "
>         + "system implements concat method"))
> {code}
> As a result, the lookup never finds the value the user set, blocksPerChunk
> falls back to its default of 0, and the chunks are never reassembled.
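
The failure mode described above can be reproduced without Hadoop on the classpath. The following is a minimal sketch, not the DistCp code itself: a plain Map stands in for the job Configuration, getInt is a hypothetical helper that imitates a "return the default when the key is absent" lookup, and the key name "distcp.blocks.per.chunk" is an assumption chosen for illustration. It shows that a lookup under an empty label yields the default 0 even when the user has set a positive value under a real key, so the "> 0" guard never passes.

```java
import java.util.HashMap;
import java.util.Map;

class BlocksPerChunkLabelDemo {

    // Hypothetical stand-in for a Configuration-style getInt(key, default):
    // returns the default when the key is not present.
    static int getInt(Map<String, String> conf, String key, int defaultValue) {
        String raw = conf.get(key);
        return raw == null ? defaultValue : Integer.parseInt(raw.trim());
    }

    // Mirrors the guard quoted in the report: chunks are only concatenated
    // when the value read back from the config is positive.
    static boolean wouldConcat(Map<String, String> conf, String configLabel) {
        int blocksPerChunk = getInt(conf, configLabel, 0);
        return blocksPerChunk > 0;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // Assumed key name, for illustration only.
        conf.put("distcp.blocks.per.chunk", "4");

        // Empty label, as in the report: the value is never found.
        System.out.println(wouldConcat(conf, ""));                        // prints "false"
        // Non-empty label matching the key the value was stored under.
        System.out.println(wouldConcat(conf, "distcp.blocks.per.chunk")); // prints "true"
    }
}
```

Under these assumptions, giving the option a non-empty config label that matches the key the value is written under is enough for the guard to pass.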