[ https://issues.apache.org/jira/browse/HADOOP-16018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16727798#comment-16727798 ]
Kai X edited comment on HADOOP-16018 at 12/23/18 4:59 AM:
----------------------------------------------------------

`BLOCKS_PER_CHUNK.getConfigLabel()` was used for the first time in HADOOP-15850, which is where this regression appears.

I also verified the following on my cluster:
* hadoop-distcp-2.9.1 can reassemble chunks (so it can be used as a workaround)
* 2.9.2 cannot; the debug log in the CopyCommitter ctor always prints "blocks per chunk 0"
* 2.9.2 with the patch applied can reassemble chunks, and the debug log prints the correct value for blocks per chunk

was (Author: kai33):
`BLOCKS_PER_CHUNK.getConfigLabel()` was used for the first time in HADOOP-15850, which is where this regression appears.

I also verified the following on my cluster:
* hadoop-distcp-2.9.1 can reassemble chunks (so it can be used as a workaround)
* 2.9.2 cannot
* 2.9.2 with the patch applied can reassemble chunks

> DistCp won't reassemble chunks when blocks per chunk > 0
> --------------------------------------------------------
>
>                 Key: HADOOP-16018
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16018
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: tools/distcp
>    Affects Versions: 3.2.0, 2.9.2
>            Reporter: Kai X
>            Priority: Major
>
> I was investigating why hadoop-distcp-2.9.2 won't reassemble chunks of the
> same file when blocks per chunk is set to a value > 0.
> In CopyCommitter::commitJob, this logic prevents chunks from being
> reassembled if blocks per chunk is equal to 0:
> {code:java}
> if (blocksPerChunk > 0) {
>   concatFileChunks(conf);
> }
> {code}
> In CopyCommitter's ctor, blocksPerChunk is initialised from the config:
> {code:java}
> blocksPerChunk = context.getConfiguration().getInt(
>     DistCpOptionSwitch.BLOCKS_PER_CHUNK.getConfigLabel(), 0);
> {code}
>
> But the config key returned by
> DistCpOptionSwitch.BLOCKS_PER_CHUNK.getConfigLabel() is always an empty
> string, because the switch is constructed without a config label:
> {code:java}
> BLOCKS_PER_CHUNK("",
>     new Option("blocksperchunk", true, "If set to a positive value, files"
>         + "with more blocks than this value will be split into chunks of "
>         + "<blocksperchunk> blocks to be transferred in parallel, and "
>         + "reassembled on the destination. By default, <blocksperchunk> is "
>         + "0 and the files will be transmitted in their entirety without "
>         + "splitting. This switch is only applicable when the source file "
>         + "system implements getBlockLocations method and the target file "
>         + "system implements concat method"))
> {code}
> As a result, blocksPerChunk falls back to the default value 0, which
> prevents the chunks from being reassembled.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
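
The fallback described in the issue can be reproduced in isolation. The sketch below is not Hadoop code: `OptionSwitch` and the `Map`-based `getInt` are hypothetical stand-ins for `DistCpOptionSwitch` and `Configuration.getInt`, and the key `distcp.blocks.per.chunk` is an assumed label chosen for illustration. It also assumes that nothing is retrievable under the empty key in the committer's configuration.

```java
import java.util.HashMap;
import java.util.Map;

final class BlocksPerChunkSketch {

    // Hypothetical stand-in for DistCpOptionSwitch: each switch carries a config label.
    enum OptionSwitch {
        // Empty label, mirroring the constructor call quoted in the issue.
        BLOCKS_PER_CHUNK_EMPTY(""),
        // Assumed label, for illustration only.
        BLOCKS_PER_CHUNK_LABELLED("distcp.blocks.per.chunk");

        private final String configLabel;

        OptionSwitch(String configLabel) {
            this.configLabel = configLabel;
        }

        String getConfigLabel() {
            return configLabel;
        }
    }

    // Simplified stand-in for Configuration.getInt(name, defaultValue):
    // returns the default when no value is stored under the given name.
    static int getInt(Map<String, String> conf, String name, int defaultValue) {
        String value = conf.get(name);
        return value == null ? defaultValue : Integer.parseInt(value.trim());
    }

    // Mirrors the lookup in the CopyCommitter ctor, with 0 as the default.
    static int blocksPerChunk(Map<String, String> conf, OptionSwitch option) {
        return getInt(conf, option.getConfigLabel(), 0);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("distcp.blocks.per.chunk", "4");

        // The empty label matches no stored key, so the default 0 comes back
        // and a guard like "if (blocksPerChunk > 0)" is never entered.
        System.out.println("empty label   -> "
            + blocksPerChunk(conf, OptionSwitch.BLOCKS_PER_CHUNK_EMPTY));

        // With a non-empty label the configured value is found.
        System.out.println("assumed label -> "
            + blocksPerChunk(conf, OptionSwitch.BLOCKS_PER_CHUNK_LABELLED));
    }
}
```

Under these assumptions the first lookup yields 0 and the second yields 4, which matches the "blocks per chunk 0" debug output reported for 2.9.2.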