[jira] [Commented] (HADOOP-16018) DistCp won't reassemble chunks when blocks per chunk > 0
[ https://issues.apache.org/jira/browse/HADOOP-16018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732568#comment-16732568 ]

Kai X commented on HADOOP-16018:

Uploaded the patch from the PR to trigger the pre-commit job.

> DistCp won't reassemble chunks when blocks per chunk > 0
>
> Key: HADOOP-16018
> URL: https://issues.apache.org/jira/browse/HADOOP-16018
> Project: Hadoop Common
> Issue Type: Bug
> Components: tools/distcp
> Affects Versions: 3.2.0, 2.9.2
> Reporter: Kai X
> Assignee: Kai X
> Priority: Major
> Attachments: HADOOP-16018.01.patch
>
> I was investigating why hadoop-distcp-2.9.2 won't reassemble chunks of the
> same file when blocks per chunk has been set > 0.
> In the CopyCommitter::commitJob, this logic can prevent chunks from
> reassembling if blocks per chunk is equal to 0:
> {code:java}
> if (blocksPerChunk > 0) {
>   concatFileChunks(conf);
> }
> {code}
> Then in CopyCommitter's ctor, blocksPerChunk is initialised from the config:
> {code:java}
> blocksPerChunk = context.getConfiguration().getInt(
>     DistCpOptionSwitch.BLOCKS_PER_CHUNK.getConfigLabel(), 0);
> {code}
>
> But here the config key DistCpOptionSwitch.BLOCKS_PER_CHUNK.getConfigLabel()
> will always return an empty string because it is constructed without a config
> label:
> {code:java}
> BLOCKS_PER_CHUNK("",
>     new Option("blocksperchunk", true, "If set to a positive value, files"
>         + "with more blocks than this value will be split into chunks of "
>         + "<blocksperchunk> blocks to be transferred in parallel, and "
>         + "reassembled on the destination. By default, <blocksperchunk> is "
>         + "0 and the files will be transmitted in their entirety without "
>         + "splitting. This switch is only applicable when the source file "
>         + "system implements getBlockLocations method and the target file "
>         + "system implements concat method"))
> {code}
> As a result it will fall back to the default value 0 for blocksPerChunk, and
> prevent the chunks from reassembling.
> > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
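The failure mode described in the quoted issue can be reproduced without Hadoop. The following is a minimal sketch, not the DistCp code: `OptionSwitch`, the `Map`-based `getInt`, and the key name `distcp.blocks.per.chunk` are stand-ins assumed for illustration. It only shows that a lookup under an empty label returns the default 0, so the `blocksPerChunk > 0` guard never fires.

```java
import java.util.HashMap;
import java.util.Map;

public class BlocksPerChunkDemo {

  /** Hypothetical stand-in for DistCpOptionSwitch: each switch carries a config label. */
  enum OptionSwitch {
    // Empty label, as in the enum constant quoted in the report.
    BLOCKS_PER_CHUNK_BROKEN(""),
    // Non-empty label; the key name is an assumption made for this sketch.
    BLOCKS_PER_CHUNK_LABELLED("distcp.blocks.per.chunk");

    private final String configLabel;

    OptionSwitch(String configLabel) {
      this.configLabel = configLabel;
    }

    String getConfigLabel() {
      return configLabel;
    }
  }

  /** Stand-in for a getInt(name, defaultValue) lookup on a job configuration. */
  static int getInt(Map<String, String> conf, String name, int defaultValue) {
    String value = conf.get(name);
    return value == null ? defaultValue : Integer.parseInt(value.trim());
  }

  /** Mirrors the constructor lookup quoted in the report. */
  static int readBlocksPerChunk(Map<String, String> conf, OptionSwitch option) {
    return getInt(conf, option.getConfigLabel(), 0);
  }

  /** Mirrors the guard in commitJob: chunks are concatenated only when this is true. */
  static boolean wouldConcat(int blocksPerChunk) {
    return blocksPerChunk > 0;
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    // Assume the submitting side stored the user's value under a real key.
    conf.put("distcp.blocks.per.chunk", "4");

    int broken = readBlocksPerChunk(conf, OptionSwitch.BLOCKS_PER_CHUNK_BROKEN);
    int labelled = readBlocksPerChunk(conf, OptionSwitch.BLOCKS_PER_CHUNK_LABELLED);

    System.out.println("empty label:     " + broken + " -> concat? " + wouldConcat(broken));
    System.out.println("non-empty label: " + labelled + " -> concat? " + wouldConcat(labelled));
  }
}
```

With the empty label the lookup misses and yields 0, so the concat step is skipped; with a label that matches the stored key it yields 4 and the guard passes.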
[jira] [Updated] (HADOOP-16018) DistCp won't reassemble chunks when blocks per chunk > 0
[ https://issues.apache.org/jira/browse/HADOOP-16018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kai X updated HADOOP-16018:

Attachment: HADOOP-16018.01.patch
[jira] [Comment Edited] (HADOOP-16018) DistCp won't reassemble chunks when blocks per chunk > 0
[ https://issues.apache.org/jira/browse/HADOOP-16018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16727798#comment-16727798 ]

Kai X edited comment on HADOOP-16018 at 12/23/18 4:59 AM:

I can observe that `BLOCKS_PER_CHUNK.getConfigLabel()` is used for the first time in HADOOP-15850, and that is where this regression appears.

Also verified these in my cluster:
* hadoop-distcp-2.9.1 can reassemble chunks (can be used as a workaround)
* 2.9.2 cannot; the debug log in the CopyCommitter ctor always prints "blocks per chunk 0"
* 2.9.2 with the patch applied can reassemble chunks; the debug log prints the correct value for blocks per chunk

was (Author: kai33):
I can observe `BLOCKS_PER_CHUNK.getConfigLabel()` is used for the first time in HADOOP-15850 and then hits this regression.

Also verified these in my cluster
* hadoop-distcp-2.9.1 can reassemble chunks (can be used as a workaround)
* 2.9.2 cannot
* 2.9.2 with the patch applied can reassemble chunks
[jira] [Commented] (HADOOP-16018) DistCp won't reassemble chunks when blocks per chunk > 0
[ https://issues.apache.org/jira/browse/HADOOP-16018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16727798#comment-16727798 ]

Kai X commented on HADOOP-16018:

I can observe that `BLOCKS_PER_CHUNK.getConfigLabel()` is used for the first time in HADOOP-15850, and that is where this regression appears.

Also verified these in my cluster:
* hadoop-distcp-2.9.1 can reassemble chunks (can be used as a workaround)
* 2.9.2 cannot
* 2.9.2 with the patch applied can reassemble chunks
[jira] [Comment Edited] (HADOOP-16018) DistCp won't reassemble chunks when blocks per chunk > 0
[ https://issues.apache.org/jira/browse/HADOOP-16018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726003#comment-16726003 ]

Kai X edited comment on HADOOP-16018 at 12/20/18 4:34 PM:

I opened a PR with the fix here: https://github.com/apache/hadoop/pull/451

was (Author: kai33):
I opened a PR with the fix here: https://github.com/apache/hadoop/pull/450
[jira] [Issue Comment Deleted] (HADOOP-16018) DistCp won't reassemble chunks when blocks per chunk > 0
[ https://issues.apache.org/jira/browse/HADOOP-16018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kai X updated HADOOP-16018:

Comment: was deleted

(was: https://github.com/apache/hadoop/pull/450)
[jira] [Updated] (HADOOP-16018) DistCp won't reassemble chunks when blocks per chunk > 0
[ https://issues.apache.org/jira/browse/HADOOP-16018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kai X updated HADOOP-16018:

Status: Patch Available (was: Open)

https://github.com/apache/hadoop/pull/450
[jira] [Commented] (HADOOP-16018) DistCp won't reassemble chunks when blocks per chunk > 0
[ https://issues.apache.org/jira/browse/HADOOP-16018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726003#comment-16726003 ]

Kai X commented on HADOOP-16018:

I opened a PR with the fix here: https://github.com/apache/hadoop/pull/450
[jira] [Updated] (HADOOP-16018) DistCp won't reassemble chunks when blocks per chunk > 0
[ https://issues.apache.org/jira/browse/HADOOP-16018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kai X updated HADOOP-16018:

Affects Version/s: (was: 3.0.3)
                   (was: 3.1.1)
                   3.2.0
[jira] [Commented] (HADOOP-16018) DistCp won't reassemble chunks when blocks per chunk > 0
[ https://issues.apache.org/jira/browse/HADOOP-16018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725954#comment-16725954 ]

Kai X commented on HADOOP-16018:

Its test can pass because the empty config key is not overridden in the UT.
[jira] [Commented] (HADOOP-16018) DistCp won't reassemble chunks when blocks per chunk > 0
[ https://issues.apache.org/jira/browse/HADOOP-16018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725937#comment-16725937 ]

Kai X commented on HADOOP-16018:

I'm trying to contribute a simple fix for this issue.
[jira] [Updated] (HADOOP-16018) DistCp won't reassemble chunks when blocks per chunk > 0
[ https://issues.apache.org/jira/browse/HADOOP-16018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kai X updated HADOOP-16018:

Description:
I was investigating why hadoop-distcp-2.9.2 won't reassemble chunks of the same file when blocks per chunk has been set > 0.

In the CopyCommitter::commitJob, this logic can prevent chunks from reassembling if blocks per chunk is equal to 0:
{code:java}
if (blocksPerChunk > 0) {
  concatFileChunks(conf);
}
{code}
Then in CopyCommitter's ctor, blocksPerChunk is initialised from the config:
{code:java}
blocksPerChunk = context.getConfiguration().getInt(
    DistCpOptionSwitch.BLOCKS_PER_CHUNK.getConfigLabel(), 0);
{code}

But here the config key DistCpOptionSwitch.BLOCKS_PER_CHUNK.getConfigLabel() will always return an empty string because it is constructed without a config label:
{code:java}
BLOCKS_PER_CHUNK("",
    new Option("blocksperchunk", true, "If set to a positive value, files"
        + "with more blocks than this value will be split into chunks of "
        + "<blocksperchunk> blocks to be transferred in parallel, and "
        + "reassembled on the destination. By default, <blocksperchunk> is "
        + "0 and the files will be transmitted in their entirety without "
        + "splitting. This switch is only applicable when the source file "
        + "system implements getBlockLocations method and the target file "
        + "system implements concat method"))
{code}
As a result it will fall back to the default value 0 for blocksPerChunk, and prevent the chunks from reassembling.

was:
I was investigating why hadoop-distcp-2.9.2 won't reassemble chunks of the same file when blocks per chunk has been set > 0.

In the CopyCommitter::commitJob, this logic can prevent reassemble chunks if blocks per chunk is equal to 0:
{code:java}
if (blocksPerChunk > 0) {
  concatFileChunks(conf);
}
{code}
Then in CopyCommitter's ctor, blocksPerChunk is initialised from the config:
{code:java}
blocksPerChunk = context.getConfiguration().getInt(
    DistCpOptionSwitch.BLOCKS_PER_CHUNK.getConfigLabel(), 0);
{code}

But here the config key DistCpOptionSwitch.BLOCKS_PER_CHUNK.getConfigLabel() will always return an empty string because it is constructed without a config label:
{code:java}
BLOCKS_PER_CHUNK("",
    new Option("blocksperchunk", true, "If set to a positive value, files"
        + "with more blocks than this value will be split into chunks of "
        + "<blocksperchunk> blocks to be transferred in parallel, and "
        + "reassembled on the destination. By default, <blocksperchunk> is "
        + "0 and the files will be transmitted in their entirety without "
        + "splitting. This switch is only applicable when the source file "
        + "system implements getBlockLocations method and the target file "
        + "system implements concat method"))
{code}
As a result it will fall back to the default value 0 for blocksPerChunk, and prevent the chunks from reassembling.
[jira] [Created] (HADOOP-16018) DistCp won't reassemble chunks when blocks per chunk > 0
Kai X created HADOOP-16018:

Summary: DistCp won't reassemble chunks when blocks per chunk > 0
Key: HADOOP-16018
URL: https://issues.apache.org/jira/browse/HADOOP-16018
Project: Hadoop Common
Issue Type: Bug
Components: tools/distcp
Affects Versions: 3.0.3, 2.9.2, 3.1.1
Reporter: Kai X

I was investigating why hadoop-distcp-2.9.2 won't reassemble chunks of the same file when blocks per chunk has been set > 0.

In the CopyCommitter::commitJob, this logic can prevent reassemble chunks if blocks per chunk is equal to 0:
{code:java}
if (blocksPerChunk > 0) {
  concatFileChunks(conf);
}
{code}
Then in CopyCommitter's ctor, blocksPerChunk is initialised from the config:
{code:java}
blocksPerChunk = context.getConfiguration().getInt(
    DistCpOptionSwitch.BLOCKS_PER_CHUNK.getConfigLabel(), 0);
{code}

But here the config key DistCpOptionSwitch.BLOCKS_PER_CHUNK.getConfigLabel() will always return an empty string because it is constructed without a config label:
{code:java}
BLOCKS_PER_CHUNK("",
    new Option("blocksperchunk", true, "If set to a positive value, files"
        + "with more blocks than this value will be split into chunks of "
        + "<blocksperchunk> blocks to be transferred in parallel, and "
        + "reassembled on the destination. By default, <blocksperchunk> is "
        + "0 and the files will be transmitted in their entirety without "
        + "splitting. This switch is only applicable when the source file "
        + "system implements getBlockLocations method and the target file "
        + "system implements concat method"))
{code}
As a result it will fall back to the default value 0 for blocksPerChunk, and prevent the chunks from reassembling.