[jira] [Comment Edited] (HADOOP-16018) DistCp won't reassemble chunks when blocks per chunk > 0

2018-12-20 Thread Kai X (JIRA)


[ https://issues.apache.org/jira/browse/HADOOP-16018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16726003#comment-16726003 ]

Kai X edited comment on HADOOP-16018 at 12/20/18 4:34 PM:
--

I opened a PR with the fix here:

https://github.com/apache/hadoop/pull/451
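For anyone who cannot follow the PR link, below is a minimal, self-contained sketch of the general shape of such a fix: give the option switch a non-empty config label so that the value stored at submit time can be read back in the committer. The enum, constant name, and config key here are illustrative assumptions for this sketch, not necessarily what the PR actually changes.
{code:java}
import org.apache.hadoop.conf.Configuration;

public class ConfigLabelSketch {

  // Hypothetical stand-in for DistCpOptionSwitch: the label is non-empty,
  // so getConfigLabel() returns a key that can actually be looked up.
  enum OptionSwitch {
    BLOCKS_PER_CHUNK("distcp.blocks.per.chunk"); // illustrative key

    private final String confLabel;

    OptionSwitch(String confLabel) {
      this.confLabel = confLabel;
    }

    String getConfigLabel() {
      return confLabel;
    }
  }

  public static void main(String[] args) {
    Configuration conf = new Configuration(false);

    // Driver side: persist the parsed -blocksperchunk value under the label.
    conf.setInt(OptionSwitch.BLOCKS_PER_CHUNK.getConfigLabel(), 8);

    // Committer side: the same label now yields the real value, so a guard
    // like `if (blocksPerChunk > 0)` would go on to concatenate the chunks.
    int blocksPerChunk =
        conf.getInt(OptionSwitch.BLOCKS_PER_CHUNK.getConfigLabel(), 0);
    System.out.println("blocksPerChunk = " + blocksPerChunk); // prints 8
  }
}
{code}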


was (Author: kai33):
I opened a PR with the fix here:

https://github.com/apache/hadoop/pull/450

> DistCp won't reassemble chunks when blocks per chunk > 0
> 
>
> Key: HADOOP-16018
> URL: https://issues.apache.org/jira/browse/HADOOP-16018
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 3.2.0, 2.9.2
>Reporter: Kai X
>Priority: Major
>
> I was investigating why hadoop-distcp-2.9.2 won't reassemble chunks of the 
> same file when blocks per chunk has been set > 0.
> In CopyCommitter::commitJob, this check skips the reassembly of chunks
> whenever blocksPerChunk is 0:
> {code:java}
> if (blocksPerChunk > 0) {
>   concatFileChunks(conf);
> }
> {code}
> Then in CopyCommitter's ctor, blocksPerChunk is initialised from the config:
> {code:java}
> blocksPerChunk = context.getConfiguration().getInt(
> DistCpOptionSwitch.BLOCKS_PER_CHUNK.getConfigLabel(), 0);
> {code}
>  
> But here DistCpOptionSwitch.BLOCKS_PER_CHUNK.getConfigLabel() always returns
> an empty string for the config key, because the switch is constructed without
> a config label:
> {code:java}
> BLOCKS_PER_CHUNK("",
> new Option("blocksperchunk", true, "If set to a positive value, files"
> + "with more blocks than this value will be split into chunks of "
> + " blocks to be transferred in parallel, and "
> + "reassembled on the destination. By default,  is "
> + "0 and the files will be transmitted in their entirety without "
> + "splitting. This switch is only applicable when the source file "
> + "system implements getBlockLocations method and the target file "
> + "system implements concat method"))
> {code}
> As a result blocksPerChunk falls back to the default value 0, which prevents
> the chunks from being reassembled.
>  
>  
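To make the failure mode described above concrete, here is a minimal, runnable sketch (not part of the issue report or of any patch) showing that a lookup under the empty-string key can only ever return the supplied default, so the blocksPerChunk > 0 guard never fires:
{code:java}
import org.apache.hadoop.conf.Configuration;

public class EmptyLabelSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration(false);

    // Nothing is ever stored under the empty key, so this lookup always
    // falls back to the supplied default of 0 ...
    int blocksPerChunk = conf.getInt("" /* empty config label */, 0);

    // ... which means a commit-time guard of the form
    // `if (blocksPerChunk > 0) { concatFileChunks(conf); }` never runs,
    // and the per-chunk files are left unmerged on the destination.
    System.out.println("blocksPerChunk = " + blocksPerChunk); // prints 0
  }
}
{code}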






[jira] [Comment Edited] (HADOOP-16018) DistCp won't reassemble chunks when blocks per chunk > 0

2018-12-22 Thread Kai X (JIRA)


[ https://issues.apache.org/jira/browse/HADOOP-16018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16727798#comment-16727798 ]

Kai X edited comment on HADOOP-16018 at 12/23/18 4:59 AM:
--

I can see that `BLOCKS_PER_CHUNK.getConfigLabel()` was used for the first time
in HADOOP-15850, which is where this regression was introduced.

 

I also verified the following in my cluster:
 * hadoop-distcp-2.9.1 can reassemble chunks (can be used as a workaround)
 * 2.9.2 cannot; the debug log in the CopyCommitter ctor always prints "blocks
per chunk 0"
 * 2.9.2 with the patch applied can reassemble chunks; the debug log prints the
correct value for blocks per chunk.

 

 


was (Author: kai33):
I can see that `BLOCKS_PER_CHUNK.getConfigLabel()` was used for the first time
in HADOOP-15850, which is where this regression was introduced.

 

I also verified the following in my cluster:
 * hadoop-distcp-2.9.1 can reassemble chunks (can be used as a workaround)
 * 2.9.2 cannot
 * 2.9.2 with the patch applied can reassemble chunks

 

 

> DistCp won't reassemble chunks when blocks per chunk > 0
> 
>
> Key: HADOOP-16018
> URL: https://issues.apache.org/jira/browse/HADOOP-16018
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 3.2.0, 2.9.2
>Reporter: Kai X
>Priority: Major
>
> I was investigating why hadoop-distcp-2.9.2 won't reassemble chunks of the 
> same file when blocks per chunk has been set > 0.
> In CopyCommitter::commitJob, this check skips the reassembly of chunks
> whenever blocksPerChunk is 0:
> {code:java}
> if (blocksPerChunk > 0) {
>   concatFileChunks(conf);
> }
> {code}
> Then in CopyCommitter's ctor, blocksPerChunk is initialised from the config:
> {code:java}
> blocksPerChunk = context.getConfiguration().getInt(
> DistCpOptionSwitch.BLOCKS_PER_CHUNK.getConfigLabel(), 0);
> {code}
>  
> But here DistCpOptionSwitch.BLOCKS_PER_CHUNK.getConfigLabel() always returns
> an empty string for the config key, because the switch is constructed without
> a config label:
> {code:java}
> BLOCKS_PER_CHUNK("",
> new Option("blocksperchunk", true, "If set to a positive value, files"
> + "with more blocks than this value will be split into chunks of "
> + " blocks to be transferred in parallel, and "
> + "reassembled on the destination. By default,  is "
> + "0 and the files will be transmitted in their entirety without "
> + "splitting. This switch is only applicable when the source file "
> + "system implements getBlockLocations method and the target file "
> + "system implements concat method"))
> {code}
> As a result blocksPerChunk falls back to the default value 0, which prevents
> the chunks from being reassembled.
>  
>  






[jira] [Comment Edited] (HADOOP-16018) DistCp won't reassemble chunks when blocks per chunk > 0

2019-01-09 Thread Kai Xie (JIRA)


[ https://issues.apache.org/jira/browse/HADOOP-16018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16738348#comment-16738348 ]

Kai Xie edited comment on HADOOP-16018 at 1/10/19 2:27 AM:
---

After fixing the compilation issue in the branch-2-003 patch, Jenkins started
to hang during the distcp unit tests.

Jenkins' unit test
[log|https://builds.apache.org/job/PreCommit-HADOOP-Build/15755/artifact/out/patch-unit-hadoop-tools_hadoop-distcp.txt]
says:
{code:java}
[WARNING] Corrupted STDOUT by directly writing to native stream in forked JVM 
1. See FAQ web page and the dump file 
/testptch/hadoop/hadoop-tools/hadoop-distcp/target/surefire-reports/2019-01-09T03-06-03_867-jvmRun1.dumpstream
...
[INFO] Running org.apache.hadoop.tools.TestDistCpSync
(hanging){code}
which may be the cause of the hang, and seems unrelated to the patch.

 

I ran the tests locally and they all pass.

 


was (Author: kai33):
After fixing the compilation issue in the branch-2-003 patch, Jenkins started
to hang during the distcp unit tests.

Jenkins' unit test
[log|https://builds.apache.org/job/PreCommit-HADOOP-Build/15755/artifact/out/patch-unit-hadoop-tools_hadoop-distcp.txt]
says:
{code:java}
[WARNING] Corrupted STDOUT by directly writing to native stream in forked JVM 
1. See FAQ web page and the dump file 
/testptch/hadoop/hadoop-tools/hadoop-distcp/target/surefire-reports/2019-01-09T03-06-03_867-jvmRun1.dumpstream
{code}
which may be the cause of the hang.

 

I ran the tests locally and they all pass.

 

> DistCp won't reassemble chunks when blocks per chunk > 0
> 
>
> Key: HADOOP-16018
> URL: https://issues.apache.org/jira/browse/HADOOP-16018
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 3.2.0, 2.9.2
>Reporter: Kai Xie
>Assignee: Kai Xie
>Priority: Major
> Fix For: 3.0.4, 3.2.1, 3.1.3
>
> Attachments: HADOOP-16018-002.patch, HADOOP-16018-branch-2-002.patch, 
> HADOOP-16018-branch-2-002.patch, HADOOP-16018-branch-2-003.patch, 
> HADOOP-16018-branch-2-003.patch, HADOOP-16018.01.patch
>
>
> I was investigating why hadoop-distcp-2.9.2 won't reassemble chunks of the 
> same file when blocks per chunk has been set > 0.
> In CopyCommitter::commitJob, this check skips the reassembly of chunks
> whenever blocksPerChunk is 0:
> {code:java}
> if (blocksPerChunk > 0) {
>   concatFileChunks(conf);
> }
> {code}
> Then in CopyCommitter's ctor, blocksPerChunk is initialised from the config:
> {code:java}
> blocksPerChunk = context.getConfiguration().getInt(
> DistCpOptionSwitch.BLOCKS_PER_CHUNK.getConfigLabel(), 0);
> {code}
>  
> But here DistCpOptionSwitch.BLOCKS_PER_CHUNK.getConfigLabel() always returns
> an empty string for the config key, because the switch is constructed without
> a config label:
> {code:java}
> BLOCKS_PER_CHUNK("",
> new Option("blocksperchunk", true, "If set to a positive value, files"
> + "with more blocks than this value will be split into chunks of "
> + " blocks to be transferred in parallel, and "
> + "reassembled on the destination. By default,  is "
> + "0 and the files will be transmitted in their entirety without "
> + "splitting. This switch is only applicable when the source file "
> + "system implements getBlockLocations method and the target file "
> + "system implements concat method"))
> {code}
> As a result blocksPerChunk falls back to the default value 0, which prevents
> the chunks from being reassembled.
>  
>  






[jira] [Comment Edited] (HADOOP-16018) DistCp won't reassemble chunks when blocks per chunk > 0

2019-01-15 Thread Kai Xie (JIRA)


[ https://issues.apache.org/jira/browse/HADOOP-16018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16743014#comment-16743014 ]

Kai Xie edited comment on HADOOP-16018 at 1/15/19 11:55 AM:


Hi [~ste...@apache.org] 

I tried the approach you mentioned and I understand it should work, but
Jenkins consistently fails to build the Docker image for branch-2 due to an
npm error:
{code:java}
npm ERR! 
npm ERR! Additional logging details can be found in:
npm ERR! /root/npm-debug.log
npm ERR! not ok code 0
The command '/bin/sh -c apt-get -y install nodejs && ln -s 
/usr/bin/nodejs /usr/bin/node && apt-get -y install npm && npm config 
set strict-ssl false && npm install -g bower && npm install -g 
ember-cli' returned a non-zero code: 1
{code}
When I remove the line above from the Dockerfile, Jenkins can proceed, but it
then runs the whole set of unit tests ... and fails. Do you have any
suggestions on how to proceed? Thanks

 


was (Author: kai33):
Hi [~ste...@apache.org] 

I tried the approach you mentioned and I understand it should work, but
Jenkins consistently fails to build the Docker image for branch-2 due to an
npm error:
{code:java}
The command '/bin/sh -c apt-get -y install nodejs && ln -s 
/usr/bin/nodejs /usr/bin/node && apt-get -y install npm && npm config 
set strict-ssl false && npm install -g bower && npm install -g 
ember-cli' returned a non-zero code: 1
{code}
When I remove the line above from the Dockerfile, Jenkins can proceed, but it
then runs the whole set of unit tests ... and fails. Do you have any
suggestions on how to proceed? Thanks

 

> DistCp won't reassemble chunks when blocks per chunk > 0
> 
>
> Key: HADOOP-16018
> URL: https://issues.apache.org/jira/browse/HADOOP-16018
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 3.2.0, 2.9.2
>Reporter: Kai Xie
>Assignee: Kai Xie
>Priority: Major
> Fix For: 3.0.4, 3.2.1, 3.1.3
>
> Attachments: HADOOP-16018-002.patch, HADOOP-16018-branch-2-002.patch, 
> HADOOP-16018-branch-2-002.patch, HADOOP-16018-branch-2-003.patch, 
> HADOOP-16018-branch-2-003.patch, HADOOP-16018-branch-2-004.patch, 
> HADOOP-16018-branch-2-004.patch, HADOOP-16018-branch-2-004.patch, 
> HADOOP-16018-branch-2-004.patch, HADOOP-16018-branch-2-005.patch, 
> HADOOP-16018-branch-2-005.patch, HADOOP-16018.01.patch
>
>
> I was investigating why hadoop-distcp-2.9.2 won't reassemble chunks of the 
> same file when blocks per chunk has been set > 0.
> In CopyCommitter::commitJob, this check skips the reassembly of chunks
> whenever blocksPerChunk is 0:
> {code:java}
> if (blocksPerChunk > 0) {
>   concatFileChunks(conf);
> }
> {code}
> Then in CopyCommitter's ctor, blocksPerChunk is initialised from the config:
> {code:java}
> blocksPerChunk = context.getConfiguration().getInt(
> DistCpOptionSwitch.BLOCKS_PER_CHUNK.getConfigLabel(), 0);
> {code}
>  
> But here DistCpOptionSwitch.BLOCKS_PER_CHUNK.getConfigLabel() always returns
> an empty string for the config key, because the switch is constructed without
> a config label:
> {code:java}
> BLOCKS_PER_CHUNK("",
> new Option("blocksperchunk", true, "If set to a positive value, files"
> + "with more blocks than this value will be split into chunks of "
> + " blocks to be transferred in parallel, and "
> + "reassembled on the destination. By default,  is "
> + "0 and the files will be transmitted in their entirety without "
> + "splitting. This switch is only applicable when the source file "
> + "system implements getBlockLocations method and the target file "
> + "system implements concat method"))
> {code}
> As a result blocksPerChunk falls back to the default value 0, which prevents
> the chunks from being reassembled.
>  
>  






[jira] [Comment Edited] (HADOOP-16018) DistCp won't reassemble chunks when blocks per chunk > 0

2019-01-15 Thread Allen Wittenauer (JIRA)


[ https://issues.apache.org/jira/browse/HADOOP-16018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16743268#comment-16743268 ]

Allen Wittenauer edited comment on HADOOP-16018 at 1/15/19 6:12 PM:


The nodejs stuff is just awful and has been mostly unnecessary for a while now.
Another one of those moments where I was told to basically go away on
common-dev when it was introduced. Anyway, probably start with HADOOP-15617.
If Hadoop adds the jshint package, they'll get JavaScript linting in Yetus
0.9.0 (e.g.,
https://github.com/apache/yetus/blob/269ed7f4b89cdf50ee152fe2e7aa1eb805c964f0/precommit/src/main/shell/test-patch-docker/Dockerfile#L221).




was (Author: aw):
The nodejs stuff is just awful and has been mostly unnecessary for a while now.
Another one of those moments where I was told to basically go away on
common-dev when it was introduced. Anyway, probably start with HADOOP-15617.
If Hadoop adds the jshint package, they'll get JavaScript linting in Yetus
0.9.0.



> DistCp won't reassemble chunks when blocks per chunk > 0
> 
>
> Key: HADOOP-16018
> URL: https://issues.apache.org/jira/browse/HADOOP-16018
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 3.2.0, 2.9.2
>Reporter: Kai Xie
>Assignee: Kai Xie
>Priority: Major
> Fix For: 3.0.4, 3.2.1, 3.1.3
>
> Attachments: HADOOP-16018-002.patch, HADOOP-16018-branch-2-002.patch, 
> HADOOP-16018-branch-2-002.patch, HADOOP-16018-branch-2-003.patch, 
> HADOOP-16018-branch-2-004.patch, HADOOP-16018-branch-2-005.patch, 
> HADOOP-16018.01.patch
>
>
> I was investigating why hadoop-distcp-2.9.2 won't reassemble chunks of the 
> same file when blocks per chunk has been set > 0.
> In CopyCommitter::commitJob, this check skips the reassembly of chunks
> whenever blocksPerChunk is 0:
> {code:java}
> if (blocksPerChunk > 0) {
>   concatFileChunks(conf);
> }
> {code}
> Then in CopyCommitter's ctor, blocksPerChunk is initialised from the config:
> {code:java}
> blocksPerChunk = context.getConfiguration().getInt(
> DistCpOptionSwitch.BLOCKS_PER_CHUNK.getConfigLabel(), 0);
> {code}
>  
> But here DistCpOptionSwitch.BLOCKS_PER_CHUNK.getConfigLabel() always returns
> an empty string for the config key, because the switch is constructed without
> a config label:
> {code:java}
> BLOCKS_PER_CHUNK("",
> new Option("blocksperchunk", true, "If set to a positive value, files"
> + "with more blocks than this value will be split into chunks of "
> + " blocks to be transferred in parallel, and "
> + "reassembled on the destination. By default,  is "
> + "0 and the files will be transmitted in their entirety without "
> + "splitting. This switch is only applicable when the source file "
> + "system implements getBlockLocations method and the target file "
> + "system implements concat method"))
> {code}
> As a result blocksPerChunk falls back to the default value 0, which prevents
> the chunks from being reassembled.
>  
>  






[jira] [Comment Edited] (HADOOP-16018) DistCp won't reassemble chunks when blocks per chunk > 0

2019-01-17 Thread Kai Xie (JIRA)


[ https://issues.apache.org/jira/browse/HADOOP-16018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16745720#comment-16745720 ]

Kai Xie edited comment on HADOOP-16018 at 1/18/19 2:39 AM:
---

Thanks for resolving the image building issue on branch-2!

Hi [~ste...@apache.org]

I tried to trigger CI with the branch-2-004 and branch-2-005 patches (both
only introduce a constant without any usage), and distcp's unit tests
consistently hang at TestDistCpSync, TestDistCpSyncReverseFromTarget, and
TestDistCpSyncReverseFromSource.

 

Example hanging logs:

[https://builds.apache.org/job/PreCommit-HADOOP-Build/15799/artifact/out/patch-unit-hadoop-tools_hadoop-distcp.txt]

[https://builds.apache.org/job/PreCommit-HADOOP-Build/15798/artifact/out/patch-unit-hadoop-tools_hadoop-distcp.txt]

 


was (Author: kai33):
Thanks for resolving the image building issue on branch-2!

Hi [~ste...@apache.org]

I tried to trigger CI with the branch-2-004 and branch-2-005 patches (both
only introduce a constant without any usage), and distcp's unit tests
consistently hang at TestDistCpSync, TestDistCpSyncReverseFromTarget, and
TestDistCpSyncReverseFromSource.

> DistCp won't reassemble chunks when blocks per chunk > 0
> 
>
> Key: HADOOP-16018
> URL: https://issues.apache.org/jira/browse/HADOOP-16018
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 3.2.0, 2.9.2
>Reporter: Kai Xie
>Assignee: Kai Xie
>Priority: Major
> Fix For: 3.0.4, 3.2.1, 3.1.3
>
> Attachments: HADOOP-16018-002.patch, HADOOP-16018-branch-2-002.patch, 
> HADOOP-16018-branch-2-002.patch, HADOOP-16018-branch-2-003.patch, 
> HADOOP-16018-branch-2-004.patch, HADOOP-16018-branch-2-004.patch, 
> HADOOP-16018-branch-2-005.patch, HADOOP-16018-branch-2-005.patch, 
> HADOOP-16018.01.patch
>
>
> I was investigating why hadoop-distcp-2.9.2 won't reassemble chunks of the 
> same file when blocks per chunk has been set > 0.
> In CopyCommitter::commitJob, this check skips the reassembly of chunks
> whenever blocksPerChunk is 0:
> {code:java}
> if (blocksPerChunk > 0) {
>   concatFileChunks(conf);
> }
> {code}
> Then in CopyCommitter's ctor, blocksPerChunk is initialised from the config:
> {code:java}
> blocksPerChunk = context.getConfiguration().getInt(
> DistCpOptionSwitch.BLOCKS_PER_CHUNK.getConfigLabel(), 0);
> {code}
>  
> But here DistCpOptionSwitch.BLOCKS_PER_CHUNK.getConfigLabel() always returns
> an empty string for the config key, because the switch is constructed without
> a config label:
> {code:java}
> BLOCKS_PER_CHUNK("",
> new Option("blocksperchunk", true, "If set to a positive value, files"
> + "with more blocks than this value will be split into chunks of "
> + " blocks to be transferred in parallel, and "
> + "reassembled on the destination. By default,  is "
> + "0 and the files will be transmitted in their entirety without "
> + "splitting. This switch is only applicable when the source file "
> + "system implements getBlockLocations method and the target file "
> + "system implements concat method"))
> {code}
> As a result blocksPerChunk falls back to the default value 0, which prevents
> the chunks from being reassembled.
>  
>  


