[jira] [Commented] (HADOOP-14178) Move Mockito up to version 2.x

2018-02-08 Thread Akira Ajisaka (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357988#comment-16357988
 ] 

Akira Ajisaka commented on HADOOP-14178:


006 patch: rebased

> Move Mockito up to version 2.x
> --
>
> Key: HADOOP-14178
> URL: https://issues.apache.org/jira/browse/HADOOP-14178
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: 2.9.0
>Reporter: Steve Loughran
>Assignee: Akira Ajisaka
>Priority: Major
> Attachments: HADOOP-14178.001.patch, HADOOP-14178.002.patch, 
> HADOOP-14178.003.patch, HADOOP-14178.004.patch, HADOOP-14178.005-wip.patch, 
> HADOOP-14178.005-wip2.patch, HADOOP-14178.005-wip3.patch, 
> HADOOP-14178.005-wip4.patch, HADOOP-14178.005-wip5.patch, 
> HADOOP-14178.005-wip6.patch, HADOOP-14178.005.patch, HADOOP-14178.006.patch
>
>
> I don't know when Hadoop picked up Mockito, but it has been frozen at 1.8.5 
> since the switch to Maven in 2011. 
> Mockito is now at version 2.1, [with lots of Java 8 
> support|https://github.com/mockito/mockito/wiki/What%27s-new-in-Mockito-2]. 
> That's not just defining actions as closures, but also support for Optional 
> types, mocking methods in interfaces, etc. 
> It's only used for testing, and, *provided there aren't regressions*, the cost 
> of the upgrade is low. The good news: test tools usually come with good test 
> coverage. The bad: Mockito does go deep into Java bytecode.
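
A minimal, hypothetical sketch (not from any of the attached patches) of the Java 8 support mentioned above: Mockito 2 accepts a lambda wherever an Answer is expected. The interface and values below are invented purely for illustration.

{code:java|title=Hypothetical Mockito 2 sketch — not from the attached patches}
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

import java.util.Optional;

public class Mockito2LambdaSketch {
  // Invented interface, purely for illustration.
  interface NameLookup {
    Optional<String> lookup(String key);
  }

  public static void main(String[] args) {
    NameLookup lookup = mock(NameLookup.class);
    // Mockito 2 accepts a Java 8 lambda as an Answer, and
    // InvocationOnMock#getArgument(int) gives access to the stubbed call's arguments.
    when(lookup.lookup("host")).thenAnswer(invocation ->
        Optional.of("value-for-" + invocation.getArgument(0)));
    System.out.println(lookup.lookup("host")); // Optional[value-for-host]
  }
}
{code}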



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14178) Move Mockito up to version 2.x

2018-02-08 Thread Akira Ajisaka (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HADOOP-14178:
---
Attachment: HADOOP-14178.006.patch

> Move Mockito up to version 2.x
> --
>
> Key: HADOOP-14178
> URL: https://issues.apache.org/jira/browse/HADOOP-14178
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: 2.9.0
>Reporter: Steve Loughran
>Assignee: Akira Ajisaka
>Priority: Major
> Attachments: HADOOP-14178.001.patch, HADOOP-14178.002.patch, 
> HADOOP-14178.003.patch, HADOOP-14178.004.patch, HADOOP-14178.005-wip.patch, 
> HADOOP-14178.005-wip2.patch, HADOOP-14178.005-wip3.patch, 
> HADOOP-14178.005-wip4.patch, HADOOP-14178.005-wip5.patch, 
> HADOOP-14178.005-wip6.patch, HADOOP-14178.005.patch, HADOOP-14178.006.patch
>
>
> I don't know when Hadoop picked up Mockito, but it has been frozen at 1.8.5 
> since the switch to Maven in 2011. 
> Mockito is now at version 2.1, [with lots of Java 8 
> support|https://github.com/mockito/mockito/wiki/What%27s-new-in-Mockito-2]. 
> That's not just defining actions as closures, but also support for Optional 
> types, mocking methods in interfaces, etc. 
> It's only used for testing, and, *provided there aren't regressions*, the cost 
> of the upgrade is low. The good news: test tools usually come with good test 
> coverage. The bad: Mockito does go deep into Java bytecode.






[jira] [Commented] (HADOOP-15218) Make Hadoop compatible with Guava 22.0+

2018-02-08 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357864#comment-16357864
 ] 

genericqa commented on HADOOP-15218:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
16s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 43s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
14s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 15s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
13s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 34m 12s{color} 
| {color:red} hadoop-yarn-services-core in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 77m 51s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HADOOP-15218 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12909865/HADOOP-15218-001.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 8cabe79779f0 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 
11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 1bc03dd |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14089/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-applications_hadoop-yarn-services_hadoop-yarn-services-core.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14089/testReport/ |
| Max. process+thread count | 604 (vs. ulimit of 5500) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core
 U: 

[jira] [Updated] (HADOOP-15218) Make Hadoop compatible with Guava 22.0+

2018-02-08 Thread Igor Dvorzhak (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Dvorzhak updated HADOOP-15218:
---
Attachment: HADOOP-15218-001.patch

> Make Hadoop compatible with Guava 22.0+
> ---
>
> Key: HADOOP-15218
> URL: https://issues.apache.org/jira/browse/HADOOP-15218
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Igor Dvorzhak
>Assignee: Igor Dvorzhak
>Priority: Major
> Attachments: HADOOP-15218-001.patch
>
>
> The deprecated HostAndPort#getHostText method was deleted in Guava 22.0, and the 
> new HostAndPort#getHost method is not available before Guava 20.0.
> This patch implements a getHost(HostAndPort) method that extracts the host from 
> the HostAndPort#toString value.
> This is a little hacky, which is why I'm not sure it is worth merging this 
> patch, but it would be nice if Hadoop were Guava-neutral.
> With this patch Hadoop can be built against the latest Guava, v24.0.
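
A minimal sketch of the kind of helper described above, assuming it parses the HostAndPort#toString() value; this is illustrative only and may differ from the attached patch.

{code:java|title=Illustrative sketch only — may differ from HADOOP-15218-001.patch}
import com.google.common.net.HostAndPort;

public final class HostAndPortCompat {
  private HostAndPortCompat() {
  }

  /**
   * Extract the host from HostAndPort#toString(), avoiding both the removed
   * getHostText() (gone in 22.0+) and the newer getHost() (absent before 20.0).
   */
  public static String getHost(HostAndPort hostAndPort) {
    String s = hostAndPort.toString();
    if (s.startsWith("[")) {
      // toString() brackets IPv6 literals, e.g. "[::1]:8020" or "[::1]".
      return s.substring(1, s.indexOf(']'));
    }
    int colon = s.indexOf(':');
    // Outside brackets, a colon can only be the host/port separator, e.g. "example.com:8020".
    return colon >= 0 ? s.substring(0, colon) : s;
  }

  public static void main(String[] args) {
    System.out.println(getHost(HostAndPort.fromString("example.com:8020"))); // example.com
    System.out.println(getHost(HostAndPort.fromString("[::1]:8020")));       // ::1
  }
}
{code}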






[jira] [Updated] (HADOOP-15218) Make Hadoop compatible with Guava 22.0+

2018-02-08 Thread Igor Dvorzhak (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Dvorzhak updated HADOOP-15218:
---
Attachment: (was: HADOOP-15218-001.patch)

> Make Hadoop compatible with Guava 22.0+
> ---
>
> Key: HADOOP-15218
> URL: https://issues.apache.org/jira/browse/HADOOP-15218
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Igor Dvorzhak
>Assignee: Igor Dvorzhak
>Priority: Major
> Attachments: HADOOP-15218-001.patch
>
>
> The deprecated HostAndPort#getHostText method was deleted in Guava 22.0, and the 
> new HostAndPort#getHost method is not available before Guava 20.0.
> This patch implements a getHost(HostAndPort) method that extracts the host from 
> the HostAndPort#toString value.
> This is a little hacky, which is why I'm not sure it is worth merging this 
> patch, but it would be nice if Hadoop were Guava-neutral.
> With this patch Hadoop can be built against the latest Guava, v24.0.






[jira] [Commented] (HADOOP-15069) support git-secrets commit hook to keep AWS secrets out of git

2018-02-08 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357767#comment-16357767
 ] 

genericqa commented on HADOOP-15069:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 10m 
54s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
44s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
14s{color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  2m 
20s{color} | {color:red} root in trunk failed. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
36m 25s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
21s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  6m 
45s{color} | {color:red} root in the patch failed. {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 2 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m  0s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 66m 15s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HADOOP-15069 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12899103/HADOOP-15069-002.patch
 |
| Optional Tests |  asflicense  mvnsite  |
| uname | Linux 4ef4db97dec9 3.13.0-133-generic #182-Ubuntu SMP Tue Sep 19 
15:49:21 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 1bc03dd |
| maven | version: Apache Maven 3.3.9 |
| mvnsite | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14088/artifact/out/branch-mvnsite-root.txt
 |
| mvnsite | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14088/artifact/out/patch-mvnsite-root.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14088/artifact/out/whitespace-eol.txt
 |
| Max. process+thread count | 341 (vs. ulimit of 5500) |
| modules | C: hadoop-tools/hadoop-aws . U: . |
| Console output | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14088/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> support git-secrets commit hook to keep AWS secrets out of git
> --
>
> Key: HADOOP-15069
> URL: https://issues.apache.org/jira/browse/HADOOP-15069
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: build
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Attachments: HADOOP-15069-001.patch, HADOOP-15069-002.patch
>
>
> The latest Uber breach looks like it involved AWS keys in git repos.
> Nobody wants that, which is why Amazon provides 
> [git-secrets|https://github.com/awslabs/git-secrets]; a script you can use to 
> scan a repo and its history, *and* add as an automated check.
> Anyone can set this up, but there are a few false positives in the scan, 
> mostly from longs and a few all-upper-case constants. These can all be added 
> to a .gitignore file.
> Also: mention git-secrets in the aws testing docs; say "use it"






[jira] [Commented] (HADOOP-15218) Make Hadoop compatible with Guava 22.0+

2018-02-08 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357753#comment-16357753
 ] 

genericqa commented on HADOOP-15218:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
21s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 57s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
14s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 12s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core:
 The patch generated 1 new + 11 unchanged - 0 fixed = 12 total (was 11) {color} 
|
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 25s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
13s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 34m 14s{color} 
| {color:red} hadoop-yarn-services-core in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 79m 16s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HADOOP-15218 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12909855/HADOOP-15218-001.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 3e3d717a671f 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 
11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 1bc03dd |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14087/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-applications_hadoop-yarn-services_hadoop-yarn-services-core.txt
 |
| unit | 

[jira] [Commented] (HADOOP-15218) Make Hadoop compatible with Guava 22.0+

2018-02-08 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357746#comment-16357746
 ] 

genericqa commented on HADOOP-15218:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
18s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 41s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
15s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 12s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core:
 The patch generated 1 new + 11 unchanged - 0 fixed = 12 total (was 11) {color} 
|
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 25s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
13s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 34m 15s{color} 
| {color:red} hadoop-yarn-services-core in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 79m  6s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HADOOP-15218 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12909851/HADOOP-15218-001.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux de5a061d329c 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 
11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 1bc03dd |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14086/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-applications_hadoop-yarn-services_hadoop-yarn-services-core.txt
 |
| unit | 

[jira] [Commented] (HADOOP-15216) S3AInputStream to handle reconnect on read() failure better

2018-02-08 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357725#comment-16357725
 ] 

Aaron Fabbri commented on HADOOP-15216:
---

I'm working on a patch that uses {{retry()}} for {{onReadFailure()}} in the s3a 
input stream, but only when s3guard is enabled.

What do you want to do for HEAD->200, GET -> 400 on the non-s3guard case?  
Currently we retry once immediately.  I was going to keep that behavior for now, 
unless you think otherwise.  We could add another retry policy config knob 
"input stream retry always" or something and default to off.

{quote}
+on s3guard, GET could be 403 -> fail
{quote}
I'm trying to parse this.  We have a couple of cases in open(), when we call 
getFileStatus():
- MetadataStore sees tombstone and throws FNFE.
- MetadataStore has no state for the path, returns null. We fall through to 
s3GetFileStatus(), which should throw FNFE which bypasses the retry policy, 
right?




> S3AInputStream to handle reconnect on read() failure better
> ---
>
> Key: HADOOP-15216
> URL: https://issues.apache.org/jira/browse/HADOOP-15216
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Priority: Major
>
> {{S3AInputStream}} handles any IOE through a close() of the stream and a single 
> re-invocation of the read, with 
> * no backoff
> * no abort of the HTTPS connection, which is just returned to the pool. If 
> httpclient hasn't noticed the failure, it may get returned to the caller on 
> the next read
> Proposed:
> * switch to invoker
> * retry policy explicitly for stream (EOF => throw, timeout => close, sleep, 
> retry, etc)
> We could think about extending the fault injection to inject stream read 
> failures intermittently too, though it would need something in S3AInputStream 
> to (optionally) wrap the http input streams with the failing stream. 
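
The retry policy quoted in the proposal above (EOF => throw; other read failures => close, sleep, reopen, retry) could look roughly like the following. This is an illustrative, self-contained sketch, not the s3a code; the class name, bounds, and backoff are invented for the example.

{code:java|title=Illustrative sketch only — not S3AInputStream}
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

abstract class RetryingObjectReader {
  private static final int MAX_RETRIES = 3;       // invented bound, for illustration
  private static final long BASE_SLEEP_MS = 500;  // invented backoff step

  private InputStream wrappedStream;
  private long pos;

  RetryingObjectReader(InputStream initial, long startPos) {
    this.wrappedStream = initial;
    this.pos = startPos;
  }

  /** Re-open the object at the given offset, e.g. with a fresh GET. */
  protected abstract InputStream reopen(long offset) throws IOException;

  int read() throws IOException {
    for (int attempt = 0; ; attempt++) {
      try {
        int b = wrappedStream.read();
        if (b >= 0) {
          pos++;
        }
        return b;
      } catch (EOFException e) {
        throw e;                              // genuine end of object: never retry
      } catch (IOException e) {
        if (attempt >= MAX_RETRIES) {
          throw e;                            // give up after a bounded number of attempts
        }
        wrappedStream.close();                // drop the suspect connection instead of recycling it
        sleep(BASE_SLEEP_MS * (attempt + 1)); // simple linear backoff before reconnecting
        wrappedStream = reopen(pos);
      }
    }
  }

  private static void sleep(long millis) throws IOException {
    try {
      Thread.sleep(millis);
    } catch (InterruptedException ie) {
      Thread.currentThread().interrupt();
      throw new IOException("interrupted while retrying read", ie);
    }
  }
}
{code}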






[jira] [Updated] (HADOOP-15069) support git-secrets commit hook to keep AWS secrets out of git

2018-02-08 Thread Lei (Eddy) Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei (Eddy) Xu updated HADOOP-15069:
---
Target Version/s: 2.8.3, 3.1.0, 2.9.1, 3.0.2  (was: 2.8.3, 3.1.0, 2.9.1, 
3.0.1)

> support git-secrets commit hook to keep AWS secrets out of git
> --
>
> Key: HADOOP-15069
> URL: https://issues.apache.org/jira/browse/HADOOP-15069
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: build
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Attachments: HADOOP-15069-001.patch, HADOOP-15069-002.patch
>
>
> The latest Uber breach looks like it involved AWS keys in git repos.
> Nobody wants that, which is why Amazon provides 
> [git-secrets|https://github.com/awslabs/git-secrets]; a script you can use to 
> scan a repo and its history, *and* add as an automated check.
> Anyone can set this up, but there are a few false positives in the scan, 
> mostly from longs and a few all-upper-case constants. These can all be added 
> to a .gitignore file.
> Also: mention git-secrets in the aws testing docs; say "use it"






[jira] [Commented] (HADOOP-15140) S3guard mistakes root URI without / as non-absolute path

2018-02-08 Thread Abraham Fine (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357706#comment-16357706
 ] 

Abraham Fine commented on HADOOP-15140:
---

[~ste...@apache.org] What would a test that works consistently look like (as 
the tests would also run on the local file system)? 

> S3guard mistakes root URI without / as non-absolute path
> 
>
> Key: HADOOP-15140
> URL: https://issues.apache.org/jira/browse/HADOOP-15140
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Assignee: Abraham Fine
>Priority: Major
>
> If you call {{getFileStatus("s3a://bucket")}} then S3Guard will throw an 
> exception in putMetadata, as it mistakes the empty path for "non-absolute 
> path"
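
A minimal sketch of the probe such a test would make, assuming the usual FileSystem contract-test setup; whether it behaves identically against the local filesystem is exactly the open question above.

{code:java|title=Illustrative sketch only}
import java.net.URI;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class BareRootProbe {
  private BareRootProbe() {
  }

  /** Ask for the status of the filesystem root expressed without a trailing slash. */
  public static FileStatus statusOfBareRoot(FileSystem fs) throws Exception {
    URI root = fs.getUri();                   // e.g. s3a://bucket — no explicit path component
    return fs.getFileStatus(new Path(root));  // HADOOP-15140: must not be rejected as non-absolute
  }
}
{code}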






[jira] [Commented] (HADOOP-15112) create-release didn't sign artifacts

2018-02-08 Thread Lei (Eddy) Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357694#comment-16357694
 ] 

Lei (Eddy) Xu commented on HADOOP-15112:


Ran this on an Ubuntu 16.04 machine with {{gnupg-agent 2.1.11-6ubuntu2}}.

{{GPG_AGENT_INFO}} is not set after running the following code:

{code:sh|title=dev-support/bin/create-release}
eval $("${GPGAGENT}" --daemon \
    --options "${LOGDIR}/gpgagent.conf" \
    --log-file="${LOGDIR}/create-release-gpgagent.log")
{code}

because {{gnupg-agent}} 2.1 and later does not set this variable: 
https://www.gnupg.org/faq/whats-new-in-2.1.html#autostart.

{{create-release}} checks for the existence of {{GPG_AGENT_INFO}} before 
signing artifacts, so it silently skips the signing process: 

{code:sh|title=dev-support/bin/create-release}
if [[ -n "${GPG_AGENT_INFO}" ]]; then
  echo "Warming the gpg-agent cache prior to calling maven"
  # warm the agent's cache:
  touch "${LOGDIR}/warm"
  ${GPG} --use-agent --armor --output "${LOGDIR}/warm.asc" --detach-sig "${LOGDIR}/warm"
  rm "${LOGDIR}/warm.asc" "${LOGDIR}/warm"
else
  SIGN=false
  hadoop_error "ERROR: Unable to launch or acquire gpg-agent. Disable signing."
fi
{code}

[~mackrorysd] [~andrew.wang] [~aw], I would like to hear your input here. Should 
we check the gpg-agent version before this step, or just change how we invoke 
{{gpg}} 2.1 and later? gpg 2.1 was released in Nov 2014. 



> create-release didn't sign artifacts
> 
>
> Key: HADOOP-15112
> URL: https://issues.apache.org/jira/browse/HADOOP-15112
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Andrew Wang
>Assignee: Lei (Eddy) Xu
>Priority: Major
>
> While building the 3.0.0 RC1, I had to re-invoke Maven because the 
> create-release script didn't deploy signatures to Nexus. Looking at the repo 
> (and my artifacts), it seems like "sign" didn't run properly.
> I lost my create-release output, but I noticed that it will log and continue 
> rather than abort in some error conditions. This might have caused my lack of 
> signatures. IMO it'd be better to explicitly fail in these situations.






[jira] [Updated] (HADOOP-15218) Make Hadoop compatible with Guava 22.0+

2018-02-08 Thread Igor Dvorzhak (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Dvorzhak updated HADOOP-15218:
---
Attachment: (was: HADOOP-15218-001.patch)

> Make Hadoop compatible with Guava 22.0+
> ---
>
> Key: HADOOP-15218
> URL: https://issues.apache.org/jira/browse/HADOOP-15218
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Igor Dvorzhak
>Assignee: Igor Dvorzhak
>Priority: Major
> Attachments: HADOOP-15218-001.patch
>
>
> The deprecated HostAndPort#getHostText method was deleted in Guava 22.0, and the 
> new HostAndPort#getHost method is not available before Guava 20.0.
> This patch implements a getHost(HostAndPort) method that extracts the host from 
> the HostAndPort#toString value.
> This is a little hacky, which is why I'm not sure it is worth merging this 
> patch, but it would be nice if Hadoop were Guava-neutral.
> With this patch Hadoop can be built against the latest Guava, v24.0.






[jira] [Updated] (HADOOP-15218) Make Hadoop compatible with Guava 22.0+

2018-02-08 Thread Igor Dvorzhak (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Dvorzhak updated HADOOP-15218:
---
Attachment: HADOOP-15218-001.patch

> Make Hadoop compatible with Guava 22.0+
> ---
>
> Key: HADOOP-15218
> URL: https://issues.apache.org/jira/browse/HADOOP-15218
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Igor Dvorzhak
>Assignee: Igor Dvorzhak
>Priority: Major
> Attachments: HADOOP-15218-001.patch
>
>
> The deprecated HostAndPort#getHostText method was deleted in Guava 22.0, and the 
> new HostAndPort#getHost method is not available before Guava 20.0.
> This patch implements a getHost(HostAndPort) method that extracts the host from 
> the HostAndPort#toString value.
> This is a little hacky, which is why I'm not sure it is worth merging this 
> patch, but it would be nice if Hadoop were Guava-neutral.
> With this patch Hadoop can be built against the latest Guava, v24.0.






[jira] [Updated] (HADOOP-15218) Make Hadoop compatible with Guava 22.0+

2018-02-08 Thread Igor Dvorzhak (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Dvorzhak updated HADOOP-15218:
---
Attachment: (was: HADOOP-15218-001.patch)

> Make Hadoop compatible with Guava 22.0+
> ---
>
> Key: HADOOP-15218
> URL: https://issues.apache.org/jira/browse/HADOOP-15218
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Igor Dvorzhak
>Assignee: Igor Dvorzhak
>Priority: Major
> Attachments: HADOOP-15218-001.patch
>
>
> The deprecated HostAndPort#getHostText method was deleted in Guava 22.0, and the 
> new HostAndPort#getHost method is not available before Guava 20.0.
> This patch implements a getHost(HostAndPort) method that extracts the host from 
> the HostAndPort#toString value.
> This is a little hacky, which is why I'm not sure it is worth merging this 
> patch, but it would be nice if Hadoop were Guava-neutral.
> With this patch Hadoop can be built against the latest Guava, v24.0.






[jira] [Updated] (HADOOP-15218) Make Hadoop compatible with Guava 22.0+

2018-02-08 Thread Igor Dvorzhak (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Dvorzhak updated HADOOP-15218:
---
Attachment: HADOOP-15218-001.patch

> Make Hadoop compatible with Guava 22.0+
> ---
>
> Key: HADOOP-15218
> URL: https://issues.apache.org/jira/browse/HADOOP-15218
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Igor Dvorzhak
>Assignee: Igor Dvorzhak
>Priority: Major
> Attachments: HADOOP-15218-001.patch
>
>
> The deprecated HostAndPort#getHostText method was deleted in Guava 22.0, and the 
> new HostAndPort#getHost method is not available before Guava 20.0.
> This patch implements a getHost(HostAndPort) method that extracts the host from 
> the HostAndPort#toString value.
> This is a little hacky, which is why I'm not sure it is worth merging this 
> patch, but it would be nice if Hadoop were Guava-neutral.
> With this patch Hadoop can be built against the latest Guava, v24.0.






[jira] [Updated] (HADOOP-15218) Make Hadoop compatible with Guava 22.0+

2018-02-08 Thread Igor Dvorzhak (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Dvorzhak updated HADOOP-15218:
---
Attachment: HADOOP-15218-001.patch

> Make Hadoop compatible with Guava 22.0+
> ---
>
> Key: HADOOP-15218
> URL: https://issues.apache.org/jira/browse/HADOOP-15218
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Igor Dvorzhak
>Assignee: Igor Dvorzhak
>Priority: Major
> Attachments: HADOOP-15218-001.patch
>
>
> The deprecated HostAndPort#getHostText method was deleted in Guava 22.0, and the 
> new HostAndPort#getHost method is not available before Guava 20.0.
> This patch implements a getHost(HostAndPort) method that extracts the host from 
> the HostAndPort#toString value.
> This is a little hacky, which is why I'm not sure it is worth merging this 
> patch, but it would be nice if Hadoop were Guava-neutral.
> With this patch Hadoop can be built against the latest Guava, v24.0.






[jira] [Updated] (HADOOP-15218) Make Hadoop compatible with Guava 22.0+

2018-02-08 Thread Igor Dvorzhak (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Dvorzhak updated HADOOP-15218:
---
Status: Patch Available  (was: Open)

> Make Hadoop compatible with Guava 22.0+
> ---
>
> Key: HADOOP-15218
> URL: https://issues.apache.org/jira/browse/HADOOP-15218
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Igor Dvorzhak
>Assignee: Igor Dvorzhak
>Priority: Major
> Attachments: HADOOP-15218-001.patch
>
>
> The deprecated HostAndPort#getHostText method was deleted in Guava 22.0, and the 
> new HostAndPort#getHost method is not available before Guava 20.0.
> This patch implements a getHost(HostAndPort) method that extracts the host from 
> the HostAndPort#toString value.
> This is a little hacky, which is why I'm not sure it is worth merging this 
> patch, but it would be nice if Hadoop were Guava-neutral.
> With this patch Hadoop can be built against the latest Guava, v24.0.






[jira] [Created] (HADOOP-15218) Make Hadoop compatible with Guava 22.0+

2018-02-08 Thread Igor Dvorzhak (JIRA)
Igor Dvorzhak created HADOOP-15218:
--

 Summary: Make Hadoop compatible with Guava 22.0+
 Key: HADOOP-15218
 URL: https://issues.apache.org/jira/browse/HADOOP-15218
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Igor Dvorzhak
Assignee: Igor Dvorzhak


The deprecated HostAndPort#getHostText method was deleted in Guava 22.0, and the 
new HostAndPort#getHost method is not available before Guava 20.0.

This patch implements a getHost(HostAndPort) method that extracts the host from 
the HostAndPort#toString value.

This is a little hacky, which is why I'm not sure it is worth merging this 
patch, but it would be nice if Hadoop were Guava-neutral.

With this patch Hadoop can be built against the latest Guava, v24.0.






[jira] [Commented] (HADOOP-15214) Make Hadoop compatible with Guava 21.0

2018-02-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357678#comment-16357678
 ] 

ASF GitHub Bot commented on HADOOP-15214:
-

Github user medb commented on the issue:

https://github.com/apache/hadoop/pull/318
  
Was merged in https://issues.apache.org/jira/browse/HADOOP-15214


> Make Hadoop compatible with Guava 21.0
> --
>
> Key: HADOOP-15214
> URL: https://issues.apache.org/jira/browse/HADOOP-15214
> Project: Hadoop Common
>  Issue Type: Sub-task
>Affects Versions: 3.0.0
>Reporter: Igor Dvorzhak
>Assignee: Igor Dvorzhak
>Priority: Minor
> Fix For: 3.1.0
>
> Attachments: HADOOP-15214.001.patch
>
>
> There are only 3 changes that need to be made to make Hadoop compile with the 
> Guava 21.0 dependency.






[jira] [Updated] (HADOOP-15112) create-release didn't sign artifacts

2018-02-08 Thread Lei (Eddy) Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei (Eddy) Xu updated HADOOP-15112:
---
Target Version/s: 3.1.0, 3.0.2  (was: 3.1.0, 3.0.1)

> create-release didn't sign artifacts
> 
>
> Key: HADOOP-15112
> URL: https://issues.apache.org/jira/browse/HADOOP-15112
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Andrew Wang
>Assignee: Lei (Eddy) Xu
>Priority: Major
>
> While building the 3.0.0 RC1, I had to re-invoke Maven because the 
> create-release script didn't deploy signatures to Nexus. Looking at the repo 
> (and my artifacts), it seems like "sign" didn't run properly.
> I lost my create-release output, but I noticed that it will log and continue 
> rather than abort in some error conditions. This might have caused my lack of 
> signatures. IMO it'd be better to explicitly fail in these situations.






[jira] [Updated] (HADOOP-14969) Improve diagnostics in secure DataNode startup

2018-02-08 Thread Ajay Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajay Kumar updated HADOOP-14969:

Status: Open  (was: Patch Available)

> Improve diagnostics in secure DataNode startup
> --
>
> Key: HADOOP-14969
> URL: https://issues.apache.org/jira/browse/HADOOP-14969
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Ajay Kumar
>Assignee: Ajay Kumar
>Priority: Major
> Attachments: HADOOP-14969.001.patch, HADOOP-14969.002.patch, 
> HADOOP-14969.003.patch, HADOOP-14969.004.patch, HADOOP-14969.005.patch, 
> HADOOP-14969.006.patch
>
>
> When DN secure mode configuration is incorrect, it throws the following 
> exception from Datanode#checkSecureConfig
> {code}
>   private static void checkSecureConfig(DNConf dnConf, Configuration conf,
>   SecureResources resources) throws RuntimeException {
> if (!UserGroupInformation.isSecurityEnabled()) {
>   return;
> }
> ...
> throw new RuntimeException("Cannot start secure DataNode without " +
>   "configuring either privileged resources or SASL RPC data transfer " +
>   "protection and SSL for HTTP.  Using privileged resources in " +
>   "combination with SASL RPC data transfer protection is not supported.");
> {code}
> The DN should print more useful diagnostics as to what exactly went wrong.
> Also, when starting a secure DN with resources, the startup scripts should 
> launch the SecureDataNodeStarter class. If no SASL is configured and 
> SecureDataNodeStarter is not used, then we could mention that too.
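
A hedged sketch of the kind of diagnostics being asked for, stating which acceptable secure configuration is missing instead of the single generic message quoted above. The helper below is hypothetical (not from the attached patches); the configuration keys named in the messages (dfs.data.transfer.protection, dfs.http.policy) are the standard HDFS ones.

{code:java|title=Hypothetical helper — not from the attached patches}
public final class SecureDnDiagnostics {
  private SecureDnDiagnostics() {
  }

  /**
   * Build an error message that states which secure-DataNode requirement failed,
   * rather than a single generic failure string.
   */
  public static String explainFailure(boolean privilegedResources,
      boolean saslEnabled, boolean httpsOnly) {
    StringBuilder sb = new StringBuilder("Cannot start secure DataNode: ");
    if (!privilegedResources && !saslEnabled) {
      sb.append("neither privileged resources (launch via SecureDataNodeStarter) ")
        .append("nor SASL data transfer protection (dfs.data.transfer.protection) ")
        .append("is configured. ");
    }
    if (saslEnabled && !httpsOnly) {
      sb.append("SASL data transfer protection is configured but dfs.http.policy ")
        .append("is not HTTPS_ONLY. ");
    }
    if (privilegedResources && saslEnabled) {
      sb.append("privileged resources must not be combined with SASL data ")
        .append("transfer protection. ");
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // Example: nothing configured at all.
    System.out.println(explainFailure(false, false, false));
  }
}
{code}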






[jira] [Updated] (HADOOP-15217) org.apache.hadoop.fs.FsUrlConnection does not handle paths with spaces

2018-02-08 Thread Joseph Fourny (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Fourny updated HADOOP-15217:
---
Description: 
When _FsUrlStreamHandlerFactory_ is registered with _java.net.URL_ (e.g. when 
Spark is initialized), it breaks URLs with spaces (even though they are 
properly URI-encoded). I traced the problem down to the _FSUrlConnection.connect()_ 
method. It naively gets the path from the URL, which contains encoded spaces, 
and passes it to the _org.apache.hadoop.fs.Path(String)_ constructor. This is not 
correct, because the docs clearly say that the string must NOT be encoded. 
Doing so causes double encoding within the Path class (i.e. %20 becomes %2520). 

See attached JUnit test. 

This test case mimics an issue I ran into when trying to use Commons 
Configuration 1.9 AFTER initializing Spark. Commons Configuration uses the URL 
class to load configuration files, but Spark installs 
_FsUrlStreamHandlerFactory_, which hits this issue. For now, we are using an 
AspectJ aspect to "patch" the bytecode at load time to work around the issue. 

The real fix is quite simple. All you need to do is replace this line in 
_org.apache.hadoop.fs.FsUrlConnection.connect()_:
        is = fs.open(new Path(url.getPath()));

with this line:

     is = fs.open(new Path(url.*toUri()*.getPath()));

URI.getPath() will correctly decode the path, which is what is expected by 
_org.apache.hadoop.fs.Path(String)_ constructor.

 

  was:
When _FsUrlStreamHandlerFactory_ is registered with _java.net.URL_ (e.g. when 
Spark is initialized), it breaks URLs with spaces (even though they are 
properly URI-encoded). I traced the problem down to the _FSUrlConnection.connect()_ 
method. It naively gets the path from the URL, which contains encoded spaces, 
and passes it to the _org.apache.hadoop.fs.Path(String)_ constructor. This is not 
correct, because the docs clearly say that the string must NOT be encoded. 
Doing so causes double encoding within the Path class (i.e. %20 becomes %2520). 

See attached JUnit test. 

This test case mimics an issue I ran into when trying to use Commons 
Configuration 1.9 AFTER initializing Spark. Commons Configuration uses the URL 
class to load configuration files, but Spark installs 
_FsUrlStreamHandlerFactory_, which hits this issue. For now, we are using an 
AspectJ aspect to "patch" the bytecode at load time to work around the issue. 

The real fix is quite simple. All you need to do is replace this line:
        is = fs.open(new Path(url.getPath()));

with this line:

     is = fs.open(new Path(url.*toUri()*.getPath()));

URI.getPath() will correctly decode the path, which is what is expected by 
_org.apache.hadoop.fs.Path(String)_ constructor.

 


> org.apache.hadoop.fs.FsUrlConnection does not handle paths with spaces
> --
>
> Key: HADOOP-15217
> URL: https://issues.apache.org/jira/browse/HADOOP-15217
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs
>Affects Versions: 2.6.5
>Reporter: Joseph Fourny
>Priority: Major
> Attachments: TestCase.java
>
>
> When _FsUrlStreamHandlerFactory_ is registered with _java.net.URL_ (e.g. when 
> Spark is initialized), it breaks URLs with spaces (even though they are 
> properly URI-encoded). I traced the problem down to the 
> _FSUrlConnection.connect()_ method. It naively gets the path from the URL, 
> which contains encoded spaces, and passes it to the 
> _org.apache.hadoop.fs.Path(String)_ constructor. This is not correct, because 
> the docs clearly say that the string must NOT be encoded. Doing so causes 
> double encoding within the Path class (i.e. %20 becomes %2520). 
> See attached JUnit test. 
> This test case mimics an issue I ran into when trying to use Commons 
> Configuration 1.9 AFTER initializing Spark. Commons Configuration uses the URL 
> class to load configuration files, but Spark installs 
> _FsUrlStreamHandlerFactory_, which hits this issue. For now, we are using an 
> AspectJ aspect to "patch" the bytecode at load time to work around the issue. 
> The real fix is quite simple. All you need to do is replace this line in 
> _org.apache.hadoop.fs.FsUrlConnection.connect()_:
>         is = fs.open(new Path(url.getPath()));
> with this line:
>      is = fs.open(new Path(url.*toUri()*.getPath()));
> URI.getPath() will correctly decode the path, which is what is expected by 
> _org.apache.hadoop.fs.Path(String)_ constructor.
>  
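
A small sketch of the double-encoding effect and of the proposed fix. It is illustrative only (it assumes Hadoop's Path is on the classpath) and is not the attached TestCase.java.

{code:java|title=Illustrative sketch only — not the attached TestCase.java}
import java.net.URL;

import org.apache.hadoop.fs.Path;

public class SpaceInUrlPathSketch {
  public static void main(String[] args) throws Exception {
    URL url = new URL("file:///tmp/dir%20with%20spaces/data.txt");

    // URL#getPath() keeps the percent-encoding, so Path's internal URI
    // escapes the '%' again: %20 becomes %2520.
    System.out.println(new Path(url.getPath()).toUri());

    // URL#toURI().getPath() decodes first, which is what Path(String) expects,
    // so the path ends up with a single, correct encoding.
    System.out.println(new Path(url.toURI().getPath()).toUri());
  }
}
{code}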






[jira] [Updated] (HADOOP-15217) org.apache.hadoop.fs.FsUrlConnection does not handle paths with spaces

2018-02-08 Thread Joseph Fourny (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Fourny updated HADOOP-15217:
---
Description: 
When _FsUrlStreamHandlerFactory_ is registered with _java.net.URL_ (e.g. when 
Spark is initialized), it breaks URLs with spaces (even though they are 
properly URI-encoded). I traced the problem down to the _FSUrlConnection.connect()_ 
method. It naively gets the path from the URL, which contains encoded spaces, 
and passes it to the _org.apache.hadoop.fs.Path(String)_ constructor. This is not 
correct, because the docs clearly say that the string must NOT be encoded. 
Doing so causes double encoding within the Path class (i.e. %20 becomes %2520). 

See attached JUnit test. 

This test case mimics an issue I ran into when trying to use Commons 
Configuration 1.9 AFTER initializing Spark. Commons Configuration uses the URL 
class to load configuration files, but Spark installs 
_FsUrlStreamHandlerFactory_, which hits this issue. For now, we are using an 
AspectJ aspect to "patch" the bytecode at load time to work around the issue. 

The real fix is quite simple. All you need to do is replace this line:
        is = fs.open(new Path(url.getPath()));

with this line:

     is = fs.open(new Path(url.*toUri()*.getPath()));

URI.getPath() will correctly decode the path, which is what is expected by 
_org.apache.hadoop.fs.Path(String)_ constructor.

 

  was:
When _FsUrlStreamHandlerFactory_ is registered with _java.net.URL_ (e.g. when 
Spark is initialized), it breaks URLs with spaces (even though they are 
properly URI-encoded). I traced the problem down to the _FSUrlConnection.connect()_ 
method. It naively gets the path from the URL, which contains encoded spaces, 
and passes it to the _org.apache.hadoop.fs.Path(String)_ constructor. This is not 
correct, because the docs clearly say that the string must NOT be encoded. 
Doing so causes double encoding within the Path class (i.e. %20 becomes %2520). 

See attached JUnit test. 

This test case mimics an issue I ran into when trying to use Commons 
Configuration 1.9 AFTER initializing Spark. Commons Configuration uses the URL 
class to load configuration files, but Spark installs 
_FsUrlStreamHandlerFactory_, which hits this issue. For now, we are using an 
AspectJ aspect to "patch" the bytecode at load time to work around the issue. 

The real fix is quite simple. All you need to do is replace this line:
       is = fs.open(new Path(url.getPath()));

with this line:

     is = fs.open(new Path(url.*toUri().*getPath()));

URI.getPath() will correctly decode the path, which is what is expected by 
_org.apache.hadoop.fs.Path(String)_ constructor.

 


> org.apache.hadoop.fs.FsUrlConnection does not handle paths with spaces
> --
>
> Key: HADOOP-15217
> URL: https://issues.apache.org/jira/browse/HADOOP-15217
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs
>Affects Versions: 2.6.5
>Reporter: Joseph Fourny
>Priority: Major
> Attachments: TestCase.java
>
>
> When _FsUrlStreamHandlerFactory_ is registered with _java.net.URL_ (e.g. when 
> Spark is initialized), it breaks URLs with spaces (even though they are 
> properly URI-encoded). I traced the problem down to the 
> _FSUrlConnection.connect()_ method. It naively gets the path from the URL, 
> which contains encoded spaces, and passes it to the 
> _org.apache.hadoop.fs.Path(String)_ constructor. This is not correct, because 
> the docs clearly say that the string must NOT be encoded. Doing so causes 
> double encoding within the Path class (i.e. %20 becomes %2520). 
> See attached JUnit test. 
> This test case mimics an issue I ran into when trying to use Commons 
> Configuration 1.9 AFTER initializing Spark. Commons Configuration uses the URL 
> class to load configuration files, but Spark installs 
> _FsUrlStreamHandlerFactory_, which hits this issue. For now, we are using an 
> AspectJ aspect to "patch" the bytecode at load time to work around the issue. 
> The real fix is quite simple. All you need to do is replace this line:
>         is = fs.open(new Path(url.getPath()));
> with this line:
>      is = fs.open(new Path(url.*toUri()*.getPath()));
> URI.getPath() will correctly decode the path, which is what is expected by 
> _org.apache.hadoop.fs.Path(String)_ constructor.
>  






[jira] [Updated] (HADOOP-15217) org.apache.hadoop.fs.FsUrlConnection does not handle paths with spaces

2018-02-08 Thread Joseph Fourny (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Fourny updated HADOOP-15217:
---
Description: 
When _FsUrlStreamHandlerFactory_ is registered with _java.net.URL_ (e.g. when 
Spark is initialized), it breaks URLs with spaces (even though they are 
properly URI-encoded). I traced the problem down to the _FSUrlConnection.connect()_ 
method. It naively gets the path from the URL, which contains encoded spaces, 
and passes it to the _org.apache.hadoop.fs.Path(String)_ constructor. This is not 
correct, because the docs clearly say that the string must NOT be encoded. 
Doing so causes double encoding within the Path class (i.e. %20 becomes %2520). 

See attached JUnit test. 

This test case mimics an issue I ran into when trying to use Commons 
Configuration 1.9 AFTER initializing Spark. Commons Configuration uses the URL 
class to load configuration files, but Spark installs 
_FsUrlStreamHandlerFactory_, which hits this issue. For now, we are using an 
AspectJ aspect to "patch" the bytecode at load time to work around the issue. 

The real fix is quite simple. All you need to do is replace this line:
       is = fs.open(new Path(url.getPath()));

with this line:

     is = fs.open(new Path(url.*toUri().*getPath()));

URI.getPath() will correctly decode the path, which is what is expected by 
_org.apache.hadoop.fs.Path(String)_ constructor.

 

  was:
When _FsUrlStreamHandlerFactory_ is registered with _java.net.URL_ (ex: when 
Spark is initialized), it breaks URLs with spaces (even though they are 
properly URI-encoded). I traced the problem down to _FSUrlConnection.connect()_ 
method. It naively gets the path from the URL, which contains encoded spaces, 
and passes it to _org.apache.hadoop.fs.Path(String)_ constructor. This is not 
correct, because the docs clearly say that the string must NOT be encoded. 
Doing so causes double encoding within the Path class (ie: %20 becomes %2520). 

See attached JUnit test. 

This test case mimics an issue I ran into when trying to use Commons 
Configuration 1.9 AFTER initializing Spark. Commons Configuration uses URL 
class to load configuration files, but Spark installs 
_FsUrlStreamHandlerFactory_, which hits this issue. For now, we are using an 
AspectJ aspect to "patch" the bytecode at load time to work-around the issue. 


> org.apache.hadoop.fs.FsUrlConnection does not handle paths with spaces
> --
>
> Key: HADOOP-15217
> URL: https://issues.apache.org/jira/browse/HADOOP-15217
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs
>Affects Versions: 2.6.5
>Reporter: Joseph Fourny
>Priority: Major
> Attachments: TestCase.java
>
>
> When _FsUrlStreamHandlerFactory_ is registered with _java.net.URL_ (ex: when 
> Spark is initialized), it breaks URLs with spaces (even though they are 
> properly URI-encoded). I traced the problem down to 
> _FSUrlConnection.connect()_ method. It naively gets the path from the URL, 
> which contains encoded spaces, and passes it to 
> _org.apache.hadoop.fs.Path(String)_ constructor. This is not correct, because 
> the docs clearly say that the string must NOT be encoded. Doing so causes 
> double encoding within the Path class (ie: %20 becomes %2520). 
> See attached JUnit test. 
> This test case mimics an issue I ran into when trying to use Commons 
> Configuration 1.9 AFTER initializing Spark. Commons Configuration uses URL 
> class to load configuration files, but Spark installs 
> _FsUrlStreamHandlerFactory_, which hits this issue. For now, we are using an 
> AspectJ aspect to "patch" the bytecode at load time to work-around the issue. 
> The real fix is quite simple. All you need to do is replace this line:
>        is = fs.open(new Path(url.getPath()));
> with this line:
>      is = fs.open(new Path(url.*toUri().*getPath()));
> URI.getPath() will correctly decode the path, which is what is expected by 
> _org.apache.hadoop.fs.Path(String)_ constructor.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15217) org.apache.hadoop.fs.FsUrlConnection does not handle paths with spaces

2018-02-08 Thread Joseph Fourny (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Fourny updated HADOOP-15217:
---
Description: 
When _FsUrlStreamHandlerFactory_ is registered with _java.net.URL_ (ex: when 
Spark is initialized), it breaks URLs with spaces (even though they are 
properly URI-encoded). I traced the problem down to _FSUrlConnection.connect()_ 
method. It naively gets the path from the URL, which contains encoded spaces, 
and passes it to _org.apache.hadoop.fs.Path(String)_ constructor. This is not 
correct, because the docs clearly say that the string must NOT be encoded. 
Doing so causes double encoding within the Path class (ie: %20 becomes %2520). 

See attached JUnit test. 

This test case mimics an issue I ran into when trying to use Commons 
Configuration 1.9 AFTER initializing Spark. Commons Configuration uses URL 
class to load configuration files, but Spark installs 
_FsUrlStreamHandlerFactory_, which hits this issue. For now, we are using an 
AspectJ aspect to "patch" the bytecode at load time to work-around the issue. 

  was:
When _FsUrlStreamHandlerFactory_ is registered with _java.net.URL_ (ex: when 
Spark is initialized), it breaks URLs with spaces (even though they are 
properly URI-encoded). I traced the problem down to _FSUrlConnection.connect()_ 
method. It naively gets the path from the URL, which contains encoded spaces, 
and passes it to _org.apache.hadoop.fs.Path(String)_ constructor. This is not 
correct, because the docs clearly say that the string must NOT be encoded. 
Doing so causes double encoding within the Path class (ie: %20 becomes %2520). 

See attached JUnit test. 

This test case mimics an issue I ran into when trying to use Commons 
Configuration 1.9 AFTER initializing Spark. Commons Configuration uses URL 
class to load configuration files, but Spark installs 
_FsUrlStreamHandlerFactory_, which hits this issue. For now, we are using an 
AspectJ aspect to "path" the bytecode at load time to work-around the issue. 


> org.apache.hadoop.fs.FsUrlConnection does not handle paths with spaces
> --
>
> Key: HADOOP-15217
> URL: https://issues.apache.org/jira/browse/HADOOP-15217
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs
>Affects Versions: 2.6.5
>Reporter: Joseph Fourny
>Priority: Major
> Attachments: TestCase.java
>
>
> When _FsUrlStreamHandlerFactory_ is registered with _java.net.URL_ (ex: when 
> Spark is initialized), it breaks URLs with spaces (even though they are 
> properly URI-encoded). I traced the problem down to 
> _FSUrlConnection.connect()_ method. It naively gets the path from the URL, 
> which contains encoded spaces, and passes it to 
> _org.apache.hadoop.fs.Path(String)_ constructor. This is not correct, because 
> the docs clearly say that the string must NOT be encoded. Doing so causes 
> double encoding within the Path class (ie: %20 becomes %2520). 
> See attached JUnit test. 
> This test case mimics an issue I ran into when trying to use Commons 
> Configuration 1.9 AFTER initializing Spark. Commons Configuration uses URL 
> class to load configuration files, but Spark installs 
> _FsUrlStreamHandlerFactory_, which hits this issue. For now, we are using an 
> AspectJ aspect to "patch" the bytecode at load time to work-around the issue. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15217) org.apache.hadoop.fs.FsUrlConnection does not handle paths with spaces

2018-02-08 Thread Joseph Fourny (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Fourny updated HADOOP-15217:
---
Description: 
When _FsUrlStreamHandlerFactory_ is registered with _java.net.URL_ (ex: when 
Spark is initialized), it breaks URLs with spaces (even though they are 
properly URI-encoded). I traced the problem down to _FSUrlConnection.connect()_ 
method. It naively gets the path from the URL, which contains encoded spaces, 
and passes it to _org.apache.hadoop.fs.Path(String)_ constructor. This is not 
correct, because the docs clearly say that the string must NOT be encoded. 
Doing so causes double encoding within the Path class (ie: %20 becomes %2520). 

See attached JUnit test. 

This test case mimics an issue I ran into when trying to use Commons 
Configuration 1.9 AFTER initializing Spark. Commons Configuration uses URL 
class to load configuration files, but Spark installs 
_FsUrlStreamHandlerFactory_, which hits this issue. For now, we are using an 
AspectJ aspect to "path" the bytecode at load time to work-around the issue. 

  was:
When _FsUrlStreamHandlerFactory_ is registered with _java.net.URL_ (ex: when 
Spark is initialized), it breaks URLs with spaces (even though they are 
properly URI-encoded). I traced the problem down to _FSUrlConnection.connect()_ 
method. It naively gets the path from the URL, which contains encoded spaces, 
and passes it to _org.apache.hadoop.fs.Path(String)_ constructor. This is not 
correct, because the docs clearly say that the string must NOT be encoded. 
Doing so causes double encoding within the Path class (ie: %20 becomes %2520). 

 See attached JUnit test. 


> org.apache.hadoop.fs.FsUrlConnection does not handle paths with spaces
> --
>
> Key: HADOOP-15217
> URL: https://issues.apache.org/jira/browse/HADOOP-15217
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs
>Affects Versions: 2.6.5
>Reporter: Joseph Fourny
>Priority: Major
> Attachments: TestCase.java
>
>
> When _FsUrlStreamHandlerFactory_ is registered with _java.net.URL_ (ex: when 
> Spark is initialized), it breaks URLs with spaces (even though they are 
> properly URI-encoded). I traced the problem down to 
> _FSUrlConnection.connect()_ method. It naively gets the path from the URL, 
> which contains encoded spaces, and passes it to 
> _org.apache.hadoop.fs.Path(String)_ constructor. This is not correct, because 
> the docs clearly say that the string must NOT be encoded. Doing so causes 
> double encoding within the Path class (ie: %20 becomes %2520). 
> See attached JUnit test. 
> This test case mimics an issue I ran into when trying to use Commons 
> Configuration 1.9 AFTER initializing Spark. Commons Configuration uses URL 
> class to load configuration files, but Spark installs 
> _FsUrlStreamHandlerFactory_, which hits this issue. For now, we are using an 
> AspectJ aspect to "path" the bytecode at load time to work-around the issue. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15217) org.apache.hadoop.fs.FsUrlConnection does not handle paths with spaces

2018-02-08 Thread Joseph Fourny (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Fourny updated HADOOP-15217:
---
Description: 
When _FsUrlStreamHandlerFactory_ is registered with _java.net.URL_ (ex: when 
Spark is initialized), it breaks URLs with spaces (even though they are 
properly URI-encoded). I traced the problem down to _FSUrlConnection.connect()_ 
method. It naively gets the path from the URL, which contains encoded spaces, 
and passes it to _org.apache.hadoop.fs.Path(String)_ constructor. This is not 
correct, because the docs clearly say that the string must NOT be encoded. 
Doing so causes double encoding within the Path class (ie: %20 becomes %2520). 

 See attached JUnit test. 

  was:
When _FsUrlStreamHandlerFactory_ is registered with java.net.URL (ex: when 
Spark is initialized), it breaks URLs with spaces (even though they are 
properly URI-encoded). I traced the problem down to _FSUrlConnection.connect()_ 
method. It naively gets the path from the URL, which contains encoded spaces, 
and passes it to _org.apache.hadoop.fs.Path(String)_ constructor. This is not 
correct, because the docs clearly say that the string must NOT be encoded. 
Doing so causes double encoding within the Path class (ie: %20 becomes %2520). 

 


> org.apache.hadoop.fs.FsUrlConnection does not handle paths with spaces
> --
>
> Key: HADOOP-15217
> URL: https://issues.apache.org/jira/browse/HADOOP-15217
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs
>Affects Versions: 2.6.5
>Reporter: Joseph Fourny
>Priority: Major
> Attachments: TestCase.java
>
>
> When _FsUrlStreamHandlerFactory_ is registered with _java.net.URL_ (ex: when 
> Spark is initialized), it breaks URLs with spaces (even though they are 
> properly URI-encoded). I traced the problem down to 
> _FSUrlConnection.connect()_ method. It naively gets the path from the URL, 
> which contains encoded spaces, and passes it to 
> _org.apache.hadoop.fs.Path(String)_ constructor. This is not correct, because 
> the docs clearly say that the string must NOT be encoded. Doing so causes 
> double encoding within the Path class (ie: %20 becomes %2520). 
>  See attached JUnit test. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-15217) org.apache.hadoop.fs.FsUrlConnection does not handle paths with spaces

2018-02-08 Thread Joseph Fourny (JIRA)
Joseph Fourny created HADOOP-15217:
--

 Summary: org.apache.hadoop.fs.FsUrlConnection does not handle 
paths with spaces
 Key: HADOOP-15217
 URL: https://issues.apache.org/jira/browse/HADOOP-15217
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Affects Versions: 2.6.5
Reporter: Joseph Fourny
 Attachments: TestCase.java

When _FsUrlStreamHandlerFactory_ is registered with java.net.URL (ex: when 
Spark is initialized), it breaks URLs with spaces (even though they are 
properly URI-encoded). I traced the problem down to _FSUrlConnection.connect()_ 
method. It naively gets the path from the URL, which contains encoded spaces, 
and passes it to _org.apache.hadoop.fs.Path(String)_ constructor. This is not 
correct, because the docs clearly say that the string must NOT be encoded. 
Doing so causes double encoding within the Path class (ie: %20 becomes %2520). 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15214) Make Hadoop compatible with Guava 21.0

2018-02-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357580#comment-16357580
 ] 

ASF GitHub Bot commented on HADOOP-15214:
-

Github user medb closed the pull request at:

https://github.com/apache/hadoop/pull/318


> Make Hadoop compatible with Guava 21.0
> --
>
> Key: HADOOP-15214
> URL: https://issues.apache.org/jira/browse/HADOOP-15214
> Project: Hadoop Common
>  Issue Type: Sub-task
>Affects Versions: 3.0.0
>Reporter: Igor Dvorzhak
>Assignee: Igor Dvorzhak
>Priority: Minor
> Fix For: 3.1.0
>
> Attachments: HADOOP-15214.001.patch
>
>
> There are only 3 changes that need to be done to make Hadoop compile with 
> Guava 21.0 dependency



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15206) BZip2 drops and duplicates records when input split size is small

2018-02-08 Thread Aki Tanaka (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357510#comment-16357510
 ] 

Aki Tanaka commented on HADOOP-15206:
-

Added the updated patch. Please let me know if I still misunderstand something.

I made the following changes to the original code:
 * Do not advertise a new byte position when reading from the BZip2 header 
(position 0).

 * Move the reading position to right after the BZip2 header (position 5) when 
the position is between 1 and 4.

This implementation moves the start position forcibly, without checking whether 
the BZ2 file has a header, because I could not determine whether the header 
exists when the start position is 4. However, I think it is safe to move the 
position even if the file does not have a BZ2 header, because we cannot fit 
two BZ2 blocks in the first 4 bytes of the file.
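A rough sketch of the adjustment described above (the class, method, and constant names are illustrative assumptions, not the actual patch):

{code:java}
class Bzip2SplitStartSketch {
  // "BZ", 'h', and the block-size digit: the 4-byte stream header.
  private static final long HEADER_LEN = 4;

  static long adjustReportedPosition(long start) {
    if (start == 0) {
      return 0;               // reading the header itself: do not advertise a new position
    }
    if (start <= HEADER_LEN) {
      return HEADER_LEN + 1;  // move to right after the header (position 5)
    }
    return start;             // splits that start later are left alone
  }
}
{code}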

> BZip2 drops and duplicates records when input split size is small
> -
>
> Key: HADOOP-15206
> URL: https://issues.apache.org/jira/browse/HADOOP-15206
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.8.3, 3.0.0
>Reporter: Aki Tanaka
>Priority: Major
> Attachments: HADOOP-15206-test.patch, HADOOP-15206.001.patch, 
> HADOOP-15206.002.patch, HADOOP-15206.003.patch
>
>
> BZip2 can drop and duplicate records when the input split size is small. I 
> confirmed that this issue happens when the input split size is between 1 byte 
> and 4 bytes.
> I am seeing the following 2 problem behaviors.
>  
> 1. Drop record:
> BZip2 skips the first record in the input file when the input split size is 
> small
>  
> Set the split size to 3 and tested to load 100 records (0, 1, 2..99)
> {code:java}
> 2018-02-01 10:52:33,502 INFO  [Thread-17] mapred.TestTextInputFormat 
> (TestTextInputFormat.java:verifyPartitions(317)) - 
> splits[1]=file:/work/count-mismatch2/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/target/test-dir/TestTextInputFormat/test.bz2:3+3
>  count=99{code}
> > The input format read only 99 records, not 100
>  
> 2. Duplicate Record:
> Two input splits have the same BZip2 records when the input split size is small
>  
> Set the split size to 1 and tested to load 100 records (0, 1, 2..99)
>  
> {code:java}
> 2018-02-01 11:18:49,309 INFO [Thread-17] mapred.TestTextInputFormat 
> (TestTextInputFormat.java:verifyPartitions(318)) - splits[3]=file 
> /work/count-mismatch2/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/target/test-dir/TestTextInputFormat/test.bz2:3+1
>  count=99
> 2018-02-01 11:18:49,310 WARN [Thread-17] mapred.TestTextInputFormat 
> (TestTextInputFormat.java:verifyPartitions(308)) - conflict with 1 in split 4 
> at position 8
> {code}
>  
> I experienced this error when I execute a Spark (SparkSQL) job under the 
> following conditions:
> * The input files are small (around 1 KB)
> * The Hadoop cluster has many slave nodes (able to launch many executor tasks)
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15206) BZip2 drops and duplicates records when input split size is small

2018-02-08 Thread Aki Tanaka (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aki Tanaka updated HADOOP-15206:

Attachment: HADOOP-15206.003.patch

> BZip2 drops and duplicates records when input split size is small
> -
>
> Key: HADOOP-15206
> URL: https://issues.apache.org/jira/browse/HADOOP-15206
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.8.3, 3.0.0
>Reporter: Aki Tanaka
>Priority: Major
> Attachments: HADOOP-15206-test.patch, HADOOP-15206.001.patch, 
> HADOOP-15206.002.patch, HADOOP-15206.003.patch
>
>
> BZip2 can drop and duplicate records when the input split size is small. I 
> confirmed that this issue happens when the input split size is between 1 byte 
> and 4 bytes.
> I am seeing the following 2 problem behaviors.
>  
> 1. Drop record:
> BZip2 skips the first record in the input file when the input split size is 
> small
>  
> Set the split size to 3 and tested to load 100 records (0, 1, 2..99)
> {code:java}
> 2018-02-01 10:52:33,502 INFO  [Thread-17] mapred.TestTextInputFormat 
> (TestTextInputFormat.java:verifyPartitions(317)) - 
> splits[1]=file:/work/count-mismatch2/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/target/test-dir/TestTextInputFormat/test.bz2:3+3
>  count=99{code}
> > The input format read only 99 records, not 100
>  
> 2. Duplicate Record:
> Two input splits have the same BZip2 records when the input split size is small
>  
> Set the split size to 1 and tested to load 100 records (0, 1, 2..99)
>  
> {code:java}
> 2018-02-01 11:18:49,309 INFO [Thread-17] mapred.TestTextInputFormat 
> (TestTextInputFormat.java:verifyPartitions(318)) - splits[3]=file 
> /work/count-mismatch2/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/target/test-dir/TestTextInputFormat/test.bz2:3+1
>  count=99
> 2018-02-01 11:18:49,310 WARN [Thread-17] mapred.TestTextInputFormat 
> (TestTextInputFormat.java:verifyPartitions(308)) - conflict with 1 in split 4 
> at position 8
> {code}
>  
> I experienced this error when I execute a Spark (SparkSQL) job under the 
> following conditions:
> * The input files are small (around 1 KB)
> * The Hadoop cluster has many slave nodes (able to launch many executor tasks)
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15216) S3AInputStream to handle reconnect on read() failure better

2018-02-08 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357458#comment-16357458
 ] 

Steve Loughran commented on HADOOP-15216:
-

+on s3guard, GET could be 403 -> fail

> S3AInputStream to handle reconnect on read() failure better
> ---
>
> Key: HADOOP-15216
> URL: https://issues.apache.org/jira/browse/HADOOP-15216
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Priority: Major
>
> {{S3AInputStream}} handles any IOE through a close() of stream and single 
> re-invocation of the read, with 
> * no backoff
> * no abort of the HTTPS connection, which is just returned to the pool. If 
> httpclient hasn't noticed the failure, it may get returned to the caller on 
> the next read
> Proposed
> * switch to invoker
> * retry policy explicitly for stream (EOF => throw, timeout => close, sleep, 
> retry, etc)
> We could think about extending the fault injection to inject stream read 
> failures intermittently too, though it would need something in S3AInputStream 
> to (optionally) wrap the http input streams with the failing stream. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15214) Make Hadoop compatible with Guava 21.0

2018-02-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357443#comment-16357443
 ] 

Hudson commented on HADOOP-15214:
-

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13634 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/13634/])
HADOOP-15214. Make Hadoop compatible with Guava 21.0. Contributed by (stevel: 
rev 996796f1048369e0f307f935ba01af64cc751a85)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ReencryptionHandler.java
* (edit) 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/RunJar.java
* (edit) 
hadoop-common-project/hadoop-kms/src/main/java/org/apache/hadoop/crypto/key/kms/server/KMS.java


> Make Hadoop compatible with Guava 21.0
> --
>
> Key: HADOOP-15214
> URL: https://issues.apache.org/jira/browse/HADOOP-15214
> Project: Hadoop Common
>  Issue Type: Sub-task
>Affects Versions: 3.0.0
>Reporter: Igor Dvorzhak
>Assignee: Igor Dvorzhak
>Priority: Minor
> Fix For: 3.1.0
>
> Attachments: HADOOP-15214.001.patch
>
>
> There are only 3 changes that need to be done to make Hadoop compile with 
> Guava 21.0 dependency



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15214) Make Hadoop compatible with Guava 21.0

2018-02-08 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357414#comment-16357414
 ] 

Steve Loughran commented on HADOOP-15214:
-

+1

Tests appear unrelated, HDFS playing up

committed to trunk; patch doesn't apply to branch-3.0; I think the runjar 
changes aren't needed. 

Igor, if you want this to go into branch-3.0, re-open this and add a patch 
named HADOOP-15214-branch-3.0-002.patch to get it tested against that branch. 
thanks

> Make Hadoop compatible with Guava 21.0
> --
>
> Key: HADOOP-15214
> URL: https://issues.apache.org/jira/browse/HADOOP-15214
> Project: Hadoop Common
>  Issue Type: Sub-task
>Affects Versions: 3.0.0
>Reporter: Igor Dvorzhak
>Assignee: Igor Dvorzhak
>Priority: Minor
> Fix For: 3.1.0
>
> Attachments: HADOOP-15214.001.patch
>
>
> There are only 3 changes that need to be done to make Hadoop compile with 
> Guava 21.0 dependency



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15214) Make Hadoop compatible with Guava 21.0

2018-02-08 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15214:

   Resolution: Fixed
Fix Version/s: 3.1.0
   Status: Resolved  (was: Patch Available)

> Make Hadoop compatible with Guava 21.0
> --
>
> Key: HADOOP-15214
> URL: https://issues.apache.org/jira/browse/HADOOP-15214
> Project: Hadoop Common
>  Issue Type: Sub-task
>Affects Versions: 3.0.0
>Reporter: Igor Dvorzhak
>Assignee: Igor Dvorzhak
>Priority: Minor
> Fix For: 3.1.0
>
> Attachments: HADOOP-15214.001.patch
>
>
> There are only 3 changes that need to be done to make Hadoop compile with 
> Guava 21.0 dependency



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15214) Make Hadoop compatible with Guava 21.0

2018-02-08 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357403#comment-16357403
 ] 

Steve Loughran commented on HADOOP-15214:
-

Looks like HADOOP-14705 got the stopwatch into KMS; easy to do inadvertently.

Is there a way for us to have findbugs block imports of specific classes? We 
should be able to prevent this.

> Make Hadoop compatible with Guava 21.0
> --
>
> Key: HADOOP-15214
> URL: https://issues.apache.org/jira/browse/HADOOP-15214
> Project: Hadoop Common
>  Issue Type: Sub-task
>Affects Versions: 3.0.0
>Reporter: Igor Dvorzhak
>Assignee: Igor Dvorzhak
>Priority: Minor
> Attachments: HADOOP-15214.001.patch
>
>
> There are only 3 changes that need to be done to make Hadoop compile with 
> Guava 21.0 dependency



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15216) S3AInputStream to handle reconnect on read() failure better

2018-02-08 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357390#comment-16357390
 ] 

Steve Loughran commented on HADOOP-15216:
-

HADOOP-13761 covers the condition where S3Guard finds the file in its 
getFileStatus in the {{FileSystem.open()}} call, but when S3AInputStream 
initiates the GET a 404 comes back: FNFE should be handled with backoff too

* Maybe: special handling for that first attempt, as an FNFE on later ones 
probably means someone deleted the file
* The situation of HEAD -> 200, GET -> 404 could also arise if the GET went to 
a different shard from the HEAD. So the condition could also arise in 
non-S3guarded buckets, sometimes

> S3AInputStream to handle reconnect on read() failure better
> ---
>
> Key: HADOOP-15216
> URL: https://issues.apache.org/jira/browse/HADOOP-15216
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Priority: Major
>
> {{S3AInputStream}} handles any IOE through a close() of stream and single 
> re-invocation of the read, with 
> * no backoff
> * no abort of the HTTPS connection, which is just returned to the pool. If 
> httpclient hasn't noticed the failure, it may get returned to the caller on 
> the next read
> Proposed
> * switch to invoker
> * retry policy explicitly for stream (EOF => throw, timeout => close, sleep, 
> retry, etc)
> We could think about extending the fault injection to inject stream read 
> failures intermittently too, though it would need something in S3AInputStream 
> to (optionally) wrap the http input streams with the failing stream. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13761) S3Guard: implement retries for DDB failures and throttling; translate exceptions

2018-02-08 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357386#comment-16357386
 ] 

Steve Loughran commented on HADOOP-13761:
-

I was looking at the input stream retry logic in the context of SPARK-23308, 
where someone else's S3 client was getting SocketException during their reads; 
suspected EC2 network throttling. 

created: HADOOP-15216

For s3guard, yes: an FNFE in the stream open is something we may want to retry 
on. Without s3guard the open() would fail fast, so it's only when s3guard = on 
and there's some delete inconsistency surfacing

Rename is scary

> S3Guard: implement retries for DDB failures and throttling; translate 
> exceptions
> 
>
> Key: HADOOP-13761
> URL: https://issues.apache.org/jira/browse/HADOOP-13761
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Aaron Fabbri
>Assignee: Aaron Fabbri
>Priority: Blocker
> Attachments: HADOOP-13761.001.patch, HADOOP-13761.002.patch
>
>
> Following the S3AFileSystem integration patch in HADOOP-13651, we need to add 
> retry logic.
> In HADOOP-13651, I added TODO comments in most of the places retry loops are 
> needed, including:
> - open(path).  If MetadataStore reflects recent create/move of file path, but 
> we fail to read it from S3, retry.
> - delete(path).  If deleteObject() on S3 fails, but MetadataStore shows the 
> file exists, retry.
> - rename(src,dest).  If source path is not visible in S3 yet, retry.
> - listFiles(). Skip for now. Not currently implemented in S3Guard. I will 
> create a separate JIRA for this as it will likely require interface changes 
> (i.e. prefix or subtree scan).
> We may miss some cases initially and we should do failure injection testing 
> to make sure we're covered.  Failure injection tests can be a separate JIRA 
> to make this easier to review.
> We also need basic configuration parameters around retry policy.  There 
> should be a way to specify maximum retry duration, as some applications would 
> prefer to receive an error eventually, than waiting indefinitely.  We should 
> also be keeping statistics when inconsistency is detected and we enter a 
> retry loop.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-15216) S3AInputStream to handle reconnect on read() failure better

2018-02-08 Thread Steve Loughran (JIRA)
Steve Loughran created HADOOP-15216:
---

 Summary: S3AInputStream to handle reconnect on read() failure 
better
 Key: HADOOP-15216
 URL: https://issues.apache.org/jira/browse/HADOOP-15216
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 3.0.0
Reporter: Steve Loughran


{{S3AInputStream}} handles any IOE through a close() of stream and single 
re-invocation of the read, with 
* no backoff
* no abort of the HTTPS connection, which is just returned to the pool. If 
httpclient hasn't noticed the failure, it may get returned to the caller on the 
next read

Proposed
* switch to invoker
* retry policy explicitly for stream (EOF => throw, timeout => close, sleep, 
retry, etc)

We could think about extending the fault injection to inject stream read 
failures intermittently too, though it would need something in S3AInputStream 
to (optionally) wrap the http input streams with the failing stream. 
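As a rough illustration of the proposed behaviour (this is not the S3A Invoker API; the reopen hook and the policy values are assumptions):

{code:java}
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.InterruptedIOException;

class RetryingReadSketch {
  interface StreamReopener {
    // Assumed hook: abort the HTTPS connection and reopen at the current offset.
    InputStream reopen() throws IOException;
  }

  private InputStream wrapped;
  private final StreamReopener reopener;

  RetryingReadSketch(InputStream wrapped, StreamReopener reopener) {
    this.wrapped = wrapped;
    this.reopener = reopener;
  }

  int read(byte[] buf, int off, int len) throws IOException {
    final int maxAttempts = 3;        // assumed retry policy
    IOException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return wrapped.read(buf, off, len);
      } catch (EOFException e) {
        throw e;                      // EOF => surface to the caller, no retry
      } catch (IOException e) {
        last = e;
        wrapped = reopener.reopen();  // abort + reopen rather than reuse a broken pooled connection
        try {
          Thread.sleep(500L * attempt);   // simple linear backoff
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          throw new InterruptedIOException("interrupted while retrying read");
        }
      }
    }
    throw last;
  }
}
{code}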



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HADOOP-15124) Slow FileSystem.Statistics counters implementation

2018-02-08 Thread Igor Dvorzhak (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16355810#comment-16355810
 ] 

Igor Dvorzhak edited comment on HADOOP-15124 at 2/8/18 6:22 PM:


Hi Eddy,

I have updated the PR and attached the patch.


was (Author: medb):
Hi Eddy,

Update PR and attached patch.

> Slow FileSystem.Statistics counters implementation
> --
>
> Key: HADOOP-15124
> URL: https://issues.apache.org/jira/browse/HADOOP-15124
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: common
>Affects Versions: 2.9.0, 2.8.3, 2.7.5, 3.0.0
>Reporter: Igor Dvorzhak
>Assignee: Igor Dvorzhak
>Priority: Major
>  Labels: common, filesystem, statistics
> Attachments: HADOOP-15124.001.patch
>
>
> While profiling a 1TB TeraGen job on a Hadoop 2.8.2 cluster (Google Dataproc, 2 
> workers, GCS connector) I saw that the FileSystem.Statistics code paths accounted 
> for 5.58% of wall time and 26.5% of CPU time of the total execution time.
> After switching the FileSystem.Statistics implementation to LongAdder, the consumed 
> wall time decreased to 0.006% and CPU time to 0.104% of total execution time.
> Total job runtime decreased from 66 mins to 61 mins.
> These results are not conclusive, because I didn't benchmark multiple times 
> to average results, but regardless of performance gains switching to 
> LongAdder simplifies code and reduces its complexity.
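For illustration only (not the Hadoop Statistics code), the difference between a single shared counter and LongAdder looks like this:

{code:java}
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

class BytesReadCounters {
  // A single shared AtomicLong makes every increment contend on one cache line.
  private final AtomicLong contended = new AtomicLong();

  // LongAdder stripes the count across cells and only sums on read,
  // so hot write paths barely contend with each other.
  private final LongAdder striped = new LongAdder();

  void recordRead(long bytes) {
    contended.addAndGet(bytes);  // cost grows with the number of writer threads
    striped.add(bytes);          // stays cheap under contention
  }

  long total() {
    return striped.sum();        // reads are slightly more expensive, which is fine for stats
  }
}
{code}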



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14920) KMSClientProvider won't work with KMS delegation token retrieved from non-Java client.

2018-02-08 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357348#comment-16357348
 ] 

Xiaoyu Yao commented on HADOOP-14920:
-

cherry-pick to branch-2.8

> KMSClientProvider won't work with KMS delegation token retrieved from 
> non-Java client.
> --
>
> Key: HADOOP-14920
> URL: https://issues.apache.org/jira/browse/HADOOP-14920
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: kms
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Major
> Fix For: 2.9.0, 3.1.0, 2.8.4
>
> Attachments: HADOOP-14920.001.patch, HADOOP-14920.002.patch, 
> HADOOP-14920.003.patch
>
>
> HADOOP-13381 added support to use a KMS delegation token to connect to the KMS 
> server for key operations. However, the logic that checks whether the UGI contains a 
> KMS delegation token assumes that the token must contain a service attribute; 
> otherwise, a KMS delegation token won't be recognized.
> For a delegation token obtained via a non-Java client such as curl (HTTP), the 
> default DelegationTokenAuthenticationHandler only supports the *renewer* parameter 
> and assumes the client itself will add the service attribute. This means a 
> Java client with KMSClientProvider can't use a KMS delegation token 
> retrieved from a non-Java client, because the token does not contain a service 
> attribute. 
> I did some investigation on this and found two solutions:
> 1. A similar use case exists for webhdfs, and webhdfs supports it with a 
> ["service" 
> parameter|https://hadoop.apache.org/docs/r2.8.0/hadoop-project-dist/hadoop-hdfs/WebHDFS.html#Get_Delegation_Token].
> We can do this similarly by allowing the client to specify a service attribute in 
> the request URL and having it included in the returned token, like webhdfs. Even though 
> this requires a change in DelegationTokenAuthenticationHandler and may affect many 
> other web components, it seems to be a clean and low-risk solution because 
> it will be an optional parameter. Also, other components get non-Java client 
> interop support for free if they have a similar use case. 
> 2. The other way to solve this is to relax the token check in 
> KMSClientProvider to check only the token kind instead of the service. This 
> is an easy workaround but seems less optimal to me. 
> cc: [~xiaochen] for additional input.
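A sketch of what option 2 could look like (the "kms-dt" kind string and the helper are illustrative assumptions, not the actual patch):

{code:java}
import org.apache.hadoop.io.Text;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.token.Token;
import org.apache.hadoop.security.token.TokenIdentifier;

class KmsTokenSelectorSketch {
  private static final Text KMS_TOKEN_KIND = new Text("kms-dt");

  // Pick the KMS delegation token by its kind instead of requiring a
  // populated service field on the token.
  static Token<? extends TokenIdentifier> selectKmsToken(UserGroupInformation ugi) {
    for (Token<? extends TokenIdentifier> t : ugi.getTokens()) {
      if (KMS_TOKEN_KIND.equals(t.getKind())) {
        return t;
      }
    }
    return null;  // no KMS token; fall back to the provider's normal credential path
  }
}
{code}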



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14920) KMSClientProvider won't work with KMS delegation token retrieved from non-Java client.

2018-02-08 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HADOOP-14920:

Fix Version/s: 2.8.4

> KMSClientProvider won't work with KMS delegation token retrieved from 
> non-Java client.
> --
>
> Key: HADOOP-14920
> URL: https://issues.apache.org/jira/browse/HADOOP-14920
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: kms
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Major
> Fix For: 2.9.0, 3.1.0, 2.8.4
>
> Attachments: HADOOP-14920.001.patch, HADOOP-14920.002.patch, 
> HADOOP-14920.003.patch
>
>
> HADOOP-13381 added support to use a KMS delegation token to connect to the KMS 
> server for key operations. However, the logic that checks whether the UGI contains a 
> KMS delegation token assumes that the token must contain a service attribute; 
> otherwise, a KMS delegation token won't be recognized.
> For a delegation token obtained via a non-Java client such as curl (HTTP), the 
> default DelegationTokenAuthenticationHandler only supports the *renewer* parameter 
> and assumes the client itself will add the service attribute. This means a 
> Java client with KMSClientProvider can't use a KMS delegation token 
> retrieved from a non-Java client, because the token does not contain a service 
> attribute. 
> I did some investigation on this and found two solutions:
> 1. A similar use case exists for webhdfs, and webhdfs supports it with a 
> ["service" 
> parameter|https://hadoop.apache.org/docs/r2.8.0/hadoop-project-dist/hadoop-hdfs/WebHDFS.html#Get_Delegation_Token].
> We can do this similarly by allowing the client to specify a service attribute in 
> the request URL and having it included in the returned token, like webhdfs. Even though 
> this requires a change in DelegationTokenAuthenticationHandler and may affect many 
> other web components, it seems to be a clean and low-risk solution because 
> it will be an optional parameter. Also, other components get non-Java client 
> interop support for free if they have a similar use case. 
> 2. The other way to solve this is to relax the token check in 
> KMSClientProvider to check only the token kind instead of the service. This 
> is an easy workaround but seems less optimal to me. 
> cc: [~xiaochen] for additional input.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15213) JniBasedUnixGroupsNetgroupMapping.java and ShellBasedUnixGroupsNetgroupMapping.java use netgroup.substring(1)

2018-02-08 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357153#comment-16357153
 ] 

Kihwal Lee commented on HADOOP-15213:
-

I repeat. Netgroups are only intended for service ACLs in these modules. "@" 
and {{substring()}} are there for reasons. They are not bugs. If this is not 
the way you use netgroups, this isn't the group mapping module you are looking 
for. 

> JniBasedUnixGroupsNetgroupMapping.java and 
> ShellBasedUnixGroupsNetgroupMapping.java use netgroup.substring(1) 
> --
>
> Key: HADOOP-15213
> URL: https://issues.apache.org/jira/browse/HADOOP-15213
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: security
> Environment: SUSE Linux Enterprise Server 11 (x86_64)
> VERSION = 11
> PATCHLEVEL = 3
>Reporter: Dhirendra Khanka
>Priority: Minor
>
>  
> Part of the code below shown from below 2 classes
>  org.apache.hadoop.security.JniBasedUnixGroupsNetgroupMapping.java
> {code:java}
>  protected synchronized List getUsersForNetgroup(String netgroup) {
>     String[] users = null;
>     try {
>   // JNI code does not expect '@' at the begining of the group name
>   users = getUsersForNetgroupJNI(netgroup.substring(1));
>     } catch (Exception e) {
>   if (LOG.isDebugEnabled()) {
>     LOG.debug("Error getting users for netgroup " + netgroup, e);
>   } else {
>     LOG.info("Error getting users for netgroup " + netgroup + 
>     ": " + e.getMessage());
>   }
>     }
>     if (users != null && users.length != 0) {
>   return Arrays.asList(users);
>     }
>     return new LinkedList();
>   }{code}
> org.apache.hadoop.security.ShellBasedUnixGroupsNetgroupMapping.java
>  
> {code:java}
> protected String execShellGetUserForNetgroup(final String netgroup)
>  throws IOException {
>  String result = "";
>  try
> { // shell command does not expect '@' at the begining of the group name 
> result = Shell.execCommand( 
> Shell.getUsersForNetgroupCommand(netgroup.substring(1))); }
> catch (ExitCodeException e)
> { // if we didn't get the group - just return empty list; LOG.warn("error 
> getting users for netgroup " + netgroup, e); }
> return result;
>  }
> {code}
> The comments in the code above expect the input to contain '@'; however, 
> when executing the shell command directly, the output has the form below, 
> which does not contain any '@' symbol. 
> {code:java}
> :~> getent netgroup mynetgroup1
> mynetgroup1   ( , a3xsds, ) ( , beekvkl, ) ( , redcuan, ) ( , 
> uedfmst, ){code}
>  
> I created a test program and removed the substring call, then ran it 
> on the cluster using hadoop jar. The code returned netgroups correctly after 
> the modification. I have limited knowledge of netgroups. The issue was 
> discovered when
> hadoop.security.group.mapping = 
> *org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback* was added 
> to core-site.xml and it failed to apply netgroup access.
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14060) HTTP servlet /logs should require authentication and authorization

2018-02-08 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HADOOP-14060:

Target Version/s: 3.1.0, 3.0.1  (was: 3.1.0)

> HTTP servlet /logs should require authentication and authorization
> --
>
> Key: HADOOP-14060
> URL: https://issues.apache.org/jira/browse/HADOOP-14060
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: kms
>Affects Versions: 3.0.0-alpha4
>Reporter: John Zhuge
>Assignee: John Zhuge
>Priority: Blocker
> Attachments: HADOOP-14060-tmp.001.patch
>
>
> HADOOP-14047 makes KMS call {{HttpServer2#setACL}}. Access control works fine 
> for /conf, /jmx, /logLevel, and /stacks, but not for /logs.
> The code in {{AdminAuthorizedServlet#doGet}} for /logs and 
> {{ConfServlet#doGet}} for /conf are quite similar. This makes me believe that 
> /logs should be subject to the same access control as intended by the original 
> developer.
> IMHO this could either be my misconfiguration or there is a bug somewhere in 
> {{HttpServer2}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14060) HTTP servlet /logs should require authentication and authorization

2018-02-08 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HADOOP-14060:

Priority: Blocker  (was: Major)

> HTTP servlet /logs should require authentication and authorization
> --
>
> Key: HADOOP-14060
> URL: https://issues.apache.org/jira/browse/HADOOP-14060
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: kms
>Affects Versions: 3.0.0-alpha4
>Reporter: John Zhuge
>Assignee: John Zhuge
>Priority: Blocker
> Attachments: HADOOP-14060-tmp.001.patch
>
>
> HADOOP-14047 makes KMS call {{HttpServer2#setACL}}. Access control works fine 
> for /conf, /jmx, /logLevel, and /stacks, but not for /logs.
> The code in {{AdminAuthorizedServlet#doGet}} for /logs and 
> {{ConfServlet#doGet}} for /conf are quite similar. This makes me believe that 
> /logs should be subject to the same access control as intended by the original 
> developer.
> IMHO this could either be my misconfiguration or there is a bug somewhere in 
> {{HttpServer2}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15213) JniBasedUnixGroupsNetgroupMapping.java and ShellBasedUnixGroupsNetgroupMapping.java use netgroup.substring(1)

2018-02-08 Thread Dhirendra Khanka (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16356697#comment-16356697
 ] 

Dhirendra Khanka commented on HADOOP-15213:
---

OK, so forget the JNI implementation; what about ShellBasedUnixGroupsNetgroupMapping? 
I tested the code below on the cluster for ShellBasedUnixGroupsNetgroupMapping.
{code:java}
package com.teradata;

import java.io.IOException;
import java.util.LinkedList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.security.*;
import org.apache.hadoop.util.GenericOptionsParser;
import org.apache.hadoop.util.Shell;
import org.apache.hadoop.util.Shell.ExitCodeException;

public class usernetgroups {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length != 0) {
      try {
        // System.out.print("ShellBasedUnixGroupsMapping for user : " + otherArgs[0] + "--> ");
        ShellBasedUnixGroupsMapping map = new ShellBasedUnixGroupsMapping();
        // System.out.println(map.getGroups(otherArgs[0]));

        // System.out.print("ShellBasedUnixGroupsNetgroupMapping for user : " + otherArgs[0] + "--> ");
        ShellBasedUnixGroupsNetgroupMapping map1 = new ShellBasedUnixGroupsNetgroupMapping();
        // System.out.println(map1.getGroups(otherArgs[0]).toString());

        String netgroups = getUsersForNetgroup(otherArgs[1]).toString();
        System.out.println("netgroup users--> " + netgroups);
      } catch (Exception e) {
        // TODO: handle exception
        System.out.println(e.getMessage());
      }
    }
  }

  protected static List<String> getUsersForNetgroup(String netgroup) throws IOException {
    List<String> users = new LinkedList<String>();

    // returns a string similar to this:
    // group   ( , user, ) ( domain, user1, host.com )
    String usersRaw = execShellGetUserForNetgroup(netgroup);
    // get rid of spaces, makes splitting much easier
    // System.out.println("1 " + usersRaw);
    usersRaw = usersRaw.replaceAll(" +", "");
    // remove netgroup name at the beginning of the string
    usersRaw = usersRaw.replaceFirst(netgroup.replaceFirst("@", "") + "[()]+", "");
    // System.out.println("2 " + usersRaw);
    // split string into user infos
    String[] userInfos = usersRaw.split("[()]+");
    for (String userInfo : userInfos) {
      // userInfo: xxx,user,yyy (xxx, yyy can be empty strings)
      // get rid of everything before first and after last comma
      String user = userInfo.replaceFirst("[^,]*,", "");
      user = user.replaceFirst(",.*$", "");
      // voila! got username!
      users.add(user);
      // System.out.println("user " + user);
    }

    return users;
  }

  protected static String execShellGetUserForNetgroup(final String netgroup)
      throws IOException {
    String result = "";
    try {
      System.out.println(netgroup);
      System.out.println(netgroup.substring(1));
      // shell command does not expect '@' at the begining of the group name
      // result = Shell.execCommand(Shell.getUsersForNetgroupCommand(netgroup.substring(1)));
      // modified
      result = Shell.execCommand(Shell.getUsersForNetgroupCommand(netgroup));
      // System.out.println("modified_result -->" + result);
    } catch (ExitCodeException e) {
      // if we didn't get the group 

[jira] [Commented] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-02-08 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16356633#comment-16356633
 ] 

SammiChen commented on HADOOP-14999:


Add two more:
 # TestAliyunOSSBlockOutputStream. Need tests to cover big file uploads, at 
least bigger than MULTIPART_UPLOAD_SIZE.
 # Any performance comparison data, using the original code and the patched code? 

> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> 
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Major
> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, 
> HADOOP-14999.003.patch, HADOOP-14999.004.patch, HADOOP-14999.005.patch, 
> HADOOP-14999.006.patch, HADOOP-14999.007.patch, 
> asynchronous_file_uploading.pdf
>
>
> This mechanism is designed for uploading files in parallel and asynchronously:
>  - improve the performance of uploading files to the OSS server. First, the 
> mechanism splits the result into multiple small blocks and uploads them in parallel. 
> Then, getting the result and uploading the blocks are asynchronous.
>  - avoid buffering a very large result on local disk. To cite an extreme 
> example, if a task outputs 100GB or even more, we may need 
> to write this 100GB to local disk and then upload it. That is 
> inefficient and limited by disk space.
> This patch reuses {{SemaphoredDelegatingExecutor}} as the executor service and 
> depends on HADOOP-15039.
> The attached {{asynchronous_file_uploading.pdf}} illustrates the difference 
> between the previous {{AliyunOSSOutputStream}} and 
> {{AliyunOSSBlockOutputStream}}, i.e. this asynchronous multi-part based 
> uploading mechanism.
> 1. {{AliyunOSSOutputStream}}: we need to write the whole result to local 
> disk before we can upload it to OSS. This poses two problems:
>  - if the output file is too large, it can run out of local disk space.
>  - if the output file is too large, the task will wait a long time to upload 
> the result to OSS before finishing, wasting compute resources.
> 2. {{AliyunOSSBlockOutputStream}}: we cut the task output into small blocks, 
> i.e. small local files, and each block is packaged into an upload 
> task. These tasks are submitted to {{SemaphoredDelegatingExecutor}}, 
> which uploads the blocks in parallel; this 
> improves performance greatly.
> 3. Each task retries up to 3 times to upload its block to Aliyun OSS. If one of 
> those tasks fails, the whole file upload fails and we abort the 
> current upload.
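A simplified sketch of the block-based flow described above (this is not the AliyunOSSBlockOutputStream code; partUpload()/completeUpload() stand in for the real OSS multipart calls):

{code:java}
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class BlockUploadSketch {
  private final ExecutorService pool = Executors.newFixedThreadPool(4); // bounded in the real patch
  private final List<Future<Integer>> pending = new ArrayList<Future<Integer>>();
  private int nextPartNumber = 1;

  // Called whenever the local buffer file reaches the configured part size.
  void submitBlock(final File block) {
    final int partNumber = nextPartNumber++;
    pending.add(pool.submit(new Callable<Integer>() {
      @Override
      public Integer call() throws Exception {
        partUpload(block, partNumber);  // one block per task, retried a few times in the real code
        return partNumber;
      }
    }));
  }

  // Called from close(): wait for every block, then complete the multipart upload.
  void finish() throws Exception {
    for (Future<Integer> f : pending) {
      f.get();                          // a failed block fails (and aborts) the whole upload
    }
    completeUpload();
    pool.shutdown();
  }

  private void partUpload(File block, int partNumber) { /* placeholder */ }
  private void completeUpload() { /* placeholder */ }
}
{code}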



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-02-08 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16356629#comment-16356629
 ] 

SammiChen edited comment on HADOOP-14999 at 2/8/18 8:15 AM:


Hi [~uncleGen],  thanks for refining the patch. Here are a few comments.

1.  AliyunOSSFileSystemStore.

uploadPartSize = conf.getLong(MULTIPART_UPLOAD_SIZE_KEY,
    MULTIPART_UPLOAD_SIZE_DEFAULT);
multipartThreshold = conf.getLong(MIN_MULTIPART_UPLOAD_THRESHOLD_KEY,
    MIN_MULTIPART_UPLOAD_THRESHOLD_DEFAULT);
partSize = conf.getLong(MULTIPART_UPLOAD_SIZE_KEY,
    MULTIPART_UPLOAD_SIZE_DEFAULT);
if (partSize < MIN_MULTIPART_UPLOAD_PART_SIZE) {
  partSize = MIN_MULTIPART_UPLOAD_PART_SIZE;
}

What is the difference between "uploadPartSize" and "partSize", given they are 
initialized with the same value? It seems "partSize" is not used anywhere else.

Also, please tidy up the multipart-upload-related constants and keep related 
properties next to each other. It seems "MULTIPART_UPLOAD_SIZE_DEFAULT" should 
be called "MULTIPART_UPLOAD_PART_SIZE_DEFAULT", and "MULTIPART_UPLOAD_SIZE = 
104857600" is the temp file size. Try to make each property name carry its 
accurate meaning.

// Size of each of or multipart pieces in bytes
public static final String MULTIPART_UPLOAD_SIZE_KEY =
    "fs.oss.multipart.upload.size";
public static final long MULTIPART_UPLOAD_SIZE = 104857600; // 100 MB

public static final long MULTIPART_UPLOAD_SIZE_DEFAULT = 10 * 1024 * 1024;
public static final int MULTIPART_UPLOAD_PART_NUM_LIMIT = 1;

// Minimum size in bytes before we start a multipart uploads or copy
public static final String MIN_MULTIPART_UPLOAD_THRESHOLD_KEY =
    "fs.oss.multipart.upload.threshold";
public static final long MIN_MULTIPART_UPLOAD_THRESHOLD_DEFAULT =
    20 * 1024 * 1024;

public static final long MIN_MULTIPART_UPLOAD_PART_SIZE = 100 * 1024L;

 

2. AliyunOSSUtils#createTmpFileForWrite

    Change the order of the following statements,

if (directoryAllocator == null) {
  directoryAllocator = new LocalDirAllocator(BUFFER_DIR_KEY);
}
if (conf.get(BUFFER_DIR_KEY) == null) {
  conf.set(BUFFER_DIR_KEY, conf.get("hadoop.tmp.dir") + "/oss");
}

Also, is "directoryAllocator" final?

3. AliyunOSSUtils#intOption, longOption

   Preconditions doesn't support "%d". Add a test case to cover the logic. 
Suggest changing the names to more meaningful ones like getXOption. Pay 
attention to the code style and indentation.

4. TestAliyunOSSBlockOutputStream. Add random-length file tests here. Testing 
only 1024-aligned file lengths is not enough.

5. AliyunOSSBlockOutputStream

   {color:#808080} Asynchronous multi-part based uploading mechanism to support 
huge file{color}{color:#808080}* which is larger than 5GB.{color}

Where is this 5GB threshold checked in the code?

The resources are cleaned up properly after close() is called, but they are not cleaned up when an exception happens during write().
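One generic way to state the expectation, as an illustrative pattern only and not the patch's actual code:

// Make sure the local buffer file is removed when an exception happens while
// writing it, instead of relying on close(), which may never be reached.
static void writePart(File partFile, byte[] data) throws IOException {
  boolean success = false;
  try (OutputStream os = new FileOutputStream(partFile)) {
    os.write(data);
    success = true;
  } finally {
    if (!success && partFile.exists()) {
      partFile.delete();
    }
  }
}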

 


was (Author: sammi):
Hi [~uncleGen],  thanks for refining the patch. Here are a few comments.

1.  AliyunOSSFileSystemStore.

uploadPartSize = conf.getLong(MULTIPART_UPLOAD_SIZE_KEY,
    MULTIPART_UPLOAD_SIZE_DEFAULT);
multipartThreshold = conf.getLong(MIN_MULTIPART_UPLOAD_THRESHOLD_KEY,
    MIN_MULTIPART_UPLOAD_THRESHOLD_DEFAULT);
partSize = conf.getLong(MULTIPART_UPLOAD_SIZE_KEY,
    MULTIPART_UPLOAD_SIZE_DEFAULT);
if (partSize < MIN_MULTIPART_UPLOAD_PART_SIZE) {
 partSize = 

[jira] [Commented] (HADOOP-14999) AliyunOSS: provide one asynchronous multi-part based uploading mechanism

2018-02-08 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16356629#comment-16356629
 ] 

SammiChen commented on HADOOP-14999:


Hi [~uncleGen],  thanks for refining the patch. Here are a few comments.

1.  AliyunOSSFileSystemStore.

uploadPartSize = conf.getLong(MULTIPART_UPLOAD_SIZE_KEY,
    MULTIPART_UPLOAD_SIZE_DEFAULT);
multipartThreshold = conf.getLong(MIN_MULTIPART_UPLOAD_THRESHOLD_KEY,
    MIN_MULTIPART_UPLOAD_THRESHOLD_DEFAULT);
partSize = conf.getLong(MULTIPART_UPLOAD_SIZE_KEY,
    MULTIPART_UPLOAD_SIZE_DEFAULT);
if (partSize < MIN_MULTIPART_UPLOAD_PART_SIZE) {
  partSize = MIN_MULTIPART_UPLOAD_PART_SIZE;
}

What is the difference between "uploadPartSize" and "partSize"? They are given the same initial value, and "partSize" does not seem to be used anywhere else.

Also, please tidy up the multipart-upload-related constants and keep related properties next to each other. "MULTIPART_UPLOAD_SIZE_DEFAULT" should probably be named "MULTIPART_UPLOAD_PART_SIZE_DEFAULT", and "MULTIPART_UPLOAD_SIZE = 104857600" is actually the temporary file size. Try to make each property name carry its accurate meaning.

// Size of each of or multipart pieces in bytes
public static final String MULTIPART_UPLOAD_SIZE_KEY =
    "fs.oss.multipart.upload.size";
public static final long MULTIPART_UPLOAD_SIZE = 104857600; // 100 MB

public static final long MULTIPART_UPLOAD_SIZE_DEFAULT = 10 * 1024 * 1024;
public static final int MULTIPART_UPLOAD_PART_NUM_LIMIT = 10000;

// Minimum size in bytes before we start a multipart uploads or copy
public static final String MIN_MULTIPART_UPLOAD_THRESHOLD_KEY =
    "fs.oss.multipart.upload.threshold";
public static final long MIN_MULTIPART_UPLOAD_THRESHOLD_DEFAULT =
    20 * 1024 * 1024;

public static final long MIN_MULTIPART_UPLOAD_PART_SIZE = 100 * 1024L;

 

2. AliyunOSSUtils#createTmpFileForWrite

    Change the order of the following statements:

if (directoryAllocator == null) {
  directoryAllocator = new LocalDirAllocator(BUFFER_DIR_KEY);
}
if (conf.get(BUFFER_DIR_KEY) == null) {
  conf.set(BUFFER_DIR_KEY, conf.get("hadoop.tmp.dir") + "/oss");
}

Also is "{color:#660e7a}directoryAllocator{color}" final?

3. AliyunOSSUtils#intOption,  longOption

   Precondition doesn't support "%d". Add a test case to cover this logic. Suggest renaming these to more meaningful names such as getXOption, and pay attention to the code style and indentation.

4. TestAliyunOSSBlockOutputStream.  Add random-length file tests here; covering only 1024-aligned file lengths is not enough.

5. AliyunOSSBlockOutputStream

 * Asynchronous multi-part based uploading mechanism to support huge file
 * which is larger than 5GB.

Where is this 5GB threshold checked in the code?

The resources are cleaned up properly after close() is called, but they are not cleaned up when an exception happens during write().

 

> AliyunOSS: provide one asynchronous multi-part based uploading mechanism
> 
>
> Key: HADOOP-14999
> URL: https://issues.apache.org/jira/browse/HADOOP-14999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Major
> Attachments: HADOOP-14999.001.patch, HADOOP-14999.002.patch, 
> HADOOP-14999.003.patch, HADOOP-14999.004.patch, HADOOP-14999.005.patch, 
> HADOOP-14999.006.patch, HADOOP-14999.007.patch, 
> asynchronous_file_uploading.pdf
>
>
> This mechanism is designed for uploading file in parallel and