[GitHub] hddong opened a new pull request #455: YARN-9176.[Submarine] Repair 404 error of links in documentation

2019-01-03 Thread GitBox
hddong opened a new pull request #455: YARN-9176.[Submarine] Repair 404 error 
of links in documentation
URL: https://github.com/apache/hadoop/pull/455
 
 
   





[jira] [Commented] (HADOOP-16023) Support system /etc/krb5.conf for auth_to_local rules

2019-01-03 Thread Bolke de Bruin (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16733895#comment-16733895
 ] 

Bolke de Bruin commented on HADOOP-16023:
-----------------------------------------

I noticed that both the JDK and Apache Kerby have issues in their parsers. I 
have raised an issue with the JDK and am working on a patch for Apache Kerby.

> Support system /etc/krb5.conf for auth_to_local rules
> -----------------------------------------------------
>
> Key: HADOOP-16023
> URL: https://issues.apache.org/jira/browse/HADOOP-16023
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Bolke de Bruin
>Assignee: Bolke de Bruin
>Priority: Major
>  Labels: security
>
> Hadoop has long maintained its own configuration for Kerberos' auth_to_local 
> rules. To the user this is counterintuitive and increases the complexity of 
> maintaining a secure system, as the normal way of configuring these 
> auth_to_local rules is in the site-wide krb5.conf, usually /etc/krb5.conf.
> With HADOOP-15996 there is now support for configuring how Hadoop should 
> evaluate auth_to_local rules. A "system" mechanism should be added. 
> It should be investigated how to properly parse krb5.conf. The JDK seems to be 
> lacking, as it is unable to obtain auth_to_local rules due to a bug in its 
> parser. Apache Kerby has an implementation that could be used. A native (C) 
> version is also a possibility. 
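
A minimal sketch of what a "system" mechanism could read, assuming a 
hand-rolled scan of /etc/krb5.conf with one auth_to_local entry per line 
(illustrative only; this is neither the JDK nor the Kerby parser):

{code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class Krb5AuthToLocal {
  /** Collect every "auth_to_local = RULE:..." value found in the file. */
  public static List<String> readRules(String krb5Path) throws IOException {
    List<String> rules = new ArrayList<>();
    for (String line : Files.readAllLines(Paths.get(krb5Path))) {
      String trimmed = line.trim();
      if (trimmed.startsWith("auth_to_local")) {
        int eq = trimmed.indexOf('=');
        if (eq >= 0) {
          rules.add(trimmed.substring(eq + 1).trim());
        }
      }
    }
    return rules;
  }
}
{code}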






[jira] [Commented] (HADOOP-15992) JSON License is included in the transitive dependency of aliyun-sdk-oss 3.0.0

2019-01-03 Thread Sunil Govindan (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16733857#comment-16733857
 ] 

Sunil Govindan commented on HADOOP-15992:
-----------------------------------------

Hi [~ajisakaa]

The latest patch is not applying to trunk. Could you please help? Thanks.

> JSON License is included in the transitive dependency of aliyun-sdk-oss 3.0.0
> -----------------------------------------------------------------------------
>
> Key: HADOOP-15992
> URL: https://issues.apache.org/jira/browse/HADOOP-15992
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.9.2
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
>Priority: Blocker
> Attachments: HADOOP-15992.01.patch
>
>
> This is the output of {{mvn dependency:tree}}
> {noformat}
> [INFO] +- com.aliyun.oss:aliyun-sdk-oss:jar:3.0.0:compile
> [INFO] |  +- org.jdom:jdom:jar:1.1:compile
> [INFO] |  +- com.sun.jersey:jersey-json:jar:1.19:compile
> [INFO] |  |  +- org.codehaus.jettison:jettison:jar:1.1:compile
> [INFO] |  |  +- com.sun.xml.bind:jaxb-impl:jar:2.2.3-1:compile
> [INFO] |  |  +- org.codehaus.jackson:jackson-core-asl:jar:1.9.13:compile
> [INFO] |  |  +- org.codehaus.jackson:jackson-mapper-asl:jar:1.9.13:compile
> [INFO] |  |  +- org.codehaus.jackson:jackson-jaxrs:jar:1.9.13:compile
> [INFO] |  |  \- org.codehaus.jackson:jackson-xc:jar:1.9.13:compile
> [INFO] |  +- com.aliyun:aliyun-java-sdk-core:jar:3.4.0:compile
> [INFO] |  |  \- org.json:json:jar:20170516:compile
> {noformat}
> The license of org.json:json:jar:20170516:compile is the JSON License, which 
> cannot be included:
> https://www.apache.org/legal/resolved.html#json
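
For illustration, one common way to drop the transitive dependency in a Maven 
POM (a sketch only; the attached patch may resolve it differently):

{code}
<dependency>
  <groupId>com.aliyun.oss</groupId>
  <artifactId>aliyun-sdk-oss</artifactId>
  <version>3.0.0</version>
  <exclusions>
    <!-- org.json:json carries the JSON License and must not ship -->
    <exclusion>
      <groupId>org.json</groupId>
      <artifactId>json</artifactId>
    </exclusion>
  </exclusions>
</dependency>
{code}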






[jira] [Commented] (HADOOP-15229) Add FileSystem builder-based openFile() API to match createFile()

2019-01-03 Thread Sameer Choudhary (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16733840#comment-16733840
 ] 

Sameer Choudhary commented on HADOOP-15229:
-------------------------------------------

[~ste...@apache.org]

 > you can't do a seek to an offset in the file, because the results are coming 
in dynamically from a POST; there's no GET for a content length. Which brings 
it down to: do you skip() or read-and-discard. The trouble with skip is I'm not 
sure about all its failure modes here. {{skip(count)}} can return a value < 
{{count}} and you are left wondering what to do? Keep retrying until total == 
count? Now I know of a way to check for end of stream/errors in the select, 
that may be possible.

What is the difference here between skip and read-and-discard? I believe by 
skip you mean not even deserializing the payload of the Record Message. S3 
Select's streaming protocol encodes the results of the query in Record 
Messages, and each message carries the number of bytes in its payload 
(https://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectSELECTContent.html).

Maybe for skip we can just look at the Record Message header section and 
skip the message entirely if we have not reached our desired length yet. The 
following termination conditions apply (see the sketch below):
 * If we get a RecordLevelError, we can throw.
 * If we don't receive an End Message within a defined time, we can time out.
 * If we get an End Message before the desired bytes were skipped, we can 
report that the result is smaller than desired.
 * Last is the happy case, where we have more bytes than we need to skip. 
We may need to look at each header and keep a counter of the byte range the 
current Record Message contains. We could start deserializing the payload once 
the byte range contains non-skipped bytes.

I believe the above sums up all of the edge cases that could occur with the 
response.

The AWS SDK abstracts the handling of these messages, so it might get tricky to 
implement this in a simple manner.
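
A sketch of that skip loop under the assumptions above; Message and its methods 
are hypothetical stand-ins for the event-stream messages, not the AWS SDK types:

{code}
import java.io.IOException;
import java.util.Iterator;

interface Message {
  boolean isRecordLevelError();
  boolean isEndMessage();
  long payloadLength();          // taken from the header; payload untouched
}

final class SelectSkipper {
  /** Returns bytes skipped; may be < toSkip if the End Message arrives. */
  static long skip(Iterator<Message> messages, long toSkip) throws IOException {
    long skipped = 0;
    while (skipped < toSkip && messages.hasNext()) {
      Message m = messages.next();
      if (m.isRecordLevelError()) {
        throw new IOException("record-level error while skipping");
      }
      if (m.isEndMessage()) {
        return skipped;          // result smaller than desired
      }
      // header-only accounting: do not deserialize the payload
      skipped += Math.min(m.payloadLength(), toSkip - skipped);
    }
    return skipped;
  }
}
{code}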

> I worry that some test may only show up if the file is above a certain size, 
> as it will come in on a later page of responses, won't it?

 

That is true. S3 Select sends data in Record Messages, where each message can 
contain more than one logical CSV output row. There are two cases here:
 # If we get an error before the first message is sent, we will receive an HTTP 
status code != 200, along with the error code and error message.
 # Otherwise, if the error occurs after the first message is sent, the response 
will contain a certain number of output rows with HTTP status code == 200, 
followed by an EventStreamException later.

For tests, how about:
 * Test 1: A CSV file with just a couple of rows, where the second row's 
contents will result in a type cast error. This should return an HTTP status 
code 400 with an appropriate error message and error code.
 * Test 2: A CSV file just large enough that we get HTTP status code 200 
followed by an EventStreamException with an appropriate error message and error 
code.

 

> Add FileSystem builder-based openFile() API to match createFile()
> ------------------------------------------------------------------
>
> Key: HADOOP-15229
> URL: https://issues.apache.org/jira/browse/HADOOP-15229
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs, fs/azure, fs/s3
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: HADOOP-15229-001.patch, HADOOP-15229-002.patch, 
> HADOOP-15229-003.patch, HADOOP-15229-004.patch, HADOOP-15229-004.patch, 
> HADOOP-15229-005.patch, HADOOP-15229-006.patch, HADOOP-15229-007.patch, 
> HADOOP-15229-009.patch, HADOOP-15229-010.patch, HADOOP-15229-011.patch, 
> HADOOP-15229-012.patch, HADOOP-15229-013.patch, HADOOP-15229-014.patch, 
> HADOOP-15229-015.patch
>
>
> Replicate HDFS-1170 and HADOOP-14365 with an API to open files.
> A key requirement of this is not HDFS, it's to put in the fadvise policy for 
> working with object stores, where getting the decision to do a full GET and 
> TCP abort on seek vs smaller GETs is fundamentally different: the wrong 
> option can cost you minutes. S3A and Azure both have adaptive policies now 
> (first backward seek), but they still don't do it that well.
> Columnar formats (ORC, Parquet) should be able to say "fs.input.fadvise" 
> "random" as an option when they open files; I can imagine other options too.
> The Builder model of [~eddyxu] is the one to mimic, method for method. 
> Ideally with as much code reuse as possible
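
For illustration, a hypothetical sketch of the builder shape being proposed, 
mirroring createFile(); the option name is only the placeholder used in the 
description:

{code}
static void readRandomly(FileSystem fs, Path path) throws Exception {
  CompletableFuture<FSDataInputStream> streamF = fs.openFile(path)
      .opt("fs.input.fadvise", "random")   // a hint, not a hard requirement
      .build();
  try (FSDataInputStream in = streamF.get()) {
    // a columnar reader would issue random-access reads here
  }
}
{code}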




[jira] [Comment Edited] (HADOOP-15229) Add FileSystem builder-based openFile() API to match createFile()

2019-01-03 Thread Sameer Choudhary (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16733840#comment-16733840
 ] 

Sameer Choudhary edited comment on HADOOP-15229 at 1/4/19 6:02 AM:
-------------------------------------------------------------------

[~ste...@apache.org]

Thanks for handling the edge case around incomplete response. 
{quote}you can't do a seek to an offset in the file, because the results are 
coming in dynamically from a POST; there's no GET for a content length. Which 
brings it down to: do you skip() or read-and-discard. The trouble with skip is 
I'm not sure about all its failure modes here. {{skip(count)}} can return a 
value < {{count}} and you are left wondering what to do? Keep retrying until 
total == count? Now I know of a way to check for end of stream/errors in the 
select, that may be possible.
{quote}
What is the difference here between skip and read-and-discard? I believe by 
skip you mean not even deserializing the payload of the Record Message. S3 
Select's streaming protocol encodes the results of the query in Record 
Messages, and each message carries the number of bytes in its payload 
(https://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectSELECTContent.html).
 

Maybe for skip we can just look at the Record Message header section and 
skip the message entirely if we have not reached our desired length yet. The 
following termination conditions apply:
 * If we get a RecordLevelError, we can throw.
 * If we don't receive an End Message within a defined time, we can time out.
 * If we get an End Message before the desired bytes were skipped, we can 
report that the result is smaller than desired.
 * Last is the happy case, where we have more bytes than we need to skip. 
We may need to look at each header and keep a counter of the byte range the 
current Record Message contains. We could start deserializing the payload once 
the byte range contains non-skipped bytes.

I believe the above sums up all of the edge cases that could occur with the 
response.

The AWS SDK abstracts the handling of these messages, so it might get tricky to 
implement this in a simple manner.
{quote}I worry that some test may only show up if the file is above a certain 
size, as it will come in on a later page of responses, won't it?
{quote}
 

That is true. S3 Select sends data in Record Messages, where each message can 
contain more than one logical CSV output row. There are two cases here:
 # If we get an error before the first message is sent, we will receive an HTTP 
status code != 200, along with the error code and error message.
 # Otherwise, if the error occurs after the first message is sent, the response 
will contain a certain number of output rows with HTTP status code == 200, 
followed by an EventStreamException later.

For tests, how about:
 * Test 1: A CSV file with just a couple of rows, where the second row's 
contents will result in a type cast error. This should return an HTTP status 
code 400 with an appropriate error message and error code.
 * Test 2: A CSV file just large enough that we get HTTP status code 200 
followed by an EventStreamException with an appropriate error message and error 
code.

 



[jira] [Commented] (HADOOP-16019) ZKDelegationTokenSecretManager won't log exception message occured in function setJaasConfiguration

2019-01-03 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16733823#comment-16733823
 ] 

Hadoop QA commented on HADOOP-16019:


| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 28s | Docker mode activated. |
|| Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| trunk Compile Tests ||
| +1 | mvninstall | 24m 1s | trunk passed |
| +1 | compile | 22m 47s | trunk passed |
| +1 | checkstyle | 0m 58s | trunk passed |
| +1 | mvnsite | 1m 30s | trunk passed |
| +1 | shadedclient | 16m 2s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 59s | trunk passed |
| +1 | javadoc | 1m 21s | trunk passed |
|| Patch Compile Tests ||
| +1 | mvninstall | 1m 5s | the patch passed |
| +1 | compile | 22m 5s | the patch passed |
| +1 | javac | 22m 5s | the patch passed |
| -0 | checkstyle | 1m 3s | hadoop-common-project/hadoop-common: The patch generated 1 new + 10 unchanged - 0 fixed = 11 total (was 10) |
| +1 | mvnsite | 1m 34s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 13m 51s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 23s | the patch passed |
| +1 | javadoc | 1m 8s | the patch passed |
|| Other Tests ||
| -1 | unit | 9m 34s | hadoop-common in the patch failed. |
| +1 | asflicense | 0m 48s | The patch does not generate ASF License warnings. |
| | | 122m 16s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.security.ssl.TestSSLFactory |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | HADOOP-16019 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12953695/HADOOP-16019.2.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 4251cef240ae 4.4.0-138-generic #164~14.04.1-Ubuntu SMP Fri Oct 5 08:56:16 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / dfceffa |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
| checkstyle | https://builds.apache.org/job/PreCommit-HADOOP-Build/15722/artifact/out/diff-checkstyle-hadoop-common-project_hadoop-common.txt |
| unit | https://builds.apache.org/job/PreCommit-HADOOP-Build/15722/artifact/out/patch-unit-hadoop-common-project_hadoop-common

[jira] [Commented] (HADOOP-16019) ZKDelegationTokenSecretManager won't log exception message occured in function setJaasConfiguration

2019-01-03 Thread luhuachao (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16733744#comment-16733744
 ] 

luhuachao commented on HADOOP-16019:


Confused: is it better to use ex.getMessage() in the 
message? [~ste...@apache.org]

> ZKDelegationTokenSecretManager won't log exception message occured in 
> function setJaasConfiguration
> ---------------------------------------------------------------------------
>
> Key: HADOOP-16019
> URL: https://issues.apache.org/jira/browse/HADOOP-16019
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: common
>Affects Versions: 3.1.0
>Reporter: luhuachao
>Priority: Minor
> Attachments: HADOOP-16019.1.patch, HADOOP-16019.2.patch
>
>
> * When the configs ZK_DTSM_ZK_KERBEROS_KEYTAB or 
> ZK_DTSM_ZK_KERBEROS_PRINCIPAL are not set, the IllegalArgumentException 
> message is not logged.
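
For illustration, a minimal sketch of the suggested fix; the method and logger 
names are placeholders, not the actual patch:

{code}
try {
  setJaasConfiguration(config);
} catch (IllegalArgumentException e) {
  // log the message (and stack) instead of losing it
  LOG.error("setJaasConfiguration failed: " + e.getMessage(), e);
  throw e;
}
{code}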






[jira] [Updated] (HADOOP-16019) ZKDelegationTokenSecretManager won't log exception message occured in function setJaasConfiguration

2019-01-03 Thread luhuachao (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

luhuachao updated HADOOP-16019:
-------------------------------
Attachment: HADOOP-16019.2.patch

> ZKDelegationTokenSecretManager won't log exception message occured in 
> function setJaasConfiguration
> ---------------------------------------------------------------------------
>
> Key: HADOOP-16019
> URL: https://issues.apache.org/jira/browse/HADOOP-16019
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: common
>Affects Versions: 3.1.0
>Reporter: luhuachao
>Priority: Minor
> Attachments: HADOOP-16019.1.patch, HADOOP-16019.2.patch
>
>
> * When the configs ZK_DTSM_ZK_KERBEROS_KEYTAB or 
> ZK_DTSM_ZK_KERBEROS_PRINCIPAL are not set, the IllegalArgumentException 
> message is not logged.






[jira] [Commented] (HADOOP-15323) AliyunOSS: Improve copy file performance for AliyunOSSFileSystemStore

2019-01-03 Thread wujinhu (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16733719#comment-16733719
 ] 

wujinhu commented on HADOOP-15323:
----------------------------------

Thanks [~cheersyang] :)

> AliyunOSS: Improve copy file performance for AliyunOSSFileSystemStore
> ---------------------------------------------------------------------
>
> Key: HADOOP-15323
> URL: https://issues.apache.org/jira/browse/HADOOP-15323
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 2.10.0, 3.2.0, 3.1.1, 2.9.2, 3.0.3, 3.3.0
>Reporter: wujinhu
>Assignee: wujinhu
>Priority: Major
> Fix For: 2.10.0, 3.0.4, 3.3.0, 3.1.2, 3.2.1, 2.9.3
>
> Attachments: HADOOP-15323.001.patch, HADOOP-15323.002.patch, 
> HADOOP-15323.003.patch
>
>
> Aliyun OSS will support shallow copy, which means the server will only copy 
> metadata when a copy-object operation occurs. 
> With shallow copy, we can use the copyObject API instead of the multipart 
> copy API if we do not change the object storage type or encryption type and 
> the source object was uploaded by the Put / Multipart Upload API.
> We will try the copyObject API and check the result. If shallow copy is 
> disabled for this object, then we will use multipart copy. So, I will remove 
> the fs.oss.multipart.upload.threshold configuration.
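
For illustration, a sketch of that control flow; copyObject() and 
multipartCopy() are hypothetical stand-ins for the store operations, not the 
actual AliyunOSSFileSystemStore methods:

{code}
void copyFile(String srcKey, String dstKey, long size) throws IOException {
  // try the single-request server-side copy first; with shallow copy the
  // server only copies metadata, regardless of object size
  if (copyObject(srcKey, dstKey)) {
    return;
  }
  // shallow copy was disabled for this object: fall back to multipart copy
  multipartCopy(srcKey, dstKey, size);
}
{code}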






[jira] [Commented] (HADOOP-15229) Add FileSystem builder-based openFile() API to match createFile()

2019-01-03 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16733598#comment-16733598
 ] 

Hadoop QA commented on HADOOP-15229:


| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 20s | Docker mode activated. |
|| Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 24 new or modified test files. |
|| trunk Compile Tests ||
| 0 | mvndep | 1m 5s | Maven dependency ordering for branch |
| +1 | mvninstall | 21m 33s | trunk passed |
| +1 | compile | 15m 32s | trunk passed |
| +1 | checkstyle | 3m 48s | trunk passed |
| +1 | mvnsite | 5m 20s | trunk passed |
| +1 | shadedclient | 22m 7s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 8m 18s | trunk passed |
| +1 | javadoc | 3m 54s | trunk passed |
|| Patch Compile Tests ||
| 0 | mvndep | 0m 19s | Maven dependency ordering for patch |
| +1 | mvninstall | 4m 7s | the patch passed |
| +1 | compile | 14m 56s | the patch passed |
| +1 | javac | 14m 56s | root generated 0 new + 1488 unchanged - 2 fixed = 1488 total (was 1490) |
| -0 | checkstyle | 3m 41s | root: The patch generated 22 new + 1097 unchanged - 3 fixed = 1119 total (was 1100) |
| +1 | mvnsite | 5m 18s | the patch passed |
| -1 | whitespace | 0m 0s | The patch has 144 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply |
| -1 | whitespace | 0m 3s | The patch has 1 line(s) with tabs. |
| +1 | xml | 0m 3s | The patch has no ill-formed XML file. |
| +1 | shadedclient | 12m 17s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 9m 7s | the patch passed |
| +1 | javadoc | 3m 55s | the patch passed |
|| Other Tests ||
| -1 | unit | 8m 24s | hadoop-common in the patch failed. |
| +1 | unit | 1m 49s | hadoop-hdfs-client in the patch passed. |
| -1 | unit | 95m 17s | hadoop-hdfs in the patch failed. |
| +1 | unit | 4m 43s | hadoop-mapreduce-client-core in the patch passed. |
| -1 | unit | 6m 32s | hadoop-streaming in the patch failed. |
| +1 | unit | 4m 36s | hadoop-aws in the patch passed. |
| +1 | asflicense |

[jira] [Commented] (HADOOP-15819) S3A integration test failures: FileSystem is closed! - without parallel test run

2019-01-03 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16733557#comment-16733557
 ] 

Steve Loughran commented on HADOOP-15819:
-----------------------------------------

I think we should write something down about "effective use of FS instances", 
so yes, a new JIRA please.

I wonder what we should say:

# tests are fastest if they can recycle the existing FS instance from the same 
JVM
# if you do that, you MUST NOT close it
# if you want a guarantee of 100% isolation, or an instance with a unique 
config, create a new instance
# which you MUST close in teardown to avoid leakage of thread pools &c. (see 
the sketch below)

+ what you've proposed ("do not add..."), with the other point being "only do 
this if you are trying to get unmodified code to pick up your custom instance, 
such as when playing with mocks"
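
A minimal sketch of points 3 and 4, using FileSystem.newInstance() to bypass 
the JVM-wide FileSystem cache (the config key is just an example of a unique 
per-test setting):

{code}
Configuration conf = new Configuration();
conf.set("fs.s3a.readahead.range", "1M");   // example unique config
FileSystem fs = FileSystem.newInstance(new URI("s3a://example-bucket/"), conf);
try {
  // ... test body ...
} finally {
  fs.close();   // safe: this instance was never shared through the cache
}
{code}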



> S3A integration test failures: FileSystem is closed! - without parallel test 
> run
> ------------------------------------------------------------------------------
>
> Key: HADOOP-15819
> URL: https://issues.apache.org/jira/browse/HADOOP-15819
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 3.1.1
>Reporter: Gabor Bota
>Assignee: Adam Antal
>Priority: Critical
> Attachments: HADOOP-15819.000.patch, HADOOP-15819.001.patch, 
> HADOOP-15819.002.patch, S3ACloseEnforcedFileSystem.java, 
> S3ACloseEnforcedFileSystem.java, closed_fs_closers_example_5klines.log.zip
>
>
> Running the integration tests for hadoop-aws {{mvn -Dscale verify}} against 
> Amazon AWS S3 (eu-west-1, us-west-1, with no s3guard) we see a lot of these 
> failures:
> {noformat}
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.408 
> s <<< FAILURE! - in 
> org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob
> [ERROR] 
> testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob)
>   Time elapsed: 0.027 s  <<< ERROR!
> java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed!
> [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 4.345 
> s <<< FAILURE! - in 
> org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob
> [ERROR] 
> testStagingDirectory(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob)
>   Time elapsed: 0.021 s  <<< ERROR!
> java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed!
> [ERROR] 
> testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob)
>   Time elapsed: 0.022 s  <<< ERROR!
> java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed!
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.489 
> s <<< FAILURE! - in 
> org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest
> [ERROR] 
> testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest)
>   Time elapsed: 0.023 s  <<< ERROR!
> java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed!
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.695 
> s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob
> [ERROR] testMRJob(org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob)  
> Time elapsed: 0.039 s  <<< ERROR!
> java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed!
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.015 
> s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory
> [ERROR] 
> testEverything(org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory)  
> Time elapsed: 0.014 s  <<< ERROR!
> java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed!
> {noformat}
> The big issue is that the tests are running in a serial manner - no test is 
> running on top of another - so we should not see the tests failing 
> like this. The issue could be in how we handle 
> org.apache.hadoop.fs.FileSystem#CACHE - the tests should use the same 
> S3AFileSystem, so if test A uses a FileSystem and closes it in teardown, then 
> test B will get the same FileSystem object from the cache and try to use it, 
> but it is closed.
> We see this a lot in our downstream testing too. It's not possible to tell 
> whether a failed regression test result is an implementation issue in the 
> runtime code or a test implementation problem. 
> I've checked when and what closes the S3AFileSystem with a slightly modified 
> version of S3AFileSystem which logs the closers of the fs if an error occurs. 
> I'll attach this modified java file for reference. See the next example of 
> the result when it's running:
> {noformat}
> 2018-10-04 00:52:25,596 [Thread-4201] ERROR s3a.S3ACloseEnforcedFileSystem 
> (S3ACloseEnforcedF

[jira] [Assigned] (HADOOP-9359) Add Windows build and unit test to test-patch pre-commit testing

2019-01-03 Thread Matt Foley (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-9359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley reassigned HADOOP-9359:
----------------------------------

Assignee: (was: Matt Foley)

> Add Windows build and unit test to test-patch pre-commit testing
> ----------------------------------------------------------------
>
> Key: HADOOP-9359
> URL: https://issues.apache.org/jira/browse/HADOOP-9359
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: native
>Reporter: Matt Foley
>Priority: Major
>
> The "test-patch" utility is triggered by "Patch Available" state in Jira, and 
> runs nine different sets of builds, tests, and static analysis tools.  
> Currently only the Linux environment is tested.  Need to add tests for Java 
> build under Windows, and unit test execution under Windows.
> At this time, the community has decided that "-1" on these new additional 
> tests shall not block commits to the code base.  However, contributors and 
> code reviewers are encouraged to utilize the information provided by these 
> tests to help keep Hadoop cross-platform compatible.  Modify 
> http://wiki.apache.org/hadoop/HowToContribute to document this.






[jira] [Resolved] (HADOOP-9359) Add Windows build and unit test to test-patch pre-commit testing

2019-01-03 Thread Matt Foley (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-9359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley resolved HADOOP-9359.

Resolution: Won't Fix

Closed due to evident lack of interest.

> Add Windows build and unit test to test-patch pre-commit testing
> ----------------------------------------------------------------
>
> Key: HADOOP-9359
> URL: https://issues.apache.org/jira/browse/HADOOP-9359
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: native
>Reporter: Matt Foley
>Priority: Major
>
> The "test-patch" utility is triggered by "Patch Available" state in Jira, and 
> runs nine different sets of builds, tests, and static analysis tools.  
> Currently only the Linux environment is tested.  Need to add tests for Java 
> build under Windows, and unit test execution under Windows.
> At this time, the community has decided that "-1" on these new additional 
> tests shall not block commits to the code base.  However, contributors and 
> code reviewers are encouraged to utilize the information provided by these 
> tests to help keep Hadoop cross-platform compatible.  Modify 
> http://wiki.apache.org/hadoop/HowToContribute to document this.






[jira] [Assigned] (HADOOP-7158) Reduce RPC packet size for homogeneous arrays, such as the array responses to listStatus() and getBlockLocations()

2019-01-03 Thread Matt Foley (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-7158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley reassigned HADOOP-7158:
----------------------------------

Assignee: (was: Matt Foley)

> Reduce RPC packet size for homogeneous arrays, such as the array responses to 
> listStatus() and getBlockLocations()
> --------------------------------------------------------------------------
>
> Key: HADOOP-7158
> URL: https://issues.apache.org/jira/browse/HADOOP-7158
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: io
>Affects Versions: 0.22.0
>Reporter: Matt Foley
>Priority: Major
>
> While commenting on HADOOP-6949, which proposes a big improvement in the RPC 
> wire format for arrays of primitives, Konstantin Shvachko said:
> "Can/should we extend this to arrays of non-primitive types? This should 
> benefit return types for calls like listStatus() and getBlockLocations() on a 
> large directory."
> The improvement for primitive arrays is based on not type-labeling every 
> element in the array, so the array in question must be strictly homogeneous; 
> it cannot have subtypes of the assignable type.  For instance, it could not 
> be applied to heartbeat responses of DatanodeCommand[], whose array elements 
> carry subtypes of DatanodeCommand, each of which must be type-labeled 
> independently.  However, as Konstantin points out, it could really help 
> lengthy response arrays for things like listStatus() and getBlockLocations().
> I will attach a prototype implementation to this Jira, for discussion.  
> However, since it can't be automatically applied to all arrays passing 
> through RPC, I'll just provide the wrapper type.  By using it, a caller is 
> asserting that the array is strictly homogeneous in the above sense.
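
For illustration, a hypothetical wrapper of this kind (not the attached 
prototype): by constructing it, the caller asserts strict homogeneity, so the 
wire format can emit a single type label for the whole array.

{code}
public final class HomogeneousArray<T> {
  private final Class<T> elementType;
  private final T[] elements;

  public HomogeneousArray(Class<T> elementType, T[] elements) {
    for (T e : elements) {
      if (e != null && e.getClass() != elementType) {
        throw new IllegalArgumentException(
            "not strictly homogeneous: " + e.getClass());
      }
    }
    this.elementType = elementType;
    this.elements = elements;
  }

  public Class<T> getElementType() { return elementType; }
  public T[] get() { return elements; }
}
{code}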






[jira] [Resolved] (HADOOP-7809) Backport HADOOP-5839 to 0.20-security - fixes to ec2 scripts to allow remote job submission

2019-01-03 Thread Matt Foley (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-7809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley resolved HADOOP-7809.

Resolution: Won't Do

Aged out.

> Backport HADOOP-5839 to 0.20-security - fixes to ec2 scripts to allow remote 
> job submission
> ---------------------------------------------------------------------------
>
> Key: HADOOP-7809
> URL: https://issues.apache.org/jira/browse/HADOOP-7809
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: contrib/cloud
>Reporter: Joydeep Sen Sarma
>Priority: Major
> Attachments: ASF.LICENSE.NOT.GRANTED--hadoop-5839.2.patch
>
>
> The fix for HADOOP-5839 was committed to 0.21 more than a year ago.  This bug 
> is to backport the change (which is only 14 lines) to branch-0.20-security.
> ===
> Original description:
> I would very much like the option of submitting jobs from a workstation 
> outside EC2 to a Hadoop cluster in EC2. This has been explored here:
> http://www.nabble.com/public-IP-for-datanode-on-EC2-tt19336240.html
> The net result is that we can make this work (along with using a 
> socks proxy) with a couple of changes in the ec2 scripts:
> a) use the public 'hostname' for the fs.default.name setting (instead of the 
> private hostname being used currently)
> b) mark hadoop.rpc.socket.factory.class.default as a final variable in the 
> generated hadoop-site.xml (that applies to the server side)
> #a has no downside as far as I can tell, since public hostnames resolve to 
> internal/private IP addresses within EC2 (so traffic is optimally routed).
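
For illustration, a hadoop-site.xml fragment covering (a) and (b); the 
hostname and factory class are placeholders, and <final>true</final> prevents 
client-side overrides:

{code}
<property>
  <name>fs.default.name</name>
  <value>hdfs://ec2-203-0-113-7.compute-1.amazonaws.com:8020/</value>
</property>
<property>
  <name>hadoop.rpc.socket.factory.class.default</name>
  <value>org.apache.hadoop.net.StandardSocketFactory</value>
  <final>true</final>
</property>
{code}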






[jira] [Assigned] (HADOOP-7809) Backport HADOOP-5839 to 0.20-security - fixes to ec2 scripts to allow remote job submission

2019-01-03 Thread Matt Foley (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-7809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley reassigned HADOOP-7809:
----------------------------------

Assignee: (was: Matt Foley)

> Backport HADOOP-5839 to 0.20-security - fixes to ec2 scripts to allow remote 
> job submission
> ---------------------------------------------------------------------------
>
> Key: HADOOP-7809
> URL: https://issues.apache.org/jira/browse/HADOOP-7809
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: contrib/cloud
>Reporter: Joydeep Sen Sarma
>Priority: Major
> Attachments: ASF.LICENSE.NOT.GRANTED--hadoop-5839.2.patch
>
>
> The fix for HADOOP-5839 was committed to 0.21 more than a year ago.  This bug 
> is to backport the change (which is only 14 lines) to branch-0.20-security.
> ===
> Original description:
> I would very much like the option of submitting jobs from a workstation 
> outside EC2 to a Hadoop cluster in EC2. This has been explored here:
> http://www.nabble.com/public-IP-for-datanode-on-EC2-tt19336240.html
> The net result is that we can make this work (along with using a 
> socks proxy) with a couple of changes in the ec2 scripts:
> a) use the public 'hostname' for the fs.default.name setting (instead of the 
> private hostname being used currently)
> b) mark hadoop.rpc.socket.factory.class.default as a final variable in the 
> generated hadoop-site.xml (that applies to the server side)
> #a has no downside as far as I can tell, since public hostnames resolve to 
> internal/private IP addresses within EC2 (so traffic is optimally routed).






[jira] [Commented] (HADOOP-15229) Add FileSystem builder-based openFile() API to match createFile()

2019-01-03 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16733308#comment-16733308
 ] 

Steve Loughran commented on HADOOP-15229:
-----------------------------------------

h2. Aaron's comments

thanks for these: will address all the little changes.

bq. Exceptions in the openFile vs Future.get. 

I'm leaving the choice open; some will fail fast (link resolution failure in 
FileContext, bad must() option). Raising path handle support as unsupported 
falls out of how the base implementation works...subclasses may want to fail 
fast.
It's a tradeoff, amplified by how IOExceptions can't be directly thrown in a 
future or follow-on lambda expression. Delaying the IO in the base class gets 
client code used to the fact that even FNFEs don't surface until later (so it 
helps line people up for slower object store IO). But you end up playing games 
with remapping exceptions - games you were going to have to play anyway.

I did consider having the future actually return the IOE as part of the 
signature, e.g. a result of type

{code}
CompletableFuture<Result<FSDataInputStream>>
{code}

or some special tuple of IO and result:

{code}
final class Result<T> {
  private final Optional<T> result;
  private final IOException exception;

  private Result(Optional<T> result, IOException exception) {
    this.result = result;
    this.exception = exception;
  }

  // get or throw
  public T get() throws IOException {
    if (exception != null) throw exception;
    return result.get();
  }

  public Optional<T> result() { return result; }
  public IOException exception() { return exception; }

  public static <T> Result<T> of(T t) {
    return new Result<>(Optional.of(t), null);
  }

  public static <T> Result<T> thrown(IOException e) {
    return new Result<>(Optional.empty(), e);
  }
  // ...etc
}
{code}
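
A hypothetical usage fragment for that tuple; openFileAsync() is a placeholder:

{code}
CompletableFuture<Result<FSDataInputStream>> future = openFileAsync(path);
FSDataInputStream in = future.join().get();   // get() rethrows the stored IOE
{code}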

This would be guaranteed to either have a result, or an exception. But I 
couldn't see how easily this would chain together across futures; you'd still 
be catching and setting everywhere: I need more practise working with the 
completable future stuff before thinking about what would make a good model 
here. I don't want to do some grand project which is unusable. Plus there are 
some other libraries which may do it better, and I haven't done that research. 
After all, if I did try to do this "properly", I would want it standalone 
anyway.

bq. On the current seek() implementation of the Select input stream, what are 
the next enhancements you think we will need? Can you elaborate a bit on the 
need for single-byte reads as a seek implementation? Is it a limitation of the 
underlying AWS stream or SELECT rest API?

you can't do a seek to an offset in the file, because the results are coming in 
dynamically from a POST; there's no GET for a content length. Which brings it 
down to: do you skip() or read-and-discard. The trouble with skip is I'm not 
sure about all its failure modes here. {{skip(count)}} can return a value < 
{{count}} and you are left wondering what to do? Keep retrying until total == 
count? Now I know of a way to check for end of stream/errors in the select, 
that may be possible.

Seek() will occur in real life, as TextFileFormat will always split if there's 
no codec. If you bind to the no-op codec (which you need for gz & bz2 files 
anyway), then the split doesn't happen. Really, though, you'd need a way to turn 
text file splits off entirely even if there is no codec. It's wasteful to split 
an S3 Select dataset.

Note: the latest patch brings positioned read() into sync with seek, so that 
you can do a forward read(position), just not a backwards one, and it updates 
the getPos() value afterwards. This will support any code which does 
positionedRead() with skips, as long as it never goes backwards or expects the 
seek position to be unchanged after.
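
For illustration, what that permits on the stream (in is the select stream):

{code}
byte[] buf = new byte[1024];
int n = in.read(4096, buf, 0, buf.length);   // forward read(position): allowed
// in.read(0, buf, 0, buf.length) would now fail with a path IOException
// instead of seeking backwards
{code}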

BTW, I don't see any tests in {{AbstractContractSeekTest}} which do a read 
fully backwards after a previous seek/read or a previous positioned read. Will 
add that as the contract test and then copy it into a select test where we do 
expect it to fail with a path IOE. Because right now, those contract tests 
won't find the failure condition I'm creating here. 


h3. failures in {{ITestS3ATemporaryCredentials}}

HADOOP-14556 does a lot there with setting up STS; don't worry about it here. 
Do check then though. I think the builder 


h2. Sameer Choudhary's comments

good point about malformed CSV, will need to create a test there.

Latest iteration will see a read() raise EOFException and then only map to -1 
if the end stream event was returned.
{code}
try {
  byteRead = once("read()", uri, () -> wrappedStream.read());
} catch (EOFException e) {
  // this could be one of: end of file, some IO failure
  if (completedSuccessfully.get()) {
    // read was successful
    return -1;
  } else {
    // the stream closed prematurely
    LOG.info("Reading of S3 Select data from {} failed before all results "
        + "were generated.", uri);
    streamStatistics.readException();
    throw new PathIOException(uri,
        "Stream closed before all results were generated");
  }
}
{code}

[jira] [Updated] (HADOOP-15229) Add FileSystem builder-based openFile() API to match createFile()

2019-01-03 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15229:

Status: Patch Available  (was: Open)

> Add FileSystem builder-based openFile() API to match createFile()
> ------------------------------------------------------------------
>
> Key: HADOOP-15229
> URL: https://issues.apache.org/jira/browse/HADOOP-15229
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs, fs/azure, fs/s3
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: HADOOP-15229-001.patch, HADOOP-15229-002.patch, 
> HADOOP-15229-003.patch, HADOOP-15229-004.patch, HADOOP-15229-004.patch, 
> HADOOP-15229-005.patch, HADOOP-15229-006.patch, HADOOP-15229-007.patch, 
> HADOOP-15229-009.patch, HADOOP-15229-010.patch, HADOOP-15229-011.patch, 
> HADOOP-15229-012.patch, HADOOP-15229-013.patch, HADOOP-15229-014.patch, 
> HADOOP-15229-015.patch
>
>
> Replicate HDFS-1170 and HADOOP-14365 with an API to open files.
> A key requirement of this is not HDFS, it's to put in the fadvise policy for 
> working with object stores, where getting the decision to do a full GET and 
> TCP abort on seek vs smaller GETs is fundamentally different: the wrong 
> option can cost you minutes. S3A and Azure both have adaptive policies now 
> (first backward seek), but they still don't do it that well.
> Columnar formats (ORC, Parquet) should be able to say "fs.input.fadvise" 
> "random" as an option when they open files; I can imagine other options too.
> The Builder model of [~eddyxu] is the one to mimic, method for method. 
> Ideally with as much code reuse as possible






[jira] [Commented] (HADOOP-15229) Add FileSystem builder-based openFile() API to match createFile()

2019-01-03 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16733319#comment-16733319
 ] 

Steve Loughran commented on HADOOP-15229:
-----------------------------------------

HADOOP-15229 patch 015:

* WriteOperationsHelper contains the code to build/execute the S3 request (not 
S3AFileSystem); SelectBinding takes that as a constructor parameter and has 
pulled in most of the select setup code from S3AFS. 
* SelectInputStream handles progress callbacks from the select request; read() 
distinguishes EOF on error from EOF on completion
* SelectInputStream's positioned readable supports forward positioned reads 
too, though it increments the getPos() counter (documented). Tests for that
* Tests for selecting on empty files. Answer: you get an empty result. 
* various minor review comments

Tested: all select tests (inc scale) w/ S3 Ireland; kicking off the full suite 
now

> Add FileSystem builder-based openFile() API to match createFile()
> ------------------------------------------------------------------
>
> Key: HADOOP-15229
> URL: https://issues.apache.org/jira/browse/HADOOP-15229
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs, fs/azure, fs/s3
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: HADOOP-15229-001.patch, HADOOP-15229-002.patch, 
> HADOOP-15229-003.patch, HADOOP-15229-004.patch, HADOOP-15229-004.patch, 
> HADOOP-15229-005.patch, HADOOP-15229-006.patch, HADOOP-15229-007.patch, 
> HADOOP-15229-009.patch, HADOOP-15229-010.patch, HADOOP-15229-011.patch, 
> HADOOP-15229-012.patch, HADOOP-15229-013.patch, HADOOP-15229-014.patch, 
> HADOOP-15229-015.patch
>
>
> Replicate HDFS-1170 and HADOOP-14365 with an API to open files.
> A key requirement of this is not HDFS, it's to put in the fadvise policy for 
> working with object stores, where getting the decision to do a full GET and 
> TCP abort on seek vs smaller GETs is fundamentally different: the wrong 
> option can cost you minutes. S3A and Azure both have adaptive policies now 
> (first backward seek), but they still don't do it that well.
> Columnar formats (ORC, Parquet) should be able to say "fs.input.fadvise" 
> "random" as an option when they open files; I can imagine other options too.
> The Builder model of [~eddyxu] is the one to mimic, method for method. 
> Ideally with as much code reuse as possible






[jira] [Updated] (HADOOP-15229) Add FileSystem builder-based openFile() API to match createFile()

2019-01-03 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15229:

Status: Open  (was: Patch Available)

> Add FileSystem builder-based openFile() API to match createFile()
> -
>
> Key: HADOOP-15229
> URL: https://issues.apache.org/jira/browse/HADOOP-15229
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs, fs/azure, fs/s3
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: HADOOP-15229-001.patch, HADOOP-15229-002.patch, 
> HADOOP-15229-003.patch, HADOOP-15229-004.patch, HADOOP-15229-004.patch, 
> HADOOP-15229-005.patch, HADOOP-15229-006.patch, HADOOP-15229-007.patch, 
> HADOOP-15229-009.patch, HADOOP-15229-010.patch, HADOOP-15229-011.patch, 
> HADOOP-15229-012.patch, HADOOP-15229-013.patch, HADOOP-15229-014.patch, 
> HADOOP-15229-015.patch
>
>
> Replicate HDFS-1170 and HADOOP-14365 with an API to open files.
> The key requirement here is not HDFS; it is passing in the fadvise policy
> when working with object stores, where choosing between a full GET with a
> TCP abort on seek and a series of smaller GETs is fundamentally different:
> the wrong option can cost you minutes. S3A and Azure both have adaptive
> policies now (switching on the first backward seek), but they still don't
> do it that well.
> Columnar formats (ORC, Parquet) should be able to set "fs.input.fadvise" =
> "random" as an option when they open files; I can imagine other options too.
> The builder model of [~eddyxu] is the one to mimic, method for method,
> ideally with as much code reuse as possible.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15229) Add FileSystem builder-based openFile() API to match createFile()

2019-01-03 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15229:

Attachment: HADOOP-15229-015.patch

> Add FileSystem builder-based openFile() API to match createFile()
> -
>
> Key: HADOOP-15229
> URL: https://issues.apache.org/jira/browse/HADOOP-15229
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs, fs/azure, fs/s3
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: HADOOP-15229-001.patch, HADOOP-15229-002.patch, 
> HADOOP-15229-003.patch, HADOOP-15229-004.patch, HADOOP-15229-004.patch, 
> HADOOP-15229-005.patch, HADOOP-15229-006.patch, HADOOP-15229-007.patch, 
> HADOOP-15229-009.patch, HADOOP-15229-010.patch, HADOOP-15229-011.patch, 
> HADOOP-15229-012.patch, HADOOP-15229-013.patch, HADOOP-15229-014.patch, 
> HADOOP-15229-015.patch
>
>
> Replicate HDFS-1170 and HADOOP-14365 with an API to open files.
> The key requirement here is not HDFS; it is passing in the fadvise policy
> when working with object stores, where choosing between a full GET with a
> TCP abort on seek and a series of smaller GETs is fundamentally different:
> the wrong option can cost you minutes. S3A and Azure both have adaptive
> policies now (switching on the first backward seek), but they still don't
> do it that well.
> Columnar formats (ORC, Parquet) should be able to set "fs.input.fadvise" =
> "random" as an option when they open files; I can imagine other options too.
> The builder model of [~eddyxu] is the one to mimic, method for method,
> ideally with as much code reuse as possible.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HADOOP-15997) KMS client uses wrong UGI after HADOOP-14445

2019-01-03 Thread Wei-Chiu Chuang (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16733278#comment-16733278
 ] 

Wei-Chiu Chuang edited comment on HADOOP-15997 at 1/3/19 5:41 PM:
--

The test failure is due to HADOOP-16016, unrelated to this patch. It's 
triggered by a change in the latest JDK8. So we're all clear.


was (Author: jojochuang):
The test failure is due to HADOOP-16016, unrelated to this patch. It's 
triggered by a change in the latest JDK8.

> KMS client uses wrong UGI after HADOOP-14445
> 
>
> Key: HADOOP-15997
> URL: https://issues.apache.org/jira/browse/HADOOP-15997
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: kms
>Affects Versions: 3.2.0, 3.0.4, 3.1.2
> Environment: Hadoop 3.0.x (CDH6.x), Kerberized, HDFS at-rest 
> encryption, multiple KMS
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Blocker
> Attachments: HADOOP-15997.001.patch, HADOOP-15997.02.patch
>
>
> After HADOOP-14445, the KMS client always authenticates itself using the
> credentials of the login user, rather than the current user.
> {noformat}
> 2018-12-07 15:58:30,663 DEBUG [main] 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider: Using loginUser when 
> Kerberos is enabled but the actual user does not have either KMS Delegation 
> Token or Kerberos Credentials
> {noformat}
> The log message {{"Using loginUser when Kerberos is enabled but the actual 
> user does not have either KMS Delegation Token or Kerberos Credentials"}} is 
> printed because {{KMSClientProvider#containsKmsDt()}} is null even though
> the user definitely has the KMS delegation token.
> In fact, {{KMSClientProvider#containsKmsDt()}} should select the delegation
> token using {{clientTokenProvider.selectDelegationToken(creds)}} rather than
> checking whether its dtService is in the user's credentials.
> This is done correctly in {{KMSClientProvider#createAuthenticatedURL}} though.
> We found this bug when it broke Cloudera's Backup and Disaster Recovery tool.
>  
> [~daryn] [~xiaochen] mind taking a look? HADOOP-14445 is a huge patch but it 
> is almost perfect except for this bug.
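A minimal sketch of the proposed check, assuming a {{clientTokenProvider}}
field as named above; the signatures are illustrative, not the actual
KMSClientProvider code:

{code:java}
// Illustrative sketch only: select the KMS delegation token through the
// token selector instead of probing for a dtService entry by hand.
private boolean containsKmsDt(UserGroupInformation ugi) throws IOException {
  Credentials creds = ugi.getCredentials();
  if (creds.numberOfTokens() == 0) {
    // No tokens at all, so certainly no KMS delegation token.
    return false;
  }
  Token<?> token = clientTokenProvider.selectDelegationToken(creds);
  return token != null;
}
{code}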



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15997) KMS client uses wrong UGI after HADOOP-14445

2019-01-03 Thread Wei-Chiu Chuang (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16733278#comment-16733278
 ] 

Wei-Chiu Chuang commented on HADOOP-15997:
--

The test failure is due to HADOOP-16016, unrelated to this patch. It's 
triggered by a change in the latest JDK8.

> KMS client uses wrong UGI after HADOOP-14445
> 
>
> Key: HADOOP-15997
> URL: https://issues.apache.org/jira/browse/HADOOP-15997
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: kms
>Affects Versions: 3.2.0, 3.0.4, 3.1.2
> Environment: Hadoop 3.0.x (CDH6.x), Kerberized, HDFS at-rest 
> encryption, multiple KMS
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Blocker
> Attachments: HADOOP-15997.001.patch, HADOOP-15997.02.patch
>
>
> After HADOOP-14445, the KMS client always authenticates itself using the
> credentials of the login user, rather than the current user.
> {noformat}
> 2018-12-07 15:58:30,663 DEBUG [main] 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider: Using loginUser when 
> Kerberos is enabled but the actual user does not have either KMS Delegation 
> Token or Kerberos Credentials
> {noformat}
> The log message {{"Using loginUser when Kerberos is enabled but the actual 
> user does not have either KMS Delegation Token or Kerberos Credentials"}} is 
> printed because {{KMSClientProvider#containsKmsDt()}} is null even though
> the user definitely has the KMS delegation token.
> In fact, {{KMSClientProvider#containsKmsDt()}} should select the delegation
> token using {{clientTokenProvider.selectDelegationToken(creds)}} rather than
> checking whether its dtService is in the user's credentials.
> This is done correctly in {{KMSClientProvider#createAuthenticatedURL}} though.
> We found this bug when it broke Cloudera's Backup and Disaster Recovery tool.
>  
> [~daryn] [~xiaochen] mind taking a look? HADOOP-14445 is a huge patch but it 
> is almost perfect except for this bug.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15938) [JDK 11] hadoop-annotations build fails with 'Failed to check signatures'

2019-01-03 Thread Dinesh Chitlangia (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Chitlangia updated HADOOP-15938:
---
Affects Version/s: 0.3.2

> [JDK 11] hadoop-annotations build fails with 'Failed to check signatures'
> -
>
> Key: HADOOP-15938
> URL: https://issues.apache.org/jira/browse/HADOOP-15938
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: build
>Affects Versions: 0.3.2
> Environment: openjdk version "11" 2018-09-25
>Reporter: Devaraj K
>Assignee: Dinesh Chitlangia
>Priority: Major
> Attachments: HADOOP-15938.001.patch
>
>
> {code:xml}
> [INFO] Checking unresolved references to 
> org.codehaus.mojo.signature:java18:1.0
> [ERROR] Bad class file 
> /hadoop/hadoop-common-project/hadoop-annotations/target/classes/org/apache/hadoop/classification/InterfaceAudience.class
> {code}
> {code:xml}
> [ERROR] Failed to execute goal 
> org.codehaus.mojo:animal-sniffer-maven-plugin:1.16:check (signature-check) on 
> project hadoop-annotations: Failed to check signatures: Bad class file 
> /hadoop/hadoop-common-project/hadoop-annotations/target/classes/org/apache/hadoop/classification/InterfaceAudience.class:
>  IllegalArgumentException -> [Help 1]
> org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute 
> goal org.codehaus.mojo:animal-sniffer-maven-plugin:1.16:check 
> (signature-check) on project hadoop-annotations: Failed to check signatures
> at org.apache.maven.lifecycle.internal.MojoExecutor.execute 
> (MojoExecutor.java:213)
> {code}
> {code:xml}
> Caused by: java.io.IOException: Bad class file 
> /hadoop/hadoop-common-project/hadoop-annotations/target/classes/org/apache/hadoop/classification/InterfaceAudience.class
> at org.codehaus.mojo.animal_sniffer.ClassListBuilder.process 
> (ClassListBuilder.java:91)
> {code}
> {code:xml}
> Caused by: java.lang.IllegalArgumentException
> at org.objectweb.asm.ClassReader. (Unknown Source)
> at org.objectweb.asm.ClassReader. (Unknown Source)
> at org.objectweb.asm.ClassReader. (Unknown Source)
> at org.codehaus.mojo.animal_sniffer.ClassListBuilder.process 
> (ClassListBuilder.java:69)
> {code}
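One plausible direction, sketched below: move to an animal-sniffer release
whose bundled ASM understands the Java 11 class file format (major version
55). The version number here is an assumption, to be verified against the
plugin's release notes:

{code:xml}
<!-- Illustrative only: a newer animal-sniffer ships a newer ASM that can
     parse Java 11 class files; confirm the exact version before use. -->
<plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>animal-sniffer-maven-plugin</artifactId>
  <version>1.17</version>
</plugin>
{code}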



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HADOOP-15819) S3A integration test failures: FileSystem is closed! - without parallel test run

2019-01-03 Thread Gabor Bota (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16733167#comment-16733167
 ] 

Gabor Bota edited comment on HADOOP-15819 at 1/3/19 3:52 PM:
-

We could extend the docs with something like "Do not add FileSystem instances
that will be modified during the test runs (with e.g.
org.apache.hadoop.fs.FileSystem#addFileSystemForTesting) to the cache. This
can cause other tests to fail when using the same modified or closed FS
instance. For more details see HADOOP-15819."

Should I create a new jira for this and upload a patch?


was (Author: gabor.bota):
We could extend the docs with something like "Do not add FileSystem instances 
to the cache that will be modified during the test runs with e.g 
org.apache.hadoop.fs.FileSystem#addFileSystemForTesting. This can cause other 
tests to fail when using the same modified or closed FS instance. For more 
details see HADOOP-15819."

Should I create a new jira for this and upload a patch?

> S3A integration test failures: FileSystem is closed! - without parallel test 
> run
> 
>
> Key: HADOOP-15819
> URL: https://issues.apache.org/jira/browse/HADOOP-15819
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 3.1.1
>Reporter: Gabor Bota
>Assignee: Adam Antal
>Priority: Critical
> Attachments: HADOOP-15819.000.patch, HADOOP-15819.001.patch, 
> HADOOP-15819.002.patch, S3ACloseEnforcedFileSystem.java, 
> S3ACloseEnforcedFileSystem.java, closed_fs_closers_example_5klines.log.zip
>
>
> Running the integration tests for hadoop-aws {{mvn -Dscale verify}} against 
> Amazon AWS S3 (eu-west-1, us-west-1, with no s3guard) we see a lot of these 
> failures:
> {noformat}
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.408 
> s <<< FAILURE! - in 
> org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob
> [ERROR] 
> testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob)
>   Time elapsed: 0.027 s  <<< ERROR!
> java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed!
> [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 4.345 
> s <<< FAILURE! - in 
> org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob
> [ERROR] 
> testStagingDirectory(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob)
>   Time elapsed: 0.021 s  <<< ERROR!
> java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed!
> [ERROR] 
> testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob)
>   Time elapsed: 0.022 s  <<< ERROR!
> java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed!
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.489 
> s <<< FAILURE! - in 
> org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest
> [ERROR] 
> testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest)
>   Time elapsed: 0.023 s  <<< ERROR!
> java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed!
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.695 
> s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob
> [ERROR] testMRJob(org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob)  
> Time elapsed: 0.039 s  <<< ERROR!
> java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed!
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.015 
> s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory
> [ERROR] 
> testEverything(org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory)  
> Time elapsed: 0.014 s  <<< ERROR!
> java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed!
> {noformat}
> The big issue is that the tests are running in a serial manner - no test is
> running on top of another - so we should not see the tests failing like
> this. The issue could be in how we handle
> org.apache.hadoop.fs.FileSystem#CACHE - the tests should use the same
> S3AFileSystem, so if test A uses a FileSystem and closes it in teardown,
> then test B will get the same FileSystem object from the cache and try to
> use it, but it is closed.
> We see this a lot in our downstream testing too. It's not possible to tell
> whether a failed regression test points to an implementation issue in the
> runtime code or to a test implementation problem.
> I've checked when and what closes the S3AFileSystem with a slightly modified
> version of S3AFileSystem which logs the closers of the fs if an error should
> occur. I'll attach this modified java file for reference.

[jira] [Commented] (HADOOP-15819) S3A integration test failures: FileSystem is closed! - without parallel test run

2019-01-03 Thread Gabor Bota (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16733167#comment-16733167
 ] 

Gabor Bota commented on HADOOP-15819:
-

We could extend the docs with something like "Do not add FileSystem instances 
to the cache that will be modified during the test runs with e.g.
org.apache.hadoop.fs.FileSystem#addFileSystemForTesting. This can cause other
tests to fail when using the same modified or closed FS instance. For more 
details see HADOOP-15819."

Should I create a new jira for this and upload a patch?
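For reference, a minimal sketch of the failure mode that doc note would warn
about; addFileSystemForTesting is a package-private test hook, and the
sequence below is illustrative:

{code:java}
URI uri = URI.create("s3a://example-bucket/");
Configuration conf = new Configuration();

// Inject an instance into the FileSystem cache (test-only hook, callable
// from tests in the org.apache.hadoop.fs package).
FileSystem injected = new S3AFileSystem();
injected.initialize(uri, conf);
FileSystem.addFileSystemForTesting(uri, conf, injected);

injected.close();   // e.g. in test A's teardown

// Test B now gets the closed instance back from the cache, and any
// operation on it fails with "FileSystem is closed!".
FileSystem fs = FileSystem.get(uri, conf);
fs.getFileStatus(new Path("/"));
{code}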

> S3A integration test failures: FileSystem is closed! - without parallel test 
> run
> 
>
> Key: HADOOP-15819
> URL: https://issues.apache.org/jira/browse/HADOOP-15819
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 3.1.1
>Reporter: Gabor Bota
>Assignee: Adam Antal
>Priority: Critical
> Attachments: HADOOP-15819.000.patch, HADOOP-15819.001.patch, 
> HADOOP-15819.002.patch, S3ACloseEnforcedFileSystem.java, 
> S3ACloseEnforcedFileSystem.java, closed_fs_closers_example_5klines.log.zip
>
>
> Running the integration tests for hadoop-aws {{mvn -Dscale verify}} against 
> Amazon AWS S3 (eu-west-1, us-west-1, with no s3guard) we see a lot of these 
> failures:
> {noformat}
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.408 
> s <<< FAILURE! - in 
> org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob
> [ERROR] 
> testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob)
>   Time elapsed: 0.027 s  <<< ERROR!
> java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed!
> [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 4.345 
> s <<< FAILURE! - in 
> org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob
> [ERROR] 
> testStagingDirectory(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob)
>   Time elapsed: 0.021 s  <<< ERROR!
> java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed!
> [ERROR] 
> testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob)
>   Time elapsed: 0.022 s  <<< ERROR!
> java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed!
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.489 
> s <<< FAILURE! - in 
> org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest
> [ERROR] 
> testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest)
>   Time elapsed: 0.023 s  <<< ERROR!
> java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed!
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.695 
> s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob
> [ERROR] testMRJob(org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob)  
> Time elapsed: 0.039 s  <<< ERROR!
> java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed!
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.015 
> s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory
> [ERROR] 
> testEverything(org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory)  
> Time elapsed: 0.014 s  <<< ERROR!
> java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed!
> {noformat}
> The big issue is that the tests are running in a serial manner - no test is
> running on top of another - so we should not see the tests failing like
> this. The issue could be in how we handle
> org.apache.hadoop.fs.FileSystem#CACHE - the tests should use the same
> S3AFileSystem, so if test A uses a FileSystem and closes it in teardown,
> then test B will get the same FileSystem object from the cache and try to
> use it, but it is closed.
> We see this a lot in our downstream testing too. It's not possible to tell
> whether a failed regression test points to an implementation issue in the
> runtime code or to a test implementation problem.
> I've checked when and what closes the S3AFileSystem with a slightly modified
> version of S3AFileSystem which logs the closers of the fs if an error should 
> occur. I'll attach this modified java file for reference. See the next 
> example of the result when it's running:
> {noformat}
> 2018-10-04 00:52:25,596 [Thread-4201] ERROR s3a.S3ACloseEnforcedFileSystem 
> (S3ACloseEnforcedFileSystem.java:checkIfClosed(74)) - Use after close(): 
> java.lang.RuntimeException: Using closed FS!.
>   at 
> org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem.checkIfClosed(S3ACloseEnforcedFileSystem.java:73)
>   at 
> org.apache.hadoop.fs

[jira] [Commented] (HADOOP-15323) AliyunOSS: Improve copy file performance for AliyunOSSFileSystemStore

2019-01-03 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16733132#comment-16733132
 ] 

Weiwei Yang commented on HADOOP-15323:
--

I've committed the patch to branch-2.9, branch-2, branch-3.0, branch-3.1, 
branch-3.2 and trunk. Thanks for the contribution [~wujinhu].

> AliyunOSS: Improve copy file performance for AliyunOSSFileSystemStore
> -
>
> Key: HADOOP-15323
> URL: https://issues.apache.org/jira/browse/HADOOP-15323
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 2.10.0, 3.2.0, 3.1.1, 2.9.2, 3.0.3, 3.3.0
>Reporter: wujinhu
>Assignee: wujinhu
>Priority: Major
> Fix For: 2.10.0, 3.0.4, 3.3.0, 3.1.2, 3.2.1, 2.9.3
>
> Attachments: HADOOP-15323.001.patch, HADOOP-15323.002.patch, 
> HADOOP-15323.003.patch
>
>
> Aliyun OSS will support shallow copy, which means the server only copies
> metadata when a copy-object operation occurs.
> With shallow copy, we can use the copyObject api instead of the multipart
> copy api, provided we do not change the object storage type or encryption
> type and the source object was uploaded via the Put / Multipart upload api.
> We will try the copyObject api first and check the result. If shallow copy
> is disabled for this object, then we will fall back to multipart copy. So I
> will remove the fs.oss.multipart.upload.threshold configuration.
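As a rough sketch of that strategy - the method names below are assumptions,
not the exact AliyunOSSFileSystemStore signatures:

{code:java}
// Illustrative: try the single-request copy first; with server-side shallow
// copy this is metadata-only, so object size no longer picks the code path.
boolean copyFile(String srcKey, long srcLen, String dstKey) {
  try {
    ossClient.copyObject(bucketName, srcKey, bucketName, dstKey);
    return true;
  } catch (OSSException e) {
    // Shallow copy was refused for this object: fall back to the
    // multipart copy path.
    return multipartCopyFile(srcKey, srcLen, dstKey);
  }
}
{code}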



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15323) AliyunOSS: Improve copy file performance for AliyunOSSFileSystemStore

2019-01-03 Thread Weiwei Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated HADOOP-15323:
-
Fix Version/s: 2.9.3

> AliyunOSS: Improve copy file performance for AliyunOSSFileSystemStore
> -
>
> Key: HADOOP-15323
> URL: https://issues.apache.org/jira/browse/HADOOP-15323
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 2.10.0, 3.2.0, 3.1.1, 2.9.2, 3.0.3, 3.3.0
>Reporter: wujinhu
>Assignee: wujinhu
>Priority: Major
> Fix For: 2.10.0, 3.0.4, 3.3.0, 3.1.2, 3.2.1, 2.9.3
>
> Attachments: HADOOP-15323.001.patch, HADOOP-15323.002.patch, 
> HADOOP-15323.003.patch
>
>
> Aliyun OSS will support shallow copy, which means the server only copies
> metadata when a copy-object operation occurs.
> With shallow copy, we can use the copyObject api instead of the multipart
> copy api, provided we do not change the object storage type or encryption
> type and the source object was uploaded via the Put / Multipart upload api.
> We will try the copyObject api first and check the result. If shallow copy
> is disabled for this object, then we will fall back to multipart copy. So I
> will remove the fs.oss.multipart.upload.threshold configuration.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15323) AliyunOSS: Improve copy file performance for AliyunOSSFileSystemStore

2019-01-03 Thread Weiwei Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated HADOOP-15323:
-
Fix Version/s: 2.10.0

> AliyunOSS: Improve copy file performance for AliyunOSSFileSystemStore
> -
>
> Key: HADOOP-15323
> URL: https://issues.apache.org/jira/browse/HADOOP-15323
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 2.10.0, 3.2.0, 3.1.1, 2.9.2, 3.0.3, 3.3.0
>Reporter: wujinhu
>Assignee: wujinhu
>Priority: Major
> Fix For: 2.10.0, 3.0.4, 3.3.0, 3.1.2, 3.2.1
>
> Attachments: HADOOP-15323.001.patch, HADOOP-15323.002.patch, 
> HADOOP-15323.003.patch
>
>
> Aliyun OSS will support shallow copy, which means the server only copies
> metadata when a copy-object operation occurs.
> With shallow copy, we can use the copyObject api instead of the multipart
> copy api, provided we do not change the object storage type or encryption
> type and the source object was uploaded via the Put / Multipart upload api.
> We will try the copyObject api first and check the result. If shallow copy
> is disabled for this object, then we will fall back to multipart copy. So I
> will remove the fs.oss.multipart.upload.threshold configuration.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15323) AliyunOSS: Improve copy file performance for AliyunOSSFileSystemStore

2019-01-03 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16733098#comment-16733098
 ] 

Hudson commented on HADOOP-15323:
-

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #15692 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/15692/])
HADOOP-15323. AliyunOSS: Improve copy file performance for (wwei: rev 
040a202b202a37f3b922cd321eb0a8ded457d88b)
* (edit) 
hadoop-tools/hadoop-aliyun/src/main/java/org/apache/hadoop/fs/aliyun/oss/AliyunOSSFileSystemStore.java
* (edit) 
hadoop-tools/hadoop-aliyun/src/main/java/org/apache/hadoop/fs/aliyun/oss/AliyunOSSFileSystem.java
* (edit) 
hadoop-tools/hadoop-aliyun/src/test/java/org/apache/hadoop/fs/aliyun/oss/TestAliyunOSSBlockOutputStream.java
* (edit) 
hadoop-tools/hadoop-aliyun/src/test/java/org/apache/hadoop/fs/aliyun/oss/TestAliyunOSSFileSystemContract.java
* (edit) 
hadoop-tools/hadoop-aliyun/src/main/java/org/apache/hadoop/fs/aliyun/oss/AliyunOSSCopyFileTask.java
* (edit) 
hadoop-tools/hadoop-aliyun/src/test/java/org/apache/hadoop/fs/aliyun/oss/TestAliyunOSSFileSystemStore.java
* (edit) 
hadoop-tools/hadoop-aliyun/src/site/markdown/tools/hadoop-aliyun/index.md
* (edit) 
hadoop-tools/hadoop-aliyun/src/test/java/org/apache/hadoop/fs/aliyun/oss/contract/TestAliyunOSSContractDistCp.java


> AliyunOSS: Improve copy file performance for AliyunOSSFileSystemStore
> -
>
> Key: HADOOP-15323
> URL: https://issues.apache.org/jira/browse/HADOOP-15323
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 2.10.0, 3.2.0, 3.1.1, 2.9.2, 3.0.3, 3.3.0
>Reporter: wujinhu
>Assignee: wujinhu
>Priority: Major
> Fix For: 3.0.4, 3.3.0, 3.1.2, 3.2.1
>
> Attachments: HADOOP-15323.001.patch, HADOOP-15323.002.patch, 
> HADOOP-15323.003.patch
>
>
> Aliyun OSS will support shallow copy, which means the server only copies
> metadata when a copy-object operation occurs.
> With shallow copy, we can use the copyObject api instead of the multipart
> copy api, provided we do not change the object storage type or encryption
> type and the source object was uploaded via the Put / Multipart upload api.
> We will try the copyObject api first and check the result. If shallow copy
> is disabled for this object, then we will fall back to multipart copy. So I
> will remove the fs.oss.multipart.upload.threshold configuration.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15323) AliyunOSS: Improve copy file performance for AliyunOSSFileSystemStore

2019-01-03 Thread Weiwei Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated HADOOP-15323:
-
Fix Version/s: 3.0.4

> AliyunOSS: Improve copy file performance for AliyunOSSFileSystemStore
> -
>
> Key: HADOOP-15323
> URL: https://issues.apache.org/jira/browse/HADOOP-15323
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 2.10.0, 3.2.0, 3.1.1, 2.9.2, 3.0.3, 3.3.0
>Reporter: wujinhu
>Assignee: wujinhu
>Priority: Major
> Fix For: 3.0.4, 3.3.0, 3.1.2, 3.2.1
>
> Attachments: HADOOP-15323.001.patch, HADOOP-15323.002.patch, 
> HADOOP-15323.003.patch
>
>
> Aliyun OSS will support shallow copy, which means the server only copies
> metadata when a copy-object operation occurs.
> With shallow copy, we can use the copyObject api instead of the multipart
> copy api, provided we do not change the object storage type or encryption
> type and the source object was uploaded via the Put / Multipart upload api.
> We will try the copyObject api first and check the result. If shallow copy
> is disabled for this object, then we will fall back to multipart copy. So I
> will remove the fs.oss.multipart.upload.threshold configuration.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15323) AliyunOSS: Improve copy file performance for AliyunOSSFileSystemStore

2019-01-03 Thread Weiwei Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated HADOOP-15323:
-
Fix Version/s: 3.1.2

> AliyunOSS: Improve copy file performance for AliyunOSSFileSystemStore
> -
>
> Key: HADOOP-15323
> URL: https://issues.apache.org/jira/browse/HADOOP-15323
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 2.10.0, 3.2.0, 3.1.1, 2.9.2, 3.0.3, 3.3.0
>Reporter: wujinhu
>Assignee: wujinhu
>Priority: Major
> Fix For: 3.3.0, 3.1.2, 3.2.1
>
> Attachments: HADOOP-15323.001.patch, HADOOP-15323.002.patch, 
> HADOOP-15323.003.patch
>
>
> Aliyun OSS will support shallow copy, which means the server only copies
> metadata when a copy-object operation occurs.
> With shallow copy, we can use the copyObject api instead of the multipart
> copy api, provided we do not change the object storage type or encryption
> type and the source object was uploaded via the Put / Multipart upload api.
> We will try the copyObject api first and check the result. If shallow copy
> is disabled for this object, then we will fall back to multipart copy. So I
> will remove the fs.oss.multipart.upload.threshold configuration.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15323) AliyunOSS: Improve copy file performance for AliyunOSSFileSystemStore

2019-01-03 Thread Weiwei Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated HADOOP-15323:
-
Fix Version/s: 3.2.1

> AliyunOSS: Improve copy file performance for AliyunOSSFileSystemStore
> -
>
> Key: HADOOP-15323
> URL: https://issues.apache.org/jira/browse/HADOOP-15323
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 2.10.0, 3.2.0, 3.1.1, 2.9.2, 3.0.3, 3.3.0
>Reporter: wujinhu
>Assignee: wujinhu
>Priority: Major
> Fix For: 3.3.0, 3.2.1
>
> Attachments: HADOOP-15323.001.patch, HADOOP-15323.002.patch, 
> HADOOP-15323.003.patch
>
>
> Aliyun OSS will support shallow copy, which means the server only copies
> metadata when a copy-object operation occurs.
> With shallow copy, we can use the copyObject api instead of the multipart
> copy api, provided we do not change the object storage type or encryption
> type and the source object was uploaded via the Put / Multipart upload api.
> We will try the copyObject api first and check the result. If shallow copy
> is disabled for this object, then we will fall back to multipart copy. So I
> will remove the fs.oss.multipart.upload.threshold configuration.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15323) AliyunOSS: Improve copy file performance for AliyunOSSFileSystemStore

2019-01-03 Thread Weiwei Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated HADOOP-15323:
-
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.3.0
   Status: Resolved  (was: Patch Available)

> AliyunOSS: Improve copy file performance for AliyunOSSFileSystemStore
> -
>
> Key: HADOOP-15323
> URL: https://issues.apache.org/jira/browse/HADOOP-15323
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 2.10.0, 3.2.0, 3.1.1, 2.9.2, 3.0.3, 3.3.0
>Reporter: wujinhu
>Assignee: wujinhu
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HADOOP-15323.001.patch, HADOOP-15323.002.patch, 
> HADOOP-15323.003.patch
>
>
> Aliyun OSS will support shallow copy, which means the server only copies
> metadata when a copy-object operation occurs.
> With shallow copy, we can use the copyObject api instead of the multipart
> copy api, provided we do not change the object storage type or encryption
> type and the source object was uploaded via the Put / Multipart upload api.
> We will try the copyObject api first and check the result. If shallow copy
> is disabled for this object, then we will fall back to multipart copy. So I
> will remove the fs.oss.multipart.upload.threshold configuration.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16018) DistCp won't reassemble chunks when blocks per chunk > 0

2019-01-03 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16732962#comment-16732962
 ] 

Hadoop QA commented on HADOOP-16018:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
17s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 56s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
19s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 48s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 12m 
40s{color} | {color:green} hadoop-distcp in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 65m 13s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | HADOOP-16018 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12953609/HADOOP-16018-002.patch
 |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux fd932b7a6ca3 4.4.0-138-generic #164~14.04.1-Ubuntu SMP Fri Oct 
5 08:56:16 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / cb26f15 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/15720/testReport/ |
| Max. process+thread count | 340 (vs. ulimit of 1) |
| modules | C: hadoop-tools/hadoop-distcp U: hadoop-tools/hadoop-distcp |
| Console output | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/15720/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> DistCp won't reassemble chunks when blocks per chunk > 0
> 
>
> Key: HADOOP-16018

[jira] [Commented] (HADOOP-16018) DistCp won't reassemble chunks when blocks per chunk > 0

2019-01-03 Thread Kai Xie (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16732951#comment-16732951
 ] 

Kai Xie commented on HADOOP-16018:
--

Thanks Steve for the review and the amendment.

Please use `gigi...@gmail.com`; my full name is Kai Xie.

> DistCp won't reassemble chunks when blocks per chunk > 0
> 
>
> Key: HADOOP-16018
> URL: https://issues.apache.org/jira/browse/HADOOP-16018
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 3.2.0, 2.9.2
>Reporter: Kai Xie
>Assignee: Kai Xie
>Priority: Major
> Attachments: HADOOP-16018-002.patch, HADOOP-16018.01.patch
>
>
> I was investigating why hadoop-distcp-2.9.2 won't reassemble chunks of the
> same file when blocks per chunk has been set > 0.
> In CopyCommitter::commitJob, this guard skips reassembling the chunks
> whenever blocksPerChunk is 0:
> {code:java}
> if (blocksPerChunk > 0) {
>   concatFileChunks(conf);
> }
> {code}
> Then in CopyCommitter's ctor, blocksPerChunk is initialised from the config:
> {code:java}
> blocksPerChunk = context.getConfiguration().getInt(
> DistCpOptionSwitch.BLOCKS_PER_CHUNK.getConfigLabel(), 0);
> {code}
>  
> But here DistCpOptionSwitch.BLOCKS_PER_CHUNK.getConfigLabel() will always
> return an empty string, because the enum constant is constructed without a
> config label:
> {code:java}
> BLOCKS_PER_CHUNK("",
> new Option("blocksperchunk", true, "If set to a positive value, files"
> + "with more blocks than this value will be split into chunks of "
> + "<blocksperchunk> blocks to be transferred in parallel, and "
> + "reassembled on the destination. By default, <blocksperchunk> is "
> + "0 and the files will be transmitted in their entirety without "
> + "splitting. This switch is only applicable when the source file "
> + "system implements getBlockLocations method and the target file "
> + "system implements concat method"))
> {code}
> As a result the lookup falls back to the default value 0 for blocksPerChunk,
> which prevents the chunks from being reassembled.
>  
>  
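The fix that follows from this, as a sketch - the label string is an
assumption for illustration, and the help text is abbreviated:

{code:java}
// Illustrative: give the switch a non-empty config label so that
// getConfigLabel() resolves to a real key in the job configuration.
BLOCKS_PER_CHUNK("distcp.blocks.per.chunk",
    new Option("blocksperchunk", true, "If set to a positive value, files"
        + " with more blocks than this value will be split into chunks of"
        + " <blocksperchunk> blocks to be transferred in parallel, and"
        + " reassembled on the destination."))
{code}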



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16025) Update the year to 2019

2019-01-03 Thread Ayush Saxena (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16732913#comment-16732913
 ] 

Ayush Saxena commented on HADOOP-16025:
---

No worries!!!
Thanks [~ste...@apache.org] for the commit and the author credit. :)

> Update the year to 2019
> ---
>
> Key: HADOOP-16025
> URL: https://issues.apache.org/jira/browse/HADOOP-16025
> Project: Hadoop Common
>  Issue Type: Task
>  Components: build
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Fix For: 2.7.8, 3.0.4, 2.8.6, 3.2.1, 2.9.3, 3.1.3
>
> Attachments: HADOOP-16025-01.patch
>
>
> Update the year to 2019 from 2018.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-16018) DistCp won't reassemble chunks when blocks per chunk > 0

2019-01-03 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-16018:

Status: Patch Available  (was: Open)

+1
Attaching the patch I'm going to commit; I moved the comment from a simple
/* */ one to a javadoc one.

Kai, for the commit I want to credit you in the --author tag, for which I
need an email address. What email address do you want me to use?

> DistCp won't reassemble chunks when blocks per chunk > 0
> 
>
> Key: HADOOP-16018
> URL: https://issues.apache.org/jira/browse/HADOOP-16018
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 2.9.2, 3.2.0
>Reporter: Kai X
>Assignee: Kai X
>Priority: Major
> Attachments: HADOOP-16018-002.patch, HADOOP-16018.01.patch
>
>
> I was investigating why hadoop-distcp-2.9.2 won't reassemble chunks of the
> same file when blocks per chunk has been set > 0.
> In CopyCommitter::commitJob, this guard skips reassembling the chunks
> whenever blocksPerChunk is 0:
> {code:java}
> if (blocksPerChunk > 0) {
>   concatFileChunks(conf);
> }
> {code}
> Then in CopyCommitter's ctor, blocksPerChunk is initialised from the config:
> {code:java}
> blocksPerChunk = context.getConfiguration().getInt(
> DistCpOptionSwitch.BLOCKS_PER_CHUNK.getConfigLabel(), 0);
> {code}
>  
> But here DistCpOptionSwitch.BLOCKS_PER_CHUNK.getConfigLabel() will always
> return an empty string, because the enum constant is constructed without a
> config label:
> {code:java}
> BLOCKS_PER_CHUNK("",
> new Option("blocksperchunk", true, "If set to a positive value, files"
> + "with more blocks than this value will be split into chunks of "
> + "<blocksperchunk> blocks to be transferred in parallel, and "
> + "reassembled on the destination. By default, <blocksperchunk> is "
> + "0 and the files will be transmitted in their entirety without "
> + "splitting. This switch is only applicable when the source file "
> + "system implements getBlockLocations method and the target file "
> + "system implements concat method"))
> {code}
> As a result the lookup falls back to the default value 0 for blocksPerChunk,
> which prevents the chunks from being reassembled.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-16018) DistCp won't reassemble chunks when blocks per chunk > 0

2019-01-03 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-16018:

Attachment: HADOOP-16018-002.patch

> DistCp won't reassemble chunks when blocks per chunk > 0
> 
>
> Key: HADOOP-16018
> URL: https://issues.apache.org/jira/browse/HADOOP-16018
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 3.2.0, 2.9.2
>Reporter: Kai X
>Assignee: Kai X
>Priority: Major
> Attachments: HADOOP-16018-002.patch, HADOOP-16018.01.patch
>
>
> I was investigating why hadoop-distcp-2.9.2 won't reassemble chunks of the
> same file when blocks per chunk has been set > 0.
> In CopyCommitter::commitJob, this guard skips reassembling the chunks
> whenever blocksPerChunk is 0:
> {code:java}
> if (blocksPerChunk > 0) {
>   concatFileChunks(conf);
> }
> {code}
> Then in CopyCommitter's ctor, blocksPerChunk is initialised from the config:
> {code:java}
> blocksPerChunk = context.getConfiguration().getInt(
> DistCpOptionSwitch.BLOCKS_PER_CHUNK.getConfigLabel(), 0);
> {code}
>  
> But here DistCpOptionSwitch.BLOCKS_PER_CHUNK.getConfigLabel() will always
> return an empty string, because the enum constant is constructed without a
> config label:
> {code:java}
> BLOCKS_PER_CHUNK("",
> new Option("blocksperchunk", true, "If set to a positive value, files"
> + "with more blocks than this value will be split into chunks of "
> + "<blocksperchunk> blocks to be transferred in parallel, and "
> + "reassembled on the destination. By default, <blocksperchunk> is "
> + "0 and the files will be transmitted in their entirety without "
> + "splitting. This switch is only applicable when the source file "
> + "system implements getBlockLocations method and the target file "
> + "system implements concat method"))
> {code}
> As a result the lookup falls back to the default value 0 for blocksPerChunk,
> which prevents the chunks from being reassembled.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-16018) DistCp won't reassemble chunks when blocks per chunk > 0

2019-01-03 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-16018:

Status: Open  (was: Patch Available)

> DistCp won't reassemble chunks when blocks per chunk > 0
> 
>
> Key: HADOOP-16018
> URL: https://issues.apache.org/jira/browse/HADOOP-16018
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 2.9.2, 3.2.0
>Reporter: Kai X
>Assignee: Kai X
>Priority: Major
> Attachments: HADOOP-16018.01.patch
>
>
> I was investigating why hadoop-distcp-2.9.2 won't reassemble chunks of the
> same file when blocks per chunk has been set > 0.
> In CopyCommitter::commitJob, this guard skips reassembling the chunks
> whenever blocksPerChunk is 0:
> {code:java}
> if (blocksPerChunk > 0) {
>   concatFileChunks(conf);
> }
> {code}
> Then in CopyCommitter's ctor, blocksPerChunk is initialised from the config:
> {code:java}
> blocksPerChunk = context.getConfiguration().getInt(
> DistCpOptionSwitch.BLOCKS_PER_CHUNK.getConfigLabel(), 0);
> {code}
>  
> But here DistCpOptionSwitch.BLOCKS_PER_CHUNK.getConfigLabel() will always
> return an empty string, because the enum constant is constructed without a
> config label:
> {code:java}
> BLOCKS_PER_CHUNK("",
> new Option("blocksperchunk", true, "If set to a positive value, files"
> + "with more blocks than this value will be split into chunks of "
> + "<blocksperchunk> blocks to be transferred in parallel, and "
> + "reassembled on the destination. By default, <blocksperchunk> is "
> + "0 and the files will be transmitted in their entirety without "
> + "splitting. This switch is only applicable when the source file "
> + "system implements getBlockLocations method and the target file "
> + "system implements concat method"))
> {code}
> As a result the lookup falls back to the default value 0 for blocksPerChunk,
> which prevents the chunks from being reassembled.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16025) Update the year to 2019

2019-01-03 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16732884#comment-16732884
 ] 

Steve Loughran commented on HADOOP-16025:
-

Quick note: I committed this with the --author option, but in copying and
pasting things I managed to get your email address wrong in the tag:

{code}
Author: Ayush Saxena 
Date:   Wed Jan 2 22:24:01 2019 +

HADOOP-16025. Update the year to 2019.

Contributed by Ayush Saxena.
{code}
Apologies - it's not used by any automated tooling anyway. I'm just trying
to get the author fields to actually credit the patch author. I'll do better
next time.

> Update the year to 2019
> ---
>
> Key: HADOOP-16025
> URL: https://issues.apache.org/jira/browse/HADOOP-16025
> Project: Hadoop Common
>  Issue Type: Task
>  Components: build
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Fix For: 2.7.8, 3.0.4, 2.8.6, 3.2.1, 2.9.3, 3.1.3
>
> Attachments: HADOOP-16025-01.patch
>
>
> Update the year to 2019 from 2018.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15992) JSON License is included in the transitive dependency of aliyun-sdk-oss 3.0.0

2019-01-03 Thread Sunil Govindan (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16732781#comment-16732781
 ] 

Sunil Govindan commented on HADOOP-15992:
-

+1. Committing shortly to trunk, branch-3.2 and branch-3.2.0.

[~ajisakaa], do we need to put this into any other branches?

> JSON License is included in the transitive dependency of aliyun-sdk-oss 3.0.0
> -
>
> Key: HADOOP-15992
> URL: https://issues.apache.org/jira/browse/HADOOP-15992
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.9.2
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
>Priority: Blocker
> Attachments: HADOOP-15992.01.patch
>
>
> This is the output of {{mvn dependency:tree}}
> {noformat}
> [INFO] +- com.aliyun.oss:aliyun-sdk-oss:jar:3.0.0:compile
> [INFO] |  +- org.jdom:jdom:jar:1.1:compile
> [INFO] |  +- com.sun.jersey:jersey-json:jar:1.19:compile
> [INFO] |  |  +- org.codehaus.jettison:jettison:jar:1.1:compile
> [INFO] |  |  +- com.sun.xml.bind:jaxb-impl:jar:2.2.3-1:compile
> [INFO] |  |  +- org.codehaus.jackson:jackson-core-asl:jar:1.9.13:compile
> [INFO] |  |  +- org.codehaus.jackson:jackson-mapper-asl:jar:1.9.13:compile
> [INFO] |  |  +- org.codehaus.jackson:jackson-jaxrs:jar:1.9.13:compile
> [INFO] |  |  \- org.codehaus.jackson:jackson-xc:jar:1.9.13:compile
> [INFO] |  +- com.aliyun:aliyun-java-sdk-core:jar:3.4.0:compile
> [INFO] |  |  \- org.json:json:jar:20170516:compile
> {noformat}
> The license of org.json:json:jar:20170516:compile is the JSON License,
> which cannot be included:
> https://www.apache.org/legal/resolved.html#json
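One way to keep the artifact out of the tree, as an illustrative sketch; the
committed fix may instead upgrade the SDK, and whether the OSS connector
still works without org.json needs verifying:

{code:xml}
<!-- Illustrative only: exclude the JSON-licensed artifact pulled in via
     aliyun-java-sdk-core; verify the connector still works without it. -->
<dependency>
  <groupId>com.aliyun.oss</groupId>
  <artifactId>aliyun-sdk-oss</artifactId>
  <version>3.0.0</version>
  <exclusions>
    <exclusion>
      <groupId>org.json</groupId>
      <artifactId>json</artifactId>
    </exclusion>
  </exclusions>
</dependency>
{code}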



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15323) AliyunOSS: Improve copy file performance for AliyunOSSFileSystemStore

2019-01-03 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16732778#comment-16732778
 ] 

Weiwei Yang commented on HADOOP-15323:
--

[~wujinhu], sure, +1 to the patch. I'll commit the patch today. Thanks

> AliyunOSS: Improve copy file performance for AliyunOSSFileSystemStore
> -
>
> Key: HADOOP-15323
> URL: https://issues.apache.org/jira/browse/HADOOP-15323
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 2.10.0, 3.2.0, 3.1.1, 2.9.2, 3.0.3, 3.3.0
>Reporter: wujinhu
>Assignee: wujinhu
>Priority: Major
> Attachments: HADOOP-15323.001.patch, HADOOP-15323.002.patch, 
> HADOOP-15323.003.patch
>
>
> Aliyun OSS will support shallow copy, which means the server only copies
> metadata when a copy-object operation occurs.
> With shallow copy, we can use the copyObject api instead of the multipart
> copy api, provided we do not change the object storage type or encryption
> type and the source object was uploaded via the Put / Multipart upload api.
> We will try the copyObject api first and check the result. If shallow copy
> is disabled for this object, then we will fall back to multipart copy. So I
> will remove the fs.oss.multipart.upload.threshold configuration.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15323) AliyunOSS: Improve copy file performance for AliyunOSSFileSystemStore

2019-01-03 Thread wujinhu (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16732771#comment-16732771
 ] 

wujinhu commented on HADOOP-15323:
--

Thanks [~cheersyang], I have added the debug log and docs as you suggested.

We should add this feature to branch-2.x as well, because some of our users
still run Hadoop 2.x.

Please help to merge it into those branches too. Many thanks. :)

> AliyunOSS: Improve copy file performance for AliyunOSSFileSystemStore
> -
>
> Key: HADOOP-15323
> URL: https://issues.apache.org/jira/browse/HADOOP-15323
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 2.10.0, 3.2.0, 3.1.1, 2.9.2, 3.0.3, 3.3.0
>Reporter: wujinhu
>Assignee: wujinhu
>Priority: Major
> Attachments: HADOOP-15323.001.patch, HADOOP-15323.002.patch, 
> HADOOP-15323.003.patch
>
>
> Aliyun OSS will support shallow copy, which means the server only copies
> metadata when a copy-object operation occurs.
> With shallow copy, we can use the copyObject api instead of the multipart
> copy api, provided we do not change the object storage type or encryption
> type and the source object was uploaded via the Put / Multipart upload api.
> We will try the copyObject api first and check the result. If shallow copy
> is disabled for this object, then we will fall back to multipart copy. So I
> will remove the fs.oss.multipart.upload.threshold configuration.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15323) AliyunOSS: Improve copy file performance for AliyunOSSFileSystemStore

2019-01-03 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16732763#comment-16732763
 ] 

Weiwei Yang commented on HADOOP-15323:
--

The latest patch looks good. [~wujinhu], please confirm which branches this 
needs to be committed to. Can we get it into the 3.x versions only, since it 
introduces a behavior change? I'll commit once I hear back from you. Thanks.

> AliyunOSS: Improve copy file performance for AliyunOSSFileSystemStore
> -
>
> Key: HADOOP-15323
> URL: https://issues.apache.org/jira/browse/HADOOP-15323
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 2.10.0, 3.2.0, 3.1.1, 2.9.2, 3.0.3, 3.3.0
>Reporter: wujinhu
>Assignee: wujinhu
>Priority: Major
> Attachments: HADOOP-15323.001.patch, HADOOP-15323.002.patch, 
> HADOOP-15323.003.patch
>
>
> Aliyun OSS will support shallow copy, which means the server only copies 
> metadata when a copy-object operation occurs. 
> With shallow copy, we can use the copyObject API instead of the multipart 
> copy API, provided we do not change the object storage type or encryption 
> type and the source object was uploaded via the Put or Multipart upload API.
> We will try the copyObject API first and check the result. If shallow copy 
> is disabled for this object, we fall back to multipart copy. Consequently, 
> I will remove the fs.oss.multipart.upload.threshold configuration.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15323) AliyunOSS: Improve copy file performance for AliyunOSSFileSystemStore

2019-01-03 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16732737#comment-16732737
 ] 

Hadoop QA commented on HADOOP-15323:


| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 17s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 4 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 20m 7s | trunk passed |
| +1 | compile | 0m 21s | trunk passed |
| +1 | checkstyle | 0m 16s | trunk passed |
| +1 | mvnsite | 0m 24s | trunk passed |
| +1 | shadedclient | 13m 1s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 0m 32s | trunk passed |
| +1 | javadoc | 0m 19s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 21s | the patch passed |
| +1 | compile | 0m 18s | the patch passed |
| +1 | javac | 0m 18s | the patch passed |
| +1 | checkstyle | 0m 13s | the patch passed |
| +1 | mvnsite | 0m 20s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 13m 29s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 0m 38s | the patch passed |
| +1 | javadoc | 0m 17s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 0m 20s | hadoop-aliyun in the patch passed. |
| +1 | asflicense | 0m 24s | The patch does not generate ASF License warnings. |
| | | 52m 4s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | HADOOP-15323 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12953579/HADOOP-15323.003.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 0f3b3a272d4d 4.4.0-138-generic #164~14.04.1-Ubuntu SMP Fri Oct 5 08:56:16 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / cb26f15 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/15719/testReport/ |
| Max. process+thread count | 340 (vs. ulimit of 1) |
| modules | C: hadoop-tools/hadoop-aliyun U: hadoop-tools/hadoop-aliyun |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/15719/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> AliyunOSS: Improve copy file performance for AliyunOSSFileSystemStore
> -
>
>