[jira] [Updated] (HADOOP-13975) Allow DistCp to use MultiThreadedMapper

2017-01-11 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HADOOP-13975:

Status: Patch Available  (was: Open)

> Allow DistCp to use MultiThreadedMapper
> ---
>
> Key: HADOOP-13975
> URL: https://issues.apache.org/jira/browse/HADOOP-13975
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: tools/distcp
>Affects Versions: 3.0.0-alpha1
>Reporter: Zheng Shao
>Assignee: Zheng Shao
>Priority: Minor
> Attachments: HADOOP-distcp-multithreaded-mapper-branch26.1.patch, 
> HADOOP-distcp-multithreaded-mapper-branch26.2.patch, 
> HADOOP-distcp-multithreaded-mapper-branch26.3.patch, 
> HADOOP-distcp-multithreaded-mapper-branch26.4.patch, 
> HADOOP-distcp-multithreaded-mapper-trunk.1.patch, 
> HADOOP-distcp-multithreaded-mapper-trunk.2.patch, 
> HADOOP-distcp-multithreaded-mapper-trunk.3.patch, 
> HADOOP-distcp-multithreaded-mapper-trunk.4.patch
>
>
> Although DistCp allows users to control parallelism via the number of mappers, 
> it is sometimes desirable to run fewer mappers with more threads per mapper. 
> Since DistCp is network-bound (either by throughput or, more frequently, by the 
> latency of creating connections, opening files, reading/writing files, and 
> closing files), this can make each mapper much more efficient.
> In that way, many resources can be shared, saving memory and connections to 
> the NameNode.
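
As background, this is roughly how a MapReduce job swaps in the stock 
{{org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper}}. The sketch below is 
illustrative only; "distcp.threads.per.mapper" is a made-up key, not necessarily 
the option name the patches introduce:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper;
import org.apache.hadoop.tools.mapred.CopyMapper;

public class DistCpMtSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "distcp");
    // MultithreadedMapper is the map class; the real copy work is delegated
    // to CopyMapper instances, one per thread inside each map task.
    job.setMapperClass(MultithreadedMapper.class);
    MultithreadedMapper.setMapperClass(job, CopyMapper.class);
    MultithreadedMapper.setNumberOfThreads(
        job, conf.getInt("distcp.threads.per.mapper", 10));
    // Input/output formats, paths, etc. omitted; sketch only.
  }
}
{code}
Note that this only helps if the wrapped mapper is thread-safe, which is part of 
what the patches would have to guarantee.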






[jira] [Updated] (HADOOP-13975) Allow DistCp to use MultiThreadedMapper

2017-01-11 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HADOOP-13975:

Status: Open  (was: Patch Available)

> Allow DistCp to use MultiThreadedMapper
> ---
>
> Key: HADOOP-13975
> URL: https://issues.apache.org/jira/browse/HADOOP-13975
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: tools/distcp
>Affects Versions: 3.0.0-alpha1
>Reporter: Zheng Shao
>Assignee: Zheng Shao
>Priority: Minor
> Attachments: HADOOP-distcp-multithreaded-mapper-branch26.1.patch, 
> HADOOP-distcp-multithreaded-mapper-branch26.2.patch, 
> HADOOP-distcp-multithreaded-mapper-branch26.3.patch, 
> HADOOP-distcp-multithreaded-mapper-branch26.4.patch, 
> HADOOP-distcp-multithreaded-mapper-trunk.1.patch, 
> HADOOP-distcp-multithreaded-mapper-trunk.2.patch, 
> HADOOP-distcp-multithreaded-mapper-trunk.3.patch, 
> HADOOP-distcp-multithreaded-mapper-trunk.4.patch
>
>
> Although DistCp allows users to control parallelism via the number of mappers, 
> it is sometimes desirable to run fewer mappers with more threads per mapper. 
> Since DistCp is network-bound (either by throughput or, more frequently, by the 
> latency of creating connections, opening files, reading/writing files, and 
> closing files), this can make each mapper much more efficient.
> In that way, many resources can be shared, saving memory and connections to 
> the NameNode.






[jira] [Comment Edited] (HADOOP-13836) Securing Hadoop RPC using SSL

2017-01-11 Thread kartheek muthyala (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15820360#comment-15820360
 ] 

kartheek muthyala edited comment on HADOOP-13836 at 1/12/17 7:23 AM:
-

Hi all,

I am posting the performance analysis of the SSL feature implemented by the 
current patch. We ran Teragen and Terasort to compare the overhead of SSL on 
a small cluster.


was (Author: kartheek):
Hi all,

I am posting the performance analysis of the SSL feature implemented by the 
current patch, using Teragen and Terasort run on a small cluster.

> Securing Hadoop RPC using SSL
> -
>
> Key: HADOOP-13836
> URL: https://issues.apache.org/jira/browse/HADOOP-13836
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: ipc
>Reporter: kartheek muthyala
>Assignee: kartheek muthyala
> Attachments: HADOOP-13836.patch, Secure IPC OSS Proposal-1.pdf, 
> SecureIPC Performance Analysis-OSS.pdf
>
>
> Today, RPC connections in Hadoop are encrypted using the Simple Authentication 
> & Security Layer (SASL), with Kerberos ticket-based or Digest-MD5 
> checksum-based authentication protocols. This proposal is about enhancing this 
> cipher suite with SSL/TLS-based encryption and authentication. SSL/TLS is a 
> proposed Internet Engineering Task Force (IETF) standard that provides data 
> security and integrity between two endpoints in a network. The protocol has 
> made its way into a number of applications such as web browsing, email, 
> internet faxing, messaging, VoIP, etc. Supporting this cipher suite at the 
> core of Hadoop would give good synergy with the applications on top and also 
> bolster industry adoption of Hadoop.
> The server and client code in Hadoop IPC should support the following modes 
> of communication:
> 1. Plain
> 2. SASL encryption with an underlying authentication
> 3. SSL-based encryption and authentication (X.509 certificate; see the sketch 
> below)
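
For concreteness, a minimal JSSE sketch of what mode 3 involves on the server 
side. This is stock {{javax.net.ssl}} code, not the patch's API; the keystore 
path and password are placeholders:
{code}
import java.io.FileInputStream;
import java.security.KeyStore;
import javax.net.ssl.KeyManagerFactory;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;

public class ServerEngineSketch {
  // Build a server-mode SSLEngine from an X.509 key pair in a keystore.
  static SSLEngine serverEngine() throws Exception {
    KeyStore ks = KeyStore.getInstance("JKS");
    try (FileInputStream in = new FileInputStream("server.jks")) {
      ks.load(in, "changeit".toCharArray());
    }
    KeyManagerFactory kmf =
        KeyManagerFactory.getInstance(KeyManagerFactory.getDefaultAlgorithm());
    kmf.init(ks, "changeit".toCharArray());
    SSLContext ctx = SSLContext.getInstance("TLS");
    ctx.init(kmf.getKeyManagers(), null, null);
    SSLEngine engine = ctx.createSSLEngine();
    engine.setUseClientMode(false);  // server side of the handshake
    return engine;
  }
}
{code}
An NIO server such as Hadoop IPC would then drive 
{{engine.wrap()}}/{{engine.unwrap()}} around its existing channel reads and writes.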






[jira] [Updated] (HADOOP-13975) Allow DistCp to use MultiThreadedMapper

2017-01-11 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HADOOP-13975:

Attachment: HADOOP-distcp-multithreaded-mapper-trunk.4.patch

Added a message to DistCpOptions.toString() for easy debugging.

> Allow DistCp to use MultiThreadedMapper
> ---
>
> Key: HADOOP-13975
> URL: https://issues.apache.org/jira/browse/HADOOP-13975
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: tools/distcp
>Affects Versions: 3.0.0-alpha1
>Reporter: Zheng Shao
>Assignee: Zheng Shao
>Priority: Minor
> Attachments: HADOOP-distcp-multithreaded-mapper-branch26.1.patch, 
> HADOOP-distcp-multithreaded-mapper-branch26.2.patch, 
> HADOOP-distcp-multithreaded-mapper-branch26.3.patch, 
> HADOOP-distcp-multithreaded-mapper-branch26.4.patch, 
> HADOOP-distcp-multithreaded-mapper-trunk.1.patch, 
> HADOOP-distcp-multithreaded-mapper-trunk.2.patch, 
> HADOOP-distcp-multithreaded-mapper-trunk.3.patch, 
> HADOOP-distcp-multithreaded-mapper-trunk.4.patch
>
>
> Although DistCp allows users to control parallelism via the number of mappers, 
> it is sometimes desirable to run fewer mappers with more threads per mapper. 
> Since DistCp is network-bound (either by throughput or, more frequently, by the 
> latency of creating connections, opening files, reading/writing files, and 
> closing files), this can make each mapper much more efficient.
> In that way, many resources can be shared, saving memory and connections to 
> the NameNode.






[jira] [Updated] (HADOOP-13836) Securing Hadoop RPC using SSL

2017-01-11 Thread kartheek muthyala (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kartheek muthyala updated HADOOP-13836:
---
Attachment: SecureIPC Performance Analysis-OSS.pdf

Hi all,

I am posting the performance analysis of the SSL feature implemented by the 
current patch, using Teragen and Terasort run on a small cluster.

> Securing Hadoop RPC using SSL
> -
>
> Key: HADOOP-13836
> URL: https://issues.apache.org/jira/browse/HADOOP-13836
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: ipc
>Reporter: kartheek muthyala
>Assignee: kartheek muthyala
> Attachments: HADOOP-13836.patch, Secure IPC OSS Proposal-1.pdf, 
> SecureIPC Performance Analysis-OSS.pdf
>
>
> Today, RPC connections in Hadoop are encrypted using the Simple Authentication 
> & Security Layer (SASL), with Kerberos ticket-based or Digest-MD5 
> checksum-based authentication protocols. This proposal is about enhancing this 
> cipher suite with SSL/TLS-based encryption and authentication. SSL/TLS is a 
> proposed Internet Engineering Task Force (IETF) standard that provides data 
> security and integrity between two endpoints in a network. The protocol has 
> made its way into a number of applications such as web browsing, email, 
> internet faxing, messaging, VoIP, etc. Supporting this cipher suite at the 
> core of Hadoop would give good synergy with the applications on top and also 
> bolster industry adoption of Hadoop.
> The server and client code in Hadoop IPC should support the following modes 
> of communication:
> 1. Plain
> 2. SASL encryption with an underlying authentication
> 3. SSL-based encryption and authentication (X.509 certificate)






[jira] [Updated] (HADOOP-13975) Allow DistCp to use MultiThreadedMapper

2017-01-11 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HADOOP-13975:

Attachment: HADOOP-distcp-multithreaded-mapper-branch26.4.patch

> Allow DistCp to use MultiThreadedMapper
> ---
>
> Key: HADOOP-13975
> URL: https://issues.apache.org/jira/browse/HADOOP-13975
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: tools/distcp
>Affects Versions: 3.0.0-alpha1
>Reporter: Zheng Shao
>Assignee: Zheng Shao
>Priority: Minor
> Attachments: HADOOP-distcp-multithreaded-mapper-branch26.1.patch, 
> HADOOP-distcp-multithreaded-mapper-branch26.2.patch, 
> HADOOP-distcp-multithreaded-mapper-branch26.3.patch, 
> HADOOP-distcp-multithreaded-mapper-branch26.4.patch, 
> HADOOP-distcp-multithreaded-mapper-trunk.1.patch, 
> HADOOP-distcp-multithreaded-mapper-trunk.2.patch, 
> HADOOP-distcp-multithreaded-mapper-trunk.3.patch
>
>
> Although DistCp allows users to control parallelism via the number of mappers, 
> it is sometimes desirable to run fewer mappers with more threads per mapper. 
> Since DistCp is network-bound (either by throughput or, more frequently, by the 
> latency of creating connections, opening files, reading/writing files, and 
> closing files), this can make each mapper much more efficient.
> In that way, many resources can be shared, saving memory and connections to 
> the NameNode.






[jira] [Comment Edited] (HADOOP-13836) Securing Hadoop RPC using SSL

2017-01-11 Thread kartheek muthyala (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15820325#comment-15820325
 ] 

kartheek muthyala edited comment on HADOOP-13836 at 1/12/17 7:05 AM:
-

Hi all,

Please find attached the first version of the design draft for enabling SSL in 
Hadoop RPC.
[~kaizh], [~daryn], and [~ste...@apache.org], kindly provide your comments on 
the same.


 


was (Author: kartheek):
Secure IPC design draft - v1

> Securing Hadoop RPC using SSL
> -
>
> Key: HADOOP-13836
> URL: https://issues.apache.org/jira/browse/HADOOP-13836
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: ipc
>Reporter: kartheek muthyala
>Assignee: kartheek muthyala
> Attachments: HADOOP-13836.patch, Secure IPC OSS Proposal-1.pdf
>
>
> Today, RPC connections in Hadoop are encrypted using the Simple Authentication 
> & Security Layer (SASL), with Kerberos ticket-based or Digest-MD5 
> checksum-based authentication protocols. This proposal is about enhancing this 
> cipher suite with SSL/TLS-based encryption and authentication. SSL/TLS is a 
> proposed Internet Engineering Task Force (IETF) standard that provides data 
> security and integrity between two endpoints in a network. The protocol has 
> made its way into a number of applications such as web browsing, email, 
> internet faxing, messaging, VoIP, etc. Supporting this cipher suite at the 
> core of Hadoop would give good synergy with the applications on top and also 
> bolster industry adoption of Hadoop.
> The server and client code in Hadoop IPC should support the following modes 
> of communication:
> 1. Plain
> 2. SASL encryption with an underlying authentication
> 3. SSL-based encryption and authentication (X.509 certificate)






[jira] [Updated] (HADOOP-13836) Securing Hadoop RPC using SSL

2017-01-11 Thread kartheek muthyala (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kartheek muthyala updated HADOOP-13836:
---
Attachment: Secure IPC OSS Proposal-1.pdf

Secure IPC design draft - v1

> Securing Hadoop RPC using SSL
> -
>
> Key: HADOOP-13836
> URL: https://issues.apache.org/jira/browse/HADOOP-13836
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: ipc
>Reporter: kartheek muthyala
>Assignee: kartheek muthyala
> Attachments: HADOOP-13836.patch, Secure IPC OSS Proposal-1.pdf
>
>
> Today, RPC connections in Hadoop are encrypted using the Simple Authentication 
> & Security Layer (SASL), with Kerberos ticket-based or Digest-MD5 
> checksum-based authentication protocols. This proposal is about enhancing this 
> cipher suite with SSL/TLS-based encryption and authentication. SSL/TLS is a 
> proposed Internet Engineering Task Force (IETF) standard that provides data 
> security and integrity between two endpoints in a network. The protocol has 
> made its way into a number of applications such as web browsing, email, 
> internet faxing, messaging, VoIP, etc. Supporting this cipher suite at the 
> core of Hadoop would give good synergy with the applications on top and also 
> bolster industry adoption of Hadoop.
> The server and client code in Hadoop IPC should support the following modes 
> of communication:
> 1. Plain
> 2. SASL encryption with an underlying authentication
> 3. SSL-based encryption and authentication (X.509 certificate)






[jira] [Commented] (HADOOP-13962) Update ADLS SDK to 2.1.4

2017-01-11 Thread John Zhuge (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15820218#comment-15820218
 ] 

John Zhuge commented on HADOOP-13962:
-

Patch 001 also passed basic {{dfs -ls/-put/-cat}} and 
teragen/terasort/teravalidate.

> Update ADLS SDK to 2.1.4
> 
>
> Key: HADOOP-13962
> URL: https://issues.apache.org/jira/browse/HADOOP-13962
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/adl
>Affects Versions: 3.0.0-alpha2
>Reporter: John Zhuge
>Assignee: John Zhuge
> Fix For: 3.0.0-alpha2
>
> Attachments: HADOOP-13962.001.patch
>
>
> The ADLS SDK has had multiple releases since the 2.0.11 version we are using: 
> 2.1.1, 2.1.2, and 2.1.4. Change list: 
> https://github.com/Azure/azure-data-lake-store-java/blob/master/CHANGES.md.






[jira] [Commented] (HADOOP-13933) Add haadmin -getAllServiceState option to get the HA state of all the NameNodes/ResourceManagers

2017-01-11 Thread Surendra Singh Lilhore (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15820194#comment-15820194
 ] 

Surendra Singh Lilhore commented on HADOOP-13933:
-

Thanks [~linyiqun] for review...

Attached updated patch v5...

This command will not print anything if it fails to get service IDs from the 
configuration, so I added a check.
{code}
+if (targetIds.isEmpty()) {
+  errOut.println("Failed to get service IDs");
+  return -1;
+}
{code}

Please review...

> Add haadmin -getAllServiceState option to get the HA state of all the 
> NameNodes/ResourceManagers
> 
>
> Key: HADOOP-13933
> URL: https://issues.apache.org/jira/browse/HADOOP-13933
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: tools
>Affects Versions: 2.7.1
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
> Attachments: HADOOP-13933.002.patch, HADOOP-13933.003.patch, 
> HADOOP-13933.003.patch, HADOOP-13933.004.patch, HADOOP-13933.005.patch, 
> HDFS-9559.01.patch
>
>
> Currently we have one command to get the state of a NameNode:
> {code}
> ./hdfs haadmin -getServiceState <serviceId>
> {code}
> It would be good to have a command that gives the state of all the NameNodes.
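
For contrast, the proposed {{-getAllServiceState}} would report every configured 
NameNode in one call. A purely illustrative transcript (the real output format 
is up to the patch):
{noformat}
$ hdfs haadmin -getAllServiceState
nn1.example.com:8020     active
nn2.example.com:8020     standby
{noformat}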






[jira] [Updated] (HADOOP-13933) Add haadmin -getAllServiceState option to get the HA state of all the NameNodes/ResourceManagers

2017-01-11 Thread Surendra Singh Lilhore (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Surendra Singh Lilhore updated HADOOP-13933:

Attachment: HADOOP-13933.005.patch

> Add haadmin -getAllServiceState option to get the HA state of all the 
> NameNodes/ResourceManagers
> 
>
> Key: HADOOP-13933
> URL: https://issues.apache.org/jira/browse/HADOOP-13933
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: tools
>Affects Versions: 2.7.1
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
> Attachments: HADOOP-13933.002.patch, HADOOP-13933.003.patch, 
> HADOOP-13933.003.patch, HADOOP-13933.004.patch, HADOOP-13933.005.patch, 
> HDFS-9559.01.patch
>
>
> Currently we have one command to get the state of a NameNode:
> {code}
> ./hdfs haadmin -getServiceState <serviceId>
> {code}
> It would be good to have a command that gives the state of all the NameNodes.






[jira] [Updated] (HADOOP-13983) Print better error when accessing a non-existent store

2017-01-11 Thread John Zhuge (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Zhuge updated HADOOP-13983:

Release Note:   (was: The error message when accessing a non-existent store 
is not user-friendly:
{noformat}
$ hdfs dfs -ls adl://STORE.azuredatalakestore.net/
ls: Operation GETFILESTATUS failed with exception java.net.UnknownHostException 
: STORE.azuredatalakestore.net
{noformat}

Hadoop is configured with a valid SPI but STORE does not exist.)
 Description: 
The error message when accessing a non-existent store is not user-friendly:
{noformat}
$ hdfs dfs -ls adl://STORE.azuredatalakestore.net/
ls: Operation GETFILESTATUS failed with exception java.net.UnknownHostException 
: STORE.azuredatalakestore.net
{noformat}
Hadoop is configured with a valid SPI but {{STORE}} does not exist.
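
One way the connector could translate this, sketched with hypothetical names 
({{doGetFileStatus}}, {{storeHost}}) rather than the ADL client's real internals:
{code}
// Sketch: map the raw UnknownHostException to an actionable message.
try {
  return doGetFileStatus(path);  // hypothetical wrapped call
} catch (java.net.UnknownHostException e) {
  java.io.FileNotFoundException fnfe = new java.io.FileNotFoundException(
      "ADL store '" + storeHost + "' cannot be resolved; check that the store"
      + " exists and that the URI is spelled correctly");
  fnfe.initCause(e);
  throw fnfe;
}
{code}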

> Print better error when accessing a non-existent store
> --
>
> Key: HADOOP-13983
> URL: https://issues.apache.org/jira/browse/HADOOP-13983
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/adl
>Affects Versions: 3.0.0-alpha2
>Reporter: John Zhuge
>Priority: Minor
>  Labels: supportability
>
> The error message when accessing a non-existent store is not user-friendly:
> {noformat}
> $ hdfs dfs -ls adl://STORE.azuredatalakestore.net/
> ls: Operation GETFILESTATUS failed with exception 
> java.net.UnknownHostException : STORE.azuredatalakestore.net
> {noformat}
> Hadoop is configured with a valid SPI but {{STORE}} does not exist.






[jira] [Updated] (HADOOP-13979) Live unit tests leak files in home dir on ADLS store

2017-01-11 Thread John Zhuge (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Zhuge updated HADOOP-13979:

Summary: Live unit tests leak files in home dir on ADLS store  (was: Live 
unit tests leaks files in home dir on ADLS store)

> Live unit tests leak files in home dir on ADLS store
> 
>
> Key: HADOOP-13979
> URL: https://issues.apache.org/jira/browse/HADOOP-13979
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/adl
>Affects Versions: 3.0.0-alpha2
>Reporter: John Zhuge
>Priority: Minor
>
> Live unit tests left 61 files in the user home dir on ADLS store {{jzadls}}:
> {noformat}
> /user
> /user/jzhuge
> /user/jzhuge/06b74549-c9d5-41b3-9f32-660e3284200d
> /user/jzhuge/0b71b60d-7501-40b2-a86c-c1ed2542997f
> /user/jzhuge/1311d721-8a31-4eda-9d5b-be4fc47ce62a
> ...
> {noformat}
> However, I failed to reproduce this on store {{jzhugeadls}}.






[jira] [Updated] (HADOOP-13982) Print better error when accessing a store without permission

2017-01-11 Thread John Zhuge (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Zhuge updated HADOOP-13982:

Summary: Print better error when accessing a store without permission  
(was: Print better error message when accessing a store without permission)

> Print better error when accessing a store without permission
> 
>
> Key: HADOOP-13982
> URL: https://issues.apache.org/jira/browse/HADOOP-13982
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/adl
>Affects Versions: 3.0.0-alpha2
>Reporter: John Zhuge
>  Labels: supportability
>
> The error message when accessing a store without permission is not 
> user-friendly:
> {noformat}
> $ hdfs dfs -ls adl://STORE.azuredatalakestore.net/
> ls: Operation GETFILESTATUS failed with HTTP403 : null
> {noformat}
> Store {{STORE}} exists but Hadoop is configured with an SPI that does not 
> have access to the store.






[jira] [Created] (HADOOP-13983) Print better error when accessing a non-existent store

2017-01-11 Thread John Zhuge (JIRA)
John Zhuge created HADOOP-13983:
---

 Summary: Print better error when accessing a non-existent store
 Key: HADOOP-13983
 URL: https://issues.apache.org/jira/browse/HADOOP-13983
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/adl
Affects Versions: 3.0.0-alpha2
Reporter: John Zhuge
Priority: Minor









[jira] [Created] (HADOOP-13982) Print better error message when accessing a store without permission

2017-01-11 Thread John Zhuge (JIRA)
John Zhuge created HADOOP-13982:
---

 Summary: Print better error message when accessing a store without 
permission
 Key: HADOOP-13982
 URL: https://issues.apache.org/jira/browse/HADOOP-13982
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/adl
Affects Versions: 3.0.0-alpha2
Reporter: John Zhuge


The error message when accessing a store without permission is not 
user-friendly:
{noformat}
$ hdfs dfs -ls adl://STORE.azuredatalakestore.net/
ls: Operation GETFILESTATUS failed with HTTP403 : null
{noformat}

Store {{STORE}} exists but Hadoop is configured with an SPI that does not have 
access to the store.






[jira] [Commented] (HADOOP-13877) S3Guard: fix TestDynamoDBMetadataStore when fs.s3a.s3guard.ddb.table is set

2017-01-11 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15820036#comment-15820036
 ] 

Aaron Fabbri commented on HADOOP-13877:
---

+1 v2 patch, pending testing.

I'm cool with the additional improvements in [~liuml07]'s patch.  I'll test it 
now.

> S3Guard: fix TestDynamoDBMetadataStore when fs.s3a.s3guard.ddb.table is set
> ---
>
> Key: HADOOP-13877
> URL: https://issues.apache.org/jira/browse/HADOOP-13877
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: HADOOP-13345
>Reporter: Aaron Fabbri
>Assignee: Aaron Fabbri
> Attachments: HADOOP-13877-HADOOP-13345.001.patch, 
> HADOOP-13877-HADOOP-13345.002.patch
>
>
> I see a couple of failures in the DynamoDB MetadataStore unit test when I set 
> {{fs.s3a.s3guard.ddb.table}} in my test/resources/core-site.xml.
> I have a fix already, so I'll take this JIRA.






[jira] [Commented] (HADOOP-13981) S3Guard CLI: Add documentation

2017-01-11 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15820005#comment-15820005
 ] 

Aaron Fabbri commented on HADOOP-13981:
---

Is there a central place for CLI documentation, or would it be ok to include 
this in s3guard.md?

> S3Guard CLI: Add documentation
> --
>
> Key: HADOOP-13981
> URL: https://issues.apache.org/jira/browse/HADOOP-13981
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Aaron Fabbri
>Assignee: Aaron Fabbri
>
> I believe we still need documentation for the new S3Guard CLI commands.  
> Synopsis of all the commands and some examples would be great.






[jira] [Commented] (HADOOP-13650) S3Guard: Provide command line tools to manipulate metadata store.

2017-01-11 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15820003#comment-15820003
 ] 

Aaron Fabbri commented on HADOOP-13650:
---

I created another JIRA, HADOOP-13980, for the fsck stuff. I also created 
HADOOP-13981 for documentation. We still need that, right?

> S3Guard: Provide command line tools to manipulate metadata store.
> -
>
> Key: HADOOP-13650
> URL: https://issues.apache.org/jira/browse/HADOOP-13650
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: HADOOP-13345
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
> Attachments: HADOOP-13650-HADOOP-13345.000.patch, 
> HADOOP-13650-HADOOP-13345.001.patch, HADOOP-13650-HADOOP-13345.002.patch, 
> HADOOP-13650-HADOOP-13345.003.patch, HADOOP-13650-HADOOP-13345.004.patch, 
> HADOOP-13650-HADOOP-13345.005.patch, HADOOP-13650-HADOOP-13345.006.patch, 
> HADOOP-13650-HADOOP-13345.007.patch, HADOOP-13650-HADOOP-13345.008.patch
>
>
> Similar systems like EMRFS have CLI tools to manipulate the metadata store, 
> e.g., create or delete the metadata store, or {{import}} and {{sync}} the 
> file metadata between the metadata store and S3. 
> http://docs.aws.amazon.com//ElasticMapReduce/latest/ReleaseGuide/emrfs-cli-reference.html
> S3Guard should offer similar functionality.






[jira] [Created] (HADOOP-13981) S3Guard CLI: Add documentation

2017-01-11 Thread Aaron Fabbri (JIRA)
Aaron Fabbri created HADOOP-13981:
-

 Summary: S3Guard CLI: Add documentation
 Key: HADOOP-13981
 URL: https://issues.apache.org/jira/browse/HADOOP-13981
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Reporter: Aaron Fabbri
Assignee: Aaron Fabbri


I believe we still need documentation for the new S3Guard CLI commands.  
Synopsis of all the commands and some examples would be great.






[jira] [Created] (HADOOP-13980) S3Guard CLI: Add fsck check command

2017-01-11 Thread Aaron Fabbri (JIRA)
Aaron Fabbri created HADOOP-13980:
-

 Summary: S3Guard CLI: Add fsck check command
 Key: HADOOP-13980
 URL: https://issues.apache.org/jira/browse/HADOOP-13980
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Reporter: Aaron Fabbri
Assignee: Aaron Fabbri


As discussed in HADOOP-13650, we want to add an S3Guard CLI command which 
compares S3 with MetadataStore, and returns a failure status if any invariants 
are violated.








[jira] [Commented] (HADOOP-13650) S3Guard: Provide command line tools to manipulate metadata store.

2017-01-11 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819990#comment-15819990
 ] 

Aaron Fabbri commented on HADOOP-13650:
---

Also wanted to comment on the addition of fsck features.  IMHO we should do it 
as a separate JIRA.  We have diff, import, and destroy, which together provide 
basic tools for diagnosis and repair.  I think we should also have a "fsck 
check" command that simply returns failure code if any invariants are violated. 
 In particular, it should fail if a MetadataStore directory is marked as 
authoritative, and its contents differ from that of S3.  That violates the 
"this is the full directory contents" invariant of the 
DirListingMetadata#isAuthoritative flag.  Of course, DynamoDB MS does not 
currently persist the isAuthoritative flag on listings, so this would always 
pass.  When we add that feature (which will be needed for performance 
improvements), this will be a good tool to see if things have diverged (e.g. 
due to client crashing or concurrent modifications to overlapping subtrees).

Along those lines, a "fsck fix" command could, for any directory where that 
invariant was failing, reload the contents of that directory from S3.  Eventual 
list consistency could cause false positives here, which the "fsck fix" would 
persist, so that is a concern.

Note the "fsck check" command could also return failure when a path exists in 
the MetadataStore but not in S3.  Again this is subject to eventual list 
consistency and that would need to be documented.  It could have a configurable 
time period after which we assume list consistency would not be an issue (e.g. 
if a two-day old file exists in MetadataStore but not S3, it is likely to *not* 
be due to eventual consistency).
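
To make the proposed "fsck check" invariant concrete, a rough sketch 
({{checkAuthoritativeDir}} is a hypothetical helper, not part of any patch; 
{{DirListingMetadata}} and {{PathMetadata}} are the existing S3Guard types):
{code}
// An authoritative MetadataStore listing must match S3 exactly.
boolean checkAuthoritativeDir(DirListingMetadata msDir, Set<Path> s3Children) {
  if (!msDir.isAuthoritative()) {
    return true;  // no full-listing claim, so nothing to verify
  }
  Set<Path> msChildren = new HashSet<>();
  for (PathMetadata pm : msDir.getListing()) {
    msChildren.add(pm.getFileStatus().getPath());
  }
  return msChildren.equals(s3Children);  // any divergence fails the check
}
{code}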


> S3Guard: Provide command line tools to manipulate metadata store.
> -
>
> Key: HADOOP-13650
> URL: https://issues.apache.org/jira/browse/HADOOP-13650
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: HADOOP-13345
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
> Attachments: HADOOP-13650-HADOOP-13345.000.patch, 
> HADOOP-13650-HADOOP-13345.001.patch, HADOOP-13650-HADOOP-13345.002.patch, 
> HADOOP-13650-HADOOP-13345.003.patch, HADOOP-13650-HADOOP-13345.004.patch, 
> HADOOP-13650-HADOOP-13345.005.patch, HADOOP-13650-HADOOP-13345.006.patch, 
> HADOOP-13650-HADOOP-13345.007.patch, HADOOP-13650-HADOOP-13345.008.patch
>
>
> Similar systems like EMRFS have CLI tools to manipulate the metadata store, 
> e.g., create or delete the metadata store, or {{import}} and {{sync}} the 
> file metadata between the metadata store and S3. 
> http://docs.aws.amazon.com//ElasticMapReduce/latest/ReleaseGuide/emrfs-cli-reference.html
> S3Guard should offer similar functionality.






[jira] [Commented] (HADOOP-13978) Update project release notes for 3.0.0-alpha2

2017-01-11 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819991#comment-15819991
 ] 

Andrew Wang commented on HADOOP-13978:
--

Arun, for sure, go ahead. I didn't see this pop up as a New Feature when I 
generated the alpha2 release notes, so if there's a related fixed JIRA for this 
feature it'd be helpful to also add release notes there.

> Update project release notes for 3.0.0-alpha2
> -
>
> Key: HADOOP-13978
> URL: https://issues.apache.org/jira/browse/HADOOP-13978
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha2
>Reporter: Andrew Wang
>Assignee: Andrew Wang
> Attachments: HADOOP-13978.001.patch
>
>
> Let's update the website release notes for 3.0.0-alpha2's changes.






[jira] [Commented] (HADOOP-13650) S3Guard: Provide command line tools to manipulate metadata store.

2017-01-11 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819969#comment-15819969
 ] 

Aaron Fabbri commented on HADOOP-13650:
---

Nice work on this [~eddyxu].  Overall I think it looks good.  Nice work on the 
test code too.

{code}
+   * @param create create the metadata store if not exists.
+   * @return a initialized metadata store.
+   */
+  MetadataStore initMetadataStore(boolean create) throws IOException {
{code}

Minor  comment clarification: "@param create When using DynamoDB, create table 
if it does not exist"

{code}
+  }
+} else {
+  // CLI does not specify metadata store URI, it uses default metadata 
store
+  // DynamoDB instead.
+  ms = new DynamoDBMetadataStore();
{code}

What if create == true here?

{code}
+  void initS3AFileSystem(String path) throws IOException {
+URI uri;
{code}
Do we need to enforce that this FileSystem does *not* have a MetadataStore
configured?  (You want raw S3 access without the MetadataStore for this tool.)

{code}
+private void importDir(S3AFileStatus status) throws IOException {
+  Preconditions.checkArgument(status.isDirectory());
+  RemoteIterator<LocatedFileStatus> it =
+  s3a.listFiles(status.getPath(), true);
+
+  while (it.hasNext()) {
+LocatedFileStatus located = it.next();
+S3AFileStatus child;
+if (located.isDirectory()) {
+  // Because S3 does not actually store directory, this returned dir
+  // must be an empty directory.
{code}

I thought S3A's listFiles discovers non-empty directories via the prefixes 
returned from S3.  In fs.s3a.Listing#buildNextStatusBatch().

{code}
+  final boolean isEmptyDir = true;
+  child = new S3AFileStatus(isEmptyDir, located.getPath(),
+  located.getOwner());
{code}

Should we add to dirCache here?

{code}
+/**
+ * Print difference between two file statuses to the output stream.
+ *
+ * @param statusFromMS file status from metadata store.
+ * @param statusFromS3 file status from S3.
+ * @param out output stream.
+ */
+private static void printDiff(S3AFileStatus statusFromMS,
{code}

To help future readers of the code, maybe say "Print difference, if any, ..."
in that function comment?

{code}
+private static void printDiff(S3AFileStatus statusFromMS,
+  S3AFileStatus statusFromS3,
+  PrintStream out) {
+  Preconditions.checkArgument(
+  !(statusFromMS == null && statusFromS3 == null));
+  if (statusFromMS == null) {
+out.printf("%s%s%s%n", S3_PREFIX, SEP, formatFileStatus(statusFromS3));
+  } else if (statusFromS3 == null) {
+out.printf("%s%s%s%n", MS_PREFIX, SEP, formatFileStatus(statusFromMS));
+  }
+  // TODO: Do we need to compare the internal fields of two FileStatuses?
{code}
If so, modification times are likely to not match.  I think your code here is a 
good first step and we could add FileStatus comparison in the future.


{code}
+  for (Path path : allPaths) {
+S3AFileStatus s3status = s3Children.get(path);
+S3AFileStatus msStatus = msChildren.get(path);
+printDiff(msStatus, s3status, out);
+if ((s3status != null && s3status.isDirectory()) ||
+(msStatus != null && msStatus.isDirectory())) {
+  compareDir(msStatus, s3status, out);
+}
+  }
{code}

This looks like it will work. We could also just use two Sets; then, in 
pseudocode:
msSetCopy = copyOf(msSet)
print "In MetadataStore but not S3: ", msSetCopy.removeAll(s3aSet)
print "In S3 but not in MetadataStore: ", s3aSet.removeAll(msSet)

Your approach allows for adding comparison of the two FileStatuses, in addition 
to just checking Path set membership, so it is actually more flexible. (So 
your code looks good here.)

Also in your Diff class:
{code}
+public int run(String[] args, PrintStream out) throws IOException {
+  List<String> paths = parseArgs(args);
+  if (paths.isEmpty()) {
+out.println(USAGE);
+return INVALID_ARGUMENT;
+  }
+  String s3Path = paths.get(0);
+  // Make sure that S3AFileSystem does not hold an actual MetadataStore
+  // implementation.
+  Configuration conf = getConf();
+  conf.setClass(S3_METADATA_STORE_IMPL, NullMetadataStore.class,
+  MetadataStore.class);
{code}
Ahh, you enforce no MetadataStore here. Should we move this up to 
initS3AFileSystem()?
It seems like all commands should have this.



> S3Guard: Provide command line tools to manipulate metadata store.
> -
>
> Key: HADOOP-13650
> URL: https://issues.apache.org/jira/browse/HADOOP-13650
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: HADOOP-13345

[jira] [Commented] (HADOOP-13957) prevent bad PATHs

2017-01-11 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819855#comment-15819855
 ] 

Allen Wittenauer commented on HADOOP-13957:
---

Is there a use case for something writable in the PATH? Right now, owning the 
box just means installing a trojan bash.

> prevent bad PATHs
> -
>
> Key: HADOOP-13957
> URL: https://issues.apache.org/jira/browse/HADOOP-13957
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: security
>Affects Versions: 3.0.0-alpha2
>Reporter: Allen Wittenauer
>
> Apache Hadoop daemons should fail to start if the shell PATH contains 
> world-writable directories or '.' (cwd).  Doing so would close an attack 
> vector on misconfigured systems.
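
A rough sketch of such a check (illustrative only; the real guard would 
presumably live in the shell scripts rather than in Java):
{code}
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.attribute.PosixFilePermission;

// Refuse to start if PATH contains '.' (or an empty entry, which also
// means cwd) or a world-writable directory.
static void checkPathSanity() throws IOException {
  for (String dir : System.getenv("PATH").split(File.pathSeparator)) {
    if (dir.isEmpty() || dir.equals(".")) {
      throw new IllegalStateException("PATH contains the current directory");
    }
    java.nio.file.Path p = Paths.get(dir);
    if (Files.isDirectory(p) && Files.getPosixFilePermissions(p)
        .contains(PosixFilePermission.OTHERS_WRITE)) {
      throw new IllegalStateException("PATH contains world-writable dir: " + dir);
    }
  }
}
{code}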






[jira] [Commented] (HADOOP-13673) Update scripts to be smarter when running with privilege

2017-01-11 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819826#comment-15819826
 ] 

Allen Wittenauer commented on HADOOP-13673:
---

ping [~andrew.wang].  I'd like to get this into -alpha2.

> Update scripts to be smarter when running with privilege
> 
>
> Key: HADOOP-13673
> URL: https://issues.apache.org/jira/browse/HADOOP-13673
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: scripts
>Affects Versions: 3.0.0-alpha1, 3.0.0-alpha2
>Reporter: Allen Wittenauer
>Assignee: Allen Wittenauer
>  Labels: security
> Attachments: HADOOP-13673.00.patch, HADOOP-13673.01.patch, 
> HADOOP-13673.02.patch, HADOOP-13673.03.patch
>
>
> As work continues on HADOOP-13397, it's become evident that we need better 
> hooks to start daemons as specifically configured users.  Via the 
> (command)_(subcommand)_USER environment variables in 3.x, we actually have a 
> standardized way to do that (see the example after this list).  This in turn 
> means we can make the sbin scripts super functional with a bit of updating:
> * Consolidate start-dfs.sh and start-secure-dns.sh into one script
> * Make start-\*.sh and stop-\*.sh know how to switch users when run as root
> * Undeprecate start/stop-all.sh so that it could be used as root for 
> production purposes and as a single user for non-production users
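
For example, the kind of hadoop-env.sh entries this enables (the names follow 
the (command)_(subcommand)_USER pattern; the values are illustrative):
{noformat}
# illustrative hadoop-env.sh entries
HDFS_NAMENODE_USER=hdfs
HDFS_DATANODE_USER=hdfs
YARN_RESOURCEMANAGER_USER=yarn
{noformat}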






[jira] [Updated] (HADOOP-13673) Update scripts to be smarter when running with privilege

2017-01-11 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HADOOP-13673:
--
Release Note: Apache Hadoop is now able to switch to the appropriate user 
prior to launching commands so long as the command is being run with a 
privileged user and the appropriate set of _USER variables are defined.  This 
re-enables sbin/start-all.sh and sbin/stop-all.sh as well as fixes the 
sbin/start-dfs.sh and sbin/stop-dfs.sh to work with both secure and unsecure 
systems.  (was: Apache Hadoop is now able to switch to the appropriate user 
prior to launching commands so long as the command is being run with a 
privileged user and the appropriate _USER variable is set.  This re-enables 
sbin/start-all.sh, sbin/stop-all.sh, and fixes the sbin/start-dfs.sh and 
sbin/stop-dfs.sh to work with both secure and unsecure systems.)

> Update scripts to be smarter when running with privilege
> 
>
> Key: HADOOP-13673
> URL: https://issues.apache.org/jira/browse/HADOOP-13673
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: scripts
>Affects Versions: 3.0.0-alpha1, 3.0.0-alpha2
>Reporter: Allen Wittenauer
>Assignee: Allen Wittenauer
>  Labels: security
> Attachments: HADOOP-13673.00.patch, HADOOP-13673.01.patch, 
> HADOOP-13673.02.patch, HADOOP-13673.03.patch
>
>
> As work continues on HADOOP-13397, it's become evident that we need better 
> hooks to start daemons as specifically configured users.  Via the 
> (command)_(subcommand)_USER environment variables in 3.x, we actually have a 
> standardized way to do that.  This in turn means we can make the sbin scripts 
> super functional with a bit of updating:
> * Consolidate start-dfs.sh and start-secure-dns.sh into one script
> * Make start-\*.sh and stop-\*.sh know how to switch users when run as root
> * Undeprecate start/stop-all.sh so that it could be used as root for 
> production purposes and as a single user for non-production users






[jira] [Commented] (HADOOP-13957) prevent bad PATHs

2017-01-11 Thread Andres Perez (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819843#comment-15819843
 ] 

Andres Perez commented on HADOOP-13957:
---

Maybe this should be implemented as a configuration option so that you can 
enable/disable this check:

{{hadoop.security.check-path = true|false}}

> prevent bad PATHs
> -
>
> Key: HADOOP-13957
> URL: https://issues.apache.org/jira/browse/HADOOP-13957
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: security
>Affects Versions: 3.0.0-alpha2
>Reporter: Allen Wittenauer
>
> Apache Hadoop daemons should fail to start if the shell PATH contains 
> world-writable directories or '.' (cwd).  Doing so would close an attack 
> vector on misconfigured systems.






[jira] [Created] (HADOOP-13979) Live unit tests leaks files in home dir on ADLS store

2017-01-11 Thread John Zhuge (JIRA)
John Zhuge created HADOOP-13979:
---

 Summary: Live unit tests leaks files in home dir on ADLS store
 Key: HADOOP-13979
 URL: https://issues.apache.org/jira/browse/HADOOP-13979
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/adl
Affects Versions: 3.0.0-alpha2
Reporter: John Zhuge
Priority: Minor


Live unit tests left 61 files in the user home dir on ADLS store {{jzadls}}:
{noformat}
/user
/user/jzhuge
/user/jzhuge/06b74549-c9d5-41b3-9f32-660e3284200d
/user/jzhuge/0b71b60d-7501-40b2-a86c-c1ed2542997f
/user/jzhuge/1311d721-8a31-4eda-9d5b-be4fc47ce62a
...
{noformat}

However, I failed to reproduce this on store {{jzhugeadls}}.






[jira] [Updated] (HADOOP-13673) Update scripts to be smarter when running with privilege

2017-01-11 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HADOOP-13673:
--
Release Note: Apache Hadoop is now able to switch to appropriate user prior 
to launching commands so long as the command is being run with a privileged 
user and the appropriate _USER variable is set.  This re-enables 
sbin/start-all.sh, sbin/stop-all.sh, and fixes the sbin/start-dfs.sh and 
sbin/stop-dfs.sh to work with both secure and unsecure systems.
  Issue Type: New Feature  (was: Bug)

> Update scripts to be smarter when running with privilege
> 
>
> Key: HADOOP-13673
> URL: https://issues.apache.org/jira/browse/HADOOP-13673
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: scripts
>Affects Versions: 3.0.0-alpha1, 3.0.0-alpha2
>Reporter: Allen Wittenauer
>Assignee: Allen Wittenauer
>  Labels: security
> Attachments: HADOOP-13673.00.patch, HADOOP-13673.01.patch, 
> HADOOP-13673.02.patch, HADOOP-13673.03.patch
>
>
> As work continues on HADOOP-13397, it's become evident that we need better 
> hooks to start daemons as specifically configured users.  Via the 
> (command)_(subcommand)_USER environment variables in 3.x, we actually have a 
> standardized way to do that.  This in turn means we can make the sbin scripts 
> super functional with a bit of updating:
> * Consolidate start-dfs.sh and start-secure-dns.sh into one script
> * Make start-\*.sh and stop-\*.sh know how to switch users when run as root
> * Undeprecate start/stop-all.sh so that it could be used as root for 
> production purposes and as a single user for non-production users






[jira] [Updated] (HADOOP-13673) Update scripts to be smarter when running with privilege

2017-01-11 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HADOOP-13673:
--
Release Note: Apache Hadoop is now able to switch to the appropriate user 
prior to launching commands so long as the command is being run with a 
privileged user and the appropriate _USER variable is set.  This re-enables 
sbin/start-all.sh, sbin/stop-all.sh, and fixes the sbin/start-dfs.sh and 
sbin/stop-dfs.sh to work with both secure and unsecure systems.  (was: Apache 
Hadoop is now able to switch to appropriate user prior to launching commands so 
long as the command is being run with a privileged user and the appropriate 
_USER variable is set.  This re-enables sbin/start-all.sh, sbin/stop-all.sh, 
and fixes the sbin/start-dfs.sh and sbin/stop-dfs.sh to work with both secure 
and unsecure systems.)

> Update scripts to be smarter when running with privilege
> 
>
> Key: HADOOP-13673
> URL: https://issues.apache.org/jira/browse/HADOOP-13673
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: scripts
>Affects Versions: 3.0.0-alpha1, 3.0.0-alpha2
>Reporter: Allen Wittenauer
>Assignee: Allen Wittenauer
>  Labels: security
> Attachments: HADOOP-13673.00.patch, HADOOP-13673.01.patch, 
> HADOOP-13673.02.patch, HADOOP-13673.03.patch
>
>
> As work continues on HADOOP-13397, it's become evident that we need better 
> hooks to start daemons as specifically configured users.  Via the 
> (command)_(subcommand)_USER environment variables in 3.x, we actually have a 
> standardized way to do that.  This in turn means we can make the sbin scripts 
> super functional with a bit of updating:
> * Consolidate start-dfs.sh and start-secure-dns.sh into one script
> * Make start-\*.sh and stop-\*.sh know how to switch users when run as root
> * Undeprecate start/stop-all.sh so that it could be used as root for 
> production purposes and as a single user for non-production users






[jira] [Commented] (HADOOP-13978) Update project release notes for 3.0.0-alpha2

2017-01-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819792#comment-15819792
 ] 

Hadoop QA commented on HADOOP-13978:


| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 14s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | mvninstall | 13m 27s | trunk passed |
| +1 | mvnsite | 0m 11s | trunk passed |
| +1 | mvnsite | 0m 8s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | asflicense | 0m 14s | The patch does not generate ASF License warnings. |
| | | 14m 32s | |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:a9ad5d6 |
| JIRA Issue | HADOOP-13978 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12847126/HADOOP-13978.001.patch |
| Optional Tests | asflicense mvnsite |
| uname | Linux 7ccbecc08761 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / e6f13fe |
| modules | C: hadoop-project U: hadoop-project |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/11425/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Update project release notes for 3.0.0-alpha2
> -
>
> Key: HADOOP-13978
> URL: https://issues.apache.org/jira/browse/HADOOP-13978
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha2
>Reporter: Andrew Wang
>Assignee: Andrew Wang
> Attachments: HADOOP-13978.001.patch
>
>
> Let's update the website release notes for 3.0.0-alpha2's changes.






[jira] [Commented] (HADOOP-13978) Update project release notes for 3.0.0-alpha2

2017-01-11 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819782#comment-15819782
 ] 

Arun Suresh commented on HADOOP-13978:
--

[~andrew.wang], can I maybe update the patch to add some information on the 
scheduling of opportunistic containers (YARN-5646)?

> Update project release notes for 3.0.0-alpha2
> -
>
> Key: HADOOP-13978
> URL: https://issues.apache.org/jira/browse/HADOOP-13978
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha2
>Reporter: Andrew Wang
>Assignee: Andrew Wang
> Attachments: HADOOP-13978.001.patch
>
>
> Let's update the website release notes for 3.0.0-alpha2's changes.






[jira] [Commented] (HADOOP-13978) Update project release notes for 3.0.0-alpha2

2017-01-11 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819765#comment-15819765
 ] 

Kai Zheng commented on HADOOP-13978:


It looks pretty good to me, judging by the Aliyun OSS-related parts. Thanks Andrew.

> Update project release notes for 3.0.0-alpha2
> -
>
> Key: HADOOP-13978
> URL: https://issues.apache.org/jira/browse/HADOOP-13978
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha2
>Reporter: Andrew Wang
>Assignee: Andrew Wang
> Attachments: HADOOP-13978.001.patch
>
>
> Let's update the website release notes for 3.0.0-alpha2's changes.






[jira] [Updated] (HADOOP-13978) Update project release notes for 3.0.0-alpha2

2017-01-11 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HADOOP-13978:
-
Status: Patch Available  (was: Open)

> Update project release notes for 3.0.0-alpha2
> -
>
> Key: HADOOP-13978
> URL: https://issues.apache.org/jira/browse/HADOOP-13978
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha2
>Reporter: Andrew Wang
>Assignee: Andrew Wang
> Attachments: HADOOP-13978.001.patch
>
>
> Let's update the website release notes for 3.0.0-alpha2's changes.






[jira] [Updated] (HADOOP-13978) Update project release notes for 3.0.0-alpha2

2017-01-11 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HADOOP-13978:
-
Attachment: HADOOP-13978.001.patch

Patch attached. I generated the release notes and found two juicy JIRAs: the 
Aliyun OSS connector and the shaded client JARs.

[~drankye] [~busbey] does this look good to you?

> Update project release notes for 3.0.0-alpha2
> -
>
> Key: HADOOP-13978
> URL: https://issues.apache.org/jira/browse/HADOOP-13978
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha2
>Reporter: Andrew Wang
>Assignee: Andrew Wang
> Attachments: HADOOP-13978.001.patch
>
>
> Let's update the website release notes for 3.0.0-alpha2's changes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-13978) Update project release notes for 3.0.0-alpha2

2017-01-11 Thread Andrew Wang (JIRA)
Andrew Wang created HADOOP-13978:


 Summary: Update project release notes for 3.0.0-alpha2
 Key: HADOOP-13978
 URL: https://issues.apache.org/jira/browse/HADOOP-13978
 Project: Hadoop Common
  Issue Type: Improvement
Affects Versions: 3.0.0-alpha2
Reporter: Andrew Wang
Assignee: Andrew Wang


Let's update the website release notes for 3.0.0-alpha2's changes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13975) Allow DistCp to use MultiThreadedMapper

2017-01-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819733#comment-15819733
 ] 

Hadoop QA commented on HADOOP-13975:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
15s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 
 9s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
14s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 11s{color} | {color:orange} hadoop-tools/hadoop-distcp: The patch generated 
1 new + 103 unchanged - 1 fixed = 104 total (was 104) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 11m 
19s{color} | {color:green} hadoop-distcp in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
15s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 29m 44s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:a9ad5d6 |
| JIRA Issue | HADOOP-13975 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12847121/HADOOP-distcp-multithreaded-mapper-trunk.3.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux bec31e74dbf4 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 
15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / e6f13fe |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/11424/artifact/patchprocess/diff-checkstyle-hadoop-tools_hadoop-distcp.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/11424/testReport/ |
| modules | C: hadoop-tools/hadoop-distcp U: hadoop-tools/hadoop-distcp |
| Console output | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/11424/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Allow DistCp to use MultiThreadedMapper
> ---
>
> Key: HADOOP-13975
> URL: https://issues.apache.org/jira/browse/HADOOP-13975
> Project: Hadoop Common
>  Issue 

[jira] [Updated] (HADOOP-13050) Upgrade to AWS SDK 1.11.45

2017-01-11 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HADOOP-13050:
-
Release Note: The dependency on the AWS SDK has been bumped to 1.11.45.

> Upgrade to AWS SDK 1.11.45
> --
>
> Key: HADOOP-13050
> URL: https://issues.apache.org/jira/browse/HADOOP-13050
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: build, fs/s3
>Affects Versions: 2.7.2
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Blocker
> Fix For: 2.9.0, 3.0.0-alpha2
>
> Attachments: HADOOP-13050-001.patch, HADOOP-13050-002.patch, 
> HADOOP-13050-branch-2-003.patch, HADOOP-13050-branch-2-004.patch, 
> HADOOP-13050-branch-2.002.patch, HADOOP-13050-branch-2.003.patch
>
>
> HADOOP-13044 highlights that AWS SDK 1.10.6, shipping in Hadoop 2.7+, doesn't 
> work on OpenJDK >= 8u60, because a change in the JDK broke the version of 
> Joda-Time that the AWS SDK uses.
> The fix is to update the SDK, though that implies updating the HTTP 
> components as well: HADOOP-12767.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-12718) Incorrect error message by fs -put local dir without permission

2017-01-11 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-12718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HADOOP-12718:
-
Release Note: 


The `hadoop fs -ls` command now prints "Permission denied" rather than "No such 
file or directory" when the user doesn't have permission to traverse the path.

> Incorrect error message by fs -put local dir without permission
> ---
>
> Key: HADOOP-12718
> URL: https://issues.apache.org/jira/browse/HADOOP-12718
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: John Zhuge
>Assignee: John Zhuge
>  Labels: supportability
> Fix For: 3.0.0-alpha2
>
> Attachments: HADOOP-12718.001.patch, HADOOP-12718.002.patch, 
> HADOOP-12718.003.patch, HADOOP-12718.004.patch, HADOOP-12718.005.patch, 
> HADOOP-12718.006.patch, HADOOP-12718.007.patch, HADOOP-12718.008.patch, 
> TestFsShellCopyPermission-output.001.txt, 
> TestFsShellCopyPermission-output.002.txt, TestFsShellCopyPermission.001.patch
>
>
> When the user doesn't have access permission to the local directory, the 
> "hadoop fs -put" command prints a confusing error message "No such file or 
> directory".
> {noformat}
> $ whoami
> systest
> $ cd /home/systest
> $ ls -ld .
> drwx--. 4 systest systest 4096 Jan 13 14:21 .
> $ mkdir d1
> $ sudo -u hdfs hadoop fs -put d1 /tmp
> put: `d1': No such file or directory
> {noformat}
> It will be more informative if the message is:
> {noformat}
> put: d1 (Permission denied)
> {noformat}
> If the source is a local file, the error message is ok:
> {noformat}
> put: f1 (Permission denied)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13975) Allow DistCp to use MultiThreadedMapper

2017-01-11 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HADOOP-13975:

Status: Patch Available  (was: Open)

> Allow DistCp to use MultiThreadedMapper
> ---
>
> Key: HADOOP-13975
> URL: https://issues.apache.org/jira/browse/HADOOP-13975
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: tools/distcp
>Affects Versions: 3.0.0-alpha1
>Reporter: Zheng Shao
>Assignee: Zheng Shao
>Priority: Minor
> Attachments: HADOOP-distcp-multithreaded-mapper-branch26.1.patch, 
> HADOOP-distcp-multithreaded-mapper-branch26.2.patch, 
> HADOOP-distcp-multithreaded-mapper-branch26.3.patch, 
> HADOOP-distcp-multithreaded-mapper-trunk.1.patch, 
> HADOOP-distcp-multithreaded-mapper-trunk.2.patch, 
> HADOOP-distcp-multithreaded-mapper-trunk.3.patch
>
>
> Although distcp allows users to control the parallelism via the number of 
> mappers, sometimes it's desirable to run fewer mappers but more threads per 
> mapper.  Since distcp is network bound (either by throughput or, more 
> frequently, by the latency of creating connections, opening files, 
> reading/writing files, and closing files), this can make each mapper much 
> more efficient.
> In that way, a lot of resources can be shared so we can save memory and 
> connections to the NameNode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13975) Allow DistCp to use MultiThreadedMapper

2017-01-11 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HADOOP-13975:

Attachment: HADOOP-distcp-multithreaded-mapper-trunk.3.patch

Fixed checkstyle issues.

> Allow DistCp to use MultiThreadedMapper
> ---
>
> Key: HADOOP-13975
> URL: https://issues.apache.org/jira/browse/HADOOP-13975
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: tools/distcp
>Affects Versions: 3.0.0-alpha1
>Reporter: Zheng Shao
>Assignee: Zheng Shao
>Priority: Minor
> Attachments: HADOOP-distcp-multithreaded-mapper-branch26.1.patch, 
> HADOOP-distcp-multithreaded-mapper-branch26.2.patch, 
> HADOOP-distcp-multithreaded-mapper-branch26.3.patch, 
> HADOOP-distcp-multithreaded-mapper-trunk.1.patch, 
> HADOOP-distcp-multithreaded-mapper-trunk.2.patch, 
> HADOOP-distcp-multithreaded-mapper-trunk.3.patch
>
>
> Although distcp allows users to control the parallelism via the number of 
> mappers, sometimes it's desirable to run fewer mappers but more threads per 
> mapper.  Since distcp is network bound (either by throughput or, more 
> frequently, by the latency of creating connections, opening files, 
> reading/writing files, and closing files), this can make each mapper much 
> more efficient.
> In that way, a lot of resources can be shared so we can save memory and 
> connections to the NameNode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13975) Allow DistCp to use MultiThreadedMapper

2017-01-11 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HADOOP-13975:

Status: Open  (was: Patch Available)

> Allow DistCp to use MultiThreadedMapper
> ---
>
> Key: HADOOP-13975
> URL: https://issues.apache.org/jira/browse/HADOOP-13975
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: tools/distcp
>Affects Versions: 3.0.0-alpha1
>Reporter: Zheng Shao
>Assignee: Zheng Shao
>Priority: Minor
> Attachments: HADOOP-distcp-multithreaded-mapper-branch26.1.patch, 
> HADOOP-distcp-multithreaded-mapper-branch26.2.patch, 
> HADOOP-distcp-multithreaded-mapper-branch26.3.patch, 
> HADOOP-distcp-multithreaded-mapper-trunk.1.patch, 
> HADOOP-distcp-multithreaded-mapper-trunk.2.patch, 
> HADOOP-distcp-multithreaded-mapper-trunk.3.patch
>
>
> Although distcp allows users to control the parallelism via the number of 
> mappers, sometimes it's desirable to run fewer mappers but more threads per 
> mapper.  Since distcp is network bound (either by throughput or, more 
> frequently, by the latency of creating connections, opening files, 
> reading/writing files, and closing files), this can make each mapper much 
> more efficient.
> In that way, a lot of resources can be shared so we can save memory and 
> connections to the NameNode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-11804) Shaded Hadoop client artifacts and minicluster

2017-01-11 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HADOOP-11804:
-
Release Note: 


The `hadoop-client` Maven artifact available in 2.x releases pulls
Hadoop's transitive dependencies onto a Hadoop application's classpath.
This can be problematic if the versions of these transitive dependencies
conflict with the versions used by the application.

[HADOOP-11804](https://issues.apache.org/jira/browse/HADOOP-11804) adds
new `hadoop-client-api` and `hadoop-client-runtime` artifacts that
shade Hadoop's dependencies into a single jar. This avoids leaking
Hadoop's dependencies onto the application's classpath.

Adding a release note so this pops up in alpha2. Sean, please feel free to 
update with any corrections.
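
To make the new artifacts concrete, here is a rough sketch of the Maven 
configuration a downstream application might use. This is an editorial 
illustration built from the artifact names in the note above; the version 
string (taken from this release line) and the runtime scope are assumptions, 
not part of the original note:

{noformat}
<!-- Compile against the shaded client API only. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client-api</artifactId>
  <version>3.0.0-alpha2</version>
</dependency>
<!-- Pull the shaded implementation in at runtime, keeping Hadoop's
     third-party dependencies off the application's compile classpath. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client-runtime</artifactId>
  <version>3.0.0-alpha2</version>
  <scope>runtime</scope>
</dependency>
{noformat}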

> Shaded Hadoop client artifacts and minicluster
> --
>
> Key: HADOOP-11804
> URL: https://issues.apache.org/jira/browse/HADOOP-11804
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: build
>Reporter: Sean Busbey
>Assignee: Sean Busbey
> Fix For: 3.0.0-alpha2
>
> Attachments: HADOOP-11804.1.patch, HADOOP-11804.10.patch, 
> HADOOP-11804.11.patch, HADOOP-11804.12.patch, HADOOP-11804.13.patch, 
> HADOOP-11804.14.patch, HADOOP-11804.2.patch, HADOOP-11804.3.patch, 
> HADOOP-11804.4.patch, HADOOP-11804.5.patch, HADOOP-11804.6.patch, 
> HADOOP-11804.7.patch, HADOOP-11804.8.patch, HADOOP-11804.9.patch, 
> hadoop-11804-client-test.tar.gz
>
>
> Make hadoop-client-api and hadoop-client-runtime artifacts that downstream 
> projects (e.g. HBase) can use to talk to a Hadoop cluster without seeing any 
> of the implementation dependencies.
> See the proposal on the parent JIRA for details.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13975) Allow DistCp to use MultiThreadedMapper

2017-01-11 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HADOOP-13975:

Attachment: HADOOP-distcp-multithreaded-mapper-branch26.3.patch

Fixed checkstyle issues.

> Allow DistCp to use MultiThreadedMapper
> ---
>
> Key: HADOOP-13975
> URL: https://issues.apache.org/jira/browse/HADOOP-13975
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: tools/distcp
>Affects Versions: 3.0.0-alpha1
>Reporter: Zheng Shao
>Assignee: Zheng Shao
>Priority: Minor
> Attachments: HADOOP-distcp-multithreaded-mapper-branch26.1.patch, 
> HADOOP-distcp-multithreaded-mapper-branch26.2.patch, 
> HADOOP-distcp-multithreaded-mapper-branch26.3.patch, 
> HADOOP-distcp-multithreaded-mapper-trunk.1.patch, 
> HADOOP-distcp-multithreaded-mapper-trunk.2.patch
>
>
> Although distcp allows users to control the parallelism via the number of 
> mappers, sometimes it's desirable to run fewer mappers but more threads per 
> mapper.  Since distcp is network bound (either by throughput or, more 
> frequently, by the latency of creating connections, opening files, 
> reading/writing files, and closing files), this can make each mapper much 
> more efficient.
> In that way, a lot of resources can be shared so we can save memory and 
> connections to the NameNode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13977) IntelliJ Compilation error in ITUseMiniCluster.java

2017-01-11 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HADOOP-13977:
-
Affects Version/s: 3.0.0-alpha2

> IntelliJ Compilation error in ITUseMiniCluster.java
> ---
>
> Key: HADOOP-13977
> URL: https://issues.apache.org/jira/browse/HADOOP-13977
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha2
>Reporter: Miklos Szegedi
>Assignee: Sean Busbey
>
> The repro steps:
> mvn clean install -DskipTests and then "Build/Build Project" in IntelliJ IDEA 
> to update indexes, etc.
> ...hadoop/hadoop-client-modules/hadoop-client-integration-tests/src/test/java/org/apache/hadoop/example/ITUseMiniCluster.java
> Error:(34, 28) java: package org.apache.hadoop.fs does not exist
> ...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13975) Allow DistCp to use MultiThreadedMapper

2017-01-11 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HADOOP-13975:

Status: Patch Available  (was: Open)

> Allow DistCp to use MultiThreadedMapper
> ---
>
> Key: HADOOP-13975
> URL: https://issues.apache.org/jira/browse/HADOOP-13975
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: tools/distcp
>Affects Versions: 3.0.0-alpha1
>Reporter: Zheng Shao
>Assignee: Zheng Shao
>Priority: Minor
> Attachments: HADOOP-distcp-multithreaded-mapper-branch26.1.patch, 
> HADOOP-distcp-multithreaded-mapper-branch26.2.patch, 
> HADOOP-distcp-multithreaded-mapper-trunk.1.patch, 
> HADOOP-distcp-multithreaded-mapper-trunk.2.patch
>
>
> Although distcp allows users to control the parallelism via the number of 
> mappers, sometimes it's desirable to run fewer mappers but more threads per 
> mapper.  Since distcp is network bound (either by throughput or, more 
> frequently, by the latency of creating connections, opening files, 
> reading/writing files, and closing files), this can make each mapper much 
> more efficient.
> In that way, a lot of resources can be shared so we can save memory and 
> connections to the NameNode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13961) Fix compilation failure from missing hadoop-kms test jar

2017-01-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819631#comment-15819631
 ] 

Hudson commented on HADOOP-13961:
-

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #0 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/0/])
HADOOP-13961. Fix compilation failure from missing hadoop-kms test jar. (wang: 
rev 5f336512d08c0fb74e0404fcde1288926eeba06d)
* (edit) hadoop-common-project/hadoop-kms/pom.xml
* (edit) hadoop-hdfs-project/hadoop-hdfs/pom.xml
* (edit) hadoop-project/pom.xml


> Fix compilation failure from missing hadoop-kms test jar
> 
>
> Key: HADOOP-13961
> URL: https://issues.apache.org/jira/browse/HADOOP-13961
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: build
>Affects Versions: 3.0.0-alpha2
>Reporter: Karthik Kambatla
>Assignee: Andrew Wang
>Priority: Blocker
> Fix For: 3.0.0-alpha2
>
> Attachments: HADOOP-13961.001.patch, HADOOP-13961.002.patch
>
>
> mvn install fails for me on trunk on a new environment with the following:
> {noformat}
> [ERROR] Failed to execute goal on project hadoop-hdfs: Could not resolve 
> dependencies for project 
> org.apache.hadoop:hadoop-hdfs:jar:3.0.0-alpha2-SNAPSHOT: Could not find 
> artifact 
> org.apache.hadoop:hadoop-kms:jar:classes:3.0.0-alpha2-20161228.102554-925 in 
> apache.snapshots.https 
> (https://repository.apache.org/content/repositories/snapshots) -> [Help 1]
> {noformat}
> This works on an existing dev setup, likely because I have the jar in my m2 
> cache. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13975) Allow DistCp to use MultiThreadedMapper

2017-01-11 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HADOOP-13975:

Attachment: HADOOP-distcp-multithreaded-mapper-branch26.2.patch

> Allow DistCp to use MultiThreadedMapper
> ---
>
> Key: HADOOP-13975
> URL: https://issues.apache.org/jira/browse/HADOOP-13975
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: tools/distcp
>Affects Versions: 3.0.0-alpha1
>Reporter: Zheng Shao
>Assignee: Zheng Shao
>Priority: Minor
> Attachments: HADOOP-distcp-multithreaded-mapper-branch26.1.patch, 
> HADOOP-distcp-multithreaded-mapper-branch26.2.patch, 
> HADOOP-distcp-multithreaded-mapper-trunk.1.patch, 
> HADOOP-distcp-multithreaded-mapper-trunk.2.patch
>
>
> Although distcp allows users to control the parallelism via the number of 
> mappers, sometimes it's desirable to run fewer mappers but more threads per 
> mapper.  Since distcp is network bound (either by throughput or, more 
> frequently, by the latency of creating connections, opening files, 
> reading/writing files, and closing files), this can make each mapper much 
> more efficient.
> In that way, a lot of resources can be shared so we can save memory and 
> connections to the NameNode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13975) Allow DistCp to use MultiThreadedMapper

2017-01-11 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HADOOP-13975:

Status: Open  (was: Patch Available)

> Allow DistCp to use MultiThreadedMapper
> ---
>
> Key: HADOOP-13975
> URL: https://issues.apache.org/jira/browse/HADOOP-13975
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: tools/distcp
>Affects Versions: 3.0.0-alpha1
>Reporter: Zheng Shao
>Assignee: Zheng Shao
>Priority: Minor
> Attachments: HADOOP-distcp-multithreaded-mapper-branch26.1.patch, 
> HADOOP-distcp-multithreaded-mapper-branch26.2.patch, 
> HADOOP-distcp-multithreaded-mapper-trunk.1.patch, 
> HADOOP-distcp-multithreaded-mapper-trunk.2.patch
>
>
> Although distcp allows users to control the parallelism via the number of 
> mappers, sometimes it's desirable to run fewer mappers but more threads per 
> mapper.  Since distcp is network bound (either by throughput or, more 
> frequently, by the latency of creating connections, opening files, 
> reading/writing files, and closing files), this can make each mapper much 
> more efficient.
> In that way, a lot of resources can be shared so we can save memory and 
> connections to the NameNode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13975) Allow DistCp to use MultiThreadedMapper

2017-01-11 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HADOOP-13975:

Attachment: HADOOP-distcp-multithreaded-mapper-trunk.2.patch

> Allow DistCp to use MultiThreadedMapper
> ---
>
> Key: HADOOP-13975
> URL: https://issues.apache.org/jira/browse/HADOOP-13975
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: tools/distcp
>Affects Versions: 3.0.0-alpha1
>Reporter: Zheng Shao
>Assignee: Zheng Shao
>Priority: Minor
> Attachments: HADOOP-distcp-multithreaded-mapper-branch26.1.patch, 
> HADOOP-distcp-multithreaded-mapper-branch26.2.patch, 
> HADOOP-distcp-multithreaded-mapper-trunk.1.patch, 
> HADOOP-distcp-multithreaded-mapper-trunk.2.patch
>
>
> Although distcp allows users to control the parallelism via the number of 
> mappers, sometimes it's desirable to run fewer mappers but more threads per 
> mapper.  Since distcp is network bound (either by throughput or, more 
> frequently, by the latency of creating connections, opening files, 
> reading/writing files, and closing files), this can make each mapper much 
> more efficient.
> In that way, a lot of resources can be shared so we can save memory and 
> connections to the NameNode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13961) Fix compilation failure from missing hadoop-kms test jar

2017-01-11 Thread John Zhuge (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819594#comment-15819594
 ] 

John Zhuge commented on HADOOP-13961:
-

Thanks [~andrew.wang], [~sjlee0], [~aw], and [~kasha] for reporting the issue.

> Fix compilation failure from missing hadoop-kms test jar
> 
>
> Key: HADOOP-13961
> URL: https://issues.apache.org/jira/browse/HADOOP-13961
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: build
>Affects Versions: 3.0.0-alpha2
>Reporter: Karthik Kambatla
>Assignee: Andrew Wang
>Priority: Blocker
> Fix For: 3.0.0-alpha2
>
> Attachments: HADOOP-13961.001.patch, HADOOP-13961.002.patch
>
>
> mvn install fails for me on trunk on a new environment with the following:
> {noformat}
> [ERROR] Failed to execute goal on project hadoop-hdfs: Could not resolve 
> dependencies for project 
> org.apache.hadoop:hadoop-hdfs:jar:3.0.0-alpha2-SNAPSHOT: Could not find 
> artifact 
> org.apache.hadoop:hadoop-kms:jar:classes:3.0.0-alpha2-20161228.102554-925 in 
> apache.snapshots.https 
> (https://repository.apache.org/content/repositories/snapshots) -> [Help 1]
> {noformat}
> This works on an existing dev setup, likely because I have the jar in my m2 
> cache. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13961) Fix compilation failure from missing hadoop-kms test jar

2017-01-11 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HADOOP-13961:
-
   Resolution: Fixed
Fix Version/s: 3.0.0-alpha2
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks for the fast action here John and Sangjin!

> Fix compilation failure from missing hadoop-kms test jar
> 
>
> Key: HADOOP-13961
> URL: https://issues.apache.org/jira/browse/HADOOP-13961
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: build
>Affects Versions: 3.0.0-alpha2
>Reporter: Karthik Kambatla
>Assignee: Andrew Wang
>Priority: Blocker
> Fix For: 3.0.0-alpha2
>
> Attachments: HADOOP-13961.001.patch, HADOOP-13961.002.patch
>
>
> mvn install fails for me on trunk on a new environment with the following:
> {noformat}
> [ERROR] Failed to execute goal on project hadoop-hdfs: Could not resolve 
> dependencies for project 
> org.apache.hadoop:hadoop-hdfs:jar:3.0.0-alpha2-SNAPSHOT: Could not find 
> artifact 
> org.apache.hadoop:hadoop-kms:jar:classes:3.0.0-alpha2-20161228.102554-925 in 
> apache.snapshots.https 
> (https://repository.apache.org/content/repositories/snapshots) -> [Help 1]
> {noformat}
> This works on an existing dev setup, likely because I have the jar in my m2 
> cache. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13975) Allow DistCp to use MultiThreadedMapper

2017-01-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819572#comment-15819572
 ] 

Hadoop QA commented on HADOOP-13975:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
13s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 12m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
15s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 11s{color} | {color:orange} hadoop-tools/hadoop-distcp: The patch generated 
5 new + 103 unchanged - 1 fixed = 108 total (was 104) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 10m 
33s{color} | {color:green} hadoop-distcp in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
15s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 28m 26s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:a9ad5d6 |
| JIRA Issue | HADOOP-13975 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12847103/HADOOP-distcp-multithreaded-mapper-trunk.1.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 60f01ed0a8db 3.13.0-103-generic #150-Ubuntu SMP Thu Nov 24 
10:34:17 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / d51f8ba |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/11422/artifact/patchprocess/diff-checkstyle-hadoop-tools_hadoop-distcp.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/11422/testReport/ |
| modules | C: hadoop-tools/hadoop-distcp U: hadoop-tools/hadoop-distcp |
| Console output | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/11422/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Allow DistCp to use MultiThreadedMapper
> ---
>
> Key: HADOOP-13975
> URL: https://issues.apache.org/jira/browse/HADOOP-13975
> Project: Hadoop Common
>  Issue 

[jira] [Updated] (HADOOP-13961) Fix compilation failure from missing hadoop-kms test jar

2017-01-11 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HADOOP-13961:
-
Summary: Fix compilation failure from missing hadoop-kms test jar  (was: 
Fix compilation failure)

> Fix compilation failure from missing hadoop-kms test jar
> 
>
> Key: HADOOP-13961
> URL: https://issues.apache.org/jira/browse/HADOOP-13961
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: build
>Affects Versions: 3.0.0-alpha2
>Reporter: Karthik Kambatla
>Assignee: Andrew Wang
>Priority: Blocker
> Attachments: HADOOP-13961.001.patch, HADOOP-13961.002.patch
>
>
> mvn install fails for me on trunk on a new environment with the following:
> {noformat}
> [ERROR] Failed to execute goal on project hadoop-hdfs: Could not resolve 
> dependencies for project 
> org.apache.hadoop:hadoop-hdfs:jar:3.0.0-alpha2-SNAPSHOT: Could not find 
> artifact 
> org.apache.hadoop:hadoop-kms:jar:classes:3.0.0-alpha2-20161228.102554-925 in 
> apache.snapshots.https 
> (https://repository.apache.org/content/repositories/snapshots) -> [Help 1]
> {noformat}
> This works on an existing dev setup, likely because I have the jar in my m2 
> cache. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13961) mvn install fails on trunk

2017-01-11 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819571#comment-15819571
 ] 

Andrew Wang commented on HADOOP-13961:
--

+1 LGTM, did some local testing too. Will commit shortly.

> mvn install fails on trunk
> --
>
> Key: HADOOP-13961
> URL: https://issues.apache.org/jira/browse/HADOOP-13961
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: build
>Affects Versions: 3.0.0-alpha2
>Reporter: Karthik Kambatla
>Assignee: Sangjin Lee
>Priority: Blocker
> Attachments: HADOOP-13961.001.patch, HADOOP-13961.002.patch
>
>
> mvn install fails for me on trunk on a new environment with the following:
> {noformat}
> [ERROR] Failed to execute goal on project hadoop-hdfs: Could not resolve 
> dependencies for project 
> org.apache.hadoop:hadoop-hdfs:jar:3.0.0-alpha2-SNAPSHOT: Could not find 
> artifact 
> org.apache.hadoop:hadoop-kms:jar:classes:3.0.0-alpha2-20161228.102554-925 in 
> apache.snapshots.https 
> (https://repository.apache.org/content/repositories/snapshots) -> [Help 1]
> {noformat}
> This works on an existing dev setup, likely because I have the jar in my m2 
> cache. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Assigned] (HADOOP-13961) mvn install fails on trunk

2017-01-11 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang reassigned HADOOP-13961:


Assignee: Andrew Wang  (was: Sangjin Lee)

> mvn install fails on trunk
> --
>
> Key: HADOOP-13961
> URL: https://issues.apache.org/jira/browse/HADOOP-13961
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: build
>Affects Versions: 3.0.0-alpha2
>Reporter: Karthik Kambatla
>Assignee: Andrew Wang
>Priority: Blocker
> Attachments: HADOOP-13961.001.patch, HADOOP-13961.002.patch
>
>
> mvn install fails for me on trunk on a new environment with the following:
> {noformat}
> [ERROR] Failed to execute goal on project hadoop-hdfs: Could not resolve 
> dependencies for project 
> org.apache.hadoop:hadoop-hdfs:jar:3.0.0-alpha2-SNAPSHOT: Could not find 
> artifact 
> org.apache.hadoop:hadoop-kms:jar:classes:3.0.0-alpha2-20161228.102554-925 in 
> apache.snapshots.https 
> (https://repository.apache.org/content/repositories/snapshots) -> [Help 1]
> {noformat}
> This works on an existing dev setup, likely because I have the jar in my m2 
> cache. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13961) Fix compilation failure

2017-01-11 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HADOOP-13961:
-
Summary: Fix compilation failure  (was: mvn install fails on trunk)

> Fix compilation failure
> ---
>
> Key: HADOOP-13961
> URL: https://issues.apache.org/jira/browse/HADOOP-13961
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: build
>Affects Versions: 3.0.0-alpha2
>Reporter: Karthik Kambatla
>Assignee: Andrew Wang
>Priority: Blocker
> Attachments: HADOOP-13961.001.patch, HADOOP-13961.002.patch
>
>
> mvn install fails for me on trunk on a new environment with the following:
> {noformat}
> [ERROR] Failed to execute goal on project hadoop-hdfs: Could not resolve 
> dependencies for project 
> org.apache.hadoop:hadoop-hdfs:jar:3.0.0-alpha2-SNAPSHOT: Could not find 
> artifact 
> org.apache.hadoop:hadoop-kms:jar:classes:3.0.0-alpha2-20161228.102554-925 in 
> apache.snapshots.https 
> (https://repository.apache.org/content/repositories/snapshots) -> [Help 1]
> {noformat}
> This works on an existing dev setup, likely because I have the jar in my m2 
> cache. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13977) IntelliJ Compilation error in ITUseMiniCluster.java

2017-01-11 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819533#comment-15819533
 ] 

Miklos Szegedi commented on HADOOP-13977:
-

I checked the root cause. Building in IntelliJ worked before HADOOP-11656.

> IntelliJ Compilation error in ITUseMiniCluster.java
> ---
>
> Key: HADOOP-13977
> URL: https://issues.apache.org/jira/browse/HADOOP-13977
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Miklos Szegedi
>Assignee: Sean Busbey
>
> The repro steps:
> mvn clean install -DskipTests and then "Build/Build Project" in IntelliJ IDEA 
> to update indexes, etc.
> ...hadoop/hadoop-client-modules/hadoop-client-integration-tests/src/test/java/org/apache/hadoop/example/ITUseMiniCluster.java
> Error:(34, 28) java: package org.apache.hadoop.fs does not exist
> ...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Issue Comment Deleted] (HADOOP-13977) IntelliJ Compilation error in ITUseMiniCluster.java

2017-01-11 Thread Miklos Szegedi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Szegedi updated HADOOP-13977:

Comment: was deleted

(was: I verified and this was introduced in HADOOP-11656.)

> IntelliJ Compilation error in ITUseMiniCluster.java
> ---
>
> Key: HADOOP-13977
> URL: https://issues.apache.org/jira/browse/HADOOP-13977
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Miklos Szegedi
>Assignee: Sean Busbey
>
> The repro steps:
> mvn clean install -DskipTests and then "Build/Build Project" in IntelliJ IDEA 
> to update indexes, etc.
> ...hadoop/hadoop-client-modules/hadoop-client-integration-tests/src/test/java/org/apache/hadoop/example/ITUseMiniCluster.java
> Error:(34, 28) java: package org.apache.hadoop.fs does not exist
> ...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13805) UGI.getCurrentUser() fails if user does not have a keytab associated

2017-01-11 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819530#comment-15819530
 ] 

Wei-Chiu Chuang commented on HADOOP-13805:
--

[~tucu00] I am not an expert in UGI and user authentication, but looking at 
this JIRA and the patch, I think you are right: UGI should not use isKeytab to 
determine whether it should renew.
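
For readers following along, here is a minimal sketch of the scenario the 
quoted description below walks through. It uses only the public UGI API; the 
empty Subject is a stand-in for whatever Kerberos/JAAS login the application 
really performs, so treat it as an illustration, not the actual reproduction:

{noformat}
import javax.security.auth.Subject;
import org.apache.hadoop.security.UserGroupInformation;

public class UgiReloginSketch {
  public static void main(String[] args) throws Exception {
    // Stand-in: a real application obtains this Subject from its own
    // Kerberos/JAAS login, and therefore owns the keytab, not UGI.
    Subject subject = new Subject();
    // HADOOP-13558 marks the keytab as external here (externalKeyTab == true).
    UserGroupInformation.loginUserFromSubject(subject);
    // Per this JIRA, the UGI returned here is rebuilt via the one-argument
    // constructor, i.e. externalKeyTab == false, so when the TGT expires it
    // attempts a keytab relogin against a keytab it was never given.
    UserGroupInformation ugi = UserGroupInformation.getCurrentUser();
    System.out.println(ugi);
  }
}
{noformat}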

> UGI.getCurrentUser() fails if user does not have a keytab associated
> 
>
> Key: HADOOP-13805
> URL: https://issues.apache.org/jira/browse/HADOOP-13805
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.8.0, 2.9.0, 3.0.0-alpha2
>Reporter: Alejandro Abdelnur
>Assignee: Xiao Chen
> Attachments: HADOOP-13805.01.patch, HADOOP-13805.02.patch, 
> HADOOP-13805.03.patch, HADOOP-13805.04.patch, HADOOP-13805.05.patch
>
>
> The intention of HADOOP-13558 was to keep UGI from trying to renew the TGT 
> when the UGI is created from an existing Subject, since in that case the 
> keytab is not owned by UGI but by the creator of the Subject.
> In HADOOP-13558 we introduced a new private UGI constructor 
> {{UserGroupInformation(Subject subject, final boolean externalKeyTab)}}, and 
> we use it with TRUE only when doing a {{UGI.loginUserFromSubject()}}.
> The problem is that when we call {{UGI.getCurrentUser()}} and the UGI was 
> created via a Subject (via the {{UGI.loginUserFromSubject()}} method), we 
> call {{new UserGroupInformation(subject)}}, which delegates to 
> {{UserGroupInformation(Subject subject, final boolean externalKeyTab)}} with 
> externalKeyTab == *FALSE*.
> Then, if the TGT has expired, the UGI returned by {{UGI.getCurrentUser()}} 
> will attempt to log in using a non-existent keytab.
> This problem is experienced in {{KMSClientProvider}} when used by the HDFS 
> filesystem client accessing an encryption zone.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13977) IntelliJ Compilation error in ITUseMiniCluster.java

2017-01-11 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819529#comment-15819529
 ] 

Miklos Szegedi commented on HADOOP-13977:
-

I verified and this was introduced in HADOOP-11656.

> IntelliJ Compilation error in ITUseMiniCluster.java
> ---
>
> Key: HADOOP-13977
> URL: https://issues.apache.org/jira/browse/HADOOP-13977
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Miklos Szegedi
>Assignee: Sean Busbey
>
> The repro steps:
> mvn clean install -DskipTests and then "Build/Build Project" in IntelliJ IDEA 
> to update indexes, etc.
> ...hadoop/hadoop-client-modules/hadoop-client-integration-tests/src/test/java/org/apache/hadoop/example/ITUseMiniCluster.java
> Error:(34, 28) java: package org.apache.hadoop.fs does not exist
> ...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-13977) IntelliJ Compilation error in ITUseMiniCluster.java

2017-01-11 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created HADOOP-13977:
---

 Summary: IntelliJ Compilation error in ITUseMiniCluster.java
 Key: HADOOP-13977
 URL: https://issues.apache.org/jira/browse/HADOOP-13977
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Miklos Szegedi
Assignee: Sean Busbey


The repro steps:
mvn clean install -DskipTests and then "Build/Build Project" in IntelliJ IDEA 
to update indexes, etc.
...hadoop/hadoop-client-modules/hadoop-client-integration-tests/src/test/java/org/apache/hadoop/example/ITUseMiniCluster.java
Error:(34, 28) java: package org.apache.hadoop.fs does not exist
...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HADOOP-13975) Allow DistCp to use MultiThreadedMapper

2017-01-11 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819514#comment-15819514
 ] 

Zheng Shao edited comment on HADOOP-13975 at 1/11/17 11:26 PM:
---

Example usage:

bin/hadoop distcp -Dmapreduce.job.user.classpath.first=true -prbugp -m 8 
-numThreadsPerMap 16 hdfs://sourcehdfs/srcdir hdfs://targethdfs/

This uses 8 mappers, each with 16 threads, to do the distcp.  It's the 
equivalent of running 128 mappers, except that this is more efficient (as long 
as we don't hit a resource bottleneck on a single machine).

Note that "-Dmapreduce.job.user.classpath.first=true" is needed if you have 
only updated the client-side hadoop-tools jar but not yet the server (YARN 
NodeManager) side.
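
For anyone unfamiliar with the underlying mechanism, below is a rough sketch 
of how a generic MapReduce job enables the stock 
org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper (the patch wires the 
equivalent up inside DistCp); MyCopyMapper and the job setup are hypothetical 
illustrations, not code from the patch:

{noformat}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper;

public class MultithreadedJobSketch {

  // Hypothetical mapper standing in for DistCp's real copy mapper; anything
  // run under MultithreadedMapper must be thread-safe.
  public static class MyCopyMapper
      extends Mapper<LongWritable, Text, LongWritable, Text> {
    // map() omitted: the default identity map suffices for this sketch; a
    // real implementation would copy one file per input record.
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "multithreaded copy");
    // Run map tasks through MultithreadedMapper's thread pool...
    job.setMapperClass(MultithreadedMapper.class);
    // ...delegating each input record to the real mapper.
    MultithreadedMapper.setMapperClass(job, MyCopyMapper.class);
    // 16 concurrent threads per map task, matching -numThreadsPerMap above.
    MultithreadedMapper.setNumberOfThreads(job, 16);
    // job.waitForCompletion(true) would submit it; omitted in this sketch.
  }
}
{noformat}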



was (Author: zshao):
Example usage:

bin/hadoop distcp -Dmapreduce.job.user.classpath.first=true -prbugp -m 2 
-numThreadsPerMap 16 hdfs://sourcehdfs/srcdir hdfs://targethdfs/

Note that "Dmapreduce.job.user.classpath.first=true" is needed if you only 
update the client-side hadoop-tools jar but not the server (YARN nodemanager) 
side yet.


> Allow DistCp to use MultiThreadedMapper
> ---
>
> Key: HADOOP-13975
> URL: https://issues.apache.org/jira/browse/HADOOP-13975
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: tools/distcp
>Affects Versions: 3.0.0-alpha1
>Reporter: Zheng Shao
>Assignee: Zheng Shao
>Priority: Minor
> Attachments: HADOOP-distcp-multithreaded-mapper-branch26.1.patch, 
> HADOOP-distcp-multithreaded-mapper-trunk.1.patch
>
>
> Although distcp allows users to control the parallelism via the number of 
> mappers, sometimes it's desirable to run fewer mappers but more threads per 
> mapper.  Since distcp is network bound (either by throughput or, more 
> frequently, by the latency of creating connections, opening files, 
> reading/writing files, and closing files), this can make each mapper much 
> more efficient.
> In that way, a lot of resources can be shared so we can save memory and 
> connections to the NameNode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13975) Allow DistCp to use MultiThreadedMapper

2017-01-11 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819514#comment-15819514
 ] 

Zheng Shao commented on HADOOP-13975:
-

Example usage:

bin/hadoop distcp -Dmapreduce.job.user.classpath.first=true -prbugp -m 2 
-numThreadsPerMap 16 hdfs://sourcehdfs/srcdir hdfs://targethdfs/

Note that "Dmapreduce.job.user.classpath.first=true" is needed if you only 
update the client-side hadoop-tools jar but not the server (YARN nodemanager) 
side yet.


> Allow DistCp to use MultiThreadedMapper
> ---
>
> Key: HADOOP-13975
> URL: https://issues.apache.org/jira/browse/HADOOP-13975
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: tools/distcp
>Affects Versions: 3.0.0-alpha1
>Reporter: Zheng Shao
>Assignee: Zheng Shao
>Priority: Minor
> Attachments: HADOOP-distcp-multithreaded-mapper-branch26.1.patch, 
> HADOOP-distcp-multithreaded-mapper-trunk.1.patch
>
>
> Although distcp allows users to control the parallelism via the number of 
> mappers, sometimes it's desirable to run fewer mappers but more threads per 
> mapper.  Since distcp is network bound (either by throughput or, more 
> frequently, by the latency of creating connections, opening files, 
> reading/writing files, and closing files), this can make each mapper much 
> more efficient.
> In that way, a lot of resources can be shared so we can save memory and 
> connections to the NameNode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13975) Allow DistCp to use MultiThreadedMapper

2017-01-11 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HADOOP-13975:

Status: Patch Available  (was: In Progress)

> Allow DistCp to use MultiThreadedMapper
> ---
>
> Key: HADOOP-13975
> URL: https://issues.apache.org/jira/browse/HADOOP-13975
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: tools/distcp
>Affects Versions: 3.0.0-alpha1
>Reporter: Zheng Shao
>Assignee: Zheng Shao
>Priority: Minor
> Attachments: HADOOP-distcp-multithreaded-mapper-branch26.1.patch, 
> HADOOP-distcp-multithreaded-mapper-trunk.1.patch
>
>
> Although distcp allows users to control the parallelism via the number of 
> mappers, sometimes it's desirable to run fewer mappers but more threads per 
> mapper.  Since distcp is network bound (either by throughput or, more 
> frequently, by the latency of creating connections, opening files, 
> reading/writing files, and closing files), this can make each mapper much 
> more efficient.
> In that way, a lot of resources can be shared so we can save memory and 
> connections to the NameNode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13975) Allow DistCp to use MultiThreadedMapper

2017-01-11 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HADOOP-13975:

Attachment: HADOOP-distcp-multithreaded-mapper-trunk.1.patch

> Allow DistCp to use MultiThreadedMapper
> ---
>
> Key: HADOOP-13975
> URL: https://issues.apache.org/jira/browse/HADOOP-13975
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: tools/distcp
>Affects Versions: 3.0.0-alpha1
>Reporter: Zheng Shao
>Assignee: Zheng Shao
>Priority: Minor
> Attachments: HADOOP-distcp-multithreaded-mapper-branch26.1.patch, 
> HADOOP-distcp-multithreaded-mapper-trunk.1.patch
>
>
> Although distcp allows users to control the parallelism via the number of 
> mappers, sometimes it's desirable to run fewer mappers but more threads per 
> mapper.  Since distcp is network bound (either by throughput or, more 
> frequently, by the latency of creating connections, opening files, 
> reading/writing files, and closing files), this can make each mapper much 
> more efficient.
> In that way, a lot of resources can be shared so we can save memory and 
> connections to the NameNode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13975) Allow DistCp to use MultiThreadedMapper

2017-01-11 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HADOOP-13975:

Attachment: HADOOP-distcp-multithreaded-mapper-branch26.1.patch

> Allow DistCp to use MultiThreadedMapper
> ---
>
> Key: HADOOP-13975
> URL: https://issues.apache.org/jira/browse/HADOOP-13975
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: tools/distcp
>Affects Versions: 3.0.0-alpha1
>Reporter: Zheng Shao
>Assignee: Zheng Shao
>Priority: Minor
> Attachments: HADOOP-distcp-multithreaded-mapper-branch26.1.patch
>
>
> Although distcp allows users to control the parallelism via the number of 
> mappers, sometimes it's desirable to run fewer mappers but more threads per 
> mapper.  Since distcp is network bound (either by throughput or, more 
> frequently, by the latency of creating connections, opening files, 
> reading/writing files, and closing files), this can make each mapper much 
> more efficient.
> In that way, a lot of resources can be shared so we can save memory and 
> connections to the NameNode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Work started] (HADOOP-13975) Allow DistCp to use MultiThreadedMapper

2017-01-11 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HADOOP-13975 started by Zheng Shao.
---
> Allow DistCp to use MultiThreadedMapper
> ---
>
> Key: HADOOP-13975
> URL: https://issues.apache.org/jira/browse/HADOOP-13975
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: tools/distcp
>Affects Versions: 3.0.0-alpha1
>Reporter: Zheng Shao
>Assignee: Zheng Shao
>Priority: Minor
>
> Although DistCp allows users to control the parallelism via the number of
> mappers, sometimes it's desirable to run fewer mappers but more threads per
> mapper.  Since DistCp is network-bound (either by throughput or, more
> frequently, by the latency of creating connections, opening files,
> reading/writing files, and closing files), multithreading can make each
> mapper much more efficient.
> In that way, many resources can be shared, so we can save memory and
> connections to the NameNode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13976) Path globbing does not match newlines

2017-01-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819460#comment-15819460
 ] 

Hadoop QA commented on HADOOP-13976:


| (/) +1 overall |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 13s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| +1 | mvninstall | 12m 48s | trunk passed |
| +1 | compile | 9m 39s | trunk passed |
| +1 | checkstyle | 0m 29s | trunk passed |
| +1 | mvnsite | 1m 0s | trunk passed |
| +1 | mvneclipse | 0m 18s | trunk passed |
| +1 | findbugs | 1m 27s | trunk passed |
| +1 | javadoc | 0m 48s | trunk passed |
| +1 | mvninstall | 0m 39s | the patch passed |
| +1 | compile | 9m 19s | the patch passed |
| +1 | javac | 9m 19s | the patch passed |
| +1 | checkstyle | 0m 29s | the patch passed |
| +1 | mvnsite | 0m 59s | the patch passed |
| +1 | mvneclipse | 0m 15s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | findbugs | 1m 34s | the patch passed |
| +1 | javadoc | 0m 48s | the patch passed |
| +1 | unit | 8m 38s | hadoop-common in the patch passed. |
| +1 | asflicense | 0m 34s | The patch does not generate ASF License warnings. |
| | | 51m 43s | |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:a9ad5d6 |
| JIRA Issue | HADOOP-13976 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12847089/HADOOP-13976.001.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux a739462a540c 3.13.0-103-generic #150-Ubuntu SMP Thu Nov 24 10:34:17 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / e648b6e |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/11421/testReport/ |
| modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/11421/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org |


This message was automatically generated.



> Path globbing does not match newlines
> -
>
> Key: HADOOP-13976
> URL: https://issues.apache.org/jira/browse/HADOOP-13976
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: HADOOP-13976.001.patch
>
>
> Need to add the DOTALL flag to allow newlines to be accepted as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: 

[jira] [Commented] (HADOOP-13114) DistCp should have option to compress data on write

2017-01-11 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819365#comment-15819365
 ] 

Koji Noguchi commented on HADOOP-13114:
---

bq. Could you please elucidate your concern if its not that?

My point is, this command won't be useful unless the compressed outputs are
directly readable by Hadoop jobs.
Avro, ORC, RCFile, SequenceFile, and other common file formats all have their
own ways of compressing, and simply gzip/bzip2-ing the entire files won't do
any good.
Worse, I don't think the patch provides a way to uncompress them back.

bq.  but that means we'd make assumptions about Hadoop's use cases

And I'd say you're assuming users would call this distcp+compress on text
files only.
Files in other formats would become unreadable (until uncompressed back).


I agree with Nathan on the naming. If the command is called
{{dist-text-compress}}, then I'll have no concerns.
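For context, a minimal sketch of what compress-on-write amounts to per file,
using the standard codec API (this illustrates the mechanism only, not the
attached patch; the helper name and target path are made up):

{code:java}
import java.io.IOException;
import java.io.OutputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.util.ReflectionUtils;

public class CompressOnWriteSketch {
  // Open the copy target through a compression codec, suffixing the filename
  // with the codec's extension (".gz" here), as the proposal describes.
  public static OutputStream openCompressedTarget(Configuration conf, Path target)
      throws IOException {
    CompressionCodec codec = ReflectionUtils.newInstance(GzipCodec.class, conf);
    FileSystem fs = target.getFileSystem(conf);
    Path suffixed = target.suffix(codec.getDefaultExtension());
    return codec.createOutputStream(fs.create(suffixed));
  }
}
{code}

Note this compresses the raw byte stream only, which is exactly the concern
above: for container formats like Avro or ORC the result is no longer directly
readable.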

> DistCp should have option to compress data on write
> ---
>
> Key: HADOOP-13114
> URL: https://issues.apache.org/jira/browse/HADOOP-13114
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 2.8.0, 2.7.3, 3.0.0-alpha1
>Reporter: Suraj Nayak
>Assignee: Suraj Nayak
>Priority: Minor
>  Labels: distcp
> Attachments: HADOOP-13114-trunk_2016-05-07-1.patch, 
> HADOOP-13114-trunk_2016-05-08-1.patch, HADOOP-13114-trunk_2016-05-10-1.patch, 
> HADOOP-13114-trunk_2016-05-12-1.patch, HADOOP-13114.05.patch, 
> HADOOP-13114.06.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> The DistCp utility should have the capability to store data in a
> user-specified compression format. This avoids one extra hop of compressing
> data after transfer. Backup strategies to a different cluster also benefit
> by saving one I/O operation to and from HDFS, thus saving resources, time,
> and effort.
> * Create an option -compressOutput defaulting to
> {{org.apache.hadoop.io.compress.BZip2Codec}}.
> * Users will be able to change the codec with {{-D
> mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec}}
> * If distcp compression is enabled, suffix the filenames with the default
> codec extension to indicate the file is compressed, so users know which
> codec was used to compress the data.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13976) Path globbing does not match newlines

2017-01-11 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated HADOOP-13976:
-
Attachment: HADOOP-13976.001.patch

> Path globbing does not match newlines
> -
>
> Key: HADOOP-13976
> URL: https://issues.apache.org/jira/browse/HADOOP-13976
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: HADOOP-13976.001.patch
>
>
> Need to add the DOTALL flag to allow newlines to be accepted as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13976) Path globbing does not match newlines

2017-01-11 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated HADOOP-13976:
-
Status: Patch Available  (was: Open)

> Path globbing does not match newlines
> -
>
> Key: HADOOP-13976
> URL: https://issues.apache.org/jira/browse/HADOOP-13976
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: HADOOP-13976.001.patch
>
>
> Need to add the DOTALL flag to allow newlines to be accepted as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-13976) Path globbing does not match newlines

2017-01-11 Thread Eric Badger (JIRA)
Eric Badger created HADOOP-13976:


 Summary: Path globbing does not match newlines
 Key: HADOOP-13976
 URL: https://issues.apache.org/jira/browse/HADOOP-13976
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Eric Badger
Assignee: Eric Badger


Need to add the DOTALL flag to allow newlines to be accepted as well.
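For illustration, the behavioural difference the flag makes (a minimal,
self-contained example; the actual glob-to-regex code in the patch is not
shown here):

{code:java}
import java.util.regex.Pattern;

public class DotallExample {
  public static void main(String[] args) {
    // By default '.' does not match line terminators, so a glob-derived
    // regex fails on a path name containing a newline.
    System.out.println(Pattern.compile("a.c")
        .matcher("a\nc").matches());                 // prints false
    // With DOTALL, '.' matches any character, including newlines.
    System.out.println(Pattern.compile("a.c", Pattern.DOTALL)
        .matcher("a\nc").matches());                 // prints true
  }
}
{code}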



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-13975) Allow DistCp to use MultiThreadedMapper

2017-01-11 Thread Zheng Shao (JIRA)
Zheng Shao created HADOOP-13975:
---

 Summary: Allow DistCp to use MultiThreadedMapper
 Key: HADOOP-13975
 URL: https://issues.apache.org/jira/browse/HADOOP-13975
 Project: Hadoop Common
  Issue Type: New Feature
  Components: tools/distcp
Affects Versions: 3.0.0-alpha1
Reporter: Zheng Shao
Assignee: Zheng Shao
Priority: Minor


Although DistCp allows users to control the parallelism via the number of
mappers, sometimes it's desirable to run fewer mappers but more threads per
mapper.  Since DistCp is network-bound (either by throughput or, more
frequently, by the latency of creating connections, opening files,
reading/writing files, and closing files), multithreading can make each
mapper much more efficient.

In that way, many resources can be shared, so we can save memory and
connections to the NameNode.
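For reference, a minimal sketch of how a MapReduce job can be wired to run its
real mapper under {{MultithreadedMapper}} (illustrative only; the attached
patches may wire DistCp's CopyMapper differently, and the thread count here is
arbitrary):

{code:java}
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper;

public class MultithreadedWiringSketch {
  public static <K1, V1, K2, V2> void configure(
      Job job, Class<? extends Mapper<K1, V1, K2, V2>> realMapper) {
    // Run the real mapper inside the multithreaded wrapper...
    MultithreadedMapper.setMapperClass(job, realMapper);
    job.setMapperClass(MultithreadedMapper.class);
    // ...with 10 concurrent threads per map task, each pulling records
    // (files to copy, in the DistCp case) from a shared record reader.
    MultithreadedMapper.setNumberOfThreads(job, 10);
  }
}
{code}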




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Work started] (HADOOP-13956) Read ADLS credentials from Credential Provider

2017-01-11 Thread John Zhuge (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HADOOP-13956 started by John Zhuge.
---
> Read ADLS credentials from Credential Provider
> --
>
> Key: HADOOP-13956
> URL: https://issues.apache.org/jira/browse/HADOOP-13956
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/adl
>Affects Versions: 3.0.0-alpha2
>Reporter: John Zhuge
>Assignee: John Zhuge
>Priority: Critical
>
> Read ADLS credentials using the Hadoop CredentialProvider API. See 
> https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/CredentialProviderAPI.html.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13589) S3Guard: Allow execution of all S3A integration tests with S3Guard enabled.

2017-01-11 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15818945#comment-15818945
 ] 

Chris Nauroth commented on HADOOP-13589:


Reviewing patch 004, we still need to update {{NullMetadataStore#toString}} to
avoid the unnecessary {{StringBuilder}}.  Also, some of the contract subclasses
are now instantiating their own {{Configuration}}, while others are calling
{{super.createConfiguration()}} and then overriding specific properties.  I
think the latter is better, so that these suites inherit changes to the
configuration logic from the base class (see the sketch below).

This definitely works, though, for running the contract tests with S3Guard
enabled and integrated with DynamoDB.  I did a test run both with and without
the new profiles.
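The recommended pattern, sketched for a hypothetical contract-test subclass
(the property name and value are placeholders, not real S3Guard keys):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.contract.AbstractFSContractTestBase;

public abstract class ExampleContractTest extends AbstractFSContractTestBase {
  @Override
  protected Configuration createConfiguration() {
    // Inherit the base class's configuration logic, then overlay only the
    // suite-specific settings, instead of building a fresh Configuration.
    Configuration conf = super.createConfiguration();
    conf.set("fs.example.test.option", "value");  // placeholder property
    return conf;
  }
}
{code}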

> S3Guard: Allow execution of all S3A integration tests with S3Guard enabled.
> ---
>
> Key: HADOOP-13589
> URL: https://issues.apache.org/jira/browse/HADOOP-13589
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: HADOOP-13345
>Reporter: Chris Nauroth
>Assignee: Steve Loughran
> Attachments: HADOOP-13589-HADOOP-13345-001.patch, 
> HADOOP-13589-HADOOP-13345-002.patch, HADOOP-13589-HADOOP-13345-002.patch, 
> HADOOP-13589-HADOOP-13345-004.patch
>
>
> With S3Guard enabled, S3A must continue to be functionally correct.  The best 
> way to verify this is to execute our existing S3A integration tests in a mode 
> with S3Guard enabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13336) S3A to support per-bucket configuration

2017-01-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15818916#comment-15818916
 ] 

Hudson commented on HADOOP-13336:
-

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #11107 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/11107/])
HADOOP-13336 S3A to support per-bucket configuration. Contributed by (stevel: 
rev e648b6e1382336af69434dfbf9161bced3caa244)
* (edit) 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AUtils.java
* (edit) 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java
* (edit) 
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/S3ATestConstants.java
* (edit) hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
* (edit) 
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3AConfiguration.java
* (edit) 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
* (edit) 
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/S3ATestUtils.java
* (edit) 
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/scale/S3AScaleTestBase.java
* (edit) hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md
* (edit) 
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3AAWSCredentialsProvider.java
* (edit) 
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/scale/ITestS3AInputStreamPerformance.java


> S3A to support per-bucket configuration
> ---
>
> Key: HADOOP-13336
> URL: https://issues.apache.org/jira/browse/HADOOP-13336
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: HADOOP-13336-006.patch, HADOOP-13336-007.patch, 
> HADOOP-13336-010.patch, HADOOP-13336-011.patch, 
> HADOOP-13336-HADOOP-13345-001.patch, HADOOP-13336-HADOOP-13345-002.patch, 
> HADOOP-13336-HADOOP-13345-003.patch, HADOOP-13336-HADOOP-13345-004.patch, 
> HADOOP-13336-HADOOP-13345-005.patch, HADOOP-13336-HADOOP-13345-006.patch, 
> HADOOP-13336-HADOOP-13345-008.patch, HADOOP-13336-HADOOP-13345-009.patch, 
> HADOOP-13336-HADOOP-13345-010.patch
>
>
> S3a now supports different regions by way of declaring the endpoint, but you
> can't do things like read in one region and write back in another (e.g. a
> distcp backup), because only one region can be specified in a configuration.
> If s3a supported region declaration in the URL, e.g. s3a://b1.frankfurt,
> s3a://b2.seol, then this would be possible.
> Swift does this with a full filesystem binding/config: endpoints, username,
> etc., in the XML file. Would we need to do that much? It'd be simpler
> initially to use a domain suffix of a URL to set the region of a bucket from
> the domain, and have the aws library sort the details out itself, maybe with
> some config options for working with non-AWS infra.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13336) S3A to support per-bucket configuration

2017-01-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15818706#comment-15818706
 ] 

Hadoop QA commented on HADOOP-13336:


| (x) -1 overall |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 20s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 6 new or modified test files. |
| 0 | mvndep | 0m 16s | Maven dependency ordering for branch |
| +1 | mvninstall | 12m 42s | trunk passed |
| +1 | compile | 9m 43s | trunk passed |
| +1 | checkstyle | 1m 32s | trunk passed |
| +1 | mvnsite | 1m 28s | trunk passed |
| +1 | mvneclipse | 0m 39s | trunk passed |
| +1 | findbugs | 2m 3s | trunk passed |
| +1 | javadoc | 1m 10s | trunk passed |
| 0 | mvndep | 0m 17s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 0s | the patch passed |
| +1 | compile | 9m 23s | the patch passed |
| +1 | javac | 9m 23s | the patch passed |
| +1 | checkstyle | 1m 35s | root: The patch generated 0 new + 30 unchanged - 1 fixed = 30 total (was 31) |
| +1 | mvnsite | 1m 33s | the patch passed |
| +1 | mvneclipse | 0m 45s | the patch passed |
| -1 | whitespace | 0m 0s | The patch has 4 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply |
| +1 | xml | 0m 1s | The patch has no ill-formed XML file. |
| +1 | findbugs | 2m 25s | the patch passed |
| +1 | javadoc | 1m 11s | the patch passed |
| +1 | unit | 8m 14s | hadoop-common in the patch passed. |
| +1 | unit | 0m 34s | hadoop-aws in the patch passed. |
| +1 | asflicense | 0m 37s | The patch does not generate ASF License warnings. |
| | | 81m 24s | |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:a9ad5d6 |
| JIRA Issue | HADOOP-13336 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12846841/HADOOP-13336-011.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit xml findbugs checkstyle |
| uname | Linux c39f99865c8d 3.13.0-103-generic #150-Ubuntu SMP Thu Nov 24 10:34:17 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / a5ec1e3 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| whitespace | https://builds.apache.org/job/PreCommit-HADOOP-Build/11419/artifact/patchprocess/whitespace-eol.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/11419/testReport/ |
| modules | C: hadoop-common-project/hadoop-common 

[jira] [Commented] (HADOOP-13912) S3a Multipart Committer (avoid rename)

2017-01-11 Thread chao Guan (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15818702#comment-15818702
 ] 

chao Guan commented on HADOOP-13912:


Do you mean HADOOP-13786?

> S3a Multipart Committer (avoid rename)
> --
>
> Key: HADOOP-13912
> URL: https://issues.apache.org/jira/browse/HADOOP-13912
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs/s3
>Reporter: Thomas Demoor
>Assignee: Thomas Demoor
>
> Object stores do not have an efficient rename operation, which is used by the
> Hadoop FileOutputCommitter to atomically promote the "winning" attempt out of
> the multiple (speculative) attempts to the final path. These slow job commits
> are one of the main friction points when using object stores in Hadoop. There
> have been quite a few attempts at resolving this: HADOOP-9565, Apache Spark
> DirectOutputCommitters, ... but they have proven not to be robust in the face
> of adversity (network partitions, ...).
> The current ticket proposes to do the atomic commit by using the S3 Multipart
> API, which allows multiple concurrent uploads on the same object name, each in
> its own "temporary space", identified by the UploadId which is returned as a
> response to InitiateMultipartUpload. Every attempt writes directly to the
> final {{outputPath}}. Data is uploaded using Put Part, and as a response an
> ETag for the part is returned and stored. The CompleteMultipartUpload is
> postponed. Instead, we persist the UploadId (using a _temporary subdir or
> elsewhere) and the ETags. When a certain "job" wins,
> {{CompleteMultipartUpload}} is called for each of its files using the proper
> list of Part ETags.
> Completing a MultipartUpload is a metadata-only operation (internally in S3)
> and is thus orders of magnitude faster than the rename-based approach, which
> moves all the data.
> Required work:
> * Expose the multipart initiate and complete calls in S3AOutputStream to
> S3AFilesystem
> * Use these multipart calls in a custom committer as described above. I
> propose to build on the S3ACommitter [~ste...@apache.org] is doing for
> HADOOP-13786



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13589) S3Guard: Allow execution of all S3A integration tests with S3Guard enabled.

2017-01-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15818690#comment-15818690
 ] 

Hadoop QA commented on HADOOP-13589:


| (x) -1 overall |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 22s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 13 new or modified test files. |
| +1 | mvninstall | 8m 9s | HADOOP-13345 passed |
| +1 | compile | 0m 23s | HADOOP-13345 passed |
| +1 | checkstyle | 0m 15s | HADOOP-13345 passed |
| +1 | mvnsite | 0m 27s | HADOOP-13345 passed |
| +1 | mvneclipse | 0m 16s | HADOOP-13345 passed |
| +1 | findbugs | 0m 29s | HADOOP-13345 passed |
| +1 | javadoc | 0m 14s | HADOOP-13345 passed |
| +1 | mvninstall | 0m 19s | the patch passed |
| +1 | compile | 0m 17s | the patch passed |
| +1 | javac | 0m 17s | the patch passed |
| +1 | checkstyle | 0m 11s | the patch passed |
| +1 | mvnsite | 0m 23s | the patch passed |
| +1 | mvneclipse | 0m 12s | the patch passed |
| -1 | whitespace | 0m 0s | The patch has 5 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply |
| +1 | xml | 0m 1s | The patch has no ill-formed XML file. |
| +1 | findbugs | 0m 33s | the patch passed |
| +1 | javadoc | 0m 11s | the patch passed |
| +1 | unit | 0m 34s | hadoop-aws in the patch passed. |
| +1 | asflicense | 0m 17s | The patch does not generate ASF License warnings. |
| | | 14m 52s | |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:a9ad5d6 |
| JIRA Issue | HADOOP-13589 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12846851/HADOOP-13589-HADOOP-13345-004.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit xml findbugs checkstyle |
| uname | Linux ca9ff1c91f91 3.13.0-96-generic #143-Ubuntu SMP Mon Aug 29 20:15:20 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | HADOOP-13345 / a5cc315 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| whitespace | https://builds.apache.org/job/PreCommit-HADOOP-Build/11420/artifact/patchprocess/whitespace-eol.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/11420/testReport/ |
| modules | C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/11420/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org |


This message was automatically generated.



> S3Guard: Allow execution of all S3A integration tests with S3Guard enabled.
> ---
>
> Key: HADOOP-13589
> URL: 

[jira] [Commented] (HADOOP-13589) S3Guard: Allow execution of all S3A integration tests with S3Guard enabled.

2017-01-11 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15818662#comment-15818662
 ] 

Steve Loughran commented on HADOOP-13589:
-

Tests completed against S3 Ireland, with params {{-Ds3guard -Ddynamo
-Dparallel-tests -DtestsThreadCount=6}}.

I suspect that for all s3guard patches we should declare which options were
used, just to check completeness of the option space.

> S3Guard: Allow execution of all S3A integration tests with S3Guard enabled.
> ---
>
> Key: HADOOP-13589
> URL: https://issues.apache.org/jira/browse/HADOOP-13589
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: HADOOP-13345
>Reporter: Chris Nauroth
>Assignee: Steve Loughran
> Attachments: HADOOP-13589-HADOOP-13345-001.patch, 
> HADOOP-13589-HADOOP-13345-002.patch, HADOOP-13589-HADOOP-13345-002.patch, 
> HADOOP-13589-HADOOP-13345-004.patch
>
>
> With S3Guard enabled, S3A must continue to be functionally correct.  The best 
> way to verify this is to execute our existing S3A integration tests in a mode 
> with S3Guard enabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13589) S3Guard: Allow execution of all S3A integration tests with S3Guard enabled.

2017-01-11 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-13589:

Status: Patch Available  (was: Open)

> S3Guard: Allow execution of all S3A integration tests with S3Guard enabled.
> ---
>
> Key: HADOOP-13589
> URL: https://issues.apache.org/jira/browse/HADOOP-13589
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: HADOOP-13345
>Reporter: Chris Nauroth
>Assignee: Steve Loughran
> Attachments: HADOOP-13589-HADOOP-13345-001.patch, 
> HADOOP-13589-HADOOP-13345-002.patch, HADOOP-13589-HADOOP-13345-002.patch, 
> HADOOP-13589-HADOOP-13345-004.patch
>
>
> With S3Guard enabled, S3A must continue to be functionally correct.  The best 
> way to verify this is to execute our existing S3A integration tests in a mode 
> with S3Guard enabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13589) S3Guard: Allow execution of all S3A integration tests with S3Guard enabled.

2017-01-11 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-13589:

Attachment: HADOOP-13589-HADOOP-13345-004.patch

Patch 004: set up s3guard for all FS contract tests.

Mostly this was just pasting in the method from AbstractS3ATestCase; a few
tests were already setting up the config, so for these I inserted the s3guard
setup in the appropriate place.

> S3Guard: Allow execution of all S3A integration tests with S3Guard enabled.
> ---
>
> Key: HADOOP-13589
> URL: https://issues.apache.org/jira/browse/HADOOP-13589
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: HADOOP-13345
>Reporter: Chris Nauroth
>Assignee: Steve Loughran
> Attachments: HADOOP-13589-HADOOP-13345-001.patch, 
> HADOOP-13589-HADOOP-13345-002.patch, HADOOP-13589-HADOOP-13345-002.patch, 
> HADOOP-13589-HADOOP-13345-004.patch
>
>
> With S3Guard enabled, S3A must continue to be functionally correct.  The best 
> way to verify this is to execute our existing S3A integration tests in a mode 
> with S3Guard enabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13972) Access multiple Azure Data Lake stores with different SPIs

2017-01-11 Thread John Zhuge (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15818555#comment-15818555
 ] 

John Zhuge commented on HADOOP-13972:
-

Sounds good [~ste...@apache.org]. I am all for code reuse and consistency.

> Access multiple Azure Data Lake stores with different SPIs
> --
>
> Key: HADOOP-13972
> URL: https://issues.apache.org/jira/browse/HADOOP-13972
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/adl
>Affects Versions: 3.0.0-alpha2
>Reporter: John Zhuge
>
> Useful when distcp needs to access two Data Lake stores with different SPIs.
> Of course, a workaround is to grant the same SPI access permission to both
> stores, but sometimes that might not be feasible.
> One idea is to embed the store name in the configuration property names,
> e.g., {{dfs.adls.oauth2..client.id}}. Per-store keys will be consulted
> first, falling back to the global keys.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-13974) S3a CLI to support list/purge of pending multipart commits

2017-01-11 Thread Steve Loughran (JIRA)
Steve Loughran created HADOOP-13974:
---

 Summary: S3a CLI to support list/purge of pending multipart commits
 Key: HADOOP-13974
 URL: https://issues.apache.org/jira/browse/HADOOP-13974
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: HADOOP-13345
Reporter: Steve Loughran


The S3A CLI will need to be able to list and delete pending multipart commits.

We can do the cleanup already via fs.s3a properties. The CLI will let scripts
check for outstanding data (via a different exit code) and permit batch jobs
to explicitly trigger cleanups.

This will become critical with the multipart committer, as there's a
significantly higher likelihood of commits remaining outstanding.

We may also want to be able to enumerate/cancel all pending commits in the FS
tree.
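For reference, the existing cleanup mentioned above can be sketched as follows
(assuming the fs.s3a.multipart.purge property names, and that the age value is
in seconds):

{code:java}
import org.apache.hadoop.conf.Configuration;

public class PurgeConfigSketch {
  public static Configuration withPurgeOnStartup(Configuration conf) {
    // Purge outstanding multipart uploads when the filesystem is initialized...
    conf.setBoolean("fs.s3a.multipart.purge", true);
    // ...deleting any upload older than 24 hours.
    conf.setLong("fs.s3a.multipart.purge.age", 24 * 60 * 60);
    return conf;
  }
}
{code}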



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13786) Add S3Guard committer for zero-rename commits to consistent S3 endpoints

2017-01-11 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15818513#comment-15818513
 ] 

Steve Loughran commented on HADOOP-13786:
-

The final committer will need to handle speculative commits to the same
destination path:

# individual executors will need to write to a job-specific subdir of a
{{.temp_multipart_put}} dir
# the central job committer should only commit work of tasks known to have
completed
# ...and for those that did not complete: remove the pending multipart parts
by reading the stored ETags and aborting the pending uploads
# ideally: track destination filenames and catch conflicts where more than
one task generates the same dest file (see the sketch below).
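A minimal sketch of the postponed-complete flow against the AWS SDK
(illustrative only; the real committer, its persistence format, and the
{{.temp_multipart_put}} layout are not shown here):

{code:java}
import java.io.File;
import java.util.List;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.AbortMultipartUploadRequest;
import com.amazonaws.services.s3.model.CompleteMultipartUploadRequest;
import com.amazonaws.services.s3.model.InitiateMultipartUploadRequest;
import com.amazonaws.services.s3.model.PartETag;
import com.amazonaws.services.s3.model.UploadPartRequest;
import com.amazonaws.services.s3.model.UploadPartResult;

public class PostponedCommitSketch {
  // Task attempt: upload the data but do NOT complete; persist id and ETags.
  static String uploadWithoutCommit(AmazonS3 s3, String bucket, String key,
      File data) {
    String uploadId = s3.initiateMultipartUpload(
        new InitiateMultipartUploadRequest(bucket, key)).getUploadId();
    UploadPartResult part = s3.uploadPart(new UploadPartRequest()
        .withBucketName(bucket).withKey(key).withUploadId(uploadId)
        .withPartNumber(1).withFile(data).withPartSize(data.length()));
    // The real committer would persist the uploadId and part ETags here
    // (e.g. under a _temporary subdir) for the job committer to read back.
    persist(uploadId, part.getPartETag());
    return uploadId;
  }

  // Job committer, winning attempt: a metadata-only completion.
  static void commit(AmazonS3 s3, String bucket, String key,
      String uploadId, List<PartETag> etags) {
    s3.completeMultipartUpload(
        new CompleteMultipartUploadRequest(bucket, key, uploadId, etags));
  }

  // Job committer, losing attempts: abort to free the pending parts.
  static void abort(AmazonS3 s3, String bucket, String key, String uploadId) {
    s3.abortMultipartUpload(
        new AbortMultipartUploadRequest(bucket, key, uploadId));
  }

  private static void persist(String uploadId, PartETag etag) {
    // placeholder for the committer's persistence step
  }
}
{code}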

> Add S3Guard committer for zero-rename commits to consistent S3 endpoints
> 
>
> Key: HADOOP-13786
> URL: https://issues.apache.org/jira/browse/HADOOP-13786
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs/s3
>Affects Versions: HADOOP-13345
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: HADOOP-13786-HADOOP-13345-001.patch
>
>
> A goal of this code is "support O(1) commits to S3 repositories in the
> presence of failures". Implement it, including whatever is needed to
> demonstrate the correctness of the algorithm (that is, assuming that s3guard
> provides a consistent view of the presence/absence of blobs, show that we can
> commit directly).
> I consider us free to expose the blobstore-ness of the s3 output streams
> (i.e. not visible until the close()), if we need to use that to allow us to
> abort commit operations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13336) S3A to support per-bucket configuration

2017-01-11 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-13336:

Status: Patch Available  (was: Open)

> S3A to support per-bucket configuration
> ---
>
> Key: HADOOP-13336
> URL: https://issues.apache.org/jira/browse/HADOOP-13336
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: HADOOP-13336-006.patch, HADOOP-13336-007.patch, 
> HADOOP-13336-010.patch, HADOOP-13336-011.patch, 
> HADOOP-13336-HADOOP-13345-001.patch, HADOOP-13336-HADOOP-13345-002.patch, 
> HADOOP-13336-HADOOP-13345-003.patch, HADOOP-13336-HADOOP-13345-004.patch, 
> HADOOP-13336-HADOOP-13345-005.patch, HADOOP-13336-HADOOP-13345-006.patch, 
> HADOOP-13336-HADOOP-13345-008.patch, HADOOP-13336-HADOOP-13345-009.patch, 
> HADOOP-13336-HADOOP-13345-010.patch
>
>
> S3a now supports different regions by way of declaring the endpoint, but you
> can't do things like read in one region and write back in another (e.g. a
> distcp backup), because only one region can be specified in a configuration.
> If s3a supported region declaration in the URL, e.g. s3a://b1.frankfurt,
> s3a://b2.seol, then this would be possible.
> Swift does this with a full filesystem binding/config: endpoints, username,
> etc., in the XML file. Would we need to do that much? It'd be simpler
> initially to use a domain suffix of a URL to set the region of a bucket from
> the domain, and have the aws library sort the details out itself, maybe with
> some config options for working with non-AWS infra.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13336) S3A to support per-bucket configuration

2017-01-11 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-13336:

Attachment: HADOOP-13336-011.patch

Turns out that the patch doesn't work in branch-2.8, as
HADOOP_SECURITY_CREDENTIAL_PROVIDER_PATH isn't in CommonConfigurationKeysPublic.

Patch 011 inlines the property into S3AUtils and makes it package-private for
testing. Although it's only needed for branch-2.8, I'm going to apply it
everywhere for consistency.

> S3A to support per-bucket configuration
> ---
>
> Key: HADOOP-13336
> URL: https://issues.apache.org/jira/browse/HADOOP-13336
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: HADOOP-13336-006.patch, HADOOP-13336-007.patch, 
> HADOOP-13336-010.patch, HADOOP-13336-011.patch, 
> HADOOP-13336-HADOOP-13345-001.patch, HADOOP-13336-HADOOP-13345-002.patch, 
> HADOOP-13336-HADOOP-13345-003.patch, HADOOP-13336-HADOOP-13345-004.patch, 
> HADOOP-13336-HADOOP-13345-005.patch, HADOOP-13336-HADOOP-13345-006.patch, 
> HADOOP-13336-HADOOP-13345-008.patch, HADOOP-13336-HADOOP-13345-009.patch, 
> HADOOP-13336-HADOOP-13345-010.patch
>
>
> S3a now supports different regions by way of declaring the endpoint, but you
> can't do things like read in one region and write back in another (e.g. a
> distcp backup), because only one region can be specified in a configuration.
> If s3a supported region declaration in the URL, e.g. s3a://b1.frankfurt,
> s3a://b2.seol, then this would be possible.
> Swift does this with a full filesystem binding/config: endpoints, username,
> etc., in the XML file. Would we need to do that much? It'd be simpler
> initially to use a domain suffix of a URL to set the region of a bucket from
> the domain, and have the aws library sort the details out itself, maybe with
> some config options for working with non-AWS infra.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13336) S3A to support per-bucket configuration

2017-01-11 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-13336:

Status: Open  (was: Patch Available)

> S3A to support per-bucket configuration
> ---
>
> Key: HADOOP-13336
> URL: https://issues.apache.org/jira/browse/HADOOP-13336
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: HADOOP-13336-006.patch, HADOOP-13336-007.patch, 
> HADOOP-13336-010.patch, HADOOP-13336-HADOOP-13345-001.patch, 
> HADOOP-13336-HADOOP-13345-002.patch, HADOOP-13336-HADOOP-13345-003.patch, 
> HADOOP-13336-HADOOP-13345-004.patch, HADOOP-13336-HADOOP-13345-005.patch, 
> HADOOP-13336-HADOOP-13345-006.patch, HADOOP-13336-HADOOP-13345-008.patch, 
> HADOOP-13336-HADOOP-13345-009.patch, HADOOP-13336-HADOOP-13345-010.patch
>
>
> S3a now supports different regions by way of declaring the endpoint, but you
> can't do things like read in one region and write back in another (e.g. a
> distcp backup), because only one region can be specified in a configuration.
> If s3a supported region declaration in the URL, e.g. s3a://b1.frankfurt,
> s3a://b2.seol, then this would be possible.
> Swift does this with a full filesystem binding/config: endpoints, username,
> etc., in the XML file. Would we need to do that much? It'd be simpler
> initially to use a domain suffix of a URL to set the region of a bucket from
> the domain, and have the aws library sort the details out itself, maybe with
> some config options for working with non-AWS infra.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-13912) S3a Multipart Committer (avoid rename)

2017-01-11 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-13912.
-
Resolution: Duplicate

Closing as a duplicate of HADOOP-13786; adding sub-JIRAs there.

> S3a Multipart Committer (avoid rename)
> --
>
> Key: HADOOP-13912
> URL: https://issues.apache.org/jira/browse/HADOOP-13912
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs/s3
>Reporter: Thomas Demoor
>Assignee: Thomas Demoor
>
> Object stores do not have an efficient rename operation, which is used by the
> Hadoop FileOutputCommitter to atomically promote the "winning" attempt out of
> the multiple (speculative) attempts to the final path. These slow job commits
> are one of the main friction points when using object stores in Hadoop. There
> have been quite a few attempts at resolving this: HADOOP-9565, Apache Spark
> DirectOutputCommitters, ... but they have proven not to be robust in the face
> of adversity (network partitions, ...).
> The current ticket proposes to do the atomic commit by using the S3 Multipart
> API, which allows multiple concurrent uploads on the same object name, each in
> its own "temporary space", identified by the UploadId which is returned as a
> response to InitiateMultipartUpload. Every attempt writes directly to the
> final {{outputPath}}. Data is uploaded using Put Part, and as a response an
> ETag for the part is returned and stored. The CompleteMultipartUpload is
> postponed. Instead, we persist the UploadId (using a _temporary subdir or
> elsewhere) and the ETags. When a certain "job" wins,
> {{CompleteMultipartUpload}} is called for each of its files using the proper
> list of Part ETags.
> Completing a MultipartUpload is a metadata-only operation (internally in S3)
> and is thus orders of magnitude faster than the rename-based approach, which
> moves all the data.
> Required work:
> * Expose the multipart initiate and complete calls in S3AOutputStream to
> S3AFilesystem
> * Use these multipart calls in a custom committer as described above. I
> propose to build on the S3ACommitter [~ste...@apache.org] is doing for
> HADOOP-13786



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13786) Add S3Guard committer for zero-rename commits to consistent S3 endpoints

2017-01-11 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15818440#comment-15818440
 ] 

Steve Loughran commented on HADOOP-13786:
-

The multipart algorithm was summarised by Thomas Demoor on HADOOP-13912:

bq. Object stores do not have an efficient rename operation, which is used by
the Hadoop FileOutputCommitter to atomically promote the "winning" attempt out
of the multiple (speculative) attempts to the final path. These slow job
commits are one of the main friction points when using object stores in
Hadoop. There have been quite a few attempts at resolving this: HADOOP-9565,
Apache Spark DirectOutputCommitters, ... but they have proven not to be robust
in the face of adversity (network partitions, ...).
bq. The current ticket proposes to do the atomic commit by using the S3
Multipart API, which allows multiple concurrent uploads on the same object
name, each in its own "temporary space", identified by the UploadId which is
returned as a response to InitiateMultipartUpload. Every attempt writes
directly to the final outputPath. Data is uploaded using Put Part, and as a
response an ETag for the part is returned and stored. The
CompleteMultipartUpload is postponed. Instead, we persist the UploadId (using
a _temporary subdir or elsewhere) and the ETags. When a certain "job" wins,
CompleteMultipartUpload is called for each of its files using the proper list
of Part ETags.
bq. Completing a MultipartUpload is a metadata-only operation (internally in
S3) and is thus orders of magnitude faster than the rename-based approach,
which moves all the data.

> Add S3Guard committer for zero-rename commits to consistent S3 endpoints
> 
>
> Key: HADOOP-13786
> URL: https://issues.apache.org/jira/browse/HADOOP-13786
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs/s3
>Affects Versions: HADOOP-13345
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: HADOOP-13786-HADOOP-13345-001.patch
>
>
> A goal of this code is "support O(1) commits to S3 repositories in the
> presence of failures". Implement it, including whatever is needed to
> demonstrate the correctness of the algorithm (that is, assuming that s3guard
> provides a consistent view of the presence/absence of blobs, show that we can
> commit directly).
> I consider us free to expose the blobstore-ness of the s3 output streams
> (i.e. not visible until the close()), if we need to use that to allow us to
> abort commit operations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13972) Access multiple Azure Data Lake stores with different SPIs

2017-01-11 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15818304#comment-15818304
 ] 

Steve Loughran commented on HADOOP-13972:
-

I propose you follow the pattern of HADOOP-13336, where every property
{{fs.s3a.bucket.$BUCKET.field1.field2=value}} is patched into
{{fs.s3a.field1.field2=}} for the FS named $BUCKET. This allows every FS
property to be overridden, rather than defining per-store keys.

# This structure follows that of the HDFS HA NN property settings.
# It's easy to add; you don't need to implement the complex "read one then
fall back" strategy, just do a patch in FileSystem.init().

I'm about to put the patch into branch-2.8+ with the core code in S3AUtils.
Once it's in, you could pull it up into hadoop-common and make it more
generic; that way we'd have a consistent implementation across filesystems
(see the sketch below).
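A minimal sketch of that patching step (method and key names assumed, not the
code that actually landed; Configuration iteration is over a snapshot, so
setting keys while looping is safe):

{code:java}
import java.util.Map;
import org.apache.hadoop.conf.Configuration;

public class PerBucketConfigSketch {
  // Copy every fs.s3a.bucket.$BUCKET.* property over the generic fs.s3a.* key.
  static Configuration propagateBucketOptions(Configuration conf, String bucket) {
    String prefix = "fs.s3a.bucket." + bucket + ".";
    for (Map.Entry<String, String> entry : conf) {
      String key = entry.getKey();
      if (key.startsWith(prefix)) {
        String generic = "fs.s3a." + key.substring(prefix.length());
        conf.set(generic, entry.getValue());
      }
    }
    return conf;
  }
}
{code}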

> Access multiple Azure Data Lake stores with different SPIs
> --
>
> Key: HADOOP-13972
> URL: https://issues.apache.org/jira/browse/HADOOP-13972
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/adl
>Affects Versions: 3.0.0-alpha2
>Reporter: John Zhuge
>
> Useful when distcp needs to access two Data Lake stores with different SPIs.
> Of course, a workaround is to grant the same SPI access permission to both
> stores, but sometimes that might not be feasible.
> One idea is to embed the store name in the configuration property names,
> e.g., {{dfs.adls.oauth2..client.id}}. Per-store keys will be consulted
> first, falling back to the global keys.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13589) S3Guard: Allow execution of all S3A integration tests with S3Guard enabled.

2017-01-11 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15818289#comment-15818289
 ] 

Steve Loughran commented on HADOOP-13589:
-

I thought I'd done that, but you are right: I only do it for the abstract s3a
ones.

Now, if we went for Java 8+ code I could play games with methods in interfaces
as mixins; for now I'll have to copy and paste.

> S3Guard: Allow execution of all S3A integration tests with S3Guard enabled.
> ---
>
> Key: HADOOP-13589
> URL: https://issues.apache.org/jira/browse/HADOOP-13589
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: HADOOP-13345
>Reporter: Chris Nauroth
>Assignee: Steve Loughran
> Attachments: HADOOP-13589-HADOOP-13345-001.patch, 
> HADOOP-13589-HADOOP-13345-002.patch, HADOOP-13589-HADOOP-13345-002.patch
>
>
> With S3Guard enabled, S3A must continue to be functionally correct.  The best 
> way to verify this is to execute our existing S3A integration tests in a mode 
> with S3Guard enabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13589) S3Guard: Allow execution of all S3A integration tests with S3Guard enabled.

2017-01-11 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-13589:

Status: Open  (was: Patch Available)

> S3Guard: Allow execution of all S3A integration tests with S3Guard enabled.
> ---
>
> Key: HADOOP-13589
> URL: https://issues.apache.org/jira/browse/HADOOP-13589
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: HADOOP-13345
>Reporter: Chris Nauroth
>Assignee: Steve Loughran
> Attachments: HADOOP-13589-HADOOP-13345-001.patch, 
> HADOOP-13589-HADOOP-13345-002.patch, HADOOP-13589-HADOOP-13345-002.patch
>
>
> With S3Guard enabled, S3A must continue to be functionally correct.  The best 
> way to verify this is to execute our existing S3A integration tests in a mode 
> with S3Guard enabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13876) S3Guard: better support for multi-bucket access including read-only

2017-01-11 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15818281#comment-15818281
 ] 

Steve Loughran commented on HADOOP-13876:
-

We can just add it to core-site.xml and have it apply to every test. That's
essentially what I've done in my auth-keys, as well as declaring that it uses
the us-east endpoint.

> S3Guard: better support for multi-bucket access including read-only
> ---
>
> Key: HADOOP-13876
> URL: https://issues.apache.org/jira/browse/HADOOP-13876
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: HADOOP-13345
>Reporter: Aaron Fabbri
>Assignee: Mingliang Liu
> Attachments: HADOOP-13876-HADOOP-13345.000.patch
>
>
> HADOOP-13449 adds support for DynamoDBMetadataStore.
> The code currently supports two options for choosing DynamoDB table names:
> 1. Use the name of each S3 bucket and auto-create a DynamoDB table for each.
> 2. Configure a table name in the {{fs.s3a.s3guard.ddb.table}} parameter.
> One of the issues is with accessing read-only buckets.  If a user accesses a
> read-only bucket with credentials that do not have DynamoDB write
> permissions, they will get errors when trying to access the read-only bucket.
> This manifests as test failures for {{ITestS3AAWSCredentialsProvider}}.
> Goals for this JIRA:
> - Fix {{ITestS3AAWSCredentialsProvider}} in a way that makes sense for the
> real use-case.
> - Allow for a "one DynamoDB table per cluster" configuration with a way to
> choose which credentials are used for DynamoDB.
> - Document limitations etc. in the s3guard.md site doc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13970) garbage data read from the beginning of a tar file

2017-01-11 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15818275#comment-15818275
 ] 

Steve Loughran commented on HADOOP-13970:
-

At a guess, there's no untarring taking place: the CSV reader is treating the 
entire .tar file as a single file, so it fails.

Looks like the solution is to untar before passing in the files: 
http://stackoverflow.com/questions/38635905/reading-in-multiple-files-compressed-in-tar-gz-archive-into-spark
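
A rough sketch of that pre-extraction step in Java using Apache 
commons-compress (the class name and paths are illustrative only, not a tested 
recipe):

{code}
// Sketch: extract the .tar into a local scratch directory first, then point
// spark-csv at the extracted *.csv files instead of at the archive itself.
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveInputStream;

public class UntarFirst {
  public static void untar(File tarFile, Path outDir) throws IOException {
    Files.createDirectories(outDir);
    try (TarArchiveInputStream tin =
             new TarArchiveInputStream(new FileInputStream(tarFile))) {
      TarArchiveEntry entry;
      while ((entry = tin.getNextTarEntry()) != null) {
        if (entry.isDirectory()) {
          continue;
        }
        // Resolve inside outDir; normalize() guards against "../" names.
        Path target = outDir.resolve(entry.getName()).normalize();
        Files.createDirectories(target.getParent());
        Files.copy(tin, target, StandardCopyOption.REPLACE_EXISTING);
      }
    }
  }
}
{code}

With the files extracted, the existing 
{{sqlCtx.read().format("com.databricks.spark.csv")}} call can then be pointed 
at the extracted directory (e.g. a {{*.csv}} glob) instead of at the tar file.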

> garbage data read from the beginning of a tar file
> --
>
> Key: HADOOP-13970
> URL: https://issues.apache.org/jira/browse/HADOOP-13970
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common
>Affects Versions: 2.6.0
> Environment: Linux EL6
>Reporter: Steve Yang
> Attachments: taxi_simplified.tar
>
>
> Hadoop from CDH 5.7.1,
> on Spark, using the databricks spark-csv package 
> ('com.databricks:spark-csv_2.10:1.5.0') to read in a tar file which consists 
> of 3 .csv files.
> sqlCtx.read().format("com.databricks.spark.csv").option(...)
> .load(objectName);
> The tar file contains 3 files:
> taxi_simplified1.csv
> taxi2.csv
> simplified3.csv
> where the first line (header) is:
> trip_distance,dropoff_datetime,dropoff_geocode,passenger_count,medallion,rate_code,tip_amount,total_amount,store_and_fwd_flag,mta_tax,pickup_geocode,trip_time_in_secs,surcharge,vendor_id,tolls_amount,fare_amount,pickup_datetime,hack_license,payment_type,ordertime
> Note the first column header is "trip_distance". But the read data shows:
> taxi_simplified1.csv^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@644^@0010013^@3001121^@0046004^@13002371150^@013521^@
>  
> 0^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@ustar
>   
> ^@optitest^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@trip_distance



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13973) S3 requests failing: java.lang.IllegalStateException: Connection is not open

2017-01-11 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15818246#comment-15818246
 ] 

Steve Loughran commented on HADOOP-13973:
-

The problem arises in S3A reopen(), which attempts to open/reopen the HTTP 
connection. We currently don't do any retries there.

Some hints:
* 
http://stackoverflow.com/questions/29488106/java-lang-illegalstateexception-connection-pool-shut-down-while-using-spring-re
* 
http://stackoverflow.com/questions/25889925/apache-poolinghttpclientconnectionmanager-throwing-illegal-state-exception

The second hints that there's a fix in org.apache.httpcomponents:httpclient 
4.4+, though you need to make an extra method call during setup; I don't know 
if the AWS SDK does this.
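
For reference, a sketch of the HttpClient 4.4+ setup those posts point at; the 
assumption here is that {{setConnectionManagerShared}} is the "extra method 
call" meant, and whether the AWS SDK's client construction does the equivalent 
is exactly the open question:

{code}
// Sketch only: HttpClient 4.4+ setup in which closing one client no longer
// shuts down a shared connection pool. Whether the AWS SDK does the
// equivalent internally is the open question above.
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;

public class SharedPoolSketch {
  static CloseableHttpClient newClient(PoolingHttpClientConnectionManager pool) {
    return HttpClients.custom()
        .setConnectionManager(pool)
        // The "extra method call": mark the manager as shared so that
        // client.close() leaves the pool open for other clients.
        .setConnectionManagerShared(true)
        .build();
  }
}
{code}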


> S3 requests failing: java.lang.IllegalStateException: Connection is not open
> 
>
> Key: HADOOP-13973
> URL: https://issues.apache.org/jira/browse/HADOOP-13973
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.8.0
> Environment: EC2 cluster
>Reporter: Rajesh Balamohan
>Assignee: Steve Loughran
> Fix For: 2.8.0
>
>
> S3 requests failing with an error coming from the HTTP client: 
> "java.lang.IllegalStateException: Connection is not open"
> Some online discussion implies that this is related to shared connection pool 
> shutdown and fixed in HttpClient 4.4+. Hadoop & AWS SDK use v4.5.2, so the 
> fix is in; we just need to make sure the pool is being set up right.
> There's a problem here of course: it may require moving to a later version of 
> the AWS SDK, with the consequences for Jackson, as seen in HADOOP-13050. 
> And that's if there is a patched version out there.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13973) S3 requests failing: java.lang.IllegalStateException: Connection is not open

2017-01-11 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15818235#comment-15818235
 ] 

Steve Loughran commented on HADOOP-13973:
-

Stack 2: GET
{code}
java.lang.IllegalStateException: Connection is not open
at org.apache.http.util.Asserts.check(Asserts.java:34)
at org.apache.http.impl.SocketHttpClientConnection.assertOpen(SocketHttpClientConnection.java:75)
at org.apache.http.impl.AbstractHttpClientConnection.sendRequestHeader(AbstractHttpClientConnection.java:253)
at org.apache.http.impl.conn.DefaultClientConnection.sendRequestHeader(DefaultClientConnection.java:278)
at org.apache.http.impl.conn.ManagedClientConnectionImpl.sendRequestHeader(ManagedClientConnectionImpl.java:223)
at org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:204)
at com.amazonaws.http.protocol.SdkHttpRequestExecutor.doSendRequest(SdkHttpRequestExecutor.java:47)
at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:122)
at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:686)
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:488)
at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:884)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
at com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:728)
at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:489)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:310)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3785)
at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:1191)
at org.apache.hadoop.fs.s3a.S3AInputStream.reopen(S3AInputStream.java:148)
at org.apache.hadoop.fs.s3a.S3AInputStream.lazySeek(S3AInputStream.java:281)
at org.apache.hadoop.fs.s3a.S3AInputStream.read(S3AInputStream.java:364)
at org.apache.hadoop.fs.s3a.S3AInputStream.readFully(S3AInputStream.java:579)
at org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:111)
at org.apache.orc.impl.ReaderImpl.extractFileTail(ReaderImpl.java:501)
at org.apache.orc.impl.ReaderImpl.<init>(ReaderImpl.java:369)
at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.<init>(ReaderImpl.java:62)
at org.apache.hadoop.hive.ql.io.orc.encoded.ReaderImpl.<init>(ReaderImpl.java:32)
at org.apache.hadoop.hive.ql.io.orc.encoded.EncodedOrcFile.createReader(EncodedOrcFile.java:28)
{code}

> S3 requests failing: java.lang.IllegalStateException: Connection is not open
> 
>
> Key: HADOOP-13973
> URL: https://issues.apache.org/jira/browse/HADOOP-13973
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.8.0
> Environment: EC2 cluster
>Reporter: Rajesh Balamohan
>Assignee: Steve Loughran
> Fix For: 2.8.0
>
>
> S3 requests failing with an error coming from the HTTP client: 
> "java.lang.IllegalStateException: Connection is not open"
> Some online discussion implies that this is related to shared connection pool 
> shutdown and fixed in HttpClient 4.4+. Hadoop & AWS SDK use v4.5.2, so the 
> fix is in; we just need to make sure the pool is being set up right.
> There's a problem here of course: it may require moving to a later version of 
> the AWS SDK, with the consequences for Jackson, as seen in HADOOP-13050. 
> And that's if there is a patched version out there.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13973) S3 requests failing: java.lang.IllegalStateException: Connection is not open

2017-01-11 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15818232#comment-15818232
 ] 

Steve Loughran commented on HADOOP-13973:
-

Stack 1: HEAD request
{code}
Caused by: java.lang.IllegalStateException: Connection pool shut down
at org.apache.http.util.Asserts.check(Asserts.java:34)
at org.apache.http.pool.AbstractConnPool.lease(AbstractConnPool.java:184)
at org.apache.http.pool.AbstractConnPool.lease(AbstractConnPool.java:217)
at org.apache.http.impl.conn.PoolingClientConnectionManager.requestConnection(PoolingClientConnectionManager.java:186)
at sun.reflect.GeneratedMethodAccessor32.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.amazonaws.http.conn.ClientConnectionManagerFactory$Handler.invoke(ClientConnectionManagerFactory.java:72)
at com.amazonaws.http.conn.$Proxy51.requestConnection(Unknown Source)
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:416)
at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:884)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
at com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:728)
at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:489)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:310)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3785)
at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1050)
at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1027)
at org.apache.hadoop.fs.s3a.S3AFileSystem.getObjectMetadata(S3AFileSystem.java:830)
at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:1473)
at org.apache.hadoop.fs.s3a.S3AFileSystem.open(S3AFileSystem.java:501)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:782)
{code}

> S3 requests failing: java.lang.IllegalStateException: Connection is not open
> 
>
> Key: HADOOP-13973
> URL: https://issues.apache.org/jira/browse/HADOOP-13973
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.8.0
> Environment: EC2 cluster
>Reporter: Rajesh Balamohan
>Assignee: Steve Loughran
> Fix For: 2.8.0
>
>
> S3 requests failing with an error coming from the HTTP client: 
> "java.lang.IllegalStateException: Connection is not open"
> Some online discussion implies that this is related to shared connection pool 
> shutdown and fixed in HttpClient 4.4+. Hadoop & AWS SDK use v4.5.2, so the 
> fix is in; we just need to make sure the pool is being set up right.
> There's a problem here of course: it may require moving to a later version of 
> the AWS SDK, with the consequences for Jackson, as seen in HADOOP-13050. 
> And that's if there is a patched version out there.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org


