[jira] [Commented] (HADOOP-18761) Revert HADOOP-18535 because mysql-connector-java is GPL

2023-06-09 Thread Owen O'Malley (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17730823#comment-17730823
 ] 

Owen O'Malley commented on HADOOP-18761:


We should not revert the HADOOP-18535 patch. Its use of the MySQL connector is 
allowed.

> Revert HADOOP-18535 because mysql-connector-java is GPL
> ---
>
> Key: HADOOP-18761
> URL: https://issues.apache.org/jira/browse/HADOOP-18761
> Project: Hadoop Common
>  Issue Type: Task
>Reporter: Wei-Chiu Chuang
>Priority: Blocker
>  Labels: pull-request-available
>
> While preparing for 3.3.6 RC, I realized the mysql-connector-java dependency 
> added by HADOOP-18535 is GPL licensed.
> Source: https://github.com/mysql/mysql-connector-j/blob/release/8.0/LICENSE 
> See legal discussion at LEGAL-423.
> I looked at the original jira and github PR and I don't think the license 
> issue was noticed. 
> Is it possible to get rid of the mysql connector dependency? As far as I can 
> tell the dependency is very limited.
> If not, I guess I'll have to revert the commits for now.






[jira] [Commented] (HADOOP-18535) Implement token storage solution based on MySQL

2023-06-09 Thread Owen O'Malley (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17730822#comment-17730822
 ] 

Owen O'Malley commented on HADOOP-18535:


This patch adds a "provided" dependence of the mysql connector, which is gpl 
licensed. This is allowed because this is an optional component of Hadoop and 
the user will need to install the mysql connector.

https://www.apache.org/legal/resolved.html#optional
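
In case the mechanics are unfamiliar: a "provided" scope dependency is compiled 
against but never bundled into the Hadoop artifacts, so the GPL jar is never 
redistributed. A minimal sketch of what the pom declaration looks like (the 
version shown is illustrative, not necessarily what the patch uses):

{code:xml}
<!-- Sketch: compiled against, but excluded from the distribution.
     The user supplies the jar at runtime. Version is illustrative. -->
<dependency>
  <groupId>mysql</groupId>
  <artifactId>mysql-connector-java</artifactId>
  <version>8.0.29</version>
  <scope>provided</scope>
</dependency>
{code}

Because the jar has to be installed by the user, the optional-dependency 
carve-out in the ASF policy linked above applies.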

> Implement token storage solution based on MySQL
> ---
>
> Key: HADOOP-18535
> URL: https://issues.apache.org/jira/browse/HADOOP-18535
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Hector Sandoval Chaverri
>Assignee: Hector Sandoval Chaverri
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.6
>
>
> Hadoop RBF supports custom implementations of secret managers. At the moment, 
> the only available implementation is ZKDelegationTokenSecretManagerImpl, 
> which stores tokens and delegation keys in Zookeeper.
> During our investigation, we found that the performance of routers is limited 
> by the writes to the Zookeeper token store, which impacts requests for token 
> creation, renewal and cancellation. An alternative secret manager 
> implementation has been created, based on MySQL, to handle a higher number of 
> writes.
> We measured the throughput of each token operation (create/renew/cancel) on 
> different setups and obtained the following results:
>  # Sending requests directly to Namenode (no RBF):
> Token creations: 290 reqs per sec
> Token renewals: 86 reqs per sec
> Token cancellations: 97 reqs per sec
>  # Sending requests to routers using Zookeeper based secret manager:
> Token creations: 31 reqs per sec
> Token renewals: 29 reqs per sec
> Token cancellations: 40 reqs per sec
>  # Sending requests to routers using SQL based secret manager:
> Token creations: 241 reqs per sec
> Token renewals: 103 reqs per sec
> Token cancellations: 114 reqs per sec
> We noticed a significant improvement when using a SQL secret manager, 
> comparable to the throughput offered by Namenodes.
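
For readers wanting to try the SQL-based secret manager, it is plugged in 
through the router's secret manager configuration. A hedged sketch follows; the 
property name and implementation class are written from memory and should be 
verified against the 3.3.6/3.4.0 documentation:

{code:xml}
<!-- Sketch only: verify the key and class name against the release docs. -->
<property>
  <name>dfs.federation.router.secret.manager.class</name>
  <value>org.apache.hadoop.hdfs.server.federation.router.security.token.SQLDelegationTokenSecretManagerImpl</value>
</property>
{code}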






[jira] [Resolved] (HADOOP-18535) Implement token storage solution based on MySQL

2023-02-22 Thread Owen O'Malley (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley resolved HADOOP-18535.

Fix Version/s: 3.3.6
   3.4.0
   Resolution: Fixed

Thanks, Hector!

> Implement token storage solution based on MySQL
> ---
>
> Key: HADOOP-18535
> URL: https://issues.apache.org/jira/browse/HADOOP-18535
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Hector Sandoval Chaverri
>Assignee: Hector Sandoval Chaverri
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.6, 3.4.0
>
>
> Hadoop RBF supports custom implementations of secret managers. At the moment, 
> the only available implementation is ZKDelegationTokenSecretManagerImpl, 
> which stores tokens and delegation keys in Zookeeper.
> During our investigation, we found that the performance of routers is limited 
> by the writes to the Zookeeper token store, which impacts requests for token 
> creation, renewal and cancellation. An alternative secret manager 
> implementation has been created, based on MySQL, to handle a higher number of 
> writes.
> We measured the throughput of each token operation (create/renew/cancel) on 
> different setups and obtained the following results:
>  # Sending requests directly to Namenode (no RBF):
> Token creations: 290 reqs per sec
> Token renewals: 86 reqs per sec
> Token cancellations: 97 reqs per sec
>  # Sending requests to routers using Zookeeper based secret manager:
> Token creations: 31 reqs per sec
> Token renewals: 29 reqs per sec
> Token cancellations: 40 reqs per sec
>  # Sending requests to routers using SQL based secret manager:
> Token creations: 241 reqs per sec
> Token renewals: 103 reqs per sec
> Token cancellations: 114 reqs per sec
> We noticed a significant improvement when using a SQL secret manager, 
> comparable to the throughput offered by Namenodes.






[jira] [Resolved] (HADOOP-18324) Interrupting RPC Client calls can lead to thread exhaustion

2022-11-18 Thread Owen O'Malley (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley resolved HADOOP-18324.

Fix Version/s: 3.4.0
   3.3.5
   2.10.3
   Resolution: Fixed

> Interrupting RPC Client calls can lead to thread exhaustion
> ---
>
> Key: HADOOP-18324
> URL: https://issues.apache.org/jira/browse/HADOOP-18324
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: ipc
>Affects Versions: 3.4.0, 2.10.2, 3.3.3
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.5, 2.10.3
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Currently the IPC client creates a boundless number of threads to write the 
> rpc request to the socket. The NameNode uses timeouts on its RPC calls to the 
> Journal Node and a stuck JN will cause the NN to create an infinite set of 
> threads.
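
To sketch the failure mode and one way to bound it (illustrative Java only, not 
Hadoop's ipc.Client or the committed fix): if every in-flight call hands its 
request to a fresh writer thread and callers time out while the socket is 
stuck, the abandoned writers pile up; a single-threaded sender per connection 
keeps the thread count fixed while still letting callers time out.

{code:java}
import java.util.concurrent.*;

// Illustrative sketch: one writer thread per connection instead of one per
// call. A timed-out or interrupted caller abandons only a queued task,
// never a thread.
class BoundedSenderSketch {
  private final ExecutorService sender = Executors.newSingleThreadExecutor();

  void sendRpcRequest(Runnable writeRequest, long timeoutMs)
      throws InterruptedException, ExecutionException, TimeoutException {
    Future<?> f = sender.submit(writeRequest); // queued behind earlier writes
    f.get(timeoutMs, TimeUnit.MILLISECONDS);   // caller can give up safely
  }

  void close() {
    sender.shutdownNow(); // exactly one thread to reap per connection
  }
}
{code}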






[jira] [Resolved] (HADOOP-18444) Add Support for localized trash for ViewFileSystem in Trash.moveToAppropriateTrash

2022-09-23 Thread Owen O'Malley (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley resolved HADOOP-18444.

Fix Version/s: 3.4.0
   3.3.9
   Resolution: Fixed

> Add Support for localized trash for ViewFileSystem in 
> Trash.moveToAppropriateTrash
> --
>
> Key: HADOOP-18444
> URL: https://issues.apache.org/jira/browse/HADOOP-18444
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Xing Lin
>Assignee: Xing Lin
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.9
>
>
> Trash.moveToAppropriateTrash is used by the _hadoop_ CLI's _-rm_ and by Hive to 
> move files to trash. However, its current implementation does not support the 
> localized trash policy we added to ViewFileSystem in HADOOP-18144.
> The reason is that moveToAppropriateTrash first resolves the path and then 
> uses the resolved filesystem (resolvedFs) to initialize the trash. As a 
> result, it uses the getTrashRoot() implementation from the targetFs, not 
> ViewFileSystem, so the new localized trash policy we implemented in 
> ViewFileSystem is never invoked.
> With the new localized trash policy for ViewFileSystem, the trash root is 
> local to a mount point. Thus, for a ViewFileSystem with this flag turned on, 
> there is no need to resolve the path in moveToAppropriateTrash: rename in 
> ViewFileSystem can resolve the logical paths correctly and move a 
> file to trash within a mount point. 
> Code section of the current moveToAppropriateTrash implementation:
> {code:java}
> public static boolean moveToAppropriateTrash(FileSystem fs, Path p,
> Configuration conf) throws IOException {
>   Path fullyResolvedPath = fs.resolvePath(p);
>   FileSystem fullyResolvedFs =
>   FileSystem.get(fullyResolvedPath.toUri(), conf);
>   ...
>   Trash trash = new Trash(fullyResolvedFs, conf);
>   return trash.moveToTrash(fullyResolvedPath);
> }{code}
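
A sketch of the direction described above (illustrative, not the committed 
patch): skip the resolution step when the source filesystem is a ViewFileSystem 
with per-mount trash enabled. The configuration key below is the one added in 
HADOOP-18144, written from memory and worth verifying:

{code:java}
// Illustrative sketch, not the committed HADOOP-18444 change. The config
// key is assumed from HADOOP-18144 and should be verified.
public static boolean moveToAppropriateTrash(FileSystem fs, Path p,
    Configuration conf) throws IOException {
  if (fs instanceof ViewFileSystem
      && conf.getBoolean("fs.viewfs.trash.force-inside-mount-point", false)) {
    // No resolution: ViewFileSystem's rename keeps the file and its trash
    // root inside the same mount point.
    return new Trash(fs, conf).moveToTrash(p);
  }
  Path fullyResolvedPath = fs.resolvePath(p);
  FileSystem fullyResolvedFs =
      FileSystem.get(fullyResolvedPath.toUri(), conf);
  Trash trash = new Trash(fullyResolvedFs, conf);
  return trash.moveToTrash(fullyResolvedPath);
}
{code}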






[jira] [Created] (HADOOP-18434) Proxy users do not share RPC connections

2022-08-31 Thread Owen O'Malley (Jira)
Owen O'Malley created HADOOP-18434:
--

 Summary: Proxy users do not share RPC connections
 Key: HADOOP-18434
 URL: https://issues.apache.org/jira/browse/HADOOP-18434
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Owen O'Malley
Assignee: Owen O'Malley


When the Hive MetaStore uses Storage-Based Authorization, it needs to perform 
checks against the NameNode as the query's user. Unfortunately, RPC's 
ConnectionId uses the UGI's equals & hashCode, which check for the Subject's 
object identity.

Thus, we've seen the HMS spawn thousands of threads before they go idle and are 
eventually closed. If the peak goes over 10k threads, the HMS becomes unstable.
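
The equality problem is easy to demonstrate (a sketch; UGI's equals genuinely 
compares Subject identity, so two proxy UGIs for the same user never match):

{code:java}
import org.apache.hadoop.security.UserGroupInformation;

// Sketch: two proxy-user UGIs for the same remote user are distinct objects,
// so a cache keyed on UGI equality (like RPC's ConnectionId) cannot share a
// connection between them.
public class ProxyUgiSketch {
  public static void main(String[] args) throws Exception {
    UserGroupInformation real = UserGroupInformation.getLoginUser();
    UserGroupInformation p1 =
        UserGroupInformation.createProxyUser("querier", real);
    UserGroupInformation p2 =
        UserGroupInformation.createProxyUser("querier", real);
    System.out.println(p1.equals(p2)); // false: Subject identity differs
  }
}
{code}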






[jira] [Updated] (HADOOP-13144) Enhancing IPC client throughput via multiple connections per user

2022-08-24 Thread Owen O'Malley (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-13144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HADOOP-13144:
---
Fix Version/s: 3.3.9

> Enhancing IPC client throughput via multiple connections per user
> -
>
> Key: HADOOP-13144
> URL: https://issues.apache.org/jira/browse/HADOOP-13144
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: ipc
>Reporter: Jason Kace
>Assignee: Íñigo Goiri
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.9
>
> Attachments: HADOOP-13144-performance.patch, HADOOP-13144.000.patch, 
> HADOOP-13144.001.patch, HADOOP-13144.002.patch, HADOOP-13144.003.patch, 
> HADOOP-13144_overload_enhancement.patch
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The generic IPC client ({{org.apache.hadoop.ipc.Client}}) utilizes a single 
> connection thread for each {{ConnectionId}}.  The {{ConnectionId}} is unique 
> to the connection's remote address, ticket and protocol.  Each ConnectionId 
> is 1:1 mapped to a connection thread by the client via a map cache.
> The result is to serialize all IPC read/write activity through a single 
> thread for each user/ticket + address.  If a single user makes repeated 
> calls (1k-100k/sec) to the same destination, the IPC client becomes a 
> bottleneck.
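
One common way around that serialization, sketched below (illustrative, not the 
committed implementation), is to fold an index into the connection key so a 
single user/address pair fans out over N connections:

{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

// Illustrative sketch: stripe one logical ConnectionId over several real
// connections by appending a round-robin index to the cache key.
class StripedConnectionsSketch<C> {
  private final int maxConnections; // e.g. 4 sockets per user+address
  private final AtomicLong counter = new AtomicLong();
  private final ConcurrentHashMap<String, C> pool = new ConcurrentHashMap<>();

  StripedConnectionsSketch(int maxConnections) {
    this.maxConnections = maxConnections;
  }

  C getConnection(String userAndAddress, Supplier<C> factory) {
    long index = counter.getAndIncrement() % maxConnections;
    return pool.computeIfAbsent(userAndAddress + "#" + index,
        k -> factory.get());
  }
}
{code}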






[jira] [Resolved] (HADOOP-18406) Adds alignment context to call path for creating RPC proxy with multiple connections per user.

2022-08-24 Thread Owen O'Malley (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley resolved HADOOP-18406.

Fix Version/s: 3.4.0
   3.3.9
   Resolution: Fixed

> Adds alignment context to call path for creating RPC proxy with multiple 
> connections per user.
> --
>
> Key: HADOOP-18406
> URL: https://issues.apache.org/jira/browse/HADOOP-18406
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: ipc
>Reporter: Simbarashe Dzinamarira
>Assignee: Simbarashe Dzinamarira
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.9
>
>
> HDFS-13274 (RBF: Extend RouterRpcClient to use multiple sockets) gets the RPC 
> proxy using methods which do not allow using an alignment context. These 
> methods were added in HADOOP-13144 (Enhancing IPC client throughput via 
> multiple connections per user).
> This change adds an alignment context as an argument for methods in the call 
> path for creating the proxy.






[jira] [Resolved] (HADOOP-18345) Enhance client protocol to propagate last seen state IDs for multiple nameservices.

2022-08-23 Thread Owen O'Malley (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley resolved HADOOP-18345.

Fix Version/s: 3.4.0
   3.3.9
   Resolution: Fixed

> Enhance client protocol to propagate last seen state IDs for multiple 
> nameservices.
> ---
>
> Key: HADOOP-18345
> URL: https://issues.apache.org/jira/browse/HADOOP-18345
> Project: Hadoop Common
>  Issue Type: New Feature
>Reporter: Simbarashe Dzinamarira
>Assignee: Simbarashe Dzinamarira
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.9
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> The RPCHeader in the client protocol currently contains a single value to 
> indicate the last seen state ID for a namenode.
> {noformat}
> optional int64 stateId = 8; // The last seen Global State ID
> {noformat}
> When there are multiple namenodes, such as in router based federation, the 
> headers need to carry the state IDs for each of these nameservices that are 
> part of the federation.
> This change is a prerequisite for HDFS-13522 (RBF: Support observer node from 
> Router-Based Federation).
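
For concreteness, the extension might look like the sketch below; the message 
shape, field name and tag number are illustrative only, not the committed 
protobuf:

{noformat}
// Illustrative sketch -- not the committed header change.
message NameserviceStateIdProto {
  optional string nsId = 1;   // e.g. "ns0"
  optional int64 stateId = 2; // last seen state ID for that nameservice
}

// in RpcRequestHeaderProto:
optional int64 stateId = 8;                               // existing field
repeated NameserviceStateIdProto nameserviceStateIds = 9; // hypothetical
{noformat}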






[jira] [Comment Edited] (HADOOP-18324) Interrupting RPC Client calls can lead to thread exhaustion

2022-06-30 Thread Owen O'Malley (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17561206#comment-17561206
 ] 

Owen O'Malley edited comment on HADOOP-18324 at 6/30/22 11:00 PM:
--

We had a NameNode taken out with 10k threads of which 9500 were "IPC Parameter 
Sending Thread ".


was (Author: owen.omalley):
We had a NameNode taken out with 10k threads of which 9500 where "IPC Parameter 
Sending Thread ".

> Interrupting RPC Client calls can lead to thread exhaustion
> ---
>
> Key: HADOOP-18324
> URL: https://issues.apache.org/jira/browse/HADOOP-18324
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: ipc
>Affects Versions: 3.4.0, 2.10.2, 3.3.3
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>Priority: Critical
>
> Currently the IPC client creates a boundless number of threads to write the 
> rpc request to the socket. The NameNode uses timeouts on its RPC calls to the 
> Journal Node and a stuck JN will cause the NN to create an infinite set of 
> threads.






[jira] [Commented] (HADOOP-18324) Interrupting RPC Client calls can lead to thread exhaustion

2022-06-30 Thread Owen O'Malley (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17561206#comment-17561206
 ] 

Owen O'Malley commented on HADOOP-18324:


We had a NameNode taken out with 10k threads of which 9500 were "IPC Parameter 
Sending Thread ".

> Interrupting RPC Client calls can lead to thread exhaustion
> ---
>
> Key: HADOOP-18324
> URL: https://issues.apache.org/jira/browse/HADOOP-18324
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: ipc
>Affects Versions: 3.4.0, 2.10.2, 3.3.3
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>Priority: Critical
>
> Currently the IPC client creates a boundless number of threads to write the 
> rpc request to the socket. The NameNode uses timeouts on its RPC calls to the 
> Journal Node and a stuck JN will cause the NN to create an infinite set of 
> threads.






[jira] [Updated] (HADOOP-18324) Interrupting RPC Client calls can lead to thread exhaustion

2022-06-30 Thread Owen O'Malley (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HADOOP-18324:
---
Affects Version/s: 3.3.3
   2.10.2
   3.4.0

> Interrupting RPC Client calls can lead to thread exhaustion
> ---
>
> Key: HADOOP-18324
> URL: https://issues.apache.org/jira/browse/HADOOP-18324
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: ipc
>Affects Versions: 3.4.0, 2.10.2, 3.3.3
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>Priority: Critical
>
> Currently the IPC client creates a boundless number of threads to write the 
> rpc request to the socket. The NameNode uses timeouts on its RPC calls to the 
> Journal Node and a stuck JN will cause the NN to create an infinite set of 
> threads.






[jira] [Created] (HADOOP-18324) Interrupting RPC Client calls can lead to thread exhaustion

2022-06-30 Thread Owen O'Malley (Jira)
Owen O'Malley created HADOOP-18324:
--

 Summary: Interrupting RPC Client calls can lead to thread 
exhaustion
 Key: HADOOP-18324
 URL: https://issues.apache.org/jira/browse/HADOOP-18324
 Project: Hadoop Common
  Issue Type: Bug
  Components: ipc
Reporter: Owen O'Malley
Assignee: Owen O'Malley


Currently the IPC client creates a boundless number of threads to write the rpc 
request to the socket. The NameNode uses timeouts on its RPC calls to the 
Journal Node and a stuck JN will cause the NN to create an infinite set of 
threads.






[jira] [Resolved] (HADOOP-18193) Support nested mount points in INodeTree

2022-05-11 Thread Owen O'Malley (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley resolved HADOOP-18193.

Fix Version/s: 3.4.0
   Resolution: Fixed

I just committed this. Thanks, Lei!

> Support nested mount points in INodeTree
> 
>
> Key: HADOOP-18193
> URL: https://issues.apache.org/jira/browse/HADOOP-18193
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: viewfs
>Affects Versions: 2.10.0
>Reporter: Lei Yang
>Assignee: Lei Yang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: Nested Mount Point in ViewFs.pdf
>
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>
> Defining the following client mount table config is not supported in INodeTree 
> and will throw FileAlreadyExistsException:
>  
> {code:java}
> fs.viewfs.mounttable.link./foo/bar=hdfs://nn1/foo/bar
> fs.viewfs.mounttable.link./foo=hdfs://nn02/foo
> {code}
> INodeTree has 2 methods that need changes to support nested mount points.
> {code:java}
> createLink(): build INodeTree during fs init.
> resolve(): resolve path in INodeTree with viewfs apis.
> {code}
> ViewFileSystem and ViewFs each maintain an INodeTree instance (fsState) and 
> call fsState.resolve(..) to resolve a path to a specific mount point. 
> INodeTree.resolve encapsulates the logic of nested mount point resolution, so 
> no changes are expected in either class. 
> AC:
>  # INodeTree.createLink should support creating nested mount 
> points. (INodeTree is constructed during fs init.)
>  # INodeTree.resolve should support resolving paths based on nested mount 
> points. (INodeTree.resolve is used in viewfs APIs.)
>  # No regression in existing ViewFileSystem and ViewFs APIs.
>  # Ensure some important APIs are not broken with nested mount points 
> (rename, getContentSummary, listStatus...).
>  
> Spec:
> Please review the attached pdf for the spec of this feature.
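
To make the resolution rule concrete, a small longest-prefix sketch 
(illustrative Java, not the INodeTree code) using the two mount points from the 
example above:

{code:java}
import java.util.HashMap;
import java.util.Map;

// Sketch of nested mount point resolution: a path resolves to the deepest
// (longest-prefix) matching link.
public class MountResolutionSketch {
  static String resolveMount(Map<String, String> mounts, String path) {
    String m = path;
    while (!m.isEmpty()) {
      String target = mounts.get(m);
      if (target != null) {
        return target; // deepest mount wins
      }
      m = m.substring(0, Math.max(m.lastIndexOf('/'), 0)); // drop last component
    }
    return null; // no mount point matched (fall back to the default fs)
  }

  public static void main(String[] args) {
    Map<String, String> mounts = new HashMap<>();
    mounts.put("/foo/bar", "hdfs://nn1/foo/bar");
    mounts.put("/foo", "hdfs://nn02/foo");
    System.out.println(resolveMount(mounts, "/foo/bar/x")); // hdfs://nn1/foo/bar
    System.out.println(resolveMount(mounts, "/foo/baz"));   // hdfs://nn02/foo
  }
}
{code}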






[jira] [Updated] (HADOOP-18222) Prevent DelegationTokenSecretManagerMetrics from registering multiple times

2022-05-10 Thread Owen O'Malley (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HADOOP-18222:
---
Fix Version/s: 3.4.0
   3.3.4
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

I just committed this. Thanks, Hector!

> Prevent DelegationTokenSecretManagerMetrics from registering multiple times 
> 
>
> Key: HADOOP-18222
> URL: https://issues.apache.org/jira/browse/HADOOP-18222
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Hector Sandoval Chaverri
>Assignee: Hector Sandoval Chaverri
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.4
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> After committing HADOOP-18167, we received reports of the following error 
> when ResourceManager is initialized:
> {noformat}
> Caused by: java.io.IOException: Problem starting http server
> at org.apache.hadoop.http.HttpServer2.start(HttpServer2.java:1389)
> at 
> org.apache.hadoop.yarn.webapp.WebApps$Builder.start(WebApps.java:475)
> ... 4 more
> Caused by: org.apache.hadoop.metrics2.MetricsException: Metrics source 
> DelegationTokenSecretManagerMetrics already exists!
> at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:152)
> at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:125)
> at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:229)
> at 
> org.apache.hadoop.metrics2.MetricsSystem.register(MetricsSystem.java:71)
> at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager$DelegationTokenSecretManagerMetrics.create(AbstractDelegationTokenSecretManager.java:878)
> at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.(AbstractDelegationTokenSecretManager.java:152)
> at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenManager$DelegationTokenSecretManager.(DelegationTokenManager.java:72)
> at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenManager.(DelegationTokenManager.java:122)
> at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationHandler.initTokenManager(DelegationTokenAuthenticationHandler.java:161)
> at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationHandler.init(DelegationTokenAuthenticationHandler.java:130)
> at 
> org.apache.hadoop.security.authentication.server.AuthenticationFilter.initializeAuthHandler(AuthenticationFilter.java:194)
> at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter.initializeAuthHandler(DelegationTokenAuthenticationFilter.java:214)
> at 
> org.apache.hadoop.security.authentication.server.AuthenticationFilter.init(AuthenticationFilter.java:180)
> at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter.init(DelegationTokenAuthenticationFilter.java:180)
> at 
> org.apache.hadoop.yarn.server.security.http.RMAuthenticationFilter.init(RMAuthenticationFilter.java:53){noformat}
> This can happen if MetricsSystemImpl#init is called and multiple metrics are 
> registered with the same name. A proposed solution is to declare the metrics 
> in AbstractDelegationTokenSecretManager as a singleton, which would prevent 
> multiple instances of DelegationTokenSecretManagerMetrics from being registered.






[jira] [Resolved] (HADOOP-18169) getDelegationTokens in ViewFs should also fetch the token from the fallback FS

2022-03-31 Thread Owen O'Malley (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley resolved HADOOP-18169.

Fix Version/s: 3.4.0
   3.3.3
   Resolution: Fixed

> getDelegationTokens in ViewFs should also fetch the token from the fallback FS
> --
>
> Key: HADOOP-18169
> URL: https://issues.apache.org/jira/browse/HADOOP-18169
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Xing Lin
>Assignee: Xing Lin
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.3
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> getDelegationTokens in ViewFs does not include the delegationToken from the 
> fallback FS, though it should. 






[jira] [Assigned] (HADOOP-18169) getDelegationTokens in ViewFs should also fetch the token from the fallback FS

2022-03-31 Thread Owen O'Malley (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley reassigned HADOOP-18169:
--

Assignee: Xing Lin

> getDelegationTokens in ViewFs should also fetch the token from the fallback FS
> --
>
> Key: HADOOP-18169
> URL: https://issues.apache.org/jira/browse/HADOOP-18169
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Xing Lin
>Assignee: Xing Lin
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> getDelegationTokens in ViewFs does not include the delegationToken from the 
> fallback FS, though it should. 






[jira] [Updated] (HADOOP-16254) Add proxy address in IPC connection

2022-03-21 Thread Owen O'Malley (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HADOOP-16254:
---
Resolution: Duplicate
Status: Resolved  (was: Patch Available)

This has been fixed by using the CallerContext.

> Add proxy address in IPC connection
> ---
>
> Key: HADOOP-16254
> URL: https://issues.apache.org/jira/browse/HADOOP-16254
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: ipc
>Reporter: Xiaoqiao He
>Assignee: Xiaoqiao He
>Priority: Blocker
> Attachments: HADOOP-16254.001.patch, HADOOP-16254.002.patch, 
> HADOOP-16254.004.patch, HADOOP-16254.005.patch, HADOOP-16254.006.patch, 
> HADOOP-16254.007.patch
>
>
> In order to support data locality in RBF, we need to add a new field for the 
> client hostname in the RPC headers of Router protocol calls.
>  clientHostname represents the hostname of the client and is forwarded by the 
> Router to the Namenode so that data locality still works. See the [RBF Data Locality 
> Design|https://issues.apache.org/jira/secure/attachment/12965092/RBF%20Data%20Locality%20Design.pdf]
>  in HDFS-13248 and the [maillist 
> vote|http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201904.mbox/%3CCAF3Ajax7hGxvowg4K_HVTZeDqC5H=3bfb7mv5sz5mgvadhv...@mail.gmail.com%3E].






[jira] [Resolved] (HADOOP-18129) Change URI[] in INodeLink to String[] to reduce memory footprint of ViewFileSystem

2022-03-17 Thread Owen O'Malley (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley resolved HADOOP-18129.

Fix Version/s: 3.4.0
   3.3.3
   Resolution: Fixed

I committed this. Thanks, Abhishek!

> Change URI[] in INodeLink to String[] to reduce memory footprint of 
> ViewFileSystem
> --
>
> Key: HADOOP-18129
> URL: https://issues.apache.org/jira/browse/HADOOP-18129
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Abhishek Das
>Assignee: Abhishek Das
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.3
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> There are around 40k instances of INodeLink, each taking between 1160 and 
> 1680 bytes of memory. Multiplying 40k by 1160 bytes gives 
> approximately 45 MB.
> After changing URI to String in INodeLink, the memory consumed by each 
> INodeLink object has dropped from ~1160 bytes to ~320 bytes, so the overall 
> size becomes about 12 MB (40k x 320 bytes).






[jira] [Resolved] (HADOOP-18127) Backport HADOOP-13055 into branch-2.10

2022-03-15 Thread Owen O'Malley (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley resolved HADOOP-18127.

Resolution: Fixed

I just committed these backports.

> Backport HADOOP-13055 into branch-2.10
> --
>
> Key: HADOOP-18127
> URL: https://issues.apache.org/jira/browse/HADOOP-18127
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: viewfs
>Affects Versions: 2.10.0
>Reporter: Konstantin Shvachko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> HADOOP-13055 introduced linkMergeSlash and linkFallback for ViewFileSystem. 
> It would be good to backport it to branch-2.10.






[jira] [Assigned] (HADOOP-18144) getTrashRoot/s in ViewFileSystem should return viewFS path, not targetFS path

2022-03-14 Thread Owen O'Malley (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley reassigned HADOOP-18144:
--

Assignee: Xing Lin

> getTrashRoot/s in ViewFileSystem should return viewFS path, not targetFS path
> -
>
> Key: HADOOP-18144
> URL: https://issues.apache.org/jira/browse/HADOOP-18144
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: common
>Reporter: Xing Lin
>Assignee: Xing Lin
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> It is probably incorrect that we return a targetFS path from getTrashRoot() 
> in ViewFileSystem, as that path will be used later on by ViewFileSystem in 
> other operations, such as rename. ViewFileSystem assumes the path that it 
> receives is a viewFS path, not a targetFS path. For example, rename() in 
> ViewFileSystem will call getUriPath() on the src/dst paths, which removes the 
> scheme/authority and then tries to resolve the path-only component. This 
> sometimes leads to incorrect path resolution, as we are doing path 
> resolution again on a targetFS path. 
>  
> On the other hand, it is not always trivial/feasible to determine the correct 
> viewFS path for a given trash root in targetFS path. 
> Example:
> Assume we have a mount point for /user/foo -> abfs:/containerA
> User foo calls getTrashRoot("/a/b/c") and "/a/b/c" does not match any mount 
> point. We fall back to the fallback hdfs, which by default returns 
> hdfs://localhost/user/foo/.Trash. In this case, it is incorrect to return the 
> trash root as viewfs:/user/foo, as it will be resolved to the abfs mount 
> point, instead of the fallback hdfs.
>   






[jira] [Resolved] (HADOOP-18144) getTrashRoot/s in ViewFileSystem should return viewFS path, not targetFS path

2022-03-14 Thread Owen O'Malley (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley resolved HADOOP-18144.

Fix Version/s: 3.4.0
   Resolution: Fixed

> getTrashRoot/s in ViewFileSystem should return viewFS path, not targetFS path
> -
>
> Key: HADOOP-18144
> URL: https://issues.apache.org/jira/browse/HADOOP-18144
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: common
>Reporter: Xing Lin
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> It is probably incorrect that we return a targetFS path from getTrashRoot() 
> in ViewFileSystem, as that path will be used later on by ViewFileSystem in 
> other operations, such as rename. ViewFileSystem assumes the path that it 
> receives is a viewFS path, not a targetFS path. For example, rename() in 
> ViewFileSystem will call getUriPath() on the src/dst paths, which removes the 
> scheme/authority and then tries to resolve the path-only component. This 
> sometimes leads to incorrect path resolution, as we are doing path 
> resolution again on a targetFS path. 
>  
> On the other hand, it is not always trivial/feasible to determine the correct 
> viewFS path for a given trash root in targetFS path. 
> Example:
> Assume we have a mount point for /user/foo -> abfs:/containerA
> User foo calls getTrashRoot("/a/b/c") and "/a/b/c" does not match any mount 
> point. We fall back to the fallback hdfs, which by default returns 
> hdfs://localhost/user/foo/.Trash. In this case, it is incorrect to return the 
> trash root as viewfs:/user/foo, as it will be resolved to the abfs mount 
> point, instead of the fallback hdfs.
>   






[jira] [Created] (HADOOP-18153) The CallerContext should not use ":" as the separator.

2022-03-04 Thread Owen O'Malley (Jira)
Owen O'Malley created HADOOP-18153:
--

 Summary: The CallerContext should not use ":" as the separator.
 Key: HADOOP-18153
 URL: https://issues.apache.org/jira/browse/HADOOP-18153
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Owen O'Malley
Assignee: Owen O'Malley


Since the goal of having fields in the CallerContext is to support adding IP 
addresses, we need to pick a separator that is compatible with both IPv4 and 
IPv6. ":" fails that test, since every IPv6 address uses it extensively.






[jira] [Commented] (HADOOP-16254) Add proxy address in IPC connection

2022-02-24 Thread Owen O'Malley (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-16254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17497809#comment-17497809
 ] 

Owen O'Malley commented on HADOOP-16254:


[~ayushtkn], yeah I hadn't found that Jira, so thank you.

Of course, using the caller context will work; the only major downside is 
that if the user sets a caller context that is close to the limit, it could 
cause bytes to get dropped. We might want to pick a shorter lead string (4 
bytes?) to minimize that chance. (Or bump up the default limit by 50 bytes?)
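
To put rough numbers on that concern (the limit key and default are quoted from 
memory and should be treated as assumptions):

{noformat}
hadoop.caller.context.max.size   = 128 bytes (default, assumed)
"clientIp:" lead string          =   9 bytes
longest textual IPv6 address     =  45 bytes
                                   ---------
worst-case overhead              =  54 bytes (~42% of the budget)
{noformat}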

> Add proxy address in IPC connection
> ---
>
> Key: HADOOP-16254
> URL: https://issues.apache.org/jira/browse/HADOOP-16254
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: ipc
>Reporter: Xiaoqiao He
>Assignee: Xiaoqiao He
>Priority: Blocker
> Attachments: HADOOP-16254.001.patch, HADOOP-16254.002.patch, 
> HADOOP-16254.004.patch, HADOOP-16254.005.patch, HADOOP-16254.006.patch, 
> HADOOP-16254.007.patch
>
>
> In order to support data locality in RBF, we need to add a new field for the 
> client hostname in the RPC headers of Router protocol calls.
>  clientHostname represents the hostname of the client and is forwarded by the 
> Router to the Namenode so that data locality still works. See the [RBF Data Locality 
> Design|https://issues.apache.org/jira/secure/attachment/12965092/RBF%20Data%20Locality%20Design.pdf]
>  in HDFS-13248 and the [maillist 
> vote|http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201904.mbox/%3CCAF3Ajax7hGxvowg4K_HVTZeDqC5H=3bfb7mv5sz5mgvadhv...@mail.gmail.com%3E].






[jira] [Resolved] (HADOOP-18139) Allow configuration of zookeeper server principal

2022-02-24 Thread Owen O'Malley (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley resolved HADOOP-18139.

Fix Version/s: 3.4.0
   3.3.3
   Resolution: Fixed

Thanks for the review, Íñigo!

> Allow configuration of zookeeper server principal
> -
>
> Key: HADOOP-18139
> URL: https://issues.apache.org/jira/browse/HADOOP-18139
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: auth
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.3
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Allow configuration of zookeeper server principal.
> This would allow the Router to specify the principal.






[jira] [Created] (HADOOP-18139) RBF: Allow configuration of zookeeper server principal in router

2022-02-23 Thread Owen O'Malley (Jira)
Owen O'Malley created HADOOP-18139:
--

 Summary: RBF: Allow configuration of zookeeper server principal in 
router
 Key: HADOOP-18139
 URL: https://issues.apache.org/jira/browse/HADOOP-18139
 Project: Hadoop Common
  Issue Type: Improvement
  Components: auth
Reporter: Owen O'Malley
Assignee: Owen O'Malley









[jira] [Commented] (HADOOP-16254) Add proxy address in IPC connection

2022-02-17 Thread Owen O'Malley (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-16254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17494310#comment-17494310
 ] 

Owen O'Malley commented on HADOOP-16254:


This feature is a blocker for us at LinkedIn. We can (and do) maintain an internal 
fork, but we need a way for Hadoop RBF to get the real client IP address. This 
is critical for a few reasons:
 * The data locality on read is managed by the NameNode based on the client IP 
address.
 * The write pipeline is managed by the NameNode, again based on the client IP 
address.
 * The HDFS audit log needs to contain the actual client IP address and not the 
routers' address.
 * If we were actually using client IP filtering, it would be required for that 
also.

I think the proposed option to only use the parameter if the user is configured 
as a proxy is reasonable, especially when tied to the option to limit it to a 
fixed range of proxy ip addresses. [~daryn] do you still have concerns about 
this approach?

> Add proxy address in IPC connection
> ---
>
> Key: HADOOP-16254
> URL: https://issues.apache.org/jira/browse/HADOOP-16254
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: ipc
>Reporter: Xiaoqiao He
>Assignee: Xiaoqiao He
>Priority: Blocker
> Attachments: HADOOP-16254.001.patch, HADOOP-16254.002.patch, 
> HADOOP-16254.004.patch, HADOOP-16254.005.patch, HADOOP-16254.006.patch, 
> HADOOP-16254.007.patch
>
>
> In order to support data locality in RBF, we need to add a new field for the 
> client hostname in the RPC headers of Router protocol calls.
>  clientHostname represents the hostname of the client and is forwarded by the 
> Router to the Namenode so that data locality still works. See the [RBF Data Locality 
> Design|https://issues.apache.org/jira/secure/attachment/12965092/RBF%20Data%20Locality%20Design.pdf]
>  in HDFS-13248 and the [maillist 
> vote|http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201904.mbox/%3CCAF3Ajax7hGxvowg4K_HVTZeDqC5H=3bfb7mv5sz5mgvadhv...@mail.gmail.com%3E].






[jira] [Updated] (HADOOP-16254) Add proxy address in IPC connection

2022-02-17 Thread Owen O'Malley (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HADOOP-16254:
---
Priority: Blocker  (was: Major)

> Add proxy address in IPC connection
> ---
>
> Key: HADOOP-16254
> URL: https://issues.apache.org/jira/browse/HADOOP-16254
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: ipc
>Reporter: Xiaoqiao He
>Assignee: Xiaoqiao He
>Priority: Blocker
> Attachments: HADOOP-16254.001.patch, HADOOP-16254.002.patch, 
> HADOOP-16254.004.patch, HADOOP-16254.005.patch, HADOOP-16254.006.patch, 
> HADOOP-16254.007.patch
>
>
> In order to support data locality in RBF, we need to add a new field for the 
> client hostname in the RPC headers of Router protocol calls.
>  clientHostname represents the hostname of the client and is forwarded by the 
> Router to the Namenode so that data locality still works. See the [RBF Data Locality 
> Design|https://issues.apache.org/jira/secure/attachment/12965092/RBF%20Data%20Locality%20Design.pdf]
>  in HDFS-13248 and the [maillist 
> vote|http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201904.mbox/%3CCAF3Ajax7hGxvowg4K_HVTZeDqC5H=3bfb7mv5sz5mgvadhv...@mail.gmail.com%3E].






[jira] [Resolved] (HADOOP-18110) ViewFileSystem: Add Support for Localized Trash Root

2022-02-10 Thread Owen O'Malley (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley resolved HADOOP-18110.

Fix Version/s: 3.4.0
   Resolution: Fixed

> ViewFileSystem: Add Support for Localized Trash Root
> 
>
> Key: HADOOP-18110
> URL: https://issues.apache.org/jira/browse/HADOOP-18110
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: common
>Reporter: Xing Lin
>Assignee: Xing Lin
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> getTrashRoot() in ViewFileSystem calls getTrashRoot() on the underlying 
> filesystem to return the trash root. Most of the time, we get a trash root 
> in the user's home dir. This can lead to problems when an application wants to 
> delete a file in a mount point using moveToTrash() in TrashPolicyDefault, 
> because we cannot rename across multiple filesystems/HDFS namenodes. 
>  
> We propose the following extension to getTrashRoot/getTrashRoots in 
> ViewFileSystem: add a flag to return a localized trash root for 
> ViewFileSystem. A localized trash root is a trash root which starts from the 
> root of a mount point (e.g., /mountpointRoot/.Trash/\{user}). The proposed 
> behavior:
> * If CONFIG_VIEWFS_MOUNT_POINT_LOCAL_TRASH is not set to true, or when the 
> path p is in a snapshot or an encryption zone, return the default trash root 
> in the user's home dir.
> * When CONFIG_VIEWFS_MOUNT_POINT_LOCAL_TRASH is set to true: 1) if path p is 
> mounted from the same targetFS as the user's home dir, return a trash root in 
> the user's home dir; 2) else, return a trash root in the mounted targetFS.
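
As a concrete example of the proposed behavior (paths are illustrative):

{noformat}
mount point: /data -> hdfs://nnA/data; user home dir on a different targetFS

flag off (or path in a snapshot/encryption zone):
  getTrashRoot("/data/f") -> trash root in the user home dir
flag on:
  getTrashRoot("/data/f") -> /data/.Trash/{user}  (local to the mount point)
{noformat}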






[jira] [Commented] (HADOOP-11867) FS API: Add a high-performance vectored Read to FSDataInputStream API

2020-09-21 Thread Owen O'Malley (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-11867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17199483#comment-17199483
 ] 

Owen O'Malley commented on HADOOP-11867:


To follow up on this, the benchmarks compare:

File systems:
 * raw = raw local file system
 * local = local file system with checksums layered on top

ByteBuffer implementation:
 * direct = direct byte buffers
 * array = array backed byte buffers

Read method:
 * asyncFileChanArray = reading using java's async file channel (no hadoop fs)
 * asyncRead = my new code added in this PR
 * syncRead = the current code

So, the current code is by far the slowest and using the Java native async file 
channel is the fastest. (The code in this PR uses the async file channel and 
goes through the hadoop fs api, so that isn't surprising.) The nice bit is that 
the raw fs code gets close to the native async file channel speeds. Even the 
local fs with the checksum reads & validation is still very fast (3.75x the 
current checksum code).

 

> FS API: Add a high-performance vectored Read to FSDataInputStream API
> -
>
> Key: HADOOP-11867
> URL: https://issues.apache.org/jira/browse/HADOOP-11867
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs, fs/azure, fs/s3, hdfs-client
>Affects Versions: 3.0.0
>Reporter: Gopal Vijayaraghavan
>Assignee: Owen O'Malley
>Priority: Major
>  Labels: performance, pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> The most effective way to read from a filesystem efficiently is to 
> let the FileSystem implementation handle the seek behaviour underneath the 
> API, so it can be as efficient as possible.
> A better approach to the seek problem is to provide a sequence of read 
> locations as part of a single call, while letting the system schedule/plan 
> the reads ahead of time.
> This is exceedingly useful for seek-heavy readers on HDFS, since this allows 
> for potentially optimizing away the seek-gaps within the FSDataInputStream 
> implementation.
> For seek+read systems with even more latency than locally-attached disks, 
> something like a {{readFully(long[] offsets, ByteBuffer[] chunks)}} would 
> take care of the seeks internally while reading chunk.remaining() bytes into each 
> chunk (which may be {{slice()}}ed off a bigger buffer).
> The base implementation can stub in this as a sequence of seeks + read() into 
> ByteBuffers, without forcing each FS implementation to override this in any 
> way.
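
A sketch of such a base implementation (illustrative; not necessarily the shape 
of the API that was eventually committed): fall back to a plain sequence of 
seek + read calls.

{code:java}
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import org.apache.hadoop.fs.FSDataInputStream;

// Illustrative fallback: turn the vectored call into sequential seek +
// read, filling each buffer completely. Real filesystems would override
// this to coalesce and schedule the reads.
public class VectoredReadSketch {
  public static void readFully(FSDataInputStream in, long[] offsets,
      ByteBuffer[] chunks) throws IOException {
    for (int i = 0; i < offsets.length; i++) {
      in.seek(offsets[i]);
      while (chunks[i].hasRemaining()) {
        // read(ByteBuffer) can be unsupported on some streams; a fuller
        // fallback would read into a byte[] and copy.
        if (in.read(chunks[i]) < 0) {
          throw new EOFException("EOF before filling chunk " + i);
        }
      }
    }
  }
}
{code}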






[jira] [Issue Comment Deleted] (HADOOP-11867) FS API: Add a high-performance vectored Read to FSDataInputStream API

2020-09-21 Thread Owen O'Malley (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-11867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HADOOP-11867:
---
Comment: was deleted

(was: [automated Yetus precommit report, -1 overall; the full results table is 
truncated in the archive])

[jira] [Issue Comment Deleted] (HADOOP-11867) FS API: Add a high-performance vectored Read to FSDataInputStream API

2020-09-21 Thread Owen O'Malley (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-11867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HADOOP-11867:
---
Comment: was deleted

(was: [automated Yetus precommit report, -1 overall; the full results table is 
truncated in the archive])

[jira] [Commented] (HADOOP-11867) FS API: Add a high-performance vectored Read to FSDataInputStream API

2020-02-03 Thread Owen O'Malley (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-11867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17029389#comment-17029389
 ] 

Owen O'Malley commented on HADOOP-11867:


Ok, my first patch adds the API, the default method, and the utilities for 
this. I also included the implementation for RawLocalFileSystem and 
ChecksumFileSystem because they were easiest to test and let me use the APIs in 
non-trivial ways. I also included a benchmark that tests against the local file 
system:

{code}
Benchmark                          (bufferKind)  (fileSystemKind)  Mode  Cnt     Score     Error  Units
AsyncBenchmark.asyncFileChanArray        direct               N/A  avgt   20  1432.396 ± 232.443  us/op
AsyncBenchmark.asyncFileChanArray         array               N/A  avgt   20  1551.400 ±  65.639  us/op
AsyncBenchmark.asyncRead                 direct             local  avgt   20  2514.926 ± 245.603  us/op
AsyncBenchmark.asyncRead                 direct               raw  avgt   20  1440.434 ± 207.504  us/op
AsyncBenchmark.asyncRead                  array             local  avgt   20  2798.031 ± 135.023  us/op
AsyncBenchmark.asyncRead                  array               raw  avgt   20  1524.360 ±  54.462  us/op
AsyncBenchmark.syncRead                     N/A             local  avgt   20  9515.604 ± 123.311  us/op
AsyncBenchmark.syncRead                     N/A               raw  avgt   20  2402.039 ±  36.620  us/op
{code}
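
For context, a hedged sketch of the harness shape that produces output like the 
above (hypothetical names and a stand-in workload; this is not the actual 
AsyncBenchmark source):

{code:java}
import java.nio.ByteBuffer;
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@State(Scope.Thread)
@Fork(1)
public class AsyncBenchmarkSketch {
  @Param({"direct", "array"})   // becomes the (bufferKind) column
  public String bufferKind;

  private ByteBuffer buffer;

  @Setup
  public void setup() {
    buffer = "direct".equals(bufferKind)
        ? ByteBuffer.allocateDirect(1 << 20)
        : ByteBuffer.allocate(1 << 20);
  }

  @Benchmark
  public ByteBuffer read() {
    buffer.clear();
    while (buffer.hasRemaining()) {
      buffer.put((byte) 1);     // stand-in for the actual file read
    }
    return buffer;              // returning defeats dead-code elimination
  }
}
{code}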

> FS API: Add a high-performance vectored Read to FSDataInputStream API
> -
>
> Key: HADOOP-11867
> URL: https://issues.apache.org/jira/browse/HADOOP-11867
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs, fs/azure, fs/s3, hdfs-client
>Affects Versions: 3.0.0
>Reporter: Gopal Vijayaraghavan
>Assignee: Owen O'Malley
>Priority: Major
>  Labels: performance
>
> The most significant way to read from a filesystem efficiently is to let the 
> FileSystem implementation handle the seek behaviour underneath the API so 
> that it can be as efficient as possible.
> A better approach to the seek problem is to provide a sequence of read 
> locations as part of a single call, while letting the system schedule/plan 
> the reads ahead of time.
> This is exceedingly useful for seek-heavy readers on HDFS, since this allows 
> for potentially optimizing away the seek-gaps within the FSDataInputStream 
> implementation.
> For seek+read systems with even more latency than locally-attached disks, 
> something like a {{readFully(long[] offsets, ByteBuffer[] chunks)}} would 
> take care of the seeks internally while reading chunk.remaining() bytes into 
> each chunk (which may be {{slice()}}ed off a bigger buffer).
> The base implementation can stub this in as a sequence of seeks + read() into 
> ByteBuffers, without forcing each FS implementation to override this in any 
> way.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16214) Kerberos name implementation in Hadoop does not accept principals with more than two components

2019-04-19 Thread Owen O'Malley (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16822274#comment-16822274
 ] 

Owen O'Malley commented on HADOOP-16214:


[~ibuenros] So you are mapping principals like "owen/pii/admin" to local 
accounts like "owen_pii_admin"? That would let you split permissions based on 
whether they want "pii" access and "admin" access. I assume the roles nest such 
that user/role1/role2@REALM means that role2 is a subset of role1 and that the 
user has both roles. Is that what you are doing?

> Kerberos name implementation in Hadoop does not accept principals with more 
> than two components
> ---
>
> Key: HADOOP-16214
> URL: https://issues.apache.org/jira/browse/HADOOP-16214
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: auth
>Reporter: Issac Buenrostro
>Priority: Major
> Attachments: Add-service-freeipa.png, HADOOP-16214.001.patch, 
> HADOOP-16214.002.patch, HADOOP-16214.003.patch, HADOOP-16214.004.patch, 
> HADOOP-16214.005.patch, HADOOP-16214.006.patch, HADOOP-16214.007.patch, 
> HADOOP-16214.008.patch, HADOOP-16214.009.patch, HADOOP-16214.010.patch, 
> HADOOP-16214.011.patch
>
>
> org.apache.hadoop.security.authentication.util.KerberosName is in charge of 
> converting a Kerberos principal to a user name in Hadoop for all of the 
> services requiring authentication.
> Although the Kerberos spec 
> ([https://web.mit.edu/kerberos/krb5-1.5/krb5-1.5.4/doc/krb5-user/What-is-a-Kerberos-Principal_003f.html])
>  allows for an arbitrary number of components in the principal, the Hadoop 
> implementation will throw a "Malformed Kerberos name:" error if the principal 
> has more than two components (because the regex can only read serviceName and 
> hostName).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16214) Kerberos name implementation in Hadoop does not accept principals with more than two components

2019-04-19 Thread Owen O'Malley (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16822102#comment-16822102
 ] 

Owen O'Malley commented on HADOOP-16214:


[~xkrogen] can you clarify what your principals look like and what the 
intended semantics behind them are? The hadoop usage of user@REALM and 
service/host@REALM is very much the standard, and I'd love to hear what 
your use case is like.

That said, [~daryn]'s approach makes more sense. Extending the parser to handle 
additional components without redefining the current behaviors is much better. 
Adding support for $3, $4, ... makes sense and won't break current systems.
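
To make the proposed extension concrete, here is a hedged sketch. The [3:...] 
rule form and $3 are exactly what is being proposed, so this does not work 
today; in fact, new KerberosName("owen/pii/admin@EXAMPLE.COM") currently fails 
with the very "Malformed Kerberos name" error this issue describes.

{code:java}
import org.apache.hadoop.security.authentication.util.KerberosName;

public class ThreeComponentRuleSketch {
  public static void main(String[] args) throws Exception {
    // Hypothetical rule: [3:...] would build "owen_pii_admin@EXAMPLE.COM"
    // from the three components, then the sed expression strips the realm.
    KerberosName.setRules(
        "RULE:[3:$1_$2_$3@$0](.*@EXAMPLE\\.COM)s/@.*//\n" +
        "DEFAULT");
    KerberosName name = new KerberosName("owen/pii/admin@EXAMPLE.COM");
    System.out.println(name.getShortName()); // would print owen_pii_admin
  }
}
{code}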

> Kerberos name implementation in Hadoop does not accept principals with more 
> than two components
> ---
>
> Key: HADOOP-16214
> URL: https://issues.apache.org/jira/browse/HADOOP-16214
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: auth
>Reporter: Issac Buenrostro
>Priority: Major
> Attachments: HADOOP-16214.001.patch, HADOOP-16214.002.patch, 
> HADOOP-16214.003.patch, HADOOP-16214.004.patch, HADOOP-16214.005.patch, 
> HADOOP-16214.006.patch, HADOOP-16214.007.patch, HADOOP-16214.008.patch, 
> HADOOP-16214.009.patch, HADOOP-16214.010.patch, HADOOP-16214.011.patch
>
>
> org.apache.hadoop.security.authentication.util.KerberosName is in charge of 
> converting a Kerberos principal to a user name in Hadoop for all of the 
> services requiring authentication.
> Although the Kerberos spec 
> ([https://web.mit.edu/kerberos/krb5-1.5/krb5-1.5.4/doc/krb5-user/What-is-a-Kerberos-Principal_003f.html])
>  allows for an arbitrary number of components in the principal, the Hadoop 
> implementation will throw a "Malformed Kerberos name:" error if the principal 
> has more than two components (because the regex can only read serviceName and 
> hostName).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-11867) FS API: Add a high-performance vectored Read to FSDataInputStream API

2018-12-05 Thread Owen O'Malley (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-11867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16710718#comment-16710718
 ] 

Owen O'Malley commented on HADOOP-11867:


Currently, the implementation of PositionedReadable.readFully(long, byte[], 
int, int) locks the stream, so you won't process multiple reads in parallel 
without a specific implementation that improves on it. For the REST-based 
file systems, the goal is absolutely to convert it into a single read with 
multiple ranges in the request.

I agree completely that implementing a prototype is a good first step, before 
locking down the exact semantics. My current thoughts:
 * You have no guarantees about the order in which the results are returned.
 * If the file system has mutable files, it is the application's responsibility 
to perform adequate locking prior to calling the read operations. (So yes, you 
get no guarantees about consistency of reads.) Since this case doesn't apply to 
the vast majority of users, I wouldn't want to complicate the API for it.
 * Overlapping ranges are permitted.
 * It is up to the file system whether the reads lock the stream. The current 
implementation does it because it uses seek/read/seek to implement the 
positioned read. We should allow implementations to do it more directly. I 
don't think there should be any guarantees about ordering between async reads 
or sync reads on the same stream.
 * Since the future contains the FileRange they passed in, they could pass in 
an extension that tracks the additional information they need, as in the 
sketch below.
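
A minimal caller-side sketch, assuming the readAsync/FileRange API proposed 
below on this issue, an open FSDataInputStream {{stream}}, and a hypothetical 
{{process()}} callback:

{code:java}
// Subclass FileRange to carry extra bookkeeping through the future, and
// never assume the ranges complete in the order they were submitted.
class StripeRange extends FileRange {
  final int stripeId; // the "additional information" tracked per range
  StripeRange(int stripeId, long offset, int length) {
    super(offset, length, null); // null buffer: let the FS allocate one
    this.stripeId = stripeId;
  }
}

List<FileRange> ranges = Arrays.asList(
    new StripeRange(0, 0, 1 << 20),
    new StripeRange(1, 8 << 20, 1 << 20));
CompletableFuture<FileRange>[] futures = stream.readAsync(ranges);
for (CompletableFuture<FileRange> future : futures) {
  StripeRange done = (StripeRange) future.join(); // blocks for this range only
  process(done.stripeId, done.buffer);
}
{code}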

> FS API: Add a high-performance vectored Read to FSDataInputStream API
> -
>
> Key: HADOOP-11867
> URL: https://issues.apache.org/jira/browse/HADOOP-11867
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Owen O'Malley
>Priority: Major
>  Labels: performance
>
> The most significant way to read from a filesystem efficiently is to let the 
> FileSystem implementation handle the seek behaviour underneath the API so 
> that it can be as efficient as possible.
> A better approach to the seek problem is to provide a sequence of read 
> locations as part of a single call, while letting the system schedule/plan 
> the reads ahead of time.
> This is exceedingly useful for seek-heavy readers on HDFS, since this allows 
> for potentially optimizing away the seek-gaps within the FSDataInputStream 
> implementation.
> For seek+read systems with even more latency than locally-attached disks, 
> something like a {{readFully(long[] offsets, ByteBuffer[] chunks)}} would 
> take care of the seeks internally while reading chunk.remaining() bytes into 
> each chunk (which may be {{slice()}}ed off a bigger buffer).
> The base implementation can stub this in as a sequence of seeks + read() into 
> ByteBuffers, without forcing each FS implementation to override this in any 
> way.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15229) Add FileSystem builder-based openFile() API to match createFile()

2018-12-03 Thread Owen O'Malley (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16707860#comment-16707860
 ] 

Owen O'Malley commented on HADOOP-15229:


My comment remains that it seems problematic to have all of the options present 
at the open call. Being able to change the properties of the stream later is 
more convenient. In particular, I really don't think we should make a 
convoluted callback structure just so that we can have the options available 
at creation.

> Add FileSystem builder-based openFile() API to match createFile()
> -
>
> Key: HADOOP-15229
> URL: https://issues.apache.org/jira/browse/HADOOP-15229
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs, fs/azure, fs/s3
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: HADOOP-15229-001.patch, HADOOP-15229-002.patch, 
> HADOOP-15229-003.patch, HADOOP-15229-004.patch, HADOOP-15229-004.patch, 
> HADOOP-15229-005.patch, HADOOP-15229-006.patch, HADOOP-15229-007.patch
>
>
> Replicate HDFS-1170 and HADOOP-14365 with an API to open files.
> A key requirement of this is not HDFS; it's to put in the fadvise policy for 
> working with object stores, where the decision to do a full GET and TCP 
> abort on seek vs smaller GETs is fundamentally different: the wrong option 
> can cost you minutes. S3A and Azure both have adaptive policies now (first 
> backward seek), but they still don't do it that well.
> Columnar formats (ORC, Parquet) should be able to say "fs.input.fadvise" 
> "random" as an option when they open files; I can imagine other options too.
> The Builder model of [~eddyxu] is the one to mimic, method for method. 
> Ideally with as much code reuse as possible



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HADOOP-15229) Add FileSystem builder-based openFile() API to match createFile()

2018-12-03 Thread Owen O'Malley (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16707584#comment-16707584
 ] 

Owen O'Malley edited comment on HADOOP-15229 at 12/3/18 5:44 PM:
-

[~ste...@apache.org] with the problems we're hitting, you're probably right 
that we need something more like HADOOP-11867.

So we could define an interface for the input streams.

{code:java}
package org.apache.hadoop.fs;
public interface InputStreamOptions {
  /**
   * Test whether the given FileSystem supports the given option.
   */
  boolean supportsOption(String name);

  // repeat optional(...) and require(...) for long, double, and boolean values

  /**
   * Set an optional option to the specified value.
   */
  InputStreamOptions optional(String name, String value);

  /**
   * Set a required option to the specified value.
   */
  InputStreamOptions require(String name, String value);
}
{code}

Of course, we'd need default implementations that ignore optional and fail on 
require.

Next, we make PositionedReadable extend InputStreamOptions, so that with an 
open stream you can change the options for that stream.

Finally, we extend RecordReader API to also extend InputStreamOptions and pass 
the options down to the underlying stream.

Thoughts?
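
A minimal sketch, assuming the interface above, of the suggested defaults (my 
illustration, not actual Hadoop code): unknown optional settings are ignored, 
unknown required settings fail fast.

{code:java}
public abstract class AbstractInputStreamOptions implements InputStreamOptions {
  @Override
  public boolean supportsOption(String name) {
    return false; // the base stream recognizes no options
  }

  @Override
  public InputStreamOptions optional(String name, String value) {
    return this; // optional settings are best-effort, so just ignore them
  }

  @Override
  public InputStreamOptions require(String name, String value) {
    throw new IllegalArgumentException("Unsupported required option: " + name);
  }
}
{code}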


was (Author: owen.omalley):
[~ste...@apache.org] with the problems we're hitting, you're probably right 
that we need something more like HADOOP-11867.

So we could define an interface for the input streams.

{code:java}
package org.apache.hadoop.fs;
public interface InputStreamOptions {
  /**
   * Test whether the given FileSystem supports the given option.
   */
  boolean supportsOption(String name);

  // repeat optional(...) and require(...) for long, double, and boolean values

  /**
   * Set an optional option to the specified value.
   */
  InputStreamOptions optional(String name, String value);

  /**
   * Set a required option to the specified value.
   */
  InputStreamOptions require(String name, String value);
}
{code:java}

Of course, we'd need default implementations that ignore optional and fail on 
require.

Next, we make PositionedReadable extend InputStreamOptions, so that with an 
open stream you can change the options for that stream.

Finally, we extend RecordReader API to also extend InputStreamOptions and pass 
the options down to the underlying stream.

Thoughts?

> Add FileSystem builder-based openFile() API to match createFile()
> -
>
> Key: HADOOP-15229
> URL: https://issues.apache.org/jira/browse/HADOOP-15229
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs, fs/azure, fs/s3
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: HADOOP-15229-001.patch, HADOOP-15229-002.patch, 
> HADOOP-15229-003.patch, HADOOP-15229-004.patch, HADOOP-15229-004.patch, 
> HADOOP-15229-005.patch, HADOOP-15229-006.patch
>
>
> Replicate HDFS-1170 and HADOOP-14365 with an API to open files.
> A key requirement of this is not HDFS; it's to put in the fadvise policy for 
> working with object stores, where the decision to do a full GET and TCP 
> abort on seek vs smaller GETs is fundamentally different: the wrong option 
> can cost you minutes. S3A and Azure both have adaptive policies now (first 
> backward seek), but they still don't do it that well.
> Columnar formats (ORC, Parquet) should be able to say "fs.input.fadvise" 
> "random" as an option when they open files; I can imagine other options too.
> The Builder model of [~eddyxu] is the one to mimic, method for method. 
> Ideally with as much code reuse as possible



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15229) Add FileSystem builder-based openFile() API to match createFile()

2018-12-03 Thread Owen O'Malley (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16707584#comment-16707584
 ] 

Owen O'Malley commented on HADOOP-15229:


[~ste...@apache.org] with the problems we're hitting, you're probably right 
that we need something more like HADOOP-11867.

So we could define an interface for the input streams.

{code:java}
package org.apache.hadoop.fs;
public interface InputStreamOptions {
  /**
   * Test whether the given FileSystem supports the given option.
   */
  boolean supportsOption(String name);

  // repeat optional(...) and require(...) for long, double, and boolean values

  /**
   * Set an optional option to the specified value.
   */
  InputStreamOptions optional(String name, String value);

  /**
   * Set a required option to the specified value.
   */
  InputStreamOptions require(String name, String value);
}
{code:java}

Of course, we'd need default implementations that ignore optional and fail on 
require.

Next, we make PositionedReadable extend InputStreamOptions, so that with an 
open stream you can change the options for that stream.

Finally, we extend RecordReader API to also extend InputStreamOptions and pass 
the options down to the underlying stream.

Thoughts?

> Add FileSystem builder-based openFile() API to match createFile()
> -
>
> Key: HADOOP-15229
> URL: https://issues.apache.org/jira/browse/HADOOP-15229
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs, fs/azure, fs/s3
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: HADOOP-15229-001.patch, HADOOP-15229-002.patch, 
> HADOOP-15229-003.patch, HADOOP-15229-004.patch, HADOOP-15229-004.patch, 
> HADOOP-15229-005.patch, HADOOP-15229-006.patch
>
>
> Replicate HDFS-1170 and HADOOP-14365 with an API to open files.
> A key requirement of this is not HDFS; it's to put in the fadvise policy for 
> working with object stores, where the decision to do a full GET and TCP 
> abort on seek vs smaller GETs is fundamentally different: the wrong option 
> can cost you minutes. S3A and Azure both have adaptive policies now (first 
> backward seek), but they still don't do it that well.
> Columnar formats (ORC, Parquet) should be able to say "fs.input.fadvise" 
> "random" as an option when they open files; I can imagine other options too.
> The Builder model of [~eddyxu] is the one to mimic, method for method. 
> Ideally with as much code reuse as possible



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-11867) FS API: Add a high-performance vectored Read to FSDataInputStream API

2018-11-30 Thread Owen O'Malley (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-11867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16705211#comment-16705211
 ] 

Owen O'Malley commented on HADOOP-11867:


Ok, looking at this deeper, I'd suggest that we add readAsync to 
PositionedReadable. That implies it is in FSDataInputStream as well as 
FSInputStream.

> FS API: Add a high-performance vectored Read to FSDataInputStream API
> -
>
> Key: HADOOP-11867
> URL: https://issues.apache.org/jira/browse/HADOOP-11867
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Owen O'Malley
>Priority: Major
>  Labels: performance
>
> The most significant way to read from a filesystem efficiently is to let the 
> FileSystem implementation handle the seek behaviour underneath the API so 
> that it can be as efficient as possible.
> A better approach to the seek problem is to provide a sequence of read 
> locations as part of a single call, while letting the system schedule/plan 
> the reads ahead of time.
> This is exceedingly useful for seek-heavy readers on HDFS, since this allows 
> for potentially optimizing away the seek-gaps within the FSDataInputStream 
> implementation.
> For seek+read systems with even more latency than locally-attached disks, 
> something like a {{readFully(long[] offsets, ByteBuffer[] chunks)}} would 
> take care of the seeks internally while reading chunk.remaining() bytes into 
> each chunk (which may be {{slice()}}ed off a bigger buffer).
> The base implementation can stub this in as a sequence of seeks + read() into 
> ByteBuffers, without forcing each FS implementation to override this in any 
> way.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HADOOP-11867) FS API: Add a high-performance vectored Read to FSDataInputStream API

2018-11-30 Thread Owen O'Malley (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-11867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16705141#comment-16705141
 ] 

Owen O'Malley edited comment on HADOOP-11867 at 11/30/18 7:28 PM:
--

I'd like to propose the following API:

{code:java}
package org.apache.hadoop.fs;
/**
 * A range of bytes from a file.
 */
public class FileRange {
  public final long offset;
  public final int length; // max length is 2^31 because of Java arrays
  public ByteBuffer buffer;
  public FileRange(long offset, int length, ByteBuffer buffer) { ... }
}

public class FSDataInputStream ... {
  ...
  /**
   * Perform an asynchronous read of the file with multiple ranges. This call
   * will return immediately and return futures that will contain the data
   * once it is read. The order of the physical reads is an implementation
   * detail of this method. Multiple requests may be converted into a single
   * read.
   *
   * If any ranges do not have a buffer, an array-based one of the appropriate
   * size will be created for it.
   * @param ranges the list of disk ranges to read
   * @return for each range, the future filled byte buffer will be returned.
   * @throws IOException if the file is not available
   */
  public CompletableFuture<FileRange>[] readAsync(List<FileRange> ranges
                                                  ) throws IOException { ... }
  ...
}
{code}

FSDataInputStream will have a default implementation, but file systems will be 
able to create a more optimized solution for their files.

Thoughts?
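
A hedged sketch of what the default implementation might look like (my 
assumption, not committed code): one future per range, built on the existing 
positioned readFully, with no ordering guarantees; heap buffers are assumed 
for brevity.

{code:java}
public CompletableFuture<FileRange>[] readAsync(List<FileRange> ranges)
    throws IOException {
  @SuppressWarnings("unchecked")
  CompletableFuture<FileRange>[] result = new CompletableFuture[ranges.size()];
  int i = 0;
  for (FileRange range : ranges) {
    final FileRange r = range;
    result[i++] = CompletableFuture.supplyAsync(() -> {
      if (r.buffer == null) {
        r.buffer = ByteBuffer.allocate(r.length); // array-based, per the javadoc
      }
      try {
        // PositionedReadable.readFully takes the position, so no seek is needed.
        readFully(r.offset, r.buffer.array(), r.buffer.arrayOffset(), r.length);
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
      return r;
    });
  }
  return result;
}
{code}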


was (Author: owen.omalley):
I'd like to propose the following API:

{code:java}
package org.apache.hadoop.fs;
/**
 * A range of bytes from a file.
 */
public class FileRange {
  public final long offset;
  public final int length; // max length is 2^31 because of Java arrays
  public ByteBuffer buffer;
  public DiskRange(long offset, int length, ByteBuffer buffer) {
    this.offset = offset;
    this.length = length;
    this.buffer = buffer;
  }
}

public class FSDataInputStream ... {
  ...
  /**
   * Perform an asynchronous read of the file with multiple ranges. This call
   * will return immediately and return futures that will contain the data
   * once it is read. The order of the physical reads is an implementation
   * detail of this method. Multiple requests may be converted into a single
   * read.
   *
   * If any ranges do not have a buffer, an array-based one of the appropriate
   * size will be created for it.
   * @param ranges the list of disk ranges to read
   * @return for each range, the future filled byte buffer will be returned.
   * @throws IOException if the file is not available
   */
  public CompletableFuture<FileRange>[] readAsync(List<FileRange> ranges
                                                  ) throws IOException { ... }
  ...
}
{code}

FSDataInputStream will have a default implementation, but file systems will be 
able to create a more optimized solution for their files.

Thoughts?

> FS API: Add a high-performance vectored Read to FSDataInputStream API
> -
>
> Key: HADOOP-11867
> URL: https://issues.apache.org/jira/browse/HADOOP-11867
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Owen O'Malley
>Priority: Major
>  Labels: performance
>
> The most significant way to read from a filesystem efficiently is to let the 
> FileSystem implementation handle the seek behaviour underneath the API so 
> that it can be as efficient as possible.
> A better approach to the seek problem is to provide a sequence of read 
> locations as part of a single call, while letting the system schedule/plan 
> the reads ahead of time.
> This is exceedingly useful for seek-heavy readers on HDFS, since this allows 
> for potentially optimizing away the seek-gaps within the FSDataInputStream 
> implementation.
> For seek+read systems with even more latency than locally-attached disks, 
> something like a {{readFully(long[] offsets, ByteBuffer[] chunks)}} would 
> take care of the seeks internally while reading chunk.remaining() bytes into 
> each chunk (which may be {{slice()}}ed off a bigger buffer).
> The base implementation can stub this in as a sequence of seeks + read() into 
> ByteBuffers, without forcing each FS implementation to override this in any 
> way.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HADOOP-11867) FS API: Add a high-performance vectored Read to FSDataInputStream API

2018-11-30 Thread Owen O'Malley (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-11867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16705141#comment-16705141
 ] 

Owen O'Malley edited comment on HADOOP-11867 at 11/30/18 7:23 PM:
--

I'd like to propose the following API:

{code:java}
package org.apache.hadoop.fs;
/**
 * A range of bytes from a file.
 */
public class FileRange {
  public final long offset;
  public final int length; // max length is 2^31 because of Java arrays
  public ByteBuffer buffer;
  public DiskRange(long offset, int length, ByteBuffer buffer) {
    this.offset = offset;
    this.length = length;
    this.buffer = buffer;
  }
}

public class FSDataInputStream ... {
  ...
  /**
   * Perform an asynchronous read of the file with multiple ranges. This call
   * will return immediately and return futures that will contain the data
   * once it is read. The order of the physical reads is an implementation
   * detail of this method. Multiple requests may be converted into a single
   * read.
   *
   * If any ranges do not have a buffer, an array-based one of the appropriate
   * size will be created for it.
   * @param ranges the list of disk ranges to read
   * @return for each range, the future filled byte buffer will be returned.
   * @throws IOException if the file is not available
   */
  public CompletableFuture<FileRange>[] readAsync(List<FileRange> ranges
                                                  ) throws IOException { ... }
  ...
}
{code}

FSDataInputStream will have a default implementation, but file systems will be 
able to create a more optimized solution for their files.

Thoughts?


was (Author: owen.omalley):
I'd like to propose the following API:

{code:java}
package org.apache.hadoop.fs;
/**
 * A range of bytes from a file.
 */
public class FileRange {
  public final long offset;
  public final int length; // max length is 2^31 because of Java arrays
  public ByteBuffer buffer;
  public DiskRange(long offset, int length, ByteBuffer buffer) {
    this.offset = offset;
    this.length = length;
    this.buffer = buffer;
  }
}

public class FSDataInputStream ... {
  ...
  /**
   * Perform an asynchronous read of the file with multiple ranges. This call
   * will return immediately and return futures that will contain the data
   * once it is read. The order of the physical reads is an implementation
   * detail of this method. Multiple requests may be converted into a single
   * read.
   *
   * If any ranges do not have a buffer, an array-based one of the appropriate
   * size will be created for it.
   * @param ranges the list of disk ranges to read
   * @return for each range, the future filled byte buffer will be returned.
   * @throws IOException if the file is not available
   */
  public CompletableFuture<FileRange>[] readAsync(List<FileRange> ranges

 ) throws IOException { ... }
  ...
}
{code}

FSDataInputStream will have a default implementation, but file systems will be 
able to create a more optimized solution for their files.

Thoughts?

> FS API: Add a high-performance vectored Read to FSDataInputStream API
> -
>
> Key: HADOOP-11867
> URL: https://issues.apache.org/jira/browse/HADOOP-11867
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Owen O'Malley
>Priority: Major
>  Labels: performance
>
> The most significant way to read from a filesystem efficiently is to let the 
> FileSystem implementation handle the seek behaviour underneath the API so 
> that it can be as efficient as possible.
> A better approach to the seek problem is to provide a sequence of read 
> locations as part of a single call, while letting the system schedule/plan 
> the reads ahead of time.
> This is exceedingly useful for seek-heavy readers on HDFS, since this allows 
> for potentially optimizing away the seek-gaps within the FSDataInputStream 
> implementation.
> For seek+read systems with even more latency than locally-attached disks, 
> something like a {{readFully(long[] offsets, ByteBuffer[] chunks)}} would 
> take care of the seeks internally while reading chunk.remaining() bytes into 
> each chunk (which may be {{slice()}}ed off a bigger buffer).
> The base implementation can stub this in as a sequence of seeks + read() into 
> ByteBuffers, without forcing each FS implementation to override this in any 
> way.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-11867) FS API: Add a high-performance vectored Read to FSDataInputStream API

2018-11-30 Thread Owen O'Malley (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-11867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16705141#comment-16705141
 ] 

Owen O'Malley commented on HADOOP-11867:


I'd like to propose the following API:

{code:java}
package org.apache.hadoop.fs;
/**
 * A range of bytes from a file.
 */
public class FileRange {
  public final long offset;
  public final int length; // max length is 2^31 because of Java arrays
  public ByteBuffer buffer;
  public DiskRange(long offset, int length, ByteBuffer buffer) {
    this.offset = offset;
    this.length = length;
    this.buffer = buffer;
  }
}

public class FSDataInputStream ... {
  ...
  /**
   * Perform an asynchronous read of the file with multiple ranges. This call
   * will return immediately and return futures that will contain the data
   * once it is read. The order of the physical reads is an implementation
   * detail of this method. Multiple requests may be converted into a single
   * read.
   *
   * If any ranges do not have a buffer, an array-based one of the appropriate
   * size will be created for it.
   * @param ranges the list of disk ranges to read
   * @return for each range, the future filled byte buffer will be returned.
   * @throws IOException if the file is not available
   */
  public CompletableFuture<FileRange>[] readAsync(List<FileRange> ranges

 ) throws IOException { ... }
  ...
}
{code}

FSDataInputStream will have a default implementation, but file systems will be 
able to create a more optimized solution for their files.

Thoughts?

> FS API: Add a high-performance vectored Read to FSDataInputStream API
> -
>
> Key: HADOOP-11867
> URL: https://issues.apache.org/jira/browse/HADOOP-11867
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Owen O'Malley
>Priority: Major
>  Labels: performance
>
> The most significant way to read from a filesystem efficiently is to let the 
> FileSystem implementation handle the seek behaviour underneath the API so 
> that it can be as efficient as possible.
> A better approach to the seek problem is to provide a sequence of read 
> locations as part of a single call, while letting the system schedule/plan 
> the reads ahead of time.
> This is exceedingly useful for seek-heavy readers on HDFS, since this allows 
> for potentially optimizing away the seek-gaps within the FSDataInputStream 
> implementation.
> For seek+read systems with even more latency than locally-attached disks, 
> something like a {{readFully(long[] offsets, ByteBuffer[] chunks)}} would 
> take care of the seeks internally while reading chunk.remaining() bytes into 
> each chunk (which may be {{slice()}}ed off a bigger buffer).
> The base implementation can stub this in as a sequence of seeks + read() into 
> ByteBuffers, without forcing each FS implementation to override this in any 
> way.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Assigned] (HADOOP-11867) FS API: Add a high-performance vectored Read to FSDataInputStream API

2018-11-30 Thread Owen O'Malley (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-11867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley reassigned HADOOP-11867:
--

Assignee: Owen O'Malley  (was: Gopal V)

> FS API: Add a high-performance vectored Read to FSDataInputStream API
> -
>
> Key: HADOOP-11867
> URL: https://issues.apache.org/jira/browse/HADOOP-11867
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Owen O'Malley
>Priority: Major
>  Labels: performance
>
> The most significant way to read from a filesystem efficiently is to let the 
> FileSystem implementation handle the seek behaviour underneath the API so 
> that it can be as efficient as possible.
> A better approach to the seek problem is to provide a sequence of read 
> locations as part of a single call, while letting the system schedule/plan 
> the reads ahead of time.
> This is exceedingly useful for seek-heavy readers on HDFS, since this allows 
> for potentially optimizing away the seek-gaps within the FSDataInputStream 
> implementation.
> For seek+read systems with even more latency than locally-attached disks, 
> something like a {{readFully(long[] offsets, ByteBuffer[] chunks)}} would 
> take care of the seeks internally while reading chunk.remaining() bytes into 
> each chunk (which may be {{slice()}}ed off a bigger buffer).
> The base implementation can stub this in as a sequence of seeks + read() into 
> ByteBuffers, without forcing each FS implementation to override this in any 
> way.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15229) Add FileSystem builder-based openFile() API to match createFile()

2018-11-28 Thread Owen O'Malley (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16702425#comment-16702425
 ] 

Owen O'Malley commented on HADOOP-15229:


[~ste...@apache.org] using interfaces leaves us a lot more choices going 
forward. The only point to be cautious of is that you can't replace classes 
with interfaces or interfaces with classes; either way breaks binary 
compatibility.

Yes, I'd put the public interfaces into o.a.hadoop.fs. For the implementation 
stuff, I'd prefer that we put it into an impl package. It just makes it easier 
to keep the javadoc from getting polluted with classes that are "java 
public" but not intended for users.

> Add FileSystem builder-based openFile() API to match createFile()
> -
>
> Key: HADOOP-15229
> URL: https://issues.apache.org/jira/browse/HADOOP-15229
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs, fs/azure, fs/s3
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: HADOOP-15229-001.patch, HADOOP-15229-002.patch, 
> HADOOP-15229-003.patch, HADOOP-15229-004.patch, HADOOP-15229-004.patch, 
> HADOOP-15229-005.patch, HADOOP-15229-006.patch
>
>
> Replicate HDFS-1170 and HADOOP-14365 with an API to open files.
> A key requirement of this is not HDFS; it's to put in the fadvise policy for 
> working with object stores, where the decision to do a full GET and TCP 
> abort on seek vs smaller GETs is fundamentally different: the wrong option 
> can cost you minutes. S3A and Azure both have adaptive policies now (first 
> backward seek), but they still don't do it that well.
> Columnar formats (ORC, Parquet) should be able to say "fs.input.fadvise" 
> "random" as an option when they open files; I can imagine other options too.
> The Builder model of [~eddyxu] is the one to mimic, method for method. 
> Ideally with as much code reuse as possible



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15229) Add FileSystem builder-based openFile() API to match createFile()

2018-11-27 Thread Owen O'Malley (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16700839#comment-16700839
 ] 

Owen O'Malley commented on HADOOP-15229:


Sorry for coming into this late and for bikeshedding. *grin*

I'd suggest that the main builder API be an interface rather than an abstract 
class. Maybe something like:
 
{code:java}
public interface ReaderBuilder {
  /**
   * Test whether the given FileSystem supports the given option.
   */
  boolean supportsOption(String name);

  // repeat optional(...) and require(...) for long, double, and boolean values

  /**
   * Set an optional option to the specified value.
   */
  ReaderBuilder optional(String name, String value);

  /**
   * Set a required option to the specified value.
   */
  ReaderBuilder require(String name, String value);

  /**
   * Use the options to build the stream.
   */
  FSDataInputStream build() throws IOException;
}

public abstract class FileSystem ... {
  ...
  ReaderBuilder openWithOptions(Path filename) throws IOException;
}
{code}

This minimizes the public API changes and exposure. I think such an option 
builder could start as Public, Evolving.
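
A quick usage sketch of the proposed builder (hypothetical wiring; assumes an 
open FileSystem {{fs}} and mirrors the fadvise example from the description):

{code:java}
// Open an ORC file with a random-IO hint; stores that don't understand the
// optional setting simply ignore it, while require(...) would fail fast.
FSDataInputStream in = fs.openWithOptions(new Path("/warehouse/t/part-0.orc"))
    .optional("fs.input.fadvise", "random")
    .build();
{code}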

> Add FileSystem builder-based openFile() API to match createFile()
> -
>
> Key: HADOOP-15229
> URL: https://issues.apache.org/jira/browse/HADOOP-15229
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs, fs/azure, fs/s3
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: HADOOP-15229-001.patch, HADOOP-15229-002.patch, 
> HADOOP-15229-003.patch, HADOOP-15229-004.patch, HADOOP-15229-004.patch, 
> HADOOP-15229-005.patch, HADOOP-15229-006.patch
>
>
> Replicate HDFS-1170 and HADOOP-14365 with an API to open files.
> A key requirement of this is not HDFS; it's to put in the fadvise policy for 
> working with object stores, where the decision to do a full GET and TCP 
> abort on seek vs smaller GETs is fundamentally different: the wrong option 
> can cost you minutes. S3A and Azure both have adaptive policies now (first 
> backward seek), but they still don't do it that well.
> Columnar formats (ORC, Parquet) should be able to say "fs.input.fadvise" 
> "random" as an option when they open files; I can imagine other options too.
> The Builder model of [~eddyxu] is the one to mimic, method for method. 
> Ideally with as much code reuse as possible



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Closed] (HADOOP-15896) Refine Kerberos based AuthenticationHandler to check proxyuser ACL

2018-11-02 Thread Owen O'Malley (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley closed HADOOP-15896.
--

> Refine Kerberos based AuthenticationHandler to check proxyuser ACL
> --
>
> Key: HADOOP-15896
> URL: https://issues.apache.org/jira/browse/HADOOP-15896
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.8.0, 3.0.0-alpha1
>Reporter: Eric Yang
>Assignee: Larry McCay
>Priority: Major
>
> JWTRedirectAuthenticationHandler is based on KerberosAuthenticationHandler, 
> and the authentication method in KerberosAuthenticationHandler basically does 
> this:
> {code}
> String clientPrincipal = gssContext.getSrcName().toString();
> KerberosName kerberosName = new KerberosName(clientPrincipal);
> String userName = kerberosName.getShortName();
> token = new AuthenticationToken(userName, clientPrincipal, getType());
> response.setStatus(HttpServletResponse.SC_OK);
> LOG.trace("SPNEGO completed for client principal [{}]", clientPrincipal);
> {code}
> It obtains the short name of the client principal and responds OK.  This is 
> fine for verifying an end user.  However, in the proxy user case (knox), this 
> authentication is insufficient because the knox principal name is 
> knox/host1.example@example.com.  KerberosAuthenticationHandler will 
> gladly confirm that knox is knox, even if 
> knox/host1.example@example.com is used from a botnet.rogueresearchlab.tld 
> host.  KerberosAuthenticationHandler may not need to change if it does not 
> plan to support proxying and ignores the instance name of the kerberos 
> principal.  JWTRedirectAuthenticationHandler, however, is designed for the 
> proxy use case: it should check that the remote host matches the 
> clientPrincipal instance name.  Without this check, Kerberos is vulnerable.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-15896) Refine Kerberos based AuthenticationHandler to check proxyuser ACL

2018-11-02 Thread Owen O'Malley (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley resolved HADOOP-15896.

Resolution: Not A Problem

This is working correctly. Do not attempt to change this behavior.

> Refine Kerberos based AuthenticationHandler to check proxyuser ACL
> --
>
> Key: HADOOP-15896
> URL: https://issues.apache.org/jira/browse/HADOOP-15896
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.8.0, 3.0.0-alpha1
>Reporter: Eric Yang
>Assignee: Larry McCay
>Priority: Major
>
> JWTRedirectAuthenticationHandler is based on KerberosAuthenticationHandler, 
> and the authentication method in KerberosAuthenticationHandler basically does 
> this:
> {code}
> String clientPrincipal = gssContext.getSrcName().toString();
> KerberosName kerberosName = new KerberosName(clientPrincipal);
> String userName = kerberosName.getShortName();
> token = new AuthenticationToken(userName, clientPrincipal, getType());
> response.setStatus(HttpServletResponse.SC_OK);
> LOG.trace("SPNEGO completed for client principal [{}]", clientPrincipal);
> {code}
> It obtains the short name of the client principal and responds OK.  This is 
> fine for verifying an end user.  However, in the proxy user case (knox), this 
> authentication is insufficient because the knox principal name is 
> knox/host1.example@example.com.  KerberosAuthenticationHandler will 
> gladly confirm that knox is knox, even if 
> knox/host1.example@example.com is used from a botnet.rogueresearchlab.tld 
> host.  KerberosAuthenticationHandler may not need to change if it does not 
> plan to support proxying and ignores the instance name of the kerberos 
> principal.  JWTRedirectAuthenticationHandler, however, is designed for the 
> proxy use case: it should check that the remote host matches the 
> clientPrincipal instance name.  Without this check, Kerberos is vulnerable.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15518) Authentication filter calling handler after request already authenticated

2018-06-08 Thread Owen O'Malley (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16506425#comment-16506425
 ] 

Owen O'Malley commented on HADOOP-15518:


This looks good, [~kminder]. +1

> Authentication filter calling handler after request already authenticated
> -
>
> Key: HADOOP-15518
> URL: https://issues.apache.org/jira/browse/HADOOP-15518
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.7.1
>Reporter: Kevin Minder
>Assignee: Kevin Minder
>Priority: Major
> Attachments: HADOOP-15518-001.patch
>
>
> The hadoop-auth AuthenticationFilter will invoke its handler even if a prior 
> successful authentication has occurred in the current request.  This 
> primarily affects situations where multiple authentication mechanisms have 
> been configured.  For example, when core-site.xml has 
> hadoop.http.authentication.type=kerberos and yarn-site.xml has 
> yarn.timeline-service.http-authentication.type=kerberos, the result is an 
> attempt to perform two Kerberos authentications for the same request.  This 
> in turn results in Kerberos triggering its replay attack detection.  The 
> javadocs for AuthenticationHandler 
> ([https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-auth/src/main/java/org/apache/hadoop/security/authentication/server/AuthenticationHandler.java)]
>  indicate for the authenticate method that
> {quote}This method is invoked by the AuthenticationFilter only if the HTTP 
> client request is not yet authenticated.
> {quote}
> This does not appear to be the case in practice.
> I've created a patch and tested it on a limited number of functional use 
> cases (e.g. the timeline-service issue noted above).  If there is general 
> agreement that the change is valid I'll add unit tests to the patch.
> 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13434) Add quoting to Shell class

2016-07-29 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HADOOP-13434:
---
Attachment: HADOOP-13434.patch

I made some tweaks to make checkstyle happy.

> Add quoting to Shell class
> --
>
> Key: HADOOP-13434
> URL: https://issues.apache.org/jira/browse/HADOOP-13434
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: HADOOP-13434.patch, HADOOP-13434.patch, 
> HADOOP-13434.patch
>
>
> The Shell class makes assumptions that the parameters won't have spaces or 
> other special characters, even when it invokes bash.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13434) Add quoting to Shell class

2016-07-29 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HADOOP-13434:
---
Status: Patch Available  (was: Open)

> Add quoting to Shell class
> --
>
> Key: HADOOP-13434
> URL: https://issues.apache.org/jira/browse/HADOOP-13434
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: HADOOP-13434.patch, HADOOP-13434.patch
>
>
> The Shell class makes assumptions that the parameters won't have spaces or 
> other special characters, even when it invokes bash.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13434) Add quoting to Shell class

2016-07-28 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HADOOP-13434:
---
Attachment: HADOOP-13434.patch

This iteration removes the use of bash from the cases where it wasn't necessary.

> Add quoting to Shell class
> --
>
> Key: HADOOP-13434
> URL: https://issues.apache.org/jira/browse/HADOOP-13434
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: HADOOP-13434.patch, HADOOP-13434.patch
>
>
> The Shell class makes assumptions that the parameters won't have spaces or 
> other special characters, even when it invokes bash.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Issue Comment Deleted] (HADOOP-13434) Add quoting to Shell class

2016-07-28 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HADOOP-13434:
---
Comment: was deleted

(was: GitHub user omalley opened a pull request:

https://github.com/apache/hadoop/pull/118

HADOOP-13434. Add bash quoting to Shell class.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/omalley/hadoop hadoop-13434

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hadoop/pull/118.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #118


commit e0d296a301c1e52a378e674cf7045f4ef3b8c62e
Author: Vinod Kumar Vavilapalli 
Date:   2014-03-08T04:50:31Z

YARN-1787. Fixed help messages for applicationattempt and container 
sub-commands in bin/yarn. Contributed by Zhijie Shen.
svn merge --ignore-ancestry -c 1575482 ../../trunk/


git-svn-id: 
https://svn.apache.org/repos/asf/hadoop/common/branches/branch-2.4@1575484 
13f79535-47bb-0310-9956-ffa450edef68

commit 059ba663e2890953fdd5a208d7d7969878ef7887
Author: Arpit Agarwal 
Date:   2014-03-08T16:41:26Z

HDFS-6078. Merging r1575561 from branch-2 to branch-2.4.

git-svn-id: 
https://svn.apache.org/repos/asf/hadoop/common/branches/branch-2.4@1575562 
13f79535-47bb-0310-9956-ffa450edef68

commit 390ac348271cbb756f3de0ed5ee2951fcc7f34b7
Author: Colin McCabe 
Date:   2014-03-10T02:48:04Z

HDFS-6071. BlockReaderLocal does not return -1 on EOF when doing a 
zero-length read on a short file. (cmccabe)

git-svn-id: 
https://svn.apache.org/repos/asf/hadoop/common/branches/branch-2.4@1575798 
13f79535-47bb-0310-9956-ffa450edef68

commit f93cc4d3d4edb74a728ba5254274e61c57ae66b9
Author: Jian He 
Date:   2014-03-10T18:05:29Z

Merge r1576026 from branch-2. Fixed ClientRMService#forceKillApplication 
not killing unmanaged application. Contributed by Karthik Kambatla

git-svn-id: 
https://svn.apache.org/repos/asf/hadoop/common/branches/branch-2.4@1576028 
13f79535-47bb-0310-9956-ffa450edef68

commit dc2f782d4fd85c9e416bed1e0098992b3c3a8db5
Author: Andrew Wang 
Date:   2014-03-10T19:05:44Z

HDFS-6070. Cleanup use of ReadStatistics in DFSInputStream.

git-svn-id: 
https://svn.apache.org/repos/asf/hadoop/common/branches/branch-2.4@1576051 
13f79535-47bb-0310-9956-ffa450edef68

commit 5fca55c41ce7ed15c5e542fbbd359f5ac1f2514a
Author: Vinod Kumar Vavilapalli 
Date:   2014-03-10T20:37:37Z

YARN-1788. Fixed a bug in ResourceManager to set the apps-completed and 
apps-killed metrics correctly for killed applications. Contributed by Varun 
Vasudev.
svn merge --ignore-ancestry -c 1576072 ../../trunk/


git-svn-id: 
https://svn.apache.org/repos/asf/hadoop/common/branches/branch-2.4@1576074 
13f79535-47bb-0310-9956-ffa450edef68

commit d4936ec536920d802ca966869b54f3916fb698e9
Author: Jing Zhao 
Date:   2014-03-10T20:52:22Z

HDFS-6077. Merge change r1576077 from branch-2.

git-svn-id: 
https://svn.apache.org/repos/asf/hadoop/common/branches/branch-2.4@1576080 
13f79535-47bb-0310-9956-ffa450edef68

commit 883c146b4457def2389705409a20bc228fb59357
Author: Chris Nauroth 
Date:   2014-03-10T21:55:07Z

HDFS-6055. Merging change r1576098 from branch-2 to branch-2.4

git-svn-id: 
https://svn.apache.org/repos/asf/hadoop/common/branches/branch-2.4@1576100 
13f79535-47bb-0310-9956-ffa450edef68

commit ffc5328054b9262faefcb36e25eabf991d6e49a8
Author: Chris Nauroth 
Date:   2014-03-10T23:38:50Z

HADOOP-10399. Merging change r1576126 from branch-2 to branch-2.4

git-svn-id: 
https://svn.apache.org/repos/asf/hadoop/common/branches/branch-2.4@1576129 
13f79535-47bb-0310-9956-ffa450edef68

commit 1a5e3d36aa1a2a86a1ef8c76d5720e6349487a65
Author: Tsz-wo Sze 
Date:   2014-03-10T23:40:21Z

svn merge -c 1576128 from branch-2 for HDFS-5535.


git-svn-id: 
https://svn.apache.org/repos/asf/hadoop/common/branches/branch-2.4@1576130 
13f79535-47bb-0310-9956-ffa450edef68

commit 810188777942f2a3e5e6c3d1ab2ac89912d4b95b
Author: Arpit Agarwal 
Date:   2014-03-11T00:00:22Z

HADOOP-10395. Merging r1576142 from branch-2 to branch-2.4.

git-svn-id: 
https://svn.apache.org/repos/asf/hadoop/common/branches/branch-2.4@1576144 
13f79535-47bb-0310-9956-ffa450edef68

commit 2ad1602b66753bb3cfa5274457a9b21a44374336
Author: Arpit Agarwal 
Date:   2014-03-11T00:04:09Z

HADOOP-10394. Merging r1576146 from branch-2 to branch-2.4.

git-svn-id: 

[jira] [Issue Comment Deleted] (HADOOP-13434) Add quoting to Shell class

2016-07-28 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HADOOP-13434:
---
Comment: was deleted

(was: Github user omalley closed the pull request at:

https://github.com/apache/hadoop/pull/118
)

> Add quoting to Shell class
> --
>
> Key: HADOOP-13434
> URL: https://issues.apache.org/jira/browse/HADOOP-13434
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: HADOOP-13434.patch
>
>
> The Shell class assumes that the parameters won't have spaces or
> other special characters, even when it invokes bash.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13434) Add quoting to Shell class

2016-07-28 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HADOOP-13434:
---
Attachment: HADOOP-13434.patch

This patch adds quoting to the Shell class on each command that invokes bash 
with string parameters.
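
For reference, the quoting rule such a patch has to implement is the standard
bash one: wrap each argument in single quotes and turn every embedded single
quote into the sequence '\''. A minimal sketch in shell (the function name is
made up; the actual patch does this in Java inside the Shell class):

{noformat}
# Quote one argument so bash treats it as a single literal word.
bash_quote() {
  local q="'\\''"                 # replacement text: '\''
  printf "'%s'" "${1//\'/$q}"
}

bash_quote "it's risky"           # prints 'it'\''s risky'
{noformat}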

> Add quoting to Shell class
> --
>
> Key: HADOOP-13434
> URL: https://issues.apache.org/jira/browse/HADOOP-13434
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: HADOOP-13434.patch
>
>
> The Shell class assumes that the parameters won't have spaces or
> other special characters, even when it invokes bash.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-13434) Add quoting to Shell class

2016-07-27 Thread Owen O'Malley (JIRA)
Owen O'Malley created HADOOP-13434:
--

 Summary: Add quoting to Shell class
 Key: HADOOP-13434
 URL: https://issues.apache.org/jira/browse/HADOOP-13434
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Owen O'Malley
Assignee: Owen O'Malley


The Shell class assumes that the parameters won't have spaces or
other special characters, even when it invokes bash.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-12743) Fix git environment check during test-patch

2016-01-26 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15118278#comment-15118278
 ] 

Owen O'Malley commented on HADOOP-12743:


+1, looks good to me

> Fix git environment check during test-patch
> ---
>
> Key: HADOOP-12743
> URL: https://issues.apache.org/jira/browse/HADOOP-12743
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Ray Chiang
>Assignee: Allen Wittenauer
> Attachments: HADOOP-12743.00.patch
>
>
> I'm seeing this error when running test-patch in Apache Hadoop:
> {quote}
> 
> 
> Confirming git environment
> 
> 
> ERROR: /Users/rchiang/Dev_01/ah_01/patchprocess is not a git repo.
> 
> 
> {quote}
> From a follow-up email:
>  it's a trivial bug in the yetus-wrapper: a missing popd after extraction.
> The simple fix is to run the command twice (since the short circuit kicks in
> after Yetus is cached).
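
The shape of the bug, as a hedged sketch (the directory and tarball variable
names here are made up, not the actual yetus-wrapper code):

{noformat}
# the wrapper changes into its cache directory to unpack Yetus ...
pushd "${YETUS_CACHE_DIR}" >/dev/null
tar xzpf "${YETUS_TARBALL}"
popd >/dev/null   # ... and this popd was missing, so later git checks
                  # ran from the cache directory instead of the repo
{noformat}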



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-12494) fetchdt stores the token based on token kind instead of token service

2015-10-27 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14977027#comment-14977027
 ] 

Owen O'Malley commented on HADOOP-12494:


+1

> fetchdt stores the token based on token kind instead of token service
> -
>
> Key: HADOOP-12494
> URL: https://issues.apache.org/jira/browse/HADOOP-12494
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: HeeSoo Kim
>Assignee: HeeSoo Kim
> Fix For: 3.0.0
>
> Attachments: HADOOP-12494, HADOOP-12494.patch
>
>
> The fetchdt command stores the token in a file. However, the key of token is 
> a token kind instead of a token service.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-3315) New binary file format

2015-09-15 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-3315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14746809#comment-14746809
 ] 

Owen O'Malley commented on HADOOP-3315:
---

TFile never got that much traction. You really should use ORC or another
high-performance, self-describing columnar format. See http://orc.apache.org/

> New binary file format
> --
>
> Key: HADOOP-3315
> URL: https://issues.apache.org/jira/browse/HADOOP-3315
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: io
>Reporter: Owen O'Malley
>Assignee: Hong Tang
> Fix For: 0.20.1
>
> Attachments: HADOOP-3315_20080908_TFILE_PREVIEW_WITH_LZO_TESTS.patch, 
> HADOOP-3315_20080915_TFILE.patch, TFile Specification 20081217.pdf, 
> hadoop-3315-0507.patch, hadoop-3315-0509-2.patch, hadoop-3315-0509.patch, 
> hadoop-3315-0513.patch, hadoop-3315-0514.patch, hadoop-3315-0601.patch, 
> hadoop-3315-0602.patch, hadoop-3315-0605.patch, hadoop-3315-0612.patch, 
> hadoop-3315-0623-2.patch, hadoop-3315-0701-yhadoop-20.patch, 
> hadoop-3315-0710-1-hadoop-20.patch, hadoop-trunk-tfile.patch, 
> hadoop-trunk-tfile.patch
>
>
> SequenceFile's block compression format is too complex and requires 4 codecs 
> to compress or decompress. It would be good to have a file format that only 
> needs 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-12358) FSShell should prompt before deleting directories bigger than a configured size

2015-08-28 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720288#comment-14720288
 ] 

Owen O'Malley commented on HADOOP-12358:


I agree with Allen. This is a bad feature that will break lots of users.

The trash feature already does this better, and because it has been used for
many years, it is the expected behavior.

 FSShell should prompt before deleting directories bigger than a configured 
 size
 ---

 Key: HADOOP-12358
 URL: https://issues.apache.org/jira/browse/HADOOP-12358
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Reporter: Xiaoyu Yao
Assignee: Xiaoyu Yao
 Attachments: HADOOP-12358.00.patch, HADOOP-12358.01.patch, 
 HADOOP-12358.02.patch, HADOOP-12358.03.patch


 We have seen many cases of customers deleting data inadvertently with
 -skipTrash. The FSShell should prompt the user if the size of the data or the
 number of files being deleted exceeds a threshold, even though
 -skipTrash is being used.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-12249) pull argument parsing into a function

2015-07-31 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649821#comment-14649821
 ] 

Owen O'Malley commented on HADOOP-12249:


This looks good to me, Allen. +1

 pull argument parsing into a function
 -

 Key: HADOOP-12249
 URL: https://issues.apache.org/jira/browse/HADOOP-12249
 Project: Hadoop Common
  Issue Type: Improvement
  Components: scripts
Affects Versions: 3.0.0
Reporter: Allen Wittenauer
Assignee: Allen Wittenauer
  Labels: scripts, shell
 Attachments: HADOOP-12249.00.patch, HADOOP-12294.01.patch


 In order to enable significantly better unit testing as well as enhanced 
 functionality, large portions of *-config.sh should be pulled into functions. 
  See first comment for more.
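
Roughly, the refactoring looks like this (a sketch, not the actual
HADOOP-12249 patch; the function name is an assumption, and the option names
follow the existing scripts):

{noformat}
# Option parsing moves from inline top-level code in *-config.sh into a
# function that a test harness can invoke with arbitrary arguments.
hadoop_parse_args() {
  while [[ $# -gt 0 ]]; do
    case "$1" in
      --config)
        HADOOP_CONF_DIR="$2"
        shift 2
        ;;
      --debug)
        HADOOP_SHELL_SCRIPT_DEBUG=true
        shift
        ;;
      *)
        break   # everything else belongs to the subcommand
        ;;
    esac
  done
}
{noformat}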



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-10854) unit tests for the shell scripts

2015-07-31 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649827#comment-14649827
 ] 

Owen O'Malley commented on HADOOP-10854:


+1 looks good, Allen.

 unit tests for the shell scripts
 

 Key: HADOOP-10854
 URL: https://issues.apache.org/jira/browse/HADOOP-10854
 Project: Hadoop Common
  Issue Type: Test
  Components: scripts
Affects Versions: 3.0.0
Reporter: Allen Wittenauer
Assignee: Allen Wittenauer
  Labels: scripts
 Attachments: HADOOP-10854.00.patch, HADOOP-10854.01.patch, 
 HADOOP-10854.02.patch, HADOOP-10854.03.patch, HADOOP-10854.04.patch, 
 HADOOP-10854.05.patch


 With HADOOP-9902 moving a lot of the core functionality to functions, we 
 should build some unit tests for them.
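
A unit test for a shell function can be as small as this bats-style sketch
(hypothetical; it assumes a hadoop_parse_args function has been sourced into
the test environment):

{noformat}
#!/usr/bin/env bats

@test "hadoop_parse_args consumes --config" {
  hadoop_parse_args --config /tmp/conf
  [ "${HADOOP_CONF_DIR}" = "/tmp/conf" ]
}
{noformat}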



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (HADOOP-11902) Prune old javadoc versions from the website.

2015-05-05 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley closed HADOOP-11902.
--

 Prune old javadoc versions from the website.
 

 Key: HADOOP-11902
 URL: https://issues.apache.org/jira/browse/HADOOP-11902
 Project: Hadoop Common
  Issue Type: Improvement
  Components: site
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: redirect.patch, removed-files.txt


 We have a lot of old versions of javadoc on the website. We should prune the 
 old versions and redirect the old urls to the current versions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11902) Prune old javadoc versions from the website.

2015-05-05 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HADOOP-11902:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

I committed this. Thanks, Allen.

 Prune old javadoc versions from the website.
 

 Key: HADOOP-11902
 URL: https://issues.apache.org/jira/browse/HADOOP-11902
 Project: Hadoop Common
  Issue Type: Improvement
  Components: site
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: redirect.patch, removed-files.txt


 We have a lot of old versions of javadoc on the website. We should prune the 
 old versions and redirect the old urls to the current versions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11917) test-patch.sh should work with ${BASEDIR}/patchprocess setups

2015-05-05 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14528949#comment-14528949
 ] 

Owen O'Malley commented on HADOOP-11917:


+1

 test-patch.sh should work with ${BASEDIR}/patchprocess setups
 -

 Key: HADOOP-11917
 URL: https://issues.apache.org/jira/browse/HADOOP-11917
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Allen Wittenauer
Assignee: Allen Wittenauer
Priority: Blocker
 Attachments: HADOOP-11917.01.patch, HADOOP-11917.patch


 There are a bunch of problems with this kind of setup: configuration and code
 changes in test-patch.sh are required.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11904) test-patch.sh goes into an infinite loop on non-maven builds

2015-05-05 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14528919#comment-14528919
 ] 

Owen O'Malley commented on HADOOP-11904:


+1 looks good, Allen

 test-patch.sh goes into an infinite loop on non-maven builds
 

 Key: HADOOP-11904
 URL: https://issues.apache.org/jira/browse/HADOOP-11904
 Project: Hadoop Common
  Issue Type: Test
  Components: test
Reporter: Allen Wittenauer
Assignee: Allen Wittenauer
Priority: Critical
 Attachments: HADOOP-11904.patch


 If the post-HADOOP-11746 test-patch is given a non-Maven-based build, it goes
 into an infinite loop looking for the module's pom.xml. There should be an
 escape clause after switching branches to check whether the build is
 Maven-based. If it is not, then test-patch should either abort or re-exec
 using that version's test-patch script.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11896) Redesign the releases page on the Hadoop site

2015-05-04 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HADOOP-11896:
---
Attachment: Apache Hadoop Releases.pdf

Here is my proposed enhancement to the releases page for those of you who
don't want to look through the patch. :)

 Redesign the releases page on the Hadoop site
 -

 Key: HADOOP-11896
 URL: https://issues.apache.org/jira/browse/HADOOP-11896
 Project: Hadoop Common
  Issue Type: Improvement
  Components: site
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: Apache Hadoop Releases.pdf


 Redesign the Hadoop site to:
 * Move the recent releases to the top of the page by reducing the huge table 
 of contents.
 * Provide a direct link (via the mirror page) to each release tarball for the 
 last few releases.
 * Provide the sha256 for each of the recent release tarballs.
 * Provide a direct link to the GPG signature for the recent release tarballs.
 * Provide directions on how to verify the GPG signature.
 * Provide a link to the signatures in 
 https://people.apache.org/keys/group/hadoop.asc



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11896) Redesign the releases page on the Hadoop site

2015-05-04 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HADOOP-11896:
---
Attachment: HADOOP-11896.patch

This patch:
* turns back on verification, which was turned off because of a bug in Forrest 
0.8
* updates the release page with quick links to the 3 most recent releases
* adds directions to test the checksums of the release artifacts
* adds a link to the historic hadoop releases
* cuts down the number of TOC entries so that the real information isn't hidden 
below a huge TOC.

 Redesign the releases page on the Hadoop site
 -

 Key: HADOOP-11896
 URL: https://issues.apache.org/jira/browse/HADOOP-11896
 Project: Hadoop Common
  Issue Type: Improvement
  Components: site
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: Apache Hadoop Releases.pdf, HADOOP-11896.patch


 Redesign the Hadoop site to:
 * Move the recent releases to the top of the page by reducing the huge table 
 of contents.
 * Provide a direct link (via the mirror page) to each release tarball for the 
 last few releases.
 * Provide the sha256 for each of the recent release tarballs.
 * Provide a direct link to the GPG signature for the recent release tarballs.
 * Provide directions on how to verify the GPG signature.
 * Provide a link to the signatures in 
 https://people.apache.org/keys/group/hadoop.asc



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11896) Redesign the releases page on the Hadoop site

2015-05-04 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527017#comment-14527017
 ] 

Owen O'Malley commented on HADOOP-11896:


[~cnauroth] shasum is easily available on all Linux boxes and is also available
by default on MacOS. I guess I'd rather have the more portable solution.

 Redesign the releases page on the Hadoop site
 -

 Key: HADOOP-11896
 URL: https://issues.apache.org/jira/browse/HADOOP-11896
 Project: Hadoop Common
  Issue Type: Improvement
  Components: site
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: Apache Hadoop Releases.pdf, HADOOP-11896.patch


 Redesign the Hadoop site to:
 * Move the recent releases to the top of the page by reducing the huge table 
 of contents.
 * Provide a direct link (via the mirror page) to each release tarball for the 
 last few releases.
 * Provide the sha256 for each of the recent release tarballs.
 * Provide a direct link to the GPG signature for the recent release tarballs.
 * Provide directions on how to verify the GPG signature.
 * Provide a link to the signatures in 
 https://people.apache.org/keys/group/hadoop.asc



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11896) Redesign the releases page on the Hadoop site

2015-05-04 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HADOOP-11896:
---
Attachment: HADOOP-11896.patch
Apache Hadoop Releases.pdf

Added links to the release notes from the current releases as requested by 
[~vinodkv].

 Redesign the releases page on the Hadoop site
 -

 Key: HADOOP-11896
 URL: https://issues.apache.org/jira/browse/HADOOP-11896
 Project: Hadoop Common
  Issue Type: Improvement
  Components: site
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: Apache Hadoop Releases.pdf, Apache Hadoop Releases.pdf, 
 HADOOP-11896.patch, HADOOP-11896.patch


 Redesign the Hadoop site to:
 * Move the recent releases to the top of the page by reducing the huge table 
 of contents.
 * Provide a direct link (via the mirror page) to each release tarball for the 
 last few releases.
 * Provide the sha256 for each of the recent release tarballs.
 * Provide a direct link to the GPG signature for the recent release tarballs.
 * Provide directions on how to verify the GPG signature.
 * Provide a link to the signatures in 
 https://people.apache.org/keys/group/hadoop.asc



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-11919) Add release 2.4.1 to the table at the top of the releases page

2015-05-04 Thread Owen O'Malley (JIRA)
Owen O'Malley created HADOOP-11919:
--

 Summary: Add release 2.4.1 to the table at the top of the releases 
page
 Key: HADOOP-11919
 URL: https://issues.apache.org/jira/browse/HADOOP-11919
 Project: Hadoop Common
  Issue Type: Bug
  Components: site
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Fix For: site


Add the 2.4.1 release to the table at the top of the site page.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11902) Prune old javadoc versions from the website.

2015-05-04 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527836#comment-14527836
 ] 

Owen O'Malley commented on HADOOP-11902:


[~aw] I've created HADOOP-11919 to add Hadoop 2.4.1 to the table at the top of 
the releases page.

 Prune old javadoc versions from the website.
 

 Key: HADOOP-11902
 URL: https://issues.apache.org/jira/browse/HADOOP-11902
 Project: Hadoop Common
  Issue Type: Improvement
  Components: site
Reporter: Owen O'Malley
Assignee: Owen O'Malley

 We have a lot of old versions of javadoc on the website. We should prune the 
 old versions and redirect the old urls to the current versions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11902) Prune old javadoc versions from the website.

2015-05-04 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HADOOP-11902:
---
Status: Patch Available  (was: Open)

 Prune old javadoc versions from the website.
 

 Key: HADOOP-11902
 URL: https://issues.apache.org/jira/browse/HADOOP-11902
 Project: Hadoop Common
  Issue Type: Improvement
  Components: site
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: redirect.patch, removed-files.txt


 We have a lot of old versions of javadoc on the website. We should prune the 
 old versions and redirect the old urls to the current versions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11902) Prune old javadoc versions from the website.

2015-05-04 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HADOOP-11902:
---
Attachment: redirect.patch
removed-files.txt

Here is the list of 95371 files that I'm proposing to delete. I've also
included a redirect patch showing the mapping of the URLs.
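
The mapping amounts to httpd rules of this shape (illustrative only; see
redirect.patch for the actual list and targets):

{noformat}
# send requests for a pruned javadoc tree to the nearest kept release
RedirectMatch 301 ^/docs/r2\.4\.[01]/(.*)$ /docs/r2.5.2/$1
{noformat}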

 Prune old javadoc versions from the website.
 

 Key: HADOOP-11902
 URL: https://issues.apache.org/jira/browse/HADOOP-11902
 Project: Hadoop Common
  Issue Type: Improvement
  Components: site
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: redirect.patch, removed-files.txt


 We have a lot of old versions of javadoc on the website. We should prune the 
 old versions and redirect the old urls to the current versions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (HADOOP-11896) Redesign the releases page on the Hadoop site

2015-05-04 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley closed HADOOP-11896.
--

 Redesign the releases page on the Hadoop site
 -

 Key: HADOOP-11896
 URL: https://issues.apache.org/jira/browse/HADOOP-11896
 Project: Hadoop Common
  Issue Type: Improvement
  Components: site
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Fix For: site

 Attachments: Apache Hadoop Releases.pdf, Apache Hadoop Releases.pdf, 
 HADOOP-11896.patch, HADOOP-11896.patch


 Redesign the Hadoop site to:
 * Move the recent releases to the top of the page by reducing the huge table 
 of contents.
 * Provide a direct link (via the mirror page) to each release tarball for the 
 last few releases.
 * Provide the sha256 for each of the recent release tarballs.
 * Provide a direct link to the GPG signature for the recent release tarballs.
 * Provide directions on how to verify the GPG signature.
 * Provide a link to the signatures in 
 https://people.apache.org/keys/group/hadoop.asc



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HADOOP-11896) Redesign the releases page on the Hadoop site

2015-05-04 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley resolved HADOOP-11896.

   Resolution: Fixed
Fix Version/s: site

I committed this with 1677710.

 Redesign the releases page on the Hadoop site
 -

 Key: HADOOP-11896
 URL: https://issues.apache.org/jira/browse/HADOOP-11896
 Project: Hadoop Common
  Issue Type: Improvement
  Components: site
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Fix For: site

 Attachments: Apache Hadoop Releases.pdf, Apache Hadoop Releases.pdf, 
 HADOOP-11896.patch, HADOOP-11896.patch


 Redesign the Hadoop site to:
 * Move the recent releases to the top of the page by reducing the huge table 
 of contents.
 * Provide a direct link (via the mirror page) to each release tarball for the 
 last few releases.
 * Provide the sha256 for each of the recent release tarballs.
 * Provide a direct link to the GPG signature for the recent release tarballs.
 * Provide directions on how to verify the GPG signature.
 * Provide a link to the signatures in 
 https://people.apache.org/keys/group/hadoop.asc



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-11902) Prune old javadoc versions from the website.

2015-05-01 Thread Owen O'Malley (JIRA)
Owen O'Malley created HADOOP-11902:
--

 Summary: Prune old javadoc versions from the website.
 Key: HADOOP-11902
 URL: https://issues.apache.org/jira/browse/HADOOP-11902
 Project: Hadoop Common
  Issue Type: Improvement
  Components: site
Reporter: Owen O'Malley
Assignee: Owen O'Malley


We have a lot of old versions of javadoc on the website. We should prune the 
old versions and redirect the old urls to the current versions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11902) Prune old javadoc versions from the website.

2015-05-01 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524248#comment-14524248
 ] 

Owen O'Malley commented on HADOOP-11902:


Let's keep:
* 2.7.0
* 2.6.0
* 2.5.2
* 0.23.11
* 1.2.1

Let's remove:
* r0.23.6
* r0.23.7
* r0.23.8
* r0.23.9
* r0.23.10
* r1.0.4
* r1.1.1
* r1.1.2
* r1.2.0
* r2.0.2-alpha
* r2.0.3-alpha
* r2.0.4-alpha
* r2.0.5-alpha
* r2.0.6-alpha
* r2.1.0-beta
* r2.2.0
* r2.3.0
* r2.4.0
* r2.4.1
* r2.5.0
* r2.5.1


 Prune old javadoc versions from the website.
 

 Key: HADOOP-11902
 URL: https://issues.apache.org/jira/browse/HADOOP-11902
 Project: Hadoop Common
  Issue Type: Improvement
  Components: site
Reporter: Owen O'Malley
Assignee: Owen O'Malley

 We have a lot of old versions of javadoc on the website. We should prune the 
 old versions and redirect the old urls to the current versions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11896) Redesign the releases page on the Hadoop site

2015-05-01 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524311#comment-14524311
 ] 

Owen O'Malley commented on HADOOP-11896:


[~andrew.wang] The https://people.apache.org/keys/group/hadoop.asc URL is 
automatically maintained while the KEYS file is often sadly out of date. The 
only case where the KEYS file is better is if a release manager resigns from 
the Hadoop project and is no longer included in the group/hadoop.asc.

That said, either will work and I don't care that much.

 Redesign the releases page on the Hadoop site
 -

 Key: HADOOP-11896
 URL: https://issues.apache.org/jira/browse/HADOOP-11896
 Project: Hadoop Common
  Issue Type: Improvement
  Components: site
Reporter: Owen O'Malley
Assignee: Owen O'Malley

 Redesign the Hadoop site to:
 * Move the recent releases to the top of the page by reducing the huge table 
 of contents.
 * Provide a direct link (via the mirror page) to each release tarball for the 
 last few releases.
 * Provide the sha256 for each of the recent release tarballs.
 * Provide a direct link to the GPG signature for the recent release tarballs.
 * Provide directions on how to verify the GPG signature.
 * Provide a link to the signatures in 
 https://people.apache.org/keys/group/hadoop.asc



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11896) Redesign the releases page on the Hadoop site

2015-05-01 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14523421#comment-14523421
 ] 

Owen O'Malley commented on HADOOP-11896:


The more specific directions that I'm proposing look like:

{noformat}
  % wget https://people.apache.org/keys/group/hadoop.asc
  % gpg --import hadoop.asc
  % gpg --verify hadoop-X.Y.Z-src.tar.gz.asc
{noformat}
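
The matching checksum step would look something like this (a sketch; the
artifact name varies by release, and the expected value comes from the
releases page):

{noformat}
  % shasum -a 256 hadoop-X.Y.Z-src.tar.gz
{noformat}

Compare the printed digest against the published sha256 for the tarball.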

 Redesign the releases page on the Hadoop site
 -

 Key: HADOOP-11896
 URL: https://issues.apache.org/jira/browse/HADOOP-11896
 Project: Hadoop Common
  Issue Type: Improvement
  Components: site
Reporter: Owen O'Malley
Assignee: Owen O'Malley

 Redesign the Hadoop site to:
 * Move the recent releases to the top of the page by reducing the huge table 
 of contents.
 * Provide a direct link (via the mirror page) to each release tarball for the 
 last few releases.
 * Provide the sha256 for each of the recent release tarballs.
 * Provide a direct link to the GPG signature for the recent release tarballs.
 * Provide directions on how to verify the GPG signature.
 * Provide a link to the signatures in 
 https://people.apache.org/keys/group/hadoop.asc



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-11896) Redesign the releases page on the Hadoop site

2015-05-01 Thread Owen O'Malley (JIRA)
Owen O'Malley created HADOOP-11896:
--

 Summary: Redesign the releases page on the Hadoop site
 Key: HADOOP-11896
 URL: https://issues.apache.org/jira/browse/HADOOP-11896
 Project: Hadoop Common
  Issue Type: Improvement
  Components: site
Reporter: Owen O'Malley
Assignee: Owen O'Malley


Redesign the Hadoop site to:
* Move the recent releases to the top of the page by reducing the huge table of 
contents.
* Provide a direct link (via the mirror page) to each release tarball for the 
last few releases.
* Provide the sha256 for each of the recent release tarballs.
* Provide a direct link to the GPG signature for the recent release tarballs.
* Provide directions on how to verify the GPG signature.
* Provide a link to the signatures in 
https://people.apache.org/keys/group/hadoop.asc



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11746) rewrite test-patch.sh

2015-04-08 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14485894#comment-14485894
 ] 

Owen O'Malley commented on HADOOP-11746:


This looks like a nice improvement.

It would be great if we could name patch files like git.patch and have it
apply to the git hash. That would let you upload patches for branches
without worrying about conflicting changes.

 rewrite test-patch.sh
 -

 Key: HADOOP-11746
 URL: https://issues.apache.org/jira/browse/HADOOP-11746
 Project: Hadoop Common
  Issue Type: Test
  Components: build, test
Affects Versions: 3.0.0
Reporter: Allen Wittenauer
Assignee: Allen Wittenauer
 Attachments: HADOOP-11746-00.patch, HADOOP-11746-01.patch, 
 HADOOP-11746-02.patch, HADOOP-11746-03.patch, HADOOP-11746-04.patch, 
 HADOOP-11746-05.patch, HADOOP-11746-06.patch, HADOOP-11746-07.patch, 
 HADOOP-11746-09.patch, HADOOP-11746-10.patch, HADOOP-11746-11.patch


 This code is bad and you should feel bad.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11717) Add Redirecting WebSSO behavior with JWT Token in Hadoop Auth

2015-04-07 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483341#comment-14483341
 ] 

Owen O'Malley commented on HADOOP-11717:


I think this is good to go. If we want to generalize it further when we have
additional use cases to support, we can do that. This just provides a plugin
for web SSO that is useful to users who don't want to use SPNEGO for the web UI.

 Add Redirecting WebSSO behavior with JWT Token in Hadoop Auth
 -

 Key: HADOOP-11717
 URL: https://issues.apache.org/jira/browse/HADOOP-11717
 Project: Hadoop Common
  Issue Type: Improvement
  Components: security
Reporter: Larry McCay
Assignee: Larry McCay
 Fix For: 2.8.0

 Attachments: HADOOP-11717-1.patch, HADOOP-11717-2.patch, 
 HADOOP-11717-3.patch, HADOOP-11717-4.patch, HADOOP-11717-5.patch, 
 HADOOP-11717-6.patch, HADOOP-11717-7.patch, HADOOP-11717-8.patch, 
 RedirectingWebSSOwithJWTforHadoopWebUIs.pdf


 Extend AltKerberosAuthenticationHandler to provide WebSSO flow for UIs.
 The actual authentication is done by some external service that the handler 
 will redirect to when there is no hadoop.auth cookie and no JWT token found 
 in the incoming request.
 Using JWT provides a number of benefits:
 * It is not tied to any specific authentication mechanism - so buys us many 
 SSO integrations
 * It is cryptographically verifiable for determining whether it can be trusted
 * Checking for expiration allows for a limited lifetime and window for 
 compromised use
 This will introduce the use of nimbus-jose-jwt library for processing, 
 validating and parsing JWT tokens.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11717) Add Redirecting WebSSO behavior with JWT Token in Hadoop Auth

2015-04-07 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HADOOP-11717:
---
   Resolution: Fixed
Fix Version/s: 2.8.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

I just committed this. Thanks, Larry!

 Add Redirecting WebSSO behavior with JWT Token in Hadoop Auth
 -

 Key: HADOOP-11717
 URL: https://issues.apache.org/jira/browse/HADOOP-11717
 Project: Hadoop Common
  Issue Type: Improvement
  Components: security
Reporter: Larry McCay
Assignee: Larry McCay
 Fix For: 2.8.0

 Attachments: HADOOP-11717-1.patch, HADOOP-11717-2.patch, 
 HADOOP-11717-3.patch, HADOOP-11717-4.patch, HADOOP-11717-5.patch, 
 HADOOP-11717-6.patch, HADOOP-11717-7.patch, HADOOP-11717-8.patch, 
 RedirectingWebSSOwithJWTforHadoopWebUIs.pdf


 Extend AltKerberosAuthenticationHandler to provide WebSSO flow for UIs.
 The actual authentication is done by some external service that the handler 
 will redirect to when there is no hadoop.auth cookie and no JWT token found 
 in the incoming request.
 Using JWT provides a number of benefits:
 * It is not tied to any specific authentication mechanism - so buys us many 
 SSO integrations
 * It is cryptographically verifiable for determining whether it can be trusted
 * Checking for expiration allows for a limited lifetime and window for 
 compromised use
 This will introduce the use of nimbus-jose-jwt library for processing, 
 validating and parsing JWT tokens.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11717) Add Redirecting WebSSO behavior with JWT Token in Hadoop Auth

2015-04-02 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393661#comment-14393661
 ] 

Owen O'Malley commented on HADOOP-11717:


I think this looks good. +1

 Add Redirecting WebSSO behavior with JWT Token in Hadoop Auth
 -

 Key: HADOOP-11717
 URL: https://issues.apache.org/jira/browse/HADOOP-11717
 Project: Hadoop Common
  Issue Type: Improvement
  Components: security
Reporter: Larry McCay
Assignee: Larry McCay
 Attachments: HADOOP-11717-1.patch, HADOOP-11717-2.patch, 
 HADOOP-11717-3.patch, HADOOP-11717-4.patch, HADOOP-11717-5.patch, 
 HADOOP-11717-6.patch, HADOOP-11717-7.patch


 Extend AltKerberosAuthenticationHandler to provide WebSSO flow for UIs.
 The actual authentication is done by some external service that the handler 
 will redirect to when there is no hadoop.auth cookie and no JWT token found 
 in the incoming request.
 Using JWT provides a number of benefits:
 * It is not tied to any specific authentication mechanism - so buys us many 
 SSO integrations
 * It is cryptographically verifiable for determining whether it can be trusted
 * Checking for expiration allows for a limited lifetime and window for 
 compromised use
 This will introduce the use of nimbus-jose-jwt library for processing, 
 validating and parsing JWT tokens.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11620) Add Support for Load Balancing across a group of KMS servers for HA

2015-02-24 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335034#comment-14335034
 ] 

Owen O'Malley commented on HADOOP-11620:


I agree with Larry that the other pattern was better. It is a little strange 
using a compound like host1;host2 for the host part of the URI, but moving an 
override of the port number into the host part is too confusing for little 
gain. Please go back to the previous version.
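
The compound-host form under discussion looks roughly like this (illustrative;
the exact scheme and default port should be checked against the KMS docs):

{noformat}
# one logical provider, two KMS hosts sharing the authority part of the URI
hadoop.security.key.provider.path = kms://http@host1;host2:16000/kms
{noformat}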

 Add Support for Load Balancing across a group of KMS servers for HA
 ---

 Key: HADOOP-11620
 URL: https://issues.apache.org/jira/browse/HADOOP-11620
 Project: Hadoop Common
  Issue Type: Improvement
  Components: kms
Affects Versions: 2.6.0
Reporter: Arun Suresh
Assignee: Arun Suresh
 Attachments: HADOOP-11620.1.patch, HADOOP-11620.2.patch, 
 HADOOP-11620.3.patch


 This patch needs to add support for :
 * specification of multiple hostnames in the kms key provider uri
 * KMS client to load balance requests across the hosts specified in the kms 
 keyprovider uri.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-10908) ClusterNodeSetup, CommandsManual, FileSystemShell needs updating

2015-01-05 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14265220#comment-14265220
 ] 

Owen O'Malley commented on HADOOP-10908:


+1 

Thanks for updating the documentation, Allen!

 ClusterNodeSetup, CommandsManual, FileSystemShell needs updating
 

 Key: HADOOP-10908
 URL: https://issues.apache.org/jira/browse/HADOOP-10908
 Project: Hadoop Common
  Issue Type: Bug
  Components: documentation
Affects Versions: 3.0.0
Reporter: Allen Wittenauer
 Attachments: HADOOP-10908-01.patch, HADOOP-10908-02.patch, 
 HADOOP-10908-03.patch, HADOOP-10908.patch


 A lot of the instructions in the cluster node setup are not good practices 
 post-9902.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11055) non-daemon pid files are missing

2014-09-16 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14136172#comment-14136172
 ] 

Owen O'Malley commented on HADOOP-11055:


+1 looks good, Allen.

 non-daemon pid files are missing
 

 Key: HADOOP-11055
 URL: https://issues.apache.org/jira/browse/HADOOP-11055
 Project: Hadoop Common
  Issue Type: Bug
  Components: scripts
Reporter: Allen Wittenauer
Assignee: Allen Wittenauer
Priority: Blocker
 Attachments: HADOOP-11055.patch


 Somewhere along the way, daemons run in default mode lost pid files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-10970) Cleanup KMS configuration keys

2014-08-19 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102382#comment-14102382
 ] 

Owen O'Malley commented on HADOOP-10970:


The comment on the hadoop.security.key.provider.path value says 'URI' instead 
of 'URI path'. Please fix it.

 Cleanup KMS configuration keys
 --

 Key: HADOOP-10970
 URL: https://issues.apache.org/jira/browse/HADOOP-10970
 Project: Hadoop Common
  Issue Type: Improvement
Affects Versions: 3.0.0
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hadoop-10970.001.patch, hadoop-10970.002.patch, 
 hadoop-10970.003.patch


 It'd be nice to add descriptions to the config keys in kms-site.xml.
 Also, it'd be good to rename key.provider.path to key.provider.uri for 
 clarity, or just drop .path.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10970) Cleanup KMS configuration keys

2014-08-15 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14098757#comment-14098757
 ] 

Owen O'Malley commented on HADOOP-10970:


-1 to removing the path ability.

What would the lookup be doing? What would an example look like?

 Cleanup KMS configuration keys
 --

 Key: HADOOP-10970
 URL: https://issues.apache.org/jira/browse/HADOOP-10970
 Project: Hadoop Common
  Issue Type: Improvement
Affects Versions: 3.0.0
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hadoop-10970.001.patch


 It'd be nice to add descriptions to the config keys in kms-site.xml.
 Also, it'd be good to rename key.provider.path to key.provider.uri for 
 clarity, or just drop .path.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10904) Provide Alt to Clear Text Passwords through Cred Provider API

2014-08-15 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14098768#comment-14098768
 ] 

Owen O'Malley commented on HADOOP-10904:


Daryn, this is a very different use case from svn+ssh, and this reads really
badly.

jks+hdfs://nn.example.com/foo/bar.jks 

Furthermore, it doesn't nest properly:

jks+har+hdfs://nn.example.com/foo/bar

is a complete mess. What are you trying to accomplish that this makes difficult?
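
For contrast, the style being defended here keeps the storage scheme in the
URI authority instead of stacking schemes (paths are illustrative):

{noformat}
jceks://hdfs@nn.example.com/foo/bar.jceks   # keystore kept in HDFS
jceks://file/home/user/bar.jceks            # keystore on the local filesystem
{noformat}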

 Provide Alt to Clear Text Passwords through Cred Provider API
 -

 Key: HADOOP-10904
 URL: https://issues.apache.org/jira/browse/HADOOP-10904
 Project: Hadoop Common
  Issue Type: Bug
  Components: security
Reporter: Larry McCay
Assignee: Larry McCay

 This is an umbrella jira to track various child tasks to uptake the 
 credential provider API to enable deployments without storing 
 passwords/credentials in clear text.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HADOOP-10912) Modify scripts to use relative paths

2014-07-30 Thread Owen O'Malley (JIRA)
Owen O'Malley created HADOOP-10912:
--

 Summary: Modify scripts to use relative paths
 Key: HADOOP-10912
 URL: https://issues.apache.org/jira/browse/HADOOP-10912
 Project: Hadoop Common
  Issue Type: Improvement
  Components: scripts
Reporter: Owen O'Malley


Make all of the scripts use relative paths for defaulting HADOOP_HOME and the
related paths. This is useful for any deployment that doesn't use
/usr/lib/hadoop as the prefix, including but not limited to tarball
installations and side-by-side installs.
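
A minimal sketch of the technique (standard shell; this is not the attached
hadoop-10912.path):

{noformat}
# resolve the directory holding this script, then default HADOOP_HOME to its
# parent -- works for a tarball unpacked anywhere, not just /usr/lib/hadoop
this="${BASH_SOURCE-$0}"
bin=$(cd -P -- "$(dirname -- "$this")" >/dev/null && pwd -P)
HADOOP_HOME="${HADOOP_HOME:-$(dirname "$bin")}"
{noformat}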



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-10912) Modify scripts to use relative paths

2014-07-30 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HADOOP-10912:
---

Attachment: hadoop-10912.path

This is probably subsumed by Allen's HADOOP-9902, but here is a rough patch to
show generally how this would work without Allen's complete rewrite.

 Modify scripts to use relative paths
 

 Key: HADOOP-10912
 URL: https://issues.apache.org/jira/browse/HADOOP-10912
 Project: Hadoop Common
  Issue Type: Improvement
  Components: scripts
Reporter: Owen O'Malley
 Attachments: hadoop-10912.path


 Make all of the scripts use relative paths for defaulting HADOOP_HOME and the
 related paths. This is useful for any deployment that doesn't use
 /usr/lib/hadoop as the prefix, including but not limited to tarball
 installations and side-by-side installs.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10791) AuthenticationFilter should support externalizing the secret for signing and provide rotation support

2014-07-29 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14077904#comment-14077904
 ] 

Owen O'Malley commented on HADOOP-10791:


Alejandro, 
It looks like it would make sense to use the KeyProvider for this. Having a 
KeyProvider implementation that reads from ZooKeeper would be pretty easy.

 AuthenticationFilter should support externalizing the secret for signing and 
 provide rotation support
 -

 Key: HADOOP-10791
 URL: https://issues.apache.org/jira/browse/HADOOP-10791
 Project: Hadoop Common
  Issue Type: Improvement
  Components: security
Affects Versions: 2.4.1
Reporter: Alejandro Abdelnur
Assignee: Robert Kanter
 Attachments: HADOOP-10791.patch, HADOOP-10791.patch


 It should be possible to externalize the secret used to sign the hadoop-auth 
 cookies.
 In the case of WebHDFS the shared secret used by NN and DNs could be used. In 
 the case of Oozie HA, the secret could be stored in Oozie HA control data in 
 ZooKeeper.
 In addition, it is desirable for the secret to change periodically, this 
 means that the AuthenticationService should remember a previous secret for 
 the max duration of hadoop-auth cookie.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10791) AuthenticationFilter should support externalizing the secret for signing and provide rotation support

2014-07-29 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14077905#comment-14077905
 ] 

Owen O'Malley commented on HADOOP-10791:


In particular, this is just a rolling random key that you want to preserve the 
last two values of. It doesn't make sense to require ZooKeeper if the user
doesn't already have it deployed.

 AuthenticationFilter should support externalizing the secret for signing and 
 provide rotation support
 -

 Key: HADOOP-10791
 URL: https://issues.apache.org/jira/browse/HADOOP-10791
 Project: Hadoop Common
  Issue Type: Improvement
  Components: security
Affects Versions: 2.4.1
Reporter: Alejandro Abdelnur
Assignee: Robert Kanter
 Attachments: HADOOP-10791.patch, HADOOP-10791.patch


 It should be possible to externalize the secret used to sign the hadoop-auth 
 cookies.
 In the case of WebHDFS the shared secret used by NN and DNs could be used. In 
 the case of Oozie HA, the secret could be stored in Oozie HA control data in 
 ZooKeeper.
 In addition, it is desirable for the secret to change periodically, this 
 means that the AuthenticationService should remember a previous secret for 
 the max duration of hadoop-auth cookie.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10791) AuthenticationFilter should support externalizing the secret for signing and provide rotation support

2014-07-29 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14078499#comment-14078499
 ] 

Owen O'Malley commented on HADOOP-10791:


Alejandro, there is a huge difference between requiring ZooKeeper for HA and
requiring ZooKeeper for SPNEGO. The unnecessary complexity is creating a plugin
interface for the one use case that is completely covered by the plugin
interface you already have.

 AuthenticationFilter should support externalizing the secret for signing and 
 provide rotation support
 -

 Key: HADOOP-10791
 URL: https://issues.apache.org/jira/browse/HADOOP-10791
 Project: Hadoop Common
  Issue Type: Improvement
  Components: security
Affects Versions: 2.4.1
Reporter: Alejandro Abdelnur
Assignee: Robert Kanter
 Attachments: HADOOP-10791.patch, HADOOP-10791.patch


 It should be possible to externalize the secret used to sign the hadoop-auth 
 cookies.
 In the case of WebHDFS the shared secret used by NN and DNs could be used. In 
 the case of Oozie HA, the secret could be stored in Oozie HA control data in 
 ZooKeeper.
 In addition, it is desirable for the secret to change periodically, this 
 means that the AuthenticationService should remember a previous secret for 
 the max duration of hadoop-auth cookie.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10607) Create an API to Separate Credentials/Password Storage from Applications

2014-07-21 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14069429#comment-14069429
 ] 

Owen O'Malley commented on HADOOP-10607:


Alejandro,
   I'm puzzled why you are puzzled. We've always added components and
functionality to Hadoop that are useful to upstream components. A mechanism for
managing passwords without storing them in plain text is a wonderful
addition. There are many places in the Hadoop ecosystem where passwords are
stored in config files, such as hadoop-auth and the Hive metastore. Giving them
a common structure for removing them is a good thing.
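
As a concrete example of that common structure, the CredShell CLI described in
this issue is used along these lines (a sketch; the alias and keystore path
are made up):

{noformat}
# store a password in a local Java keystore; the shell prompts for the
# secret instead of taking it in clear text on the command line
hadoop credential create ssl.server.keystore.password \
    -provider jceks://file/home/user/creds.jceks

# list the aliases held by that provider
hadoop credential list -provider jceks://file/home/user/creds.jceks
{noformat}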

 Create an API to Separate Credentials/Password Storage from Applications
 

 Key: HADOOP-10607
 URL: https://issues.apache.org/jira/browse/HADOOP-10607
 Project: Hadoop Common
  Issue Type: New Feature
  Components: security
Reporter: Larry McCay
Assignee: Larry McCay
 Fix For: 3.0.0, 2.6.0

 Attachments: 10607-10.patch, 10607-11.patch, 10607-12.patch, 
 10607-2.patch, 10607-3.patch, 10607-4.patch, 10607-5.patch, 10607-6.patch, 
 10607-7.patch, 10607-8.patch, 10607-9.patch, 10607-branch-2.patch, 10607.patch


 As with the filesystem API, we need to provide a generic mechanism to support 
 multiple credential storage mechanisms that are potentially from third 
 parties. 
 We need the ability to eliminate the storage of passwords and secrets in 
 clear text within configuration files or within code.
 Toward that end, I propose an API that is configured using a list of URLs of 
 CredentialProviders. The implementation will look for implementations using 
 the ServiceLoader interface and thus support third party libraries.
 Two providers will be included in this patch. One using the credentials cache 
 in MapReduce jobs and the other using Java KeyStores from either HDFS or 
 local file system. 
 A CredShell CLI will also be included in this patch which provides the 
 ability to manage the credentials within the stores.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-10607) Create an API to Separate Credentials/Password Storage from Applications

2014-07-17 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HADOOP-10607:
---

Fix Version/s: 2.5.0

 Create an API to Separate Credentials/Password Storage from Applications
 

 Key: HADOOP-10607
 URL: https://issues.apache.org/jira/browse/HADOOP-10607
 Project: Hadoop Common
  Issue Type: New Feature
  Components: security
Reporter: Larry McCay
Assignee: Larry McCay
 Fix For: 3.0.0, 2.5.0

 Attachments: 10607-10.patch, 10607-11.patch, 10607-12.patch, 
 10607-2.patch, 10607-3.patch, 10607-4.patch, 10607-5.patch, 10607-6.patch, 
 10607-7.patch, 10607-8.patch, 10607-9.patch, 10607-branch-2.patch, 10607.patch


 As with the filesystem API, we need to provide a generic mechanism to support 
 multiple credential storage mechanisms that are potentially from third 
 parties. 
 We need the ability to eliminate the storage of passwords and secrets in 
 clear text within configuration files or within code.
 Toward that end, I propose an API that is configured using a list of URLs of 
 CredentialProviders. The implementation will look for implementations using 
 the ServiceLoader interface and thus support third party libraries.
 Two providers will be included in this patch. One using the credentials cache 
 in MapReduce jobs and the other using Java KeyStores from either HDFS or 
 local file system. 
 A CredShell CLI will also be included in this patch which provides the 
 ability to manage the credentials within the stores.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-10607) Create an API to Separate Credentials/Password Storage from Applications

2014-07-17 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HADOOP-10607:
---

Fix Version/s: (was: 2.5.0)
   2.6.0

 Create an API to Separate Credentials/Password Storage from Applications
 

 Key: HADOOP-10607
 URL: https://issues.apache.org/jira/browse/HADOOP-10607
 Project: Hadoop Common
  Issue Type: New Feature
  Components: security
Reporter: Larry McCay
Assignee: Larry McCay
 Fix For: 3.0.0, 2.6.0

 Attachments: 10607-10.patch, 10607-11.patch, 10607-12.patch, 
 10607-2.patch, 10607-3.patch, 10607-4.patch, 10607-5.patch, 10607-6.patch, 
 10607-7.patch, 10607-8.patch, 10607-9.patch, 10607-branch-2.patch, 10607.patch


 As with the filesystem API, we need to provide a generic mechanism to support 
 multiple credential storage mechanisms that are potentially from third 
 parties. 
 We need the ability to eliminate the storage of passwords and secrets in 
 clear text within configuration files or within code.
 Toward that end, I propose an API that is configured using a list of URLs of 
 CredentialProviders. The implementation will look for implementations using 
 the ServiceLoader interface and thus support third party libraries.
 Two providers will be included in this patch. One using the credentials cache 
 in MapReduce jobs and the other using Java KeyStores from either HDFS or 
 local file system. 
 A CredShell CLI will also be included in this patch which provides the 
 ability to manage the credentials within the stores.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-10732) Update without holding write lock in JavaKeyStoreProvider#innerSetCredential()

2014-07-17 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HADOOP-10732:
---

   Resolution: Fixed
Fix Version/s: 2.6.0
   3.0.0
   Status: Resolved  (was: Patch Available)

The v2 patch removes the synchronization around the hash table, so I'm going to 
use the v1 patch.

I just committed this to trunk and branch-2. Thanks, Ted!

 Update without holding write lock in JavaKeyStoreProvider#innerSetCredential()
 --

 Key: HADOOP-10732
 URL: https://issues.apache.org/jira/browse/HADOOP-10732
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Minor
 Fix For: 3.0.0, 2.6.0

 Attachments: hadoop-10732-v1.txt, hadoop-10732-v2.txt


 In 
 hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/alias/JavaKeyStoreProvider.java,
 innerSetCredential() doesn't wrap the update with writeLock.lock() /
 writeLock.unlock().



--
This message was sent by Atlassian JIRA
(v6.2#6252)

