[jira] [Commented] (HADOOP-18761) Revert HADOOP-18535 because mysql-connector-java is GPL
[ https://issues.apache.org/jira/browse/HADOOP-18761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17730823#comment-17730823 ]

Owen O'Malley commented on HADOOP-18761:
----------------------------------------

We should not revert the HADOOP-18535 patch. Its use of the mysql connector is allowed.

> Revert HADOOP-18535 because mysql-connector-java is GPL
> ---
>
>                 Key: HADOOP-18761
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18761
>             Project: Hadoop Common
>          Issue Type: Task
>            Reporter: Wei-Chiu Chuang
>            Priority: Blocker
>              Labels: pull-request-available
>
> While preparing for the 3.3.6 RC, I realized the mysql-connector-java
> dependency added by HADOOP-18535 is GPL licensed.
> Source: https://github.com/mysql/mysql-connector-j/blob/release/8.0/LICENSE
> See the legal discussion at LEGAL-423.
> I looked at the original jira and github PR and I don't think the license
> issue was noticed.
> Is it possible to get rid of the mysql connector dependency? As far as I can
> tell, the dependency is very limited.
> If not, I guess I'll have to revert the commits for now.
[jira] [Commented] (HADOOP-18535) Implement token storage solution based on MySQL
[ https://issues.apache.org/jira/browse/HADOOP-18535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17730822#comment-17730822 ]

Owen O'Malley commented on HADOOP-18535:
----------------------------------------

This patch adds a "provided" dependency on the mysql connector, which is GPL licensed. This is allowed because it is an optional component of Hadoop and the user will need to install the mysql connector. https://www.apache.org/legal/resolved.html#optional

> Implement token storage solution based on MySQL
> ---
>
>                 Key: HADOOP-18535
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18535
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Hector Sandoval Chaverri
>            Assignee: Hector Sandoval Chaverri
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.0, 3.3.6
>
> Hadoop RBF supports custom implementations of secret managers. At the moment,
> the only available implementation is ZKDelegationTokenSecretManagerImpl,
> which stores tokens and delegation keys in Zookeeper.
> During our investigation, we found that the performance of routers is limited
> by the writes to the Zookeeper token store, which impacts requests for token
> creation, renewal and cancellation. An alternative secret manager
> implementation has been created, based on MySQL, to handle a higher number of
> writes.
> We measured the throughput of each token operation (create/renew/cancel) on
> different setups and obtained the following results:
> # Sending requests directly to Namenode (no RBF):
>   Token creations: 290 reqs per sec
>   Token renewals: 86 reqs per sec
>   Token cancellations: 97 reqs per sec
> # Sending requests to routers using Zookeeper based secret manager:
>   Token creations: 31 reqs per sec
>   Token renewals: 29 reqs per sec
>   Token cancellations: 40 reqs per sec
> # Sending requests to routers using SQL based secret manager:
>   Token creations: 241 reqs per sec
>   Token renewals: 103 reqs per sec
>   Token cancellations: 114 reqs per sec
> We noticed a significant improvement when using a SQL secret manager,
> comparable to the throughput offered by Namenodes.
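For reference, a "provided" scope dependency is declared roughly as in the sketch below; the version shown is illustrative, not necessarily what the patch uses. The artifact is available at compile time but is not bundled into the Hadoop distribution, so the user supplies the connector jar themselves — which is what makes the optional-dependency exception apply.

{noformat}
<dependency>
  <groupId>mysql</groupId>
  <artifactId>mysql-connector-java</artifactId>
  <version>8.0.x</version> <!-- illustrative -->
  <scope>provided</scope>
</dependency>
{noformat}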
[jira] [Resolved] (HADOOP-18535) Implement token storage solution based on MySQL
[ https://issues.apache.org/jira/browse/HADOOP-18535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley resolved HADOOP-18535.
------------------------------------
    Fix Version/s: 3.3.6
                   3.4.0
       Resolution: Fixed

Thanks, Hector!

> Implement token storage solution based on MySQL
> ---
>
>                 Key: HADOOP-18535
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18535
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Hector Sandoval Chaverri
>            Assignee: Hector Sandoval Chaverri
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.3.6, 3.4.0
>
> Hadoop RBF supports custom implementations of secret managers. At the moment,
> the only available implementation is ZKDelegationTokenSecretManagerImpl,
> which stores tokens and delegation keys in Zookeeper.
> During our investigation, we found that the performance of routers is limited
> by the writes to the Zookeeper token store, which impacts requests for token
> creation, renewal and cancellation. An alternative secret manager
> implementation has been created, based on MySQL, to handle a higher number of
> writes.
> We measured the throughput of each token operation (create/renew/cancel) on
> different setups and obtained the following results:
> # Sending requests directly to Namenode (no RBF):
>   Token creations: 290 reqs per sec
>   Token renewals: 86 reqs per sec
>   Token cancellations: 97 reqs per sec
> # Sending requests to routers using Zookeeper based secret manager:
>   Token creations: 31 reqs per sec
>   Token renewals: 29 reqs per sec
>   Token cancellations: 40 reqs per sec
> # Sending requests to routers using SQL based secret manager:
>   Token creations: 241 reqs per sec
>   Token renewals: 103 reqs per sec
>   Token cancellations: 114 reqs per sec
> We noticed a significant improvement when using a SQL secret manager,
> comparable to the throughput offered by Namenodes.
[jira] [Resolved] (HADOOP-18324) Interrupting RPC Client calls can lead to thread exhaustion
[ https://issues.apache.org/jira/browse/HADOOP-18324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley resolved HADOOP-18324.
------------------------------------
    Fix Version/s: 3.4.0
                   3.3.5
                   2.10.3
       Resolution: Fixed

> Interrupting RPC Client calls can lead to thread exhaustion
> ---
>
>                 Key: HADOOP-18324
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18324
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ipc
>    Affects Versions: 3.4.0, 2.10.2, 3.3.3
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 3.4.0, 3.3.5, 2.10.3
>
>          Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Currently the IPC client creates a boundless number of threads to write the
> rpc request to the socket. The NameNode uses timeouts on its RPC calls to the
> Journal Node and a stuck JN will cause the NN to create an infinite set of
> threads.
[jira] [Resolved] (HADOOP-18444) Add Support for localized trash for ViewFileSystem in Trash.moveToAppropriateTrash
[ https://issues.apache.org/jira/browse/HADOOP-18444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley resolved HADOOP-18444.
------------------------------------
    Fix Version/s: 3.4.0
                   3.3.9
       Resolution: Fixed

> Add Support for localized trash for ViewFileSystem in
> Trash.moveToAppropriateTrash
> ---
>
>                 Key: HADOOP-18444
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18444
>             Project: Hadoop Common
>          Issue Type: Bug
>            Reporter: Xing Lin
>            Assignee: Xing Lin
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.0, 3.3.9
>
> Trash.moveToAppropriateTrash is used by _hadoop cli -rm_ and hive to move
> files to trash. However, its current implementation does not support the
> localized trash policy we added to ViewFileSystem in HADOOP-18144.
> The reason is that moveToAppropriateTrash first resolves the path and then
> uses the resolvedFs to initialize the trash. As a result, it uses the
> getTrashRoot() implementation from the targetFs, not ViewFileSystem, so the
> new localized trash policy we implemented in ViewFileSystem is not invoked.
> With the new localized trash policy for ViewFileSystem, the trash root is
> local to a mount point. Thus, for a ViewFileSystem with this flag turned on,
> there is no need to resolve the path in moveToAppropriateTrash: rename in
> ViewFileSystem can resolve the logical paths correctly and is able to move a
> file to trash within a mount point.
> The relevant section of the current moveToAppropriateTrash implementation:
> {code:java}
> public static boolean moveToAppropriateTrash(FileSystem fs, Path p,
>     Configuration conf) throws IOException {
>   Path fullyResolvedPath = fs.resolvePath(p);
>   FileSystem fullyResolvedFs =
>       FileSystem.get(fullyResolvedPath.toUri(), conf);
>   ...
>   Trash trash = new Trash(fullyResolvedFs, conf);
>   return trash.moveToTrash(fullyResolvedPath);
> }{code}
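A hedged sketch of the adjustment the description argues for — not the committed patch: skip the path resolution when the FileSystem is a ViewFileSystem, so the rename stays inside the mount point and the view's own trash policy is used.

{code:java}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.Trash;
import org.apache.hadoop.fs.viewfs.ViewFileSystem;

class TrashSketch {
  static boolean moveToAppropriateTrash(FileSystem fs, Path p,
      Configuration conf) throws IOException {
    // For ViewFileSystem with per-mount-point trash, let the view FS
    // resolve the logical path itself; do not jump to the target FS.
    // (The real fix would presumably also check the localized-trash flag.)
    if (fs instanceof ViewFileSystem) {
      return new Trash(fs, conf).moveToTrash(p);
    }
    // Otherwise keep the existing behavior: resolve, then trash there.
    Path resolved = fs.resolvePath(p);
    FileSystem resolvedFs = FileSystem.get(resolved.toUri(), conf);
    return new Trash(resolvedFs, conf).moveToTrash(resolved);
  }
}
{code}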
[jira] [Created] (HADOOP-18434) Proxy users do not share RPC connections
Owen O'Malley created HADOOP-18434:
--------------------------------------

             Summary: Proxy users do not share RPC connections
                 Key: HADOOP-18434
                 URL: https://issues.apache.org/jira/browse/HADOOP-18434
             Project: Hadoop Common
          Issue Type: Bug
            Reporter: Owen O'Malley
            Assignee: Owen O'Malley

When the Hive MetaStore uses Storage-Based Authorization, it needs to perform checks against the NameNode as the query's user. Unfortunately, RPC's ConnectionId uses the UGI's equals & hashCode functions, which check for the subject's object equality. Thus, we've seen the HMS spawn thousands of threads before they go idle and are eventually closed. If the peak goes over 10k threads the HMS becomes unstable.
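A small illustration of the problem — not the HMS code itself: two proxy-user UGIs for the same end user are distinct objects, so subject-identity equality makes them hash to distinct RPC ConnectionIds and therefore distinct connections.

{code:java}
import org.apache.hadoop.security.UserGroupInformation;

public class ProxyUgiEquality {
  public static void main(String[] args) throws Exception {
    UserGroupInformation realUser = UserGroupInformation.getCurrentUser();

    // Two proxy UGIs for the same end user...
    UserGroupInformation ugi1 =
        UserGroupInformation.createProxyUser("queryUser", realUser);
    UserGroupInformation ugi2 =
        UserGroupInformation.createProxyUser("queryUser", realUser);

    // ...are different objects, so identity-based equality fails and
    // each one maps to a different RPC ConnectionId (and connection).
    System.out.println(ugi1.equals(ugi2));  // prints: false
  }
}
{code}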
[jira] [Updated] (HADOOP-13144) Enhancing IPC client throughput via multiple connections per user
[ https://issues.apache.org/jira/browse/HADOOP-13144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated HADOOP-13144:
-----------------------------------
    Fix Version/s: 3.3.9

> Enhancing IPC client throughput via multiple connections per user
> ---
>
>                 Key: HADOOP-13144
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13144
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: ipc
>            Reporter: Jason Kace
>            Assignee: Íñigo Goiri
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 3.4.0, 3.3.9
>
>         Attachments: HADOOP-13144-performance.patch, HADOOP-13144.000.patch,
>                      HADOOP-13144.001.patch, HADOOP-13144.002.patch,
>                      HADOOP-13144.003.patch, HADOOP-13144_overload_enhancement.patch
>
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The generic IPC client ({{org.apache.hadoop.ipc.Client}}) utilizes a single
> connection thread for each {{ConnectionId}}. The {{ConnectionId}} is unique
> to the connection's remote address, ticket and protocol. Each ConnectionId
> is 1:1 mapped to a connection thread by the client via a map cache.
> The result is to serialize all IPC read/write activity through a single
> thread for each user/ticket + address. If a single user makes repeated
> calls (1k-100k/sec) to the same destination, the IPC client becomes a
> bottleneck.
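A minimal sketch of the map-cache pattern the description refers to; the names here are illustrative, not the actual org.apache.hadoop.ipc.Client code. It shows why one key funnels everything through one connection.

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

class ConnectionCache<K, C> {
  private final Map<K, C> connections = new ConcurrentHashMap<>();

  // One connection per ConnectionId-like key: every call for the same
  // (address, ticket, protocol) reuses the same connection object, which
  // is what serializes a heavy single-user workload through one thread.
  C getConnection(K connectionId, Function<K, C> newConnection) {
    return connections.computeIfAbsent(connectionId, newConnection);
  }
}
{code}

Allowing multiple connections per user amounts to widening the key, e.g. appending an index chosen round-robin, so the same user can fan out over several sockets.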
[jira] [Resolved] (HADOOP-18406) Adds alignment context to call path for creating RPC proxy with multiple connections per user.
[ https://issues.apache.org/jira/browse/HADOOP-18406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley resolved HADOOP-18406.
------------------------------------
    Fix Version/s: 3.4.0
                   3.3.9
       Resolution: Fixed

> Adds alignment context to call path for creating RPC proxy with multiple
> connections per user.
> ---
>
>                 Key: HADOOP-18406
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18406
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: ipc
>            Reporter: Simbarashe Dzinamarira
>            Assignee: Simbarashe Dzinamarira
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.0, 3.3.9
>
> HDFS-13274 (RBF: Extend RouterRpcClient to use multiple sockets) gets the RPC
> proxy using methods which do not allow using an alignment context. These
> methods were added in HADOOP-13144 (Enhancing IPC client throughput via
> multiple connections per user).
> This change adds an alignment context as an argument for methods in the call
> path for creating the proxy.
[jira] [Resolved] (HADOOP-18345) Enhance client protocol to propagate last seen state IDs for multiple nameservices.
[ https://issues.apache.org/jira/browse/HADOOP-18345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley resolved HADOOP-18345.
------------------------------------
    Fix Version/s: 3.4.0
                   3.3.9
       Resolution: Fixed

> Enhance client protocol to propagate last seen state IDs for multiple
> nameservices.
> ---
>
>                 Key: HADOOP-18345
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18345
>             Project: Hadoop Common
>          Issue Type: New Feature
>            Reporter: Simbarashe Dzinamarira
>            Assignee: Simbarashe Dzinamarira
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.0, 3.3.9
>
>          Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> The RPCHeader in the client protocol currently contains a single value to
> indicate the last seen state ID for a namenode.
> {noformat}
> optional int64 stateId = 8; // The last seen Global State ID
> {noformat}
> When there are multiple namenodes, such as in router based federation, the
> headers need to carry the state IDs for each of these nameservices that are
> part of the federation.
> This change is a prerequisite for HDFS-13522: RBF: Support observer node from
> Router-Based Federation
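One plausible shape for the extended header, hedged: the message and field names below are illustrative and not necessarily what was committed. The single int64 becomes a per-nameservice map carried alongside (or instead of) the old field.

{noformat}
message RouterFederatedStateProto {
  // Last seen state ID, keyed by nameservice ID (illustrative names).
  map<string, int64> namespaceStateIds = 1;
}
{noformat}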
[jira] [Comment Edited] (HADOOP-18324) Interrupting RPC Client calls can lead to thread exhaustion
[ https://issues.apache.org/jira/browse/HADOOP-18324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17561206#comment-17561206 ]

Owen O'Malley edited comment on HADOOP-18324 at 6/30/22 11:00 PM:
------------------------------------------------------------------

We had a NameNode taken out with 10k threads of which 9500 were "IPC Parameter Sending Thread".

was (Author: owen.omalley):
We had a NameNode taken out with 10k threads of which 9500 where "IPC Parameter Sending Thread".

> Interrupting RPC Client calls can lead to thread exhaustion
> ---
>
>                 Key: HADOOP-18324
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18324
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ipc
>    Affects Versions: 3.4.0, 2.10.2, 3.3.3
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>            Priority: Critical
>
> Currently the IPC client creates a boundless number of threads to write the
> rpc request to the socket. The NameNode uses timeouts on its RPC calls to the
> Journal Node and a stuck JN will cause the NN to create an infinite set of
> threads.
[jira] [Commented] (HADOOP-18324) Interrupting RPC Client calls can lead to thread exhaustion
[ https://issues.apache.org/jira/browse/HADOOP-18324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17561206#comment-17561206 ]

Owen O'Malley commented on HADOOP-18324:
----------------------------------------

We had a NameNode taken out with 10k threads of which 9500 where "IPC Parameter Sending Thread".

> Interrupting RPC Client calls can lead to thread exhaustion
> ---
>
>                 Key: HADOOP-18324
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18324
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ipc
>    Affects Versions: 3.4.0, 2.10.2, 3.3.3
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>            Priority: Critical
>
> Currently the IPC client creates a boundless number of threads to write the
> rpc request to the socket. The NameNode uses timeouts on its RPC calls to the
> Journal Node and a stuck JN will cause the NN to create an infinite set of
> threads.
[jira] [Updated] (HADOOP-18324) Interrupting RPC Client calls can lead to thread exhaustion
[ https://issues.apache.org/jira/browse/HADOOP-18324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated HADOOP-18324:
-----------------------------------
    Affects Version/s: 3.3.3
                       2.10.2
                       3.4.0

> Interrupting RPC Client calls can lead to thread exhaustion
> ---
>
>                 Key: HADOOP-18324
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18324
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ipc
>    Affects Versions: 3.4.0, 2.10.2, 3.3.3
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>            Priority: Critical
>
> Currently the IPC client creates a boundless number of threads to write the
> rpc request to the socket. The NameNode uses timeouts on its RPC calls to the
> Journal Node and a stuck JN will cause the NN to create an infinite set of
> threads.
[jira] [Created] (HADOOP-18324) Interrupting RPC Client calls can lead to thread exhaustion
Owen O'Malley created HADOOP-18324:
--------------------------------------

             Summary: Interrupting RPC Client calls can lead to thread exhaustion
                 Key: HADOOP-18324
                 URL: https://issues.apache.org/jira/browse/HADOOP-18324
             Project: Hadoop Common
          Issue Type: Bug
          Components: ipc
            Reporter: Owen O'Malley
            Assignee: Owen O'Malley

Currently the IPC client creates a boundless number of threads to write the rpc request to the socket. The NameNode uses timeouts on its RPC calls to the Journal Node and a stuck JN will cause the NN to create an infinite set of threads.
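A minimal sketch of one way to bound the request-writer threads — an illustration of the approach, not the committed patch: submit the socket write to a shared bounded pool instead of spawning a fresh "IPC Parameter Sending Thread" per in-flight call.

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class RpcRequestSender {
  // A single shared, bounded pool (size is illustrative) instead of one
  // new thread per in-flight call.
  private static final ExecutorService SENDER_POOL =
      Executors.newFixedThreadPool(8);

  Future<?> sendRequest(Runnable writeRpcRequestToSocket) {
    // A stuck peer now delays queued sends rather than accumulating an
    // unbounded number of threads; interrupted callers can cancel().
    return SENDER_POOL.submit(writeRpcRequestToSocket);
  }
}
{code}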
[jira] [Resolved] (HADOOP-18193) Support nested mount points in INodeTree
[ https://issues.apache.org/jira/browse/HADOOP-18193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley resolved HADOOP-18193.
------------------------------------
    Fix Version/s: 3.4.0
       Resolution: Fixed

I just committed this. Thanks, Lei!

> Support nested mount points in INodeTree
> ---
>
>                 Key: HADOOP-18193
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18193
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: viewfs
>    Affects Versions: 2.10.0
>            Reporter: Lei Yang
>            Assignee: Lei Yang
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.0
>
>         Attachments: Nested Mount Point in ViewFs.pdf
>
>          Time Spent: 7h 10m
>  Remaining Estimate: 0h
>
> Defining the following client mount table config is not supported in
> INodeTree and will throw a FileAlreadyExistsException:
> {code:java}
> fs.viewfs.mounttable.link./foo/bar=hdfs://nn1/foo/bar
> fs.viewfs.mounttable.link./foo=hdfs://nn02/foo
> {code}
> INodeTree has 2 methods that need to change to support nested mount points:
> {code:java}
> createLink(): build INodeTree during fs init.
> resolve(): resolve path in INodeTree with viewfs apis.
> {code}
> ViewFileSystem and ViewFs each maintain an INodeTree instance (fsState) and
> call fsState.resolve(..) to resolve a path to a specific mount point.
> INodeTree.resolve encapsulates the logic of nested mount point resolution, so
> no changes are expected in either class.
> AC:
> # INodeTree.createLink should support creating nested mount points.
>   (INodeTree is constructed during fs init)
> # INodeTree.resolve should support resolving paths based on nested mount
>   points. (INodeTree.resolve is used in viewfs apis)
> # No regression in existing ViewFileSystem and ViewFs apis.
> # Ensure some important apis are not broken with nested mount points.
>   (Rename, getContentSummary, listStatus...)
>
> Spec:
> Please review the attached pdf for the spec of this feature.
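A sketch of the intended resolution semantics for the two mount points in the config above, assuming longest-prefix matching (paths are illustrative):

{noformat}
viewfs://cluster/foo/bar/file1  ->  hdfs://nn1/foo/bar/file1    (matches the nested mount /foo/bar)
viewfs://cluster/foo/other      ->  hdfs://nn02/foo/other       (falls back to the parent mount /foo)
{noformat}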
[jira] [Updated] (HADOOP-18222) Prevent DelegationTokenSecretManagerMetrics from registering multiple times
[ https://issues.apache.org/jira/browse/HADOOP-18222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated HADOOP-18222:
-----------------------------------
    Fix Version/s: 3.4.0
                   3.3.4
       Resolution: Fixed
           Status: Resolved  (was: Patch Available)

I just committed this. Thanks, Hector!

> Prevent DelegationTokenSecretManagerMetrics from registering multiple times
> ---
>
>                 Key: HADOOP-18222
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18222
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Hector Sandoval Chaverri
>            Assignee: Hector Sandoval Chaverri
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 3.4.0, 3.3.4
>
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> After committing HADOOP-18167, we received reports of the following error
> when the ResourceManager is initialized:
> {noformat}
> Caused by: java.io.IOException: Problem starting http server
>   at org.apache.hadoop.http.HttpServer2.start(HttpServer2.java:1389)
>   at org.apache.hadoop.yarn.webapp.WebApps$Builder.start(WebApps.java:475)
>   ... 4 more
> Caused by: org.apache.hadoop.metrics2.MetricsException: Metrics source
> DelegationTokenSecretManagerMetrics already exists!
>   at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:152)
>   at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:125)
>   at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:229)
>   at org.apache.hadoop.metrics2.MetricsSystem.register(MetricsSystem.java:71)
>   at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager$DelegationTokenSecretManagerMetrics.create(AbstractDelegationTokenSecretManager.java:878)
>   at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.<init>(AbstractDelegationTokenSecretManager.java:152)
>   at org.apache.hadoop.security.token.delegation.web.DelegationTokenManager$DelegationTokenSecretManager.<init>(DelegationTokenManager.java:72)
>   at org.apache.hadoop.security.token.delegation.web.DelegationTokenManager.<init>(DelegationTokenManager.java:122)
>   at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationHandler.initTokenManager(DelegationTokenAuthenticationHandler.java:161)
>   at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationHandler.init(DelegationTokenAuthenticationHandler.java:130)
>   at org.apache.hadoop.security.authentication.server.AuthenticationFilter.initializeAuthHandler(AuthenticationFilter.java:194)
>   at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter.initializeAuthHandler(DelegationTokenAuthenticationFilter.java:214)
>   at org.apache.hadoop.security.authentication.server.AuthenticationFilter.init(AuthenticationFilter.java:180)
>   at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter.init(DelegationTokenAuthenticationFilter.java:180)
>   at org.apache.hadoop.yarn.server.security.http.RMAuthenticationFilter.init(RMAuthenticationFilter.java:53){noformat}
> This can happen if MetricsSystemImpl#init is called and multiple metrics are
> registered with the same name. A proposed solution is to declare the metrics
> in AbstractDelegationTokenSecretManager as a singleton, which would prevent
> multiple instances of DelegationTokenSecretManagerMetrics from being
> registered.
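A hedged sketch of the singleton pattern the proposal describes; class and method names are illustrative, not the actual Hadoop code.

{code:java}
import java.util.concurrent.atomic.AtomicReference;

class SecretManagerMetrics {
  private static final AtomicReference<SecretManagerMetrics> INSTANCE =
      new AtomicReference<>();

  // Register with the metrics system at most once; later secret-manager
  // instances reuse the singleton instead of re-registering the source
  // name, which is what raised the MetricsException above.
  static SecretManagerMetrics create() {
    INSTANCE.compareAndSet(null, new SecretManagerMetrics());
    return INSTANCE.get();
  }

  private SecretManagerMetrics() {
    // The one-time DefaultMetricsSystem.instance().register(...) call
    // would happen here in the real implementation.
  }
}
{code}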
[jira] [Resolved] (HADOOP-18169) getDelegationTokens in ViewFs should also fetch the token from the fallback FS
[ https://issues.apache.org/jira/browse/HADOOP-18169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley resolved HADOOP-18169.
------------------------------------
    Fix Version/s: 3.4.0
                   3.3.3
       Resolution: Fixed

> getDelegationTokens in ViewFs should also fetch the token from the fallback FS
> ---
>
>                 Key: HADOOP-18169
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18169
>             Project: Hadoop Common
>          Issue Type: Bug
>            Reporter: Xing Lin
>            Assignee: Xing Lin
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.0, 3.3.3
>
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> getDelegationTokens in ViewFs does not include the delegation token from the
> fallback FS, though it should.
[jira] [Assigned] (HADOOP-18169) getDelegationTokens in ViewFs should also fetch the token from the fallback FS
[ https://issues.apache.org/jira/browse/HADOOP-18169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley reassigned HADOOP-18169:
--------------------------------------
    Assignee: Xing Lin

> getDelegationTokens in ViewFs should also fetch the token from the fallback FS
> ---
>
>                 Key: HADOOP-18169
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18169
>             Project: Hadoop Common
>          Issue Type: Bug
>            Reporter: Xing Lin
>            Assignee: Xing Lin
>            Priority: Major
>              Labels: pull-request-available
>
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> getDelegationTokens in ViewFs does not include the delegation token from the
> fallback FS, though it should.
[jira] [Updated] (HADOOP-16254) Add proxy address in IPC connection
[ https://issues.apache.org/jira/browse/HADOOP-16254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated HADOOP-16254:
-----------------------------------
    Resolution: Duplicate
        Status: Resolved  (was: Patch Available)

This has been fixed by using the CallerContext.

> Add proxy address in IPC connection
> ---
>
>                 Key: HADOOP-16254
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16254
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: ipc
>            Reporter: Xiaoqiao He
>            Assignee: Xiaoqiao He
>            Priority: Blocker
>         Attachments: HADOOP-16254.001.patch, HADOOP-16254.002.patch,
>                      HADOOP-16254.004.patch, HADOOP-16254.005.patch,
>                      HADOOP-16254.006.patch, HADOOP-16254.007.patch
>
> In order to support data locality of RBF, we need to add a new field for the
> client hostname in the RPC headers of Router protocol calls.
> clientHostname represents the hostname of the client and is forwarded by the
> Router to the Namenode to support data locality. See more in the [RBF Data
> Locality Design|https://issues.apache.org/jira/secure/attachment/12965092/RBF%20Data%20Locality%20Design.pdf]
> in HDFS-13248 and the [maillist
> vote|http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201904.mbox/%3CCAF3Ajax7hGxvowg4K_HVTZeDqC5H=3bfb7mv5sz5mgvadhv...@mail.gmail.com%3E].
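For reference, a minimal sketch of how a proxy such as the Router can forward the real client address through the CallerContext; the "clientIp" tag and the address are illustrative, not the exact format the committed code uses.

{code:java}
import org.apache.hadoop.ipc.CallerContext;

public class ForwardClientIp {
  public static void main(String[] args) {
    // Attach the real client address before issuing the downstream call;
    // the NameNode can then recover it from the caller context for
    // locality decisions and audit logging.
    CallerContext context =
        new CallerContext.Builder("clientIp:192.0.2.10").build();
    CallerContext.setCurrent(context);
  }
}
{code}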
[jira] [Resolved] (HADOOP-18129) Change URI[] in INodeLink to String[] to reduce memory footprint of ViewFileSystem
[ https://issues.apache.org/jira/browse/HADOOP-18129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley resolved HADOOP-18129.
------------------------------------
    Fix Version/s: 3.4.0
                   3.3.3
       Resolution: Fixed

I committed this. Thanks, Abhishek!

> Change URI[] in INodeLink to String[] to reduce memory footprint of
> ViewFileSystem
> ---
>
>                 Key: HADOOP-18129
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18129
>             Project: Hadoop Common
>          Issue Type: Bug
>            Reporter: Abhishek Das
>            Assignee: Abhishek Das
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.0, 3.3.3
>
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> There are around 40k instances of INodeLink, each taking between ~1160 and
> ~1680 bytes of memory. Multiplying 40k by 1160 bytes gives us approximately
> 45 MB.
> By changing from URI to String in INodeLink, the memory consumed by each
> INodeLink object is reduced from ~1160 bytes to ~320 bytes. The overall size
> becomes (40k x 320) about 12 MB.
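A minimal sketch of the idea; the class and method names are illustrative, not the actual INodeLink code. A String is far cheaper than a URI, which eagerly parses and stores scheme, authority, path, etc. as separate fields, so the link stores the String and materializes the URI only on demand.

{code:java}
import java.net.URI;

class Link {
  // Keep only the compact String form of the target.
  private final String target;

  Link(String target) {
    this.target = target;
  }

  // Build the URI lazily, only when a caller actually needs one.
  URI getTargetUri() {
    return URI.create(target);
  }
}
{code}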
[jira] [Resolved] (HADOOP-18127) Backport HADOOP-13055 into branch-2.10
[ https://issues.apache.org/jira/browse/HADOOP-18127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley resolved HADOOP-18127.
------------------------------------
    Resolution: Fixed

I just committed these backports.

> Backport HADOOP-13055 into branch-2.10
> ---
>
>                 Key: HADOOP-18127
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18127
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: viewfs
>    Affects Versions: 2.10.0
>            Reporter: Konstantin Shvachko
>            Priority: Major
>              Labels: pull-request-available
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> HADOOP-13055 introduced linkMergeSlash and linkFallback for ViewFileSystem.
> It would be good to backport it to branch-2.10.
[jira] [Assigned] (HADOOP-18144) getTrashRoot/s in ViewFileSystem should return viewFS path, not targetFS path
[ https://issues.apache.org/jira/browse/HADOOP-18144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley reassigned HADOOP-18144:
--------------------------------------
    Assignee: Xing Lin

> getTrashRoot/s in ViewFileSystem should return viewFS path, not targetFS path
> ---
>
>                 Key: HADOOP-18144
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18144
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: common
>            Reporter: Xing Lin
>            Assignee: Xing Lin
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.0
>
>          Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> It is probably incorrect that we return a targetFS path from getTrashRoot()
> in ViewFileSystem, as that path will be used later on by ViewFileSystem in
> other operations, such as rename. ViewFileSystem assumes the path it
> receives is a viewFS path, not a targetFS path. For example, rename() in
> ViewFileSystem will call getUriPath() for the src/dst path, which will remove
> the scheme/authority and then try to resolve the path-only component. It thus
> sometimes leads to incorrect path resolution, as we are doing the path
> resolution again on a targetFS path.
>
> On the other hand, it is not always trivial/feasible to determine the correct
> viewFS path for a given trash root in a targetFS path.
> Example:
> Assume we have a mount point for /user/foo -> abfs:/containerA.
> User foo calls getTrashRoot("/a/b/c") and "/a/b/c" does not match any mount
> point. We fall back to the fallback hdfs, which by default returns
> hdfs://localhost/user/foo/.Trash. In this case, it is incorrect to return the
> trash root as viewfs:/user/foo, as it will be resolved to the abfs mount
> point, instead of the fallback hdfs.
[jira] [Resolved] (HADOOP-18144) getTrashRoot/s in ViewFileSystem should return viewFS path, not targetFS path
[ https://issues.apache.org/jira/browse/HADOOP-18144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley resolved HADOOP-18144.
------------------------------------
    Fix Version/s: 3.4.0
       Resolution: Fixed

> getTrashRoot/s in ViewFileSystem should return viewFS path, not targetFS path
> ---
>
>                 Key: HADOOP-18144
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18144
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: common
>            Reporter: Xing Lin
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.0
>
>          Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> It is probably incorrect that we return a targetFS path from getTrashRoot()
> in ViewFileSystem, as that path will be used later on by ViewFileSystem in
> other operations, such as rename. ViewFileSystem assumes the path it
> receives is a viewFS path, not a targetFS path. For example, rename() in
> ViewFileSystem will call getUriPath() for the src/dst path, which will remove
> the scheme/authority and then try to resolve the path-only component. It thus
> sometimes leads to incorrect path resolution, as we are doing the path
> resolution again on a targetFS path.
>
> On the other hand, it is not always trivial/feasible to determine the correct
> viewFS path for a given trash root in a targetFS path.
> Example:
> Assume we have a mount point for /user/foo -> abfs:/containerA.
> User foo calls getTrashRoot("/a/b/c") and "/a/b/c" does not match any mount
> point. We fall back to the fallback hdfs, which by default returns
> hdfs://localhost/user/foo/.Trash. In this case, it is incorrect to return the
> trash root as viewfs:/user/foo, as it will be resolved to the abfs mount
> point, instead of the fallback hdfs.
[jira] [Created] (HADOOP-18153) The CallerContext should not use ":" as the separator.
Owen O'Malley created HADOOP-18153:
--------------------------------------

             Summary: The CallerContext should not use ":" as the separator.
                 Key: HADOOP-18153
                 URL: https://issues.apache.org/jira/browse/HADOOP-18153
             Project: Hadoop Common
          Issue Type: Improvement
            Reporter: Owen O'Malley
            Assignee: Owen O'Malley

Since the goal of having fields in the CallerContext is to support adding ip addresses, we need to pick something that is compatible with both IPv4 and IPv6. ":" fails that test, since every IPv6 address uses it extensively.
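A short illustration of the collision, as a hedged example rather than actual CallerContext output: with ":" serving as both the key/value separator and the IPv6 notation, a field becomes ambiguous to parse.

{noformat}
clientIp:2001:db8::8a2e:370:7334
        ^ is the value "2001:db8::8a2e:370:7334", or is "db8::8a2e..." a run of further fields?
{noformat}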
[jira] [Commented] (HADOOP-16254) Add proxy address in IPC connection
[ https://issues.apache.org/jira/browse/HADOOP-16254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17497809#comment-17497809 ]

Owen O'Malley commented on HADOOP-16254:
----------------------------------------

[~ayushtkn], yeah, I hadn't found that jira, so thank you. Of course, using the caller context will work, with the only major downside being that if the user sets a caller context that is close to the limit, it could cause bytes to get dropped. We might want to pick a shorter lead string (4 bytes?) to minimize that chance. (Or bump up the default limit by 50 bytes?)

> Add proxy address in IPC connection
> ---
>
>                 Key: HADOOP-16254
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16254
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: ipc
>            Reporter: Xiaoqiao He
>            Assignee: Xiaoqiao He
>            Priority: Blocker
>         Attachments: HADOOP-16254.001.patch, HADOOP-16254.002.patch,
>                      HADOOP-16254.004.patch, HADOOP-16254.005.patch,
>                      HADOOP-16254.006.patch, HADOOP-16254.007.patch
>
> In order to support data locality of RBF, we need to add a new field for the
> client hostname in the RPC headers of Router protocol calls.
> clientHostname represents the hostname of the client and is forwarded by the
> Router to the Namenode to support data locality. See more in the [RBF Data
> Locality Design|https://issues.apache.org/jira/secure/attachment/12965092/RBF%20Data%20Locality%20Design.pdf]
> in HDFS-13248 and the [maillist
> vote|http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201904.mbox/%3CCAF3Ajax7hGxvowg4K_HVTZeDqC5H=3bfb7mv5sz5mgvadhv...@mail.gmail.com%3E].
[jira] [Resolved] (HADOOP-18139) Allow configuration of zookeeper server principal
[ https://issues.apache.org/jira/browse/HADOOP-18139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley resolved HADOOP-18139.
------------------------------------
    Fix Version/s: 3.4.0
                   3.3.3
       Resolution: Fixed

Thanks for the review, Íñigo!

> Allow configuration of zookeeper server principal
> ---
>
>                 Key: HADOOP-18139
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18139
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: auth
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.0, 3.3.3
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Allow configuration of the zookeeper server principal.
> This would allow the Router to specify the principal.
[jira] [Created] (HADOOP-18139) RBF: Allow configuration of zookeeper server principal in router
Owen O'Malley created HADOOP-18139:
--------------------------------------

             Summary: RBF: Allow configuration of zookeeper server principal in router
                 Key: HADOOP-18139
                 URL: https://issues.apache.org/jira/browse/HADOOP-18139
             Project: Hadoop Common
          Issue Type: Improvement
          Components: auth
            Reporter: Owen O'Malley
            Assignee: Owen O'Malley
[jira] [Commented] (HADOOP-16254) Add proxy address in IPC connection
[ https://issues.apache.org/jira/browse/HADOOP-16254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17494310#comment-17494310 ]

Owen O'Malley commented on HADOOP-16254:
----------------------------------------

This feature is a blocker for us at LinkedIn. We can (and do) have an internal fork, but we need a way for Hadoop RBF to get the real client ip address. This is critical for a few reasons:
* The data locality on read is managed by the NameNode based on the client ip address.
* The write pipeline is managed by the NameNode, again based on the client ip address.
* The HDFS audit log needs to contain the actual client IP address and not the routers' address.
* If we were actually using client ip filtering, it would be required for that also.

I think the proposed option to only use the parameter if the user is configured as a proxy is reasonable, especially when tied to the option to limit it to a fixed range of proxy ip addresses.

[~daryn] do you still have concerns about this approach?

> Add proxy address in IPC connection
> ---
>
>                 Key: HADOOP-16254
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16254
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: ipc
>            Reporter: Xiaoqiao He
>            Assignee: Xiaoqiao He
>            Priority: Blocker
>         Attachments: HADOOP-16254.001.patch, HADOOP-16254.002.patch,
>                      HADOOP-16254.004.patch, HADOOP-16254.005.patch,
>                      HADOOP-16254.006.patch, HADOOP-16254.007.patch
>
> In order to support data locality of RBF, we need to add a new field for the
> client hostname in the RPC headers of Router protocol calls.
> clientHostname represents the hostname of the client and is forwarded by the
> Router to the Namenode to support data locality. See more in the [RBF Data
> Locality Design|https://issues.apache.org/jira/secure/attachment/12965092/RBF%20Data%20Locality%20Design.pdf]
> in HDFS-13248 and the [maillist
> vote|http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201904.mbox/%3CCAF3Ajax7hGxvowg4K_HVTZeDqC5H=3bfb7mv5sz5mgvadhv...@mail.gmail.com%3E].
[jira] [Updated] (HADOOP-16254) Add proxy address in IPC connection
[ https://issues.apache.org/jira/browse/HADOOP-16254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated HADOOP-16254:
-----------------------------------
    Priority: Blocker  (was: Major)

> Add proxy address in IPC connection
> ---
>
>                 Key: HADOOP-16254
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16254
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: ipc
>            Reporter: Xiaoqiao He
>            Assignee: Xiaoqiao He
>            Priority: Blocker
>         Attachments: HADOOP-16254.001.patch, HADOOP-16254.002.patch,
>                      HADOOP-16254.004.patch, HADOOP-16254.005.patch,
>                      HADOOP-16254.006.patch, HADOOP-16254.007.patch
>
> In order to support data locality of RBF, we need to add a new field for the
> client hostname in the RPC headers of Router protocol calls.
> clientHostname represents the hostname of the client and is forwarded by the
> Router to the Namenode to support data locality. See more in the [RBF Data
> Locality Design|https://issues.apache.org/jira/secure/attachment/12965092/RBF%20Data%20Locality%20Design.pdf]
> in HDFS-13248 and the [maillist
> vote|http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201904.mbox/%3CCAF3Ajax7hGxvowg4K_HVTZeDqC5H=3bfb7mv5sz5mgvadhv...@mail.gmail.com%3E].
[jira] [Resolved] (HADOOP-18110) ViewFileSystem: Add Support for Localized Trash Root
[ https://issues.apache.org/jira/browse/HADOOP-18110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley resolved HADOOP-18110.
------------------------------------
    Fix Version/s: 3.4.0
       Resolution: Fixed

> ViewFileSystem: Add Support for Localized Trash Root
> ---
>
>                 Key: HADOOP-18110
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18110
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: common
>            Reporter: Xing Lin
>            Assignee: Xing Lin
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.0
>
>          Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> getTrashRoot() in ViewFileSystem calls getTrashRoot() from the underlying
> filesystem to return the trash root. Most of the time, we get a trash root
> in the user home dir. This can lead to problems when an application wants to
> delete a file in a mount point using moveToTrash() in TrashPolicyDefault,
> because we can not rename across multiple filesystems/hdfs namenodes.
> We propose the following extension to getTrashRoot/getTrashRoots in
> ViewFileSystem: add a flag to return a localized trash root for
> ViewFileSystem. A localized trash root is a trash root which starts from the
> root of a mount point (e.g., /mountpointRoot/.Trash/{user}).
> * If CONFIG_VIEWFS_MOUNT_POINT_LOCAL_TRASH is not set to true, or when the
>   path p is in a snapshot or an encryption zone, return the default trash
>   root in the user home dir.
> * When CONFIG_VIEWFS_MOUNT_POINT_LOCAL_TRASH is set to true:
>   1) if path p is mounted from the same targetFS as the user home dir,
>      return a trash root in the user home dir.
>   2) else, return a trash root in the mounted targetFS.
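A hedged, self-contained pseudocode sketch of the decision the list above spells out; the helper methods are hypothetical stand-ins, not the ViewFileSystem code.

{code:java}
class LocalTrashRootSketch {
  // Illustrative stand-in for CONFIG_VIEWFS_MOUNT_POINT_LOCAL_TRASH.
  boolean localTrashEnabled;

  String getTrashRoot(String path, String user) {
    // Feature off, snapshot, or encryption zone: default home-dir trash.
    if (!localTrashEnabled || inSnapshotOrEncryptionZone(path)) {
      return "/user/" + user + "/.Trash";
    }
    // Mounted from the same target FS as the user's home dir:
    // still use the home-dir trash root.
    if (sameTargetFsAsHomeDir(path)) {
      return "/user/" + user + "/.Trash";
    }
    // Otherwise the trash root is local to the mount point, so a
    // moveToTrash() rename never has to cross filesystems.
    return mountPointRoot(path) + "/.Trash/" + user;
  }

  // Hypothetical helpers, stubbed for the sketch.
  boolean inSnapshotOrEncryptionZone(String path) { return false; }
  boolean sameTargetFsAsHomeDir(String path) { return false; }
  String mountPointRoot(String path) { return "/mountpointRoot"; }
}
{code}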
[jira] [Commented] (HADOOP-11867) FS API: Add a high-performance vectored Read to FSDataInputStream API
[ https://issues.apache.org/jira/browse/HADOOP-11867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17199483#comment-17199483 ]

Owen O'Malley commented on HADOOP-11867:
----------------------------------------

To follow up on this, the benchmarks compare:

File systems:
* raw = raw local file system
* local = local file system with checksums layered on top

ByteBuffer implementation:
* direct = direct byte buffers
* array = array backed byte buffers

Read method:
* asyncFileChanArray = reading using java's async file channel (no hadoop fs)
* asyncRead = my new code added in this PR
* syncRead = the current code

So, the current code is by far the slowest and using the Java native async file channel is the fastest. (The code in this PR uses the async file channel and goes through the hadoop fs api, so that isn't surprising.) The nice bit is that the raw fs code gets close to the native async file channel speeds. Even the local fs with the checksum reads & validation is still very fast (3.75x the current checksum code).

> FS API: Add a high-performance vectored Read to FSDataInputStream API
> ---
>
>                 Key: HADOOP-11867
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11867
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs, fs/azure, fs/s3, hdfs-client
>    Affects Versions: 3.0.0
>            Reporter: Gopal Vijayaraghavan
>            Assignee: Owen O'Malley
>            Priority: Major
>              Labels: performance, pull-request-available
>
>          Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> The most significant way to read from a filesystem in an efficient way is to
> let the FileSystem implementation handle the seek behaviour underneath the
> API to be as efficient as possible.
> A better approach to the seek problem is to provide a sequence of read
> locations as part of a single call, while letting the system schedule/plan
> the reads ahead of time.
> This is exceedingly useful for seek-heavy readers on HDFS, since this allows
> for potentially optimizing away the seek-gaps within the FSDataInputStream
> implementation.
> For seek+read systems with even more latency than locally-attached disks,
> something like a {{readFully(long[] offsets, ByteBuffer[] chunks)}} would
> take care of the seeks internally while reading chunk.remaining() bytes into
> each chunk (which may be {{slice()}}ed off a bigger buffer).
> The base implementation can stub this in as a sequence of seeks + read() into
> ByteBuffers, without forcing each FS implementation to override this in any
> way.
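For context, a self-contained example of the Java async file channel baseline mentioned above — plain JDK NIO with no Hadoop FS involved; the offsets and buffer sizes are illustrative:

{code:java}
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.Future;

public class AsyncChannelRead {
  public static void main(String[] args) throws Exception {
    try (AsynchronousFileChannel channel = AsynchronousFileChannel.open(
        Paths.get(args[0]), StandardOpenOption.READ)) {
      // Issue two positioned reads without waiting in between;
      // the OS can schedule the seeks however it likes.
      ByteBuffer b1 = ByteBuffer.allocateDirect(4096);
      ByteBuffer b2 = ByteBuffer.allocateDirect(4096);
      Future<Integer> r1 = channel.read(b1, 0);
      Future<Integer> r2 = channel.read(b2, 1 << 20);
      System.out.println("read " + r1.get() + " and " + r2.get() + " bytes");
    }
  }
}
{code}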
[jira] [Issue Comment Deleted] (HADOOP-11867) FS API: Add a high-performance vectored Read to FSDataInputStream API
[ https://issues.apache.org/jira/browse/HADOOP-11867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated HADOOP-11867:
-----------------------------------
    Comment: was deleted

(was: an automated Hadoop QA precommit report: -1 overall, with mvninstall, mvnsite, and javadoc failures in the benchmark module)
[jira] [Issue Comment Deleted] (HADOOP-11867) FS API: Add a high-performance vectored Read to FSDataInputStream API
[ https://issues.apache.org/jira/browse/HADOOP-11867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated HADOOP-11867:
-----------------------------------
    Comment: was deleted

(was: a second automated Hadoop QA precommit report: -1 overall, again with mvninstall, mvnsite, and javadoc failures in the benchmark module)
[jira] [Commented] (HADOOP-11867) FS API: Add a high-performance vectored Read to FSDataInputStream API
[ https://issues.apache.org/jira/browse/HADOOP-11867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17029389#comment-17029389 ] Owen O'Malley commented on HADOOP-11867: Ok, my first patch adds the API, the default method, and the utilities for this. I also included the implementation for RawLocalFileSystem and ChecksumFileSystem because they were the easiest to test and let me use the APIs in non-trivial ways. I also included a benchmark that tests against the local file system:
{code}
Benchmark                          (bufferKind)  (fileSystemKind)  Mode  Cnt     Score     Error  Units
AsyncBenchmark.asyncFileChanArray  direct        N/A               avgt   20  1432.396 ± 232.443  us/op
AsyncBenchmark.asyncFileChanArray  array         N/A               avgt   20  1551.400 ±  65.639  us/op
AsyncBenchmark.asyncRead           direct        local             avgt   20  2514.926 ± 245.603  us/op
AsyncBenchmark.asyncRead           direct        raw               avgt   20  1440.434 ± 207.504  us/op
AsyncBenchmark.asyncRead           array         local             avgt   20  2798.031 ± 135.023  us/op
AsyncBenchmark.asyncRead           array         raw               avgt   20  1524.360 ±  54.462  us/op
AsyncBenchmark.syncRead            N/A           local             avgt   20  9515.604 ± 123.311  us/op
AsyncBenchmark.syncRead            N/A           raw               avgt   20  2402.039 ±  36.620  us/op
{code}
> FS API: Add a high-performance vectored Read to FSDataInputStream API > - > > Key: HADOOP-11867 > URL: https://issues.apache.org/jira/browse/HADOOP-11867 > Project: Hadoop Common > Issue Type: New Feature > Components: fs, fs/azure, fs/s3, hdfs-client >Affects Versions: 3.0.0 >Reporter: Gopal Vijayaraghavan >Assignee: Owen O'Malley >Priority: Major > Labels: performance > > The most significant way to read efficiently from a filesystem is to let the FileSystem implementation handle the seek behaviour underneath the API as efficiently as possible. > A better approach to the seek problem is to provide a sequence of read locations as part of a single call, while letting the system schedule/plan the reads ahead of time. > This is exceedingly useful for seek-heavy readers on HDFS, since this allows for potentially optimizing away the seek-gaps within the FSDataInputStream implementation. > For seek+read systems with even more latency than locally-attached disks, something like a {{readFully(long[] offsets, ByteBuffer[] chunks)}} would take care of the seeks internally while reading chunk.remaining() bytes into each chunk (which may be {{slice()}}ed off a bigger buffer). > The base implementation can stub this in as a sequence of seeks + read() into ByteBuffers, without forcing each FS implementation to override this in any way. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
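For context, a JMH harness along these lines could produce a report in the shape shown above. This is a minimal sketch with invented class and field names, not the AsyncBenchmark source from the patch; the benchmark bodies are left as comments since the real ones depend on the patched API:
{code:java}
package org.apache.hadoop.benchmark;

import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

// Sketch of a JMH harness like the one described above. The @Param fields
// surface as the (bufferKind) and (fileSystemKind) columns in the report.
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@State(Scope.Thread)
public class VectoredReadSketch {
  @Param({"direct", "array"})
  public String bufferKind;

  @Param({"local", "raw"})
  public String fileSystemKind;

  @Benchmark
  public void syncRead() {
    // would seek + readFully each range sequentially on the chosen file system
  }

  @Benchmark
  public void asyncRead() {
    // would submit all ranges at once via the new API and join the futures
  }
}
{code}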
[jira] [Commented] (HADOOP-16214) Kerberos name implementation in Hadoop does not accept principals with more than two components
[ https://issues.apache.org/jira/browse/HADOOP-16214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16822274#comment-16822274 ] Owen O'Malley commented on HADOOP-16214: [~ibuenros] So you are mapping principals like "owen/pii/admin" to local accounts like "owen_pii_admin"? That would let you split permissions based on whether they want "pii" access and "admin" access. I assume the roles nest such that user/role1/role2@REALM means that role2 is a subset of role1 and that the user has both roles. Is that what you are doing? > Kerberos name implementation in Hadoop does not accept principals with more > than two components > --- > > Key: HADOOP-16214 > URL: https://issues.apache.org/jira/browse/HADOOP-16214 > Project: Hadoop Common > Issue Type: Bug > Components: auth >Reporter: Issac Buenrostro >Priority: Major > Attachments: Add-service-freeipa.png, HADOOP-16214.001.patch, > HADOOP-16214.002.patch, HADOOP-16214.003.patch, HADOOP-16214.004.patch, > HADOOP-16214.005.patch, HADOOP-16214.006.patch, HADOOP-16214.007.patch, > HADOOP-16214.008.patch, HADOOP-16214.009.patch, HADOOP-16214.010.patch, > HADOOP-16214.011.patch > > > org.apache.hadoop.security.authentication.util.KerberosName is in charge of > converting a Kerberos principal to a user name in Hadoop for all of the > services requiring authentication. > Although the Kerberos spec > ([https://web.mit.edu/kerberos/krb5-1.5/krb5-1.5.4/doc/krb5-user/What-is-a-Kerberos-Principal_003f.html]) > allows for an arbitrary number of components in the principal, the Hadoop > implementation will throw a "Malformed Kerberos name:" error if the principal > has more than two components (because the regex can only read serviceName and > hostName). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
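To illustrate the mapping being discussed: an auth_to_local rule over three components could produce the "owen_pii_admin" style of account. Note this is hypothetical, since stock KerberosName rejects three-component principals (that is the bug in this issue); the rule below assumes the proposed extension to $3 support and an invented EXAMPLE.COM realm:
{code:java}
import java.io.IOException;
import org.apache.hadoop.security.authentication.util.KerberosName;

public class RoleMappingSketch {
  public static void main(String[] args) throws IOException {
    // Hypothetical rule: format the three components as user_role1_role2,
    // then strip the realm. Only works if the parser accepts $3.
    KerberosName.setRules(
        "RULE:[3:$1_$2_$3@$0](.*@EXAMPLE\\.COM)s/@EXAMPLE\\.COM//\n" +
        "DEFAULT");
    // With stock Hadoop this constructor throws "Malformed Kerberos name".
    System.out.println(
        new KerberosName("owen/pii/admin@EXAMPLE.COM").getShortName());
  }
}
{code}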
[jira] [Commented] (HADOOP-16214) Kerberos name implementation in Hadoop does not accept principals with more than two components
[ https://issues.apache.org/jira/browse/HADOOP-16214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16822102#comment-16822102 ] Owen O'Malley commented on HADOOP-16214: [~xkrogen] can you clarify what your principals look like and what the intended semantics behind them are? The Hadoop usage of user@REALM and service/host@REALM is very much the standard, and I'd love to hear what your use case is like. That said, [~daryn]'s approach makes more sense: extending the parser to handle additional components without redefining the current behaviors is much better. Adding support for $3, $4, ... makes sense and won't break current systems. > Kerberos name implementation in Hadoop does not accept principals with more > than two components > --- > > Key: HADOOP-16214 > URL: https://issues.apache.org/jira/browse/HADOOP-16214 > Project: Hadoop Common > Issue Type: Bug > Components: auth >Reporter: Issac Buenrostro >Priority: Major > Attachments: HADOOP-16214.001.patch, HADOOP-16214.002.patch, > HADOOP-16214.003.patch, HADOOP-16214.004.patch, HADOOP-16214.005.patch, > HADOOP-16214.006.patch, HADOOP-16214.007.patch, HADOOP-16214.008.patch, > HADOOP-16214.009.patch, HADOOP-16214.010.patch, HADOOP-16214.011.patch > > > org.apache.hadoop.security.authentication.util.KerberosName is in charge of > converting a Kerberos principal to a user name in Hadoop for all of the > services requiring authentication. > Although the Kerberos spec > ([https://web.mit.edu/kerberos/krb5-1.5/krb5-1.5.4/doc/krb5-user/What-is-a-Kerberos-Principal_003f.html]) > allows for an arbitrary number of components in the principal, the Hadoop > implementation will throw a "Malformed Kerberos name:" error if the principal > has more than two components (because the regex can only read serviceName and > hostName). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-11867) FS API: Add a high-performance vectored Read to FSDataInputStream API
[ https://issues.apache.org/jira/browse/HADOOP-11867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16710718#comment-16710718 ] Owen O'Malley commented on HADOOP-11867: Currently, the implementation of PositionedReadable.readFully(long, byte[], int, int) locks the stream, so you won't process multiple reads in parallel without a specific implementation that makes things better. For the REST-based file systems, absolutely the goal is to convert it into a single read with multiple ranges in the request. I agree completely that implementing a prototype is a good first step, before locking down the exact semantics. My current thoughts:
* You have no guarantees about the order in which the results are returned.
* If the file system has mutable files, it is the application's responsibility to perform adequate locking prior to calling the read operations. (So yes, you get no guarantees about consistency of reads.) Since this case doesn't apply to the vast majority of users, I wouldn't want to complicate the API for it.
* Overlapping ranges are permitted.
* It is up to the file system whether the reads lock the stream. The current implementation does it because it uses seek/read/seek to implement the positioned read. We should allow implementations to do it more directly. I don't think there should be any guarantees about ordering between async reads or a sync read on the same stream.
* Since the future contains the FileRange they passed in, they could pass in an extension that tracks the additional information that they need.
> FS API: Add a high-performance vectored Read to FSDataInputStream API > - > > Key: HADOOP-11867 > URL: https://issues.apache.org/jira/browse/HADOOP-11867 > Project: Hadoop Common > Issue Type: New Feature > Components: hdfs-client >Affects Versions: 3.0.0 >Reporter: Gopal V >Assignee: Owen O'Malley >Priority: Major > Labels: performance > > The most significant way to read efficiently from a filesystem is to let the FileSystem implementation handle the seek behaviour underneath the API as efficiently as possible. > A better approach to the seek problem is to provide a sequence of read locations as part of a single call, while letting the system schedule/plan the reads ahead of time. > This is exceedingly useful for seek-heavy readers on HDFS, since this allows for potentially optimizing away the seek-gaps within the FSDataInputStream implementation. > For seek+read systems with even more latency than locally-attached disks, something like a {{readFully(long[] offsets, ByteBuffer[] chunks)}} would take care of the seeks internally while reading chunk.remaining() bytes into each chunk (which may be {{slice()}}ed off a bigger buffer). > The base implementation can stub this in as a sequence of seeks + read() into ByteBuffers, without forcing each FS implementation to override this in any way. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
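To make the last point above concrete: a caller could subclass the FileRange type proposed later in this thread to carry its own bookkeeping back through the futures. A minimal sketch against that proposed (never-committed-in-this-form) API; ColumnChunkRange and its fields are invented for illustration:
{code:java}
// Invented subclass of the proposed FileRange; columnId and stripeOffset
// stand in for whatever per-range state an application wants returned to it.
class ColumnChunkRange extends FileRange {
  final int columnId;
  final long stripeOffset;

  ColumnChunkRange(long offset, int length, int columnId, long stripeOffset) {
    super(offset, length, null);  // null buffer: let the FS allocate one
    this.columnId = columnId;
    this.stripeOffset = stripeOffset;
  }
}

// Later, each completed future hands back the caller's own range object:
//   CompletableFuture<FileRange>[] futures = stream.readAsync(ranges);
//   futures[0].thenAccept(r ->
//       handle(((ColumnChunkRange) r).columnId, r.buffer));
{code}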
[jira] [Commented] (HADOOP-15229) Add FileSystem builder-based openFile() API to match createFile()
[ https://issues.apache.org/jira/browse/HADOOP-15229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16707860#comment-16707860 ] Owen O'Malley commented on HADOOP-15229: My comment remains that having all of the options present at the open call seems problematic. Being able to change the properties of the stream later is more convenient. In particular, I really don't think we should build a convoluted callback structure just so that we can have the options available at creation. > Add FileSystem builder-based openFile() API to match createFile() > - > > Key: HADOOP-15229 > URL: https://issues.apache.org/jira/browse/HADOOP-15229 > Project: Hadoop Common > Issue Type: New Feature > Components: fs, fs/azure, fs/s3 >Affects Versions: 3.0.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Major > Attachments: HADOOP-15229-001.patch, HADOOP-15229-002.patch, > HADOOP-15229-003.patch, HADOOP-15229-004.patch, HADOOP-15229-004.patch, > HADOOP-15229-005.patch, HADOOP-15229-006.patch, HADOOP-15229-007.patch > > > Replicate HDFS-1170 and HADOOP-14365 with an API to open files. > A key requirement of this is not HDFS, it's to put in the fadvise policy for > working with object stores, where getting the decision to do a full GET and > TCP abort on seek vs smaller GETs is fundamentally different: the wrong > option can cost you minutes. S3A and Azure both have adaptive policies now > (first backward seek), but they still don't do it that well. > Columnar formats (ORC, Parquet) should be able to say "fs.input.fadvise" > "random" as an option when they open files; I can imagine other options too. > The Builder model of [~eddyxu] is the one to mimic, method for method. > Ideally with as much code reuse as possible -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HADOOP-15229) Add FileSystem builder-based openFile() API to match createFile()
[ https://issues.apache.org/jira/browse/HADOOP-15229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16707584#comment-16707584 ] Owen O'Malley edited comment on HADOOP-15229 at 12/3/18 5:44 PM: - [~ste...@apache.org] with the problems we're hitting, you're probably right that we need something more like HADOOP-11867. So we could define an interface for the input streams.
{code:java}
package org.apache.hadoop.fs;

public interface InputStreamOptions {
  /**
   * Test whether the given FileSystem supports the given option.
   */
  boolean supportsOption(String name);

  // repeat optional(...) and require(...) for long, double, and boolean values

  /**
   * Set an optional option to the specified value.
   */
  ReaderBuilder optional(String name, String value);

  /**
   * Set a required option to the specified value.
   */
  ReaderBuilder require(String name, String value);
}
{code}
Of course, we'd need default implementations that ignore optional and fail on require. Next, we make PositionedReadable extend InputStreamOptions, so that with an open stream you can change the options for that stream. Finally, we extend the RecordReader API to also extend InputStreamOptions and pass the options down to the underlying stream. Thoughts? was (Author: owen.omalley): [~ste...@apache.org] with the problems we're hitting, you're probably right that we need something more like HADOOP-11867. So we could define an interface for the input streams.
{code:java}
package org.apache.hadoop.fs;

public interface InputStreamOptions {
  /**
   * Test whether the given FileSystem supports the given option.
   */
  boolean supportsOption(String name);

  // repeat optional(...) and require(...) for long, double, and boolean values

  /**
   * Set an optional option to the specified value.
   */
  ReaderBuilder optional(String name, String value);

  /**
   * Set a required option to the specified value.
   */
  ReaderBuilder require(String name, String value);
}
{code:java}
Of course, we'd need default implementations that ignore optional and fail on require. Next, we make PositionedReadable extend InputStreamOptions, so that with an open stream you can change the options for that stream. Finally, we extend the RecordReader API to also extend InputStreamOptions and pass the options down to the underlying stream. Thoughts?
> Add FileSystem builder-based openFile() API to match createFile() > - > > Key: HADOOP-15229 > URL: https://issues.apache.org/jira/browse/HADOOP-15229 > Project: Hadoop Common > Issue Type: New Feature > Components: fs, fs/azure, fs/s3 >Affects Versions: 3.0.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Major > Attachments: HADOOP-15229-001.patch, HADOOP-15229-002.patch, > HADOOP-15229-003.patch, HADOOP-15229-004.patch, HADOOP-15229-004.patch, > HADOOP-15229-005.patch, HADOOP-15229-006.patch > > > Replicate HDFS-1170 and HADOOP-14365 with an API to open files. > A key requirement of this is not HDFS, it's to put in the fadvise policy for > working with object stores, where getting the decision to do a full GET and > TCP abort on seek vs smaller GETs is fundamentally different: the wrong > option can cost you minutes. S3A and Azure both have adaptive policies now > (first backward seek), but they still don't do it that well. > Columnar formats (ORC, Parquet) should be able to say "fs.input.fadvise" > "random" as an option when they open files; I can imagine other options too. > The Builder model of [~eddyxu] is the one to mimic, method for method. > Ideally with as much code reuse as possible -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15229) Add FileSystem builder-based openFile() API to match createFile()
[ https://issues.apache.org/jira/browse/HADOOP-15229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16707584#comment-16707584 ] Owen O'Malley commented on HADOOP-15229: [~ste...@apache.org] with the problems we're hitting, you're probably right that we need something more like HADOOP-11867. So we could define an interface for the input streams.
{code:java}
package org.apache.hadoop.fs;

public interface InputStreamOptions {
  /**
   * Test whether the given FileSystem supports the given option.
   */
  boolean supportsOption(String name);

  // repeat optional(...) and require(...) for long, double, and boolean values

  /**
   * Set an optional option to the specified value.
   */
  ReaderBuilder optional(String name, String value);

  /**
   * Set a required option to the specified value.
   */
  ReaderBuilder require(String name, String value);
}
{code}
Of course, we'd need default implementations that ignore optional and fail on require. Next, we make PositionedReadable extend InputStreamOptions, so that with an open stream you can change the options for that stream. Finally, we extend the RecordReader API to also extend InputStreamOptions and pass the options down to the underlying stream. Thoughts? > Add FileSystem builder-based openFile() API to match createFile() > - > > Key: HADOOP-15229 > URL: https://issues.apache.org/jira/browse/HADOOP-15229 > Project: Hadoop Common > Issue Type: New Feature > Components: fs, fs/azure, fs/s3 >Affects Versions: 3.0.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Major > Attachments: HADOOP-15229-001.patch, HADOOP-15229-002.patch, > HADOOP-15229-003.patch, HADOOP-15229-004.patch, HADOOP-15229-004.patch, > HADOOP-15229-005.patch, HADOOP-15229-006.patch > > > Replicate HDFS-1170 and HADOOP-14365 with an API to open files. > A key requirement of this is not HDFS, it's to put in the fadvise policy for > working with object stores, where getting the decision to do a full GET and > TCP abort on seek vs smaller GETs is fundamentally different: the wrong > option can cost you minutes. S3A and Azure both have adaptive policies now > (first backward seek), but they still don't do it that well. > Columnar formats (ORC, Parquet) should be able to say "fs.input.fadvise" > "random" as an option when they open files; I can imagine other options too. > The Builder model of [~eddyxu] is the one to mimic, method for method. > Ideally with as much code reuse as possible -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
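One possible shape for the "default implementations that ignore optional and fail on require" mentioned in the comment above, as a sketch. The class name and the thisBuilder() hook are invented; nothing here is committed code:
{code:java}
// Sketch of a base class for the proposed InputStreamOptions: unknown
// optional settings are silently dropped, required ones fail fast.
public abstract class DefaultInputStreamOptions implements InputStreamOptions {
  @Override
  public boolean supportsOption(String name) {
    return false;                 // streams override for options they honor
  }

  @Override
  public ReaderBuilder optional(String name, String value) {
    return thisBuilder();         // ignore unsupported optional options
  }

  @Override
  public ReaderBuilder require(String name, String value) {
    throw new IllegalArgumentException("Unsupported required option: " + name);
  }

  /** Invented hook so subclasses can return themselves for chaining. */
  protected abstract ReaderBuilder thisBuilder();
}
{code}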
[jira] [Commented] (HADOOP-11867) FS API: Add a high-performance vectored Read to FSDataInputStream API
[ https://issues.apache.org/jira/browse/HADOOP-11867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16705211#comment-16705211 ] Owen O'Malley commented on HADOOP-11867: Ok, looking at this deeper, I'd suggest that we add readAsync to PositionedReadable. That implies it is in FSDataInputStream as well as FSInputStream. > FS API: Add a high-performance vectored Read to FSDataInputStream API > - > > Key: HADOOP-11867 > URL: https://issues.apache.org/jira/browse/HADOOP-11867 > Project: Hadoop Common > Issue Type: New Feature > Components: hdfs-client >Affects Versions: 3.0.0 >Reporter: Gopal V >Assignee: Owen O'Malley >Priority: Major > Labels: performance > > The most significant way to read efficiently from a filesystem is to let the FileSystem implementation handle the seek behaviour underneath the API as efficiently as possible. > A better approach to the seek problem is to provide a sequence of read locations as part of a single call, while letting the system schedule/plan the reads ahead of time. > This is exceedingly useful for seek-heavy readers on HDFS, since this allows for potentially optimizing away the seek-gaps within the FSDataInputStream implementation. > For seek+read systems with even more latency than locally-attached disks, something like a {{readFully(long[] offsets, ByteBuffer[] chunks)}} would take care of the seeks internally while reading chunk.remaining() bytes into each chunk (which may be {{slice()}}ed off a bigger buffer). > The base implementation can stub this in as a sequence of seeks + read() into ByteBuffers, without forcing each FS implementation to override this in any way. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HADOOP-11867) FS API: Add a high-performance vectored Read to FSDataInputStream API
[ https://issues.apache.org/jira/browse/HADOOP-11867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16705141#comment-16705141 ] Owen O'Malley edited comment on HADOOP-11867 at 11/30/18 7:28 PM: -- I'd like to propose the following API:
{code:java}
package org.apache.hadoop.fs;

/**
 * A range of bytes from a file.
 */
public class FileRange {
  public final long offset;
  public final int length; // max length is 2^31 because of Java arrays
  public ByteBuffer buffer;

  public FileRange(long offset, int length, ByteBuffer buffer) { ... }
}

public class FSDataInputStream ... {
  ...
  /**
   * Perform an asynchronous read of the file with multiple ranges. This call
   * will return immediately and return futures that will contain the data
   * once it is read. The order of the physical reads is an implementation
   * detail of this method. Multiple requests may be converted into a single
   * read.
   *
   * If any ranges do not have a buffer, an array-based one of the appropriate
   * size will be created for it.
   * @param ranges the list of disk ranges to read
   * @return for each range, the future filled byte buffer will be returned.
   * @throws IOException if the file is not available
   */
  public CompletableFuture<FileRange>[] readAsync(List<FileRange> ranges
                                                  ) throws IOException {
    ...
  }
  ...
}
{code}
FSDataInputStream will have a default implementation, but file systems will be able to create a more optimized solution for their files. Thoughts? was (Author: owen.omalley): I'd like to propose the following API:
{code:java}
package org.apache.hadoop.fs;

/**
 * A range of bytes from a file.
 */
public class FileRange {
  public final long offset;
  public final int length; // max length is 2^31 because of Java arrays
  public ByteBuffer buffer;

  public DiskRange(long offset, int length, ByteBuffer buffer) {
    this.offset = offset;
    this.length = length;
    this.buffer = buffer;
  }
}

public class FSDataInputStream ... {
  ...
  /**
   * Perform an asynchronous read of the file with multiple ranges. This call
   * will return immediately and return futures that will contain the data
   * once it is read. The order of the physical reads is an implementation
   * detail of this method. Multiple requests may be converted into a single
   * read.
   *
   * If any ranges do not have a buffer, an array-based one of the appropriate
   * size will be created for it.
   * @param ranges the list of disk ranges to read
   * @return for each range, the future filled byte buffer will be returned.
   * @throws IOException if the file is not available
   */
  public CompletableFuture<FileRange>[] readAsync(List<FileRange> ranges
                                                  ) throws IOException {
    ...
  }
  ...
}
{code}
FSDataInputStream will have a default implementation, but file systems will be able to create a more optimized solution for their files. Thoughts?
> FS API: Add a high-performance vectored Read to FSDataInputStream API > - > > Key: HADOOP-11867 > URL: https://issues.apache.org/jira/browse/HADOOP-11867 > Project: Hadoop Common > Issue Type: New Feature > Components: hdfs-client >Affects Versions: 3.0.0 >Reporter: Gopal V >Assignee: Owen O'Malley >Priority: Major > Labels: performance > > The most significant way to read efficiently from a filesystem is to let the FileSystem implementation handle the seek behaviour underneath the API as efficiently as possible. > A better approach to the seek problem is to provide a sequence of read locations as part of a single call, while letting the system schedule/plan the reads ahead of time. > This is exceedingly useful for seek-heavy readers on HDFS, since this allows for potentially optimizing away the seek-gaps within the FSDataInputStream implementation. > For seek+read systems with even more latency than locally-attached disks, something like a {{readFully(long[] offsets, ByteBuffer[] chunks)}} would take care of the seeks internally while reading chunk.remaining() bytes into each chunk (which may be {{slice()}}ed off a bigger buffer). > The base implementation can stub this in as a sequence of seeks + read() into ByteBuffers, without forcing each FS implementation to override this in any way. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HADOOP-11867) FS API: Add a high-performance vectored Read to FSDataInputStream API
[ https://issues.apache.org/jira/browse/HADOOP-11867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16705141#comment-16705141 ] Owen O'Malley edited comment on HADOOP-11867 at 11/30/18 7:23 PM: -- I'd like to propose the following API:
{code:java}
package org.apache.hadoop.fs;

/**
 * A range of bytes from a file.
 */
public class FileRange {
  public final long offset;
  public final int length; // max length is 2^31 because of Java arrays
  public ByteBuffer buffer;

  public DiskRange(long offset, int length, ByteBuffer buffer) {
    this.offset = offset;
    this.length = length;
    this.buffer = buffer;
  }
}

public class FSDataInputStream ... {
  ...
  /**
   * Perform an asynchronous read of the file with multiple ranges. This call
   * will return immediately and return futures that will contain the data
   * once it is read. The order of the physical reads is an implementation
   * detail of this method. Multiple requests may be converted into a single
   * read.
   *
   * If any ranges do not have a buffer, an array-based one of the appropriate
   * size will be created for it.
   * @param ranges the list of disk ranges to read
   * @return for each range, the future filled byte buffer will be returned.
   * @throws IOException if the file is not available
   */
  public CompletableFuture<FileRange>[] readAsync(List<FileRange> ranges
                                                  ) throws IOException {
    ...
  }
  ...
}
{code}
FSDataInputStream will have a default implementation, but file systems will be able to create a more optimized solution for their files. Thoughts? was (Author: owen.omalley): I'd like to propose the following API:
{code:java}
package org.apache.hadoop.fs;

/**
 * A range of bytes from a file.
 */
public class FileRange {
  public final long offset;
  public final int length; // max length is 2^31 because of Java arrays
  public ByteBuffer buffer;

  public DiskRange(long offset, int length, ByteBuffer buffer) {
    this.offset = offset;
    this.length = length;
    this.buffer = buffer;
  }
}

public class FSDataInputStream ... {
  ...
  /**
   * Perform an asynchronous read of the file with multiple ranges. This call
   * will return immediately and return futures that will contain the data
   * once it is read. The order of the physical reads is an implementation
   * detail of this method. Multiple requests may be converted into a single
   * read.
   *
   * If any ranges do not have a buffer, an array-based one of the appropriate
   * size will be created for it.
   * @param ranges the list of disk ranges to read
   * @return for each range, the future filled byte buffer will be returned.
   * @throws IOException if the file is not available
   */
  public CompletableFuture<FileRange>[] readAsync(List<FileRange> ranges
                                                  ) throws IOException {
    ...
  }
  ...
}
{code}
FSDataInputStream will have a default implementation, but file systems will be able to create a more optimized solution for their files. Thoughts?
> FS API: Add a high-performance vectored Read to FSDataInputStream API > - > > Key: HADOOP-11867 > URL: https://issues.apache.org/jira/browse/HADOOP-11867 > Project: Hadoop Common > Issue Type: New Feature > Components: hdfs-client >Affects Versions: 3.0.0 >Reporter: Gopal V >Assignee: Owen O'Malley >Priority: Major > Labels: performance > > The most significant way to read efficiently from a filesystem is to let the FileSystem implementation handle the seek behaviour underneath the API as efficiently as possible. > A better approach to the seek problem is to provide a sequence of read locations as part of a single call, while letting the system schedule/plan the reads ahead of time. > This is exceedingly useful for seek-heavy readers on HDFS, since this allows for potentially optimizing away the seek-gaps within the FSDataInputStream implementation. > For seek+read systems with even more latency than locally-attached disks, something like a {{readFully(long[] offsets, ByteBuffer[] chunks)}} would take care of the seeks internally while reading chunk.remaining() bytes into each chunk (which may be {{slice()}}ed off a bigger buffer). > The base implementation can stub this in as a sequence of seeks + read() into ByteBuffers, without forcing each FS implementation to override this in any way. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-11867) FS API: Add a high-performance vectored Read to FSDataInputStream API
[ https://issues.apache.org/jira/browse/HADOOP-11867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16705141#comment-16705141 ] Owen O'Malley commented on HADOOP-11867: I'd like to propose the following API:
{code:java}
package org.apache.hadoop.fs;

/**
 * A range of bytes from a file.
 */
public class FileRange {
  public final long offset;
  public final int length; // max length is 2^31 because of Java arrays
  public ByteBuffer buffer;

  public DiskRange(long offset, int length, ByteBuffer buffer) {
    this.offset = offset;
    this.length = length;
    this.buffer = buffer;
  }
}

public class FSDataInputStream ... {
  ...
  /**
   * Perform an asynchronous read of the file with multiple ranges. This call
   * will return immediately and return futures that will contain the data
   * once it is read. The order of the physical reads is an implementation
   * detail of this method. Multiple requests may be converted into a single
   * read.
   *
   * If any ranges do not have a buffer, an array-based one of the appropriate
   * size will be created for it.
   * @param ranges the list of disk ranges to read
   * @return for each range, the future filled byte buffer will be returned.
   * @throws IOException if the file is not available
   */
  public CompletableFuture<FileRange>[] readAsync(List<FileRange> ranges
                                                  ) throws IOException {
    ...
  }
  ...
}
{code}
FSDataInputStream will have a default implementation, but file systems will be able to create a more optimized solution for their files. Thoughts? > FS API: Add a high-performance vectored Read to FSDataInputStream API > - > > Key: HADOOP-11867 > URL: https://issues.apache.org/jira/browse/HADOOP-11867 > Project: Hadoop Common > Issue Type: New Feature > Components: hdfs-client >Affects Versions: 3.0.0 >Reporter: Gopal V >Assignee: Owen O'Malley >Priority: Major > Labels: performance > > The most significant way to read efficiently from a filesystem is to let the FileSystem implementation handle the seek behaviour underneath the API as efficiently as possible. > A better approach to the seek problem is to provide a sequence of read locations as part of a single call, while letting the system schedule/plan the reads ahead of time. > This is exceedingly useful for seek-heavy readers on HDFS, since this allows for potentially optimizing away the seek-gaps within the FSDataInputStream implementation. > For seek+read systems with even more latency than locally-attached disks, something like a {{readFully(long[] offsets, ByteBuffer[] chunks)}} would take care of the seeks internally while reading chunk.remaining() bytes into each chunk (which may be {{slice()}}ed off a bigger buffer). > The base implementation can stub this in as a sequence of seeks + read() into ByteBuffers, without forcing each FS implementation to override this in any way. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Assigned] (HADOOP-11867) FS API: Add a high-performance vectored Read to FSDataInputStream API
[ https://issues.apache.org/jira/browse/HADOOP-11867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley reassigned HADOOP-11867: -- Assignee: Owen O'Malley (was: Gopal V) > FS API: Add a high-performance vectored Read to FSDataInputStream API > - > > Key: HADOOP-11867 > URL: https://issues.apache.org/jira/browse/HADOOP-11867 > Project: Hadoop Common > Issue Type: New Feature > Components: hdfs-client >Affects Versions: 3.0.0 >Reporter: Gopal V >Assignee: Owen O'Malley >Priority: Major > Labels: performance > > The most significant way to read efficiently from a filesystem is to let the FileSystem implementation handle the seek behaviour underneath the API as efficiently as possible. > A better approach to the seek problem is to provide a sequence of read locations as part of a single call, while letting the system schedule/plan the reads ahead of time. > This is exceedingly useful for seek-heavy readers on HDFS, since this allows for potentially optimizing away the seek-gaps within the FSDataInputStream implementation. > For seek+read systems with even more latency than locally-attached disks, something like a {{readFully(long[] offsets, ByteBuffer[] chunks)}} would take care of the seeks internally while reading chunk.remaining() bytes into each chunk (which may be {{slice()}}ed off a bigger buffer). > The base implementation can stub this in as a sequence of seeks + read() into ByteBuffers, without forcing each FS implementation to override this in any way. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15229) Add FileSystem builder-based openFile() API to match createFile()
[ https://issues.apache.org/jira/browse/HADOOP-15229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16702425#comment-16702425 ] Owen O'Malley commented on HADOOP-15229: [~ste...@apache.org] using interfaces leaves us a lot more choices going forward. The only point to be cautious of is that you can't replace classes with interfaces or interfaces with classes; either way breaks binary compatibility. Yes, I'd put the public interfaces into o.a.hadoop.fs. For the implementation stuff, I'd prefer that we put it into an impl package. It just makes it easier to keep the javadoc from getting polluted with classes that are "java public", but not intended for users. > Add FileSystem builder-based openFile() API to match createFile() > - > > Key: HADOOP-15229 > URL: https://issues.apache.org/jira/browse/HADOOP-15229 > Project: Hadoop Common > Issue Type: New Feature > Components: fs, fs/azure, fs/s3 >Affects Versions: 3.0.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Major > Attachments: HADOOP-15229-001.patch, HADOOP-15229-002.patch, > HADOOP-15229-003.patch, HADOOP-15229-004.patch, HADOOP-15229-004.patch, > HADOOP-15229-005.patch, HADOOP-15229-006.patch > > > Replicate HDFS-1170 and HADOOP-14365 with an API to open files. > A key requirement of this is not HDFS, it's to put in the fadvise policy for > working with object stores, where getting the decision to do a full GET and > TCP abort on seek vs smaller GETs is fundamentally different: the wrong > option can cost you minutes. S3A and Azure both have adaptive policies now > (first backward seek), but they still don't do it that well. > Columnar formats (ORC, Parquet) should be able to say "fs.input.fadvise" > "random" as an option when they open files; I can imagine other options too. > The Builder model of [~eddyxu] is the one to mimic, method for method. > Ideally with as much code reuse as possible -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15229) Add FileSystem builder-based openFile() API to match createFile()
[ https://issues.apache.org/jira/browse/HADOOP-15229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700839#comment-16700839 ] Owen O'Malley commented on HADOOP-15229: Sorry for coming into this late and for bike shedding. *grin* I'd suggest that the main builder API be an interface rather than an abstract class. Maybe something like:
{code:java}
public interface ReaderBuilder {
  /**
   * Test whether the given FileSystem supports the given option.
   */
  boolean supportsOption(String name);

  // repeat optional(...) and require(...) for long, double, and boolean values

  /**
   * Set an optional option to the specified value.
   */
  ReaderBuilder optional(String name, String value);

  /**
   * Set a required option to the specified value.
   */
  ReaderBuilder require(String name, String value);

  /**
   * Use the options to build the stream.
   */
  FSDataInputStream build() throws IOException;
}

public abstract class FileSystem ... {
  ...
  ReaderBuilder openWithOptions(Path filename) throws IOException;
}
{code}
This minimizes the public API changes and exposure. I think such an option builder could start as public, evolving. > Add FileSystem builder-based openFile() API to match createFile() > - > > Key: HADOOP-15229 > URL: https://issues.apache.org/jira/browse/HADOOP-15229 > Project: Hadoop Common > Issue Type: New Feature > Components: fs, fs/azure, fs/s3 >Affects Versions: 3.0.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Major > Attachments: HADOOP-15229-001.patch, HADOOP-15229-002.patch, > HADOOP-15229-003.patch, HADOOP-15229-004.patch, HADOOP-15229-004.patch, > HADOOP-15229-005.patch, HADOOP-15229-006.patch > > > Replicate HDFS-1170 and HADOOP-14365 with an API to open files. > A key requirement of this is not HDFS, it's to put in the fadvise policy for > working with object stores, where getting the decision to do a full GET and > TCP abort on seek vs smaller GETs is fundamentally different: the wrong > option can cost you minutes. S3A and Azure both have adaptive policies now > (first backward seek), but they still don't do it that well. > Columnar formats (ORC, Parquet) should be able to say "fs.input.fadvise" > "random" as an option when they open files; I can imagine other options too. > The Builder model of [~eddyxu] is the one to mimic, method for method. > Ideally with as much code reuse as possible -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
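For illustration, caller code under this proposal might read as follows. This is a hypothetical usage of the sketched builder, not a committed Hadoop API; "fs.input.fadvise" comes from the issue description, while "fs.input.buffersize" is an invented option name:
{code:java}
// Hypothetical caller of the proposed openWithOptions() builder.
FSDataInputStream in = fs.openWithOptions(new Path("/data/part-00000.orc"))
    .optional("fs.input.fadvise", "random")    // hint; ignored if unsupported
    .require("fs.input.buffersize", "262144")  // invented option; fails if unsupported
    .build();
{code}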
[jira] [Closed] (HADOOP-15896) Refine Kerberos based AuthenticationHandler to check proxyuser ACL
[ https://issues.apache.org/jira/browse/HADOOP-15896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley closed HADOOP-15896. -- > Refine Kerberos based AuthenticationHandler to check proxyuser ACL > -- > > Key: HADOOP-15896 > URL: https://issues.apache.org/jira/browse/HADOOP-15896 > Project: Hadoop Common > Issue Type: Bug > Components: security >Affects Versions: 2.8.0, 3.0.0-alpha1 >Reporter: Eric Yang >Assignee: Larry McCay >Priority: Major > > JWTRedirectAuthenticationHandler is based on KerberosAuthenticationHandler, and the authentication method in KerberosAuthenticationHandler basically does this:
> {code}
> String clientPrincipal = gssContext.getSrcName().toString();
> KerberosName kerberosName = new KerberosName(clientPrincipal);
> String userName = kerberosName.getShortName();
> token = new AuthenticationToken(userName, clientPrincipal, getType());
> response.setStatus(HttpServletResponse.SC_OK);
> LOG.trace("SPNEGO completed for client principal [{}]", clientPrincipal);
> {code}
> It obtains the short name of the client principal and responds OK. This is fine for verifying an end user. However, in the proxy user case (knox), this authentication is insufficient because the knox principal name is knox/host1.example@example.com . KerberosAuthenticationHandler will gladly confirm that knox is knox, even if knox/host1.example@example.com is used from the botnet.rogueresearchlab.tld host. KerberosAuthenticationHandler may not need to change if it does not plan to support proxies and ignores the instance name of the kerberos principal. > For JWTRedirectAuthenticationHandler, which is designed for the proxy use case, it should check that the remote host matches the clientPrincipal instance name; without this check, Kerberos is vulnerable. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Resolved] (HADOOP-15896) Refine Kerberos based AuthenticationHandler to check proxyuser ACL
[ https://issues.apache.org/jira/browse/HADOOP-15896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley resolved HADOOP-15896. Resolution: Not A Problem This is working correctly. Do not attempt to change this behavior. > Refine Kerberos based AuthenticationHandler to check proxyuser ACL > -- > > Key: HADOOP-15896 > URL: https://issues.apache.org/jira/browse/HADOOP-15896 > Project: Hadoop Common > Issue Type: Bug > Components: security >Affects Versions: 2.8.0, 3.0.0-alpha1 >Reporter: Eric Yang >Assignee: Larry McCay >Priority: Major > > JWTRedirectAuthenticationHandler is based on KerberosAuthenticationHandler, and the authentication method in KerberosAuthenticationHandler basically does this:
> {code}
> String clientPrincipal = gssContext.getSrcName().toString();
> KerberosName kerberosName = new KerberosName(clientPrincipal);
> String userName = kerberosName.getShortName();
> token = new AuthenticationToken(userName, clientPrincipal, getType());
> response.setStatus(HttpServletResponse.SC_OK);
> LOG.trace("SPNEGO completed for client principal [{}]", clientPrincipal);
> {code}
> It obtains the short name of the client principal and responds OK. This is fine for verifying an end user. However, in the proxy user case (knox), this authentication is insufficient because the knox principal name is knox/host1.example@example.com . KerberosAuthenticationHandler will gladly confirm that knox is knox, even if knox/host1.example@example.com is used from the botnet.rogueresearchlab.tld host. KerberosAuthenticationHandler may not need to change if it does not plan to support proxies and ignores the instance name of the kerberos principal. > For JWTRedirectAuthenticationHandler, which is designed for the proxy use case, it should check that the remote host matches the clientPrincipal instance name; without this check, Kerberos is vulnerable. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15518) Authentication filter calling handler after request already authenticated
[ https://issues.apache.org/jira/browse/HADOOP-15518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16506425#comment-16506425 ] Owen O'Malley commented on HADOOP-15518: This looks good, [~kminder]. +1 > Authentication filter calling handler after request already authenticated > - > > Key: HADOOP-15518 > URL: https://issues.apache.org/jira/browse/HADOOP-15518 > Project: Hadoop Common > Issue Type: Bug > Components: security >Affects Versions: 2.7.1 >Reporter: Kevin Minder >Assignee: Kevin Minder >Priority: Major > Attachments: HADOOP-15518-001.patch > > > The hadoop-auth AuthenticationFilter will invoke its handler even if a prior successful authentication has occurred in the current request. This primarily affects situations where multiple authentication mechanisms have been configured. For example, when core-site.xml has hadoop.http.authentication.type=kerberos and yarn-site.xml has yarn.timeline-service.http-authentication.type=kerberos, the result is an attempt to perform two Kerberos authentications for the same request. This in turn results in Kerberos triggering replay attack detection. The javadocs for AuthenticationHandler ([https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-auth/src/main/java/org/apache/hadoop/security/authentication/server/AuthenticationHandler.java)] indicate for the authenticate method that {quote}This method is invoked by the AuthenticationFilter only if the HTTP client request is not yet authenticated. {quote} This does not appear to be the case in practice. > I've created a patch and tested it on a limited number of functional use cases (e.g. the timeline-service issue noted above). If there is general agreement that the change is valid I'll add unit tests to the patch. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
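The essence of the fix under discussion, sketched from the javadoc contract quoted above. This is a simplified illustration, not the attached patch; the real doFilter logic also handles token expiry and error paths:
{code:java}
// Inside AuthenticationFilter#doFilter, sketched: consult the authentication
// handler only when no prior authentication produced a valid token for this
// request, so a second configured mechanism does not re-run Kerberos.
AuthenticationToken token = getToken(httpRequest);  // e.g. from the signed auth cookie
if (token == null || token == AuthenticationToken.ANONYMOUS) {
  token = authHandler.authenticate(httpRequest, httpResponse);
}
{code}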
[jira] [Updated] (HADOOP-13434) Add quoting to Shell class
[ https://issues.apache.org/jira/browse/HADOOP-13434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HADOOP-13434: --- Attachment: HADOOP-13434.patch I made some tweaks to make checkstyle happy. > Add quoting to Shell class > -- > > Key: HADOOP-13434 > URL: https://issues.apache.org/jira/browse/HADOOP-13434 > Project: Hadoop Common > Issue Type: Bug >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Attachments: HADOOP-13434.patch, HADOOP-13434.patch, > HADOOP-13434.patch > > > The Shell class makes assumptions that the parameters won't have spaces or > other special characters, even when it invokes bash. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-13434) Add quoting to Shell class
[ https://issues.apache.org/jira/browse/HADOOP-13434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HADOOP-13434: --- Status: Patch Available (was: Open) > Add quoting to Shell class > -- > > Key: HADOOP-13434 > URL: https://issues.apache.org/jira/browse/HADOOP-13434 > Project: Hadoop Common > Issue Type: Bug >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Attachments: HADOOP-13434.patch, HADOOP-13434.patch > > > The Shell class makes assumptions that the parameters won't have spaces or > other special characters, even when it invokes bash. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-13434) Add quoting to Shell class
[ https://issues.apache.org/jira/browse/HADOOP-13434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HADOOP-13434: --- Attachment: HADOOP-13434.patch This iteration removes the use of bash from the cases where it wasn't necessary. > Add quoting to Shell class > -- > > Key: HADOOP-13434 > URL: https://issues.apache.org/jira/browse/HADOOP-13434 > Project: Hadoop Common > Issue Type: Bug >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Attachments: HADOOP-13434.patch, HADOOP-13434.patch > > > The Shell class makes assumptions that the parameters won't have spaces or > other special characters, even when it invokes bash. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Issue Comment Deleted] (HADOOP-13434) Add quoting to Shell class
[ https://issues.apache.org/jira/browse/HADOOP-13434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HADOOP-13434: --- Comment: was deleted (was: GitHub user omalley opened a pull request: https://github.com/apache/hadoop/pull/118 HADOOP-13434. Add bash quoting to Shell class. The remainder of the deleted comment was the pull request's unrelated branch commit log.)
[jira] [Issue Comment Deleted] (HADOOP-13434) Add quoting to Shell class
[ https://issues.apache.org/jira/browse/HADOOP-13434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HADOOP-13434: --- Comment: was deleted (was: Github user omalley closed the pull request at: https://github.com/apache/hadoop/pull/118 ) > Add quoting to Shell class > -- > > Key: HADOOP-13434 > URL: https://issues.apache.org/jira/browse/HADOOP-13434 > Project: Hadoop Common > Issue Type: Bug >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Attachments: HADOOP-13434.patch > > > The Shell class makes assumptions that the parameters won't have spaces or > other special characters, even when it invokes bash. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-13434) Add quoting to Shell class
[ https://issues.apache.org/jira/browse/HADOOP-13434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HADOOP-13434: --- Attachment: HADOOP-13434.patch This patch adds quoting to the Shell class on each command that invokes bash with string parameters. > Add quoting to Shell class > -- > > Key: HADOOP-13434 > URL: https://issues.apache.org/jira/browse/HADOOP-13434 > Project: Hadoop Common > Issue Type: Bug >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Attachments: HADOOP-13434.patch > > > The Shell class makes assumptions that the parameters won't have spaces or > other special characters, even when it invokes bash. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
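The usual recipe for bash-safe quoting is to wrap each argument in single quotes and rewrite every embedded single quote so bash reassembles the original string. A minimal sketch of such a helper; treat the method name and placement as illustrative rather than the exact contents of the attached patch:
{code:java}
// Sketch of a bash-quoting helper: wrap the argument in single quotes and
// rewrite each embedded ' as '\'' (close quote, escaped quote, reopen quote),
// so e.g. don't becomes 'don'\''t'.
public static String bashQuote(String arg) {
  StringBuilder buffer = new StringBuilder(arg.length() + 2);
  buffer.append('\'')
        .append(arg.replace("'", "'\\''"))
        .append('\'');
  return buffer.toString();
}
{code}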
[jira] [Created] (HADOOP-13434) Add quoting to Shell class
Owen O'Malley created HADOOP-13434: -- Summary: Add quoting to Shell class Key: HADOOP-13434 URL: https://issues.apache.org/jira/browse/HADOOP-13434 Project: Hadoop Common Issue Type: Bug Reporter: Owen O'Malley Assignee: Owen O'Malley The Shell class makes assumptions that the parameters won't have spaces or other special characters, even when it invokes bash. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-12743) Fix git environment check during test-patch
[ https://issues.apache.org/jira/browse/HADOOP-12743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15118278#comment-15118278 ] Owen O'Malley commented on HADOOP-12743: +1, looks good to me > Fix git environment check during test-patch > --- > > Key: HADOOP-12743 > URL: https://issues.apache.org/jira/browse/HADOOP-12743 > Project: Hadoop Common > Issue Type: Bug >Reporter: Ray Chiang >Assignee: Allen Wittenauer > Attachments: HADOOP-12743.00.patch > > > I'm seeing this error when running test-patch in Apache Hadoop: > {quote} > > > Confirming git environment > > > ERROR: /Users/rchiang/Dev_01/ah_01/patchprocess is not a git repo. > > > {quote} > From a follow-up email: > it's a trivial bug in the yetus-wrapper. Missing a popd after extraction. The simple fix is to run the command twice (since the short circuit kicks in after yetus is cached). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-12494) fetchdt stores the token based on token kind instead of token service
[ https://issues.apache.org/jira/browse/HADOOP-12494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977027#comment-14977027 ] Owen O'Malley commented on HADOOP-12494: +1 > fetchdt stores the token based on token kind instead of token service > - > > Key: HADOOP-12494 > URL: https://issues.apache.org/jira/browse/HADOOP-12494 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: HeeSoo Kim >Assignee: HeeSoo Kim > Fix For: 3.0.0 > > Attachments: HADOOP-12494, HADOOP-12494.patch > > > The fetchdt command stores the token in a file. However, the key of token is > a token kind instead of a token service. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
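The distinction matters because Credentials keys stored tokens by an alias, and two tokens of the same kind (for example, HDFS delegation tokens from two different namenodes) share a kind but not a service. A hedged sketch of the intended keying, assuming the standard Token/Credentials API (the method and its argument are illustrative):

{code:java}
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.token.Token;

class TokenStoreSketch {
  static Credentials keyByService(Iterable<Token<?>> fetchedTokens) {
    Credentials creds = new Credentials();
    for (Token<?> token : fetchedTokens) {
      // Key by service (unique per target cluster) rather than kind
      // (shared by every token of one type), so tokens of the same
      // kind do not overwrite each other in the file.
      creds.addToken(token.getService(), token);
    }
    return creds;
  }
}
{code}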
[jira] [Commented] (HADOOP-3315) New binary file format
[ https://issues.apache.org/jira/browse/HADOOP-3315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14746809#comment-14746809 ] Owen O'Malley commented on HADOOP-3315: --- TFile never got that much traction. You really should use ORC or another high-performance, self-describing columnar format. See http://orc.apache.org/ > New binary file format > -- > > Key: HADOOP-3315 > URL: https://issues.apache.org/jira/browse/HADOOP-3315 > Project: Hadoop Common > Issue Type: New Feature > Components: io >Reporter: Owen O'Malley >Assignee: Hong Tang > Fix For: 0.20.1 > > Attachments: HADOOP-3315_20080908_TFILE_PREVIEW_WITH_LZO_TESTS.patch, > HADOOP-3315_20080915_TFILE.patch, TFile Specification 20081217.pdf, > hadoop-3315-0507.patch, hadoop-3315-0509-2.patch, hadoop-3315-0509.patch, > hadoop-3315-0513.patch, hadoop-3315-0514.patch, hadoop-3315-0601.patch, > hadoop-3315-0602.patch, hadoop-3315-0605.patch, hadoop-3315-0612.patch, > hadoop-3315-0623-2.patch, hadoop-3315-0701-yhadoop-20.patch, > hadoop-3315-0710-1-hadoop-20.patch, hadoop-trunk-tfile.patch, > hadoop-trunk-tfile.patch > > > SequenceFile's block compression format is too complex and requires 4 codecs > to compress or decompress. It would be good to have a file format that only > needs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
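For anyone following the pointer, a minimal sketch of writing an ORC file with the core Java writer API described at orc.apache.org (the file name and schema here are illustrative):

{code:java}
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
import org.apache.orc.OrcFile;
import org.apache.orc.TypeDescription;
import org.apache.orc.Writer;

public class OrcWriteSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // The schema is self-describing and travels with the file.
    TypeDescription schema =
        TypeDescription.fromString("struct<name:string,age:int>");
    Writer writer = OrcFile.createWriter(new Path("people.orc"),
        OrcFile.writerOptions(conf).setSchema(schema));
    VectorizedRowBatch batch = schema.createRowBatch();
    BytesColumnVector name = (BytesColumnVector) batch.cols[0];
    LongColumnVector age = (LongColumnVector) batch.cols[1];
    int row = batch.size++;           // claim one row in the batch
    name.setVal(row, "alice".getBytes(StandardCharsets.UTF_8));
    age.vector[row] = 42;
    writer.addRowBatch(batch);
    writer.close();
  }
}
{code}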
[jira] [Commented] (HADOOP-12358) FSShell should prompt before deleting directories bigger than a configured size
[ https://issues.apache.org/jira/browse/HADOOP-12358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720288#comment-14720288 ] Owen O'Malley commented on HADOOP-12358: I agree with Allen. This is a bad feature that will break lots of users. The trash feature already does this better and because it has been used for many years, is expected behavior. > FSShell should prompt before deleting directories bigger than a configured > size > --- > > Key: HADOOP-12358 > URL: https://issues.apache.org/jira/browse/HADOOP-12358 > Project: Hadoop Common > Issue Type: Bug > Components: fs >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao > Attachments: HADOOP-12358.00.patch, HADOOP-12358.01.patch, > HADOOP-12358.02.patch, HADOOP-12358.03.patch > > > We have seen many cases with customers deleting data inadvertently with > -skipTrash. The FSShell should prompt user if the size of the data or the > number of files being deleted is bigger than a threshold even though > -skipTrash is being used. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-10854) unit tests for the shell scripts
[ https://issues.apache.org/jira/browse/HADOOP-10854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14649827#comment-14649827 ] Owen O'Malley commented on HADOOP-10854: +1 looks good, Allen. > unit tests for the shell scripts > > > Key: HADOOP-10854 > URL: https://issues.apache.org/jira/browse/HADOOP-10854 > Project: Hadoop Common > Issue Type: Test > Components: scripts >Affects Versions: 3.0.0 >Reporter: Allen Wittenauer >Assignee: Allen Wittenauer > Labels: scripts > Attachments: HADOOP-10854.00.patch, HADOOP-10854.01.patch, > HADOOP-10854.02.patch, HADOOP-10854.03.patch, HADOOP-10854.04.patch, > HADOOP-10854.05.patch > > > With HADOOP-9902 moving a lot of the core functionality to functions, we > should build some unit tests for them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-12249) pull argument parsing into a function
[ https://issues.apache.org/jira/browse/HADOOP-12249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14649821#comment-14649821 ] Owen O'Malley commented on HADOOP-12249: This looks good to me, Allen. +1 > pull argument parsing into a function > - > > Key: HADOOP-12249 > URL: https://issues.apache.org/jira/browse/HADOOP-12249 > Project: Hadoop Common > Issue Type: Improvement > Components: scripts >Affects Versions: 3.0.0 >Reporter: Allen Wittenauer >Assignee: Allen Wittenauer > Labels: scripts, shell > Attachments: HADOOP-12249.00.patch, HADOOP-12294.01.patch > > > In order to enable significantly better unit testing as well as enhanced > functionality, large portions of *-config.sh should be pulled into functions. > See first comment for more. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (HADOOP-11902) Prune old javadoc versions from the website.
[ https://issues.apache.org/jira/browse/HADOOP-11902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley closed HADOOP-11902. -- > Prune old javadoc versions from the website. > > > Key: HADOOP-11902 > URL: https://issues.apache.org/jira/browse/HADOOP-11902 > Project: Hadoop Common > Issue Type: Improvement > Components: site >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Attachments: redirect.patch, removed-files.txt > > > We have a lot of old versions of javadoc on the website. We should prune the > old versions and redirect the old urls to the current versions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-11902) Prune old javadoc versions from the website.
[ https://issues.apache.org/jira/browse/HADOOP-11902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HADOOP-11902: --- Resolution: Fixed Status: Resolved (was: Patch Available) I committed this. Thanks, Allen. > Prune old javadoc versions from the website. > > > Key: HADOOP-11902 > URL: https://issues.apache.org/jira/browse/HADOOP-11902 > Project: Hadoop Common > Issue Type: Improvement > Components: site >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Attachments: redirect.patch, removed-files.txt > > > We have a lot of old versions of javadoc on the website. We should prune the > old versions and redirect the old urls to the current versions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11917) test-patch.sh should work with ${BASEDIR}/patchprocess setups
[ https://issues.apache.org/jira/browse/HADOOP-11917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14528949#comment-14528949 ] Owen O'Malley commented on HADOOP-11917: +1 > test-patch.sh should work with ${BASEDIR}/patchprocess setups > - > > Key: HADOOP-11917 > URL: https://issues.apache.org/jira/browse/HADOOP-11917 > Project: Hadoop Common > Issue Type: Bug >Reporter: Allen Wittenauer >Assignee: Allen Wittenauer >Priority: Blocker > Attachments: HADOOP-11917.01.patch, HADOOP-11917.patch > > > There are a bunch of problems with this kind of setup: configuration and code > changes in test-patch.sh required. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11904) test-patch.sh goes into an infinite loop on non-maven builds
[ https://issues.apache.org/jira/browse/HADOOP-11904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14528919#comment-14528919 ] Owen O'Malley commented on HADOOP-11904: +1 looks good, Allen > test-patch.sh goes into an infinite loop on non-maven builds > > > Key: HADOOP-11904 > URL: https://issues.apache.org/jira/browse/HADOOP-11904 > Project: Hadoop Common > Issue Type: Test > Components: test >Reporter: Allen Wittenauer >Assignee: Allen Wittenauer >Priority: Critical > Attachments: HADOOP-11904.patch > > > If post HADOOP-11746 test patch is given a non-maven-based build, it goes > into an infinite loop looking for modules pom.xml. There should be an escape > clause after switching branches to see if it is maven based. If it is not > maven based, then test-patch should either abort or re-exec using that > version's test-patch script. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (HADOOP-11896) Redesign the releases page on the Hadoop site
[ https://issues.apache.org/jira/browse/HADOOP-11896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley closed HADOOP-11896. -- > Redesign the releases page on the Hadoop site > - > > Key: HADOOP-11896 > URL: https://issues.apache.org/jira/browse/HADOOP-11896 > Project: Hadoop Common > Issue Type: Improvement > Components: site >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Fix For: site > > Attachments: Apache Hadoop Releases.pdf, Apache Hadoop Releases.pdf, > HADOOP-11896.patch, HADOOP-11896.patch > > > Redesign the Hadoop site to: > * Move the recent releases to the top of the page by reducing the huge table > of contents. > * Provide a direct link (via the mirror page) to each release tarball for the > last few releases. > * Provide the sha256 for each of the recent release tarballs. > * Provide a direct link the GPG signature for the recent release tarballs. > * Provide directions on how to verify the GPG signature. > * Provide a link to the signatures in > https://people.apache.org/keys/group/hadoop.asc -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-11902) Prune old javadoc versions from the website.
[ https://issues.apache.org/jira/browse/HADOOP-11902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HADOOP-11902: --- Attachment: redirect.patch removed-files.txt Here is the list of 95371 files that I'm proposing to delete. I've also included the redirect patch showing the mapping of the URLs. > Prune old javadoc versions from the website. > > > Key: HADOOP-11902 > URL: https://issues.apache.org/jira/browse/HADOOP-11902 > Project: Hadoop Common > Issue Type: Improvement > Components: site >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Attachments: redirect.patch, removed-files.txt > > > We have a lot of old versions of javadoc on the website. We should prune the > old versions and redirect the old urls to the current versions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-11902) Prune old javadoc versions from the website.
[ https://issues.apache.org/jira/browse/HADOOP-11902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HADOOP-11902: --- Status: Patch Available (was: Open) > Prune old javadoc versions from the website. > > > Key: HADOOP-11902 > URL: https://issues.apache.org/jira/browse/HADOOP-11902 > Project: Hadoop Common > Issue Type: Improvement > Components: site >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Attachments: redirect.patch, removed-files.txt > > > We have a lot of old versions of javadoc on the website. We should prune the > old versions and redirect the old urls to the current versions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11902) Prune old javadoc versions from the website.
[ https://issues.apache.org/jira/browse/HADOOP-11902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527836#comment-14527836 ] Owen O'Malley commented on HADOOP-11902: [~aw] I've created HADOOP-11919 to add Hadoop 2.4.1 to the table at the top of the releases page. > Prune old javadoc versions from the website. > > > Key: HADOOP-11902 > URL: https://issues.apache.org/jira/browse/HADOOP-11902 > Project: Hadoop Common > Issue Type: Improvement > Components: site >Reporter: Owen O'Malley >Assignee: Owen O'Malley > > We have a lot of old versions of javadoc on the website. We should prune the > old versions and redirect the old urls to the current versions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HADOOP-11919) Add release 2.4.1 to the table at the top of the releases page
Owen O'Malley created HADOOP-11919: -- Summary: Add release 2.4.1 to the table at the top of the releases page Key: HADOOP-11919 URL: https://issues.apache.org/jira/browse/HADOOP-11919 Project: Hadoop Common Issue Type: Bug Components: site Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: site Add the 2.4.1 release to the table at the top of the releases page. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HADOOP-11896) Redesign the releases page on the Hadoop site
[ https://issues.apache.org/jira/browse/HADOOP-11896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley resolved HADOOP-11896. Resolution: Fixed Fix Version/s: site I committed this with 1677710. > Redesign the releases page on the Hadoop site > - > > Key: HADOOP-11896 > URL: https://issues.apache.org/jira/browse/HADOOP-11896 > Project: Hadoop Common > Issue Type: Improvement > Components: site >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Fix For: site > > Attachments: Apache Hadoop Releases.pdf, Apache Hadoop Releases.pdf, > HADOOP-11896.patch, HADOOP-11896.patch > > > Redesign the Hadoop site to: > * Move the recent releases to the top of the page by reducing the huge table > of contents. > * Provide a direct link (via the mirror page) to each release tarball for the > last few releases. > * Provide the sha256 for each of the recent release tarballs. > * Provide a direct link the GPG signature for the recent release tarballs. > * Provide directions on how to verify the GPG signature. > * Provide a link to the signatures in > https://people.apache.org/keys/group/hadoop.asc -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-11896) Redesign the releases page on the Hadoop site
[ https://issues.apache.org/jira/browse/HADOOP-11896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HADOOP-11896: --- Attachment: HADOOP-11896.patch Apache Hadoop Releases.pdf Added links to the release notes from the current releases as requested by [~vinodkv]. > Redesign the releases page on the Hadoop site > - > > Key: HADOOP-11896 > URL: https://issues.apache.org/jira/browse/HADOOP-11896 > Project: Hadoop Common > Issue Type: Improvement > Components: site >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Attachments: Apache Hadoop Releases.pdf, Apache Hadoop Releases.pdf, > HADOOP-11896.patch, HADOOP-11896.patch > > > Redesign the Hadoop site to: > * Move the recent releases to the top of the page by reducing the huge table > of contents. > * Provide a direct link (via the mirror page) to each release tarball for the > last few releases. > * Provide the sha256 for each of the recent release tarballs. > * Provide a direct link the GPG signature for the recent release tarballs. > * Provide directions on how to verify the GPG signature. > * Provide a link to the signatures in > https://people.apache.org/keys/group/hadoop.asc -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11896) Redesign the releases page on the Hadoop site
[ https://issues.apache.org/jira/browse/HADOOP-11896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527017#comment-14527017 ] Owen O'Malley commented on HADOOP-11896: [~cnauroth] shasum is easily available on all Linux boxes and is also available by default on MacOS. I guess I'd rather have the more portable solution. > Redesign the releases page on the Hadoop site > - > > Key: HADOOP-11896 > URL: https://issues.apache.org/jira/browse/HADOOP-11896 > Project: Hadoop Common > Issue Type: Improvement > Components: site >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Attachments: Apache Hadoop Releases.pdf, HADOOP-11896.patch > > > Redesign the Hadoop site to: > * Move the recent releases to the top of the page by reducing the huge table > of contents. > * Provide a direct link (via the mirror page) to each release tarball for the > last few releases. > * Provide the sha256 for each of the recent release tarballs. > * Provide a direct link the GPG signature for the recent release tarballs. > * Provide directions on how to verify the GPG signature. > * Provide a link to the signatures in > https://people.apache.org/keys/group/hadoop.asc -- This message was sent by Atlassian JIRA (v6.3.4#6332)
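For example, checking a release artifact against the published digest looks roughly like this (X.Y.Z is a placeholder, as elsewhere on the page):

{noformat}
% shasum -a 256 hadoop-X.Y.Z-src.tar.gz
{noformat}

and then comparing the printed digest against the sha256 listed on the releases page.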
[jira] [Updated] (HADOOP-11896) Redesign the releases page on the Hadoop site
[ https://issues.apache.org/jira/browse/HADOOP-11896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HADOOP-11896: --- Attachment: HADOOP-11896.patch This patch: * turns back on verification, which was turned off because of a bug in Forrest 0.8 * updates the release page with quick links to the 3 most recent releases * adds directions to test the checksums of the release artifacts * adds a link to the historic hadoop releases * cuts down the number of TOC entries so that the real information isn't hidden below a huge TOC. > Redesign the releases page on the Hadoop site > - > > Key: HADOOP-11896 > URL: https://issues.apache.org/jira/browse/HADOOP-11896 > Project: Hadoop Common > Issue Type: Improvement > Components: site >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Attachments: Apache Hadoop Releases.pdf, HADOOP-11896.patch > > > Redesign the Hadoop site to: > * Move the recent releases to the top of the page by reducing the huge table > of contents. > * Provide a direct link (via the mirror page) to each release tarball for the > last few releases. > * Provide the sha256 for each of the recent release tarballs. > * Provide a direct link the GPG signature for the recent release tarballs. > * Provide directions on how to verify the GPG signature. > * Provide a link to the signatures in > https://people.apache.org/keys/group/hadoop.asc -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-11896) Redesign the releases page on the Hadoop site
[ https://issues.apache.org/jira/browse/HADOOP-11896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HADOOP-11896: --- Attachment: Apache Hadoop Releases.pdf Here is my proposed enhancement to the releases page for those of you that don't want to look through the patch. :) > Redesign the releases page on the Hadoop site > - > > Key: HADOOP-11896 > URL: https://issues.apache.org/jira/browse/HADOOP-11896 > Project: Hadoop Common > Issue Type: Improvement > Components: site >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Attachments: Apache Hadoop Releases.pdf > > > Redesign the Hadoop site to: > * Move the recent releases to the top of the page by reducing the huge table > of contents. > * Provide a direct link (via the mirror page) to each release tarball for the > last few releases. > * Provide the sha256 for each of the recent release tarballs. > * Provide a direct link the GPG signature for the recent release tarballs. > * Provide directions on how to verify the GPG signature. > * Provide a link to the signatures in > https://people.apache.org/keys/group/hadoop.asc -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11896) Redesign the releases page on the Hadoop site
[ https://issues.apache.org/jira/browse/HADOOP-11896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14524311#comment-14524311 ] Owen O'Malley commented on HADOOP-11896: [~andrew.wang] The https://people.apache.org/keys/group/hadoop.asc URL is automatically maintained while the KEYS file is often sadly out of date. The only case where the KEYS file is better is if a release manager resigns from the Hadoop project and is no longer included in the group/hadoop.asc. That said, either will work and I don't care that much. > Redesign the releases page on the Hadoop site > - > > Key: HADOOP-11896 > URL: https://issues.apache.org/jira/browse/HADOOP-11896 > Project: Hadoop Common > Issue Type: Improvement > Components: site >Reporter: Owen O'Malley >Assignee: Owen O'Malley > > Redesign the Hadoop site to: > * Move the recent releases to the top of the page by reducing the huge table > of contents. > * Provide a direct link (via the mirror page) to each release tarball for the > last few releases. > * Provide the sha256 for each of the recent release tarballs. > * Provide a direct link the GPG signature for the recent release tarballs. > * Provide directions on how to verify the GPG signature. > * Provide a link to the signatures in > https://people.apache.org/keys/group/hadoop.asc -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11902) Prune old javadoc versions from the website.
[ https://issues.apache.org/jira/browse/HADOOP-11902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14524248#comment-14524248 ] Owen O'Malley commented on HADOOP-11902: Lets keep: * 2.7.0 * 2.6.0 * 2.5.2 * 0.23.11 * 1.2.1 Let's remove: * r0.23.6 * r0.23.7 * r0.23.8 * r0.23.9 * r0.23.10 * r1.0.4 * r1.1.1 * r1.1.2 * r1.2.0 * r2.0.2-alpha * r2.0.3-alpha * r2.0.4-alpha * r2.0.5-alpha * r2.0.6-alpha * r2.1.0-beta * r2.2.0 * r2.3.0 * r2.4.0 * r2.4.1 * r2.5.0 * r2.5.1 > Prune old javadoc versions from the website. > > > Key: HADOOP-11902 > URL: https://issues.apache.org/jira/browse/HADOOP-11902 > Project: Hadoop Common > Issue Type: Improvement > Components: site >Reporter: Owen O'Malley >Assignee: Owen O'Malley > > We have a lot of old versions of javadoc on the website. We should prune the > old versions and redirect the old urls to the current versions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HADOOP-11902) Prune old javadoc versions from the website.
Owen O'Malley created HADOOP-11902: -- Summary: Prune old javadoc versions from the website. Key: HADOOP-11902 URL: https://issues.apache.org/jira/browse/HADOOP-11902 Project: Hadoop Common Issue Type: Improvement Components: site Reporter: Owen O'Malley Assignee: Owen O'Malley We have a lot of old versions of javadoc on the website. We should prune the old versions and redirect the old urls to the current versions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11896) Redesign the releases page on the Hadoop site
[ https://issues.apache.org/jira/browse/HADOOP-11896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14523421#comment-14523421 ] Owen O'Malley commented on HADOOP-11896: The more specific directions that I'm proposing look like: {noformat} % wget https://people.apache.org/keys/group/hadoop.asc % gpg --import hadoop.asc % gpg --verify hadoop-X.Y.Z-src.tar.gz.asc {noformat} > Redesign the releases page on the Hadoop site > - > > Key: HADOOP-11896 > URL: https://issues.apache.org/jira/browse/HADOOP-11896 > Project: Hadoop Common > Issue Type: Improvement > Components: site >Reporter: Owen O'Malley >Assignee: Owen O'Malley > > Redesign the Hadoop site to: > * Move the recent releases to the top of the page by reducing the huge table > of contents. > * Provide a direct link (via the mirror page) to each release tarball for the > last few releases. > * Provide the sha256 for each of the recent release tarballs. > * Provide a direct link the GPG signature for the recent release tarballs. > * Provide directions on how to verify the GPG signature. > * Provide a link to the signatures in > https://people.apache.org/keys/group/hadoop.asc -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HADOOP-11896) Redesign the releases page on the Hadoop site
Owen O'Malley created HADOOP-11896: -- Summary: Redesign the releases page on the Hadoop site Key: HADOOP-11896 URL: https://issues.apache.org/jira/browse/HADOOP-11896 Project: Hadoop Common Issue Type: Improvement Components: site Reporter: Owen O'Malley Assignee: Owen O'Malley Redesign the Hadoop site to: * Move the recent releases to the top of the page by reducing the huge table of contents. * Provide a direct link (via the mirror page) to each release tarball for the last few releases. * Provide the sha256 for each of the recent release tarballs. * Provide a direct link to the GPG signature for the recent release tarballs. * Provide directions on how to verify the GPG signature. * Provide a link to the signatures in https://people.apache.org/keys/group/hadoop.asc -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11746) rewrite test-patch.sh
[ https://issues.apache.org/jira/browse/HADOOP-11746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14485894#comment-14485894 ] Owen O'Malley commented on HADOOP-11746: This looks like a nice improvement. It would be great if we could name patch files like git.patch and have it apply to the git hash . That would let you upload patches for branches without worrying about conflicting changes. > rewrite test-patch.sh > - > > Key: HADOOP-11746 > URL: https://issues.apache.org/jira/browse/HADOOP-11746 > Project: Hadoop Common > Issue Type: Test > Components: build, test >Affects Versions: 3.0.0 >Reporter: Allen Wittenauer >Assignee: Allen Wittenauer > Attachments: HADOOP-11746-00.patch, HADOOP-11746-01.patch, > HADOOP-11746-02.patch, HADOOP-11746-03.patch, HADOOP-11746-04.patch, > HADOOP-11746-05.patch, HADOOP-11746-06.patch, HADOOP-11746-07.patch, > HADOOP-11746-09.patch, HADOOP-11746-10.patch, HADOOP-11746-11.patch > > > This code is bad and you should feel bad. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11717) Add Redirecting WebSSO behavior with JWT Token in Hadoop Auth
[ https://issues.apache.org/jira/browse/HADOOP-11717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483341#comment-14483341 ] Owen O'Malley commented on HADOOP-11717: I think this is good to go. If we want to generalize it further when we have additional use cases to support, we can do that. This just provides a plugin for web sso that is useful to users that don't want to use spnego for web ui. > Add Redirecting WebSSO behavior with JWT Token in Hadoop Auth > - > > Key: HADOOP-11717 > URL: https://issues.apache.org/jira/browse/HADOOP-11717 > Project: Hadoop Common > Issue Type: Improvement > Components: security >Reporter: Larry McCay >Assignee: Larry McCay > Fix For: 2.8.0 > > Attachments: HADOOP-11717-1.patch, HADOOP-11717-2.patch, > HADOOP-11717-3.patch, HADOOP-11717-4.patch, HADOOP-11717-5.patch, > HADOOP-11717-6.patch, HADOOP-11717-7.patch, HADOOP-11717-8.patch, > RedirectingWebSSOwithJWTforHadoopWebUIs.pdf > > > Extend AltKerberosAuthenticationHandler to provide WebSSO flow for UIs. > The actual authentication is done by some external service that the handler > will redirect to when there is no hadoop.auth cookie and no JWT token found > in the incoming request. > Using JWT provides a number of benefits: > * It is not tied to any specific authentication mechanism - so buys us many > SSO integrations > * It is cryptographically verifiable for determining whether it can be trusted > * Checking for expiration allows for a limited lifetime and window for > compromised use > This will introduce the use of nimbus-jose-jwt library for processing, > validating and parsing JWT tokens. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
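The two properties called out in the description, cryptographic verifiability and a limited lifetime, map directly onto the nimbus-jose-jwt API. A hedged sketch of those two checks (the method and key handling are illustrative, not the patch's actual handler code):

{code:java}
import java.security.interfaces.RSAPublicKey;
import java.util.Date;
import com.nimbusds.jose.crypto.RSASSAVerifier;
import com.nimbusds.jwt.SignedJWT;

class JwtCheckSketch {
  static boolean isTrustedAndCurrent(String serialized, RSAPublicKey key)
      throws Exception {
    SignedJWT jwt = SignedJWT.parse(serialized);
    if (!jwt.verify(new RSASSAVerifier(key))) {
      return false;  // signature does not verify; token cannot be trusted
    }
    Date expires = jwt.getJWTClaimsSet().getExpirationTime();
    // A missing or past expiration fails the limited-lifetime check.
    return expires != null && expires.after(new Date());
  }
}
{code}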
[jira] [Updated] (HADOOP-11717) Add Redirecting WebSSO behavior with JWT Token in Hadoop Auth
[ https://issues.apache.org/jira/browse/HADOOP-11717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HADOOP-11717: --- Resolution: Fixed Fix Version/s: 2.8.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I just committed this. Thanks, Larry! > Add Redirecting WebSSO behavior with JWT Token in Hadoop Auth > - > > Key: HADOOP-11717 > URL: https://issues.apache.org/jira/browse/HADOOP-11717 > Project: Hadoop Common > Issue Type: Improvement > Components: security >Reporter: Larry McCay >Assignee: Larry McCay > Fix For: 2.8.0 > > Attachments: HADOOP-11717-1.patch, HADOOP-11717-2.patch, > HADOOP-11717-3.patch, HADOOP-11717-4.patch, HADOOP-11717-5.patch, > HADOOP-11717-6.patch, HADOOP-11717-7.patch, HADOOP-11717-8.patch, > RedirectingWebSSOwithJWTforHadoopWebUIs.pdf > > > Extend AltKerberosAuthenticationHandler to provide WebSSO flow for UIs. > The actual authentication is done by some external service that the handler > will redirect to when there is no hadoop.auth cookie and no JWT token found > in the incoming request. > Using JWT provides a number of benefits: > * It is not tied to any specific authentication mechanism - so buys us many > SSO integrations > * It is cryptographically verifiable for determining whether it can be trusted > * Checking for expiration allows for a limited lifetime and window for > compromised use > This will introduce the use of nimbus-jose-jwt library for processing, > validating and parsing JWT tokens. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11717) Add Redirecting WebSSO behavior with JWT Token in Hadoop Auth
[ https://issues.apache.org/jira/browse/HADOOP-11717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393661#comment-14393661 ] Owen O'Malley commented on HADOOP-11717: I think this looks good. +1 > Add Redirecting WebSSO behavior with JWT Token in Hadoop Auth > - > > Key: HADOOP-11717 > URL: https://issues.apache.org/jira/browse/HADOOP-11717 > Project: Hadoop Common > Issue Type: Improvement > Components: security >Reporter: Larry McCay >Assignee: Larry McCay > Attachments: HADOOP-11717-1.patch, HADOOP-11717-2.patch, > HADOOP-11717-3.patch, HADOOP-11717-4.patch, HADOOP-11717-5.patch, > HADOOP-11717-6.patch, HADOOP-11717-7.patch > > > Extend AltKerberosAuthenticationHandler to provide WebSSO flow for UIs. > The actual authentication is done by some external service that the handler > will redirect to when there is no hadoop.auth cookie and no JWT token found > in the incoming request. > Using JWT provides a number of benefits: > * It is not tied to any specific authentication mechanism - so buys us many > SSO integrations > * It is cryptographically verifiable for determining whether it can be trusted > * Checking for expiration allows for a limited lifetime and window for > compromised use > This will introduce the use of nimbus-jose-jwt library for processing, > validating and parsing JWT tokens. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11620) Add Support for Load Balancing across a group of KMS servers for HA
[ https://issues.apache.org/jira/browse/HADOOP-11620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14335034#comment-14335034 ] Owen O'Malley commented on HADOOP-11620: I agree with Larry that the other pattern was better. It is a little strange using a compound like "host1;host2" for the host part of the URI, but moving an override of the port number into the host part is too confusing for little gain. Please go back to the previous version. > Add Support for Load Balancing across a group of KMS servers for HA > --- > > Key: HADOOP-11620 > URL: https://issues.apache.org/jira/browse/HADOOP-11620 > Project: Hadoop Common > Issue Type: Improvement > Components: kms >Affects Versions: 2.6.0 >Reporter: Arun Suresh >Assignee: Arun Suresh > Attachments: HADOOP-11620.1.patch, HADOOP-11620.2.patch, > HADOOP-11620.3.patch > > > This patch needs to add support for : > * specification of multiple hostnames in the kms key provider uri > * KMS client to load balance requests across the hosts specified in the kms > keyprovider uri. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
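For reference, the compound-host style under discussion reads roughly like this (host names and port are illustrative):

{noformat}
kms://http@host1;host2;host3:16000/kms
{noformat}

with the client load-balancing its requests across host1, host2 and host3.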
[jira] [Commented] (HADOOP-10908) ClusterNodeSetup, CommandsManual, FileSystemShell needs updating
[ https://issues.apache.org/jira/browse/HADOOP-10908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14265220#comment-14265220 ] Owen O'Malley commented on HADOOP-10908: +1 Thanks for updating the documentation, Alan! > ClusterNodeSetup, CommandsManual, FileSystemShell needs updating > > > Key: HADOOP-10908 > URL: https://issues.apache.org/jira/browse/HADOOP-10908 > Project: Hadoop Common > Issue Type: Bug > Components: documentation >Affects Versions: 3.0.0 >Reporter: Allen Wittenauer > Attachments: HADOOP-10908-01.patch, HADOOP-10908-02.patch, > HADOOP-10908-03.patch, HADOOP-10908.patch > > > A lot of the instructions in the cluster node setup are not good practices > post-9902. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11055) non-daemon pid files are missing
[ https://issues.apache.org/jira/browse/HADOOP-11055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14136172#comment-14136172 ] Owen O'Malley commented on HADOOP-11055: +1 looks good, Allen. > non-daemon pid files are missing > > > Key: HADOOP-11055 > URL: https://issues.apache.org/jira/browse/HADOOP-11055 > Project: Hadoop Common > Issue Type: Bug > Components: scripts >Reporter: Allen Wittenauer >Assignee: Allen Wittenauer >Priority: Blocker > Attachments: HADOOP-11055.patch > > > Somewhere along the way, daemons run in default mode lost pid files. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-10970) Cleanup KMS configuration keys
[ https://issues.apache.org/jira/browse/HADOOP-10970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102382#comment-14102382 ] Owen O'Malley commented on HADOOP-10970: The comment on the hadoop.security.key.provider.path value says 'URI' instead of 'URI path'. Please fix it. > Cleanup KMS configuration keys > -- > > Key: HADOOP-10970 > URL: https://issues.apache.org/jira/browse/HADOOP-10970 > Project: Hadoop Common > Issue Type: Improvement >Affects Versions: 3.0.0 >Reporter: Andrew Wang >Assignee: Andrew Wang > Attachments: hadoop-10970.001.patch, hadoop-10970.002.patch, > hadoop-10970.003.patch > > > It'd be nice to add descriptions to the config keys in kms-site.xml. > Also, it'd be good to rename key.provider.path to key.provider.uri for > clarity, or just drop ".path". -- This message was sent by Atlassian JIRA (v6.2#6252)
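For context, the key in question is configured as a provider URI; a sketch of what an entry might look like (the KMS endpoint is illustrative):

{noformat}
<property>
  <name>hadoop.security.key.provider.path</name>
  <value>kms://http@kms.example.com:16000/kms</value>
</property>
{noformat}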
[jira] [Commented] (HADOOP-10904) Provide Alt to Clear Text Passwords through Cred Provider API
[ https://issues.apache.org/jira/browse/HADOOP-10904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14098768#comment-14098768 ] Owen O'Malley commented on HADOOP-10904: Daryn, this is a very different use case from "svn+ssh", and it reads really badly: jks+hdfs://nn.example.com/foo/bar.jks. Furthermore, it doesn't nest right: jks+har+hdfs://nn.example.com/foo/bar is a complete mess. What are you trying to accomplish that this makes difficult? > Provide Alt to Clear Text Passwords through Cred Provider API > - > > Key: HADOOP-10904 > URL: https://issues.apache.org/jira/browse/HADOOP-10904 > Project: Hadoop Common > Issue Type: Bug > Components: security >Reporter: Larry McCay >Assignee: Larry McCay > > This is an umbrella jira to track various child tasks to uptake the > credential provider API to enable deployments without storing > passwords/credentials in clear text. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10970) Cleanup KMS configuration keys
[ https://issues.apache.org/jira/browse/HADOOP-10970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14098757#comment-14098757 ] Owen O'Malley commented on HADOOP-10970: -1 to removing the path ability. What would the lookup be doing? What would an example look like? > Cleanup KMS configuration keys > -- > > Key: HADOOP-10970 > URL: https://issues.apache.org/jira/browse/HADOOP-10970 > Project: Hadoop Common > Issue Type: Improvement >Affects Versions: 3.0.0 >Reporter: Andrew Wang >Assignee: Andrew Wang > Attachments: hadoop-10970.001.patch > > > It'd be nice to add descriptions to the config keys in kms-site.xml. > Also, it'd be good to rename key.provider.path to key.provider.uri for > clarity, or just drop ".path". -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-10912) Modify scripts to use relative paths
[ https://issues.apache.org/jira/browse/HADOOP-10912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HADOOP-10912: --- Attachment: hadoop-10912.path This is probably subsumed by Alan's HADOOP-9902, but here is a rough patch to show generally how this would work without Alan's complete re-write. > Modify scripts to use relative paths > > > Key: HADOOP-10912 > URL: https://issues.apache.org/jira/browse/HADOOP-10912 > Project: Hadoop Common > Issue Type: Improvement > Components: scripts >Reporter: Owen O'Malley > Attachments: hadoop-10912.path > > > Make all of the scripts use relative paths for defaulting HADOOP_HOME and the > related paths. This is useful for any deployments that don't use > /usr/lib/hadoop as the prefix, including but not limited to tar ball > installations and side by side installs. -- This message was sent by Atlassian JIRA (v6.2#6252)
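One way to get this effect (a sketch, not necessarily what the attached rough patch does) is to derive the default from the script's own location rather than a hard-coded prefix:

{noformat}
HADOOP_HOME=${HADOOP_HOME:-$(cd -P -- "$(dirname -- "$0")/.." && pwd)}
{noformat}

so tarball and side-by-side installs resolve their own tree without extra configuration.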
[jira] [Created] (HADOOP-10912) Modify scripts to use relative paths
Owen O'Malley created HADOOP-10912: -- Summary: Modify scripts to use relative paths Key: HADOOP-10912 URL: https://issues.apache.org/jira/browse/HADOOP-10912 Project: Hadoop Common Issue Type: Improvement Components: scripts Reporter: Owen O'Malley Make all of the scripts use relative paths for defaulting HADOOP_HOME and the related paths. This is useful for any deployments that don't use /usr/lib/hadoop as the prefix, including but not limited to tar ball installations and side by side installs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10791) AuthenticationFilter should support externalizing the secret for signing and provide rotation support
[ https://issues.apache.org/jira/browse/HADOOP-10791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14078499#comment-14078499 ] Owen O'Malley commented on HADOOP-10791: Alejandro, there is a huge difference between requiring zookeeper for HA and requiring zookeeper for spnego. The unnecessary complexity is creating a plugin interface for the one use case that is completely covered by the plugin interface you already have. > AuthenticationFilter should support externalizing the secret for signing and > provide rotation support > - > > Key: HADOOP-10791 > URL: https://issues.apache.org/jira/browse/HADOOP-10791 > Project: Hadoop Common > Issue Type: Improvement > Components: security >Affects Versions: 2.4.1 >Reporter: Alejandro Abdelnur >Assignee: Robert Kanter > Attachments: HADOOP-10791.patch, HADOOP-10791.patch > > > It should be possible to externalize the secret used to sign the hadoop-auth > cookies. > In the case of WebHDFS the shared secret used by NN and DNs could be used. In > the case of Oozie HA, the secret could be stored in Oozie HA control data in > ZooKeeper. > In addition, it is desirable for the secret to change periodically, this > means that the AuthenticationService should remember a previous secret for > the max duration of hadoop-auth cookie. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10791) AuthenticationFilter should support externalizing the secret for signing and provide rotation support
[ https://issues.apache.org/jira/browse/HADOOP-10791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077905#comment-14077905 ] Owen O'Malley commented on HADOOP-10791: In particular, this is just a rolling random key that you want to preserve the last two values of. It doesn't make sense to require zookeeper if the user doesn't already have it deployed. > AuthenticationFilter should support externalizing the secret for signing and > provide rotation support > - > > Key: HADOOP-10791 > URL: https://issues.apache.org/jira/browse/HADOOP-10791 > Project: Hadoop Common > Issue Type: Improvement > Components: security >Affects Versions: 2.4.1 >Reporter: Alejandro Abdelnur >Assignee: Robert Kanter > Attachments: HADOOP-10791.patch, HADOOP-10791.patch > > > It should be possible to externalize the secret used to sign the hadoop-auth > cookies. > In the case of WebHDFS the shared secret used by NN and DNs could be used. In > the case of Oozie HA, the secret could be stored in Oozie HA control data in > ZooKeeper. > In addition, it is desirable for the secret to change periodically, this > means that the AuthenticationService should remember a previous secret for > the max duration of hadoop-auth cookie. -- This message was sent by Atlassian JIRA (v6.2#6252)
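A hedged sketch of that shape, a rolling random secret that retains one previous value so cookies signed just before rollover still validate (the class and key size are illustrative):

{code:java}
import java.security.SecureRandom;

final class RollingSecret {
  private final SecureRandom rng = new SecureRandom();
  private byte[] current = fresh();
  private byte[] previous = current;

  private byte[] fresh() {
    byte[] b = new byte[32];
    rng.nextBytes(b);
    return b;
  }

  /** Called on a timer, e.g. once per cookie lifetime. */
  synchronized void roll() {
    previous = current;
    current = fresh();
  }

  /** Verification tries the current secret, then the previous one. */
  synchronized byte[][] candidates() {
    return new byte[][] { current, previous };
  }
}
{code}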
[jira] [Commented] (HADOOP-10791) AuthenticationFilter should support externalizing the secret for signing and provide rotation support
[ https://issues.apache.org/jira/browse/HADOOP-10791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077904#comment-14077904 ] Owen O'Malley commented on HADOOP-10791: Alejandro, It looks like it would make sense to use the KeyProvider for this. Having a KeyProvider implementation that reads from Zookeeper would be pretty easy. > AuthenticationFilter should support externalizing the secret for signing and > provide rotation support > - > > Key: HADOOP-10791 > URL: https://issues.apache.org/jira/browse/HADOOP-10791 > Project: Hadoop Common > Issue Type: Improvement > Components: security >Affects Versions: 2.4.1 >Reporter: Alejandro Abdelnur >Assignee: Robert Kanter > Attachments: HADOOP-10791.patch, HADOOP-10791.patch > > > It should be possible to externalize the secret used to sign the hadoop-auth > cookies. > In the case of WebHDFS the shared secret used by NN and DNs could be used. In > the case of Oozie HA, the secret could be stored in Oozie HA control data in > ZooKeeper. > In addition, it is desirable for the secret to change periodically, this > means that the AuthenticationService should remember a previous secret for > the max duration of hadoop-auth cookie. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10607) Create an API to Separate Credentials/Password Storage from Applications
[ https://issues.apache.org/jira/browse/HADOOP-10607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14069429#comment-14069429 ] Owen O'Malley commented on HADOOP-10607: Alejandro, I'm puzzled why you are puzzled. We've always added components and functionality to Hadoop that are useful to upstream components. A mechanism for managing passwords without storing them in plain text is a wonderful addition. There are many places in the Hadoop ecosystem where passwords are stored in config files, such as hadoop-auth and the hive metastore. Giving them a common mechanism for removing those plain-text passwords is a good thing. > Create an API to Separate Credentials/Password Storage from Applications > > > Key: HADOOP-10607 > URL: https://issues.apache.org/jira/browse/HADOOP-10607 > Project: Hadoop Common > Issue Type: New Feature > Components: security >Reporter: Larry McCay >Assignee: Larry McCay > Fix For: 3.0.0, 2.6.0 > > Attachments: 10607-10.patch, 10607-11.patch, 10607-12.patch, > 10607-2.patch, 10607-3.patch, 10607-4.patch, 10607-5.patch, 10607-6.patch, > 10607-7.patch, 10607-8.patch, 10607-9.patch, 10607-branch-2.patch, 10607.patch > > > As with the filesystem API, we need to provide a generic mechanism to support > multiple credential storage mechanisms that are potentially from third > parties. > We need the ability to eliminate the storage of passwords and secrets in > clear text within configuration files or within code. > Toward that end, I propose an API that is configured using a list of URLs of > CredentialProviders. The implementation will look for implementations using > the ServiceLoader interface and thus support third party libraries. > Two providers will be included in this patch. One using the credentials cache > in MapReduce jobs and the other using Java KeyStores from either HDFS or > local file system. > A CredShell CLI will also be included in this patch which provides the > ability to manage the credentials within the stores. -- This message was sent by Atlassian JIRA (v6.2#6252)
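A common way for components to consume the API is Configuration.getPassword, which consults any configured providers before falling back to the clear-text property. A minimal sketch (the property name is illustrative):

{code:java}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;

class CredentialLookupSketch {
  static char[] keystorePassword(Configuration conf) throws IOException {
    // getPassword() checks providers registered under
    // hadoop.security.credential.provider.path first, then falls back
    // to the clear-text property of the same name in the config file.
    return conf.getPassword("ssl.server.keystore.password");
  }
}
{code}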
[jira] [Commented] (HADOOP-10793) KeyShell and CredentialShell args should use single-dash style
[ https://issues.apache.org/jira/browse/HADOOP-10793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065508#comment-14065508 ] Owen O'Malley commented on HADOOP-10793: It won't be incompatible if you accept either one or two dashes. > KeyShell and CredentialShell args should use single-dash style > -- > > Key: HADOOP-10793 > URL: https://issues.apache.org/jira/browse/HADOOP-10793 > Project: Hadoop Common > Issue Type: Improvement > Components: security >Affects Versions: 3.0.0 >Reporter: Mike Yoder > > Follow-on from HADOOP-10736 as per [~andrew.wang] - the key shell uses the > gnu double dash style for command line args, while other command line > programs use a single dash. Consider changing this, and consider another > argument parsing scheme, like the CommandLine class. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10793) KeyShell and CredentialShell args should use single-dash style
[ https://issues.apache.org/jira/browse/HADOOP-10793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065493#comment-14065493 ] Owen O'Malley commented on HADOOP-10793: Sorry, to be clear I mean that you should fix the rest of the Hadoop commands to accept either one or two dashes. Obviously the old commands can't require two dashes without breaking compatibility. > KeyShell and CredentialShell args should use single-dash style > -- > > Key: HADOOP-10793 > URL: https://issues.apache.org/jira/browse/HADOOP-10793 > Project: Hadoop Common > Issue Type: Improvement > Components: security >Affects Versions: 3.0.0 >Reporter: Mike Yoder > > Follow-on from HADOOP-10736 as per [~andrew.wang] - the key shell uses the > gnu double dash style for command line args, while other command line > programs use a single dash. Consider changing this, and consider another > argument parsing scheme, like the CommandLine class. -- This message was sent by Atlassian JIRA (v6.2#6252)
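Accepting both styles amounts to a one-line normalization before option matching; a hypothetical sketch:

{code:java}
class ArgSketch {
  // Collapse a leading double dash so "--create" and "-create" match
  // the same entry in the option table.
  static String normalizeDashes(String arg) {
    return arg.startsWith("--") ? arg.substring(1) : arg;
  }
}
{code}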
[jira] [Commented] (HADOOP-10607) Create an API to Separate Credentials/Password Storage from Applications
[ https://issues.apache.org/jira/browse/HADOOP-10607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065485#comment-14065485 ] Owen O'Malley commented on HADOOP-10607: Andrew, it has to get released before it can be used by external components. Is there a technical concern with it getting in the 2.6 release? > Create an API to Separate Credentials/Password Storage from Applications > > > Key: HADOOP-10607 > URL: https://issues.apache.org/jira/browse/HADOOP-10607 > Project: Hadoop Common > Issue Type: New Feature > Components: security >Reporter: Larry McCay >Assignee: Larry McCay > Fix For: 3.0.0, 2.6.0 > > Attachments: 10607-10.patch, 10607-11.patch, 10607-12.patch, > 10607-2.patch, 10607-3.patch, 10607-4.patch, 10607-5.patch, 10607-6.patch, > 10607-7.patch, 10607-8.patch, 10607-9.patch, 10607-branch-2.patch, 10607.patch > > > As with the filesystem API, we need to provide a generic mechanism to support > multiple credential storage mechanisms that are potentially from third > parties. > We need the ability to eliminate the storage of passwords and secrets in > clear text within configuration files or within code. > Toward that end, I propose an API that is configured using a list of URLs of > CredentialProviders. The implementation will look for implementations using > the ServiceLoader interface and thus support third party libraries. > Two providers will be included in this patch. One using the credentials cache > in MapReduce jobs and the other using Java KeyStores from either HDFS or > local file system. > A CredShell CLI will also be included in this patch which provides the > ability to manage the credentials within the stores. -- This message was sent by Atlassian JIRA (v6.2#6252)