[jira] [Commented] (YARN-6153) keepContainer does not work when AM retry window is set
[ https://issues.apache.org/jira/browse/YARN-6153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15864678#comment-15864678 ] Jian He commented on YARN-6153: --- could you also add a test case ? > keepContainer does not work when AM retry window is set > --- > > Key: YARN-6153 > URL: https://issues.apache.org/jira/browse/YARN-6153 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: kyungwan nam > Attachments: YARN-6153.001.patch > > > yarn.resourcemanager.am.max-attempts has been configured to 2 in my cluster. > I submitted a YARN application (slider app) that keepContainers=true, > attemptFailuresValidityInterval=30. > it did work properly when AM was failed firstly. > all containers launched by previous AM were resynced with new AM (attempt2) > without killing containers. > after 10 minutes, I thought AM failure count was reset by > attemptFailuresValidityInterval (5 minutes). > but, all containers were killed when AM was failed secondly. (new AM attempt3 > was launched properly) -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
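For context, the scenario described above corresponds roughly to the following submission-context settings. This is a minimal illustrative sketch, not taken from the attached patch, and it assumes the 5-minute validity window and the max-attempts value mentioned in the description:
{code}
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.util.Records;

public class KeepContainersSubmissionSketch {
  public static ApplicationSubmissionContext buildContext() {
    ApplicationSubmissionContext ctx = Records.newRecord(ApplicationSubmissionContext.class);
    // "keepContainers=true": running containers survive an AM failure and are
    // resynced with the next attempt instead of being killed.
    ctx.setKeepContainersAcrossApplicationAttempts(true);
    // Failures older than this window (ms) should no longer count towards max attempts.
    ctx.setAttemptFailuresValidityInterval(5 * 60 * 1000L);
    // The cluster-wide yarn.resourcemanager.am.max-attempts is 2 in the reporter's setup.
    ctx.setMaxAppAttempts(2);
    return ctx;
  }
}
{code}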
[jira] [Commented] (YARN-6153) keepContainer does not work when AM retry window is set
[ https://issues.apache.org/jira/browse/YARN-6153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15864677#comment-15864677 ] Jian He commented on YARN-6153: --- [~kyungwan nam], thanks for the patch. Minor suggestion to the code: In RMAppImpl, we also have below code to detect whether a failure should be counted towards the max-retry. I think we can move the logic of checking the validity interval inside shouldCountTowardsMaxAttemptRetry itself, so that this method could be used by both RMAppImpl and RMAttemptImpl {code} if (attempt.shouldCountTowardsMaxAttemptRetry()) { if (this.attemptFailuresValidityInterval <= 0 || (attempt.getFinishTime() > endTime - this.attemptFailuresValidityInterval)) { completedAttempts++; } } {code} > keepContainer does not work when AM retry window is set > --- > > Key: YARN-6153 > URL: https://issues.apache.org/jira/browse/YARN-6153 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: kyungwan nam > Attachments: YARN-6153.001.patch > > > yarn.resourcemanager.am.max-attempts has been configured to 2 in my cluster. > I submitted a YARN application (slider app) that keepContainers=true, > attemptFailuresValidityInterval=30. > it did work properly when AM was failed firstly. > all containers launched by previous AM were resynced with new AM (attempt2) > without killing containers. > after 10 minutes, I thought AM failure count was reset by > attemptFailuresValidityInterval (5 minutes). > but, all containers were killed when AM was failed secondly. (new AM attempt3 > was launched properly) -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
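A rough sketch of the suggested refactoring, folding the validity-window check into the per-attempt method so that both RMAppImpl and RMAppAttemptImpl can reuse it. This is hypothetical: the helper is written as a static function with explicit parameters, the countsByExitStatus flag stands in for the existing exit-status-based checks, and the actual patch may structure it differently.
{code}
class AttemptRetrySketch {
  // Combined check: an attempt counts towards max attempts only if its exit status
  // says it should AND it finished inside the failure-validity window.
  static boolean shouldCountTowardsMaxAttemptRetry(boolean countsByExitStatus,
      long attemptFinishTime, long latestFinishTime, long validityInterval) {
    if (!countsByExitStatus) {
      return false; // e.g. preemption or node failure: never counted
    }
    // A failure that finished outside the validity window is not counted.
    return validityInterval <= 0
        || attemptFinishTime > latestFinishTime - validityInterval;
  }
}
{code}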
[jira] [Commented] (YARN-6013) ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when RPC privacy is enabled
[ https://issues.apache.org/jira/browse/YARN-6013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15854841#comment-15854841 ] Jian He commented on YARN-6013: --- [~Steven Rand], the server log you provided does not have any exceptions - it's for a different time range. Are you able to get the corresponding server log when the exception happens ? I also converted this to a hadoop common jira > ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when > RPC privacy is enabled > -- > > Key: YARN-6013 > URL: https://issues.apache.org/jira/browse/YARN-6013 > Project: Hadoop YARN > Issue Type: Bug > Components: client, yarn >Affects Versions: 2.8.0 >Reporter: Steven Rand >Priority: Critical > Attachments: YARN-6013-branch-2.8.0.002.patch, yarn-rm-log.txt > > > When privacy is enabled for RPC (hadoop.rpc.protection = privacy), > {{ApplicationMasterProtocolPBClientImpl.allocate}} sometimes (but not always) > fails with an EOFException. I've reproduced this with Spark 2.0.2 built > against latest branch-2.8 and with a simple distcp job on latest branch-2.8. > Steps to reproduce using distcp: > 1. Set hadoop.rpc.protection equal to privacy > 2. Write data to HDFS. I did this with Spark as follows: > {code} > sc.parallelize(1 to (5*1024*1024)).map(k => Seq(k, > org.apache.commons.lang.RandomStringUtils.random(1024, > "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWxyZ0123456789")).mkString("|")).toDF().repartition(100).write.parquet("hdfs:///tmp/testData") > {code} > 3. Attempt to distcp that data to another location in HDFS. For example: > {code} > hadoop distcp -Dmapreduce.framework.name=yarn hdfs:///tmp/testData > hdfs:///tmp/testDataCopy > {code} > I observed this error in the ApplicationMaster's syslog: > {code} > 2016-12-19 19:13:50,097 INFO [eventHandlingThread] > org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Event Writer > setup for JobId: job_1482189777425_0004, File: > hdfs://:8020/tmp/hadoop-yarn/staging//.staging/job_1482189777425_0004/job_1482189777425_0004_1.jhist > 2016-12-19 19:13:51,004 INFO [RMCommunicator Allocator] > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before > Scheduling: PendingReds:0 ScheduledMaps:4 ScheduledReds:0 AssignedMaps:0 > AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:0 ContRel:0 > HostLocal:0 RackLocal:0 > 2016-12-19 19:13:51,031 INFO [RMCommunicator Allocator] > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() > for application_1482189777425_0004: ask=1 release= 0 newContainers=0 > finishedContainers=0 resourcelimit= knownNMs=3 > 2016-12-19 19:13:52,043 INFO [RMCommunicator Allocator] > org.apache.hadoop.io.retry.RetryInvocationHandler: Exception while invoking > ApplicationMasterProtocolPBClientImpl.allocate over null. Retrying after > sleeping for 3ms. 
> java.io.EOFException: End of File Exception between local host is: > "/"; destination host is: "":8030; > : java.io.EOFException; For more details see: > http://wiki.apache.org/hadoop/EOFException > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:422) > at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:801) > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:765) > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1486) > at org.apache.hadoop.ipc.Client.call(Client.java:1428) > at org.apache.hadoop.ipc.Client.call(Client.java:1338) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116) > at com.sun.proxy.$Proxy80.allocate(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:398) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.jav
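For reference, step 1 of the reproduction corresponds to the standard core-site.xml setting; expressed programmatically it is roughly the following (illustrative only):
{code}
import org.apache.hadoop.conf.Configuration;

public class RpcPrivacySketch {
  public static Configuration withRpcPrivacy() {
    Configuration conf = new Configuration();
    // Same effect as putting hadoop.rpc.protection=privacy in core-site.xml:
    // SASL auth-conf, i.e. authentication plus integrity plus encryption on Hadoop RPC.
    conf.set("hadoop.rpc.protection", "privacy");
    return conf;
  }
}
{code}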
[jira] [Commented] (YARN-6145) Improve log message on fail over
[ https://issues.apache.org/jira/browse/YARN-6145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15852228#comment-15852228 ] Jian He commented on YARN-6145: --- sample log for RequestHedgingRMFailoverProxyProvider after the patch {code} 17/02/03 22:34:26 INFO impl.TimelineClientImpl: Timeline service address: http://host:8188/ws/v1/timeline/ 17/02/03 22:34:26 INFO client.RequestHedgingRMFailoverProxyProvider: Created wrapped proxy for [rm1, rm2] 17/02/03 22:34:26 INFO client.AHSProxy: Connecting to Application History server at host/172.22.126.225:10200 17/02/03 22:34:27 INFO client.RequestHedgingRMFailoverProxyProvider: Looking for the active RM in [rm1, rm2]... 17/02/03 22:34:27 INFO client.RequestHedgingRMFailoverProxyProvider: Found active RM on [rm2] 17/02/03 22:34:28 INFO mapreduce.JobSubmitter: number of splits:1 17/02/03 22:34:29 INFO impl.YarnClientImpl: Submitted application application_1486160572621_0002 {code} > Improve log message on fail over > > > Key: YARN-6145 > URL: https://issues.apache.org/jira/browse/YARN-6145 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-6145.1.patch > > > On failover, a series of exception stack shown in the log, which is harmless, > but confusing to user. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6145) Improve log message on fail over
[ https://issues.apache.org/jira/browse/YARN-6145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-6145: -- Attachment: (was: YARN-6145.1.patch) > Improve log message on fail over > > > Key: YARN-6145 > URL: https://issues.apache.org/jira/browse/YARN-6145 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-6145.1.patch > > > On failover, a series of exception stack shown in the log, which is harmless, > but confusing to user. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6145) Improve log message on fail over
[ https://issues.apache.org/jira/browse/YARN-6145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-6145: -- Attachment: YARN-6145.1.patch > Improve log message on fail over > > > Key: YARN-6145 > URL: https://issues.apache.org/jira/browse/YARN-6145 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-6145.1.patch > > > On failover, a series of exception stack shown in the log, which is harmless, > but confusing to user. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6145) Improve log message on fail over
[ https://issues.apache.org/jira/browse/YARN-6145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15852207#comment-15852207 ] Jian He commented on YARN-6145: --- Sample log for ConfiguredRMFailoverProxyProvider after the patch: {code} 17/02/03 21:45:18 INFO impl.TimelineClientImpl: Timeline service address: http://host:8188/ws/v1/timeline/ 17/02/03 21:45:18 INFO client.AHSProxy: Connecting to Application History server at host/172.22.126.225:10200 17/02/03 21:45:19 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2 17/02/03 21:45:19 INFO retry.RetryInvocationHandler: java.net.ConnectException: Call From host/172.22.126.229 to host:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused while invoking ApplicationClientProtocolPBClientImpl.getNewApplication over rm2 after 1 failover attempts. Trying to failover after sleeping for 24348ms. 17/02/03 21:45:44 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm1 17/02/03 21:45:44 INFO retry.RetryInvocationHandler: java.net.ConnectException: Call From host/172.22.126.229 to host:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused while invoking ApplicationClientProtocolPBClientImpl.getNewApplication over rm1 after 2 failover attempts. Trying to failover after sleeping for 20126ms. 17/02/03 21:46:04 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2 17/02/03 21:46:04 INFO retry.RetryInvocationHandler: java.net.ConnectException: Call From host/172.22.126.229 to host:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused while invoking ApplicationClientProtocolPBClientImpl.getNewApplication over rm2 after 3 failover attempts. Trying to failover after sleeping for 44768ms. 17/02/03 21:46:48 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm1 17/02/03 21:46:48 INFO retry.RetryInvocationHandler: java.net.ConnectException: Call From host/172.22.126.229 to host:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused while invoking ApplicationClientProtocolPBClientImpl.getNewApplication over rm1 after 4 failover attempts. Trying to failover after sleeping for 20670ms. 17/02/03 21:47:09 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2 17/02/03 21:47:09 INFO retry.RetryInvocationHandler: java.net.ConnectException: Call From host/172.22.126.229 to host:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused while invoking ApplicationClientProtocolPBClientImpl.getNewApplication over rm2 after 5 failover attempts. Trying to failover after sleeping for 42523ms. 17/02/03 21:47:52 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm1 17/02/03 21:47:52 INFO retry.RetryInvocationHandler: java.net.ConnectException: Call From host/172.22.126.229 to host:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused while invoking ApplicationClientProtocolPBClientImpl.getNewApplication over rm1 after 6 failover attempts. 
Trying to failover after sleeping for 16803ms. {code} > Improve log message on fail over > > > Key: YARN-6145 > URL: https://issues.apache.org/jira/browse/YARN-6145 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-6145.1.patch > > > On failover, a series of exception stack shown in the log, which is harmless, > but confusing to user. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6145) Improve log message on fail over
[ https://issues.apache.org/jira/browse/YARN-6145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-6145: -- Attachment: YARN-6145.1.patch A couple of messages are changed to debug level, as the caller will eventually log when the retry ends. Added a few logs in RequestHedgingRMFailoverProxyProvider. RetryInvocationHandler is also changed to not print the stack trace while retrying. > Improve log message on fail over > > > Key: YARN-6145 > URL: https://issues.apache.org/jira/browse/YARN-6145 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-6145.1.patch > > > On failover, a series of exception stack shown in the log, which is harmless, > but confusing to user. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
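A minimal sketch of the logging behavior described above, with hypothetical helper and parameter names and SLF4J used for brevity; the actual RetryInvocationHandler change may differ:
{code}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class RetryLoggingSketch {
  private static final Logger LOG = LoggerFactory.getLogger(RetryLoggingSketch.class);

  // While another retry/failover is still pending, log a one-line message without the
  // stack trace; only when retries are exhausted is the full exception logged.
  static void logFailure(Exception e, String method, String proxy, long delayMs,
      boolean willRetry) {
    if (willRetry) {
      LOG.info("Exception while invoking {} over {}. Retrying after sleeping for {}ms: {}",
          method, proxy, delayMs, e.toString());
    } else {
      LOG.warn("Exception while invoking {} over {}. No more retries.", method, proxy, e);
    }
  }
}
{code}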
[jira] [Created] (YARN-6145) Improve log message on fail over
Jian He created YARN-6145: - Summary: Improve log message on fail over Key: YARN-6145 URL: https://issues.apache.org/jira/browse/YARN-6145 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He On failover, a series of exception stack traces is shown in the log, which is harmless but confusing to users. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-3053) [Security] Review and implement authentication in ATS v.2
[ https://issues.apache.org/jira/browse/YARN-3053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840442#comment-15840442 ] Jian He commented on YARN-3053: --- Yeah, makes sense to me. Let's move the discussion to YARN-6121 for off-apps. I think we have general consensus here for managed AMs. Would you like to update the design doc and may be open sub-jiras and start the development ? > [Security] Review and implement authentication in ATS v.2 > - > > Key: YARN-3053 > URL: https://issues.apache.org/jira/browse/YARN-3053 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Varun Saxena > Labels: YARN-5355, yarn-5355-merge-blocker > Attachments: ATSv2Authentication(draft).pdf > > > Per design in YARN-2928, we want to evaluate and review the system for > security, and ensure proper security in the system. > This includes proper authentication, token management, access control, and > any other relevant security aspects. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-3053) [Security] Review and implement authentication in ATS v.2
[ https://issues.apache.org/jira/browse/YARN-3053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15838885#comment-15838885 ] Jian He edited comment on YARN-3053 at 1/26/17 12:27 AM: - bq. Because for such clients we will not have a mechanism to pass the token when collector/NM restarts. sorry, didn't get that. For such apps, won't the client still need to pass the new address to the AMs in some way? IIUC, it is no different from passing the token. Also, I'm not sure the original collector design had accounted for unmanaged AM in the general case. (I think the collector is not even launched currently for unmanaged AM). A lot of other details need to be fleshed out. was (Author: jianhe): bq. Because for such clients we will not have a mechanism to pass the token when collector/NM restarts. sorry, didn't get that. For such apps, won't the client still need to pass the new address to the AMs in app's own way. IIUC, it has no difference with passing the token. Also, I'm not sure the original collector design had accounted for unmanaged AM in general case. (I think the collector is not even launched currently for unmanaged AM). A lot other details need to be freshed out. > [Security] Review and implement authentication in ATS v.2 > - > > Key: YARN-3053 > URL: https://issues.apache.org/jira/browse/YARN-3053 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Varun Saxena > Labels: YARN-5355, yarn-5355-merge-blocker > Attachments: ATSv2Authentication(draft).pdf > > > Per design in YARN-2928, we want to evaluate and review the system for > security, and ensure proper security in the system. > This includes proper authentication, token management, access control, and > any other relevant security aspects. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-3053) [Security] Review and implement authentication in ATS v.2
[ https://issues.apache.org/jira/browse/YARN-3053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15838885#comment-15838885 ] Jian He commented on YARN-3053: --- bq. Because for such clients we will not have a mechanism to pass the token when collector/NM restarts. sorry, didn't get that. For such apps, won't the client still need to pass the new address to the AMs in app's own way. IIUC, it has no difference with passing the token. Also, I'm not sure the original collector design had accounted for unmanaged AM in general case. (I think the collector is not even launched currently for unmanaged AM). A lot other details need to be freshed out. > [Security] Review and implement authentication in ATS v.2 > - > > Key: YARN-3053 > URL: https://issues.apache.org/jira/browse/YARN-3053 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Varun Saxena > Labels: YARN-5355, yarn-5355-merge-blocker > Attachments: ATSv2Authentication(draft).pdf > > > Per design in YARN-2928, we want to evaluate and review the system for > security, and ensure proper security in the system. > This includes proper authentication, token management, access control, and > any other relevant security aspects. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-3053) [Security] Review and implement authentication in ATS v.2
[ https://issues.apache.org/jira/browse/YARN-3053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15836412#comment-15836412 ] Jian He commented on YARN-3053: --- bq. Any need to generate the token if security is not enabled? Ah, right. we still need the field for collector address in insecure mode. > [Security] Review and implement authentication in ATS v.2 > - > > Key: YARN-3053 > URL: https://issues.apache.org/jira/browse/YARN-3053 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Varun Saxena > Labels: YARN-5355, yarn-5355-merge-blocker > Attachments: ATSv2Authentication(draft).pdf > > > Per design in YARN-2928, we want to evaluate and review the system for > security, and ensure proper security in the system. > This includes proper authentication, token management, access control, and > any other relevant security aspects. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5910) Support for multi-cluster delegation tokens
[ https://issues.apache.org/jira/browse/YARN-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15832738#comment-15832738 ] Jian He commented on YARN-5910: --- testFinishedAppRemovalAfterRMRestart passed locally for me.. > Support for multi-cluster delegation tokens > --- > > Key: YARN-5910 > URL: https://issues.apache.org/jira/browse/YARN-5910 > Project: Hadoop YARN > Issue Type: New Feature > Components: security >Reporter: Clay B. >Assignee: Jian He >Priority: Minor > Attachments: YARN-5910.01.patch, YARN-5910.2.patch, > YARN-5910.3.patch, YARN-5910.4.patch, YARN-5910.5.patch, YARN-5910.6.patch, > YARN-5910.7.patch > > > As an administrator running many secure (kerberized) clusters, some which > have peer clusters managed by other teams, I am looking for a way to run jobs > which may require services running on other clusters. Particular cases where > this rears itself are running something as core as a distcp between two > kerberized clusters (e.g. {{hadoop --config /home/user292/conf/ distcp > hdfs://LOCALCLUSTER/user/user292/test.out > hdfs://REMOTECLUSTER/user/user292/test.out.result}}). > Thanks to YARN-3021, once can run for a while but if the delegation token for > the remote cluster needs renewal the job will fail[1]. One can pre-configure > their {{hdfs-site.xml}} loaded by the YARN RM to know of all possible HDFSes > available but that requires coordination that is not always feasible, > especially as a cluster's peers grow into the tens of clusters or across > management teams. Ideally, one could have core systems configured this way > but jobs could also specify their own handling of tokens and management when > needed? > [1]: Example stack trace when the RM is unaware of a remote service: > > {code} > 2016-03-23 14:59:50,528 INFO > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > application_1458441356031_3317 found existing hdfs token Kind: > HDFS_DELEGATION_TOKEN, Service: ha-hdfs:REMOTECLUSTER, Ident: > (HDFS_DELEGATION_TOKEN token > 10927 for user292) > 2016-03-23 14:59:50,557 WARN > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > Unable to add the application to the delegation token renewer. > java.io.IOException: Failed to renew token: Kind: HDFS_DELEGATION_TOKEN, > Service: ha-hdfs:REMOTECLUSTER, Ident: (HDFS_DELEGATION_TOKEN token 10927 for > user292) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:427) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$700(DelegationTokenRenewer.java:78) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:781) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:762) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:744) > Caused by: java.io.IOException: Unable to map logical nameservice URI > 'hdfs://REMOTECLUSTER' to a NameNode. Local configuration does not have a > failover proxy provider configured. 
> at org.apache.hadoop.hdfs.DFSClient$Renewer.getNNProxy(DFSClient.java:1164) > at org.apache.hadoop.hdfs.DFSClient$Renewer.renew(DFSClient.java:1128) > at org.apache.hadoop.security.token.Token.renew(Token.java:377) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:516) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:513) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.renewToken(DelegationTokenRenewer.java:511) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:425) > ... 6 more > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
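To make the requested behavior concrete, below is a hedged sketch of what a submitting client could ship so the RM can renew tokens for a remote nameservice without that nameservice being preconfigured in the RM's own hdfs-site.xml. It assumes the mechanism discussed in this JIRA, namely a serialized Configuration of extra renewal settings attached to the application; the host names are made up, and the final API for attaching the bytes to the submission is not shown.
{code}
import java.io.IOException;
import java.nio.ByteBuffer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.DataOutputBuffer;

public class RemoteClusterTokensConfSketch {
  public static ByteBuffer build() throws IOException {
    // HDFS client settings the RM would need to resolve and renew tokens for the
    // remote "REMOTECLUSTER" nameservice (host names are made up for illustration).
    Configuration tokensConf = new Configuration(false);
    tokensConf.set("dfs.nameservices", "REMOTECLUSTER");
    tokensConf.set("dfs.ha.namenodes.REMOTECLUSTER", "nn1,nn2");
    tokensConf.set("dfs.namenode.rpc-address.REMOTECLUSTER.nn1", "remote-nn1.example.com:8020");
    tokensConf.set("dfs.namenode.rpc-address.REMOTECLUSTER.nn2", "remote-nn2.example.com:8020");
    tokensConf.set("dfs.client.failover.proxy.provider.REMOTECLUSTER",
        "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");

    // Configuration is Writable, so it can be serialized and attached to the
    // application submission (the per-app tokens conf discussed in this JIRA).
    DataOutputBuffer out = new DataOutputBuffer();
    tokensConf.write(out);
    return ByteBuffer.wrap(out.getData(), 0, out.getLength());
  }
}
{code}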
[jira] [Updated] (YARN-5910) Support for multi-cluster delegation tokens
[ https://issues.apache.org/jira/browse/YARN-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-5910: -- Attachment: YARN-5910.7.patch new patch addressed all comments > Support for multi-cluster delegation tokens > --- > > Key: YARN-5910 > URL: https://issues.apache.org/jira/browse/YARN-5910 > Project: Hadoop YARN > Issue Type: New Feature > Components: security >Reporter: Clay B. >Assignee: Jian He >Priority: Minor > Attachments: YARN-5910.01.patch, YARN-5910.2.patch, > YARN-5910.3.patch, YARN-5910.4.patch, YARN-5910.5.patch, YARN-5910.6.patch, > YARN-5910.7.patch > > > As an administrator running many secure (kerberized) clusters, some which > have peer clusters managed by other teams, I am looking for a way to run jobs > which may require services running on other clusters. Particular cases where > this rears itself are running something as core as a distcp between two > kerberized clusters (e.g. {{hadoop --config /home/user292/conf/ distcp > hdfs://LOCALCLUSTER/user/user292/test.out > hdfs://REMOTECLUSTER/user/user292/test.out.result}}). > Thanks to YARN-3021, once can run for a while but if the delegation token for > the remote cluster needs renewal the job will fail[1]. One can pre-configure > their {{hdfs-site.xml}} loaded by the YARN RM to know of all possible HDFSes > available but that requires coordination that is not always feasible, > especially as a cluster's peers grow into the tens of clusters or across > management teams. Ideally, one could have core systems configured this way > but jobs could also specify their own handling of tokens and management when > needed? > [1]: Example stack trace when the RM is unaware of a remote service: > > {code} > 2016-03-23 14:59:50,528 INFO > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > application_1458441356031_3317 found existing hdfs token Kind: > HDFS_DELEGATION_TOKEN, Service: ha-hdfs:REMOTECLUSTER, Ident: > (HDFS_DELEGATION_TOKEN token > 10927 for user292) > 2016-03-23 14:59:50,557 WARN > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > Unable to add the application to the delegation token renewer. > java.io.IOException: Failed to renew token: Kind: HDFS_DELEGATION_TOKEN, > Service: ha-hdfs:REMOTECLUSTER, Ident: (HDFS_DELEGATION_TOKEN token 10927 for > user292) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:427) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$700(DelegationTokenRenewer.java:78) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:781) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:762) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:744) > Caused by: java.io.IOException: Unable to map logical nameservice URI > 'hdfs://REMOTECLUSTER' to a NameNode. Local configuration does not have a > failover proxy provider configured. 
> at org.apache.hadoop.hdfs.DFSClient$Renewer.getNNProxy(DFSClient.java:1164) > at org.apache.hadoop.hdfs.DFSClient$Renewer.renew(DFSClient.java:1128) > at org.apache.hadoop.security.token.Token.renew(Token.java:377) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:516) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:513) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.renewToken(DelegationTokenRenewer.java:511) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:425) > ... 6 more > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5910) Support for multi-cluster delegation tokens
[ https://issues.apache.org/jira/browse/YARN-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15832277#comment-15832277 ] Jian He commented on YARN-5910: --- bq. It's confusing that the max size check is using capacity() but the error message uses position(). missed to change that.. bq. I'm curious on the reasoning for removing the assert for NEW state? Because I feel that's obvious and not needed.. bq. TestAppManager fails consistently for me with the patch applied and passes consistently without. Please investigate. It's because the am containerLaunchContext is null in the UT which failed with NPE in the new code "submissionContext.getAMContainerSpec().getTokensConf()". I think it's ok to assume am ContainerLaunchContext being not null? As I see other code does the same in this call path, like "submissionContext.getAMContainerSpec().getApplicationACLs()" in RMAppManager. > Support for multi-cluster delegation tokens > --- > > Key: YARN-5910 > URL: https://issues.apache.org/jira/browse/YARN-5910 > Project: Hadoop YARN > Issue Type: New Feature > Components: security >Reporter: Clay B. >Assignee: Jian He >Priority: Minor > Attachments: YARN-5910.01.patch, YARN-5910.2.patch, > YARN-5910.3.patch, YARN-5910.4.patch, YARN-5910.5.patch, YARN-5910.6.patch > > > As an administrator running many secure (kerberized) clusters, some which > have peer clusters managed by other teams, I am looking for a way to run jobs > which may require services running on other clusters. Particular cases where > this rears itself are running something as core as a distcp between two > kerberized clusters (e.g. {{hadoop --config /home/user292/conf/ distcp > hdfs://LOCALCLUSTER/user/user292/test.out > hdfs://REMOTECLUSTER/user/user292/test.out.result}}). > Thanks to YARN-3021, once can run for a while but if the delegation token for > the remote cluster needs renewal the job will fail[1]. One can pre-configure > their {{hdfs-site.xml}} loaded by the YARN RM to know of all possible HDFSes > available but that requires coordination that is not always feasible, > especially as a cluster's peers grow into the tens of clusters or across > management teams. Ideally, one could have core systems configured this way > but jobs could also specify their own handling of tokens and management when > needed? > [1]: Example stack trace when the RM is unaware of a remote service: > > {code} > 2016-03-23 14:59:50,528 INFO > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > application_1458441356031_3317 found existing hdfs token Kind: > HDFS_DELEGATION_TOKEN, Service: ha-hdfs:REMOTECLUSTER, Ident: > (HDFS_DELEGATION_TOKEN token > 10927 for user292) > 2016-03-23 14:59:50,557 WARN > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > Unable to add the application to the delegation token renewer. 
> java.io.IOException: Failed to renew token: Kind: HDFS_DELEGATION_TOKEN, > Service: ha-hdfs:REMOTECLUSTER, Ident: (HDFS_DELEGATION_TOKEN token 10927 for > user292) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:427) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$700(DelegationTokenRenewer.java:78) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:781) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:762) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:744) > Caused by: java.io.IOException: Unable to map logical nameservice URI > 'hdfs://REMOTECLUSTER' to a NameNode. Local configuration does not have a > failover proxy provider configured. > at org.apache.hadoop.hdfs.DFSClient$Renewer.getNNProxy(DFSClient.java:1164) > at org.apache.hadoop.hdfs.DFSClient$Renewer.renew(DFSClient.java:1128) > at org.apache.hadoop.security.token.Token.renew(Token.java:377) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:516) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:513) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at >
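On the capacity()/position() point, a small sketch of a size check that measures and reports the same quantity, and that skips a missing tokens conf, which also sidesteps the NPE discussed above; the class name and limit parameter are hypothetical:
{code}
import java.nio.ByteBuffer;

class TokensConfSizeCheckSketch {
  // Use one consistent measure (remaining(), the readable bytes) for both the check
  // and the error message; a null/absent tokens conf is simply skipped.
  static void check(ByteBuffer tokensConf, int maxBytes) {
    if (tokensConf == null) {
      return;
    }
    int size = tokensConf.remaining();
    if (size > maxBytes) {
      throw new IllegalArgumentException("tokens conf is " + size
          + " bytes, exceeding the configured limit of " + maxBytes + " bytes");
    }
  }
}
{code}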
[jira] [Updated] (YARN-5910) Support for multi-cluster delegation tokens
[ https://issues.apache.org/jira/browse/YARN-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-5910: -- Attachment: YARN-5910.6.patch > Support for multi-cluster delegation tokens > --- > > Key: YARN-5910 > URL: https://issues.apache.org/jira/browse/YARN-5910 > Project: Hadoop YARN > Issue Type: New Feature > Components: security >Reporter: Clay B. >Assignee: Jian He >Priority: Minor > Attachments: YARN-5910.01.patch, YARN-5910.2.patch, > YARN-5910.3.patch, YARN-5910.4.patch, YARN-5910.5.patch, YARN-5910.6.patch > > > As an administrator running many secure (kerberized) clusters, some which > have peer clusters managed by other teams, I am looking for a way to run jobs > which may require services running on other clusters. Particular cases where > this rears itself are running something as core as a distcp between two > kerberized clusters (e.g. {{hadoop --config /home/user292/conf/ distcp > hdfs://LOCALCLUSTER/user/user292/test.out > hdfs://REMOTECLUSTER/user/user292/test.out.result}}). > Thanks to YARN-3021, once can run for a while but if the delegation token for > the remote cluster needs renewal the job will fail[1]. One can pre-configure > their {{hdfs-site.xml}} loaded by the YARN RM to know of all possible HDFSes > available but that requires coordination that is not always feasible, > especially as a cluster's peers grow into the tens of clusters or across > management teams. Ideally, one could have core systems configured this way > but jobs could also specify their own handling of tokens and management when > needed? > [1]: Example stack trace when the RM is unaware of a remote service: > > {code} > 2016-03-23 14:59:50,528 INFO > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > application_1458441356031_3317 found existing hdfs token Kind: > HDFS_DELEGATION_TOKEN, Service: ha-hdfs:REMOTECLUSTER, Ident: > (HDFS_DELEGATION_TOKEN token > 10927 for user292) > 2016-03-23 14:59:50,557 WARN > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > Unable to add the application to the delegation token renewer. > java.io.IOException: Failed to renew token: Kind: HDFS_DELEGATION_TOKEN, > Service: ha-hdfs:REMOTECLUSTER, Ident: (HDFS_DELEGATION_TOKEN token 10927 for > user292) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:427) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$700(DelegationTokenRenewer.java:78) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:781) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:762) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:744) > Caused by: java.io.IOException: Unable to map logical nameservice URI > 'hdfs://REMOTECLUSTER' to a NameNode. Local configuration does not have a > failover proxy provider configured. 
> at org.apache.hadoop.hdfs.DFSClient$Renewer.getNNProxy(DFSClient.java:1164) > at org.apache.hadoop.hdfs.DFSClient$Renewer.renew(DFSClient.java:1128) > at org.apache.hadoop.security.token.Token.renew(Token.java:377) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:516) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:513) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.renewToken(DelegationTokenRenewer.java:511) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:425) > ... 6 more > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5910) Support for multi-cluster delegation tokens
[ https://issues.apache.org/jira/browse/YARN-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-5910: -- Attachment: (was: YARN-5910.6.patch) > Support for multi-cluster delegation tokens > --- > > Key: YARN-5910 > URL: https://issues.apache.org/jira/browse/YARN-5910 > Project: Hadoop YARN > Issue Type: New Feature > Components: security >Reporter: Clay B. >Assignee: Jian He >Priority: Minor > Attachments: YARN-5910.01.patch, YARN-5910.2.patch, > YARN-5910.3.patch, YARN-5910.4.patch, YARN-5910.5.patch, YARN-5910.6.patch > > > As an administrator running many secure (kerberized) clusters, some which > have peer clusters managed by other teams, I am looking for a way to run jobs > which may require services running on other clusters. Particular cases where > this rears itself are running something as core as a distcp between two > kerberized clusters (e.g. {{hadoop --config /home/user292/conf/ distcp > hdfs://LOCALCLUSTER/user/user292/test.out > hdfs://REMOTECLUSTER/user/user292/test.out.result}}). > Thanks to YARN-3021, once can run for a while but if the delegation token for > the remote cluster needs renewal the job will fail[1]. One can pre-configure > their {{hdfs-site.xml}} loaded by the YARN RM to know of all possible HDFSes > available but that requires coordination that is not always feasible, > especially as a cluster's peers grow into the tens of clusters or across > management teams. Ideally, one could have core systems configured this way > but jobs could also specify their own handling of tokens and management when > needed? > [1]: Example stack trace when the RM is unaware of a remote service: > > {code} > 2016-03-23 14:59:50,528 INFO > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > application_1458441356031_3317 found existing hdfs token Kind: > HDFS_DELEGATION_TOKEN, Service: ha-hdfs:REMOTECLUSTER, Ident: > (HDFS_DELEGATION_TOKEN token > 10927 for user292) > 2016-03-23 14:59:50,557 WARN > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > Unable to add the application to the delegation token renewer. > java.io.IOException: Failed to renew token: Kind: HDFS_DELEGATION_TOKEN, > Service: ha-hdfs:REMOTECLUSTER, Ident: (HDFS_DELEGATION_TOKEN token 10927 for > user292) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:427) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$700(DelegationTokenRenewer.java:78) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:781) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:762) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:744) > Caused by: java.io.IOException: Unable to map logical nameservice URI > 'hdfs://REMOTECLUSTER' to a NameNode. Local configuration does not have a > failover proxy provider configured. 
> at org.apache.hadoop.hdfs.DFSClient$Renewer.getNNProxy(DFSClient.java:1164) > at org.apache.hadoop.hdfs.DFSClient$Renewer.renew(DFSClient.java:1128) > at org.apache.hadoop.security.token.Token.renew(Token.java:377) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:516) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:513) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.renewToken(DelegationTokenRenewer.java:511) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:425) > ... 6 more > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5910) Support for multi-cluster delegation tokens
[ https://issues.apache.org/jira/browse/YARN-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15831123#comment-15831123 ] Jian He commented on YARN-5910: --- Thanks again for the reviews ! bq. I'd either move the regex example into the description itself done. bq. I could just specify one property with a gigantic payload good point.. thought the number of configs indirectly means the size, and was lazy at calculating the numbers.. missed this scenario.. I changed to check based on bytes. bq. I am wondering how users/admins are going to debug their settings for the new property good point.. it was there when I was debugging this feature.. I added the debug level logging in both YarnRunner and DelegationTokenRenewer uploaded a patch that addressed all comments. > Support for multi-cluster delegation tokens > --- > > Key: YARN-5910 > URL: https://issues.apache.org/jira/browse/YARN-5910 > Project: Hadoop YARN > Issue Type: New Feature > Components: security >Reporter: Clay B. >Assignee: Jian He >Priority: Minor > Attachments: YARN-5910.01.patch, YARN-5910.2.patch, > YARN-5910.3.patch, YARN-5910.4.patch, YARN-5910.5.patch, YARN-5910.6.patch > > > As an administrator running many secure (kerberized) clusters, some which > have peer clusters managed by other teams, I am looking for a way to run jobs > which may require services running on other clusters. Particular cases where > this rears itself are running something as core as a distcp between two > kerberized clusters (e.g. {{hadoop --config /home/user292/conf/ distcp > hdfs://LOCALCLUSTER/user/user292/test.out > hdfs://REMOTECLUSTER/user/user292/test.out.result}}). > Thanks to YARN-3021, once can run for a while but if the delegation token for > the remote cluster needs renewal the job will fail[1]. One can pre-configure > their {{hdfs-site.xml}} loaded by the YARN RM to know of all possible HDFSes > available but that requires coordination that is not always feasible, > especially as a cluster's peers grow into the tens of clusters or across > management teams. Ideally, one could have core systems configured this way > but jobs could also specify their own handling of tokens and management when > needed? > [1]: Example stack trace when the RM is unaware of a remote service: > > {code} > 2016-03-23 14:59:50,528 INFO > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > application_1458441356031_3317 found existing hdfs token Kind: > HDFS_DELEGATION_TOKEN, Service: ha-hdfs:REMOTECLUSTER, Ident: > (HDFS_DELEGATION_TOKEN token > 10927 for user292) > 2016-03-23 14:59:50,557 WARN > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > Unable to add the application to the delegation token renewer. 
> java.io.IOException: Failed to renew token: Kind: HDFS_DELEGATION_TOKEN, > Service: ha-hdfs:REMOTECLUSTER, Ident: (HDFS_DELEGATION_TOKEN token 10927 for > user292) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:427) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$700(DelegationTokenRenewer.java:78) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:781) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:762) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:744) > Caused by: java.io.IOException: Unable to map logical nameservice URI > 'hdfs://REMOTECLUSTER' to a NameNode. Local configuration does not have a > failover proxy provider configured. > at org.apache.hadoop.hdfs.DFSClient$Renewer.getNNProxy(DFSClient.java:1164) > at org.apache.hadoop.hdfs.DFSClient$Renewer.renew(DFSClient.java:1128) > at org.apache.hadoop.security.token.Token.renew(Token.java:377) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:516) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:513) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.renewToken(DelegationTokenRenewer.j
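A hedged sketch of the debug-level logging mentioned for the renewer side, assuming the per-application settings arrive as a serialized Configuration; class and method names are illustrative:
{code}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.Map;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.DataInputByteBuffer;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class TokensConfDebugLogSketch {
  private static final Logger LOG = LoggerFactory.getLogger(TokensConfDebugLogSketch.class);

  // Deserialize the per-application token-renewal configuration and dump each override
  // at debug level so admins can see exactly what the client sent.
  static Configuration readAndLog(ByteBuffer tokensConf) throws IOException {
    DataInputByteBuffer in = new DataInputByteBuffer();
    tokensConf.rewind();
    in.reset(tokensConf);
    Configuration appConf = new Configuration(false);
    appConf.readFields(in);
    if (LOG.isDebugEnabled()) {
      for (Map.Entry<String, String> entry : appConf) {
        LOG.debug("tokens conf override: {} = {}", entry.getKey(), entry.getValue());
      }
    }
    return appConf;
  }
}
{code}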
[jira] [Updated] (YARN-5910) Support for multi-cluster delegation tokens
[ https://issues.apache.org/jira/browse/YARN-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-5910: -- Attachment: YARN-5910.6.patch > Support for multi-cluster delegation tokens > --- > > Key: YARN-5910 > URL: https://issues.apache.org/jira/browse/YARN-5910 > Project: Hadoop YARN > Issue Type: New Feature > Components: security >Reporter: Clay B. >Assignee: Jian He >Priority: Minor > Attachments: YARN-5910.01.patch, YARN-5910.2.patch, > YARN-5910.3.patch, YARN-5910.4.patch, YARN-5910.5.patch, YARN-5910.6.patch > > > As an administrator running many secure (kerberized) clusters, some which > have peer clusters managed by other teams, I am looking for a way to run jobs > which may require services running on other clusters. Particular cases where > this rears itself are running something as core as a distcp between two > kerberized clusters (e.g. {{hadoop --config /home/user292/conf/ distcp > hdfs://LOCALCLUSTER/user/user292/test.out > hdfs://REMOTECLUSTER/user/user292/test.out.result}}). > Thanks to YARN-3021, once can run for a while but if the delegation token for > the remote cluster needs renewal the job will fail[1]. One can pre-configure > their {{hdfs-site.xml}} loaded by the YARN RM to know of all possible HDFSes > available but that requires coordination that is not always feasible, > especially as a cluster's peers grow into the tens of clusters or across > management teams. Ideally, one could have core systems configured this way > but jobs could also specify their own handling of tokens and management when > needed? > [1]: Example stack trace when the RM is unaware of a remote service: > > {code} > 2016-03-23 14:59:50,528 INFO > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > application_1458441356031_3317 found existing hdfs token Kind: > HDFS_DELEGATION_TOKEN, Service: ha-hdfs:REMOTECLUSTER, Ident: > (HDFS_DELEGATION_TOKEN token > 10927 for user292) > 2016-03-23 14:59:50,557 WARN > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > Unable to add the application to the delegation token renewer. > java.io.IOException: Failed to renew token: Kind: HDFS_DELEGATION_TOKEN, > Service: ha-hdfs:REMOTECLUSTER, Ident: (HDFS_DELEGATION_TOKEN token 10927 for > user292) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:427) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$700(DelegationTokenRenewer.java:78) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:781) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:762) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:744) > Caused by: java.io.IOException: Unable to map logical nameservice URI > 'hdfs://REMOTECLUSTER' to a NameNode. Local configuration does not have a > failover proxy provider configured. 
> at org.apache.hadoop.hdfs.DFSClient$Renewer.getNNProxy(DFSClient.java:1164) > at org.apache.hadoop.hdfs.DFSClient$Renewer.renew(DFSClient.java:1128) > at org.apache.hadoop.security.token.Token.renew(Token.java:377) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:516) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:513) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.renewToken(DelegationTokenRenewer.java:511) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:425) > ... 6 more > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5910) Support for multi-cluster delegation tokens
[ https://issues.apache.org/jira/browse/YARN-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-5910: -- Attachment: YARN-5910.5.patch Uploaded a patch that addressed all the comments. > Support for multi-cluster delegation tokens > --- > > Key: YARN-5910 > URL: https://issues.apache.org/jira/browse/YARN-5910 > Project: Hadoop YARN > Issue Type: New Feature > Components: security >Reporter: Clay B. >Assignee: Jian He >Priority: Minor > Attachments: YARN-5910.01.patch, YARN-5910.2.patch, > YARN-5910.3.patch, YARN-5910.4.patch, YARN-5910.5.patch > > > As an administrator running many secure (kerberized) clusters, some which > have peer clusters managed by other teams, I am looking for a way to run jobs > which may require services running on other clusters. Particular cases where > this rears itself are running something as core as a distcp between two > kerberized clusters (e.g. {{hadoop --config /home/user292/conf/ distcp > hdfs://LOCALCLUSTER/user/user292/test.out > hdfs://REMOTECLUSTER/user/user292/test.out.result}}). > Thanks to YARN-3021, once can run for a while but if the delegation token for > the remote cluster needs renewal the job will fail[1]. One can pre-configure > their {{hdfs-site.xml}} loaded by the YARN RM to know of all possible HDFSes > available but that requires coordination that is not always feasible, > especially as a cluster's peers grow into the tens of clusters or across > management teams. Ideally, one could have core systems configured this way > but jobs could also specify their own handling of tokens and management when > needed? > [1]: Example stack trace when the RM is unaware of a remote service: > > {code} > 2016-03-23 14:59:50,528 INFO > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > application_1458441356031_3317 found existing hdfs token Kind: > HDFS_DELEGATION_TOKEN, Service: ha-hdfs:REMOTECLUSTER, Ident: > (HDFS_DELEGATION_TOKEN token > 10927 for user292) > 2016-03-23 14:59:50,557 WARN > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > Unable to add the application to the delegation token renewer. > java.io.IOException: Failed to renew token: Kind: HDFS_DELEGATION_TOKEN, > Service: ha-hdfs:REMOTECLUSTER, Ident: (HDFS_DELEGATION_TOKEN token 10927 for > user292) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:427) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$700(DelegationTokenRenewer.java:78) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:781) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:762) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:744) > Caused by: java.io.IOException: Unable to map logical nameservice URI > 'hdfs://REMOTECLUSTER' to a NameNode. Local configuration does not have a > failover proxy provider configured. 
> at org.apache.hadoop.hdfs.DFSClient$Renewer.getNNProxy(DFSClient.java:1164) > at org.apache.hadoop.hdfs.DFSClient$Renewer.renew(DFSClient.java:1128) > at org.apache.hadoop.security.token.Token.renew(Token.java:377) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:516) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:513) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.renewToken(DelegationTokenRenewer.java:511) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:425) > ... 6 more > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5910) Support for multi-cluster delegation tokens
[ https://issues.apache.org/jira/browse/YARN-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15828844#comment-15828844 ] Jian He commented on YARN-5910: --- bq. whether we may need some RM-specific configs to be able to successfully connect with kerberos. There may be some remappings that the admins only bothered to configure on the RM or are RM specific? sorry, didn't get you. The 'dfs.namenode.kerberos.principal' is actually HDFS config, not RM config. If two clusters have different DFS principal name configured, when MR client asks for the delegation token from both clusters, I guess this check will fail, because it cannot differentiate the cluster. > Support for multi-cluster delegation tokens > --- > > Key: YARN-5910 > URL: https://issues.apache.org/jira/browse/YARN-5910 > Project: Hadoop YARN > Issue Type: New Feature > Components: security >Reporter: Clay B. >Assignee: Jian He >Priority: Minor > Attachments: YARN-5910.01.patch, YARN-5910.2.patch, > YARN-5910.3.patch, YARN-5910.4.patch > > > As an administrator running many secure (kerberized) clusters, some which > have peer clusters managed by other teams, I am looking for a way to run jobs > which may require services running on other clusters. Particular cases where > this rears itself are running something as core as a distcp between two > kerberized clusters (e.g. {{hadoop --config /home/user292/conf/ distcp > hdfs://LOCALCLUSTER/user/user292/test.out > hdfs://REMOTECLUSTER/user/user292/test.out.result}}). > Thanks to YARN-3021, once can run for a while but if the delegation token for > the remote cluster needs renewal the job will fail[1]. One can pre-configure > their {{hdfs-site.xml}} loaded by the YARN RM to know of all possible HDFSes > available but that requires coordination that is not always feasible, > especially as a cluster's peers grow into the tens of clusters or across > management teams. Ideally, one could have core systems configured this way > but jobs could also specify their own handling of tokens and management when > needed? > [1]: Example stack trace when the RM is unaware of a remote service: > > {code} > 2016-03-23 14:59:50,528 INFO > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > application_1458441356031_3317 found existing hdfs token Kind: > HDFS_DELEGATION_TOKEN, Service: ha-hdfs:REMOTECLUSTER, Ident: > (HDFS_DELEGATION_TOKEN token > 10927 for user292) > 2016-03-23 14:59:50,557 WARN > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > Unable to add the application to the delegation token renewer. 
> java.io.IOException: Failed to renew token: Kind: HDFS_DELEGATION_TOKEN, > Service: ha-hdfs:REMOTECLUSTER, Ident: (HDFS_DELEGATION_TOKEN token 10927 for > user292) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:427) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$700(DelegationTokenRenewer.java:78) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:781) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:762) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:744) > Caused by: java.io.IOException: Unable to map logical nameservice URI > 'hdfs://REMOTECLUSTER' to a NameNode. Local configuration does not have a > failover proxy provider configured. > at org.apache.hadoop.hdfs.DFSClient$Renewer.getNNProxy(DFSClient.java:1164) > at org.apache.hadoop.hdfs.DFSClient$Renewer.renew(DFSClient.java:1128) > at org.apache.hadoop.security.token.Token.renew(Token.java:377) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:516) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:513) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.renewToken(DelegationTokenRenewer.java:511) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:425)
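For illustration only, the sketch below shows the kind of principal comparison referred to in the comment above. The real logic lives in SaslRpcClient#getServerPrincipal and also handles _HOST substitution and principal-pattern matching, so this is a deliberately simplified, hypothetical stand-in; it only shows why the app-supplied conf must carry dfs.namenode.kerberos.principal for the remote cluster.
{code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;

// Simplified, hypothetical sketch of the principal check discussed above.
// NOT the actual SaslRpcClient#getServerPrincipal code.
public class PrincipalCheckSketch {
  static void checkServerPrincipal(Configuration appConf, String principalFromServer)
      throws IOException {
    String expected = appConf.get("dfs.namenode.kerberos.principal");
    if (expected == null || !expected.equals(principalFromServer)) {
      throw new IOException("Server principal " + principalFromServer
          + " does not match locally configured principal " + expected);
    }
  }

  public static void main(String[] args) throws Exception {
    // App-supplied conf, created without loading default resources.
    Configuration appConf = new Configuration(false);
    appConf.set("dfs.namenode.kerberos.principal", "nn/remote-nn.example.com@REMOTE.REALM");
    checkServerPrincipal(appConf, "nn/remote-nn.example.com@REMOTE.REALM"); // passes
  }
}
{code}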
[jira] [Comment Edited] (YARN-5910) Support for multi-cluster delegation tokens
[ https://issues.apache.org/jira/browse/YARN-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15828810#comment-15828810 ] Jian He edited comment on YARN-5910 at 1/18/17 9:41 PM: bq. Yeah, I'm thinking it's unnecessary to check both. sounds good, I'll remove the is security enabled check in YARNRunner. Regarding the if security enabled check in ClientRMSerivce, do you also prefer removing it ? bq. Configuration.addResource will add a resource object to the list of resources for the config and never get rid of them. This will cause every app-specific conf to be tracked by renewerConf forever, resulting in a memory leak. Ah, I see. Good point. I didn't understand you previous comment about this. So I've done the experiment. Actually, we don't need RM's own config for renew. Additionally, we need to pass in the dfs.namenode.kerberos.principal from the client to pass the check in SaslRpcClient#getServerPrincipal where it checks whether the remote principle equals to the local config. I have one question about this design: the dfs.namenode.kerberos.principal is not differentiated by clusterId. So it assumes all clusters will have the same value for 'dfs.namenode.kerberos.principal' ? This applies to all other service including RM as well. So I can just use appConfig in DelegationTokenRenewer. I'll also add the config limit in RM. was (Author: jianhe): bq. Yeah, I'm thinking it's unnecessary to check both. sounds good, I'll remove the is security enabled check in YARNRunner. Regarding the if security enabled check in ClientRMSerivce, do you also prefer removing it ? bq. Configuration.addResource will add a resource object to the list of resources for the config and never get rid of them. This will cause every app-specific conf to be tracked by renewerConf forever, resulting in a memory leak. Ah, I see. Good point. I didn't understand you previous comment about this. So I've done the experiment. Actually, we don't need RM's own config for renew. Additionally, we need to pass in the dfs.namenode.kerberos.principal from the client to pass the check in SaslRpcClient#getServerPrincipal where it checks whether the remote principle equals to the local config. I have one question about this design: the dfs.namenode.kerberos.principal is not differentiated by clusterId. So it assumes all clusters will have the same value for 'dfs.namenode.kerberos.principal' ? This applies to all other service including as well. So I can just use appConfig in DelegationTokenRenewer. I'll also add the config limit in RM. > Support for multi-cluster delegation tokens > --- > > Key: YARN-5910 > URL: https://issues.apache.org/jira/browse/YARN-5910 > Project: Hadoop YARN > Issue Type: New Feature > Components: security >Reporter: Clay B. >Assignee: Jian He >Priority: Minor > Attachments: YARN-5910.01.patch, YARN-5910.2.patch, > YARN-5910.3.patch, YARN-5910.4.patch > > > As an administrator running many secure (kerberized) clusters, some which > have peer clusters managed by other teams, I am looking for a way to run jobs > which may require services running on other clusters. Particular cases where > this rears itself are running something as core as a distcp between two > kerberized clusters (e.g. {{hadoop --config /home/user292/conf/ distcp > hdfs://LOCALCLUSTER/user/user292/test.out > hdfs://REMOTECLUSTER/user/user292/test.out.result}}). > Thanks to YARN-3021, once can run for a while but if the delegation token for > the remote cluster needs renewal the job will fail[1]. 
One can pre-configure > their {{hdfs-site.xml}} loaded by the YARN RM to know of all possible HDFSes > available but that requires coordination that is not always feasible, > especially as a cluster's peers grow into the tens of clusters or across > management teams. Ideally, one could have core systems configured this way > but jobs could also specify their own handling of tokens and management when > needed? > [1]: Example stack trace when the RM is unaware of a remote service: > > {code} > 2016-03-23 14:59:50,528 INFO > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > application_1458441356031_3317 found existing hdfs token Kind: > HDFS_DELEGATION_TOKEN, Service: ha-hdfs:REMOTECLUSTER, Ident: > (HDFS_DELEGATION_TOKEN token > 10927 for user292) > 2016-03-23 14:59:50,557 WARN > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > Unable to add the application to the delegation token renewer. > java.io.IOException: Failed to renew token: Kind: HDFS_DELEGATION_TOKEN, > Service: ha-hdfs:REMOTECLUSTER, Ident: (HDFS_DELEGATION_TOKEN token 10927 for > user292) > at > org.apache.h
[jira] [Comment Edited] (YARN-5910) Support for multi-cluster delegation tokens
[ https://issues.apache.org/jira/browse/YARN-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15828810#comment-15828810 ] Jian He edited comment on YARN-5910 at 1/18/17 9:41 PM: bq. Yeah, I'm thinking it's unnecessary to check both. sounds good, I'll remove the is security enabled check in YARNRunner. Regarding the if security enabled check in ClientRMSerivce, do you also prefer removing it ? bq. Configuration.addResource will add a resource object to the list of resources for the config and never get rid of them. This will cause every app-specific conf to be tracked by renewerConf forever, resulting in a memory leak. Ah, I see. Good point. I didn't understand you previous comment about this. So I've done the experiment. Actually, we don't need RM's own config for renew. Additionally, we need to pass in the dfs.namenode.kerberos.principal from the client to pass the check in SaslRpcClient#getServerPrincipal where it checks whether the remote principle equals to the local config. I have one question about this design: the dfs.namenode.kerberos.principal is not differentiated by clusterId. So it assumes all clusters will have the same value for 'dfs.namenode.kerberos.principal' ? This applies to all other service including as well. So I can just use appConfig in DelegationTokenRenewer. I'll also add the config limit in RM. was (Author: jianhe): bq. Yeah, I'm thinking it's unnecessary to check both. sounds good, I'll remove the is security enabled check in YARNRunner. Regarding the if security enabled check in ClientRMSerivce, do you also prefer removing it ? bq. Configuration.addResource will add a resource object to the list of resources for the config and never get rid of them. This will cause every app-specific conf to be tracked by renewerConf forever, resulting in a memory leak. Ah, I see. Good point. I didn't understand you previous comment about this. So I've done the experiment. Actually, we don't need RM's own config for renew. Additionally, we need to pass in the dfs.namenode.kerberos.principal from the client to pass the check in SaslRpcClient#getServerPrincipal where it checks whether the remote principle equals to the local config. I have one question about this design: the dfs.namenode.kerberos.principal is not differentiated by clusterId. So when MR client asks delegation token from both clusters, it assumes all clusters will have the same value for 'dfs.namenode.kerberos.principal' ? So I can just use appConfig in DelegationTokenRenewer. I'll also add the config limit in RM. > Support for multi-cluster delegation tokens > --- > > Key: YARN-5910 > URL: https://issues.apache.org/jira/browse/YARN-5910 > Project: Hadoop YARN > Issue Type: New Feature > Components: security >Reporter: Clay B. >Assignee: Jian He >Priority: Minor > Attachments: YARN-5910.01.patch, YARN-5910.2.patch, > YARN-5910.3.patch, YARN-5910.4.patch > > > As an administrator running many secure (kerberized) clusters, some which > have peer clusters managed by other teams, I am looking for a way to run jobs > which may require services running on other clusters. Particular cases where > this rears itself are running something as core as a distcp between two > kerberized clusters (e.g. {{hadoop --config /home/user292/conf/ distcp > hdfs://LOCALCLUSTER/user/user292/test.out > hdfs://REMOTECLUSTER/user/user292/test.out.result}}). > Thanks to YARN-3021, once can run for a while but if the delegation token for > the remote cluster needs renewal the job will fail[1]. 
One can pre-configure > their {{hdfs-site.xml}} loaded by the YARN RM to know of all possible HDFSes > available but that requires coordination that is not always feasible, > especially as a cluster's peers grow into the tens of clusters or across > management teams. Ideally, one could have core systems configured this way > but jobs could also specify their own handling of tokens and management when > needed? > [1]: Example stack trace when the RM is unaware of a remote service: > > {code} > 2016-03-23 14:59:50,528 INFO > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > application_1458441356031_3317 found existing hdfs token Kind: > HDFS_DELEGATION_TOKEN, Service: ha-hdfs:REMOTECLUSTER, Ident: > (HDFS_DELEGATION_TOKEN token > 10927 for user292) > 2016-03-23 14:59:50,557 WARN > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > Unable to add the application to the delegation token renewer. > java.io.IOException: Failed to renew token: Kind: HDFS_DELEGATION_TOKEN, > Service: ha-hdfs:REMOTECLUSTER, Ident: (HDFS_DELEGATION_TOKEN token 10927 for > user292) > at > org.apache.
[jira] [Commented] (YARN-5910) Support for multi-cluster delegation tokens
[ https://issues.apache.org/jira/browse/YARN-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15828810#comment-15828810 ] Jian He commented on YARN-5910: --- bq. Yeah, I'm thinking it's unnecessary to check both. Sounds good, I'll remove the is-security-enabled check in YARNRunner. Regarding the if-security-enabled check in ClientRMService, do you also prefer removing it ? bq. Configuration.addResource will add a resource object to the list of resources for the config and never get rid of them. This will cause every app-specific conf to be tracked by renewerConf forever, resulting in a memory leak. Ah, I see. Good point. I didn't understand your previous comment about this. So I've done the experiment. Actually, we don't need the RM's own config for renewal. Additionally, we need to pass in the dfs.namenode.kerberos.principal from the client to pass the check in SaslRpcClient#getServerPrincipal, where it checks whether the remote principal equals the local config. I have one question about this design: the dfs.namenode.kerberos.principal is not differentiated by clusterId. So when the MR client asks for delegation tokens from both clusters, it assumes all clusters will have the same value for 'dfs.namenode.kerberos.principal' ? So I can just use appConfig in DelegationTokenRenewer. I'll also add the config limit in the RM. > Support for multi-cluster delegation tokens > --- > > Key: YARN-5910 > URL: https://issues.apache.org/jira/browse/YARN-5910 > Project: Hadoop YARN > Issue Type: New Feature > Components: security >Reporter: Clay B. >Assignee: Jian He >Priority: Minor > Attachments: YARN-5910.01.patch, YARN-5910.2.patch, > YARN-5910.3.patch, YARN-5910.4.patch > > > As an administrator running many secure (kerberized) clusters, some which > have peer clusters managed by other teams, I am looking for a way to run jobs > which may require services running on other clusters. Particular cases where > this rears itself are running something as core as a distcp between two > kerberized clusters (e.g. {{hadoop --config /home/user292/conf/ distcp > hdfs://LOCALCLUSTER/user/user292/test.out > hdfs://REMOTECLUSTER/user/user292/test.out.result}}). > Thanks to YARN-3021, once can run for a while but if the delegation token for > the remote cluster needs renewal the job will fail[1]. One can pre-configure > their {{hdfs-site.xml}} loaded by the YARN RM to know of all possible HDFSes > available but that requires coordination that is not always feasible, > especially as a cluster's peers grow into the tens of clusters or across > management teams. Ideally, one could have core systems configured this way > but jobs could also specify their own handling of tokens and management when > needed? > [1]: Example stack trace when the RM is unaware of a remote service: > > {code} > 2016-03-23 14:59:50,528 INFO > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > application_1458441356031_3317 found existing hdfs token Kind: > HDFS_DELEGATION_TOKEN, Service: ha-hdfs:REMOTECLUSTER, Ident: > (HDFS_DELEGATION_TOKEN token > 10927 for user292) > 2016-03-23 14:59:50,557 WARN > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > Unable to add the application to the delegation token renewer.
> java.io.IOException: Failed to renew token: Kind: HDFS_DELEGATION_TOKEN, > Service: ha-hdfs:REMOTECLUSTER, Ident: (HDFS_DELEGATION_TOKEN token 10927 for > user292) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:427) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$700(DelegationTokenRenewer.java:78) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:781) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:762) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:744) > Caused by: java.io.IOException: Unable to map logical nameservice URI > 'hdfs://REMOTECLUSTER' to a NameNode. Local configuration does not have a > failover proxy provider configured. > at org.apache.hadoop.hdfs.DFSClient$Renewer.getNNProxy(DFSClient.java:1164) > at org.apache.hadoop.hdfs.DFSClient$Renewer.renew(DFSClient.java:1128) > at org.apache.hadoop.security.token.Token.renew(Token.java:377) > at > org.apache.hadoop.yarn.server.resourcemanager
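The "just use appConfig in DelegationTokenRenewer" idea above can be pictured with a minimal, hypothetical sketch (class, field, and method names are illustrative, not the actual patch): each application's tokens are renewed with the configuration that application submitted, and that conf is dropped when the application finishes, so one app's remote-cluster settings never leak into another app's renewals.
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.token.Token;
import org.apache.hadoop.yarn.api.records.ApplicationId;

// Hypothetical sketch of per-app conf usage in a delegation-token renewer.
public class PerAppRenewerSketch {
  private final Map<ApplicationId, Configuration> appConfs = new ConcurrentHashMap<>();

  public void addApplication(ApplicationId appId, Configuration appConf) {
    appConfs.put(appId, appConf);
  }

  public long renew(ApplicationId appId, Token<?> token) throws Exception {
    // Renew with the submitting app's own conf rather than a shared renewerConf.
    Configuration conf = appConfs.get(appId);
    if (conf == null) {
      throw new IllegalStateException("No conf registered for " + appId);
    }
    return token.renew(conf);
  }

  public void removeApplication(ApplicationId appId) {
    appConfs.remove(appId); // drop the conf when the app finishes, avoiding the leak
  }
}
{code}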
[jira] [Comment Edited] (YARN-5910) Support for multi-cluster delegation tokens
[ https://issues.apache.org/jira/browse/YARN-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15828654#comment-15828654 ] Jian He edited comment on YARN-5910 at 1/18/17 7:43 PM: Hi Jason, thank you very much for the review ! bq. It's confusing to see a MR_JOB_SEND_TOKEN_CONF_DEFAULT in MRJobConfig yet it clearly is not the default value. removed it bq. Should this feature be tied to UserGroupInformation.isSecurityEnabled? I'm wondering if this can cause issues where the current cluster isn't secure but the RM needs to renew the job's tokens for a remote secure cluster or some other secure service. Seems like if this conf is set then that's all we need to know. Currently, the RM DelegationTokenRewener will only add the tokens if security is enabled (code in RMAppManager#submitApplication), so I think with this existing implemtation, we can assume this feature is for security enabled only ? bq. Similarly the code explicitly fails in ClientRMService if the conf is there when security is disabled which seems like we're taking a case that isn't optimal but should work benignly and explicitly making sure it fails. Not sure that's user friendly behavior. My intention was to prevent user from sending conf in non-secure mode(which anyways is not needed if my above reply is true), in case the conf size huge which may increase load on RM. On ther other hand, Varun chatted offline that we can add a limit config in RM to limit the size of configs, your opinion ? bq. Nit: For the ByteBuffer usage in parseCredentials and parseTokensConf, the rewind method calls seem unnecessary since we're throwing the buffers away immediately afterwards. Actually, the bytebuffer is a direct reference from the containerLaunchContext, not a copy. I think this is also required because it was specifically to solve issues in YARN-2893. bq. Should the Configuration constructor call in parseTokensConf be using the version that does not load defaults? If not then I recommend we at least allow a conf to be passed in to use as a copy constructor.Loading a new Configuration from scratch is really expensive and we should avoid it if possible. See the discussion on HADOOP-11223 for details. Good point. I actually did the same in YarnRunner#setAppConf method, but missed this place. bq. In DelegationTokenRenewer, why aren't we using the appConf as-is when renewing the tokens? I wasn't sure whether the mere appConf is enough for the connection - (Is there any kerberos related configs for RM itself are required for authentication?). Let me do some experiments, if this works, I'll just use appConf. bq. Also it looks like we're polluting subsequent app-conf renewals with prior app configurations, as well as simply leaking appConf objects as renewerConf resources infinitum. I don't see where renewerConf gets reset in-between. My previous patch made a copy of each appConf and merge with RM's conf(for the reason I wasn't sure whether RM's own conf is required) and use that for renwer. But then I think this maybe bad because every app will have its own copy of configs, which may largely increase the memory size if the number of apps is very big. So, in the latest patch I changed it to let all apps share the same renewerConf - this is based on the assumption that "dfs.nameservices" must have distint keys for each distinct cluster, so we won't have situation where two apps use different configs for the same cluster - it is true that unnecessary configs used by 1st app will be shared by subsequent apps. 
bq. Arguably there should be a unit tests that verifies a first app with token conf key A and a second app with token conf key B doesn't leave a situation where the renewals of the second app are polluted with conf key A. If the mere appConf works, we should be fine. bq. Speaking of unit tests, I see where we fixed up the YARN unit tests to pass the new conf but not a new test that verifies the specified conf is used appropriately when renewing for that app and not for other apps that didn't specify a conf. Yep, I'll add the UT. was (Author: jianhe): Hi Jason, thank you very much for the review ! bq. It's confusing to see a MR_JOB_SEND_TOKEN_CONF_DEFAULT in MRJobConfig yet it clearly is not the default value. removed it bq. Should this feature be tied to UserGroupInformation.isSecurityEnabled? I'm wondering if this can cause issues where the current cluster isn't secure but the RM needs to renew the job's tokens for a remote secure cluster or some other secure service. Seems like if this conf is set then that's all we need to know. Currently, the RM DelegationTokenRewener will only add the tokens if security is enabled (code in RMAppManager#submitApplication), so I think with this existing implemtation, we can assume this feature is for security enabled only ? bq. Similarly the code explicit
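The two review points about parseTokensConf (create the Configuration without loading defaults, and do not disturb the buffer shared with the ContainerLaunchContext) can be sketched roughly as below, assuming the client serialized the conf with Configuration.write. The duplicate() call is just one illustrative way to leave the shared buffer's position untouched; the patch under review uses rewind instead.
{code}
import java.io.IOException;
import java.nio.ByteBuffer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.DataInputByteBuffer;

// Rough sketch of parsing an app-supplied tokens conf out of a ByteBuffer,
// not necessarily how parseTokensConf is implemented in the patch.
public final class TokensConfParseSketch {
  private TokensConfParseSketch() {}

  public static Configuration parseTokensConf(ByteBuffer tokensConf) throws IOException {
    DataInputByteBuffer dib = new DataInputByteBuffer();
    dib.reset(tokensConf.duplicate());          // work on a copy of position/limit
    Configuration appConf = new Configuration(false); // skip loading core-default.xml etc.
    appConf.readFields(dib);
    return appConf;
  }
}
{code}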
[jira] [Commented] (YARN-5910) Support for multi-cluster delegation tokens
[ https://issues.apache.org/jira/browse/YARN-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15828654#comment-15828654 ] Jian He commented on YARN-5910: --- Hi Jason, thank you very much for the review ! bq. It's confusing to see a MR_JOB_SEND_TOKEN_CONF_DEFAULT in MRJobConfig yet it clearly is not the default value. Removed it. bq. Should this feature be tied to UserGroupInformation.isSecurityEnabled? I'm wondering if this can cause issues where the current cluster isn't secure but the RM needs to renew the job's tokens for a remote secure cluster or some other secure service. Seems like if this conf is set then that's all we need to know. Currently, the RM DelegationTokenRenewer will only add the tokens if security is enabled (code in RMAppManager#submitApplication), so I think with this existing implementation, we can assume this feature is for the security-enabled case only ? bq. Similarly the code explicitly fails in ClientRMService if the conf is there when security is disabled which seems like we're taking a case that isn't optimal but should work benignly and explicitly making sure it fails. Not sure that's user friendly behavior. My intention was to prevent users from sending the conf in non-secure mode (which anyway is not needed if my above reply is true), in case the conf size is huge, which may increase load on the RM. On the other hand, Varun suggested offline that we can add a limit config in the RM to limit the size of configs; your opinion ? bq. Nit: For the ByteBuffer usage in parseCredentials and parseTokensConf, the rewind method calls seem unnecessary since we're throwing the buffers away immediately afterwards. Actually, the bytebuffer is a direct reference from the containerLaunchContext, not a copy. I think this is also required because it was specifically added to solve issues in YARN-2893. bq. Should the Configuration constructor call in parseTokensConf be using the version that does not load defaults? If not then I recommend we at least allow a conf to be passed in to use as a copy constructor. Loading a new Configuration from scratch is really expensive and we should avoid it if possible. See the discussion on HADOOP-11223 for details. Good point. I actually did the same in the YarnRunner#setAppConf method, but missed this place. bq. In DelegationTokenRenewer, why aren't we using the appConf as-is when renewing the tokens? I wasn't sure whether the mere appConf is enough for the connection - (are there any kerberos-related configs for the RM itself that are required for authentication?). Let me do some experiments; if this works, I'll just use appConf. bq. Also it looks like we're polluting subsequent app-conf renewals with prior app configurations, as well as simply leaking appConf objects as renewerConf resources infinitum. I don't see where renewerConf gets reset in-between. My previous patch made a copy of each appConf, merged it with the RM's conf (for the reason that I wasn't sure whether the RM's own conf is required), and used that for the renewer. But then I think this may be bad because every app will have its own copy of configs, which may largely increase the memory footprint if the number of apps is very big. So, in the latest patch I changed it to let all apps share the same renewerConf - this is based on the assumption that "dfs.nameservices" must have distinct keys for each distinct cluster, so we won't have a situation where two apps use different configs for the same cluster - it is true that unnecessary configs used by the 1st app will be shared by subsequent apps. bq.
Arguably there should be a unit tests that verifies a first app with token conf key A and a second app with token conf key B doesn't leave a situation where the renewals of the second app are polluted with conf key A. If the mere appConf works, we should be fine. Speaking of unit tests, I see where we fixed up the YARN unit tests to pass the new conf but not a new test that verifies the specified conf is used appropriately when renewing for that app and not for other apps that didn't specify a conf. Yep, I'll add the UT. > Support for multi-cluster delegation tokens > --- > > Key: YARN-5910 > URL: https://issues.apache.org/jira/browse/YARN-5910 > Project: Hadoop YARN > Issue Type: New Feature > Components: security >Reporter: Clay B. >Assignee: Jian He >Priority: Minor > Attachments: YARN-5910.01.patch, YARN-5910.2.patch, > YARN-5910.3.patch, YARN-5910.4.patch > > > As an administrator running many secure (kerberized) clusters, some which > have peer clusters managed by other teams, I am looking for a way to run jobs > which may require services running on other clusters. Particular cases where > this rears itself are running something as core as a distcp between two > kerberized clusters (e.g. {{h
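The size-limit idea mentioned above (a config in the RM to cap how large an app-supplied conf may be) could look roughly like the following; the property name and default below are hypothetical, purely for illustration.
{code}
import java.io.IOException;
import java.nio.ByteBuffer;

import org.apache.hadoop.conf.Configuration;

// Illustrative sketch of an RM-side size check on the app-supplied tokens conf.
public final class TokensConfSizeCheckSketch {
  private TokensConfSizeCheckSketch() {}

  // Hypothetical property name and default, not an actual YARN config key.
  static final String MAX_CONF_SIZE = "yarn.resourcemanager.app-tokens-conf.max-size-bytes";
  static final int DEFAULT_MAX_CONF_SIZE = 12 * 1024;

  public static void checkTokensConfSize(ByteBuffer tokensConf, Configuration rmConf)
      throws IOException {
    int limit = rmConf.getInt(MAX_CONF_SIZE, DEFAULT_MAX_CONF_SIZE);
    if (tokensConf != null && tokensConf.remaining() > limit) {
      throw new IOException("App-supplied tokens conf is " + tokensConf.remaining()
          + " bytes, exceeding the limit of " + limit + " bytes");
    }
  }
}
{code}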
[jira] [Updated] (YARN-5910) Support for multi-cluster delegation tokens
[ https://issues.apache.org/jira/browse/YARN-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-5910: -- Attachment: YARN-5910.4.patch fixed failed UT. testRMAppSubmitWithValidTokens is removed, because it's not actually being tested as expected as security is not enabled in the test scope, and the test scenario should already be covered in other place. > Support for multi-cluster delegation tokens > --- > > Key: YARN-5910 > URL: https://issues.apache.org/jira/browse/YARN-5910 > Project: Hadoop YARN > Issue Type: New Feature > Components: security >Reporter: Clay B. >Assignee: Jian He >Priority: Minor > Attachments: YARN-5910.01.patch, YARN-5910.2.patch, > YARN-5910.3.patch, YARN-5910.4.patch > > > As an administrator running many secure (kerberized) clusters, some which > have peer clusters managed by other teams, I am looking for a way to run jobs > which may require services running on other clusters. Particular cases where > this rears itself are running something as core as a distcp between two > kerberized clusters (e.g. {{hadoop --config /home/user292/conf/ distcp > hdfs://LOCALCLUSTER/user/user292/test.out > hdfs://REMOTECLUSTER/user/user292/test.out.result}}). > Thanks to YARN-3021, once can run for a while but if the delegation token for > the remote cluster needs renewal the job will fail[1]. One can pre-configure > their {{hdfs-site.xml}} loaded by the YARN RM to know of all possible HDFSes > available but that requires coordination that is not always feasible, > especially as a cluster's peers grow into the tens of clusters or across > management teams. Ideally, one could have core systems configured this way > but jobs could also specify their own handling of tokens and management when > needed? > [1]: Example stack trace when the RM is unaware of a remote service: > > {code} > 2016-03-23 14:59:50,528 INFO > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > application_1458441356031_3317 found existing hdfs token Kind: > HDFS_DELEGATION_TOKEN, Service: ha-hdfs:REMOTECLUSTER, Ident: > (HDFS_DELEGATION_TOKEN token > 10927 for user292) > 2016-03-23 14:59:50,557 WARN > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > Unable to add the application to the delegation token renewer. > java.io.IOException: Failed to renew token: Kind: HDFS_DELEGATION_TOKEN, > Service: ha-hdfs:REMOTECLUSTER, Ident: (HDFS_DELEGATION_TOKEN token 10927 for > user292) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:427) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$700(DelegationTokenRenewer.java:78) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:781) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:762) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:744) > Caused by: java.io.IOException: Unable to map logical nameservice URI > 'hdfs://REMOTECLUSTER' to a NameNode. Local configuration does not have a > failover proxy provider configured. 
> at org.apache.hadoop.hdfs.DFSClient$Renewer.getNNProxy(DFSClient.java:1164) > at org.apache.hadoop.hdfs.DFSClient$Renewer.renew(DFSClient.java:1128) > at org.apache.hadoop.security.token.Token.renew(Token.java:377) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:516) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:513) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.renewToken(DelegationTokenRenewer.java:511) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:425) > ... 6 more > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5910) Support for multi-cluster delegation tokens
[ https://issues.apache.org/jira/browse/YARN-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-5910: -- Attachment: YARN-5910.3.patch Fixed jenkins issues > Support for multi-cluster delegation tokens > --- > > Key: YARN-5910 > URL: https://issues.apache.org/jira/browse/YARN-5910 > Project: Hadoop YARN > Issue Type: New Feature > Components: security >Reporter: Clay B. >Assignee: Jian He >Priority: Minor > Attachments: YARN-5910.01.patch, YARN-5910.2.patch, YARN-5910.3.patch > > > As an administrator running many secure (kerberized) clusters, some which > have peer clusters managed by other teams, I am looking for a way to run jobs > which may require services running on other clusters. Particular cases where > this rears itself are running something as core as a distcp between two > kerberized clusters (e.g. {{hadoop --config /home/user292/conf/ distcp > hdfs://LOCALCLUSTER/user/user292/test.out > hdfs://REMOTECLUSTER/user/user292/test.out.result}}). > Thanks to YARN-3021, once can run for a while but if the delegation token for > the remote cluster needs renewal the job will fail[1]. One can pre-configure > their {{hdfs-site.xml}} loaded by the YARN RM to know of all possible HDFSes > available but that requires coordination that is not always feasible, > especially as a cluster's peers grow into the tens of clusters or across > management teams. Ideally, one could have core systems configured this way > but jobs could also specify their own handling of tokens and management when > needed? > [1]: Example stack trace when the RM is unaware of a remote service: > > {code} > 2016-03-23 14:59:50,528 INFO > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > application_1458441356031_3317 found existing hdfs token Kind: > HDFS_DELEGATION_TOKEN, Service: ha-hdfs:REMOTECLUSTER, Ident: > (HDFS_DELEGATION_TOKEN token > 10927 for user292) > 2016-03-23 14:59:50,557 WARN > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > Unable to add the application to the delegation token renewer. > java.io.IOException: Failed to renew token: Kind: HDFS_DELEGATION_TOKEN, > Service: ha-hdfs:REMOTECLUSTER, Ident: (HDFS_DELEGATION_TOKEN token 10927 for > user292) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:427) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$700(DelegationTokenRenewer.java:78) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:781) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:762) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:744) > Caused by: java.io.IOException: Unable to map logical nameservice URI > 'hdfs://REMOTECLUSTER' to a NameNode. Local configuration does not have a > failover proxy provider configured. 
> at org.apache.hadoop.hdfs.DFSClient$Renewer.getNNProxy(DFSClient.java:1164) > at org.apache.hadoop.hdfs.DFSClient$Renewer.renew(DFSClient.java:1128) > at org.apache.hadoop.security.token.Token.renew(Token.java:377) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:516) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:513) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.renewToken(DelegationTokenRenewer.java:511) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:425) > ... 6 more > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5910) Support for multi-cluster delegation tokens
[ https://issues.apache.org/jira/browse/YARN-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-5910: -- Attachment: YARN-5910.2.patch Updated the patch with minor changes. > Support for multi-cluster delegation tokens > --- > > Key: YARN-5910 > URL: https://issues.apache.org/jira/browse/YARN-5910 > Project: Hadoop YARN > Issue Type: New Feature > Components: security >Reporter: Clay B. >Assignee: Jian He >Priority: Minor > Attachments: YARN-5910.01.patch, YARN-5910.2.patch > > > As an administrator running many secure (kerberized) clusters, some which > have peer clusters managed by other teams, I am looking for a way to run jobs > which may require services running on other clusters. Particular cases where > this rears itself are running something as core as a distcp between two > kerberized clusters (e.g. {{hadoop --config /home/user292/conf/ distcp > hdfs://LOCALCLUSTER/user/user292/test.out > hdfs://REMOTECLUSTER/user/user292/test.out.result}}). > Thanks to YARN-3021, once can run for a while but if the delegation token for > the remote cluster needs renewal the job will fail[1]. One can pre-configure > their {{hdfs-site.xml}} loaded by the YARN RM to know of all possible HDFSes > available but that requires coordination that is not always feasible, > especially as a cluster's peers grow into the tens of clusters or across > management teams. Ideally, one could have core systems configured this way > but jobs could also specify their own handling of tokens and management when > needed? > [1]: Example stack trace when the RM is unaware of a remote service: > > {code} > 2016-03-23 14:59:50,528 INFO > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > application_1458441356031_3317 found existing hdfs token Kind: > HDFS_DELEGATION_TOKEN, Service: ha-hdfs:REMOTECLUSTER, Ident: > (HDFS_DELEGATION_TOKEN token > 10927 for user292) > 2016-03-23 14:59:50,557 WARN > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > Unable to add the application to the delegation token renewer. > java.io.IOException: Failed to renew token: Kind: HDFS_DELEGATION_TOKEN, > Service: ha-hdfs:REMOTECLUSTER, Ident: (HDFS_DELEGATION_TOKEN token 10927 for > user292) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:427) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$700(DelegationTokenRenewer.java:78) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:781) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:762) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:744) > Caused by: java.io.IOException: Unable to map logical nameservice URI > 'hdfs://REMOTECLUSTER' to a NameNode. Local configuration does not have a > failover proxy provider configured. 
> at org.apache.hadoop.hdfs.DFSClient$Renewer.getNNProxy(DFSClient.java:1164) > at org.apache.hadoop.hdfs.DFSClient$Renewer.renew(DFSClient.java:1128) > at org.apache.hadoop.security.token.Token.renew(Token.java:377) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:516) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:513) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.renewToken(DelegationTokenRenewer.java:511) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:425) > ... 6 more > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6016) Bugs in AMRMProxy handling (local)AMRMToken
[ https://issues.apache.org/jira/browse/YARN-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15822667#comment-15822667 ] Jian He commented on YARN-6016: --- lgtm too, thanks > Bugs in AMRMProxy handling (local)AMRMToken > --- > > Key: YARN-6016 > URL: https://issues.apache.org/jira/browse/YARN-6016 > Project: Hadoop YARN > Issue Type: Bug > Components: federation >Reporter: Botong Huang >Assignee: Botong Huang >Priority: Minor > Attachments: YARN-6016.v1.patch, YARN-6016.v2.patch, > YARN-6016.v3.patch > > > Two AMRMProxy bugs: > First, the AMRMToken from RM should not be propagated to AM, since AMRMProxy > will create a local AMRMToken for it. > Second, the AMRMProxy Context is now parse the localAMRMTokenKeyId from > amrmToken, but should be from localAmrmToken. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6072) RM unable to start in secure mode
[ https://issues.apache.org/jira/browse/YARN-6072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15819587#comment-15819587 ] Jian He commented on YARN-6072: --- looks good to me, +1 > RM unable to start in secure mode > - > > Key: YARN-6072 > URL: https://issues.apache.org/jira/browse/YARN-6072 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0, 3.0.0-alpha2 >Reporter: Bibin A Chundatt >Assignee: Ajith S >Priority: Blocker > Attachments: YARN-6072.01.branch-2.8.patch, > YARN-6072.01.branch-2.patch, YARN-6072.01.patch, YARN-6072.02.patch, > YARN-6072.03.branch-2.8.patch, YARN-6072.03.patch, > hadoop-secureuser-resourcemanager-vm1.log > > > Resource manager is unable to start in secure mode > {code} > 2017-01-08 14:27:29,917 INFO org.apache.hadoop.conf.Configuration: found > resource hadoop-policy.xml at > file:/opt/hadoop/release/hadoop-3.0.0-alpha2-SNAPSHOT/etc/hadoop/hadoop-policy.xml > 2017-01-08 14:27:29,918 INFO > org.apache.hadoop.yarn.server.resourcemanager.AdminService: Refresh All > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:569) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:552) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:707) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > 2017-01-08 14:27:29,919 ERROR > org.apache.hadoop.yarn.server.resourcemanager.AdminService: RefreshAll failed > so firing fatal event > org.apache.hadoop.ha.ServiceFailedException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > 2017-01-08 14:27:29,920 INFO org.apache.hadoop.ipc.Server: Starting Socket > Reader #1 for port 8033 > 2017-01-08 14:27:29,948 WARN org.apache.hadoop.ha.ActiveStandbyElector: > Exception handling the winning of election > org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > 
org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error on refreshAll > during transition to Active > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:311) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > ... 4 more > Caused by: org.apache.hadoop.ha.ServiceFailedException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > ... 5 more > {code} > ResourceManager services are added in following order > # EmbeddedElector > # AdminService > During resource manager service start() .EmbeddedElector starts first and > invokes {
[jira] [Commented] (YARN-5995) Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition performance
[ https://issues.apache.org/jira/browse/YARN-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15816388#comment-15816388 ] Jian He commented on YARN-5995: --- How about starting with the below: - Time cost of write ops - MutableRate (which contains the total number of ops and avg time) - Total failed ops - MutableCounterLong > Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition > performance > --- > > Key: YARN-5995 > URL: https://issues.apache.org/jira/browse/YARN-5995 > Project: Hadoop YARN > Issue Type: Improvement > Components: metrics, resourcemanager >Affects Versions: 2.7.1 > Environment: CentOS7.2 Hadoop-2.7.1 >Reporter: zhangyubiao >Assignee: zhangyubiao > Labels: patch > Attachments: YARN-5995.0001.patch, YARN-5995.0002.patch, > YARN-5995.patch > > > Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition > performance -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
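A rough sketch of that shape using Hadoop's metrics2 library could look like the following; the class name, metric names, and registration key are illustrative only, not something the eventual patch has to use.
{code}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
import org.apache.hadoop.metrics2.lib.MutableCounterLong;
import org.apache.hadoop.metrics2.lib.MutableRate;

// Illustrative metrics source: MutableRate tracks op count and average time,
// MutableCounterLong tracks total failures.
@Metrics(about = "RM state-store operation metrics", context = "yarn")
public class RMStateStoreOpMetricsSketch {
  @Metric("Time cost of state-store write ops") MutableRate writeStateStoreOp;
  @Metric("Total failed state-store ops") MutableCounterLong failedStateStoreOps;

  public static RMStateStoreOpMetricsSketch create() {
    // Register the source with the default metrics system so sinks can pick it up.
    return DefaultMetricsSystem.instance()
        .register("RMStateStoreOpMetricsSketch", "RM state-store op metrics",
            new RMStateStoreOpMetricsSketch());
  }

  public void recordWrite(long elapsedMillis) {
    writeStateStoreOp.add(elapsedMillis);
  }

  public void recordFailure() {
    failedStateStoreOps.incr();
  }
}
{code}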
[jira] [Commented] (YARN-6072) RM unable to start in secure mode
[ https://issues.apache.org/jira/browse/YARN-6072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15816109#comment-15816109 ] Jian He commented on YARN-6072: --- bq. // Set HA configuration should be done before login I don't know why this comment is added. In my understanding, it should at least be fine to move "add admin service" before "add elector service". bq. Hmm yes but additionally we get the log trace too, Yes, I know. I meant it can be such as: new ServiceFailedException("RefreshAll operation failed ", ex); Anyway, based on your explanation, the current patch is also fine to me. these comments are minor. > RM unable to start in secure mode > - > > Key: YARN-6072 > URL: https://issues.apache.org/jira/browse/YARN-6072 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0, 3.0.0-alpha2 >Reporter: Bibin A Chundatt >Assignee: Ajith S >Priority: Blocker > Attachments: YARN-6072.01.branch-2.8.patch, > YARN-6072.01.branch-2.patch, YARN-6072.01.patch, YARN-6072.02.patch, > hadoop-secureuser-resourcemanager-vm1.log > > > Resource manager is unable to start in secure mode > {code} > 2017-01-08 14:27:29,917 INFO org.apache.hadoop.conf.Configuration: found > resource hadoop-policy.xml at > file:/opt/hadoop/release/hadoop-3.0.0-alpha2-SNAPSHOT/etc/hadoop/hadoop-policy.xml > 2017-01-08 14:27:29,918 INFO > org.apache.hadoop.yarn.server.resourcemanager.AdminService: Refresh All > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:569) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:552) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:707) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > 2017-01-08 14:27:29,919 ERROR > org.apache.hadoop.yarn.server.resourcemanager.AdminService: RefreshAll failed > so firing fatal event > org.apache.hadoop.ha.ServiceFailedException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > 2017-01-08 14:27:29,920 INFO org.apache.hadoop.ipc.Server: Starting Socket > Reader #1 for port 8033 > 2017-01-08 14:27:29,948 WARN org.apache.hadoop.ha.ActiveStandbyElector: > Exception handling the winning of election > 
org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error on refreshAll > during transition to Active > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:311) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > ... 4 more > Caused by: org.apache.hadoop.ha.ServiceFailedException > at > org.apache.hadoop.yarn.server.r
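The wrapping suggested there, keeping the cause for the stack trace while giving the wrapper a message that names the failing operation instead of duplicating ex.getMessage(), would look roughly like this; the message text is only an example.
{code}
import org.apache.hadoop.ha.ServiceFailedException;

// Illustrative sketch of the exception-wrapping suggestion above.
public class RefreshAllWrapSketch {
  void transitionToActive() throws ServiceFailedException {
    try {
      refreshAll();
    } catch (Exception ex) {
      // Cause is preserved, message describes the current operation.
      throw new ServiceFailedException("RefreshAll operation failed", ex);
    }
  }

  void refreshAll() throws Exception {
    // placeholder standing in for AdminService#refreshAll
  }
}
{code}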
[jira] [Commented] (YARN-6072) RM unable to start in secure mode
[ https://issues.apache.org/jira/browse/YARN-6072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15815842#comment-15815842 ] Jian He commented on YARN-6072: --- - If HA is not enabled, this call will be adding 'null' elector ? I think we can either move the entire elector creation code after add admin service, or move add admin service before adding elector. {code} // elector to be added post adminservice addIfService(elector); {code} - I think, the ex.getMessage will just be duplicated in the log trace ? In addition to add the ex variable, may be replace ex.getMessage() with a more meaningful message for current call only {code} throw new ServiceFailedException(ex.getMessage(), ex); {code} > RM unable to start in secure mode > - > > Key: YARN-6072 > URL: https://issues.apache.org/jira/browse/YARN-6072 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0, 3.0.0-alpha2 >Reporter: Bibin A Chundatt >Assignee: Ajith S >Priority: Blocker > Attachments: YARN-6072.01.branch-2.8.patch, > YARN-6072.01.branch-2.patch, YARN-6072.01.patch, YARN-6072.02.patch, > hadoop-secureuser-resourcemanager-vm1.log > > > Resource manager is unable to start in secure mode > {code} > 2017-01-08 14:27:29,917 INFO org.apache.hadoop.conf.Configuration: found > resource hadoop-policy.xml at > file:/opt/hadoop/release/hadoop-3.0.0-alpha2-SNAPSHOT/etc/hadoop/hadoop-policy.xml > 2017-01-08 14:27:29,918 INFO > org.apache.hadoop.yarn.server.resourcemanager.AdminService: Refresh All > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:569) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:552) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:707) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > 2017-01-08 14:27:29,919 ERROR > org.apache.hadoop.yarn.server.resourcemanager.AdminService: RefreshAll failed > so firing fatal event > org.apache.hadoop.ha.ServiceFailedException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > 2017-01-08 14:27:29,920 INFO org.apache.hadoop.ipc.Server: Starting Socket > Reader #1 for port 8033 > 2017-01-08 14:27:29,948 WARN 
org.apache.hadoop.ha.ActiveStandbyElector: > Exception handling the winning of election > org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error on refreshAll > during transition to Active > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:311) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > ... 4 more > Caused b
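To make the two suggestions above concrete, here is a rough sketch of what the reordered initialization and the rewrapped exception could look like. This is only an illustration, not the attached patch: the surrounding serviceInit structure and the names createAdminService, createEmbeddedElector and setLeaderElectorService are assumptions based on the discussion, and the exact message text is arbitrary.
{code}
// Sketch: create and add the elector only when HA is enabled, and only after
// AdminService has been added, so a null elector is never registered and the
// admin service exists by the time the elector wins the election.
adminService = createAdminService();
addService(adminService);
if (this.rmContext.isHAEnabled()) {
  elector = createEmbeddedElector();
  addIfService(elector);
  rmContext.setLeaderElectorService(elector);
}

// Sketch: keep the original exception as the cause, but describe this call
// instead of duplicating ex.getMessage() in the log trace.
throw new ServiceFailedException("RefreshAll failed while transitioning to Active", ex);
{code}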
[jira] [Commented] (YARN-5995) Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition performance
[ https://issues.apache.org/jira/browse/YARN-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813903#comment-15813903 ] Jian He commented on YARN-5995: --- sorry, I meant that an external tool such as the Ambari Metrics Server can store these metrics, as long as the RM emits them in a suitable form. I don't actually mean to store these metrics in this jira. > Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition > performance > --- > > Key: YARN-5995 > URL: https://issues.apache.org/jira/browse/YARN-5995 > Project: Hadoop YARN > Issue Type: Improvement > Components: metrics, resourcemanager >Affects Versions: 2.7.1 > Environment: CentOS7.2 Hadoop-2.7.1 >Reporter: zhangyubiao >Assignee: zhangyubiao > Labels: patch > Attachments: YARN-5995.0001.patch, YARN-5995.0002.patch, > YARN-5995.patch > > > Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition > performance -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6072) RM unable to start in secure mode
[ https://issues.apache.org/jira/browse/YARN-6072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15812903#comment-15812903 ] Jian He commented on YARN-6072: --- YARN-5709 actually affected the sequence of start. Before YARN-5709, ActiveStandbyElector is created inside AdminService, so it is guaranteed that the server variable is instantiated before ActiveStandbyElector is started. After YARN-5709, this is not the case any more. > RM unable to start in secure mode > - > > Key: YARN-6072 > URL: https://issues.apache.org/jira/browse/YARN-6072 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0, 3.0.0-alpha2 >Reporter: Bibin A Chundatt >Assignee: Ajith S >Priority: Blocker > Attachments: hadoop-secureuser-resourcemanager-vm1.log > > > Resource manager is unable to start in secure mode > {code} > 2017-01-08 14:27:29,917 INFO org.apache.hadoop.conf.Configuration: found > resource hadoop-policy.xml at > file:/opt/hadoop/release/hadoop-3.0.0-alpha2-SNAPSHOT/etc/hadoop/hadoop-policy.xml > 2017-01-08 14:27:29,918 INFO > org.apache.hadoop.yarn.server.resourcemanager.AdminService: Refresh All > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:569) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:552) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:707) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > 2017-01-08 14:27:29,919 ERROR > org.apache.hadoop.yarn.server.resourcemanager.AdminService: RefreshAll failed > so firing fatal event > org.apache.hadoop.ha.ServiceFailedException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > 2017-01-08 14:27:29,920 INFO org.apache.hadoop.ipc.Server: Starting Socket > Reader #1 for port 8033 > 2017-01-08 14:27:29,948 WARN org.apache.hadoop.ha.ActiveStandbyElector: > Exception handling the winning of election > org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144) > at > 
org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error on refreshAll > during transition to Active > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:311) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > ... 4 more > Caused by: org.apache.hadoop.ha.ServiceFailedException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > ... 5 more > {code} > ResourceManager services are added in following order > # EmbeddedElector > # AdminServ
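The start-ordering point is easy to see with a toy CompositeService, since child services start in the order they were added. The class below is purely illustrative (StartOrderDemo and Child are made-up names, not RM code): with the elector added first, it can win the election and call back into an AdminService whose RPC server does not exist yet, which matches the NPE above.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.service.AbstractService;
import org.apache.hadoop.service.CompositeService;

// Child services of a CompositeService start in the order they were added.
public class StartOrderDemo extends CompositeService {
  private static class Child extends AbstractService {
    Child(String name) { super(name); }
    @Override
    protected void serviceStart() {
      System.out.println("started " + getName());
    }
  }

  public StartOrderDemo() {
    super("StartOrderDemo");
    addService(new Child("elector"));       // added first  -> started first
    addService(new Child("admin-service")); // added second -> started second
  }

  public static void main(String[] args) {
    StartOrderDemo demo = new StartOrderDemo();
    demo.init(new Configuration());
    demo.start(); // prints "started elector" before "started admin-service"
    demo.stop();
  }
}
{code}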
[jira] [Commented] (YARN-5995) Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition performance
[ https://issues.apache.org/jira/browse/YARN-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15812676#comment-15812676 ] Jian He commented on YARN-5995: --- Actually, this would be most useful as a time-series metric that an external framework can use to show the values over time - to show when the RM incurs high write latencies, since we always do postmortem analysis. If so, we only need to output the absolute value of 'time cost for each store op' or 'amount of data written for each op'; an external tool can then use these metrics to plot them over time. > Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition > performance > --- > > Key: YARN-5995 > URL: https://issues.apache.org/jira/browse/YARN-5995 > Project: Hadoop YARN > Issue Type: Improvement > Components: metrics, resourcemanager >Affects Versions: 2.7.1 > Environment: CentOS7.2 Hadoop-2.7.1 >Reporter: zhangyubiao >Assignee: zhangyubiao > Labels: patch > Attachments: YARN-5995.0001.patch, YARN-5995.0002.patch, > YARN-5995.patch > > > Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition > performance -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5995) Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition performance
[ https://issues.apache.org/jira/browse/YARN-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15812620#comment-15812620 ] Jian He commented on YARN-5995: --- As said earlier, read metrics won't be that useful because reads only happen on RM startup to load the data. It's a one-time value which does not require metrics. IMO, we need to think about how the metrics can actually be used for performance analysis, that is, how much the store operation can affect RM's execution, i.e. how much delay it can incur. Metrics like data written per second look more like a measure of ZK throughput, which may not be that useful. I think what we need is to surface the time spent on each write operation. With that in mind, we may have 1) a histogram of the time spent for each write op? 2) the total number of write operations > Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition > performance > --- > > Key: YARN-5995 > URL: https://issues.apache.org/jira/browse/YARN-5995 > Project: Hadoop YARN > Issue Type: Improvement > Components: metrics, resourcemanager >Affects Versions: 2.7.1 > Environment: CentOS7.2 Hadoop-2.7.1 >Reporter: zhangyubiao >Assignee: zhangyubiao > Labels: patch > Attachments: YARN-5995.0001.patch, YARN-5995.0002.patch, > YARN-5995.patch > > > Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition > performance -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
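As a sketch of what such metrics could look like with the hadoop-common metrics2 annotations (the class and metric names below are made up for illustration and are not from the attached patches): a MutableRate already carries both the number of ops and the average time per op, and a failed-op counter can sit next to it.
{code}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
import org.apache.hadoop.metrics2.lib.MutableCounterLong;
import org.apache.hadoop.metrics2.lib.MutableRate;
import org.apache.hadoop.util.Time;

// Illustrative only: class and metric names are made up, not from the patches.
@Metrics(context = "yarn")
public class RMStateStoreOpMetrics {
  // number of ops and average time per write op; a quantiles/histogram metric
  // could be added in the same style if percentiles are wanted
  @Metric("Latency of state store write ops")
  MutableRate writeLatency;

  @Metric("Number of failed state store write ops")
  MutableCounterLong numFailedWriteOps;

  public static RMStateStoreOpMetrics create() {
    return DefaultMetricsSystem.instance().register("RMStateStoreOpMetrics",
        "RM state store operation metrics", new RMStateStoreOpMetrics());
  }

  // wrap a single store write and record how long it took
  public void recordWrite(Runnable storeOp) {
    long start = Time.monotonicNow();
    try {
      storeOp.run();
    } catch (RuntimeException e) {
      // failed ops are counted but not included in the latency average
      numFailedWriteOps.incr();
      throw e;
    }
    writeLatency.add(Time.monotonicNow() - start);
  }
}
{code}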
[jira] [Commented] (YARN-6009) RM fails to start during an upgrade - Failed to load/recover state (YarnException: Invalid application timeout, value=0 for type=LIFETIME)
[ https://issues.apache.org/jira/browse/YARN-6009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15805302#comment-15805302 ] Jian He commented on YARN-6009: --- makes sense to me, I'll commit later today if no more comments > RM fails to start during an upgrade - Failed to load/recover state > (YarnException: Invalid application timeout, value=0 for type=LIFETIME) > -- > > Key: YARN-6009 > URL: https://issues.apache.org/jira/browse/YARN-6009 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Gour Saha >Assignee: Rohith Sharma K S >Priority: Critical > Attachments: YARN-6009.01.patch > > > ResourceManager fails to start during an upgrade with the following > exceptions - > Exception 1: > {color:red} > {code} > 2016-12-09 14:57:23,508 INFO capacity.CapacityScheduler > (CapacityScheduler.java:initScheduler(328)) - Initialized CapacityScheduler > with calculator=class > org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator, > minimumAllocation=<>, > maximumAllocation=<>, asynchronousScheduling=false, > asyncScheduleInterval=5ms > 2016-12-09 14:57:23,509 WARN ha.ActiveStandbyElector > (ActiveStandbyElector.java:becomeActive(863)) - Exception handling the > winning of election > org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active > at > org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:129) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:859) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:463) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:611) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510) > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when > transitioning to Active mode > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:318) > at > org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:127) > ... 4 more > Caused by: org.apache.hadoop.service.ServiceStateException: > org.apache.hadoop.yarn.exceptions.YarnException: Invalid application timeout, > value=0 for type=LIFETIME > at > org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:204) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:991) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1032) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1028) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1028) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:313) > ... 
5 more > Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Invalid > application timeout, value=0 for type=LIFETIME > at > org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.validateApplicationTimeouts(RMServerUtils.java:305) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:365) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:330) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:463) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1184) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:594) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > ... 13 more > {code} > {color} > Exception 2: > {color:red} > {code} > 2016-12-09 14:57:26,162 INFO rmapp.RMAppImpl (RMAppImpl.java:handle(790)) - > application_1477927786494_0008 State change from NEW to FINISHED > 2016-12-09 14:57:26,162 ERROR resourcemanager.ResourceManager > (ResourceManager.java:service
[jira] [Commented] (YARN-4348) ZKRMStateStore.syncInternal shouldn't wait for sync completion for avoiding blocking ZK's event thread
[ https://issues.apache.org/jira/browse/YARN-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15802518#comment-15802518 ] Jian He commented on YARN-4348: --- No, it doesn't need to. The ZK store implementation has been changed to use Curator, 2.8 upwards > ZKRMStateStore.syncInternal shouldn't wait for sync completion for avoiding > blocking ZK's event thread > -- > > Key: YARN-4348 > URL: https://issues.apache.org/jira/browse/YARN-4348 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.2, 2.6.2 >Reporter: Tsuyoshi Ozawa >Assignee: Tsuyoshi Ozawa >Priority: Blocker > Fix For: 2.7.2, 2.6.3 > > Attachments: YARN-4348-branch-2.7.002.patch, > YARN-4348-branch-2.7.003.patch, YARN-4348-branch-2.7.004.patch, > YARN-4348.001.patch, YARN-4348.001.patch, log.txt > > > Jian mentioned that the current internal ZK configuration of ZKRMStateStore > can cause a following situation: > 1. syncInternal timeouts, > 2. but sync succeeded later on. > We should use zkResyncWaitTime as the timeout value. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
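For context, a minimal example of the Curator usage pattern the store moved to (illustrative only; the connection string, retry values and paths are made up, and this is not the ZKRMStateStore code). Curator owns the connection and retry handling, so the store does not need to block the ZK event thread waiting on a manual sync.
{code}
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.RetryNTimes;

public class CuratorStoreSketch {
  public static void main(String[] args) throws Exception {
    // Curator handles connection loss and retries internally
    CuratorFramework client = CuratorFrameworkFactory.newClient(
        "localhost:2181", new RetryNTimes(3, 1000));
    client.start();
    // each operation is retried per the policy; no manual sync/resync wait needed
    client.create().creatingParentsIfNeeded()
        .forPath("/rmstore-sketch/app_0001", new byte[0]);
    client.close();
  }
}
{code}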
[jira] [Commented] (YARN-4164) Retrospect update ApplicationPriority API return type
[ https://issues.apache.org/jira/browse/YARN-4164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15799750#comment-15799750 ] Jian He commented on YARN-4164: --- merged the patch to 2.8 too > Retrospect update ApplicationPriority API return type > - > > Key: YARN-4164 > URL: https://issues.apache.org/jira/browse/YARN-4164 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S > Fix For: 2.8.0, 2.9.0, 3.0.0-alpha1 > > Attachments: 0001-YARN-4164.patch, 0002-YARN-4164.patch, > 0003-YARN-4164.patch, 0004-YARN-4164.patch > > > Currently {{ApplicationClientProtocol#updateApplicationPriority()}} API > returns empty UpdateApplicationPriorityResponse response. > But RM update priority to the cluster.max-priority if the given priority is > greater than cluster.max-priority. In this scenarios, need to intimate back > to client that updated priority rather just keeping quite where client > assumes that given priority itself is taken. > During application submission also has same scenario can happen, but I feel > when > explicitly invoke via ApplicationClientProtocol#updateApplicationPriority(), > response should have updated priority in response. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-4164) Retrospect update ApplicationPriority API return type
[ https://issues.apache.org/jira/browse/YARN-4164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-4164: -- Fix Version/s: 2.8.0 > Retrospect update ApplicationPriority API return type > - > > Key: YARN-4164 > URL: https://issues.apache.org/jira/browse/YARN-4164 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S > Fix For: 2.8.0, 2.9.0, 3.0.0-alpha1 > > Attachments: 0001-YARN-4164.patch, 0002-YARN-4164.patch, > 0003-YARN-4164.patch, 0004-YARN-4164.patch > > > Currently {{ApplicationClientProtocol#updateApplicationPriority()}} API > returns empty UpdateApplicationPriorityResponse response. > But RM update priority to the cluster.max-priority if the given priority is > greater than cluster.max-priority. In this scenarios, need to intimate back > to client that updated priority rather just keeping quite where client > assumes that given priority itself is taken. > During application submission also has same scenario can happen, but I feel > when > explicitly invoke via ApplicationClientProtocol#updateApplicationPriority(), > response should have updated priority in response. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6009) RM fails to start during an upgrade - Failed to load/recover state (YarnException: Invalid application timeout, value=0 for type=LIFETIME)
[ https://issues.apache.org/jira/browse/YARN-6009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15798965#comment-15798965 ] Jian He commented on YARN-6009: --- patch looks good to me. The UT failure passed locally for me. Retry the jenkins > RM fails to start during an upgrade - Failed to load/recover state > (YarnException: Invalid application timeout, value=0 for type=LIFETIME) > -- > > Key: YARN-6009 > URL: https://issues.apache.org/jira/browse/YARN-6009 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Gour Saha >Assignee: Rohith Sharma K S >Priority: Critical > Attachments: YARN-6009.01.patch > > > ResourceManager fails to start during an upgrade with the following > exceptions - > Exception 1: > {color:red} > {code} > 2016-12-09 14:57:23,508 INFO capacity.CapacityScheduler > (CapacityScheduler.java:initScheduler(328)) - Initialized CapacityScheduler > with calculator=class > org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator, > minimumAllocation=<>, > maximumAllocation=<>, asynchronousScheduling=false, > asyncScheduleInterval=5ms > 2016-12-09 14:57:23,509 WARN ha.ActiveStandbyElector > (ActiveStandbyElector.java:becomeActive(863)) - Exception handling the > winning of election > org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active > at > org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:129) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:859) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:463) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:611) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510) > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when > transitioning to Active mode > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:318) > at > org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:127) > ... 4 more > Caused by: org.apache.hadoop.service.ServiceStateException: > org.apache.hadoop.yarn.exceptions.YarnException: Invalid application timeout, > value=0 for type=LIFETIME > at > org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:204) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:991) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1032) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1028) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1028) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:313) > ... 
5 more > Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Invalid > application timeout, value=0 for type=LIFETIME > at > org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.validateApplicationTimeouts(RMServerUtils.java:305) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:365) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:330) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:463) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1184) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:594) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > ... 13 more > {code} > {color} > Exception 2: > {color:red} > {code} > 2016-12-09 14:57:26,162 INFO rmapp.RMAppImpl (RMAppImpl.java:handle(790)) - > application_1477927786494_0008 State change from NEW to FINISHED > 2016-12-09 14:57:26,162 ERROR resourcemanager.ResourceManager > (ResourceMan
[jira] [Commented] (YARN-5709) Cleanup leader election configs and pluggability
[ https://issues.apache.org/jira/browse/YARN-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15786313#comment-15786313 ] Jian He commented on YARN-5709: --- ok, looks like so, I'll commit this then > Cleanup leader election configs and pluggability > > > Key: YARN-5709 > URL: https://issues.apache.org/jira/browse/YARN-5709 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.8.0 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla >Priority: Critical > Fix For: 2.9.0, 3.0.0-alpha2 > > Attachments: yarn-5709-branch-2.8.01.patch, > yarn-5709-branch-2.8.02.patch, yarn-5709-branch-2.8.03.patch, > yarn-5709-branch-2.8.patch, yarn-5709-wip.2.patch, yarn-5709.1.patch, > yarn-5709.2.patch, yarn-5709.3.patch, yarn-5709.4.patch > > > While reviewing YARN-5677 and YARN-5694, I noticed we could make the > curator-based election code cleaner. It is nicer to get this fixed in 2.8 > before we ship it, but this can be done at a later time as well. > # By EmbeddedElector, we meant it was running as part of the RM daemon. Since > the Curator-based elector is also running embedded, I feel the code should be > checking for {{!curatorBased}} instead of {{isEmbeddedElector}} > # {{LeaderElectorService}} should probably be named > {{CuratorBasedEmbeddedElectorService}} or some such. > # The code that initializes the elector should be at the same place > irrespective of whether it is curator-based or not. > # We seem to be caching the CuratorFramework instance in RM. It makes more > sense for it to be in RMContext. If others are okay with it, we might even be > better of having {{RMContext#getCurator()}} method to lazily create the > curator framework and then cache it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5658) YARN should have a hook to delete a path from HDFS when an application ends
[ https://issues.apache.org/jira/browse/YARN-5658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781518#comment-15781518 ] Jian He commented on YARN-5658: --- [~templedf], not just HDFS, allowing deleting a path from ZK is also a required use-case for the yarn-service-registry, so the implementation should be somewhat generic. I think an option to clean up a path is useful. One approach in my mind is to leverage the getApplicationsToCleanup signal sent in the node heartbeat when the application finally completes, after which the NM where the AM container ran could do the post-completion cleanup. The difference from YARN-2261 is that instead of running in a separate container, it could be run from the NodeManager. And this approach does not require significant code changes in the application. YARN-2261 could still be used for more advanced use-cases which the AM requires. The problem with this approach is that if the NM crashes, the files may not get cleaned up, though YARN-2261 has the same problem. For simplicity, maybe we can allow this to occur, warn the user in the UI that the cleanup did not complete successfully, and ask the user to do it manually. thoughts? > YARN should have a hook to delete a path from HDFS when an application ends > --- > > Key: YARN-5658 > URL: https://issues.apache.org/jira/browse/YARN-5658 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Daniel Templeton >Assignee: Daniel Templeton > > There are many cases when a client uploads data to HDFS and then needs to > subsequently clean it up, such as with the distributed cache. It would be > helpful if YARN would do that cleanup automatically on job completion. > The hook could be generic to an URI supported by {{FileSystem}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
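A purely illustrative sketch of the NM-side cleanup idea (the class and method names are hypothetical, not existing YARN code). It only covers FileSystem-backed URIs; a ZK/service-registry path would need a separate handler, which is why the interface should stay generic. On failure it just logs, matching the "warn the user and let them clean up manually" fallback above.
{code}
import java.io.IOException;
import java.util.List;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.yarn.api.records.ApplicationId;

// Hypothetical NM-side helper: invoked when the RM reports the application as
// finished (the getApplicationsToCleanup signal in the heartbeat response).
public class AppPathCleanup {
  private static final Log LOG = LogFactory.getLog(AppPathCleanup.class);
  private final Configuration conf;

  public AppPathCleanup(Configuration conf) {
    this.conf = conf;
  }

  // pathsToDelete would come from the app's submission context in the real design
  public void onApplicationCompleted(ApplicationId appId, List<Path> pathsToDelete) {
    for (Path p : pathsToDelete) {
      try {
        FileSystem fs = p.getFileSystem(conf);
        if (!fs.delete(p, true)) {
          LOG.warn("Cleanup of " + p + " for " + appId + " did not delete anything");
        }
      } catch (IOException e) {
        // if the delete fails (or the NM dies), surface it so the user can clean up manually
        LOG.warn("Failed to clean up " + p + " for " + appId, e);
      }
    }
  }
}
{code}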
[jira] [Commented] (YARN-5709) Cleanup leader election configs and pluggability
[ https://issues.apache.org/jira/browse/YARN-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781487#comment-15781487 ] Jian He commented on YARN-5709: --- Could you re-submit the patch with your change and retry ? > Cleanup leader election configs and pluggability > > > Key: YARN-5709 > URL: https://issues.apache.org/jira/browse/YARN-5709 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.8.0 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla >Priority: Critical > Fix For: 2.9.0, 3.0.0-alpha2 > > Attachments: yarn-5709-branch-2.8.01.patch, > yarn-5709-branch-2.8.02.patch, yarn-5709-branch-2.8.patch, > yarn-5709-wip.2.patch, yarn-5709.1.patch, yarn-5709.2.patch, > yarn-5709.3.patch, yarn-5709.4.patch > > > While reviewing YARN-5677 and YARN-5694, I noticed we could make the > curator-based election code cleaner. It is nicer to get this fixed in 2.8 > before we ship it, but this can be done at a later time as well. > # By EmbeddedElector, we meant it was running as part of the RM daemon. Since > the Curator-based elector is also running embedded, I feel the code should be > checking for {{!curatorBased}} instead of {{isEmbeddedElector}} > # {{LeaderElectorService}} should probably be named > {{CuratorBasedEmbeddedElectorService}} or some such. > # The code that initializes the elector should be at the same place > irrespective of whether it is curator-based or not. > # We seem to be caching the CuratorFramework instance in RM. It makes more > sense for it to be in RMContext. If others are okay with it, we might even be > better of having {{RMContext#getCurator()}} method to lazily create the > curator framework and then cache it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5709) Cleanup leader election configs and pluggability
[ https://issues.apache.org/jira/browse/YARN-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781367#comment-15781367 ] Jian He commented on YARN-5709: --- I'm not sure which part of the patch is causing javadoc failure, [~templedf], [~kasha], any clue? > Cleanup leader election configs and pluggability > > > Key: YARN-5709 > URL: https://issues.apache.org/jira/browse/YARN-5709 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.8.0 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla >Priority: Critical > Fix For: 2.9.0, 3.0.0-alpha2 > > Attachments: yarn-5709-branch-2.8.01.patch, > yarn-5709-branch-2.8.02.patch, yarn-5709-branch-2.8.patch, > yarn-5709-wip.2.patch, yarn-5709.1.patch, yarn-5709.2.patch, > yarn-5709.3.patch, yarn-5709.4.patch > > > While reviewing YARN-5677 and YARN-5694, I noticed we could make the > curator-based election code cleaner. It is nicer to get this fixed in 2.8 > before we ship it, but this can be done at a later time as well. > # By EmbeddedElector, we meant it was running as part of the RM daemon. Since > the Curator-based elector is also running embedded, I feel the code should be > checking for {{!curatorBased}} instead of {{isEmbeddedElector}} > # {{LeaderElectorService}} should probably be named > {{CuratorBasedEmbeddedElectorService}} or some such. > # The code that initializes the elector should be at the same place > irrespective of whether it is curator-based or not. > # We seem to be caching the CuratorFramework instance in RM. It makes more > sense for it to be in RMContext. If others are okay with it, we might even be > better of having {{RMContext#getCurator()}} method to lazily create the > curator framework and then cache it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5709) Cleanup leader election configs and pluggability
[ https://issues.apache.org/jira/browse/YARN-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-5709: -- Attachment: yarn-5709-branch-2.8.02.patch > Cleanup leader election configs and pluggability > > > Key: YARN-5709 > URL: https://issues.apache.org/jira/browse/YARN-5709 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.8.0 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla >Priority: Critical > Fix For: 2.9.0, 3.0.0-alpha2 > > Attachments: yarn-5709-branch-2.8.01.patch, > yarn-5709-branch-2.8.02.patch, yarn-5709-branch-2.8.patch, > yarn-5709-wip.2.patch, yarn-5709.1.patch, yarn-5709.2.patch, > yarn-5709.3.patch, yarn-5709.4.patch > > > While reviewing YARN-5677 and YARN-5694, I noticed we could make the > curator-based election code cleaner. It is nicer to get this fixed in 2.8 > before we ship it, but this can be done at a later time as well. > # By EmbeddedElector, we meant it was running as part of the RM daemon. Since > the Curator-based elector is also running embedded, I feel the code should be > checking for {{!curatorBased}} instead of {{isEmbeddedElector}} > # {{LeaderElectorService}} should probably be named > {{CuratorBasedEmbeddedElectorService}} or some such. > # The code that initializes the elector should be at the same place > irrespective of whether it is curator-based or not. > # We seem to be caching the CuratorFramework instance in RM. It makes more > sense for it to be in RMContext. If others are okay with it, we might even be > better of having {{RMContext#getCurator()}} method to lazily create the > curator framework and then cache it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5709) Cleanup leader election configs and pluggability
[ https://issues.apache.org/jira/browse/YARN-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15771301#comment-15771301 ] Jian He commented on YARN-5709: --- I fixed the javac warnings; the javadoc warnings seem to be pre-existing. > Cleanup leader election configs and pluggability > > > Key: YARN-5709 > URL: https://issues.apache.org/jira/browse/YARN-5709 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.8.0 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla >Priority: Critical > Fix For: 2.9.0, 3.0.0-alpha2 > > Attachments: yarn-5709-branch-2.8.01.patch, > yarn-5709-branch-2.8.patch, yarn-5709-wip.2.patch, yarn-5709.1.patch, > yarn-5709.2.patch, yarn-5709.3.patch, yarn-5709.4.patch > > > While reviewing YARN-5677 and YARN-5694, I noticed we could make the > curator-based election code cleaner. It is nicer to get this fixed in 2.8 > before we ship it, but this can be done at a later time as well. > # By EmbeddedElector, we meant it was running as part of the RM daemon. Since > the Curator-based elector is also running embedded, I feel the code should be > checking for {{!curatorBased}} instead of {{isEmbeddedElector}} > # {{LeaderElectorService}} should probably be named > {{CuratorBasedEmbeddedElectorService}} or some such. > # The code that initializes the elector should be at the same place > irrespective of whether it is curator-based or not. > # We seem to be caching the CuratorFramework instance in RM. It makes more > sense for it to be in RMContext. If others are okay with it, we might even be > better of having {{RMContext#getCurator()}} method to lazily create the > curator framework and then cache it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5709) Cleanup leader election configs and pluggability
[ https://issues.apache.org/jira/browse/YARN-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-5709: -- Attachment: yarn-5709-branch-2.8.01.patch > Cleanup leader election configs and pluggability > > > Key: YARN-5709 > URL: https://issues.apache.org/jira/browse/YARN-5709 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.8.0 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla >Priority: Critical > Fix For: 2.9.0, 3.0.0-alpha2 > > Attachments: yarn-5709-branch-2.8.01.patch, > yarn-5709-branch-2.8.patch, yarn-5709-wip.2.patch, yarn-5709.1.patch, > yarn-5709.2.patch, yarn-5709.3.patch, yarn-5709.4.patch > > > While reviewing YARN-5677 and YARN-5694, I noticed we could make the > curator-based election code cleaner. It is nicer to get this fixed in 2.8 > before we ship it, but this can be done at a later time as well. > # By EmbeddedElector, we meant it was running as part of the RM daemon. Since > the Curator-based elector is also running embedded, I feel the code should be > checking for {{!curatorBased}} instead of {{isEmbeddedElector}} > # {{LeaderElectorService}} should probably be named > {{CuratorBasedEmbeddedElectorService}} or some such. > # The code that initializes the elector should be at the same place > irrespective of whether it is curator-based or not. > # We seem to be caching the CuratorFramework instance in RM. It makes more > sense for it to be in RMContext. If others are okay with it, we might even be > better of having {{RMContext#getCurator()}} method to lazily create the > curator framework and then cache it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5924) Resource Manager fails to load state with InvalidProtocolBufferException
[ https://issues.apache.org/jira/browse/YARN-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-5924: -- Assignee: Oleksii Dymytrov > Resource Manager fails to load state with InvalidProtocolBufferException > > > Key: YARN-5924 > URL: https://issues.apache.org/jira/browse/YARN-5924 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 3.0.0-alpha1 >Reporter: Oleksii Dymytrov >Assignee: Oleksii Dymytrov > Attachments: YARN-5924.002.patch > > > InvalidProtocolBufferException is thrown during recovering of the > application's state if application's data has invalid format (or is broken) > under FSRMStateRoot/RMAppRoot/application_1477986176766_0134/ directory in > HDFS: > {noformat} > com.google.protobuf.InvalidProtocolBufferException: Protocol message > end-group tag did not match expected tag. > at > com.google.protobuf.InvalidProtocolBufferException.invalidEndTag(InvalidProtocolBufferException.java:94) > at > com.google.protobuf.CodedInputStream.checkLastTagWas(CodedInputStream.java:124) > at > com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:143) > at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:176) > at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:188) > at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:193) > at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49) > at > org.apache.hadoop.yarn.proto.YarnServerResourceManagerRecoveryProtos$ApplicationStateDataProto.parseFrom(YarnServerResourceManagerRecoveryProtos.java:1028) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore$RMAppStateFileProcessor.processChildNode(FileSystemRMStateStore.java:966) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.processDirectoriesOfFiles(FileSystemRMStateStore.java:317) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.loadRMAppState(FileSystemRMStateStore.java:281) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.loadState(FileSystemRMStateStore.java:232) > {noformat} > The solution can be to catch "InvalidProtocolBufferException", show warning > and remove application's folder that contains invalid data to prevent RM > restart failure. > Additionally, I've added catch for other exceptions that can appear during > recovering of the specific application, to avoid RM failure even if the only > one application's state can't be loaded. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4757) [Umbrella] Simplified discovery of services via DNS mechanisms
[ https://issues.apache.org/jira/browse/YARN-4757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768534#comment-15768534 ] Jian He commented on YARN-4757: --- This jira now becomes a dependency of YARN-5079, we are going to merge YARN-4757 branch to yarn-native-services branch > [Umbrella] Simplified discovery of services via DNS mechanisms > -- > > Key: YARN-4757 > URL: https://issues.apache.org/jira/browse/YARN-4757 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Vinod Kumar Vavilapalli >Assignee: Jonathan Maron > Labels: oct16-hard > Attachments: > 0001-YARN-4757-Initial-code-submission-for-DNS-Service.patch, YARN-4757- > Simplified discovery of services via DNS mechanisms.pdf, > YARN-4757-YARN-4757.001.patch, YARN-4757-YARN-4757.002.patch, > YARN-4757-YARN-4757.003.patch, YARN-4757-YARN-4757.004.patch, > YARN-4757-YARN-4757.005.patch, YARN-4757.001.patch, YARN-4757.002.patch > > > [See overview doc at YARN-4692, copying the sub-section (3.2.10.2) to track > all related efforts.] > In addition to completing the present story of serviceÂ-registry (YARN-913), > we also need to simplify the access to the registry entries. The existing > read mechanisms of the YARN Service Registry are currently limited to a > registry specific (java) API and a REST interface. In practice, this makes it > very difficult for wiring up existing clients and services. For e.g, dynamic > configuration of dependent endÂpoints of a service is not easy to implement > using the present registryÂ-read mechanisms, *without* code-changes to > existing services. > A good solution to this is to expose the registry information through a more > generic and widely used discovery mechanism: DNS. Service Discovery via DNS > uses the well-Âknown DNS interfaces to browse the network for services. > YARN-913 in fact talked about such a DNS based mechanism but left it as a > future task. (Task) Having the registry information exposed via DNS > simplifies the life of services. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6009) RM fails to start during an upgrade - Failed to load/recover state (YarnException: Invalid application timeout, value=0 for type=LIFETIME)
[ https://issues.apache.org/jira/browse/YARN-6009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15765547#comment-15765547 ] Jian He commented on YARN-6009: --- [~rohithsharma], could you clarify which code logic changed? > RM fails to start during an upgrade - Failed to load/recover state > (YarnException: Invalid application timeout, value=0 for type=LIFETIME) > -- > > Key: YARN-6009 > URL: https://issues.apache.org/jira/browse/YARN-6009 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Gour Saha >Assignee: Rohith Sharma K S >Priority: Critical > > ResourceManager fails to start during an upgrade with the following > exceptions - > Exception 1: > {color:red} > {code} > 2016-12-09 14:57:23,508 INFO capacity.CapacityScheduler > (CapacityScheduler.java:initScheduler(328)) - Initialized CapacityScheduler > with calculator=class > org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator, > minimumAllocation=<>, > maximumAllocation=<>, asynchronousScheduling=false, > asyncScheduleInterval=5ms > 2016-12-09 14:57:23,509 WARN ha.ActiveStandbyElector > (ActiveStandbyElector.java:becomeActive(863)) - Exception handling the > winning of election > org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active > at > org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:129) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:859) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:463) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:611) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510) > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when > transitioning to Active mode > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:318) > at > org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:127) > ... 4 more > Caused by: org.apache.hadoop.service.ServiceStateException: > org.apache.hadoop.yarn.exceptions.YarnException: Invalid application timeout, > value=0 for type=LIFETIME > at > org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:204) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:991) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1032) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1028) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1028) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:313) > ... 
5 more > Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Invalid > application timeout, value=0 for type=LIFETIME > at > org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.validateApplicationTimeouts(RMServerUtils.java:305) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:365) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:330) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:463) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1184) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:594) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > ... 13 more > {code} > {color} > Exception 2: > {color:red} > {code} > 2016-12-09 14:57:26,162 INFO rmapp.RMAppImpl (RMAppImpl.java:handle(790)) - > application_1477927786494_0008 State change from NEW to FINISHED > 2016-12-09 14:57:26,162 ERROR resourcemanager.ResourceManager > (ResourceManager.java:serviceStart(599)) - Failed to load/recover state > o
[jira] [Commented] (YARN-5995) Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition performance
[ https://issues.apache.org/jira/browse/YARN-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15765298#comment-15765298 ] Jian He commented on YARN-5995: --- Agree that a general metric for overall performance, rather than a metric for every single API, will be more useful. I think we can focus on writes first; reads only happen on RM startup. Also, the total number of failed ops may be useful > Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition > performance > --- > > Key: YARN-5995 > URL: https://issues.apache.org/jira/browse/YARN-5995 > Project: Hadoop YARN > Issue Type: Improvement > Components: metrics, resourcemanager >Affects Versions: 2.7.1 > Environment: CentOS7.2 Hadoop-2.7.1 >Reporter: zhangyubiao > Labels: patch > Attachments: YARN-5995.0001.patch, YARN-5995.0002.patch, > YARN-5995.patch > > > Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition > performance -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-5910) Support for multi-cluster delegation tokens
[ https://issues.apache.org/jira/browse/YARN-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15765017#comment-15765017 ] Jian He edited comment on YARN-5910 at 12/20/16 7:23 PM: - Uploaded an in-process patch which uses the approach of making client send the jobConf to RM, RM DelegationTokenRenewer will renew the token using the app conf A flag is added in MR to indicate whether sending the conf or not. was (Author: jianhe): Uploaded an in-process patch which uses the approach of making client send the jobConf to RM, RM DelegationTokenRenewer will renew the token using the app conf > Support for multi-cluster delegation tokens > --- > > Key: YARN-5910 > URL: https://issues.apache.org/jira/browse/YARN-5910 > Project: Hadoop YARN > Issue Type: New Feature > Components: security >Reporter: Clay B. >Assignee: Jian He >Priority: Minor > Attachments: YARN-5910.01.patch > > > As an administrator running many secure (kerberized) clusters, some which > have peer clusters managed by other teams, I am looking for a way to run jobs > which may require services running on other clusters. Particular cases where > this rears itself are running something as core as a distcp between two > kerberized clusters (e.g. {{hadoop --config /home/user292/conf/ distcp > hdfs://LOCALCLUSTER/user/user292/test.out > hdfs://REMOTECLUSTER/user/user292/test.out.result}}). > Thanks to YARN-3021, once can run for a while but if the delegation token for > the remote cluster needs renewal the job will fail[1]. One can pre-configure > their {{hdfs-site.xml}} loaded by the YARN RM to know of all possible HDFSes > available but that requires coordination that is not always feasible, > especially as a cluster's peers grow into the tens of clusters or across > management teams. Ideally, one could have core systems configured this way > but jobs could also specify their own handling of tokens and management when > needed? > [1]: Example stack trace when the RM is unaware of a remote service: > > {code} > 2016-03-23 14:59:50,528 INFO > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > application_1458441356031_3317 found existing hdfs token Kind: > HDFS_DELEGATION_TOKEN, Service: ha-hdfs:REMOTECLUSTER, Ident: > (HDFS_DELEGATION_TOKEN token > 10927 for user292) > 2016-03-23 14:59:50,557 WARN > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > Unable to add the application to the delegation token renewer. 
> java.io.IOException: Failed to renew token: Kind: HDFS_DELEGATION_TOKEN, > Service: ha-hdfs:REMOTECLUSTER, Ident: (HDFS_DELEGATION_TOKEN token 10927 for > user292) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:427) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$700(DelegationTokenRenewer.java:78) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:781) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:762) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:744) > Caused by: java.io.IOException: Unable to map logical nameservice URI > 'hdfs://REMOTECLUSTER' to a NameNode. Local configuration does not have a > failover proxy provider configured. > at org.apache.hadoop.hdfs.DFSClient$Renewer.getNNProxy(DFSClient.java:1164) > at org.apache.hadoop.hdfs.DFSClient$Renewer.renew(DFSClient.java:1128) > at org.apache.hadoop.security.token.Token.renew(Token.java:377) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:516) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:513) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.renewToken(DelegationTokenRenewer.java:511) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:425) > ... 6 more > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) -
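To illustrate the approach described in the comment above (a sketch only, not the attached YARN-5910.01.patch): the renewer would build a Configuration from what the client shipped and call Token#renew with it, so a logical nameservice such as hdfs://REMOTECLUSTER can be resolved during renewal. How the submitted conf reaches the RM (appConfResource below) is a hypothetical stand-in.
{code}
import java.io.InputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.token.Token;

public class RenewWithAppConf {
  // rmConf: the RM's own configuration; appConfResource: conf shipped by the client
  public static void renewAll(Configuration rmConf, InputStream appConfResource,
      Credentials credentials) throws Exception {
    Configuration appConf = new Configuration(rmConf);
    appConf.addResource(appConfResource); // layer the app-provided settings on top
    for (Token<?> token : credentials.getAllTokens()) {
      // renewing with the app conf lets remote-cluster nameservices resolve
      long newExpiration = token.renew(appConf);
      System.out.println(token.getKind() + " renewed until " + newExpiration);
    }
  }
}
{code}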
[jira] [Assigned] (YARN-5910) Support for multi-cluster delegation tokens
[ https://issues.apache.org/jira/browse/YARN-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He reassigned YARN-5910: - Assignee: Jian He > Support for multi-cluster delegation tokens > --- > > Key: YARN-5910 > URL: https://issues.apache.org/jira/browse/YARN-5910 > Project: Hadoop YARN > Issue Type: New Feature > Components: security >Reporter: Clay B. >Assignee: Jian He >Priority: Minor > Attachments: YARN-5910.01.patch > > > As an administrator running many secure (kerberized) clusters, some which > have peer clusters managed by other teams, I am looking for a way to run jobs > which may require services running on other clusters. Particular cases where > this rears itself are running something as core as a distcp between two > kerberized clusters (e.g. {{hadoop --config /home/user292/conf/ distcp > hdfs://LOCALCLUSTER/user/user292/test.out > hdfs://REMOTECLUSTER/user/user292/test.out.result}}). > Thanks to YARN-3021, once can run for a while but if the delegation token for > the remote cluster needs renewal the job will fail[1]. One can pre-configure > their {{hdfs-site.xml}} loaded by the YARN RM to know of all possible HDFSes > available but that requires coordination that is not always feasible, > especially as a cluster's peers grow into the tens of clusters or across > management teams. Ideally, one could have core systems configured this way > but jobs could also specify their own handling of tokens and management when > needed? > [1]: Example stack trace when the RM is unaware of a remote service: > > {code} > 2016-03-23 14:59:50,528 INFO > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > application_1458441356031_3317 found existing hdfs token Kind: > HDFS_DELEGATION_TOKEN, Service: ha-hdfs:REMOTECLUSTER, Ident: > (HDFS_DELEGATION_TOKEN token > 10927 for user292) > 2016-03-23 14:59:50,557 WARN > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > Unable to add the application to the delegation token renewer. > java.io.IOException: Failed to renew token: Kind: HDFS_DELEGATION_TOKEN, > Service: ha-hdfs:REMOTECLUSTER, Ident: (HDFS_DELEGATION_TOKEN token 10927 for > user292) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:427) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$700(DelegationTokenRenewer.java:78) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:781) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:762) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:744) > Caused by: java.io.IOException: Unable to map logical nameservice URI > 'hdfs://REMOTECLUSTER' to a NameNode. Local configuration does not have a > failover proxy provider configured. 
> at org.apache.hadoop.hdfs.DFSClient$Renewer.getNNProxy(DFSClient.java:1164) > at org.apache.hadoop.hdfs.DFSClient$Renewer.renew(DFSClient.java:1128) > at org.apache.hadoop.security.token.Token.renew(Token.java:377) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:516) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:513) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.renewToken(DelegationTokenRenewer.java:511) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:425) > ... 6 more > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5910) Support for multi-cluster delegation tokens
[ https://issues.apache.org/jira/browse/YARN-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-5910: -- Attachment: YARN-5910.01.patch Uploaded an in-process patch which uses the approach of making client send the jobConf to RM, RM DelegationTokenRenewer will renew the token using the app conf > Support for multi-cluster delegation tokens > --- > > Key: YARN-5910 > URL: https://issues.apache.org/jira/browse/YARN-5910 > Project: Hadoop YARN > Issue Type: New Feature > Components: security >Reporter: Clay B. >Priority: Minor > Attachments: YARN-5910.01.patch > > > As an administrator running many secure (kerberized) clusters, some which > have peer clusters managed by other teams, I am looking for a way to run jobs > which may require services running on other clusters. Particular cases where > this rears itself are running something as core as a distcp between two > kerberized clusters (e.g. {{hadoop --config /home/user292/conf/ distcp > hdfs://LOCALCLUSTER/user/user292/test.out > hdfs://REMOTECLUSTER/user/user292/test.out.result}}). > Thanks to YARN-3021, once can run for a while but if the delegation token for > the remote cluster needs renewal the job will fail[1]. One can pre-configure > their {{hdfs-site.xml}} loaded by the YARN RM to know of all possible HDFSes > available but that requires coordination that is not always feasible, > especially as a cluster's peers grow into the tens of clusters or across > management teams. Ideally, one could have core systems configured this way > but jobs could also specify their own handling of tokens and management when > needed? > [1]: Example stack trace when the RM is unaware of a remote service: > > {code} > 2016-03-23 14:59:50,528 INFO > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > application_1458441356031_3317 found existing hdfs token Kind: > HDFS_DELEGATION_TOKEN, Service: ha-hdfs:REMOTECLUSTER, Ident: > (HDFS_DELEGATION_TOKEN token > 10927 for user292) > 2016-03-23 14:59:50,557 WARN > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > Unable to add the application to the delegation token renewer. > java.io.IOException: Failed to renew token: Kind: HDFS_DELEGATION_TOKEN, > Service: ha-hdfs:REMOTECLUSTER, Ident: (HDFS_DELEGATION_TOKEN token 10927 for > user292) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:427) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$700(DelegationTokenRenewer.java:78) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:781) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:762) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:744) > Caused by: java.io.IOException: Unable to map logical nameservice URI > 'hdfs://REMOTECLUSTER' to a NameNode. Local configuration does not have a > failover proxy provider configured. 
> at org.apache.hadoop.hdfs.DFSClient$Renewer.getNNProxy(DFSClient.java:1164) > at org.apache.hadoop.hdfs.DFSClient$Renewer.renew(DFSClient.java:1128) > at org.apache.hadoop.security.token.Token.renew(Token.java:377) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:516) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:513) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.renewToken(DelegationTokenRenewer.java:511) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:425) > ... 6 more > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
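Editor's note on the update above: the approach is that the client ships its job configuration to the RM, and the RM's DelegationTokenRenewer renews tokens against that per-application configuration rather than only its own hdfs-site.xml. Below is a minimal, hedged sketch of the renewal side; all class and method names are hypothetical and this is not the contents of YARN-5910.01.patch.
{code}
// Hedged sketch: renew a delegation token with the configuration the client
// submitted alongside the application, so that a logical nameservice such as
// ha-hdfs:REMOTECLUSTER resolves even though the RM's local configuration has
// no failover proxy provider for it. Names here are illustrative only.
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.token.Token;

public class AppConfTokenRenewer {
  public long renew(final Token<?> token, final Configuration appConf,
      UserGroupInformation submitter) throws Exception {
    // Start from the application's conf so remote nameservice mappings win.
    final Configuration renewerConf = new Configuration(appConf);
    return submitter.doAs(new PrivilegedExceptionAction<Long>() {
      @Override
      public Long run() throws Exception {
        return token.renew(renewerConf);
      }
    });
  }
}
{code}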
[jira] [Commented] (YARN-5910) Support for multi-cluster delegation tokens
[ https://issues.apache.org/jira/browse/YARN-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15764997#comment-15764997 ] Jian He commented on YARN-5910: --- Hi Clay, thanks for the feedback. bq. we could also perhaps extend the various delegation token types to only optionally include this payload? Then we the RM would only pay the price when needed for an off-cluster request? We realized that changing existing token structure might have issues regarding compatibility. > Support for multi-cluster delegation tokens > --- > > Key: YARN-5910 > URL: https://issues.apache.org/jira/browse/YARN-5910 > Project: Hadoop YARN > Issue Type: New Feature > Components: security >Reporter: Clay B. >Priority: Minor > > As an administrator running many secure (kerberized) clusters, some which > have peer clusters managed by other teams, I am looking for a way to run jobs > which may require services running on other clusters. Particular cases where > this rears itself are running something as core as a distcp between two > kerberized clusters (e.g. {{hadoop --config /home/user292/conf/ distcp > hdfs://LOCALCLUSTER/user/user292/test.out > hdfs://REMOTECLUSTER/user/user292/test.out.result}}). > Thanks to YARN-3021, once can run for a while but if the delegation token for > the remote cluster needs renewal the job will fail[1]. One can pre-configure > their {{hdfs-site.xml}} loaded by the YARN RM to know of all possible HDFSes > available but that requires coordination that is not always feasible, > especially as a cluster's peers grow into the tens of clusters or across > management teams. Ideally, one could have core systems configured this way > but jobs could also specify their own handling of tokens and management when > needed? > [1]: Example stack trace when the RM is unaware of a remote service: > > {code} > 2016-03-23 14:59:50,528 INFO > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > application_1458441356031_3317 found existing hdfs token Kind: > HDFS_DELEGATION_TOKEN, Service: ha-hdfs:REMOTECLUSTER, Ident: > (HDFS_DELEGATION_TOKEN token > 10927 for user292) > 2016-03-23 14:59:50,557 WARN > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > Unable to add the application to the delegation token renewer. > java.io.IOException: Failed to renew token: Kind: HDFS_DELEGATION_TOKEN, > Service: ha-hdfs:REMOTECLUSTER, Ident: (HDFS_DELEGATION_TOKEN token 10927 for > user292) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:427) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$700(DelegationTokenRenewer.java:78) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:781) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:762) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:744) > Caused by: java.io.IOException: Unable to map logical nameservice URI > 'hdfs://REMOTECLUSTER' to a NameNode. Local configuration does not have a > failover proxy provider configured. 
> at org.apache.hadoop.hdfs.DFSClient$Renewer.getNNProxy(DFSClient.java:1164) > at org.apache.hadoop.hdfs.DFSClient$Renewer.renew(DFSClient.java:1128) > at org.apache.hadoop.security.token.Token.renew(Token.java:377) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:516) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:513) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.renewToken(DelegationTokenRenewer.java:511) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:425) > ... 6 more > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Issue Comment Deleted] (YARN-5910) Support for multi-cluster delegation tokens
[ https://issues.apache.org/jira/browse/YARN-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-5910: -- Comment: was deleted (was: Hi Clay, thanks for the feedback. bq. we could also perhaps extend the various delegation token types to only optionally include this payload? Then we the RM would only pay the price when needed for an off-cluster request? We realized that changing existing token structure might have issues regarding compatibility.) > Support for multi-cluster delegation tokens > --- > > Key: YARN-5910 > URL: https://issues.apache.org/jira/browse/YARN-5910 > Project: Hadoop YARN > Issue Type: New Feature > Components: security >Reporter: Clay B. >Priority: Minor > > As an administrator running many secure (kerberized) clusters, some which > have peer clusters managed by other teams, I am looking for a way to run jobs > which may require services running on other clusters. Particular cases where > this rears itself are running something as core as a distcp between two > kerberized clusters (e.g. {{hadoop --config /home/user292/conf/ distcp > hdfs://LOCALCLUSTER/user/user292/test.out > hdfs://REMOTECLUSTER/user/user292/test.out.result}}). > Thanks to YARN-3021, once can run for a while but if the delegation token for > the remote cluster needs renewal the job will fail[1]. One can pre-configure > their {{hdfs-site.xml}} loaded by the YARN RM to know of all possible HDFSes > available but that requires coordination that is not always feasible, > especially as a cluster's peers grow into the tens of clusters or across > management teams. Ideally, one could have core systems configured this way > but jobs could also specify their own handling of tokens and management when > needed? > [1]: Example stack trace when the RM is unaware of a remote service: > > {code} > 2016-03-23 14:59:50,528 INFO > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > application_1458441356031_3317 found existing hdfs token Kind: > HDFS_DELEGATION_TOKEN, Service: ha-hdfs:REMOTECLUSTER, Ident: > (HDFS_DELEGATION_TOKEN token > 10927 for user292) > 2016-03-23 14:59:50,557 WARN > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > Unable to add the application to the delegation token renewer. > java.io.IOException: Failed to renew token: Kind: HDFS_DELEGATION_TOKEN, > Service: ha-hdfs:REMOTECLUSTER, Ident: (HDFS_DELEGATION_TOKEN token 10927 for > user292) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:427) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$700(DelegationTokenRenewer.java:78) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:781) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:762) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:744) > Caused by: java.io.IOException: Unable to map logical nameservice URI > 'hdfs://REMOTECLUSTER' to a NameNode. Local configuration does not have a > failover proxy provider configured. 
> at org.apache.hadoop.hdfs.DFSClient$Renewer.getNNProxy(DFSClient.java:1164) > at org.apache.hadoop.hdfs.DFSClient$Renewer.renew(DFSClient.java:1128) > at org.apache.hadoop.security.token.Token.renew(Token.java:377) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:516) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:513) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.renewToken(DelegationTokenRenewer.java:511) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:425) > ... 6 more > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5910) Support for multi-cluster delegation tokens
[ https://issues.apache.org/jira/browse/YARN-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15764998#comment-15764998 ] Jian He commented on YARN-5910: --- Hi Clay, thanks for the feedback. bq. we could also perhaps extend the various delegation token types to only optionally include this payload? Then we the RM would only pay the price when needed for an off-cluster request? We realized that changing existing token structure might have issues regarding compatibility. > Support for multi-cluster delegation tokens > --- > > Key: YARN-5910 > URL: https://issues.apache.org/jira/browse/YARN-5910 > Project: Hadoop YARN > Issue Type: New Feature > Components: security >Reporter: Clay B. >Priority: Minor > > As an administrator running many secure (kerberized) clusters, some which > have peer clusters managed by other teams, I am looking for a way to run jobs > which may require services running on other clusters. Particular cases where > this rears itself are running something as core as a distcp between two > kerberized clusters (e.g. {{hadoop --config /home/user292/conf/ distcp > hdfs://LOCALCLUSTER/user/user292/test.out > hdfs://REMOTECLUSTER/user/user292/test.out.result}}). > Thanks to YARN-3021, once can run for a while but if the delegation token for > the remote cluster needs renewal the job will fail[1]. One can pre-configure > their {{hdfs-site.xml}} loaded by the YARN RM to know of all possible HDFSes > available but that requires coordination that is not always feasible, > especially as a cluster's peers grow into the tens of clusters or across > management teams. Ideally, one could have core systems configured this way > but jobs could also specify their own handling of tokens and management when > needed? > [1]: Example stack trace when the RM is unaware of a remote service: > > {code} > 2016-03-23 14:59:50,528 INFO > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > application_1458441356031_3317 found existing hdfs token Kind: > HDFS_DELEGATION_TOKEN, Service: ha-hdfs:REMOTECLUSTER, Ident: > (HDFS_DELEGATION_TOKEN token > 10927 for user292) > 2016-03-23 14:59:50,557 WARN > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > Unable to add the application to the delegation token renewer. > java.io.IOException: Failed to renew token: Kind: HDFS_DELEGATION_TOKEN, > Service: ha-hdfs:REMOTECLUSTER, Ident: (HDFS_DELEGATION_TOKEN token 10927 for > user292) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:427) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$700(DelegationTokenRenewer.java:78) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:781) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:762) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:744) > Caused by: java.io.IOException: Unable to map logical nameservice URI > 'hdfs://REMOTECLUSTER' to a NameNode. Local configuration does not have a > failover proxy provider configured. 
> at org.apache.hadoop.hdfs.DFSClient$Renewer.getNNProxy(DFSClient.java:1164) > at org.apache.hadoop.hdfs.DFSClient$Renewer.renew(DFSClient.java:1128) > at org.apache.hadoop.security.token.Token.renew(Token.java:377) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:516) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:513) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.renewToken(DelegationTokenRenewer.java:511) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:425) > ... 6 more > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
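Editor's note on the compatibility concern above: Hadoop token identifiers are positionally serialized Writables, so appending an optional payload changes the wire format that existing renewers expect. A purely illustrative sketch of why that breaks old readers (this is not the actual HDFS delegation token identifier layout):
{code}
// Illustrative only: Writable serialization is positional, so a reader built
// against the old identifier either misparses or fails once an extra field
// (here a hypothetical confPayload) is appended by newer writers.
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;

public class ExtendedTokenIdent implements Writable {
  private final Text owner = new Text();
  private final Text renewer = new Text();
  private final Text confPayload = new Text();   // hypothetical new field

  @Override
  public void write(DataOutput out) throws IOException {
    owner.write(out);
    renewer.write(out);
    confPayload.write(out);        // bytes an old reader does not expect
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    owner.readFields(in);
    renewer.readFields(in);
    confPayload.readFields(in);    // absent in streams written by old code
  }
}
{code}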
[jira] [Updated] (YARN-6014) Followup fix for slider core module findbugs
[ https://issues.apache.org/jira/browse/YARN-6014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-6014: -- Attachment: YARN-6014-yarn-native-services.02.patch > Followup fix for slider core module findbugs > > > Key: YARN-6014 > URL: https://issues.apache.org/jira/browse/YARN-6014 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-6014-yarn-native-services.01.patch, > YARN-6014-yarn-native-services.02.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6013) ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when RPC privacy is enabled
[ https://issues.apache.org/jira/browse/YARN-6013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15763232#comment-15763232 ] Jian He commented on YARN-6013: --- [~Steven Rand], do you have server side log where this exception happens? > ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when > RPC privacy is enabled > -- > > Key: YARN-6013 > URL: https://issues.apache.org/jira/browse/YARN-6013 > Project: Hadoop YARN > Issue Type: Bug > Components: client, yarn >Affects Versions: 2.8.0 >Reporter: Steven Rand >Priority: Critical > > When privacy is enabled for RPC (hadoop.rpc.protection = privacy), > {{ApplicationMasterProtocolPBClientImpl.allocate}} sometimes (but not always) > fails with an EOFException. I've reproduced this with Spark 2.0.2 built > against latest branch-2.8 and with a simple distcp job on latest branch-2.8. > Steps to reproduce using distcp: > 1. Set hadoop.rpc.protection equal to privacy > 2. Write data to HDFS. I did this with Spark as follows: > {code} > sc.parallelize(1 to (5*1024*1024)).map(k => Seq(k, > org.apache.commons.lang.RandomStringUtils.random(1024, > "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWxyZ0123456789")).mkString("|")).toDF().repartition(100).write.parquet("hdfs:///tmp/testData") > {code} > 3. Attempt to distcp that data to another location in HDFS. For example: > {code} > hadoop distcp -Dmapreduce.framework.name=yarn hdfs:///tmp/testData > hdfs:///tmp/testDataCopy > {code} > I observed this error in the ApplicationMaster's syslog: > {code} > 2016-12-19 19:13:50,097 INFO [eventHandlingThread] > org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Event Writer > setup for JobId: job_1482189777425_0004, File: > hdfs://:8020/tmp/hadoop-yarn/staging//.staging/job_1482189777425_0004/job_1482189777425_0004_1.jhist > 2016-12-19 19:13:51,004 INFO [RMCommunicator Allocator] > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before > Scheduling: PendingReds:0 ScheduledMaps:4 ScheduledReds:0 AssignedMaps:0 > AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:0 ContRel:0 > HostLocal:0 RackLocal:0 > 2016-12-19 19:13:51,031 INFO [RMCommunicator Allocator] > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() > for application_1482189777425_0004: ask=1 release= 0 newContainers=0 > finishedContainers=0 resourcelimit= knownNMs=3 > 2016-12-19 19:13:52,043 INFO [RMCommunicator Allocator] > org.apache.hadoop.io.retry.RetryInvocationHandler: Exception while invoking > ApplicationMasterProtocolPBClientImpl.allocate over null. Retrying after > sleeping for 3ms. 
> java.io.EOFException: End of File Exception between local host is: > "/"; destination host is: "":8030; > : java.io.EOFException; For more details see: > http://wiki.apache.org/hadoop/EOFException > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:422) > at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:801) > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:765) > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1486) > at org.apache.hadoop.ipc.Client.call(Client.java:1428) > at org.apache.hadoop.ipc.Client.call(Client.java:1338) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116) > at com.sun.proxy.$Proxy80.allocate(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:398) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:163) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:155) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95) >
[jira] [Updated] (YARN-6014) Followup fix for slider core module findbugs
[ https://issues.apache.org/jira/browse/YARN-6014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-6014: -- Attachment: YARN-6014-yarn-native-services.01.patch > Followup fix for slider core module findbugs > > > Key: YARN-6014 > URL: https://issues.apache.org/jira/browse/YARN-6014 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-6014-yarn-native-services.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-6014) Followup fix for slider core module findbugs
Jian He created YARN-6014: - Summary: Followup fix for slider core module findbugs Key: YARN-6014 URL: https://issues.apache.org/jira/browse/YARN-6014 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Jian He -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5132) Exclude generated protobuf sources from YARN Javadoc build
[ https://issues.apache.org/jira/browse/YARN-5132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15761892#comment-15761892 ] Jian He commented on YARN-5132: --- [~subru], [~kasha], [~billie.rinaldi] found this, could you please confirm ? seems like an issue. bq. It looks like the maven-javadoc-plugin is not configured properly for the hadoop-yarn module. There are YARN exclusions in the top level pom: https://github.com/apache/hadoop/blob/trunk/pom.xml#L443, and these are overridden in the hadoop-yarn pom: https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/pom.xml#L78 by a subset of the package names. I am not sure if the lack of exclusion of the yarn server and yarn webapp packages was intentional or not. > Exclude generated protobuf sources from YARN Javadoc build > -- > > Key: YARN-5132 > URL: https://issues.apache.org/jira/browse/YARN-5132 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Subru Krishnan >Assignee: Subru Krishnan >Priority: Critical > Fix For: 2.8.0, 3.0.0-alpha1 > > Attachments: YARN-5132-v1.patch > > > Currently YARN build includes Javadoc from generated protobuf sources which > is causing CI to fail. This JIRA proposes to exclude generated protobuf > sources from YARN Javadoc build -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Resolved] (YARN-5768) Integrate remaining app lifetime using feature implemented in YARN-4206
[ https://issues.apache.org/jira/browse/YARN-5768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He resolved YARN-5768. --- Resolution: Fixed This is done in YARN-5740 > Integrate remaining app lifetime using feature implemented in YARN-4206 > --- > > Key: YARN-5768 > URL: https://issues.apache.org/jira/browse/YARN-5768 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Gour Saha >Assignee: Jian He > Fix For: yarn-native-services > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5968) Fix slider core module javadocs
[ https://issues.apache.org/jira/browse/YARN-5968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15757548#comment-15757548 ] Jian He commented on YARN-5968: --- Oh, does this mean the exclusions in the hadoop top-level pom are ignored? If so, this is a bug. YARN-5132 added this change recently. > Fix slider core module javadocs > --- > > Key: YARN-5968 > URL: https://issues.apache.org/jira/browse/YARN-5968 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Billie Rinaldi > Attachments: YARN-5968-yarn-native-services.01.patch, > YARN-5968-yarn-native-services.02.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6010) Fix findbugs, site warnings in yarn-services-api module
[ https://issues.apache.org/jira/browse/YARN-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-6010: -- Attachment: YARN-6010-yarn-native-services.01.patch > Fix findbugs, site warnings in yarn-services-api module > --- > > Key: YARN-6010 > URL: https://issues.apache.org/jira/browse/YARN-6010 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He > Attachments: YARN-6010-yarn-native-services.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-6010) Fix findbugs, site warnings in yarn-services-api module
[ https://issues.apache.org/jira/browse/YARN-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He reassigned YARN-6010: - Assignee: Jian He > Fix findbugs, site warnings in yarn-services-api module > --- > > Key: YARN-6010 > URL: https://issues.apache.org/jira/browse/YARN-6010 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-6010-yarn-native-services.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-6010) Fix findbugs, site warnings in yarn-services-api module
Jian He created YARN-6010: - Summary: Fix findbugs, site warnings in yarn-services-api module Key: YARN-6010 URL: https://issues.apache.org/jira/browse/YARN-6010 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5967) Fix slider core module findbugs warnings
[ https://issues.apache.org/jira/browse/YARN-5967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-5967: -- Attachment: YARN-5967-yarn-native-services.09.patch > Fix slider core module findbugs warnings > - > > Key: YARN-5967 > URL: https://issues.apache.org/jira/browse/YARN-5967 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-5967-yarn-native-services.01.patch, > YARN-5967-yarn-native-services.02.patch, > YARN-5967-yarn-native-services.02.patch, > YARN-5967-yarn-native-services.03.patch, > YARN-5967-yarn-native-services.04.patch, > YARN-5967-yarn-native-services.04.patch, > YARN-5967-yarn-native-services.05.patch, > YARN-5967-yarn-native-services.06.patch, > YARN-5967-yarn-native-services.07.patch, > YARN-5967-yarn-native-services.08.patch, > YARN-5967-yarn-native-services.08.patch, > YARN-5967-yarn-native-services.09.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5968) Fix slider core module javadocs
[ https://issues.apache.org/jira/browse/YARN-5968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-5968: -- Assignee: Billie Rinaldi > Fix slider core module javadocs > --- > > Key: YARN-5968 > URL: https://issues.apache.org/jira/browse/YARN-5968 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Billie Rinaldi > Attachments: YARN-5968-yarn-native-services.01.patch, > YARN-5968-yarn-native-services.02.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5967) Fix slider core module findbugs warnings
[ https://issues.apache.org/jira/browse/YARN-5967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15756098#comment-15756098 ] Jian He commented on YARN-5967: --- btw. this patch is the latest https://issues.apache.org/jira/secure/attachment/12843691/YARN-5967-yarn-native-services.08.patch > Fix slider core module findbugs warnings > - > > Key: YARN-5967 > URL: https://issues.apache.org/jira/browse/YARN-5967 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-5967-yarn-native-services.01.patch, > YARN-5967-yarn-native-services.02.patch, > YARN-5967-yarn-native-services.02.patch, > YARN-5967-yarn-native-services.03.patch, > YARN-5967-yarn-native-services.04.patch, > YARN-5967-yarn-native-services.04.patch, > YARN-5967-yarn-native-services.05.patch, > YARN-5967-yarn-native-services.06.patch, > YARN-5967-yarn-native-services.07.patch, > YARN-5967-yarn-native-services.08.patch, > YARN-5967-yarn-native-services.08.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5967) Fix slider core module findbugs warnings
[ https://issues.apache.org/jira/browse/YARN-5967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-5967: -- Attachment: YARN-5967-yarn-native-services.08.patch > Fix slider core module findbugs warnings > - > > Key: YARN-5967 > URL: https://issues.apache.org/jira/browse/YARN-5967 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-5967-yarn-native-services.01.patch, > YARN-5967-yarn-native-services.02.patch, > YARN-5967-yarn-native-services.02.patch, > YARN-5967-yarn-native-services.03.patch, > YARN-5967-yarn-native-services.04.patch, > YARN-5967-yarn-native-services.04.patch, > YARN-5967-yarn-native-services.05.patch, > YARN-5967-yarn-native-services.06.patch, > YARN-5967-yarn-native-services.07.patch, > YARN-5967-yarn-native-services.08.patch, > YARN-5967-yarn-native-services.08.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5967) Fix slider core module findbugs warnings
[ https://issues.apache.org/jira/browse/YARN-5967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-5967: -- Attachment: YARN-5967-yarn-native-services.08.patch > Fix slider core module findbugs warnings > - > > Key: YARN-5967 > URL: https://issues.apache.org/jira/browse/YARN-5967 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-5967-yarn-native-services.01.patch, > YARN-5967-yarn-native-services.02.patch, > YARN-5967-yarn-native-services.02.patch, > YARN-5967-yarn-native-services.03.patch, > YARN-5967-yarn-native-services.04.patch, > YARN-5967-yarn-native-services.04.patch, > YARN-5967-yarn-native-services.05.patch, > YARN-5967-yarn-native-services.06.patch, > YARN-5967-yarn-native-services.07.patch, > YARN-5967-yarn-native-services.08.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5967) Fix slider core module findbugs warnings
[ https://issues.apache.org/jira/browse/YARN-5967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-5967: -- Attachment: YARN-5967-yarn-native-services.07.patch Thanks Billie for the thorough review ! Fixed all of them > Fix slider core module findbugs warnings > - > > Key: YARN-5967 > URL: https://issues.apache.org/jira/browse/YARN-5967 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-5967-yarn-native-services.01.patch, > YARN-5967-yarn-native-services.02.patch, > YARN-5967-yarn-native-services.02.patch, > YARN-5967-yarn-native-services.03.patch, > YARN-5967-yarn-native-services.04.patch, > YARN-5967-yarn-native-services.04.patch, > YARN-5967-yarn-native-services.05.patch, > YARN-5967-yarn-native-services.06.patch, > YARN-5967-yarn-native-services.07.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5967) Fix slider core module findbugs warnings
[ https://issues.apache.org/jira/browse/YARN-5967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15753382#comment-15753382 ] Jian He commented on YARN-5967: --- - Removed any unused code that was triggering warnings. - Suppressed certain warnings that are acceptable. > Fix slider core module findbugs warnings > - > > Key: YARN-5967 > URL: https://issues.apache.org/jira/browse/YARN-5967 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-5967-yarn-native-services.01.patch, > YARN-5967-yarn-native-services.02.patch, > YARN-5967-yarn-native-services.02.patch, > YARN-5967-yarn-native-services.03.patch, > YARN-5967-yarn-native-services.04.patch, > YARN-5967-yarn-native-services.04.patch, > YARN-5967-yarn-native-services.05.patch, > YARN-5967-yarn-native-services.06.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
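Editor's note on the suppression approach mentioned above: one common way to waive an individual FindBugs warning that has been reviewed and judged acceptable is the findbugs-annotations @SuppressFBWarnings annotation; the patch may equally use an exclude-filter file. An illustrative example, not taken from the patch:
{code}
// Illustrative only: suppress a single reviewed FindBugs warning at the
// method that triggers it, with the justification recorded next to the code.
import edu.umd.cs.findbugs.annotations.SuppressFBWarnings;

public class ConfigHolder {
  private final byte[] rawBytes = new byte[0];

  @SuppressFBWarnings(
      value = "EI_EXPOSE_REP",
      justification = "callers treat the returned array as read-only")
  public byte[] getRawBytes() {
    return rawBytes;
  }
}
{code}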
[jira] [Updated] (YARN-5740) Add a new field in Slider status output - lifetime (remaining)
[ https://issues.apache.org/jira/browse/YARN-5740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-5740: -- Attachment: YARN-5740-yarn-native-services.03.patch > Add a new field in Slider status output - lifetime (remaining) > -- > > Key: YARN-5740 > URL: https://issues.apache.org/jira/browse/YARN-5740 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Gour Saha >Assignee: Jian He > Fix For: yarn-native-services > > Attachments: YARN-5740-yarn-native-services.01.patch, > YARN-5740-yarn-native-services.02.patch, > YARN-5740-yarn-native-services.03.patch > > > With YARN-5735, REST service is now setting lifetime to application during > submission (YARN-4205 exposed application lifetime support). Now Slider > status needs to expose this field so that the REST service can return it in > its GET response. Note, the lifetime value that GET response intends to > return is the remaining lifetime of the application. > There is an ongoing discussion in YARN-4206, that the lifetime value returned > in Application Report will be the remaining lifetime (at the time of > request). So until it is finalized, the lifetime value might go through > different connotations. But as long as we have the lifetime field in the > status output, it will be a good start. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5740) Add a new field in Slider status output - lifetime (remaining)
[ https://issues.apache.org/jira/browse/YARN-5740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15753103#comment-15753103 ] Jian He commented on YARN-5740: --- Thanks for the review. I fixed the issues except this one, which I think does not need fixing because all the other variables follow the same pattern: {{./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-slider/hadoop-yarn-slider-core/src/main/java/org/apache/slider/common/params/ActionStatusArgs.java:40: public boolean lifetime;:18: Variable 'lifetime' must be private and have accessor methods.}} > Add a new field in Slider status output - lifetime (remaining) > -- > > Key: YARN-5740 > URL: https://issues.apache.org/jira/browse/YARN-5740 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Gour Saha >Assignee: Jian He > Fix For: yarn-native-services > > Attachments: YARN-5740-yarn-native-services.01.patch, > YARN-5740-yarn-native-services.02.patch > > > With YARN-5735, REST service is now setting lifetime to application during > submission (YARN-4205 exposed application lifetime support). Now Slider > status needs to expose this field so that the REST service can return it in > its GET response. Note, the lifetime value that GET response intends to > return is the remaining lifetime of the application. > There is an ongoing discussion in YARN-4206, that the lifetime value returned > in Application Report will be the remaining lifetime (at the time of > request). So until it is finalized, the lifetime value might go through > different connotations. But as long as we have the lifetime field in the > status output, it will be a good start. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
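Editor's note on the remaining warning quoted above: checkstyle wants the usual private-field-plus-accessor shape, while the existing Args classes expose public fields that the command-line parser fills in directly. A minimal sketch of the two forms; only the field name comes from the warning, the rest is illustrative:
{code}
// What the checkstyle rule asks for:
class LifetimeWithAccessors {
  private boolean lifetime;
  boolean isLifetime() { return lifetime; }
  void setLifetime(boolean lifetime) { this.lifetime = lifetime; }
}

// The pattern the existing Args classes follow, kept here for consistency:
class LifetimeAsPublicField {
  public boolean lifetime;   // populated directly by the CLI argument parser
}
{code}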
[jira] [Updated] (YARN-5740) Add a new field in Slider status output - lifetime (remaining)
[ https://issues.apache.org/jira/browse/YARN-5740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-5740: -- Attachment: (was: YARN-5740-yarn-native-services.02.patch) > Add a new field in Slider status output - lifetime (remaining) > -- > > Key: YARN-5740 > URL: https://issues.apache.org/jira/browse/YARN-5740 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Gour Saha >Assignee: Jian He > Fix For: yarn-native-services > > Attachments: YARN-5740-yarn-native-services.01.patch, > YARN-5740-yarn-native-services.02.patch > > > With YARN-5735, REST service is now setting lifetime to application during > submission (YARN-4205 exposed application lifetime support). Now Slider > status needs to expose this field so that the REST service can return it in > its GET response. Note, the lifetime value that GET response intends to > return is the remaining lifetime of the application. > There is an ongoing discussion in YARN-4206, that the lifetime value returned > in Application Report will be the remaining lifetime (at the time of > request). So until it is finalized, the lifetime value might go through > different connotations. But as long as we have the lifetime field in the > status output, it will be a good start. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5740) Add a new field in Slider status output - lifetime (remaining)
[ https://issues.apache.org/jira/browse/YARN-5740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-5740: -- Attachment: YARN-5740-yarn-native-services.02.patch > Add a new field in Slider status output - lifetime (remaining) > -- > > Key: YARN-5740 > URL: https://issues.apache.org/jira/browse/YARN-5740 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Gour Saha >Assignee: Jian He > Fix For: yarn-native-services > > Attachments: YARN-5740-yarn-native-services.01.patch, > YARN-5740-yarn-native-services.02.patch > > > With YARN-5735, REST service is now setting lifetime to application during > submission (YARN-4205 exposed application lifetime support). Now Slider > status needs to expose this field so that the REST service can return it in > its GET response. Note, the lifetime value that GET response intends to > return is the remaining lifetime of the application. > There is an ongoing discussion in YARN-4206, that the lifetime value returned > in Application Report will be the remaining lifetime (at the time of > request). So until it is finalized, the lifetime value might go through > different connotations. But as long as we have the lifetime field in the > status output, it will be a good start. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5740) Add a new field in Slider status output - lifetime (remaining)
[ https://issues.apache.org/jira/browse/YARN-5740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-5740: -- Attachment: YARN-5740-yarn-native-services.02.patch > Add a new field in Slider status output - lifetime (remaining) > -- > > Key: YARN-5740 > URL: https://issues.apache.org/jira/browse/YARN-5740 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Gour Saha >Assignee: Jian He > Fix For: yarn-native-services > > Attachments: YARN-5740-yarn-native-services.01.patch, > YARN-5740-yarn-native-services.02.patch > > > With YARN-5735, REST service is now setting lifetime to application during > submission (YARN-4205 exposed application lifetime support). Now Slider > status needs to expose this field so that the REST service can return it in > its GET response. Note, the lifetime value that GET response intends to > return is the remaining lifetime of the application. > There is an ongoing discussion in YARN-4206, that the lifetime value returned > in Application Report will be the remaining lifetime (at the time of > request). So until it is finalized, the lifetime value might go through > different connotations. But as long as we have the lifetime field in the > status output, it will be a good start. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5967) Fix slider core module findbugs warnings
[ https://issues.apache.org/jira/browse/YARN-5967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-5967: -- Attachment: YARN-5967-yarn-native-services.06.patch > Fix slider core module findbugs warnings > - > > Key: YARN-5967 > URL: https://issues.apache.org/jira/browse/YARN-5967 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-5967-yarn-native-services.01.patch, > YARN-5967-yarn-native-services.02.patch, > YARN-5967-yarn-native-services.02.patch, > YARN-5967-yarn-native-services.03.patch, > YARN-5967-yarn-native-services.04.patch, > YARN-5967-yarn-native-services.04.patch, > YARN-5967-yarn-native-services.05.patch, > YARN-5967-yarn-native-services.06.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5967) Fix slider core module findbugs warnings
[ https://issues.apache.org/jira/browse/YARN-5967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-5967: -- Attachment: YARN-5967-yarn-native-services.05.patch > Fix slider core module findbugs warnings > - > > Key: YARN-5967 > URL: https://issues.apache.org/jira/browse/YARN-5967 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-5967-yarn-native-services.01.patch, > YARN-5967-yarn-native-services.02.patch, > YARN-5967-yarn-native-services.02.patch, > YARN-5967-yarn-native-services.03.patch, > YARN-5967-yarn-native-services.04.patch, > YARN-5967-yarn-native-services.04.patch, > YARN-5967-yarn-native-services.05.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5967) Fix slider core module findbugs warnings
[ https://issues.apache.org/jira/browse/YARN-5967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-5967: -- Attachment: YARN-5967-yarn-native-services.04.patch > Fix slider core module findbugs warnings > - > > Key: YARN-5967 > URL: https://issues.apache.org/jira/browse/YARN-5967 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-5967-yarn-native-services.01.patch, > YARN-5967-yarn-native-services.02.patch, > YARN-5967-yarn-native-services.02.patch, > YARN-5967-yarn-native-services.03.patch, > YARN-5967-yarn-native-services.04.patch, > YARN-5967-yarn-native-services.04.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5967) Fix slider core module findbugs warnings
[ https://issues.apache.org/jira/browse/YARN-5967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-5967: -- Attachment: YARN-5967-yarn-native-services.04.patch > Fix slider core module findbugs warnings > - > > Key: YARN-5967 > URL: https://issues.apache.org/jira/browse/YARN-5967 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-5967-yarn-native-services.01.patch, > YARN-5967-yarn-native-services.02.patch, > YARN-5967-yarn-native-services.02.patch, > YARN-5967-yarn-native-services.03.patch, > YARN-5967-yarn-native-services.04.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5967) Fix slider core module findbugs warnings
[ https://issues.apache.org/jira/browse/YARN-5967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-5967: -- Attachment: YARN-5967-yarn-native-services.03.patch > Fix slider core module findbugs warnings > - > > Key: YARN-5967 > URL: https://issues.apache.org/jira/browse/YARN-5967 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-5967-yarn-native-services.01.patch, > YARN-5967-yarn-native-services.02.patch, > YARN-5967-yarn-native-services.02.patch, > YARN-5967-yarn-native-services.03.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5996) Native services AM kills app on AMRMClientAsync onError call
[ https://issues.apache.org/jira/browse/YARN-5996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15750101#comment-15750101 ] Jian He commented on YARN-5996: --- I checked the possible exceptions and didn't find any error that must force the app to kill itself, so I think it should be fine. > Native services AM kills app on AMRMClientAsync onError call > > > Key: YARN-5996 > URL: https://issues.apache.org/jira/browse/YARN-5996 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Billie Rinaldi >Assignee: Billie Rinaldi > Attachments: YARN-5996-yarn-native-services.001.patch, > YARN-5996-yarn-native-services.002.patch > > > The AMRMClientAsync onError callback occurred due to an InterruptedException > in this case. The AM may need to kill itself once the client reaches this > state, but it should not kill the entire application. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5967) Fix slider core module findbugs warnings
[ https://issues.apache.org/jira/browse/YARN-5967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-5967: -- Attachment: YARN-5967-yarn-native-services.02.patch > Fix slider core module findbugs warnings > - > > Key: YARN-5967 > URL: https://issues.apache.org/jira/browse/YARN-5967 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-5967-yarn-native-services.01.patch, > YARN-5967-yarn-native-services.02.patch, > YARN-5967-yarn-native-services.02.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4844) Add getMemorySize/getVirtualCoresSize to o.a.h.y.api.records.Resource
[ https://issues.apache.org/jira/browse/YARN-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15749834#comment-15749834 ] Jian He commented on YARN-4844: --- looks good > Add getMemorySize/getVirtualCoresSize to o.a.h.y.api.records.Resource > - > > Key: YARN-4844 > URL: https://issues.apache.org/jira/browse/YARN-4844 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Blocker > Fix For: 2.8.0, 3.0.0-alpha1 > > Attachments: YARN-4844-branch-2.8.0016_.patch, > YARN-4844-branch-2.8.addendum.2.patch, YARN-4844-branch-2.addendum.1_.patch, > YARN-4844-branch-2.addendum.2.patch, YARN-4844.1.patch, YARN-4844.10.patch, > YARN-4844.11.patch, YARN-4844.12.patch, YARN-4844.13.patch, > YARN-4844.14.patch, YARN-4844.15.patch, YARN-4844.16.branch-2.patch, > YARN-4844.16.patch, YARN-4844.2.patch, YARN-4844.3.patch, YARN-4844.4.patch, > YARN-4844.5.patch, YARN-4844.6.patch, YARN-4844.7.patch, > YARN-4844.8.branch-2.patch, YARN-4844.8.patch, YARN-4844.9.branch, > YARN-4844.9.branch-2.patch, YARN-4844.addendum.3.patch, > YARN-4844.addendum.4.patch > > > We use int32 for memory now, if a cluster has 10k nodes, each node has 210G > memory, we will get a negative total cluster memory. > And another case that easier overflows int32 is: we added all pending > resources of running apps to cluster's total pending resources. If a > problematic app requires too much resources (let's say 1M+ containers, each > of them has 3G containers), int32 will be not enough. > Even if we can cap each app's pending request, we cannot handle the case that > there're many running apps, each of them has capped but still significant > numbers of pending resources. > So we may possibly need to add getMemoryLong/getVirtualCoreLong to > o.a.h.y.api.records.Resource. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
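Editor's note on the overflow scenario in the description above: 10,000 nodes at 210 GB each is 2,150,400,000 MB, which exceeds Integer.MAX_VALUE (2,147,483,647) and wraps negative. A small self-contained check of that arithmetic:
{code}
// Demonstrates why int32 memory accounting goes negative at cluster scale.
public class ResourceOverflowCheck {
  public static void main(String[] args) {
    int perNodeMb = 210 * 1024;              // 215,040 MB per node
    int nodes = 10_000;
    int totalAsInt = perNodeMb * nodes;      // overflows: negative result
    long totalAsLong = (long) perNodeMb * nodes;
    System.out.println("int total (MB)  = " + totalAsInt);   // -2144567296
    System.out.println("long total (MB) = " + totalAsLong);  //  2150400000
  }
}
{code}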
[jira] [Updated] (YARN-5967) Fix slider core module findbugs warnings
[ https://issues.apache.org/jira/browse/YARN-5967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-5967: -- Attachment: (was: YARN-5967-yarn-native-services.02.patch) > Fix slider core module findbugs warnings > - > > Key: YARN-5967 > URL: https://issues.apache.org/jira/browse/YARN-5967 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-5967-yarn-native-services.01.patch, > YARN-5967-yarn-native-services.02.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5967) Fix slider core module findbugs warnings
[ https://issues.apache.org/jira/browse/YARN-5967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-5967: -- Attachment: YARN-5967-yarn-native-services.02.patch > Fix slider core module findbugs warnings > - > > Key: YARN-5967 > URL: https://issues.apache.org/jira/browse/YARN-5967 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-5967-yarn-native-services.01.patch, > YARN-5967-yarn-native-services.02.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5931) Document timeout interfaces CLI and REST APIs
[ https://issues.apache.org/jira/browse/YARN-5931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15749265#comment-15749265 ] Jian He commented on YARN-5931: --- sounds good to me > Document timeout interfaces CLI and REST APIs > - > > Key: YARN-5931 > URL: https://issues.apache.org/jira/browse/YARN-5931 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S > Attachments: ResourceManagerRest.html, YARN-5931.0.patch, > YARN-5931.1.patch, YarnCommands.html > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4126) RM should not issue delegation tokens in unsecure mode
[ https://issues.apache.org/jira/browse/YARN-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15749171#comment-15749171 ] Jian He commented on YARN-4126: --- The rest are test refactorings which are good to have, IMO > RM should not issue delegation tokens in unsecure mode > -- > > Key: YARN-4126 > URL: https://issues.apache.org/jira/browse/YARN-4126 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Bibin A Chundatt > Fix For: 3.0.0-alpha1 > > Attachments: 0001-YARN-4126.patch, 0002-YARN-4126.patch, > 0003-YARN-4126.patch, 0004-YARN-4126.patch, 0005-YARN-4126.patch, > 0006-YARN-4126.patch > > > ClientRMService#getDelegationToken is currently returning a delegation token > in insecure mode. We should not return the token if it's in insecure mode. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
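For readers following along, the fix requested in the issue title amounts to refusing to issue a token when security is off. The sketch below only illustrates that check; it is not taken from the attached patches, and the real ClientRMService logic may differ. UserGroupInformation.isSecurityEnabled() is an existing Hadoop API; the surrounding class and method are assumptions.
{code}
// Illustrative guard only; not the actual ClientRMService#getDelegationToken code.
import java.io.IOException;
import org.apache.hadoop.security.UserGroupInformation;

public class DelegationTokenGuard {
  /** Reject delegation-token requests when Kerberos security is disabled. */
  public static void checkDelegationTokenAllowed() throws IOException {
    if (!UserGroupInformation.isSecurityEnabled()) {
      throw new IOException(
          "Delegation token can only be issued when security is enabled");
    }
  }
}
{code}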
[jira] [Updated] (YARN-5999) AMRMClientAsync will stop if any exceptions thrown on allocate call
[ https://issues.apache.org/jira/browse/YARN-5999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-5999: -- Attachment: YARN-5999.1.patch > AMRMClientAsync will stop if any exceptions thrown on allocate call > > > Key: YARN-5999 > URL: https://issues.apache.org/jira/browse/YARN-5999 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-5999.1.patch > > > Currently, for any exceptions thrown on the allocate call of AMRMClientAsync, > it will stop both heartbeat thread and the callback handler thread, leaving > AMRMClient in an unusable state. Caller has to instantiate a new AMRMClient. > IMO, the threads should keep on running, it should be up to the caller > whether to stop the AMRMClient or not. > {code} > try { > response = client.allocate(progress); > } catch (ApplicationAttemptNotFoundException e) { > handler.onShutdownRequest(); > LOG.info("Shutdown requested. Stopping callback."); > return; > } catch (Throwable ex) { > LOG.error("Exception on heartbeat", ex); > savedException = ex; > // interrupt handler thread in case it waiting on the queue > handlerThread.interrupt(); > return; > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
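To make the proposal concrete, one way to keep the threads running is to route the failure through the existing CallbackHandler#onError and let the heartbeat loop continue. The fragment below is a hedged sketch of that idea applied to the try/catch quoted in the description; it assumes the surrounding heartbeat while-loop and is not necessarily what YARN-5999.1.patch does.
{code}
// Sketch: report the failure instead of tearing down both threads.
try {
  response = client.allocate(progress);
} catch (ApplicationAttemptNotFoundException e) {
  handler.onShutdownRequest();
  LOG.info("Shutdown requested. Stopping callback.");
  return;
} catch (Throwable ex) {
  LOG.error("Exception on heartbeat", ex);
  // Let the application decide whether to stop the client; the heartbeat
  // thread simply moves on to its next loop iteration.
  handler.onError(ex);
  continue;
}
{code}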
[jira] [Commented] (YARN-4126) RM should not issue delegation tokens in unsecure mode
[ https://issues.apache.org/jira/browse/YARN-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15747270#comment-15747270 ] Jian He commented on YARN-4126: --- I've committed a patch to revert this logic to return true if kerberos is not enabled. > RM should not issue delegation tokens in unsecure mode > -- > > Key: YARN-4126 > URL: https://issues.apache.org/jira/browse/YARN-4126 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Bibin A Chundatt > Fix For: 3.0.0-alpha1 > > Attachments: 0001-YARN-4126.patch, 0002-YARN-4126.patch, > 0003-YARN-4126.patch, 0004-YARN-4126.patch, 0005-YARN-4126.patch, > 0006-YARN-4126.patch > > > ClientRMService#getDelegationToken is currently returning a delegation token > in insecure mode. We should not return the token if it's in insecure mode. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4126) RM should not issue delegation tokens in unsecure mode
[ https://issues.apache.org/jira/browse/YARN-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15747232#comment-15747232 ] Jian He commented on YARN-4126: --- ok, let's revert it from trunk. > RM should not issue delegation tokens in unsecure mode > -- > > Key: YARN-4126 > URL: https://issues.apache.org/jira/browse/YARN-4126 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Bibin A Chundatt > Fix For: 3.0.0-alpha1 > > Attachments: 0001-YARN-4126.patch, 0002-YARN-4126.patch, > 0003-YARN-4126.patch, 0004-YARN-4126.patch, 0005-YARN-4126.patch, > 0006-YARN-4126.patch > > > ClientRMService#getDelegationToken is currently returning a delegation token > in insecure mode. We should not return the token if it's in insecure mode. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-5999) AMRMClientAsync will stop if any exceptions thrown on allocate call
[ https://issues.apache.org/jira/browse/YARN-5999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He reassigned YARN-5999: - Assignee: Jian He > AMRMClientAsync will stop if any exceptions thrown on allocate call > > > Key: YARN-5999 > URL: https://issues.apache.org/jira/browse/YARN-5999 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > > Currently, for any exceptions thrown on the allocate call of AMRMClientAsync, > it will stop both heartbeat thread and the callback handler thread, leaving > AMRMClient in an unusable state. Caller has to instantiate a new AMRMClient. > IMO, the threads should keep on running, it should be up to the caller > whether to stop the AMRMClient or not. > {code} > try { > response = client.allocate(progress); > } catch (ApplicationAttemptNotFoundException e) { > handler.onShutdownRequest(); > LOG.info("Shutdown requested. Stopping callback."); > return; > } catch (Throwable ex) { > LOG.error("Exception on heartbeat", ex); > savedException = ex; > // interrupt handler thread in case it waiting on the queue > handlerThread.interrupt(); > return; > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-5999) AMRMClientAsync will stop if any exceptions thrown on allocate call
Jian He created YARN-5999: - Summary: AMRMClientAsync will stop if any exceptions thrown on allocate call Key: YARN-5999 URL: https://issues.apache.org/jira/browse/YARN-5999 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Currently, for any exceptions thrown on the allocate call of AMRMClientAsync, it will stop both heartbeat thread and the callback handler thread, leaving AMRMClient in an unusable state. Caller has to instantiate a new AMRMClient. IMO, the threads should keep on running, it should be up to the caller whether to stop the AMRMClient or not. {code} try { response = client.allocate(progress); } catch (ApplicationAttemptNotFoundException e) { handler.onShutdownRequest(); LOG.info("Shutdown requested. Stopping callback."); return; } catch (Throwable ex) { LOG.error("Exception on heartbeat", ex); savedException = ex; // interrupt handler thread in case it waiting on the queue handlerThread.interrupt(); return; } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
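Until the behaviour changes, an AM that wants to survive a transient allocate failure has to do the recovery itself from onError. The sketch below shows that caller-side workaround, assuming the 2.8-era AMRMClientAsync.CallbackHandler interface; recreateClient() is a hypothetical application-supplied helper, not a YARN API.
{code}
// Caller-side workaround sketch; recreateClient() is hypothetical, not a YARN API.
import java.util.List;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.ContainerStatus;
import org.apache.hadoop.yarn.api.records.NodeReport;
import org.apache.hadoop.yarn.client.api.async.AMRMClientAsync;

public class RestartingCallbackHandler implements AMRMClientAsync.CallbackHandler {

  @Override
  public void onError(Throwable e) {
    // With the current behaviour the heartbeat and handler threads are already
    // shutting down, so the only recovery is to discard this client and build a new one.
    recreateClient();
  }

  private void recreateClient() {
    // Application-specific: stop the old AMRMClientAsync, create and start a new
    // one with this handler, and re-submit any outstanding container requests.
  }

  // Remaining callbacks left as no-ops for brevity.
  @Override public void onContainersCompleted(List<ContainerStatus> statuses) { }
  @Override public void onContainersAllocated(List<Container> containers) { }
  @Override public void onShutdownRequest() { }
  @Override public void onNodesUpdated(List<NodeReport> updatedNodes) { }
  @Override public float getProgress() { return 0.0f; }
}
{code}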