[jira] [Commented] (YARN-7450) ATS Client should retry on intermittent Kerberos issues.
[ https://issues.apache.org/jira/browse/YARN-7450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16383031#comment-16383031 ]

SammiChen commented on YARN-7450:

Hi [~raviprak], is this still targeted for 2.9.1? If not, can we push it out to the next release, 2.9.2?

> ATS Client should retry on intermittent Kerberos issues.
>
>                 Key: YARN-7450
>                 URL: https://issues.apache.org/jira/browse/YARN-7450
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: ATSv2
>    Affects Versions: 2.7.3
>         Environment: Hadoop-2.7.3
>            Reporter: Ravi Prakash
>            Priority: Major
>
> We saw a stack trace (posted in the first comment) in the ResourceManager
> logs for the TimelineClientImpl not being able to relogin from keytab.
> I'm guessing there was an intermittent issue that failed the kerberos relogin
> from keytab. However, I'm assuming this was *not* retried because I only saw
> one instance of this stack trace. I propose that this operation should have
> been retried.
> It seems, this caused events at the ResourceManager to queue up and
> eventually stop responding to even basic {{yarn application -list}} commands.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7450) ATS Client should retry on intermittent Kerberos issues.
[ https://issues.apache.org/jira/browse/YARN-7450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16249816#comment-16249816 ]

Ravi Prakash commented on YARN-7450:

This is more complicated than I thought. [UserGroupInformation.reloginFromKeytab()|https://github.com/apache/hadoop/blob/975a57a6886e81e412bea35bf597beccc807a66f/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L1321] *clears* away the existing credentials, so a temporary Kerberos failure, which might have been resolved by retrying the login after some time, surfaces immediately.

Hi [~daryn]! Am I reading this right? Is it practical to fix this behavior?

> ATS Client should retry on intermittent Kerberos issues.
>
>                 Key: YARN-7450
>                 URL: https://issues.apache.org/jira/browse/YARN-7450
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: ATSv2
>    Affects Versions: 2.7.3
>         Environment: Hadoop-2.7.3
>            Reporter: Ravi Prakash
>
> We saw a stack trace (posted in the first comment) in the ResourceManager
> logs for the TimelineClientImpl not being able to relogin from keytab.
> I'm guessing there was an intermittent network issue that failed the kerberos
> relogin from keytab. However, I'm assuming this was *not* retried because I
> only saw one instance of this stack trace. I propose that this operation
> should have been retried.
> It seems, this caused events at the ResourceManager to queue up and
> eventually stop responding to even basic {{yarn application -list}} commands.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
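The retry proposed in the issue description could look roughly like the sketch below. This is only an illustration under stated assumptions, not the Hadoop implementation or a patch for this issue: `RetryingLogin`, `LoginAction`, and `runWithRetries` are hypothetical names, and the login operation is passed in as a callback rather than calling any real `UserGroupInformation` method. It also does not address the concern raised in the comment above, namely that a relogin may clear the existing credentials before the new login succeeds.

```java
import java.io.IOException;

public class RetryingLogin {

    /** Hypothetical stand-in for an operation such as a relogin from keytab. */
    public interface LoginAction {
        void run() throws IOException;
    }

    /**
     * Runs the action up to maxAttempts times, sleeping between failed
     * attempts and doubling the sleep each time. Returns the number of the
     * attempt that succeeded, or rethrows the last IOException if every
     * attempt failed.
     */
    public static int runWithRetries(LoginAction action, int maxAttempts, long initialBackoffMs)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long backoffMs = initialBackoffMs;
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                action.run();
                return attempt;
            } catch (IOException e) {
                last = e;
                if (attempt < maxAttempts) {
                    // Wait before the next attempt; an intermittent failure may clear up.
                    Thread.sleep(backoffMs);
                    backoffMs *= 2;
                }
            }
        }
        throw last;
    }
}
```

A caller would wrap whatever performs the relogin in a `LoginAction` and pick an attempt limit and backoff that fit how long the dispatcher thread can afford to block. Bounding the attempts matters here, since the description reports that a blocked publisher caused events to queue up at the ResourceManager.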
[jira] [Commented] (YARN-7450) ATS Client should retry on intermittent Kerberos issues.
[ https://issues.apache.org/jira/browse/YARN-7450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16240899#comment-16240899 ]

Ravi Prakash commented on YARN-7450:

{code}
2017-10-29 02:30:30,260 ERROR org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher: Error when publishing entity [YARN_APPLICATION,application_1507181091525_3046]
com.sun.jersey.api.client.ClientHandlerException: java.io.IOException: Login failure for @ from keytab
        at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149)
        at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineJerseyRetryFilter$1.run(TimelineClientImpl.java:235)
        at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:184)
        at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineJerseyRetryFilter.handle(TimelineClientImpl.java:246)
        at com.sun.jersey.api.client.Client.handle(Client.java:648)
        at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670)
        at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
        at com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563)
        at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingObject(TimelineClientImpl.java:483)
        at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$1.run(TimelineClientImpl.java:332)
        at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$1.run(TimelineClientImpl.java:329)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1719)
        at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPosting(TimelineClientImpl.java:329)
        at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:314)
        at org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher.putEntity(SystemMetricsPublisher.java:452)
        at org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher.publishApplicationCreatedEvent(SystemMetricsPublisher.java:265)
        at org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher.handleSystemMetricsEvent(SystemMetricsPublisher.java:220)
        at org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher$ForwardingEventHandler.handle(SystemMetricsPublisher.java:469)
        at org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher$ForwardingEventHandler.handle(SystemMetricsPublisher.java:464)
        at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
        at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Login failure for @ from keytab
        at org.apache.hadoop.security.UserGroupInformation.reloginFromKeytab(UserGroupInformation.java:1109)
        at org.apache.hadoop.security.UserGroupInformation.checkTGTAndReloginFromKeytab(UserGroupInformation.java:1042)
        at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineURLConnectionFactory.getHttpURLConnection(TimelineClientImpl.java:500)
        at com.sun.jersey.client.urlconnection.URLConnectionClientHandler._invoke(URLConnectionClientHandler.java:159)
        at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:147)
        ... 23 more
Caused by: javax.security.auth.login.LoginException: Generic error (description in e-text) (60) - LOOKING_UP_CLIENT
        at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:804)
        at com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755)
        at javax.security.auth.login.LoginContext.access$000(LoginContext.java:195)
        at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682)
        at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680)
        at javax.security.auth.login.LoginContext.login(LoginCont