[jira] [Commented] (YARN-6153) keepContainer does not work when AM retry window is set
[ https://issues.apache.org/jira/browse/YARN-6153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15864678#comment-15864678 ] Jian He commented on YARN-6153: --- could you also add a test case ? > keepContainer does not work when AM retry window is set > --- > > Key: YARN-6153 > URL: https://issues.apache.org/jira/browse/YARN-6153 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: kyungwan nam > Attachments: YARN-6153.001.patch > > > yarn.resourcemanager.am.max-attempts has been configured to 2 in my cluster. > I submitted a YARN application (slider app) that keepContainers=true, > attemptFailuresValidityInterval=30. > it did work properly when AM was failed firstly. > all containers launched by previous AM were resynced with new AM (attempt2) > without killing containers. > after 10 minutes, I thought AM failure count was reset by > attemptFailuresValidityInterval (5 minutes). > but, all containers were killed when AM was failed secondly. (new AM attempt3 > was launched properly) -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
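For context, the scenario described above corresponds roughly to the following submission-context settings. This is a minimal illustrative sketch, not taken from the attached patch, and it assumes the 5-minute validity window and the max-attempts value mentioned in the description:
{code}
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.util.Records;

public class KeepContainersSubmissionSketch {
  public static ApplicationSubmissionContext buildContext() {
    ApplicationSubmissionContext ctx = Records.newRecord(ApplicationSubmissionContext.class);
    // "keepContainers=true": running containers survive an AM failure and are
    // resynced with the next attempt instead of being killed.
    ctx.setKeepContainersAcrossApplicationAttempts(true);
    // Failures older than this window (ms) should no longer count towards max attempts.
    ctx.setAttemptFailuresValidityInterval(5 * 60 * 1000L);
    // The cluster-wide yarn.resourcemanager.am.max-attempts is 2 in the reporter's setup.
    ctx.setMaxAppAttempts(2);
    return ctx;
  }
}
{code}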
[jira] [Commented] (YARN-6153) keepContainer does not work when AM retry window is set
[ https://issues.apache.org/jira/browse/YARN-6153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15864677#comment-15864677 ] Jian He commented on YARN-6153: --- [~kyungwan nam], thanks for the patch. Minor suggestion to the code: In RMAppImpl, we also have below code to detect whether a failure should be counted towards the max-retry. I think we can move the logic of checking the validity interval inside shouldCountTowardsMaxAttemptRetry itself, so that this method could be used by both RMAppImpl and RMAttemptImpl {code} if (attempt.shouldCountTowardsMaxAttemptRetry()) { if (this.attemptFailuresValidityInterval <= 0 || (attempt.getFinishTime() > endTime - this.attemptFailuresValidityInterval)) { completedAttempts++; } } {code} > keepContainer does not work when AM retry window is set > --- > > Key: YARN-6153 > URL: https://issues.apache.org/jira/browse/YARN-6153 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: kyungwan nam > Attachments: YARN-6153.001.patch > > > yarn.resourcemanager.am.max-attempts has been configured to 2 in my cluster. > I submitted a YARN application (slider app) that keepContainers=true, > attemptFailuresValidityInterval=30. > it did work properly when AM was failed firstly. > all containers launched by previous AM were resynced with new AM (attempt2) > without killing containers. > after 10 minutes, I thought AM failure count was reset by > attemptFailuresValidityInterval (5 minutes). > but, all containers were killed when AM was failed secondly. (new AM attempt3 > was launched properly) -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
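A rough sketch of the suggested refactoring, folding the validity-window check into the per-attempt method so that both RMAppImpl and RMAppAttemptImpl can reuse it. This is hypothetical: the helper is written as a static function with explicit parameters, the countsByExitStatus flag stands in for the existing exit-status-based checks, and the actual patch may structure it differently.
{code}
class AttemptRetrySketch {
  // Combined check: an attempt counts towards max attempts only if its exit status
  // says it should AND it finished inside the failure-validity window.
  static boolean shouldCountTowardsMaxAttemptRetry(boolean countsByExitStatus,
      long attemptFinishTime, long latestFinishTime, long validityInterval) {
    if (!countsByExitStatus) {
      return false; // e.g. preemption or node failure: never counted
    }
    // A failure that finished outside the validity window is not counted.
    return validityInterval <= 0
        || attemptFinishTime > latestFinishTime - validityInterval;
  }
}
{code}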
[jira] [Commented] (YARN-6013) ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when RPC privacy is enabled
[ https://issues.apache.org/jira/browse/YARN-6013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15854841#comment-15854841 ] Jian He commented on YARN-6013: --- [~Steven Rand], the server log you provided does not have any exceptions - it's for a different time range. Are you able to get the corresponding server log when the exception happens ? I also converted this to a hadoop common jira > ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when > RPC privacy is enabled > -- > > Key: YARN-6013 > URL: https://issues.apache.org/jira/browse/YARN-6013 > Project: Hadoop YARN > Issue Type: Bug > Components: client, yarn >Affects Versions: 2.8.0 >Reporter: Steven Rand >Priority: Critical > Attachments: YARN-6013-branch-2.8.0.002.patch, yarn-rm-log.txt > > > When privacy is enabled for RPC (hadoop.rpc.protection = privacy), > {{ApplicationMasterProtocolPBClientImpl.allocate}} sometimes (but not always) > fails with an EOFException. I've reproduced this with Spark 2.0.2 built > against latest branch-2.8 and with a simple distcp job on latest branch-2.8. > Steps to reproduce using distcp: > 1. Set hadoop.rpc.protection equal to privacy > 2. Write data to HDFS. I did this with Spark as follows: > {code} > sc.parallelize(1 to (5*1024*1024)).map(k => Seq(k, > org.apache.commons.lang.RandomStringUtils.random(1024, > "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWxyZ0123456789")).mkString("|")).toDF().repartition(100).write.parquet("hdfs:///tmp/testData") > {code} > 3. Attempt to distcp that data to another location in HDFS. For example: > {code} > hadoop distcp -Dmapreduce.framework.name=yarn hdfs:///tmp/testData > hdfs:///tmp/testDataCopy > {code} > I observed this error in the ApplicationMaster's syslog: > {code} > 2016-12-19 19:13:50,097 INFO [eventHandlingThread] > org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Event Writer > setup for JobId: job_1482189777425_0004, File: > hdfs://:8020/tmp/hadoop-yarn/staging//.staging/job_1482189777425_0004/job_1482189777425_0004_1.jhist > 2016-12-19 19:13:51,004 INFO [RMCommunicator Allocator] > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before > Scheduling: PendingReds:0 ScheduledMaps:4 ScheduledReds:0 AssignedMaps:0 > AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:0 ContRel:0 > HostLocal:0 RackLocal:0 > 2016-12-19 19:13:51,031 INFO [RMCommunicator Allocator] > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() > for application_1482189777425_0004: ask=1 release= 0 newContainers=0 > finishedContainers=0 resourcelimit= knownNMs=3 > 2016-12-19 19:13:52,043 INFO [RMCommunicator Allocator] > org.apache.hadoop.io.retry.RetryInvocationHandler: Exception while invoking > ApplicationMasterProtocolPBClientImpl.allocate over null. Retrying after > sleeping for 3ms. 
> java.io.EOFException: End of File Exception between local host is: > "/"; destination host is: "":8030; > : java.io.EOFException; For more details see: > http://wiki.apache.org/hadoop/EOFException > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:422) > at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:801) > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:765) > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1486) > at org.apache.hadoop.ipc.Client.call(Client.java:1428) > at org.apache.hadoop.ipc.Client.call(Client.java:1338) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116) > at com.sun.proxy.$Proxy80.allocate(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:398) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.jav
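For reference, step 1 of the reproduction corresponds to the standard core-site.xml setting; expressed programmatically it is roughly the following (illustrative only):
{code}
import org.apache.hadoop.conf.Configuration;

public class RpcPrivacySketch {
  public static Configuration withRpcPrivacy() {
    Configuration conf = new Configuration();
    // Same effect as putting hadoop.rpc.protection=privacy in core-site.xml:
    // SASL auth-conf, i.e. authentication plus integrity plus encryption on Hadoop RPC.
    conf.set("hadoop.rpc.protection", "privacy");
    return conf;
  }
}
{code}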
[jira] [Commented] (YARN-6145) Improve log message on fail over
[ https://issues.apache.org/jira/browse/YARN-6145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15852228#comment-15852228 ] Jian He commented on YARN-6145: --- sample log for RequestHedgingRMFailoverProxyProvider after the patch {code} 17/02/03 22:34:26 INFO impl.TimelineClientImpl: Timeline service address: http://host:8188/ws/v1/timeline/ 17/02/03 22:34:26 INFO client.RequestHedgingRMFailoverProxyProvider: Created wrapped proxy for [rm1, rm2] 17/02/03 22:34:26 INFO client.AHSProxy: Connecting to Application History server at host/172.22.126.225:10200 17/02/03 22:34:27 INFO client.RequestHedgingRMFailoverProxyProvider: Looking for the active RM in [rm1, rm2]... 17/02/03 22:34:27 INFO client.RequestHedgingRMFailoverProxyProvider: Found active RM on [rm2] 17/02/03 22:34:28 INFO mapreduce.JobSubmitter: number of splits:1 17/02/03 22:34:29 INFO impl.YarnClientImpl: Submitted application application_1486160572621_0002 {code} > Improve log message on fail over > > > Key: YARN-6145 > URL: https://issues.apache.org/jira/browse/YARN-6145 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-6145.1.patch > > > On failover, a series of exception stack shown in the log, which is harmless, > but confusing to user. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6145) Improve log message on fail over
[ https://issues.apache.org/jira/browse/YARN-6145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-6145: -- Attachment: (was: YARN-6145.1.patch) > Improve log message on fail over > > > Key: YARN-6145 > URL: https://issues.apache.org/jira/browse/YARN-6145 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-6145.1.patch > > > On failover, a series of exception stack shown in the log, which is harmless, > but confusing to user. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6145) Improve log message on fail over
[ https://issues.apache.org/jira/browse/YARN-6145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-6145: -- Attachment: YARN-6145.1.patch > Improve log message on fail over > > > Key: YARN-6145 > URL: https://issues.apache.org/jira/browse/YARN-6145 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-6145.1.patch > > > On failover, a series of exception stack shown in the log, which is harmless, > but confusing to user. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6145) Improve log message on fail over
[ https://issues.apache.org/jira/browse/YARN-6145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15852207#comment-15852207 ] Jian He commented on YARN-6145: --- Sample log for ConfiguredRMFailoverProxyProvider after the patch: {code} 17/02/03 21:45:18 INFO impl.TimelineClientImpl: Timeline service address: http://host:8188/ws/v1/timeline/ 17/02/03 21:45:18 INFO client.AHSProxy: Connecting to Application History server at host/172.22.126.225:10200 17/02/03 21:45:19 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2 17/02/03 21:45:19 INFO retry.RetryInvocationHandler: java.net.ConnectException: Call From host/172.22.126.229 to host:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused while invoking ApplicationClientProtocolPBClientImpl.getNewApplication over rm2 after 1 failover attempts. Trying to failover after sleeping for 24348ms. 17/02/03 21:45:44 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm1 17/02/03 21:45:44 INFO retry.RetryInvocationHandler: java.net.ConnectException: Call From host/172.22.126.229 to host:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused while invoking ApplicationClientProtocolPBClientImpl.getNewApplication over rm1 after 2 failover attempts. Trying to failover after sleeping for 20126ms. 17/02/03 21:46:04 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2 17/02/03 21:46:04 INFO retry.RetryInvocationHandler: java.net.ConnectException: Call From host/172.22.126.229 to host:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused while invoking ApplicationClientProtocolPBClientImpl.getNewApplication over rm2 after 3 failover attempts. Trying to failover after sleeping for 44768ms. 17/02/03 21:46:48 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm1 17/02/03 21:46:48 INFO retry.RetryInvocationHandler: java.net.ConnectException: Call From host/172.22.126.229 to host:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused while invoking ApplicationClientProtocolPBClientImpl.getNewApplication over rm1 after 4 failover attempts. Trying to failover after sleeping for 20670ms. 17/02/03 21:47:09 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2 17/02/03 21:47:09 INFO retry.RetryInvocationHandler: java.net.ConnectException: Call From host/172.22.126.229 to host:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused while invoking ApplicationClientProtocolPBClientImpl.getNewApplication over rm2 after 5 failover attempts. Trying to failover after sleeping for 42523ms. 17/02/03 21:47:52 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm1 17/02/03 21:47:52 INFO retry.RetryInvocationHandler: java.net.ConnectException: Call From host/172.22.126.229 to host:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused while invoking ApplicationClientProtocolPBClientImpl.getNewApplication over rm1 after 6 failover attempts. 
Trying to failover after sleeping for 16803ms. {code} > Improve log message on fail over > > > Key: YARN-6145 > URL: https://issues.apache.org/jira/browse/YARN-6145 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-6145.1.patch > > > On failover, a series of exception stack shown in the log, which is harmless, > but confusing to user. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6145) Improve log message on fail over
[ https://issues.apache.org/jira/browse/YARN-6145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-6145: -- Attachment: YARN-6145.1.patch A couple of messages are changed to debug level, as the caller will eventually log when the retry ends. Added a few logs in RequestHedgingRMFailoverProxyProvider. RetryInvocationHandler is also changed to not print the stack trace while retrying. > Improve log message on fail over > > > Key: YARN-6145 > URL: https://issues.apache.org/jira/browse/YARN-6145 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-6145.1.patch > > > On failover, a series of exception stack shown in the log, which is harmless, > but confusing to user. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
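A minimal sketch of the logging behavior described above, with hypothetical helper and parameter names and SLF4J used for brevity; the actual RetryInvocationHandler change may differ:
{code}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class RetryLoggingSketch {
  private static final Logger LOG = LoggerFactory.getLogger(RetryLoggingSketch.class);

  // While another retry/failover is still pending, log a one-line message without the
  // stack trace; only when retries are exhausted is the full exception logged.
  static void logFailure(Exception e, String method, String proxy, long delayMs,
      boolean willRetry) {
    if (willRetry) {
      LOG.info("Exception while invoking {} over {}. Retrying after sleeping for {}ms: {}",
          method, proxy, delayMs, e.toString());
    } else {
      LOG.warn("Exception while invoking {} over {}. No more retries.", method, proxy, e);
    }
  }
}
{code}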
[jira] [Created] (YARN-6145) Improve log message on fail over
Jian He created YARN-6145: - Summary: Improve log message on fail over Key: YARN-6145 URL: https://issues.apache.org/jira/browse/YARN-6145 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He On failover, a series of exception stack traces is shown in the log, which is harmless but confusing to users. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-3053) [Security] Review and implement authentication in ATS v.2
[ https://issues.apache.org/jira/browse/YARN-3053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840442#comment-15840442 ] Jian He commented on YARN-3053: --- Yeah, makes sense to me. Let's move the discussion to YARN-6121 for off-apps. I think we have general consensus here for managed AMs. Would you like to update the design doc and may be open sub-jiras and start the development ? > [Security] Review and implement authentication in ATS v.2 > - > > Key: YARN-3053 > URL: https://issues.apache.org/jira/browse/YARN-3053 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Varun Saxena > Labels: YARN-5355, yarn-5355-merge-blocker > Attachments: ATSv2Authentication(draft).pdf > > > Per design in YARN-2928, we want to evaluate and review the system for > security, and ensure proper security in the system. > This includes proper authentication, token management, access control, and > any other relevant security aspects. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-3053) [Security] Review and implement authentication in ATS v.2
[ https://issues.apache.org/jira/browse/YARN-3053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15838885#comment-15838885 ] Jian He edited comment on YARN-3053 at 1/26/17 12:27 AM: - bq. Because for such clients we will not have a mechanism to pass the token when collector/NM restarts. sorry, didn't get that. For such apps, won't the client still need to pass the new address to the AMs in some way? IIUC, it is no different from passing the token. Also, I'm not sure the original collector design had accounted for unmanaged AM in the general case. (I think the collector is not even launched currently for unmanaged AM). A lot of other details need to be fleshed out. was (Author: jianhe): bq. Because for such clients we will not have a mechanism to pass the token when collector/NM restarts. sorry, didn't get that. For such apps, won't the client still need to pass the new address to the AMs in app's own way. IIUC, it has no difference with passing the token. Also, I'm not sure the original collector design had accounted for unmanaged AM in general case. (I think the collector is not even launched currently for unmanaged AM). A lot other details need to be freshed out. > [Security] Review and implement authentication in ATS v.2 > - > > Key: YARN-3053 > URL: https://issues.apache.org/jira/browse/YARN-3053 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Varun Saxena > Labels: YARN-5355, yarn-5355-merge-blocker > Attachments: ATSv2Authentication(draft).pdf > > > Per design in YARN-2928, we want to evaluate and review the system for > security, and ensure proper security in the system. > This includes proper authentication, token management, access control, and > any other relevant security aspects. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-3053) [Security] Review and implement authentication in ATS v.2
[ https://issues.apache.org/jira/browse/YARN-3053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15838885#comment-15838885 ] Jian He commented on YARN-3053: --- bq. Because for such clients we will not have a mechanism to pass the token when collector/NM restarts. sorry, didn't get that. For such apps, won't the client still need to pass the new address to the AMs in app's own way. IIUC, it has no difference with passing the token. Also, I'm not sure the original collector design had accounted for unmanaged AM in general case. (I think the collector is not even launched currently for unmanaged AM). A lot other details need to be freshed out. > [Security] Review and implement authentication in ATS v.2 > - > > Key: YARN-3053 > URL: https://issues.apache.org/jira/browse/YARN-3053 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Varun Saxena > Labels: YARN-5355, yarn-5355-merge-blocker > Attachments: ATSv2Authentication(draft).pdf > > > Per design in YARN-2928, we want to evaluate and review the system for > security, and ensure proper security in the system. > This includes proper authentication, token management, access control, and > any other relevant security aspects. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-3053) [Security] Review and implement authentication in ATS v.2
[ https://issues.apache.org/jira/browse/YARN-3053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15836412#comment-15836412 ] Jian He commented on YARN-3053: --- bq. Any need to generate the token if security is not enabled? Ah, right. we still need the field for collector address in insecure mode. > [Security] Review and implement authentication in ATS v.2 > - > > Key: YARN-3053 > URL: https://issues.apache.org/jira/browse/YARN-3053 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Varun Saxena > Labels: YARN-5355, yarn-5355-merge-blocker > Attachments: ATSv2Authentication(draft).pdf > > > Per design in YARN-2928, we want to evaluate and review the system for > security, and ensure proper security in the system. > This includes proper authentication, token management, access control, and > any other relevant security aspects. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5910) Support for multi-cluster delegation tokens
[ https://issues.apache.org/jira/browse/YARN-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15832738#comment-15832738 ] Jian He commented on YARN-5910: --- testFinishedAppRemovalAfterRMRestart passed locally for me.. > Support for multi-cluster delegation tokens > --- > > Key: YARN-5910 > URL: https://issues.apache.org/jira/browse/YARN-5910 > Project: Hadoop YARN > Issue Type: New Feature > Components: security >Reporter: Clay B. >Assignee: Jian He >Priority: Minor > Attachments: YARN-5910.01.patch, YARN-5910.2.patch, > YARN-5910.3.patch, YARN-5910.4.patch, YARN-5910.5.patch, YARN-5910.6.patch, > YARN-5910.7.patch > > > As an administrator running many secure (kerberized) clusters, some which > have peer clusters managed by other teams, I am looking for a way to run jobs > which may require services running on other clusters. Particular cases where > this rears itself are running something as core as a distcp between two > kerberized clusters (e.g. {{hadoop --config /home/user292/conf/ distcp > hdfs://LOCALCLUSTER/user/user292/test.out > hdfs://REMOTECLUSTER/user/user292/test.out.result}}). > Thanks to YARN-3021, once can run for a while but if the delegation token for > the remote cluster needs renewal the job will fail[1]. One can pre-configure > their {{hdfs-site.xml}} loaded by the YARN RM to know of all possible HDFSes > available but that requires coordination that is not always feasible, > especially as a cluster's peers grow into the tens of clusters or across > management teams. Ideally, one could have core systems configured this way > but jobs could also specify their own handling of tokens and management when > needed? > [1]: Example stack trace when the RM is unaware of a remote service: > > {code} > 2016-03-23 14:59:50,528 INFO > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > application_1458441356031_3317 found existing hdfs token Kind: > HDFS_DELEGATION_TOKEN, Service: ha-hdfs:REMOTECLUSTER, Ident: > (HDFS_DELEGATION_TOKEN token > 10927 for user292) > 2016-03-23 14:59:50,557 WARN > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > Unable to add the application to the delegation token renewer. > java.io.IOException: Failed to renew token: Kind: HDFS_DELEGATION_TOKEN, > Service: ha-hdfs:REMOTECLUSTER, Ident: (HDFS_DELEGATION_TOKEN token 10927 for > user292) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:427) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$700(DelegationTokenRenewer.java:78) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:781) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:762) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:744) > Caused by: java.io.IOException: Unable to map logical nameservice URI > 'hdfs://REMOTECLUSTER' to a NameNode. Local configuration does not have a > failover proxy provider configured. 
> at org.apache.hadoop.hdfs.DFSClient$Renewer.getNNProxy(DFSClient.java:1164) > at org.apache.hadoop.hdfs.DFSClient$Renewer.renew(DFSClient.java:1128) > at org.apache.hadoop.security.token.Token.renew(Token.java:377) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:516) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:513) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.renewToken(DelegationTokenRenewer.java:511) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:425) > ... 6 more > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
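To make the requested behavior concrete, below is a hedged sketch of what a submitting client could ship so the RM can renew tokens for a remote nameservice without that nameservice being preconfigured in the RM's own hdfs-site.xml. It assumes the mechanism discussed in this JIRA, namely a serialized Configuration of extra renewal settings attached to the application; the host names are made up, and the final API for attaching the bytes to the submission is not shown.
{code}
import java.io.IOException;
import java.nio.ByteBuffer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.DataOutputBuffer;

public class RemoteClusterTokensConfSketch {
  public static ByteBuffer build() throws IOException {
    // HDFS client settings the RM would need to resolve and renew tokens for the
    // remote "REMOTECLUSTER" nameservice (host names are made up for illustration).
    Configuration tokensConf = new Configuration(false);
    tokensConf.set("dfs.nameservices", "REMOTECLUSTER");
    tokensConf.set("dfs.ha.namenodes.REMOTECLUSTER", "nn1,nn2");
    tokensConf.set("dfs.namenode.rpc-address.REMOTECLUSTER.nn1", "remote-nn1.example.com:8020");
    tokensConf.set("dfs.namenode.rpc-address.REMOTECLUSTER.nn2", "remote-nn2.example.com:8020");
    tokensConf.set("dfs.client.failover.proxy.provider.REMOTECLUSTER",
        "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");

    // Configuration is Writable, so it can be serialized and attached to the
    // application submission (the per-app tokens conf discussed in this JIRA).
    DataOutputBuffer out = new DataOutputBuffer();
    tokensConf.write(out);
    return ByteBuffer.wrap(out.getData(), 0, out.getLength());
  }
}
{code}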
[jira] [Updated] (YARN-5910) Support for multi-cluster delegation tokens
[ https://issues.apache.org/jira/browse/YARN-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-5910: -- Attachment: YARN-5910.7.patch new patch addressed all comments > Support for multi-cluster delegation tokens > --- > > Key: YARN-5910 > URL: https://issues.apache.org/jira/browse/YARN-5910 > Project: Hadoop YARN > Issue Type: New Feature > Components: security >Reporter: Clay B. >Assignee: Jian He >Priority: Minor > Attachments: YARN-5910.01.patch, YARN-5910.2.patch, > YARN-5910.3.patch, YARN-5910.4.patch, YARN-5910.5.patch, YARN-5910.6.patch, > YARN-5910.7.patch > > > As an administrator running many secure (kerberized) clusters, some which > have peer clusters managed by other teams, I am looking for a way to run jobs > which may require services running on other clusters. Particular cases where > this rears itself are running something as core as a distcp between two > kerberized clusters (e.g. {{hadoop --config /home/user292/conf/ distcp > hdfs://LOCALCLUSTER/user/user292/test.out > hdfs://REMOTECLUSTER/user/user292/test.out.result}}). > Thanks to YARN-3021, once can run for a while but if the delegation token for > the remote cluster needs renewal the job will fail[1]. One can pre-configure > their {{hdfs-site.xml}} loaded by the YARN RM to know of all possible HDFSes > available but that requires coordination that is not always feasible, > especially as a cluster's peers grow into the tens of clusters or across > management teams. Ideally, one could have core systems configured this way > but jobs could also specify their own handling of tokens and management when > needed? > [1]: Example stack trace when the RM is unaware of a remote service: > > {code} > 2016-03-23 14:59:50,528 INFO > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > application_1458441356031_3317 found existing hdfs token Kind: > HDFS_DELEGATION_TOKEN, Service: ha-hdfs:REMOTECLUSTER, Ident: > (HDFS_DELEGATION_TOKEN token > 10927 for user292) > 2016-03-23 14:59:50,557 WARN > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > Unable to add the application to the delegation token renewer. > java.io.IOException: Failed to renew token: Kind: HDFS_DELEGATION_TOKEN, > Service: ha-hdfs:REMOTECLUSTER, Ident: (HDFS_DELEGATION_TOKEN token 10927 for > user292) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:427) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$700(DelegationTokenRenewer.java:78) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:781) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:762) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:744) > Caused by: java.io.IOException: Unable to map logical nameservice URI > 'hdfs://REMOTECLUSTER' to a NameNode. Local configuration does not have a > failover proxy provider configured. 
> at org.apache.hadoop.hdfs.DFSClient$Renewer.getNNProxy(DFSClient.java:1164) > at org.apache.hadoop.hdfs.DFSClient$Renewer.renew(DFSClient.java:1128) > at org.apache.hadoop.security.token.Token.renew(Token.java:377) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:516) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:513) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.renewToken(DelegationTokenRenewer.java:511) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:425) > ... 6 more > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5910) Support for multi-cluster delegation tokens
[ https://issues.apache.org/jira/browse/YARN-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15832277#comment-15832277 ] Jian He commented on YARN-5910: --- bq. It's confusing that the max size check is using capacity() but the error message uses position(). missed to change that.. bq. I'm curious on the reasoning for removing the assert for NEW state? Because I feel that's obvious and not needed.. bq. TestAppManager fails consistently for me with the patch applied and passes consistently without. Please investigate. It's because the am containerLaunchContext is null in the UT which failed with NPE in the new code "submissionContext.getAMContainerSpec().getTokensConf()". I think it's ok to assume am ContainerLaunchContext being not null? As I see other code does the same in this call path, like "submissionContext.getAMContainerSpec().getApplicationACLs()" in RMAppManager. > Support for multi-cluster delegation tokens > --- > > Key: YARN-5910 > URL: https://issues.apache.org/jira/browse/YARN-5910 > Project: Hadoop YARN > Issue Type: New Feature > Components: security >Reporter: Clay B. >Assignee: Jian He >Priority: Minor > Attachments: YARN-5910.01.patch, YARN-5910.2.patch, > YARN-5910.3.patch, YARN-5910.4.patch, YARN-5910.5.patch, YARN-5910.6.patch > > > As an administrator running many secure (kerberized) clusters, some which > have peer clusters managed by other teams, I am looking for a way to run jobs > which may require services running on other clusters. Particular cases where > this rears itself are running something as core as a distcp between two > kerberized clusters (e.g. {{hadoop --config /home/user292/conf/ distcp > hdfs://LOCALCLUSTER/user/user292/test.out > hdfs://REMOTECLUSTER/user/user292/test.out.result}}). > Thanks to YARN-3021, once can run for a while but if the delegation token for > the remote cluster needs renewal the job will fail[1]. One can pre-configure > their {{hdfs-site.xml}} loaded by the YARN RM to know of all possible HDFSes > available but that requires coordination that is not always feasible, > especially as a cluster's peers grow into the tens of clusters or across > management teams. Ideally, one could have core systems configured this way > but jobs could also specify their own handling of tokens and management when > needed? > [1]: Example stack trace when the RM is unaware of a remote service: > > {code} > 2016-03-23 14:59:50,528 INFO > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > application_1458441356031_3317 found existing hdfs token Kind: > HDFS_DELEGATION_TOKEN, Service: ha-hdfs:REMOTECLUSTER, Ident: > (HDFS_DELEGATION_TOKEN token > 10927 for user292) > 2016-03-23 14:59:50,557 WARN > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > Unable to add the application to the delegation token renewer. 
> java.io.IOException: Failed to renew token: Kind: HDFS_DELEGATION_TOKEN, > Service: ha-hdfs:REMOTECLUSTER, Ident: (HDFS_DELEGATION_TOKEN token 10927 for > user292) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:427) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$700(DelegationTokenRenewer.java:78) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:781) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:762) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:744) > Caused by: java.io.IOException: Unable to map logical nameservice URI > 'hdfs://REMOTECLUSTER' to a NameNode. Local configuration does not have a > failover proxy provider configured. > at org.apache.hadoop.hdfs.DFSClient$Renewer.getNNProxy(DFSClient.java:1164) > at org.apache.hadoop.hdfs.DFSClient$Renewer.renew(DFSClient.java:1128) > at org.apache.hadoop.security.token.Token.renew(Token.java:377) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:516) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:513) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at >
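On the capacity()/position() point, a small sketch of a size check that measures and reports the same quantity, and that skips a missing tokens conf, which also sidesteps the NPE discussed above; the class name and limit parameter are hypothetical:
{code}
import java.nio.ByteBuffer;

class TokensConfSizeCheckSketch {
  // Use one consistent measure (remaining(), the readable bytes) for both the check
  // and the error message; a null/absent tokens conf is simply skipped.
  static void check(ByteBuffer tokensConf, int maxBytes) {
    if (tokensConf == null) {
      return;
    }
    int size = tokensConf.remaining();
    if (size > maxBytes) {
      throw new IllegalArgumentException("tokens conf is " + size
          + " bytes, exceeding the configured limit of " + maxBytes + " bytes");
    }
  }
}
{code}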
[jira] [Updated] (YARN-5910) Support for multi-cluster delegation tokens
[ https://issues.apache.org/jira/browse/YARN-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-5910: -- Attachment: YARN-5910.6.patch > Support for multi-cluster delegation tokens > --- > > Key: YARN-5910 > URL: https://issues.apache.org/jira/browse/YARN-5910 > Project: Hadoop YARN > Issue Type: New Feature > Components: security >Reporter: Clay B. >Assignee: Jian He >Priority: Minor > Attachments: YARN-5910.01.patch, YARN-5910.2.patch, > YARN-5910.3.patch, YARN-5910.4.patch, YARN-5910.5.patch, YARN-5910.6.patch > > > As an administrator running many secure (kerberized) clusters, some which > have peer clusters managed by other teams, I am looking for a way to run jobs > which may require services running on other clusters. Particular cases where > this rears itself are running something as core as a distcp between two > kerberized clusters (e.g. {{hadoop --config /home/user292/conf/ distcp > hdfs://LOCALCLUSTER/user/user292/test.out > hdfs://REMOTECLUSTER/user/user292/test.out.result}}). > Thanks to YARN-3021, once can run for a while but if the delegation token for > the remote cluster needs renewal the job will fail[1]. One can pre-configure > their {{hdfs-site.xml}} loaded by the YARN RM to know of all possible HDFSes > available but that requires coordination that is not always feasible, > especially as a cluster's peers grow into the tens of clusters or across > management teams. Ideally, one could have core systems configured this way > but jobs could also specify their own handling of tokens and management when > needed? > [1]: Example stack trace when the RM is unaware of a remote service: > > {code} > 2016-03-23 14:59:50,528 INFO > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > application_1458441356031_3317 found existing hdfs token Kind: > HDFS_DELEGATION_TOKEN, Service: ha-hdfs:REMOTECLUSTER, Ident: > (HDFS_DELEGATION_TOKEN token > 10927 for user292) > 2016-03-23 14:59:50,557 WARN > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > Unable to add the application to the delegation token renewer. > java.io.IOException: Failed to renew token: Kind: HDFS_DELEGATION_TOKEN, > Service: ha-hdfs:REMOTECLUSTER, Ident: (HDFS_DELEGATION_TOKEN token 10927 for > user292) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:427) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$700(DelegationTokenRenewer.java:78) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:781) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:762) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:744) > Caused by: java.io.IOException: Unable to map logical nameservice URI > 'hdfs://REMOTECLUSTER' to a NameNode. Local configuration does not have a > failover proxy provider configured. 
> at org.apache.hadoop.hdfs.DFSClient$Renewer.getNNProxy(DFSClient.java:1164) > at org.apache.hadoop.hdfs.DFSClient$Renewer.renew(DFSClient.java:1128) > at org.apache.hadoop.security.token.Token.renew(Token.java:377) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:516) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:513) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.renewToken(DelegationTokenRenewer.java:511) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:425) > ... 6 more > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5910) Support for multi-cluster delegation tokens
[ https://issues.apache.org/jira/browse/YARN-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-5910: -- Attachment: (was: YARN-5910.6.patch) > Support for multi-cluster delegation tokens > --- > > Key: YARN-5910 > URL: https://issues.apache.org/jira/browse/YARN-5910 > Project: Hadoop YARN > Issue Type: New Feature > Components: security >Reporter: Clay B. >Assignee: Jian He >Priority: Minor > Attachments: YARN-5910.01.patch, YARN-5910.2.patch, > YARN-5910.3.patch, YARN-5910.4.patch, YARN-5910.5.patch, YARN-5910.6.patch > > > As an administrator running many secure (kerberized) clusters, some which > have peer clusters managed by other teams, I am looking for a way to run jobs > which may require services running on other clusters. Particular cases where > this rears itself are running something as core as a distcp between two > kerberized clusters (e.g. {{hadoop --config /home/user292/conf/ distcp > hdfs://LOCALCLUSTER/user/user292/test.out > hdfs://REMOTECLUSTER/user/user292/test.out.result}}). > Thanks to YARN-3021, once can run for a while but if the delegation token for > the remote cluster needs renewal the job will fail[1]. One can pre-configure > their {{hdfs-site.xml}} loaded by the YARN RM to know of all possible HDFSes > available but that requires coordination that is not always feasible, > especially as a cluster's peers grow into the tens of clusters or across > management teams. Ideally, one could have core systems configured this way > but jobs could also specify their own handling of tokens and management when > needed? > [1]: Example stack trace when the RM is unaware of a remote service: > > {code} > 2016-03-23 14:59:50,528 INFO > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > application_1458441356031_3317 found existing hdfs token Kind: > HDFS_DELEGATION_TOKEN, Service: ha-hdfs:REMOTECLUSTER, Ident: > (HDFS_DELEGATION_TOKEN token > 10927 for user292) > 2016-03-23 14:59:50,557 WARN > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > Unable to add the application to the delegation token renewer. > java.io.IOException: Failed to renew token: Kind: HDFS_DELEGATION_TOKEN, > Service: ha-hdfs:REMOTECLUSTER, Ident: (HDFS_DELEGATION_TOKEN token 10927 for > user292) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:427) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$700(DelegationTokenRenewer.java:78) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:781) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:762) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:744) > Caused by: java.io.IOException: Unable to map logical nameservice URI > 'hdfs://REMOTECLUSTER' to a NameNode. Local configuration does not have a > failover proxy provider configured. 
> at org.apache.hadoop.hdfs.DFSClient$Renewer.getNNProxy(DFSClient.java:1164) > at org.apache.hadoop.hdfs.DFSClient$Renewer.renew(DFSClient.java:1128) > at org.apache.hadoop.security.token.Token.renew(Token.java:377) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:516) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:513) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.renewToken(DelegationTokenRenewer.java:511) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:425) > ... 6 more > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5910) Support for multi-cluster delegation tokens
[ https://issues.apache.org/jira/browse/YARN-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15831123#comment-15831123 ] Jian He commented on YARN-5910: --- Thanks again for the reviews ! bq. I'd either move the regex example into the description itself done. bq. I could just specify one property with a gigantic payload good point.. thought the number of configs indirectly means the size, and was lazy at calculating the numbers.. missed this scenario.. I changed to check based on bytes. bq. I am wondering how users/admins are going to debug their settings for the new property good point.. it was there when I was debugging this feature.. I added the debug level logging in both YarnRunner and DelegationTokenRenewer uploaded a patch that addressed all comments. > Support for multi-cluster delegation tokens > --- > > Key: YARN-5910 > URL: https://issues.apache.org/jira/browse/YARN-5910 > Project: Hadoop YARN > Issue Type: New Feature > Components: security >Reporter: Clay B. >Assignee: Jian He >Priority: Minor > Attachments: YARN-5910.01.patch, YARN-5910.2.patch, > YARN-5910.3.patch, YARN-5910.4.patch, YARN-5910.5.patch, YARN-5910.6.patch > > > As an administrator running many secure (kerberized) clusters, some which > have peer clusters managed by other teams, I am looking for a way to run jobs > which may require services running on other clusters. Particular cases where > this rears itself are running something as core as a distcp between two > kerberized clusters (e.g. {{hadoop --config /home/user292/conf/ distcp > hdfs://LOCALCLUSTER/user/user292/test.out > hdfs://REMOTECLUSTER/user/user292/test.out.result}}). > Thanks to YARN-3021, once can run for a while but if the delegation token for > the remote cluster needs renewal the job will fail[1]. One can pre-configure > their {{hdfs-site.xml}} loaded by the YARN RM to know of all possible HDFSes > available but that requires coordination that is not always feasible, > especially as a cluster's peers grow into the tens of clusters or across > management teams. Ideally, one could have core systems configured this way > but jobs could also specify their own handling of tokens and management when > needed? > [1]: Example stack trace when the RM is unaware of a remote service: > > {code} > 2016-03-23 14:59:50,528 INFO > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > application_1458441356031_3317 found existing hdfs token Kind: > HDFS_DELEGATION_TOKEN, Service: ha-hdfs:REMOTECLUSTER, Ident: > (HDFS_DELEGATION_TOKEN token > 10927 for user292) > 2016-03-23 14:59:50,557 WARN > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > Unable to add the application to the delegation token renewer. 
> java.io.IOException: Failed to renew token: Kind: HDFS_DELEGATION_TOKEN, > Service: ha-hdfs:REMOTECLUSTER, Ident: (HDFS_DELEGATION_TOKEN token 10927 for > user292) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:427) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$700(DelegationTokenRenewer.java:78) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:781) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:762) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:744) > Caused by: java.io.IOException: Unable to map logical nameservice URI > 'hdfs://REMOTECLUSTER' to a NameNode. Local configuration does not have a > failover proxy provider configured. > at org.apache.hadoop.hdfs.DFSClient$Renewer.getNNProxy(DFSClient.java:1164) > at org.apache.hadoop.hdfs.DFSClient$Renewer.renew(DFSClient.java:1128) > at org.apache.hadoop.security.token.Token.renew(Token.java:377) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:516) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:513) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.renewToken(DelegationTokenRenewer.j
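A hedged sketch of the debug-level logging mentioned for the renewer side, assuming the per-application settings arrive as a serialized Configuration; class and method names are illustrative:
{code}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.Map;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.DataInputByteBuffer;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class TokensConfDebugLogSketch {
  private static final Logger LOG = LoggerFactory.getLogger(TokensConfDebugLogSketch.class);

  // Deserialize the per-application token-renewal configuration and dump each override
  // at debug level so admins can see exactly what the client sent.
  static Configuration readAndLog(ByteBuffer tokensConf) throws IOException {
    DataInputByteBuffer in = new DataInputByteBuffer();
    tokensConf.rewind();
    in.reset(tokensConf);
    Configuration appConf = new Configuration(false);
    appConf.readFields(in);
    if (LOG.isDebugEnabled()) {
      for (Map.Entry<String, String> entry : appConf) {
        LOG.debug("tokens conf override: {} = {}", entry.getKey(), entry.getValue());
      }
    }
    return appConf;
  }
}
{code}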
[jira] [Updated] (YARN-5910) Support for multi-cluster delegation tokens
[ https://issues.apache.org/jira/browse/YARN-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-5910: -- Attachment: YARN-5910.6.patch > Support for multi-cluster delegation tokens > --- > > Key: YARN-5910 > URL: https://issues.apache.org/jira/browse/YARN-5910 > Project: Hadoop YARN > Issue Type: New Feature > Components: security >Reporter: Clay B. >Assignee: Jian He >Priority: Minor > Attachments: YARN-5910.01.patch, YARN-5910.2.patch, > YARN-5910.3.patch, YARN-5910.4.patch, YARN-5910.5.patch, YARN-5910.6.patch > > > As an administrator running many secure (kerberized) clusters, some which > have peer clusters managed by other teams, I am looking for a way to run jobs > which may require services running on other clusters. Particular cases where > this rears itself are running something as core as a distcp between two > kerberized clusters (e.g. {{hadoop --config /home/user292/conf/ distcp > hdfs://LOCALCLUSTER/user/user292/test.out > hdfs://REMOTECLUSTER/user/user292/test.out.result}}). > Thanks to YARN-3021, once can run for a while but if the delegation token for > the remote cluster needs renewal the job will fail[1]. One can pre-configure > their {{hdfs-site.xml}} loaded by the YARN RM to know of all possible HDFSes > available but that requires coordination that is not always feasible, > especially as a cluster's peers grow into the tens of clusters or across > management teams. Ideally, one could have core systems configured this way > but jobs could also specify their own handling of tokens and management when > needed? > [1]: Example stack trace when the RM is unaware of a remote service: > > {code} > 2016-03-23 14:59:50,528 INFO > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > application_1458441356031_3317 found existing hdfs token Kind: > HDFS_DELEGATION_TOKEN, Service: ha-hdfs:REMOTECLUSTER, Ident: > (HDFS_DELEGATION_TOKEN token > 10927 for user292) > 2016-03-23 14:59:50,557 WARN > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > Unable to add the application to the delegation token renewer. > java.io.IOException: Failed to renew token: Kind: HDFS_DELEGATION_TOKEN, > Service: ha-hdfs:REMOTECLUSTER, Ident: (HDFS_DELEGATION_TOKEN token 10927 for > user292) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:427) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$700(DelegationTokenRenewer.java:78) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:781) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:762) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:744) > Caused by: java.io.IOException: Unable to map logical nameservice URI > 'hdfs://REMOTECLUSTER' to a NameNode. Local configuration does not have a > failover proxy provider configured. 
> at org.apache.hadoop.hdfs.DFSClient$Renewer.getNNProxy(DFSClient.java:1164) > at org.apache.hadoop.hdfs.DFSClient$Renewer.renew(DFSClient.java:1128) > at org.apache.hadoop.security.token.Token.renew(Token.java:377) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:516) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:513) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.renewToken(DelegationTokenRenewer.java:511) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:425) > ... 6 more > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5910) Support for multi-cluster delegation tokens
[ https://issues.apache.org/jira/browse/YARN-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-5910: -- Attachment: YARN-5910.5.patch Uploaded a patch that addressed all the comments. > Support for multi-cluster delegation tokens > --- > > Key: YARN-5910 > URL: https://issues.apache.org/jira/browse/YARN-5910 > Project: Hadoop YARN > Issue Type: New Feature > Components: security >Reporter: Clay B. >Assignee: Jian He >Priority: Minor > Attachments: YARN-5910.01.patch, YARN-5910.2.patch, > YARN-5910.3.patch, YARN-5910.4.patch, YARN-5910.5.patch > > > As an administrator running many secure (kerberized) clusters, some which > have peer clusters managed by other teams, I am looking for a way to run jobs > which may require services running on other clusters. Particular cases where > this rears itself are running something as core as a distcp between two > kerberized clusters (e.g. {{hadoop --config /home/user292/conf/ distcp > hdfs://LOCALCLUSTER/user/user292/test.out > hdfs://REMOTECLUSTER/user/user292/test.out.result}}). > Thanks to YARN-3021, once can run for a while but if the delegation token for > the remote cluster needs renewal the job will fail[1]. One can pre-configure > their {{hdfs-site.xml}} loaded by the YARN RM to know of all possible HDFSes > available but that requires coordination that is not always feasible, > especially as a cluster's peers grow into the tens of clusters or across > management teams. Ideally, one could have core systems configured this way > but jobs could also specify their own handling of tokens and management when > needed? > [1]: Example stack trace when the RM is unaware of a remote service: > > {code} > 2016-03-23 14:59:50,528 INFO > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > application_1458441356031_3317 found existing hdfs token Kind: > HDFS_DELEGATION_TOKEN, Service: ha-hdfs:REMOTECLUSTER, Ident: > (HDFS_DELEGATION_TOKEN token > 10927 for user292) > 2016-03-23 14:59:50,557 WARN > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > Unable to add the application to the delegation token renewer. > java.io.IOException: Failed to renew token: Kind: HDFS_DELEGATION_TOKEN, > Service: ha-hdfs:REMOTECLUSTER, Ident: (HDFS_DELEGATION_TOKEN token 10927 for > user292) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:427) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$700(DelegationTokenRenewer.java:78) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:781) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:762) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:744) > Caused by: java.io.IOException: Unable to map logical nameservice URI > 'hdfs://REMOTECLUSTER' to a NameNode. Local configuration does not have a > failover proxy provider configured. 
> at org.apache.hadoop.hdfs.DFSClient$Renewer.getNNProxy(DFSClient.java:1164) > at org.apache.hadoop.hdfs.DFSClient$Renewer.renew(DFSClient.java:1128) > at org.apache.hadoop.security.token.Token.renew(Token.java:377) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:516) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:513) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.renewToken(DelegationTokenRenewer.java:511) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:425) > ... 6 more > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5910) Support for multi-cluster delegation tokens
[ https://issues.apache.org/jira/browse/YARN-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15828844#comment-15828844 ] Jian He commented on YARN-5910: --- bq. whether we may need some RM-specific configs to be able to successfully connect with kerberos. There may be some remappings that the admins only bothered to configure on the RM or are RM specific? sorry, didn't get you. The 'dfs.namenode.kerberos.principal' is actually HDFS config, not RM config. If two clusters have different DFS principal name configured, when MR client asks for the delegation token from both clusters, I guess this check will fail, because it cannot differentiate the cluster. > Support for multi-cluster delegation tokens > --- > > Key: YARN-5910 > URL: https://issues.apache.org/jira/browse/YARN-5910 > Project: Hadoop YARN > Issue Type: New Feature > Components: security >Reporter: Clay B. >Assignee: Jian He >Priority: Minor > Attachments: YARN-5910.01.patch, YARN-5910.2.patch, > YARN-5910.3.patch, YARN-5910.4.patch > > > As an administrator running many secure (kerberized) clusters, some which > have peer clusters managed by other teams, I am looking for a way to run jobs > which may require services running on other clusters. Particular cases where > this rears itself are running something as core as a distcp between two > kerberized clusters (e.g. {{hadoop --config /home/user292/conf/ distcp > hdfs://LOCALCLUSTER/user/user292/test.out > hdfs://REMOTECLUSTER/user/user292/test.out.result}}). > Thanks to YARN-3021, once can run for a while but if the delegation token for > the remote cluster needs renewal the job will fail[1]. One can pre-configure > their {{hdfs-site.xml}} loaded by the YARN RM to know of all possible HDFSes > available but that requires coordination that is not always feasible, > especially as a cluster's peers grow into the tens of clusters or across > management teams. Ideally, one could have core systems configured this way > but jobs could also specify their own handling of tokens and management when > needed? > [1]: Example stack trace when the RM is unaware of a remote service: > > {code} > 2016-03-23 14:59:50,528 INFO > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > application_1458441356031_3317 found existing hdfs token Kind: > HDFS_DELEGATION_TOKEN, Service: ha-hdfs:REMOTECLUSTER, Ident: > (HDFS_DELEGATION_TOKEN token > 10927 for user292) > 2016-03-23 14:59:50,557 WARN > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > Unable to add the application to the delegation token renewer. 
> java.io.IOException: Failed to renew token: Kind: HDFS_DELEGATION_TOKEN, > Service: ha-hdfs:REMOTECLUSTER, Ident: (HDFS_DELEGATION_TOKEN token 10927 for > user292) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:427) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$700(DelegationTokenRenewer.java:78) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:781) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:762) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:744) > Caused by: java.io.IOException: Unable to map logical nameservice URI > 'hdfs://REMOTECLUSTER' to a NameNode. Local configuration does not have a > failover proxy provider configured. > at org.apache.hadoop.hdfs.DFSClient$Renewer.getNNProxy(DFSClient.java:1164) > at org.apache.hadoop.hdfs.DFSClient$Renewer.renew(DFSClient.java:1128) > at org.apache.hadoop.security.token.Token.renew(Token.java:377) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:516) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:513) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.renewToken(DelegationTokenRenewer.java:511) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:425)
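For illustration only, the sketch below shows the kind of principal comparison referred to in the comment above. The real logic lives in SaslRpcClient#getServerPrincipal and also handles _HOST substitution and principal-pattern matching, so this is a deliberately simplified, hypothetical stand-in; it only shows why the app-supplied conf must carry dfs.namenode.kerberos.principal for the remote cluster.
{code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;

// Simplified, hypothetical sketch of the principal check discussed above.
// NOT the actual SaslRpcClient#getServerPrincipal code.
public class PrincipalCheckSketch {
  static void checkServerPrincipal(Configuration appConf, String principalFromServer)
      throws IOException {
    String expected = appConf.get("dfs.namenode.kerberos.principal");
    if (expected == null || !expected.equals(principalFromServer)) {
      throw new IOException("Server principal " + principalFromServer
          + " does not match locally configured principal " + expected);
    }
  }

  public static void main(String[] args) throws Exception {
    // App-supplied conf, created without loading default resources.
    Configuration appConf = new Configuration(false);
    appConf.set("dfs.namenode.kerberos.principal", "nn/remote-nn.example.com@REMOTE.REALM");
    checkServerPrincipal(appConf, "nn/remote-nn.example.com@REMOTE.REALM"); // passes
  }
}
{code}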
[jira] [Comment Edited] (YARN-5910) Support for multi-cluster delegation tokens
[ https://issues.apache.org/jira/browse/YARN-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15828810#comment-15828810 ] Jian He edited comment on YARN-5910 at 1/18/17 9:41 PM: bq. Yeah, I'm thinking it's unnecessary to check both. sounds good, I'll remove the is security enabled check in YARNRunner. Regarding the if security enabled check in ClientRMSerivce, do you also prefer removing it ? bq. Configuration.addResource will add a resource object to the list of resources for the config and never get rid of them. This will cause every app-specific conf to be tracked by renewerConf forever, resulting in a memory leak. Ah, I see. Good point. I didn't understand you previous comment about this. So I've done the experiment. Actually, we don't need RM's own config for renew. Additionally, we need to pass in the dfs.namenode.kerberos.principal from the client to pass the check in SaslRpcClient#getServerPrincipal where it checks whether the remote principle equals to the local config. I have one question about this design: the dfs.namenode.kerberos.principal is not differentiated by clusterId. So it assumes all clusters will have the same value for 'dfs.namenode.kerberos.principal' ? This applies to all other service including RM as well. So I can just use appConfig in DelegationTokenRenewer. I'll also add the config limit in RM. was (Author: jianhe): bq. Yeah, I'm thinking it's unnecessary to check both. sounds good, I'll remove the is security enabled check in YARNRunner. Regarding the if security enabled check in ClientRMSerivce, do you also prefer removing it ? bq. Configuration.addResource will add a resource object to the list of resources for the config and never get rid of them. This will cause every app-specific conf to be tracked by renewerConf forever, resulting in a memory leak. Ah, I see. Good point. I didn't understand you previous comment about this. So I've done the experiment. Actually, we don't need RM's own config for renew. Additionally, we need to pass in the dfs.namenode.kerberos.principal from the client to pass the check in SaslRpcClient#getServerPrincipal where it checks whether the remote principle equals to the local config. I have one question about this design: the dfs.namenode.kerberos.principal is not differentiated by clusterId. So it assumes all clusters will have the same value for 'dfs.namenode.kerberos.principal' ? This applies to all other service including as well. So I can just use appConfig in DelegationTokenRenewer. I'll also add the config limit in RM. > Support for multi-cluster delegation tokens > --- > > Key: YARN-5910 > URL: https://issues.apache.org/jira/browse/YARN-5910 > Project: Hadoop YARN > Issue Type: New Feature > Components: security >Reporter: Clay B. >Assignee: Jian He >Priority: Minor > Attachments: YARN-5910.01.patch, YARN-5910.2.patch, > YARN-5910.3.patch, YARN-5910.4.patch > > > As an administrator running many secure (kerberized) clusters, some which > have peer clusters managed by other teams, I am looking for a way to run jobs > which may require services running on other clusters. Particular cases where > this rears itself are running something as core as a distcp between two > kerberized clusters (e.g. {{hadoop --config /home/user292/conf/ distcp > hdfs://LOCALCLUSTER/user/user292/test.out > hdfs://REMOTECLUSTER/user/user292/test.out.result}}). > Thanks to YARN-3021, once can run for a while but if the delegation token for > the remote cluster needs renewal the job will fail[1]. 
One can pre-configure > their {{hdfs-site.xml}} loaded by the YARN RM to know of all possible HDFSes > available but that requires coordination that is not always feasible, > especially as a cluster's peers grow into the tens of clusters or across > management teams. Ideally, one could have core systems configured this way > but jobs could also specify their own handling of tokens and management when > needed? > [1]: Example stack trace when the RM is unaware of a remote service: > > {code} > 2016-03-23 14:59:50,528 INFO > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > application_1458441356031_3317 found existing hdfs token Kind: > HDFS_DELEGATION_TOKEN, Service: ha-hdfs:REMOTECLUSTER, Ident: > (HDFS_DELEGATION_TOKEN token > 10927 for user292) > 2016-03-23 14:59:50,557 WARN > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > Unable to add the application to the delegation token renewer. > java.io.IOException: Failed to renew token: Kind: HDFS_DELEGATION_TOKEN, > Service: ha-hdfs:REMOTECLUSTER, Ident: (HDFS_DELEGATION_TOKEN token 10927 for > user292) > at > org.apache.h
[jira] [Comment Edited] (YARN-5910) Support for multi-cluster delegation tokens
[ https://issues.apache.org/jira/browse/YARN-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15828810#comment-15828810 ] Jian He edited comment on YARN-5910 at 1/18/17 9:41 PM: bq. Yeah, I'm thinking it's unnecessary to check both. sounds good, I'll remove the is security enabled check in YARNRunner. Regarding the if security enabled check in ClientRMSerivce, do you also prefer removing it ? bq. Configuration.addResource will add a resource object to the list of resources for the config and never get rid of them. This will cause every app-specific conf to be tracked by renewerConf forever, resulting in a memory leak. Ah, I see. Good point. I didn't understand you previous comment about this. So I've done the experiment. Actually, we don't need RM's own config for renew. Additionally, we need to pass in the dfs.namenode.kerberos.principal from the client to pass the check in SaslRpcClient#getServerPrincipal where it checks whether the remote principle equals to the local config. I have one question about this design: the dfs.namenode.kerberos.principal is not differentiated by clusterId. So it assumes all clusters will have the same value for 'dfs.namenode.kerberos.principal' ? This applies to all other service including as well. So I can just use appConfig in DelegationTokenRenewer. I'll also add the config limit in RM. was (Author: jianhe): bq. Yeah, I'm thinking it's unnecessary to check both. sounds good, I'll remove the is security enabled check in YARNRunner. Regarding the if security enabled check in ClientRMSerivce, do you also prefer removing it ? bq. Configuration.addResource will add a resource object to the list of resources for the config and never get rid of them. This will cause every app-specific conf to be tracked by renewerConf forever, resulting in a memory leak. Ah, I see. Good point. I didn't understand you previous comment about this. So I've done the experiment. Actually, we don't need RM's own config for renew. Additionally, we need to pass in the dfs.namenode.kerberos.principal from the client to pass the check in SaslRpcClient#getServerPrincipal where it checks whether the remote principle equals to the local config. I have one question about this design: the dfs.namenode.kerberos.principal is not differentiated by clusterId. So when MR client asks delegation token from both clusters, it assumes all clusters will have the same value for 'dfs.namenode.kerberos.principal' ? So I can just use appConfig in DelegationTokenRenewer. I'll also add the config limit in RM. > Support for multi-cluster delegation tokens > --- > > Key: YARN-5910 > URL: https://issues.apache.org/jira/browse/YARN-5910 > Project: Hadoop YARN > Issue Type: New Feature > Components: security >Reporter: Clay B. >Assignee: Jian He >Priority: Minor > Attachments: YARN-5910.01.patch, YARN-5910.2.patch, > YARN-5910.3.patch, YARN-5910.4.patch > > > As an administrator running many secure (kerberized) clusters, some which > have peer clusters managed by other teams, I am looking for a way to run jobs > which may require services running on other clusters. Particular cases where > this rears itself are running something as core as a distcp between two > kerberized clusters (e.g. {{hadoop --config /home/user292/conf/ distcp > hdfs://LOCALCLUSTER/user/user292/test.out > hdfs://REMOTECLUSTER/user/user292/test.out.result}}). > Thanks to YARN-3021, once can run for a while but if the delegation token for > the remote cluster needs renewal the job will fail[1]. 
One can pre-configure > their {{hdfs-site.xml}} loaded by the YARN RM to know of all possible HDFSes > available but that requires coordination that is not always feasible, > especially as a cluster's peers grow into the tens of clusters or across > management teams. Ideally, one could have core systems configured this way > but jobs could also specify their own handling of tokens and management when > needed? > [1]: Example stack trace when the RM is unaware of a remote service: > > {code} > 2016-03-23 14:59:50,528 INFO > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > application_1458441356031_3317 found existing hdfs token Kind: > HDFS_DELEGATION_TOKEN, Service: ha-hdfs:REMOTECLUSTER, Ident: > (HDFS_DELEGATION_TOKEN token > 10927 for user292) > 2016-03-23 14:59:50,557 WARN > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > Unable to add the application to the delegation token renewer. > java.io.IOException: Failed to renew token: Kind: HDFS_DELEGATION_TOKEN, > Service: ha-hdfs:REMOTECLUSTER, Ident: (HDFS_DELEGATION_TOKEN token 10927 for > user292) > at > org.apache.
[jira] [Commented] (YARN-5910) Support for multi-cluster delegation tokens
[ https://issues.apache.org/jira/browse/YARN-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15828810#comment-15828810 ] Jian He commented on YARN-5910: --- bq. Yeah, I'm thinking it's unnecessary to check both. Sounds good, I'll remove the is-security-enabled check in YARNRunner. Regarding the if-security-enabled check in ClientRMService, do you also prefer removing it ? bq. Configuration.addResource will add a resource object to the list of resources for the config and never get rid of them. This will cause every app-specific conf to be tracked by renewerConf forever, resulting in a memory leak. Ah, I see. Good point. I didn't understand your previous comment about this. So I've done the experiment. Actually, we don't need the RM's own config for renewal. Additionally, we need to pass in the dfs.namenode.kerberos.principal from the client to pass the check in SaslRpcClient#getServerPrincipal, where it checks whether the remote principal equals the local config. I have one question about this design: the dfs.namenode.kerberos.principal is not differentiated by clusterId. So when the MR client asks for delegation tokens from both clusters, it assumes all clusters will have the same value for 'dfs.namenode.kerberos.principal' ? So I can just use appConfig in DelegationTokenRenewer. I'll also add the config limit in the RM. > Support for multi-cluster delegation tokens > --- > > Key: YARN-5910 > URL: https://issues.apache.org/jira/browse/YARN-5910 > Project: Hadoop YARN > Issue Type: New Feature > Components: security >Reporter: Clay B. >Assignee: Jian He >Priority: Minor > Attachments: YARN-5910.01.patch, YARN-5910.2.patch, > YARN-5910.3.patch, YARN-5910.4.patch > > > As an administrator running many secure (kerberized) clusters, some which > have peer clusters managed by other teams, I am looking for a way to run jobs > which may require services running on other clusters. Particular cases where > this rears itself are running something as core as a distcp between two > kerberized clusters (e.g. {{hadoop --config /home/user292/conf/ distcp > hdfs://LOCALCLUSTER/user/user292/test.out > hdfs://REMOTECLUSTER/user/user292/test.out.result}}). > Thanks to YARN-3021, once can run for a while but if the delegation token for > the remote cluster needs renewal the job will fail[1]. One can pre-configure > their {{hdfs-site.xml}} loaded by the YARN RM to know of all possible HDFSes > available but that requires coordination that is not always feasible, > especially as a cluster's peers grow into the tens of clusters or across > management teams. Ideally, one could have core systems configured this way > but jobs could also specify their own handling of tokens and management when > needed? > [1]: Example stack trace when the RM is unaware of a remote service: > > {code} > 2016-03-23 14:59:50,528 INFO > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > application_1458441356031_3317 found existing hdfs token Kind: > HDFS_DELEGATION_TOKEN, Service: ha-hdfs:REMOTECLUSTER, Ident: > (HDFS_DELEGATION_TOKEN token > 10927 for user292) > 2016-03-23 14:59:50,557 WARN > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > Unable to add the application to the delegation token renewer.
> java.io.IOException: Failed to renew token: Kind: HDFS_DELEGATION_TOKEN, > Service: ha-hdfs:REMOTECLUSTER, Ident: (HDFS_DELEGATION_TOKEN token 10927 for > user292) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:427) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$700(DelegationTokenRenewer.java:78) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:781) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:762) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:744) > Caused by: java.io.IOException: Unable to map logical nameservice URI > 'hdfs://REMOTECLUSTER' to a NameNode. Local configuration does not have a > failover proxy provider configured. > at org.apache.hadoop.hdfs.DFSClient$Renewer.getNNProxy(DFSClient.java:1164) > at org.apache.hadoop.hdfs.DFSClient$Renewer.renew(DFSClient.java:1128) > at org.apache.hadoop.security.token.Token.renew(Token.java:377) > at > org.apache.hadoop.yarn.server.resourcemanager
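The "just use appConfig in DelegationTokenRenewer" idea above can be pictured with a minimal, hypothetical sketch (class, field, and method names are illustrative, not the actual patch): each application's tokens are renewed with the configuration that application submitted, and that conf is dropped when the application finishes, so one app's remote-cluster settings never leak into another app's renewals.
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.token.Token;
import org.apache.hadoop.yarn.api.records.ApplicationId;

// Hypothetical sketch of per-app conf usage in a delegation-token renewer.
public class PerAppRenewerSketch {
  private final Map<ApplicationId, Configuration> appConfs = new ConcurrentHashMap<>();

  public void addApplication(ApplicationId appId, Configuration appConf) {
    appConfs.put(appId, appConf);
  }

  public long renew(ApplicationId appId, Token<?> token) throws Exception {
    // Renew with the submitting app's own conf rather than a shared renewerConf.
    Configuration conf = appConfs.get(appId);
    if (conf == null) {
      throw new IllegalStateException("No conf registered for " + appId);
    }
    return token.renew(conf);
  }

  public void removeApplication(ApplicationId appId) {
    appConfs.remove(appId); // drop the conf when the app finishes, avoiding the leak
  }
}
{code}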
[jira] [Comment Edited] (YARN-5910) Support for multi-cluster delegation tokens
[ https://issues.apache.org/jira/browse/YARN-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15828654#comment-15828654 ] Jian He edited comment on YARN-5910 at 1/18/17 7:43 PM: Hi Jason, thank you very much for the review ! bq. It's confusing to see a MR_JOB_SEND_TOKEN_CONF_DEFAULT in MRJobConfig yet it clearly is not the default value. removed it bq. Should this feature be tied to UserGroupInformation.isSecurityEnabled? I'm wondering if this can cause issues where the current cluster isn't secure but the RM needs to renew the job's tokens for a remote secure cluster or some other secure service. Seems like if this conf is set then that's all we need to know. Currently, the RM DelegationTokenRewener will only add the tokens if security is enabled (code in RMAppManager#submitApplication), so I think with this existing implemtation, we can assume this feature is for security enabled only ? bq. Similarly the code explicitly fails in ClientRMService if the conf is there when security is disabled which seems like we're taking a case that isn't optimal but should work benignly and explicitly making sure it fails. Not sure that's user friendly behavior. My intention was to prevent user from sending conf in non-secure mode(which anyways is not needed if my above reply is true), in case the conf size huge which may increase load on RM. On ther other hand, Varun chatted offline that we can add a limit config in RM to limit the size of configs, your opinion ? bq. Nit: For the ByteBuffer usage in parseCredentials and parseTokensConf, the rewind method calls seem unnecessary since we're throwing the buffers away immediately afterwards. Actually, the bytebuffer is a direct reference from the containerLaunchContext, not a copy. I think this is also required because it was specifically to solve issues in YARN-2893. bq. Should the Configuration constructor call in parseTokensConf be using the version that does not load defaults? If not then I recommend we at least allow a conf to be passed in to use as a copy constructor.Loading a new Configuration from scratch is really expensive and we should avoid it if possible. See the discussion on HADOOP-11223 for details. Good point. I actually did the same in YarnRunner#setAppConf method, but missed this place. bq. In DelegationTokenRenewer, why aren't we using the appConf as-is when renewing the tokens? I wasn't sure whether the mere appConf is enough for the connection - (Is there any kerberos related configs for RM itself are required for authentication?). Let me do some experiments, if this works, I'll just use appConf. bq. Also it looks like we're polluting subsequent app-conf renewals with prior app configurations, as well as simply leaking appConf objects as renewerConf resources infinitum. I don't see where renewerConf gets reset in-between. My previous patch made a copy of each appConf and merge with RM's conf(for the reason I wasn't sure whether RM's own conf is required) and use that for renwer. But then I think this maybe bad because every app will have its own copy of configs, which may largely increase the memory size if the number of apps is very big. So, in the latest patch I changed it to let all apps share the same renewerConf - this is based on the assumption that "dfs.nameservices" must have distint keys for each distinct cluster, so we won't have situation where two apps use different configs for the same cluster - it is true that unnecessary configs used by 1st app will be shared by subsequent apps. 
bq. Arguably there should be a unit tests that verifies a first app with token conf key A and a second app with token conf key B doesn't leave a situation where the renewals of the second app are polluted with conf key A. If the mere appConf works, we should be fine. bq. Speaking of unit tests, I see where we fixed up the YARN unit tests to pass the new conf but not a new test that verifies the specified conf is used appropriately when renewing for that app and not for other apps that didn't specify a conf. Yep, I'll add the UT. was (Author: jianhe): Hi Jason, thank you very much for the review ! bq. It's confusing to see a MR_JOB_SEND_TOKEN_CONF_DEFAULT in MRJobConfig yet it clearly is not the default value. removed it bq. Should this feature be tied to UserGroupInformation.isSecurityEnabled? I'm wondering if this can cause issues where the current cluster isn't secure but the RM needs to renew the job's tokens for a remote secure cluster or some other secure service. Seems like if this conf is set then that's all we need to know. Currently, the RM DelegationTokenRewener will only add the tokens if security is enabled (code in RMAppManager#submitApplication), so I think with this existing implemtation, we can assume this feature is for security enabled only ? bq. Similarly the code explicit
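The two review points about parseTokensConf (create the Configuration without loading defaults, and do not disturb the buffer shared with the ContainerLaunchContext) can be sketched roughly as below, assuming the client serialized the conf with Configuration.write. The duplicate() call is just one illustrative way to leave the shared buffer's position untouched; the patch under review uses rewind instead.
{code}
import java.io.IOException;
import java.nio.ByteBuffer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.DataInputByteBuffer;

// Rough sketch of parsing an app-supplied tokens conf out of a ByteBuffer,
// not necessarily how parseTokensConf is implemented in the patch.
public final class TokensConfParseSketch {
  private TokensConfParseSketch() {}

  public static Configuration parseTokensConf(ByteBuffer tokensConf) throws IOException {
    DataInputByteBuffer dib = new DataInputByteBuffer();
    dib.reset(tokensConf.duplicate());          // work on a copy of position/limit
    Configuration appConf = new Configuration(false); // skip loading core-default.xml etc.
    appConf.readFields(dib);
    return appConf;
  }
}
{code}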
[jira] [Commented] (YARN-5910) Support for multi-cluster delegation tokens
[ https://issues.apache.org/jira/browse/YARN-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15828654#comment-15828654 ] Jian He commented on YARN-5910: --- Hi Jason, thank you very much for the review ! bq. It's confusing to see a MR_JOB_SEND_TOKEN_CONF_DEFAULT in MRJobConfig yet it clearly is not the default value. Removed it. bq. Should this feature be tied to UserGroupInformation.isSecurityEnabled? I'm wondering if this can cause issues where the current cluster isn't secure but the RM needs to renew the job's tokens for a remote secure cluster or some other secure service. Seems like if this conf is set then that's all we need to know. Currently, the RM DelegationTokenRenewer will only add the tokens if security is enabled (code in RMAppManager#submitApplication), so I think with this existing implementation, we can assume this feature is for the security-enabled case only ? bq. Similarly the code explicitly fails in ClientRMService if the conf is there when security is disabled which seems like we're taking a case that isn't optimal but should work benignly and explicitly making sure it fails. Not sure that's user friendly behavior. My intention was to prevent users from sending the conf in non-secure mode (which anyway is not needed if my above reply is true), in case the conf size is huge, which may increase load on the RM. On the other hand, Varun suggested offline that we can add a limit config in the RM to limit the size of configs; your opinion ? bq. Nit: For the ByteBuffer usage in parseCredentials and parseTokensConf, the rewind method calls seem unnecessary since we're throwing the buffers away immediately afterwards. Actually, the bytebuffer is a direct reference from the containerLaunchContext, not a copy. I think this is also required because it was specifically added to solve issues in YARN-2893. bq. Should the Configuration constructor call in parseTokensConf be using the version that does not load defaults? If not then I recommend we at least allow a conf to be passed in to use as a copy constructor. Loading a new Configuration from scratch is really expensive and we should avoid it if possible. See the discussion on HADOOP-11223 for details. Good point. I actually did the same in the YarnRunner#setAppConf method, but missed this place. bq. In DelegationTokenRenewer, why aren't we using the appConf as-is when renewing the tokens? I wasn't sure whether the mere appConf is enough for the connection - (are there any kerberos-related configs for the RM itself that are required for authentication?). Let me do some experiments; if this works, I'll just use appConf. bq. Also it looks like we're polluting subsequent app-conf renewals with prior app configurations, as well as simply leaking appConf objects as renewerConf resources infinitum. I don't see where renewerConf gets reset in-between. My previous patch made a copy of each appConf, merged it with the RM's conf (for the reason that I wasn't sure whether the RM's own conf is required), and used that for the renewer. But then I think this may be bad because every app will have its own copy of configs, which may largely increase the memory footprint if the number of apps is very big. So, in the latest patch I changed it to let all apps share the same renewerConf - this is based on the assumption that "dfs.nameservices" must have distinct keys for each distinct cluster, so we won't have a situation where two apps use different configs for the same cluster - it is true that unnecessary configs used by the 1st app will be shared by subsequent apps. bq.
Arguably there should be a unit tests that verifies a first app with token conf key A and a second app with token conf key B doesn't leave a situation where the renewals of the second app are polluted with conf key A. If the mere appConf works, we should be fine. Speaking of unit tests, I see where we fixed up the YARN unit tests to pass the new conf but not a new test that verifies the specified conf is used appropriately when renewing for that app and not for other apps that didn't specify a conf. Yep, I'll add the UT. > Support for multi-cluster delegation tokens > --- > > Key: YARN-5910 > URL: https://issues.apache.org/jira/browse/YARN-5910 > Project: Hadoop YARN > Issue Type: New Feature > Components: security >Reporter: Clay B. >Assignee: Jian He >Priority: Minor > Attachments: YARN-5910.01.patch, YARN-5910.2.patch, > YARN-5910.3.patch, YARN-5910.4.patch > > > As an administrator running many secure (kerberized) clusters, some which > have peer clusters managed by other teams, I am looking for a way to run jobs > which may require services running on other clusters. Particular cases where > this rears itself are running something as core as a distcp between two > kerberized clusters (e.g. {{h
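The size-limit idea mentioned above (a config in the RM to cap how large an app-supplied conf may be) could look roughly like the following; the property name and default below are hypothetical, purely for illustration.
{code}
import java.io.IOException;
import java.nio.ByteBuffer;

import org.apache.hadoop.conf.Configuration;

// Illustrative sketch of an RM-side size check on the app-supplied tokens conf.
public final class TokensConfSizeCheckSketch {
  private TokensConfSizeCheckSketch() {}

  // Hypothetical property name and default, not an actual YARN config key.
  static final String MAX_CONF_SIZE = "yarn.resourcemanager.app-tokens-conf.max-size-bytes";
  static final int DEFAULT_MAX_CONF_SIZE = 12 * 1024;

  public static void checkTokensConfSize(ByteBuffer tokensConf, Configuration rmConf)
      throws IOException {
    int limit = rmConf.getInt(MAX_CONF_SIZE, DEFAULT_MAX_CONF_SIZE);
    if (tokensConf != null && tokensConf.remaining() > limit) {
      throw new IOException("App-supplied tokens conf is " + tokensConf.remaining()
          + " bytes, exceeding the limit of " + limit + " bytes");
    }
  }
}
{code}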
[jira] [Updated] (YARN-5910) Support for multi-cluster delegation tokens
[ https://issues.apache.org/jira/browse/YARN-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-5910: -- Attachment: YARN-5910.4.patch fixed failed UT. testRMAppSubmitWithValidTokens is removed, because it's not actually being tested as expected as security is not enabled in the test scope, and the test scenario should already be covered in other place. > Support for multi-cluster delegation tokens > --- > > Key: YARN-5910 > URL: https://issues.apache.org/jira/browse/YARN-5910 > Project: Hadoop YARN > Issue Type: New Feature > Components: security >Reporter: Clay B. >Assignee: Jian He >Priority: Minor > Attachments: YARN-5910.01.patch, YARN-5910.2.patch, > YARN-5910.3.patch, YARN-5910.4.patch > > > As an administrator running many secure (kerberized) clusters, some which > have peer clusters managed by other teams, I am looking for a way to run jobs > which may require services running on other clusters. Particular cases where > this rears itself are running something as core as a distcp between two > kerberized clusters (e.g. {{hadoop --config /home/user292/conf/ distcp > hdfs://LOCALCLUSTER/user/user292/test.out > hdfs://REMOTECLUSTER/user/user292/test.out.result}}). > Thanks to YARN-3021, once can run for a while but if the delegation token for > the remote cluster needs renewal the job will fail[1]. One can pre-configure > their {{hdfs-site.xml}} loaded by the YARN RM to know of all possible HDFSes > available but that requires coordination that is not always feasible, > especially as a cluster's peers grow into the tens of clusters or across > management teams. Ideally, one could have core systems configured this way > but jobs could also specify their own handling of tokens and management when > needed? > [1]: Example stack trace when the RM is unaware of a remote service: > > {code} > 2016-03-23 14:59:50,528 INFO > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > application_1458441356031_3317 found existing hdfs token Kind: > HDFS_DELEGATION_TOKEN, Service: ha-hdfs:REMOTECLUSTER, Ident: > (HDFS_DELEGATION_TOKEN token > 10927 for user292) > 2016-03-23 14:59:50,557 WARN > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > Unable to add the application to the delegation token renewer. > java.io.IOException: Failed to renew token: Kind: HDFS_DELEGATION_TOKEN, > Service: ha-hdfs:REMOTECLUSTER, Ident: (HDFS_DELEGATION_TOKEN token 10927 for > user292) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:427) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$700(DelegationTokenRenewer.java:78) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:781) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:762) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:744) > Caused by: java.io.IOException: Unable to map logical nameservice URI > 'hdfs://REMOTECLUSTER' to a NameNode. Local configuration does not have a > failover proxy provider configured. 
> at org.apache.hadoop.hdfs.DFSClient$Renewer.getNNProxy(DFSClient.java:1164) > at org.apache.hadoop.hdfs.DFSClient$Renewer.renew(DFSClient.java:1128) > at org.apache.hadoop.security.token.Token.renew(Token.java:377) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:516) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:513) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.renewToken(DelegationTokenRenewer.java:511) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:425) > ... 6 more > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5910) Support for multi-cluster delegation tokens
[ https://issues.apache.org/jira/browse/YARN-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-5910: -- Attachment: YARN-5910.3.patch Fixed jenkins issues > Support for multi-cluster delegation tokens > --- > > Key: YARN-5910 > URL: https://issues.apache.org/jira/browse/YARN-5910 > Project: Hadoop YARN > Issue Type: New Feature > Components: security >Reporter: Clay B. >Assignee: Jian He >Priority: Minor > Attachments: YARN-5910.01.patch, YARN-5910.2.patch, YARN-5910.3.patch > > > As an administrator running many secure (kerberized) clusters, some which > have peer clusters managed by other teams, I am looking for a way to run jobs > which may require services running on other clusters. Particular cases where > this rears itself are running something as core as a distcp between two > kerberized clusters (e.g. {{hadoop --config /home/user292/conf/ distcp > hdfs://LOCALCLUSTER/user/user292/test.out > hdfs://REMOTECLUSTER/user/user292/test.out.result}}). > Thanks to YARN-3021, once can run for a while but if the delegation token for > the remote cluster needs renewal the job will fail[1]. One can pre-configure > their {{hdfs-site.xml}} loaded by the YARN RM to know of all possible HDFSes > available but that requires coordination that is not always feasible, > especially as a cluster's peers grow into the tens of clusters or across > management teams. Ideally, one could have core systems configured this way > but jobs could also specify their own handling of tokens and management when > needed? > [1]: Example stack trace when the RM is unaware of a remote service: > > {code} > 2016-03-23 14:59:50,528 INFO > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > application_1458441356031_3317 found existing hdfs token Kind: > HDFS_DELEGATION_TOKEN, Service: ha-hdfs:REMOTECLUSTER, Ident: > (HDFS_DELEGATION_TOKEN token > 10927 for user292) > 2016-03-23 14:59:50,557 WARN > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > Unable to add the application to the delegation token renewer. > java.io.IOException: Failed to renew token: Kind: HDFS_DELEGATION_TOKEN, > Service: ha-hdfs:REMOTECLUSTER, Ident: (HDFS_DELEGATION_TOKEN token 10927 for > user292) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:427) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$700(DelegationTokenRenewer.java:78) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:781) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:762) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:744) > Caused by: java.io.IOException: Unable to map logical nameservice URI > 'hdfs://REMOTECLUSTER' to a NameNode. Local configuration does not have a > failover proxy provider configured. 
> at org.apache.hadoop.hdfs.DFSClient$Renewer.getNNProxy(DFSClient.java:1164) > at org.apache.hadoop.hdfs.DFSClient$Renewer.renew(DFSClient.java:1128) > at org.apache.hadoop.security.token.Token.renew(Token.java:377) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:516) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:513) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.renewToken(DelegationTokenRenewer.java:511) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:425) > ... 6 more > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5910) Support for multi-cluster delegation tokens
[ https://issues.apache.org/jira/browse/YARN-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-5910: -- Attachment: YARN-5910.2.patch Updated the patch with minor changes. > Support for multi-cluster delegation tokens > --- > > Key: YARN-5910 > URL: https://issues.apache.org/jira/browse/YARN-5910 > Project: Hadoop YARN > Issue Type: New Feature > Components: security >Reporter: Clay B. >Assignee: Jian He >Priority: Minor > Attachments: YARN-5910.01.patch, YARN-5910.2.patch > > > As an administrator running many secure (kerberized) clusters, some which > have peer clusters managed by other teams, I am looking for a way to run jobs > which may require services running on other clusters. Particular cases where > this rears itself are running something as core as a distcp between two > kerberized clusters (e.g. {{hadoop --config /home/user292/conf/ distcp > hdfs://LOCALCLUSTER/user/user292/test.out > hdfs://REMOTECLUSTER/user/user292/test.out.result}}). > Thanks to YARN-3021, once can run for a while but if the delegation token for > the remote cluster needs renewal the job will fail[1]. One can pre-configure > their {{hdfs-site.xml}} loaded by the YARN RM to know of all possible HDFSes > available but that requires coordination that is not always feasible, > especially as a cluster's peers grow into the tens of clusters or across > management teams. Ideally, one could have core systems configured this way > but jobs could also specify their own handling of tokens and management when > needed? > [1]: Example stack trace when the RM is unaware of a remote service: > > {code} > 2016-03-23 14:59:50,528 INFO > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > application_1458441356031_3317 found existing hdfs token Kind: > HDFS_DELEGATION_TOKEN, Service: ha-hdfs:REMOTECLUSTER, Ident: > (HDFS_DELEGATION_TOKEN token > 10927 for user292) > 2016-03-23 14:59:50,557 WARN > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > Unable to add the application to the delegation token renewer. > java.io.IOException: Failed to renew token: Kind: HDFS_DELEGATION_TOKEN, > Service: ha-hdfs:REMOTECLUSTER, Ident: (HDFS_DELEGATION_TOKEN token 10927 for > user292) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:427) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$700(DelegationTokenRenewer.java:78) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:781) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:762) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:744) > Caused by: java.io.IOException: Unable to map logical nameservice URI > 'hdfs://REMOTECLUSTER' to a NameNode. Local configuration does not have a > failover proxy provider configured. 
> at org.apache.hadoop.hdfs.DFSClient$Renewer.getNNProxy(DFSClient.java:1164) > at org.apache.hadoop.hdfs.DFSClient$Renewer.renew(DFSClient.java:1128) > at org.apache.hadoop.security.token.Token.renew(Token.java:377) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:516) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:513) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.renewToken(DelegationTokenRenewer.java:511) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:425) > ... 6 more > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6016) Bugs in AMRMProxy handling (local)AMRMToken
[ https://issues.apache.org/jira/browse/YARN-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15822667#comment-15822667 ] Jian He commented on YARN-6016: --- lgtm too, thanks > Bugs in AMRMProxy handling (local)AMRMToken > --- > > Key: YARN-6016 > URL: https://issues.apache.org/jira/browse/YARN-6016 > Project: Hadoop YARN > Issue Type: Bug > Components: federation >Reporter: Botong Huang >Assignee: Botong Huang >Priority: Minor > Attachments: YARN-6016.v1.patch, YARN-6016.v2.patch, > YARN-6016.v3.patch > > > Two AMRMProxy bugs: > First, the AMRMToken from RM should not be propagated to AM, since AMRMProxy > will create a local AMRMToken for it. > Second, the AMRMProxy Context is now parse the localAMRMTokenKeyId from > amrmToken, but should be from localAmrmToken. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6072) RM unable to start in secure mode
[ https://issues.apache.org/jira/browse/YARN-6072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15819587#comment-15819587 ] Jian He commented on YARN-6072: --- looks good to me, +1 > RM unable to start in secure mode > - > > Key: YARN-6072 > URL: https://issues.apache.org/jira/browse/YARN-6072 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0, 3.0.0-alpha2 >Reporter: Bibin A Chundatt >Assignee: Ajith S >Priority: Blocker > Attachments: YARN-6072.01.branch-2.8.patch, > YARN-6072.01.branch-2.patch, YARN-6072.01.patch, YARN-6072.02.patch, > YARN-6072.03.branch-2.8.patch, YARN-6072.03.patch, > hadoop-secureuser-resourcemanager-vm1.log > > > Resource manager is unable to start in secure mode > {code} > 2017-01-08 14:27:29,917 INFO org.apache.hadoop.conf.Configuration: found > resource hadoop-policy.xml at > file:/opt/hadoop/release/hadoop-3.0.0-alpha2-SNAPSHOT/etc/hadoop/hadoop-policy.xml > 2017-01-08 14:27:29,918 INFO > org.apache.hadoop.yarn.server.resourcemanager.AdminService: Refresh All > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:569) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:552) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:707) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > 2017-01-08 14:27:29,919 ERROR > org.apache.hadoop.yarn.server.resourcemanager.AdminService: RefreshAll failed > so firing fatal event > org.apache.hadoop.ha.ServiceFailedException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > 2017-01-08 14:27:29,920 INFO org.apache.hadoop.ipc.Server: Starting Socket > Reader #1 for port 8033 > 2017-01-08 14:27:29,948 WARN org.apache.hadoop.ha.ActiveStandbyElector: > Exception handling the winning of election > org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > 
org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error on refreshAll > during transition to Active > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:311) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > ... 4 more > Caused by: org.apache.hadoop.ha.ServiceFailedException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > ... 5 more > {code} > ResourceManager services are added in following order > # EmbeddedElector > # AdminService > During resource manager service start() .EmbeddedElector starts first and > invokes {
[jira] [Commented] (YARN-5995) Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition performance
[ https://issues.apache.org/jira/browse/YARN-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15816388#comment-15816388 ] Jian He commented on YARN-5995: --- How about starting with the below: - Time cost of write ops - MutableRate (which contains the total number of ops and avg time) - Total failed ops - MutableCounterLong > Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition > performance > --- > > Key: YARN-5995 > URL: https://issues.apache.org/jira/browse/YARN-5995 > Project: Hadoop YARN > Issue Type: Improvement > Components: metrics, resourcemanager >Affects Versions: 2.7.1 > Environment: CentOS7.2 Hadoop-2.7.1 >Reporter: zhangyubiao >Assignee: zhangyubiao > Labels: patch > Attachments: YARN-5995.0001.patch, YARN-5995.0002.patch, > YARN-5995.patch > > > Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition > performance -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
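A rough sketch of that shape using Hadoop's metrics2 library could look like the following; the class name, metric names, and registration key are illustrative only, not something the eventual patch has to use.
{code}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
import org.apache.hadoop.metrics2.lib.MutableCounterLong;
import org.apache.hadoop.metrics2.lib.MutableRate;

// Illustrative metrics source: MutableRate tracks op count and average time,
// MutableCounterLong tracks total failures.
@Metrics(about = "RM state-store operation metrics", context = "yarn")
public class RMStateStoreOpMetricsSketch {
  @Metric("Time cost of state-store write ops") MutableRate writeStateStoreOp;
  @Metric("Total failed state-store ops") MutableCounterLong failedStateStoreOps;

  public static RMStateStoreOpMetricsSketch create() {
    // Register the source with the default metrics system so sinks can pick it up.
    return DefaultMetricsSystem.instance()
        .register("RMStateStoreOpMetricsSketch", "RM state-store op metrics",
            new RMStateStoreOpMetricsSketch());
  }

  public void recordWrite(long elapsedMillis) {
    writeStateStoreOp.add(elapsedMillis);
  }

  public void recordFailure() {
    failedStateStoreOps.incr();
  }
}
{code}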
[jira] [Commented] (YARN-6072) RM unable to start in secure mode
[ https://issues.apache.org/jira/browse/YARN-6072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15816109#comment-15816109 ] Jian He commented on YARN-6072: --- bq. // Set HA configuration should be done before login I don't know why this comment is added. In my understanding, it should at least be fine to move "add admin service" before "add elector service". bq. Hmm yes but additionally we get the log trace too, Yes, I know. I meant it can be such as: new ServiceFailedException("RefreshAll operation failed ", ex); Anyway, based on your explanation, the current patch is also fine to me. these comments are minor. > RM unable to start in secure mode > - > > Key: YARN-6072 > URL: https://issues.apache.org/jira/browse/YARN-6072 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0, 3.0.0-alpha2 >Reporter: Bibin A Chundatt >Assignee: Ajith S >Priority: Blocker > Attachments: YARN-6072.01.branch-2.8.patch, > YARN-6072.01.branch-2.patch, YARN-6072.01.patch, YARN-6072.02.patch, > hadoop-secureuser-resourcemanager-vm1.log > > > Resource manager is unable to start in secure mode > {code} > 2017-01-08 14:27:29,917 INFO org.apache.hadoop.conf.Configuration: found > resource hadoop-policy.xml at > file:/opt/hadoop/release/hadoop-3.0.0-alpha2-SNAPSHOT/etc/hadoop/hadoop-policy.xml > 2017-01-08 14:27:29,918 INFO > org.apache.hadoop.yarn.server.resourcemanager.AdminService: Refresh All > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:569) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:552) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:707) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > 2017-01-08 14:27:29,919 ERROR > org.apache.hadoop.yarn.server.resourcemanager.AdminService: RefreshAll failed > so firing fatal event > org.apache.hadoop.ha.ServiceFailedException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > 2017-01-08 14:27:29,920 INFO org.apache.hadoop.ipc.Server: Starting Socket > Reader #1 for port 8033 > 2017-01-08 14:27:29,948 WARN org.apache.hadoop.ha.ActiveStandbyElector: > Exception handling the winning of election > 
org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error on refreshAll > during transition to Active > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:311) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > ... 4 more > Caused by: org.apache.hadoop.ha.ServiceFailedException > at > org.apache.hadoop.yarn.server.r
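The wrapping suggested there, keeping the cause for the stack trace while giving the wrapper a message that names the failing operation instead of duplicating ex.getMessage(), would look roughly like this; the message text is only an example.
{code}
import org.apache.hadoop.ha.ServiceFailedException;

// Illustrative sketch of the exception-wrapping suggestion above.
public class RefreshAllWrapSketch {
  void transitionToActive() throws ServiceFailedException {
    try {
      refreshAll();
    } catch (Exception ex) {
      // Cause is preserved, message describes the current operation.
      throw new ServiceFailedException("RefreshAll operation failed", ex);
    }
  }

  void refreshAll() throws Exception {
    // placeholder standing in for AdminService#refreshAll
  }
}
{code}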
[jira] [Commented] (YARN-6072) RM unable to start in secure mode
[ https://issues.apache.org/jira/browse/YARN-6072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15815842#comment-15815842 ] Jian He commented on YARN-6072: --- - If HA is not enabled, this call will be adding 'null' elector ? I think we can either move the entire elector creation code after add admin service, or move add admin service before adding elector. {code} // elector to be added post adminservice addIfService(elector); {code} - I think, the ex.getMessage will just be duplicated in the log trace ? In addition to add the ex variable, may be replace ex.getMessage() with a more meaningful message for current call only {code} throw new ServiceFailedException(ex.getMessage(), ex); {code} > RM unable to start in secure mode > - > > Key: YARN-6072 > URL: https://issues.apache.org/jira/browse/YARN-6072 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0, 3.0.0-alpha2 >Reporter: Bibin A Chundatt >Assignee: Ajith S >Priority: Blocker > Attachments: YARN-6072.01.branch-2.8.patch, > YARN-6072.01.branch-2.patch, YARN-6072.01.patch, YARN-6072.02.patch, > hadoop-secureuser-resourcemanager-vm1.log > > > Resource manager is unable to start in secure mode > {code} > 2017-01-08 14:27:29,917 INFO org.apache.hadoop.conf.Configuration: found > resource hadoop-policy.xml at > file:/opt/hadoop/release/hadoop-3.0.0-alpha2-SNAPSHOT/etc/hadoop/hadoop-policy.xml > 2017-01-08 14:27:29,918 INFO > org.apache.hadoop.yarn.server.resourcemanager.AdminService: Refresh All > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:569) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:552) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:707) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > 2017-01-08 14:27:29,919 ERROR > org.apache.hadoop.yarn.server.resourcemanager.AdminService: RefreshAll failed > so firing fatal event > org.apache.hadoop.ha.ServiceFailedException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > 2017-01-08 14:27:29,920 INFO org.apache.hadoop.ipc.Server: Starting Socket > Reader #1 for port 8033 > 2017-01-08 14:27:29,948 WARN 
org.apache.hadoop.ha.ActiveStandbyElector: > Exception handling the winning of election > org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error on refreshAll > during transition to Active > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:311) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > ... 4 more > Caused b
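To make the two suggestions above concrete, here is a rough sketch of what the reordered initialization and the rewrapped exception could look like. This is only an illustration, not the attached patch: the surrounding serviceInit structure and the names createAdminService, createEmbeddedElector and setLeaderElectorService are assumptions based on the discussion, and the exact message text is arbitrary.
{code}
// Sketch: create and add the elector only when HA is enabled, and only after
// AdminService has been added, so a null elector is never registered and the
// admin service exists by the time the elector wins the election.
adminService = createAdminService();
addService(adminService);
if (this.rmContext.isHAEnabled()) {
  elector = createEmbeddedElector();
  addIfService(elector);
  rmContext.setLeaderElectorService(elector);
}

// Sketch: keep the original exception as the cause, but describe this call
// instead of duplicating ex.getMessage() in the log trace.
throw new ServiceFailedException("RefreshAll failed while transitioning to Active", ex);
{code}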
[jira] [Commented] (YARN-5995) Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition performance
[ https://issues.apache.org/jira/browse/YARN-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813903#comment-15813903 ] Jian He commented on YARN-5995: --- sorry, I meant that an external tool such as the Ambari Metrics Server can store these metrics, as long as the RM emits them in a suitable form. I don't actually mean to store these metrics in this jira. > Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition > performance > --- > > Key: YARN-5995 > URL: https://issues.apache.org/jira/browse/YARN-5995 > Project: Hadoop YARN > Issue Type: Improvement > Components: metrics, resourcemanager >Affects Versions: 2.7.1 > Environment: CentOS7.2 Hadoop-2.7.1 >Reporter: zhangyubiao >Assignee: zhangyubiao > Labels: patch > Attachments: YARN-5995.0001.patch, YARN-5995.0002.patch, > YARN-5995.patch > > > Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition > performance -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6072) RM unable to start in secure mode
[ https://issues.apache.org/jira/browse/YARN-6072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15812903#comment-15812903 ] Jian He commented on YARN-6072: --- YARN-5709 actually affected the sequence of start. Before YARN-5709, ActiveStandbyElector is created inside AdminService, so it is guaranteed that the server variable is instantiated before ActiveStandbyElector is started. After YARN-5709, this is not the case any more. > RM unable to start in secure mode > - > > Key: YARN-6072 > URL: https://issues.apache.org/jira/browse/YARN-6072 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0, 3.0.0-alpha2 >Reporter: Bibin A Chundatt >Assignee: Ajith S >Priority: Blocker > Attachments: hadoop-secureuser-resourcemanager-vm1.log > > > Resource manager is unable to start in secure mode > {code} > 2017-01-08 14:27:29,917 INFO org.apache.hadoop.conf.Configuration: found > resource hadoop-policy.xml at > file:/opt/hadoop/release/hadoop-3.0.0-alpha2-SNAPSHOT/etc/hadoop/hadoop-policy.xml > 2017-01-08 14:27:29,918 INFO > org.apache.hadoop.yarn.server.resourcemanager.AdminService: Refresh All > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:569) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:552) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:707) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > 2017-01-08 14:27:29,919 ERROR > org.apache.hadoop.yarn.server.resourcemanager.AdminService: RefreshAll failed > so firing fatal event > org.apache.hadoop.ha.ServiceFailedException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > 2017-01-08 14:27:29,920 INFO org.apache.hadoop.ipc.Server: Starting Socket > Reader #1 for port 8033 > 2017-01-08 14:27:29,948 WARN org.apache.hadoop.ha.ActiveStandbyElector: > Exception handling the winning of election > org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144) > at > 
org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error on refreshAll > during transition to Active > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:311) > at > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142) > ... 4 more > Caused by: org.apache.hadoop.ha.ServiceFailedException > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302) > ... 5 more > {code} > ResourceManager services are added in following order > # EmbeddedElector > # AdminServ
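The start-ordering point is easy to see with a toy CompositeService, since child services start in the order they were added. The class below is purely illustrative (StartOrderDemo and Child are made-up names, not RM code): with the elector added first, it can win the election and call back into an AdminService whose RPC server does not exist yet, which matches the NPE above.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.service.AbstractService;
import org.apache.hadoop.service.CompositeService;

// Child services of a CompositeService start in the order they were added.
public class StartOrderDemo extends CompositeService {
  private static class Child extends AbstractService {
    Child(String name) { super(name); }
    @Override
    protected void serviceStart() {
      System.out.println("started " + getName());
    }
  }

  public StartOrderDemo() {
    super("StartOrderDemo");
    addService(new Child("elector"));       // added first  -> started first
    addService(new Child("admin-service")); // added second -> started second
  }

  public static void main(String[] args) {
    StartOrderDemo demo = new StartOrderDemo();
    demo.init(new Configuration());
    demo.start(); // prints "started elector" before "started admin-service"
    demo.stop();
  }
}
{code}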
[jira] [Commented] (YARN-5995) Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition performance
[ https://issues.apache.org/jira/browse/YARN-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15812676#comment-15812676 ] Jian He commented on YARN-5995: --- Actually, this would be most useful as a time-series metric that an external framework can use to show the values over time - to show when the RM incurs high write latencies, since we always do postmortem analysis. If so, we only need to output the absolute value of 'time cost for each store op' or 'amount of data written for each op'; an external tool can then use these metrics to plot them over time. > Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition > performance > --- > > Key: YARN-5995 > URL: https://issues.apache.org/jira/browse/YARN-5995 > Project: Hadoop YARN > Issue Type: Improvement > Components: metrics, resourcemanager >Affects Versions: 2.7.1 > Environment: CentOS7.2 Hadoop-2.7.1 >Reporter: zhangyubiao >Assignee: zhangyubiao > Labels: patch > Attachments: YARN-5995.0001.patch, YARN-5995.0002.patch, > YARN-5995.patch > > > Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition > performance -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5995) Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition performance
[ https://issues.apache.org/jira/browse/YARN-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15812620#comment-15812620 ] Jian He commented on YARN-5995: --- As said earlier, read metrics won't be that useful because reads only happen on RM startup to load the data. It's a one-time value which does not require metrics. IMO, we need to think about how the metrics can actually be used for performance analysis, that is, how much the store operation can affect RM's execution, i.e. how much delay it can incur. Metrics like data written per second look more like a measure of ZK throughput, which may not be that useful. I think what we need is to surface the time spent on each write operation. With that in mind, we may have 1) a histogram of the time spent for each write op? 2) the total number of write operations > Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition > performance > --- > > Key: YARN-5995 > URL: https://issues.apache.org/jira/browse/YARN-5995 > Project: Hadoop YARN > Issue Type: Improvement > Components: metrics, resourcemanager >Affects Versions: 2.7.1 > Environment: CentOS7.2 Hadoop-2.7.1 >Reporter: zhangyubiao >Assignee: zhangyubiao > Labels: patch > Attachments: YARN-5995.0001.patch, YARN-5995.0002.patch, > YARN-5995.patch > > > Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition > performance -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
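As a sketch of what such metrics could look like with the hadoop-common metrics2 annotations (the class and metric names below are made up for illustration and are not from the attached patches): a MutableRate already carries both the number of ops and the average time per op, and a failed-op counter can sit next to it.
{code}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
import org.apache.hadoop.metrics2.lib.MutableCounterLong;
import org.apache.hadoop.metrics2.lib.MutableRate;
import org.apache.hadoop.util.Time;

// Illustrative only: class and metric names are made up, not from the patches.
@Metrics(context = "yarn")
public class RMStateStoreOpMetrics {
  // number of ops and average time per write op; a quantiles/histogram metric
  // could be added in the same style if percentiles are wanted
  @Metric("Latency of state store write ops")
  MutableRate writeLatency;

  @Metric("Number of failed state store write ops")
  MutableCounterLong numFailedWriteOps;

  public static RMStateStoreOpMetrics create() {
    return DefaultMetricsSystem.instance().register("RMStateStoreOpMetrics",
        "RM state store operation metrics", new RMStateStoreOpMetrics());
  }

  // wrap a single store write and record how long it took
  public void recordWrite(Runnable storeOp) {
    long start = Time.monotonicNow();
    try {
      storeOp.run();
    } catch (RuntimeException e) {
      // failed ops are counted but not included in the latency average
      numFailedWriteOps.incr();
      throw e;
    }
    writeLatency.add(Time.monotonicNow() - start);
  }
}
{code}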
[jira] [Commented] (YARN-6009) RM fails to start during an upgrade - Failed to load/recover state (YarnException: Invalid application timeout, value=0 for type=LIFETIME)
[ https://issues.apache.org/jira/browse/YARN-6009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15805302#comment-15805302 ] Jian He commented on YARN-6009: --- makes sense to me, I'll commit later today if no more comments > RM fails to start during an upgrade - Failed to load/recover state > (YarnException: Invalid application timeout, value=0 for type=LIFETIME) > -- > > Key: YARN-6009 > URL: https://issues.apache.org/jira/browse/YARN-6009 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Gour Saha >Assignee: Rohith Sharma K S >Priority: Critical > Attachments: YARN-6009.01.patch > > > ResourceManager fails to start during an upgrade with the following > exceptions - > Exception 1: > {color:red} > {code} > 2016-12-09 14:57:23,508 INFO capacity.CapacityScheduler > (CapacityScheduler.java:initScheduler(328)) - Initialized CapacityScheduler > with calculator=class > org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator, > minimumAllocation=<>, > maximumAllocation=<>, asynchronousScheduling=false, > asyncScheduleInterval=5ms > 2016-12-09 14:57:23,509 WARN ha.ActiveStandbyElector > (ActiveStandbyElector.java:becomeActive(863)) - Exception handling the > winning of election > org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active > at > org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:129) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:859) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:463) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:611) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510) > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when > transitioning to Active mode > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:318) > at > org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:127) > ... 4 more > Caused by: org.apache.hadoop.service.ServiceStateException: > org.apache.hadoop.yarn.exceptions.YarnException: Invalid application timeout, > value=0 for type=LIFETIME > at > org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:204) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:991) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1032) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1028) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1028) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:313) > ... 
5 more > Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Invalid > application timeout, value=0 for type=LIFETIME > at > org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.validateApplicationTimeouts(RMServerUtils.java:305) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:365) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:330) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:463) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1184) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:594) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > ... 13 more > {code} > {color} > Exception 2: > {color:red} > {code} > 2016-12-09 14:57:26,162 INFO rmapp.RMAppImpl (RMAppImpl.java:handle(790)) - > application_1477927786494_0008 State change from NEW to FINISHED > 2016-12-09 14:57:26,162 ERROR resourcemanager.ResourceManager > (ResourceManager.java:service
[jira] [Commented] (YARN-4348) ZKRMStateStore.syncInternal shouldn't wait for sync completion for avoiding blocking ZK's event thread
[ https://issues.apache.org/jira/browse/YARN-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15802518#comment-15802518 ] Jian He commented on YARN-4348: --- No, it doesn't need to. The ZK store implementation has been changed to use Curator, 2.8 upwards > ZKRMStateStore.syncInternal shouldn't wait for sync completion for avoiding > blocking ZK's event thread > -- > > Key: YARN-4348 > URL: https://issues.apache.org/jira/browse/YARN-4348 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.2, 2.6.2 >Reporter: Tsuyoshi Ozawa >Assignee: Tsuyoshi Ozawa >Priority: Blocker > Fix For: 2.7.2, 2.6.3 > > Attachments: YARN-4348-branch-2.7.002.patch, > YARN-4348-branch-2.7.003.patch, YARN-4348-branch-2.7.004.patch, > YARN-4348.001.patch, YARN-4348.001.patch, log.txt > > > Jian mentioned that the current internal ZK configuration of ZKRMStateStore > can cause a following situation: > 1. syncInternal timeouts, > 2. but sync succeeded later on. > We should use zkResyncWaitTime as the timeout value. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
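For context, a minimal example of the Curator usage pattern the store moved to (illustrative only; the connection string, retry values and paths are made up, and this is not the ZKRMStateStore code). Curator owns the connection and retry handling, so the store does not need to block the ZK event thread waiting on a manual sync.
{code}
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.RetryNTimes;

public class CuratorStoreSketch {
  public static void main(String[] args) throws Exception {
    // Curator handles connection loss and retries internally
    CuratorFramework client = CuratorFrameworkFactory.newClient(
        "localhost:2181", new RetryNTimes(3, 1000));
    client.start();
    // each operation is retried per the policy; no manual sync/resync wait needed
    client.create().creatingParentsIfNeeded()
        .forPath("/rmstore-sketch/app_0001", new byte[0]);
    client.close();
  }
}
{code}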
[jira] [Commented] (YARN-4164) Retrospect update ApplicationPriority API return type
[ https://issues.apache.org/jira/browse/YARN-4164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15799750#comment-15799750 ] Jian He commented on YARN-4164: --- merged the patch to 2.8 too > Retrospect update ApplicationPriority API return type > - > > Key: YARN-4164 > URL: https://issues.apache.org/jira/browse/YARN-4164 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S > Fix For: 2.8.0, 2.9.0, 3.0.0-alpha1 > > Attachments: 0001-YARN-4164.patch, 0002-YARN-4164.patch, > 0003-YARN-4164.patch, 0004-YARN-4164.patch > > > Currently {{ApplicationClientProtocol#updateApplicationPriority()}} API > returns empty UpdateApplicationPriorityResponse response. > But RM update priority to the cluster.max-priority if the given priority is > greater than cluster.max-priority. In this scenarios, need to intimate back > to client that updated priority rather just keeping quite where client > assumes that given priority itself is taken. > During application submission also has same scenario can happen, but I feel > when > explicitly invoke via ApplicationClientProtocol#updateApplicationPriority(), > response should have updated priority in response. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-4164) Retrospect update ApplicationPriority API return type
[ https://issues.apache.org/jira/browse/YARN-4164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-4164: -- Fix Version/s: 2.8.0 > Retrospect update ApplicationPriority API return type > - > > Key: YARN-4164 > URL: https://issues.apache.org/jira/browse/YARN-4164 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S > Fix For: 2.8.0, 2.9.0, 3.0.0-alpha1 > > Attachments: 0001-YARN-4164.patch, 0002-YARN-4164.patch, > 0003-YARN-4164.patch, 0004-YARN-4164.patch > > > Currently {{ApplicationClientProtocol#updateApplicationPriority()}} API > returns empty UpdateApplicationPriorityResponse response. > But RM update priority to the cluster.max-priority if the given priority is > greater than cluster.max-priority. In this scenarios, need to intimate back > to client that updated priority rather just keeping quite where client > assumes that given priority itself is taken. > During application submission also has same scenario can happen, but I feel > when > explicitly invoke via ApplicationClientProtocol#updateApplicationPriority(), > response should have updated priority in response. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6009) RM fails to start during an upgrade - Failed to load/recover state (YarnException: Invalid application timeout, value=0 for type=LIFETIME)
[ https://issues.apache.org/jira/browse/YARN-6009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15798965#comment-15798965 ] Jian He commented on YARN-6009: --- patch looks good to me. The UT failure passed locally for me. Retry the jenkins > RM fails to start during an upgrade - Failed to load/recover state > (YarnException: Invalid application timeout, value=0 for type=LIFETIME) > -- > > Key: YARN-6009 > URL: https://issues.apache.org/jira/browse/YARN-6009 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Gour Saha >Assignee: Rohith Sharma K S >Priority: Critical > Attachments: YARN-6009.01.patch > > > ResourceManager fails to start during an upgrade with the following > exceptions - > Exception 1: > {color:red} > {code} > 2016-12-09 14:57:23,508 INFO capacity.CapacityScheduler > (CapacityScheduler.java:initScheduler(328)) - Initialized CapacityScheduler > with calculator=class > org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator, > minimumAllocation=<>, > maximumAllocation=<>, asynchronousScheduling=false, > asyncScheduleInterval=5ms > 2016-12-09 14:57:23,509 WARN ha.ActiveStandbyElector > (ActiveStandbyElector.java:becomeActive(863)) - Exception handling the > winning of election > org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active > at > org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:129) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:859) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:463) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:611) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510) > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when > transitioning to Active mode > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:318) > at > org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:127) > ... 4 more > Caused by: org.apache.hadoop.service.ServiceStateException: > org.apache.hadoop.yarn.exceptions.YarnException: Invalid application timeout, > value=0 for type=LIFETIME > at > org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:204) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:991) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1032) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1028) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1028) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:313) > ... 
5 more > Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Invalid > application timeout, value=0 for type=LIFETIME > at > org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.validateApplicationTimeouts(RMServerUtils.java:305) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:365) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:330) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:463) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1184) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:594) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > ... 13 more > {code} > {color} > Exception 2: > {color:red} > {code} > 2016-12-09 14:57:26,162 INFO rmapp.RMAppImpl (RMAppImpl.java:handle(790)) - > application_1477927786494_0008 State change from NEW to FINISHED > 2016-12-09 14:57:26,162 ERROR resourcemanager.ResourceManager > (ResourceMan
[jira] [Commented] (YARN-5709) Cleanup leader election configs and pluggability
[ https://issues.apache.org/jira/browse/YARN-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15786313#comment-15786313 ] Jian He commented on YARN-5709: --- ok, looks like so, I'll commit this then > Cleanup leader election configs and pluggability > > > Key: YARN-5709 > URL: https://issues.apache.org/jira/browse/YARN-5709 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.8.0 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla >Priority: Critical > Fix For: 2.9.0, 3.0.0-alpha2 > > Attachments: yarn-5709-branch-2.8.01.patch, > yarn-5709-branch-2.8.02.patch, yarn-5709-branch-2.8.03.patch, > yarn-5709-branch-2.8.patch, yarn-5709-wip.2.patch, yarn-5709.1.patch, > yarn-5709.2.patch, yarn-5709.3.patch, yarn-5709.4.patch > > > While reviewing YARN-5677 and YARN-5694, I noticed we could make the > curator-based election code cleaner. It is nicer to get this fixed in 2.8 > before we ship it, but this can be done at a later time as well. > # By EmbeddedElector, we meant it was running as part of the RM daemon. Since > the Curator-based elector is also running embedded, I feel the code should be > checking for {{!curatorBased}} instead of {{isEmbeddedElector}} > # {{LeaderElectorService}} should probably be named > {{CuratorBasedEmbeddedElectorService}} or some such. > # The code that initializes the elector should be at the same place > irrespective of whether it is curator-based or not. > # We seem to be caching the CuratorFramework instance in RM. It makes more > sense for it to be in RMContext. If others are okay with it, we might even be > better of having {{RMContext#getCurator()}} method to lazily create the > curator framework and then cache it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5658) YARN should have a hook to delete a path from HDFS when an application ends
[ https://issues.apache.org/jira/browse/YARN-5658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781518#comment-15781518 ] Jian He commented on YARN-5658: --- [~templedf], not just HDFS, allowing deleting a path from ZK is also a required use-case for the yarn-service-registry, so the implementation should be somewhat generic. I think an option to clean up a path is useful. One approach in my mind is to leverage the getApplicationsToCleanup signal sent in the node heartbeat when the application finally completes, after which the NM where the AM container ran could do the post-completion cleanup. The difference from YARN-2261 is that instead of running in a separate container, it could be run from the NodeManager. And this approach does not require significant code changes in the application. YARN-2261 could still be used for more advanced use-cases which the AM requires. The problem with this approach is that if the NM crashes, the files may not get cleaned up, though YARN-2261 has the same problem. For simplicity, maybe we can allow this to occur, warn the user in the UI that the cleanup did not complete successfully, and ask the user to do it manually. thoughts? > YARN should have a hook to delete a path from HDFS when an application ends > --- > > Key: YARN-5658 > URL: https://issues.apache.org/jira/browse/YARN-5658 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Daniel Templeton >Assignee: Daniel Templeton > > There are many cases when a client uploads data to HDFS and then needs to > subsequently clean it up, such as with the distributed cache. It would be > helpful if YARN would do that cleanup automatically on job completion. > The hook could be generic to an URI supported by {{FileSystem}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
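A purely illustrative sketch of the NM-side cleanup idea (the class and method names are hypothetical, not existing YARN code). It only covers FileSystem-backed URIs; a ZK/service-registry path would need a separate handler, which is why the interface should stay generic. On failure it just logs, matching the "warn the user and let them clean up manually" fallback above.
{code}
import java.io.IOException;
import java.util.List;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.yarn.api.records.ApplicationId;

// Hypothetical NM-side helper: invoked when the RM reports the application as
// finished (the getApplicationsToCleanup signal in the heartbeat response).
public class AppPathCleanup {
  private static final Log LOG = LogFactory.getLog(AppPathCleanup.class);
  private final Configuration conf;

  public AppPathCleanup(Configuration conf) {
    this.conf = conf;
  }

  // pathsToDelete would come from the app's submission context in the real design
  public void onApplicationCompleted(ApplicationId appId, List<Path> pathsToDelete) {
    for (Path p : pathsToDelete) {
      try {
        FileSystem fs = p.getFileSystem(conf);
        if (!fs.delete(p, true)) {
          LOG.warn("Cleanup of " + p + " for " + appId + " did not delete anything");
        }
      } catch (IOException e) {
        // if the delete fails (or the NM dies), surface it so the user can clean up manually
        LOG.warn("Failed to clean up " + p + " for " + appId, e);
      }
    }
  }
}
{code}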
[jira] [Commented] (YARN-5709) Cleanup leader election configs and pluggability
[ https://issues.apache.org/jira/browse/YARN-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781487#comment-15781487 ] Jian He commented on YARN-5709: --- Could you re-submit the patch with your change and retry ? > Cleanup leader election configs and pluggability > > > Key: YARN-5709 > URL: https://issues.apache.org/jira/browse/YARN-5709 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.8.0 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla >Priority: Critical > Fix For: 2.9.0, 3.0.0-alpha2 > > Attachments: yarn-5709-branch-2.8.01.patch, > yarn-5709-branch-2.8.02.patch, yarn-5709-branch-2.8.patch, > yarn-5709-wip.2.patch, yarn-5709.1.patch, yarn-5709.2.patch, > yarn-5709.3.patch, yarn-5709.4.patch > > > While reviewing YARN-5677 and YARN-5694, I noticed we could make the > curator-based election code cleaner. It is nicer to get this fixed in 2.8 > before we ship it, but this can be done at a later time as well. > # By EmbeddedElector, we meant it was running as part of the RM daemon. Since > the Curator-based elector is also running embedded, I feel the code should be > checking for {{!curatorBased}} instead of {{isEmbeddedElector}} > # {{LeaderElectorService}} should probably be named > {{CuratorBasedEmbeddedElectorService}} or some such. > # The code that initializes the elector should be at the same place > irrespective of whether it is curator-based or not. > # We seem to be caching the CuratorFramework instance in RM. It makes more > sense for it to be in RMContext. If others are okay with it, we might even be > better of having {{RMContext#getCurator()}} method to lazily create the > curator framework and then cache it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5709) Cleanup leader election configs and pluggability
[ https://issues.apache.org/jira/browse/YARN-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781367#comment-15781367 ] Jian He commented on YARN-5709: --- I'm not sure which part of the patch is causing javadoc failure, [~templedf], [~kasha], any clue? > Cleanup leader election configs and pluggability > > > Key: YARN-5709 > URL: https://issues.apache.org/jira/browse/YARN-5709 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.8.0 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla >Priority: Critical > Fix For: 2.9.0, 3.0.0-alpha2 > > Attachments: yarn-5709-branch-2.8.01.patch, > yarn-5709-branch-2.8.02.patch, yarn-5709-branch-2.8.patch, > yarn-5709-wip.2.patch, yarn-5709.1.patch, yarn-5709.2.patch, > yarn-5709.3.patch, yarn-5709.4.patch > > > While reviewing YARN-5677 and YARN-5694, I noticed we could make the > curator-based election code cleaner. It is nicer to get this fixed in 2.8 > before we ship it, but this can be done at a later time as well. > # By EmbeddedElector, we meant it was running as part of the RM daemon. Since > the Curator-based elector is also running embedded, I feel the code should be > checking for {{!curatorBased}} instead of {{isEmbeddedElector}} > # {{LeaderElectorService}} should probably be named > {{CuratorBasedEmbeddedElectorService}} or some such. > # The code that initializes the elector should be at the same place > irrespective of whether it is curator-based or not. > # We seem to be caching the CuratorFramework instance in RM. It makes more > sense for it to be in RMContext. If others are okay with it, we might even be > better of having {{RMContext#getCurator()}} method to lazily create the > curator framework and then cache it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5709) Cleanup leader election configs and pluggability
[ https://issues.apache.org/jira/browse/YARN-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-5709: -- Attachment: yarn-5709-branch-2.8.02.patch > Cleanup leader election configs and pluggability > > > Key: YARN-5709 > URL: https://issues.apache.org/jira/browse/YARN-5709 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.8.0 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla >Priority: Critical > Fix For: 2.9.0, 3.0.0-alpha2 > > Attachments: yarn-5709-branch-2.8.01.patch, > yarn-5709-branch-2.8.02.patch, yarn-5709-branch-2.8.patch, > yarn-5709-wip.2.patch, yarn-5709.1.patch, yarn-5709.2.patch, > yarn-5709.3.patch, yarn-5709.4.patch > > > While reviewing YARN-5677 and YARN-5694, I noticed we could make the > curator-based election code cleaner. It is nicer to get this fixed in 2.8 > before we ship it, but this can be done at a later time as well. > # By EmbeddedElector, we meant it was running as part of the RM daemon. Since > the Curator-based elector is also running embedded, I feel the code should be > checking for {{!curatorBased}} instead of {{isEmbeddedElector}} > # {{LeaderElectorService}} should probably be named > {{CuratorBasedEmbeddedElectorService}} or some such. > # The code that initializes the elector should be at the same place > irrespective of whether it is curator-based or not. > # We seem to be caching the CuratorFramework instance in RM. It makes more > sense for it to be in RMContext. If others are okay with it, we might even be > better of having {{RMContext#getCurator()}} method to lazily create the > curator framework and then cache it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5709) Cleanup leader election configs and pluggability
[ https://issues.apache.org/jira/browse/YARN-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15771301#comment-15771301 ] Jian He commented on YARN-5709: --- I fixed the javac warnings; the javadoc warnings seem to be pre-existing. > Cleanup leader election configs and pluggability > > > Key: YARN-5709 > URL: https://issues.apache.org/jira/browse/YARN-5709 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.8.0 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla >Priority: Critical > Fix For: 2.9.0, 3.0.0-alpha2 > > Attachments: yarn-5709-branch-2.8.01.patch, > yarn-5709-branch-2.8.patch, yarn-5709-wip.2.patch, yarn-5709.1.patch, > yarn-5709.2.patch, yarn-5709.3.patch, yarn-5709.4.patch > > > While reviewing YARN-5677 and YARN-5694, I noticed we could make the > curator-based election code cleaner. It is nicer to get this fixed in 2.8 > before we ship it, but this can be done at a later time as well. > # By EmbeddedElector, we meant it was running as part of the RM daemon. Since > the Curator-based elector is also running embedded, I feel the code should be > checking for {{!curatorBased}} instead of {{isEmbeddedElector}} > # {{LeaderElectorService}} should probably be named > {{CuratorBasedEmbeddedElectorService}} or some such. > # The code that initializes the elector should be at the same place > irrespective of whether it is curator-based or not. > # We seem to be caching the CuratorFramework instance in RM. It makes more > sense for it to be in RMContext. If others are okay with it, we might even be > better of having {{RMContext#getCurator()}} method to lazily create the > curator framework and then cache it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5709) Cleanup leader election configs and pluggability
[ https://issues.apache.org/jira/browse/YARN-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-5709: -- Attachment: yarn-5709-branch-2.8.01.patch > Cleanup leader election configs and pluggability > > > Key: YARN-5709 > URL: https://issues.apache.org/jira/browse/YARN-5709 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.8.0 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla >Priority: Critical > Fix For: 2.9.0, 3.0.0-alpha2 > > Attachments: yarn-5709-branch-2.8.01.patch, > yarn-5709-branch-2.8.patch, yarn-5709-wip.2.patch, yarn-5709.1.patch, > yarn-5709.2.patch, yarn-5709.3.patch, yarn-5709.4.patch > > > While reviewing YARN-5677 and YARN-5694, I noticed we could make the > curator-based election code cleaner. It is nicer to get this fixed in 2.8 > before we ship it, but this can be done at a later time as well. > # By EmbeddedElector, we meant it was running as part of the RM daemon. Since > the Curator-based elector is also running embedded, I feel the code should be > checking for {{!curatorBased}} instead of {{isEmbeddedElector}} > # {{LeaderElectorService}} should probably be named > {{CuratorBasedEmbeddedElectorService}} or some such. > # The code that initializes the elector should be at the same place > irrespective of whether it is curator-based or not. > # We seem to be caching the CuratorFramework instance in RM. It makes more > sense for it to be in RMContext. If others are okay with it, we might even be > better of having {{RMContext#getCurator()}} method to lazily create the > curator framework and then cache it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5924) Resource Manager fails to load state with InvalidProtocolBufferException
[ https://issues.apache.org/jira/browse/YARN-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-5924: -- Assignee: Oleksii Dymytrov > Resource Manager fails to load state with InvalidProtocolBufferException > > > Key: YARN-5924 > URL: https://issues.apache.org/jira/browse/YARN-5924 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 3.0.0-alpha1 >Reporter: Oleksii Dymytrov >Assignee: Oleksii Dymytrov > Attachments: YARN-5924.002.patch > > > InvalidProtocolBufferException is thrown during recovering of the > application's state if application's data has invalid format (or is broken) > under FSRMStateRoot/RMAppRoot/application_1477986176766_0134/ directory in > HDFS: > {noformat} > com.google.protobuf.InvalidProtocolBufferException: Protocol message > end-group tag did not match expected tag. > at > com.google.protobuf.InvalidProtocolBufferException.invalidEndTag(InvalidProtocolBufferException.java:94) > at > com.google.protobuf.CodedInputStream.checkLastTagWas(CodedInputStream.java:124) > at > com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:143) > at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:176) > at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:188) > at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:193) > at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49) > at > org.apache.hadoop.yarn.proto.YarnServerResourceManagerRecoveryProtos$ApplicationStateDataProto.parseFrom(YarnServerResourceManagerRecoveryProtos.java:1028) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore$RMAppStateFileProcessor.processChildNode(FileSystemRMStateStore.java:966) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.processDirectoriesOfFiles(FileSystemRMStateStore.java:317) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.loadRMAppState(FileSystemRMStateStore.java:281) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.loadState(FileSystemRMStateStore.java:232) > {noformat} > The solution can be to catch "InvalidProtocolBufferException", show warning > and remove application's folder that contains invalid data to prevent RM > restart failure. > Additionally, I've added catch for other exceptions that can appear during > recovering of the specific application, to avoid RM failure even if the only > one application's state can't be loaded. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4757) [Umbrella] Simplified discovery of services via DNS mechanisms
[ https://issues.apache.org/jira/browse/YARN-4757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768534#comment-15768534 ] Jian He commented on YARN-4757: --- This jira now becomes a dependency of YARN-5079, we are going to merge YARN-4757 branch to yarn-native-services branch > [Umbrella] Simplified discovery of services via DNS mechanisms > -- > > Key: YARN-4757 > URL: https://issues.apache.org/jira/browse/YARN-4757 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Vinod Kumar Vavilapalli >Assignee: Jonathan Maron > Labels: oct16-hard > Attachments: > 0001-YARN-4757-Initial-code-submission-for-DNS-Service.patch, YARN-4757- > Simplified discovery of services via DNS mechanisms.pdf, > YARN-4757-YARN-4757.001.patch, YARN-4757-YARN-4757.002.patch, > YARN-4757-YARN-4757.003.patch, YARN-4757-YARN-4757.004.patch, > YARN-4757-YARN-4757.005.patch, YARN-4757.001.patch, YARN-4757.002.patch > > > [See overview doc at YARN-4692, copying the sub-section (3.2.10.2) to track > all related efforts.] > In addition to completing the present story of serviceÂ-registry (YARN-913), > we also need to simplify the access to the registry entries. The existing > read mechanisms of the YARN Service Registry are currently limited to a > registry specific (java) API and a REST interface. In practice, this makes it > very difficult for wiring up existing clients and services. For e.g, dynamic > configuration of dependent endÂpoints of a service is not easy to implement > using the present registryÂ-read mechanisms, *without* code-changes to > existing services. > A good solution to this is to expose the registry information through a more > generic and widely used discovery mechanism: DNS. Service Discovery via DNS > uses the well-Âknown DNS interfaces to browse the network for services. > YARN-913 in fact talked about such a DNS based mechanism but left it as a > future task. (Task) Having the registry information exposed via DNS > simplifies the life of services. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6009) RM fails to start during an upgrade - Failed to load/recover state (YarnException: Invalid application timeout, value=0 for type=LIFETIME)
[ https://issues.apache.org/jira/browse/YARN-6009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15765547#comment-15765547 ] Jian He commented on YARN-6009: --- [~rohithsharma], could you clarify which code logic changed? > RM fails to start during an upgrade - Failed to load/recover state > (YarnException: Invalid application timeout, value=0 for type=LIFETIME) > -- > > Key: YARN-6009 > URL: https://issues.apache.org/jira/browse/YARN-6009 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Gour Saha >Assignee: Rohith Sharma K S >Priority: Critical > > ResourceManager fails to start during an upgrade with the following > exceptions - > Exception 1: > {color:red} > {code} > 2016-12-09 14:57:23,508 INFO capacity.CapacityScheduler > (CapacityScheduler.java:initScheduler(328)) - Initialized CapacityScheduler > with calculator=class > org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator, > minimumAllocation=<>, > maximumAllocation=<>, asynchronousScheduling=false, > asyncScheduleInterval=5ms > 2016-12-09 14:57:23,509 WARN ha.ActiveStandbyElector > (ActiveStandbyElector.java:becomeActive(863)) - Exception handling the > winning of election > org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active > at > org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:129) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:859) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:463) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:611) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510) > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when > transitioning to Active mode > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:318) > at > org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:127) > ... 4 more > Caused by: org.apache.hadoop.service.ServiceStateException: > org.apache.hadoop.yarn.exceptions.YarnException: Invalid application timeout, > value=0 for type=LIFETIME > at > org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:204) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:991) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1032) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1028) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1028) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:313) > ... 
5 more > Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Invalid > application timeout, value=0 for type=LIFETIME > at > org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.validateApplicationTimeouts(RMServerUtils.java:305) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:365) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:330) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:463) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1184) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:594) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > ... 13 more > {code} > {color} > Exception 2: > {color:red} > {code} > 2016-12-09 14:57:26,162 INFO rmapp.RMAppImpl (RMAppImpl.java:handle(790)) - > application_1477927786494_0008 State change from NEW to FINISHED > 2016-12-09 14:57:26,162 ERROR resourcemanager.ResourceManager > (ResourceManager.java:serviceStart(599)) - Failed to load/recover state > o
[jira] [Commented] (YARN-5995) Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition performance
[ https://issues.apache.org/jira/browse/YARN-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15765298#comment-15765298 ] Jian He commented on YARN-5995: --- Agree that a general metric for overall performance, rather than a metric for every single API, will be more useful. I think we can focus on writes first; reads only happen on RM startup. Also, the total number of failed ops may be useful > Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition > performance > --- > > Key: YARN-5995 > URL: https://issues.apache.org/jira/browse/YARN-5995 > Project: Hadoop YARN > Issue Type: Improvement > Components: metrics, resourcemanager >Affects Versions: 2.7.1 > Environment: CentOS7.2 Hadoop-2.7.1 >Reporter: zhangyubiao > Labels: patch > Attachments: YARN-5995.0001.patch, YARN-5995.0002.patch, > YARN-5995.patch > > > Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition > performance -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-5910) Support for multi-cluster delegation tokens
[ https://issues.apache.org/jira/browse/YARN-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15765017#comment-15765017 ] Jian He edited comment on YARN-5910 at 12/20/16 7:23 PM: - Uploaded an in-process patch which uses the approach of making client send the jobConf to RM, RM DelegationTokenRenewer will renew the token using the app conf A flag is added in MR to indicate whether sending the conf or not. was (Author: jianhe): Uploaded an in-process patch which uses the approach of making client send the jobConf to RM, RM DelegationTokenRenewer will renew the token using the app conf > Support for multi-cluster delegation tokens > --- > > Key: YARN-5910 > URL: https://issues.apache.org/jira/browse/YARN-5910 > Project: Hadoop YARN > Issue Type: New Feature > Components: security >Reporter: Clay B. >Assignee: Jian He >Priority: Minor > Attachments: YARN-5910.01.patch > > > As an administrator running many secure (kerberized) clusters, some which > have peer clusters managed by other teams, I am looking for a way to run jobs > which may require services running on other clusters. Particular cases where > this rears itself are running something as core as a distcp between two > kerberized clusters (e.g. {{hadoop --config /home/user292/conf/ distcp > hdfs://LOCALCLUSTER/user/user292/test.out > hdfs://REMOTECLUSTER/user/user292/test.out.result}}). > Thanks to YARN-3021, once can run for a while but if the delegation token for > the remote cluster needs renewal the job will fail[1]. One can pre-configure > their {{hdfs-site.xml}} loaded by the YARN RM to know of all possible HDFSes > available but that requires coordination that is not always feasible, > especially as a cluster's peers grow into the tens of clusters or across > management teams. Ideally, one could have core systems configured this way > but jobs could also specify their own handling of tokens and management when > needed? > [1]: Example stack trace when the RM is unaware of a remote service: > > {code} > 2016-03-23 14:59:50,528 INFO > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > application_1458441356031_3317 found existing hdfs token Kind: > HDFS_DELEGATION_TOKEN, Service: ha-hdfs:REMOTECLUSTER, Ident: > (HDFS_DELEGATION_TOKEN token > 10927 for user292) > 2016-03-23 14:59:50,557 WARN > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > Unable to add the application to the delegation token renewer. 
> java.io.IOException: Failed to renew token: Kind: HDFS_DELEGATION_TOKEN, > Service: ha-hdfs:REMOTECLUSTER, Ident: (HDFS_DELEGATION_TOKEN token 10927 for > user292) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:427) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$700(DelegationTokenRenewer.java:78) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:781) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:762) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:744) > Caused by: java.io.IOException: Unable to map logical nameservice URI > 'hdfs://REMOTECLUSTER' to a NameNode. Local configuration does not have a > failover proxy provider configured. > at org.apache.hadoop.hdfs.DFSClient$Renewer.getNNProxy(DFSClient.java:1164) > at org.apache.hadoop.hdfs.DFSClient$Renewer.renew(DFSClient.java:1128) > at org.apache.hadoop.security.token.Token.renew(Token.java:377) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:516) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:513) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.renewToken(DelegationTokenRenewer.java:511) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:425) > ... 6 more > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) -
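To illustrate the approach described in the comment above (a sketch only, not the attached YARN-5910.01.patch): the renewer would build a Configuration from what the client shipped and call Token#renew with it, so a logical nameservice such as hdfs://REMOTECLUSTER can be resolved during renewal. How the submitted conf reaches the RM (appConfResource below) is a hypothetical stand-in.
{code}
import java.io.InputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.token.Token;

public class RenewWithAppConf {
  // rmConf: the RM's own configuration; appConfResource: conf shipped by the client
  public static void renewAll(Configuration rmConf, InputStream appConfResource,
      Credentials credentials) throws Exception {
    Configuration appConf = new Configuration(rmConf);
    appConf.addResource(appConfResource); // layer the app-provided settings on top
    for (Token<?> token : credentials.getAllTokens()) {
      // renewing with the app conf lets remote-cluster nameservices resolve
      long newExpiration = token.renew(appConf);
      System.out.println(token.getKind() + " renewed until " + newExpiration);
    }
  }
}
{code}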
[jira] [Assigned] (YARN-5910) Support for multi-cluster delegation tokens
[ https://issues.apache.org/jira/browse/YARN-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He reassigned YARN-5910: - Assignee: Jian He > Support for multi-cluster delegation tokens > --- > > Key: YARN-5910 > URL: https://issues.apache.org/jira/browse/YARN-5910 > Project: Hadoop YARN > Issue Type: New Feature > Components: security >Reporter: Clay B. >Assignee: Jian He >Priority: Minor > Attachments: YARN-5910.01.patch > > > As an administrator running many secure (kerberized) clusters, some which > have peer clusters managed by other teams, I am looking for a way to run jobs > which may require services running on other clusters. Particular cases where > this rears itself are running something as core as a distcp between two > kerberized clusters (e.g. {{hadoop --config /home/user292/conf/ distcp > hdfs://LOCALCLUSTER/user/user292/test.out > hdfs://REMOTECLUSTER/user/user292/test.out.result}}). > Thanks to YARN-3021, once can run for a while but if the delegation token for > the remote cluster needs renewal the job will fail[1]. One can pre-configure > their {{hdfs-site.xml}} loaded by the YARN RM to know of all possible HDFSes > available but that requires coordination that is not always feasible, > especially as a cluster's peers grow into the tens of clusters or across > management teams. Ideally, one could have core systems configured this way > but jobs could also specify their own handling of tokens and management when > needed? > [1]: Example stack trace when the RM is unaware of a remote service: > > {code} > 2016-03-23 14:59:50,528 INFO > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > application_1458441356031_3317 found existing hdfs token Kind: > HDFS_DELEGATION_TOKEN, Service: ha-hdfs:REMOTECLUSTER, Ident: > (HDFS_DELEGATION_TOKEN token > 10927 for user292) > 2016-03-23 14:59:50,557 WARN > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > Unable to add the application to the delegation token renewer. > java.io.IOException: Failed to renew token: Kind: HDFS_DELEGATION_TOKEN, > Service: ha-hdfs:REMOTECLUSTER, Ident: (HDFS_DELEGATION_TOKEN token 10927 for > user292) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:427) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$700(DelegationTokenRenewer.java:78) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:781) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:762) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:744) > Caused by: java.io.IOException: Unable to map logical nameservice URI > 'hdfs://REMOTECLUSTER' to a NameNode. Local configuration does not have a > failover proxy provider configured. 
> at org.apache.hadoop.hdfs.DFSClient$Renewer.getNNProxy(DFSClient.java:1164) > at org.apache.hadoop.hdfs.DFSClient$Renewer.renew(DFSClient.java:1128) > at org.apache.hadoop.security.token.Token.renew(Token.java:377) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:516) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:513) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.renewToken(DelegationTokenRenewer.java:511) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:425) > ... 6 more > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5910) Support for multi-cluster delegation tokens
[ https://issues.apache.org/jira/browse/YARN-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-5910: -- Attachment: YARN-5910.01.patch Uploaded an in-process patch which uses the approach of making client send the jobConf to RM, RM DelegationTokenRenewer will renew the token using the app conf > Support for multi-cluster delegation tokens > --- > > Key: YARN-5910 > URL: https://issues.apache.org/jira/browse/YARN-5910 > Project: Hadoop YARN > Issue Type: New Feature > Components: security >Reporter: Clay B. >Priority: Minor > Attachments: YARN-5910.01.patch > > > As an administrator running many secure (kerberized) clusters, some which > have peer clusters managed by other teams, I am looking for a way to run jobs > which may require services running on other clusters. Particular cases where > this rears itself are running something as core as a distcp between two > kerberized clusters (e.g. {{hadoop --config /home/user292/conf/ distcp > hdfs://LOCALCLUSTER/user/user292/test.out > hdfs://REMOTECLUSTER/user/user292/test.out.result}}). > Thanks to YARN-3021, once can run for a while but if the delegation token for > the remote cluster needs renewal the job will fail[1]. One can pre-configure > their {{hdfs-site.xml}} loaded by the YARN RM to know of all possible HDFSes > available but that requires coordination that is not always feasible, > especially as a cluster's peers grow into the tens of clusters or across > management teams. Ideally, one could have core systems configured this way > but jobs could also specify their own handling of tokens and management when > needed? > [1]: Example stack trace when the RM is unaware of a remote service: > > {code} > 2016-03-23 14:59:50,528 INFO > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > application_1458441356031_3317 found existing hdfs token Kind: > HDFS_DELEGATION_TOKEN, Service: ha-hdfs:REMOTECLUSTER, Ident: > (HDFS_DELEGATION_TOKEN token > 10927 for user292) > 2016-03-23 14:59:50,557 WARN > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > Unable to add the application to the delegation token renewer. > java.io.IOException: Failed to renew token: Kind: HDFS_DELEGATION_TOKEN, > Service: ha-hdfs:REMOTECLUSTER, Ident: (HDFS_DELEGATION_TOKEN token 10927 for > user292) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:427) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$700(DelegationTokenRenewer.java:78) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:781) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:762) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:744) > Caused by: java.io.IOException: Unable to map logical nameservice URI > 'hdfs://REMOTECLUSTER' to a NameNode. Local configuration does not have a > failover proxy provider configured. 
> at org.apache.hadoop.hdfs.DFSClient$Renewer.getNNProxy(DFSClient.java:1164) > at org.apache.hadoop.hdfs.DFSClient$Renewer.renew(DFSClient.java:1128) > at org.apache.hadoop.security.token.Token.renew(Token.java:377) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:516) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:513) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.renewToken(DelegationTokenRenewer.java:511) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:425) > ... 6 more > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
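Editor's note on the update above: the approach is that the client ships its job configuration to the RM, and the RM's DelegationTokenRenewer renews tokens against that per-application configuration rather than only its own hdfs-site.xml. Below is a minimal, hedged sketch of the renewal side; all class and method names are hypothetical and this is not the contents of YARN-5910.01.patch.
{code}
// Hedged sketch: renew a delegation token with the configuration the client
// submitted alongside the application, so that a logical nameservice such as
// ha-hdfs:REMOTECLUSTER resolves even though the RM's local configuration has
// no failover proxy provider for it. Names here are illustrative only.
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.token.Token;

public class AppConfTokenRenewer {
  public long renew(final Token<?> token, final Configuration appConf,
      UserGroupInformation submitter) throws Exception {
    // Start from the application's conf so remote nameservice mappings win.
    final Configuration renewerConf = new Configuration(appConf);
    return submitter.doAs(new PrivilegedExceptionAction<Long>() {
      @Override
      public Long run() throws Exception {
        return token.renew(renewerConf);
      }
    });
  }
}
{code}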
[jira] [Commented] (YARN-5910) Support for multi-cluster delegation tokens
[ https://issues.apache.org/jira/browse/YARN-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15764997#comment-15764997 ] Jian He commented on YARN-5910: --- Hi Clay, thanks for the feedback. bq. we could also perhaps extend the various delegation token types to only optionally include this payload? Then we the RM would only pay the price when needed for an off-cluster request? We realized that changing existing token structure might have issues regarding compatibility. > Support for multi-cluster delegation tokens > --- > > Key: YARN-5910 > URL: https://issues.apache.org/jira/browse/YARN-5910 > Project: Hadoop YARN > Issue Type: New Feature > Components: security >Reporter: Clay B. >Priority: Minor > > As an administrator running many secure (kerberized) clusters, some which > have peer clusters managed by other teams, I am looking for a way to run jobs > which may require services running on other clusters. Particular cases where > this rears itself are running something as core as a distcp between two > kerberized clusters (e.g. {{hadoop --config /home/user292/conf/ distcp > hdfs://LOCALCLUSTER/user/user292/test.out > hdfs://REMOTECLUSTER/user/user292/test.out.result}}). > Thanks to YARN-3021, once can run for a while but if the delegation token for > the remote cluster needs renewal the job will fail[1]. One can pre-configure > their {{hdfs-site.xml}} loaded by the YARN RM to know of all possible HDFSes > available but that requires coordination that is not always feasible, > especially as a cluster's peers grow into the tens of clusters or across > management teams. Ideally, one could have core systems configured this way > but jobs could also specify their own handling of tokens and management when > needed? > [1]: Example stack trace when the RM is unaware of a remote service: > > {code} > 2016-03-23 14:59:50,528 INFO > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > application_1458441356031_3317 found existing hdfs token Kind: > HDFS_DELEGATION_TOKEN, Service: ha-hdfs:REMOTECLUSTER, Ident: > (HDFS_DELEGATION_TOKEN token > 10927 for user292) > 2016-03-23 14:59:50,557 WARN > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > Unable to add the application to the delegation token renewer. > java.io.IOException: Failed to renew token: Kind: HDFS_DELEGATION_TOKEN, > Service: ha-hdfs:REMOTECLUSTER, Ident: (HDFS_DELEGATION_TOKEN token 10927 for > user292) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:427) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$700(DelegationTokenRenewer.java:78) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:781) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:762) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:744) > Caused by: java.io.IOException: Unable to map logical nameservice URI > 'hdfs://REMOTECLUSTER' to a NameNode. Local configuration does not have a > failover proxy provider configured. 
> at org.apache.hadoop.hdfs.DFSClient$Renewer.getNNProxy(DFSClient.java:1164) > at org.apache.hadoop.hdfs.DFSClient$Renewer.renew(DFSClient.java:1128) > at org.apache.hadoop.security.token.Token.renew(Token.java:377) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:516) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:513) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.renewToken(DelegationTokenRenewer.java:511) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:425) > ... 6 more > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Issue Comment Deleted] (YARN-5910) Support for multi-cluster delegation tokens
[ https://issues.apache.org/jira/browse/YARN-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-5910: -- Comment: was deleted (was: Hi Clay, thanks for the feedback. bq. we could also perhaps extend the various delegation token types to only optionally include this payload? Then we the RM would only pay the price when needed for an off-cluster request? We realized that changing existing token structure might have issues regarding compatibility.) > Support for multi-cluster delegation tokens > --- > > Key: YARN-5910 > URL: https://issues.apache.org/jira/browse/YARN-5910 > Project: Hadoop YARN > Issue Type: New Feature > Components: security >Reporter: Clay B. >Priority: Minor > > As an administrator running many secure (kerberized) clusters, some which > have peer clusters managed by other teams, I am looking for a way to run jobs > which may require services running on other clusters. Particular cases where > this rears itself are running something as core as a distcp between two > kerberized clusters (e.g. {{hadoop --config /home/user292/conf/ distcp > hdfs://LOCALCLUSTER/user/user292/test.out > hdfs://REMOTECLUSTER/user/user292/test.out.result}}). > Thanks to YARN-3021, once can run for a while but if the delegation token for > the remote cluster needs renewal the job will fail[1]. One can pre-configure > their {{hdfs-site.xml}} loaded by the YARN RM to know of all possible HDFSes > available but that requires coordination that is not always feasible, > especially as a cluster's peers grow into the tens of clusters or across > management teams. Ideally, one could have core systems configured this way > but jobs could also specify their own handling of tokens and management when > needed? > [1]: Example stack trace when the RM is unaware of a remote service: > > {code} > 2016-03-23 14:59:50,528 INFO > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > application_1458441356031_3317 found existing hdfs token Kind: > HDFS_DELEGATION_TOKEN, Service: ha-hdfs:REMOTECLUSTER, Ident: > (HDFS_DELEGATION_TOKEN token > 10927 for user292) > 2016-03-23 14:59:50,557 WARN > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > Unable to add the application to the delegation token renewer. > java.io.IOException: Failed to renew token: Kind: HDFS_DELEGATION_TOKEN, > Service: ha-hdfs:REMOTECLUSTER, Ident: (HDFS_DELEGATION_TOKEN token 10927 for > user292) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:427) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$700(DelegationTokenRenewer.java:78) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:781) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:762) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:744) > Caused by: java.io.IOException: Unable to map logical nameservice URI > 'hdfs://REMOTECLUSTER' to a NameNode. Local configuration does not have a > failover proxy provider configured. 
> at org.apache.hadoop.hdfs.DFSClient$Renewer.getNNProxy(DFSClient.java:1164) > at org.apache.hadoop.hdfs.DFSClient$Renewer.renew(DFSClient.java:1128) > at org.apache.hadoop.security.token.Token.renew(Token.java:377) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:516) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:513) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.renewToken(DelegationTokenRenewer.java:511) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:425) > ... 6 more > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5910) Support for multi-cluster delegation tokens
[ https://issues.apache.org/jira/browse/YARN-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15764998#comment-15764998 ] Jian He commented on YARN-5910: --- Hi Clay, thanks for the feedback. bq. we could also perhaps extend the various delegation token types to only optionally include this payload? Then we the RM would only pay the price when needed for an off-cluster request? We realized that changing existing token structure might have issues regarding compatibility. > Support for multi-cluster delegation tokens > --- > > Key: YARN-5910 > URL: https://issues.apache.org/jira/browse/YARN-5910 > Project: Hadoop YARN > Issue Type: New Feature > Components: security >Reporter: Clay B. >Priority: Minor > > As an administrator running many secure (kerberized) clusters, some which > have peer clusters managed by other teams, I am looking for a way to run jobs > which may require services running on other clusters. Particular cases where > this rears itself are running something as core as a distcp between two > kerberized clusters (e.g. {{hadoop --config /home/user292/conf/ distcp > hdfs://LOCALCLUSTER/user/user292/test.out > hdfs://REMOTECLUSTER/user/user292/test.out.result}}). > Thanks to YARN-3021, once can run for a while but if the delegation token for > the remote cluster needs renewal the job will fail[1]. One can pre-configure > their {{hdfs-site.xml}} loaded by the YARN RM to know of all possible HDFSes > available but that requires coordination that is not always feasible, > especially as a cluster's peers grow into the tens of clusters or across > management teams. Ideally, one could have core systems configured this way > but jobs could also specify their own handling of tokens and management when > needed? > [1]: Example stack trace when the RM is unaware of a remote service: > > {code} > 2016-03-23 14:59:50,528 INFO > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > application_1458441356031_3317 found existing hdfs token Kind: > HDFS_DELEGATION_TOKEN, Service: ha-hdfs:REMOTECLUSTER, Ident: > (HDFS_DELEGATION_TOKEN token > 10927 for user292) > 2016-03-23 14:59:50,557 WARN > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > Unable to add the application to the delegation token renewer. > java.io.IOException: Failed to renew token: Kind: HDFS_DELEGATION_TOKEN, > Service: ha-hdfs:REMOTECLUSTER, Ident: (HDFS_DELEGATION_TOKEN token 10927 for > user292) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:427) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$700(DelegationTokenRenewer.java:78) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:781) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:762) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:744) > Caused by: java.io.IOException: Unable to map logical nameservice URI > 'hdfs://REMOTECLUSTER' to a NameNode. Local configuration does not have a > failover proxy provider configured. 
> at org.apache.hadoop.hdfs.DFSClient$Renewer.getNNProxy(DFSClient.java:1164) > at org.apache.hadoop.hdfs.DFSClient$Renewer.renew(DFSClient.java:1128) > at org.apache.hadoop.security.token.Token.renew(Token.java:377) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:516) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:513) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.renewToken(DelegationTokenRenewer.java:511) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:425) > ... 6 more > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
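Editor's note on the compatibility concern above: Hadoop token identifiers are positionally serialized Writables, so appending an optional payload changes the wire format that existing renewers expect. A purely illustrative sketch of why that breaks old readers (this is not the actual HDFS delegation token identifier layout):
{code}
// Illustrative only: Writable serialization is positional, so a reader built
// against the old identifier either misparses or fails once an extra field
// (here a hypothetical confPayload) is appended by newer writers.
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;

public class ExtendedTokenIdent implements Writable {
  private final Text owner = new Text();
  private final Text renewer = new Text();
  private final Text confPayload = new Text();   // hypothetical new field

  @Override
  public void write(DataOutput out) throws IOException {
    owner.write(out);
    renewer.write(out);
    confPayload.write(out);        // bytes an old reader does not expect
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    owner.readFields(in);
    renewer.readFields(in);
    confPayload.readFields(in);    // absent in streams written by old code
  }
}
{code}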
[jira] [Updated] (YARN-6014) Followup fix for slider core module findbugs
[ https://issues.apache.org/jira/browse/YARN-6014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-6014: -- Attachment: YARN-6014-yarn-native-services.02.patch > Followup fix for slider core module findbugs > > > Key: YARN-6014 > URL: https://issues.apache.org/jira/browse/YARN-6014 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-6014-yarn-native-services.01.patch, > YARN-6014-yarn-native-services.02.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6013) ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when RPC privacy is enabled
[ https://issues.apache.org/jira/browse/YARN-6013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15763232#comment-15763232 ] Jian He commented on YARN-6013: --- [~Steven Rand], do you have server side log where this exception happens? > ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when > RPC privacy is enabled > -- > > Key: YARN-6013 > URL: https://issues.apache.org/jira/browse/YARN-6013 > Project: Hadoop YARN > Issue Type: Bug > Components: client, yarn >Affects Versions: 2.8.0 >Reporter: Steven Rand >Priority: Critical > > When privacy is enabled for RPC (hadoop.rpc.protection = privacy), > {{ApplicationMasterProtocolPBClientImpl.allocate}} sometimes (but not always) > fails with an EOFException. I've reproduced this with Spark 2.0.2 built > against latest branch-2.8 and with a simple distcp job on latest branch-2.8. > Steps to reproduce using distcp: > 1. Set hadoop.rpc.protection equal to privacy > 2. Write data to HDFS. I did this with Spark as follows: > {code} > sc.parallelize(1 to (5*1024*1024)).map(k => Seq(k, > org.apache.commons.lang.RandomStringUtils.random(1024, > "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWxyZ0123456789")).mkString("|")).toDF().repartition(100).write.parquet("hdfs:///tmp/testData") > {code} > 3. Attempt to distcp that data to another location in HDFS. For example: > {code} > hadoop distcp -Dmapreduce.framework.name=yarn hdfs:///tmp/testData > hdfs:///tmp/testDataCopy > {code} > I observed this error in the ApplicationMaster's syslog: > {code} > 2016-12-19 19:13:50,097 INFO [eventHandlingThread] > org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Event Writer > setup for JobId: job_1482189777425_0004, File: > hdfs://:8020/tmp/hadoop-yarn/staging//.staging/job_1482189777425_0004/job_1482189777425_0004_1.jhist > 2016-12-19 19:13:51,004 INFO [RMCommunicator Allocator] > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before > Scheduling: PendingReds:0 ScheduledMaps:4 ScheduledReds:0 AssignedMaps:0 > AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:0 ContRel:0 > HostLocal:0 RackLocal:0 > 2016-12-19 19:13:51,031 INFO [RMCommunicator Allocator] > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() > for application_1482189777425_0004: ask=1 release= 0 newContainers=0 > finishedContainers=0 resourcelimit= knownNMs=3 > 2016-12-19 19:13:52,043 INFO [RMCommunicator Allocator] > org.apache.hadoop.io.retry.RetryInvocationHandler: Exception while invoking > ApplicationMasterProtocolPBClientImpl.allocate over null. Retrying after > sleeping for 3ms. 
> java.io.EOFException: End of File Exception between local host is: > "/"; destination host is: "":8030; > : java.io.EOFException; For more details see: > http://wiki.apache.org/hadoop/EOFException > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:422) > at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:801) > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:765) > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1486) > at org.apache.hadoop.ipc.Client.call(Client.java:1428) > at org.apache.hadoop.ipc.Client.call(Client.java:1338) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116) > at com.sun.proxy.$Proxy80.allocate(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:398) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:163) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:155) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95) >
[jira] [Updated] (YARN-6014) Followup fix for slider core module findbugs
[ https://issues.apache.org/jira/browse/YARN-6014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-6014: -- Attachment: YARN-6014-yarn-native-services.01.patch > Followup fix for slider core module findbugs > > > Key: YARN-6014 > URL: https://issues.apache.org/jira/browse/YARN-6014 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-6014-yarn-native-services.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-6014) Followup fix for slider core module findbugs
Jian He created YARN-6014: - Summary: Followup fix for slider core module findbugs Key: YARN-6014 URL: https://issues.apache.org/jira/browse/YARN-6014 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Jian He -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5132) Exclude generated protobuf sources from YARN Javadoc build
[ https://issues.apache.org/jira/browse/YARN-5132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15761892#comment-15761892 ] Jian He commented on YARN-5132: --- [~subru], [~kasha], [~billie.rinaldi] found this, could you please confirm ? seems like an issue. bq. It looks like the maven-javadoc-plugin is not configured properly for the hadoop-yarn module. There are YARN exclusions in the top level pom: https://github.com/apache/hadoop/blob/trunk/pom.xml#L443, and these are overridden in the hadoop-yarn pom: https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/pom.xml#L78 by a subset of the package names. I am not sure if the lack of exclusion of the yarn server and yarn webapp packages was intentional or not. > Exclude generated protobuf sources from YARN Javadoc build > -- > > Key: YARN-5132 > URL: https://issues.apache.org/jira/browse/YARN-5132 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Subru Krishnan >Assignee: Subru Krishnan >Priority: Critical > Fix For: 2.8.0, 3.0.0-alpha1 > > Attachments: YARN-5132-v1.patch > > > Currently YARN build includes Javadoc from generated protobuf sources which > is causing CI to fail. This JIRA proposes to exclude generated protobuf > sources from YARN Javadoc build -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Resolved] (YARN-5768) Integrate remaining app lifetime using feature implemented in YARN-4206
[ https://issues.apache.org/jira/browse/YARN-5768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He resolved YARN-5768. --- Resolution: Fixed This is done in YARN-5740 > Integrate remaining app lifetime using feature implemented in YARN-4206 > --- > > Key: YARN-5768 > URL: https://issues.apache.org/jira/browse/YARN-5768 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Gour Saha >Assignee: Jian He > Fix For: yarn-native-services > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5968) Fix slider core module javadocs
[ https://issues.apache.org/jira/browse/YARN-5968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15757548#comment-15757548 ] Jian He commented on YARN-5968: --- Oh, does this mean the exclusions in the hadoop top-level pom are ignored? If so, this is a bug. YARN-5132 added this change recently. > Fix slider core module javadocs > --- > > Key: YARN-5968 > URL: https://issues.apache.org/jira/browse/YARN-5968 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Billie Rinaldi > Attachments: YARN-5968-yarn-native-services.01.patch, > YARN-5968-yarn-native-services.02.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6010) Fix findbugs, site warnings in yarn-services-api module
[ https://issues.apache.org/jira/browse/YARN-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-6010: -- Attachment: YARN-6010-yarn-native-services.01.patch > Fix findbugs, site warnings in yarn-services-api module > --- > > Key: YARN-6010 > URL: https://issues.apache.org/jira/browse/YARN-6010 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He > Attachments: YARN-6010-yarn-native-services.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-6010) Fix findbugs, site warnings in yarn-services-api module
[ https://issues.apache.org/jira/browse/YARN-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He reassigned YARN-6010: - Assignee: Jian He > Fix findbugs, site warnings in yarn-services-api module > --- > > Key: YARN-6010 > URL: https://issues.apache.org/jira/browse/YARN-6010 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-6010-yarn-native-services.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-6010) Fix findbugs, site warnings in yarn-services-api module
Jian He created YARN-6010: - Summary: Fix findbugs, site warnings in yarn-services-api module Key: YARN-6010 URL: https://issues.apache.org/jira/browse/YARN-6010 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5967) Fix slider core module findbugs warnings
[ https://issues.apache.org/jira/browse/YARN-5967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-5967: -- Attachment: YARN-5967-yarn-native-services.09.patch > Fix slider core module findbugs warnings > - > > Key: YARN-5967 > URL: https://issues.apache.org/jira/browse/YARN-5967 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-5967-yarn-native-services.01.patch, > YARN-5967-yarn-native-services.02.patch, > YARN-5967-yarn-native-services.02.patch, > YARN-5967-yarn-native-services.03.patch, > YARN-5967-yarn-native-services.04.patch, > YARN-5967-yarn-native-services.04.patch, > YARN-5967-yarn-native-services.05.patch, > YARN-5967-yarn-native-services.06.patch, > YARN-5967-yarn-native-services.07.patch, > YARN-5967-yarn-native-services.08.patch, > YARN-5967-yarn-native-services.08.patch, > YARN-5967-yarn-native-services.09.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5968) Fix slider core module javadocs
[ https://issues.apache.org/jira/browse/YARN-5968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-5968: -- Assignee: Billie Rinaldi > Fix slider core module javadocs > --- > > Key: YARN-5968 > URL: https://issues.apache.org/jira/browse/YARN-5968 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Billie Rinaldi > Attachments: YARN-5968-yarn-native-services.01.patch, > YARN-5968-yarn-native-services.02.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5967) Fix slider core module findbugs warnings
[ https://issues.apache.org/jira/browse/YARN-5967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15756098#comment-15756098 ] Jian He commented on YARN-5967: --- btw. this patch is the latest https://issues.apache.org/jira/secure/attachment/12843691/YARN-5967-yarn-native-services.08.patch > Fix slider core module findbugs warnings > - > > Key: YARN-5967 > URL: https://issues.apache.org/jira/browse/YARN-5967 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-5967-yarn-native-services.01.patch, > YARN-5967-yarn-native-services.02.patch, > YARN-5967-yarn-native-services.02.patch, > YARN-5967-yarn-native-services.03.patch, > YARN-5967-yarn-native-services.04.patch, > YARN-5967-yarn-native-services.04.patch, > YARN-5967-yarn-native-services.05.patch, > YARN-5967-yarn-native-services.06.patch, > YARN-5967-yarn-native-services.07.patch, > YARN-5967-yarn-native-services.08.patch, > YARN-5967-yarn-native-services.08.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5967) Fix slider core module findbugs warnings
[ https://issues.apache.org/jira/browse/YARN-5967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-5967: -- Attachment: YARN-5967-yarn-native-services.08.patch > Fix slider core module findbugs warnings > - > > Key: YARN-5967 > URL: https://issues.apache.org/jira/browse/YARN-5967 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-5967-yarn-native-services.01.patch, > YARN-5967-yarn-native-services.02.patch, > YARN-5967-yarn-native-services.02.patch, > YARN-5967-yarn-native-services.03.patch, > YARN-5967-yarn-native-services.04.patch, > YARN-5967-yarn-native-services.04.patch, > YARN-5967-yarn-native-services.05.patch, > YARN-5967-yarn-native-services.06.patch, > YARN-5967-yarn-native-services.07.patch, > YARN-5967-yarn-native-services.08.patch, > YARN-5967-yarn-native-services.08.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5967) Fix slider core module findbugs warnings
[ https://issues.apache.org/jira/browse/YARN-5967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-5967: -- Attachment: YARN-5967-yarn-native-services.08.patch > Fix slider core module findbugs warnings > - > > Key: YARN-5967 > URL: https://issues.apache.org/jira/browse/YARN-5967 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-5967-yarn-native-services.01.patch, > YARN-5967-yarn-native-services.02.patch, > YARN-5967-yarn-native-services.02.patch, > YARN-5967-yarn-native-services.03.patch, > YARN-5967-yarn-native-services.04.patch, > YARN-5967-yarn-native-services.04.patch, > YARN-5967-yarn-native-services.05.patch, > YARN-5967-yarn-native-services.06.patch, > YARN-5967-yarn-native-services.07.patch, > YARN-5967-yarn-native-services.08.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5967) Fix slider core module findbugs warnings
[ https://issues.apache.org/jira/browse/YARN-5967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-5967: -- Attachment: YARN-5967-yarn-native-services.07.patch Thanks Billie for the thorough review ! Fixed all of them > Fix slider core module findbugs warnings > - > > Key: YARN-5967 > URL: https://issues.apache.org/jira/browse/YARN-5967 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-5967-yarn-native-services.01.patch, > YARN-5967-yarn-native-services.02.patch, > YARN-5967-yarn-native-services.02.patch, > YARN-5967-yarn-native-services.03.patch, > YARN-5967-yarn-native-services.04.patch, > YARN-5967-yarn-native-services.04.patch, > YARN-5967-yarn-native-services.05.patch, > YARN-5967-yarn-native-services.06.patch, > YARN-5967-yarn-native-services.07.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5967) Fix slider core module findbugs warnings
[ https://issues.apache.org/jira/browse/YARN-5967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15753382#comment-15753382 ] Jian He commented on YARN-5967: --- - Removed any unused code that was triggering warnings. - Suppressed certain warnings that are acceptable. > Fix slider core module findbugs warnings > - > > Key: YARN-5967 > URL: https://issues.apache.org/jira/browse/YARN-5967 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-5967-yarn-native-services.01.patch, > YARN-5967-yarn-native-services.02.patch, > YARN-5967-yarn-native-services.02.patch, > YARN-5967-yarn-native-services.03.patch, > YARN-5967-yarn-native-services.04.patch, > YARN-5967-yarn-native-services.04.patch, > YARN-5967-yarn-native-services.05.patch, > YARN-5967-yarn-native-services.06.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
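Editor's note on the suppression approach mentioned above: one common way to waive an individual FindBugs warning that has been reviewed and judged acceptable is the findbugs-annotations @SuppressFBWarnings annotation; the patch may equally use an exclude-filter file. An illustrative example, not taken from the patch:
{code}
// Illustrative only: suppress a single reviewed FindBugs warning at the
// method that triggers it, with the justification recorded next to the code.
import edu.umd.cs.findbugs.annotations.SuppressFBWarnings;

public class ConfigHolder {
  private final byte[] rawBytes = new byte[0];

  @SuppressFBWarnings(
      value = "EI_EXPOSE_REP",
      justification = "callers treat the returned array as read-only")
  public byte[] getRawBytes() {
    return rawBytes;
  }
}
{code}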
[jira] [Updated] (YARN-5740) Add a new field in Slider status output - lifetime (remaining)
[ https://issues.apache.org/jira/browse/YARN-5740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-5740: -- Attachment: YARN-5740-yarn-native-services.03.patch > Add a new field in Slider status output - lifetime (remaining) > -- > > Key: YARN-5740 > URL: https://issues.apache.org/jira/browse/YARN-5740 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Gour Saha >Assignee: Jian He > Fix For: yarn-native-services > > Attachments: YARN-5740-yarn-native-services.01.patch, > YARN-5740-yarn-native-services.02.patch, > YARN-5740-yarn-native-services.03.patch > > > With YARN-5735, REST service is now setting lifetime to application during > submission (YARN-4205 exposed application lifetime support). Now Slider > status needs to expose this field so that the REST service can return it in > its GET response. Note, the lifetime value that GET response intends to > return is the remaining lifetime of the application. > There is an ongoing discussion in YARN-4206, that the lifetime value returned > in Application Report will be the remaining lifetime (at the time of > request). So until it is finalized, the lifetime value might go through > different connotations. But as long as we have the lifetime field in the > status output, it will be a good start. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5740) Add a new field in Slider status output - lifetime (remaining)
[ https://issues.apache.org/jira/browse/YARN-5740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15753103#comment-15753103 ] Jian He commented on YARN-5740: --- Thanks for the review. I fixed the issues except this one, which I think does not need fixing because all the other variables follow the same pattern: {{./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-slider/hadoop-yarn-slider-core/src/main/java/org/apache/slider/common/params/ActionStatusArgs.java:40: public boolean lifetime;:18: Variable 'lifetime' must be private and have accessor methods.}} > Add a new field in Slider status output - lifetime (remaining) > -- > > Key: YARN-5740 > URL: https://issues.apache.org/jira/browse/YARN-5740 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Gour Saha >Assignee: Jian He > Fix For: yarn-native-services > > Attachments: YARN-5740-yarn-native-services.01.patch, > YARN-5740-yarn-native-services.02.patch > > > With YARN-5735, REST service is now setting lifetime to application during > submission (YARN-4205 exposed application lifetime support). Now Slider > status needs to expose this field so that the REST service can return it in > its GET response. Note, the lifetime value that GET response intends to > return is the remaining lifetime of the application. > There is an ongoing discussion in YARN-4206, that the lifetime value returned > in Application Report will be the remaining lifetime (at the time of > request). So until it is finalized, the lifetime value might go through > different connotations. But as long as we have the lifetime field in the > status output, it will be a good start. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
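Editor's note on the remaining warning quoted above: checkstyle wants the usual private-field-plus-accessor shape, while the existing Args classes expose public fields that the command-line parser fills in directly. A minimal sketch of the two forms; only the field name comes from the warning, the rest is illustrative:
{code}
// What the checkstyle rule asks for:
class LifetimeWithAccessors {
  private boolean lifetime;
  boolean isLifetime() { return lifetime; }
  void setLifetime(boolean lifetime) { this.lifetime = lifetime; }
}

// The pattern the existing Args classes follow, kept here for consistency:
class LifetimeAsPublicField {
  public boolean lifetime;   // populated directly by the CLI argument parser
}
{code}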
[jira] [Updated] (YARN-5740) Add a new field in Slider status output - lifetime (remaining)
[ https://issues.apache.org/jira/browse/YARN-5740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-5740: -- Attachment: (was: YARN-5740-yarn-native-services.02.patch) > Add a new field in Slider status output - lifetime (remaining) > -- > > Key: YARN-5740 > URL: https://issues.apache.org/jira/browse/YARN-5740 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Gour Saha >Assignee: Jian He > Fix For: yarn-native-services > > Attachments: YARN-5740-yarn-native-services.01.patch, > YARN-5740-yarn-native-services.02.patch > > > With YARN-5735, REST service is now setting lifetime to application during > submission (YARN-4205 exposed application lifetime support). Now Slider > status needs to expose this field so that the REST service can return it in > its GET response. Note, the lifetime value that GET response intends to > return is the remaining lifetime of the application. > There is an ongoing discussion in YARN-4206, that the lifetime value returned > in Application Report will be the remaining lifetime (at the time of > request). So until it is finalized, the lifetime value might go through > different connotations. But as long as we have the lifetime field in the > status output, it will be a good start. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5740) Add a new field in Slider status output - lifetime (remaining)
[ https://issues.apache.org/jira/browse/YARN-5740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-5740: -- Attachment: YARN-5740-yarn-native-services.02.patch > Add a new field in Slider status output - lifetime (remaining) > -- > > Key: YARN-5740 > URL: https://issues.apache.org/jira/browse/YARN-5740 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Gour Saha >Assignee: Jian He > Fix For: yarn-native-services > > Attachments: YARN-5740-yarn-native-services.01.patch, > YARN-5740-yarn-native-services.02.patch > > > With YARN-5735, REST service is now setting lifetime to application during > submission (YARN-4205 exposed application lifetime support). Now Slider > status needs to expose this field so that the REST service can return it in > its GET response. Note, the lifetime value that GET response intends to > return is the remaining lifetime of the application. > There is an ongoing discussion in YARN-4206, that the lifetime value returned > in Application Report will be the remaining lifetime (at the time of > request). So until it is finalized, the lifetime value might go through > different connotations. But as long as we have the lifetime field in the > status output, it will be a good start. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5740) Add a new field in Slider status output - lifetime (remaining)
[ https://issues.apache.org/jira/browse/YARN-5740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-5740: -- Attachment: YARN-5740-yarn-native-services.02.patch > Add a new field in Slider status output - lifetime (remaining) > -- > > Key: YARN-5740 > URL: https://issues.apache.org/jira/browse/YARN-5740 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Gour Saha >Assignee: Jian He > Fix For: yarn-native-services > > Attachments: YARN-5740-yarn-native-services.01.patch, > YARN-5740-yarn-native-services.02.patch > > > With YARN-5735, REST service is now setting lifetime to application during > submission (YARN-4205 exposed application lifetime support). Now Slider > status needs to expose this field so that the REST service can return it in > its GET response. Note, the lifetime value that GET response intends to > return is the remaining lifetime of the application. > There is an ongoing discussion in YARN-4206, that the lifetime value returned > in Application Report will be the remaining lifetime (at the time of > request). So until it is finalized, the lifetime value might go through > different connotations. But as long as we have the lifetime field in the > status output, it will be a good start. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5967) Fix slider core module findbugs warnings
[ https://issues.apache.org/jira/browse/YARN-5967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-5967: -- Attachment: YARN-5967-yarn-native-services.06.patch > Fix slider core module findbugs warnings > - > > Key: YARN-5967 > URL: https://issues.apache.org/jira/browse/YARN-5967 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-5967-yarn-native-services.01.patch, > YARN-5967-yarn-native-services.02.patch, > YARN-5967-yarn-native-services.02.patch, > YARN-5967-yarn-native-services.03.patch, > YARN-5967-yarn-native-services.04.patch, > YARN-5967-yarn-native-services.04.patch, > YARN-5967-yarn-native-services.05.patch, > YARN-5967-yarn-native-services.06.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5967) Fix slider core module findbugs warnings
[ https://issues.apache.org/jira/browse/YARN-5967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-5967: -- Attachment: YARN-5967-yarn-native-services.05.patch > Fix slider core module findbugs warnings > - > > Key: YARN-5967 > URL: https://issues.apache.org/jira/browse/YARN-5967 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-5967-yarn-native-services.01.patch, > YARN-5967-yarn-native-services.02.patch, > YARN-5967-yarn-native-services.02.patch, > YARN-5967-yarn-native-services.03.patch, > YARN-5967-yarn-native-services.04.patch, > YARN-5967-yarn-native-services.04.patch, > YARN-5967-yarn-native-services.05.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5967) Fix slider core module findbugs warnings
[ https://issues.apache.org/jira/browse/YARN-5967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-5967: -- Attachment: YARN-5967-yarn-native-services.04.patch > Fix slider core module findbugs warnings > - > > Key: YARN-5967 > URL: https://issues.apache.org/jira/browse/YARN-5967 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-5967-yarn-native-services.01.patch, > YARN-5967-yarn-native-services.02.patch, > YARN-5967-yarn-native-services.02.patch, > YARN-5967-yarn-native-services.03.patch, > YARN-5967-yarn-native-services.04.patch, > YARN-5967-yarn-native-services.04.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5967) Fix slider core module findbugs warnings
[ https://issues.apache.org/jira/browse/YARN-5967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-5967: -- Attachment: YARN-5967-yarn-native-services.04.patch > Fix slider core module findbugs warnings > - > > Key: YARN-5967 > URL: https://issues.apache.org/jira/browse/YARN-5967 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-5967-yarn-native-services.01.patch, > YARN-5967-yarn-native-services.02.patch, > YARN-5967-yarn-native-services.02.patch, > YARN-5967-yarn-native-services.03.patch, > YARN-5967-yarn-native-services.04.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5967) Fix slider core module findbugs warnings
[ https://issues.apache.org/jira/browse/YARN-5967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-5967: -- Attachment: YARN-5967-yarn-native-services.03.patch > Fix slider core module findbugs warnings > - > > Key: YARN-5967 > URL: https://issues.apache.org/jira/browse/YARN-5967 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-5967-yarn-native-services.01.patch, > YARN-5967-yarn-native-services.02.patch, > YARN-5967-yarn-native-services.02.patch, > YARN-5967-yarn-native-services.03.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5996) Native services AM kills app on AMRMClientAsync onError call
[ https://issues.apache.org/jira/browse/YARN-5996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15750101#comment-15750101 ] Jian He commented on YARN-5996: --- I checked the possible exceptions and didn't find any error that must force the app to kill itself, so I think it should be fine. > Native services AM kills app on AMRMClientAsync onError call > > > Key: YARN-5996 > URL: https://issues.apache.org/jira/browse/YARN-5996 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Billie Rinaldi >Assignee: Billie Rinaldi > Attachments: YARN-5996-yarn-native-services.001.patch, > YARN-5996-yarn-native-services.002.patch > > > The AMRMClientAsync onError callback occurred due to an InterruptedException > in this case. The AM may need to kill itself once the client reaches this > state, but it should not kill the entire application. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5967) Fix slider core module findbugs warnings
[ https://issues.apache.org/jira/browse/YARN-5967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-5967: -- Attachment: YARN-5967-yarn-native-services.02.patch > Fix slider core module findbugs warnings > - > > Key: YARN-5967 > URL: https://issues.apache.org/jira/browse/YARN-5967 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-5967-yarn-native-services.01.patch, > YARN-5967-yarn-native-services.02.patch, > YARN-5967-yarn-native-services.02.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4844) Add getMemorySize/getVirtualCoresSize to o.a.h.y.api.records.Resource
[ https://issues.apache.org/jira/browse/YARN-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15749834#comment-15749834 ] Jian He commented on YARN-4844: --- looks good > Add getMemorySize/getVirtualCoresSize to o.a.h.y.api.records.Resource > - > > Key: YARN-4844 > URL: https://issues.apache.org/jira/browse/YARN-4844 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Blocker > Fix For: 2.8.0, 3.0.0-alpha1 > > Attachments: YARN-4844-branch-2.8.0016_.patch, > YARN-4844-branch-2.8.addendum.2.patch, YARN-4844-branch-2.addendum.1_.patch, > YARN-4844-branch-2.addendum.2.patch, YARN-4844.1.patch, YARN-4844.10.patch, > YARN-4844.11.patch, YARN-4844.12.patch, YARN-4844.13.patch, > YARN-4844.14.patch, YARN-4844.15.patch, YARN-4844.16.branch-2.patch, > YARN-4844.16.patch, YARN-4844.2.patch, YARN-4844.3.patch, YARN-4844.4.patch, > YARN-4844.5.patch, YARN-4844.6.patch, YARN-4844.7.patch, > YARN-4844.8.branch-2.patch, YARN-4844.8.patch, YARN-4844.9.branch, > YARN-4844.9.branch-2.patch, YARN-4844.addendum.3.patch, > YARN-4844.addendum.4.patch > > > We use int32 for memory now, if a cluster has 10k nodes, each node has 210G > memory, we will get a negative total cluster memory. > And another case that easier overflows int32 is: we added all pending > resources of running apps to cluster's total pending resources. If a > problematic app requires too much resources (let's say 1M+ containers, each > of them has 3G containers), int32 will be not enough. > Even if we can cap each app's pending request, we cannot handle the case that > there're many running apps, each of them has capped but still significant > numbers of pending resources. > So we may possibly need to add getMemoryLong/getVirtualCoreLong to > o.a.h.y.api.records.Resource. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
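Editor's note on the overflow scenario in the description above: 10,000 nodes at 210 GB each is 2,150,400,000 MB, which exceeds Integer.MAX_VALUE (2,147,483,647) and wraps negative. A small self-contained check of that arithmetic:
{code}
// Demonstrates why int32 memory accounting goes negative at cluster scale.
public class ResourceOverflowCheck {
  public static void main(String[] args) {
    int perNodeMb = 210 * 1024;              // 215,040 MB per node
    int nodes = 10_000;
    int totalAsInt = perNodeMb * nodes;      // overflows: negative result
    long totalAsLong = (long) perNodeMb * nodes;
    System.out.println("int total (MB)  = " + totalAsInt);   // -2144567296
    System.out.println("long total (MB) = " + totalAsLong);  //  2150400000
  }
}
{code}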
[jira] [Updated] (YARN-5967) Fix slider core module findbugs warnings
[ https://issues.apache.org/jira/browse/YARN-5967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-5967: -- Attachment: (was: YARN-5967-yarn-native-services.02.patch) > Fix slider core module findbugs warnings > - > > Key: YARN-5967 > URL: https://issues.apache.org/jira/browse/YARN-5967 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-5967-yarn-native-services.01.patch, > YARN-5967-yarn-native-services.02.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5967) Fix slider core module findbugs warnings
[ https://issues.apache.org/jira/browse/YARN-5967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-5967: -- Attachment: YARN-5967-yarn-native-services.02.patch > Fix slider core module findbugs warnings > - > > Key: YARN-5967 > URL: https://issues.apache.org/jira/browse/YARN-5967 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-5967-yarn-native-services.01.patch, > YARN-5967-yarn-native-services.02.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5931) Document timeout interfaces CLI and REST APIs
[ https://issues.apache.org/jira/browse/YARN-5931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15749265#comment-15749265 ] Jian He commented on YARN-5931: --- sounds good to me > Document timeout interfaces CLI and REST APIs > - > > Key: YARN-5931 > URL: https://issues.apache.org/jira/browse/YARN-5931 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S > Attachments: ResourceManagerRest.html, YARN-5931.0.patch, > YARN-5931.1.patch, YarnCommands.html > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4126) RM should not issue delegation tokens in unsecure mode
[ https://issues.apache.org/jira/browse/YARN-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15749171#comment-15749171 ] Jian He commented on YARN-4126: --- The rest are test refactorings which are good to have, IMO > RM should not issue delegation tokens in unsecure mode > -- > > Key: YARN-4126 > URL: https://issues.apache.org/jira/browse/YARN-4126 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Bibin A Chundatt > Fix For: 3.0.0-alpha1 > > Attachments: 0001-YARN-4126.patch, 0002-YARN-4126.patch, > 0003-YARN-4126.patch, 0004-YARN-4126.patch, 0005-YARN-4126.patch, > 0006-YARN-4126.patch > > > ClientRMService#getDelegationToken is currently returning a delegation token > in insecure mode. We should not return the token if it's in insecure mode. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
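For readers following along, the fix requested in the issue title amounts to refusing to issue a token when security is off. The sketch below only illustrates that check; it is not taken from the attached patches, and the real ClientRMService logic may differ. UserGroupInformation.isSecurityEnabled() is an existing Hadoop API; the surrounding class and method are assumptions.
{code}
// Illustrative guard only; not the actual ClientRMService#getDelegationToken code.
import java.io.IOException;
import org.apache.hadoop.security.UserGroupInformation;

public class DelegationTokenGuard {
  /** Reject delegation-token requests when Kerberos security is disabled. */
  public static void checkDelegationTokenAllowed() throws IOException {
    if (!UserGroupInformation.isSecurityEnabled()) {
      throw new IOException(
          "Delegation token can only be issued when security is enabled");
    }
  }
}
{code}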
[jira] [Updated] (YARN-5999) AMRMClientAsync will stop if any exceptions thrown on allocate call
[ https://issues.apache.org/jira/browse/YARN-5999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-5999: -- Attachment: YARN-5999.1.patch > AMRMClientAsync will stop if any exceptions thrown on allocate call > > > Key: YARN-5999 > URL: https://issues.apache.org/jira/browse/YARN-5999 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-5999.1.patch > > > Currently, for any exceptions thrown on the allocate call of AMRMClientAsync, > it will stop both heartbeat thread and the callback handler thread, leaving > AMRMClient in an unusable state. Caller has to instantiate a new AMRMClient. > IMO, the threads should keep on running, it should be up to the caller > whether to stop the AMRMClient or not. > {code} > try { > response = client.allocate(progress); > } catch (ApplicationAttemptNotFoundException e) { > handler.onShutdownRequest(); > LOG.info("Shutdown requested. Stopping callback."); > return; > } catch (Throwable ex) { > LOG.error("Exception on heartbeat", ex); > savedException = ex; > // interrupt handler thread in case it waiting on the queue > handlerThread.interrupt(); > return; > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
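To make the proposal concrete, one way to keep the threads running is to route the failure through the existing CallbackHandler#onError and let the heartbeat loop continue. The fragment below is a hedged sketch of that idea applied to the try/catch quoted in the description; it assumes the surrounding heartbeat while-loop and is not necessarily what YARN-5999.1.patch does.
{code}
// Sketch: report the failure instead of tearing down both threads.
try {
  response = client.allocate(progress);
} catch (ApplicationAttemptNotFoundException e) {
  handler.onShutdownRequest();
  LOG.info("Shutdown requested. Stopping callback.");
  return;
} catch (Throwable ex) {
  LOG.error("Exception on heartbeat", ex);
  // Let the application decide whether to stop the client; the heartbeat
  // thread simply moves on to its next loop iteration.
  handler.onError(ex);
  continue;
}
{code}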
[jira] [Commented] (YARN-4126) RM should not issue delegation tokens in unsecure mode
[ https://issues.apache.org/jira/browse/YARN-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15747270#comment-15747270 ] Jian He commented on YARN-4126: --- I've committed a patch to revert this logic to return true if kerberos is not enabled. > RM should not issue delegation tokens in unsecure mode > -- > > Key: YARN-4126 > URL: https://issues.apache.org/jira/browse/YARN-4126 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Bibin A Chundatt > Fix For: 3.0.0-alpha1 > > Attachments: 0001-YARN-4126.patch, 0002-YARN-4126.patch, > 0003-YARN-4126.patch, 0004-YARN-4126.patch, 0005-YARN-4126.patch, > 0006-YARN-4126.patch > > > ClientRMService#getDelegationToken is currently returning a delegation token > in insecure mode. We should not return the token if it's in insecure mode. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4126) RM should not issue delegation tokens in unsecure mode
[ https://issues.apache.org/jira/browse/YARN-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15747232#comment-15747232 ] Jian He commented on YARN-4126: --- ok, let's revert it from trunk. > RM should not issue delegation tokens in unsecure mode > -- > > Key: YARN-4126 > URL: https://issues.apache.org/jira/browse/YARN-4126 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Bibin A Chundatt > Fix For: 3.0.0-alpha1 > > Attachments: 0001-YARN-4126.patch, 0002-YARN-4126.patch, > 0003-YARN-4126.patch, 0004-YARN-4126.patch, 0005-YARN-4126.patch, > 0006-YARN-4126.patch > > > ClientRMService#getDelegationToken is currently returning a delegation token > in insecure mode. We should not return the token if it's in insecure mode. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-5999) AMRMClientAsync will stop if any exceptions thrown on allocate call
[ https://issues.apache.org/jira/browse/YARN-5999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He reassigned YARN-5999: - Assignee: Jian He > AMRMClientAsync will stop if any exceptions thrown on allocate call > > > Key: YARN-5999 > URL: https://issues.apache.org/jira/browse/YARN-5999 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > > Currently, for any exceptions thrown on the allocate call of AMRMClientAsync, > it will stop both heartbeat thread and the callback handler thread, leaving > AMRMClient in an unusable state. Caller has to instantiate a new AMRMClient. > IMO, the threads should keep on running, it should be up to the caller > whether to stop the AMRMClient or not. > {code} > try { > response = client.allocate(progress); > } catch (ApplicationAttemptNotFoundException e) { > handler.onShutdownRequest(); > LOG.info("Shutdown requested. Stopping callback."); > return; > } catch (Throwable ex) { > LOG.error("Exception on heartbeat", ex); > savedException = ex; > // interrupt handler thread in case it waiting on the queue > handlerThread.interrupt(); > return; > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-5999) AMRMClientAsync will stop if any exceptions thrown on allocate call
Jian He created YARN-5999: - Summary: AMRMClientAsync will stop if any exceptions thrown on allocate call Key: YARN-5999 URL: https://issues.apache.org/jira/browse/YARN-5999 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Currently, for any exceptions thrown on the allocate call of AMRMClientAsync, it will stop both heartbeat thread and the callback handler thread, leaving AMRMClient in an unusable state. Caller has to instantiate a new AMRMClient. IMO, the threads should keep on running, it should be up to the caller whether to stop the AMRMClient or not. {code} try { response = client.allocate(progress); } catch (ApplicationAttemptNotFoundException e) { handler.onShutdownRequest(); LOG.info("Shutdown requested. Stopping callback."); return; } catch (Throwable ex) { LOG.error("Exception on heartbeat", ex); savedException = ex; // interrupt handler thread in case it waiting on the queue handlerThread.interrupt(); return; } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
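Until the behaviour changes, an AM that wants to survive a transient allocate failure has to do the recovery itself from onError. The sketch below shows that caller-side workaround, assuming the 2.8-era AMRMClientAsync.CallbackHandler interface; recreateClient() is a hypothetical application-supplied helper, not a YARN API.
{code}
// Caller-side workaround sketch; recreateClient() is hypothetical, not a YARN API.
import java.util.List;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.ContainerStatus;
import org.apache.hadoop.yarn.api.records.NodeReport;
import org.apache.hadoop.yarn.client.api.async.AMRMClientAsync;

public class RestartingCallbackHandler implements AMRMClientAsync.CallbackHandler {

  @Override
  public void onError(Throwable e) {
    // With the current behaviour the heartbeat and handler threads are already
    // shutting down, so the only recovery is to discard this client and build a new one.
    recreateClient();
  }

  private void recreateClient() {
    // Application-specific: stop the old AMRMClientAsync, create and start a new
    // one with this handler, and re-submit any outstanding container requests.
  }

  // Remaining callbacks left as no-ops for brevity.
  @Override public void onContainersCompleted(List<ContainerStatus> statuses) { }
  @Override public void onContainersAllocated(List<Container> containers) { }
  @Override public void onShutdownRequest() { }
  @Override public void onNodesUpdated(List<NodeReport> updatedNodes) { }
  @Override public float getProgress() { return 0.0f; }
}
{code}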