[jira] [Commented] (YARN-9941) Opportunistic scheduler metrics should be reset during fail-over.

2020-07-20 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17161710#comment-17161710
 ] 

Abhishek Modi commented on YARN-9941:
-

Sure [~BilwaST]. Feel free to take over. Thanks

> Opportunistic scheduler metrics should be reset during fail-over.
> -
>
> Key: YARN-9941
> URL: https://issues.apache.org/jira/browse/YARN-9941
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Abhishek Modi
> Assignee: Abhishek Modi
> Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9694) UI always show default-rack for all the nodes while running SLS.

2019-07-27 Thread Abhishek Modi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9694:

Attachment: YARN-9694.002.patch

> UI always show default-rack for all the nodes while running SLS.
> 
>
> Key: YARN-9694
> URL: https://issues.apache.org/jira/browse/YARN-9694
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Abhishek Modi
> Assignee: Abhishek Modi
> Priority: Major
> Attachments: YARN-9694.001.patch, YARN-9694.002.patch
>
>
> Currently, regardless of how the nodes are specified in SLS.json or 
> nodes.json, the UI always shows the rack of every node as default-rack.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)




[jira] [Commented] (YARN-9694) UI always show default-rack for all the nodes while running SLS.

2019-07-29 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1689#comment-1689
 ] 

Abhishek Modi commented on YARN-9694:
-

[~elgoiri] could you please review it? Thanks.




[jira] [Commented] (YARN-9694) UI always show default-rack for all the nodes while running SLS.

2019-07-29 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16895637#comment-16895637
 ] 

Abhishek Modi commented on YARN-9694:
-

Thanks [~elgoiri] for reviewing it.

GenerateNodeTableMapping generates a node-to-rack mapping file, which 
TableMapping then uses to resolve rack names. TableMapping requires a 
two-column text file in which the first column specifies the node name and 
the second column specifies the rack name. I generate this file in 
generateNodeTableMapping.

Maybe I will remove the format name from the file. I will upload an updated 
patch that changes the format and has better Javadoc.
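
The two-column layout described above can be sketched as follows. This is a minimal illustration, not the code from the YARN-9694 patch: the class name, method names, sample node and rack names, and the single-space column separator are all assumptions made for the example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper; not taken from the YARN-9694 patch. */
final class NodeRackTableSketch {

  /** Renders one "node rack" line per entry (separator is an assumption). */
  static List<String> toLines(Map<String, String> nodeToRack) {
    List<String> lines = new ArrayList<>();
    for (Map.Entry<String, String> e : nodeToRack.entrySet()) {
      lines.add(e.getKey() + " " + e.getValue());
    }
    return lines;
  }

  /** Writes the mapping to the given file and returns the same path. */
  static Path write(Map<String, String> nodeToRack, Path file)
      throws IOException {
    return Files.write(file, toLines(nodeToRack), StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    // Sample data only; real node and rack names come from the SLS input.
    Map<String, String> mapping = new LinkedHashMap<>();
    mapping.put("node1", "/rack1");
    mapping.put("node2", "/rack2");
    toLines(mapping).forEach(System.out::println);
  }
}
```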

 

 

 




[jira] [Updated] (YARN-9694) UI always show default-rack for all the nodes while running SLS.

2019-07-29 Thread Abhishek Modi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9694:

Attachment: YARN-9694.003.patch




[jira] [Commented] (YARN-9694) UI always show default-rack for all the nodes while running SLS.

2019-07-30 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16896281#comment-16896281
 ] 

Abhishek Modi commented on YARN-9694:
-

Thanks for reviewing it.

java.io.File doesn't provide a way to create a temporary directory, so I was 
creating a file and deleting it to get a temporary directory.

I have updated the patch to use a temporary file directly.
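
Using a temporary file directly might look like the sketch below. It is only an illustration under assumptions: the class name, method name, file prefix and suffix are hypothetical and are not taken from the patch. For cases that do need a directory, java.nio.file.Files.createTempDirectory has been available since Java 7.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper; not taken from the YARN-9694 patch. */
final class TempFileSketch {

  /** Creates a temp file in the default temp dir and writes the lines. */
  static Path writeToTempFile(String prefix, List<String> lines)
      throws IOException {
    Path tmp = Files.createTempFile(prefix, ".txt");
    // Ask the JVM to remove the file on normal exit.
    tmp.toFile().deleteOnExit();
    Files.write(tmp, lines, StandardCharsets.UTF_8);
    return tmp;
  }

  public static void main(String[] args) throws IOException {
    Path p = writeToTempFile("node-rack-", Arrays.asList("node1 /rack1"));
    System.out.println("wrote " + p);
  }
}
```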




[jira] [Updated] (YARN-9694) UI always show default-rack for all the nodes while running SLS.

2019-07-30 Thread Abhishek Modi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9694:

Attachment: YARN-9694.004.patch




[jira] [Commented] (YARN-9690) Invalid AMRM token when distributed scheduling is enabled.

2019-07-30 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16896659#comment-16896659
 ] 

Abhishek Modi commented on YARN-9690:
-

[~Babbleshack] thanks for filing the issue. Could you please try setting the 
following in yarn-site.xml on both the RM and the NM:

yarn.resourcemanager.hostname to yarn-master-0.yarn-service.yarn

You can then remove the following configs:

yarn.resourcemanager.resource-tracker.address, yarn.resourcemanager.address.
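
A rough sketch of why the explicit addresses can be dropped: in yarn-default.xml the RM address properties default to values built from yarn.resourcemanager.hostname. The class below is hypothetical and uses a plain map rather than Hadoop's Configuration class; the default templates are quoted from memory of yarn-default.xml and should be checked against the Hadoop version in use.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration; does not use the Hadoop Configuration API. */
final class RmAddressDefaultsSketch {
  static final String HOSTNAME_KEY = "yarn.resourcemanager.hostname";

  /** Default templates, assumed to match yarn-default.xml. */
  static Map<String, String> defaults() {
    Map<String, String> d = new LinkedHashMap<>();
    d.put("yarn.resourcemanager.address", "${" + HOSTNAME_KEY + "}:8032");
    d.put("yarn.resourcemanager.scheduler.address",
        "${" + HOSTNAME_KEY + "}:8030");
    d.put("yarn.resourcemanager.resource-tracker.address",
        "${" + HOSTNAME_KEY + "}:8031");
    return d;
  }

  /** Returns the explicit site value, else the default with the host filled in. */
  static String resolve(String key, Map<String, String> site) {
    String explicit = site.get(key);
    if (explicit != null) {
      return explicit;
    }
    String template = defaults().get(key);
    if (template == null) {
      return null;
    }
    String host = site.getOrDefault(HOSTNAME_KEY, "0.0.0.0");
    return template.replace("${" + HOSTNAME_KEY + "}", host);
  }

  public static void main(String[] args) {
    Map<String, String> site = new HashMap<>();
    site.put(HOSTNAME_KEY, "yarn-master-0.yarn-service.yarn");
    for (String key : defaults().keySet()) {
      System.out.println(key + " = " + resolve(key, site));
    }
  }
}
```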

> Invalid AMRM token when distributed scheduling is enabled.
> --
>
> Key: YARN-9690
> URL: https://issues.apache.org/jira/browse/YARN-9690
> Project: Hadoop YARN
> Issue Type: Bug
> Components: distributed-scheduling, yarn
> Affects Versions: 2.9.2, 3.1.2
> Environment: OS: Ubuntu 18.04
> JVM: 1.8.0_212-8u212-b03-0ubuntu1.18.04.1-b03
> Reporter: Babble Shack
> Priority: Major
> Attachments: applicationlog, distributed_log, ds_application.log, 
> image-2019-07-26-18-00-14-980.png, nodemanager-yarn-site.xml, 
> nodemanager.log, rm-yarn-site.xml, yarn-site.xml
>
>
> Applications fail to start due to an invalid AMRM token from the 
> application attempt.
> I have tested this with both 0% and 100% opportunistic maps, and the same 
> issue occurs regardless.
> {code:java}
> 
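> <configuration>
>   <property>
>     <name>mapreduceyarn.nodemanager.aux-services</name>
>     <value>mapreduce_shuffle</value>
>   </property>
>   <property>
>     <name>yarn.resourcemanager.address</name>
>     <value>yarn-master-0.yarn-service.yarn:8032</value>
>   </property>
>   <property>
>     <name>yarn.resourcemanager.scheduler.address</name>
>     <value>0.0.0.0:8049</value>
>   </property>
>   <property>
>     <name>yarn.resourcemanager.opportunistic-container-allocation.enabled</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>yarn.nodemanager.opportunistic-containers-max-queue-length</name>
>     <value>10</value>
>   </property>
>   <property>
>     <name>yarn.nodemanager.distributed-scheduling.enabled</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>yarn.webapp.ui2.enable</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>yarn.resourcemanager.resource-tracker.address</name>
>     <value>yarn-master-0.yarn-service.yarn:8031</value>
>   </property>
>   <property>
>     <name>yarn.log-aggregation-enable</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>yarn.nodemanager.aux-services</name>
>     <value>mapreduce_shuffle</value>
>   </property>
>   <property>
>     <name>yarn.nodemanager.resource.memory-mb</name>
>     <value>7168</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.minimum-allocation-mb</name>
>     <value>3584</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.maximum-allocation-mb</name>
>     <value>7168</value>
>   </property>
>   <property>
>     <name>yarn.app.mapreduce.am.resource.mb</name>
>     <value>7168</value>
>   </property>
>   <property>
>     <name>yarn.app.mapreduce.am.command-opts</name>
>     <value>-Xmx5734m</value>
>   </property>
>   <property>
>     <name>yarn.timeline-service.enabled</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>yarn.resourcemanager.system-metrics-publisher.enabled</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>yarn.timeline-service.generic-application-history.enabled</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>yarn.timeline-service.bind-host</name>
>     <value>0.0.0.0</value>
>   </property>
> </configuration>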
> {code}
> Relevant logs:
> {code:java}
> 2019-07-22 14:56:37,104 INFO [main] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: 100% of the 
> mappers will be scheduled using OPPORTUNISTIC containers
> 2019-07-22 14:56:37,117 INFO [main] org.apache.hadoop.yarn.client.RMProxy: 
> Connecting to ResourceManager at 
> yarn-master-0.yarn-service.yarn/10.244.1.134:8030
> 2019-07-22 14:56:37,150 WARN [main] org.apache.hadoop.ipc.Client: Exception 
> encountered while connecting to the server : 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  Invalid AMRMToken from appattempt_1563805140414_0002_02
> 2019-07-22 14:56:37,152 ERROR [main] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: Exception while 
> registering
> org.apache.hadoop.security.token.SecretManager$InvalidToken: Invalid 
> AMRMToken from appattempt_1563805140414_0002_02
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>     at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>     at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
>     at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateIOException(RPCUtil.java:80)
>     at 
> org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:119)
>     at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.registerApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:109)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at 
> org.apache.hadoop.io.retry.RetryInvocationH

[jira] [Assigned] (YARN-9690) Invalid AMRM token when distributed scheduling is enabled.

2019-07-30 Thread Abhishek Modi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi reassigned YARN-9690:
---

Assignee: Abhishek Modi


[jira] [Commented] (YARN-9690) Invalid AMRM token when distributed scheduling is enabled.

2019-07-30 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16896678#comment-16896678
 ] 

Abhishek Modi commented on YARN-9690:
-

Thanks [~Babbleshack]. I will help review that PR.


[jira] [Assigned] (YARN-9690) Invalid AMRM token when distributed scheduling is enabled.

2019-08-01 Thread Abhishek Modi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi reassigned YARN-9690:
---

Assignee: (was: Abhishek Modi)


[jira] [Assigned] (YARN-7547) Throttle Localization for Opportunistic Containers in the NM

2019-08-05 Thread Abhishek Modi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi reassigned YARN-7547:
---

Assignee: Abhishek Modi  (was: kartheek muthyala)

> Throttle Localization for Opportunistic Containers in the NM
> 
>
> Key: YARN-7547
> URL: https://issues.apache.org/jira/browse/YARN-7547
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Arun Suresh
> Assignee: Abhishek Modi
> Priority: Major
>
> Currently, localization is performed before the container is queued on the 
> NM. A barrage of Opportunistic containers can therefore prevent Guaranteed 
> containers from starting. This can be avoided by throttling localization 
> requests for Opportunistic containers: e.g., if the number of queued 
> containers is > x, then don't start localization for new Opportunistic 
> containers.
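
The threshold rule proposed in the description can be sketched as a small check. This is a minimal illustration, not NodeManager code: the class, enum, and method names are hypothetical, and the limit corresponds to the "x" in the description.

```java
/** Hypothetical sketch of the proposed throttle; not NodeManager code. */
final class OppLocalizationThrottleSketch {

  /** Simplified stand-in for the container execution type. */
  enum ExecType { GUARANTEED, OPPORTUNISTIC }

  // The "x" from the description: maximum queued containers before throttling.
  private final int maxQueuedContainers;

  OppLocalizationThrottleSketch(int maxQueuedContainers) {
    this.maxQueuedContainers = maxQueuedContainers;
  }

  /** Guaranteed containers always localize; Opportunistic ones are throttled. */
  boolean shouldStartLocalization(ExecType type, int queuedContainers) {
    if (type == ExecType.GUARANTEED) {
      return true;
    }
    return queuedContainers <= maxQueuedContainers;
  }

  public static void main(String[] args) {
    OppLocalizationThrottleSketch throttle =
        new OppLocalizationThrottleSketch(10);
    System.out.println(
        throttle.shouldStartLocalization(ExecType.OPPORTUNISTIC, 11));
  }
}
```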






[jira] [Commented] (YARN-9694) UI always show default-rack for all the nodes while running SLS.

2019-08-06 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901105#comment-16901105
 ] 

Abhishek Modi commented on YARN-9694:
-

Tested the latest patch at scale with a 4500-node JSON file, and everything 
worked fine.

[~elgoiri] could you please review it? Thanks.




[jira] [Commented] (YARN-9694) UI always show default-rack for all the nodes while running SLS.

2019-08-08 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16903597#comment-16903597
 ] 

Abhishek Modi commented on YARN-9694:
-

Thanks [~elgoiri] for the review. I have committed it to trunk.




[jira] [Commented] (YARN-9732) yarn.system-metrics-publisher.enabled=false does not work

2019-08-09 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16903961#comment-16903961
 ] 

Abhishek Modi commented on YARN-9732:
-

Thanks [~magnum] for the patch. I will commit it in a couple of hours.

> yarn.system-metrics-publisher.enabled=false does not work
> -
>
> Key: YARN-9732
> URL: https://issues.apache.org/jira/browse/YARN-9732
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager, timelineclient
> Affects Versions: 3.1.2
> Reporter: KWON BYUNGCHANG
> Assignee: KWON BYUNGCHANG
> Priority: Major
> Attachments: YARN-9732.0001.patch
>
>
> The RM does not honor yarn.system-metrics-publisher.enabled=false,
> so if only yarn.timeline-service.enabled=true is configured, 
> the RM always publishes YARN system metrics to the timeline server.
>
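
The behaviour the report expects can be sketched as a gate on both flags. This is a hypothetical illustration over a plain map, not the RM code or the attached patch; treating both properties as false when unset is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the expected gating; not ResourceManager code. */
final class MetricsPublisherGateSketch {
  static final String TIMELINE_ENABLED = "yarn.timeline-service.enabled";
  static final String PUBLISHER_ENABLED =
      "yarn.system-metrics-publisher.enabled";

  /** Publish only when both the timeline service and the publisher are on. */
  static boolean shouldPublish(Map<String, String> conf) {
    boolean timeline =
        Boolean.parseBoolean(conf.getOrDefault(TIMELINE_ENABLED, "false"));
    boolean publisher =
        Boolean.parseBoolean(conf.getOrDefault(PUBLISHER_ENABLED, "false"));
    return timeline && publisher;
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put(TIMELINE_ENABLED, "true");
    conf.put(PUBLISHER_ENABLED, "false");
    System.out.println(shouldPublish(conf));
  }
}
```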






[jira] [Commented] (YARN-9731) In ATS v1.5, all jobs are visible to all users without view-acl

2019-08-09 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16903963#comment-16903963
 ] 

Abhishek Modi commented on YARN-9731:
-

Thanks [~magnum] for the patch. Thanks [~Prabhu Joseph] for the review. The 
patch looks good to me.

[~magnum] could you please take care of the checkstyle warnings?

> In ATS v1.5, all jobs are visible to all users without view-acl
> ---
>
> Key: YARN-9731
> URL: https://issues.apache.org/jira/browse/YARN-9731
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 3.1.2
>Reporter: KWON BYUNGCHANG
>Assignee: KWON BYUNGCHANG
>Priority: Major
> Attachments: YARN-9731.001.patch, YARN-9731.002.patch, 
> ats_v1.5_screenshot.png
>
>
> In ATS v1.5 in secure mode,
> all jobs are visible to all users, regardless of view-acl.
> If a user does not have view-acl, the user should not be able to see jobs.
> I attached an ATS UI screenshot.
>  
> ATS v1.5 log
> {code:java}
> 2019-08-09 10:21:13,679 WARN 
> applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore 
> (ApplicationHistoryManagerOnTimelineStore.java:generateApplicationReport(687))
>  - Failed to authorize when generating application report for 
> application_1565247558150_1954. Use a placeholder for its latest attempt id.
> org.apache.hadoop.security.authorize.AuthorizationException: User magnum does 
> not have privilege to see this application application_1565247558150_1954
> 2019-08-09 10:21:13,680 WARN 
> applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore 
> (ApplicationHistoryManagerOnTimelineStore.java:generateApplicationReport(687))
>  - Failed to authorize when generating application report for 
> application_1565247558150_1951. Use a placeholder for its latest attempt id.
> org.apache.hadoop.security.authorize.AuthorizationException: User magnum does 
> not have privilege to see this application application_1565247558150_1951
> {code}
>  
>  
>  
>  
>  
>  






[jira] [Updated] (YARN-9732) yarn.system-metrics-publisher.enabled=false is not honored by RM

2019-08-09 Thread Abhishek Modi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9732:

Summary: yarn.system-metrics-publisher.enabled=false is not honored by RM  
(was: yarn.system-metrics-publisher.enabled=false does not work)

> yarn.system-metrics-publisher.enabled=false is not honored by RM
> 
>
> Key: YARN-9732
> URL: https://issues.apache.org/jira/browse/YARN-9732
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, timelineclient
>Affects Versions: 3.1.2
>Reporter: KWON BYUNGCHANG
>Assignee: KWON BYUNGCHANG
>Priority: Major
> Attachments: YARN-9732.0001.patch
>
>
> RM does not honor yarn.system-metrics-publisher.enabled=false,
> so if only yarn.timeline-service.enabled=true is configured, 
> YARN system metrics are always published to the timeline server by RM.
>  






[jira] [Commented] (YARN-9732) yarn.system-metrics-publisher.enabled=false is not honored by RM

2019-08-09 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16904037#comment-16904037
 ] 

Abhishek Modi commented on YARN-9732:
-

Thanks [~magnum] for the patch and [~Prabhu Joseph] for the review. Committed to trunk.

> yarn.system-metrics-publisher.enabled=false is not honored by RM
> 
>
> Key: YARN-9732
> URL: https://issues.apache.org/jira/browse/YARN-9732
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, timelineclient
>Affects Versions: 3.1.2
>Reporter: KWON BYUNGCHANG
>Assignee: KWON BYUNGCHANG
>Priority: Major
> Attachments: YARN-9732.0001.patch
>
>
> RM does not honor yarn.system-metrics-publisher.enabled=false,
> so if only yarn.timeline-service.enabled=true is configured, 
> YARN system metrics are always published to the timeline server by RM.
>  






[jira] [Commented] (YARN-9731) In ATS v1.5, all jobs are visible to all users without view-acl

2019-08-11 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16904610#comment-16904610
 ] 

Abhishek Modi commented on YARN-9731:
-

[~magnum] thanks for the patch. One minor comment:

Could you use the logger format in ApplicationHistoryManagerOnTimelineStore.java, 
line 687?
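
As a rough illustration of what the logger format means here: the message template and its arguments are handed to the logger separately, instead of being concatenated into one string up front. Hadoop logs through SLF4J, whose placeholder is `{}`; to stay dependency-free, the sketch below swaps in java.util.logging, whose placeholders are `{0}`-style. The class name, logger name, message text, and application id are made up for the example and are not taken from the patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;

final class ParameterizedLoggingSketch {

  /** Collects log records in memory so the example can inspect them. */
  static final class CapturingHandler extends Handler {
    final List<LogRecord> records = new ArrayList<>();

    @Override
    public void publish(LogRecord record) {
      records.add(record);
    }

    @Override
    public void flush() {
      // Nothing is buffered, so there is nothing to flush.
    }

    @Override
    public void close() {
      // No resources to release.
    }
  }

  /** Logs one warning with the template and the argument kept separate. */
  static LogRecord logWarning(String appId) {
    Logger log = Logger.getLogger("sketch.parameterized");
    log.setUseParentHandlers(false); // keep the example's console output quiet
    CapturingHandler handler = new CapturingHandler();
    log.addHandler(handler);
    try {
      // The argument is substituted only when a formatter renders the record.
      log.log(Level.WARNING, "Failed to authorize application {0}", appId);
    } finally {
      log.removeHandler(handler);
    }
    return handler.records.get(0);
  }

  /** Renders a record's message the way a handler's formatter would. */
  static String render(LogRecord record) {
    return new SimpleFormatter().formatMessage(record);
  }

  public static void main(String[] args) {
    LogRecord record = logWarning("application_1565247558150_1954");
    System.out.println(record.getMessage()); // the stored, unformatted template
    System.out.println(render(record));      // the rendered message
  }
}
```

The record keeps the template and the argument apart until something actually renders it, which is the property the review comment is after.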

> In ATS v1.5, all jobs are visible to all users without view-acl
> ---
>
> Key: YARN-9731
> URL: https://issues.apache.org/jira/browse/YARN-9731
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 3.1.2
>Reporter: KWON BYUNGCHANG
>Assignee: KWON BYUNGCHANG
>Priority: Major
> Attachments: YARN-9731.001.patch, YARN-9731.002.patch, 
> YARN-9731.003.patch, ats_v1.5_screenshot.png
>
>
> In ATS v1.5 in secure mode,
> all jobs are visible to all users, regardless of view-acl.
> If a user does not have view-acl, the user should not be able to see jobs.
> I attached an ATS UI screenshot.
>  
> ATS v1.5 log
> {code:java}
> 2019-08-09 10:21:13,679 WARN 
> applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore 
> (ApplicationHistoryManagerOnTimelineStore.java:generateApplicationReport(687))
>  - Failed to authorize when generating application report for 
> application_1565247558150_1954. Use a placeholder for its latest attempt id.
> org.apache.hadoop.security.authorize.AuthorizationException: User magnum does 
> not have privilege to see this application application_1565247558150_1954
> 2019-08-09 10:21:13,680 WARN 
> applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore 
> (ApplicationHistoryManagerOnTimelineStore.java:generateApplicationReport(687))
>  - Failed to authorize when generating application report for 
> application_1565247558150_1951. Use a placeholder for its latest attempt id.
> org.apache.hadoop.security.authorize.AuthorizationException: User magnum does 
> not have privilege to see this application application_1565247558150_1951
> {code}
>  
>  
>  
>  
>  
>  






[jira] [Commented] (YARN-9657) AbstractLivelinessMonitor add serviceName to PingChecker thread

2019-08-11 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16904612#comment-16904612
 ] 

Abhishek Modi commented on YARN-9657:
-

Thanks [~BilwaST] for working on it. Patch looks good to me.

Committed to trunk.

 

> AbstractLivelinessMonitor add serviceName to PingChecker thread
> ---
>
> Key: YARN-9657
> URL: https://issues.apache.org/jira/browse/YARN-9657
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Bibin A Chundatt
>Assignee: Bilwa S T
>Priority: Minor
> Attachments: YARN-9657-001.patch
>
>







[jira] [Commented] (YARN-9722) PlacementRule logs object ID in place of queue name.

2019-08-11 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16904613#comment-16904613
 ] 

Abhishek Modi commented on YARN-9722:
-

Thanks [~Prabhu Joseph] for the patch. A minor comment: can we use the logger 
format here?

> PlacementRule logs object ID in place of queue name.
> 
>
> Key: YARN-9722
> URL: https://issues.apache.org/jira/browse/YARN-9722
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Minor
>  Labels: supportability
> Attachments: YARN-9722-001.patch
>
>
> UserGroupMappingPlacementRule logs object ID in place of queue name.
> {code}
> 2019-08-05 09:28:52,664 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule:
>  Application application_1564996871731_0003 user ambari-qa mapping [default] 
> to 
> [org.apache.hadoop.yarn.server.resourcemanager.placement.ApplicationPlacementContext@5aafe9b2]
>  override false
> {code}
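
The `ApplicationPlacementContext@5aafe9b2` text in the log above has the shape that Java's default `Object.toString()` produces (class name, `@`, hash code in hex), which is what gets printed when an object without its own `toString()` is concatenated into a message. A minimal sketch of the effect, and of logging the queue name instead, follows. `PlacementContextSketch` and its `getQueue()` accessor are hypothetical stand-ins, not the real YARN class and not the committed fix.

```java
final class PlacementLogSketch {

  /** Hypothetical stand-in for a context object that does not override toString(). */
  static final class PlacementContextSketch {
    private final String queue;

    PlacementContextSketch(String queue) {
      this.queue = queue;
    }

    String getQueue() {
      return queue;
    }
  }

  /** Concatenates the object itself, so Object.toString() is used. */
  static String messageWithObject(PlacementContextSketch ctx) {
    return "mapping [default] to [" + ctx + "]";
  }

  /** Concatenates the queue name, which is what an operator wants to read. */
  static String messageWithQueueName(PlacementContextSketch ctx) {
    return "mapping [default] to [" + ctx.getQueue() + "]";
  }

  public static void main(String[] args) {
    PlacementContextSketch ctx = new PlacementContextSketch("default");
    System.out.println(messageWithObject(ctx));
    System.out.println(messageWithQueueName(ctx));
  }
}
```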






[jira] [Commented] (YARN-9464) Support "Pending Resource" metrics in RM's RESTful API

2019-08-11 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16904616#comment-16904616
 ] 

Abhishek Modi commented on YARN-9464:
-

Thanks [~Prabhu Joseph] for the patch. It looks good to me. I will commit it by 
tomorrow if there is no objection.

> Support "Pending Resource" metrics in RM's RESTful API
> --
>
> Key: YARN-9464
> URL: https://issues.apache.org/jira/browse/YARN-9464
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Zhankun Tang
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9464-001.patch, YARN-9464-002.patch
>
>
> Knowing only the "available" and "used" resources is not enough for YARN 
> management tools like an auto-scaler. It would help such tools diagnose 
> cluster resource utilization if they could get "Pending Resource" from the RM 
> RESTful APIs. To a certain extent, it represents how starved the applications are.
> Initially, we can add "pending resource" information to the two RM REST 
> APIs below:
> {code:java}
> RMnode:port/ws/v1/cluster/metrics
> RMnode:port/ws/v1/cluster/nodes
> {code}
>  
>  






[jira] [Commented] (YARN-9731) In ATS v1.5, all jobs are visible to all users without view-acl

2019-08-11 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16904847#comment-16904847
 ] 

Abhishek Modi commented on YARN-9731:
-

You have replaced all exceptions with e.toString(). This will not show the stack 
trace. Could you please pass the exception directly when logging?
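
To illustrate the point about `e.toString()`: it yields only the exception class and message, while the stack frames are printed only when the exception object itself reaches the logger (with SLF4J, by passing it as the last argument). The sketch below uses plain JDK classes to compare the two renderings. The class and method names and the message text are made up for the example.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

final class StackTraceSketch {

  /** What ends up in the log when only e.toString() is appended. */
  static String summaryOnly(Throwable e) {
    return e.toString();
  }

  /** What a logger can print when it is given the exception object itself. */
  static String withStackTrace(Throwable e) {
    StringWriter buffer = new StringWriter();
    e.printStackTrace(new PrintWriter(buffer, true));
    return buffer.toString();
  }

  /** Throws and catches an exception so that it carries real stack frames. */
  static Exception sampleFailure() {
    try {
      throw new IllegalStateException("User is not allowed to view this application");
    } catch (IllegalStateException e) {
      return e;
    }
  }

  public static void main(String[] args) {
    Exception e = sampleFailure();
    System.out.println(summaryOnly(e));    // class name and message only
    System.out.println(withStackTrace(e)); // same first line, followed by the frames
  }
}
```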

> In ATS v1.5, all jobs are visible to all users without view-acl
> ---
>
> Key: YARN-9731
> URL: https://issues.apache.org/jira/browse/YARN-9731
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 3.1.2
>Reporter: KWON BYUNGCHANG
>Assignee: KWON BYUNGCHANG
>Priority: Major
> Attachments: YARN-9731.001.patch, YARN-9731.002.patch, 
> YARN-9731.003.patch, YARN-9731.004.patch, ats_v1.5_screenshot.png
>
>
> In ATS v1.5 in secure mode,
> all jobs are visible to all users, regardless of view-acl.
> If a user does not have view-acl, the user should not be able to see jobs.
> I attached an ATS UI screenshot.
>  
> ATS v1.5 log
> {code:java}
> 2019-08-09 10:21:13,679 WARN 
> applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore 
> (ApplicationHistoryManagerOnTimelineStore.java:generateApplicationReport(687))
>  - Failed to authorize when generating application report for 
> application_1565247558150_1954. Use a placeholder for its latest attempt id.
> org.apache.hadoop.security.authorize.AuthorizationException: User magnum does 
> not have privilege to see this application application_1565247558150_1954
> 2019-08-09 10:21:13,680 WARN 
> applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore 
> (ApplicationHistoryManagerOnTimelineStore.java:generateApplicationReport(687))
>  - Failed to authorize when generating application report for 
> application_1565247558150_1951. Use a placeholder for its latest attempt id.
> org.apache.hadoop.security.authorize.AuthorizationException: User magnum does 
> not have privilege to see this application application_1565247558150_1951
> {code}
>  
>  
>  
>  
>  
>  






[jira] [Commented] (YARN-9722) PlacementRule logs object ID in place of queue name.

2019-08-11 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16904852#comment-16904852
 ] 

Abhishek Modi commented on YARN-9722:
-

Patch [^YARN-9722-002.patch] LGTM. Committed to trunk.

Thanks [~Prabhu Joseph] for the patch and [~sunilg] for the review.

> PlacementRule logs object ID in place of queue name.
> 
>
> Key: YARN-9722
> URL: https://issues.apache.org/jira/browse/YARN-9722
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Minor
>  Labels: supportability
> Attachments: YARN-9722-001.patch, YARN-9722-002.patch
>
>
> UserGroupMappingPlacementRule logs object ID in place of queue name.
> {code}
> 2019-08-05 09:28:52,664 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule:
>  Application application_1564996871731_0003 user ambari-qa mapping [default] 
> to 
> [org.apache.hadoop.yarn.server.resourcemanager.placement.ApplicationPlacementContext@5aafe9b2]
>  override false
> {code}






[jira] [Commented] (YARN-9464) Support "Pending Resource" metrics in RM's RESTful API

2019-08-11 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16904853#comment-16904853
 ] 

Abhishek Modi commented on YARN-9464:
-

[~Prabhu Joseph] 
[TestSystemMetricsPublisher.testPublishContainerMetrics|https://builds.apache.org/job/PreCommit-YARN-Build/24527/testReport/org.apache.hadoop.yarn.server.resourcemanager.metrics/TestSystemMetricsPublisher/testPublishContainerMetrics/]
 is failing. Could you please check whether it is related? Thanks.

> Support "Pending Resource" metrics in RM's RESTful API
> --
>
> Key: YARN-9464
> URL: https://issues.apache.org/jira/browse/YARN-9464
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Zhankun Tang
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9464-001.patch, YARN-9464-002.patch
>
>
> Knowing only the "available" and "used" resources is not enough for YARN 
> management tools like an auto-scaler. It would help such tools diagnose 
> cluster resource utilization if they could get "Pending Resource" from the RM 
> RESTful APIs. To a certain extent, it represents how starved the applications are.
> Initially, we can add "pending resource" information to the two RM REST 
> APIs below:
> {code:java}
> RMnode:port/ws/v1/cluster/metrics
> RMnode:port/ws/v1/cluster/nodes
> {code}
>  
>  






[jira] [Commented] (YARN-7982) Do ACLs check while retrieving entity-types per application

2019-08-11 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16904859#comment-16904859
 ] 

Abhishek Modi commented on YARN-7982:
-

Thanks [~Prabhu Joseph] for the patch.

Some comments:
In TimelineReaderManager.java, I think we should still create a new context 
when calling getEntityTypes to make sure that the existing context is not modified.

In FilesystemTimelineReaderImpl, why do we need to set userId explicitly? In the 
line just above, we are calling context.getUserId.
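
The first comment amounts to a defensive copy: hand the callee a fresh context so that anything it fills in does not leak back to the caller. A minimal sketch under stated assumptions follows. `ReaderContextSketch`, `fillDefaults`, and the `resolved-user` value are hypothetical and only loosely modelled on the reader context discussed here; they are not the actual TimelineReaderManager code.

```java
final class ContextCopySketch {

  /** Hypothetical, simplified stand-in for a mutable reader context. */
  static final class ReaderContextSketch {
    private String userId;
    private final String appId;

    ReaderContextSketch(String userId, String appId) {
      this.userId = userId;
      this.appId = appId;
    }

    /** Copy constructor: the copy can be changed without touching the source. */
    ReaderContextSketch(ReaderContextSketch other) {
      this(other.userId, other.appId);
    }

    String getUserId() {
      return userId;
    }

    void setUserId(String userId) {
      this.userId = userId;
    }

    String getAppId() {
      return appId;
    }
  }

  /** Hypothetical callee that fills in a field on whatever context it receives. */
  static void fillDefaults(ReaderContextSketch context) {
    if (context.getUserId() == null) {
      context.setUserId("resolved-user");
    }
  }

  /** Passes a copy to the callee, so the caller's context stays as it was. */
  static ReaderContextSketch callWithCopy(ReaderContextSketch original) {
    ReaderContextSketch copy = new ReaderContextSketch(original);
    fillDefaults(copy);
    return copy;
  }

  public static void main(String[] args) {
    ReaderContextSketch original =
        new ReaderContextSketch(null, "application_1552297011473_0002");
    ReaderContextSketch used = callWithCopy(original);
    System.out.println(original.getUserId()); // still unset on the caller's context
    System.out.println(used.getUserId());     // filled in on the copy only
  }
}
```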

> Do ACLs check while retrieving entity-types per application
> ---
>
> Key: YARN-7982
> URL: https://issues.apache.org/jira/browse/YARN-7982
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Rohith Sharma K S
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-7982-001.patch, YARN-7982-002.patch, 
> YARN-7982-003.patch
>
>
> REST end point {{/apps/$appid/entity-types}} retrieves all the entity-types 
> for a given application. This needs to be guarded with an ACL check.
> {code}
> [yarn@yarn-ats-3 ~]$ curl 
> "http://yarn-ats-3:8198/ws/v2/timeline/apps/application_1552297011473_0002?user.name=ambari-qa1";
> {"exception":"ForbiddenException","message":"java.lang.Exception: User 
> ambari-qa1 is not allowed to read TimelineService V2 
> data.","javaClassName":"org.apache.hadoop.yarn.webapp.ForbiddenException"}
> [yarn@yarn-ats-3 ~]$ curl 
> "http://yarn-ats-3:8198/ws/v2/timeline/apps/application_1552297011473_0002/entity-types?user.name=ambari-qa1";
> ["YARN_APPLICATION_ATTEMPT","YARN_CONTAINER"]
> {code}






[jira] [Commented] (YARN-9464) Support "Pending Resource" metrics in RM's RESTful API

2019-08-12 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16904993#comment-16904993
 ] 

Abhishek Modi commented on YARN-9464:
-

Thanks [~Prabhu Joseph]. Committed to trunk.

> Support "Pending Resource" metrics in RM's RESTful API
> --
>
> Key: YARN-9464
> URL: https://issues.apache.org/jira/browse/YARN-9464
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Zhankun Tang
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9464-001.patch, YARN-9464-002.patch
>
>
> Knowing only the "available" and "used" resources is not enough for YARN 
> management tools like an auto-scaler. It would help such tools diagnose 
> cluster resource utilization if they could get "Pending Resource" from the RM 
> RESTful APIs. To a certain extent, it represents how starved the applications are.
> Initially, we can add "pending resource" information to the two RM REST 
> APIs below:
> {code:java}
> RMnode:port/ws/v1/cluster/metrics
> RMnode:port/ws/v1/cluster/nodes
> {code}
>  
>  






[jira] [Commented] (YARN-9373) HBaseTimelineSchemaCreator has to allow user to configure pre-splits

2019-08-12 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16905783#comment-16905783
 ] 

Abhishek Modi commented on YARN-9373:
-

Thanks [~Prabhu Joseph] for the patch. I will review it as soon as I get some 
free cycles. Thanks.

> HBaseTimelineSchemaCreator has to allow user to configure pre-splits
> 
>
> Key: YARN-9373
> URL: https://issues.apache.org/jira/browse/YARN-9373
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: ATSv2
>Affects Versions: 3.2.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: Configurable_PreSplits.png, YARN-9373-001.patch, 
> YARN-9373-002.patch, YARN-9373-003.patch
>
>
> Most of the TimelineService HBase tables are set up with username splits that 
> are based on the lowercase alphabet (a,ad,an,b,ca). This won't help if the rowkey 
> starts with either a number or an uppercase letter. We need to allow users to 
> configure the splits based on their data. For example, if a user has configured 
> yarn.resourcemanager.cluster-id to be ATS or 123, then the splits can be 
> configured as A,B,C,,, or 100,200,300,,,
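
A small sketch of why the lowercase splits do not help such row keys: in byte order, digits and uppercase letters sort before `a`, so every key that starts with one of them falls before the first split point and lands in the same region. The `regionIndex` helper and the sample row keys are hypothetical, and plain `String` ordering is used as a stand-in for HBase's byte-wise ordering (equivalent for these ASCII keys).

```java
import java.util.Arrays;
import java.util.List;

final class PreSplitSketch {

  /**
   * Returns the index of the region a row key falls into, where region i
   * covers keys from splits[i-1] (inclusive) up to splits[i] (exclusive).
   * Split points must be sorted in ascending order.
   */
  static int regionIndex(List<String> splits, String rowKey) {
    int index = 0;
    for (String split : splits) {
      if (rowKey.compareTo(split) >= 0) {
        index++;
      }
    }
    return index;
  }

  public static void main(String[] args) {
    List<String> lowercaseSplits = Arrays.asList("a", "ad", "an", "b", "ca");
    List<String> uppercaseSplits = Arrays.asList("A", "B", "C");
    List<String> numericSplits = Arrays.asList("100", "200", "300");

    // Keys starting with an uppercase letter or a digit sort before "a".
    System.out.println(regionIndex(lowercaseSplits, "ATS_flow1"));
    System.out.println(regionIndex(lowercaseSplits, "123_flow1"));

    // Splits chosen to match the data spread such keys across regions.
    System.out.println(regionIndex(uppercaseSplits, "ATS_flow1"));
    System.out.println(regionIndex(uppercaseSplits, "BTS_flow1"));
    System.out.println(regionIndex(numericSplits, "123_flow1"));
    System.out.println(regionIndex(numericSplits, "250_flow1"));
  }
}
```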






[jira] [Commented] (YARN-9744) RollingLevelDBTimelineStore.getEntityByTime fails with NPE

2019-08-13 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16906205#comment-16906205
 ] 

Abhishek Modi commented on YARN-9744:
-

Thanks [~Prabhu Joseph] for the patch. LGTM. Committed to trunk.

> RollingLevelDBTimelineStore.getEntityByTime fails with NPE
> --
>
> Key: YARN-9744
> URL: https://issues.apache.org/jira/browse/YARN-9744
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9744-001.patch
>
>
> RollingLevelDBTimelineStore.getEntityByTime fails with NPE.
> {code}
> 2019-08-07 12:58:55,990 WARN  ipc.Server (Server.java:logException(2433)) - 
> IPC Server handler 0 on 10200, call 
> org.apache.hadoop.yarn.api.ApplicationHistoryProtocolPB.getContainers from 
> 10.21.216.93:36392 Call#29446915 Retry#0
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.timeline.RollingLevelDBTimelineStore.getEntityByTime(RollingLevelDBTimelineStore.java:786)
> at 
> org.apache.hadoop.yarn.server.timeline.RollingLevelDBTimelineStore.getEntities(RollingLevelDBTimelineStore.java:614)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.getEntities(EntityGroupFSTimelineStore.java:1045)
> at 
> org.apache.hadoop.yarn.server.timeline.TimelineDataManager.doGetEntities(TimelineDataManager.java:168)
> at 
> org.apache.hadoop.yarn.server.timeline.TimelineDataManager.getEntities(TimelineDataManager.java:138)
> at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.getContainers(ApplicationHistoryManagerOnTimelineStore.java:222)
> at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryClientService.getContainers(ApplicationHistoryClientService.java:213)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationHistoryProtocolPBServiceImpl.getContainers(ApplicationHistoryProtocolPBServiceImpl.java:172)
> at 
> org.apache.hadoop.yarn.proto.ApplicationHistoryProtocol$ApplicationHistoryProtocolService$2.callBlockingMethod(ApplicationHistoryProtocol.java:201)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347)
> {code}
> This affects the REST API for getting entities.
> curl http://pjosephdocker:8188/ws/v1/timeline/TEZ_APPLICATION 






[jira] [Created] (YARN-9752) Add support for allocation id in SLS.

2019-08-15 Thread Abhishek Modi (JIRA)
Abhishek Modi created YARN-9752:
---

 Summary: Add support for allocation id in SLS.
 Key: YARN-9752
 URL: https://issues.apache.org/jira/browse/YARN-9752
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Abhishek Modi
Assignee: Abhishek Modi









[jira] [Created] (YARN-9754) Add support for arbitrary DAG AM Simulator.

2019-08-16 Thread Abhishek Modi (JIRA)
Abhishek Modi created YARN-9754:
---

 Summary: Add support for arbitrary DAG AM Simulator.
 Key: YARN-9754
 URL: https://issues.apache.org/jira/browse/YARN-9754
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Abhishek Modi
Assignee: Abhishek Modi


Currently, all map containers are requested as soon as the Application Master comes 
up, and then all reducer containers are requested. This does not give the flexibility 
to simulate the behavior of a DAG, where varying numbers of containers would be 
requested at different times.






[jira] [Updated] (YARN-9754) Add support for arbitrary DAG AM Simulator.

2019-08-18 Thread Abhishek Modi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9754:

Attachment: YARN-9754.001.patch

> Add support for arbitrary DAG AM Simulator.
> ---
>
> Key: YARN-9754
> URL: https://issues.apache.org/jira/browse/YARN-9754
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9754.001.patch
>
>
> Currently, all map containers are requested as soon as the Application Master 
> comes up, and then all reducer containers are requested. This does not give the 
> flexibility to simulate the behavior of a DAG, where varying numbers of containers 
> would be requested at different times.






[jira] [Updated] (YARN-9752) Add support for allocation id in SLS.

2019-08-18 Thread Abhishek Modi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9752:

Attachment: YARN-9752.002.patch

> Add support for allocation id in SLS.
> -
>
> Key: YARN-9752
> URL: https://issues.apache.org/jira/browse/YARN-9752
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9752.001.patch, YARN-9752.002.patch
>
>







[jira] [Commented] (YARN-9752) Add support for allocation id in SLS.

2019-08-18 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16910008#comment-16910008
 ] 

Abhishek Modi commented on YARN-9752:
-

[~elgoiri] could you please review it? Thanks.

> Add support for allocation id in SLS.
> -
>
> Key: YARN-9752
> URL: https://issues.apache.org/jira/browse/YARN-9752
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9752.001.patch, YARN-9752.002.patch
>
>







[jira] [Updated] (YARN-9754) Add support for arbitrary DAG AM Simulator.

2019-08-18 Thread Abhishek Modi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9754:

Attachment: YARN-9754.002.patch

> Add support for arbitrary DAG AM Simulator.
> ---
>
> Key: YARN-9754
> URL: https://issues.apache.org/jira/browse/YARN-9754
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9754.001.patch, YARN-9754.002.patch
>
>
> Currently, all map containers are requested as soon as the Application Master 
> comes up, and then all reducer containers are requested. This does not give the 
> flexibility to simulate the behavior of a DAG, where varying numbers of containers 
> would be requested at different times.






[jira] [Updated] (YARN-9752) Add support for allocation id in SLS.

2019-08-19 Thread Abhishek Modi (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9752:

Attachment: YARN-9752.003.patch

> Add support for allocation id in SLS.
> -
>
> Key: YARN-9752
> URL: https://issues.apache.org/jira/browse/YARN-9752
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9752.001.patch, YARN-9752.002.patch, 
> YARN-9752.003.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9752) Add support for allocation id in SLS.

2019-08-19 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16911017#comment-16911017
 ] 

Abhishek Modi commented on YARN-9752:
-

Thanks [~elgoiri] for the review. Attached the v3 patch with the fixes.

Also tested this with an sls.json of 1000 jobs and 4500 nodes, and it works fine 
end to end.

> Add support for allocation id in SLS.
> -
>
> Key: YARN-9752
> URL: https://issues.apache.org/jira/browse/YARN-9752
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9752.001.patch, YARN-9752.002.patch, 
> YARN-9752.003.patch
>
>







[jira] [Created] (YARN-9765) SLS runner crashes when run with metrics turned off.

2019-08-20 Thread Abhishek Modi (Jira)
Abhishek Modi created YARN-9765:
---

 Summary: SLS runner crashes when run with metrics turned off.
 Key: YARN-9765
 URL: https://issues.apache.org/jira/browse/YARN-9765
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Abhishek Modi
Assignee: Abhishek Modi


When SLS metrics are turned off, creation of the AM fails with an NPE.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9765) SLS runner crashes when run with metrics turned off.

2019-08-21 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16912112#comment-16912112
 ] 

Abhishek Modi commented on YARN-9765:
-

Thanks [~bibinchundatt] for reviewing and committing it.

> SLS runner crashes when run with metrics turned off.
> 
>
> Key: YARN-9765
> URL: https://issues.apache.org/jira/browse/YARN-9765
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Fix For: 3.3.0, 3.2.1, 3.1.3
>
> Attachments: YARN-9765.001.patch
>
>
> When SLS metrics are turned off, creation of the AM fails with an NPE.






[jira] [Commented] (YARN-9752) Add support for allocation id in SLS.

2019-08-21 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16912392#comment-16912392
 ] 

Abhishek Modi commented on YARN-9752:
-

Thanks [~elgoiri] for review. Committed to trunk.

> Add support for allocation id in SLS.
> -
>
> Key: YARN-9752
> URL: https://issues.apache.org/jira/browse/YARN-9752
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9752.001.patch, YARN-9752.002.patch, 
> YARN-9752.003.patch
>
>







[jira] [Created] (YARN-9782) Avoid DNS resolution while running SLS.

2019-08-25 Thread Abhishek Modi (Jira)
Abhishek Modi created YARN-9782:
---

 Summary: Avoid DNS resolution while running SLS.
 Key: YARN-9782
 URL: https://issues.apache.org/jira/browse/YARN-9782
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Abhishek Modi
Assignee: Abhishek Modi


In SLS, we add nodes with random names and racks. DNS resolution of each of 
these nodes takes around 2 seconds, because the lookup only returns once it 
times out. This makes SLS results unreliable and adds latency spikes.
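As a rough illustration of skipping the lookup for made-up host names, the JDK allows a socket address to be built without resolving it. This is only a sketch of that general mechanism, not the approach taken in the YARN-9782 patch; the class name, host name, and port below are invented for the example.

```java
import java.net.InetSocketAddress;

/** Illustrative only: shows the JDK's unresolved-address factory. */
class UnresolvedNodeAddress {

  /** Builds an address for a simulated node without triggering a DNS lookup. */
  static InetSocketAddress forSimulatedNode(String host, int port) {
    // createUnresolved never attempts to resolve the host name.
    return InetSocketAddress.createUnresolved(host, port);
  }

  public static void main(String[] args) {
    // Hypothetical simulated node; the name does not exist in DNS.
    InetSocketAddress addr = forSimulatedNode("node-4711.rack-12.invalid", 8041);
    System.out.println(addr.getHostString() + " unresolved=" + addr.isUnresolved());
  }
}
```

By contrast, `new InetSocketAddress(host, port)` attempts resolution in the constructor, which is where a per-node timeout would be paid.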






[jira] [Updated] (YARN-9754) Add support for arbitrary DAG AM Simulator.

2019-08-25 Thread Abhishek Modi (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9754:

Attachment: YARN-9754.003.patch

> Add support for arbitrary DAG AM Simulator.
> ---
>
> Key: YARN-9754
> URL: https://issues.apache.org/jira/browse/YARN-9754
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9754.001.patch, YARN-9754.002.patch, 
> YARN-9754.003.patch
>
>
> Currently, all map containers are requested as soon as the Application Master 
> comes up, and then all reducer containers are requested. This doesn't give the 
> flexibility to simulate the behavior of a DAG, where varying numbers of 
> containers would be requested at different times.
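The idea in the description above, requesting different numbers of containers at different times, can be sketched as a time-ordered queue of stage requests. This is not the DAG AM simulator from the patch; `DagRequestSketch`, `StageRequest`, and the stage names and delays are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Illustrative only: a time-ordered schedule of container requests. */
class DagRequestSketch {

  /** Hypothetical request: ask for some containers a given time after AM start. */
  static final class StageRequest {
    final String stage;
    final int containers;
    final long delayMs;

    StageRequest(String stage, int containers, long delayMs) {
      this.stage = stage;
      this.containers = containers;
      this.delayMs = delayMs;
    }
  }

  /** Creates an empty schedule ordered by request delay. */
  static PriorityQueue<StageRequest> newSchedule() {
    return new PriorityQueue<>(
        Comparator.comparingLong((StageRequest r) -> r.delayMs));
  }

  /** Removes and returns every request whose delay has elapsed. */
  static List<StageRequest> takeDue(PriorityQueue<StageRequest> pending, long elapsedMs) {
    List<StageRequest> due = new ArrayList<>();
    while (!pending.isEmpty() && pending.peek().delayMs <= elapsedMs) {
      due.add(pending.poll());
    }
    return due;
  }

  public static void main(String[] args) {
    PriorityQueue<StageRequest> pending = newSchedule();
    pending.add(new StageRequest("scan", 10, 0));
    pending.add(new StageRequest("join", 4, 5000));
    pending.add(new StageRequest("aggregate", 2, 9000));
    // Poll on a simulated heartbeat every 5 seconds.
    for (long t = 0; t <= 10000; t += 5000) {
      for (StageRequest r : takeDue(pending, t)) {
        System.out.println(t + " ms: request " + r.containers + " containers for " + r.stage);
      }
    }
  }
}
```

A map/reduce style job is then just the special case with two stages, which is why a schedule of this kind is more general than requesting all maps followed by all reducers.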






[jira] [Updated] (YARN-9754) Add support for arbitrary DAG AM Simulator.

2019-08-27 Thread Abhishek Modi (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9754:

Attachment: YARN-9754.004.patch

> Add support for arbitrary DAG AM Simulator.
> ---
>
> Key: YARN-9754
> URL: https://issues.apache.org/jira/browse/YARN-9754
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9754.001.patch, YARN-9754.002.patch, 
> YARN-9754.003.patch, YARN-9754.004.patch
>
>
> Currently, all map containers are requested as soon as the Application Master 
> comes up, and then all reducer containers are requested. This doesn't give the 
> flexibility to simulate the behavior of a DAG, where varying numbers of 
> containers would be requested at different times.






[jira] [Commented] (YARN-9754) Add support for arbitrary DAG AM Simulator.

2019-08-27 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16916721#comment-16916721
 ] 

Abhishek Modi commented on YARN-9754:
-

Thanks [~elgoiri] for review. I have addressed all the review comments in the 
latest patch.

Since changing the AMSimulator will also impact MRAMSimulator and 
StreamAMSimulator, I will create a separate Jira for that.

> Add support for arbitrary DAG AM Simulator.
> ---
>
> Key: YARN-9754
> URL: https://issues.apache.org/jira/browse/YARN-9754
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9754.001.patch, YARN-9754.002.patch, 
> YARN-9754.003.patch, YARN-9754.004.patch
>
>
> Currently, all map containers are requested as soon as the Application Master 
> comes up, and then all reducer containers are requested. This doesn't give the 
> flexibility to simulate the behavior of a DAG, where varying numbers of 
> containers would be requested at different times.






[jira] [Updated] (YARN-9754) Add support for arbitrary DAG AM Simulator.

2019-08-28 Thread Abhishek Modi (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9754:

Attachment: YARN-9754.005.patch

> Add support for arbitrary DAG AM Simulator.
> ---
>
> Key: YARN-9754
> URL: https://issues.apache.org/jira/browse/YARN-9754
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9754.001.patch, YARN-9754.002.patch, 
> YARN-9754.003.patch, YARN-9754.004.patch, YARN-9754.005.patch
>
>
> Currently, all map containers are requested as soon as the Application Master 
> comes up, and then all reducer containers are requested. This doesn't give the 
> flexibility to simulate the behavior of a DAG, where varying numbers of 
> containers would be requested at different times.






[jira] [Updated] (YARN-9754) Add support for arbitrary DAG AM Simulator.

2019-08-28 Thread Abhishek Modi (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9754:

Attachment: YARN-9754.006.patch

> Add support for arbitrary DAG AM Simulator.
> ---
>
> Key: YARN-9754
> URL: https://issues.apache.org/jira/browse/YARN-9754
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9754.001.patch, YARN-9754.002.patch, 
> YARN-9754.003.patch, YARN-9754.004.patch, YARN-9754.005.patch, 
> YARN-9754.006.patch
>
>
> Currently, all map containers are requested as soon as the Application Master 
> comes up, and then all reducer containers are requested. This doesn't give the 
> flexibility to simulate the behavior of a DAG, where varying numbers of 
> containers would be requested at different times.






[jira] [Commented] (YARN-9754) Add support for arbitrary DAG AM Simulator.

2019-08-28 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16918317#comment-16918317
 ] 

Abhishek Modi commented on YARN-9754:
-

Thanks [~elgoiri] for review. Committed it to trunk.

> Add support for arbitrary DAG AM Simulator.
> ---
>
> Key: YARN-9754
> URL: https://issues.apache.org/jira/browse/YARN-9754
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9754.001.patch, YARN-9754.002.patch, 
> YARN-9754.003.patch, YARN-9754.004.patch, YARN-9754.005.patch, 
> YARN-9754.006.patch
>
>
> Currently, all map containers are requested as soon as the Application Master 
> comes up, and then all reducer containers are requested. This doesn't give the 
> flexibility to simulate the behavior of a DAG, where varying numbers of 
> containers would be requested at different times.






[jira] [Commented] (YARN-5281) Explore supporting a simpler back-end implementation for ATS v2

2019-08-29 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-5281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16919222#comment-16919222
 ] 

Abhishek Modi commented on YARN-5281:
-

We already support HBase and CosmosDB as backends. We also support the 
local/HDFS filesystem with limited capabilities. Based on the discussion, I 
don't think it would be possible to support a simpler backend with all 
functionality without re-implementing some of the features provided by 
HBase/CosmosDB.

For a single-node setup, ATSv2 can still be used with limited functionality, 
using the local filesystem as the backend.

If no one is actively working on this, I would close this as part of the Jira 
cleanup for ATSv2.

cc [~vrushalic]/[~rohithsharma]

> Explore supporting a simpler back-end implementation for ATS v2
> ---
>
> Key: YARN-5281
> URL: https://issues.apache.org/jira/browse/YARN-5281
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Joep Rottinghuis
>Priority: Major
>  Labels: YARN-5355
>
> During the merge discussion [~kasha] raised the question whether we would 
> support a simpler backend for users to try out, in addition to the HBase 
> implementation.
> The understanding is that this would not be meant to scale, but it could 
> simplify initial adoption and early usage.
> I'm filing this jira to gather the merits and challenges of such an approach 
> in one place.






[jira] [Commented] (YARN-9794) RM crashes due to runtime errors in TimelineServiceV2Publisher

2019-08-29 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16919237#comment-16919237
 ] 

Abhishek Modi commented on YARN-9794:
-

Thanks [~tarunparimi] for filing it and working on it. Thanks [~Prabhu Joseph] 
for review.

[~tarunparimi] some more comments in addition to Prabhu's comment:
 # we should handle IOException and Exception separately.
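The review comment above can be sketched as two distinct catch clauses around the write. This is an assumption-laden illustration rather than the code in YARN-9794.001.patch; `EntityWriter` and the returned status strings are hypothetical, and a real publisher would log instead of returning a string.

```java
import java.io.IOException;

/** Illustrative only: separates I/O failures from unexpected runtime errors. */
class PublisherErrorHandlingSketch {

  /** Hypothetical stand-in for the component that writes an entity. */
  interface EntityWriter {
    void put(String entity) throws IOException;
  }

  /** Writes an entity and reports which kind of failure occurred, if any. */
  static String putEntity(EntityWriter writer, String entity) {
    try {
      writer.put(entity);
      return "OK";
    } catch (IOException e) {
      // Expected failure class for storage writes: report it as an I/O problem.
      return "IO_ERROR: " + e.getMessage();
    } catch (Exception e) {
      // Anything else (for example a RuntimeException from a client library)
      // is contained here so it cannot escape to the calling thread.
      return "UNEXPECTED_ERROR: " + e.getMessage();
    }
  }

  public static void main(String[] args) {
    System.out.println(putEntity(e -> { }, "entity-1"));
    System.out.println(putEntity(e -> { throw new IOException("write failed"); }, "entity-2"));
    System.out.println(putEntity(e -> { throw new IllegalArgumentException("bad buffer"); }, "entity-3"));
  }
}
```

Keeping the clauses separate lets each failure class get its own log message and severity, while the second clause is what keeps a runtime error such as the one in the stack trace below from reaching the dispatcher thread.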

> RM crashes due to runtime errors in TimelineServiceV2Publisher
> --
>
> Key: YARN-9794
> URL: https://issues.apache.org/jira/browse/YARN-9794
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Tarun Parimi
>Assignee: Tarun Parimi
>Priority: Major
>  Labels: atsv2
> Attachments: YARN-9794.001.patch
>
>
> Saw that the RM crashes during startup due to errors while putting an entity 
> in TimelineServiceV2Publisher.
> {code:java}
> 2019-08-28 09:35:45,273 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: 
> Error in dispatcher thread
> java.lang.RuntimeException: java.lang.IllegalArgumentException: 
> org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException:
>  CodedInputStream encountered an embedded string or message which claimed to 
> have negative size
> .
> at 
> org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithoutRetries(RpcRetryingCallerImpl.java:200)
> at 
> org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:269)
> at 
> org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:437)
> at 
> org.apache.hadoop.hbase.client.ClientScanner.nextWithSyncCache(ClientScanner.java:312)
> at 
> org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:597)
> at 
> org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegionInMeta(ConnectionImplementation.java:834)
> at 
> org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:732)
> at 
> org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:281)
> at 
> org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:236)
> at 
> org.apache.hadoop.hbase.client.BufferedMutatorImpl.backgroundFlushCommits(BufferedMutatorImpl.java:321)
> at 
> org.apache.hadoop.hbase.client.BufferedMutatorImpl.flush(BufferedMutatorImpl.java:285)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.common.TypedBufferedMutator.flush(TypedBufferedMutator.java:66)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineWriterImpl.flush(HBaseTimelineWriterImpl.java:566)
> at 
> org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollector.flushBufferedTimelineEntities(TimelineCollector.java:173)
> at 
> org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollector.putEntities(TimelineCollector.java:150)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher.putEntity(TimelineServiceV2Publisher.java:459)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher.access$100(TimelineServiceV2Publisher.java:73)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher$TimelineV2EventHandler.handle(TimelineServiceV2Publisher.java:494)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher$TimelineV2EventHandler.handle(TimelineServiceV2Publisher.java:483)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.IllegalArgumentException: 
> org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException:
>  CodedInputStream encountered an embedded string or message which claimed to 
> have negative size.
> at 
> org.apache.hbase.thirdparty.com.google.protobuf.CodedInputStream.newInstance(CodedInputStream.java:117)
> {code}






[jira] [Commented] (YARN-9540) TestRMAppTransitions fails intermittently

2019-08-30 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16919465#comment-16919465
 ] 

Abhishek Modi commented on YARN-9540:
-

Thanks [~Tao Yang] for the patch and [~adam.antal] for review. LGTM, will 
commit shortly.

> TestRMAppTransitions fails intermittently
> -
>
> Key: YARN-9540
> URL: https://issues.apache.org/jira/browse/YARN-9540
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, test
>Affects Versions: 3.2.0
>Reporter: Prabhu Joseph
>Assignee: Tao Yang
>Priority: Minor
> Attachments: YARN-9540.001.patch
>
>
> Failed
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppFinishedFinished[0]
> {code}
> Error Message
> expected:<1> but was:<0>
> Stacktrace
> java.lang.AssertionError: expected:<1> but was:<0>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:834)
>   at org.junit.Assert.assertEquals(Assert.java:645)
>   at org.junit.Assert.assertEquals(Assert.java:631)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.verifyAppCompletedEvent(TestRMAppTransitions.java:1307)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.verifyAppAfterFinishEvent(TestRMAppTransitions.java:1302)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testCreateAppFinished(TestRMAppTransitions.java:648)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppFinishedFinished(TestRMAppTransitions.java:1083)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at org.junit.runners.Suite.runChild(Suite.java:128)
>   at org.junit.runners.Suite.runChild(Suite.java:27)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> {code}






[jira] [Commented] (YARN-9798) ApplicationMasterServiceTestBase#testRepeatedFinishApplicationMaster fails intermittently

2019-08-30 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16919472#comment-16919472
 ] 

Abhishek Modi commented on YARN-9798:
-

Thanks [~Tao Yang] for the patch. Will wait for the Jenkins results before 
committing it.

[~Tao Yang] could you please share the frequency of failures before and after 
the fix? Thanks.
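The race that the issue description attributes to the test can be reproduced in miniature: an assertion on an asynchronous dispatcher's counter is only safe after the queue has been drained. The sketch below uses a plain single-thread executor and hypothetical names (`DrainBeforeAssertSketch`, `TinyDispatcher`); it is not YARN's `DrainDispatcher`, only an illustration of why a drain call belongs before the assertion.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative only: a tiny async dispatcher with a drain operation. */
class DrainBeforeAssertSketch {

  static final class TinyDispatcher {
    // A single worker thread processes events in submission order.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final AtomicInteger handled = new AtomicInteger();

    /** Queues an event; it is counted later on the worker thread. */
    void dispatch(String event) {
      worker.submit(() -> {
        handled.incrementAndGet();
      });
    }

    /** Blocks until every event queued before this call has been handled. */
    void drainEvents() {
      try {
        // The marker task runs only after all earlier tasks have finished.
        worker.submit(() -> { }).get();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("interrupted while draining", e);
      } catch (ExecutionException e) {
        throw new IllegalStateException("marker task failed", e);
      }
    }

    int handledCount() {
      return handled.get();
    }

    void stop() {
      worker.shutdown();
    }
  }

  public static void main(String[] args) {
    TinyDispatcher dispatcher = new TinyDispatcher();
    try {
      dispatcher.dispatch("UNREGISTERED");
      // Reading handledCount() here could still see 0; drain first.
      dispatcher.drainEvents();
      System.out.println("handled after drain: " + dispatcher.handledCount());
    } finally {
      dispatcher.stop();
    }
  }
}
```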

> ApplicationMasterServiceTestBase#testRepeatedFinishApplicationMaster fails 
> intermittently
> -
>
> Key: YARN-9798
> URL: https://issues.apache.org/jira/browse/YARN-9798
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Minor
> Attachments: YARN-9798.001.patch
>
>
> Found an intermittent failure of 
> ApplicationMasterServiceTestBase#testRepeatedFinishApplicationMaster in the 
> YARN-9714 Jenkins report. The cause is that the assertion checks that the 
> dispatcher has handled the UNREGISTERED event but does not wait until all 
> events in the dispatcher are handled. We need to add {{rm.drainEvents()}} 
> before that assertion to fix this issue.
> Failure info:
> {noformat}
> [ERROR] 
> testRepeatedFinishApplicationMaster(org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterServiceCapacity)
>   Time elapsed: 0.559 s  <<< FAILURE!
> java.lang.AssertionError: Expecting only one event expected:<1> but was:<0>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:834)
>   at org.junit.Assert.assertEquals(Assert.java:645)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterServiceTestBase.testRepeatedFinishApplicationMaster(ApplicationMasterServiceTestBase.java:385)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:748)
> {noformat}
> Standard output:
> {noformat}
> 2019-08-29 06:59:54,458 ERROR [AsyncDispatcher event handler] 
> resourcemanager.ResourceManager (ResourceManager.java:handle(1088)) - Error 
> in handling event type REGISTERED for applicationAttempt 
> appattempt_1567061994047_0001_01
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> java.lang.InterruptedException
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:276)
>   at 
> org.apache.hadoop.yarn.event.DrainDispatcher$2.handle(DrainDispatcher.java:91)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMRegisteredTransition.transition(RMAppAttemptImpl.java:1679)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMRegisteredTransition.transition(RMAppAttemptImpl.java:1658)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:914)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:121)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:1086)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:1067)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:200)
>

[jira] [Commented] (YARN-9798) ApplicationMasterServiceTestBase#testRepeatedFinishApplicationMaster fails intermittently

2019-08-30 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920016#comment-16920016
 ] 

Abhishek Modi commented on YARN-9798:
-

Thanks [~Tao Yang] for providing details. Committed it to trunk.

> ApplicationMasterServiceTestBase#testRepeatedFinishApplicationMaster fails 
> intermittently
> -
>
> Key: YARN-9798
> URL: https://issues.apache.org/jira/browse/YARN-9798
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Minor
> Attachments: YARN-9798.001.patch
>
>
> Found an intermittent failure of 
> ApplicationMasterServiceTestBase#testRepeatedFinishApplicationMaster in the 
> YARN-9714 Jenkins report. The cause is that the assertion checks that the 
> dispatcher has handled the UNREGISTERED event but does not wait until all 
> events in the dispatcher are handled. We need to add {{rm.drainEvents()}} 
> before that assertion to fix this issue.
> Failure info:
> {noformat}
> [ERROR] 
> testRepeatedFinishApplicationMaster(org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterServiceCapacity)
>   Time elapsed: 0.559 s  <<< FAILURE!
> java.lang.AssertionError: Expecting only one event expected:<1> but was:<0>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:834)
>   at org.junit.Assert.assertEquals(Assert.java:645)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterServiceTestBase.testRepeatedFinishApplicationMaster(ApplicationMasterServiceTestBase.java:385)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:748)
> {noformat}
> Standard output:
> {noformat}
> 2019-08-29 06:59:54,458 ERROR [AsyncDispatcher event handler] 
> resourcemanager.ResourceManager (ResourceManager.java:handle(1088)) - Error 
> in handling event type REGISTERED for applicationAttempt 
> appattempt_1567061994047_0001_01
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> java.lang.InterruptedException
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:276)
>   at 
> org.apache.hadoop.yarn.event.DrainDispatcher$2.handle(DrainDispatcher.java:91)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMRegisteredTransition.transition(RMAppAttemptImpl.java:1679)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMRegisteredTransition.transition(RMAppAttemptImpl.java:1658)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:914)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:121)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:1086)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:1067)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:200)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterServiceTestBase$CountingDispatcher.dispatch(Applic

[jira] [Commented] (YARN-9800) TestRMDelegationTokens can fail in testRemoveExpiredMasterKeyInRMStateStore

2019-08-30 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920019#comment-16920019
 ] 

Abhishek Modi commented on YARN-9800:
-

Thanks [~adam.antal] for the patch and [~kmarton] for review. Fix looks good to 
me. Will commit shortly.

> TestRMDelegationTokens can fail in testRemoveExpiredMasterKeyInRMStateStore
> ---
>
> Key: YARN-9800
> URL: https://issues.apache.org/jira/browse/YARN-9800
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test, yarn
>Affects Versions: 3.2.0
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
> Attachments: YARN-9800.001.patch
>
>
> The test fails intermittently with the following stack trace:
> {noformat}
> java.lang.AssertionError: 
> expected:<[org.apache.hadoop.security.token.delegation.DelegationKey@c09cd1a8,
>  org.apache.hadoop.security.token.delegation.DelegationKey@ca089752]> but 
> was:<[org.apache.hadoop.security.token.delegation.DelegationKey@c09cd1a8, 
> org.apache.hadoop.security.token.delegation.DelegationKey@b206142c, 
> org.apache.hadoop.security.token.delegation.DelegationKey@ca089752]>
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.security.TestRMDelegationTokens.testRemoveExpiredMasterKeyInRMStateStore(TestRMDelegationTokens.java:161)
> {noformat}






[jira] [Commented] (YARN-9790) Failed to set default-application-lifetime if maximum-application-lifetime is less than or equal to zero

2019-08-31 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920059#comment-16920059
 ] 

Abhishek Modi commented on YARN-9790:
-

Thanks [~kyungwan nam] for working on it. Thanks [~Prabhu Joseph] for the 
review. 004 patch lgtm. Will commit shortly.

> Failed to set default-application-lifetime if maximum-application-lifetime is 
> less than or equal to zero
> 
>
> Key: YARN-9790
> URL: https://issues.apache.org/jira/browse/YARN-9790
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: kyungwan nam
>Assignee: kyungwan nam
>Priority: Major
> Attachments: YARN-9790.001.patch, YARN-9790.002.patch, 
> YARN-9790.003.patch, YARN-9790.004.patch
>
>
> capacity-scheduler
> {code}
> ...
> yarn.scheduler.capacity.root.dev.maximum-application-lifetime=-1
> yarn.scheduler.capacity.root.dev.default-application-lifetime=604800
> {code}
> refreshQueues failed as follows:
> {code}
> 2019-08-28 15:21:57,423 WARN  resourcemanager.AdminService 
> (AdminService.java:logAndWrapException(910)) - Exception refresh queues.
> java.io.IOException: Failed to re-init queues : Default lifetime604800 can't 
> exceed maximum lifetime -1
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:477)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshQueues(AdminService.java:423)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshQueues(AdminService.java:394)
> at 
> org.apache.hadoop.yarn.server.api.impl.pb.service.ResourceManagerAdministrationProtocolPBServiceImpl.refreshQueues(ResourceManagerAdministrationProtocolPBServiceImpl.java:114)
> at 
> org.apache.hadoop.yarn.proto.ResourceManagerAdministrationProtocol$ResourceManagerAdministrationProtocolService$2.callBlockingMethod(ResourceManagerAdministrationProtocol.java:271)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)
> Caused by: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Default 
> lifetime604800 can't exceed maximum lifetime -1
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.setupQueueConfigs(LeafQueue.java:268)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.<init>(LeafQueue.java:162)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.<init>(LeafQueue.java:141)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.parseQueue(CapacitySchedulerQueueManager.java:259)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.parseQueue(CapacitySchedulerQueueManager.java:283)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.reinitializeQueues(CapacitySchedulerQueueManager.java:171)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitializeQueues(CapacityScheduler.java:726)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:472)
> ... 12 more
> {code}
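
The trace above shows the default lifetime being compared against a maximum of -1 as though -1 were a real limit. A minimal sketch of the check the issue title asks for, assuming that a maximum less than or equal to zero means no maximum is configured. The class and method names are hypothetical; this is not the LeafQueue code and not the attached patch.

```java
// Hypothetical sketch; not the actual LeafQueue code or the YARN-9790 patch.
final class LifetimeCheck {
    /**
     * Returns normally when the pair of settings is acceptable.
     * Assumption: maxLifetime <= 0 means "no maximum configured".
     */
    static void validate(long maxLifetime, long defaultLifetime) {
        if (maxLifetime > 0 && defaultLifetime > maxLifetime) {
            throw new IllegalArgumentException(
                "Default lifetime " + defaultLifetime
                    + " can't exceed maximum lifetime " + maxLifetime);
        }
    }

    public static void main(String[] args) {
        validate(-1, 604800);       // accepted: no maximum is enforced
        validate(604800, 3600);     // accepted: default is below the maximum
        try {
            validate(3600, 604800); // rejected: default exceeds the maximum
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With the values from the configuration above (maximum -1, default 604800) this check passes, which is the behaviour the report expects from refreshQueues.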






[jira] [Commented] (YARN-8678) Queue Management API - rephrase error messages

2019-08-31 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920276#comment-16920276
 ] 

Abhishek Modi commented on YARN-8678:
-

Thanks [~Prabhu Joseph] for working on it. LGTM. Will commit it.

> Queue Management API - rephrase error messages
> --
>
> Key: YARN-8678
> URL: https://issues.apache.org/jira/browse/YARN-8678
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Akhil PB
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-8678-002.patch, YARN-8768-001.patch
>
>
> 1. When stopping a running parent queue, the error message thrown by the API 
> was not meaningful.
> For example: when trying to stop the root queue, the error message thrown was  
> {{Failed to re-init queues : The parent queue:root state is STOPPED, child 
> queue:default state cannot be RUNNING.}}  
> It is evident that the root queue update failed, but the message says 
> {{queue:root state is STOPPED}}.
> 2. When trying to delete a running leaf queue, the error message thrown by 
> the API was not meaningful.
> For example: the error message was {{Failed to re-init queues : root.default.prod 
> is deleted from the new capacity scheduler configuration, but the queue is 
> not yet in stopped state. Current State : RUNNING}}.
> Clearly the deletion of queue root.default.prod failed with an error, but the 
> message says {{queues : root.default.prod is deleted from the new capacity 
> scheduler configuration}}.






[jira] [Commented] (YARN-9791) Queue Mutation API does not allow to remove a config

2019-08-31 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920320#comment-16920320
 ] 

Abhishek Modi commented on YARN-9791:
-

Thanks [~Prabhu Joseph] for the patch.

Some minor comments:
 # Can we replace kv.getValue with keyValue everywhere?
 # Can we update the test to check that other configs are unchanged after 
applying the mutation?

> Queue Mutation API does not allow to remove a config
> 
>
> Key: YARN-9791
> URL: https://issues.apache.org/jira/browse/YARN-9791
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9791-001.patch
>
>
> Queue Mutation API does not allow removing a config. When removing a node 
> label from a queue and its capacity config
> {code}
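> <sched-conf>
>   <update-queue>
>     <queue-name>root.batch</queue-name>
>     <params>
>       <entry>
>         <key>accessible-node-labels</key>
>         <value></value>
>       </entry>
>       <entry>
>         <key>accessible-node-labels.x.capacity</key>
>         <value></value>
>       </entry>
>     </params>
>   </update-queue>
> </sched-conf>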
> {code}
> It fails with the error below.
> {code}
> Caused by: java.lang.NumberFormatException: empty String
>   at 
> sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1842)
>   at sun.misc.FloatingDecimal.parseFloat(FloatingDecimal.java:122)
>   at java.lang.Float.parseFloat(Float.java:451)
>   at 
> org.apache.hadoop.conf.Configuration.getFloat(Configuration.java:1632)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration.internalGetLabeledQueueCapacity(CapacitySchedulerConfiguration.java:682)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration.getLabeledQueueCapacity(CapacitySchedulerConfiguration.java:697)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils.loadCapacitiesByLabelsFromConf(CSQueueUtils.java:136)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils.loadUpdateAndCheckCapacities(CSQueueUtils.java:111)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.setupConfigurableCapacities(AbstractCSQueue.java:185)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.setupQueueConfigs(AbstractCSQueue.java:362)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.setupQueueConfigs(LeafQueue.java:172)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.<init>(LeafQueue.java:157)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.<init>(LeafQueue.java:139)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.parseQueue(CapacitySchedulerQueueManager.java:259)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.parseQueue(CapacitySchedulerQueueManager.java:283)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.reinitializeQueues(CapacitySchedulerQueueManager.java:171)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitializeQueues(CapacityScheduler.java:785)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:497)
>   ... 72 more
> {code}  
> We can fix this by providing a separate XmlElement to remove config.
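
The trace shows Configuration.getFloat handing an empty string to Float.parseFloat. A self-contained sketch of why an empty value behaves differently from a removed key, using a plain java.util.Map as a stand-in for the scheduler configuration. The getFloat helper below is hypothetical and only mimics a lookup with a default; it is not Hadoop's Configuration class.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for a configuration lookup; not Hadoop's Configuration class.
final class EmptyValueDemo {
    // Hypothetical helper: parse the stored value, or fall back when the key is absent.
    static float getFloat(Map<String, String> conf, String key, float defaultValue) {
        String raw = conf.get(key);
        if (raw == null) {
            return defaultValue;
        }
        return Float.parseFloat(raw); // throws NumberFormatException for ""
    }

    public static void main(String[] args) {
        String key = "accessible-node-labels.x.capacity";
        Map<String, String> conf = new HashMap<>();

        conf.put(key, ""); // "removal" expressed as an empty value
        try {
            getFloat(conf, key, 0f);
        } catch (NumberFormatException e) {
            System.out.println("empty value fails: " + e.getMessage());
        }

        conf.remove(key); // actual removal of the key
        System.out.println("removed key falls back to " + getFloat(conf, key, 0f));
    }
}
```

This is why a mutation request needs a way to say "remove this key" that is distinct from "set this key to an empty value".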






[jira] [Commented] (YARN-9793) Remove duplicate sentence from TimelineServiceV2.md

2019-09-01 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920323#comment-16920323
 ] 

Abhishek Modi commented on YARN-9793:
-

Thanks [~kmarton] for the patch and [~adam.antal] for review. LGTM. Will commit 
shortly.

> Remove duplicate sentence from TimelineServiceV2.md
> ---
>
> Key: YARN-9793
> URL: https://issues.apache.org/jira/browse/YARN-9793
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: docs
>Reporter: Julia Kinga Marton
>Assignee: Julia Kinga Marton
>Priority: Major
> Attachments: YARN-9793.001.patch
>
>
> In the ATSv2 documentation, the description of TimelineEntity objects 
> contains a duplicated sentence: 
>  * configs: A map from a string (config name) to a string (config value) 
> representing all configs associated with the entity. Users can post the whole 
> config or a part of it in the configs field. *Supported for application and 
> generic entities. Supported for application and generic entities.*
>  






[jira] [Updated] (YARN-9793) Remove duplicate sentence from TimelineServiceV2.md

2019-09-01 Thread Abhishek Modi (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9793:

Component/s: ATSv2

> Remove duplicate sentence from TimelineServiceV2.md
> ---
>
> Key: YARN-9793
> URL: https://issues.apache.org/jira/browse/YARN-9793
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: ATSv2, docs
>Reporter: Julia Kinga Marton
>Assignee: Julia Kinga Marton
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9793.001.patch
>
>
> In the ATSv2 documentation, the description of TimelineEntity objects 
> contains a duplicated sentence: 
>  * configs: A map from a string (config name) to a string (config value) 
> representing all configs associated with the entity. Users can post the whole 
> config or a part of it in the configs field. *Supported for application and 
> generic entities. Supported for application and generic entities.*
>  






[jira] [Commented] (YARN-9106) Add option to graceful decommission to not wait for applications

2019-09-01 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920328#comment-16920328
 ] 

Abhishek Modi commented on YARN-9106:
-

Gracefully decommissioning nodes without waiting for applications to finish 
also makes sense when shuffle data is offloaded to persistent storage or 
shuffle service is running completely outside nodes.

While running Yarn on cloud, it is very common to offload shuffle data to 
persistent volumes and remove nodes. cc [~elgoiri]

> Add option to graceful decommission to not wait for applications
> 
>
> Key: YARN-9106
> URL: https://issues.apache.org/jira/browse/YARN-9106
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Mikayla Konst
>Assignee: Mikayla Konst
>Priority: Major
> Attachments: YARN-9106.patch
>
>
> Add property 
> yarn.resourcemanager.decommissioning-nodes-watcher.wait-for-applications.
> If true (the default), the resource manager waits for all containers, as well 
> as all applications associated with those containers, to finish before 
> gracefully decommissioning a node.
> If false, the resource manager only waits for containers, but not 
> applications, to finish. For map-only jobs or other jobs in which mappers do 
> not need to serve shuffle data, this allows nodes to be decommissioned as 
> soon as their containers are finished as opposed to when the job is done.
> Add property 
> yarn.resourcemanager.decommissioning-nodes-watcher.wait-for-app-masters.
> If false, during graceful decommission, when the resource manager waits for 
> all containers on a node to finish, it will not wait for app master 
> containers to finish. Defaults to true. This property should only be set to 
> false if app master failure is recoverable.
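
How the two proposed properties combine can be sketched as a single predicate. This is only an illustration of the description above: the class, method and parameter names are hypothetical, and the assumption that app master containers are counted separately from other containers is mine, not taken from the patch.

```java
// Hypothetical sketch of the decision described above; not the DecommissioningNodesWatcher code.
final class DecommissionCheck {
    /**
     * @param runningTaskContainers non-AM containers still running on the node
     * @param runningAmContainers   app master containers still running on the node
     * @param runningApps           unfinished applications that ran containers on the node
     * @param waitForApplications   value of ...wait-for-applications (default true)
     * @param waitForAppMasters     value of ...wait-for-app-masters (default true)
     */
    static boolean readyToDecommission(int runningTaskContainers,
                                       int runningAmContainers,
                                       int runningApps,
                                       boolean waitForApplications,
                                       boolean waitForAppMasters) {
        if (runningTaskContainers > 0) {
            return false; // ordinary containers are always waited for
        }
        if (waitForAppMasters && runningAmContainers > 0) {
            return false; // AM containers are waited for unless disabled
        }
        if (waitForApplications && runningApps > 0) {
            return false; // applications are waited for unless disabled
        }
        return true;
    }

    public static void main(String[] args) {
        // Containers finished, job still running: depends on wait-for-applications.
        System.out.println(readyToDecommission(0, 0, 1, true, true));
        System.out.println(readyToDecommission(0, 0, 1, false, true));
    }
}
```

Under these assumptions, a node that hosted only finished map-only containers becomes eligible as soon as wait-for-applications is false, which matches the map-only case in the description.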






[jira] [Commented] (YARN-7982) Do ACLs check while retrieving entity-types per application

2019-09-01 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-7982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920337#comment-16920337
 ] 

Abhishek Modi commented on YARN-7982:
-

Thanks [~Prabhu Joseph] for clarification.

Some minor comments:
 # In FileSystemTimelineReaderImpl, we should set userId only if it is null.
 # Would it be possible to write a unit test covering the case where the user 
id is not specified?

 

> Do ACLs check while retrieving entity-types per application
> ---
>
> Key: YARN-7982
> URL: https://issues.apache.org/jira/browse/YARN-7982
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Rohith Sharma K S
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-7982-001.patch, YARN-7982-002.patch, 
> YARN-7982-003.patch
>
>
> The REST endpoint {{/apps/$appid/entity-types}} retrieves all the entity-types 
> for a given application. This needs to be guarded with an ACL check.
> {code}
> [yarn@yarn-ats-3 ~]$ curl 
> "http://yarn-ats-3:8198/ws/v2/timeline/apps/application_1552297011473_0002?user.name=ambari-qa1"
> {"exception":"ForbiddenException","message":"java.lang.Exception: User 
> ambari-qa1 is not allowed to read TimelineService V2 
> data.","javaClassName":"org.apache.hadoop.yarn.webapp.ForbiddenException"}
> [yarn@yarn-ats-3 ~]$ curl 
> "http://yarn-ats-3:8198/ws/v2/timeline/apps/application_1552297011473_0002/entity-types?user.name=ambari-qa1"
> ["YARN_APPLICATION_ATTEMPT","YARN_CONTAINER"]
> {code}
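
The two curl calls above differ only in whether the access check runs before the data is returned. A minimal sketch of routing both lookups through one shared check, assuming a simple allow-list of readers. Every name here (TimelineAclSketch, checkAccess, getApp, getEntityTypes) is hypothetical and the real reader web service is structured differently.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; not the TimelineReaderWebServices implementation.
final class TimelineAclSketch {
    private final Set<String> allowedReaders;

    TimelineAclSketch(Set<String> allowedReaders) {
        this.allowedReaders = allowedReaders;
    }

    // Single place where read access is decided.
    private void checkAccess(String user) {
        if (!allowedReaders.contains(user)) {
            throw new SecurityException(
                "User " + user + " is not allowed to read TimelineService V2 data.");
        }
    }

    String getApp(String user, String appId) {
        checkAccess(user);
        return appId;
    }

    // The entity-types lookup goes through the same check as the app lookup.
    List<String> getEntityTypes(String user, String appId) {
        checkAccess(user);
        return Arrays.asList("YARN_APPLICATION_ATTEMPT", "YARN_CONTAINER");
    }

    public static void main(String[] args) {
        TimelineAclSketch reader = new TimelineAclSketch(Collections.singleton("yarn"));
        System.out.println(reader.getEntityTypes("yarn", "application_1552297011473_0002"));
        try {
            reader.getEntityTypes("ambari-qa1", "application_1552297011473_0002");
        } catch (SecurityException e) {
            System.out.println(e.getMessage());
        }
    }
}
```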






[jira] [Comment Edited] (YARN-7982) Do ACLs check while retrieving entity-types per application

2019-09-01 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-7982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920337#comment-16920337
 ] 

Abhishek Modi edited comment on YARN-7982 at 9/1/19 8:07 AM:
-

Thanks [~Prabhu Joseph] for clarification.

Some minor comments:
 # In FileSystemTimelineReaderImpl, we should set userId only if it is passed 
as null.
 # Would it be possible to write a unit test covering the case where the user 
id is not specified?

 


was (Author: abmodi):
Thanks [~Prabhu Joseph] for clarification.

Some minor comments:
 # In FileSystemTimelineReaderImpl, we should set userId only if it is null.
 # Would it be possible to write unit test covering case where user id is not 
specified.

 

> Do ACLs check while retrieving entity-types per application
> ---
>
> Key: YARN-7982
> URL: https://issues.apache.org/jira/browse/YARN-7982
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Rohith Sharma K S
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-7982-001.patch, YARN-7982-002.patch, 
> YARN-7982-003.patch
>
>
> The REST endpoint {{/apps/$appid/entity-types}} retrieves all the entity-types 
> for a given application. This needs to be guarded with an ACL check.
> {code}
> [yarn@yarn-ats-3 ~]$ curl 
> "http://yarn-ats-3:8198/ws/v2/timeline/apps/application_1552297011473_0002?user.name=ambari-qa1"
> {"exception":"ForbiddenException","message":"java.lang.Exception: User 
> ambari-qa1 is not allowed to read TimelineService V2 
> data.","javaClassName":"org.apache.hadoop.yarn.webapp.ForbiddenException"}
> [yarn@yarn-ats-3 ~]$ curl 
> "http://yarn-ats-3:8198/ws/v2/timeline/apps/application_1552297011473_0002/entity-types?user.name=ambari-qa1"
> ["YARN_APPLICATION_ATTEMPT","YARN_CONTAINER"]
> {code}






[jira] [Commented] (YARN-9791) Queue Mutation API does not allow to remove a config

2019-09-01 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920462#comment-16920462
 ] 

Abhishek Modi commented on YARN-9791:
-

Thanks [~Prabhu Joseph]. Latest patch looks good to me. Committed to trunk.

> Queue Mutation API does not allow to remove a config
> 
>
> Key: YARN-9791
> URL: https://issues.apache.org/jira/browse/YARN-9791
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9791-001.patch, YARN-9791-002.patch
>
>
> Queue Mutation API does not allow removing a config. When removing a node 
> label from a queue and its capacity config
> {code}
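> <sched-conf>
>   <update-queue>
>     <queue-name>root.batch</queue-name>
>     <params>
>       <entry>
>         <key>accessible-node-labels</key>
>         <value></value>
>       </entry>
>       <entry>
>         <key>accessible-node-labels.x.capacity</key>
>         <value></value>
>       </entry>
>     </params>
>   </update-queue>
> </sched-conf>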
> {code}
> It fails with the error below.
> {code}
> Caused by: java.lang.NumberFormatException: empty String
>   at 
> sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1842)
>   at sun.misc.FloatingDecimal.parseFloat(FloatingDecimal.java:122)
>   at java.lang.Float.parseFloat(Float.java:451)
>   at 
> org.apache.hadoop.conf.Configuration.getFloat(Configuration.java:1632)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration.internalGetLabeledQueueCapacity(CapacitySchedulerConfiguration.java:682)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration.getLabeledQueueCapacity(CapacitySchedulerConfiguration.java:697)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils.loadCapacitiesByLabelsFromConf(CSQueueUtils.java:136)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils.loadUpdateAndCheckCapacities(CSQueueUtils.java:111)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.setupConfigurableCapacities(AbstractCSQueue.java:185)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.setupQueueConfigs(AbstractCSQueue.java:362)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.setupQueueConfigs(LeafQueue.java:172)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.<init>(LeafQueue.java:157)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.<init>(LeafQueue.java:139)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.parseQueue(CapacitySchedulerQueueManager.java:259)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.parseQueue(CapacitySchedulerQueueManager.java:283)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.reinitializeQueues(CapacitySchedulerQueueManager.java:171)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitializeQueues(CapacityScheduler.java:785)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:497)
>   ... 72 more
> {code}  
> We can fix this by providing a separate XmlElement to remove config.






[jira] [Commented] (YARN-7982) Do ACLs check while retrieving entity-types per application

2019-09-01 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-7982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920610#comment-16920610
 ] 

Abhishek Modi commented on YARN-7982:
-

Thanks [~Prabhu Joseph]. v4 patch looks good to me. Will commit it shortly.

> Do ACLs check while retrieving entity-types per application
> ---
>
> Key: YARN-7982
> URL: https://issues.apache.org/jira/browse/YARN-7982
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Rohith Sharma K S
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-7982-001.patch, YARN-7982-002.patch, 
> YARN-7982-003.patch, YARN-7982-004.patch
>
>
> The REST endpoint {{/apps/$appid/entity-types}} retrieves all the entity-types 
> for a given application. This needs to be guarded with an ACL check.
> {code}
> [yarn@yarn-ats-3 ~]$ curl 
> "http://yarn-ats-3:8198/ws/v2/timeline/apps/application_1552297011473_0002?user.name=ambari-qa1"
> {"exception":"ForbiddenException","message":"java.lang.Exception: User 
> ambari-qa1 is not allowed to read TimelineService V2 
> data.","javaClassName":"org.apache.hadoop.yarn.webapp.ForbiddenException"}
> [yarn@yarn-ats-3 ~]$ curl 
> "http://yarn-ats-3:8198/ws/v2/timeline/apps/application_1552297011473_0002/entity-types?user.name=ambari-qa1"
> ["YARN_APPLICATION_ATTEMPT","YARN_CONTAINER"]
> {code}






[jira] [Updated] (YARN-8174) Add containerId to ResourceLocalizationService fetch failure log statement

2019-09-01 Thread Abhishek Modi (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-8174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-8174:

Summary: Add containerId to ResourceLocalizationService fetch failure log 
statement  (was: Add containerId to ResourceLocalizationService fetch failure)

> Add containerId to ResourceLocalizationService fetch failure log statement
> --
>
> Key: YARN-8174
> URL: https://issues.apache.org/jira/browse/YARN-8174
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.3
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Minor
> Attachments: YARN-8174.1.patch, YARN-8174.2.patch, YARN-8174.3.patch
>
>
> When localization of a resource fails due to a change in timestamp, no 
> containerId is logged to correlate the failure with a container.
> {code}
> 2018-04-18 07:31:46,033 WARN  localizer.ResourceLocalizationService 
> (ResourceLocalizationService.java:processHeartbeat(1017)) - { 
> hdfs://tarunhdp-1.openstacklocal:8020/user/ambari-qa/.staging/job_1523550428406_0016/job.splitmetainfo,
>  1524036694502, FILE, null } failed: Resource 
> hdfs://tarunhdp-1.openstacklocal:8020/user/ambari-qa/.staging/job_1523550428406_0016/job.splitmetainfo
>  changed on src filesystem (expected 1524036694502, was 1524036694502
> java.io.IOException: Resource 
> hdfs://tarunhdp-1.openstacklocal:8020/user/ambari-qa/.staging/job_1523550428406_0016/job.splitmetainfo
>  changed on src filesystem (expected 1524036694502, was 1524036694502
> at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:258)
> at 
> org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
> at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:362)
> at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:360)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
> at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:360)
> at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {code}






[jira] [Commented] (YARN-8174) Add containerId to ResourceLocalizationService fetch failure log statement

2019-09-01 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920619#comment-16920619
 ] 

Abhishek Modi commented on YARN-8174:
-

v3 patch lgtm. Committed to trunk.

> Add containerId to ResourceLocalizationService fetch failure log statement
> --
>
> Key: YARN-8174
> URL: https://issues.apache.org/jira/browse/YARN-8174
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.3
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Minor
> Attachments: YARN-8174.1.patch, YARN-8174.2.patch, YARN-8174.3.patch
>
>
> When localization of a resource fails due to a change in timestamp, no 
> containerId is logged to correlate the failure with a container.
> {code}
> 2018-04-18 07:31:46,033 WARN  localizer.ResourceLocalizationService 
> (ResourceLocalizationService.java:processHeartbeat(1017)) - { 
> hdfs://tarunhdp-1.openstacklocal:8020/user/ambari-qa/.staging/job_1523550428406_0016/job.splitmetainfo,
>  1524036694502, FILE, null } failed: Resource 
> hdfs://tarunhdp-1.openstacklocal:8020/user/ambari-qa/.staging/job_1523550428406_0016/job.splitmetainfo
>  changed on src filesystem (expected 1524036694502, was 1524036694502
> java.io.IOException: Resource 
> hdfs://tarunhdp-1.openstacklocal:8020/user/ambari-qa/.staging/job_1523550428406_0016/job.splitmetainfo
>  changed on src filesystem (expected 1524036694502, was 1524036694502
> at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:258)
> at 
> org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
> at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:362)
> at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:360)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
> at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:360)
> at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {code}






[jira] [Commented] (YARN-9400) Remove unnecessary if at EntityGroupFSTimelineStore#parseApplicationId

2019-09-01 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920624#comment-16920624
 ] 

Abhishek Modi commented on YARN-9400:
-

Thanks [~Prabhu Joseph]. lgtm. will commit to trunk.

> Remove unnecessary if at EntityGroupFSTimelineStore#parseApplicationId
> --
>
> Key: YARN-9400
> URL: https://issues.apache.org/jira/browse/YARN-9400
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.2.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Minor
> Attachments: YARN-9400-001.patch
>
>
> The if clause that validates whether appIdStr starts with "application" is 
> not required in EntityGroupFSTimelineStore#parseApplicationId
> {code}
>   // converts the String to an ApplicationId or null if conversion failed
>   private static ApplicationId parseApplicationId(String appIdStr) {
>     ApplicationId appId = null;
>     if (appIdStr.startsWith(ApplicationId.appIdStrPrefix)) {
>       try {
>         appId = ApplicationId.fromString(appIdStr);
>       } catch (IllegalArgumentException e) {
>         appId = null;
>       }
>     }
>     return appId;
>   }
> {code}
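
The guard is redundant if the parser itself rejects strings without the expected prefix by throwing IllegalArgumentException, since the existing catch block already turns that into null. A self-contained sketch of the simplified shape under that assumption. AppIdParseSketch.fromString is a hypothetical stand-in that returns the two numeric parts; it is not the real ApplicationId.fromString.

```java
// Hypothetical stand-in for the real classes; illustrates the simplification only.
final class AppIdParseSketch {
    static final String PREFIX = "application_";

    // Stand-in parser that validates the prefix itself (assumed behaviour).
    static long[] fromString(String s) {
        if (!s.startsWith(PREFIX)) {
            throw new IllegalArgumentException("Invalid ApplicationId prefix: " + s);
        }
        String[] parts = s.substring(PREFIX.length()).split("_");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Invalid ApplicationId: " + s);
        }
        // NumberFormatException is a subclass of IllegalArgumentException.
        return new long[] {Long.parseLong(parts[0]), Long.parseLong(parts[1])};
    }

    // Simplified caller: no separate startsWith check is needed.
    static long[] parseApplicationId(String s) {
        try {
            return fromString(s);
        } catch (IllegalArgumentException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseApplicationId("application_1552297011473_0002") != null);
        System.out.println(parseApplicationId("job_1523550428406_0016") == null);
    }
}
```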






[jira] [Commented] (YARN-9804) Update ATSv2 document for latest feature supports

2019-09-01 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920634#comment-16920634
 ] 

Abhishek Modi commented on YARN-9804:
-

Thanks [~rohithsharma] for working on it. Some minor comments:



Road map include -> Road map includes

Simple authorization in terms of a configurable whitelist of users and groups 
who can read timeline data -> Support for simple authorization has been added 
in terms of a configurable whitelist of users and groups who can read timeline 
data.

YARN Client integrates with ATSv2. -> YARN Client has been integrated with 
ATSv2.

This enables fetching application/attempt/container
report from TimelineReader if details not present in ResouceManager. -> This 
enables fetching application/attempt/container
report from TimelineReader if details are not present in ResouceManager.

It set true -> If set true

 

Since YARN CLI support has been added, should we remove this line: "Currently 
there is no support for command line access."?

> Update ATSv2 document for latest feature supports
> -
>
> Key: YARN-9804
> URL: https://issues.apache.org/jira/browse/YARN-9804
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>Priority: Blocker
> Attachments: YARN-9804.01.patch
>
>
> Revisit ATSv2 documents and update for GA features. And also for the road map.






[jira] [Resolved] (YARN-8139) Skip node hostname resolution when running SLS.

2019-09-02 Thread Abhishek Modi (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-8139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi resolved YARN-8139.
-
Resolution: Duplicate

> Skip node hostname resolution when running SLS.
> ---
>
> Key: YARN-8139
> URL: https://issues.apache.org/jira/browse/YARN-8139
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
>
> Currently, depending on the time taken to resolve hostnames, the metrics of 
> SLS get skewed. To avoid this, this fix introduces a flag which can be used 
> to disable hostname resolution.






[jira] [Commented] (YARN-9804) Update ATSv2 document for latest feature supports

2019-09-02 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16921170#comment-16921170
 ] 

Abhishek Modi commented on YARN-9804:
-

Thanks [~rohithsharma]. New patch looks good to me. +1 from my end.

> Update ATSv2 document for latest feature supports
> -
>
> Key: YARN-9804
> URL: https://issues.apache.org/jira/browse/YARN-9804
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>Priority: Blocker
> Attachments: YARN-9804.01.patch, YARN-9804.02.patch
>
>
> Revisit ATSv2 documents and update for GA features. And also for the road map.






[jira] [Assigned] (YARN-9812) mvn javadoc:javadoc fails in hadoop-sls

2019-09-04 Thread Abhishek Modi (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi reassigned YARN-9812:
---

Assignee: Abhishek Modi

> mvn javadoc:javadoc fails in hadoop-sls
> ---
>
> Key: YARN-9812
> URL: https://issues.apache.org/jira/browse/YARN-9812
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Reporter: Akira Ajisaka
>Assignee: Abhishek Modi
>Priority: Major
>  Labels: newbie
>
> {noformat}
> [ERROR] 
> hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:57:
>  error: bad use of '>'
> [ERROR]  * pending -> requests which are NOT yet sent to RM.
> [ERROR] ^
> [ERROR] 
> hadoop-mirror/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:58:
>  error: bad use of '>'
> [ERROR]  * scheduled -> requests which are sent to RM but not yet assigned.
> [ERROR]   ^
> [ERROR] 
> hadoop-mirror/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:59:
>  error: bad use of '>'
> [ERROR]  * assigned -> requests which are assigned to a container.
> [ERROR]  ^
> [ERROR] 
> hadoop-mirror/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:60:
>  error: bad use of '>'
> [ERROR]  * completed -> request corresponding to which container has 
> completed.
> [ERROR]   ^
> {noformat}
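
The errors above come from a raw '>' inside a Javadoc comment. One common way to keep the arrows and satisfy the Javadoc tool is to wrap them in {@literal}; writing -&gt; is an alternative. The sketch below uses a hypothetical enum, not the actual DAGAMSimulator source, and is not necessarily how the attached patch resolves it.

```java
/**
 * Hypothetical holder for the request states named in the error output.
 * Each arrow is wrapped in the literal tag so that the Javadoc tool does
 * not read the greater-than sign as malformed HTML.
 * <ul>
 *   <li>pending {@literal ->} requests which are NOT yet sent to RM.</li>
 *   <li>scheduled {@literal ->} requests which are sent to RM but not yet
 *       assigned.</li>
 *   <li>assigned {@literal ->} requests which are assigned to a container.</li>
 *   <li>completed {@literal ->} request corresponding to which container has
 *       completed.</li>
 * </ul>
 */
enum RequestState {
    PENDING, SCHEDULED, ASSIGNED, COMPLETED
}
```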






[jira] [Updated] (YARN-9697) Efficient allocation of Opportunistic containers.

2019-09-06 Thread Abhishek Modi (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9697:

Attachment: YARN-9697.wip1.patch

> Efficient allocation of Opportunistic containers.
> -
>
> Key: YARN-9697
> URL: https://issues.apache.org/jira/browse/YARN-9697
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9697.ut.patch, YARN-9697.ut2.patch, 
> YARN-9697.wip1.patch
>
>
> In the current implementation, opportunistic containers are allocated based 
> on the number of queued opportunistic containers reported in the node 
> heartbeat. This information becomes stale as soon as more opportunistic 
> containers are allocated on that node.
> Allocation of opportunistic containers happens on the same heartbeat in which 
> the AM asks for the containers. When multiple applications request 
> Opportunistic containers, containers might get allocated on the same set of 
> nodes, as containers already allocated on a node are not considered while 
> serving requests from different applications. This can lead to uneven 
> allocation of Opportunistic containers across the cluster, leading to 
> increased queuing time.
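
To make the staleness problem in the description above concrete, here is a toy sketch of an allocator that bumps its own per-node count after every placement instead of relying only on the last heartbeat value. All names (LeastLoadedAllocator, onHeartbeat, allocate) are hypothetical, and this is not the approach taken in the attached patches, which are not shown here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class LeastLoadedAllocator {
    // Per node: queue length from the last heartbeat plus containers
    // this allocator has placed on the node since then.
    private final Map<String, Integer> load = new HashMap<>();

    // A heartbeat replaces the estimate with the node's reported value.
    void onHeartbeat(String node, int queuedContainers) {
        load.put(node, queuedContainers);
    }

    // Places `count` containers one at a time on the least-loaded node,
    // updating the estimate after each placement so that later requests
    // (possibly from other applications) see the new load.
    List<String> allocate(int count) {
        List<String> placements = new ArrayList<>();
        for (int i = 0; i < count && !load.isEmpty(); i++) {
            String best = null;
            for (Map.Entry<String, Integer> e : load.entrySet()) {
                if (best == null) {
                    best = e.getKey();
                    continue;
                }
                int cmp = Integer.compare(e.getValue(), load.get(best));
                // Ties are broken by node name to keep the result deterministic.
                if (cmp < 0 || (cmp == 0 && e.getKey().compareTo(best) < 0)) {
                    best = e.getKey();
                }
            }
            load.merge(best, 1, Integer::sum);
            placements.add(best);
        }
        return placements;
    }
}
```

With two idle nodes, two back-to-back requests for two containers each end up spread evenly, whereas an allocator that only consulted the heartbeat value would keep choosing the same node until the next heartbeat arrived.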






[jira] [Updated] (YARN-9782) Avoid DNS resolution while running SLS.

2019-09-06 Thread Abhishek Modi (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9782:

Attachment: YARN-9782.002.patch

> Avoid DNS resolution while running SLS.
> ---
>
> Key: YARN-9782
> URL: https://issues.apache.org/jira/browse/YARN-9782
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9782.001.patch, YARN-9782.002.patch
>
>
> In SLS, we add nodes with random names and racks. DNS resolution of these 
> nodes takes around 2 seconds because it times out after that. This makes 
> the results of SLS unreliable and adds spikes. 
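
As a rough illustration of what skipping resolution behind a config flag can look like in plain Java, the sketch below builds an unresolved socket address when the flag is set. The class name, method name, and the skipDnsResolution parameter are hypothetical, and the attached patches may well take a different route (a later comment on this issue mentions Java security settings).

```java
import java.net.InetSocketAddress;

final class NodeAddressSketch {
    // When skipDnsResolution is true, no lookup is attempted, so a random
    // simulated hostname cannot stall on a DNS timeout.
    static InetSocketAddress nodeAddress(String host, int port,
                                         boolean skipDnsResolution) {
        if (skipDnsResolution) {
            return InetSocketAddress.createUnresolved(host, port);
        }
        // This constructor attempts to resolve the hostname.
        return new InetSocketAddress(host, port);
    }
}
```

The unresolved address still carries the host string and port, which is usually enough for a simulator that never opens a real connection to the node.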






[jira] [Commented] (YARN-9697) Efficient allocation of Opportunistic containers.

2019-09-06 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16924326#comment-16924326
 ] 

Abhishek Modi commented on YARN-9697:
-

[~elgoiri] could you please review the approach taken in the PoC patch? If it 
looks good to you, I can clean it up and add some more UTs. Thanks.

> Efficient allocation of Opportunistic containers.
> -
>
> Key: YARN-9697
> URL: https://issues.apache.org/jira/browse/YARN-9697
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9697.ut.patch, YARN-9697.ut2.patch, 
> YARN-9697.wip1.patch
>
>
> In the current implementation, opportunistic containers are allocated based 
> on the number of queued opportunistic containers reported in the node 
> heartbeat. This information becomes stale as soon as more opportunistic 
> containers are allocated on that node.
> Allocation of opportunistic containers happens on the same heartbeat in which 
> the AM asks for the containers. When multiple applications request 
> Opportunistic containers, containers might get allocated on the same set of 
> nodes, as containers already allocated on a node are not considered while 
> serving requests from different applications. This can lead to uneven 
> allocation of Opportunistic containers across the cluster, leading to 
> increased queuing time.






[jira] [Commented] (YARN-9782) Avoid DNS resolution while running SLS.

2019-09-06 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16924327#comment-16924327
 ] 

Abhishek Modi commented on YARN-9782:
-

Thanks [~elgoiri] for review. Updated patch.

> Avoid DNS resolution while running SLS.
> ---
>
> Key: YARN-9782
> URL: https://issues.apache.org/jira/browse/YARN-9782
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9782.001.patch, YARN-9782.002.patch
>
>
> In SLS, we add nodes with random names and racks. DNS resolution of these 
> nodes takes around 2 seconds because it times out after that. This makes 
> the results of SLS unreliable and adds spikes. 






[jira] [Commented] (YARN-9812) mvn javadoc:javadoc fails in hadoop-sls

2019-09-06 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16924361#comment-16924361
 ] 

Abhishek Modi commented on YARN-9812:
-

[~aajisaka] [~elgoiri] could you please review it? Thanks.

> mvn javadoc:javadoc fails in hadoop-sls
> ---
>
> Key: YARN-9812
> URL: https://issues.apache.org/jira/browse/YARN-9812
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Reporter: Akira Ajisaka
>Assignee: Abhishek Modi
>Priority: Major
>  Labels: newbie
> Attachments: YARN-9812.001.patch
>
>
> {noformat}
> [ERROR] 
> hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:57:
>  error: bad use of '>'
> [ERROR]  * pending -> requests which are NOT yet sent to RM.
> [ERROR] ^
> [ERROR] 
> hadoop-mirror/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:58:
>  error: bad use of '>'
> [ERROR]  * scheduled -> requests which are sent to RM but not yet assigned.
> [ERROR]   ^
> [ERROR] 
> hadoop-mirror/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:59:
>  error: bad use of '>'
> [ERROR]  * assigned -> requests which are assigned to a container.
> [ERROR]  ^
> [ERROR] 
> hadoop-mirror/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:60:
>  error: bad use of '>'
> [ERROR]  * completed -> request corresponding to which container has 
> completed.
> [ERROR]   ^
> {noformat}






[jira] [Commented] (YARN-9782) Avoid DNS resolution while running SLS.

2019-09-06 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16924455#comment-16924455
 ] 

Abhishek Modi commented on YARN-9782:
-

[~elgoiri] I found a potential issue with this unit test. Since it changes the 
Java security settings, those settings would remain in effect for all the 
following unit tests, as all of them run within the same Java process.

One way to avoid that is to run the unit tests for the SLS project in a separate 
Java process, but that will increase the runtime of the tests.

The second option is to skip the unit test for this change. Since it's a very 
small change behind a config, would it be possible to skip the unit test?

[~elgoiri] [~subru] could you please provide some suggestions here? Thanks.

> Avoid DNS resolution while running SLS.
> ---
>
> Key: YARN-9782
> URL: https://issues.apache.org/jira/browse/YARN-9782
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9782.001.patch, YARN-9782.002.patch
>
>
> In SLS, we add nodes with random names and racks. DNS resolution of these 
> nodes takes around 2 seconds because it times out after that. This makes 
> the results of SLS unreliable and adds spikes. 






[jira] [Updated] (YARN-9812) mvn javadoc:javadoc fails in hadoop-sls

2019-09-06 Thread Abhishek Modi (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9812:

Attachment: YARN-9812.002.patch

> mvn javadoc:javadoc fails in hadoop-sls
> ---
>
> Key: YARN-9812
> URL: https://issues.apache.org/jira/browse/YARN-9812
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Reporter: Akira Ajisaka
>Assignee: Abhishek Modi
>Priority: Major
>  Labels: newbie
> Attachments: YARN-9812.001.patch, YARN-9812.002.patch
>
>
> {noformat}
> [ERROR] 
> hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:57:
>  error: bad use of '>'
> [ERROR]  * pending -> requests which are NOT yet sent to RM.
> [ERROR] ^
> [ERROR] 
> hadoop-mirror/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:58:
>  error: bad use of '>'
> [ERROR]  * scheduled -> requests which are sent to RM but not yet assigned.
> [ERROR]   ^
> [ERROR] 
> hadoop-mirror/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:59:
>  error: bad use of '>'
> [ERROR]  * assigned -> requests which are assigned to a container.
> [ERROR]  ^
> [ERROR] 
> hadoop-mirror/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:60:
>  error: bad use of '>'
> [ERROR]  * completed -> request corresponding to which container has 
> completed.
> [ERROR]   ^
> {noformat}






[jira] [Commented] (YARN-7604) Fix some minor typos in the opportunistic container logging

2019-09-06 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-7604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16924747#comment-16924747
 ] 

Abhishek Modi commented on YARN-7604:
-

Thanks [~cheersyang] for the patch. Could you please move these log lines to 
use the new log4j format? Thanks.

> Fix some minor typos in the opportunistic container logging
> ---
>
> Key: YARN-7604
> URL: https://issues.apache.org/jira/browse/YARN-7604
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.9.0
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Trivial
> Attachments: YARN-7604.01.patch
>
>
> Fix some minor text issues. 






[jira] [Commented] (YARN-9812) mvn javadoc:javadoc fails in hadoop-sls

2019-09-07 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16924757#comment-16924757
 ] 

Abhishek Modi commented on YARN-9812:
-

Thanks [~elgoiri] for review. Committed to trunk.

> mvn javadoc:javadoc fails in hadoop-sls
> ---
>
> Key: YARN-9812
> URL: https://issues.apache.org/jira/browse/YARN-9812
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Reporter: Akira Ajisaka
>Assignee: Abhishek Modi
>Priority: Major
>  Labels: newbie
> Fix For: 3.3.0
>
> Attachments: YARN-9812.001.patch, YARN-9812.002.patch
>
>
> {noformat}
> [ERROR] 
> hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:57:
>  error: bad use of '>'
> [ERROR]  * pending -> requests which are NOT yet sent to RM.
> [ERROR] ^
> [ERROR] 
> hadoop-mirror/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:58:
>  error: bad use of '>'
> [ERROR]  * scheduled -> requests which are sent to RM but not yet assigned.
> [ERROR]   ^
> [ERROR] 
> hadoop-mirror/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:59:
>  error: bad use of '>'
> [ERROR]  * assigned -> requests which are assigned to a container.
> [ERROR]  ^
> [ERROR] 
> hadoop-mirror/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:60:
>  error: bad use of '>'
> [ERROR]  * completed -> request corresponding to which container has 
> completed.
> [ERROR]   ^
> {noformat}






[jira] [Created] (YARN-9819) Make TestOpportunisticContainerAllocatorAMService more resilient.

2019-09-07 Thread Abhishek Modi (Jira)
Abhishek Modi created YARN-9819:
---

 Summary: Make TestOpportunisticContainerAllocatorAMService more 
resilient.
 Key: YARN-9819
 URL: https://issues.apache.org/jira/browse/YARN-9819
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Abhishek Modi
Assignee: Abhishek Modi


Currently, TestOpportunisticContainerAllocatorAMService tries to set the 
Opportunistic container status directly in RMNode, but that status can be 
overwritten by an NM heartbeat. The correct way would be to send it through the 
NM heartbeat.






[jira] [Updated] (YARN-9819) Make TestOpportunisticContainerAllocatorAMService more resilient.

2019-09-07 Thread Abhishek Modi (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9819:

Attachment: YARN-9819.001.patch

> Make TestOpportunisticContainerAllocatorAMService more resilient.
> -
>
> Key: YARN-9819
> URL: https://issues.apache.org/jira/browse/YARN-9819
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9819.001.patch
>
>
> Currently, TestOpportunisticContainerAllocatorAMService tries to set the 
> Opportunistic container status directly in RMNode, but that status can be 
> overwritten by an NM heartbeat. The correct way would be to send it through 
> the NM heartbeat.






[jira] [Commented] (YARN-9784) org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestLeafQueue is flaky

2019-09-07 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16924852#comment-16924852
 ] 

Abhishek Modi commented on YARN-9784:
-

Thanks [~kmarton] for the patch. LGTM.

Thanks [~sunilg] and [~adam.antal] for additional reviews. Committed to trunk.

> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestLeafQueue
>  is flaky
> ---
>
> Key: YARN-9784
> URL: https://issues.apache.org/jira/browse/YARN-9784
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.3.0
>Reporter: Julia Kinga Marton
>Assignee: Julia Kinga Marton
>Priority: Major
> Attachments: YARN-9784.001.patch
>
>
> There are some test cases in TestLeafQueue which are failing intermittently.
> From 100 runs, there were 16 failures. 
> Some failure examples are the following ones:
> {code:java}
> 2019-08-26 13:18:13 [ERROR] Errors: 
> 2019-08-26 13:18:13 [ERROR]   TestLeafQueue.setUp:144->setUpInternal:221 
> WrongTypeOfReturnValue 
> 2019-08-26 13:18:13 YarnConfigu...
> 2019-08-26 13:18:13 [ERROR]   TestLeafQueue.setUp:144->setUpInternal:221 
> WrongTypeOfReturnValue 
> 2019-08-26 13:18:13 YarnConfigu...
> 2019-08-26 13:18:13 [INFO] 
> 2019-08-26 13:18:13 [ERROR] Tests run: 36, Failures: 0, Errors: 2, Skipped: 0
> {code}
> {code:java}
> 2019-08-26 13:18:09 [ERROR] Failures: 
> 2019-08-26 13:18:09 [ERROR]   TestLeafQueue.testHeadroomWithMaxCap:1373 
> expected:<2048> but was:<0>
> 2019-08-26 13:18:09 [INFO] 
> 2019-08-26 13:18:09 [ERROR] Tests run: 36, Failures: 1, Errors: 0, Skipped: 0
> {code}
> {code:java}
> 2019-08-26 13:18:18 [ERROR] Errors: 
> 2019-08-26 13:18:18 [ERROR]   TestLeafQueue.setUp:144->setUpInternal:221 
> WrongTypeOfReturnValue 
> 2019-08-26 13:18:18 YarnConfigu...
> 2019-08-26 13:18:18 [ERROR]   TestLeafQueue.testHeadroomWithMaxCap:1307 ? 
> ClassCast org.apache.hadoop.yarn.c...
> 2019-08-26 13:18:18 [INFO] 
> 2019-08-26 13:18:18 [ERROR] Tests run: 36, Failures: 0, Errors: 2, Skipped: 0
> {code}
> {code:java}
> 2019-08-26 13:18:10 [ERROR] Failures: 
> 2019-08-26 13:18:10 [ERROR]   TestLeafQueue.testDRFUserLimits:847 Verify 
> user_0 got resources 
> 2019-08-26 13:18:10 [INFO] 
> 2019-08-26 13:18:10 [ERROR] Tests run: 36, Failures: 1, Errors: 0, Skipped: 0
> {code}






[jira] [Updated] (YARN-9819) Make TestOpportunisticContainerAllocatorAMService more resilient.

2019-09-07 Thread Abhishek Modi (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9819:

Attachment: YARN-9819.002.patch

> Make TestOpportunisticContainerAllocatorAMService more resilient.
> -
>
> Key: YARN-9819
> URL: https://issues.apache.org/jira/browse/YARN-9819
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9819.001.patch, YARN-9819.002.patch
>
>
> Currently, TestOpportunisticContainerAllocatorAMService tries to set the 
> Opportunistic container status directly in RMNode, but that status can be 
> overwritten by an NM heartbeat. The correct way would be to send it through 
> the NM heartbeat.






[jira] [Commented] (YARN-9819) Make TestOpportunisticContainerAllocatorAMService more resilient.

2019-09-07 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925081#comment-16925081
 ] 

Abhishek Modi commented on YARN-9819:
-

[~elgoiri] could you please review it? The unit test failure is not related to 
the patch.

> Make TestOpportunisticContainerAllocatorAMService more resilient.
> -
>
> Key: YARN-9819
> URL: https://issues.apache.org/jira/browse/YARN-9819
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9819.001.patch, YARN-9819.002.patch
>
>
> Currently, TestOpportunisticContainerAllocatorAMService tries to set the 
> Opportunistic container status directly in RMNode, but that status can be 
> overwritten by an NM heartbeat. The correct way would be to send it through 
> the NM heartbeat.






[jira] [Commented] (YARN-9821) NM hangs at serviceStop when ATSV2 Backend Hbase is Down

2019-09-08 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925331#comment-16925331
 ] 

Abhishek Modi commented on YARN-9821:
-

Thanks [~Prabhu Joseph] for the patch. Some minor comments:
 # Can we rename isHbaseUp => isStorageUp to make it more generic?
 # Can we log the exception too?

Apart from these minor comments, it looks good to me.

> NM hangs at serviceStop when ATSV2 Backend Hbase is Down 
> -
>
> Key: YARN-9821
> URL: https://issues.apache.org/jira/browse/YARN-9821
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: ATSv2
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9821-001.patch
>
>
> NM hangs at serviceStop when ATSV2 Backend Hbase is Down.
> {code}
> "Thread-197" #302 prio=5 os_prio=0 tid=0x7f5f389ba000 nid=0x631d waiting 
> for monitor entry [0x7f5f1f29b000]
>java.lang.Thread.State: BLOCKED (on object monitor)
>   at 
> org.apache.hadoop.hbase.client.BufferedMutatorImpl.close(BufferedMutatorImpl.java:249)
>   - waiting to lock <0x0006c834d148> (a 
> org.apache.hadoop.hbase.client.BufferedMutatorImpl)
>   at 
> org.apache.hadoop.yarn.server.timelineservice.storage.common.TypedBufferedMutator.close(TypedBufferedMutator.java:62)
>   at 
> org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineWriterImpl.serviceStop(HBaseTimelineWriterImpl.java:636)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
>   - locked <0x0006c7c05808> (a java.lang.Object)
>   at 
> org.apache.hadoop.service.AbstractService.close(AbstractService.java:247)
>   at 
> org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollectorManager.serviceStop(TimelineCollectorManager.java:244)
>   at 
> org.apache.hadoop.yarn.server.timelineservice.collector.NodeTimelineCollectorManager.serviceStop(NodeTimelineCollectorManager.java:164)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
>   - locked <0x0006c7c05890> (a java.lang.Object)
>   at 
> org.apache.hadoop.yarn.server.timelineservice.collector.PerNodeTimelineCollectorsAuxService.serviceStop(PerNodeTimelineCollectorsAuxService.java:113)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
>   - locked <0x0006c7c058f8> (a java.lang.Object)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceStop(AuxServices.java:330)
>   - locked <0x0006c7c23400> (a java.util.Collections$SynchronizedMap)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
>   - locked <0x0006c7c059a8> (a java.lang.Object)
>   at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:54)
>   at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:102)
>   at 
> org.apache.hadoop.service.CompositeService.stop(CompositeService.java:158)
>   at 
> org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:132)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceStop(ContainerManagerImpl.java:720)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
>   - locked <0x0006c7c05a98> (a java.lang.Object)
>   at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:54)
>   at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:102)
>   at 
> org.apache.hadoop.service.CompositeService.stop(CompositeService.java:158)
>   at 
> org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:132)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStop(NodeManager.java:526)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
>   - locked <0x0006c7c05c88> (a java.lang.Object)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager$1.run(NodeManager.java:552)
>   
>   
> "qtp183259297-76" #76 daemon prio=5 os_prio=0 tid=0x7f5f567ed000 
> nid=0x5fb7 in Object.wait() [0x7f5f23ad7000]
>java.lang.Thread.State: TIMED_WAITING (on object monitor)
>   at java.lang.Object.wait(Native Method)
>   at java.lang.Object.wait(Object.java:460)
>   at java.util.concurrent.TimeUnit.timedWait(TimeUnit.java:348)
>   at 
> org.apache.hadoop.hbase.client.ResultBoundedCompletionService.pollForSpecificCompletedTask(ResultBoundedCompletionService.java:258)
>   - locked <0x000784ee8220> (a 
> [Lorg.apache.hadoop.hbase.client.Resul

[jira] [Commented] (YARN-9821) NM hangs at serviceStop when ATSV2 Backend Hbase is Down

2019-09-09 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925481#comment-16925481
 ] 

Abhishek Modi commented on YARN-9821:
-

Thanks [~Prabhu Joseph] for the patch and [~rohithsharma] for additional 
review. I have committed it to trunk.

[~rohithsharma] should we commit it to the 3.2 and 3.1 branches as well?

> NM hangs at serviceStop when ATSV2 Backend Hbase is Down 
> -
>
> Key: YARN-9821
> URL: https://issues.apache.org/jira/browse/YARN-9821
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: ATSv2
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9821-001.patch, YARN-9821-002.patch
>
>
> NM hangs at serviceStop when ATSV2 Backend Hbase is Down.
> {code}
> "Thread-197" #302 prio=5 os_prio=0 tid=0x7f5f389ba000 nid=0x631d waiting 
> for monitor entry [0x7f5f1f29b000]
>java.lang.Thread.State: BLOCKED (on object monitor)
>   at 
> org.apache.hadoop.hbase.client.BufferedMutatorImpl.close(BufferedMutatorImpl.java:249)
>   - waiting to lock <0x0006c834d148> (a 
> org.apache.hadoop.hbase.client.BufferedMutatorImpl)
>   at 
> org.apache.hadoop.yarn.server.timelineservice.storage.common.TypedBufferedMutator.close(TypedBufferedMutator.java:62)
>   at 
> org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineWriterImpl.serviceStop(HBaseTimelineWriterImpl.java:636)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
>   - locked <0x0006c7c05808> (a java.lang.Object)
>   at 
> org.apache.hadoop.service.AbstractService.close(AbstractService.java:247)
>   at 
> org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollectorManager.serviceStop(TimelineCollectorManager.java:244)
>   at 
> org.apache.hadoop.yarn.server.timelineservice.collector.NodeTimelineCollectorManager.serviceStop(NodeTimelineCollectorManager.java:164)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
>   - locked <0x0006c7c05890> (a java.lang.Object)
>   at 
> org.apache.hadoop.yarn.server.timelineservice.collector.PerNodeTimelineCollectorsAuxService.serviceStop(PerNodeTimelineCollectorsAuxService.java:113)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
>   - locked <0x0006c7c058f8> (a java.lang.Object)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceStop(AuxServices.java:330)
>   - locked <0x0006c7c23400> (a java.util.Collections$SynchronizedMap)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
>   - locked <0x0006c7c059a8> (a java.lang.Object)
>   at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:54)
>   at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:102)
>   at 
> org.apache.hadoop.service.CompositeService.stop(CompositeService.java:158)
>   at 
> org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:132)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceStop(ContainerManagerImpl.java:720)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
>   - locked <0x0006c7c05a98> (a java.lang.Object)
>   at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:54)
>   at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:102)
>   at 
> org.apache.hadoop.service.CompositeService.stop(CompositeService.java:158)
>   at 
> org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:132)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStop(NodeManager.java:526)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
>   - locked <0x0006c7c05c88> (a java.lang.Object)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager$1.run(NodeManager.java:552)
>   
>   
> "qtp183259297-76" #76 daemon prio=5 os_prio=0 tid=0x7f5f567ed000 
> nid=0x5fb7 in Object.wait() [0x7f5f23ad7000]
>java.lang.Thread.State: TIMED_WAITING (on object monitor)
>   at java.lang.Object.wait(Native Method)
>   at java.lang.Object.wait(Object.java:460)
>   at java.util.concurrent.TimeUnit.timedWait(TimeUnit.java:348)
>   at 
> org.apache.hadoop.hbase.client.ResultBoundedCompletionService.pollForSpecificCompletedTask(ResultBoundedCompletionService.java:258)
>   - locked <0x000784ee8220> (a 
> [Lorg.apache.hadoop.hbase.client.ResultBoundedCompletio

[jira] [Commented] (YARN-9816) EntityGroupFSTimelineStore#scanActiveLogs fails with StackOverflowError

2019-09-09 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925484#comment-16925484
 ] 

Abhishek Modi commented on YARN-9816:
-

Thanks [~Prabhu Joseph]. The changes look good to me. Will commit shortly.

> EntityGroupFSTimelineStore#scanActiveLogs fails with StackOverflowError
> ---
>
> Key: YARN-9816
> URL: https://issues.apache.org/jira/browse/YARN-9816
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 3.1.0, 3.2.0, 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9816-001.patch
>
>
> EntityGroupFSTimelineStore#scanActiveLogs fails with StackOverflowError.  
> This happens when a file is present under /ats/active.
> {code}
> [hdfs@node2 yarn]$ hadoop fs -ls /ats/active
> Found 1 items
> -rw-r--r--   3 hdfs hadoop  0 2019-09-06 16:34 
> /ats/active/.distcp.tmp.attempt_155759136_39768_m_01_0
> {code}
> Error Message:
> {code:java}
> java.lang.StackOverflowError
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:632)
> at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:291)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:203)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:185)
> at com.sun.proxy.$Proxy15.getListing(Unknown Source)
> at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:2143)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1076)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1088)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1059)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1038)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1034)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatusIterator(DistributedFileSystem.java:1046)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.list(EntityGroupFSTimelineStore.java:398)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:368)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
>  {code}
> One of our users tried to distcp the hdfs://ats/active dir. The Distcp job has 
> created the 
> temp file .distcp.

[jira] [Commented] (YARN-9821) NM hangs at serviceStop when ATSV2 Backend Hbase is Down

2019-09-09 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925490#comment-16925490
 ] 

Abhishek Modi commented on YARN-9821:
-

Sure [~rohithsharma]. I am leaving this Jira as unresolved and you can mark it 
as resolved after you backport it to 3.2 branches. Thanks.

> NM hangs at serviceStop when ATSV2 Backend Hbase is Down 
> -
>
> Key: YARN-9821
> URL: https://issues.apache.org/jira/browse/YARN-9821
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: ATSv2
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9821-001.patch, YARN-9821-002.patch
>
>
> NM hangs at serviceStop when ATSV2 Backend Hbase is Down.
> {code}
> "Thread-197" #302 prio=5 os_prio=0 tid=0x7f5f389ba000 nid=0x631d waiting 
> for monitor entry [0x7f5f1f29b000]
>java.lang.Thread.State: BLOCKED (on object monitor)
>   at 
> org.apache.hadoop.hbase.client.BufferedMutatorImpl.close(BufferedMutatorImpl.java:249)
>   - waiting to lock <0x0006c834d148> (a 
> org.apache.hadoop.hbase.client.BufferedMutatorImpl)
>   at 
> org.apache.hadoop.yarn.server.timelineservice.storage.common.TypedBufferedMutator.close(TypedBufferedMutator.java:62)
>   at 
> org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineWriterImpl.serviceStop(HBaseTimelineWriterImpl.java:636)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
>   - locked <0x0006c7c05808> (a java.lang.Object)
>   at 
> org.apache.hadoop.service.AbstractService.close(AbstractService.java:247)
>   at 
> org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollectorManager.serviceStop(TimelineCollectorManager.java:244)
>   at 
> org.apache.hadoop.yarn.server.timelineservice.collector.NodeTimelineCollectorManager.serviceStop(NodeTimelineCollectorManager.java:164)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
>   - locked <0x0006c7c05890> (a java.lang.Object)
>   at 
> org.apache.hadoop.yarn.server.timelineservice.collector.PerNodeTimelineCollectorsAuxService.serviceStop(PerNodeTimelineCollectorsAuxService.java:113)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
>   - locked <0x0006c7c058f8> (a java.lang.Object)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceStop(AuxServices.java:330)
>   - locked <0x0006c7c23400> (a java.util.Collections$SynchronizedMap)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
>   - locked <0x0006c7c059a8> (a java.lang.Object)
>   at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:54)
>   at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:102)
>   at 
> org.apache.hadoop.service.CompositeService.stop(CompositeService.java:158)
>   at 
> org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:132)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceStop(ContainerManagerImpl.java:720)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
>   - locked <0x0006c7c05a98> (a java.lang.Object)
>   at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:54)
>   at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:102)
>   at 
> org.apache.hadoop.service.CompositeService.stop(CompositeService.java:158)
>   at 
> org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:132)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStop(NodeManager.java:526)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
>   - locked <0x0006c7c05c88> (a java.lang.Object)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager$1.run(NodeManager.java:552)
>   
>   
> "qtp183259297-76" #76 daemon prio=5 os_prio=0 tid=0x7f5f567ed000 
> nid=0x5fb7 in Object.wait() [0x7f5f23ad7000]
>java.lang.Thread.State: TIMED_WAITING (on object monitor)
>   at java.lang.Object.wait(Native Method)
>   at java.lang.Object.wait(Object.java:460)
>   at java.util.concurrent.TimeUnit.timedWait(TimeUnit.java:348)
>   at 
> org.apache.hadoop.hbase.client.ResultBoundedCompletionService.pollForSpecificCompletedTask(ResultBoundedCompletionService.java:258)
>   - locked <0x000784ee8220> (a 
> [Lorg.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture;)
>   at 

[jira] [Updated] (YARN-9819) Make TestOpportunisticContainerAllocatorAMService more resilient.

2019-09-09 Thread Abhishek Modi (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9819:

Attachment: YARN-9819.003.patch

> Make TestOpportunisticContainerAllocatorAMService more resilient.
> -
>
> Key: YARN-9819
> URL: https://issues.apache.org/jira/browse/YARN-9819
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9819.001.patch, YARN-9819.002.patch, 
> YARN-9819.003.patch
>
>
> Currently, TestOpportunisticContainerAllocatorAMService tries to set the 
> Opportunistic container status directly in RMNode, but that status can be 
> updated by an NM heartbeat. The correct way would be to send it through an 
> NM heartbeat.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9819) Make TestOpportunisticContainerAllocatorAMService more resilient.

2019-09-09 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925922#comment-16925922
 ] 

Abhishek Modi commented on YARN-9819:
-

Thanks [~elgoiri] for the review. 

Attached v3 patch with javadocs for all public functions. 

Private functions introduced in TestOpportunisticContainerAllocatorAMService 
are one-liners and quite self-explanatory. Please let me know if you think we 
need documentation there too.

> Make TestOpportunisticContainerAllocatorAMService more resilient.
> -
>
> Key: YARN-9819
> URL: https://issues.apache.org/jira/browse/YARN-9819
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9819.001.patch, YARN-9819.002.patch, 
> YARN-9819.003.patch
>
>
> Currently, TestOpportunisticContainerAllocatorAMService tries to set the 
> Opportunistic container status directly in RMNode, but that status can be 
> updated by an NM heartbeat. The correct way would be to send it through an 
> NM heartbeat.






[jira] [Updated] (YARN-9782) Avoid DNS resolution while running SLS.

2019-09-09 Thread Abhishek Modi (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9782:

Attachment: YARN-9782.003.patch

> Avoid DNS resolution while running SLS.
> ---
>
> Key: YARN-9782
> URL: https://issues.apache.org/jira/browse/YARN-9782
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9782.001.patch, YARN-9782.002.patch, 
> YARN-9782.003.patch
>
>
> In SLS, we add nodes with random names and racks. DNS resolution of each of 
> these names takes around 2 seconds because the lookup only fails once it 
> times out. This makes the results of SLS unreliable and adds spikes. 
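
One way to picture the idea is a resolver that answers rack lookups from a table filled in when the simulated nodes are created, so that no lookup ever reaches DNS. The following is a minimal sketch in plain Java, not the code from the attached patch; the names StaticRackResolver, addNode and resolve are hypothetical, and the "/default-rack" fallback is an assumption made for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical table-driven resolver: it never calls into DNS. */
final class StaticRackResolver {
  // Assumed fallback rack name for hosts that were never registered.
  private static final String DEFAULT_RACK = "/default-rack";

  private final Map<String, String> hostToRack = new ConcurrentHashMap<>();

  /** Registers a simulated node together with the rack it belongs to. */
  void addNode(String hostName, String rack) {
    hostToRack.put(hostName, rack);
  }

  /** Returns the registered rack, or the default rack for unknown hosts. */
  String resolve(String hostName) {
    return hostToRack.getOrDefault(hostName, DEFAULT_RACK);
  }
}
```

Because every answer comes from an in-memory map, the lookup cost no longer depends on a resolver timeout for names that do not exist in DNS.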






[jira] [Commented] (YARN-9782) Avoid DNS resolution while running SLS.

2019-09-10 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16926479#comment-16926479
 ] 

Abhishek Modi commented on YARN-9782:
-

The test failure is not related to this patch; it happens because we are not 
able to delete a directory at the end. [~elgoiri], could you please review the 
latest patch? Thanks.

> Avoid DNS resolution while running SLS.
> ---
>
> Key: YARN-9782
> URL: https://issues.apache.org/jira/browse/YARN-9782
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9782.001.patch, YARN-9782.002.patch, 
> YARN-9782.003.patch
>
>
> In SLS, we add nodes with random names and racks. DNS resolution of each of 
> these names takes around 2 seconds because the lookup only fails once it 
> times out. This makes the results of SLS unreliable and adds spikes. 






[jira] [Created] (YARN-9827) Fix Http Response code in GenericExceptionHandler.

2019-09-11 Thread Abhishek Modi (Jira)
Abhishek Modi created YARN-9827:
---

 Summary: Fix Http Response code in GenericExceptionHandler.
 Key: YARN-9827
 URL: https://issues.apache.org/jira/browse/YARN-9827
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Abhishek Modi
Assignee: Abhishek Modi


GenericExceptionHandler should respond with SERVICE_UNAVAILABLE in case of 
connection and service-unavailable exceptions instead of INTERNAL_SERVER_ERROR.
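
The intended mapping can be sketched as follows. This is a minimal illustration in plain Java with raw HTTP codes, not the handler's actual code; StatusMapper and statusFor are hypothetical names, and treating java.net.ConnectException as the "connection exception" is an assumption, since the issue does not name the exception classes.

```java
import java.net.ConnectException;

/** Hypothetical helper that picks an HTTP status code for an exception. */
final class StatusMapper {
  static final int INTERNAL_SERVER_ERROR = 500;
  static final int SERVICE_UNAVAILABLE = 503;

  // Bound on the cause-chain walk so a cyclic chain cannot loop forever.
  private static final int MAX_DEPTH = 16;

  static int statusFor(Throwable t) {
    // Walk the cause chain so wrapped connection failures are detected too.
    Throwable current = t;
    for (int depth = 0; current != null && depth < MAX_DEPTH; depth++) {
      if (current instanceof ConnectException) {
        return SERVICE_UNAVAILABLE;
      }
      current = current.getCause();
    }
    return INTERNAL_SERVER_ERROR;
  }
}
```

A 503 tells the caller that the failure is on the backend and may be retried, while 500 stays reserved for unexpected errors.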






[jira] [Commented] (YARN-8972) [Router] Add support to prevent DoS attack over ApplicationSubmissionContext size

2019-09-11 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927813#comment-16927813
 ] 

Abhishek Modi commented on YARN-8972:
-

[~giovanni.fumarola], are you still working on it? Thanks.

> [Router] Add support to prevent DoS attack over ApplicationSubmissionContext 
> size
> -
>
> Key: YARN-8972
> URL: https://issues.apache.org/jira/browse/YARN-8972
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Major
> Attachments: YARN-8972.v1.patch, YARN-8972.v2.patch, 
> YARN-8972.v3.patch, YARN-8972.v4.patch, YARN-8972.v5.patch
>
>
> This jira tracks the effort to add a new interceptor in the Router to prevent 
> users from submitting applications with an oversized ASC.
> This keeps the YARN cluster from failing over.
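
The kind of check such an interceptor would perform can be sketched as below. This is only an illustration in plain Java under stated assumptions: SubmissionSizeCheck, check and maxBytes are hypothetical names, and a byte array stands in for the serialized ApplicationSubmissionContext.

```java
/** Hypothetical size guard applied before a submission is forwarded. */
final class SubmissionSizeCheck {
  private final int maxBytes;

  SubmissionSizeCheck(int maxBytes) {
    if (maxBytes <= 0) {
      throw new IllegalArgumentException("maxBytes must be positive");
    }
    this.maxBytes = maxBytes;
  }

  /** Throws if the serialized context is larger than the configured limit. */
  void check(byte[] serializedContext) {
    if (serializedContext.length > maxBytes) {
      throw new IllegalArgumentException(
          "Submission context is " + serializedContext.length
              + " bytes, limit is " + maxBytes);
    }
  }
}
```

Rejecting the request at the Router keeps an oversized context from ever reaching the cluster behind it.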






[jira] [Commented] (YARN-9819) Make TestOpportunisticContainerAllocatorAMService more resilient.

2019-09-11 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928147#comment-16928147
 ] 

Abhishek Modi commented on YARN-9819:
-

Thanks [~elgoiri] for the review. Committed to trunk.

> Make TestOpportunisticContainerAllocatorAMService more resilient.
> -
>
> Key: YARN-9819
> URL: https://issues.apache.org/jira/browse/YARN-9819
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9819.001.patch, YARN-9819.002.patch, 
> YARN-9819.003.patch
>
>
> Currently, TestOpportunisticContainerAllocatorAMService tries to set the 
> Opportunistic container status directly in RMNode, but that status can be 
> updated by an NM heartbeat. The correct way would be to send it through an 
> NM heartbeat.






[jira] [Created] (YARN-9828) Add log line for app submission in RouterWebServices.

2019-09-11 Thread Abhishek Modi (Jira)
Abhishek Modi created YARN-9828:
---

 Summary: Add log line for app submission in RouterWebServices.
 Key: YARN-9828
 URL: https://issues.apache.org/jira/browse/YARN-9828
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Abhishek Modi
Assignee: Abhishek Modi









[jira] [Commented] (YARN-9816) EntityGroupFSTimelineStore#scanActiveLogs fails with StackOverflowError

2019-09-11 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928231#comment-16928231
 ] 

Abhishek Modi commented on YARN-9816:
-

Sure, committing it shortly. Thanks.

> EntityGroupFSTimelineStore#scanActiveLogs fails with StackOverflowError
> ---
>
> Key: YARN-9816
> URL: https://issues.apache.org/jira/browse/YARN-9816
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 3.1.0, 3.2.0, 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9816-001.patch
>
>
> EntityGroupFSTimelineStore#scanActiveLogs fails with StackOverflowError.  
> This happens when a file is present under /ats/active.
> {code}
> [hdfs@node2 yarn]$ hadoop fs -ls /ats/active
> Found 1 items
> -rw-r--r--   3 hdfs hadoop  0 2019-09-06 16:34 
> /ats/active/.distcp.tmp.attempt_155759136_39768_m_01_0
> {code}
> Error Message:
> {code:java}
> java.lang.StackOverflowError
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:632)
> at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:291)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:203)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:185)
> at com.sun.proxy.$Proxy15.getListing(Unknown Source)
> at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:2143)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1076)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1088)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1059)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1038)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1034)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatusIterator(DistributedFileSystem.java:1046)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.list(EntityGroupFSTimelineStore.java:398)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:368)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
>  {code}
> One of our users tried to distcp the hdfs://ats/active dir. The Distcp job has 
> created the 
> temp file .distcp.tmp.attempt_155759136_39768_m_

[jira] [Updated] (YARN-9816) EntityGroupFSTimelineStore#scanActiveLogs fails when undesired files are present under /ats/active.

2019-09-11 Thread Abhishek Modi (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9816:

Summary: EntityGroupFSTimelineStore#scanActiveLogs fails when undesired 
files are present under /ats/active.  (was: 
EntityGroupFSTimelineStore#scanActiveLogs fails with StackOverflowError)
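
The stack trace in the description shows scanActiveLogs calling itself at the same line until the stack is exhausted. A guard against that can be sketched as follows, assuming the scan should descend only into directories and skip stray regular files. This is not the attached patch: it uses java.nio.file instead of the Hadoop FileSystem API, and ActiveLogScanner, scan and the "application_" name test are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;

/** Hypothetical recursive scan that never recurses into a regular file. */
final class ActiveLogScanner {
  /** Counts entries under root whose name starts with "application_". */
  static int scan(Path root) throws IOException {
    int found = 0;
    try (DirectoryStream<Path> entries = Files.newDirectoryStream(root)) {
      for (Path entry : entries) {
        String name = entry.getFileName().toString();
        if (name.startsWith("application_")) {
          found++;
        } else if (Files.isDirectory(entry, LinkOption.NOFOLLOW_LINKS)) {
          // Recurse only into real directories (symlinks are not followed).
          found += scan(entry);
        }
        // Anything else, such as a distcp temp file, is skipped.
      }
    }
    return found;
  }
}
```

With the directory check in place, an unexpected file under the scanned root is ignored instead of being handed back to the recursive call.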

> EntityGroupFSTimelineStore#scanActiveLogs fails when undesired files are 
> present under /ats/active.
> ---
>
> Key: YARN-9816
> URL: https://issues.apache.org/jira/browse/YARN-9816
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 3.1.0, 3.2.0, 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9816-001.patch
>
>
> EntityGroupFSTimelineStore#scanActiveLogs fails with StackOverflowError.  
> This happens when a file is present under /ats/active.
> {code}
> [hdfs@node2 yarn]$ hadoop fs -ls /ats/active
> Found 1 items
> -rw-r--r--   3 hdfs hadoop  0 2019-09-06 16:34 
> /ats/active/.distcp.tmp.attempt_155759136_39768_m_01_0
> {code}
> Error Message:
> {code:java}
> java.lang.StackOverflowError
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:632)
> at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:291)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:203)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:185)
> at com.sun.proxy.$Proxy15.getListing(Unknown Source)
> at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:2143)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1076)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1088)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1059)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1038)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1034)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatusIterator(DistributedFileSystem.java:1046)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.list(EntityGroupFSTimelineStore.java:398)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:368)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> at 
> org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383

[jira] [Commented] (YARN-9794) RM crashes due to runtime errors in TimelineServiceV2Publisher

2019-09-15 Thread Abhishek Modi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16929935#comment-16929935
 ] 

Abhishek Modi commented on YARN-9794:
-

Thanks [~tarunparimi]. The latest patch looks good to me. Thanks [~Prabhu Joseph] 
for the additional review. Committed to trunk.

> RM crashes due to runtime errors in TimelineServiceV2Publisher
> --
>
> Key: YARN-9794
> URL: https://issues.apache.org/jira/browse/YARN-9794
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Tarun Parimi
>Assignee: Tarun Parimi
>Priority: Major
> Attachments: YARN-9794.001.patch, YARN-9794.002.patch
>
>
> Saw that RM crashes during startup due to errors while putting an entity in 
> TimelineServiceV2Publisher.
> {code:java}
> 2019-08-28 09:35:45,273 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: 
> Error in dispatcher thread
> java.lang.RuntimeException: java.lang.IllegalArgumentException: 
> org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException:
>  CodedInputStream encountered an embedded string or message which claimed to 
> have negative size
> .
> at 
> org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithoutRetries(RpcRetryingCallerImpl.java:200)
> at 
> org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:269)
> at 
> org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:437)
> at 
> org.apache.hadoop.hbase.client.ClientScanner.nextWithSyncCache(ClientScanner.java:312)
> at 
> org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:597)
> at 
> org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegionInMeta(ConnectionImplementation.java:834)
> at 
> org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:732)
> at 
> org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:281)
> at 
> org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:236)
> at 
> org.apache.hadoop.hbase.client.BufferedMutatorImpl.backgroundFlushCommits(BufferedMutatorImpl.java:321)
> at 
> org.apache.hadoop.hbase.client.BufferedMutatorImpl.flush(BufferedMutatorImpl.java:285)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.common.TypedBufferedMutator.flush(TypedBufferedMutator.java:66)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineWriterImpl.flush(HBaseTimelineWriterImpl.java:566)
> at 
> org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollector.flushBufferedTimelineEntities(TimelineCollector.java:173)
> at 
> org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollector.putEntities(TimelineCollector.java:150)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher.putEntity(TimelineServiceV2Publisher.java:459)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher.access$100(TimelineServiceV2Publisher.java:73)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher$TimelineV2EventHandler.handle(TimelineServiceV2Publisher.java:494)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher$TimelineV2EventHandler.handle(TimelineServiceV2Publisher.java:483)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.IllegalArgumentException: 
> org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException:
>  CodedInputStream encountered an embedded string or message which claimed to 
> have negative size.
> at 
> org.apache.hbase.thirdparty.com.google.protobuf.CodedInputStream.newInstance(CodedInputStream.java:117)
> {code}






[jira] [Created] (YARN-9842) Port YARN-9608 DecommissioningNodesWatcher should get lists of running applications on node from RMNode to branch-3.0/branch-2

2019-09-19 Thread Abhishek Modi (Jira)
Abhishek Modi created YARN-9842:
---

 Summary: Port YARN-9608 DecommissioningNodesWatcher should get 
lists of running applications on node from RMNode to branch-3.0/branch-2
 Key: YARN-9842
 URL: https://issues.apache.org/jira/browse/YARN-9842
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Abhishek Modi
Assignee: Abhishek Modi








