[jira] [Commented] (YARN-9941) Opportunistic scheduler metrics should be reset during fail-over.
[ https://issues.apache.org/jira/browse/YARN-9941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17161710#comment-17161710 ] Abhishek Modi commented on YARN-9941: - Sure [~BilwaST]. Feel free to take over. Thanks > Opportunistic scheduler metrics should be reset during fail-over. > - > > Key: YARN-9941 > URL: https://issues.apache.org/jira/browse/YARN-9941 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9694) UI always show default-rack for all the nodes while running SLS.
[ https://issues.apache.org/jira/browse/YARN-9694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi updated YARN-9694: Attachment: YARN-9694.002.patch > UI always show default-rack for all the nodes while running SLS. > > > Key: YARN-9694 > URL: https://issues.apache.org/jira/browse/YARN-9694 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9694.001.patch, YARN-9694.002.patch > > > Currently, independent of the specification of the nodes in SLS.json or > nodes.json, UI always shows that rack of the node is default-rack. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9694) UI always show default-rack for all the nodes while running SLS.
[ https://issues.apache.org/jira/browse/YARN-9694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1689#comment-1689 ] Abhishek Modi commented on YARN-9694: - [~elgoiri] Could you please review it? Thanks. > UI always show default-rack for all the nodes while running SLS.
[jira] [Commented] (YARN-9694) UI always show default-rack for all the nodes while running SLS.
[ https://issues.apache.org/jira/browse/YARN-9694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16895637#comment-16895637 ] Abhishek Modi commented on YARN-9694: - Thanks [~elgoiri] for reviewing it. GenerateNodeTableMapping generates a node-to-rack mapping file which is then used by TableMapping to resolve rack names. The format required by TableMapping is a two-column text file where the first column specifies the node name and the second column specifies the rack name. I am generating this file as part of generateNodeTableMapping. Maybe I will remove the format name from the file. I will upload an updated patch changing the format and improving the javadoc. > UI always show default-rack for all the nodes while running SLS.
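As a hedged illustration of the two-column format TableMapping expects, the sketch below writes such a mapping file; the host and rack names are made up, and this is not the actual SLS GenerateNodeTableMapping code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class Main {
    // Write a node-to-rack mapping file in the two-column format read by
    // org.apache.hadoop.net.TableMapping: "<node> <rack>" on each line.
    static Path writeMapping(Map<String, String> nodeToRack) throws IOException {
        Path file = Files.createTempFile("node-table-mapping", ".txt");
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, String> e : nodeToRack.entrySet()) {
            lines.add(e.getKey() + " " + e.getValue());
        }
        return Files.write(file, lines);
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> nodeToRack = new LinkedHashMap<>();
        nodeToRack.put("node1.example.com", "/rack1");  // hypothetical hosts
        nodeToRack.put("node2.example.com", "/rack2");
        for (String line : Files.readAllLines(writeMapping(nodeToRack))) {
            System.out.println(line);
        }
    }
}
```

A file in this shape is what Hadoop's TableMapping resolver consumes when net.topology.node.switch.mapping.impl is set to org.apache.hadoop.net.TableMapping and net.topology.table.file.name points at the file.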
[jira] [Updated] (YARN-9694) UI always show default-rack for all the nodes while running SLS.
[ https://issues.apache.org/jira/browse/YARN-9694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi updated YARN-9694: Attachment: YARN-9694.003.patch > UI always show default-rack for all the nodes while running SLS.
[jira] [Commented] (YARN-9694) UI always show default-rack for all the nodes while running SLS.
[ https://issues.apache.org/jira/browse/YARN-9694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16896281#comment-16896281 ] Abhishek Modi commented on YARN-9694: - Thanks for reviewing it. The java.io.File API doesn't provide a direct way to create a temporary directory, so I was creating a temporary file and deleting it to get a temporary directory. Updated the patch to directly use a temporary file. > UI always show default-rack for all the nodes while running SLS.
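For context on the temporary-directory point above: while java.io.File has no temp-directory method, java.nio.file.Files has offered one since Java 7. A minimal sketch:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class Main {
    public static void main(String[] args) throws IOException {
        // Create (and clean up) a temporary directory directly, without
        // the create-a-file-then-delete-it workaround.
        Path dir = Files.createTempDirectory("sls-test");
        System.out.println(Files.isDirectory(dir));
        Files.delete(dir);
    }
}
```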
[jira] [Updated] (YARN-9694) UI always show default-rack for all the nodes while running SLS.
[ https://issues.apache.org/jira/browse/YARN-9694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi updated YARN-9694: Attachment: YARN-9694.004.patch > UI always show default-rack for all the nodes while running SLS.
[jira] [Commented] (YARN-9690) Invalid AMRM token when distributed scheduling is enabled.
[ https://issues.apache.org/jira/browse/YARN-9690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16896659#comment-16896659 ] Abhishek Modi commented on YARN-9690: - [~Babbleshack] thanks for filing the issue. Could you please try setting yarn.resourcemanager.hostname to yarn-master-0.yarn-service.yarn in the yarn-site.xml of both the RM and the NM? You can then remove the following configs: yarn.resourcemanager.resource-tracker.address and yarn.resourcemanager.address. > Invalid AMRM token when distributed scheduling is enabled. > -- > > Key: YARN-9690 > URL: https://issues.apache.org/jira/browse/YARN-9690 > Project: Hadoop YARN > Issue Type: Bug > Components: distributed-scheduling, yarn > Affects Versions: 2.9.2, 3.1.2 > Environment: OS: Ubuntu 18.04 > JVM: 1.8.0_212-8u212-b03-0ubuntu1.18.04.1-b03 > Reporter: Babble Shack > Priority: Major > Attachments: applicationlog, distributed_log, ds_application.log, > image-2019-07-26-18-00-14-980.png, nodemanager-yarn-site.xml, > nodemanager.log, rm-yarn-site.xml, yarn-site.xml > > > Applications fail to start due to an invalid AMRMToken from the application attempt. > I have tested this with 0/100% opportunistic maps and the same issue occurs > regardless.
> {code:java}
> <configuration>
>   <property>
>     <name>yarn.nodemanager.aux-services</name>
>     <value>mapreduce_shuffle</value>
>   </property>
>   <property>
>     <name>yarn.resourcemanager.address</name>
>     <value>yarn-master-0.yarn-service.yarn:8032</value>
>   </property>
>   <property>
>     <name>yarn.resourcemanager.scheduler.address</name>
>     <value>0.0.0.0:8049</value>
>   </property>
>   <property>
>     <name>yarn.resourcemanager.opportunistic-container-allocation.enabled</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>yarn.nodemanager.opportunistic-containers-max-queue-length</name>
>     <value>10</value>
>   </property>
>   <property>
>     <name>yarn.nodemanager.distributed-scheduling.enabled</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>yarn.webapp.ui2.enable</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>yarn.resourcemanager.resource-tracker.address</name>
>     <value>yarn-master-0.yarn-service.yarn:8031</value>
>   </property>
>   <property>
>     <name>yarn.log-aggregation-enable</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>yarn.nodemanager.resource.memory-mb</name>
>     <value>7168</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.minimum-allocation-mb</name>
>     <value>3584</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.maximum-allocation-mb</name>
>     <value>7168</value>
>   </property>
>   <property>
>     <name>yarn.app.mapreduce.am.resource.mb</name>
>     <value>7168</value>
>   </property>
>   <property>
>     <name>yarn.app.mapreduce.am.command-opts</name>
>     <value>-Xmx5734m</value>
>   </property>
>   <property>
>     <name>yarn.timeline-service.enabled</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>yarn.resourcemanager.system-metrics-publisher.enabled</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>yarn.timeline-service.generic-application-history.enabled</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>yarn.timeline-service.bind-host</name>
>     <value>0.0.0.0</value>
>   </property>
> </configuration>
> {code}
> Relevant logs:
> {code:java}
> 2019-07-22 14:56:37,104 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: 100% of the mappers will be scheduled using OPPORTUNISTIC containers
> 2019-07-22 14:56:37,117 INFO [main] org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at yarn-master-0.yarn-service.yarn/10.244.1.134:8030
> 2019-07-22 14:56:37,150 WARN [main] org.apache.hadoop.ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): Invalid AMRMToken from appattempt_1563805140414_0002_02
> 2019-07-22 14:56:37,152 ERROR [main] org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: Exception while registering
> org.apache.hadoop.security.token.SecretManager$InvalidToken: Invalid AMRMToken from appattempt_1563805140414_0002_02
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
> at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateIOException(RPCUtil.java:80)
> at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:119)
> at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.registerApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:109)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.hadoop.io.retry.RetryInvocationH
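The suggestion in the comment above corresponds to a yarn-site.xml fragment along these lines (a sketch using the reporter's hostname; with yarn.resourcemanager.hostname set, the per-service RM address properties derive their host from it by default):

```xml
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>yarn-master-0.yarn-service.yarn</value>
</property>
```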
[jira] [Assigned] (YARN-9690) Invalid AMRM token when distributed scheduling is enabled.
[ https://issues.apache.org/jira/browse/YARN-9690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi reassigned YARN-9690: --- Assignee: Abhishek Modi > Invalid AMRM token when distributed scheduling is enabled.
[jira] [Commented] (YARN-9690) Invalid AMRM token when distributed scheduling is enabled.
[ https://issues.apache.org/jira/browse/YARN-9690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16896678#comment-16896678 ] Abhishek Modi commented on YARN-9690: - Thanks [~Babbleshack]. I will help review that PR. > Invalid AMRM token when distributed scheduling is enabled.
[jira] [Assigned] (YARN-9690) Invalid AMRM token when distributed scheduling is enabled.
[ https://issues.apache.org/jira/browse/YARN-9690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi reassigned YARN-9690: --- Assignee: (was: Abhishek Modi) > Invalid AMRM token when distributed scheduling is enabled.
[jira] [Assigned] (YARN-7547) Throttle Localization for Opportunistic Containers in the NM
[ https://issues.apache.org/jira/browse/YARN-7547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi reassigned YARN-7547: --- Assignee: Abhishek Modi (was: kartheek muthyala) > Throttle Localization for Opportunistic Containers in the NM > > > Key: YARN-7547 > URL: https://issues.apache.org/jira/browse/YARN-7547 > Project: Hadoop YARN > Issue Type: Bug > Reporter: Arun Suresh > Assignee: Abhishek Modi > Priority: Major > > Currently, localization is performed before the container is queued on the > NM. It is possible that a barrage of opportunistic containers can prevent > guaranteed containers from starting. This can be avoided by throttling > localization requests for opportunistic containers - e.g. if the number of > queued containers is > x, then don't start localization for new opportunistic > containers.
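The throttling idea described in the issue can be sketched roughly as below; the class, method names, and threshold are illustrative assumptions, not the actual NodeManager API.

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class Main {
    static final int MAX_QUEUED_OPPORTUNISTIC = 2;  // the "x" from the issue, chosen arbitrarily
    static final Queue<String> queuedContainers = new ArrayDeque<>();

    // Guaranteed containers always start localization; opportunistic
    // containers only start while the NM queue is below the threshold.
    static boolean shouldStartLocalization(boolean isGuaranteed) {
        return isGuaranteed || queuedContainers.size() < MAX_QUEUED_OPPORTUNISTIC;
    }

    public static void main(String[] args) {
        queuedContainers.add("container_1");
        queuedContainers.add("container_2");
        System.out.println(shouldStartLocalization(true));   // guaranteed: always allowed
        System.out.println(shouldStartLocalization(false));  // opportunistic: queue is full
    }
}
```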
[jira] [Commented] (YARN-9694) UI always show default-rack for all the nodes while running SLS.
[ https://issues.apache.org/jira/browse/YARN-9694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901105#comment-16901105 ] Abhishek Modi commented on YARN-9694: - Tested the latest patch at scale with a 4500-node nodes.json and everything worked fine. [~elgoiri] could you please review it? Thanks. > UI always show default-rack for all the nodes while running SLS.
[jira] [Commented] (YARN-9694) UI always show default-rack for all the nodes while running SLS.
[ https://issues.apache.org/jira/browse/YARN-9694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16903597#comment-16903597 ] Abhishek Modi commented on YARN-9694: - Thanks [~elgoiri] for the review. I have committed it to trunk. > UI always show default-rack for all the nodes while running SLS.
[jira] [Commented] (YARN-9732) yarn.system-metrics-publisher.enabled=false does not work
[ https://issues.apache.org/jira/browse/YARN-9732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16903961#comment-16903961 ] Abhishek Modi commented on YARN-9732: - Thanks [~magnum] for the patch. I will commit it in a couple of hours. > yarn.system-metrics-publisher.enabled=false does not work > - > > Key: YARN-9732 > URL: https://issues.apache.org/jira/browse/YARN-9732 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, timelineclient > Affects Versions: 3.1.2 > Reporter: KWON BYUNGCHANG > Assignee: KWON BYUNGCHANG > Priority: Major > Attachments: YARN-9732.0001.patch > > > The RM does not honor yarn.system-metrics-publisher.enabled=false, > so if only yarn.timeline-service.enabled=true is configured, > YARN system metrics are always published to the timeline server by the RM.
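The configuration the report is exercising can be sketched as the following yarn-site.xml fragment; per the bug, the second property should suppress RM metrics publishing but is currently ignored:

```xml
<property>
  <name>yarn.timeline-service.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.system-metrics-publisher.enabled</name>
  <value>false</value>
</property>
```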
[jira] [Commented] (YARN-9731) In ATS v1.5, all jobs are visible to all users without view-acl
[ https://issues.apache.org/jira/browse/YARN-9731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16903963#comment-16903963 ] Abhishek Modi commented on YARN-9731: - Thanks [~magnum] for the patch. Thanks [~Prabhu Joseph] for the review. The patch looks good to me. [~magnum] could you please take care of the checkstyle warnings? > In ATS v1.5, all jobs are visible to all users without view-acl > --- > > Key: YARN-9731 > URL: https://issues.apache.org/jira/browse/YARN-9731 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver > Affects Versions: 3.1.2 > Reporter: KWON BYUNGCHANG > Assignee: KWON BYUNGCHANG > Priority: Major > Attachments: YARN-9731.001.patch, YARN-9731.002.patch, > ats_v1.5_screenshot.png > > > In ATS v1.5 in secure mode, > all jobs are visible to all users without a view-acl. > If a user does not have the view-acl, the user should not be able to see jobs. > I attached an ATS UI screenshot. > > ATS v1.5 log > {code:java}
> 2019-08-09 10:21:13,679 WARN applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore (ApplicationHistoryManagerOnTimelineStore.java:generateApplicationReport(687)) - Failed to authorize when generating application report for application_1565247558150_1954. Use a placeholder for its latest attempt id.
> org.apache.hadoop.security.authorize.AuthorizationException: User magnum does not have privilege to see this application application_1565247558150_1954
> 2019-08-09 10:21:13,680 WARN applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore (ApplicationHistoryManagerOnTimelineStore.java:generateApplicationReport(687)) - Failed to authorize when generating application report for application_1565247558150_1951. Use a placeholder for its latest attempt id.
> org.apache.hadoop.security.authorize.AuthorizationException: User magnum does not have privilege to see this application application_1565247558150_1951
> {code}
[jira] [Updated] (YARN-9732) yarn.system-metrics-publisher.enabled=false is not honored by RM
[ https://issues.apache.org/jira/browse/YARN-9732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi updated YARN-9732: Summary: yarn.system-metrics-publisher.enabled=false is not honored by RM (was: yarn.system-metrics-publisher.enabled=false does not work) > yarn.system-metrics-publisher.enabled=false is not honored by RM
[jira] [Commented] (YARN-9732) yarn.system-metrics-publisher.enabled=false is not honored by RM
[ https://issues.apache.org/jira/browse/YARN-9732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16904037#comment-16904037 ] Abhishek Modi commented on YARN-9732: - Thanks [~magnum] for patch and [~Prabhu Joseph] for review. Committed to trunk. > yarn.system-metrics-publisher.enabled=false is not honored by RM > > > Key: YARN-9732 > URL: https://issues.apache.org/jira/browse/YARN-9732 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, timelineclient >Affects Versions: 3.1.2 >Reporter: KWON BYUNGCHANG >Assignee: KWON BYUNGCHANG >Priority: Major > Attachments: YARN-9732.0001.patch > > > RM does not use yarn.system-metrics-publisher.enabled=false, > so if configure only yarn.timeline-service.enabled=true, > YARN system metrics are always published on the timeline server by RM > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9731) In ATS v1.5, all jobs are visible to all users without view-acl
[ https://issues.apache.org/jira/browse/YARN-9731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16904610#comment-16904610 ] Abhishek Modi commented on YARN-9731: - [~magnum] thanks for the patch. One minor comment: Could you use logger format in ApplicationHistoryManagerOnTimelineStore.java line 687. > In ATS v1.5, all jobs are visible to all users without view-acl > --- > > Key: YARN-9731 > URL: https://issues.apache.org/jira/browse/YARN-9731 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Affects Versions: 3.1.2 >Reporter: KWON BYUNGCHANG >Assignee: KWON BYUNGCHANG >Priority: Major > Attachments: YARN-9731.001.patch, YARN-9731.002.patch, > YARN-9731.003.patch, ats_v1.5_screenshot.png > > > In ATS v1.5 of secure mode, > all jobs are visible to all users without view-acl. > if user does not have view-acl, user should not be able to see jobs. > I attatched ATS UI screenshot. > > ATS v1.5 log > {code:java} > 2019-08-09 10:21:13,679 WARN > applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore > (ApplicationHistoryManagerOnTimelineStore.java:generateApplicationReport(687)) > - Failed to authorize when generating application report for > application_1565247558150_1954. Use a placeholder for its latest attempt id. > org.apache.hadoop.security.authorize.AuthorizationException: User magnum does > not have privilege to see this application application_1565247558150_1954 > 2019-08-09 10:21:13,680 WARN > applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore > (ApplicationHistoryManagerOnTimelineStore.java:generateApplicationReport(687)) > - Failed to authorize when generating application report for > application_1565247558150_1951. Use a placeholder for its latest attempt id. 
> org.apache.hadoop.security.authorize.AuthorizationException: User magnum does > not have privilege to see this application application_1565247558150_1951 > {code} > > > > > > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
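The review comment above asks for "logger format" in ApplicationHistoryManagerOnTimelineStore.java — i.e. SLF4J-style parameterized logging with `{}` placeholders instead of string concatenation, so the message is only assembled when the log level is enabled. A minimal sketch of the substitution; the tiny `format` helper below is a stand-in written here so the example runs without the slf4j dependency, not the actual SLF4J implementation.

```java
public class LoggerFormatSketch {
    // Minimal stand-in for SLF4J's {} placeholder substitution, so the
    // example is self-contained without the slf4j jar on the classpath.
    static String format(String template, Object... args) {
        StringBuilder out = new StringBuilder();
        int argIdx = 0, from = 0, at;
        while ((at = template.indexOf("{}", from)) >= 0 && argIdx < args.length) {
            out.append(template, from, at).append(args[argIdx++]);
            from = at + 2;
        }
        return out.append(template.substring(from)).toString();
    }

    public static void main(String[] args) {
        String appId = "application_1565247558150_1954";
        // Before: concatenation, the string is always built.
        String before = "Failed to authorize when generating application report for "
            + appId + ". Use a placeholder for its latest attempt id.";
        // After: parameterized. With a real logger this would read
        // LOG.warn("Failed to authorize ... for {}. Use a placeholder ...", appId);
        String after = format(
            "Failed to authorize when generating application report for {}."
            + " Use a placeholder for its latest attempt id.", appId);
        System.out.println(after.equals(before));  // same message text
    }
}
```

With a real SLF4J logger the placeholder form also avoids the concatenation cost entirely when WARN is disabled.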
[jira] [Commented] (YARN-9657) AbstractLivelinessMonitor add serviceName to PingChecker thread
[ https://issues.apache.org/jira/browse/YARN-9657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16904612#comment-16904612 ] Abhishek Modi commented on YARN-9657: - Thanks [~BilwaST] for working on it. Patch looks good to me. Committed to trunk. > AbstractLivelinessMonitor add serviceName to PingChecker thread > --- > > Key: YARN-9657 > URL: https://issues.apache.org/jira/browse/YARN-9657 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Bibin A Chundatt >Assignee: Bilwa S T >Priority: Minor > Attachments: YARN-9657-001.patch > > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9722) PlacementRule logs object ID in place of queue name.
[ https://issues.apache.org/jira/browse/YARN-9722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16904613#comment-16904613 ] Abhishek Modi commented on YARN-9722: - Thanks [~Prabhu Joseph] for the patch. A minor comment: Can we use logger format here. > PlacementRule logs object ID in place of queue name. > > > Key: YARN-9722 > URL: https://issues.apache.org/jira/browse/YARN-9722 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 3.3.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Minor > Labels: supportability > Attachments: YARN-9722-001.patch > > > UserGroupMappingPlacementRule logs object ID in place of queue name. > {code} > 2019-08-05 09:28:52,664 INFO > org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule: > Application application_1564996871731_0003 user ambari-qa mapping [default] > to > [org.apache.hadoop.yarn.server.resourcemanager.placement.ApplicationPlacementContext@5aafe9b2] > override false > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
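The `ApplicationPlacementContext@5aafe9b2` in the log above is the default `Object.toString()` (class name plus identity hash), printed because the context object itself was passed to the logger instead of its queue name. A small illustration of the failure mode and the fix, using a hypothetical stand-in class — the field and method names here are assumptions for illustration, not the actual YARN class.

```java
public class PlacementLogSketch {
    // Hypothetical stand-in for ApplicationPlacementContext.
    static class PlacementContext {
        private final String queue;
        PlacementContext(String queue) { this.queue = queue; }
        String getQueue() { return queue; }
        // No toString() override, so logging the object directly
        // yields e.g. "PlacementContext@1b6d3586".
    }

    // Fix: log the queue name explicitly instead of the object.
    static String describe(PlacementContext ctx) {
        return "mapping to [" + ctx.getQueue() + "]";
    }

    public static void main(String[] args) {
        PlacementContext ctx = new PlacementContext("default");
        System.out.println("raw object: " + ctx);  // class name + hash, unhelpful
        System.out.println(describe(ctx));         // readable queue name
    }
}
```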
[jira] [Commented] (YARN-9464) Support "Pending Resource" metrics in RM's RESTful API
[ https://issues.apache.org/jira/browse/YARN-9464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16904616#comment-16904616 ] Abhishek Modi commented on YARN-9464: - Thanks [~Prabhu Joseph] for the patch. It looks good to me. I will commit it by tomorrow if there is no objection. > Support "Pending Resource" metrics in RM's RESTful API > -- > > Key: YARN-9464 > URL: https://issues.apache.org/jira/browse/YARN-9464 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Zhankun Tang >Assignee: Prabhu Joseph >Priority: Major > Attachments: YARN-9464-001.patch, YARN-9464-002.patch > > > Knowing only the "available", "used" resource is not enough for YARN > management tools like auto-scaler. It would be helpful to diagnose the > cluster resource utilization if it gets "Pending Resource" from RM RESTful > APIs. In a certain extent, it represents how starving the applications are. > Initially, we can add "pending resource" information in below two RM REST > APIs: > {code:java} > RMnode:port/ws/v1/cluster/metrics > RMnode:port/ws/v1/cluster/nodes > {code} > > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
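To make the "pending resource" idea above concrete: it is the resource applications have asked for but not yet been allocated, which is why it indicates starvation in a way "available" and "used" cannot. A toy sketch of the semantics only — the actual REST field names and computation come from the patch, not from this example.

```java
public class PendingMetricsSketch {
    // Outstanding asks: what applications still wait for. A starving
    // cluster can show a large pending value even when "used" is high
    // and "available" is near zero.
    static long pendingMB(long requestedMB, long allocatedMB) {
        return Math.max(0, requestedMB - allocatedMB);
    }

    public static void main(String[] args) {
        // Apps asked for 96 GB in total; only 64 GB granted so far.
        System.out.println(pendingMB(98304, 65536));
    }
}
```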
[jira] [Commented] (YARN-9731) In ATS v1.5, all jobs are visible to all users without view-acl
[ https://issues.apache.org/jira/browse/YARN-9731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16904847#comment-16904847 ] Abhishek Modi commented on YARN-9731: - You have replaced all exceptions by e.toString(). This will not show the stack trace. Could you please use exception directly while logging. > In ATS v1.5, all jobs are visible to all users without view-acl > --- > > Key: YARN-9731 > URL: https://issues.apache.org/jira/browse/YARN-9731 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Affects Versions: 3.1.2 >Reporter: KWON BYUNGCHANG >Assignee: KWON BYUNGCHANG >Priority: Major > Attachments: YARN-9731.001.patch, YARN-9731.002.patch, > YARN-9731.003.patch, YARN-9731.004.patch, ats_v1.5_screenshot.png > > > In ATS v1.5 of secure mode, > all jobs are visible to all users without view-acl. > if user does not have view-acl, user should not be able to see jobs. > I attatched ATS UI screenshot. > > ATS v1.5 log > {code:java} > 2019-08-09 10:21:13,679 WARN > applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore > (ApplicationHistoryManagerOnTimelineStore.java:generateApplicationReport(687)) > - Failed to authorize when generating application report for > application_1565247558150_1954. Use a placeholder for its latest attempt id. > org.apache.hadoop.security.authorize.AuthorizationException: User magnum does > not have privilege to see this application application_1565247558150_1954 > 2019-08-09 10:21:13,680 WARN > applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore > (ApplicationHistoryManagerOnTimelineStore.java:generateApplicationReport(687)) > - Failed to authorize when generating application report for > application_1565247558150_1951. Use a placeholder for its latest attempt id. 
> org.apache.hadoop.security.authorize.AuthorizationException: User magnum does > not have privilege to see this application application_1565247558150_1951 > {code} > > > > > > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
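The review point above — `e.toString()` records only the exception class and message, while passing the `Throwable` itself to the logger (e.g. `LOG.warn("msg", e)`) preserves the full stack trace — can be made visible without any logging framework:

```java
import java.io.PrintWriter;
import java.io.StringWriter;

public class ExceptionLogSketch {
    // What e.toString() gives the log: class name + message only.
    static String asToString(Throwable t) {
        return t.toString();
    }

    // What passing the Throwable to the logger preserves: the frames.
    static String withStackTrace(Throwable t) {
        StringWriter sw = new StringWriter();
        t.printStackTrace(new PrintWriter(sw));
        return sw.toString();
    }

    public static void main(String[] args) {
        Throwable t = new SecurityException("User magnum does not have privilege");
        System.out.println(asToString(t));                        // one line, no frames
        System.out.println(withStackTrace(t).contains("\tat "));  // stack frames present
    }
}
```

The stack frames are exactly what is needed to locate call sites like the `generateApplicationReport(687)` line in the log above, which is why the review asks for the exception object rather than its string form.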
[jira] [Commented] (YARN-9722) PlacementRule logs object ID in place of queue name.
[ https://issues.apache.org/jira/browse/YARN-9722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16904852#comment-16904852 ] Abhishek Modi commented on YARN-9722: - Patch [^YARN-9722-002.patch] lgtm. Committed to trunk. Thanks [~Prabhu Joseph] for the patch and [~sunilg] for review. > PlacementRule logs object ID in place of queue name. > > > Key: YARN-9722 > URL: https://issues.apache.org/jira/browse/YARN-9722 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 3.3.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Minor > Labels: supportability > Attachments: YARN-9722-001.patch, YARN-9722-002.patch > > > UserGroupMappingPlacementRule logs object ID in place of queue name. > {code} > 2019-08-05 09:28:52,664 INFO > org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule: > Application application_1564996871731_0003 user ambari-qa mapping [default] > to > [org.apache.hadoop.yarn.server.resourcemanager.placement.ApplicationPlacementContext@5aafe9b2] > override false > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9464) Support "Pending Resource" metrics in RM's RESTful API
[ https://issues.apache.org/jira/browse/YARN-9464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16904853#comment-16904853 ] Abhishek Modi commented on YARN-9464: - [~Prabhu Joseph] [TestSystemMetricsPublisher.testPublishContainerMetrics|https://builds.apache.org/job/PreCommit-YARN-Build/24527/testReport/org.apache.hadoop.yarn.server.resourcemanager.metrics/TestSystemMetricsPublisher/testPublishContainerMetrics/] is failing. Could you please take a look whether it's related. Thanks > Support "Pending Resource" metrics in RM's RESTful API > -- > > Key: YARN-9464 > URL: https://issues.apache.org/jira/browse/YARN-9464 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Zhankun Tang >Assignee: Prabhu Joseph >Priority: Major > Attachments: YARN-9464-001.patch, YARN-9464-002.patch > > > Knowing only the "available", "used" resource is not enough for YARN > management tools like auto-scaler. It would be helpful to diagnose the > cluster resource utilization if it gets "Pending Resource" from RM RESTful > APIs. In a certain extent, it represents how starving the applications are. > Initially, we can add "pending resource" information in below two RM REST > APIs: > {code:java} > RMnode:port/ws/v1/cluster/metrics > RMnode:port/ws/v1/cluster/nodes > {code} > > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7982) Do ACLs check while retrieving entity-types per application
[ https://issues.apache.org/jira/browse/YARN-7982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16904859#comment-16904859 ] Abhishek Modi commented on YARN-7982: - Thanks [~Prabhu Joseph] for the patch. Some comments: In TimelineReaderManager.java, I think we should still create a new context while calling getEntityTypes to make sure that existing context is not modified. In FilesystemTimelineReaderImpl, why do we need to set userId explicitly. In line just above, we are calling context.getUserId. > Do ACLs check while retrieving entity-types per application > --- > > Key: YARN-7982 > URL: https://issues.apache.org/jira/browse/YARN-7982 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Rohith Sharma K S >Assignee: Prabhu Joseph >Priority: Major > Attachments: YARN-7982-001.patch, YARN-7982-002.patch, > YARN-7982-003.patch > > > REST end point {{/apps/$appid/entity-types}} retrieves all the entity-types > for given application. This need to be guarded with ACL check > {code} > [yarn@yarn-ats-3 ~]$ curl > "http://yarn-ats-3:8198/ws/v2/timeline/apps/application_1552297011473_0002?user.name=ambari-qa1"; > {"exception":"ForbiddenException","message":"java.lang.Exception: User > ambari-qa1 is not allowed to read TimelineService V2 > data.","javaClassName":"org.apache.hadoop.yarn.webapp.ForbiddenException"} > [yarn@yarn-ats-3 ~]$ curl > "http://yarn-ats-3:8198/ws/v2/timeline/apps/application_1552297011473_0002/entity-types?user.name=ambari-qa1"; > ["YARN_APPLICATION_ATTEMPT","YARN_CONTAINER"] > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9464) Support "Pending Resource" metrics in RM's RESTful API
[ https://issues.apache.org/jira/browse/YARN-9464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16904993#comment-16904993 ] Abhishek Modi commented on YARN-9464: - Thanks [~Prabhu Joseph]. Committed to trunk. > Support "Pending Resource" metrics in RM's RESTful API > -- > > Key: YARN-9464 > URL: https://issues.apache.org/jira/browse/YARN-9464 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Zhankun Tang >Assignee: Prabhu Joseph >Priority: Major > Attachments: YARN-9464-001.patch, YARN-9464-002.patch > > > Knowing only the "available", "used" resource is not enough for YARN > management tools like auto-scaler. It would be helpful to diagnose the > cluster resource utilization if it gets "Pending Resource" from RM RESTful > APIs. In a certain extent, it represents how starving the applications are. > Initially, we can add "pending resource" information in below two RM REST > APIs: > {code:java} > RMnode:port/ws/v1/cluster/metrics > RMnode:port/ws/v1/cluster/nodes > {code} > > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9373) HBaseTimelineSchemaCreator has to allow user to configure pre-splits
[ https://issues.apache.org/jira/browse/YARN-9373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16905783#comment-16905783 ] Abhishek Modi commented on YARN-9373: - Thanks [~Prabhu Joseph] for the patch. I will review it as soon as I get some free cycles. Thanks. > HBaseTimelineSchemaCreator has to allow user to configure pre-splits > > > Key: YARN-9373 > URL: https://issues.apache.org/jira/browse/YARN-9373 > Project: Hadoop YARN > Issue Type: Sub-task > Components: ATSv2 >Affects Versions: 3.2.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Attachments: Configurable_PreSplits.png, YARN-9373-001.patch, > YARN-9373-002.patch, YARN-9373-003.patch > > > Most of the TimelineService HBase tables is set with username splits which is > based on lowercase alphabet (a,ad,an,b,ca). This won't help if the rowkey > starts with either number or uppercase alphabet. We need to allow user to > configure based upon their data. For example, say a user has configured the > yarn.resourcemanager.cluster-id to be ATS or 123, then the splits can be > configured as A,B,C,,, or 100,200,300,,, -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9744) RollingLevelDBTimelineStore.getEntityByTime fails with NPE
[ https://issues.apache.org/jira/browse/YARN-9744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16906205#comment-16906205 ] Abhishek Modi commented on YARN-9744: - Thanks [~Prabhu Joseph] for the patch. LGTM. Committed to trunk. > RollingLevelDBTimelineStore.getEntityByTime fails with NPE > -- > > Key: YARN-9744 > URL: https://issues.apache.org/jira/browse/YARN-9744 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Affects Versions: 3.2.0, 3.3.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Attachments: YARN-9744-001.patch > > > RollingLevelDBTimelineStore.getEntityByTime fails with NPE. > {code} > 2019-08-07 12:58:55,990 WARN ipc.Server (Server.java:logException(2433)) - > IPC Server handler 0 on 10200, call > org.apache.hadoop.yarn.api.ApplicationHistoryProtocolPB.getContainers from > 10.21.216.93:36392 Call#29446915 Retry#0 > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.timeline.RollingLevelDBTimelineStore.getEntityByTime(RollingLevelDBTimelineStore.java:786) > at > org.apache.hadoop.yarn.server.timeline.RollingLevelDBTimelineStore.getEntities(RollingLevelDBTimelineStore.java:614) > at > org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.getEntities(EntityGroupFSTimelineStore.java:1045) > at > org.apache.hadoop.yarn.server.timeline.TimelineDataManager.doGetEntities(TimelineDataManager.java:168) > at > org.apache.hadoop.yarn.server.timeline.TimelineDataManager.getEntities(TimelineDataManager.java:138) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.getContainers(ApplicationHistoryManagerOnTimelineStore.java:222) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryClientService.getContainers(ApplicationHistoryClientService.java:213) > at > 
org.apache.hadoop.yarn.api.impl.pb.service.ApplicationHistoryProtocolPBServiceImpl.getContainers(ApplicationHistoryProtocolPBServiceImpl.java:172) > at > org.apache.hadoop.yarn.proto.ApplicationHistoryProtocol$ApplicationHistoryProtocolService$2.callBlockingMethod(ApplicationHistoryProtocol.java:201) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347) > {code} > This affects Rest Api to get entities. > curl http://pjosephdocker:8188/ws/v1/timeline/TEZ_APPLICATION -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
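The NPE in the stack trace above is the classic pattern of dereferencing a lookup result that can legitimately be null. A hedged sketch of the defensive shape such a fix typically takes — the names here are illustrative stand-ins, not the actual RollingLevelDBTimelineStore code:

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class NullGuardSketch {
    static List<String> entitiesFor(Map<String, List<String>> store, String entityType) {
        List<String> found = store.get(entityType);
        // Guard before dereferencing: return an empty result instead of
        // letting the caller hit a NullPointerException.
        return (found == null) ? Collections.emptyList() : found;
    }

    public static void main(String[] args) {
        Map<String, List<String>> store =
            Map.of("TEZ_APPLICATION", List.of("entity1", "entity2"));
        System.out.println(entitiesFor(store, "TEZ_APPLICATION").size());  // 2
        System.out.println(entitiesFor(store, "missing").size());          // 0, no NPE
    }
}
```

Returning an empty collection keeps REST calls like the `curl .../ws/v1/timeline/TEZ_APPLICATION` example responding with an empty list rather than a server-side error.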
[jira] [Created] (YARN-9752) Add support for allocation id in SLS.
Abhishek Modi created YARN-9752: --- Summary: Add support for allocation id in SLS. Key: YARN-9752 URL: https://issues.apache.org/jira/browse/YARN-9752 Project: Hadoop YARN Issue Type: Sub-task Reporter: Abhishek Modi Assignee: Abhishek Modi -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-9754) Add support for arbitrary DAG AM Simulator.
Abhishek Modi created YARN-9754: --- Summary: Add support for arbitrary DAG AM Simulator. Key: YARN-9754 URL: https://issues.apache.org/jira/browse/YARN-9754 Project: Hadoop YARN Issue Type: Sub-task Reporter: Abhishek Modi Assignee: Abhishek Modi Currently, all map containers are requested as soon as the Application Master comes up, and then all reducer containers are requested. This doesn't give the flexibility to simulate the behavior of a DAG, where varying numbers of containers would be requested at different times. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9754) Add support for arbitrary DAG AM Simulator.
[ https://issues.apache.org/jira/browse/YARN-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi updated YARN-9754: Attachment: YARN-9754.001.patch > Add support for arbitrary DAG AM Simulator. > --- > > Key: YARN-9754 > URL: https://issues.apache.org/jira/browse/YARN-9754 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9754.001.patch > > > Currently, all map containers are requests as soon as Application master > comes up and then all reducer containers are requested. This doesn't get > flexibility to simulate behavior of DAG where various number of containers > would be requested at different time. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9752) Add support for allocation id in SLS.
[ https://issues.apache.org/jira/browse/YARN-9752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi updated YARN-9752: Attachment: YARN-9752.002.patch > Add support for allocation id in SLS. > - > > Key: YARN-9752 > URL: https://issues.apache.org/jira/browse/YARN-9752 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9752.001.patch, YARN-9752.002.patch > > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9752) Add support for allocation id in SLS.
[ https://issues.apache.org/jira/browse/YARN-9752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16910008#comment-16910008 ] Abhishek Modi commented on YARN-9752: - [~elgoiri] could you please review it. Thanks. > Add support for allocation id in SLS. > - > > Key: YARN-9752 > URL: https://issues.apache.org/jira/browse/YARN-9752 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9752.001.patch, YARN-9752.002.patch > > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9754) Add support for arbitrary DAG AM Simulator.
[ https://issues.apache.org/jira/browse/YARN-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi updated YARN-9754: Attachment: YARN-9754.002.patch > Add support for arbitrary DAG AM Simulator. > --- > > Key: YARN-9754 > URL: https://issues.apache.org/jira/browse/YARN-9754 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9754.001.patch, YARN-9754.002.patch > > > Currently, all map containers are requests as soon as Application master > comes up and then all reducer containers are requested. This doesn't get > flexibility to simulate behavior of DAG where various number of containers > would be requested at different time. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9752) Add support for allocation id in SLS.
[ https://issues.apache.org/jira/browse/YARN-9752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi updated YARN-9752: Attachment: YARN-9752.003.patch > Add support for allocation id in SLS. > - > > Key: YARN-9752 > URL: https://issues.apache.org/jira/browse/YARN-9752 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9752.001.patch, YARN-9752.002.patch, > YARN-9752.003.patch > > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9752) Add support for allocation id in SLS.
[ https://issues.apache.org/jira/browse/YARN-9752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16911017#comment-16911017 ] Abhishek Modi commented on YARN-9752: - Thanks [~elgoiri] for review. Attached v3 patch with the fixes. Also tested this with 1000 jobs and 4500 nodes sls.json and it's working fine end to end. > Add support for allocation id in SLS. > - > > Key: YARN-9752 > URL: https://issues.apache.org/jira/browse/YARN-9752 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9752.001.patch, YARN-9752.002.patch, > YARN-9752.003.patch > > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-9765) SLS runner crashes when run with metrics turned off.
Abhishek Modi created YARN-9765: --- Summary: SLS runner crashes when run with metrics turned off. Key: YARN-9765 URL: https://issues.apache.org/jira/browse/YARN-9765 Project: Hadoop YARN Issue Type: Sub-task Reporter: Abhishek Modi Assignee: Abhishek Modi When SLS metrics are turned off, creation of the AM fails with an NPE. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9765) SLS runner crashes when run with metrics turned off.
[ https://issues.apache.org/jira/browse/YARN-9765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16912112#comment-16912112 ] Abhishek Modi commented on YARN-9765: - Thanks [~bibinchundatt] for review and committing it. > SLS runner crashes when run with metrics turned off. > > > Key: YARN-9765 > URL: https://issues.apache.org/jira/browse/YARN-9765 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Fix For: 3.3.0, 3.2.1, 3.1.3 > > Attachments: YARN-9765.001.patch > > > When sls metrics is turned off, creation of AM fails with NPE. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9752) Add support for allocation id in SLS.
[ https://issues.apache.org/jira/browse/YARN-9752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16912392#comment-16912392 ] Abhishek Modi commented on YARN-9752: - Thanks [~elgoiri] for review. Committed to trunk. > Add support for allocation id in SLS. > - > > Key: YARN-9752 > URL: https://issues.apache.org/jira/browse/YARN-9752 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9752.001.patch, YARN-9752.002.patch, > YARN-9752.003.patch > > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-9782) Avoid DNS resolution while running SLS.
Abhishek Modi created YARN-9782: --- Summary: Avoid DNS resolution while running SLS. Key: YARN-9782 URL: https://issues.apache.org/jira/browse/YARN-9782 Project: Hadoop YARN Issue Type: Sub-task Reporter: Abhishek Modi Assignee: Abhishek Modi In SLS, we add nodes with random names and racks. DNS resolution of these nodes takes around 2 seconds per node, because the lookup only fails after a timeout. This makes SLS results unreliable and adds latency spikes. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
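Since the SLS node names above are synthetic, one way to avoid the per-node DNS timeout is to resolve rack placement from an in-memory table instead of the network. A hedged sketch of that idea — the class and method names are illustrative, not the actual SLS fix:

```java
import java.util.HashMap;
import java.util.Map;

public class StaticRackResolver {
    private final Map<String, String> nodeToRack = new HashMap<>();
    private final String defaultRack;

    StaticRackResolver(String defaultRack) { this.defaultRack = defaultRack; }

    // Nodes are registered up front when the simulated cluster is built.
    void register(String node, String rack) { nodeToRack.put(node, rack); }

    // Pure table lookup: constant time, no DNS query, no timeout spikes.
    String resolve(String node) {
        return nodeToRack.getOrDefault(node, defaultRack);
    }

    public static void main(String[] args) {
        StaticRackResolver resolver = new StaticRackResolver("/default-rack");
        resolver.register("node-000042", "/rack-3");
        System.out.println(resolver.resolve("node-000042"));
        System.out.println(resolver.resolve("unknown-node"));  // falls back, no lookup
    }
}
```

The fallback value also explains the symptom in YARN-9694 above: when every lookup misses the table, every node reports the default rack.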
[jira] [Updated] (YARN-9754) Add support for arbitrary DAG AM Simulator.
[ https://issues.apache.org/jira/browse/YARN-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi updated YARN-9754: Attachment: YARN-9754.003.patch > Add support for arbitrary DAG AM Simulator. > --- > > Key: YARN-9754 > URL: https://issues.apache.org/jira/browse/YARN-9754 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9754.001.patch, YARN-9754.002.patch, > YARN-9754.003.patch > > > Currently, all map containers are requests as soon as Application master > comes up and then all reducer containers are requested. This doesn't get > flexibility to simulate behavior of DAG where various number of containers > would be requested at different time. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9754) Add support for arbitrary DAG AM Simulator.
[ https://issues.apache.org/jira/browse/YARN-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi updated YARN-9754: Attachment: YARN-9754.004.patch > Add support for arbitrary DAG AM Simulator. > --- > > Key: YARN-9754 > URL: https://issues.apache.org/jira/browse/YARN-9754 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9754.001.patch, YARN-9754.002.patch, > YARN-9754.003.patch, YARN-9754.004.patch > > > Currently, all map containers are requests as soon as Application master > comes up and then all reducer containers are requested. This doesn't get > flexibility to simulate behavior of DAG where various number of containers > would be requested at different time. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9754) Add support for arbitrary DAG AM Simulator.
[ https://issues.apache.org/jira/browse/YARN-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16916721#comment-16916721 ] Abhishek Modi commented on YARN-9754: - Thanks [~elgoiri] for review. I have addressed all the review comments in the latest patch. Since changing the AMSimulator will also impact MRAMSimulator and StreamAMSimulator, I will create a separate Jira for that. > Add support for arbitrary DAG AM Simulator. > --- > > Key: YARN-9754 > URL: https://issues.apache.org/jira/browse/YARN-9754 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9754.001.patch, YARN-9754.002.patch, > YARN-9754.003.patch, YARN-9754.004.patch > > > Currently, all map containers are requests as soon as Application master > comes up and then all reducer containers are requested. This doesn't get > flexibility to simulate behavior of DAG where various number of containers > would be requested at different time. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9754) Add support for arbitrary DAG AM Simulator.
[ https://issues.apache.org/jira/browse/YARN-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi updated YARN-9754: Attachment: YARN-9754.005.patch > Add support for arbitrary DAG AM Simulator. > --- > > Key: YARN-9754 > URL: https://issues.apache.org/jira/browse/YARN-9754 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9754.001.patch, YARN-9754.002.patch, > YARN-9754.003.patch, YARN-9754.004.patch, YARN-9754.005.patch > > > Currently, all map containers are requested as soon as the Application Master > comes up, and then all reducer containers are requested. This doesn't give > the flexibility to simulate the behavior of a DAG, where varying numbers of > containers would be requested at different times. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9754) Add support for arbitrary DAG AM Simulator.
[ https://issues.apache.org/jira/browse/YARN-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi updated YARN-9754: Attachment: YARN-9754.006.patch > Add support for arbitrary DAG AM Simulator. > --- > > Key: YARN-9754 > URL: https://issues.apache.org/jira/browse/YARN-9754 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9754.001.patch, YARN-9754.002.patch, > YARN-9754.003.patch, YARN-9754.004.patch, YARN-9754.005.patch, > YARN-9754.006.patch > > > Currently, all map containers are requested as soon as the Application Master > comes up, and then all reducer containers are requested. This doesn't give > the flexibility to simulate the behavior of a DAG, where varying numbers of > containers would be requested at different times. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9754) Add support for arbitrary DAG AM Simulator.
[ https://issues.apache.org/jira/browse/YARN-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16918317#comment-16918317 ] Abhishek Modi commented on YARN-9754: - Thanks [~elgoiri] for the review. Committed it to trunk. > Add support for arbitrary DAG AM Simulator. > --- > > Key: YARN-9754 > URL: https://issues.apache.org/jira/browse/YARN-9754 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9754.001.patch, YARN-9754.002.patch, > YARN-9754.003.patch, YARN-9754.004.patch, YARN-9754.005.patch, > YARN-9754.006.patch > > > Currently, all map containers are requested as soon as the Application Master > comes up, and then all reducer containers are requested. This doesn't give > the flexibility to simulate the behavior of a DAG, where varying numbers of > containers would be requested at different times. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5281) Explore supporting a simpler back-end implementation for ATS v2
[ https://issues.apache.org/jira/browse/YARN-5281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16919222#comment-16919222 ] Abhishek Modi commented on YARN-5281: - We are already supporting HBase and CosmosDB as backends. We also support the local/HDFS filesystem with limited capabilities. Based on the discussion, I don't think it would be possible to support a simpler backend with all functionalities without re-implementing some of the features provided by HBase/CosmosDB. For a single-node setup, ATSv2 can still be used with limited functionality using the local filesystem as the backend. If no one is actively working on this, I will close this as part of the Jira cleanup for ATSv2. cc [~vrushalic]/[~rohithsharma] > Explore supporting a simpler back-end implementation for ATS v2 > --- > > Key: YARN-5281 > URL: https://issues.apache.org/jira/browse/YARN-5281 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Joep Rottinghuis >Priority: Major > Labels: YARN-5355 > > During the merge discussion [~kasha] raised the question whether we would > support a simpler backend for users to try out, in addition to the HBase > implementation. > The understanding is that this would not be meant to scale, but it could > simplify initial adoption and early usage. > I'm filing this jira to gather the merits and challenges of such an approach in > one place. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9794) RM crashes due to runtime errors in TimelineServiceV2Publisher
[ https://issues.apache.org/jira/browse/YARN-9794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16919237#comment-16919237 ] Abhishek Modi commented on YARN-9794: - Thanks [~tarunparimi] for filing it and working on it. Thanks [~Prabhu Joseph] for review. [~tarunparimi] some more comments in addition to Prabhu's comment: # we should handle IOException and Exception separately. > RM crashes due to runtime errors in TimelineServiceV2Publisher > -- > > Key: YARN-9794 > URL: https://issues.apache.org/jira/browse/YARN-9794 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Tarun Parimi >Assignee: Tarun Parimi >Priority: Major > Labels: atsv2 > Attachments: YARN-9794.001.patch > > > Saw that RM crashes while startup due to errors while putting entity in > TimelineServiceV2Publisher. > {code:java} > 2019-08-28 09:35:45,273 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: > Error in dispatcher thread > java.lang.RuntimeException: java.lang.IllegalArgumentException: > org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException: > CodedInputStream encountered an embedded string or message which claimed to > have negative size > . 
> at > org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithoutRetries(RpcRetryingCallerImpl.java:200) > at > org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:269) > at > org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:437) > at > org.apache.hadoop.hbase.client.ClientScanner.nextWithSyncCache(ClientScanner.java:312) > at > org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:597) > at > org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegionInMeta(ConnectionImplementation.java:834) > at > org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:732) > at > org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:281) > at > org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:236) > at > org.apache.hadoop.hbase.client.BufferedMutatorImpl.backgroundFlushCommits(BufferedMutatorImpl.java:321) > at > org.apache.hadoop.hbase.client.BufferedMutatorImpl.flush(BufferedMutatorImpl.java:285) > at > org.apache.hadoop.yarn.server.timelineservice.storage.common.TypedBufferedMutator.flush(TypedBufferedMutator.java:66) > at > org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineWriterImpl.flush(HBaseTimelineWriterImpl.java:566) > at > org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollector.flushBufferedTimelineEntities(TimelineCollector.java:173) > at > org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollector.putEntities(TimelineCollector.java:150) > at > org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher.putEntity(TimelineServiceV2Publisher.java:459) > at > org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher.access$100(TimelineServiceV2Publisher.java:73) > at > org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher$TimelineV2EventHandler.handle(TimelineServiceV2Publisher.java:494) > at > 
org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher$TimelineV2EventHandler.handle(TimelineServiceV2Publisher.java:483) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.lang.IllegalArgumentException: > org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException: > CodedInputStream encountered an embedded string or message which claimed to > have negative size. > at > org.apache.hbase.thirdparty.com.google.protobuf.CodedInputStream.newInstance(CodedInputStream.java:117) > {code} -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
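The review comment above asks that IOException and Exception be handled separately so a storage failure in the timeline writer cannot take down the RM's dispatcher. A minimal sketch of that separation, assuming a hypothetical `Writer` interface and `safePut` helper (these names are illustrative, not the actual `TimelineServiceV2Publisher` API):

```java
import java.io.IOException;

public class PutEntitySketch {
    // Illustrative stand-in for a timeline writer; not the real Hadoop API.
    interface Writer {
        void putEntity() throws IOException;
    }

    // Catch IOException (transient storage/flush failure) separately from any
    // other Exception (e.g. runtime errors bubbling up from the HBase client),
    // so each path is logged and handled on its own instead of crashing the
    // dispatcher thread.
    static String safePut(Writer w) {
        try {
            w.putEntity();
            return "ok";
        } catch (IOException e) {
            // transient I/O failure: log and keep the dispatcher alive
            return "io-error: " + e.getMessage();
        } catch (Exception e) {
            // unexpected runtime error: logged on a separate path
            return "unexpected-error: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(safePut(() -> { throw new IOException("flush failed"); }));
        System.out.println(safePut(() -> { throw new IllegalArgumentException("negative size"); }));
    }
}
```

The ordering matters: the more specific `IOException` clause must come before the generic `Exception` clause, or the code will not compile.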
[jira] [Commented] (YARN-9540) TestRMAppTransitions fails intermittently
[ https://issues.apache.org/jira/browse/YARN-9540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16919465#comment-16919465 ] Abhishek Modi commented on YARN-9540: - Thanks [~Tao Yang] for the patch and [~adam.antal] for review. LGTM, will commit shortly. > TestRMAppTransitions fails intermittently > - > > Key: YARN-9540 > URL: https://issues.apache.org/jira/browse/YARN-9540 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, test >Affects Versions: 3.2.0 >Reporter: Prabhu Joseph >Assignee: Tao Yang >Priority: Minor > Attachments: YARN-9540.001.patch > > > Failed > org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppFinishedFinished[0] > {code} > Error Message > expected:<1> but was:<0> > Stacktrace > java.lang.AssertionError: expected:<1> but was:<0> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:834) > at org.junit.Assert.assertEquals(Assert.java:645) > at org.junit.Assert.assertEquals(Assert.java:631) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.verifyAppCompletedEvent(TestRMAppTransitions.java:1307) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.verifyAppAfterFinishEvent(TestRMAppTransitions.java:1302) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testCreateAppFinished(TestRMAppTransitions.java:648) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions.testAppFinishedFinished(TestRMAppTransitions.java:1083) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at org.junit.runners.Suite.runChild(Suite.java:128) > at org.junit.runners.Suite.runChild(Suite.java:27) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) > at > 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) > at > org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) > {code} -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9798) ApplicationMasterServiceTestBase#testRepeatedFinishApplicationMaster fails intermittently
[ https://issues.apache.org/jira/browse/YARN-9798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16919472#comment-16919472 ] Abhishek Modi commented on YARN-9798: - Thanks [~Tao Yang] for the patch. Will wait for the Jenkins results before committing it. [~Tao Yang] could you please share the frequency of failures before and after the fix. Thanks. > ApplicationMasterServiceTestBase#testRepeatedFinishApplicationMaster fails > intermittently > - > > Key: YARN-9798 > URL: https://issues.apache.org/jira/browse/YARN-9798 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Minor > Attachments: YARN-9798.001.patch > > > Found an intermittent failure of > ApplicationMasterServiceTestBase#testRepeatedFinishApplicationMaster in the > YARN-9714 jenkins report. The cause is that the assertion checks that the > dispatcher has handled the UNREGISTERED event but does not wait until all events > in the dispatcher are handled; we need to add {{rm.drainEvents()}} before that > assertion to fix this issue. > Failure info: > {noformat} > [ERROR] > testRepeatedFinishApplicationMaster(org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterServiceCapacity) > Time elapsed: 0.559 s <<< FAILURE! 
> java.lang.AssertionError: Expecting only one event expected:<1> but was:<0> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:834) > at org.junit.Assert.assertEquals(Assert.java:645) > at > org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterServiceTestBase.testRepeatedFinishApplicationMaster(ApplicationMasterServiceTestBase.java:385) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.lang.Thread.run(Thread.java:748) > {noformat} > Standard output: > {noformat} > 2019-08-29 06:59:54,458 ERROR [AsyncDispatcher event handler] > resourcemanager.ResourceManager (ResourceManager.java:handle(1088)) - Error > in handling event type REGISTERED for applicationAttempt > appattempt_1567061994047_0001_01 > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: > java.lang.InterruptedException > at > org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:276) > at > org.apache.hadoop.yarn.event.DrainDispatcher$2.handle(DrainDispatcher.java:91) > at > 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMRegisteredTransition.transition(RMAppAttemptImpl.java:1679) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMRegisteredTransition.transition(RMAppAttemptImpl.java:1658) > at > org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:914) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:121) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:1086) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:1067) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:200) >
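The fix discussed above is the classic drain-before-assert pattern for async dispatchers: the test must not inspect handler state until every queued event has been processed. A minimal sketch of the pattern, assuming a toy `DrainSketch` class (not the real `MockRM`/`AsyncDispatcher` API):

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class DrainSketch {
    // Minimal stand-in for an async dispatcher: events are queued for later
    // handling, so a counter read immediately after dispatching can be stale.
    static final Queue<String> pending = new ConcurrentLinkedQueue<>();
    static final AtomicInteger handled = new AtomicInteger();

    static void dispatchAsync(String event) {
        pending.add(event);
    }

    // Analogue of rm.drainEvents(): process everything still queued before
    // any assertion inspects the handler's state.
    static void drainEvents() {
        while (pending.poll() != null) {
            handled.incrementAndGet();
        }
    }

    public static void main(String[] args) {
        dispatchAsync("UNREGISTERED");
        // Asserting on handled.get() here, before drainEvents(), is exactly
        // the intermittent failure: the event may still be queued.
        drainEvents();
        System.out.println(handled.get()); // 1
    }
}
```

In the real test the race is between the dispatcher thread and the test thread, which is why the failure is intermittent rather than deterministic.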
[jira] [Commented] (YARN-9798) ApplicationMasterServiceTestBase#testRepeatedFinishApplicationMaster fails intermittently
[ https://issues.apache.org/jira/browse/YARN-9798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920016#comment-16920016 ] Abhishek Modi commented on YARN-9798: - Thanks [~Tao Yang] for providing the details. Committed it to trunk. > ApplicationMasterServiceTestBase#testRepeatedFinishApplicationMaster fails > intermittently > - > > Key: YARN-9798 > URL: https://issues.apache.org/jira/browse/YARN-9798 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Minor > Attachments: YARN-9798.001.patch > > > Found an intermittent failure of > ApplicationMasterServiceTestBase#testRepeatedFinishApplicationMaster in the > YARN-9714 jenkins report. The cause is that the assertion checks that the > dispatcher has handled the UNREGISTERED event but does not wait until all events > in the dispatcher are handled; we need to add {{rm.drainEvents()}} before that > assertion to fix this issue. > Failure info: > {noformat} > [ERROR] > testRepeatedFinishApplicationMaster(org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterServiceCapacity) > Time elapsed: 0.559 s <<< FAILURE! 
> java.lang.AssertionError: Expecting only one event expected:<1> but was:<0> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:834) > at org.junit.Assert.assertEquals(Assert.java:645) > at > org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterServiceTestBase.testRepeatedFinishApplicationMaster(ApplicationMasterServiceTestBase.java:385) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.lang.Thread.run(Thread.java:748) > {noformat} > Standard output: > {noformat} > 2019-08-29 06:59:54,458 ERROR [AsyncDispatcher event handler] > resourcemanager.ResourceManager (ResourceManager.java:handle(1088)) - Error > in handling event type REGISTERED for applicationAttempt > appattempt_1567061994047_0001_01 > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: > java.lang.InterruptedException > at > org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:276) > at > org.apache.hadoop.yarn.event.DrainDispatcher$2.handle(DrainDispatcher.java:91) > at > 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMRegisteredTransition.transition(RMAppAttemptImpl.java:1679) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMRegisteredTransition.transition(RMAppAttemptImpl.java:1658) > at > org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:914) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:121) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:1086) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:1067) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:200) > at > org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterServiceTestBase$CountingDispatcher.dispatch(Applic
[jira] [Commented] (YARN-9800) TestRMDelegationTokens can fail in testRemoveExpiredMasterKeyInRMStateStore
[ https://issues.apache.org/jira/browse/YARN-9800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920019#comment-16920019 ] Abhishek Modi commented on YARN-9800: - Thanks [~adam.antal] for the patch and [~kmarton] for review. Fix looks good to me. Will commit shortly. > TestRMDelegationTokens can fail in testRemoveExpiredMasterKeyInRMStateStore > --- > > Key: YARN-9800 > URL: https://issues.apache.org/jira/browse/YARN-9800 > Project: Hadoop YARN > Issue Type: Bug > Components: test, yarn >Affects Versions: 3.2.0 >Reporter: Adam Antal >Assignee: Adam Antal >Priority: Major > Attachments: YARN-9800.001.patch > > > The test fails intermittently with the following stack trace: > {noformat} > java.lang.AssertionError: > expected:<[org.apache.hadoop.security.token.delegation.DelegationKey@c09cd1a8, > org.apache.hadoop.security.token.delegation.DelegationKey@ca089752]> but > was:<[org.apache.hadoop.security.token.delegation.DelegationKey@c09cd1a8, > org.apache.hadoop.security.token.delegation.DelegationKey@b206142c, > org.apache.hadoop.security.token.delegation.DelegationKey@ca089752]> > at > org.apache.hadoop.yarn.server.resourcemanager.security.TestRMDelegationTokens.testRemoveExpiredMasterKeyInRMStateStore(TestRMDelegationTokens.java:161) > {noformat} -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9790) Failed to set default-application-lifetime if maximum-application-lifetime is less than or equal to zero
[ https://issues.apache.org/jira/browse/YARN-9790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920059#comment-16920059 ] Abhishek Modi commented on YARN-9790: - Thanks [~kyungwan nam] for working on it. Thanks [~Prabhu Joseph] for the review. The 004 patch LGTM. Will commit shortly. > Failed to set default-application-lifetime if maximum-application-lifetime is > less than or equal to zero > > > Key: YARN-9790 > URL: https://issues.apache.org/jira/browse/YARN-9790 > Project: Hadoop YARN > Issue Type: Bug >Reporter: kyungwan nam >Assignee: kyungwan nam >Priority: Major > Attachments: YARN-9790.001.patch, YARN-9790.002.patch, > YARN-9790.003.patch, YARN-9790.004.patch > > > capacity-scheduler > {code} > ... > yarn.scheduler.capacity.root.dev.maximum-application-lifetime=-1 > yarn.scheduler.capacity.root.dev.default-application-lifetime=604800 > {code} > refreshQueues failed as follows > {code} > 2019-08-28 15:21:57,423 WARN resourcemanager.AdminService > (AdminService.java:logAndWrapException(910)) - Exception refresh queues. 
> java.io.IOException: Failed to re-init queues : Default lifetime604800 can't > exceed maximum lifetime -1 > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:477) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshQueues(AdminService.java:423) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshQueues(AdminService.java:394) > at > org.apache.hadoop.yarn.server.api.impl.pb.service.ResourceManagerAdministrationProtocolPBServiceImpl.refreshQueues(ResourceManagerAdministrationProtocolPBServiceImpl.java:114) > at > org.apache.hadoop.yarn.proto.ResourceManagerAdministrationProtocol$ResourceManagerAdministrationProtocolService$2.callBlockingMethod(ResourceManagerAdministrationProtocol.java:271) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678) > Caused by: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Default > lifetime604800 can't exceed maximum lifetime -1 > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.setupQueueConfigs(LeafQueue.java:268) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.(LeafQueue.java:162) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.(LeafQueue.java:141) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.parseQueue(CapacitySchedulerQueueManager.java:259) > at > 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.parseQueue(CapacitySchedulerQueueManager.java:283) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.reinitializeQueues(CapacitySchedulerQueueManager.java:171) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitializeQueues(CapacityScheduler.java:726) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:472) > ... 12 more > {code} -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
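The bug above boils down to a validation rule: a non-positive `maximum-application-lifetime` means "unlimited", so a positive default lifetime must not be rejected against it. A hedged sketch of that check (the class and method names are illustrative, not the actual `LeafQueue` code):

```java
public class LifetimeCheckSketch {
    // Sketch of the validation the patch addresses: maxLifetime <= 0 is
    // treated as unlimited, so any positive default lifetime is acceptable;
    // only a positive, finite maximum constrains the default.
    static boolean isValidDefaultLifetime(long maxLifetime, long defaultLifetime) {
        if (maxLifetime <= 0) {
            return true; // unlimited maximum: default can be anything
        }
        return defaultLifetime <= maxLifetime;
    }

    public static void main(String[] args) {
        // The failing configuration from the description: max = -1, default = 604800.
        System.out.println(isValidDefaultLifetime(-1, 604800));   // true: -1 means unlimited
        System.out.println(isValidDefaultLifetime(3600, 604800)); // false: default exceeds max
    }
}
```

Without the `maxLifetime <= 0` short-circuit, the comparison `604800 > -1` is what produces the misleading "Default lifetime 604800 can't exceed maximum lifetime -1" rejection.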
[jira] [Commented] (YARN-8678) Queue Management API - rephrase error messages
[ https://issues.apache.org/jira/browse/YARN-8678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920276#comment-16920276 ] Abhishek Modi commented on YARN-8678: - Thanks [~Prabhu Joseph] for working on it. LGTM. Will commit it. > Queue Management API - rephrase error messages > -- > > Key: YARN-8678 > URL: https://issues.apache.org/jira/browse/YARN-8678 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Akhil PB >Assignee: Prabhu Joseph >Priority: Major > Attachments: YARN-8678-002.patch, YARN-8768-001.patch > > > 1. When stopping a running parent queue, the error message thrown by the API was > not meaningful. > For example: when trying to stop the root queue, the error message thrown was > {{Failed to re-init queues : The parent queue:root state is STOPPED, child > queue:default state cannot be RUNNING.}} > It is evident that the root queue update failed, but the message says > {{queue:root state is STOPPED}}. > 2. When trying to delete a running leaf queue, the error message thrown by the API > was not meaningful. > For example: the error message was {{Failed to re-init queues : root.default.prod > is deleted from the new capacity scheduler configuration, but the queue is > not yet in stopped state. Current State : RUNNING}}. > Clearly the deletion of queue root.default.prod failed with an error, but the > message says {{queues : root.default.prod is deleted from the new capacity > scheduler configuration}}. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9791) Queue Mutation API does not allow to remove a config
[ https://issues.apache.org/jira/browse/YARN-9791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920320#comment-16920320 ] Abhishek Modi commented on YARN-9791: - Thanks [~Prabhu Joseph] for the patch. Some minor comments: # Can we replace kv.getValue with keyValue everywhere. # Can we update the test to check that other configs are unchanged after applying the mutation. > Queue Mutation API does not allow to remove a config > > > Key: YARN-9791 > URL: https://issues.apache.org/jira/browse/YARN-9791 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 3.3.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Attachments: YARN-9791-001.patch > > > Queue Mutation API does not allow removing a config. When removing a node > label from a queue along with its capacity config > {code} > <sched-conf> > <update-queue> > <queue-name>root.batch</queue-name> > <params> > <entry> > <key>accessible-node-labels</key> > <value></value> > </entry> > <entry> > <key>accessible-node-labels.x.capacity</key> > <value></value> > </entry> > </params> > </update-queue> > </sched-conf> > {code} > it fails with the error below. 
> {code} > Caused by: java.lang.NumberFormatException: empty String > at > sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1842) > at sun.misc.FloatingDecimal.parseFloat(FloatingDecimal.java:122) > at java.lang.Float.parseFloat(Float.java:451) > at > org.apache.hadoop.conf.Configuration.getFloat(Configuration.java:1632) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration.internalGetLabeledQueueCapacity(CapacitySchedulerConfiguration.java:682) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration.getLabeledQueueCapacity(CapacitySchedulerConfiguration.java:697) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils.loadCapacitiesByLabelsFromConf(CSQueueUtils.java:136) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils.loadUpdateAndCheckCapacities(CSQueueUtils.java:111) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.setupConfigurableCapacities(AbstractCSQueue.java:185) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.setupQueueConfigs(AbstractCSQueue.java:362) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.setupQueueConfigs(LeafQueue.java:172) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.(LeafQueue.java:157) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.(LeafQueue.java:139) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.parseQueue(CapacitySchedulerQueueManager.java:259) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.parseQueue(CapacitySchedulerQueueManager.java:283) > at > 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.reinitializeQueues(CapacitySchedulerQueueManager.java:171) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitializeQueues(CapacityScheduler.java:785) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:497) > ... 72 more > {code} > We can fix this by providing a separate XmlElement to remove config. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
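The failure mode and the proposed fix can be illustrated with a sketch of a sched-conf mutation payload. The `update-queue`/`params`/`entry` shape follows the Capacity Scheduler mutation API documentation; the `remove-config` element is hypothetical, standing in for the "separate XmlElement" the description proposes, and is not necessarily what the committed patch uses:

```xml
<sched-conf>
  <update-queue>
    <queue-name>root.batch</queue-name>
    <params>
      <!-- Today, removal is attempted by posting an empty <value>, which
           later fails with "NumberFormatException: empty String" when the
           capacity value is parsed as a float. -->
      <entry>
        <key>accessible-node-labels.x.capacity</key>
        <value></value>
      </entry>
      <!-- Hypothetical element: delete the key outright instead of
           setting it to an empty string. -->
      <remove-config>accessible-node-labels</remove-config>
    </params>
  </update-queue>
</sched-conf>
```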
[jira] [Commented] (YARN-9793) Remove duplicate sentence from TimelineServiceV2.md
[ https://issues.apache.org/jira/browse/YARN-9793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920323#comment-16920323 ] Abhishek Modi commented on YARN-9793: - Thanks [~kmarton] for the patch and [~adam.antal] for review. LGTM. Will commit shortly. > Remove duplicate sentence from TimelineServiceV2.md > --- > > Key: YARN-9793 > URL: https://issues.apache.org/jira/browse/YARN-9793 > Project: Hadoop YARN > Issue Type: Bug > Components: docs >Reporter: Julia Kinga Marton >Assignee: Julia Kinga Marton >Priority: Major > Attachments: YARN-9793.001.patch > > > In the documentation of the ATSv2, TimelineEntity objects description part > there is a duplication: > * configs: A map from a string (config name) to a string (config value) > representing all configs associated with the entity. Users can post the whole > config or a part of it in the configs field. *Supported for application and > generic entities. Supported for application and generic entities.* > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9793) Remove duplicate sentence from TimelineServiceV2.md
[ https://issues.apache.org/jira/browse/YARN-9793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi updated YARN-9793: Component/s: ATSv2 > Remove duplicate sentence from TimelineServiceV2.md > --- > > Key: YARN-9793 > URL: https://issues.apache.org/jira/browse/YARN-9793 > Project: Hadoop YARN > Issue Type: Bug > Components: ATSv2, docs >Reporter: Julia Kinga Marton >Assignee: Julia Kinga Marton >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-9793.001.patch > > > In the documentation of the ATSv2, TimelineEntity objects description part > there is a duplication: > * configs: A map from a string (config name) to a string (config value) > representing all configs associated with the entity. Users can post the whole > config or a part of it in the configs field. *Supported for application and > generic entities. Supported for application and generic entities.* > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9106) Add option to graceful decommission to not wait for applications
[ https://issues.apache.org/jira/browse/YARN-9106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920328#comment-16920328 ] Abhishek Modi commented on YARN-9106: - Gracefully decommissioning nodes without waiting for applications to finish also makes sense when shuffle data is offloaded to persistent storage, or when the shuffle service runs completely outside the nodes. While running YARN in the cloud, it is very common to offload shuffle data to persistent volumes and remove nodes. cc [~elgoiri] > Add option to graceful decommission to not wait for applications > > > Key: YARN-9106 > URL: https://issues.apache.org/jira/browse/YARN-9106 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Mikayla Konst >Assignee: Mikayla Konst >Priority: Major > Attachments: YARN-9106.patch > > > Add property > yarn.resourcemanager.decommissioning-nodes-watcher.wait-for-applications. > If true (the default), the resource manager waits for all containers, as well > as all applications associated with those containers, to finish before > gracefully decommissioning a node. > If false, the resource manager only waits for containers, but not > applications, to finish. For map-only jobs or other jobs in which mappers do > not need to serve shuffle data, this allows nodes to be decommissioned as > soon as their containers are finished as opposed to when the job is done. > Add property > yarn.resourcemanager.decommissioning-nodes-watcher.wait-for-app-masters. > If false, during graceful decommission, when the resource manager waits for > all containers on a node to finish, it will not wait for app master > containers to finish. Defaults to true. This property should only be set to > false if app master failure is recoverable. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
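As a sketch, the two watcher properties described above would sit in yarn-site.xml along these lines, with the defaults stated in the description:

```xml
<configuration>
  <!-- Default true: wait for containers AND the applications associated
       with them before gracefully decommissioning a node. Set to false to
       release nodes as soon as their containers finish (e.g. map-only jobs
       whose mappers serve no shuffle data). -->
  <property>
    <name>yarn.resourcemanager.decommissioning-nodes-watcher.wait-for-applications</name>
    <value>true</value>
  </property>
  <!-- Default true: also wait for app master containers. Only set to false
       if app master failure is recoverable. -->
  <property>
    <name>yarn.resourcemanager.decommissioning-nodes-watcher.wait-for-app-masters</name>
    <value>true</value>
  </property>
</configuration>
```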
[jira] [Commented] (YARN-7982) Do ACLs check while retrieving entity-types per application
[ https://issues.apache.org/jira/browse/YARN-7982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920337#comment-16920337 ] Abhishek Modi commented on YARN-7982: - Thanks [~Prabhu Joseph] for the clarification. Some minor comments: # In FileSystemTimelineReaderImpl, we should set userId only if it is null. # Would it be possible to write a unit test covering the case where the user id is not specified? > Do ACLs check while retrieving entity-types per application > --- > > Key: YARN-7982 > URL: https://issues.apache.org/jira/browse/YARN-7982 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Rohith Sharma K S >Assignee: Prabhu Joseph >Priority: Major > Attachments: YARN-7982-001.patch, YARN-7982-002.patch, > YARN-7982-003.patch > > > REST end point {{/apps/$appid/entity-types}} retrieves all the entity-types > for a given application. This needs to be guarded with an ACL check > {code} > [yarn@yarn-ats-3 ~]$ curl > "http://yarn-ats-3:8198/ws/v2/timeline/apps/application_1552297011473_0002?user.name=ambari-qa1"; > {"exception":"ForbiddenException","message":"java.lang.Exception: User > ambari-qa1 is not allowed to read TimelineService V2 > data.","javaClassName":"org.apache.hadoop.yarn.webapp.ForbiddenException"} > [yarn@yarn-ats-3 ~]$ curl > "http://yarn-ats-3:8198/ws/v2/timeline/apps/application_1552297011473_0002/entity-types?user.name=ambari-qa1"; > ["YARN_APPLICATION_ATTEMPT","YARN_CONTAINER"] > {code} -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-7982) Do ACLs check while retrieving entity-types per application
[ https://issues.apache.org/jira/browse/YARN-7982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920337#comment-16920337 ] Abhishek Modi edited comment on YARN-7982 at 9/1/19 8:07 AM: - Thanks [~Prabhu Joseph] for the clarification. Some minor comments: # In FileSystemTimelineReaderImpl, we should set userId only if it is passed as null. # Would it be possible to write a unit test covering the case where the user id is not specified? was (Author: abmodi): Thanks [~Prabhu Joseph] for the clarification. Some minor comments: # In FileSystemTimelineReaderImpl, we should set userId only if it is null. # Would it be possible to write a unit test covering the case where the user id is not specified? > Do ACLs check while retrieving entity-types per application > --- > > Key: YARN-7982 > URL: https://issues.apache.org/jira/browse/YARN-7982 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Rohith Sharma K S >Assignee: Prabhu Joseph >Priority: Major > Attachments: YARN-7982-001.patch, YARN-7982-002.patch, > YARN-7982-003.patch > > > REST end point {{/apps/$appid/entity-types}} retrieves all the entity-types > for a given application. This needs to be guarded with an ACL check > {code} > [yarn@yarn-ats-3 ~]$ curl > "http://yarn-ats-3:8198/ws/v2/timeline/apps/application_1552297011473_0002?user.name=ambari-qa1"; > {"exception":"ForbiddenException","message":"java.lang.Exception: User > ambari-qa1 is not allowed to read TimelineService V2 > data.","javaClassName":"org.apache.hadoop.yarn.webapp.ForbiddenException"} > [yarn@yarn-ats-3 ~]$ curl > "http://yarn-ats-3:8198/ws/v2/timeline/apps/application_1552297011473_0002/entity-types?user.name=ambari-qa1"; > ["YARN_APPLICATION_ATTEMPT","YARN_CONTAINER"] > {code} -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
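The review point about FileSystemTimelineReaderImpl boils down to a null-guarded default. A minimal sketch of that rule, with a hypothetical helper (`resolveUserId` and both parameter names are illustrative, not the patch's actual code):

```java
public class UserIdDefaultSketch {
    // Default the userId from the caller only when the request did not
    // specify one; an explicitly supplied userId is never overwritten.
    static String resolveUserId(String requestedUserId, String callerName) {
        if (requestedUserId == null) {
            return callerName;      // not specified: fall back to the caller
        }
        return requestedUserId;     // specified: keep as-is
    }
}
```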
[jira] [Commented] (YARN-9791) Queue Mutation API does not allow to remove a config
[ https://issues.apache.org/jira/browse/YARN-9791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920462#comment-16920462 ] Abhishek Modi commented on YARN-9791: - Thanks [~Prabhu Joseph]. Latest patch looks good to me. Committed to trunk. > Queue Mutation API does not allow to remove a config > > > Key: YARN-9791 > URL: https://issues.apache.org/jira/browse/YARN-9791 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 3.3.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Attachments: YARN-9791-001.patch, YARN-9791-002.patch > > > Queue Mutation API does not allow to remove a config. When removing a node > label from a queue and its capacity config > {code} > > > root.batch > > > accessible-node-labels > > > > accessible-node-labels.x.capacity > > > > > > {code} > It fails with the error below. > {code} > Caused by: java.lang.NumberFormatException: empty String > at > sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1842) > at sun.misc.FloatingDecimal.parseFloat(FloatingDecimal.java:122) > at java.lang.Float.parseFloat(Float.java:451) > at > org.apache.hadoop.conf.Configuration.getFloat(Configuration.java:1632) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration.internalGetLabeledQueueCapacity(CapacitySchedulerConfiguration.java:682) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration.getLabeledQueueCapacity(CapacitySchedulerConfiguration.java:697) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils.loadCapacitiesByLabelsFromConf(CSQueueUtils.java:136) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils.loadUpdateAndCheckCapacities(CSQueueUtils.java:111) > at > 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.setupConfigurableCapacities(AbstractCSQueue.java:185) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.setupQueueConfigs(AbstractCSQueue.java:362) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.setupQueueConfigs(LeafQueue.java:172) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.(LeafQueue.java:157) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.(LeafQueue.java:139) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.parseQueue(CapacitySchedulerQueueManager.java:259) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.parseQueue(CapacitySchedulerQueueManager.java:283) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.reinitializeQueues(CapacitySchedulerQueueManager.java:171) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitializeQueues(CapacityScheduler.java:785) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:497) > ... 72 more > {code} > We can fix this by providing a separate XmlElement to remove config. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7982) Do ACLs check while retrieving entity-types per application
[ https://issues.apache.org/jira/browse/YARN-7982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920610#comment-16920610 ] Abhishek Modi commented on YARN-7982: - Thanks [~Prabhu Joseph]. v4 patch looks good to me. Will commit it shortly. > Do ACLs check while retrieving entity-types per application > --- > > Key: YARN-7982 > URL: https://issues.apache.org/jira/browse/YARN-7982 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Rohith Sharma K S >Assignee: Prabhu Joseph >Priority: Major > Attachments: YARN-7982-001.patch, YARN-7982-002.patch, > YARN-7982-003.patch, YARN-7982-004.patch > > > REST end point {{/apps/$appid/entity-types}} retrieves all the entity-types > for a given application. This needs to be guarded with an ACL check > {code} > [yarn@yarn-ats-3 ~]$ curl > "http://yarn-ats-3:8198/ws/v2/timeline/apps/application_1552297011473_0002?user.name=ambari-qa1"; > {"exception":"ForbiddenException","message":"java.lang.Exception: User > ambari-qa1 is not allowed to read TimelineService V2 > data.","javaClassName":"org.apache.hadoop.yarn.webapp.ForbiddenException"} > [yarn@yarn-ats-3 ~]$ curl > "http://yarn-ats-3:8198/ws/v2/timeline/apps/application_1552297011473_0002/entity-types?user.name=ambari-qa1"; > ["YARN_APPLICATION_ATTEMPT","YARN_CONTAINER"] > {code} -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8174) Add containerId to ResourceLocalizationService fetch failure log statement
[ https://issues.apache.org/jira/browse/YARN-8174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi updated YARN-8174: Summary: Add containerId to ResourceLocalizationService fetch failure log statement (was: Add containerId to ResourceLocalizationService fetch failure) > Add containerId to ResourceLocalizationService fetch failure log statement > -- > > Key: YARN-8174 > URL: https://issues.apache.org/jira/browse/YARN-8174 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.3 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Minor > Attachments: YARN-8174.1.patch, YARN-8174.2.patch, YARN-8174.3.patch > > > When a localization for a resource failed due to change in timestamp, there > is no containerId logged to correlate. > {code} > 2018-04-18 07:31:46,033 WARN localizer.ResourceLocalizationService > (ResourceLocalizationService.java:processHeartbeat(1017)) - { > hdfs://tarunhdp-1.openstacklocal:8020/user/ambari-qa/.staging/job_1523550428406_0016/job.splitmetainfo, > 1524036694502, FILE, null } failed: Resource > hdfs://tarunhdp-1.openstacklocal:8020/user/ambari-qa/.staging/job_1523550428406_0016/job.splitmetainfo > changed on src filesystem (expected 1524036694502, was 1524036694502 > java.io.IOException: Resource > hdfs://tarunhdp-1.openstacklocal:8020/user/ambari-qa/.staging/job_1523550428406_0016/job.splitmetainfo > changed on src filesystem (expected 1524036694502, was 1524036694502 > at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:258) > at > org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63) > at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:362) > at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:360) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866) > at 
org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:360) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {code} -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8174) Add containerId to ResourceLocalizationService fetch failure log statement
[ https://issues.apache.org/jira/browse/YARN-8174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920619#comment-16920619 ] Abhishek Modi commented on YARN-8174: - v3 patch lgtm. Committed to trunk. > Add containerId to ResourceLocalizationService fetch failure log statement > -- > > Key: YARN-8174 > URL: https://issues.apache.org/jira/browse/YARN-8174 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.3 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Minor > Attachments: YARN-8174.1.patch, YARN-8174.2.patch, YARN-8174.3.patch > > > When a localization for a resource failed due to change in timestamp, there > is no containerId logged to correlate. > {code} > 2018-04-18 07:31:46,033 WARN localizer.ResourceLocalizationService > (ResourceLocalizationService.java:processHeartbeat(1017)) - { > hdfs://tarunhdp-1.openstacklocal:8020/user/ambari-qa/.staging/job_1523550428406_0016/job.splitmetainfo, > 1524036694502, FILE, null } failed: Resource > hdfs://tarunhdp-1.openstacklocal:8020/user/ambari-qa/.staging/job_1523550428406_0016/job.splitmetainfo > changed on src filesystem (expected 1524036694502, was 1524036694502 > java.io.IOException: Resource > hdfs://tarunhdp-1.openstacklocal:8020/user/ambari-qa/.staging/job_1523550428406_0016/job.splitmetainfo > changed on src filesystem (expected 1524036694502, was 1524036694502 > at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:258) > at > org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63) > at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:362) > at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:360) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:360) 
> at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {code} -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9400) Remove unnecessary if at EntityGroupFSTimelineStore#parseApplicationId
[ https://issues.apache.org/jira/browse/YARN-9400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920624#comment-16920624 ] Abhishek Modi commented on YARN-9400: - Thanks [~Prabhu Joseph]. lgtm. will commit to trunk. > Remove unnecessary if at EntityGroupFSTimelineStore#parseApplicationId > -- > > Key: YARN-9400 > URL: https://issues.apache.org/jira/browse/YARN-9400 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 3.2.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Minor > Attachments: YARN-9400-001.patch > > > If clause to validate whether appIdStr starts with "application" is not > required at EntityGroupFSTimelineStore#parseApplicationId > {code} > // converts the String to an ApplicationId or null if conversion failed > private static ApplicationId parseApplicationId(String appIdStr) { > ApplicationId appId = null; > if (appIdStr.startsWith(ApplicationId.appIdStrPrefix)) { > try { > appId = ApplicationId.fromString(appIdStr); > } catch (IllegalArgumentException e) { > appId = null; > } > } > return appId; > } > {code} -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
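The simplification suggested by the issue relies on ApplicationId.fromString already throwing IllegalArgumentException for malformed strings, which makes the startsWith guard redundant. A self-contained sketch, in which `fromString` is a stub standing in for the real org.apache.hadoop.yarn.api.records.ApplicationId.fromString and String stands in for the ApplicationId type:

```java
public class ParseAppIdSketch {
    // Stub for ApplicationId.fromString: rejects strings that do not start
    // with the "application_" prefix (the real method also validates the
    // cluster timestamp and sequence number parts).
    static String fromString(String s) {
        if (!s.startsWith("application_")) {
            throw new IllegalArgumentException("Invalid ApplicationId: " + s);
        }
        return s;
    }

    // The simplified parseApplicationId: no separate prefix check needed,
    // the try/catch alone converts any malformed input to null.
    static String parseApplicationId(String appIdStr) {
        try {
            return fromString(appIdStr);
        } catch (IllegalArgumentException e) {
            return null;
        }
    }
}
```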
[jira] [Commented] (YARN-9804) Update ATSv2 document for latest feature supports
[ https://issues.apache.org/jira/browse/YARN-9804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920634#comment-16920634 ] Abhishek Modi commented on YARN-9804: - Thanks [~rohithsharma] for working on it. Some minor comments: Road map include -> Road map includes Simple authorization in terms of a configurable whitelist of users and groups who can read timeline data -> Support for simple authorization has been added in terms of a configurable whitelist of users and groups who can read timeline data. YARN Client integrates with ATSv2. -> YARN Client has been integrated with ATSv2. This enables fetching application/attempt/container report from TimelineReader if details not present in ResouceManager. -> This enables fetching application/attempt/container report from TimelineReader if details are not present in ResouceManager. It set true -> If set true Since YARN CLI support has been added, should we remove this line: Currently there is no support for command line access. > Update ATSv2 document for latest feature supports > - > > Key: YARN-9804 > URL: https://issues.apache.org/jira/browse/YARN-9804 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S >Priority: Blocker > Attachments: YARN-9804.01.patch > > > Revisit ATSv2 documents and update for GA features. And also for the road map. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Resolved] (YARN-8139) Skip node hostname resolution when running SLS.
[ https://issues.apache.org/jira/browse/YARN-8139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi resolved YARN-8139. - Resolution: Duplicate > Skip node hostname resolution when running SLS. > --- > > Key: YARN-8139 > URL: https://issues.apache.org/jira/browse/YARN-8139 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > > Currently, depending on the time taken to resolve hostnames, SLS metrics > get skewed. To avoid this, this fix introduces a flag that can be used > to disable hostname resolution. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9804) Update ATSv2 document for latest feature supports
[ https://issues.apache.org/jira/browse/YARN-9804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16921170#comment-16921170 ] Abhishek Modi commented on YARN-9804: - Thanks [~rohithsharma]. New patch looks good to me. +1 from my end. > Update ATSv2 document for latest feature supports > - > > Key: YARN-9804 > URL: https://issues.apache.org/jira/browse/YARN-9804 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S >Priority: Blocker > Attachments: YARN-9804.01.patch, YARN-9804.02.patch > > > Revisit ATSv2 documents and update for GA features. And also for the road map. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-9812) mvn javadoc:javadoc fails in hadoop-sls
[ https://issues.apache.org/jira/browse/YARN-9812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi reassigned YARN-9812: --- Assignee: Abhishek Modi > mvn javadoc:javadoc fails in hadoop-sls > --- > > Key: YARN-9812 > URL: https://issues.apache.org/jira/browse/YARN-9812 > Project: Hadoop YARN > Issue Type: Bug > Components: documentation >Reporter: Akira Ajisaka >Assignee: Abhishek Modi >Priority: Major > Labels: newbie > > {noformat} > [ERROR] > hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:57: > error: bad use of '>' > [ERROR] * pending -> requests which are NOT yet sent to RM. > [ERROR] ^ > [ERROR] > hadoop-mirror/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:58: > error: bad use of '>' > [ERROR] * scheduled -> requests which are sent to RM but not yet assigned. > [ERROR] ^ > [ERROR] > hadoop-mirror/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:59: > error: bad use of '>' > [ERROR] * assigned -> requests which are assigned to a container. > [ERROR] ^ > [ERROR] > hadoop-mirror/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:60: > error: bad use of '>' > [ERROR] * completed -> request corresponding to which container has > completed. > [ERROR] ^ > {noformat} -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
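For reference, the usual fix for javadoc "bad use of '>'" errors is to escape the arrow, either as &gt; or with the {@literal} tag; whether the attached patch uses exactly this form is not shown here:

```java
/**
 * Tracks each resource request through its states:
 * pending {@literal ->} requests which are NOT yet sent to RM.
 * scheduled {@literal ->} requests which are sent to RM but not yet assigned.
 * assigned {@literal ->} requests which are assigned to a container.
 * completed {@literal ->} request corresponding to which container has completed.
 */
```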
[jira] [Updated] (YARN-9697) Efficient allocation of Opportunistic containers.
[ https://issues.apache.org/jira/browse/YARN-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi updated YARN-9697: Attachment: YARN-9697.wip1.patch > Efficient allocation of Opportunistic containers. > - > > Key: YARN-9697 > URL: https://issues.apache.org/jira/browse/YARN-9697 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9697.ut.patch, YARN-9697.ut2.patch, > YARN-9697.wip1.patch > > > In the current implementation, opportunistic containers are allocated based > on the number of queued opportunistic container information received in node > heartbeat. This information becomes stale as soon as more opportunistic > containers are allocated on that node. > Allocation of opportunistic containers happens on the same heartbeat in which > AM asks for the containers. When multiple applications request for > Opportunistic containers, containers might get allocated on the same set of > nodes as already allocated containers on the node are not considered while > serving requests from different applications. This can lead to uneven > allocation of Opportunistic containers across the cluster leading to > increased queuing time -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9782) Avoid DNS resolution while running SLS.
[ https://issues.apache.org/jira/browse/YARN-9782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi updated YARN-9782: Attachment: YARN-9782.002.patch > Avoid DNS resolution while running SLS. > --- > > Key: YARN-9782 > URL: https://issues.apache.org/jira/browse/YARN-9782 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9782.001.patch, YARN-9782.002.patch > > > In SLS, we add nodes with random names and rack. DNS resolution of these > nodes takes around 2 seconds because it will timeout after that. This makes > the result of SLS unreliable and adds spikes. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9697) Efficient allocation of Opportunistic containers.
[ https://issues.apache.org/jira/browse/YARN-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16924326#comment-16924326 ] Abhishek Modi commented on YARN-9697: - [~elgoiri] could you please review the approach taken in poc patch. If it looks good to you, I can clean it up and add some more UTs. Thanks. > Efficient allocation of Opportunistic containers. > - > > Key: YARN-9697 > URL: https://issues.apache.org/jira/browse/YARN-9697 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9697.ut.patch, YARN-9697.ut2.patch, > YARN-9697.wip1.patch > > > In the current implementation, opportunistic containers are allocated based > on the number of queued opportunistic container information received in node > heartbeat. This information becomes stale as soon as more opportunistic > containers are allocated on that node. > Allocation of opportunistic containers happens on the same heartbeat in which > AM asks for the containers. When multiple applications request for > Opportunistic containers, containers might get allocated on the same set of > nodes as already allocated containers on the node are not considered while > serving requests from different applications. This can lead to uneven > allocation of Opportunistic containers across the cluster leading to > increased queuing time -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9782) Avoid DNS resolution while running SLS.
[ https://issues.apache.org/jira/browse/YARN-9782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16924327#comment-16924327 ] Abhishek Modi commented on YARN-9782: - Thanks [~elgoiri] for review. Updated patch. > Avoid DNS resolution while running SLS. > --- > > Key: YARN-9782 > URL: https://issues.apache.org/jira/browse/YARN-9782 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9782.001.patch, YARN-9782.002.patch > > > In SLS, we add nodes with random names and rack. DNS resolution of these > nodes takes around 2 seconds because it will timeout after that. This makes > the result of SLS unreliable and adds spikes. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9812) mvn javadoc:javadoc fails in hadoop-sls
[ https://issues.apache.org/jira/browse/YARN-9812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16924361#comment-16924361 ] Abhishek Modi commented on YARN-9812: - [~aajisaka] [~elgoiri] could you please review it. Thanks. > mvn javadoc:javadoc fails in hadoop-sls > --- > > Key: YARN-9812 > URL: https://issues.apache.org/jira/browse/YARN-9812 > Project: Hadoop YARN > Issue Type: Bug > Components: documentation >Reporter: Akira Ajisaka >Assignee: Abhishek Modi >Priority: Major > Labels: newbie > Attachments: YARN-9812.001.patch > > > {noformat} > [ERROR] > hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:57: > error: bad use of '>' > [ERROR] * pending -> requests which are NOT yet sent to RM. > [ERROR] ^ > [ERROR] > hadoop-mirror/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:58: > error: bad use of '>' > [ERROR] * scheduled -> requests which are sent to RM but not yet assigned. > [ERROR] ^ > [ERROR] > hadoop-mirror/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:59: > error: bad use of '>' > [ERROR] * assigned -> requests which are assigned to a container. > [ERROR] ^ > [ERROR] > hadoop-mirror/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:60: > error: bad use of '>' > [ERROR] * completed -> request corresponding to which container has > completed. > [ERROR] ^ > {noformat} -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9782) Avoid DNS resolution while running SLS.
[ https://issues.apache.org/jira/browse/YARN-9782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16924455#comment-16924455 ] Abhishek Modi commented on YARN-9782: - [~elgoiri] I found a potential issue with this unit test. Since we are setting Java security properties, they would stay set for all subsequent unit tests, as all of them run within the same Java process. One way to avoid that is to run the SLS unit tests in a separate Java process, but that would increase the test runtime. The second option is to skip the unit test for this change. Since it's a very small change behind a config flag, would it be possible to skip the unit test here? [~elgoiri] [~subru] could you please provide some suggestions here. Thanks. > Avoid DNS resolution while running SLS. > --- > > Key: YARN-9782 > URL: https://issues.apache.org/jira/browse/YARN-9782 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9782.001.patch, YARN-9782.002.patch > > > In SLS, we add nodes with random names and rack. DNS resolution of these > nodes takes around 2 seconds because it will timeout after that. This makes > the result of SLS unreliable and adds spikes. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
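The leakage concern in the comment above is real because `java.security.Security` properties (such as the JDK's `networkaddress.cache.ttl` DNS-cache setting) are process-wide: once any test sets one, every later test in the same JVM observes it. A minimal demonstration, assuming only standard JDK behavior (the specific properties the YARN-9782 patch sets are in the patch itself, not shown here):

```java
import java.security.Security;

public class DnsCacheLeakDemo {
    public static void main(String[] args) {
        // "Test A" caches successful DNS lookups forever. Security
        // properties are JVM-global, not scoped to a test class or thread.
        Security.setProperty("networkaddress.cache.ttl", "-1");

        // "Test B", running later in the same JVM, still sees the change:
        // this is exactly why running the SLS tests in the same process
        // would let the setting bleed into unrelated tests.
        String ttl = Security.getProperty("networkaddress.cache.ttl");
        System.out.println(ttl);  // prints -1
    }
}
```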
[jira] [Updated] (YARN-9812) mvn javadoc:javadoc fails in hadoop-sls
[ https://issues.apache.org/jira/browse/YARN-9812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi updated YARN-9812: Attachment: YARN-9812.002.patch > mvn javadoc:javadoc fails in hadoop-sls > --- > > Key: YARN-9812 > URL: https://issues.apache.org/jira/browse/YARN-9812 > Project: Hadoop YARN > Issue Type: Bug > Components: documentation >Reporter: Akira Ajisaka >Assignee: Abhishek Modi >Priority: Major > Labels: newbie > Attachments: YARN-9812.001.patch, YARN-9812.002.patch > > > {noformat} > [ERROR] > hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:57: > error: bad use of '>' > [ERROR] * pending -> requests which are NOT yet sent to RM. > [ERROR] ^ > [ERROR] > hadoop-mirror/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:58: > error: bad use of '>' > [ERROR] * scheduled -> requests which are sent to RM but not yet assigned. > [ERROR] ^ > [ERROR] > hadoop-mirror/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:59: > error: bad use of '>' > [ERROR] * assigned -> requests which are assigned to a container. > [ERROR] ^ > [ERROR] > hadoop-mirror/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:60: > error: bad use of '>' > [ERROR] * completed -> request corresponding to which container has > completed. > [ERROR] ^ > {noformat} -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7604) Fix some minor typos in the opportunistic container logging
[ https://issues.apache.org/jira/browse/YARN-7604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16924747#comment-16924747 ] Abhishek Modi commented on YARN-7604: - Thanks [~cheersyang] for the patch. Could you please move these log lines to the new log4j format. Thanks. > Fix some minor typos in the opportunistic container logging > --- > > Key: YARN-7604 > URL: https://issues.apache.org/jira/browse/YARN-7604 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.9.0 >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Trivial > Attachments: YARN-7604.01.patch > > > Fix some minor text issues. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9812) mvn javadoc:javadoc fails in hadoop-sls
[ https://issues.apache.org/jira/browse/YARN-9812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16924757#comment-16924757 ] Abhishek Modi commented on YARN-9812: - Thanks [~elgoiri] for review. Committed to trunk. > mvn javadoc:javadoc fails in hadoop-sls > --- > > Key: YARN-9812 > URL: https://issues.apache.org/jira/browse/YARN-9812 > Project: Hadoop YARN > Issue Type: Bug > Components: documentation >Reporter: Akira Ajisaka >Assignee: Abhishek Modi >Priority: Major > Labels: newbie > Fix For: 3.3.0 > > Attachments: YARN-9812.001.patch, YARN-9812.002.patch > > > {noformat} > [ERROR] > hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:57: > error: bad use of '>' > [ERROR] * pending -> requests which are NOT yet sent to RM. > [ERROR] ^ > [ERROR] > hadoop-mirror/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:58: > error: bad use of '>' > [ERROR] * scheduled -> requests which are sent to RM but not yet assigned. > [ERROR] ^ > [ERROR] > hadoop-mirror/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:59: > error: bad use of '>' > [ERROR] * assigned -> requests which are assigned to a container. > [ERROR] ^ > [ERROR] > hadoop-mirror/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/DAGAMSimulator.java:60: > error: bad use of '>' > [ERROR] * completed -> request corresponding to which container has > completed. > [ERROR] ^ > {noformat} -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-9819) Make TestOpportunisticContainerAllocatorAMService more resilient.
Abhishek Modi created YARN-9819: --- Summary: Make TestOpportunisticContainerAllocatorAMService more resilient. Key: YARN-9819 URL: https://issues.apache.org/jira/browse/YARN-9819 Project: Hadoop YARN Issue Type: Sub-task Reporter: Abhishek Modi Assignee: Abhishek Modi Currently, TestOpportunisticContainerAllocatorAMService tries to set the Opportunistic container status directly in RMNode, but that can be overwritten by an NM heartbeat. The correct way would be to send it through an NM heartbeat. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9819) Make TestOpportunisticContainerAllocatorAMService more resilient.
[ https://issues.apache.org/jira/browse/YARN-9819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi updated YARN-9819: Attachment: YARN-9819.001.patch > Make TestOpportunisticContainerAllocatorAMService more resilient. > - > > Key: YARN-9819 > URL: https://issues.apache.org/jira/browse/YARN-9819 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9819.001.patch > > > Currently, TestOpportunisticContainerAllocatorAMService tries to set the > Opportunistic container status directly in RMNode but that can be updated by > NM heartbeat. Correct way would be to send it through NM heartbeat. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9784) org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestLeafQueue is flaky
[ https://issues.apache.org/jira/browse/YARN-9784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16924852#comment-16924852 ] Abhishek Modi commented on YARN-9784: - Thanks [~kmarton] for the patch. LGTM. Thanks [~sunilg] and [~adam.antal] for additional reviews. Committed to trunk. > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestLeafQueue > is flaky > --- > > Key: YARN-9784 > URL: https://issues.apache.org/jira/browse/YARN-9784 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Affects Versions: 3.3.0 >Reporter: Julia Kinga Marton >Assignee: Julia Kinga Marton >Priority: Major > Attachments: YARN-9784.001.patch > > > There are some test cases in TestLeafQueue which are failing intermittently. > From 100 runs, there were 16 failures. > Some failure examples are the following ones: > {code:java} > 2019-08-26 13:18:13 [ERROR] Errors: > 2019-08-26 13:18:13 [ERROR] TestLeafQueue.setUp:144->setUpInternal:221 > WrongTypeOfReturnValue > 2019-08-26 13:18:13 YarnConfigu... > 2019-08-26 13:18:13 [ERROR] TestLeafQueue.setUp:144->setUpInternal:221 > WrongTypeOfReturnValue > 2019-08-26 13:18:13 YarnConfigu... > 2019-08-26 13:18:13 [INFO] > 2019-08-26 13:18:13 [ERROR] Tests run: 36, Failures: 0, Errors: 2, Skipped: 0 > {code} > {code:java} > 2019-08-26 13:18:09 [ERROR] Failures: > 2019-08-26 13:18:09 [ERROR] TestLeafQueue.testHeadroomWithMaxCap:1373 > expected:<2048> but was:<0> > 2019-08-26 13:18:09 [INFO] > 2019-08-26 13:18:09 [ERROR] Tests run: 36, Failures: 1, Errors: 0, Skipped: 0 > {code} > {code:java} > 2019-08-26 13:18:18 [ERROR] Errors: > 2019-08-26 13:18:18 [ERROR] TestLeafQueue.setUp:144->setUpInternal:221 > WrongTypeOfReturnValue > 2019-08-26 13:18:18 YarnConfigu... > 2019-08-26 13:18:18 [ERROR] TestLeafQueue.testHeadroomWithMaxCap:1307 ? > ClassCast org.apache.hadoop.yarn.c... 
> 2019-08-26 13:18:18 [INFO] > 2019-08-26 13:18:18 [ERROR] Tests run: 36, Failures: 0, Errors: 2, Skipped: 0 > {code} > {code:java} > 2019-08-26 13:18:10 [ERROR] Failures: > 2019-08-26 13:18:10 [ERROR] TestLeafQueue.testDRFUserLimits:847 Verify > user_0 got resources > 2019-08-26 13:18:10 [INFO] > 2019-08-26 13:18:10 [ERROR] Tests run: 36, Failures: 1, Errors: 0, Skipped: 0 > {code} -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9819) Make TestOpportunisticContainerAllocatorAMService more resilient.
[ https://issues.apache.org/jira/browse/YARN-9819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi updated YARN-9819: Attachment: YARN-9819.002.patch > Make TestOpportunisticContainerAllocatorAMService more resilient. > - > > Key: YARN-9819 > URL: https://issues.apache.org/jira/browse/YARN-9819 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9819.001.patch, YARN-9819.002.patch > > > Currently, TestOpportunisticContainerAllocatorAMService tries to set the > Opportunistic container status directly in RMNode but that can be updated by > NM heartbeat. Correct way would be to send it through NM heartbeat. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9819) Make TestOpportunisticContainerAllocatorAMService more resilient.
[ https://issues.apache.org/jira/browse/YARN-9819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925081#comment-16925081 ] Abhishek Modi commented on YARN-9819: - [~elgoiri] could you please review it. Unit test failure is not related to patch. > Make TestOpportunisticContainerAllocatorAMService more resilient. > - > > Key: YARN-9819 > URL: https://issues.apache.org/jira/browse/YARN-9819 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9819.001.patch, YARN-9819.002.patch > > > Currently, TestOpportunisticContainerAllocatorAMService tries to set the > Opportunistic container status directly in RMNode but that can be updated by > NM heartbeat. Correct way would be to send it through NM heartbeat. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9821) NM hangs at serviceStop when ATSV2 Backend Hbase is Down
[ https://issues.apache.org/jira/browse/YARN-9821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925331#comment-16925331 ] Abhishek Modi commented on YARN-9821: - Thanks [~Prabhu Joseph] for the patch. Some minor comments: # Can we rename isHbaseUp => isStorageUp to make it more generic. # Can we log the exception too. Apart from these minor comments, it looks good to me. > NM hangs at serviceStop when ATSV2 Backend Hbase is Down > - > > Key: YARN-9821 > URL: https://issues.apache.org/jira/browse/YARN-9821 > Project: Hadoop YARN > Issue Type: Sub-task > Components: ATSv2 >Affects Versions: 3.2.0, 3.3.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Attachments: YARN-9821-001.patch > > > NM hangs at serviceStop when ATSV2 Backend Hbase is Down. > {code} > "Thread-197" #302 prio=5 os_prio=0 tid=0x7f5f389ba000 nid=0x631d waiting > for monitor entry [0x7f5f1f29b000] >java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.hadoop.hbase.client.BufferedMutatorImpl.close(BufferedMutatorImpl.java:249) > - waiting to lock <0x0006c834d148> (a > org.apache.hadoop.hbase.client.BufferedMutatorImpl) > at > org.apache.hadoop.yarn.server.timelineservice.storage.common.TypedBufferedMutator.close(TypedBufferedMutator.java:62) > at > org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineWriterImpl.serviceStop(HBaseTimelineWriterImpl.java:636) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220) > - locked <0x0006c7c05808> (a java.lang.Object) > at > org.apache.hadoop.service.AbstractService.close(AbstractService.java:247) > at > org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollectorManager.serviceStop(TimelineCollectorManager.java:244) > at > org.apache.hadoop.yarn.server.timelineservice.collector.NodeTimelineCollectorManager.serviceStop(NodeTimelineCollectorManager.java:164) > at > 
org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220) > - locked <0x0006c7c05890> (a java.lang.Object) > at > org.apache.hadoop.yarn.server.timelineservice.collector.PerNodeTimelineCollectorsAuxService.serviceStop(PerNodeTimelineCollectorsAuxService.java:113) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220) > - locked <0x0006c7c058f8> (a java.lang.Object) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceStop(AuxServices.java:330) > - locked <0x0006c7c23400> (a java.util.Collections$SynchronizedMap) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220) > - locked <0x0006c7c059a8> (a java.lang.Object) > at > org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:54) > at > org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:102) > at > org.apache.hadoop.service.CompositeService.stop(CompositeService.java:158) > at > org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:132) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceStop(ContainerManagerImpl.java:720) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220) > - locked <0x0006c7c05a98> (a java.lang.Object) > at > org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:54) > at > org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:102) > at > org.apache.hadoop.service.CompositeService.stop(CompositeService.java:158) > at > org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:132) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStop(NodeManager.java:526) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220) > - locked <0x0006c7c05c88> (a java.lang.Object) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager$1.run(NodeManager.java:552) > > > 
"qtp183259297-76" #76 daemon prio=5 os_prio=0 tid=0x7f5f567ed000 > nid=0x5fb7 in Object.wait() [0x7f5f23ad7000] >java.lang.Thread.State: TIMED_WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > at java.lang.Object.wait(Object.java:460) > at java.util.concurrent.TimeUnit.timedWait(TimeUnit.java:348) > at > org.apache.hadoop.hbase.client.ResultBoundedCompletionService.pollForSpecificCompletedTask(ResultBoundedCompletionService.java:258) > - locked <0x000784ee8220> (a > [Lorg.apache.hadoop.hbase.client.Resul
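The thread dump above shows `serviceStop` blocked inside `BufferedMutatorImpl.close()`, which waits indefinitely when HBase is unreachable. The YARN-9821 patch addresses this with a storage-health flag (the `isHbaseUp`/`isStorageUp` field discussed in the review comment); a more generic pattern for the same symptom is to bound a potentially blocking close with a timeout. A sketch of that pattern — illustrative only, not the code from the patch:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedClose {
    // Run a close() that may block forever (like BufferedMutator.close()
    // when the backend is down) with an upper bound, so a service's stop
    // path can log the failure and keep shutting down.
    static boolean closeWithTimeout(Runnable close, long timeoutMs) {
        ExecutorService ex = Executors.newSingleThreadExecutor();
        try {
            Future<?> f = ex.submit(close);
            f.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;                  // closed cleanly within the bound
        } catch (TimeoutException e) {
            return false;                 // gave up; caller logs and moves on
        } catch (Exception e) {
            return false;                 // close itself failed
        } finally {
            ex.shutdownNow();             // interrupt the stuck close attempt
        }
    }

    public static void main(String[] args) {
        // Stand-in for the stuck HBase flush: a close that never returns.
        boolean ok = closeWithTimeout(() -> {
            try { Thread.sleep(60_000); } catch (InterruptedException ignored) { }
        }, 200);
        System.out.println(ok);  // prints false: the bound fired
    }
}
```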
[jira] [Commented] (YARN-9821) NM hangs at serviceStop when ATSV2 Backend Hbase is Down
[ https://issues.apache.org/jira/browse/YARN-9821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925481#comment-16925481 ] Abhishek Modi commented on YARN-9821: - Thanks [~Prabhu Joseph] for the patch and [~rohithsharma] for additional review. I have committed it to trunk. [~rohithsharma] should we commit it to 3.2 and 3.1 branch also? > NM hangs at serviceStop when ATSV2 Backend Hbase is Down > - > > Key: YARN-9821 > URL: https://issues.apache.org/jira/browse/YARN-9821 > Project: Hadoop YARN > Issue Type: Sub-task > Components: ATSv2 >Affects Versions: 3.2.0, 3.3.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Attachments: YARN-9821-001.patch, YARN-9821-002.patch > > > NM hangs at serviceStop when ATSV2 Backend Hbase is Down. > {code} > "Thread-197" #302 prio=5 os_prio=0 tid=0x7f5f389ba000 nid=0x631d waiting > for monitor entry [0x7f5f1f29b000] >java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.hadoop.hbase.client.BufferedMutatorImpl.close(BufferedMutatorImpl.java:249) > - waiting to lock <0x0006c834d148> (a > org.apache.hadoop.hbase.client.BufferedMutatorImpl) > at > org.apache.hadoop.yarn.server.timelineservice.storage.common.TypedBufferedMutator.close(TypedBufferedMutator.java:62) > at > org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineWriterImpl.serviceStop(HBaseTimelineWriterImpl.java:636) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220) > - locked <0x0006c7c05808> (a java.lang.Object) > at > org.apache.hadoop.service.AbstractService.close(AbstractService.java:247) > at > org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollectorManager.serviceStop(TimelineCollectorManager.java:244) > at > org.apache.hadoop.yarn.server.timelineservice.collector.NodeTimelineCollectorManager.serviceStop(NodeTimelineCollectorManager.java:164) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220) > - locked 
<0x0006c7c05890> (a java.lang.Object) > at > org.apache.hadoop.yarn.server.timelineservice.collector.PerNodeTimelineCollectorsAuxService.serviceStop(PerNodeTimelineCollectorsAuxService.java:113) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220) > - locked <0x0006c7c058f8> (a java.lang.Object) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceStop(AuxServices.java:330) > - locked <0x0006c7c23400> (a java.util.Collections$SynchronizedMap) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220) > - locked <0x0006c7c059a8> (a java.lang.Object) > at > org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:54) > at > org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:102) > at > org.apache.hadoop.service.CompositeService.stop(CompositeService.java:158) > at > org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:132) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceStop(ContainerManagerImpl.java:720) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220) > - locked <0x0006c7c05a98> (a java.lang.Object) > at > org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:54) > at > org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:102) > at > org.apache.hadoop.service.CompositeService.stop(CompositeService.java:158) > at > org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:132) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStop(NodeManager.java:526) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220) > - locked <0x0006c7c05c88> (a java.lang.Object) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager$1.run(NodeManager.java:552) > > > "qtp183259297-76" #76 daemon prio=5 os_prio=0 tid=0x7f5f567ed000 > nid=0x5fb7 in Object.wait() 
[0x7f5f23ad7000] >java.lang.Thread.State: TIMED_WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > at java.lang.Object.wait(Object.java:460) > at java.util.concurrent.TimeUnit.timedWait(TimeUnit.java:348) > at > org.apache.hadoop.hbase.client.ResultBoundedCompletionService.pollForSpecificCompletedTask(ResultBoundedCompletionService.java:258) > - locked <0x000784ee8220> (a > [Lorg.apache.hadoop.hbase.client.ResultBoundedCompletio
[jira] [Commented] (YARN-9816) EntityGroupFSTimelineStore#scanActiveLogs fails with StackOverflowError
[ https://issues.apache.org/jira/browse/YARN-9816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925484#comment-16925484 ] Abhishek Modi commented on YARN-9816: - Thanks [~Prabhu Joseph]. The changes look good to me. Will commit shortly. > EntityGroupFSTimelineStore#scanActiveLogs fails with StackOverflowError > --- > > Key: YARN-9816 > URL: https://issues.apache.org/jira/browse/YARN-9816 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Affects Versions: 3.1.0, 3.2.0, 3.3.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Attachments: YARN-9816-001.patch > > > EntityGroupFSTimelineStore#scanActiveLogs fails with StackOverflowError. > This happens when a file is present under /ats/active. > {code} > [hdfs@node2 yarn]$ hadoop fs -ls /ats/active > Found 1 items > -rw-r--r-- 3 hdfs hadoop 0 2019-09-06 16:34 > /ats/active/.distcp.tmp.attempt_155759136_39768_m_01_0 > {code} > Error Message: > {code:java} > java.lang.StackOverflowError > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:632) > at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:291) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:203) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:185) > at com.sun.proxy.$Proxy15.getListing(Unknown Source) > at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:2143) > at > org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1076) > at > org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1088) 
> at > org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1059) > at > org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1038) > at > org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1034) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.listStatusIterator(DistributedFileSystem.java:1046) > at > org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.list(EntityGroupFSTimelineStore.java:398) > at > org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:368) > at > org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383) > at > org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383) > at > org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383) > at > org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383) > at > org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383) > at > org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383) > at > org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383) > at > org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383) > at > org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383) > at > org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383) > at > 
org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383) > at > org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383) > at > org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383) > {code} > One of our user has tried to distcp hdfs://ats/active dir. Distcp job has > created the > temp file .distcp.
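The repeated `scanActiveLogs` frames in the trace above are unbounded recursion blowing the call stack. Independent of the specific YARN-9816 fix (which stops the store from re-scanning a plain file as if it were a directory), a general defense against this failure mode is to drive the traversal from an explicit work queue, so depth lives on the heap rather than the call stack. A sketch over `java.io.File` — the real code walks HDFS via the `FileSystem` API, so this only illustrates the shape:

```java
import java.io.File;
import java.util.ArrayDeque;
import java.util.Deque;

public class IterativeScan {
    // Count plain files under root without recursion: pending paths sit in
    // the deque, so deep or degenerate trees cannot overflow the stack.
    static int scan(File root) {
        int files = 0;
        Deque<File> work = new ArrayDeque<>();
        work.push(root);
        while (!work.isEmpty()) {
            File f = work.pop();
            File[] children = f.listFiles();
            if (children == null) {       // a plain file (or unreadable dir)
                files++;
                continue;
            }
            for (File c : children) {
                work.push(c);
            }
        }
        return files;
    }

    public static void main(String[] args) throws Exception {
        // Build a small tree in a temp subdirectory and scan it.
        File dir = new File(System.getProperty("java.io.tmpdir"), "scan-demo");
        new File(dir, "a/b").mkdirs();
        new File(dir, "a/b/f.txt").createNewFile();
        System.out.println(scan(dir));    // prints 1 for the single file
    }
}
```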
[jira] [Commented] (YARN-9821) NM hangs at serviceStop when ATSV2 Backend Hbase is Down
[ https://issues.apache.org/jira/browse/YARN-9821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925490#comment-16925490 ] Abhishek Modi commented on YARN-9821: - Sure [~rohithsharma]. I am leaving this Jira as unresolved and you can mark it as resolved after you backport it to 3.2 branches. Thanks. > NM hangs at serviceStop when ATSV2 Backend Hbase is Down > - > > Key: YARN-9821 > URL: https://issues.apache.org/jira/browse/YARN-9821 > Project: Hadoop YARN > Issue Type: Sub-task > Components: ATSv2 >Affects Versions: 3.2.0, 3.3.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Attachments: YARN-9821-001.patch, YARN-9821-002.patch > > > NM hangs at serviceStop when ATSV2 Backend Hbase is Down. > {code} > "Thread-197" #302 prio=5 os_prio=0 tid=0x7f5f389ba000 nid=0x631d waiting > for monitor entry [0x7f5f1f29b000] >java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.hadoop.hbase.client.BufferedMutatorImpl.close(BufferedMutatorImpl.java:249) > - waiting to lock <0x0006c834d148> (a > org.apache.hadoop.hbase.client.BufferedMutatorImpl) > at > org.apache.hadoop.yarn.server.timelineservice.storage.common.TypedBufferedMutator.close(TypedBufferedMutator.java:62) > at > org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineWriterImpl.serviceStop(HBaseTimelineWriterImpl.java:636) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220) > - locked <0x0006c7c05808> (a java.lang.Object) > at > org.apache.hadoop.service.AbstractService.close(AbstractService.java:247) > at > org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollectorManager.serviceStop(TimelineCollectorManager.java:244) > at > org.apache.hadoop.yarn.server.timelineservice.collector.NodeTimelineCollectorManager.serviceStop(NodeTimelineCollectorManager.java:164) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220) > - locked <0x0006c7c05890> (a java.lang.Object) 
> at > org.apache.hadoop.yarn.server.timelineservice.collector.PerNodeTimelineCollectorsAuxService.serviceStop(PerNodeTimelineCollectorsAuxService.java:113) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220) > - locked <0x0006c7c058f8> (a java.lang.Object) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceStop(AuxServices.java:330) > - locked <0x0006c7c23400> (a java.util.Collections$SynchronizedMap) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220) > - locked <0x0006c7c059a8> (a java.lang.Object) > at > org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:54) > at > org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:102) > at > org.apache.hadoop.service.CompositeService.stop(CompositeService.java:158) > at > org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:132) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceStop(ContainerManagerImpl.java:720) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220) > - locked <0x0006c7c05a98> (a java.lang.Object) > at > org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:54) > at > org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:102) > at > org.apache.hadoop.service.CompositeService.stop(CompositeService.java:158) > at > org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:132) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStop(NodeManager.java:526) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220) > - locked <0x0006c7c05c88> (a java.lang.Object) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager$1.run(NodeManager.java:552) > > > "qtp183259297-76" #76 daemon prio=5 os_prio=0 tid=0x7f5f567ed000 > nid=0x5fb7 in Object.wait() [0x7f5f23ad7000] 
>java.lang.Thread.State: TIMED_WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > at java.lang.Object.wait(Object.java:460) > at java.util.concurrent.TimeUnit.timedWait(TimeUnit.java:348) > at > org.apache.hadoop.hbase.client.ResultBoundedCompletionService.pollForSpecificCompletedTask(ResultBoundedCompletionService.java:258) > - locked <0x000784ee8220> (a > [Lorg.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture;) > at
[jira] [Updated] (YARN-9819) Make TestOpportunisticContainerAllocatorAMService more resilient.
[ https://issues.apache.org/jira/browse/YARN-9819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi updated YARN-9819: Attachment: YARN-9819.003.patch > Make TestOpportunisticContainerAllocatorAMService more resilient. > - > > Key: YARN-9819 > URL: https://issues.apache.org/jira/browse/YARN-9819 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9819.001.patch, YARN-9819.002.patch, > YARN-9819.003.patch > > > Currently, TestOpportunisticContainerAllocatorAMService tries to set the > Opportunistic container status directly in RMNode but that can be updated by > NM heartbeat. Correct way would be to send it through NM heartbeat. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9819) Make TestOpportunisticContainerAllocatorAMService more resilient.
[ https://issues.apache.org/jira/browse/YARN-9819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925922#comment-16925922 ] Abhishek Modi commented on YARN-9819: - Thanks [~elgoiri] for review. Attached v3 patch with javadocs for all public functions. The private functions introduced in TestOpportunisticContainerAllocatorAMService are one-liners and quite self-explanatory. Please let me know if you think we need documentation there too. > Make TestOpportunisticContainerAllocatorAMService more resilient. > - > > Key: YARN-9819 > URL: https://issues.apache.org/jira/browse/YARN-9819 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9819.001.patch, YARN-9819.002.patch, > YARN-9819.003.patch > > > Currently, TestOpportunisticContainerAllocatorAMService tries to set the > Opportunistic container status directly in RMNode but that can be updated by > NM heartbeat. Correct way would be to send it through NM heartbeat. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9782) Avoid DNS resolution while running SLS.
[ https://issues.apache.org/jira/browse/YARN-9782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi updated YARN-9782: Attachment: YARN-9782.003.patch > Avoid DNS resolution while running SLS. > --- > > Key: YARN-9782 > URL: https://issues.apache.org/jira/browse/YARN-9782 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9782.001.patch, YARN-9782.002.patch, > YARN-9782.003.patch > > > In SLS, we add nodes with random names and racks. DNS resolution of these > nodes takes around 2 seconds, because the lookup only fails after that timeout. This makes > the results of SLS unreliable and adds spikes.
[jira] [Commented] (YARN-9782) Avoid DNS resolution while running SLS.
[ https://issues.apache.org/jira/browse/YARN-9782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16926479#comment-16926479 ] Abhishek Modi commented on YARN-9782: - The test failure is not related to this patch; it happens because we are not able to delete a directory at the end. [~elgoiri] could you please review the latest patch? Thanks. > Avoid DNS resolution while running SLS. > --- > > Key: YARN-9782 > URL: https://issues.apache.org/jira/browse/YARN-9782 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9782.001.patch, YARN-9782.002.patch, > YARN-9782.003.patch > > > In SLS, we add nodes with random names and racks. DNS resolution of these > nodes takes around 2 seconds, because the lookup only fails after that timeout. This makes > the results of SLS unreliable and adds spikes.
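The idea behind YARN-9782 is to keep rack resolution for simulated nodes out of DNS entirely: since SLS invents the hostnames itself, it can record each node's rack in an in-memory table at registration time. A minimal sketch of that direction follows; the class and method names are illustrative, not the actual patch.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative table-based rack resolver: a pure in-memory lookup, so the
// synthetic SLS hostnames never hit DNS and never pay its ~2s timeout.
public class TableRackResolverSketch {
    private final Map<String, String> rackTable = new ConcurrentHashMap<>();

    /** Record the rack for a simulated node when it is registered. */
    public void register(String host, String rack) {
        rackTable.put(host, rack);
    }

    /** Resolve from the table; a miss falls back to YARN's default rack label. */
    public String resolve(String host) {
        return rackTable.getOrDefault(host, "/default-rack");
    }
}
```

Because `resolve` never performs a network call, per-node resolution cost becomes constant, which is exactly what removes the latency spikes the issue describes.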
[jira] [Created] (YARN-9827) Fix Http Response code in GenericExceptionHandler.
Abhishek Modi created YARN-9827: --- Summary: Fix Http Response code in GenericExceptionHandler. Key: YARN-9827 URL: https://issues.apache.org/jira/browse/YARN-9827 Project: Hadoop YARN Issue Type: Bug Reporter: Abhishek Modi Assignee: Abhishek Modi GenericExceptionHandler should respond with SERVICE_UNAVAILABLE for connection and service-unavailable exceptions instead of INTERNAL_SERVER_ERROR.
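The mapping YARN-9827 proposes can be sketched roughly as follows. The helper below is a hypothetical stand-in, not the actual GenericExceptionHandler code; the point is that a connection failure anywhere in the cause chain should surface as 503 rather than 500.

```java
import java.net.ConnectException;

// Hypothetical sketch of the status-code mapping: connection /
// service-unavailable failures map to 503 (SERVICE_UNAVAILABLE)
// instead of the blanket 500 (INTERNAL_SERVER_ERROR).
public class StatusMapperSketch {
    static final int INTERNAL_SERVER_ERROR = 500;
    static final int SERVICE_UNAVAILABLE = 503;

    /** Walk the cause chain so a wrapped ConnectException still maps to 503. */
    public static int toHttpStatus(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof ConnectException) {
                return SERVICE_UNAVAILABLE;
            }
        }
        return INTERNAL_SERVER_ERROR;
    }
}
```

Returning 503 also tells well-behaved clients the condition is retryable, which a 500 does not.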
[jira] [Commented] (YARN-8972) [Router] Add support to prevent DoS attack over ApplicationSubmissionContext size
[ https://issues.apache.org/jira/browse/YARN-8972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927813#comment-16927813 ] Abhishek Modi commented on YARN-8972: - [~giovanni.fumarola] are you still working on this? Thanks. > [Router] Add support to prevent DoS attack over ApplicationSubmissionContext > size > - > > Key: YARN-8972 > URL: https://issues.apache.org/jira/browse/YARN-8972 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Giovanni Matteo Fumarola >Assignee: Giovanni Matteo Fumarola >Priority: Major > Attachments: YARN-8972.v1.patch, YARN-8972.v2.patch, > YARN-8972.v3.patch, YARN-8972.v4.patch, YARN-8972.v5.patch > > > This jira tracks the effort to add a new interceptor in the Router to prevent > users from submitting applications with oversized ASCs. > This avoids YARN cluster failover.
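The interceptor idea in YARN-8972 boils down to a size check on the serialized ApplicationSubmissionContext before the submission reaches the RM. A hedged sketch follows; the 1 MB cap and all names are assumptions for illustration (the real limit would come from Router configuration), not the patch's actual API.

```java
// Illustrative Router-side guard: reject a submission whose serialized
// ApplicationSubmissionContext exceeds a configured byte limit, so a single
// oversized request cannot exhaust the RM or its state store.
public class AscSizeGuardSketch {
    /** Assumed cap for illustration; the real value would be configurable. */
    static final int MAX_ASC_BYTES = 1024 * 1024;

    /** Throws if the serialized context is larger than the cap. */
    public static void checkAscSize(byte[] serializedAsc) {
        if (serializedAsc.length > MAX_ASC_BYTES) {
            throw new IllegalArgumentException(
                "ApplicationSubmissionContext is " + serializedAsc.length
                    + " bytes; limit is " + MAX_ASC_BYTES);
        }
    }
}
```

Rejecting at the Router keeps the oversized payload from ever being persisted, which is what protects the cluster from the failover scenario the issue describes.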
[jira] [Commented] (YARN-9819) Make TestOpportunisticContainerAllocatorAMService more resilient.
[ https://issues.apache.org/jira/browse/YARN-9819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928147#comment-16928147 ] Abhishek Modi commented on YARN-9819: - Thanks [~elgoiri] for the review. Committed to trunk. > Make TestOpportunisticContainerAllocatorAMService more resilient. > - > > Key: YARN-9819 > URL: https://issues.apache.org/jira/browse/YARN-9819 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9819.001.patch, YARN-9819.002.patch, > YARN-9819.003.patch > > > Currently, TestOpportunisticContainerAllocatorAMService tries to set the > Opportunistic container status directly in RMNode, but that can be updated by an > NM heartbeat. The correct way would be to send it through the NM heartbeat.
[jira] [Created] (YARN-9828) Add log line for app submission in RouterWebServices.
Abhishek Modi created YARN-9828: --- Summary: Add log line for app submission in RouterWebServices. Key: YARN-9828 URL: https://issues.apache.org/jira/browse/YARN-9828 Project: Hadoop YARN Issue Type: Task Reporter: Abhishek Modi Assignee: Abhishek Modi
[jira] [Commented] (YARN-9816) EntityGroupFSTimelineStore#scanActiveLogs fails with StackOverflowError
[ https://issues.apache.org/jira/browse/YARN-9816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928231#comment-16928231 ] Abhishek Modi commented on YARN-9816: - Sure, committing it shortly. Thanks. > EntityGroupFSTimelineStore#scanActiveLogs fails with StackOverflowError > --- > > Key: YARN-9816 > URL: https://issues.apache.org/jira/browse/YARN-9816 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver > Affects Versions: 3.1.0, 3.2.0, 3.3.0 > Reporter: Prabhu Joseph > Assignee: Prabhu Joseph > Priority: Major > Attachments: YARN-9816-001.patch > > > EntityGroupFSTimelineStore#scanActiveLogs fails with StackOverflowError. > This happens when a plain file is present under /ats/active. > {code} > [hdfs@node2 yarn]$ hadoop fs -ls /ats/active > Found 1 items > -rw-r--r-- 3 hdfs hadoop 0 2019-09-06 16:34 > /ats/active/.distcp.tmp.attempt_155759136_39768_m_01_0 > {code} > Error Message: > {code:java} > java.lang.StackOverflowError > at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:632) > at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source) > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:291) > at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:203) > at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:185) > at com.sun.proxy.$Proxy15.getListing(Unknown Source) > at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:2143) > at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1076) > at org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1038) > at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusIterator(DistributedFileSystem.java:1046) > at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.list(EntityGroupFSTimelineStore.java:398) > at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:368) > at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383) > ... (the scanActiveLogs frame at line 383 repeats until the StackOverflowError) > {code} > One of our users tried to distcp the hdfs://ats/active dir. The distcp job created the > temp file .distcp.tmp.attempt_155759136_39768_m_
[jira] [Updated] (YARN-9816) EntityGroupFSTimelineStore#scanActiveLogs fails when undesired files are present under /ats/active.
[ https://issues.apache.org/jira/browse/YARN-9816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi updated YARN-9816: Summary: EntityGroupFSTimelineStore#scanActiveLogs fails when undesired files are present under /ats/active. (was: EntityGroupFSTimelineStore#scanActiveLogs fails with StackOverflowError) > EntityGroupFSTimelineStore#scanActiveLogs fails when undesired files are > present under /ats/active. > --- > > Key: YARN-9816 > URL: https://issues.apache.org/jira/browse/YARN-9816 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver > Affects Versions: 3.1.0, 3.2.0, 3.3.0 > Reporter: Prabhu Joseph > Assignee: Prabhu Joseph > Priority: Major > Attachments: YARN-9816-001.patch > > > EntityGroupFSTimelineStore#scanActiveLogs fails with StackOverflowError. > This happens when a plain file is present under /ats/active. > {code} > [hdfs@node2 yarn]$ hadoop fs -ls /ats/active > Found 1 items > -rw-r--r-- 3 hdfs hadoop 0 2019-09-06 16:34 > /ats/active/.distcp.tmp.attempt_155759136_39768_m_01_0 > {code} > Error Message: > {code:java} > java.lang.StackOverflowError > at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:632) > ... (same HDFS client frames as above) > at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.list(EntityGroupFSTimelineStore.java:398) > at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:368) > at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383) > ... (the scanActiveLogs frame at line 383 repeats until the StackOverflowError) > {code}
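The recursion in the traces above never terminates because scanActiveLogs recurses into an entry without checking that it is a directory. A simplified sketch of the guard the fix implies follows, using a stand-in Entry type rather than the real HDFS FileStatus; names are illustrative, not the actual YARN-9816 patch.

```java
import java.util.List;

// Simplified sketch: recurse only into directories; plain files (like a
// stray .distcp.tmp.* left under /ats/active) are treated as leaves, so
// the scan terminates instead of recursing without bound.
public class ScanGuardSketch {
    /** Minimal stand-in for an HDFS directory entry. */
    static final class Entry {
        final String name;
        final boolean isDirectory;
        final List<Entry> children;
        Entry(String name, boolean isDirectory, List<Entry> children) {
            this.name = name;
            this.isDirectory = isDirectory;
            this.children = children;
        }
    }

    /** Count scanned leaf files, collecting their names into {@code files}. */
    public static int scan(Entry dir, List<String> files) {
        int count = 0;
        for (Entry e : dir.children) {
            if (e.isDirectory) {          // the guard that prevents unbounded recursion
                count += scan(e, files);
            } else {
                files.add(e.name);        // leaf file: record it, never recurse
                count++;
            }
        }
        return count;
    }
}
```

With the guard in place, an unexpected file under the active directory is simply skipped over as a leaf rather than driving the scanner into infinite self-recursion.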
[jira] [Commented] (YARN-9794) RM crashes due to runtime errors in TimelineServiceV2Publisher
[ https://issues.apache.org/jira/browse/YARN-9794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16929935#comment-16929935 ] Abhishek Modi commented on YARN-9794: - Thanks [~tarunparimi]. The latest patch looks good to me. Thanks [~Prabhu Joseph] for the additional review. Committed to trunk. > RM crashes due to runtime errors in TimelineServiceV2Publisher > -- > > Key: YARN-9794 > URL: https://issues.apache.org/jira/browse/YARN-9794 > Project: Hadoop YARN > Issue Type: Sub-task > Reporter: Tarun Parimi > Assignee: Tarun Parimi > Priority: Major > Attachments: YARN-9794.001.patch, YARN-9794.002.patch > > > Saw that the RM crashes during startup due to errors while putting an entity in > TimelineServiceV2Publisher. > {code:java} > 2019-08-28 09:35:45,273 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: > Error in dispatcher thread > java.lang.RuntimeException: java.lang.IllegalArgumentException: > org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException: > CodedInputStream encountered an embedded string or message which claimed to > have negative size. 
> at > org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithoutRetries(RpcRetryingCallerImpl.java:200) > at > org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:269) > at > org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:437) > at > org.apache.hadoop.hbase.client.ClientScanner.nextWithSyncCache(ClientScanner.java:312) > at > org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:597) > at > org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegionInMeta(ConnectionImplementation.java:834) > at > org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:732) > at > org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:281) > at > org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:236) > at > org.apache.hadoop.hbase.client.BufferedMutatorImpl.backgroundFlushCommits(BufferedMutatorImpl.java:321) > at > org.apache.hadoop.hbase.client.BufferedMutatorImpl.flush(BufferedMutatorImpl.java:285) > at > org.apache.hadoop.yarn.server.timelineservice.storage.common.TypedBufferedMutator.flush(TypedBufferedMutator.java:66) > at > org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineWriterImpl.flush(HBaseTimelineWriterImpl.java:566) > at > org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollector.flushBufferedTimelineEntities(TimelineCollector.java:173) > at > org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollector.putEntities(TimelineCollector.java:150) > at > org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher.putEntity(TimelineServiceV2Publisher.java:459) > at > org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher.access$100(TimelineServiceV2Publisher.java:73) > at > org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher$TimelineV2EventHandler.handle(TimelineServiceV2Publisher.java:494) > at > 
org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher$TimelineV2EventHandler.handle(TimelineServiceV2Publisher.java:483) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.lang.IllegalArgumentException: > org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException: > CodedInputStream encountered an embedded string or message which claimed to > have negative size. > at > org.apache.hbase.thirdparty.com.google.protobuf.CodedInputStream.newInstance(CodedInputStream.java:117) > {code}
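Independent of the specific protobuf failure above, the crash pattern is a storage-layer RuntimeException escaping through the publisher into the AsyncDispatcher thread, which the RM treats as fatal. A hypothetical hardening sketch follows; the Writer interface and all names here are illustrative stand-ins, not the actual YARN-9794 patch.

```java
// Hypothetical sketch: isolate the dispatcher from timeline-storage errors
// by catching exceptions at the publish boundary. A failed write is counted
// and dropped instead of being allowed to kill the RM's event thread.
public class SafePublishSketch {
    /** Stand-in for the timeline writer dependency. */
    interface Writer {
        void putEntity(String entity) throws Exception;
    }

    private final Writer writer;
    private int failed = 0;

    SafePublishSketch(Writer writer) {
        this.writer = writer;
    }

    /** Publish an entity; returns false (and counts) instead of propagating errors. */
    public boolean publish(String entity) {
        try {
            writer.putEntity(entity);
            return true;
        } catch (Exception e) {  // includes RuntimeException from the storage client
            failed++;            // drop and count rather than crash the dispatcher
            return false;
        }
    }

    public int failedCount() {
        return failed;
    }
}
```

The trade-off is deliberate: losing one timeline entity is recoverable, while losing the ResourceManager is not.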
[jira] [Created] (YARN-9842) Port YARN-9608 DecommissioningNodesWatcher should get lists of running applications on node from RMNode to branch-3.0/branch-2
Abhishek Modi created YARN-9842: --- Summary: Port YARN-9608 DecommissioningNodesWatcher should get lists of running applications on node from RMNode to branch-3.0/branch-2 Key: YARN-9842 URL: https://issues.apache.org/jira/browse/YARN-9842 Project: Hadoop YARN Issue Type: Task Reporter: Abhishek Modi Assignee: Abhishek Modi