[jira] [Commented] (YARN-7333) container-executor fails to remove entries from a directory that is not writable or executable

2017-10-16 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16206580#comment-16206580
 ] 

Nathan Roberts commented on YARN-7333:
--

Thanks Jason. I'll commit this shortly.

> container-executor fails to remove entries from a directory that is not 
> writable or executable
> --
>
> Key: YARN-7333
> URL: https://issues.apache.org/jira/browse/YARN-7333
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0, 3.0.0-alpha1, 2.8.2
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Critical
> Attachments: YARN-7333.001.patch, YARN-7333.002.patch
>
>
> Similar to the situation from YARN-4594, container-executor will fail to 
> clean up directories that do not have write and execute permissions. 
> YARN-4594 fixed the scenario where the directory is not readable, but it 
> missed the case where we can open the directory but either cannot traverse it 
> (i.e., no execute permission) or cannot remove entries from within it 
> (i.e., no write permission).
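
For illustration only, here is a minimal Java sketch of the cleanup idea described above (the real logic lives in the native container-executor, which is written in C; the class and method below are hypothetical): restore owner write and execute permission on a directory before listing and unlinking its entries.
{code}
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.util.EnumSet;

// Hypothetical sketch, not the container-executor code: deleting a directory
// tree requires traversing (execute) and modifying (write) each directory, so
// add those bits back for the owner before recursing into it.
public final class ForceDelete {
  static void deleteRecursively(Path path) throws IOException {
    if (Files.isDirectory(path, LinkOption.NOFOLLOW_LINKS)) {
      Files.setPosixFilePermissions(path, EnumSet.of(
          PosixFilePermission.OWNER_READ,
          PosixFilePermission.OWNER_WRITE,
          PosixFilePermission.OWNER_EXECUTE));
      try (DirectoryStream<Path> entries = Files.newDirectoryStream(path)) {
        for (Path entry : entries) {
          deleteRecursively(entry);
        }
      }
    }
    // Removes the (now empty) directory, a regular file, or a symlink itself.
    Files.delete(path);
  }
}
{code}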



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7333) container-executor fails to remove entries from a directory that is not writable or executable

2017-10-16 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16206438#comment-16206438
 ] 

Nathan Roberts commented on YARN-7333:
--

Thanks for the patch, Jason.
A couple of comments:
- Log line with "failed to unlink symlink" now applies to things that aren't 
symlinks
- Log line with "is_symlink_helper" should say "is_dir_helper"

I'm +1 after these small tweaks.


> container-executor fails to remove entries from a directory that is not 
> writable or executable
> --
>
> Key: YARN-7333
> URL: https://issues.apache.org/jira/browse/YARN-7333
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0, 3.0.0-alpha1, 2.8.2
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Critical
> Attachments: YARN-7333.001.patch
>
>
> Similar to the situation from YARN-4594, container-executor will fail to 
> clean up directories that do not have write and execute permissions. 
> YARN-4594 fixed the scenario where the directory is not readable, but it 
> missed the case where we can open the directory but either cannot traverse it 
> (i.e., no execute permission) or cannot remove entries from within it 
> (i.e., no write permission).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6219) NM web server related UT fails with "NMWebapps failed to start."

2017-09-08 Thread Nathan Roberts (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nathan Roberts updated YARN-6219:
-
Fix Version/s: 2.9.0

> NM web server related UT fails with "NMWebapps failed to start."
> 
>
> Key: YARN-6219
> URL: https://issues.apache.org/jira/browse/YARN-6219
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0
>Reporter: Yesha Vora
>Assignee: Jason Lowe
> Fix For: 2.9.0
>
> Attachments: YARN-6219-branch-2.001.patch
>
>
> TestNodeStatusUpdater.testCompletedContainerStatusBackup and TestNMWebServer 
> UT fails with NMWebapps failed to start.
> {code}
> Error Message
> NMWebapps failed to start.
> Stacktrace
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: NMWebapps failed to 
> start.
>   at 
> org.apache.hadoop.yarn.server.nodemanager.webapp.NMWebServices.<init>(NMWebServices.java:108)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.webapp.NMWebServices$$FastClassByGuice$$84485dc9.newInstance()
>   at 
> com.google.inject.internal.cglib.reflect.$FastConstructor.newInstance(FastConstructor.java:40)
>   at 
> com.google.inject.internal.DefaultConstructionProxyFactory$1.newInstance(DefaultConstructionProxyFactory.java:60)
>   at 
> com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:85)
>   at 
> com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:254)
>   at 
> com.google.inject.internal.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:46)
>   at 
> com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1031)
>   at 
> com.google.inject.internal.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:40)
>   at com.google.inject.Scopes$1$1.get(Scopes.java:65)
>   at 
> com.google.inject.internal.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:40)
>   at 
> com.google.inject.internal.InjectorImpl$4$1.call(InjectorImpl.java:978)
>   at 
> com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1024)
>   at com.google.inject.internal.InjectorImpl$4.get(InjectorImpl.java:974)
>   at 
> com.google.inject.internal.InjectorImpl.getInstance(InjectorImpl.java:1013)
>   at 
> com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory$GuiceInstantiatedComponentProvider.getInstance(GuiceComponentProviderFactory.java:332)
>   at 
> com.sun.jersey.server.impl.component.IoCResourceFactory$SingletonWrapper.init(IoCResourceFactory.java:178)
>   at 
> com.sun.jersey.server.impl.application.WebApplicationImpl$10.f(WebApplicationImpl.java:584)
>   at 
> com.sun.jersey.server.impl.application.WebApplicationImpl$10.f(WebApplicationImpl.java:581)
>   at com.sun.jersey.spi.inject.Errors.processWithErrors(Errors.java:193)
>   at 
> com.sun.jersey.server.impl.application.WebApplicationImpl.getResourceComponentProvider(WebApplicationImpl.java:581)
>   at 
> com.sun.jersey.server.impl.application.WebApplicationImpl.initiateResource(WebApplicationImpl.java:658)
>   at 
> com.sun.jersey.server.impl.application.WebApplicationImpl.initiateResource(WebApplicationImpl.java:653)
>   at 
> com.sun.jersey.server.impl.application.RootResourceUriRules.<init>(RootResourceUriRules.java:124)
>   at 
> com.sun.jersey.server.impl.application.WebApplicationImpl._initiate(WebApplicationImpl.java:1298)
>   at 
> com.sun.jersey.server.impl.application.WebApplicationImpl.access$700(WebApplicationImpl.java:169)
>   at 
> com.sun.jersey.server.impl.application.WebApplicationImpl$13.f(WebApplicationImpl.java:775)
>   at 
> com.sun.jersey.server.impl.application.WebApplicationImpl$13.f(WebApplicationImpl.java:771)
>   at com.sun.jersey.spi.inject.Errors.processWithErrors(Errors.java:193)
>   at 
> com.sun.jersey.server.impl.application.WebApplicationImpl.initiate(WebApplicationImpl.java:771)
>   at 
> com.sun.jersey.guice.spi.container.servlet.GuiceContainer.initiate(GuiceContainer.java:121)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer$InternalWebComponent.initiate(ServletContainer.java:318)
>   at 
> com.sun.jersey.spi.container.servlet.WebComponent.load(WebComponent.java:609)
>   at 
> com.sun.jersey.spi.container.servlet.WebComponent.init(WebComponent.java:210)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.init(ServletContainer.java:373)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.init(ServletContainer.java:710)
>   at 
> com.google.inject.servlet.FilterDefinition.init(FilterDefinition.java:114)
>   at 
> 

[jira] [Commented] (YARN-6219) NM web server related UT fails with "NMWebapps failed to start."

2017-09-08 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16159308#comment-16159308
 ] 

Nathan Roberts commented on YARN-6219:
--

+1. Thanks [~jlowe]. I will commit this shortly.

> NM web server related UT fails with "NMWebapps failed to start."
> 
>
> Key: YARN-6219
> URL: https://issues.apache.org/jira/browse/YARN-6219
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0
>Reporter: Yesha Vora
>Assignee: Jason Lowe
> Attachments: YARN-6219-branch-2.001.patch
>
>
> TestNodeStatusUpdater.testCompletedContainerStatusBackup and TestNMWebServer 
> UT fails with NMWebapps failed to start.
> {code}
> Error Message
> NMWebapps failed to start.
> Stacktrace
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: NMWebapps failed to 
> start.
>   at 
> org.apache.hadoop.yarn.server.nodemanager.webapp.NMWebServices.<init>(NMWebServices.java:108)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.webapp.NMWebServices$$FastClassByGuice$$84485dc9.newInstance()
>   at 
> com.google.inject.internal.cglib.reflect.$FastConstructor.newInstance(FastConstructor.java:40)
>   at 
> com.google.inject.internal.DefaultConstructionProxyFactory$1.newInstance(DefaultConstructionProxyFactory.java:60)
>   at 
> com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:85)
>   at 
> com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:254)
>   at 
> com.google.inject.internal.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:46)
>   at 
> com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1031)
>   at 
> com.google.inject.internal.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:40)
>   at com.google.inject.Scopes$1$1.get(Scopes.java:65)
>   at 
> com.google.inject.internal.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:40)
>   at 
> com.google.inject.internal.InjectorImpl$4$1.call(InjectorImpl.java:978)
>   at 
> com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1024)
>   at com.google.inject.internal.InjectorImpl$4.get(InjectorImpl.java:974)
>   at 
> com.google.inject.internal.InjectorImpl.getInstance(InjectorImpl.java:1013)
>   at 
> com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory$GuiceInstantiatedComponentProvider.getInstance(GuiceComponentProviderFactory.java:332)
>   at 
> com.sun.jersey.server.impl.component.IoCResourceFactory$SingletonWrapper.init(IoCResourceFactory.java:178)
>   at 
> com.sun.jersey.server.impl.application.WebApplicationImpl$10.f(WebApplicationImpl.java:584)
>   at 
> com.sun.jersey.server.impl.application.WebApplicationImpl$10.f(WebApplicationImpl.java:581)
>   at com.sun.jersey.spi.inject.Errors.processWithErrors(Errors.java:193)
>   at 
> com.sun.jersey.server.impl.application.WebApplicationImpl.getResourceComponentProvider(WebApplicationImpl.java:581)
>   at 
> com.sun.jersey.server.impl.application.WebApplicationImpl.initiateResource(WebApplicationImpl.java:658)
>   at 
> com.sun.jersey.server.impl.application.WebApplicationImpl.initiateResource(WebApplicationImpl.java:653)
>   at 
> com.sun.jersey.server.impl.application.RootResourceUriRules.<init>(RootResourceUriRules.java:124)
>   at 
> com.sun.jersey.server.impl.application.WebApplicationImpl._initiate(WebApplicationImpl.java:1298)
>   at 
> com.sun.jersey.server.impl.application.WebApplicationImpl.access$700(WebApplicationImpl.java:169)
>   at 
> com.sun.jersey.server.impl.application.WebApplicationImpl$13.f(WebApplicationImpl.java:775)
>   at 
> com.sun.jersey.server.impl.application.WebApplicationImpl$13.f(WebApplicationImpl.java:771)
>   at com.sun.jersey.spi.inject.Errors.processWithErrors(Errors.java:193)
>   at 
> com.sun.jersey.server.impl.application.WebApplicationImpl.initiate(WebApplicationImpl.java:771)
>   at 
> com.sun.jersey.guice.spi.container.servlet.GuiceContainer.initiate(GuiceContainer.java:121)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer$InternalWebComponent.initiate(ServletContainer.java:318)
>   at 
> com.sun.jersey.spi.container.servlet.WebComponent.load(WebComponent.java:609)
>   at 
> com.sun.jersey.spi.container.servlet.WebComponent.init(WebComponent.java:210)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.init(ServletContainer.java:373)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.init(ServletContainer.java:710)
>   at 
> com.google.inject.servlet.FilterDefinition.init(FilterDefinition.java:114)
>   at 
> 

[jira] [Commented] (YARN-6763) TestProcfsBasedProcessTree#testProcessTree fails in trunk

2017-08-18 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16133624#comment-16133624
 ] 

Nathan Roberts commented on YARN-6763:
--

Sorry it took so long to get back to this issue. 

My feeling is that we have to rely on cgroups and YARN-2904 to properly fix 
this. It is possible in Linux to launch a process in such a way that there is 
no way (at least none that I'm aware of) to trace it back to its container 
executor. A very simple example is below:
{noformat}
$ setsid sleep 1000
$ ps -ef | grep sleep 
nroberts 121334  1  0 20:56 ?00:00:00 sleep 1000
$ cat /proc/121334/stat
121334 (sleep) S 1 121334 121334 0 -1 4202496 202 0 1 0 0 0 0 0 20 0 1 0 
78705750 105439232 155 18446744073709551615 4194304 4215308 140732871855488 
140732871855016 140092201749552 0 0 0 0 18446744071579549140 0 0 17 61 0 0 1 0 0
{noformat}

The sleep process has: ppid=1, pid=121334, session_id=121334, 
process_group_id=121334. There is no way to find the shell that actually 
launched it.
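
For reference, a small self-contained Java snippet (illustrative, not part of the test) that pulls those fields out of /proc/<pid>/stat; everything after the "(comm)" entry is state, ppid, pgrp, and session, which is where the values above come from.
{code}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Prints the state, ppid, pgrp and session fields of /proc/<pid>/stat.
// For the setsid'd sleep above, ppid is 1 and pgrp/session equal its own pid,
// so nothing left in /proc points back at the launching shell.
public final class ProcStat {
  public static void main(String[] args) throws IOException {
    String pid = args[0];
    String stat = new String(
        Files.readAllBytes(Paths.get("/proc", pid, "stat")),
        StandardCharsets.UTF_8);
    // The comm field can contain spaces, so skip past the last ')'.
    String[] fields = stat.substring(stat.lastIndexOf(')') + 2).split(" ");
    System.out.println("state=" + fields[0] + " ppid=" + fields[1]
        + " pgrp=" + fields[2] + " session=" + fields[3]);
  }
}
{code}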

I'll fix the test so that it detects when it's on a system where orphaned 
processes don't re-parent to init. That way it at least won't fail in this type 
of environment.

Thoughts?

> TestProcfsBasedProcessTree#testProcessTree fails in trunk
> -
>
> Key: YARN-6763
> URL: https://issues.apache.org/jira/browse/YARN-6763
> Project: Hadoop YARN
>  Issue Type: Test
>Reporter: Bibin A Chundatt
>Assignee: Nathan Roberts
>Priority: Minor
>
> {code}
> Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 7.949 sec <<< 
> FAILURE! - in org.apache.hadoop.yarn.util.TestProcfsBasedProcessTree
> testProcessTree(org.apache.hadoop.yarn.util.TestProcfsBasedProcessTree)  Time 
> elapsed: 7.119 sec  <<< FAILURE!
> java.lang.AssertionError: Child process owned by init escaped process tree.
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at 
> org.apache.hadoop.yarn.util.TestProcfsBasedProcessTree.testProcessTree(TestProcfsBasedProcessTree.java:184)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7014) container-executor has off-by-one error which can corrupt the heap

2017-08-15 Thread Nathan Roberts (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nathan Roberts updated YARN-7014:
-
Fix Version/s: 3.0.0-beta1

> container-executor has off-by-one error which can corrupt the heap
> --
>
> Key: YARN-7014
> URL: https://issues.apache.org/jira/browse/YARN-7014
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.0.0-beta1
>Reporter: Shane Kumpf
>Assignee: Jason Lowe
>Priority: Critical
> Fix For: 3.0.0-beta1
>
> Attachments: YARN-7014.001.patch
>
>
> test-container-executor is failing in trunk.
> {code}
> [INFO] 
> [INFO] --- hadoop-maven-plugins:3.0.0-beta1-SNAPSHOT:cmake-test 
> (test-container-executor) @ hadoop-yarn-server-nodemanager ---
> [INFO] ---
> [INFO]  C M A K E   B U I L D E R   T E S T
> [INFO] ---
> [INFO] test-container-executor: running 
> /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/native/target/usr/local/bin/test-container-executor
> [INFO] with extra environment variables {}
> [INFO] STATUS: ERROR CODE 134 after 3714 millisecond(s).
> [INFO] ---
> [INFO] 
> 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> [INFO] Total time: 13:47 min
> [INFO] Finished at: 2017-08-12T12:58:55+00:00
> [INFO] Final Memory: 19M/296M
> [INFO] 
> 
> [WARNING] The requested profile "parallel-tests" could not be activated 
> because it does not exist.
> [WARNING] The requested profile "yarn-ui" could not be activated because it 
> does not exist.
> [ERROR] Failed to execute goal 
> org.apache.hadoop:hadoop-maven-plugins:3.0.0-beta1-SNAPSHOT:cmake-test 
> (test-container-executor) on project hadoop-yarn-server-nodemanager: Test 
> /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/native/target/usr/local/bin/test-container-executor
>  returned ERROR CODE 134 -> [Help 1]
> [ERROR] 
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
> switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR] 
> [ERROR] For more information about the errors and possible solutions, please 
> read the following articles:
> [ERROR] [Help 1] 
> http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7014) container-executor has off-by-one error which can corrupt the heap

2017-08-15 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16127824#comment-16127824
 ] 

Nathan Roberts commented on YARN-7014:
--

+1 on the patch. I will commit shortly.
Thanks [~jlowe] for the patch and  [~ebadger] and [~shaneku...@gmail.com] for 
the reviews!

> container-executor has off-by-one error which can corrupt the heap
> --
>
> Key: YARN-7014
> URL: https://issues.apache.org/jira/browse/YARN-7014
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.0.0-beta1
>Reporter: Shane Kumpf
>Assignee: Jason Lowe
>Priority: Critical
> Attachments: YARN-7014.001.patch
>
>
> test-container-executor is failing in trunk.
> {code}
> [INFO] 
> [INFO] --- hadoop-maven-plugins:3.0.0-beta1-SNAPSHOT:cmake-test 
> (test-container-executor) @ hadoop-yarn-server-nodemanager ---
> [INFO] ---
> [INFO]  C M A K E   B U I L D E R   T E S T
> [INFO] ---
> [INFO] test-container-executor: running 
> /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/native/target/usr/local/bin/test-container-executor
> [INFO] with extra environment variables {}
> [INFO] STATUS: ERROR CODE 134 after 3714 millisecond(s).
> [INFO] ---
> [INFO] 
> 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> [INFO] Total time: 13:47 min
> [INFO] Finished at: 2017-08-12T12:58:55+00:00
> [INFO] Final Memory: 19M/296M
> [INFO] 
> 
> [WARNING] The requested profile "parallel-tests" could not be activated 
> because it does not exist.
> [WARNING] The requested profile "yarn-ui" could not be activated because it 
> does not exist.
> [ERROR] Failed to execute goal 
> org.apache.hadoop:hadoop-maven-plugins:3.0.0-beta1-SNAPSHOT:cmake-test 
> (test-container-executor) on project hadoop-yarn-server-nodemanager: Test 
> /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/native/target/usr/local/bin/test-container-executor
>  returned ERROR CODE 134 -> [Help 1]
> [ERROR] 
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
> switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR] 
> [ERROR] For more information about the errors and possible solutions, please 
> read the following articles:
> [ERROR] [Help 1] 
> http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-6867) AbstractYarnScheduler reports the configured maximum resources, instead of the actual, even after the configured waittime

2017-07-25 Thread Nathan Roberts (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nathan Roberts reassigned YARN-6867:


Assignee: Nathan Roberts

> AbstractYarnScheduler reports the configured maximum resources, instead of 
> the actual, even after the configured waittime
> -
>
> Key: YARN-6867
> URL: https://issues.apache.org/jira/browse/YARN-6867
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Reporter: Muhammad Samir Khan
>Assignee: Nathan Roberts
> Attachments: YARN-6867.001.patch
>
>
> AbstractYarnScheduler has a configured wait time during which it reports the 
> maximum resources from the configuration instead of the actual resources 
> available in the cluster. However, the first query after the wait time 
> expires is still answered with the configured maximum resources instead of 
> the actual maximum resources. This can cause an app submission to fail with an 
> InvalidResourceRequestException (will attach a unit test in the patch), since 
> the maximum resources reported by the RM differ from the ones it sanity-checks 
> against at app submission.
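
As a rough sketch of the behaviour the description asks for (all names below are illustrative, not the actual AbstractYarnScheduler fields or methods): the configured maximum is reported only while inside the wait window, and every query at or after expiration sees the cluster-derived maximum.
{code}
// Hypothetical sketch of the intended semantics; the real scheduler works on
// Resource objects and node registrations, simplified here to memory in MB.
class MaxResourceReporter {
  private final long startMillis = System.currentTimeMillis();
  private final long waitMillis;
  private final int configuredMaxMemoryMb;
  private volatile Integer clusterMaxMemoryMb;   // set once nodes register

  MaxResourceReporter(long waitMillis, int configuredMaxMemoryMb) {
    this.waitMillis = waitMillis;
    this.configuredMaxMemoryMb = configuredMaxMemoryMb;
  }

  void onNodeRegistered(int nodeMemoryMb) {
    Integer current = clusterMaxMemoryMb;
    if (current == null || nodeMemoryMb > current) {
      clusterMaxMemoryMb = nodeMemoryMb;
    }
  }

  int getMaximumAllocationMb() {
    boolean withinWait = System.currentTimeMillis() - startMillis < waitMillis;
    if (withinWait || clusterMaxMemoryMb == null) {
      return configuredMaxMemoryMb;   // still inside the configured wait window
    }
    return clusterMaxMemoryMb;        // after the wait, report what the cluster actually has
  }
}
{code}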



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6775) CapacityScheduler: Improvements to assignContainers, avoid unnecessary canAssignToUser/Queue calls

2017-07-17 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089936#comment-16089936
 ] 

Nathan Roberts commented on YARN-6775:
--

Attached screenshots that show a couple of before/after metrics. The change 
went active early on the 14th.
1) rmeventprocbusy is the avg CPU busy of the event processor thread.
2) rpceventprocessingtimeschedulerport is the avg RPC processing time for the 
scheduler port.


> CapacityScheduler: Improvements to assignContainers, avoid unnecessary 
> canAssignToUser/Queue calls
> --
>
> Key: YARN-6775
> URL: https://issues.apache.org/jira/browse/YARN-6775
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Affects Versions: 2.8.1, 3.0.0-alpha3
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Fix For: 3.0.0-beta1
>
> Attachments: rmeventprocbusy.png, rpcprocessingtimeschedulerport.png, 
> YARN-6775.001.patch, YARN-6775.002.patch, YARN-6775.branch-2.002.patch, 
> YARN-6775.branch-2.8.002.patch
>
>
> There are several things in assignContainers() that are done multiple times 
> even though the result cannot change (canAssignToUser, canAssignToQueue). Add 
> some local caching to take advantage of this fact.
> Will post patch shortly. Patch includes a simple throughput test that 
> demonstrates when we have users at their user-limit, the number of 
> NodeUpdateSchedulerEvents we can process can be improved from 13K/sec to 
> 50K/sec.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6775) CapacityScheduler: Improvements to assignContainers, avoid unnecessary canAssignToUser/Queue calls

2017-07-17 Thread Nathan Roberts (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nathan Roberts updated YARN-6775:
-
Attachment: rpcprocessingtimeschedulerport.png

> CapacityScheduler: Improvements to assignContainers, avoid unnecessary 
> canAssignToUser/Queue calls
> --
>
> Key: YARN-6775
> URL: https://issues.apache.org/jira/browse/YARN-6775
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Affects Versions: 2.8.1, 3.0.0-alpha3
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Fix For: 3.0.0-beta1
>
> Attachments: rmeventprocbusy.png, rpcprocessingtimeschedulerport.png, 
> YARN-6775.001.patch, YARN-6775.002.patch, YARN-6775.branch-2.002.patch, 
> YARN-6775.branch-2.8.002.patch
>
>
> There are several things in assignContainers() that are done multiple times 
> even though the result cannot change (canAssignToUser, canAssignToQueue). Add 
> some local caching to take advantage of this fact.
> Will post patch shortly. Patch includes a simple throughput test that 
> demonstrates when we have users at their user-limit, the number of 
> NodeUpdateSchedulerEvents we can process can be improved from 13K/sec to 
> 50K/sec.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6775) CapacityScheduler: Improvements to assignContainers, avoid unnecessary canAssignToUser/Queue calls

2017-07-17 Thread Nathan Roberts (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nathan Roberts updated YARN-6775:
-
Attachment: rmeventprocbusy.png

> CapacityScheduler: Improvements to assignContainers, avoid unnecessary 
> canAssignToUser/Queue calls
> --
>
> Key: YARN-6775
> URL: https://issues.apache.org/jira/browse/YARN-6775
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Affects Versions: 2.8.1, 3.0.0-alpha3
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Fix For: 3.0.0-beta1
>
> Attachments: rmeventprocbusy.png, YARN-6775.001.patch, 
> YARN-6775.002.patch, YARN-6775.branch-2.002.patch, 
> YARN-6775.branch-2.8.002.patch
>
>
> There are several things in assignContainers() that are done multiple times 
> even though the result cannot change (canAssignToUser, canAssignToQueue). Add 
> some local caching to take advantage of this fact.
> Will post patch shortly. Patch includes a simple throughput test that 
> demonstrates when we have users at their user-limit, the number of 
> NodeUpdateSchedulerEvents we can process can be improved from 13K/sec to 
> 50K/sec.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6775) CapacityScheduler: Improvements to assignContainers, avoid unnecessary canAssignToUser/Queue calls

2017-07-17 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089894#comment-16089894
 ] 

Nathan Roberts commented on YARN-6775:
--

[~leftnoteasy], I applied YARN-6775.branch-2.002.patch to branch-2 and 
YARN-6775.branch-2.8.002.patch to branch-2.8. I think they're OK. Let me know 
if I'm missing something.




> CapacityScheduler: Improvements to assignContainers, avoid unnecessary 
> canAssignToUser/Queue calls
> --
>
> Key: YARN-6775
> URL: https://issues.apache.org/jira/browse/YARN-6775
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Affects Versions: 2.8.1, 3.0.0-alpha3
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Fix For: 3.0.0-beta1
>
> Attachments: YARN-6775.001.patch, YARN-6775.002.patch, 
> YARN-6775.branch-2.002.patch, YARN-6775.branch-2.8.002.patch
>
>
> There are several things in assignContainers() that are done multiple times 
> even though the result cannot change (canAssignToUser, canAssignToQueue). Add 
> some local caching to take advantage of this fact.
> Will post patch shortly. Patch includes a simple throughput test that 
> demonstrates when we have users at their user-limit, the number of 
> NodeUpdateSchedulerEvents we can process can be improved from 13K/sec to 
> 50K/sec.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6775) CapacityScheduler: Improvements to assignContainers, avoid unnecessary canAssignToUser/Queue calls

2017-07-14 Thread Nathan Roberts (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nathan Roberts updated YARN-6775:
-
Attachment: YARN-6775.branch-2.8.002.patch

> CapacityScheduler: Improvements to assignContainers, avoid unnecessary 
> canAssignToUser/Queue calls
> --
>
> Key: YARN-6775
> URL: https://issues.apache.org/jira/browse/YARN-6775
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Affects Versions: 2.8.1, 3.0.0-alpha3
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Fix For: 3.0.0-beta1
>
> Attachments: YARN-6775.001.patch, YARN-6775.002.patch, 
> YARN-6775.branch-2.002.patch, YARN-6775.branch-2.8.002.patch
>
>
> There are several things in assignContainers() that are done multiple times 
> even though the result cannot change (canAssignToUser, canAssignToQueue). Add 
> some local caching to take advantage of this fact.
> Will post patch shortly. Patch includes a simple throughput test that 
> demonstrates when we have users at their user-limit, the number of 
> NodeUpdateSchedulerEvents we can process can be improved from 13K/sec to 
> 50K/sec.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6768) Improve performance of yarn api record toString and fromString

2017-07-13 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16086522#comment-16086522
 ] 

Nathan Roberts commented on YARN-6768:
--

You probably don't need to calculate the full numDigits; once you have 
minimumDigits, you're done.
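
A hypothetical illustration of that point (the helper below is not from the patch): when zero-padding to a minimum width, only the shortfall below minimumDigits matters; once the value already has at least that many digits there is nothing left to count.
{code}
public final class MinDigits {
  // Appends a non-negative value, left-padded with zeros to at least minDigits.
  // Only the shortfall below minDigits matters; the value's full digit count is
  // never computed.
  static void appendWithMinDigits(StringBuilder sb, long value, int minDigits) {
    long threshold = 1;
    for (int i = 1; i < minDigits; i++) {
      threshold *= 10;        // smallest number that has minDigits digits
    }
    for (long t = threshold; value < t && t > 1; t /= 10) {
      sb.append('0');         // one '0' per missing digit
    }
    sb.append(value);
  }

  public static void main(String[] args) {
    StringBuilder sb = new StringBuilder("container_1500000000000_0001_01_");
    appendWithMinDigits(sb, 42, 6);   // container ids pad the final field to 6 digits
    System.out.println(sb);           // container_1500000000000_0001_01_000042
  }
}
{code}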

> Improve performance of yarn api record toString and fromString
> --
>
> Key: YARN-6768
> URL: https://issues.apache.org/jira/browse/YARN-6768
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Attachments: YARN-6768.1.patch, YARN-6768.2.patch, YARN-6768.3.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6775) CapacityScheduler: Improvements to assignContainers()

2017-07-13 Thread Nathan Roberts (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nathan Roberts updated YARN-6775:
-
Attachment: YARN-6775.branch-2.002.patch

> CapacityScheduler: Improvements to assignContainers()
> -
>
> Key: YARN-6775
> URL: https://issues.apache.org/jira/browse/YARN-6775
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Affects Versions: 2.8.1, 3.0.0-alpha3
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Attachments: YARN-6775.001.patch, YARN-6775.002.patch, 
> YARN-6775.branch-2.002.patch
>
>
> There are several things in assignContainers() that are done multiple times 
> even though the result cannot change (canAssignToUser, canAssignToQueue). Add 
> some local caching to take advantage of this fact.
> Will post patch shortly. Patch includes a simple throughput test that 
> demonstrates when we have users at their user-limit, the number of 
> NodeUpdateSchedulerEvents we can process can be improved from 13K/sec to 
> 50K/sec.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6775) CapacityScheduler: Improvements to assignContainers()

2017-07-13 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16085715#comment-16085715
 ] 

Nathan Roberts commented on YARN-6775:
--

Thanks [~leftnoteasy]. I looked at the UT failures; I've seen these be flaky in 
the past. All pass locally except TestRMRestart, which is a known problem: 
YARN-6759.

I'll put up a branch-2 and 2.8 patch today.

> CapacityScheduler: Improvements to assignContainers()
> -
>
> Key: YARN-6775
> URL: https://issues.apache.org/jira/browse/YARN-6775
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Affects Versions: 2.8.1, 3.0.0-alpha3
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Attachments: YARN-6775.001.patch, YARN-6775.002.patch
>
>
> There are several things in assignContainers() that are done multiple times 
> even though the result cannot change (canAssignToUser, canAssignToQueue). Add 
> some local caching to take advantage of this fact.
> Will post patch shortly. Patch includes a simple throughput test that 
> demonstrates when we have users at their user-limit, the number of 
> NodeUpdateSchedulerEvents we can process can be improved from 13K/sec to 
> 50K/sec.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6775) CapacityScheduler: Improvements to assignContainers()

2017-07-12 Thread Nathan Roberts (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nathan Roberts updated YARN-6775:
-
Attachment: YARN-6775.002.patch

Addressed checkstyle warnings and renamed rsrvd as requested

> CapacityScheduler: Improvements to assignContainers()
> -
>
> Key: YARN-6775
> URL: https://issues.apache.org/jira/browse/YARN-6775
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Affects Versions: 2.8.1, 3.0.0-alpha3
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Attachments: YARN-6775.001.patch, YARN-6775.002.patch
>
>
> There are several things in assignContainers() that are done multiple times 
> even though the result cannot change (canAssignToUser, canAssignToQueue). Add 
> some local caching to take advantage of this fact.
> Will post patch shortly. Patch includes a simple throughput test that 
> demonstrates when we have users at their user-limit, the number of 
> NodeUpdateSchedulerEvents we can process can be improved from 13K/sec to 
> 50K/sec.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6775) CapacityScheduler: Improvements to assignContainers()

2017-07-11 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16083045#comment-16083045
 ] 

Nathan Roberts commented on YARN-6775:
--

Thanks [~leftnoteasy] for the review.

bq. 1) CachedUserLimit.canAssign is not necessary as we can set 
CachedUserLimit.reservation to UNBOUNDED initially.
I think it is necessary because we need to keep track of the largest 
reservation for which canAssignToUser() returns false. Anything smaller than 
the largest we've already checked is known not to work, so we can avoid the 
call. Therefore we can't start at UNBOUNDED and work our way down. 

bq. 2) Directly set cul.reservation = rsrv could be problematic under async 
scheduling logic since reserved resource of app could be updated while 
allocating.
Is this something we need to address? Couldn't this be mutating between the 
various lookups that are already occurring in today's assignContainers()?

bq. 3) Do you think is it necessary to add another Resource to track queue's 
verified_minimum_violated_reserved_resource similar to user limit?
My thought was that we'll quickly run across an app that has no reservation and 
be able to skip the assignToQueue() check from that point forward. The check against 
Resources.none() should be very fast compared to a resource comparison. I'm 
open to keeping track of the minimum though if you feel there would be 
sufficient gain.

Naming suggestions look good. I'll clean those up (although the minimum is 
actually a maximum).
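
To make the caching being discussed concrete, here is a rough, simplified sketch (the real patch works on Resource objects inside the scheduler; the types below are illustrative): per assignContainers() pass, remember for each user the largest reservation for which canAssignToUser() has already returned false, and skip the call for any request whose reservation is no larger.
{code}
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiPredicate;

// Illustrative sketch of the per-pass cache; resources are reduced to longs.
final class CachedUserLimit {
  boolean canAssign = true;   // set to false once a check has failed
  long reservation = 0;       // largest reservation that failed so far
}

final class AssignPassCache {
  private final Map<String, CachedUserLimit> byUser = new HashMap<>();

  // realCheck stands in for the expensive canAssignToUser() call.
  boolean canAssignToUser(String user, long reservation,
                          BiPredicate<String, Long> realCheck) {
    CachedUserLimit cul = byUser.computeIfAbsent(user, u -> new CachedUserLimit());
    if (!cul.canAssign && reservation <= cul.reservation) {
      return false;   // already failed with an equal or larger reservation
    }
    boolean ok = realCheck.test(user, reservation);
    if (!ok && reservation >= cul.reservation) {
      cul.canAssign = false;
      cul.reservation = reservation;
    }
    return ok;
  }
}
{code}
The canAssign flag is what lets the cache start empty instead of at UNBOUNDED, per the discussion above.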



> CapacityScheduler: Improvements to assignContainers()
> -
>
> Key: YARN-6775
> URL: https://issues.apache.org/jira/browse/YARN-6775
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Affects Versions: 2.8.1, 3.0.0-alpha3
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Attachments: YARN-6775.001.patch
>
>
> There are several things in assignContainers() that are done multiple times 
> even though the result cannot change (canAssignToUser, canAssignToQueue). Add 
> some local caching to take advantage of this fact.
> Will post patch shortly. Patch includes a simple throughput test that 
> demonstrates when we have users at their user-limit, the number of 
> NodeUpdateSchedulerEvents we can process can be improved from 13K/sec to 
> 50K/sec.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6797) TimelineWriter does not fully consume the POST response

2017-07-11 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082242#comment-16082242
 ] 

Nathan Roberts commented on YARN-6797:
--

Thanks Jason. +1 on this patch.

> TimelineWriter does not fully consume the POST response
> ---
>
> Key: YARN-6797
> URL: https://issues.apache.org/jira/browse/YARN-6797
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineclient
>Affects Versions: 2.8.1
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Attachments: YARN-6797.001.patch
>
>
> TimelineWriter does not fully consume the response to the POST request, and 
> that ends up preventing the HTTP client from being reused for the next write 
> of an entity.
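
The general pattern, sketched here with the Jersey 1.x client (the snippet is illustrative and not the actual TimelineWriter code): read the response entity and close the response so the keep-alive connection can be returned to the pool and reused.
{code}
import javax.ws.rs.core.MediaType;

import com.sun.jersey.api.client.Client;
import com.sun.jersey.api.client.ClientResponse;

// Illustrative only: POST a JSON payload and fully consume the response so the
// underlying HTTP connection can be reused for the next request.
public final class DrainResponse {
  static int postAndDrain(Client client, String uri, String json) {
    ClientResponse resp = client.resource(uri)
        .type(MediaType.APPLICATION_JSON_TYPE)
        .post(ClientResponse.class, json);
    try {
      resp.getEntity(String.class);   // reading the entity drains the stream
      return resp.getStatus();
    } finally {
      resp.close();                   // always release the connection
    }
  }
}
{code}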



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6763) TestProcfsBasedProcessTree#testProcessTree fails in trunk

2017-07-10 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16080345#comment-16080345
 ] 

Nathan Roberts commented on YARN-6763:
--

[~bibinchundatt] - Out of curiosity, what is the OS environment you're running 
in (RHEL7, etc.)? I still need to deal with cases where things don't get 
re-parented to PID 1, but I was curious which environments are doing this by 
default now. 

> TestProcfsBasedProcessTree#testProcessTree fails in trunk
> -
>
> Key: YARN-6763
> URL: https://issues.apache.org/jira/browse/YARN-6763
> Project: Hadoop YARN
>  Issue Type: Test
>Reporter: Bibin A Chundatt
>Assignee: Nathan Roberts
>Priority: Minor
>
> {code}
> Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 7.949 sec <<< 
> FAILURE! - in org.apache.hadoop.yarn.util.TestProcfsBasedProcessTree
> testProcessTree(org.apache.hadoop.yarn.util.TestProcfsBasedProcessTree)  Time 
> elapsed: 7.119 sec  <<< FAILURE!
> java.lang.AssertionError: Child process owned by init escaped process tree.
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at 
> org.apache.hadoop.yarn.util.TestProcfsBasedProcessTree.testProcessTree(TestProcfsBasedProcessTree.java:184)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6775) CapacityScheduler: Improvements to assignContainers()

2017-07-07 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16078677#comment-16078677
 ] 

Nathan Roberts commented on YARN-6775:
--

Below is the list of changes included in the patch. Each is prefixed with the 
new throughput number as reported by the included unit test case. (Run as: mvn 
test -Dtest=TestCapacityScheduler#testUserLimitThroughput 
-DRunUserLimitThroughput=true)
* 13500 - Baseline (baseline was 9100 prior to Daryn's set of improvements in 
YARN-6242)
* 15000 - In computeUserLimitAndSetHeadroom(), calculating headroom is not 
cheap, so only do it when user metrics are enabled - which is the only thing 
that depends on the result of getHeadroom().
* 2 - cache userlimit calculation within assignContainers() + Avoid 
canAssignToQueue() check if we've already calculated the worst-case condition 
(no possibility of freeing up a reservation to satisfy the request)
* 24000 - Avoid canAssignToUser() if we've already determined this user is over 
its limit given the current application's reservation request
* 53000 - Check for shouldRecordThisNode() earlier in 
recordRejectedAppActivityFromLeafQueue() to avoid expensive calculations that 
will just be thrown away later

> CapacityScheduler: Improvements to assignContainers()
> -
>
> Key: YARN-6775
> URL: https://issues.apache.org/jira/browse/YARN-6775
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Affects Versions: 2.8.1, 3.0.0-alpha3
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Attachments: YARN-6775.001.patch
>
>
> There are several things in assignContainers() that are done multiple times 
> even though the result cannot change (canAssignToUser, canAssignToQueue). Add 
> some local caching to take advantage of this fact.
> Will post patch shortly. Patch includes a simple throughput test that 
> demonstrates when we have users at their user-limit, the number of 
> NodeUpdateSchedulerEvents we can process can be improved from 13K/sec to 
> 50K/sec.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6775) CapacityScheduler: Improvements to assignContainers()

2017-07-07 Thread Nathan Roberts (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nathan Roberts updated YARN-6775:
-
Attachment: YARN-6775.001.patch

> CapacityScheduler: Improvements to assignContainers()
> -
>
> Key: YARN-6775
> URL: https://issues.apache.org/jira/browse/YARN-6775
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Affects Versions: 2.8.1, 3.0.0-alpha3
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Attachments: YARN-6775.001.patch
>
>
> There are several things in assignContainers() that are done multiple times 
> even though the result cannot change (canAssignToUser, canAssignToQueue). Add 
> some local caching to take advantage of this fact.
> Will post patch shortly. Patch includes a simple throughput test that 
> demonstrates when we have users at their user-limit, the number of 
> NodeUpdateSchedulerEvents we can process can be improved from 13K/sec to 
> 50K/sec.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-6775) CapacityScheduler: Improvements to assignContainers()

2017-07-07 Thread Nathan Roberts (JIRA)
Nathan Roberts created YARN-6775:


 Summary: CapacityScheduler: Improvements to assignContainers()
 Key: YARN-6775
 URL: https://issues.apache.org/jira/browse/YARN-6775
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Affects Versions: 3.0.0-alpha3, 2.8.1
Reporter: Nathan Roberts
Assignee: Nathan Roberts


There are several things in assignContainers() that are done multiple times 
even though the result cannot change (canAssignToUser, canAssignToQueue). Add 
some local caching to take advantage of this fact.

Will post patch shortly. Patch includes a simple throughput test that 
demonstrates when we have users at their user-limit, the number of 
NodeUpdateSchedulerEvents we can process can be improved from 13K/sec to 
50K/sec.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6768) Improve performance of yarn api record toString and fromString

2017-07-07 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16078331#comment-16078331
 ] 

Nathan Roberts commented on YARN-6768:
--

Thanks Jon! As a datapoint, I have a testcase that measures how quickly we can 
handle NodeUpdateSchedulerEvents when a user is at their user limit. This path 
causes the ActivitiesLogger to be invoked at a very high rate. The most 
expensive operation within the ActivitiesLogger is 
application.getApplicationId().toString(). When I apply the patch on this jira, 
the throughput improves from 22K/sec to 32K/sec.
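
One way to picture that kind of win (a hypothetical sketch, not the actual YARN-6768 patch): memoize the formatted id and build it with a StringBuilder rather than reformatting on every call, since hot paths like the ActivitiesLogger case above keep calling toString() on the same id.
{code}
// Hypothetical id class, not the real ApplicationId: format once, then reuse.
public final class DemoAppId {
  private final long clusterTimestamp;
  private final int id;
  private volatile String cached;   // benign race: at worst two threads format it

  public DemoAppId(long clusterTimestamp, int id) {
    this.clusterTimestamp = clusterTimestamp;
    this.id = id;
  }

  @Override
  public String toString() {
    String s = cached;
    if (s == null) {
      StringBuilder sb = new StringBuilder(32).append("application_")
          .append(clusterTimestamp).append('_');
      // Zero-pad the sequence number to 4 digits without String.format.
      for (int t = 1000; id < t && t > 1; t /= 10) {
        sb.append('0');
      }
      sb.append(id);
      cached = s = sb.toString();
    }
    return s;
  }
}
{code}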

> Improve performance of yarn api record toString and fromString
> --
>
> Key: YARN-6768
> URL: https://issues.apache.org/jira/browse/YARN-6768
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Attachments: YARN-6768.1.patch, YARN-6768.2.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6763) TestProcfsBasedProcessTree#testProcessTree fails in trunk

2017-07-07 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16078091#comment-16078091
 ] 

Nathan Roberts commented on YARN-6763:
--

[~bibinchundatt] thanks for reporting this. I'll take a look at what's causing 
the failure.

> TestProcfsBasedProcessTree#testProcessTree fails in trunk
> -
>
> Key: YARN-6763
> URL: https://issues.apache.org/jira/browse/YARN-6763
> Project: Hadoop YARN
>  Issue Type: Test
>Reporter: Bibin A Chundatt
>Assignee: Nathan Roberts
>Priority: Minor
>
> {code}
> Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 7.949 sec <<< 
> FAILURE! - in org.apache.hadoop.yarn.util.TestProcfsBasedProcessTree
> testProcessTree(org.apache.hadoop.yarn.util.TestProcfsBasedProcessTree)  Time 
> elapsed: 7.119 sec  <<< FAILURE!
> java.lang.AssertionError: Child process owned by init escaped process tree.
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at 
> org.apache.hadoop.yarn.util.TestProcfsBasedProcessTree.testProcessTree(TestProcfsBasedProcessTree.java:184)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-6763) TestProcfsBasedProcessTree#testProcessTree fails in trunk

2017-07-07 Thread Nathan Roberts (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nathan Roberts reassigned YARN-6763:


Assignee: Nathan Roberts

> TestProcfsBasedProcessTree#testProcessTree fails in trunk
> -
>
> Key: YARN-6763
> URL: https://issues.apache.org/jira/browse/YARN-6763
> Project: Hadoop YARN
>  Issue Type: Test
>Reporter: Bibin A Chundatt
>Assignee: Nathan Roberts
>Priority: Minor
>
> {code}
> Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 7.949 sec <<< 
> FAILURE! - in org.apache.hadoop.yarn.util.TestProcfsBasedProcessTree
> testProcessTree(org.apache.hadoop.yarn.util.TestProcfsBasedProcessTree)  Time 
> elapsed: 7.119 sec  <<< FAILURE!
> java.lang.AssertionError: Child process owned by init escaped process tree.
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at 
> org.apache.hadoop.yarn.util.TestProcfsBasedProcessTree.testProcessTree(TestProcfsBasedProcessTree.java:184)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6649) RollingLevelDBTimelineServer throws RuntimeException if object decoding ever fails runtime exception

2017-05-31 Thread Nathan Roberts (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nathan Roberts updated YARN-6649:
-
Fix Version/s: 2.8.2
   2.8.1
   2.9.0

> RollingLevelDBTimelineServer throws RuntimeException if object decoding ever 
> fails runtime exception
> 
>
> Key: YARN-6649
> URL: https://issues.apache.org/jira/browse/YARN-6649
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
>Priority: Critical
> Fix For: 2.9.0, 2.8.1, 3.0.0-alpha4, 2.8.2
>
> Attachments: YARN-6649.1.patch, YARN-6649.2.patch
>
>
> When using the Tez UI (which makes REST API calls to the timeline service 
> REST API), some calls were coming back as 500 internal server errors. The 
> root cause was YARN-6654. This jira is to handle object decoding failures so 
> that, instead of sending internal server errors back to the client, the 
> server responds with a partial message.
> {code}
> 2017-05-30 12:47:10,670 WARN 
> org.apache.hadoop.yarn.webapp.GenericExceptionHandler: INTERNAL_SERVER_ERROR
> javax.ws.rs.WebApplicationException: java.lang.RuntimeException: 
> java.io.IOException: java.lang.RuntimeException: unable to encodeValue class 
> from code 1000
>   at 
> org.apache.hadoop.yarn.server.timeline.webapp.TimelineWebServices.getEntity(TimelineWebServices.java:164)
>   at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
>   at 
> com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
>   at 
> com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
>   at 
> com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
>   at 
> com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
>   at 
> com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
>   at 
> com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
>   at 
> com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
>   at 
> com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469)
>   at 
> com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400)
>   at 
> com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
>   at 
> com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
>   at 
> com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:886)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
>   at 
> com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
>   at 
> com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
>   at 
> com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
>   at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:636)
>   at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter.doFilter(DelegationTokenAuthenticationFilter.java:294)
>   at 
> org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:588)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> org.apache.hadoop.security.http.CrossOriginFilter.doFilter(CrossOriginFilter.java:95)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> 

[jira] [Updated] (YARN-6649) RollingLevelDBTimelineServer throws RuntimeException if object decoding ever fails runtime exception

2017-05-31 Thread Nathan Roberts (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nathan Roberts updated YARN-6649:
-
Fix Version/s: 3.0.0-alpha4

> RollingLevelDBTimelineServer throws RuntimeException if object decoding ever 
> fails runtime exception
> 
>
> Key: YARN-6649
> URL: https://issues.apache.org/jira/browse/YARN-6649
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
>Priority: Critical
> Fix For: 3.0.0-alpha4
>
> Attachments: YARN-6649.1.patch, YARN-6649.2.patch
>
>
> When using the Tez UI (which makes REST API calls to the timeline service 
> REST API), some calls were coming back as 500 internal server errors. The 
> root cause was YARN-6654. This jira is to handle object decoding failures so 
> that, instead of sending internal server errors back to the client, the 
> server responds with a partial message.
> {code}
> 2017-05-30 12:47:10,670 WARN 
> org.apache.hadoop.yarn.webapp.GenericExceptionHandler: INTERNAL_SERVER_ERROR
> javax.ws.rs.WebApplicationException: java.lang.RuntimeException: 
> java.io.IOException: java.lang.RuntimeException: unable to encodeValue class 
> from code 1000
>   at 
> org.apache.hadoop.yarn.server.timeline.webapp.TimelineWebServices.getEntity(TimelineWebServices.java:164)
>   at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
>   at 
> com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
>   at 
> com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
>   at 
> com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
>   at 
> com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
>   at 
> com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
>   at 
> com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
>   at 
> com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
>   at 
> com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469)
>   at 
> com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400)
>   at 
> com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
>   at 
> com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
>   at 
> com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:886)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
>   at 
> com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
>   at 
> com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
>   at 
> com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
>   at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:636)
>   at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter.doFilter(DelegationTokenAuthenticationFilter.java:294)
>   at 
> org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:588)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> org.apache.hadoop.security.http.CrossOriginFilter.doFilter(CrossOriginFilter.java:95)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1352)
>   

[jira] [Commented] (YARN-6649) RollingLevelDBTimelineServer throws RuntimeException if object decoding ever fails runtime exception

2017-05-30 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030211#comment-16030211
 ] 

Nathan Roberts commented on YARN-6649:
--

Thanks Jon for the explanation. I'll commit this tomorrow morning unless there 
are objections.
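
For anyone following along, the gist of the change as I understand it is sketched 
below. This is an illustrative sketch only, not the actual patch; all names here 
are made up.
{code}
import java.util.ArrayList;
import java.util.List;

/** Sketch: decode each stored value defensively and keep whatever succeeds,
 *  so a single undecodable value yields a partial response, not a 500. */
class PartialDecodeSketch {
  interface Decoder {
    Object decode(byte[] raw) throws Exception;
  }

  static List<Object> decodeAll(List<byte[]> rawValues, Decoder decoder) {
    List<Object> decoded = new ArrayList<>();
    for (byte[] raw : rawValues) {
      try {
        decoded.add(decoder.decode(raw));
      } catch (Exception e) {
        // Skip the value that failed to decode and keep going; the caller
        // receives a partial entity instead of an internal server error.
      }
    }
    return decoded;
  }
}
{code}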


> RollingLevelDBTimelineServer throws RuntimeException if object decoding ever 
> fails runtime exception
> 
>
> Key: YARN-6649
> URL: https://issues.apache.org/jira/browse/YARN-6649
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
>Priority: Critical
> Attachments: YARN-6649.1.patch, YARN-6649.2.patch
>
>
> When using the Tez UI (which makes REST API calls to the timeline service REST 
> API), some calls were coming back as 500 internal server errors. The root cause 
> was YARN-6654. This jira is to handle object decoding failures so that the 
> client receives a partial message instead of an internal server error.
> {code}
> 2017-05-30 12:47:10,670 WARN 
> org.apache.hadoop.yarn.webapp.GenericExceptionHandler: INTERNAL_SERVER_ERROR
> javax.ws.rs.WebApplicationException: java.lang.RuntimeException: 
> java.io.IOException: java.lang.RuntimeException: unable to encodeValue class 
> from code 1000
>   at 
> org.apache.hadoop.yarn.server.timeline.webapp.TimelineWebServices.getEntity(TimelineWebServices.java:164)
>   at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
>   at 
> com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
>   at 
> com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
>   at 
> com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
>   at 
> com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
>   at 
> com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
>   at 
> com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
>   at 
> com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
>   at 
> com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469)
>   at 
> com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400)
>   at 
> com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
>   at 
> com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
>   at 
> com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:886)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
>   at 
> com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
>   at 
> com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
>   at 
> com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
>   at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:636)
>   at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter.doFilter(DelegationTokenAuthenticationFilter.java:294)
>   at 
> org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:588)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> org.apache.hadoop.security.http.CrossOriginFilter.doFilter(CrossOriginFilter.java:95)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> 

[jira] [Commented] (YARN-6649) RollingLevelDBTimelineServer throws RuntimeException if object decoding ever fails runtime exception

2017-05-30 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030070#comment-16030070
 ] 

Nathan Roberts commented on YARN-6649:
--

Thanks Jon for the patch. Since there is no unit test, could you please comment 
on how you verified correctness? With that, I am +1 on version 2 of the patch.

> RollingLevelDBTimelineServer throws RuntimeException if object decoding ever 
> fails runtime exception
> 
>
> Key: YARN-6649
> URL: https://issues.apache.org/jira/browse/YARN-6649
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
>Priority: Critical
> Attachments: YARN-6649.1.patch, YARN-6649.2.patch
>
>
> When using the Tez UI (which makes REST API calls to the timeline service REST 
> API), some calls were coming back as 500 internal server errors. The root cause 
> was YARN-6654. This jira is to handle object decoding failures so that the 
> client receives a partial message instead of an internal server error.
> {code}
> 2017-05-30 12:47:10,670 WARN 
> org.apache.hadoop.yarn.webapp.GenericExceptionHandler: INTERNAL_SERVER_ERROR
> javax.ws.rs.WebApplicationException: java.lang.RuntimeException: 
> java.io.IOException: java.lang.RuntimeException: unable to encodeValue class 
> from code 1000
>   at 
> org.apache.hadoop.yarn.server.timeline.webapp.TimelineWebServices.getEntity(TimelineWebServices.java:164)
>   at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
>   at 
> com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
>   at 
> com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
>   at 
> com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
>   at 
> com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
>   at 
> com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
>   at 
> com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
>   at 
> com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
>   at 
> com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469)
>   at 
> com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400)
>   at 
> com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
>   at 
> com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
>   at 
> com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:886)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
>   at 
> com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
>   at 
> com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
>   at 
> com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
>   at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:636)
>   at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter.doFilter(DelegationTokenAuthenticationFilter.java:294)
>   at 
> org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:588)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> org.apache.hadoop.security.http.CrossOriginFilter.doFilter(CrossOriginFilter.java:95)
>   at 
> 

[jira] [Comment Edited] (YARN-6585) RM fails to start when upgrading from 2.7 to 2.8 for clusters with node labels.

2017-05-11 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16007016#comment-16007016
 ] 

Nathan Roberts edited comment on YARN-6585 at 5/11/17 7:13 PM:
---

YARN-6143 changed AddToClusterNodeLabelsRequestProto such that field 1 is now 
referred to as deprecatedNodeLabels. FileSystemNodeLabelsStore is referencing 
field 2 (nodeLabels) which will not be present in a 2.7 labelStore.

CC: [~leftnoteasy] [~sunilg]



was (Author: nroberts):
YARN-6143 changed AddToClusterNodeLabelsRequestProto such that field 1 is now 
referred to as deprecatedNodeLabels. FileSystemNodeLabelsStore is referencing 
field 2 (nodeLabels) which will not be present in a 2.7 labelStore.

CC: [~leftnoteasy]


> RM fails to start when upgrading from 2.7 to 2.8 for clusters with node 
> labels.
> ---
>
> Key: YARN-6585
> URL: https://issues.apache.org/jira/browse/YARN-6585
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Eric Payne
>Priority: Blocker
>
> {noformat}
> Caused by: java.io.IOException: Not all labels being replaced contained by 
> known label collections, please check, new labels=[abc]
> at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.checkReplaceLabelsOnNode(CommonNodeLabelsManager.java:718)
> at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.replaceLabelsOnNode(CommonNodeLabelsManager.java:737)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.replaceLabelsOnNode(RMNodeLabelsManager.java:189)
> at 
> org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.loadFromMirror(FileSystemNodeLabelsStore.java:181)
> at 
> org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.recover(FileSystemNodeLabelsStore.java:208)
> at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.initNodeLabelStore(CommonNodeLabelsManager.java:251)
> at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStart(CommonNodeLabelsManager.java:265)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> ... 13 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6585) RM fails to start when upgrading from 2.7 to 2.8 for clusters with node labels.

2017-05-11 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16007016#comment-16007016
 ] 

Nathan Roberts commented on YARN-6585:
--

YARN-6143 changed AddToClusterNodeLabelsRequestProto such that field 1 is now 
referred to as deprecatedNodeLabels. FileSystemNodeLabelsStore is referencing 
field 2 (nodeLabels) which will not be present in a 2.7 labelStore.

CC: [~leftnoteasy]
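
To make the mismatch concrete, here is a minimal model of the request as described 
above. The field numbering follows the comment; the class shape and the recovery 
helper are assumptions for illustration, not the actual generated protobuf API.
{code}
import java.util.Collections;
import java.util.List;

/** Minimal model of AddToClusterNodeLabelsRequest as described above. */
class AddToClusterNodeLabelsRequestModel {
  List<String> deprecatedNodeLabels = Collections.emptyList(); // proto field 1: what a 2.7 labelStore wrote
  List<String> nodeLabels = Collections.emptyList();           // proto field 2: what 2.8 readers look for

  /** A reader that only consults field 2 sees nothing in a 2.7 mirror; a
   *  recovery path needs to fall back to field 1. */
  List<String> labelsForRecovery() {
    return nodeLabels.isEmpty() ? deprecatedNodeLabels : nodeLabels;
  }
}
{code}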


> RM fails to start when upgrading from 2.7 to 2.8 for clusters with node 
> labels.
> ---
>
> Key: YARN-6585
> URL: https://issues.apache.org/jira/browse/YARN-6585
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Eric Payne
>Priority: Blocker
>
> {noformat}
> Caused by: java.io.IOException: Not all labels being replaced contained by 
> known label collections, please check, new labels=[abc]
> at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.checkReplaceLabelsOnNode(CommonNodeLabelsManager.java:718)
> at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.replaceLabelsOnNode(CommonNodeLabelsManager.java:737)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.replaceLabelsOnNode(RMNodeLabelsManager.java:189)
> at 
> org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.loadFromMirror(FileSystemNodeLabelsStore.java:181)
> at 
> org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.recover(FileSystemNodeLabelsStore.java:208)
> at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.initNodeLabelStore(CommonNodeLabelsManager.java:251)
> at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStart(CommonNodeLabelsManager.java:265)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> ... 13 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6344) Rethinking OFF_SWITCH locality in CapacityScheduler

2017-03-20 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15932783#comment-15932783
 ] 

Nathan Roberts commented on YARN-6344:
--

+1 on improving localityWaitFactor. It definitely won't behave well for 
applications that ask for resources in small batches. 
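
A quick sketch of the behavior being discussed, with simplified names (this is not 
the actual CapacityScheduler code):
{code}
/** Illustrative only: off-switch relaxation as described in this issue. */
final class LocalityWaitSketch {
  /** Outstanding requests for a scheduler key divided by cluster size, capped at 1. */
  static float localityWaitFactor(int outstandingRequests, int clusterNodes) {
    return Math.min((float) outstandingRequests / clusterNodes, 1.0f);
  }

  /** Off-switch is allowed once missed opportunities reach factor * clusterNodes. */
  static boolean canRelaxToOffSwitch(int missed, int outstandingRequests, int clusterNodes) {
    return missed >= localityWaitFactor(outstandingRequests, clusterNodes) * clusterNodes;
  }

  public static void main(String[] args) {
    // One outstanding request on a 4000-node cluster: factor = 0.00025, so a
    // single missed scheduling opportunity already relaxes to off-switch.
    System.out.println(canRelaxToOffSwitch(1, 1, 4000));
  }
}
{code}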

> Rethinking OFF_SWITCH locality in CapacityScheduler
> ---
>
> Key: YARN-6344
> URL: https://issues.apache.org/jira/browse/YARN-6344
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Reporter: Konstantinos Karanasos
>
> When relaxing locality from node to rack, the {{node-locality-parameter}} is 
> used: when scheduling opportunities for a scheduler key are more than the 
> value of this parameter, we relax locality and try to assign the container to 
> a node in the corresponding rack.
> On the other hand, when relaxing locality to off-switch (i.e., assign the 
> container anywhere in the cluster), we are using a {{localityWaitFactor}}, 
> which is computed based on the number of outstanding requests for a specific 
> scheduler key, which is divided by the size of the cluster. 
> In case of applications that request containers in big batches (e.g., 
> traditional MR jobs), and for relatively small clusters, the 
> localityWaitFactor does not affect relaxing locality much.
> However, in case of applications that request containers in small batches, 
> this load factor takes a very small value, which leads to assigning 
> off-switch containers too soon. This situation is even more pronounced in big 
> clusters.
> For example, if an application requests only one container per request, the 
> locality will be relaxed after a single missed scheduling opportunity.
> The purpose of this JIRA is to rethink the way we are relaxing locality for 
> off-switch assignments.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5179) Issue of CPU usage of containers

2017-02-06 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15854387#comment-15854387
 ] 

Nathan Roberts commented on YARN-5179:
--

bq. I think ResourceUtilization.getCPU() has a similar sort of issue (i.e. it's 
difficult to interpret in cases where physical cores != vcores). Should we fix 
that here or in a separate jira? Thoughts?
Never mind. I think physicalResource allows the RM to figure out how to properly 
interpret getCPU().


> Issue of CPU usage of containers
> 
>
> Key: YARN-5179
> URL: https://issues.apache.org/jira/browse/YARN-5179
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.0
> Environment: Both on Windows and Linux
>Reporter: Zhongkai Mi
>
> // Multiply by 1000 to avoid losing data when converting to int 
>int milliVcoresUsed = (int) (cpuUsageTotalCoresPercentage * 1000 
>   * maxVCoresAllottedForContainers /nodeCpuPercentageForYARN); 
> This formula will not compute the correct vcore-based CPU usage if vcores != 
> physical cores. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5179) Issue of CPU usage of containers

2017-02-06 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15854374#comment-15854374
 ] 

Nathan Roberts commented on YARN-5179:
--

I think ResourceUtilization.getCPU() has a similar sort of issue (i.e. it's 
difficult to interpret in cases where physical cores != vcores). Should we fix 
that here or in a separate jira? Thoughts?


> Issue of CPU usage of containers
> 
>
> Key: YARN-5179
> URL: https://issues.apache.org/jira/browse/YARN-5179
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.0
> Environment: Both on Windows and Linux
>Reporter: Zhongkai Mi
>
> // Multiply by 1000 to avoid losing data when converting to int 
>int milliVcoresUsed = (int) (cpuUsageTotalCoresPercentage * 1000 
>   * maxVCoresAllottedForContainers /nodeCpuPercentageForYARN); 
> This formula will not compute the correct vcore-based CPU usage if vcores != 
> physical cores. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-2904) Use linux cgroups to enhance container tear down

2016-12-16 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15754975#comment-15754975
 ] 

Nathan Roberts commented on YARN-2904:
--

A simple streaming job that does the following illustrates tasks escaping 
(/usr/bin/timeout does a setpgrp(), which puts it in its own process group): 
{noformat}
#!/bin/bash
/usr/bin/timeout 1d /bin/sleep 1000
{noformat}

Mesos has apparently addressed this a couple of different ways including 1) 
freeze_container->kill_all_processes_in_container->unfreeze_container; or 2) 
use a private PID NS within the container and then kill PID1 within the 
container. 
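
For reference, approach (1) boils down to something like the sketch below, using 
the cgroup v1 freezer. The cgroup path is hypothetical and error handling is 
omitted; it is meant only to show the freeze/kill/thaw sequence.
{code}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Sketch of freeze -> kill everything in the cgroup -> thaw. */
class FreezerTearDownSketch {
  static void tearDown(Path containerCgroup) throws IOException, InterruptedException {
    Path state = containerCgroup.resolve("freezer.state");
    // Freeze first so nothing can fork while we enumerate and kill.
    Files.write(state, "FROZEN".getBytes(StandardCharsets.UTF_8));
    for (String pid : Files.readAllLines(containerCgroup.resolve("cgroup.procs"))) {
      if (!pid.trim().isEmpty()) {
        new ProcessBuilder("kill", "-9", pid.trim()).start().waitFor();
      }
    }
    // Thaw so the pending SIGKILLs are actually delivered.
    Files.write(state, "THAWED".getBytes(StandardCharsets.UTF_8));
  }

  public static void main(String[] args) throws Exception {
    tearDown(Paths.get("/sys/fs/cgroup/freezer/hadoop-yarn/container_01")); // hypothetical path
  }
}
{code}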

> Use linux cgroups to enhance container tear down
> 
>
> Key: YARN-2904
> URL: https://issues.apache.org/jira/browse/YARN-2904
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Nathan Roberts
>
> If we are launching yarn containers within cgroups, linux provides some 
> guarantees that can help completely tear down a container.  Specifically, 
> linux guarantees that tasks can't escape a cgroup. We can use this fact to 
> tear down a yarn container without leaking tasks.
> Today, a SIGTERM is sent to the session (normally lead by bash). When the 
> session leader exits, the LCE sees this and assumes all resources have been 
> given back to the system. This is not guaranteed. Example: YARN-2809 
> implements a workaround that is only necessary because tasks are still 
> lingering within the cgroup when the nodemanager attempts to delete it.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5356) NodeManager should communicate physical resource capability to ResourceManager

2016-10-31 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15622245#comment-15622245
 ] 

Nathan Roberts commented on YARN-5356:
--

Thanks [~elgoiri] for the update. +1 (non-binding) on version 7.

> NodeManager should communicate physical resource capability to ResourceManager
> --
>
> Key: YARN-5356
> URL: https://issues.apache.org/jira/browse/YARN-5356
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, resourcemanager
>Affects Versions: 3.0.0-alpha1
>Reporter: Nathan Roberts
>Assignee: Inigo Goiri
>  Labels: oct16-medium
> Attachments: YARN-5356.000.patch, YARN-5356.001.patch, 
> YARN-5356.002.patch, YARN-5356.002.patch, YARN-5356.003.patch, 
> YARN-5356.004.patch, YARN-5356.005.patch, YARN-5356.006.patch, 
> YARN-5356.007.patch
>
>
> Currently ResourceUtilization contains absolute quantities of resource used 
> (e.g. 4096MB memory used). It would be good if the NM also communicated the 
> actual physical resource capabilities of the node so that the RM can use this 
> data to schedule more effectively (overcommit, etc)
> Currently the only available information is the Resource the node registered 
> with (or later updated using updateNodeResource). However, these aren't 
> really sufficient to get a good view of how utilized a resource is. For 
> example, if a node reports 400% CPU utilization, does that mean it's 
> completely full, or barely utilized? Today there is no reliable way to figure 
> this out.
> [~elgoiri] - Lots of good work is happening in YARN-2965 so curious if you 
> have thoughts/opinions on this?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5356) NodeManager should communicate physical resource capability to ResourceManager

2016-10-28 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15616725#comment-15616725
 ] 

Nathan Roberts commented on YARN-5356:
--

Hi [~elgoiri]. Looked over version 6 of the patch. I am wondering about the 
comment regarding order-of-operations at: 
https://issues.apache.org/jira/browse/YARN-5356?focusedCommentId=15383184=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15383184

I think it's incorrect because we need the cast to happen after we've scaled 
down to MB, but right now I think it's happening on the result of 
getPhysicalMemorySize(). LMK if I'm not in sync on this.
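
A small example of why the order matters (variable names are mine, just to 
illustrate the point):
{code}
/** Casting the byte count to int before scaling to MB overflows for any node
 *  with more than ~2GB of memory; scaling down first is what we want. */
class CastOrderExample {
  public static void main(String[] args) {
    long physicalMemoryBytes = 64L * 1024 * 1024 * 1024;            // a 64GB node
    int castFirst = ((int) physicalMemoryBytes) / (1024 * 1024);    // overflows to 0
    int scaleFirst = (int) (physicalMemoryBytes / (1024 * 1024));   // 65536 MB
    System.out.println(castFirst + " vs " + scaleFirst);
  }
}
{code}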





> NodeManager should communicate physical resource capability to ResourceManager
> --
>
> Key: YARN-5356
> URL: https://issues.apache.org/jira/browse/YARN-5356
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, resourcemanager
>Affects Versions: 3.0.0-alpha1
>Reporter: Nathan Roberts
>Assignee: Inigo Goiri
>  Labels: oct16-medium
> Attachments: YARN-5356.000.patch, YARN-5356.001.patch, 
> YARN-5356.002.patch, YARN-5356.002.patch, YARN-5356.003.patch, 
> YARN-5356.004.patch, YARN-5356.005.patch, YARN-5356.006.patch
>
>
> Currently ResourceUtilization contains absolute quantities of resource used 
> (e.g. 4096MB memory used). It would be good if the NM also communicated the 
> actual physical resource capabilities of the node so that the RM can use this 
> data to schedule more effectively (overcommit, etc)
> Currently the only available information is the Resource the node registered 
> with (or later updated using updateNodeResource). However, these aren't 
> really sufficient to get a good view of how utilized a resource is. For 
> example, if a node reports 400% CPU utilization, does that mean it's 
> completely full, or barely utilized? Today there is no reliable way to figure 
> this out.
> [~elgoiri] - Lots of good work is happening in YARN-2965 so curious if you 
> have thoughts/opinions on this?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4963) capacity scheduler: Make number of OFF_SWITCH assignments per heartbeat configurable

2016-10-27 Thread Nathan Roberts (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nathan Roberts updated YARN-4963:
-
Attachment: YARN-4963.004.patch

rebased with trunk
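
The shape of the change, roughly, is sketched below. Names and the default are 
placeholders for illustration, not the actual patch or config key.
{code}
/** Minimal sketch of a configurable per-heartbeat off-switch limit. */
class OffSwitchPerHeartbeatLimit {
  private final int maxOffSwitchAssignments;
  private int assignedThisHeartbeat;

  OffSwitchPerHeartbeatLimit(int maxOffSwitchAssignments) {
    this.maxOffSwitchAssignments = maxOffSwitchAssignments; // today this is effectively 1
  }

  void startHeartbeat() {
    assignedThisHeartbeat = 0;
  }

  boolean mayAssignOffSwitch() {
    return assignedThisHeartbeat < maxOffSwitchAssignments;
  }

  void recordOffSwitchAssignment() {
    assignedThisHeartbeat++;
  }
}
{code}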

> capacity scheduler: Make number of OFF_SWITCH assignments per heartbeat 
> configurable
> 
>
> Key: YARN-4963
> URL: https://issues.apache.org/jira/browse/YARN-4963
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 2.7.2, 3.0.0-alpha1
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
>  Labels: oct16-easy
> Attachments: YARN-4963.001.patch, YARN-4963.002.patch, 
> YARN-4963.003.patch, YARN-4963.004.patch
>
>
> Currently the capacity scheduler will allow exactly 1 OFF_SWITCH assignment 
> per heartbeat. With more and more non MapReduce workloads coming along, the 
> degree of locality is declining, causing scheduling to be significantly 
> slower. It's still important to limit the number of OFF_SWITCH assignments to 
> avoid densely packing OFF_SWITCH containers onto nodes. 
> Proposal is to add a simple config that makes the number of OFF_SWITCH 
> assignments configurable.
> Will upload candidate patch shortly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5356) NodeManager should communicate physical resource capability to ResourceManager

2016-09-19 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15504823#comment-15504823
 ] 

Nathan Roberts commented on YARN-5356:
--

Hi [~elgoiri]. I tried out the patch but got an NPE in the RM because 
physicalResource is null. I think this code in 
org.apache.hadoop.yarn.server.api.protocolrecords.impl.pb.RegisterNodeManagerRequestPBImpl.setPhysicalResource
 needs to set it via the builder as well.
{code}
  @Override
  public synchronized void setPhysicalResource(Resource pPhysicalResource) {
    maybeInitBuilder();
    if (pPhysicalResource == null) {
      builder.clearPhysicalResource();
    }
    this.physicalResource = pPhysicalResource;
  }
{code}
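
Roughly what I'd expect instead is sketched below. I'm assuming the usual PBImpl 
pattern where a convertToProtoFormat helper mirrors the record into the builder, 
so treat that call as an assumption rather than the committed fix.
{code}
  @Override
  public synchronized void setPhysicalResource(Resource pPhysicalResource) {
    maybeInitBuilder();
    if (pPhysicalResource == null) {
      builder.clearPhysicalResource();
    } else {
      // Mirror the value into the proto builder so it survives getProto().
      builder.setPhysicalResource(convertToProtoFormat(pPhysicalResource));
    }
    this.physicalResource = pPhysicalResource;
  }
{code}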

> NodeManager should communicate physical resource capability to ResourceManager
> --
>
> Key: YARN-5356
> URL: https://issues.apache.org/jira/browse/YARN-5356
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, resourcemanager
>Affects Versions: 3.0.0-alpha1
>Reporter: Nathan Roberts
>Assignee: Inigo Goiri
> Attachments: YARN-5356.000.patch, YARN-5356.001.patch, 
> YARN-5356.002.patch
>
>
> Currently ResourceUtilization contains absolute quantities of resource used 
> (e.g. 4096MB memory used). It would be good if the NM also communicated the 
> actual physical resource capabilities of the node so that the RM can use this 
> data to schedule more effectively (overcommit, etc)
> Currently the only available information is the Resource the node registered 
> with (or later updated using updateNodeResource). However, these aren't 
> really sufficient to get a good view of how utilized a resource is. For 
> example, if a node reports 400% CPU utilization, does that mean it's 
> completely full, or barely utilized? Today there is no reliable way to figure 
> this out.
> [~elgoiri] - Lots of good work is happening in YARN-2965 so curious if you 
> have thoughts/opinions on this?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-3432) Cluster metrics have wrong Total Memory when there is reserved memory on CS

2016-09-01 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15456537#comment-15456537
 ] 

Nathan Roberts commented on YARN-3432:
--

I recently ran into this issue again. It just seems wrong that totalMB fluctuates 
significantly due to reservedMB moving around. Since we can't reserve beyond the 
size of the cluster, it seems fine to have Total MB = Available MB + Allocated MB 
+ Reserved MB for the capacity scheduler. 

I guess this could be an incompatible change for anyone who's worked around the 
problem by adding reservedMB to totalMB. [~vinodkv], [~ka...@cloudera.com], do 
others have comments on this aspect?
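
With the numbers from the description (300GB total, 50GB allocated, 10GB 
reserved), the proposed accounting works out as follows, assuming availableMB 
already excludes both allocated and reserved memory:
{code}
/** Worked example of Total MB = Available MB + Allocated MB + Reserved MB. */
class TotalMemoryExample {
  public static void main(String[] args) {
    long allocatedMB = 50L * 1024;
    long reservedMB  = 10L * 1024;
    long availableMB = 240L * 1024;                          // 300 - 50 - 10
    long totalMB = availableMB + allocatedMB + reservedMB;   // 300 * 1024, not 290 * 1024
    System.out.println(totalMB);
  }
}
{code}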


> Cluster metrics have wrong Total Memory when there is reserved memory on CS
> ---
>
> Key: YARN-3432
> URL: https://issues.apache.org/jira/browse/YARN-3432
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler, resourcemanager
>Affects Versions: 2.6.0
>Reporter: Thomas Graves
>Assignee: Brahma Reddy Battula
> Attachments: YARN-3432-002.patch, YARN-3432-003.patch, YARN-3432.patch
>
>
> I noticed that when reservations happen when using the Capacity Scheduler, 
> the UI and web services report the wrong total memory.
> For example.  I have a 300GB of total memory in my cluster.  I allocate 50 
> and I reserve 10.  The cluster metrics for total memory get reported as 290GB.
> This was broken by https://issues.apache.org/jira/browse/YARN-656 so perhaps 
> there is a difference between fair scheduler and capacity scheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5202) Dynamic Overcommit of Node Resources - POC

2016-08-26 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15440131#comment-15440131
 ] 

Nathan Roberts commented on YARN-5202:
--

Yahoo has received a letter accusing the patches originally contributed as part 
of this jira (specifically YARN-5202.patch dated 06/Jun/16 16:35, and 
YARN-5202-branch2.7-uber.patch dated 18/Jul/16 12:45) of infringing two U.S. 
patents.  While Yahoo believes this assertion has no merit, the patches have 
been removed pending further investigation.

> Dynamic Overcommit of Node Resources - POC
> --
>
> Key: YARN-5202
> URL: https://issues.apache.org/jira/browse/YARN-5202
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, resourcemanager
>Affects Versions: 3.0.0-alpha1
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
>
> This Jira is to present a proof-of-concept implementation (collaboration 
> between [~jlowe] and myself) of a dynamic over-commit implementation in YARN. 
>  The type of over-commit implemented in this jira is similar to but not as 
> full-featured as what's being implemented via YARN-1011. YARN-1011 is where 
> we see ourselves heading but we needed something quick and completely 
> transparent so that we could test it at scale with our varying workloads 
> (mainly MapReduce, Spark, and Tez). Doing so has shed some light on how much 
> additional capacity we can achieve with over-commit approaches, and has 
> fleshed out some of the problems these approaches will face.
> Primary design goals:
> - Avoid changing protocols, application frameworks, or core scheduler logic,  
> - simply adjust individual nodes' available resources based on current node 
> utilization and then let scheduler do what it normally does
> - Over-commit slowly, pull back aggressively - If things are looking good and 
> there is demand, slowly add resource. If memory starts to look over-utilized, 
> aggressively reduce the amount of over-commit.
> - Make sure the nodes protect themselves - i.e. if memory utilization on a 
> node gets too high, preempt something - preferably something from a 
> preemptable queue
> A patch against trunk will be attached shortly.  Some notes on the patch:
> - This feature was originally developed against something akin to 2.7.  Since 
> the patch is mainly to explain the approach, we didn't do any sort of testing 
> against trunk except for basic build and basic unit tests
> - The key pieces of functionality are in {{SchedulerNode}}, 
> {{AbstractYarnScheduler}}, and {{NodeResourceMonitorImpl}}. The remainder of 
> the patch is mainly UI, Config, Metrics, Tests, and some minor code 
> duplication (e.g. to optimize node resource changes we treat an over-commit 
> resource change differently than an updateNodeResource change - i.e. 
> remove_node/add_node is just too expensive for the frequency of over-commit 
> changes)
> - We only over-commit memory at this point. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5202) Dynamic Overcommit of Node Resources - POC

2016-08-26 Thread Nathan Roberts (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nathan Roberts updated YARN-5202:
-
Attachment: (was: YARN-5202.patch)

> Dynamic Overcommit of Node Resources - POC
> --
>
> Key: YARN-5202
> URL: https://issues.apache.org/jira/browse/YARN-5202
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, resourcemanager
>Affects Versions: 3.0.0-alpha1
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
>
> This Jira is to present a proof-of-concept implementation (collaboration 
> between [~jlowe] and myself) of a dynamic over-commit implementation in YARN. 
>  The type of over-commit implemented in this jira is similar to but not as 
> full-featured as what's being implemented via YARN-1011. YARN-1011 is where 
> we see ourselves heading but we needed something quick and completely 
> transparent so that we could test it at scale with our varying workloads 
> (mainly MapReduce, Spark, and Tez). Doing so has shed some light on how much 
> additional capacity we can achieve with over-commit approaches, and has 
> fleshed out some of the problems these approaches will face.
> Primary design goals:
> - Avoid changing protocols, application frameworks, or core scheduler logic,  
> - simply adjust individual nodes' available resources based on current node 
> utilization and then let scheduler do what it normally does
> - Over-commit slowly, pull back aggressively - If things are looking good and 
> there is demand, slowly add resource. If memory starts to look over-utilized, 
> aggressively reduce the amount of over-commit.
> - Make sure the nodes protect themselves - i.e. if memory utilization on a 
> node gets too high, preempt something - preferably something from a 
> preemptable queue
> A patch against trunk will be attached shortly.  Some notes on the patch:
> - This feature was originally developed against something akin to 2.7.  Since 
> the patch is mainly to explain the approach, we didn't do any sort of testing 
> against trunk except for basic build and basic unit tests
> - The key pieces of functionality are in {{SchedulerNode}}, 
> {{AbstractYarnScheduler}}, and {{NodeResourceMonitorImpl}}. The remainder of 
> the patch is mainly UI, Config, Metrics, Tests, and some minor code 
> duplication (e.g. to optimize node resource changes we treat an over-commit 
> resource change differently than an updateNodeResource change - i.e. 
> remove_node/add_node is just too expensive for the frequency of over-commit 
> changes)
> - We only over-commit memory at this point. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5202) Dynamic Overcommit of Node Resources - POC

2016-08-26 Thread Nathan Roberts (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nathan Roberts updated YARN-5202:
-
Attachment: (was: YARN-5202-branch2.7-uber.patch)

> Dynamic Overcommit of Node Resources - POC
> --
>
> Key: YARN-5202
> URL: https://issues.apache.org/jira/browse/YARN-5202
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, resourcemanager
>Affects Versions: 3.0.0-alpha1
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
>
> This Jira is to present a proof-of-concept implementation (collaboration 
> between [~jlowe] and myself) of a dynamic over-commit implementation in YARN. 
>  The type of over-commit implemented in this jira is similar to but not as 
> full-featured as what's being implemented via YARN-1011. YARN-1011 is where 
> we see ourselves heading but we needed something quick and completely 
> transparent so that we could test it at scale with our varying workloads 
> (mainly MapReduce, Spark, and Tez). Doing so has shed some light on how much 
> additional capacity we can achieve with over-commit approaches, and has 
> fleshed out some of the problems these approaches will face.
> Primary design goals:
> - Avoid changing protocols, application frameworks, or core scheduler logic,  
> - simply adjust individual nodes' available resources based on current node 
> utilization and then let scheduler do what it normally does
> - Over-commit slowly, pull back aggressively - If things are looking good and 
> there is demand, slowly add resource. If memory starts to look over-utilized, 
> aggressively reduce the amount of over-commit.
> - Make sure the nodes protect themselves - i.e. if memory utilization on a 
> node gets too high, preempt something - preferably something from a 
> preemptable queue
> A patch against trunk will be attached shortly.  Some notes on the patch:
> - This feature was originally developed against something akin to 2.7.  Since 
> the patch is mainly to explain the approach, we didn't do any sort of testing 
> against trunk except for basic build and basic unit tests
> - The key pieces of functionality are in {{SchedulerNode}}, 
> {{AbstractYarnScheduler}}, and {{NodeResourceMonitorImpl}}. The remainder of 
> the patch is mainly UI, Config, Metrics, Tests, and some minor code 
> duplication (e.g. to optimize node resource changes we treat an over-commit 
> resource change differently than an updateNodeResource change - i.e. 
> remove_node/add_node is just too expensive for the frequency of over-commit 
> changes)
> - We only over-commit memory at this point. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5551) Ignore deleted file mapping from memory computation when smaps is enabled

2016-08-25 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15437415#comment-15437415
 ] 

Nathan Roberts commented on YARN-5551:
--

I think the two examples you provided in the description are actually two very 
different cases. Notice how the first has an anonymous size of 0 while the 
second has the entire dirty region marked as anonymous. I think (not certain 
here) that this means in the first case the kernel actually has file-backed 
pages to write to if necessary. In the second case, I believe anonymous means 
it does NOT have a place to put dirty pages (for example, if the file has been 
both truncated and unlinked). If that's a correct interpretation of "anonymous", 
then I feel we should be counting the second mapping in the process's memory 
usage.
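
If that interpretation holds, the rule would be roughly as sketched below. This is 
only an illustration of the rule, not the ProcfsBasedProcessTree change itself.
{code}
/** Sketch: keep non-deleted mappings, and keep "(deleted)" mappings whose dirty
 *  pages are anonymous (nowhere to write them back); exclude the rest. */
class SmapsDeletedMappingRule {
  static boolean includeInRss(boolean deletedMapping, long anonymousKb) {
    return !deletedMapping || anonymousKb > 0;
  }

  public static void main(String[] args) {
    System.out.println(includeInRss(true, 0));      // the shm segment above -> exclude
    System.out.println(includeInRss(true, 637292)); // the truncated cache file above -> include
  }
}
{code}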



> Ignore deleted file mapping from memory computation when smaps is enabled
> -
>
> Key: YARN-5551
> URL: https://issues.apache.org/jira/browse/YARN-5551
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: YARN-5551.branch-2.001.patch
>
>
> Currently deleted file mappings are also included in the memory computation 
> when SMAP is enabled. For e.g
> {noformat}
> 7f612004a000-7f612004c000 rw-s  00:10 4201507513 
> /dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_-521969216_162_734673185
>  (deleted)
> Size:  8 kB
> Rss:   4 kB
> Pss:   2 kB
> Shared_Clean:  0 kB
> Shared_Dirty:  4 kB
> Private_Clean: 0 kB
> Private_Dirty: 0 kB
> Referenced:4 kB
> Anonymous: 0 kB
> AnonHugePages: 0 kB
> Swap:  0 kB
> KernelPageSize:4 kB
> MMUPageSize:   4 kB
> 7f6123f99000-7f6163f99000 rw-p  08:41 211419477  
> /grid/4/hadoop/yarn/local/usercache/root/appcache/application_1466700718395_1249/container_e19_1466700718395_1249_01_03/7389389356021597290.cache
>  (deleted)
> Size:1048576 kB
> Rss:  637292 kB
> Pss:  637292 kB
> Shared_Clean:  0 kB
> Shared_Dirty:  0 kB
> Private_Clean: 0 kB
> Private_Dirty:637292 kB
> Referenced:   637292 kB
> Anonymous:637292 kB
> AnonHugePages: 0 kB
> Swap:  0 kB
> KernelPageSize:4 kB
> {noformat}
> It would be good to exclude these from getSmapBasedRssMemorySize() 
> computation.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5551) Ignore deleted file mapping from memory computation when smaps is enabled

2016-08-24 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435726#comment-15435726
 ] 

Nathan Roberts commented on YARN-5551:
--

bq. Nathan Roberts, Jason Lowe - do you mind reviewing the attached patch? It 
looks ok to me but you guys are more familiar with ProcfsBasedProcessTree.
I should have time to review this tomorrow. Hope that is ok.


> Ignore deleted file mapping from memory computation when smaps is enabled
> -
>
> Key: YARN-5551
> URL: https://issues.apache.org/jira/browse/YARN-5551
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: YARN-5551.branch-2.001.patch
>
>
> Currently deleted file mappings are also included in the memory computation 
> when SMAP is enabled. For e.g
> {noformat}
> 7f612004a000-7f612004c000 rw-s  00:10 4201507513 
> /dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_-521969216_162_734673185
>  (deleted)
> Size:  8 kB
> Rss:   4 kB
> Pss:   2 kB
> Shared_Clean:  0 kB
> Shared_Dirty:  4 kB
> Private_Clean: 0 kB
> Private_Dirty: 0 kB
> Referenced:4 kB
> Anonymous: 0 kB
> AnonHugePages: 0 kB
> Swap:  0 kB
> KernelPageSize:4 kB
> MMUPageSize:   4 kB
> 7f6123f99000-7f6163f99000 rw-p  08:41 211419477  
> /grid/4/hadoop/yarn/local/usercache/root/appcache/application_1466700718395_1249/container_e19_1466700718395_1249_01_03/7389389356021597290.cache
>  (deleted)
> Size:1048576 kB
> Rss:  637292 kB
> Pss:  637292 kB
> Shared_Clean:  0 kB
> Shared_Dirty:  0 kB
> Private_Clean: 0 kB
> Private_Dirty:637292 kB
> Referenced:   637292 kB
> Anonymous:637292 kB
> AnonHugePages: 0 kB
> Swap:  0 kB
> KernelPageSize:4 kB
> {noformat}
> It would be good to exclude these from getSmapBasedRssMemorySize() 
> computation.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5352) Allow container-executor to use private /tmp

2016-08-24 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435125#comment-15435125
 ] 

Nathan Roberts commented on YARN-5352:
--

bq. I think minimally this needs to be an optional feature (either at the 
cluster level or at the per-job level) that defaults to false.
I agree. I'm thinking a cluster-level config that defaults to off is the way to 
go, and that the best way to handle per-container support would be via the 
ongoing work to support docker containers.


> Allow container-executor to use private /tmp 
> -
>
> Key: YARN-5352
> URL: https://issues.apache.org/jira/browse/YARN-5352
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Attachments: YARN-5352-v0.patch
>
>
> It's very common for user code to create things in /tmp. Yes, applications 
> have means to specify alternate tmp directories but doing so is opt-in and 
> therefore doesn't happen in many cases.  At a minimum, Linux can use private 
> namespaces to create a private /tmp for each container so that it's using the 
> same space allocated to containers and it's automatically cleaned up as part 
> of container clean-up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-5540) Capacity Scheduler spends too much time looking at empty priorities

2016-08-19 Thread Nathan Roberts (JIRA)
Nathan Roberts created YARN-5540:


 Summary: Capacity Scheduler spends too much time looking at empty 
priorities
 Key: YARN-5540
 URL: https://issues.apache.org/jira/browse/YARN-5540
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacity scheduler, resourcemanager
Affects Versions: 2.7.2
Reporter: Nathan Roberts
Assignee: Jason Lowe


We're starting to see the capacity scheduler run out of scheduling horsepower 
when running 500-1000 applications on clusters with 4K nodes or so.

This seems to be amplified by Tez applications. Tez applications have many more 
priorities (sometimes in the hundreds) than typical MR applications, and 
therefore the loop in the scheduler that examines every priority within every 
running application starts to be a hotspot. The priorities appear to stay 
around forever, even when there is no remaining resource request at that 
priority, causing us to spend a lot of time looking at nothing.

jstack snippet:
{noformat}
"ResourceManager Event Processor" #28 prio=5 os_prio=0 tid=0x7fc2d453e800 
nid=0x22f3 runnable [0x7fc2a8be2000]
   java.lang.Thread.State: RUNNABLE
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.getResourceRequest(SchedulerApplicationAttempt.java:210)
- eliminated <0x0005e73e5dc0> (a 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:852)
- locked <0x0005e73e5dc0> (a 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp)
- locked <0x0003006fcf60> (a 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:527)
- locked <0x0003001b22f8> (a 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:415)
- locked <0x0003001b22f8> (a 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1224)
- locked <0x000300041e40> (a 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler)
{noformat}
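
One way to avoid scanning the empty priorities, sketched with simplified names 
(this is not the actual LeafQueue/SchedulerApplicationAttempt code):
{code}
import java.util.Map;
import java.util.TreeMap;

/** Sketch: track pending containers per priority and drop empty priorities
 *  eagerly, so the per-heartbeat loop only visits priorities with real asks. */
class PendingAsksByPriority {
  private final Map<Integer, Integer> pending = new TreeMap<>();

  void update(int priority, int pendingContainers) {
    if (pendingContainers <= 0) {
      pending.remove(priority);     // nothing left to schedule at this priority
    } else {
      pending.put(priority, pendingContainers);
    }
  }

  Iterable<Integer> schedulablePriorities() {
    return pending.keySet();        // only non-empty priorities remain
  }
}
{code}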



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-3388) Allocation in LeafQueue could get stuck because DRF calculator isn't well supported when computing user-limit

2016-08-18 Thread Nathan Roberts (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nathan Roberts updated YARN-3388:
-
Attachment: YARN-3388-v7.patch

Thanks [~leftnoteasy] for the comments. I took both suggestions in the latest 
patch and cleaned up some remaining checkstyle issues.


> Allocation in LeafQueue could get stuck because DRF calculator isn't well 
> supported when computing user-limit
> -
>
> Key: YARN-3388
> URL: https://issues.apache.org/jira/browse/YARN-3388
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.6.0, 2.8.0, 2.7.2, 3.0.0-alpha1
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Attachments: YARN-3388-v0.patch, YARN-3388-v1.patch, 
> YARN-3388-v2.patch, YARN-3388-v3.patch, YARN-3388-v4.patch, 
> YARN-3388-v5.patch, YARN-3388-v6.patch, YARN-3388-v7.patch
>
>
> When there are multiple active users in a queue, it should be possible for 
> those users to make use of capacity up-to max_capacity (or close). The 
> resources should be fairly distributed among the active users in the queue. 
> This works pretty well when there is a single resource being scheduled.   
> However, when there are multiple resources the situation gets more complex 
> and the current algorithm tends to get stuck at Capacity. 
> Example illustrated in subsequent comment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-3388) Allocation in LeafQueue could get stuck because DRF calculator isn't well supported when computing user-limit

2016-08-17 Thread Nathan Roberts (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nathan Roberts updated YARN-3388:
-
Attachment: YARN-3388-v6.patch

Cleaned up most of the findbugs/checkstyle issues.

> Allocation in LeafQueue could get stuck because DRF calculator isn't well 
> supported when computing user-limit
> -
>
> Key: YARN-3388
> URL: https://issues.apache.org/jira/browse/YARN-3388
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.6.0, 2.8.0, 2.7.2, 3.0.0-alpha1
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Attachments: YARN-3388-v0.patch, YARN-3388-v1.patch, 
> YARN-3388-v2.patch, YARN-3388-v3.patch, YARN-3388-v4.patch, 
> YARN-3388-v5.patch, YARN-3388-v6.patch
>
>
> When there are multiple active users in a queue, it should be possible for 
> those users to make use of capacity up-to max_capacity (or close). The 
> resources should be fairly distributed among the active users in the queue. 
> This works pretty well when there is a single resource being scheduled.   
> However, when there are multiple resources the situation gets more complex 
> and the current algorithm tends to get stuck at Capacity. 
> Example illustrated in subsequent comment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-3388) Allocation in LeafQueue could get stuck because DRF calculator isn't well supported when computing user-limit

2016-08-16 Thread Nathan Roberts (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nathan Roberts updated YARN-3388:
-
Attachment: YARN-3388-v5.patch

fixed build error.

> Allocation in LeafQueue could get stuck because DRF calculator isn't well 
> supported when computing user-limit
> -
>
> Key: YARN-3388
> URL: https://issues.apache.org/jira/browse/YARN-3388
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.6.0, 2.8.0, 2.7.2, 3.0.0-alpha1
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Attachments: YARN-3388-v0.patch, YARN-3388-v1.patch, 
> YARN-3388-v2.patch, YARN-3388-v3.patch, YARN-3388-v4.patch, YARN-3388-v5.patch
>
>
> When there are multiple active users in a queue, it should be possible for 
> those users to make use of capacity up-to max_capacity (or close). The 
> resources should be fairly distributed among the active users in the queue. 
> This works pretty well when there is a single resource being scheduled.   
> However, when there are multiple resources the situation gets more complex 
> and the current algorithm tends to get stuck at Capacity. 
> Example illustrated in subsequent comment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-3388) Allocation in LeafQueue could get stuck because DRF calculator isn't well supported when computing user-limit

2016-08-16 Thread Nathan Roberts (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nathan Roberts updated YARN-3388:
-
Attachment: YARN-3388-v4.patch

Thanks for the comments [~jlowe]. upmerged and added whitespace.

> Allocation in LeafQueue could get stuck because DRF calculator isn't well 
> supported when computing user-limit
> -
>
> Key: YARN-3388
> URL: https://issues.apache.org/jira/browse/YARN-3388
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.6.0
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Attachments: YARN-3388-v0.patch, YARN-3388-v1.patch, 
> YARN-3388-v2.patch, YARN-3388-v3.patch, YARN-3388-v4.patch
>
>
> When there are multiple active users in a queue, it should be possible for 
> those users to make use of capacity up-to max_capacity (or close). The 
> resources should be fairly distributed among the active users in the queue. 
> This works pretty well when there is a single resource being scheduled.   
> However, when there are multiple resources the situation gets more complex 
> and the current algorithm tends to get stuck at Capacity. 
> Example illustrated in subsequent comment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5352) Allow container-executor to use private /tmp

2016-07-19 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384496#comment-15384496
 ] 

Nathan Roberts commented on YARN-5352:
--

This patch doesn't address localization. Thinking was that localization doesn't 
run application code so while it may create files in /tmp (e.g. hsperf*), I 
wouldn't expect that to be a significant problem. I can look into addressing 
localization as well if folks think it's important.

> Allow container-executor to use private /tmp 
> -
>
> Key: YARN-5352
> URL: https://issues.apache.org/jira/browse/YARN-5352
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Attachments: YARN-5352-v0.patch
>
>
> It's very common for user code to create things in /tmp. Yes, applications 
> have means to specify alternate tmp directories but doing so is opt-in and 
> therefore doesn't happen in many case.  At a minimum, linux can use private 
> namespaces to create a private /tmp for each container so that it's using the 
> same space allocated to containers and it's automatically cleaned up as part 
> of container clean-up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5352) Allow container-executor to use private /tmp

2016-07-19 Thread Nathan Roberts (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nathan Roberts updated YARN-5352:
-
Attachment: YARN-5352-v0.patch

Patch that uses linux private namespace and bind mounts to achieve a private 
/tmp.


> Allow container-executor to use private /tmp 
> -
>
> Key: YARN-5352
> URL: https://issues.apache.org/jira/browse/YARN-5352
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Attachments: YARN-5352-v0.patch
>
>
> It's very common for user code to create things in /tmp. Yes, applications 
> have means to specify alternate tmp directories but doing so is opt-in and 
> therefore doesn't happen in many cases.  At a minimum, Linux can use private 
> namespaces to create a private /tmp for each container so that it's using the 
> same space allocated to containers and it's automatically cleaned up as part 
> of container clean-up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5356) NodeManager should communicate physical resource capability to ResourceManager

2016-07-18 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15383184#comment-15383184
 ] 

Nathan Roberts commented on YARN-5356:
--

Hi Inigo, couple more comments when looking over most recent patch:
{noformat}
+int physicalMemoryMb = (int) rcp.getPhysicalMemorySize() / (1024 * 1024);
{noformat}
- Think this needs to be (int)(rcp.getPhysicalMemorySize() / (1024 * 1024))
- Would be good to get some other comments on the approach. Once we have 
general agreement, we'll probably need to add a couple of tests.
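To illustrate the first point, here is a minimal standalone sketch of the precedence issue (hypothetical values, not the patch code itself):
{noformat}
public class CastPrecedenceDemo {
  public static void main(String[] args) {
    // Pretend this came from rcp.getPhysicalMemorySize() on a 64GB node.
    long physicalMemoryBytes = 64L * 1024 * 1024 * 1024;

    // (int) binds tighter than '/', so the 64-bit byte count is truncated
    // to int before the divide and the result is garbage (0 here).
    int wrongMb = (int) physicalMemoryBytes / (1024 * 1024);

    // Dividing first keeps the value well inside int range, then the cast is safe.
    int rightMb = (int) (physicalMemoryBytes / (1024 * 1024)); // 65536

    System.out.println("wrong=" + wrongMb + "MB right=" + rightMb + "MB");
  }
}
{noformat}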

Thanks again.

> NodeManager should communicate physical resource capability to ResourceManager
> --
>
> Key: YARN-5356
> URL: https://issues.apache.org/jira/browse/YARN-5356
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, resourcemanager
>Affects Versions: 3.0.0-alpha1
>Reporter: Nathan Roberts
>Assignee: Inigo Goiri
> Attachments: YARN-5356.000.patch, YARN-5356.001.patch, 
> YARN-5356.002.patch
>
>
> Currently ResourceUtilization contains absolute quantities of resource used 
> (e.g. 4096MB memory used). It would be good if the NM also communicated the 
> actual physical resource capabilities of the node so that the RM can use this 
> data to schedule more effectively (overcommit, etc)
> Currently the only available information is the Resource the node registered 
> with (or later updated using updateNodeResource). However, these aren't 
> really sufficient to get a good view of how utilized a resource is. For 
> example, if a node reports 400% CPU utilization, does that mean it's 
> completely full, or barely utilized? Today there is no reliable way to figure 
> this out.
> [~elgoiri] - Lots of good work is happening in YARN-2965 so curious if you 
> have thoughts/opinions on this?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5202) Dynamic Overcommit of Node Resources - POC

2016-07-18 Thread Nathan Roberts (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nathan Roberts updated YARN-5202:
-
Attachment: YARN-5202-branch2.7-uber.patch

Attached an uber patch for branch 2.7 so that folks can experiment with dynamic 
overcommit on that release line. Some notes on the patch:
* The patch contains 2.8 back-ports of the following jiras:
** YARN-4055. Report node resource utilization in heartbeat.   (Inigo Goiri via 
kasha)
** YARN-3534. Collect memory/cpu usage on the node. (Inigo Goiri via kasha)
** YARN-3116. RM notifies NM whether a container is an AM container or normal 
task container. (Giovanni Matteo Fumarola via zjshen)
** YARN-3980. Plumb resource-utilization info in node heartbeat through to the 
scheduler. (Inigo Goiri via kasha)
** YARN-1012. Report NM aggregated container resource utilization in heartbeat. 
(Inigo Goiri via kasha)
* I have built branch 2.7 with this patch but that is the extent of the testing 
at this point. The patch we are running in production is nearly identical so it 
should be in good shape.

Comments and feedback are welcome! 


> Dynamic Overcommit of Node Resources - POC
> --
>
> Key: YARN-5202
> URL: https://issues.apache.org/jira/browse/YARN-5202
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, resourcemanager
>Affects Versions: 3.0.0-alpha1
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Attachments: YARN-5202-branch2.7-uber.patch, YARN-5202.patch
>
>
> This Jira is to present a proof-of-concept implementation (collaboration 
> between [~jlowe] and myself) of a dynamic over-commit implementation in YARN. 
>  The type of over-commit implemented in this jira is similar to but not as 
> full-featured as what's being implemented via YARN-1011. YARN-1011 is where 
> we see ourselves heading but we needed something quick and completely 
> transparent so that we could test it at scale with our varying workloads 
> (mainly MapReduce, Spark, and Tez). Doing so has shed some light on how much 
> additional capacity we can achieve with over-commit approaches, and has 
> fleshed out some of the problems these approaches will face.
> Primary design goals:
> - Avoid changing protocols, application frameworks, or core scheduler logic,  
> - simply adjust individual nodes' available resources based on current node 
> utilization and then let scheduler do what it normally does
> - Over-commit slowly, pull back aggressively - If things are looking good and 
> there is demand, slowly add resource. If memory starts to look over-utilized, 
> aggressively reduce the amount of over-commit.
> - Make sure the nodes protect themselves - i.e. if memory utilization on a 
> node gets too high, preempt something - preferably something from a 
> preemptable queue
> A patch against trunk will be attached shortly.  Some notes on the patch:
> - This feature was originally developed against something akin to 2.7.  Since 
> the patch is mainly to explain the approach, we didn't do any sort of testing 
> against trunk except for basic build and basic unit tests
> - The key pieces of functionality are in {{SchedulerNode}}, 
> {{AbstractYarnScheduler}}, and {{NodeResourceMonitorImpl}}. The remainder of 
> the patch is mainly UI, Config, Metrics, Tests, and some minor code 
> duplication (e.g. to optimize node resource changes we treat an over-commit 
> resource change differently than an updateNodeResource change - i.e. 
> remove_node/add_node is just too expensive for the frequency of over-commit 
> changes)
> - We only over-commit memory at this point. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5356) NodeManager should communicate physical resource capability to ResourceManager

2016-07-18 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15382613#comment-15382613
 ] 

Nathan Roberts commented on YARN-5356:
--

Thanks [~elgoiri] for the patch! Some quick comments

{noformat}
+int nodeMemoryMb = (int) rcp.getPhysicalMemorySize() / (1024 * 1024);
+int nodeVirtualCores = rcp.getNumProcessors();
+this.nodeResource = Resource.newInstance(nodeMemoryMb, nodeVirtualCores);
{noformat}
- Wondering if we should remove "Virtual" from variable names since these are 
real cores and the ratio of VCores to Cores isn't always 1. Another option 
might be  "nodePhysicalCores"?

{noformat}
+
+  /**
+   * Get the physical resources in the node to properly estimate resource
+   * utilization.
+   * @return Physical resources in the node.
+   */
+  public abstract Resource getNodeResource();
+
+  /**
+   * Set the physical resources in the node to properly estimate resource
+   * utilization.
+   * @param nodeResource Physical resources in the node.
+   */
+  public abstract void setNodeResource(Resource nodeResource);
{noformat}
- Difference between getResource() and getNodeResource() might lead to 
confusion. Maybe getPhysicalResource()?

- Should we change RMNodeImpl so that it contains the physicalNodeResource? 
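For reference, a rough sketch of how the registration-time piece could look with the renames suggested above (getPhysicalResource(), physicalCores). This assumes rcp is a ResourceCalculatorPlugin as in the snippet; it is illustrative only, not the patch:
{noformat}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.ResourceCalculatorPlugin;

public class PhysicalResourceSketch {
  // Build the node's physical capability once, e.g. when the NM registers with the RM.
  static Resource getPhysicalResource(ResourceCalculatorPlugin rcp) {
    int physicalMemoryMb = (int) (rcp.getPhysicalMemorySize() / (1024 * 1024));
    int physicalCores = rcp.getNumProcessors();
    return Resource.newInstance(physicalMemoryMb, physicalCores);
  }
}
{noformat}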

> NodeManager should communicate physical resource capability to ResourceManager
> --
>
> Key: YARN-5356
> URL: https://issues.apache.org/jira/browse/YARN-5356
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, resourcemanager
>Affects Versions: 3.0.0-alpha1
>Reporter: Nathan Roberts
>Assignee: Inigo Goiri
> Attachments: YARN-5356.000.patch
>
>
> Currently ResourceUtilization contains absolute quantities of resource used 
> (e.g. 4096MB memory used). It would be good if the NM also communicated the 
> actual physical resource capabilities of the node so that the RM can use this 
> data to schedule more effectively (overcommit, etc)
> Currently the only available information is the Resource the node registered 
> with (or later updated using updateNodeResource). However, these aren't 
> really sufficient to get a good view of how utilized a resource is. For 
> example, if a node reports 400% CPU utilization, does that mean it's 
> completely full, or barely utilized? Today there is no reliable way to figure 
> this out.
> [~elgoiri] - Lots of good work is happening in YARN-2965 so curious if you 
> have thoughts/opinions on this?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5356) NodeManager should communicate physical resource capability to ResourceManager

2016-07-13 Thread Nathan Roberts (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nathan Roberts updated YARN-5356:
-
Description: 
Currently ResourceUtilization contains absolute quantities of resource used 
(e.g. 4096MB memory used). It would be good if the NM also communicated the 
actual physical resource capabilities of the node so that the RM can use this 
data to schedule more effectively (overcommit, etc)

Currently the only available information is the Resource the node registered 
with (or later updated using updateNodeResource). However, these aren't really 
sufficient to get a good view of how utilized a resource is. For example, if a 
node reports 400% CPU utilization, does that mean it's completely full, or 
barely utilized? Today there is no reliable way to figure this out.

[~elgoiri] - Lots of good work is happening in YARN-2965 so curious if you have 
thoughts/opinions on this?

  was:
Currently ResourceUtilization contains absolute quantities of resource used 
(e.g. 4096MB memory used). It would be good if it also included how much of 
that resource is actually available on the node so that the RM can use this 
data to schedule more effectively (overcommit, etc)

Currently the only available information is the Resource the node registered 
with (or later updated using updateNodeResource). However, these aren't really 
sufficient to get a good view of how utilized a resource is. For example, if a 
node reports 400% CPU utilization, does that mean it's completely full, or 
barely utilized? Today there is no reliable way to figure this out.

[~elgoiri] - Lots of good work is happening in YARN-2965 so curious if you have 
thoughts/opinions on this?


> NodeManager should communicate physical resource capability to ResourceManager
> --
>
> Key: YARN-5356
> URL: https://issues.apache.org/jira/browse/YARN-5356
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, resourcemanager
>Affects Versions: 3.0.0-alpha1
>Reporter: Nathan Roberts
>Assignee: Inigo Goiri
> Attachments: YARN-5356.000.patch
>
>
> Currently ResourceUtilization contains absolute quantities of resource used 
> (e.g. 4096MB memory used). It would be good if the NM also communicated the 
> actual physical resource capabilities of the node so that the RM can use this 
> data to schedule more effectively (overcommit, etc)
> Currently the only available information is the Resource the node registered 
> with (or later updated using updateNodeResource). However, these aren't 
> really sufficient to get a good view of how utilized a resource is. For 
> example, if a node reports 400% CPU utilization, does that mean it's 
> completely full, or barely utilized? Today there is no reliable way to figure 
> this out.
> [~elgoiri] - Lots of good work is happening in YARN-2965 so curious if you 
> have thoughts/opinions on this?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5356) NodeManager should communicate physical resource capability to ResourceManager

2016-07-13 Thread Nathan Roberts (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nathan Roberts updated YARN-5356:
-
Summary: NodeManager should communicate physical resource capability to 
ResourceManager  (was: ResourceUtilization should also include resource 
availability)

> NodeManager should communicate physical resource capability to ResourceManager
> --
>
> Key: YARN-5356
> URL: https://issues.apache.org/jira/browse/YARN-5356
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, resourcemanager
>Affects Versions: 3.0.0-alpha1
>Reporter: Nathan Roberts
>Assignee: Inigo Goiri
> Attachments: YARN-5356.000.patch
>
>
> Currently ResourceUtilization contains absolute quantities of resource used 
> (e.g. 4096MB memory used). It would be good if it also included how much of 
> that resource is actually available on the node so that the RM can use this 
> data to schedule more effectively (overcommit, etc)
> Currently the only available information is the Resource the node registered 
> with (or later updated using updateNodeResource). However, these aren't 
> really sufficient to get a good view of how utilized a resource is. For 
> example, if a node reports 400% CPU utilization, does that mean it's 
> completely full, or barely utilized? Today there is no reliable way to figure 
> this out.
> [~elgoiri] - Lots of good work is happening in YARN-2965 so curious if you 
> have thoughts/opinions on this?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5356) ResourceUtilization should also include resource availability

2016-07-12 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373394#comment-15373394
 ] 

Nathan Roberts commented on YARN-5356:
--

bq. I can post a patch with these changes if you want.
That would be great. If not I can work on it later this week.

> ResourceUtilization should also include resource availability
> -
>
> Key: YARN-5356
> URL: https://issues.apache.org/jira/browse/YARN-5356
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, resourcemanager
>Affects Versions: 3.0.0-alpha1
>Reporter: Nathan Roberts
>
> Currently ResourceUtilization contains absolute quantities of resource used 
> (e.g. 4096MB memory used). It would be good if it also included how much of 
> that resource is actually available on the node so that the RM can use this 
> data to schedule more effectively (overcommit, etc)
> Currently the only available information is the Resource the node registered 
> with (or later updated using updateNodeResource). However, these aren't 
> really sufficient to get a good view of how utilized a resource is. For 
> example, if a node reports 400% CPU utilization, does that mean it's 
> completely full, or barely utilized? Today there is no reliable way to figure 
> this out.
> [~elgoiri] - Lots of good work is happening in YARN-2965 so curious if you 
> have thoughts/opinions on this?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-5356) ResourceUtilization should also include resource availability

2016-07-12 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15372905#comment-15372905
 ] 

Nathan Roberts edited comment on YARN-5356 at 7/12/16 2:07 PM:
---

bq. Nathan Roberts, I understand that your problem is that with the current 
approach you know that you have 6 cores available to the NM and 4 of them are 
used. However, the machine is not that utilized (~30%). Correct? In that case, 
we would only need to report the actual size of the machine at registration 
time as it would never change. Not sure that ResourceUtilization would be the 
right place for that as it would be reported in every heartbeat continuously.

[~elgoiri], Yep, that's exactly correct. I think reporting the physical 
capabilities of the machine during registration should be ok. At least with 
linux it is technically possible for the machine to change (e.g. echo 0 > 
/sys/devices/system/cpu/cpu3/online, OR memory gets automatically removed 
because it's getting ECC errors, OR something reserves a bunch of memory for 
huge pages, OR NIC re-negotiates from 10G to 1G), but I think these might be 
unusual enough that we could ignore them. I originally suggested tweaking 
ResourceUtilization due to this small chance of a physical resource changing 
but am happy to go either way. 


was (Author: nroberts):
bq. Nathan Roberts, I understand that your problem is that with the current 
approach you know that you have 6 cores available to the NM and 4 of them are 
used. However, the machine is not that utilized (~30%). Correct? In that case, 
we would only need to report the actual size of the machine at registration 
time as it would never change. Not sure that ResourceUtilization would be the 
right place for that as it would be reported in every heartbeat continuously.

[~elgoiri], Yep, that's exactly correct. I think reporting the physical 
capabilities of the machine during registration should be ok. At least with 
linux it is technically possible for the machine to change (e.g. echo 0 > 
/sys/devices/system/cpu/cpu3, OR memory gets automatically removed because it's 
getting ECC errors, OR something reserves a bunch of memory for huge pages, OR 
NIC re-negotiates from 10G to 1G), but I think these might be unusual enough 
that we could ignore them. I originally suggested tweaking ResourceUtilization 
due to this small chance of a physical resource changing but am happy to go 
either way. 

> ResourceUtilization should also include resource availability
> -
>
> Key: YARN-5356
> URL: https://issues.apache.org/jira/browse/YARN-5356
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, resourcemanager
>Affects Versions: 3.0.0-alpha1
>Reporter: Nathan Roberts
>
> Currently ResourceUtilization contains absolute quantities of resource used 
> (e.g. 4096MB memory used). It would be good if it also included how much of 
> that resource is actually available on the node so that the RM can use this 
> data to schedule more effectively (overcommit, etc)
> Currently the only available information is the Resource the node registered 
> with (or later updated using updateNodeResource). However, these aren't 
> really sufficient to get a good view of how utilized a resource is. For 
> example, if a node reports 400% CPU utilization, does that mean it's 
> completely full, or barely utilized? Today there is no reliable way to figure 
> this out.
> [~elgoiri] - Lots of good work is happening in YARN-2965 so curious if you 
> have thoughts/opinions on this?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5356) ResourceUtilization should also include resource availability

2016-07-12 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15372905#comment-15372905
 ] 

Nathan Roberts commented on YARN-5356:
--

bq. Nathan Roberts, I understand that your problem is that with the current 
approach you know that you have 6 cores available to the NM and 4 of them are 
used. However, the machine is not that utilized (~30%). Correct? In that case, 
we would only need to report the actual size of the machine at registration 
time as it would never change. Not sure that ResourceUtilization would be the 
right place for that as it would be reported in every heartbeat continuously.

[~elgoiri], Yep, that's exactly correct. I think reporting the physical 
capabilities of the machine during registration should be ok. At least with 
linux it is technically possible for the machine to change (e.g. echo 0 > 
/sys/devices/system/cpu/cpu3, OR memory gets automatically removed because it's 
getting ECC errors, OR something reserves a bunch of memory for huge pages, OR 
NIC re-negotiates from 10G to 1G), but I think these might be unusual enough 
that we could ignore them. I originally suggested tweaking ResourceUtilization 
due to this small chance of a physical resource changing but am happy to go 
either way. 

> ResourceUtilization should also include resource availability
> -
>
> Key: YARN-5356
> URL: https://issues.apache.org/jira/browse/YARN-5356
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, resourcemanager
>Affects Versions: 3.0.0-alpha1
>Reporter: Nathan Roberts
>
> Currently ResourceUtilization contains absolute quantities of resource used 
> (e.g. 4096MB memory used). It would be good if it also included how much of 
> that resource is actually available on the node so that the RM can use this 
> data to schedule more effectively (overcommit, etc)
> Currently the only available information is the Resource the node registered 
> with (or later updated using updateNodeResource). However, these aren't 
> really sufficient to get a good view of how utilized a resource is. For 
> example, if a node reports 400% CPU utilization, does that mean it's 
> completely full, or barely utilized? Today there is no reliable way to figure 
> this out.
> [~elgoiri] - Lots of good work is happening in YARN-2965 so curious if you 
> have thoughts/opinions on this?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-5356) ResourceUtilization should also include resource availability

2016-07-11 Thread Nathan Roberts (JIRA)
Nathan Roberts created YARN-5356:


 Summary: ResourceUtilization should also include resource 
availability
 Key: YARN-5356
 URL: https://issues.apache.org/jira/browse/YARN-5356
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager, resourcemanager
Affects Versions: 3.0.0-alpha1
Reporter: Nathan Roberts


Currently ResourceUtilization contains absolute quantities of resource used 
(e.g. 4096MB memory used). It would be good if it also included how much of 
that resource is actually available on the node so that the RM can use this 
data to schedule more effectively (overcommit, etc)

Currently the only available information is the Resource the node registered 
with (or later updated using updateNodeResource). However, these aren't really 
sufficient to get a good view of how utilized a resource is. For example, if a 
node reports 400% CPU utilization, does that mean it's completely full, or 
barely utilized? Today there is no reliable way to figure this out.

[~elgoiri] - Lots of good work is happening in YARN-2965 so curious if you have 
thoughts/opinions on this?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-5352) Allow container-executor to use private /tmp

2016-07-11 Thread Nathan Roberts (JIRA)
Nathan Roberts created YARN-5352:


 Summary: Allow container-executor to use private /tmp 
 Key: YARN-5352
 URL: https://issues.apache.org/jira/browse/YARN-5352
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Reporter: Nathan Roberts
Assignee: Nathan Roberts


It's very common for user code to create things in /tmp. Yes, applications have 
means to specify alternate tmp directories but doing so is opt-in and therefore 
doesn't happen in many cases.  At a minimum, Linux can use private namespaces to 
create a private /tmp for each container so that it's using the same space 
allocated to containers and it's automatically cleaned up as part of container 
clean-up.







--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5215) Scheduling containers based on external load in the servers

2016-06-15 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15332601#comment-15332601
 ] 

Nathan Roberts commented on YARN-5215:
--

Thanks [~elgoiri] for the work. Maybe Summit would be a good time to get 
interested parties together to settle on a direction?

I do see this being very similar to what YARN-5202 is doing. In fact I think if 
we just removed the lower bounds in YARN-5202 (i.e. allow it to go below a 
node's declared resource), it would effectively accomplish the same thing. e.g. 
if a memory hungry process starts up on a node, node utilization will increase 
beyond the desired thresholds and the node's resource available for scheduling 
will be reduced. In my mind,  we should basically set  a utilization target and 
then have schedulerNode adjust the node's resource either up or down depending 
on where we are in relation to the target. The inputs used to decide if and by 
how-much a node's resource should be adjusted, is where I think it's 
interesting.

Regarding the patch. At least on Linux I think we have to be careful about 
aggregating all of the container utilizations together. A simple example where 
I think this might not do the right thing is a large MR job that is looking up 
data in a large mmap'ed lookup table. RSS as calculated via /proc/<pid>/stat 
does not understand shared pages (afaik). This means we'll be double-counting 
this mmap'ed file for every container running on the node. We're frequently 
running 50+ containers on a node so if this job has lots of tasks running on a 
node, we'd have 10's of GB of error.  I know we keep it from going negative 
which is important, but in this case we'll underestimate the amount of 
external resource running on the node. 
{noformat}
+  externalUtilization = ResourceUtilization.newInstance(nodeUtilization);
+  externalUtilization.subtractFrom(
+  containersUtilization.getPhysicalMemory(),
+  containersUtilization.getVirtualMemory(),
+  containersUtilization.getCPU());
{noformat}
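One way to avoid the double-count (just a sketch, not what the patch does) would be to aggregate Pss rather than RSS: /proc/<pid>/smaps reports Pss, which charges each shared page 1/N to each of the N processes mapping it, so summing it across containers doesn't count a shared mmap'ed file once per container:
{noformat}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class PssReader {
  // Sum the "Pss:" lines (reported in kB) from /proc/<pid>/smaps.
  static long getPssKb(int pid) throws IOException {
    long pssKb = 0;
    for (String line : Files.readAllLines(Paths.get("/proc/" + pid + "/smaps"))) {
      if (line.startsWith("Pss:")) {
        // Lines look like "Pss:                1084 kB"
        String[] parts = line.trim().split("\\s+");
        pssKb += Long.parseLong(parts[1]);
      }
    }
    return pssKb;
  }
}
{noformat}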


> Scheduling containers based on external load in the servers
> ---
>
> Key: YARN-5215
> URL: https://issues.apache.org/jira/browse/YARN-5215
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Inigo Goiri
> Attachments: YARN-5215.000.patch, YARN-5215.001.patch
>
>
> Currently YARN runs containers in the servers assuming that they own all the 
> resources. The proposal is to use the utilization information in the node and 
> the containers to estimate how much is consumed by external processes and 
> schedule based on this estimation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-3388) Allocation in LeafQueue could get stuck because DRF calculator isn't well supported when computing user-limit

2016-06-14 Thread Nathan Roberts (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nathan Roberts updated YARN-3388:
-
Attachment: YARN-3388-v3.patch

[~leftnoteasy], [~eepayne]. Ok, "soon" was extremely relative;) Sorry about 
that. 

I think I addressed Wangda's comments but I need label partition experts to 
take a look.

Any ideas why people don't hit this more often? We find it's very easy to get 
stuck at queueCapacity even though userLimitFactor and maxCapacity say the 
system should allocate further. Do you think people aren't using DRF and are 
mostly just using memory as the resource?

> Allocation in LeafQueue could get stuck because DRF calculator isn't well 
> supported when computing user-limit
> -
>
> Key: YARN-3388
> URL: https://issues.apache.org/jira/browse/YARN-3388
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.6.0
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Attachments: YARN-3388-v0.patch, YARN-3388-v1.patch, 
> YARN-3388-v2.patch, YARN-3388-v3.patch
>
>
> When there are multiple active users in a queue, it should be possible for 
> those users to make use of capacity up-to max_capacity (or close). The 
> resources should be fairly distributed among the active users in the queue. 
> This works pretty well when there is a single resource being scheduled.   
> However, when there are multiple resources the situation gets more complex 
> and the current algorithm tends to get stuck at Capacity. 
> Example illustrated in subsequent comment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5214) Pending on synchronized method DirectoryCollection#checkDirs can hang NM's NodeStatusUpdater

2016-06-10 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15324682#comment-15324682
 ] 

Nathan Roberts commented on YARN-5214:
--

[~djp]. I agree it makes sense to keep the heartbeat path as lock free as 
possible. 

> Pending on synchronized method DirectoryCollection#checkDirs can hang NM's 
> NodeStatusUpdater
> 
>
> Key: YARN-5214
> URL: https://issues.apache.org/jira/browse/YARN-5214
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
>
> In one cluster, we notice NM's heartbeat to RM is suddenly stopped and wait a 
> while and marked LOST by RM. From the log, the NM daemon is still running, 
> but jstack hints NM's NodeStatusUpdater thread get blocked:
> 1.  Node Status Updater thread get blocked by 0x8065eae8 
> {noformat}
> "Node Status Updater" #191 prio=5 os_prio=0 tid=0x7f0354194000 nid=0x26fa 
> waiting for monitor entry [0x7f035945a000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection.getFailedDirs(DirectoryCollection.java:170)
> - waiting to lock <0x8065eae8> (a 
> org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.getDisksHealthReport(LocalDirsHandlerService.java:287)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeHealthCheckerService.getHealthReport(NodeHealthCheckerService.java:58)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.getNodeStatus(NodeStatusUpdaterImpl.java:389)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.access$300(NodeStatusUpdaterImpl.java:83)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl$1.run(NodeStatusUpdaterImpl.java:643)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> 2. The actual holder of this lock is DiskHealthMonitor:
> {noformat}
> "DiskHealthMonitor-Timer" #132 daemon prio=5 os_prio=0 tid=0x7f0397393000 
> nid=0x26bd runnable [0x7f035e511000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1316)
> at 
> org.apache.hadoop.util.DiskChecker.mkdirsWithExistsCheck(DiskChecker.java:67)
> at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:104)
> at 
> org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection.verifyDirUsingMkdir(DirectoryCollection.java:340)
> at 
> org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection.testDirs(DirectoryCollection.java:312)
> at 
> org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection.checkDirs(DirectoryCollection.java:231)
> - locked <0x8065eae8> (a 
> org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.checkDirs(LocalDirsHandlerService.java:389)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.access$400(LocalDirsHandlerService.java:50)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService$MonitoringTimerTask.run(LocalDirsHandlerService.java:122)
> at java.util.TimerThread.mainLoop(Timer.java:555)
> at java.util.TimerThread.run(Timer.java:505)
> {noformat}
> This disk operation could take longer time than expectation especially in 
> high IO throughput case and we should have fine-grained lock for related 
> operations here. 
> The same issue on HDFS get raised and fixed in HDFS-7489, and we probably 
> should have similar fix here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5214) Pending on synchronized method DirectoryCollection#checkDirs can hang NM's NodeStatusUpdater

2016-06-08 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321342#comment-15321342
 ] 

Nathan Roberts commented on YARN-5214:
--

I'm not suggesting this change shouldn't be made but keep in mind that if the 
NM is having trouble performing this type of action within the timeout (10 
minutes or so), then the node is not very healthy and probably shouldn't be 
given anything more to run until the situation improves. It's going to have 
trouble doing all sorts of other things as well so having it look unhealthy in 
some fashion isn't all bad. If we somehow keep heartbeats completely free of 
I/O, then the RM will keep assigning containers that will likely run into 
exactly the same slowness. 

We used to see similar issues that we resolved by switching to the deadline I/O 
scheduler (assuming linux). See 
https://issues.apache.org/jira/browse/HDFS-9239?focusedCommentId=15218302=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15218302


> Pending on synchronized method DirectoryCollection#checkDirs can hang NM's 
> NodeStatusUpdater
> 
>
> Key: YARN-5214
> URL: https://issues.apache.org/jira/browse/YARN-5214
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
>
> In one cluster, we notice NM's heartbeat to RM is suddenly stopped and wait a 
> while and marked LOST by RM. From the log, the NM daemon is still running, 
> but jstack hints NM's NodeStatusUpdater thread get blocked:
> 1.  Node Status Updater thread get blocked by 0x8065eae8 
> {noformat}
> "Node Status Updater" #191 prio=5 os_prio=0 tid=0x7f0354194000 nid=0x26fa 
> waiting for monitor entry [0x7f035945a000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection.getFailedDirs(DirectoryCollection.java:170)
> - waiting to lock <0x8065eae8> (a 
> org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.getDisksHealthReport(LocalDirsHandlerService.java:287)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeHealthCheckerService.getHealthReport(NodeHealthCheckerService.java:58)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.getNodeStatus(NodeStatusUpdaterImpl.java:389)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.access$300(NodeStatusUpdaterImpl.java:83)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl$1.run(NodeStatusUpdaterImpl.java:643)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> 2. The actual holder of this lock is DiskHealthMonitor:
> {noformat}
> "DiskHealthMonitor-Timer" #132 daemon prio=5 os_prio=0 tid=0x7f0397393000 
> nid=0x26bd runnable [0x7f035e511000]
>java.lang.Thread.State: RUNNABLE
> at java.io.UnixFileSystem.createDirectory(Native Method)
> at java.io.File.mkdir(File.java:1316)
> at 
> org.apache.hadoop.util.DiskChecker.mkdirsWithExistsCheck(DiskChecker.java:67)
> at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:104)
> at 
> org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection.verifyDirUsingMkdir(DirectoryCollection.java:340)
> at 
> org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection.testDirs(DirectoryCollection.java:312)
> at 
> org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection.checkDirs(DirectoryCollection.java:231)
> - locked <0x8065eae8> (a 
> org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.checkDirs(LocalDirsHandlerService.java:389)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.access$400(LocalDirsHandlerService.java:50)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService$MonitoringTimerTask.run(LocalDirsHandlerService.java:122)
> at java.util.TimerThread.mainLoop(Timer.java:555)
> at java.util.TimerThread.run(Timer.java:505)
> {noformat}
> This disk operation could take longer time than expectation especially in 
> high IO throughput case and we should have fine-grained lock for related 
> operations here. 
> The same issue on HDFS get raised and fixed in HDFS-7489, and we probably 
> should have similar fix here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org

[jira] [Updated] (YARN-5202) Dynamic Overcommit of Node Resources - POC

2016-06-06 Thread Nathan Roberts (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nathan Roberts updated YARN-5202:
-
Attachment: YARN-5202.patch


Originally branched from commit: 42f90ab885d9693fcc1e52f9637f7de410ae

> Dynamic Overcommit of Node Resources - POC
> --
>
> Key: YARN-5202
> URL: https://issues.apache.org/jira/browse/YARN-5202
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, resourcemanager
>Affects Versions: 3.0.0-alpha1
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Attachments: YARN-5202.patch
>
>
> This Jira is to present a proof-of-concept implementation (collaboration 
> between [~jlowe] and myself) of a dynamic over-commit implementation in YARN. 
>  The type of over-commit implemented in this jira is similar to but not as 
> full-featured as what's being implemented via YARN-1011. YARN-1011 is where 
> we see ourselves heading but we needed something quick and completely 
> transparent so that we could test it at scale with our varying workloads 
> (mainly MapReduce, Spark, and Tez). Doing so has shed some light on how much 
> additional capacity we can achieve with over-commit approaches, and has 
> fleshed out some of the problems these approaches will face.
> Primary design goals:
> - Avoid changing protocols, application frameworks, or core scheduler logic,  
> - simply adjust individual nodes' available resources based on current node 
> utilization and then let scheduler do what it normally does
> - Over-commit slowly, pull back aggressively - If things are looking good and 
> there is demand, slowly add resource. If memory starts to look over-utilized, 
> aggressively reduce the amount of over-commit.
> - Make sure the nodes protect themselves - i.e. if memory utilization on a 
> node gets too high, preempt something - preferably something from a 
> preemptable queue
> A patch against trunk will be attached shortly.  Some notes on the patch:
> - This feature was originally developed against something akin to 2.7.  Since 
> the patch is mainly to explain the approach, we didn't do any sort of testing 
> against trunk except for basic build and basic unit tests
> - The key pieces of functionality are in {{SchedulerNode}}, 
> {{AbstractYarnScheduler}}, and {{NodeResourceMonitorImpl}}. The remainder of 
> the patch is mainly UI, Config, Metrics, Tests, and some minor code 
> duplication (e.g. to optimize node resource changes we treat an over-commit 
> resource change differently than an updateNodeResource change - i.e. 
> remove_node/add_node is just too expensive for the frequency of over-commit 
> changes)
> - We only over-commit memory at this point. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-5202) Dynamic Overcommit of Node Resources - POC

2016-06-06 Thread Nathan Roberts (JIRA)
Nathan Roberts created YARN-5202:


 Summary: Dynamic Overcommit of Node Resources - POC
 Key: YARN-5202
 URL: https://issues.apache.org/jira/browse/YARN-5202
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager, resourcemanager
Affects Versions: 3.0.0-alpha1
Reporter: Nathan Roberts
Assignee: Nathan Roberts


This Jira is to present a proof-of-concept implementation (collaboration 
between [~jlowe] and myself) of a dynamic over-commit implementation in YARN.  
The type of over-commit implemented in this jira is similar to but not as 
full-featured as what's being implemented via YARN-1011. YARN-1011 is where we 
see ourselves heading but we needed something quick and completely transparent 
so that we could test it at scale with our varying workloads (mainly MapReduce, 
Spark, and Tez). Doing so has shed some light on how much additional capacity 
we can achieve with over-commit approaches, and has fleshed out some of the 
problems these approaches will face.

Primary design goals:
- Avoid changing protocols, application frameworks, or core scheduler logic - 
simply adjust individual nodes' available resources based on current node 
utilization and then let the scheduler do what it normally does
- Over-commit slowly, pull back aggressively - If things are looking good and 
there is demand, slowly add resource. If memory starts to look over-utilized, 
aggressively reduce the amount of over-commit.
- Make sure the nodes protect themselves - i.e. if memory utilization on a node 
gets too high, preempt something - preferably something from a preemptable queue

A patch against trunk will be attached shortly.  Some notes on the patch:
- This feature was originally developed against something akin to 2.7.  Since 
the patch is mainly to explain the approach, we didn't do any sort of testing 
against trunk except for basic build and basic unit tests
- The key pieces of functionality are in {{SchedulerNode}}, 
{{AbstractYarnScheduler}}, and {{NodeResourceMonitorImpl}}. The remainder of 
the patch is mainly UI, Config, Metrics, Tests, and some minor code duplication 
(e.g. to optimize node resource changes we treat an over-commit resource change 
differently than an updateNodeResource change - i.e. remove_node/add_node is 
just too expensive for the frequency of over-commit changes)
- We only over-commit memory at this point. 
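To make the "over-commit slowly, pull back aggressively" rule concrete, here is a rough sketch of the adjustment idea (the numbers and method names are made up for illustration; the real logic lives in SchedulerNode/NodeResourceMonitorImpl):
{noformat}
public class OvercommitSketch {
  // Nudge a node's schedulable memory toward a utilization target:
  // add capacity in small steps while utilization is below target,
  // and cut the over-committed portion back hard as soon as it goes above.
  static long adjustSchedulableMemoryMb(long declaredMb, long currentSchedulableMb,
                                        double utilization, double targetUtilization) {
    final long stepUpMb = 512;          // grow slowly
    final double pullBackFactor = 0.5;  // shrink aggressively

    long overcommitMb = Math.max(0, currentSchedulableMb - declaredMb);
    if (utilization < targetUtilization) {
      return currentSchedulableMb + stepUpMb;
    }
    // Above target: drop a large chunk of the over-commit, but never go below
    // the node's declared capacity.
    return declaredMb + (long) (overcommitMb * pullBackFactor);
  }
}
{noformat}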




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4963) capacity scheduler: Make number of OFF_SWITCH assignments per heartbeat configurable

2016-05-12 Thread Nathan Roberts (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nathan Roberts updated YARN-4963:
-
Attachment: YARN-4963.003.patch

Thank you [~leftnoteasy] for the comments!

I have addressed them in the latest patch.


> capacity scheduler: Make number of OFF_SWITCH assignments per heartbeat 
> configurable
> 
>
> Key: YARN-4963
> URL: https://issues.apache.org/jira/browse/YARN-4963
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 3.0.0, 2.7.2
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Attachments: YARN-4963.001.patch, YARN-4963.002.patch, 
> YARN-4963.003.patch
>
>
> Currently the capacity scheduler will allow exactly 1 OFF_SWITCH assignment 
> per heartbeat. With more and more non MapReduce workloads coming along, the 
> degree of locality is declining, causing scheduling to be significantly 
> slower. It's still important to limit the number of OFF_SWITCH assignments to 
> avoid densely packing OFF_SWITCH containers onto nodes. 
> Proposal is to add a simple config that makes the number of OFF_SWITCH 
> assignments configurable.
> Will upload candidate patch shortly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5039) Applications ACCEPTED but not starting

2016-05-11 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280191#comment-15280191
 ] 

Nathan Roberts commented on YARN-5039:
--

Thanks [~milesc]. This seems to be an Amazon EMR thing (unless I'm 
misunderstanding the log messages). 

Here are the important pieces:

Every time the scheduler is trying to schedule on a node with sufficient room, 
it is bailing out claiming it's not on the right type of EMR node:
{noformat}
# egrep -i "node being looked for|is excluded" whole-scheduler-at-debug.log
2016-05-11 00:55:46,818 DEBUG 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
 (ResourceManager Event Processor): Node being looked for scheduling 
ip-10-12-40-239.us-west-2.compute.internal:8041 availableResource: 

2016-05-11 00:55:46,819 DEBUG 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerAppUtils 
(ResourceManager Event Processor): node 
ip-10-12-40-239.us-west-2.compute.internal with emrlabel:TASK is excluded to 
request with emrLabel:MASTER,CORE
{noformat}

And below you see it consider the 0041 application and everything looks 
promising until the node is excluded. This is an EMR-specific check, which is 
why it wasn't making a lot of sense as to how this could happen.
{noformat}
2016-05-11 00:55:46,819 DEBUG 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue 
(ResourceManager Event Processor): pre-assignContainers for application 
application_1462722347496_0041
2016-05-11 00:55:46,819 DEBUG 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue 
(ResourceManager Event Processor): User limit computation for ai2service in 
queue default userLimit=100 userLimitFactor=1.0 required:  consumed:  limit:  
queueCapacity:  qconsumed:  currentCapacity:  activeUsers: 1 
clusterCapacity: 
2016-05-11 00:55:46,819 DEBUG 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt
 (ResourceManager Event Processor): showRequests: 
application=application_1462722347496_0041 headRoom= 
currentConsumption=0
2016-05-11 00:55:46,819 DEBUG 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt
 (ResourceManager Event Processor): showRequests: 
application=application_1462722347496_0041 request={Priority: 0, Capability: 
, # Containers: 1, Location: *, Relax Locality: true}
2016-05-11 00:55:46,819 DEBUG 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue 
(ResourceManager Event Processor): needsContainers: app.#re-reserve=636 
reserved=2 nodeFactor=0.20974576 minAllocFactor=0.99986756 starvation=251
2016-05-11 00:55:46,819 DEBUG 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue 
(ResourceManager Event Processor): User limit computation for ai2service in 
queue default userLimit=100 userLimitFactor=1.0 required:  consumed:  limit:  
queueCapacity:  qconsumed:  currentCapacity:  activeUsers: 1 
clusterCapacity: 
2016-05-11 00:55:46,819 DEBUG 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue 
(ResourceManager Event Processor): Headroom calculation for user ai2service:  
userLimit= queueMaxAvailRes= consumed= headroom=
2016-05-11 00:55:46,819 DEBUG 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerAppUtils 
(ResourceManager Event Processor): node 
ip-10-12-40-239.us-west-2.compute.internal with emrlabel:TASK is excluded to 
request with emrLabel:MASTER,CORE
{noformat}

I suspect EMR is not wanting to schedule AMs on nodes that are more likely to 
go away (TASK nodes). Once it gets the AM running though, it takes off. 
 
Maybe someone from Amazon can chime in?? cc [~danzhi]


> Applications ACCEPTED but not starting
> --
>
> Key: YARN-5039
> URL: https://issues.apache.org/jira/browse/YARN-5039
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Miles Crawford
> Attachments: Screen Shot 2016-05-04 at 1.57.19 PM.png, Screen Shot 
> 2016-05-04 at 2.41.22 PM.png, capacity-scheduler-at-debug.log.gz, 
> queue-config.log, resource-manager-application-starts.log.gz, 
> whole-scheduler-at-debug.log.gz, 
> yarn-yarn-resourcemanager-ip-10-12-47-144.log.gz
>
>
> Often when we submit applications to an incompletely utilized 

[jira] [Commented] (YARN-5039) Applications ACCEPTED but not starting

2016-05-10 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15279018#comment-15279018
 ] 

Nathan Roberts commented on YARN-5039:
--

Thanks [~milesc]! Still not quite enough. How about 
org.apache.hadoop.yarn.server.resourcemanager.scheduler? 

BTW. Thanks so much for helping to track this down.


> Applications ACCEPTED but not starting
> --
>
> Key: YARN-5039
> URL: https://issues.apache.org/jira/browse/YARN-5039
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Miles Crawford
> Attachments: Screen Shot 2016-05-04 at 1.57.19 PM.png, Screen Shot 
> 2016-05-04 at 2.41.22 PM.png, capacity-scheduler-at-debug.log.gz, 
> queue-config.log, resource-manager-application-starts.log.gz, 
> yarn-yarn-resourcemanager-ip-10-12-47-144.log.gz
>
>
> Often when we submit applications to an incompletely utilized cluster, they 
> sit, unable to start for no apparent reason.
> There are multiple nodes in the cluster with available resources, but the 
> resourcemanger logs show that scheduling is being skipped. The scheduling is 
> skipped because the application itself has reserved the node? I'm not sure 
> how to interpret this log output:
> {code}
> 2016-05-04 20:19:21,315 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
>  (ResourceManager Event Processor): Trying to fulfill reservation for 
> application application_1462291866507_0025 on node: 
> ip-10-12-43-54.us-west-2.compute.internal:8041
> 2016-05-04 20:19:21,316 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue 
> (ResourceManager Event Processor): Reserved container  
> application=application_1462291866507_0025 resource= 
> queue=default: capacity=1.0, absoluteCapacity=1.0, 
> usedResources=, usedCapacity=0.7126589, 
> absoluteUsedCapacity=0.7126589, numApps=2, numContainers=33 
> usedCapacity=0.7126589 absoluteUsedCapacity=0.7126589 used= vCores:33> cluster=
> 2016-05-04 20:19:21,316 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
>  (ResourceManager Event Processor): Skipping scheduling since node 
> ip-10-12-43-54.us-west-2.compute.internal:8041 is reserved by application 
> appattempt_1462291866507_0025_01
> 2016-05-04 20:19:22,232 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
>  (ResourceManager Event Processor): Trying to fulfill reservation for 
> application application_1462291866507_0025 on node: 
> ip-10-12-43-53.us-west-2.compute.internal:8041
> 2016-05-04 20:19:22,232 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue 
> (ResourceManager Event Processor): Reserved container  
> application=application_1462291866507_0025 resource= 
> queue=default: capacity=1.0, absoluteCapacity=1.0, 
> usedResources=, usedCapacity=0.7126589, 
> absoluteUsedCapacity=0.7126589, numApps=2, numContainers=33 
> usedCapacity=0.7126589 absoluteUsedCapacity=0.7126589 used= vCores:33> cluster=
> 2016-05-04 20:19:22,232 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
>  (ResourceManager Event Processor): Skipping scheduling since node 
> ip-10-12-43-53.us-west-2.compute.internal:8041 is reserved by application 
> appattempt_1462291866507_0025_01
> 2016-05-04 20:19:22,316 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
>  (ResourceManager Event Processor): Trying to fulfill reservation for 
> application application_1462291866507_0025 on node: 
> ip-10-12-43-54.us-west-2.compute.internal:8041
> 2016-05-04 20:19:22,316 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue 
> (ResourceManager Event Processor): Reserved container  
> application=application_1462291866507_0025 resource= 
> queue=default: capacity=1.0, absoluteCapacity=1.0, 
> usedResources=, usedCapacity=0.7126589, 
> absoluteUsedCapacity=0.7126589, numApps=2, numContainers=33 
> usedCapacity=0.7126589 absoluteUsedCapacity=0.7126589 used= vCores:33> cluster=
> 2016-05-04 20:19:22,316 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
>  (ResourceManager Event Processor): Skipping scheduling since node 
> ip-10-12-43-54.us-west-2.compute.internal:8041 is reserved by application 
> appattempt_1462291866507_0025_01
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4963) capacity scheduler: Make number of OFF_SWITCH assignments per heartbeat configurable

2016-05-10 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15278369#comment-15278369
 ] 

Nathan Roberts commented on YARN-4963:
--

I don't believe test failures are related to this change.

If someone has a few cycles for a review that would be great.  This patch 
covers step #1 agreed to above. step #2 will be handled via YARN-5013.

> capacity scheduler: Make number of OFF_SWITCH assignments per heartbeat 
> configurable
> 
>
> Key: YARN-4963
> URL: https://issues.apache.org/jira/browse/YARN-4963
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 3.0.0, 2.7.2
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Attachments: YARN-4963.001.patch, YARN-4963.002.patch
>
>
> Currently the capacity scheduler will allow exactly 1 OFF_SWITCH assignment 
> per heartbeat. With more and more non MapReduce workloads coming along, the 
> degree of locality is declining, causing scheduling to be significantly 
> slower. It's still important to limit the number of OFF_SWITCH assignments to 
> avoid densely packing OFF_SWITCH containers onto nodes. 
> Proposal is to add a simple config that makes the number of OFF_SWITCH 
> assignments configurable.
> Will upload candidate patch shortly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5039) Applications ACCEPTED but not starting

2016-05-10 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15278314#comment-15278314
 ] 

Nathan Roberts commented on YARN-5039:
--

[~milesc], if you have it in this state again, would you mind enabling DEBUG 
for all of capacity scheduler: 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity

I wasn't seeing enough information in LeafQueue to figure out why it's not 
scheduling on some of the nodes.
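
In case it helps, the log4j.properties line for that would be something like the following ({{log4j.logger.<package>=DEBUG}} is standard log4j syntax; adjust the package if you want the broader scheduler namespace):
{noformat}
log4j.logger.org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity=DEBUG
{noformat}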

> Applications ACCEPTED but not starting
> --
>
> Key: YARN-5039
> URL: https://issues.apache.org/jira/browse/YARN-5039
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Miles Crawford
> Attachments: Screen Shot 2016-05-04 at 1.57.19 PM.png, Screen Shot 
> 2016-05-04 at 2.41.22 PM.png, queue-config.log, 
> resource-manager-application-starts.log.gz, 
> yarn-yarn-resourcemanager-ip-10-12-47-144.log.gz
>
>
> Often when we submit applications to an incompletely utilized cluster, they 
> sit, unable to start for no apparent reason.
> There are multiple nodes in the cluster with available resources, but the 
> resourcemanager logs show that scheduling is being skipped. The scheduling is 
> skipped because the application itself has reserved the node? I'm not sure 
> how to interpret this log output:
> {code}
> 2016-05-04 20:19:21,315 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
>  (ResourceManager Event Processor): Trying to fulfill reservation for 
> application application_1462291866507_0025 on node: 
> ip-10-12-43-54.us-west-2.compute.internal:8041
> 2016-05-04 20:19:21,316 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue 
> (ResourceManager Event Processor): Reserved container  
> application=application_1462291866507_0025 resource= 
> queue=default: capacity=1.0, absoluteCapacity=1.0, 
> usedResources=, usedCapacity=0.7126589, 
> absoluteUsedCapacity=0.7126589, numApps=2, numContainers=33 
> usedCapacity=0.7126589 absoluteUsedCapacity=0.7126589 used= vCores:33> cluster=
> 2016-05-04 20:19:21,316 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
>  (ResourceManager Event Processor): Skipping scheduling since node 
> ip-10-12-43-54.us-west-2.compute.internal:8041 is reserved by application 
> appattempt_1462291866507_0025_01
> 2016-05-04 20:19:22,232 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
>  (ResourceManager Event Processor): Trying to fulfill reservation for 
> application application_1462291866507_0025 on node: 
> ip-10-12-43-53.us-west-2.compute.internal:8041
> 2016-05-04 20:19:22,232 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue 
> (ResourceManager Event Processor): Reserved container  
> application=application_1462291866507_0025 resource= 
> queue=default: capacity=1.0, absoluteCapacity=1.0, 
> usedResources=, usedCapacity=0.7126589, 
> absoluteUsedCapacity=0.7126589, numApps=2, numContainers=33 
> usedCapacity=0.7126589 absoluteUsedCapacity=0.7126589 used= vCores:33> cluster=
> 2016-05-04 20:19:22,232 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
>  (ResourceManager Event Processor): Skipping scheduling since node 
> ip-10-12-43-53.us-west-2.compute.internal:8041 is reserved by application 
> appattempt_1462291866507_0025_01
> 2016-05-04 20:19:22,316 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
>  (ResourceManager Event Processor): Trying to fulfill reservation for 
> application application_1462291866507_0025 on node: 
> ip-10-12-43-54.us-west-2.compute.internal:8041
> 2016-05-04 20:19:22,316 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue 
> (ResourceManager Event Processor): Reserved container  
> application=application_1462291866507_0025 resource= 
> queue=default: capacity=1.0, absoluteCapacity=1.0, 
> usedResources=, usedCapacity=0.7126589, 
> absoluteUsedCapacity=0.7126589, numApps=2, numContainers=33 
> usedCapacity=0.7126589 absoluteUsedCapacity=0.7126589 used= vCores:33> cluster=
> 2016-05-04 20:19:22,316 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
>  (ResourceManager Event Processor): Skipping scheduling since node 
> ip-10-12-43-54.us-west-2.compute.internal:8041 is reserved by application 
> appattempt_1462291866507_0025_01
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (YARN-4963) capacity scheduler: Make number of OFF_SWITCH assignments per heartbeat configurable

2016-05-05 Thread Nathan Roberts (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nathan Roberts updated YARN-4963:
-
Attachment: YARN-4963.002.patch

Address checkstyle comment.



> capacity scheduler: Make number of OFF_SWITCH assignments per heartbeat 
> configurable
> 
>
> Key: YARN-4963
> URL: https://issues.apache.org/jira/browse/YARN-4963
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 3.0.0, 2.7.2
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Attachments: YARN-4963.001.patch, YARN-4963.002.patch
>
>
> Currently the capacity scheduler will allow exactly 1 OFF_SWITCH assignment 
> per heartbeat. With more and more non MapReduce workloads coming along, the 
> degree of locality is declining, causing scheduling to be significantly 
> slower. It's still important to limit the number of OFF_SWITCH assignments to 
> avoid densely packing OFF_SWITCH containers onto nodes. 
> Proposal is to add a simple config that makes the number of OFF_SWITCH 
> assignments configurable.
> Will upload candidate patch shortly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4963) capacity scheduler: Make number of OFF_SWITCH assignments per heartbeat configurable

2016-04-28 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15263109#comment-15263109
 ] 

Nathan Roberts commented on YARN-4963:
--

Sorry it took so long to get back to this. I filed YARN-5013 to handle #2. We 
can continue that discussion over there. 


> capacity scheduler: Make number of OFF_SWITCH assignments per heartbeat 
> configurable
> 
>
> Key: YARN-4963
> URL: https://issues.apache.org/jira/browse/YARN-4963
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 3.0.0, 2.7.2
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Attachments: YARN-4963.001.patch
>
>
> Currently the capacity scheduler will allow exactly 1 OFF_SWITCH assignment 
> per heartbeat. With more and more non MapReduce workloads coming along, the 
> degree of locality is declining, causing scheduling to be significantly 
> slower. It's still important to limit the number of OFF_SWITCH assignments to 
> avoid densely packing OFF_SWITCH containers onto nodes. 
> Proposal is to add a simple config that makes the number of OFF_SWITCH 
> assignments configurable.
> Will upload candidate patch shortly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-5013) Allow applications to provide input on amount of locality delay to use

2016-04-28 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15263106#comment-15263106
 ] 

Nathan Roberts commented on YARN-5013:
--

Re-posting latest comment from [~Naganarasimha] 

Thanks for the clarification, Tan, Wangda & Nathan Roberts. Yes, point 2 addresses the same issue; my mistake, I missed reading it. I also agree that the focus of this jira should be specific to the system-level OFF-SWITCH configuration.

bq.so I think when we do the application-level support the default would 
need to be either unlimited or some high value, otherwise we force all 
applications to set this limit to something other than 1 to get decent 
OFF_SWITCH scheduling behavior.

Once we have a system-level OFF-SWITCH configuration, do we require an app-level default as well? IIUC, by default we would use the system-level OFF-SWITCH configuration unless it is explicitly overridden by the app (the implementation can be discussed further in that jira).

bq.Sure, my application scheduled very quickly but my locality was terrible 
so I caused a lot of unnecessary cross-switch traffic. So I think we'll need 
some system-minimums that will prevent this type of abuse.

This point is debatable. Even though I agree with your point about controlling cross-switch traffic, the app is still performing within its capacity limits, so is it really good to limit/control it?

bq.If application A meets its OFF-SWITCH-per-node limit, do we offer the 
node to other applications in the same queue?

Are there any limitations if we offer the node to other applications in the same queue? It should be fine, right?


> Allow applications to provide input on amount of locality delay to use
> --
>
> Key: YARN-5013
> URL: https://issues.apache.org/jira/browse/YARN-5013
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Affects Versions: 3.0.0
>Reporter: Nathan Roberts
>
> Continuing a discussion that started on YARN-4963
> It would be useful if applications could provide some input to the scheduler 
> as to how much locality delay they'd like and/or whether they'd prefer the 
> application to be spread wide across the cluster (as opposed to being 
> scheduled quickly and densely).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-5013) Allow applications to provide input on amount of locality delay to use

2016-04-28 Thread Nathan Roberts (JIRA)
Nathan Roberts created YARN-5013:


 Summary: Allow applications to provide input on amount of locality 
delay to use
 Key: YARN-5013
 URL: https://issues.apache.org/jira/browse/YARN-5013
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacity scheduler
Affects Versions: 3.0.0
Reporter: Nathan Roberts


Continuing a discussion that started on YARN-4963

It would be useful if applications could provide some input to the scheduler as 
to how much locality delay they'd like and/or whether they'd prefer the 
application to be spread wide across the cluster (as opposed to being scheduled 
quickly and densely).





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-5008) LeveldbRMStateStore database can grow substantially leading to long recovery times

2016-04-28 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15262472#comment-15262472
 ] 

Nathan Roberts commented on YARN-5008:
--

Thanks for the patch. LGTM. +1 non-binding


> LeveldbRMStateStore database can grow substantially leading to long recovery 
> times
> --
>
> Key: YARN-5008
> URL: https://issues.apache.org/jira/browse/YARN-5008
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Attachments: YARN-5008.001.patch
>
>
> On large clusters with high application churn the background compaction in 
> leveldb may not be able to keep up with the write rate.  This can lead to 
> large leveldb databases that take many minutes to recover despite not having 
> very much real data in the database to load.  Most the time is spent 
> traversing tables full of keys that have been deleted.
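
A minimal sketch of forcing a periodic full-range compaction, assuming the org.iq80.leveldb API used by the state store exposes compactRange; how (or whether) the actual patch wires this in is not shown here:
{code}
import java.util.Timer;
import java.util.TimerTask;
import org.iq80.leveldb.DB;

// Sketch only: periodically force a full-range compaction so deleted keys do
// not pile up between the background compactions leveldb runs on its own.
// The scheduling details in YARN-5008 itself are in the patch, not here.
public class PeriodicCompaction {
  public static void schedule(final DB db, long intervalMs) {
    Timer timer = new Timer("leveldb-compactor", true /* daemon thread */);
    timer.schedule(new TimerTask() {
      @Override
      public void run() {
        // null, null means "compact the entire key range".
        db.compactRange(null, null);
      }
    }, intervalMs, intervalMs);
  }
}
{code}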



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-5003) Add container resource to RM audit log

2016-04-27 Thread Nathan Roberts (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nathan Roberts updated YARN-5003:
-
Attachment: YARN-5003.001.patch

Attaching patch

> Add container resource to RM audit log
> --
>
> Key: YARN-5003
> URL: https://issues.apache.org/jira/browse/YARN-5003
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager, scheduler
>Affects Versions: 3.0.0
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Attachments: YARN-5003.001.patch
>
>
> It would be valuable to know the resource consumed by a container in the RM 
> audit log.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-5003) Add container resource to RM audit log

2016-04-27 Thread Nathan Roberts (JIRA)
Nathan Roberts created YARN-5003:


 Summary: Add container resource to RM audit log
 Key: YARN-5003
 URL: https://issues.apache.org/jira/browse/YARN-5003
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager, scheduler
Affects Versions: 3.0.0
Reporter: Nathan Roberts
Assignee: Nathan Roberts


It would be valuable to know the resource consumed by a container in the RM 
audit log.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4556) TestFifoScheduler.testResourceOverCommit fails

2016-04-21 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15252593#comment-15252593
 ] 

Nathan Roberts commented on YARN-4556:
--

Patch seems like a reasonable test improvement. +1 non-binding


>  TestFifoScheduler.testResourceOverCommit fails
> ---
>
> Key: YARN-4556
> URL: https://issues.apache.org/jira/browse/YARN-4556
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Akihiro Suda
> Attachments: YARN-4556-1.patch
>
>
> From YARN-4548 Jenkins log: 
> https://builds.apache.org/job/PreCommit-YARN-Build/10181/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.8.0_66.txt
> {code}
> Running 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler
> Tests run: 16, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 31.004 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler
> testResourceOverCommit(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler)
>   Time elapsed: 4.746 sec  <<< FAILURE!
> java.lang.AssertionError: expected:<-2048> but was:<0>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler.testResourceOverCommit(TestFifoScheduler.java:1142)
> {code}
> https://github.com/apache/hadoop/blob/8676a118a12165ae5a8b80a2a4596c133471ebc1/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java#L1142
> It seems that Jenkins has been hitting this intermittently since April 2015
> https://www.google.com/search?q=TestFifoScheduler.testResourceOverCommit



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4963) capacity scheduler: Make number of OFF_SWITCH assignments per heartbeat configurable

2016-04-20 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15250137#comment-15250137
 ] 

Nathan Roberts commented on YARN-4963:
--

bq. IMO, I think application-specific configurations should be there rather than at the scheduler level. Some applications are fine with assigning containers in off_switch; they can specify the number of containers to be assigned. But a few applications are very strict about node locality; they can configure 1 in off_switch.

bq. Even I feel the same; any specific reason it has been set only at the scheduler level, other than the AMRM interface change? We can keep the default value as 1 so that it's still compatible. Also, allocation happens within the app's & queue's capacity limits anyway, so I feel it would be ideal for the app to decide how many allocations happen on an off_switch node. Thoughts?

Thanks [~Naganarasimha], [~rohithsharma], [~leftnoteasy] for the comments. I 
think we're all in agreement that there needs to be some control at the 
application level for things like OFF_SWITCH allocations, and locality delays 
(That's what #2 was going for and I think that should be a separate jira if 
folks are agreeable to that.) This new feature will require some discussion:
- The current value of 1 is not a good value for almost all applications so I 
think when we do the application-level support the default would need to be 
either unlimited or some high value, otherwise we force all applications to set 
this limit to something other than 1 to get decent OFF_SWITCH scheduling 
behavior.
- This setting not only affects the application at hand, but can also affect 
the entire system. I can see many cases where applications will relax these 
settings significantly so that their application schedules faster, however that 
may not have been the right thing for the system as a whole. Sure, my 
application scheduled very quickly but my locality was terrible so I caused a 
lot of unnecessary cross-switch traffic. So I think we'll need some 
system-minimums that will prevent this type of abuse. 
- These changes would potentially affect the fifo-ness of the queues. If 
application A meets its OFF-SWITCH-per-node limit, do we offer the node to 
other applications in the same queue? 

So my suggestion is:
1) Have this jira make the system-level OFF-SWITCH check  configurable so 
admins can easily crank this up and dramatically improve scheduling rate. 
2) Have a second jira to address per-application settings for things like 
locality_delay and off_switch limits.

Reasonable?
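
To make step 1 concrete, here is a minimal sketch of what the system-level check could look like. The property name and the placement of the counter are illustrative assumptions on my part, not the contents of the attached patch:
{code}
// Illustrative sketch only -- the config key and class are hypothetical.
public class OffSwitchAssignmentLimiter {
  // Hypothetical key; the real one would live in CapacitySchedulerConfiguration.
  public static final String MAX_OFFSWITCH_PER_HEARTBEAT =
      "yarn.scheduler.capacity.per-node-heartbeat.max-offswitch-assignments";
  public static final int DEFAULT_MAX_OFFSWITCH_PER_HEARTBEAT = 1;

  private final int maxOffSwitchPerHeartbeat;
  private int offSwitchAssignedThisHeartbeat = 0;

  public OffSwitchAssignmentLimiter(org.apache.hadoop.conf.Configuration conf) {
    this.maxOffSwitchPerHeartbeat = conf.getInt(
        MAX_OFFSWITCH_PER_HEARTBEAT, DEFAULT_MAX_OFFSWITCH_PER_HEARTBEAT);
  }

  // Called at the start of each node heartbeat.
  public void resetForHeartbeat() {
    offSwitchAssignedThisHeartbeat = 0;
  }

  // True if another OFF_SWITCH container may be assigned on this heartbeat.
  public boolean canAssignOffSwitch() {
    return offSwitchAssignedThisHeartbeat < maxOffSwitchPerHeartbeat;
  }

  public void recordOffSwitchAssignment() {
    offSwitchAssignedThisHeartbeat++;
  }
}
{code}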





> capacity scheduler: Make number of OFF_SWITCH assignments per heartbeat 
> configurable
> 
>
> Key: YARN-4963
> URL: https://issues.apache.org/jira/browse/YARN-4963
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 3.0.0, 2.7.2
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Attachments: YARN-4963.001.patch
>
>
> Currently the capacity scheduler will allow exactly 1 OFF_SWITCH assignment 
> per heartbeat. With more and more non MapReduce workloads coming along, the 
> degree of locality is declining, causing scheduling to be significantly 
> slower. It's still important to limit the number of OFF_SWITCH assignments to 
> avoid densely packing OFF_SWITCH containers onto nodes. 
> Proposal is to add a simple config that makes the number of OFF_SWITCH 
> assignments configurable.
> Will upload candidate patch shortly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4963) capacity scheduler: Make number of OFF_SWITCH assignments per heartbeat configurable

2016-04-15 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243662#comment-15243662
 ] 

Nathan Roberts commented on YARN-4963:
--

Thanks [~leftnoteasy] for the feedback. I agree that it would be a useful 
feature to be able to give some applications better spread, regardless of 
allocation type. Now we just have to figure out how to get there.

My concern is that I don't think we'd want to implement it using the same 
simple approach if it's going to apply to all container types. For example, in 
our case we almost always want NODE_LOCAL and RACK_LOCAL to get scheduled as 
quickly as possible so I'd want the limit to be high, as opposed to OFF_SWITCH 
where I want the limit to be 3-5 to keep a nice balance between scheduling 
performance and clustering. 

The reason this check was introduced in the first place (iirc) was to prevent 
network-heavy applications from loading up on specific nodes. The OFF_SWITCH 
check was a simple way of achieving this at a global level. The feature I think 
you're asking for (please correct me if I misunderstood) is that applications 
should be able to request that container spread be prioritized over timely 
scheduling (kind of like locality delay does today). I completely agree this 
would be a useful knob for applications to have. It is a trade-off though. An 
application that wants really good spread would be sacrificing scheduling 
opportunities that would probably be given to applications behind them in the 
queue (like locality delay).

So maybe there are two things to do:
1) Have the global OFF_SWITCH check to handle the simple case of avoiding too 
many network-heavy applications on a node. 
2) A feature where applications can specify a 
max_containers_assigned_per_node_per_heartbeat. I think this would be checked 
down in LeafQueue.assignContainers().

Even with #2 in place, I don't think #1 could immediately go away because the 
network-heavy applications would need to start properly specifying this limit.

The other approach to get rid of #1 would be when network is a resource. Such 
applications could then request lots of network resource, which should prevent 
clustering.

Does that make any sort of sense?
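
If it helps to picture #2, a per-application limit could be tracked roughly as in the sketch below. The names are hypothetical and this is not part of any posted patch; it only shows the kind of bookkeeping LeafQueue.assignContainers() would consult:
{code}
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-application tracker for a
// "max containers assigned per node per heartbeat" limit.
public class PerNodeAssignmentTracker {
  private final int maxPerNodePerHeartbeat;
  // nodeId -> containers assigned to this application on the current heartbeat
  private final Map<String, Integer> assignedThisHeartbeat = new HashMap<>();

  public PerNodeAssignmentTracker(int maxPerNodePerHeartbeat) {
    this.maxPerNodePerHeartbeat = maxPerNodePerHeartbeat;
  }

  // Consulted before handing the node to this application again
  // on the same heartbeat.
  public boolean canAssignOn(String nodeId) {
    return assignedThisHeartbeat.getOrDefault(nodeId, 0) < maxPerNodePerHeartbeat;
  }

  public void recordAssignment(String nodeId) {
    assignedThisHeartbeat.merge(nodeId, 1, Integer::sum);
  }

  // Called when the node sends its next heartbeat.
  public void resetNode(String nodeId) {
    assignedThisHeartbeat.remove(nodeId);
  }
}
{code}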


> capacity scheduler: Make number of OFF_SWITCH assignments per heartbeat 
> configurable
> 
>
> Key: YARN-4963
> URL: https://issues.apache.org/jira/browse/YARN-4963
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 3.0.0, 2.7.2
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Attachments: YARN-4963.001.patch
>
>
> Currently the capacity scheduler will allow exactly 1 OFF_SWITCH assignment 
> per heartbeat. With more and more non MapReduce workloads coming along, the 
> degree of locality is declining, causing scheduling to be significantly 
> slower. It's still important to limit the number of OFF_SWITCH assignments to 
> avoid densely packing OFF_SWITCH containers onto nodes. 
> Proposal is to add a simple config that makes the number of OFF_SWITCH 
> assignments configurable.
> Will upload candidate patch shortly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4964) Allow ShuffleHandler readahead without drop-behind

2016-04-15 Thread Nathan Roberts (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nathan Roberts updated YARN-4964:
-
Attachment: YARN-4964.001.patch

> Allow ShuffleHandler readahead without drop-behind
> --
>
> Key: YARN-4964
> URL: https://issues.apache.org/jira/browse/YARN-4964
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 3.0.0, 2.7.2
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Attachments: YARN-4964.001.patch
>
>
> Currently mapreduce.shuffle.manage.os.cache enables/disables both readahead 
> (POSIX_FADV_WILLNEED) and drop-behind (POSIX_FADV_DONTNEED) logic within the 
> ShuffleHandler.
> It would be beneficial if these were separately configurable. 
> - Running without readahead can lead to significant seek storms caused by 
> large numbers of sendfiles() competing with one another.
> - However, running with drop-behind can also lead to seek storms because 
> there are cases where the server can successfully write the shuffle bytes to 
> the network, BUT the client doesn't want the bytes right now (MergeManager 
> wants to WAIT is an example) so it ignores them and asks for them again a bit 
> later. This causes repeated reads of the same data from disk.
> I'll attach a simple patch that enables/disables readahead based on 
> mapreduce.shuffle.readahead.bytes==0, leaving 
> mapreduce.shuffle.manage.os.cache controlling only the drop-behind.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4964) Allow ShuffleHandler readahead without drop-behind

2016-04-15 Thread Nathan Roberts (JIRA)
Nathan Roberts created YARN-4964:


 Summary: Allow ShuffleHandler readahead without drop-behind
 Key: YARN-4964
 URL: https://issues.apache.org/jira/browse/YARN-4964
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.7.2, 3.0.0
Reporter: Nathan Roberts
Assignee: Nathan Roberts


Currently mapreduce.shuffle.manage.os.cache enables/disables both readahead 
(POSIX_FADV_WILLNEED) and drop-behind (POSIX_FADV_DONTNEED) logic within the 
ShuffleHandler.

It would be beneficial if these were separately configurable. 
- Running without readahead can lead to significant seek storms caused by large 
numbers of sendfiles() competing with one another.
- However, running with drop-behind can also lead to seek storms because there 
are cases where the server can successfully write the shuffle bytes to the 
network, BUT the client doesn't want the bytes right now (MergeManager wants to 
WAIT is an example) so it ignores them and asks for them again a bit later. 
This causes repeated reads of the same data from disk.

I'll attach a simple patch that enables/disables readahead based on 
mapreduce.shuffle.readahead.bytes==0, leaving mapreduce.shuffle.manage.os.cache 
controlling only the drop-behind.
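
A minimal sketch of the decision logic being proposed; posixFadvise() below is a hypothetical stand-in for the native fadvise call the ShuffleHandler actually goes through, and only the split between the two settings is the point:
{code}
// Sketch of the proposed split between readahead and drop-behind.
public class ShuffleCacheHints {
  // Linux values for the two pieces of advice.
  static final int POSIX_FADV_WILLNEED = 3;
  static final int POSIX_FADV_DONTNEED = 4;

  private final boolean manageOsCache;   // mapreduce.shuffle.manage.os.cache
  private final int readaheadBytes;      // mapreduce.shuffle.readahead.bytes

  ShuffleCacheHints(boolean manageOsCache, int readaheadBytes) {
    this.manageOsCache = manageOsCache;
    this.readaheadBytes = readaheadBytes;
  }

  void beforeTransfer(long offset) {
    // Readahead keyed off readahead.bytes alone: 0 disables it.
    if (readaheadBytes > 0) {
      posixFadvise(offset, readaheadBytes, POSIX_FADV_WILLNEED);
    }
  }

  void afterTransfer(long offset, long length) {
    // Drop-behind remains controlled by manage.os.cache only.
    if (manageOsCache) {
      posixFadvise(offset, length, POSIX_FADV_DONTNEED);
    }
  }

  private void posixFadvise(long offset, long length, int advice) {
    // Hypothetical placeholder; real code uses Hadoop's native I/O support.
  }
}
{code}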




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4963) capacity scheduler: Make number of OFF_SWITCH assignments per heartbeat configurable

2016-04-15 Thread Nathan Roberts (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nathan Roberts updated YARN-4963:
-
Attachment: YARN-4963.001.patch

> capacity scheduler: Make number of OFF_SWITCH assignments per heartbeat 
> configurable
> 
>
> Key: YARN-4963
> URL: https://issues.apache.org/jira/browse/YARN-4963
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 3.0.0, 2.7.2
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Attachments: YARN-4963.001.patch
>
>
> Currently the capacity scheduler will allow exactly 1 OFF_SWITCH assignment 
> per heartbeat. With more and more non MapReduce workloads coming along, the 
> degree of locality is declining, causing scheduling to be significantly 
> slower. It's still important to limit the number of OFF_SWITCH assignments to 
> avoid densely packing OFF_SWITCH containers onto nodes. 
> Proposal is to add a simple config that makes the number of OFF_SWITCH 
> assignments configurable.
> Will upload candidate patch shortly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4963) capacity scheduler: Make number of OFF_SWITCH assignments per heartbeat configurable

2016-04-15 Thread Nathan Roberts (JIRA)
Nathan Roberts created YARN-4963:


 Summary: capacity scheduler: Make number of OFF_SWITCH assignments 
per heartbeat configurable
 Key: YARN-4963
 URL: https://issues.apache.org/jira/browse/YARN-4963
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler
Affects Versions: 2.7.2, 3.0.0
Reporter: Nathan Roberts
Assignee: Nathan Roberts


Currently the capacity scheduler will allow exactly 1 OFF_SWITCH assignment per 
heartbeat. With more and more non MapReduce workloads coming along, the degree 
of locality is declining, causing scheduling to be significantly slower. It's 
still important to limit the number of OFF_SWITCH assignments to avoid densely 
packing OFF_SWITCH containers onto nodes. 

Proposal is to add a simple config that makes the number of OFF_SWITCH 
assignments configurable.

Will upload candidate patch shortly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4924) NM recovery race can lead to container not cleaned up

2016-04-06 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228258#comment-15228258
 ] 

Nathan Roberts commented on YARN-4924:
--

Sorry [~sandflee]. I missed your comment about updating YARN-4051. That seems 
fine with me!

> NM recovery race can lead to container not cleaned up
> -
>
> Key: YARN-4924
> URL: https://issues.apache.org/jira/browse/YARN-4924
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0, 2.7.2
>Reporter: Nathan Roberts
>
> It's probably a small window but we observed a case where the NM crashed and 
> then a container was not properly cleaned up during recovery.
> I will add details in first comment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4924) NM recovery race can lead to container not cleaned up

2016-04-06 Thread Nathan Roberts (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nathan Roberts updated YARN-4924:
-
Assignee: (was: Nathan Roberts)

> NM recovery race can lead to container not cleaned up
> -
>
> Key: YARN-4924
> URL: https://issues.apache.org/jira/browse/YARN-4924
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0, 2.7.2
>Reporter: Nathan Roberts
>
> It's probably a small window but we observed a case where the NM crashed and 
> then a container was not properly cleaned up during recovery.
> I will add details in first comment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4924) NM recovery race can lead to container not cleaned up

2016-04-06 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228243#comment-15228243
 ] 

Nathan Roberts commented on YARN-4924:
--

Thanks [~sandflee], [~jlowe] for the suggestion. I'll work up a fix soon.

> NM recovery race can lead to container not cleaned up
> -
>
> Key: YARN-4924
> URL: https://issues.apache.org/jira/browse/YARN-4924
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0, 2.7.2
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
>
> It's probably a small window but we observed a case where the NM crashed and 
> then a container was not properly cleaned up during recovery.
> I will add details in first comment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-4924) NM recovery race can lead to container not cleaned up

2016-04-06 Thread Nathan Roberts (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nathan Roberts reassigned YARN-4924:


Assignee: Nathan Roberts

> NM recovery race can lead to container not cleaned up
> -
>
> Key: YARN-4924
> URL: https://issues.apache.org/jira/browse/YARN-4924
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0, 2.7.2
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
>
> It's probably a small window but we observed a case where the NM crashed and 
> then a container was not properly cleaned up during recovery.
> I will add details in first comment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4924) NM recovery race can lead to container not cleaned up

2016-04-05 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15226956#comment-15226956
 ] 

Nathan Roberts commented on YARN-4924:
--

Observed the following race with NM recovery.

1) ContainerManager handles a FINISH_APPS event causing 
storeFinishedApplication() to be recorded in state store (e.g. if RM kills 
application)
2) Prior to cleaning up the containers associated with this application, the NM 
dies
3) When NM restarts it attempts to recover the Application, Containers, and 
FinishedApplication events all associated with this application, in that order
4) This leads to a NEW to DONE transition for the containers, which does not 
try to clean up the actual container since this is supposed to be a pre-LAUNCHED 
transition

iiuc, this happens because when the application transitions from NEW to INITING 
during Application recovery, the containerInitEvents aren't actually dispatched 
yet. They are delayed until the AppInitDoneTransition. However, the 
AppInitDoneTransition may not occur until after the recovery code has handled 
the FinishedApplicationEvent and queued up KILL_CONTAINER events. So, in 
effect, the containerKillEvents overtook the containerInitEvents, leading to 
the NEW to DONE transition. 

{noformat}
2016-04-04 18:20:45,513 [main] INFO application.ApplicationImpl: Application 
application_1458666253602_2367938 transitioned from NEW to INITING
2016-04-04 18:20:56,437 [AsyncDispatcher event handler] INFO 
application.ApplicationImpl: Adding 
container_e08_1458666253602_2367938_01_04 to application 
application_1458666253602_2367938
2016-04-04 18:20:57,062 [AsyncDispatcher event handler] INFO 
application.ApplicationImpl: Application application_1458666253602_2367938 
transitioned from INITING to FINISHING_CONTAINERS_WAIT
2016-04-04 18:20:57,095 [AsyncDispatcher event handler] INFO 
container.ContainerImpl: Container 
container_e08_1458666253602_2367938_01_04 transitioned from NEW to DONE
2016-04-04 18:20:57,120 [AsyncDispatcher event handler] INFO 
application.ApplicationImpl: Removing 
container_e08_1458666253602_2367938_01_04 from application 
application_1458666253602_2367938
2016-04-04 18:20:57,120 [AsyncDispatcher event handler] INFO 
application.ApplicationImpl: Application application_1458666253602_2367938 
transitioned from FINISHING_CONTAINERS_WAIT to APPLICATION_RESOURCES_CLEANINGUP
{noformat}



> NM recovery race can lead to container not cleaned up
> -
>
> Key: YARN-4924
> URL: https://issues.apache.org/jira/browse/YARN-4924
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0, 2.7.2
>Reporter: Nathan Roberts
>
> It's probably a small window but we observed a case where the NM crashed and 
> then a container was not properly cleaned up during recovery.
> I will add details in first comment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4924) NM recovery race can lead to container not cleaned up

2016-04-05 Thread Nathan Roberts (JIRA)
Nathan Roberts created YARN-4924:


 Summary: NM recovery race can lead to container not cleaned up
 Key: YARN-4924
 URL: https://issues.apache.org/jira/browse/YARN-4924
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.7.2, 3.0.0
Reporter: Nathan Roberts


It's probably a small window but we observed a case where the NM crashed and 
then a container was not properly cleaned up during recovery.

I will add details in first comment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4834) ProcfsBasedProcessTree doesn't track daemonized processes

2016-04-05 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15226424#comment-15226424
 ] 

Nathan Roberts commented on YARN-4834:
--

As a note, we were seeing this with slider applications. I didn't investigate 
far enough to know whether all slider applications escape monitoring or whether 
this was just a characteristic of this particular application.

> ProcfsBasedProcessTree doesn't track daemonized processes
> -
>
> Key: YARN-4834
> URL: https://issues.apache.org/jira/browse/YARN-4834
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0, 2.7.2
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Attachments: YARN-4834.001.patch
>
>
> Currently the algorithm uses ppid from /proc/<pid>/stat, which can be 1 if a 
> child process has daemonized itself. This can cause potentially large processes 
> to go unmonitored. 
> The session id might be a better choice since that's what we use to signal the 
> container during teardown. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4834) ProcfsBasedProcessTree doesn't track daemonized processes

2016-04-05 Thread Nathan Roberts (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nathan Roberts updated YARN-4834:
-
Attachment: YARN-4834.001.patch

Simple fix that falls back to the session ID if the process has become owned by 
init. This seemed like the safest, lowest-risk change.

Other options might be:
- Only use sessionID to build process tree
- Use container cgroup (cgroup.procs) if available/configured.
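
For anyone following along, the fields involved can be read roughly like this; a sketch of the idea only, not the contents of the attached patch:
{code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Rough sketch: read ppid and session id from /proc/<pid>/stat and fall back
// to the session id when the process has been reparented to init.
public class ProcStat {
  public static int effectiveParent(int pid) throws IOException {
    String stat = new String(Files.readAllBytes(
        Paths.get("/proc/" + pid + "/stat")));
    // comm (field 2) is parenthesized and may contain spaces, so split after it.
    String rest = stat.substring(stat.lastIndexOf(')') + 2);
    String[] fields = rest.split("\\s+");
    // After comm: field 0 = state, 1 = ppid, 2 = pgrp, 3 = session.
    int ppid = Integer.parseInt(fields[1]);
    int session = Integer.parseInt(fields[3]);
    // If the child daemonized and was reparented to init (ppid 1),
    // group it by session id so it still lands in the container's tree.
    return (ppid == 1) ? session : ppid;
  }
}
{code}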

> ProcfsBasedProcessTree doesn't track daemonized processes
> -
>
> Key: YARN-4834
> URL: https://issues.apache.org/jira/browse/YARN-4834
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0, 2.7.2
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Attachments: YARN-4834.001.patch
>
>
> Currently the algorithm uses ppid from /proc/<pid>/stat, which can be 1 if a 
> child process has daemonized itself. This can cause potentially large processes 
> to go unmonitored. 
> The session id might be a better choice since that's what we use to signal the 
> container during teardown. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4768) getAvailablePhysicalMemorySize can be inaccurate on linux

2016-03-19 Thread Nathan Roberts (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15199732#comment-15199732
 ] 

Nathan Roberts commented on YARN-4768:
--

Any comments on this approach?


> getAvailablePhysicalMemorySize can be inaccurate on linux
> -
>
> Key: YARN-4768
> URL: https://issues.apache.org/jira/browse/YARN-4768
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0, 2.7.2
> Environment: Linux
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Attachments: YARN-4768.patch
>
>
> Algorithm currently uses "MemFree" + "Inactive" from /proc/meminfo
> "Inactive" may not be a very good indication of how much memory can be 
> readily freed because it contains both:
> - Pages mapped with MAP_SHARED|MAP_ANONYMOUS (regardless of whether they're 
> being actively accessed or not. Unclear to me why this is the case...)
> - Pages mapped MAP_PRIVATE|MAP_ANONYMOUS that have not been accessed recently
> Both of these types of pages probably shouldn't be considered "Available".
> "Inactive(file)" would seem more accurate but it's not available in all 
> kernel versions. To keep things simple, maybe just use "Inactive(file)" if 
> available, otherwise fallback to "Inactive".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4834) ProcfsBasedProcessTree doesn't track daemonized processes

2016-03-19 Thread Nathan Roberts (JIRA)
Nathan Roberts created YARN-4834:


 Summary: ProcfsBasedProcessTree doesn't track daemonized processes
 Key: YARN-4834
 URL: https://issues.apache.org/jira/browse/YARN-4834
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.7.2, 3.0.0
Reporter: Nathan Roberts
Assignee: Nathan Roberts


Currently the algorithm uses ppid from /proc/<pid>/stat, which can be 1 if a 
child process has daemonized itself. This can cause potentially large processes 
to go unmonitored. 

The session id might be a better choice since that's what we use to signal the 
container during teardown. 




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4768) getAvailablePhysicalMemorySize can be inaccurate on linux

2016-03-08 Thread Nathan Roberts (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nathan Roberts updated YARN-4768:
-
Attachment: YARN-4768.patch

Patch for trunk.

Also changed getPhysicalMemorySize() to exclude:
- HardwareCorrupted pages - not that uncommon.
- HugePagesTotal * hugePageSize - probably not commonly configured on compute 
nodes, but it seems reasonable not to count these just in case.

Comments welcome on alternative ways to approach these.
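
As a rough illustration of the approach (not the patch itself), the fallback and exclusions could read like the sketch below; the /proc/meminfo field names are the kernel's, everything else is assumed:
{code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the /proc/meminfo handling described above.
public class MemInfo {
  private final Map<String, Long> kb = new HashMap<>();

  public MemInfo() throws IOException {
    for (String line : Files.readAllLines(Paths.get("/proc/meminfo"))) {
      String[] parts = line.split(":\\s+");
      // Values are reported in kB, e.g. "MemFree:  123456 kB".
      kb.put(parts[0], Long.parseLong(parts[1].split("\\s+")[0]));
    }
  }

  private long get(String key) {
    return kb.getOrDefault(key, 0L);
  }

  // Prefer Inactive(file); fall back to Inactive on older kernels.
  public long availableKb() {
    long inactive = kb.containsKey("Inactive(file)")
        ? get("Inactive(file)") : get("Inactive");
    return get("MemFree") + inactive;
  }

  // Exclude memory the scheduler can never hand out.
  public long physicalKb() {
    return get("MemTotal")
        - get("HardwareCorrupted")
        - get("HugePages_Total") * get("Hugepagesize");
  }
}
{code}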


> getAvailablePhysicalMemorySize can be inaccurate on linux
> -
>
> Key: YARN-4768
> URL: https://issues.apache.org/jira/browse/YARN-4768
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0, 2.7.2
> Environment: Linux
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Attachments: YARN-4768.patch
>
>
> Algorithm currently uses "MemFree" + "Inactive" from /proc/meminfo
> "Inactive" may not be a very good indication of how much memory can be 
> readily freed because it contains both:
> - Pages mapped with MAP_SHARED|MAP_ANONYMOUS (regardless of whether they're 
> being actively accessed or not. Unclear to me why this is the case...)
> - Pages mapped MAP_PRIVATE|MAP_ANONYMOUS that have not been accessed recently
> Both of these types of pages probably shouldn't be considered "Available".
> "Inactive(file)" would seem more accurate but it's not available in all 
> kernel versions. To keep things simple, maybe just use "Inactive(file)" if 
> available, otherwise fallback to "Inactive".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

