[jira] [Commented] (YARN-5478) [YARN-4902] Define Java API for generalized & unified scheduling-strategies.
[ https://issues.apache.org/jira/browse/YARN-5478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948454#comment-15948454 ] Konstantinos Karanasos commented on YARN-5478: -- Hi [~leftnoteasy], bq. I think now we generally agree that we should stop investing on the old ResourceRequest and we should move APIs for new features like Allocation tag, Affinity/Anti-affinity, Node attributes: YARN-4902, to the new ResourceRequest. My understanding was that allocation tags that are attached to containers could indeed be added either in the existing ResourceRequest, in the new ResourceRequest, or in the AllocateRequest object as a map between AllocateRequestID and tags. For the remaining features (affinity, node attributes), I am still not sure there is a need to add them to the (old or new) ResourceRequest object. It seems that adding constraint expressions in the ApplicationSubmissionContext and the AllocateRequest (for more targeted ones) is sufficient for all the use cases we have come across and those mentioned in YARN-4793. I just uploaded a design document in YARN-5468, where we give more details on our thoughts. We tried to address all the points we discussed in our last meeting. Please give it a look and let's continue the discussion. [~Naganarasimha], please also check the document. Based on our latest discussions with Wangda, we included a way to specify node attributes in the constraint expression (using namespaces to differentiate between different types of constraints). > [YARN-4902] Define Java API for generalized & unified scheduling-strategies.
> > > Key: YARN-5478 > URL: https://issues.apache.org/jira/browse/YARN-5478 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-5478.1.patch, YARN-5478.2.patch, > YARN-5478.preliminary-poc.1.patch, YARN-5478.preliminary-poc.2.patch > > > Define Java API for application to specify generic scheduling requirements > described in YARN-4902 design doc. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6342) Make TimelineV2Client's drain timeout after stop configurable
[ https://issues.apache.org/jira/browse/YARN-6342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948462#comment-15948462 ] Varun Saxena commented on YARN-6342: Sorry for nitpicking. I would give spaces in between timelinev2client. And say draining "leftover entities" to indicate we are talking about entities. {code} + +The time period for which timelinev2client will wait for draining its queue +after stop. + {code} Let me invoke the build manually. Some failures look weird. Regarding this being an advanced configuration, I felt, upon offline discussion with Rohith, that it would be better to make this config public, as in their testing they found entities were taking up to 3 seconds to flush. So this would largely depend on the AM and the amount of entities it writes on stop. > Make TimelineV2Client's drain timeout after stop configurable > - > > Key: YARN-6342 > URL: https://issues.apache.org/jira/browse/YARN-6342 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Haibo Chen > Attachments: YARN-6342.00.patch, YARN-6342.01.patch > > > Found these with [~rohithsharma] while browsing the code > - In stop: it calls shutdownNow which doesn't wait for pending tasks; should > it use shutdown instead? > {code} > public void stop() { > LOG.info("Stopping TimelineClient."); > executor.shutdownNow(); > try { > executor.awaitTermination(DRAIN_TIME_PERIOD, TimeUnit.MILLISECONDS); > } catch (InterruptedException e) { > {code} > - In TimelineClientImpl#createRunnable: > If any exception happens when publishing one entity > (publishWithoutBlockingOnQueue), the thread exits. I think it should make a > best effort to continue publishing the timeline entities; one failure should > not prevent all follow-up entities from being published.
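The shutdown-vs-shutdownNow point in the quoted description can be sketched in isolation. This is a minimal stand-alone example, not the TimelineV2Client code: the class name `DrainOnStop`, the constant `DRAIN_TIME_PERIOD_MS`, and the method `drainThenStop` are invented for illustration, and the drain period is exactly the value this JIRA proposes to make configurable.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class DrainOnStop {
    // Hypothetical drain period; in YARN this would come from the new config.
    static final long DRAIN_TIME_PERIOD_MS = 2000;

    static int drainThenStop(ExecutorService executor, AtomicInteger published) {
        // shutdown() (not shutdownNow()) lets already-queued publish tasks
        // finish, which is the behavior argued for in the issue description.
        executor.shutdown();
        try {
            if (!executor.awaitTermination(DRAIN_TIME_PERIOD_MS,
                                           TimeUnit.MILLISECONDS)) {
                executor.shutdownNow(); // give up on whatever is still pending
            }
        } catch (InterruptedException e) {
            executor.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return published.get();
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        AtomicInteger published = new AtomicInteger();
        for (int i = 0; i < 5; i++) {
            // Each task stands in for publishing one timeline entity.
            executor.execute(published::incrementAndGet);
        }
        System.out.println("entities published before stop: "
            + drainThenStop(executor, published));
    }
}
```

With shutdownNow() in place of shutdown(), queued tasks would be discarded and the count could come up short, which is the leftover-entity loss being discussed.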
[jira] [Updated] (YARN-6411) Clean up the overwrite of createDispatcher() in subclass of MockRM
[ https://issues.apache.org/jira/browse/YARN-6411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu updated YARN-6411: --- Attachment: YARN-6411.001.patch > Clean up the overwrite of createDispatcher() in subclass of MockRM > -- > > Key: YARN-6411 > URL: https://issues.apache.org/jira/browse/YARN-6411 > Project: Hadoop YARN > Issue Type: Task > Components: resourcemanager >Affects Versions: 2.9.0, 3.0.0-alpha2 >Reporter: Yufei Gu >Assignee: Yufei Gu >Priority: Minor > Attachments: YARN-6411.001.patch > > > MockRM creates a object of {{DrainDispatcher}} in YARN-3102. We don't need to > do the same thing in its subclasses. > {code} > @Override > protected Dispatcher createDispatcher() { > return new DrainDispatcher(); > } > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6395) Integrate service app master to write data into ATSv2.
[ https://issues.apache.org/jira/browse/YARN-6395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948445#comment-15948445 ] Hadoop QA commented on YARN-6395: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 24s{color} | {color:green} yarn-native-services passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s{color} | {color:green} yarn-native-services passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 19s{color} | {color:green} yarn-native-services passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 26s{color} | {color:green} yarn-native-services passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s{color} | {color:green} yarn-native-services passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 51s{color} | {color:green} yarn-native-services passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s{color} | {color:green} yarn-native-services passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s{color} | {color:green} the patch 
passed {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 20s{color} | {color:red} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-applications_hadoop-yarn-slider_hadoop-yarn-slider-core generated 6 new + 23 unchanged - 6 fixed = 29 total (was 29) {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 17s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-slider/hadoop-yarn-slider-core: The patch generated 18 new + 256 unchanged - 1 fixed = 274 total (was 257) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 4s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 45s{color} | {color:green} hadoop-yarn-slider-core in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 21m 30s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:a9ad5d6 | | JIRA Issue | YARN-6395 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12861162/YARN-6395.yarn-native-services.0002.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 2e257e6386c3 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | yarn-native-services / c0055cd | | Default Java | 1.8.0_121 | | findbugs | v3.0.0 | | javac | https://builds.apache.org/job/PreCommit-YARN-Build/15431/artifact/patchprocess/diff-compile-javac-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-applications_hadoop-yarn-slider_hadoop-yarn-slider-core.txt | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/15431/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-applications_hadoop-yarn-slider_hadoop-yarn-slider-core.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/15431/testReport/ | | modules | C: hadoop-yarn-project
[jira] [Commented] (YARN-6395) Integrate service app master to write data into ATSv2.
[ https://issues.apache.org/jira/browse/YARN-6395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948451#comment-15948451 ] Rohith Sharma K S commented on YARN-6395: - javadoc errors are not related to the patch. > Integrate service app master to write data into ATSv2. > -- > > Key: YARN-6395 > URL: https://issues.apache.org/jira/browse/YARN-6395 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S > Attachments: YARN-6395.yarn-native-services.0001.patch, > YARN-6395.yarn-native-services.0002.patch > > > Integration of ATSv2 with the native service app master. ATSv2 data is > used for UI rendering.
[jira] [Commented] (YARN-6342) Make TimelineV2Client's drain timeout after stop configurable
[ https://issues.apache.org/jira/browse/YARN-6342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948450#comment-15948450 ] Rohith Sharma K S commented on YARN-6342: - +1 for the patch. Would you look at the build failure details? > Make TimelineV2Client's drain timeout after stop configurable
[jira] [Updated] (YARN-5468) Scheduling of long-running applications
[ https://issues.apache.org/jira/browse/YARN-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantinos Karanasos updated YARN-5468: - Attachment: LRA-scheduling-design.v1.pdf Attaching the latest design document, reflecting the current design. > Scheduling of long-running applications > --- > > Key: YARN-5468 > URL: https://issues.apache.org/jira/browse/YARN-5468 > Project: Hadoop YARN > Issue Type: New Feature > Components: capacityscheduler, fairscheduler >Reporter: Konstantinos Karanasos >Assignee: Konstantinos Karanasos > Attachments: LRA-scheduling-design.v1.pdf, YARN-5468.prototype.patch > > > This JIRA is about the scheduling of applications with long-running tasks. > It will include adding support to YARN for a richer set of scheduling > constraints (such as affinity, anti-affinity, cardinality and time > constraints), and extending the schedulers to take them into account during > placement of containers on nodes. > We plan to have both an online version that will accommodate such requests as > they arrive, as well as a Long-running Application Planner that will make > more global decisions by considering multiple applications at once.
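As a rough illustration of the constraint kinds the description mentions (affinity, anti-affinity, cardinality), here is what such a constraint might look like as plain data. All names here (`ConstraintSketch`, `PlacementConstraint`, `allows`) are invented for this sketch and are not the API proposed in the design document.

```java
public class ConstraintSketch {
    enum Kind { AFFINITY, ANTI_AFFINITY, CARDINALITY }

    static final class PlacementConstraint {
        final Kind kind;
        final String targetTag; // allocation tag the constraint refers to
        final int min, max;     // cardinality bounds (unused for pure [anti-]affinity)

        PlacementConstraint(Kind kind, String targetTag, int min, int max) {
            this.kind = kind; this.targetTag = targetTag;
            this.min = min; this.max = max;
        }

        /** Would a node already running tagCountOnNode containers with the
         *  target tag satisfy this constraint? */
        boolean allows(int tagCountOnNode) {
            switch (kind) {
                case AFFINITY:      return tagCountOnNode >= 1; // co-locate
                case ANTI_AFFINITY: return tagCountOnNode == 0; // avoid
                default:            return tagCountOnNode >= min
                                        && tagCountOnNode <= max;
            }
        }
    }

    public static void main(String[] args) {
        // "At most 2 HBase region servers per node" as a cardinality constraint.
        PlacementConstraint c =
            new PlacementConstraint(Kind.CARDINALITY, "hbase-rs", 0, 2);
        System.out.println("node with 3 tagged containers allowed? " + c.allows(3));
    }
}
```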
[jira] [Updated] (YARN-6395) Integrate service app master to write data into ATSv2.
[ https://issues.apache.org/jira/browse/YARN-6395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith Sharma K S updated YARN-6395: Attachment: YARN-6395.yarn-native-services.0002.patch Updated the patch, fixing some of the checkstyle and javadoc issues. Many of the javadoc errors are not from the patch. > Integrate service app master to write data into ATSv2.
[jira] [Issue Comment Deleted] (YARN-6276) Now container kill mechanism may lead process leak
[ https://issues.apache.org/jira/browse/YARN-6276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Yuan updated YARN-6276: Comment: was deleted (was: Add this issue here.In this patch container signal operation will check current pid is or not own by containerId str. IMO,there is some process does not contain containerId itself. Such as subprocess create by user-code itself.) > Now container kill mechanism may lead process leak > -- > > Key: YARN-6276 > URL: https://issues.apache.org/jira/browse/YARN-6276 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 3.0.0-alpha2 >Reporter: Feng Yuan >Assignee: Feng Yuan > > When kill bash process, YarnChild may didn`t response because fullgc, -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6276) Now container kill mechanism may lead process leak
[ https://issues.apache.org/jira/browse/YARN-6276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948427#comment-15948427 ] Feng Yuan commented on YARN-6276: - Adding this issue here. In this patch, the container signal operation checks whether the current pid is owned by the containerId string. IMO, there are some processes that do not contain the containerId itself, such as subprocesses created by user code. > Now container kill mechanism may lead process leak > -- > > Key: YARN-6276 > URL: https://issues.apache.org/jira/browse/YARN-6276 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 3.0.0-alpha2 >Reporter: Feng Yuan >Assignee: Feng Yuan > > When killing the bash process, YarnChild may not respond because of full GC,
[jira] [Commented] (YARN-6277) Nodemanager heap memory leak
[ https://issues.apache.org/jira/browse/YARN-6277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948417#comment-15948417 ] Feng Yuan commented on YARN-6277: - I have attached a patch. This issue is due to an inconsistency between the ShuffleHandler and NodeManager configurations: if the NM changes the local-dir configs, the ShuffleHandler does not synchronize with them. > Nodemanager heap memory leak > > > Key: YARN-6277 > URL: https://issues.apache.org/jira/browse/YARN-6277 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.3, 2.8.1, 3.0.0-alpha2 >Reporter: Feng Yuan >Assignee: Feng Yuan > Attachments: YARN-6277.branch-2.8.001.patch > > > Because of the LocalDirHandlerService@LocalDirAllocator mechanism, they > create massive numbers of LocalFileSystem instances, leading to a heap leak.
[jira] [Updated] (YARN-6277) Nodemanager heap memory leak
[ https://issues.apache.org/jira/browse/YARN-6277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Yuan updated YARN-6277: Attachment: YARN-6277.branch-2.8.001.patch > Nodemanager heap memory leak > > > Key: YARN-6277 > URL: https://issues.apache.org/jira/browse/YARN-6277 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.3, 2.8.1, 3.0.0-alpha2 >Reporter: Feng Yuan >Assignee: Feng Yuan > Attachments: YARN-6277.branch-2.8.001.patch > > > Because LocalDirHandlerService@LocalDirAllocator`s mechanism,they will create > massive LocalFileSystem.So lead to heap leak. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6141) ppc64le on Linux doesn't trigger __linux get_executable codepath
[ https://issues.apache.org/jira/browse/YARN-6141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948390#comment-15948390 ] Sonia Garudi commented on YARN-6141: Any update ? > ppc64le on Linux doesn't trigger __linux get_executable codepath > > > Key: YARN-6141 > URL: https://issues.apache.org/jira/browse/YARN-6141 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.0.0-alpha3 > Environment: $ uname -a > Linux f8eef0f055cf 3.16.0-30-generic #40~14.04.1-Ubuntu SMP Thu Jan 15 > 17:42:36 UTC 2015 ppc64le ppc64le ppc64le GNU/Linux >Reporter: Sonia Garudi > Labels: ppc64le > Attachments: YARN-6141.patch > > > On ppc64le architecture, the build fails in the 'Hadoop YARN NodeManager' > project with the below error: > Cannot safely determine executable path with a relative HADOOP_CONF_DIR on > this operating system. > [WARNING] #error Cannot safely determine executable path with a relative > HADOOP_CONF_DIR on this operating system. > [WARNING] ^ > [WARNING] make[2]: *** > [CMakeFiles/container.dir/main/native/container-executor/impl/get_executable.c.o] > Error 1 > [WARNING] make[2]: *** Waiting for unfinished jobs > [WARNING] make[1]: *** [CMakeFiles/container.dir/all] Error 2 > [WARNING] make: *** [all] Error 2 > [INFO] > > [INFO] BUILD FAILURE > [INFO] > > Cmake version used : > $ /usr/bin/cmake --version > cmake version 2.8.12.2 -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6403) Invalid local resource request can raise NPE and make NM exit
[ https://issues.apache.org/jira/browse/YARN-6403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Yang updated YARN-6403: --- Attachment: YARN-6403.002.patch [~jlowe] Thanks for correcting me. The last server-side change was not proper and I corrected it as you mentioned. For the client-side change, IIUC the generated protobuf code won't actually throw an NPE for this case. Unit tests for both the client and server changes are added. Attaching a new patch for review; please correct me if I missed something. > Invalid local resource request can raise NPE and make NM exit > - > > Key: YARN-6403 > URL: https://issues.apache.org/jira/browse/YARN-6403 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.8.0 >Reporter: Tao Yang > Attachments: YARN-6403.001.patch, YARN-6403.002.patch > > > Recently we found this problem in our testing environment. The app that > caused this problem added an invalid local resource request (with no location) > into the ContainerLaunchContext like this: > {code} > localResources.put("test", LocalResource.newInstance(location, > LocalResourceType.FILE, LocalResourceVisibility.PRIVATE, 100, > System.currentTimeMillis())); > ContainerLaunchContext amContainer = > ContainerLaunchContext.newInstance(localResources, environment, > vargsFinal, null, securityTokens, acls); > {code} > The actual value of location was null, although the app didn't expect that. This > mistake caused several NMs to exit with the NPE below; they couldn't restart until > the nm recovery dirs were deleted.
> {code} > FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourceRequest.(LocalResourceRequest.java:46) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl$RequestResourcesTransition.transition(ContainerImpl.java:711) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl$RequestResourcesTransition.transition(ContainerImpl.java:660) > at > org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:1320) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:88) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1293) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1286) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110) > at java.lang.Thread.run(Thread.java:745) > {code} > NPE occured when created LocalResourceRequest instance for invalid resource > request. 
> {code} > public LocalResourceRequest(LocalResource resource) > throws URISyntaxException { > this(resource.getResource().toPath(), //NPE occurred here > resource.getTimestamp(), > resource.getType(), > resource.getVisibility(), > resource.getPattern()); > } > {code} > We can't guarantee the validity of the local resource request now, but we could > avoid damaging the cluster. Perhaps we can verify the resource both in > ContainerLaunchContext and LocalResourceRequest? Please feel free to give > your suggestions.
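The fix direction discussed above (validate the resource up front instead of letting the NPE propagate and kill the NM dispatcher) can be sketched without the YARN classes. `LocalResourceCheck` and its nested `LocalResource` are stand-ins invented for this example; the actual patch validates the real `LocalResource` record and its error type may differ.

```java
/** Minimal sketch of rejecting a local resource whose location is missing,
 *  so one bad request fails instead of the whole NodeManager. */
public class LocalResourceCheck {
    // Stand-in for org.apache.hadoop.yarn.api.records.LocalResource.
    static final class LocalResource {
        private final String resource; // null models the "no location" request
        LocalResource(String resource) { this.resource = resource; }
        String getResource() { return resource; }
    }

    static String toValidatedPath(LocalResource r) {
        if (r.getResource() == null) {
            // Fail just this container's request, not the dispatcher thread.
            throw new IllegalArgumentException("resource location is not set");
        }
        return r.getResource();
    }

    public static void main(String[] args) {
        try {
            toValidatedPath(new LocalResource(null));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected invalid request: " + e.getMessage());
        }
    }
}
```

The same check could live in both places the comment suggests: at ContainerLaunchContext submission time (client-visible error) and again defensively where the request object is built.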
[jira] [Commented] (YARN-6204) Set UncaughtExceptionHandler for event handling thread in AsyncDispatcher
[ https://issues.apache.org/jira/browse/YARN-6204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948259#comment-15948259 ] Hadoop QA commented on YARN-6204: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 43s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 12m 11s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 50s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 51s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 44s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 4s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 58s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 8s{color} | 
{color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 53s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 53s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 42s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 1 new + 52 unchanged - 0 fixed = 53 total (was 52) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 8s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 27s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 39m 50s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 95m 9s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:a9ad5d6 | | JIRA Issue | YARN-6204 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12861126/YARN-6204.001.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 30463f3ad681 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 6a5516c | | Default Java | 1.8.0_121 | | findbugs | v3.0.0 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/15430/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/15430/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-
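The idea in the YARN-6204 title — set an UncaughtExceptionHandler on the event-handling thread so an escaped RuntimeException is recorded rather than silently killing the thread — can be shown in a self-contained way. This is not the AsyncDispatcher code; the class, thread name, and method here are invented for the demo.

```java
public class DispatcherThreadDemo {
    /** Runs a task on a named thread with an UncaughtExceptionHandler
     *  installed, and returns whatever the handler recorded. */
    static String runWithHandler(Runnable task) {
        StringBuilder log = new StringBuilder();
        Thread eventThread = new Thread(task, "event-handler");
        eventThread.setUncaughtExceptionHandler((t, e) ->
            log.append(t.getName()).append(" died: ").append(e.getMessage()));
        eventThread.start();
        try {
            eventThread.join(); // handler runs before the thread terminates
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return log.toString();
    }

    public static void main(String[] args) {
        System.out.println(runWithHandler(() -> {
            throw new RuntimeException("boom in event handler");
        }));
    }
}
```

Without the handler, the exception would only reach the default handler's stderr stack trace; with it, the dispatcher owner can log and apply its exit-on-error policy.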
[jira] [Created] (YARN-6413) Decouple Yarn Registry API from ZK
Ellen Hui created YARN-6413: --- Summary: Decouple Yarn Registry API from ZK Key: YARN-6413 URL: https://issues.apache.org/jira/browse/YARN-6413 Project: Hadoop YARN Issue Type: Improvement Components: amrmproxy, api, resourcemanager Reporter: Ellen Hui Assignee: Jian He Right now the Yarn Registry API (defined in the RegistryOperations interface) is a very thin layer over Zookeeper. This jira proposes changing the interface to abstract away the implementation details so that we can write a FS-based implementation of the registry service, which will be used to support AMRMProxy HA. The new interface will use register/delete/resolve APIs instead of Zookeeper-specific operations like mknode. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
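A sketch of what a backend-neutral interface along the proposed register/delete/resolve lines might look like, with an in-memory implementation standing in for the future FS-based one. All names here are hypothetical, not the actual RegistryOperations replacement.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class RegistrySketch {
    /** Backend-neutral operations: no ZK notions like mknode leak through. */
    interface ServiceRegistry {
        void register(String key, String record);
        String resolve(String key); // null if nothing is registered
        void delete(String key);
    }

    /** In-memory stand-in; a ZK- or FS-backed class would implement the
     *  same interface without callers changing. */
    static class InMemoryRegistry implements ServiceRegistry {
        private final Map<String, String> records = new ConcurrentHashMap<>();
        public void register(String key, String record) { records.put(key, record); }
        public String resolve(String key) { return records.get(key); }
        public void delete(String key) { records.remove(key); }
    }

    public static void main(String[] args) {
        ServiceRegistry reg = new InMemoryRegistry();
        reg.register("/users/app-1/am", "host:port");
        System.out.println(reg.resolve("/users/app-1/am"));
        reg.delete("/users/app-1/am");
    }
}
```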
[jira] [Created] (YARN-6412) aux-services classpath not documented
Miklos Szegedi created YARN-6412: Summary: aux-services classpath not documented Key: YARN-6412 URL: https://issues.apache.org/jira/browse/YARN-6412 Project: Hadoop YARN Issue Type: Bug Reporter: Miklos Szegedi Priority: Minor YARN-4577 introduced two new configuration entries yarn.nodemanager.aux-services.%s.classpath and yarn.nodemanager.aux-services.%s.system-classes. These are not documented in hadoop-yarn-common/.../yarn-default.xml -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
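For reference, the two undocumented entries from YARN-4577 would be set in yarn-site.xml roughly as below. The values and the aux-service name (mapreduce_shuffle, substituted for the %s) are illustrative placeholders, not the missing yarn-default.xml text.

```xml
<!-- Hypothetical example; %s in the property names is replaced by the
     aux-service name. -->
<property>
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.classpath</name>
  <value>/path/to/aux/service/jars/*</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.system-classes</name>
  <value>java.,javax.</value>
</property>
```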
[jira] [Commented] (YARN-6202) Configuration item Dispatcher.DISPATCHER_EXIT_ON_ERROR_KEY is disregarded
[ https://issues.apache.org/jira/browse/YARN-6202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948153#comment-15948153 ] Hadoop QA commented on YARN-6202: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 6 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 33s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 46s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 19m 2s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 13s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 52s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 9s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 31s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 19s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | 
{color:green} 2m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 13m 6s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 13m 6s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 2m 1s{color} | {color:orange} root: The patch generated 2 new + 313 unchanged - 5 fixed = 315 total (was 318) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 1s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 50s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 6m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 28s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 37s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 39s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 45s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 6s{color} | {color:green} hadoop-mapreduce-client-app in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 16s{color} | {color:green} hadoop-mapreduce-client-hs in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 41s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}124m 28s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:a9ad5d6 | | JIRA Issue | YARN-6202 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12861109/YARN-6202.001.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 5886c5f6133c 3.13.0-107-generic #154-Ubuntu SMP Tue Dec 20 09:57:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 4966a6e | | Default Java | 1.8.0_121 | | findbugs | v3.0.0 | | checkstyle
[jira] [Commented] (YARN-6342) Make TimelineV2Client's drain timeout after stop configurable
[ https://issues.apache.org/jira/browse/YARN-6342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948151#comment-15948151 ] Hadoop QA commented on YARN-6342: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 25s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 22s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 52s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 56s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 12s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 35s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 44s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 3s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 37s{color} 
| {color:blue} Maven dependency ordering for patch {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 24s{color} | {color:red} hadoop-yarn-api in the patch failed. {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 25s{color} | {color:red} hadoop-yarn-common in the patch failed. {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 9m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 30s{color} | {color:red} hadoop-yarn-common in the patch failed. {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 20s{color} | {color:red} hadoop-yarn-common in the patch failed. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 57s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 32s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 27s{color} | {color:red} hadoop-yarn-common in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 31s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 62m 28s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:a9ad5d6 | | JIRA Issue | YARN-6342 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12861115/YARN-6342.01.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle xml | | uname | Linux 32955eff96c3 3.13.0-103-generic #150-Ubuntu SMP Thu Nov 24 10:34:17 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 6a5516c | | Default Java | 1.8.0_121 | | findbugs | v3.0.0 | | mvninstall | https://builds.apache.org/job/PreCommit-YARN-Build/15428/artifact/patchprocess/patch-mvnin
[jira] [Commented] (YARN-6363) Extending SLS: Synthetic Load Generator
[ https://issues.apache.org/jira/browse/YARN-6363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948147#comment-15948147 ] Hadoop QA commented on YARN-6363: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 9 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 19s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 12s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 49s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 56s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 34s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 12s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 33s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 20s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 17s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | 
{color:green} 1m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 14m 25s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 14m 25s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 2m 3s{color} | {color:orange} root: The patch generated 155 new + 270 unchanged - 21 fixed = 425 total (was 291) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 48s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 25s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} shellcheck {color} | {color:green} 0m 20s{color} | {color:green} There were no new shellcheck issues. {color} | | {color:green}+1{color} | {color:green} shelldocs {color} | {color:green} 0m 24s{color} | {color:green} There were no new shelldocs issues. {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch 1 line(s) with tabs. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 6s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 51s{color} | {color:red} hadoop-tools/hadoop-sls generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 27s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 40m 43s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 21s{color} | {color:green} hadoop-rumen in the patch passed. 
{color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 32s{color} | {color:red} hadoop-sls in the patch failed. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 37s{color} | {color:red} The patch generated 1 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}133m 35s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-tools/hadoop-sls | | | org.apache.hadoop.yarn.sls.SLSRunner.run(String[]) invokes System.exit(...), which shuts down the entire virtual machine At SLSRunner.java:down the entire virtual machine At SLSRunner.java:[line 744] | | Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerDynamicBehavior | | | hadoop.yarn.server.resourcemanager.reservation.TestReservationSystem | | | hadoop.yarn.server.
[jira] [Created] (YARN-6411) Clean up the overwrite of createDispatcher() in subclass of MockRM
Yufei Gu created YARN-6411: -- Summary: Clean up the overwrite of createDispatcher() in subclass of MockRM Key: YARN-6411 URL: https://issues.apache.org/jira/browse/YARN-6411 Project: Hadoop YARN Issue Type: Task Components: resourcemanager Affects Versions: 3.0.0-alpha2, 2.9.0 Reporter: Yufei Gu Assignee: Yufei Gu Priority: Minor MockRM creates an object of {{DrainDispatcher}} in YARN-3102. We don't need to do the same thing in its subclasses. {code} @Override protected Dispatcher createDispatcher() { return new DrainDispatcher(); } {code}
[jira] [Commented] (YARN-6204) Set UncaughtExceptionHandler for event handling thread in AsyncDispatcher
[ https://issues.apache.org/jira/browse/YARN-6204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948139#comment-15948139 ] Yufei Gu commented on YARN-6204: Uploaded patch 001. Make {{RMCriticalThreadUncaughtExceptionHandler}} inherit from {{YarnUncaughtExceptionHandler}}, so that {{AsyncDispatcher}} doesn't need to deal with {{RMCriticalThreadUncaughtExceptionHandler}} directly. > Set UncaughtExceptionHandler for event handling thread in AsyncDispatcher > - > > Key: YARN-6204 > URL: https://issues.apache.org/jira/browse/YARN-6204 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.9.0, 3.0.0-alpha2 >Reporter: Yufei Gu >Assignee: Yufei Gu > Attachments: YARN-6204.001.patch > > > The event handling thread in AsyncDispatcher is a critical thread in RM. We > should set UncaughtExceptionHandler introduced in YARN-6061 for it.
[jira] [Updated] (YARN-6204) Set UncaughtExceptionHandler for event handling thread in AsyncDispatcher
[ https://issues.apache.org/jira/browse/YARN-6204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu updated YARN-6204: --- Attachment: YARN-6204.001.patch > Set UncaughtExceptionHandler for event handling thread in AsyncDispatcher > - > > Key: YARN-6204 > URL: https://issues.apache.org/jira/browse/YARN-6204 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.9.0, 3.0.0-alpha2 >Reporter: Yufei Gu >Assignee: Yufei Gu > Attachments: YARN-6204.001.patch > > > The event handling thread in AsyncDispatcher is a critical thread in RM. We > should set UncaughtExceptionHandler introduced in YARN-6061 for it.
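The handler wiring discussed in YARN-6204 can be sketched as follows. The class names mirror those mentioned in the comments, but the bodies are assumptions for illustration, not the actual Hadoop code: a base handler that logs, a "critical thread" subclass that additionally escalates, and installation on the event-handling thread so the RM notices if that thread dies.

```java
// Illustrative sketch only: a base uncaught-exception handler and a
// critical-thread subclass, installed on an event-handling thread.
public class HandlerSketch {
    static class BaseUncaughtExceptionHandler implements Thread.UncaughtExceptionHandler {
        public void uncaughtException(Thread t, Throwable e) {
            // The real YarnUncaughtExceptionHandler logs; we do the same here.
            System.err.println("Thread " + t.getName() + " died: " + e);
        }
    }

    static volatile boolean criticalFailure = false;

    static class CriticalThreadHandler extends BaseUncaughtExceptionHandler {
        @Override
        public void uncaughtException(Thread t, Throwable e) {
            super.uncaughtException(t, e);
            // Real RM code would fail fast / transition to standby here.
            criticalFailure = true;
        }
    }

    // Start a thread that throws, wait for it, report whether the handler ran.
    static boolean simulateCriticalFailure() {
        criticalFailure = false;
        Thread eventHandler = new Thread(
                () -> { throw new RuntimeException("boom"); }, "event-handler");
        eventHandler.setUncaughtExceptionHandler(new CriticalThreadHandler());
        eventHandler.start();
        try {
            eventHandler.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return criticalFailure;
    }

    public static void main(String[] args) {
        System.out.println("criticalFailure = " + simulateCriticalFailure());
        // prints "criticalFailure = true"
    }
}
```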
[jira] [Commented] (YARN-5797) Add metrics to the node manager for cleaning the PUBLIC and PRIVATE caches
[ https://issues.apache.org/jira/browse/YARN-5797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948131#comment-15948131 ] Hadoop QA commented on YARN-5797: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 25s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 5 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 36s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 23s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 27s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 43s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s{color} | 
{color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 18s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 1 new + 289 unchanged - 7 fixed = 290 total (was 296) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 48s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 12m 55s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 33m 56s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:a9ad5d6 | | JIRA Issue | YARN-5797 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12843130/YARN-5797-trunk.002.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 94aa661c8faf 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 6a5516c | | Default Java | 1.8.0_121 | | findbugs | v3.0.0 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/15429/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/15429/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/15429/console | | Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Add metrics to the node manager for cleaning the PUBLIC and PRIVATE caches > -- > >
[jira] [Commented] (YARN-5654) Not be able to run SLS with FairScheduler
[ https://issues.apache.org/jira/browse/YARN-5654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948112#comment-15948112 ] Yufei Gu commented on YARN-5654: Great! Thanks [~rkanter] for the review and commit! > Not be able to run SLS with FairScheduler > - > > Key: YARN-5654 > URL: https://issues.apache.org/jira/browse/YARN-5654 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Yufei Gu > Fix For: 3.0.0-alpha3 > > Attachments: YARN-5654.002.patch, YARN-5654.003.patch, > YARN-5654.1.patch > > > With the config: > https://github.com/leftnoteasy/yarn_application_synthesizer/tree/master/configs/hadoop-conf-fs > And data: > https://github.com/leftnoteasy/yarn_application_synthesizer/tree/master/data/scheduler-load-test-data > Capacity Scheduler runs fine, but Fair Scheduler cannot be successfully run. > It reports NPE from RMAppAttemptImpl
[jira] [Commented] (YARN-5654) Not be able to run SLS with FairScheduler
[ https://issues.apache.org/jira/browse/YARN-5654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948110#comment-15948110 ] Hudson commented on YARN-5654: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11493 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/11493/]) YARN-5654. Not be able to run SLS with FairScheduler (yufeigu via (rkanter: rev 6a5516c2381f107d96b8326939514de3c6e53d3d) * (edit) hadoop-tools/hadoop-sls/src/test/java/org/apache/hadoop/yarn/sls/appmaster/TestAMSimulator.java * (edit) hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/SLSCapacityScheduler.java * (edit) hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/SLSRunner.java * (edit) hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/utils/SLSUtils.java * (edit) hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/web/SLSWebApp.java * (add) hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/Tracker.java * (edit) hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/SchedulerWrapper.java * (edit) hadoop-tools/hadoop-sls/src/test/java/org/apache/hadoop/yarn/sls/nodemanager/TestNMSimulator.java * (edit) hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/AMSimulator.java * (add) hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/SLSFairScheduler.java * (delete) hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/ResourceSchedulerWrapper.java * (edit) hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/SchedulerMetrics.java > Not be able to run SLS with FairScheduler > - > > Key: YARN-5654 > URL: https://issues.apache.org/jira/browse/YARN-5654 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Yufei Gu > Fix For: 3.0.0-alpha3 > > Attachments: YARN-5654.002.patch, YARN-5654.003.patch, > YARN-5654.1.patch > > > 
With the config: > https://github.com/leftnoteasy/yarn_application_synthesizer/tree/master/configs/hadoop-conf-fs > And data: > https://github.com/leftnoteasy/yarn_application_synthesizer/tree/master/data/scheduler-load-test-data > Capacity Scheduler runs fine, but Fair Scheduler cannot be successfully run. > It reports NPE from RMAppAttemptImpl
[jira] [Commented] (YARN-6004) Refactor TestResourceLocalizationService#testDownloadingResourcesOnContainer so that it is less than 150 lines
[ https://issues.apache.org/jira/browse/YARN-6004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948109#comment-15948109 ] Hadoop QA commented on YARN-6004: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 12m 1s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 16s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 23s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 37s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 22s{color} | 
{color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 16s{color} | {color:green} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 0 new + 88 unchanged - 7 fixed = 88 total (was 95) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 13m 3s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 14s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 31m 38s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:a9ad5d6 | | JIRA Issue | YARN-6004 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12861114/YARN-6004-trunk.002.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux cc8cc14f8ee8 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 4966a6e | | Default Java | 1.8.0_121 | | findbugs | v3.0.0 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/15427/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/15427/console | | Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Refactor TestResourceLocalizationService#testDownloadingResourcesOnContainer > so that it is less than 150 lines > -- > > Key: YARN-6004 > URL: https://issues.apache.org/jira/browse/YARN-6004 > Project: Hadoop YARN
[jira] [Assigned] (YARN-6356) Allow different values of yarn.log-aggregation.retain-seconds for succeeded and failed jobs
[ https://issues.apache.org/jira/browse/YARN-6356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen reassigned YARN-6356: Assignee: Haibo Chen > Allow different values of yarn.log-aggregation.retain-seconds for succeeded > and failed jobs > --- > > Key: YARN-6356 > URL: https://issues.apache.org/jira/browse/YARN-6356 > Project: Hadoop YARN > Issue Type: Improvement > Components: log-aggregation >Reporter: Robert Kanter >Assignee: Haibo Chen > > It would be useful to have a value of {{yarn.log-aggregation.retain-seconds}} > for succeeded jobs and a different value for failed/killed jobs. For jobs > that succeeded, you typically don't care about the logs, so a shorter > retention time is fine (and saves space/blocks in HDFS). For jobs that > failed or were killed, the logs are much more important, and it's likely to > want to keep them around for longer so you have time to look at them. > For instance, you could set it to keep logs for succeeded jobs for 1 day and > logs for failed/killed jobs for 1 week.
[jira] [Commented] (YARN-5654) Not be able to run SLS with FairScheduler
[ https://issues.apache.org/jira/browse/YARN-5654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948070#comment-15948070 ] Robert Kanter commented on YARN-5654: - +1 > Not be able to run SLS with FairScheduler > - > > Key: YARN-5654 > URL: https://issues.apache.org/jira/browse/YARN-5654 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Yufei Gu > Attachments: YARN-5654.002.patch, YARN-5654.003.patch, > YARN-5654.1.patch > > > With the config: > https://github.com/leftnoteasy/yarn_application_synthesizer/tree/master/configs/hadoop-conf-fs > And data: > https://github.com/leftnoteasy/yarn_application_synthesizer/tree/master/data/scheduler-load-test-data > Capacity Scheduler runs fine, but Fair Scheduler cannot be successfully run. > It reports NPE from RMAppAttemptImpl -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5797) Add metrics to the node manager for cleaning the PUBLIC and PRIVATE caches
[ https://issues.apache.org/jira/browse/YARN-5797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948062#comment-15948062 ] Chris Trezzo commented on YARN-5797: Patch is now available for YARN-6004 as well. Thanks! > Add metrics to the node manager for cleaning the PUBLIC and PRIVATE caches > -- > > Key: YARN-5797 > URL: https://issues.apache.org/jira/browse/YARN-5797 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Chris Trezzo >Assignee: Chris Trezzo > Attachments: YARN-5797-trunk.002.patch, YARN-5797-trunk-v1.patch > > > Add new metrics to the node manager around the local cache sizes and how much > is being cleaned from them on a regular bases. For example, we can expose > information contained in the {{LocalCacheCleanerStats}} class. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6342) Make TimelineV2Client's drain timeout after stop configurable
[ https://issues.apache.org/jira/browse/YARN-6342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated YARN-6342: - Attachment: YARN-6342.01.patch > Make TimelineV2Client's drain timeout after stop configurable > - > > Key: YARN-6342 > URL: https://issues.apache.org/jira/browse/YARN-6342 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Haibo Chen > Attachments: YARN-6342.00.patch, YARN-6342.01.patch > > > Found these with [~rohithsharma] while browsing the code > - In stop: it calls shutdownNow, which doesn't wait for pending tasks; should > it use shutdown instead? > {code} > public void stop() { > LOG.info("Stopping TimelineClient."); > executor.shutdownNow(); > try { > executor.awaitTermination(DRAIN_TIME_PERIOD, TimeUnit.MILLISECONDS); > } catch (InterruptedException e) { > {code} > - In TimelineClientImpl#createRunnable: > If any exception happens when publishing one entity > (publishWithoutBlockingOnQueue), the thread exits. I think it should make a > best effort to continue publishing the timeline entities; one failure should > not prevent all follow-up entities from being published.
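The shutdown-vs-shutdownNow concern quoted above can be illustrated with a minimal, self-contained sketch using plain java.util.concurrent (not the actual TimelineV2Client code): shutdown() stops accepting new work but drains queued tasks up to a timeout, whereas shutdownNow() would discard them. The drain timeout here stands in for the configurable drain period this issue proposes:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the drain-on-stop pattern: let queued tasks finish within a
// bounded drain period, then force-stop whatever remains.
public class DrainOnStop {
    public static int runAndDrain(int tasks, long drainMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < tasks; i++) {
            executor.execute(done::incrementAndGet);
        }
        executor.shutdown(); // no new work, but queued tasks still run
        try {
            if (!executor.awaitTermination(drainMillis, TimeUnit.MILLISECONDS)) {
                executor.shutdownNow(); // drain period expired; give up
            }
        } catch (InterruptedException e) {
            executor.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return done.get();
    }

    public static void main(String[] args) {
        // All 100 trivial tasks complete within the 5-second drain period.
        System.out.println(runAndDrain(100, 5000)); // 100
    }
}
```

Had the sketch called shutdownNow() first, as the quoted stop() does, queued entities could be dropped before the awaitTermination call ever ran.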
[jira] [Commented] (YARN-6354) LeveldbRMStateStore can parse invalid keys when recovering reservations
[ https://issues.apache.org/jira/browse/YARN-6354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948050#comment-15948050 ] Hadoop QA commented on YARN-6354: - (x) -1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 19s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| +1 | mvninstall | 14m 31s | trunk passed |
| +1 | compile | 0m 32s | trunk passed |
| +1 | checkstyle | 0m 26s | trunk passed |
| +1 | mvnsite | 0m 35s | trunk passed |
| +1 | mvneclipse | 0m 14s | trunk passed |
| +1 | findbugs | 1m 3s | trunk passed |
| +1 | javadoc | 0m 22s | trunk passed |
| +1 | mvninstall | 0m 30s | the patch passed |
| +1 | compile | 0m 30s | the patch passed |
| +1 | javac | 0m 30s | the patch passed |
| +1 | checkstyle | 0m 22s | the patch passed |
| +1 | mvnsite | 0m 31s | the patch passed |
| +1 | mvneclipse | 0m 12s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | findbugs | 1m 5s | the patch passed |
| +1 | javadoc | 0m 18s | the patch passed |
| -1 | unit | 39m 8s | hadoop-yarn-server-resourcemanager in the patch failed. |
| +1 | asflicense | 0m 18s | The patch does not generate ASF License warnings. |
| | | 62m 16s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.resourcemanager.TestRMRestart |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:a9ad5d6 |
| JIRA Issue | YARN-6354 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12861105/YARN-6354.001.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux 4fb5e828bdd0 3.13.0-103-generic #150-Ubuntu SMP Thu Nov 24 10:34:17 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 4966a6e |
| Default Java | 1.8.0_121 |
| findbugs | v3.0.0 |
| unit | https://builds.apache.org/job/PreCommit-YARN-Build/15424/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/15424/testReport/ |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/15424/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated. > LeveldbRMStateStore can parse invalid keys when recovering reservations > --- > > Key: YARN-6354 > URL: https://iss
[jira] [Updated] (YARN-6342) Make TimelineV2Client's drain timeout after stop configurable
[ https://issues.apache.org/jira/browse/YARN-6342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated YARN-6342: - Summary: Make TimelineV2Client's drain timeout after stop configurable (was: Make TimelineV2Client's drain period after stop configurable) > Make TimelineV2Client's drain timeout after stop configurable > - > > Key: YARN-6342 > URL: https://issues.apache.org/jira/browse/YARN-6342 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Haibo Chen > Attachments: YARN-6342.00.patch > > > Found these with [~rohithsharma] while browsing the code > - In stop: it calls shutdownNow, which doesn't wait for pending tasks; should > it use shutdown instead? > {code} > public void stop() { > LOG.info("Stopping TimelineClient."); > executor.shutdownNow(); > try { > executor.awaitTermination(DRAIN_TIME_PERIOD, TimeUnit.MILLISECONDS); > } catch (InterruptedException e) { > {code} > - In TimelineClientImpl#createRunnable: > If any exception happens when publishing one entity > (publishWithoutBlockingOnQueue), the thread exits. I think it should make a > best effort to continue publishing the timeline entities; one failure should > not prevent all follow-up entities from being published.
[jira] [Updated] (YARN-6004) Refactor TestResourceLocalizationService#testDownloadingResourcesOnContainer so that it is less than 150 lines
[ https://issues.apache.org/jira/browse/YARN-6004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo updated YARN-6004: --- Attachment: YARN-6004-trunk.002.patch Attached is v2 for trunk to address checkstyle issues. > Refactor TestResourceLocalizationService#testDownloadingResourcesOnContainer > so that it is less than 150 lines > -- > > Key: YARN-6004 > URL: https://issues.apache.org/jira/browse/YARN-6004 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Reporter: Chris Trezzo >Assignee: Chris Trezzo >Priority: Trivial > Labels: newbie > Attachments: YARN-6004-trunk.001.patch, YARN-6004-trunk.002.patch > > > The TestResourceLocalizationService#testDownloadingResourcesOnContainerKill > method is over 150 lines: > bq. > ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java:1128: > @Test(timeout = 2):3: Method length is 242 lines (max allowed is 150). > This method needs to be refactored and broken up into smaller methods. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6342) Make TimelineV2Client's drain period after stop configurable
[ https://issues.apache.org/jira/browse/YARN-6342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948024#comment-15948024 ] Haibo Chen commented on YARN-6342: -- Thanks for your review, [~varun_saxena]! bq. We need to add this configuration in yarn-default.xml I was under the impression that this would be an advanced configuration that people rarely change, so I did not add it to yarn-default.xml to advertise it. I'll address this along with the rest of your comments. > Make TimelineV2Client's drain period after stop configurable > > > Key: YARN-6342 > URL: https://issues.apache.org/jira/browse/YARN-6342 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Haibo Chen > Attachments: YARN-6342.00.patch > > > Found these with [~rohithsharma] while browsing the code > - In stop: it calls shutdownNow, which doesn't wait for pending tasks; should > it use shutdown instead? > {code} > public void stop() { > LOG.info("Stopping TimelineClient."); > executor.shutdownNow(); > try { > executor.awaitTermination(DRAIN_TIME_PERIOD, TimeUnit.MILLISECONDS); > } catch (InterruptedException e) { > {code} > - In TimelineClientImpl#createRunnable: > If any exception happens when publishing one entity > (publishWithoutBlockingOnQueue), the thread exits. I think it should make a > best effort to continue publishing the timeline entities; one failure should > not prevent all follow-up entities from being published.
[jira] [Commented] (YARN-6363) Extending SLS: Synthetic Load Generator
[ https://issues.apache.org/jira/browse/YARN-6363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948016#comment-15948016 ] Subru Krishnan commented on YARN-6363: -- Thanks [~curino] for working on this really useful extension to SLS. Can you kindly move the documentation to the SLS md to ensure it gets published to the apache hadoop wiki. > Extending SLS: Synthetic Load Generator > --- > > Key: YARN-6363 > URL: https://issues.apache.org/jira/browse/YARN-6363 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Carlo Curino >Assignee: Carlo Curino > Attachments: YARN-6363 overview.pdf, YARN-6363.v0.patch, > YARN-6363.v1.patch, YARN-6363.v2.patch, YARN-6363.v3.patch, YARN-6363.v4.patch > > > This JIRA tracks the introduction of a synthetic load generator in the SLS. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-5106) Provide a builder interface for FairScheduler allocations for use in tests
[ https://issues.apache.org/jira/browse/YARN-5106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu reassigned YARN-5106: -- Assignee: (was: Yufei Gu) > Provide a builder interface for FairScheduler allocations for use in tests > -- > > Key: YARN-5106 > URL: https://issues.apache.org/jira/browse/YARN-5106 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 2.8.0 >Reporter: Karthik Kambatla > Labels: newbie++ > > Most, if not all, fair scheduler tests create an allocations XML file. Having > a helper class that potentially uses a builder would make the tests cleaner. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6202) Configuration item Dispatcher.DISPATCHER_EXIT_ON_ERROR_KEY is disregarded
[ https://issues.apache.org/jira/browse/YARN-6202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu updated YARN-6202: --- Description: Dispatcher.DISPATCHER_EXIT_ON_ERROR_KEY (yarn.dispatcher.exit-on-error) is always true, no matter what value is set in the configuration files. This misleads users. Two solutions: # Remove the configuration item and provide a method that allows {{exitOnDispatchException}}/{{shouldExitOnError}} to be false to enable related unit tests. There is no need for a false value in a real daemon, since a daemon should crash if its dispatcher quits. # Make it default to true instead of false, so that we don't need to hard-code it to true in the RM and NM; it remains configurable, and we also provide a method to enable related unit tests. Other than that, the code around it needs refactoring. {{public static final}} on an interface field isn't necessary, and YARN-related configuration items should live in the YarnConfiguration class. was: Dispatcher.DISPATCHER_EXIT_ON_ERROR_KEY (yarn.dispatcher.exit-on-error) is always true, no matter what value is set in the configuration files. This misleads users. Two solutions: # Remove the configuration item and provide a method that allows {{exitOnDispatchException}}/{{shouldExitOnError}} to be false to enable related unit tests; the assumption is that there is no need for a false value in a real daemon, which is valid according to MAPREDUCE-3634, which introduced the configuration item. # Make it default to true instead of false, so that we don't need to hard-code it to true in the RM and NM; it remains configurable, and we also provide a method to enable related unit tests. Other than that, the code around it needs refactoring. {{public static final}} on an interface field isn't necessary, and YARN-related configuration items should live in the YarnConfiguration class. 
> Configuration item Dispatcher.DISPATCHER_EXIT_ON_ERROR_KEY is disregarded > - > > Key: YARN-6202 > URL: https://issues.apache.org/jira/browse/YARN-6202 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager, resourcemanager >Affects Versions: 2.9.0, 3.0.0-alpha2 >Reporter: Yufei Gu >Assignee: Yufei Gu > Attachments: YARN-6202.001.patch > > > Dispatcher.DISPATCHER_EXIT_ON_ERROR_KEY (yarn.dispatcher.exit-on-error) > always be true no matter what value in configuration files. This misleads > users. Two solutions: > # Remove the configuration item and provide a method to allow > {{exitOnDispatchException}}/{{shouldExitOnError}} to be false to enable > related unit tests. There is no need for false value in a real daemon since > daemons should crash if its dispatcher quit. > # Make it default true instead of false, so that we don't need to hard code > it to be true in RM and NM, it is still configurable, and also provide method > to enable related unit tests. > Other than that, the code around it needs to refactor. {{public static > final}} for a variable of interface isn't necessary, and YARN related > configure item should be in class YarnConfiguration. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6202) Configuration item Dispatcher.DISPATCHER_EXIT_ON_ERROR_KEY is disregarded
[ https://issues.apache.org/jira/browse/YARN-6202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948001#comment-15948001 ] Yufei Gu commented on YARN-6202: Uploaded patch v1. It removes the configuration item Dispatcher.DISPATCHER_EXIT_ON_ERROR_KEY and adds methods to {{AsyncDispatcher}} and {{EventDispatcher}} to disable exit on exception/error, which are invoked by unit tests. > Configuration item Dispatcher.DISPATCHER_EXIT_ON_ERROR_KEY is disregarded > - > > Key: YARN-6202 > URL: https://issues.apache.org/jira/browse/YARN-6202 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager, resourcemanager >Affects Versions: 2.9.0, 3.0.0-alpha2 >Reporter: Yufei Gu >Assignee: Yufei Gu > Attachments: YARN-6202.001.patch > > > Dispatcher.DISPATCHER_EXIT_ON_ERROR_KEY (yarn.dispatcher.exit-on-error) > is always true, no matter what value is set in the configuration files. This > misleads users. Two solutions: > # Remove the configuration item and provide a method that allows > {{exitOnDispatchException}}/{{shouldExitOnError}} to be false to enable > related unit tests; the assumption is that there is no need for a false value > in a real daemon, which is valid according to MAPREDUCE-3634, which > introduced the configuration item. > # Make it default to true instead of false, so that we don't need to > hard-code it to true in the RM and NM; it remains configurable, and we also > provide a method to enable related unit tests. > Other than that, the code around it needs refactoring. {{public static > final}} on an interface field isn't necessary, and YARN-related > configuration items should live in the YarnConfiguration class.
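Solution 1 from the description can be sketched with a toy class rather than YARN's real AsyncDispatcher (the class and method names below are illustrative): the exit-on-error flag is hard-wired to true, and only a test-visible method can disable it, so no configuration key is needed:

```java
// Hypothetical sketch of "remove the config key, add a test-only hook".
// MiniDispatcher is a stand-in, not YARN's actual AsyncDispatcher.
public class MiniDispatcher {
    private boolean exitOnError = true; // always true in a real daemon

    // Visible for testing only: unit tests call this so a bad event
    // doesn't terminate the JVM running the test.
    public void disableExitOnError() {
        exitOnError = false;
    }

    // Returns "exit" or "log" instead of actually calling System.exit,
    // so the decision is observable in this sketch.
    public String onDispatchError(Throwable t) {
        return exitOnError ? "exit" : "log";
    }

    public static void main(String[] args) {
        MiniDispatcher d = new MiniDispatcher();
        System.out.println(d.onDispatchError(new RuntimeException())); // exit
        d.disableExitOnError();
        System.out.println(d.onDispatchError(new RuntimeException())); // log
    }
}
```

The design point is that the daemon-facing behavior stays non-configurable (crash on dispatcher error), while tests get an explicit escape hatch instead of a misleading configuration key.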
[jira] [Updated] (YARN-6202) Configuration item Dispatcher.DISPATCHER_EXIT_ON_ERROR_KEY is disregarded
[ https://issues.apache.org/jira/browse/YARN-6202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu updated YARN-6202: --- Attachment: YARN-6202.001.patch > Configuration item Dispatcher.DISPATCHER_EXIT_ON_ERROR_KEY is disregarded > - > > Key: YARN-6202 > URL: https://issues.apache.org/jira/browse/YARN-6202 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager, resourcemanager >Affects Versions: 2.9.0, 3.0.0-alpha2 >Reporter: Yufei Gu >Assignee: Yufei Gu > Attachments: YARN-6202.001.patch > > > Dispatcher.DISPATCHER_EXIT_ON_ERROR_KEY (yarn.dispatcher.exit-on-error) > always be true no matter what value in configuration files. This misleads > users. Two solutions: > # Remove the configuration item and provide a method to allow > {{exitOnDispatchException}}/{{shouldExitOnError}} to be false to enable > related unit tests, the assumption is there is no need for false value in a > real daemon, this is valid according to MAPREDUCE-3634 which introduce the > configuration item. > # Make it default true instead of false, so that we don't need to hard code > it to be true in RM and NM, it is still configurable, and also provide method > to enable related unit tests. > Other than that, the code around it needs to refactor. {{public static > final}} for a variable of interface isn't necessary, and YARN related > configure item should be in class YarnConfiguration. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6363) Extending SLS: Synthetic Load Generator
[ https://issues.apache.org/jira/browse/YARN-6363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947980#comment-15947980 ] Carlo Curino commented on YARN-6363: v4 of the patch contains corrected command-line scripts to run the SLS, plus a few fixes. During a long run, we noticed the NPE mentioned in YARN-6408 (not sure whether it is due to SLS or occurs in normal operation as well; [~sunilg] is investigating). > Extending SLS: Synthetic Load Generator > --- > > Key: YARN-6363 > URL: https://issues.apache.org/jira/browse/YARN-6363 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Carlo Curino >Assignee: Carlo Curino > Attachments: YARN-6363 overview.pdf, YARN-6363.v0.patch, > YARN-6363.v1.patch, YARN-6363.v2.patch, YARN-6363.v3.patch, YARN-6363.v4.patch > > > This JIRA tracks the introduction of a synthetic load generator in the SLS.
[jira] [Updated] (YARN-6363) Extending SLS: Synthetic Load Generator
[ https://issues.apache.org/jira/browse/YARN-6363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-6363: --- Attachment: YARN-6363.v4.patch > Extending SLS: Synthetic Load Generator > --- > > Key: YARN-6363 > URL: https://issues.apache.org/jira/browse/YARN-6363 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Carlo Curino >Assignee: Carlo Curino > Attachments: YARN-6363 overview.pdf, YARN-6363.v0.patch, > YARN-6363.v1.patch, YARN-6363.v2.patch, YARN-6363.v3.patch, YARN-6363.v4.patch > > > This JIRA tracks the introduction of a synthetic load generator in the SLS. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6363) Extending SLS: Synthetic Load Generator
[ https://issues.apache.org/jira/browse/YARN-6363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-6363: --- Attachment: YARN-6363 overview.pdf Minor fix in doc (commandline param) > Extending SLS: Synthetic Load Generator > --- > > Key: YARN-6363 > URL: https://issues.apache.org/jira/browse/YARN-6363 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Carlo Curino >Assignee: Carlo Curino > Attachments: YARN-6363 overview.pdf, YARN-6363.v0.patch, > YARN-6363.v1.patch, YARN-6363.v2.patch, YARN-6363.v3.patch > > > This JIRA tracks the introduction of a synthetic load generator in the SLS. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6363) Extending SLS: Synthetic Load Generator
[ https://issues.apache.org/jira/browse/YARN-6363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-6363: --- Attachment: (was: YARN-6363 overview.pdf) > Extending SLS: Synthetic Load Generator > --- > > Key: YARN-6363 > URL: https://issues.apache.org/jira/browse/YARN-6363 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Carlo Curino >Assignee: Carlo Curino > Attachments: YARN-6363 overview.pdf, YARN-6363.v0.patch, > YARN-6363.v1.patch, YARN-6363.v2.patch, YARN-6363.v3.patch > > > This JIRA tracks the introduction of a synthetic load generator in the SLS. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6354) LeveldbRMStateStore can parse invalid keys when recovering reservations
[ https://issues.apache.org/jira/browse/YARN-6354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-6354: - Attachment: YARN-6354.001.patch Patch that adds a termination check for the reservation key traversal loop and a unit test. > LeveldbRMStateStore can parse invalid keys when recovering reservations > --- > > Key: YARN-6354 > URL: https://issues.apache.org/jira/browse/YARN-6354 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.8.0 >Reporter: Jason Lowe > Attachments: YARN-6354.001.patch > > > When trying to upgrade an RM to 2.8 it fails with a > StringIndexOutOfBoundsException trying to load reservation state. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
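The termination check described in the patch summary can be sketched as follows, with a sorted TreeMap standing in for the leveldb key space (the key layout here is illustrative, not the store's actual schema). Because keys are sorted, the reservation keys are contiguous, so iteration must stop at the first key outside the prefix; otherwise the loop would try to parse unrelated keys, which is how a malformed key can trigger a StringIndexOutOfBoundsException:

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch of prefix-bounded iteration over a sorted key space.
public class PrefixScanSketch {
    public static int countKeysWithPrefix(TreeMap<String, String> store, String prefix) {
        int count = 0;
        // tailMap starts at the first key >= prefix; sorted order means all
        // matching keys are contiguous from there.
        for (Map.Entry<String, String> e : store.tailMap(prefix).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // termination check: we left the reservation key range
            }
            count++; // only keys under the prefix are ever parsed
        }
        return count;
    }

    public static void main(String[] args) {
        TreeMap<String, String> store = new TreeMap<>();
        store.put("ReservationSystemRoot/plan1", "a");
        store.put("ReservationSystemRoot/plan2", "b");
        store.put("ZKRoot/epoch", "c"); // different subtree; must not be parsed
        System.out.println(countKeysWithPrefix(store, "ReservationSystemRoot/")); // 2
    }
}
```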
[jira] [Assigned] (YARN-6354) LeveldbRMStateStore can parse invalid keys when recovering reservations
[ https://issues.apache.org/jira/browse/YARN-6354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe reassigned YARN-6354: Assignee: Jason Lowe > LeveldbRMStateStore can parse invalid keys when recovering reservations > --- > > Key: YARN-6354 > URL: https://issues.apache.org/jira/browse/YARN-6354 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.8.0 >Reporter: Jason Lowe >Assignee: Jason Lowe > Attachments: YARN-6354.001.patch > > > When trying to upgrade an RM to 2.8 it fails with a > StringIndexOutOfBoundsException trying to load reservation state. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5301) NM mount cpu cgroups failed on some systems
[ https://issues.apache.org/jira/browse/YARN-5301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947957#comment-15947957 ] Miklos Szegedi commented on YARN-5301: -- [~bibinchundatt], do you have time to take a look? > NM mount cpu cgroups failed on some systems > --- > > Key: YARN-5301 > URL: https://issues.apache.org/jira/browse/YARN-5301 > Project: Hadoop YARN > Issue Type: Bug >Reporter: sandflee >Assignee: Miklos Szegedi > Attachments: YARN-5301.000.patch, YARN-5301.001.patch > > > on ubuntu with linux kernel 3.19, , NM start failed if enable auto mount > cgroup. try command: > ./bin/container-executor --mount-cgroups yarn-hadoop cpu=/cgroup/cpufail > ./bin/container-executor --mount-cgroups yarn-hadoop cpu,cpuacct=/cgroup/cpu > succ -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6407) Improve and fix locks of RM scheduler
[ https://issues.apache.org/jira/browse/YARN-6407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947955#comment-15947955 ] Karthik Kambatla commented on YARN-6407: On large clusters, we have been recommending that our customers turn off continuous scheduling. Are you sure you need continuous scheduling? > Improve and fix locks of RM scheduler > - > > Key: YARN-6407 > URL: https://issues.apache.org/jira/browse/YARN-6407 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.7.1 > Environment: CentOS 7, 1 Gigabit Ethernet >Reporter: zhengchenyu > Fix For: 2.7.1 > > Original Estimate: 2m > Remaining Estimate: 2m > > First, this issue does not duplicate YARN-3091. > In our cluster we have 5k nodes, and each server has 1 Gigabit Ethernet, so > the network is the bottleneck in our cluster. > We must distcp data from the warehouse; because of the 1 Gigabit Ethernet, we > must set yarn.scheduler.fair.max.assign to 5, or it leads to hotspots. > Setting max.assign to 5 decreases assignment throughput, so > we start the ContinuousSchedulingThread. > With more applications running in our cluster, and with the > ContinuousSchedulingThread, lock contention has become more serious. > In our cluster, the call queue of the ApplicationMasterService RPC is > occasionally high, and we worry that more problems will occur as more > applications run. > Here is our causal chain: > "1 Gigabit Ethernet" and "data hot spot" ==> "set > yarn.scheduler.fair.max.assign to 5" ==> "ContinuousSchedulingThread is > started" and "more applications" ==> "lock contention" > I know YARN-3091 addressed this problem, but that patch changes the object > lock to a read-write lock, which is still coarse-grained. I think we should > lock individual resources rather than large sections of code. 
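For context on the read-write-lock approach the reporter mentions from YARN-3091, here is a minimal sketch of the trade-off being debated; the class, field, and method names are illustrative, not YARN's actual scheduler code. A ReentrantReadWriteLock lets many readers (e.g. state lookups during heartbeats) proceed concurrently while allocations still take an exclusive write lock, but as the reporter notes, it is still one coarse lock over the whole scheduler state:

```java
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative sketch: one read-write lock guarding all scheduler state.
public class SchedulerLockSketch {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private int availableVcores = 100;

    public int readAvailable() {
        lock.readLock().lock(); // shared: many readers at once
        try {
            return availableVcores;
        } finally {
            lock.readLock().unlock();
        }
    }

    public boolean tryAllocate(int vcores) {
        lock.writeLock().lock(); // exclusive: one allocator at a time
        try {
            if (availableVcores < vcores) {
                return false;
            }
            availableVcores -= vcores;
            return true;
        } finally {
            lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        SchedulerLockSketch s = new SchedulerLockSketch();
        System.out.println(s.tryAllocate(30)); // true
        System.out.println(s.readAvailable()); // 70
    }
}
```

The finer-grained alternative the reporter argues for would instead lock individual resource objects, shrinking each critical section at the cost of more complex lock ordering.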
[jira] [Commented] (YARN-6342) Make TimelineV2Client's drain period after stop configurable
[ https://issues.apache.org/jira/browse/YARN-6342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947892#comment-15947892 ] Varun Saxena commented on YARN-6342: Thanks [~haibochen] for the patch. It looks good to me in general. I have a few comments on the config though. # I think the configuration should start with {{yarn.timeline-service.client.}}. You can use TIMELINE_SERVICE_CLIENT_PREFIX instead of TIMELINE_SERVICE_PREFIX in YarnConfiguration. # How about naming the configuration {{yarn.timeline-service.client.drain-entities.timeout.ms}}? I think the configuration description can tell that it's done on stop. # We need to add this configuration in yarn-default.xml > Make TimelineV2Client's drain period after stop configurable > > > Key: YARN-6342 > URL: https://issues.apache.org/jira/browse/YARN-6342 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Haibo Chen > Attachments: YARN-6342.00.patch > > > Found these with [~rohithsharma] while browsing the code > - In stop: it calls shutdownNow, which doesn't wait for pending tasks; should > it use shutdown instead? > {code} > public void stop() { > LOG.info("Stopping TimelineClient."); > executor.shutdownNow(); > try { > executor.awaitTermination(DRAIN_TIME_PERIOD, TimeUnit.MILLISECONDS); > } catch (InterruptedException e) { > {code} > - In TimelineClientImpl#createRunnable: > If any exception happens when publishing one entity > (publishWithoutBlockingOnQueue), the thread exits. I think it should make a > best effort to continue publishing the timeline entities; one failure should > not prevent all follow-up entities from being published.
[jira] [Commented] (YARN-5301) NM mount cpu cgroups failed on some systems
[ https://issues.apache.org/jira/browse/YARN-5301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947891#comment-15947891 ] Hadoop QA commented on YARN-5301:

+1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 19s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
| +1 | mvninstall | 15m 1s | trunk passed |
| +1 | compile | 0m 27s | trunk passed |
| +1 | checkstyle | 0m 18s | trunk passed |
| +1 | mvnsite | 0m 26s | trunk passed |
| +1 | mvneclipse | 0m 13s | trunk passed |
| +1 | findbugs | 0m 40s | trunk passed |
| +1 | javadoc | 0m 17s | trunk passed |
| +1 | mvninstall | 0m 22s | the patch passed |
| +1 | compile | 0m 24s | the patch passed |
| +1 | cc | 0m 24s | the patch passed |
| +1 | javac | 0m 24s | the patch passed |
| +1 | checkstyle | 0m 15s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 0 new + 4 unchanged - 4 fixed = 4 total (was 8) |
| +1 | mvnsite | 0m 23s | the patch passed |
| +1 | mvneclipse | 0m 10s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | findbugs | 0m 45s | the patch passed |
| +1 | javadoc | 0m 15s | the patch passed |
| +1 | unit | 12m 45s | hadoop-yarn-server-nodemanager in the patch passed. |
| +1 | asflicense | 0m 18s | The patch does not generate ASF License warnings. |
| | | 34m 38s | |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:a9ad5d6 |
| JIRA Issue | YARN-5301 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12861093/YARN-5301.001.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle cc |
| uname | Linux d852e53ce52f 3.13.0-107-generic #154-Ubuntu SMP Tue Dec 20 09:57:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 640ba1d |
| Default Java | 1.8.0_121 |
| findbugs | v3.0.0 |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/15423/testReport/ |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/15423/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.

> NM mount cpu cgroups failed on some systems
> ---
>
> Key: YARN-5301
> URL: https://issues.apache.org/jira/browse/YARN-5301
> Project: Hadoop YARN
> Issue Type: Bug
[jira] [Commented] (YARN-6109) Add an ability to convert ChildQueue to ParentQueue
[ https://issues.apache.org/jira/browse/YARN-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947886#comment-15947886 ] Xuan Gong commented on YARN-6109: - Thanks for the comments, [~subru]. After careful investigation, I still prefer to create a separate ticket to make sure the reservation system works for the stop/delete/convert queue scenarios. And we should do the following: * To delete or convert a queue, we will make sure the current queue is in the STOPPED state, which means we should stop the queue first. For the reservation system/plan queue, we should check whether the current leaf queue (or, if it is a parent queue, its children) has any future reservations. If it does not, we can stop the queue; otherwise, we should throw an exception and ask the user to remove the reservations first. * To submit a reservation creation request, we should always make sure the current leaf queue is in the RUNNING state; otherwise, we should reject the request. [~leftnoteasy] [~subru] If you are fine with the plan, I will create a separate ticket. > Add an ability to convert ChildQueue to ParentQueue > --- > > Key: YARN-6109 > URL: https://issues.apache.org/jira/browse/YARN-6109 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-6109.1.patch, YARN-6109.2.patch, > YARN-6109.rebase.patch > >
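The two checks proposed above can be sketched as follows. The class and method names here are illustrative only, not the CapacityScheduler API: a queue may be stopped only when it (or, for a parent, any of its children) holds no future reservations, and reservation submission requires a RUNNING leaf queue:

```java
import java.util.ArrayList;
import java.util.List;

public class QueueStateCheck {
  enum State { RUNNING, STOPPED }

  // Hypothetical stand-in for a scheduler queue.
  static class Queue {
    State state = State.RUNNING;
    int futureReservations = 0;
    List<Queue> children = new ArrayList<>();
  }

  // For a parent queue, recurse into the children; for a leaf,
  // look at its own reservations.
  static boolean hasFutureReservations(Queue q) {
    if (!q.children.isEmpty()) {
      return q.children.stream().anyMatch(QueueStateCheck::hasFutureReservations);
    }
    return q.futureReservations > 0;
  }

  // Stopping is a precondition for delete/convert in the plan above.
  static void stopQueue(Queue q) {
    if (hasFutureReservations(q)) {
      throw new IllegalStateException("remove reservations first");
    }
    q.state = State.STOPPED;
  }

  // Reservation creation requests are rejected on non-RUNNING queues.
  static boolean canAcceptReservation(Queue leaf) {
    return leaf.state == State.RUNNING;
  }

  public static void main(String[] args) {
    Queue leaf = new Queue();
    stopQueue(leaf); // no reservations, so the stop succeeds
    System.out.println(leaf.state + " " + canAcceptReservation(leaf));
  }
}
```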
[jira] [Commented] (YARN-6302) Fail the node, if Linux Container Executor is not configured properly
[ https://issues.apache.org/jira/browse/YARN-6302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947841#comment-15947841 ] Daniel Templeton commented on YARN-6302: LGTM. Can you please post a patch to the JIRA so that Jenkins has something to chew on? > Fail the node, if Linux Container Executor is not configured properly > - > > Key: YARN-6302 > URL: https://issues.apache.org/jira/browse/YARN-6302 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Miklos Szegedi >Assignee: Miklos Szegedi >Priority: Minor > > We have a cluster that has one node with misconfigured Linux Container > Executor. Every time an AM or regular container is launched on the cluster, > it will fail. The node will still have resources available, so it keeps > failing apps until the administrator notices the issue and decommissions the > node. AM Blacklisting only helps if the application is already running. > As a possible improvement, when the LCE is used on the cluster and a NM gets > certain errors back from the LCE, like error 24 (configuration not found), we > should not try to allocate anything on the node anymore or shut down the node > entirely. That kind of problem normally does not fix itself and it means that > nothing can really run on that node. > {code} > Application application_1488920587909_0010 failed 2 times due to AM Container > for appattempt_1488920587909_0010_02 exited with exitCode: -1000 > Failing this attempt.Diagnostics: Application application_1488920587909_0010 > initialization failed (exitCode=24) with output: > For more detailed output, check the application tracking page: > http://node-1.domain.com:8088/cluster/app/application_1488920587909_0010 Then > click on links to logs of each attempt. > . Failing the application. > {code}
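The improvement suggested in the description could look roughly like this. The constant and method names are assumptions for illustration, not the actual NodeManager code; exit code 24 ("configuration not found") is the one cited in the issue:

```java
public class LceExitCheck {
  // Hypothetical subset of container-executor exit codes; 24 is the
  // "configuration not found" error mentioned in the issue description.
  static final int INVALID_CONFIG_FILE = 24;

  // Config errors do not fix themselves, so rather than failing every
  // app scheduled onto the node, treat the error as fatal for the node
  // (stop allocating to it, or shut it down entirely).
  static boolean isFatalNodeError(int exitCode) {
    return exitCode == INVALID_CONFIG_FILE;
  }

  public static void main(String[] args) {
    System.out.println(isFatalNodeError(24) + " " + isFatalNodeError(0));
  }
}
```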
[jira] [Commented] (YARN-6302) Fail the node, if Linux Container Executor is not configured properly
[ https://issues.apache.org/jira/browse/YARN-6302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947838#comment-15947838 ] ASF GitHub Bot commented on YARN-6302: -- Github user szegedim commented on a diff in the pull request: https://github.com/apache/hadoop/pull/200#discussion_r108779597 --- Diff: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java --- @@ -580,19 +579,19 @@ public int launchContainer(ContainerStartContext ctx) logOutput(diagnostics); container.handle(new ContainerDiagnosticsUpdateEvent(containerId, diagnostics)); -if (exitCode == LinuxContainerExecutorExitCode. +if (exitCode == ExitCode. --- End diff -- I did not run my last git push. It should be fixed now. > Fail the node, if Linux Container Executor is not configured properly > - > > Key: YARN-6302 > URL: https://issues.apache.org/jira/browse/YARN-6302 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Miklos Szegedi >Assignee: Miklos Szegedi >Priority: Minor > > We have a cluster that has one node with misconfigured Linux Container > Executor. Every time an AM or regular container is launched on the cluster, > it will fail. The node will still have resources available, so it keeps > failing apps until the administrator notices the issue and decommissions the > node. AM Blacklisting only helps if the application is already running. > As a possible improvement, when the LCE is used on the cluster and a NM gets > certain errors back from the LCE, like error 24 (configuration not found), we > should not try to allocate anything on the node anymore or shut down the node > entirely. That kind of problem normally does not fix itself and it means that > nothing can really run on that node. 
> {code} > Application application_1488920587909_0010 failed 2 times due to AM Container > for appattempt_1488920587909_0010_02 exited with exitCode: -1000 > Failing this attempt.Diagnostics: Application application_1488920587909_0010 > initialization failed (exitCode=24) with output: > For more detailed output, check the application tracking page: > http://node-1.domain.com:8088/cluster/app/application_1488920587909_0010 Then > click on links to logs of each attempt. > . Failing the application. > {code}
[jira] [Updated] (YARN-5301) NM mount cpu cgroups failed on some systems
[ https://issues.apache.org/jira/browse/YARN-5301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miklos Szegedi updated YARN-5301: - Attachment: YARN-5301.001.patch Fixing checkstyle issues. > NM mount cpu cgroups failed on some systems > --- > > Key: YARN-5301 > URL: https://issues.apache.org/jira/browse/YARN-5301 > Project: Hadoop YARN > Issue Type: Bug >Reporter: sandflee >Assignee: Miklos Szegedi > Attachments: YARN-5301.000.patch, YARN-5301.001.patch > > > On Ubuntu with Linux kernel 3.19, NM start failed if auto-mounting cgroups is enabled. Try the commands: > ./bin/container-executor --mount-cgroups yarn-hadoop cpu=/cgroup/cpu fails > ./bin/container-executor --mount-cgroups yarn-hadoop cpu,cpuacct=/cgroup/cpu > succeeds
[jira] [Commented] (YARN-5301) NM mount cpu cgroups failed on some systems
[ https://issues.apache.org/jira/browse/YARN-5301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947816#comment-15947816 ] Hadoop QA commented on YARN-5301:

+1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 18s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
| +1 | mvninstall | 13m 50s | trunk passed |
| +1 | compile | 0m 29s | trunk passed |
| +1 | checkstyle | 0m 19s | trunk passed |
| +1 | mvnsite | 0m 29s | trunk passed |
| +1 | mvneclipse | 0m 15s | trunk passed |
| +1 | findbugs | 0m 45s | trunk passed |
| +1 | javadoc | 0m 19s | trunk passed |
| +1 | mvninstall | 0m 25s | the patch passed |
| +1 | compile | 0m 27s | the patch passed |
| +1 | cc | 0m 27s | the patch passed |
| +1 | javac | 0m 27s | the patch passed |
| -0 | checkstyle | 0m 17s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 9 new + 8 unchanged - 0 fixed = 17 total (was 8) |
| +1 | mvnsite | 0m 26s | the patch passed |
| +1 | mvneclipse | 0m 12s | the patch passed |
| +1 | whitespace | 0m 1s | The patch has no whitespace issues. |
| +1 | findbugs | 0m 55s | the patch passed |
| +1 | javadoc | 0m 18s | the patch passed |
| +1 | unit | 13m 24s | hadoop-yarn-server-nodemanager in the patch passed. |
| +1 | asflicense | 0m 18s | The patch does not generate ASF License warnings. |
| | | 34m 51s | |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:a9ad5d6 |
| JIRA Issue | YARN-5301 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12861086/YARN-5301.000.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle cc |
| uname | Linux 3f21c1c856ad 3.13.0-105-generic #152-Ubuntu SMP Fri Dec 2 15:37:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 640ba1d |
| Default Java | 1.8.0_121 |
| findbugs | v3.0.0 |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/15422/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/15422/testReport/ |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/15422/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.

> NM mount cpu cgroups failed on some systems
[jira] [Commented] (YARN-6246) Identifying starved apps does not need the scheduler writelock
[ https://issues.apache.org/jira/browse/YARN-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947815#comment-15947815 ] Daniel Templeton commented on YARN-6246: Thanks, [~kasha]. Looking at the patch, I'm concerned about two elements of state that are no longer protected by a lock: {{FSAppAttempt.demand}} and {{FSLeafQueue.lastTimeAtFairShare}}. It looks to me like both are used from different threads with no locking. {{FSLeafQueue.dumpStateInternal()}} appears to be the primary problem. > Identifying starved apps does not need the scheduler writelock > -- > > Key: YARN-6246 > URL: https://issues.apache.org/jira/browse/YARN-6246 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Affects Versions: 2.9.0 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla > Attachments: YARN-6246.001.patch, YARN-6246.002.patch, > YARN-6246.003.patch > > > Currently, the starvation checks are done holding the scheduler writelock. We > are probably better off doing this outside.
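One common way to make such fields safe to read without holding the scheduler lock is to publish them through an atomic holder (or a volatile field). This is a minimal sketch under that assumption, not the actual FSAppAttempt code:

```java
import java.util.concurrent.atomic.AtomicLong;

public class DemandSnapshot {
  // Cross-thread state: written by the scheduler update thread,
  // read by dumping/starvation-check threads without a lock.
  private final AtomicLong demand = new AtomicLong(0L);

  // Called by the update thread while recomputing demand.
  void updateDemand(long newDemand) {
    demand.set(newDemand);
  }

  // Safe to call from any thread (e.g. a state-dumping thread):
  // the atomic read guarantees visibility of the latest write.
  long getDemand() {
    return demand.get();
  }

  public static void main(String[] args) {
    DemandSnapshot s = new DemandSnapshot();
    s.updateDemand(4096);
    System.out.println(s.getDemand());
  }
}
```

A plain non-volatile field, by contrast, gives the reader thread no visibility guarantee, which is the kind of hazard the comment above raises.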
[jira] [Commented] (YARN-6342) Make TimelineV2Client's drain period after stop configurable
[ https://issues.apache.org/jira/browse/YARN-6342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947809#comment-15947809 ] Hadoop QA commented on YARN-6342:

-1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 21s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| 0 | mvndep | 0m 10s | Maven dependency ordering for branch |
| +1 | mvninstall | 14m 32s | trunk passed |
| +1 | compile | 11m 19s | trunk passed |
| +1 | checkstyle | 0m 56s | trunk passed |
| +1 | mvnsite | 1m 11s | trunk passed |
| +1 | mvneclipse | 0m 39s | trunk passed |
| +1 | findbugs | 2m 22s | trunk passed |
| +1 | javadoc | 1m 4s | trunk passed |
| 0 | mvndep | 0m 10s | Maven dependency ordering for patch |
| +1 | mvninstall | 0m 53s | the patch passed |
| +1 | compile | 9m 26s | the patch passed |
| +1 | javac | 9m 26s | the patch passed |
| +1 | checkstyle | 0m 52s | the patch passed |
| +1 | mvnsite | 1m 6s | the patch passed |
| +1 | mvneclipse | 0m 36s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | findbugs | 2m 33s | the patch passed |
| +1 | javadoc | 0m 58s | the patch passed |
| +1 | unit | 0m 33s | hadoop-yarn-api in the patch passed. |
| +1 | unit | 2m 26s | hadoop-yarn-common in the patch passed. |
| +1 | asflicense | 0m 30s | The patch does not generate ASF License warnings. |
| | | 61m 9s | |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:a9ad5d6 |
| JIRA Issue | YARN-6342 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12861083/YARN-6342.00.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux d778fa99cbee 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 13c766b |
| Default Java | 1.8.0_121 |
| findbugs | v3.0.0 |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/15421/testReport/ |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common U: hadoop-yarn-project/hadoop-yarn |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/15421/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (YARN-5685) RM configuration allows all failover methods to disabled when automatic failover is enabled
[ https://issues.apache.org/jira/browse/YARN-5685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947797#comment-15947797 ] Hudson commented on YARN-5685: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11491 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/11491/]) YARN-5685. RM configuration allows all failover methods to disabled when (templedf: rev 640ba1d23fe8b8105bae6d342ddc1c839302f8e5) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/conf/TestHAUtil.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/HAUtil.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMAdminService.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java > RM configuration allows all failover methods to disabled when automatic > failover is enabled > --- > > Key: YARN-5685 > URL: https://issues.apache.org/jira/browse/YARN-5685 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.9.0, 3.0.0-alpha1 >Reporter: Daniel Templeton >Assignee: Daniel Templeton >Priority: Critical > Labels: oct16-hard > Fix For: 2.9.0, 3.0.0-alpha3 > > Attachments: YARN-5685.001.patch, YARN-5685.002.patch, > YARN-5685.003.patch, YARN-5685.004.patch, YARN-5685.005.patch > > > If HA is enabled with automatic failover enabled and embedded failover > disabled, all RMs come up in standby state. To make one of them active, > the {{\-\-forcemanual}} flag must be used when manually triggering the state > change. Should the active go down, the standby will not become active and > must be manually transitioned with the {{\-\-forcemanual}} flag. 
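The kind of validation this fix introduces can be sketched as follows. The method and mechanism names are illustrative assumptions, not the exact HAUtil API: when automatic failover is enabled, at least one failover mechanism (such as the embedded elector) must also be enabled, and configuration loading should fail fast otherwise, instead of silently leaving every RM in standby:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class HaConfigCheck {
  // Reject the configuration up front rather than letting the cluster
  // start with no way to elect an active RM.
  static void verifyFailoverConfig(boolean automaticFailover,
                                   List<String> enabledMechanisms) {
    if (automaticFailover && enabledMechanisms.isEmpty()) {
      throw new IllegalArgumentException(
          "automatic failover is enabled but no failover method is enabled");
    }
  }

  public static void main(String[] args) {
    // A valid configuration passes silently.
    verifyFailoverConfig(true, Arrays.asList("embedded-elector"));
    try {
      // The broken configuration described in the issue is rejected.
      verifyFailoverConfig(true, Collections.emptyList());
      System.out.println("no error");
    } catch (IllegalArgumentException e) {
      System.out.println("rejected");
    }
  }
}
```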
[jira] [Updated] (YARN-6200) Revert YARN-5068
[ https://issues.apache.org/jira/browse/YARN-6200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-6200: - Fix Version/s: (was: 2.8.1) > Revert YARN-5068 > > > Key: YARN-6200 > URL: https://issues.apache.org/jira/browse/YARN-6200 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S >Priority: Blocker > Fix For: 2.8.0, 2.9.0, 3.0.0-alpha3 > > > As Vinod > [commented|https://issues.apache.org/jira/browse/YARN-5068?focusedCommentId=15867356&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15867356], > the functionality of YARN-5068 is achieved by YARN-1623. So, YARN-5068 needs > to be reverted.
[jira] [Commented] (YARN-3427) Remove deprecated methods from ResourceCalculatorProcessTree
[ https://issues.apache.org/jira/browse/YARN-3427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947789#comment-15947789 ] Daniel Templeton commented on YARN-3427: The methods were made public in 2.7 specifically for Tez, but they were exposed as deprecated with the very explicit plan of removing them in 3.0. > Remove deprecated methods from ResourceCalculatorProcessTree > > > Key: YARN-3427 > URL: https://issues.apache.org/jira/browse/YARN-3427 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.7.0 >Reporter: Karthik Kambatla >Assignee: Miklos Szegedi >Priority: Blocker > Attachments: YARN-3427.000.patch, YARN-3427.001.patch > > > In 2.7, we made ResourceCalculatorProcessTree Public and exposed some > existing ill-formed methods as deprecated ones for use by Tez. > We should remove them in 3.0.0, considering that the methods have been > deprecated for all the 2.x.y releases in which the class is marked Public.
[jira] [Created] (YARN-6410) FSContext.scheduler should be final
Daniel Templeton created YARN-6410: -- Summary: FSContext.scheduler should be final Key: YARN-6410 URL: https://issues.apache.org/jira/browse/YARN-6410 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Affects Versions: 2.8.0 Reporter: Daniel Templeton Priority: Minor {code} private FairScheduler scheduler; {code}
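The requested change can be sketched as follows (a simplified stand-in, not the real FSContext or FairScheduler classes): making the field final forces it to be assigned exactly once in the constructor, which also documents that the reference never changes after construction:

```java
public class FSContext {
  // was: private FairScheduler scheduler;
  private final FairScheduler scheduler;

  FSContext(FairScheduler scheduler) {
    // A final field must be set here; the compiler rejects any later
    // reassignment, ruling out accidental mutation.
    this.scheduler = scheduler;
  }

  FairScheduler getScheduler() {
    return scheduler;
  }

  // Minimal stand-in so the sketch compiles on its own.
  static class FairScheduler { }

  public static void main(String[] args) {
    FairScheduler fs = new FairScheduler();
    System.out.println(new FSContext(fs).getScheduler() == fs);
  }
}
```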
[jira] [Commented] (YARN-3427) Remove deprecated methods from ResourceCalculatorProcessTree
[ https://issues.apache.org/jira/browse/YARN-3427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947786#comment-15947786 ] Siddharth Seth commented on YARN-3427: -- From a Tez perspective, we would prefer if the methods were left in place. If this was something that was fixed in 2.6, that would have been easier to work with. Since the new methods were in 2.7, Tez will need to introduce a shim for this. > Remove deprecated methods from ResourceCalculatorProcessTree > > > Key: YARN-3427 > URL: https://issues.apache.org/jira/browse/YARN-3427 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.7.0 >Reporter: Karthik Kambatla >Assignee: Miklos Szegedi >Priority: Blocker > Attachments: YARN-3427.000.patch, YARN-3427.001.patch > > > In 2.7, we made ResourceCalculatorProcessTree Public and exposed some > existing ill-formed methods as deprecated ones for use by Tez. > We should remove them in 3.0.0, considering that the methods have been > deprecated for all the 2.x.y releases in which the class is marked Public.
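The kind of shim mentioned above is typically done with reflection, so the same Tez jar works whether the old methods exist (2.x) or have been removed (3.0). This is only an illustrative sketch: the method names probed here reflect my understanding of ResourceCalculatorProcessTree's deprecated/new accessors and should be verified, and the stand-in class below exists only so the example runs on its own:

```java
import java.lang.reflect.Method;

public class ProcessTreeShim {
  // Stand-in that, like a 3.0 process tree, has only the "new" method.
  static class FakeTree {
    public long getRssMemorySize() { return 1024; }
  }

  // Probe for the old accessor first and fall back to the new one,
  // so the caller compiles and runs against either Hadoop line.
  static long getMemory(Object tree) throws Exception {
    Method m;
    try {
      // Assumed name of the deprecated pre-3.0 accessor.
      m = tree.getClass().getMethod("getCumulativeRssmem");
    } catch (NoSuchMethodException e) {
      m = tree.getClass().getMethod("getRssMemorySize");
    }
    return (Long) m.invoke(tree);
  }

  public static void main(String[] args) throws Exception {
    System.out.println(getMemory(new FakeTree()));
  }
}
```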
[jira] [Moved] (YARN-6409) RM does not blacklist node for AM launch failures
[ https://issues.apache.org/jira/browse/YARN-6409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen moved MAPREDUCE-6872 to YARN-6409: - Affects Version/s: (was: 3.0.0-alpha2) 3.0.0-alpha2 Target Version/s: 3.0.0-alpha3 (was: 3.0.0-alpha3) Component/s: (was: resourcemanager) resourcemanager Key: YARN-6409 (was: MAPREDUCE-6872) Project: Hadoop YARN (was: Hadoop Map/Reduce) > RM does not blacklist node for AM launch failures > - > > Key: YARN-6409 > URL: https://issues.apache.org/jira/browse/YARN-6409 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 3.0.0-alpha2 >Reporter: Haibo Chen >Assignee: Haibo Chen >
[jira] [Updated] (YARN-5685) RM configuration allows all failover methods to disabled when automatic failover is enabled
[ https://issues.apache.org/jira/browse/YARN-5685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Templeton updated YARN-5685: --- Summary: RM configuration allows all failover methods to disabled when automatic failover is enabled (was: RM allows all failover methods to disabled when automatic failover is enabled) > RM configuration allows all failover methods to disabled when automatic > failover is enabled > --- > > Key: YARN-5685 > URL: https://issues.apache.org/jira/browse/YARN-5685 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.9.0, 3.0.0-alpha1 >Reporter: Daniel Templeton >Assignee: Daniel Templeton >Priority: Critical > Labels: oct16-hard > Attachments: YARN-5685.001.patch, YARN-5685.002.patch, > YARN-5685.003.patch, YARN-5685.004.patch, YARN-5685.005.patch > > > If HA is enabled with automatic failover enabled and embedded failover > disabled, all RMs come up in standby state. To make one of them active, > the {{\-\-forcemanual}} flag must be used when manually triggering the state > change. Should the active go down, the standby will not become active and > must be manually transitioned with the {{\-\-forcemanual}} flag.
[jira] [Updated] (YARN-5301) NM mount cpu cgroups failed on some systems
[ https://issues.apache.org/jira/browse/YARN-5301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miklos Szegedi updated YARN-5301: - Summary: NM mount cpu cgroups failed on some systems (was: NM mount cpu cgroups failed on some system) > NM mount cpu cgroups failed on some systems > --- > > Key: YARN-5301 > URL: https://issues.apache.org/jira/browse/YARN-5301 > Project: Hadoop YARN > Issue Type: Bug >Reporter: sandflee >Assignee: Miklos Szegedi > Attachments: YARN-5301.000.patch > > > On Ubuntu with Linux kernel 3.19, NM start failed if auto mount cgroup is > enabled. Try the commands: > ./bin/container-executor --mount-cgroups yarn-hadoop cpu=/cgroup/cpu (fails) > ./bin/container-executor --mount-cgroups yarn-hadoop cpu,cpuacct=/cgroup/cpu > (succeeds) -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5685) RM allows all failover methods to disabled when automatic failover is enabled
[ https://issues.apache.org/jira/browse/YARN-5685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Templeton updated YARN-5685: --- Summary: RM allows all failover methods to disabled when automatic failover is enabled (was: Non-embedded HA failover is broken) > RM allows all failover methods to disabled when automatic failover is enabled > - > > Key: YARN-5685 > URL: https://issues.apache.org/jira/browse/YARN-5685 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.9.0, 3.0.0-alpha1 >Reporter: Daniel Templeton >Assignee: Daniel Templeton >Priority: Critical > Labels: oct16-hard > Attachments: YARN-5685.001.patch, YARN-5685.002.patch, > YARN-5685.003.patch, YARN-5685.004.patch, YARN-5685.005.patch > > > If HA is enabled with automatic failover enabled and embedded failover > disabled, all RMs come up in standby state. To make one of them active, > the {{\-\-forcemanual}} flag must be used when manually triggering the state > change. Should the active go down, the standby will not become active and > must be manually transitioned with the {{\-\-forcemanual}} flag. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5301) NM mount cpu cgroups failed on some system
[ https://issues.apache.org/jira/browse/YARN-5301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miklos Szegedi updated YARN-5301: - Attachment: YARN-5301.000.patch The attached patch fixes the issue by identifying any premounted cgroups and the subsystem set for each mount point. If the user specifies the mount option, that means the existing mount point might not be usable (access, design, etc.), so we remount it with the existing subsystem set (cpu, cpuacct in most cases). I tested it on CentOS 6 and Ubuntu 16 with kernel 4.4, and I am testing it on RHEL 7 now. I also updated the unit tests and fixed a few compiler warnings in those files. > NM mount cpu cgroups failed on some system > -- > > Key: YARN-5301 > URL: https://issues.apache.org/jira/browse/YARN-5301 > Project: Hadoop YARN > Issue Type: Bug >Reporter: sandflee >Assignee: Miklos Szegedi > Attachments: YARN-5301.000.patch > > > On Ubuntu with Linux kernel 3.19, NM start failed if auto mount cgroup is > enabled. Try the commands: > ./bin/container-executor --mount-cgroups yarn-hadoop cpu=/cgroup/cpu (fails) > ./bin/container-executor --mount-cgroups yarn-hadoop cpu,cpuacct=/cgroup/cpu > (succeeds) -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
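The premount-detection step described in the patch comment above (find each existing cgroup mount point and the subsystem set attached to it) can be sketched roughly as follows. The class and method names here are illustrative only; the actual NM work happens in the native container-executor and the cgroups handler, not in a standalone scanner like this.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Sketch: map each pre-mounted cgroup mount point to its subsystems. */
public class CgroupMountScanner {
  static Map<String, Set<String>> scan(BufferedReader procMounts)
      throws IOException {
    Map<String, Set<String>> result = new LinkedHashMap<>();
    String line;
    while ((line = procMounts.readLine()) != null) {
      // /proc/mounts format: device mountpoint fstype options dump pass
      String[] f = line.split("\\s+");
      if (f.length >= 4 && "cgroup".equals(f[2])) {
        Set<String> subsystems = new TreeSet<>();
        for (String opt : f[3].split(",")) {
          // keep only controller names, drop rw/relatime and friends
          if (opt.equals("cpu") || opt.equals("cpuacct")
              || opt.equals("memory") || opt.equals("blkio")) {
            subsystems.add(opt);
          }
        }
        result.put(f[1], subsystems);
      }
    }
    return result;
  }

  public static void main(String[] args) throws IOException {
    String sample =
        "sysfs /sys sysfs rw,relatime 0 0\n"
      + "cgroup /sys/fs/cgroup/cpu,cpuacct cgroup rw,relatime,cpu,cpuacct 0 0\n";
    System.out.println(scan(new BufferedReader(new StringReader(sample))));
  }
}
```

Knowing the full subsystem set per mount point is what lets the remount preserve the combined controllers (cpu,cpuacct) instead of trying to mount cpu alone, which is what failed on the newer kernels.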
[jira] [Commented] (YARN-5952) Create REST API for changing YARN scheduler configurations
[ https://issues.apache.org/jira/browse/YARN-5952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947763#comment-15947763 ] Hadoop QA commented on YARN-5952: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 23s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 0s{color} | {color:green} YARN-5734 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s{color} | {color:green} YARN-5734 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 32s{color} | {color:green} YARN-5734 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 46s{color} | {color:green} YARN-5734 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 20s{color} | {color:green} YARN-5734 passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 24s{color} | {color:green} YARN-5734 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s{color} | {color:green} YARN-5734 passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | 
{color:green} 0m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 28s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 43m 46s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 29s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 74m 1s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:a9ad5d6 | | JIRA Issue | YARN-5952 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12861074/YARN-5952-YARN-5734.008.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 96af0bf6102c 3.13.0-105-generic #152-Ubuntu SMP Fri Dec 2 15:37:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | YARN-5734 / 25d2028 | | Default Java | 1.8.0_121 | | findbugs | v3.0.0 | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/15419/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/15419/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/15419/console | | Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Create REST API for changing YARN scheduler configurations > -- > > Key: YARN-
[jira] [Commented] (YARN-6195) Export UsedCapacity and AbsoluteUsedCapacity to JMX
[ https://issues.apache.org/jira/browse/YARN-6195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947730#comment-15947730 ] Jason Lowe commented on YARN-6195: -- Latest patch lgtm, with the caveat that I don't think we can really support used capacity and absolute used capacity in the queue metrics without having per-partition queue metrics. Using a max across partitions seems like a reasonable value to report given we're trying to squash multiple values into a single field. [~leftnoteasy] do you have any concerns about using a max-across-partitions approach here? If not then I think this is ready to go. > Export UsedCapacity and AbsoluteUsedCapacity to JMX > --- > > Key: YARN-6195 > URL: https://issues.apache.org/jira/browse/YARN-6195 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler, metrics, yarn >Affects Versions: 3.0.0-alpha3 >Reporter: Benson Qiu >Assignee: Benson Qiu > Attachments: YARN-6195.001.patch, YARN-6195.002.patch, > YARN-6195.003.patch > > > `usedCapacity` and `absoluteUsedCapacity` are currently not available as JMX. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
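The max-across-partitions approach discussed above amounts to collapsing the per-partition values into the single number a JMX gauge can report. A minimal sketch, with hypothetical names (the real change would live in the capacity-scheduler queue metrics classes):

```java
import java.util.Map;

/** Sketch: squash per-partition used-capacity values into one gauge
 *  value by taking the max across partitions. */
public class PartitionGaugeSketch {
  static float maxAcrossPartitions(Map<String, Float> usedCapacityByPartition) {
    float max = 0f; // capacities are non-negative fractions
    for (float v : usedCapacityByPartition.values()) {
      max = Math.max(max, v);
    }
    return max;
  }
}
```

The max is a conservative summary: it surfaces the most-loaded partition, at the cost of hiding which partition that is, which is why per-partition queue metrics would still be the complete fix.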
[jira] [Commented] (YARN-6376) Exceptions caused by synchronous putEntities requests can be swallowed
[ https://issues.apache.org/jira/browse/YARN-6376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947723#comment-15947723 ] Hadoop QA commented on YARN-6376: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 21s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 16s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 12s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 17s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 25s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 14s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 13s{color} | 
{color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 15s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 39s{color} | {color:green} hadoop-yarn-server-timelineservice in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 15s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 19m 6s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:a9ad5d6 | | JIRA Issue | YARN-6376 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12861075/YARN-6376.00.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux f2fbd7eac1bc 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 13c766b | | Default Java | 1.8.0_121 | | findbugs | v3.0.0 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/15420/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/15420/console | | Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Exceptions caused by synchronous putEntities requests can be swallowed > -- > > Key: YARN-6376 > URL: https://issues.apache.org/jira/browse/YARN-6376 > Project: Hadoop YARN > Issue Type: Sub-task > Components: ATSv2 >Affects Versions: 3.0.0-
[jira] [Commented] (YARN-6342) Make TimelineV2Client's drain period after stop configurable
[ https://issues.apache.org/jira/browse/YARN-6342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947719#comment-15947719 ] Haibo Chen commented on YARN-6342: -- Uploaded a simple patch to make the drain period configurable. We could take up [~jrottinghuis]'s suggestion of a global drain period for all clients in a follow-up jira. > Make TimelineV2Client's drain period after stop configurable > > > Key: YARN-6342 > URL: https://issues.apache.org/jira/browse/YARN-6342 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Haibo Chen > Attachments: YARN-6342.00.patch > > > Found these with [~rohithsharma] while browsing the code > - In stop: it calls shutdownNow, which doesn't wait for pending tasks; should > it use shutdown instead? > {code} > public void stop() { > LOG.info("Stopping TimelineClient."); > executor.shutdownNow(); > try { > executor.awaitTermination(DRAIN_TIME_PERIOD, TimeUnit.MILLISECONDS); > } catch (InterruptedException e) { > {code} > - In TimelineClientImpl#createRunnable: > If any exception happens when publishing one entity > (publishWithoutBlockingOnQueue), the thread exits. I think it should make a > best effort to continue publishing the timeline entities; one failure should > not prevent all follow-up entities from being published. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6342) Make TimelineV2Client's drain period after stop configurable
[ https://issues.apache.org/jira/browse/YARN-6342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated YARN-6342: - Attachment: YARN-6342.00.patch > Make TimelineV2Client's drain period after stop configurable > > > Key: YARN-6342 > URL: https://issues.apache.org/jira/browse/YARN-6342 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Haibo Chen > Attachments: YARN-6342.00.patch > > > Found these with [~rohithsharma] while browsing the code > - In stop: it calls shutdownNow, which doesn't wait for pending tasks; should > it use shutdown instead? > {code} > public void stop() { > LOG.info("Stopping TimelineClient."); > executor.shutdownNow(); > try { > executor.awaitTermination(DRAIN_TIME_PERIOD, TimeUnit.MILLISECONDS); > } catch (InterruptedException e) { > {code} > - In TimelineClientImpl#createRunnable: > If any exception happens when publishing one entity > (publishWithoutBlockingOnQueue), the thread exits. I think it should make a > best effort to continue publishing the timeline entities; one failure should > not prevent all follow-up entities from being published. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6342) Make TimelineV2Client's drain period after stop configurable
[ https://issues.apache.org/jira/browse/YARN-6342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated YARN-6342: - Summary: Make TimelineV2Client's drain period after stop configurable (was: Issues in async API of TimelineClient) > Make TimelineV2Client's drain period after stop configurable > > > Key: YARN-6342 > URL: https://issues.apache.org/jira/browse/YARN-6342 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Haibo Chen > > Found these with [~rohithsharma] while browsing the code > - In stop: it calls shutdownNow, which doesn't wait for pending tasks; should > it use shutdown instead? > {code} > public void stop() { > LOG.info("Stopping TimelineClient."); > executor.shutdownNow(); > try { > executor.awaitTermination(DRAIN_TIME_PERIOD, TimeUnit.MILLISECONDS); > } catch (InterruptedException e) { > {code} > - In TimelineClientImpl#createRunnable: > If any exception happens when publishing one entity > (publishWithoutBlockingOnQueue), the thread exits. I think it should make a > best effort to continue publishing the timeline entities; one failure should > not prevent all follow-up entities from being published. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
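The shutdown-vs-shutdownNow question in the quoted description is the standard graceful-drain pattern: stop accepting new work, wait a bounded period for queued tasks, then force-kill. A hedged sketch of that pattern follows; the method shape is illustrative, not the actual TimelineV2ClientImpl code, and the drain period is the value this jira makes configurable.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Sketch: drain queued publish tasks for a bounded period on stop,
 *  instead of dropping them immediately with shutdownNow(). */
public class DrainOnStopSketch {
  static boolean stop(ExecutorService executor, long drainPeriodMs)
      throws InterruptedException {
    executor.shutdown(); // stop accepting work, but let queued tasks run
    boolean drained =
        executor.awaitTermination(drainPeriodMs, TimeUnit.MILLISECONDS);
    if (!drained) {
      executor.shutdownNow(); // give up after the configurable drain period
    }
    return drained;
  }

  public static void main(String[] args) throws InterruptedException {
    ExecutorService ex = Executors.newSingleThreadExecutor();
    ex.submit(() -> { }); // stands in for a pending entity publish
    System.out.println(stop(ex, 2000));
  }
}
```

The original code calls awaitTermination after shutdownNow, which waits only for the already-running task; switching to shutdown first is what turns the wait into an actual drain.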
[jira] [Commented] (YARN-6302) Fail the node, if Linux Container Executor is not configured properly
[ https://issues.apache.org/jira/browse/YARN-6302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947706#comment-15947706 ] ASF GitHub Bot commented on YARN-6302: -- Github user templedf commented on a diff in the pull request: https://github.com/apache/hadoop/pull/200#discussion_r108759529 --- Diff: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java --- @@ -551,16 +550,16 @@ public int launchContainer(ContainerStartContext ctx) } else { LOG.info( "Container was marked as inactive. Returning terminated error"); -return ExitCode.TERMINATED.getExitCode(); +return ContainerExecutor.ExitCode.TERMINATED.getExitCode(); --- End diff -- Ah, missed that. > Fail the node, if Linux Container Executor is not configured properly > - > > Key: YARN-6302 > URL: https://issues.apache.org/jira/browse/YARN-6302 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Miklos Szegedi >Assignee: Miklos Szegedi >Priority: Minor > > We have a cluster that has one node with misconfigured Linux Container > Executor. Every time an AM or regular container is launched on the cluster, > it will fail. The node will still have resources available, so it keeps > failing apps until the administrator notices the issue and decommissions the > node. AM blacklisting only helps if the application is already running. > As a possible improvement, when the LCE is used on the cluster and a NM gets > certain errors back from the LCE, like error 24 (configuration not found), we > should not try to allocate anything on the node anymore or shut down the node > entirely. That kind of problem normally does not fix itself and it means that > nothing can really run on that node. 
> {code} > Application application_1488920587909_0010 failed 2 times due to AM Container > for appattempt_1488920587909_0010_02 exited with exitCode: -1000 > Failing this attempt.Diagnostics: Application application_1488920587909_0010 > initialization failed (exitCode=24) with output: > For more detailed output, check the application tracking page: > http://node-1.domain.com:8088/cluster/app/application_1488920587909_0010 Then > click on links to logs of each attempt. > . Failing the application. > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6302) Fail the node, if Linux Container Executor is not configured properly
[ https://issues.apache.org/jira/browse/YARN-6302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947687#comment-15947687 ] ASF GitHub Bot commented on YARN-6302: -- Github user templedf commented on a diff in the pull request: https://github.com/apache/hadoop/pull/200#discussion_r108757816 --- Diff: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java --- @@ -580,19 +579,19 @@ public int launchContainer(ContainerStartContext ctx) logOutput(diagnostics); container.handle(new ContainerDiagnosticsUpdateEvent(containerId, diagnostics)); -if (exitCode == LinuxContainerExecutorExitCode. +if (exitCode == ExitCode. --- End diff -- What am I missing? It doesn't look like anything changed... > Fail the node, if Linux Container Executor is not configured properly > - > > Key: YARN-6302 > URL: https://issues.apache.org/jira/browse/YARN-6302 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Miklos Szegedi >Assignee: Miklos Szegedi >Priority: Minor > > We have a cluster that has one node with misconfigured Linux Container > Executor. Every time an AM or regular container is launched on the cluster, > it will fail. The node will still have resources available, so it keeps > failing apps until the administrator notices the issue and decommissions the > node. AM blacklisting only helps if the application is already running. > As a possible improvement, when the LCE is used on the cluster and a NM gets > certain errors back from the LCE, like error 24 (configuration not found), we > should not try to allocate anything on the node anymore or shut down the node > entirely. That kind of problem normally does not fix itself and it means that > nothing can really run on that node. 
> {code} > Application application_1488920587909_0010 failed 2 times due to AM Container > for appattempt_1488920587909_0010_02 exited with exitCode: -1000 > Failing this attempt.Diagnostics: Application application_1488920587909_0010 > initialization failed (exitCode=24) with output: > For more detailed output, check the application tracking page: > http://node-1.domain.com:8088/cluster/app/application_1488920587909_0010 Then > click on links to logs of each attempt. > . Failing the application. > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
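The improvement proposed in this issue — classify certain container-executor exit codes, like the 24 ("configuration not found") quoted in the description, as node-level rather than per-application failures — could look like the minimal sketch below. The class, method, and constant names are hypothetical, not the eventual patch.

```java
import java.util.Set;

/** Sketch: decide whether an LCE launch failure indicates a broken node. */
public class LceNodeHealthSketch {
  // 24 is the "configuration not found" exit code from the issue text;
  // treating it as node-fatal is the proposal, not existing behavior.
  private static final Set<Integer> NODE_FATAL_EXIT_CODES = Set.of(24);

  static boolean shouldFailNode(int lceExitCode) {
    return NODE_FATAL_EXIT_CODES.contains(lceExitCode);
  }
}
```

On a node-fatal code, the NM could mark itself unhealthy (so the RM stops scheduling there) instead of letting the node keep accepting and failing containers.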
[jira] [Commented] (YARN-5478) [YARN-4902] Define Java API for generalized & unified scheduling-strategies.
[ https://issues.apache.org/jira/browse/YARN-5478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947674#comment-15947674 ] Naganarasimha G R commented on YARN-5478: - Had an offline discussion about the above two points with [~leftnoteasy]; as per it: # If the node attribute expression is placed in {{affinityTargets}}, nodes satisfying the expression will be selected; if placed in {{antiAffinityTargets}}, nodes *not* satisfying it will be selected. The advantage is that we can have combinations of node label expressions, like delayed-or(placement-strategy-1, placement-strategy-2), where different placement strategies have different node attribute expressions. # If a *PlacementStrategy* has scope *RACK* and a node attribute affinity target, we can throw an exception and not accept the RR. > [YARN-4902] Define Java API for generalized & unified scheduling-strategies. > > > Key: YARN-5478 > URL: https://issues.apache.org/jira/browse/YARN-5478 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-5478.1.patch, YARN-5478.2.patch, > YARN-5478.preliminary-poc.1.patch, YARN-5478.preliminary-poc.2.patch > > > Define Java API for application to specify generic scheduling requirements > described in YARN-4902 design doc. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-3760) FSDataOutputStream leak in AggregatedLogFormat.LogWriter.close()
[ https://issues.apache.org/jira/browse/YARN-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947664#comment-15947664 ] Hadoop QA commented on YARN-3760: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 49s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 20s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 58s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 27s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s{color} | 
{color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 6s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 19s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 16s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 25m 27s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:a9ad5d6 | | JIRA Issue | YARN-3760 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12861071/YARN-3760.00.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux b2f1d744a65e 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 15e3873 | | Default Java | 1.8.0_121 | | findbugs | v3.0.0 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/15418/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/15418/console | | Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > FSDataOutputStream leak in AggregatedLogFormat.LogWriter.close() > > > Key: YARN-3760 > URL: https://issues.apache.org/jira/browse/YARN-3760 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.4.0 >Reporter: Daryn Sharp >Assignee: Haibo Chen >Priority: Critical
[jira] [Updated] (YARN-6361) FairScheduler: FSLeafQueue.fetchAppsWithDemand CPU usage is high with big queues
[ https://issues.apache.org/jira/browse/YARN-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu updated YARN-6361: --- Description: FSLeafQueue.fetchAppsWithDemand sorts the applications by the current policy. Most of the time is spent in FairShareComparator.compare. We could improve this by doing the calculations outside the sort loop {{(O\(n\))}} and sorting by a precomputed value inside instead {{O(n*log\(n\))}}. This could be a performance issue when there is a huge number of applications in a single queue. The attachments show the performance impact when there are 10k applications in one queue. (was: FSLeafQueue.fetchAppsWithDemand sorts the applications by the current policy. Most of the time is spent in FairShareComparator.compare. We could improve this by doing the calculations outside the sort loop {{(O\(n\))}} and sorting by a precomputed value inside instead {{O(n*log\(n\))}}.) > FairScheduler: FSLeafQueue.fetchAppsWithDemand CPU usage is high with big > queues > > > Key: YARN-6361 > URL: https://issues.apache.org/jira/browse/YARN-6361 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Miklos Szegedi >Assignee: Yufei Gu >Priority: Minor > Attachments: dispatcherthread.png, threads.png > > > FSLeafQueue.fetchAppsWithDemand sorts the applications by the current policy. > Most of the time is spent in FairShareComparator.compare. We could improve > this by doing the calculations outside the sort loop {{(O\(n\))}} and > sorting by a precomputed value inside instead {{O(n*log\(n\))}}. This could be a > performance issue when there is a huge number of applications in a single > queue. The attachments show the performance impact when there are 10k > applications in one queue. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
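The optimization in the description above — move the expensive per-app computation out of the comparator so it runs O(n) times instead of O(n log n) times — is the classic decorate-then-sort pattern. A sketch with stand-in types follows; App and the ratio computation are illustrative placeholders for FSAppAttempt and the FairShareComparator logic.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

/** Sketch: compute each app's sort key once, then sort on the cached key. */
public class FetchAppsSortSketch {
  // Illustrative stand-in for FSAppAttempt.
  static class App {
    final String id; final long usage; final long fairShare;
    App(String id, long usage, long fairShare) {
      this.id = id; this.usage = usage; this.fairShare = fairShare;
    }
  }

  static List<App> sortByUsageToFairShare(List<App> apps) {
    // O(n): the "expensive" part of the comparison, done once per app.
    Map<App, Double> key = new IdentityHashMap<>();
    for (App a : apps) {
      key.put(a, (double) a.usage / Math.max(1, a.fairShare));
    }
    // Still O(n log n) comparisons, but each is now a cheap double compare.
    List<App> sorted = new ArrayList<>(apps);
    sorted.sort(Comparator.comparingDouble(key::get));
    return sorted;
  }
}
```

With 10k apps in one queue this shrinks the dominant cost from roughly 10k * log(10k) key computations to 10k, which matches the CPU profile described in the attachments.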
[jira] [Commented] (YARN-6401) terminating signal should be able to specify per application to support graceful-stop
[ https://issues.apache.org/jira/browse/YARN-6401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947656#comment-15947656 ] Jason Lowe commented on YARN-6401: -- Ah, sorry. I was thinking it was ignoring SIGTERM and thus not cleaning up because it would get killed by the subsequent SIGKILL. Instead it sounds like it _is_ responding to SIGTERM but not cleaning up. Isn't that a bit odd? The whole point of SIGTERM is to request a shutdown of the process rather than forcing one. I'm not an httpd expert, so I started digging into the docs to try to understand why it wouldn't do something sane with TERM but does with a non-standard signal like WINCH. Turns out it does handle TERM, but it's aggressive such that in-progress requests may be interrupted/canceled. WINCH only advises things to exit, which sounds like active requests could continue to be processed but the listen port is no longer monitored so no new requests will be processed. What worries me here is that we can still end up with a disorderly shutdown even if YARN sent WINCH instead of TERM. The default delay between the TERM and KILL signals is relatively short, which is why the processing httpd does for TERM seems more appropriate here. If a request could take hundreds of milliseconds to process then the KILL is going to arrive too soon after the WINCH signal unless the delay between the two signals is widened. However that delay is not a per-app setting, and making it a per-app setting would cause a DoS problem. Containers are often killed because YARN needs the container to leave in a timely manner (e.g.: container running beyond limits, preemption, etc.). So I still think this is something better handled by the application framework (in this case Slider) rather than YARN. MapReduce has a similar example. MapReduce jobs can be killed via YARN, but it's harsh and things are often lost when this occurs. 
That's why the {{mapred job -kill}} command first tries to kill the job by contacting the AM and requesting it to do an orderly shutdown outside of YARN, and only falls back on YARN to terminate the containers if the job is unresponsive to the kill request. I think the same thing applies here. If we really want an orderly shutdown of httpd so we won't kill outstanding requests (even if they can take a while) then Slider (or some layer on top of Slider) should support sending the WINCH signals to the containers for the app, and then the app can terminate when all containers have completed their shutdown. Then the application can implement an arbitrary, application-specific shutdown sequence and timing. If YARN needs to do the killing directly then we cannot wait an arbitrary amount of time for the app to clean up and shut down gracefully. I think YARN will still need some support to send the WINCH signal in either case. Currently containers can be sent signals after YARN-1897, but it's only a restricted subset that can be translated cross-platform. That would need to be extended to support more arbitrary signals like WINCH. > terminating signal should be able to specify per application to support > graceful-stop > - > > Key: YARN-6401 > URL: https://issues.apache.org/jira/browse/YARN-6401 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: kyungwan nam > > When stopping a container, YARN first sends SIGTERM to the process. > After a while, it sends SIGKILL if the process is still alive. > This sequence is always the same for every application. > However, for a graceful stop, it sometimes needs to send another signal instead of > SIGTERM. > For instance, if apache httpd on slider is running, SIGWINCH should be sent > to stop it gracefully. > The way to stop gracefully depends on the application. > It would be good if we could define a terminating signal per application. 
-- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
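The escalation sequence discussed above (send a per-app "graceful" signal, wait a grace period, then force-kill) can be sketched outside of YARN like this. This is a hypothetical illustration, not YARN's SignalContainer API; it assumes a POSIX {{kill}} binary on the PATH and Java 9+ for ProcessHandle.

```java
import java.util.concurrent.TimeUnit;

// Sketch: app-specific graceful stop with escalation to SIGKILL.
// The graceful signal name (e.g. "TERM" or "WINCH") is the per-app
// knob the JIRA asks for; the grace period is the TERM-to-KILL delay
// discussed in the comments.
public class GracefulStop {

    // Send a named POSIX signal via /bin/kill (assumption: POSIX host).
    public static void signal(long pid, String name) throws Exception {
        new ProcessBuilder("kill", "-" + name, Long.toString(pid))
                .start().waitFor();
    }

    public static boolean isAlive(long pid) {
        return ProcessHandle.of(pid).map(ProcessHandle::isAlive).orElse(false);
    }

    // Send the graceful signal, poll for exit during the grace period,
    // then escalate to SIGKILL only if the process is still alive.
    public static void stop(long pid, String gracefulSignal, long graceMillis)
            throws Exception {
        signal(pid, gracefulSignal);
        long deadline = System.currentTimeMillis() + graceMillis;
        while (isAlive(pid) && System.currentTimeMillis() < deadline) {
            TimeUnit.MILLISECONDS.sleep(50);
        }
        if (isAlive(pid)) {
            signal(pid, "KILL");  // hard kill after the grace period
        }
    }
}
```

As Jason notes, the hard part is not sending the signal but choosing the grace period: a per-app delay inside YARN invites abuse, which is why this loop arguably belongs in the application framework (Slider) rather than in the NM.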
[jira] [Commented] (YARN-6408) NPE in handling NODE_UPDATE (while running SLS)
[ https://issues.apache.org/jira/browse/YARN-6408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947652#comment-15947652 ] Carlo Curino commented on YARN-6408: It looks like the userName was probably removed (last app for this user) and then we tried to update a missing entry: {color:red} getUser(userName).setUserResourceLimit(userLimitResource); return userLimitResource; {color} I put a simple null-check, though I would like someone to validate it, as I am not sure that is enough: [~wangda], [~jianhe], [~asuresh], thoughts? > NPE in handling NODE_UPDATE (while running SLS) > --- > > Key: YARN-6408 > URL: https://issues.apache.org/jira/browse/YARN-6408 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Carlo Curino > Attachments: YARN-6408.v0.patch > > > The UsersManager.computeUserLimit() throws an NPE during an SLS simulation. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
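The guard Carlo describes can be sketched as follows. This is a hypothetical simplification with made-up names, not the actual UsersManager code: the user entry may be removed concurrently (last app for that user finished), so the limit update must tolerate a missing entry instead of dereferencing a null.

```java
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the null-check: updating a per-user cached limit must
// survive the user having been removed between lookup sites, which is
// the situation behind the NPE in UsersManager.computeUserLimit().
public class UserLimits {

    public static class User {
        public volatile long userResourceLimit;
        public void setUserResourceLimit(long limit) {
            userResourceLimit = limit;
        }
    }

    public final ConcurrentHashMap<String, User> users = new ConcurrentHashMap<>();

    // Returns the computed limit; skips the cache update if the user
    // vanished (or was never added) concurrently.
    public long computeUserLimit(String userName, long userLimitResource) {
        User user = users.get(userName);
        if (user != null) {  // the guard the patch adds
            user.setUserResourceLimit(userLimitResource);
        }
        return userLimitResource;
    }
}
```

As Carlo says, a null-check alone may only mask the underlying ordering problem (why is the user missing while its containers are still being processed?), so it is a stop-gap pending validation.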
[jira] [Commented] (YARN-6408) NPE in handling NODE_UPDATE (while running SLS)
[ https://issues.apache.org/jira/browse/YARN-6408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947650#comment-15947650 ] Sunil G commented on YARN-6408: --- I will help to take a look. It seems we got the NPE from here: {{getUser(userName).setUserResourceLimit(userLimitResource);}}. It seems the user was either removed or was never added while computeUserLimit was executing from SLS. Since {{computeUserLimit}} and {{removeUser}} both hold the write lock, I suspect that the user was never added. I'll check the SLS part a little more and will share my feedback. > NPE in handling NODE_UPDATE (while running SLS) > --- > > Key: YARN-6408 > URL: https://issues.apache.org/jira/browse/YARN-6408 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Carlo Curino > Attachments: YARN-6408.v0.patch > > > The UsersManager.computeUserLimit() throws an NPE during an SLS simulation. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6376) Exceptions caused by synchronous putEntities requests can be swallowed
[ https://issues.apache.org/jira/browse/YARN-6376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated YARN-6376: - Attachment: YARN-6376.00.patch > Exceptions caused by synchronous putEntities requests can be swallowed > -- > > Key: YARN-6376 > URL: https://issues.apache.org/jira/browse/YARN-6376 > Project: Hadoop YARN > Issue Type: Sub-task > Components: ATSv2 >Affects Versions: 3.0.0-alpha2 >Reporter: Haibo Chen >Assignee: Haibo Chen >Priority: Critical > Labels: yarn-5355-merge-blocker > Attachments: YARN-6376.00.patch > > > TimelineCollector.putEntities() is currently implemented by calling > TimelineWriter.write() followed by TimelineWriter.flush(). Given that > HBaseTimelineWriter.write() is an asynchronous operation, it is possible that > TimelineClient sends a synchronous putEntities() request for critical data > but never gets back an exception, even though the HBase write request to store > the entities may have failed. > This is due to a race condition between the WriterFlushThread in > TimelineCollectorManager and web threads handling synchronous putEntities() > requests. Entities are first put into the buffer by the web thread; it is > possible that before the web thread invokes writer.flush(), WriterFlushThread > is fired up to flush the writer. If the entities were not successfully > written to the backend during flush, the WriterFlushThread would simply > log an error, whereas the web thread would never get an exception out of > its writer.flush() invocation. This is bad because the reason TimelineClient > sends putEntities() synchronously is to be able to retry upon any > exception. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
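One possible direction for the race described above can be sketched as follows. This is a hypothetical illustration, not the attached YARN-6376 patch: if a background flush fails, remember the error and rethrow it to the next caller of flush(), so the web thread handling a synchronous putEntities() observes a failure that the WriterFlushThread would otherwise only log.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicReference;

// Sketch: surface asynchronous flush failures to synchronous callers
// instead of swallowing them in the background flusher thread.
public class ErrorRememberingWriter {

    private final AtomicReference<IOException> lastError = new AtomicReference<>();

    // Called where the real WriterFlushThread currently just logs.
    public void backgroundFlushFailed(String message) {
        lastError.set(new IOException(message));
    }

    // Called by the web thread handling a synchronous putEntities().
    // Rethrows the most recent background failure exactly once.
    public void flush() throws IOException {
        IOException e = lastError.getAndSet(null);
        if (e != null) {
            throw e;  // let the client see the failure and retry
        }
        // ... perform the actual flush of buffered entities here ...
    }
}
```

A real fix also has to decide which caller "owns" a given batch of entities (the stashed error may belong to another client's data), which is part of what makes the TimelineCollector race subtle.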
[jira] [Assigned] (YARN-6382) Address race condition on TimelineWriter.flush() caused by buffer-sized flush
[ https://issues.apache.org/jira/browse/YARN-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen reassigned YARN-6382: Assignee: Haibo Chen > Address race condition on TimelineWriter.flush() caused by buffer-sized flush > - > > Key: YARN-6382 > URL: https://issues.apache.org/jira/browse/YARN-6382 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.0.0-alpha2 >Reporter: Haibo Chen >Assignee: Haibo Chen > Labels: yarn-5355-merge-blocker > > YARN-6376 fixes the race condition between putEntities() and periodical > flush() by WriterFlushThread in TimelineCollectorManager, or between > putEntities() in different threads. > However, BufferedMutator can have internal size-based flush as well. We need > to address the resulting race condition. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6408) NPE in handling NODE_UPDATE (while running SLS)
[ https://issues.apache.org/jira/browse/YARN-6408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-6408: --- Attachment: YARN-6408.v0.patch > NPE in handling NODE_UPDATE (while running SLS) > --- > > Key: YARN-6408 > URL: https://issues.apache.org/jira/browse/YARN-6408 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Carlo Curino > Attachments: YARN-6408.v0.patch > > > The UsersManager.computeUserLimit() throws an NPE during an SLS simulation. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6376) Exceptions caused by synchronous putEntities requests can be swallowed
[ https://issues.apache.org/jira/browse/YARN-6376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated YARN-6376: - Summary: Exceptions caused by synchronous putEntities requests can be swallowed (was: Exceptions caused by synchronous putEntities requests can be swallowed in TimelineCollector) > Exceptions caused by synchronous putEntities requests can be swallowed > -- > > Key: YARN-6376 > URL: https://issues.apache.org/jira/browse/YARN-6376 > Project: Hadoop YARN > Issue Type: Sub-task > Components: ATSv2 >Affects Versions: 3.0.0-alpha2 >Reporter: Haibo Chen >Assignee: Haibo Chen >Priority: Critical > Labels: yarn-5355-merge-blocker > > TimelineCollector.putEntities() is currently implemented by calling > TimelineWriter.write() followed by TimelineWriter.flush(). Given that > HBaseTimelineWriter.write() is an asynchronous operation, it is possible that > TimelineClient sends a synchronous putEntities() request for critical data > but never gets back an exception, even though the HBase write request to store > the entities may have failed. > This is due to a race condition between the WriterFlushThread in > TimelineCollectorManager and web threads handling synchronous putEntities() > requests. Entities are first put into the buffer by the web thread; it is > possible that before the web thread invokes writer.flush(), WriterFlushThread > is fired up to flush the writer. If the entities were not successfully > written to the backend during flush, the WriterFlushThread would simply > log an error, whereas the web thread would never get an exception out of > its writer.flush() invocation. This is bad because the reason TimelineClient > sends putEntities() synchronously is to be able to retry upon any > exception. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6408) NPE in handling NODE_UPDATE (while running SLS)
[ https://issues.apache.org/jira/browse/YARN-6408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947638#comment-15947638 ] Carlo Curino commented on YARN-6408: After about 2 hours of running SLS (with the YARN-6363 synth generator), I got an NPE in handling NODE_UPDATE. [~asuresh] suspects this to be related to the processing of a NODE_UPDATE after the application is completed. Below is the stack trace:
{code}
2017-03-28 20:07:53,059 FATAL [SchedulerEventDispatcher:Event Processor] event.EventDispatcher (EventDispatcher.java:run(75)) - Error in handling event type NODE_UPDATE to the Event Dispatcher
java.lang.NullPointerException
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.UsersManager.computeUserLimit(UsersManager.java:732)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.UsersManager.reComputeUserLimits(UsersManager.java:611)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.UsersManager.getComputedResourceLimitForActiveUsers(UsersManager.java:463)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.getResourceLimitForActiveUsers(LeafQueue.java:1370)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.getHeadroom(LeafQueue.java:1243)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.getHeadroom(LeafQueue.java:1235)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityHeadroomProvider.getHeadroom(CapacityHeadroomProvider.java:57)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.getHeadroom(FiCaSchedulerApp.java:757)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.releaseResource(LeafQueue.java:1618)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.completedContainer(LeafQueue.java:1528)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.completedContainerInternal(CapacityScheduler.java:1600)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.completedContainer(AbstractYarnScheduler.java:602)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.updateCompletedContainers(AbstractYarnScheduler.java:965)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.nodeUpdate(AbstractYarnScheduler.java:1038)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:971)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1368)
	at org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler.handle(SLSCapacityScheduler.java:254)
	at org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler.handle(SLSCapacityScheduler.java:84)
	at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
	at java.lang.Thread.run(Thread.java:745)
{code}
> NPE in handling NODE_UPDATE (while running SLS) > --- > > Key: YARN-6408 > URL: https://issues.apache.org/jira/browse/YARN-6408 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Carlo Curino > > The UsersManager.computeUserLimit() throws an NPE during an SLS simulation. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5952) Create REST API for changing YARN scheduler configurations
[ https://issues.apache.org/jira/browse/YARN-5952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947633#comment-15947633 ] Jonathan Hung commented on YARN-5952: - 008 patch preserves the capacity-scheduler.xml in the target/test-classes directory so that resourcemanager tests which run after this one don't fail. > Create REST API for changing YARN scheduler configurations > -- > > Key: YARN-5952 > URL: https://issues.apache.org/jira/browse/YARN-5952 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jonathan Hung >Assignee: Jonathan Hung > Attachments: YARN-5952.001.patch, YARN-5952.002.patch, > YARN-5952-YARN-5734.003.patch, YARN-5952-YARN-5734.004.patch, > YARN-5952-YARN-5734.005.patch, YARN-5952-YARN-5734.006.patch, > YARN-5952-YARN-5734.007.patch, YARN-5952-YARN-5734.008.patch > > > Based on the design in YARN-5734. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-6408) NPE in handling NODE_UPDATE (while running SLS)
Carlo Curino created YARN-6408: -- Summary: NPE in handling NODE_UPDATE (while running SLS) Key: YARN-6408 URL: https://issues.apache.org/jira/browse/YARN-6408 Project: Hadoop YARN Issue Type: Bug Reporter: Carlo Curino The UsersManager.computeUserLimit() throws an NPE during an SLS simulation. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-5952) Create REST API for changing YARN scheduler configurations
[ https://issues.apache.org/jira/browse/YARN-5952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hung updated YARN-5952: Attachment: YARN-5952-YARN-5734.008.patch > Create REST API for changing YARN scheduler configurations > -- > > Key: YARN-5952 > URL: https://issues.apache.org/jira/browse/YARN-5952 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jonathan Hung >Assignee: Jonathan Hung > Attachments: YARN-5952.001.patch, YARN-5952.002.patch, > YARN-5952-YARN-5734.003.patch, YARN-5952-YARN-5734.004.patch, > YARN-5952-YARN-5734.005.patch, YARN-5952-YARN-5734.006.patch, > YARN-5952-YARN-5734.007.patch, YARN-5952-YARN-5734.008.patch > > > Based on the design in YARN-5734. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-3760) FSDataOutputStream leak in AggregatedLogFormat.LogWriter.close()
[ https://issues.apache.org/jira/browse/YARN-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated YARN-3760: - Attachment: YARN-3760.00.patch > FSDataOutputStream leak in AggregatedLogFormat.LogWriter.close() > > > Key: YARN-3760 > URL: https://issues.apache.org/jira/browse/YARN-3760 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.4.0 >Reporter: Daryn Sharp >Assignee: Haibo Chen >Priority: Critical > Attachments: YARN-3760.00.patch > > > The aggregated log file does not appear to be properly closed when writes > fail. This leaves a lease renewer active in the NM that spams the NN with > lease renewals. If the token is marked not to be cancelled, the renewals > appear to continue until the token expires. If the token is cancelled, the > periodic renew spam turns into a flood of failed connections until the lease > renewer gives up. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-3760) FSDataOutputStream leak in AggregatedLogFormat.LogWriter.close()
[ https://issues.apache.org/jira/browse/YARN-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated YARN-3760: - Target Version/s: 2.8.1, 3.0.0-alpha3 (was: 2.8.0) > FSDataOutputStream leak in AggregatedLogFormat.LogWriter.close() > > > Key: YARN-3760 > URL: https://issues.apache.org/jira/browse/YARN-3760 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.4.0 >Reporter: Daryn Sharp >Assignee: Haibo Chen >Priority: Critical > Attachments: YARN-3760.00.patch > > > The aggregated log file does not appear to be properly closed when writes > fail. This leaves a lease renewer active in the NM that spams the NN with > lease renewals. If the token is marked not to be cancelled, the renewals > appear to continue until the token expires. If the token is cancelled, the > periodic renew spam turns into a flood of failed connections until the lease > renewer gives up. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-3760) FSDataOutputStream leak in AggregatedLogFormat.LogWriter.close()
[ https://issues.apache.org/jira/browse/YARN-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated YARN-3760: - Summary: FSDataOutputStream leak in AggregatedLogFormat.LogWriter.close() (was: Log aggregation failures ) > FSDataOutputStream leak in AggregatedLogFormat.LogWriter.close() > > > Key: YARN-3760 > URL: https://issues.apache.org/jira/browse/YARN-3760 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.4.0 >Reporter: Daryn Sharp >Assignee: Haibo Chen >Priority: Critical > > The aggregated log file does not appear to be properly closed when writes > fail. This leaves a lease renewer active in the NM that spams the NN with > lease renewals. If the token is marked not to be cancelled, the renewals > appear to continue until the token expires. If the token is cancelled, the > periodic renew spam turns into a flood of failed connections until the lease > renewer gives up. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-3760) Log aggregation failures
[ https://issues.apache.org/jira/browse/YARN-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947608#comment-15947608 ] Haibo Chen commented on YARN-3760: -- bq. the ctor creates the fs data stream then a TFile.Writer w/o a try/catch. If the TFile.Writer ctor throws an exception, it's impossible to close the stream. YARN-6288 is in flight to fix this issue. Will upload a patch to address the other issue, where an ISE causes an FSDataOutputStream leak. > Log aggregation failures > - > > Key: YARN-3760 > URL: https://issues.apache.org/jira/browse/YARN-3760 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.4.0 >Reporter: Daryn Sharp >Assignee: Haibo Chen >Priority: Critical > > The aggregated log file does not appear to be properly closed when writes > fail. This leaves a lease renewer active in the NM that spams the NN with > lease renewals. If the token is marked not to be cancelled, the renewals > appear to continue until the token expires. If the token is cancelled, the > periodic renew spam turns into a flood of failed connections until the lease > renewer gives up. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-3760) Log aggregation failures
[ https://issues.apache.org/jira/browse/YARN-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen reassigned YARN-3760: Assignee: Haibo Chen > Log aggregation failures > - > > Key: YARN-3760 > URL: https://issues.apache.org/jira/browse/YARN-3760 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.4.0 >Reporter: Daryn Sharp >Assignee: Haibo Chen >Priority: Critical > > The aggregated log file does not appear to be properly closed when writes > fail. This leaves a lease renewer active in the NM that spams the NN with > lease renewals. If the token is marked not to be cancelled, the renewals > appear to continue until the token expires. If the token is cancelled, the > periodic renew spam turns into a flood of failed connections until the lease > renewer gives up. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6407) Improve and fix locks of RM scheduler
[ https://issues.apache.org/jira/browse/YARN-6407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947594#comment-15947594 ] Yufei Gu commented on YARN-6407: Hi [~zhengchenyu], thanks for filing this jira. IIUC, you reduced the frequency of NM node updates to avoid flooding the network in a 5k-node cluster, but Continuous Scheduling is not necessary when there are still enough node update events in the cluster. Besides improving the locking in FS, we can always balance the time interval of continuous scheduling and the frequency of NM node updates to get better scheduling latency. > Improve and fix locks of RM scheduler > - > > Key: YARN-6407 > URL: https://issues.apache.org/jira/browse/YARN-6407 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.7.1 > Environment: CentOS 7, 1 Gigabit Ethernet >Reporter: zhengchenyu > Fix For: 2.7.1 > > Original Estimate: 2m > Remaining Estimate: 2m > > First, this issue does not duplicate YARN-3091. > In our cluster, we have 5k nodes, and the servers are configured with 1 Gigabit > Ethernet, so the network is the bottleneck in our cluster. > We must distcp data from the warehouse; because of the 1 Gigabit Ethernet, we must > set yarn.scheduler.fair.max.assign to 5, or it leads to hotspots. > Setting max.assign to 5 decreases the assignment throughput, so we started the > ContinuousSchedulingThread. 
> With more applications running in our cluster, and with the > ContinuousSchedulingThread, the problem of lock contention became more serious. > In our cluster, the call queue of the ApplicationMasterService RPC is occasionally > high. We worry that more problems will occur in the future as more applications run. > Here is our logical chain: > "1 Gigabit Ethernet" and "data hot spot" ==> "set > yarn.scheduler.fair.max.assign to 5" ==> "ContinuousSchedulingThread is > started" and "more applications" ==> "lock contention" > I know YARN-3091 addressed this problem, but that patch only changed the > object lock to a read/write lock. This change is still coarse-grained. So I > think we should lock the resources themselves rather than locking large sections of code. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6407) Improve and fix locks of RM scheduler
[ https://issues.apache.org/jira/browse/YARN-6407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947567#comment-15947567 ] Daniel Templeton commented on YARN-6407: YARN-3091 just replaced {{synchronized}} with a read/write lock. There's still a lot that can be done to improve the locking. The trick is to map the state that's being protected and figure out how not to lock unnecessarily for each operation. > Improve and fix locks of RM scheduler > - > > Key: YARN-6407 > URL: https://issues.apache.org/jira/browse/YARN-6407 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.7.1 > Environment: CentOS 7, 1 Gigabit Ethernet >Reporter: zhengchenyu > Fix For: 2.7.1 > > Original Estimate: 2m > Remaining Estimate: 2m > > First, this issue does not duplicate YARN-3091. > In our cluster, we have 5k nodes, and the servers are configured with 1 Gigabit > Ethernet, so the network is the bottleneck in our cluster. > We must distcp data from the warehouse; because of the 1 Gigabit Ethernet, we must > set yarn.scheduler.fair.max.assign to 5, or it leads to hotspots. > Setting max.assign to 5 decreases the assignment throughput, so we started the > ContinuousSchedulingThread. > With more applications running in our cluster, and with the > ContinuousSchedulingThread, the problem of lock contention became more serious. > In our cluster, the call queue of the ApplicationMasterService RPC is occasionally > high. We worry that more problems will occur in the future as more applications run. > Here is our logical chain: > "1 Gigabit Ethernet" and "data hot spot" ==> "set > yarn.scheduler.fair.max.assign to 5" ==> "ContinuousSchedulingThread is > started" and "more applications" ==> "lock contention" > I know YARN-3091 addressed this problem, but that patch only changed the > object lock to a read/write lock. This change is still coarse-grained. So I > think we should lock the resources themselves rather than locking large sections of code. 
-- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
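The YARN-3091-style change Daniel refers to, replacing a single {{synchronized}} monitor with a read/write lock, can be sketched as follows (hypothetical class, not the FairScheduler code). It lets read-mostly operations run concurrently, but it is still coarse-grained: all state shares one lock, which is the reporter's complaint.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch: read/write lock instead of a synchronized monitor. Readers
// no longer serialize behind each other; writers remain exclusive.
public class QueueState {

    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private long demand;  // stand-in for the protected scheduler state

    public long readDemand() {
        lock.readLock().lock();
        try {
            return demand;  // many readers may hold this concurrently
        } finally {
            lock.readLock().unlock();
        }
    }

    public void updateDemand(long d) {
        lock.writeLock().lock();
        try {
            demand = d;     // writers are exclusive
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

Finer-grained approaches, per-resource locks or lock striping, reduce contention further but require mapping exactly which state each operation touches, as Daniel says.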
[jira] [Reopened] (YARN-3760) Log aggregation failures
[ https://issues.apache.org/jira/browse/YARN-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daryn Sharp reopened YARN-3760: --- Line numbers are from an old release but the error is evident.
{code}
java.lang.IllegalStateException: Cannot close TFile in the middle of key-value insertion.
	at org.apache.hadoop.io.file.tfile.TFile$Writer.close(TFile.java:310)
	at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogWriter.close(AggregatedLogFormat.java:456)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.uploadLogsForContainers(AppLogAggregatorImpl.java:326)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.doAppLogAggregation(AppLogAggregatorImpl.java:429)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.run(AppLogAggregatorImpl.java:388)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService$2.run(LogAggregationService.java:387)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:722)
{code}
_AggregatedLogFormat.LogWriter_
{code}
public void close() {
  try {
    this.writer.close();
  } catch (IOException e) {
    LOG.warn("Exception closing writer", e);
  }
  IOUtils.closeStream(fsDataOStream);
}
{code}
The TFile writer's close may throw {{IllegalStateException}} if the underlying fs data stream failed. Unfortunately this close only catches IOE, so the ISE rips out w/o closing the fsdata stream. Additionally, the ctor creates the fs data stream then a TFile.Writer w/o a try/catch. If the TFile.Writer ctor throws an exception, it's impossible to close the stream. I haven't checked if there are further issues with closing the writer higher in the stack. 
> Log aggregation failures > - > > Key: YARN-3760 > URL: https://issues.apache.org/jira/browse/YARN-3760 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.4.0 >Reporter: Daryn Sharp >Priority: Critical > > The aggregated log file does not appear to be properly closed when writes > fail. This leaves a lease renewer active in the NM that spams the NN with > lease renewals. If the token is marked not to be cancelled, the renewals > appear to continue until the token expires. If the token is cancelled, the > periodic renew spam turns into a flood of failed connections until the lease > renewer gives up. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
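The leak Daryn describes comes from the unchecked exception bypassing the stream close. One leak-proof shape can be sketched like this (hypothetical stand-in classes, not the actual AggregatedLogFormat fix): close the underlying stream in a {{finally}} block, so even an {{IllegalStateException}} from the wrapped writer cannot skip it.

```java
import java.io.Closeable;
import java.io.IOException;

// Sketch: the wrapped writer's close() may throw an unchecked
// IllegalStateException, so the underlying stream must be closed in a
// finally block rather than after a catch that only handles IOException.
public class SafeLogWriter {

    private final Closeable writer;        // stands in for TFile.Writer
    private final Closeable fsDataOStream; // stands in for FSDataOutputStream

    public SafeLogWriter(Closeable writer, Closeable fsDataOStream) {
        this.writer = writer;
        this.fsDataOStream = fsDataOStream;
    }

    public void close() {
        try {
            writer.close();
        } catch (IOException | RuntimeException e) {
            // In the original code an IllegalStateException here
            // propagated out and skipped the stream close entirely.
            System.err.println("Exception closing writer: " + e);
        } finally {
            try {
                fsDataOStream.close();  // always release the lease holder
            } catch (IOException e) {
                System.err.println("Exception closing stream: " + e);
            }
        }
    }
}
```

The constructor problem Daryn also flags (a TFile.Writer ctor throwing after the stream is open) needs the same discipline at creation time: close the stream in a catch block if the wrapping writer cannot be constructed.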
[jira] [Commented] (YARN-6168) Restarted RM may not inform AM about all existing containers
[ https://issues.apache.org/jira/browse/YARN-6168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947326#comment-15947326 ] Jason Lowe commented on YARN-6168:
--

This sounds like the RM isn't waiting long enough for all the live NMs to report in before reporting the live containers to the app. Technically it would have to wait up to the full NM expiry interval before it could know for sure no more containers are going to be reported by late-heartbeating NMs, so one fix would be to hold off AM restarts of container-preserving apps after an RM restart until the NM expiry interval has passed since restart. However, I don't know if apps are willing to wait that long before their AM recovers. If not, then there is always going to be the possibility that not all live containers are reported when the AM restarts and registers, if an NM ends up heartbeating late.

> Restarted RM may not inform AM about all existing containers
> ------------------------------------------------------------
>
>                 Key: YARN-6168
>                 URL: https://issues.apache.org/jira/browse/YARN-6168
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Billie Rinaldi
>
> There appears to be a race condition when an RM is restarted. I had a
> situation where the RMs and AM were down, but NMs and app containers were
> still running. When I restarted the RM, the AM restarted, registered with the
> RM, and received its list of existing containers before the NMs had reported
> all of their containers to the RM. The AM was only told about some of the
> app's existing containers.
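The hold-off Jason describes could be gated on elapsed time since RM restart, roughly as sketched below. This is a hypothetical illustration, not RM code; the class, its fields, and the config property named in the comment are assumptions for the sketch.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical gate: only allow AM restarts of container-preserving apps
// once a full NM expiry interval has elapsed since the RM came back up.
class AmRestartGate {
    private final Instant rmStartTime;
    private final Duration nmExpiryInterval; // e.g. from the NM liveness-monitor expiry setting

    AmRestartGate(Instant rmStartTime, Duration nmExpiryInterval) {
        this.rmStartTime = rmStartTime;
        this.nmExpiryInterval = nmExpiryInterval;
    }

    /** Only after the full interval can the RM be sure no late NM will report more containers. */
    boolean safeToRestartAm(Instant now) {
        return !now.isBefore(rmStartTime.plus(nmExpiryInterval));
    }

    public static void main(String[] args) {
        Instant start = Instant.parse("2017-03-29T00:00:00Z");
        AmRestartGate gate = new AmRestartGate(start, Duration.ofMinutes(10));
        System.out.println(gate.safeToRestartAm(start.plusSeconds(60)));  // false: NMs may still report
        System.out.println(gate.safeToRestartAm(start.plusSeconds(601))); // true: interval elapsed
    }
}
```

The trade-off Jason raises is visible here: a long expiry interval means every container-preserving app waits that long for its AM after an RM restart.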
[jira] [Commented] (YARN-6168) Restarted RM may not inform AM about all existing containers
[ https://issues.apache.org/jira/browse/YARN-6168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947305#comment-15947305 ] Billie Rinaldi commented on YARN-6168:
--

Yes. In my case, the AM requested new containers immediately and got them allocated, so when it was informed later about the old containers, it just released them.
[jira] [Commented] (YARN-6403) Invalid local resource request can raise NPE and make NM exit
[ https://issues.apache.org/jira/browse/YARN-6403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947283#comment-15947283 ] Jason Lowe commented on YARN-6403:
--

Sorry, I completely missed the server-side change in ContainerImpl. I'm not sure that's the correct place to make the server-side change, because it's happening so late in the container lifecycle. It would be better if we simply failed the container launch request _immediately_ rather than wait until the container transitions all the way to the localizing state. That way the client gets immediate feedback that their request was malformed, rather than wondering why their container launch mysteriously failed sometime later. I think it's more appropriate to have ContainerManagerImpl#startContainerInternal sanity check the request (which it already does to some degree, just not for the local resources) and throw a YarnException if the request is malformed. That way the client will receive a failed container start response to their start request, so they will immediately know their request was bad. It would be good to add unit tests for both the client and server changes.

> Invalid local resource request can raise NPE and make NM exit
> -------------------------------------------------------------
>
>                 Key: YARN-6403
>                 URL: https://issues.apache.org/jira/browse/YARN-6403
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.8.0
>            Reporter: Tao Yang
>         Attachments: YARN-6403.001.patch
>
> Recently we found this problem in our testing environment.
> The app that caused this problem added an invalid local resource request (with
> no location) into the ContainerLaunchContext like this:
> {code}
> localResources.put("test", LocalResource.newInstance(location,
>     LocalResourceType.FILE, LocalResourceVisibility.PRIVATE, 100,
>     System.currentTimeMillis()));
> ContainerLaunchContext amContainer =
>     ContainerLaunchContext.newInstance(localResources, environment,
>         vargsFinal, null, securityTokens, acls);
> {code}
> The actual value of location was null, although the app didn't expect that. This
> mistake caused several NMs to exit with the NPE below; they couldn't restart until
> the NM recovery dirs were deleted.
> {code}
> FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread
> java.lang.NullPointerException
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourceRequest.<init>(LocalResourceRequest.java:46)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl$RequestResourcesTransition.transition(ContainerImpl.java:711)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl$RequestResourcesTransition.transition(ContainerImpl.java:660)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:1320)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:88)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1293)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1286)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
>         at java.lang.Thread.run(Thread.java:745)
> {code}
> The NPE occurred when creating a LocalResourceRequest instance for the invalid
> resource request:
> {code}
>   public LocalResourceRequest(LocalResource resource)
>       throws URISyntaxException {
>     this(resource.getResource().toPath(), // NPE occurred here
>         resource.getTimestamp(),
>         resource.getType(),
>         resource.getVisibility(),
>         resource.getPattern());
>   }
> {code}
> We can't guarantee the validity of local resource requests, but we could
> avoid damaging the cluster. Perhaps we can verify the resource both in
> ContainerLaunchContext and LocalResourceRequest? Please feel free to give
> your suggestions.
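The early sanity check Jason suggests could look roughly like the sketch below: reject a start-container request with a null resource location before it ever reaches the container state machine. LocalResourceStub and BadRequestException are hypothetical stand-ins for LocalResource and YarnException; the real change would live in ContainerManagerImpl#startContainerInternal.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for LocalResource: only the field relevant to this bug.
class LocalResourceStub {
    final String location; // null models the bad request from this issue
    LocalResourceStub(String location) { this.location = location; }
}

// Stand-in for YarnException: a checked failure returned to the client.
class BadRequestException extends Exception {
    BadRequestException(String msg) { super(msg); }
}

class StartContainerSketch {
    /** Fail the start request immediately instead of letting an NPE kill the NM dispatcher. */
    static void validateLocalResources(Map<String, LocalResourceStub> resources)
            throws BadRequestException {
        for (Map.Entry<String, LocalResourceStub> e : resources.entrySet()) {
            if (e.getValue().location == null) {
                throw new BadRequestException(
                    "Null resource location for local resource " + e.getKey());
            }
        }
    }

    public static void main(String[] args) {
        Map<String, LocalResourceStub> res = new HashMap<>();
        res.put("test", new LocalResourceStub(null)); // the request from the report
        try {
            validateLocalResources(res);
            System.out.println("accepted");
        } catch (BadRequestException ex) {
            // Client gets a failed start-container response with the reason,
            // rather than a mysterious container failure later.
            System.out.println("rejected: " + ex.getMessage());
        }
    }
}
```

With a check like this, the malformed request from the report above never creates a LocalResourceRequest, so the dispatcher-thread NPE cannot occur.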