[jira] [Commented] (YARN-4415) Scheduler Web Ui shows max capacity for the queue is 100% but when we submit application doesnt get assigned
[ https://issues.apache.org/jira/browse/YARN-4415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044498#comment-15044498 ] Xianyin Xin commented on YARN-4415: --- Sorry for the late reply, [~Naganarasimha]. I'm not sure I understand correctly, so please correct me if I'm wrong. There are two cases: 1) we have set the accessible labels for a queue in xml, and 2) we didn't set the accessible labels for a queue. For case 1), the accessible labels and the configured capacities (0 capacity and 100 max capacity by default) are imported; for case 2), the accessible labels of the queue are inherited from its parent, but the capacities for those labels are 0, since {{setupConfigurableCapacities()}} only considers the access-labels configured in xml.
{code}
this.accessibleLabels =
    csContext.getConfiguration().getAccessibleNodeLabels(getQueuePath());
this.defaultLabelExpression = csContext.getConfiguration()
    .getDefaultNodeLabelExpression(getQueuePath());

// inherit from parent if labels not set
if (this.accessibleLabels == null && parent != null) {
  this.accessibleLabels = parent.getAccessibleNodeLabels();
}

// inherit from parent if labels not set
if (this.defaultLabelExpression == null && parent != null
    && this.accessibleLabels.containsAll(parent.getAccessibleNodeLabels())) {
  this.defaultLabelExpression = parent.getDefaultNodeLabelExpression();
}

// After we setup labels, we can setup capacities
setupConfigurableCapacities();
{code}
This can cause confusion, because the accessible labels inherited from the parent have 0 max capacity. If that analysis is right, I agree that inherited accessible labels should get 100 max capacity by default. For the two scenarios in the description, though, I feel the final result is reasonable: the accessible labels were not set for the queue and its parent does not have them either, so the label is not explicitly accessible by the queue. What is wrong, if the above is correct, is the information shown by the web UI. I think the cause is the following statement in {{QueueCapacitiesInfo.java}},
{code}
if (maxCapacity < CapacitySchedulerQueueInfo.EPSILON || maxCapacity > 1f)
  maxCapacity = 1f;
{code}
which sets {{maxCapacity}} to 1 for the case {{maxCapacity == 0}}, and that is exactly case 2) above (a small worked illustration follows below). cc [~leftnoteasy].
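To make the case-2) effect concrete, here is a minimal, self-contained sketch (not the actual {{QueueCapacitiesInfo}} code; the value of {{EPSILON}} is assumed purely for illustration) that reproduces the clamp quoted above:
{code}
// Illustrative sketch only: mimics the clamp quoted from QueueCapacitiesInfo.java.
public class MaxCapacityClampDemo {
  // Assumed value; only its role as a "treat as zero" threshold matters here.
  private static final float EPSILON = 1e-8f;

  // Mirrors the quoted logic: values near 0 (or above 1) are displayed as 1, i.e. 100%.
  static float displayedMaxCapacity(float maxCapacity) {
    if (maxCapacity < EPSILON || maxCapacity > 1f) {
      maxCapacity = 1f;
    }
    return maxCapacity;
  }

  public static void main(String[] args) {
    // Case 2) above: the label is inherited from the parent, no capacity configured, so 0.
    float inheritedLabelMaxCapacity = 0f;
    // Prints 1.0: the web UI shows 100% although the effective max capacity is 0.
    System.out.println(displayedMaxCapacity(inheritedLabelMaxCapacity));
  }
}
{code}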
> Scheduler Web Ui shows max capacity for the queue is 100% but when we submit > application doesnt get assigned > > > Key: YARN-4415 > URL: https://issues.apache.org/jira/browse/YARN-4415 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler, resourcemanager >Affects Versions: 2.7.2 >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R > Attachments: App info with diagnostics info.png, screenshot-1.png > > > Steps to reproduce the issue : > Scenario 1: > # Configure a queue(default) with accessible node labels as * > # create a exclusive partition *xxx* and map a NM to it > # ensure no capacities are configured for default for label xxx > # start an RM app with queue as default and label as xxx > # application is stuck but scheduler ui shows 100% as max capacity for that > queue > Scenario 2: > # create a nonexclusive partition *sharedPartition* and map a NM to it > # ensure no capacities are configured for default queue > # start an RM app with queue as *default* and label as *sharedPartition* > # application is stuck but scheduler ui shows 100% as max capacity for that > queue for *sharedPartition* > For both issues cause is the same default max capacity and abs max capacity > is set to Zero % -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4309) Add debug information to application logs when a container fails
[ https://issues.apache.org/jira/browse/YARN-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044406#comment-15044406 ] Hadoop QA commented on YARN-4309: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 34s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 52s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 9s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 28s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 28s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 38s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 29s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 21s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 35s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 21s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 44s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 44s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 3s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 3s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 26s {color} | {color:red} Patch generated 4 new checkstyle issues in hadoop-yarn-project/hadoop-yarn (total was 358, now 359). {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 29s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 37s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s {color} | {color:green} The patch has no ill-formed XML file. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 50s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 20s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 35s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 22s {color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 51s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 32s {color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 24s {color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 6s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 7s {color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with JDK v1.7.0_91. {color} | | {colo
[jira] [Commented] (YARN-4309) Add debug information to application logs when a container fails
[ https://issues.apache.org/jira/browse/YARN-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044404#comment-15044404 ] Wangda Tan commented on YARN-4309: -- [~vvasudev], Thanks for the reply, makes sense to me. A few comments:
- Could you make sure the container process will still be launched even if the copy-script or list-folder command fails?
- Could you add an echo command (something like echo "Printing container launch debug info...") to container_launch.sh? It would go after the following "if" (a rough sketch of what this might look like is included after this comment):
{code}
if (getConf() != null && getConf().getBoolean(
    YarnConfiguration.NM_LOG_CONTAINER_DEBUG_INFO,
    YarnConfiguration.DEFAULT_NM_LOG_CONTAINER_DEBUG_INFO)) {
{code}
- Add a test to verify that the log aggregation result contains such debugging output?
- Could you upload a sample container_launch.sh for easier review?
> Add debug information to application logs when a container fails > > > Key: YARN-4309 > URL: https://issues.apache.org/jira/browse/YARN-4309 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: YARN-4309.001.patch, YARN-4309.002.patch, > YARN-4309.003.patch, YARN-4309.004.patch, YARN-4309.005.patch, > YARN-4309.006.patch, YARN-4309.007.patch > > > Sometimes when a container fails, it can be pretty hard to figure out why it > failed. > My proposal is that if a container fails, we collect information about the > container local dir and dump it into the container log dir. Ideally, I'd like > to tar up the directory entirely, but I'm not sure of the security and space > implications of such a approach. At the very least, we can list all the files > in the container local dir, and dump the contents of launch_container.sh(into > the container log dir). > When log aggregation occurs, all this information will automatically get > collected and make debugging such failures much easier. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
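The following is only a rough, self-contained sketch of the echo idea requested above; it does not use the NodeManager's real script-generation API, it just shows where such an announcement line would sit in a generated launch script when the debug flag is on:
{code}
// Hypothetical illustration only, not NodeManager code.
public class LaunchScriptDebugSketch {

  static String buildLaunchScript(boolean debugInfoEnabled) {
    StringBuilder script = new StringBuilder("#!/bin/bash\n");
    if (debugInfoEnabled) {
      // The echo requested in the review comment: make the debug step visible in the logs.
      script.append("echo \"Printing container launch debug info...\"\n");
      // The copy-script / list-directory commands added by the patch would be appended here.
    }
    script.append("exec /bin/true\n"); // placeholder for the real container command
    return script.toString();
  }

  public static void main(String[] args) {
    System.out.print(buildLaunchScript(true));
  }
}
{code}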
[jira] [Commented] (YARN-3542) Re-factor support for CPU as a resource using the new ResourceHandler mechanism
[ https://issues.apache.org/jira/browse/YARN-3542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044361#comment-15044361 ] Varun Vasudev commented on YARN-3542: - All the existing configurations will continue to work as is. The patch adds a new configuration - {code} yarn.nodemanager.resource.cpu.enabled {code} which if set to true will create the cpu handler as part of the resource handler chain. None of the other configurations change. If both yarn.nodemanager.resource.cpu.enabled and yarn.nodemanager.linux-container-executor.resources-handler.class are set, you end up in a situation where both objects end up modifying the same file. > Re-factor support for CPU as a resource using the new ResourceHandler > mechanism > --- > > Key: YARN-3542 > URL: https://issues.apache.org/jira/browse/YARN-3542 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Sidharta Seethana >Assignee: Sidharta Seethana >Priority: Critical > Attachments: YARN-3542.001.patch, YARN-3542.002.patch > > > In YARN-3443 , a new ResourceHandler mechanism was added which enabled easier > addition of new resource types in the nodemanager (this was used for network > as a resource - See YARN-2140 ). We should refactor the existing CPU > implementation ( LinuxContainerExecutor/CgroupsLCEResourcesHandler ) using > the new ResourceHandler mechanism. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
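As a usage note for the flag discussed above, here is a minimal sketch of setting it programmatically; it uses the raw property strings from the comment rather than any {{YarnConfiguration}} constants, since those constant names are not given here:
{code}
import org.apache.hadoop.conf.Configuration;

public class CpuResourceHandlerConfigSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();

    // Enable the new CPU resource handler in the resource handler chain.
    conf.setBoolean("yarn.nodemanager.resource.cpu.enabled", true);

    // Per the comment above, do not also set
    // yarn.nodemanager.linux-container-executor.resources-handler.class,
    // otherwise two objects end up modifying the same cgroups files.
    System.out.println(conf.getBoolean("yarn.nodemanager.resource.cpu.enabled", false));
  }
}
{code}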
[jira] [Updated] (YARN-4309) Add debug information to application logs when a container fails
[ https://issues.apache.org/jira/browse/YARN-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-4309: Attachment: YARN-4309.007.patch Uploaded a new patch with clarifications on following symlinks in the comments and yarn-default.xml . > Add debug information to application logs when a container fails > > > Key: YARN-4309 > URL: https://issues.apache.org/jira/browse/YARN-4309 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: YARN-4309.001.patch, YARN-4309.002.patch, > YARN-4309.003.patch, YARN-4309.004.patch, YARN-4309.005.patch, > YARN-4309.006.patch, YARN-4309.007.patch > > > Sometimes when a container fails, it can be pretty hard to figure out why it > failed. > My proposal is that if a container fails, we collect information about the > container local dir and dump it into the container log dir. Ideally, I'd like > to tar up the directory entirely, but I'm not sure of the security and space > implications of such a approach. At the very least, we can list all the files > in the container local dir, and dump the contents of launch_container.sh(into > the container log dir). > When log aggregation occurs, all this information will automatically get > collected and make debugging such failures much easier. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2885) Create AMRMProxy request interceptor for distributed scheduling decisions for queueable containers
[ https://issues.apache.org/jira/browse/YARN-2885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun Suresh updated YARN-2885: -- Attachment: YARN-2885-yarn-2877.full-2.patch Updating the patch:
# The earlier patch had changed the containerId generation scheme so that containers generated by the RM would get even numbers and those generated by the NM would get odd numbers. Unfortunately, that requires too many test case changes. The latest patch uses a new scheme where containerIds generated by the RM remain as is, but those generated by the NM are negative (decremented by 1 for each new container); a small illustrative sketch follows below.
# Added more test cases to the LocalScheduler
> Create AMRMProxy request interceptor for distributed scheduling decisions for > queueable containers > -- > > Key: YARN-2885 > URL: https://issues.apache.org/jira/browse/YARN-2885 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Konstantinos Karanasos >Assignee: Arun Suresh > Attachments: YARN-2885-yarn-2877.001.patch, > YARN-2885-yarn-2877.002.patch, YARN-2885-yarn-2877.full-2.patch, > YARN-2885-yarn-2877.full.patch, YARN-2885_api_changes.patch > > > We propose to add a Local ResourceManager (LocalRM) to the NM in order to > support distributed scheduling decisions. > Architecturally we leverage the RMProxy, introduced in YARN-2884. > The LocalRM makes distributed decisions for queuable containers requests. > Guaranteed-start requests are still handled by the central RM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
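Purely to illustrate the id scheme described in point 1 (this is not code from the patch; the class and method names are made up), NM-assigned ids could be drawn from a counter that only ever goes negative, so they can never collide with RM-assigned ids:
{code}
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: RM-generated containerIds stay positive and unchanged,
// while NM/LocalScheduler-generated ids are negative, decremented by 1 each time.
public class NmContainerIdSketch {
  private final AtomicLong nmAssignedIds = new AtomicLong(0);

  long nextNmAssignedContainerId() {
    return nmAssignedIds.decrementAndGet(); // -1, -2, -3, ...
  }

  public static void main(String[] args) {
    NmContainerIdSketch sketch = new NmContainerIdSketch();
    System.out.println(sketch.nextNmAssignedContainerId()); // -1
    System.out.println(sketch.nextNmAssignedContainerId()); // -2
  }
}
{code}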
[jira] [Created] (YARN-4426) unhealthy disk makes NM LOST
sandflee created YARN-4426: -- Summary: unhealthy disk makes NM LOST Key: YARN-4426 URL: https://issues.apache.org/jira/browse/YARN-4426 Project: Hadoop YARN Issue Type: Bug Reporter: sandflee
The NM hangs because mkdir hangs in the DiskHealthMonitor-Timer thread, and the NodeStatusUpdater cannot acquire the sync lock in getNodeStatus (the monitor holds the DirectoryCollection lock while the mkdir is stuck; a minimal sketch of this blocking pattern is included after the stack traces).
"DiskHealthMonitor-Timer" daemon prio=10 tid=0x7f4b3d867000 nid=0x50c8 runnable [0x7f4b27ef9000]
java.lang.Thread.State: RUNNABLE
at java.io.UnixFileSystem.createDirectory(Native Method)
at java.io.File.mkdir(File.java:1310)
at org.apache.hadoop.util.DiskChecker.mkdirsWithExistsCheck(DiskChecker.java:67)
at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:90)
at org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection.verifyDirUsingMkdir(DirectoryCollection.java:338)
at org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection.testDirs(DirectoryCollection.java:310)
at org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection.checkDirs(DirectoryCollection.java:230)
- locked <0xf8970408> (a org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection)
at org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.checkDirs(LocalDirsHandlerService.java:361)
at org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.access$400(LocalDirsHandlerService.java:51)
at org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService$MonitoringTimerTask.run(LocalDirsHandlerService.java:123)
at java.util.TimerThread.mainLoop(Timer.java:555)
at java.util.TimerThread.run(Timer.java:505)

"Node Status Updater" prio=10 tid=0x7f4b3cd6d800 nid=0x4af5 waiting for monitor entry [0x7f4b1c141000]
java.lang.Thread.State: BLOCKED (on object monitor)
at org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection.getFailedDirs(DirectoryCollection.java:170)
- waiting to lock <0xf8970408> (a org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection)
at org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.getDisksHealthReport(LocalDirsHandlerService.java:259)
at org.apache.hadoop.yarn.server.nodemanager.NodeHealthCheckerService.getHealthReport(NodeHealthCheckerService.java:58)
at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.getNodeStatus(NodeStatusUpdaterImpl.java:365)
at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.access$100(NodeStatusUpdaterImpl.java:77)
at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl$1.run(NodeStatusUpdaterImpl.java:588)
at java.lang.Thread.run(Thread.java:745)

"AsyncDispatcher event handler" prio=10 tid=0x7f4b3da24000 nid=0x50d9 waiting for monitor entry [0x7f4b245b6000]
java.lang.Thread.State: BLOCKED (on object monitor)
at org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection.getGoodDirs(DirectoryCollection.java:163)
- waiting to lock <0xf8970408> (a org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection)
at org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.getLocalDirsForCleanup(LocalDirsHandlerService.java:229)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.handleCleanupContainerResources(ResourceLocalizationService.java:497)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.handle(ResourceLocalizationService.java:395)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.handle(ResourceLocalizationService.java:134)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:191)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:124)
at java.lang.Thread.run(Thread.java:745)
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
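To make the hang mechanism above easier to see, here is a small, self-contained sketch of the same pattern (plain Java, deliberately not using DirectoryCollection or any NodeManager classes): a monitor thread holds an object's intrinsic lock while doing slow disk work, so the heartbeat thread blocks on a synchronized getter of the same object.
{code}
// Self-contained illustration of the blocking pattern in the stack traces above.
public class DiskCheckBlocksHeartbeatDemo {

  static class DirState {
    synchronized void checkDirs() throws InterruptedException {
      Thread.sleep(10_000); // stands in for the mkdir that hangs on an unhealthy disk
    }

    synchronized String getHealthReport() {
      return "OK"; // cannot run while checkDirs() holds this object's lock
    }
  }

  public static void main(String[] args) {
    final DirState state = new DirState();

    Thread monitor = new Thread(() -> {
      try {
        state.checkDirs();
      } catch (InterruptedException ignored) {
      }
    }, "DiskHealthMonitor-Timer");

    Thread heartbeat = new Thread(
        () -> System.out.println(state.getHealthReport()), "Node Status Updater");

    monitor.start();
    heartbeat.start(); // stays BLOCKED until the slow disk check finishes
  }
}
{code}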
[jira] [Commented] (YARN-4416) Deadlock due to synchronised get Methods in AbstractCSQueue
[ https://issues.apache.org/jira/browse/YARN-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044316#comment-15044316 ] Naganarasimha G R commented on YARN-4416: - [~sunilg], Hmm, true, but is there any other way to avoid synchronized locks for the get APIs? I feel that is really not good: the web UI, CLI and REST all access the Queue to get information, and if there is a problem elsewhere the main scheduler thread can get stuck. We can also get unexpected deadlocks for read calls, like the one in the attached stack trace. Could read/write locks in the leaf queue be an option?
> Deadlock due to synchronised get Methods in AbstractCSQueue > --- > > Key: YARN-4416 > URL: https://issues.apache.org/jira/browse/YARN-4416 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler, resourcemanager >Affects Versions: 2.7.1 >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Minor > Attachments: YARN-4416.v1.001.patch, YARN-4416.v1.002.patch, > deadlock.log > > > While debugging in eclipse came across a scenario where in i had to get to > know the name of the queue but every time i tried to see the queue it was > getting hung. On seeing the stack realized there was a deadlock but on > analysis found out that it was only due to *queue.toString()* during > debugging as {{AbstractCSQueue.getAbsoluteUsedCapacity}} was synchronized. > Still i feel {{AbstractCSQueue}}'s getter methods need not be synchronized > and better be handled through read and write locks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4304) AM max resource configuration per partition to be displayed/updated correctly in UI and in various partition related metrics
[ https://issues.apache.org/jira/browse/YARN-4304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044310#comment-15044310 ] Sunil G commented on YARN-4304: --- Thank you [~Naganarasimha] for helping verify the patch. Yes, I will be handling Wangda's suggestion in another ticket and have provided a patch there. Once that's resolved, we will remove the LeafQueue dependency here and depend only on ResourceUsage, as you suggested. Thank you.
> AM max resource configuration per partition to be displayed/updated correctly > in UI and in various partition related metrics > > > Key: YARN-4304 > URL: https://issues.apache.org/jira/browse/YARN-4304 > Project: Hadoop YARN > Issue Type: Sub-task > Components: webapp >Affects Versions: 2.7.1 >Reporter: Sunil G >Assignee: Sunil G > Attachments: 0001-YARN-4304.patch, 0002-YARN-4304.patch, > 0003-YARN-4304.patch, 0004-YARN-4304.patch, REST_and_UI.zip > > > As we are supporting per-partition level max AM resource percentage > configuration, UI and various metrics also need to display correct > configurations related to same. > For eg: Current UI still shows am-resource percentage per queue level. This > is to be updated correctly when label config is used. > - Display max-am-percentage per-partition in Scheduler UI (label also) and in > ClusterMetrics page > - Update queue/partition related metrics w.r.t per-partition > am-resource-percentage -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4416) Deadlock due to synchronised get Methods in AbstractCSQueue
[ https://issues.apache.org/jira/browse/YARN-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044307#comment-15044307 ] Sunil G commented on YARN-4416: --- I also agree that we need to make the ordering policy independent. But a fail-fast iterator will also be a problem, as we have an open loophole to change some contents of a SchedulableEntity. A discussion along the same lines took place with Jian while doing application priority, and we dropped the plan to have locks inside the ordering policy due to the tight coupling with LeafQueue. Looping in [~jianhe] to the thread as well.
> Deadlock due to synchronised get Methods in AbstractCSQueue > --- > > Key: YARN-4416 > URL: https://issues.apache.org/jira/browse/YARN-4416 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler, resourcemanager >Affects Versions: 2.7.1 >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Minor > Attachments: YARN-4416.v1.001.patch, YARN-4416.v1.002.patch, > deadlock.log > > > While debugging in eclipse came across a scenario where in i had to get to > know the name of the queue but every time i tried to see the queue it was > getting hung. On seeing the stack realized there was a deadlock but on > analysis found out that it was only due to *queue.toString()* during > debugging as {{AbstractCSQueue.getAbsoluteUsedCapacity}} was synchronized. > Still i feel {{AbstractCSQueue}}'s getter methods need not be synchronized > and better be handled through read and write locks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4381) Add container launchEvent and container localizeFailed metrics in container
[ https://issues.apache.org/jira/browse/YARN-4381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044261#comment-15044261 ] Lin Yiqun commented on YARN-4381: - [~djp], do you have some time to review my patch, or is there anything else I can do for this jira?
> Add container launchEvent and container localizeFailed metrics in container > --- > > Key: YARN-4381 > URL: https://issues.apache.org/jira/browse/YARN-4381 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 2.7.1 >Reporter: Lin Yiqun >Assignee: Lin Yiqun > Attachments: YARN-4381.001.patch > > >
> Recently, I found an issue with the nodemanager metrics:
> {{NodeManagerMetrics#containersLaunched}} does not actually count the number of
> successfully launched containers. In some cases the launch fails, for example when
> a kill command is received or container localization fails, which leads to a failed
> container. But currently this counter is incremented in the code below whether the
> container starts successfully or fails.
> {code}
> Credentials credentials = parseCredentials(launchContext);
> Container container =
>     new ContainerImpl(getConfig(), this.dispatcher, context.getNMStateStore(),
>         launchContext, credentials, metrics, containerTokenIdentifier);
> ApplicationId applicationID =
>     containerId.getApplicationAttemptId().getApplicationId();
> if (context.getContainers().putIfAbsent(containerId, container) != null) {
>   NMAuditLogger.logFailure(user, AuditConstants.START_CONTAINER,
>       "ContainerManagerImpl", "Container already running on this node!",
>       applicationID, containerId);
>   throw RPCUtil.getRemoteException("Container " + containerIdStr
>       + " already is running on this node!!");
> }
> this.readLock.lock();
> try {
>   if (!serviceStopped) {
>     // Create the application
>     Application application =
>         new ApplicationImpl(dispatcher, user, applicationID, credentials, context);
>     if (null == context.getApplications().putIfAbsent(applicationID, application)) {
>       LOG.info("Creating a new application reference for app " + applicationID);
>       LogAggregationContext logAggregationContext =
>           containerTokenIdentifier.getLogAggregationContext();
>       Map appAcls = container.getLaunchContext().getApplicationACLs();
>       context.getNMStateStore().storeApplication(applicationID,
>           buildAppProto(applicationID, user, credentials, appAcls,
>               logAggregationContext));
>       dispatcher.getEventHandler().handle(
>           new ApplicationInitEvent(applicationID, appAcls, logAggregationContext));
>     }
>     this.context.getNMStateStore().storeContainer(containerId, request);
>     dispatcher.getEventHandler().handle(
>         new ApplicationContainerInitEvent(container));
>     this.context.getContainerTokenSecretManager().startContainerSuccessful(
>         containerTokenIdentifier);
>     NMAuditLogger.logSuccess(user, AuditConstants.START_CONTAINER,
>         "ContainerManageImpl", applicationID, containerId);
>     // TODO launchedContainer misplaced -> doesn't necessarily mean a container
>     // launch. A finished Application will not launch containers.
>     metrics.launchedContainer();
>     metrics.allocateContainer(containerTokenIdentifier.getResource());
>   } else {
>     throw new YarnException(
>         "Container start failed as the NodeManager is " +
>         "in the process of shutting down");
>   }
> {code}
> In addition, we lack a localizationFailed metric for containers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
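As a purely illustrative sketch of the counting behaviour the description argues for (this is not the real NodeManagerMetrics API and not the attached patch), the idea is to increment a launched counter only on successful launches and to track localization failures separately:
{code}
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch only; the real implementation would live in NodeManagerMetrics.
public class ContainerMetricsSketch {
  private final AtomicLong containersLaunched = new AtomicLong();
  private final AtomicLong containersLocalizationFailed = new AtomicLong();

  void onContainerLaunched() {
    containersLaunched.incrementAndGet(); // count only real, successful launches
  }

  void onLocalizationFailed() {
    containersLocalizationFailed.incrementAndGet(); // the new metric proposed above
  }

  public static void main(String[] args) {
    ContainerMetricsSketch metrics = new ContainerMetricsSketch();
    metrics.onContainerLaunched();
    metrics.onLocalizationFailed();
    System.out.println(metrics.containersLaunched.get() + " launched, "
        + metrics.containersLocalizationFailed.get() + " localization failures");
  }
}
{code}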
[jira] [Commented] (YARN-4348) ZKRMStateStore.syncInternal shouldn't wait for sync completion for avoiding blocking ZK's event thread
[ https://issues.apache.org/jira/browse/YARN-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044258#comment-15044258 ] Tsuyoshi Ozawa commented on YARN-4348: -- [~jianhe] could you take a look? > ZKRMStateStore.syncInternal shouldn't wait for sync completion for avoiding > blocking ZK's event thread > -- > > Key: YARN-4348 > URL: https://issues.apache.org/jira/browse/YARN-4348 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.2, 2.6.2 >Reporter: Tsuyoshi Ozawa >Assignee: Tsuyoshi Ozawa >Priority: Blocker > Attachments: YARN-4348-branch-2.7.002.patch, > YARN-4348-branch-2.7.003.patch, YARN-4348-branch-2.7.004.patch, > YARN-4348.001.patch, YARN-4348.001.patch, log.txt > > > Jian mentioned that the current internal ZK configuration of ZKRMStateStore > can cause a following situation: > 1. syncInternal timeouts, > 2. but sync succeeded later on. > We should use zkResyncWaitTime as the timeout value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4424) YARN CLI command hangs
[ https://issues.apache.org/jira/browse/YARN-4424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044196#comment-15044196 ] Hadoop QA commented on YARN-4424: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 5s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 14s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 38s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 16s {color} | {color:green} trunk passed {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 23s {color} | {color:red} hadoop-yarn-server-resourcemanager in trunk failed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 36s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 28s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 31s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 13s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 37s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 16s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 22s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 23s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 48s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 65m 35s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_85. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 27s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 149m 13s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_66 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | | JDK v1.7.0_85 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.security.TestRMDelegationTokens | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | \\ \\ || Subsystem || Report/Notes
[jira] [Updated] (YARN-4424) YARN CLI command hangs
[ https://issues.apache.org/jira/browse/YARN-4424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-4424: -- Attachment: YARN-4424.1.patch The patch removes the read lock in RMAppImpl#getFinalApplicationStatus as I think that's not required. > YARN CLI command hangs > -- > > Key: YARN-4424 > URL: https://issues.apache.org/jira/browse/YARN-4424 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Yesha Vora >Assignee: Jian He >Priority: Blocker > Attachments: YARN-4424.1.patch > > > {code} > yarn@XXX:/mnt/hadoopqe$ /usr/hdp/current/hadoop-yarn-client/bin/yarn > application -list -appStates NEW,NEW_SAVING,SUBMITTED,ACCEPTED,RUNNING > 15/12/04 21:59:54 INFO impl.TimelineClientImpl: Timeline service address: > http://XXX:8188/ws/v1/timeline/ > 15/12/04 21:59:54 INFO client.RMProxy: Connecting to ResourceManager at > XXX/0.0.0.0:8050 > 15/12/04 21:59:55 INFO client.AHSProxy: Connecting to Application History > server at XXX/0.0.0.0:10200 > {code} > {code:title=RM log} > 2015-12-04 21:59:19,744 INFO event.AsyncDispatcher > (AsyncDispatcher.java:handle(243)) - Size of event-queue is 237000 > 2015-12-04 22:00:50,945 INFO event.AsyncDispatcher > (AsyncDispatcher.java:handle(243)) - Size of event-queue is 238000 > 2015-12-04 22:02:22,416 INFO event.AsyncDispatcher > (AsyncDispatcher.java:handle(243)) - Size of event-queue is 239000 > 2015-12-04 22:03:53,593 INFO event.AsyncDispatcher > (AsyncDispatcher.java:handle(243)) - Size of event-queue is 24 > 2015-12-04 22:05:24,856 INFO event.AsyncDispatcher > (AsyncDispatcher.java:handle(243)) - Size of event-queue is 241000 > 2015-12-04 22:06:56,235 INFO event.AsyncDispatcher > (AsyncDispatcher.java:handle(243)) - Size of event-queue is 242000 > 2015-12-04 22:08:27,510 INFO event.AsyncDispatcher > (AsyncDispatcher.java:handle(243)) - Size of event-queue is 243000 > 2015-12-04 22:09:58,786 INFO event.AsyncDispatcher > (AsyncDispatcher.java:handle(243)) - Size of event-queue is 244000 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4424) YARN CLI command hangs
[ https://issues.apache.org/jira/browse/YARN-4424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044141#comment-15044141 ] Jian He commented on YARN-4424: --- This is a similar problem to YARN-2594.
Thread 1
{code}
Thread 53785: (state = BLOCKED)
- sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise)
- java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14, line=186 (Interpreted frame)
- java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt() @bci=1, line=834 (Interpreted frame)
- java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(int) @bci=83, line=964 (Interpreted frame)
- java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(int) @bci=10, line=1282 (Interpreted frame)
- java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock() @bci=5, line=731 (Interpreted frame)
- org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.getFinalApplicationStatus() @bci=4, line=478 (Interpreted frame)
- org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher.appAttemptFinished(org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttempt, org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptState, org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMApp, long) @bci=45, line=162 (Interpreted frame)
- org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$BaseFinalTransition.transition(org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl, org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptEvent) @bci=288, line=1300 (Interpreted frame)
- org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$FinalTransition.transition(org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl, org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptEvent) @bci=9, line=1493 (Interpreted frame)
- org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$FinalTransition.transition(java.lang.Object, java.lang.Object) @bci=9, line=1480 (Interpreted frame)
- org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$FinalStateSavedTransition.transition(org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl, org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptEvent) @bci=24, line=1213 (Interpreted frame)
- org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$FinalStateSavedTransition.transition(java.lang.Object, java.lang.Object) @bci=9, line=1205 (Interpreted frame)
- org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(java.lang.Object, java.lang.Enum, java.lang.Object, java.lang.Enum) @bci=6, line=385 (Interpreted frame)
- org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(java.lang.Object, java.lang.Enum, java.lang.Enum, java.lang.Object) @bci=45, line=302 (Interpreted frame)
- org.apache.hadoop.yarn.state.StateMachineFactory.access$300(org.apache.hadoop.yarn.state.StateMachineFactory, java.lang.Object, java.lang.Enum, java.lang.Enum, java.lang.Object) @bci=6, line=46 (Interpreted frame)
- org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(java.lang.Enum, java.lang.Object) @bci=15, line=448 (Interpreted frame)
- org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptEvent) @bci=65, line=784 (Interpreted frame)
- org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(org.apache.hadoop.yarn.event.Event) @bci=5, line=106 (Interpreted frame)
- org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptEvent) @bci=53, line=815 (Interpreted frame)
- org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(org.apache.hadoop.yarn.event.Event) @bci=5, line=796 (Interpreted frame)
- org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(org.apache.hadoop.yarn.event.Event) @bci=88, line=183 (Interpreted frame)
- org.apache.hadoop.yarn.event.AsyncDispatcher$1.run() @bci=140, line=109 (Interpreted frame)
{code}
Thread 2
{code}
Thread 25723: (state = BLOCKED)
- sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise)
- java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14, line=186 (Interpreted frame)
- java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt() @bci=1, line=834 (Interpreted frame)
- java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(int) @bci=83, line
[jira] [Commented] (YARN-3367) Replace starting a separate thread for post entity with event loop in TimelineClient
[ https://issues.apache.org/jira/browse/YARN-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044081#comment-15044081 ] Hadoop QA commented on YARN-3367: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 59s {color} | {color:green} feature-YARN-2928 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 16s {color} | {color:green} feature-YARN-2928 passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 33s {color} | {color:green} feature-YARN-2928 passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 34s {color} | {color:green} feature-YARN-2928 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 5s {color} | {color:green} feature-YARN-2928 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 30s {color} | {color:green} feature-YARN-2928 passed {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 24s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common in feature-YARN-2928 has 3 extant Findbugs warnings. {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 31s {color} | {color:red} hadoop-yarn-common in feature-YARN-2928 failed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 59s {color} | {color:green} feature-YARN-2928 passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 0s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 11s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 11s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 30s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 30s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 31s {color} | {color:red} Patch generated 6 new checkstyle issues in hadoop-yarn-project/hadoop-yarn (total was 52, now 58). 
{color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 5s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 27s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 37s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 29s {color} | {color:red} hadoop-yarn-common in the patch failed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 58s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 59s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 8m 5s {color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 2m 13s {color} | {color:red} hadoop-yarn-common in the patch failed with JDK v1.7.0_85. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 8m 31s {color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed with JDK v1.7.0_85. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 23s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:blac
[jira] [Updated] (YARN-3367) Replace starting a separate thread for post entity with event loop in TimelineClient
[ https://issues.apache.org/jira/browse/YARN-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-3367: Attachment: YARN-3367-feature-YARN-2928.v1.002.patch Thanks for the clarification, [~djp]. So IIUC from your reply, can I take your answer to my query ??Is it required to ensure all the async events are also pushed along with the current sync event?? as yes? Also, can you take a look at the other 4 queries which I had [posted|https://issues.apache.org/jira/browse/YARN-3367?focusedCommentId=14732065&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14732065] initially?
bq. Btw, Cancel the patch as it is out of sync with new branch.
Not sure why it failed, but I was able to apply it successfully on my local branch! Recreating the patch and uploading it again.
[~gtCarrera],
bq. I looked at the patch. One general comment is that, the logic of TimelineEntityAsyncDispatcher is pretty similar to AsyncDispatcher. Since the code segments that handling concurrency is normally considered as non-trivial, maybe we should refactor AsycnDispatcher's code and reuse it, rather than follow the logic here?
There are 2 aspects to consider:
* Basically, we would require some parameterized generic class here so that the queue is not restricted to holding {{Event}} but can hold any object (see the rough sketch below). But the problem is that we are doing this in a branch, and if we introduce it, all the places where we use AsyncDispatcher might require changes, which could be cumbersome to merge as the changes would be in many places!
* Also, based on [~djp]'s comment, we need to add additional logic to ensure that sync puts are blocked until all events up to the sync event have been pushed. All of this would need to be handled in the AsyncDispatcher.
Considering this, my opinion would be *not* to modify the AsyncDispatcher. Thoughts?
> Replace starting a separate thread for post entity with event loop in > TimelineClient > > > Key: YARN-3367 > URL: https://issues.apache.org/jira/browse/YARN-3367 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Junping Du >Assignee: Naganarasimha G R > Labels: yarn-2928-1st-milestone > Attachments: YARN-3367-feature-YARN-2928.v1.002.patch, > YARN-3367.YARN-2928.001.patch > > > Since YARN-3039, we add loop in TimelineClient to wait for > collectorServiceAddress ready before posting any entity. In consumer of > TimelineClient (like AM), we are starting a new thread for each call to get > rid of potential deadlock in main thread. This way has at least 3 major > defects: > 1. The consumer need some additional code to wrap a thread before calling > putEntities() in TimelineClient. > 2. It cost many thread resources which is unnecessary. > 3. The sequence of events could be out of order because each posting > operation thread get out of waiting loop randomly. > We should have something like event loop in TimelineClient side, > putEntities() only put related entities into a queue of entities and a > separated thread handle to deliver entities in queue to collector via REST > call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
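Purely to illustrate the "parameterized generic class" idea mentioned in the first bullet above (this is not the AsyncDispatcher API and not code from the patch; all names are hypothetical), a generic single-threaded event loop whose queue can hold any item type might look like this:
{code}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// Hypothetical sketch: a dispatcher whose queue holds an arbitrary item type T
// (e.g. timeline entities) instead of being tied to a specific Event class.
public class GenericItemDispatcher<T> {
  private final BlockingQueue<T> queue = new LinkedBlockingQueue<>();
  private final Thread loop;

  public GenericItemDispatcher(Consumer<T> handler) {
    loop = new Thread(() -> {
      try {
        while (!Thread.currentThread().isInterrupted()) {
          handler.accept(queue.take()); // deliver items one by one, in order
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }, "item-dispatcher");
    loop.setDaemon(true);
    loop.start();
  }

  public void post(T item) {
    queue.add(item); // async put: enqueue and return immediately
  }
}
{code}
For example, new GenericItemDispatcher<String>(System.out::println).post("entity") would enqueue the string for the loop thread to print, preserving posting order.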
[jira] [Commented] (YARN-4072) ApplicationHistoryServer, WebAppProxyServer, NodeManager and ResourceManager to support JvmPauseMonitor as a service
[ https://issues.apache.org/jira/browse/YARN-4072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15043971#comment-15043971 ] Steve Loughran commented on YARN-4072: -- +1 > ApplicationHistoryServer, WebAppProxyServer, NodeManager and ResourceManager > to support JvmPauseMonitor as a service > > > Key: YARN-4072 > URL: https://issues.apache.org/jira/browse/YARN-4072 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager, resourcemanager >Affects Versions: 2.8.0 >Reporter: Sunil G >Assignee: Sunil G >Priority: Minor > Attachments: 0001-YARN-4072.patch, 0002-YARN-4072.patch, > HADOOP-12321-005-aggregated.patch, HADOOP-12407-001.patch > > > As JvmPauseMonitor is made as an AbstractService, subsequent method changes > are needed in all places which uses the monitor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4304) AM max resource configuration per partition to be displayed/updated correctly in UI and in various partition related metrics
[ https://issues.apache.org/jira/browse/YARN-4304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15043953#comment-15043953 ] Naganarasimha G R commented on YARN-4304: - [~sunilg], I tested the latest patch on trunk and it seems to work fine; I am not facing the web UI rendering issue (NPE) which came up with the initial patch. W.r.t. the implementation, I feel [~wangda]'s comment ??ResourcesInfo's constructor shouldn't relate to LeafQueue and considerAMUsage, it should simply copy fields from ResourceUsage.?? is valid, and if required we could perhaps extend ResourceInfo for the LeafQueue and keep the specific fields there.
> AM max resource configuration per partition to be displayed/updated correctly > in UI and in various partition related metrics > > > Key: YARN-4304 > URL: https://issues.apache.org/jira/browse/YARN-4304 > Project: Hadoop YARN > Issue Type: Sub-task > Components: webapp >Affects Versions: 2.7.1 >Reporter: Sunil G >Assignee: Sunil G > Attachments: 0001-YARN-4304.patch, 0002-YARN-4304.patch, > 0003-YARN-4304.patch, 0004-YARN-4304.patch, REST_and_UI.zip > > > As we are supporting per-partition level max AM resource percentage > configuration, UI and various metrics also need to display correct > configurations related to same. > For eg: Current UI still shows am-resource percentage per queue level. This > is to be updated correctly when label config is used. > - Display max-am-percentage per-partition in Scheduler UI (label also) and in > ClusterMetrics page > - Update queue/partition related metrics w.r.t per-partition > am-resource-percentage -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4416) Deadlock due to synchronised get Methods in AbstractCSQueue
[ https://issues.apache.org/jira/browse/YARN-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15043941#comment-15043941 ] Naganarasimha G R commented on YARN-4416: - [~sunilg],
bq. Hence with this new lock, we are getting a hierarchy. Is this intentional.?
Yes Sunil, even I was skeptical about it, but I went ahead with [~wangda]'s [suggestion|https://issues.apache.org/jira/browse/YARN-4416?focusedCommentId=15038560&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15038560], as similar read/write locks are already held in queueCapacity and resource-usage, and some methods were already updating them without locks on LeafQueue. Further, I was of the opinion that the ordering policy should not depend on LeafQueue for its multithreaded consistency, as it is an independent entity and can be used elsewhere.
bq. we access the iterator from ordering policy under LeafQueue lock, so I could see that, now we have some methods in LeafQueue which is removed with LeafQueue lock and directly used only new lock from OrderingPolicy.
Still, all the methods that modify the ordering policy are called while holding the lock on LeafQueue, and if in future any other place modifies it, it needs to ensure the LeafQueue lock is held first. Also, the TreeSet iterator fails fast when the underlying set gets modified. Anyway, we need to evaluate the impact on performance; I am planning to run SLS with and without these changes to validate it. Further, IMO we could have a read/write lock in LeafQueue, which would better avoid all the synchronized locks on LeafQueue for the getters(/reads) in the leaf queue (a minimal sketch of that approach is below). Thoughts?
> Deadlock due to synchronised get Methods in AbstractCSQueue > --- > > Key: YARN-4416 > URL: https://issues.apache.org/jira/browse/YARN-4416 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler, resourcemanager >Affects Versions: 2.7.1 >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Minor > Attachments: YARN-4416.v1.001.patch, YARN-4416.v1.002.patch, > deadlock.log > > > While debugging in eclipse came across a scenario where in i had to get to > know the name of the queue but every time i tried to see the queue it was > getting hung. On seeing the stack realized there was a deadlock but on > analysis found out that it was only due to *queue.toString()* during > debugging as {{AbstractCSQueue.getAbsoluteUsedCapacity}} was synchronized. > Still i feel {{AbstractCSQueue}}'s getter methods need not be synchronized > and better be handled through read and write locks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
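A minimal sketch of the read/write-lock idea floated above, assuming a plain ReentrantReadWriteLock guarding a queue's capacity fields (illustrative only; this is neither the AbstractCSQueue code nor the attached patch):
{code}
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Getters take the read lock, so many readers (web UI, CLI, REST) can proceed
// concurrently, while scheduler updates take the write lock.
public class QueueCapacitySketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private float absoluteUsedCapacity;

  public float getAbsoluteUsedCapacity() {
    lock.readLock().lock();
    try {
      return absoluteUsedCapacity;
    } finally {
      lock.readLock().unlock();
    }
  }

  public void setAbsoluteUsedCapacity(float value) {
    lock.writeLock().lock();
    try {
      absoluteUsedCapacity = value;
    } finally {
      lock.writeLock().unlock();
    }
  }
}
{code}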
[jira] [Commented] (YARN-4416) Deadlock due to synchronised get Methods in AbstractCSQueue
[ https://issues.apache.org/jira/browse/YARN-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15043940#comment-15043940 ] Sunil G commented on YARN-4416: --- A typo: Almost all APIs exposed from LeafQueue are used with the lock from the Queue ==> Almost all APIs exposed from *AbstractComparatorOrderingPolicy* are used with the lock from the Queue
> Deadlock due to synchronised get Methods in AbstractCSQueue > --- > > Key: YARN-4416 > URL: https://issues.apache.org/jira/browse/YARN-4416 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler, resourcemanager >Affects Versions: 2.7.1 >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Minor > Attachments: YARN-4416.v1.001.patch, YARN-4416.v1.002.patch, > deadlock.log > > > While debugging in eclipse came across a scenario where in i had to get to > know the name of the queue but every time i tried to see the queue it was > getting hung. On seeing the stack realized there was a deadlock but on > analysis found out that it was only due to *queue.toString()* during > debugging as {{AbstractCSQueue.getAbsoluteUsedCapacity}} was synchronized. > Still i feel {{AbstractCSQueue}}'s getter methods need not be synchronized > and better be handled through read and write locks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4416) Deadlock due to synchronised get Methods in AbstractCSQueue
[ https://issues.apache.org/jira/browse/YARN-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15043932#comment-15043932 ] Sunil G commented on YARN-4416: --- Sorry, I was not very clear in my earlier comments. Almost all APIs exposed from LeafQueue are used with the lock from the Queue. Hence, with this new lock, we are getting a hierarchy. Is this intentional? Because we are going to have a new lock in a major code path. Also, in LeafQueue#assignContainers
{code}
for (Iterator assignmentIterator = orderingPolicy.getAssignmentIterator();
    assignmentIterator.hasNext();) {
  FiCaSchedulerApp application = assignmentIterator.next();
{code}
we access the iterator from the ordering policy under the LeafQueue lock, and I can see that some methods in LeafQueue now no longer take the LeafQueue lock and directly use only the new lock from OrderingPolicy. So we need to be slightly careful here: we should ensure we do not delete any item without the LeafQueue lock. (We are currently doing that under the LeafQueue lock, hence no issues as of now.)
> Deadlock due to synchronised get Methods in AbstractCSQueue > --- > > Key: YARN-4416 > URL: https://issues.apache.org/jira/browse/YARN-4416 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler, resourcemanager >Affects Versions: 2.7.1 >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Minor > Attachments: YARN-4416.v1.001.patch, YARN-4416.v1.002.patch, > deadlock.log > > > While debugging in eclipse came across a scenario where in i had to get to > know the name of the queue but every time i tried to see the queue it was > getting hung. On seeing the stack realized there was a deadlock but on > analysis found out that it was only due to *queue.toString()* during > debugging as {{AbstractCSQueue.getAbsoluteUsedCapacity}} was synchronized. > Still i feel {{AbstractCSQueue}}'s getter methods need not be synchronized > and better be handled through read and write locks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-4291) ResourceUtilization should be a part of NodeReport API.
[ https://issues.apache.org/jira/browse/YARN-4291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R resolved YARN-4291. - Resolution: Done The scope of this jira has been handled as part of YARN-4293; hence closing this issue! > ResourceUtilization should be a part of NodeReport API. > --- > > Key: YARN-4291 > URL: https://issues.apache.org/jira/browse/YARN-4291 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Naganarasimha G R > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4293) ResourceUtilization should be a part of yarn node CLI
[ https://issues.apache.org/jira/browse/YARN-4293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15043921#comment-15043921 ] Sunil G commented on YARN-4293: --- Hi [~Naganarasimha Garla], extremely sorry for the mix-up. I was trying to get the CLI part up here, and ended up doing the NodeReport change as well since we needed that resource info. I feel it can be marked as a dup here if that's fine. > ResourceUtilization should be a part of yarn node CLI > - > > Key: YARN-4293 > URL: https://issues.apache.org/jira/browse/YARN-4293 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Sunil G > Attachments: 0001-YARN-4293.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4293) ResourceUtilization should be a part of yarn node CLI
[ https://issues.apache.org/jira/browse/YARN-4293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15043919#comment-15043919 ] Naganarasimha G R commented on YARN-4293: - Hi [~sunilg], it seems you have handled the scope of YARN-4291 in this jira itself, so shall I close the YARN-4291 jira? > ResourceUtilization should be a part of yarn node CLI > - > > Key: YARN-4293 > URL: https://issues.apache.org/jira/browse/YARN-4293 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Sunil G > Attachments: 0001-YARN-4293.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4416) Deadlock due to synchronised get Methods in AbstractCSQueue
[ https://issues.apache.org/jira/browse/YARN-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15043909#comment-15043909 ] Sunil G commented on YARN-4416: --- bq.i have added locks for the access of schedulableEntities in AbstractComparatorOrderingPolicy but not completely sure of the modifications as there already synchronization on entitiesToReorder. So would like additional(/focused) review for this part in particular AbstractComparatorOrderingPolicy (or OrderingPolicy) is accessed under the lock from LeafQueue; this dependency does exist now. I feel it's better to access this via the LeafQueue lock. > Deadlock due to synchronised get Methods in AbstractCSQueue > --- > > Key: YARN-4416 > URL: https://issues.apache.org/jira/browse/YARN-4416 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler, resourcemanager >Affects Versions: 2.7.1 >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Minor > Attachments: YARN-4416.v1.001.patch, YARN-4416.v1.002.patch, > deadlock.log > > > While debugging in eclipse came across a scenario where in i had to get to > know the name of the queue but every time i tried to see the queue it was > getting hung. On seeing the stack realized there was a deadlock but on > analysis found out that it was only due to *queue.toString()* during > debugging as {{AbstractCSQueue.getAbsoluteUsedCapacity}} was synchronized. > Still i feel {{AbstractCSQueue}}'s getter methods need not be synchronized > and better be handled through read and write locks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
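For reference, guarding {{schedulableEntities}} with a read/write lock inside an ordering-policy-like class could look roughly like the sketch below. This is a simplified, hypothetical illustration of the approach under review, not the actual YARN-4416 patch.
{code}
import java.util.Collection;
import java.util.Collections;
import java.util.TreeSet;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Simplified stand-in for an ordering policy that keeps its schedulable
// entities in a sorted set and protects them with a read/write lock.
public class LockedOrderingPolicy<S extends Comparable<S>> {
  private final TreeSet<S> schedulableEntities = new TreeSet<>();
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

  public void addSchedulableEntity(S s) {
    lock.writeLock().lock();
    try {
      schedulableEntities.add(s);
    } finally {
      lock.writeLock().unlock();
    }
  }

  public boolean removeSchedulableEntity(S s) {
    lock.writeLock().lock();
    try {
      return schedulableEntities.remove(s);
    } finally {
      lock.writeLock().unlock();
    }
  }

  // Returns a snapshot so callers can iterate without holding the lock.
  public Collection<S> getSchedulableEntities() {
    lock.readLock().lock();
    try {
      return Collections.unmodifiableCollection(new TreeSet<>(schedulableEntities));
    } finally {
      lock.readLock().unlock();
    }
  }
}
{code}
Even with such a lock inside the policy, the point raised above still holds: the assignment path iterates under the LeafQueue lock, so removals should continue to go through that lock as well to keep the iteration consistent.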
[jira] [Commented] (YARN-2885) Create AMRMProxy request interceptor for distributed scheduling decisions for queueable containers
[ https://issues.apache.org/jira/browse/YARN-2885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15043896#comment-15043896 ] Hadoop QA commented on YARN-2885: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 19 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 13s {color} | {color:green} yarn-2877 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 1s {color} | {color:green} yarn-2877 passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 18s {color} | {color:green} yarn-2877 passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 30s {color} | {color:green} yarn-2877 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 37s {color} | {color:green} yarn-2877 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 10s {color} | {color:green} yarn-2877 passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 39s {color} | {color:green} yarn-2877 passed {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 23s {color} | {color:red} hadoop-yarn-server-resourcemanager in yarn-2877 failed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 31s {color} | {color:green} yarn-2877 passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 25s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 2s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 2s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 11m 49s {color} | {color:red} hadoop-yarn-project_hadoop-yarn-jdk1.8.0_66 with JDK v1.8.0_66 generated 1 new issues (was 14, now 14). {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 2s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 17s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 17s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 14m 6s {color} | {color:red} hadoop-yarn-project_hadoop-yarn-jdk1.7.0_85 with JDK v1.7.0_85 generated 1 new issues (was 15, now 15). 
{color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 17s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 28s {color} | {color:red} Patch generated 128 new checkstyle issues in hadoop-yarn-project/hadoop-yarn (total was 555, now 678). {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 37s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 9s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 1s {color} | {color:red} The patch has 18 line(s) that end in whitespace. Use git apply --whitespace=fix. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 0s {color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 12s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager introduced 3 new FindBugs issues. {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 40s {color} | {color:red} hadoop-yarn-api in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 24s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 49s {color} | {color:green} the patch passed with JDK
[jira] [Commented] (YARN-4411) ResourceManager IllegalArgumentException error
[ https://issues.apache.org/jira/browse/YARN-4411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15043881#comment-15043881 ] yarntime commented on YARN-4411: ok, I get it, thank you for your help. > ResourceManager IllegalArgumentException error > -- > > Key: YARN-4411 > URL: https://issues.apache.org/jira/browse/YARN-4411 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: yarntime >Assignee: yarntime > Attachments: YARN-4411.001.patch > > > in version 2.7.1, line 1914 may cause IllegalArgumentException in > RMAppAttemptImpl: > YarnApplicationAttemptState.valueOf(this.getState().toString()) > cause by this.getState() returns type RMAppAttemptState which may not be > converted to YarnApplicationAttemptState. > {noformat} > java.lang.IllegalArgumentException: No enum constant > org.apache.hadoop.yarn.api.records.YarnApplicationAttemptState.LAUNCHED_UNMANAGED_SAVING > at java.lang.Enum.valueOf(Enum.java:236) > at > org.apache.hadoop.yarn.api.records.YarnApplicationAttemptState.valueOf(YarnApplicationAttemptState.java:27) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.createApplicationAttemptReport(RMAppAttemptImpl.java:1870) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationAttemptReport(ClientRMService.java:355) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationAttemptReport(ApplicationClientProtocolPBServiceImpl.java:355) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:425) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4411) ResourceManager IllegalArgumentException error
[ https://issues.apache.org/jira/browse/YARN-4411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15043878#comment-15043878 ] Naganarasimha G R commented on YARN-4411: - You can't avoid it, as it's caused by the existing code. > ResourceManager IllegalArgumentException error > -- > > Key: YARN-4411 > URL: https://issues.apache.org/jira/browse/YARN-4411 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: yarntime >Assignee: yarntime > Attachments: YARN-4411.001.patch > > > in version 2.7.1, line 1914 may cause IllegalArgumentException in > RMAppAttemptImpl: > YarnApplicationAttemptState.valueOf(this.getState().toString()) > cause by this.getState() returns type RMAppAttemptState which may not be > converted to YarnApplicationAttemptState. > {noformat} > java.lang.IllegalArgumentException: No enum constant > org.apache.hadoop.yarn.api.records.YarnApplicationAttemptState.LAUNCHED_UNMANAGED_SAVING > at java.lang.Enum.valueOf(Enum.java:236) > at > org.apache.hadoop.yarn.api.records.YarnApplicationAttemptState.valueOf(YarnApplicationAttemptState.java:27) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.createApplicationAttemptReport(RMAppAttemptImpl.java:1870) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationAttemptReport(ClientRMService.java:355) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationAttemptReport(ApplicationClientProtocolPBServiceImpl.java:355) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:425) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4411) ResourceManager IllegalArgumentException error
[ https://issues.apache.org/jira/browse/YARN-4411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15043859#comment-15043859 ] yarntime commented on YARN-4411: Hi [~Naganarasimha], thanks for your reply. I want to know: is there any way to avoid these errors when I submit the patch? Thank you very much. > ResourceManager IllegalArgumentException error > -- > > Key: YARN-4411 > URL: https://issues.apache.org/jira/browse/YARN-4411 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: yarntime >Assignee: yarntime > Attachments: YARN-4411.001.patch > > > in version 2.7.1, line 1914 may cause IllegalArgumentException in > RMAppAttemptImpl: > YarnApplicationAttemptState.valueOf(this.getState().toString()) > cause by this.getState() returns type RMAppAttemptState which may not be > converted to YarnApplicationAttemptState. > {noformat} > java.lang.IllegalArgumentException: No enum constant > org.apache.hadoop.yarn.api.records.YarnApplicationAttemptState.LAUNCHED_UNMANAGED_SAVING > at java.lang.Enum.valueOf(Enum.java:236) > at > org.apache.hadoop.yarn.api.records.YarnApplicationAttemptState.valueOf(YarnApplicationAttemptState.java:27) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.createApplicationAttemptReport(RMAppAttemptImpl.java:1870) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationAttemptReport(ClientRMService.java:355) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationAttemptReport(ApplicationClientProtocolPBServiceImpl.java:355) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:425) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4411) ResourceManager IllegalArgumentException error
[ https://issues.apache.org/jira/browse/YARN-4411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15043844#comment-15043844 ] Naganarasimha G R commented on YARN-4411: - Hi [~yarntime], these test failures are not related to the modifications in your patch; there are already jiras raised for these reported bugs, YARN-4306 and YARN-4318. But apart from those issues, the approach in your patch seems to be fine. > ResourceManager IllegalArgumentException error > -- > > Key: YARN-4411 > URL: https://issues.apache.org/jira/browse/YARN-4411 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: yarntime >Assignee: yarntime > Attachments: YARN-4411.001.patch > > > in version 2.7.1, line 1914 may cause IllegalArgumentException in > RMAppAttemptImpl: > YarnApplicationAttemptState.valueOf(this.getState().toString()) > cause by this.getState() returns type RMAppAttemptState which may not be > converted to YarnApplicationAttemptState. > {noformat} > java.lang.IllegalArgumentException: No enum constant > org.apache.hadoop.yarn.api.records.YarnApplicationAttemptState.LAUNCHED_UNMANAGED_SAVING > at java.lang.Enum.valueOf(Enum.java:236) > at > org.apache.hadoop.yarn.api.records.YarnApplicationAttemptState.valueOf(YarnApplicationAttemptState.java:27) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.createApplicationAttemptReport(RMAppAttemptImpl.java:1870) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationAttemptReport(ClientRMService.java:355) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationAttemptReport(ApplicationClientProtocolPBServiceImpl.java:355) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:425) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4411) ResourceManager IllegalArgumentException error
[ https://issues.apache.org/jira/browse/YARN-4411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15043835#comment-15043835 ] yarntime commented on YARN-4411: Hi [~djp], would you please help me with this problem? Thank you very much. I submitted a simple patch which replaces YarnApplicationAttemptState.valueOf(this.getState().toString()) with this.createApplicationAttemptState(), but it cannot pass the unit tests in Jenkins. The error message is like this: java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "278445b1a8f3":8030; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost at org.apache.hadoop.ipc.Client$Connection.(Client.java:413) at org.apache.hadoop.ipc.Client.getConnection(Client.java:1489) at org.apache.hadoop.ipc.Client.call(Client.java:1424) at org.apache.hadoop.ipc.Client.call(Client.java:1385) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230) at com.sun.proxy.$Proxy15.registerApplicationMaster(Unknown Source) at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.registerApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:106) at org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization.testUnauthorizedAccess(TestAMAuthorization.java:281) testUnauthorizedAccess[1](org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization) Time elapsed: 2.68 sec <<< ERROR! java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "278445b1a8f3":8030; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost at org.apache.hadoop.ipc.Client$Connection.(Client.java:413) at org.apache.hadoop.ipc.Client.getConnection(Client.java:1489) at org.apache.hadoop.ipc.Client.call(Client.java:1424) at org.apache.hadoop.ipc.Client.call(Client.java:1385) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230) at com.sun.proxy.$Proxy15.registerApplicationMaster(Unknown Source) at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.registerApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:106) at org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization.testUnauthorizedAccess(TestAMAuthorization.java:281) I'm looking forward to your response. Thanks. > ResourceManager IllegalArgumentException error > -- > > Key: YARN-4411 > URL: https://issues.apache.org/jira/browse/YARN-4411 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: yarntime >Assignee: yarntime > Attachments: YARN-4411.001.patch > > > in version 2.7.1, line 1914 may cause IllegalArgumentException in > RMAppAttemptImpl: > YarnApplicationAttemptState.valueOf(this.getState().toString()) > cause by this.getState() returns type RMAppAttemptState which may not be > converted to YarnApplicationAttemptState. 
> {noformat} > java.lang.IllegalArgumentException: No enum constant > org.apache.hadoop.yarn.api.records.YarnApplicationAttemptState.LAUNCHED_UNMANAGED_SAVING > at java.lang.Enum.valueOf(Enum.java:236) > at > org.apache.hadoop.yarn.api.records.YarnApplicationAttemptState.valueOf(YarnApplicationAttemptState.java:27) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.createApplicationAttemptReport(RMAppAttemptImpl.java:1870) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationAttemptReport(ClientRMService.java:355) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationAttemptReport(ApplicationClientProtocolPBServiceImpl.java:355) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:425) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033) > {noformat} -- This message was sent by Atlassian
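For context on the fix being discussed, the patch replaces the direct valueOf(toString()) call with this.createApplicationAttemptState(). A state-mapping helper along those lines could look roughly like the sketch below; the class name and the exact case handling here are assumptions for illustration (only LAUNCHED_UNMANAGED_SAVING is confirmed problematic by the stack trace above), not the committed patch.
{code}
import org.apache.hadoop.yarn.api.records.YarnApplicationAttemptState;
import org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptState;

// Illustrative helper: map the RM-internal attempt state to the public
// YarnApplicationAttemptState instead of relying on valueOf(toString()),
// which throws for internal-only states such as LAUNCHED_UNMANAGED_SAVING.
public final class AttemptStateMapper {

  public static YarnApplicationAttemptState map(RMAppAttemptState state) {
    switch (state) {
      case LAUNCHED_UNMANAGED_SAVING:
        // Internal transient state; report it to clients as LAUNCHED.
        // Any other internal-only states would need similar handling.
        return YarnApplicationAttemptState.LAUNCHED;
      default:
        // States that exist in both enums share the same name.
        return YarnApplicationAttemptState.valueOf(state.name());
    }
  }

  private AttemptStateMapper() {
  }
}
{code}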