[JIRA] (JENKINS-37483) Deadlock caused by synchronized methods in EC2Cloud
Francis Upton closed an issue as Fixed

Jenkins / JENKINS-37483 Deadlock caused by synchronized methods in EC2Cloud

Change By: Francis Upton
Status: Resolved -> Closed

This message was sent by Atlassian JIRA (v7.3.0#73011-sha1:3c73d0e)
--
You received this message because you are subscribed to the Google Groups "Jenkins Issues" group. To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.
[JIRA] (JENKINS-37483) Deadlock caused by synchronized methods in EC2Cloud
Randall Raboy commented on JENKINS-37483

Re: Deadlock caused by synchronized methods in EC2Cloud

I am seeing the same deadlock in our setup (JVM-related classes omitted):

Handling POST /cloud/ec2-us-west-2/provision from 172.16.6.210 : RequestHandlerThread[#969] - threadId:76774 - state:WAITING
stackTrace:
java.lang.Thread.State: WAITING
    at sun.misc.Unsafe.park(Native Method)
    - waiting to lock (a java.util.concurrent.locks.ReentrantLock$NonfairSync) owned by "jenkins.util.Timer [#3]" t@36
    ...
    at hudson.model.Queue._withLock(Queue.java:1307)
    at hudson.model.Queue.withLock(Queue.java:1186)
    at jenkins.model.Nodes.removeNode(Nodes.java:237)
    at jenkins.model.Jenkins.removeNode(Jenkins.java:2084)
    at hudson.plugins.ec2.EC2Cloud.countCurrentEC2Slaves(EC2Cloud.java:420)
    at hudson.plugins.ec2.EC2Cloud.getPossibleNewSlavesCount(EC2Cloud.java:499)
    at hudson.plugins.ec2.EC2Cloud.getNewOrExistingAvailableSlave(EC2Cloud.java:518)
    - locked <65f5826a> (a hudson.plugins.ec2.AmazonEC2Cloud)
    at hudson.plugins.ec2.EC2Cloud.doProvision(EC2Cloud.java:340)
    ...

Locked ownable synchronizers:
    - locked <112a6eb5> (a java.util.concurrent.ThreadPoolExecutor$Worker)

jenkins.util.Timer [#3] - threadId:36 - state:BLOCKED
stackTrace:
java.lang.Thread.State: BLOCKED
    at hudson.plugins.ec2.EC2Cloud.connect(EC2Cloud.java:634)
    - waiting to lock <65f5826a> (a hudson.plugins.ec2.AmazonEC2Cloud) owned by "Handling POST /cloud/ec2-us-west-2/provision from 172.16.6.210 : RequestHandlerThread[#969]" t@76774
    at hudson.plugins.ec2.EC2AbstractSlave.getInstance(EC2AbstractSlave.java:277)
    at hudson.plugins.ec2.EC2AbstractSlave.fetchLiveInstanceData(EC2AbstractSlave.java:429)
    at hudson.plugins.ec2.EC2AbstractSlave.isAlive(EC2AbstractSlave.java:397)
    at hudson.plugins.ec2.EC2SpotSlave.terminate(EC2SpotSlave.java:73)
    at hudson.plugins.ec2.EC2AbstractSlave.idleTimeout(EC2AbstractSlave.java:344)
    at hudson.plugins.ec2.EC2RetentionStrategy.internalCheck(EC2RetentionStrategy.java:136)
    at hudson.plugins.ec2.EC2RetentionStrategy.check(EC2RetentionStrategy.java:85)
    at hudson.plugins.ec2.EC2RetentionStrategy.check(EC2RetentionStrategy.java:43)
    at hudson.slaves.ComputerRetentionWork$1.run(ComputerRetentionWork.java:72)
    at hudson.model.Queue._withLock(Queue.java:1309)
    at hudson.model.Queue.withLock(Queue.java:1186)
    at hudson.slaves.ComputerRetentionWork.doRun(ComputerRetentionWork.java:63)
    at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:50)
    ...

Locked ownable synchronizers:
    - locked (a java.util.concurrent.locks.ReentrantLock$NonfairSync)

I noticed that this deadlock only happens when I switch from on-demand to spot requests. On-demand works well. I also see the same deadlock when using the ec2 fleet plugin.

Jenkins version: 2.32.2
EC2 plugin: 1.36
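The two traces above show a lock-order inversion: the request thread holds the AmazonEC2Cloud monitor and waits for the Queue lock, while the timer thread holds the Queue lock and waits for the same monitor. The following is a minimal sketch of one general remedy, acquiring the two locks in the same order on every path. It is not code from Jenkins or the ec2 plugin; `queueLock`, `cloudMonitor`, `provision()` and `retentionCheck()` are hypothetical stand-ins for the Queue lock, the cloud instance monitor, and the two code paths in the traces.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class LockOrderSketch {
    // Hypothetical stand-in for the lock taken by hudson.model.Queue.withLock.
    static final ReentrantLock queueLock = new ReentrantLock();
    // Hypothetical stand-in for the AmazonEC2Cloud instance monitor.
    static final Object cloudMonitor = new Object();

    static int provisioned = 0; // only touched while both locks are held
    static int checked = 0;     // only touched while both locks are held

    // Provisioning path: queue lock first, cloud monitor second.
    static void provision() {
        queueLock.lock();
        try {
            synchronized (cloudMonitor) {
                provisioned++;
            }
        } finally {
            queueLock.unlock();
        }
    }

    // Retention path: same order, so no thread can hold one lock
    // while waiting for a thread that holds the other.
    static void retentionCheck() {
        queueLock.lock();
        try {
            synchronized (cloudMonitor) {
                checked++;
            }
        } finally {
            queueLock.unlock();
        }
    }

    // Runs both paths concurrently; returns true if both threads finish in time.
    static boolean runBoth(final int iterations) {
        Thread provisioner = new Thread(() -> {
            for (int i = 0; i < iterations; i++) provision();
        });
        Thread retention = new Thread(() -> {
            for (int i = 0; i < iterations; i++) retentionCheck();
        });
        provisioner.setDaemon(true);
        retention.setDaemon(true);
        provisioner.start();
        retention.start();
        try {
            provisioner.join(TimeUnit.SECONDS.toMillis(10));
            retention.join(TimeUnit.SECONDS.toMillis(10));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !provisioner.isAlive() && !retention.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("both paths finished: " + runBoth(10_000));
    }
}
```

If `provision()` instead took `cloudMonitor` before `queueLock`, as the request thread does in the trace, the two threads could block each other indefinitely. Whether a consistent order is practical inside the plugin depends on code not shown in this thread.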
[JIRA] (JENKINS-37483) Deadlock caused by synchronized methods in EC2Cloud
Todd Rose commented on JENKINS-37483

Re: Deadlock caused by synchronized methods in EC2Cloud

https://github.com/jenkinsci/ec2-plugin/pull/214
[JIRA] (JENKINS-37483) Deadlock caused by synchronized methods in EC2Cloud
Todd Rose commented on JENKINS-37483

Re: Deadlock caused by synchronized methods in EC2Cloud

I think the quickest fix for this is to make the non-static connect() method synchronize on the class object. connect() is really the only thing that I can see that can be invoked from a lot of different contexts and threads.
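As a rough illustration of the suggestion in this comment, the sketch below moves `connect()` off the instance monitor and onto the class object, so a thread that holds the instance monitor no longer blocks callers of `connect()`. This is an assumption-laden model, not the plugin's code and not necessarily what the linked pull request does: `ConnectLockSketch`, the `connection` field, and `provisionWhileHolding()` are hypothetical names.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class ConnectLockSketch {
    // Hypothetical stand-in for a cached EC2 client; guarded by the class lock.
    private Object connection;

    // Non-static method that synchronizes on the class object
    // instead of on "this".
    public Object connect() {
        synchronized (ConnectLockSketch.class) {
            if (connection == null) {
                connection = new Object();
            }
            return connection;
        }
    }

    // Models a long-running synchronized instance method such as a
    // provisioning call: it holds the instance monitor until released.
    public synchronized void provisionWhileHolding(CountDownLatch held, CountDownLatch release) {
        held.countDown();
        try {
            release.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Returns true if connect() completes while another thread
    // still holds the instance monitor.
    static boolean connectWhileInstanceLocked() {
        final ConnectLockSketch cloud = new ConnectLockSketch();
        final CountDownLatch held = new CountDownLatch(1);
        final CountDownLatch release = new CountDownLatch(1);

        Thread provisioner = new Thread(() -> cloud.provisionWhileHolding(held, release));
        provisioner.setDaemon(true);
        provisioner.start();
        try {
            if (!held.await(5, TimeUnit.SECONDS)) return false;
            Thread connector = new Thread(() -> cloud.connect());
            connector.setDaemon(true);
            connector.start();
            connector.join(2000);
            boolean finished = !connector.isAlive();
            release.countDown();
            provisioner.join(2000);
            return finished;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("connect() finished while instance was locked: "
                + connectWhileInstanceLocked());
    }
}
```

One trade-off of this approach is that a class-wide lock serializes `connect()` across every cloud instance in the JVM, which may matter for installations with several configured clouds.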
[JIRA] (JENKINS-37483) Deadlock caused by synchronized methods in EC2Cloud
Todd Rose created an issue

Jenkins / JENKINS-37483 Deadlock caused by synchronized methods in EC2Cloud

Issue Type: Bug
Assignee: Francis Upton
Components: ec2-plugin
Created: 2016/Aug/17 8:02 PM
Labels: plugin
Priority: Blocker
Reporter: Todd Rose

This is against 1.35.

EC2Cloud.java has several synchronized methods that can be called from various timers. getNewOrExistingAvailableSlave() and connect() are the problematic ones in this case.

Our installation heavily utilizes the spot market and we have a high number of nodes in our fleet. Under load you can easily get into a situation where one thread is terminating an instance and at the same time another is trying to provision a new one.

The liberal use of synchronized methods in EC2Cloud is not safe. A finer-grained locking strategy, or moving to a lockless strategy, is advisable.

{{
T1 "Handling POST /view/Adhoc/job/admin_FailedSourceReplayRunner/build from xxx.xx.xxx.xx : RequestHandlerThread2247" - parking to wait for <0x00060090c078> (a java.util.concurrent.locks.ReentrantLock$NonfairSync) which is held by T2 "EC2 alive slaves monitor thread"

"Handling POST /view/Adhoc/job/admin_FailedSourceReplayRunner/build from xxx.xx.xxx.xx : RequestHandlerThread2247":
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x00060090c078> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at