We found the issue! So I thought I'd let you know. We downgraded to 1.424.2 but the issue was still there. After a good night's sleep we started to dig some more and found this in the log:
Jul 5, 2012 11:00:24 PM hudson.model.Executor run
SEVERE: Executor threw an exception
java.lang.NullPointerException
    at org.jenkinsci.plugins.artifactdeployer.ArtifactDeployerPublisher$DeleteRemoteArtifact.onDeleted(ArtifactDeployerPublisher.java:187)
    at org.jenkinsci.plugins.artifactdeployer.ArtifactDeployerPublisher$DeleteRemoteArtifact.onDeleted(ArtifactDeployerPublisher.java:171)
    at hudson.model.listeners.RunListener.fireDeleted(RunListener.java:208)
    at hudson.model.Run.delete(Run.java:1187)
    at hudson.model.AbstractBuild.delete(AbstractBuild.java:362)
    at hudson.tasks.LogRotator.perform(LogRotator.java:157)
    at hudson.model.Job.logRotate(Job.java:315)
    at hudson.model.Run.run(Run.java:1440)
    at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
    at hudson.model.ResourceController.execute(ResourceController.java:88)
    at hudson.model.Executor.run(Executor.java:175)

So for some reason the ArtifactDeployer throws an NPE for most builds that get removed. As you can see, this happens when a build has finished and the log rotator is removing old builds to make room for the new one. The exception causes the Executor not to notify the Future object that the upstream build is waiting on, and I get hundreds of angry users breathing down my neck :)

We removed the ArtifactDeployer and all is well.

I should probably file a bug report on it. Should I file two separate reports, one for core and one for ArtifactDeployer, or just one with both components in it?

Robert Sandell
Software Tools Engineer - Tools and Integration
Sony Mobile Communications

From: jenkinsci-dev@googlegroups.com [mailto:jenkinsci-dev@googlegroups.com] On Behalf Of Sandell, Robert
Sent: 5 July 2012 18:33
To: 'jenkinsci-dev@googlegroups.com'
Subject: AsyncFutureImpl.get problems in 1.447.2

Hi,

We upgraded yesterday from 1.424.2 to 1.447.2, and the first problems that we had missed during testing have started to arrive.
We use the parameterized trigger quite heavily, and the issue is that sometimes, about every eighth build or so, the upstream build that is waiting for the downstream build it triggered keeps waiting even though the downstream build finished long ago. A thread dump reveals that the executor is still waiting in AsyncFutureImpl.get, so my guess is that for some reason the thread hasn't been notified about the build result as it should have been. We also have an in-house plugin doing similar stuff, and it can hang the same way.

I haven't found anything that reliably recreates the circumstances yet; so far it's just the "sometimes it happens" thing on a couple of critical jobs.

I've been going through the parts of core I can find that are involved in this. A git diff between Jenkins-1.424.2 and Jenkins-1.447.2 shows some changes in hudson.model.Executor that, from what I can see, shouldn't affect this, nothing in WorkUnitContext, and maybe something in hudson.model.Queue (but I tend to get lost in there every time I try to understand that code :) ).

Does anyone have any hints on where I can continue my investigation, or ways of attack to try to recreate the issue so I can debug it better?

Robert Sandell
Software Tools Engineer
Tools and Integration
Sony Mobile Communications
Tel: +46 (0)10 80 12721
sonymobile.com
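For anyone reading this thread later, the failure mode from the reply at the top (an exception thrown from a deletion listener prevents the waiting thread from ever being notified) can be sketched in plain Java. This is only an illustration under stated assumptions, not Jenkins code: `DeleteListener`, `finishBuildUnsafe`, `finishBuildSafe` and `runScenario` are hypothetical names, and `CompletableFuture` stands in for whatever future the upstream build actually waits on.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class ListenerFailureDemo {

    // Hypothetical stand-in for a "build deleted" listener; not the Jenkins RunListener API.
    interface DeleteListener {
        void onDeleted(String buildId);
    }

    // Unsafe ordering: if the listener throws, the future is never completed.
    static void finishBuildUnsafe(CompletableFuture<String> result, DeleteListener listener) {
        listener.onDeleted("old-build");
        result.complete("SUCCESS");
    }

    // Defensive ordering: the waiter is notified even if the listener throws.
    static void finishBuildSafe(CompletableFuture<String> result, DeleteListener listener) {
        try {
            listener.onDeleted("old-build");
        } finally {
            result.complete("SUCCESS");
        }
    }

    // Runs one "build" on its own thread with a listener that always throws,
    // then waits briefly for the result the way an upstream build would.
    static String runScenario(boolean safe) {
        CompletableFuture<String> result = new CompletableFuture<>();
        DeleteListener broken = buildId -> {
            throw new NullPointerException("simulated plugin bug");
        };

        Thread executor = new Thread(() -> {
            if (safe) {
                finishBuildSafe(result, broken);
            } else {
                finishBuildUnsafe(result, broken);
            }
        });
        // Swallow the simulated NPE so the demo output stays readable.
        executor.setUncaughtExceptionHandler((thread, error) -> { });
        executor.start();

        try {
            executor.join();
            // A real waiter without a timeout would block here forever in the unsafe case.
            return result.get(200, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return "STILL WAITING";
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("unsafe: " + runScenario(false)); // unsafe: STILL WAITING
        System.out.println("safe:   " + runScenario(true));  // safe:   SUCCESS
    }
}
```

The point of the sketch is only the ordering: anything that can throw between "the build is done" and "the waiter is notified" can leave the waiter blocked, which matches the hang seen in AsyncFutureImpl.get. Whether a try/finally of this kind is the right fix in core is a separate question from the NPE in the plugin.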