Setup Jenkins v2.89.4 LTS Pipeline API Plugin: 2.26 Pipeline Nodes and Processes Plugin: 2.19 Durable Task Plugin: 1.18 Pipeline Job Plugin: 2.17 Pipeline Shared Groovy Libraries Plugin: 2.9 Pipeline Supporting APIs Plugin: 2.18 Script Security Plugin: 1.41 Pipeline Default Speed/Durability Level: Performance-Optimized "Do not allow the pipeline to resume if the master restarts": Enabled on all jobs Problem I logged into a Jenkins master and saw no builds running but there was a queue of about 10 jobs. When mousing over the queued jobs, I saw "pending - Already running 2 builds across all nodes". This is strange because no jobs were showing as running and no Jenkins agents or executors were showing any running builds. I then ran "http://xx.xxx.xxx.xxx:8080/computer/api/xml?tree=computer[executors[currentExecutable[url]],oneOffExecutors[currentExecutable[url]]]&xpath=//url&wrapper=builds" which did show 5 builds were running. I checked these builds and they were red (failure) and were not running. Research I checked the console log of a build that showed as running but isn't and saw the line below near the top of the log. Resume disabled by user, switching to high-performance, low-durability mode. At the end of the of the log I saw the following: Finished: FAILURE Resuming build at Tue Mar 13 23:04:52 UTC 2018 after Jenkins restart Why Resume Build? The build failed on Mar 12, 2018 6:02:37 PM. Why did the build try to resume almost a day later? The job and system durability are configured to not resume builds. Below are some details taken from the API for the build. _class "hudson.model.OneOffExecutor" id "41" keepLog false number 41 queueId 7178 result "FAILURE" timestamp 1520877757466 I checked the Java process on the server and it was last restarted on March 02 2018. What triggered the "Jenkins restart" identified on Mar 13 23:04:52 UTC 2018 since the Java process was not restarted? Why does this get the build stuck in a "running" state when it's not running? Scope This issue can be seen across many of our Jenkins masters. In each case we see "Resuming build at xxxxx after Jenkins restart" occur a few days after the build failure or abort even though Java was not restarted. This issue didn't occur on Jenkins 2.60.3 running the older (pre-durability configurable) Pipeline plugins. Logs I checked the jenkins.log file and saw the following when the build was attempting to be resumed.
Mar 13, 2018 9:29:56 PM org.jenkinsci.plugins.workflow.cps.CpsFlowExecution onLoad
WARNING: Pipeline state not properly persisted, cannot resume job/JENKINS-JOB-NAME1/42/
Mar 13, 2018 9:29:56 PM org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService reportProblem
WARNING: Unexpected exception in CPS VM thread: CpsFlowExecution[OwnerJENKINS-JOB-NAME1/42:JENKINS-JOB-NAME1 #42]
java.lang.NullPointerException
at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:997)
at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1368)
at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:412)
at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Mar 13, 2018 9:29:56 PM org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService reportProblem
WARNING: Unexpected exception in CPS VM thread: CpsFlowExecution[OwnerJENKINS-JOB-NAME1/42:JENKINS-JOB-NAME1 #42]
java.lang.NullPointerException
at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:997)
at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1368)
at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:412)
at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Mar 13, 2018 9:29:56 PM org.jenkinsci.plugins.workflow.cps.CpsFlowExecution onLoad
WARNING: Pipeline state not properly persisted, cannot resume job/JENKINS-JOB-NAME1/40/
Mar 13, 2018 9:29:56 PM org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService reportProblem
WARNING: Unexpected exception in CPS VM thread: CpsFlowExecution[OwnerJENKINS-JOB-NAME1/40:JENKINS-JOB-NAME1 #40]
java.lang.NullPointerException
at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:997)
at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1368)
at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:412)
at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Mar 13, 2018 9:29:56 PM org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService reportProblem
WARNING: Unexpected exception in CPS VM thread: CpsFlowExecution[OwnerJENKINS-JOB-NAME1/40:JENKINS-JOB-NAME1 #40]
java.lang.NullPointerException
at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:997)
at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1368)
at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:412)
at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Mar 13, 2018 11:04:52 PM org.jenkinsci.plugins.workflow.cps.CpsFlowExecution onLoad
WARNING: Pipeline state not properly persisted, cannot resume job/JENKINS-JOB-NAME2/41/
Mar 13, 2018 11:04:52 PM org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService reportProblem
WARNING: Unexpected exception in CPS VM thread: CpsFlowExecution[OwnerJENKINS-JOB-NAME2/41:JENKINS-JOB-NAME2 #41]
java.lang.NullPointerException
at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:997)
at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1368)
at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:412)
at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Mar 13, 2018 11:04:52 PM org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService reportProblem
WARNING: Unexpected exception in CPS VM thread: CpsFlowExecution[OwnerJENKINS-JOB-NAME2/41:JENKINS-JOB-NAME2 #41]
java.lang.NullPointerException
at org.jenkinsci.plugins.workflow.job.WorkflowRun$GraphL.onNewHead(WorkflowRun.java:997)
at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1368)
at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$3.run(CpsThreadGroup.java:412)
at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Thread Dump I only saw WorkflowRun.copyLogs with workflow in the name in a thread dump so I'm not sure if it is related. I didn't see anything BLOCKING. A lot of different items were WAITING on "java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject".
"WorkflowRun.copyLogs [#3]" Id=8829 Group=main TIMED_WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@477b1566
at sun.misc.Unsafe.park(Native Method)
- waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@477b1566
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
"WorkflowRun.copyLogs [#4]" Id=8830 Group=main WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@477b1566
at sun.misc.Unsafe.park(Native Method)
- waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@477b1566
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1088)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Workaround I was told that running the Jenkins script below will set the running builds to aborted to work around the issue.
|