Dmitry Konstantinov created CASSANDRA-20833:
-----------------------------------------------
Summary: Jenkins pipeline: add CauseOfInterruption logging to simplify troubleshooting
Key: CASSANDRA-20833
URL: https://issues.apache.org/jira/browse/CASSANDRA-20833
Project: Apache Cassandra
Issue Type: Improvement
Components: Build
Reporter: Dmitry Konstantinov
Currently, when a test split of the Jenkins pipeline is aborted due to a timeout
or because an agent went offline, Jenkins aborts the other parallel steps too.
All the steps, both the one that originally failed and the ones aborted as a
result, look very similar in the Pipeline overview, so finding the original
split failure is time-consuming.
Jenkins throws an exception in this case, but it has no message, and its stack
trace is generic and uninformative. Example:
{code}
org.jenkinsci.plugins.workflow.steps.FlowInterruptedException
    at PluginClassLoader for workflow-durable-task-step//org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$RemovedNodeListener.lambda$cancelOwnerExecution$7(ExecutorStepExecution.java:432)
    at PluginClassLoader for workflow-durable-task-step//org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$PlaceholderTask$1.onSuccess(ExecutorStepExecution.java:524)
    at PluginClassLoader for workflow-durable-task-step//org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$PlaceholderTask$1.onSuccess(ExecutorStepExecution.java:520)
    at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1139)
    at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:31)
    at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1307)
    at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:1070)
    at com.google.common.util.concurrent.AbstractFuture.set(AbstractFuture.java:794)
    at com.google.common.util.concurrent.SettableFuture.set(SettableFuture.java:49)
    at PluginClassLoader for workflow-cps//org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$5.onSuccess(CpsFlowExecution.java:1118)
    at PluginClassLoader for workflow-cps//org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$5.onSuccess(CpsFlowExecution.java:1096)
    at PluginClassLoader for workflow-cps//org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$4$1.run(CpsFlowExecution.java:1012)
    at PluginClassLoader for workflow-cps//org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService.lambda$wrap$2(CpsVmExecutorService.java:85)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
    at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:139)
    at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
    at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:68)
    at jenkins.util.ErrorLoggingExecutorService.lambda$wrap$0(ErrorLoggingExecutorService.java:51)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at PluginClassLoader for workflow-cps//org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.call(CpsVmExecutorService.java:53)
    at PluginClassLoader for workflow-cps//org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.call(CpsVmExecutorService.java:50)
    at org.codehaus.groovy.runtime.GroovyCategorySupport$ThreadCategoryInfo.use(GroovyCategorySupport.java:136)
    at org.codehaus.groovy.runtime.GroovyCategorySupport.use(GroovyCategorySupport.java:275)
    at PluginClassLoader for workflow-cps//org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService.lambda$categoryThreadFactory$0(CpsVmExecutorService.java:50)
    at java.base/java.lang.Thread.run(Unknown Source)
    Suppressed: org.jenkinsci.plugins.workflow.actions.ErrorAction$ErrorId: 363f0c37-5777-48ab-92e6-9aa56ec5431d
{code}
FlowInterruptedException carries additional information that may help here:
List<CauseOfInterruption> getCauses()
https://github.com/jenkinsci/workflow-step-api-plugin/blob/518c5dcb24c0d692abe25aea4c03f8273ac06ca4/src/main/java/org/jenkinsci/plugins/workflow/steps/FlowInterruptedException.java#L54
We can extract and print this information, which contains the name of the step
that originally failed.
Example output for the split that originally failed:
{code}
CauseOfInterruption: org.jenkinsci.plugins.workflow.support.steps.ExecutorStepExecution$RemovedNodeCause - Agent was removed
{code}
Example output for a split that was aborted because another split failed:
{code}
CauseOfInterruption: org.jenkinsci.plugins.workflow.cps.steps.ParallelStep$FailFastCause - Failed in branch jvm-dtest jdk11 10/16
{code}
Note: to call getCauses() from the pipeline script, the method must be added to
the list of approved Jenkins API signatures:
{code}
method org.jenkinsci.plugins.workflow.steps.FlowInterruptedException getCauses
{code}
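A minimal sketch of how the extraction could look in a scripted pipeline is shown here. It is an illustration, not the actual patch: the helper name withInterruptionLogging and its parameters are hypothetical, and the message format is only modelled on the example output above. It relies on FlowInterruptedException.getCauses() and CauseOfInterruption.getShortDescription(); depending on the sandbox configuration, signatures other than getCauses may also need approval.

```groovy
import org.jenkinsci.plugins.workflow.steps.FlowInterruptedException

// Hypothetical helper: wraps the body of one parallel branch (test split)
// and logs why the branch was interrupted before propagating the exception.
def withInterruptionLogging(String branchName, Closure body) {
    try {
        body()
    } catch (FlowInterruptedException e) {
        // getCauses() is the accessor this ticket proposes to approve.
        for (cause in e.getCauses()) {
            // Assumed format: "<cause class> - <short description>".
            echo "CauseOfInterruption: ${cause.getClass().getName()} - ${cause.getShortDescription()}"
        }
        // Rethrow unchanged so the branch result is not altered by the logging.
        throw e
    }
}
```

Each parallel branch would then run its steps inside withInterruptionLogging('branch name') { ... }, so the log of every aborted split states whether it was the original failure or was stopped because of another branch.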