[JIRA] (JENKINS-46507) Parallel Pipeline random java.lang.InterruptedException

2019-10-18 Thread tpoe...@argo.ai (JIRA)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Tony Poerio commented on  JENKINS-46507  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Parallel Pipeline random java.lang.InterruptedException   
 

  
 
 
 
 

 
 Hi Devin Nusbaum – when will this fix be released? My team needs it. (Or at least we think we do.)   And I saw that this message is from about 9 months back, at time of writing.   As of right now, the current release is only `2.200`, (released 10-14-2019).   However, the post immediately above is referencing version `2.29`.   Is it possible that this update is already present in `2.200`?   If not, when will it become available in a stable release?  Many thanks for the help.  
 

  
 
 
 
 

 
 
 

 
 
 Add Comment  
 

  
 

  
 
 
 
  
 

  
 
 
 
 

 
 This message was sent by Atlassian Jira (v7.13.6#713006-sha1:cc4451f)  
 
 

 
   
 

  
 

  
 

   





-- 
You received this message because you are subscribed to the Google Groups "Jenkins Issues" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/jenkinsci-issues/JIRA.184823.1503944107000.10678.1571423460719%40Atlassian.JIRA.


[JIRA] (JENKINS-46507) Parallel Pipeline random java.lang.InterruptedException

2019-01-31 Thread dnusb...@cloudbees.com (JIRA)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Devin Nusbaum updated  JENKINS-46507  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
 As of version 2.29 of the Pipeline Nodes and Process Plugin, the default timeout for remote calls is 20 seconds, and the value can be configured using the system property org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep.REMOTE_TIMEOUT.  I am marking this ticket as closed since that is the main cause of the issue identified from discussion in the comments (thanks Andrew a!). If this issue is still occurring frequently for someone after increasing that value, please comment and we can investigate further.  
 

  
 
 
 
 

 
 Jenkins /  JENKINS-46507  
 
 
  Parallel Pipeline random java.lang.InterruptedException   
 

  
 
 
 
 

 
Change By: 
 Devin Nusbaum  
 
 
Status: 
 In Review Resolved  
 
 
Resolution: 
 Fixed  
 
 
Released As: 
 workflow-durable-task-step 2.29  
 

  
 
 
 
 

 
 
 

 
 
 Add Comment  
 

  
 

  
 
 
 
  
 

  
 
 
 
 

 
 This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d)  
 

[JIRA] (JENKINS-46507) Parallel Pipeline random java.lang.InterruptedException

2019-01-15 Thread dnusb...@cloudbees.com (JIRA)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Devin Nusbaum updated  JENKINS-46507  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
 Jenkins /  JENKINS-46507  
 
 
  Parallel Pipeline random java.lang.InterruptedException   
 

  
 
 
 
 

 
Change By: 
 Devin Nusbaum  
 
 
Status: 
 In  Progress  Review  
 

  
 
 
 
 

 
 
 

 
 
 Add Comment  
 

  
 

  
 
 
 
  
 

  
 
 
 
 

 
 This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d)  
 

  
 

   





-- 
You received this message because you are subscribed to the Google Groups "Jenkins Issues" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[JIRA] (JENKINS-46507) Parallel Pipeline random java.lang.InterruptedException

2019-01-15 Thread dnusb...@cloudbees.com (JIRA)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Devin Nusbaum commented on  JENKINS-46507  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Parallel Pipeline random java.lang.InterruptedException   
 

  
 
 
 
 

 
 Andrew a No problem! I will move forward with my PR (adding an additional timeout), thanks so much for interactively debugging the issue!  
 

  
 
 
 
 

 
 
 

 
 
 Add Comment  
 

  
 

  
 
 
 
  
 

  
 
 
 
 

 
 This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d)  
 

  
 

   





-- 
You received this message because you are subscribed to the Google Groups "Jenkins Issues" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[JIRA] (JENKINS-46507) Parallel Pipeline random java.lang.InterruptedException

2019-01-15 Thread s...@ncrmnt.org (JIRA)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Andrew a commented on  JENKINS-46507  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Parallel Pipeline random java.lang.InterruptedException   
 

  
 
 
 
 

 
 Devin Nusbaum Sorry for not reporting earlier. 60 seconds seem to have fixed all issues for me. The rest of the problems were due to ctest (and our numa scheduler wrapped within it before the actual simulator) not correctly dying when jenkins asked them to do so.  
 

  
 
 
 
 

 
 
 

 
 
 Add Comment  
 

  
 

  
 
 
 
  
 

  
 
 
 
 

 
 This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d)  
 

  
 

   





-- 
You received this message because you are subscribed to the Google Groups "Jenkins Issues" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[JIRA] (JENKINS-46507) Parallel Pipeline random java.lang.InterruptedException

2019-01-14 Thread dnusb...@cloudbees.com (JIRA)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Devin Nusbaum commented on  JENKINS-46507  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Parallel Pipeline random java.lang.InterruptedException   
 

  
 
 
 
 

 
 Andrew a did those timeouts end up helping? If so, I can roll them up into https://github.com/jenkinsci/workflow-durable-task-step-plugin/pull/90 and release that so they can be configured without needing to run custom code.  
 

  
 
 
 
 

 
 
 

 
 
 Add Comment  
 

  
 

  
 
 
 
  
 

  
 
 
 
 

 
 This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d)  
 

  
 

   





-- 
You received this message because you are subscribed to the Google Groups "Jenkins Issues" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[JIRA] (JENKINS-46507) Parallel Pipeline random java.lang.InterruptedException

2018-11-13 Thread s...@ncrmnt.org (JIRA)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Andrew a edited a comment on  JENKINS-46507  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Parallel Pipeline random java.lang.InterruptedException   
 

  
 
 
 
 

 
 I've just ran a few regressions with that insanely huge timeout and the bad news is, the problem didn't completely go away. More, 2 different problems have emerged (I'm now not really sure if they are directly related to this issue, or I should open a new ticket. Posting everything here for now)*First one:* I'm now seeing a pipeline freezing AFTER all the tasks under parallel statement have completed. A restart of jenkins causes some of the steps under parallel to be rerun with the following warning:{code:java}Queue item for node block in SoC » RTL_REGRESSION #255 is missing (perhaps JENKINS-34281); rescheduling{code}But the pipeline completes. I'm also seeing runaway simulation processes that have to be killed by hand. Those kept running after the pipeline has been completed,  -  perhaps due to a master node restart -  (and thus preventing further builds in that workspace). Not yet sure how I should debug this one. *Second one:*In an attempt to mitigate another the issue (now with old ctest on RHEL, not always handling timeouts correctly)  I've added a timeout() block inside parallel, and that exposed another filesystem/timeout problem:  {code :java } Cancelling nested steps due to timeoutSending interrupt signal to processCancelling nested steps due to timeoutAfter 10s process did not stop java.nio.file.FileSystemException: /home/jenkins/ws/BootromSignoff/build@tmp/durable-bcad1b03/.nfs29ee028d2716: Device or resource busy at sun.nio.fs.UnixException.translateToIOException(Unknown Source) at sun.nio.fs.UnixException.rethrowAsIOException(Unknown Source) at sun.nio.fs.UnixException.rethrowAsIOException(Unknown Source) at sun.nio.fs.UnixFileSystemProvider.implDelete(Unknown Source) at sun.nio.fs.AbstractFileSystemProvider.deleteIfExists(Unknown Source) at java.nio.file.Files.deleteIfExists(Unknown Source) at hudson.Util.tryOnceDeleteFile(Util.java:316) at hudson.Util.deleteFile(Util.java:272) Also: hudson.remoting.Channel$CallSiteStackTrace: Remote call to taruca at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1741) at hudson.remoting.UserRequest$ExceptionResponse.retrieve(UserRequest.java:357) at hudson.remoting.Channel.call(Channel.java:955) at hudson.FilePath.act(FilePath.java:1070) at hudson.FilePath.act(FilePath.java:1059) at hudson.FilePath.deleteRecursive(FilePath.java:1266) at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController.cleanup(FileMonitoringTask.java:340) at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution$1.run(DurableTaskStep.java:382) at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused: java.io.IOException: Unable to delete '/home/jenkins/ws/BootromSignoff/build@tmp/durable-bcad1b03/.nfs29ee028d2716'. Tried 3 times (of a maximum of 3) waiting 0.1 sec between attempts. at hudson.Util.deleteFile(Util.java:277) at 

[JIRA] (JENKINS-46507) Parallel Pipeline random java.lang.InterruptedException

2018-11-12 Thread s...@ncrmnt.org (JIRA)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Andrew a edited a comment on  JENKINS-46507  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Parallel Pipeline random java.lang.InterruptedException   
 

  
 
 
 
 

 
 I've just ran a few regressions with that insanely huge timeout and the bad news is, the problem didn't completely go away. More, 2 different problems have emerged (I'm now not really sure if they are directly related to this issue, or I should open a new ticket. Posting everything here for now)*First one:* I'm now seeing a pipeline freezing AFTER all the tasks under parallel statement have completed. A restart of jenkins causes some of the steps under parallel to be rerun with the following warning:{code:java}Queue item for node block in SoC » RTL_REGRESSION #255 is missing (perhaps JENKINS-34281); rescheduling{code}But the pipeline completes. I'm also seeing runaway simulation processes that have to be killed by hand. Those kept running after the pipeline has been completed, perhaps due to a master node restart (and thus preventing further builds in that workspace). Not yet sure how I should debug this one. *Second one:*In an attempt to mitigate another the issue (now with old ctest on RHEL, not always handling timeouts correctly)  I've added a timeout() block inside parallel, and that exposed another filesystem/timeout problem:{code} Cancelling nested steps due to timeoutSending interrupt signal to processCancelling nested steps due to timeoutAfter 10s process did not stop java.nio.file.FileSystemException: /home/jenkins/ws/BootromSignoff/build@tmp/durable-bcad1b03/.nfs29ee028d2716: Device or resource busy at sun.nio.fs.UnixException.translateToIOException(Unknown Source) at sun.nio.fs.UnixException.rethrowAsIOException(Unknown Source) at sun.nio.fs.UnixException.rethrowAsIOException(Unknown Source) at sun.nio.fs.UnixFileSystemProvider.implDelete(Unknown Source) at sun.nio.fs.AbstractFileSystemProvider.deleteIfExists(Unknown Source) at java.nio.file.Files.deleteIfExists(Unknown Source) at hudson.Util.tryOnceDeleteFile(Util.java:316) at hudson.Util.deleteFile(Util.java:272) Also: hudson.remoting.Channel$CallSiteStackTrace: Remote call to taruca at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1741) at hudson.remoting.UserRequest$ExceptionResponse.retrieve(UserRequest.java:357) at hudson.remoting.Channel.call(Channel.java:955) at hudson.FilePath.act(FilePath.java:1070) at hudson.FilePath.act(FilePath.java:1059) at hudson.FilePath.deleteRecursive(FilePath.java:1266) at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController.cleanup(FileMonitoringTask.java:340) at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution$1.run(DurableTaskStep.java:382) at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused: java.io.IOException: Unable to delete '/home/jenkins/ws/BootromSignoff/build@tmp/durable-bcad1b03/.nfs29ee028d2716'. Tried 3 times (of a maximum of 3) waiting 0.1 sec between attempts. at hudson.Util.deleteFile(Util.java:277) at hudson.FilePath.deleteRecursive(FilePath.java:1303) at 

[JIRA] (JENKINS-46507) Parallel Pipeline random java.lang.InterruptedException

2018-11-12 Thread s...@ncrmnt.org (JIRA)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Andrew a edited a comment on  JENKINS-46507  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Parallel Pipeline random java.lang.InterruptedException   
 

  
 
 
 
 

 
 I've just ran a few regressions with that insanely huge timeout and the bad news is, the problem didn't completely go away. More, 2 different problems have emerged (I'm now not really sure if they are directly related to this issue, or I should open a new ticket. Posting everything here for now)*First one:* I'm now seeing a pipeline freezing AFTER all the tasks under parallel statement have completed. A restart of jenkins causes some of the steps under parallel to be rerun with the following warning:{code:java}Queue item for node block in SoC » RTL_REGRESSION #255 is missing (perhaps JENKINS-34281); rescheduling{code}But the pipeline completes. I'm also seeing runaway simulation processes that have to be killed by hand. Those kept running after the pipeline has been completed, perhaps due to a master node restart (and thus preventing further builds in that workspace). Not yet sure how I should debug this one. *Second one:*In an attempt to mitigate another the issue (now with old ctest on RHEL, not always handling timeouts correctly)  I've added a timeout() block inside parallel, and that exposed another filesystem/timeout problem:{code :java } Cancelling nested steps due to timeoutSending interrupt signal to processCancelling nested steps due to timeoutAfter 10s process did not stop java.nio.file.FileSystemException: /home/jenkins/ws/BootromSignoff/build@tmp/durable-bcad1b03/.nfs29ee028d2716: Device or resource busy at sun.nio.fs.UnixException.translateToIOException(Unknown Source) at sun.nio.fs.UnixException.rethrowAsIOException(Unknown Source) at sun.nio.fs.UnixException.rethrowAsIOException(Unknown Source) at sun.nio.fs.UnixFileSystemProvider.implDelete(Unknown Source) at sun.nio.fs.AbstractFileSystemProvider.deleteIfExists(Unknown Source) at java.nio.file.Files.deleteIfExists(Unknown Source) at hudson.Util.tryOnceDeleteFile(Util.java:316) at hudson.Util.deleteFile(Util.java:272) Also: hudson.remoting.Channel$CallSiteStackTrace: Remote call to taruca at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1741) at hudson.remoting.UserRequest$ExceptionResponse.retrieve(UserRequest.java:357) at hudson.remoting.Channel.call(Channel.java:955) at hudson.FilePath.act(FilePath.java:1070) at hudson.FilePath.act(FilePath.java:1059) at hudson.FilePath.deleteRecursive(FilePath.java:1266) at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController.cleanup(FileMonitoringTask.java:340) at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution$1.run(DurableTaskStep.java:382) at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused: java.io.IOException: Unable to delete '/home/jenkins/ws/BootromSignoff/build@tmp/durable-bcad1b03/.nfs29ee028d2716'. Tried 3 times (of a maximum of 3) waiting 0.1 sec between attempts. at hudson.Util.deleteFile(Util.java:277) at hudson.FilePath.deleteRecursive(FilePath.java:1303) at 

[JIRA] (JENKINS-46507) Parallel Pipeline random java.lang.InterruptedException

2018-11-12 Thread s...@ncrmnt.org (JIRA)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Andrew a edited a comment on  JENKINS-46507  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Parallel Pipeline random java.lang.InterruptedException   
 

  
 
 
 
 

 
 I've just ran a few regressions with that insanely huge timeout and the bad news is, the problem didn't completely go away. More, 2 different problems have emerged (I'm now not really sure if they are directly related to this issue, or I should open a new ticket. Posting everything here for now)*First one:* I'm now seeing a pipeline freezing AFTER all the tasks under parallel statement have completed. A restart of jenkins causes some of the steps under parallel to be rerun with the following warning:{code:java}Queue item for node block in SoC » RTL_REGRESSION #255 is missing (perhaps JENKINS-34281); rescheduling{code}But the pipeline completes. I'm also seeing runaway simulation processes that have to be killed by hand. Those kept running after the pipeline has been completed, perhaps due to a master node restart (and thus preventing further builds in that workspace). Not yet sure how I should debug this one.  *Second one:*In an attempt to mitigate another the issue (now with old ctest on RHEL, not always handling timeouts correctly)  I've added a timeout() block inside parallel, and that exposed another filesystem/timeout problem: {code:java} Cancelling nested steps due to timeoutSending interrupt signal to processCancelling nested steps due to timeoutAfter 10s process did not stopjava.nio.file.FileSystemException: /home/jenkins/ws/BootromSignoff/build@tmp/durable-bcad1b03/.nfs29ee028d2716: Device or resource busy at sun.nio.fs.UnixException.translateToIOException(Unknown Source) at sun.nio.fs.UnixException.rethrowAsIOException(Unknown Source) at sun.nio.fs.UnixException.rethrowAsIOException(Unknown Source) at sun.nio.fs.UnixFileSystemProvider.implDelete(Unknown Source) at sun.nio.fs.AbstractFileSystemProvider.deleteIfExists(Unknown Source) at java.nio.file.Files.deleteIfExists(Unknown Source) at hudson.Util.tryOnceDeleteFile(Util.java:316) at hudson.Util.deleteFile(Util.java:272)Also:   hudson.remoting.Channel$CallSiteStackTrace: Remote call to taruca  at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1741)  at hudson.remoting.UserRequest$ExceptionResponse.retrieve(UserRequest.java:357)  at hudson.remoting.Channel.call(Channel.java:955)  at hudson.FilePath.act(FilePath.java:1070)  at hudson.FilePath.act(FilePath.java:1059)  at hudson.FilePath.deleteRecursive(FilePath.java:1266)  at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController.cleanup(FileMonitoringTask.java:340)  at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution$1.run(DurableTaskStep.java:382)  at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58)  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)  at java.util.concurrent.FutureTask.run(FutureTask.java:266)  at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)  at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)  at java.lang.Thread.run(Thread.java:748)Caused: java.io.IOException: Unable to delete '/home/jenkins/ws/BootromSignoff/build@tmp/durable-bcad1b03/.nfs29ee028d2716'. Tried 3 times (of a maximum of 3) waiting 0.1 sec between attempts. at hudson.Util.deleteFile(Util.java:277) at 

[JIRA] (JENKINS-46507) Parallel Pipeline random java.lang.InterruptedException

2018-11-12 Thread s...@ncrmnt.org (JIRA)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Andrew a commented on  JENKINS-46507  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Parallel Pipeline random java.lang.InterruptedException   
 

  
 
 
 
 

 
 I've just ran a few regressions with that insanely huge timeout and the bad news is, the problem didn't completely go away. More, 2 different problems have emerged (I'm now not really sure if they are directly related to this issue, or I should open a new ticket. Posting everything here for now) First one: I'm now seeing a pipeline freezing AFTER all the tasks under parallel statement have completed. A restart of jenkins causes some of the steps under parallel to be rerun with the following warning: 

 

Queue item for node block in SoC » RTL_REGRESSION #255 is missing (perhaps JENKINS-34281); rescheduling
 

 But the pipeline completes. I'm also seeing runaway simulation processes that have to be killed by hand. Those kept running after the pipeline has been completed, perhaps due to a master node restart (and thus preventing further builds in that workspace). Not yet sure how I should debug this one.    Second one: In an attempt to mitigate another the issue (now with old ctest on RHEL, not always handling timeouts correctly)  I've added a timeout() block inside parallel, and that exposed another filesystem/timeout problem: Cancelling nested steps due to timeoutSending interrupt signal to processCancelling nested steps due to timeoutAfter 10s process did not stop java.nio.file.FileSystemException: /home/jenkins/ws/BootromSignoff/build@tmp/durable-bcad1b03/.nfs29ee028d2716: Device or resource busy at sun.nio.fs.UnixException.translateToIOException(Unknown Source) at sun.nio.fs.UnixException.rethrowAsIOException(Unknown Source) at sun.nio.fs.UnixException.rethrowAsIOException(Unknown Source) at sun.nio.fs.UnixFileSystemProvider.implDelete(Unknown Source) at sun.nio.fs.AbstractFileSystemProvider.deleteIfExists(Unknown Source) at java.nio.file.Files.deleteIfExists(Unknown Source) at hudson.Util.tryOnceDeleteFile(Util.java:316) at hudson.Util.deleteFile(Util.java:272) Also: hudson.remoting.Channel$CallSiteStackTrace: Remote call to taruca at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1741) at hudson.remoting.UserRequest$ExceptionResponse.retrieve(UserRequest.java:357) at hudson.remoting.Channel.call(Channel.java:955) at hudson.FilePath.act(FilePath.java:1070) at hudson.FilePath.act(FilePath.java:1059) at hudson.FilePath.deleteRecursive(FilePath.java:1266) at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController.cleanup(FileMonitoringTask.java:340) at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution$1.run(DurableTaskStep.java:382) at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused: java.io.IOException: Unable 

[JIRA] (JENKINS-46507) Parallel Pipeline random java.lang.InterruptedException

2018-11-07 Thread ian.cot...@clearswift.com (JIRA)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Ian Cotton commented on  JENKINS-46507  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Parallel Pipeline random java.lang.InterruptedException   
 

  
 
 
 
 

 
 I realised we are running an old version of the plugin (2.17). unfortunately we are also running Jenkins 2.73.2 and the plugin versions beyond 2.22 require 2.73.3. We can try the plugin on our newer replacement Jenkins but we don't have the failing builds on those Jenkins. We are working to improve matters but I don't think we will be in a position to test this properly for at least a few days.  
 

  
 
 
 
 

 
 
 

 
 
 Add Comment  
 

  
 

  
 
 
 
  
 

  
 
 
 
 

 
 This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d)  
 

  
 

   





-- 
You received this message because you are subscribed to the Google Groups "Jenkins Issues" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[JIRA] (JENKINS-46507) Parallel Pipeline random java.lang.InterruptedException

2018-11-06 Thread dnusb...@cloudbees.com (JIRA)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Devin Nusbaum commented on  JENKINS-46507  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Parallel Pipeline random java.lang.InterruptedException   
 

  
 
 
 
 

 
 For now, I went ahead and filed https://github.com/jenkinsci/workflow-durable-task-step-plugin/pull/90 to allow the timeout to be configured by system property. Feel free to pull down the incremental build of that PR once it is complete for testing if you are already running workflow-durable-task-step 2.26, or review/comment on the PR. Thanks!  
 

  
 
 
 
 

 
 
 

 
 
 Add Comment  
 

  
 

  
 
 
 
  
 

  
 
 
 
 

 
 This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d)  
 

  
 

   





-- 
You received this message because you are subscribed to the Google Groups "Jenkins Issues" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[JIRA] (JENKINS-46507) Parallel Pipeline random java.lang.InterruptedException

2018-11-06 Thread ian.cot...@clearswift.com (JIRA)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Ian Cotton commented on  JENKINS-46507  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Parallel Pipeline random java.lang.InterruptedException   
 

  
 
 
 
 

 
 Andrew a I am not able to build the plugin at the moment. My Jenkins server doesn't have the required plugins. We are setting up some new server and hopefully I can use one of them.  
 

  
 
 
 
 

 
 
 

 
 
 Add Comment  
 

  
 

  
 
 
 
  
 

  
 
 
 
 

 
 This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d)  
 

  
 

   





-- 
You received this message because you are subscribed to the Google Groups "Jenkins Issues" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[JIRA] (JENKINS-46507) Parallel Pipeline random java.lang.InterruptedException

2018-11-06 Thread dnusb...@cloudbees.com (JIRA)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Devin Nusbaum assigned an issue to Devin Nusbaum  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
 Jenkins /  JENKINS-46507  
 
 
  Parallel Pipeline random java.lang.InterruptedException   
 

  
 
 
 
 

 
Change By: 
 Devin Nusbaum  
 
 
Assignee: 
 Jose Blas Camacho Taboada Devin Nusbaum  
 

  
 
 
 
 

 
 
 

 
 
 Add Comment  
 

  
 

  
 
 
 
  
 

  
 
 
 
 

 
 This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d)  
 

  
 

   





-- 
You received this message because you are subscribed to the Google Groups "Jenkins Issues" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[JIRA] (JENKINS-46507) Parallel Pipeline random java.lang.InterruptedException

2018-11-06 Thread s...@ncrmnt.org (JIRA)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Andrew a edited a comment on  JENKINS-46507  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Parallel Pipeline random java.lang.InterruptedException   
 

  
 
 
 
 

 
 [~dnusbaum] I'll recompile the plugin with a 60 second timeout and fire up the next regression tomorrow, expect results somewhere by friday/saturday. A facility to override this timeout would be very useful, because (according to zabbix) the iowait fluctuations were barely noticeable during backup. At higher loads things will get waaay worse, so I'd also put a note somewhere in README/TROUBLESHOOTING section.  [~icotton64] No problem, I fully understand. Can you  still  recompile it  yourself  with a 30 second timeout (see my patch above) and give it a try? This way we'll provide [~dnusbaum] some results about a suitable timeout faster.   
 

  
 
 
 
 

 
 
 

 
 
 Add Comment  
 

  
 

  
 
 
 
  
 

  
 
 
 
 

 
 This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d)  
 

  
 

   





-- 
You received this message because you are subscribed to the Google Groups "Jenkins Issues" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[JIRA] (JENKINS-46507) Parallel Pipeline random java.lang.InterruptedException

2018-11-06 Thread s...@ncrmnt.org (JIRA)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Andrew a edited a comment on  JENKINS-46507  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Parallel Pipeline random java.lang.InterruptedException   
 

  
 
 
 
 

 
 [~dnusbaum] I'll recompile the plugin with a 60 second timeout and fire up the next regression tomorrow, expect results somewhere by friday/saturday. A facility to override this timeout would be very useful, because (according to zabbix) the iowait fluctuations were barely noticeable during backup. At higher loads things will get waaay worse, so I'd also put a note somewhere in README/TROUBLESHOOTING section.  [~icotton64] No problem, I fully understand. Can you  still  recompile it with a 30 second timeout (see my patch above) and give it a try? This way we'll provide [~dnusbaum] some results about a suitable timeout faster.   
 

  
 
 
 
 

 
 
 

 
 
 Add Comment  
 

  
 

  
 
 
 
  
 

  
 
 
 
 

 
 This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d)  
 

  
 

   





-- 
You received this message because you are subscribed to the Google Groups "Jenkins Issues" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[JIRA] (JENKINS-46507) Parallel Pipeline random java.lang.InterruptedException

2018-11-06 Thread s...@ncrmnt.org (JIRA)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Andrew a edited a comment on  JENKINS-46507  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Parallel Pipeline random java.lang.InterruptedException   
 

  
 
 
 
 

 
 [~dnusbaum] I'll recompile the plugin with a 60 second timeout and fire up the next regression tomorrow, expect results somewhere by friday/saturday. A facility to override this timeout would be very useful, because (according to zabbix) the iowait fluctuations were barely noticeable during backup. At higher loads things will get waaay worse, so I'd also put a note somewhere in README/TROUBLESHOOTING section.      [~icotton64] No problem, I fully understand. Can you recompile it with a 30 second timeout (see my patch above) and give it a try? This way we'll provide [~dnusbaum] some results about a suitable timeout faster.   
 

  
 
 
 
 

 
 
 

 
 
 Add Comment  
 

  
 

  
 
 
 
  
 

  
 
 
 
 

 
 This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d)  
 

  
 

   





-- 
You received this message because you are subscribed to the Google Groups "Jenkins Issues" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[JIRA] (JENKINS-46507) Parallel Pipeline random java.lang.InterruptedException

2018-11-06 Thread s...@ncrmnt.org (JIRA)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Andrew a commented on  JENKINS-46507  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Parallel Pipeline random java.lang.InterruptedException   
 

  
 
 
 
 

 
 Devin Nusbaum I'll recompile the plugin with a 60 second timeout and fire up the next regression tomorrow, expect results somewhere by friday/saturday. A facility to override this timeout would be very useful, because (according to zabbix) the iowait fluctuations were barely noticeable during backup. At higher loads things will get waaay worse, so I'd also put a note somewhere in README/TROUBLESHOOTING section.   
 

  
 
 
 
 

 
 
 

 
 
 Add Comment  
 

  
 

  
 
 
 
  
 

  
 
 
 
 

 
 This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d)  
 

  
 

   





-- 
You received this message because you are subscribed to the Google Groups "Jenkins Issues" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[JIRA] (JENKINS-46507) Parallel Pipeline random java.lang.InterruptedException

2018-11-06 Thread ian.cot...@clearswift.com (JIRA)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Ian Cotton commented on  JENKINS-46507  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Parallel Pipeline random java.lang.InterruptedException   
 

  
 
 
 
 

 
 Sorry Andrew a the system we are seeing the issue on is our production system and I really don't want to risk installing a pre-compiled plugin on it.  
 

  
 
 
 
 

 
 
 

 
 
 Add Comment  
 

  
 

  
 
 
 
  
 

  
 
 
 
 

 
 This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d)  
 

  
 

   





-- 
You received this message because you are subscribed to the Google Groups "Jenkins Issues" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[JIRA] (JENKINS-46507) Parallel Pipeline random java.lang.InterruptedException

2018-11-06 Thread dnusb...@cloudbees.com (JIRA)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Devin Nusbaum edited a comment on  JENKINS-46507  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Parallel Pipeline random java.lang.InterruptedException   
 

  
 
 
 
 

 
 [~ncrmnt]  any Thanks for working on the issue! Any  chance you are able to play with the timeout to figure out a lower bound of what works for you? I would be fine increasing the limit to maybe 30-60 seconds by default, and providing a system property to increase it further, but I'm not sure about increasing the default further than that.    
 

  
 
 
 
 

 
 
 

 
 
 Add Comment  
 

  
 

  
 
 
 
  
 

  
 
 
 
 

 
 This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d)  
 

  
 

   





-- 
You received this message because you are subscribed to the Google Groups "Jenkins Issues" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[JIRA] (JENKINS-46507) Parallel Pipeline random java.lang.InterruptedException

2018-11-06 Thread dnusb...@cloudbees.com (JIRA)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Devin Nusbaum commented on  JENKINS-46507  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Parallel Pipeline random java.lang.InterruptedException   
 

  
 
 
 
 

 
 Andrew a any chance you are able to play with the timeout to figure out a lower bound of what works for you? I would be fine increasing the limit to maybe 30-60 seconds by default, and providing a system property to increase it further, but I'm not sure about increasing the default further than that.  
 

  
 
 
 
 

 
 
 

 
 
 Add Comment  
 

  
 

  
 
 
 
  
 

  
 
 
 
 

 
 This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d)  
 

  
 

   





-- 
You received this message because you are subscribed to the Google Groups "Jenkins Issues" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[JIRA] (JENKINS-46507) Parallel Pipeline random java.lang.InterruptedException

2018-11-06 Thread s...@ncrmnt.org (JIRA)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Andrew a edited a comment on  JENKINS-46507  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Parallel Pipeline random java.lang.InterruptedException   
 

  
 
 
 
 

 
 [~icotton64], can you give my proposed patch with increased timeouts a try?Here's my pre-compiled version of the plugin for your convenience.  (I hope sharing a precompiled plugin is allowed here, if not - sorry)  [^workflow-durable-task-step.hpi]   
 

  
 
 
 
 

 
 
 

 
 
 Add Comment  
 

  
 

  
 
 
 
  
 

  
 
 
 
 

 
 This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d)  
 

  
 

   





-- 
You received this message because you are subscribed to the Google Groups "Jenkins Issues" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[JIRA] (JENKINS-46507) Parallel Pipeline random java.lang.InterruptedException

2018-11-06 Thread s...@ncrmnt.org (JIRA)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Andrew a edited a comment on  JENKINS-46507  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Parallel Pipeline random java.lang.InterruptedException   
 

  
 
 
 
 

 
 [~icotton64], can you give my proposed patch with increased timeouts a try?Here's my pre-compiled version of the plugin for your convenience.  [^workflow-durable-task-step.hpi]   
 

  
 
 
 
 

 
 
 

 
 
 Add Comment  
 

  
 

  
 
 
 
  
 

  
 
 
 
 

 
 This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d)  
 

  
 

   





-- 
You received this message because you are subscribed to the Google Groups "Jenkins Issues" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[JIRA] (JENKINS-46507) Parallel Pipeline random java.lang.InterruptedException

2018-11-06 Thread s...@ncrmnt.org (JIRA)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Andrew a edited a comment on  JENKINS-46507  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Parallel Pipeline random java.lang.InterruptedException   
 

  
 
 
 
 

 
 [~icotton64], can you give my proposed patch with increased timeouts a try? Here's my pre-compiled version of the plugin for your convenience.  
 

  
 
 
 
 

 
 
 

 
 
 Add Comment  
 

  
 

  
 
 
 
  
 

  
 
 
 
 

 
 This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d)  
 

  
 

   





-- 
You received this message because you are subscribed to the Google Groups "Jenkins Issues" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[JIRA] (JENKINS-46507) Parallel Pipeline random java.lang.InterruptedException

2018-11-06 Thread s...@ncrmnt.org (JIRA)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Andrew a commented on  JENKINS-46507  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Parallel Pipeline random java.lang.InterruptedException   
 

  
 
 
 
 

 
 Ian Cotton, can you give my proposed patch with increased timeouts a try?  
 

  
 
 
 
 

 
 
 

 
 
 Add Comment  
 

  
 

  
 
 
 
  
 

  
 
 
 
 

 
 This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d)  
 

  
 

   





-- 
You received this message because you are subscribed to the Google Groups "Jenkins Issues" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[JIRA] (JENKINS-46507) Parallel Pipeline random java.lang.InterruptedException

2018-11-06 Thread ian.cot...@clearswift.com (JIRA)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Ian Cotton commented on  JENKINS-46507  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Parallel Pipeline random java.lang.InterruptedException   
 

  
 
 
 
 

 
 I now get the following exception from the DurableTaskStep log: Interrupting Thread[org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep 88150: checking /var/lib/jenkins/workspace/GW_GW_Pipeline_Build_trunk-KY5NRFS7BWLOEMDFRPJGO5IAG4Y6DFVRUT333ENYFD5NOOFQINTA on RedHatSlave10 / waiting for RedHatSlave10 id=16567391,5,main] after 10 SECONDS java.lang.Throwable at java.lang.Object.wait(Native Method) at hudson.remoting.Request.call(Request.java:169) at hudson.remoting.Channel.call(Channel.java:904) at hudson.FilePath.act(FilePath.java:987) at hudson.FilePath.act(FilePath.java:976) at hudson.FilePath.isDirectory(FilePath.java:1514) at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.getWorkspace(DurableTaskStep.java:193) at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.check(DurableTaskStep.java:306) at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.run(DurableTaskStep.java:289) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748)   So it looks like it is timing out calling the node.  
 

  
 
 
 
 

 
 
 

 
 
 Add Comment  
 

  
 

  
 
 
 
  
 

  
 
 
 
 

 
 This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d)  
 

  
 

   





-- 
You received this message because you are subscribed to the Google Groups "Jenkins Issues" group.
To unsubscribe from this group and stop receiving emails from it, send an email to 

[JIRA] (JENKINS-46507) Parallel Pipeline random java.lang.InterruptedException

2018-11-06 Thread s...@ncrmnt.org (JIRA)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Andrew a commented on  JENKINS-46507  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Parallel Pipeline random java.lang.InterruptedException   
 

  
 
 
 
 

 
 Devin Nusbaum, yes, I've just included those in the code section, see my comments above (apparently JIRA has hidden those already). All of them are related to file operations.  Anyways, I think I have resolved the issue on my setup with this patch (600 seconds is an overkill, but I just had to test it):    

 

@@ -311,7 +311,7 @@ public abstract class DurableTaskStep extends Step {
 }
 }
 boolean directory;
-try (Timeout timeout = Timeout.limit(10, TimeUnit.SECONDS)) {
+try (Timeout timeout = Timeout.limit(600, TimeUnit.SECONDS)) {
 directory = ws.isDirectory();
 } catch (Exception x) {
 getWorkspaceProblem(x);
@@ -451,7 +451,7 @@ public abstract class DurableTaskStep extends Step {
 return; // slave not yet ready, wait for another day
 }
 TaskListener listener = listener();
-try (Timeout timeout = Timeout.limit(10, TimeUnit.SECONDS)) {
+try (Timeout timeout = Timeout.limit(600, TimeUnit.SECONDS)) {
 if (watching) {
 Integer exitCode = controller.exitStatus(workspace, launcher(), listener);
 if (exitCode == null) {

 

 It seems that the 10 second timeout for file-based operations is something causing the problem under heavy disc/network load. During all three runs with FINE exception logging those occurred during routine backups. I also found the following workaround: A restart of the master will cause pipeline to resume and correctly reconnect with all the nodes, effectively unfreezing the job.    
 

  
 
 
 
 

 
 
 

 
 
 Add Comment  
 

  
 

  
 
 
 
  
 

  
 
 
 
 

 
 This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d)  
 
   

[JIRA] (JENKINS-46507) Parallel Pipeline random java.lang.InterruptedException

2018-11-05 Thread ian.cot...@clearswift.com (JIRA)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Ian Cotton commented on  JENKINS-46507  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Parallel Pipeline random java.lang.InterruptedException   
 

  
 
 
 
 

 
 Thanks Devin Nusbaum. No, I didn't observe any networking related issues. I monitor both machines using both Nagios and Prometheus. the granularity of monitoring on Prometheus is every minute and it didn't see any disconnect.  
 

  
 
 
 
 

 
 
 

 
 
 Add Comment  
 

  
 

  
 
 
 
  
 

  
 
 
 
 

 
 This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d)  
 

  
 

   





-- 
You received this message because you are subscribed to the Google Groups "Jenkins Issues" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[JIRA] (JENKINS-46507) Parallel Pipeline random java.lang.InterruptedException

2018-11-05 Thread dnusb...@cloudbees.com (JIRA)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Devin Nusbaum edited a comment on  JENKINS-46507  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Parallel Pipeline random java.lang.InterruptedException   
 

  
 
 
 
 

 
 [~icotton64] Thanks for adding the stack trace! If [~ncrmnt] is seeing identical stack traces, then the patch they are testing will not have any affect, because it looks like the timeout that is timing out is in {{DurableTaskStep#getWorkspace}} [here|https://github.com/jenkinsci/workflow-durable-task-step-plugin/blob/00f94234cfb54181980352f9290ce987edde9b17/src/main/java/org/jenkinsci/plugins/workflow/steps/durable_task/DurableTaskStep.java#L314], before the timeout in {{DurableTaskStep#check}} itself. It could be worth trying to increase that timeout just to see if it changes the behavior  (though my initial guess though is that if we don't get a response after 10 seconds we will never get one) , but given that the stuck operation is a remoting call, my guess is that this is a networking issue (perhaps an issue in Jenkins' networking stack (remoting) rather than something wrong with the network itself if your system monitoring shows no network issues). Did you see any network-related issues on the machine on which Jenkins was running around the time of the issue?  
 

  
 
 
 
 

 
 
 

 
 
 Add Comment  
 

  
 

  
 
 
 
  
 

  
 
 
 
 

 
 This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d)  
 

  
 

   





-- 
You received this message because you are subscribed to the Google Groups "Jenkins Issues" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[JIRA] (JENKINS-46507) Parallel Pipeline random java.lang.InterruptedException

2018-11-05 Thread dnusb...@cloudbees.com (JIRA)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Devin Nusbaum commented on  JENKINS-46507  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Parallel Pipeline random java.lang.InterruptedException   
 

  
 
 
 
 

 
 Ian Cotton Thanks for adding the stack trace! If Andrew a is seeing identical stack traces, then the patch they are testing will not have any affect, because it looks like the timeout that is timing out is in DurableTaskStep#getWorkspace here, before the timeout in DurableTaskStep#check itself. It could be worth trying to increase that timeout just to see if it changes the behavior, but given that the stuck operation is a remoting call, my guess is that this is a networking issue (perhaps an issue in Jenkins' networking stack (remoting) rather than something wrong with the network itself if your system monitoring shows no network issues). Did you see any network-related issues on the machine on which Jenkins was running around the time of the issue?  
 

  
 
 
 
 

 
 
 

 
 
 Add Comment  
 

  
 

  
 
 
 
  
 

  
 
 
 
 

 
 This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d)  
 

  
 

   





-- 
You received this message because you are subscribed to the Google Groups "Jenkins Issues" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[JIRA] (JENKINS-46507) Parallel Pipeline random java.lang.InterruptedException

2018-11-05 Thread ian.cot...@clearswift.com (JIRA)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Ian Cotton commented on  JENKINS-46507  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Parallel Pipeline random java.lang.InterruptedException   
 

  
 
 
 
 

 
 I have the custom logger turned on. I have just added additional logging to it for DurableTask, so no results from that yet, just for the Timeout class. We are getting many exceptions as below. Our Jenkins master runs on an NFS file system, which ought to be reasonably quick but apparently may have issues. Interrupting Thread[org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep 78969: checking /var/lib/jenkins/workspace/anches_ert_gw-ert-custard-3-GARHGYOSDUCJ4EHKSORKR3E2RAFWD3A5ZVBAKP6LQ5ZKQZBMWUQA on RedHatSlave3 / waiting for RedHatSlave3 id=14810179,5,main] after 10 SECONDS java.lang.Throwable at java.lang.Object.wait(Native Method) at hudson.remoting.Request.call(Request.java:169) at hudson.remoting.Channel.call(Channel.java:904) at hudson.FilePath.act(FilePath.java:987) at hudson.FilePath.act(FilePath.java:976) at hudson.FilePath.isDirectory(FilePath.java:1514) at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.getWorkspace(DurableTaskStep.java:193) at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.check(DurableTaskStep.java:306) at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.run(DurableTaskStep.java:289) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748)  
 

  
 
 
 
 

 
 
 

 
 
 Add Comment  
 

  
 

  
 
 
 
  
 

  
 
 
 
 

 
 This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d)  
 

  
 

   





-- 
You received this message 

[JIRA] (JENKINS-46507) Parallel Pipeline random java.lang.InterruptedException

2018-11-05 Thread dnusb...@cloudbees.com (JIRA)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Devin Nusbaum commented on  JENKINS-46507  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Parallel Pipeline random java.lang.InterruptedException   
 

  
 
 
 
 

 
 Andrew a Are you able to upload the content of the custom logger for org.jenkinsci.plugins.workflow.support.concurrent.Timeout (or was it blank?)? Custom loggers are not included in the normal jenkins.log, so you'll have to add it separately. Along with your patch, it could be interesting to add another custom logger at FINE level for org.jenkinsci.plugins.workflow.steps.durable_task.DurableTask.  
 

  
 
 
 
 

 
 
 

 
 
 Add Comment  
 

  
 

  
 
 
 
  
 

  
 
 
 
 

 
 This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d)  
 

  
 

   





-- 
You received this message because you are subscribed to the Google Groups "Jenkins Issues" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[JIRA] (JENKINS-46507) Parallel Pipeline random java.lang.InterruptedException

2018-11-03 Thread s...@ncrmnt.org (JIRA)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Andrew a edited a comment on  JENKINS-46507  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Parallel Pipeline random java.lang.InterruptedException   
 

  
 
 
 
 

 
 A little follow-up. Carefully inspecting the zabbix logs it turned out that the issue was during the time when we were performing routine backups of the NFS home directories at night.  iowait is still very low, but a little more than usual.  We didn't hit this bug every night, but almost always on the second night. My hypothesis: As the builds are performed on the NFS share, this may have resulted in filesystem lags. I have applied the following patch (600 s is quite an overkill, but let's see if it helps) and restarted the regression.{code:java}+++ b/src/main/java/org/jenkinsci/plugins/workflow/steps/durable_task/DurableTaskStep.java@@ -451,7 +451,7 @@ public abstract class DurableTaskStep extends Step { return; // slave not yet ready, wait for another day } TaskListener listener = listener();-try (Timeout timeout = Timeout.limit(10, TimeUnit.SECONDS)) {+try (Timeout timeout = Timeout.limit(600, TimeUnit.SECONDS)) { if (watching) { Integer exitCode = controller.exitStatus(workspace, launcher(), listener); if (exitCode == null) {{code}A question to other folks experiencing the issue: How much RAM/disk io do your processes take? May it be that a huge memory-hungry process being terminated will freeze for a while (>10 seconds), syncing data to the disk?  
 

  
 
 
 
 

 
 
 

 
 
 Add Comment  
 

  
 

  
 
 
 
  
 

  
 
 
 
 

 
 This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d)  
 

  
 

   





-- 
You received this message because you are subscribed to the Google Groups "Jenkins Issues" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[JIRA] (JENKINS-46507) Parallel Pipeline random java.lang.InterruptedException

2018-11-03 Thread s...@ncrmnt.org (JIRA)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Andrew a edited a comment on  JENKINS-46507  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Parallel Pipeline random java.lang.InterruptedException   
 

  
 
 
 
 

 
 A little follow-up. Carefully inspecting the zabbix logs it turned out that the issue was during the time when we were performing routine backups of the NFS home directories  at night .  We didn't hit this bug every night, but almost always on the second night. My hypothesis:  As the builds are performed on the NFS share, this may have resulted in filesystem lags. I have applied the following patch (600 s is quite an overkill, but let's see if it helps) and restarted the regression.{code:java}+++ b/src/main/java/org/jenkinsci/plugins/workflow/steps/durable_task/DurableTaskStep.java@@ -451,7 +451,7 @@ public abstract class DurableTaskStep extends Step { return; // slave not yet ready, wait for another day } TaskListener listener = listener();-try (Timeout timeout = Timeout.limit(10, TimeUnit.SECONDS)) {+try (Timeout timeout = Timeout.limit(600, TimeUnit.SECONDS)) { if (watching) { Integer exitCode = controller.exitStatus(workspace, launcher(), listener); if (exitCode == null) {{code}A question to other folks experiencing the issue: How much RAM/disk io do your processes take? May it be that a huge memory-hungry process being terminated will freeze for a while (>10 seconds), syncing data to the disk?  
 

  
 
 
 
 

 
 
 

 
 
 Add Comment  
 

  
 

  
 
 
 
  
 

  
 
 
 
 

 
 This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d)  
 

  
 

   





-- 
You received this message because you are subscribed to the Google Groups "Jenkins Issues" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[JIRA] (JENKINS-46507) Parallel Pipeline random java.lang.InterruptedException

2018-11-03 Thread s...@ncrmnt.org (JIRA)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Andrew a commented on  JENKINS-46507  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Parallel Pipeline random java.lang.InterruptedException   
 

  
 
 
 
 

 
 A little follow-up. Carefully inspecting the zabbix logs it turned out that the issue was during the time when we were performing routine backups of the NFS home directories. As the builds are performed on the NFS share, this may have resulted in filesystem lags. I have applied the following patch (600 s is quite an overkill, but let's see if it helps) and restarted the regression. 

 

+++ b/src/main/java/org/jenkinsci/plugins/workflow/steps/durable_task/DurableTaskStep.java
@@ -451,7 +451,7 @@ public abstract class DurableTaskStep extends Step {
 return; // slave not yet ready, wait for another day
 }
 TaskListener listener = listener();
-try (Timeout timeout = Timeout.limit(10, TimeUnit.SECONDS)) {
+try (Timeout timeout = Timeout.limit(600, TimeUnit.SECONDS)) {
 if (watching) {
 Integer exitCode = controller.exitStatus(workspace, launcher(), listener);
 if (exitCode == null) {

 

 A question to other folks experiencing the issue: How much RAM/disk io do your processes take? May it be that a huge memory-hungry process being terminated will freeze for a while (>10 seconds), syncing data to the disk?  
 

  
 
 
 
 

 
 
 

 
 
 Add Comment  
 

  
 

  
 
 
 
  
 

  
 
 
 
 

 
 This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d)  
 

  
 

   





-- 
You received this message because you are subscribed to the Google Groups "Jenkins Issues" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[JIRA] (JENKINS-46507) Parallel Pipeline random java.lang.InterruptedException

2018-10-28 Thread s...@ncrmnt.org (JIRA)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Andrew a edited a comment on  JENKINS-46507  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Parallel Pipeline random java.lang.InterruptedException   
 

  
 
 
 
 

 
 Before actually starting the job for this weekend, I went to the folks at our IT department and talked about the issue. We looked at the zabbix logs and could see no problem with network connectivity during the previous runs, and the nodes still had plenty of free resources. The VM running jenkins was NOT being backed up (VM backup might break/freeze TCP connections).  The only thing that might have interferred - NFS home directories backup that is done every evening, since that will introduce some file system lags. We'll have a good look at the zabbix logs once again on monday. Meanwhile, here are the FINE exception logs:{code:java}Oct 28, 2018 12:58:13 AM FINE org.jenkinsci.plugins.workflow.support.concurrent.TimeoutInterrupting org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep [#222]: checking /home/jenkins/basis_rtl_regression/build on chinkara / waiting for chinkara id=541659 after 10 SECONDSjava.lang.Throwable at java.lang.Object.wait(Native Method) at hudson.remoting.Request.call(Request.java:177) at hudson.remoting.Channel.call(Channel.java:954) at hudson.FilePath.act(FilePath.java:1070) at hudson.FilePath.act(FilePath.java:1059) at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController.writeLog(FileMonitoringTask.java:199) at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.check(DurableTaskStep.java:457) at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.run(DurableTaskStep.java:416) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748)Oct 28, 2018 12:58:14 AM FINE org.jenkinsci.plugins.workflow.support.concurrent.TimeoutInterrupting org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep [#159]: checking /home/jenkins/basis_rtl_regression/build on oryx / waiting for oryx id=541661 after 10 SECONDSjava.lang.Throwable at java.lang.Object.wait(Native Method) at hudson.remoting.Request.call(Request.java:177) at hudson.remoting.Channel.call(Channel.java:954) at hudson.FilePath.act(FilePath.java:1070) at hudson.FilePath.act(FilePath.java:1059) at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController.writeLog(FileMonitoringTask.java:199) at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.check(DurableTaskStep.java:457) at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.run(DurableTaskStep.java:416) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748)Oct 28, 2018 1:26:27 AM FINE 

[JIRA] (JENKINS-46507) Parallel Pipeline random java.lang.InterruptedException

2018-10-28 Thread s...@ncrmnt.org (JIRA)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Andrew a commented on  JENKINS-46507  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Parallel Pipeline random java.lang.InterruptedException   
 

  
 
 
 
 

 
 Before actually starting the job for this weekend, I went to the folks at our IT department and talked about the issue. We looked at the zabbix logs and could see no problem with network connectivity during the previous runs, and the nodes still had plenty of free resources. The VM running jenkins was NOT being backed up (VM backup might break/freeze TCP connections).  The only thing that might have interferred - NFS home directories backup that is done every evening, since that will introduce some file system lags. We'll have a good look at the zabbix logs once again on monday. Meanwhile, here are the FINE exception logs: 

 

Oct 28, 2018 12:58:13 AM FINE org.jenkinsci.plugins.workflow.support.concurrent.TimeoutInterrupting org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep [#222]: checking /home/jenkins/basis_rtl_regression/build on chinkara / waiting for chinkara id=541659 after 10 SECONDS
java.lang.Throwable
	at java.lang.Object.wait(Native Method)
	at hudson.remoting.Request.call(Request.java:177)
	at hudson.remoting.Channel.call(Channel.java:954)
	at hudson.FilePath.act(FilePath.java:1070)
	at hudson.FilePath.act(FilePath.java:1059)
	at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController.writeLog(FileMonitoringTask.java:199)
	at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.check(DurableTaskStep.java:457)
	at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.run(DurableTaskStep.java:416)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

Oct 28, 2018 12:58:14 AM FINE org.jenkinsci.plugins.workflow.support.concurrent.TimeoutInterrupting org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep [#159]: checking /home/jenkins/basis_rtl_regression/build on oryx / waiting for oryx id=541661 after 10 SECONDS
java.lang.Throwable
	at java.lang.Object.wait(Native Method)
	at hudson.remoting.Request.call(Request.java:177)
	at hudson.remoting.Channel.call(Channel.java:954)
	at hudson.FilePath.act(FilePath.java:1070)
	at hudson.FilePath.act(FilePath.java:1059)
	at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController.writeLog(FileMonitoringTask.java:199)
	at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.check(DurableTaskStep.java:457)
	at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.run(DurableTaskStep.java:416)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
	at 

[JIRA] (JENKINS-46507) Parallel Pipeline random java.lang.InterruptedException

2018-10-28 Thread s...@ncrmnt.org (JIRA)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Andrew a updated an issue  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
 Jenkins /  JENKINS-46507  
 
 
  Parallel Pipeline random java.lang.InterruptedException   
 

  
 
 
 
 

 
Change By: 
 Andrew a  
 
 
Attachment: 
 jenkins.log  
 

  
 
 
 
 

 
 
 

 
 
 Add Comment  
 

  
 

  
 
 
 
  
 

  
 
 
 
 

 
 This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d)  
 

  
 

   





-- 
You received this message because you are subscribed to the Google Groups "Jenkins Issues" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[JIRA] (JENKINS-46507) Parallel Pipeline random java.lang.InterruptedException

2018-10-24 Thread dnusb...@cloudbees.com (JIRA)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Devin Nusbaum commented on  JENKINS-46507  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Parallel Pipeline random java.lang.InterruptedException   
 

  
 
 
 
 

 
 Sounds good. When you abort the build, it can change the stored info, so I would collect an archive of the build files prior to aborting it.  
 

  
 
 
 
 

 
 
 

 
 
 Add Comment  
 

  
 

  
 
 
 
  
 

  
 
 
 
 

 
 This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d)  
 

  
 

   





-- 
You received this message because you are subscribed to the Google Groups "Jenkins Issues" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[JIRA] (JENKINS-46507) Parallel Pipeline random java.lang.InterruptedException

2018-10-24 Thread s...@ncrmnt.org (JIRA)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Andrew a commented on  JENKINS-46507  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Parallel Pipeline random java.lang.InterruptedException   
 

  
 
 
 
 

 
 Thanks for looking into it. Yes, I have been manually aborting the build, but already when they were stuck with the exception. I'll clean up all logs and fire up a fresh regression run this weekend with fine logging you've mentioned. Let's see if that gives us any better results.  
 

  
 
 
 
 

 
 
 

 
 
 Add Comment  
 

  
 

  
 
 
 
  
 

  
 
 
 
 

 
 This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d)  
 

  
 

   





-- 
You received this message because you are subscribed to the Google Groups "Jenkins Issues" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[JIRA] (JENKINS-46507) Parallel Pipeline random java.lang.InterruptedException

2018-10-24 Thread s...@ncrmnt.org (JIRA)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Andrew a edited a comment on  JENKINS-46507  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Parallel Pipeline random java.lang.InterruptedException   
 

  
 
 
 
 

 
 Thanks for looking into it.  Yes, I have been manually aborting the build, but already when they were stuck with the exception. I'll clean up all logs and fire up a fresh regression run this weekend with fine logging you've mentioned. Let's see if that gives us any better results.  
 

  
 
 
 
 

 
 
 

 
 
 Add Comment  
 

  
 

  
 
 
 
  
 

  
 
 
 
 

 
 This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d)  
 

  
 

   





-- 
You received this message because you are subscribed to the Google Groups "Jenkins Issues" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[JIRA] (JENKINS-46507) Parallel Pipeline random java.lang.InterruptedException

2018-10-23 Thread dnusb...@cloudbees.com (JIRA)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Devin Nusbaum commented on  JENKINS-46507  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Parallel Pipeline random java.lang.InterruptedException   
 

  
 
 
 
 

 
 For anyone still seeing the original exception in the description, try adding a custom logger (Manage Jenkins -> System Log -> Add new log recorder) at FINE level for org.jenkinsci.plugins.workflow.support.concurrent.Timeout, so that we see these log messages, and then next time you see the issue, add the output of the custom logger here, which will tell us if my hypothesis is correct, and if so, it will tell us what code was actually running at the time of the issue.  
 

  
 
 
 
 

 
 
 

 
 
 Add Comment  
 

  
 

  
 
 
 
  
 

  
 
 
 
 

 
 This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d)  
 

  
 

   





-- 
You received this message because you are subscribed to the Google Groups "Jenkins Issues" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[JIRA] (JENKINS-46507) Parallel Pipeline random java.lang.InterruptedException

2018-10-23 Thread dnusb...@cloudbees.com (JIRA)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Devin Nusbaum commented on  JENKINS-46507  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Parallel Pipeline random java.lang.InterruptedException   
 

  
 
 
 
 

 
 Andrew a The exception in your most recent build looks like a networking issue after you manually aborted the build. Here is the stack trace: 

 
org.jenkinsci.plugins.workflow.steps.FlowInterruptedException
at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.interrupt(CpsFlowExecution.java:1125)
at org.jenkinsci.plugins.workflow.job.WorkflowRun$2$1.run(WorkflowRun.java:378)
at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Suppressed: java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at hudson.remoting.Request.call(Request.java:177)
at hudson.remoting.Channel.call(Channel.java:954)
at hudson.Launcher$RemoteLauncher.kill(Launcher.java:1084)
at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController.stop(FileMonitoringTask.java:304)
at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.stop(DurableTaskStep.java:258)
at org.jenkinsci.plugins.workflow.cps.CpsThread.stop(CpsThread.java:296)
at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$6.onSuccess(CpsFlowExecution.java:1139)
at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$6.onSuccess(CpsFlowExecution.java:1128)
at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$4$1.run(CpsFlowExecution.java:906)
at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:131)
at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
... Suppressed exception repeated for each parallel block
 

 And the serialized FlowInterruptedException has: 

 

  

  andrianov
 

[JIRA] (JENKINS-46507) Parallel Pipeline random java.lang.InterruptedException

2018-10-11 Thread s...@ncrmnt.org (JIRA)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Andrew a commented on  JENKINS-46507  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Parallel Pipeline random java.lang.InterruptedException   
 

  
 
 
 
 

 
 Devin Nusbaum Thanks a lot for looking into it. Let me know if I can be of any help with this issue.   P.S. Another run with a super-high heartbeat interval ran for a little longer, but ended up with another InterruptedException as well on one of our tests that is timing out.  
 

  
 
 
 
 

 
 
 

 
 
 Add Comment  
 

  
 

  
 
 
 
  
 

  
 
 
 
 

 
 This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d)  
 

  
 

   





-- 
You received this message because you are subscribed to the Google Groups "Jenkins Issues" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[JIRA] (JENKINS-46507) Parallel Pipeline random java.lang.InterruptedException

2018-10-10 Thread dnusb...@cloudbees.com (JIRA)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Devin Nusbaum edited a comment on  JENKINS-46507  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Parallel Pipeline random java.lang.InterruptedException   
 

  
 
 
 
 

 
 [~chamshoff] Without seeing your Jenkinsfile, it's hard to say, but my guess would be that you are hitting [the 5 minute timeout here|https://github.com/jenkinsci/workflow-cps-plugin/blob/a67af1db1eb7b83c287243547b8c6b0d452fd3c3/src/main/java/org/jenkinsci/plugins/workflow/cps/CpsThread.java#L178] because your code is waiting for a network response or similar in a method that doesn't respond to being interrupted (so it doesn't fail until the next step is invoked which notices that the thread was interrupted, assuming your stack trace is original to the OP's). I would highly recommend not making direct HTTP calls in your Jenkinsfile /Shared Library  even if it seems like it works fine in some cases. You can create a bash script that does whatever you want (curl, java -jar yourRestClient.jar, etc.) and call that script using the {{sh}} or {{bat}} steps which run asynchronously and are not subject to a 5 minute timeout.[~ncrmnt] Based on those logs, there does appear to be a network issue with the remoting layer involved as well. I will check through the build files you sent to see if they give any indication of a root cause that we can address.  
 

  
 
 
 
 

 
 
 

 
 
 Add Comment  
 

  
 

  
 
 
 
  
 

  
 
 
 
 

 
 This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d)  
 

  
 

   





-- 
You received this message because you are subscribed to the Google Groups "Jenkins Issues" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[JIRA] (JENKINS-46507) Parallel Pipeline random java.lang.InterruptedException

2018-10-10 Thread dnusb...@cloudbees.com (JIRA)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Devin Nusbaum commented on  JENKINS-46507  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Parallel Pipeline random java.lang.InterruptedException   
 

  
 
 
 
 

 
 Christoph Amshoff Without seeing your Jenkinsfile, it's hard to say, but my guess would be that you are hitting the 5 minute timeout here because your code is waiting for a network response or similar in a method that doesn't respond to being interrupted (so it doesn't fail until the next step is invoked which notices that the thread was interrupted, assuming your stack trace is original to the OP's). I would highly recommend not making direct HTTP calls in your Jenkinsfile even if it seems like it works fine in some cases. You can create a bash script that does whatever you want (curl, java -jar yourRestClient.jar, etc.) and call that script using the sh or bat steps which run asynchronously and are not subject to a 5 minute timeout. Andrew a Based on those logs, there does appear to be a network issue with the remoting layer involved as well. I will check through the build files you sent to see if they give any indication of a root cause that we can address.  
 

  
 
 
 
 

 
 
 

 
 
 Add Comment  
 

  
 

  
 
 
 
  
 

  
 
 
 
 

 
 This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d)  
 

  
 

   





-- 
You received this message because you are subscribed to the Google Groups "Jenkins Issues" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[JIRA] (JENKINS-46507) Parallel Pipeline random java.lang.InterruptedException

2018-10-10 Thread c.amsh...@gmx.de (JIRA)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Christoph Amshoff commented on  JENKINS-46507  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Parallel Pipeline random java.lang.InterruptedException   
 

  
 
 
 
 

 
 Same issue here... We're having a quite small Jenkinsfile that calls a RESTful service, which takes (depending on parameters) few seconds up to an hour or more to complete. The JSON returned by the service should be attached to the build, that's why it's called synchronously. The longer the service call takes, the more likely the InterruptedException occurs. There are no parallel steps in our pipeline, so I'm pretty sure it's related to the long-running step. The REST call is done by some selfmade Groovy function and classes in our shared library, basically setting up a HttpURLConnection instance. However, that works nicely in fast service calls, so I don't think there is an issue in this code. Let me know if I can help with any files/logs.  
 

  
 
 
 
 

 
 
 

 
 
 Add Comment  
 

  
 

  
 
 
 
  
 

  
 
 
 
 

 
 This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d)  
 

  
 

   





-- 
You received this message because you are subscribed to the Google Groups "Jenkins Issues" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[JIRA] (JENKINS-46507) Parallel Pipeline random java.lang.InterruptedException

2018-10-01 Thread s...@ncrmnt.org (JIRA)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Andrew a commented on  JENKINS-46507  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Parallel Pipeline random java.lang.InterruptedException   
 

  
 
 
 
 

 
 stuff.tgz Here goes the tarball with all the xml files. Be careful, there's a ton of them there  I do see some exceptions in jenkins master logs like this: 

 

Oct 01, 2018 1:34:10 PM org.jenkinsci.plugins.workflow.cps.CpsStepContext completed
INFO: new failure
org.jenkinsci.plugins.workflow.steps.FlowInterruptedException
at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.interrupt(CpsFlowExecution.java:1125)
at org.jenkinsci.plugins.workflow.job.WorkflowRun$2$1.run(WorkflowRun.java:378)
at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Suppressed: java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at hudson.remoting.Request.call(Request.java:177)
at hudson.remoting.Channel.call(Channel.java:954)
at hudson.Launcher$RemoteLauncher.kill(Launcher.java:1084)
at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController.stop(FileMonitoringTask.java:304)
at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.stop(DurableTaskStep.java:258)
at org.jenkinsci.plugins.workflow.cps.CpsThread.stop(CpsThread.java:296)
at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$6.onSuccess(CpsFlowExecution.java:1139)
at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$6.onSuccess(CpsFlowExecution.java:1128)
at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$4$1.run(CpsFlowExecution.java:906)
at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:131)
at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
... 3 moreOct 01, 2018 1:34:10 PM org.jenkinsci.plugins.workflow.cps.CpsStepContext completed
INFO: previously delivered here
java.lang.Throwable
at org.jenkinsci.plugins.workflow.cps.CpsStepContext.completed(CpsStepContext.java:341)
at org.jenkinsci.plugins.workflow.cps.CpsStepContext.onFailure(CpsStepContext.java:329)
at 

[JIRA] (JENKINS-46507) Parallel Pipeline random java.lang.InterruptedException

2018-10-01 Thread s...@ncrmnt.org (JIRA)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Andrew a updated an issue  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
 Jenkins /  JENKINS-46507  
 
 
  Parallel Pipeline random java.lang.InterruptedException   
 

  
 
 
 
 

 
Change By: 
 Andrew a  
 
 
Attachment: 
 stuff.tgz  
 

  
 
 
 
 

 
 
 

 
 
 Add Comment  
 

  
 

  
 
 
 
  
 

  
 
 
 
 

 
 This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d)  
 

  
 

   





-- 
You received this message because you are subscribed to the Google Groups "Jenkins Issues" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[JIRA] (JENKINS-46507) Parallel Pipeline random java.lang.InterruptedException

2018-10-01 Thread dnusb...@cloudbees.com (JIRA)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Devin Nusbaum commented on  JENKINS-46507  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Parallel Pipeline random java.lang.InterruptedException   
 

  
 
 
 
 

 
 

Can you tell a bit more detail of where to find the relevant XML files? A quick grep on the slaves didn't reveal any workflow directories. Where should I look for them?
 Yep, sorry for not specifying earlier. They will be on the master, not on the agents. The workspace path will be $JENKINS_HOME/jobs//builds//. Inside, the main files that would be interesting are build.xml and everything in workflow/. If it is reproducible with sleep 26 that would be interesting. At those lengths of time I would start to suspect something going wrong with a network call from the master to the agent causing code somewhere to hang that Pipeline does not handle gracefully. Do your overall Jenkins master logs have any warnings/errors around the time the InterruptedException was thrown?  
 

  
 
 
 
 

 
 
 

 
 
 Add Comment  
 

  
 

  
 
 
 
  
 

  
 
 
 
 

 
 This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d)  
 

  
 

   





-- 
You received this message because you are subscribed to the Google Groups "Jenkins Issues" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[JIRA] (JENKINS-46507) Parallel Pipeline random java.lang.InterruptedException

2018-10-01 Thread s...@ncrmnt.org (JIRA)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Andrew a edited a comment on  JENKINS-46507  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Parallel Pipeline random java.lang.InterruptedException   
 

  
 
 
 
 

 
 [~dnusbaum], an update. It looks like I've got the hit the first try and now see a handful of   {code:java}[rumboot-default-rumboot-basis-Debug-basis-stress-test-uart-eth-dma] Cannot contact oribi: java.lang.InterruptedException[rumboot-default-rumboot-basis-Debug-simple-iram-mirror-hello-arinc_1] Cannot contact bison: java.lang.InterruptedException[rumboot-default-rumboot-basis-Debug-simple-iram-mirror-hello-arinc_6] Cannot contact oribi: java.lang.InterruptedException{code}in the log. I now have the build in the stuck state and can provide the data. Can you tell a bit more detail of where to find the relevant XML files? A quick grep on the slaves didn't reveal any workflow directories. Where should I look for them?EDIT: Hm... I've been running tests with a maximum timeout of 26 s and it seems it has failed once it stumbled upon the first test that actually timed out. Actually, all tests that have an interrupted exception are those that are currently timing out and being worked on! They  hang  shoud've hung  for at least 26 seconds consuming 100% of one CPU core with no stdout before being killed internally by ctest process.    InterruptedException arrives earlier.I wonder if wrapping up 'sleep 26' in groovy will produce the same results.  
 

  
 
 
 
 

 
 
 

 
 
 Add Comment  
 

  
 

  
 
 
 
  
 

  
 
 
 
 

 
 This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d)  
 

  
 

   





-- 
You received this message because you are subscribed to the Google Groups "Jenkins Issues" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[JIRA] (JENKINS-46507) Parallel Pipeline random java.lang.InterruptedException

2018-10-01 Thread s...@ncrmnt.org (JIRA)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Andrew a edited a comment on  JENKINS-46507  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Parallel Pipeline random java.lang.InterruptedException   
 

  
 
 
 
 

 
 [~dnusbaum], an update. It looks like I've got the hit the first try and now see a handful of  {code:java}[rumboot-default-rumboot-basis-Debug-basis-stress-test-uart-eth-dma] Cannot contact oribi: java.lang.InterruptedException[rumboot-default-rumboot-basis-Debug-simple-iram-mirror-hello-arinc_1] Cannot contact bison: java.lang.InterruptedException[rumboot-default-rumboot-basis-Debug-simple-iram-mirror-hello-arinc_6] Cannot contact oribi: java.lang.InterruptedException{code}in the log. I now have the build in the stuck state and can provide the data. Can you tell a bit more detail of where to find the relevant XML files? A quick grep on the slaves didn't reveal any workflow directories. Where should I look for them?      EDIT: Hm... I've been running tests with a maximum timeout of 26 s and it seems it has failed once it stumbled upon the first test that actually timed out.  Looking at the tests -  Actually,  all tests that  caused InterruptedException  have an interrupted exception  are  tests  those  that are  currently  timing out  in our current project  and being worked on! They hang for at least 26 seconds consuming 100% of one CPU core with no stdout before being killed internally by ctest process .  This leads me to believe that     
 

  
 
 
 
 

 
 
 

 
 
 Add Comment  
 

  
 

  
 
 
 
  
 

  
 
 
 
 

 
 This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d)  
 

  
 

   





-- 
You received this message because you are subscribed to the Google Groups "Jenkins Issues" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[JIRA] (JENKINS-46507) Parallel Pipeline random java.lang.InterruptedException

2018-10-01 Thread s...@ncrmnt.org (JIRA)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Andrew a edited a comment on  JENKINS-46507  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Parallel Pipeline random java.lang.InterruptedException   
 

  
 
 
 
 

 
 [~dnusbaum], an update. It looks like I've got the hit the first try and now see a handful of  {code:java}[rumboot-default-rumboot-basis-Debug-basis-stress-test-uart-eth-dma] Cannot contact oribi: java.lang.InterruptedException[rumboot-default-rumboot-basis-Debug-simple-iram-mirror-hello-arinc_1] Cannot contact bison: java.lang.InterruptedException[rumboot-default-rumboot-basis-Debug-simple-iram-mirror-hello-arinc_6] Cannot contact oribi: java.lang.InterruptedException{code}in the log. I now have the build in the stuck state and can provide the data. Can you tell a bit more detail of where to find the relevant XML files? A quick grep on the slaves didn't reveal any workflow directories. Where should I look for them?      EDIT: Hm... I've been running tests with a maximum timeout of 26 s and it seems it has failed once it stumbled upon the first test that actually timed out.  Looking at the tests -  Actually,  all tests that  caused InterruptedException  have an interrupted exception  are  tests  those  that are  currently  timing out  in our current project  and being worked on! They hang for at least 26 seconds consuming 100% of one CPU core with no stdout before being killed internally by ctest process .  This leads me to believe that     
 

  
 
 
 
 

 
 
 

 
 
 Add Comment  
 

  
 

  
 
 
 
  
 

  
 
 
 
 

 
 This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d)  
 

  
 

   





-- 
You received this message because you are subscribed to the Google Groups "Jenkins Issues" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[JIRA] (JENKINS-46507) Parallel Pipeline random java.lang.InterruptedException

2018-10-01 Thread s...@ncrmnt.org (JIRA)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Andrew a edited a comment on  JENKINS-46507  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Parallel Pipeline random java.lang.InterruptedException   
 

  
 
 
 
 

 
 [~dnusbaum], an update. It looks like I've got the hit the first try and now see a handful of  {code:java}[rumboot-default-rumboot-basis-Debug-basis-stress-test-uart-eth-dma] Cannot contact oribi: java.lang.InterruptedException[rumboot-default-rumboot-basis-Debug-simple-iram-mirror-hello-arinc_1] Cannot contact bison: java.lang.InterruptedException[rumboot-default-rumboot-basis-Debug-simple-iram-mirror-hello-arinc_6] Cannot contact oribi: java.lang.InterruptedException{code}in the log. I now have the build in the stuck state and can provide the data. Can you tell a bit more detail of where to find the relevant XML files? A quick grep on the slaves didn't reveal any workflow directories. Where should I look for them?   EDIT: Hm... I've been running tests with a maximum timeout of 26 s and it seems it has failed once it stumbled upon the first test that actually timed out. Looking at the tests - all tests that caused InterruptedException are tests that are timing out in our current project. This leads me to believe that   
 

  
 
 
 
 

 
 
 

 
 
 Add Comment  
 

  
 

  
 
 
 
  
 

  
 
 
 
 

 
 This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d)  
 

  
 

   





-- 
You received this message because you are subscribed to the Google Groups "Jenkins Issues" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[JIRA] (JENKINS-46507) Parallel Pipeline random java.lang.InterruptedException

2018-09-30 Thread s...@ncrmnt.org (JIRA)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Andrew a commented on  JENKINS-46507  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Parallel Pipeline random java.lang.InterruptedException   
 

  
 
 
 
 

 
 Devin Nusbaum, an update. It looks like I've got the hit the first try and now see a handful of    

 

[rumboot-default-rumboot-basis-Debug-basis-stress-test-uart-eth-dma] Cannot contact oribi: java.lang.InterruptedException
[rumboot-default-rumboot-basis-Debug-simple-iram-mirror-hello-arinc_1] Cannot contact bison: java.lang.InterruptedException
[rumboot-default-rumboot-basis-Debug-simple-iram-mirror-hello-arinc_6] Cannot contact oribi: java.lang.InterruptedException
 

 in the log. I now have the build in the stuck state and can provide the data. Can you tell a bit more detail of where to find the relevant XML files? A quick grep on the slaves didn't reveal any workflow directories. Where should I look for them?     
 

  
 
 
 
 

 
 
 

 
 
 Add Comment  
 

  
 

  
 
 
 
  
 

  
 
 
 
 

 
 This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d)  
 

  
 

   





-- 
You received this message because you are subscribed to the Google Groups "Jenkins Issues" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[JIRA] (JENKINS-46507) Parallel Pipeline random java.lang.InterruptedException

2018-09-30 Thread s...@ncrmnt.org (JIRA)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Andrew a updated an issue  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
 Jenkins /  JENKINS-46507  
 
 
  Parallel Pipeline random java.lang.InterruptedException   
 

  
 
 
 
 

 
Change By: 
 Andrew a  
 
 
Attachment: 
 hs_err_pid239040.log  
 

  
 
 
 
 

 
 
 

 
 
 Add Comment  
 

  
 

  
 
 
 
  
 

  
 
 
 
 

 
 This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d)  
 

  
 

   





-- 
You received this message because you are subscribed to the Google Groups "Jenkins Issues" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[JIRA] (JENKINS-46507) Parallel Pipeline random java.lang.InterruptedException

2018-09-26 Thread s...@ncrmnt.org (JIRA)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Andrew a commented on  JENKINS-46507  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Parallel Pipeline random java.lang.InterruptedException   
 

  
 
 
 
 

 
 Devin Nusbaum, my last run was inconsistent due to the fact that the server with jenkins was out of disk space (sic!), I'll start the next regression run today and post results once it's done or hangs. This will take a few days though, depending on the load. Once it's done I'll try to provide your with as much data as I can. > How much of your build is running external processes sh/bat vs Pipeline/Groovy code (i.e. Are you more or less running a multi-day bash script (or many such scripts in a parallel block), or is most of the time spent running Groovy code in your Jenkinsfile, or is it about even)? Basically it all comes down to running a single "cmake's ctest" process with one test in the list that may take anything from a few minutes to a few days with little to no stdout. Here's my pipeline script: Jenkinsfile Please note, that it may seem weird in places due to our internal server config and me trying to workaround this bug. It configures and builds the project on one select node (the build is done on an NFS share with the exact same path on all nodes), touchstone simulation step creates simulation checkpoints and, finally, the result is executed in parallel during regression phase with throttling enabled with load distributed over our servers.         
 

  
 
 
 
 

 
 
 

 
 
 Add Comment  
 

  
 

  
 
 
 
  
 

  
 
 
 
 

 
 This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d)  
 

  
 

   





-- 
You received this message because you are subscribed to the Google Groups "Jenkins Issues" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[JIRA] (JENKINS-46507) Parallel Pipeline random java.lang.InterruptedException

2018-09-26 Thread s...@ncrmnt.org (JIRA)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Andrew a updated an issue  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
 Jenkins /  JENKINS-46507  
 
 
  Parallel Pipeline random java.lang.InterruptedException   
 

  
 
 
 
 

 
Change By: 
 Andrew a  
 
 
Attachment: 
 Jenkinsfile  
 

  
 
 
 
 

 
 
 

 
 
 Add Comment  
 

  
 

  
 
 
 
  
 

  
 
 
 
 

 
 This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d)  
 

  
 

   





-- 
You received this message because you are subscribed to the Google Groups "Jenkins Issues" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[JIRA] (JENKINS-46507) Parallel Pipeline random java.lang.InterruptedException

2018-09-25 Thread dnusb...@cloudbees.com (JIRA)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Devin Nusbaum commented on  JENKINS-46507  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Parallel Pipeline random java.lang.InterruptedException   
 

  
 
 
 
 

 
 Andrew a Thank you for including all of those details in your comment! It's interesting that you are hitting the issue consistently on multi-day builds. How much of your build is running external processes sh/bat vs Pipeline/Groovy code (i.e. Are you more or less running a multi-day bash script (or many such scripts in a parallel block), or is most of the time spent running Groovy code in your Jenkinsfile, or is it about even)? I am not aware of any workarounds for the issue. One thing that would be helpful would be if you could upload the build folder for one of your builds which failed with this issue. In particular, I am interested in the XML files in the workflow/ directory to see if the serialized flow nodes can tell us what was executing when the issue occurred which may give us some insight.  
 

  
 
 
 
 

 
 
 

 
 
 Add Comment  
 

  
 

  
 
 
 
  
 

  
 
 
 
 

 
 This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d)  
 

  
 

   





-- 
You received this message because you are subscribed to the Google Groups "Jenkins Issues" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[JIRA] (JENKINS-46507) Parallel Pipeline random java.lang.InterruptedException

2018-09-22 Thread s...@ncrmnt.org (JIRA)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Andrew a commented on  JENKINS-46507  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Parallel Pipeline random java.lang.InterruptedException   
 

  
 
 
 
 

 
 Hi everyone. A little info on this issue that we've been hitting as well. We're using jenkins and it's pipeline plugin to run our regression and ci builds for a SoC (System-on-chip) we're designing. As a side effect of what we're doing - some tests (under the parallel step) take a very long time to complete and are very CPU-intensive. Here's some statistics I've noticed: 
 
CI builds that have a small amount (~10) of short tests under parallel step almost always complete successfully. 
Regression runs that take a few days to complete and have a huge list of longer tests almost always hit this bug. 
If the jenkins was restarted fresh, chances to hit this bug are somewhat lower. I came to think it was related to a memory leak in jenkins test-results-analyzer plugin at first, which (due to our huge, sometime over 500MiB log files) crashed jenkins often at fist, until I started filtering logs to mitigate the issue (https://issues.jenkins-ci.org/browse/JENKINS-34134). Tuning GC options seemed to further improve things. 
Since we're running builds on NFS filesystem to later distribute and execute tests on different servers, thus balancing the load, I first considered filesystem lags due to other tasks being run on our servers. Adding this seemed to improve things a little bit. The disk load is not evenly distributed. It's heavy on the start of the test (when the model is loaded to memory and during reset due to extensive logging), later it's usually very low throughout the whole run. 
   

 

/* Workaround laggy filesystem */
script {
   System.setProperty("org.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL", "3800");
} 

     5. Another thing I've noticed, some tests seem to randomly fail. According to the simulation log, the simulation process was just killed (by something other than OOM), rather that getting stuck, and that happens quite fast (e.g. the simulation haven't even completed the reset sequence, which takes a few seconds of real time, but usually spews a lot of info into the log).  6. At the Blue Ocean UI all fails in parallel steps seem to be grouped to the very bottom half of the screen, after 150+ of successfully executed tests. Since these are mostly in the order they're being started (I assume), it's making me believe, that it might be somehow related to either GC taking a lot of time or some memory leaking here and there.   A grand total would be of 5-10 tests of 300+ randomly crashing, and almost always the annoying java.lang.interruptedException with the whole run (That took 3-4 days to complete!) just freezing. A full regression that takes a few days to complete almost always triggers the bug. I'm willing to help solve this issue, but since it requires some sacred knowledge about jenkins' guts I do not have, I can either post my pipeline script (which will most likely be useless, due to a huge amount of our internal tools being 

[JIRA] (JENKINS-46507) Parallel Pipeline random java.lang.InterruptedException

2018-09-18 Thread dnusb...@cloudbees.com (JIRA)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Devin Nusbaum edited a comment on  JENKINS-46507  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Parallel Pipeline random java.lang.InterruptedException   
 

  
 
 
 
 

 
 I finally had a chance to take a look at this.[~iceiceice]: You are using println inside of an {{@NonCPS}} method. It might seem like this should be ok, but [behind the scenes|https://github.com/jenkinsci/workflow-cps-plugin/blob/0e4c25f8d7b84470aa523491e29933db3b3df588/src/main/java/org/jenkinsci/plugins/workflow/cps/CpsScript.java#L221] {{println}} ends up calling the {{echo}} step, which means that you are executing Pipeline Steps inside of an {{@NonCPS}} method, which should not be done even though it may appear to work correctly in some cases.[~tobybro] If you have a minimal pipeline that can reproduce the issue, or a scenario in which you can reproduce the problem consistently it would be very helpful if you could post the Pipeline script and the build folder of one of the builds that failed with the exception here. Otherwise, it would be great to see your actual pipeline that is hitting the error so we can check for any similarities with Alexey's or Rick's Pipelines that might give us an idea of what is happening. The only way EDIT: My original hypothesis was misleading and incorrect, so  I  have been able to reproduce anything remotely similar to 've edited it based on new information.Notably,  the  exceptions seen here has been to create  exception being thrown in all these cases comes from  {{ @NonCPS AbstractQueuedSynchronizer#acquireSharedInterruptibly }}  methods that take more than 5 minutes to complete  Line 1302 , which  causes [CpsThread#runNextChunk to  checks the  interrupt  status before  the thread |https://github  tries to acquire the lock . com/jenkinsci/workflow-cps-plugin/blob/54d2f4fe8069fde53789bfe21229ce8e545300bb/src/main/java/org/jenkinsci/plugins/workflow/cps/CpsThread.java#L170]. AIUI  This means  that  may very well be  the  code that is interrupting  thread was actually interrupted before the step started executing, and  {{ pp.get}} [here|https://github.com/jenkinsci/workflow-cps-plugin/blob/54d2f4fe8069fde53789bfe21229ce8e545300bb/src/main/java/org/jenkinsci/plugins/workflow/cps/ CpsStepContext .java # L248], but even so, I am not sure _why_ the CpsFlowExecution's program promise would be taking so long to load. My only theory is that in {{CpsThread# getThreadGroupSynchronously}}  the local variable {{pp}} is set  just happens  to  be  the  nonnull value  first piece  of  {{flowExecution  code to check the interrupted flag and throw an exception . programPromise}}, but then maybe something else changes  I got access to  the  value  flow nodes  of  {{flowExecution.programPromise}}  another user's pipeline hitting the same exception ,  but we are still waiting on  and noticed that  the  stale program promise {{pp}} which does not complete for whatever reason  duration between the last successfully executed node  and  eventually it times out. If  the start of the step  that  is  threw  the  issue  exception was just over 5 minutes ,  we might be able  which leads me  to  fix this issue with a patch like  believe that  the  following to {{  code that is doing the interruption is the [Timeout in CpsThread# getThreadGroupSynchronously}} runNextChunk|https : {noformat}diff --git a / src / main/java/org/jenkinsci/plugins/workflow/cps/CpsStepContext github . java b com / src/main/java/org/ jenkinsci/ plugins/ workflow / - cps /CpsStepContext.javaindex 2b90bec1..6f0a45cb 100644 - -- a plugin / src blob / main 54d2f4fe8069fde53789bfe21229ce8e545300bb / java/org/jenkinsci/plugins/workflow/cps/CpsStepContext.java+++ b/ src/main/java/org/jenkinsci/plugins/workflow/cps/ 

[JIRA] (JENKINS-46507) Parallel Pipeline random java.lang.InterruptedException

2018-09-14 Thread dnusb...@cloudbees.com (JIRA)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Devin Nusbaum commented on  JENKINS-46507  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Parallel Pipeline random java.lang.InterruptedException   
 

  
 
 
 
 

 
 I finally had a chance to take a look at this. Alexey Grigorov: You are using println inside of an @NonCPS method. It might seem like this should be ok, but behind the scenes println ends up calling the echo step, which means that you are executing Pipeline Steps inside of an @NonCPS method, which should not be done even though it may appear to work correctly in some cases. Toby Broyles If you have a minimal pipeline that can reproduce the issue, or a scenario in which you can reproduce the problem consistently it would be very helpful if you could post the Pipeline script and the build folder of one of the builds that failed with the exception here. Otherwise, it would be great to see your actual pipeline that is hitting the error so we can check for any similarities with Alexey's or Rick's Pipelines that might give us an idea of what is happening. The only way I have been able to reproduce anything remotely similar to the exceptions seen here has been to create @NonCPS methods that take more than 5 minutes to complete, which causes CpsThread#runNextChunk to interrupt the thread. AIUI that may very well be the code that is interrupting pp.get here, but even so, I am not sure why the CpsFlowExecution's program promise would be taking so long to load. My only theory is that in CpsThread#getThreadGroupSynchronously the local variable pp is set to the nonnull value of flowExecution.programPromise, but then maybe something else changes the value of flowExecution.programPromise, but we are still waiting on the stale program promise pp which does not complete for whatever reason and eventually it times out. If that is the issue, we might be able to fix this issue with a patch like the following to CpsThread#getThreadGroupSynchronously: 

 
diff --git a/src/main/java/org/jenkinsci/plugins/workflow/cps/CpsStepContext.java b/src/main/java/org/jenkinsci/plugins/workflow/cps/CpsStepContext.java
index 2b90bec1..6f0a45cb 100644
--- a/src/main/java/org/jenkinsci/plugins/workflow/cps/CpsStepContext.java
+++ b/src/main/java/org/jenkinsci/plugins/workflow/cps/CpsStepContext.java
@@ -63,6 +63,8 @@ import java.util.concurrent.Callable;
 import java.util.concurrent.ExecutionException;
 import java.util.concurrent.ExecutorService;
 import java.util.concurrent.Executors;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
 import java.util.logging.Level;
 import java.util.logging.Logger;
 import jenkins.model.CauseOfInterruption;
@@ -241,14 +243,19 @@ public class CpsStepContext extends DefaultStepContext { // TODO add XStream cla
 if (threadGroup == null) {
 ListenableFuture pp;
 CpsFlowExecution flowExecution = getFlowExecution();
+while (true) {
 while ((pp = flowExecution.programPromise) == null) {
 Thread.sleep(100); // TODO does JENKINS-33005 remove the need for this?
 }
 try {
-threadGroup = pp.get();
+// Wait in 30-second intervals in case the local variable pp becomes stale.
+threadGroup = pp.get(30, TimeUnit.SECONDS);
+} catch (TimeoutException e) {
+// Update reference to flowExecution.programPromise and attempt to load it again.
 } catch (ExecutionException e) {
 throw new IOException(e);
 }
+   

[JIRA] (JENKINS-46507) Parallel Pipeline random java.lang.InterruptedException

2018-09-14 Thread svano...@cloudbees.com (JIRA)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Sam Van Oort assigned an issue to Jose Blas Camacho Taboada  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
 Jenkins /  JENKINS-46507  
 
 
  Parallel Pipeline random java.lang.InterruptedException   
 

  
 
 
 
 

 
Change By: 
 Sam Van Oort  
 
 
Assignee: 
 Sam Van Oort Jose Blas Camacho Taboada  
 

  
 
 
 
 

 
 
 

 
 
 Add Comment  
 

  
 

  
 
 
 
  
 

  
 
 
 
 

 
 This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d)  
 

  
 

   





-- 
You received this message because you are subscribed to the Google Groups "Jenkins Issues" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[JIRA] (JENKINS-46507) Parallel Pipeline random java.lang.InterruptedException

2018-09-05 Thread alexeygrigo...@gmail.com (JIRA)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Alexey Grigorov edited a comment on  JENKINS-46507  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Parallel Pipeline random java.lang.InterruptedException   
 

  
 
 
 
 

 
 Same here. I'm running a @NonCPS labeled method in library and if fails with this error randomly in the middle{code:java}import jenkins.*import jenkins.model.*import hudson.*import hudson.model.*/* This method will find all jobs and verify their disk usage.Then it will remove job runs to fit  quotaThat will help to save disk space*/@NonCPSdef call() {def maxRunSize = 21474836480 //20Gbdef maxJobSize = 42949672960 //40GbJenkins.instance.getAllItems(hudson.model.Job).each{println itdef runs = it.getBuilds()def job = itdef totalSize = 0def jobSizeReached = falseruns.each{def currentRun = itdef currentRunSize = it.getRootDir().directorySize()println itprintln currentRunSizeswitch (currentRunSize) { case {it > maxRunSize} : currentRun.delete() println "deleted " + currentRun + " for having size " + it + " over allowed " +  maxRunSize break case {it + totalSize > maxJobSize || jobSizeReached } : jobSizeReached = true currentRun.delete() println "deleted " + currentRun + " for not fitting total  " + job + " jobs size " + (totalSize+currentRunSize) +" to max  allowed " +  maxJobSize break default : totalSize += currentRunSize println 'leaving '  + currentRun + ". current total = " + totalSize break}println '--'}println '=='}}{code}{noformat} <> --[Pipeline] echo[Pipeline] End of Pipelinejava.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1302) at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:275) at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:111) at org.jenkinsci.plugins.workflow.cps.CpsStepContext.getThreadGroupSynchronously(CpsStepContext.java:248) at org.jenkinsci.plugins.workflow.cps.CpsStepContext.getThreadSynchronously(CpsStepContext.java:237) at org.jenkinsci.plugins.workflow.cps.CpsStepContext.doGet(CpsStepContext.java:298) at org.jenkinsci.plugins.workflow.support.DefaultStepContext.get(DefaultStepContext.java:61) at org.jenkinsci.plugins.workflow.steps.StepDescriptor.checkContextAvailability(StepDescriptor.java:258) at org.jenkinsci.plugins.workflow.cps.DSL.invokeStep(DSL.java:206) at org.jenkinsci.plugins.workflow.cps.DSL.invokeMethod(DSL.java:153) at org.jenkinsci.plugins.workflow.cps.CpsScript.invokeMethod(CpsScript.java:122) at org.jenkinsci.plugins.workflow.cps.CpsScript.println(CpsScript.java:221) at sun.reflect.GeneratedMethodAccessor1023.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93) at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325) at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1213) at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1022) at 

[JIRA] (JENKINS-46507) Parallel Pipeline random java.lang.InterruptedException

2018-09-05 Thread svano...@cloudbees.com (JIRA)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Sam Van Oort commented on  JENKINS-46507  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Parallel Pipeline random java.lang.InterruptedException   
 

  
 
 
 
 

 
 Devin Nusbaum Please, could you take a look?   
 

  
 
 
 
 

 
 
 

 
 
 Add Comment  
 

  
 

  
 
 
 
  
 

  
 
 
 
 

 
 This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d)  
 

  
 

   





-- 
You received this message because you are subscribed to the Google Groups "Jenkins Issues" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[JIRA] (JENKINS-46507) Parallel Pipeline random java.lang.InterruptedException

2018-09-05 Thread toby.broy...@gmail.com (JIRA)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Toby Broyles commented on  JENKINS-46507  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Parallel Pipeline random java.lang.InterruptedException   
 

  
 
 
 
 

 
 We are seeing this error quite often lately. Is there anything I can do to help troubleshoot?  
 

  
 
 
 
 

 
 
 

 
 
 Add Comment  
 

  
 

  
 
 
 
  
 

  
 
 
 
 

 
 This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d)  
 

  
 

   





-- 
You received this message because you are subscribed to the Google Groups "Jenkins Issues" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[JIRA] (JENKINS-46507) Parallel Pipeline random java.lang.InterruptedException

2018-09-04 Thread alexeygrigo...@gmail.com (JIRA)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Alexey Grigorov commented on  JENKINS-46507  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Parallel Pipeline random java.lang.InterruptedException   
 

  
 
 
 
 

 
 Same here. I'm running a @NonCPS labeled method in library and if fails with this error randomly in the middle 

 

import jenkins.*
import jenkins.model.*
import hudson.*
import hudson.model.*

/* This method will find all jobs and verify their disk usage.
Then it will remove job runs to fit  quota

That will help to save disk space

*/
@NonCPS
def call() {
def maxRunSize = 21474836480 //20Gb
def maxJobSize = 42949672960 //40Gb
Jenkins.instance.getAllItems(hudson.model.Job).each
{
println it
def runs = it.getBuilds()
def job = it
def totalSize = 0
def jobSizeReached = false
runs.each{
def currentRun = it
def currentRunSize = it.getRootDir().directorySize()
println it
println currentRunSize
switch (currentRunSize) {
case {it > maxRunSize} :
currentRun.delete()
println "deleted " + currentRun + " for having size " + it + " over allowed " +  maxRunSize
break
case {it + totalSize > maxJobSize || jobSizeReached } :
jobSizeReached = true
currentRun.delete()
println "deleted " + currentRun + " for not fitting total  " + job + " jobs size " + (totalSize+currentRunSize) +" to max  allowed " +  maxJobSize
break
default :
totalSize += currentRunSize
println 'leaving '  + currentRun + ". current total = " + totalSize
break
}
println '--'
}
println '=='
}
}

 

 

 
--
[Pipeline] echo
[Pipeline] End of Pipeline
java.lang.InterruptedException
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1302)
	at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:275)
	at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:111)
	at org.jenkinsci.plugins.workflow.cps.CpsStepContext.getThreadGroupSynchronously(CpsStepContext.java:248)
	at org.jenkinsci.plugins.workflow.cps.CpsStepContext.getThreadSynchronously(CpsStepContext.java:237)
	at org.jenkinsci.plugins.workflow.cps.CpsStepContext.doGet(CpsStepContext.java:298)
	at org.jenkinsci.plugins.workflow.support.DefaultStepContext.get(DefaultStepContext.java:61)
	at org.jenkinsci.plugins.workflow.steps.StepDescriptor.checkContextAvailability(StepDescriptor.java:258)
	at org.jenkinsci.plugins.workflow.cps.DSL.invokeStep(DSL.java:206)
	at org.jenkinsci.plugins.workflow.cps.DSL.invokeMethod(DSL.java:153)
	at org.jenkinsci.plugins.workflow.cps.CpsScript.invokeMethod(CpsScript.java:122)
	at 

[JIRA] (JENKINS-46507) Parallel Pipeline random java.lang.InterruptedException

2018-04-18 Thread svano...@cloudbees.com (JIRA)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Sam Van Oort commented on  JENKINS-46507  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Parallel Pipeline random java.lang.InterruptedException   
 

  
 
 
 
 

 
 Georg Henzler Executors have nothing to do with this, and this issue is generally within the scope of a single build, so limiting the maximum number should have no impact – except when restarting Jenkins, where there might be enough of a bottleneck due to loading FlowExecutions to hit the 1 minute timeout. Maybe if you had maybe 10x as many Pipelines as CPU cores on the master, or really complex Program state – in this case we'd be blocking on the code here: https://github.com/jenkinsci/workflow-job-plugin/blob/master/src/main/java/org/jenkinsci/plugins/workflow/job/WorkflowRun.java#L1024 Do you have a message like this in your logs? WARNING, failed to wait for $someRun to be loaded (and then a stack trace).  More likely there's something subtly wrong with the concurrency/error-handling model here – some of what I'm doing right now to untangle some nasty issues in these guts may help – and there was a change in the last release of workflow-cps that might help if you don't have it already (it caught more exceptions and prevented threads from potentially waiting forever).  
 

  
 
 
 
 

 
 
 

 
 
 Add Comment  
 

  
 

  
 
 
 
  
 

  
 
 
 
 

 
 This message was sent by Atlassian JIRA (v7.3.0#73011-sha1:3c73d0e)  
 
 

 
   
 

  
 

  
 

   





-- 
You received this message because you are subscribed to the Google Groups "Jenkins Issues" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[JIRA] (JENKINS-46507) Parallel Pipeline random java.lang.InterruptedException

2018-04-18 Thread jenk...@ghenzler.de (JIRA)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Georg Henzler commented on  JENKINS-46507  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Parallel Pipeline random java.lang.InterruptedException   
 

  
 
 
 
 

 
 We also had this problem again today... reading Jesse Glick's comment in https://issues.jenkins-ci.org/browse/JENKINS-35710?focusedCommentId=268398=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-268398 I suspect adding more slaves does not help (as the stacktrace clearly points to CPS, and according the comment that always runs on master)... but I suppose using https://github.com/jenkinsci/throttle-concurrent-builds-plugin to ensure only a maximum of builds run in parallel globally could help. Sam Van Oort before trying, do you think this is a valid way of getting rid of the problem? (asking before trying as this is not easy to reproduce)  
 

  
 
 
 
 

 
 
 

 
 
 Add Comment  
 

  
 

  
 
 
 
  
 

  
 
 
 
 

 
 This message was sent by Atlassian JIRA (v7.3.0#73011-sha1:3c73d0e)  
 
 

 
   
 

  
 

  
 

   





-- 
You received this message because you are subscribed to the Google Groups "Jenkins Issues" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[JIRA] (JENKINS-46507) Parallel Pipeline random java.lang.InterruptedException

2018-04-18 Thread svano...@cloudbees.com (JIRA)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Sam Van Oort started work on  JENKINS-46507  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
Change By: 
 Sam Van Oort  
 
 
Status: 
 Open In Progress  
 

  
 
 
 
 

 
 
 

 
 
 Add Comment  
 

  
 

  
 
 
 
  
 

  
 
 
 
 

 
 This message was sent by Atlassian JIRA (v7.3.0#73011-sha1:3c73d0e)  
 
 

 
   
 

  
 

  
 

   





-- 
You received this message because you are subscribed to the Google Groups "Jenkins Issues" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[JIRA] (JENKINS-46507) Parallel Pipeline random java.lang.InterruptedException

2018-04-18 Thread lightningblad...@hotmail.com (JIRA)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Jarrett B commented on  JENKINS-46507  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Parallel Pipeline random java.lang.InterruptedException   
 

  
 
 
 
 

 
 +1 Same here  
 

  
 
 
 
 

 
 
 

 
 
 Add Comment  
 

  
 

  
 
 
 
  
 

  
 
 
 
 

 
 This message was sent by Atlassian JIRA (v7.3.0#73011-sha1:3c73d0e)  
 
 

 
   
 

  
 

  
 

   





-- 
You received this message because you are subscribed to the Google Groups "Jenkins Issues" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[JIRA] (JENKINS-46507) Parallel Pipeline random java.lang.InterruptedException

2018-04-11 Thread bsousa....@gmail.com (JIRA)
Title: Message Title


 
 
 
 

 
 
 

 
   
 Bruno Sousa commented on  JENKINS-46507  
 

  
 
 
 
 

 
 
  
 
 
 
 

 
  Re: Parallel Pipeline random java.lang.InterruptedException   
 

  
 
 
 
 

 
 +1 same here  
 

  
 
 
 
 

 
 
 

 
 
 Add Comment  
 

  
 

  
 
 
 
  
 

  
 
 
 
 

 
 This message was sent by Atlassian JIRA (v7.3.0#73011-sha1:3c73d0e)  
 
 

 
   
 

  
 

  
 

   





-- 
You received this message because you are subscribed to the Google Groups "Jenkins Issues" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.