[JIRA] (JENKINS-53810) Launch Agents fails with ERROR: null java.util.concurrent.CancellationException
Ivan Fernandez Calvo updated JENKINS-53810
Jenkins / JENKINS-53810 Launch Agents fails with ERROR: null java.util.concurrent.CancellationException
Change By: Ivan Fernandez Calvo
Status: Fixed but Unreleased -> Resolved
Released As: ssh-slaves-1.30.0

This message was sent by Atlassian Jira (v7.13.6#713006-sha1:cc4451f)
--
You received this message because you are subscribed to the Google Groups "Jenkins Issues" group. To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/jenkinsci-issues/JIRA.194324.1538032375000.10309.1580577000928%40Atlassian.JIRA.
[JIRA] (JENKINS-53810) Launch Agents fails with ERROR: null java.util.concurrent.CancellationException
pjdarton commented on JENKINS-53810:

I'd be a bit wary of updating the docker-plugin so it requires a fully up-to-date version of Jenkins. I believe that users should be able to use the latest docker-plugin with a core Jenkins that's at least 6 months out of date. If we had a critical security defect that required the latest Jenkins core then, sure, I guess we'd have to do that, but for anything else it'd be best not to force everyone to update.

FYI, what the docker-plugin is trying to do is to poll the docker container (that it just created) to wait until the container is ready to accept SSH connections before allowing the main SSHLauncher code to start trying to connect. If the docker-plugin declared its provisioning "complete" and the slave stayed offline for any period, then Jenkins would ask it to make more slave nodes than are required, causing resource wastage.
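The polling step described above can be sketched roughly as follows. This is a minimal illustration, not the docker-plugin's actual code: the class `PortReadinessProbe`, the method `waitForPort`, and all of its parameters are hypothetical names. It only checks that the TCP port accepts a connection, which is a weaker condition than sshd being ready to authenticate.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper for illustration; not part of the docker-plugin or ssh-slaves-plugin. */
final class PortReadinessProbe {
    private PortReadinessProbe() {}

    /**
     * Polls host:port until a TCP connection succeeds or the overall deadline passes.
     * perAttemptTimeoutMillis must be positive (0 would mean "wait forever" for Socket.connect).
     *
     * @return true if the port accepted a connection in time, false otherwise
     */
    static boolean waitForPort(String host, int port, long overallTimeoutMillis,
                               int perAttemptTimeoutMillis, long pauseMillis)
            throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(overallTimeoutMillis);
        while (true) {
            // try-with-resources closes the probe socket after every attempt,
            // so no half-open connection outlives a failed try.
            try (Socket socket = new Socket()) {
                socket.connect(new InetSocketAddress(host, port), perAttemptTimeoutMillis);
                return true;
            } catch (IOException notReadyYet) {
                // Connection refused or timed out: the container is not ready yet.
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // Overall deadline reached without a successful connect.
            }
            Thread.sleep(pauseMillis);
        }
    }
}
```

Only once such a probe succeeds would provisioning be reported as complete, so that Jenkins does not count an offline node as missing capacity and request extra ones.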
[JIRA] (JENKINS-53810) Launch Agents fails with ERROR: null java.util.concurrent.CancellationException
Ivan Fernandez Calvo updated JENKINS-53810
Change By: Ivan Fernandez Calvo
Status: In Progress -> Fixed but Unreleased
Resolution: Fixed
[JIRA] (JENKINS-53810) Launch Agents fails with ERROR: null java.util.concurrent.CancellationException
Ivan Fernandez Calvo commented on JENKINS-53810:

I am pretty sure this is related to JENKINS-59764. I have made a PR to the docker-plugin with the solution; it would need trilead-ssh2-build-217-jenkins-15 on Jenkins core 2.190 and earlier (2.190-), or trilead-api 1.0.5 on Jenkins core 2.190.1 and later (2.190.1+).
[JIRA] (JENKINS-53810) Launch Agents fails with ERROR: null java.util.concurrent.CancellationException
Ivan Fernandez Calvo commented on JENKINS-53810:

Yep, it is really helpful. I'm fixing the retry method in https://issues.jenkins-ci.org/browse/JENKINS-58589, and I've found that the connection was not closed in some cases, so now it forces the connection to close before retrying. The timeout is also passed to the connection, so the weird global timeout disappears; it is now a timeout per connection.
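The behavior described in this comment (a timeout per connection attempt, and a forced close before each retry) might look roughly like the sketch below. It is an illustration under assumptions, not the code from JENKINS-58589: `RetryingConnector`, `Opener`, and `connectWithRetries` are hypothetical names, and the `Opener` interface stands in for whatever SSH library actually opens the connection.

```java
import java.io.Closeable;
import java.io.IOException;

/** Hypothetical sketch of "timeout per connection, close before retry"; not the plugin's code. */
final class RetryingConnector {
    private RetryingConnector() {}

    /** Hypothetical abstraction over the SSH library that creates and opens a connection. */
    interface Opener<C extends Closeable> {
        C newConnection();

        /** Opens the connection; the timeout applies to this single attempt only. */
        void connect(C connection, int timeoutMillis) throws IOException;
    }

    /**
     * Makes one initial attempt plus up to maxRetries retries. A failed connection is
     * closed before the next attempt so nothing from it can linger.
     */
    static <C extends Closeable> C connectWithRetries(
            Opener<C> opener, int maxRetries, int timeoutMillis, long retryWaitMillis)
            throws IOException, InterruptedException {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must be >= 0");
        }
        IOException lastFailure = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            C connection = opener.newConnection();
            try {
                opener.connect(connection, timeoutMillis);
                return connection; // Success: the caller now owns the open connection.
            } catch (IOException e) {
                lastFailure = e;
                try {
                    connection.close(); // Force-close the failed attempt before retrying.
                } catch (IOException closeFailure) {
                    e.addSuppressed(closeFailure);
                }
            }
            if (attempt < maxRetries) {
                Thread.sleep(retryWaitMillis);
            }
        }
        throw lastFailure; // Every attempt failed; report the most recent error.
    }
}
```

Because the timeout is handed to each attempt and the failed connection object is discarded, no timer or socket from an earlier attempt is left to interfere with a later, successful one.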
[JIRA] (JENKINS-53810) Launch Agents fails with ERROR: null java.util.concurrent.CancellationException
Michael Zanetti commented on JENKINS-53810:

Ivan Fernandez Calvo, in reply to your question about when exactly this happens for me:

My setup is as follows: I use AWS EC2 instances which are started on demand using the "Launch Agents via SSH" method. I also have an additional plugin installed which allows me to execute commands before the SSH connection. In my custom script I use "aws ec2 launch-instance..." and then "aws ec2 instance-wait ..." to wait until it has booted up. This stalls the normal "Launch agents via SSH" procedure until the slaves are up and running. After that, it continues with the actual SSH call.

This works fine in 90% of the cases. In some cases, however, aws instance-wait returns before sshd on the slave is actually running, and the connection attempt fails with the message:

    [07/29/19 11:57:14] [SSH] Opening SSH connection to ec2-obfuscated.eu-west-1.compute.amazonaws.com:22.
    Connection refused (Connection refused)
    SSH Connection failed with IOException: "Connection refused (Connection refused)", retrying in 15 seconds. There are 10 more retries left.

It waits for 15 seconds and tries again. The second attempt succeeds and all seems fine; jobs start building. However, this is the situation where the bug strikes. The ssh connector plugin did not stop the connection timeout for the first, failed connection. That timeout is still running, by default for another 3 minutes. The jobs start building, but when the 3 minutes have passed, the currently working SSH connection is killed because the ssh-connector thinks it is still trying to establish the first connection.

Hope this makes it clear enough.
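The failure mode hypothesized in this comment can be reproduced in miniature. The sketch below is only a model of the described behavior, not the ssh-slaves-plugin's code: `StaleTimeoutDemo` and `survivesFirstTimeout` are hypothetical names, and a shared flag stands in for "the current connection". A timeout armed for the first attempt is either left running (the suspected bug) or cancelled when that attempt fails.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical model of a connection timeout that outlives its failed attempt. */
final class StaleTimeoutDemo {
    private StaleTimeoutDemo() {}

    /**
     * Simulates a failed first attempt followed by a successful second one.
     *
     * @param cancelTimerOnFailure whether the first attempt's timeout is cancelled when it fails
     * @return true if the second connection is still alive after the first timeout has elapsed
     */
    static boolean survivesFirstTimeout(boolean cancelTimerOnFailure, long timeoutMillis)
            throws InterruptedException {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        AtomicBoolean connectionAlive = new AtomicBoolean(false);
        try {
            // Attempt 1: arm the timeout. When it fires it kills "the current connection",
            // whichever attempt that connection belongs to.
            ScheduledFuture<?> firstTimeout = timer.schedule(
                    () -> connectionAlive.set(false), timeoutMillis, TimeUnit.MILLISECONDS);

            // Attempt 1 fails immediately ("Connection refused").
            if (cancelTimerOnFailure) {
                firstTimeout.cancel(false); // The fix: stop the timer with its attempt.
            }

            // Attempt 2 succeeds and builds start running on it.
            connectionAlive.set(true);

            // Wait until well after the first attempt's timeout would have fired.
            Thread.sleep(timeoutMillis * 2);
            return connectionAlive.get();
        } finally {
            timer.shutdownNow();
        }
    }
}
```

With `cancelTimerOnFailure` set to false the working connection is torn down once the stale timer fires, which matches the symptom of builds failing a few minutes after a retried launch. Raising the timeout only moves that moment further out, which would explain why it works as a workaround.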
[JIRA] (JENKINS-53810) Launch Agents fails with ERROR: null java.util.concurrent.CancellationException
Ivan Fernandez Calvo commented on JENKINS-53810:

AFAIU it uses the SSHLauncher:

https://github.com/jenkinsci/docker-plugin/blob/cde48970a0ccb2260afa290d020de79dda66ae4d/src/main/java/io/jenkins/docker/connector/DockerComputerSSHConnector.java#L457
https://github.com/jenkinsci/docker-plugin/blob/cde48970a0ccb2260afa290d020de79dda66ae4d/src/main/java/io/jenkins/docker/connector/DockerComputerSSHConnector.java#L264
https://github.com/jenkinsci/docker-plugin/blob/cde48970a0ccb2260afa290d020de79dda66ae4d/src/main/java/io/jenkins/docker/connector/DockerComputerSSHConnector.java#L359
https://github.com/jenkinsci/docker-plugin/blob/cde48970a0ccb2260afa290d020de79dda66ae4d/src/main/java/io/jenkins/docker/connector/DockerComputerSSHConnector.java#L407

It has its own connector, but it is not the one from the slaves-ssh-plugin, nor an extension of it or of ComputerConnector. I haven't really dug too deep into the code.

https://github.com/jenkinsci/docker-plugin/blob/master/src/main/java/io/jenkins/docker/connector/DockerComputerSSHConnector.java#L61
https://github.com/jenkinsci/docker-plugin/blob/master/src/main/java/io/jenkins/docker/connector/DockerComputerConnector.java#L28

In any case, I'm changing how the timeout works. Because we fixed the trilead-ssh2 lib to manage the timeout, the current behavior is no longer valid; we have to manage the timeout per connection, not per block of retries.
[JIRA] (JENKINS-53810) Launch Agents fails with ERROR: null java.util.concurrent.CancellationException
Jesse Glick commented on JENKINS-53810:

> how the Docker plugin uses the SSHLauncher, I think it reuses the same object to launch all agents

Supposed to be using SSHConnector, designed for clouds.
[JIRA] (JENKINS-53810) Launch Agents fails with ERROR: null java.util.concurrent.CancellationException
Arno Moonen edited a comment on JENKINS-53810:

We are not using Docker in our setup (might have the Docker plugin installed though). So I'm not sure if this is really caused by the Docker plugin. Currently we are not upgrading the SSH Slaves plugin to prevent these issues (we're still on version 1.26).
[JIRA] (JENKINS-53810) Launch Agents fails with ERROR: null java.util.concurrent.CancellationException
Ivan Fernandez Calvo started work on JENKINS-53810
Change By: Ivan Fernandez Calvo
Status: Open -> In Progress
[JIRA] (JENKINS-53810) Launch Agents fails with ERROR: null java.util.concurrent.CancellationException
Ivan Fernandez Calvo commented on JENKINS-53810:

Michael Zanetti, I've read the whole thing again, and I think it could be related to how the Docker plugin uses the SSHLauncher. I think it reuses the same object to launch all agents, whereas SSHLauncher is designed to manage one agent, not a fleet. I want to test something very similar to your configuration and conditions to see if I can replicate the issue and find a solution. Could you tell me what your agent configurations look like, and under which conditions the issue is exposed?
[JIRA] (JENKINS-53810) Launch Agents fails with ERROR: null java.util.concurrent.CancellationException
Michael Zanetti edited a comment on JENKINS-53810:

I understand that this timeout is not supposed to do this. Still I am pretty sure it does... Ever since I increased it to an insanely high number, the issue is gone.
[JIRA] (JENKINS-53810) Launch Agents fails with ERROR: null java.util.concurrent.CancellationException
Ivan Fernandez Calvo commented on JENKINS-53810:

This timeout is only used when establishing a connection. The 3-4 minutes that you see is the time it takes the Ping Thread to check the connection; if it cannot communicate with the agent, it breaks the channel. You have to review the exact exception shown in the Jenkins instance and the exceptions in the agent logs. I'd recommend enabling verbose logging in the SSH server to show the exact moment when the disconnection happens, because it usually happens before the ping thread checks the connection; in that case, the problem is in the SSH configuration or in the network infrastructure.
[JIRA] (JENKINS-53810) Launch Agents fails with ERROR: null java.util.concurrent.CancellationException
Michael Zanetti commented on JENKINS-53810:

Ok... I ran into the same. I'd like to clarify the confusion about why increasing "Connection Timeout in Seconds" actually helps with this issue even though that should only affect connection establishment, not already working connections.

I have noticed that whenever I ran into this issue, there had first been a failed connection establishment. Jenkins would then retry the connection and succeed on the second attempt; everything seems to be working and jobs start building. However, after about 3-4 minutes they fail with the above connection breakdown. This does indeed match the 210 seconds, but why?

It seems there must be a bug in the ssh connector code: when a connection attempt fails, Jenkins does retry, but it does not seem to stop the timer running for the previous, failed attempt. The second attempt might succeed, but the timeout timer for the first is still active, and when it runs out it kills the current working connection, causing builds to fail.

So increasing "Connection Timeout in Seconds" does work around this issue; the actual cause, however, seems to be somewhere in the code that handles SSH connections.
[JIRA] (JENKINS-53810) Launch Agents fails with ERROR: null java.util.concurrent.CancellationException
Gerd Hoffmann commented on JENKINS-53810:

I've changed two values actually, one from 210 to 600. My UI is in German, so I'm not fully sure what the English label is; "Connection Timeout in Seconds" sounds right though. The other one is two lines below the first, switched from 10 to 60. This is the interval between connection attempts according to the label.
[JIRA] (JENKINS-53810) Launch Agents fails with ERROR: null java.util.concurrent.CancellationException
Ivan Fernandez Calvo edited a comment on JENKINS-53810:

Do you mean "Connection Timeout in Seconds"? The default value is 210 seconds. This time is used for all retries, so it should be enough to cover them all. I mean, if you have 6 retries and a wait time between them of 10 seconds, this timeout should be 6*10 plus the timeout of each connection; so if you want to wait for 30 seconds on each connection, you would set this setting to 6*10+30*6=240.

On this issue the settings are the defaults, so it is not related to the timeout; see the launchTimeoutSeconds value:

    SSHLauncher{host='9.47.78.144', port=32870, credentialsId='slave-test', jvmOptions='', javaPath='', prefixStartSlaveCmd='', suffixStartSlaveCmd='', launchTimeoutSeconds=210, maxNumRetries=10, retryWaitTime=15,

There are a couple of improvements filed to change this behavior, JENKINS-48617 and JENKINS-48618; to me it is really confusing.

In any case, 10 seconds is too short a value for that timeout. It may be worth validating that the setting is not lower than 60 and raising any value below that to 60 (JENKINS-55858).

https://github.com/jenkinsci/ssh-slaves-plugin/blob/master/doc/CONFIGURE.md#advanced-settings
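The arithmetic in this comment can be written out as a small helper. This is only a worked example of the formula stated above (retries times wait, plus retries times per-connection timeout); the class `LaunchTimeoutBudget` and its method names are hypothetical and do not exist in the ssh-slaves-plugin.

```java
/** Hypothetical helper that restates the timeout formula from the comment above. */
final class LaunchTimeoutBudget {
    private LaunchTimeoutBudget() {}

    /** Total launch timeout needed so that every retry gets its full per-connection timeout. */
    static int requiredLaunchTimeoutSeconds(
            int retries, int retryWaitSeconds, int perConnectionTimeoutSeconds) {
        return retries * retryWaitSeconds + retries * perConnectionTimeoutSeconds;
    }

    /**
     * Inverse view: the time left for each connection attempt once the waits between
     * retries are subtracted from a given launch timeout (never negative).
     */
    static int perConnectionBudgetSeconds(
            int launchTimeoutSeconds, int retries, int retryWaitSeconds) {
        int remaining = launchTimeoutSeconds - retries * retryWaitSeconds;
        return Math.max(0, remaining) / retries;
    }
}
```

For the example in the comment, `requiredLaunchTimeoutSeconds(6, 10, 30)` gives 6*10 + 6*30 = 240. Applying the same formula to the values in the dump (launchTimeoutSeconds=210, maxNumRetries=10, retryWaitTime=15), `perConnectionBudgetSeconds(210, 10, 15)` leaves (210 - 150) / 10 = 6 seconds per attempt, assuming the formula describes how the plugin spends the budget.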
[JIRA] (JENKINS-53810) Launch Agents fails with ERROR: null java.util.concurrent.CancellationException
Gerd Hoffmann commented on JENKINS-53810:

Interesting. Raising the timeout helped in my case too.
[JIRA] (JENKINS-53810) Launch Agents fails with ERROR: null java.util.concurrent.CancellationException
Jesper Jensen commented on JENKINS-53810:

Just had the same problem as described. The workaround I found was to change the timeout to 60 seconds (default 10). The first time it took very long, almost 60 seconds; the second time only a few seconds. Hope that this helps!
[JIRA] (JENKINS-53810) Launch Agents fails with ERROR: null java.util.concurrent.CancellationException
Jeff Thompson commented on JENKINS-53810:

I can't see how ClassFilter could cause that, either. It may take a little bit of time to process those patterns, though it still shouldn't be very long.
[JIRA] (JENKINS-53810) Launch Agents fails with ERROR: null java.util.concurrent.CancellationException
Jesse Glick commented on JENKINS-53810:

Not sure what would cause that. This is within ClassFilter.createDefaultInstance using the DEFAULT_PATTERNS, so it should not behave differently in one environment than another.
[JIRA] (JENKINS-53810) Launch Agents fails with ERROR: null java.util.concurrent.CancellationException
Ivan Fernandez Calvo commented on JENKINS-53810:

Jeff Thompson, Jesse Glick: could the ClassFilter block the agent launch? The stack trace that Gerd Hoffmann posted in the previous comment is very close to the error, but I am not sure whether it could be the root cause (RC).
[JIRA] (JENKINS-53810) Launch Agents fails with ERROR: null java.util.concurrent.CancellationException
Gerd Hoffmann commented on JENKINS-53810:

I see the agent fail with the same stacktrace, thread dump below

- cut here --
[11/29/18 14:35:45] [SSH] Checking java version of java
[11/29/18 14:35:45] [SSH] java -version returned 1.8.0_191.
[11/29/18 14:35:45] [SSH] Starting sftp client.
[11/29/18 14:35:45] [SSH] Copying latest remoting.jar...
[11/29/18 14:35:45] [SSH] Copied 776,717 bytes.
Expanded the channel window size to 4MB
[11/29/18 14:35:45] [SSH] Starting agent process: cd "/home/jenkins" && java -jar remoting.jar -workDir /home/jenkins
Nov 29, 2018 2:35:50 PM org.jenkinsci.remoting.engine.WorkDirManager initializeWorkDir
INFO: Using /home/jenkins/remoting as a remoting work directory
Both error and output logs will be printed to /home/jenkins/remoting
2018-11-29 14:36:18
Full thread dump OpenJDK Zero VM (25.191-b12 interpreted mode):

"Service Thread" #5 daemon prio=9 os_prio=0 tid=0xb62d2cb8 nid=0x6692 runnable [0x]
   java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" #4 daemon prio=9 os_prio=0 tid=0xb62d0530 nid=0x6691 waiting on condition [0x]
   java.lang.Thread.State: RUNNABLE

"Finalizer" #3 daemon prio=8 os_prio=0 tid=0xb62b4ba0 nid=0x6690 in Object.wait() [0xa34fe000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    waiting on <0xa9718618> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:144)
    locked <0xa9718618> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:165)
    at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:216)

"Reference Handler" #2 daemon prio=10 os_prio=0 tid=0xb62b2160 nid=0x668f in Object.wait() [0xa37fe000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    waiting on <0xa97187b8> (a java.lang.ref.Reference$Lock)
    at java.lang.Object.wait(Object.java:502)
    at java.lang.ref.Reference.tryHandlePending(Reference.java:191)
    locked <0xa97187b8> (a java.lang.ref.Reference$Lock)
    at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:153)

"main" #1 prio=5 os_prio=0 tid=0xb6246b10 nid=0x668d runnable [0xb6562000]
   java.lang.Thread.State: RUNNABLE
    at java.util.regex.Pattern$CharProperty.match(Pattern.java:3782)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4262)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4236)
    at java.util.regex.Pattern$CharProperty.match(Pattern.java:3779)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4660)
    at java.util.regex.Pattern$Loop.match(Pattern.java:4787)
    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4719)
    at java.util.regex.Pattern$Branch.match(Pattern.java:4604)
    at java.util.regex.Pattern$Curly.match0(Pattern.java:4274)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4236)
    at java.util.regex.Pattern$CharProperty.match(Pattern.java:3779)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4660)
    at java.util.regex.Pattern$Loop.match(Pattern.java:4787)
    at
[JIRA] (JENKINS-53810) Launch Agents fails with ERROR: null java.util.concurrent.CancellationException
Ivan Fernandez Calvo commented on JENKINS-53810:

yong wu, could you grab a thread dump while the agent is stuck trying to start? I see in the log that you have about a minute to get it.

https://wiki.jenkins.io/display/JENKINS/Obtaining+a+thread+dump
[JIRA] (JENKINS-53810) Launch Agents fails with ERROR: null java.util.concurrent.CancellationException
Jeff Thompson assigned JENKINS-53810 to Ivan Fernandez Calvo
Change By: Jeff Thompson
Assignee: Jeff Thompson -> Ivan Fernandez Calvo
[JIRA] (JENKINS-53810) Launch Agents fails with ERROR: null java.util.concurrent.CancellationException
yong wu commented on JENKINS-53810:

I also ran into a similar problem while adding a node (SLES12.3). The Java version on the slave was up to 1.8. I'm not sure whether this is related to the SSH Slaves plugin or not...
[JIRA] (JENKINS-53810) Launch Agents fails with ERROR: null java.util.concurrent.CancellationException
yong wu updated an issue: Jenkins / JENKINS-53810

Change By: yong wu
Attachment: image-2018-10-09-19-18-08-873.png
[JIRA] (JENKINS-53810) Launch Agents fails with ERROR: null java.util.concurrent.CancellationException
Ivan Fernandez Calvo commented on JENKINS-53810:

So you set the property `-Dhudson.plugins.sshslaves.SSHLauncher.trackCredentials=false` in your Jenkins instance's JVM parameters and the issue persists. In that case I need this info to try to understand and replicate what is happening: https://github.com/jenkinsci/ssh-slaves-plugin/blob/master/doc/TROUBLESHOOTING.md#common-info-needed-to-troubleshooting-a-bug

What I saw in the log is that the agent tries to connect and after 4 minutes it is killed (probably by the pingThread), but it seems the connection never ends. You said that this happens when you have a huge queue, so we'll probably need a thread dump of the instance taken while the issue is happening, to see which threads are blocked. https://wiki.jenkins.io/display/JENKINS/Obtaining+a+thread+dump
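Editorial sketch (not part of the original thread): once a thread dump has been captured, the "which threads are blocked" question can be answered roughly with a small script. The Python below is a minimal sketch that assumes the dump is in the HotSpot text format seen in this thread (a quoted thread name on a header line, followed by a `java.lang.Thread.State:` line). The function name `summarize_thread_dump` and the `worker-1` thread in the sample are hypothetical; only the `main` header line is taken from the dump posted here.

```python
import re
from collections import Counter

# Header line of a thread in a HotSpot-style dump, e.g.
# "main" #1 prio=5 os_prio=0 tid=0xb6246b10 nid=0x668d runnable [0xb6562000]
THREAD_HEADER = re.compile(r'^"(?P<name>.+?)" ')
# State line, e.g. "java.lang.Thread.State: BLOCKED (on object monitor)"
THREAD_STATE = re.compile(r'java\.lang\.Thread\.State: (?P<state>[A-Z_]+)')


def summarize_thread_dump(text):
    """Return (Counter of states, {state: [thread names]}) for a dump."""
    states = Counter()
    by_state = {}
    current = None  # name of the thread whose state line we expect next
    for line in text.splitlines():
        header = THREAD_HEADER.match(line.strip())
        if header:
            current = header.group("name")
            continue
        state = THREAD_STATE.search(line)
        if state and current is not None:
            states[state.group("state")] += 1
            by_state.setdefault(state.group("state"), []).append(current)
            current = None
    return states, by_state


# Sample input: "main" comes from the dump in this thread,
# "worker-1" is a made-up thread added for illustration only.
SAMPLE = '''\
"main" #1 prio=5 os_prio=0 tid=0xb6246b10 nid=0x668d runnable [0xb6562000]
   java.lang.Thread.State: RUNNABLE
    at java.util.regex.Pattern$CharProperty.match(Pattern.java:3782)

"worker-1" #12 prio=5 os_prio=0 tid=0x00000001 nid=0x1 waiting for monitor entry [0x00000002]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at example.Hypothetical.method(Hypothetical.java:1)
'''

if __name__ == "__main__":
    states, by_state = summarize_thread_dump(SAMPLE)
    for state, count in states.most_common():
        print(state, count, by_state[state])
```

Threads that have no `java.lang.Thread.State:` line (some JVM-internal threads) are skipped by this sketch, so the counts cover Java threads only.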
[JIRA] (JENKINS-53810) Launch Agents fails with ERROR: null java.util.concurrent.CancellationException
Durgadas Kamath commented on JENKINS-53810:

Ivan Fernandez Calvo, I tried the above workaround but that didn't solve the problem.
[JIRA] (JENKINS-53810) Launch Agents fails with ERROR: null java.util.concurrent.CancellationException
Ivan Fernandez Calvo commented on JENKINS-53810:

I'm 90% sure that this is related to https://issues.jenkins-ci.org/browse/JENKINS-49235. There is a workaround: https://github.com/jenkinsci/ssh-slaves-plugin/blob/master/doc/TROUBLESHOOTING.md#threads-stuck-at-credentialsprovidertrackall
[JIRA] (JENKINS-53810) Launch Agents fails with ERROR: null java.util.concurrent.CancellationException
Oleg Nenashev commented on JENKINS-53810:

Also CC Ivan Fernandez Calvo. It looks rather like an SSH Slaves plugin issue.
[JIRA] (JENKINS-53810) Launch Agents fails with ERROR: null java.util.concurrent.CancellationException
Oleg Nenashev updated an issue: Jenkins / JENKINS-53810

Change By: Oleg Nenashev
Component/s: ssh-slaves-plugin
[JIRA] (JENKINS-53810) Launch Agents fails with ERROR: null java.util.concurrent.CancellationException
Durgadas Kamath created an issue: Jenkins / JENKINS-53810 Launch Agents fails with ERROR: null java.util.concurrent.CancellationException

Issue Type: Bug
Assignee: Jeff Thompson
Components: core, remoting
Created: 2018-09-27 07:12
Environment:
  Jenkins running on x86 (uname -a):
    Linux ps 4.4.0-134-generic #160-Ubuntu SMP Wed Aug 15 14:58:00 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
  Slaves are s390x:
    Linux jk-slave-3 4.4.0-119-generic #143-Ubuntu SMP Mon Apr 2 16:07:17 UTC 2018 s390x s390x s390x GNU/Linux
  Plugins:
    SSH Slaves plugin 1.28.1
    Docker Plugin 1.1.5
Labels: jenkins remoting
Priority: Major
Reporter: Durgadas Kamath

Launching a node/agent fails with ERROR: null java.util.concurrent.CancellationException

We have a large number of jobs in the queue, which get assigned to slaves created by the Docker plugin. Even if we try creating a slave and launching the agent ourselves, it fails.

Note: The slave image adheres to all the requirements and works well when there is no huge queue.

Executor Status SSHLauncher{host='9.47.78.144', port=32870,