[JIRA] (JENKINS-53810) Launch Agents fails with ERROR: null java.util.concurrent.CancellationException

2019-07-29 Thread michael_zane...@gmx.net (JIRA)

Michael Zanetti commented on JENKINS-53810

Re: Launch Agents fails with ERROR: null java.util.concurrent.CancellationException

Ivan Fernandez Calvo, in reply to your question about when exactly this happens for me: my setup is as follows. I use AWS EC2 instances that are started on demand using the "Launch Agents via SSH" method. I also have an additional plugin installed that lets me execute commands before the SSH connection is made. In my custom script I use "aws ec2 launch-instance..." and then "aws ec2 instance-wait ..." to wait until the instance has booted. This stalls the normal "Launch Agents via SSH" procedure until the slaves are up and running; after that, it continues with the actual SSH call. This works fine in 90% of cases. In some cases, however, aws instance-wait returns before sshd on the slave is actually running, and the connection attempt fails with the message:

 

[07/29/19 11:57:14] [SSH] Opening SSH connection to ec2-obfuscated.eu-west-1.compute.amazonaws.com:22.
Connection refused (Connection refused)
SSH Connection failed with IOException: "Connection refused (Connection refused)", retrying in 15 seconds. There are 10 more retries left.
 

This waits for 15 seconds and tries again. The second attempt succeeds and all seems fine; jobs start building.

However, this is the situation where the bug strikes. The ssh connector plugin did not stop the connection timeout for the first, failed connection attempt. That timeout is still running, by default for another 3 minutes. The jobs start building, but when the 3 minutes have passed, the currently working SSH connection is killed because the ssh-connector thinks it is still trying to establish the first connection.

Hope this makes it clear enough.
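
One possible way to narrow the race described above (the instance wait returning before sshd is listening) would be to poll the SSH port from the pre-launch script until it accepts TCP connections. The following is only a minimal sketch under that assumption; `wait_for_port` and its parameters are hypothetical names chosen for illustration and are not part of Jenkins, the SSH plugin, or the AWS CLI.

```python
import socket
import time


def wait_for_port(host, port=22, timeout=180.0, interval=5.0):
    """Return True once host:port accepts a TCP connection, False on timeout.

    Hypothetical helper for illustration only.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError (for example
            # ConnectionRefusedError) while nothing is listening yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Give up if another sleep would take us past the deadline.
            if time.monotonic() + interval > deadline:
                return False
            time.sleep(interval)
```

A script could call `wait_for_port(hostname)` after the instance wait and only hand over to the SSH launcher once it returns True. An open port does not prove that sshd is ready to authenticate, so this would reduce the chance of a first failed attempt rather than address the timeout behaviour itself.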
 
 This message was sent by Atlassian Jira (v7.11.2#71100

[JIRA] (JENKINS-53810) Launch Agents fails with ERROR: null java.util.concurrent.CancellationException

2019-07-15 Thread michael_zane...@gmx.net (JIRA)

Michael Zanetti edited a comment on JENKINS-53810

Re: Launch Agents fails with ERROR: null java.util.concurrent.CancellationException

I understand that this timeout is not supposed to do this. Still, I am pretty sure it does... Ever since I increased it to an insanely high number, the issue is gone.
 
 This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d)  

-- 
You received this message because you are subscribed to the Google Groups "Jenkins Issues" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/jenkinsci-issues/JIRA.194324.1538032375000.10329.1563177240305%40Atlassian.JIRA.
For more options, visit https://groups.google.com/d/optout.


[JIRA] (JENKINS-53810) Launch Agents fails with ERROR: null java.util.concurrent.CancellationException

2019-07-15 Thread michael_zane...@gmx.net (JIRA)

Michael Zanetti commented on JENKINS-53810

Re: Launch Agents fails with ERROR: null java.util.concurrent.CancellationException

I understand that this timeout is not supposed to do this. Still, I am pretty sure it does...
 


[JIRA] (JENKINS-53810) Launch Agents fails with ERROR: null java.util.concurrent.CancellationException

2019-06-24 Thread michael_zane...@gmx.net (JIRA)

Michael Zanetti commented on JENKINS-53810

Re: Launch Agents fails with ERROR: null java.util.concurrent.CancellationException

Ok... I ran into the same issue. I'd like to clear up the confusion about why increasing "Connection Timeout in Seconds" actually helps with this issue, even though it should only affect connection establishment, not connections that are already working.

I have noticed that whenever I ran into this issue, there had first been a failed connection attempt. Jenkins would then retry, succeed on the second attempt, and everything seemed to be working; jobs start building. However, after about 3-4 minutes they fail with the above connection breakdown. This does indeed match the 210 seconds, but why? It seems there must be a bug in the ssh connector code: when a connection attempt fails, Jenkins does retry, but it does not seem to stop the timer running for the previous, failed attempt. The second attempt may succeed, but the timeout timer for the first is still active, and when it runs out, it kills the current working connection, causing builds to fail.

So, increasing the "Connection Timeout in Seconds" does work around this issue; the actual cause, however, seems to be somewhere in the code that handles SSH connections.
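
The suspected mechanism can be illustrated with a toy model. This is not the plugin's actual code, only a sketch of the behaviour hypothesised above: each connection attempt starts a timeout timer, and a failed attempt's timer is left running unless it is explicitly cancelled. The names `Launcher`, `Connection`, and `cancel_stale_timer` are invented for this illustration.

```python
import threading
import time


class Connection:
    """Stand-in for an established SSH connection."""

    def __init__(self):
        self.alive = True

    def kill(self):
        self.alive = False


class Launcher:
    """Toy model of a retrying launcher with a per-attempt timeout timer."""

    def __init__(self, timeout, cancel_stale_timer):
        self.timeout = timeout
        self.cancel_stale_timer = cancel_stale_timer
        self.current = None  # the connection currently in use, if any

    def _on_timeout(self):
        # Fires when an attempt's timer expires. It kills whatever
        # connection is current, even one from a later, successful attempt.
        if self.current is not None:
            self.current.kill()

    def launch(self, outcomes):
        """Make one attempt per entry in outcomes (True = attempt succeeds)."""
        for ok in outcomes:
            timer = threading.Timer(self.timeout, self._on_timeout)
            timer.start()
            if ok:
                self.current = Connection()
                timer.cancel()  # established: this attempt's timer is done
                return self.current
            if self.cancel_stale_timer:
                timer.cancel()  # the fix: stop the failed attempt's timer
        return None
```

With `cancel_stale_timer=False`, calling `launch([False, True])` returns a working connection whose `alive` flag flips to False once the first attempt's timer expires, which mirrors builds failing a few minutes after a retry. With `cancel_stale_timer=True` the connection stays alive. In this model, raising `timeout` only postpones the kill, which would be consistent with a very large timeout hiding the problem.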