Issue Type: Bug
Affects Versions: current
Assignee: Unassigned
Components: core, ssh
Created: 17/Jul/13 3:58 PM
Description:

In the last 12 months, we have encountered a very rare, but also very critical issue in Jenkins core or the SSH-Slaves plug-in.

The issue is that Jenkins spontaneously enters a state in which it reproducibly selects the wrong channel for some of its connected build hosts. All build hosts are connected via the SSH-Slaves plug-in.

This immediately leads to failing builds: the affected builds lock the workspace on the correct host, but talk to a different host to execute the build, so the workspace locks are no longer respected.

A typical log-output looks like this:

-------------------
14:45:35 Started by command line by <user>
14:45:35 Building remotely on musxbird039 in workspace /local/jenkins_workspace/workspace/<PROJECT>
[...]
14:45:35 Checkout:<GIT-REPO> / /local/jenkins_workspace/workspace/<PROJECT> - hudson.remoting.Channel@37ecb28e:musxbird029
-------------------

As you can see, it selects musxbird039 for building, but uses the channel to musxbird029. Since the workspace is usually physically present on both machines, the build starts anyway. Unfortunately, since the workspace is only locked on musxbird039, but not on musxbird029, concurrent builds can collide on musxbird029.
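
The mismatch described above could be detected automatically by scanning console logs. The following is only a rough sketch, assuming the two line formats shown in the log excerpt; the function name and the regular expressions are illustrative and not part of Jenkins or the SSH-Slaves plug-in, and other versions may format these lines differently.

```python
import re

# Patterns modeled on the two log lines quoted in this report (assumption:
# other Jenkins versions or plug-ins may print these lines differently).
NODE_RE = re.compile(r"Building remotely on (\S+)")
CHANNEL_RE = re.compile(r"hudson\.remoting\.Channel@[0-9a-f]+:(\S+)")


def find_channel_mismatches(log_text):
    """Return (assigned_node, channel_node, line) for every channel line
    whose host differs from the node the build was assigned to."""
    assigned = None
    mismatches = []
    for line in log_text.splitlines():
        node = NODE_RE.search(line)
        if node:
            # Remember which node the build claims to run on.
            assigned = node.group(1)
            continue
        channel = CHANNEL_RE.search(line)
        if channel and assigned and channel.group(1) != assigned:
            mismatches.append((assigned, channel.group(1), line))
    return mismatches


sample = """\
14:45:35 Building remotely on musxbird039 in workspace /local/jenkins_workspace/workspace/<PROJECT>
14:45:35 Checkout:<GIT-REPO> / /local/jenkins_workspace/workspace/<PROJECT> - hudson.remoting.Channel@37ecb28e:musxbird029
"""

for assigned_node, channel_node, _ in find_channel_mismatches(sample):
    print(f"assigned to {assigned_node}, but channel points to {channel_node}")
```

Run against the excerpt above, this flags the build as assigned to musxbird039 while its channel points to musxbird029. It only detects the symptom after the fact; it does not explain or prevent the wrong channel selection.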

This leads, of course, to a vast variety of build failures.

We do not know of a way to reliably reproduce this issue, as it appears randomly after some time. Sometimes it takes months to appear, sometimes only days.

The only known way of repairing the bug is to disconnect both machines and let them restart their slaves and all their associated threads on the Jenkins master. Rebooting the server itself obviously also works.

We are quite frankly stumped by this bug. Even examining the slave->channel allocation code of Jenkins ourselves did not lead to any clue.

If you need more information, we will be happy to provide it.

Best regards,

Martin Schröder
Intel Mobile Communications GmbH.

Environment: Ubuntu 12.04 64-bit
Jenkins LTS 1.509.1
Java(TM) SE Runtime Environment (build 1.6.0_45-b06) 64-bit
Project: Jenkins
Labels: core node slave slaves jenkins ssh channel linux
Priority: Critical
Reporter: Martin Schröder
