[ https://issues.jenkins-ci.org/browse/JENKINS-6817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=163726#comment-163726 ]

Dejan Menges commented on JENKINS-6817:
---------------------------------------

Hi,

I investigated this problem over the last two months. In my case it was happening on several Linux flavors (but not on Windows build nodes). It started once the number and intensity of build jobs increased, and it happened only for Scala projects. The build nodes are not uniform and the environment is mixed: some nodes are local, and some are on AWS, connected through two different types of VPN.

After digging further, I found that the limits on my build nodes weren't tuned correctly. I increased the maximum number of open file descriptors and the maximum number of user processes (the people before me had never tuned them). Jobs then ran much faster, but this started to cause more and more issues.
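You can check what the build user actually gets for these two limits by having a process query its own soft and hard values. Here is a minimal sketch. It assumes a Linux node with Python 3 available, and the helper names are hypothetical (not part of Jenkins). It reads RLIMIT_NOFILE (the value `ulimit -n` shows) and RLIMIT_NPROC (the value `ulimit -u` shows) through Python's standard resource module.

```python
import resource

def describe(limit):
    """Render a limit value, translating RLIM_INFINITY to 'unlimited'."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)

def current_limits():
    """Return the (soft, hard) pairs this process inherited for open files
    (ulimit -n) and user processes (ulimit -u)."""
    return {
        "nofile": resource.getrlimit(resource.RLIMIT_NOFILE),
        "nproc": resource.getrlimit(resource.RLIMIT_NPROC),
    }

if __name__ == "__main__":
    for name, (soft, hard) in current_limits().items():
        print(f"{name}: soft={describe(soft)} hard={describe(hard)}")
```

Run this from a build step on a node. It shows the limits the build session really inherited, which can differ from what /etc/security/limits.conf says.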

It turned out that this was an issue with Red Hat-based distributions and SSH. Even though I had set the limits in /etc/security/limits.conf, not all values (including the nproc setting) were applied to the SSH sessions the Jenkins master opened to the build nodes. I verified this with a simple bash script. It connects from the master to each build node over SSH, as the same user we use for builds, and runs 'ulimit -u'. The value was still the default (1024).
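The SSH check described above can be scripted across many nodes. This sketch rests on several assumptions:
- The master has key-based SSH access to each node; BatchMode=yes makes ssh fail instead of prompting for a password.
- The build user's login shell is bash, so 'ulimit -u' is understood.
- Host and user names are placeholders.

The hypothetical `run` parameter exists only so the SSH call can be swapped out in tests.

```python
import subprocess

DEFAULT_NPROC = 1024  # default value observed on the affected nodes

def remote_nproc(host, user, run=subprocess.run):
    """Open a non-interactive SSH session as the build user and return the
    soft 'max user processes' limit that session actually received
    (None if it is unlimited)."""
    result = run(
        ["ssh", "-o", "BatchMode=yes", f"{user}@{host}", "ulimit -u"],
        capture_output=True, text=True, timeout=30, check=True,
    )
    value = result.stdout.strip()
    return None if value == "unlimited" else int(value)

def nodes_at_default(hosts, user, run=subprocess.run):
    """Return the hosts whose SSH sessions still report the default limit."""
    return [h for h in hosts if remote_nproc(h, user, run) == DEFAULT_NPROC]
```

For example, nodes_at_default(["build-01", "build-02"], "jenkins") would list the nodes (hypothetical names) whose SSH sessions still get 1024.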

The quick fix was to add an explicit 'ulimit -u value' to .bashrc on all Linux nodes. After that, the error stopped and hasn't happened for ten days now; before, it was happening multiple times a day, sometimes every hour. If you want a cleaner fix, add 'session required pam_limits.so' to the PAM configuration that sshd uses (/etc/pam.d/sshd, with UsePAM yes in sshd_config). That makes every new SSH session apply the limits defined in /etc/security/limits.conf. Note that /etc/pam.d/login only covers logins through login(1), not SSH sessions.
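The .bashrc workaround boils down to raising the process limit in the shell before the build starts; child processes then inherit it. One detail is worth knowing. Without -S or -H, bash's 'ulimit -u value' sets both the soft and the hard limit, and an unprivileged user cannot raise the hard limit. So values above the current hard limit fail, while 'ulimit -S -u value' only touches the soft limit.

As a rough in-process analogue, Python's resource.setrlimit can raise only the soft limit, capped at the hard one. This is an illustration with a hypothetical helper name, not what bash or Jenkins does internally.

```python
import resource

def raise_nproc_soft_limit(target):
    """Raise this process's soft RLIMIT_NPROC to `target` (a positive int),
    capped at the hard limit, which an unprivileged process cannot exceed.
    Children inherit the result, much like 'ulimit -S -u' in .bashrc.
    Never lowers the current soft limit. Returns the (soft, hard) pair."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    if soft == resource.RLIM_INFINITY or soft >= target:
        return soft, hard  # already high enough
    resource.setrlimit(resource.RLIMIT_NPROC, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NPROC)
```

Keeping the hard limit untouched is deliberate: lowering it from an unprivileged shell cannot be undone within that session.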

Maybe this is not the real fix for this issue, but it resolved my problem. Looking back at the structure of my jobs, their intensity and requirements, and the default limit values, it makes sense that this was happening to me. Hopefully it will help someone else as well.
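Why a limit of 1024 runs out so easily: on Linux, RLIMIT_NPROC is counted per real user ID and against threads, not just processes. Every thread of every JVM the build user runs (Scala builds, plus the slave agent JVM itself) counts toward the same 1024.

Here is a minimal sketch for measuring that, assuming a Linux /proc filesystem; the helper names are hypothetical. It sums the Threads: field of /proc/<pid>/status over all processes owned by a user and compares the total with the soft limit.

```python
import os
import resource

def threads_for_uid(uid, proc="/proc"):
    """Sum the 'Threads:' field of every process whose real UID is `uid`.
    On Linux this thread count is what RLIMIT_NPROC is checked against."""
    total = 0
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip non-PID entries such as /proc/self or /proc/meminfo
        try:
            with open(os.path.join(proc, entry, "status")) as f:
                fields = dict(line.split(":", 1) for line in f if ":" in line)
        except OSError:
            continue  # the process exited (or is unreadable) while scanning
        uid_field, threads_field = fields.get("Uid"), fields.get("Threads")
        if uid_field is None or threads_field is None:
            continue
        if int(uid_field.split()[0]) == uid:  # first Uid column = real UID
            total += int(threads_field)
    return total

def nproc_headroom():
    """Soft nproc limit minus threads already used by this user
    (None when the soft limit is unlimited)."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NPROC)
    if soft == resource.RLIM_INFINITY:
        return None
    return soft - threads_for_uid(os.getuid())
```

If nproc_headroom() gets close to zero while builds are running, the next JVM that tries to start a thread can fail with "java.lang.OutOfMemoryError: unable to create new native thread". That is one plausible way for a remoting channel to terminate unexpectedly.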
                
> FATAL: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: JENKINS-6817
>                 URL: https://issues.jenkins-ci.org/browse/JENKINS-6817
>             Project: Jenkins
>          Issue Type: Bug
>          Components: clone-workspace, core
>    Affects Versions: current
>            Reporter: nirmal_patel
>            Assignee: abayer
>            Priority: Blocker
>
> I am seeing the same on my Windows XP master-slave setup. I am running the latest Hudson version, 1.363.
> I am using the clone-workspace-scm plugin to copy my workspace from master to slave (150).
> Started by user anonymous
> Building remotely on 150
> FATAL: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
> hudson.remoting.RequestAbortedException: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
> at hudson.remoting.Request.call(Request.java:137)
> at hudson.remoting.Channel.call(Channel.java:555)
> at hudson.FilePath.act(FilePath.java:742)
> at hudson.FilePath.act(FilePath.java:735)
> at hudson.FilePath.unzip(FilePath.java:415)
> at hudson.FileSystemProvisioner$Default$WorkspaceSnapshotImpl.restoreTo(FileSystemProvisioner.java:227)
> at hudson.plugins.cloneworkspace.CloneWorkspaceSCM$Snapshot.restoreTo(CloneWorkspaceSCM.java:344)
> at hudson.plugins.cloneworkspace.CloneWorkspaceSCM.checkout(CloneWorkspaceSCM.java:126)
> at hudson.model.AbstractProject.checkout(AbstractProject.java:1044)
> at hudson.model.AbstractBuild$AbstractRunner.checkout(AbstractBuild.java:479)
> at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:411)
> at hudson.model.Run.run(Run.java:1253)
> at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
> at hudson.model.ResourceController.execute(ResourceController.java:88)
> at hudson.model.Executor.run(Executor.java:127)
> Caused by: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
> at hudson.remoting.Request.abort(Request.java:257)
> at hudson.remoting.Channel.terminate(Channel.java:602)
> at hudson.remoting.Channel$ReaderThread.run(Channel.java:893)
> Caused by: java.io.IOException: Unexpected termination of the channel
> at hudson.remoting.Channel$ReaderThread.run(Channel.java:875)
> Caused by: java.io.EOFException
> at java.io.ObjectInputStream$BlockDataInputStream.peekByte(Unknown Source)
> at java.io.ObjectInputStream.readObject0(Unknown Source)
> at java.io.ObjectInputStream.readObject(Unknown Source)
> at hudson.remoting.Channel$ReaderThread.run(Channel.java:869)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.jenkins-ci.org/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
