[ 
https://issues.jenkins-ci.org/browse/JENKINS-11622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=159173#comment-159173
 ] 

Sam Talebbeik commented on JENKINS-11622:
-----------------------------------------

One more piece of information: this issue happened again the other day while 
the target build slave server was cloning a repository, which means that the 
build slave server was very busy with I/O and CPU activity. 

Could it be that the pinger's timeout is too short and too aggressive for 
situations like this? If the network is very busy, or the target build slave 
server is too busy with I/O and CPU activity, then the ping may not complete 
the round trip from the main Jenkins server to the build slave server and 
back to the main Jenkins server before the timeout expires.
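
To illustrate the kind of failure I suspect, here is a minimal sketch of a 
ping with a timeout. It is not the actual ChannelPinger code; the class name 
PingTimeoutSketch, the ping method, and the millisecond values are all made 
up for illustration. The "slave" is simulated by a task that sleeps before 
replying, standing in for a machine that is busy with I/O and CPU work.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch, not Jenkins code: shows how a fixed ping timeout
// reports a slow but healthy peer as dead.
public class PingTimeoutSketch {

    /**
     * Sends one simulated ping and reports whether the reply arrived in time.
     *
     * @param replyDelayMillis how long the simulated slave takes to answer
     * @param timeoutMillis    how long the pinger is willing to wait
     * @return true if the reply arrived within the timeout, false otherwise
     */
    static boolean ping(long replyDelayMillis, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            // The simulated slave: busy for replyDelayMillis, then replies.
            Future<Boolean> reply = executor.submit(() -> {
                Thread.sleep(replyDelayMillis);
                return true;
            });
            // The pinger: waits at most timeoutMillis for the reply.
            return reply.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // The slave was alive but too slow; the pinger treats it as dead.
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (ExecutionException e) {
            return false;
        } finally {
            // Stop the simulated slave so the JVM can exit promptly.
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("idle slave replied in time: " + ping(50, 1000));
        System.out.println("busy slave replied in time: " + ping(1000, 50));
    }
}
```

With these made-up numbers the idle case should succeed and the busy case 
should be reported as a failed ping, even though the simulated slave would 
have answered eventually. If the real pinger behaves similarly, a longer 
timeout might avoid the disconnects under heavy load.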


                
> ChannelPinger fails while Free Swap Space checker is running on Windows Slaves
> ------------------------------------------------------------------------------
>
>                 Key: JENKINS-11622
>                 URL: https://issues.jenkins-ci.org/browse/JENKINS-11622
>             Project: Jenkins
>          Issue Type: Bug
>          Components: core
>         Environment: Windows Server 2003, 1 vCPU, 4GB RAM (32bit) 8GB RAM 
> (64bit), 50GB virtual disk, VMware Hypervisor.
>            Reporter: Ryan Hass
>              Labels: channelpinger
>
> Windows slaves randomly disconnect while idle. This appears to be caused by 
> free space threads which are stuck or still running, resulting in the SSH 
> connection being terminated and connections being reestablished.
> I am not exactly sure what the expected behavior is for the low-level 
> handling and communication. However, at a high level, the expected behavior 
> is for the slave connections to persist and for the channel pinger not to 
> cause a reset.
> {noformat:title=jenkins.log}
> Nov 4, 2011 8:34:48 AM 
> hudson.node_monitors.AbstractNodeMonitorDescriptor$Record <init>
> WARNING: Previous Free Swap Space monitoring activity still in progress. 
> Interrupting
> Nov 4, 2011 8:40:18 AM hudson.slaves.ChannelPinger$1 onDead
> INFO: Ping failed. Terminating the channel.
> Exception in thread "Monitoring w64-09 for Free Swap Space" 
> hudson.remoting.RequestAbortedException: 
> hudson.remoting.RequestAbortedException: 
> hudson.remoting.Channel$OrderlyShutdown
>         at hudson.remoting.Request.call(Request.java:149)
>         at hudson.remoting.Channel.call(Channel.java:660)
>         at 
> hudson.node_monitors.SwapSpaceMonitor$1.monitor(SwapSpaceMonitor.java:83)
>         at 
> hudson.node_monitors.SwapSpaceMonitor$1.monitor(SwapSpaceMonitor.java:81)
>         at 
> hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:202)
> Caused by: hudson.remoting.RequestAbortedException: 
> hudson.remoting.Channel$OrderlyShutdown
>         at hudson.remoting.Request.abort(Request.java:269)
>         at hudson.remoting.Channel.terminate(Channel.java:711)
>         at hudson.remoting.Channel$CloseCommand.execute(Channel.java:794)
>         at hudson.remoting.Channel$ReaderThread.run(Channel.java:1024)
> Caused by: hudson.remoting.Channel$OrderlyShutdown
>         ... 2 more
> Caused by: Command close created at
>         at hudson.remoting.Command.<init>(Command.java:62)
>         at hudson.remoting.Command.<init>(Command.java:47)
>         at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:790)
>         at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:790)
>         at hudson.remoting.Channel.close(Channel.java:835)
>         at hudson.remoting.Channel$CloseCommand.execute(Channel.java:793)
>         ... 1 more
> Exception in thread "Monitoring w64-09 for Free Temp Space" 
> hudson.remoting.RequestAbortedException: 
> hudson.remoting.RequestAbortedException: 
> hudson.remoting.Channel$OrderlyShutdown
>         at hudson.remoting.Request.call(Request.java:149)
>         at hudson.remoting.Channel.call(Channel.java:660)
>         at hudson.FilePath.act(FilePath.java:745)
>         at hudson.FilePath.act(FilePath.java:738)
>         at 
> hudson.node_monitors.TemporarySpaceMonitor$1.getFreeSpace(TemporarySpaceMonitor.java:73)
>         at 
> hudson.node_monitors.DiskSpaceMonitorDescriptor.monitor(DiskSpaceMonitorDescriptor.java:135)
>         at 
> hudson.node_monitors.DiskSpaceMonitorDescriptor.monitor(DiskSpaceMonitorDescriptor.java:49)
>         at 
> hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:202)
> Caused by: hudson.remoting.RequestAbortedException: 
> hudson.remoting.Channel$OrderlyShutdown
>         at hudson.remoting.Request.abort(Request.java:269)
>         at hudson.remoting.Channel.terminate(Channel.java:711)
>         at hudson.remoting.Channel$CloseCommand.execute(Channel.java:794)
>         at hudson.remoting.Channel$ReaderThread.run(Channel.java:1024)
> Caused by: hudson.remoting.Channel$OrderlyShutdown
>         ... 2 more
> Caused by: Command close created at
>         at hudson.remoting.Command.<init>(Command.java:62)
>         at hudson.remoting.Command.<init>(Command.java:47)
>         at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:790)
>         at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:790)
>         at hudson.remoting.Channel.close(Channel.java:835)
>         at hudson.remoting.Channel$CloseCommand.execute(Channel.java:793)
>         ... 1 more
> Nov 4, 2011 8:40:57 AM hudson.slaves.SlaveComputer tryReconnect
> INFO: Attempting to reconnect w64-09
> Nov 4, 2011 9:34:48 AM 
> hudson.node_monitors.AbstractNodeMonitorDescriptor$Record <init>
> WARNING: Previous Free Swap Space monitoring activity still in progress. 
> Interrupting
> Nov 4, 2011 9:34:48 AM 
> hudson.node_monitors.AbstractNodeMonitorDescriptor$Record <init>
> WARNING: Previous Free Temp Space monitoring activity still in progress. 
> Interrupting
> Nov 4, 2011 9:40:18 AM hudson.slaves.ChannelPinger$1 onDead
> INFO: Ping failed. Terminating the channel.
> Exception in thread "Monitoring w64-09 for Free Swap Space" 
> hudson.remoting.RequestAbortedException: 
> hudson.remoting.RequestAbortedException: 
> hudson.remoting.Channel$OrderlyShutdown
>         at hudson.remoting.Request.call(Request.java:149)
>         at hudson.remoting.Channel.call(Channel.java:660)
>         at 
> hudson.node_monitors.SwapSpaceMonitor$1.monitor(SwapSpaceMonitor.java:83)
>         at 
> hudson.node_monitors.SwapSpaceMonitor$1.monitor(SwapSpaceMonitor.java:81)
>         at 
> hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:202)
> Caused by: hudson.remoting.RequestAbortedException: 
> hudson.remoting.Channel$OrderlyShutdown
>         at hudson.remoting.Request.abort(Request.java:269)
>         at hudson.remoting.Channel.terminate(Channel.java:711)
>         at hudson.remoting.Channel$CloseCommand.execute(Channel.java:794)
>         at hudson.remoting.Channel$ReaderThread.run(Channel.java:1024)
> Caused by: hudson.remoting.Channel$OrderlyShutdown
>         ... 2 more
> Caused by: Command close created at
>         at hudson.remoting.Command.<init>(Command.java:62)
>         at hudson.remoting.Command.<init>(Command.java:47)
>         at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:790)
>         at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:790)
>         at hudson.remoting.Channel.close(Channel.java:835)
>         at hudson.remoting.Channel$CloseCommand.execute(Channel.java:793)
>         ... 1 more
> Nov 4, 2011 9:40:57 AM hudson.slaves.SlaveComputer tryReconnect
> INFO: Attempting to reconnect w64-09
> {noformat}
> Please note, this issue can be mitigated by disabling the Free Swap Space 
> check for all slaves. However, this is a less than optimal solution.


        
