[JIRA] (JENKINS-11622) ChannelPinger fails while Free Swap Space checker is running on Windows Slaves

sam7...@gmail.com (JIRA) Thu, 16 Feb 2012 12:05:40 -0800

    [ 
https://issues.jenkins-ci.org/browse/JENKINS-11622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=159173#comment-159173
 ]


Sam Talebbeik edited comment on JENKINS-11622 at 2/16/12 8:05 PM:
------------------------------------------------------------------

One more piece of information. This issue happened again the other day while 
the target build slave server was cloning a repository, which means that the 
build slave server was very busy with I/O and CPU activity. 

                     Jenkins Log in jenkins.log file 
================================================================================================
Feb 15, 2012 5:35:12 PM hudson.slaves.ChannelPinger$1 onDead
INFO: Ping failed. Terminating the channel.
Exception in thread "Monitoring slave-07 for Free Swap Space" 
hudson.remoting.RequestAbortedException: 
hudson.remoting.RequestAbortedException: hudson.remoting.Channel$OrderlyShutdown
        at hudson.remoting.Request.call(Request.java:149)
        at hudson.remoting.Channel.call(Channel.java:660)
        at hudson.node_monitors.SwapSpaceMonitor$1.monitor(SwapSpaceMonitor.java:83)
        at hudson.node_monitors.SwapSpaceMonitor$1.monitor(SwapSpaceMonitor.java:81)
        at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:202)
Caused by: hudson.remoting.RequestAbortedException: hudson.remoting.Channel$OrderlyShutdown
        at hudson.remoting.Request.abort(Request.java:269)
        at hudson.remoting.Channel.terminate(Channel.java:711)
        at hudson.remoting.Channel$CloseCommand.execute(Channel.java:794)
        at hudson.remoting.Channel$ReaderThread.run(Channel.java:1024)
Caused by: hudson.remoting.Channel$OrderlyShutdown
        ... 2 more
Caused by: Command close created at
        at hudson.remoting.Command.<init>(Command.java:62)
        at hudson.remoting.Command.<init>(Command.java:47)
        at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:790)
        at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:790)
        at hudson.remoting.Channel.close(Channel.java:835)
        at hudson.remoting.Channel$CloseCommand.execute(Channel.java:793)
        ... 1 more
Feb 15, 2012 5:35:12 PM hudson.model.AbstractBuild$AbstractRunner performAllBuildSteps
WARNING: Publisher hudson.tasks.junit.JUnitResultArchiver aborted due to exception
java.lang.NullPointerException
        at hudson.tasks.junit.JUnitParser.parse(JUnitParser.java:83)
        at hudson.tasks.junit.JUnitResultArchiver.parse(JUnitResultArchiver.java:123)
        at hudson.tasks.junit.JUnitResultArchiver.perform(JUnitResultArchiver.java:135)
        at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
        at hudson.model.AbstractBuild$AbstractRunner.perform(AbstractBuild.java:649)
        at hudson.model.AbstractBuild$AbstractRunner.performAllBuildSteps(AbstractBuild.java:625)
        at hudson.model.AbstractBuild$AbstractRunner.performAllBuildSteps(AbstractBuild.java:603)
        at hudson.model.Build$RunnerImpl.post2(Build.java:161)
        at hudson.model.AbstractBuild$AbstractRunner.post(AbstractBuild.java:572)
        at hudson.model.Run.run(Run.java:1386)
        at hudson.matrix.MatrixRun.run(MatrixRun.java:137)
        at hudson.model.ResourceController.execute(ResourceController.java:88)
        at hudson.model.Executor.run(Executor.java:145)
Feb 15, 2012 5:35:53 PM hudson.slaves.SlaveComputer tryReconnect
INFO: Attempting to reconnect slave-07


              Build Job failure messages
==================================================================================
FATAL: hudson.remoting.RequestAbortedException: hudson.remoting.Channel$OrderlyShutdown
hudson.remoting.RequestAbortedException: hudson.remoting.RequestAbortedException: hudson.remoting.Channel$OrderlyShutdown
        at hudson.remoting.Request.call(Request.java:149)
        at hudson.remoting.Channel.call(Channel.java:660)
        at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:158)
        at $Proxy17.join(Unknown Source)
        at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:850)
        at hudson.Launcher$ProcStarter.join(Launcher.java:336)
        at hudson.plugins.mercurial.MercurialSCM.clone(MercurialSCM.java:577)
        at hudson.plugins.mercurial.MercurialSCM.checkout(MercurialSCM.java:422)
        at hudson.model.AbstractProject.checkout(AbstractProject.java:1174)
        at hudson.model.AbstractBuild$AbstractRunner.checkout(AbstractBuild.java:523)
        at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:418)
        at hudson.model.Run.run(Run.java:1362)
        at hudson.matrix.MatrixRun.run(MatrixRun.java:137)
        at hudson.model.ResourceController.execute(ResourceController.java:88)
        at hudson.model.Executor.run(Executor.java:145)
Caused by: hudson.remoting.RequestAbortedException: hudson.remoting.Channel$OrderlyShutdown
        at hudson.remoting.Request.abort(Request.java:269)
        at hudson.remoting.Channel.terminate(Channel.java:711)
        at hudson.remoting.Channel$CloseCommand.execute(Channel.java:794)
        at hudson.remoting.Channel$ReaderThread.run(Channel.java:1024)
Caused by: hudson.remoting.Channel$OrderlyShutdown
        ... 2 more
Caused by: Command close created at
        at hudson.remoting.Command.<init>(Command.java:62)
        at hudson.remoting.Command.<init>(Command.java:47)
        at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:790)
        at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:790)
        at hudson.remoting.Channel.close(Channel.java:835)
        at hudson.remoting.Channel$CloseCommand.execute(Channel.java:793)
        ... 1 more

                
      was (Author: samt):
    One more piece of information. This issue happened again the other day 
while the target build slave server was cloning a repository, which means that 
the build slave server was very busy with I/O and CPU activity. 

Could it be that the pinger's timeout is too short and too aggressive for 
situations like this? If the network is very busy and the target build slave 
server is busy with I/O and CPU activity, then the ping may not complete the 
round trip from the main Jenkins server to the build slave server and back 
before the timeout expires.
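
To illustrate the suspicion above, here is a minimal Java sketch of a ping 
that is declared dead when the reply misses a deadline. It is not the actual 
ChannelPinger code: the class name PingTimeoutSketch, the method pingSucceeds 
and the delay values are all hypothetical, and the "slave" is only a local 
thread that sleeps to stand in for a machine busy with I/O and CPU work.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration only; this is not Jenkins or remoting code.
final class PingTimeoutSketch {

    /**
     * Sends one simulated ping and reports whether the reply arrived
     * before the timeout. replyDelayMillis models how long the busy
     * remote side takes to answer.
     */
    static boolean pingSucceeds(long replyDelayMillis, long timeoutMillis)
            throws InterruptedException {
        ExecutorService remote = Executors.newSingleThreadExecutor();
        try {
            Future<Boolean> reply = remote.submit(() -> {
                // Stand-in for a slave that is slow to respond under load.
                Thread.sleep(replyDelayMillis);
                return Boolean.TRUE;
            });
            try {
                return reply.get(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException | ExecutionException e) {
                // A late or failed reply is treated as a dead channel.
                return false;
            }
        } finally {
            // Interrupt the simulated remote side so the JVM can exit.
            remote.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("idle slave: " + pingSucceeds(10, 2000));
        System.out.println("busy slave: " + pingSucceeds(2000, 50));
    }
}
```

In this sketch the only difference between the two calls is how long the 
reply takes relative to the timeout, which is the situation I suspect on a 
slave that is busy cloning a repository.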


                  
> ChannelPinger fails while Free Swap Space checker is running on Windows Slaves
> ------------------------------------------------------------------------------
>
>                 Key: JENKINS-11622
>                 URL: https://issues.jenkins-ci.org/browse/JENKINS-11622
>             Project: Jenkins
>          Issue Type: Bug
>          Components: core
>         Environment: Windows Server 2003, 1 vCPU, 4GB RAM (32bit) 8GB RAM 
> (64bit), 50GB virtual disk, VMware Hypervisor.
>            Reporter: Ryan Hass
>              Labels: channelpinger
>
> Windows slaves randomly disconnect while idle. This appears to be caused by 
> free space monitoring threads that are stuck or still running, resulting in 
> the SSH connection being terminated and then reestablished.
> I am not exactly sure what the expected behavior is for the low-level 
> handling and communication. However, at a high level, the expected behavior 
> is for the slave connections to persist and for the channel pinger not to 
> cause a reset.
> {noformat:title=jenkins.log}
> Nov 4, 2011 8:34:48 AM 
> hudson.node_monitors.AbstractNodeMonitorDescriptor$Record <init>
> WARNING: Previous Free Swap Space monitoring activity still in progress. 
> Interrupting
> Nov 4, 2011 8:40:18 AM hudson.slaves.ChannelPinger$1 onDead
> INFO: Ping failed. Terminating the channel.
> Exception in thread "Monitoring w64-09 for Free Swap Space" 
> hudson.remoting.RequestAbortedException: 
> hudson.remoting.RequestAbortedException: hudson.remoting.Channel$OrderlyShutdown
>         at hudson.remoting.Request.call(Request.java:149)
>         at hudson.remoting.Channel.call(Channel.java:660)
>         at 
> hudson.node_monitors.SwapSpaceMonitor$1.monitor(SwapSpaceMonitor.java:83)
>         at 
> hudson.node_monitors.SwapSpaceMonitor$1.monitor(SwapSpaceMonitor.java:81)
>         at 
> hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:202)
> Caused by: hudson.remoting.RequestAbortedException: 
> hudson.remoting.Channel$OrderlyShutdown
>         at hudson.remoting.Request.abort(Request.java:269)
>         at hudson.remoting.Channel.terminate(Channel.java:711)
>         at hudson.remoting.Channel$CloseCommand.execute(Channel.java:794)
>         at hudson.remoting.Channel$ReaderThread.run(Channel.java:1024)
> Caused by: hudson.remoting.Channel$OrderlyShutdown
>         ... 2 more
> Caused by: Command close created at
>         at hudson.remoting.Command.<init>(Command.java:62)
>         at hudson.remoting.Command.<init>(Command.java:47)
>         at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:790)
>         at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:790)
>         at hudson.remoting.Channel.close(Channel.java:835)
>         at hudson.remoting.Channel$CloseCommand.execute(Channel.java:793)
>         ... 1 more
> Exception in thread "Monitoring w64-09 for Free Temp Space" 
> hudson.remoting.RequestAbortedException: 
> hudson.remoting.RequestAbortedException: hudson.remoting.Channel$OrderlyShutdown
>         at hudson.remoting.Request.call(Request.java:149)
>         at hudson.remoting.Channel.call(Channel.java:660)
>         at hudson.FilePath.act(FilePath.java:745)
>         at hudson.FilePath.act(FilePath.java:738)
>         at 
> hudson.node_monitors.TemporarySpaceMonitor$1.getFreeSpace(TemporarySpaceMonitor.java:73)
>         at 
> hudson.node_monitors.DiskSpaceMonitorDescriptor.monitor(DiskSpaceMonitorDescriptor.java:135)
>         at 
> hudson.node_monitors.DiskSpaceMonitorDescriptor.monitor(DiskSpaceMonitorDescriptor.java:49)
>         at 
> hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:202)
> Caused by: hudson.remoting.RequestAbortedException: 
> hudson.remoting.Channel$OrderlyShutdown
>         at hudson.remoting.Request.abort(Request.java:269)
>         at hudson.remoting.Channel.terminate(Channel.java:711)
>         at hudson.remoting.Channel$CloseCommand.execute(Channel.java:794)
>         at hudson.remoting.Channel$ReaderThread.run(Channel.java:1024)
> Caused by: hudson.remoting.Channel$OrderlyShutdown
>         ... 2 more
> Caused by: Command close created at
>         at hudson.remoting.Command.<init>(Command.java:62)
>         at hudson.remoting.Command.<init>(Command.java:47)
>         at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:790)
>         at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:790)
>         at hudson.remoting.Channel.close(Channel.java:835)
>         at hudson.remoting.Channel$CloseCommand.execute(Channel.java:793)
>         ... 1 more
> Nov 4, 2011 8:40:57 AM hudson.slaves.SlaveComputer tryReconnect
> INFO: Attempting to reconnect w64-09
> Nov 4, 2011 9:34:48 AM 
> hudson.node_monitors.AbstractNodeMonitorDescriptor$Record <init>
> WARNING: Previous Free Swap Space monitoring activity still in progress. 
> Interrupting
> Nov 4, 2011 9:34:48 AM 
> hudson.node_monitors.AbstractNodeMonitorDescriptor$Record <init>
> WARNING: Previous Free Temp Space monitoring activity still in progress. 
> Interrupting
> Nov 4, 2011 9:40:18 AM hudson.slaves.ChannelPinger$1 onDead
> INFO: Ping failed. Terminating the channel.
> Exception in thread "Monitoring w64-09 for Free Swap Space" 
> hudson.remoting.RequestAbortedException: 
> hudson.remoting.RequestAbortedException: 
> hudson.remoting.Channel$OrderlyShutdown
>         at hudson.remoting.Request.call(Request.java:149)
>         at hudson.remoting.Channel.call(Channel.java:660)
>         at 
> hudson.node_monitors.SwapSpaceMonitor$1.monitor(SwapSpaceMonitor.java:83)
>         at 
> hudson.node_monitors.SwapSpaceMonitor$1.monitor(SwapSpaceMonitor.java:81)
>         at 
> hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:202)
> Caused by: hudson.remoting.RequestAbortedException: 
> hudson.remoting.Channel$OrderlyShutdown
>         at hudson.remoting.Request.abort(Request.java:269)
>         at hudson.remoting.Channel.terminate(Channel.java:711)
>         at hudson.remoting.Channel$CloseCommand.execute(Channel.java:794)
>         at hudson.remoting.Channel$ReaderThread.run(Channel.java:1024)
> Caused by: hudson.remoting.Channel$OrderlyShutdown
>         ... 2 more
> Caused by: Command close created at
>         at hudson.remoting.Command.<init>(Command.java:62)
>         at hudson.remoting.Command.<init>(Command.java:47)
>         at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:790)
>         at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:790)
>         at hudson.remoting.Channel.close(Channel.java:835)
>         at hudson.remoting.Channel$CloseCommand.execute(Channel.java:793)
>         ... 1 more
> Nov 4, 2011 9:40:57 AM hudson.slaves.SlaveComputer tryReconnect
> INFO: Attempting to reconnect w64-09
> {noformat}
> Please note, this issue can be mitigated by disabling the Free Swap Space 
> check for all slaves. However, this is a less than optimal solution.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.jenkins-ci.org/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
