[ https://issues.jenkins-ci.org/browse/JENKINS-11622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=159173#comment-159173 ]
Sam Talebbeik edited comment on JENKINS-11622 at 2/16/12 8:05 PM:
------------------------------------------------------------------

One more piece of information. This issue happened again the other day while the target build slave server was cloning a repository. This means that the build slave server was very busy with I/O and CPU activity.

Jenkins Log in jenkins.log file
================================================================================================
Feb 15, 2012 5:35:12 PM hudson.slaves.ChannelPinger$1 onDead
INFO: Ping failed. Terminating the channel.
Exception in thread "Monitoring slave-07 for Free Swap Space" hudson.remoting.RequestAbortedException: hudson.remoting.RequestAbortedException: hudson.remoting.Channel$OrderlyShutdown
    at hudson.remoting.Request.call(Request.java:149)
    at hudson.remoting.Channel.call(Channel.java:660)
    at hudson.node_monitors.SwapSpaceMonitor$1.monitor(SwapSpaceMonitor.java:83)
    at hudson.node_monitors.SwapSpaceMonitor$1.monitor(SwapSpaceMonitor.java:81)
    at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:202)
Caused by: hudson.remoting.RequestAbortedException: hudson.remoting.Channel$OrderlyShutdown
    at hudson.remoting.Request.abort(Request.java:269)
    at hudson.remoting.Channel.terminate(Channel.java:711)
    at hudson.remoting.Channel$CloseCommand.execute(Channel.java:794)
    at hudson.remoting.Channel$ReaderThread.run(Channel.java:1024)
Caused by: hudson.remoting.Channel$OrderlyShutdown
    ... 2 more
Caused by: Command close created at
    at hudson.remoting.Command.<init>(Command.java:62)
    at hudson.remoting.Command.<init>(Command.java:47)
    at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:790)
    at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:790)
    at hudson.remoting.Channel.close(Channel.java:835)
    at hudson.remoting.Channel$CloseCommand.execute(Channel.java:793)
    ... 1 more
Feb 15, 2012 5:35:12 PM hudson.model.AbstractBuild$AbstractRunner performAllBuildSteps
WARNING: Publisher hudson.tasks.junit.JUnitResultArchiver aborted due to exception
java.lang.NullPointerException
    at hudson.tasks.junit.JUnitParser.parse(JUnitParser.java:83)
    at hudson.tasks.junit.JUnitResultArchiver.parse(JUnitResultArchiver.java:123)
    at hudson.tasks.junit.JUnitResultArchiver.perform(JUnitResultArchiver.java:135)
    at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
    at hudson.model.AbstractBuild$AbstractRunner.perform(AbstractBuild.java:649)
    at hudson.model.AbstractBuild$AbstractRunner.performAllBuildSteps(AbstractBuild.java:625)
    at hudson.model.AbstractBuild$AbstractRunner.performAllBuildSteps(AbstractBuild.java:603)
    at hudson.model.Build$RunnerImpl.post2(Build.java:161)
    at hudson.model.AbstractBuild$AbstractRunner.post(AbstractBuild.java:572)
    at hudson.model.Run.run(Run.java:1386)
    at hudson.matrix.MatrixRun.run(MatrixRun.java:137)
    at hudson.model.ResourceController.execute(ResourceController.java:88)
    at hudson.model.Executor.run(Executor.java:145)
Feb 15, 2012 5:35:53 PM hudson.slaves.SlaveComputer tryReconnect
INFO: Attempting to reconnect slave-07

Build Job failure messages
==================================================================================
FATAL: hudson.remoting.RequestAbortedException: hudson.remoting.Channel$OrderlyShutdown
hudson.remoting.RequestAbortedException: hudson.remoting.RequestAbortedException: hudson.remoting.Channel$OrderlyShutdown
    at hudson.remoting.Request.call(Request.java:149)
    at hudson.remoting.Channel.call(Channel.java:660)
    at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:158)
    at $Proxy17.join(Unknown Source)
    at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:850)
    at hudson.Launcher$ProcStarter.join(Launcher.java:336)
    at hudson.plugins.mercurial.MercurialSCM.clone(MercurialSCM.java:577)
    at hudson.plugins.mercurial.MercurialSCM.checkout(MercurialSCM.java:422)
    at hudson.model.AbstractProject.checkout(AbstractProject.java:1174)
    at hudson.model.AbstractBuild$AbstractRunner.checkout(AbstractBuild.java:523)
    at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:418)
    at hudson.model.Run.run(Run.java:1362)
    at hudson.matrix.MatrixRun.run(MatrixRun.java:137)
    at hudson.model.ResourceController.execute(ResourceController.java:88)
    at hudson.model.Executor.run(Executor.java:145)
Caused by: hudson.remoting.RequestAbortedException: hudson.remoting.Channel$OrderlyShutdown
    at hudson.remoting.Request.abort(Request.java:269)
    at hudson.remoting.Channel.terminate(Channel.java:711)
    at hudson.remoting.Channel$CloseCommand.execute(Channel.java:794)
    at hudson.remoting.Channel$ReaderThread.run(Channel.java:1024)
Caused by: hudson.remoting.Channel$OrderlyShutdown
    ... 2 more
Caused by: Command close created at
    at hudson.remoting.Command.<init>(Command.java:62)
    at hudson.remoting.Command.<init>(Command.java:47)
    at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:790)
    at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:790)
    at hudson.remoting.Channel.close(Channel.java:835)
    at hudson.remoting.Channel$CloseCommand.execute(Channel.java:793)
    ... 1 more

was (Author: samt):

One more piece of information. This issue happened again the other day while the target build slave server was cloning a repository. This means that the build slave server was very busy with I/O and CPU activity.

Could it be that the pinger's timeout is too short and too aggressive for situations like this? If the network is very busy and the target build slave server is too busy with I/O and CPU activity, then the response will not travel fast enough from the main Jenkins server to the build slave server and back from the build slave server to the main Jenkins server.
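
To make the timeout question above concrete, here is a minimal Python sketch of a ping with a deadline. It is not Jenkins or remoting code: `ping` and `busy_slave` are hypothetical names, and the delays and timeouts are made-up values chosen only for illustration. It shows how a peer that is alive but slow to answer is indistinguishable from a dead one when the deadline is shorter than its response time.

```python
import threading
import time


def ping(respond, timeout_s):
    """Return True if `respond` finishes within `timeout_s` seconds."""
    done = threading.Event()

    def worker():
        respond()   # stands in for the round trip to the slave and back
        done.set()

    threading.Thread(target=worker, daemon=True).start()
    # Event.wait returns False when the deadline passes before set() is called.
    return done.wait(timeout_s)


def busy_slave(delay_s):
    """Build a hypothetical responder that takes `delay_s` seconds to answer."""
    return lambda: time.sleep(delay_s)


if __name__ == "__main__":
    # An idle slave answers well inside the deadline.
    print(ping(busy_slave(0.01), timeout_s=0.5))
    # A slave saturated by a clone answers late, so the ping is reported as failed.
    print(ping(busy_slave(0.5), timeout_s=0.05))
    # The same slow slave passes once the deadline is longer than its response time.
    print(ping(busy_slave(0.05), timeout_s=1.0))
```

If the hypothesis holds, the disconnects would correlate with load on the slave rather than with real connection loss, which matches the clone scenario described in the comment.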
> ChannelPinger fails while Free Swap Space checker is running on Windows Slaves
> ------------------------------------------------------------------------------
>
> Key: JENKINS-11622
> URL: https://issues.jenkins-ci.org/browse/JENKINS-11622
> Project: Jenkins
> Issue Type: Bug
> Components: core
> Environment: Windows Server 2003, 1 vCPU, 4GB RAM (32bit) 8GB RAM (64bit), 50GB virtual disk, VMware Hypervisor.
> Reporter: Ryan Hass
> Labels: channelpinger
>
> Windows slaves randomly disconnect while idle. This appears to be caused by
> free space threads which are stuck or still running, resulting in the SSH
> connection being terminated and connections being reestablished.
> I am not exactly sure what the expected behavior is for the low-level
> handling and communication. However, at a high level, the expected behavior
> is for the slave connections to persist and for the channel pinger not to
> cause a reset.
> {noformat:title=jenkins.log}
> Nov 4, 2011 8:34:48 AM hudson.node_monitors.AbstractNodeMonitorDescriptor$Record <init>
> WARNING: Previous Free Swap Space monitoring activity still in progress. Interrupting
> Nov 4, 2011 8:40:18 AM hudson.slaves.ChannelPinger$1 onDead
> INFO: Ping failed. Terminating the channel.
> Exception in thread "Monitoring w64-09 for Free Swap Space" hudson.remoting.RequestAbortedException: hudson.remoting.RequestAbortedException: hudson.remoting.Channel$OrderlyShutdown
>     at hudson.remoting.Request.call(Request.java:149)
>     at hudson.remoting.Channel.call(Channel.java:660)
>     at hudson.node_monitors.SwapSpaceMonitor$1.monitor(SwapSpaceMonitor.java:83)
>     at hudson.node_monitors.SwapSpaceMonitor$1.monitor(SwapSpaceMonitor.java:81)
>     at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:202)
> Caused by: hudson.remoting.RequestAbortedException: hudson.remoting.Channel$OrderlyShutdown
>     at hudson.remoting.Request.abort(Request.java:269)
>     at hudson.remoting.Channel.terminate(Channel.java:711)
>     at hudson.remoting.Channel$CloseCommand.execute(Channel.java:794)
>     at hudson.remoting.Channel$ReaderThread.run(Channel.java:1024)
> Caused by: hudson.remoting.Channel$OrderlyShutdown
>     ... 2 more
> Caused by: Command close created at
>     at hudson.remoting.Command.<init>(Command.java:62)
>     at hudson.remoting.Command.<init>(Command.java:47)
>     at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:790)
>     at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:790)
>     at hudson.remoting.Channel.close(Channel.java:835)
>     at hudson.remoting.Channel$CloseCommand.execute(Channel.java:793)
>     ... 1 more
> Exception in thread "Monitoring w64-09 for Free Temp Space" hudson.remoting.RequestAbortedException: hudson.remoting.RequestAbortedException: hudson.remoting.Channel$OrderlyShutdown
>     at hudson.remoting.Request.call(Request.java:149)
>     at hudson.remoting.Channel.call(Channel.java:660)
>     at hudson.FilePath.act(FilePath.java:745)
>     at hudson.FilePath.act(FilePath.java:738)
>     at hudson.node_monitors.TemporarySpaceMonitor$1.getFreeSpace(TemporarySpaceMonitor.java:73)
>     at hudson.node_monitors.DiskSpaceMonitorDescriptor.monitor(DiskSpaceMonitorDescriptor.java:135)
>     at hudson.node_monitors.DiskSpaceMonitorDescriptor.monitor(DiskSpaceMonitorDescriptor.java:49)
>     at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:202)
> Caused by: hudson.remoting.RequestAbortedException: hudson.remoting.Channel$OrderlyShutdown
>     at hudson.remoting.Request.abort(Request.java:269)
>     at hudson.remoting.Channel.terminate(Channel.java:711)
>     at hudson.remoting.Channel$CloseCommand.execute(Channel.java:794)
>     at hudson.remoting.Channel$ReaderThread.run(Channel.java:1024)
> Caused by: hudson.remoting.Channel$OrderlyShutdown
>     ... 2 more
> Caused by: Command close created at
>     at hudson.remoting.Command.<init>(Command.java:62)
>     at hudson.remoting.Command.<init>(Command.java:47)
>     at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:790)
>     at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:790)
>     at hudson.remoting.Channel.close(Channel.java:835)
>     at hudson.remoting.Channel$CloseCommand.execute(Channel.java:793)
>     ... 1 more
> Nov 4, 2011 8:40:57 AM hudson.slaves.SlaveComputer tryReconnect
> INFO: Attempting to reconnect w64-09
> Nov 4, 2011 9:34:48 AM hudson.node_monitors.AbstractNodeMonitorDescriptor$Record <init>
> WARNING: Previous Free Swap Space monitoring activity still in progress. Interrupting
> Nov 4, 2011 9:34:48 AM hudson.node_monitors.AbstractNodeMonitorDescriptor$Record <init>
> WARNING: Previous Free Temp Space monitoring activity still in progress. Interrupting
> Nov 4, 2011 9:40:18 AM hudson.slaves.ChannelPinger$1 onDead
> INFO: Ping failed. Terminating the channel.
> Exception in thread "Monitoring w64-09 for Free Swap Space" hudson.remoting.RequestAbortedException: hudson.remoting.RequestAbortedException: hudson.remoting.Channel$OrderlyShutdown
>     at hudson.remoting.Request.call(Request.java:149)
>     at hudson.remoting.Channel.call(Channel.java:660)
>     at hudson.node_monitors.SwapSpaceMonitor$1.monitor(SwapSpaceMonitor.java:83)
>     at hudson.node_monitors.SwapSpaceMonitor$1.monitor(SwapSpaceMonitor.java:81)
>     at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:202)
> Caused by: hudson.remoting.RequestAbortedException: hudson.remoting.Channel$OrderlyShutdown
>     at hudson.remoting.Request.abort(Request.java:269)
>     at hudson.remoting.Channel.terminate(Channel.java:711)
>     at hudson.remoting.Channel$CloseCommand.execute(Channel.java:794)
>     at hudson.remoting.Channel$ReaderThread.run(Channel.java:1024)
> Caused by: hudson.remoting.Channel$OrderlyShutdown
>     ... 2 more
> Caused by: Command close created at
>     at hudson.remoting.Command.<init>(Command.java:62)
>     at hudson.remoting.Command.<init>(Command.java:47)
>     at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:790)
>     at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:790)
>     at hudson.remoting.Channel.close(Channel.java:835)
>     at hudson.remoting.Channel$CloseCommand.execute(Channel.java:793)
>     ... 1 more
> Nov 4, 2011 9:40:57 AM hudson.slaves.SlaveComputer tryReconnect
> INFO: Attempting to reconnect w64-09
> {noformat}
> Please note, this issue can be mitigated by disabling the Free Swap Space
> check for all slaves. However, this is a less than optimal solution.