[JIRA] (JENKINS-54496) Node monitoring should not log exceptions for disconnected nodes
Title: Message Title Oleg Nenashev resolved as Fixed It was released in Jenkins 2.220 Jenkins / JENKINS-54496 Node monitoring should not log exceptions for disconnected nodes Change By: Oleg Nenashev Status: Open Resolved Resolution: Fixed Released As: Jenkins 2.220 Add Comment This message was sent by Atlassian Jira (v7.13.6#713006-sha1:cc4451f) -- You received this message because you are
[JIRA] (JENKINS-54496) Node monitoring should not log exceptions for disconnected nodes
Title: Message Title Jeff Thompson commented on JENKINS-54496 Re: Node monitoring should not log exceptions for disconnected nodes Argghh. Yes. I use the "g" version on a number of other places and mistyped it once I started. I got the PR. I'll review it today hopefully. Add Comment This message was sent by Atlassian Jira (v7.13.6#713006-sha1:cc4451f) -- You received this message because you are subscribed to the Google Groups "Jenkins Issues" group. To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/jenkinsci-issues/JIRA.195266.1541524267000.8939.1580397780139%40Atlassian.JIRA.
[JIRA] (JENKINS-54496) Node monitoring should not log exceptions for disconnected nodes
Title: Message Title Jeff Thompson edited a comment on JENKINS-54496 Re: Node monitoring should not log exceptions for disconnected nodes This makes good sense. I'm glad you were able to dig into it further and isolate the spurious messages you've been seeing. It would be great if you can see if that eliminates the issue for you. Even if you can't verify it it's probably a reasonable check. If you want to submit a PR for it, mention me so I can be sure to review ("jeffret- g b " on GitHub). Or I could see about submitting the PR sometime. Add Comment This message was sent by Atlassian Jira (v7.13.6#713006-sha1:cc4451f) -- You received this message because you are subscribed to the Google Groups "Jenkins Issues" group. To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/jenkinsci-issues/JIRA.195266.1541524267000.8937.1580397720114%40Atlassian.JIRA.
[JIRA] (JENKINS-54496) Node monitoring should not log exceptions for disconnected nodes
Title: Message Title pjdarton commented on JENKINS-54496 Re: Node monitoring should not log exceptions for disconnected nodes Well, it seems to be working. I'd normally expect to see over a dozen of those exceptions during the course of a morning and the log is quiet (well, it's still very active, but it's no longer getting lots of those exceptions ... which will make it easier for me to concentrate on tracking down the other exceptions I'm seeing now ) PR-4461 submitted (by me - I'm "pjdarton" on here and on github). Changes are as suggested above - just the addition of those 5 lines. Note that github doesn't seem to recognise "jeffret-g" so either it's not letting me see you, or there's a typo ... I did find a "jeffret-b" user with real name matching yours so I'm asssuming a b/g typo ... and if that's not the case, I've just pinged the wrong person... Add Comment This message was sent by Atlassian Jira (v7.13.6#713006-sha1:cc4451f) -- You received this message because you are subscribed to the Google Groups "Jenkins Issues" group. To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/jenkinsci-issues/JIRA.195266.1541524267000.8915.1580394360475%40Atlassian.JIRA.
[JIRA] (JENKINS-54496) Node monitoring should not log exceptions for disconnected nodes
Title: Message Title pjdarton commented on JENKINS-54496 Re: Node monitoring should not log exceptions for disconnected nodes I was able to build a jenkins.war this afternoon; it's going "live" on my main Jenkins server tonight and so, with the workload that gets each day, I should be able to tell if it's resolved within a day or two (assuming no other work gets in the way, that is). If it works, I'll submit a PR. Add Comment This message was sent by Atlassian Jira (v7.13.6#713006-sha1:cc4451f) -- You received this message because you are subscribed to the Google Groups "Jenkins Issues" group. To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/jenkinsci-issues/JIRA.195266.1541524267000.8475.1580333520217%40Atlassian.JIRA.
[JIRA] (JENKINS-54496) Node monitoring should not log exceptions for disconnected nodes
Title: Message Title Jeff Thompson commented on JENKINS-54496 Re: Node monitoring should not log exceptions for disconnected nodes This makes good sense. I'm glad you were able to dig into it further and isolate the spurious messages you've been seeing. It would be great if you can see if that eliminates the issue for you. Even if you can't verify it it's probably a reasonable check. If you want to submit a PR for it, mention me so I can be sure to review ("jeffret-g" on GitHub). Or I could see about submitting the PR sometime. Add Comment This message was sent by Atlassian Jira (v7.13.6#713006-sha1:cc4451f) -- You received this message because you are subscribed to the Google Groups "Jenkins Issues" group. To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/jenkinsci-issues/JIRA.195266.1541524267000.8466.1580332260127%40Atlassian.JIRA.
[JIRA] (JENKINS-54496) Node monitoring should not log exceptions for disconnected nodes
Title: Message Title pjdarton edited a comment on JENKINS-54496 Re: Node monitoring should not log exceptions for disconnected nodes I've done a bit of digging...I believe the problem stems from the fact that {{monitorDetailed()}} gets the list of {{Computer}} s instances from Jenkins at the start, but this list can change while the monitoring is taking place such that, by the time {{Future.get}} returns with an answer (or an exception), the {{Computer}} being monitored may have been removed from Jenkinsand if it has been removed from Jenkins and the monitoring threw an exception of the form "can't talk to this slave node anymore" then that's pretty much "expected behavior" rather than something worthy of logging a warning & exception trace.i.e. before this method logs a big complaint to the logfile, it should check to see if the slave node being monitored still exists in Jenkins as, if Jenkins has forgotten about it, we shouldn't go logging about it.I think that the answer may be to modify AbstractAsyncNodeMonitorDescriptor's {{private void error(Computer c, Throwable x)}} method so that instead of [this|https://github.com/jenkinsci/jenkins/blob/jenkins-2.204.1/core/src/main/java/hudson/node_monitors/AbstractAsyncNodeMonitorDescriptor.java#L126-L132]:{code:java}private void error(Computer c, Throwable x) {if (c instanceof SlaveComputer) {Functions.printStackTrace(x, ((SlaveComputer) c).getListener().error("Failed to monitor for " + getDisplayName()));} else {LOGGER.log(WARNING, "Failed to monitor " + c.getDisplayName() + " for " + getDisplayName(), x);}}{code}it does this{code:java}private void error(Computer c, Throwable x) {// JENKINS-54496: don't log if c was removed from Jenkins after we'd started monitoringfinal boolean cIsStillCurrent = Jenkins.get().getComputer(c.getName()) == c;if (!cIsStillCurrent) {return;}if (c instanceof SlaveComputer) {Functions.printStackTrace(x, ((SlaveComputer) c).getListener().error("Failed to monitor for " + getDisplayName()));} else {LOGGER.log(WARNING, "Failed to monitor " + c.getDisplayName() + " for " + getDisplayName(), x);}}{code}If I can persuade my dev environmemt to build me a custom jenkins.war with this code change, I'll be able to try that out and see if that resolves the issue ... but in the meantime, WDYT? Add Comment
[JIRA] (JENKINS-54496) Node monitoring should not log exceptions for disconnected nodes
Title: Message Title pjdarton edited a comment on JENKINS-54496 Re: Node monitoring should not log exceptions for disconnected nodes I've done a bit of digging ...I believe the problem stems from the fact that {{monitorDetailed()}} gets the list of {{Computer}}s from Jenkins at the start, but this list can change while the monitoring is taking place such that, by the time {{Future.get}} returns with an answer (or an exception), the {{Computer}} being monitored may have been removed from Jenkins and if it has been removed from Jenkins and the monitoring threw an exception of the form "can't talk to this slave node anymore" then that's pretty much "expected behavior" rather than something worthy of logging a warning & exception trace.i.e. before this method logs a big complaint to the logfile, it should check to see if the slave node being monitored still exists in Jenkins as, if Jenkins has forgotten about it, we shouldn't go logging about it. I think that the answer may be to modify AbstractAsyncNodeMonitorDescriptor's {{private void error(Computer c, Throwable x)}} method so that instead of [this|https://github.com/jenkinsci/jenkins/blob/jenkins-2.204.1/core/src/main/java/hudson/node_monitors/AbstractAsyncNodeMonitorDescriptor.java#L126-L132]:{code:java}private void error(Computer c, Throwable x) {if (c instanceof SlaveComputer) {Functions.printStackTrace(x, ((SlaveComputer) c).getListener().error("Failed to monitor for " + getDisplayName()));} else {LOGGER.log(WARNING, "Failed to monitor " + c.getDisplayName() + " for " + getDisplayName(), x);}}{code}it does this{code:java}private void error(Computer c, Throwable x) {// JENKINS-54496: don't log if c was removed from Jenkins after we'd started monitoringfinal boolean cIsStillCurrent = Jenkins.get().getComputer(c.getName()) == c;if (!cIsStillCurrent) {return;}if (c instanceof SlaveComputer) {Functions.printStackTrace(x, ((SlaveComputer) c).getListener().error("Failed to monitor for " + getDisplayName()));} else {LOGGER.log(WARNING, "Failed to monitor " + c.getDisplayName() + " for " + getDisplayName(), x);}}{code}If I can persuade my dev environmemt to build me a custom jenkins.war with this code change, I'll be able to try that out and see if that resolves the issue ... but in the meantime, WDYT? Add Comment
[JIRA] (JENKINS-54496) Node monitoring should not log exceptions for disconnected nodes
Title: Message Title pjdarton commented on JENKINS-54496 Re: Node monitoring should not log exceptions for disconnected nodes I've done a bit of digging and I think that the answer may be to modify AbstractAsyncNodeMonitorDescriptor's private void error(Computer c, Throwable x) method so that instead of this: private void error(Computer c, Throwable x) { if (c instanceof SlaveComputer) { Functions.printStackTrace(x, ((SlaveComputer) c).getListener().error("Failed to monitor for " + getDisplayName())); } else { LOGGER.log(WARNING, "Failed to monitor " + c.getDisplayName() + " for " + getDisplayName(), x); } } it does this private void error(Computer c, Throwable x) { // JENKINS-54496: don't log if c was removed from Jenkins after we'd started monitoring final boolean cIsStillCurrent = Jenkins.get().getComputer(c.getName()) == c; if (!cIsStillCurrent) { return; } if (c instanceof SlaveComputer) { Functions.printStackTrace(x, ((SlaveComputer) c).getListener().error("Failed to monitor for " + getDisplayName())); } else { LOGGER.log(WARNING, "Failed to monitor " + c.getDisplayName() + " for " + getDisplayName(), x); } } If I can persuade my dev environmemt to build me a custom jenkins.war with this code change, I'll be able to try that out and see if that resolves the issue ... but in the meantime, WDYT? Add Comment This message was sent by Atlassian Jira (v7.13.6#713006-sha1:cc4451f)
[JIRA] (JENKINS-54496) Node monitoring should not log exceptions for disconnected nodes
Title: Message Title pjdarton updated an issue Jenkins / JENKINS-54496 Node monitoring should not log exceptions for disconnected nodes Change By: pjdarton In an environment in which Jenkins nodes are being created and removed a lot, e.g. when using the docker-plugin to create single-use containers as slave nodes, the node monitoring functionality can fill the logs full of spurious exceptions, e.g.{code:none}Oct 13, 2018 10:18:14 PM io.jenkins.docker.DockerTransientNode$1 printlnINFO: Disconnected computer for node 'docker-hst02-00030qwaqtu3g'.Oct 13, 2018 10:18:14 PM io.jenkins.docker.DockerTransientNode$1 printlnINFO: Removed Node for node 'docker-hst02-00030qwaqtu3g'.Oct 13, 2018 10:18:14 PM hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor monitorDetailedWARNING: Failed to monitor docker-hst02-00030qwaqtu3g for Clock Differencejava.util.concurrent.ExecutionException: java.io.IOException: Cannot locate RemoteClassLoader.ClassLoaderProxy(2) in the channel exported table at hudson.remoting.Channel$2.adapt(Channel.java:992) at hudson.remoting.Channel$2.adapt(Channel.java:986) at hudson.remoting.FutureAdapter.get(FutureAdapter.java:59) at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitorDetailed(AbstractAsyncNodeMonitorDescriptor.java:114) at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:76) at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:305)Caused by: java.io.IOException: Cannot locate RemoteClassLoader.ClassLoaderProxy(2) in the channel exported table at hudson.remoting.MultiClassLoaderSerializer$Input.readClassLoader(MultiClassLoaderSerializer.java:107) at hudson.remoting.MultiClassLoaderSerializer$Input.resolveClass(MultiClassLoaderSerializer.java:128) at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1826) at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1713) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2000) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422) at hudson.remoting.UserRequest.deserialize(UserRequest.java:291) at hudson.remoting.UserRequest$NormalResponse.retrieve(UserRequest.java:326) at hudson.remoting.Channel$2.adapt(Channel.java:990) ... 5 moreCaused by: java.util.concurrent.ExecutionException: Invalid object ID 2 iota=7 at hudson.remoting.ExportTable.diagnoseInvalidObjectId(ExportTable.java:478) at hudson.remoting.ExportTable.get(ExportTable.java:397) at hudson.remoting.Channel.getExportedObject(Channel.java:780) at hudson.remoting.MultiClassLoaderSerializer$Input.readClassLoader(MultiClassLoaderSerializer.java:105) ... 14 moreOct 13, 2018 10:18:14 PM hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor monitorDetailedWARNING: Failed to monitor docker-hst02-00030qwaqtu3g for Free Temp Spacejava.util.concurrent.ExecutionException: java.io.IOException: Cannot locate
[JIRA] (JENKINS-54496) Node monitoring should not log exceptions for disconnected nodes
Title: Message Title pjdarton edited a comment on JENKINS-54496 Re: Node monitoring should not log exceptions for disconnected nodes The node existed when the method started its work, but the node is removed during execution (causing the exception). It's a race condition, but one that's very common with high-throughout dynamic slaves.I strongly suspect that, if the code checked again "does this node exist" before logging the exception, it would discover that the node does not exist anymore . (and, in my opinion, should either log at a lower level, or not log at all) Add Comment This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d) -- You received this message because you are subscribed to the Google Groups "Jenkins Issues" group. To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.
[JIRA] (JENKINS-54496) Node monitoring should not log exceptions for disconnected nodes
Title: Message Title pjdarton commented on JENKINS-54496 Re: Node monitoring should not log exceptions for disconnected nodes The node existed when the method started its work, but the node is removed during execution (causing the exception). It's a race condition, but one that's very common with high-throughout dynamic slaves. I strongly suspect that, if the code checked again "does this node exist" before logging the exception, it would discover that the node does not exist anymore. Add Comment This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d) -- You received this message because you are subscribed to the Google Groups "Jenkins Issues" group. To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.
[JIRA] (JENKINS-54496) Node monitoring should not log exceptions for disconnected nodes
Title: Message Title Jeff Thompson commented on JENKINS-54496 Re: Node monitoring should not log exceptions for disconnected nodes I'm not really sure what the desired behavior here is. It seems like the ask is that Jenkins would be better able to distinguish between an ephemeral and a permanent agent and then react or log differently in the two situations. It seems like these log messages might be important for permanent agents but excessive for ephemeral ones. Maybe this type of monitoring should not be enabled for ephemeral agents. Alternatively, the ask might be to improve the detection of when an agent no longer exists. Looking at the stack trace and the code, this operation is only called if Jenkins thinks the node exists. Add Comment This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d) -- You received this message because you are subscribed to the Google Groups "Jenkins Issues" group. To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.
[JIRA] (JENKINS-54496) Node monitoring should not log exceptions for disconnected nodes
Title: Message Title pjdarton created an issue Jenkins / JENKINS-54496 Node monitoring should not log exceptions for disconnected nodes Issue Type: Improvement Assignee: Jeff Thompson Components: core, remoting Created: 2018-11-06 17:11 Priority: Minor Reporter: pjdarton In an environment in which Jenkins nodes are being created and removed a lot, e.g. when using the docker-plugin to create single-use containers as slave nodes, the node monitoring functionality can fill the logs full of spurious exceptions, e.g. Oct 13, 2018 10:18:14 PM io.jenkins.docker.DockerTransientNode$1 println INFO: Disconnected computer for node 'docker-hst02-00030qwaqtu3g'. Oct 13, 2018 10:18:14 PM io.jenkins.docker.DockerTransientNode$1 println INFO: Removed Node for node 'docker-hst02-00030qwaqtu3g'. Oct 13, 2018 10:18:14 PM hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor monitorDetailed WARNING: Failed to monitor docker-hst02-00030qwaqtu3g for Clock Difference java.util.concurrent.ExecutionException: java.io.IOException: Cannot locate RemoteClassLoader.ClassLoaderProxy(2) in the channel exported table at hudson.remoting.Channel$2.adapt(Channel.java:992) at hudson.remoting.Channel$2.adapt(Channel.java:986) at hudson.remoting.FutureAdapter.get(FutureAdapter.java:59) at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitorDetailed(AbstractAsyncNodeMonitorDescriptor.java:114) at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:76) at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:305) Caused by: java.io.IOException: Cannot locate RemoteClassLoader.ClassLoaderProxy(2) in the channel exported table at