On 9 May 2012 17:40, Wright, Clark <cwri...@litle.com> wrote:
> Could you expand on the handling (or expectations) of stdout/stderr by the
> slave?

A process is not considered terminated until its exit code has been returned and its stdout and stderr have been closed. Any child or grandchild processes that are still running and have their output directed through that stdout/stderr will hold the process open as far as Java is concerned.

> All the processes associated with the build have finished and exited, the
> only process left running on that account is the jvm running the slave.
>
> Now, parts of our test harness do muck with stdout/stderr, but to the best of
> our knowledge, those two are always reset correctly.

Well, try a slightly different tack: modify your script so that it redirects stdout and stderr to /dev/null or to a file. See if that resolves the issue; if it does, you at least know you are on the right track.

> Also, forcing a gc on the slave (that is where you think the GC problem is,
> right?) doesn't change anything.

No, I am thinking that the slave might be busy doing GC if it is tight on memory, so busy that it takes ages to complete the tidy-up work it needs to do. I would look at the GC stats on the slave JVM to see how much time is spent doing GC, and if that is higher than 5-10% I would give the slave JVM more memory.

> Thanks,
> -clark.

> -----Original Message-----
> From: jenkinsci-users@googlegroups.com
> [mailto:jenkinsci-users@googlegroups.com] On Behalf Of Stephen Connolly
> Sent: Wednesday, May 09, 2012 11:55 AM
> To: jenkinsci-users@googlegroups.com
> Subject: Re: Delay between job finished and node finished on unix
>
> Another thing you could look into is forked child processes having captured
> stdout / stderr.
>
> The process will not be seen as finished until all stdout/stderr has been
> captured, so if your build leaves a non-daemon process hanging around, that
> could be the RCA
>
> On 9 May 2012 16:53, Stephen Connolly <stephen.alan.conno...@gmail.com> wrote:
>> On 9 May 2012 16:31, Wright, Clark <cwri...@litle.com> wrote:
>>> Thank you.
>>>
>>> So how does remoting work with respect to end of job notification?
>>>
>>> My initial assumption was that it was simply waiting for the forked process
>>> to finish, grab the resultant return code, and update the master.
>>>
>>
>> Well, you could look at it like that; in actuality the better way to
>> look at it is as more of a distributed JVM. The master sends a closure
>> to the slave, the closure forks the child process, and when the child
>> process completes the closure should return the result to the master.
>>
>>> Also, any pointers/suggestions as to what information I need/want to get
>>> out of the groovy script console?
>>>
>>> Will certainly look into the queue management code. However, the queue
>>> itself is empty (we have more executors than needed at the moment).
>>> Jenkins just believes that jobs that actually finished 5 hours ago are
>>> still running.
>>
>> Smells like a GC issue but I could be wrong.
>>
>>>
>>> - Clark.
>>>
>>>> So the questions I have are:
>>>>
>>>> 1. What is the polling cycle on the node monitoring the job
>>>> and is it configurable?
>>>
>>> That is not how the remoting works.
>>>
>>>>
>>>> 2. Is there a way to get more information out of the node than
>>>> just pinging systeminfo on the main Jenkins?
>>>
>>> Yes, via the groovy script console.
>>>
>>>>
>>>> 3. Where in the Jenkins code base is the node management code?
>>>>
>>>
>>> Scattered all over; you will want to look into the remoting module, and
>>> look at the Slave and Computer classes.
>>>
>>> But in reality you probably want to look at how the queue works and not
>>> node management.
>>>
>>> You might want to investigate the GC CPU time on the slaves and the master.
>>>
>>>>
>>>> This is the thread dump for one of them
>>>> (http://jenkins/node1/systeminfo )
>>>>
>>>> Thread Dump
>>>>
>>>> Channel reader thread: channel
>>>>
>>>> "Channel reader thread: channel" Id=9 Group=main RUNNABLE (in native)
>>>>     at java.io.FileInputStream.readBytes(Native Method)
>>>>     at java.io.FileInputStream.read(FileInputStream.java:199)
>>>>     at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>>>>     at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
>>>>     - locked java.io.BufferedInputStream@2486ae
>>>>     at java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2249)
>>>>     at java.io.ObjectInputStream$BlockDataInputStream.peek(ObjectInputStream.java:2542)
>>>>     at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2552)
>>>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1297)
>>>>     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
>>>>     at hudson.remoting.Channel$ReaderThread.run(Channel.java:1030)
>>>>
>>>> main
>>>>
>>>> "main" Id=1 Group=main WAITING on hudson.remoting.Channel@a17083
>>>>     at java.lang.Object.wait(Native Method)
>>>>     - waiting on hudson.remoting.Channel@a17083
>>>>     at java.lang.Object.wait(Object.java:485)
>>>>     at hudson.remoting.Channel.join(Channel.java:766)
>>>>     at hudson.remoting.Launcher.main(Launcher.java:420)
>>>>     at hudson.remoting.Launcher.runWithStdinStdout(Launcher.java:366)
>>>>     at hudson.remoting.Launcher.run(Launcher.java:206)
>>>>     at hudson.remoting.Launcher.main(Launcher.java:168)
>>>>
>>>> Ping thread for channel hudson.remoting.Channel@a17083:channel
>>>>
>>>> "Ping thread for channel hudson.remoting.Channel@a17083:channel" Id=10 Group=main TIMED_WAITING
>>>>     at java.lang.Thread.sleep(Native Method)
>>>>     at hudson.remoting.PingThread.run(PingThread.java:86)
>>>>
>>>> pool-1-thread-666
>>>>
>>>> "pool-1-thread-666" Id=719 Group=main RUNNABLE
>>>>     at sun.management.ThreadImpl.dumpThreads0(Native Method)
>>>>     at sun.management.ThreadImpl.dumpAllThreads(ThreadImpl.java:374)
>>>>     at hudson.Functions.getThreadInfos(Functions.java:872)
>>>>     at hudson.util.RemotingDiagnostics$GetThreadDump.call(RemotingDiagnostics.java:93)
>>>>     at hudson.util.RemotingDiagnostics$GetThreadDump.call(RemotingDiagnostics.java:89)
>>>>     at hudson.remoting.UserRequest.perform(UserRequest.java:118)
>>>>     at hudson.remoting.UserRequest.perform(UserRequest.java:48)
>>>>     at hudson.remoting.Request$2.run(Request.java:287)
>>>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>>>>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>>     at java.lang.Thread.run(Thread.java:619)
>>>>
>>>>     Number of locked synchronizers = 1
>>>>     - java.util.concurrent.locks.ReentrantLock$NonfairSync@1630de2
>>>>
>>>> Finalizer
>>>>
>>>> "Finalizer" Id=3 Group=system WAITING on java.lang.ref.ReferenceQueue$Lock@64514
>>>>     at java.lang.Object.wait(Native Method)
>>>>     - waiting on java.lang.ref.ReferenceQueue$Lock@64514
>>>>     at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
>>>>     at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
>>>>     at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)
>>>>
>>>> Reference Handler
>>>>
>>>> "Reference Handler" Id=2 Group=system WAITING on java.lang.ref.Reference$Lock@1a12930
>>>>     at java.lang.Object.wait(Native Method)
>>>>     - waiting on java.lang.ref.Reference$Lock@1a12930
>>>>     at java.lang.Object.wait(Object.java:485)
>>>>     at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
>>>>
>>>> Signal Dispatcher
>>>>
>>>> "Signal Dispatcher" Id=4 Group=system RUNNABLE
>>>>
>>>> Thank you,
>>>>
>>>> -Clark.
>>>>
>>>> The information in this message is for the intended recipient(s)
>>>> only and may be the proprietary and/or confidential property of
>>>> Litle & Co., LLC, and thus protected from disclosure. If you are not
>>>> the intended recipient(s), or an employee or agent responsible for
>>>> delivering this message to the intended recipient, you are hereby
>>>> notified that any use, dissemination, distribution or copying of
>>>> this communication is prohibited. If you have received this
>>>> communication in error, please notify Litle & Co. immediately by
>>>> replying to this message and then promptly deleting it and your reply
>>>> permanently from your computer.
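
To illustrate the stdout/stderr point made at the top of this reply, here is a minimal sketch in Python of the underlying Unix pipe behaviour. It is not Jenkins or remoting code, and the helper name `time_until_eof` is made up for the example. It assumes a Unix-like machine with `sh` and `sleep` on the PATH. The launcher reads the child's output until end-of-file, so a background grandchild that inherited the pipe keeps the read blocked even though the script itself has already exited.

```python
import subprocess
import time


def time_until_eof(command: str) -> float:
    """Run `command` via sh and return the seconds until its output hits EOF.

    stdout and stderr share one pipe. The read only returns once every
    process holding the write end of that pipe has closed it or exited.
    """
    start = time.monotonic()
    proc = subprocess.Popen(
        ["sh", "-c", command],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )
    proc.stdout.read()  # blocks while any descendant still holds the pipe
    proc.wait()         # the shell itself exited long before this point
    return time.monotonic() - start


if __name__ == "__main__":
    # The background sleep inherits the pipe, so EOF waits for it to exit.
    inherited = time_until_eof("sleep 2 &")
    # Same background job, but with its output redirected away from the pipe.
    redirected = time_until_eof("sleep 2 >/dev/null 2>&1 &")
    print(f"inherited stdout/stderr:   {inherited:.2f}s")
    print(f"redirected to /dev/null:   {redirected:.2f}s")
```

In the first call the shell returns at once but the read does not finish until the background `sleep` ends; in the second call the read finishes almost immediately. That is the same experiment as redirecting the build script's stdout and stderr to /dev/null or a file: if the delay disappears, a leftover process holding the output streams is the likely cause.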