On 9 May 2012 17:40, Wright, Clark <cwri...@litle.com> wrote:
> Could you expand on the handling (or expectations) of stdout/stderr by the 
> slave?

A process is not considered terminated until its exit code has been
returned and its stdout and stderr streams have been closed. Any child
or grandchild processes that remain running with their output directed
through the inherited stdout / stderr will hold those streams open, so
as far as Java is concerned the process has not finished.
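As a stand-alone illustration of the effect (the `sleep 2` is just a stand-in for any grandchild a build might leave behind), the parent below exits immediately, yet the reader on the other end of the pipe does not see EOF until the backgrounded child exits too:

```shell
#!/bin/sh
# The parent ("sh -c ...") returns its exit code right away, but the
# backgrounded child ("sleep 2") inherits the parent's stdout, which is
# the write end of the pipe. "cat" reads that pipe and only sees EOF
# once the child exits as well, roughly two seconds later.
start=$(date +%s)
sh -c 'sleep 2 & exit 0' | cat > /dev/null
end=$(date +%s)
echo "stream stayed open for $((end - start)) second(s)"
```

That two-second wait is exactly the "process still running" window, scaled down: Java is waiting on the streams, not on the exit code.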

>
> All the processes associated with the build have finished and exited, the 
> only process left running on that account is the jvm running the  slave.
>
> Now, parts of our test harness do muck with stdout/stderr, but to the best of 
> our knowledge, those two are always reset correctly.

Well, try a slightly different tack: modify your script so that you
redirect stdout and stderr to /dev/null or a file. See if that
resolves the issue; then you at least know you are on the right
track.
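A minimal sketch of that change, again with `sleep 2` standing in for whatever long-lived helper the build leaves behind: once its output is redirected away from the inherited stdout / stderr, the stream closes the moment the script itself exits.

```shell
#!/bin/sh
# The lingering child's stdout and stderr are pointed at /dev/null
# (a log file under $WORKSPACE would work just as well), so it no
# longer holds the inherited stream open: the reader sees EOF
# immediately instead of two seconds later.
start=$(date +%s)
sh -c 'sleep 2 > /dev/null 2>&1 & exit 0' | cat > /dev/null
end=$(date +%s)
echo "stream stayed open for $((end - start)) second(s)"
```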

>
> Also, forcing a gc on the slave (that is where you think the GC problem is, 
> right?) doesn't change anything.

No, I am thinking that if the slave is tight on memory it might be so
busy doing GC that it takes ages to complete the tidy-up work it needs
to do. I would look at the GC stats on the slave JVM to see how much
time is spent doing GC, and if that is higher than 5-10% I would give
the slave JVM more memory.
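For example, on a JVM of that vintage you could start the slave with GC logging turned on and an explicitly sized heap. The flag values below are illustrative assumptions, not recommendations; size -Xmx to what your builds actually need.

```shell
# Illustrative only: launch the slave agent with verbose GC logging
# enabled and a fixed-size heap, so the log shows how much wall-clock
# time each collection costs.
java -Xms512m -Xmx512m \
     -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
     -jar slave.jar
```

Alternatively, `jstat -gcutil <pid>` against the already-running slave process reports GC utilisation without a restart.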

>
> Thanks,
> -clark.
> -----Original Message-----
> From: jenkinsci-users@googlegroups.com 
> [mailto:jenkinsci-users@googlegroups.com] On Behalf Of Stephen Connolly
> Sent: Wednesday, May 09, 2012 11:55 AM
> To: jenkinsci-users@googlegroups.com
> Subject: Re: Delay between job finished and node finished on unix
>
> Another thing you could look into is forked child processes having captured 
> stdout / stderr.
>
> The process will not be seen as finished until all stdout/stderr has been 
> captured, so if your build leaves a non-daemon process hanging around, that 
> could be the root cause.
>
> On 9 May 2012 16:53, Stephen Connolly <stephen.alan.conno...@gmail.com> wrote:
>> On 9 May 2012 16:31, Wright, Clark <cwri...@litle.com> wrote:
>>> Thank you.
>>>
>>> So how does remoting work with respect to end of job notification?
>>>
>>> My initial assumption was that it was simply waiting for the forked process 
>>> to finish, grab the resultant return code, and update the master.
>>>
>>
>> Well, you could look at it like that; in actuality the better way to
>> look at it is as more of a distributed JVM. The master sends a closure
>> to the slave, the closure forks the child process, and when the child
>> process completes the closure should return the result to the master.
>>
>>> Also, any pointers/suggestions as to what information I need/want to get 
>>> out of the groovy script console?
>>>
>>> Will certainly look into the queue management code.  However, the queue 
>>> itself is empty (we have more executors than needed at the moment).  
>>> Jenkins just believes that jobs that actually finished 5 hours ago are 
>>> still running.
>>
>> Smells like a GC issue but I could be wrong.
>>
>>>
>>> - Clark.
>>>
>>>> So the questions I have are:
>>>>
>>>> 1.       What is the polling cycle on the node monitoring the job
>>>> and is it configurable?
>>>
>>> That is not how the remoting works.
>>>
>>>>
>>>> 2.       Is there a way to get more information out of the node than
>>>> just pinging systeminfo on the main Jenkins?
>>>
>>> Yes, via the Groovy script console.
>>>
>>>>
>>>> 3.       Where in the Jenkins code base is the node management code?
>>>>
>>>
>>> Scattered all over, you will want to look into the remoting module, and 
>>> look at the Slave and Computer classes.
>>>
>>> But in reality you probably want to look at how the queue works and not 
>>> node management.
>>>
>>> You might want to investigate the GC cpu time on the slaves and the master.
>>>
>>>>
>>>>
>>>>
>>>>
>>>> This is the thread dump for one of them
>>>> (http://jenkins/node1/systeminfo )
>>>>
>>>> Thread Dump
>>>>
>>>> "Channel reader thread: channel" Id=9 Group=main RUNNABLE (in native)
>>>>     at java.io.FileInputStream.readBytes(Native Method)
>>>>     at java.io.FileInputStream.read(FileInputStream.java:199)
>>>>     at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>>>>     at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
>>>>     -  locked java.io.BufferedInputStream@2486ae
>>>>     at java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2249)
>>>>     at java.io.ObjectInputStream$BlockDataInputStream.peek(ObjectInputStream.java:2542)
>>>>     at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2552)
>>>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1297)
>>>>     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
>>>>     at hudson.remoting.Channel$ReaderThread.run(Channel.java:1030)
>>>>
>>>> "main" Id=1 Group=main WAITING on hudson.remoting.Channel@a17083
>>>>     at java.lang.Object.wait(Native Method)
>>>>     -  waiting on hudson.remoting.Channel@a17083
>>>>     at java.lang.Object.wait(Object.java:485)
>>>>     at hudson.remoting.Channel.join(Channel.java:766)
>>>>     at hudson.remoting.Launcher.main(Launcher.java:420)
>>>>     at hudson.remoting.Launcher.runWithStdinStdout(Launcher.java:366)
>>>>     at hudson.remoting.Launcher.run(Launcher.java:206)
>>>>     at hudson.remoting.Launcher.main(Launcher.java:168)
>>>>
>>>> "Ping thread for channel hudson.remoting.Channel@a17083:channel" Id=10 Group=main TIMED_WAITING
>>>>     at java.lang.Thread.sleep(Native Method)
>>>>     at hudson.remoting.PingThread.run(PingThread.java:86)
>>>>
>>>> "pool-1-thread-666" Id=719 Group=main RUNNABLE
>>>>     at sun.management.ThreadImpl.dumpThreads0(Native Method)
>>>>     at sun.management.ThreadImpl.dumpAllThreads(ThreadImpl.java:374)
>>>>     at hudson.Functions.getThreadInfos(Functions.java:872)
>>>>     at hudson.util.RemotingDiagnostics$GetThreadDump.call(RemotingDiagnostics.java:93)
>>>>     at hudson.util.RemotingDiagnostics$GetThreadDump.call(RemotingDiagnostics.java:89)
>>>>     at hudson.remoting.UserRequest.perform(UserRequest.java:118)
>>>>     at hudson.remoting.UserRequest.perform(UserRequest.java:48)
>>>>     at hudson.remoting.Request$2.run(Request.java:287)
>>>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>>>>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>>     at java.lang.Thread.run(Thread.java:619)
>>>>
>>>>     Number of locked synchronizers = 1
>>>>     - java.util.concurrent.locks.ReentrantLock$NonfairSync@1630de2
>>>>
>>>> "Finalizer" Id=3 Group=system WAITING on java.lang.ref.ReferenceQueue$Lock@64514
>>>>     at java.lang.Object.wait(Native Method)
>>>>     -  waiting on java.lang.ref.ReferenceQueue$Lock@64514
>>>>     at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
>>>>     at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
>>>>     at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)
>>>>
>>>> "Reference Handler" Id=2 Group=system WAITING on java.lang.ref.Reference$Lock@1a12930
>>>>     at java.lang.Object.wait(Native Method)
>>>>     -  waiting on java.lang.ref.Reference$Lock@1a12930
>>>>     at java.lang.Object.wait(Object.java:485)
>>>>     at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
>>>>
>>>> "Signal Dispatcher" Id=4 Group=system RUNNABLE
>>>>
>>>> Thank you,
>>>>
>>>>
>>>>
>>>> -Clark.
>>>>
>>>> The information in this message is for the intended recipient(s)
>>>> only and may be the proprietary and/or confidential property of
>>>> Litle & Co., LLC, and thus protected from disclosure. If you are not
>>>> the intended recipient(s), or an employee or agent responsible for
>>>> delivering this message to the intended recipient, you are hereby
>>>> notified that any use, dissemination, distribution or copying of
>>>> this communication is prohibited. If you have received this
>>>> communication in error, please notify Litle & Co. immediately by
>>>> replying to this message and then promptly deleting it and your reply 
>>>> permanently from your computer.
>>>
