> On 16 apr 2015, at 21:01, Thomas Stüfe <thomas.stu...@gmail.com> wrote:
>
> Hi Roger,
>
> thank you for your answer!
>
> The reason I take an interest is not just theoretical. We (SAP) use our JVM
> for our test infrastructure and we had exactly the problem allChildren() is
> designed to solve: killing a process tree related to a specific test
> (similar to jtreg tests) in case of errors or hangs. We have test machines
> running large workloads of tests in parallel and we reach pid wraparound -
> depending on the OS - quite fast.
>
> We solved this by adding process groups to Process.java and we are very
> happy with this solution. We are able to quickly kill a whole process tree,
> cleanly and completely, without ambiguity or risk to other tests. Of course
> we had to add this support as a "sideways hack" in order to not change the
> official Process.java interface. Therefore I was hoping that with JEP 102,
> we would get official support for process groups. Unfortunately, it seems the
> decision has already been made and we are too late in the discussion :(

Interestingly we are hoping to use allChildren() to kill process trees in jtreg - exactly the use case you are describing. I haven’t been testing the current approach in allChildren(), but it seems your experience indicates that it will not be a perfect fit for the use case. In a previous test framework I was involved in we also used process groups for this with good results.

This does beg the question: if the current approach isn’t useful for our own testing purposes, when is it useful?

Thanks,
/Staffan

>
> see my other comments inline.
>
> On Sat, Apr 11, 2015 at 8:55 PM, Roger Riggs <roger.ri...@oracle.com
> <mailto:roger.ri...@oracle.com>> wrote:
>
>> Hi Thomas,
>>
>> Thanks for the comments.
>>
>> On 4/11/2015 8:31 AM, Thomas Stüfe wrote:
>>
>> Hi Roger,
>>
>> I have a question about getChildren() and getAllChildren().
>>
>> I assume the point of those functions is to implement point 4 of JEP 102
>> ("The ability to deal with process trees, in particular some means to
>> destroy a process tree."), by returning a collection of PIDs which are the
>> children of the process and then killing them?
>>
>> Earlier versions included a killProcess tree method but it was recommended
>> to leave the exact algorithm to kill processes to the caller.
>>
>>
>> However, I am not sure that this can be implemented in a safe way, at
>> least on UNIX, because - as Martin already pointed out - of PID recycling.
>> I do not see how you can prevent allChildren() from returning PIDs which
>> may be already reaped and recycled when you use them later. How do you
>> prevent that?
>>
>> Unless there is an extended time between getting the children and
>> destroying them the pids will still be valid.
>>
>
> Why? Child process may be getting reaped the instant you are done reading
> it from /proc, and pid may have been recycled by the OS right away and
> already pointing to another process when allChildren() returns.
> If a process lives about as long as it takes the system to reach a pid
> wraparound to the same pid value, its pid could be recycled right after it
> is reaped, right? Sure, the longer you wait, the higher the chance of this
> happening, but it may happen right away.
>
> As Martin said, we have had those races in the kill() code for a long time,
> but children()/allChildren() could make those errors more probable, because
> now more processes are involved. Especially if you use allChildren to kill
> a deep process tree. And there is nothing in the javadoc warning the user
> about this scenario. You would just happen from time to time to kill an
> unrelated process. Those problems are hard to debug.
>
>> The technique of caching the start time can prevent that case; though it
>> has AFAIK not been a problem.
>>
>
> How would that work? User should, before issuing the kill, compare start
> time of process to kill with cached start time?
>
>> Note even if your coding is bulletproof, that allChildren() will also
>> return PIDs of sub processes which are completely unrelated to you and
>> Process.java - they could have been forked by some third party native code
>> which just happens to run in parallel in the same process. There, you have
>> no control over when it gets reaped. It might already have been reaped by
>> the time allChildren() returns, and now the same PID got recycled as
>> another, unrelated process.
>>
>> Of course, the best case is for an application to spawn and manage its own
>> processes and handle their proper termination.
>> The use cases for children/allChildren are focused on
>> supervisory/executive functions that monitor a running system and can
>> clean up even in the case of unexpected failures.
>>
>> All management of processes is subject to OS limitations; if the PID were
>> from a completely different process tree, the ordinary destroy/info
>> functions would not be available unless the process was running as a
>> privileged OS user (same as any other native application).
>>
>
> Could you explain this please? If both trees run under the same user, why
> should I not be able to kill a process from a different tree?
>
>> If I am right, it would not be sufficient to state "There is no guarantee
>> that a process is alive." - it may be alive but by now be a different
>> process altogether. This makes "allChildren()" useless for many cases,
>> because the returned information may already be obsolete the moment the
>> function returns.
>>
>> The caching of startTime can remove the ambiguity.
>>
>
>>
>> Of course I may be missing something here?
>>
>> But if I got all that right and the sole purpose of allChildren() is to
>> be able to kill them (or otherwise signal them), why not use process
>> groups? Process groups would be the traditional way on POSIX platforms to
>> handle process trees, and they are also available on Windows in the form of
>> Job Objects.
>>
>> Using process groups to signal sub process trees would be safe, would
>> not rely on PID identity, and would be more efficient. Also way less
>> coding. Also, it would be an old, established pattern - process groups have
>> been around for a long time. Also, using process groups it is possible to
>> break away from a group, so a program below you which wants to run as a
>> daemon can do so by removing itself from the process group and thus escaping
>> your kill.
>>
>> On Windows we have Job objects, and I think there are enough
>> similarities to POSIX process groups to abstract them into something
>> platform independent.
>>
>> Earlier discussions of process termination and exit value reaping
>> considered using process groups but it became evident that the Java
>> runtime needed to be very careful to not interfere with processes that
>> might be spawned and controlled by native libraries and that process
>> groups would only increase complexity and the interactions.
>>
>
>> Thanks, Roger
>>
>>
> Thanks! Thomas
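
For readers following the "cache the start time" suggestion above, here is a minimal sketch of what a caller-side check might look like. It is an illustration under assumptions, not the JDK's implementation: the thread talks about allChildren(), while the sketch uses ProcessHandle.descendants() and ProcessHandle.Info.startInstant() from the API as released in JDK 9. The class and method names (StartTimeGuard, snapshot, isSameProcess, destroyIfUnchanged) are hypothetical.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper, not part of the JDK: remembers each descendant's start
// time and refuses to signal a pid whose start time has changed since then.
final class StartTimeGuard {

    // Record the start time reported for every descendant of root, keyed by pid.
    static Map<Long, Optional<Instant>> snapshot(ProcessHandle root) {
        Map<Long, Optional<Instant>> seen = new LinkedHashMap<>();
        root.descendants().forEach(h -> seen.put(h.pid(), h.info().startInstant()));
        return seen;
    }

    // True only if a live process with this pid still reports the recorded start time.
    static boolean isSameProcess(long pid, Optional<Instant> recordedStart) {
        if (!recordedStart.isPresent()) {
            return false; // start time was unavailable, so identity cannot be confirmed
        }
        return ProcessHandle.of(pid)
                .filter(ProcessHandle::isAlive)
                .map(h -> h.info().startInstant().equals(recordedStart))
                .orElse(false);
    }

    // Request termination of every snapshotted process that still looks unchanged.
    // Returns how many termination requests were accepted.
    static int destroyIfUnchanged(Map<Long, Optional<Instant>> snapshot) {
        int signalled = 0;
        for (Map.Entry<Long, Optional<Instant>> e : snapshot.entrySet()) {
            long pid = e.getKey();
            // Assumption: a pid recycled for another process reports a different start time.
            if (isSameProcess(pid, e.getValue())
                    && ProcessHandle.of(pid).map(ProcessHandle::destroy).orElse(false)) {
                signalled++;
            }
        }
        return signalled;
    }
}
```

A supervisor would call snapshot(handle) while the tree is known to be its own and destroyIfUnchanged(snapshot) when cleaning up. Note that this only narrows the race Thomas describes: a pid can still be reaped and reused between the start-time comparison and the destroy() call, which is the gap the process-group approach avoids.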