Re: page faults when spawning subprocesses
Dave Kirby wrote:
> I am working on a network management program written in python that has
> multiple threads (typically 20+) spawning subprocesses which are used
> to communicate with other systems on the network.
...

Let me check if I got you right: You are using fork() inside a thread in a multi-threaded environment. That sounds complicated. :-)

Have a look at
http://www.opengroup.org/onlinepubs/009695399/functions/fork.html

It mentions your use of fork, i.e. to create a new process running a different program (in this case the call to fork() is soon followed by a call to exec()).

If you fork in your multi-threaded environment, what happens with all your threads? The document resorts to "the effects of calling functions that require certain resources between the call to fork() and the call to an exec function are undefined." Maybe you are just experiencing this :-)

The document above recommends: "to avoid errors, the child process may only execute async-signal-safe operations until such time as one of the exec functions is called."

Maybe this discussion is also of some help:
http://groups.google.com/group/comp.programming.threads/browse_thread/thread/37fe7e050b44c329/217660515af867ea?tvc=2#217660515af867ea

Cheers
Daniel
--
http://mail.python.org/mailman/listinfo/python-list
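To make the "what happens with all your threads?" question concrete, here is a small sketch of forking while another thread holds a lock. It is only an illustration under assumptions: the function name fork_while_lock_held is hypothetical, and the behaviour shown is what I would expect from CPython on a POSIX system, where only the forking thread exists in the child while the lock stays marked as held.

```python
import os
import threading


def fork_while_lock_held():
    """Fork while a second thread holds a lock; report what the child sees.

    Returns (thread_count_in_child, child_got_lock).
    """
    lock = threading.Lock()
    held = threading.Event()
    release = threading.Event()

    def worker():
        # Hold the lock until the parent tells us to let go.
        with lock:
            held.set()
            release.wait()

    t = threading.Thread(target=worker)
    t.start()
    held.wait()  # make sure the lock is held before forking

    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: the worker thread was not copied, but the lock state was.
        os.close(read_end)
        got = lock.acquire(timeout=0.2)
        report = "%d %d" % (threading.active_count(), got)
        os.write(write_end, report.encode())
        os._exit(0)

    # Parent: collect the child's report, then let the worker finish.
    os.close(write_end)
    report = os.read(read_end, 64).decode()
    os.close(read_end)
    os.waitpid(pid, 0)
    release.set()
    t.join()
    threads, got = report.split()
    return int(threads), bool(int(got))
```

The child cannot take the lock because the thread that would release it does not exist there, which is the kind of trouble the quoted passages warn about for code run between fork() and exec().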
Re: page faults when spawning subprocesses
Dave Kirby wrote:
>
> 5) WTF can I do about it?

Maybe using vfork rather than fork would help. But I'm not sure that will work as intended when there are multiple threads; in fact, I'm not sure fork will work either. You could have fork racing against another thread that is in a critical region, thus duplicating the memory map at a point where some data structures are in an inconsistent state and apparently locked by a thread that exists only in the parent.

A possible solution would be to use fork to create two processes before creating any threads. Have them communicate over pipes or sockets when new processes are to be created. Then one process can create all the threads you need, and the other can fork off children. Even in that case vfork may come in handy.

If you dislike the semantics of vfork, but still want the parent to block until the child has called execve, then you can do so manually using a pipe. Create the pipe before calling fork. In the parent process, close the write end and try to read from the pipe. In the child process, close the read end and mark the write end close-on-exec. When exec succeeds, the pipe is closed and the parent gets EOF.

(I have tried some of this in C, but I must admit, I don't know if it can be done in Python as well.)

--
Kasper Dupont
Note to self: Don't try to allocate 256000 pages with GFP_KERNEL on x86.
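The pipe handshake described above can be sketched in Python roughly as follows. Treat it as a minimal sketch, not a proven recipe: the function name spawn_and_wait_for_exec is hypothetical, the close-on-exec flag is set before fork rather than in the child (the effect is the same), and writing the errno into the pipe when exec fails is an addition that the description above does not include. Note also that the child runs Python code between fork and exec, which is not restricted to async-signal-safe operations.

```python
import fcntl
import os
import sys


def spawn_and_wait_for_exec(argv):
    """Fork and exec argv; return the child's pid once exec has succeeded.

    The parent blocks on a pipe whose write end is close-on-exec, so it
    sees EOF at the moment the child's exec succeeds.
    """
    read_end, write_end = os.pipe()
    # Mark the write end close-on-exec (explicit, even where it is the default).
    flags = fcntl.fcntl(write_end, fcntl.F_GETFD)
    fcntl.fcntl(write_end, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)

    pid = os.fork()
    if pid == 0:
        # Child: close the read end, then try to exec.
        os.close(read_end)
        try:
            os.execvp(argv[0], argv)
        except OSError as exc:
            # Assumption, not in the original post: report the failure.
            os.write(write_end, str(exc.errno).encode())
        finally:
            os._exit(127)

    # Parent: close the write end and read until EOF.
    os.close(write_end)
    data = b""
    while True:
        chunk = os.read(read_end, 64)
        if not chunk:
            break
        data += chunk
    os.close(read_end)

    if data:
        # exec failed in the child; reap it and raise the same error here.
        os.waitpid(pid, 0)
        err = int(data)
        raise OSError(err, os.strerror(err))
    return pid
```

A caller would use it like `pid = spawn_and_wait_for_exec([sys.executable, "-c", "pass"])` and later reap the child with `os.waitpid(pid, 0)`.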
page faults when spawning subprocesses
I am working on a network management program written in Python that has multiple threads (typically 20+) spawning subprocesses which are used to communicate with other systems on the network. This runs fine for a while, but eventually slows down to a crawl. Running sar shows that when it is running slowly there is an exceptionally large number of minor page faults: there are continuously 14000 faults/sec, with a variation of about +/-100. There are no pages swapped to disk; these are purely in-memory faults.

I have a hypothesis about what is happening, but have not been able to prove or disprove it. The theory is that when a subprocess is spawned, there is a small window between the call to fork and the call to exec where the parent's memory is shared between the two processes. Linux marks the memory as copy-on-write, so if the parent process then writes to memory during that window, a minor page fault is generated and the page is copied. Normally this is not a problem, but with a large number of threads all spawning subprocesses there is a chance of another process being spawned during that window, and the whole of memory is copied. This slows everything else down, so the probability of another collision increases, and the whole thing snowballs. This could also happen if something else tries to write to large areas of memory (maybe the Python garbage collector?).

This is running on a Sun V40 64 bit SMP with Fedora Core 3. The same code has been run on Intel systems and the problem has not been seen. This could be because the problem is specific to that hardware, or because the Intel systems are not fast enough for a collision to occur.

My questions are:

1) is the theory plausible/likely?
2) what could I do to prove/disprove it?
3) has anyone else seen this problem?
4) are there any other situations that could be causing a continuous stream of minor page faults?
5) WTF can I do about it?
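For question 2, one way to gather evidence from inside the program is to read the process's own minor-fault counter around each spawn. This is only a sketch: the helper names minor_faults and faults_during are hypothetical, and it assumes the standard resource module's ru_minflt field, which on Linux counts faults for the whole process (all threads), so with 20+ threads the numbers show correlation, not attribution to a single spawn.

```python
import resource
import subprocess
import sys


def minor_faults():
    """Minor page faults incurred so far by this process."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_minflt


def faults_during(func, *args, **kwargs):
    """Call func and return (its result, minor faults taken meanwhile)."""
    before = minor_faults()
    result = func(*args, **kwargs)
    return result, minor_faults() - before


if __name__ == "__main__":
    # Example: how many minor faults does the parent take around one spawn?
    status, delta = faults_during(subprocess.call, [sys.executable, "-c", "pass"])
    print("exit status", status, "minor faults in parent:", delta)
```

Logging this delta per spawn, alongside the number of spawns in flight, would show whether the fault rate really climbs when spawns overlap.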
Dave Kirby (dave.x.kirby at gmail dot com)