On 22 Apr, 17:43, Michal Chruszcz <[email protected]> wrote:
>
> I am adding support for parallel processing to an existing program
> which fetches some data and then performs some computation with
> results saved to a database. Everything went just fine until I wanted
> to gather all of the results from the subprocesses.

[Queue example]

I have to say that I'm not familiar with the multiprocessing API, but for this kind of thing, there needs to be some reasonably complicated stuff happening in the background to test for readable conditions on the underlying pipes or sockets. In the pprocess module [1], I had to implement a poll-based framework (probably quite similar to Twisted and asyncore) to avoid deadlocks and other undesirable conditions.

[Pipe example]

Again, it's really awkward to monitor pipes between processes and to have them "go away" when closed. Indeed, I found that you don't really want them to disappear before everyone has finished reading from them, but Linux (at least) tends to break pipes quite readily. I got round this problem by having acknowledgements in pprocess, but it felt like a hack.

> Most possibly I'm missing something in philosophy of multiprocessing,
> but I couldn't find anything covering such a situation. I'd appreciate
> any kind of hint on this topic, as it became a riddle I just have to
> solve. :-)

The multiprocessing module appears to offer map-based conveniences (Pool objects) where you indicate that you want the same callable executed multiple times and the results to be returned, so perhaps this is really what you want. In pprocess, there's a certain amount of flexibility exposed in the API, so that you can choose to use a map-like function, or you can open everything up and use the communications primitives directly (which would appear to be similar to the queue-oriented programming mentioned in the multiprocessing documentation).

One thing that pprocess exposes (and which must be there in some form in the multiprocessing module) is the notion of an "exchange" which is something that monitors a number of communications channels between processes and is able to detect and act upon readable channels in an efficient way.
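
To make the "exchange" idea concrete, here is a minimal sketch using only the standard library. It assumes Python 3.3 or later, where multiprocessing.connection.wait is available, and a POSIX system where the "fork" start method exists. The worker and gather functions and the squaring job are hypothetical stand-ins for the real computation; this is not how pprocess itself is implemented.

```python
import multiprocessing as mp
from multiprocessing.connection import wait


def worker(conn, n):
    """Hypothetical job: compute a value and send it back over the pipe."""
    try:
        conn.send((n, n * n))
    finally:
        conn.close()  # closing the write end lets the parent see end-of-file


def gather(inputs):
    """Start one process per input and collect results as channels become readable."""
    # Assumption: a POSIX system where the "fork" start method is available.
    ctx = mp.get_context("fork")
    readers, procs = [], []
    for n in inputs:
        recv_end, send_end = ctx.Pipe(duplex=False)
        proc = ctx.Process(target=worker, args=(send_end, n))
        proc.start()
        # Close the parent's copy of the write end, so that end-of-file is
        # reported once the child has closed its own copy.
        send_end.close()
        readers.append(recv_end)
        procs.append(proc)

    results = {}
    while readers:
        # wait() blocks until at least one connection has data or is at EOF.
        for conn in wait(readers):
            try:
                key, value = conn.recv()
            except EOFError:
                # The child has finished: stop monitoring this channel.
                readers.remove(conn)
                conn.close()
            else:
                results[key] = value

    for proc in procs:
        proc.join()
    return results


if __name__ == "__main__":
    print(gather([1, 2, 3]))
```

The parent only stops watching a channel after reading end-of-file from it, so no result is dropped because a pipe was torn down early, and the children are joined only after everything has been read.
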
If it's not the Pool class in multiprocessing that supports such things, I'd look for the component which does support them, if I were you, because this seems to be the functionality you need.

Paul

[1] http://www.boddie.org.uk/python/pprocess.html
