On Tue, 11 Dec 2018 at 17:50, Pablo Galindo Salgado <pablog...@gmail.com> wrote: > I agree that misusage of the pool should not be encouraged but in this > situation the fact that > this code hangs: > > import multiprocessing > > for x in multiprocessing.Pool().imap(int, ["4", "3"]): > print(x) > > > is a bit worriying because although is incorrect and an abuse of the API, > users can do this easily with > no error message other than a misterious hang.
OK, so the first problem here (to me, at least) is that it's not obvious *why* this code is incorrect and an abuse of the API. It takes a reasonable amount of thinking about the problem to notice that the Pool object isn't retained, but the iterator returned from imap is. And when the pool is collected, the worker processes are terminated, causing the hang, as the worker never sends a result back to the main process. But it's not obvious to me why the pool is collected before the imap method has completed. As I understand it, originally the code worked because the pool *didn't* call terminate() when collected. Now it does, and we have a problem. I'm not *entirely* sure why, if the pool is terminated, the wait in the iterator doesn't terminate immediately with some sort of "process being waited on died" error, but let's assume there are good reasons for that (as I mentioned before, I'm not an expert in multiprocessing, so I'm OK with assuming that the original design, done by people who *are* experts, is sound :-)) Your original solution would have added a strong reference back to the pool from the iterator. At first glance, that seems like a reasonable solution to me. Victor is worried about the "risk of new reference cycles". But reference cycles are not a problem - we have the cyclic GC precisely to deal with them. So I'd like to see a better justification for rejecting that solution than "there might be reference cycles". But in response to that, you made the iterator have a weak reference back to the pool. That's flawed because it doesn't prevent the pool being terminated - as you say, the deadlock is still present. > I have found this on several places and people were > very confused because usually the interpreter throws some kind of error > indication. In my humble opinion, > we should try to avoid hanging as a consequence of the misusage, whatever we > do. I agree with this. But that implies to me that we should be holding a strong reference to the pool, As a (somewhat weak) analogy, consider for n in map(int, ["1", "2"]): print(n) That won't fail if the list gets collected, because map keeps a reference to the list. My intuition would be that the Pool().imap example would hold a reference to the pool on essentially the same basis. The more I think about this, the more I struggle to see Victor's logic for rejecting your original solution. And I *certainly* don't see why this issue should justify changing the whole API to require users to explicitly manage pool lifetimes. Paul _______________________________________________ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com