Nick Guenther <n...@kousu.ca> added the comment:
Thank you for taking the time to consider my points! Yes, I think you understood exactly what I was getting at. I slept on it, thought over what I'd posted, and came to most of the points you raise on my own, especially that a serialized next() would mean serialized processing. So the naive approach is out.

What motivates me is trying to introduce backpressure into my pipelines. The example you gave has potentially unbounded memory usage; if I simply slow the consumer down with sleep(), I get a memory leak and the main Python process pinning my CPU, even though it "isn't" doing anything:

    import itertools
    import multiprocessing
    import sys
    import time

    def worker(n):
        return n ** 3  # cubes, to match the output below

    with multiprocessing.Pool(4) as pool:
        for i, v in enumerate(pool.imap(worker, itertools.count(1)), 1):
            time.sleep(7)
            print(f"At {i}->{v}, memory usage is {sys.getallocatedblocks()}")

Output:

    At 1->1, memory usage is 230617
    At 2->8, memory usage is 411053
    At 3->27, memory usage is 581439
    At 4->64, memory usage is 748584
    At 5->125, memory usage is 909694
    At 6->216, memory usage is 1074304
    At 7->343, memory usage is 1238106
    At 8->512, memory usage is 1389162
    At 9->729, memory usage is 1537830
    At 10->1000, memory usage is 1648502
    At 11->1331, memory usage is 1759864
    At 12->1728, memory usage is 1909807
    At 13->2197, memory usage is 2005700
    At 14->2744, memory usage is 2067407
    At 15->3375, memory usage is 2156479
    At 16->4096, memory usage is 2240936
    At 17->4913, memory usage is 2328123
    At 18->5832, memory usage is 2456865
    At 19->6859, memory usage is 2614602
    At 20->8000, memory usage is 2803736
    At 21->9261, memory usage is 2999129

And top shows:

      PID USER      PR  NI    VIRT    RES   SHR S  %CPU %MEM    TIME+ COMMAND
    11314 kousu     20   0  303308  40996  6340 S  91.4  2.1  0:21.68 python3.8
    11317 kousu     20   0   54208  10264  4352 S  16.2  0.5  0:03.76 python3.8
    11315 kousu     20   0   54208  10260  4352 S  15.8  0.5  0:03.74 python3.8
    11316 kousu     20   0   54208  10260  4352 S  15.8  0.5  0:03.74 python3.8
    11318 kousu     20   0   54208  10264  4352 S  15.5  0.5  0:03.72 python3.8

It seems to me that any use of Pool.imap() either has the same issue lurking or is run on a small, finite data set
where you are better off using Pool.map(). I like generators because they keep the processing constant-memory, which seems like a natural fit for stream-processing situations. Unix pipelines have backpressure built in, because write() blocks when the pipe buffer is full.

I did come up with one possibility after sleeping on it again: run the final iteration in parallel, perhaps via a special plist() method that makes as many parallel next() calls as possible. There are definitely details to work out, but I plan to prototype it when I have spare time in the next couple of weeks. You're entirely right that it's a risky change to suggest, so maybe it would be best expressed as a package if I get it working. Can I keep this issue open to see if it draws in insights from anyone else in the meantime?

Thanks again for taking a look! That's really cool of you!

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue40110>
_______________________________________
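P.S. The pipe-style backpressure described above can be approximated with today's Pool.imap by bounding the number of in-flight tasks: a semaphore that the feed acquires once per item and the consumer releases once per result, so the producer blocks (like write() on a full pipe) when the consumer lags. This is only a sketch of the idea, not an existing API — bounded_imap and the limit value are made-up names:

```python
# Sketch: bound Pool.imap's in-flight work with a semaphore so the
# producer blocks when the consumer falls behind. bounded_imap and the
# limit value are illustrative, not part of multiprocessing's API.
import multiprocessing
import threading


def worker(n):
    return n ** 3


def bounded_imap(pool, func, iterable, limit):
    sem = threading.Semaphore(limit)

    def feeder():
        for item in iterable:
            sem.acquire()      # blocks imap's task-feeding thread once
            yield item         # `limit` items are outstanding

    for result in pool.imap(func, feeder()):
        yield result
        sem.release()          # each consumed result admits one more task


if __name__ == "__main__":
    with multiprocessing.Pool(4) as pool:
        for i, v in enumerate(bounded_imap(pool, worker, range(1, 21), 8), 1):
            print(f"At {i}->{v}")
```

With itertools.count(1) in place of range(), at most `limit` tasks and buffered results are outstanding at any moment, so memory stays bounded; on the other hand, breaking out of the loop early would leave the feeder thread blocked on the semaphore — exactly the kind of detail that would need working out in a real plist().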