On Mon, 25 Sep 2017 17:42:02 -0700
Nathaniel Smith <n...@pobox.com> wrote:

> On Sat, Sep 23, 2017 at 2:45 AM, Antoine Pitrou <solip...@pitrou.net> wrote:
> >> As to "running_interpreters()" and "idle_interpreters()", I'm not
> >> sure what the benefit would be.  You can compose either list
> >> manually with a simple comprehension:
> >>
> >>     [interp for interp in interpreters.list_all() if interp.is_running()]
> >>     [interp for interp in interpreters.list_all() if not interp.is_running()]
> >
> > There is an inherent race condition in doing that, at least if
> > interpreters are running in multiple threads (which I assume is
> > going to be the overwhelmingly dominant usage model).  That is why
> > I'm proposing all three variants.
>
> There's a race condition no matter what the API looks like -- having
> a dedicated running_interpreters() lets you guarantee that the
> returned list describes the set of interpreters that were running at
> some moment in time, but you don't know when that moment was, and by
> the time you get the list, it's already out-of-date.
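The staleness Nathaniel describes can be demonstrated without the (still only proposed) interpreters module, using threads as a stand-in: a snapshot taken with a comprehension over live workers is already out-of-date by the time you look at it. This is just an illustrative sketch, not the PEP's API.

```python
import threading

def snapshot_running(workers):
    # Analogous to: [i for i in interpreters.list_all() if i.is_running()]
    return [t for t in workers if t.is_alive()]

stop = threading.Event()
workers = [threading.Thread(target=stop.wait) for _ in range(3)]
for t in workers:
    t.start()

running = snapshot_running(workers)  # all three are alive at this moment

stop.set()
for t in workers:
    t.join()

# The snapshot is already stale: every thread it lists has now exited.
print(len(running), all(not t.is_alive() for t in running))  # 3 True
```

The same applies whether the snapshot comes from a comprehension or from a dedicated running_interpreters() call; only the guarantee of internal consistency differs.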
Hmm, you're right of course.

> >> Likewise, queue.Queue.send() supports blocking, in addition to
> >> providing a put_nowait() method.
> >
> > queue.Queue.put() never blocks in the usual case (*), which is that
> > of an unbounded queue.  Only bounded queues (created with an
> > explicit non-zero maxsize parameter) can block in Queue.put().
> >
> > (*) and therefore also never deadlocks :-)
>
> Unbounded queues also introduce unbounded latency and memory usage in
> realistic situations.

This doesn't seem to pose much of a problem in common use cases,
though.  How many Python programs have you seen switch from an
unbounded to a bounded Queue to solve this problem?

Conversely, choosing a buffer size is tricky.  How do you know up front
which amount you need?  Is a fixed buffer size even ok, or do you want
it to fluctuate based on the current conditions?

And regardless, my point was that a buffer is desirable.  That send()
may block when the buffer is full doesn't change the fact that it
won't block in the common case.

> There's a reason why sockets always have bounded buffers -- it's
> sometimes painful, but the pain is intrinsic to building distributed
> systems, and unbounded buffers just paper over it.

Papering over a problem is sometimes the right answer, actually :-)
For example, most Python programs assume memory is unbounded...

If I'm using a queue or channel to push events to a logging system,
should I really block at every send() call?  Most probably I'd rather
run ahead instead.

> > Also, suddenly an interpreter's ability to exploit CPU time is
> > dependent on another interpreter's ability to consume data in a
> > timely manner (what if the other interpreter is e.g. stuck on some
> > disk I/O?).  IMHO it would be better not to have such coupling.
>
> A small buffer probably is useful in some cases, yeah -- basically
> enough to smooth out scheduler jitter.

That's not about scheduler jitter, but about catering for activities
which occur at inherently different speeds or rhythms.
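The unbounded-vs-bounded distinction above is easy to check against the stdlib queue module: put() on a default (unbounded) Queue always succeeds immediately, while a bounded Queue's put() blocks once maxsize items are buffered.

```python
import queue

# Unbounded Queue (maxsize=0, the default): put() never blocks.
unbounded = queue.Queue()
for i in range(10000):
    unbounded.put(i)  # always succeeds immediately

# Bounded Queue: put() blocks once maxsize items are buffered.
bounded = queue.Queue(maxsize=1)
bounded.put("first")
try:
    bounded.put("second", block=False)  # would block, so fail fast instead
except queue.Full:
    print("bounded queue is full")
```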
Requiring things to run in lockstep removes a lot of flexibility and
makes it harder to exploit CPU resources fully.

> > I expect more often than expected, in complex systems :-)  For
> > example, you could have a recv() loop that also from time to time
> > send()s some data on another queue, depending on what is received.
> > But if that send()'s recipient also has the same structure (a
> > recv() loop which send()s from time to time), then it's easy to
> > imagine the two getting in a deadlock.
>
> You kind of want to be able to create deadlocks, since the
> alternative is processes that can't coordinate and end up stuck in
> livelocks or with unbounded memory use etc.

I am not advocating we make it *impossible* to create deadlocks; just
saying we should not make them more *likely* than they need to be.

> >> I'm not sure I understand your concern here.  Perhaps I used the
> >> word "sharing" too ambiguously?  By "sharing" I mean that the two
> >> actors have read access to something that at least one of them can
> >> modify.  If they both only have read-only access then it's
> >> effectively the same as if they are not sharing.
> >
> > Right.  What I mean is that you *can* share very simple "data"
> > under the form of synchronization primitives.  You may want to
> > synchronize your interpreters even if they don't share user-visible
> > memory areas.  The point of synchronization is not only to avoid
> > memory corruption but also to regulate and orchestrate processing
> > amongst multiple workers (for example processes or interpreters).
> > For example, a semaphore is an easy way to implement "I want no
> > more than N workers to do this thing at the same time" ("this
> > thing" can be something such as disk I/O).
>
> It's fairly reasonable to implement a mutex using a CSP-style
> unbuffered channel (send = acquire, receive = release).  And the same
> trick turns a channel with a fixed-size buffer into a bounded
> semaphore.
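The channel-as-semaphore trick quoted above can be sketched with a bounded stdlib Queue standing in for the fixed-size channel; the class name here is just illustrative, not an API from the PEP.

```python
import queue
import threading

class ChannelSemaphore:
    """Bounded semaphore built from a fixed-size buffered channel:
    acquire = send a token into the buffer, release = receive one back."""

    def __init__(self, n):
        self._ch = queue.Queue(maxsize=n)  # buffer size = permit count

    def acquire(self):
        self._ch.put(None)       # blocks while all n permits are taken

    def release(self):
        self._ch.get_nowait()    # frees one permit

# A mutex is simply the n == 1 case.
lock = ChannelSemaphore(1)
counter = 0

def work():
    global counter
    for _ in range(1000):
        lock.acquire()
        counter += 1             # critical section
        lock.release()

threads = [threading.Thread(target=work) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 4000
```

As Nathaniel notes, this is workable but slower than a purpose-built mutex, since each operation goes through the channel's own locking.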
> It won't be as efficient as a modern specialized mutex
> implementation, of course, but it's workable.

We are drifting away from the point I was trying to make here.  I was
pointing out that the claim that nothing can be shared is a lie.  If
it's possible to share a small datum (a synchronized counter, aka a
semaphore) between processes, certainly there's no technical reason
that should prevent it between interpreters.

By the way, I do think efficiency is a concern here.  Otherwise
subinterpreters don't even have a point (just use multiprocessing).

> Unfortunately, while technically you can construct a buffered channel
> out of an unbuffered channel, the construction's pretty unreasonable
> (it needs two dedicated threads per channel).

And the reverse is quite cumbersome as well.  So we should favour the
construct that's more convenient for users, or provide both.

Regards

Antoine.

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev