On Mon, 25 Sep 2017 17:42:02 -0700
Nathaniel Smith <[email protected]> wrote:
> On Sat, Sep 23, 2017 at 2:45 AM, Antoine Pitrou <[email protected]> wrote:
> >> As to "running_interpreters()" and "idle_interpreters()", I'm not sure
> >> what the benefit would be. You can compose either list manually with
> >> a simple comprehension:
> >>
> >> [interp for interp in interpreters.list_all() if interp.is_running()]
> >> [interp for interp in interpreters.list_all() if not interp.is_running()]
> >
> > There is an inherent race condition in doing that, at least if
> > interpreters are running in multiple threads (which I assume is going
> > to be the overwhelmingly dominant usage model). That is why I'm
> > proposing all three variants.
>
> There's a race condition no matter what the API looks like -- having a
> dedicated running_interpreters() lets you guarantee that the returned
> list describes the set of interpreters that were running at some
> moment in time, but you don't know when that moment was and by the
> time you get the list, it's already out-of-date.
Hmm, you're right of course.
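To make that concrete, here is a minimal sketch (the interpreters module
below is the PEP's proposed API, not something that exists today):

    import interpreters  # hypothetical: the module proposed by the PEP

    # Whether built by comprehension or by a dedicated helper:
    running = [interp for interp in interpreters.list_all()
               if interp.is_running()]
    # Between building the list and using it, any interpreter may have
    # started or finished in another thread, so the snapshot is already
    # stale either way.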
> >> Likewise,
> >> queue.Queue.put() supports blocking, in addition to providing a
> >> put_nowait() method.
> >
> > queue.Queue.put() never blocks in the usual case (*), which is that
> > of an unbounded queue. Only bounded queues (created with an explicit
> > non-zero maxsize parameter) can block in Queue.put().
> >
> > (*) and therefore also never deadlocks :-)
>
> Unbounded queues also introduce unbounded latency and memory usage in
> realistic situations.
This doesn't seem to pose much of a problem in common use cases, though.
How many Python programs have you seen switch from an unbounded to a
bounded Queue to solve this problem?
Conversely, choosing a buffer size is tricky. How do you know up front
what size you need? Is a fixed buffer size even OK, or do you want it
to fluctuate based on the current conditions?
And regardless, my point was that a buffer is desirable. The fact that
send() may block when the buffer is full doesn't change the fact that it
won't block in the common case.
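Concretely, with the stdlib queue.Queue (the semantics I have in mind):

    import queue

    unbounded = queue.Queue()           # maxsize=0: put() never blocks
    unbounded.put("event")              # always returns immediately

    bounded = queue.Queue(maxsize=2)    # put() blocks once 2 items queued
    bounded.put("a")
    bounded.put("b")
    try:
        bounded.put_nowait("c")         # buffer full: raises queue.Full
    except queue.Full:
        pass                            # a non-blocking caller handles this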
> There's a reason why sockets
> always have bounded buffers -- it's sometimes painful, but the pain is
> intrinsic to building distributed systems, and unbounded buffers just
> paper over it.
Papering over a problem is sometimes the right answer actually :-) For
example, most Python programs assume memory is unbounded...
If I'm using a queue or channel to push events to a logging system,
should I really block at every send() call? Most probably I'd rather
run ahead instead.
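For instance, a sketch of what I mean: an unbounded queue drained by a
consumer thread, so the producer never blocks on the hot path:

    import queue, threading

    log_events = queue.Queue()          # unbounded: the producer runs ahead

    def emit(event):
        log_events.put(event)           # never blocks the caller

    def drain():
        while True:
            event = log_events.get()    # the slow I/O happens here,
            print(event)                # off the producer's back

    threading.Thread(target=drain, daemon=True).start()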
> > Also, suddenly an interpreter's ability to exploit CPU time is
> > dependent on another interpreter's ability to consume data in a timely
> > manner (what if the other interpreter is e.g. stuck on some disk I/O?).
> > IMHO it would be better not to have such coupling.
>
> A small buffer probably is useful in some cases, yeah -- basically
> enough to smooth out scheduler jitter.
That's not about scheduler jitter, but about catering for activities
which occur at inherently different speeds or rhythms. Requiring things
to run in lockstep removes a lot of flexibility and makes it harder to
exploit CPU resources fully.
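As a sketch: a bursty producer and a steady consumer can have the same
average rate, yet without a buffer the producer stalls on every burst:

    import queue, threading, time

    buf = queue.Queue(maxsize=100)      # a modest buffer absorbs the bursts

    def producer():
        while True:
            for _ in range(50):         # bursty: 50 events at once...
                buf.put(time.time())
            time.sleep(1.0)             # ...then a pause

    def consumer():
        while True:
            buf.get()                   # steady: ~50 events per second
            time.sleep(0.02)            # e.g. some I/O per event

    threading.Thread(target=producer, daemon=True).start()
    threading.Thread(target=consumer, daemon=True).start()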
> > I expect more often than expected, in complex systems :-) For example,
> > you could have a recv() loop that also from time to time send()s some
> > data on another queue, depending on what is received. But if that
> > send()'s recipient also has the same structure (a recv() loop which
> > send()s from time to time), then it's easy to imagine the two getting
> > into a deadlock.
>
> You kind of want to be able to create deadlocks, since the alternative
> is processes that can't coordinate and end up stuck in livelocks or
> with unbounded memory use etc.
I am not advocating we make it *impossible* to create deadlocks; I am
just saying we should not make them more *likely* than they need to be.
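Here is roughly the shape I described above, sketched with small bounded
queues standing in for channels; each worker forwards two messages for
each one received, so the queues inevitably fill up:

    import queue, threading

    inbox_a = queue.Queue(maxsize=1)
    inbox_b = queue.Queue(maxsize=1)

    def worker(inbox, peer):
        while True:
            msg = inbox.get()
            peer.put(msg)    # two sends per receive: the queues fill up;
            peer.put(msg)    # once both workers are stuck in put(),
                             # neither reaches get() again

    inbox_a.put("seed")
    inbox_b.put("seed")
    threading.Thread(target=worker, args=(inbox_a, inbox_b)).start()
    threading.Thread(target=worker, args=(inbox_b, inbox_a)).start()
    # The program soon hangs: a deadlock that unbounded queues avoid.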
> >> I'm not sure I understand your concern here. Perhaps I used the word
> >> "sharing" too ambiguously? By "sharing" I mean that the two actors
> >> have read access to something that at least one of them can modify.
> >> If they both only have read-only access then it's effectively the same
> >> as if they are not sharing.
> >
> > Right. What I mean is that you *can* share very simple "data" in the
> > form of synchronization primitives. You may want to synchronize your
> > interpreters even if they don't share user-visible memory areas. The
> > point of synchronization is not only to avoid memory corruption but
> > also to regulate and orchestrate processing amongst multiple workers
> > (for example processes or interpreters). For example, a semaphore is
> > an easy way to implement "I want no more than N workers to do this
> > thing at the same time" ("this thing" can be something such as disk
> > I/O).
>
> It's fairly reasonable to implement a mutex using a CSP-style
> unbuffered channel (send = acquire, receive = release). And the same
> trick turns a channel with a fixed-size buffer into a bounded
> semaphore. It won't be as efficient as a modern specialized mutex
> implementation, of course, but it's workable.
We are drifting away from the point I was trying to make here. I was
pointing out that the claim that nothing can be shared is a lie.
If it's possible to share a small datum (a synchronized counter aka
semaphore) between processes, certainly there's no technical reason
that should prevent it between interpreters.
By the way, I do think efficiency is a concern here. Otherwise
subinterpreters don't even have a point (just use multiprocessing).
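For concreteness, here is the bounded-semaphore trick you mention,
sketched with a stdlib queue.Queue standing in for a fixed-size channel:

    import queue

    class QueueSemaphore:
        """At most `n` workers hold a slot at any time."""

        def __init__(self, n):
            self._slots = queue.Queue(maxsize=n)

        def acquire(self):
            self._slots.put(None)       # blocks once n slots are taken

        def release(self):
            self._slots.get_nowait()    # frees a slot for a waiter

    sem = QueueSemaphore(4)             # e.g. limit concurrent disk I/O
    sem.acquire()
    try:
        pass                            # ... do the I/O ...
    finally:
        sem.release()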
> Unfortunately while technically you can construct a buffered channel
> out of an unbuffered channel, the construction's pretty unreasonable
> (it needs two dedicated threads per channel).
And the reverse is quite cumbersome as well. So we should favour the
construct that's more convenient for users, or provide both.
Regards
Antoine.