On Tue, Apr 21, 2020 at 10:42 AM Mark Shannon <m...@hotpy.org> wrote:
> I'm generally in favour of PEP 554, but I don't think it is ready to be
> accepted in its current form.

Yay(ish)! :)

> My main objection is that without per-subinterpreter GILs (SILs?) PEP 554
> provides no value over threading or multi-processing.
> Multi-processing provides true parallelism and threads provide shared
> memory concurrency.

I disagree. :)  I believe there are merits to the kind of programming
one can do via subinterpreter + channels (i.e. threads with opt-in
sharing).  I would also like to get broader community exposure to the
subinterpreter functionality sooner rather than later.  Getting the
Python API out there now will help folks get ready sooner for the
(later?) switch to per-interpreter GIL.  As Antoine put it, it allows
folks to start experimenting.  I think there is enough value in all
that to warrant landing PEP 554 in 3.9 even if per-interpreter GIL
only happens in 3.10.
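
As a rough sketch of the sort of programming I mean (the module and
helpers here follow the PEP's proposal; exactly how channel ends get
bound into the subinterpreter, e.g. the "channels" argument below, is
an assumption on my part and may end up spelled differently):

    import textwrap
    import threading
    import interpreters

    interp = interpreters.create()
    recv, send = interpreters.create_channel()

    # run() blocks the calling thread, so drive the subinterpreter
    # from a helper thread and feed it through the channel.
    # NOTE: binding the channel end via a "channels" argument is the
    # PEP's proposed spelling and may change.
    code = textwrap.dedent("""
        obj = reader.recv()
        print('got', obj)
        """)
    t = threading.Thread(target=interp.run, args=(code,),
                         kwargs=dict(channels=dict(reader=recv)))
    t.start()

    # Nothing is shared implicitly; the subinterpreter only sees
    # what we explicitly send over the channel.
    send.send(b'spam')
    t.join()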

> If per-subinterpreter GILs are possible then, and only then,
> sub-interpreters will provide true parallelism and (limited) shared
> memory concurrency.
>
> The problem is that we don't know whether we can implement
> per-subinterpreter GILs without too large a negative performance impact.
> I think we can, but we can't say so for certain.

I think we can as well, but I'd like to hear more about what obstacles
you think we might run into.

> So, IMO, we should not accept PEP 554 until we know for sure that
> per-subinterpreter GILs can be implemented efficiently.
>
>
>
> Detailed critique
> -----------------
>
> I don't see how `list_all()` can be both safe and accurate. The Java
> equivalent makes no guarantees of accuracy.
> Attempting to lock the list is likely to lead to deadlock and not
> locking it will lead to races; potentially dangerous ones.
> I think it would be best to drop this.
>
> `list_all_channels()`. See `list_all()` above.
>
> [out of order] `Channel.interpreters` see `list_all()` and 
> `list_all_channels()` above.

I'm not sure I understand your objection.  If a user calls the
function then they get a list.  If that list becomes outdated in the
next minute or the next millisecond, it does not impact the utility of
having that list.  For example, without that list how would one make
sure all other interpreters have been destroyed?
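
For example, at shutdown one might do something like this (a sketch
using list_all() and get_current() from the PEP; the snapshot can be
stale a moment later, but for this kind of check that is fine):

    import time
    import interpreters

    main = interpreters.get_current()
    # Poll until every other interpreter has been destroyed.
    while any(interp.id != main.id for interp in interpreters.list_all()):
        time.sleep(0.1)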

> `.destroy()` is either misleading or unsafe.
> What does this do?
>
>  >>> is.destroy()
>  >>> is.run()
>
> If `run()` raises an exception then the interpreter must exist. Rename
> to `close()` perhaps?

I see what you mean.  "Interpreter" objects are wrappers rather than
the actual interpreters, but that might not stop folks from thinking
otherwise.  I agree that "close" may communicate that nature better.
I guess so would "finalize", which is what the C-API calls it.  Then
again, you can't tell an object to "destroy" itself, can you?  It just
isn't clear what you are destroying (nor why we're so destructive
<wink>).

So "close" aligns with other similarly purposed methods out there,
while "finalize" aligns with the existing C-API and also elevates the
complex nature of what happens.  If we change the name from "destroy"
then I'd lean toward "finalize".

FWIW, in your example above, the is.run() call would raise a
RuntimeError saying that it couldn't find an interpreter with "that"
ID.
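
In other words, something like this (the exact message may differ):

    >>> interp = interpreters.create()
    >>> interp.destroy()
    >>> interp.run('print("spam")')
    Traceback (most recent call last):
      ...
    RuntimeError: interpreter ... does not exist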

> How does `is_shareable()` work? Are you proposing some mechanism to
> transfer an object from one sub-interpreter to another? How would that
> work?

The PEP purposefully does not prescribe how "is_shareable()" works.
That depends on the channel implementation, of which there could be
several and which will likely differ between Python
implementations.  Likewise the PEP does not dictate how channels work
(i.e. how objects are "shared").  That is an implementation detail.
We could talk about how we've implemented PEP 554, but that's not
highly relevant to the merits of this proposal (its API in
particular).
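
From the user's perspective it is just a predicate; how it answers is
up to the implementation.  A sketch (the pickle fallback here is just
one thing a user might choose to do, not something the PEP mandates):

    import pickle
    import interpreters

    recv, send = interpreters.create_channel()

    obj = b'spam'
    if interpreters.is_shareable(obj):
        # (assumes buffered channels, so send() does not block here)
        send.send(obj)
    else:
        # Fall back to an explicit serialization of the user's choosing.
        send.send(pickle.dumps(obj))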

> If objects are not shared but serialized, why not use marshal or
> pickle instead of inventing a third serialization protocol?

Again, that's an implementation detail.  The PEP does not specify that
objects are actually shared or not.  In fact, I was careful to say:

    This does not necessarily mean that the actual objects will be
    shared.  Instead, it means that the objects' underlying data will
    be shared in a cross-interpreter way, whether via a proxy, a
    copy, or some other means.

> It would be clearer if channels only dealt with simple, contiguous
> binary data. As it stands the PEP doesn't state what form the received
> object will take.

You're right.  The PEP is not clear enough about what object an
interpreter will receive for a given sent object.  The intent is that
it will be the same type with the same data.  This might not always be
possible, so there may be cases where we allow for a compatible proxy.
Either way, I'll clarify this point in the PEP.
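
So the intended contract is roughly this (sketch):

    import interpreters

    recv, send = interpreters.create_channel()

    orig = b'spam'
    # (relies on channels being buffered, so send() does not block here)
    send.send(orig)
    # (normally recv() would happen in a different interpreter)
    obj = recv.recv()
    assert type(obj) is bytes and obj == orig
    # Whether obj is the original object, a copy, or a proxy is an
    # implementation detail; the type and data are what is guaranteed.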

> Once channels supporting the transfer of bytes objects work, then it is
> simple to pass more complex objects using pickle or marshal.

That is certainly possible already with the current implementation of
the PEP.  However, I wanted to avoid blanket share-ability at first so
that users don't get the wrong idea.  The goal is to start at a
minimal(ish) point and then grow from there (after the PEP is accepted
and landed).
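
E.g. nothing stops a user from layering pickle on top of a bytes
channel themselves:

    import pickle
    import interpreters

    recv, send = interpreters.create_channel()

    # Sending side (with buffered channels this does not block):
    send.send(pickle.dumps({'spam': 42}))

    # Receiving side (typically running in another interpreter):
    obj = pickle.loads(recv.recv())
    assert obj == {'spam': 42}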

> Channels need a more detailed description of their lifespan. Ideally a
> state machine.
> For example:
> How does an interpreter detach from the receiving end of a channel that
> is never empty?
> What happens if an interpreter deletes the last reference to a non-empty
> channel? On the receiving end, or on the sending end?

I'll look into this.  Channel lifetime is definitely the most complex
part of the PEP.

> Isn't the lack of buffering in channels a recipe for deadlocks?

The PEP has been updated to indicate support for buffering in channels.

> What is the mechanism for reliably copying exceptions from one
> sub-interpreter to another in the `run()` method?

Currently I'm using pickling, in conjunction with some extra code to
preserve __traceback__, etc.  However, that is an implementation
detail.  I'm also a little nervous about the possibility of silent
imports happening in the calling interpreter due to pickle.
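
Roughly along these lines (a simplified sketch, not the actual code):

    import pickle
    import traceback

    def run_and_capture(source):
        """Run `source`; on failure return a picklable snapshot of the error."""
        try:
            exec(source, {})
        except Exception as exc:
            # Traceback objects don't pickle, so carry the rendered text along.
            tb_text = ''.join(traceback.format_exception(
                type(exc), exc, exc.__traceback__))
            return pickle.dumps((exc, tb_text))
        return None

    # Back in the calling interpreter:
    data = run_and_capture("1/0")
    if data is not None:
        exc, tb_text = pickle.loads(data)
        print(tb_text)  # or attach it to the re-raised exception
        raise exc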

> If `run()` can raise
> an exception, why not let it return values?

That's an interesting question.  On the one hand, the C-API's PyRun_*()
functions (which we're using) return an object.  On the other hand, what does a
script return?  What does a module return?  What does exec() return?
It kind of doesn't make sense.  If we were passing a function or
expression through then it would be a different story.  However, doing
so is one of the things that I've put on the PEP's "deferred" list, so
I'd rather we look into that after we've reached the proposed base
state.
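
In the meantime a channel already gives you a way to get a "result"
back out (a sketch; as above, the "channels" argument for binding
channel ends is the PEP's proposed spelling and may change):

    import textwrap
    import interpreters

    interp = interpreters.create()
    recv, send = interpreters.create_channel()

    # run() returns nothing, but the script can push its result
    # through a channel for the caller to pick up.
    # (relies on channel buffering so the send() inside does not block)
    interp.run(textwrap.dedent("""
        result = 6 * 7
        out.send(str(result).encode())
        """), channels=dict(out=send))

    print(int(recv.recv()))  # -> 42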

Thanks for the feedback!

-eric