Re: [Python-Dev] PEP 554 v3 (new interpreters module)

2017-10-08 Thread Nick Coghlan
On 14 September 2017 at 11:44, Eric Snow  wrote:

> Examples
> ========
>
> Run isolated code
> -----------------
>
> ::
>
>    interp = interpreters.create()
>    print('before')
>    interp.run('print("during")')
>    print('after')
>

A few more suggestions for examples:

Running a module:

main_module = mod_name
interp.run(f"import runpy; runpy.run_module({main_module!r})")

Running as script (including zip archives & directories):

main_script = path_name
interp.run(f"import runpy; runpy.run_path({main_script!r})")

Running in a thread pool executor:

import concurrent.futures

interps = [interpreters.create() for i in range(5)]
with concurrent.futures.ThreadPoolExecutor(max_workers=len(interps)) as pool:
    print('before')
    for interp in interps:
        pool.submit(interp.run, 'print("starting"); print("stopping")')
    print('after')

That last one is prompted by the questions about the benefits of keeping
the notion of an interpreter state distinct from the notion of a main
thread (it allows a single "MainThread" object to be mapped to different OS
level threads at different points in time, which means it's easier to
combine with existing constructs for managing OS level thread pools).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] PEP 554 v3 (new interpreters module)

2017-10-08 Thread Nick Coghlan
On 7 October 2017 at 02:29, Koos Zevenhoven  wrote:

> While I'm actually trying not to say much here so that I can avoid this
> discussion now, here's just a couple of ideas and thoughts from me at this
> point:
>
> (A)
> Instead of sending bytes and receiving memoryviews, one could consider
> sending *and* receiving memoryviews for now. That could then be extended
> into more types of objects in the future without changing the basic concept
> of the channel. Probably, the memoryview would need to be copied (but not
> the data of course). But I'm guessing copying a memoryview would be quite
> fast.
>

The proposal is to allow sending any buffer-exporting object, so sending a
memoryview would be supported.
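
For illustration, here's a rough sketch of what that could look like with
the draft API (create_channel() returning a (recv, send) pair is my
assumption from the current proposal, not settled API):

    import interpreters  # the draft PEP's module; not in the stdlib yet

    recv, send = interpreters.create_channel()

    # any PEP 3118 buffer exporter would be accepted:
    send.send(b'data')                # bytes
    send.send(bytearray(b'data'))     # bytearray
    send.send(memoryview(b'data'))    # a memoryview itself

    view = recv.recv()                # a cross-interpreter memoryview
    assert bytes(view) == b'data'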


> This would hopefully require fewer API changes or additions in the future.
> OTOH, giving it a different name like MemChannel or making it 3rd party
> will buy some more time to figure out the right API. But maybe that's not
> needed.
>

I think having both a memory-centric data channel and an object-centric
data channel would be useful long term, so I don't see a lot of downsides
to starting with the easier-to-implement MemChannel, and then looking at
how to define a plain Channel later.

For example, it occurs to me that the closest current equivalent we have
to an object-level counterpart to the memory buffer protocol would be the
weak reference protocol, wherein a multi-interpreter-aware proxy object
could actually take care of switching interpreters as needed when
manipulating reference counts.

While weakrefs themselves wouldn't be usable in the general case (many
builtin types don't support weak references, and we'd want to support
strong cross-interpreter references anyway), a wrapt-style object proxy
would provide us with a way to maintain a single strong reference to the
original object in its originating interpreter (implicitly switching to
that interpreter as needed), while also maintaining a regular local
reference count on the proxy object in the receiving interpreter.
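
As a purely conceptual sketch of that idea (the _running_in() interpreter
switch is a hypothetical placeholder; no such public API exists today):

    import contextlib

    @contextlib.contextmanager
    def _running_in(interp):
        # Placeholder: a real version would switch the current thread to
        # the owning interpreter's state before touching refcounts.
        yield

    class CrossInterpreterProxy:
        def __init__(self, target, owner):
            self._target = target   # strong ref, owned by the sending interp
            self._owner = owner     # the interpreter that owns the target

        def __getattr__(self, name):
            with _running_in(self._owner):
                return getattr(self._target, name)

        def __del__(self):
            with _running_in(self._owner):
                self._target = None  # drop our ref under the owner's control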

And here's the neat thing: since subinterpreters share an address space, it
would be possible to experiment with an object-proxy based channel by
passing object pointers over a memoryview based channel.


> (B)
> We would probably then like to pretend that the object coming out the
> other end of a Channel *is* the original object. As long as these channels
> are the only way to directly pass objects between interpreters, there are
> essentially only two ways to tell the difference (AFAICT):
>
> 1. Calling id(...) and sending it over to the other interpreter and
> checking if it's the same.
>
> 2. When the same object is sent twice to the same interpreter. Then one
> can compare the two with id(...) or using the `is` operator.
>
> There are solutions to the problems too:
>
> 1. Send the id() from the sending interpreter along with the sent object
> so that the receiving interpreter can somehow attach it to the object and
> then return it from id(...).
>
> 2. When an object is received, make a lookup in an interpreter-wide cache
> to see if an object by this id has already been received. If yes, take that
> one.
>
> Now it should essentially look like the received object is really "the
> same one" as in the sending interpreter. This should also work with
> multiple interpreters and multiple channels, as long as the id is always
> preserved.
>

I don't personally think we want to expend much (if any) effort on
presenting the illusion that the objects on either end of the channel are
the "same" object, but postponing the question entirely is also one of the
benefits I see to starting with MemChannel, and leaving the object-centric
Channel until later.


> (C)
> One further complication regarding memoryview in general is that
> .release() should probably be propagated to the sending interpreter somehow.
>

Yep, switching interpreters when releasing the buffer is the main reason
you couldn't use a regular memoryview for this purpose - you need a variant
that holds a strong reference to the sending interpreter, and switches back
to it for the buffer release operation.
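
In rough sketch form (all names hypothetical, and a real version would
live at the C level rather than wrapping memoryview from Python):

    class CrossInterpreterView:
        """A memoryview-style wrapper that releases in the sender."""

        def __init__(self, view, sender_interp):
            self._view = view              # ordinary memoryview of the buffer
            self._sender = sender_interp   # strong ref keeps the sender alive

        def __getattr__(self, name):
            return getattr(self._view, name)

        def release(self):
            # A real implementation would switch to the sending
            # interpreter here before releasing the exported buffer.
            self._view.release()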


> (D)
> I think someone already mentioned this one, but would it not be better to
> start a new interpreter in the background in a new thread by default? I
> think this would make things simpler and leave more freedom regarding the
> implementation in the future. If you need to run an interpreter within the
> current thread, you could perhaps optionally do that too.
>

Not really, as that approach doesn't compose as well with existing thread
management primitives like concurrent.futures.ThreadPoolExecutor. It also
doesn't match the way the existing subinterpreter machinery works, where
threads can change their active interpreter.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia

Re: [Python-Dev] PEP 554 v3 (new interpreters module)

2017-10-06 Thread Koos Zevenhoven
While I'm actually trying not to say much here so that I can avoid this
discussion now, here's just a couple of ideas and thoughts from me at this
point:

(A)
Instead of sending bytes and receiving memoryviews, one could consider
sending *and* receiving memoryviews for now. That could then be extended
into more types of objects in the future without changing the basic concept
of the channel. Probably, the memoryview would need to be copied (but not
the data of course). But I'm guessing copying a memoryview would be quite
fast.

This would hopefully require fewer API changes or additions in the future.
OTOH, giving it a different name like MemChannel or making it 3rd party
will buy some more time to figure out the right API. But maybe that's not
needed.

(B)
We would probably then like to pretend that the object coming out the other
end of a Channel *is* the original object. As long as these channels are
the only way to directly pass objects between interpreters, there are
essentially only two ways to tell the difference (AFAICT):

1. Calling id(...) and sending it over to the other interpreter and
checking if it's the same.

2. When the same object is sent twice to the same interpreter. Then one can
compare the two with id(...) or using the `is` operator.

There are solutions to the problems too:

1. Send the id() from the sending interpreter along with the sent object so
that the receiving interpreter can somehow attach it to the object and then
return it from id(...).

2. When an object is received, make a lookup in an interpreter-wide cache
to see if an object by this id has already been received. If yes, take that
one.

Now it should essentially look like the received object is really "the same
one" as in the sending interpreter. This should also work with multiple
interpreters and multiple channels, as long as the id is always preserved.
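
A sketch of what the receive-side cache in (2) could look like (recv()
returning an (object, sender_id) pair is just an assumption for
illustration):

    _received = {}  # interpreter-wide cache: sender-side id -> local object

    def recv_preserving_identity(channel):
        obj, sender_id = channel.recv()  # assumes the id travels alongside
        try:
            return _received[sender_id]  # seen before: return the same object
        except KeyError:
            _received[sender_id] = obj
            return obj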

(C)
One further complication regarding memoryview in general is that .release()
should probably be propagated to the sending interpreter somehow.

(D)
I think someone already mentioned this one, but would it not be better to
start a new interpreter in the background in a new thread by default? I
think this would make things simpler and leave more freedom regarding the
implementation in the future. If you need to run an interpreter within the
current thread, you could perhaps optionally do that too.


––Koos


PS. I have lots of thoughts related to this, but I can't afford to engage
in them now. (Anyway, it's probably more urgent to get some stuff with PEP
555 and its spin-off thoughts out of the way).



On Fri, Oct 6, 2017 at 6:38 AM, Nick Coghlan  wrote:

> On 6 October 2017 at 11:48, Eric Snow  wrote:
>
>> > And that's the real pay-off that comes from defining this in terms of the
>> > memoryview protocol: Py_buffer structs *aren't* Python objects, so it's only
>> > a regular C struct that gets passed across the interpreter boundary (the
>> > reference to the original objects gets carried along passively as part of
>> > the CIV - it never gets *used* in the receiving interpreter).
>>
>> Yeah, the (PEP 3118) buffer protocol offers precedent in a number of
>> ways that are applicable to channels here.  I'm simply reticent to
>> lock PEP 554 into such a specific solution as the buffer-specific CIV.
>> I'm trying to accommodate anticipated future needs while keeping the
>> PEP as simple and basic as possible.  It's driving me nuts! :P  Things
>> were *much* simpler before I added Channels to the PEP. :)
>>
>
> Starting with memory-sharing only doesn't lock us into anything, since you
> can still add a more flexible kind of channel based on a different protocol
> later if it turns out that memory sharing isn't enough.
>
> By contrast, if you make the initial channel semantics incompatible with
> multiprocessing by design, you *will* prevent anyone from experimenting
> with replicating the shared memory based channel API for communicating
> between processes :)
>
> That said, if you'd prefer to keep the "Channel" name available for the
> possible introduction of object channels at a later date, you could call
> the initial memoryview based channel a "MemChannel".
>
>
>> > I don't think we should be touching the behaviour of core builtins solely to
>> > enable message passing to subinterpreters without a shared GIL.
>>
>> Keep in mind that I included the above as a possible solution using
>> tp_share() that would work *after* we stop sharing the GIL.  My point
>> is that with tp_share() we have a solution that works now *and* will
>> work later.  I don't care how we use tp_share to do so. :)  I long to
>> be able to say in the PEP that you can pass bytes through the channel
>> and get bytes on the other side.
>>
>
> Memory views are a builtin type as well, and they emphasise the practical
> benefit we're trying to get relative to typical multiprocessing
> arrangements: zero-copy data sharing.
>
> So here's my proposed experimentation-enabling development strategy: […]

Re: [Python-Dev] PEP 554 v3 (new interpreters module)

2017-10-05 Thread Nick Coghlan
On 6 October 2017 at 11:48, Eric Snow  wrote:

> > And that's the real pay-off that comes from defining this in terms of the
> > memoryview protocol: Py_buffer structs *aren't* Python objects, so it's only
> > a regular C struct that gets passed across the interpreter boundary (the
> > reference to the original objects gets carried along passively as part of
> > the CIV - it never gets *used* in the receiving interpreter).
>
> Yeah, the (PEP 3118) buffer protocol offers precedent in a number of
> ways that are applicable to channels here.  I'm simply reticent to
> lock PEP 554 into such a specific solution as the buffer-specific CIV.
> I'm trying to accommodate anticipated future needs while keeping the
> PEP as simple and basic as possible.  It's driving me nuts! :P  Things
> were *much* simpler before I added Channels to the PEP. :)
>

Starting with memory-sharing only doesn't lock us into anything, since you
can still add a more flexible kind of channel based on a different protocol
later if it turns out that memory sharing isn't enough.

By contrast, if you make the initial channel semantics incompatible with
multiprocessing by design, you *will* prevent anyone from experimenting
with replicating the shared memory based channel API for communicating
between processes :)

That said, if you'd prefer to keep the "Channel" name available for the
possible introduction of object channels at a later date, you could call
the initial memoryview based channel a "MemChannel".


> > I don't think we should be touching the behaviour of core builtins solely to
> > enable message passing to subinterpreters without a shared GIL.
>
> Keep in mind that I included the above as a possible solution using
> tp_share() that would work *after* we stop sharing the GIL.  My point
> is that with tp_share() we have a solution that works now *and* will
> work later.  I don't care how we use tp_share to do so. :)  I long to
> be able to say in the PEP that you can pass bytes through the channel
> and get bytes on the other side.
>

Memory views are a builtin type as well, and they emphasise the practical
benefit we're trying to get relative to typical multiprocessing
arrangements: zero-copy data sharing.

So here's my proposed experimentation-enabling development strategy:

1. Start out with a MemChannel API that accepts any buffer-exporting
object as input, and outputs only a cross-interpreter memoryview subclass.
2. Use that as the basis for the work to get to a per-interpreter locking
arrangement that allows subinterpreters to fully exploit multiple CPUs.
3. Only then try to design a Channel API that allows for sharing builtin
immutable objects between interpreters (bytes, strings, numbers), at a time
when you can be certain you won't be inadvertently making it harder to make
the GIL a truly per-interpreter lock, rather than the current process
global runtime lock.
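
To make step 1 concrete, usage could look something like this (the
constructor and method names here are assumptions, not settled API):

    import interpreters  # the draft PEP's module; not in the stdlib yet

    ch = interpreters.create_mem_channel()  # hypothetical constructor

    ch.send(b'payload')      # accepts any PEP 3118 buffer exporter

    view = ch.recv()         # a cross-interpreter memoryview variant
    try:
        data = bytes(view)   # sharing is zero-copy until this explicit copy
    finally:
        view.release()       # switches back to the sending interpreter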

The key benefit of this approach is that we *know* MemChannel can work: the
buffer protocol already operates at the level of C structs and pointers,
not Python objects, and there are already plenty of interesting
buffer-protocol-supporting objects around, so as long as the CIV switches
interpreters at the right time, there aren't any fundamentally new runtime
level capabilities needed to implement it.

The lower level MemChannel API could then also be replicated for
multiprocessing, while the higher level more speculative object-based
Channel API would be specific to subinterpreters (and probably only ever
designed and implemented if you first succeed in making subinterpreters
sufficiently independent that they don't rely on a process-wide GIL any
more).

So I'm not saying "Never design an object-sharing protocol specifically for
use with subinterpreters". I'm saying "You don't have a demonstrated need
for that yet, so don't try to define it until you do".



> My mind is drawn to the comparison between that and the question of
> CIV vs. tp_share().  CIV would be more like the post-451 import world,
> where I expect the CIV would take care of the data sharing operations.
> That said, the situation in PEP 554 is sufficiently different that I'm
> not convinced a generic CIV protocol would be better.  I'm not sure
> how much CIV could do for you over helpers+tp_share.
>
> Anyway, here are the leading approaches that I'm looking at now:
>
> * adding a tp_share slot
>   + you send() the object directly and recv() the object coming out of
>     tp_share() (which will probably be the same type as the original)
>   + this would eventually require small changes in tp_free for
>     participating types
>   + we would likely provide helpers (eventually), similar to the new
>     buffer protocol, to make it easier to manage sharing data
>

I'm skeptical about this approach because you'll be designing in a vacuum
against future possible constraints that you can't test yet: the inherent
complexity in the object sharing protocol will come from *not* having a
process-wide GIL, […]

Re: [Python-Dev] PEP 554 v3 (new interpreters module)

2017-10-05 Thread Eric Snow
On Thu, Oct 5, 2017 at 4:57 AM, Nick Coghlan  wrote:
> This would be hard to get to work reliably, because "orig.tp_share()" would
> be running in the receiving interpreter, but all the attributes of "orig"
> would have been allocated by the sending interpreter. It gets more reliable
> if it's *Channel.send* that calls tp_share() though, but moving the call to
> the sending side makes it clear that a tp_share protocol would still need to
> rely on a more primitive set of "shareable objects" that were the permitted
> return values from the tp_share call.

The point of running tp_share() in the receiving interpreter is to
force allocation under that interpreter, so that GC applies there.  I
agree that you basically can't do anything in tp_share() that would
affect the sending interpreter, including INCREF and DECREF.  Since we
INCREFed in send(), we know that we have a safe reference, so we
don't have to worry about that part in tp_share().  We would only be
able to do low-level things (like the buffer protocol) that don't
interact with the original object's interpreter.

Given that this is a quite low-level tp slot and low-level
functionality, I'd expect that a sufficiently clear entry (i.e.
warning) in the docs would be enough for the few that dare.

From my perspective adding the tp_share slot allows for much more
experimentation with object sharing (right now, long before we get to
considering how to stop sharing the GIL) by us *and* third parties.
None of the alternatives seem to offer the same opportunity while
still working out *after* we stop sharing the GIL.

>
> And that's the real pay-off that comes from defining this in terms of the
> memoryview protocol: Py_buffer structs *aren't* Python objects, so it's only
> a regular C struct that gets passed across the interpreter boundary (the
> reference to the original objects gets carried along passively as part of
> the CIV - it never gets *used* in the receiving interpreter).

Yeah, the (PEP 3118) buffer protocol offers precedent in a number of
ways that are applicable to channels here.  I'm simply reticent to
lock PEP 554 into such a specific solution as the buffer-specific CIV.
I'm trying to accommodate anticipated future needs while keeping the
PEP as simple and basic as possible.  It's driving me nuts! :P  Things
were *much* simpler before I added Channels to the PEP. :)

>
>>
>> bytes.tp_share():
>>     obj = blank_bytes(len(self))
>>     obj.ob_sval = self.ob_sval  # hand-wavy memory sharing
>>     return obj
>
>
> This is effectively reinventing memoryview, while trying to pretend it's an
> ordinary bytes object. Don't reinvent memoryview :)
>
>>
>> bytes.tp_free():  # under no-shared-GIL:
>>     # most of this could be pulled into a macro for re-use
>>     orig = lookup_shared(self)
>>     if orig != NULL:
>>         current = release_LIL()
>>         interp = lookup_owner(orig)
>>         acquire_LIL(interp)
>>         decref(orig)
>>         release_LIL(interp)
>>         acquire_LIL(current)
>>         # clear shared/owner tables
>>     # clear/release self.ob_sval
>>     free(self)
>
>
> I don't think we should be touching the behaviour of core builtins solely to
> enable message passing to subinterpreters without a shared GIL.

Keep in mind that I included the above as a possible solution using
tp_share() that would work *after* we stop sharing the GIL.  My point
is that with tp_share() we have a solution that works now *and* will
work later.  I don't care how we use tp_share to do so. :)  I long to
be able to say in the PEP that you can pass bytes through the channel
and get bytes on the other side.

That said, I'm not sure how this could be made to work without
involving tp_free().  If that is really off the table (even in the
simplest possible ways) then I don't think there is a way to actually
share objects of builtin types between interpreters other than through
views like CIV.  We could still support tp_share() for the sake of
third parties, which would facilitate that simplicity I was aiming for
in sending data between interpreters, as well as leaving the door open
for nearly all the same experimentation.  However, I expect that most
*uses* of channels will involve builtin types, particularly as we
start off, so having to rely on view types for builtins would add
not-insignificant awkwardness to using channels.

I'd still like to avoid that if possible, so let's not rush to
completely close the door on small modifications to tp_free for
builtins. :)  Regardless, I still (after a night's rest and a day of
not thinking about it) consider tp_share() to be the solution I'd been
hoping we'd find, whether or not we can apply it to builtin types.

>
> The simplest possible variant of CIVs that I can think of would be able to
> avoid that outcome by being a memoryview subclass, since they just need to
> hold the extra reference to the original interpreter, and include some logic
> to switch interpreters at the appropriate time.
>
>

Re: [Python-Dev] PEP 554 v3 (new interpreters module)

2017-10-05 Thread Nick Coghlan
On 5 October 2017 at 18:45, Eric Snow  wrote:

> After we move to not sharing the GIL between interpreters:
>
> Channel.send(obj):  # in interp A
>     incref(obj)
>     if type(obj).tp_share == NULL:
>         raise ValueError("not a shareable type")
>     set_owner(obj)  # obj.owner or add an obj -> interp entry to global table
>     ch.objects.append(obj)
>
> Channel.recv():  # in interp B
>     orig = ch.objects.pop(0)
>     obj = orig.tp_share()
>     set_shared(obj, orig)  # add to a global table
>     return obj
>

This would be hard to get to work reliably, because "orig.tp_share()" would
be running in the receiving interpreter, but all the attributes of "orig"
would have been allocated by the sending interpreter. It gets more reliable
if it's *Channel.send* that calls tp_share() though, but moving the call to
the sending side makes it clear that a tp_share protocol would still need
to rely on a more primitive set of "shareable objects" that were the
permitted return values from the tp_share call.

And that's the real pay-off that comes from defining this in terms of the
memoryview protocol: Py_buffer structs *aren't* Python objects, so it's
only a regular C struct that gets passed across the interpreter boundary
(the reference to the original objects gets carried along passively as part
of the CIV - it never gets *used* in the receiving interpreter).


> bytes.tp_share():
>     obj = blank_bytes(len(self))
>     obj.ob_sval = self.ob_sval  # hand-wavy memory sharing
>     return obj
>

This is effectively reinventing memoryview, while trying to pretend it's an
ordinary bytes object. Don't reinvent memoryview :)


> bytes.tp_free():  # under no-shared-GIL:
>     # most of this could be pulled into a macro for re-use
>     orig = lookup_shared(self)
>     if orig != NULL:
>         current = release_LIL()
>         interp = lookup_owner(orig)
>         acquire_LIL(interp)
>         decref(orig)
>         release_LIL(interp)
>         acquire_LIL(current)
>         # clear shared/owner tables
>     # clear/release self.ob_sval
>     free(self)
>

I don't think we should be touching the behaviour of core builtins solely
to enable message passing to subinterpreters without a shared GIL.

The simplest possible variant of CIVs that I can think of would be able to
avoid that outcome by being a memoryview subclass, since they just need to
hold the extra reference to the original interpreter, and include some
logic to switch interpreters at the appropriate time.

That said, I think there's definitely a useful design question to ask in
this area, not about bytes (which can be readily represented by a
memoryview variant in the receiving interpreter), but about *strings*: they
have a more complex internal layout than bytes objects, but as long as the
receiving interpreter can make sure that the original string continues to
exist, then you could usefully implement a "strview" type to avoid having
to go through an encode/decode cycle just to pass a string to another
subinterpreter.

That would provide a reasonably compelling argument that CIVs *shouldn't*
be implemented as memoryview subclasses, but instead defined as
*containing* a managed view of an object owned by a different interpreter.

That way, even if the initial implementation only supported CIVs that
contained a memoryview instance, we'd have the freedom to define other
kinds of views later (such as strview), while being able to reuse the same
CIV machinery.
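
In sketch form, with the CIV as a container over interchangeable view
types (StrView being the hypothetical string view mentioned above):

    class StrView:
        """Hypothetical view that skips the encode/decode cycle."""
        def __init__(self, original):
            self._original = original  # keeps the sending string alive

    class CrossInterpreterView:
        """Contains *some* view plus a reference to the owning interpreter."""
        def __init__(self, view, owner_interp):
            self._view = view            # a memoryview, a StrView, ...
            self._owner = owner_interp   # strong ref to the sender

        def unwrap(self):
            return self._view

        def close(self):
            # a real version would switch to self._owner's interpreter
            # before dropping its objects
            self._view = None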

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] PEP 554 v3 (new interpreters module)

2017-10-05 Thread Eric Snow
On Tue, Oct 3, 2017 at 8:55 AM, Antoine Pitrou  wrote:
> I think we need a sharing protocol, not just a flag.  We also need to
> think carefully about that protocol, so that it does not imply
> unnecessary memory copies.  Therefore I think the protocol should be
> something like the buffer protocol, one that allows acquiring and releasing
> a set of shared memory areas, but without imposing any semantics onto
> those memory areas (each type implementing its own semantics).  And
> there needs to be a dedicated reference counting for object shares, so
> that the original object can be notified when all its shares have
> vanished.

I've come to agree. :)  I actually came to the same conclusion tonight
before I'd been able to read through your message carefully.  My idea
is below.  Your suggestion about protecting shared memory areas is
something to discuss further, though I'm not sure it's strictly
necessary yet (before we stop sharing the GIL).

On Wed, Oct 4, 2017 at 7:41 PM, Nick Coghlan  wrote:
> Having the sending interpreter do the INCREF just changes the problem
> to be a memory leak waiting to happen rather than an access-after-free
> issue, since the problematic non-synchronised scenario then becomes:
>
> * thread on CPU A has two references (ob_refcnt=2)
> * it sends a reference to a thread on CPU B via a channel
> * thread on CPU A releases its reference (ob_refcnt=1)
> * updated ob_refcnt value hasn't made it back to the shared memory cache yet
> * thread on CPU B releases its reference (ob_refcnt=1)
> * both threads have released their reference, but the refcnt is still
> 1 -> object leaks!
>
> We simply can't have INCREFs and DECREFs happening in different
> threads without some way of ensuring cache coherency for *both*
> operations - otherwise we risk either the refcount going to zero when
> it shouldn't, or *not* going to zero when it should.
>
> The current CPython implementation relies on the process global GIL
> for that purpose, so none of these problems will show up until you
> start trying to replace that with per-interpreter locks.
>
> Free threaded reference counting relies on (expensive) atomic
> increments & decrements.

Right.  I'm not sure why I was missing that, but I'm clear now.

Below is a rough idea of what I think may work instead (the result of
much tossing and turning in bed*).

While we're still sharing a GIL between interpreters:

Channel.send(obj):  # in interp A
    incref(obj)
    if type(obj).tp_share == NULL:
        raise ValueError("not a shareable type")
    ch.objects.append(obj)

Channel.recv():  # in interp B
    orig = ch.objects.pop(0)
    obj = orig.tp_share()
    return obj

bytes.tp_share():
    return self

After we move to not sharing the GIL between interpreters:

Channel.send(obj):  # in interp A
    incref(obj)
    if type(obj).tp_share == NULL:
        raise ValueError("not a shareable type")
    set_owner(obj)  # obj.owner or add an obj -> interp entry to global table
    ch.objects.append(obj)

Channel.recv():  # in interp B
    orig = ch.objects.pop(0)
    obj = orig.tp_share()
    set_shared(obj, orig)  # add to a global table
    return obj

bytes.tp_share():
    obj = blank_bytes(len(self))
    obj.ob_sval = self.ob_sval  # hand-wavy memory sharing
    return obj

bytes.tp_free():  # under no-shared-GIL:
    # most of this could be pulled into a macro for re-use
    orig = lookup_shared(self)
    if orig != NULL:
        current = release_LIL()
        interp = lookup_owner(orig)
        acquire_LIL(interp)
        decref(orig)
        release_LIL(interp)
        acquire_LIL(current)
        # clear shared/owner tables
    # clear/release self.ob_sval
    free(self)

The CIV approach could be facilitated through something like a new
SharedBuffer type, or through a separate BufferViewChannel, etc.

Most notably, this approach avoids hard-coding specific type support
into channels and should work out fine under no-shared-GIL
subinterpreters.  One nice thing about the tp_share slot is that it
makes it much easier (along with C-API for managing the global
owned/shared tables) to implement other types that are legal to pass
through channels.  Such could be provided via extension modules.
Numpy arrays could be made to support it, if that's your thing.
Antoine could give tp_share to locks and semaphores. :)  Of course,
any such types would have to ensure that they are actually safe to
share between interpreters without a GIL between them...

For PEP 554, I'd only propose the tp_share slot and its use in
Channel.send()/.recv().  The parts related to global tables and memory
sharing and tp_free() wouldn't be necessary until we stop sharing the
GIL between interpreters.  However, I believe that tp_share would make
us ready for that.

-eric


* I should know by now that some ideas sound better in the middle of
the night than they do the next day, but this idea is keeping me awake
so I'll risk it! :)

Re: [Python-Dev] PEP 554 v3 (new interpreters module)

2017-10-04 Thread Nick Coghlan
On 4 October 2017 at 23:51, Eric Snow  wrote:
> On Tue, Oct 3, 2017 at 11:36 PM, Nick Coghlan  wrote:
>> The problem relates to the fact that there aren't any memory barriers
>> around CPython's INCREF operations (they're implemented as an ordinary
>> C post-increment operation), so you can get the following scenario:
>>
>> * thread on CPU A has the sole reference (ob_refcnt=1)
>> * thread on CPU B acquires a new reference, but hasn't pushed the
>> updated ob_refcnt value back to the shared memory cache yet
>> * original thread on CPU A drops its reference, *thinks* the refcnt is
>> now zero, and deletes the object
>> * bad things now happen in CPU B as the thread running there tries to
>> use a deleted object :)
>
> I'm not clear on where we'd run into this problem with channels.
> Mirroring your scenario:
>
> * interpreter A (in thread on CPU A) INCREFs the object (the GIL is still held)
> * interp A sends the object to the channel
> * interp B (in thread on CPU B) receives the object from the channel
> * the new reference is held until interp B DECREFs the object
>
> From what I see, at no point do we get a refcount of 0, such that
> there would be a race on the object being deleted.

Having the sending interpreter do the INCREF just changes the problem
to be a memory leak waiting to happen rather than an access-after-free
issue, since the problematic non-synchronised scenario then becomes:

* thread on CPU A has two references (ob_refcnt=2)
* it sends a reference to a thread on CPU B via a channel
* thread on CPU A releases its reference (ob_refcnt=1)
* updated ob_refcnt value hasn't made it back to the shared memory cache yet
* thread on CPU B releases its reference (ob_refcnt=1)
* both threads have released their reference, but the refcnt is still
1 -> object leaks!

We simply can't have INCREFs and DECREFs happening in different
threads without some way of ensuring cache coherency for *both*
operations - otherwise we risk either the refcount going to zero when
it shouldn't, or *not* going to zero when it should.

The current CPython implementation relies on the process global GIL
for that purpose, so none of these problems will show up until you
start trying to replace that with per-interpreter locks.

Free threaded reference counting relies on (expensive) atomic
increments & decrements.

The cross-interpreter view proposal aims to allow per-interpreter GILs
without introducing atomic increments & decrements by instead relying
on the view itself to ensure that it's holding the right GIL for the
object whose refcount it's manipulating, and the receiving interpreter
explicitly closing the view when it's done with it.

So while CIVs wouldn't be as easy to use as regular object references:

1. They'd be no harder to use than memoryviews in general
2. They'd structurally ensure that regular object refcounts can still
rely on "protected by the GIL" semantics
3. They'd structurally ensure zero performance degradation for regular
object refcounts
4. By virtue of being memoryview based, they'd encourage the adoption
of interfaces and practices that can be adapted to multiple processes
through the use of techniques like shared memory regions and memory
mapped files (see
http://www.boost.org/doc/libs/1_54_0/doc/html/interprocess/sharedmemorybetweenprocesses.html
for some detailed explanations of how that works, and
https://arrow.apache.org/ for an example of ways tools like Pandas can
use that to enable zero-copy data sharing)

> The only problem I'm aware of (it dawned on me last night), is in the
> case that the interpreter that created the object gets deleted before
> the object does.  In that case we can't pass the deletion back to the
> original interpreter.  (I don't think this problem is necessarily
> exclusive to the solution I've proposed for Bytes.)

The cross-interpreter-view idea proposes to deal with that by having
the CIV hold a strong reference not only to the sending object (which
is already part of the regular memoryview semantics), but *also* to
the sending interpreter - that way, neither the sending object nor the
sending interpreter can go away until the receiving interpreter closes
the view.

The refcount-integrity-ensuring sequence of events becomes:

1. Sending interpreter submits the object to the channel
2. Channel creates a CIV with references to the sending interpreter &
sending object, and a view on the sending object's memory
3. Receiving interpreter gets the CIV from the channel
4. Receiving interpreter closes the CIV either explicitly or via
__del__ (the latter would emit ResourceWarning)
5. CIV switches execution back to the sending interpreter and releases
both the memory buffer and the reference to the sending object
6. CIV switches execution back to the receiving interpreter, and
releases its reference to the sending interpreter
7. Execution continues in the receiving interpreter

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia

Re: [Python-Dev] PEP 554 v3 (new interpreters module)

2017-10-04 Thread Antoine Pitrou
On Wed, 4 Oct 2017 17:50:33 +0200
Antoine Pitrou  wrote:
> On Mon, 2 Oct 2017 21:31:30 -0400
> Eric Snow  wrote:
> >   
> > > By contrast, if we allow an actual bytes object to be shared, then
> > > either every INCREF or DECREF on that bytes object becomes a
> > > synchronisation point, or else we end up needing some kind of
> > > secondary per-interpreter refcount where the interpreter doesn't drop
> > > its shared reference to the original object in its source interpreter
> > > until the internal refcount in the borrowing interpreter drops to
> > > zero.
> > 
> > There shouldn't be a need to synchronize on INCREF.  If both
> > interpreters have at least 1 reference then either one adding a
> > reference shouldn't be a problem.  
> 
> I'm not sure what Nick meant by "synchronization point", but at least
> you certainly need INCREF and DECREF to be atomic, which is a departure
> from today's Py_INCREF / Py_DECREF behaviour (and is significantly
> slower, even on high-level benchmarks).

To be clear, I'm writing this under the hypothesis of per-interpreter
GILs.  I'm not really interested in the per-process GIL case :-)

Regards

Antoine.




Re: [Python-Dev] PEP 554 v3 (new interpreters module)

2017-10-04 Thread Koos Zevenhoven
On Wed, Oct 4, 2017 at 4:51 PM, Eric Snow  wrote:

> On Tue, Oct 3, 2017 at 11:36 PM, Nick Coghlan  wrote:
> > The problem relates to the fact that there aren't any memory barriers
> > around CPython's INCREF operations (they're implemented as an ordinary
> > C post-increment operation), so you can get the following scenario:
> >
> > * thread on CPU A has the sole reference (ob_refcnt=1)
> > * thread on CPU B acquires a new reference, but hasn't pushed the
> > updated ob_refcnt value back to the shared memory cache yet
> > * original thread on CPU A drops its reference, *thinks* the refcnt is
> > now zero, and deletes the object
> > * bad things now happen in CPU B as the thread running there tries to
> > use a deleted object :)
>
> I'm not clear on where we'd run into this problem with channels.
> Mirroring your scenario:
>
> * interpreter A (in thread on CPU A) INCREFs the object (the GIL is still
> held)
> * interp A sends the object to the channel
> * interp B (in thread on CPU B) receives the object from the channel
> * the new reference is held until interp B DECREFs the object
>
> From what I see, at no point do we get a refcount of 0, such that
> there would be a race on the object being deleted.
>
>
So what you're saying is that when Larry finishes the gilectomy,
subinterpreters will work GIL-free too?-)

––Koos

> The only problem I'm aware of (it dawned on me last night), is in the
> case that the interpreter that created the object gets deleted before
> the object does.  In that case we can't pass the deletion back to the
> original interpreter.  (I don't think this problem is necessarily
> exclusive to the solution I've proposed for Bytes.)
>
> -eric



-- 
+ Koos Zevenhoven + http://twitter.com/k7hoven +


Re: [Python-Dev] PEP 554 v3 (new interpreters module)

2017-10-04 Thread Antoine Pitrou
On Mon, 2 Oct 2017 21:31:30 -0400
Eric Snow  wrote:
> 
> > By contrast, if we allow an actual bytes object to be shared, then
> > either every INCREF or DECREF on that bytes object becomes a
> > synchronisation point, or else we end up needing some kind of
> > secondary per-interpreter refcount where the interpreter doesn't drop
> > its shared reference to the original object in its source interpreter
> > until the internal refcount in the borrowing interpreter drops to
> > zero.  
> 
> There shouldn't be a need to synchronize on INCREF.  If both
> interpreters have at least 1 reference then either one adding a
> reference shouldn't be a problem.

I'm not sure what Nick meant by "synchronization point", but at least
you certainly need INCREF and DECREF to be atomic, which is a departure
from today's Py_INCREF / Py_DECREF behaviour (and is significantly
slower, even on high-level benchmarks).

Regards

Antoine.




Re: [Python-Dev] PEP 554 v3 (new interpreters module)

2017-10-04 Thread Eric Snow
On Tue, Oct 3, 2017 at 11:36 PM, Nick Coghlan  wrote:
> The problem relates to the fact that there aren't any memory barriers
> around CPython's INCREF operations (they're implemented as an ordinary
> C post-increment operation), so you can get the following scenario:
>
> * thread on CPU A has the sole reference (ob_refcnt=1)
> * thread on CPU B acquires a new reference, but hasn't pushed the
> updated ob_refcnt value back to the shared memory cache yet
> * original thread on CPU A drops its reference, *thinks* the refcnt is
> now zero, and deletes the object
> * bad things now happen in CPU B as the thread running there tries to
> use a deleted object :)

I'm not clear on where we'd run into this problem with channels.
Mirroring your scenario:

* interpreter A (in thread on CPU A) INCREFs the object (the GIL is still held)
* interp A sends the object to the channel
* interp B (in thread on CPU B) receives the object from the channel
* the new reference is held until interp B DECREFs the object


From what I see, at no point do we get a refcount of 0, such that
there would be a race on the object being deleted.

The only problem I'm aware of (it dawned on me last night), is in the
case that the interpreter that created the object gets deleted before
the object does.  In that case we can't pass the deletion back to the
original interpreter.  (I don't think this problem is necessarily
exclusive to the solution I've proposed for Bytes.)

-eric


Re: [Python-Dev] PEP 554 v3 (new interpreters module)

2017-10-03 Thread Nick Coghlan
On 3 October 2017 at 11:31, Eric Snow  wrote:
> There shouldn't be a need to synchronize on INCREF.  If both
> interpreters have at least 1 reference then either one adding a
> reference shouldn't be a problem.  If only one interpreter has a
> reference then the other won't be adding any references.  If neither
> has a reference then neither is going to add any references.  Perhaps
> I've missed something.  Under what circumstances would INCREF happen
> while the refcount is 0?

The problem relates to the fact that there aren't any memory barriers
around CPython's INCREF operations (they're implemented as an ordinary
C post-increment operation), so you can get the following scenario:

* thread on CPU A has the sole reference (ob_refcnt=1)
* thread on CPU B acquires a new reference, but hasn't pushed the
updated ob_refcnt value back to the shared memory cache yet
* original thread on CPU A drops its reference, *thinks* the refcnt is
now zero, and deletes the object
* bad things now happen in CPU B as the thread running there tries to
use a deleted object :)

The GIL currently protects us from this, as switching CPUs requires
switching threads, which means the original thread has to release the
GIL (flushing all of its state changes to the shared cache), and the
new thread has to acquire it (hence refreshing its local cache from
the shared one).

The need to switch all incref/decref operations over to using atomic
thread-safe primitives when removing the GIL is one of the main
reasons that attempting to remove the GIL *within* an interpreter is
expensive (and why Larry et al are having to explore completely
different ref count management strategies for the GILectomy).

By contrast, if you rely on a new memoryview variant to mediate all
data sharing between interpreters, then you can make sure that *it* is
using synchronisation primitives as needed to ensure the required
cache coherency across different CPUs, without any negative impacts on
regular single interpreter code (which can still rely on the cache
coherency guarantees provided by the GIL).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] PEP 554 v3 (new interpreters module)

2017-10-03 Thread Steve Dower

On 03Oct2017 0755, Antoine Pitrou wrote:
> On Tue, 3 Oct 2017 08:36:55 -0600
> Eric Snow  wrote:
> > On Tue, Oct 3, 2017 at 5:00 AM, Antoine Pitrou  wrote:
> > > On Mon, 2 Oct 2017 22:15:01 -0400
> > > Eric Snow  wrote:
> > > >
> > > > I'm still not convinced that sharing synchronization primitives is
> > > > important enough to be worth including it in the PEP.  It can be added
> > > > later, or via an extension module in the meantime.  To that end, I'll
> > > > add a mechanism to the PEP for third-party types to indicate that they
> > > > can be passed through channels.  Something like
> > > > "obj.__channel_support__ = True".
> > >
> > > How would that work?  If it's simply a matter of flipping a bit, why
> > > don't we do it for all objects?
> >
> > The type would also have to be safe to share between interpreters. :)
>
> But what does it mean to be safe to share, while the exact degree
> and nature of the isolation between interpreters (and also their
> concurrent execution) is unspecified?
>
> I think we need a sharing protocol, not just a flag.


The easiest such protocol is essentially:

* an object can represent itself as bytes (e.g. generate a bytes object 
representing some global token, such as a kernel handle or memory address)

* those bytes are sent over the standard channel
* the object can instantiate itself from those bytes (e.g. wrap the 
existing handle, create a memoryview over the same block of memory, etc.)
* cross-interpreter refcounting is either ignored (because the kernel is 
refcounting the resource) or manual (by including more shared info in 
the token)
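
A minimal sketch of that round trip (SharedBlock and its fields are
invented purely for illustration; only the bytes token would actually
cross the channel):

    import struct

    class SharedBlock:
        def __init__(self, address, length):
            self.address = address
            self.length = length

        def to_token(self):
            # pack a global token describing the resource into bytes
            return struct.pack('QQ', self.address, self.length)

        @classmethod
        def from_token(cls, token):
            # reinstantiate on the receiving side from the bytes token
            return cls(*struct.unpack('QQ', token))

    token = SharedBlock(0x7f0000000000, 4096).to_token()
    block = SharedBlock.from_token(token)  # wraps the same resource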


Since this is trivial to implement over the basic bytes channel, and 
doesn't even require a standard protocol except for convenience, Eric 
decided to avoid blocking the core functionality on this. I'm inclined 
to agree - get the basic functionality supported and let people build on 
it before we try to lock down something we don't fully understand yet.


About the only thing that seems to be worth doing up-front is some sort 
of pending-call callback mechanism between interpreters, but even that 
doesn't need to block the core functionality (you can do it trivially 
with threads and another channel right now, and there's always room to 
make something more efficient later).


There are plenty of smart people out there who can and will figure out 
the best way to design this. By giving them the tools and the ability to 
design something awesome, we're more likely to get something awesome 
than by committing to a complete design now. Right now, they're all 
blocked on the fact that subinterpreters are incredibly hard to start 
running, let alone experiment with. Eric's PEP will fix that part and 
enable others to take it from building blocks to powerful libraries.


Cheers,
Steve


Re: [Python-Dev] PEP 554 v3 (new interpreters module)

2017-10-03 Thread Antoine Pitrou
On Tue, 3 Oct 2017 08:36:55 -0600
Eric Snow  wrote:
> On Tue, Oct 3, 2017 at 5:00 AM, Antoine Pitrou  wrote:
> > On Mon, 2 Oct 2017 22:15:01 -0400
> > Eric Snow  wrote:  
> >>
> >> I'm still not convinced that sharing synchronization primitives is
> >> important enough to be worth including it in the PEP.  It can be added
> >> later, or via an extension module in the meantime.  To that end, I'll
> >> add a mechanism to the PEP for third-party types to indicate that they
> >> can be passed through channels.  Something like
> >> "obj.__channel_support__ = True".  
> >
> > How would that work?  If it's simply a matter of flipping a bit, why
> > don't we do it for all objects?  
> 
> The type would also have to be safe to share between interpreters. :)

But what does it mean to be safe to share, while the exact degree
and nature of the isolation between interpreters (and also their
concurrent execution) is unspecified?

I think we need a sharing protocol, not just a flag.  We also need to
think carefully about that protocol, so that it does not imply
unnecessary memory copies.  Therefore I think the protocol should be
something like the buffer protocol, one that allows acquiring and releasing
a set of shared memory areas, but without imposing any semantics onto
those memory areas (each type implementing its own semantics).  And
there needs to be a dedicated reference counting for object shares, so
that the original object can be notified when all its shares have
vanished.

Regards

Antoine.


Re: [Python-Dev] PEP 554 v3 (new interpreters module)

2017-10-03 Thread Eric Snow
On Tue, Oct 3, 2017 at 5:00 AM, Antoine Pitrou  wrote:
> On Mon, 2 Oct 2017 22:15:01 -0400
> Eric Snow  wrote:
>>
>> I'm still not convinced that sharing synchronization primitives is
>> important enough to be worth including it in the PEP.  It can be added
>> later, or via an extension module in the meantime.  To that end, I'll
>> add a mechanism to the PEP for third-party types to indicate that they
>> can be passed through channels.  Something like
>> "obj.__channel_support__ = True".
>
> How would that work?  If it's simply a matter of flipping a bit, why
> don't we do it for all objects?

The type would also have to be safe to share between interpreters. :)
Eventually I'd like to make that work for all immutable objects (and
immutable containers thereof), but until then each type must be
adapted individually.  The PEP starts off with just Bytes.

-eric


Re: [Python-Dev] PEP 554 v3 (new interpreters module)

2017-10-03 Thread Antoine Pitrou
On Mon, 2 Oct 2017 22:15:01 -0400
Eric Snow  wrote:
> 
> I'm still not convinced that sharing synchronization primitives is
> important enough to be worth including it in the PEP.  It can be added
> later, or via an extension module in the meantime.  To that end, I'll
> add a mechanism to the PEP for third-party types to indicate that they
> can be passed through channels.  Something like
> "obj.__channel_support__ = True".

How would that work?  If it's simply a matter of flipping a bit, why
don't we do it for all objects?

Regards

Antoine.


Re: [Python-Dev] PEP 554 v3 (new interpreters module)

2017-10-02 Thread Eric Snow
On Wed, Sep 27, 2017 at 1:26 AM, Nick Coghlan  wrote:
> It's also the case that unlike Go channels, which were designed from
> scratch on the basis of implementing pure CSP,

FWIW, Go's channels (and goroutines) don't implement pure CSP.  They
provide a variant that the Go authors felt was more in-line with the
language's flavor.  The channels in the PEP aim to support a more pure
implementation.

> Python has an
> established behavioural precedent in the APIs of queue.Queue and
> collections.deque: they're unbounded by default, and you have to opt
> in to making them bounded.

Right.  That's part of why I'm leaning toward support for buffered channels.

> While the article title is clickbaity,
> http://www.jtolds.com/writing/2016/03/go-channels-are-bad-and-you-should-feel-bad/
> actually has a good discussion of this point. Search for "compose" to
> find the relevant section ("Channels don’t compose well with other
> concurrency primitives").
>
> The specific problem cited is that only offering unbuffered or
> bounded-buffer channels means that every send call becomes a potential
> deadlock scenario, as all that needs to happen is for you to be
> holding a different synchronisation primitive when the send call
> blocks.

Yeah, that blog post was a reference for me as I was designing the
PEP's channels.

> The fact that the proposal now allows for M:N sender:receiver
> relationships (just as queue.Queue does with threads) makes that
> problem worse, since you may now have variability not only on the
> message consumption side, but also on the message production side.
>
> Consider this example where you have an event processing thread pool
> that we're attempting to isolate from blocking IO by using channels
> rather than coroutines.
>
> Desired flow:
>
> 1. Listener thread receives external message from socket
> 2. Listener thread files message for processing on receive channel
> 3. Listener thread returns to blocking on the receive socket
>
> 4. Processing thread picks up message from receive channel
> 5. Processing thread processes message
> 6. Processing thread puts reply on the send channel
>
> 7. Sending thread picks up message from send channel
> 8. Sending thread makes a blocking network send call to transmit the message
> 9. Sending thread returns to blocking on the send channel
>
> When queue.Queue is used to pass the messages between threads, such an
> arrangement will be effectively non-blocking as long as the send rate
> is greater than or equal to the receive rate. However, the GIL means
> it won't exploit all available cores, even if we create multiple
> processing threads: you have to switch to multiprocessing for that,
> with all the extra overhead that entails.
>
> So I see the essential premise of PEP 554 as being to ask the question
> "If each of these threads was running its own *interpreter*, could we
> use Sans IO style protocols with interpreter channels to separate
> internally "synchronous" processing threads from separate IO threads
> operating at system boundaries, without having to make the entire
> application pervasively asynchronous?"

+1

> If channels are an unbuffered blocking primitive, then we don't get
> that benefit: even when there are additional receive messages to be
> processed, the processing thread will block until the previous send
> has completed. Switching the listener and sender threads over to
> asynchronous IO would help with that, but they'd also end up having to
> implement their own message buffering to manage the lack of buffering
> in the core channel primitive.
>
> By contrast, if the core channels are designed to offer an unbounded
> buffer by default, then you can get close-to-CSP semantics just by
> setting the buffer size to 1 (it's still not exactly CSP, since that
> has a buffer size of 0, but you at least get the semantics of having
> to alternate sending and receiving of messages).

Yep, I came to the same conclusion.
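
Following queue.Queue's precedent, the opt-in bounding could look like
this (the channel constructor's buffering parameter is an assumption):

    import queue

    # the established stdlib precedent: unbounded unless you opt in
    q = queue.Queue()            # unbounded
    q1 = queue.Queue(maxsize=1)  # bounded: put() blocks when full

    # hypothetical channel equivalents under the PEP:
    #   ch = interpreters.create_channel()             # unbounded buffer
    #   ch = interpreters.create_channel(buffering=1)  # close-to-CSP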

>> By the way, I do think efficiency is a concern here.  Otherwise
>> subinterpreters don't even have a point (just use multiprocessing).
>
> Agreed, and I think the interaction between the threading module and
> the interpreters module is one we're going to have to explicitly call
> out as being covered by the provisional status of the interpreters
> module, as I think it could be incredibly valuable to be able to send
> at least some threading objects through channels, and have them be an
> interpreter-specific reference to a common underlying sync primitive.

Agreed.  I'll add a note to the PEP.

-eric


Re: [Python-Dev] PEP 554 v3 (new interpreters module)

2017-10-02 Thread Eric Snow
On Mon, Sep 25, 2017 at 8:42 PM, Nathaniel Smith  wrote:
> It's fairly reasonable to implement a mutex using a CSP-style
> unbuffered channel (send = acquire, receive = release). And the same
> trick turns a channel with a fixed-size buffer into a bounded
> semaphore. It won't be as efficient as a modern specialized mutex
> implementation, of course, but it's workable.
>
> Unfortunately while technically you can construct a buffered channel
> out of an unbuffered channel, the construction's pretty unreasonable
> (it needs two dedicated threads per channel).

Yeah, if threading's synchronization primitives make sense between
interpreters then we'll add direct support.  Using channels for that
isn't a good option.
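
For reference, Nathaniel's second trick is easy to demonstrate within a
single interpreter, with queue.Queue(maxsize=1) standing in for a channel
with a fixed-size-1 buffer:

    import queue

    class ChannelMutex:
        def __init__(self):
            # a size-1 buffer: put() blocks while the slot is occupied
            self._ch = queue.Queue(maxsize=1)

        def acquire(self):
            self._ch.put(None)  # "send" acquires the lock

        def release(self):
            self._ch.get()      # "receive" releases it

    mutex = ChannelMutex()
    mutex.acquire()
    # ... critical section ...
    mutex.release()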

-eric


Re: [Python-Dev] PEP 554 v3 (new interpreters module)

2017-10-02 Thread Eric Snow
On Mon, Oct 2, 2017 at 9:31 PM, Eric Snow  wrote:
> On DECREF there shouldn't be a problem except possibly with a small
> race between decrementing the refcount and checking for a refcount of
> 0.  We could address that several different ways, including allowing
> the pending call to get queued only once (or being a noop the second
> time).

Alternately, the channel could own a reference and DECREF it in the
owning interpreter once the refcount reaches 1.

-eric


Re: [Python-Dev] PEP 554 v3 (new interpreters module)

2017-10-02 Thread Eric Snow
After having looked it over, I'm leaning toward supporting buffering,
as well as not blocking by default.  Neither adds much complexity to
the implementation.
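
Under the proposed module, that would look something like this (a
sketch only; the exact semantics are still provisional)::

    recv_ch, send_ch = interpreters.create_channel()
    send_ch.send(b'spam')   # buffered: returns without waiting for recv()
    data = recv_ch.recv()   # blocks until a message is available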

On Sat, Sep 23, 2017 at 5:45 AM, Antoine Pitrou  wrote:
> On Fri, 22 Sep 2017 19:09:01 -0600
> Eric Snow  wrote:
>> > send() blocking until someone else calls recv() is not only bad for
>> > performance,
>>
>> What is the performance problem?
>
> Intuitively, there must be some kind of context switch (interpreter
> switch?) at each send() call to let the other end receive the data,
> since you don't have any internal buffering.

There would be an internal size-1 buffer.

>> (FWIW, CSP
>> provides rigorous guarantees about deadlock detection (which Go
>> leverages), though I'm not sure how much benefit that can offer such a
>> dynamic language as Python.)
>
> Hmm... deadlock detection is one thing, but when detected you must still
> solve those deadlock issues, right?

Yeah, I haven't given much thought to how we could leverage that
capability, but my gut feeling is that we won't have much opportunity
to do so. :)

>> I'm not sure I understand your concern here.  Perhaps I used the word
>> "sharing" too ambiguously?  By "sharing" I mean that the two actors
>> have read access to something that at least one of them can modify.
>> If they both only have read-only access then it's effectively the same
>> as if they are not sharing.
>
> Right.  What I mean is that you *can* share very simple "data" under
> the form of synchronization primitives.  You may want to synchronize
> your interpreters even if they don't share user-visible memory areas.  The
> point of synchronization is not only to avoid memory corruption but
> also to regulate and orchestrate processing amongst multiple workers
> (for example processes or interpreters).  For example, a semaphore is
> an easy way to implement "I want no more than N workers to do this
> thing at the same time" ("this thing" can be something such as disk
> I/O).

I'm still not convinced that sharing synchronization primitives is
important enough to be worth including in the PEP.  It can be added
later, or via an extension module in the meantime.  To that end, I'll
add a mechanism to the PEP for third-party types to indicate that they
can be passed through channels.  Something like
"obj.__channel_support__ = True".

-eric


Re: [Python-Dev] PEP 554 v3 (new interpreters module)

2017-10-02 Thread Eric Snow
On Thu, Sep 14, 2017 at 8:44 PM, Nick Coghlan  wrote:
> Not really, because the only way to ensure object separation (i.e. no
> refcounted objects accessible from multiple interpreters at once) with
> a bytes-based API would be to either:
>
> 1. Always copy (eliminating most of the low overhead communications
> benefits that subinterpreters may offer over multiple processes)
> 2. Make the bytes implementation more complicated by allowing multiple
> bytes objects to share the same underlying storage while presenting as
> distinct objects in different interpreters
> 3. Make the output on the receiving side not actually a bytes object,
> but instead a view onto memory owned by another object in a different
> interpreter (a "memory view", one might say)

4. Pass Bytes through directly.

The only problem of which I'm aware is that when Py_DECREF() triggers
Bytes.__del__(), it happens in the current interpreter, which may not
be the "owner" (i.e. allocated the object).  So the solution would be
to make PyBytesType.tp_free() effectively run as a "pending call"
under the owner.  This would require two things:

1. a new PyBytesObject.owner field (PyInterpreterState *), or a
separate owner table, which would be set when the object is passed
through a channel
2. a Py_AddPendingCall() that targets a specific interpreter (which I
expect would be desirable regardless)

Then, when the object has an owner, PyBytesType.tp_free() would add a
pending call on the owner to call PyObject_Del() on the Bytes object.

The catch is that currently "pending" calls (via Py_AddPendingCall)
are run only in the main thread of the main interpreter.  We'd need a
similar mechanism that targets a specific interpreter.

> By contrast, if we allow an actual bytes object to be shared, then
> either every INCREF or DECREF on that bytes object becomes a
> synchronisation point, or else we end up needing some kind of
> secondary per-interpreter refcount where the interpreter doesn't drop
> its shared reference to the original object in its source interpreter
> until the internal refcount in the borrowing interpreter drops to
> zero.

There shouldn't be a need to synchronize on INCREF.  If both
interpreters have at least 1 reference then either one adding a
reference shouldn't be a problem.  If only one interpreter has a
reference then the other won't be adding any references.  If neither
has a reference then neither is going to add any references.  Perhaps
I've missed something.  Under what circumstances would INCREF happen
while the refcount is 0?

On DECREF there shouldn't be a problem except possibly with a small
race between decrementing the refcount and checking for a refcount of
0.  We could address that several different ways, including allowing
the pending call to get queued only once (or being a noop the second
time).

FWIW, I'm not opposed to the CIV/memoryview approach, but want to make
sure we really can't use Bytes before going down that route.

-eric


Re: [Python-Dev] PEP 554 v3 (new interpreters module)

2017-09-26 Thread Nick Coghlan
On 26 September 2017 at 17:04, Antoine Pitrou  wrote:
> On Mon, 25 Sep 2017 17:42:02 -0700 Nathaniel Smith  wrote:
>> Unbounded queues also introduce unbounded latency and memory usage in
>> realistic situations.
>
> This doesn't seem to pose much of a problem in common use cases, though.
> How many Python programs have you seen switch from an unbounded to a
> bounded Queue to solve this problem?
>
> Conversely, choosing a buffer size is tricky.  How do you know up front
> which amount you need?  Is a fixed buffer size even ok or do you want
> it to fluctuate based on the current conditions?
>
> And regardless, my point was that a buffer is desirable.  That send()
> may block when the buffer is full doesn't change that it won't block in
> the common case.

It's also the case that unlike Go channels, which were designed from
scratch on the basis of implementing pure CSP, Python has an
established behavioural precedent in the APIs of queue.Queue and
collections.deque: they're unbounded by default, and you have to opt
in to making them bounded.
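
For example::

    import collections, queue

    q = queue.Queue()                 # unbounded: put() never blocks
    b = queue.Queue(maxsize=10)       # opt-in bound: put() blocks when full
    d = collections.deque()           # unbounded
    r = collections.deque(maxlen=10)  # opt-in bound (drops items rather
                                      # than blocking)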

>> There's a reason why sockets
>> always have bounded buffers -- it's sometimes painful, but the pain is
>> intrinsic to building distributed systems, and unbounded buffers just
>> paper over it.
>
> Papering over a problem is sometimes the right answer actually :-)  For
> example, most Python programs assume memory is unbounded...
>
> If I'm using a queue or channel to push events to a logging system,
> should I really block at every send() call?  Most probably I'd rather
> run ahead instead.

While the article title is clickbaity,
http://www.jtolds.com/writing/2016/03/go-channels-are-bad-and-you-should-feel-bad/
actually has a good discussion of this point. Search for "compose" to
find the relevant section ("Channels don’t compose well with other
concurrency primitives").

The specific problem cited is that only offering unbuffered or
bounded-buffer channels means that every send call becomes a potential
deadlock scenario, as all that needs to happen is for you to be
holding a different synchronisation primitive when the send call
blocks.
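
The hazard is easy to reproduce with today's primitives, with a bounded
queue.Queue standing in for a bounded channel::

    import queue
    import threading

    ch = queue.Queue(maxsize=1)
    lock = threading.Lock()

    def producer():
        with lock:
            ch.put('a')
            ch.put('b')   # blocks: buffer full, and we still hold the lock

    def consumer():
        with lock:        # blocks: the producer holds the lock
            ch.get()

Run each function in its own thread and whichever starts first ends up
holding one resource while blocked on the other, so neither can make
progress.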

>> > Also, suddenly an interpreter's ability to exploit CPU time is
>> > dependent on another interpreter's ability to consume data in a timely
>> > manner (what if the other interpreter is e.g. stuck on some disk I/O?).
>> > IMHO it would be better not to have such coupling.
>>
>> A small buffer probably is useful in some cases, yeah -- basically
>> enough to smooth out scheduler jitter.
>
> That's not about scheduler jitter, but catering for activities which
> occur at inherently different speed or rhythms.  Requiring things run
> in lockstep removes a lot of flexibility and makes it harder to exploit
> CPU resources fully.

The fact that the proposal now allows for M:N sender:receiver
relationships (just as queue.Queue does with threads) makes that
problem worse, since you may now have variability not only on the
message consumption side, but also on the message production side.

Consider this example where you have an event processing thread pool
that we're attempting to isolate from blocking IO by using channels
rather than coroutines.

Desired flow:

1. Listener thread receives external message from socket
2. Listener thread files message for processing on receive channel
3. Listener thread returns to blocking on the receive socket

4. Processing thread picks up message from receive channel
5. Processing thread processes message
6. Processing thread puts reply on the send channel

7. Sending thread picks up message from send channel
8. Sending thread makes a blocking network send call to transmit the message
9. Sending thread returns to blocking on the send channel

When queue.Queue is used to pass the messages between threads, such an
arrangement will be effectively non-blocking as long as the send rate
is greater than or equal to the receive rate. However, the GIL means
it won't exploit all available cores, even if we create multiple
processing threads: you have to switch to multiprocessing for that,
with all the extra overhead that entails.
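
Sketched with queue.Queue (the socket plumbing is elided and handle()
is a placeholder for the Sans-IO protocol logic)::

    import queue

    recv_ch, send_ch = queue.Queue(), queue.Queue()

    def handle(msg):
        return msg          # placeholder for the protocol processing

    def listener(sock):     # steps 1-3
        while True:
            recv_ch.put(sock.recv(4096))

    def processor():        # steps 4-6
        while True:
            send_ch.put(handle(recv_ch.get()))

    def sender(sock):       # steps 7-9
        while True:
            sock.sendall(send_ch.get())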

So I see the essential premise of PEP 554 as being to ask the question
"If each of these threads was running its own *interpreter*, could we
use Sans IO style protocols with interpreter channels to separate
internally "synchronous" processing threads from separate IO threads
operating at system boundaries, without having to make the entire
application pervasively asynchronous?"

If channels are an unbuffered blocking primitive, then we don't get
that benefit: even when there are additional receive messages to be
processed, the processing thread will block until the previous send
has completed. Switching the listener and sender threads over to
asynchronous IO would help with that, but they'd also end up having to
implement their own message buffering to manage the lack of buffering
in the core channel primitive.

By contrast, if the core channels are designed to offer an unbounded
buffer by default, then you can get close-to-CSP semantics just by
setting the buffer size to 1 (it's still not exactly CSP, since that
has a buffer size of 0, but you at least get the semantics of having
to alternate sending and receiving of messages).

Re: [Python-Dev] PEP 554 v3 (new interpreters module)

2017-09-26 Thread Walter Dörwald

On 23 Sep 2017, at 3:09, Eric Snow wrote:


[...]

``list_all()``::

   Return a list of all existing interpreters.


See my naming proposal in the previous thread.


Sorry, your previous comment slipped through the cracks.  You 
suggested:


As for the naming, let's make it both unconfusing and explicit?
How about three functions: `all_interpreters()`, 
`running_interpreters()`

and `idle_interpreters()`, for example?

As to "all_interpreters()", I suppose it's the difference between
"interpreters.all_interpreters()" and "interpreters.list_all()".  To
me the latter looks better.


But in most cases when Python returns a container (list/dict/iterator) 
of things, the name of the function/method is the name of the things, 
not the name of the container, i.e. we have sys.modules, dict.keys, 
dict.values etc.  Or if the collection of things itself has a name, it 
is that name, i.e. os.environ, sys.path etc.


It's a little bit unfortunate that the name of the module would be the 
same as the name of the function, but IMHO interpreters() would be 
better than list().



As to "running_interpreters()" and "idle_interpreters()", I'm not sure
what the benefit would be.  You can compose either list manually with
a simple comprehension:

[interp for interp in interpreters.list_all() if 
interp.is_running()]
[interp for interp in interpreters.list_all() if not 
interp.is_running()]


Servus,
   Walter


Re: [Python-Dev] PEP 554 v3 (new interpreters module)

2017-09-26 Thread Antoine Pitrou
On Mon, 25 Sep 2017 17:42:02 -0700
Nathaniel Smith  wrote:
> On Sat, Sep 23, 2017 at 2:45 AM, Antoine Pitrou  wrote:
> >> As to "running_interpreters()" and "idle_interpreters()", I'm not sure
> >> what the benefit would be.  You can compose either list manually with
> >> a simple comprehension:
> >>
> >> [interp for interp in interpreters.list_all() if interp.is_running()]
> >> [interp for interp in interpreters.list_all() if not 
> >> interp.is_running()]  
> >
> > There is an inherent race condition in doing that, at least if
> > interpreters are running in multiple threads (which I assume is going
> > to be the overwhelmingly dominant usage model).  That is why I'm proposing all
> > three variants.  
> 
> There's a race condition no matter what the API looks like -- having a
> dedicated running_interpreters() lets you guarantee that the returned
> list describes the set of interpreters that were running at some
> moment in time, but you don't know when that moment was and by the
> time you get the list, it's already out-of-date.

Hmm, you're right of course.

> >> Likewise,
> >> queue.Queue.send() supports blocking, in addition to providing a
> >> put_nowait() method.  
> >
> > queue.Queue.put() never blocks in the usual case (*), which is that of an
> > unbounded queue.  Only bounded queues (created with an explicit
> > non-zero max_size parameter) can block in Queue.put().
> >
> > (*) and therefore also never deadlocks :-)  
> 
> Unbounded queues also introduce unbounded latency and memory usage in
> realistic situations.

This doesn't seem to pose much of a problem in common use cases, though.
How many Python programs have you seen switch from an unbounded to a
bounded Queue to solve this problem?

Conversely, choosing a buffer size is tricky.  How do you know up front
which amount you need?  Is a fixed buffer size even ok or do you want
it to fluctuate based on the current conditions?

And regardless, my point was that a buffer is desirable.  That send()
may block when the buffer is full doesn't change that it won't block in
the common case.

> There's a reason why sockets
> always have bounded buffers -- it's sometimes painful, but the pain is
> intrinsic to building distributed systems, and unbounded buffers just
> paper over it.

Papering over a problem is sometimes the right answer actually :-)  For
example, most Python programs assume memory is unbounded...

If I'm using a queue or channel to push events to a logging system,
should I really block at every send() call?  Most probably I'd rather
run ahead instead.

> > Also, suddenly an interpreter's ability to exploit CPU time is
> > dependent on another interpreter's ability to consume data in a timely
> > manner (what if the other interpreter is e.g. stuck on some disk I/O?).
> > IMHO it would be better not to have such coupling.  
> 
> A small buffer probably is useful in some cases, yeah -- basically
> enough to smooth out scheduler jitter.

That's not about scheduler jitter, but catering for activities which
occur at inherently different speed or rhythms.  Requiring things run
in lockstep removes a lot of flexibility and makes it harder to exploit
CPU resources fully.

> > I expect more often than expected, in complex systems :-)  For example,
> > you could have a recv() loop that also from time to time send()s some
> > data on another queue, depending on what is received.  But if that
> > send()'s recipient also has the same structure (a recv() loop which
> > send()s from time to time), then it's easy to imagine the two getting
> > into a deadlock.  
> 
> You kind of want to be able to create deadlocks, since the alternative
> is processes that can't coordinate and end up stuck in livelocks or
> with unbounded memory use etc.

I am not advocating we make it *impossible* to create deadlocks; just
> saying we should not make them more *likely* than they need to be.

> >> I'm not sure I understand your concern here.  Perhaps I used the word
> >> "sharing" too ambiguously?  By "sharing" I mean that the two actors
> >> have read access to something that at least one of them can modify.
> >> If they both only have read-only access then it's effectively the same
> >> as if they are not sharing.  
> >
> > Right.  What I mean is that you *can* share very simple "data" under
> > the form of synchronization primitives.  You may want to synchronize
> > your interpreters even if they don't share user-visible memory areas.  The
> > point of synchronization is not only to avoid memory corruption but
> > also to regulate and orchestrate processing amongst multiple workers
> > (for example processes or interpreters).  For example, a semaphore is
> > an easy way to implement "I want no more than N workers to do this
> > thing at the same time" ("this thing" can be something such as disk
> > I/O).  
> 
> It's fairly reasonable to implement a mutex using a CSP-style
> unbuffered channel (send = acquire, receive = release). And the same
> trick turns a channel with a fixed-size buffer into a bounded
> semaphore. It won't be as efficient as a modern specialized mutex
> implementation, of course, but it's workable.

Re: [Python-Dev] PEP 554 v3 (new interpreters module)

2017-09-25 Thread Nathaniel Smith
On Sat, Sep 23, 2017 at 2:45 AM, Antoine Pitrou  wrote:
>> As to "running_interpreters()" and "idle_interpreters()", I'm not sure
>> what the benefit would be.  You can compose either list manually with
>> a simple comprehension:
>>
>> [interp for interp in interpreters.list_all() if interp.is_running()]
>> [interp for interp in interpreters.list_all() if not interp.is_running()]
>
> There is an inherent race condition in doing that, at least if
> interpreters are running in multiple threads (which I assume is going
> to be the overwhelmingly dominant usage model).  That is why I'm proposing all
> three variants.

There's a race condition no matter what the API looks like -- having a
dedicated running_interpreters() lets you guarantee that the returned
list describes the set of interpreters that were running at some
moment in time, but you don't know when that moment was and by the
time you get the list, it's already out-of-date. So this doesn't seem
very useful. OTOH if we think that invariants like this are useful, we
might also want to guarantee that calling running_interpreters() and
idle_interpreters() gives two lists such that each interpreter appears
in exactly one of them, but that's impossible with this API; it'd
require a single function that returns both lists.

What problem are you trying to solve?

>> Likewise,
>> queue.Queue.send() supports blocking, in addition to providing a
>> put_nowait() method.
>
> queue.Queue.put() never blocks in the usual case (*), which is that of an
> unbounded queue.  Only bounded queues (created with an explicit
> non-zero max_size parameter) can block in Queue.put().
>
> (*) and therefore also never deadlocks :-)

Unbounded queues also introduce unbounded latency and memory usage in
realistic situations. (E.g. a producer/consumer setup where the
producer runs faster than the consumer.) There's a reason why sockets
always have bounded buffers -- it's sometimes painful, but the pain is
intrinsic to building distributed systems, and unbounded buffers just
paper over it.

>> > send() blocking until someone else calls recv() is not only bad for
>> > performance,
>>
>> What is the performance problem?
>
> Intuitively, there must be some kind of context switch (interpreter
> switch?) at each send() call to let the other end receive the data,
> since you don't have any internal buffering.

Technically you just need the other end to wake up at some time in
between any two calls to send(), and if there's no GIL then this
doesn't necessarily require a context switch.

> Also, suddenly an interpreter's ability to exploit CPU time is
> dependent on another interpreter's ability to consume data in a timely
> manner (what if the other interpreter is e.g. stuck on some disk I/O?).
> IMHO it would be better not to have such coupling.

A small buffer probably is useful in some cases, yeah -- basically
enough to smooth out scheduler jitter.

>> > it also increases the likelihood of deadlocks.
>>
>> How much of a problem will deadlocks be in practice?
>
> I expect more often than expected, in complex systems :-)  For example,
> you could have a recv() loop that also from time to time send()s some
> data on another queue, depending on what is received.  But if that
> send()'s recipient also has the same structure (a recv() loop which
> send()s from time to time), then it's easy to imagine the two getting
> into a deadlock.

You kind of want to be able to create deadlocks, since the alternative
is processes that can't coordinate and end up stuck in livelocks or
with unbounded memory use etc.

>> I'm not sure I understand your concern here.  Perhaps I used the word
>> "sharing" too ambiguously?  By "sharing" I mean that the two actors
>> have read access to something that at least one of them can modify.
>> If they both only have read-only access then it's effectively the same
>> as if they are not sharing.
>
> Right.  What I mean is that you *can* share very simple "data" under
> the form of synchronization primitives.  You may want to synchronize
> your interpreters even if they don't share user-visible memory areas.  The
> point of synchronization is not only to avoid memory corruption but
> also to regulate and orchestrate processing amongst multiple workers
> (for example processes or interpreters).  For example, a semaphore is
> an easy way to implement "I want no more than N workers to do this
> thing at the same time" ("this thing" can be something such as disk
> I/O).

It's fairly reasonable to implement a mutex using a CSP-style
unbuffered channel (send = acquire, receive = release). And the same
trick turns a channel with a fixed-size buffer into a bounded
semaphore. It won't be as efficient as a modern specialized mutex
implementation, of course, but it's workable.

Unfortunately while technically you can construct a buffered channel
out of an unbuffered channel, the construction's pretty unreasonable
(it needs two dedicated threads per channel).
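
For illustration, the same construction sketched with queue.Queue
standing in for a channel with a fixed-size buffer (the wrapper class
is invented)::

    import queue

    class ChannelSemaphore:
        # A size-N buffer gives N permits: send (put) acquires,
        # recv (get) releases.  N=1 behaves like a mutex.
        def __init__(self, n):
            self._ch = queue.Queue(maxsize=n)

        def acquire(self):
            self._ch.put(None)   # blocks once all N permits are taken

        def release(self):
            self._ch.get()       # frees a permit, waking a blocked sender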

-n

-- 
Nathaniel J. Smith -- https://vorpus.org

Re: [Python-Dev] PEP 554 v3 (new interpreters module)

2017-09-23 Thread MRAB

On 2017-09-23 10:45, Antoine Pitrou wrote:


Hi Eric,

On Fri, 22 Sep 2017 19:09:01 -0600
Eric Snow  wrote:


Please elaborate.  I'm interested in understanding what you mean here.
Do you have some subinterpreter-based concurrency improvements in
mind?  What aspect of CSP is the PEP following too faithfully?


See below the discussion of blocking send()s :-)


As to "running_interpreters()" and "idle_interpreters()", I'm not sure
what the benefit would be.  You can compose either list manually with
a simple comprehension:

[interp for interp in interpreters.list_all() if interp.is_running()]
[interp for interp in interpreters.list_all() if not interp.is_running()]


There is an inherent race condition in doing that, at least if
interpreters are running in multiple threads (which I assume is going
to be the overwhelmingly dominant usage model).  That is why I'm proposing all
three variants.


An alternative to 3 variants would be:

interpreters.list_all(running=True)

interpreters.list_all(running=False)

interpreters.list_all(running=None)

[snip]


Re: [Python-Dev] PEP 554 v3 (new interpreters module)

2017-09-23 Thread Antoine Pitrou

Hi Eric,

On Fri, 22 Sep 2017 19:09:01 -0600
Eric Snow  wrote:
> 
> Please elaborate.  I'm interested in understanding what you mean here.
> Do you have some subinterpreter-based concurrency improvements in
> mind?  What aspect of CSP is the PEP following too faithfully?

See below the discussion of blocking send()s :-)

> As to "running_interpreters()" and "idle_interpreters()", I'm not sure
> what the benefit would be.  You can compose either list manually with
> a simple comprehension:
> 
> [interp for interp in interpreters.list_all() if interp.is_running()]
> [interp for interp in interpreters.list_all() if not interp.is_running()]

There is an inherent race condition in doing that, at least if
interpreters are running in multiple threads (which I assume is going
to be the overwhelmingly dominant usage model).  That is why I'm proposing all
three variants.

> >  I don't think it's a
> > coincidence that the most varied kinds of I/O (from socket or file IO
> > to threading Queues to multiprocessing Pipes) have non-blocking send().  
> 
> Interestingly, you can set sockets to blocking mode, in which case
> send() will block until there is room in the kernel buffer.

Yes, but there *is* a kernel buffer. Which is the whole point of my
comment: most alike primitives have internal buffering to prevent the
user-facing send() API from blocking in the common case.

> Likewise,
> queue.Queue.send() supports blocking, in addition to providing a
> put_nowait() method.

queue.Queue.put() never blocks in the usual case (*), which is that of an
unbounded queue.  Only bounded queues (created with an explicit
non-zero max_size parameter) can block in Queue.put().

(*) and therefore also never deadlocks :-)

> Note that the PEP provides "recv_nowait()" and "send_nowait()" (names
> inspired by queue.Queue), allowing for a non-blocking send.

True, but it's not the same thing at all.  In the objects I mentioned,
send() mostly doesn't block and doesn't fail either.  In your model,
send_nowait() will routinely fail with an error if a recipient isn't
immediately available to recv the data.

> > send() blocking until someone else calls recv() is not only bad for
> > performance,  
> 
> What is the performance problem?

Intuitively, there must be some kind of context switch (interpreter
switch?) at each send() call to let the other end receive the data,
since you don't have any internal buffering.

Also, suddenly an interpreter's ability to exploit CPU time is
dependent on another interpreter's ability to consume data in a timely
manner (what if the other interpreter is e.g. stuck on some disk I/O?).
IMHO it would be better not to have such coupling.

> > it also increases the likelihood of deadlocks.  
> 
> How much of a problem will deadlocks be in practice?

I expect more often than expected, in complex systems :-)  For example,
you could have a recv() loop that also from time to time send()s some
data on another queue, depending on what is received.  But if that
send()'s recipient also has the same structure (a recv() loop which
send()s from time to time), then it's easy to imagine the two getting
into a deadlock.

> (FWIW, CSP
> provides rigorous guarantees about deadlock detection (which Go
> leverages), though I'm not sure how much benefit that can offer such a
> dynamic language as Python.)

Hmm... deadlock detection is one thing, but when detected you must still
solve those deadlock issues, right?

> I'm not sure I understand your concern here.  Perhaps I used the word
> "sharing" too ambiguously?  By "sharing" I mean that the two actors
> have read access to something that at least one of them can modify.
> If they both only have read-only access then it's effectively the same
> as if they are not sharing.

Right.  What I mean is that you *can* share very simple "data" under
the form of synchronization primitives.  You may want to synchronize
your interpreters even if they don't share user-visible memory areas.  The
point of synchronization is not only to avoid memory corruption but
also to regulate and orchestrate processing amongst multiple workers
(for example processes or interpreters).  For example, a semaphore is
an easy way to implement "I want no more than N workers to do this
thing at the same time" ("this thing" can be something such as disk
I/O).
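
Within a single interpreter, threading already provides exactly that::

    import threading

    io_sem = threading.Semaphore(4)   # at most 4 workers may do disk I/O

    def worker(path):
        with io_sem:                  # blocks while 4 workers are inside
            with open(path, 'rb') as f:
                return f.read()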

Regards

Antoine.


Re: [Python-Dev] PEP 554 v3 (new interpreters module)

2017-09-22 Thread Eric Snow
Thanks for the feedback, Antoine.  Sorry for the delay; it's been a
busy week for me.  I just pushed an updated PEP to the repo.  Once
I've sorted out the question of passing bytes through channels I plan
on posting the PEP to the list again for another round of discussion.
In the meantime, I've replied below in-line.

-eric

On Mon, Sep 18, 2017 at 4:46 AM, Antoine Pitrou  wrote:
> First my high-level opinion about the PEP: the CSP model can probably
> be already implemented using Queues.  To me, the interesting promise of
> subinterpreters is if they allow to remove the GIL while sharing memory
> for big objects (such as Numpy arrays).  This means the PEP should
> probably focus on potential concurrency improvements rather than try to
> faithfully follow the CSP model.

Please elaborate.  I'm interested in understanding what you mean here.
Do you have some subinterpreter-based concurrency improvements in
mind?  What aspect of CSP is the PEP following too faithfully?

>> ``list_all()``::
>>
>>Return a list of all existing interpreters.
>
> See my naming proposal in the previous thread.

Sorry, your previous comment slipped through the cracks.  You suggested:

As for the naming, let's make it both unconfusing and explicit?
How about three functions: `all_interpreters()`, `running_interpreters()`
and `idle_interpreters()`, for example?

As to "all_interpreters()", I suppose it's the difference between
"interpreters.all_interpreters()" and "interpreters.list_all()".  To
me the latter looks better.

As to "running_interpreters()" and "idle_interpreters()", I'm not sure
what the benefit would be.  You can compose either list manually with
a simple comprehension:

[interp for interp in interpreters.list_all() if interp.is_running()]
[interp for interp in interpreters.list_all() if not interp.is_running()]

>>run(source_str, /, **shared):
>>
>>   Run the provided Python source code in the interpreter.  Any
>>   keyword arguments are added to the interpreter's execution
>>   namespace.
>
> "Execution namespace" specifically means the __main__ module in the
> target interpreter, right?

Right.  It's explained in more detail a little further down and
elsewhere in the PEP.  I've updated the PEP to explicitly mention
__main__ here too.

>>  If any of the values are not supported for sharing
>>   between interpreters then RuntimeError gets raised.  Currently
>>   only channels (see "create_channel()" below) are supported.
>>
>>   This may not be called on an already running interpreter.  Doing
>>   so results in a RuntimeError.
>
> I would distinguish between both error cases: RuntimeError for calling
> run() on an already running interpreter, ValueError for values which
> are not supported for sharing.

Good point.

>>   Likewise, if there is any uncaught
>>   exception, it propagates into the code where "run()" was called.
>
> That makes it a bit harder to differentiate with errors raised by run()
> itself (see above), though how much of an annoyance this is remains
> unclear.  The more litigious implication, though, is that it forces the
> interpreter to support migration of arbitrary objects from one
> interpreter to another (since a traceback keeps all local variables
> alive).

Yeah, the proposal to propagate exceptions out of the subinterpreter
is still rather weak.  I've added some notes to the PEP about this
open issue.

>> The mechanism for passing objects between interpreters is through
>> channels.  A channel is a simplex FIFO similar to a pipe.  The main
>> difference is that channels can be associated with zero or more
>> interpreters on either end.
>
> So it seems channels have become more complicated now?  Is it important
> to support multi-producer multi-consumer channels?

To me it made the API simpler.  The change did introduce the "close()"
method, which I suppose could be confusing.  However, I'm sure that in
practice it won't be.  In contrast, the FIFO/pipe-based API that I had
before required passing names around, required more calls, required
managing the channel/interpreter relationship more carefully, and made
it hard to follow that relationship.

>>  Unlike queues, which are also many-to-many,
>> channels have no buffer.
>
> How does it work?  Does send() block until someone else calls recv()?
> That does not sound like a good idea to me.

Correct "send()" blocks until the other end receives (if ever).
Likewise "recv()" blocks until the other end sends.  This specific
behavior is probably the main thing I borrowed from CSP.  It is *the*
synchronization mechanism.  Given the isolated nature of
subinterpreters, I consider using this concept from CSP to be a good
fit.

>  I don't think it's a
> coincidence that the most varied kinds of I/O (from socket or file IO
> to threading Queues to multiprocessing Pipes) have non-blocking send().

Interestingly, you can set sockets to blocking mode, in which case
send() will block until there is room in the kernel buffer.

Re: [Python-Dev] PEP 554 v3 (new interpreters module)

2017-09-18 Thread Antoine Pitrou

Hi,

First my high-level opinion about the PEP: the CSP model can probably
be already implemented using Queues.  To me, the interesting promise of
subinterpreters is if they allow to remove the GIL while sharing memory
for big objects (such as Numpy arrays).  This means the PEP should
probably focus on potential concurrency improvements rather than try to
faithfully follow the CSP model.

Other than that, a bunch of detailed comments follow:

On Wed, 13 Sep 2017 18:44:31 -0700
Eric Snow  wrote:
> 
> API for interpreters
> 
> 
> The module provides the following functions:
> 
> ``list_all()``::
> 
>Return a list of all existing interpreters.

See my naming proposal in the previous thread.

> 
>run(source_str, /, **shared):
> 
>   Run the provided Python source code in the interpreter.  Any
>   keyword arguments are added to the interpreter's execution
>   namespace.

"Execution namespace" specifically means the __main__ module in the
target interpreter, right?

>  If any of the values are not supported for sharing
>   between interpreters then RuntimeError gets raised.  Currently
>   only channels (see "create_channel()" below) are supported.
> 
>   This may not be called on an already running interpreter.  Doing
>   so results in a RuntimeError.

I would distinguish between both error cases: RuntimeError for calling
run() on an already running interpreter, ValueError for values which
are not supported for sharing.

>   Likewise, if there is any uncaught
>   exception, it propagates into the code where "run()" was called.

That makes it a bit harder to differentiate with errors raised by run()
itself (see above), though how much of an annoyance this is remains
unclear.  The more contentious implication, though, is that it forces the
interpreter to support migration of arbitrary objects from one
interpreter to another (since a traceback keeps all local variables
alive).

> API for sharing data
> 
> 
> The mechanism for passing objects between interpreters is through
> channels.  A channel is a simplex FIFO similar to a pipe.  The main
> difference is that channels can be associated with zero or more
> interpreters on either end.

So it seems channels have become more complicated now?  Is it important
to support multi-producer multi-consumer channels?

>  Unlike queues, which are also many-to-many,
> channels have no buffer.

How does it work?  Does send() block until someone else calls recv()?
That does not sound like a good idea to me.  I don't think it's a
coincidence that the most varied kinds of I/O (from socket or file IO
to threading Queues to multiprocessing Pipes) have non-blocking send().

send() blocking until someone else calls recv() is not only bad for
performance, it also increases the likelihood of deadlocks.

>recv_nowait(default=None):
> 
>   Return the next object from the channel.  If none have been sent
>   then return the default.  If the channel has been closed
>   then EOFError is raised.
> 
>close():
> 
>   No longer associate the current interpreter with the channel (on
>   the receiving end).  This is a noop if the interpreter isn't
>   already associated.  Once an interpreter is no longer associated
>   with the channel, subsequent (or current) send() and recv() calls
>   from that interpreter will raise EOFError.

EOFError normally means the *other* (sending) side has closed the
channel (but it becomes complicated with a multi-producer multi-consumer
setup...). When *this* side has closed the channel, we should raise
ValueError.

>  The Python runtime
>   will garbage collect all closed channels.  Note that "close()" is
>   automatically called when it is no longer used in the current
>   interpreter.

"No longer used" meaning it loses all references in this interpreter?

>send(obj):
> 
>Send the object to the receiving end of the channel.  Wait until
>the object is received.  If the channel does not support the
>object then TypeError is raised.  Currently only bytes are
>supported.  If the channel has been closed then EOFError is
>raised.

Similar remark as above (EOFError vs. ValueError).
More generally, send() raising EOFError sounds unheard of.

A sidenote: context manager support (__enter__ / __exit__) on channels
would sound more useful to me than iteration support.
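
Something along these lines, presumably (hypothetical; the PEP doesn't
currently spell this out)::

    recv_ch, send_ch = interpreters.create_channel()
    with send_ch:              # hypothetically calls close() on exit
        send_ch.send(b'spam')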

> Initial support for buffers in channels
> ---
> 
> An alternative to support for bytes in channels is support for
> read-only buffers (the PEP 3119 kind).

Probably you mean PEP 3118.

> Then ``recv()`` would return
> a memoryview to expose the buffer in a zero-copy way.

> It will probably not do much if you can only pass buffers and not
structured objects, because unserializing (e.g. unpickling) from a
buffer will still copy memory around.

To pass a Numpy array, for example, you not only need to pass its raw
memory buffer but also the metadata (such as dtype, shape and strides)
needed to reconstruct the array on the other side.

Re: [Python-Dev] PEP 554 v3 (new interpreters module)

2017-09-14 Thread Nick Coghlan
On 14 September 2017 at 11:44, Eric Snow  wrote:
> About Subinterpreters
> =
>
> Shared data
> ---

[snip]

> To make this work, the mutable shared state will be managed by the
> Python runtime, not by any of the interpreters.  Initially we will
> support only one type of objects for shared state: the channels provided
> by ``create_channel()``.  Channels, in turn, will carefully manage
> passing objects between interpreters.

Something I think you may want to explicitly call out as *not* being
shared is the thread objects in threading.enumerate(), as the way that
works in the current implementation makes sense, but isn't
particularly obvious (what I have below comes from experimenting with
your branch at https://github.com/python/cpython/pull/1748).

Specifically, what happens is that the operating system thread
underlying the existing interpreter thread that calls interp.run()
gets borrowed as the operating system thread underlying the MainThread
object in the called interpreter. That MainThread object then gets
preserved in the interpreter's interpreter state, but the mapping to
an underlying OS thread will change freely based on who's calling into
it. From outside an interpreter, you *can't* request to run code in
subthreads directly - you'll always run your given code in the main
thread, and it will be up to that to dispatch requests to subthreads.
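
For example (a sketch; the thread body is a placeholder)::

    interp.run("""if True:
        import threading
        t = threading.Thread(target=print, args=('in a subthread',))
        t.start()
        t.join()
        """)

The injected code runs in the subinterpreter's main thread, which then
spawns and manages its own subthread.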

Beyond the thread lending that happens when you call interp.run()
(where one of your threads gets borrowed as the other interpreter's
main thread), each interpreter otherwise maintains a completely
disjoint set of thread objects that it is solely responsible for.

This also clarifies for me what it means for an interpreter to be a
"main" interpreter: it's the interpreter who's main thread actually
corresponds to the main thread of the overall operating system
process, rather than being temporarily borrowed from another
interpreter.

We're going to have to put some thought into how we want that to
interact with the signal handling logic - right now, I believe *any*
main thread will consider it its responsibility to process signals
delivered to the runtime (and embedding application avoid the
potential problems arising from that by simply not installing the
CPython signal handlers in the first place), and we probably want to
change that condition to be "the main thread in the main interpreter".

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] PEP 554 v3 (new interpreters module)

2017-09-14 Thread Nick Coghlan
On 15 September 2017 at 12:04, Nathaniel Smith  wrote:
> On Thu, Sep 14, 2017 at 5:44 PM, Nick Coghlan  wrote:
>> The reason we're OK with this is that it means that only reading a new
>> message from a channel (i.e creating a cross-interpreter view) or
>> discarding a previously read message (i.e. closing a cross-interpreter
>> view) will be synchronisation points where the receiving interpreter
>> necessarily needs to acquire the sending interpreter's GIL.
>>
>> By contrast, if we allow an actual bytes object to be shared, then
>> either every INCREF or DECREF on that bytes object becomes a
>> synchronisation point, or else we end up needing some kind of
>> secondary per-interpreter refcount where the interpreter doesn't drop
>> its shared reference to the original object in its source interpreter
>> until the internal refcount in the borrowing interpreter drops to
>> zero.
>
> Ah, that makes more sense.
>
> I am nervous that allowing arbitrary memoryviews gives a *little* more
> power than we need or want. I like that the current API can reasonably
> be emulated using subprocesses -- it opens up the door for backports,
> compatibility support on language implementations that don't support
> subinterpreters, direct benchmark comparisons between the two
> implementation strategies, etc. But if we allow arbitrary memoryviews,
> then this requires that you can take (a) an arbitrary object, not
> specified ahead of time, and (b) provide two read-write views on it in
> separate interpreters such that modifications made in one are
> immediately visible in the other. Subprocesses can do one or the other
> -- they can copy arbitrary data, and if you warn them ahead of time
> when you allocate the buffer, they can do real zero-copy shared
> memory. But the combination is really difficult.

One constraint we'd want to impose is that the memory view in the
receiving interpreter should always be read-only - while we don't
currently expose the ability to request that at the Python layer,
memoryviews *do* support the creation of read-only views at the C API
layer (which then gets reported to Python code via the "view.readonly"
attribute).

While that change alone is enough to preserve the simplex nature of
the channel, it wouldn't be enough to prevent the *sender* from
mutating the buffer contents and having that change be visible in the
recipient.

In that regard it may make sense to maintain both restrictions
initially (as you suggested below): only accept bytes on the sending
side (to prevent mutation by the sender), and expose that as a
read-only memory view on the receiving side (to allow for zero-copy
data sharing without allowing mutation by the receiver).

> It'd be one thing if this were like a key feature that gave
> subinterpreters an advantage over subprocesses, but it seems really
> unlikely to me that a library won't know ahead of time when it's
> filling in a buffer to be transferred, and if anything it seems like
> we'd rather not expose read-write shared mappings in any case. It's
> extremely non-trivial to do right [1].
>
> tl;dr: let's not rule out a useful implementation strategy based on a
> feature we don't actually need.

Yeah, the description Eric currently has in the PEP is a summary of a
much longer suggestion Yury, Neil Schumenauer and I put together while
waiting for our flights following the core dev sprint, and the full
version had some of these additional constraints on it (most notably
the "read-only in the receiving interpreter" one).

> One alternative would be your option (3) -- you can put bytes in and
> get memoryviews out, and since bytes objects are immutable it's OK.

Indeed, I think that will be a sensible starting point. However, I
genuinely want to allow for zero-copy sharing of NumPy arrays
eventually, as that's where I think this idea gets most interesting:
the potential to allow for multiple parallel read operations on a
given NumPy array *in Python* (rather than Cython or C) without
running afoul of the GIL, and without needing to mess about with the
complexities of operating system level IPC.

 Handling an exception
>> That way channels can be a namespace *specifically* for passing in
>> channels, and can be reported as such on RunResult. If we decide to
>> allow arbitrary shared objects in the future, or add flag options like
>> "reraise=True" to reraise exceptions from the subinterpreter in the
>> current interpreter, we'd have that ability, rather than having the
>> entire potential keyword namespace taken up for passing shared
>> objects.
>
> Would channels be a dict, or...?

Yeah, it would be a direct replacement for the way the current draft
is proposing to use the keywords dict - it would just be a separate
dictionary instead.
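
i.e. something like this (shape only, not a settled API)::

    recv_ch, send_ch = interpreters.create_channel()
    interp.run(source, channels={'in': recv_ch, 'out': send_ch})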

It does occur to me that if we wanted to align with the way the
`runpy` module spells that concept, we'd call the option
`init_globals`, but I'm thinking it will be better to only allow
channels to be passed through directly, and require any other objects
to be sent through the channels themselves.

Re: [Python-Dev] PEP 554 v3 (new interpreters module)

2017-09-14 Thread Nathaniel Smith
On Thu, Sep 14, 2017 at 5:44 PM, Nick Coghlan  wrote:
> On 14 September 2017 at 15:27, Nathaniel Smith  wrote:
>> I don't get it. With bytes, you can either share objects or copy them and
>> the user can't tell the difference, so you can change your mind later if you
>> want.
>> But memoryviews require some kind of cross-interpreter strong
>> reference to keep the underlying buffer object alive. So if you want to
>> minimize object sharing, surely bytes are more future-proof.
>
> Not really, because the only way to ensure object separation (i.e. no
> refcounted objects accessible from multiple interpreters at once) with
> a bytes-based API would be to either:
>
> 1. Always copy (eliminating most of the low overhead communications
> benefits that subinterpreters may offer over multiple processes)
> 2. Make the bytes implementation more complicated by allowing multiple
> bytes objects to share the same underlying storage while presenting as
> distinct objects in different interpreters
> 3. Make the output on the receiving side not actually a bytes object,
> but instead a view onto memory owned by another object in a different
> interpreter (a "memory view", one might say)
>
> And yes, using memory views for this does mean defining either a
> subclass or a mediating object that not only keeps the originating
> object alive until the receiving memoryview is closed, but also
> retains a reference to the originating interpreter so that it can
> switch to it when it needs to manipulate the source object's refcount
> or call one of the buffer methods.
>
> Yury and I are fine with that, since it means that either the sender
> *or* the receiver can decide to copy the data (e.g. by calling
> bytes(obj) before sending, or bytes(view) after receiving), and in the
> meantime, the object holding the cross-interpreter view knows that it
> needs to switch interpreters (and hence acquire the sending
> interpreter's GIL) before doing anything with the source object.
>
> The reason we're OK with this is that it means that only reading a new
> message from a channel (i.e. creating a cross-interpreter view) or
> discarding a previously read message (i.e. closing a cross-interpreter
> view) will be synchronisation points where the receiving interpreter
> necessarily needs to acquire the sending interpreter's GIL.
>
> By contrast, if we allow an actual bytes object to be shared, then
> either every INCREF or DECREF on that bytes object becomes a
> synchronisation point, or else we end up needing some kind of
> secondary per-interpreter refcount where the interpreter doesn't drop
> its shared reference to the original object in its source interpreter
> until the internal refcount in the borrowing interpreter drops to
> zero.

Ah, that makes more sense.

I am nervous that allowing arbitrary memoryviews gives a *little* more
power than we need or want. I like that the current API can reasonably
be emulated using subprocesses -- it opens up the door for backports,
compatibility support on language implementations that don't support
subinterpreters, direct benchmark comparisons between the two
implementation strategies, etc. But if we allow arbitrary memoryviews,
then this requires that you can take (a) an arbitrary object, not
specified ahead of time, and (b) provide two read-write views on it in
separate interpreters such that modifications made in one are
immediately visible in the other. Subprocesses can do one or the other
-- they can copy arbitrary data, and if you warn them ahead of time
when you allocate the buffer, they can do real zero-copy shared
memory. But the combination is really difficult.

It'd be one thing if this were like a key feature that gave
subinterpreters an advantage over subprocesses, but it seems really
unlikely to me that a library won't know ahead of time when it's
filling in a buffer to be transferred, and if anything it seems like
we'd rather not expose read-write shared mappings in any case. It's
extremely non-trivial to do right [1].

tl;dr: let's not rule out a useful implementation strategy based on a
feature we don't actually need.

One alternative would be your option (3) -- you can put bytes in and
get memoryviews out, and since bytes objects are immutable it's OK.

[1] https://en.wikipedia.org/wiki/Memory_model_(programming)

>>> Handling an exception
>>> -
>> It would also be reasonable to simply not return any value/exception from
>> run() at all, or maybe just a bool for whether there was an unhandled
>> exception. Any high level API is going to be injecting code on both sides of
>> the interpreter boundary anyway, so it can do whatever exception and
>> traceback translation it wants to.
>
> So any more detailed response would *have* to come back as a channel message?
>
> That sounds like a reasonable option to me, too, especially since
> module level code doesn't have a return value as such - you can really
> only say "it raised an exception (and this was the exception it

Re: [Python-Dev] PEP 554 v3 (new interpreters module)

2017-09-14 Thread Nick Coghlan
On 14 September 2017 at 15:27, Nathaniel Smith  wrote:
> On Sep 13, 2017 9:01 PM, "Nick Coghlan"  wrote:
>
> On 14 September 2017 at 11:44, Eric Snow 
> wrote:
>>send(obj):
>>
>>Send the object to the receiving end of the channel.  Wait until
>>the object is received.  If the channel does not support the
>>object then TypeError is raised.  Currently only bytes are
>>supported.  If the channel has been closed then EOFError is
>>raised.
>
> I still expect any form of object sharing to hinder your
> per-interpreter GIL efforts, so restricting the initial implementation
> to memoryview-only seems more future-proof to me.
>
>
> I don't get it. With bytes, you can either share objects or copy them and
> the user can't tell the difference, so you can change your mind later if you
> want.
> But memoryviews require some kind of cross-interpreter strong
> reference to keep the underlying buffer object alive. So if you want to
> minimize object sharing, surely bytes are more future-proof.

Not really, because the only way to ensure object separation (i.e. no
refcounted objects accessible from multiple interpreters at once) with
a bytes-based API would be to either:

1. Always copy (eliminating most of the low overhead communications
benefits that subinterpreters may offer over multiple processes)
2. Make the bytes implementation more complicated by allowing multiple
bytes objects to share the same underlying storage while presenting as
distinct objects in different interpreters
3. Make the output on the receiving side not actually a bytes object,
but instead a view onto memory owned by another object in a different
interpreter (a "memory view", one might say)

And yes, using memory views for this does mean defining either a
subclass or a mediating object that not only keeps the originating
object alive until the receiving memoryview is closed, but also
retains a reference to the originating interpreter so that it can
switch to it when it needs to manipulate the source object's refcount
or call one of the buffer methods.

Yury and I are fine with that, since it means that either the sender
*or* the receiver can decide to copy the data (e.g. by calling
bytes(obj) before sending, or bytes(view) after receiving), and in the
meantime, the object holding the cross-interpreter view knows that it
needs to switch interpreters (and hence acquire the sending
interpreter's GIL) before doing anything with the source object.

The reason we're OK with this is that it means that only reading a new
message from a channel (i.e. creating a cross-interpreter view) or
discarding a previously read message (i.e. closing a cross-interpreter
view) will be synchronisation points where the receiving interpreter
necessarily needs to acquire the sending interpreter's GIL.

By contrast, if we allow an actual bytes object to be shared, then
either every INCREF or DECREF on that bytes object becomes a
synchronisation point, or else we end up needing some kind of
secondary per-interpreter refcount where the interpreter doesn't drop
its shared reference to the original object in its source interpreter
until the internal refcount in the borrowing interpreter drops to
zero.

>> Handling an exception
>> -
> It would also be reasonable to simply not return any value/exception from
> run() at all, or maybe just a bool for whether there was an unhandled
> exception. Any high level API is going to be injecting code on both sides of
> the interpreter boundary anyway, so it can do whatever exception and
> traceback translation it wants to.

So any more detailed response would *have* to come back as a channel message?

That sounds like a reasonable option to me, too, especially since
module level code doesn't have a return value as such - you can really
only say "it raised an exception (and this was the exception it
raised)" or "it reached the end of the code without raising an
exception".

Given that, I think subprocess.run() (with check=False) is the right
API precedent here:
https://docs.python.org/3/library/subprocess.html#subprocess.run

That always returns subprocess.CompletedProcess, and then you can call
"cp.check_returncode()" to get it to raise
subprocess.CalledProcessError for non-zero return codes.

For interpreter.run(), we could keep the initial RunResult *really*
simple and only report back:

* source: the source code passed to run()
* shared: the keyword args passed to run() (name chosen to match
functools.partial)
* completed: completed execution without raising an exception? (True
if yes, False otherwise)

Whether or not to report more details for a raised exception, and
provide some mechanism to reraise it in the calling interpreter could
then be deferred until later.
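
A minimal sketch of that result type (nothing like it exists yet; the
checking helper mirrors CompletedProcess.check_returncode()):

    from collections import namedtuple

    RunResult = namedtuple("RunResult", ["source", "shared", "completed"])

    def check_completed(result):
        # Opt-in error handling, as with subprocess.CompletedProcess
        if not result.completed:
            raise RuntimeError("code raised in the subinterpreter")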

The subprocess.run() comparison does make me wonder whether this might
be a more future-proof signature for Interpreter.run() though:

def run(source_str, /, *, channels=None):
...

That way channels can be passed in explicitly as a keyword-only
argument, leaving room to grow the signature later without breaking
existing callers.
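
For example (entirely hypothetical, since the channel-binding mechanism
is still an open question; this assumes run() would bind each given
channel end into the subinterpreter's __main__ under the supplied name,
and uses a worker thread since send() blocks until the object is
received):

    import concurrent.futures

    recv, send = interpreters.create_channel()
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        pool.submit(interp.run, "results.send(b'done')",
                    channels={"results": send})
        data = recv.recv()   # b'done'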

Re: [Python-Dev] PEP 554 v3 (new interpreters module)

2017-09-13 Thread Nathaniel Smith
On Sep 13, 2017 9:01 PM, "Nick Coghlan"  wrote:

On 14 September 2017 at 11:44, Eric Snow 
wrote:
> send(obj):
>
>     Send the object to the receiving end of the channel.  Wait until
>     the object is received.  If the channel does not support the
>     object then TypeError is raised.  Currently only bytes are
>     supported.  If the channel has been closed then EOFError is
>     raised.

I still expect any form of object sharing to hinder your
per-interpreter GIL efforts, so restricting the initial implementation
to memoryview-only seems more future-proof to me.


I don't get it. With bytes, you can either share objects or copy them and
the user can't tell the difference, so you can change your mind later if
you want. But memoryviews require some kind of cross-interpreter strong
reference to keep the underlying buffer object alive. So if you want to
minimize object sharing, surely bytes are more future-proof.


> Handling an exception
> ---------------------
>
> ::
>
>    interp = interpreters.create()
>    try:
>        interp.run("""if True:
>            raise KeyError
>            """)
>    except KeyError:
>        print("got the error from the subinterpreter")

As with the message passing through channels, I think you'll really
want to minimise any kind of implicit object sharing that may
interfere with future efforts to make the GIL truly an *interpreter*
lock, rather than the global process lock that it is currently.

One possible way to approach that would be to make the low level run()
API a more Go-style API rather than a Python-style one, and have it
return a (result, err) 2-tuple. "err.raise()" would then translate the
foreign interpreter's exception into a local interpreter exception,
but the *traceback* for that exception would be entirely within the
current interpreter.


It would also be reasonable to simply not return any value/exception from
run() at all, or maybe just a bool for whether there was an unhandled
exception. Any high level API is going to be injecting code on both sides
of the interpreter boundary anyway, so it can do whatever exception and
traceback translation it wants to.


> Resetting __main__
> ------------------
>
> As proposed, every call to ``Interpreter.run()`` will execute in the
> namespace of the interpreter's existing ``__main__`` module.  This means
> that data persists there between ``run()`` calls.  Sometimes this isn't
> desirable and you want to execute in a fresh ``__main__``.  Also,
> you don't necessarily want to leak objects there that you aren't using
> any more.
>
> Solutions include:
>
> * a ``create()`` arg to indicate resetting ``__main__`` after each
>   ``run`` call
> * an ``Interpreter.reset_main`` flag to support opting in or out
>   after the fact
> * an ``Interpreter.reset_main()`` method to opt in when desired
>
> This isn't a critical feature initially.  It can wait until later
> if desirable.

I was going to note that you can already do this:

interp.run("globals().clear()")

However, that turns out to clear *too* much, since it also clobbers
all the __dunder__ attributes that the interpreter needs in a code
execution environment.

Either way, if you added this, I think it would make more sense as an
"importlib.util.reset_globals()" operation, rather than have it be
something specific to subinterpreters.


This is another point where the API could reasonably say that if you want
clean namespaces then you should do that yourself (e.g. by setting up your
own globals dict and using it to execute any post-bootstrap code).
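
Either way it's only a few lines; e.g. a dunder-preserving reset (a
sketch only, not an existing importlib.util helper) could look like:

    def reset_globals(ns):
        """Clear a namespace but keep the __dunder__ machinery intact."""
        for name in list(ns):
            if not (name.startswith("__") and name.endswith("__")):
                del ns[name]

Running that against globals() inside the subinterpreter gives the
fresh-__main__ behaviour without clobbering the execution environment.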

-n


Re: [Python-Dev] PEP 554 v3 (new interpreters module)

2017-09-13 Thread Yury Selivanov
On Wed, Sep 13, 2017 at 11:56 PM, Nick Coghlan  wrote:
[..]
>> send(obj):
>>
>>     Send the object to the receiving end of the channel.  Wait until
>>     the object is received.  If the channel does not support the
>>     object then TypeError is raised.  Currently only bytes are
>>     supported.  If the channel has been closed then EOFError is
>>     raised.
>
> I still expect any form of object sharing to hinder your
> per-interpreter GIL efforts, so restricting the initial implementation
> to memoryview-only seems more future-proof to me.

+1.  Working with memoryviews is as convenient as with bytes.

Yury


Re: [Python-Dev] PEP 554 v3 (new interpreters module)

2017-09-13 Thread Nick Coghlan
On 14 September 2017 at 11:44, Eric Snow  wrote:
> I've updated PEP 554 in response to feedback.  (thanks all!)  There
> are a few unresolved points (some of them added to the Open Questions
> section), but the current PEP has changed enough that I wanted to get
> it out there first.
>
> Notably changed:
>
> * the API relative to object passing has changed somewhat drastically
> (hopefully simpler and easier to understand), replacing "FIFO" with
> "channel"
> * added an examples section
> * added an open questions section
> * added a rejected ideas section
> * added more items to the deferred functionality section
> * the rationale section has moved down below the examples
>
> Please let me know what you think.  I'm especially interested in
> feedback about the channels.  Thanks!

I like the new pipe-like channels API more than the previous named
FIFO approach :)

> send(obj):
>
>     Send the object to the receiving end of the channel.  Wait until
>     the object is received.  If the channel does not support the
>     object then TypeError is raised.  Currently only bytes are
>     supported.  If the channel has been closed then EOFError is
>     raised.

I still expect any form of object sharing to hinder your
per-interpreter GIL efforts, so restricting the initial implementation
to memoryview-only seems more future-proof to me.


> Pre-populate an interpreter
> ---------------------------
>
> ::
>
>    interp = interpreters.create()
>    interp.run("""if True:
>        import some_lib
>        import an_expensive_module
>        some_lib.set_up()
>        """)
>    wait_for_request()
>    interp.run("""if True:
>        some_lib.handle_request()
>        """)

I find the "if True:"'s sprinkled through the examples distracting, so
I'd prefer either:

1. Using textwrap.dedent; or
2. Assigning the code to a module level attribute

::

    interp = interpreters.create()
    setup_code = """\
    import some_lib
    import an_expensive_module
    some_lib.set_up()
    """
    interp.run(setup_code)
    wait_for_request()

    handler_code = """\
    some_lib.handle_request()
    """
    interp.run(handler_code)
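
For completeness, the textwrap.dedent spelling (option 1) of the same
setup step would be:

    import textwrap

    interp.run(textwrap.dedent("""\
        import some_lib
        import an_expensive_module
        some_lib.set_up()
        """))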

> Handling an exception
> ---------------------
>
> ::
>
>    interp = interpreters.create()
>    try:
>        interp.run("""if True:
>            raise KeyError
>            """)
>    except KeyError:
>        print("got the error from the subinterpreter")

As with the message passing through channels, I think you'll really
want to minimise any kind of implicit object sharing that may
interfere with future efforts to make the GIL truly an *interpreter*
lock, rather than the global process lock that it is currently.

One possible way to approach that would be to make the low level run()
API a more Go-style API rather than a Python-style one, and have it
return a (result, err) 2-tuple. "err.raise()" would then translate the
foreign interpreter's exception into a local interpreter exception,
but the *traceback* for that exception would be entirely within the
current interpreter.
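
As a sketch (hypothetical names; since "raise" is a keyword, the method
would need a different spelling in practice, e.g. reraise()):

    result, err = interp.run("""if True:
        raise KeyError
        """)
    if err is not None:
        err.reraise()  # re-raised locally, with a purely local traceback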

> About Subinterpreters
> =====================
>
> Shared data
> -----------
>
> Subinterpreters are inherently isolated (with caveats explained below),
> in contrast to threads.  This enables a different concurrency model than
> is currently readily available in Python.
> `Communicating Sequential Processes`_ (CSP) is the prime example.
>
> A key component of this approach to concurrency is message passing.  So
> providing a message/object passing mechanism alongside ``Interpreter``
> is a fundamental requirement.  This proposal includes a basic mechanism
> upon which more complex machinery may be built.  That basic mechanism
> draws inspiration from pipes, queues, and CSP's channels. [fifo]_
>
> The key challenge here is that sharing objects between interpreters
> faces complexity due in part to CPython's current memory model.
> Furthermore, in this class of concurrency, the ideal is that objects
> only exist in one interpreter at a time.  However, this is not practical
> for Python so we initially constrain supported objects to ``bytes``.
> There are a number of strategies we may pursue in the future to expand
> supported objects and object sharing strategies.
>
> Note that the complexity of object sharing increases as subinterpreters
> become more isolated, e.g. after GIL removal.  So the mechanism for
> message passing needs to be carefully considered.  Keeping the API
> minimal and initially restricting the supported types helps us avoid
> further exposing any underlying complexity to Python users.
>
> To make this work, the mutable shared state will be managed by the
> Python runtime, not by any of the interpreters.  Initially we will
> support only one type of objects for shared state: the channels provided
> by ``create_channel()``.  Channels, in turn, will carefully manage
> passing objects between interpreters.

Interpreters themselves will also need to be shared objects, as:

- they all have access to "