Re: [Python-Dev] PEP 554 v3 (new interpreters module)
On 14 September 2017 at 11:44, Eric Snow wrote:
> Examples
> ========
>
> Run isolated code
> -----------------
>
> ::
>
>     interp = interpreters.create()
>     print('before')
>     interp.run('print("during")')
>     print('after')

A few more suggestions for examples:

Running a module:

    main_module = mod_name
    interp.run(f"import runpy; runpy.run_module({main_module!r})")

Running as script (including zip archives & directories):

    main_script = path_name
    interp.run(f"import runpy; runpy.run_path({main_script!r})")

Running in a thread pool executor:

    interps = [interpreters.create() for i in range(5)]
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(interps)) as pool:
        print('before')
        for interp in interps:
            pool.submit(interp.run, 'print("starting"); print("stopping")')
        print('after')

That last one is prompted by the questions about the benefits of keeping
the notion of an interpreter state distinct from the notion of a main
thread (it allows a single "MainThread" object to be mapped to different
OS level threads at different points in time, which means it's easier to
combine with existing constructs for managing OS level thread pools).

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
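[Editor's note: the `interpreters` module proposed by the PEP doesn't exist yet, so the thread-pool example above can't be run as-is. The sketch below emulates the same submission pattern with a hypothetical `FakeInterpreter` stand-in whose run() simply exec()s code in a private namespace — purely to illustrate how interpreter-like objects compose with ThreadPoolExecutor; none of these names come from the PEP.]

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for the PEP's interpreter objects: each "interpreter"
# is just an isolated dict namespace, not a real subinterpreter.
class FakeInterpreter:
    def __init__(self):
        self.namespace = {}

    def run(self, code):
        exec(code, self.namespace)

interps = [FakeInterpreter() for _ in range(5)]
with ThreadPoolExecutor(max_workers=len(interps)) as pool:
    # Each submitted run() executes on whatever pool thread is free,
    # mirroring how a MainThread could map to different OS threads.
    futures = [pool.submit(interp.run, "x = 1 + 1") for interp in interps]
    for f in futures:
        f.result()  # propagate any exception from the worker

# Every namespace got its own independent copy of x.
assert all(i.namespace["x"] == 2 for i in interps)
```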
Re: [Python-Dev] PEP 554 v3 (new interpreters module)
On 7 October 2017 at 02:29, Koos Zevenhoven wrote:
> While I'm actually trying not to say much here so that I can avoid this
> discussion now, here's just a couple of ideas and thoughts from me at
> this point:
>
> (A)
> Instead of sending bytes and receiving memoryviews, one could consider
> sending *and* receiving memoryviews for now. That could then be
> extended into more types of objects in the future without changing the
> basic concept of the channel. Probably, the memoryview would need to be
> copied (but not the data of course). But I'm guessing copying a
> memoryview would be quite fast.

The proposal is to allow sending any buffer-exporting object, so sending
a memoryview would be supported.

> This would hopefully require less API changes or additions in the
> future. OTOH, giving it a different name like MemChannel or making it
> 3rd party will buy some more time to figure out the right API. But
> maybe that's not needed.

I think having both a memory-centric data channel and an object-centric
data channel would be useful long term, so I don't see a lot of
downsides to starting with the easier-to-implement MemChannel, and then
looking at how to define a plain Channel later.

For example, it occurs to me that the closest current equivalent we have
to an object level counterpart to the memory buffer protocol would be
the weak reference protocol, wherein a multi-interpreter-aware proxy
object could actually take care of switching interpreters as needed when
manipulating reference counts.
While weakrefs themselves wouldn't be usable in the general case (many
builtin types don't support weak references, and we'd want to support
strong cross-interpreter references anyway), a wrapt-style object proxy
would provide us with a way to maintain a single strong reference to the
original object in its originating interpreter (implicitly switching to
that interpreter as needed), while also maintaining a regular local
reference count on the proxy object in the receiving interpreter.

And here's the neat thing: since subinterpreters share an address space,
it would be possible to experiment with an object-proxy based channel by
passing object pointers over a memoryview based channel.

> (B)
> We would probably then like to pretend that the object coming out the
> other end of a Channel *is* the original object. As long as these
> channels are the only way to directly pass objects between
> interpreters, there are essentially only two ways to tell the
> difference (AFAICT):
>
> 1. Calling id(...) and sending it over to the other interpreter and
> checking if it's the same.
>
> 2. When the same object is sent twice to the same interpreter. Then
> one can compare the two with id(...) or using the `is` operator.
>
> There are solutions to the problems too:
>
> 1. Send the id() from the sending interpreter along with the sent
> object so that the receiving interpreter can somehow attach it to the
> object and then return it from id(...).
>
> 2. When an object is received, make a lookup in an interpreter-wide
> cache to see if an object by this id has already been received. If
> yes, take that one.
>
> Now it should essentially look like the received object is really "the
> same one" as in the sending interpreter. This should also work with
> multiple interpreters and multiple channels, as long as the id is
> always preserved.
I don't personally think we want to expend much (if any) effort on
presenting the illusion that the objects on either end of the channel
are the "same" object, but postponing the question entirely is also one
of the benefits I see to starting with MemChannel, and leaving the
object-centric Channel until later.

> (C)
> One further complication regarding memoryview in general is that
> .release() should probably be propagated to the sending interpreter
> somehow.

Yep, switching interpreters when releasing the buffer is the main reason
you couldn't use a regular memoryview for this purpose - you need a
variant that holds a strong reference to the sending interpreter, and
switches back to it for the buffer release operation.

> (D)
> I think someone already mentioned this one, but would it not be better
> to start a new interpreter in the background in a new thread by
> default? I think this would make things simpler and leave more freedom
> regarding the implementation in the future. If you need to run an
> interpreter within the current thread, you could perhaps optionally do
> that too.

Not really, as that approach doesn't compose as well with existing
thread management primitives like concurrent.futures.ThreadPoolExecutor.
It also doesn't match the way the existing subinterpreter machinery
works, where threads can change their active interpreter.

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
Re: [Python-Dev] PEP 554 v3 (new interpreters module)
While I'm actually trying not to say much here so that I can avoid this
discussion now, here's just a couple of ideas and thoughts from me at
this point:

(A)
Instead of sending bytes and receiving memoryviews, one could consider
sending *and* receiving memoryviews for now. That could then be extended
into more types of objects in the future without changing the basic
concept of the channel. Probably, the memoryview would need to be copied
(but not the data of course). But I'm guessing copying a memoryview
would be quite fast.

This would hopefully require fewer API changes or additions in the
future. OTOH, giving it a different name like MemChannel or making it
3rd party will buy some more time to figure out the right API. But maybe
that's not needed.

(B)
We would probably then like to pretend that the object coming out the
other end of a Channel *is* the original object. As long as these
channels are the only way to directly pass objects between interpreters,
there are essentially only two ways to tell the difference (AFAICT):

1. Calling id(...) and sending it over to the other interpreter and
checking if it's the same.

2. When the same object is sent twice to the same interpreter. Then one
can compare the two with id(...) or using the `is` operator.

There are solutions to the problems too:

1. Send the id() from the sending interpreter along with the sent object
so that the receiving interpreter can somehow attach it to the object
and then return it from id(...).

2. When an object is received, make a lookup in an interpreter-wide
cache to see if an object by this id has already been received. If yes,
take that one.

Now it should essentially look like the received object is really "the
same one" as in the sending interpreter. This should also work with
multiple interpreters and multiple channels, as long as the id is always
preserved.
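[Editor's note: point (B).2 — an interpreter-wide cache keyed on the sender's id() — can be sketched in plain Python. Everything here (`send`, `recv`, `_recv_cache`, and the `queue.Queue` standing in for a channel) is a hypothetical illustration, not part of the PEP's API.]

```python
import queue

# Receiving side keeps an interpreter-wide cache keyed by the sender's
# id(), so the "same" object sent twice resolves to one local object.
_recv_cache = {}

def send(ch, obj):
    # The payload travels as a copy, tagged with the sender-side id.
    ch.put((id(obj), bytes(obj)))

def recv(ch):
    sender_id, payload = ch.get()
    if sender_id not in _recv_cache:
        _recv_cache[sender_id] = payload
    return _recv_cache[sender_id]

ch = queue.Queue()
data = b"hello"
send(ch, data)
send(ch, data)
a, b = recv(ch), recv(ch)
assert a is b  # both receives yield the identical local object
```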
(C)
One further complication regarding memoryview in general is that
.release() should probably be propagated to the sending interpreter
somehow.

(D)
I think someone already mentioned this one, but would it not be better
to start a new interpreter in the background in a new thread by default?
I think this would make things simpler and leave more freedom regarding
the implementation in the future. If you need to run an interpreter
within the current thread, you could perhaps optionally do that too.

––Koos

PS. I have lots of thoughts related to this, but I can't afford to
engage in them now. (Anyway, it's probably more urgent to get some stuff
with PEP 555 and its spin-off thoughts out of the way).

On Fri, Oct 6, 2017 at 6:38 AM, Nick Coghlan wrote:
> On 6 October 2017 at 11:48, Eric Snow wrote:
>> > And that's the real pay-off that comes from defining this in terms
>> > of the memoryview protocol: Py_buffer structs *aren't* Python
>> > objects, so it's only a regular C struct that gets passed across
>> > the interpreter boundary (the reference to the original objects
>> > gets carried along passively as part of the CIV - it never gets
>> > *used* in the receiving interpreter).
>>
>> Yeah, the (PEP 3118) buffer protocol offers precedent in a number of
>> ways that are applicable to channels here. I'm simply reticent to
>> lock PEP 554 into such a specific solution as the buffer-specific
>> CIV. I'm trying to accommodate anticipated future needs while keeping
>> the PEP as simple and basic as possible. It's driving me nuts! :P
>> Things were *much* simpler before I added Channels to the PEP. :)
>
> Starting with memory-sharing only doesn't lock us into anything, since
> you can still add a more flexible kind of channel based on a different
> protocol later if it turns out that memory sharing isn't enough.
> By contrast, if you make the initial channel semantics incompatible
> with multiprocessing by design, you *will* prevent anyone from
> experimenting with replicating the shared memory based channel API for
> communicating between processes :)
>
> That said, if you'd prefer to keep the "Channel" name available for
> the possible introduction of object channels at a later date, you
> could call the initial memoryview based channel a "MemChannel".
>
>> > I don't think we should be touching the behaviour of core builtins
>> > solely to enable message passing to subinterpreters without a
>> > shared GIL.
>>
>> Keep in mind that I included the above as a possible solution using
>> tp_share() that would work *after* we stop sharing the GIL. My point
>> is that with tp_share() we have a solution that works now *and* will
>> work later. I don't care how we use tp_share to do so. :) I long to
>> be able to say in the PEP that you can pass bytes through the channel
>> and get bytes on the other side.
>
> Memory views are a builtin type as well, and they emphasise the
> practical benefit we're trying to get relative to typical
> multiprocessing arrangements: zero-copy data sharing.
>
> So here's my proposed experimentation-enabling development strategy:
Re: [Python-Dev] PEP 554 v3 (new interpreters module)
On 6 October 2017 at 11:48, Eric Snow wrote:
>> And that's the real pay-off that comes from defining this in terms of
>> the memoryview protocol: Py_buffer structs *aren't* Python objects,
>> so it's only a regular C struct that gets passed across the
>> interpreter boundary (the reference to the original objects gets
>> carried along passively as part of the CIV - it never gets *used* in
>> the receiving interpreter).
>
> Yeah, the (PEP 3118) buffer protocol offers precedent in a number of
> ways that are applicable to channels here. I'm simply reticent to
> lock PEP 554 into such a specific solution as the buffer-specific CIV.
> I'm trying to accommodate anticipated future needs while keeping the
> PEP as simple and basic as possible. It's driving me nuts! :P Things
> were *much* simpler before I added Channels to the PEP. :)

Starting with memory-sharing only doesn't lock us into anything, since
you can still add a more flexible kind of channel based on a different
protocol later if it turns out that memory sharing isn't enough.

By contrast, if you make the initial channel semantics incompatible with
multiprocessing by design, you *will* prevent anyone from experimenting
with replicating the shared memory based channel API for communicating
between processes :)

That said, if you'd prefer to keep the "Channel" name available for the
possible introduction of object channels at a later date, you could call
the initial memoryview based channel a "MemChannel".

>> I don't think we should be touching the behaviour of core builtins
>> solely to enable message passing to subinterpreters without a shared
>> GIL.
>
> Keep in mind that I included the above as a possible solution using
> tp_share() that would work *after* we stop sharing the GIL. My point
> is that with tp_share() we have a solution that works now *and* will
> work later. I don't care how we use tp_share to do so.
> :) I long to be able to say in the PEP that you can pass bytes through
> the channel and get bytes on the other side.

Memory views are a builtin type as well, and they emphasise the
practical benefit we're trying to get relative to typical
multiprocessing arrangements: zero-copy data sharing.

So here's my proposed experimentation-enabling development strategy:

1. Start out with a MemChannel API, that accepts any buffer-exporting
object as input, and outputs only a cross-interpreter memoryview
subclass
2. Use that as the basis for the work to get to a per-interpreter
locking arrangement that allows subinterpreters to fully exploit
multiple CPUs
3. Only then try to design a Channel API that allows for sharing builtin
immutable objects between interpreters (bytes, strings, numbers), at a
time when you can be certain you won't be inadvertently making it harder
to make the GIL a truly per-interpreter lock, rather than the current
process global runtime lock.

The key benefit of this approach is that we *know* MemChannel can work:
the buffer protocol already operates at the level of C structs and
pointers, not Python objects, and there are already plenty of
interesting buffer-protocol-supporting objects around, so as long as the
CIV switches interpreters at the right time, there aren't any
fundamentally new runtime level capabilities needed to implement it.

The lower level MemChannel API could then also be replicated for
multiprocessing, while the higher level more speculative object-based
Channel API would be specific to subinterpreters (and probably only ever
designed and implemented if you first succeed in making subinterpreters
sufficiently independent that they don't rely on a process-wide GIL any
more).

So I'm not saying "Never design an object-sharing protocol specifically
for use with subinterpreters". I'm saying "You don't have a demonstrated
need for that yet, so don't try to define it until you do".
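[Editor's note: step 1 of the strategy above — a channel that accepts any buffer exporter and hands out a memoryview — can be sketched within a single process, where threads already share memory. The `MemChannel` class and its methods below are hypothetical illustrations of the idea, with a `queue.Queue` carrying the views; a real cross-interpreter version would need the interpreter-switching release logic discussed elsewhere in the thread.]

```python
import queue

# A MemChannel sketch: send() accepts any buffer-exporting object and
# recv() yields a memoryview, so the payload bytes are never copied.
class MemChannel:
    def __init__(self):
        self._q = queue.Queue()

    def send(self, obj):
        # memoryview() wraps the exporter's buffer in place: zero-copy.
        self._q.put(memoryview(obj))

    def recv(self):
        return self._q.get()

ch = MemChannel()
buf = bytearray(b"abcd")
ch.send(buf)
view = ch.recv()

# A mutation through the original object is visible through the view,
# demonstrating that both sides share the same memory.
buf[0] = ord("z")
assert view[0] == ord("z")

view.release()  # a real CIV would switch back to the sender here
```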
> My mind is drawn to the comparison between that and the question of
> CIV vs. tp_share(). CIV would be more like the post-451 import world,
> where I expect the CIV would take care of the data sharing operations.
> That said, the situation in PEP 554 is sufficiently different that I'm
> not convinced a generic CIV protocol would be better. I'm not sure
> how much CIV could do for you over helpers+tp_share.
>
> Anyway, here are the leading approaches that I'm looking at now:
>
> * adding a tp_share slot
>   + you send() the object directly and recv() the object coming out
>     of tp_share() (which will probably be the same type as the
>     original)
>   + this would eventually require small changes in tp_free for
>     participating types
>   + we would likely provide helpers (eventually), similar to the new
>     buffer protocol, to make it easier to manage sharing data

I'm skeptical about this approach because you'll be designing in a
vacuum against future possible constraints that you can't test yet: the
inherent complexity in the object sharing protocol will come from *not*
having a process-wide GIL,
Re: [Python-Dev] PEP 554 v3 (new interpreters module)
On Thu, Oct 5, 2017 at 4:57 AM, Nick Coghlan wrote:
> This would be hard to get to work reliably, because "orig.tp_share()"
> would be running in the receiving interpreter, but all the attributes
> of "orig" would have been allocated by the sending interpreter. It
> gets more reliable if it's *Channel.send* that calls tp_share()
> though, but moving the call to the sending side makes it clear that a
> tp_share protocol would still need to rely on a more primitive set of
> "shareable objects" that were the permitted return values from the
> tp_share call.

The point of running tp_share() in the receiving interpreter is to force
allocation under that interpreter, so that GC applies there. I agree
that you basically can't do anything in tp_share() that would affect the
sending interpreter, including INCREF and DECREF. Since we INCREFed in
send(), we know that we have a safe reference, so we don't have to worry
about that part in tp_share(). We would only be able to do low-level
things (like the buffer protocol) that don't interact with the original
object's interpreter.

Given that this is a quite low-level tp slot and low-level
functionality, I'd expect that a sufficiently clear entry (i.e. warning)
in the docs would be enough for the few that dare.

From my perspective adding the tp_share slot allows for much more
experimentation with object sharing (right now, long before we get to
considering how to stop sharing the GIL) by us *and* third parties. None
of the alternatives seem to offer the same opportunity while still
working out *after* we stop sharing the GIL.

> And that's the real pay-off that comes from defining this in terms of
> the memoryview protocol: Py_buffer structs *aren't* Python objects, so
> it's only a regular C struct that gets passed across the interpreter
> boundary (the reference to the original objects gets carried along
> passively as part of the CIV - it never gets *used* in the receiving
> interpreter).
Yeah, the (PEP 3118) buffer protocol offers precedent in a number of
ways that are applicable to channels here. I'm simply reticent to lock
PEP 554 into such a specific solution as the buffer-specific CIV. I'm
trying to accommodate anticipated future needs while keeping the PEP as
simple and basic as possible. It's driving me nuts! :P Things were
*much* simpler before I added Channels to the PEP. :)

>> bytes.tp_share():
>>     obj = blank_bytes(len(self))
>>     obj.ob_sval = self.ob_sval  # hand-wavy memory sharing
>>     return obj
>
> This is effectively reinventing memoryview, while trying to pretend
> it's an ordinary bytes object. Don't reinvent memoryview :)
>
>> bytes.tp_free():  # under no-shared-GIL:
>>     # most of this could be pulled into a macro for re-use
>>     orig = lookup_shared(self)
>>     if orig != NULL:
>>         current = release_LIL()
>>         interp = lookup_owner(orig)
>>         acquire_LIL(interp)
>>         decref(orig)
>>         release_LIL(interp)
>>         acquire_LIL(current)
>>         # clear shared/owner tables
>>         # clear/release self.ob_sval
>>     free(self)
>
> I don't think we should be touching the behaviour of core builtins
> solely to enable message passing to subinterpreters without a shared
> GIL.

Keep in mind that I included the above as a possible solution using
tp_share() that would work *after* we stop sharing the GIL. My point is
that with tp_share() we have a solution that works now *and* will work
later. I don't care how we use tp_share to do so. :) I long to be able
to say in the PEP that you can pass bytes through the channel and get
bytes on the other side.

That said, I'm not sure how this could be made to work without involving
tp_free(). If that is really off the table (even in the simplest
possible ways) then I don't think there is a way to actually share
objects of builtin types between interpreters other than through views
like CIV.
We could still support tp_share() for the sake of third parties, which
would facilitate that simplicity I was aiming for in sending data
between interpreters, as well as leaving the door open for nearly all
the same experimentation. However, I expect that most *uses* of channels
will involve builtin types, particularly as we start off, so having to
rely on view types for builtins would add not-insignificant awkwardness
to using channels. I'd still like to avoid that if possible, so let's
not rush to completely close the door on small modifications to tp_free
for builtins. :)

Regardless, I still (after a night's rest and a day of not thinking
about it) consider tp_share() to be the solution I'd been hoping we'd
find, whether or not we can apply it to builtin types.

> The simplest possible variant of CIVs that I can think of would be
> able to avoid that outcome by being a memoryview subclass, since they
> just need to hold the extra reference to the original interpreter, and
> include some logic to switch interpreters at the appropriate time.
Re: [Python-Dev] PEP 554 v3 (new interpreters module)
On 5 October 2017 at 18:45, Eric Snow wrote:
> After we move to not sharing the GIL between interpreters:
>
> Channel.send(obj):  # in interp A
>     incref(obj)
>     if type(obj).tp_share == NULL:
>         raise ValueError("not a shareable type")
>     set_owner(obj)  # obj.owner or add an obj -> interp entry to global table
>     ch.objects.append(obj)
>
> Channel.recv():  # in interp B
>     orig = ch.objects.pop(0)
>     obj = orig.tp_share()
>     set_shared(obj, orig)  # add to a global table
>     return obj

This would be hard to get to work reliably, because "orig.tp_share()"
would be running in the receiving interpreter, but all the attributes of
"orig" would have been allocated by the sending interpreter. It gets
more reliable if it's *Channel.send* that calls tp_share() though, but
moving the call to the sending side makes it clear that a tp_share
protocol would still need to rely on a more primitive set of "shareable
objects" that were the permitted return values from the tp_share call.

And that's the real pay-off that comes from defining this in terms of
the memoryview protocol: Py_buffer structs *aren't* Python objects, so
it's only a regular C struct that gets passed across the interpreter
boundary (the reference to the original objects gets carried along
passively as part of the CIV - it never gets *used* in the receiving
interpreter).

> bytes.tp_share():
>     obj = blank_bytes(len(self))
>     obj.ob_sval = self.ob_sval  # hand-wavy memory sharing
>     return obj

This is effectively reinventing memoryview, while trying to pretend it's
an ordinary bytes object.
Don't reinvent memoryview :)

> bytes.tp_free():  # under no-shared-GIL:
>     # most of this could be pulled into a macro for re-use
>     orig = lookup_shared(self)
>     if orig != NULL:
>         current = release_LIL()
>         interp = lookup_owner(orig)
>         acquire_LIL(interp)
>         decref(orig)
>         release_LIL(interp)
>         acquire_LIL(current)
>         # clear shared/owner tables
>         # clear/release self.ob_sval
>     free(self)

I don't think we should be touching the behaviour of core builtins
solely to enable message passing to subinterpreters without a shared
GIL.

The simplest possible variant of CIVs that I can think of would be able
to avoid that outcome by being a memoryview subclass, since they just
need to hold the extra reference to the original interpreter, and
include some logic to switch interpreters at the appropriate time.

That said, I think there's definitely a useful design question to ask in
this area, not about bytes (which can be readily represented by a
memoryview variant in the receiving interpreter), but about *strings*:
they have a more complex internal layout than bytes objects, but as long
as the receiving interpreter can make sure that the original string
continues to exist, then you could usefully implement a "strview" type
to avoid having to go through an encode/decode cycle just to pass a
string to another subinterpreter.

That would provide a reasonably compelling argument that CIVs
*shouldn't* be implemented as memoryview subclasses, but instead defined
as *containing* a managed view of an object owned by a different
interpreter. That way, even if the initial implementation only supported
CIVs that contained a memoryview instance, we'd have the freedom to
define other kinds of views later (such as strview), while being able to
reuse the same CIV machinery.

Cheers,
Nick.
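[Editor's note: the "strview" idea can be sketched in pure Python. The `StrView` class below is entirely hypothetical — it simply holds a strong reference to the original str so the receiving side can read characters without an encode/decode round-trip; a real implementation would keep the sending interpreter alive as well.]

```python
# A "strview" sketch: a read-only view over a string owned elsewhere.
class StrView:
    def __init__(self, original):
        # Strong reference keeps the sending object alive; a real CIV
        # would also hold a reference to the sending interpreter.
        self._original = original

    def __len__(self):
        return len(self._original)

    def __getitem__(self, i):
        return self._original[i]

    def __str__(self):
        # Materializing a local copy is the explicit opt-in point.
        return self._original

s = "héllo"
v = StrView(s)
assert len(v) == 5
assert v[1] == "é"          # no encode/decode cycle needed to read
assert str(v) == s
```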
--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
Re: [Python-Dev] PEP 554 v3 (new interpreters module)
On Tue, Oct 3, 2017 at 8:55 AM, Antoine Pitrou wrote:
> I think we need a sharing protocol, not just a flag. We also need to
> think carefully about that protocol, so that it does not imply
> unnecessary memory copies. Therefore I think the protocol should be
> something like the buffer protocol, that allows to acquire and release
> a set of shared memory areas, but without imposing any semantics onto
> those memory areas (each type implementing its own semantics). And
> there needs to be a dedicated reference counting for object shares, so
> that the original object can be notified when all its shares have
> vanished.

I've come to agree. :) I actually came to the same conclusion tonight
before I'd been able to read through your message carefully. My idea is
below. Your suggestion about protecting shared memory areas is
something to discuss further, though I'm not sure it's strictly
necessary yet (before we stop sharing the GIL).

On Wed, Oct 4, 2017 at 7:41 PM, Nick Coghlan wrote:
> Having the sending interpreter do the INCREF just changes the problem
> to be a memory leak waiting to happen rather than an access-after-free
> issue, since the problematic non-synchronised scenario then becomes:
>
> * thread on CPU A has two references (ob_refcnt=2)
> * it sends a reference to a thread on CPU B via a channel
> * thread on CPU A releases its reference (ob_refcnt=1)
> * updated ob_refcnt value hasn't made it back to the shared memory
>   cache yet
> * thread on CPU B releases its reference (ob_refcnt=1)
> * both threads have released their reference, but the refcnt is still
>   1 -> object leaks!
>
> We simply can't have INCREFs and DECREFs happening in different
> threads without some way of ensuring cache coherency for *both*
> operations - otherwise we risk either the refcount going to zero when
> it shouldn't, or *not* going to zero when it should.
> The current CPython implementation relies on the process global GIL
> for that purpose, so none of these problems will show up until you
> start trying to replace that with per-interpreter locks.
>
> Free threaded reference counting relies on (expensive) atomic
> increments & decrements.

Right. I'm not sure why I was missing that, but I'm clear now.

Below is a rough idea of what I think may work instead (the result of
much tossing and turning in bed*).

While we're still sharing a GIL between interpreters:

    Channel.send(obj):  # in interp A
        incref(obj)
        if type(obj).tp_share == NULL:
            raise ValueError("not a shareable type")
        ch.objects.append(obj)

    Channel.recv():  # in interp B
        orig = ch.objects.pop(0)
        obj = orig.tp_share()
        return obj

    bytes.tp_share():
        return self

After we move to not sharing the GIL between interpreters:

    Channel.send(obj):  # in interp A
        incref(obj)
        if type(obj).tp_share == NULL:
            raise ValueError("not a shareable type")
        set_owner(obj)  # obj.owner or add an obj -> interp entry to global table
        ch.objects.append(obj)

    Channel.recv():  # in interp B
        orig = ch.objects.pop(0)
        obj = orig.tp_share()
        set_shared(obj, orig)  # add to a global table
        return obj

    bytes.tp_share():
        obj = blank_bytes(len(self))
        obj.ob_sval = self.ob_sval  # hand-wavy memory sharing
        return obj

    bytes.tp_free():  # under no-shared-GIL:
        # most of this could be pulled into a macro for re-use
        orig = lookup_shared(self)
        if orig != NULL:
            current = release_LIL()
            interp = lookup_owner(orig)
            acquire_LIL(interp)
            decref(orig)
            release_LIL(interp)
            acquire_LIL(current)
            # clear shared/owner tables
            # clear/release self.ob_sval
        free(self)

The CIV approach could be facilitated through something like a new
SharedBuffer type, or through a separate BufferViewChannel, etc.

Most notably, this approach avoids hard-coding specific type support
into channels and should work out fine under no-shared-GIL
subinterpreters.
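[Editor's note: the shared-GIL phase of the pseudocode above can be emulated at the Python level with a per-type registry standing in for the tp_share slot. The `_tp_share` dict, `Channel` class, and its methods are hypothetical illustrations, not the PEP's API; a `queue.Queue` plays the role of the channel's object list.]

```python
import queue

# Per-type registry emulating the proposed tp_share slot, consulted by
# Channel.send()/recv(). In the shared-GIL phase, bytes shares as itself.
_tp_share = {
    bytes: lambda obj: obj,
}

class Channel:
    def __init__(self):
        self._objects = queue.Queue()

    def send(self, obj):
        if type(obj) not in _tp_share:
            raise ValueError("not a shareable type")
        self._objects.put(obj)  # the queue holds the strong reference

    def recv(self):
        orig = self._objects.get()
        return _tp_share[type(orig)](orig)  # "tp_share" dispatch

ch = Channel()
ch.send(b"data")
assert ch.recv() == b"data"

# Types without a share function are rejected at send() time.
try:
    ch.send([1, 2])
except ValueError as exc:
    assert "not a shareable type" in str(exc)
```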
One nice thing about the tp_share slot is that it makes it much easier
(along with C-API for managing the global owned/shared tables) to
implement other types that are legal to pass through channels. Such
types could be provided via extension modules. Numpy arrays could be
made to support it, if that's your thing. Antoine could give tp_share
to locks and semaphores. :) Of course, any such types would have to
ensure that they are actually safe to share between interpreters
without a GIL between them...

For PEP 554, I'd only propose the tp_share slot and its use in
Channel.send()/.recv(). The parts related to global tables and memory
sharing and tp_free() wouldn't be necessary until we stop sharing the
GIL between interpreters. However, I believe that tp_share would make
us ready for that.

-eric

* I should know by now that some ideas sound better in the middle of the
night than they do the next day, but this idea is keeping me awake so
I'll risk it! :)
Re: [Python-Dev] PEP 554 v3 (new interpreters module)
On 4 October 2017 at 23:51, Eric Snow wrote:
> On Tue, Oct 3, 2017 at 11:36 PM, Nick Coghlan wrote:
>> The problem relates to the fact that there aren't any memory barriers
>> around CPython's INCREF operations (they're implemented as an
>> ordinary C post-increment operation), so you can get the following
>> scenario:
>>
>> * thread on CPU A has the sole reference (ob_refcnt=1)
>> * thread on CPU B acquires a new reference, but hasn't pushed the
>>   updated ob_refcnt value back to the shared memory cache yet
>> * original thread on CPU A drops its reference, *thinks* the refcnt
>>   is now zero, and deletes the object
>> * bad things now happen in CPU B as the thread running there tries to
>>   use a deleted object :)
>
> I'm not clear on where we'd run into this problem with channels.
> Mirroring your scenario:
>
> * interpreter A (in thread on CPU A) INCREFs the object (the GIL is
>   still held)
> * interp A sends the object to the channel
> * interp B (in thread on CPU B) receives the object from the channel
> * the new reference is held until interp B DECREFs the object
>
> From what I see, at no point do we get a refcount of 0, such that
> there would be a race on the object being deleted.

Having the sending interpreter do the INCREF just changes the problem to
be a memory leak waiting to happen rather than an access-after-free
issue, since the problematic non-synchronised scenario then becomes:

* thread on CPU A has two references (ob_refcnt=2)
* it sends a reference to a thread on CPU B via a channel
* thread on CPU A releases its reference (ob_refcnt=1)
* updated ob_refcnt value hasn't made it back to the shared memory cache
  yet
* thread on CPU B releases its reference (ob_refcnt=1)
* both threads have released their reference, but the refcnt is still
  1 -> object leaks!
We simply can't have INCREFs and DECREFs happening in different threads
without some way of ensuring cache coherency for *both* operations -
otherwise we risk either the refcount going to zero when it shouldn't,
or *not* going to zero when it should.

The current CPython implementation relies on the process global GIL for
that purpose, so none of these problems will show up until you start
trying to replace that with per-interpreter locks.

Free threaded reference counting relies on (expensive) atomic increments
& decrements.

The cross-interpreter view proposal aims to allow per-interpreter GILs
without introducing atomic increments & decrements by instead relying on
the view itself to ensure that it's holding the right GIL for the object
whose refcount it's manipulating, and the receiving interpreter
explicitly closing the view when it's done with it.

So while CIVs wouldn't be as easy to use as regular object references:

1. They'd be no harder to use than memoryviews in general
2. They'd structurally ensure that regular object refcounts can still
rely on "protected by the GIL" semantics
3. They'd structurally ensure zero performance degradation for regular
object refcounts
4. By virtue of being memoryview based, they'd encourage the adoption of
interfaces and practices that can be adapted to multiple processes
through the use of techniques like shared memory regions and memory
mapped files (see
http://www.boost.org/doc/libs/1_54_0/doc/html/interprocess/sharedmemorybetweenprocesses.html
for some detailed explanations of how that works, and
https://arrow.apache.org/ for an example of ways tools like Pandas can
use that to enable zero-copy data sharing)

> The only problem I'm aware of (it dawned on me last night), is in the
> case that the interpreter that created the object gets deleted before
> the object does. In that case we can't pass the deletion back to the
> original interpreter.
> (I don't think this problem is necessarily > exclusive to the solution I've proposed for Bytes.) The cross-interpreter-view idea proposes to deal with that by having the CIV hold a strong reference not only to the sending object (which is already part of the regular memoryview semantics), but *also* to the sending interpreter - that way, neither the sending object nor the sending interpreter can go away until the receiving interpreter closes the view. The refcount-integrity-ensuring sequence of events becomes: 1. Sending interpreter submits the object to the channel 2. Channel creates a CIV with references to the sending interpreter & sending object, and a view on the sending object's memory 3. Receiving interpreter gets the CIV from the channel 4. Receiving interpreter closes the CIV either explicitly or via __del__ (the latter would emit ResourceWarning) 5. CIV switches execution back to the sending interpreter and releases both the memory buffer and the reference to the sending object 6. CIV switches execution back to the receiving interpreter, and releases its reference to the sending interpreter 7. Execution continues in the receiving interpreter Cheers, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
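The seven-step sequence can be sketched as a small Python class. Everything here is hypothetical: the class name, the "switch to the sending interpreter" context-manager hook, and the FakeInterpreter stand-in are invented for illustration and are not part of PEP 554 or any CPython API.

```python
# Hypothetical sketch of a cross-interpreter view (CIV). The real thing
# would manipulate interpreter states in C; this only models the
# reference-holding and close-ordering described in steps 1-7.

class FakeInterpreter:
    """Stand-in for 'switch execution to the sending interpreter'."""
    def __enter__(self):
        return self
    def __exit__(self, *exc):
        return False

class CrossInterpreterView:
    def __init__(self, sending_interp, obj):
        self._interp = sending_interp   # strong ref: keeps interp alive
        self._obj = obj                 # strong ref: keeps object alive
        self._view = memoryview(obj)    # view on the sender's memory

    def __getitem__(self, index):
        return self._view[index]

    def close(self):
        # Steps 5-6: release buffer and object while "running under" the
        # sending interpreter, then drop the interpreter reference.
        with self._interp:
            self._view.release()
            self._obj = None
        self._interp = None

civ = CrossInterpreterView(FakeInterpreter(), b"payload")
assert civ[0:3].tobytes() == b"pay"     # receiver reads without copying
civ.close()                             # step 4: receiver closes explicitly
```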
Re: [Python-Dev] PEP 554 v3 (new interpreters module)
On Wed, 4 Oct 2017 17:50:33 +0200 Antoine Pitrou wrote: > On Mon, 2 Oct 2017 21:31:30 -0400 > Eric Snow wrote: > > > > > By contrast, if we allow an actual bytes object to be shared, then > > > either every INCREF or DECREF on that bytes object becomes a > > > synchronisation point, or else we end up needing some kind of > > > secondary per-interpreter refcount where the interpreter doesn't drop > > > its shared reference to the original object in its source interpreter > > > until the internal refcount in the borrowing interpreter drops to > > > zero. > > > > There shouldn't be a need to synchronize on INCREF. If both > > interpreters have at least 1 reference then either one adding a > > reference shouldn't be a problem. > > I'm not sure what Nick meant by "synchronization point", but at least > you certainly need INCREF and DECREF to be atomic, which is a departure > from today's Py_INCREF / Py_DECREF behaviour (and is significantly > slower, even on high-level benchmarks). To be clear, I'm writing this under the hypothesis of per-interpreter GILs. I'm not really interested in the per-process GIL case :-) Regards Antoine. ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 554 v3 (new interpreters module)
On Wed, Oct 4, 2017 at 4:51 PM, Eric Snow wrote: > On Tue, Oct 3, 2017 at 11:36 PM, Nick Coghlan wrote: > > The problem relates to the fact that there aren't any memory barriers > > around CPython's INCREF operations (they're implemented as an ordinary > > C post-increment operation), so you can get the following scenario: > > > > * thread on CPU A has the sole reference (ob_refcnt=1) > > * thread on CPU B acquires a new reference, but hasn't pushed the > > updated ob_refcnt value back to the shared memory cache yet > > * original thread on CPU A drops its reference, *thinks* the refcnt is > > now zero, and deletes the object > > * bad things now happen in CPU B as the thread running there tries to > > use a deleted object :) > > I'm not clear on where we'd run into this problem with channels. > Mirroring your scenario: > > * interpreter A (in thread on CPU A) INCREFs the object (the GIL is still > held) > * interp A sends the object to the channel > * interp B (in thread on CPU B) receives the object from the channel > * the new reference is held until interp B DECREFs the object > > From what I see, at no point do we get a refcount of 0, such that > there would be a race on the object being deleted. > > So what you're saying is that when Larry finishes the gilectomy, subinterpreters will work GIL-free too?-) ––Koos The only problem I'm aware of (it dawned on me last night), is in the > case that the interpreter that created the object gets deleted before > the object does. In that case we can't pass the deletion back to the > original interpreter. (I don't think this problem is necessarily > exclusive to the solution I've proposed for Bytes.) 
> > -eric -- + Koos Zevenhoven + http://twitter.com/k7hoven +
Re: [Python-Dev] PEP 554 v3 (new interpreters module)
On Mon, 2 Oct 2017 21:31:30 -0400 Eric Snow wrote: > > > By contrast, if we allow an actual bytes object to be shared, then > > either every INCREF or DECREF on that bytes object becomes a > > synchronisation point, or else we end up needing some kind of > > secondary per-interpreter refcount where the interpreter doesn't drop > > its shared reference to the original object in its source interpreter > > until the internal refcount in the borrowing interpreter drops to > > zero. > > There shouldn't be a need to synchronize on INCREF. If both > interpreters have at least 1 reference then either one adding a > reference shouldn't be a problem. I'm not sure what Nick meant by "synchronization point", but at least you certainly need INCREF and DECREF to be atomic, which is a departure from today's Py_INCREF / Py_DECREF behaviour (and is significantly slower, even on high-level benchmarks). Regards Antoine.
Re: [Python-Dev] PEP 554 v3 (new interpreters module)
On Tue, Oct 3, 2017 at 11:36 PM, Nick Coghlan wrote: > The problem relates to the fact that there aren't any memory barriers > around CPython's INCREF operations (they're implemented as an ordinary > C post-increment operation), so you can get the following scenario: > > * thread on CPU A has the sole reference (ob_refcnt=1) > * thread on CPU B acquires a new reference, but hasn't pushed the > updated ob_refcnt value back to the shared memory cache yet > * original thread on CPU A drops its reference, *thinks* the refcnt is > now zero, and deletes the object > * bad things now happen in CPU B as the thread running there tries to > use a deleted object :) I'm not clear on where we'd run into this problem with channels. Mirroring your scenario: * interpreter A (in thread on CPU A) INCREFs the object (the GIL is still held) * interp A sends the object to the channel * interp B (in thread on CPU B) receives the object from the channel * the new reference is held until interp B DECREFs the object From what I see, at no point do we get a refcount of 0, such that there would be a race on the object being deleted. The only problem I'm aware of (it dawned on me last night), is in the case that the interpreter that created the object gets deleted before the object does. In that case we can't pass the deletion back to the original interpreter. (I don't think this problem is necessarily exclusive to the solution I've proposed for Bytes.) -eric
Re: [Python-Dev] PEP 554 v3 (new interpreters module)
On 3 October 2017 at 11:31, Eric Snow wrote: > There shouldn't be a need to synchronize on INCREF. If both > interpreters have at least 1 reference then either one adding a > reference shouldn't be a problem. If only one interpreter has a > reference then the other won't be adding any references. If neither > has a reference then neither is going to add any references. Perhaps > I've missed something. Under what circumstances would INCREF happen > while the refcount is 0? The problem relates to the fact that there aren't any memory barriers around CPython's INCREF operations (they're implemented as an ordinary C post-increment operation), so you can get the following scenario: * thread on CPU A has the sole reference (ob_refcnt=1) * thread on CPU B acquires a new reference, but hasn't pushed the updated ob_refcnt value back to the shared memory cache yet * original thread on CPU A drops its reference, *thinks* the refcnt is now zero, and deletes the object * bad things now happen in CPU B as the thread running there tries to use a deleted object :) The GIL currently protects us from this, as switching CPUs requires switching threads, which means the original thread has to release the GIL (flushing all of its state changes to the shared cache), and the new thread has to acquire it (hence refreshing its local cache from the shared one). The need to switch all incref/decref operations over to using atomic thread-safe primitives when removing the GIL is one of the main reasons that attempting to remove the GIL *within* an interpreter is expensive (and why Larry et al are having to explore completely different ref count management strategies for the GILectomy). 
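The access-after-free scenario in the quoted list can also be modelled deterministically, again with plain Python variables standing in for CPU caches rather than anything CPython actually does:

```python
# Illustrative model of the scenario above (not CPython code): CPU B's
# INCREF is still sitting in its local cache when CPU A checks the
# shared refcount and decides to delete the object.

shared_refcnt = 1                 # thread on CPU A holds the sole reference

# Thread on CPU B acquires a new reference, but the updated value only
# reaches its local cache, not shared memory:
cpu_b_cache = shared_refcnt + 1   # INCREF computed locally (= 2)

# Thread on CPU A drops its reference using the stale shared value:
shared_refcnt -= 1
object_deleted = (shared_refcnt == 0)   # A *thinks* the refcnt hit zero

# B's write-back arrives too late -- the object is already gone:
shared_refcnt = cpu_b_cache

print(object_deleted, shared_refcnt)  # True 2 -> B now touches freed memory
```

Under the GIL this interleaving is impossible, because the thread switch forces the flush/refresh described in the next paragraph of the original message.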
By contrast, if you rely on a new memoryview variant to mediate all data sharing between interpreters, then you can make sure that *it* is using synchronisation primitives as needed to ensure the required cache coherency across different CPUs, without any negative impacts on regular single interpreter code (which can still rely on the cache coherency guarantees provided by the GIL). Cheers, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
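The ordinary memoryview behaviour such a variant would build on -- zero-copy access, pinning the exporter's buffer, and an explicit release step -- can be seen with today's stdlib:

```python
# Plain memoryview semantics: the view shares the exporter's memory
# without copying, blocks resizing while the buffer is exported, and
# must be released before the exporter is fully free again.
data = bytearray(b"hello world")
view = memoryview(data)

assert view[0:5].tobytes() == b"hello"   # zero-copy read
view[0:5] = b"HELLO"                     # writes go straight to `data`
assert data.startswith(b"HELLO")

try:
    data += b"!"                         # resizing is blocked while exported
except BufferError as exc:
    print("resize blocked:", exc)

view.release()                           # explicit close, as a CIV would do
data += b"!"                             # now allowed again
```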
Re: [Python-Dev] PEP 554 v3 (new interpreters module)
On 03Oct2017 0755, Antoine Pitrou wrote: > On Tue, 3 Oct 2017 08:36:55 -0600 Eric Snow wrote: >> On Tue, Oct 3, 2017 at 5:00 AM, Antoine Pitrou wrote: >>> On Mon, 2 Oct 2017 22:15:01 -0400 Eric Snow wrote: >>>> I'm still not convinced that sharing synchronization primitives is important enough to be worth including it in the PEP. It can be added later, or via an extension module in the meantime. To that end, I'll add a mechanism to the PEP for third-party types to indicate that they can be passed through channels. Something like "obj.__channel_support__ = True". >>> How would that work? If it's simply a matter of flipping a bit, why don't we do it for all objects? >> The type would also have to be safe to share between interpreters. :) > But what does it mean to be safe to share, while the exact degree and nature of the isolation between interpreters (and also their concurrent execution) is unspecified? I think we need a sharing protocol, not just a flag. The easiest such protocol is essentially: * an object can represent itself as bytes (e.g. generate a bytes object representing some global token, such as a kernel handle or memory address) * those bytes are sent over the standard channel * the object can instantiate itself from those bytes (e.g. wrap the existing handle, create a memoryview over the same block of memory, etc.) * cross-interpreter refcounting is either ignored (because the kernel is refcounting the resource) or manual (by including more shared info in the token) Since this is trivial to implement over the basic bytes channel, and doesn't even require a standard protocol except for convenience, Eric decided to avoid blocking the core functionality on this. I'm inclined to agree - get the basic functionality supported and let people build on it before we try to lock down something we don't fully understand yet.
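The token protocol described above can be sketched over an ordinary bytes channel. The class and registry names here are invented; the module-level dict stands in for a kernel-managed resource table:

```python
# Minimal sketch of the to-bytes/from-bytes sharing protocol. The
# registry plays the role of the kernel refcounting the real resource,
# so the token itself carries no Python references.

_SHARED_BUFFERS = {}        # token -> underlying buffer ("kernel" state)

class ShareableBuffer:
    def __init__(self, buf):
        self.buf = buf

    def to_token(self):
        # Represent self as bytes naming the shared resource.
        token = b"buf:%d" % id(self.buf)
        _SHARED_BUFFERS[token] = self.buf
        return token

    @classmethod
    def from_token(cls, token):
        # Re-instantiate on the receiving side: wrap the same memory,
        # without copying the data itself.
        return cls(memoryview(_SHARED_BUFFERS[token]))

payload = bytearray(b"shared data")
token = ShareableBuffer(payload).to_token()   # only bytes cross the channel
received = ShareableBuffer.from_token(token)
received.buf[0:6] = b"SHARED"                 # writes hit the same memory
print(bytes(payload))  # b'SHARED data'
```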
About the only thing that seems to be worth doing up-front is some sort of pending-call callback mechanism between interpreters, but even that doesn't need to block the core functionality (you can do it trivially with threads and another channel right now, and there's always room to make something more efficient later). There are plenty of smart people out there who can and will figure out the best way to design this. By giving them the tools and the ability to design something awesome, we're more likely to get something awesome than by committing to a complete design now. Right now, they're all blocked on the fact that subinterpreters are incredibly hard to start running, let alone experiment with. Eric's PEP will fix that part and enable others to take it from building blocks to powerful libraries. Cheers, Steve
Re: [Python-Dev] PEP 554 v3 (new interpreters module)
On Tue, 3 Oct 2017 08:36:55 -0600 Eric Snow wrote: > On Tue, Oct 3, 2017 at 5:00 AM, Antoine Pitrou wrote: > > On Mon, 2 Oct 2017 22:15:01 -0400 > > Eric Snow wrote: > >> > >> I'm still not convinced that sharing synchronization primitives is > >> important enough to be worth including it in the PEP. It can be added > >> later, or via an extension module in the meantime. To that end, I'll > >> add a mechanism to the PEP for third-party types to indicate that they > >> can be passed through channels. Something like > >> "obj.__channel_support__ = True". > > > > How would that work? If it's simply a matter of flipping a bit, why > > don't we do it for all objects? > > The type would also have to be safe to share between interpreters. :) But what does it mean to be safe to share, while the exact degree and nature of the isolation between interpreters (and also their concurrent execution) is unspecified? I think we need a sharing protocol, not just a flag. We also need to think carefully about that protocol, so that it does not imply unnecessary memory copies. Therefore I think the protocol should be something like the buffer protocol, that allows to acquire and release a set of shared memory areas, but without imposing any semantics onto those memory areas (each type implementing its own semantics). And there needs to be a dedicated reference counting for object shares, so that the original object can be notified when all its shares have vanished. Regards Antoine.
Re: [Python-Dev] PEP 554 v3 (new interpreters module)
On Tue, Oct 3, 2017 at 5:00 AM, Antoine Pitrou wrote: > On Mon, 2 Oct 2017 22:15:01 -0400 > Eric Snow wrote: >> >> I'm still not convinced that sharing synchronization primitives is >> important enough to be worth including it in the PEP. It can be added >> later, or via an extension module in the meantime. To that end, I'll >> add a mechanism to the PEP for third-party types to indicate that they >> can be passed through channels. Something like >> "obj.__channel_support__ = True". > > How would that work? If it's simply a matter of flipping a bit, why > don't we do it for all objects? The type would also have to be safe to share between interpreters. :) Eventually I'd like to make that work for all immutable objects (and immutable containers thereof), but until then each type must be adapted individually. The PEP starts off with just Bytes. -eric
Re: [Python-Dev] PEP 554 v3 (new interpreters module)
On Mon, 2 Oct 2017 22:15:01 -0400 Eric Snow wrote: > > I'm still not convinced that sharing synchronization primitives is > important enough to be worth including it in the PEP. It can be added > later, or via an extension module in the meantime. To that end, I'll > add a mechanism to the PEP for third-party types to indicate that they > can be passed through channels. Something like > "obj.__channel_support__ = True". How would that work? If it's simply a matter of flipping a bit, why don't we do it for all objects? Regards Antoine.
Re: [Python-Dev] PEP 554 v3 (new interpreters module)
On Wed, Sep 27, 2017 at 1:26 AM, Nick Coghlan wrote: > It's also the case that unlike Go channels, which were designed from > scratch on the basis of implementing pure CSP, FWIW, Go's channels (and goroutines) don't implement pure CSP. They provide a variant that the Go authors felt was more in-line with the language's flavor. The channels in the PEP aim to support a more pure implementation. > Python has an > established behavioural precedent in the APIs of queue.Queue and > collections.deque: they're unbounded by default, and you have to opt > in to making them bounded. Right. That's part of why I'm leaning toward support for buffered channels. > While the article title is clickbaity, > http://www.jtolds.com/writing/2016/03/go-channels-are-bad-and-you-should-feel-bad/ > actually has a good discussion of this point. Search for "compose" to > find the relevant section ("Channels don’t compose well with other > concurrency primitives"). > > The specific problem cited is that only offering unbuffered or > bounded-buffer channels means that every send call becomes a potential > deadlock scenario, as all that needs to happen is for you to be > holding a different synchronisation primitive when the send call > blocks. Yeah, that blog post was a reference for me as I was designing the PEP's channels. > The fact that the proposal now allows for M:N sender:receiver > relationships (just as queue.Queue does with threads) makes that > problem worse, since you may now have variability not only on the > message consumption side, but also on the message production side. > > Consider this example where you have an event processing thread pool > that we're attempting to isolate from blocking IO by using channels > rather than coroutines. > > Desired flow: > > 1. Listener thread receives external message from socket > 2. Listener thread files message for processing on receive channel > 3. Listener thread returns to blocking on the receive socket > > 4. 
Processing thread picks up message from receive channel > 5. Processing thread processes message > 6. Processing thread puts reply on the send channel > > 7. Sending thread picks up message from send channel > 8. Sending thread makes a blocking network send call to transmit the message > 9. Sending thread returns to blocking on the send channel > > When queue.Queue is used to pass the messages between threads, such an > arrangement will be effectively non-blocking as long as the send rate > is greater than or equal to the receive rate. However, the GIL means > it won't exploit all available cores, even if we create multiple > processing threads: you have to switch to multiprocessing for that, > with all the extra overhead that entails. > > So I see the essential premise of PEP 554 as being to ask the question > "If each of these threads was running its own *interpreter*, could we > use Sans IO style protocols with interpreter channels to separate > internally "synchronous" processing threads from separate IO threads > operating at system boundaries, without having to make the entire > application pervasively asynchronous?" +1 > If channels are an unbuffered blocking primitive, then we don't get > that benefit: even when there are additional receive messages to be > processed, the processing thread will block until the previous send > has completed. Switching the listener and sender threads over to > asynchronous IO would help with that, but they'd also end up having to > implement their own message buffering to manage the lack of buffering > in the core channel primitive. > > By contrast, if the core channels are designed to offer an unbounded > buffer by default, then you can get close-to-CSP semantics just by > setting the buffer size to 1 (it's still not exactly CSP, since that > has a buffer size of 0, but you at least get the semantics of having > to alternate sending and receiving of messages). Yep, I came to the same conclusion. 
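The "buffer size of 1" behaviour that both Nick and Eric land on can be seen with queue.Queue as a stand-in for a channel: a second send without an intervening receive refuses to proceed, which is exactly the forced alternation described.

```python
# queue.Queue(maxsize=1) modelling a buffer-size-1 channel: the sender
# and receiver are forced to alternate, close to (but not exactly) CSP.
import queue

ch = queue.Queue(maxsize=1)
ch.put_nowait("msg-1")          # first send succeeds immediately

try:
    ch.put_nowait("msg-2")      # second send before any receive
except queue.Full:
    print("sender must wait for a receive")

assert ch.get_nowait() == "msg-1"
ch.put_nowait("msg-2")          # after the receive, sending works again
```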
>> By the way, I do think efficiency is a concern here. Otherwise >> subinterpreters don't even have a point (just use multiprocessing). > > Agreed, and I think the interaction between the threading module and > the interpreters module is one we're going to have to explicitly call > out as being covered by the provisional status of the interpreters > module, as I think it could be incredibly valuable to be able to send > at least some threading objects through channels, and have them be an > interpreter-specific reference to a common underlying sync primitive. Agreed. I'll add a note to the PEP. -eric
Re: [Python-Dev] PEP 554 v3 (new interpreters module)
On Mon, Sep 25, 2017 at 8:42 PM, Nathaniel Smith wrote: > It's fairly reasonable to implement a mutex using a CSP-style > unbuffered channel (send = acquire, receive = release). And the same > trick turns a channel with a fixed-size buffer into a bounded > semaphore. It won't be as efficient as a modern specialized mutex > implementation, of course, but it's workable. > > Unfortunately while technically you can construct a buffered channel > out of an unbuffered channel, the construction's pretty unreasonable > (it needs two dedicated threads per channel). Yeah, if threading's synchronization primitives make sense between interpreters then we'll add direct support. Using channels for that isn't a good option. -eric
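Nathaniel's fixed-size-buffer-as-bounded-semaphore trick can be modelled with queue.Queue standing in for the channel; the class here is illustrative, not anything proposed by the PEP:

```python
# A bounded semaphore built from a fixed-size buffered "channel":
# send (put) acquires and blocks once the buffer is full; receive (get)
# releases by making room for another sender.
import queue

class ChannelSemaphore:
    def __init__(self, max_holders):
        self._ch = queue.Queue(maxsize=max_holders)

    def acquire(self, timeout=None):
        self._ch.put(object(), timeout=timeout)  # blocks when buffer is full

    def release(self):
        self._ch.get_nowait()

sem = ChannelSemaphore(2)
sem.acquire()
sem.acquire()                # buffer now full: two holders
try:
    sem.acquire(timeout=0.01)
except queue.Full:
    print("third acquire blocked, as a bounded semaphore should")
sem.release()
sem.acquire(timeout=0.01)    # a release makes room again
```

As the quoted message notes, a real mutex implementation would be far more efficient; this only shows that the construction works.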
Re: [Python-Dev] PEP 554 v3 (new interpreters module)
On Mon, Oct 2, 2017 at 9:31 PM, Eric Snow wrote: > On DECREF there shouldn't be a problem except possibly with a small > race between decrementing the refcount and checking for a refcount of > 0. We could address that several different ways, including allowing > the pending call to get queued only once (or being a noop the second > time). Alternately, the channel could own a reference and DECREF it in the owning interpreter once the refcount reaches 1. -eric
Re: [Python-Dev] PEP 554 v3 (new interpreters module)
After having looked it over, I'm leaning toward supporting buffering, as well as not blocking by default. Neither adds much complexity to the implementation. On Sat, Sep 23, 2017 at 5:45 AM, Antoine Pitrou wrote: > On Fri, 22 Sep 2017 19:09:01 -0600 > Eric Snow wrote: >> > send() blocking until someone else calls recv() is not only bad for >> > performance, >> >> What is the performance problem? > > Intuitively, there must be some kind of context switch (interpreter > switch?) at each send() call to let the other end receive the data, > since you don't have any internal buffering. There would be an internal size-1 buffer. >> (FWIW, CSP >> provides rigorous guarantees about deadlock detection (which Go >> leverages), though I'm not sure how much benefit that can offer such a >> dynamic language as Python.) > > Hmm... deadlock detection is one thing, but when detected you must still > solve those deadlock issues, right? Yeah, I haven't given much thought into how we could leverage that capability but my gut feeling is that we won't have much opportunity to do so. :) >> I'm not sure I understand your concern here. Perhaps I used the word >> "sharing" too ambiguously? By "sharing" I mean that the two actors >> have read access to something that at least one of them can modify. >> If they both only have read-only access then it's effectively the same >> as if they are not sharing. > > Right. What I mean is that you *can* share very simple "data" under > the form of synchronization primitives. You may want to synchronize > your interpreters even they don't share user-visible memory areas. The > point of synchronization is not only to avoid memory corruption but > also to regulate and orchestrate processing amongst multiple workers > (for example processes or interpreters). For example, a semaphore is > an easy way to implement "I want no more than N workers to do this > thing at the same time" ("this thing" can be something such as disk > I/O). 
I'm still not convinced that sharing synchronization primitives is important enough to be worth including it in the PEP. It can be added later, or via an extension module in the meantime. To that end, I'll add a mechanism to the PEP for third-party types to indicate that they can be passed through channels. Something like "obj.__channel_support__ = True". -eric
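One way the proposed opt-in flag might be enforced is a check in the channel's send(). The SimpleChannel class below is invented for illustration (a thread-safe queue stands in for a real cross-interpreter channel); only the `__channel_support__` attribute name comes from the message above.

```python
# Hypothetical sketch: a channel that accepts bytes by default and other
# types only if they opt in via the proposed __channel_support__ flag.
import queue

class SimpleChannel:
    def __init__(self):
        self._q = queue.Queue()

    def send(self, obj):
        supported = isinstance(obj, bytes) or getattr(
            type(obj), "__channel_support__", False)
        if not supported:
            raise TypeError(
                f"{type(obj).__name__} cannot be passed through channels")
        self._q.put(obj)

    def recv(self):
        return self._q.get()

class Token:
    __channel_support__ = True   # a third-party type opting in

ch = SimpleChannel()
ch.send(b"raw bytes ok")
ch.send(Token())
try:
    ch.send(open)                # arbitrary objects are rejected
except TypeError as exc:
    print(exc)
```

As Antoine's reply points out, a bare flag says nothing about *how* the object is made safe to share; that is the gap the sharing-protocol discussion tries to fill.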
Re: [Python-Dev] PEP 554 v3 (new interpreters module)
On Thu, Sep 14, 2017 at 8:44 PM, Nick Coghlan wrote: > Not really, because the only way to ensure object separation (i.e no > refcounted objects accessible from multiple interpreters at once) with > a bytes-based API would be to either: > > 1. Always copy (eliminating most of the low overhead communications > benefits that subinterpreters may offer over multiple processes) > 2. Make the bytes implementation more complicated by allowing multiple > bytes objects to share the same underlying storage while presenting as > distinct objects in different interpreters > 3. Make the output on the receiving side not actually a bytes object, > but instead a view onto memory owned by another object in a different > interpreter (a "memory view", one might say) 4. Pass Bytes through directly. The only problem of which I'm aware is that when Py_DECREF() triggers Bytes.__del__(), it happens in the current interpreter, which may not be the "owner" (i.e. allocated the object). So the solution would be to make PyBytesType.tp_free() effectively run as a "pending call" under the owner. This would require two things: 1. a new PyBytesObject.owner field (PyInterpreterState *), or a separate owner table, which would be set when the object is passed through a channel 2. a Py_AddPendingCall() that targets a specific interpreter (which I expect would be desirable regardless) Then, when the object has an owner, PyBytesType.tp_free() would add a pending call on the owner to call PyObject_Del() on the Bytes object. The catch is that currently "pending" calls (via Py_AddPendingCall) are run only in the main thread of the main interpreter. We'd need a similar mechanism that targets a specific interpreter.
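The deferred-deletion idea can be modelled at the Python level: freeing a shared object is queued as a pending call on the owning interpreter and executed only when that interpreter runs its queue. The classes are invented for illustration; the interpreter-targeted add_pending_call is exactly the mechanism the message says does not exist yet.

```python
# Python-level model of owner-targeted deletion (not CPython internals).
class Interp:
    def __init__(self, name):
        self.name = name
        self.pending = []              # callables to run in this interp

    def add_pending_call(self, func):  # hypothetical targeted variant
        self.pending.append(func)

    def run_pending(self):
        while self.pending:
            self.pending.pop(0)()

deleted = []                           # records what has been "freed"

class SharedBytes:
    def __init__(self, data, owner):
        self.data = data
        self.owner = owner             # interpreter that allocated us

    def free(self, current_interp):
        if current_interp is self.owner:
            deleted.append(self.data)  # normal tp_free path
        else:                          # defer deletion to the owner
            self.owner.add_pending_call(
                lambda: deleted.append(self.data))

main, sub = Interp("main"), Interp("sub")
obj = SharedBytes(b"payload", owner=main)
obj.free(sub)              # final DECREF happened in the wrong interpreter
assert deleted == []       # nothing freed yet...
main.run_pending()         # ...until the owner services its pending calls
print(deleted)             # [b'payload']
```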
> By contrast, if we allow an actual bytes object to be shared, then > either every INCREF or DECREF on that bytes object becomes a > synchronisation point, or else we end up needing some kind of > secondary per-interpreter refcount where the interpreter doesn't drop > its shared reference to the original object in its source interpreter > until the internal refcount in the borrowing interpreter drops to > zero. There shouldn't be a need to synchronize on INCREF. If both interpreters have at least 1 reference then either one adding a reference shouldn't be a problem. If only one interpreter has a reference then the other won't be adding any references. If neither has a reference then neither is going to add any references. Perhaps I've missed something. Under what circumstances would INCREF happen while the refcount is 0? On DECREF there shouldn't be a problem except possibly with a small race between decrementing the refcount and checking for a refcount of 0. We could address that several different ways, including allowing the pending call to get queued only once (or being a noop the second time). FWIW, I'm not opposed to the CIV/memoryview approach, but want to make sure we really can't use Bytes before going down that route. -eric
Re: [Python-Dev] PEP 554 v3 (new interpreters module)
On 26 September 2017 at 17:04, Antoine Pitrou wrote: > On Mon, 25 Sep 2017 17:42:02 -0700 Nathaniel Smith wrote: >> Unbounded queues also introduce unbounded latency and memory usage in >> realistic situations. > > This doesn't seem to pose much a problem in common use cases, though. > How many Python programs have you seen switch from an unbounded to a > bounded Queue to solve this problem? > > Conversely, choosing a buffer size is tricky. How do you know up front > which amount you need? Is a fixed buffer size even ok or do you want > it to fluctuate based on the current conditions? > > And regardless, my point was that a buffer is desirable. That send() > may block when the buffer is full doesn't change that it won't block in > the common case. It's also the case that unlike Go channels, which were designed from scratch on the basis of implementing pure CSP, Python has an established behavioural precedent in the APIs of queue.Queue and collections.deque: they're unbounded by default, and you have to opt in to making them bounded. >> There's a reason why sockets >> always have bounded buffers -- it's sometimes painful, but the pain is >> intrinsic to building distributed systems, and unbounded buffers just >> paper over it. > > Papering over a problem is sometimes the right answer actually :-) For > example, most Python programs assume memory is unbounded... > > If I'm using a queue or channel to push events to a logging system, > should I really block at every send() call? Most probably I'd rather > run ahead instead. While the article title is clickbaity, http://www.jtolds.com/writing/2016/03/go-channels-are-bad-and-you-should-feel-bad/ actually has a good discussion of this point. Search for "compose" to find the relevant section ("Channels don’t compose well with other concurrency primitives"). 
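The stdlib precedent Nick cites is easy to demonstrate: both queue.Queue and collections.deque are unbounded unless the caller opts in to a bound.

```python
# Unbounded by default, bounded only on request -- the behavioural
# precedent set by queue.Queue and collections.deque.
import collections
import queue

q = queue.Queue()                 # maxsize=0 means unbounded
for i in range(10_000):
    q.put_nowait(i)               # never raises queue.Full

bounded = collections.deque(maxlen=3)   # opting in to a bound
bounded.extend([1, 2, 3, 4])
print(list(bounded))              # [2, 3, 4] -- oldest item dropped
```

Note the two bounding styles differ: a full queue.Queue(maxsize=n) makes put() block or raise, while a deque with maxlen silently discards from the opposite end.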
The specific problem cited is that only offering unbuffered or bounded-buffer channels means that every send call becomes a potential deadlock scenario, as all that needs to happen is for you to be holding a different synchronisation primitive when the send call blocks. >> > Also, suddenly an interpreter's ability to exploit CPU time is >> > dependent on another interpreter's ability to consume data in a timely >> > manner (what if the other interpreter is e.g. stuck on some disk I/O?). >> > IMHO it would be better not to have such coupling. >> >> A small buffer probably is useful in some cases, yeah -- basically >> enough to smooth out scheduler jitter. > > That's not about scheduler jitter, but catering for activities which > occur at inherently different speeds or rhythms. Requiring things to run > in lockstep removes a lot of flexibility and makes it harder to exploit > CPU resources fully. The fact that the proposal now allows for M:N sender:receiver relationships (just as queue.Queue does with threads) makes that problem worse, since you may now have variability not only on the message consumption side, but also on the message production side. Consider this example, where you have an event processing thread pool that we're attempting to isolate from blocking IO by using channels rather than coroutines. Desired flow:

1. Listener thread receives external message from socket
2. Listener thread files message for processing on receive channel
3. Listener thread returns to blocking on the receive socket
4. Processing thread picks up message from receive channel
5. Processing thread processes message
6. Processing thread puts reply on the send channel
7. Sending thread picks up message from send channel
8. Sending thread makes a blocking network send call to transmit the message
9. Sending thread returns to blocking on the send channel

When queue.Queue is used to pass the messages between threads, such an arrangement will be effectively non-blocking as long as the send rate is greater than or equal to the receive rate. However, the GIL means it won't exploit all available cores, even if we create multiple processing threads: you have to switch to multiprocessing for that, with all the extra overhead that entails. So I see the essential premise of PEP 554 as being to ask the question "If each of these threads was running its own *interpreter*, could we use Sans IO style protocols with interpreter channels to separate internally "synchronous" processing threads from separate IO threads operating at system boundaries, without having to make the entire application pervasively asynchronous?" If channels are an unbuffered blocking primitive, then we don't get that benefit: even when there are additional receive messages to be processed, the processing thread will block until the previous send has completed. Switching the listener and sender threads over to asynchronous IO would help with that, but they'd also end up having to implement their own message buffering to manage the lack of buffering in the core channel primitive. By contrast, if the core channels are designed to offer an unbounded buffer
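The listener/processor/sender arrangement described above can be sketched today with threads and queue.Queue standing in for the proposed interpreter channels (real channels would cross interpreter boundaries, which this single-interpreter sketch cannot show):

```python
import queue
import threading

# queue.Queue plays the role of the proposed channels in this sketch.
receive_channel = queue.Queue()   # listener -> processor
send_channel = queue.Queue()      # processor -> sender
_DONE = object()                  # sentinel marking end of input

def listener(messages):
    # Steps 1-3: receive external messages, file them for processing.
    for msg in messages:
        receive_channel.put(msg)
    receive_channel.put(_DONE)

def processor():
    # Steps 4-6: pick up a message, process it, queue the reply.
    while True:
        msg = receive_channel.get()
        if msg is _DONE:
            send_channel.put(_DONE)
            break
        send_channel.put(msg.upper())

def sender(transmitted):
    # Steps 7-9: "transmit" each reply (here: append to a list).
    while True:
        reply = send_channel.get()
        if reply is _DONE:
            break
        transmitted.append(reply)

transmitted = []
threads = [
    threading.Thread(target=listener, args=(["ping", "pong"],)),
    threading.Thread(target=processor),
    threading.Thread(target=sender, args=(transmitted,)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(transmitted)   # ['PING', 'PONG']
```

Because both queues are unbounded, the listener never blocks on the hand-off, which is exactly the property being argued for; replace them with queue.Queue(maxsize=0-equivalent rendezvous channels) and every put becomes a potential blocking point.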
Re: [Python-Dev] PEP 554 v3 (new interpreters module)
On 23 Sep 2017, at 3:09, Eric Snow wrote: [...] ``list_all()``:: Return a list of all existing interpreters. See my naming proposal in the previous thread. Sorry, your previous comment slipped through the cracks. You suggested: As for the naming, let's make it both unconfusing and explicit? How about three functions: `all_interpreters()`, `running_interpreters()` and `idle_interpreters()`, for example? As to "all_interpreters()", I suppose it's the difference between "interpreters.all_interpreters()" and "interpreters.list_all()". To me the latter looks better. But in most cases when Python returns a container (list/dict/iterator) of things, the name of the function/method is the name of the things, not the name of the container, i.e. we have sys.modules, dict.keys, dict.values etc. Or if the collection of things itself has a name, it is that name, i.e. os.environ, sys.path etc. It's a little bit unfortunate that the name of the module would be the same as the name of the function, but IMHO interpreters() would be better than list(). As to "running_interpreters()" and "idle_interpreters()", I'm not sure what the benefit would be. You can compose either list manually with a simple comprehension: [interp for interp in interpreters.list_all() if interp.is_running()] [interp for interp in interpreters.list_all() if not interp.is_running()] Servus, Walter ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 554 v3 (new interpreters module)
On Mon, 25 Sep 2017 17:42:02 -0700 Nathaniel Smith wrote: > On Sat, Sep 23, 2017 at 2:45 AM, Antoine Pitrou wrote: > >> As to "running_interpreters()" and "idle_interpreters()", I'm not sure > >> what the benefit would be. You can compose either list manually with > >> a simple comprehension: > >> > >> [interp for interp in interpreters.list_all() if interp.is_running()] > >> [interp for interp in interpreters.list_all() if not > >> interp.is_running()] > > > > There is an inherent race condition in doing that, at least if > > interpreters are running in multiple threads (which I assume is going > > to be the overly dominant usage model). That is why I'm proposing all > > three variants. > > There's a race condition no matter what the API looks like -- having a > dedicated running_interpreters() lets you guarantee that the returned > list describes the set of interpreters that were running at some > moment in time, but you don't know when that moment was and by the > time you get the list, it's already out-of-date. Hmm, you're right of course. > >> Likewise, > >> queue.Queue.send() supports blocking, in addition to providing a > >> put_nowait() method. > > > > queue.Queue.put() never blocks in the usual case (*), which is that of an > > unbounded queue. Only bounded queues (created with an explicit > > non-zero max_size parameter) can block in Queue.put(). > > > > (*) and therefore also never deadlocks :-) > > Unbounded queues also introduce unbounded latency and memory usage in > realistic situations. This doesn't seem to pose much of a problem in common use cases, though. How many Python programs have you seen switch from an unbounded to a bounded Queue to solve this problem? Conversely, choosing a buffer size is tricky. How do you know up front which amount you need? Is a fixed buffer size even ok or do you want it to fluctuate based on the current conditions? And regardless, my point was that a buffer is desirable. 
That send() may block when the buffer is full doesn't change that it won't block in the common case. > There's a reason why sockets > always have bounded buffers -- it's sometimes painful, but the pain is > intrinsic to building distributed systems, and unbounded buffers just > paper over it. Papering over a problem is sometimes the right answer actually :-) For example, most Python programs assume memory is unbounded... If I'm using a queue or channel to push events to a logging system, should I really block at every send() call? Most probably I'd rather run ahead instead. > > Also, suddenly an interpreter's ability to exploit CPU time is > > dependent on another interpreter's ability to consume data in a timely > > manner (what if the other interpreter is e.g. stuck on some disk I/O?). > > IMHO it would be better not to have such coupling. > > A small buffer probably is useful in some cases, yeah -- basically > enough to smooth out scheduler jitter. That's not about scheduler jitter, but catering for activities which occur at inherently different speeds or rhythms. Requiring things to run in lockstep removes a lot of flexibility and makes it harder to exploit CPU resources fully. > > I expect more often than expected, in complex systems :-) For example, > > you could have a recv() loop that also from time to time send()s some > > data on another queue, depending on what is received. But if that > > send()'s recipient also has the same structure (a recv() loop which > > send()s from time to time), then it's easy to imagine the two getting into > > a deadlock. > > You kind of want to be able to create deadlocks, since the alternative > is processes that can't coordinate and end up stuck in livelocks or > with unbounded memory use etc. I am not advocating we make it *impossible* to create deadlocks; just saying we should not make them more *likely* than they need to. > >> I'm not sure I understand your concern here. Perhaps I used the word > >> "sharing" too ambiguously? 
>> By "sharing" I mean that the two actors > >> have read access to something that at least one of them can modify. > >> If they both only have read-only access then it's effectively the same > >> as if they are not sharing. > > > > Right. What I mean is that you *can* share very simple "data" under > > the form of synchronization primitives. You may want to synchronize > > your interpreters even if they don't share user-visible memory areas. The > > point of synchronization is not only to avoid memory corruption but > > also to regulate and orchestrate processing amongst multiple workers > > (for example processes or interpreters). For example, a semaphore is > > an easy way to implement "I want no more than N workers to do this > > thing at the same time" ("this thing" can be something such as disk > > I/O). > > It's fairly reasonable to implement a mutex using a CSP-style > unbuffered channel (send = acquire, receive = release). And the same > trick turns a channel with a fixed-size buffer into a bounded semaphore.
Re: [Python-Dev] PEP 554 v3 (new interpreters module)
On Sat, Sep 23, 2017 at 2:45 AM, Antoine Pitrou wrote: >> As to "running_interpreters()" and "idle_interpreters()", I'm not sure >> what the benefit would be. You can compose either list manually with >> a simple comprehension: >> >> [interp for interp in interpreters.list_all() if interp.is_running()] >> [interp for interp in interpreters.list_all() if not interp.is_running()] > > There is a inherit race condition in doing that, at least if > interpreters are running in multiple threads (which I assume is going > to be the overly dominant usage model). That is why I'm proposing all > three variants. There's a race condition no matter what the API looks like -- having a dedicated running_interpreters() lets you guarantee that the returned list describes the set of interpreters that were running at some moment in time, but you don't know when that moment was and by the time you get the list, it's already out-of-date. So this doesn't seem very useful. OTOH if we think that invariants like this are useful, we might also want to guarantee that calling running_interpreters() and idle_interpreters() gives two lists such that each interpreter appears in exactly one of them, but that's impossible with this API; it'd require a single function that returns both lists. What problem are you trying to solve? >> Likewise, >> queue.Queue.send() supports blocking, in addition to providing a >> put_nowait() method. > > queue.Queue.put() never blocks in the usual case (*), which is of an > unbounded queue. Only bounded queues (created with an explicit > non-zero max_size parameter) can block in Queue.put(). > > (*) and therefore also never deadlocks :-) Unbounded queues also introduce unbounded latency and memory usage in realistic situations. (E.g. a producer/consumer setup where the producer runs faster than the consumer.) 
There's a reason why sockets always have bounded buffers -- it's sometimes painful, but the pain is intrinsic to building distributed systems, and unbounded buffers just paper over it. >> > send() blocking until someone else calls recv() is not only bad for >> > performance, >> >> What is the performance problem? > > Intuitively, there must be some kind of context switch (interpreter > switch?) at each send() call to let the other end receive the data, > since you don't have any internal buffering. Technically you just need the other end to wake up at some time in between any two calls to send(), and if there's no GIL then this doesn't necessarily require a context switch. > Also, suddenly an interpreter's ability to exploit CPU time is > dependent on another interpreter's ability to consume data in a timely > manner (what if the other interpreter is e.g. stuck on some disk I/O?). > IMHO it would be better not to have such coupling. A small buffer probably is useful in some cases, yeah -- basically enough to smooth out scheduler jitter. >> > it also increases the likelihood of deadlocks. >> >> How much of a problem will deadlocks be in practice? > > I expect more often than expected, in complex systems :-) For example, > you could have a recv() loop that also from time to time send()s some > data on another queue, depending on what is received. But if that > send()'s recipient also has the same structure (a recv() loop which > send()s from time to time), then it's easy to imagine to two getting in > a deadlock. You kind of want to be able to create deadlocks, since the alternative is processes that can't coordinate and end up stuck in livelocks or with unbounded memory use etc. >> I'm not sure I understand your concern here. Perhaps I used the word >> "sharing" too ambiguously? By "sharing" I mean that the two actors >> have read access to something that at least one of them can modify. 
>> If they both only have read-only access then it's effectively the same >> as if they are not sharing. > > Right. What I mean is that you *can* share very simple "data" under > the form of synchronization primitives. You may want to synchronize > your interpreters even they don't share user-visible memory areas. The > point of synchronization is not only to avoid memory corruption but > also to regulate and orchestrate processing amongst multiple workers > (for example processes or interpreters). For example, a semaphore is > an easy way to implement "I want no more than N workers to do this > thing at the same time" ("this thing" can be something such as disk > I/O). It's fairly reasonable to implement a mutex using a CSP-style unbuffered channel (send = acquire, receive = release). And the same trick turns a channel with a fixed-size buffer into a bounded semaphore. It won't be as efficient as a modern specialized mutex implementation, of course, but it's workable. Unfortunately while technically you can construct a buffered channel out of an unbuffered channel, the construction's pretty unreasonable (it needs two dedicated threads per channel). -n -- Nathaniel J. Smith -- https://
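Nathaniel's construction (send = acquire, receive = release; a fixed-size buffer gives a bounded semaphore) can be sketched with today's standard library, using a bounded queue.Queue as a stand-in for a buffered channel. The class name is hypothetical, not part of any proposed API:

```python
import queue

class ChannelSemaphore:
    """Bounded semaphore built on a fixed-size buffered 'channel'.

    queue.Queue(maxsize=n) stands in for a channel whose buffer holds n
    items: sending a token acquires a slot, receiving one releases it.
    With n=1 this is the mutex construction described above.
    """
    def __init__(self, n):
        self._channel = queue.Queue(maxsize=n)

    def acquire(self, blocking=True):
        try:
            self._channel.put(None, block=blocking)  # send a token
            return True
        except queue.Full:
            return False   # only reachable when blocking=False

    def release(self):
        self._channel.get_nowait()   # receive a token, freeing a slot

sem = ChannelSemaphore(2)                # "no more than 2 workers at once"
assert sem.acquire()
assert sem.acquire()
assert not sem.acquire(blocking=False)   # a third worker is refused
sem.release()
assert sem.acquire(blocking=False)       # a slot opened up again
```

As the post notes, this is nowhere near as efficient as a purpose-built semaphore, but it illustrates that buffered channels subsume the synchronisation primitives Antoine is asking for.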
Re: [Python-Dev] PEP 554 v3 (new interpreters module)
On 2017-09-23 10:45, Antoine Pitrou wrote: Hi Eric, On Fri, 22 Sep 2017 19:09:01 -0600 Eric Snow wrote: Please elaborate. I'm interested in understanding what you mean here. Do you have some subinterpreter-based concurrency improvements in mind? What aspect of CSP is the PEP following too faithfully? See below the discussion of blocking send()s :-) As to "running_interpreters()" and "idle_interpreters()", I'm not sure what the benefit would be. You can compose either list manually with a simple comprehension: [interp for interp in interpreters.list_all() if interp.is_running()] [interp for interp in interpreters.list_all() if not interp.is_running()] There is an inherent race condition in doing that, at least if interpreters are running in multiple threads (which I assume is going to be the overly dominant usage model). That is why I'm proposing all three variants. An alternative to 3 variants would be: interpreters.list_all(running=True) interpreters.list_all(running=False) interpreters.list_all(running=None) [snip]
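The flag-based alternative suggested here can be sketched in a few lines. Everything below is hypothetical illustration: FakeInterp stands in for the proposed interpreter objects, and note that the snapshot returned is still subject to the staleness Nathaniel describes, the flag only saves writing the comprehension.

```python
class FakeInterp:
    """Hypothetical stand-in for a PEP 554 interpreter object."""
    def __init__(self, running):
        self._running = running

    def is_running(self):
        return self._running

_interpreters = [FakeInterp(True), FakeInterp(False), FakeInterp(True)]

def list_all(running=None):
    """running=None -> all; True -> running only; False -> idle only."""
    if running is None:
        return list(_interpreters)
    return [i for i in _interpreters if i.is_running() == running]

print(len(list_all()))               # 3
print(len(list_all(running=True)))   # 2
print(len(list_all(running=False)))  # 1
```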
Re: [Python-Dev] PEP 554 v3 (new interpreters module)
Hi Eric, On Fri, 22 Sep 2017 19:09:01 -0600 Eric Snow wrote: > > Please elaborate. I'm interested in understanding what you mean here. > Do you have some subinterpreter-based concurrency improvements in > mind? What aspect of CSP is the PEP following too faithfully? See below the discussion of blocking send()s :-) > As to "running_interpreters()" and "idle_interpreters()", I'm not sure > what the benefit would be. You can compose either list manually with > a simple comprehension: > > [interp for interp in interpreters.list_all() if interp.is_running()] > [interp for interp in interpreters.list_all() if not interp.is_running()] There is an inherent race condition in doing that, at least if interpreters are running in multiple threads (which I assume is going to be the overly dominant usage model). That is why I'm proposing all three variants. > > I don't think it's a > > coincidence that the most varied kinds of I/O (from socket or file IO > > to threading Queues to multiprocessing Pipes) have non-blocking send(). > > Interestingly, you can set sockets to blocking mode, in which case > send() will block until there is room in the kernel buffer. Yes, but there *is* a kernel buffer. Which is the whole point of my comment: most similar primitives have internal buffering to prevent the user-facing send() API from blocking in the common case. > Likewise, > queue.Queue.send() supports blocking, in addition to providing a > put_nowait() method. queue.Queue.put() never blocks in the usual case (*), which is that of an unbounded queue. Only bounded queues (created with an explicit non-zero max_size parameter) can block in Queue.put(). (*) and therefore also never deadlocks :-) > Note that the PEP provides "recv_nowait()" and "send_nowait()" (names > inspired by queue.Queue), allowing for a non-blocking send. True, but it's not the same thing at all. In the objects I mentioned, send() mostly doesn't block and doesn't fail either. 
In your model, send_nowait() will routinely fail with an error if a recipient isn't immediately available to recv the data. > > send() blocking until someone else calls recv() is not only bad for > > performance, > > What is the performance problem? Intuitively, there must be some kind of context switch (interpreter switch?) at each send() call to let the other end receive the data, since you don't have any internal buffering. Also, suddenly an interpreter's ability to exploit CPU time is dependent on another interpreter's ability to consume data in a timely manner (what if the other interpreter is e.g. stuck on some disk I/O?). IMHO it would be better not to have such coupling. > > it also increases the likelihood of deadlocks. > > How much of a problem will deadlocks be in practice? I expect more often than expected, in complex systems :-) For example, you could have a recv() loop that also from time to time send()s some data on another queue, depending on what is received. But if that send()'s recipient also has the same structure (a recv() loop which send()s from time to time), then it's easy to imagine the two getting into a deadlock. > (FWIW, CSP > provides rigorous guarantees about deadlock detection (which Go > leverages), though I'm not sure how much benefit that can offer such a > dynamic language as Python.) Hmm... deadlock detection is one thing, but when detected you must still solve those deadlock issues, right? > I'm not sure I understand your concern here. Perhaps I used the word > "sharing" too ambiguously? By "sharing" I mean that the two actors > have read access to something that at least one of them can modify. > If they both only have read-only access then it's effectively the same > as if they are not sharing. Right. What I mean is that you *can* share very simple "data" under the form of synchronization primitives. You may want to synchronize your interpreters even if they don't share user-visible memory areas. 
The point of synchronization is not only to avoid memory corruption but also to regulate and orchestrate processing amongst multiple workers (for example processes or interpreters). For example, a semaphore is an easy way to implement "I want no more than N workers to do this thing at the same time" ("this thing" can be something such as disk I/O). Regards Antoine. 
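The rendezvous behaviour being debated here (send() blocking until a matching recv() takes the item) can be sketched with threading primitives. This is a minimal single-item illustration of the semantics under discussion, not the PEP's implementation:

```python
import threading

class RendezvousChannel:
    """Minimal unbuffered channel: send() blocks until a recv() has
    taken the item, mirroring the CSP-style hand-off under discussion."""
    def __init__(self):
        self._cond = threading.Condition()
        self._item = None
        self._full = False

    def send(self, obj):
        with self._cond:
            while self._full:            # wait out any previous hand-off
                self._cond.wait()
            self._item, self._full = obj, True
            self._cond.notify_all()
            while self._full:            # block until a receiver takes it
                self._cond.wait()

    def recv(self):
        with self._cond:
            while not self._full:
                self._cond.wait()
            obj, self._item, self._full = self._item, None, False
            self._cond.notify_all()      # wake the blocked sender
            return obj

chan = RendezvousChannel()
got = []
receiver = threading.Thread(target=lambda: got.append(chan.recv()))
receiver.start()
chan.send(b"hello")   # returns only once the receiver has the item
receiver.join()
print(got)   # [b'hello']
```

Antoine's objection is visible directly in the sketch: every send() is a synchronisation point, so holding any other lock across a send() is a potential deadlock.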
Re: [Python-Dev] PEP 554 v3 (new interpreters module)
Thanks for the feedback, Antoine. Sorry for the delay; it's been a busy week for me. I just pushed an updated PEP to the repo. Once I've sorted out the question of passing bytes through channels I plan on posting the PEP to the list again for another round of discussion. In the meantime, I've replied below in-line. -eric On Mon, Sep 18, 2017 at 4:46 AM, Antoine Pitrou wrote: > First my high-level opinion about the PEP: the CSP model can probably > be already implemented using Queues. To me, the interesting promise of > subinterpreters is if they allow to remove the GIL while sharing memory > for big objects (such as Numpy arrays). This means the PEP should > probably focus on potential concurrency improvements rather than try to > faithfully follow the CSP model. Please elaborate. I'm interested in understanding what you mean here. Do you have some subinterpreter-based concurrency improvements in mind? What aspect of CSP is the PEP following too faithfully? >> ``list_all()``:: >> >>Return a list of all existing interpreters. > > See my naming proposal in the previous thread. Sorry, your previous comment slipped through the cracks. You suggested: As for the naming, let's make it both unconfusing and explicit? How about three functions: `all_interpreters()`, `running_interpreters()` and `idle_interpreters()`, for example? As to "all_interpreters()", I suppose it's the difference between "interpreters.all_interpreters()" and "interpreters.list_all()". To me the latter looks better. As to "running_interpreters()" and "idle_interpreters()", I'm not sure what the benefit would be. You can compose either list manually with a simple comprehension: [interp for interp in interpreters.list_all() if interp.is_running()] [interp for interp in interpreters.list_all() if not interp.is_running()] >>run(source_str, /, **shared): >> >> Run the provided Python source code in the interpreter. Any >> keyword arguments are added to the interpreter's execution >> namespace. 
> > "Execution namespace" specifically means the __main__ module in the > target interpreter, right? Right. It's explained in more detail a little further down and elsewhere in the PEP. I've updated the PEP to explicitly mention __main__ here too. >> If any of the values are not supported for sharing >> between interpreters then RuntimeError gets raised. Currently >> only channels (see "create_channel()" below) are supported. >> >> This may not be called on an already running interpreter. Doing >> so results in a RuntimeError. > I would distinguish between both error cases: RuntimeError for calling > run() on an already running interpreter, ValueError for values which > are not supported for sharing. Good point. >> Likewise, if there is any uncaught >> exception, it propagates into the code where "run()" was called. > That makes it a bit harder to differentiate with errors raised by run() > itself (see above), though how much of an annoyance this is remains > unclear. The more litigious implication, though, is that it forces the > interpreter to support migration of arbitrary objects from one > interpreter to another (since a traceback keeps all local variables > alive). Yeah, the proposal to propagate exceptions out of the subinterpreter is still rather weak. I've added some notes to the PEP about this open issue. >> The mechanism for passing objects between interpreters is through >> channels. A channel is a simplex FIFO similar to a pipe. The main >> difference is that channels can be associated with zero or more >> interpreters on either end. > So it seems channels have become more complicated now? Is it important > to support multi-producer multi-consumer channels? To me it made the API simpler. The change did introduce the "close()" method, which I suppose could be confusing. However, I'm sure that in practice it won't be. 
In contrast, the FIFO/pipe-based API that I had before required passing names around, required more calls, required managing the channel/interpreter relationship more carefully, and made it hard to follow that relationship. >> Unlike queues, which are also many-to-many, >> channels have no buffer. > How does it work? Does send() block until someone else calls recv()? > That does not sound like a good idea to me. Correct: "send()" blocks until the other end receives (if ever). Likewise "recv()" blocks until the other end sends. This specific behavior is probably the main thing I borrowed from CSP. It is *the* synchronization mechanism. Given the isolated nature of subinterpreters, I consider using this concept from CSP to be a good fit. > I don't think it's a > coincidence that the most varied kinds of I/O (from socket or file IO > to threading Queues to multiprocessing Pipes) have non-blocking send(). Interestingly, you can set sockets to blocking mode, in which case send() will block until there is room in the kernel buffer.
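The "execution namespace" semantics discussed above (keyword arguments to run() are added to the target interpreter's __main__) can be approximated within a single interpreter using exec and an explicit globals dict. This is a rough analogue for illustration only; the real proposal executes the source in a *different* interpreter and restricts the shared values to channel objects:

```python
# Single-interpreter sketch of interp.run(source, **shared):
# shared values are injected into the target's execution namespace.
def run(source, **shared):
    ns = {"__name__": "__main__"}   # stand-in for the target __main__
    ns.update(shared)               # keyword args land in the namespace
    exec(source, ns)
    return ns                       # real run() returns nothing like this

ns = run("result = x * 2", x=21)
print(ns["result"])   # 42
```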
Re: [Python-Dev] PEP 554 v3 (new interpreters module)
Hi, First my high-level opinion about the PEP: the CSP model can probably be already implemented using Queues. To me, the interesting promise of subinterpreters is if they allow to remove the GIL while sharing memory for big objects (such as Numpy arrays). This means the PEP should probably focus on potential concurrency improvements rather than try to faithfully follow the CSP model. Other than that, a bunch of detailed comments follow: On Wed, 13 Sep 2017 18:44:31 -0700 Eric Snow wrote: > > API for interpreters > > > The module provides the following functions: > > ``list_all()``:: > >Return a list of all existing interpreters. See my naming proposal in the previous thread. > >run(source_str, /, **shared): > > Run the provided Python source code in the interpreter. Any > keyword arguments are added to the interpreter's execution > namespace. "Execution namespace" specifically means the __main__ module in the target interpreter, right? > If any of the values are not supported for sharing > between interpreters then RuntimeError gets raised. Currently > only channels (see "create_channel()" below) are supported. > > This may not be called on an already running interpreter. Doing > so results in a RuntimeError. I would distinguish between both error cases: RuntimeError for calling run() on an already running interpreter, ValueError for values which are not supported for sharing. > Likewise, if there is any uncaught > exception, it propagates into the code where "run()" was called. That makes it a bit harder to differentiate with errors raised by run() itself (see above), though how much of an annoyance this is remains unclear. The more litigious implication, though, is that it forces the interpreter to support migration of arbitrary objects from one interpreter to another (since a traceback keeps all local variables alive). > API for sharing data > > > The mechanism for passing objects between interpreters is through > channels. 
A channel is a simplex FIFO similar to a pipe. The main > difference is that channels can be associated with zero or more > interpreters on either end. So it seems channels have become more complicated now? Is it important to support multi-producer multi-consumer channels? > Unlike queues, which are also many-to-many, > channels have no buffer. How does it work? Does send() block until someone else calls recv()? That does not sound like a good idea to me. I don't think it's a coincidence that the most varied kinds of I/O (from socket or file IO to threading Queues to multiprocessing Pipes) have non-blocking send(). send() blocking until someone else calls recv() is not only bad for performance, it also increases the likelihood of deadlocks. >recv_nowait(default=None): > > Return the next object from the channel. If none have been sent > then return the default. If the channel has been closed > then EOFError is raised. > >close(): > > No longer associate the current interpreter with the channel (on > the receiving end). This is a noop if the interpreter isn't > already associated. Once an interpreter is no longer associated > with the channel, subsequent (or current) send() and recv() calls > from that interpreter will raise EOFError. EOFError normally means the *other* (sending) side has closed the channel (but it becomes complicated with a multi-producer multi-consumer setup...). When *this* side has closed the channel, we should raise ValueError. > The Python runtime > will garbage collect all closed channels. Note that "close()" is > automatically called when it is no longer used in the current > interpreter. "No longer used" meaning it loses all references in this interpreter? >send(obj): > >Send the object to the receiving end of the channel. Wait until >the object is received. If the channel does not support the >object then TypeError is raised. Currently only bytes are >supported. If the channel has been closed then EOFError is >raised. 
Similar remark as above (EOFError vs. ValueError). More generally, send() raising EOFError sounds unheard of. A sidenote: context manager support (__enter__ / __exit__) on channels would sound more useful to me than iteration support. > Initial support for buffers in channels > --- > > An alternative to support for bytes in channels is support for > read-only buffers (the PEP 3119 kind). Probably you mean PEP 3118. > Then ``recv()`` would return > a memoryview to expose the buffer in a zero-copy way. It will probably not do much if you only can pass buffers and not structured objects, because unserializing (e.g. unpickling) from a buffer will still copy memory around. To pass a Numpy array, for example, you not only need to pa
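The recv_nowait(default=None) behaviour quoted above can be mimicked with queue.Queue as a stand-in channel. A sketch, mirroring the draft's signature (minus the closed-channel EOFError case under debate):

```python
import queue

def recv_nowait(chan, default=None):
    """Return the next object if one is waiting, else the default --
    mirroring recv_nowait(default=None) from the draft PEP."""
    try:
        return chan.get_nowait()
    except queue.Empty:
        return default

chan = queue.Queue()          # stand-in for a channel's receiving end
print(recv_nowait(chan))      # None: nothing has been sent yet
chan.put(b"payload")
print(recv_nowait(chan))      # b'payload'
```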
Re: [Python-Dev] PEP 554 v3 (new interpreters module)
On 14 September 2017 at 11:44, Eric Snow wrote: > About Subinterpreters > = > > Shared data > --- [snip] > To make this work, the mutable shared state will be managed by the > Python runtime, not by any of the interpreters. Initially we will > support only one type of objects for shared state: the channels provided > by ``create_channel()``. Channels, in turn, will carefully manage > passing objects between interpreters. Something I think you may want to explicitly call out as *not* being shared is the thread objects in threading.enumerate(), as the way that works in the current implementation makes sense, but isn't particularly obvious (what I have below comes from experimenting with your branch at https://github.com/python/cpython/pull/1748). Specifically, what happens is that the operating system thread underlying the existing interpreter thread that calls interp.run() gets borrowed as the operating system thread underlying the MainThread object in the called interpreter. That MainThread object then gets preserved in the interpreter's interpreter state, but the mapping to an underlying OS thread will change freely based on who's calling into it. From outside an interpreter, you *can't* request to run code in subthreads directly - you'll always run your given code in the main thread, and it will be up to that to dispatch requests to subthreads. Beyond the thread lending that happens when you call interp.run() (where one of your threads gets borrowed as the other interpreter's main thread), each interpreter otherwise maintains a completely disjoint set of thread objects that it is solely responsible for. This also clarifies for me what it means for an interpreter to be a "main" interpreter: it's the interpreter whose main thread actually corresponds to the main thread of the overall operating system process, rather than being temporarily borrowed from another interpreter. 
We're going to have to put some thought into how we want that to interact with the signal handling logic - right now, I believe *any* main thread will consider it its responsibility to process signals delivered to the runtime (and embedding applications avoid the potential problems arising from that by simply not installing the CPython signal handlers in the first place), and we probably want to change that condition to be "the main thread in the main interpreter". Cheers, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia 
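The per-interpreter thread bookkeeping Nick describes can be inspected today within a single interpreter; under the proposal, each interpreter would keep its own disjoint version of this state. A sketch:

```python
import threading

# Each interpreter has exactly one "MainThread" object, and
# threading.enumerate() reports only that interpreter's threads.
main = threading.main_thread()
print(main.name)                      # MainThread
print(main in threading.enumerate())  # True

# A worker started here appears in *this* interpreter's list only;
# a subinterpreter would maintain its own disjoint set of Thread objects.
done = threading.Event()
worker = threading.Thread(target=done.wait, name="worker")
worker.start()
print("worker" in {t.name for t in threading.enumerate()})  # True
done.set()
worker.join()
```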
Re: [Python-Dev] PEP 554 v3 (new interpreters module)
On 15 September 2017 at 12:04, Nathaniel Smith wrote: > On Thu, Sep 14, 2017 at 5:44 PM, Nick Coghlan wrote: >> The reason we're OK with this is that it means that only reading a new >> message from a channel (i.e creating a cross-interpreter view) or >> discarding a previously read message (i.e. closing a cross-interpreter >> view) will be synchronisation points where the receiving interpreter >> necessarily needs to acquire the sending interpreter's GIL. >> >> By contrast, if we allow an actual bytes object to be shared, then >> either every INCREF or DECREF on that bytes object becomes a >> synchronisation point, or else we end up needing some kind of >> secondary per-interpreter refcount where the interpreter doesn't drop >> its shared reference to the original object in its source interpreter >> until the internal refcount in the borrowing interpreter drops to >> zero. > > Ah, that makes more sense. > > I am nervous that allowing arbitrary memoryviews gives a *little* more > power than we need or want. I like that the current API can reasonably > be emulated using subprocesses -- it opens up the door for backports, > compatibility support on language implementations that don't support > subinterpreters, direct benchmark comparisons between the two > implementation strategies, etc. But if we allow arbitrary memoryviews, > then this requires that you can take (a) an arbitrary object, not > specified ahead of time, and (b) provide two read-write views on it in > separate interpreters such that modifications made in one are > immediately visible in the other. Subprocesses can do one or the other > -- they can copy arbitrary data, and if you warn them ahead of time > when you allocate the buffer, they can do real zero-copy shared > memory. But the combination is really difficult. 
One constraint we'd want to impose is that the memory view in the receiving interpreter should always be read-only - while we don't currently expose the ability to request that at the Python layer, memoryviews *do* support the creation of read-only views at the C API layer (which then gets reported to Python code via the "view.readonly" attribute). While that change alone is enough to preserve the simplex nature of the channel, it wouldn't be enough to prevent the *sender* from mutating the buffer contents and having that change be visible in the recipient. In that regard it may make sense to maintain both restrictions initially (as you suggested below): only accept bytes on the sending side (to prevent mutation by the sender), and expose that as a read-only memory view on the receiving side (to allow for zero-copy data sharing without allowing mutation by the receiver). > It'd be one thing if this were like a key feature that gave > subinterpreters an advantage over subprocesses, but it seems really > unlikely to me that a library won't know ahead of time when it's > filling in a buffer to be transferred, and if anything it seems like > we'd rather not expose read-write shared mappings in any case. It's > extremely non-trivial to do right [1]. > > tl;dr: let's not rule out a useful implementation strategy based on a > feature we don't actually need. Yeah, the description Eric currently has in the PEP is a summary of a much longer suggestion Yury, Neil Schemenauer and I put together while waiting for our flights following the core dev sprint, and the full version had some of these additional constraints on it (most notably the "read-only in the receiving interpreter" one). > One alternative would be your option (3) -- you can put bytes in and > get memoryviews out, and since bytes objects are immutable it's OK. Indeed, I think that will be a sensible starting point.
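[Editor's note: CPython later exposed exactly this at the Python layer - memoryview.toreadonly() was added in Python 3.8. A quick demonstration of both points above: the receiver can't mutate through a read-only view, but the sender mutating the underlying buffer *is* still visible through it (hence the suggested bytes-only restriction on the sending side):]

```python
buf = bytearray(b"message")
ro = memoryview(buf).toreadonly()   # Python 3.8+: read-only view, same memory

assert ro.readonly
assert bytes(ro) == b"message"

# The receiving side cannot write through the view:
try:
    ro[0] = 0
except TypeError as exc:
    error = str(exc)   # "cannot modify read-only memory"

# ...but a mutation by the sender is still visible through it:
buf[0] = ord("M")
assert bytes(ro) == b"Message"
```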
However, I genuinely want to allow for zero-copy sharing of NumPy arrays eventually, as that's where I think this idea gets most interesting: the potential to allow for multiple parallel read operations on a given NumPy array *in Python* (rather than Cython or C) without running afoul of the GIL, and without needing to mess about with the complexities of operating system level IPC. Handling an exception >> That way channels can be a namespace *specifically* for passing in >> channels, and can be reported as such on RunResult. If we decide to >> allow arbitrary shared objects in the future, or add flag options like >> "reraise=True" to reraise exceptions from the subinterpreter in the >> current interpreter, we'd have that ability, rather than having the >> entire potential keyword namespace taken up for passing shared >> objects. > > Would channels be a dict, or...? Yeah, it would be a direct replacement for the way the current draft is proposing to use the keywords dict - it would just be a separate dictionary instead. It does occur to me that if we wanted to align with the way the `runpy` module spells that concept, we'd call the option `init_globals`, but I'm thinking it will be better to only allow channels to be passed through directly, and requir
Re: [Python-Dev] PEP 554 v3 (new interpreters module)
On Thu, Sep 14, 2017 at 5:44 PM, Nick Coghlan wrote: > On 14 September 2017 at 15:27, Nathaniel Smith wrote: >> I don't get it. With bytes, you can either share objects or copy them and >> the user can't tell the difference, so you can change your mind later if you >> want. >> But memoryviews require some kind of cross-interpreter strong >> reference to keep the underlying buffer object alive. So if you want to >> minimize object sharing, surely bytes are more future-proof. > > Not really, because the only way to ensure object separation (i.e no > refcounted objects accessible from multiple interpreters at once) with > a bytes-based API would be to either: > > 1. Always copy (eliminating most of the low overhead communications > benefits that subinterpreters may offer over multiple processes) > 2. Make the bytes implementation more complicated by allowing multiple > bytes objects to share the same underlying storage while presenting as > distinct objects in different interpreters > 3. Make the output on the receiving side not actually a bytes object, > but instead a view onto memory owned by another object in a different > interpreter (a "memory view", one might say) > > And yes, using memory views for this does mean defining either a > subclass or a mediating object that not only keeps the originating > object alive until the receiving memoryview is closed, but also > retains a reference to the originating interpreter so that it can > switch to it when it needs to manipulate the source object's refcount > or call one of the buffer methods. > > Yury and I are fine with that, since it means that either the sender > *or* the receiver can decide to copy the data (e.g. by calling > bytes(obj) before sending, or bytes(view) after receiving), and in the > meantime, the object holding the cross-interpreter view knows that it > needs to switch interpreters (and hence acquire the sending > interpreter's GIL) before doing anything with the source object. 
> > The reason we're OK with this is that it means that only reading a new > message from a channel (i.e creating a cross-interpreter view) or > discarding a previously read message (i.e. closing a cross-interpreter > view) will be synchronisation points where the receiving interpreter > necessarily needs to acquire the sending interpreter's GIL. > > By contrast, if we allow an actual bytes object to be shared, then > either every INCREF or DECREF on that bytes object becomes a > synchronisation point, or else we end up needing some kind of > secondary per-interpreter refcount where the interpreter doesn't drop > its shared reference to the original object in its source interpreter > until the internal refcount in the borrowing interpreter drops to > zero. Ah, that makes more sense. I am nervous that allowing arbitrary memoryviews gives a *little* more power than we need or want. I like that the current API can reasonably be emulated using subprocesses -- it opens up the door for backports, compatibility support on language implementations that don't support subinterpreters, direct benchmark comparisons between the two implementation strategies, etc. But if we allow arbitrary memoryviews, then this requires that you can take (a) an arbitrary object, not specified ahead of time, and (b) provide two read-write views on it in separate interpreters such that modifications made in one are immediately visible in the other. Subprocesses can do one or the other -- they can copy arbitrary data, and if you warn them ahead of time when you allocate the buffer, they can do real zero-copy shared memory. But the combination is really difficult. It'd be one thing if this were like a key feature that gave subinterpreters an advantage over subprocesses, but it seems really unlikely to me that a library won't know ahead of time when it's filling in a buffer to be transferred, and if anything it seems like we'd rather not expose read-write shared mappings in any case. 
It's extremely non-trivial to do right [1]. tl;dr: let's not rule out a useful implementation strategy based on a feature we don't actually need. One alternative would be your option (3) -- you can put bytes in and get memoryviews out, and since bytes objects are immutable it's OK. [1] https://en.wikipedia.org/wiki/Memory_model_(programming) >>> Handling an exception >>> - >> It would also be reasonable to simply not return any value/exception from >> run() at all, or maybe just a bool for whether there was an unhandled >> exception. Any high level API is going to be injecting code on both sides of >> the interpreter boundary anyway, so it can do whatever exception and >> traceback translation it wants to. > So any more detailed response would *have* to come back as a channel message? > > That sounds like a reasonable option to me, too, especially since > module level code doesn't have a return value as such - you can really > only say "it raised an exception (and this was the exception it raised)" or "it reached the end of the code without raising an exception".
Re: [Python-Dev] PEP 554 v3 (new interpreters module)
On 14 September 2017 at 15:27, Nathaniel Smith wrote: > On Sep 13, 2017 9:01 PM, "Nick Coghlan" wrote: > > On 14 September 2017 at 11:44, Eric Snow > wrote: >>send(obj): >> >>Send the object to the receiving end of the channel. Wait until >>the object is received. If the channel does not support the >>object then TypeError is raised. Currently only bytes are >>supported. If the channel has been closed then EOFError is >>raised. > > I still expect any form of object sharing to hinder your > per-interpreter GIL efforts, so restricting the initial implementation > to memoryview-only seems more future-proof to me. > > > I don't get it. With bytes, you can either share objects or copy them and > the user can't tell the difference, so you can change your mind later if you > want. > But memoryviews require some kind of cross-interpreter strong > reference to keep the underlying buffer object alive. So if you want to > minimize object sharing, surely bytes are more future-proof. Not really, because the only way to ensure object separation (i.e no refcounted objects accessible from multiple interpreters at once) with a bytes-based API would be to either: 1. Always copy (eliminating most of the low overhead communications benefits that subinterpreters may offer over multiple processes) 2. Make the bytes implementation more complicated by allowing multiple bytes objects to share the same underlying storage while presenting as distinct objects in different interpreters 3. 
Make the output on the receiving side not actually a bytes object, but instead a view onto memory owned by another object in a different interpreter (a "memory view", one might say) And yes, using memory views for this does mean defining either a subclass or a mediating object that not only keeps the originating object alive until the receiving memoryview is closed, but also retains a reference to the originating interpreter so that it can switch to it when it needs to manipulate the source object's refcount or call one of the buffer methods. Yury and I are fine with that, since it means that either the sender *or* the receiver can decide to copy the data (e.g. by calling bytes(obj) before sending, or bytes(view) after receiving), and in the meantime, the object holding the cross-interpreter view knows that it needs to switch interpreters (and hence acquire the sending interpreter's GIL) before doing anything with the source object. The reason we're OK with this is that it means that only reading a new message from a channel (i.e creating a cross-interpreter view) or discarding a previously read message (i.e. closing a cross-interpreter view) will be synchronisation points where the receiving interpreter necessarily needs to acquire the sending interpreter's GIL. By contrast, if we allow an actual bytes object to be shared, then either every INCREF or DECREF on that bytes object becomes a synchronisation point, or else we end up needing some kind of secondary per-interpreter refcount where the interpreter doesn't drop its shared reference to the original object in its source interpreter until the internal refcount in the borrowing interpreter drops to zero. >> Handling an exception >> - > It would also be reasonable to simply not return any value/exception from > run() at all, or maybe just a bool for whether there was an unhandled > exception. 
Any high level API is going to be injecting code on both sides of > the interpreter boundary anyway, so it can do whatever exception and > traceback translation it wants to. So any more detailed response would *have* to come back as a channel message? That sounds like a reasonable option to me, too, especially since module level code doesn't have a return value as such - you can really only say "it raised an exception (and this was the exception it raised)" or "it reached the end of the code without raising an exception". Given that, I think subprocess.run() (with check=False) is the right API precedent here: https://docs.python.org/3/library/subprocess.html#subprocess.run That always returns subprocess.CompletedProcess, and then you can call "cp.check_returncode()" to get it to raise subprocess.CalledProcessError for non-zero return codes. For interpreter.run(), we could keep the initial RunResult *really* simple and only report back: * source: the source code passed to run() * shared: the keyword args passed to run() (name chosen to match functools.partial) * completed: completed execution without raising an exception? (True if yes, False otherwise) Whether or not to report more details for a raised exception, and provide some mechanism to reraise it in the calling interpreter could then be deferred until later. The subprocess.run() comparison does make me wonder whether this might be a more future-proof signature for Interpreter.run() though: def run(source_str, /, *, channels=None): ... That way channels can be a namespace *specifically* for passing in channels, and can be reported as such on RunResult. If we decide to allow arbitrary shared objects in the future, or add flag options like "reraise=True" to reraise exceptions from the subinterpreter in the current interpreter, we'd have that ability, rather than having the entire potential keyword namespace taken up for passing shared objects.
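[Editor's note: the subprocess.run() precedent referenced above, shown with the real stdlib API for comparison - failures are reported on the result object, and raising is opt-in via check_returncode():]

```python
import subprocess
import sys

# check=False: a non-zero exit is recorded on the CompletedProcess,
# not raised as an exception.
cp = subprocess.run([sys.executable, "-c", "raise SystemExit(3)"], check=False)
print(cp.returncode)  # 3

try:
    cp.check_returncode()   # opt in to an exception after the fact
except subprocess.CalledProcessError as exc:
    print("raised for return code", exc.returncode)
```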
Re: [Python-Dev] PEP 554 v3 (new interpreters module)
On Sep 13, 2017 9:01 PM, "Nick Coghlan" wrote: On 14 September 2017 at 11:44, Eric Snow wrote: >send(obj): > >Send the object to the receiving end of the channel. Wait until >the object is received. If the channel does not support the >object then TypeError is raised. Currently only bytes are >supported. If the channel has been closed then EOFError is >raised. I still expect any form of object sharing to hinder your per-interpreter GIL efforts, so restricting the initial implementation to memoryview-only seems more future-proof to me. I don't get it. With bytes, you can either share objects or copy them and the user can't tell the difference, so you can change your mind later if you want. But memoryviews require some kind of cross-interpreter strong reference to keep the underlying buffer object alive. So if you want to minimize object sharing, surely bytes are more future-proof. > Handling an exception > - > > :: > >interp = interpreters.create() >try: >interp.run("""if True: >raise KeyError >""") >except KeyError: >print("got the error from the subinterpreter") As with the message passing through channels, I think you'll really want to minimise any kind of implicit object sharing that may interfere with future efforts to make the GIL truly an *interpreter* lock, rather than the global process lock that it is currently. One possible way to approach that would be to make the low level run() API a more Go-style API rather than a Python-style one, and have it return a (result, err) 2-tuple. "err.raise()" would then translate the foreign interpreter's exception into a local interpreter exception, but the *traceback* for that exception would be entirely within the current interpreter. It would also be reasonable to simply not return any value/exception from run() at all, or maybe just a bool for whether there was an unhandled exception. 
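[Editor's note: a toy sketch of what the Go-style (result, err) 2-tuple API proposed above could feel like. run_go_style is a stand-in built on exec() for illustration only; it is not the interpreters API from the PEP:]

```python
def run_go_style(source):
    """Stand-in for a (result, err) 2-tuple run() API; not the PEP's API."""
    try:
        exec(source, {"__name__": "__main__"})
    except Exception as exc:
        return None, exc
    return None, None

result, err = run_go_style('raise KeyError("missing")')
if err is not None:
    # A real implementation would translate the foreign interpreter's
    # exception here, with a traceback local to the current interpreter.
    print(type(err).__name__, err.args)  # KeyError ('missing',)
```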
Any high level API is going to be injecting code on both sides of the interpreter boundary anyway, so it can do whatever exception and traceback translation it wants to. > Resetting __main__ > - > > As proposed, every call to ``Interpreter.run()`` will execute in the > namespace of the interpreter's existing ``__main__`` module. This means > that data persists there between ``run()`` calls. Sometimes this isn't > desirable and you want to execute in a fresh ``__main__``. Also, > you don't necessarily want to leak objects there that you aren't using > any more. > > Solutions include: > > * a ``create()`` arg to indicate resetting ``__main__`` after each > ``run`` call > * an ``Interpreter.reset_main`` flag to support opting in or out > after the fact > * an ``Interpreter.reset_main()`` method to opt in when desired > > This isn't a critical feature initially. It can wait until later > if desirable. I was going to note that you can already do this: interp.run("globals().clear()") However, that turns out to clear *too* much, since it also clobbers all the __dunder__ attributes that the interpreter needs in a code execution environment. Either way, if you added this, I think it would make more sense as an "importlib.util.reset_globals()" operation, rather than have it be something specific to subinterpreters. This is another point where the API could reasonably say that if you want clean namespaces then you should do that yourself (e.g. by setting up your own globals dict and using it to execute any post-bootstrap code). -n
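[Editor's note: a sketch of the "reset_globals()" behaviour suggested above - clearing user-defined names while preserving the __dunder__ attributes the interpreter needs. The helper name follows the suggestion in the message; it is not an existing importlib.util API:]

```python
def reset_globals(ns):
    """Clear user-defined names from a module namespace, keeping dunders."""
    for name in list(ns):
        if not (name.startswith("__") and name.endswith("__")):
            del ns[name]

# globals().clear() would also remove __name__, __builtins__, etc.;
# this only removes the user's own names:
ns = {"__name__": "__main__", "__builtins__": {}, "x": 1, "helper": len}
reset_globals(ns)
assert sorted(ns) == ["__builtins__", "__name__"]
```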
Re: [Python-Dev] PEP 554 v3 (new interpreters module)
On Wed, Sep 13, 2017 at 11:56 PM, Nick Coghlan wrote: [..] >>send(obj): >> >>Send the object to the receiving end of the channel. Wait until >>the object is received. If the channel does not support the >>object then TypeError is raised. Currently only bytes are >>supported. If the channel has been closed then EOFError is >>raised. > > I still expect any form of object sharing to hinder your > per-interpreter GIL efforts, so restricting the initial implementation > to memoryview-only seems more future-proof to me. +1. Working with memoryviews is as convenient as with bytes. Yury
Re: [Python-Dev] PEP 554 v3 (new interpreters module)
On 14 September 2017 at 11:44, Eric Snow wrote: > I've updated PEP 554 in response to feedback. (thanks all!) There > are a few unresolved points (some of them added to the Open Questions > section), but the current PEP has changed enough that I wanted to get > it out there first. > > Notably changed: > > * the API relative to object passing has changed somewhat drastically > (hopefully simpler and easier to understand), replacing "FIFO" with > "channel" > * added an examples section > * added an open questions section > * added a rejected ideas section > * added more items to the deferred functionality section > * the rationale section has moved down below the examples > > Please let me know what you think. I'm especially interested in > feedback about the channels. Thanks! I like the new pipe-like channels API more than the previous named FIFO approach :) >send(obj): > >Send the object to the receiving end of the channel. Wait until >the object is received. If the channel does not support the >object then TypeError is raised. Currently only bytes are >supported. If the channel has been closed then EOFError is >raised. I still expect any form of object sharing to hinder your per-interpreter GIL efforts, so restricting the initial implementation to memoryview-only seems more future-proof to me. > Pre-populate an interpreter > --- > > :: > >interp = interpreters.create() >interp.run("""if True: >import some_lib >import an_expensive_module >some_lib.set_up() >""") >wait_for_request() >interp.run("""if True: >some_lib.handle_request() >""") I find the "if True:"'s sprinkled through the examples distracting, so I'd prefer either: 1. Using textwrap.dedent; or 2. 
Assigning the code to a module level attribute :: interp = interpreters.create() setup_code = """\ import some_lib import an_expensive_module some_lib.set_up() """ interp.run(setup_code) wait_for_request() handler_code = """\ some_lib.handle_request() """ interp.run(handler_code) > Handling an exception > - > > :: > >interp = interpreters.create() >try: >interp.run("""if True: >raise KeyError >""") >except KeyError: >print("got the error from the subinterpreter") As with the message passing through channels, I think you'll really want to minimise any kind of implicit object sharing that may interfere with future efforts to make the GIL truly an *interpreter* lock, rather than the global process lock that it is currently. One possible way to approach that would be to make the low level run() API a more Go-style API rather than a Python-style one, and have it return a (result, err) 2-tuple. "err.raise()" would then translate the foreign interpreter's exception into a local interpreter exception, but the *traceback* for that exception would be entirely within the current interpreter. > About Subinterpreters > = > > Shared data > --- > > Subinterpreters are inherently isolated (with caveats explained below), > in contrast to threads. This enables `a different concurrency model > `_ than is currently readily available in Python. > `Communicating Sequential Processes`_ (CSP) is the prime example. > > A key component of this approach to concurrency is message passing. So > providing a message/object passing mechanism alongside ``Interpreter`` > is a fundamental requirement. This proposal includes a basic mechanism > upon which more complex machinery may be built. That basic mechanism > draws inspiration from pipes, queues, and CSP's channels. [fifo]_ > > The key challenge here is that sharing objects between interpreters > faces complexity due in part to CPython's current memory model. 
> Furthermore, in this class of concurrency, the ideal is that objects > only exist in one interpreter at a time. However, this is not practical > for Python so we initially constrain supported objects to ``bytes``. > There are a number of strategies we may pursue in the future to expand > supported objects and object sharing strategies. > > Note that the complexity of object sharing increases as subinterpreters > become more isolated, e.g. after GIL removal. So the mechanism for > message passing needs to be carefully considered. Keeping the API > minimal and initially restricting the supported types helps us avoid > further exposing any underlying complexity to Python users. > > To make this work, the mutable shared state will be managed by the > Python runtime, not by any of the interpreters. Initially we will > support only one type of objects for shared state: the channels provided > by ``create_channel()``. Channels, in turn, will carefully manage > passing objects between interpreters. Interpreters themselves will also need to be shared objects, as: - they all have access to "
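[Editor's note: returning to the textwrap.dedent suggestion from earlier in this message, this is how the pre-population example reads without the "if True:" workaround. some_lib/an_expensive_module are placeholders in the original, so json stands in here to make the snippet runnable:]

```python
import textwrap

# dedent() strips the common leading whitespace, so the embedded source
# can be indented naturally inside the surrounding Python code.
setup_code = textwrap.dedent("""\
    import json
    config = json.dumps({"ready": True})
""")

assert setup_code.startswith("import json")

# In the PEP this string would be passed to interp.run(setup_code);
# plain exec() demonstrates the same dedented source executing cleanly.
namespace = {}
exec(setup_code, namespace)
print(namespace["config"])  # {"ready": true}
```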