Re: [Python-ideas] Exposing CPython's subinterpreter C-API in the stdlib.

2017-05-27 Thread Stephan Houben
Hi Nick,

> I guess I'll have to scale back my hopes on that front to be closer to
> what Stephan described - even a deep copy equivalent is often going to
> be cheaper than a full serialise/transmit/deserialise cycle or some
> other form of inter-process communication.

I would like to add that in many cases the underlying C objects *could* be
shared.
I identified some possible use cases of this.

1. numpy/scipy: share underlying memory of ndarray
  Effectively threads can then operate on the same array without GIL
interference.

2. Sqlite in-memory database
  Multiple threads can operate on it in parallel.
  If you have an ORM it might feel very similar to just sharing Python
objects across threads.

3. Tree of XML elements (like xml.etree)
   Assuming the tree data structure itself is in C, the tree could be
shared across
   interpreters. This would be an example of a "deep" data structure which can
   still be efficiently shared.

So I feel this could still be very useful even if pure-Python objects
need to be copied.
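To make the "shared C memory, per-interpreter Python wrappers" idea concrete: there is no stdlib subinterpreter-sharing API today, so the following is only a process-level sketch using multiprocessing.shared_memory. Two independent memoryview wrapper objects (standing in for wrappers in two interpreters) expose the same underlying C buffer, so a write through one is visible through the other without any copy.

```python
from multiprocessing import shared_memory

# One underlying C buffer...
buf = shared_memory.SharedMemory(create=True, size=16)

# ...with two distinct Python-level wrappers over it, as two
# interpreters would each have their own wrapper object.
view_a = memoryview(buf.buf)
view_b = memoryview(buf.buf)

view_a[0] = 42            # write through one wrapper...
observed = view_b[0]      # ...read it back through the other: same memory

view_a.release()
view_b.release()
buf.close()
buf.unlink()
```

The Python objects here are cheap and per-"interpreter"; only the backing buffer is shared, which is exactly the split being proposed for ndarrays and similar C-backed types.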

Thanks,

Stephan


2017-05-27 9:32 GMT+02:00 Nick Coghlan :
> On 27 May 2017 at 03:30, Guido van Rossum  wrote:
>> On Fri, May 26, 2017 at 8:28 AM, Nick Coghlan  wrote:
>>>
>>> [...] assuming the rest of the idea works out
>>> well, we'd eventually like to move to a tiered model where the GIL
>>> becomes a read/write lock. Most code execution in subinterpreters
>>> would then only need a read lock on the GIL, and hence could happily
>>> execute code in parallel with other subinterpreters running on other
>>> cores.
>>
>>
>> Since the GIL protects refcounts and refcounts are probably the most
>> frequently written item, I'm skeptical of this.
>
> Likewise - hence my somewhat garbled attempt to explain that actually
> doing that would be contingent on the GILectomy folks figuring out
> some clever way to cope with the refcounts :)
>
>>> By contrast, being able to reliably model Communicating Sequential
>>> Processes in Python without incurring any communications overhead
>>> though (ala goroutines)? Or doing the same with the Actor model (ala
>>> Erlang/BEAM processes)?
>>>
>>> Those are *very* interesting language design concepts, and something
>>> where offering a compelling alternative to the current practices of
>>> emulating them with threads or coroutines pretty much requires the
>>> property of zero-copy ownership transfer.
>>
>> But subinterpreters (which have independent sys.modules dicts) seem a poor
>> match for that. It feels as if you're speculating about an entirely
>> different language here, not named Python.
>
> Ah, you're right - the types are all going to be separate as well,
> which means "cost of a deep copy" is the cheapest we're going to be
> able to get with this model. Anything better than that would require a
> more esoteric memory management architecture like the one in
> PyParallel.
>
> I guess I'll have to scale back my hopes on that front to be closer to
> what Stephan described - even a deep copy equivalent is often going to
> be cheaper than a full serialise/transmit/deserialise cycle or some
> other form of inter-process communication.
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Exposing CPython's subinterpreter C-API in the stdlib.

2017-05-27 Thread Nick Coghlan
On 27 May 2017 at 03:30, Guido van Rossum  wrote:
> On Fri, May 26, 2017 at 8:28 AM, Nick Coghlan  wrote:
>>
>> [...] assuming the rest of the idea works out
>> well, we'd eventually like to move to a tiered model where the GIL
>> becomes a read/write lock. Most code execution in subinterpreters
>> would then only need a read lock on the GIL, and hence could happily
>> execute code in parallel with other subinterpreters running on other
>> cores.
>
>
> Since the GIL protects refcounts and refcounts are probably the most
> frequently written item, I'm skeptical of this.

Likewise - hence my somewhat garbled attempt to explain that actually
doing that would be contingent on the GILectomy folks figuring out
some clever way to cope with the refcounts :)

>> By contrast, being able to reliably model Communicating Sequential
>> Processes in Python without incurring any communications overhead
>> though (ala goroutines)? Or doing the same with the Actor model (ala
>> Erlang/BEAM processes)?
>>
>> Those are *very* interesting language design concepts, and something
>> where offering a compelling alternative to the current practices of
>> emulating them with threads or coroutines pretty much requires the
>> property of zero-copy ownership transfer.
>
> But subinterpreters (which have independent sys.modules dicts) seem a poor
> match for that. It feels as if you're speculating about an entirely
> different language here, not named Python.

Ah, you're right - the types are all going to be separate as well,
which means "cost of a deep copy" is the cheapest we're going to be
able to get with this model. Anything better than that would require a
more esoteric memory management architecture like the one in
PyParallel.

I guess I'll have to scale back my hopes on that front to be closer to
what Stephan described - even a deep copy equivalent is often going to
be cheaper than a full serialise/transmit/deserialise cycle or some
other form of inter-process communication.
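The cost ordering Nick describes can be illustrated in plain Python: a deep copy and a pickle round-trip both produce an independent, equal object, but the serialise/deserialise path adds an encode/decode step on top (and inter-process transmission would add more again). A minimal sketch:

```python
import copy
import pickle

data = {"points": [(1, 2), (3, 4)], "label": "grid"}

# "Deep copy equivalent": independent object graph, no byte encoding.
copied = copy.deepcopy(data)

# Serialise/deserialise cycle: same result, but via an intermediate
# byte string -- the extra overhead being referred to above.
round_tripped = pickle.loads(pickle.dumps(data))

assert copied == data and copied is not data
assert round_tripped == data and round_tripped is not data
```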

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-ideas] Exposing CPython's subinterpreter C-API in the stdlib.

2017-05-26 Thread Nick Coghlan
On 26 May 2017 at 22:08, Stephan Houben  wrote:
> Hi all,
>
> Personally I feel that the current subinterpreter support falls short
> in the sense that it still requires
> a single GIL across interpreters.
>
> If interpreters would have their own individual GIL,
> we could have true shared-nothing multi-threaded support similar to
> JavaScript's "Web Workers".
>
> Here is a point-wise overview of what I am imagining.
> I realize the following is very ambitious, but I would like to bring
> it to your consideration.
>
> 1. Multiple interpreters can be instantiated, each of which is
> completely independent.
>    To this end, all global interpreter state needs to go into an
> interpreter structure, including the GIL
> (which becomes per-interpreter)
>Interpreters share no state whatsoever.

There'd still be true process global state (i.e. anything managed by
the C runtime), so this would be a tiered setup with a read/write GIL
and multiple SILs. For the time being though, a single GIL remains
much easier to manage.

> 2. PyObject's are tied to a particular interpreter and cannot be
> shared between interpreters.
>(This is because each interpreter now has its own GIL.)
>I imagine a special debug build would actually store the
> interpreter pointer in the PyObject and would assert everywhere
>that the PyObject is only manipulated by its owning interpreter.

Yes, something like Rust's ownership model is the gist of what we had
in mind (i.e. allowing zero-copy transfer of ownership between
subinterpreters, but only the owning interpreter is allowed to do
anything else with the object).

> 3. Practically all existing APIs, including Py_INCREF and Py_DECREF,
> need to get an additional explicit interpreter argument.
> I imagine that we would have a new prefix, say MPy_, because the
> existing APIs must be left for backward compatibility.

This isn't necessary, as the active interpreter is already tracked as
part of the thread local state (otherwise mod_wsgi et al wouldn't work
at all).

> 4. At most one interpreter can be designated the "main" interpreter.
> This is for backward compatibility of existing extension modules ONLY.
> All the existing Py_* APIs operate implicitly on this main interpreter.

Yep, this is part of the concept. The PEP 432 draft has more details
on that: 
https://www.python.org/dev/peps/pep-0432/#interpreter-initialization-phases

> 5. Extension modules need to explicitly advertise multiple interpreter 
> support.
> If they don't, they can only be imported in the main interpreter.
> However, in that case they can safely use the existing Py_ APIs.

This is the direction we started moving in with the multi-phase
initialisation PEP for extension modules:
https://www.python.org/dev/peps/pep-0489/

As Petr noted, the main missing piece there now is the fact that
object methods (as opposed to module level functions) implemented in C
currently don't have ready access to the module level state for the
modules where they're defined.

> 6. Since PyObject's cannot be shared across interpreters, there needs to be an
> explicit function which takes a PyObject in interpreter A and constructs a
> similar object in interpreter B.
>
> Conceptually this would be equivalent to pickling in A and
> unpickling in B, but presumably more efficient.
> It would use the copyreg registry in a similar way to pickle.

This would be an ownership transfer rather than a copy (which carries
the implication that all the subinterpreters would still need to share
a common memory allocator)

> 7. Extension modules would also be able to register their functions
> for copying custom types across interpreters.
>   That would allow extension modules to provide custom types where
> the underlying C object is in fact not copied
>   but shared between interpreters.
>   I would imagine we would have a "shared memory" memoryview object
>   and also Mutex and other locking constructs which would work
> across interpreters.

We generally don't expect this to be needed given an ownership focused
approach. Instead, the focus would be on enabling efficient channel
based communication models that are cost-prohibitive when object
serialisation is involved.
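The channel model being aimed for can be sketched with threads and queue.Queue as a stand-in: within a single interpreter, putting an object on a queue hands over a reference with no copy at all, which is the property a zero-copy subinterpreter channel would preserve (a multiprocessing queue, by contrast, pickles everything that passes through it).

```python
import queue
import threading

channel = queue.Queue()
received = []

def worker():
    # Stand-in for a second interpreter receiving on the channel.
    received.append(channel.get())

t = threading.Thread(target=worker)
t.start()

payload = {"big": list(range(1000))}
channel.put(payload)           # a reference is handed over, nothing serialised
t.join()

assert received[0] is payload  # the very same object: zero-copy transfer
```

With real subinterpreters the sender would additionally give up its right to touch the object after the put, which is the ownership-transfer part of the proposal.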

> 8. Finally, the main application: functionality similar to the current
> `multiprocessing'  module, but with
> multiple interpreters on multiple threads in a single process.
> This would presumably be more efficient than `multiprocessing' and
> also allow extra functionality, since the underlying C objects
> can in fact be shared.
> (Imagine two interpreters operating in parallel on a single OpenCL 
> context.)

We're not sure how feasible it will be to enable this in general, but
even without it, zero-copy ownership transfers enable a *lot* of
interesting concurrency models that Python doesn't currently offer great
primitives to support (they're mainly a matter of using threads in

Re: [Python-ideas] Exposing CPython's subinterpreter C-API in the stdlib.

2017-05-26 Thread Stephan Houben
Hi all,

Personally I feel that the current subinterpreter support falls short
in the sense that it still requires
a single GIL across interpreters.

If interpreters would have their own individual GIL,
we could have true shared-nothing multi-threaded support similar to
JavaScript's "Web Workers".

Here is a point-wise overview of what I am imagining.
I realize the following is very ambitious, but I would like to bring
it to your consideration.

1. Multiple interpreters can be instantiated, each of which is
completely independent.
   To this end, all global interpreter state needs to go into an
interpreter structure, including the GIL
(which becomes per-interpreter)
   Interpreters share no state whatsoever.

2. PyObject's are tied to a particular interpreter and cannot be
shared between interpreters.
   (This is because each interpreter now has its own GIL.)
   I imagine a special debug build would actually store the
interpreter pointer in the PyObject and would assert everywhere
   that the PyObject is only manipulated by its owning interpreter.

3. Practically all existing APIs, including Py_INCREF and Py_DECREF,
need to get an additional explicit interpreter argument.
I imagine that we would have a new prefix, say MPy_, because the
existing APIs must be left for backward compatibility.

4. At most one interpreter can be designated the "main" interpreter.
This is for backward compatibility of existing extension modules ONLY.
All the existing Py_* APIs operate implicitly on this main interpreter.

5. Extension modules need to explicitly advertise multiple interpreter support.
If they don't, they can only be imported in the main interpreter.
However, in that case they can safely use the existing Py_ APIs.

6. Since PyObject's cannot be shared across interpreters, there needs to be an
explicit function which takes a PyObject in interpreter A and constructs a
similar object in interpreter B.

Conceptually this would be equivalent to pickling in A and
unpickling in B, but presumably more efficient.
It would use the copyreg registry in a similar way to pickle.

7. Extension modules would also be able to register their functions
for copying custom types across interpreters.
  That would allow extension modules to provide custom types where
the underlying C object is in fact not copied
  but shared between interpreters.
  I would imagine we would have a "shared memory" memoryview object
  and also Mutex and other locking constructs which would work
across interpreters.

8. Finally, the main application: functionality similar to the current
`multiprocessing'  module, but with
multiple interpreters on multiple threads in a single process.
This would presumably be more efficient than `multiprocessing' and
also allow extra functionality, since the underlying C objects
can in fact be shared.
(Imagine two interpreters operating in parallel on a single OpenCL context.)



Stephan



On 26 May 2017 10:41 a.m., "Petr Viktorin" wrote:
>
> On 05/25/2017 09:01 PM, Eric Snow wrote:
>>
>> On Thu, May 25, 2017 at 11:19 AM, Nathaniel Smith  wrote:
>>>
>>> My impression is that the code to support them inside CPython is fine, but
>>> they're broken and not very useful in the sense that lots of C extensions
>>> don't really support them, so in practice you can't reliably use them to run
>>> arbitrary code. Numpy for example definitely has lots of
>>> subinterpreter-related bugs, and when they get reported we close them as
>>> WONTFIX.
>>>
>>> Based on conversations at last year's pycon, my impression is that numpy
>>> probably *could* support subinterpreters (i.e. the required apis exist), but
>>> none of us really understand the details, it's the kind of problem that
>>> requires a careful whole-codebase audit, and a naive approach might make
>>> numpy's code slower and more complicated for everyone. (For example, there
>>> are lots of places where numpy keeps a little global cache that I guess
>>> should instead be per-subinterpreter caches, which would mean adding an
>>> extra lookup operation to fast paths.)
>>
>>
>> Thanks for pointing this out.  You've clearly described probably the
>> biggest challenge for folks that try to use subinterpreters.  PEP 384
>> was meant to help with this, but seems to have fallen short.  PEP 489
>> can help identify modules that profess subinterpreter support, as well
>> as facilitating future extension module helpers to deal with global
>> state.  However, I agree that *right now* getting extension modules to
>> reliably work with subinterpreters is not easy enough.  Furthermore,
>> that won't change unless there is sufficient benefit tied to
>> subinterpreters, as you point out below.
>
>
> PEP 489 was a first step; the work is not finished. The next step is solving 
> a major reason people are using global state in extension modules: per-module 
> state isn't accessible from all 

Re: [Python-ideas] Exposing CPython's subinterpreter C-API in the stdlib.

2017-05-26 Thread Petr Viktorin

On 05/25/2017 09:01 PM, Eric Snow wrote:

On Thu, May 25, 2017 at 11:19 AM, Nathaniel Smith  wrote:

My impression is that the code to support them inside CPython is fine, but
they're broken and not very useful in the sense that lots of C extensions
don't really support them, so in practice you can't reliably use them to run
arbitrary code. Numpy for example definitely has lots of
subinterpreter-related bugs, and when they get reported we close them as
WONTFIX.

Based on conversations at last year's pycon, my impression is that numpy
probably *could* support subinterpreters (i.e. the required apis exist), but
none of us really understand the details, it's the kind of problem that
requires a careful whole-codebase audit, and a naive approach might make
numpy's code slower and more complicated for everyone. (For example, there
are lots of places where numpy keeps a little global cache that I guess
should instead be per-subinterpreter caches, which would mean adding an
extra lookup operation to fast paths.)


Thanks for pointing this out.  You've clearly described probably the
biggest challenge for folks that try to use subinterpreters.  PEP 384
was meant to help with this, but seems to have fallen short.  PEP 489
can help identify modules that profess subinterpreter support, as well
as facilitating future extension module helpers to deal with global
state.  However, I agree that *right now* getting extension modules to
reliably work with subinterpreters is not easy enough.  Furthermore,
that won't change unless there is sufficient benefit tied to
subinterpreters, as you point out below.


PEP 489 was a first step; the work is not finished. The next step is 
solving a major reason people are using global state in extension 
modules: per-module state isn't accessible from all the places it should 
be, namely in methods of classes. In other words, I don't think Python 
is ready for big projects like Numpy to start properly supporting 
subinterpreters.


The work on fixing this has stalled, but it looks like I'll be getting 
back on track.
Discussions about this are on the import-sig list, reach out there if 
you'd like to help.



Re: [Python-ideas] Exposing CPython's subinterpreter C-API in the stdlib.

2017-05-26 Thread Ronald Oussoren

> On 25 May 2017, at 19:03, Eric Snow  wrote:
> 
> On Wed, May 24, 2017 at 8:30 PM, Guido van Rossum  wrote:
>> Hm... Curiously, I've heard a few people at PyCon
> 
> I'd love to get in touch with them and discuss the situation.  I've
> spoken with Graham Dumpleton on several occasions about
> subinterpreters and what needs to be fixed.
> 
>> mention they thought subinterpreters were broken
> 
> There are a number of related long-standing bugs plus a few that I
> created in the last year or two.  I'm motivated to get these resolved
> so that the multi-core Python project can take full advantage of
> subinterpreters without worry.
> 
> As well, there are known limitations to using extension modules in
> subinterpreters.  However, only extension modules that rely on process
> globals (rather than leveraging PEP 384, etc.) are affected, and we
> can control for that more carefully using the protocol introduced by
> PEP 489.

There are also the PyGILState APIs (PEP 311); those assume there's only one
interpreter.

Ronald


Re: [Python-ideas] Exposing CPython's subinterpreter C-API in the stdlib.

2017-05-26 Thread Nathaniel Smith
On Thu, May 25, 2017 at 12:01 PM, Eric Snow  wrote:
> More significantly, I genuinely believe that isolated
> interpreters in the same process is a tool that many people will find
> extremely useful and will help the Python community.  Consequently,
> exposing subinterpreters in the stdlib would result in a stronger
> incentive for folks to fix the known bugs and find a solution to the
> challenges for extension modules.

I feel like the most effective incentive would be to demonstrate how
useful they are first? If we do it in the other order, then there's a
risk that cpython does provide an incentive, but it's of the form
"this thing doesn't actually accomplish anything useful yet, but it
got mentioned in whats-new-in-3.7 and now angry people are yelling at
me in my bug tracker for not 'fixing' my package, so I have to do a
bunch of pointless work to shut them up". This tends to leave bad
feelings all around.

I do get that this is a tricky chicken-and-egg situation: currently
subinterpreters don't work very well, so no-one writes cool
applications using them, so no-one bothers to make them work better.
And I share the general intuition that this is a powerful tool that
probably has some kind of useful applications. But I can't immediately
name any such applications, which makes me nervous :-). The obvious
application is your multi-core Python idea, and I think that would
convince a lot of people; in general I'm enthusiastic about the idea
of extending Python's semantics to enable better parallelism. But I
just re-read your blog post and some of the linked thread, and it's
not at all clear to me how you plan to solve the refcounting and
garbage collection problems that will arise once you have objects that
are shared between multiple subinterpreters and no GIL. Which makes it
hard for me to make a case to the other numpy devs that it's worth
spending energy on this now, to support a feature that might or might
not happen in the future, especially if angry shouty people start
joining the conversation.

Does that make sense? I want the project to succeed, and if one of the
conditions for that is getting buy-in from the community of C
extension developers then it seems important to have a good plan for
navigating the incentives tightrope.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-ideas] Exposing CPython's subinterpreter C-API in the stdlib.

2017-05-25 Thread Eric Snow
On Thu, May 25, 2017 at 11:55 AM, Brett Cannon  wrote:
> I'm +1 on Nick's idea of the low-level, private API existing first to
> facilitate testing, but putting off any public API until we're sure we can
> make it function in a way we're happy with to more generally expose.

Same here.  I hadn't expected the high-level API to be an immediate
(or contingent) addition.  My interest lies particularly with the
low-level module.

-eric


Re: [Python-ideas] Exposing CPython's subinterpreter C-API in the stdlib.

2017-05-25 Thread Eric Snow
On Thu, May 25, 2017 at 11:19 AM, Nathaniel Smith  wrote:
> My impression is that the code to support them inside CPython is fine, but
> they're broken and not very useful in the sense that lots of C extensions
> don't really support them, so in practice you can't reliably use them to run
> arbitrary code. Numpy for example definitely has lots of
> subinterpreter-related bugs, and when they get reported we close them as
> WONTFIX.
>
> Based on conversations at last year's pycon, my impression is that numpy
> probably *could* support subinterpreters (i.e. the required apis exist), but
> none of us really understand the details, it's the kind of problem that
> requires a careful whole-codebase audit, and a naive approach might make
> numpy's code slower and more complicated for everyone. (For example, there
> are lots of places where numpy keeps a little global cache that I guess
> should instead be per-subinterpreter caches, which would mean adding an
> extra lookup operation to fast paths.)

Thanks for pointing this out.  You've clearly described probably the
biggest challenge for folks that try to use subinterpreters.  PEP 384
was meant to help with this, but seems to have fallen short.  PEP 489
can help identify modules that profess subinterpreter support, as well
as facilitating future extension module helpers to deal with global
state.  However, I agree that *right now* getting extension modules to
reliably work with subinterpreters is not easy enough.  Furthermore,
that won't change unless there is sufficient benefit tied to
subinterpreters, as you point out below.

>
> Or maybe it'd be fine, but no one is motivated to figure it out, because the
> other side of the cost/benefit analysis is that almost nobody actually uses
> subinterpreters. I think the only two projects that do are mod_wsgi and jep
> [1].
>
> So yeah, the status quo is broken. But there are two possible ways to fix
> it: IMHO either subinterpreters should be removed *or* they should have some
> compelling features added to make them actually worth the effort of fixing c
> extensions to support them. If Eric can pull off this multi-core idea then
> that would be pretty compelling :-).

Agreed. :)

> (And my impression is that the things
> that break under subinterpreters are essentially the same as would break
> under any GIL-removal plan.)

More or less.  There's a lot of process-global state in CPython that
needs to get pulled into the interpreter state.  So in that regard the
effort and tooling will likely correspond fairly closely with what
extension modules have to do.

>
> The problem is that we don't actually know yet whether the multi-core idea
> will work, so it seems like a bad time to double down on committing to
> subinterpreter support and pressuring C extensions to keep up. Eric- do you
> have a plan written down somewhere? I'm wondering what the critical path
> from here to a multi-core proof of concept looks like.

Probably the best summary is here:

  http://ericsnowcurrently.blogspot.com/2016/09/solving-mutli-core-python.html

The caveat is that doing this myself is slow-going due to persistent
lack of time. :/  So any timely solution would require effort from
more people.  I've had enough positive responses from folks at PyCon
that I think enough people would pitch in to get it done in a timely
manner.  More significantly, I genuinely believe that isolated
interpreters in the same process is a tool that many people will find
extremely useful and will help the Python community.  Consequently,
exposing subinterpreters in the stdlib would result in a stronger
incentive for folks to fix the known bugs and find a solution to the
challenges for extension modules.

-eric


Re: [Python-ideas] Exposing CPython's subinterpreter C-API in the stdlib.

2017-05-25 Thread Brett Cannon
On Thu, 25 May 2017 at 08:06 Nick Coghlan  wrote:

> On 25 May 2017 at 13:30, Guido van Rossum  wrote:
> > Hm... Curiously, I've heard a few people at PyCon mention they thought
> > subinterpreters were broken and not useful (and they share the GIL
> anyways)
> > and should be taken out.
>
> Taking them out entirely would break mod_wsgi (and hence the use of
> Apache httpd as a Python application server), so I hope we don't
> consider going down that path :)
>
> As far as the GIL goes, Eric has a few ideas around potentially
> getting to a tiered locking approach, where the GIL becomes a
> Read/Write lock shared across the interpreters, and there are separate
> subinterpreter locks to guard actual code execution. That becomes a
> lot more feasible in a subinterpreter model, since the eval loop and
> various other structures are already separate - the tiered locking
> would mainly need to account for management of "object ownership" that
> prevented multiple interpreters from accessing the same object at the
> same time.
>
> However, I do think subinterpreters can be accurately characterised as
> fragile, especially in the presence of extension modules. I also think
> a large part of that fragility can be laid at the feet of them
> currently being incredibly difficult to test - while _testembed
> includes a rudimentary check [1] to make sure the subinterpreter
> machinery itself basically works, it doesn't do anything in the way of
> checking that the rest of the standard library actually does the right
> thing when run in a subinterpreter.
>
> So I'm +1 for the idea of exposing a low-level CPython-specific
> _interpreters API in order to start building out a proper test suite
> for the capability, and to let folks interested in working with them
> do so without having to write a custom embedding application ala
> mod_wsgi.
>
> However, I think it's still far too soon to be talking about defining
> a public supported API for them - while their use in mod_wsgi gives us
> assurance that they do mostly work in CPython, other implementations
> don't necessarily have anything comparable (even as a private
> implementation detail), and the kinds of software that folks run
> directly under mod_wsgi isn't necessarily reflective of the full
> extent of variation in the kinds of code that Python developers write
> in general.
>

I'm +1 on Nick's idea of the low-level, private API existing first to
facilitate testing, but putting off any public API until we're sure we can
make it function in a way we're happy with to more generally expose.


Re: [Python-ideas] Exposing CPython's subinterpreter C-API in the stdlib.

2017-05-25 Thread Nathaniel Smith
On May 24, 2017 20:31, "Guido van Rossum"  wrote:

Hm... Curiously, I've heard a few people at PyCon mention they thought
subinterpreters were broken and not useful (and they share the GIL anyways)
and should be taken out. So we should at least have clarity on which
direction we want to take...


My impression is that the code to support them inside CPython is fine, but
they're broken and not very useful in the sense that lots of C extensions
don't really support them, so in practice you can't reliably use them to
run arbitrary code. Numpy for example definitely has lots of
subinterpreter-related bugs, and when they get reported we close them as
WONTFIX.

Based on conversations at last year's pycon, my impression is that numpy
probably *could* support subinterpreters (i.e. the required apis exist),
but none of us really understand the details, it's the kind of problem that
requires a careful whole-codebase audit, and a naive approach might make
numpy's code slower and more complicated for everyone. (For example, there
are lots of places where numpy keeps a little global cache that I guess
should instead be per-subinterpreter caches, which would mean adding an
extra lookup operation to fast paths.)

Or maybe it'd be fine, but no one is motivated to figure it out, because
the other side of the cost/benefit analysis is that almost nobody actually
uses subinterpreters. I think the only two projects that do are mod_wsgi
and jep [1].

So yeah, the status quo is broken. But there are two possible ways to fix
it: IMHO either subinterpreters should be removed *or* they should have
some compelling features added to make them actually worth the effort of
fixing c extensions to support them. If Eric can pull off this multi-core
idea then that would be pretty compelling :-). (And my impression is that
the things that break under subinterpreters are essentially the same as
would break under any GIL-removal plan.)

The problem is that we don't actually know yet whether the multi-core idea
will work, so it seems like a bad time to double down on committing to
subinterpreter support and pressuring C extensions to keep up. Eric- do you
have a plan written down somewhere? I'm wondering what the critical path
from here to a multi-core proof of concept looks like.

-n

[1] https://github.com/mrj0/jep/wiki/How-Jep-Works
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Exposing CPython's subinterpreter C-API in the stdlib.

2017-05-25 Thread Nick Coghlan
On 25 May 2017 at 13:30, Guido van Rossum  wrote:
> Hm... Curiously, I've heard a few people at PyCon mention they thought
> subinterpreters were broken and not useful (and they share the GIL anyways)
> and should be taken out.

Taking them out entirely would break mod_wsgi (and hence the use of
Apache httpd as a Python application server), so I hope we don't
consider going down that path :)

As far as the GIL goes, Eric has a few ideas around potentially
getting to a tiered locking approach, where the GIL becomes a
Read/Write lock shared across the interpreters, and there are separate
subinterpreter locks to guard actual code execution. That becomes a
lot more feasible in a subinterpreter model, since the eval loop and
various other structures are already separate - the tiered locking
would mainly need to account for management of "object ownership" that
prevented multiple interpreters from accessing the same object at the
same time.
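[As an illustration only, not the actual CPython plan: the tiered model
can be sketched with a minimal reader/writer lock, where each
subinterpreter's eval loop would take the read side and operations
touching shared runtime state would take the write side.]

```python
import threading

class ReadWriteLock:
    """Minimal RW lock: many concurrent readers OR one exclusive writer.

    In the sketched tiered model, "read" corresponds to ordinary code
    execution inside one subinterpreter, and "write" to touching state
    shared across interpreters.
    """

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_read(self):
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def acquire_write(self):
        with self._cond:
            while self._writer or self._readers:
                self._cond.wait()
            self._writer = True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()
```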

However, I do think subinterpreters can be accurately characterised as
fragile, especially in the presence of extension modules. I also think
a large part of that fragility can be laid at the feet of them
currently being incredibly difficult to test - while _testembed
includes a rudimentary check [1] to make sure the subinterpreter
machinery itself basically works, it doesn't do anything in the way of
checking that the rest of the standard library actually does the right
thing when run in a subinterpreter.

So I'm +1 for the idea of exposing a low-level CPython-specific
_interpreters API in order to start building out a proper test suite
for the capability, and to let folks interested in working with them
do so without having to write a custom embedding application a la
mod_wsgi.

However, I think it's still far too soon to be talking about defining
a public supported API for them - while their use in mod_wsgi gives us
assurance that they do mostly work in CPython, other implementations
don't necessarily have anything comparable (even as a private
implementation detail), and the kinds of software that folks run
directly under mod_wsgi isn't necessarily reflective of the full
extent of variation in the kinds of code that Python developers write
in general.

Cheers,
Nick.

[1] https://github.com/python/cpython/blob/master/Programs/_testembed.c#L41

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-ideas] Exposing CPython's subinterpreter C-API in the stdlib.

2017-05-24 Thread Guido van Rossum
Hm... Curiously, I've heard a few people at PyCon mention they thought
subinterpreters were broken and not useful (and they share the GIL anyways)
and should be taken out. So we should at least have clarity on which
direction we want to take...

On Wed, May 24, 2017 at 6:01 PM, Eric Snow 
wrote:

> Although I haven't been able to achieve the pace that I originally
> wanted, I have been able to work on my multi-core Python idea
> little-by-little.  Most notably, some of the blockers have been
> resolved at the recent PyCon sprints and I'm ready to move onto the
> next step: exposing multiple interpreters via a stdlib module.
>
> Initially I just want to expose basic support via 3 successive
> changes.  Below I've listed the corresponding (chained) PRs, along
> with what they add.  Note that the 2 proposed modules take some cues
> from the threading module, but don't try to be any sort of
> replacement.  Threading and subinterpreters are two different features
> that are used together rather than as alternatives to one another.
>
> At the very least I'd like to move forward with the _interpreters
> module sooner rather than later.  Doing so will facilitate more
> extensive testing of subinterpreters, in preparation for further use
> of them in the multi-core Python project.  We can iterate from there,
> but I'd at least like to get the basic functionality landed early.
> Any objections to (or feedback about) the low-level _interpreters
> module as described?  Likewise for the high-level interpreters module?
>
> Discussion on any expanded functionality for the modules or on the
> broader topic of the multi-core project are both welcome, but please
> start other threads for those topics.
>
> -eric
>
>
> basic low-level API: https://github.com/python/cpython/pull/1748
>
>   _interpreters.create() -> id
>   _interpreters.destroy(id)
>   _interpreters.run_string(id, code)
>   _interpreters.run_string_unrestricted(id, code, ns=None) -> ns
>
> extra low-level API: https://github.com/python/cpython/pull/1802
>
>   _interpreters.enumerate() -> [id, ...]
>   _interpreters.get_current() -> id
>   _interpreters.get_main() -> id
>   _interpreters.is_running(id) -> bool
>
> basic high-level API: https://github.com/python/cpython/pull/1803
>
>   interpreters.enumerate() -> [Interpreter, ...]
>   interpreters.get_current() -> Interpreter
>   interpreters.get_main() -> Interpreter
>   interpreters.create() -> Interpreter
>   interpreters.Interpreter(id)
>   interpreters.Interpreter.is_running()
>   interpreters.Interpreter.destroy()
>   interpreters.Interpreter.run(code)
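[To make the intended shape of that high-level API concrete, a toy
single-process mock; it only isolates a namespace via exec(), whereas
real subinterpreters would also isolate imported-module state:]

```python
# Toy mock of the proposed high-level interpreters module.  Purely
# illustrative: it mimics the listed surface, not real isolation.

_interps = {}
_next_id = 0

class Interpreter:
    def __init__(self, id):
        self.id = id
        self._ns = {}          # stands in for the interpreter's state
        self._running = False

    def is_running(self):
        return self._running

    def run(self, code):
        self._running = True
        try:
            exec(code, self._ns)
        finally:
            self._running = False

    def destroy(self):
        del _interps[self.id]

def create():
    global _next_id
    _next_id += 1
    _interps[_next_id] = Interpreter(_next_id)
    return _interps[_next_id]

def enumerate():  # shadows the builtin, mirroring the proposed name
    return list(_interps.values())
```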
>



-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-ideas] Exposing CPython's subinterpreter C-API in the stdlib.

2017-05-24 Thread Nathaniel Smith
CC'ing PyPy-dev...

On Wed, May 24, 2017 at 6:01 PM, Eric Snow  wrote:
> Although I haven't been able to achieve the pace that I originally
> wanted, I have been able to work on my multi-core Python idea
> little-by-little.  Most notably, some of the blockers have been
> resolved at the recent PyCon sprints and I'm ready to move onto the
> next step: exposing multiple interpreters via a stdlib module.
>
> Initially I just want to expose basic support via 3 successive
> changes.  Below I've listed the corresponding (chained) PRs, along
> with what they add.  Note that the 2 proposed modules take some cues
> from the threading module, but don't try to be any sort of
> replacement.  Threading and subinterpreters are two different features
> that are used together rather than as alternatives to one another.
>
> At the very least I'd like to move forward with the _interpreters
> module sooner rather than later.  Doing so will facilitate more
> extensive testing of subinterpreters, in preparation for further use
> of them in the multi-core Python project.  We can iterate from there,
> but I'd at least like to get the basic functionality landed early.
> Any objections to (or feedback about) the low-level _interpreters
> module as described?  Likewise for the high-level interpreters module?
>
> Discussion on any expanded functionality for the modules or on the
> broader topic of the multi-core project are both welcome, but please
> start other threads for those topics.
>
> -eric
>
>
> basic low-level API: https://github.com/python/cpython/pull/1748
>
>   _interpreters.create() -> id
>   _interpreters.destroy(id)
>   _interpreters.run_string(id, code)
>   _interpreters.run_string_unrestricted(id, code, ns=None) -> ns
>
> extra low-level API: https://github.com/python/cpython/pull/1802
>
>   _interpreters.enumerate() -> [id, ...]
>   _interpreters.get_current() -> id
>   _interpreters.get_main() -> id
>   _interpreters.is_running(id) -> bool
>
> basic high-level API: https://github.com/python/cpython/pull/1803
>
>   interpreters.enumerate() -> [Interpreter, ...]
>   interpreters.get_current() -> Interpreter
>   interpreters.get_main() -> Interpreter
>   interpreters.create() -> Interpreter
>   interpreters.Interpreter(id)
>   interpreters.Interpreter.is_running()
>   interpreters.Interpreter.destroy()
>   interpreters.Interpreter.run(code)



-- 
Nathaniel J. Smith -- https://vorpus.org