Re: [Python-Dev] PEP 553; the interaction between $PYTHONBREAKPOINT and -E

2017-10-05 Thread Victor Stinner
I concur with Antoine, please don't add a special case for -E. But it seems
like you already agreed with that :-)

Victor

On Oct 5, 2017 at 05:33, "Barry Warsaw" wrote:

> On Oct 4, 2017, at 21:52, Nick Coghlan  wrote:
> >
> >> Unfortunately we probably won’t really get a good answer in practice
> until Python 3.7 is released, so maybe I just choose one and document that
> the behavior of PYTHONBREAKPOINT under -E is provisional for now.  If that’s
> acceptable, then I would just treat -E for PYTHONBREAKPOINT the same as all
> other environment variables, and we’ll see how it goes.
> >
> > I'd be fine with this as the main reason I wanted PYTHONBREAKPOINT=0
> > was for pre-merge CI systems, and those tend to have tightly
> > controlled environment settings, so you don't need to rely on -E or -I
> > when running your tests.
> >
> > That said, it may also be worth considering a "-X nobreakpoints"
> > option (and then -I could imply "-E -s -X nobreakpoints").
>
> Thanks for the feedback Nick.  For now we’ll go with the standard behavior
> of -E and see how it goes.  We can always add a -X later.
>
> Cheers,
> -Barry


Re: [Python-Dev] PEP 553; the interaction between $PYTHONBREAKPOINT and -E

2017-10-05 Thread Serhiy Storchaka

On 04.10.17 21:06, Barry Warsaw wrote:

Victor brings up a good question in his review of the PEP 553 implementation.

https://github.com/python/cpython/pull/3355
https://bugs.python.org/issue31353

The question is whether $PYTHONBREAKPOINT should be ignored if -E is given?

I think it makes sense for $PYTHONBREAKPOINT to be sensitive to -E, but in 
thinking about it some more, it might make better sense for the semantics to be 
that when -E is given, we treat it like PYTHONBREAKPOINT=0, i.e. disable the 
breakpoint, rather than falling back to the `pdb.set_trace` default.

My thinking is this: -E is often used in production environments to prevent 
stray environment settings from affecting the Python process.  In those 
environments, you probably also want to prevent stray breakpoints from stopping 
the process, so it’s more helpful to disable breakpoint processing when -E is 
given rather than running pdb.set_trace().

If you have a strong opinion either way, please follow up here, on the PR, or 
on the bug tracker.


What if we make the default value depend on the debug level? In debug 
mode it would be "pdb.set_trace"; in optimized mode it would be "0". Then in 
production environments you could use -E -O to ignore environment settings 
and disable breakpoints.
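
A minimal Python sketch of Serhiy's suggestion (hypothetical; not what was
ultimately implemented for 3.7):

    import sys

    def _default_breakpoint_target():
        # Fall back to pdb.set_trace in normal mode, but behave as if
        # PYTHONBREAKPOINT=0 when running under -O.
        return '0' if sys.flags.optimize else 'pdb.set_trace'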




Re: [Python-Dev] PEP 554 v3 (new interpreters module)

2017-10-05 Thread Eric Snow
On Tue, Oct 3, 2017 at 8:55 AM, Antoine Pitrou  wrote:
> I think we need a sharing protocol, not just a flag.  We also need to
> think carefully about that protocol, so that it does not imply
> unnecessary memory copies.  Therefore I think the protocol should be
> something like the buffer protocol, which allows acquiring and releasing
> a set of shared memory areas, but without imposing any semantics onto
> those memory areas (each type implementing its own semantics).  And
> there needs to be a dedicated reference counting for object shares, so
> that the original object can be notified when all its shares have
> vanished.

I've come to agree. :)  I actually came to the same conclusion tonight
before I'd been able to read through your message carefully.  My idea
is below.  Your suggestion about protecting shared memory areas is
something to discuss further, though I'm not sure it's strictly
necessary yet (before we stop sharing the GIL).

On Wed, Oct 4, 2017 at 7:41 PM, Nick Coghlan  wrote:
> Having the sending interpreter do the INCREF just changes the problem
> to be a memory leak waiting to happen rather than an access-after-free
> issue, since the problematic non-synchronised scenario then becomes:
>
> * thread on CPU A has two references (ob_refcnt=2)
> * it sends a reference to a thread on CPU B via a channel
> * thread on CPU A releases its reference (ob_refcnt=1)
> * updated ob_refcnt value hasn't made it back to the shared memory cache yet
> * thread on CPU B releases its reference (ob_refcnt=1)
> * both threads have released their reference, but the refcnt is still
> 1 -> object leaks!
>
> We simply can't have INCREFs and DECREFs happening in different
> threads without some way of ensuring cache coherency for *both*
> operations - otherwise we risk either the refcount going to zero when
> it shouldn't, or *not* going to zero when it should.
>
> The current CPython implementation relies on the process global GIL
> for that purpose, so none of these problems will show up until you
> start trying to replace that with per-interpreter locks.
>
> Free threaded reference counting relies on (expensive) atomic
> increments & decrements.

Right.  I'm not sure why I was missing that, but I'm clear now.

Below is a rough idea of what I think may work instead (the result of
much tossing and turning in bed*).

While we're still sharing a GIL between interpreters:

Channel.send(obj):  # in interp A
    incref(obj)
    if type(obj).tp_share == NULL:
        raise ValueError("not a shareable type")
    ch.objects.append(obj)

Channel.recv():  # in interp B
    orig = ch.objects.pop(0)
    obj = orig.tp_share()
    return obj

bytes.tp_share():
    return self

After we move to not sharing the GIL between interpreters:

Channel.send(obj):  # in interp A
    incref(obj)
    if type(obj).tp_share == NULL:
        raise ValueError("not a shareable type")
    set_owner(obj)  # obj.owner or add an obj -> interp entry to global table
    ch.objects.append(obj)

Channel.recv():  # in interp B
    orig = ch.objects.pop(0)
    obj = orig.tp_share()
    set_shared(obj, orig)  # add to a global table
    return obj

bytes.tp_share():
    obj = blank_bytes(len(self))
    obj.ob_sval = self.ob_sval  # hand-wavy memory sharing
    return obj

bytes.tp_free():  # under no-shared-GIL:
    # most of this could be pulled into a macro for re-use
    orig = lookup_shared(self)
    if orig != NULL:
        current = release_LIL()
        interp = lookup_owner(orig)
        acquire_LIL(interp)
        decref(orig)
        release_LIL(interp)
        acquire_LIL(current)
        # clear shared/owner tables
    # clear/release self.ob_sval
    free(self)

The CIV (cross-interpreter view) approach could be facilitated through
something like a new SharedBuffer type, or through a separate
BufferViewChannel, etc.

Most notably, this approach avoids hard-coding specific type support
into channels and should work out fine under no-shared-GIL
subinterpreters.  One nice thing about the tp_share slot is that it
makes it much easier (along with C-API for managing the global
owned/shared tables) to implement other types that are legal to pass
through channels.  Such types could be provided via extension modules.
Numpy arrays could be made to support it, if that's your thing.
Antoine could give tp_share to locks and semaphores. :)  Of course,
any such types would have to ensure that they are actually safe to
share between interpreters without a GIL between them...

For PEP 554, I'd only propose the tp_share slot and its use in
Channel.send()/.recv().  The parts related to global tables and memory
sharing and tp_free() wouldn't be necessary until we stop sharing the
GIL between interpreters.  However, I believe that tp_share would make
us ready for that.
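
For a sense of how this might look from Python, here is a hedged usage
sketch against the draft PEP 554 API (the "interpreters" module does not
exist yet, and names such as create_channel() follow the draft and may
change):

    import interpreters  # proposed by PEP 554; not a real module yet

    recv, send = interpreters.create_channel()
    send.send(b'spam')   # bytes would be shareable via its tp_share slot
    data = recv.recv()   # in another interpreter: a bytes object again,
                         # backed by the same underlying memory
    assert data == b'spam'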

-eric


* I should know by now that some ideas sound better in the middle of
the night than they do the next day, but this idea is keeping me awake
so I'll risk it! :)

Re: [Python-Dev] PEP 554 v3 (new interpreters module)

2017-10-05 Thread Nick Coghlan
On 5 October 2017 at 18:45, Eric Snow  wrote:

> After we move to not sharing the GIL between interpreters:
>
> Channel.send(obj):  # in interp A
>     incref(obj)
>     if type(obj).tp_share == NULL:
>         raise ValueError("not a shareable type")
>     set_owner(obj)  # obj.owner or add an obj -> interp entry to global table
>     ch.objects.append(obj)
>
> Channel.recv():  # in interp B
>     orig = ch.objects.pop(0)
>     obj = orig.tp_share()
>     set_shared(obj, orig)  # add to a global table
>     return obj
>

This would be hard to get to work reliably, because "orig.tp_share()" would
be running in the receiving interpreter, but all the attributes of "orig"
would have been allocated by the sending interpreter. It gets more reliable
if it's *Channel.send* that calls tp_share(), but moving the call to
the sending side makes it clear that a tp_share protocol would still need
to rely on a more primitive set of "shareable objects" that were the
permitted return values from the tp_share call.

And that's the real pay-off that comes from defining this in terms of the
memoryview protocol: Py_buffer structs *aren't* Python objects, so it's
only a regular C struct that gets passed across the interpreter boundary
(the reference to the original objects gets carried along passively as part
of the CIV - it never gets *used* in the receiving interpreter).


> bytes.tp_share():
>     obj = blank_bytes(len(self))
>     obj.ob_sval = self.ob_sval  # hand-wavy memory sharing
>     return obj
>

This is effectively reinventing memoryview, while trying to pretend it's an
ordinary bytes object. Don't reinvent memoryview :)


> bytes.tp_free():  # under no-shared-GIL:
>     # most of this could be pulled into a macro for re-use
>     orig = lookup_shared(self)
>     if orig != NULL:
>         current = release_LIL()
>         interp = lookup_owner(orig)
>         acquire_LIL(interp)
>         decref(orig)
>         release_LIL(interp)
>         acquire_LIL(current)
>         # clear shared/owner tables
>     # clear/release self.ob_sval
>     free(self)
>

I don't think we should be touching the behaviour of core builtins solely
to enable message passing to subinterpreters without a shared GIL.

The simplest possible variant of CIVs that I can think of would be able to
avoid that outcome by being a memoryview subclass, since they just need to
hold the extra reference to the original interpreter, and include some
logic to switch interpreters at the appropriate time.

That said, I think there's definitely a useful design question to ask in
this area, not about bytes (which can be readily represented by a
memoryview variant in the receiving interpreter), but about *strings*: they
have a more complex internal layout than bytes objects, but as long as the
receiving interpreter can make sure that the original string continues to
exist, then you could usefully implement a "strview" type to avoid having
to go through an encode/decode cycle just to pass a string to another
subinterpreter.

That would provide a reasonably compelling argument that CIVs *shouldn't*
be implemented as memoryview subclasses, but instead defined as
*containing* a managed view of an object owned by a different interpreter.

That way, even if the initial implementation only supported CIVs that
contained a memoryview instance, we'd have the freedom to define other
kinds of views later (such as strview), while being able to reuse the same
CIV machinery.
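
A rough Python rendering of that "containing a managed view" design (all
names here are invented for illustration; the real machinery would live in
C):

    class CrossInterpreterView:
        # Holds a view of an object owned by another interpreter, plus
        # what is needed to release it back in the owning interpreter.
        def __init__(self, view, owner_interp):
            self._view = view           # a memoryview today, maybe a
                                        # strview later
            self._owner = owner_interp  # interpreter owning the source

        def __getattr__(self, name):
            # Delegate read-only access to the wrapped view.
            return getattr(self._view, name)

        def release(self):
            # A real implementation would switch to the owning
            # interpreter here before dropping the last reference.
            self._view.release()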

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] PEP 553; the interaction between $PYTHONBREAKPOINT and -E

2017-10-05 Thread Victor Stinner
> What if we make the default value depend on the debug level? In debug mode
> it would be "pdb.set_trace"; in optimized mode it would be "0". Then in
> production environments you could use -E -O to ignore environment settings
> and disable breakpoints.

I don't know what the best option is, but I dislike adding two
options, PYTHONBREAKPOINT and -X nobreakpoint, for the same feature.
It would become complicated to know which option has priority.

I would prefer a generic "release mode" option. In the past, I
proposed the opposite: a "developer mode":

https://mail.python.org/pipermail/python-ideas/2016-March/039314.html

"python3 -X dev" would be an "alias" to "PYTHONMALLOC=debug python3.6
-Wd -bb -X faulthandler script.py".

Python has more and more options to enable debug checks at runtime, and
it's hard to be aware of all of them. My intent is to run tests in
"developer mode": if tests pass, you can be sure that they will pass in
the regular mode, since the developer mode only enables more checks at
runtime; it shouldn't change the behaviour.

It seems like the consensus leans more towards running Python in "release mode" by
default, since it was decided to hide DeprecationWarning by default. I
understood that the default mode targets end users.

Victor


Re: [Python-Dev] Intention to accept PEP 552 soon (deterministic pyc files)

2017-10-05 Thread Barry Warsaw
On Oct 4, 2017, at 13:53, Benjamin Peterson  wrote:

> It might be helpful to enumerate the usecases for such an API. Perhaps a
> narrow, specialized API could satisfy most needs in a supportable way.

Currently `python -m dis thing.py` compiles the source then disassembles it.  
It would be kind of cool if you could pass a .pyc file to -m dis, in which case 
you’d need to unpack the header to get to the code object.  A naive 
implementation would unpack the magic number and refuse to disassemble any 
files that don’t match whatever that version of Python understands.  A more 
robust (possibly 3rd party) implementation could potentially disassemble a 
range of magic numbers and formats, and an API to get at the code object and 
metadata would help.

I was thinking about the bytecode hacking that some debuggers do.  This API 
would help them support multiple versions of Python.  They could use the API to 
discover what pyc format was in use, extract the code object, hack the bytecode 
and possibly rewrite a new PEP 3147 style pyc file with the debugger bytecodes 
inserted.

Third party bytecode optimizers could use the API to unpack multiple versions 
of pyc files, do their optimizations, and rewrite new files with the proper 
format.
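
For instance, a naive unpacker along these lines might look like the
following sketch, assuming the 3.7-era PEP 552 layout (a 16-byte header:
magic number, a flags word, then two fields holding either the source
mtime and size or a source hash, depending on the flags):

    import importlib.util
    import marshal
    import struct

    def load_code(pyc_path):
        # Naive: only accepts pyc files written by this very interpreter.
        with open(pyc_path, 'rb') as f:
            magic, flags, f1, f2 = struct.unpack('<4sIII', f.read(16))
            if magic != importlib.util.MAGIC_NUMBER:
                raise ValueError('pyc magic does not match this Python')
            return marshal.loads(f.read())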

Cheers,
-Barry





Re: [Python-Dev] Intention to accept PEP 552 soon (deterministic pyc files)

2017-10-05 Thread Guido van Rossum
Honestly I think the API for accessing historic pyc headers should itself
also be 3rd party. CPython itself should not bother (backwards
compatibility with pyc files has never been a feature).

On Thu, Oct 5, 2017 at 6:44 AM, Barry Warsaw  wrote:

> On Oct 4, 2017, at 13:53, Benjamin Peterson  wrote:
>
> > It might be helpful to enumerate the usecases for such an API. Perhaps a
> > narrow, specialized API could satisfy most needs in a supportable way.
>
> Currently `python -m dis thing.py` compiles the source then disassembles
> it.  It would be kind of cool if you could pass a .pyc file to -m dis, in
> which case you’d need to unpack the header to get to the code object.  A
> naive implementation would unpack the magic number and refuse to
> disassemble any files that don’t match whatever that version of Python
> understands.  A more robust (possibly 3rd party) implementation could
> potentially disassemble a range of magic numbers and formats, and an API to
> get at the code object and metadata would help.
>
> I was thinking about the bytecode hacking that some debuggers do.  This
> API would help them support multiple versions of Python.  They could use
> the API to discover what pyc format was in use, extract the code object,
> hack the bytecode and possibly rewrite a new PEP 3147 style pyc file with
> the debugger bytecodes inserted.
>
> Third party bytecode optimizers could use the API to unpack multiple
> versions of pyc files, do their optimizations, and rewrite new files with
> the proper format.
>
> Cheers,
> -Barry


-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] Reorganize Python categories (Core, Library, ...)?

2017-10-05 Thread Giampaolo Rodola'
On Wed, Oct 4, 2017 at 11:52 AM, Victor Stinner 
wrote:

> Hi,
>
> Python uses a few categories to group bugs (on bugs.python.org) and
> NEWS entries (in the Python changelog). List used by the blurb tool:
>
> #.. section: Security
> #.. section: Core and Builtins
> #.. section: Library
> #.. section: Documentation
> #.. section: Tests
> #.. section: Build
> #.. section: Windows
> #.. section: macOS
> #.. section: IDLE
> #.. section: Tools/Demos
> #.. section: C API
>
> My problem is that almost all changes go into "Library" category. When
> I read long changelogs, it's sometimes hard to identify quickly the
> context (ex: impacted modules) of a change.
>
> It's also hard to find open bugs of a specific module on
> bugs.python.org, since almost all bugs are in the very generic
> "Library" category. Using full text returns "false positives".
>
> I would prefer to see more specific categories like:
>
> * Buildbots: only issues specific to buildbots
> * Networking: socket, asyncio, asyncore, asynchat modules
> * Security: ssl module but also vulnerabilities in any other part of
> CPython -- we already added a Security category in NEWS/blurb
> * Parallelism: multiprocessing and concurrent.futures modules
>
> It's hard to find categories generic enough to not only contain a
> single item, but not contain too many items either. Other ideas:
>
> * XML: xml.dom, xml.etree, xml.parsers, xml.sax modules
> * Import machinery: imp and importlib modules
> * Typing: abc and typing modules
>
> The best would be to have a mapping of a module name into a category,
> and make sure that all modules have a category. We might try to count
> the number of commits and NEWS entries of the last 12 months to decide
> if a category has the correct size.
>
> I don't think that we need a distinct category for each module. We can
> put many uncommon modules in a generic category.
>
> By the way, maybe we also need a new "module name" field in the bug
> tracker. But then comes the question of normalizing module names. For
> example, should "email.message" be normalized to "email"? Maybe store
> "email.message" but use "email" for search, display the module in the
> issue title, etc.
>
> Victor
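
As a concrete sketch of the mapping idea above (the module names and
category assignments below are purely illustrative):

    CATEGORY_BY_MODULE = {
        'socket': 'Networking',
        'asyncio': 'Networking',
        'ssl': 'Security',
        'multiprocessing': 'Parallelism',
        'concurrent.futures': 'Parallelism',
        'xml.etree': 'XML',
    }

    def category_for(module_name):
        # Try the full dotted name first, then normalize
        # "email.message" -> "email", then fall back to the generic
        # bucket for uncommon modules.
        top = module_name.split('.')[0]
        return (CATEGORY_BY_MODULE.get(module_name)
                or CATEGORY_BY_MODULE.get(top, 'Library'))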


Personally I've always dreamed about having *all* module names. That would
reflect the experts.rst file:
https://github.com/python/devguide/blob/master/experts.rst



-- 
Giampaolo - http://grodola.blogspot.com


Re: [Python-Dev] Inheritance vs composition in backcompat (PEP521)

2017-10-05 Thread Koos Zevenhoven
On Tue, Oct 3, 2017 at 1:11 AM, Koos Zevenhoven  wrote:

> On Oct 3, 2017 01:00, "Guido van Rossum"  wrote:
>
> On Mon, Oct 2, 2017 at 2:52 PM, Koos Zevenhoven wrote:
>
> I don't mind this (or Nathaniel ;-) being academic. The backwards
>> incompatibility issue I've just described applies to any extension via
>> composition, if the underlying type/protocol grows new members (like the CM
>> protocol would have gained __suspend__ and __resume__ in PEP521).
>>
>
> Since you seem to have a good grasp on this issue, does PEP 550 suffer
> from the same problem? (Or PEP 555, for that matter? :-)
>
>
>
> Neither has this particular issue, because they don't extend an existing
> protocol. If this thread has any significance, it will most likely be
> elsewhere.
>

Actually, I realize I should be more precise with terminology regarding
"extending an existing protocol"/"growing new members". Below, I'm still
using PEP 521 as an example (sorry).

In fact, in some sense, "adding" __suspend__ and __resume__ to context
managers *does not* extend the context manager protocol, even though it
kind of looks like it does.

There would instead be two separate protocols:

(A) The traditional PEP 343 context manager:
__enter__
__exit__

(B) The hypothetical PEP 521 context manager:
__enter__
__suspend__
__resume__
__exit__

Protocols A and B are incompatible in both directions:

* It is generally not safe to use a type-A context manager assuming it
implements B.

* It is generally not safe to use a type-B context manager assuming it
implements A.

But if you now have a type-B object, it looks like it's also type-A,
especially for code that is not aware of the existence of B. This is where
the problems come from: a wrapper for type A does the wrong thing when
wrapping a type-B object (except when using inheritance).
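
A small sketch of that failure mode, using the hypothetical PEP 521 method
names:

    class LoggingCM:
        # A wrapper written against protocol A (PEP 343 only).
        def __init__(self, inner):
            self._inner = inner

        def __enter__(self):
            print('entering')
            return self._inner.__enter__()

        def __exit__(self, *exc):
            print('exiting')
            return self._inner.__exit__(*exc)

    # If `inner` is a type-B manager, the wrapper exposes no __suspend__
    # or __resume__, so PEP 521-aware code looking for those methods on
    # the wrapper silently loses the inner object's suspend/resume
    # behaviour.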


[Side note:

Another interpretation of the situation is that, instead of adding protocol
B, A is removed and is replaced with:

(C) The hypothetical PEP 521 context manager with optional members:
__enter__
__suspend__ (optional)
__resume__  (optional)
__exit__

But now the same problems just come from the fact that A no longer exists
while there is code out there that assumes A. But this is only a useful
interpretation if you are the only user of the protocol or if it's
otherwise ok to remove A. So let's go back to the A-B interpretation.]


Q: Could the problem of protocol conflict be solved?

One way to tell A and B apart would be to always explicitly mark the
protocol with a base class. Obviously this is not the case with existing
uses of context managers.

But there's another way, which is to change the naming:

(A) The traditional PEP 343 context manager:
__enter__
__exit__

(Z) The *modified* hypothetical PEP 521 context manager:
__begin__
__suspend__
__resume__
__end__

Now, A and Z are easy to tell apart. A context manager wrapper designed for
type A immediately fails if used to wrap a type-Z object. But of course the
whole context manager concept now suddenly became a lot more complicated.


It is interesting that, in the A-B scheme, making a general context manager
wrapper using inheritance *just works*, even if A is not a subprotocol of B
and B is not a subprotocol of A.

Anyway, a lot of this is amplified by the fact that the methods of the
context manager protocols are not independent functionality. Instead,
calling one of them leads to the requirement that the other methods are
also called at the right moments.

--Koos

-- 
+ Koos Zevenhoven + http://twitter.com/k7hoven +


Re: [Python-Dev] PEP 553; the interaction between $PYTHONBREAKPOINT and -E

2017-10-05 Thread Barry Warsaw
> I don't know what the best option is, but I dislike adding two
> options, PYTHONBREAKPOINT and -X nobreakpoint, for the same feature.
> It would become complicated to know which option has priority.

Just to close the loop, I’ve landed the PEP 553 PR treating PYTHONBREAKPOINT 
the same as all other environment variables when -E is present.  Let’s see how 
that goes.
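
For reference, the landed semantics amount to roughly this lookup (a
pure-Python sketch of sys.breakpointhook; the real hook is implemented in
C and warns rather than failing when the named hook can't be imported):

    import importlib
    import os
    import pdb
    import sys

    def breakpointhook(*args, **kwargs):
        # Under -E/-I the environment variable is simply not consulted.
        hookname = (None if sys.flags.ignore_environment
                    else os.environ.get('PYTHONBREAKPOINT'))
        if hookname == '0':
            return None                       # breakpoints disabled
        if not hookname:
            return pdb.set_trace(*args, **kwargs)
        modname, _, funcname = hookname.rpartition('.')
        module = importlib.import_module(modname or 'builtins')
        return getattr(module, funcname)(*args, **kwargs)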

Thanks all for the great feedback and reviews.  Now I’m thinking about putting 
a backport version on PyPI. :)

Cheers,
-Barry





Re: [Python-Dev] PEP 554 v3 (new interpreters module)

2017-10-05 Thread Eric Snow
On Thu, Oct 5, 2017 at 4:57 AM, Nick Coghlan  wrote:
> This would be hard to get to work reliably, because "orig.tp_share()" would
> be running in the receiving interpreter, but all the attributes of "orig"
> would have been allocated by the sending interpreter. It gets more reliable
> if it's *Channel.send* that calls tp_share() though, but moving the call to
> the sending side makes it clear that a tp_share protocol would still need to
> rely on a more primitive set of "shareable objects" that were the permitted
> return values from the tp_share call.

The point of running tp_share() in the receiving interpreter is to
force allocation under that interpreter, so that GC applies there.  I
agree that you basically can't do anything in tp_share() that would
affect the sending interpreter, including INCREF and DECREF.  Since we
INCREFed in send(), we know that we have a safe reference, so we
don't have to worry about that part in tp_share().  We would only be
able to do low-level things (like the buffer protocol) that don't
interact with the original object's interpreter.

Given that this is a quite low-level tp slot and low-level
functionality, I'd expect that a sufficiently clear entry (i.e.
warning) in the docs would be enough for the few that dare.

From my perspective adding the tp_share slot allows for much more
experimentation with object sharing (right now, long before we get to
considering how to stop sharing the GIL) by us *and* third parties.
None of the alternatives seem to offer the same opportunity while
still working out *after* we stop sharing the GIL.

>
> And that's the real pay-off that comes from defining this in terms of the
> memoryview protocol: Py_buffer structs *aren't* Python objects, so it's only
> a regular C struct that gets passed across the interpreter boundary (the
> reference to the original objects gets carried along passively as part of
> the CIV - it never gets *used* in the receiving interpreter).

Yeah, the (PEP 3118) buffer protocol offers precedent in a number of
ways that are applicable to channels here.  I'm simply reticent to
lock PEP 554 into such a specific solution as the buffer-specific CIV.
I'm trying to accommodate anticipated future needs while keeping the
PEP as simple and basic as possible.  It's driving me nuts! :P  Things
were *much* simpler before I added Channels to the PEP. :)

>
>>
>> bytes.tp_share():
>>     obj = blank_bytes(len(self))
>>     obj.ob_sval = self.ob_sval  # hand-wavy memory sharing
>>     return obj
>
>
> This is effectively reinventing memoryview, while trying to pretend it's an
> ordinary bytes object. Don't reinvent memoryview :)
>
>>
>> bytes.tp_free():  # under no-shared-GIL:
>>     # most of this could be pulled into a macro for re-use
>>     orig = lookup_shared(self)
>>     if orig != NULL:
>>         current = release_LIL()
>>         interp = lookup_owner(orig)
>>         acquire_LIL(interp)
>>         decref(orig)
>>         release_LIL(interp)
>>         acquire_LIL(current)
>>         # clear shared/owner tables
>>     # clear/release self.ob_sval
>>     free(self)
>
>
> I don't think we should be touching the behaviour of core builtins solely to
> enable message passing to subinterpreters without a shared GIL.

Keep in mind that I included the above as a possible solution using
tp_share() that would work *after* we stop sharing the GIL.  My point
is that with tp_share() we have a solution that works now *and* will
work later.  I don't care how we use tp_share to do so. :)  I long to
be able to say in the PEP that you can pass bytes through the channel
and get bytes on the other side.

That said, I'm not sure how this could be made to work without
involving tp_free().  If that is really off the table (even in the
simplest possible ways) then I don't think there is a way to actually
share objects of builtin types between interpreters other than through
views like CIV.  We could still support tp_share() for the sake of
third parties, which would facilitate that simplicity I was aiming for
in sending data between interpreters, as well as leaving the door open
for nearly all the same experimentation.  However, I expect that most
*uses* of channels will involve builtin types, particularly as we
start off, so having to rely on view types for builtins would add
not-insignificant awkwardness to using channels.

I'd still like to avoid that if possible, so let's not rush to
completely close the door on small modifications to tp_free for
builtins. :)  Regardless, I still (after a night's rest and a day of
not thinking about it) consider tp_share() to be the solution I'd been
hoping we'd find, whether or not we can apply it to builtin types.

>
> The simplest possible variant of CIVs that I can think of would be able to
> avoid that outcome by being a memoryview subclass, since they just need to
> hold the extra reference to the original interpreter, and include some logic
> to swtich interpreters at the appropriate time.

[Python-Dev] how/where is open() implemented ?

2017-10-05 Thread Yubin Ruan
Hi,
I am looking for the implementation of open() in the source, but so far I
have not been able to find it.

From my observation, the implementation of open() in python2/3 does
not employ the open(2) system call. However without open(2) how can
one possibly obtain a file descriptor?

Yubin


Re: [Python-Dev] Intention to accept PEP 552 soon (deterministic pyc files)

2017-10-05 Thread Nick Coghlan
On 5 October 2017 at 23:44, Barry Warsaw  wrote:

> On Oct 4, 2017, at 13:53, Benjamin Peterson  wrote:
>
> > It might be helpful to enumerate the usecases for such an API. Perhaps a
> > narrow, specialized API could satisfy most needs in a supportable way.
>
> Currently `python -m dis thing.py` compiles the source then disassembles
> it.  It would be kind of cool if you could pass a .pyc file to -m dis, in
> which case you’d need to unpack the header to get to the code object.  A
> naive implementation would unpack the magic number and refuse to
> disassemble any files that don’t match whatever that version of Python
> understands.  A more robust (possibly 3rd party) implementation could
> potentially disassemble a range of magic numbers and formats, and an API to
> get at the code object and metadata would help.
>
> I was thinking about the bytecode hacking that some debuggers do.  This
> API would help them support multiple versions of Python.  They could use
> the API to discover what pyc format was in use, extract the code object,
> hack the bytecode and possibly rewrite a new PEP 3147 style pyc file with
> the debugger bytecodes inserted.
>
> Third party bytecode optimizers could use the API to unpack multiple
> versions of pyc files, do their optimizations, and rewrite new files with
> the proper format.
>

Actually doing that properly also requires keeping track of which opcodes
were valid in different versions of the eval loop, so as Guido suggests,
such an abstraction layer would make the most sense as a third party
project that tracked:

- the magic number for each CPython feature release (plus the 3.5.3+
anomaly)
- the pyc header format for each CPython feature release
- the valid opcode set for each CPython feature release
- any other version dependent variations (e.g. the expected stack layout
for BUILD_MAP changed in Python 3.5, when the evaluation order for dict
displays was updated to be key then value, rather than the other way around)
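
The standard library already exposes the running interpreter's value, which
is the natural starting point for such a table:

    import importlib.util
    import struct

    # Each CPython feature release bumps this; a cross-version tool would
    # map release -> magic number, header layout, and valid opcode set.
    magic_word = struct.unpack('<H', importlib.util.MAGIC_NUMBER[:2])[0]
    print('this interpreter writes pyc magic', magic_word)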

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Reorganize Python categories (Core, Library, ...)?

2017-10-05 Thread Nick Coghlan
On 6 October 2017 at 06:35, Giampaolo Rodola'  wrote:

> On Wed, Oct 4, 2017 at 11:52 AM, Victor Stinner 
> wrote:
>
>> By the way, we need maybe also a new "module name" field in the bug
>> tracker. But then comes the question of normalizing module names. For
>> example, should "email.message" be normalized to "email"? Maybe store
>> "email.message" but use "email" for search, display the module in the
>> issue title, etc.
>>
>> Victor
>
>
> Personally I've always dreamed about having *all* module names. That would
> reflect experts.rst file:
> https://github.com/python/devguide/blob/master/experts.rst
>

Right. One UX note though, based on similarly long lists in the Bugzilla
component fields for Fedora and RHEL: list boxes don't scale well to really
long lists of items, so such a field would ideally be based on a combo-box
with typeahead support. (We have something like that already for the nosy
list, where the typeahead support checks for Experts Index entries)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] how/where is open() implemented ?

2017-10-05 Thread Jelle Zijlstra
2017-10-05 19:19 GMT-07:00 Yubin Ruan :

> Hi,
> I am looking for the implementation of open() in the src, but so far I
> am not able to do this.
>
In Python 3, builtins.open is the same as io.open, which is implemented in
the _io_open function in Modules/_io/_iomodule.c.


> From my observation, the implementation of open() in python2/3 does
> not employ the open(2) system call. However without open(2) how can
> one possibly obtain a file descriptor?
>
There is a call to open() (the C function) in _io_FileIO___init___impl in
Modules/_io/fileio.c. I haven't traced through all the code, but I suspect
builtins.open ends up calling that.
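
A quick way to check both claims from Python (CPython 3.x):

    import builtins
    import io
    import tempfile

    print(builtins.open is io.open)    # True: the builtin *is* io.open

    with tempfile.NamedTemporaryFile() as tmp:
        f = open(tmp.name, 'rb')       # FileIO.__init__ makes the
                                       # open(2) call underneath
        print(f.fileno())              # the resulting file descriptor
        f.close()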



Re: [Python-Dev] how/where is open() implemented ?

2017-10-05 Thread MRAB

On 2017-10-06 03:19, Yubin Ruan wrote:

> Hi,
> I am looking for the implementation of open() in the source, but so far I
> have not been able to find it.
>
> From my observation, the implementation of open() in python2/3 does
> not employ the open(2) system call. However without open(2) how can
> one possibly obtain a file descriptor?

I think it's somewhere in here:

https://github.com/python/cpython/blob/master/Modules/_io/fileio.c


Re: [Python-Dev] PEP 554 v3 (new interpreters module)

2017-10-05 Thread Nick Coghlan
On 6 October 2017 at 11:48, Eric Snow  wrote:

> > And that's the real pay-off that comes from defining this in terms of the
> > memoryview protocol: Py_buffer structs *aren't* Python objects, so it's
> only
> > a regular C struct that gets passed across the interpreter boundary (the
> > reference to the original objects gets carried along passively as part of
> > the CIV - it never gets *used* in the receiving interpreter).
>
> Yeah, the (PEP 3118) buffer protocol offers precedent in a number of
> ways that are applicable to channels here.  I'm simply reticent to
> lock PEP 554 into such a specific solution as the buffer-specific CIV.
> I'm trying to accommodate anticipated future needs while keeping the
> PEP as simple and basic as possible.  It's driving me nuts! :P  Things
> were *much* simpler before I added Channels to the PEP. :)
>

Starting with memory-sharing only doesn't lock us into anything, since you
can still add a more flexible kind of channel based on a different protocol
later if it turns out that memory sharing isn't enough.

By contrast, if you make the initial channel semantics incompatible with
multiprocessing by design, you *will* prevent anyone from experimenting
with replicating the shared memory based channel API for communicating
between processes :)

That said, if you'd prefer to keep the "Channel" name available for the
possible introduction of object channels at a later date, you could call
the initial memoryview based channel a "MemChannel".


> > I don't think we should be touching the behaviour of core builtins
> solely to
> > enable message passing to subinterpreters without a shared GIL.
>
> Keep in mind that I included the above as a possible solution using
> tp_share() that would work *after* we stop sharing the GIL.  My point
> is that with tp_share() we have a solution that works now *and* will
> work later.  I don't care how we use tp_share to do so. :)  I long to
> be able to say in the PEP that you can pass bytes through the channel
> and get bytes on the other side.
>

Memory views are a builtin type as well, and they emphasise the practical
benefit we're trying to get relative to typical multiprocessing
arrangements: zero-copy data sharing.

So here's my proposed experimentation-enabling development strategy:

1. Start out with a MemChannel API, that accepts any buffer-exporting
object as input, and outputs only a cross-interpreter memoryview subclass
2. Use that as the basis for the work to get to a per-interpreter locking
arrangement that allows subinterpreters to fully exploit multiple CPUs
3. Only then try to design a Channel API that allows for sharing builtin
immutable objects between interpreters (bytes, strings, numbers), at a time
when you can be certain you won't be inadvertently making it harder to make
the GIL a truly per-interpreter lock, rather than the current process
global runtime lock.
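
Sketching step 1 in rough Python terms (the MemChannel name comes from the
suggestion above; the in-process list stands in for the real
cross-interpreter machinery, which would be implemented in C):

    class MemChannel:
        # Accepts any buffer exporter; hands out only views, so no Python
        # object crosses the interpreter boundary by value.
        def __init__(self):
            self._queue = []

        def send(self, obj):
            # Zero-copy: take a view of obj's buffer rather than copying.
            self._queue.append(memoryview(obj))

        def recv(self):
            # The real thing would return a cross-interpreter memoryview
            # subclass that keeps the sending object alive.
            return self._queue.pop(0)

    ch = MemChannel()
    ch.send(b'spam')
    assert bytes(ch.recv()) == b'spam'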

The key benefit of this approach is that we *know* MemChannel can work: the
buffer protocol already operates at the level of C structs and pointers,
not Python objects, and there are already plenty of interesting
buffer-protocol-supporting objects around, so as long as the CIV switches
interpreters at the right time, there aren't any fundamentally new runtime
level capabilities needed to implement it.

The lower level MemChannel API could then also be replicated for
multiprocessing, while the higher level more speculative object-based
Channel API would be specific to subinterpreters (and probably only ever
designed and implemented if you first succeed in making subinterpreters
sufficiently independent that they don't rely on a process-wide GIL any
more).

So I'm not saying "Never design an object-sharing protocol specifically for
use with subinterpreters". I'm saying "You don't have a demonstrated need
for that yet, so don't try to define it until you do".



> My mind is drawn to the comparison between that and the question of
> CIV vs. tp_share().  CIV would be more like the post-451 import world,
> where I expect the CIV would take care of the data sharing operations.
> That said, the situation in PEP 554 is sufficiently different that I'm
> not convinced a generic CIV protocol would be better.  I'm not sure
> how much CIV could do for you over helpers+tp_share.
>
> Anyway, here are the leading approaches that I'm looking at now:
>
> * adding a tp_share slot
>   + you send() the object directly and recv() the object coming out of
>     tp_share() (which will probably be the same type as the original)
>   + this would eventually require small changes in tp_free for
>     participating types
>   + we would likely provide helpers (eventually), similar to the new
>     buffer protocol, to make it easier to manage sharing data
>

I'm skeptical about this approach because you'll be designing in a vacuum
against future possible constraints that you can't test yet: the inherent
complexity in the object sharing protocol will come from *not* having a
process-wide GIL,