Re: [Python-Dev] PEP 554 v3 (new interpreters module)

2017-10-03 Thread Nick Coghlan
On 3 October 2017 at 11:31, Eric Snow  wrote:
> There shouldn't be a need to synchronize on INCREF.  If both
> interpreters have at least 1 reference then either one adding a
> reference shouldn't be a problem.  If only one interpreter has a
> reference then the other won't be adding any references.  If neither
> has a reference then neither is going to add any references.  Perhaps
> I've missed something.  Under what circumstances would INCREF happen
> while the refcount is 0?

The problem relates to the fact that there aren't any memory barriers
around CPython's INCREF operations (they're implemented as an ordinary
C post-increment operation), so you can get the following scenario:

* thread on CPU A has the sole reference (ob_refcnt=1)
* thread on CPU B acquires a new reference, but hasn't pushed the
updated ob_refcnt value back to the shared memory cache yet
* original thread on CPU A drops its reference, *thinks* the refcnt is
now zero, and deletes the object
* bad things now happen in CPU B as the thread running there tries to
use a deleted object :)
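The interleaving above can be modeled in a few lines of Python (a toy illustration only; the real INCREF is a plain C post-increment, not Python code):

```python
# Toy model of the lost-update race: each INCREF/DECREF is a separate
# load and store, interleaved the way two CPUs without memory barriers
# might interleave them.
class Obj:
    def __init__(self):
        self.refcnt = 1        # thread on CPU A holds the sole reference

obj = Obj()
deleted = False

b_loaded = obj.refcnt          # CPU B starts an INCREF: loads 1...
a_loaded = obj.refcnt          # ...meanwhile CPU A runs a full DECREF:
obj.refcnt = a_loaded - 1      # A stores 0,
if obj.refcnt == 0:
    deleted = True             # thinks the count hit zero, frees the object
obj.refcnt = b_loaded + 1      # B's stale store lands: refcnt says 2
print(deleted, obj.refcnt)     # True 2 -- CPU B now uses freed memory
```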

The GIL currently protects us from this, as switching CPUs requires
switching threads, which means the original thread has to release the
GIL (flushing all of its state changes to the shared cache), and the
new thread has to acquire it (hence refreshing its local cache from
the shared one).

The need to switch all incref/decref operations over to using atomic
thread-safe primitives when removing the GIL is one of the main
reasons that attempting to remove the GIL *within* an interpreter is
expensive (and why Larry et al are having to explore completely
different ref count management strategies for the GILectomy).

By contrast, if you rely on a new memoryview variant to mediate all
data sharing between interpreters, then you can make sure that *it* is
using synchronisation primitives as needed to ensure the required
cache coherency across different CPUs, without any negative impacts on
regular single interpreter code (which can still rely on the cache
coherency guarantees provided by the GIL).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Inheritance vs composition in backcompat (PEP521)

2017-10-03 Thread Nick Coghlan
On 3 October 2017 at 03:13, Koos Zevenhoven  wrote:
> Well, it's not completely unrelated to that. The problem I'm talking about
> is perhaps most easily seen from a simple context manager wrapper that uses
> composition instead of inheritance:
>
> class Wrapper:
>     def __init__(self):
>         self._wrapped = SomeContextManager()
>
>     def __enter__(self):
>         print("Entering context")
>         return self._wrapped.__enter__()
>
>     def __exit__(self, exc_type, exc_val, exc_tb):
>         self._wrapped.__exit__(exc_type, exc_val, exc_tb)
>         print("Exited context")
>
>
> Now, if the wrapped contextmanager becomes a PEP 521 one with __suspend__
> and __resume__, the Wrapper class is broken, because it does not respect
> __suspend__ and __resume__. So actually this is a backwards compatiblity
> issue.
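One way such a wrapper could opt in explicitly, shown as a sketch that assumes the `__suspend__`/`__resume__` hook names from the PEP 521 draft:

```python
class Wrapper:
    """Composition-based wrapper that also forwards the proposed
    PEP 521 hooks when the wrapped manager defines them."""
    def __init__(self, cm):
        self._wrapped = cm

    def __enter__(self):
        return self._wrapped.__enter__()

    def __exit__(self, exc_type, exc_val, exc_tb):
        return self._wrapped.__exit__(exc_type, exc_val, exc_tb)

    def __suspend__(self):
        # Delegate only if the wrapped manager knows the new protocol.
        suspend = getattr(self._wrapped, '__suspend__', None)
        return suspend() if suspend is not None else None

    def __resume__(self):
        resume = getattr(self._wrapped, '__resume__', None)
        return resume() if resume is not None else None
```

The catch, of course, is that every wrapper author has to know about every new protocol, which is exactly the backwards compatibility problem being described.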

This is a known problem, and one of the main reasons that having a
truly transparent object proxy like
https://wrapt.readthedocs.io/en/latest/wrappers.html#object-proxy as
part of the standard library would be highly desirable.

Actually getting such a proxy defined, implemented, and integrated
isn't going to be easy though, so while Graham (Dumpleton, the author
of wrapt) is generally amenable to the idea, he doesn't have the time
or inclination to do that work himself.

In the meantime, we mostly work around the problem by defining new
protocols rather than extending existing ones, but it still means it
takes longer than it otherwise would for full support for new
interfaces to ripple out through various object proxying libraries
(especially for hard-to-proxy protocols like the new asynchronous ones
that require particular methods to be defined as coroutines).
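Part of what makes a truly transparent proxy hard is that implicit special-method lookup bypasses instance `__getattr__`, so a naive delegating proxy silently drops any protocol it doesn't explicitly implement. A minimal illustration:

```python
class Proxy:
    """Naive delegating proxy: ordinary attribute access is forwarded,
    but implicit special-method lookup bypasses __getattr__ entirely."""
    def __init__(self, wrapped):
        self._wrapped = wrapped

    def __getattr__(self, name):
        return getattr(self._wrapped, name)

p = Proxy([1, 2, 3])
print(p.count(2))        # ordinary lookup is forwarded -> 1
try:
    len(p)               # but len() looks for __len__ on type(p)
except TypeError:
    print("len() bypasses __getattr__")
```

This is why wrapt's ObjectProxy has to predefine every special method a wrapped object might need, and why each new protocol requires a proxy update.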

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Python startup optimization: script vs. service

2017-10-03 Thread Nick Coghlan
On 3 October 2017 at 03:02, Christian Heimes  wrote:
> On 2017-10-02 16:59, Barry Warsaw wrote:
>> On Oct 2, 2017, at 10:48, Christian Heimes  wrote:
>>>
>>> That approach could work, but I think that it is the wrong approach. I'd
>>> rather keep Python optimized for long-running processes and introduce a
>>> new mode / option to optimize for short-running scripts.
>>
>> What would that look like, how would it be invoked, and how would that 
>> change the behavior of the interpreter?
>
> I haven't given it much thought yet. Here are just some wild ideas:
>
> - add '-l' command line option (l for lazy)
> - in lazy mode, delay some slow operations (re compile, enum, ...)
> - delay some imports in lazy mode, e.g. with a deferred import proxy
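As an illustration of the "deferred import proxy" idea, a minimal sketch (the stdlib's `importlib.util.LazyLoader` covers similar ground; `LazyModule` is a hypothetical name):

```python
import importlib

class LazyModule:
    """Import proxy that defers the real import until first
    attribute access."""
    def __init__(self, name):
        self.__dict__['_name'] = name
        self.__dict__['_module'] = None

    def __getattr__(self, attr):
        module = self.__dict__['_module']
        if module is None:
            # First access: perform the real import now.
            module = importlib.import_module(self.__dict__['_name'])
            self.__dict__['_module'] = module
        return getattr(module, attr)

json = LazyModule('json')      # nothing imported yet
print(json.dumps([1, 2]))      # the import happens here -> [1, 2]
```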

I don't think this is the right way to structure the defaults, since the
web services world is in the middle of moving back closer to the
CLI/CGI model, where a platform like AWS Lambda will take care of
spinning up language interpreter instances on demand, using them to
process a single request, and then discarding them.

It's also somewhat unreliable to pass command line options as part of
shebang lines, and packaging tools need to be able to generate shebang
lines that are compatible with a wide variety of Python
implementations.

By contrast, long running Python services will typically be using some
form of WSGI server (whether that's mod_wsgi, uWSGI, gunicorn,
Twisted, tornado, or something else) that can choose to adjust *their*
defaults to force the underlying language runtime into an "eager state
initialisation" mode, even if the default setting is to respect
requests for lazy initialisation.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] [RELEASE] Python 3.6.3 is now available

2017-10-03 Thread Victor Stinner
Hi,

Good news: Python 3.6.3 has no more known security vulnerabilities ;-)

Python 3.6.3 fixes two security vulnerabilities:

"urllib FTP protocol stream injection"
https://python-security.readthedocs.io/vuln/urllib_ftp_protocol_stream_injection.html

"Expat 2.2.3" (doesn't impact Linux, since Linux distros use the system
expat library)
https://python-security.readthedocs.io/vuln/expat_2.2.3.html

Note: I'm not sure that the vulnerabilities fixed in Expat 2.2.2 and
Expat 2.2.3 really impacted Python, since Python uses its own entropy
source to set the "hash secret", but well, it's usually safer to use a
more recent library version :-)

Victor

2017-10-03 22:06 GMT+02:00 Ned Deily :
> On behalf of the Python development community and the Python 3.6
> release team, I am happy to announce the availability of Python 3.6.3,
> the third maintenance release of Python 3.6.  Detailed information
> about the changes made in 3.6.3 can be found in the change log here:
>
> https://docs.python.org/3.6/whatsnew/changelog.html#python-3-6-3-final
>
> Please see "What’s New In Python 3.6" for more information about the
> new features in Python 3.6:
>
> https://docs.python.org/3.6/whatsnew/3.6.html
>
> You can download Python 3.6.3 here:
>
> https://www.python.org/downloads/release/python-363/
>
> The next maintenance release of Python 3.6 is expected to follow in
> about 3 months, around the end of 2017-12.  More information about the
> 3.6 release schedule can be found here:
>
> https://www.python.org/dev/peps/pep-0494/
>
> Enjoy!
>
> --
>   Ned Deily
>   n...@python.org -- []
>


[Python-Dev] [RELEASE] Python 3.6.3 is now available

2017-10-03 Thread Ned Deily
On behalf of the Python development community and the Python 3.6
release team, I am happy to announce the availability of Python 3.6.3,
the third maintenance release of Python 3.6.  Detailed information
about the changes made in 3.6.3 can be found in the change log here:

https://docs.python.org/3.6/whatsnew/changelog.html#python-3-6-3-final

Please see "What’s New In Python 3.6" for more information about the
new features in Python 3.6:

https://docs.python.org/3.6/whatsnew/3.6.html

You can download Python 3.6.3 here:

https://www.python.org/downloads/release/python-363/

The next maintenance release of Python 3.6 is expected to follow in
about 3 months, around the end of 2017-12.  More information about the
3.6 release schedule can be found here:

https://www.python.org/dev/peps/pep-0494/

Enjoy!

--
  Ned Deily
  n...@python.org -- []



Re: [Python-Dev] Intention to accept PEP 552 soon (deterministic pyc files)

2017-10-03 Thread Benjamin Peterson


On Tue, Oct 3, 2017, at 08:03, Barry Warsaw wrote:
> Guido van Rossum wrote:
> > There have been no further comments. PEP 552 is now accepted.
> > 
> > Congrats, Benjamin! Go ahead and send your implementation for review. Oops.
> > Let me try that again.
> 
> While I'm very glad PEP 552 has been accepted, it occurs to me that it
> will now be more difficult to parse the various pyc file formats from
> Python.  E.g. I used to be able to just open the pyc in binary mode,
> read all the bytes, and then lop off the first 8 bytes to get to the
> code object.  With the addition of the source file size, I now have to
> (maybe, if I have to also read old-style pyc files) lop off the front 12
> bytes, but okay.
> 
> With PEP 552, I have to do a lot more work to just get at the code
> object.  How many bytes at the front of the file do I need to skip past?
>  What about all the metadata at the front of the pyc, how do I interpret
> that if I want to get at it from Python code?

As Guido points out, the header is now always 4 32-bit words rather
than 3. Not long ago we underwent the transition from 2 to 3 words
without widespread disaster.

> 
> Should the PEP 552 implementation add an API, probably to
> importlib.util, that would understand all current and future formats?
> Something like this perhaps?
> 
> class PycFileSpec:
>     magic_number: bytes
>     timestamp: Optional[bytes]  # maybe an int? datetime?
>     source_size: Optional[bytes]
>     bit_field: Optional[bytes]
>     code_object: bytes
>
> def parse_pyc(path: str) -> PycFileSpec:

I'm not sure turning the implementation details of our internal formats
into APIs is the way to go.


Re: [Python-Dev] PEP 554 v3 (new interpreters module)

2017-10-03 Thread Steve Dower

On 03Oct2017 0755, Antoine Pitrou wrote:

On Tue, 3 Oct 2017 08:36:55 -0600
Eric Snow  wrote:

On Tue, Oct 3, 2017 at 5:00 AM, Antoine Pitrou  wrote:

On Mon, 2 Oct 2017 22:15:01 -0400
Eric Snow  wrote:


I'm still not convinced that sharing synchronization primitives is
important enough to be worth including it in the PEP.  It can be added
later, or via an extension module in the meantime.  To that end, I'll
add a mechanism to the PEP for third-party types to indicate that they
can be passed through channels.  Something like
"obj.__channel_support__ = True".


How would that work?  If it's simply a matter of flipping a bit, why
don't we do it for all objects?


The type would also have to be safe to share between interpreters. :)


But what does it mean to be safe to share, while the exact degree
and nature of the isolation between interpreters (and also their
concurrent execution) is unspecified?

I think we need a sharing protocol, not just a flag.


The easiest such protocol is essentially:

* an object can represent itself as bytes (e.g. generate a bytes object 
representing some global token, such as a kernel handle or memory address)

* those bytes are sent over the standard channel
* the object can instantiate itself from those bytes (e.g. wrap the 
existing handle, create a memoryview over the same block of memory, etc.)
* cross-interpreter refcounting is either ignored (because the kernel is 
refcounting the resource) or manual (by including more shared info in 
the token)
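A minimal sketch of that bytes-token protocol, with hypothetical `__channel_reduce__`/`__channel_restore__` names and a plain list standing in for the channel:

```python
import struct

class SharedCounter:
    """Toy resource identified by a global token (here an index into a
    registry that stands in for a kernel handle table)."""
    _registry = {}

    def __init__(self, value=0, _token=None):
        if _token is None:                      # fresh resource
            _token = len(SharedCounter._registry)
            SharedCounter._registry[_token] = [value]
        self._token = _token
        self._cell = SharedCounter._registry[_token]

    def __channel_reduce__(self):
        # Represent ourselves as bytes: just the 64-bit token.
        return struct.pack('<Q', self._token)

    @classmethod
    def __channel_restore__(cls, data):
        # Reinstantiate on the receiving side by wrapping the token.
        (token,) = struct.unpack('<Q', data)
        return cls(_token=token)

channel = []                                    # stand-in bytes channel
a = SharedCounter(41)
channel.append(a.__channel_reduce__())          # "interpreter 1" sends
b = SharedCounter.__channel_restore__(channel.pop())  # "interpreter 2"
b._cell[0] += 1
print(a._cell[0])   # 42 -- both wrappers refer to the same resource
```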


Since this is trivial to implement over the basic bytes channel, and 
doesn't even require a standard protocol except for convenience, Eric 
decided to avoid blocking the core functionality on this. I'm inclined 
to agree - get the basic functionality supported and let people build on 
it before we try to lock down something we don't fully understand yet.


About the only thing that seems to be worth doing up-front is some sort 
of pending-call callback mechanism between interpreters, but even that 
doesn't need to block the core functionality (you can do it trivially 
with threads and another channel right now, and there's always room to 
make something more efficient later).


There are plenty of smart people out there who can and will figure out 
the best way to design this. By giving them the tools and the ability to 
design something awesome, we're more likely to get something awesome 
than by committing to a complete design now. Right now, they're all 
blocked on the fact that subinterpreters are incredibly hard to start 
running, let alone experiment with. Eric's PEP will fix that part and 
enable others to take it from building blocks to powerful libraries.


Cheers,
Steve


Re: [Python-Dev] Intention to accept PEP 552 soon (deterministic pyc files)

2017-10-03 Thread Serhiy Storchaka

26.09.17 23:47, Guido van Rossum пише:
I've read the current version of PEP 552 over and I think everything 
looks good for acceptance. I believe there are no outstanding objections 
(or they have been adequately addressed in responses).


Therefore I intend to accept PEP 552 this Friday, unless grave 
objections are raised on this mailing list (python-dev).


Congratulations Benjamin. Gotta love those tristate options!


While PEP 552 is accepted, I would want to see some changes.

1. Increase the size of the constant part of the signature to at least 
32 bits. Currently only the third and fourth bytes are constant, and they 
are '\r\n', which often occurs in text files. The first two bytes 
can be different in every Python version. This makes it hard for 
utilities like file(1) to detect pyc files.


2. Split the "version" of pyc files by "major" and "minor" parts. Every 
major version is incompatible with other major versions, the interpreter 
accepts only one particular major version. It can't be changed in a 
bugfix release. But all minor versions inside the same major version are 
forward and backward compatible. The interpreter should be able to 
execute a pyc file with an arbitrary minor version, and it can use the 
minor version of a pyc file to handle errors in older versions. The minor 
version can be changed in a bugfix release. I hope this can help us with 
issues like https://bugs.python.org/issue29537. Currently 3.5 supports 
two magic numbers.


If we change the pyc format, it would be easy to make the above changes.



Re: [Python-Dev] Intention to accept PEP 552 soon (deterministic pyc files)

2017-10-03 Thread Serhiy Storchaka

03.10.17 18:15, Guido van Rossum пише:
It's really not that hard. You just check the magic number and if it's 
the new one, skip 4 words. No need to understand the internals of the 
header.


Hence you should know all old magic numbers to determine if the read 
magic number is the new one. Right?




Re: [Python-Dev] Intention to accept PEP 552 soon (deterministic pyc files)

2017-10-03 Thread Guido van Rossum
I'm fine with adding an API, though I don't think that an API that knows
about all current (historic) and future formats belongs in importlib.util
-- that module only concerns itself with the *current* format.

In terms of the API design, I'd make it take an IO[bytes] and just read
and parse the header, so after that you can use marshal.load() straight
from the file object. File size, mtime and bitfield should be represented
as ints (the parser should take care of endianness). The hash should be bytes.
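A sketch along those lines for the new PEP 552 format (`parse_pyc_header` is a hypothetical name; it assumes the new 16-byte header and does not recognize the older 8- or 12-byte ones):

```python
import struct
from types import SimpleNamespace

def parse_pyc_header(f):
    """Parse a PEP 552 pyc header from a binary file object.

    After this returns, marshal.load(f) yields the code object.
    Bit 0 of the flags word marks a hash-based pyc (PEP 552).
    """
    magic = f.read(4)
    flags, = struct.unpack('<I', f.read(4))
    if flags & 0x01:                      # hash-based pyc
        return SimpleNamespace(magic=magic, flags=flags,
                               source_hash=f.read(8),
                               mtime=None, source_size=None)
    # timestamp-based pyc: mtime + source size, both little-endian
    mtime, size = struct.unpack('<II', f.read(8))
    return SimpleNamespace(magic=magic, flags=flags, source_hash=None,
                           mtime=mtime, source_size=size)
```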

On Tue, Oct 3, 2017 at 8:24 AM, Antoine Pitrou  wrote:

> On Tue, 3 Oct 2017 08:15:04 -0700
> Guido van Rossum  wrote:
> > It's really not that hard. You just check the magic number and if it's
> the
> > new one, skip 4 words. No need to understand the internals of the header.
>
> Still, I agree with Barry that an API would be nice.
>
> Regards
>
> Antoine.
>
> >
> > On Oct 3, 2017 08:06, "Barry Warsaw"  wrote:
> >
> > > Guido van Rossum wrote:
> > > > There have been no further comments. PEP 552 is now accepted.
> > > >
> > > > Congrats, Benjamin! Go ahead and send your implementation for
> > > review.Oops.
> > > > Let me try that again.
> > >
> > > While I'm very glad PEP 552 has been accepted, it occurs to me that it
> > > will now be more difficult to parse the various pyc file formats from
> > > Python.  E.g. I used to be able to just open the pyc in binary mode,
> > > read all the bytes, and then lop off the first 8 bytes to get to the
> > > code object.  With the addition of the source file size, I now have to
> > > (maybe, if I have to also read old-style pyc files) lop off the front
> 12
> > > bytes, but okay.
> > >
> > > With PEP 552, I have to do a lot more work to just get at the code
> > > object.  How many bytes at the front of the file do I need to skip
> past?
> > >  What about all the metadata at the front of the pyc, how do I
> interpret
> > > that if I want to get at it from Python code?
> > >
> > > Should the PEP 552 implementation add an API, probably to
> > > importlib.util, that would understand all current and future formats?
> > > Something like this perhaps?
> > >
> > > class PycFileSpec:
> > >     magic_number: bytes
> > >     timestamp: Optional[bytes]  # maybe an int? datetime?
> > >     source_size: Optional[bytes]
> > >     bit_field: Optional[bytes]
> > >     code_object: bytes
> > >
> > > def parse_pyc(path: str) -> PycFileSpec:
> > >
> > > Cheers,
> > > -Barry
> > >



-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] Make re.compile faster

2017-10-03 Thread Serhiy Storchaka

03.10.17 17:21, Barry Warsaw пише:

What if the compiler could recognize constant arguments to re.compile() and do 
the regex compilation at that point?  You’d need a way to represent the 
precompiled regex in the bytecode, and it would technically be a semantic 
change since regex problems would be discovered at compilation time instead of 
runtime - but that might be a good thing.  You could also make that an 
optimization flag for opt-in, or a flag to allow opt out.


The representation of the compiled regex is an implementation detail. It 
isn't even exposed once the regex is compiled. And it changes faster than 
the bytecode and marshal formats; it can change even in a bugfix release.


For implementing this idea we need:

1. Invent a universal portable regex bytecode. It shouldn't contain 
flaws or limitations, and should support all features of Unicode regexes 
and possible extensions. It should also anticipate future Unicode changes 
and be able to encode them.


2. Add support of regex objects in marshal format.

3. Implement an advanced AST optimizer.

4. Rewrite the regex compiler in C or make the AST optimizer able to 
execute Python code.


I think we are far away from this. Any of the above problems is much 
larger, and could give a larger benefit, than shaving several 
microseconds off startup.


Forget about this. Let's first get rid of the GIL!



Re: [Python-Dev] Intention to accept PEP 552 soon (deterministic pyc files)

2017-10-03 Thread Antoine Pitrou
On Tue, 3 Oct 2017 08:15:04 -0700
Guido van Rossum  wrote:
> It's really not that hard. You just check the magic number and if it's the
> new one, skip 4 words. No need to understand the internals of the header.

Still, I agree with Barry that an API would be nice.

Regards

Antoine.

> 
> On Oct 3, 2017 08:06, "Barry Warsaw"  wrote:
> 
> > Guido van Rossum wrote:  
> > > There have been no further comments. PEP 552 is now accepted.
> > >
> > > Congrats, Benjamin! Go ahead and send your implementation for  
> > review.Oops.  
> > > Let me try that again.  
> >
> > While I'm very glad PEP 552 has been accepted, it occurs to me that it
> > will now be more difficult to parse the various pyc file formats from
> > Python.  E.g. I used to be able to just open the pyc in binary mode,
> > read all the bytes, and then lop off the first 8 bytes to get to the
> > code object.  With the addition of the source file size, I now have to
> > (maybe, if I have to also read old-style pyc files) lop off the front 12
> > bytes, but okay.
> >
> > With PEP 552, I have to do a lot more work to just get at the code
> > object.  How many bytes at the front of the file do I need to skip past?
> >  What about all the metadata at the front of the pyc, how do I interpret
> > that if I want to get at it from Python code?
> >
> > Should the PEP 552 implementation add an API, probably to
> > importlib.util, that would understand all current and future formats?
> > Something like this perhaps?
> >
> > class PycFileSpec:
> >     magic_number: bytes
> >     timestamp: Optional[bytes]  # maybe an int? datetime?
> >     source_size: Optional[bytes]
> >     bit_field: Optional[bytes]
> >     code_object: bytes
> >
> > def parse_pyc(path: str) -> PycFileSpec:
> >
> > Cheers,
> > -Barry
> >





Re: [Python-Dev] Intention to accept PEP 552 soon (deterministic pyc files)

2017-10-03 Thread Guido van Rossum
It's really not that hard. You just check the magic number and if it's the
new one, skip 4 words. No need to understand the internals of the header.

On Oct 3, 2017 08:06, "Barry Warsaw"  wrote:

> Guido van Rossum wrote:
> > There have been no further comments. PEP 552 is now accepted.
> >
> > Congrats, Benjamin! Go ahead and send your implementation for
> review.Oops.
> > Let me try that again.
>
> While I'm very glad PEP 552 has been accepted, it occurs to me that it
> will now be more difficult to parse the various pyc file formats from
> Python.  E.g. I used to be able to just open the pyc in binary mode,
> read all the bytes, and then lop off the first 8 bytes to get to the
> code object.  With the addition of the source file size, I now have to
> (maybe, if I have to also read old-style pyc files) lop off the front 12
> bytes, but okay.
>
> With PEP 552, I have to do a lot more work to just get at the code
> object.  How many bytes at the front of the file do I need to skip past?
>  What about all the metadata at the front of the pyc, how do I interpret
> that if I want to get at it from Python code?
>
> Should the PEP 552 implementation add an API, probably to
> importlib.util, that would understand all current and future formats?
> Something like this perhaps?
>
> class PycFileSpec:
>     magic_number: bytes
>     timestamp: Optional[bytes]  # maybe an int? datetime?
>     source_size: Optional[bytes]
>     bit_field: Optional[bytes]
>     code_object: bytes
>
> def parse_pyc(path: str) -> PycFileSpec:
>
> Cheers,
> -Barry
>


Re: [Python-Dev] Make re.compile faster

2017-10-03 Thread Stefan Behnel
INADA Naoki schrieb am 03.10.2017 um 05:29:
> Before deferring re.compile, can we make it faster?

I tried cythonizing both sre_compile.py and sre_parse.py, which gave me a
speedup of a bit more than 2x. There is definitely space left for further
improvements since I didn't know much about the code, and also didn't dig
very deeply. I used this benchmark to get uncached patterns:

[re_compile("[a-z]{%d}[0-9]+[0-9a-z]*[%d-9]" % (i, i%8))
 for i in range(2)]

Time for Python master version:
2.14 seconds
Time for Cython compiled version:
1.05 seconds

I used the latest Cython master for it, as I had to make a couple of type
inference improvements for bytearray objects along the way.

Cython's master branch is here:
https://github.com/cython/cython

My CPython changes are here:
https://github.com/scoder/cpython/compare/master...scoder:cythonized_sre_compile

They are mostly just external type declarations and a tiny type inference
helper fix. I could have used the more maintainable PEP-484 annotations for
local variables right in the .py files, but AFAIK, those are still not
wanted in the standard library. And they also won't suffice for switching
to extension types in sre_parse.

Together with the integer flag changes, that could give a pretty noticeable
improvement overall.

Stefan



Re: [Python-Dev] Intention to accept PEP 552 soon (deterministic pyc files)

2017-10-03 Thread Barry Warsaw
Guido van Rossum wrote:
> There have been no further comments. PEP 552 is now accepted.
> 
> Congrats, Benjamin! Go ahead and send your implementation for review. Oops.
> Let me try that again.

While I'm very glad PEP 552 has been accepted, it occurs to me that it
will now be more difficult to parse the various pyc file formats from
Python.  E.g. I used to be able to just open the pyc in binary mode,
read all the bytes, and then lop off the first 8 bytes to get to the
code object.  With the addition of the source file size, I now have to
(maybe, if I have to also read old-style pyc files) lop off the front 12
bytes, but okay.

With PEP 552, I have to do a lot more work to just get at the code
object.  How many bytes at the front of the file do I need to skip past?
 What about all the metadata at the front of the pyc, how do I interpret
that if I want to get at it from Python code?

Should the PEP 552 implementation add an API, probably to
importlib.util, that would understand all current and future formats?
Something like this perhaps?

class PycFileSpec:
    magic_number: bytes
    timestamp: Optional[bytes]  # maybe an int? datetime?
    source_size: Optional[bytes]
    bit_field: Optional[bytes]
    code_object: bytes

def parse_pyc(path: str) -> PycFileSpec:

Cheers,
-Barry



Re: [Python-Dev] Investigating time for `import requests`

2017-10-03 Thread Stéfane Fermigier
Hi,

On Mon, Oct 2, 2017 at 11:42 AM, Raymond Hettinger <
raymond.hettin...@gmail.com> wrote:
>
>
> I don't expect to find anything that would help users of Django, Flask,
> and Bottle since those are typically long-running apps where we value
> response time more than startup time.
>

Actually, as web developers, we also value startup time in development
mode, especially in "hot reload" mode (when the app restarts
automatically each time we save a development file).

In my mid-sized projects (~10k LOC, ~150 pip dependencies) it takes
between 5 and 10s. This is probably the upper limit to "stay in flow".

Same for unit tests.

There is this famous Gary Bernhardt talk [https://youtu.be/RAxiiRPHS9k?t=12m]
in which he argues that a whole unit test suite should be able to run in
< 1s, and he actually shows examples where the developer runs hundreds of
tests in less than 1s.

Note: In my projects, it takes 3-4 seconds just to collect the tests
(using pytest --collect-only), but I suspect Python's startup time is only
responsible for a small part of this delay. Still, this is an important
point to keep in mind.

  S.

-- 
Stefane Fermigier - http://fermigier.com/ - http://twitter.com/sfermigier -
http://linkedin.com/in/sfermigier
Founder & CEO, Abilian - Enterprise Social Software -
http://www.abilian.com/
Chairman, Free&OSS Group / Systematic Cluster -
http://www.gt-logiciel-libre.org/
Co-Chairman, National Council for Free & Open Source Software (CNLL) -
http://cnll.fr/
Founder & Organiser, PyData Paris - http://pydata.fr/
---
“You never change things by fighting the existing reality. To change
something, build a new model that makes the existing model obsolete.” — R.
Buckminster Fuller


Re: [Python-Dev] PEP 554 v3 (new interpreters module)

2017-10-03 Thread Antoine Pitrou
On Tue, 3 Oct 2017 08:36:55 -0600
Eric Snow  wrote:
> On Tue, Oct 3, 2017 at 5:00 AM, Antoine Pitrou  wrote:
> > On Mon, 2 Oct 2017 22:15:01 -0400
> > Eric Snow  wrote:  
> >>
> >> I'm still not convinced that sharing synchronization primitives is
> >> important enough to be worth including it in the PEP.  It can be added
> >> later, or via an extension module in the meantime.  To that end, I'll
> >> add a mechanism to the PEP for third-party types to indicate that they
> >> can be passed through channels.  Something like
> >> "obj.__channel_support__ = True".  
> >
> > How would that work?  If it's simply a matter of flipping a bit, why
> > don't we do it for all objects?  
> 
> The type would also have to be safe to share between interpreters. :)

But what does it mean to be safe to share, when the exact degree and
nature of the isolation between interpreters (and of their concurrent
execution) is unspecified?

I think we need a sharing protocol, not just a flag.  We also need to
think carefully about that protocol, so that it does not imply
unnecessary memory copies.  I therefore think the protocol should be
something like the buffer protocol: one that allows acquiring and
releasing a set of shared memory areas, but without imposing any
semantics on those memory areas (each type implementing its own
semantics).  And there needs to be dedicated reference counting for
object shares, so that the original object can be notified when all of
its shares have vanished.
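
A minimal pure-Python sketch of what such a protocol could look like.
The method names (acquire_share, release_share) are invented, and
memoryview stands in for the zero-copy shared memory area:

```python
class Shareable:
    """Toy exporter: hands out zero-copy shares of its memory and is
    notified when the last share vanishes (invented protocol names)."""

    def __init__(self, data):
        self._data = data
        self._shares = 0          # dedicated count for object shares
        self.all_released = False

    def acquire_share(self):
        # Analogous to PyObject_GetBuffer: export a view, bump the count.
        self._shares += 1
        return memoryview(self._data)

    def release_share(self, view):
        # Analogous to PyBuffer_Release: drop the view, decrement the count.
        view.release()
        self._shares -= 1
        if self._shares == 0:
            self.all_released = True  # original object is notified here

obj = Shareable(b"hello")
view = obj.acquire_share()
print(view.tobytes())      # the "other interpreter" reads without a copy
obj.release_share(view)
print(obj.all_released)
```

The point of the separate share count is exactly what the message asks
for: the exporter learns when the last share disappears, independently
of ordinary refcounting.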

Regards

Antoine.


Re: [Python-Dev] PEP 554 v3 (new interpreters module)

2017-10-03 Thread Eric Snow
On Tue, Oct 3, 2017 at 5:00 AM, Antoine Pitrou  wrote:
> On Mon, 2 Oct 2017 22:15:01 -0400
> Eric Snow  wrote:
>>
>> I'm still not convinced that sharing synchronization primitives is
>> important enough to be worth including it in the PEP.  It can be added
>> later, or via an extension module in the meantime.  To that end, I'll
>> add a mechanism to the PEP for third-party types to indicate that they
>> can be passed through channels.  Something like
>> "obj.__channel_support__ = True".
>
> How would that work?  If it's simply a matter of flipping a bit, why
> don't we do it for all objects?

The type would also have to be safe to share between interpreters. :)
Eventually I'd like to make that work for all immutable objects (and
immutable containers thereof), but until then each type must be
adapted individually.  The PEP starts off with just Bytes.
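
The proposed interpreters module isn't available to experiment with yet,
but the bytes-only restriction can be sketched with an ordinary queue.
The class and method names here are illustrative, not the PEP's actual
API:

```python
import queue

class BytesChannel:
    """Toy stand-in for a PEP 554 channel that only accepts bytes."""

    def __init__(self):
        self._q = queue.Queue()

    def send(self, obj):
        # Only types known to be safe to share are allowed through;
        # the PEP starts with just bytes.
        if not isinstance(obj, bytes):
            raise TypeError(f"cannot share {type(obj).__name__} yet")
        self._q.put(obj)

    def recv(self):
        return self._q.get()

ch = BytesChannel()
ch.send(b"payload")
print(ch.recv())  # b'payload'
try:
    ch.send("not bytes")  # str is not (yet) shareable
except TypeError as exc:
    print(exc)
```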

-eric


Re: [Python-Dev] Make re.compile faster

2017-10-03 Thread Antoine Pitrou
On Tue, 3 Oct 2017 10:21:55 -0400
Barry Warsaw  wrote:
> On Oct 3, 2017, at 01:41, Serhiy Storchaka  wrote:
> > 
> > 03.10.17 06:29, INADA Naoki wrote:  
> >> More optimization can be done with implementing sre_parse and sre_compile 
> >> in C.
> >> But I have no time for it in this year.  
> > 
> > And please don't do this! This would make maintaining the re module hard. 
> > The performance of the compiler is less important than correctness and 
> > performance of matching and searching.  
> 
> What if the compiler could recognize constant arguments to re.compile() and 
> do the regex compilation at that point?  You’d need a way to represent the 
> precompiled regex in the bytecode, and it would technically be a semantic 
> change since regex problems would be discovered at compilation time instead 
> of runtime - but that might be a good thing.  You could also make that an 
> optimization flag for opt-in, or a flag to allow opt out.

We need a regex literal!
With bytes, formatted, and bytes formatted variants.

Regards

Antoine.




Re: [Python-Dev] Make re.compile faster

2017-10-03 Thread Barry Warsaw
On Oct 3, 2017, at 01:41, Serhiy Storchaka  wrote:
> 
> 03.10.17 06:29, INADA Naoki wrote:
>> More optimization can be done with implementing sre_parse and sre_compile in 
>> C.
>> But I have no time for it in this year.
> 
> And please don't do this! This would make maintaining the re module hard. The 
> performance of the compiler is less important than correctness and 
> performance of matching and searching.

What if the compiler could recognize constant arguments to re.compile() and do 
the regex compilation at that point?  You’d need a way to represent the 
precompiled regex in the bytecode, and it would technically be a semantic 
change since regex problems would be discovered at compilation time instead of 
runtime - but that might be a good thing.  You could also make that an 
optimization flag for opt-in, or a flag to allow opt out.
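
Short of bytecode support, the closest existing idiom is hoisting the
compilation of a constant pattern to import time, which also surfaces
pattern errors early, much as the proposed compile-time folding would
(names here are invented for illustration):

```python
import re

# Compile once when the module is loaded rather than on every call;
# a malformed pattern now fails at import time, not at first use.
_WORD = re.compile(r"\w+")

def words(text):
    return _WORD.findall(text)

print(words("compile once, match often"))  # ['compile', 'once', 'match', 'often']
```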

-Barry





Re: [Python-Dev] Make re.compile faster

2017-10-03 Thread Barry Warsaw
On Oct 3, 2017, at 01:35, Serhiy Storchaka  wrote:
> 
>> diff --git a/Lib/string.py b/Lib/string.py
>> index b46e60c38f..fedd92246d 100644
>> --- a/Lib/string.py
>> +++ b/Lib/string.py
>> @@ -81,7 +81,7 @@ class Template(metaclass=_TemplateMetaclass):
>>  delimiter = '$'
>>  idpattern = r'[_a-z][_a-z0-9]*'
>>  braceidpattern = None
>> -flags = _re.IGNORECASE
>> +flags = _re.IGNORECASE | _re.ASCII
>>  def __init__(self, template):
>>  self.template = template
>> patched:
>> import time:  1191 |   8479 | string
>> Of course, this patch is not backward compatible. [a-z] doesn't match with 
>> 'ı' or 'ſ' anymore.
>> But who cares?
> 
> This looks like a bug fix. I'm wondering if it is worth to backport it to 
> 3.6. But the change itself can break a user code that changes idpattern 
> without touching flags. There is other way, but it should be discussed on the 
> bug tracker.

It’s definitely an API change, as I mention in the bug tracker.  It’s 
*probably* safe in practice given that the documentation does say that 
identifiers are ASCII by default, but it also means that a client who wants to 
use Unicode previously didn’t have to touch flags, and after this change would 
now have to do so.  `flags` is part of the public API.

Maybe for subclasses you could say that if delimiter, idpattern, or 
braceidpattern are anything but the defaults, fall back to just re.IGNORECASE.
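
The compatibility concern is easy to demonstrate directly: under Unicode
case folding, IGNORECASE lets [a-z] match characters such as U+0131
(dotless i), which adding re.ASCII suppresses. A small sketch of the
difference:

```python
import re

# With plain IGNORECASE, Unicode case equivalences apply, so [a-z]
# matches LATIN SMALL LETTER DOTLESS I:
print(bool(re.fullmatch('[a-z]', '\u0131', re.IGNORECASE)))             # True

# Adding re.ASCII restricts matching to ASCII case folding:
print(bool(re.fullmatch('[a-z]', '\u0131', re.IGNORECASE | re.ASCII)))  # False
```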

Cheers,
-Barry





Re: [Python-Dev] PEP 554 v3 (new interpreters module)

2017-10-03 Thread Antoine Pitrou
On Mon, 2 Oct 2017 22:15:01 -0400
Eric Snow  wrote:
> 
> I'm still not convinced that sharing synchronization primitives is
> important enough to be worth including it in the PEP.  It can be added
> later, or via an extension module in the meantime.  To that end, I'll
> add a mechanism to the PEP for third-party types to indicate that they
> can be passed through channels.  Something like
> "obj.__channel_support__ = True".

How would that work?  If it's simply a matter of flipping a bit, why
don't we do it for all objects?

Regards

Antoine.


Re: [Python-Dev] Investigating time for `import requests`

2017-10-03 Thread Koos Zevenhoven
I've probably missed a lot of this discussion, but this lazy import
discussion confuses me. We already have both eager import (import at the
top of the file), and lazy import (import right before use).

The former is good when you know you need the module, and the latter is
good when having the overhead at first use is preferable to having the
overhead at startup. But as Raymond was saying, this is of course
especially relevant when that import is likely never used.

Maybe the fact that the latter is not recommended gives people the feeling
that we don't have lazy imports, although we do.

What we *don't* have, however, is *partially* lazy imports and partially
executed code, something like:

on demand:
class Foo:
# a lot of stuff here

def foo_function(my_foo, bar):
# more stuff here


When executed, the `on demand` block would only keep track of which names
are being bound (here, "Foo" and "foo_function"); the code would actually
run on the first lookup of those names in the namespace.

Then you could also do

on demand:
import sometimes_needed_module

Or

on demand:
from . import all, submodules, of, this, package


This would of course drift away from "namespaces are simply dicts". But who
cares, if they still provide the dict interface. See e.g. this example with
automatic lazy imports:

https://gist.github.com/k7hoven/21c5532ce19b306b08bb4e82cfe5a609
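
The `on demand` syntax doesn't exist, but its effect for individual names
can be approximated today with a lazy attribute holder; all names in
this sketch are invented:

```python
class OnDemand:
    """Defer running a piece of code until its name is first looked up."""

    def __init__(self, **thunks):
        self._thunks = thunks  # name -> zero-argument callable

    def __getattr__(self, name):
        # Only called when `name` isn't already an instance attribute.
        try:
            thunk = self._thunks.pop(name)
        except KeyError:
            raise AttributeError(name) from None
        value = thunk()             # run the deferred code once...
        setattr(self, name, value)  # ...and cache the result
        return value

ns = OnDemand(table=lambda: [x * x for x in range(1000)])
# Nothing has been computed yet; the work happens at first lookup:
print(ns.table[:3])  # [0, 1, 4]
```

A deferred `import` would just be another thunk, e.g.
`OnDemand(mod=lambda: __import__("sometimes_needed_module"))`.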


Another thing we *don't* have is unimporting. What if I know that I'm
only going to need some particular module in this one initialization
function?  Why should I keep it in memory for the whole lifetime of the
program?
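
Something close to unimporting is already possible by dropping the
reference and evicting the cached module, though CPython only reclaims
the memory once nothing else still holds the module object (a sketch;
json is just a stand-in for an initialization-only dependency):

```python
import sys
import json  # stand-in for a module needed only during initialization

def initialize():
    return json.dumps({"ready": True})

config = initialize()

# "Unimport": drop our name for the module and evict the cached copy.
# A later `import json` simply re-executes the module from scratch.
del json
sys.modules.pop("json", None)
print(config)                 # {"ready": true}
print("json" in sys.modules)  # False
```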

––Koos


-- 
+ Koos Zevenhoven + http://twitter.com/k7hoven +


Re: [Python-Dev] Make re.compile faster

2017-10-03 Thread Victor Stinner
> * RegexFlag.__and__ and __new__ is called very often.

Yeah, when the re module was modified to use enums for flags,
re.compile() became slower:
https://pyperformance.readthedocs.io/cpython_results_2017.html#slowdown

https://speed.python.org/timeline/#/?exe=12&ben=regex_compile&env=1&revs=200&equid=off&quarts=on&extr=on

It would be nice if internally we could use integers again to reduce
this overhead, without losing the nice representation:

>>> re.I
<RegexFlag.IGNORECASE: 2>

Victor