[issue31671] IntFlag makes re.compile slower

2017-10-05 Thread STINNER Victor

STINNER Victor  added the comment:

Serhiy: "PR 3867 looks unpythonic to me. We usually don't check the type of 
arguments. This complicate and slow down a code. Do you have a realistic 
example when the current behavior harms?"

No, I don't have any realistic example. I just noticed the bug and I was 
surprised. In practice, flags=2.0 instead of flags=2 is not totally wrong, it's 
okish to accept 2.0. Let's just forget this corner case. I abandon my change.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31671] IntFlag makes re.compile slower

2017-10-05 Thread STINNER Victor

STINNER Victor  added the comment:

Hum, I ran again my microbenchmark on re.compile() cache:

haypo@selma$ ./python -m perf timeit -s 'import re; re_compile=re.compile' 
're_compile("a", flags=2)' --duplicate=1024 -o ref.json  --inherit=PYTHONPATH -v

Sadly, the commit c1c47c166b1012d34f2c6e111ee9ccb5c4d12de7 made the cache 
slower:

Mean +- std dev: [ref] 364 ns +- 26 ns -> [patch] 545 ns +- 19 ns: 1.50x slower 
(+50%)

Just to check, I reverted the change on Scanner, the benchmark is still "560 ns 
+- 27 ns".


"if isinstance(flags, RegexFlag): flags = flags.value" added 181 nanoseconds? A 
quick "isinstance(flags, RegexFlag)" timeit microbenchmark says that this 
operation has a cost of 46.2 ns (+- 1.6 ns). Benchmarks are strange, as usual 
:-)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31671] IntFlag makes re.compile slower

2017-10-05 Thread Serhiy Storchaka

Serhiy Storchaka  added the comment:

PR 3867 looks unpythonic to me. We usually don't check the type of arguments. 
This complicate and slow down a code.

Do you have a realistic example when the current behavior harms?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31671] IntFlag makes re.compile slower

2017-10-05 Thread INADA Naoki

INADA Naoki  added the comment:

Instead caching type(flags), how about this?

  if not isinstance(flags, int):
  raise TypeError(f"flags must be int or RegexFlag, got {flags!r}")
  flags = int(flags)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31671] IntFlag makes re.compile slower

2017-10-05 Thread STINNER Victor

STINNER Victor  added the comment:

My PR 3867 fixes a corner case when the re.compile() is misused ("on 
purpose"?). I'm going to reject (abandon) my own PR since it makes re.compile() 
slower:

Mean +- std dev: [ref] 364 ns +- 6 ns -> [patch] 459 ns +- 10 ns: 1.26x slower 
(+26%)

@Serhiy, @Naoki: Are you ok to reject this PR and keep the corner case bug?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31671] IntFlag makes re.compile slower

2017-10-05 Thread Serhiy Storchaka

Change by Serhiy Storchaka :


--
type:  -> performance

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31671] IntFlag makes re.compile slower

2017-10-05 Thread Serhiy Storchaka

Serhiy Storchaka  added the comment:

Thanks Naoki!

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31671] IntFlag makes re.compile slower

2017-10-05 Thread INADA Naoki

Change by INADA Naoki :


--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31671] IntFlag makes re.compile slower

2017-10-05 Thread INADA Naoki

INADA Naoki  added the comment:


New changeset c1c47c166b1012d34f2c6e111ee9ccb5c4d12de7 by INADA Naoki in branch 
'master':
bpo-31671: re: Convert RegexFlag to int before compile (GH-3862)
https://github.com/python/cpython/commit/c1c47c166b1012d34f2c6e111ee9ccb5c4d12de7


--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31671] IntFlag makes re.compile slower

2017-10-04 Thread Ethan Furman

Ethan Furman  added the comment:

INADA Naoki said:
> But while new instance is not created each time, 4 Python method
> calls (e,g.  IntFlag.__and__() -> IntFlag.__new__()
> -> IntFlag._missing_() -> IntFlag._create_pseudo_member_())
> are much slower than int & int.

Only the first two calls always happen -- the last two calls only happen for 
numbers that haven't been seen yet.  And yes, two Python level calls are much 
slower than an int & int.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31671] IntFlag makes re.compile slower

2017-10-04 Thread STINNER Victor

STINNER Victor  added the comment:

The perf module always starts with a "warmup" run to fill caches. If enum
has a cache, it should be filled automatically.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31671] IntFlag makes re.compile slower

2017-10-04 Thread INADA Naoki

INADA Naoki  added the comment:

> IntFlag.__and__ does not create a new instance every time -- all new 
> instances are cached in the IntFlag machinery (so RegexFlag(7) is only 
> created once).

I'm sorry, I misunderstood.
But while new instance is not created each time, 4 Python method calls
(e,g.  IntFlag.__and__() -> IntFlag.__new__() -> IntFlag._missing_() -> 
IntFlag._create_pseudo_member_()) are much slower than int & int.

> If all the RegexFlag combinations are created before the regex compile 
> benchmark do we still see a speed-up?

I believe that's what Victor benchmarked.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31671] IntFlag makes re.compile slower

2017-10-04 Thread Ethan Furman

Ethan Furman  added the comment:

IntFlag.__and__ does not create a new instance every time -- all new instances 
are cached in the IntFlag machinery (so RegexFlag(7) is only created once).

If all the RegexFlag combinations are created before the regex compile 
benchmark do we still see a speed-up?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31671] IntFlag makes re.compile slower

2017-10-04 Thread STINNER Victor

STINNER Victor  added the comment:

I don't really care if Python 3.7 is still slower than 3.6 with PR 3862. The 
speedup is significant, the change is short and safe, so the PR LGTM :-) We can 
implement further optimizations later ;-)

By the way, speed.python.org can be used to track performances over time. 
Sadly, I didn't have much time to take care of the website: run manually to job 
to draw new plots :-)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31671] IntFlag makes re.compile slower

2017-10-04 Thread Serhiy Storchaka

Serhiy Storchaka  added the comment:

It makes sense to report the performance gain only in comparison with the 
previous released version. I expect that re compiling is slower in 3.7 due to 
new features and optimizations. Generating more optimal code takes a time.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31671] IntFlag makes re.compile slower

2017-10-04 Thread STINNER Victor

STINNER Victor  added the comment:

> Did you compare the benchmarking results against 3.7 or 3.6?

My lasted benchmark is on the master branch: original code ("ref") vs code 
patched with PR 3862 ("patch").

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31671] IntFlag makes re.compile slower

2017-10-04 Thread Serhiy Storchaka

Serhiy Storchaka  added the comment:

Did you compare the benchmarking results against 3.7 or 3.6?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31671] IntFlag makes re.compile slower

2017-10-04 Thread STINNER Victor

STINNER Victor  added the comment:

Naoki: "I've added news entry about it."

Nice. You might also mention the "optimization" in What's New in Python 3.7 
doc, in the Optimisations section. The issue here is that it's only faster than 
Python 3.6, but Python 3.6 was slower than 3.5 :-) The "optimization" just 
makes Python 3.7 as fast as Python 3.5 ... :-)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31671] IntFlag makes re.compile slower

2017-10-04 Thread INADA Naoki

INADA Naoki  added the comment:

Thank you for benchmarking.  I've added news entry about it.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31671] IntFlag makes re.compile slower

2017-10-04 Thread STINNER Victor

STINNER Victor  added the comment:

Nice speedup!

I ran a benchmark on PR 3862:

 1002  ./python 
~/prog/python/performance/performance/benchmarks/bm_regex_compile.py 
--inherit=PYTHONPATH -v -o patch.json
 1003  git co master
 1004  make
 1005  ./python 
~/prog/python/performance/performance/benchmarks/bm_regex_compile.py 
--inherit=PYTHONPATH -v -o ref.json
 1007  ./python -m perf compare_to ref.json patch.json 

Mean +- std dev: [ref] 396 ms +- 16 ms -> [patch] 347 ms +- 8 ms: 1.14x faster 
(-12%)

It's the following benchmark on *uncached* regular expressions. So it really 
measures re.compile() performance, not the cache.
https://pyperformance.readthedocs.io/benchmarks.html#regex-compile

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31671] IntFlag makes re.compile slower

2017-10-03 Thread INADA Naoki

INADA Naoki  added the comment:

> When proposing to undo recent decisions, please add the people to the nosy 
> list who were involved in making that decision in the first place.

I don't propose reverting IntFlag.
I just propose convert IntFlag to int in re module, before passing it to 
sre_compile.
So this change is not visible to re users.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31671] IntFlag makes re.compile slower

2017-10-03 Thread Raymond Hettinger

Raymond Hettinger  added the comment:

When proposing to undo recent decisions, please add the people to the nosy list 
who were involved in making that decision in the first place.

--
nosy: +ethan.furman, gvanrossum, rhettinger

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31671] IntFlag makes re.compile slower

2017-10-03 Thread Serhiy Storchaka

Serhiy Storchaka  added the comment:

I don't think there is a problem which is worth to be solved by PR 3867. It is 
very unlikely that anyone uses re functions with a pattern and float flags 
which accidentally matches already cached valid pattern and integer flag. If 
float value is passed as a flag it is likely is passed the first time. 
Fractional floats are rejected in any case. This is different from the problem 
with PR 3862.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31671] IntFlag makes re.compile slower

2017-10-03 Thread Barry A. Warsaw

Change by Barry A. Warsaw :


--
nosy: +barry

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31671] IntFlag makes re.compile slower

2017-10-03 Thread STINNER Victor

STINNER Victor  added the comment:

Serhiy: "Victor, how large is performance regression of your patch?"

I tested bm_regex_compile.py of the Python performance benchmark suite:
https://pyperformance.readthedocs.io/benchmarks.html#regex-compile

My patch has no impact on *uncached* re.compile() performances according to 
this benchmark.

haypo@selma$ ./python -m perf compare_to ref.json patch.json  
Benchmark hidden because not significant (1): regex_compile

haypo@selma$ ./python -m perf compare_to ref.json patch.json  -v
Mean +- std dev: [ref] 386 ms +- 6 ms -> [patch] 387 ms +- 6 ms: 1.00x slower 
(+0%)
Not significant!

--

On a microbenchmark on the *cache* itself, the difference is visible:

 1018  ./python -m perf timeit -s 'import re; re_compile=re.compile' 
're_compile("a", flags=2)' --duplicate=1024 -o ref.json  --inherit=PYTHONPATH -v
 1019  git co re_compile_flags_type 
 1020  make
 1021  ./python -m perf timeit -s 'import re; re_compile=re.compile' 
're_compile("a", flags=2)' --duplicate=1024 -o patch.json  --inherit=PYTHONPATH 
-v
 1022  ./python -m perf compare_to ref.json patch.json  
 1023  git diff master..

haypo@selma$ ./python -m perf compare_to ref.json patch.json  
Mean +- std dev: [ref] 364 ns +- 6 ns -> [patch] 459 ns +- 10 ns: 1.26x slower 
(+26%)

If you replace type(flags) with flags.__class__, the change has no impact on 
performances :-) But obj.__class__ can be different than type(obj).

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31671] IntFlag makes re.compile slower

2017-10-03 Thread Serhiy Storchaka

Serhiy Storchaka  added the comment:

Victor, how large is performance regression of your patch?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31671] IntFlag makes re.compile slower

2017-10-03 Thread STINNER Victor

STINNER Victor  added the comment:

> :) This is due to caching. 2.0 == 2.

I proposed PR 3867 to fix the re.compile() cache: check flags type.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31671] IntFlag makes re.compile slower

2017-10-03 Thread STINNER Victor

Change by STINNER Victor :


--
pull_requests: +3847

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31671] IntFlag makes re.compile slower

2017-10-03 Thread STINNER Victor

STINNER Victor  added the comment:

Oh, right. Strange.

>>> re.compile("e", flags=2.0)
Traceback (most recent call last):
  File "", line 1, in 
  File "/home/haypo/prog/python/master/Lib/re.py", line 240, in compile
return _compile(pattern, flags)
  File "/home/haypo/prog/python/master/Lib/re.py", line 288, in _compile
p = sre_compile.compile(pattern, flags)
  File "/home/haypo/prog/python/master/Lib/sre_compile.py", line 747, in compile
p = sre_parse.parse(p, flags)
  File "/home/haypo/prog/python/master/Lib/sre_parse.py", line 890, in parse
p = _parse_sub(source, pattern, flags & SRE_FLAG_VERBOSE, 0)
TypeError: unsupported operand type(s) for &: 'float' and 'int'

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31671] IntFlag makes re.compile slower

2017-10-03 Thread Serhiy Storchaka

Serhiy Storchaka  added the comment:

:) This is due to caching. 2.0 == 2.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31671] IntFlag makes re.compile slower

2017-10-03 Thread STINNER Victor

STINNER Victor  added the comment:

Oh, Python already accepts floating point numbers:

haypo@selma$ ./python
Python 3.7.0a1+ (heads/master-dirty:efb560eee2, Oct  3 2017, 12:15:58) 
[GCC 7.2.1 20170915 (Red Hat 7.2.1-2)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> re.compile("[a-z]", flags=re.I).match("A")
<_sre.SRE_Match object; span=(0, 1), match='A'>
>>> re.I

>>> re.compile("[a-z]", flags=2).match("A")
<_sre.SRE_Match object; span=(0, 1), match='A'>

# and now a float
>>> re.compile("[a-z]", flags=2.0).match("A")
<_sre.SRE_Match object; span=(0, 1), match='A'>

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31671] IntFlag makes re.compile slower

2017-10-03 Thread STINNER Victor

Change by STINNER Victor :


--
nosy: +haypo

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31671] IntFlag makes re.compile slower

2017-10-03 Thread Serhiy Storchaka

Serhiy Storchaka  added the comment:

See also issue28637. Using IntFlag in the re module impacted the Python startup 
time. This was "fixed" by getting rid of re in site.py.

--
nosy: +serhiy.storchaka

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31671] IntFlag makes re.compile slower

2017-10-03 Thread INADA Naoki

Change by INADA Naoki :


--
keywords: +patch
pull_requests: +3842
stage:  -> patch review

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31671] IntFlag makes re.compile slower

2017-10-03 Thread INADA Naoki

New submission from INADA Naoki :

flags exposed by re module are IntFlag from Python 3.4.

Since it is passed to underlying sre_parse and sre_compile,
tests in loop (e.g. `flags & SRE_FLAG_IGNORECASE`) calls
IntFlag.__and__ and creates new instance everytime.

It makes re.compile() slower.

--
components: Library (Lib)
messages: 303594
nosy: inada.naoki
priority: normal
severity: normal
status: open
title: IntFlag makes re.compile slower
versions: Python 3.7

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com