[issue31672] string.Template should use re.ASCII flag

2018-01-04 Thread Serhiy Storchaka

Serhiy Storchaka  added the comment:

Results.

There are 4 cases:

1) default idpattern and flags
2) overridden only idpattern
3) overridden only flags
4) overridden both idpattern and flags

Case 1 was the one that was broken when this issue was opened. Inada's initial 
version fixed case 1 but broke case 2. His final version (also applied to 3.6) 
fixed case 1 but broke case 3. This is a win, because cases 1 and 2 look much 
more common than case 3. Finally, PR 5099 fixed case 3 as well. Case 4 is 
obviously not affected by any changes to the default values.

Now all four cases are correct in 3.7 and the only broken case in 3.6 is the 
uncommon case 3.
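
As an illustration of case 2 (only idpattern overridden), here is a minimal sketch. The subclass name, the replacement pattern, and the sample placeholder are hypothetical and chosen only for this example; it assumes a Python version in which case 2 behaves correctly.

```python
import string

# Sample placeholder name containing non-ASCII letters ("groesse" with
# o-umlaut and sharp s), written with escapes to avoid encoding issues.
NAME = 'gr\u00f6\u00dfe'


class UnicodeTemplate(string.Template):
    # Hypothetical subclass for case 2: only idpattern is overridden,
    # flags is inherited unchanged from string.Template.
    idpattern = r'[^\W\d]\w*'


text = 'size: $' + NAME

# The subclass accepts the Unicode identifier without touching flags.
rendered = UnicodeTemplate(text).substitute({NAME: '42'})

# The default Template stops at the first non-ASCII letter, finds no
# mapping key for the ASCII prefix, and leaves the text unchanged.
default_rendered = string.Template(text).safe_substitute({NAME: '42'})

print(rendered)
print(default_rendered)
```

If overriding idpattern alone stopped working, as in the intermediate version described above, the first call would no longer substitute the placeholder.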

--
resolution:  -> fixed
stage:  -> resolved
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31672] string.Template should use re.ASCII flag

2018-01-04 Thread Serhiy Storchaka

Serhiy Storchaka  added the comment:


New changeset 87be28f4a1c5b76926c71a3d9f92503f9eb82d51 by Serhiy Storchaka in 
branch 'master':
bpo-31672: Restore the former behavior when override flags in Template. (#5099)
https://github.com/python/cpython/commit/87be28f4a1c5b76926c71a3d9f92503f9eb82d51


--




[issue31672] string.Template should use re.ASCII flag

2018-01-04 Thread Serhiy Storchaka

Serhiy Storchaka  added the comment:

PR 5099 reverts, in 3.7, a subtle behavior change made by the fix for this 
issue. Before the fix was merged, setting flags to 0 in a Template subclass made 
the default pattern match only lowercase letters. Currently, setting flags to 0 
has no effect.
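
A minimal sketch of the behavior in question (case 3, only flags overridden). The subclass name and sample text are hypothetical; the results noted in the comments assume Python 3.7 or later with PR 5099 applied.

```python
import string


class LowercaseOnly(string.Template):
    # Hypothetical subclass for case 3: only flags is overridden,
    # idpattern is inherited from string.Template.
    flags = 0


text = '$Foo $bar'

# With flags = 0 the default idpattern is case-sensitive, so "$Foo" is not
# a valid placeholder and safe_substitute() leaves it untouched.
rendered = LowercaseOnly(text).safe_substitute(Foo='X', bar='Y')

# The default class keeps re.IGNORECASE, so both placeholders are replaced.
default_rendered = string.Template(text).safe_substitute(Foo='X', bar='Y')

print(rendered)
print(default_rendered)
```

On a version where setting flags to 0 has no effect, both calls would give the same result.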

--
resolution: fixed -> 
stage: resolved -> 
status: closed -> open




[issue31672] string.Template should use re.ASCII flag

2018-01-04 Thread Serhiy Storchaka

Change by Serhiy Storchaka :


--
pull_requests: +4967




[issue31672] string.Template should use re.ASCII flag

2017-11-21 Thread Barry A. Warsaw

Change by Barry A. Warsaw :


--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed




[issue31672] string.Template should use re.ASCII flag

2017-11-21 Thread Barry A. Warsaw

Barry A. Warsaw  added the comment:


New changeset e256b408889eba867e1d90e5e1a0904843256255 by Barry Warsaw in 
branch 'master':
bpo-31672 - Add one last minor clarification for idpattern (#4483)
https://github.com/python/cpython/commit/e256b408889eba867e1d90e5e1a0904843256255


--




[issue31672] string.Template should use re.ASCII flag

2017-11-20 Thread Barry A. Warsaw

Barry A. Warsaw  added the comment:

@inada.naoki - I think changing this to ?a will make things more clear.  I 
submitted a PR for that, and then once it lands, we can close this issue.

--




[issue31672] string.Template should use re.ASCII flag

2017-11-20 Thread Barry A. Warsaw

Change by Barry A. Warsaw :


--
pull_requests: +4420




[issue31672] string.Template should use re.ASCII flag

2017-11-16 Thread Serhiy Storchaka

Serhiy Storchaka  added the comment:

Is something left to do here?

In 3.7 r'(?-i:[_a-zA-Z][_a-zA-Z0-9]*)' can be replaced with 
r'(?a:[_a-z][_a-z0-9]*)' if it will add clarity.

--




[issue31672] string.Template should use re.ASCII flag

2017-10-14 Thread INADA Naoki

INADA Naoki  added the comment:


New changeset 073150db39408c1800e4b9e895ad0b0e195f1056 by INADA Naoki in branch 
'master':
bpo-31672: doc: Remove one sentence from library/string.rst (GH-3990)
https://github.com/python/cpython/commit/073150db39408c1800e4b9e895ad0b0e195f1056


--




[issue31672] string.Template should use re.ASCII flag

2017-10-13 Thread INADA Naoki

Change by INADA Naoki :


--
pull_requests: +3966




[issue31672] string.Template should use re.ASCII flag

2017-10-13 Thread INADA Naoki

INADA Naoki  added the comment:


New changeset 7060380d577690a40ebc201c0725076349e977cd by INADA Naoki in branch 
'3.6':
bpo-31672: Fix string.Template accidentally matched non-ASCII identifiers 
(GH-3872)
https://github.com/python/cpython/commit/7060380d577690a40ebc201c0725076349e977cd


--




[issue31672] string.Template should use re.ASCII flag

2017-10-13 Thread INADA Naoki

Change by INADA Naoki :


--
pull_requests: +3958




[issue31672] string.Template should use re.ASCII flag

2017-10-13 Thread INADA Naoki

INADA Naoki  added the comment:


New changeset b22273ec5d1992b0cbe078b887427ae9977dfb78 by INADA Naoki in branch 
'master':
bpo-31672: Fix string.Template accidentally matched non-ASCII identifiers 
(GH-3872)
https://github.com/python/cpython/commit/b22273ec5d1992b0cbe078b887427ae9977dfb78


--




[issue31672] string.Template should use re.ASCII flag

2017-10-10 Thread Barry A. Warsaw

Barry A. Warsaw  added the comment:

With some comments to clarify of course.

--




[issue31672] string.Template should use re.ASCII flag

2017-10-10 Thread Barry A. Warsaw

Barry A. Warsaw  added the comment:

On Oct 6, 2017, at 15:04, Serhiy Storchaka  wrote:
> Serhiy Storchaka  added the comment:
> 
> Another solution (works in 3.6 too): set idpattern to 
> r'(?-i:[_a-zA-Z][_a-zA-Z0-9]*)'.
> 
> This looks pretty weird, setting the re.IGNORECASE flag and then unsetting 
> it. But it works, and doesn't break user code that changes idpattern 
> without changing flags.

Oh, I think I like that :)

--




[issue31672] string.Template should use re.ASCII flag

2017-10-06 Thread Serhiy Storchaka

Serhiy Storchaka  added the comment:

Another solution (works in 3.6 too): set idpattern to 
r'(?-i:[_a-zA-Z][_a-zA-Z0-9]*)'.

This looks pretty weird, setting the re.IGNORECASE flag and then unsetting it. 
But it works, and doesn't break user code that changes idpattern without 
changing flags.
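
A minimal sketch of how this pattern behaves, using only the `re` module and assuming Python 3.6 or later (where the `(?-i:...)` group syntax is available). The sample strings are chosen for illustration.

```python
import re

# Pattern proposed in this comment: the inline group switches IGNORECASE
# off locally, so both letter cases are spelled out explicitly.
IDPATTERN = r'(?-i:[_a-zA-Z][_a-zA-Z0-9]*)'

# Compile with re.IGNORECASE, the default value of Template.flags.
identifier = re.compile(IDPATTERN, re.IGNORECASE)

# ASCII identifiers in any case are still accepted.
ascii_ok = identifier.fullmatch('Foo_1') is not None

# The non-ASCII letters from the original report are rejected.
dotless_i = identifier.fullmatch('\u0131') is not None  # DOTLESS I
long_s = identifier.fullmatch('\u017f') is not None     # LONG S

print(ascii_ok, dotless_i, long_s)
```

Because the restriction lives in the pattern rather than in the flags, a subclass that replaces idpattern gets its own pattern compiled with the unchanged flags.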

--




[issue31672] string.Template should use re.ASCII flag

2017-10-04 Thread Barry A. Warsaw

Barry A. Warsaw  added the comment:

On Oct 4, 2017, at 10:05, Serhiy Storchaka  wrote:
> 
> See issue31690. But this solution can be used only in 3.7.

That’s fine.  I don’t think this is important enough to backport.

--




[issue31672] string.Template should use re.ASCII flag

2017-10-04 Thread Serhiy Storchaka

Serhiy Storchaka  added the comment:

See issue31690. But this solution can be used only in 3.7.

--




[issue31672] string.Template should use re.ASCII flag

2017-10-04 Thread Barry A. Warsaw

Barry A. Warsaw  added the comment:

On Oct 4, 2017, at 02:29, INADA Naoki  wrote:
> 
> INADA Naoki  added the comment:
> 
>> Yet another way -- make re.ASCII a local flag. Then we could just change the 
>> idpattern attribute to r'(?a:[_a-z][_a-z0-9]*)', without touching the flags 
>> attribute.
> 
> https://docs.python.org/3.6/library/re.html#regular-expression-syntax says
> `(?imsx-imsx:...)`
> 
> Anyway, Template.idpattern is documented public API too.

Too bad, because I like that approach.  How hard would it be to add support for 
‘a’ as a local flag?  (I’m kind of surprised that it isn’t already supported - 
it seems like it would be useful.)

I would like this better than hacking Template.flags after the fact.  It seems 
like the right way to align the code with the documentation (i.e. fix a bug).

--




[issue31672] string.Template should use re.ASCII flag

2017-10-04 Thread INADA Naoki

INADA Naoki  added the comment:

The current pull request overrides `Template.flags = re.I` after class creation 
for backward compatibility, without any API change.

But I'm not sure it's the right approach.
How many people who subclass string.Template expect non-ASCII matches?
If this change is a bugfix for 99% of subclasses, why should we keep the pitfall?

--




[issue31672] string.Template should use re.ASCII flag

2017-10-04 Thread INADA Naoki

INADA Naoki  added the comment:

> Yet another way -- make re.ASCII a local flag. Then we could just change the 
> idpattern attribute to r'(?a:[_a-z][_a-z0-9]*)', without touching the flags 
> attribute.

https://docs.python.org/3.6/library/re.html#regular-expression-syntax says
`(?imsx-imsx:...)`

Anyway, Template.idpattern is documented public API too.

--




[issue31672] string.Template should use re.ASCII flag

2017-10-04 Thread INADA Naoki

INADA Naoki  added the comment:

> When optimizing, please don't make API changes.

This is not only an optimization, but also a bugfix.

The documentation of string.Template says:

> By default, "identifier" is restricted to any case-insensitive ASCII 
> alphanumeric string (including underscores) that starts with an underscore or 
> ASCII letter.

So the missing re.ASCII flag is a bug, because non-ASCII letters can be matched.

--




[issue31672] string.Template should use re.ASCII flag

2017-10-03 Thread Serhiy Storchaka

Serhiy Storchaka  added the comment:

Yet another way -- make re.ASCII a local flag. Then we could just change the 
idpattern attribute to r'(?a:[_a-z][_a-z0-9]*)', without touching the flags 
attribute.
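
A minimal sketch of what a local ASCII flag allows, assuming Python 3.7 or later (elsewhere in this thread the solution is said to be usable only in 3.7). The sample strings are chosen for illustration.

```python
import re

# Pattern suggested in this comment: ASCII-only matching is scoped to the
# group, so the global flags can stay exactly as they are.
IDPATTERN = r'(?a:[_a-z][_a-z0-9]*)'

# Compile with re.IGNORECASE, the default value of Template.flags.
identifier = re.compile(IDPATTERN, re.IGNORECASE)

# Mixed-case ASCII identifiers still match thanks to re.IGNORECASE.
mixed_case = identifier.fullmatch('Foo_1') is not None

# The non-ASCII letters from the original report no longer match.
dotless_i = identifier.fullmatch('\u0131') is not None  # DOTLESS I
long_s = identifier.fullmatch('\u017f') is not None     # LONG S

# Outside the group, matching is still Unicode-aware: \w accepts 'e' with
# an acute accent that follows the ASCII-only identifier.
rest_is_unicode = re.fullmatch(IDPATTERN + r'\w', 'foo\u00e9',
                               re.IGNORECASE) is not None

print(mixed_case, dotless_i, long_s, rest_is_unicode)
```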

--




[issue31672] string.Template should use re.ASCII flag

2017-10-03 Thread Raymond Hettinger

Raymond Hettinger  added the comment:

When optimizing, please don't make API changes.

--
nosy: +rhettinger




[issue31672] string.Template should use re.ASCII flag

2017-10-03 Thread INADA Naoki

Change by INADA Naoki :


--
stage:  -> patch review
type:  -> behavior




[issue31672] string.Template should use re.ASCII flag

2017-10-03 Thread INADA Naoki

INADA Naoki  added the comment:

> This means if someone does subclass string.Template and changes the pattern 
> to accept Unicode identifiers, then with this change they will also have to 
> modify flags, whereas before they didn't.

Thank you for pointing it out.
I removed the re.A flag after compilation.

> OTOH, making the change for performance reasons might be questionable, given 
> that the regular expressions are compiled by the Template's metaclass, so 
> unlikely to contribute significantly to overall performance wins.

original: import time:  2310 |   9589 | string
patched:  import time:  1420 |   8702 | string

We can save about 900 us.

--
stage: patch review -> 




[issue31672] string.Template should use re.ASCII flag

2017-10-03 Thread INADA Naoki

Change by INADA Naoki :


--
keywords: +patch
pull_requests: +3851
stage:  -> patch review




[issue31672] string.Template should use re.ASCII flag

2017-10-03 Thread Barry A. Warsaw

Barry A. Warsaw  added the comment:

Technically it *is* an API change since `flags` is a part of the public API.  
The documentation says:

$identifier names a substitution placeholder matching a mapping key of 
"identifier". By default, "identifier" is restricted to any 
case-insensitive ASCII alphanumeric string (including underscores) that 
starts with an underscore or ASCII letter. The first non-identifier 
character after the $ character terminates this placeholder specification.

This means if someone does subclass string.Template and changes the pattern to 
accept Unicode identifiers, then with this change they will also have to modify 
flags, whereas before they didn't.

It really wasn't ever the intention to allow non-ASCII identifiers, so this is 
probably safe in practice.  OTOH, making the change for performance reasons 
might be questionable, given that the regular expressions are compiled by the 
Template's metaclass, so unlikely to contribute significantly to overall 
performance wins.

--




[issue31672] string.Template should use re.ASCII flag

2017-10-03 Thread Serhiy Storchaka

Change by Serhiy Storchaka :


--
nosy: +barry, serhiy.storchaka




[issue31672] string.Template should use re.ASCII flag

2017-10-03 Thread INADA Naoki

New submission from INADA Naoki :

Currently, string.Template uses re.IGNORECASE without re.ASCII:

idpattern = r'[_a-z][_a-z0-9]*'
flags = _re.IGNORECASE

With these flags, [a-z] also matches 'ı' (U+0131, DOTLESS I) and 'ſ' (U+017F, 
LONG S). This is not intentional, and it makes re.compile slower.
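
A minimal sketch that reproduces the match with the `re` module alone. The helper name is hypothetical; the pattern is the idpattern quoted above.

```python
import re

# Default idpattern quoted in this report.
IDPATTERN = r'[_a-z][_a-z0-9]*'

# Non-ASCII letters that are case-insensitive matches for 'i' and 's'.
DOTLESS_I = '\u0131'
LONG_S = '\u017f'


def is_identifier(text, flags):
    """Return True if all of `text` matches IDPATTERN under `flags`."""
    return re.fullmatch(IDPATTERN, text, flags) is not None


# With re.IGNORECASE alone (Unicode matching), both letters are accepted.
unicode_hits = [is_identifier(ch, re.IGNORECASE)
                for ch in (DOTLESS_I, LONG_S)]

# Adding re.ASCII restricts case-insensitive matching to ASCII letters.
ascii_hits = [is_identifier(ch, re.IGNORECASE | re.ASCII)
              for ch in (DOTLESS_I, LONG_S)]

print(unicode_hits, ascii_hits)
```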

--
components: Regular Expressions
messages: 303596
nosy: ezio.melotti, inada.naoki, mrabarnett
priority: normal
severity: normal
status: open
title: string.Template should use re.ASCII flag
versions: Python 3.7
