[issue12731] python lib re uses obsolete sense of \w in full violation of UTS#18 RL1.2a

2021-05-26 Thread Antoine Pitrou
Change by Antoine Pitrou : -- stage: test needed -> needs patch versions: +Python 3.11 -Python 3.6, Python 3.7, Python 3.8

2020-02-03 Thread STINNER Victor
Change by STINNER Victor : -- nosy: -vstinner

2020-01-31 Thread Terry J. Reedy
Change by Terry J. Reedy : -- assignee: docs@python -> components: +Unicode -Documentation nosy: +benjamin.peterson, lemburg, serhiy.storchaka

2020-01-31 Thread Henry S. Thompson
Henry S. Thompson added the comment: [One year and 2 days later... :-[ Is this fixed in 3.9? If not, the Versions list above should be updated. The failure of lower() to preserve 'alpha-ness' is a serious bug, it causes significant failures in e.g. Turkish NLP, and it's _not_ just a

2019-09-07 Thread Justin Arthur
Change by Justin Arthur : -- nosy: +JustinTArthur

2019-01-29 Thread Henry S. Thompson
Henry S. Thompson added the comment: This issue is also implicated in a failure of isalpha and friends. Easy way to see this is to compare
>>> isalpha('İ')
True
>>> isalpha('İ'.lower())
False
This results from the use of a combining character to encode lower-case Turkish dotted i:
>>>
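The effect described in this comment can be reproduced with a minimal sketch. It assumes the comment's `isalpha(...)` is shorthand for the `str.isalpha()` method (Python has no built-in `isalpha` function) and uses only the standard `unicodedata` and `re` modules:

```python
import re
import unicodedata

upper_i = "\u0130"           # LATIN CAPITAL LETTER I WITH DOT ABOVE ('İ')
lowered = upper_i.lower()    # Python's default, locale-independent full lowercasing

# Show each code point produced by lower() with its general category.
for ch in lowered:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} ({unicodedata.category(ch)})")
# U+0069 LATIN SMALL LETTER I (Ll)
# U+0307 COMBINING DOT ABOVE (Mn)

print(upper_i.isalpha())     # True: a single uppercase letter (Lu)
print(lowered.isalpha())     # False: U+0307 is a nonspacing mark (Mn), not a letter
print(len(upper_i), len(lowered))  # 1 2

# The same combining mark also stops re's \w from matching the whole string.
print(re.fullmatch(r"\w+", lowered))  # None
```

`str.lower()` applies the unconditional SpecialCasing mapping for U+0130 (to `i` followed by U+0307), which is why one character becomes two; the Turkish-locale mapping to a plain `i` is not applied by the standard library.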

2018-03-14 Thread Terry J. Reedy
Terry J. Reedy added the comment: Whatever I may have said before, I favor supporting the Unicode standard for \w, which is related to the standard for identifiers. This is one of 2 issues about \w being defined too narrowly. I am somewhat arbitrarily closing #1693050 as a

2013-07-10 Thread Terry J. Reedy
Changes by Terry J. Reedy tjre...@udel.edu: -- versions: +Python 3.4 -Python 3.2 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12731

2011-09-29 Thread Ezio Melotti
Ezio Melotti ezio.melo...@gmail.com added the comment: The failing re tests after PEP 393 are:
FAIL lib re found non alphanumeric string 'cafe'
FAIL lib re found non alphanumeric string 'Ⓚ'
FAIL lib re found non alphanumeric string ''
FAIL lib re found non alphanumeric string ''
FAIL lib re

2011-08-28 Thread Ezio Melotti
Ezio Melotti ezio.melo...@gmail.com added the comment: Or the re module should be *replaced* by the code from the regex module (but renamed to re, and with certain backwards compatibilities restored, probably). This is what I meant. But I really hope the re module (really: the _sre

2011-08-28 Thread Ezio Melotti
Ezio Melotti ezio.melo...@gmail.com added the comment: But I really hope the re module (really: the _sre extension module) can be fixed. If you mean on 2.7/3.2, then I guess we could extract the fixes from regex, but we have to see if it's doable and someone will have to do it. Also

2011-08-28 Thread Guido van Rossum
Guido van Rossum gu...@python.org added the comment: [me] But I really hope the re module (really: the _sre extension module) can be fixed. [Ezio] Start fixing these issues from scratch doesn't make much sense IMHO.  We could extract the fixes from regex and merge them in re, but then

2011-08-28 Thread Ezio Melotti
Ezio Melotti ezio.melo...@gmail.com added the comment: Ideally, we need a Unicode czar -- a core developer whose job it is to keep track of Python's compliance with various parts and versions of the Unicode standard and who can nudge other developers towards fixing bugs or implementing

2011-08-26 Thread Guido van Rossum
Guido van Rossum gu...@python.org added the comment: Really? The re module cannot be salvaged and we should add regex but keep the (buggy) re? That does not make a lot of sense to me. I think it should just be fixed in the re module. Or the re module should be *replaced* by the code from

2011-08-15 Thread Ezio Melotti
Ezio Melotti ezio.melo...@gmail.com added the comment: If the regex module works fine here, I think it's better to leave the re module alone and include the regex module in 3.3.

2011-08-13 Thread Antoine Pitrou
Changes by Antoine Pitrou pit...@free.fr: -- nosy: +haypo

2011-08-13 Thread Antoine Pitrou
Antoine Pitrou pit...@free.fr added the comment: However, because the \wc issues are bigger, Java addressed the tr18 RL1.2a issues differently, this time by creating a new compilation flag called UNICODE_CHARACTER_CLASSES (with corresponding embedded (?U) regex flag.) Truth be told, even

2011-08-12 Thread Arfrever Frehtes Taifersar Arahesis
Changes by Arfrever Frehtes Taifersar Arahesis arfrever@gmail.com: -- nosy: +Arfrever

2011-08-12 Thread Terry J. Reedy
Terry J. Reedy tjre...@udel.edu added the comment: However desirable it would be, I do not believe there is any claim in the manual that the re module follows the evolving Unicode consortium r.e. standard. If I understand, you are saying that this statement in the doc, Matches Unicode word

2011-08-12 Thread Tom Christiansen
Tom Christiansen tchr...@perl.com added the comment: Terry J. Reedy tjre...@udel.edu added the comment: However desirable it would be, I do not believe there is any claim in the manual that the re module follows the evolving Unicode consortium r.e. standard. My from the hip thought is that

2011-08-12 Thread Matthew Barnett
Changes by Matthew Barnett pyt...@mrabarnett.plus.com: -- nosy: +mrabarnett

2011-08-11 Thread Tom Christiansen
New submission from Tom Christiansen tchr...@perl.com: You cannot use Python's lib re for handling Unicode regular expressions because it violates the standard set out for the same in UTS#18 on Unicode Regular Expressions in RL1.2a on compatibility properties. What \w is allowed to match is
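The mismatch this submission describes can be illustrated with a minimal sketch. It assumes the UTS#18 Annex C definition of `\w` as `[\p{alpha}\p{gc=Mark}\p{digit}\p{gc=Connector_Punctuation}\p{Join_Control}]`, and because the stdlib `unicodedata` module does not expose the Alphabetic property, `\p{alpha}` is only approximated (letters plus letter numbers). The helper names `is_uts18_word_char`, `is_uts18_word` and `is_re_word` are hypothetical, not part of `re` or of any proposed patch:

```python
import re
import unicodedata

# Join_Control code points: ZERO WIDTH NON-JOINER and ZERO WIDTH JOINER.
JOIN_CONTROL = {"\u200c", "\u200d"}


def is_uts18_word_char(ch: str) -> bool:
    r"""Approximate a UTS#18 Annex C word character (hypothetical helper).

    \p{alpha} is approximated by str.isalpha() plus letter numbers (Nl);
    characters that are Alphabetic only via Other_Alphabetic, such as
    U+24C0 CIRCLED LATIN CAPITAL LETTER K, are therefore missed.
    """
    cat = unicodedata.category(ch)
    return (
        ch.isalpha() or cat == "Nl"   # approximate \p{alpha}
        or cat.startswith("M")        # \p{gc=Mark}: Mn, Mc, Me
        or cat == "Nd"                # \p{digit} = \p{gc=Decimal_Number}
        or cat == "Pc"                # \p{gc=Connector_Punctuation}
        or ch in JOIN_CONTROL         # \p{Join_Control}
    )


def is_uts18_word(s: str) -> bool:
    """True if s is non-empty and every character is a UTS#18-style word char."""
    return bool(s) and all(is_uts18_word_char(c) for c in s)


def is_re_word(s: str) -> bool:
    """True if Python's re module matches the whole string with \\w+."""
    return re.fullmatch(r"\w+", s) is not None


samples = ["cafe\u0301", "caf\u00e9", "\u0130".lower(), "a\u200db", "x_y", "\u00bd"]
for s in samples:
    print(ascii(s), "re:", is_re_word(s), "uts18-approx:", is_uts18_word(s))
```

Under these assumptions the sketch shows the mismatch in both directions: `re`'s `\w` rejects strings containing combining marks (decomposed `café`, the lowercased `İ`) or ZERO WIDTH JOINER, yet accepts non-decimal numerics such as `½` (U+00BD, category No), which `\p{digit}` excludes. Neither predicate accepts `Ⓚ` (U+24C0), because its alphabetic status comes from Other_Alphabetic, which this approximation cannot see.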

2011-08-11 Thread Ezio Melotti
Changes by Ezio Melotti ezio.melo...@gmail.com: -- nosy: +ezio.melotti