Vlastimil Brom <vlastimil.b...@gmail.com> added the comment:

I just noticed a corner case with the newly introduced grapheme matcher \X when 
it is used inside a character set:

>>> regex.findall(r"\X", "abc")
['a', 'b', 'c']
>>> regex.findall(r"[\X]", "abc")
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "regex.pyc", line 218, in findall
  File "regex.pyc", line 1435, in _compile
  File "regex.pyc", line 2351, in optimise
  File "regex.pyc", line 2705, in optimise
  File "regex.pyc", line 2798, in optimise
  File "regex.pyc", line 2268, in __hash__
AttributeError: '_Sequence' object has no attribute '_key'

It obviously doesn't make much sense to use this catch-all construct inside a 
character class (the same holds for "." in its metacharacter role), and 
http://www.regular-expressions.info/refunicode.html doesn't mention this 
possibility either. Still, the error message could be more descriptive, or 
perhaps the pattern could match a literal "X" (or both "\" and "X")?
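For comparison, here is what a more descriptive error can look like. This sketch uses the standard library re module, not the regex module this issue is about, and assumes Python 3.6 or later, where an unknown letter escape such as \X inside a set is rejected with a positioned re.error instead of an internal AttributeError. The helper name compile_error is hypothetical.

```python
import re

def compile_error(pattern):
    """Return a short description of the stdlib re compile error, or None."""
    try:
        re.compile(pattern)
    except re.error as exc:
        # re.error exposes the unformatted message (msg) and the offending index (pos)
        return f"{exc.msg} (pos={exc.pos})"
    return None

print(compile_error(r"[\X]"))   # bad escape \X (pos=1)
print(compile_error(r"[X\\]"))  # None: a literal "X" plus an escaped backslash compiles fine
```

The second pattern shows the explicit alternative: if a literal "X" or backslash is wanted inside the set, it can be spelled out directly.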

I was originally thinking about the possibility of combining positive and 
negative character classes, where e.g. \X would serve as a kind of base. I am 
not aware of any re engine supporting this, but I eventually found the Unicode 
guidelines for regular expressions (UTS #18), which also cover this:

http://unicode.org/reports/tr18/#Subtraction_and_Intersection
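To illustrate what set subtraction and intersection of character classes mean, here is a minimal sketch that models a class as a predicate on a single character. It is not how any regex engine implements this, and the helpers category, ascii_only, intersect and subtract are hypothetical names chosen for the example; only unicodedata.category is a real standard library call.

```python
import unicodedata

def category(prefix):
    # Hypothetical helper: class of characters whose general category starts with prefix
    return lambda ch: unicodedata.category(ch).startswith(prefix)

def ascii_only(ch):
    # Class of ASCII characters (U+0000..U+007F)
    return ord(ch) < 0x80

def intersect(a, b):
    # Character must belong to both classes
    return lambda ch: a(ch) and b(ch)

def subtract(a, b):
    # Character must belong to a but not to b
    return lambda ch: a(ch) and not b(ch)

letters = category("L")
non_ascii_letters = subtract(letters, ascii_only)  # letters minus ASCII
ascii_letters = intersect(letters, ascii_only)     # letters that are also ASCII

text = "na\u00efve caf\u00e9"
print([ch for ch in text if non_ascii_letters(ch)])
print("".join(ch for ch in text if ascii_letters(ch)))
```

Composing predicates keeps each operation trivially correct; a real engine would instead precompute the resulting code point ranges.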

It is also a bit surprising that all of this (even arbitrary unions, 
intersections, differences ...) is included in Level 1, Basic Unicode Support. 
This suggests that, according to this guideline, there is probably no fully 
conforming implementation available (AFAIK), even at this basic level.

Among other features at this level, the section
http://unicode.org/reports/tr18/#Supplementary_Characters
seems useful, especially the handling of characters beyond \uffff, including 
treating surrogate pairs as single characters.
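Treating a surrogate pair as a single character comes down to the standard UTF-16 decoding arithmetic, code point = 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00). A minimal sketch over a list of 16-bit code units follows; combine_surrogates is a hypothetical helper, not part of the regex module or CPython.

```python
def combine_surrogates(units):
    """Yield code points from UTF-16 code units (ints), joining each
    high/low surrogate pair into one supplementary code point."""
    i, n = 0, len(units)
    while i < n:
        u = units[i]
        if 0xD800 <= u <= 0xDBFF and i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
            low = units[i + 1]
            yield 0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00)
            i += 2
        else:
            # BMP character or an unpaired surrogate: passed through unchanged
            yield u
            i += 1

# "A", U+10400 (stored as the pair D801 DC00), "B"
units = [0x0041, 0xD801, 0xDC00, 0x0042]
print([hex(cp) for cp in combine_surrogates(units)])  # ['0x41', '0x10400', '0x42']
```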

This might be useful on narrow Python builds, but there could be an 
incompatibility with how "narrow" Python itself handles this data.
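The narrow/wide difference can be checked at runtime. On a narrow CPython build (before 3.3, compiled with 16-bit Unicode) sys.maxunicode is 0xFFFF and a supplementary character is stored as a surrogate pair, so len() reports 2; on wide builds and on every CPython 3.3+ (PEP 393) it is 0x10FFFF and len() reports 1. A short sketch:

```python
import sys

ch = "\U00010400"  # a supplementary character (outside the BMP)

# Narrow builds cap code units at U+FFFF and split ch into two surrogates
narrow = sys.maxunicode == 0xFFFF
print(narrow, len(ch))
```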

These are just some suggestions, or rather remarks, since you have already 
implemented many advanced features and are also considering some different 
approaches... :-)

vbr

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue2636>
_______________________________________