[issue10703] Regex 0.1.20101210

2010-12-15 Thread Steve Moran

Steve Moran s...@uw.edu added the comment:

(Forehead slap.)

On Tue, 14 Dec 2010, Matthew Barnett wrote:

> Matthew Barnett pyt...@mrabarnett.plus.com added the comment:
>
> The regex module is intended to replace the re module, so its default
> behaviour is the same: in Python 2, regexes default to matching ASCII, and in
> Python 3, they default to matching Unicode.
>
> If you want to use a regex on a Unicode string in Python 2 then you need to
> set the Unicode flag, either by providing the UNICODE flag or by putting
> (?u) in the regex itself.


--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue10703
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue10703] Regex 0.1.20101210

2010-12-14 Thread Steve Moran

New submission from Steve Moran s...@uw.edu:

The regex package doesn't seem to correctly implement the single grapheme match 
\X (\P{M}\p{M}*) for pre-Python 3. I'm using the string "íi-te" ("i", U+0301, 
"i", "-", "t", "e" -- where U+0301 is Unicode COMBINING ACUTE ACCENT), reading it in 
from a file to bypass Unicode copy-and-paste issues in the older IDLEs.


s...@x$ python3.1
Python 3.1.2 (r312:79147, May 19 2010, 11:50:28) 
[GCC 4.1.2 20080704 (Red Hat 4.1.2-46)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import regex
>>> file = open("test_data", "rt", encoding="utf-8")
>>> g = regex.compile("\X")
>>> s = file.readline()
>>> print (s)
íi-te
>>> print (g.findall(s))
['í', 'i', '-', 't', 'e']

* Correct in 3.1 - i+U+0301 considered one grapheme.

s...@x$ python2.7
Python 2.7 (r27:82500, Oct  4 2010, 14:49:53) 
[GCC 4.1.2 20080704 (Red Hat 4.1.2-48)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import codecs
>>> import regex
>>> file = codecs.open("test_data", "r", "utf-8")
>>> g = regex.compile("\X")
>>> s = file.readline()
>>> s
u'i\u0301i-te'
>>> print s.encode("utf-8")
íi-te
>>> print g.findall(s)
[u'i', u'\u0301', u'i', u'-', u't', u'e']

* Not correct -- the accent is treated as a separate character.
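
For reference, the grouping that \X (read as \P{M}\p{M}*) is expected to produce can be approximated with the standard library alone. This is only a minimal sketch for Python 3, not how the regex module implements \X. The helper name grapheme_clusters is made up for illustration, and it handles combining marks only (general categories Mn, Mc, Me), not the full Unicode grapheme cluster rules.

```python
import unicodedata


def grapheme_clusters(text):
    # Hypothetical helper: group each base character with the combining
    # marks that follow it, roughly what the pattern \P{M}\p{M}* describes.
    clusters = []
    for ch in text:
        # General categories starting with "M" (Mn, Mc, Me) make up \p{M}.
        is_mark = unicodedata.category(ch).startswith("M")
        if is_mark and clusters:
            # Attach the mark to the cluster that is currently open.
            clusters[-1] += ch
        else:
            # A non-mark starts a new cluster. A mark with no preceding
            # base also lands here and becomes a cluster of its own, which
            # is a simplification compared to the pattern.
            clusters.append(ch)
    return clusters


# The string from the report: i, U+0301, i, -, t, e
sample = "i\u0301i-te"
clusters = grapheme_clusters(sample)
print(len(sample), len(clusters))
```

With the report's string, the six code points should collapse into five clusters, with U+0301 attached to the first "i" rather than standing alone.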

Thanks.

--
components: Regular Expressions
messages: 123961
nosy: stiv
priority: normal
severity: normal
status: open
title: Regex 0.1.20101210
type: behavior
versions: Python 2.7




[issue10703] Regex 0.1.20101210

2010-12-14 Thread Alexander Belopolsky

Alexander Belopolsky belopol...@users.sourceforge.net added the comment:

Regex 0.1.20101210 is not part of the standard Python distribution, so this bug 
report is invalid.

--
nosy: +belopolsky
resolution:  -> invalid
status: open -> closed
superseder:  -> Regexp 2.7 (modifications to current re 2.2.2)




[issue10703] Regex 0.1.20101210

2010-12-14 Thread R. David Murray

Changes by R. David Murray rdmur...@bitdance.com:


--
assignee:  -> mark.dickinson
nosy: +mark.dickinson, mrabarnett




[issue10703] Regex 0.1.20101210

2010-12-14 Thread Matthew Barnett

Matthew Barnett pyt...@mrabarnett.plus.com added the comment:

The regex module is intended to replace the re module, so its default behaviour 
is the same: in Python 2, regexes default to matching ASCII, and in Python 3, 
they default to matching Unicode.

If you want to use a regex on a Unicode string in Python 2 then you need to set 
the Unicode flag, either by providing the UNICODE flag or by putting (?u) in 
the regex itself.
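
The two ways of setting a flag described above (a flag argument or an inline group) can be sketched with the standard re module. This is an illustrative Python 3 example, not the Python 2 session from the report. On Python 3, str patterns already match Unicode, so re.ASCII stands in here for the Python 2 default, and the sample text is invented.

```python
import re

# Invented sample text containing two non-ASCII letters.
text = "na\u00efve caf\u00e9"

# Python 3 default for str patterns: \w matches Unicode word characters.
unicode_default = re.findall(r"\w+", text)

# ASCII-only matching, comparable to the Python 2 default, set two ways:
# as a flag argument and as an inline flag at the start of the pattern.
ascii_flag = re.findall(r"\w+", text, re.ASCII)
ascii_inline = re.findall(r"(?a)\w+", text)

# The Unicode flag, again set two ways. On Python 3 this is redundant for
# str patterns, but it mirrors the UNICODE / (?u) advice given for Python 2.
unicode_flag = re.findall(r"\w+", text, re.UNICODE)
unicode_inline = re.findall(r"(?u)\w+", text)

print(ascii_flag == ascii_inline, unicode_flag == unicode_inline)
```

Under ASCII matching the accented letters stop being word characters, so the words split around them. Under Unicode matching each word comes back whole.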

--




[issue10703] Regex 0.1.20101210

2010-12-14 Thread R. David Murray

Changes by R. David Murray rdmur...@bitdance.com:


--
assignee: mark.dickinson -> 
nosy:  -mark.dickinson
