John Machin wrote:

John, nothing I wrote was directed at you. If you feel insulted, you have my apology. My intention was and is to get movement on an issue that was reported 20 months ago and has lain dead since, until it was re-reported (a bit more clearly) a week ago. It stalled because of a misunderstanding by the person who (I believe) rewrote re for unicode several years ago.

Like this:

| >>> w1 = u"L\N{LATIN SMALL LETTER O WITH DIAERESIS}wis"
| >>> w2 = u"Lo\N{COMBINING DIAERESIS}wis"
| >>> w1
| u'L\xf6wis'
| >>> w2
| u'Lo\u0308wis'
| >>> import unicodedata as ucd
| >>> ucd.category(u'\u0308')
| 'Mn'
| >>> u'\u0308'.isalpha()
| False
| >>> import re
| >>> regex = re.compile(ur'\w+', re.UNICODE)
| >>> regex.match(w1).group(0)
| u'L\xf6wis'
| >>> regex.match(w2).group(0)
| u'Lo'

Yes, thank you.  FWIW, that confirms my suspicion.
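
For anyone who needs something that works before re itself changes, here is a minimal sketch of one possible workaround: compose the text to NFC before matching, so that o + COMBINING DIAERESIS becomes the single precomposed letter that \w already accepts. It is written in Python 3 syntax, and the name first_word is hypothetical, made up for this illustration. It only helps when a precomposed form exists; a base letter plus a mark with no composed equivalent would still be split.

```python
import re
import unicodedata

# In Python 3, str patterns are Unicode-aware by default.
WORD = re.compile(r"\w+")


def first_word(text):
    # Hypothetical helper: compose to NFC first, then match a word at the start.
    composed = unicodedata.normalize("NFC", text)
    match = WORD.match(composed)
    return match.group(0) if match else None


w1 = "L\N{LATIN SMALL LETTER O WITH DIAERESIS}wis"
w2 = "Lo\N{COMBINING DIAERESIS}wis"

# Both spellings should now yield the same, complete word.
print(ascii(first_word(w1)))
print(ascii(first_word(w2)))
```

This sidesteps the problem rather than fixing it: the combining mark is still not treated as a word character, it is simply folded into its base letter before the regex sees it.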

Terry

--
http://mail.python.org/mailman/listinfo/python-list
