MRAB wrote:

> Should the Mc and Mn codepoints match \w in the re module even though u'हिन्दी'.isalpha() returns False (in Python 2.x, haven't tried Python 3.x)?

Same. And to me, that is wrong. The condensation of vowel characters (which Hindi, etc., also have for words that begin with vowels) to 'vowel marks' attached to the previous consonant does not change their nature as indications of speech sounds. The difference is purely graphical.


> Issue 1693050 said no.
The full URL
http://bugs.python.org/issue1693050
would have been nice, but thank you for finding this. I searched, but obviously not with the right words. In any case, this issue is still open. MAL is wrong about at least Mc and Mn. I will explain there also.

> Perhaps someone with knowledge of Hindi could suggest how Python should handle it.

Recognize that vowel marks are parts of words, as it already does for identifiers.
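
For anyone who wants to check this at the prompt, here is a minimal sketch, assuming Python 3 (where identifiers may contain non-ASCII characters). The variable names are mine and purely illustrative. It lists the Unicode category of each code point in the word and compares what .isalpha(), .isidentifier() and \w make of it. I have not written in an expected result for \w, since that is exactly the behavior in question.

```python
import re
import unicodedata

# "Hindi" in Devanagari, spelled with escapes so each code point is explicit.
word = "\u0939\u093f\u0928\u094d\u0926\u0940"

# General category per code point: Lo = letter, Mc/Mn = combining marks.
categories = [unicodedata.category(ch) for ch in word]

# The str method under discussion; it rejects the Mc/Mn code points.
is_alpha = word.isalpha()

# Identifier rules accept Mc/Mn after the first character.
is_identifier = word.isidentifier()

# How \w splits the word depends on how the re module treats Mc/Mn.
w_matches = re.findall(r"\w+", word)

print(categories)
print(is_alpha, is_identifier)
print([ascii(m) for m in w_matches])
```

If \w treated the marks as parts of words, the last line would show a single match covering all six code points.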

> I wouldn't want the re module to say one thing and the rest of the language to say another! :-)

I will add a note about .isalpha.

Terry Jan Reedy

--
http://mail.python.org/mailman/listinfo/python-list
