Re: Unicode regex and Hindi language

Terry Reedy Fri, 28 Nov 2008 13:16:06 -0800

MRAB wrote:

Should the Mc and Mn codepoints match \w in the re module even though u'हिन्दी'.isalpha() returns False (in Python 2.x, haven't tried Python 3.x)?

Same. And to me, that is wrong. The condensation of vowel characters (which Hindi, etc., also have for words that begin with vowels) to 'vowel marks' attached to the previous consonant does not change their nature as indications of speech sounds. The difference is purely graphical.
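
As an illustration of the point above, here is a minimal sketch that lists the Unicode general category of each code point in the word and checks isalpha(). It is written in Python 3 syntax, which is an assumption on my part since the report above was for Python 2.x; the variable names are only for illustration.

```python
import unicodedata

# The Hindi word from the question, spelled out as escapes:
# HA, VOWEL SIGN I, NA, VIRAMA, DA, VOWEL SIGN II
word = "\u0939\u093f\u0928\u094d\u0926\u0940"

# General category of each code point (Lo = letter, Mc/Mn = combining marks)
categories = [(f"U+{ord(ch):04X}", unicodedata.category(ch)) for ch in word]
for code_point, category in categories:
    print(code_point, category)

# isalpha() evaluated per code point, then for the whole word
alpha_flags = [ch.isalpha() for ch in word]
print(alpha_flags)
print(word.isalpha())
```

The vowel signs and the virama fall in the Mc and Mn categories, so isalpha() is False for them and therefore for the word as a whole.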

Issue 1693050 said no.

The full URL
http://bugs.python.org/issue1693050

would have been nice, but thank you for finding this. I searched, but obviously not with the right word. In any case, this issue is still open. MAL is wrong about at least Mc and Mn. I will explain there also.


> Perhaps someone with knowledge of Hindi could suggest how Python should handle it.


Recognize that vowel marks are parts of words, as it already does for identifiers.
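
A rough sketch of the comparison with identifiers, assuming Python 3, where str.isidentifier() is available and identifiers may contain non-ASCII characters (neither applies to Python 2.x). The output of the \w search is printed rather than stated here, since it depends on how the re module in a given version treats Mc and Mn.

```python
import re

# Same Hindi word as in the question, written with escapes
word = "\u0939\u093f\u0928\u094d\u0926\u0940"

# Identifier rules accept combining marks after the first character
is_identifier = word.isidentifier()
print(is_identifier)

# What \w+ yields for the same string; compare with the line above
tokens = re.findall(r"\w+", word)
print(tokens)
```

If the two lines disagree, with the word accepted as one identifier but split into several \w+ matches, that is the inconsistency under discussion.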

I wouldn't want the re module to say one thing and the rest of the language to say another! :-)


I will add a note about .isalpha.

Terry Jan Reedy

--
http://mail.python.org/mailman/listinfo/python-list
