Terry J. Reedy <[EMAIL PROTECTED]> added the comment:

Vowel 'marks' are condensed vowel characters; they are very much part of words and do not separate them. Python 3 properly includes Mn and Mc as identifier characters.
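
As a rough illustration of the identifier point, here is a minimal sketch that assumes the Devanagari word 'hindi' as sample input. The helper name `categories` is made up for this example; `unicodedata.category`, `unicodedata.name` and `str.isidentifier` are standard library calls in Python 3.

```python
import unicodedata

# Sample word (an assumption for illustration): Devanagari 'hindi',
# written with escapes so the code point order is unambiguous.
word = "\u0939\u093f\u0928\u094d\u0926\u0940"


def categories(text):
    """Return (code point, Unicode name, general category) per character."""
    return [
        ("U+%04X" % ord(ch), unicodedata.name(ch), unicodedata.category(ch))
        for ch in text
    ]


for code_point, name, category in categories(word):
    print(code_point, name, category)

# The letters are Lo and the marks are Mc or Mn; Python 3 allows all of
# them in an identifier, so the whole word qualifies.
print(word.isidentifier())
```

The marks are only allowed as continuation characters, so a string that begins with one would presumably not pass `isidentifier()`.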
http://docs.python.org/dev/3.0/reference/lexical_analysis.html#identifiers-and-keywords

For instance, the word 'hindi' has 3 consonants 'h', 'n', 'd', 2 vowels 'i' and 'ii' (long i) following 'h' and 'd', and a null vowel (virama) after 'n'. [The null vowel is needed because the absence of a vowel mark indicates the default vowel, short a. So without it, the word would be hinadii.] The difference between the Devanagari vowel characters, used at the beginning of words, and the vowel marks, used thereafter, is purely graphical and not phonological.

In short, in the Sanskrit family,

    word     = syllable+
    syllable = vowel | consonant + vowel mark

From a comp.lang.python (clp) post asking why re does not see 'hindi' as a word:

हिन्दी
ह DEVANAGARI LETTER HA (Lo)
ि DEVANAGARI VOWEL SIGN I (Mc)
न DEVANAGARI LETTER NA (Lo)
् DEVANAGARI SIGN VIRAMA (Mn)
द DEVANAGARI LETTER DA (Lo)
ी DEVANAGARI VOWEL SIGN II (Mc)

.isalpha() and possibly other unicode methods need fixing also:

>>> 'हिन्दी'.isalpha()  # 2.x and 3.0
False

----------
nosy: +tjreedy
versions: +Python 3.1

_______________________________________
Python tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue1693050>
_______________________________________