On 4/18/07, Jim Jewett wrote:
> On 4/18/07, Guido van Rossum wrote:
> > On 4/18/07, Jim Jewett wrote:
>
> > > Today, string.letters works most easily with ASCII supersets, and is
> > > effectively limited to 8-bit encodings. Once everything is unicode, I
> > > don't think that 8-bit restriction should apply any more.
>
> > But we already went over this. There are over 40K letters in Unicode.
> > It simply makes no sense to have a string.letters approaching that
> > size.
>
> Agreed. But there aren't 40K (alphabetic) letters in any particular
> locale. Most individual languages will have less than 100.

Isn't that excluding the written language of half the world population
(at least China, Korea and Japan)?

> As a proxy for measuring "local" characters, I'll note that during
> some optimization drives for Pango (e.g.,
> http://primates.ximian.com/~federico/news-2005-11.html#04 ) it turned
> out that there were only two non C-J-K languages that needed more than
> 256 cache positions in their character glyph tables.

But here we're talking features, not optimizations. I really don't
think it's a good idea to propose a feature that can't be used
reasonably for CJK languages.

> > > Unless I missed it (and I may have), unicode itself sort of ducks the
> > > question about how to sort strings. Python really needs to provide
> > > *an* answer, but I'm not sure it is possible to provide the (single)
> > > correct answer.
>
> > The Unicode standard certainly has a solution, but it is complicated
> > and I don't believe it is currently implemented in core Python.
>
> I guess you're right; I saw too many alternatives the last time I
> looked, and must have stopped reading http://unicode.org/reports/tr10/
> after section 1, where it becomes obvious that there is no
> context-free right answer.
>
> > > string.letters is one workaround, and I don't think we should remove
> > > it until a better solution (or workaround) is available.
>
> > I disagree. The correct solution is to implement the Unicode support
> > for locale-specific sorting.
>
> And set-inclusion.

For set-inclusion we already have isalpha() etc. That should be
enough. I really don't see much of a use case for inquiries of the
type "is this a letter in my locale" -- by the time you are doing
that, you probably are only thinking of one specific locale, and then
you should just reject non-locale characters altogether rather than
treating them as punctuation.

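To make the isalpha() point concrete, here is a rough sketch in present-day
Python 3 (not code from this thread). The helper names is_letter and
reject_foreign, and the sample alphabet, are made up for illustration. It
shows that str.isalpha() and the Unicode general category answer "is this a
letter" without any table of letters, CJK included, and that an application
tied to one locale can simply carry its own small set and reject everything
else.

```python
import unicodedata


def is_letter(ch):
    """Return True if the single character ch has a Unicode letter category (L*)."""
    return unicodedata.category(ch).startswith("L")


def reject_foreign(text, allowed):
    """Hypothetical helper: raise ValueError if text uses characters outside allowed."""
    bad = sorted({ch for ch in text if ch not in allowed})
    if bad:
        raise ValueError("characters not accepted: %r" % bad)
    return text


# str.isalpha() needs no table of letters, so CJK text is handled as well.
for ch in ["a", "é", "中", "한", "あ", "1", "!"]:
    print(repr(ch), ch.isalpha(), is_letter(ch))

# An application tied to one locale can carry its own small alphabet instead.
ASSUMED_ALPHABET = set("abcdefghijklmnopqrstuvwxyzåäö")  # illustrative only
print(reject_foreign("smörgås", ASSUMED_ALPHABET))
```

Neither approach needs a module-level constant listing every letter, which is
the part that stops scaling once strings are Unicode.
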
> I'm not convinced that waiting for such a heavyweight solution is
> really the best choice, particularly since the spec itself warns
> against using the strictest forms (too inefficient).
>
> > Remember that the locale module supports only a single, global locale
> > at a time. This renders it totally useless in many apps requiring
> > locale support (such as web servers).
>
> Fair enough.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)
_______________________________________________
Python-3000 mailing list
http://mail.python.org/mailman/listinfo/python-3000
