> If string-ref also required O(1) time complexity, then you'd be right.
> But it doesn't; it's perfectly fine to implement string-ref on top of
> underlying UTF-8 or UTF-16 character sequences; you just have to settle
> for O(N) performance.
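
To make the trade-off in the quoted suggestion concrete, here is a minimal sketch of the two kinds of indexing over a UTF-8 byte sequence. It is in Python purely for illustration; the function names are hypothetical and are not taken from any Scheme implementation or from Python's own internals. The code-point accessor has to scan from the start of the string, which is where the O(N) cost comes from, while the code-unit accessor is a plain O(1) array lookup.

```python
def utf8_code_point_ref(data: bytes, index: int) -> str:
    """Return the code point at position `index`, scanning from the start: O(N)."""
    if index < 0:
        raise IndexError("negative index")
    pos = 0    # current byte offset
    seen = 0   # number of code points skipped so far
    while pos < len(data):
        lead = data[pos]
        # The lead byte tells us how many bytes this code point occupies.
        if lead < 0x80:
            width = 1
        elif lead >> 5 == 0b110:
            width = 2
        elif lead >> 4 == 0b1110:
            width = 3
        elif lead >> 3 == 0b11110:
            width = 4
        else:
            raise ValueError("invalid UTF-8 lead byte")
        if seen == index:
            return data[pos:pos + width].decode("utf-8")
        pos += width
        seen += 1
    raise IndexError("code point index out of range")


def utf8_code_point_length(data: bytes) -> int:
    """Count code points by counting the bytes that are not continuation bytes: O(N)."""
    return sum(1 for b in data if b & 0xC0 != 0x80)


def utf8_code_unit_ref(data: bytes, index: int) -> int:
    """Return the code unit (byte) at `index`: O(1)."""
    return data[index]


if __name__ == "__main__":
    s = "a\u00e9\u20ac\U0001D11Ez".encode("utf-8")
    print(len(s), utf8_code_point_length(s))
    print(repr(utf8_code_point_ref(s, 3)), hex(utf8_code_unit_ref(s, 1)))
```

In this sketch the same string has two different lengths depending on which unit is counted, which is why the meaning of string-length has to change along with the meaning of the index.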
Are you suggesting that indexes represent code points rather than code
units? I haven't seen anyone do that, not as the one-and-only interface
to the elements of a string. Have you? And do you think UTF-8/UTF-16
implementations should be *required* to do that? (Obviously, string-length
would then have to return the number of code points rather than the
number of code units.)

It's interesting that you bring up the point about O(1) complexity. That
is, of course, the assumption people currently make. If the assumption
isn't justified, that should probably be stated formally, since it affects
the way people write string-processing algorithms.

Note: Perhaps a solution is to have two variants of the procedures, one
for code points and one for code units. The code-unit variants would
guarantee O(1) access and the code-point variants wouldn't.

> Python has been suffering through that for several years now, and has
> decided to break backward compatibility and abandon the 8-bit strings --
> but using the 8-bit names for Unicode strings. I don't know what the
> internal implementation is.

John, I can't find any support for that, at least not among the developer
mailing list summaries at http://www.python.org/dev/summary/ nor among the
Python Enhancement Proposals (PEPs) at http://www.python.org/dev/peps/.
Here are the Unicode-related ones that I *could* find:

  Python Unicode Integration [Final]
  http://www.python.org/dev/peps/pep-0100/

  Support for "wide" Unicode characters [Final]
  http://www.python.org/dev/peps/pep-0261/

  Unicode file name support for Windows NT [Final]
  http://www.python.org/dev/peps/pep-0277/

  Byte vectors and String/Unicode Unification [Rejected]
  http://www.python.org/dev/peps/pep-0332/

  Allow str() to return unicode strings [Deferred]
  http://www.python.org/dev/peps/pep-0349/

Note that the last PEP, dated August 2005, references Python 2.5 and is
deferred.
Here is what the Rationale text says:

  Python has had a Unicode string type for some time now but use of
  it is not yet widespread. There is a large amount of Python code
  that assumes that string data is represented as str instances.
  The long term plan for Python is to phase out the str type and use
  unicode for all string data. Clearly, a smooth migration path must
  be provided.

The PEP is old, but it's still deferred.

_______________________________________________
r6rs-discuss mailing list
[email protected]
http://lists.r6rs.org/cgi-bin/mailman/listinfo/r6rs-discuss
