Re: [r6rs-discuss] perhaps i should be formal, but....

MichaelL Thu, 15 Mar 2007 07:14:06 -0800

> If string-ref also required O(1) time complexity, then you'd be right.
> But it doesn't; it's perfectly fine to implement string-ref on top of
> underlying UTF-8 or UTF-16 character sequences; you just have to settle
> for O(N) performance.


Are you suggesting that indexes represent code points rather than code 
units? I haven't seen anyone do that, not as the one-and-only interface to 
elements of a string. Have you? And do you think UTF-8/UTF-16 
implementations should be *required* to do that? (Obviously, then, 
string-length would have to return the number of code points rather than 
the number of code units.)
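
To make the distinction concrete, here is a minimal sketch in present-day Python 3, whose str type happens to index by code point (an assumption about the tool used for illustration, not something established in this thread). The sample string is hypothetical. It shows that a "length" depends on whether you count code points, UTF-16 code units, or UTF-8 code units:

```python
# Hypothetical sample: "a" followed by U+1D11E (MUSICAL SYMBOL G CLEF),
# a character outside the Basic Multilingual Plane.
s = "a\U0001D11E"

# Python 3 str counts code points.
code_points = len(s)

# UTF-16 uses 2-byte code units; U+1D11E needs a surrogate pair (2 units).
utf16_units = len(s.encode("utf-16-le")) // 2

# UTF-8 uses 1-byte code units; U+1D11E needs 4 of them.
utf8_units = len(s.encode("utf-8"))

print(code_points, utf16_units, utf8_units)
```

A string-length defined over code points and one defined over code units would disagree on any string like this one, which is why the choice has to be made explicit.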

It's interesting that you bring up the point about O(1) complexity. That 
is, of course, the assumption people currently make. If that assumption 
isn't justified, the report should probably say so explicitly, since it 
affects the way people write string-processing algorithms.

Note: Perhaps a solution is to have two variants of the procedures, one 
for code points and one for code units. The code-unit variants would 
guarantee O(1) access and the code-point variants wouldn't.
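
As a rough sketch of what the two variants could look like, assuming a string stored as a sequence of UTF-16 code units: the names utf16_units, unit_ref, and code_point_ref below are hypothetical, not proposed R6RS procedures, and Python is used only for illustration.

```python
def utf16_units(s):
    """Return the UTF-16 code units of s as a list of integers."""
    data = s.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]


def unit_ref(units, k):
    """Code-unit variant: O(1), returns the k-th code unit as is."""
    return units[k]


def code_point_ref(units, k):
    """Code-point variant: O(N), scans from the start and treats each
    well-formed surrogate pair as a single code point."""
    i = 0
    while i < len(units):
        u = units[i]
        is_pair = (0xD800 <= u <= 0xDBFF
                   and i + 1 < len(units)
                   and 0xDC00 <= units[i + 1] <= 0xDFFF)
        if k == 0:
            if is_pair:
                # Combine high and low surrogates into one code point.
                return 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            return u
        i += 2 if is_pair else 1
        k -= 1
    raise IndexError("code point index out of range")


units = utf16_units("a\U0001D11Eb")
print(hex(unit_ref(units, 1)), hex(code_point_ref(units, 1)))
```

With this layout, index 1 names a lone high surrogate in the code-unit variant but the whole character U+1D11E in the code-point variant, and only the former can promise constant-time access.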

> Python has been suffering through that for several years now, and has
> decided to break backward compatibility and abandon the 8-bit strings --
> but using the 8-bit names for Unicode strings.  I don't know what the
> internal implementation is.

John, I can't find any support for that, either in the developer 
mailing list summaries at http://www.python.org/dev/summary/ or in the 
Python Enhancement Proposals (PEPs) at http://www.python.org/dev/peps/. 
Here are the Unicode-related ones that I *could* find:

        Python Unicode Integration [Final]
        http://www.python.org/dev/peps/pep-0100/

        Support for "wide" Unicode characters [Final]
        http://www.python.org/dev/peps/pep-0261/

        Unicode file name support for Windows NT [Final]
        http://www.python.org/dev/peps/pep-0277/

        Byte vectors and String/Unicode Unification [Rejected]
        http://www.python.org/dev/peps/pep-0332/

        Allow str() to return unicode strings [Deferred]
        http://www.python.org/dev/peps/pep-0349/

Note that the last PEP, dated August 2005, references Python 2.5 and is 
deferred. Here is what the Rationale text says:

        Python has had a Unicode string type for some time now but use of
        it is not yet widespread.  There is a large amount of Python code
        that assumes that string data is represented as str instances.
        The long term plan for Python is to phase out the str type and use
        unicode for all string data.  Clearly, a smooth migration path
        must be provided.

The PEP is old, but it's still deferred.

