Jason Orendorff wrote:
> And most (but not all) Unicode string implementations use UTF-16.
> Among languages and libraries that are very widely used, the majority
> is overwhelming: Java, Microsoft's CLR, Python, JavaScript, Qt,
> Xerces-C, and on and on.

(...and Windows and Mac and IBM's ICU and PHP 6 and...)

> Higher-level APIs are a fine approach.
>
> The other solution is to standardize the implementation, so that the
> efficient algorithms don't differ. I want to push this seriously one
> last time: Unicode strings have been kicked around for a while now,
> and despite Will's link, real-world implementations do not vary much.
> I don't think it's premature to standardize.

I started looking into these issues a while ago when we were faced with
internationalizing an app. (The app runs on several platforms and under
several web servers.) Before learning about what's out there, I would
have wanted to keep my options open; knowing what I know now, I agree
with Jason. It would make sense to standardize on UTF-16 strings and
UTF-32 characters. (Note, btw, that this doesn't preclude UTF-8 strings.
It just means that the built-in string type would be UTF-16.)

On a different note, I find this desire to shield programmers from code
units odd and senseless. If R6RS intends Scheme to be a higher-level
language that abstracts away representation issues, why is it adding
fixnums and flonums? Why do bytevectors have operations that get and set
singles and doubles?

_______________________________________________
r6rs-discuss mailing list
[email protected]
http://lists.r6rs.org/cgi-bin/mailman/listinfo/r6rs-discuss
