Re: [r6rs-discuss] Strings

MichaelL Mon, 26 Mar 2007 07:31:07 -0800

> > "Important: Supplementary code points must be supported for full Unicode
> > support, regardless of the encoding form.
> 
> That's the theory. But UTF-16 is strictly less convenient than UTF-32,
> which means that a lot of code working in terms of UTF-16 doesn't bother
> to support supplementary code points.


From Wikipedia:

"Unfortunately using UTF-16 makes characters outside the Basic 
Multilingual Plane a special case which increases the risk of oversights 
related to their handling. That said, programs that mishandle surrogate 
pairs probably also have problems with combining sequences, so using 
UTF-32 is unlikely to solve the more general problem of poor handling of 
multi-code-unit characters."
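
To make that concrete, here is a rough Python sketch (the helper names and the two sample strings are my own choices for illustration, not anything from the quoted article). It counts code units for a character outside the BMP and for a base letter followed by a combining accent:

```python
import unicodedata

def utf16_units(s):
    # Number of 16-bit code units needed to encode s (no BOM).
    return len(s.encode("utf-16-le")) // 2

def utf32_units(s):
    # Number of 32-bit code units needed to encode s (no BOM).
    return len(s.encode("utf-32-le")) // 4

# Hypothetical samples chosen for illustration.
clef = "\U0001D11E"    # MUSICAL SYMBOL G CLEF, outside the BMP
e_acute = "e\u0301"    # "e" followed by COMBINING ACUTE ACCENT

for name, s in (("clef", clef), ("e_acute", e_acute)):
    print(name, "UTF-16 units:", utf16_units(s), "UTF-32 units:", utf32_units(s))

# Normalizing to NFC happens to collapse this particular sequence into one
# code point, but many combining sequences have no precomposed form.
composed = unicodedata.normalize("NFC", e_acute)
```

The clef needs a surrogate pair in UTF-16 but only one unit in UTF-32, while the accented "e" needs two units in both. Code that assumes one unit per user-visible character is therefore wrong in either encoding.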

> The only advantage of UTF-16 over UTF-32 is memory usage, and data
> exchange with those who already use UTF-16. *Nothing* in UTF-16 is more
> convenient or simpler than UTF-32, it's an additional complexity layer.

"The only advantage of fixnums over bignums is [performance and] memory 
usage, and data exchange with those who already use fixnums. *Nothing* in 
fixnums is more convenient or simpler than bignums, it's an additional 
complexity layer."

> > But I'll tell you what. Find a document, written by someone with
> > substantial Unicode experience, that recommends UTF-32 as the best overall
> > in-memory encoding.

I don't agree with everything you said but, more to the point, none of it 
related to the question I asked: can you find a single document written by 
a Unicode expert that recommends UTF-32? Every such document I can find 
recommends UTF-16 as the best overall encoding, with UTF-8 a second choice 
(based on expected usage). UTF-32 is always the third choice and it always 
has the caveat "if space doesn't matter."
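
For a sense of what "expected usage" and "if space doesn't matter" mean in bytes, here is a small Python sketch. The three sample strings are my own picks and the function name is made up for illustration; real text will mix scripts in whatever proportion the application sees.

```python
def encoded_sizes(s):
    # Byte length of s in each encoding form, without a byte-order mark.
    return {
        "utf-8": len(s.encode("utf-8")),
        "utf-16": len(s.encode("utf-16-le")),
        "utf-32": len(s.encode("utf-32-le")),
    }

# Hypothetical samples: ASCII text, BMP text outside ASCII, one non-BMP character.
samples = {
    "ascii": "hello",
    "japanese": "\u65e5\u672c\u8a9e",   # three CJK ideographs
    "non_bmp": "\U0001D11E",            # MUSICAL SYMBOL G CLEF
}

for name, text in samples.items():
    print(name, encoded_sizes(text))
```

For these samples UTF-8 is smallest for the ASCII string, UTF-16 is smallest for the CJK string, and UTF-32 is never smaller than the other two.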


