On Sat, 2007-12-01 at 16:16 +0100, Tinco Andringa wrote:
> Speaking of unicode/utf-16 and memory, couldn't a lot of
> memory be spared if strings were stored as utf8 internally,
> which would be converted back to utf-16 when more than 256
> different characters would be used?

It depends on what you're storing in the strings. If you're storing
mostly ASCII text, then yes, UTF-8 would require less memory: ASCII
characters take 1 byte in UTF-8 versus 2 in UTF-16. Accented Western
European letters take 2 bytes in either encoding, so that kind of text
only comes out smaller in UTF-8 because most of it is still ASCII.

If, on the other hand, you're storing Asian language text (Japanese,
Chinese, Korean), or anything else made up of characters >= U+0800
(i.e. more than 97% of all potential characters), then UTF-8 is a space
*loss*, not a gain: each such character requires at least 3 bytes in
UTF-8, while UTF-16 needs 2 for anything in the Basic Multilingual
Plane. Characters above U+FFFF take 4 bytes in both encodings.

See also:

  http://blogs.msdn.com/michkap/archive/2005/05/20/420317.aspx
  http://blogs.msdn.com/michkap/archive/2005/05/22/420822.aspx
  http://blogs.msdn.com/michkap/archive/2005/05/25/421828.aspx

Because of this, it is not uncommon for Linux apps to use UTF-16
internally, e.g. Mozilla, Qt, Python (which iirc has a configure-time
option to control use of UTF-16 vs. UTF-32 strings), etc.

 - Jon

_______________________________________________
Mono-devel-list mailing list
Mono-devel-list@lists.ximian.com
http://lists.ximian.com/mailman/listinfo/mono-devel-list
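
The size trade-off described in the reply above can be checked with a minimal sketch, assuming Python 3. The helper name `encoded_sizes` and the sample strings are hypothetical and chosen only for illustration; they are not from the original message. `utf-16-le` is used so that no byte order mark is counted.

```python
# Compare how many bytes the same text needs in UTF-8 and UTF-16.
# encoded_sizes and the samples below are illustrative assumptions.

def encoded_sizes(text):
    """Return (utf8_bytes, utf16_bytes) for text, excluding any BOM."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


samples = {
    "ASCII": "hello",
    # U+00E9, U+00E0, U+00FC: accented Latin letters below U+0800
    "Western European": "\u00e9\u00e0\u00fc",
    # U+65E5, U+672C, U+8A9E: three CJK characters at or above U+0800
    "Japanese": "\u65e5\u672c\u8a9e",
    # U+1F600: outside the BMP, so UTF-16 needs a surrogate pair
    "Supplementary": "\U0001F600",
}

for name, text in samples.items():
    utf8, utf16 = encoded_sizes(text)
    print(f"{name}: UTF-8 = {utf8} bytes, UTF-16 = {utf16} bytes")
```

Under these assumptions the ASCII sample is half the size in UTF-8, the accented Latin sample is the same size in both, the Japanese sample is 9 bytes in UTF-8 against 6 in UTF-16, and the supplementary character is 4 bytes either way.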