> > For a number of languages, the UTF-8 representation saves some
> > storage when compared with UTF-16, but for Asian characters UTF-8
> > requires 50% more storage than UTF-16.
>
> Yes, it does. And for English and German UTF-16 requires 100% more
> storage than UTF-8.

You can use SCSU to compress your data. It also works with short
strings (which is not true for generic compression algorithms like
LZW).

Technical Report #6 (http://www.unicode.org/unicode/reports/tr6/)
gives the following examples:

                            UTF-16        SCSU
  German:     9 chars      18 bytes ->    9 bytes
  Russian:    6 chars      12 bytes ->    7 bytes
  Japanese: 116 chars     232 bytes ->  178 bytes

    Werner
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/
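
The UTF-8 versus UTF-16 percentages quoted in this message can be checked with a few lines of Python. This is only a minimal sketch: the sample strings and the helper name `sizes` are picked here for illustration and are not taken from the thread. Python's standard library has no SCSU codec, so the SCSU column from the technical report is not reproduced.

```python
def sizes(text):
    """Return (characters, UTF-8 bytes, UTF-16 bytes) for text.

    "utf-16-le" is used so that no byte order mark is counted.
    """
    return (
        len(text),
        len(text.encode("utf-8")),
        len(text.encode("utf-16-le")),
    )


# Illustrative sample strings (assumptions, chosen only for this sketch).
samples = {
    "English": "storage",
    "German": "Öl fließt",
    "Russian": "Москва",
    "Japanese": "日本語のテキスト",
}

for name, text in samples.items():
    chars, utf8, utf16 = sizes(text)
    print(f"{name}: {chars} chars, UTF-8 {utf8} bytes, UTF-16 {utf16} bytes")
```

For these samples the pure-ASCII English string takes twice as many bytes in UTF-16 as in UTF-8, the Japanese string takes 50% more in UTF-8 (3 bytes per character against 2), and the Russian string is the same size in both. The German string falls in between, since only its two non-ASCII letters need 2 bytes in UTF-8.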
