Re: Proposal for 2 Byte Unicode implementation in gcc and glibc

Werner LEMBERG Fri, 04 Aug 2000 18:50:00 -0700


> > For a number of languages, the UTF-8 representation saves some
> > storage when compared with UTF-16, but for Asian characters UTF-8
> > requires 50% more storage than UTF-16.
> 
> Yes, it does. And for English and German UTF-16 requires 100% more
> storage than UTF-8.

You can use SCSU to compress your data.  It also works on short
strings (which is not true of generic compression algorithms like
LZW).  Unicode Technical Report #6
(http://www.unicode.org/unicode/reports/tr6/) gives the following
examples:

                          UTF-16               SCSU
  German:     9 chars   ( 18 bytes)   ->     9 bytes
  Russian:    6 chars   ( 12 bytes)   ->     7 bytes
  Japanese: 116 chars   (232 bytes)   ->   178 bytes


    Werner
-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/
