Re: [lazarus] UTF-8 vs UTF-16 support

Mattias Gaertner Fri, 05 Oct 2007 03:15:41 -0700

On Fri, 5 Oct 2007 10:45:18 +0200
ik <[EMAIL PROTECTED]> wrote:

> On 10/5/07, Mattias Gaertner <[EMAIL PROTECTED]> wrote:
> > On Fri, 05 Oct 2007 16:00:41 +0800
> > Paul Ishenin <[EMAIL PROTECTED]> wrote:
> >
> > > Graeme Geldenhuys wrote:
> > > > Does this mean UTF-8 was chosen only because it is more
> > > > compatible with existing Pascal programs? Any other reasons?
> > > >
> > >
> > > Does UTF-16 cover all languages? As far as I know, it has problems
> > > with Chinese and/or Japanese, while UTF-8 does not have such
> > > problems. Moreover, most software uses English as the default
> > > language, and UTF-8-encoded English words are byte-for-byte the
> > > same as plain ASCII English words.
> > >
> > > Btw, I don't know of any other advantages.
> >
> > UTF-8, UTF-16 and UTF-32 are just different encodings of the same
> > Unicode character set.
> >
> > UTF-16 is often confused with UCS-2, which indeed uses only 2-byte
> > characters and has the widestring advantage (length = number of
> > words). But this comes at the price that it does not support all
> > characters. That's why M$ switched from UCS-2 to UTF-16 while keeping
> > the W functions, which may be one of the main reasons for the
> > confusion.
> 
> As far as I know, the Unicode Consortium no longer supports UCS-2
> and recommends that implementations of that encoding use UTF-16
> instead.
> 
> Another issue is that I think UTF-8 may not include all of the
> symbols that some languages, such as Korean and Japanese, require,
> but I'm not sure.
> 
> I believe that all the encodings should be supported, and that the
> developers of the software should decide which one to use, rather
> than being "forced" to choose a specific encoding.


For compatibility, complexity and usability reasons, the LCL should use
only one encoding. For example, TControl.Caption is a string on all
platforms. There will be no CaptionW, CaptionA or CaptionUTF32,
because this would be more confusing than helpful. Of course,
FPC/Laz provides converter functions for those preferring widestring,
UTF-16 or UTF-32.
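To make the "same character set, different encodings" point concrete, here is a small illustration in Python rather than Free Pascal, chosen only because it is easy to run. It does not use the FPC/Lazarus converter functions, and the sample string is hypothetical; it contains one emoji outside the Basic Multilingual Plane. The sketch round-trips the text through UTF-8, UTF-16 and UTF-32, converts UTF-8 to UTF-16 by decoding and re-encoding, and counts UTF-16 code units against code points.

```python
# Illustrative sketch in Python, not FPC/Lazarus code. The sample text is
# hypothetical; it mixes ASCII, Latin-1, CJK and one character outside the
# Basic Multilingual Plane (the emoji).
text = "Grüße, 日本語 😀"

# UTF-8, UTF-16 and UTF-32 all encode the same Unicode character set, so
# every encoding round-trips the text losslessly; only the byte size differs.
sizes = {}
for enc in ("utf-8", "utf-16-le", "utf-32-le"):  # -le variants: no BOM
    data = text.encode(enc)
    assert data.decode(enc) == text
    sizes[enc] = len(data)

# Converting between encodings is decode-then-encode, the same kind of job
# a UTF-8 to UTF-16 converter function performs.
utf8_bytes = text.encode("utf-8")
utf16_bytes = utf8_bytes.decode("utf-8").encode("utf-16-le")
assert utf16_bytes == text.encode("utf-16-le")

# UTF-16 is not UCS-2: the emoji needs a surrogate pair (two 16-bit units),
# so "length = number of words" no longer equals the number of characters.
code_points = len(text)
utf16_units = sizes["utf-16-le"] // 2

for enc, n in sizes.items():
    print(f"{enc:10} {n:3} bytes")
print("code points:", code_points, "UTF-16 units:", utf16_units)
```

It should report 23, 26 and 48 bytes, and 13 UTF-16 units for 12 code points. The extra unit is the surrogate pair, which is exactly what UCS-2 cannot represent.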
The LCL consists of visual components, so the cost of converting the
strings is hardly measurable compared to the cost of drawing the Unicode
characters on the screen. OTOH, it can matter if you often traverse a
tree with ten thousand nodes.
Looking at the Lazarus code, UTF-8 was a good choice for the LCL
encoding, because the multibyte encoding only matters in SynEdit and
the LCL interfaces. With UTF-16, additional conversions would be needed
for all text file operations, including the codetools, which would slow
them down a lot.
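A rough sketch, again in Python and not taken from the Lazarus codetools, of why byte-oriented text processing gets along with UTF-8. ASCII text is unchanged, and because every byte of a multibyte UTF-8 sequence is 0x80 or higher, searching the raw bytes for an ASCII delimiter such as ';' can never split a character. The Pascal snippet below is a hypothetical sample.

```python
# Sketch in Python (not Lazarus codetools): why a byte-oriented tool can
# process UTF-8 source without decoding it. The source text is hypothetical.
source = "program Demo; begin WriteLn('Grüße 日本語'); end."
data = source.encode("utf-8")

# 1. Pure-ASCII text is byte-identical in ASCII and UTF-8.
assert "begin".encode("ascii") == "begin".encode("utf-8")

# 2. Every byte of a multibyte UTF-8 sequence is >= 0x80, so an ASCII
#    delimiter such as ';' can never appear inside a non-ASCII character.
assert all(b >= 0x80 for ch in "üß日本語" for b in ch.encode("utf-8"))

# Splitting the raw bytes on ';' therefore yields the same statements as
# splitting the decoded text, with no conversion step.
byte_statements = [part.decode("utf-8") for part in data.split(b";")]
assert byte_statements == source.split(";")

# 3. In UTF-16 even ASCII keywords change representation (an extra zero
#    byte per character), so a byte-level tool would have to convert first.
assert "begin".encode("utf-16-le") != "begin".encode("utf-8")

print(len(byte_statements), "statements")
```

The last assertion is the flip side of the argument: with UTF-16 as the working encoding, every file read from disk as ASCII/UTF-8 would need a conversion before such byte-level scanning could be reused.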


Mattias

_________________________________________________________________
     To unsubscribe: mail [EMAIL PROTECTED] with
                "unsubscribe" as the Subject
   archives at http://www.lazarus.freepascal.org/mailarchives
