Re: [lazarus] UTF-8 vs UTF-16 support

Luca Olivetti Mon, 08 Oct 2007 00:41:26 -0700

Marco Ciampa wrote:

> On Fri, Oct 05, 2007 at 01:14:23PM +0200, Luca Olivetti wrote:
>> [EMAIL PROTECTED] wrote:
>>> * WideString allows indexed "[]" access to individual chars.
>>> This does not seem to be correct. I read that a UTF-16 character can be 4 bytes long, so a calculation is sometimes needed...
>> Unless you're dealing with Klingon and ancient languages,
> Like Chinese? Just a billion people use it... not a real problem at all...
> :-\


I (wrongly) thought that Chinese was in the BMP :-(

>> I think you can assume that for 99.99% of currently spoken languages every
>> character will be exactly 2 bytes long.
> Wrong, as I said before.
>> There's a risk of having some characters with more than 2 bytes, but it is a small risk. With UTF-8 the risk is bigger, so you always have to traverse the string if you need access to a specific character index.
> You have to go through the string for both UTF-8 and UTF-16 encodings, so the advantages are questionable at best...

Yes, but my (wrong) premise was that you could assume all characters are 2 bytes wide, so the Nth character would be at byte N*2.
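A small sketch of why the N*2 shortcut breaks, written in Python purely to illustrate the encodings (it is not Lazarus/Free Pascal code, and the helper names are hypothetical). It uses U+20000, a character from the CJK Unified Ideographs Extension B block, which lies outside the BMP and therefore needs a surrogate pair (two 16-bit code units) in UTF-16. Finding the Nth character means walking the code units, just as with UTF-8:

```python
from typing import List

def utf16_units(s: str) -> List[int]:
    """Return the UTF-16 code units of s (little-endian, no BOM)."""
    data = s.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]

def is_high_surrogate(unit: int) -> bool:
    """High (leading) surrogates occupy U+D800..U+DBFF."""
    return 0xD800 <= unit <= 0xDBFF

def utf16_offset_of_char(units: List[int], n: int) -> int:
    """Walk the code units to find where the n-th (0-based) character starts.
    Assumes well-formed UTF-16: every high surrogate is followed by a low one."""
    i = 0
    for _ in range(n):
        i += 2 if is_high_surrogate(units[i]) else 1
    return i

# 'a', U+4E2D (inside the BMP), U+20000 (outside the BMP), 'b'
text = "a\u4e2d\U00020000b"
units = utf16_units(text)
print(len(text), len(units))           # 4 characters, 5 UTF-16 code units
print(utf16_offset_of_char(units, 3))  # 4, not the 3 that fixed 2-byte indexing assumes
```

With fixed 2-byte indexing, character 3 ('b') would be looked up at byte 6, which actually holds the second half of the surrogate pair. The same string takes 9 bytes in UTF-8 (1 + 3 + 4 + 1), so both encodings need a scan for true character indexing once non-BMP characters can appear.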


Bye
--
Luca Olivetti
Wetron Automatización S.A. http://www.wetron.es/
Tel. +34 93 5883004      Fax +34 93 5883007
