Graeme Geldenhuys wrote:
On 14/09/2011 17:02, Hans-Peter Diettrich wrote:
Many users still want simple string handling, with direct mapping
between logical and physical chars (SBCS). This is not possible at all
with UTF-8, while UTF-16 works fine with the BMP, at least.

What rubbish! The only "utf-8 limit" is that the current FPC and Delphi
RTLs don't cater for it due to the legacy ANSI support that came
before.

What data type would you use to store a UTF-8 character?
And how do you access the n-th character in a UTF-8 string?
...
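
As a rough illustration of the two questions above (not code from this thread): a UTF-8 encoded code point occupies one to four bytes, so no fixed-size type holds it, and finding the n-th one requires a scan from the start of the string. The sketch is in Python rather than Pascal, and the helper names utf8_char_len and utf8_nth_char are made up for this example. It treats "character" as "code point", assumes valid UTF-8 input, and ignores combining sequences.

```python
def utf8_char_len(lead):
    """Number of bytes in a UTF-8 sequence, derived from its lead byte."""
    if lead < 0x80:
        return 1          # 0xxxxxxx: ASCII
    if lead >> 5 == 0b110:
        return 2          # 110xxxxx
    if lead >> 4 == 0b1110:
        return 3          # 1110xxxx
    if lead >> 3 == 0b11110:
        return 4          # 11110xxx
    raise ValueError("not a UTF-8 lead byte: 0x%02X" % lead)


def utf8_nth_char(data, n):
    """Return the bytes of the n-th (0-based) code point in `data`.

    There is no direct index: every preceding sequence must be skipped,
    so the cost grows linearly with n.
    """
    pos = 0
    for _ in range(n):
        if pos >= len(data):
            raise IndexError("code point index out of range")
        pos += utf8_char_len(data[pos])
    if pos >= len(data):
        raise IndexError("code point index out of range")
    return data[pos:pos + utf8_char_len(data[pos])]


if __name__ == "__main__":
    sample = "a\u00e9\u20ac\U0001F600z".encode("utf-8")
    for i in range(5):
        print(i, utf8_nth_char(sample, i))
```

The "data type" for a single code point is therefore a variable-length byte slice (or a 32-bit integer after decoding), not a Char-sized value.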


(platform-dependent) RTL conventions, but it affects the standard
components (string lists...) in the FCL, and the other components in
the LCL.

Please give a concrete example where using platform-dependent encodings
(eg: UnicodeString = UTF-8 on Linux, but UTF-16 on Windows) will
cause problems. I really cannot see any issues here, only positives
like better performance for each platform due to no need for
auto-conversions.

As already pointed out, string encoding conversions between application and widgets are rare; consequently, performance depends more on string handling in application code. The new Delphi string types, with automatic conversion when required, can cause a slowdown. In FPC, character-based access to strings (iterators...) can also cause a slowdown.

When a multi-platform application has to expect UTF-8 strings on some platforms, the code must be MBCS-aware. That again means complicated string handling where immediate indexed access would otherwise be possible :-(
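
A small sketch of what "immediate indexed access" means for UTF-16, and where it stops working. It is written in Python rather than Pascal, the helper names utf16_units and is_surrogate are invented for this example, and it only looks at code units, not at combining characters. For text inside the BMP, code unit i is character i; outside the BMP a character takes two units (a surrogate pair), so the 1:1 mapping is lost.

```python
import struct


def utf16_units(text):
    """Return the UTF-16 code units of `text` as a list of integers."""
    raw = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(raw) // 2), raw))


def is_surrogate(unit):
    """True if a 16-bit code unit is one half of a surrogate pair."""
    return 0xD800 <= unit <= 0xDFFF


if __name__ == "__main__":
    bmp = utf16_units("h\u00e9llo")        # all characters inside the BMP
    print(len(bmp), hex(bmp[1]))           # unit 1 is the 2nd character

    mixed = utf16_units("a\U0001F600")     # U+1F600 lies outside the BMP
    print(len(mixed), [hex(u) for u in mixed])
```

Code that indexes by unit is correct for the first string and silently splits a character in the second, which is the limitation behind "works fine with the BMP, at least".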


Here again the average user will prefer UTF-16 component libraries,
compatible with his own code, while more experienced users may be
happier with the current UTF-8 libraries.

What the hell has "experience" got to do with the preference between
UTF-8 and UTF-16? To the developer (and more so to the end-user) a
Unicode string should act like any other Unicode string. What encoding
is used to represent "hello world" shouldn't even come into question.

This applies only to constant string literals, where the user never has to care about string encoding and conversion.


English (ASCII) users also may prefer UTF-8, as long as they do not
have to (or want to) deal with strings in foreign languages.

Rubbish once again! Our applications use UTF-8, I have no problems
writing applications that support multiple foreign languages - as long as
those languages are left-to-right (I don't understand RTL languages,
so can't comment).

You really should understand ;-)

RTL is a mere *display* feature; the chars are still stored from first to last. More important is the SBCS/MBCS difference, which must be reflected in user code. Even if *you* have no problems with MBCS (like UTF-8), other users do.
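
To make the storage-order point concrete, here is a minimal sketch in Python (not Pascal); the function name logical_order is made up for this example. It uses the Hebrew word "shalom", written with escapes so the source does not depend on how an editor renders it. The first letter a reader reads (shin, drawn at the right edge) is the first element in memory.

```python
import unicodedata


def logical_order(text):
    """Return (name, bidi class) per code point, in storage order."""
    return [(unicodedata.name(ch), unicodedata.bidirectional(ch))
            for ch in text]


if __name__ == "__main__":
    # Hebrew "shalom": shin, lamed, vav, final mem, in reading order.
    shalom = "\u05e9\u05dc\u05d5\u05dd"
    for name, bidi in logical_order(shalom):
        print(bidi, name)
```

Each letter carries the bidirectional class "R"; reversing the visual order is left to the rendering layer, so indexing and iteration work the same as for left-to-right text.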

DoDi

_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel
