Graeme Geldenhuys wrote:
On 14/09/2011 17:02, Hans-Peter Diettrich wrote:
Many users still want simple string handling, with direct mapping
between logical and physical chars (SBCS). This is not possible at all
with UTF-8, while UTF-16 works fine with the BMP, at least.
What rubbish! The only "utf-8 limit" is that the current FPC and Delphi
RTLs don't cater for it due to the legacy ANSI support that came
before.
What data type would you use to store a UTF-8 character?
And how would you access the n-th character in a UTF-8 string?
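To make the two questions above concrete, here is a minimal sketch in Python (chosen only for illustration, not FPC code). It treats a UTF-8 string as raw bytes and finds the n-th code point by scanning from the start, since one code point occupies 1 to 4 bytes. The function name utf8_char_at is hypothetical, and "character" is assumed to mean a single code point, not a grapheme.

```python
def utf8_char_at(data: bytes, n: int) -> str:
    """Return the n-th code point (0-based) of UTF-8 encoded bytes.

    Hypothetical helper for illustration. It scans from the start,
    so the cost grows with n, unlike indexing a fixed-width string.
    """
    if n < 0:
        raise IndexError("negative index")
    count = -1      # index of the code point currently being scanned
    start = None    # byte offset where the n-th code point begins
    for i, b in enumerate(data):
        # Continuation bytes look like 10xxxxxx; anything else starts
        # a new code point.
        if (b & 0xC0) != 0x80:
            count += 1
            if count == n:
                start = i
            elif count == n + 1:
                return data[start:i].decode("utf-8")
    if start is not None:
        # The n-th code point is the last one in the data.
        return data[start:].decode("utf-8")
    raise IndexError("code point index out of range")


text = "naïve €uro"
data = text.encode("utf-8")
third = utf8_char_at(data, 2)   # scans bytes to find the third code point
raw = data[2:3]                 # byte index 2 is only half of that code point
```

Byte indexing and character indexing diverge as soon as a non-ASCII character appears, which is the MBCS awareness that user code would have to carry.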
...
(platform dependent) RTL conventions, but it affects the standard
components (string lists...) in the FCL, and the other components in
the LCL.
Please give a concrete example where using platform-dependent encodings
(e.g. UnicodeString = UTF-8 on Linux, but UTF-16 on Windows) will
cause problems? I really cannot see any issues here, only positives
like better performance for each platform due to no need for
auto-conversions.
As already pointed out, string encoding conversions between application
and widgets are rare, consequently performance depends more on string
handling in application code. Now the new Delphi string types, with
automatic conversion when required, can cause a slowdown. In FPC,
character-based access to strings can also cause a slowdown (iterators...).
When a multi-platform application must be aware of possible UTF-8
strings, depending on the platform, the code must be MBCS aware. This
again complicates string handling in places where immediate indexed
access would otherwise be possible :-(
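As a rough illustration of the SBCS/MBCS point, the following Python sketch counts how many code units each encoding needs for the same text. In UTF-16, every BMP character is exactly one 16-bit unit, so index = character holds there; in UTF-8 this only holds for ASCII. The helper name code_units and the sample strings are made up for this example.

```python
def code_units(text: str, encoding: str) -> int:
    """Number of code units (bytes for UTF-8, 16-bit words for UTF-16)."""
    unit_size = {"utf-8": 1, "utf-16-le": 2}[encoding]
    return len(text.encode(encoding)) // unit_size


def indexable_by_unit(text: str, encoding: str) -> bool:
    """True if every character is exactly one code unit in this encoding."""
    return code_units(text, encoding) == len(text)


ascii_text = "hello"
german = "größe"              # BMP, but not ASCII
clef = "\U0001D11E"           # MUSICAL SYMBOL G CLEF, outside the BMP

utf8_ok = [indexable_by_unit(s, "utf-8") for s in (ascii_text, german, clef)]
utf16_ok = [indexable_by_unit(s, "utf-16-le") for s in (ascii_text, german, clef)]
```

Under these assumptions, direct indexing is safe for ASCII in both encodings, for BMP text only in UTF-16, and for text outside the BMP in neither.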
Here again the average user will prefer UTF-16 component libraries,
compatible with his own code, while more experienced users may be
happier with the current UTF-8 libraries.
What the hell has "experience" got to do with the preference between
UTF-8 and UTF-16? To the developer (and more so to the end-user) a
Unicode string should act like any other Unicode string. What encoding
is used to represent "hello world" shouldn't even come into question.
This applies only to constant string literals, where the user never has
to care about string encoding and conversion.
English (ASCII) users may also prefer UTF-8, as long as they do not
have to (or want to) deal with strings in foreign languages.
Rubbish once again! Our applications use UTF-8, and I have no problems
writing applications that support multiple foreign languages - as long as
those languages are left-to-right (I don't understand RTL languages,
so can't comment).
You had better understand ;-)
RTL (right-to-left) is a mere *display* feature; the chars are still
stored from first to last. More important is the SBCS/MBCS difference,
which must be reflected in user code. Even if *you* have no problems
with MBCS (like UTF-8), other users do.
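A small Python sketch of the storage-order point: right-to-left text is kept in logical (typing) order, and the direction is only a property that a renderer consults. The sample string and the helper name describe are assumptions made for this example.

```python
import unicodedata


def describe(text: str):
    """List (name, bidi class) for each character in storage order."""
    return [(unicodedata.name(c), unicodedata.bidirectional(c)) for c in text]


# Hebrew alef, bet, gimel, stored in the order they are typed and read.
hebrew = "\u05d0\u05d1\u05d2"
info = describe(hebrew)

# Index 0 is the first logical letter, although a renderer draws it
# at the right-hand end of the word.
first_name, first_bidi = info[0]

# Encoding keeps the same order: alef's bytes come first in UTF-8.
first_bytes = hebrew.encode("utf-8")[:2]
```

Indexing and encoding behave the same way as for left-to-right text; only the bidi class ("R" here) tells the display layer to reverse the visual order.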
DoDi
_______________________________________________
fpc-devel maillist - fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel