I think at most two are required for any target: unicodestring (D2009
compatibility), and if really necessary because somehow the unicodestring
version causes too much overhead, an ansistring($ffff) version as well. That's
only for the classes though, I think most of the base RTL can be simply
ansistring($ffff).
So if I understand correctly, both the UnicodeString and AnsiString
types must be extended so that they also hold information about the
actual codepage (encoding) of the string data they contain.
(AFAIK, at the moment they hold only the reference count, the size,
and of course the data.)
I am not an expert, so I do not understand all the aspects and problems
involved in proper string handling, but some kind of implicit
conversion (based on the actual encoding of the string data) is
necessary (ANSI <-> UTF-8 <-> UTF-16 <-> ANSI, etc.).
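To make the idea concrete, here is a minimal sketch in Python (not FPC code) of a string that carries its codepage next to its data and converts on demand. The TaggedString class, its field names, and the use of Python codec names in place of numeric codepages are all assumptions made for illustration; the reference count is left out because Python manages memory itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedString:
    """Hypothetical model: payload bytes plus the encoding they are stored in."""
    data: bytes     # raw payload bytes (the length is implied by len(data))
    codepage: str   # Python codec name standing in for a numeric codepage

    def convert(self, target: str) -> "TaggedString":
        """Return the same text re-encoded in `target`; no-op if already there."""
        if target == self.codepage:
            return self
        text = self.data.decode(self.codepage)   # bytes -> abstract characters
        return TaggedString(text.encode(target), target)


# The Euro sign stored in Windows-1252 (assumed here as the ANSI codepage).
euro_ansi = TaggedString(b"\x80", "cp1252")

# ANSI -> UTF-8 -> UTF-16 -> ANSI, each step driven by the stored tag.
euro_utf8 = euro_ansi.convert("utf-8")
euro_utf16 = euro_utf8.convert("utf-16-le")
round_trip = euro_utf16.convert("cp1252")

print(euro_utf8.data.hex())
print(euro_utf16.data.hex())
print(round_trip == euro_ansi)
```

Because every value knows its own encoding, an assignment between differently tagged strings has enough information to convert without the caller saying how.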
For example, there is the known problem with the Euro currency symbol.
On Windows it is stored in the CurrencyString global variable using the
ANSI codepage, but it is used in the LCL (which expects UTF-8 encoding)
without any explicit conversion, which leads to "?" being displayed
instead of "€" (for example in TDBEdit or TDBGrid).
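The byte-level cause can be reproduced outside of FPC. The following Python sketch assumes the ANSI codepage is Windows-1252 (the real one depends on the system locale) and that the consumer decodes strictly as UTF-8. How the undecodable byte is finally rendered depends on the widget, so the "?" itself is not reproduced here, only the loss of the character.

```python
# Windows-1252 encodes the Euro sign as the single byte 0x80.
ansi_bytes = "€".encode("cp1252")

# A consumer that assumes UTF-8 sees a lone continuation byte, which is not
# valid UTF-8. With errors="replace" it becomes U+FFFD (replacement character).
misread = ansi_bytes.decode("utf-8", errors="replace")

# Converting explicitly at the boundary keeps the character intact.
fixed_utf8 = ansi_bytes.decode("cp1252").encode("utf-8")

print(ansi_bytes.hex())
print(misread == "\ufffd")
print(fixed_utf8.decode("utf-8") == "€")
```

The data is not damaged in the variable itself; it is only misinterpreted by the side that assumes a different encoding, which is why a conversion at the RTL/LCL boundary is enough to fix it.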
Another problem occurs when displaying character data in data-aware
database controls (TDBEdit, TDBGrid). The data-aware controls (LCL) read
data from TField descendants (FCL) using the TField.Text property, which
returns "string" (without codepage information it is not clear whether
that is an AnsiString, a UTF8String or a UnicodeString). The LCL expects
UTF-8 strings, but that is not what it gets in all cases (for example
in the case of ODBC).
-Laco.
_______________________________________________
fpc-devel maillist - fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel