On 28.02.2014 14:16, Michael Schnell wrote:
On 02/28/2014 12:53 PM, Sven Barth wrote:

Problem: there is (currently) no string type that can handle ANSI, UTF-8 and UTF-16 at once. The first two are handled by AnsiString and the third by UnicodeString. And those two are not equal which would be important for overrides/overloads/inheritance. Without that your whole idea is useless.


Of course this is only relevant when "New Delphi" (i.e. "partly" dynamically encoded) strings are introduced (I decline to use the terms "AnsiString" and "UnicodeString" due to ambiguity, unless they come with a clear definition close by).
Unless stated otherwise, AnsiString and UnicodeString are meant as implemented in FPC trunk.
Here, the Delphi model does not provide a string encoding type (and appropriate "compiler magic") that can be used for that purpose (i.e. "fully dynamically encoded").
Basically it does. In theory the additional record prepended to each string (which contains the reference count, among other things) could be used for 1-, 2-, 4- or multi-byte strings, as it carries an "ElementSize" field which is currently fixed to 1 for AnsiString (even with UTF-8) and to 2 for UnicodeString (both strings use the same record layout though they are declared as different ones). Also there is the StringElementSize function, which is overloaded for RawByteString and UnicodeString and which already returns the value of ElementSize. So purely in theory the current AnsiString type would already be capable enough. Also the compiler might already handle overloads correctly if we had a (for now hypothetical) "AnsiString(UTF16)" (which would be equal to UnicodeString).

One of the problematic parts, which Marco already mentioned, is character access. A possible solution here would be to force the character size depending on the declared string type and not on the runtime type: 2 for AnsiString(UTF16), 4 for AnsiString(UTF32), 1 for any 1-byte AnsiString encoding and either 1 or 6 for UTF-8 (6 is the maximum number of bytes that UTF-8 might encode a character with, but currently the maximum used is 4). The compiler would then either need to insert appropriate conversions if the runtime type does not match the declared type (for whatever reason), or it would need to assume that the runtime type always matches the declared type. In the former case this might incur quite some performance penalty (this could be avoided if the compiler created appropriate inline code for detecting the runtime encoding).

An open problem left would be RTTI, as there currently are tkUString (for UnicodeString) and tkLString (for AnsiString), of which tkLString contains a codepage field while tkUString does not. And to keep Delphi code as compatible as possible, the compiler would then again need to handle the RTTI of an AnsiString(UTF16) differently...
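
To make the character-access idea a bit more concrete, here is a rough model of it. This is Python, not FPC code, and it is only a sketch under assumptions: the names DynString, ELEMENT_SIZE and char_at are made up for illustration, the "encoding" field merely stands in for the code page stored in the string header, and UTF-8 is treated with an element size of 1 (the first of the two options mentioned above). It shows the declared type fixing the element size, with a conversion inserted whenever the runtime encoding differs.

```python
from dataclasses import dataclass

# Element size in bytes per encoding (hypothetical table for this sketch).
# UTF-8 uses 1 here, i.e. indexing addresses single bytes, not characters.
ELEMENT_SIZE = {"latin-1": 1, "utf-8": 1, "utf-16-le": 2, "utf-32-le": 4}


@dataclass(frozen=True)
class DynString:
    """Toy stand-in for a string whose header carries its runtime encoding."""
    payload: bytes
    encoding: str  # stands in for the code page field of the header

    @property
    def element_size(self) -> int:
        # Mirrors the idea of an ElementSize field derived from the encoding.
        return ELEMENT_SIZE[self.encoding]

    @classmethod
    def from_text(cls, text: str, encoding: str) -> "DynString":
        return cls(text.encode(encoding), encoding)


def char_at(s: DynString, index: int, declared: str) -> bytes:
    """Return the element at 0-based index, sized by the DECLARED type.

    If the runtime encoding differs from the declared one, convert first;
    this is the conversion the compiler would have to insert.
    """
    if s.encoding != declared:
        s = DynString.from_text(s.payload.decode(s.encoding), declared)
    size = ELEMENT_SIZE[declared]
    return s.payload[index * size:(index + 1) * size]
```

With a UTF-8 runtime string holding "héllo", char_at(s, 1, "utf-16-le") converts first and yields the 2-byte element for "é", while char_at(s, 1, "utf-8") skips the conversion and yields only the first byte of its 2-byte UTF-8 sequence. The conversion branch is where the performance penalty mentioned above would come from.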

Regards,
Sven
_______________________________________________
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
