Re: [fpc-pascal] Generic String Functions

Sven Barth Fri, 28 Feb 2014 06:01:54 -0800

On 28.02.2014 14:16, Michael Schnell wrote:

On 02/28/2014 12:53 PM, Sven Barth wrote:
Problem: there is (currently) no string type that can handle ANSI, UTF-8 and UTF-16 at once. The first two are handled by AnsiString and the third by UnicodeString. And those two types are not equal, which would be important for overrides/overloads/inheritance. Without that, your whole idea is useless.
Of course this is only relevant when "New Delphi" strings (i.e. "partly" dynamically encoded strings) are introduced (I decline to use the terms "AnsiString" and "UnicodeString" due to their ambiguity, unless they come with a clear definition close by).

Unless stated otherwise, AnsiString and UnicodeString are meant as implemented in FPC trunk.

Here, the Delphi model does not provide a string encoding type (and appropriate "compiler magic") that can be used for that purpose (i.e. "fully dynamically encoded").

Basically it does. In theory, the additional record prepended to each string (which contains the reference count, among other things) could be used for 1-, 2-, 4- or multi-byte strings, as it carries an "ElementSize" field, which is currently fixed to 1 for AnsiString (even with UTF-8) and to 2 for UnicodeString (both strings use the same record layout, though they are declared as different records). There is also the StringElementSize function, which is overloaded for RawByteString and UnicodeString and already returns the value of ElementSize. So, purely in theory, the current AnsiString type would already be capable enough. The compiler might also already handle overloads correctly if we had a (for now hypothetical) "AnsiString(UTF16)" (which would be equal to UnicodeString).

One of the problematic parts that Marco already mentioned is character access. A possible solution here would be to fix the character size based on the declared string type (2 for AnsiString(UTF16), 4 for AnsiString(UTF32), 1 for any 1-byte AnsiString encoding, and either 1 or 6 for UTF-8 (6 is the maximum number of bytes with which UTF-8 can encode a character, though currently at most 4 are used)) rather than on the runtime type. The compiler would then either need to insert appropriate conversions if the runtime type does not match the declared type (for whatever reason), or it would need to assume that the runtime type always matches the declared type. In the former case this might mean quite a performance penalty (which could be avoided if the compiler generated appropriate inline code for detecting the runtime encoding).

An open problem would remain with RTTI: there are currently tkUString (for UnicodeString) and tkLString (for AnsiString), of which the second contains a codepage field while the first does not. And to keep Delphi code as compatible as possible, the compiler would then again need to handle the RTTI of an AnsiString(UTF16) differently...
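A rough model may make the character-access proposal easier to follow. It is a sketch in Python rather than Pascal and does not reflect how FPC works internally: StrRec, string_element_size, declared_char_size, utf8_sequence_length and char_at are hypothetical names, the record is reduced to the fields discussed here, the numeric code pages are the Windows identifiers (assumed for illustration), UTF-8 uses the "1 byte per character" option, and the conversion on a type mismatch simply goes through Python's codecs.

```python
from dataclasses import dataclass

# Code page identifiers (Windows numbering, assumed here for illustration).
CP_1252 = 1252
CP_UTF8 = 65001
CP_UTF16 = 1200   # UTF-16 little endian
CP_UTF32 = 12000  # UTF-32 little endian

# Python codec used when a runtime/declared mismatch forces a conversion.
CODECS = {CP_1252: "cp1252", CP_UTF8: "utf-8",
          CP_UTF16: "utf-16-le", CP_UTF32: "utf-32-le"}


@dataclass
class StrRec:
    """Hypothetical, simplified model of the record prepended to each string."""
    code_page: int     # runtime encoding of the payload
    element_size: int  # 1 for AnsiString (also UTF-8), 2 for UnicodeString
    ref_count: int
    payload: bytes     # the string data itself


def string_element_size(s: StrRec) -> int:
    """Stand-in for StringElementSize: just reports the stored ElementSize."""
    return s.element_size


def declared_char_size(declared_cp: int) -> int:
    """Character size fixed by the *declared* type, not by the runtime type.

    UTF-8 takes the "1 byte" option here; the alternative would be 6.
    """
    if declared_cp == CP_UTF16:
        return 2
    if declared_cp == CP_UTF32:
        return 4
    return 1  # any single-byte code page, and UTF-8


def utf8_sequence_length(lead: int) -> int:
    """Bytes in a UTF-8 sequence, judged from its lead byte.

    The original UTF-8 design allowed up to 6 bytes; RFC 3629 caps it at 4.
    """
    if lead < 0x80:
        return 1
    for length, mask, value in ((2, 0xE0, 0xC0), (3, 0xF0, 0xE0),
                                (4, 0xF8, 0xF0), (5, 0xFC, 0xF8),
                                (6, 0xFE, 0xFC)):
        if (lead & mask) == value:
            return length
    raise ValueError(f"0x{lead:02X} is not a UTF-8 lead byte")


def char_at(s: StrRec, declared_cp: int, index: int) -> bytes:
    """Character `index` (0-based) as seen through the declared type.

    The cheap code page comparison stands in for the inline runtime check;
    only on a mismatch is the costly conversion performed.
    """
    data = s.payload
    if s.code_page != declared_cp:
        data = data.decode(CODECS[s.code_page]).encode(CODECS[declared_cp])
    size = declared_char_size(declared_cp)
    return data[index * size:(index + 1) * size]
```

With s = StrRec(CP_UTF8, 1, 1, "Grüße".encode("utf-8")), char_at(s, CP_UTF16, 2) yields b'\xfc\x00' (the whole 'ü'), whereas char_at(s, CP_UTF8, 2) yields only b'\xc3', the first byte of 'ü'. That is exactly the character-access problem a 1-byte character size for UTF-8 would leave open.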


Regards,
Sven
_______________________________________________
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
