BTW.

I think the implementation would be quite easy, straightforward, fast, and compatible.

- The compiler knows the static encoding type of each string variable.
- The dynamic encoding type of a String is preset to its static encoding type when the string is allocated. Only RawByteStrings (encoding type $FFFF) are allowed to change their dynamic encoding type; doing so with any other String leads to unpredictable results.


When Strings are assigned:
- If the static encoding types of source and target are identical, whether normal or RAW (already checked by the compiler) -> the same happens as with the pre-Unicode compiler (setting the pointer to the StringRecord and managing the RefCount).
otherwise:
- If the target is statically defined as RawByteString (already checked by the compiler) -> the same happens.
- If the source is statically defined as RawByteString (already checked by the compiler), code is generated that checks at run time whether the dynamic encoding of the source is identical to the static encoding type of the target (known to the compiler) -> if so, the same happens.

Otherwise, the conversion library is called. It checks the _dynamic_ encoding types of source and target (so it only needs to be given the Strings themselves, with no additional information generated by the compiler) and performs the appropriate conversion.
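The assignment rules above can be simulated in a few lines of Python. This is only a model under stated assumptions, not compiler or RTL code: `StringRec`, `share`, `assign` and `convert` are hypothetical names, code pages 65001 (UTF-8) and 1252 stand in for arbitrary encodings, and Python's codecs stand in for the conversion library. For simplicity `convert` is handed the target's encoding number rather than a target string. `assign` receives both static encoding types (known to the compiler); only the third case needs a run-time look at the dynamic encoding.

```python
from dataclasses import dataclass

RAW = 0xFFFF  # encoding type of RawByteString in the proposal
# Hypothetical mapping from code page numbers to Python codec names.
CODECS = {65001: "utf_8", 1252: "cp1252"}


@dataclass
class StringRec:
    """Hypothetical model of a string record on the heap."""
    data: bytes
    encoding: int       # dynamic encoding type stored in the record
    refcount: int = 1


def convert(src: StringRec, dst_encoding: int) -> StringRec:
    """Stand-in for the conversion library: driven by dynamic encodings only."""
    if src.encoding == RAW or dst_encoding == RAW:
        raise ValueError("cannot convert: dynamic encoding is $FFFF")
    text = src.data.decode(CODECS[src.encoding])
    return StringRec(text.encode(CODECS[dst_encoding]), dst_encoding)


def share(src: StringRec) -> StringRec:
    """Pre-Unicode behaviour: copy the pointer and bump the reference count."""
    src.refcount += 1
    return src


def assign(src: StringRec, src_static: int, dst_static: int) -> StringRec:
    """Model of `dst := src`; both static encodings are known at compile time."""
    if src_static == dst_static:        # same static type (normal or RAW)
        return share(src)
    if dst_static == RAW:               # target is RawByteString
        return share(src)               # record keeps its dynamic encoding
    if src_static == RAW and src.encoding == dst_static:  # run-time check
        return share(src)
    return convert(src, dst_static)     # everything else: conversion library


# Usage: a UTF-8 string passed through a RawByteString variable.
utf8 = StringRec("é".encode("utf_8"), 65001)
raw = assign(utf8, 65001, RAW)      # shared, dynamic encoding stays 65001
back = assign(raw, RAW, 65001)      # run-time check passes, shared again
ansi = assign(raw, RAW, 1252)       # encodings differ: conversion library
```

Only the last assignment allocates a new record; the other two just share the original one, as in the pre-Unicode compiler.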


When an operation is performed on two Strings (such as "+" or a comparison), one of the operands is (virtually) copied to a String with the same encoding type as the other.

Here:
- If one operand is a RawByteString, use the (static or dynamic) encoding of the other.
- If both are RawByteStrings, use the dynamic encoding of one of them (presumably this is not really a separate case from the previous one).

If the conversion library sees a dynamic encoding type of $FFFF for either source or target, it fails and raises an exception.
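The operand rules can be sketched the same way, again in Python as a simulation only. `StringRec`, `convert`, `operation_encoding` and `concat` are hypothetical names, and code pages 65001 and 1252 are placeholders. The mail does not say which operand wins when both are RawByteStrings or when both are normal strings of different types; this sketch assumes the left operand. A comparison would align the encodings in exactly the same way before comparing the bytes.

```python
from dataclasses import dataclass

RAW = 0xFFFF  # encoding type of RawByteString in the proposal
CODECS = {65001: "utf_8", 1252: "cp1252"}  # hypothetical code page table


@dataclass
class StringRec:
    """Hypothetical model of a string record: bytes plus dynamic encoding."""
    data: bytes
    encoding: int


def convert(src: StringRec, dst_encoding: int) -> StringRec:
    """Stand-in for the conversion library; refuses $FFFF on either side."""
    if src.encoding == RAW or dst_encoding == RAW:
        raise ValueError("cannot convert: dynamic encoding is $FFFF")
    text = src.data.decode(CODECS[src.encoding])
    return StringRec(text.encode(CODECS[dst_encoding]), dst_encoding)


def operation_encoding(a: StringRec, a_static: int,
                       b: StringRec, b_static: int) -> int:
    """Pick the encoding both operands are (virtually) copied to."""
    if a_static == RAW and b_static != RAW:
        return b.encoding   # static == dynamic for non-RAW strings
    if b_static == RAW and a_static != RAW:
        return a.encoding
    # Both RAW, or both non-RAW: the mail leaves the choice open;
    # this sketch assumes the left operand's dynamic encoding.
    return a.encoding


def concat(a: StringRec, a_static: int,
           b: StringRec, b_static: int) -> StringRec:
    """Model of `a + b`: align encodings, then join the bytes."""
    enc = operation_encoding(a, a_static, b, b_static)
    a2 = a if a.encoding == enc else convert(a, enc)
    b2 = b if b.encoding == enc else convert(b, enc)
    return StringRec(a2.data + b2.data, enc)
```

For example, concatenating a UTF-8 string with a RawByteString that currently holds CP1252 data yields a UTF-8 result, while two RawByteStrings whose left operand still has dynamic type $FFFF end in the conversion library's exception, as described above.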


IMHO it makes much more sense to implement things like TStringList on the basis of RawByteString: if it is based on the default system encoding, using it with any other encoding type incurs a double conversion (once when a string is stored and again when it is read back).
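Counting calls to the conversion library makes the double conversion concrete. The Python sketch below applies the assignment rules described earlier in this mail in simplified form (no reference counting, no $FFFF check); `StringListSketch`, `round_trip` and the use of code page 1252 as the "default system encoding" are assumptions made for illustration only. Storing a UTF-8 string in a list whose element type is the system encoding and reading it back into a UTF-8 variable calls the conversion library twice; with RawByteString elements it is never called.

```python
from dataclasses import dataclass

RAW = 0xFFFF                               # RawByteString encoding type
CODECS = {65001: "utf_8", 1252: "cp1252"}  # hypothetical code page table


@dataclass
class StringRec:
    data: bytes
    encoding: int   # dynamic encoding type


class Conversions:
    """Counts calls to the (simulated) conversion library."""
    count = 0


def assign(src: StringRec, src_static: int, dst_static: int) -> StringRec:
    """Simplified assignment rules: share when allowed, else convert."""
    if src_static == dst_static or dst_static == RAW:
        return src
    if src_static == RAW and src.encoding == dst_static:
        return src
    Conversions.count += 1
    text = src.data.decode(CODECS[src.encoding])
    return StringRec(text.encode(CODECS[dst_static]), dst_static)


class StringListSketch:
    """Hypothetical TStringList-like container with a fixed element type."""

    def __init__(self, element_static: int):
        self.element_static = element_static
        self.items = []

    def add(self, s: StringRec, s_static: int) -> None:
        self.items.append(assign(s, s_static, self.element_static))

    def get(self, index: int, dst_static: int) -> StringRec:
        return assign(self.items[index], self.element_static, dst_static)


def round_trip(element_static: int) -> int:
    """Store a UTF-8 string, read it back into a UTF-8 variable."""
    Conversions.count = 0
    lst = StringListSketch(element_static)
    lst.add(StringRec("café".encode("utf_8"), 65001), 65001)
    lst.get(0, 65001)
    return Conversions.count
```

`round_trip(1252)` counts two conversions, while `round_trip(RAW)` counts none, because the RawByteString element keeps the string's dynamic encoding and the run-time check on read-back succeeds.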

IMHO large, commonly used, architecture-independent libraries that are not extremely performance-critical (like the LCL) should use RawByteString in their user-facing interface and internally as widely as possible, so that conversions are avoided whenever possible (e.g. when the user's call provides a string and the library later decides not to use it at all).

-Michael (the weird one)

_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel
