On 10 Jan 2011, at 13:57, Marco van de Voort wrote:

In our previous episode, Jonas Maebe said:

If/when this is done, it will only be with a compiler switch or
directive.

That won't be enough, since that would not change the relevant units and classes to such a type (e.g. tstringlist would remain defined as ansistring).

If it's a D2009-style ansistring, does that matter?

A lot of conversion, since it will use ansistring(0), so reading/writing ansistring(cp_utf8) will force conversions. (0 means system encoding, $FFFF means never convert.)

Why should a tstringlist force ansistring(0)? Or does Delphi force it to be that way?

Conversion may indeed be required for output (input would only pass on the encoding of the input if based on ansistring($ffff)), but I think doing that only when necessary at the lowest level should be no problem. Many existing frameworks work that way.

Besides that, there are the usual three problems:

- I don't know how VAR behaves in this case (passing an ansistring(cp_utf8) to a "var ansistring(0)" parameter).

var-parameters may indeed pose a problem in case some parameters of OS-neutral routines are required to have a particular encoding specified.

- maybe overloading (only corner cases?) etc.

Possibly, although I guess there are probably rules for that (whether they are documented is another matter though, probably...)

- inheritance. FPC defines base classes as ansistring(0) parameters, and Lazarus wants to inherit and override them with a different type. This will clash.

Why ansistring(0) for base classes? OS-level interfaces: yes, but why base classes?

I've thought long and hard about this. The discussion about what the dominant type should be won't stop anytime soon, and in the long run we will probably have to support both UTF8 (*nix) and UTF16 (Windows and *nix/QT) as base types, plus ANSI as legacy for a time. Since the RTL has to be prepared for that anyway, we might as well allow this on all platforms from the start.
(Actually releasing them is a different question and depends on manpower.)

I agree that the RTL should work regardless of the used string encoding, but I don't see why a particular encoding should be enforced throughout the entire RTL rather than just using ansistring($ffff) almost everywhere.

I also agree that we should strive to minimize the number of conversions in the RTL for some encodings (in particular indeed ansi, utf-8 and utf-16), but again this should not require a specially compiled RTL. E.g., insert(ansistring($ffff)), delete(ansistring($ffff)), etc. can call special-purpose versions for certain specific encodings of the input (e.g., for the three you mentioned), and perform a round trip via some generic format (utf-16, utf-32, or something that depends on the platform) only if the encoding is not directly supported or if different encodings are mixed.

This has the advantage that you always have all optimal implementations available, regardless of the platform or default string encoding. It does not require extra work because we have to write all those versions also if we want the RTL to be compilable for different default string encodings. And three checks in a case statement are not going to define the performance in a context of atomic reference counting, dynamic memory management and the occasional code page conversion (and since this may reduce the number of code page conversions when working with "non-native" strings, it can also be a performance win).
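
A minimal sketch of the dispatch described above, assuming a compiler with D2009-style code-page-aware ansistrings (RawByteString, StringCodePage, SetCodePage and CP_UTF8, as found in later FPC releases). CodePointCount is a hypothetical helper made up for illustration, not an RTL routine, and only one special-purpose path is shown:

```pascal
program EncodingDispatchSketch;
{$mode objfpc}{$H+}

{ Hypothetical helper, not an RTL routine: counts the code points in s,
  choosing a code path from the dynamic code page of the string. }
function CodePointCount(const s: RawByteString): SizeInt;
var
  i: SizeInt;
  u: UnicodeString;
begin
  Result := 0;
  case StringCodePage(s) of
    CP_UTF8:
      { Special-purpose path: count every byte that is not a UTF-8
        continuation byte (10xxxxxx). No conversion takes place. }
      for i := 1 to Length(s) do
        if (Ord(s[i]) and $C0) <> $80 then
          Inc(Result);
  else
    begin
      { Generic path: round trip via UTF-16, then skip the second half
        of each surrogate pair. On Unix-like targets this conversion
        may need a widestring manager (e.g. the cwstring unit). }
      u := UnicodeString(s);
      for i := 1 to Length(u) do
        if (Ord(u[i]) and $FC00) <> $DC00 then
          Inc(Result);
    end;
  end;
end;

const
  { 'hello' with an e-acute, as UTF-8 bytes: the accented letter
    takes two bytes ($C3 $A9). }
  Utf8Bytes: array[0..5] of Byte = ($68, $C3, $A9, $6C, $6C, $6F);
var
  s: RawByteString;
begin
  SetLength(s, Length(Utf8Bytes));
  Move(Utf8Bytes[0], s[1], Length(Utf8Bytes));
  SetCodePage(s, CP_UTF8, False);  { tag the bytes as UTF-8, do not convert }
  WriteLn(Length(s), ' bytes, ', CodePointCount(s), ' code points');
end.
```

Further encodings would simply become extra branches of the case statement; anything without a dedicated branch falls through to the generic conversion.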

Outside the RTL, the encoding mainly matters if you perform manual low-level processing of a string (for i:=1 to length(s) do something_with(s[i])). But in that case your code will either work with only one encoding, and you have to enforce it via the parameter type anyway, or it has to work with multiple encodings, and then you can use a technique similar to what I described above for the RTL.

That doesn't mean that a per unit switch is useless, but I think a target switch to fixate the bulk of the cases will save both us and the users a lot
of grief.

It's not really clear to me which problem this would solve, but I may be missing something.


Jonas
_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel
