> You should weigh the advantages you outline here against the
> disadvantages of no longer knowing how string literals will be encoded.

As a programmer, either I don't want to know (I declared the const without
giving an explicit type), or I do want to know, in which case I declared it
correctly:
  {$codepage utf8}
  var u: UTF8String = 'äöüالعَرَبِيَّة';
  -> UTF8String containing the characters I entered in the source file
     (in this case(!!) just a 1:1 copy)

  {$codepage utf8}
  var u: UCS4String = 'äöü';
  -> UCS4-encoded version, either 000000e4 000000f6 000000fc or the
     equivalent with combining characters (a small demo of these values
     follows at the end of this mail)

There should probably be an error if the characters I typed cannot actually
be represented in the declared type (an emoji in a UCS2String, say), but
otherwise there is no good reason why that shouldn't "just work".

> It means e.g. the resource string tables will have entries that are UTF16
> encoded or entries that are UTF8 encoded, depending on the unit they come
> from. This is highly undesirable.

Always convert from the "unit CP" to UTF8 (or to UTF16 if some binary
compatibility is required), done (see the resourcestring sketch at the
end). Aren't those tables just internal anyway?

> By forcing everything UTF16 we ensure delphi compatibility (yes it does
> matter) and we also ensure a uniform set of string tables.

If that were what actually happened, fine. But from the error message
Matthias listed as (1), I would assume that the actual string type is
UCS2String, at least at some point in the process.

Just my 2 cents...
Martok

PS: adding to the discussion over on the Lazarus ML: I just found a fourth
wiki page describing a slightly different Unicode support scheme. This is
getting ridiculous.
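
PPS: as promised above, a minimal sketch of what the two declarations boil
down to, using the explicit conversion that works today (FPC 3.x assumed,
source file saved as UTF-8; program and variable names are mine, and the
"expected" comments are my reading of the encodings, not compiler output):

program encdemo;
{$codepage utf8}
{$mode objfpc}{$H+}

uses
  SysUtils;

var
  u8: UTF8String;
  u4: UCS4String;
  i: Integer;

begin
  // With {$codepage utf8} the literal is parsed from UTF-8 source, so an
  // assignment to UTF8String ends up as the same byte sequence again.
  u8 := 'äöü';
  for i := 1 to Length(u8) do
    Write(IntToHex(Ord(u8[i]), 2), ' ');  // expected: C3 A4 C3 B6 C3 BC
  WriteLn;

  // A direct "var u: UCS4String = 'äöü';" is the part that errors out
  // today; the explicit conversion shows the values it should produce.
  u4 := UnicodeStringToUCS4String('äöü');
  for i := 0 to Length(u4) - 2 do          // UCS4String has a trailing #0
    Write(IntToHex(LongInt(u4[i]), 8), ' ');
    // expected: 000000E4 000000F6 000000FC
  WriteLn;
end.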
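
And the resourcestring side of it, as a sketch (unit and identifier names
are made up by me): the entry below is written in this unit's codepage; the
point is that the compiler can convert it from the unit CP to one fixed
encoding when it emits the table, so no consumer ever sees per-unit
encodings.

unit u8strings;
{$codepage utf8}
{$mode objfpc}{$H+}

interface

resourcestring
  // Written in this unit's codepage (UTF-8 here). Whatever the unit CP
  // is, the emitted table entry could be normalized to a single encoding.
  SUmlauts = 'äöü';

implementation

end.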