On 3/31/16, Mattias Gaertner <nc-gaert...@netcologne.de> wrote:
>> AFAIK the IDE does not save the file with a BOM, so the compiler may
>> very well decide that my source file has the ACP codepage?
>
> Yes and no.
> When the compiler assumes ACP, it treats the string specially. It does
> not convert it and stores it as a byte copy. At runtime the string has
> CP_ACP, and its codepage is defined by the variable
> DefaultSystemCodePage. LazUTF8 sets this to CP_UTF8, so the string is
> treated as UTF-8. Note that it does this without any conversion.
>
> OTOH, when you tell the compiler that the source is UTF-8, it converts
> the literal to UTF-16. At runtime it converts the string back to UTF-8.
> It does that every time you assign the literal.
>
> So with both you get a UTF-8 string, but the latter has a bit more
> overhead. The latter also needs special care when typecasting (e.g. to
> PChar).
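If I understand the first case correctly, it can be made visible with
something like this (a minimal sketch of my own; DefaultSystemCodePage
and the LazUTF8 unit from LazUtils are real, the program itself is just
my illustration, and it assumes the file is saved as UTF-8 without BOM
and compiled without -FcUTF8):

program LiteralCP;
{$mode objfpc}{$H+}
uses
  LazUTF8; // its initialization sets DefaultSystemCodePage to CP_UTF8
var
  S: String;
begin
  // No {$codepage utf8} and no -FcUTF8: the compiler stores the literal
  // as a raw byte copy with CP_ACP. CP_ACP resolves at runtime to
  // DefaultSystemCodePage, which LazUTF8 set to CP_UTF8, so the bytes
  // are simply treated as UTF-8 and no conversion ever takes place.
  S := 'héllo';
  WriteLn(DefaultSystemCodePage); // 65001, i.e. CP_UTF8
  WriteLn(Length(S));             // 6: 'é' is two UTF-8 bytes, untouched
end.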
Since my real-life use case for string constants with diacritics is most of the time just captions for buttons, menus etc., the extra overhead is not really something to worry about, I guess, and in that scenario adding {$codepage utf8} may be the wise thing to do: it eliminates all confusion about the intended encoding of the string constant.

So, my current intended approach for GUI applications will be:
- declare all strings as just String
- keep the string constants with Unicode characters all in one file and add {$codepage utf8} to that file, and then stop using -FcUTF8 (which is what I'm doing ATM); see the sketch below.

That should be rather safe then, I guess.

Will all this mess go away if we go the Delphi way (String = UnicodeString)?
(I know *nix users are going to hate me now)
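Concretely, I picture that one file looking something like this (the
unit and constant names are placeholders; the point is only the
{$codepage utf8} directive plus the file actually being saved as UTF-8):

unit AppStrings;
{$codepage utf8}
{$mode objfpc}{$H+}

interface

const
  // placeholder captions, just to show the pattern
  capOpen = 'Öffnen';
  capQuit = 'Beëindigen';

implementation

end.

Any unit that uses AppStrings can then simply write
Button1.Caption := capOpen; each such assignment pays the UTF-16 to
UTF-8 conversion Mattias described, but for setting a caption that cost
is negligible.

Bart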