On 30.08.2023 at 20:21, Ondrej Pokorny via fpc-devel wrote:
> On 30.08.2023 17:35, Tomas Hajny via fpc-devel wrote:
>> On 2023-08-30 17:23, Ondrej Pokorny via fpc-devel wrote:
>>> Sorry to bother you with something so trivial: is your t2.pas file
>>> really encoded in UTF-8?
>>>
>>> Because if I compile an ANSI file with the {$codepage utf8}
>>> declaration, then I get "correct" output. But obviously this is very
>>> wrong.
>>>
>>> You can try it yourself with the attached files. So maybe this is your mistake?
>>
>> Well, you're right, this was indeed my mistake, shame on me. :-( Then I can confirm that the compiler behaviour is indeed wrong (although I have no clue why it behaves that way).
>
> Having seen the outputs, I think that the compiler just ignores the source file encoding for {$MESSAGE} and {$NOTE}. It always reads them as ANSI and then converts them to DOS-whatever.
>
> That would explain why the UTF-8 byte stream gets converted to the DOS code page.
>
> So the fix should be quite easy: when {$MESSAGE} or {$NOTE} is read into a string, set the correct code page on that string.

I was correct in my assumption and I was able to fix it: https://gitlab.com/freepascal.org/fpc/source/-/merge_requests/482
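The code page mix-up can be illustrated with ordinary FPC RTL calls. This is only a sketch. It does not touch the compiler's scanner and is not the change in the merge request above. It assumes CP1252 as the "ANSI" code page and CP850 as the console ("DOS") code page purely as examples, and `Reinterpret` and `HexBytes` are hypothetical helpers. The two bytes C3 A4 are what a UTF-8 source file contains for "ä".

```pascal
program cpdemo;
{$mode objfpc}{$H+}

uses
  {$ifdef unix}cwstring,{$endif} // on Unix a widestring manager is needed for code page conversions
  SysUtils;

// Hypothetical helper: formats the bytes of S as space-separated hex values, e.g. 'C3 A4'
function HexBytes(const S: RawByteString): string;
var
  I: Integer;
begin
  Result := '';
  for I := 1 to Length(S) do
  begin
    if I > 1 then
      Result := Result + ' ';
    Result := Result + IntToHex(Ord(S[I]), 2);
  end;
end;

// Hypothetical helper: tags Bytes with SourceCP (no conversion), then converts to TargetCP
function Reinterpret(const Bytes: RawByteString;
  SourceCP, TargetCP: TSystemCodePage): RawByteString;
begin
  Result := Bytes;
  SetCodePage(Result, SourceCP, False); // only changes the code page tag, bytes stay the same
  SetCodePage(Result, TargetCP, True);  // real conversion to the target code page
end;

var
  Utf8Bytes, AsAnsi, AsUtf8: RawByteString;
begin
  // The two bytes a UTF-8 encoded source file contains for 'ä'
  SetLength(Utf8Bytes, 2);
  Utf8Bytes[1] := #$C3;
  Utf8Bytes[2] := #$A4;

  // Buggy path: bytes treated as ANSI (assumed CP1252), then converted to CP850
  AsAnsi := Reinterpret(Utf8Bytes, 1252, 850);
  WriteLn('read as ANSI : ', HexBytes(AsAnsi));  // C7 CF, i.e. "Ã¤" in CP850

  // Fixed path: bytes tagged with the source file's code page (UTF-8) first
  AsUtf8 := Reinterpret(Utf8Bytes, CP_UTF8, 850);
  WriteLn('read as UTF-8: ', HexBytes(AsUtf8));  // 84, i.e. "ä" in CP850
end.
```

Relabelling with `SetCodePage(..., False)` changes only the code page attached to the string and leaves its bytes alone, which mirrors the idea of the fix: the bytes read from the directive are fine, only the code page they are assumed to be in is wrong.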

On the other hand, the $CODEPAGE documentation (https://www.freepascal.org/docs-html/prog/progsu87.html#x95-940001.3.4) states that only string literals are affected by $CODEPAGE and that the actual code must be in US-ASCII.

But you know: Delphi compatibility :) ... and there is no "illegal character" compiler error like the one you get for:

var
  ä: string;

so one would expect {$note ä} to show up correctly.

Ondrej

_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel