On 30.08.2023 at 20:21, Ondrej Pokorny via fpc-devel wrote:
On 30.08.2023 17:35, Tomas Hajny via fpc-devel wrote:
On 2023-08-30 17:23, Ondrej Pokorny via fpc-devel wrote:
Sorry to bother you with something so trivial: is your t2.pas file
really encoded in UTF-8?
Because if I compile an ANSI file with the {$codepage utf8}
declaration, then I get "correct" output. But obviously this is very
wrong.
You can try it yourself with the attached files. So maybe this is your
mistake?
Well, you're right, this was indeed my mistake, shame on me. :-( Then
I can confirm that the compiler behaviour is indeed wrong (although I
have no clue why it behaves that way).
Having seen the outputs, I think that the compiler just ignores the
source file encoding for {$MESSAGE} and {$NOTE}. It always reads them
as ANSI and then converts them to DOS-whatever.
That would explain why the UTF-8 byte stream ends up encoded into the DOS CP.
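
To make the hypothesis above concrete, here is a small Python simulation of the suspected misreading. It is not compiler code. The thread names neither code page, so Windows-1252 for "ANSI" and CP850 for the DOS console are assumptions picked only for illustration.

```python
# Simulation only: both code pages are assumptions, the thread does not name them.
ANSI_CP = "cp1252"  # assumed "ANSI" code page
DOS_CP = "cp850"    # assumed console ("DOS") code page


def misread_as_ansi(source_bytes: bytes) -> str:
    """Decode UTF-8 source bytes as if they were ANSI (the suspected bug)."""
    return source_bytes.decode(ANSI_CP)


source_bytes = "\u00e4".encode("utf-8")      # 'ä' as stored in a UTF-8 file: C3 A4
misread = misread_as_ansi(source_bytes)      # each byte becomes its own character
console_bytes = misread.encode(DOS_CP)       # what would then go to the console

# One source character has turned into two unrelated ones.
print(len(misread), [hex(ord(c)) for c in misread])  # 2 ['0xc3', '0xa4']
```

Under these assumptions the console receives two characters for the single "ä" in the directive, which matches the kind of garbling described in the thread.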
So the fix should be quite easy - when {$MESSAGE} or {$NOTE} is read
into a string, set the correct codepage of the string.
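
The idea of the proposed fix can be sketched the same way in Python: decode the directive text with the code page declared for the source file before converting it for the console. The helper name `read_directive_text` is hypothetical and this is not the code from the merge request; CP850 as the console code page is again an assumption.

```python
# Hypothetical helper for illustration; not the code from the merge request.
def read_directive_text(raw: bytes, source_codepage: str) -> str:
    """Decode directive text using the code page declared for the source file."""
    return raw.decode(source_codepage)


raw = "\u00e4".encode("utf-8")                 # 'ä' as stored in a UTF-8 source file
text = read_directive_text(raw, "utf-8")       # one character, as the author wrote it
console_bytes = text.encode("cp850")           # assumed console code page

print(len(text), console_bytes.hex())  # 1 84
```

With the correct source code page attached, the later conversion to the console code page yields a single byte for "ä" rather than two garbled characters.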
I was correct in my assumption and I was able to fix it:
https://gitlab.com/freepascal.org/fpc/source/-/merge_requests/482
On the other hand, when I read the $CODEPAGE docs:
https://www.freepascal.org/docs-html/prog/progsu87.html#x95-940001.3.4
They state that only literal strings follow $CODEPAGE and that the
actual code must be in US-ASCII.
But you know: Delphi compatibility :) ...and there is no "illegal
character" compiler error as it is for:
var
ä: string;
so one would expect {$note ä} to show up correctly.
Ondrej