On 05.05.2017 at 13:53, Mattias Gaertner wrote:
> Hi,
>
> AFAIK FPC stores UTF-8 string literals (-Fcutf8)

-Fc only tells the compiler the code page of the source file; it says nothing about how string constants are to be encoded.

> as widestrings
> instead of UTF8String. Please correct me if I'm wrong.
>
> This has several side effects:
>
> 1. When using a character outside BMP FPC stops with:
> Error: UTF-8 code greater than 65535 found
> For example:
> const Eyes = '👀';
>
> 2. Assigning a UTF-8 literal to an UTF8String requires a
> widestringmanager.
> For example non ISO-8859-1 chars are mangled:
> var u: UTF8String = 'äöüالعَرَبِيَّة';
>
> 3. PChar on a string literal does not work as expected. You get the
> bytes of a widestring instead.

Well, it depends on what you expect :)

>
>
> What would happen if FPC would be extended to store UTF-8
> literals as UTF8String?
> What are the disadvantages?

1. Backward compatibility. Due to its Windows origins and history, the default Unicode encoding in FPC is UTF-16, and FPC also uses UTF-16 internally everywhere.

2. What would happen the other way around, when the string constant is cast to a PUnicodeChar (which a lot of Delphi code probably does)?

3. Personally, I still think UTF-16 is the "native" Unicode type: all important APIs use UTF-16. For me, UTF-8 is a hack.

What we could do, of course, is have the compiler do the conversion at run time when a constant is assigned to a string with an explicit UTF-8 encoding. But that complicates things even more. It also does not solve the PChar problem, but I think that when somebody uses Unicode source files and PChar, he is on his own :)

I think it would be nice if Michael (v. C.) prepared a section for the docs, and we comment on it and help him improve it.

_______________________________________________
fpc-devel maillist - fpc-devel@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
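
As a footnote to point 1 in the quoted mail (the "UTF-8 code greater than 65535" error): the small sketch below only illustrates why a character outside the BMP, such as U+1F440 from the Eyes example, does not fit into a single 16-bit unit and needs a surrogate pair in UTF-16. It uses plain integer arithmetic and does not depend on how the compiler stores literals. The program name and the identifiers are made up for the illustration, and nothing here is meant to describe what the compiler does internally.

```pascal
program SurrogateSketch;
{$mode objfpc}

const
  // U+1F440, the character from the quoted "Eyes" example.
  // It is larger than $FFFF (65535), so one 16-bit unit cannot hold it.
  EyesCodePoint = $1F440;

var
  Offset: LongWord;
  HighSurrogate, LowSurrogate: Word;
begin
  // Standard UTF-16 surrogate pair arithmetic for code points above $FFFF.
  Offset := EyesCodePoint - $10000;
  HighSurrogate := Word($D800 + (Offset shr 10));   // upper 10 bits
  LowSurrogate := Word($DC00 + (Offset and $3FF));  // lower 10 bits

  WriteLn('High surrogate: $', HexStr(HighSurrogate, 4)); // $D83D
  WriteLn('Low surrogate:  $', HexStr(LowSurrogate, 4));  // $DC40
end.
```

So a UTF-16 representation of that literal would need two 16-bit units, whereas the UTF-8 form in the source file is four bytes.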