In response to another thread (this should start a new thread):

"CP_NONE: this value indicates that no code page information has been
associated with the string data. The result of any explicit or implicit
operation that converts this data to another code page is undefined."

After rereading it, I find this definition incorrect; the entire section (and more) deserves a correction or clarification. The implementation may have to be changed accordingly.

This is my interpretation of the Delphi API around encoded AnsiStrings, as documented and implemented there, with added clarifications and notes on omissions and possible problems on non-Windows platforms.

I do not expect the FPC developers to agree fully with this interpretation, but I do expect all items of a revised version of the following draft to become part of the FPC documentation in some form.

<Draft>

1) CP_ACP, CP_OEM and CP_NONE are "generic" encodings (placeholders), applicable only as *static* string encodings inside a program; they can never denote a dynamic string encoding.

Note: "codepage" here means byte-based ANSI/ISO codepages, applicable to AnsiStrings, not Unicode codepages (BMP...). While CP_UTF16 (and its BE/LE variations) can be used to specify a concrete (string, text file...) *encoding*, they do not describe codepages (neither ANSI nor Unicode).

Note: these identifiers (names) should be used with extreme care in documentation and discussions. In most cases CP_ACP stands for the *actual* default encoding, equivalent to the value of a hypothetical *variable* named CP_ACP, i.e. currently (see below) it should be understood as DefaultSystemCodePage. It should be made clear that the value of the CP_ACP *constant* identifier (=0) is meant and usable only in a few cases, such as the declaration of a string type; it may also be acceptable in explicit conversion requests, and to denote the encoding to use in file/stream I/O, where the functions internally replace CP_ACP by the actual (DefaultSystemCodePage) value.

Note: in compiler, library and application code a value of CP_ACP should be considered equal to (be mapped into) the actual (DefaultSystemCodePage) encoding.
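
To illustrate the mapping described in the notes above, here is a minimal sketch, assuming an FPC version that already has the codepage-aware AnsiString support (CP_ACP, CP_UTF8, TSystemCodePage and DefaultSystemCodePage from the System unit). ResolveCodePage is a hypothetical helper name used only for this illustration; it is not claimed to be an RTL function.

```pascal
program acpdemo;
{$mode objfpc}{$H+}

{ Hypothetical helper (not an RTL function): replaces the generic
  CP_ACP placeholder by the encoding it currently stands for.
  All other values are returned unchanged. }
function ResolveCodePage(cp: TSystemCodePage): TSystemCodePage;
begin
  if cp = CP_ACP then
    Result := DefaultSystemCodePage
  else
    Result := cp;
end;

begin
  { The constant itself versus the value it is a placeholder for. }
  WriteLn('CP_ACP constant:       ', CP_ACP);
  WriteLn('DefaultSystemCodePage: ', DefaultSystemCodePage);
  WriteLn('Resolved CP_ACP:       ', ResolveCodePage(CP_ACP));
  { A concrete encoding passes through untouched. }
  WriteLn('Resolved CP_UTF8:      ', ResolveCodePage(CP_UTF8));
end.
```

The printed values depend on the platform settings, so no expected output is given here. CP_OEM would need the same kind of treatment, with whatever variable holds the console encoding.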

2) A platform (or Unicode library) may or may not provide its own *generic* values (constants) for application (CP_ACP) and console (CP_OEM) encoding, as well as further constants for e.g. filenames.

Note: CP_ACP is zero on Windows, possibly different on other platforms or libraries. Thus AnsiString(0) may be different from AnsiString(CP_ACP). It may be required to distinguish between a named Pascal constant CP_ACP=0, and the value of the generic application/default encoding in API calls (CP_SYS?).

3) The *actually* associated codepages are defined by the platform and can possibly be changed by the user (admin). A program may or may not be allowed to change the associated codepages, either locally (process wide) or globally (system wide).

Note: the name "DefaultSystemCodePage" should be reserved for the *system* defined codepage. When this setting can be different from an application-wide setting, another DefaultApplicationCodePage variable should be added. See the comments on Modifications and Notes on DefaultSystemCodePage in the Wiki page!

Note: a process should determine (retrieve) the platform settings *before* any attempt to interpret system-provided strings (commandline, environment variables...). Depending on the platform, further generic settings may apply to specific strings, such as filenames. In all external API calls, the RTL is responsible for the correct encoding of all string arguments, as expected by the called function. This applies in particular to CP_ACP, when this encoding can be changed inside a program to something different from the external (platform...) setting.

4) A RawByteString variable, of the static encoding CP_NONE, can hold strings of *any* dynamic encoding. No conversion is performed when a string is assigned to such a variable. In the opposite direction the standard handling should apply, i.e. different static encodings require a conversion into the static target encoding.

Note: It's known that Delphi does not always convert a RawByteString in an assignment to a variable of a different type. This flaw should be fixed in FPC. Is the corresponding Delphi behaviour *defined* anywhere?
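
A small test program for item 4 could look like the following sketch. It assumes an FPC version with codepage-aware AnsiStrings, where UTF8String is declared as AnsiString(CP_UTF8); the type name TCP1252String is made up for this example. What it prints is exactly the behaviour under discussion, so no expected output is claimed.

```pascal
program rawdemo;
{$mode objfpc}{$H+}

type
  { Illustrative name: a string type with a concrete static encoding. }
  TCP1252String = type AnsiString(1252);

var
  src: TCP1252String;
  raw: RawByteString;
  utf: UTF8String;

begin
  src := 'abc';

  { Item 4: assigning to a RawByteString should not convert, so raw
    is expected to keep the dynamic encoding of src. }
  raw := src;

  { Opposite direction: the draft asks for a conversion into the
    static target encoding. Whether a given compiler really converts
    here is the open question raised in the note. }
  utf := raw;

  WriteLn('src: ', StringCodePage(src));
  WriteLn('raw: ', StringCodePage(raw));
  WriteLn('utf: ', StringCodePage(utf));
end.
```

If the last line does not report the UTF-8 codepage, the assignment was done without conversion, which is the Delphi behaviour criticised above.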

5) Use StringCodePage to get the actual (dynamic) encoding of a string. StringCodePage never returns one of the generic values. The dynamic codepage of an unassigned (empty) string is assumed (by Delphi) to be the currently selected CP_ACP codepage for AnsiString arguments, and CP_UTF16 (or whatever is applicable) for UnicodeString arguments.

Note: while an unassigned (empty) string variable has a static encoding, known to the compiler, this encoding is unknown to StringCodePage. The overloaded Ansi/Unicode versions of StringCodePage only know about the basic string type (Ansi/Unicode) of their arguments, but cannot determine a static encoding from the nonexistent string header. That's why in this case they return the corresponding default encoding, as assumed in default type declarations, where AnsiString becomes AnsiString(CP_ACP).
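
The empty-string case can be checked with a sketch like this one, again assuming an FPC version with codepage-aware AnsiStrings and the StringCodePage overload for RawByteString arguments. The variable names are only illustrative, and the result may differ between compilers and versions, so the output is not predicted here.

```pascal
program emptydemo;
{$mode objfpc}{$H+}

var
  empty8: UTF8String;   { static encoding: UTF-8 }
  full8: UTF8String;

begin
  empty8 := '';         { no string header exists for an empty string }
  full8 := 'abc';       { header exists, dynamic encoding is stored }

  { For the empty string StringCodePage cannot read a header, so by
    the description above it falls back to a default encoding rather
    than the static UTF-8 encoding of the variable. }
  WriteLn('empty UTF8String:      ', StringCodePage(empty8));
  WriteLn('non-empty UTF8String:  ', StringCodePage(full8));
  WriteLn('DefaultSystemCodePage: ', DefaultSystemCodePage);
end.
```

Comparing the first and third lines shows whether the fallback is the current default encoding, as assumed for Delphi.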

Note: The Unicode overload is questionable, since in contrast to its name it returns an *encoding*, not a *codepage*. It should return the *native* (CPU specific BE/LE) UTF-16 encoding, used for strings declared as UnicodeString. [Currently I cannot check the applicable Delphi constants and behaviour on non-Intel platforms]

</Draft>



IMO the result is well defined: it's the string with the encoding of
that "other" codepage.

Unless you actually tested this on all platforms and noted that is the
case, you cannot state this. And if you would actually test it, you
would discover that it is wrong
(http://bugs.freepascal.org/view.php?id=22501#c61238 ).

In that discussion I found several errors which are neither detected by the compiler nor handled in the RTL. That particular entry mentions the illegal use of the *generic* CP_NONE identifier. That's why I felt a need to address several specific topics in the draft above.

DoDi

_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel