Jonas Maebe wrote:

Technically, that section literally states that they will be
concatenated without data loss and that the result is then converted to
the target string's encoding (except in case the target is
RawByteString). How that is implemented exactly is undefined; again in
the meaning of "undefined", not in the meaning of "undefined when
defined as meaning X".

In this case the implementation is "compiler specific", which is somewhat different from "undefined" as it is used for RawByteString: "CP_NONE: this value indicates that no code page information has been associated with the string data. The result of any explicit or implicit operation that converts this data to another code page is undefined."

IMO the result is well defined: it's the string with the encoding of that "other" codepage. An "undefined" result, as I understand it, would mean "the result can be anything, unrelated to the function input".

Likewise, the branch taken by an IF statement is not "undefined" merely because it depends on the actual value of the condition.

The initial value of a local variable is "undefined", i.e. it can be any value. But after an assignment it *is* defined, even if that value may still be *unpredictable* by static code analysis.

IMO a better wording should be found, one that does not cause the confusion that some readers obviously have now.


Regarding RawByteStrings there has been the definition "a RawByteString
has exactly the same behavior as assigning that AnsiString(X) to another
AnsiString(X) variable with the same value of X: no code page conversion
or copying occurs". Seemingly this is not true for the intermediate
results of concatenations.

That paragraph only specifies that code page-aware strings are
concatenated without data loss, and then defines to which code page the
result will be converted before assigning it to the target.

What's the meaning of "no copying occurs"? Of course the reference to the string is copied into the target variable!

What does "the same value of X" mean in the case of AnsiString(CP_ACP) and AnsiString(DefaultSystemCodePage)?


Even if the intermediary result of a concatenation would be a
RawByteString (which is not stated nor necessarily ever the case), then
the above would apply and hence the (dynamic) code page of that
RawByteString would be the one as defined by the above-mentioned rules
before it would be assigned to the target.

Please note that the other statements refer to *static* encodings, hence my question about the (assumed) static encoding of an intermediate result. When the compiler inserts a conversion request based on *static* encodings, will it or will it not insert such a request before an intermediate result is assigned to the target variable?


Suggestion:

"During string operations the source strings are converted [to CP_ACP?] when they have a different [dynamic?] encoding. When the result is stored in a variable, it is converted as required by the static encoding of the target."

Where "as required" means that a static target encoding of CP_ACP is replaced by the DefaultSystemCodePage, while CP_NONE does not require a conversion.

The CP_ACP case should be clarified as well, because it's unclear whether CP_ACP(=0) is *considered* equal to the current DefaultSystemCodePage, even if both values are *always* different (see above). The use of "CP_ACP" instead of "DefaultSystemCodePage" can be confusing and should be avoided, or clarified beforehand.
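
To see the distinction on a concrete system, a small test program along the following lines could be used. This is only a sketch, assuming a compiler version that already has code page-aware AnsiStrings; the program name is arbitrary, and no particular output is claimed, since the printed values depend on the platform and on the compiler version.

```pascal
program cpacp_check;
{$mode objfpc}{$H+}

var
  s: AnsiString;  // declared without a code page: static encoding CP_ACP
begin
  s := 'abc';
  // CP_ACP is a constant placeholder; DefaultSystemCodePage is a variable
  // holding the code page that is currently in effect.
  WriteLn('CP_ACP                = ', CP_ACP);
  WriteLn('DefaultSystemCodePage = ', DefaultSystemCodePage);
  // Dynamic code page that is actually stored with the string data.
  WriteLn('StringCodePage(s)     = ', StringCodePage(s));
end.
```

Comparing the three numbers shows whether the placeholder value, the system default and the dynamic code page of a plain AnsiString coincide on that system.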

Perhaps it would help to concentrate on the following steps:
1) (string) operand fetch
2) (string) operations
3) (string) assignment

1) Fetching an operand removes any information about the static encoding of the source; only its dynamic encoding persists. [Now the handling of non-AnsiString sources (literals, ShortString etc.) can be explained.
RawByteString is not special here, it's only a static encoding.
]

2) String operations take into account the dynamic encoding of their operands, with lossless conversions inserted as required.

3) When a string is assigned to a variable, it is converted, if necessary, as required by the static encoding of the target, with possible data loss.
[about "required" see above.
Special case: when the source is a variable, no conversion occurs when the *static* source and target types are "compatible".
What exactly is compatible with CP_ACP?
]
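
The three steps above can be observed with a test program like the following. It is a sketch only: the type names TStr1252 and TStr866 are made up for this example, the code pages 1252 and 866 are arbitrary choices, and the output is deliberately not stated because it is exactly the behavior under discussion.

```pascal
program concat_steps;
{$mode objfpc}{$H+}

type
  // Hypothetical type names, chosen for this example only.
  TStr1252 = type AnsiString(1252);
  TStr866  = type AnsiString(866);

var
  a: TStr1252;
  b: TStr866;
  raw: RawByteString;  // static encoding CP_NONE
  utf: UTF8String;     // static encoding CP_UTF8
begin
  // ASCII-only content, so no conversion can lose data here.
  a := 'abc';
  b := 'def';

  // Steps 1 and 2: operands with different code pages are fetched and
  // concatenated. Step 3: the result is stored in targets that differ
  // only in their static encoding.
  raw := a + b;
  utf := a + b;

  // Print the dynamic code page of every string involved.
  WriteLn('a:   ', StringCodePage(a));
  WriteLn('b:   ', StringCodePage(b));
  WriteLn('raw: ', StringCodePage(raw));
  WriteLn('utf: ', StringCodePage(utf));
end.
```

The value printed for raw shows which dynamic code page the concatenation result carries when the target requires no conversion. For non-ASCII content on Unix-like systems a widestring manager (for example the cwstring unit) is probably needed before the conversions are meaningful.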

DoDi

_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
