I have some observations on the discussion so far. The biggest question is what the intended behavior is.

Martin wrote:
Well, I have pointed out myself, in my mail, that it probably needs
more documentation. I do not know if it is documented or not.

But it is the answer, I have gotten several times from developers in
the FPC team. So for all I know it is the intended behaviour. At
least intended in FPC (and apparently either intended or at least
implemented in Delphi). So if there is no documentation for it, then
it would appear a problem of documentation, rather than a bug in the
compiler (Again all based on the statements I was given)

Thaddy wrote:
It is a contract between the compiler and the programmer in which it
is expected that the string will not be modified inside a procedure,
function or method.

This is the crux of the controversy. I realized this when I was writing the original post, but did not mention it explicitly because I thought it would come up anyway.

The difference between a feature and a bug is the specification. Here the specification is the documentation. I have not found any documentation in either FPC or Delphi stating that there is an implicit contract whereby the programmer promises not to modify other variables which happen to refer to the same instance as a const parameter. Many people have repeatedly stated that this is the programmer's fault. If there is such an implicit agreement with the programmer, then yes, I agree with these statements and I believe it is not a compiler bug (although certainly not good language design).

However I am looking for documentation. Has anyone found anything yet? If anyone can find anything I would be pleased as it would settle the question. But lacking any documentation, I don't see how anyone should know there is an implicit contract. To me, a const parameter means that you cannot modify that parameter by pointing it to something else, nor (in the case of strings and dynamic arrays) alter the contents by means of said parameter. (Although, you can't really "alter" the contents of the instance, as copy-on-write simply creates a whole new instance.) That's what it means in other languages I've used, and nothing more. I don't see it as implying anything else. Furthermore, as many examples have shown, the programmer often /cannot/ know whether several variables refer to the same instance, since the handling of creating and destroying instances, copy-on-write, etc., is handled by the compiler and is considered an implementation detail that should be opaque to the programmer.

I don't know how many of you have actually looked at the demo I posted, but here is the crucial part. The demo program contains this line:

FCurrentDriverName := Edit1.Text;

In this state, the program works perfectly. However, if this line is changed as follows:

FCurrentDriverName := Edit1.Text + 'abc';

the program then crashes. IMHO, this is very scary. All you have to do is make a tiny, seemingly harmless change and suddenly a working program crashes. Also in the demo you will notice that the programmer doesn't even call a function that takes a const parameter; the problem is caused by setting a property, and it just so happens that behind the scenes the property's setter takes a const parameter.
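For readers who have not run the demo, the failure mode can be reduced to a few lines. This is only a minimal sketch, not the posted demo: the names Current and Report are hypothetical stand-ins, and it assumes the usual FPC behaviour that a const AnsiString parameter does not take its own reference.

```pascal
program ConstParamDemo;
{$mode objfpc}{$H+}

uses
  SysUtils;

var
  Current: AnsiString;  // hypothetical stand-in for a field such as FCurrentDriverName

// Hypothetical setter-like routine that takes a const string.
procedure Report(const S: AnsiString);
begin
  // Reassigning the global drops the only counted reference to the
  // instance S still points to (assumption: const S holds no reference).
  Current := 'replaced ' + IntToStr(Length(S));
  // S may now refer to freed memory. The result is undefined:
  // it may print garbage, appear to work, or crash.
  WriteLn(S);
end;

begin
  // Built at run time, so the value lives on the heap with a reference count.
  Current := IntToStr(42) + 'abc';
  Report(Current);
end.
```

No expected output is given because the behaviour is not defined. Whether the original value, garbage, or a crash appears depends on what the memory manager does with the freed block.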

Unless there is some documentation I am unaware of, I don't agree with the implicit contract theory. Instances and variables are not the same. People confuse them because that is actually exactly the point of the Pascal construct: the compiler creates the illusion that each variable is an instance. This is why there is copy on write; so you can modify a variable and it doesn't modify other variables that (prior to modification) referred to the same instance. But again it is only an illusion. In the implementation, a variable (or parameter) and an instance of an automatic type are not the same, and that is where the problem is rooted. The management of these is an opaque implementation detail. The programmer cannot be expected to know whether or not the compiler has chosen to use the same instance for two variables/parameters, and yet that is what the implicit contract theory states.
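The illusion described above is easy to observe. The following is a small sketch, with illustrative variable names, of what the programmer sees after assigning one AnsiString to another and then writing to one of them.

```pascal
program CowDemo;
{$mode objfpc}{$H+}

uses
  SysUtils;

var
  A, B: AnsiString;
begin
  B := IntToStr(123) + 'abc';  // run-time value, so a heap instance
  A := B;                      // A and B may now share one instance
  A[1] := 'X';                 // the write gives A its own private copy first
  WriteLn(A);                  // prints X23abc
  WriteLn(B);                  // prints 123abc
end.
```

At no point does the source code say whether A and B share an instance. The programmer only sees two independent variables.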

As in C, Java (with final), etc., declaring a variable or parameter const constrains only that variable or parameter; i.e., I can't change it to point to something else. It doesn't carry any implications about other variables that may be pointing to the same instance. If the "implicit const contract" is indeed true, then I agree there is not a compiler bug, just a poorly conceived language feature (please note I most certainly am not trying to blame anyone for it though).

The best I have found so far, which is still somewhat ambiguous, is

http://docwiki.embarcadero.com/RADStudio/en/Parameters_%28Delphi%29#Constant_Parameters

which says, "A constant (const) parameter is like a local constant or read-only variable. Constant parameters are similar to value parameters, except that you can't assign a value to a constant parameter within the body of a procedure or function, nor can you pass one as a var parameter to another routine."

Although I acknowledge it is a bit vague, this seems to me to be leaning heavily in the direction that there is no implicit contract. As it says, just think about how a local const variable would behave. Additionally, note that it says you can't assign a value /within the body/ of the routine. There is no constraint mentioned about modifying other variables in other routines. However I acknowledge that the absence of a statement is not proof.

So, the best evidence I have against the implicit contract theory is:
1. By analogy with all the other languages I'm familiar with (C, C++, Java)
2. The fact that if the theory were true it would be a terrible language idea
3. The fact that the Embarcadero documentation likens a const parameter to a local constant or read-only variable.
4. Since using a single instance for multiple variables, copy-on-write, and reference counting are supposed to be opaque implementation details that provide the illusion that every variable is its own unique instance, it does not make any sense to believe the programmer can know when the compiler has chosen to share an instance with other variables, such that the programmer can determine whether or not const is safe in a particular situation.

Again, if there is prior documentation to the contrary, I will gladly be proven wrong, and we can move on to worry about practical steps to mitigate the possibility of problems rather than whether it is a bug. However if there is no such specification, then really what we have here is an undefined aspect of the language. In that case, seeing that it needs clarification, I would propose that a sane language design decision be made by rejecting the implicit contract concept.

Michael wrote:
You can always fool the compiler. The compiler trusts you and assumes
that what you tell her is true...

Yes, of course you can always fool the compiler; it just shouldn't be
the other way around. The example you gave is very different for one
very important reason: it explicitly allocates and frees an object.
With strings, the programmer does not, and cannot, explicitly
allocate or deallocate the resources, and the problem lies in the
behavior of the automatic allocation and deallocation. Thus there is
nothing in common between this example and the problem at hand.

Alexander wrote:
So far, we have found two instances of procedures with non-const
string parameters with the comment to the effect of "do not change to
const, or the code will break". One of these instances is fairly
recent, and the developer who made the change did remember that it
cost him more than a work-day to fix -- and that was after the crash
report from the client. The company is now considering outright ban
on all const string parameters,

Wimpie wrote:
If it helps, I can remember fixing this bug. It took two days...
...
So yes, it is very scary to me and would like it to be fixed.

Thank you for providing some corroborating evidence that there is in
fact real-world danger here and that it is worth the time to consider.

Florian wrote:
It affects more types, even shortstring suffers from it

I must respectfully disagree. In the case of ShortString, the value seen through the const parameter does change, but that is to be expected. If my understanding is correct (and I'm open to being corrected), the semantics of ShortString are different. With AnsiString, assigning one string variable to another is supposed to create the illusion that they are unique instances, even though they may share one behind the scenes. Hence there is copy-on-write. A ShortString has no reference count and no hidden sharing: a const ShortString parameter that is passed by reference is simply another name for the caller's variable, so a change to that variable is visible through the parameter. Again this comes back to the difference between instance and variable, and the illusion implicit in AnsiString and dynamic arrays, which I think is not the case with ShortString (but again I could be wrong).

The problem at issue here is the fact that the compiler can actually free memory prematurely. In the case of shortstring, it won't crash. The programmer may have tricked himself, but the compiler isn't doing anything unexpected. In the case of reference counted types (strings, dynamic arrays, and interfaces), the compiler is doing something quite unpredictable. Again, however, this depends on the specification/documentation. If there is in fact a constraint the programmer must follow of not modifying other variables that by chance point to the same instance, then I would agree there is no problem with ref counted types either and that the programmer has made an error. I just question that assumption.
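The ShortString case discussed above can be sketched as follows. The names G and Show are hypothetical, and the sketch assumes the compiler passes a const ShortString by reference, which it is permitted but not obliged to do.

```pascal
program ShortStrDemo;
{$mode objfpc}

var
  G: ShortString;

// Hypothetical routine; S may alias G if passed by reference.
procedure Show(const S: ShortString);
begin
  G := 'changed';
  // If S aliases G, the new contents are visible through S.
  // Either way no memory is freed, so this read is always valid.
  WriteLn(S);
end;

begin
  G := 'original';
  Show(G);
end.
```

Whichever value is printed, the program reads valid memory. That is the difference from the reference counted case, where the same pattern can leave the parameter pointing at a freed instance.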

Alexander wrote:
The documentation should recommend users to never use "const
string" parameters
except cases where they can prove it to be safe.
Moreover, since the safety of such code may be compromised by
completely unrelated changes
(e.g. by adding {$INLINE OFF} for debugging purposes),
"const" modifier should not be used at all except the most
performance-critical code.

If the compiler is actually working correctly and there is an unstated (perhaps even undocumented) constraint on the programmer to not modify other variables that use the same instance as a const parameter (even though the programmer cannot know that), then I fully agree with this advice.

In fact, with the implicit contract theory, I propose that it would *never* be safe to use const, because the programmer cannot control the implementation of the ref counted types. It is an explicit part of the design that the programmer should not, and indeed cannot, be aware of when/where the compiler allocates, deallocates, or copies instances of any particular ref counted type. Therefore, semantically there can never be a guarantee that using const is safe, if the implicit contract is accepted. I stress, instances and variables are not the same. The relationship between them is supposed to be an opaque implementation detail. That is part of the reason I do not believe in this implicit contract about handling consts.

Now I will acknowledge that const can be used in certain limited situations without harm. I still stand by the claim that it is semantically incorrect to ever use const, and I say this on the basis that if const safety depends on the automatic type implementation and the automatic type implementation is opaque, it is theoretically wrong to believe const can ever be used safely. However, in practical terms we do know that passing a string which is held in a local variable of the calling routine cannot do harm, because that local variable keeps a reference for the whole duration of the call, so the refcount will always be at least 1.
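The practical exception just described can be sketched like this. It is an illustration under the same assumptions as before, with hypothetical names (Current, Report, SafeCall), and not a recommendation from the FPC team.

```pascal
program LocalCopyDemo;
{$mode objfpc}{$H+}

uses
  SysUtils;

var
  Current: AnsiString;  // hypothetical shared variable

// Hypothetical routine that reassigns the global while S is in use.
procedure Report(const S: AnsiString);
begin
  Current := 'replaced';
  // The caller's local variable still holds a reference,
  // so the instance behind S has not been freed.
  WriteLn(S);
end;

procedure SafeCall;
var
  Local: AnsiString;
begin
  Local := Current;  // Local takes its own counted reference
  Report(Local);     // pass the local, not the global
end;

begin
  Current := IntToStr(42) + 'abc';
  SafeCall;
end.
```

The cost is one extra reference count increment and decrement per call, which is roughly what a non-const value parameter would have done anyway.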

To summarize:

1. To the programmer, each AnsiString and dynamic array variable is supposed to be unique, i.e., after doing A := B, modifying A does not affect B and vice versa.

2. The fact that multiple variables (or parameters) actually can refer to a single instance is an implementation optimization. That is of no concern to the programmer. The optimization is implemented by the combination of reference counting and copy-on-write.

3. The programmer cannot be aware of what the compiler decides to do regarding how it implements reference counting and copy-on-write. The programmer should simply know that unique variables are unique instances for all practical purposes (except var parameters obviously, since that is the whole point of having to declare it var).

4. If the programmer cannot be aware of when an instance is shared with multiple variables, an implicit contract that the programmer must not modify other variables which by chance are using the same instance is impossible to obey, and therefore const would be useless.

As further support for #1, consider: With a class, you do not have to pass it to a procedure as var, but when you modify the instance, you modify the same instance as was passed to the procedure. That is how classes work. Having a variable point to the same object as another variable does in fact mean it's the same instance, the same object. That is Object Pascal's object model. But with strings, if you want to modify the string in a procedure and have that affect the argument initially passed to the procedure you *have* to use var. That alone should be convincing evidence that the programmer is always supposed to be able to assume that unique variables are unique instances for automatically managed strings and arrays.
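The contrast between classes and strings can be shown in a short sketch. TBox and the three procedures are hypothetical names made up for the illustration.

```pascal
program ParamSemanticsDemo;
{$mode objfpc}{$H+}

type
  // Hypothetical class with a single field.
  TBox = class
    Value: Integer;
  end;

procedure ChangeBox(B: TBox);
begin
  B.Value := 2;   // B refers to the caller's object, so the caller sees this
end;

procedure ChangeByValue(S: AnsiString);
begin
  S := S + '!';   // changes only the routine's own S
end;

procedure ChangeByVar(var S: AnsiString);
begin
  S := S + '!';   // changes the caller's variable
end;

var
  Box: TBox;
  Msg: AnsiString;
begin
  Box := TBox.Create;
  try
    Box.Value := 1;
    ChangeBox(Box);
    WriteLn(Box.Value);  // prints 2
  finally
    Box.Free;
  end;

  Msg := 'abc';
  ChangeByValue(Msg);
  WriteLn(Msg);          // prints abc
  ChangeByVar(Msg);
  WriteLn(Msg);          // prints abc!
end.
```

Only the var version lets the routine affect the caller's string, whereas the object is shared without any modifier.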

If I may offer one other comment, I think it would be best to reserve comments on TStrings/TStringList and other libraries to a separate conversation and focus on the basic behavior of the compiler or else the conversation will get too confusing.
_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel