Re: [fpc-devel] Const optimization is a serious bug

Chad Berchek Wed, 06 Jul 2011 22:44:59 -0700

I have some observations on the discussion so far. The biggest question is what the intended behavior is.


Martin wrote:

Well, I have pointed out myself, in my mail, that it probably needs
more documentation. I do not know if it is documented or not.


But it is the answer, I have gotten several times from developers in
the FPC team. So for all I know it is the intended behaviour. At
least intended in FPC (and apparently either intended or at least
implemented in Delphi). So if there is no documentation for it, then
it would appear a problem of documentation, rather than a bug in the
compiler (Again all based on the statements I was given)


Thaddy wrote:

It is a contract between the compiler and the programmer in which it
is expected that the string will not be modified inside a procedure,
function or method.

This is the crux of the controversy. I realized this when I was writing the original post, but did not mention it explicitly because I thought it would come up anyway.

The difference between a feature and a bug is the specifications. Here the specifications are the documentation. I have not found any documentation in either FPC or Delphi that there is some implicit contract whereby the programmer promises not to modify other variables which happen to refer to the same instance as a const parameter. Many people have repeatedly stated that this is the programmer's fault. If there is an implicit agreement with the programmer, then yes I agree with these statements and I believe it is not a compiler bug (although certainly not good language design).

However I am looking for documentation. Has anyone found anything yet? If anyone can find anything I would be pleased as it would settle the question. But lacking any documentation, I don't see how anyone should know there is an implicit contract. To me, a const parameter means that you cannot modify that parameter by pointing it to something else, nor (in the case of strings and dynamic arrays) alter the contents by means of said parameter. (Although, you can't really "alter" the contents of the instance, as copy-on-write simply creates a whole new instance.) That's what it means in other languages I've used, and nothing more. I don't see it as implying anything else. Furthermore, as many examples have shown, the programmer often /cannot/ know whether several variables refer to the same instance, since the handling of creating and destroying instances, copy-on-write, etc., is handled by the compiler and is considered an implementation detail that should be opaque to the programmer.

I don't know how many of you have actually looked at the demo I posted, but here is the crucial part. The demo program contains this line:


FCurrentDriverName := Edit1.Text;

In this state, the program works perfectly. However, if this line is changed as follows:


FCurrentDriverName := Edit1.Text + 'abc';

the program then crashes. IMHO, this is very scary. All you have to do is make a tiny, harmless change and suddenly a working program crashes. Also in the demo you will notice that the programmer doesn't even call a function that takes a const parameter; the problem is caused by setting a property, and it just so happens that behind the scenes the property's setter takes a const parameter.
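To make the setter scenario concrete, here is a minimal sketch. It is not the code from the demo: the class TDriverHolder, its Name property, and the setter body are hypothetical, chosen only to show how a plain-looking property assignment can hand a const parameter an instance that the setter itself then releases. What the program prints, or whether it crashes, is not defined and may differ between compiler versions and settings.

```pascal
program ConstSetterSketch;
{$mode objfpc}{$H+}

uses
  SysUtils;

type
  { Hypothetical class for illustration only; not taken from the demo. }
  TDriverHolder = class
  private
    FName: AnsiString;
    procedure SetName(const AValue: AnsiString);
  public
    property Name: AnsiString read FName write SetName;
  end;

procedure TDriverHolder.SetName(const AValue: AnsiString);
begin
  // A const parameter does not take its own reference. If AValue shares
  // its instance with FName, clearing FName can drop the last reference
  // and free the data that AValue still points at.
  FName := '';
  FName := AValue;  // may read freed memory; the result is not defined
end;

var
  Holder: TDriverHolder;
begin
  Holder := TDriverHolder.Create;
  try
    // Concatenation forces a heap-allocated string rather than a literal.
    Holder.Name := 'driver' + IntToStr(1);
    // Looks harmless, but can pass FName's own instance to the setter.
    Holder.Name := Holder.Name;
    WriteLn(Holder.Name);  // output is not guaranteed
  finally
    Holder.Free;
  end;
end.
```

Nothing in the calling code mentions const; the hazard is visible only in the setter's declaration.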

Unless there is some documentation I am unaware of, I don't agree with the implicit contract theory. Instances and variables are not the same. People confuse them because that is actually exactly the point of the Pascal construct: the compiler creates the illusion that each variable is an instance. This is why there is copy on write; so you can modify a variable and it doesn't modify other variables that (prior to modification) referred to the same instance. But again it is only an illusion. In the implementation, a variable (or parameter) and an instance of an automatic type are not the same, and that is where the problem is rooted. The management of these is an opaque implementation detail. The programmer cannot be expected to know whether or not the compiler has chosen to use the same instance for two variables/parameters, and yet that is what the implicit contract theory states.

As in C, Java, etc., if I have a const variable, that means it's a const variable/parameter; i.e., I can't change it to point to something else. It doesn't carry any implications about other variables that may be pointing to the same instance. If the "implicit const contract" is indeed true, then I agree there is not a compiler bug, just a poorly conceived language feature (please note I most certainly am not trying to blame anyone for it though).


The best I have found so far, which is still somewhat ambiguous, is

http://docwiki.embarcadero.com/RADStudio/en/Parameters_%28Delphi%29#Constant_Parameters

which says, "A constant (const) parameter is like a local constant or read-only variable. Constant parameters are similar to value parameters, except that you can't assign a value to a constant parameter within the body of a procedure or function, nor can you pass one as a var parameter to another routine."

Although I acknowledge it is a bit vague, this seems to me to be leaning heavily in the direction that there is no implicit contract. As it says, just think about how a local const variable would behave. Additionally, note that it says you can't assign a value /within the body/ of the routine. There is no constraint mentioned about modifying other variables in other routines. However I acknowledge that the absence of a statement is not proof.


So, the best evidence I have against the implicit contract theory is:
1. By analogy with all the other languages I'm familiar with (C, C++, Java)

2. The fact that if the theory were true it would be a terrible language idea

3. The fact that the Embarcadero documentation likens a const parameter to a local const or const variable.

4. Since using a single instance for multiple variables, copy-on-write, and reference counting are supposed to be opaque implementation details that provide the illusion that every variable is its own unique instance, it does not make any sense to believe the programmer can know when the compiler has chosen to share an instance with other variables, such that the programmer can determine whether or not const is safe in a particular situation.

Again, if there is prior documentation to the contrary, I will gladly be proven wrong, and we can move on to worry about practical steps to mitigate the possibility of problems rather than whether it is a bug. However if there is no such specification, then really what we have here is an undefined aspect of the language. In that case, seeing that it needs clarification, I would propose that a sane language design decision be made by rejecting the implicit contract concept.


Michael wrote:

You can always fool the compiler. The compiler trusts you and assumes
that what you tell her is true...


Yes, of course you can always fool the compiler, it just shouldn't be
the other way around. The example you gave is very different for one
very important reason: you show using explicit allocating and freeing of
an object. With strings, the programmer does not, and cannot, explicitly
allocate or deallocate the resources, and the problem lies in the
behavior of the automatic allocation and deallocation. Thus there is
nothing in common between this example and the problem at hand.

Alexander wrote:

So far, we have found two instances of procedures with non-const
string parameters with the comment to the effect of "do not change to
const, or the code will break". One of these instances is fairly
recent, and the developer who made the change did remember that it
cost him more than a work-day to fix -- and that was after the crash
report from the client. The company is now considering outright ban
on all const string parameters,


Wimpie wrote:

If it helps, I can remember fixing this bug. It took two days...

...

So yes, it is very scary to me and would like it to be fixed.


Thank you for providing some corroborating evidence that there is in
fact real-world danger here and that it is worth the time to consider.

Florian wrote:

It affects more types, even shortstring suffers from it

I must respectfully disagree. In the case of shortstring, the value of the const parameter does get modified, but that is to be expected. If my understanding is correct (and I'm open to be corrected), the semantics of ShortString are different. With AnsiString, assigning one string variable to another is supposed to create the illusion that they are unique instances. Hence there is copy-on-write. With short strings, assigning one to another literally means they are the same instance. Again this comes back to the difference between instance and variable, and the illusion implicit in AnsiString and dynamic arrays, which I think is not the case with ShortString (but again I could be wrong).

The problem at issue here is the fact that the compiler can actually free memory prematurely. In the case of shortstring, it won't crash. The programmer may have tricked himself, but the compiler isn't doing anything unexpected. In the case of reference counted types (strings, dynamic arrays, and interfaces), the compiler is doing something quite unpredictable. Again, however, this depends on the specification/documentation. If there is in fact a constraint the programmer must follow of not modifying other variables that by chance point to the same instance, then I would agree there is no problem with ref counted types either and that the programmer has made an error. I just question that assumption.
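For comparison, a minimal sketch of the shortstring case, with hypothetical names (GShort, ShowShort) that do not come from the thread. A ShortString variable stores its characters directly in the variable, with no reference count and no heap block. The sketch assumes the compiler passes a const ShortString by reference, which is an assumption about the calling convention, not something the language guarantees. Under that assumption the routine can observe the changed value, but nothing is freed, so there is no dangling pointer.

```pascal
program ShortStringSketch;
{$mode objfpc}{$H+}

var
  GShort: ShortString;  // hypothetical global, for illustration only

procedure ShowShort(const S: ShortString);
begin
  // Modify the variable that was passed as the const argument.
  GShort := 'changed';
  // If S was passed by reference (assumed, not guaranteed), this shows
  // the new contents. Either way, S refers to valid storage: no memory
  // was released by the assignment above.
  WriteLn(S);
end;

begin
  GShort := 'original';
  ShowShort(GShort);
end.
```

The surprise here is limited to which value is seen; the AnsiString case differs because the assignment can release the heap block the const parameter points at.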


Alexander wrote:

The documentation should recommend users to never use "const
string" parameters
except cases where they can prove it to be safe.
Moreover, since the safety of such code may be compromised by
completely unrelated changes
(e.g. by adding {$INLINE OFF} for debugging purposes),
"const" modifier should not be used at all except the most
performance-critical code.

If the compiler is actually working correctly and there is an unstated (perhaps even undocumented) constraint on the programmer to not modify other variables that use the same instance as a const parameter (even though the programmer cannot know that), then I fully agree with this advice.

In fact, with the implicit contract theory, I propose that it would *never* be safe to use const, because the programmer cannot control the implementation of the ref counted types. It is an explicit part of the design that the programmer should not, and indeed cannot, be aware of when/where the compiler allocates, deallocates, or copies instances of any particular ref counted type. Therefore, semantically there can never be a guarantee that using const is safe, if the implicit contract is accepted. I stress, instances and variables are not the same. The relationship between them is supposed to be an opaque implementation detail. That is part of the reason I do not believe in this implicit contract about handling consts.

Now I will acknowledge that const can be used in certain limited situations without harm. I still stand by the claim that it is semantically incorrect to ever use const, and I say this on the basis that if const safety depends on the automatic type implementation and the automatic type implementation is opaque, it is theoretically wrong to believe const can ever be used. However in practical terms we do know that passing a string which is *created* as a local variable in a routine cannot do harm because it will always have at least 1 refcount.


To summarize:

1. To the programmer, each AnsiString and dynamic array variable is supposed to be unique, i.e., after doing A := B, modifying A does not affect B and vice versa.

2. The fact that multiple variables (or parameters) actually can refer to a single instance is an implementation optimization. That is of no concern to the programmer. The optimization is implemented by the combination of reference counting and copy-on-write.

3. The programmer cannot be aware of what the compiler decides to do regarding how it implements reference counting and copy-on-write. The programmer should simply know that unique variables are unique instances for all practical purposes (except var parameters obviously, since that is the whole point of having to declare it var).

4. If the programmer cannot be aware of when an instance is shared with multiple variables, an implicit contract that the programmer cannot modify other variables which by chance are using the same instance is impossible to obey, and therefore const would be useless.
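Point 1 can be checked directly for AnsiString with a small sketch (the program and variable names are only illustrative). After A := B the two variables may share one instance internally, but writing through A makes a private copy first, so B is unaffected.

```pascal
program CopyOnWriteSketch;
{$mode objfpc}{$H+}

var
  A, B: AnsiString;
begin
  B := 'hello';
  A := B;        // A and B may now share a single instance internally
  A[1] := 'J';   // writing through A makes A unique before the change
  WriteLn(A);    // prints Jello
  WriteLn(B);    // prints hello
end.
```

The sharing never becomes visible to the program, which is the illusion described in points 1 and 2.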

As further support for #1, consider: With a class, you do not have to pass it to a procedure as var, but when you modify the instance, you modify the same instance as was passed to the procedure. That is how classes work. Having a variable point to the same object as another variable does in fact mean it's the same instance, the same object. That is Object Pascal's object model. But with strings, if you want to modify the string in a procedure and have that affect the argument initially passed to the procedure you *have* to use var. That alone should be convincing evidence that the programmer is always supposed to be able to assume that unique variables are unique instances for automatically managed strings and arrays.
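The contrast between classes and strings can be sketched as follows. The type TBox and the three procedures are hypothetical names introduced only for this illustration.

```pascal
program ValueVsReferenceSketch;
{$mode objfpc}{$H+}

type
  { Hypothetical class for illustration only. }
  TBox = class
    Value: Integer;
  end;

procedure ChangeBox(B: TBox);
begin
  B.Value := 42;   // changes the caller's object: same instance
end;

procedure ChangeStr(S: AnsiString);
begin
  S := S + '!';    // changes only the local S; the caller's string is untouched
end;

procedure ChangeStrVar(var S: AnsiString);
begin
  S := S + '!';    // var is required for the change to reach the caller
end;

var
  Box: TBox;
  Str: AnsiString;
begin
  Box := TBox.Create;
  try
    Box.Value := 1;
    ChangeBox(Box);
    WriteLn(Box.Value);  // prints 42
  finally
    Box.Free;
  end;

  Str := 'abc';
  ChangeStr(Str);
  WriteLn(Str);          // prints abc
  ChangeStrVar(Str);
  WriteLn(Str);          // prints abc!
end.
```

A class variable is openly a reference to a shared object, while a string variable behaves like a value even though the implementation may share instances behind the scenes.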

If I may offer one other comment, I think it would be best to reserve comments on TStrings/TStringList and other libraries to a separate conversation and focus on the basic behavior of the compiler or else the conversation will get too confusing.

_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel
