I've been reading over some of the recent discussion about reference counting problems with const string parameters. I've done some experiments and I believe that the so-called const optimization is a serious flaw, not just a corner case of questionable legitimacy. I have some sample code I will show which should be quite scary. Additionally, this is a security vulnerability. It is also a quiet bug, because it may go undetected for a long time but randomly result in unreproducible crashes.
Consider: * It does not affect just global variables but also object fields. While global variables are used rarely, object fields are used constantly in OOP. * It does not affect just strings but also interfaces and dynamic arrays. Of course, any compiler optimization is based on certain logic. The compiler can determine certain truths about the code at compile time and use those truths to avoid unnecessary operations. The basic logic behind const optimization is incorrect I believe, so I consider this a bug. Forgive me if I am mistaken, and please offer corrections, but I believe the logic to be as follows, and that it is incorrect, as I will explain. I think the logic for const optimization would go something like this: 1. Definition: A reference count is the number of variables that point to the same instance of something (strings, dynamic arrays, and interfaces). 2. By definition: The reference count will decrease whenever a variable stops pointing to that instance. This can occur if: (a) the variable goes out of scope, (b) the variable is reassigned to a different instance, or (c) the content of the variable is modified, thus triggering copy-on-write and creating a new instance, effectively triggering (b). 3. Definition (of const): Within the procedure a const parameter cannot be reassigned, it cannot be modified, and (of course) it does not go out of scope while the procedure is running. 4. Deduction from 2 and 3: During the time the procedure is running, the reference count will always be greater than or equal to the reference count when the procedure first began running. 5. Assertion (background knowledge): The reference count of the instance was greater than zero before it was passed to the procedure. 6. Deduction from 4 and 5: The reference count cannot become less than or equal to zero while the procedure is running. 7. Assertion (background knowledge): An instance's memory is deallocated only when its reference count is decremented to zero. 8. Deduction from 6 and 7: The instance cannot be freed while the procedure which received it as a const parameter is running. 9. Implication from 8: There is no need to increment and decrement the reference count upon entering and leaving the procedure. I may have glossed over a couple details but I think this is sufficiently clear. The logic is incorrect. The problem appears to be in step 4. The logic sort of conflates the notions of "instance" and "variable". While it is true that the variable, which is a const parameter, cannot be reassigned, it is NOT true that the instance itself cannot be modified. In fact other variables that refer to the instance can be changed, and the refcount can decrease. We already know this; the reason I went through all the logic is because I want to emphasize that the "optimization" of not updating the refcount for const parameters is fundamentally wrong. It is not just a corner case or a slight oversight in the language design; it is a full-fledged bug. Going back to what I said earlier, we all recognize a compiler optimization is based on determining certain truths about the code at compile time. Well, the truth behind const optimization isn't true. The question, then, is: how serious is it? A few things to consider: * It affects all OOP, even if you're not using any global variables. * It affects interfaces and dynamic arrays as well. * It is a SECURITY vulnerability. This bug makes it possible to write to memory that has been deallocated and possibly reallocated to a different use. This is a serious security flaw. * It is difficult, if not impossible, to detect. There are often enough other references to an instance that the refcount will not actually get down to zero. So the problem does not show up most of the time. But whether or not there are other references can depend on the logic of the program, user input, etc., and is essentially unpredictable. * When it happens, it will be almost impossible to diagnose. Careful examination of the code will reveal nothing. Since the bug can be triggered rarely and essentially at random, careful testing and step-by-step debug tracing may still reveal nothing. * Even when the memory is deallocated prematurely, it won't necessarily crash. Actually, the program will often continue running with the deallocated memory and appear to be working fine until that memory is allocated for something else. Thus when the program finally does crash, the original cause of the crash may have happened long ago and have no direct connection to the place where the fault finally occurred. * While examples demonstrating the behavior tend to be simple, and so it may seem that the problem should have been obvious to the programmer, in real use the problem can be nested deep in the interactions of many layers of code. It is impossible for a programmer to confidently write a correct program in Free Pascal and believe it will work. I have prepared some sample code to demonstrate how easily this can happen without anyone noticing and how subtle it can be. Please give it a try: http://vobarian.com/downloads/RefCountCrashDemo.zip Be sure to read the _README.txt file for instructions on how to do the demo. I also have sample code of crashing with global variables, interfaces, and dynamic arrays. If anyone would like to see it just let me know and I'll upload it. The next question is: what can be done? Several schemes have been proposed to test for the error. However, it seems likely that some of these may impose more overhead than simply eliminating the original "optimization". The safest and cleanest thing to do is to remove the incorrect const optimization. However then we have to worry about performance. The performance impact depends entirely on the application. I have done several string manipulation performance benchmarks, which I can also share if anyone is interested. The brief results are (in arbitrary time units; lower is better): Palindrome Testing: 166 no const 170 with const const is 2.3% slower Word Wrapping: 1318 no const 1297 with const const is 1.6% faster XML Processing (counting nodes & loading into TMemDataset, using customized XML libraries): 668 no const 575 with const const is 13.9% faster All three benchmark programs are almost entirely string manipulation. The results are somewhat inconclusive. These were brief tests, so more rigorous and well-thought-out testing would be needed to give an accurate impression. The palindrome test was consistently faster (I ran each test many times) without const, but only by a tiny bit. I can't figure out why. I probably overlooked something in the code. The XML processing takes a serious hit, but as with the palindrome testing, it was a quick test so I may have overlooked some cause of inefficiency. Of course XML is disgustingly inefficient anyway, but that's beside the point :) One thing that a performance test must also take into account is whether there are other variables that cause a try-finally to be set up. Since the overhead appears to mostly be in the try-finally, for procedures where there would have to be one anyway, the use of const probably doesn't make much difference. My knowledge of FPC internals is almost none, so I can't really offer a good suggestion. My best proposal would be to eliminate the "optimization" and come up with a much lighter form of exception handling to make up for the speed. I am skeptical of any attempt to keep the optimization but check for errors because I think there is a good chance that there will be some oversights, some cases nobody thought of, not to mention the fact that it increases complexity. - Chad _______________________________________________ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-devel