On 11 Sep 2010, at 00:21, Chris Ball wrote: > Whilst I understand what you are saying, my practical experience with our code > suggests there are subtleties perhaps within GNUstep that make this assumption > unsafe.
If there are, then the bugs are in other code. Running the clang static analyser found a few potential user-after-release bugs in GNUstep, so it's possible. > Thread 1: > result = GSAtomicDecrement((gsatomic_t)&(((obj)anObject)[-1].retained)); > where result = -1 The object's last reference has just gone away. There should now be no pointers to the object. If there are, then it's an application bug. > now switch to Thread 2: > result = > GSAtomicDecrement((gsatomic_t)&(((obj)anObject)[-1].retained)); > if (result < 0) > { > if (result != -1) > { > [NSException raise: NSInternalInconsistencyException > format: @"NSDecrementExtraRefCount() decremented too far"]; > } > > Death due to exception. Correct. You have done something like this: thread 1: a = [objc new]; ... [a release]; thread 2: [a release]; If the retain count of an object ever drops below 0, this means that it has been sent one too many release messages. As I said, the purpose of the exception is to try to make debugging this easier. In a single-threaded application, or the same application used a single CPU or where the threads didn't overlap in this way, then the sequence would be: thread 1 sends -release. thread 1 sends handles -dealloc thread 1 object is destroyed. thread 2 -release message is sent to invalid address. Program crashes. This kind of bug is masked by using the GNUstep DESTROY() macro on a global (or an ivar on an object that is shared between two threads). In this case, you have something roughly equivalent to this: [a release]; a = nil; If two threads do this with a very small gap between them, nothing will go wrong. The second thread will send -release to nil, where it will be ignored. If, however, the preemption happens between these two lines, the second thread will send -release to an object with a retain count of 0. It is possible to write a lockless, thread-safe, version of DESTROY(), but you can not use the GNUstep one (or something equivalent) in this way. > Other variations of course exist, ie thread one can get through the first two > if's and switch to the other thread just after resetting the count to zero, > now > thread one will return YES as will thread 2. Again, only possible if there is a bug outside this code. This code is not intended to catch all possible bugs in user code, just the small subset that it is possible to catch. > Further exercises are left to the reader. > > If your contention is that only one thread is managing the retain count and > releasing an object, then you don't need an atomic decrement at all but that > clearly isn't true given that any thread can do retain/release an otherwise > autoreleased object. That is not my contention at all. > I should point out that this isn't academic, we have a home rolled database > that has used GNUstep since around 2002 and processes between 500 million and > a > billion messages per day and before I disabled this GSATOMIC stuff we would > process tops around 30 million messages before the system fell over in one > weird way or another (this on a 32 core machine). In our application the > place > I most consistently found the code to fail is where we are transferring > messages > via a queue from one thread to another so we have an autorelease message > created in thread 1, which is then retained for transfer to the queue (it is > retained again inside the queue because the queue itself is threadsafe) then > the other thread pulls the message from the queue (queue releases the message) > we do stuff with it and then call release (at which point we don't care about > the message anymore) and presumably thread one reaps the autoreleased memory. Sounds like you have an issue where the object is already over-autoreleased. Try running the clang static analyser on your code. > Now I'm not proud, this system all told, is around 300k lines of code, of > course > it has bugs, having said this the program doesn't leak the queued messages (we > don't have terrabytes of ram so we would notice) and I would expect wild > double > releases in our code to have at least caused a few otherwise unexplained > application crashes over the years given that the place that fails is called > once per message, actually I wouldn't expect the system to last more than a > minute or two tops given the memory churn that the system goes through. > > We played with a couple simple gnustep programs using the Intel thread checker > tool (http://software.intel.com/en-us/intel-thread-checker/) and disabling > GSATOMIC stopped quite a few strange errors in the GNUStep internals from > occurring for what it's worth. Disabling GSATOMIC will put you on a very slow path, where every retain or release involves acquiring and releasing a lock. It will make a lot of bugs disappear simply because it will enforce a lot of synchronisation on your code. Every retain / release message will become a sync point. >>> Thank goodness for the NSInteralInconsistencyException because this would >>> have >>> been a bear to find otherwise. Although very random/crashy behaviour seemed >>> about as likely as getting the exception. >> >> The exception is there for debugging. We don't guarantee catching >> double-free bugs using this mechanism, because if it happened in a >> non-concurrent system we'd already have free'd the memory that the retain >> count is stored in, giving a segfault. > > There is no double free bugs that I'm aware of and since the actual memory > release happens outside of the locks in the non-GSATOMIC version of the code > double frees could certainly still cause problems if they are present. There are no double-free bugs in any code that people are aware of. They are trivial to fix, once you know where they are... David -- Sent from my brain _______________________________________________ Gnustep-dev mailing list Gnustep-dev@gnu.org http://lists.gnu.org/mailman/listinfo/gnustep-dev