On Jun 7, 2008, at 10:10 PM, Michael Ash wrote:

On Sat, Jun 7, 2008 at 6:37 PM, Hamish Allan <[EMAIL PROTECTED]> wrote:
On Sat, Jun 7, 2008 at 7:35 PM, Michael Ash <[EMAIL PROTECTED]> wrote:

This is pretty nitpicky. If it's in scope but you don't use it, then
it doesn't matter. Kind of like a Zen koan: if an object is collected
in the forest but nobody is pointing to it anymore, does it make a
sound?

:)

I'm just arguing that it makes it more straightforward if the GC
behaves deterministically according to the code you wrote, rather than
what the optimising compiler re-wrote for you.

If you don't like undefined behavior, then C-based languages are a
poor choice. If you don't like nondeterministic object lifetimes, then
garbage collection is a poor choice.

If your statement regarding nondeterministic object lifetimes is true, and (as I think has been shown) deterministic object lifetimes are sometimes required for deterministic program behavior, does this not imply that the current GC system is fundamentally flawed?

I think Hamish is right. It is reasonable to expect that code, as entered, results in deterministic behavior and that the 'principle of least surprise' holds. When programming for multithreading, one implicitly accepts that common programming techniques and their cause-and-effect relationships may no longer be valid. It requires a complete change in discipline to account for these effects if one hopes to produce code that executes deterministically. Otherwise, one is exposed to 'race conditions' in which things work correctly 99% of the time but occasionally fail.

It's difficult not to see the similarities between multithreaded programming and programming with Leopard's GC system. If one uses Leopard's GC system without compensating for these non-deterministic object lifetimes, one essentially creates a race condition. 99% of the time, these race conditions won't result in abnormal program behavior, but every once in a while the conditions will be such that the collector runs and reclaims an allocation that is still in use.

In the case of NSData/NSMutableData, the relationship between the parent object and the pointer returned by bytes/mutableBytes is obvious. The parent object has an ivar pointer to the bytes allocation, so keeping the parent object alive keeps the bytes allocation alive.

In the case of NSString and UTF8String, there is no such relationship. Since there is no pointer from the parent NSString to the buffer that UTF8String creates, keeping the parent NSString 'live' does not keep that buffer live.

Under GC, something as simple as [[NSMutableData dataWithLength:4096] mutableBytes] can't be used. It must be restructured so that the object instantiation is assigned to a variable rather than remaining 'anonymous'.
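A minimal sketch of that restructuring (variable names are illustrative, not from any particular codebase):

```objc
// Unsafe under GC: the anonymous NSMutableData has no named reference
// once -mutableBytes returns, so the collector may reclaim it while
// the returned pointer is still in use:
//
//     void *buffer = [[NSMutableData dataWithLength:4096] mutableBytes];
//
// Safer: bind the parent object to a variable and keep it visibly
// referenced past the last use of the interior pointer.
NSMutableData *data = [NSMutableData dataWithLength:4096];
void *buffer = [data mutableBytes];
// ... use buffer ...
[data self]; // message send keeps 'data' referenced to this point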

Even this has pitfalls. Leopard's GC system considers all pointers on the stack to be live, and any pointers in the heap must be updated via a write barrier. To assist the compiler in identifying which pointers require write barriers, __strong is introduced. It's important to note, though, that the specification of the C language in no way requires a stack, and the statement '{ NSMutableData *data = [NSMutableData dataWithLength:4096]; }' does not imply in any way that the variable 'data' will exist on the stack by the C language definition.

It's clear that the C language definition of a pointer and the definition of a GC pointer under Leopard are close, but not necessarily the same. A very small addendum to the rules, along the lines of 'A __strong pointer will remain visible to the GC system from the point at which it is defined until the end of its enclosing block,' would neatly solve an awful lot of issues. This one change would result in generated code that matches expected behavior, versus the current C pointer rules, which allow the optimizer to consider a pointer 'dead' at its point of last use.

It does not fix the case of UTF8String, though, as the variable containing the pointer is a plain 'char *'. In fact, the only way I can think of to use the pointer returned by UTF8String that is deterministic and approximates the old autorelease rules is something like this (uses C99):

{
  // Temporarily disable collection so the buffer returned by
  // -UTF8String cannot be reclaimed before we finish copying it.
  [[NSGarbageCollector defaultCollector] disable];
  const char *utf8String = [theString UTF8String];
  size_t utf8StringLength = strlen(utf8String);
  char utf8StringCopy[utf8StringLength + 1]; // C99 variable-length array
  memcpy(utf8StringCopy, utf8String, utf8StringLength);
  utf8StringCopy[utf8StringLength] = 0;
  [[NSGarbageCollector defaultCollector] enable];

  // utf8StringCopy is valid until this block ends.
}

As it stands, this is really the only bulletproof way of using the pointer returned by UTF8String. One can say with certainty that utf8StringCopy is valid under all uses, by any function, method, or C-only library function, until the end of the enclosing block, regardless of the behavior of the GC collector.
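The same copy-it-out idiom can be expressed in plain C when stack storage is impractical. This hypothetical helper (the name 'copy_cstring' is mine, not from any API) duplicates the string into malloc'd storage that the caller owns and frees, so its lifetime no longer depends on the original allocation:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: duplicate a NUL-terminated string into
 * caller-owned malloc'd storage, decoupling its lifetime from the
 * original allocation (GC-managed or otherwise). */
static char *copy_cstring(const char *s)
{
    if (s == NULL)
        return NULL;
    size_t length = strlen(s);
    char *copy = malloc(length + 1);
    if (copy != NULL) {
        memcpy(copy, s, length);
        copy[length] = 0;
    }
    return copy;
}
```

The caller is responsible for free()ing the result, which is exactly the explicit ownership the GC was supposed to remove, but at least it is deterministic.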

Using the raw UTF8String pointer is far, far more convenient, and very likely to work 99.99% of the time without problems.

Convenience, however, rarely trumps correctness.


The compiler and garbage collector both do their jobs. They keep the
object around as long as it is referenced, after which it becomes a
candidate for collection. The trouble is just that where it stops
being referenced and where you think it should stop being referenced
are not the same place.

I think this argues for the case that __strong pointers should not necessarily be treated as equivalent to regular pointers. As I previously suggested, amending the pointer rules such that 'A __strong pointer will remain visible to the GC system from the point at which it is defined until the end of its enclosing block' would seem to better reflect people's expected behavior, rather than actual behavior.

This really highlights the danger of bolting such features onto an existing compiler and language. There are decades of extremely subtle implied invariants built into the assumptions used to code the GCC compiler, especially when it comes to optimization transformations. Some of these are no longer true when it comes to __strong pointers, or they create undesirable, subtle side effects.

Now it's your turn -- where is the problem with the compiler marking
references you have *semantically* placed on the stack as strong,
whether or not they really end up on the stack after optimisation?

The problem is that this proposal doesn't make any sense given the
architecture of the Cocoa GC. The GC considers the stack plus
registers of each thread to be one big blob which it then scans for
anything that looks like a pointer. There's no way to mark anything as
being "strong", because the collector considers everything on the
stack to be strong. Even if this were to be resolved it still wouldn't
help because the problem isn't that the data pointer isn't strong, the
problem is that the data pointer *goes away*. No amount of annotation
will fix that; you have to change the compiler to keep object pointers
on the stack through the end of the current scope, and if you make
that change then the annotation is unnecessary anyway.

Actually, one has to tackle the issue of "places it on the stack" as well, since the language itself neither specifies nor requires the use of a stack. One would really need a new storage class specifier, much like 'auto' and 'register', with the obvious candidate being 'stack'. As it stands, block-local auto __strong pointers are implied to reside on the stack, not explicitly required to. Otherwise, one is stuck with a very subtle coupling of language specification to implementation requirement, which I think should remain distinct. In fact, it's a subtle coupling of the language specification to the specific implementation details of Leopard's GC system.

Granted, this is a pedantic point, but I think it's important to be explicit in such matters. This whole thread has really shown that there is an uneven application of the 'rules' when it comes to GC. It also highlights the fact that the current documentation regarding the GC system is woefully inadequate when it comes to resolving some of these finer, nuanced points.


I disagree. Since the programmer cannot possibly know the state of the
call stack in any other way than by knowing that the compiler must
keep references to objects right up to the point where those objects
are no longer used, he must not make any assumption as to the lifetime
of those objects beyond that point.

But who is to say the compiler won't make optimisations which alter
when the object is last referenced? (See my reply to Peter for a code
example.)

As Chris Hanson pointed out, the compiler cannot move function or
method calls without changing the underlying semantics of the code, so
you're guaranteed to be safe by doing a [data self] or equivalent at
the end of the loop. You can also, of course, use CFRetain/CFRelease
to more explicitly manage its lifetime.

As I pointed out, even this is not necessarily true. The GCC __attribute__(()) annotations 'const' and 'pure' could render this assumption invalid if they were extended to Objective-C methods (and, since I realize I haven't actually tried it, they just might be already).

Also, a strict reading of the documentation for CFRetain and CFRelease gives no indication of their behavior under GC. In the absence of anything explicit, I would think that the standard 'toll-free bridging' rules apply, and therefore CFRetain and CFRelease essentially become empty function calls. I would tend to think it much more appropriate to use the methods provided by the NSGarbageCollector class: enableCollectorForPointer: and disableCollectorForPointer:.
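Under that reading, the NSGarbageCollector route would look something like this sketch (untested; 'path' is a placeholder, and the exact pinning semantics are assumed to match the class documentation):

```objc
NSData *data = [NSData dataWithContentsOfFile:path];
// Explicitly pin 'data' so the collector will not reclaim it,
// regardless of what the optimizer does to our local references.
[[NSGarbageCollector defaultCollector] disableCollectorForPointer:data];
const void *bytes = [data bytes];
// ... use bytes, possibly across calls that can trigger collection ...
[[NSGarbageCollector defaultCollector] enableCollectorForPointer:data];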

If you're doubtful about this, consider what would happen in a boring
non-collected environment if the compiler were allowed to move this
stuff around. A [obj release] or free(ptr) could be moved to before
code which accesses the object or pointer, which would of course
result in disaster.

Ah, but the devil is in the details. The result of [data self] is invariant: it does not matter whether it is executed immediately after the object is instantiated or just before [data release], the result is always the same. If the compiler can glean this fact, either through explicit means such as '- (id)self __attribute__((const));' or via optimization introspection, then the call can be subject to common subexpression elimination, loop-invariant code motion, even dead code elimination. Since -self causes no side effects, and its result is presumably unused in our hypothetical example, the statement accomplishes no real work and can be eliminated.

Consider the fact that '[data self]' essentially becomes 'id self(id self, SEL _cmd) { return self; }'. If one were to take the LLVM ideas to their hypothetical logical conclusion (in other words, a 'run-time, continuous, interprocedural optimizing compiler'), the compiler would have everything at its disposal to make this deduction all by itself.

So, in the specific case of '[data self]', a sufficiently informed optimizing compiler can cheerfully mark this statement as dead code, negating its 'object life time extending' effects.
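The C-level analogue is easy to see: GCC's 'const' attribute declares that a function's result depends only on its arguments and that it has no side effects, which is precisely what licenses the optimizer to delete a call whose result is unused. A minimal sketch (function names are illustrative):

```c
/* Declared 'const': the result depends only on the argument and there
 * are no side effects. GCC may subject such calls to common
 * subexpression elimination, and may delete a call whose result is
 * unused -- the same license that would let an optimizer delete a
 * lifetime-extending no-op like [data self]. */
__attribute__((const)) static int identity(int x)
{
    return x;
}

static int observable(void)
{
    identity(42);        /* result unused: eligible for elimination */
    return identity(7);  /* result used: this call must survive */
}
```

The semantics of the program are unchanged either way, which is exactly why the 'keep the object alive by messaging it' idiom is fragile once the compiler knows the message does nothing.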

'[data self]' just so happens to be trivial enough that your statements might not hold with changes or advances in the compiler. For anything more complicated, your statements stand.
_______________________________________________

Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)
