On Jun 7, 2008, at 10:10 PM, Michael Ash wrote:

On Sat, Jun 7, 2008 at 6:37 PM, Hamish Allan <[EMAIL PROTECTED]> wrote:
On Sat, Jun 7, 2008 at 7:35 PM, Michael Ash <[EMAIL PROTECTED]> wrote:

This is pretty nitpicky. If it's in scope but you don't use it, then
it doesn't matter. Kind of like a Zen koan: if an object is collected
in the forest but nobody is pointing to it anymore, does it make a
sound?

:)

I'm just arguing that it makes it more straightforward if the GC
behaves deterministically according to the code you wrote, rather than
what the optimising compiler re-wrote for you.

If you don't like undefined behavior, then C-based languages are a
poor choice. If you don't like nondeterministic object lifetimes, then
garbage collection is a poor choice.

If your statement regarding nondeterministic object lifetimes is true, and (as I think has been shown) deterministic object lifetimes are sometimes required for deterministic program behavior, does this not imply that the current GC system is fundamentally flawed?

I think Hamish is right. It is reasonable to expect that code, as entered, results in deterministic behavior and that the 'principle of least surprise' holds. When programming for multithreading, one implicitly accepts that common programming techniques and their cause-and-effect relationships may no longer be valid. It requires a complete change in discipline to account for these effects if one hopes to produce code that executes deterministically. Otherwise, one is exposed to 'race conditions' in which things work correctly 99% of the time but occasionally fail.

It's difficult not to see the similarities between multithreaded programming and programming with Leopard's GC system. If one uses Leopard's GC system without compensating for these non-deterministic object lifetimes, one essentially creates a race condition. 99% of the time, these race conditions won't result in abnormal program behavior, but every once in a while the conditions will be such that the collector runs and reclaims an allocation that is still in use.

In the case of NSData/NSMutableData, the relationship between the parent object and the pointer returned by bytes/mutableBytes is obvious. The parent object has an ivar pointer to the bytes allocation, so keeping the parent object alive keeps the bytes allocation alive.

In the case of NSString and UTF8String, there is no such relationship. Since there is no pointer from the parent NSString to the buffer that UTF8String creates, keeping the parent NSString 'live' does not keep that buffer live.

Under GC, something as simple as [[NSMutableData dataWithLength:4096] mutableBytes] can't be used. It must be restructured so that the object instantiation is assigned to a variable rather than remaining 'anonymous'.
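A minimal sketch of that restructuring (variable names are illustrative, not from any particular codebase):

```objc
// Unsafe under GC: the anonymous NSMutableData has no named reference
// once -mutableBytes returns, so the collector may reclaim it while
// the returned pointer is still in use:
//
//     void *buffer = [[NSMutableData dataWithLength:4096] mutableBytes];
//
// Safer: bind the parent object to a variable and keep it visibly
// referenced past the last use of the interior pointer.
NSMutableData *data = [NSMutableData dataWithLength:4096];
void *buffer = [data mutableBytes];
// ... use buffer ...
[data self]; // message send keeps 'data' referenced to this point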

Even this has pitfalls. Leopard's GC system considers all pointers on the stack to be live, and any pointers in the heap must be updated via a write barrier. To assist the compiler in identifying which pointers require write barriers, __strong is introduced. It's important to note, though, that the specification of the C language in no way requires a stack, and the statement '{ NSMutableData *data = [NSMutableData dataWithLength:4096]; }' does not imply in any way that the variable 'data' will exist on the stack by the C language definition.

It's clear that the C language definition of a pointer and the definition of a GC pointer under Leopard are close, but not necessarily the same. A very small addendum to the rules, along the lines of 'A __strong pointer will remain visible to the GC system from the point at which it is defined until the end of its enclosing block,' would neatly solve an awful lot of issues. This one change would result in generated code that matches expected behavior, versus the current C pointer rules, which allow the optimizer to consider a pointer 'dead' at its point of last use.

It does not fix the case of UTF8String, though, as the variable containing the pointer is a plain 'char *'. In fact, the only way I can think of to use the pointer returned by UTF8String that is deterministic and approximates the old autorelease rules is something like this (uses C99):

{
  // Temporarily disable collection so the buffer returned by
  // -UTF8String cannot be reclaimed before we finish copying it.
  [[NSGarbageCollector defaultCollector] disable];
  const char *utf8String = [theString UTF8String];
  size_t utf8StringLength = strlen(utf8String);
  char utf8StringCopy[utf8StringLength + 1]; // C99 variable-length array
  memcpy(utf8StringCopy, utf8String, utf8StringLength);
  utf8StringCopy[utf8StringLength] = 0;
  [[NSGarbageCollector defaultCollector] enable];

  // utf8StringCopy is valid until this block ends.
}

As it stands, this is really the only bulletproof way of using the pointer returned by UTF8String. One can say with certainty that utf8StringCopy is valid under all uses, by any function, method, or C-only library function, until the end of the enclosing block, regardless of the behavior of the GC collector.
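The same copy-it-out idiom can be expressed in plain C when stack storage is impractical. This hypothetical helper (the name 'copy_cstring' is mine, not from any API) duplicates the string into malloc'd storage that the caller owns and frees, so its lifetime no longer depends on the original allocation:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: duplicate a NUL-terminated string into
 * caller-owned malloc'd storage, decoupling its lifetime from the
 * original allocation (GC-managed or otherwise). */
static char *copy_cstring(const char *s)
{
    if (s == NULL)
        return NULL;
    size_t length = strlen(s);
    char *copy = malloc(length + 1);
    if (copy != NULL) {
        memcpy(copy, s, length);
        copy[length] = 0;
    }
    return copy;
}
```

The caller is responsible for free()ing the result, which is exactly the explicit ownership the GC was supposed to remove, but at least it is deterministic.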

Using the raw UTF8String pointer is far, far more convenient, and very likely to work 99.99% of the time without problems.

Convenience, however, rarely trumps correctness.


The compiler and garbage collector both do their jobs. They keep the
object around as long as it is referenced, after which it becomes a
candidate for collection. The trouble is just that where it stops
being referenced and where you think it should stop being referenced
are not the same place.

I think this argues for the case that __strong pointers should not necessarily be treated as equivalent to regular pointers. As I previously suggested, amending the pointer rules such that 'A __strong pointer will remain visible to the GC system from the point at which it is defined until the end of its enclosing block' would seem to better reflect people's expected behavior, rather than actual behavior.

This really highlights the danger of bolting such features onto an existing compiler and language. There are decades of extremely subtle implied invariants built into the assumptions used to code the GCC compiler, especially when it comes to optimization transformations. Some of these are no longer true when it comes to __strong pointers, or they create undesirable, subtle side effects.

Now it's your turn -- where is the problem with the compiler marking
references you have *semantically* placed on the stack as strong,
whether or not they really end up on the stack after optimisation?

The problem is that this proposal doesn't make any sense given the
architecture of the Cocoa GC. The GC considers the stack plus
registers of each thread to be one big blob which it then scans for
anything that looks like a pointer. There's no way to mark anything as
being "strong", because the collector considers everything on the
stack to be strong. Even if this were to be resolved it still wouldn't
help because the problem isn't that the data pointer isn't strong, the
problem is that the data pointer *goes away*. No amount of annotation
will fix that; you have to change the compiler to keep object pointers
on the stack through the end of the current scope, and if you make
that change then the annotation is unnecessary anyway.

Actually, one has to tackle the issue of "places it on the stack" as well, since the language itself neither specifies nor requires the use of a stack. One would really need a new storage class specifier, much like 'auto' and 'register', with the obvious candidate being 'stack'. As it stands, block-local auto __strong pointers are implied to reside on the stack, not explicitly required to. Otherwise, one is stuck with a very subtle coupling of language specification to implementation requirement, which I think should remain distinct. In fact, it's a subtle coupling of the language specification to the specific implementation details of Leopard's GC system.

Granted, this is a pedantic point, but I think it's important to be explicit in such matters. This whole thread has really shown that there is an uneven application of the 'rules' when it comes to GC. It also highlights the fact that the current documentation regarding the GC system is woefully inadequate when it comes to resolving some of these finer, nuanced points.


I disagree. Since the programmer cannot possibly know the state of the
call stack in any other way than by knowing that the compiler must
keep references to objects right up to the point where those objects
are no longer used, he must not make any assumption as to the lifetime
of those objects beyond that point.

But who is to say the compiler won't make optimisations which alter
when the object is last referenced? (See my reply to Peter for a code
example.)

As Chris Hanson pointed out, the compiler cannot move function or
method calls without changing the underlying semantics of the code, so
you're guaranteed to be safe by doing a [data self] or equivalent at
the end of the loop. You can also, of course, use CFRetain/CFRelease
to more explicitly manage its lifetime.

As I pointed out, even this is not necessarily true. The GCC __attribute__(()) annotations 'const' and 'pure' could render this assumption invalid if they were extended to Objective-C methods (and, since I realize I haven't actually tried it, they just might be already).

Also, a strict reading of the documentation for CFRetain and CFRelease gives no indication of their behavior under GC. In the absence of anything explicit, I would think that the standard 'toll-free bridging' rules apply, and therefore CFRetain and CFRelease essentially become empty function calls. I would tend to think it much more appropriate to use the methods provided by the NSGarbageCollector class: enableCollectorForPointer: and disableCollectorForPointer:.
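Under that reading, the NSGarbageCollector route would look something like this sketch (untested; 'path' is a placeholder, and the exact pinning semantics are assumed to match the class documentation):

```objc
NSData *data = [NSData dataWithContentsOfFile:path];
// Explicitly pin 'data' so the collector will not reclaim it,
// regardless of what the optimizer does to our local references.
[[NSGarbageCollector defaultCollector] disableCollectorForPointer:data];
const void *bytes = [data bytes];
// ... use bytes, possibly across calls that can trigger collection ...
[[NSGarbageCollector defaultCollector] enableCollectorForPointer:data];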

If you're doubtful about this, consider what would happen in a boring
non-collected environment if the compiler were allowed to move this
stuff around. A [obj release] or free(ptr) could be moved to before
code which accesses the object or pointer, which would of course
result in disaster.

Ah, but the devil is in the details. The result of [data self] is invariant: it does not matter whether it is executed immediately after the object is instantiated or just before [data release], the result is always the same. If the compiler can glean this fact, either through explicit means such as '- (id)self __attribute__((const));' or via optimization introspection, then the call can be subject to common subexpression elimination, loop-invariant code motion, even dead code elimination. Since -self causes no side effects, and its result is presumably unused in our hypothetical example, the statement accomplishes no real work and can be eliminated.

Consider the fact that '[data self]' essentially becomes 'id self(id self, SEL _cmd) { return self; }'. If one were to take the LLVM ideas to their hypothetical logical conclusion (in other words, a 'run-time, continuous, interprocedural optimizing compiler'), the compiler would have everything at its disposal to make this deduction all by itself.

So, in the specific case of '[data self]', a sufficiently informed optimizing compiler can cheerfully mark this statement as dead code, negating its 'object life time extending' effects.
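The C-level analogue is easy to see: GCC's 'const' attribute declares that a function's result depends only on its arguments and that it has no side effects, which is precisely what licenses the optimizer to delete a call whose result is unused. A minimal sketch (function names are illustrative):

```c
/* Declared 'const': the result depends only on the argument and there
 * are no side effects. GCC may subject such calls to common
 * subexpression elimination, and may delete a call whose result is
 * unused -- the same license that would let an optimizer delete a
 * lifetime-extending no-op like [data self]. */
__attribute__((const)) static int identity(int x)
{
    return x;
}

static int observable(void)
{
    identity(42);        /* result unused: eligible for elimination */
    return identity(7);  /* result used: this call must survive */
}
```

The semantics of the program are unchanged either way, which is exactly why the 'keep the object alive by messaging it' idiom is fragile once the compiler knows the message does nothing.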

'[data self]' just so happens to be trivial enough that your statements might not hold with changes or advances in the compiler. For anything more complicated, your statements stand.
_______________________________________________

Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)
