Date: Sat, 7 Jun 2008 23:24:22 +0100
From: "Hamish Allan" <[EMAIL PROTECTED]>

Whenever you write documentation in a natural language, there is scope
for ambiguity. This particular technical specification is only
mentioned in a single sentence:

"The root set is comprised of all objects reachable from root objects
and all possible references found by examining the call stacks of
every Cocoa thread."

Is that the actual specification for the garbage collector? Or just some documentation provided by Apple?

In any case...

Are you seriously telling me that "all *possible* references found by
examining the call stacks" unambiguously means "all *actual*
references found by examining the rest of the code in the block"?

I'm not telling you that at all. I don't actually know how the Obj-C GC works, but from what's been written here, it appears to me that it's not looking at code, but rather it's looking at data.

It doesn't need to examine the code to know whether there's a value in the stack that refers to the object. If it walks all the stack frames and doesn't find a reference to the object, then no code can possibly be referencing the object via a local variable.

It is available for my use until the end of that scope. The fact that
I don't actually make use of it, and the optimising compiler thinks
I'm done with it: *that*'s an implementation detail.

No, it's not. It's a _semantic_ detail. If you do in fact not use the reference, then it is in fact _unused_. Garbage collection is all about
what's being used or not.

You'll notice that I was replying to a specific point Michael made:
"the promise that is being made is that objects which you're still
pointing to won't go away".

He can speak for himself, but I believe that statement was made in context of the run-time status. And if the compiler optimizes out a variable, indeed it may be that you're not still "pointing to" that particular object.

I don't disagree that "have a pointer"
means something different to "use a pointer".

I don't think it's necessary to consider those to have different meanings for the behavior of the compiler to make sense, even in the context of garbage collection. If you have no code that "uses the pointer", then the compiler is free to use that memory location for something else, at which point you no longer "have a pointer". The two are very much related.

Which part of the
documentation unambiguously specifies that "Garbage collection is all
about what's being used or not"?

Ahhh...please, I'm not trying to defend the documentation. If you find something in the documentation that is misleading or incomplete, that's no surprise to me. That happens in documentation (and not just Apple's) all the time. Especially for relatively new features.

As far as I'm concerned, the only question is whether the behavior of the compiler and its effect on the GC makes sense. And it's my opinion that it does.

Even if the compiler did not optimize away the
use of the variable past that point in the code, if the GC system could otherwise determine you weren't ever going to use it again, it would _still_
be valid for the GC system to collect the object.

If you could write a garbage collector that could reliably detect what
was being used, you'd have no need to specify __strong or __weak any
more, and you'd have solved the halting problem to boot. We're not
quite at that stage yet :)

So? My point is that whatever the theoretical capabilities of the GC, as long as you haven't written code that will actually use a reference to an object, the GC should be allowed to collect the object. The fact that writing such an advanced GC is theoretically impossible is immaterial.

That's why we call these "hypothetical" examples.  :)

I agree with you that if it returned collected memory, it would be
more powerful than under retain/release/autorelease. But power and
straightforwardness do not always go hand in hand ;)

Well, in this case I believe they do. The real issue here is that a garbage collection system is trying to co-exist peacefully with a non-GC system.

I disagree. The real issue here is that code optimisations cause
different behaviour in the GC,

Define "different". In every GC system I've ever used, there are never any guarantees about when an object may be collected, except that it won't be collected until it's no longer in use. The fact that in the debug build, an object is collected _later_ than would theoretically be possible in an optimized build doesn't mean that the GC system is broken. It just means that it's non-deterministic, which is always true anyway.

whereas if the GC behaved according to
rules based on the *semantics* of what the programmer writes (i.e.
what would happen if that variable were really on the stack, rather
than optimised away into a register), there wouldn't be a problem (the
optimisation could still happen, of course, but the compiler would
flag that reference as strong).

We will simply have to agree to disagree. I am not of the opinion that the scoping of a variable is a declaration of semantics. That's syntax.

Maybe it's just because I've been using GC systems more than you have, and so I've gotten used to thinking about memory allocations in the paradigm that GC systems impose. Maybe this is a subjective question that cannot be answered. But you aren't going to get me to agree that scoping of a variable is a semantic quality of the code. My opinion is that it's how the variable is actually used, not how it's declared, that defines the semantics of its use.

[...]
That said, even in .NET (for example), the GC system has a "GC.KeepAlive()" method for this very purpose. It doesn't actually do anything, but it interrupts optimizations the compiler might make that would otherwise allow an object to be collected (it would be used exactly as the proposed "[data self]" call would be). This is to allow for situations where the only reference to the object is in "unmanaged" code -- that is, it's been passed
across the interface from .NET to the older non-GC'ed Windows API.

Again, I think that signalling to the GC that you don't want the
object collected by creating a stack variable reference to it --
whether or not that variable ever actually ends up on the stack due to
optimisations -- is quite enough.

If that's what you were signaling to the GC system, then yes...that would be enough. But that's not a signal to the GC system at all. Hence the "problem".

No need for GC.KeepAlive(), [data
self], CFRetain(), disableCollectionForPointer, or any other hack.

Nope. Even if you "fixed" the "optimized-away" problem, you'd still have other problems. That's because the root of this problem isn't the optimizing compiler. It's the fact that garbage collection is being used alongside other forms of memory management. It's only because there are things that are allocated outside of the garbage collection system that this even comes up.

[snip]

Not only is it not a bug for the compiler to not concern itself with the issue, the fact is that the extant example here is just the tip of the iceberg. While you might argue that the compiler _could_ prevent this particular problem, a) it requires the compiler to get into the business of
memory management

The compiler is already in that business -- hence the modifiers
__strong and __weak.

Not really. In fact, if anything those are proof that the compiler is _not_ in the business of memory management. Instead, it provides a way for YOU to give it information that is used to control memory management. If the compiler were in the business of memory management, it would infer those properties itself.

and b) there will still be lots of other scenarios that
are similar, but not solvable by the compiler.

Could you please elaborate on this?

I'm surprised I need to.

But, consider the NSData example we've been using. Suppose that the byte* in question weren't just used locally, but rather passed to some other data structure. Even if we "fixed" the compiler so that it didn't optimize away the variable referencing the "data" object, the byte* would still be invalidated when that variable goes out of scope as the method returns and the "data" object is collected.
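In plain C this is the familiar dangling interior pointer: a buffer owned by an object does not outlive the object. A minimal stand-in for the NSData case (names hypothetical, not the actual NSData implementation):

```c
#include <stdlib.h>
#include <string.h>

/* A stand-in for NSData: an owner plus the buffer it manages. The
   pointer returned by data_bytes() is only valid while the owner lives. */
typedef struct {
    char  *bytes;
    size_t length;
} Data;

static Data *data_create(const void *src, size_t n)
{
    Data *d = malloc(sizeof *d);
    d->bytes = malloc(n);
    memcpy(d->bytes, src, n);
    d->length = n;
    return d;
}

static const char *data_bytes(const Data *d) { return d->bytes; }

/* What collection of the owner amounts to: the interior buffer goes
   with it. Any copy of data_bytes(d) stashed elsewhere now dangles. */
static void data_destroy(Data *d)
{
    free(d->bytes);
    free(d);
}
```

If `data_bytes(d)` has been stored into some longer-lived structure and `d` is then destroyed (or, in the GC case, collected), the stored pointer dangles; keeping the local variable alive a little longer doesn't help, which is the point.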

As I said before, the original example is simply a subset of a more general class of problems. These problems come up any time you mix GC and non-GC, because GC-able objects may reference non-GC-able objects or vice versa. The former is a problem because a GC-able object, when collected, could release a resource that someone else was still using; the latter is a problem because the GC-able object could be referenced from a location that the GC system doesn't know to look at.

Either way, you get some data getting released while it's still theoretically in use.

A pure GC system would never have this problem, because it has a complete view of the memory referencing situation.

Sure, you could design NSData differently to mask a design problem in
GC. But GC won't be easier to use than retain/release/autorelease
without simple rules like "if you declare it on the stack, it's in the
root set, regardless of whether the compiler optimises it into a
register".

Well, I disagree. It's already easier to use, without a contrived rule like that.

The design change required for NSData isn't for the purpose of masking a "problem in GC". There's not a problem in GC, there's a problem with NSData. It should not be returning a reference that could go away at some time in the future.

Solving this problem is non-trivial, at least if I understand the other comments about what kinds of references NSData could return. If it's just returning a random block of allocated memory, then the fix would be as simple as making that block also allocated by the GC system. But if the reference can be, for example, a pointer to a memory-mapped file, then some additional housekeeping needs to happen in order to make sure that a) the pointer isn't invalidated when the NSData object is released, and b) the pointer _is_ invalidated when it's really and truly no longer used.

Consider the following:

if ([data length] > 2)
{
  char *cs = (char *)[data bytes];
  char c = *cs; // statement 1
  void *p = (void *)data; // statement 2
  NSLog(@"First character of data object at %p is %c", p, c);
}

Now, who is to say that it wouldn't suit the optimising compiler to
switch round statements 1 and 2, which it can plainly see have no
dependency on one another? So referencing "data" to ensure it remains
alive after I reference its inner pointers doesn't necessarily work.

No one said anything about _referencing_ "data". You have to _use_ it in a way that the compiler _can't_ rearrange. That's why something like "[data self]" works, but just assigning the value to some other variable doesn't.

I agree with you that there are a variety of solutions. I'm just
proposing one that I think makes memory management more
straightforward for the programmer than any others I've heard so far.
If you have any specific objections to it, I'd like to hear them.

By "proposing one", you mean changing the compiler so that it doesn't optimize away the variable?

My specific objection to that is that it's a valuable optimization, and there's no good reason for the mere fact that GC is in use to mean that we do without the optimization.

And it seems to me that with GC being a relatively new addition to Obj-C, the likelihood of running into such situations is going to be greater than in a more-evolved environment. It's just something that needs to be kept in mind, just as in the older retain/release paradigm there were a number of rules that needed to be kept in mind. IMHO, inasmuch as this particular issue comes up less frequently than the retain/release rules did, the GC system is more accommodating (though I suppose that also means it's harder to get used to keeping the rule in mind :) ).

I think it also means it'll be harder to track certain bugs down. And
unnecessarily so!

Having garbage collection introduces a whole new class of bugs, yes. But it also removes a whole other class of bugs. Frankly, I find that the class of bugs that it removes are WAY more common than those it introduces, and as the rest of the framework catches up to the GC paradigm, this will only be more and more true.

YMMV.

Pete
_______________________________________________

Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)

Please do not post admin requests or moderator comments to the list.
Contact the moderators at cocoa-dev-admins(at)lists.apple.com

Help/Unsubscribe/Update your Subscription:
http://lists.apple.com/mailman/options/cocoa-dev/archive%40mail-archive.com

This email sent to [EMAIL PROTECTED]