On Wed, Oct 30, 2013 at 6:26 PM, Ben Kloosterman <[email protected]> wrote:

> On Thu, Oct 31, 2013 at 2:05 AM, Jonathan S. Shapiro <[email protected]>wrote:
>
>> On Tue, Oct 29, 2013 at 10:24 PM, Ben Kloosterman <[email protected]>wrote:
>>
>>> I think by the time you add a header you have added a lot of complexity
>>> for little gain, e.g. C interop or mapping directly onto structures such
>>> as IP headers becomes impossible. You can add the header to the top, but
>>> then you need to find the base object all the time to access it...
>>> certainly not elegant.
>>>
>>
>> In the usual implementation, the object reference points to the first
>> content word of the object, not to the header. I'm also not sure what the
>> interop problem really is here, since C doesn't have (and cannot express)
>> objects to begin with. Finally, when you are dealing with C you are dealing
>> with pointers, not object references. That's a whole different kettle of
>> fish, because pinning and liveness-dominance requirements apply in that
>> case.
>>
>>
>> But there is no usual implementation for an internal object header..
>

Yes. I suppose I meant "in the usual implementation of exterior objects".



> ...where do you put them?
>

You enumerate a bunch of choices, but there is really only one that works
in practice, because write barriers need to be *super* cheap, and there is
no way to identify interior pointers easily enough to do it in the inline
portion of the write barrier.

Once interior references are running around loose, they will show up in all
sorts of places where they aren't expected. So if we admit them in escaping
form, we need to have a write barrier that works without regard to whether
the object reference is interior or exterior.

So either *all* objects (interior and exterior) have a GC header at
obref[-n], or the write barrier can't examine the object header. And that's
mostly fine. Except that deferred reference counting needs the 'M' bit, and
handling forwarding needs the 'F' bit, so I'm not convinced that getting
rid of the header is an option. If we declare that *all* objects have a
vTable slot then I guess we could use the low-order bits, but that entails
layout issues too, because not all objects have vTable pointers. Notably:
CLR value types do not carry a vTable pointer.

I wrote yesterday that by-ref is an interior object reference in disguise
(at the implementation level). This is incorrect.  I was wondering how C#
got away with this, and the answer is sneaky:

   1. All unboxed types are value (struct) types.
   2. Within a method of a value type 'vt, /this/ is not an object
   reference; it has type REF 'vt, which means that an inner reference
   cannot leak.
   3. Structs cannot derive from each other in either direction. The *only*
   derivation is from System.ValueType. This means that there can only be
   one vTable for a given struct type, so the vTable pointer isn't needed,
   even if the struct overrides virtual methods of System.Object.

It's a clever set of rules, but even with these rules they ended up needing
to box and unbox the structs a lot.



>
>>
>> The C# method of storing no interior references has some merit . Value
>>> types are copied by value  , they can be passed by reference ( register)
>>>  but are never stored as a reference.
>>>
>>
>> It helps to use "object reference" and "by reference" when you are
>> discussing the two together. They mean completely different things, and
>> it's really easy to get the discussion confused.
>>
>
> Yep. We also need a better term for a non-boxed array, as these need not
> be in an object... maybe "embedded array"? And value types "embedded
> objects"?
>

We *have* a better term: array. The boxed version of an array should be
called a vector. Unfortunately the good people at Microsoft have a limited
appreciation of lexicography.

Actually, it's not whether the array is boxed or unboxed that's mainly at
issue. The main issue is whether the array is dynamically or statically
sized. Where we need to disambiguate, perhaps we could use "fixed array"
and "dynamic array".



> Unfortunately, this is *exactly* the case we need to be concerned about,
>> and the C# solution doesn't help. From the implementation perspective,
>> passing by reference is just syntactic sugar for a [potentially interior]
>> object reference. You know that there is a dominating exterior object
>> reference that is live, so you don't need to *mark* the by-ref
>> references, but in a relocating collector you still need to relocate them.
>> That can be done with fat pointers, I suppose. As you note, one advantage
>> of by-reference is that it is a non-leaking reference, so you don't have to
>> worry about them propagating into the heap.
>>
>
> C# is relocating, and I don't think it uses fat pointers. Maybe because
> gen0 and gen1 collections (which are fast) still stop the world, so they
> can just scan the stack and find these references (we have the object
> reference and the object size). So you can either go the fat pointer /
> slice route, or maybe search the stack for these (a bit like your
> find-the-block-and-work-back scheme, but here you search the stackmap).
>

Yeah. So I looked, and it turns out that they got this right, and I'm
completely wrong about by-reference being an object reference - the /this/
pointer of a value type 'vt is typed as REF 'vt rather than OBREF 'vt. So
it can't escape. As long as it can't escape, it's not that big a problem.

If we're trying to avoid the header on interior unboxed objects, we don't
need it for finding the base object. Note that objects are a minimum of 4
words (when alignment requirements are considered). On a 32-bit system
that's 16 bytes, in which case a 128-byte Immix line holds 8 objects. How
convenient that there are 8 bits in the per-line metadata byte. Instead of
using an object count, we can use a bit to mark which 4-word boundaries
correspond to the start of an object. Given such bits, we can walk
backwards fairly cheaply to find containing objects.

So now we're considering a model in which maybe only REF 'vt can be an
inner reference. Except we still have forwarding to deal with on the
exterior object, so we still need a write barrier, and the write barrier in
this case needs to be able to find the containing object.


>  I still think that, at least for v1, embedded unboxed arrays should be
> ref types, or value types restricted to having no references. It makes
> things a lot simpler, and it can be extended later...
>

>> Except it doesn't meet the requirements for one of the early things I
>> need to build.
>>
>
> Java has no value types at all and is pretty fast; they're not needed,
> just an optimization.
>

It is NOT just an optimization. The absence of general unboxed types is a
fundamental expressiveness limitation. You simply cannot write systems code
in any sensible way without it. I *challenge* anyone to re-code the
Coyotos kernel with comparable efficiency in Java. You get to tell the
compiler that no collector is required, so that's a free 25%-30% advantage
over conventional Java right there.

It. Can't. Be. Done.

Java lacks the expressiveness to describe the data structures.

BitC's first objective is to be a safe language for systems programming.
Being able to express this is necessary.


> Like the read barrier removal, I take it the read locks are only needed
>>> while the collector is running...
>>>
>>
>> Read locks are required whenever two *mutators* may run concurrently and
>> the object is mutable. If no read lock is present, you can't detect
>> contention when one thread or the other upgrades to a write lock.
>>  Obviously you don't need read locks on immutable objects. You're not
>> locking out the collector here.
>>
>
> Which means that in a single-threaded context (with the collector on
> another thread), if that can be identified, you don't need the lock.
>

Yes. Read and write locks aren't necessarily GC primitives, but if we end
up needing dynamic translation, they should definitely be inlined
intrinsics.

Though it should be said that compare-and-swap on a single thread actually
isn't that expensive.


>  The missing part of this story is that it doesn't address forwarded
>> pointers that get loaded out of objects residing in the heap. If you have
>> to do read locking, you can deal with it then, but otherwise those remain
>> an issue. Note that taking a read lock amounts to a read barrier.
>>
>>
>>
> So how does it remove the read barrier? Isn't checking the lock (not
> just taking it) a read barrier?
>

Previously, you needed the read barrier *in addition to* the lock. The
point here is that if you are locking anyway you can fuse them.

The problem with this approach is that it assumes a per-object lock, which
is not how I want things to work.


> It's very complex and I'm struggling to follow it (as I don't have time
> to read up on all this at the moment), but it seems strange to me that
> such an expensive mechanism (read barriers/locks) is needed for something
> that runs so rarely, and mainly to assist in defragmentation of the
> heap...
>

Yes. It's a very high cost. The feature that causes it is object relocation
- you need a way to make sure that you detect the fact that an object is
forwarded.

But now that a lot of the cases are handled in the store barrier, I think
that the *rest* can be handled with page faults if we really need to.



> Is the treatment worse than the disease (fragmentation causing extra
> memory use)?
>

Unfortunately, the data says it isn't. The problem is that 10% of objects
do not die young. That's true in the second generation too. So in a given
32KB block you might have as many as 1K objects, 100 of which don't die
very fast. That block can be recycled, but it won't become free any time
soon. So what happens is that you end up constantly growing the heap to
generate free blocks.

Or you can size-partition the heap to avoid the fragmentation. But that
costs you a 28% increase in L1 cache misses and likely a bunch more in TLB
misses.


> Isn't a simple but short stop-the-world almost as effective, with the
> nursery swept in parallel? Then you only need a write barrier for heap
> object references into the nursery, which is pretty cheap. Possibly you
> can remove most of the nursery pauses by using a write barrier and
> relying on thread-local nurseries (we discussed some mechanisms).
>

It isn't, because the nursery isn't the source of the problem here.
Forwarding within the nursery can all be done during nursery sweep (as you
suggest). The problem is that for a large heap you need to do concurrent
collection in the background, which includes forwarding objects to render
low-occupancy blocks free. Thus the read barrier, because the mutator is
running concurrently.

A compromise would be for the GC to identify blocks that will be evacuated
at the start of collection, accumulate a points-to map during tracing,
pause all the mutators, forward the objects, and explicitly back-patch the
points-to locations. Most of the objects will forward, and the reference
counts will usually confirm that we got all of the in-bound pointers. When
we didn't, we institute a heavy (concurrent) read barrier with the page
fault system. This approach works for a small number of active threads. The
problem with it is that it really doesn't scale.


shap
_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev
