On Wed, Oct 30, 2013 at 6:26 PM, Ben Kloosterman <[email protected]> wrote:
> On Thu, Oct 31, 2013 at 2:05 AM, Jonathan S. Shapiro <[email protected]> wrote:
>> On Tue, Oct 29, 2013 at 10:24 PM, Ben Kloosterman <[email protected]> wrote:
>>> I think by the time you add a header you have added a lot of complexity
>>> for little gain, e.g. C interop or mapping directly to structures such
>>> as IP headers becomes impossible. You can add the header to the top, but
>>> then you need to find the base object all the time and access it...
>>> certainly not elegant.
>>
>> In the usual implementation, the object reference points to the first
>> content word of the object, not to the header. I'm also not sure what the
>> interop problem really is here, since C doesn't have (and cannot express)
>> objects to begin with. Finally, when you are dealing with C you are
>> dealing with pointers, not object references. That's a whole different
>> kettle of fish, because pinning and liveness-dominance requirements apply
>> in that case.
>
> But there is no usual implementation for an internal object header...

Yes. I suppose I meant "in the usual implementation of exterior objects".

> ...where do you put them?

You enumerate a bunch of choices, but there is really only one that works in
practice, because write barriers need to be *super* cheap, and there is no
way to identify interior pointers easily enough to do it in the inline
portion of the write barrier. Once interior references are running around
loose, they will show up in all sorts of places where they aren't expected.
So if we admit them in escaping form, we need to have a write barrier that
works without regard to whether the object reference is interior or
exterior.

So either *all* objects (interior and exterior) have a GC header at
obref[-n], or the write barrier can't examine the object header. And that's
mostly fine. Except that deferred reference counting needs the 'M' bit, and
handling forwarding needs the 'F' bit, so I'm not convinced that getting rid
of the header is an option.
If we declare that *all* objects have a vTable slot then I guess we could
use the low-order bits, but that entails layout issues too, because not all
objects have vTable pointers. Notably: CLR value types do not carry a /this/
pointer.

I wrote yesterday that by-ref is an interior object reference in disguise
(at the implementation level). This is incorrect. I was wondering how C# got
away with this, and the answer is sneaky:

1. All unboxed types are value (struct) types.
2. Within a method of a value type 'vt, /this/ is not an object reference.
   It has type REF 'vt, which means that an inner reference cannot leak.
3. Structs cannot derive from each other in either direction. The *only*
   derivation is from System.ValueType. This means that there can be only
   one vTable for a given struct type, so the vTable pointer isn't needed,
   even if the struct overrides virtual methods of System.Object.

It's a clever set of rules, but even with these rules they ended up needing
to box and unbox the structs a lot.

>>> The C# method of storing no interior references has some merit. Value
>>> types are copied by value; they can be passed by reference (register)
>>> but are never stored as a reference.
>>
>> It helps to use "object reference" and "by reference" when you are
>> discussing the two together. They mean completely different things, and
>> it's really easy to get the discussion confused.
>
> Yep... we also need a better term for a non-boxed array, as these need not
> be in an object. Maybe "embedded array"? And value types "embedded
> objects"?

We *have* a better term: array. The boxed version of an array should be
called a vector. Unfortunately the good people at Microsoft have a limited
appreciation of lexicography.

Actually, it's not whether the array is boxed or unboxed that's mainly at
issue. The main issue is whether the array is dynamically or statically
sized. Where we need to disambiguate, perhaps we could use "fixed array" and
"dynamic array".
>> Unfortunately, this is *exactly* the case we need to be concerned about,
>> and the C# solution doesn't help. From the implementation perspective,
>> passing by reference is just syntactic sugar for a [potentially interior]
>> object reference. You know that there is a dominating exterior object
>> reference that is live, so you don't need to *mark* the by-ref
>> references, but in a relocating collector you still need to relocate
>> them. That can be done with fat pointers, I suppose. As you note, one
>> advantage of by-reference is that it is a non-leaking reference, so you
>> don't have to worry about them propagating into the heap.
>
> C# is relocating, and I don't think it uses fat pointers. Maybe because
> gen0 and gen1 collections (which are fast) still stop the world, they can
> just scan the stack and find these references (we have the object
> reference and object size). So you can either go the fat pointer / slice
> route, or maybe search the stack for these (a bit like your "find the
> block and work back", but in this case you search the stack map).

Yeah. So I looked, and it turns out that they got this right, and I'm
completely wrong about by-reference being an object reference: the /this/
pointer of a value type 'vt is typed as REF 'vt rather than OBREF 'vt. So it
can't escape. As long as it can't escape, it's not that big a problem.

If we're trying to avoid the header on interior unboxed objects, we don't
need it for finding the base object. Note that objects are a minimum of 4
words (when alignment requirements are considered). On a 32-bit system
that's 16 bytes, in which case a 128-byte immix line holds 8 objects. How
convenient that there are 8 bits in the per-line metadata byte. Instead of
using an object count, we can use a bit to mark which 4-word boundaries
correspond to the start of an object. Given such bits, we can walk
backwards fairly cheaply to find containing objects.
So now we're considering a model in which maybe only REF 'vt can be an inner
reference. Except we still have forwarding to deal with on the exterior
object, so we still need a write barrier, and the write barrier in this case
needs to be able to find the containing object.

> I still think that, at least for v1, embedded unboxed arrays should be ref
> types, or value types restricted to having no references. It makes things
> a lot simpler, and can be extended later...
>
>> Except it doesn't meet the requirements for one of the early things I
>> need to build.
>
> Java has no value types at all and is pretty fast; it's not needed, just
> an optimization.

It is NOT just an optimization. The absence of general unboxed types is a
fundamental expressiveness limitation. You simply cannot write systems code
in any sensible way without it. I challenge *anyone* to re-code the Coyotos
kernel with comparable efficiency in Java. You get to tell the compiler that
no collector is required, so that's a free 25%-30% advantage over
conventional Java right there. It. Can't. Be. Done. Java lacks the
expressiveness to describe the data structures.

BitC's first objective is to be a safe language for systems programming.
Being able to express this is necessary.

>>> Like the read barrier removal, I take it the read locks are only needed
>>> while the collector is running...
>>
>> Read locks are required whenever two *mutators* may run concurrently and
>> the object is mutable. If no read lock is present, you can't detect
>> contention when one thread or the other upgrades to a write lock.
>> Obviously you don't need read locks on immutable objects. You're not
>> locking out the collector here.
>
> Which means in a single-threaded context (with the collector on another
> thread), if it can be identified, you don't need the lock.

Yes. Read and write locks aren't necessarily GC primitives, but if we end up
needing dynamic translation, they should definitely be inlined intrinsics.
Though it should be said that compare-and-swap on a single thread actually
isn't that expensive.

>> The missing part of this story is that it doesn't address forwarded
>> pointers that get loaded out of objects residing in the heap. If you have
>> to do read locking, you can deal with it then, but otherwise those remain
>> an issue. Note that taking a read lock amounts to a read barrier.
>
> So how does it remove the read barrier? Isn't checking the lock (not just
> taking it) a read barrier?

Previously, you needed the read barrier *in addition to* the lock. The point
here is that if you are locking anyway, you can fuse them. The problem with
this approach is that it assumes a per-object lock, which is not how I want
things to work.

> It's very complex, and I'm struggling to follow it (as I don't have time
> to read up on all this at the moment), but it seems strange to me that
> such an expensive mechanism (read barriers/locks) is needed for something
> that runs so rarely, and mainly to assist in defragmentation of the
> heap...

Yes. It's a very high cost. The feature that causes it is object relocation:
you need a way to make sure that you detect the fact that an object is
forwarded. But now that a lot of the cases are handled in the store barrier,
I think that the *rest* can be handled with page faults if we really need
to.

> Is the treatment worse than the disease (fragmentation causing extra
> memory)?

Unfortunately, the data says it isn't. The problem is that 10% of objects do
not die young. That's true in the second generation too. So in a given 32KB
block you might have as many as 1K objects, 100 of which don't die very
fast. That block can be recycled, but it won't become free any time soon. So
what happens is that you end up constantly growing the heap to generate free
blocks. Or you can size-partition the heap to avoid the fragmentation, but
that costs you a 28% increase in L1 cache misses and likely a bunch more in
TLB misses.
> Isn't a simple but short stop-the-world almost as effective, and, for the
> nursery, sweep in parallel? Then you only need a write barrier for heap
> object references to the nursery, which is pretty cheap. Possibly you can
> remove most of the nursery pauses by using a write barrier and relying on
> thread-local nurseries (we discussed some mechanisms).

It isn't, because the nursery isn't the source of the problem here.
Forwarding within the nursery can all be done during the nursery sweep (as
you suggest). The problem is that for a large heap you need to do concurrent
collection in the background, which includes forwarding objects to render
low-occupancy blocks free. Thus the read barrier, because the mutator is
running concurrently.

A compromise would be for the GC to identify blocks that will be evacuated
at the start of collection, accumulate a points-to map during tracing, pause
all the mutators, forward the objects, and explicitly back-patch the
points-to locations. Most of the objects will forward, and the reference
counts will usually confirm that we got all of the in-bound pointers. When
we didn't, we institute a heavy (concurrent) read barrier with the page
fault system.

This approach works for a small number of active threads. The problem with
it is that it really doesn't scale.

shap
_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev
