Way out of my depth and dont have time to learn how it works and read the background papers past a scan of the paper and your excellent summary but i dont think ill let good questions go unanswered.
On Wed, Oct 16, 2013 at 7:18 AM, Jonathan S. Shapiro <[email protected]>wrote: > > > - *Question*: it is not clear to me whether logging applies only to > young objects or possibly also to old objects. > - *Option*: It would be simple to arrange that the S bit is not > necessary for a forwarded object. > - *Option*: I am not convinced that the N bit is relevant for a > forwarded object. > - *Option*: Note that new objects never need to be logged. The N bit > can be seen as an optimization of the logging scheme in which all "old" > reference fields are known to be null. In consequence, the only objects > that end up in the log should be old objects. > - Opinion: two F bits are not needed. A single F bit can do the job. > - Opinion: If the S bit is moved to the per-line metadata, then only > the M and one F bit are actually needed for a forwarded object. If these > are moved to the least-significant position, a forwarding pointer can be > written in the header word without otherwise disturbing the object. Given 8 > byte alignment rules, this seems feasible even if the S bit cannot be > moved. > > Lots of savings but if you have exra fields in the header it doesnt gain much.. re S bits looks solid only issues ( and they both should not be) are we sure only the last block in a line straddles ? ( seems likely but you never know). There may also be a higher cost for getting the line data, > Note that an object with an N bit set resides in thread-local storage; no > cross-CPU contention exists for such objects. > ???? So what happens when a reference to that object is passed to another thread. > *New Object Allocation* > * > * > New objects are allocated from a nursery with an initially zero reference > count. They are allocated with the "N" (new) bit set in their header, and > are not initially reference counted. During nursery collection, the N bit > for live objects will be turned off and the reference count will be updated > based on the traced in-bound references. > > The paper is explicit that it is *not* performing a compacting copy of > the nursery, but it also mentions coalescing objects in new space. This > leaves me with some questions: > > - *Question:* Does nursery allocation always proceed from an initially > free block? > > It doesnt say so .. so it would follow normal allocation a free block and a recycled block .. > > - *Question*: Why isn't partial compaction being used here? I.e. a > scheme in which up to *N* lines will be moved to the front of the > nursery block, with the effect that the oldest lines trend toward the > bottom and can be generationally eliminated? Or is this really what is > meant by "compaction"? > > Sounds good and there may be an attractive scheme without any stackmap / relocation though obviously this will wasted more memory but they do recycle to lines in partial blocks . It may not be generationally eliminated if they choose that block as a recycle block for allocation and use the lines , old blocks will have a high % of empty to i guess you would have to choose between old mixed with new or choosing middle aged blocks which will be less empty wasting more space in the older blocks. I think there compaction = copying . > > - *Question*: Doesn't the capacity for coalescing imply the need for a > complete stack map? > > How could you do it without it ? And note they state copying is not needed its an optomization so you could start without a stackmap. > > - *Question*: After nursery GC, the surviving block is clearly a > recycled block. At what point is the residual free space in this block > considered low enough that it can no longer serve effectively as a nursery > block? What happens then? > > They dont say when a new block is selected .. presumably when the last line has been filled. ( and the free block handles large blocks) . Nothing happens then unless they process N marked objects then but it doesnt need to .. if it doesnt than i dont think its really a nursery. Just 1 big heap. > > > > > *Note*: Object allocation sizes are statically known. This prompts me to > wonder why the TLA does not use a "small recycled block", "medium recycled > block", "overflow block" strategy. Does carrying around more bump allocator > base pointers introduce more register pressure than we want to spend on > allocation? > > Be nice to know if they tired it but if they did im sure they would have mentioned it. > > *Concurrency* > * > * > RC-immix allows mutators to run concurrently or collectors to run > concurrently, but mutators may not run concurrently with collectors. I am > unclear why this is so. Mutation in the long-term heap is rare, and it > would not be difficult to ensure that updates to the appropriate header > bits were atomic. The N and M bits could readily be reframed for tri-color > marking, with the unused (fourth) representable value meaning "new". There > are issues with forwarding barriers that I have raised in a separate email; > these do not seem insurmountable. If I were attempting to render this > scheme fully concurrent, I believe I would attempt the following changes to > the object header bits: > > 1. Coalesce the M and N bits into a two-bit field whose values are > white, grey, black, and new. > 2. Shift the S bit to the per-line metadata, allowing expansion of the > reference count field. > 3. Reduce the forwarding bit to a single bit, using the pointer > overwrite method proposed by Nettles and O'Toole, but using the least three > (known zero) bits of the forwarding pointer to record the forwarded state > (F) and the mark color (if that is needed on a forwarded object - not > sure). Though if object sizes permit, it would sure be nice to keep the > reference count information on the forwarding stub object, so that we know > when all in-bound references have been discovered. > 4. Use a single bit to record object logging. > 5. Use atomic updates on the object header when necessary. > > > *Other Comments* > * > * > I'm not entirely convinced about the particular combination of mechanisms > that is being selected here, but there are a bunch of good ideas that are > being cleverly exploited together. > I suspect using rc-Immix as the main heap for URC will deliver better results than lists of similarly sized objects , and better on small heaps but more development cost ( lists have been shown to be worse than bump allocators for Nurseries and may do for the main body though it is far less of an issue except maybe nursery pause times) .. Im not sure why they didnt check a more standard Nursery as discussed previously , they do note the better performance of the generational collector on smaller heaps (Stephen Blackburn was on both so he should know) . Probably added implimentation cost and they were already good enough. Ben
_______________________________________________ bitc-dev mailing list [email protected] http://www.coyotos.org/mailman/listinfo/bitc-dev
