As we discussed before, the VTable marks approach [1] has a "false sharing" problem on a multiprocessor:
when one thread writes to a vtable mark, it invalidates the corresponding cache line in the caches of the other processors. Meanwhile, since the gcmap pointers, located next to the vtable marks, are loaded frequently during heap tracing, the same cache line will be loaded and invalidated repeatedly, putting a heavy load on the memory bus and harming performance.

*Illustration*: original "VTable marks" suggestion applied to the current DRLVM object layout.

    object            VTable                   gcmap
  +--------+        +-----------+            +------------------+
  | VT ptr |------->| gcmap ptr |----------->| offset of ref #1 |
  |  ...   |        | mark      |            | offset of ref #2 |
  +--------+        | ...       |            | ...              |
                    +-----------+            | 0                |
                                             +------------------+

I would like to suggest a solution to the false sharing problem that uses an additional level of indirection, that is, to store the _pointer to the mark word_ in the VTable rather than the mark word itself.

*Illustration*: "indirect VTable marks" suggestion

    object            VTable                   gcmap
  +--------+        +-----------+            +------------------+
  | VT ptr |------->| gcmap ptr |----------->| offset of ref #1 |
  |  ...   |        | mark ptr  |---,        | offset of ref #2 |
  +--------+        | ...       |   |        | ...              |
                    +-----------+   |        | 0                |
                                    |        +------------------+
                                    v
                               [mark word]

I do not think this will hurt performance significantly compared with the original "vtable marks" approach, because the additional load of mark_ptr is very likely to be served from the first-level cache: it happens at the same time as the gcmap_ptr load. (If mark_ptr is loaded first, then the subsequent load of gcmap_ptr will be served from the cache, so there is no additional memory load overhead either way.)

In the current DRLVM design [2], each VTable already has a pointer to the native Class structure:

    Class* clss;

It looks like the same pointer can be reused for the VTable mark word, if we allocate the VTable mark word as the first word of struct Class. In this way, even the size of the VTable structure will not change compared to its current size.
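To make the idea concrete, here is a minimal C++ sketch of marking through the existing clss pointer. It is only an illustration under my own assumptions: apart from the `clss` field, which comes from vtable.h [2], every name in it (`mark`, `mark_class`, `is_class_marked`, `clear_mark`, and the simplified structs) is hypothetical, and the use of relaxed atomic stores is my choice rather than part of the proposal.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Simplified, hypothetical stand-ins for the DRLVM structures.
struct Class {
    // Assumption: the mark word is allocated as the first word of Class.
    std::atomic<std::uintptr_t> mark{0};
    const char* name = nullptr;  // placeholder for the remaining Class fields
};

struct VTable {
    const int* gcmap;  // zero-terminated list of reference-field offsets
    Class* clss;       // existing pointer, reused to reach the mark word
};

struct Object {
    VTable* vt;
};

// Called by a tracing thread for every object it visits. The VTable is only
// read here; the write goes to memory owned by struct Class, so, assuming
// Class is allocated away from the VTable, the cache line that holds
// gcmap and clss is not invalidated by marking.
inline void mark_class(Object* obj) {
    assert(obj != nullptr && obj->vt != nullptr && obj->vt->clss != nullptr);
    obj->vt->clss->mark.store(1, std::memory_order_relaxed);
}

// Checked after tracing to decide whether a class is still reachable.
inline bool is_class_marked(const Class* c) {
    return c->mark.load(std::memory_order_relaxed) != 0;
}

// Reset before the next collection cycle.
inline void clear_mark(Class* c) {
    c->mark.store(0, std::memory_order_relaxed);
}
```

The tracing thread pays one extra dependent load (vt->clss) per object, which is the cost discussed above; all objects of the same class share one mark word, exactly as in the original vtable marks scheme.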
The resulting object layout diagram would be:

*Illustration*: "indirect VTable marks stored in struct Class"

    object            VTable                   gcmap
  +--------+        +-----------+            +------------------+
  | VT ptr |------->| gcmap ptr |----------->| offset of ref #1 |
  |  ...   |        | clss ptr  |---,        | offset of ref #2 |
  +--------+        | ...       |   |        | ...              |
                    +-----------+   |        | 0                |
                                    |        +------------------+
                                    v
                              +-----------+
                              | mark word |
                              | ...       |
                              +-----------+
                               struct Class

Robin suggested a "side byte-map" as another solution to the same false sharing problem. As I do not completely understand how this side byte-map would be implemented, I do not know whether it is similar to this suggestion. Robin, could you comment on it?

[1] http://wiki.apache.org/harmony/ClassUnloading
[2] http://svn.apache.org/viewvc/incubator/harmony/enhanced/drlvm/trunk/vm/vmcore/include/vtable.h?view=co

(* This is a follow-up to the design proposals at http://wiki.apache.org/harmony/ClassUnloading
I am starting a new discussion because the mailing list is a better medium for discussion than the wiki. After we come to a conclusion, I will record it on the wiki page. *)