Specifically, the C4 paper explains wanting to perform map/protect
operations without an implicit TLB invalidate on every core. The three
bullet points they list about the stock Linux VM interface are (a) each
remap operation forces an expensive TLB invalidate; (b) each remap can only
map a small 4k page; and (c) remap operations contend on a single lock, so
they are effectively single-threaded.

I don't understand (b), since mremap() <http://linux.die.net/man/2/mremap>,
mprotect() <http://linux.die.net/man/2/mprotect>, and
remap_file_pages() <http://linux.die.net/man/2/remap_file_pages> all
accept a size argument.
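
For reference, here's a minimal sketch (Linux-specific, untested, and
obviously not what Azul actually does) just to show that these calls
operate on arbitrary-length ranges rather than single 4k pages:

    #define _GNU_SOURCE
    #include <sys/mman.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        size_t len = 512 * 4096;   /* 2 MB -- many 4k pages in one range */

        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) { perror("mmap"); exit(1); }

        /* One mprotect() call covers the whole range, not just one 4k page. */
        if (mprotect(p, len, PROT_READ) != 0) { perror("mprotect"); exit(1); }

        /* One mremap() call relocates/grows the whole range (Linux-specific). */
        void *q = mremap(p, len, 2 * len, MREMAP_MAYMOVE);
        if (q == MAP_FAILED) { perror("mremap"); exit(1); }

        printf("remapped %zu bytes with a single mremap() call\n", 2 * len);
        return 0;
    }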

My hypothesis is that they overcame these limitations by increasing the VM
mapping granularity, thus reducing the rate of remap operations they need
the kernel to sustain. This would coarsen some of their operations (see
below), but if I'm correct it is *mostly* a trade of memory overhead for
mapping efficiency -- something which seems entirely reasonable for
desktop/server apps. That tradeoff is not so easy to make on mobile, but
they don't target mobile, so...

As for what they do with the VM system...

1) They use page protection to enable fully concurrent evacuation. The C4
paper explains that a page/segment is memory protected before evacuation.
On a page fault, if the object has already been moved they heal the source
reference; if it has not been moved, they move it and then heal the source
reference.
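
Just to make sure I have the mechanism right, here's a toy sketch of the
trap-and-heal idea using nothing but mprotect() and a SIGSEGV handler.
The "healing" is faked by simply unprotecting the page, since the real
work of copying the object and fixing the reference happens inside their
collector; error checking omitted for brevity:

    #define _GNU_SOURCE
    #include <signal.h>
    #include <sys/mman.h>
    #include <stdio.h>
    #include <stdint.h>
    #include <unistd.h>

    static long page_size;

    /* In a real collector this is where you would check whether the object
     * behind si_addr has been evacuated yet, copy it if not, and then fix
     * ("heal") the reference that faulted. Here we just lift the protection
     * so the faulting access can complete. */
    static void heal_handler(int sig, siginfo_t *si, void *ctx)
    {
        (void)sig; (void)ctx;
        uintptr_t page = (uintptr_t)si->si_addr & ~(uintptr_t)(page_size - 1);
        mprotect((void *)page, page_size, PROT_READ | PROT_WRITE);
    }

    int main(void)
    {
        page_size = sysconf(_SC_PAGESIZE);

        struct sigaction sa = {0};
        sa.sa_flags = SA_SIGINFO;
        sa.sa_sigaction = heal_handler;
        sigaction(SIGSEGV, &sa, NULL);

        char *p = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        p[0] = 42;

        /* "Evacuation begins": protect the page so any mutator access traps. */
        mprotect(p, page_size, PROT_NONE);

        /* Mutator touches the page: SIGSEGV fires, the handler "heals",
         * the faulting load is retried and completes. */
        printf("value after heal: %d\n", p[0]);
        return 0;
    }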

2) They use unmapping and page protection to support "hand-over-hand"
compaction, where they only need one free page to compact any size heap. As
soon as they are done evacuating a page, they do what they call
"Quick-Release" -- they unmap the physical page while keeping the virtual
page protected, so that any stale references to the old object locations
trap and get healed.
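
I don't know what their custom kernel interface actually looks like, but
the closest stock-Linux approximation to "release the physical page, keep
the virtual page trapping" that I can think of is madvise(MADV_DONTNEED)
on an anonymous mapping combined with mprotect(PROT_NONE). A rough sketch,
again with error checking omitted:

    #define _GNU_SOURCE
    #include <sys/mman.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        size_t len = (size_t)sysconf(_SC_PAGESIZE);

        char *page = mmap(NULL, len, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        page[0] = 1;                 /* page now has physical backing */

        /* "Quick-Release": hand the physical page back to the kernel... */
        madvise(page, len, MADV_DONTNEED);

        /* ...but keep the virtual page reserved and protected, so any stale
         * reference to the old object location traps (and can be healed)
         * instead of silently reading reused memory. */
        mprotect(page, len, PROT_NONE);

        printf("physical page released; virtual page still mapped and trapping\n");
        return 0;
    }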

As far as I can see, as long as the mapping calls support arbitrary sizes
in map/protect/remap, the above operations can work at any granularity. The
larger the granularity, the larger the memory overhead and the lower the
kernel remapping rate they need to sustain.
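
To put made-up numbers on that: relocating 1 GB/sec with 4k mappings means
on the order of 256K remap operations per second, each a candidate for a
TLB shootdown under the stock interface; the same relocation rate with 2 MB
mappings is only about 512 remaps per second, at the cost of up to 2 MB of
slack per partially filled region.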

If I'm understanding this all correctly, there would also be a second-order
performance degradation from working with larger granularities: the bigger
the protected page/region in (1), the more likely it is that a concurrent
mutator will run into it and take a healing fault.

It's possible I'm missing something though, because I don't see why
batching many remaps per TLB invalidate was such a critical operation for
them.