On Sat, 14 Apr 2012 05:21:08 -0500, Manu <turkey...@gmail.com> wrote:

On 13 April 2012 17:25, Kagamin <s...@here.lot> wrote:

once you prefetched the function, it will remain in the icache and be
reused from there the next time.


All depends how much you love object orientation. If you follow the C++
book and make loads of classes for everything, you'll thrash the hell out
of it. If you only have a couple of different objects, maybe they'll coexist
in the icache.
The GC is a bit of a special case though because it runs in a tight loop.
That said, the pipeline hazards still exist regardless of the state of
icache.
Conventional virtuals are worse, since during the process of executing
regular code, there's not usually such a tight loop pattern.
(note: I was answering the prior question about virtual functions in
general, not as applied to the specific use case of a GC scan)

The latest 'turbo' ARM chips (Cortex, etc) and such may store a branch
target table; they are alleged to have improved prediction, but I haven't
checked.
Prior chips (standard ARM9 and down; note: most non-top-end Android devices
fall in this category, and all games consoles with ARM chips in them) don't
technically have branch prediction at all. ARM has conditional execution
bits on instructions, so it can filter opcodes based on the state of the
flags register. This is a cool design for binary branching, or performing
'select' operations, but it can't do anything to help an indirect jump.
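
To make the distinction above concrete, here is a minimal C sketch (not code from this thread; the names select_max, apply, twice and negate are made up for illustration). The first function is a two-way choice that a compiler may be able to lower to a compare plus conditionally executed instructions; the second is an indirect call whose target is only known at run time, which conditional execution cannot help with. Whether a given compiler actually emits conditional instructions for the first case is an assumption, not a guarantee.

```c
#include <assert.h>

/* Binary choice: both outcomes are known at compile time, so on classic
   ARM a compiler may turn this into a compare followed by conditionally
   executed moves instead of a branch. */
static int select_max(int a, int b)
{
    return (a > b) ? a : b;
}

/* Hypothetical operations used only as indirect-call targets. */
typedef int (*Op)(int);
static int twice(int x)  { return 2 * x; }
static int negate(int x) { return -x; }

/* Indirect jump: the destination lives in a register, so the CPU cannot
   know where to fetch from until the pointer value has been loaded. */
static int apply(Op op, int x)
{
    return op(x);
}
```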

Point is, the GC is the most fundamental performance hazard to D, and I
just think it's important to make sure the design is such that it is
possible to write a GC loop which can do its job with generated data tables
if possible, instead of requiring generated marker functions.
It would seem to me that carefully crafted tables of data could technically
perform the same function as marker functions, but without any function
calls... and the performance characteristics may be better/worse for
different architectures.
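
As a rough illustration of the table-driven idea, here is a minimal C sketch. It is not the D runtime's design: TypeLayout, scan_object and demo_scan are hypothetical names, and the layout (one bit per pointer-sized word, at most 64 words) is an assumption chosen to keep the example short. The point is only that the scan loop reads data and makes no per-object function call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical compiler-generated layout record for one type. */
typedef struct {
    size_t   words;        /* object size in pointer-sized words (<= 64 here) */
    uint64_t pointer_bits; /* bit i set => word i holds a pointer */
} TypeLayout;

/* Collect the non-null pointer words of one object into out[].
   Returns how many were stored; the loop contains no indirect call. */
static size_t scan_object(const uintptr_t *obj, const TypeLayout *layout,
                          uintptr_t *out, size_t out_cap)
{
    size_t found = 0;
    for (size_t i = 0; i < layout->words && i < 64; i++) {
        if (((layout->pointer_bits >> i) & 1u) && obj[i] != 0 && found < out_cap)
            out[found++] = obj[i];
    }
    return found;
}

/* Example: a 4-word object whose words 0 and 2 are pointers. */
static size_t demo_scan(void)
{
    static int a, b;
    uintptr_t obj[4] = { (uintptr_t)&a, 12345, (uintptr_t)&b, 0 };
    TypeLayout layout = { 4, 0x5 }; /* binary 0101: words 0 and 2 */
    uintptr_t out[4];
    return scan_object(obj, &layout, out, 4);
}
```

A real collector would push the collected pointers onto a mark stack rather than into a fixed array; the array is here only so the example stands alone.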


Is there any reason to assume that the GC is actually using the mark function for 
marking? The mark function is only there to provide the GC with the information it 
needs in order to mark. What it actually is is defined by the runtime. So it may 
just return a bitmask. Or it might _be_ a bitmask (for objects <= 512 bytes). Or 
a 64-bit index into some internal table. Etc. In fact, real functions only start 
making sense with very large arrays and compound structs/objects.
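
A minimal C sketch of what such a runtime-defined "mark function" slot could look like follows. Everything in it (MarkKind, MarkInfo, resolve_bitmap, the separate kind field) is a hypothetical illustration, not the actual runtime interface. It assumes 8-byte words, under which a 64-bit mask covers exactly 64 words, i.e. 512 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: the three forms the per-type mark info might take. */
typedef enum { MARK_INLINE_BITMAP, MARK_TABLE_INDEX, MARK_FUNCTION } MarkKind;

typedef void (*MarkFn)(const void *obj, void *gc_state);

typedef struct {
    MarkKind kind;
    union {
        uint64_t bitmap;      /* one bit per word, objects <= 512 bytes */
        uint64_t table_index; /* index into a runtime-owned bitmap table */
        MarkFn   fn;          /* real function, for large or compound types */
    } u;
} MarkInfo;

/* Resolve a descriptor to a pointer bitmap where possible.
   Returns 1 and stores the bitmap, or 0 if no bitmap is available
   (function-based descriptor or out-of-range index). */
static int resolve_bitmap(const MarkInfo *info, const uint64_t *table,
                          size_t table_len, uint64_t *bitmap_out)
{
    switch (info->kind) {
    case MARK_INLINE_BITMAP:
        *bitmap_out = info->u.bitmap;
        return 1;
    case MARK_TABLE_INDEX:
        if (info->u.table_index >= table_len)
            return 0;
        *bitmap_out = table[info->u.table_index];
        return 1;
    default:
        return 0;
    }
}

/* Example: a descriptor that refers to entry 1 of a two-entry table. */
static uint64_t demo_resolve(void)
{
    const uint64_t table[2] = { 0x1, 0xF0 };
    MarkInfo info;
    uint64_t bits = 0;
    info.kind = MARK_TABLE_INDEX;
    info.u.table_index = 1;
    return resolve_bitmap(&info, table, 2, &bits) ? bits : 0;
}
```

Only the MARK_FUNCTION case costs an indirect call, which matches the point above that real functions start to pay off only for very large arrays and compound types.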

My actual concern with this approach is that a DLL compiled with GC A will have 
problems interacting with a program with GC B.
