> Summary: it's slower :-(

:(

> Calculating the flags position in the pool in pobject_lives() and free_unused_pobjects() takes more time than the smaller cache footprint gains. Two reasons: positions have to be calculated twice, and the cache is more stressed by other things, IMHO.

Hmm... the first reason, a second bit of pointer arithmetic, seems surprising, cycles being sooo much cheaper than cache misses. So I modified the tpmc test with a second calc, plus two extra function calls (out to a separately compiled file and back) to make sure it wasn't optimized away. The two real test cases (linear flag-only walk, and random PMC->flag) were fine (unchanged and perhaps 1/3 slower, respectively), though the fast toy case of linear PMC->flag was 5x slower (still faster than the equivalents). So it's not the first reason.

That leaves the cache being stressed by other things. Do we have any candidates?

I'd expect some interference effects between flag arrays, given _lots_ of arrays and random access. I'm not sure the stressX benchmarks are "lots" enough. But while this interference might be worse in reality than in the test program, it should still be much less than for touching PMCs (say by 10x). So that doesn't seem a likely candidate.

Is the gc run doing anything memory-intensive aside from the flag fiddling? I don't suppose it is still touching the PMC bodies for any reason?

Puzzled,
Mitchell
("[..] in realiter"?)

Message-ID: <[EMAIL PROTECTED]>
Date: Wed, 08 Jan 2003 15:00:38 +0100
From: Leopold Toetsch <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Cc: P6I <[EMAIL PROTECTED]>, Dan Sugalski <[EMAIL PROTECTED]>
Subject: Re: More thougths on DOD
References: <[EMAIL PROTECTED]> <[EMAIL PROTECTED]>

Mitchell N Charity wrote:
> The attached patch adds a scheme where:
> - gc flags are in the pool, and
> - pmc->pool mapping is done with aligned pools and pmc pointer masking.
>
> Observations:
> - It's fast. (The _test_ is anyway.)

I did try it and some more in realiter.
Summary: it's slower :-(

Calculating the flags position in the pool in pobject_lives() and free_unused_pobjects() takes more time than the smaller cache footprint gains. Two reasons: positions have to be calculated twice, and the cache is more stressed by other things, IMHO.

The only thing that seems to remain is smaller PMCs for scalars.

leo