On Thu, 2007-05-31 at 11:42 +1000, skaller wrote:
> Ok, I just changed Felix gc to replace a doubly linked list
> with a JudyLArray and an auxiliary Judy1 array. The GC now
> scans for garbage using JudyLFirst/Next functions instead
> of tracing the doubly linked list. Test program timings:
>
> Without Judy: 1 minute 8 seconds
> With Judy: 1 minute 28 seconds
Now, I have implemented the interior pointer code;
it looks like this:
/////////////////////////////////////////////////
for(unsigned int i=0; i<n_offsets; ++i)
{
void **q = (void**)(void*)(p + offsets[i]);
#ifndef JUDYGC
if(*q)
scan_object(CLIENT_TO_FRAME(*q));
#else
Word_t cand = (Word_t)*q;
if(cand)
{
Word_t fp=cand;
Word_t *plen = (Word_t*)JudyLLast(ja,&fp,&je);
// this is only true for object pointers..
// not interior pointers -- FOR TESTING ONLY!
assert(fp == (Word_t)CLIENT_TO_FRAME((void*)cand));
if(plen && fp<=cand && cand < fp+*plen)
scan_object((frame_t*)(void*)fp);
}
#endif
}
//////////////////////////////////////
Here, the non-Judy code uses a macro CLIENT_TO_FRAME
to convert a pointer to the client part of a store to
the address of the header which prefixes it. The header
contains various information including a pointer to a
shape object which includes the array of offsets of all
pointers in the object, which this code is scanning.
The Judy code does the same job, by taking a pointer
and looking BACKWARDS for the previous registered
memory block address, then checking the candidate
pointer is between that address and the end of the block,
which is obtained by adding the block length on.
The judy array is a map: pointer -> length.
The assertion holds in CURRENT Felix code, since
we don't yet allow interior pointers.
And I just noticed that the test fp <= cand is redundant
too .. since the JudyLLast postcondition is
if plen then fp <= cand
i.e. if we get a non-null value back, the index returned is
less than or equal to the input index.
The timing on this code is:
Judy with interior pointer check: 1 minute 31 seconds
which is only marginally slower than without the check.
The extra Judy function call is more or less free. I have no
idea why but one can't complain!
NOTE: at least part of the reason is almost certainly that
the earliest nodes in the Judy tree are invariant and singleton,
because all the malloc()d addresses are close to each other
(in the 64 bit scheme of things .. )
So although in theory each lookup requires 8 memory accesses,
the first 5 are probably permanently wired into the cache,
and cost almost no machine cycles.
One HUGE advantage of the Judy structure is that this performance
optimisation is delivered WITHOUT compromising the ability of
the routine to work on a machine with a fully populated
zigabyte store (all 64 bits of address space mapped to memory .. :)
--
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net
-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
Judy-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/judy-devel