Re: AW: AW: New Judy version

john skaller Sun, 01 Feb 2009 04:33:47 -0800

On 01/02/2009, at 6:06 PM, Doug Baskins wrote:

> Aleksey:
>
>> I personally don't see why it cannot be done or why it is  
>> conseptually
>  wrong.  The fact that Judy is cache-efficient doesn't make an idea
>  wrong.
>
> The idea (smarter/faster JudyNext()) has been pondered way back in
> 2001-2002.  I did some breadboards then with not very acceptable
> results.



I think this is to be expected. Roughly speaking: lets say we have a
"cursor" 10 levels deep, implying 10 cache "blocks".

With the existing Judy, searching again from scratch will be
extremely fast *provided* these 10 blocks are still in the cache.
Buffering information in "user memory" will actually slow Judy down
because of the extra code and memory (and hence cache) required
to do so.

IMHO this is the design principle of Judy: let the *machine cache*
remember useful state transparently to the user software.

>
> I would enjoy any feedback you could give on the Judy API.  My C++
> knowledge could fit in one hand.  An example: a JudyNext could be
> written to return a small array (<100) of Indexes/Values?  How to
> invalidate it when someone does an Insert/Delete/Free?

IN C++ "Invalidate" is a concept, NOT a physical thing, this is not
different from C. In C if you free the memory a pointer points at
the pointer is invalidated by the operation: this does not change
the pointer, it makes using it unsafe.

Judy is superior, because you CANNOT invalidate a key:
it is always valid to try to fetch the associated (or next) item,
an error is returned if there is none. Using an *invalid* key
such as a pointer to a free'd block never returns an error
(it usually either segfaults of returns wrong data).

Because of "invalidation" thing, cursors DO NOT work
with multi-threaded access (one thread can invalidate
another thread's cursor and there is no way to tell).

Whereas the current API simply requires serialisation
of operations: the hardware automatically synchronises
caches, even across multiple CPUs.

Let me put this another way: the current API does
not provide the user with any *state* other than the
Judy array itself. The operations are stateless
(eg "find next thing after a given one -- the given one
doesn't have to exist"). Cursors are state, which is
very hard to synchronise between threads.

This means it is enough to mutex the store containing
the Judy array pointer between threads to ensure
any change to the "top pointer" is shared.


>  What is
> a good API for C++?

Much the same as the current C one.  It correctly hides
most of the private state already, using functional abstraction.

The biggest confusion in the current interface IMHO is that
"void*" and "void**" are variously used, which is confusing.
In theory the base data types are

        K -- the key
        D -- the data
        A -- the array

and some operations, such as "insert" take a K and return
a D* (the place to put the data) whilst fetching returns a
plain D.

A template would fix this. However in practice, Judy keys
are always machine words and data is either a single bit
or a machine word, so for example D and K would usually
either be a long (assuming that is the length of a void*),
or a pointer to something, and these constraints can't be
expressed for templates easily (at best, specialisations,
but it isn't clear it is worthwhile).


--
john skaller
[email protected]





------------------------------------------------------------------------------
This SF.net email is sponsored by:
SourcForge Community
SourceForge wants to tell your story.
http://p.sf.net/sfu/sf-spreadtheword
_______________________________________________
Judy-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/judy-devel

Re: AW: AW: New Judy version

Reply via email to