Re: Faster JudyNext

Doug Baskins Sun, 01 Feb 2009 09:16:36 -0800

John & Aleksey:

Ok, I looked at the code and came up with this idea.  I need some
suggestions on what the API will look like.  I have an initial suggestion:

Word_t Index[20];      // Array to hold 1 passed and 19 returned Indexes

Index[0] = 19;            // Max number of returned Indexes - any numb 1..inf,
                                // but Judy will limit it to about 0..64

PValue = JudyLNextA(PJLArray, Index, &JError);

A maximum of 19 returned Indexes.  Index[0] is
initially 19 and must be restored on every call
since it will be changed by JudyLNextA to the 
actual number returned.  Index[1] is the requested
calling parameter and must be updated with:

Index[1] = Index[Index[0]];
Index[0] = 19;

With every call.  PValue will simply be a pointer
to the array of Values associated with Index[1]..Index[n].

I suspect this would only take about a day or two to implement
and would be a lot faster than the current Judy*Next().
It would also be an extremely fast way to empty (iterate?) a Judy
Array.  The number 19 could be any number, but Judy will
decide what the actual number returned -- usually dictated
by the number of Values stored consecutively.  I think that
it is usually a maximum of 64.  The additional time of the 
routine will be slightly greater than the original JudyLNext()
basically the time it takes to fill "Index[]".

JudyLPrev(), if done will be a little weirder, because the Values
are stored in ascending order, the Indexes must be returned the
same way.  I guess the next previous has to be returned in the:

Index[Index[0]]; and only Index[0] needs updating to the "19" in
every call.  Not so weird after all.

As far as being valid data, the same rules apply as before --
the next modification of the array (Insert, Delete, Free) makes
the data very suspect.  This method also be used to retrieve/store
a number of Values without multiple calls to JudyLGet.  I do not
see any reason why all 4 api (first/last/prev/next) should not be done.

As always it seems, I don't know why I did not think of this before.

Doug Baskins

Doug Baskins <[email protected]>

----- Original Message ----
From: john skaller <[email protected]>
To: Doug Baskins <[email protected]>
Cc: [email protected]; Aleksey Cheusov <[email protected]>
Sent: Sunday, February 1, 2009 7:33:19 PM
Subject: Re: AW: AW: New Judy version

On 01/02/2009, at 6:06 PM, Doug Baskins wrote:

> Aleksey:
>
>> I personally don't see why it cannot be done or why it is  
>> conseptually
>  wrong.  The fact that Judy is cache-efficient doesn't make an idea
>  wrong.
>
> The idea (smarter/faster JudyNext()) has been pondered way back in
> 2001-2002.  I did some breadboards then with not very acceptable
> results.

I think this is to be expected. Roughly speaking: lets say we have a
"cursor" 10 levels deep, implying 10 cache "blocks".

With the existing Judy, searching again from scratch will be
extremely fast *provided* these 10 blocks are still in the cache.
Buffering information in "user memory" will actually slow Judy down
because of the extra code and memory (and hence cache) required
to do so.

IMHO this is the design principle of Judy: let the *machine cache*
remember useful state transparently to the user software.

>
> I would enjoy any feedback you could give on the Judy API.  My C++
> knowledge could fit in one hand.  An example: a JudyNext could be
> written to return a small array (<100) of Indexes/Values?  How to
> invalidate it when someone does an Insert/Delete/Free?

IN C++ "Invalidate" is a concept, NOT a physical thing, this is not
different from C. In C if you free the memory a pointer points at
the pointer is invalidated by the operation: this does not change
the pointer, it makes using it unsafe.

Judy is superior, because you CANNOT invalidate a key:
it is always valid to try to fetch the associated (or next) item,
an error is returned if there is none. Using an *invalid* key
such as a pointer to a free'd block never returns an error
(it usually either segfaults of returns wrong data).

Because of "invalidation" thing, cursors DO NOT work
with multi-threaded access (one thread can invalidate
another thread's cursor and there is no way to tell).

Whereas the current API simply requires serialisation
of operations: the hardware automatically synchronises
caches, even across multiple CPUs.

Let me put this another way: the current API does
not provide the user with any *state* other than the
Judy array itself. The operations are stateless
(eg "find next thing after a given one -- the given one
doesn't have to exist"). Cursors are state, which is
very hard to synchronise between threads.

This means it is enough to mutex the store containing
the Judy array pointer between threads to ensure
any change to the "top pointer" is shared.

>  What is
> a good API for C++?

Much the same as the current C one.  It correctly hides
most of the private state already, using functional abstraction.

The biggest confusion in the current interface IMHO is that
"void*" and "void**" are variously used, which is confusing.
In theory the base data types are

    K -- the key
    D -- the data
    A -- the array

and some operations, such as "insert" take a K and return
a D* (the place to put the data) whilst fetching returns a
plain D.

A template would fix this. However in practice, Judy keys
are always machine words and data is either a single bit
or a machine word, so for example D and K would usually
either be a long (assuming that is the length of a void*),
or a pointer to something, and these constraints can't be
expressed for templates easily (at best, specialisations,
but it isn't clear it is worthwhile).

--
john skaller
[email protected]

------------------------------------------------------------------------------
This SF.net email is sponsored by:
SourcForge Community
SourceForge wants to tell your story.
http://p.sf.net/sfu/sf-spreadtheword
_______________________________________________
Judy-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/judy-devel

------------------------------------------------------------------------------
This SF.net email is sponsored by:
SourcForge Community
SourceForge wants to tell your story.
http://p.sf.net/sfu/sf-spreadtheword
_______________________________________________
Judy-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/judy-devel

Re: Faster JudyNext

Reply via email to