Re: The Judy algorithm

Tim Bunce Tue, 11 Mar 2003 01:44:01 -0800

On Mon, Mar 10, 2003 at 03:33:38PM +0100, Elizabeth Mattijsen wrote:
> At 10:37 +0000 3/10/03, Tim Bunce wrote:
> >I think this might be interesting to some of you...
> >  "Judy is a general purpose dynamic array implemented as a C callable
> >  library. Judy's speed and memory usage are typically better than
> >  other data storage models and improves with very large data sets."
> >
> >http://judy.sourceforge.net/application/10minutes.htm
> >http://judy.sourceforge.net/application/
> >http://sourceforge.net/projects/judy
> >I've appended a few extracts from the "10minutes.htm" url given above.
> 
> This looks very interesting (particularly for a project I'm working 
> on now, which was the reason I looked into this right now), but the 
> project really seems quite silent if not dead.

I emailed Doug [CC'd] with your concerns. Here's his reply.

Tim.

From: [EMAIL PROTECTED]
Subject: Re: (Fwd) Re: The Judy algorithm
Date: Mon, 10 Mar 2003 22:40:25 

Tim:

I will try to reply to your concerns below.

> Doug,
> 
> I've just flagged Judy to the perl6-internals list (who are developing
> the 'parrot' virtual machine) and they've raised some concerns (below).
> 
> Could you tell me about the status of the Judy code?
> Is it being maintained?

Yes, but I am not much of a process person.  The versions that install and
compile on other platforms are preliminary and need more testing.  They are 
available at <http://judy.sourceforge.net/downloads> (see the README).  

> 
> Tim.
> 
> ----- Forwarded message from Elizabeth Mattijsen <[EMAIL PROTECTED]> -----
> 
> Delivered-To: [EMAIL PROTECTED]
> In-Reply-To: <[EMAIL PROTECTED]>
> Date: Mon, 10 Mar 2003 15:33:38 +0100
> To: Tim Bunce <[EMAIL PROTECTED]>, [EMAIL PROTECTED]
> From: Elizabeth Mattijsen <[EMAIL PROTECTED]>
> Subject: Re: The Judy algorithm
> 
> At 10:37 +0000 3/10/03, Tim Bunce wrote:
> >I think this might be interesting to some of you...
> >  "Judy is a general purpose dynamic array implemented as a C callable
> >  library. Judy's speed and memory usage are typically better than
> >  other data storage models and improves with very large data sets."
> >
> >http://judy.sourceforge.net/application/10minutes.htm
> >http://judy.sourceforge.net/application/
> >http://sourceforge.net/projects/judy
> >I've appended a few extracts from the "10minutes.htm" url given above.
> 
> This looks very interesting (particularly for a project I'm working 
> on now, which was the reason I looked into this right now), but the 
> project really seems quite silent if not dead.
> 
> 
> Some more info:
> Only HP-UX and Linux seem to be supported out of the box (I've only
> tried Linux and Mac OS X).
> 
> I adapted the indexSL program to act as a simple filter, piped 
> /usr/share/dict/words through it, and ran it under Valgrind. 
> Valgrind reports:
> 
> ==11948== LEAK SUMMARY:
> ==11948==    definitely lost: 11 bytes in 1 blocks.
> ==11948==    possibly lost:   26 bytes in 2 blocks.
> ==11948==    still reachable: 0 bytes in 0 blocks.
> 
> Not a whole lot of leakage, but still.

I agree any leakage is unacceptable.  However, Judy is carefully tested for
leaks.  The memory it obtains with malloc() is accounted for inside the
structure, and that total must drop (via free()) to exactly zero when the
last element is deleted from the array.  Perhaps there is a problem in the
measurement.  I would like to know more about how it was made, to be certain
that the problem is not in Judy.  JudyL and Judy1 only allocate blocks in
multiples of 4 bytes.

> 
> 
> I got the configure script to believe that Mac OS X is really 
> Linux.  Compilation then halts because
> byteswap.h is missing.  I didn't look any further than that.

The versions in the download directory (mentioned above) should solve 
this problem.  However, I think they require gmake (GNU make).

> 
> 
> Questions on the forum seem to go unanswered by the primary (only) 
> developer.  Bug reports with patches (such as fixes for trivial 
> bashisms in the configure script) have not been applied.
> 
> 
> The application directory contains some nice examples that might be 
> applicable to Parrot, especially the "best of both worlds" approach, 
> in which Judy arrays are used to handle hash value collisions in a 
> rather small hash table (256 or 64K entries).

If hashing strings is enough to solve your problem (you only need to store
and retrieve), then I suggest using JudyL in place of your normal hash table.
This makes a very scalable hashing method, and its performance is better than
any known tree method (including JudySL).

> 
> 
> 
> Just my 2 eurocents worth (which appear to be worth more than 2.1 US$ 
> cents nowadays ;-)
> 
> 
> Liz
> 
> ----- End forwarded message -----
>

I will be available for your questions and comments.

Doug Baskins <[EMAIL PROTECTED]>
