Re: bug in interface

skaller Sat, 26 May 2007 17:35:49 -0700

On Sat, 2007-05-26 at 09:25 -0700, Doug Baskins wrote:
> John:

hi Doug .. glad to hear you're still around :)


I have just wrapped Judy up inside Felix.. cool!

> Future versions of Judy will not have error structures as
> part of the calling parameters in many of the routines that do
> not modify the array (tree).  Error handling is a pain to figure
> out while your thought is in program flow and often overlooked.

That's true in C I suppose. However I have a wrapper for a higher
level language (Felix), and that can be constructed so ignoring
errors is impossible.

Similarly, in C++, wrappers can check errors systematically
and throw exceptions (I hate exceptions .. but that's the 
way it is usually done in C++ ;(

And I think John Meacham has a wrapper for Haskell.
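
A minimal C sketch of the "systematic checking" idea, for illustration
only. Nothing here is the Judy API: raw_store, checked_store, on_error
and the hooks are hypothetical names. The low-level call returns a
status the caller can ignore; the wrapper returns nothing and always
routes failures to a hook, so there is no status left to forget about.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for a low-level call that reports failure through
   a return code the caller is free to ignore. Not part of the Judy API. */
enum { RAW_OK = 0, RAW_ERR = -1 };

int raw_store(int *slots, size_t nslots, size_t index, int value)
{
    if (index >= nslots)
        return RAW_ERR;              /* easy to forget to test this */
    slots[index] = value;
    return RAW_OK;
}

/* The wrapper calls this hook on every failure. */
typedef void (*error_hook)(const char *what);

/* Default policy: report and abort. */
void abort_hook(const char *what)
{
    fprintf(stderr, "fatal: %s\n", what);
    abort();
}

/* Alternative policy that only counts failures, handy for testing. */
int error_count = 0;

void recording_hook(const char *what)
{
    (void)what;
    error_count++;
}

error_hook on_error = abort_hook;

/* Checked wrapper: no status is returned, so there is nothing to ignore. */
void checked_store(int *slots, size_t nslots, size_t index, int value)
{
    if (raw_store(slots, nslots, index, value) != RAW_OK)
        on_error("checked_store: index out of range");
}
```

A binding for a higher level language would replace the hook with
whatever that language uses: an exception, a variant return type, etc.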

> That is one of the reasons I came up with the Macro interface
> (JL*() and J1*() etc).  The only reason they were originally
> put in was to detect corruptions in the array(tree).  The Macro
> interface allows you to change the way errors are handled
> without making large changes to the code.

Yes, but for a wrapper macros suck. I think the macros should
have been put in a separate header file which #includes the 
function interfaces, so that one can avoid namespace pollution
macros cause -- especially ones with 3 letter names :)

> On a different subject, you are not the only one to critique the
> API to Judy.  The API was a result of many discussions and
> nobody was ever happy with it.  

Judy1 and JudyL seem to be just fine to me .. though I haven't
used them extensively yet. My complaint regards JudySL and JudyHS.
As mentioned .. no one uses null terminated strings any more,
not for anything serious. The JudySL semantics *with a count*
instead of null termination would be great.

JudyHS has a subset of those semantics, but it has two issues:
it doesn't provide the iterators, which are essential, and it
uses a different design which may not always be suitable.

The hashing thing may be good for random text strings,
but someone might be sticking in two words of binary data
every time, and then the hashing is pointless, and we really
just want a Judy Trie with depth 16 (on a 64 bit machine).
Here the strings are not in fact variable length, but the
variable length interface is 'good enough' .. but we can't
have null termination for binary data!
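
To make the binary data point concrete, here is a small sketch in C.
The struct counted_key type and counted_key_cmp are hypothetical names
for illustration, not anything in Judy. With a pointer plus a count,
zero bytes are ordinary data; with null termination, two different keys
that share a prefix up to the first zero byte look identical.

```c
#include <stddef.h>
#include <string.h>

/* A key as pointer plus explicit length; zero bytes are ordinary data. */
struct counted_key {
    const unsigned char *data;
    size_t len;
};

/* Compare in byte order; a key that is a proper prefix of another sorts
   first. Returns <0, 0 or >0 like memcmp(). */
int counted_key_cmp(struct counted_key a, struct counted_key b)
{
    size_t n = a.len < b.len ? a.len : b.len;
    int c = n ? memcmp(a.data, b.data, n) : 0;
    if (c != 0)
        return c;
    return (a.len > b.len) - (a.len < b.len);
}
```

For example the three byte keys {1,0,2} and {1,0,3} are distinct under
counted_key_cmp, but strcmp() stops at the zero byte and reports them
equal.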

Strings are distinct from word keys, because strings sort
in byte order, whereas words require byte-swapping depending
on endian-ness.
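
A sketch of what that byte-swapping amounts to, assuming a 64 bit word
(uint64_t stands in for Word_t here; the function names are made up
for the example). Writing the word most significant byte first gives a
byte string whose memcmp() order agrees with numeric order on any host.

```c
#include <stdint.h>
#include <string.h>

/* Write a 64-bit word most-significant byte first, independent of host
   endianness, so that memcmp() on the bytes agrees with numeric order. */
void word_to_key(uint64_t w, unsigned char out[8])
{
    for (int i = 0; i < 8; i++)
        out[i] = (unsigned char)(w >> (56 - 8 * i));
}

/* Inverse of word_to_key(). */
uint64_t key_to_word(const unsigned char in[8])
{
    uint64_t w = 0;
    for (int i = 0; i < 8; i++)
        w = (w << 8) | in[i];
    return w;
}

/* Compare two words by comparing their byte-string keys. */
int word_key_cmp(uint64_t a, uint64_t b)
{
    unsigned char ka[8], kb[8];
    word_to_key(a, ka);
    word_to_key(b, kb);
    return memcmp(ka, kb, 8);
}
```

On a little endian machine, a memcmp() on the raw in-memory words would
instead compare the least significant byte first, which is not numeric
order.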

That said: Judy1 and JudyL are the two primary data structures,
one can probably implement everything else using them,
with only a small loss of efficiency.

> On still another subject,  I have wrestled many hours on the
> PPvoid_t vs Pvoid_t  semantics to specify the "name" of the
> Judy array.  The PPvoid_t is primarily necessary to allow the
> "Array of Array" (I.E. the concept that a NULL pointer allocates
> another Judy array -- such as used in JudySL, JudyHS etc).

The use of 'void*' instead of 'void**' for functions that don't
modify the Judy Array is probably correct for a low level interface,
even though it is dangerous (it saves one dereference in a few cases).

However void* isn't necessary. You can use a cast. 
You can also provide a distinct interface internally to the 
one provided to the public.

The point is that you put a Word_t in and get a void* out?
That doesn't make sense :)

> It is absolutely necessary that the delete code be able to NULL
> that pointer when the last element is removed or chaos would result. 

Yes, of course: a Judy array is abstracted to a pointer to
a pointer. Such a pointer can be freely and safely passed
around. The pointer it points at IS the Judy array, it only
happens to be a pointer, it could be any data structure.

In the abstract, the user doesn't need to know the type
of a Judy Array, only that Judy1, JudyL etc are distinct,
and distinct from any other data structure.
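
Here is a toy illustration of the pointer-to-pointer convention in C.
It is not Judy: the container is a plain linked list and every name
(toy_node, toy_insert, toy_test, toy_delete) is invented for the
sketch. The array is a void* that is NULL when empty; modifying
operations take a void** so they can change that pointer, including
setting it back to NULL when the last element goes.

```c
#include <stdlib.h>

/* Toy container: the "array" is a void* that is NULL when empty. */
struct toy_node {
    unsigned long key;
    struct toy_node *next;
};

/* Read-only lookup: takes the array itself, cannot modify it. */
int toy_test(const void *array, unsigned long key)
{
    for (const struct toy_node *n = array; n != NULL; n = n->next)
        if (n->key == key)
            return 1;
    return 0;
}

/* Insert needs the address of the array pointer because it may change it.
   Returns 0 on success (also if the key is already present), -1 if out
   of memory. */
int toy_insert(void **parray, unsigned long key)
{
    if (toy_test(*parray, key))
        return 0;
    struct toy_node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->key = key;
    n->next = *parray;
    *parray = n;
    return 0;
}

/* Delete returns 1 if the key was removed, 0 if it was not present. */
int toy_delete(void **parray, unsigned long key)
{
    struct toy_node *head = *parray;
    struct toy_node **link = &head;
    while (*link != NULL && (*link)->key != key)
        link = &(*link)->next;
    if (*link == NULL)
        return 0;
    struct toy_node *dead = *link;
    *link = dead->next;
    free(dead);
    *parray = head;   /* becomes NULL when the last element is removed */
    return 1;
}
```

The caller just holds a void* initialised to NULL and passes its
address to the modifying calls; it never needs to know what is behind
the pointer.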

> P.S. make sure you are using 1.0.4 or greater because there
> is a bug in JLN().

yes, I have 1.0.4.


> It has been on my wish-list for years to do a "thread-safe" 
> version of Judy.  That would require different semantics to the 
> Judy API (you can't have pointers to the "Value" area outside
> a potential lock or modification of the array by some other 
> thread.)  Your help on this would be appreciated, especially
> for C++.  None of the original Judy team used C++.

What do you mean by thread safe? The code is re-entrant isn't it?
As long as you don't have any global variables (and I mean variable
variables not constants like tables) then the code is thread safe.
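
To spell out the distinction with a made-up example (none of these
names come from Judy): a function that keeps hidden mutable state in a
global is not re-entrant, one that takes all its mutable state from the
caller is, and a read-only table is harmless either way.

```c
/* Not re-entrant: hidden mutable global shared by every caller. */
static unsigned long hidden_state = 1;

unsigned long next_id_global(void)
{
    return hidden_state++;
}

/* Re-entrant: all mutable state is supplied by the caller, so two
   threads using two different id_source objects never interfere. */
struct id_source {
    unsigned long next;
};

unsigned long next_id(struct id_source *src)
{
    return src->next++;
}

/* A constant table is global but never written, so sharing it is safe. */
static const unsigned char nibble_bits[16] = {
    0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4
};

/* Count set bits four at a time using the read-only table. */
unsigned count_bits(unsigned long w)
{
    unsigned n = 0;
    while (w != 0) {
        n += nibble_bits[w & 0xF];
        w >>= 4;
    }
    return n;
}
```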

Despite comments I saw -- malloc() is thread safe: it HAS to be
or no threaded programs would work :)

Setting vectors like a pointer to allocator isn't thread
safe, but that's only a minor issue since it is usually
done only once, initially, before any threads are started.

Now, if you mean *concurrent access to the SAME Judy arrays*
that's another story.

The answer to that is probably *don't bother*. The reason is
that an array is a database, and protecting individual
accesses is worthless. What needs to be protected are
transactions: sequences of operations, possibly
on several data structures.

Unfortunately, only the application programmer knows
what a 'transaction' is, so only the application programmer
is in a position to serialise operations.

That said, if you have a specific use for a single array
with a single thread safe operation, that is worth providing
a wrapper for. However such wrappers are usually just
locking a mutex, doing the operation, and releasing it.
That's what my thread safe garbage collector does -- 
the gc itself isn't thread safe, but there is a wrapper
which makes sure the public methods are serialised with
a mutex.
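
A sketch of that kind of wrapper in C with POSIX threads. The
structure and function names (counter_table, table_bump, locked_table
and so on) are hypothetical stand-ins, not Judy and not my gc: the core
operation does no locking, and the wrapper just takes a mutex around it.

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical unsynchronised structure standing in for the real one. */
struct counter_table {
    unsigned long slots[16];
};

/* Core operation: no locking, not safe for concurrent use.
   Increments one slot and returns its new value. */
unsigned long table_bump(struct counter_table *t, size_t i)
{
    return ++t->slots[i % 16];
}

/* Wrapper type: the same structure plus the mutex that guards it. */
struct locked_table {
    pthread_mutex_t lock;
    struct counter_table inner;
};

/* Returns 0 on success, like pthread_mutex_init(). */
int locked_table_init(struct locked_table *lt)
{
    memset(&lt->inner, 0, sizeof lt->inner);
    return pthread_mutex_init(&lt->lock, NULL);
}

/* Serialised public method: lock, do the operation, unlock. */
unsigned long locked_table_bump(struct locked_table *lt, size_t i)
{
    pthread_mutex_lock(&lt->lock);
    unsigned long v = table_bump(&lt->inner, i);
    pthread_mutex_unlock(&lt->lock);
    return v;
}
```

Note this only protects one operation at a time. A transaction that
touches several slots would have to hold the lock itself across a
sequence of table_bump calls, which is exactly why the application has
to decide where the locking goes.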

This is actually a pain, because gc operations include
malloc(), and malloc() is ALREADY thread safe .. what I 
actually want inside my already serialised operations
is to call an *unsafe* variant of malloc() that doesn't
serialise allocations: by calling the safe version internally
there is an extra mutex lock/unlock done by malloc that is
entirely unnecessary.



-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net

_______________________________________________
Judy-devel mailing list
https://lists.sourceforge.net/lists/listinfo/judy-devel
