On Mon, 2007-09-17 at 09:30 -0700, Doug Baskins wrote:
> John:

Sorry for the length of the replies, and also the fact they're
not deterministic (you're making me think, so this is a thought
in progress response .. :)

> Please comment on what the next prototype of Judy has in Judy.h:
>
> #define JLN(PValue, PArray, Index)                                \
> {                                                                 \
>     (Index)++;                                                    \
>     if ((PArray) && (Index))                                      \
>         (PValue) = (Pvoid_t) JudyLFirst(PArray, &(Index), PJE0);  \
>     else                                                          \
>         (PValue) = NULL;                                          \
> }

ouch .. I guess that will work, but it's a hack .. get_next_or_equal
is turned into get_next by incrementing the Index argument? :))

The problem with that is it modifies the argument, even in the
case there is no 'next'. You can fix that by copying the value,
and only setting the original Index on success.
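A minimal sketch of that fix, assuming a toy stand-in for JudyLFirst() so it compiles without Judy: `toy_first`, `toy_keys`, `toy_vals` and `JLN_SAFE` are hypothetical names, and `Word_t`/`Pvoid_t` are local stand-ins for the Judy types. The macro increments a private copy and writes Index back only when a next entry exists; the `!= 0` test keeps the original macro's guard against wrapping past the largest index.

```c
#include <stddef.h>

typedef unsigned long Word_t;  /* assumption: stand-in for Judy's Word_t */
typedef void *Pvoid_t;         /* assumption: stand-in for Judy's Pvoid_t */

/* Hypothetical stand-in for JudyLFirst(): a sorted toy key list.
   Finds the first key >= *pidx, stores it in *pidx, returns its value slot. */
static const Word_t toy_keys[] = {3, 7, 10};
static Word_t toy_vals[] = {30, 70, 100};

static Pvoid_t toy_first(Pvoid_t arr, Word_t *pidx)
{
    (void)arr;
    for (size_t i = 0; i < sizeof toy_keys / sizeof toy_keys[0]; i++) {
        if (toy_keys[i] >= *pidx) {
            *pidx = toy_keys[i];
            return &toy_vals[i];
        }
    }
    return NULL;
}

/* Get-next that works on a copy and writes Index back only on success.
   do { } while (0) lets the macro be used safely inside if/else. */
#define JLN_SAFE(PValue, PArray, Index)                      \
    do {                                                     \
        Word_t jln_next_ = (Index) + 1;                      \
        (PValue) = NULL;                                     \
        if ((PArray) && jln_next_ != 0) {                    \
            (PValue) = toy_first((PArray), &jln_next_);      \
            if ((PValue) != NULL)                            \
                (Index) = jln_next_;                         \
        }                                                    \
    } while (0)
```

With real Judy, the `toy_first(...)` call would become `JudyLFirst((PArray), &jln_next_, PJE0)` as in the macro quoted above. Calling `JLN_SAFE` with Index = 10 on this toy data leaves Index at 10 and sets PValue to NULL.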

BTW: C++ and most C compilers have 'inline' functions now.
'inline' is ISO C99 too. These would be better than macros.

However, C has no reference parameters: a function cannot take an
lvalue argument and modify it, so if you want that you have to use a
macro, instead of the "C way", which is to pass a pointer.

Personally, having done much functional programming, I prefer
the pointer anyhow.

In the example macro above, 'Index' is used as an lvalue, i.e. effectively a reference.

> Perhaps JudyLGet, JudyLNext and JudyLFirst should return a NULL with
> a corruption error.  I am leaning that way too.

But NULL isn't an error! It just means 'not found' which is a 
normal result.

I would use the hook function idea. By default, corruption
should core dump (call abort()) with a message to stderr.

Basically, if the function fails it shouldn't return.
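A minimal sketch of such an error hook, assuming hypothetical names (`jerr_t`, `set_error_hook`, `judy_fail`) that are not part of Judy's API: by default it prints to stderr and calls abort(), a user may install a replacement, and the reporting function never returns to its caller.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical error codes a Judy-like library might pass to its hook. */
typedef enum { JERR_CORRUPT = 1, JERR_NOMEM = 2 } jerr_t;

typedef void (*jerr_hook_t)(jerr_t code, const char *where);

/* Default behaviour: message to stderr, then abort() for the earliest core dump. */
static void default_error_hook(jerr_t code, const char *where)
{
    fprintf(stderr, "Judy: fatal error %d in %s\n", (int)code, where);
    abort();
}

static jerr_hook_t error_hook = default_error_hook;

/* Install a user hook; NULL restores the default. Returns the previous hook. */
jerr_hook_t set_error_hook(jerr_hook_t h)
{
    jerr_hook_t old = error_hook;
    error_hook = h ? h : default_error_hook;
    return old;
}

/* Library-internal reporting: never returns to the caller. */
void judy_fail(jerr_t code, const char *where)
{
    error_hook(code, where);
    abort();   /* a hook that returns is treated as fatal too */
}
```

A hook that wants to survive the error must leave by some non-local route (longjmp in C, an exception in C++); simply returning still ends in abort().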

However, malloc failure isn't a hard error. The problem is that
if you return, the user has to decide what to do .. and if you have
no memory left, you're screwed anyhow :)

If Judy is in the middle of doing some modifications to the
Judy array and malloc fails, then the array may be in an 
inconsistent and unrecoverable state. Unless you can guarantee
to ALWAYS leave the Judy array in a consistent state on malloc
failure .. there's no point returning an error code: the user
program MUST abort -- it can't even delete the Judy array.

OTOH the hook function CAN allow for a retry on malloc failure.

It isn't clear this is a good idea, though, since you already
provide a malloc hook. Given that, malloc_hook() returning 0
should simply call the error hook: a user who wanted to try to
recover could install their own malloc hook, call malloc inside
it, and on failure free up some memory and retry the malloc,
all inside the hook.
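A minimal sketch of such a recovering allocation hook, assuming hypothetical names (`reserve_init`, `retrying_alloc`, `raw_alloc`) rather than Judy's actual allocation interface. It keeps an emergency reserve, frees it and retries once on failure, and otherwise gives up; the abort() stands in for calling the error hook.

```c
#include <stdlib.h>

/* Hypothetical emergency reserve, grabbed at startup and released on demand. */
static void *emergency_reserve = NULL;

void reserve_init(size_t bytes)
{
    emergency_reserve = malloc(bytes);
}

/* Underlying allocator: malloc by default, swappable (e.g. for testing). */
static void *(*raw_alloc)(size_t) = malloc;

/* Hypothetical user-supplied allocation hook: on failure, free the reserve
   and retry once; if that still fails, give up (stand-in for the error hook). */
void *retrying_alloc(size_t n)
{
    void *p = raw_alloc(n);
    if (p == NULL && emergency_reserve != NULL) {
        free(emergency_reserve);
        emergency_reserve = NULL;
        p = raw_alloc(n);
    }
    if (p == NULL)
        abort();
    return p;
}
```

Because the retry lives entirely inside the hook, the library itself never sees a NULL and never has to return an error code.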

So there's a good argument to never return ANY error code,
not even on malloc failure.

The main problem is that in a multi-threaded C environment
the error hook is global, so it has to use POSIX per-thread
storage (thread-specific data) to find, for example, a jmp_buf
to jump to a per-thread error handler.
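A minimal sketch of that approach, assuming hypothetical names (`set_thread_recovery`, `judy_error_hook_longjmp`): one global hook looks up the calling thread's jmp_buf via pthread_getspecific() and longjmps to it, falling back to abort() when the thread registered no handler.

```c
#include <pthread.h>
#include <setjmp.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical: one key holding each thread's recovery point. */
static pthread_key_t recovery_key;
static pthread_once_t recovery_once = PTHREAD_ONCE_INIT;

static void make_key(void)
{
    pthread_key_create(&recovery_key, NULL);
}

/* Register (or clear, with NULL) the calling thread's recovery point. */
void set_thread_recovery(jmp_buf *jb)
{
    pthread_once(&recovery_once, make_key);
    pthread_setspecific(recovery_key, jb);
}

/* The single global error hook: jumps to this thread's handler if any. */
void judy_error_hook_longjmp(const char *msg)
{
    pthread_once(&recovery_once, make_key);
    jmp_buf *jb = pthread_getspecific(recovery_key);
    if (jb != NULL)
        longjmp(*jb, 1);
    fprintf(stderr, "Judy error: %s\n", msg);
    abort();
}
```

The usual setjmp/longjmp caveats apply: the frame that called setjmp() must still be live, and any locals it modifies afterwards should be volatile.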

C++ has no such problem: the hook can just throw an exception.

Note an interesting problem for ME: Felix is using Judy to
IMPLEMENT a garbage collector .. so if you're calling the
garbage collector when you run out of memory .. well,
the collector still has to work! So technically I actually
*have* to use an allocation hook and some reserved memory
for Judy, to actually free up memory for the client
application.

The problem here is that Judy isn't re-entrant with respect
to the allocator.

In other words, there is nowhere to PUT the allocator object;
you can only store the hook in global memory.

The only solution to that is to pass an extra argument to
every Judy function, which is a pointer to an object containing
the allocator hook and data.
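A minimal sketch of such a context argument, assuming hypothetical names (`judy_ctx`, `ctx_alloc`, `counting_alloc`) that are not part of Judy: the hook and its private data travel together, so two collectors in two threads can each allocate from their own state with no globals.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-call context: allocator hooks plus the allocator's state. */
typedef struct judy_ctx {
    void *(*alloc)(void *data, size_t n);
    void  (*release)(void *data, void *p, size_t n);
    void  *data;   /* e.g. one collector's private arena */
} judy_ctx;

/* What every library routine would call instead of a global hook. */
static void *ctx_alloc(judy_ctx *ctx, size_t n) { return ctx->alloc(ctx->data, n); }
static void ctx_free(judy_ctx *ctx, void *p, size_t n) { ctx->release(ctx->data, p, n); }

/* Example allocator state: counts live bytes per context. */
typedef struct { size_t live; } counting_state;

static void *counting_alloc(void *data, size_t n)
{
    counting_state *s = data;
    void *p = malloc(n);
    if (p != NULL)
        s->live += n;
    return p;
}

static void counting_release(void *data, void *p, size_t n)
{
    counting_state *s = data;
    free(p);
    s->live -= n;
}
```

The cost mentioned next is visible here: every routine carries one more pointer argument and one more indirection per allocation.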

Since Felix could run multiple collectors in multiple threads,
this is technically necessary. The problem is, the extra argument
will cost on every call, especially on 32 bit machines.

[The alternative is very ugly -- a mutex and/or Posix
per thread memory: passing a pointer is better]

Felix itself always passes such context around with a pointer.
We do not use global variables at all (because they destroy
thread safety and re-entrancy). But that's a cost I'm prepared
to pay and you may not be.

> > [BAD .. never call the public interface of a routine inside
> > any routine at the same level! Doing so makes it impossible
> > to wrap the public interface without interfering with the
> > implementation]
> 
> Please give me an example how JudyLNext() should be written
> given the above api for JLN() and assume that JudyLFirst()
> never returns a PPJERR.

You would just define JudyLFirst_private, which other routines call
internally, and then wrap it in the public JudyLFirst.

If you make it an inline function it should be optimised away.
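A minimal sketch of that split on a toy sorted array, assuming hypothetical names (`first_private`, `first_public`, `next_public`) in place of the real JudyLFirst/JudyLNext. The "next" routine calls the private function, so a user who wraps or instruments the public "first" does not change how "next" behaves; `first_calls` exists only to show that.

```c
#include <stddef.h>

typedef unsigned long Word_t;   /* assumption: stand-in for Judy's Word_t */

/* Hypothetical toy "array": sorted keys with parallel values. */
static const Word_t keys[] = {3, 7, 10};
static Word_t vals[] = {30, 70, 100};

/* Private: first key >= *index. Internal routines call only this. */
static inline Word_t *first_private(Word_t *index)
{
    for (size_t i = 0; i < sizeof keys / sizeof keys[0]; i++) {
        if (keys[i] >= *index) {
            *index = keys[i];
            return &vals[i];
        }
    }
    return NULL;
}

/* Public wrapper; instrumented here to show wrapping stays isolated. */
static int first_calls = 0;
Word_t *first_public(Word_t *index)
{
    first_calls++;
    return first_private(index);
}

/* Public "next", built on the private routine, not on first_public. */
Word_t *next_public(Word_t *index)
{
    Word_t probe = *index + 1;
    if (probe == 0)
        return NULL;                 /* *index was the largest possible key */
    Word_t *pv = first_private(&probe);
    if (pv != NULL)
        *index = probe;              /* update the caller only on success */
    return pv;
}
```

Since first_private is `static inline`, the extra layer should cost nothing once the compiler inlines it.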

> > BTW: using ~0UL is a bad idea. This is NOT -1 on XP64, because
> > long is only 4 bytes, and when cast to 8 bytes it may lead to
> 
> How about a ~0ULL ?  will C++ complain?

g++ won't. MSVC++ might; I'm not sure. Neither C89 nor
C++98 requires long long.

But the point is, you shouldn't need that if you don't ever return
an error code.
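Where an all-ones word is still wanted, a minimal sketch of a portable spelling, assuming Word_t is a pointer-sized unsigned type (here `uintptr_t`; `WORD_ALL_ONES` is a hypothetical name). Converting -1 to an unsigned type always yields that type's maximum value, whatever sizeof(long) is. The second constant simulates the LLP64 problem by truncating ~0UL to 32 bits before widening.

```c
#include <stdint.h>

typedef uintptr_t Word_t;  /* assumption: pointer-sized word, like Judy's Word_t */

/* Portable all-ones Word_t: -1 converted to an unsigned type is its maximum. */
#define WORD_ALL_ONES ((Word_t)-1)

/* What ~0UL turns into on an LLP64 target (32-bit long) once widened to
   64 bits, simulated here by truncating to 32 bits first. */
static const uint64_t llp64_not0ul_widened = (uint64_t)(uint32_t)~0UL;
```

The simulated value is 0x00000000FFFFFFFF, not the 64-bit all-ones value a caller would be comparing against.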

> What your asking for is to combine:

I make multiple suggestions, which doesn't help .. :)

My design paradigm could be different from yours. I try to write
functions which can't fail. If a failure is detected internally,
they just abort the program.

This doesn't mean you abort on 'end of file' when reading,
because end of file isn't an error; it's an *expected* condition.

Now the problem is, in some applications a Judy array corruption
isn't a hard error either. For example, a web server loading
a plugin that fails would just stop using that facility ..
you don't want the whole web server to be taken down by a bug
in a plugin.

Thus .. I can't say my approach is necessarily the only correct one.

Just to say again another way -- my basic philosophy is that if an 
error is detected, the program should just abort (or call a user
hook which defaults to doing that).

Anything less assumes the code being executed isn't an essential
part of the program.

The thing is, this can be the case.. real code DOES contain
non-essential enhancements.

> Given the apparent lack of use of these error returns, I tend to agree
> with you.  Please suggest the semantics for 32 bit Judy1Count() return
> when the Array is full (2**32 and 2**32 - 1 entries).  Note: 64 bit 
> Judy1Count() does not have this problem -- yet, perhaps in the 23rd
> century.

You could return a 64 bit integer. Either long or long long, or,
easier:

        struct Judy64 { uint32_t hi; uint32_t lo; };

It's a bit messy, but most users will just go:

        uint32_t n = Judy1Count (..).lo;

and ignore the high word, because they know they didn't put 2^32
entries in the array :) It does cost a bit more to return the
value though.
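A minimal sketch of the split-count return, using the Judy64 struct from above with hypothetical helper names (`judy64_from_u64`, `judy64_to_u64`). It distinguishes the two cases asked about: 2^32 - 1 entries is {hi 0, lo 0xFFFFFFFF}, and a completely full 32-bit array is {hi 1, lo 0}.

```c
#include <stdint.h>

/* Count returned as two 32-bit halves, as suggested for 32-bit builds. */
typedef struct Judy64 { uint32_t hi; uint32_t lo; } Judy64;

/* Hypothetical helper: split a 64-bit count into the struct. */
static Judy64 judy64_from_u64(uint64_t n)
{
    Judy64 r = { (uint32_t)(n >> 32), (uint32_t)n };
    return r;
}

/* Hypothetical helper: reassemble for callers that care about the high word. */
static uint64_t judy64_to_u64(Judy64 c)
{
    return ((uint64_t)c.hi << 32) | c.lo;
}
```

One caveat for the "just use .lo" habit: a completely full array reads as 0 through `.lo` alone.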

> Also, should an error return due to malloc(2) fail allow the Array to
> be corrupt? It is very difficult to keep the array intact with a malloc
> fail.

Yes, I expected that.

> > Corrupted Judy structure should instantly abort().
> > If you have corruption, you need the earliest possible core dump.
> 
> I have already been thru that.  People think it is a bug in Judy if it
> core dumps inside of the Judy code.  Please comment.

But as you see from my own 'bugs in Judy' which were actually bugs
in MY code .. the converse can happen too.

And error codes are worse, because I can fail to check them. The thing
with aborting is that you can fprintf(stderr, ...) a message, which makes
it impossible for the user to miss.

The problem with that isn't "users complaining Judy aborted" --
well, that may be a 'human' problem for you, but it isn't technically
the issue. The technical issue is the one I described above:
this part of the code may be a 'non-essential' feature which fails
in a context where you don't want the whole program to fail.

In C++ the hook function is the right solution, and indeed that
is precisely what the C++ standard library uses in the same
context (for allocation failure, the new-handler installed with
std::set_new_handler). The hook can throw an exception, which
terminates the program if nobody catches it, or, in the
'non-essential feature' case, the user can catch it.

In C, the only way to emulate that is with setjmp/longjmp;
otherwise you do need to return an error code so the application
can take appropriate action.

The 'right' solution is a two-level interface: the fully nasty
one returning the error codes, and a more abstract wrapper that
is easier to use and safer. Of course, that's what you're trying
to provide with the macros.
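A minimal sketch of that two-level shape on a toy fixed-capacity store, assuming hypothetical names (`jstatus`, `toy_insert_raw`, `toy_insert`) that are not Judy's. The low level never aborts and reports everything through its return value; the high-level wrapper cannot fail from the caller's point of view, because any error is fatal there.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical status codes for the low-level ("fully nasty") layer. */
typedef enum { J_OK = 0, J_NOMEM, J_CORRUPT } jstatus;

/* Toy fixed-capacity store standing in for a Judy array. */
#define TOY_CAP 4
typedef struct { unsigned long key[TOY_CAP]; int n; } toy_array;

/* Low level: never aborts; every failure comes back as a status code. */
jstatus toy_insert_raw(toy_array *a, unsigned long key)
{
    if (a->n < 0 || a->n > TOY_CAP)
        return J_CORRUPT;            /* internal sanity check failed */
    if (a->n == TOY_CAP)
        return J_NOMEM;              /* simulated allocation failure */
    a->key[a->n++] = key;
    return J_OK;
}

/* High level: cannot fail from the caller's view; errors are fatal. */
void toy_insert(toy_array *a, unsigned long key)
{
    jstatus s = toy_insert_raw(a, key);
    if (s != J_OK) {
        fprintf(stderr, "toy_insert: fatal status %d\n", (int)s);
        abort();
    }
}
```

Applications that treat the feature as essential use the wrapper; a plugin host that must survive a failure calls the raw layer and handles the status itself.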


-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net

_______________________________________________
Judy-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/judy-devel
