On Tue, Feb 18, 2014 at 8:51 PM, Alan Silverstein <[email protected]> wrote:
> I suspect this was based more on reading my "Shop Manual" (wish I'd
> found a better name than that, and written it in HTML not Framemaker),
> rather than the source code?  :-)
Yeah, for the most part. But I do dig into the source code some too.

>> ...I could sweep just by discarding the judy tree and not having to
>> traverse the whole heap looking for free bits helped allocation time a
>> lot.
>
> Yeah, that kind of usage, what I call "synergistic serendipity" or if
> you prefer, "serendipitous synergy," is what Doug, and later I, got
> excited about with libJudy.  We used to imagine, "what could you do with
> a dynamic sparse array if it was really cheap and fast?"  We found all
> kinds of interesting applications, but in most cases using libJudy
> required some unusual thinking at best, warped at worst.
>
> We spent some time during the project looking for "killer apps" to
> demonstrate the value of the library...  But it was surprisingly hard.
> Basically you had to have libJudy in your toolbox during project design
> and implementation.  It was very difficult to retrofit afterwards to
> significant tools.  Quite often 80% of the same benefit could be had
> just by tweaking some hashing constants that had gotten dusty.

I would have no idea how to go about marketing a library. :) When I was
at Google I got people to use the generic stuff I made internally, but
that was by word of mouth and no money was changing hands (just some 20%
time sharing: I'll fix your bug if you fix mine). A couple ended up
officially supported and staffed. But the endgame for a very successful
internal library is being open sourced, like protobufs. So your Judy
development path makes a lot of sense to me.

> Anyway as DeMarco observed, "people optimize whatever is measured, and
> hide their garbage elsewhere."  Despite knowing this, and trying to
> cover ALL the bases, it's obvious in hindsight that what we created had
> various shortcomings, garbage where we didn't "measure" closely enough.
> Most of our project's effort went into version 4 (Judy IV), the final
> version, which took a lot longer than expected; core coding and
> documentation efforts.  We were focused on HPUX (after all HP was paying
> the bills), which didn't even have C explicit inline functions at that
> time.  So open source portability suffered, and in hindsight,
> documentation and code readability too.

I was wondering how you went about benchmarking and testing Judy with
something that simulates real data. I wrote a small radix tree library,
a subset of Judy's functionality, that I needed to embed in something.
It just had fixed-size linear nodes, leaf bitsets, and uncompressed
nodes, and no root compression. I was testing it against Judy with
random data and found it to have surprisingly good, comparable
performance... Of course, the key thing there is the word "random". As
soon as any structure or clustering or anything other than uniform
randomness was added, Judy did wildly better. Which makes sense to me,
but how did you come up with "non-random" datasets to test with, since
uniform random data is clearly not very useful for testing beyond a
point?
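For reference, my comparison harness was roughly shaped like this (a
from-memory sketch against a JudyL array, not the real code, and the
cluster constants are made up):

    /* Sketch of the benchmark: insert NKEYS keys that are either
     * uniformly random or "clustered" (dense runs at scattered base
     * offsets), and time it. */
    #include <Judy.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define NKEYS 1000000L

    static Word_t uniform_key(void)         /* uniformly random key */
    {
        return ((Word_t)rand() << 16) ^ (Word_t)rand();
    }

    static Word_t clustered_key(void)       /* random base + small offset */
    {
        Word_t base = ((Word_t)rand() & 0xFFFF) << 20;
        return base + (Word_t)(rand() & 0x3FF);
    }

    int main(void)
    {
        Pvoid_t PJLArray = (Pvoid_t)NULL;
        Word_t *PValue;
        Word_t count, bytes;
        clock_t t0 = clock();

        for (long i = 0; i < NKEYS; i++) {
            Word_t key = clustered_key();   /* or uniform_key() */
            JLI(PValue, PJLArray, key);     /* insert (or find) the key */
            *PValue = (Word_t)i;
        }
        printf("insert: %.3fs\n", (double)(clock() - t0) / CLOCKS_PER_SEC);

        JLC(count, PJLArray, 0, -1);        /* population; duplicates collapse */
        printf("population: %lu\n", (unsigned long)count);

        JLFA(bytes, PJLArray);              /* free the whole array */
        return 0;
    }

Swapping clustered_key() for uniform_key() was basically the whole
experiment, which is why I started wondering what a realistic key
distribution should even look like.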
>> I always assumed the odd API was to conform to some internal HP coding
>> standard created by someone who has a history with Algol (hence the
>> in/out arguments). I've seen odder ones forced on programmers out
>> there.
>
> No, actually it's kind of funny in a sad way.  I was always a very
> persnickety UI and manual entry designer.  I wrote a lot of Unix
> commands and documentation over the years, both for the HPUX product and
> for "toys," so I was very good at this.  For libJudy I came up with some
> very solid (I thought) manual entries.  But Doug really wanted to
> simplify the API; hence the macro API.  Later he had a manual writer
> essentially throw away half of my documentation work (without telling me
> first) to make the macro API the main entry point, and the formal
> signatures and related description (whatever remained) an afterthought.
>
> In the long light of time, I think Doug was wrong, and I deserve an "I
> told you so" about that change.  The macro API has confused people more
> than Doug ever intended, and made it even harder to code properly too.
> It seems to me that only "real experts" (who can handle the raw truth,
> even prefer it) end up seriously using libJudy.

Yeah, it actually took some digging to figure out that I could call the
JudyXFoo functions directly, and that I could just pass PJE0 as the last
parameter and everything would work just fine. I see some references in
the documentation to the macros being "faster" because they can be
inlined, but as far as I can tell I don't see that ever happening. Is
this something that used to happen?
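For the record, this is the pattern I settled on: the documented macro
next to what I understand to be the equivalent direct call with PJE0 (a
sketch, assuming I have the signatures right):

    #include <Judy.h>
    #include <stdio.h>

    int main(void)
    {
        Pvoid_t PJLArray = (Pvoid_t)NULL;   /* empty JudyL array */
        Word_t *PValue;

        /* Macro API, as documented in the man pages: */
        JLI(PValue, PJLArray, 42);          /* insert key 42 */
        *PValue = 1234;

        /* Direct calls, passing PJE0 as the error parameter: */
        PValue = (Word_t *)JudyLIns(&PJLArray, 43, PJE0);
        if (PValue != (Word_t *)PJERR)      /* PJERR on malloc failure */
            *PValue = 5678;

        PValue = (Word_t *)JudyLGet(PJLArray, 42, PJE0);
        if (PValue != NULL)
            printf("key 42 -> %lu\n", (unsigned long)*PValue);

        Word_t bytes_freed;
        JLFA(bytes_freed, PJLArray);        /* free the array */
        return 0;
    }

As far as I can tell, PJE0 just means "don't bother filling in a
JError_t", and errors still show up as PJERR/JERR return values.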
> But conversely, another common gripe about libJudy is how the source
> code is painfully abstract to read.  Yeah, a dynamic sparse array is
> already a difficult (abstract) notion with poor real-world metaphors.
> Being aware of various UI principles like, "one object for each name,
> one name for each object," I drove Doug nuts insisting that we think
> very carefully about what to call everything, aiming for clarity,
> precision, "psychological distinction", etc.  (Sorry Doug,
> resistor-colored "flowers" as the leaves in the tree were cute but
> confusing!)

A whole lot of CS knowledge is terminology. It is very important to have
something consistent to work with. I think object-oriented programming
wouldn't have been nearly as popular had someone not come up with a
common terminology for "design patterns". Sure, everyone was already
programming with them, but once the terminology existed they became much
easier to work with, even though you were writing the same code. I
definitely reused the Judy terminology.

Interesting anecdote about bad terminology: back in the day I was given
the task of integrating our software with some offshore-developed code.
It didn't sound difficult, but once I opened their code I found that,
for some reason, they used the word "bucket" for every data structure
(language barrier?). Pointers were buckets. Arrays were buckets. Linked
lists were buckets. A linked list holding pointers would be described in
the comments as a "bucket of buckets". Kind of comical in retrospect,
not so much at the time. In any case, terminology is key :) Strangely
enough, the code wasn't that bad if you looked at the structure and
ignored the names of everything; I have no idea how the programmer kept
everything straight in their head.

> Anyway since we needed very tight control over the C code, but didn't
> have inline functions, and being comfortable myself with C macros, even
> fairly hairy ones where you pass as a parameter the name of another
> macro, I tried to build minimal-redundancy code that would be as clear
> and fast as possible.  In hindsight though, the excessively abstract
> macros themselves threw most people, no matter how much I tried to find
> good object names and document everything carefully.  Maybe Doug was
> right, and it was better to just repeat nearly identical code sections
> 7-8 times rather than "compressing" the source with levels of macros.

I was thinking about trying to do something with generated code, using
something more powerful than macros: a program you feed the target's
cache size, the key size you want, and the payload (if any), and it
calculates the optimal state machine and spits it out....

John
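P.S. Just to check I'm picturing the kind of macro you mean, something
like this toy example, where one macro is handed the name of another and
stamps out the near-identical sections? (Made-up names, nothing from the
Judy source.)

    #include <stdio.h>

    /* "Template" macro: takes the name of another macro and expands it
     * once per node type, so the per-type code is written only once. */
    #define FOR_EACH_NODE_TYPE(APPLY) \
        APPLY(Linear)                 \
        APPLY(Bitmap)                 \
        APPLY(Uncompressed)

    /* Passed by name into FOR_EACH_NODE_TYPE to stamp out one counter
     * and one dump function per node type. */
    #define DEFINE_COUNTER(Type)                          \
        static unsigned long count##Type;                 \
        static void dump##Type(void)                      \
        { printf(#Type ": %lu\n", count##Type); }

    FOR_EACH_NODE_TYPE(DEFINE_COUNTER)  /* expands the template 3 times */

    int main(void)
    {
        countLinear = 3;
        countBitmap = 5;
        dumpLinear();
        dumpBitmap();
        dumpUncompressed();
        return 0;
    }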
