> A couple ended up officially supported and staffed.

Many of my old "G-jobs" over the years at HP, likewise.
> But the endgame for a very successful internal library is being open
> sourced, like protobufs. So your judy development path makes a lot
> of sense to me.

Well, perhaps in hindsight; but HP invested about $1M in the project
hoping to at least recover that, and the only reason they (we, before
leaving) open-sourced it was that they canceled the project and laid
off the team.

> I was wondering how you went about benchmarking and testing judy with
> what simulates real data.

I don't recall for sure, but I know we pulled a variety of "big"
datasets off the web to use. Also, we nailed source code coverage so
thoroughly (using a commercial product called C-Cover that was really
well engineered) that EVERY branch was either hit by the regression
test or explicitly waived (we knew why not); all that remained were a
few untriggerable error paths.

> I wrote a small radix tree library that was a subset of judy in
> the functionality I needed to embed in something. It just had
> fixed-size linear nodes, leaf bitsets, and uncompressed nodes, and no
> root compression. I was testing it against judy with random data and
> found it to have surprisingly good, comparable performance... of
> course, the key thing there is the word 'random'. As soon as any
> structure or clustering or anything other than uniform randomness was
> added, judy did wildly better.

Whew. :-)

> ...it actually took some digging to figure out I could call the
> JudyXFoo calls directly and that I could just pass PJE0 as the last
> parameter and everything will work just fine.

If I'd known, I would have saved my latest and best manual entries and
"contributed them to the open source project" as an alternative. But
for whatever reason, I didn't. They were buried in the (RCS?) history.
This was before the *_funcs* files I see now when I look at the sources
I took away with me. I think back then the functions were the main
pages and they referenced *_macros* files instead.
> I see some references in the documentation to the macros being
> "faster" because they can be inlined, but as far as I can tell I
> don't see that ever happening. Is this something that used to happen?

Apparently so, back then, on HPUX C at least.

> A whole lot of CS knowledge is terminology.

A whole lot of any human intellectual endeavor is the "mere semantics."
(Which, if I understand right, is paradoxical, because semantics is
supposed to be the meaning, not just the symbology?) Anyway, as you
know, our ability to think is limited by our metaphors and choices of
words. Finding the right "framing" for any problem is more than half
the battle of finding a solution. Or as I noted long ago: "Tell me your
data structures and I will infer your code." Now who said that, and how
exactly did they say it? Dunno; can't Google up the answer...

> Interesting anecdote about bad terminology: back in the day I was
> given the task of integrating our software with some offshore
> developed code. It didn't sound difficult, but once I opened their
> code I found out that for some reason they used the word 'bucket' for
> every data structure (language barrier?). Pointers were buckets.
> Arrays were buckets. Linked lists were buckets. A linked list
> holding pointers would be described in the comments as a 'bucket of
> buckets'. Kind of comical in retrospect, not so much at the time. In
> any case, terminology is key :)

SIGH! Sounds like "bucket" = "thing", and they were more sloppy than
unfamiliar with English. I've worked with (and had to rein in my
disdain for) so many people who were nominally professionals but who
did really sloppy work without even caring or being aware of it. Pet
peeve of mine; never mind.

> Strangely enough, the code wasn't that bad if you looked at the
> structure and ignored the names of everything. I have no idea how the
> programmer kept everything straight in their head.

Are you sure they didn't run it through an obfuscator?
:-)

> I was thinking about trying to do something with generated code, by
> something more powerful than macros. Like a program you feed the
> target's cache size, the key size you want, and the payload if
> anything, and it will calculate the optimal state machine and spit it
> out....

Right. Actually, perhaps ironically, if I recall right libJudy ended up
with a small code generator in it anyway, but I forget what it was used
for. Yeah, if highly repetitive code is needed for performance but is
mind-numbing to maintain, and if existing template creators don't work,
you end up having to roll your own.

Cheers,
Alan

_______________________________________________
Judy-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/judy-devel
