Nice topic

john skaller Mon, 02 Apr 2012 14:04:36 -0700

[Changed topic, old one is SPAMing the conversation:
the 'gratulations is suggests i 'on a 'illion 'ollars .. :]

On 03/04/2012, at 4:35 AM, Alan Silverstein wrote:

> Gang,
> 
>> Gcc at least has a way to "hint" to the compiler whether a branch is
>> likely to be taken.  Felix actually emits these hints.  However I have
>> not noticed gcc doing anything with them.
> 
> Amusing, it reminds me that in the last year I carefully marked some new
> source functions "inline", but later discovered that gcc was basically
> ignoring my advice and doing whatever the heck it wanted.  Turns out
> "inline" is merely a suggestion, and I don't know of a way to force it
> ("I know what I'm doing and I want to waste space for performance")
> other than reverting to macros.  (Maybe there's some gcc option or
> #pragma I didn't look for...)

Well .. you can try clang 3.0 :)

Or if you want to use a "real" language, use Felix, where
"inline" isn't a hint.

> 
>> A compromise API would be like:  "here is an array size 100, fill it
>> with 100 entries starting at key such-and-such".  You could then
>> optimally process the 100 entries, then get the next 100.
> 
> I vaguely recall (this was > 10 years ago) discussing multi-index
> operations like this.  I know I actually WROTE code that batch-created
> (inserted) a Judy1/L array rather faster than by simple iteration, and
> it worked fine, but I don't know where it is today.  (Given a simple
> array, or was it parallel arrays, of sparse indexes and values, libJudy
> internally-smart code can create a valid Judy tree pretty quickly, I
> think maybe 3X the one-at-a-time mode?)

Actually that would be quite useful.
I actually use Judy as a temporary data structure.
The GC algorithm is basically: find all reachable objects,
then delete everything not reachable.

All-reachable is a set of pointers (Judy) constructed by scanning.
Everything-not-reachable is a set of pointers. Also Judy.
This is constructed because the practice is to run finalisers
on all these objects *first* before deallocation. 

>> Generally, a map of word to "anything" would be good.  This can be
>> done by map to a word, which is a pointer.  At the cost of an
>> indirection and a memory management problem.
> 
> Agreed.  The solution, which we also toyed with but did not implement,
> is to have libJudy offer APIs like JudyL but with arbitrary-sized value
> areas whose memory is managed by libJudy.  In which case don't just
> return a pointer to the caller, have them give you a pointer to the
> data, and its size, and the library does the mem-create and copy-in.

Just an option for two words would be useful.

Why?

Well, there is a serious bug/misconception in the Judy build.
Actually, the Felix clone makes the same mistake.

I'm running on a 32 bit machine. I build only 32 bit Judy.
Stupid. I still need 64 bit integers.

I'm running on a 64 bit machine. I build 64 bit Judy.
Stupid. I still need "int". That's only 32 bits.
Heck, "long" is only 32 bits on Win64.

What should be built is: 32 and 64 bit arrays. On BOTH
32 and 64 bit machines. I think this can be done now.
In addition we should have 32->64 and 64->32
maps (JudyL). And 64->128 (64 bit machine only).

>> A trivial example:  an IP address is 4 bytes, some of which can be 0.
>> Of course this can be done now, with a JudyL instead of S.
> 
> Right.  C strings are what I call "length-terminated" by the NUL bytes.

In the C++ Standard they're called NTBS: Null terminated byte strings.
So that's a technical, formal, and standardised name for "C strings" :-)

> Perl strings for example are "length-associated," and any bit pattern
> can appear in the string.  

Ditto C++ strings, and strings in almost every sane language other
than C. Actually .. *including* C, since C also has length controlled
buffers, just that you have to manage the length yourself.

> You can easily store them in a JudyL
> meta-trie where the top level is the length, but you end up with them
> sorted by length first, then lexicographically, not just
> lexicographically.  You might not care.

This is similar to the property naive Hash-table has.
You lose ordering in return for O(1) operations.

--
john skaller
[email protected]
http://felix-lang.org

------------------------------------------------------------------------------
Better than sec? Nothing is better than sec when it comes to
monitoring Big Data applications. Try Boundary one-second 
resolution app monitoring today. Free.
http://p.sf.net/sfu/Boundary-dev2dev
_______________________________________________
Judy-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/judy-devel

Nice topic

Reply via email to