Doug, > Go ahead and try to define some semantics for your idea. > Maybe it would be clear or not, I haven't thought about it.
First, let me be clear that we are using Judy in a variety of ways now, and overall I think you did an excellent job of covering the most common use cases with the existing Judy arrays and sets. I expect our needs are fairly specific so I'm unsure whether others will get much value out of the 32->32 array. (Since 32->32 is a bit cumbersome, let me instead propose that we call these Judy4 arrays, where the keys and values are 4 byte unsigned integers). Note that we currently use JudyL arrays as Judy4 arrays, but on 64-bit machines, JudyL uses 8-byte keys and values. In our case, the 32-bit keys are tokens that we generate to represent strings in large symbol tables. We create 10s of millions of these tokens, and we likewise create 10s of millions of these Judy4 arrays. Most of these Judy4 arrays contain just a few entries, say 10 or less keys. However, some will contain hundreds or possibly thousands of keys, and it is not possible to predict in advance how many keys will be inserted into a give Judy4 array. In our usage, values are always counts of occurrences of the key (in a given context). So, there is a phase in which we are aggregating data, and the fundamental operation is "increment the count for key K in context C." If the key K did not previously exist in context C, we insert (K,1) into the array. If the key K did exist, then we replace (K,n) with (K,n+1). After the aggregation phase, there is an analysis phase. In the analysis phase we primarily do iterations over the Judy4 arrays, but there are also some random lookups. We currently have no real need to be able to delete a key from a Judy4 array. In our usage, it would be acceptable to simply reset the value to 0 and leave the key. Finally, in our usage, 0 is never a valid key, though every other 32-bit value is theoretically a valid key. The above defines the most important aspects of our semantics. Again, these may be too specific, and to make the Judy4 arrays be generally useful to others, you should perhaps generalize the above, by making no assumptions about values other than the fact that they are opaque 4-byte values, and also make no assumptions about the keys other than that they are unique unsigned 4-byte values. Finally, the API would be essentially identical to the JudyL API, with the exception that both the key and value would be defined as uint32_t (using the <stdint.h> type). Inside your implementation, I imagine that you would allocate chunks of memory that are each 1 cache line. In these cache lines you will sometimes have 8-byte pointers to other cache lines. You will also sometimes have 8-bytes representing a specific (key,value) pair. But of course, there will probably be other encodings that use some kind of compression. This implementation must be different than the implementation of JudyL on 32-bit architectures, due to the fact that pointers are 8 bytes instead of 4. Otherwise, the implementations might be similar. I am doing some research to see what other possibilities exist that we might use, but I would be very pleased if you decided to add Judy4 to Judy. Thanks, Jim ----- Original Message ----- From: "Doug Baskins" <[email protected]> To: "Jim Lloyd" <[email protected]>, [email protected] Sent: Saturday, December 5, 2009 10:33:39 PM Subject: Re: 32->32 and 64->64 on 64-bit machines Jim: > but I suspect that internally the code relies on the assumption that a Word_t > must be the same size as a pointer. That is correct. Fortunately 4Gb of RAM is a lot cheaper now. Of course anything is possible, but memory savings would only be in very dense (non-sparse) data sets. One of the serious problems with Judy is people understanding the calling semantics. I think that has been one of my biggest headaches. Go ahead and try to define some semantics for your idea. Maybe it would be clear or not, I haven't thought about it. I am currently in the process of trying to come up with semantics for JudySLNext*() -- the new version of JudySL that includes a string length parameter (and allows imbedded nul chars in the string). I suspect I will take a lot of heat on whatever I come out with. Thank you for your interest, Doug Doug Baskins <[email protected]> ----- Original Message ---- From: Jim Lloyd <[email protected]> To: [email protected] Sent: Sun, December 6, 2009 5:15:03 AM Subject: 32->32 and 64->64 on 64-bit machines Hi, I've used Judy Arrays in various projects for about 8 years now and have been very pleased with the performance for both speed and memory usage. In my current project I'm beginning to wish for a feature that is not currently available, and I'm wondering whether it is feasible. Our applications run on 64-bit machines, and we rely heavily on both JudyL and JudySL. For some of our JudyL use, the keys are always 32-bits, and the values are counts. These counts are always well less than 2^32, so ideally we'd have a version of JudyL that mapped 32-bit words to 32-bit words. Its not unusual for one of our applications to consume well over 4Gb with these 32to32 JudyL arrays. I'd love to be able to cut the storage for these arrays in half. How much effort would it be to create this feature? A JudyL array compiled for 32-bit machine would produce the correct interface, but I suspect that internally the code relies on the assumption that a Word_t must be the same size as a pointer. Thanks, Jim Lloyd Silver Tail Systems ------------------------------------------------------------------------------ Join us December 9, 2009 for the Red Hat Virtual Experience, a free event focused on virtualization and cloud computing. Attend in-depth sessions from your desk. Your couch. Anywhere. http://p.sf.net/sfu/redhat-sfdev2dev _______________________________________________ Judy-devel mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/judy-devel ------------------------------------------------------------------------------ Return on Information: Google Enterprise Search pays you back Get the facts. http://p.sf.net/sfu/google-dev2dev _______________________________________________ Judy-devel mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/judy-devel
