Jimmy, Jing Lv wrote:
> I've done some further study on hashCode and HashMap. I find it
> very interesting that the RI may have done something to
> identityHashCode, just as Sergey said. According to Tim's
> micro-benchmark on JIRA-4046 [1], Harmony classlib + IBM VME shows
> that only 4 buckets are used, each containing 250 elements, while the
> RI used all 16 buckets and each of them contains 62-64 elements.
> What's more, if we print the RI's objects' hashcodes, they end in all
> kinds of digits (e.g. 5b8827, 147c1db ...), while Harmony + IBM VME
> returns the exact address (not sure), which ends in an even digit
> (e.g. 77387738, 77787778 ...). I haven't tried DRLVM yet (is it just
> like IBM VME?)
I'd say that the IBM VME should do a better job!

> So if we do a little modification to the bucket index calculation in
> Tim's test, e.g.,
>     int bucket = hashCode >> 1 & (numBuckets - 1);
> it shows all buckets are used, though some of them get many more
> elements than others. And if we modify it as:
>     int bucket = hashCode >> 2 & (numBuckets - 1);
> it behaves just as the RI does: every bucket gets nearly the same
> number of elements.
> If we apply this (hashCode >> 2 & (numBuckets - 1)) to the HashMap
> bucket index algorithm, and run some micro-benchmarks like
> put/get/remove 10000 times [2], it improves by 7% (~28s to ~26s on my
> desktop).

That's cool (and of course you could rescale the divisor instead of
shifting the hashCode each time and get the same effect).

> But it is probably not a good idea for HashMap, since some
> user-defined hashcodes will work very badly with this algorithm. Some
> very famous benchmarks, like SPECjbb, will immediately drop 10-15% in
> score. It may not be a proper algorithm for the classlib bucket index
> calculation.

Agreed, and with apologies for sounding like a broken record, IMHO the
classlib code should not optimize for an inadequate distribution of an
undefined hashcode algorithm (unless it is so prevalent that it is a de
facto standard).

> So it still requires an algorithm like the one we discussed before,
> though it is not easy at all. If we agree on using &(bucketSize - 1)
> (because it is much faster than % prime),

I think we have agreed on that...

> we should find a way to gather all 32 bits of information into
> log2(bucketSize) bits. The operators '+', '|', '^', '>>' are used for
> this purpose (they are fast themselves). However, the bucket size is
> usually too small (at the beginning it is 16, i.e. 4 bits) to store
> all 32 bits of difference, so we have to lose some information after
> all.

Exactly, no amount of bit-twiddling will help you contain 32 bits of
information in 4 bits! The knack is knowing which four bits to select
from the hashCode.
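To make the effect concrete, here is a small self-contained sketch (not
Harmony's actual code) of the experiment described above: identity-hash
values that look like 4-byte-aligned addresses always have their two low
bits zero, so the plain `& (numBuckets - 1)` mask wastes them, while a
small right shift spreads the keys across all buckets.

```java
// Demonstrates why address-like hash codes cluster under a plain mask.
public class BucketDemo {
    static int bucket(int hashCode, int shift, int numBuckets) {
        return (hashCode >> shift) & (numBuckets - 1);
    }

    static int bucketsUsed(int shift) {
        int numBuckets = 16;
        boolean[] used = new boolean[numBuckets];
        // Simulate address-style hash codes: 4-byte aligned,
        // so the low 2 bits are always zero.
        for (int i = 0; i < 1000; i++) {
            int hash = 0x77380000 + i * 4;
            used[bucket(hash, shift, numBuckets)] = true;
        }
        int count = 0;
        for (boolean u : used) if (u) count++;
        return count;
    }

    public static void main(String[] args) {
        // With no shift only buckets 0, 4, 8, 12 are ever hit;
        // shifting by 2 first makes all 16 buckets reachable.
        System.out.println("shift 0: " + bucketsUsed(0) + " buckets used");
        System.out.println("shift 2: " + bucketsUsed(2) + " buckets used");
    }
}
```

This reproduces the 4-buckets-vs-16-buckets observation from the
micro-benchmark, under the assumption that the VM's identityHashCode is
essentially an aligned object address.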
Unless you know the distribution of the hashCode values is in fact a
subset of all the possible 32-bit values allowed by the spec, you can
only assume that they are random -- at which point you can select any
four bits.

> Let's first take two hypotheses: 1) hashcodes (and user-defined
> hashcodes) are evenly distributed in themselves; 2) however, the
> lower bits are more important than the higher bits (they cannot be
> equally important, especially with small bucket counts).

You mean more important to the hashmap implementation because they are
used by &(numBuckets - 1) to compute the bucket index, right?

> Under these two hypotheses, we can focus on the lower bits. This
> addresses Yuri's problem: if changes in the upper bits have no effect
> on the bucket index, it does not matter (as may happen in most
> cases). However, the index computation cannot be too complex; 4 or 5
> operators may be enough to distribute hashcodes more evenly across
> the buckets while avoiding extra cost in the lookup.
> After reading some papers and hash algorithms (most of them are too
> complex and time-consuming for us), currently I have an expression
> here:
>     index = hashcode ? hashcode >> a ? hashcode >> b (? hashcode >> c)
> where "?" stands for one of the operators '^', '+', '|', '&', and
> a, b, c stand for numbers less than 32, with a < b < c (I don't know
> if the last operator and c can be omitted).
> Our task is to find the best a, b, c and the operators between them
> to improve hashcode performance. Currently the best expressions I've
> found are:
>     (hash ^ hash >> 9) + hash >> 23
> or
>     (hash ^ hash >> 9) + (hash >> 15 | hash >> 23)
> Both of them work quite well on my micro-benchmark and improve
> SPECjbb a little (at least no worse than using "hash" alone :) )
> But I still believe it can be improved a lot.

I think you are optimizing for a particular implementation of
identityHashCode, and the class library code should be agnostic.
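For reference, here is one possible reading of the first candidate
expression as a supplemental mixing step. Note that in Java '>>' binds
tighter than '^' but looser than '+', so the expressions above are
ambiguous without explicit parentheses; the grouping below is an
assumption, meant only to illustrate the general technique of folding
higher bits into the low bits before masking.

```java
// Sketch of a supplemental hash "mixer" applied before the bucket mask.
public class Mixer {
    // Assumed grouping of "(hash ^ hash >> 9) + hash >> 23":
    // XOR and add right-shifted copies so high-order bits
    // influence the low-order bits used for the bucket index.
    static int mix(int hash) {
        return (hash ^ (hash >> 9)) + (hash >> 23);
    }

    static int bucketsUsed() {
        boolean[] used = new boolean[16];
        // Address-like hashcodes: 4-byte aligned, low 2 bits always zero.
        for (int i = 0; i < 1000; i++) {
            int hash = 0x77380000 + i * 4;
            used[mix(hash) & 15] = true;
        }
        int count = 0;
        for (boolean u : used) if (u) count++;
        return count;
    }

    public static void main(String[] args) {
        // The raw mask would hit only 4 buckets on these inputs;
        // mixing first reaches all 16.
        System.out.println(bucketsUsed() + " of 16 buckets used");
    }
}
```

On these aligned-address inputs the mixed index reaches all 16 buckets,
whereas masking the raw value would reach only 4; whether it helps or
hurts other hashcode distributions is exactly the open question in this
thread.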
> To sum up: 1) I believe Sergey's idea does help a lot in
> hash-related classes, which requires the VM guys to do some
> modification to identityHashCode (a simple guess: object address >> 2?)

I agree that the VM should do its best job at producing a good
distribution of values for identityHashCode.

> 2) For user-defined hashcodes and HashMap itself, based on
> &(bucketSize - 1), we can find good parameters for the expression I
> listed above, or find some similar expression to use, so that the
> values are distributed more evenly across the buckets.

Sorry for being so dumb, but how can you devise an expression that will
produce a better bucket index from a hashcode if you don't know anything
about the characteristics of the hashcode?

Regards,
Tim
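The "object address >> 2" guess in point 1) can be illustrated in a few
lines. This is a toy model, not any VM's actual implementation: for
4-byte-aligned addresses the two low bits are always zero, so dropping
them before handing the value out as an identity hash lets a simple
`& (numBuckets - 1)` mask see real variation.

```java
// Toy model of a VM-side identityHashCode tweak for aligned addresses.
public class IdentityHashSketch {
    // Hypothetical VM-side change: shift away the alignment bits.
    static int identityHash(long address) {
        return (int) (address >> 2);
    }

    static int bucketsUsed(boolean shifted) {
        boolean[] used = new boolean[16];
        for (int k = 0; k < 256; k++) {
            long addr = 0x77387738L + 4L * k; // 4-byte-aligned "addresses"
            int hash = shifted ? identityHash(addr) : (int) addr;
            used[hash & 15] = true;
        }
        int count = 0;
        for (boolean u : used) if (u) count++;
        return count;
    }

    public static void main(String[] args) {
        System.out.println("raw address:  " + bucketsUsed(false) + " buckets");
        System.out.println("address >> 2: " + bucketsUsed(true) + " buckets");
    }
}
```

With the raw address only 4 of 16 buckets are ever used; with the shift
all 16 are, which is why fixing this on the VM side keeps the classlib
code agnostic about the hashcode's origin.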
