Thanks so much Yuri ! But I'm a little concern before I click on these pages, I'm not sure that we can touch that code/algorithm for Harmony development, is that copyright-ed or patent-ed?
2007/6/27, Yuri Dolgov <[EMAIL PROTECTED]>:
Hi Jimmy, Here are couple of interesting links on hash functions implementation: http://burtleburtle.net/bob/hash/doobs.html http://www.azillionmonkeys.com/qed/hash.html In my point of view usage of shifts should give some benefit as in performance so in a distribution quality. Thanks, Yuri On 6/27/07, Jimmy,Jing Lv <[EMAIL PROTECTED]> wrote: > > Hi, > > As currently focus on performance, I find this JIRA(4064) and do > some investigation on it. > > The Performance improvement of java.util.HashMap is about two > aspects: one is hash algorithm that make hashcode randomly distributed > enough, just as Tim's fix/comments in Harmony-4064, the other is about > to make hash algorithm itself perform fast enough. HashMap is so > widely used so that both two aspects may be very important. > > I strongly agree with Tim that hash algorithm that make hashcode > randomly distributed is very important, for HashMap.put(), get(), etc, > depends on it and it should perform better if the hash algorithm can > make it randomly distributed. On the other hand, the hash algorithm > should perform very fast as every HashMap.put and get method call this > hash algorithm, if it is slow itself, then its benfits on randomly > distribution became meaningless. > > I do a lot of benchmark and micro-benchmarks, it shows that (a & > b) is much faster than (a % b). And if b is Math.pow(2,n)-1, the two > operations just get the same result. That's why if we choose > "&(Math.pow(2,n)-1)" (here Math.pow(2,n) is the size of bucket of a > hashmap)to replace "%", by this way we improve performance greatly at > least 2% ~ 3% in some benchmarks . This shows if we avoid slow > operation like mod(%), multiply(*), division(/) and so forth, we can > get performance improvement. > > Yes, I do think to randomly distrubute the hashcode is very > important and I suggest we can use other machnisms other than mod > prime to make it happen. One thought is right-shift the hashcode and > combine bits together (by AND or OR or Exclusive Or operation) into > low order(e.g, (a & (a >> 8) ^ (a >> 16) | (a >> 24)) & > (bucketSize-1), a small test on it shows that it does distributed more > randomly, but it may be better algorithms. In this way the hashcode > may be more random destributed and performs fast. And in this way we > may keep a balance of a both fast and good hash algorithm. > > Any comments/suggestions? Thanks! > > > 2007/6/6, Tim Ellison (JIRA) <[EMAIL PROTECTED]>: > > > > [ > https://issues.apache.org/jira/browse/HARMONY-4064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12501905] > > > > Tim Ellison commented on HARMONY-4064: > > -------------------------------------- > > > > Robert. You patch undoes the change I made to make the length of the > elements array a prime number, so that it is now always a power of two. The > problem is that identity hash codes are not randomly distributed, they > typically reflect object address alignments which will be factors of two, so > we don't get a good distribution across all the possible buckets. > > > > You can see the effect in this simple 'simulation': > > > > public class HashingTest { > > > > public static void main(String[] args) { > > int numBuckets = 16; > > > > int[] distrib = new int[numBuckets]; > > for (int i = 0; i < distrib.length; i++) { > > distrib[i] = 0; > > } > > > > for (int i = 0; i < 1000; i++) { > > int hashCode = new Object().hashCode(); > > int bucket = hashCode & (numBuckets - 1); > > distrib[bucket] = distrib[bucket] + 1; > > } > > > > System.out.println("Bucket\tList_length"); > > for (int i = 0; i < distrib.length; i++) { > > System.out.println(i + "\t" + distrib[i]); > > } > > } > > } > > > > I suggest that we have to ensure the elements array length is chosen > more carefully, or that we modify the hashCodes to avoid these collisions. > > > > > [classlib][luni] Performance improvement of java.util.HashMap > > > ------------------------------------------------------------- > > > > > > Key: HARMONY-4064 > > > URL: > https://issues.apache.org/jira/browse/HARMONY-4064 > > > Project: Harmony > > > Issue Type: Improvement > > > Components: Classlib > > > Reporter: Robert Hu > > > Assignee: Tim Ellison > > > Attachments: HARMONY-4064.diff > > > > > > > > > The performance of HashMap can be improved, the hash method is > improved. > > > By running the following test code, our HashMap can be improved from > 110% time cost (compared by RI) to 90% time cost (compared by RI). > > > public class TestHashMap { > > > public static void main(String[] args) { > > > Random ran = new Random(); > > > HashMap map = new HashMap(); > > > int elementNum = 500; > > > int times = 10000; > > > int[] rans = new int[elementNum]; > > > for (int i = 0; i < elementNum; i++) > > > rans[i] = ran.nextInt(Integer.MAX_VALUE); > > > long start = System.currentTimeMillis(); > > > for (int i = 0; i < elementNum; i++) > > > map.put(rans[i], "b"); > > > System.out.println(System.currentTimeMillis() - start); > > > start = System.currentTimeMillis(); > > > for (int i = 0; i < elementNum; i++) > > > for(int j = 0; j< times; j++){ > > > map.get(rans[i]); > > > } > > > System.out.println(System.currentTimeMillis() - start); > > > } > > > } > > > > -- > > This message is automatically generated by JIRA. > > - > > You can reply to this email to add a comment to the issue online. > > > > > > > -- > > Best Regards! > > Jimmy, Jing Lv > China Software Development Lab, IBM >
-- Best Regards! Jimmy, Jing Lv China Software Development Lab, IBM
