Re: Performance improvement of java.util.HashMap (Was Re: [jira] Commented: (HARMONY-4064) [classlib][luni] Performance improvement of java.util.HashMap)

Jimmy,Jing Lv Wed, 27 Jun 2007 20:08:07 -0700

Thanks so much Yuri !
But I'm a little concern before I click on these pages, I'm not sure
that we can touch that code/algorithm for Harmony development, is that
copyright-ed or patent-ed?


2007/6/27, Yuri Dolgov <[EMAIL PROTECTED]>:

Hi Jimmy,
Here are couple of interesting links on hash functions implementation:
http://burtleburtle.net/bob/hash/doobs.html
http://www.azillionmonkeys.com/qed/hash.html

In my point of view usage of shifts should give some benefit as in
performance so in a distribution quality.

Thanks,
Yuri



On 6/27/07, Jimmy,Jing Lv <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
>     As currently focus on performance, I find this JIRA(4064) and do
> some investigation on it.
>
>     The Performance improvement of java.util.HashMap is about two
> aspects: one is hash algorithm that make hashcode randomly distributed
> enough, just as Tim's fix/comments in Harmony-4064, the other is about
> to make hash algorithm itself perform fast enough. HashMap is so
> widely used so that both two aspects may be very important.
>
>     I strongly agree with Tim that hash algorithm that make hashcode
> randomly distributed is very important, for HashMap.put(), get(), etc,
> depends on it and it should perform better if the hash algorithm can
> make it randomly distributed. On the other hand, the hash algorithm
> should perform very fast as every HashMap.put and get method call this
> hash algorithm, if it is slow itself, then its benfits on randomly
> distribution became meaningless.
>
>     I do a lot of benchmark and micro-benchmarks,  it shows that (a &
> b) is much faster than (a % b). And if b is Math.pow(2,n)-1, the two
> operations just get the same result. That's why if we choose
> "&(Math.pow(2,n)-1)" (here Math.pow(2,n) is the size of bucket of a
> hashmap)to replace "%", by this way we improve performance greatly at
> least 2% ~ 3% in some benchmarks . This shows if we avoid slow
> operation like mod(%), multiply(*), division(/) and so forth, we can
> get performance improvement.
>
>     Yes, I do think to randomly distrubute the hashcode is very
> important and I suggest we can use other machnisms other than mod
> prime to make it happen. One thought is right-shift the hashcode and
> combine bits together (by AND or OR or Exclusive Or operation) into
> low order(e.g, (a & (a >> 8) ^ (a >> 16) | (a >> 24)) &
> (bucketSize-1), a small test on it shows that it does distributed more
> randomly, but it may be better algorithms. In this way the hashcode
> may be more random destributed and performs fast. And in this way we
> may keep a balance of a both fast and good hash algorithm.
>
> Any comments/suggestions? Thanks!
>
>
> 2007/6/6, Tim Ellison (JIRA) <[EMAIL PROTECTED]>:
> >
> >     [
> 
https://issues.apache.org/jira/browse/HARMONY-4064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12501905]
> >
> > Tim Ellison commented on HARMONY-4064:
> > --------------------------------------
> >
> > Robert.  You patch undoes the change I made to make the length of the
> elements array a prime number, so that it is now always a power of two.  The
> problem is that identity hash codes are not randomly distributed, they
> typically reflect object address alignments which will be factors of two, so
> we don't get a good distribution across all the possible buckets.
> >
> > You can see the effect in this simple 'simulation':
> >
> > public class HashingTest {
> >
> >     public static void main(String[] args) {
> >         int numBuckets = 16;
> >
> >         int[] distrib = new int[numBuckets];
> >         for (int i = 0; i < distrib.length; i++) {
> >             distrib[i] = 0;
> >         }
> >
> >         for (int i = 0; i < 1000; i++) {
> >             int hashCode = new Object().hashCode();
> >             int bucket = hashCode & (numBuckets - 1);
> >             distrib[bucket] = distrib[bucket] + 1;
> >         }
> >
> >         System.out.println("Bucket\tList_length");
> >         for (int i = 0; i < distrib.length; i++) {
> >             System.out.println(i + "\t" + distrib[i]);
> >         }
> >     }
> > }
> >
> > I suggest that we have to ensure the elements array length is chosen
> more carefully, or that we modify the hashCodes to avoid these collisions.
> >
> > > [classlib][luni] Performance improvement of java.util.HashMap
> > > -------------------------------------------------------------
> > >
> > >                 Key: HARMONY-4064
> > >                 URL:
> https://issues.apache.org/jira/browse/HARMONY-4064
> > >             Project: Harmony
> > >          Issue Type: Improvement
> > >          Components: Classlib
> > >            Reporter: Robert Hu
> > >            Assignee: Tim Ellison
> > >         Attachments: HARMONY-4064.diff
> > >
> > >
> > > The performance of HashMap can be improved, the hash method is
> improved.
> > > By running the following test code, our HashMap can be improved from
> 110% time cost (compared by RI) to 90% time cost (compared by RI).
> > > public class TestHashMap {
> > >     public static void main(String[] args) {
> > >         Random ran = new Random();
> > >         HashMap map = new HashMap();
> > >         int elementNum = 500;
> > >         int times = 10000;
> > >         int[] rans = new int[elementNum];
> > >         for (int i = 0; i < elementNum; i++)
> > >             rans[i] = ran.nextInt(Integer.MAX_VALUE);
> > >         long start = System.currentTimeMillis();
> > >         for (int i = 0; i < elementNum; i++)
> > >             map.put(rans[i], "b");
> > >         System.out.println(System.currentTimeMillis() - start);
> > >         start = System.currentTimeMillis();
> > >         for (int i = 0; i < elementNum; i++)
> > >             for(int j = 0; j< times; j++){
> > >                 map.get(rans[i]);
> > >             }
> > >         System.out.println(System.currentTimeMillis() - start);
> > >     }
> > > }
> >
> > --
> > This message is automatically generated by JIRA.
> > -
> > You can reply to this email to add a comment to the issue online.
> >
> >
>
>
> --
>
> Best Regards!
>
> Jimmy, Jing Lv
> China Software Development Lab, IBM
>



--

Best Regards!

Jimmy, Jing Lv
China Software Development Lab, IBM

Re: Performance improvement of java.util.HashMap (Was Re: [jira] Commented: (HARMONY-4064) [classlib][luni] Performance improvement of java.util.HashMap)

Reply via email to