Given the table size if a power of two, another possible optimization would be
     private static int nextKeyIndex(int i, int len) {
-        return (i + 2 < len ? i + 2 : 0);
+        return (i + 2) & (len - 1);
     }

Or even
+        int m = len - 1;
         while ( (item = tab[i]) != null) {
...
-            i = nextKeyIndex(i, len);
+            i = (i + 2) & m;
         }
where ever applicable.

On 08.07.2014 19:14, Martin Buchholz wrote:
So ... using 3*x is never wrong, and optimizing to x + (x << 1) is at best only going to save a couple of cycles, so 3*x is the right choice except for the most performance sensitive code.

In java.util, we tend to go further and write optimal code even when hotspot is likely to make the same optimizations, partly under Doug Lea's performance-obsessed influence.

Also, microbenchmarking is notoriously difficult.

If you've written a microbenchmark, it would be good to check it in somewhere. I don't know what current openjdk practice is for that...


On Tue, Jul 8, 2014 at 7:01 AM, Ivan Gerasimov <ivan.gerasi...@oracle.com <mailto:ivan.gerasi...@oracle.com>> wrote:


    On 08.07.2014 4:44, Martin Buchholz wrote:



    On Mon, Jul 7, 2014 at 9:41 AM, Martin Buchholz
    <marti...@google.com <mailto:marti...@google.com>> wrote:

        I'd like to offer an alternative version of this change.
         webrev coming soon.


    Here's the promised webrev.
    
http://cr.openjdk.java.net/~martin/webrevs/openjdk9/IdentityHashMap-capacity/
    
<http://cr.openjdk.java.net/%7Emartin/webrevs/openjdk9/IdentityHashMap-capacity/>

    - Fixes a typo in the javadoc.
    - removes the "threshold" field (WAT, a cache to avoid
    multiplying by 3?)
    - optimizes 3*x into x + x << 1

    My personal preference would be x + x + x, but I thought JIT can
    do this kind of optimizations itself.
    Out of curiosity I've created a microbenchmark:

Benchmark Mode Samples Score Score error Units o.s.MyBenchmark.testMethod_01_X3 avgt 200 5.900 0.041 ns/op o.s.MyBenchmark.testMethod_02_PPP avgt 200 6.029 0.035 ns/op o.s.MyBenchmark.testMethod_03_PSH avgt 200 5.907 0.071 ns/op

    On my machine 3*x and x + (x<<1) appear to be of the same speed
    (#1 and #3 above).
    x + x + x (case #2) happens to be ~2% slower.

    Given the optimization doesn't introduce any speedup, wouldn't it
    be better to use 3*x for its clarity?

    Sincerely yours,
    Ivan



    - adds more test assertions
    - removes the unusual 4/3 slack space for expansion on
    deserialization.

    TODO: the test should be testng-ified, I think.





Reply via email to