On 06/08/2013 10:25 PM, Peter Levart wrote:

On 06/08/2013 10:17 PM, Peter Levart wrote:

On 06/08/2013 12:21 PM, Remi Forax wrote:
On 06/07/2013 12:01 PM, Aleksey Shipilev wrote:
(posting this to hotspot-dev@ and cc-ing core-libs-dev@, as Christian T.
suggested offline)

Hi guys,

Hi Aleksey,


The fix for scalability problem is here:
http://cr.openjdk.java.net/~shade/7177472/webrev.00/

in add, the do/while does something a little weird, if putIfAbsent returns null, interned is equals to elem, there is no need to do a e.get() in that case,
I would prefer a code like this:

            T interned;
            do {
                expungeStaleElements();
                WeakEntry<T> e = new WeakEntry<>(elem, stale);
                WeakEntry<T> exist = map.putIfAbsent(e, e);
                interned = (exist == null)? elem: exist.get();
            } while (interned == null);
            return interned;

Hi Remi, Aleksey,

In case the loop retries, there's no need to construct another WeakEntry...

            T interned;
             WeakEntry<T> e = new WeakEntry<>(elem, stale);
            do {
                expungeStaleElements();
                WeakEntry<T> exist = map.putIfAbsent(e, e);
                interned = (exist == null)? elem: exist.get();
            } while (interned == null);
            return interned;

Regards, Peter

Aleksey,

Also, in a retry-loop you might be spin-waiting for the ReferenceHandler thread to transfer the cleared WeakEntry to the ReferenceQueue. You might want to do a map.replace(e, exist, e) in case exist != null and exist.get() == null to make it independent of ReferenceHandler thread's processing. In this case only a single out-of-loop call to expungeStaleElements() would be enough.

Scrap that. when exist.get() == null, exist.equals(e) == false, so putIfAbsent should succeed.


Peter



The cast to WeakEntry in expungeStaleElements is not needed.

In WeakEntry.equals, the null check is not needed because null instanceof WeakEntry returns false, and you don't need to cast obj.get() to a WeakEntry<T> given you only to call equals on entry.get().
So the code should be:


           public boolean equals(Object obj) {
             if (!(obj instanceof WeakEntry)) {
               return false;
             }
               Object that = ((WeakEntry<?>)obj).get();
             Object mine = get();
return (that == null || mine == null)? this == obj: mine.equals(that);
         }


Otherwise, it looks good.

RĂ©mi


Testing:
   - Linux x86_64 builds OK
   - Linux x86_64 java/lang/invoke/ jtreg passes OK

Since this is a proof-of-concept patch at this point, I kept the testing
minimal. Let me know what other testing you think is needed before the
integration.

Thanks,
Aleksey.

*** Rationale and details for the issue follow: ***

This issue was in limbo for an year. The performance issue with
MethodType.methodType factory leads to abysmal scalability on the
targeted tests. While this can arguably demonstrated on larger
workloads, the targeted microbenchmarks are showcasing it nicely. This
issue was blocking the JSR292 API performance testing, because on large
enough test server, you will inevitably be bitten by this.

If you run the JMH-enabled [1] microbenchmark:
http://openjdk.java.net/projects/code-tools/jmh/
http://cr.openjdk.java.net/~shade/7177472/jsr292bench.tar.gz

...on the current jdk8/jdk8, 2x2 i5 2.0 Ghz, Linux x86_64, you will see the scalability is going down. ("testCached" makes sure the reference to
MethodType is always strongly reachable, "testNew" makes sure the
reference is mostly non-reachable, prompting cache re-fill, "testOOB"
does not make any precautions about that, and also does not suffer from
the static overheads entailed by the reference management in first two
tests).

1 thread:
MethodTypeBench.testCached     78 +-  1 nsec/op
MethodTypeBench.testNew       120 +-  1 nsec/op
MethodTypeBench.testOOB        41 +-  1 nsec/op

4 threads:
MethodTypeBench.testCached    495 +- 40 nsec/op
MethodTypeBench.testNew       611 +- 37 nsec/op
MethodTypeBench.testOOB       377 +- 10 nsec/op

In fact, on larger machines it may be as bad as tens/hundreds of
microseconds to look up the MethodType. With the patch applied, on the
same 2x2 machine:

1 thread:
MethodTypeBench.testCached     92 +-  1 nsec/op
MethodTypeBench.testNew       101 +-  1 nsec/op
MethodTypeBench.testOOB        49 +-  1 nsec/op

4 threads:
MethodTypeBench.testCached    129 +-  1 nsec/op
MethodTypeBench.testNew       142 +- 10 nsec/op
MethodTypeBench.testOOB        89 +-  3 nsec/op


...which means the scalability is back: the average time to look up the
MethodType is not growing spectacularly when the thread count go up.
Again, the effect is astonishing on large machines.

Notice that we took some hit in single-threaded performance, and that is
due to two reasons: a) WeakEntry allocation in get(); b) CHM. With the
upcoming CHMv8, this starts to look better:

1 thread:
MethodTypeBench.testCached     87 +-  1 nsec/op
MethodTypeBench.testNew        95 +-  1 nsec/op
MethodTypeBench.testOOB        52 +-  1 nsec/op

4 threads:
MethodTypeBench.testCached    121 +-  2 nsec/op
MethodTypeBench.testNew       141 +-  1 nsec/op
MethodTypeBench.testOOB        93 +-  5 nsec/op

...which seems to indicate this change is future-proof.




Reply via email to