Re: RFR: 8061254: SPECjvm2008-XML performance regressions in 9-b33

2015-05-15 Thread Alan Bateman

On 13/05/2015 12:51, Claes Redestad wrote:

Hi,

9-b33 introduced a sustained regression in SPECjvm2008
xml.transform on a number of our test setups.

JDK-8058643 removed the check on value.length  0, which
means repeated calls to .hashCode() now do a store of the
calculated value (0) to the hash field. This has potential to
cause excessive cache coherence traffic in xml.transform[1]
I remember the original change but of course didn't spot that empty 
String would result in a write of 0. I like Shipilev's First Law.


The patch looks good to me.

-Alan.


Re: RFR: 8061254: SPECjvm2008-XML performance regressions in 9-b33

2015-05-15 Thread Claes Redestad


On 2015-05-15 15:23, Alan Bateman wrote:

On 13/05/2015 12:51, Claes Redestad wrote:

Hi,

9-b33 introduced a sustained regression in SPECjvm2008
xml.transform on a number of our test setups.

JDK-8058643 removed the check on value.length  0, which
means repeated calls to .hashCode() now do a store of the
calculated value (0) to the hash field. This has potential to
cause excessive cache coherence traffic in xml.transform[1]
I remember the original change but of course didn't spot that empty 
String would result in a write of 0. I like Shipilev's First Law.


Yes, this is a fine example of that law. :-)



The patch looks good to me.


Thanks!

/Claes


Re: RFR: 8061254: SPECjvm2008-XML performance regressions in 9-b33

2015-05-13 Thread Aleksey Shipilev
On 13.05.2015 14:51, Claes Redestad wrote:
 9-b33 introduced a sustained regression in SPECjvm2008
 xml.transform on a number of our test setups.
 
 JDK-8058643 removed the check on value.length  0, which
 means repeated calls to .hashCode() now do a store of the
 calculated value (0) to the hash field. This has potential to
 cause excessive cache coherence traffic in xml.transform[1]

More details: it seems like xml.transform ends up using JDK APIs
(com.sun.org.apache.xml.internal.dtm.ref.ExpandedNameTable.getExpandedTypeID)
that compute the hashcode for null argument by calling
.hashCode(), which is silly in itself.

Anyhow, dropping value.length  0 check was my bad, and it is a yet
another manifestation of the empirical law in performance engineering:
 https://twitter.com/shipilev/status/539119320081907713

 Adding back the check that value.length  0 before entering the
 calculation loop recuperates the loss in both micro and
 xml.transform, but we're seeing as good or better number by
 simply guarding the store:
 
 Webrev: http://cr.openjdk.java.net/~redestad/8061254/webrev.00/

Good.

Thanks,
-Aleksey



Re: RFR: 8061254: SPECjvm2008-XML performance regressions in 9-b33

2015-05-13 Thread Sergey Bylokhov
Just curious, what is the difference between this fix and an old version 
of the code:

http://cr.openjdk.java.net/~shade/8058643/webrev.01/src/java.base/share/classes/java/lang/String.java.sdiff.html

I meant:
if (h == 0  value.length  0) {}
vs
if (h != 0) {hash = h;}

On 13.05.15 14:51, Claes Redestad wrote:

Hi,

9-b33 introduced a sustained regression in SPECjvm2008
xml.transform on a number of our test setups.

JDK-8058643 removed the check on value.length  0, which
means repeated calls to .hashCode() now do a store of the
calculated value (0) to the hash field. This has potential to
cause excessive cache coherence traffic in xml.transform[1]

Extracting the code that showed up in profiles to a micro[2] and
running this in multiple threads is up to 100x slower in 9-b33 on
a dual-socket machine.

Adding back the check that value.length  0 before entering the
calculation loop recuperates the loss in both micro and
xml.transform, but we're seeing as good or better number by
simply guarding the store:

Webrev: http://cr.openjdk.java.net/~redestad/8061254/webrev.00/
Bug: https://bugs.openjdk.java.net/browse/JDK-8061254 (confidential
due to links to internal systems, sorry!)

This additionally avoids the field store (and potential for pointless
cache traffic) also on non-empty strings whose hash value happen
to equals 0.

Thanks!

/Claes

[1] See 
com.sun.org.apache.xml.internal.dtm.ref.ExpandedNameTable#getExpandedTypeID.

[2] http://cr.openjdk.java.net/~redestad/8061254/expandedtypeid-micro.zip



--
Best regards, Sergey.



Re: RFR: 8061254: SPECjvm2008-XML performance regressions in 9-b33

2015-05-13 Thread Vitaly Davidovich
It's interesting that this version performs as good or *better* than
value.length  0 check.  Intuitively, value.length is going to be used
anyway (if h == 0) in the loop and so there should be no penalty of loading
and branching on it (given the change here has a branch anyway) and
compiler can keep that value in a register for the actual loop.

Looking at the generated asm for current String.hashCode(), it doesn't look
like compiler is actually using the user-written check to skip its own
check (see the two tests at the bottom of this snippet):

0x7f7e312a0d8c: mov%rsi,%rbx
  0x7f7e312a0d8f: mov0x10(%rsi),%eax;*getfield hash
; -
java.lang.String::hashCode@1 (line 1454)

  0x7f7e312a0d92: test   %eax,%eax
  0x7f7e312a0d94: jne0x7f7e312a0e85  ;*ifne
; -
java.lang.String::hashCode@6 (line 1455)

  0x7f7e312a0d9a: mov0xc(%rsi),%ebp ;*getfield value
; -
java.lang.String::hashCode@10 (line 1455)

  0x7f7e312a0d9d: mov0xc(%r12,%rbp,8),%r10d  ;*arraylength
; -
java.lang.String::hashCode@13 (line 1455)
; implicit exception:
dispatches to 0x7f7e312a0ea9
  0x7f7e312a0da2: test   %r10d,%r10d
  0x7f7e312a0da5: jle0x7f7e312a0e91  ;*ifle
; -
java.lang.String::hashCode@14 (line 1455)

  0x7f7e312a0dab: test   %r10d,%r10d
  0x7f7e312a0dae: jbe0x7f7e312a0e95


It does however continue using r10.

On Wed, May 13, 2015 at 8:36 AM, Sergey Bylokhov sergey.bylok...@oracle.com
 wrote:

 Just curious, what is the difference between this fix and an old version
 of the code:

 http://cr.openjdk.java.net/~shade/8058643/webrev.01/src/java.base/share/classes/java/lang/String.java.sdiff.html

 I meant:
 if (h == 0  value.length  0) {}
 vs
 if (h != 0) {hash = h;}


 On 13.05.15 14:51, Claes Redestad wrote:

 Hi,

 9-b33 introduced a sustained regression in SPECjvm2008
 xml.transform on a number of our test setups.

 JDK-8058643 removed the check on value.length  0, which
 means repeated calls to .hashCode() now do a store of the
 calculated value (0) to the hash field. This has potential to
 cause excessive cache coherence traffic in xml.transform[1]

 Extracting the code that showed up in profiles to a micro[2] and
 running this in multiple threads is up to 100x slower in 9-b33 on
 a dual-socket machine.

 Adding back the check that value.length  0 before entering the
 calculation loop recuperates the loss in both micro and
 xml.transform, but we're seeing as good or better number by
 simply guarding the store:

 Webrev: http://cr.openjdk.java.net/~redestad/8061254/webrev.00/
 Bug: https://bugs.openjdk.java.net/browse/JDK-8061254 (confidential
 due to links to internal systems, sorry!)

 This additionally avoids the field store (and potential for pointless
 cache traffic) also on non-empty strings whose hash value happen
 to equals 0.

 Thanks!

 /Claes

 [1] See
 com.sun.org.apache.xml.internal.dtm.ref.ExpandedNameTable#getExpandedTypeID.
 [2] http://cr.openjdk.java.net/~redestad/8061254/expandedtypeid-micro.zip



 --
 Best regards, Sergey.




Re: RFR: 8061254: SPECjvm2008-XML performance regressions in 9-b33

2015-05-13 Thread Vitaly Davidovich
Thanks Claes.  I think there's some room for improvement in the JIT to
piggyback on the user check, but that's a side topic.

Out of curiosity, have you tried moving the slow path (computation of the
hash) into an out-of-line method? What's the bytecode size if you do that?
This shouldn't matter for C2, but I wonder if it may somehow tickle the
JIT/register allocator into better codegen.

On Wed, May 13, 2015 at 9:06 AM, Claes Redestad claes.redes...@oracle.com
wrote:

  I shouldn't have said better. :-)

 Running against versions of hashCode using value.length  0 and h != 0
 respectively show results that are within the error range of each other.
 It's
 hard to quantify the cost of one extra branch in the slow path, I guess.

 Checking value.length will only avoid the store for the empty string,
 but not for other strings with hash==0. For the empty string, we might
 do one extra compare compare/branch every time we call hashCode,

 The value.length  0 check does issue an additional getfield op code for
 the slow path, avoiding this can be slightly beneficial in some scenarios
 (even though it shouldn't matter much in the final asm).

 For reference, this patch (doing if (h != 0) { hash = h; }) nets about a
 60x speedup on the micro with -t 32 -p valueString=pollinating sandboxes
 on our multi-socket machines over both the 8058643 implementation
 and the one before it.

 /Claes


 On 2015-05-13 14:53, Vitaly Davidovich wrote:

 It's interesting that this version performs as good or *better* than
 value.length  0 check.  Intuitively, value.length is going to be used
 anyway (if h == 0) in the loop and so there should be no penalty of loading
 and branching on it (given the change here has a branch anyway) and
 compiler can keep that value in a register for the actual loop.

  Looking at the generated asm for current String.hashCode(), it doesn't
 look like compiler is actually using the user-written check to skip its own
 check (see the two tests at the bottom of this snippet):

  0x7f7e312a0d8c: mov%rsi,%rbx
   0x7f7e312a0d8f: mov0x10(%rsi),%eax;*getfield hash
 ; -
 java.lang.String::hashCode@1 (line 1454)

0x7f7e312a0d92: test   %eax,%eax
   0x7f7e312a0d94: jne0x7f7e312a0e85  ;*ifne
 ; -
 java.lang.String::hashCode@6 (line 1455)

0x7f7e312a0d9a: mov0xc(%rsi),%ebp ;*getfield value
 ; -
 java.lang.String::hashCode@10 (line 1455)

0x7f7e312a0d9d: mov0xc(%r12,%rbp,8),%r10d  ;*arraylength
 ; -
 java.lang.String::hashCode@13 (line 1455)
 ; implicit exception:
 dispatches to 0x7f7e312a0ea9
   0x7f7e312a0da2: test   %r10d,%r10d
   0x7f7e312a0da5: jle0x7f7e312a0e91  ;*ifle
 ; -
 java.lang.String::hashCode@14 (line 1455)

0x7f7e312a0dab: test   %r10d,%r10d
   0x7f7e312a0dae: jbe0x7f7e312a0e95


  It does however continue using r10.

 On Wed, May 13, 2015 at 8:36 AM, Sergey Bylokhov 
 sergey.bylok...@oracle.com wrote:

 Just curious, what is the difference between this fix and an old version
 of the code:

 http://cr.openjdk.java.net/~shade/8058643/webrev.01/src/java.base/share/classes/java/lang/String.java.sdiff.html

 I meant:
 if (h == 0  value.length  0) {}
 vs
 if (h != 0) {hash = h;}


 On 13.05.15 14:51, Claes Redestad wrote:

 Hi,

 9-b33 introduced a sustained regression in SPECjvm2008
 xml.transform on a number of our test setups.

 JDK-8058643 removed the check on value.length  0, which
 means repeated calls to .hashCode() now do a store of the
 calculated value (0) to the hash field. This has potential to
 cause excessive cache coherence traffic in xml.transform[1]

 Extracting the code that showed up in profiles to a micro[2] and
 running this in multiple threads is up to 100x slower in 9-b33 on
 a dual-socket machine.

 Adding back the check that value.length  0 before entering the
 calculation loop recuperates the loss in both micro and
 xml.transform, but we're seeing as good or better number by
 simply guarding the store:

 Webrev: http://cr.openjdk.java.net/~redestad/8061254/webrev.00/
 Bug: https://bugs.openjdk.java.net/browse/JDK-8061254 (confidential
 due to links to internal systems, sorry!)

 This additionally avoids the field store (and potential for pointless
 cache traffic) also on non-empty strings whose hash value happen
 to equals 0.

 Thanks!

 /Claes

 [1] See
 com.sun.org.apache.xml.internal.dtm.ref.ExpandedNameTable#getExpandedTypeID.
 [2]
 http://cr.openjdk.java.net/~redestad/8061254/expandedtypeid-micro.zip



   --
 Best regards, Sergey.






Re: RFR: 8061254: SPECjvm2008-XML performance regressions in 9-b33

2015-05-13 Thread Claes Redestad

I shouldn't have said better. :-)

Running against versions of hashCode using value.length  0 and h != 0
respectively show results that are within the error range of each other. 
It's

hard to quantify the cost of one extra branch in the slow path, I guess.

Checking value.length will only avoid the store for the empty string,
but not for other strings with hash==0. For the empty string, we might
do one extra compare compare/branch every time we call hashCode,

The value.length  0 check does issue an additional getfield op code for
the slow path, avoiding this can be slightly beneficial in some scenarios
(even though it shouldn't matter much in the final asm).

For reference, this patch (doing if (h != 0) { hash = h; }) nets about a
60x speedup on the micro with -t 32 -p valueString=pollinating sandboxes
on our multi-socket machines over both the 8058643 implementation
and the one before it.

/Claes

On 2015-05-13 14:53, Vitaly Davidovich wrote:
It's interesting that this version performs as good or *better* than 
value.length  0 check.  Intuitively, value.length is going to be used 
anyway (if h == 0) in the loop and so there should be no penalty of 
loading and branching on it (given the change here has a branch 
anyway) and compiler can keep that value in a register for the actual 
loop.


Looking at the generated asm for current String.hashCode(), it doesn't 
look like compiler is actually using the user-written check to skip 
its own check (see the two tests at the bottom of this snippet):


0x7f7e312a0d8c: mov%rsi,%rbx
  0x7f7e312a0d8f: mov0x10(%rsi),%eax  ;*getfield hash
; - 
java.lang.String::hashCode@1 (line 1454)


  0x7f7e312a0d92: test   %eax,%eax
  0x7f7e312a0d94: jne0x7f7e312a0e85  ;*ifne
; - 
java.lang.String::hashCode@6 (line 1455)


  0x7f7e312a0d9a: mov0xc(%rsi),%ebp ;*getfield value
; - 
java.lang.String::hashCode@10 (line 1455)


  0x7f7e312a0d9d: mov0xc(%r12,%rbp,8),%r10d  ;*arraylength
; - 
java.lang.String::hashCode@13 (line 1455)
; implicit exception: 
dispatches to 0x7f7e312a0ea9

  0x7f7e312a0da2: test   %r10d,%r10d
  0x7f7e312a0da5: jle0x7f7e312a0e91  ;*ifle
; - 
java.lang.String::hashCode@14 (line 1455)


  0x7f7e312a0dab: test   %r10d,%r10d
  0x7f7e312a0dae: jbe0x7f7e312a0e95


It does however continue using r10.

On Wed, May 13, 2015 at 8:36 AM, Sergey Bylokhov 
sergey.bylok...@oracle.com mailto:sergey.bylok...@oracle.com wrote:


Just curious, what is the difference between this fix and an old
version of the code:

http://cr.openjdk.java.net/~shade/8058643/webrev.01/src/java.base/share/classes/java/lang/String.java.sdiff.html

http://cr.openjdk.java.net/%7Eshade/8058643/webrev.01/src/java.base/share/classes/java/lang/String.java.sdiff.html

I meant:
if (h == 0  value.length  0) {}
vs
if (h != 0) {hash = h;}


On 13.05.15 14:51, Claes Redestad wrote:

Hi,

9-b33 introduced a sustained regression in SPECjvm2008
xml.transform on a number of our test setups.

JDK-8058643 removed the check on value.length  0, which
means repeated calls to .hashCode() now do a store of the
calculated value (0) to the hash field. This has potential to
cause excessive cache coherence traffic in xml.transform[1]

Extracting the code that showed up in profiles to a micro[2] and
running this in multiple threads is up to 100x slower in 9-b33 on
a dual-socket machine.

Adding back the check that value.length  0 before entering the
calculation loop recuperates the loss in both micro and
xml.transform, but we're seeing as good or better number by
simply guarding the store:

Webrev:
http://cr.openjdk.java.net/~redestad/8061254/webrev.00/
http://cr.openjdk.java.net/%7Eredestad/8061254/webrev.00/
Bug: https://bugs.openjdk.java.net/browse/JDK-8061254
(confidential
due to links to internal systems, sorry!)

This additionally avoids the field store (and potential for
pointless
cache traffic) also on non-empty strings whose hash value happen
to equals 0.

Thanks!

/Claes

[1] See

com.sun.org.apache.xml.internal.dtm.ref.ExpandedNameTable#getExpandedTypeID.
[2]
http://cr.openjdk.java.net/~redestad/8061254/expandedtypeid-micro.zip

http://cr.openjdk.java.net/%7Eredestad/8061254/expandedtypeid-micro.zip



-- 
Best regards, Sergey.