Re: RFR: 8061254: SPECjvm2008-XML performance regressions in 9-b33

Vitaly Davidovich Wed, 13 May 2015 06:14:13 -0700

Thanks Claes.  I think there's some room for improvement in the JIT to
piggyback on the user check, but that's a side topic.


Out of curiosity, have you tried moving the slow path (computation of the
hash) into an out-of-line method? What's the bytecode size if you do that?
This shouldn't matter for C2, but I wonder if it may somehow tickle the
JIT/register allocator into better codegen.

On Wed, May 13, 2015 at 9:06 AM, Claes Redestad <[email protected]>
wrote:

>  I shouldn't have said better. :-)
>
> Running against versions of hashCode using value.length > 0 and h != 0
> respectively show results that are within the error range of each other.
> It's
> hard to quantify the cost of one extra branch in the slow path, I guess.
>
> Checking value.length will only avoid the store for the empty string,
> but not for other strings with hash==0. For the empty string, we might
> do one extra compare compare/branch every time we call hashCode,
>
> The value.length > 0 check does issue an additional getfield op code for
> the slow path, avoiding this can be slightly beneficial in some scenarios
> (even though it shouldn't matter much in the final asm).
>
> For reference, this patch (doing if (h != 0) { hash = h; }) nets about a
> 60x speedup on the micro with -t 32 -p valueString="pollinating sandboxes"
> on our multi-socket machines over both the 8058643 implementation
> and the one before it.
>
> /Claes
>
>
> On 2015-05-13 14:53, Vitaly Davidovich wrote:
>
> It's interesting that this version performs as good or *better* than
> value.length > 0 check.  Intuitively, value.length is going to be used
> anyway (if h == 0) in the loop and so there should be no penalty of loading
> and branching on it (given the change here has a branch anyway) and
> compiler can keep that value in a register for the actual loop.
>
>  Looking at the generated asm for current String.hashCode(), it doesn't
> look like compiler is actually using the user-written check to skip its own
> check (see the two tests at the bottom of this snippet):
>
>  0x00007f7e312a0d8c: mov    %rsi,%rbx
>   0x00007f7e312a0d8f: mov    0x10(%rsi),%eax    ;*getfield hash
>                                                 ; -
> java.lang.String::hashCode@1 (line 1454)
>
>    0x00007f7e312a0d92: test   %eax,%eax
>   0x00007f7e312a0d94: jne    0x00007f7e312a0e85  ;*ifne
>                                                 ; -
> java.lang.String::hashCode@6 (line 1455)
>
>    0x00007f7e312a0d9a: mov    0xc(%rsi),%ebp     ;*getfield value
>                                                 ; -
> java.lang.String::hashCode@10 (line 1455)
>
>    0x00007f7e312a0d9d: mov    0xc(%r12,%rbp,8),%r10d  ;*arraylength
>                                                 ; -
> java.lang.String::hashCode@13 (line 1455)
>                                                 ; implicit exception:
> dispatches to 0x00007f7e312a0ea9
>   0x00007f7e312a0da2: test   %r10d,%r10d
>   0x00007f7e312a0da5: jle    0x00007f7e312a0e91  ;*ifle
>                                                 ; -
> java.lang.String::hashCode@14 (line 1455)
>
>    0x00007f7e312a0dab: test   %r10d,%r10d
>   0x00007f7e312a0dae: jbe    0x00007f7e312a0e95
>
>
>  It does however continue using r10.
>
> On Wed, May 13, 2015 at 8:36 AM, Sergey Bylokhov <
> [email protected]> wrote:
>
>> Just curious, what is the difference between this fix and an old version
>> of the code:
>>
>> http://cr.openjdk.java.net/~shade/8058643/webrev.01/src/java.base/share/classes/java/lang/String.java.sdiff.html
>>
>> I meant:
>> "if (h == 0 && value.length > 0) {}"
>> vs
>> "if (h != 0) {hash = h;}"
>>
>>
>> On 13.05.15 14:51, Claes Redestad wrote:
>>
>>> Hi,
>>>
>>> 9-b33 introduced a sustained regression in SPECjvm2008
>>> xml.transform on a number of our test setups.
>>>
>>> JDK-8058643 removed the check on value.length > 0, which
>>> means repeated calls to "".hashCode() now do a store of the
>>> calculated value (0) to the hash field. This has potential to
>>> cause excessive cache coherence traffic in xml.transform[1]
>>>
>>> Extracting the code that showed up in profiles to a micro[2] and
>>> running this in multiple threads is up to 100x slower in 9-b33 on
>>> a dual-socket machine.
>>>
>>> Adding back the check that value.length > 0 before entering the
>>> calculation loop recuperates the loss in both micro and
>>> xml.transform, but we're seeing as good or better number by
>>> simply guarding the store:
>>>
>>> Webrev: http://cr.openjdk.java.net/~redestad/8061254/webrev.00/
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8061254 (confidential
>>> due to links to internal systems, sorry!)
>>>
>>> This additionally avoids the field store (and potential for pointless
>>> cache traffic) also on non-empty strings whose hash value happen
>>> to equals 0.
>>>
>>> Thanks!
>>>
>>> /Claes
>>>
>>> [1] See
>>> com.sun.org.apache.xml.internal.dtm.ref.ExpandedNameTable#getExpandedTypeID.
>>> [2]
>>> http://cr.openjdk.java.net/~redestad/8061254/expandedtypeid-micro.zip
>>>
>>
>>
>>   --
>> Best regards, Sergey.
>>
>>
>
>

Re: RFR: 8061254: SPECjvm2008-XML performance regressions in 9-b33

Reply via email to