On Wed, 14 Oct 2020 02:04:42 GMT, CoreyAshford 
<github.com+51754783+coreyashf...@openjdk.org> wrote:

>>> Hi Corey, thanks for taking some stuff out of the “too short” path. There 
>>> may be a performance regression when decoding
>>> many short arrays because of the stub call overhead and the usage of the 
>>> slower part of the Java implementation. We
>>> could do it a little better in many cases to compute the maximum possible 
>>> iteration count i: i = (sl - sp) / block_size
>>> if (i * block_size > sl - 12) i-- if (i <= 0) return 0 What do you think?
>> 
>> Are you thinking of a case where that produces a higher iteration count?  It 
>> looks effectively the same to me.
>> 
>>> I don’t think branch prediction hints are helpful for the “too short” check.
>> 
>> My thinking is that most of the time when the intrinsic is called, it will 
>> not take the early exit, but I suppose when
>> it is processing a sub-block_size buffer, it will return early every time.  
>> I will remove the hints.
>>> And we should better use CCR1 instead of CCR2 which is specified as 
>>> non-volatile.
>> 
>> Ah, I should have checked the calling conventions.  I thought all of the CR* 
>> regs are volatile.  I will fix that.
>> 
>>>  Did you already find a 2nd reviewer for the PPC64 part?
>> 
>> Your original comment said "2nd review", so I thought you meant you need to 
>> review it again after the changes.  So, no,
>> I haven't looked for or found a second reviewer.  Any suggestions?  The 
>> folks on the team here have been busy with
>> other work.  Btw, I'm off today, so I will push commits to the 
>> above-mentioned issues tomorrow.
>
>> 
>> > Did you already find a 2nd reviewer for the PPC64 part?
>> 
>> Your original comment said "2nd review", so I thought you meant you need to 
>> review it again after the changes. So, no,
>> I haven't looked for or found a second reviewer. Any suggestions? The folks 
>> on the team here have been busy with other
>> work.
> 
> I am actively asking for some help here, so maybe within a few days I can get 
> a 2nd reviewer.

Hi Corey,

> Are you thinking of a case where that produces a higher iteration count?
Sorry for the confusion. This is also fine:
length = sl - sp - 12
i = length / block_size
if (i <= 0) return 0

But I still wonder why we should use 2 branches. Why not
srawi_
ble(CCR0, return_zero)
?

> Ah, I should have checked the calling conventions. I thought all of the CR* 
> regs are volatile. I will fix that.
Actually, we do save and restore all CRs, so it’s not a real problem with the 
current implementation. But I prefer
staying closer to the elf ABI as long as there’s no good reason to do it 
differently.

> Your original comment said "2nd review", so I thought you meant you need to 
> review it again after the changes.
We usually require at least 2 reviews by different people for all non-trivial 
changes. And I don’t consider the PPC64
part as trivial. In addition to that, I’m not familiar with Power 10.

Best regards,
Martin


From: CoreyAshford <notificati...@github.com>
Sent: Dienstag, 13. Oktober 2020 22:59
To: openjdk/jdk <j...@noreply.github.com>
Cc: Doerr, Martin <martin.do...@sap.com>; Mention <ment...@noreply.github.com>
Subject: Re: [openjdk/jdk] 8248188: Add IntrinsicCandidate and API for Base64 
decoding (#293)


Hi Corey, thanks for taking some stuff out of the “too short” path. There may 
be a performance regression when decoding
many short arrays because of the stub call overhead and the usage of the slower 
part of the Java implementation. We
could do it a little better in many cases to compute the maximum possible 
iteration count i: i = (sl - sp) / block_size
if (i * block_size > sl - 12) i-- if (i <= 0) return 0 What do you think?

Are you thinking of a case where that produces a higher iteration count? It 
looks effectively the same to me.

I don’t think branch prediction hints are helpful for the “too short” check.

My thinking is that most of the time when the intrinsic is called, it will not 
take the early exit, but I suppose when
it is processing a sub-block_size buffer, it will return early every time. I 
will remove the hints.

And we should better use CCR1 instead of CCR2 which is specified as 
non-volatile.

Ah, I should have checked the calling conventions. I thought all of the CR* 
regs are volatile. I will fix that.

Did you already find a 2nd reviewer for the PPC64 part?

Your original comment said "2nd review", so I thought you meant you need to 
review it again after the changes. So, no,
I haven't looked for or found a second reviewer. Any suggestions? The folks on 
the team here have been busy with other
work.

Btw, I'm off today, so I will push commits to the above-mentioned issues 
tomorrow.

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on 
GitHub<https://github.com/openjdk/jdk/pull/293#issuecomment-708005545>, or
unsubscribe<https://github.com/notifications/unsubscribe-auth/AKR64KC2K7IZFAPXVQIVYZDSKS5SLANCNFSM4RVHNW5Q>.

-------------

PR: https://git.openjdk.java.net/jdk/pull/293

Reply via email to