On Wed, 16 Feb 2022 12:26:45 GMT, Jatin Bhateja <jbhat...@openjdk.org> wrote:

>>> > Hi, IIRC for evex encoding you can embed the RC control bit directly in 
>>> > the evex prefix, removing the need to rely on global MXCSR register. 
>>> > Thanks.
>>> 
>>> Hi @merykitty, you are correct: we can embed the RC mode in the instruction 
>>> encoding of the round instruction (towards -inf, +inf, zero). But to match 
>>> the semantics of the Math.round API, one needs to add 0.5[f] to the input 
>>> value and then round the resulting value, which is why @sviswa7 suggested 
>>> using a global rounding mode driven by MXCSR.RC, so that inexact 
>>> intermediate floating-point values are resolved as desired. However, OOO 
>>> execution may misplace LDMXCSR and hence may have undesired side effects.
>> 
>> **Just want to correct the above statement: LDMXCSR will not be 
>> re-ordered/re-scheduled early by the OOO backend.**
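
To make the point about inexact intermediate values concrete, here is a minimal Java sketch. It is an illustration only, not the code from the patch: the class and method names are hypothetical, and Java cannot change MXCSR.RC, so the second method widens to `double` (where the sum is exact for this input) as a stand-in for a sum that is not rounded upward.

```java
final class NaiveRoundDemo {
    // floor(x + 0.5f) with the sum computed in float under the default
    // round-to-nearest-even mode.
    static int naiveRound(float x) {
        return (int) Math.floor(x + 0.5f);
    }

    // Same formula, but the sum is computed in double, where it is exact
    // for the input used below (assumption: emulation only, this does not
    // touch the hardware rounding mode).
    static int roundWithExactSum(float x) {
        return (int) Math.floor((double) x + 0.5);
    }

    public static void main(String[] args) {
        float x = Math.nextDown(0.5f); // largest float strictly below 0.5
        System.out.println("naiveRound        = " + naiveRound(x));
        System.out.println("roundWithExactSum = " + roundWithExactSum(x));
        System.out.println("Math.round        = " + Math.round(x));
    }
}
```

For this input the exact sum is 1 - 2^-25, which lies halfway between two adjacent floats and rounds to 1.0f under round-to-nearest-even, so the naive version returns 1 while `Math.round` returns 0. Rounding the sum toward -inf instead would give 1 - 2^-24, whose floor is 0.
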
>
>> That pseudocode would make a very useful comment too. This whole patch is 
>> very thinly commented.
> 
> I have replaced the earlier bulky sequence. The new sequence has similar 
> performance, but the reduction in code size may improve inlining behavior. 
> Added descriptive comments around the special cases.

> There are already `RoundFloat`, `RoundDouble`, and `RoundDoubleMode` nodes 
> defined.
> 
> Though `RoundFloat` and `RoundDouble` are legacy nodes used only on x86-32, 
> `RoundDoubleMode` supports multiple rounding modes and is amenable to 
> auto-vectorization.
> 
> What do you think about the following alternative?
> 
> Reuse `RoundDoubleMode` (with a new rounding mode) and introduce 
> `RoundFloatMode`.
> 
> Special rounding rules are not the only peculiarity of `Math.round()`. It 
> also converts the result to an integral type. It can be represented as `ConvF2I 
> (RoundFloatMode f #rmode)` / `ConvD2L (RoundDoubleMode d #rmode)`. In the 
> scalar case, it can be matched as a single AD instruction.
> 
> The auto-vectorizer can then convert it to `VectorCastF2X (RoundFloatModeV vf 
> #rmode)` / `VectorCastD2X (RoundDoubleModeV vd #rmode)` and match it in a 
> similar manner.
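
For reference, the round-then-convert shape described above can be modeled in plain Java. This is a rough sketch under stated assumptions, not HotSpot code: `roundHalfUp` is a hypothetical stand-in for a float-to-float rounding node with a "ties toward +inf" mode, and the `(int)` cast plays the role of `ConvF2I`.

```java
final class RoundThenConvert {
    // Hypothetical stand-in for the rounding node: float -> float, ties go
    // toward +inf. The sum is formed in double so that it is not rounded up
    // to the next integer for inputs just below a .5 boundary.
    static float roundHalfUp(float f) {
        return (float) Math.floor((double) f + 0.5);
    }

    // Stand-in for the conversion node: Java's narrowing cast maps NaN to 0
    // and saturates out-of-range values, as Math.round(float) also does.
    static int roundViaConv(float f) {
        return (int) roundHalfUp(f);
    }

    public static void main(String[] args) {
        float[] inputs = {
            2.5f, -2.5f, -0.5f, Math.nextDown(0.5f),
            Float.NaN, Float.POSITIVE_INFINITY, -3.0e9f
        };
        for (float f : inputs) {
            System.out.println(f + " -> " + roundViaConv(f)
                    + " (Math.round: " + Math.round(f) + ")");
        }
    }
}
```

The NaN, infinity and overflow cases fall out of the conversion step rather than the rounding step, which is why the two steps can be kept as separate nodes in this model.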

Adding a new rounding mode to `RoundDoubleMode` may disturb other targets: the 
`match_rule_supported` routine operates over opcodes, and currently any target 
that supports `RoundDoubleMode` generates code for all of its rounding modes. 
Your solution is in any case based on creating new scalar and vector IR nodes 
for the floating-point rounding operation, which is what the patch currently 
does.

-------------

PR: https://git.openjdk.java.net/jdk/pull/7094
