Re: RFR: 8266054: VectorAPI rotate operation optimization [v13]

2021-07-28 Thread Sandhya Viswanathan
On Wed, 28 Jul 2021 04:48:35 GMT, Vladimir Kozlov  wrote:

>> Looks good to me.
>
> @sviswa7 and @jatin-bhateja
> The push caused https://bugs.openjdk.java.net/browse/JDK-8271366
> In the future, I strongly suggest asking an Oracle engineer to test Intel's 
> changes before pushing.

@vnkozlov @PaulSandoz Sorry for the inconvenience. @jatin-bhateja Please don't 
rush to push; reach out to Oracle engineers for testing before pushing.

-

PR: https://git.openjdk.java.net/jdk/pull/3720


Re: RFR: 8266054: VectorAPI rotate operation optimization [v13]

2021-07-28 Thread Paul Sandoz
On Tue, 27 Jul 2021 18:31:20 GMT, Sandhya Viswanathan wrote:

>> Jatin Bhateja has updated the pull request with a new target base due to a 
>> merge or a rebase. The pull request now contains 19 commits:
>> 
>>  - 8266054: Re-designing benchmark to remove noise.
>>  - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8266054
>>  - 8266054: Formal argument name change to be more appropriate.
>>  - 8266054: Review comments resolution.
>>  - 8266054: Incorporating styling changes based on reviews.
>>  - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8266054
>>  - Merge http://github.com/openjdk/jdk into JDK-8266054
>>  - Merge http://github.com/openjdk/jdk into JDK-8266054
>>  - Merge http://github.com/openjdk/jdk into JDK-8266054
>>  - Merge branch 'JDK-8266054' of http://github.com/jatin-bhateja/jdk into 
>> JDK-8266054
>>  - ... and 9 more: 
>> https://git.openjdk.java.net/jdk/compare/a8f15427...b20404e2
>
> Looks good to me.

> @sviswa7 and @jatin-bhateja
> The push caused https://bugs.openjdk.java.net/browse/JDK-8271366
> In the future, I strongly suggest asking an Oracle engineer to test Intel's 
> changes before pushing.

Yes, as discussed before, please request that we perform internal tests before 
integrating (e.g., CC me). Unfortunately, the pre-commit PR tests don't cover all 
the test cases, and we don't yet have a way to expand that set.



Re: RFR: 8266054: VectorAPI rotate operation optimization [v13]

2021-07-27 Thread Vladimir Kozlov
On Tue, 27 Jul 2021 18:31:20 GMT, Sandhya Viswanathan wrote:

> Looks good to me.

@sviswa7 and @jatin-bhateja
The push caused https://bugs.openjdk.java.net/browse/JDK-8271366
In the future, I strongly suggest asking an Oracle engineer to test Intel's 
changes before pushing.



Re: RFR: 8266054: VectorAPI rotate operation optimization [v13]

2021-07-27 Thread Weijun Wang
On Tue, 20 Jul 2021 09:57:07 GMT, Jatin Bhateja  wrote:

>> The current Vector API Java-side implementation expresses the rotateLeft and 
>> rotateRight operations using the following operations:
>> 
>> vec1 = lanewise(VectorOperators.LSHL, n)
>> vec2 = lanewise(VectorOperators.LSHR, -n)
>> res = lanewise(VectorOperators.OR, vec1, vec2)
>> 
>> This patch moves the above handling from the Java side into the C2 compiler, 
>> which can then decompose the rotate operation into shifts when the target ISA 
>> does not support a direct rotate instruction.
>> 
>> AVX512 added the vector rotate instructions vpro[rl][v][dq], which operate on 
>> int and long vectors. For other cases (sub-word type vectors, or targets that 
>> do not support direct rotate operations), an instruction sequence consisting 
>> of vector SHIFT (LEFT/RIGHT) and vector OR is emitted.
>> 
>> Performance data for the included JMH benchmark is below.
>> Machine: Cascade Lake Server (Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz)
>>
>> Benchmark | (bits) | (shift) | (size) | Baseline Score (ops/ms) | With Opts (ops/ms) | Gain (With Opts / Baseline)
>> -- | -- | -- | -- | -- | -- | --
>> RotateBenchmark.testRotateLeftB | 128 | 7 | 256 | 3939.136 | 3836.133 | 0.973851372
>> RotateBenchmark.testRotateLeftB | 128 | 7 | 512 | 1984.231 | 1918.27 | 0.966757399
>> RotateBenchmark.testRotateLeftB | 128 | 15 | 256 | 3925.165 | 4043.842 | 1.030234907
>> RotateBenchmark.testRotateLeftB | 128 | 15 | 512 | 1962.723 | 1936.551 | 0.986665464
>> RotateBenchmark.testRotateLeftB | 128 | 31 | 256 | 3945.6 | 3817.883 | 0.967630525
>> RotateBenchmark.testRotateLeftB | 128 | 31 | 512 | 1944.458 | 1914.229 | 0.984453766
>> RotateBenchmark.testRotateLeftB | 256 | 7 | 256 | 4612.149 | 4514.874 | 0.978908964
>> RotateBenchmark.testRotateLeftB | 256 | 7 | 512 | 2296.252 | 2270.237 | 0.988670669
>> RotateBenchmark.testRotateLeftB | 256 | 15 | 256 | 4576.628 | 4515.53 | 0.986649996
>> RotateBenchmark.testRotateLeftB | 256 | 15 | 512 | 2288.278 | 2270.923 | 0.992415694
>> RotateBenchmark.testRotateLeftB | 256 | 31 | 256 | 4624.243 | 4511.46 | 0.975610495
>> RotateBenchmark.testRotateLeftB | 256 | 31 | 512 | 2305.459 | 2273.788 | 0.986262605
>> RotateBenchmark.testRotateLeftB | 512 | 7 | 256 | 7748.283 | 7777.105 | 1.003719792
>> RotateBenchmark.testRotateLeftB | 512 | 7 | 512 | 3906.214 | 3912.647 | 1.001646863
>> RotateBenchmark.testRotateLeftB | 512 | 15 | 256 | 7764.653 | 7763.482 | 0.999849188
>> RotateBenchmark.testRotateLeftB | 512 | 15 | 512 | 3916.061 | 3919.363 | 1.000843194
>> RotateBenchmark.testRotateLeftB | 512 | 31 | 256 | 7779.754 | 7770.239 | 0.998776954
>> RotateBenchmark.testRotateLeftB | 512 | 31 | 512 | 3916.471 | 3912.718 | 0.999041739
>> RotateBenchmark.testRotateLeftI | 128 | 7 | 256 | 4043.39 | 13461.814 | 3.329338501
>> RotateBenchmark.testRotateLeftI | 128 | 7 | 512 | 1996.217 | 6455.425 | 3.233829288
>> RotateBenchmark.testRotateLeftI | 128 | 15 | 256 | 4028.614 | 13077.277 | 3.246098286
>> RotateBenchmark.testRotateLeftI | 128 | 15 | 512 | 1997.612 | 6452.918 | 3.230315997
>> RotateBenchmark.testRotateLeftI | 128 | 31 | 256 | 4123.357 | 13079.045 | 3.171940969
>> RotateBenchmark.testRotateLeftI | 128 | 31 | 512 | 2003.356 | 6452.716 | 3.22095324
>> RotateBenchmark.testRotateLeftI | 256 | 7 | 256 | 7666.949 | 25658.625 | 3.34665393
>> RotateBenchmark.testRotateLeftI | 256 | 7 | 512 | 3855.826 | 12278.106 | 3.18429981
>> RotateBenchmark.testRotateLeftI | 256 | 15 | 256 | 7670.901 | 24625.466 | 3.210244272
>> RotateBenchmark.testRotateLeftI | 256 | 15 | 512 | 3765.786 | 12272.771 | 3.259019764
>> RotateBenchmark.testRotateLeftI | 256 | 31 | 256 | 7660.599 | 25678.864 | 3.352069988
>> RotateBenchmark.testRotateLeftI | 256 | 31 | 512 | 3773.401 | 12006.469 | 3.181869353
>> RotateBenchmark.testRotateLeftI | 512 | 7 | 256 | 11900.948 | 31242.989 | 2.625252123
>> RotateBenchmark.testRotateLeftI | 512 | 7 | 512 | 5830.878 | 15727.149 | 2.697217983
>> RotateBenchmark.testRotateLeftI | 512 | 15 | 256 | 12171.847 | 33180.067 | 2.72596813
>> RotateBenchmark.testRotateLeftI | 512 | 15 | 512 | 5830.544 | 16740.182 | 2.871118372
>> RotateBenchmark.testRotateLeftI | 512 | 31 | 256 | 11909.553 | 31250.882 | 2.624018047
>> RotateBenchmark.testRotateLeftI | 512 | 31 | 512 | 5846.747 | 15738.831 | 2.691895339
>> RotateBenchmark.testRotateLeftL | 128 | 7 | 256 | 2047.243 | 6888.484 | 3.364761291
>> RotateBenchmark.testRotateLeftL | 128 | 7 | 512 | 1005.029 | 3245.931
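The shift-and-OR decomposition described in the PR text can be checked in scalar Java. This is a minimal sketch, not JDK code: `RotateDecomposition` and its methods are hypothetical names, and it assumes Java's shift semantics, where the count is masked to the operand width (`& 31` for `int`), so a right shift by `-n` acts as a shift by `32 - n`.

```java
// Hypothetical helper class (not JDK code): scalar model of rotate as
// LSHL | LSHR. Java masks shift counts to the operand width (& 31 for int),
// so shifting right by -n behaves like shifting right by 32 - n.
final class RotateDecomposition {

    // rotateLeft expressed as two shifts and an OR
    static int rotateLeft(int x, int n) {
        int hi = x << n;      // like lanewise(LSHL, n)
        int lo = x >>> -n;    // like lanewise(LSHR, -n), i.e. 32 - n after masking
        return hi | lo;       // like lanewise(OR, hi, lo)
    }

    // rotateRight is the mirror image: the two shift directions are swapped
    static int rotateRight(int x, int n) {
        return (x >>> n) | (x << -n);
    }

    public static void main(String[] args) {
        int x = 0x80000001;
        for (int n = 0; n < 64; n++) {
            // the decomposition agrees with the JDK's scalar rotate methods
            if (rotateLeft(x, n) != Integer.rotateLeft(x, n)
                    || rotateRight(x, n) != Integer.rotateRight(x, n)) {
                throw new AssertionError("mismatch at n = " + n);
            }
        }
        System.out.println(Integer.toHexString(rotateLeft(x, 4)));  // prints "18"
    }
}
```

Applied lane by lane, the same identity is what the shift-and-OR instruction sequence computes on targets without a direct rotate instruction.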

Re: RFR: 8266054: VectorAPI rotate operation optimization [v13]

2021-07-27 Thread Sandhya Viswanathan
On Tue, 27 Jul 2021 18:05:49 GMT, Sandhya Viswanathan wrote:

>> To correct this: I2L may be needed in the auto-vectorization flow, since the 
>> Integer/Long.rotate[Right/Left] APIs accept only an int shift distance; so for 
>> Long.rotate* operations, the int shift value must be converted to long using 
>> I2L before it is broadcast. For Vector API lanewise vector-scalar operations, 
>> the scalar type already matches the vector's basic type. Since the degeneration 
>> routine is common to both flows, this keeps the IR consistent.
>
> For the Vector API, the shift always comes in as an int for rotate-by-scalar 
> (lanewiseShiftTemplate). The down-conversion to byte or short needs to be 
> done before scalar2vector.

I see that the same thing was already done for shifts, so a down-conversion to 
the sub-word type is not required.
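For sub-word lanes, only the low bits of the int count matter. A minimal scalar sketch for byte lanes, driven directly by an int count with no narrowing to byte; the class and method names are hypothetical, and it assumes the count is reduced modulo the 8-bit lane width, the way Java's own shift operators reduce counts for int and long.

```java
// Hypothetical scalar model of a byte-lane rotate that takes an int count.
// Assumption: the count is reduced modulo the 8-bit lane width, so the int
// count is masked rather than converted to byte first.
final class ByteRotateSketch {

    static byte rotateLeft(byte b, int n) {
        int v = b & 0xFF;          // zero-extend the lane so >>> shifts in zeros
        int s = n & 7;             // reduce the int count to the lane width
        return (byte) ((v << s) | (v >>> (8 - s)));
    }

    static byte rotateRight(byte b, int n) {
        int v = b & 0xFF;
        int s = n & 7;
        return (byte) ((v >>> s) | (v << (8 - s)));
    }

    public static void main(String[] args) {
        byte b = (byte) 0x81;                                     // 1000_0001
        System.out.println(rotateLeft(b, 1));                     // 0000_0011, prints 3
        System.out.println(rotateLeft(b, 9) == rotateLeft(b, 1)); // prints true
    }
}
```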



Re: RFR: 8266054: VectorAPI rotate operation optimization [v13]

2021-07-27 Thread Sandhya Viswanathan
On Tue, 20 Jul 2021 09:57:07 GMT, Jatin Bhateja  wrote:


Re: RFR: 8266054: VectorAPI rotate operation optimization [v13]

2021-07-27 Thread Sandhya Viswanathan
On Tue, 27 Jul 2021 08:17:55 GMT, Jatin Bhateja  wrote:

>> src/hotspot/share/opto/vectorIntrinsics.cpp line 1598:
>> 
>>> 1596:   cnt = elem_bt == T_LONG ? gvn().transform(new ConvI2LNode(cnt)) : cnt;
>>> 1597:   opd2 = gvn().transform(VectorNode::scalar2vector(cnt, num_elem, type_bt));
>>> 1598: } else {
>> 
>> Why is there a conversion only for T_LONG and not for T_BYTE and T_SHORT? Is 
>> there an assumption here that only T_INT and T_LONG elem_bt are supported?
>
> To correct this: I2L may be needed in the auto-vectorization flow, since the 
> Integer/Long.rotate[Right/Left] APIs accept only an int shift distance; so for 
> Long.rotate* operations, the int shift value must be converted to long using 
> I2L before it is broadcast. For Vector API lanewise vector-scalar operations, 
> the scalar type already matches the vector's basic type. Since the degeneration 
> routine is common to both flows, this keeps the IR consistent.

For the Vector API, the shift always comes in as an int for rotate-by-scalar 
(lanewiseShiftTemplate). The down-conversion to byte or short needs to be done 
before scalar2vector.



Re: RFR: 8266054: VectorAPI rotate operation optimization [v13]

2021-07-27 Thread Jatin Bhateja
On Tue, 27 Jul 2021 00:24:52 GMT, Sandhya Viswanathan wrote:

> src/hotspot/share/opto/vectorIntrinsics.cpp line 1598:
> 
>> 1596:   cnt = elem_bt == T_LONG ? gvn().transform(new ConvI2LNode(cnt)) : cnt;
>> 1597:   opd2 = gvn().transform(VectorNode::scalar2vector(cnt, num_elem, type_bt));
>> 1598: } else {
> 
> Why is there a conversion only for T_LONG and not for T_BYTE and T_SHORT? Is 
> there an assumption here that only T_INT and T_LONG elem_bt are supported?

To correct this: I2L may be needed in the auto-vectorization flow, since the 
Integer/Long.rotate[Right/Left] APIs accept only an int shift distance; so for 
Long.rotate* operations, the int shift value must be converted to long using 
I2L before it is broadcast. For Vector API lanewise vector-scalar operations, 
the scalar type already matches the vector type. Since the degeneration routine 
is common to both flows, this keeps the IR consistent.
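A scalar model of the auto-vectorization case may make the I2L step concrete. `Long.rotateLeft(long, int)` takes an `int` distance, while a vector of `long` lanes needs a count vector with `long` lanes, so the count is widened before the broadcast. In this hypothetical sketch, plain arrays stand in for vectors and `LongRotateBroadcast` is an illustrative name; it models the idea only, not C2's actual IR.

```java
// Hypothetical model (not C2 code): arrays stand in for vectors. The scalar
// loop uses Long.rotateLeft(long, int); the "vector" form widens the int
// count to long (the I2L step) and broadcasts it into long lanes.
import java.util.Arrays;

final class LongRotateBroadcast {

    // The scalar loop a programmer would write; C2 may auto-vectorize it.
    static long[] scalarLoop(long[] src, int shift) {
        long[] dst = new long[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = Long.rotateLeft(src[i], shift);
        }
        return dst;
    }

    // Model of the vector form: widen, broadcast, then rotate lane by lane
    // using the shift-and-OR decomposition.
    static long[] broadcastModel(long[] src, int shift) {
        long cnt = (long) shift;                 // I2L: int -> long
        long[] cntLanes = new long[src.length];
        Arrays.fill(cntLanes, cnt);              // broadcast (scalar2vector)
        long[] dst = new long[src.length];
        for (int i = 0; i < src.length; i++) {
            int s = (int) (cntLanes[i] & 63);    // per-lane count, reduced to 64-bit width
            dst[i] = (src[i] << s) | (src[i] >>> (64 - s));
        }
        return dst;
    }

    public static void main(String[] args) {
        long[] src = {1L, 0x8000000000000000L, 0x0123456789ABCDEFL};
        System.out.println(Arrays.equals(scalarLoop(src, 12), broadcastModel(src, 12))); // prints true
    }
}
```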



Re: RFR: 8266054: VectorAPI rotate operation optimization [v13]

2021-07-27 Thread Jatin Bhateja
On Tue, 27 Jul 2021 01:54:01 GMT, Sandhya Viswanathan wrote:

> src/hotspot/share/opto/vectornode.cpp line 1199:
> 
>> 1197:                                 (Node*)(phase->intcon(shift_mask + 1));
>> 1198: Node* vector_mask = phase->transform(VectorNode::scalar2vector(shift_mask_node,vlen, elem_ty));
>> 1199: int subVopc = VectorNode::opcode((bt == T_LONG) ? Op_SubL : Op_SubI, bt);
> 
> There seems to be an assumption here that the vector type is only INT or LONG 
> and not a sub-word type. From the Vector API you can get sub-word types as well.
> Also, if this path is coming from the auto-vectorizer, don't we need masking here?

The element type (bt) is passed to VectorNode::opcode, so the correct opcode is 
selected for sub-word types as well. Also, shift_mask_node is a constant node, so 
no assumption is made about the vector type. Wrap-around (masking) of the shift 
value may not be needed here, since we are degenerating the rotate into shifts 
(logical left and right shifts).
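The degeneration can be modeled in scalar Java, loosely following the quoted vectornode.cpp snippet: the lane width comes from a constant (`shift_mask + 1`) and the opposite shift count from a subtraction. This is a hypothetical sketch (`RotateDegenerate` is an illustrative name) that relies on Java's scalar shift semantics, where counts are reduced modulo the lane width; whether C2's vector shift nodes see an already-reduced count is not shown in the thread.

```java
// Hypothetical scalar model of degenerating a rotate into two logical shifts
// and an OR. The lane width is built from a constant (shift_mask + 1) and the
// opposite count is obtained by subtraction. No explicit masking is applied:
// Java's shift operators reduce counts modulo the lane width.
final class RotateDegenerate {

    static int rotateRightInt(int x, int cnt) {
        final int shiftMask = 31;                 // mask for 32-bit lanes
        int width = shiftMask + 1;                // shift_mask + 1
        int opposite = width - cnt;               // width - cnt (SubI for int lanes)
        return (x >>> cnt) | (x << opposite);     // logical right | left, then OR
    }

    static long rotateRightLong(long x, int cnt) {
        final int shiftMask = 63;                 // mask for 64-bit lanes
        int width = shiftMask + 1;
        int opposite = width - cnt;               // the snippet selects Op_SubL for T_LONG lanes
        return (x >>> cnt) | (x << opposite);
    }

    public static void main(String[] args) {
        System.out.println(rotateRightInt(0x12345678, 0) == 0x12345678);                         // prints true
        System.out.println(rotateRightInt(0x12345678, 8) == Integer.rotateRight(0x12345678, 8)); // prints true
        System.out.println(rotateRightLong(1L, 1) == Long.MIN_VALUE);                            // prints true
    }
}
```

With `cnt == 0`, the opposite count equals the full lane width, which the shift reduces to 0, so the result is `x | x == x` without any explicit wrap-around.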



Re: RFR: 8266054: VectorAPI rotate operation optimization [v13]

2021-07-26 Thread Sandhya Viswanathan
On Tue, 20 Jul 2021 09:57:07 GMT, Jatin Bhateja  wrote:
