On Thu, 4 Jan 2024 05:28:59 GMT, Jatin Bhateja <jbhat...@openjdk.org> wrote:

> Hi,
> 
> This patch optimizes the non-subword vector compress and expand APIs for x86 
> AVX2-only targets.
> Upcoming E-core Xeons (Sierra Forest) and hybrid CPUs support only the AVX2 
> instruction set.
> These APIs are used very frequently in columnar database filter operations.
> 
> The implementation uses a lookup table to record permute indices. The table 
> index is computed from the mask argument of the compress/expand operation.
> 
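To make the lookup-table idea above concrete, here is a minimal scalar sketch in Java. It is not the code from the patch: the class name, the table layout, the use of -1 to mark zeroed lanes, and the choice of eight lanes (as for 32-bit elements in a 256-bit register) are all assumptions made for illustration. Each table row is selected by the mask bits and holds one source-lane index per destination lane, which is the role a permute-index vector would play on AVX2 hardware.

```java
import java.util.Arrays;

// Hypothetical scalar model of mask-indexed permute tables; not taken from the patch.
class CompressExpandLutSketch {
    // Assumption: eight lanes, e.g. 32-bit elements in a 256-bit vector.
    static final int LANES = 8;

    // One row per possible mask value. Each entry is a source-lane index,
    // or -1 where the destination lane is set to zero.
    static final int[][] COMPRESS_TABLE = new int[1 << LANES][LANES];
    static final int[][] EXPAND_TABLE = new int[1 << LANES][LANES];

    static {
        for (int mask = 0; mask < (1 << LANES); mask++) {
            Arrays.fill(COMPRESS_TABLE[mask], -1);
            Arrays.fill(EXPAND_TABLE[mask], -1);
            int selected = 0; // number of set mask bits seen so far
            for (int lane = 0; lane < LANES; lane++) {
                if ((mask & (1 << lane)) != 0) {
                    // compress: the k-th selected lane moves down to position k
                    COMPRESS_TABLE[mask][selected] = lane;
                    // expand: position k moves up to the k-th selected lane
                    EXPAND_TABLE[mask][lane] = selected;
                    selected++;
                }
            }
        }
    }

    // Applies a row of permute indices; -1 produces a zero lane.
    static int[] permute(int[] lanes, int[] indices) {
        int[] out = new int[LANES];
        for (int i = 0; i < LANES; i++) {
            out[i] = indices[i] < 0 ? 0 : lanes[indices[i]];
        }
        return out;
    }

    static int[] compress(int[] lanes, int mask) {
        return permute(lanes, COMPRESS_TABLE[mask & ((1 << LANES) - 1)]);
    }

    static int[] expand(int[] lanes, int mask) {
        return permute(lanes, EXPAND_TABLE[mask & ((1 << LANES) - 1)]);
    }

    public static void main(String[] args) {
        int[] v = {10, 11, 12, 13, 14, 15, 16, 17};
        int mask = 0b10100101; // lanes 0, 2, 5 and 7 are selected
        System.out.println(Arrays.toString(compress(v, mask)));
        System.out.println(Arrays.toString(expand(v, mask)));
    }
}
```

With the mask above, compress prints `[10, 12, 15, 17, 0, 0, 0, 0]` and expand prints `[10, 0, 11, 0, 0, 12, 0, 13]`. The point of the table is that the per-mask work is done once up front, so each operation reduces to one table lookup followed by one permute.
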
> The following are the performance numbers from the JMH microbenchmark 
> included with the patch.
> 
> 
> System : Intel(R) Xeon(R) Platinum 8480+ (Sapphire Rapids)
> 
> Baseline:
> Benchmark                                 (size)   Mode  Cnt    Score   Error   Units
> ColumnFilterBenchmark.filterDoubleColumn    1024  thrpt    2  142.767          ops/ms
> ColumnFilterBenchmark.filterDoubleColumn    2047  thrpt    2   71.436          ops/ms
> ColumnFilterBenchmark.filterDoubleColumn    4096  thrpt    2   35.992          ops/ms
> ColumnFilterBenchmark.filterFloatColumn     1024  thrpt    2  182.151          ops/ms
> ColumnFilterBenchmark.filterFloatColumn     2047  thrpt    2   91.096          ops/ms
> ColumnFilterBenchmark.filterFloatColumn     4096  thrpt    2   44.757          ops/ms
> ColumnFilterBenchmark.filterIntColumn       1024  thrpt    2  184.099          ops/ms
> ColumnFilterBenchmark.filterIntColumn       2047  thrpt    2   91.981          ops/ms
> ColumnFilterBenchmark.filterIntColumn       4096  thrpt    2   45.170          ops/ms
> ColumnFilterBenchmark.filterLongColumn      1024  thrpt    2  148.017          ops/ms
> ColumnFilterBenchmark.filterLongColumn      2047  thrpt    2   73.516          ops/ms
> ColumnFilterBenchmark.filterLongColumn      4096  thrpt    2   36.844          ops/ms
> 
> With optimization:
> Benchmark                                 (size)   Mode  Cnt     Score   Error   Units
> ColumnFilterBenchmark.filterDoubleColumn    1024  thrpt    2  2051.707          ops/ms
> ColumnFilterBenchmark.filterDoubleColumn    2047  thrpt    2   914.072          ops/ms
> ColumnFilterBenchmark.filterDoubleColumn    4096  thrpt    2   489.898          ops/ms
> ColumnFilterBenchmark.filterFloatColumn     1024  thrpt    2  5324.195          ops/ms
> ColumnFilterBenchmark.filterFloatColumn     2047  thrpt    2  2587.229          ops/ms
> ColumnFilterBenchmark.filterFloatColumn     4096  thrpt    2  1278.665          ops/ms
> ColumnFilterBenchmark.filterIntColumn       1024  thrpt    2  4149.384          ops/ms
> ColumnFilterBenchmark.filterIntColumn       2047  thrpt    2  1791.170          ops/ms
> ColumnFilterBenchmark.filterIntColumn       4096...

This pull request has now been integrated.

Changeset: 6d36eb78
Author:    Jatin Bhateja <jbhat...@openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/6d36eb78ad781ecd80d66d1319921a8746820394
Stats:     372 lines in 10 files changed: 354 ins; 8 del; 10 mod

8322768: Optimize non-subword vector compress and expand APIs for AVX2 target.

Reviewed-by: epeter, sviswanathan

-------------

PR: https://git.openjdk.org/jdk/pull/17261
