pitrou commented on PR #37536:
URL: https://github.com/apache/arrow/pull/37536#issuecomment-1715867137

   By the way, do we expect AVX2 or AVX512 to generate more optimized code for overflow-checking operations?
   
   I tested locally (gcc 12.3.0, AMD Ryzen 9 3900X) and AVX2 makes things _worse_ here, roughly 5x slower on the checked sum:
   ```console
   $ ARROW_USER_SIMD_LEVEL=none python -m timeit -s "import pyarrow as pa, pyarrow.compute as pc; a = pa.array([42]*1_000_000, type='int32')" "pc.sum_checked(a)"
   500 loops, best of 5: 462 usec per loop
   $ ARROW_USER_SIMD_LEVEL=avx2 python -m timeit -s "import pyarrow as pa, pyarrow.compute as pc; a = pa.array([42]*1_000_000, type='int32')" "pc.sum_checked(a)"
   100 loops, best of 5: 2.46 msec per loop
   ```
   
   Note that AVX2 does improve performance on the non-checked sum operation (about 1.45x faster):
   ```console
   $ ARROW_USER_SIMD_LEVEL=none python -m timeit -s "import pyarrow as pa, pyarrow.compute as pc; a = pa.array([42]*1_000_000, type='int32')" "pc.sum(a)"
   2000 loops, best of 5: 120 usec per loop
   $ ARROW_USER_SIMD_LEVEL=avx2 python -m timeit -s "import pyarrow as pa, pyarrow.compute as pc; a = pa.array([42]*1_000_000, type='int32')" "pc.sum(a)"
   5000 loops, best of 5: 82.6 usec per loop
   ```
   
   (this CPU can't run AVX512, sorry)
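
To make this comparison easier to repeat (for instance on a machine that does support AVX512), the `timeit` runs above could be scripted. This is a minimal sketch, assuming only the standard library in the script itself plus an installed pyarrow for the measured statements. `parse_timeit`, `time_with_simd_level` and `compare` are hypothetical helper names, not pyarrow APIs. The script sets `ARROW_USER_SIMD_LEVEL` exactly as the commands above do and parses the result line that `python -m timeit` prints.

```python
import os
import re
import subprocess
import sys

# Setup statement copied from the benchmark commands in this comment.
SETUP = ("import pyarrow as pa, pyarrow.compute as pc; "
         "a = pa.array([42]*1_000_000, type='int32')")

# Seconds per unit, for the units that `python -m timeit` prints.
_UNITS = {"nsec": 1e-9, "usec": 1e-6, "msec": 1e-3, "sec": 1.0}
# Matches e.g. "500 loops, best of 5: 462 usec per loop" (also "1e+03 usec").
_RESULT = re.compile(
    r"best of \d+: ([\d.]+(?:e[+-]?\d+)?) (nsec|usec|msec|sec) per loop")


def parse_timeit(line: str) -> float:
    """Return the per-loop time in seconds from a `python -m timeit` result line."""
    m = _RESULT.search(line)
    if m is None:
        raise ValueError(f"unrecognized timeit output: {line!r}")
    return float(m.group(1)) * _UNITS[m.group(2)]


def time_with_simd_level(level: str, stmt: str) -> float:
    """Run one timeit measurement in a subprocess with ARROW_USER_SIMD_LEVEL=level."""
    env = dict(os.environ, ARROW_USER_SIMD_LEVEL=level)
    out = subprocess.run(
        [sys.executable, "-m", "timeit", "-s", SETUP, stmt],
        env=env, capture_output=True, text=True, check=True,
    ).stdout
    return parse_timeit(out)


def compare(stmt: str, levels=("none", "avx2")) -> dict:
    """Map each SIMD level to its time relative to the first level (>1 means slower)."""
    times = {lvl: time_with_simd_level(lvl, stmt) for lvl in levels}
    base = times[levels[0]]
    return {lvl: t / base for lvl, t in times.items()}
```

For example, `compare("pc.sum_checked(a)")` and `compare("pc.sum(a)")` would reproduce the two comparisons above as ratios. Plugging in the numbers reported here, 2.46 msec / 462 usec gives about 5.3 for the checked sum. Adding `"avx512"` to `levels` assumes that value is accepted by `ARROW_USER_SIMD_LEVEL` and that the CPU supports it.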
   
   If we can't exhibit any performance gain, then we should remove the AVX* specializations.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
