cyb70289 commented on PR #49756:
URL: https://github.com/apache/arrow/pull/49756#issuecomment-4326326915

   **Just for reference**
   
   I did a quick investigation with an **AI coding agent**. It analyzed why the
   Neon code is not inlined and proposed a fix to xsimd:
   [neon-bitcast-inline.patch](https://github.com/user-attachments/files/27121829/neon-bitcast-inline.patch)
   
   The unit tests passed. With the patch, the Neon code is slightly faster than
   SVE128, which matches expectations. Note that I only tested one benchmark case.
   
   ```
   # neon
   BM_UnpackBool/NeonUnaligned/1/32       6.56 ns         6.56 ns    107048724 items_per_second=4.87937G/s
   
   # sve128
   BM_UnpackBool/Sve128Unaligned/1/32     7.06 ns         7.06 ns     99251620 items_per_second=4.53545G/s
   ```
   
   I suspected that the xsimd Neon bitcast code may be too complicated for the
   compiler to inline. This may be related to an old PR of mine that fixed an
   [issue](https://github.com/xtensor-stack/xsimd/issues/573), but I have
   forgotten the details.
   Debug report from the coding agent (I haven't read it carefully):
   [findings.md](https://github.com/user-attachments/files/27122154/findings.md)
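
For readers unfamiliar with the general idea, here is a minimal, portable sketch of a bitcast helper that asks the compiler to inline it unconditionally. This is only an illustration of the technique and is not the content of the attached patch or of xsimd's actual code; the names `SKETCH_FORCE_INLINE` and `bitcast_sketch` are hypothetical, and it uses `std::memcpy` on scalar types rather than Neon intrinsics so that it builds on any platform.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical macro: request unconditional inlining where the compiler
// supports it, and fall back to a plain `inline` hint otherwise.
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_FORCE_INLINE inline __attribute__((always_inline))
#elif defined(_MSC_VER)
#define SKETCH_FORCE_INLINE __forceinline
#else
#define SKETCH_FORCE_INLINE inline
#endif

// Hypothetical helper: reinterpret the bits of `from` as type `To`.
// Keeping the body tiny and marking it force-inline is meant to avoid
// a real function call surviving in a hot loop.
template <class To, class From>
SKETCH_FORCE_INLINE To bitcast_sketch(const From& from) noexcept {
    static_assert(sizeof(To) == sizeof(From), "sizes must match");
    static_assert(std::is_trivially_copyable<To>::value, "To must be trivially copyable");
    static_assert(std::is_trivially_copyable<From>::value, "From must be trivially copyable");
    To to;
    std::memcpy(&to, &from, sizeof(To));  // compilers typically lower this to a register move
    return to;
}
```

Whether a given compiler actually inlines a function like this can be checked by inspecting the generated assembly for a remaining call instruction.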
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
