Thanks Wes, I'm glad to see this feature coming. From past discussions, the main concern is that a runtime dispatcher may cause performance issues. Personally, I don't think it's a big problem. If we're using SIMD, it must be targeting time-consuming code, so the dispatch cost should be small relative to the work being done.
But we do need to take care of some issues. E.g., I see code like this:

    for (int i = 0; i < n; ++i) {
      simd_code();
    }

With a runtime dispatcher, this becomes an indirect function call in each iteration. We should change the code to move the loop inside simd_code().

It would be better if you could consider architectures other than x86 (at the framework level). Ignore this if it costs too much effort; we can always improve later.

Yibo

On 5/13/20 9:46 AM, Wes McKinney wrote:
hi,

We've started to receive a number of patches providing SIMD operations for both x86 and ARM architectures. Most of these patches make use of compiler definitions to toggle between code paths at compile time. This is problematic for a few reasons:

* Binaries that are shipped (e.g. in Python) must generally be compiled for a broad set of supported compilers. That means that AVX2 / AVX512 optimizations won't be available in these builds for processors that have them
* Poses a maintainability and testing problem (hard to test every combination, and it is not practical for local development to compile every combination, which may cause drawn out test/CI/fix cycles)

Other projects (e.g. NumPy) have taken the approach of building binaries that contain multiple variants of a function with different levels of SIMD, and then choosing at runtime which one to execute based on what features the CPU supports. This seems like what we ultimately need to do in Apache Arrow, and if we continue to accept patches that do not do this, it will be much more work later when we have to refactor things to runtime dispatching.

We have some PRs in the queue related to SIMD. Without taking a heavy handed approach like starting to veto PRs, how would everyone like to begin to address the runtime dispatching problem?

Note that the Kernels revamp project I am working on right now will also facilitate runtime SIMD kernel dispatching for array expression evaluation.

Thanks,
Wes
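
To make the two points in this thread concrete (several variants of a function in one binary, selected at runtime, with the loop inside the dispatched function), here is a minimal sketch. It is not Arrow code: the names SumFn, SumScalar, SumAvx2, ResolveSum and Sum are hypothetical, and it assumes GCC or Clang on x86-64, where `__builtin_cpu_supports` and the `target` function attribute are available. Other compilers and architectures fall back to the scalar variant. The AVX2 variant is a plain loop that the compiler is allowed to vectorize, standing in for hand-written intrinsics.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical kernel signature. The loop lives inside the kernel, so the
// indirect call is paid once per array rather than once per element.
using SumFn = int64_t (*)(const int32_t* data, std::size_t n);

// Portable fallback variant, always compiled.
int64_t SumScalar(const int32_t* data, std::size_t n) {
  int64_t total = 0;
  for (std::size_t i = 0; i < n; ++i) total += data[i];
  return total;
}

#if (defined(__GNUC__) || defined(__clang__)) && defined(__x86_64__)
#define SKETCH_HAVE_AVX2_VARIANT 1
// Same logic, but this one function is compiled with AVX2 enabled, so the
// rest of the binary can still target a baseline CPU.
__attribute__((target("avx2")))
int64_t SumAvx2(const int32_t* data, std::size_t n) {
  int64_t total = 0;
  for (std::size_t i = 0; i < n; ++i) total += data[i];
  return total;
}
#endif

// Choose a variant based on what the running CPU supports.
SumFn ResolveSum() {
#ifdef SKETCH_HAVE_AVX2_VARIANT
  __builtin_cpu_init();
  if (__builtin_cpu_supports("avx2")) return SumAvx2;
#endif
  return SumScalar;
}

// Public entry point: resolves the variant once, then makes one indirect
// call per invocation.
int64_t Sum(const int32_t* data, std::size_t n) {
  static const SumFn fn = ResolveSum();
  return fn(data, n);
}
```

A caller just writes `Sum(values, length)`. Support for another architecture could be added as one more branch in ResolveSum without touching callers, which is roughly what handling this at the framework level would mean in this sketch.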