Thanks Wes, I'm glad to see this feature coming.

From past discussions, the main concern is that a runtime dispatcher may
cause performance issues.
Personally, I don't think it's a big problem. If we're using SIMD, we must be
targeting time-consuming code, where the dispatch cost is small by comparison.

But we do need to take care of some issues. E.g., I see code like this:
for (int i = 0; i < n; ++i) {
  simd_code();
}
With a runtime dispatcher, simd_code() becomes an indirect function call in
each iteration.
We should move the loop inside simd_code() so that dispatch happens once per
call rather than once per element.

It would be better if you could also consider architectures other than x86
(at the framework level).
Ignore this if it takes too much effort. We can always improve it later.

Yibo

On 5/13/20 9:46 AM, Wes McKinney wrote:
hi,

We've started to receive a number of patches providing SIMD operations
for both x86 and ARM architectures. Most of these patches make use of
compiler definitions to toggle between code paths at compile time.

This is problematic for a few reasons:

* Binaries that are shipped (e.g. in Python) must generally be
compiled for a broad set of supported compilers. That means that AVX2
/ AVX512 optimizations won't be available in these builds for
processors that have them
* Poses a maintainability and testing problem (hard to test every
combination, and it is not practical for local development to compile
every combination, which may cause drawn out test/CI/fix cycles)

Other projects (e.g. NumPy) have taken the approach of building
binaries that contain multiple variants of a function with different
levels of SIMD, and then choosing at runtime which one to execute
based on what features the CPU supports. This seems like what we
ultimately need to do in Apache Arrow, and if we continue to accept
patches that do not do this, it will be much more work later when we
have to refactor things to runtime dispatching.

We have some PRs in the queue related to SIMD. Without taking a heavy
handed approach like starting to veto PRs, how would everyone like to
begin to address the runtime dispatching problem?

Note that the Kernels revamp project I am working on right now will
also facilitate runtime SIMD kernel dispatching for array expression
evaluation.

Thanks,
Wes
