Since there hasn't been much discussion, and we aren't ready to port our existing use of xsimd to the new API, I suggest we return to this topic when there is a push to develop more SIMD-enabled variants of functions in the C++ library. I wanted to raise it while it was on my mind to get people thinking about it. It seems that a lot of the recent compute work has been about enabling essential feature coverage.
On Sun, Jul 18, 2021 at 11:41 PM Yuqi Gu <guy...@apache.org> wrote:
>
> > So rather than using xsimd::batch<uint32_t, 16> for an AVX512 batch,
> > you would do xsimd::batch<uint32_t, xsimd::arch::avx512> (or e.g.
> > neon/neon64 for ARM ISAs) and then access the batch size through the
> > batch::size static property.
>
> Glad to see xsimd use 'Arch' as the parameter of a 'batch'.
> For ARROW-11502 <https://github.com/apache/arrow/pull/9424>, I've
> submitted several PRs to xsimd to hide arch-dependent code in Arrow to
> avoid a large maintenance burden.
> But we found that it's hard to design an arch-independent API for a
> specific feature that covers all the different ISAs.
> Some features exist in x86 but not in Arm64, and vice versa; unifying
> these differences would add to the maintenance burden.
>
> I agree with Yibo that we should use the new xsimd approach for
> dynamic runtime dispatch for each different CPU support level.
>
> BRs,
> Yuqi
>
> Yibo Cai <yibo....@arm.com> wrote on Mon, Jul 19, 2021 at 10:55 AM:
>
> > On 7/17/21 12:08 AM, Wes McKinney wrote:
> > > hi folks,
> > >
> > > I had a conversation with the developers of xsimd last week in Paris
> > > and was made aware that they are working on a substantial refactor of
> > > xsimd to improve its usability for cross-compilation and
> > > dynamic dispatch based on runtime processor capabilities. The branch
> > > with the refactor is located here:
> > >
> > > https://github.com/xtensor-stack/xsimd/tree/feature/xsimd-refactoring
> > >
> > > In particular, the SIMD batch API is changing from
> > >
> > > template <class T, size_t N>
> > > class batch;
> > >
> > > to
> > >
> > > template <class T, class arch>
> > > class batch;
> > >
> > > So rather than using xsimd::batch<uint32_t, 16> for an AVX512 batch,
> > > you would do xsimd::batch<uint32_t, xsimd::arch::avx512> (or e.g.
> > > neon/neon64 for ARM ISAs) and then access the batch size through the
> > > batch::size static property.
> >
> > Adding this 'arch' parameter is a bit strange at first glance, given
> > that the purpose of a SIMD wrapper is to hide arch-dependent code.
> > But since the latest SIMD ISAs (SVE, AVX512) offer much richer features
> > than simply widening the data width, it looks like arch-specific code
> > is a must.
> > I think this change won't cause trouble for existing xsimd client code.
> >
> > > A few comments for discussion / investigation:
> > >
> > > * Firstly, we will have to prepare ourselves to migrate to this new
> > > API in the future
> > >
> > > * At some point, we will likely want to generate SIMD variants of our
> > > C++ math kernels usable via dynamic dispatch for each different CPU
> > > support level. It would be beneficial to author as much code as
> > > possible in an ISA-independent fashion that can be cross-compiled to
> > > generate binary code for each ISA. We should investigate whether the
> > > new approach in xsimd will provide what we need or if we need to take
> > > a different approach.
> > >
> > > * We have some of our own dynamic-dispatch code to enable runtime
> > > function pointer selection based on available SIMD levels. Can we
> > > benefit from any of the work that is happening in this xsimd refactor?
> >
> > I think they have some overlap. Runtime dispatch at the xsimd level
> > (SIMD code block) looks better than at the kernel dispatch level, IIUC.
> >
> > > * We have some compute code (e.g. hash tables for aggregation /
> > > joins) that uses explicit AVX2 intrinsics — can some of this code be
> > > ported to use generic xsimd APIs, or will we need to use a different
> > > fundamental algorithm design to yield maximum efficiency for each
> > > SIMD ISA?
> > >
> > > Thanks,
> > > Wes