Since there hasn't been much discussion, and we aren't ready to port our existing use of xsimd to the new API, I suggest we return to this topic when there is a push to develop more SIMD-enabled variants of functions in the C++ library. I wanted to raise it while it was on my mind to get people thinking about it. It seems that a lot of the recent compute work has been about enabling essential feature coverage.
On Sun, Jul 18, 2021 at 11:41 PM Yuqi Gu <guy...@apache.org> wrote:
>
> > So rather than using xsimd::batch<uint32_t, 16> for an AVX512 batch,
> > you would do xsimd::batch<uint32_t, xsimd::arch::avx512> (or e.g.
> > neon/neon64 for ARM ISAs) and then access the batch size through the
> > batch::size static property.
>
> Glad to see xsimd use 'Arch' as the parameter of a 'batch'.
> For ARROW-11502 <https://github.com/apache/arrow/pull/9424>, I've
> submitted several PRs to xsimd to hide arch-dependent code in Arrow to
> avoid a large maintenance burden.
> But we found that it's hard to design an arch-independent API for a
> specific feature that covers all the different ISAs.
> Some features exist in x86 but not in Arm64, and vice versa; unifying
> these differences would add to the maintenance burden.
>
> I agree with Yibo that we should use the new xsimd approach for
> dynamic runtime dispatch for each different CPU support level.
>
> BRs,
> Yuqi
>
> Yibo Cai <yibo....@arm.com> wrote on Mon, Jul 19, 2021 at 10:55 AM:
>
> > On 7/17/21 12:08 AM, Wes McKinney wrote:
> > > hi folks,
> > >
> > > I had a conversation with the developers of xsimd last week in Paris
> > > and was made aware that they are working on a substantial refactor of
> > > xsimd to improve its usability for cross-compilation and
> > > dynamic dispatch based on runtime processor capabilities. The branch
> > > with the refactor is located here:
> > >
> > > https://github.com/xtensor-stack/xsimd/tree/feature/xsimd-refactoring
> > >
> > > In particular, the SIMD batch API is changing from
> > >
> > > template <class T, size_t N>
> > > class batch;
> > >
> > > to
> > >
> > > template <class T, class arch>
> > > class batch;
> > >
> > > So rather than using xsimd::batch<uint32_t, 16> for an AVX512 batch,
> > > you would do xsimd::batch<uint32_t, xsimd::arch::avx512> (or e.g.
> > > neon/neon64 for ARM ISAs) and then access the batch size through the
> > > batch::size static property.
> >
> > Adding this 'arch' parameter is a bit strange at first glance, given
> > that the purpose of a SIMD wrapper is to hide arch-dependent code.
> > But since the latest SIMD ISAs (SVE, AVX512) offer much richer features
> > than simply widening the data width, it looks like arch-specific code
> > is a must.
> > I think this change won't cause trouble for existing xsimd client code.
> >
> > > A few comments for discussion / investigation:
> > >
> > > * Firstly, we will have to prepare ourselves to migrate to this new
> > > API in the future
> > >
> > > * At some point, we will likely want to generate SIMD variants of our
> > > C++ math kernels usable via dynamic dispatch for each different CPU
> > > support level. It would be beneficial to author as much code as
> > > possible in an ISA-independent fashion that can be cross-compiled to
> > > generate binary code for each ISA. We should investigate whether the
> > > new approach in xsimd will provide what we need or if we need to take
> > > a different approach.
> > >
> > > * We have some of our own dynamic-dispatch code to enable runtime
> > > function pointer selection based on available SIMD levels. Can we
> > > benefit from any of the work that is happening in this xsimd refactor?
> >
> > I think they have some overlap. Runtime dispatch at the xsimd level
> > (SIMD code block) looks better than at the kernel dispatch level, IIUC.
> >
> > > * We have some compute code (e.g. hash tables for aggregation /
> > > joins) that uses explicit AVX2 intrinsics — can some of this code be
> > > ported to use generic xsimd APIs, or will we need to use a different
> > > fundamental algorithm design to yield maximum efficiency for each
> > > SIMD ISA?
> > >
> > > Thanks,
> > > Wes