>
> Since I develop on an AVX512-capable machine, if we have runtime
> dispatching then it should be able to test all variants of a function
> from a single executable / test run rather than having to produce
> multiple builds and test them separately, right?

Yes, but I think the same is true without runtime dispatching.  We might
have different mental models for runtime dispatching, so I'll put up a
concrete example.  If we want optimized code for "some_function", it would
look like:

#ifdef HAVE_AVX512
void some_function_512() {
...
}
#endif

void some_function_base() {
...
}

// static dispatching
void some_function() {
#ifdef HAVE_AVX512
some_function_512();
#else
some_function_base();
#endif
}

// dynamic dispatch
void some_function() {
   // Resolved once, on the first call.
   static void (*chosen_function)() = Choose(cpu_info, &some_function_512,
                                             &some_function_base);
   chosen_function();
}
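
For reference, a compilable sketch of the dynamic variant might look like the
following. Everything named here (CpuInfo, DetectCpu, Choose, SumBase, Sum512,
Sum) is a hypothetical placeholder rather than an existing Arrow API, and the
"512" variant is ordinary scalar code standing in for a real intrinsics
implementation. The function takes a whole batch, so the indirect call is paid
once per batch rather than once per element.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a CPU feature query; not an Arrow API.
struct CpuInfo {
  bool has_avx512;
};

using SumFn = int64_t (*)(const int32_t*, size_t);

// Baseline variant: plain scalar loop.
int64_t SumBase(const int32_t* values, size_t n) {
  int64_t total = 0;
  for (size_t i = 0; i < n; ++i) total += values[i];
  return total;
}

// Placeholder for an AVX512 variant. It is scalar code unrolled by four so
// that the sketch compiles anywhere; a real variant would use intrinsics.
int64_t Sum512(const int32_t* values, size_t n) {
  int64_t total = 0;
  size_t i = 0;
  for (; i + 4 <= n; i += 4) {
    total += static_cast<int64_t>(values[i]) + values[i + 1] + values[i + 2] +
             values[i + 3];
  }
  for (; i < n; ++i) total += values[i];  // remaining tail elements
  return total;
}

// Picks a variant from the (hypothetical) CPU description.
SumFn Choose(const CpuInfo& info, SumFn fn_512, SumFn fn_base) {
  return info.has_avx512 ? fn_512 : fn_base;
}

// Assumption: a real implementation would query CPUID or an OS facility.
// This stub always reports "no AVX512".
CpuInfo DetectCpu() { return CpuInfo{false}; }

// Public entry point: the variant is chosen once, on the first call.
int64_t Sum(const int32_t* values, size_t n) {
  assert(values != nullptr || n == 0);
  static const SumFn chosen = Choose(DetectCpu(), &Sum512, &SumBase);
  return chosen(values, n);
}
```

Since Choose takes the CpuInfo as an argument, a test can pass a hand-built
CpuInfo to get either variant without touching the cached pointer inside Sum.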

In both cases, we need tests that call into both some_function_512()
and some_function_base().  With runtime dispatching, we could
write test code something like:

for (CpuInfo info : all_supported_architectures) {
    TEST(Choose(info, &some_function_512, &some_function_base));
}

But I think there is likely something equivalent that we could do with
macro magic.
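
As a rough sketch of what the macro-based equivalent could look like: every
variant compiled into the binary is registered in a table, and the test walks
the table and compares each entry against the baseline. The names (Variant,
AllCompiledVariants, CountMismatches, SumBase, Sum512) are made up for
illustration, and the HAVE_AVX512 variant is again scalar placeholder code.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using SumFn = int64_t (*)(const int32_t*, size_t);

// Baseline variant, always compiled.
int64_t SumBase(const int32_t* values, size_t n) {
  int64_t total = 0;
  for (size_t i = 0; i < n; ++i) total += values[i];
  return total;
}

#ifdef HAVE_AVX512
// Placeholder body; a real variant would use AVX512 intrinsics.
int64_t Sum512(const int32_t* values, size_t n) {
  int64_t total = 0;
  for (size_t i = n; i > 0; --i) total += values[i - 1];
  return total;
}
#endif

// Hypothetical registry entry: a label for test output plus the function.
struct Variant {
  const char* name;
  SumFn fn;
};

// Lists every variant compiled into this binary.
std::vector<Variant> AllCompiledVariants() {
  std::vector<Variant> variants = {{"base", &SumBase}};
#ifdef HAVE_AVX512
  variants.push_back({"avx512", &Sum512});
#endif
  return variants;
}

// Returns how many variants disagree with the baseline on this input.
size_t CountMismatches(const std::vector<int32_t>& input) {
  const int64_t expected = SumBase(input.data(), input.size());
  size_t mismatches = 0;
  for (const Variant& variant : AllCompiledVariants()) {
    if (variant.fn(input.data(), input.size()) != expected) ++mismatches;
  }
  return mismatches;
}
```

A real test would also have to skip variants that the machine running the
tests cannot execute, which this sketch does not attempt.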

Did you have something different in mind?

Micah





On Tue, May 12, 2020 at 8:31 PM Wes McKinney <wesmck...@gmail.com> wrote:

> On Tue, May 12, 2020 at 9:47 PM Yibo Cai <yibo....@arm.com> wrote:
> >
> > Thanks Wes, I'm glad to see this feature coming.
> >
> > From past discussions, the main concern is that a runtime dispatcher may
> > cause performance issues.
> > Personally, I don't think it's a big problem. If we're using SIMD, it
> > must be targeting some time-consuming code.
> >
> > But we do need to take care of some issues. E.g., I see code like this:
> > for (int i = 0; i < n; ++i) {
> >    simd_code();
> > }
> > With a runtime dispatcher, it becomes an indirect function call in each
> > iteration.
> > We should change the code to move the loop inside simd_code().
>
> To be clear, I'm referring to SIMD-optimized code that operates on
> batches of data. The overhead of choosing an implementation based on a
> global settings object should not be meaningful. If there is
> performance-sensitive code at inline call sites then I agree that it
> is an issue. I don't think that characterizes most of the anticipated
> work in Arrow, though, since functions generally will process a
> chunk/array of data at a time (see, e.g., Parquet encoding/decoding work
> recently).
>
> > It would be better if you could consider architectures other than x86 (at
> > the framework level).
> > Ignore this if it costs too much effort. We can always improve later.
> >
> > Yibo
> >
> > On 5/13/20 9:46 AM, Wes McKinney wrote:
> > > hi,
> > >
> > > We've started to receive a number of patches providing SIMD operations
> > > for both x86 and ARM architectures. Most of these patches make use of
> > > compiler definitions to toggle between code paths at compile time.
> > >
> > > This is problematic for a few reasons:
> > >
> > > * Binaries that are shipped (e.g. in Python) must generally be
> > > compiled for a broad set of supported compilers. That means that AVX2
> > > / AVX512 optimizations won't be available in these builds for
> > > processors that have them
> > > * Poses a maintainability and testing problem (hard to test every
> > > combination, and it is not practical for local development to compile
> > > every combination, which may cause drawn out test/CI/fix cycles)
> > >
> > > Other projects (e.g. NumPy) have taken the approach of building
> > > binaries that contain multiple variants of a function with different
> > > levels of SIMD, and then choosing at runtime which one to execute
> > > based on what features the CPU supports. This seems like what we
> > > ultimately need to do in Apache Arrow, and if we continue to accept
> > > patches that do not do this, it will be much more work later when we
> > > have to refactor things to runtime dispatching.
> > >
> > > We have some PRs in the queue related to SIMD. Without taking a heavy
> > > handed approach like starting to veto PRs, how would everyone like to
> > > begin to address the runtime dispatching problem?
> > >
> > > Note that the Kernels revamp project I am working on right now will
> > > also facilitate runtime SIMD kernel dispatching for array expression
> > > evaluation.
> > >
> > > Thanks,
> > > Wes
> > >
>
