In a discussion on https://issues.apache.org/jira/browse/IMPALA-6128,
we are talking about which instruction sets (available on newer x86-64
processors) we want to require.

At this point, I'm not sure how strong the motivation is for requiring
certain instruction sets, but it may be worth some effort to talk
about guidelines. As of now, we can decide at run time which methods
to use based on CPU info gathered at daemon start time. See
cpu-info.cc.
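
For readers who have not looked at that code, here is a minimal sketch of
what run-time dispatch looks like in general. It is not the code in
cpu-info.cc. The names PopcountFn, PopcountPortable, PopcountHw,
ChoosePopcount and Popcount are hypothetical, and it assumes GCC or Clang,
whose __builtin_cpu_supports() builtin stands in for Impala's own CPU
probing.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names for illustration; Impala's real probing lives in
// cpu-info.cc and is structured differently.
using PopcountFn = int (*)(uint64_t);

// Portable fallback: clears the lowest set bit until none remain.
static int PopcountPortable(uint64_t x) {
  int n = 0;
  while (x != 0) {
    x &= x - 1;
    ++n;
  }
  return n;
}

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
// Only this function is compiled with POPCNT enabled, so the rest of the
// binary still runs on CPUs without it.
__attribute__((target("popcnt")))
static int PopcountHw(uint64_t x) {
  return __builtin_popcountll(x);
}
#endif

// Probe the CPU once and pick an implementation.
static PopcountFn ChoosePopcount() {
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
  __builtin_cpu_init();
  if (__builtin_cpu_supports("popcnt")) return PopcountHw;
#endif
  return PopcountPortable;
}

// Chosen at program start; every later call pays one indirect call.
static const PopcountFn g_popcount = ChoosePopcount();

int Popcount(uint64_t x) {
  assert(g_popcount != nullptr);
  return g_popcount(x);
}
```

Callers just use Popcount(value). The indirect call through g_popcount is
the dispatch cost being discussed in this thread.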

The instruction set in this case is CLMUL (carry-less multiplication,
the PCLMULQDQ instruction), which we believe was available on all new
server-class x86-64 chips by Intel and AMD as of Q2, 2011. It has good
performance benefits for spill-to-disk encryption.

We currently use the following, but only through run-time dispatch:

SSSE3(*), SSE4.1, SSE4.2 (available since late 2011 on both AMD and Intel)
POPCNT (available since late 2008 on both AMD and Intel)
AVX (available since late 2011)
AVX2 (available since late 2015)

One argument for continuing with our current requirements is that
dispatching still gets us a good speedup in some cases, and the branch
predictor should hide some of the latency of dispatching.

One argument for adding more requirements is that not only can
dispatching go away, but we can also pass flags to the compiler that
let it use newer instructions everywhere, which can speed up
auto-vectorized operations or standard library operations. For
instance, AVX has 256-bit registers that can speed up bulk memory
operations.
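
To make that concrete, here is a rough sketch of what a raised baseline
could look like. It assumes GCC or Clang, which define __AVX__ when given a
flag such as -mavx. The names kBuildRequiresAvx, BaselineSatisfied and
AddArrays are hypothetical and are not taken from Impala.

```cpp
#include <cstddef>

// True when the compiler was allowed to emit AVX everywhere (e.g. -mavx).
#if defined(__AVX__)
constexpr bool kBuildRequiresAvx = true;
#else
constexpr bool kBuildRequiresAvx = false;
#endif

// Hypothetical startup check: returns false if this binary was built to
// require AVX but the current CPU lacks it. A real check would need to run
// before any AVX instruction can execute.
bool BaselineSatisfied() {
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
  __builtin_cpu_init();
  if (kBuildRequiresAvx && !__builtin_cpu_supports("avx")) return false;
#endif
  return true;
}

// A plain loop with no dispatch. With AVX enabled at build time the
// compiler is free to auto-vectorize it using 256-bit registers; without
// the flag it is limited to the baseline instruction set.
void AddArrays(const float* a, const float* b, float* out, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    out[i] = a[i] + b[i];
  }
}
```

The trade-off is the one this thread is about: the loop needs no function
pointer or branch, but a machine that fails BaselineSatisfied() cannot run
the binary at all.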

A concern I have with setting a time-based rule is that it doesn't
seem easy to me to figure out when, say, AMD *stopped* selling
server-class chips without AVX. So, if we started requiring AVX, we
could have some Impala user with recent AMD chips become unable to run
the latest Impala, which would be a shame.

Thoughts about what we should require?

(*) We spit out an error if the machine does not have SSSE3
