In a discussion on https://issues.apache.org/jira/browse/IMPALA-6128, we are talking about which instruction sets (available on newer x86-64 processors) we want to require.
At this point, I'm not sure how strong the motivation is for requiring certain instruction sets, but it may be worth some effort to agree on guidelines. As of now, we can decide at run time which methods to use, based on CPU info gathered at daemon start time. See cpu-info.cc.

The instruction in this case is CLMUL, which we believe has been available on all new server-class x86-64 chips from Intel and AMD since Q2 2011. It has good performance benefits for spill-to-disk encryption.

We currently use the following, but only through run-time dispatching:

  SSSE3(*), SSE4.1, SSE4.2 (available since late 2011 on both AMD and Intel)
  POPCNT (available since late 2008 on both AMD and Intel)
  AVX (late 2011)
  AVX2 (late 2015)

One argument for keeping our current requirements is that dispatching still gets us a good speedup in some cases, and the branch predictor should hide some of the latency of dispatching.

One argument for adding more requirements is that not only can the dispatching go away, but we can also pass flags to the compilers that let them use the later instructions, which can speed up auto-vectorized operations or standard library operations. For instance, AVX has 256-bit registers that can speed up bulk memory operations.

A concern I have with setting a time-based rule is that it doesn't seem easy to figure out when, say, AMD *stopped* selling server-class chips without AVX. So, if we started requiring AVX, some Impala user with fairly recent AMD chips could become unable to run the latest Impala, which would be a shame.

Thoughts about what we should require?

(*) We spit out an error if the machine does not have SSSE3.
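
For anyone not familiar with the run-time dispatching discussed above, here is a minimal sketch of the general pattern. It is not the code in cpu-info.cc. The function names (PopcountPortable, PopcountHw, CpuHasPopcnt, ChoosePopcount) are made up for illustration, and it assumes GCC or Clang on x86-64 for the hardware path, falling back to the portable path elsewhere.

```cpp
#include <cstdint>

// Portable fallback: runs on any CPU.
static int PopcountPortable(uint64_t x) {
  int n = 0;
  while (x != 0) {
    x &= x - 1;  // clear the lowest set bit
    ++n;
  }
  return n;
}

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
// POPCNT is enabled for this one function only, so the rest of the
// binary still runs on CPUs that lack it.
__attribute__((target("popcnt")))
static int PopcountHw(uint64_t x) {
  return __builtin_popcountll(x);
}

// Ask the CPU (via the compiler's CPUID helpers) whether POPCNT exists.
static bool CpuHasPopcnt() {
  __builtin_cpu_init();
  return __builtin_cpu_supports("popcnt") != 0;
}
#else
// Non-x86 or other compilers: always take the portable path.
static int PopcountHw(uint64_t x) { return PopcountPortable(x); }
static bool CpuHasPopcnt() { return false; }
#endif

using PopcountFn = int (*)(uint64_t);

// Meant to be called once (for example at daemon start); callers then
// go through the returned pointer, which is the indirect call whose
// cost the branch predictor is expected to mostly hide.
static PopcountFn ChoosePopcount() {
  return CpuHasPopcnt() ? PopcountHw : PopcountPortable;
}
```

If we instead required POPCNT outright, the check and the function pointer would go away, and the whole binary could be built with the corresponding compiler flag (for example -mpopcnt with GCC or Clang), at the cost of not starting at all on older chips.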