https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67553
--- Comment #2 from tmb99 at gmx dot net ---
seems to be the same for most saturating instructions:

__m128i v0 = _mm_setzero_si128();
__m128i v2 = _mm_setzero_si128();
__m128i sum = _mm_adds_epi16(v0,v2);
__m128i dif = _mm_subs_epi8(v0,v2);
__m128i hsum = _mm_hadds_epi16(v0,v2);
__m128i hdif = _mm_hsubs_epi16(v0,v2);
__m128i pacu = _mm_packus_epi16(v0,v2);
__m128i pacs = _mm_packs_epi32(v0,v2);

compiles to:

vpxor %xmm0, %xmm0, %xmm0
vpxor %xmm2, %xmm2, %xmm2
vphsubsw %xmm0, %xmm0, %xmm4
vpackuswb %xmm0, %xmm0, %xmm3
vphaddsw %xmm0, %xmm0, %xmm5
vpsubsb %xmm2, %xmm2, %xmm2
vpxor %xmm1, %xmm1, %xmm1
vpaddsw %xmm0, %xmm0, %xmm0
vpackssdw %xmm1, %xmm1, %xmm1

also: 3 setzero/vpxor instructions instead of just one.
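For reference, a minimal scalar sketch of why each of these results is a compile-time constant. It models one lane of each saturating operation in portable C. The helper names (sat_add_i16, sat_sub_i8, sat_pack_u8, sat_pack_i16, check_zero_inputs) are hypothetical and only illustrate the arithmetic; this is not how GCC represents or folds the builtins. The horizontal variants (_mm_hadds_epi16, _mm_hsubs_epi16) apply the same 16-bit saturating add/subtract to adjacent lanes, so the add model below covers them for all-zero inputs as well.

```c
#include <assert.h>
#include <stdint.h>

/* One 16-bit lane of _mm_adds_epi16: widen, add, clamp to the int16_t range. */
static int16_t sat_add_i16(int16_t a, int16_t b)
{
    int32_t wide = (int32_t)a + (int32_t)b;
    if (wide > INT16_MAX) return INT16_MAX;
    if (wide < INT16_MIN) return INT16_MIN;
    return (int16_t)wide;
}

/* One 8-bit lane of _mm_subs_epi8: widen, subtract, clamp to the int8_t range. */
static int8_t sat_sub_i8(int8_t a, int8_t b)
{
    int32_t wide = (int32_t)a - (int32_t)b;
    if (wide > INT8_MAX) return INT8_MAX;
    if (wide < INT8_MIN) return INT8_MIN;
    return (int8_t)wide;
}

/* One lane of _mm_packus_epi16: signed 16-bit to unsigned 8-bit, saturating. */
static uint8_t sat_pack_u8(int16_t a)
{
    if (a < 0) return 0;
    if (a > UINT8_MAX) return UINT8_MAX;
    return (uint8_t)a;
}

/* One lane of _mm_packs_epi32: signed 32-bit to signed 16-bit, saturating. */
static int16_t sat_pack_i16(int32_t a)
{
    if (a > INT16_MAX) return INT16_MAX;
    if (a < INT16_MIN) return INT16_MIN;
    return (int16_t)a;
}

/* With all-zero inputs, every modelled lane is zero again, so each of the
   vectors in the test case could be a single zeroed register. */
static void check_zero_inputs(void)
{
    assert(sat_add_i16(0, 0) == 0);
    assert(sat_sub_i8(0, 0) == 0);
    assert(sat_pack_u8(0) == 0);
    assert(sat_pack_i16(0) == 0);
}
```

Calling check_zero_inputs() from a test program passes silently, which matches the expectation that sum, dif, hsum, hdif, pacu and pacs are all equal to the zero vector.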