https://gcc.gnu.org/bugzilla/show_bug.cgi?id=125189
--- Comment #2 from vincenzo Innocente <vincenzo.innocente at cern dot ch> ---
godbolt emits
vcvttps2dq zmm0, zmm2
vpsrad zmm1, zmm0, 31
vpsrld zmm1, zmm1, 28
vpaddd zmm0, zmm0, zmm1
vpandd zmm0, zmm0, zmm3
vpsubd zmm0, zmm0, zmm1
vmovups zmm1, ZMMWORD PTR "y"[rip]
for
"for (int k=0; k<16; ++k) j[k] = int(a[i+k])%16;"
so I did not bother to write it myself...
(and it is not a realistic use case: usually one builds the index with the most
significant bits)
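
For reference, the vpsrad/vpsrld/vpaddd/vpandd/vpsubd part of that listing looks
like the usual branchless signed remainder by a power of two. Below is a scalar
sketch of what each lane would compute. It assumes zmm3 holds 15 in every lane
(the listing does not show how zmm3 is loaded), and mod16_branchless is just a
name made up for illustration:

```cpp
#include <cstdint>

// Scalar sketch of one lane of the vector sequence above.
// Assumption: the vpandd mask (zmm3) is 15 in every lane.
inline std::int32_t mod16_branchless(std::int32_t x) {
    std::uint32_t ux   = static_cast<std::uint32_t>(x);
    std::uint32_t sign = x < 0 ? 0xFFFFFFFFu : 0u;  // vpsrad 31: all ones if negative
    std::uint32_t bias = sign >> 28;                // vpsrld 28: 15 if negative, else 0
    std::uint32_t r    = ((ux + bias) & 15u) - bias; // vpaddd, vpandd, vpsubd
    // Unsigned wrap-around makes this match C++ truncating '%' for negative x.
    return static_cast<std::int32_t>(r);
}
```

The bias is what makes the result agree with the C++ rule that the remainder
takes the sign of the dividend; a plain "x & 15" would only match for
non-negative values.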