https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97459
--- Comment #8 from Jakub Jelinek <jakub at gcc dot gnu.org> --- And perhaps for other (but constant and not power of two) modulos use that unsigned long long a; a = (n >> 96) * (unsigned long long) (((__uint128_t 1) << 96) % c); a += ((n >> 64) & 0xffffffffULL) * (unsigned long long) (((__uint128_t 1) << 64) % c); a += ((n >> 32) & 0xffffffffULL) * (unsigned long long) (((__uint128_t 1) << 32) % c); a += (n & 0xffffffffULL); return a % c; form? Though, guess it might get quite large because the multiplications by constants can expand again into multiple instructions. And it needs to be careful that the 4 addends don't overflow the 32 bits.