https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97459
--- Comment #16 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
(In reply to Jakub Jelinek from comment #15)
> I plan to work on this early in stage3.
> And we really shouldn't use any tables, GCC should figure it all out.
> So, for double-word modulo by constant that would be expanded using a
> libcall, go for x from the word bitsize to double-word bitsize and check if
> (1max << x) % cst
> is 1

It's probably better to search from high to low, to reduce the number of
necessary shifts for division by constants like 9 or 13.

> (and prefer what we've agreed on for 3), and fall back to
> multiplications (see #c8) if there aren't any other options and the costs
> don't say it is too costly.

I think that for variants where the constants aren't powers of two,

#include <stdint.h>

typedef __uint128_t mytype;   /* assumed; mytype is not defined in this comment */

#define ONE ((__uint128_t) 1)
#define TWO_64 (ONE << 64)
#define MASK60 ((UINT64_C (1) << 60) - 1)

/* 2^60 == 1 (mod 13), so n == a + b + c (mod 13) for its three 60-bit chunks. */
void div_rem_13 (mytype n, mytype *div, unsigned int *rem)
{
  const mytype magic = TWO_64 * 14189803133622732012u + 5675921253449092805u * ONE;
  /* 0xC4EC4EC4EC4EC4EC4EC4EC4EC4EC4EC5, the inverse of 13 modulo 2^128 */
  uint64_t a, b, c;
  unsigned int r;

  a = n & MASK60;
  b = (n >> 60);
  b = b & MASK60;
  c = (n >> 120);
  r = (a+b+c) % 13;
  n = n - r;          /* now an exact multiple of 13 */
  *div = n * magic;
  *rem = r;
}

should be pretty efficient; there is only one shift that spans two words.
(The assembly generated from the function looks weird because of quite a
few move instructions, but that should not be an issue for code generated
inline.)

Regarding the approach in comment #8, I think I'll run some benchmarks to
see how well it works for other constants that don't fit the pattern of
being divisors of 2^n-1.
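To check the arithmetic of `div_rem_13`, here is a minimal C sketch, assuming GCC or Clang's `unsigned __int128`, that generalises it to other 32-bit divisors. It is not what GCC implements: the names `chunk_bits`, `inverse_mod_2_128`, `div_rem_const` and `check_div_rem_const` are hypothetical, chunk widths are capped at 60 bits so the chunk sum stays in one 64-bit word, the chunks are summed with a loop instead of unrolled shifts, and the magic constant is derived by Newton iteration rather than hard-coded. `chunk_bits` searches x from high to low for the largest x with 2^x mod d == 1, in the spirit of the search discussed above, though over x <= 60 rather than the word-to-double-word range from comment #15.

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned __int128 u128;

/* Largest x in [1, max_bits] with 2^x % d == 1, searching from high to
   low; 0 if there is none (d even, d == 1, or the multiplicative order
   of 2 mod d exceeds max_bits).  */
static int chunk_bits (uint32_t d, int max_bits)
{
  for (int x = max_bits; x >= 1; x--)
    {
      uint64_t p = 1 % d;            /* 2^0 mod d */
      for (int i = 0; i < x; i++)
        p = (p * 2) % d;             /* p < 2^32, so no overflow */
      if (p == 1)
        return x;
    }
  return 0;
}

/* Inverse of odd d modulo 2^128 by Newton iteration: d * d == 1 (mod 8)
   for odd d, and each step doubles the correct low bits (3 -> 192).  */
static u128 inverse_mod_2_128 (uint32_t d)
{
  u128 inv = d;
  for (int i = 0; i < 6; i++)
    inv *= 2 - (u128) d * inv;
  return inv;
}

/* Hypothetical generalisation of div_rem_13: sum the x-bit chunks of n
   (valid because 2^x == 1 mod d), reduce the small sum mod d, then divide
   exactly by multiplying with d^-1 mod 2^128.  x <= 60 keeps the sum
   within 64 bits.  */
static void div_rem_const (u128 n, uint32_t d, int x, u128 inv,
                           u128 *div, uint32_t *rem)
{
  assert (x >= 1 && x <= 60);
  const uint64_t mask = (UINT64_C (1) << x) - 1;
  uint64_t sum = 0;
  for (u128 t = n; t != 0; t >>= x)
    sum += (uint64_t) t & mask;
  uint32_t r = (uint32_t) (sum % d);
  *div = (n - r) * inv;              /* n - r is an exact multiple of d */
  *rem = r;
}

/* Compare against the compiler's own 128-bit / and % for a few divisors
   whose order of 2 is at most 60; returns the number of mismatches.  */
static int check_div_rem_const (void)
{
  static const uint32_t divisors[] = { 3, 5, 7, 9, 13, 17, 19, 23 };
  int bad = 0;
  for (unsigned k = 0; k < sizeof divisors / sizeof divisors[0]; k++)
    {
      uint32_t d = divisors[k];
      int x = chunk_bits (d, 60);
      u128 inv = inverse_mod_2_128 (d);
      u128 n = ~(u128) 0;            /* start at 2^128 - 1 */
      for (int i = 0; i < 1000; i++)
        {
          u128 q;
          uint32_t r;
          div_rem_const (n, d, x, inv, &q, &r);
          if (q != n / d || r != n % d)
            bad++;
          n = n * 6364136223846793005u + 1442695040888963407u;  /* LCG step */
        }
    }
  return bad;
}
```

`check_div_rem_const ()` should return 0. For d = 13, `chunk_bits` picks x = 60 as in `div_rem_13`, and `inverse_mod_2_128 (13)` reproduces the magic constant 0xC4EC...EC5; for d = 17 the largest usable chunk width is 56, since the order of 2 mod 17 is 8.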