[Bug rtl-optimization/97459] __uint128_t remainder for division by 3

tkoenig at gcc dot gnu.org via Gcc-bugs Sun, 08 Nov 2020 11:28:19 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97459


--- Comment #16 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
(In reply to Jakub Jelinek from comment #15)
> I plan to work on this early in stage3.
> And we really shouldn't use any tables, GCC should figure it all out.
> So, for double-word modulo by constant that would be expanded using a
> libcall, go for x from the word bitsize to double-word bitsize and check if
> (1max << x) % cst
> is 1

It's probably better to search from high to low, to reduce the number
of necessary shifts for division by constants like 9 or 13.
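
As a rough illustration of the high-to-low search, here is a minimal sketch.
The helper names pow2_mod and find_shift_high_to_low are hypothetical and
are not taken from GCC. It assumes a 64-bit word, so the candidate range
is [64, 128).

```c
#include <assert.h>

/* Hypothetical helper: compute 2^x mod cst by repeated doubling.  */
static unsigned int
pow2_mod (unsigned int x, unsigned int cst)
{
  assert (cst != 0);
  unsigned long long r = 1 % cst;
  while (x--)
    r = (r * 2) % cst;
  return (unsigned int) r;
}

/* Return the largest x with lo <= x < hi such that 2^x % cst == 1,
   or 0 if no such x exists (for example, when cst is even).  */
static unsigned int
find_shift_high_to_low (unsigned int cst, unsigned int lo, unsigned int hi)
{
  for (unsigned int x = hi; x-- > lo; )
    if (pow2_mod (x, cst) == 1)
      return x;
  return 0;
}
```

Called as find_shift_high_to_low (cst, 64, 128), this gives 126 for 9 and
120 for 13. Whether the largest such shift is also the cheapest one to
expand is a separate cost question that the sketch does not address.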

> (and prefer what we've agreed on for 3), and fall back to
> multiplications (see #c8) if there aren't any other options and the costs
> don't say it is too costly.

I think for variants where the constants aren't powers of two,

/* mytype is not defined in this message; presumably:  */
typedef __uint128_t mytype;

#define ONE ((__uint128_t) 1)
#define TWO_64 (ONE << 64)
#define MASK60 ((1ul << 60) - 1)

void
div_rem_13 (mytype n, mytype *div, unsigned int *rem)
{
  /* 0xC4EC4EC4EC4EC4EC4EC4EC4EC4EC4EC5, the inverse of 13 modulo 2^128.  */
  const mytype magic = TWO_64 * 14189803133622732012u
                       + 5675921253449092805u * ONE;
  __uint64_t a, b, c;
  unsigned int r;

  a = n & MASK60;
  b = (n >> 60);
  b = b & MASK60;
  c = (n >> 120);
  r = (a+b+c) % 13;
  n = n - r;
  *div = n * magic;
  *rem = r;
}

should be pretty efficient; there is only one shift which spans two
words.  (The assembly generated from the function looks weird
because of quite a few move instructions, but that should not be
an issue for code generated inline).
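
One way to sanity-check the function is to compare its results with the
native / and % operators. The following is only a sketch: it assumes
mytype is __uint128_t, a GCC-compatible compiler and 64-bit unsigned
long, and the helpers matches_native and self_test are hypothetical
names added for illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef __uint128_t mytype;   /* assumption: not defined in the message */

#define ONE ((__uint128_t) 1)
#define TWO_64 (ONE << 64)
#define MASK60 ((UINT64_C (1) << 60) - 1)

static void
div_rem_13 (mytype n, mytype *div, unsigned int *rem)
{
  /* Inverse of 13 modulo 2^128.  */
  const mytype magic = TWO_64 * UINT64_C (14189803133622732012)
                       + UINT64_C (5675921253449092805);
  uint64_t a, b, c;
  unsigned int r;

  /* 2^60 % 13 == 1, so the 60-bit chunks can simply be added.  */
  a = n & MASK60;
  b = (n >> 60) & MASK60;
  c = n >> 120;
  r = (a + b + c) % 13;
  /* n - r is an exact multiple of 13, so multiplying by the inverse
     yields the quotient.  */
  *div = (n - r) * magic;
  *rem = r;
}

/* Hypothetical checker: 1 if div_rem_13 agrees with / and %.  */
static int
matches_native (mytype n)
{
  mytype q;
  unsigned int r;

  div_rem_13 (n, &q, &r);
  return q == n / 13 && r == (unsigned int) (n % 13);
}

/* Hypothetical self-test over a few boundary values.  */
static void
self_test (void)
{
  assert (matches_native (0));
  assert (matches_native (12));
  assert (matches_native (13));
  assert (matches_native (~(mytype) 0));
}
```

The sum a + b + c stays below 2^61 + 2^8, so it fits in 64 bits and the
final % 13 is a single-word operation.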

Regarding the approach in comment #8, I think I'll run some benchmarks
to see how well that works for other constants which don't fit
the pattern of being divisors of 2^n-1.
