Ping.

https://gcc.gnu.org/ml/gcc-patches/2015-07/msg02036.html
Thanks,
Kyrill

On 24/07/15 11:55, Kyrill Tkachov wrote:
Hi all,

This patch implements an aarch64-specific expansion of the signed modulo by a
power of 2.
The proposed sequence makes use of the conditional negate instruction CSNEG.
For N a power of 2, x % N can be calculated with:
negs   x1, x0
and    x0, x0, #(N - 1)
and    x1, x1, #(N - 1)
csneg  x0, x0, x1, mi

So, for N == 256 this would be:
negs   x1, x0
and    x0, x0, #255
and    x1, x1, #255
csneg  x0, x0, x1, mi
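
As a rough illustration of why this works, here is a minimal C model of the
four instructions. The function name smod_pow2_csneg is hypothetical and not
part of the patch; the arithmetic is done on unsigned values so that the
wraparound of negs is well defined in C.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of the patch): models the CSNEG sequence.
   N must be a positive power of 2.  */
int64_t
smod_pow2_csneg (int64_t x, int64_t n)
{
  assert (n > 0 && (n & (n - 1)) == 0);

  uint64_t mask = (uint64_t) n - 1;
  uint64_t ux = (uint64_t) x;

  uint64_t neg = 0 - ux;          /* negs  x1, x0 (wraps like the hardware) */
  int mi = (int) (neg >> 63);     /* N flag: set when -x is negative */
  uint64_t lo = ux & mask;        /* and   x0, x0, #(N - 1) */
  uint64_t nlo = neg & mask;      /* and   x1, x1, #(N - 1) */

  /* csneg x0, x0, x1, mi: keep x0 if MI holds, otherwise take -x1.
     Both lo and nlo are below N, so the casts cannot overflow.  */
  return mi ? (int64_t) lo : -(int64_t) nlo;
}
```

For positive x the negation is negative, so the MI condition holds and the
result is x & (N - 1). Otherwise the result is -((-x) & (N - 1)), which matches
the truncating behaviour of the C % operator, e.g. -5 % 4 == -1.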

For comparison, the existing sequence emitted by expand_smod_pow2 in expmed.c
is:
asr     x1, x0, 63
lsr     x1, x1, 56
add     x0, x0, x1
and     x0, x0, 255
sub     x0, x0, x1
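
The same kind of sketch for the existing sequence, again only as an
illustration for N == 256 on 64-bit values. The name smod256_generic is
hypothetical; the instruction-to-statement mapping is my reading of the
assembly above.

```c
#include <stdint.h>

/* Hypothetical helper: models the existing five-instruction sequence
   for N == 256.  */
int64_t
smod256_generic (int64_t x)
{
  uint64_t ux = (uint64_t) x;

  uint64_t sign = (ux >> 63) ? UINT64_MAX : 0;  /* asr x1, x0, 63 */
  uint64_t bias = sign >> 56;                   /* lsr x1, x1, 56 */
  uint64_t t = ux + bias;                       /* add x0, x0, x1 */
  t &= 255;                                     /* and x0, x0, 255 */

  /* sub x0, x0, x1: t is in [0, 255] and bias is 0 or 255,
     so the signed subtraction cannot overflow.  */
  return (int64_t) t - (int64_t) bias;
}
```

Each statement consumes the result of the one before it, which is the
dependency chain the CSNEG sequence avoids.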

Note that the CSNEG sequence is one instruction shorter and that the two and
operations are independent, compared to the existing sequence where all
instructions are dependent on the preceding instructions.

For the special case of N == 2 we can do even better:
cmp     x0, xzr
and     x0, x0, 1
csneg   x0, x0, x0, ge
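
A minimal C model of the N == 2 case, under the same caveats as before (the
name smod2_csneg is hypothetical and only meant to illustrate the sequence):

```c
#include <stdint.h>

/* Hypothetical helper: models the three-instruction sequence for N == 2.  */
int64_t
smod2_csneg (int64_t x)
{
  int ge = x >= 0;                            /* cmp   x0, xzr */
  int64_t bit = (int64_t) ((uint64_t) x & 1); /* and   x0, x0, 1 */

  /* csneg x0, x0, x0, ge: keep the low bit for non-negative x,
     negate it otherwise.  */
  return ge ? bit : -bit;
}
```

Only the low bit and the sign of x matter here, so no separate negation of x
is needed.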

I first tried implementing this in the generic code in expmed.c but that
didn't work out for a few reasons:

* This relies on having a conditional-negate instruction. We could gate it on
HAVE_conditional_move, and the combiner is capable of merging the final negate
into the conditional move if a conditional negate is available (like on
aarch64), but on targets without a conditional negate this would end up
emitting a separate negate.

* The first negs has to be a negs for the sequence to be a win, i.e. having a
separate negate and compare makes the sequence slower than the existing one
(at least in my microbenchmarking), and I couldn't get subsequent passes to
combine the negate and compare into the negs (presumably due to the use of
the negated result in one of the ands).

Doing it in the aarch64 backend, where I could just call the exact gen_*
functions that I need, worked much more cleanly.

The costing logic is updated to reflect this sequence during the
initialisation of expmed.c, where it calculates the smod_pow2_cheap metric.

The tests, which are partly shared with the equivalent arm implementation,
will come in patch 3 of the series.

Bootstrapped and tested on aarch64.
Ok for trunk?

Thanks,
Kyrill

2015-07-24  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>

      * config/aarch64/aarch64.md (mod<mode>3): New define_expand.
      (*neg<mode>2_compare0): Rename to...
      (neg<mode>2_compare0): ... This.
      * config/aarch64/aarch64.c (aarch64_rtx_costs, MOD case): Reflect
      CSNEG sequence in MOD by power of 2 case.
