Re: Truncate optimisation question

2013-12-09 Thread Eric Botcazou
> Therefore, I think that the best thing to do is to go back to Uros' original
> idea of distributing the SUBREG only within a PLUS or a MINUS:
>   http://gcc.gnu.org/ml/gcc-patches/2012-09/msg01682.html
> The outer operation sort of guarantees that operating in this mode is OK for
> the target, so the transformation can be done unconditionally.  It could be
> also done in simplify-rtx.c instead of combine.c.

It turns out that it's too late, Uros immediately after installed a couple of 
patches in the x86 back-end:
  http://gcc.gnu.org/ml/gcc-cvs/2012-10/msg01032.html
  http://gcc.gnu.org/ml/gcc-cvs/2012-10/msg01074.html
that (very likely) rely on simplify_gen_subreg distributing the SUBREG.

OK, at this point I'm going to propose the minimal kludge...

-- 
Eric Botcazou


Re: Truncate optimisation question

2013-12-08 Thread Richard Sandiford
Eric Botcazou  writes:
> Therefore, I think that the best thing to do is to go back to Uros' original 
> idea of distributing the SUBREG only within a PLUS or a MINUS:
>   http://gcc.gnu.org/ml/gcc-patches/2012-09/msg01682.html
> The outer operation sort of guarantees that operating in this mode is OK for 
> the target, so the transformation can be done unconditionally.  It could be 
> also done in simplify-rtx.c instead of combine.c.

Please do you whatever you think is best.  Clearly I screwed this up, sorry.

Richard


Re: Truncate optimisation question

2013-12-08 Thread Eric Botcazou
> No, the second follow-up was about moving the simplification to combine.c
> and guarding it with !WORD_REGISTER_OPERATIONS, or whatever other check
> you prefer, so that no undoing is necessary.

In fact the mere distribution of the SUBREG seems to be questionable, whatever 
the target.  For the testcase of PR58295, the redundant masking is eliminated 
because:

(insn 9 7 10 2 (set (reg:SI 120)
(plus:SI (reg:SI 119)
(const_int -48 [0xffd0]))) pr58295.c:8 5 {*arm_addsi3}
 (expr_list:REG_DEAD (reg:SI 119)
(nil)))
(insn 10 9 11 2 (set (reg:SI 121)
(and:SI (reg:SI 120)
(const_int 255 [0xff]))) pr58295.c:8 80 {*arm_andsi3_insn}
 (expr_list:REG_DEAD (reg:SI 120)
(nil)))
(insn 11 10 12 2 (set (reg:CC 100 cc)
(compare:CC (reg:SI 121)
(const_int 9 [0x9]))) pr58295.c:8 227 {*arm_cmpsi_insn}
 (expr_list:REG_DEAD (reg:SI 121)
(nil)))

is combined into:

(set (reg:CC 100 cc)
(compare:CC (plus:SI (reg:SI 119)
(const_int -48 [0xffd0]))
(const_int 9 [0x9])))

and then split, thanks to:

case SUBREG:
  /* Check for the case where we are comparing A - C1 with C2, that is

   (subreg:MODE (plus (A) (-C1))) op (C2)

 with C1 a constant, and try to lift the SUBREG, i.e. to do the
 comparison in the wider mode.  One of the following two conditions
 must be true in order for this to be valid:

   1. The mode extension results in the same bit pattern being added
  on both sides and the comparison is equality or unsigned.  As
  C2 has been truncated to fit in MODE, the pattern can only be
  all 0s or all 1s.

   2. The mode extension results in the sign bit being copied on
  each side.

 The difficulty here is that we have predicates for A but not for
 (A - C1) so we need to check that C1 is within proper bounds so
 as to perturbate A as little as possible.  */

in simplify_comparison (and the SUBREG is generated from the AND by another 
case of simplify_comparison).  Distributing the SUBREG disables the code.

Therefore, I think that the best thing to do is to go back to Uros' original 
idea of distributing the SUBREG only within a PLUS or a MINUS:
  http://gcc.gnu.org/ml/gcc-patches/2012-09/msg01682.html
The outer operation sort of guarantees that operating in this mode is OK for 
the target, so the transformation can be done unconditionally.  It could be 
also done in simplify-rtx.c instead of combine.c.

-- 
Eric Botcazou


Re: Truncate optimisation question

2013-12-07 Thread Richard Sandiford
Eric Botcazou  writes:
>> But that's the problem with trying to do the optimisation in this way.
>> We first simplify a truncation of an SImode addition X.  Then we simplify
>> a zero extension of that truncation.  Then we have the opportunity to
>> realise that the zero extension wasn't necessary after all, so we actually
>> want to undo both simplifications and go back to the original addition
>> pattern. So undoing the simplifications is precisely what we're aiming for
>> here, again regardless of RISCness.
>
> In other words, you want to add the reverse transformation in combine.c to 
> undo the effects of the simplication in simplify-rtx.c?  And move the latter 
> simplification back to combine.c in the process?

No, the second follow-up was about moving the simplification to combine.c
and guarding it with !WORD_REGISTER_OPERATIONS, or whatever other check
you prefer, so that no undoing is necessary.

To be clear, that isn't what I want. :-)  It just seems less bad than
guarding a simplify-rtx.c operation based on the target.

What I really want is to do redundancy elimination as a separate pass,
without tying it combine and without doing it the way that combine does.
But it's not like I've got time to do that.

The quote above was just me griping about the way that, although combine
is asking simplify-rtx.c to simplify i1-combined-with-i2, it's actually
relying on the simplifications being no-ops, so that the original i1 is
still available when i1-with-i2 is further combined into i3 and the
redundancy of i2 becomes known, giving i1-with-i3.  It just seems like
a fragile way to do redundancy elimination.

Thanks,
Richard


Re: Truncate optimisation question

2013-12-07 Thread Eric Botcazou
> But that's the problem with trying to do the optimisation in this way.
> We first simplify a truncation of an SImode addition X.  Then we simplify
> a zero extension of that truncation.  Then we have the opportunity to
> realise that the zero extension wasn't necessary after all, so we actually
> want to undo both simplifications and go back to the original addition
> pattern. So undoing the simplifications is precisely what we're aiming for
> here, again regardless of RISCness.

In other words, you want to add the reverse transformation in combine.c to 
undo the effects of the simplication in simplify-rtx.c?  And move the latter 
simplification back to combine.c in the process?

-- 
Eric Botcazou


Re: Truncate optimisation question

2013-12-06 Thread Richard Biener
On Fri, Dec 6, 2013 at 9:42 AM, Richard Sandiford
 wrote:
> Eric Botcazou  writes:
>>> Well, I think making the simplify-rtx code conditional on the target
>>> would be the wrong way to go.  If we really can't live with it being
>>> unconditional then I think we should revert it.  But like I say I think
>>> it would be better to make combine recognise the redundancy even with
>>> the new form.  (Or as I say, longer term, not to rely on combine to
>>> eliminate redundant extensions.)  But I don't have time to do that myself...
>>
>> It helps x86 so we won't revert it.  My fear is that we'll need to add code 
>> in
>> other places to RISCify back the result of this "simplification".
>
> Sorry, realised I didn't respond to this yesterday.  I wasn't suggesting
> we just revert and walk away.  ISTR the original suggestion was to patch
> combine instead of simplify-rtx.c, so we could back to that.

I think that looks most sensible.

Richard.

> Thanks,
> Richard


Re: Truncate optimisation question

2013-12-06 Thread Richard Sandiford
Eric Botcazou  writes:
>> Well, I think making the simplify-rtx code conditional on the target
>> would be the wrong way to go.  If we really can't live with it being
>> unconditional then I think we should revert it.  But like I say I think
>> it would be better to make combine recognise the redundancy even with
>> the new form.  (Or as I say, longer term, not to rely on combine to
>> eliminate redundant extensions.)  But I don't have time to do that myself...
>
> It helps x86 so we won't revert it.  My fear is that we'll need to add code 
> in 
> other places to RISCify back the result of this "simplification".

Sorry, realised I didn't respond to this yesterday.  I wasn't suggesting
we just revert and walk away.  ISTR the original suggestion was to patch
combine instead of simplify-rtx.c, so we could back to that.

Thanks,
Richard


Re: Truncate optimisation question

2013-12-05 Thread Richard Sandiford
Eric Botcazou  writes:
>> The comment says that we're trying to match:
>> 
>> 1. (set (reg:SI) (zero_extend:SI (plus:QI (mem:QI) (const_int
>> 2. (set (reg:QI) (plus:QI (mem:QI) (const_int)))
>> 3. (set (reg:QI) (plus:QI (subreg:QI) (const_int)))
>> 4. (set (reg:CC) (compare:CC (subreg:QI) (const_int)))
>> 5. (set (reg:CC) (compare:CC (plus:QI (mem:QI) (const_int
>> 6. (set (reg:SI) (leu:SI (subreg:QI) (const_int)))
>> 7. (set (reg:SI) (leu:SI (subreg:QI) (const_int)))
>> 8. (set (reg:SI) (leu:SI (plus:QI ...)))
>> 
>> And I think that's what we should be matching in cases where the
>> extension isn't redundant, even on RISC targets.
>
> Which one(s) exactly?  Most of the RISC targets we have are parameterized 
> (WORD_REGISTER_OPERATIONS, PROMOTE_MODE, etc) to avoid operations in modes 
> smaller than the word mode.

The first one, sorry.

>> The problem here isn't really about which mode is on the plus,
>> but whether we recognise that the extension instruction is redundant.
>> I.e. we start with:
>> 
>> (insn 9 8 10 2 (set (reg:SI 120)
>> (plus:SI (subreg:SI (reg:QI 118) 0)
>> (const_int -48 [0xffd0]))) test.c:6 -1
>>  (nil))
>> (insn 10 9 11 2 (set (reg:SI 121)
>> (and:SI (reg:SI 120)
>> (const_int 255 [0xff]))) test.c:6 -1
>>  (nil))
>> (insn 11 10 12 2 (set (reg:CC 100 cc)
>> (compare:CC (reg:SI 121)
>> (const_int 9 [0x9]))) test.c:6 -1
>>  (nil))
>> 
>> and what we want combine to do is to recognise that insn 10 is redundant
>> and reduce the sequence to:
>> 
>> (insn 9 8 10 2 (set (reg:SI 120)
>> (plus:SI (subreg:SI (reg:QI 118) 0)
>> (const_int -48 [0xffd0]))) test.c:6 -1
>>  (nil))
>> (insn 11 10 12 2 (set (reg:CC 100 cc)
>> (compare:CC (reg:SI 120)
>> (const_int 9 [0x9]))) test.c:6 -1
>>  (nil))
>> 
>> But insn 11 is redundant on all targets, not just RISC ones.
>> It isn't about whether the target has a QImode addition or not.
>
> That's theoritical though since, on x86 for example, the redundant 
> instruction 
> isn't even generated because of the QImode addition...

Not for this testcase, sure, but we use an SImode addition and keep the
equivalent redundant extension until combine for:

int foo (unsigned char *x)
{
  return (((unsigned int) *x - 48) & 0xff) < 10;
}

immediately before combine:
(insn 7 6 8 2 (parallel [
(set (reg:SI 93 [ D.1753 ])
(plus:SI (reg:SI 92 [ D.1753 ])
(const_int -48 [0xffd0])))
(clobber (reg:CC 17 flags))
]) /tmp/foo.c:3 261 {*addsi_1}
 (expr_list:REG_DEAD (reg:SI 92 [ D.1753 ])
(expr_list:REG_UNUSED (reg:CC 17 flags)
(nil
(insn 8 7 9 2 (set (reg:SI 94 [ D.1753 ])
(zero_extend:SI (subreg:QI (reg:SI 93 [ D.1753 ]) 0))) /tmp/foo.c:3 133 
{*zero_extendqisi2}
 (expr_list:REG_DEAD (reg:SI 93 [ D.1753 ])
(nil)))
(insn 9 8 10 2 (set (reg:CC 17 flags)
(compare:CC (reg:SI 94 [ D.1753 ])
(const_int 9 [0x9]))) /tmp/foo.c:3 7 {*cmpsi_1}
 (expr_list:REG_DEAD (reg:SI 94 [ D.1753 ])
(nil)))

What saves us isn't QImode addition but QImode comparison:

combine:
(insn 7 6 8 2 (parallel [
(set (reg:SI 93 [ D.1753 ])
(plus:SI (reg:SI 92 [ D.1753 ])
(const_int -48 [0xffd0])))
(clobber (reg:CC 17 flags))
]) /tmp/foo.c:3 261 {*addsi_1}
 (expr_list:REG_DEAD (reg:SI 92 [ D.1753 ])
(expr_list:REG_UNUSED (reg:CC 17 flags)
(nil
(note 8 7 9 2 NOTE_INSN_DELETED)
(insn 9 8 10 2 (set (reg:CC 17 flags)
(compare:CC (subreg:QI (reg:SI 93 [ D.1753 ]) 0)
(const_int 9 [0x9]))) /tmp/foo.c:3 5 {*cmpqi_1}
 (expr_list:REG_DEAD (reg:SI 93 [ D.1753 ])
(nil)))

movzbl  (%rdi), %eax
subl$48, %eax
cmpb$9, %al
setbe   %al
movzbl  %al, %eax
ret

(The patch didn't affect things here.)

FWIW, change the testcase to:

int foo (unsigned char *x)
{
  return (((unsigned int) *x - 48) & 0x1ff) < 10;
}

and we keep the redundant AND, again regardless of whether the patch is
applied.

>> Well, I think making the simplify-rtx code conditional on the target
>> would be the wrong way to go.  If we really can't live with it being
>> unconditional then I think we should revert it.  But like I say I think
>> it would be better to make combine recognise the redundancy even with
>> the new form.  (Or as I say, longer term, not to rely on combine to
>> eliminate redundant extensions.)  But I don't have time to do that myself...
>
> It helps x86 so we won't revert it.  My fear is that we'll need to add
> code in other places to RISCify back the result of this
> "simplification".

But that's the problem with trying to do the optimisation in this way.
We first simplify a truncation of an SImode addition 

Re: Truncate optimisation question

2013-12-05 Thread Eric Botcazou
> The comment says that we're trying to match:
> 
> 1. (set (reg:SI) (zero_extend:SI (plus:QI (mem:QI) (const_int
> 2. (set (reg:QI) (plus:QI (mem:QI) (const_int)))
> 3. (set (reg:QI) (plus:QI (subreg:QI) (const_int)))
> 4. (set (reg:CC) (compare:CC (subreg:QI) (const_int)))
> 5. (set (reg:CC) (compare:CC (plus:QI (mem:QI) (const_int
> 6. (set (reg:SI) (leu:SI (subreg:QI) (const_int)))
> 7. (set (reg:SI) (leu:SI (subreg:QI) (const_int)))
> 8. (set (reg:SI) (leu:SI (plus:QI ...)))
> 
> And I think that's what we should be matching in cases where the
> extension isn't redundant, even on RISC targets.

Which one(s) exactly?  Most of the RISC targets we have are parameterized 
(WORD_REGISTER_OPERATIONS, PROMOTE_MODE, etc) to avoid operations in modes 
smaller than the word mode.

> The problem here isn't really about which mode is on the plus,
> but whether we recognise that the extension instruction is redundant.
> I.e. we start with:
> 
> (insn 9 8 10 2 (set (reg:SI 120)
> (plus:SI (subreg:SI (reg:QI 118) 0)
> (const_int -48 [0xffd0]))) test.c:6 -1
>  (nil))
> (insn 10 9 11 2 (set (reg:SI 121)
> (and:SI (reg:SI 120)
> (const_int 255 [0xff]))) test.c:6 -1
>  (nil))
> (insn 11 10 12 2 (set (reg:CC 100 cc)
> (compare:CC (reg:SI 121)
> (const_int 9 [0x9]))) test.c:6 -1
>  (nil))
> 
> and what we want combine to do is to recognise that insn 10 is redundant
> and reduce the sequence to:
> 
> (insn 9 8 10 2 (set (reg:SI 120)
> (plus:SI (subreg:SI (reg:QI 118) 0)
> (const_int -48 [0xffd0]))) test.c:6 -1
>  (nil))
> (insn 11 10 12 2 (set (reg:CC 100 cc)
> (compare:CC (reg:SI 120)
> (const_int 9 [0x9]))) test.c:6 -1
>  (nil))
> 
> But insn 11 is redundant on all targets, not just RISC ones.
> It isn't about whether the target has a QImode addition or not.

That's theoritical though since, on x86 for example, the redundant instruction 
isn't even generated because of the QImode addition...

> Well, I think making the simplify-rtx code conditional on the target
> would be the wrong way to go.  If we really can't live with it being
> unconditional then I think we should revert it.  But like I say I think
> it would be better to make combine recognise the redundancy even with
> the new form.  (Or as I say, longer term, not to rely on combine to
> eliminate redundant extensions.)  But I don't have time to do that myself...

It helps x86 so we won't revert it.  My fear is that we'll need to add code in 
other places to RISCify back the result of this "simplification".

-- 
Eric Botcazou


Re: Truncate optimisation question

2013-12-04 Thread Richard Sandiford
Eric Botcazou  writes:
>> Combine is asking simplify-rtx.c to truncate an addition to QImode
>> and simplify-rtx.c is providing a reasonable representation of that.
>> It's the representation we should use when matching against .md patterns,
>> for example.  The problem is that combine doesn't want to keep the
>> truncation in this case, but doesn't know that yet.
>
> I disagree, I don't find it reasonable to turn an addition in SImode into an 
> addition in QImode when the machine supports the former but not the latter.
> I agree that it may help in some contexts, but then the transformation should 
> be restricted to these contexts.
>
>> Right, but the only complaint I know of is about its effect on combine.
>> And my point is that that complaint isn't about combine failing to combine
>> instructions per se.  It's that combine is failing to remove a redundant
>> operation.  With the right input, the same rtl sequence could conceivably
>> be generated on a CISC target like x86_64, since it defines all the required
>> patterns (SImode addition, QI->SI zero extension, SImode comparison). It
>> could also occur with a sequence that starts out as a QImode addition. So
>> trying to make the simplification depend on CISCness seems like papering
>> over the problem.
>
> The problem is that this changes the combinations tried by the combiner in a 
> way that is adverse to most RISC targets.  Sure, we could change all the 
> affected back-ends but what's the point exactly?  What do we gain here?
>
> Look for example at comment #4 in PR rtl-optimization/58295.

The comment says that we're trying to match:

1. (set (reg:SI) (zero_extend:SI (plus:QI (mem:QI) (const_int
2. (set (reg:QI) (plus:QI (mem:QI) (const_int)))
3. (set (reg:QI) (plus:QI (subreg:QI) (const_int)))
4. (set (reg:CC) (compare:CC (subreg:QI) (const_int)))
5. (set (reg:CC) (compare:CC (plus:QI (mem:QI) (const_int
6. (set (reg:SI) (leu:SI (subreg:QI) (const_int)))
7. (set (reg:SI) (leu:SI (subreg:QI) (const_int)))
8. (set (reg:SI) (leu:SI (plus:QI ...)))

And I think that's what we should be matching in cases where the
extension isn't redundant, even on RISC targets.

The problem here isn't really about which mode is on the plus,
but whether we recognise that the extension instruction is redundant.
I.e. we start with:

(insn 9 8 10 2 (set (reg:SI 120)
(plus:SI (subreg:SI (reg:QI 118) 0)
(const_int -48 [0xffd0]))) test.c:6 -1
 (nil))
(insn 10 9 11 2 (set (reg:SI 121)
(and:SI (reg:SI 120)
(const_int 255 [0xff]))) test.c:6 -1
 (nil))
(insn 11 10 12 2 (set (reg:CC 100 cc)
(compare:CC (reg:SI 121)
(const_int 9 [0x9]))) test.c:6 -1
 (nil))

and what we want combine to do is to recognise that insn 10 is redundant
and reduce the sequence to:

(insn 9 8 10 2 (set (reg:SI 120)
(plus:SI (subreg:SI (reg:QI 118) 0)
(const_int -48 [0xffd0]))) test.c:6 -1
 (nil))
(insn 11 10 12 2 (set (reg:CC 100 cc)
(compare:CC (reg:SI 120)
(const_int 9 [0x9]))) test.c:6 -1
 (nil))

But insn 11 is redundant on all targets, not just RISC ones.
It isn't about whether the target has a QImode addition or not.

>> If you think the patch was wrong or if you feel the fallout is too great
>> then please feel free to revert it.
>
> I think that the fallout is too great for RISC targets, yes, so I'm trying to 
> find a reasonable compromise.

Well, I think making the simplify-rtx code conditional on the target
would be the wrong way to go.  If we really can't live with it being
unconditional then I think we should revert it.  But like I say I think
it would be better to make combine recognise the redundancy even with
the new form.  (Or as I say, longer term, not to rely on combine to
eliminate redundant extensions.)  But I don't have time to do that myself...

Thanks,
Richard


Re: Truncate optimisation question

2013-12-04 Thread Eric Botcazou
> Combine is asking simplify-rtx.c to truncate an addition to QImode
> and simplify-rtx.c is providing a reasonable representation of that.
> It's the representation we should use when matching against .md patterns,
> for example.  The problem is that combine doesn't want to keep the
> truncation in this case, but doesn't know that yet.

I disagree, I don't find it reasonable to turn an addition in SImode into an 
addition in QImode when the machine supports the former but not the latter.
I agree that it may help in some contexts, but then the transformation should 
be restricted to these contexts.

> Right, but the only complaint I know of is about its effect on combine.
> And my point is that that complaint isn't about combine failing to combine
> instructions per se.  It's that combine is failing to remove a redundant
> operation.  With the right input, the same rtl sequence could conceivably
> be generated on a CISC target like x86_64, since it defines all the required
> patterns (SImode addition, QI->SI zero extension, SImode comparison). It
> could also occur with a sequence that starts out as a QImode addition. So
> trying to make the simplification depend on CISCness seems like papering
> over the problem.

The problem is that this changes the combinations tried by the combiner in a 
way that is adverse to most RISC targets.  Sure, we could change all the 
affected back-ends but what's the point exactly?  What do we gain here?

Look for example at comment #4 in PR rtl-optimization/58295.

> If you think the patch was wrong or if you feel the fallout is too great
> then please feel free to revert it.

I think that the fallout is too great for RISC targets, yes, so I'm trying to 
find a reasonable compromise.

-- 
Eric Botcazou


Re: Truncate optimisation question

2013-12-03 Thread Richard Sandiford
Eric Botcazou  writes:
>> I don't think this is the way to go.  AIUI the problem here isn't that
>> RISC architectures don't have QImode adds as such.  If we were going
>> to combine insn 6 and insn 7 _in isolation_ then we would have either:
>> 
>>(zero_extend:SI (subreg:QI (plus:SI (subreg:QI (reg:SI R))
>>(const_int X
>> 
>> before the patch or:
>> 
>>(zero_extend:SI (plus:QI (reg:QI R)
>> (const_int X)))
>> 
>> after the patch.  And IMO the second form is a simplification over the
>> first. (Despite being a RISC port, MIPS Octeon does have a pattern for
>> zero-extending byte addition, so this isn't entirely academic.)
>
> Well, if you look at the transformation in isolation, you cannot
> reasonably say that it's a simplification for most RISC architectures
> either.  That it happens to help zero-extensions on MIPS is fine, but
> it pessimizes other cases on other architectures (and maybe on MIPS as
> well).

I think we're talking about different meanings of "simplification".
simplify-rtx.c can't and IMO shouldn't be making decisions about
simplicity in the sense of "X is cheaper on the target than Y"
or even "the target has an optab for X but not Y".  But it should
try wherever possible to (a) allow recursive simplification and
(b) reduce the number of ways that the same thing can be represented.
And the patch did that.

E.g. the old representation quoted above allowed the constant to have
redundant upper bits.  The new representation causes them to be removed,
so we have a predictable constant.

Combine is asking simplify-rtx.c to truncate an addition to QImode
and simplify-rtx.c is providing a reasonable representation of that.
It's the representation we should use when matching against .md patterns,
for example.  The problem is that combine doesn't want to keep the
truncation in this case, but doesn't know that yet.

>> It sounds like the problem is instead that we're relying on combine
>> to remove redundant zero extensions and that combine's no longer able
>> to do that when folding insns 6 and 7 into the comparison.  Is that right?
>> I think it would better to use a dedicated global pass to remove redundant
>> extensions instead.  IIRC there were various attempts to do that.
>
> The regressions affect 4.8 and mainline and it's probably too late to add new 
> passes in order to mitigate them.
>
>> IMO combine should just be about instruction selection.  The patch
>> still seems good to me from that POV.
>
> The patch is in simplify-rtx.c though, not in combine.c, so it's more general.

Right, but the only complaint I know of is about its effect on combine.
And my point is that that complaint isn't about combine failing to combine
instructions per se.  It's that combine is failing to remove a redundant
operation.  With the right input, the same rtl sequence could conceivably
be generated on a CISC target like x86_64, since it defines all the required
patterns (SImode addition, QI->SI zero extension, SImode comparison).
It could also occur with a sequence that starts out as a QImode addition.
So trying to make the simplification depend on CISCness seems like papering
over the problem.

If we want to keep this as a combine optimisation for 4.9 then I think
we should teach it to handle zero-extended arithmetic too.

But there again, I stood down as an rtl maintainer for a reason. :-)
If you think the patch was wrong or if you feel the fallout is too great
then please feel free to revert it.

Thanks,
Richard


Re: Truncate optimisation question

2013-12-03 Thread Eric Botcazou
> To me promote_mode sounds like the best fit.  But doesn't combine do
> instruction validation? So in this case the target claims to support the
> narrow operation?

Part of the problem is that it's not in the combiner, it's in simplify-rtx.c, 
so it's applied liberally when you're manipulating the RTL.

-- 
Eric Botcazou


Re: Truncate optimisation question

2013-12-03 Thread Eric Botcazou
> I don't think this is the way to go.  AIUI the problem here isn't that
> RISC architectures don't have QImode adds as such.  If we were going
> to combine insn 6 and insn 7 _in isolation_ then we would have either:
> 
>(zero_extend:SI (subreg:QI (plus:SI (subreg:QI (reg:SI R))
>(const_int X
> 
> before the patch or:
> 
>(zero_extend:SI (plus:QI (reg:QI R)
> (const_int X)))
> 
> after the patch.  And IMO the second form is a simplification over the
> first. (Despite being a RISC port, MIPS Octeon does have a pattern for
> zero-extending byte addition, so this isn't entirely academic.)

Well, if you look at the transformation in isolation, you cannot reasonably 
say that it's a simplification for most RISC architectures either.  That it 
happens to help zero-extensions on MIPS is fine, but it pessimizes other cases 
on other architectures (and maybe on MIPS as well).

> It sounds like the problem is instead that we're relying on combine
> to remove redundant zero extensions and that combine's no longer able
> to do that when folding insns 6 and 7 into the comparison.  Is that right?
> I think it would better to use a dedicated global pass to remove redundant
> extensions instead.  IIRC there were various attempts to do that.

The regressions affect 4.8 and mainline and it's probably too late to add new 
passes in order to mitigate them.

> IMO combine should just be about instruction selection.  The patch
> still seems good to me from that POV.

The patch is in simplify-rtx.c though, not in combine.c, so it's more general.

-- 
Eric Botcazou


Re: Truncate optimisation question

2013-12-03 Thread Richard Biener
Eric Botcazou  wrote:
>> For code:
>> 
>> unsigned char foo (unsigned char c)
>> {
>>return (c >= '0') && (c <= '9');
>> }
>> 
>> we end up generating:
>> 
>>  sub r0, r0, #48
>>  uxtbr0, r0
>>  cmp r0, #9
>>  movhi   r0, #0
>>  movls   r0, #1
>>  bx  lr
>> 
>> The extra uxtb (extend) is causing the test to fail. We started
>generating
>> the extra extend when a particular optimisation went in with
>(revision
>> r191928).
>
>That's PR rtl-optimization/58295.  We have pessimizations on the SPARC
>too.
>
>> The comment in simplify-rtx.c says it transforms
>> (truncate:SI (op:DI (x:DI) (y:DI)))
>> 
>> into
>> 
>> (op:SI (truncate:SI (x:DI)) (truncate:SI (x:DI)))
>> 
>> but from what I can see it also transforms
>> 
>> (truncate:QI (op:SI (x:SI) (y:SI)))
>> 
>> into
>> 
>> (op:QI (truncate:QI (x:SI)) (truncate:QI (x:SI)))
>> 
>>  From the combine dump I see that the sub and extend operations come
>from
>> the RTL:
>> 
>> (insn 6 3 7 2 (set (reg:SI 116)
>>  (plus:SI (reg:SI 0 r0 [ c ])
>>  (const_int -48 [0xffd0])))
>> 
>> (insn 7 6 8 2 (set (reg:SI 117)
>>  (zero_extend:SI (subreg:QI (reg:SI 116) 0)))
>> 
>> 
>> If I add a QImode compare pattern to the arm backend it gets matched
>and the
>> extra extend goes away, but it seems to me that that's not the
>correct
>> solution. Ideally, if a QImode operation is performed as an SImode
>> operation on a target (like the sub and compare operations on arm)
>then we
>> should not be doing this optimisation?
>
>Yes, that's my opinion as well, but Richard (CCed) seemed to disagree. 
>In any 
>case, that's certainly not a simplification.
>
>> My question is, how does one express that information in the
>simplify-rtx.c
>> code? It seems that the PR that optimisation fixed (54457) only cared
>about
>> DI -> SI truncations, so perhaps we should disable it for conversions
>> between other modes where it's not beneficial altogether?
>
>I can think of 3 possible solutions:
> - WORD_REGISTER_OPERATIONS,
> - promote_mode,
> - optabs.
>
>The 3rd solution seems to be the most straightforward, but this would
>be the 
>first time that we test optabs from simplify-rtx.c.

To me promote_mode sounds like the best fit.  But doesn't combine do 
instruction validation? So in this case the target claims to support the narrow 
operation?

Richard.




Re: Truncate optimisation question

2013-12-03 Thread Richard Sandiford
Eric Botcazou  writes:
>> For code:
>> 
>> unsigned char foo (unsigned char c)
>> {
>>return (c >= '0') && (c <= '9');
>> }
>> 
>> we end up generating:
>> 
>>  sub r0, r0, #48
>>  uxtbr0, r0
>>  cmp r0, #9
>>  movhi   r0, #0
>>  movls   r0, #1
>>  bx  lr
>> 
>> The extra uxtb (extend) is causing the test to fail. We started generating
>> the extra extend when a particular optimisation went in with (revision
>> r191928).
>
> That's PR rtl-optimization/58295.  We have pessimizations on the SPARC too.
>
>> The comment in simplify-rtx.c says it transforms
>> (truncate:SI (op:DI (x:DI) (y:DI)))
>> 
>> into
>> 
>> (op:SI (truncate:SI (x:DI)) (truncate:SI (x:DI)))
>> 
>> but from what I can see it also transforms
>> 
>> (truncate:QI (op:SI (x:SI) (y:SI)))
>> 
>> into
>> 
>> (op:QI (truncate:QI (x:SI)) (truncate:QI (x:SI)))
>> 
>>  From the combine dump I see that the sub and extend operations come from
>> the RTL:
>> 
>> (insn 6 3 7 2 (set (reg:SI 116)
>>  (plus:SI (reg:SI 0 r0 [ c ])
>>  (const_int -48 [0xffd0])))
>> 
>> (insn 7 6 8 2 (set (reg:SI 117)
>>  (zero_extend:SI (subreg:QI (reg:SI 116) 0)))
>> 
>> 
>> If I add a QImode compare pattern to the arm backend it gets matched and the
>> extra extend goes away, but it seems to me that that's not the correct
>> solution. Ideally, if a QImode operation is performed as an SImode
>> operation on a target (like the sub and compare operations on arm) then we
>> should not be doing this optimisation?
>
> Yes, that's my opinion as well, but Richard (CCed) seemed to disagree.  In 
> any 
> case, that's certainly not a simplification.
>
>> My question is, how does one express that information in the simplify-rtx.c
>> code? It seems that the PR that optimisation fixed (54457) only cared about
>> DI -> SI truncations, so perhaps we should disable it for conversions
>> between other modes where it's not beneficial altogether?
>
> I can think of 3 possible solutions:
>  - WORD_REGISTER_OPERATIONS,
>  - promote_mode,
>  - optabs.
>
> The 3rd solution seems to be the most straightforward, but this would be the 
> first time that we test optabs from simplify-rtx.c.

I don't think this is the way to go.  AIUI the problem here isn't that
RISC architectures don't have QImode adds as such.  If we were going
to combine insn 6 and insn 7 _in isolation_ then we would have either:

   (zero_extend:SI (subreg:QI (plus:SI (subreg:QI (reg:SI R))
   (const_int X

before the patch or:

   (zero_extend:SI (plus:QI (reg:QI R)
(const_int X)))

after the patch.  And IMO the second form is a simplification over the first.
(Despite being a RISC port, MIPS Octeon does have a pattern for
zero-extending byte addition, so this isn't entirely academic.)

It sounds like the problem is instead that we're relying on combine
to remove redundant zero extensions and that combine's no longer able
to do that when folding insns 6 and 7 into the comparison.  Is that right?
I think it would better to use a dedicated global pass to remove redundant
extensions instead.  IIRC there were various attempts to do that.

IMO combine should just be about instruction selection.  The patch
still seems good to me from that POV.

Thanks,
Richard


Re: Truncate optimisation question

2013-12-03 Thread Eric Botcazou
> For code:
> 
> unsigned char foo (unsigned char c)
> {
>return (c >= '0') && (c <= '9');
> }
> 
> we end up generating:
> 
>  sub r0, r0, #48
>  uxtbr0, r0
>  cmp r0, #9
>  movhi   r0, #0
>  movls   r0, #1
>  bx  lr
> 
> The extra uxtb (extend) is causing the test to fail. We started generating
> the extra extend when a particular optimisation went in with (revision
> r191928).

That's PR rtl-optimization/58295.  We have pessimizations on the SPARC too.

> The comment in simplify-rtx.c says it transforms
> (truncate:SI (op:DI (x:DI) (y:DI)))
> 
> into
> 
> (op:SI (truncate:SI (x:DI)) (truncate:SI (x:DI)))
> 
> but from what I can see it also transforms
> 
> (truncate:QI (op:SI (x:SI) (y:SI)))
> 
> into
> 
> (op:QI (truncate:QI (x:SI)) (truncate:QI (x:SI)))
> 
>  From the combine dump I see that the sub and extend operations come from
> the RTL:
> 
> (insn 6 3 7 2 (set (reg:SI 116)
>  (plus:SI (reg:SI 0 r0 [ c ])
>  (const_int -48 [0xffd0])))
> 
> (insn 7 6 8 2 (set (reg:SI 117)
>  (zero_extend:SI (subreg:QI (reg:SI 116) 0)))
> 
> 
> If I add a QImode compare pattern to the arm backend it gets matched and the
> extra extend goes away, but it seems to me that that's not the correct
> solution. Ideally, if a QImode operation is performed as an SImode
> operation on a target (like the sub and compare operations on arm) then we
> should not be doing this optimisation?

Yes, that's my opinion as well, but Richard (CCed) seemed to disagree.  In any 
case, that's certainly not a simplification.

> My question is, how does one express that information in the simplify-rtx.c
> code? It seems that the PR that optimisation fixed (54457) only cared about
> DI -> SI truncations, so perhaps we should disable it for conversions
> between other modes where it's not beneficial altogether?

I can think of 3 possible solutions:
 - WORD_REGISTER_OPERATIONS,
 - promote_mode,
 - optabs.

The 3rd solution seems to be the most straightforward, but this would be the 
first time that we test optabs from simplify-rtx.c.

-- 
Eric Botcazou


Truncate optimisation question

2013-12-03 Thread Kyrill Tkachov

Hi all,

I'm investigating a testsuite failure on arm: gcc.target/arm/unsigned-extend-1.c

For code:

unsigned char foo (unsigned char c)
{
  return (c >= '0') && (c <= '9');
}

we end up generating:

sub r0, r0, #48
uxtbr0, r0
cmp r0, #9
movhi   r0, #0
movls   r0, #1
bx  lr

The extra uxtb (extend) is causing the test to fail. We started generating the 
extra extend when a particular optimisation went in with (revision r191928).


The comment in simplify-rtx.c says it transforms
(truncate:SI (op:DI (x:DI) (y:DI)))

into

(op:SI (truncate:SI (x:DI)) (truncate:SI (x:DI)))

but from what I can see it also transforms

(truncate:QI (op:SI (x:SI) (y:SI)))

into

(op:QI (truncate:QI (x:SI)) (truncate:QI (x:SI)))

From the combine dump I see that the sub and extend operations come from the 
RTL:

(insn 6 3 7 2 (set (reg:SI 116)
(plus:SI (reg:SI 0 r0 [ c ])
(const_int -48 [0xffd0])))

(insn 7 6 8 2 (set (reg:SI 117)
(zero_extend:SI (subreg:QI (reg:SI 116) 0)))


If I add a QImode compare pattern to the arm backend it gets matched and the 
extra extend goes away, but it seems to me that that's not the correct solution. 
Ideally, if a QImode operation is performed as an SImode operation on a target 
(like the sub and compare operations on arm) then we should not be doing this 
optimisation?


My question is, how does one express that information in the simplify-rtx.c 
code?
It seems that the PR that optimisation fixed (54457) only cared about DI -> SI 
truncations, so perhaps we should disable it for conversions between other modes 
where it's not beneficial altogether?


Thanks,
Kyrill