Re: [PR68432 00/26] Handle size/speed choices for internal functions

2015-12-01 Thread Bernd Schmidt

On 11/26/2015 05:22 PM, Richard Sandiford wrote:

Bernd Schmidt  writes:


I wish we'd taken some more time to think through the consequences of
the original internal_fn patchset.


I don't think this PR shows that the approach was wrong.


I think it does. Internal functions make a new assumptions, that 
expanders don't FAIL - but as we've now seen, they do. The optimize_size 
thing is reasonably easy to grep for and it looks like only i386 is 
affected, but have you looked at every expander in every port that could 
be used by an internal function to ensure it does not FAIL for a 
different reason?


Is there a simple way to disable the entire internal_fn machinery and 
get us back to where we were in gcc-5, without taking out all the code 
immediately? That would give us time until next stage 1 to think through 
the issues.



Bernd


Re: [PR68432 00/26] Handle size/speed choices for internal functions

2015-12-01 Thread Bernd Schmidt

On 12/01/2015 01:16 PM, Richard Biener wrote:

On Tue, Dec 1, 2015 at 12:54 PM, Bernd Schmidt  wrote:

On 11/26/2015 05:22 PM, Richard Sandiford wrote:


Bernd Schmidt  writes:


I wish we'd taken some more time to think through the consequences of
the original internal_fn patchset.



I don't think this PR shows that the approach was wrong.



I think it does. Internal functions make a new assumptions, that expanders
don't FAIL - but as we've now seen, they do. The optimize_size thing is
reasonably easy to grep for and it looks like only i386 is affected, but
have you looked at every expander in every port that could be used by an
internal function to ensure it does not FAIL for a different reason?


Of course we are not sure.


Hence, the correct approach for gcc-6 is to recognize that these patches 
were not ready, and disable the new functionality IMO. I'm not 
suggesting reverting the patches just yet, maybe we can solve these 
issues for gcc-7.



Do you have even a guess as to how to approach the issue differently?


Not off-hand. That's not a question we're facing for stage3 though, and 
I'd rather take a cautious approach than go deeper into the hole.



Bernd


Re: [PR68432 00/26] Handle size/speed choices for internal functions

2015-12-01 Thread Richard Biener
On Tue, Dec 1, 2015 at 12:54 PM, Bernd Schmidt  wrote:
> On 11/26/2015 05:22 PM, Richard Sandiford wrote:
>>
>> Bernd Schmidt  writes:
>>
>>> I wish we'd taken some more time to think through the consequences of
>>> the original internal_fn patchset.
>>
>>
>> I don't think this PR shows that the approach was wrong.
>
>
> I think it does. Internal functions make a new assumptions, that expanders
> don't FAIL - but as we've now seen, they do. The optimize_size thing is
> reasonably easy to grep for and it looks like only i386 is affected, but
> have you looked at every expander in every port that could be used by an
> internal function to ensure it does not FAIL for a different reason?

Of course we are not sure.  But I think the approach in the series is the only
reasonable one.  I view the internal_fn support for optabs as a great way to
provide sth like instruction selection to GIMPLE with the goal to simplify
RTL expansion itself (which since quite some time cannot rely on seeing
"large" expressions anymore and with TER has the limitation of seeing
only some under the constraint TER and out-of-SSA operate).

> Is there a simple way to disable the entire internal_fn machinery and get us
> back to where we were in gcc-5, without taking out all the code immediately?
> That would give us time until next stage 1 to think through the issues.

Do you have even a guess as to how to approach the issue differently?

Yes, I think we can rip out uses of the new machinery quite easily but I don't
think we're at the point declaring failure yet.

Richard.

>
> Bernd


Re: [PR68432 00/26] Handle size/speed choices for internal functions

2015-12-01 Thread Bernd Schmidt

On 12/01/2015 02:43 PM, Richard Sandiford wrote:

I don't think what you say is an argument that the approach is wrong.
The C conditions for optabs have always been more restricted than
other define_expands and define_insns, since they cannot refer
to operands.  When caching of optabs was added, they also lost
the ability to test for size/speed choices.  There have also
always been optabs that are not allowed to FAIL (such as moves,
get_thread_pointer, widening multiplication, vec_cond, etc.).
This series is extending that list, but it's in the spirit
of restrictions that have always existed.  I don't see that
that's an argument that the approach is wrong.


Ok, you can of course change the rules, but that means the following 
needs to be done as a minimum (and it should have been done initially):

 * the new rules must be documented
 * all existing expanders need to be examined to see whether they
   comply.

At the moment we don't know how widespread the problem is. If you're 
willing to do the audit of all ports then I'd be more willing to 
consider this suitable for gcc-6.



Bernd


Re: [PR68432 00/26] Handle size/speed choices for internal functions

2015-12-01 Thread Richard Sandiford
Bernd Schmidt  writes:
> On 11/26/2015 05:22 PM, Richard Sandiford wrote:
>> Bernd Schmidt  writes:
>>
>>> I wish we'd taken some more time to think through the consequences of
>>> the original internal_fn patchset.
>>
>> I don't think this PR shows that the approach was wrong.
>
> I think it does. Internal functions make a new assumptions, that 
> expanders don't FAIL - but as we've now seen, they do. The optimize_size 
> thing is reasonably easy to grep for and it looks like only i386 is 
> affected, but have you looked at every expander in every port that could 
> be used by an internal function to ensure it does not FAIL for a 
> different reason?

I've tried and I couldn't see any other problems.

I don't think what you say is an argument that the approach is wrong.
The C conditions for optabs have always been more restricted than
other define_expands and define_insns, since they cannot refer
to operands.  When caching of optabs was added, they also lost
the ability to test for size/speed choices.  There have also
always been optabs that are not allowed to FAIL (such as moves,
get_thread_pointer, widening multiplication, vec_cond, etc.).
This series is extending that list, but it's in the spirit
of restrictions that have always existed.  I don't see that
that's an argument that the approach is wrong.

The current approach to FAILs dated from a time when expand was
the first code-generation pass.  The FAILs aren't a good fit for
gimple optimisers that are trying to find out what the target
can do (and how cheaply it can do it).

> Is there a simple way to disable the entire internal_fn machinery and 
> get us back to where we were in gcc-5, without taking out all the code 
> immediately? That would give us time until next stage 1 to think through 
> the issues.

That seems like an overreaction.

I went for the 22-patch series because I think it's the best fix for
this problem.  It also makes the existing enabled, preferred_for_size
and preferred_for_speed handling more robust (as shown by the ARM bug
that the structural changes exposed at compile time).  But there are
other less-invasive ways of fixing it too, as described in the thread
about rsqrt.  I'm going to work on that today.

Thanks,
Richard



Re: [PR68432 00/26] Handle size/speed choices for internal functions

2015-11-26 Thread Bernd Schmidt

On 11/25/2015 01:20 PM, Richard Sandiford wrote:

This series fixes PR 68432, a regression caused by my internal-functions-
for-optabs series.  Some of the libm optabs in i386.md have a true HAVE_*
condition but conditionally FAIL if we're optimising for size:

   if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH
   && !flag_trapping_math)
 {
   if (TARGET_ROUND)
emit_insn (gen_sse4_1_round2
   (operands[0], operands[1], GEN_INT (ROUND_MXCSR)));
   else if (optimize_insn_for_size_p ())
 FAIL;
   else
ix86_expand_rint (operands[0], operands[1]);
 }


How many such cases are there? Is it just the ix86 patterns? And, could 
the same effect be achieved by just moving the optimize_insn_for_size_p 
test into the predicate (as some existing patterns already do), and then 
testing the predicate while ensuring that optimize_insn_for_x returns 
the right value? That seems like a minimal fix, and I think one that 
would be vastly more appropriate for stage 3. The alternative splitting 
looks error-prone and may not be optimal, and I still have misgivings 
about the new attribute syntax and its application to define_expands.



Bernd


Re: [PR68432 00/26] Handle size/speed choices for internal functions

2015-11-26 Thread Richard Sandiford
Bernd Schmidt  writes:
> On 11/25/2015 01:20 PM, Richard Sandiford wrote:
>> This series fixes PR 68432, a regression caused by my internal-functions-
>> for-optabs series.  Some of the libm optabs in i386.md have a true HAVE_*
>> condition but conditionally FAIL if we're optimising for size:
>>
>>if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH
>>&& !flag_trapping_math)
>>  {
>>if (TARGET_ROUND)
>>  emit_insn (gen_sse4_1_round2
>> (operands[0], operands[1], GEN_INT (ROUND_MXCSR)));
>>else if (optimize_insn_for_size_p ())
>>  FAIL;
>>else
>>  ix86_expand_rint (operands[0], operands[1]);
>>  }
>
> How many such cases are there? Is it just the ix86 patterns?

Yeah, just x86 AFAICT.

> And, could the same effect be achieved by just moving the
> optimize_insn_for_size_p test into the predicate (as some existing
> patterns already do), and then testing the predicate while ensuring
> that optimize_insn_for_x returns the right value?

That would mean that the validity of a gimple call would depend on both
the target predicates and whether the block containing the statement
is optimised for size or speed.  So whenever we want to test whether
a gimple call is valid, we'd need to generate rtl for its arguments
and pass them to the target predicates.  We'd also need to be aware
that moving a call between blocks could make it invalid (because
we might be moving a call from a block optimised for speed to a block
optimised for size).  I don't think those are the kinds of thing that
gimple passes would normally expect.

It seems better to use FAILs and predicates for correctness only
and use other ways of representing size/speed decisions.  And since we
already have another way for rtl, it seems like a good idea to use it
for gimple too.

Thanks,
Richard



Re: [PR68432 00/26] Handle size/speed choices for internal functions

2015-11-26 Thread Bernd Schmidt

On 11/26/2015 04:13 PM, Richard Sandiford wrote:


That would mean that the validity of a gimple call would depend on both
the target predicates and whether the block containing the statement
is optimised for size or speed.  So whenever we want to test whether
a gimple call is valid, we'd need to generate rtl for its arguments
and pass them to the target predicates.  We'd also need to be aware
that moving a call between blocks could make it invalid (because
we might be moving a call from a block optimised for speed to a block
optimised for size).  I don't think those are the kinds of thing that
gimple passes would normally expect.


In your world, would we move such calls into places where we currently 
would reject expanding them? I.e., would the expanders no longer fail, 
even if the target does not want to expand something when optimizing for 
size?


The other question is, you mention the need to generate rtl. I don't 
quite see why - the predicates (or insn conditions, to follow the 
terminology in the manual) for a named pattern aren't allowed to look at 
operands. Surely these conditions are already taken into account in your 
internal_fn work?



It seems better to use FAILs and predicates for correctness only
and use other ways of representing size/speed decisions.  And since we
already have another way for rtl, it seems like a good idea to use it
for gimple too.


I'm looking for a minimal fix for gcc-6, and a 22-patch series that 
rewrites lots of target patterns isn't that. If someone else wants to 
approve it then fine, but I think this is not a good approach for now.


To avoid having to retest validity when moving an internal function, 
could you just make the availability test run the predicate with both 
for_speed and for_size options, and require that the pattern is valid 
for both? That should give you a definitive answer as to whether you can 
later expand the insn, and I'd call that good enough for now.


I wish we'd taken some more time to think through the consequences of 
the original internal_fn patchset.



Bernd


Re: [PR68432 00/26] Handle size/speed choices for internal functions

2015-11-26 Thread Richard Sandiford
Bernd Schmidt  writes:
> On 11/26/2015 04:13 PM, Richard Sandiford wrote:
>> That would mean that the validity of a gimple call would depend on both
>> the target predicates and whether the block containing the statement
>> is optimised for size or speed.  So whenever we want to test whether
>> a gimple call is valid, we'd need to generate rtl for its arguments
>> and pass them to the target predicates.  We'd also need to be aware
>> that moving a call between blocks could make it invalid (because
>> we might be moving a call from a block optimised for speed to a block
>> optimised for size).  I don't think those are the kinds of thing that
>> gimple passes would normally expect.
>
> In your world, would we move such calls into places where we currently 
> would reject expanding them? I.e., would the expanders no longer fail, 
> even if the target does not want to expand something when optimizing for 
> size?

Yeah, that could happen in theory, though should be rare in practice.
We already had this problem in rtl when patterns tried to test
optimize_insn_for_size/speed_p in their insn conditions.  See e.g.:

  https://gcc.gnu.org/ml/gcc-patches/2014-04/msg00697.html

> The other question is, you mention the need to generate rtl. I don't 
> quite see why - the predicates (or insn conditions, to follow the 
> terminology in the manual) for a named pattern aren't allowed to look at 
> operands. Surely these conditions are already taken into account in your 
> internal_fn work?

Ah, when you said "predicate" I assumed you meant the match_operand
predicates.  We'd need to generate rtl to check those.  But checking
size/speed in the insn condition doesn't work for the reason above: we
could move an instruction between blocks after deciding to generate it.

It also isn't suitable for optabs because the conditions are cached
by init_optabs.  I suppose we could have a separate cache for size
and speed though.

>> It seems better to use FAILs and predicates for correctness only
>> and use other ways of representing size/speed decisions.  And since we
>> already have another way for rtl, it seems like a good idea to use it
>> for gimple too.
>
> I'm looking for a minimal fix for gcc-6, and a 22-patch series that 
> rewrites lots of target patterns isn't that. If someone else wants to 
> approve it then fine, but I think this is not a good approach for now.

OK.  But I'd also like to avoid a hacky solution just because of the
stage.  Although we're in stage 3, we're still early stage 3,
so I'm hoping there's a bit of leeway.

> To avoid having to retest validity when moving an internal function, 
> could you just make the availability test run the predicate with both 
> for_speed and for_size options, and require that the pattern is valid 
> for both? That should give you a definitive answer as to whether you can 
> later expand the insn, and I'd call that good enough for now.

That would mean we'd never use rint for x86 before expand.

> I wish we'd taken some more time to think through the consequences of 
> the original internal_fn patchset.

I don't think this PR shows that the approach was wrong.

Thanks,
Richard



Re: [PR68432 00/26] Handle size/speed choices for internal functions

2015-11-26 Thread Bernd Schmidt

On 11/26/2015 05:22 PM, Richard Sandiford wrote:

It also isn't suitable for optabs because the conditions are cached
by init_optabs.  I suppose we could have a separate cache for size
and speed though.


That sounds necessary given the existence of such insn conditions, 
unless we want to disallow this practice.



To avoid having to retest validity when moving an internal function,
could you just make the availability test run the predicate with both
for_speed and for_size options, and require that the pattern is valid
for both? That should give you a definitive answer as to whether you can
later expand the insn, and I'd call that good enough for now.


That would mean we'd never use rint for x86 before expand.


How does this compare to the situation before your internal_fn patches? 
What are cases where behaviour differs and how?



Bernd


Re: [PR68432 00/26] Handle size/speed choices for internal functions

2015-11-26 Thread Richard Sandiford
Bernd Schmidt  writes:
> On 11/26/2015 05:22 PM, Richard Sandiford wrote:
>> It also isn't suitable for optabs because the conditions are cached
>> by init_optabs.  I suppose we could have a separate cache for size
>> and speed though.
>
> That sounds necessary given the existence of such insn conditions, 
> unless we want to disallow this practice.

Does that work though?  (Before my internal_fn patches I mean.)

>>> To avoid having to retest validity when moving an internal function,
>>> could you just make the availability test run the predicate with both
>>> for_speed and for_size options, and require that the pattern is valid
>>> for both? That should give you a definitive answer as to whether you can
>>> later expand the insn, and I'd call that good enough for now.
>>
>> That would mean we'd never use rint for x86 before expand.
>
> How does this compare to the situation before your internal_fn patches? 
> What are cases where behaviour differs and how?

Ok, for rint I guess it doesn't matter.  It's the TARGET_USE_FANCY_MATH_387
that would change, since tree-call-cdce.c would no longer convert (say):

  y = acos(x);

into

  y = ifn_acos(x);
  if (...)
acos (x);

Maybe noone cares about TARGET_USE_FANCY_MATH_387 these days though.

I still think insn conditions are the wrong place to check this.
We shouldn't have a different rule for rtl insn conditions and gimple
insn conditions.

Thanks,
Richard



[PR68432 00/26] Handle size/speed choices for internal functions

2015-11-25 Thread Richard Sandiford
This series fixes PR 68432, a regression caused by my internal-functions-
for-optabs series.  Some of the libm optabs in i386.md have a true HAVE_*
condition but conditionally FAIL if we're optimising for size:

  if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH
  && !flag_trapping_math)
{
  if (TARGET_ROUND)
emit_insn (gen_sse4_1_round2
   (operands[0], operands[1], GEN_INT (ROUND_MXCSR)));
  else if (optimize_insn_for_size_p ())
FAIL;
  else
ix86_expand_rint (operands[0], operands[1]);
}

This is going to cause problems if we want to make more decisions
related to optabs at the gimple level.

We've already had the same problem in rtl, where some patterns used
to check optimize_insn_for_size_p in their C condition, and would then
fail to match if an instruction were moved from a hot block to a cold
block.  Now rtl has two attributes, preferred_for_size and
preferred_for_speed, that say whether a particular alternative of a
particular instruction should be used when optimising for size or speed
respectively.  We try to honour a "false" value as much as possible,
but it's not an absolute guarantee.

The point of this series is to extend preferred_for_size and
preferred_for_speed to define_expands, so that the attributes
can be tested for optabs too.

enabled, preferred_for_size and preferred_for_speed are supposed
to be an inavariant property of a given (code, alternative) pair.
They're not supposed to depend on a particular insn or its operands.
However, the attribute functions still take an rtx_insn * as argument,
so mistakes are only caught at runtime if we see a specific instruction
for which the attributes conflict with the cached ones
(recog.c:check_bool_attrs).

Extending the attributes to define_expands means that we finally
need to "fix" that and make the attributes take a (code, alternative)
pair instead.  Most ports were already structured to allow that.
The two exceptions are ARM and NDS32.

The problem for NDS32 is that "enabled" depends on "length", which
needs access to an instruction in order to calculate branch lengths.
This isn't a problem in practice because all instructions with
operand-dependent lengths force "enabled" to 1.  However,
it's easier to enforce the restriction at compile time if we
have an "is_16bit" attribute, to go along with the TARGET_16_BIT
condition that controls whether 16-bit instructions can be used.

The problem for ARM is that "enabled" depends on "type" and
"use_literal_pool", both of which use match_operand tests in some cases.
I think the "type" match_operands were actually OK for "enabled" in
practice, but I think the "use_literal_pool" ones are a genuine bug.
They are used when we have one alternative that accepts both memory and
immediate operands.  The alternative is supposed to be disabled for
immediate operands when arm_disable_literal_pool is true, but not
disabled for memory operands.  However, the "enabled" cache only cares
about the alternative number, so we can end up disabling memory operands
if we cached based on an immediate operand, or allow immediate operands
if we cached based on a memory operand.  The series fixes that by
splitting alternatives where necessary.

Due to the define_subst patches, it's already syntactically possible to
attach attributes to define_expands, but genattrtab.c currently ignores
them.  The series restricts these attributes to the new "code,alternative"
style and then handles them in the same way as define_insn attributes.

I realise this is rather invasive for stage 3, but I think it's
worth fixing the bug "properly" rather than papering over it.
The ARM "use_literal_pool" bug described above shows how easy
it is for the enabled attribute to go wrong in subtle ways.

The series is split into five parts:

  (1) Make the ARM and NDS32 changes described above
  (2) Add support for "code,alternative" attributes
  (3) Make all ports use them for enabled, preferred_for_size and
  preferred_for_speed
  (4) Use preferred_for_size and preferred_for_speed to decide whether
  to use internal functions
  (5) Convert the internal-function-related i386 patterns to use
  preferred_for_size instead of FAILing.

(3) is a purely mechanical change.  I think it counts as obvious if the
other parts are OK.

Tested by building GCC before and after the series on:

aarch64-linux-gnueabi alpha-linux-gnu arc-elf arm-linux-gnueabi
arm-linux-gnueabihf avr-rtems bfin-elf c6x-elf cr16-elf cris-elf
epiphany-elf fr30-elf frv-linux-gnu ft32-elf h8300-elf
ia64-linux-gnu iq2000-elf lm32-elf m32c-elf m32r-elf
m68k-linux-gnu mcore-elf mep-elf microblaze-elf mips-linux-gnu
mmix mn10300-elf moxie-elf msp430-elf nds32le-elf
hppa64-hp-hpux11.23 nios2-linux-gnu nvptx-none pdp11
powerpc-linux-gnu powerpc-eabispe rl78-elf rx-elf s390-linux-gnu
sh-linux-gnu sparc-linux-gnu spu-elf tilegx-elf tilepro-elf
xstormy16-elf v850-elf vax-netbsdelf 

Re: [PR68432 00/26] Handle size/speed choices for internal functions

2015-11-25 Thread Richard Biener
On Wed, Nov 25, 2015 at 1:20 PM, Richard Sandiford
 wrote:
> This series fixes PR 68432, a regression caused by my internal-functions-
> for-optabs series.  Some of the libm optabs in i386.md have a true HAVE_*
> condition but conditionally FAIL if we're optimising for size:
>
>   if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH
>   && !flag_trapping_math)
> {
>   if (TARGET_ROUND)
> emit_insn (gen_sse4_1_round2
>(operands[0], operands[1], GEN_INT (ROUND_MXCSR)));
>   else if (optimize_insn_for_size_p ())
> FAIL;
>   else
> ix86_expand_rint (operands[0], operands[1]);
> }
>
> This is going to cause problems if we want to make more decisions
> related to optabs at the gimple level.
>
> We've already had the same problem in rtl, where some patterns used
> to check optimize_insn_for_size_p in their C condition, and would then
> fail to match if an instruction were moved from a hot block to a cold
> block.  Now rtl has two attributes, preferred_for_size and
> preferred_for_speed, that say whether a particular alternative of a
> particular instruction should be used when optimising for size or speed
> respectively.  We try to honour a "false" value as much as possible,
> but it's not an absolute guarantee.
>
> The point of this series is to extend preferred_for_size and
> preferred_for_speed to define_expands, so that the attributes
> can be tested for optabs too.

So to not re-introduce the same position issue at GIMPLE (passes
querying optimize_bb_for_speed/size and with that querying
the optab for IFN support) when expanding an internal function
you ignore whether it actually "exists"?  That is, IFN expansion
will skip the define_expand (which is maybe disabled)?

Otherwise cleaning this things up is nice.  The question is whether
it really solves all the issues ;)

Richard.

> enabled, preferred_for_size and preferred_for_speed are supposed
> to be an inavariant property of a given (code, alternative) pair.
> They're not supposed to depend on a particular insn or its operands.
> However, the attribute functions still take an rtx_insn * as argument,
> so mistakes are only caught at runtime if we see a specific instruction
> for which the attributes conflict with the cached ones
> (recog.c:check_bool_attrs).
>
> Extending the attributes to define_expands means that we finally
> need to "fix" that and make the attributes take a (code, alternative)
> pair instead.  Most ports were already structured to allow that.
> The two exceptions are ARM and NDS32.
>
> The problem for NDS32 is that "enabled" depends on "length", which
> needs access to an instruction in order to calculate branch lengths.
> This isn't a problem in practice because all instructions with
> operand-dependent lengths force "enabled" to 1.  However,
> it's easier to enforce the restriction at compile time if we
> have an "is_16bit" attribute, to go along with the TARGET_16_BIT
> condition that controls whether 16-bit instructions can be used.
>
> The problem for ARM is that "enabled" depends on "type" and
> "use_literal_pool", both of which use match_operand tests in some cases.
> I think the "type" match_operands were actually OK for "enabled" in
> practice, but I think the "use_literal_pool" ones are a genuine bug.
> They are used when we have one alternative that accepts both memory and
> immediate operands.  The alternative is supposed to be disabled for
> immediate operands when arm_disable_literal_pool is true, but not
> disabled for memory operands.  However, the "enabled" cache only cares
> about the alternative number, so we can end up disabling memory operands
> if we cached based on an immediate operand, or allow immediate operands
> if we cached based on a memory operand.  The series fixes that by
> splitting alternatives where necessary.
>
> Due to the define_subst patches, it's already syntactically possible to
> attach attributes to define_expands, but genattrtab.c currently ignores
> them.  The series restricts these attributes to the new "code,alternative"
> style and then handles them in the same way as define_insn attributes.
>
> I realise this is rather invasive for stage 3, but I think it's
> worth fixing the bug "properly" rather than papering over it.
> The ARM "use_literal_pool" bug described above shows how easy
> it is for the enabled attribute to go wrong in subtle ways.
>
> The series is split into five parts:
>
>   (1) Make the ARM and NDS32 changes described above
>   (2) Add support for "code,alternative" attributes
>   (3) Make all ports use them for enabled, preferred_for_size and
>   preferred_for_speed
>   (4) Use preferred_for_size and preferred_for_speed to decide whether
>   to use internal functions
>   (5) Convert the internal-function-related i386 patterns to use
>   preferred_for_size instead of FAILing.
>
> (3) is a purely mechanical change.  I think it counts as obvious if the
> other parts are OK.

Re: [PR68432 00/26] Handle size/speed choices for internal functions

2015-11-25 Thread Richard Sandiford
Richard Biener  writes:
> On Wed, Nov 25, 2015 at 1:20 PM, Richard Sandiford
>  wrote:
>> This series fixes PR 68432, a regression caused by my internal-functions-
>> for-optabs series.  Some of the libm optabs in i386.md have a true HAVE_*
>> condition but conditionally FAIL if we're optimising for size:
>>
>>   if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH
>>   && !flag_trapping_math)
>> {
>>   if (TARGET_ROUND)
>> emit_insn (gen_sse4_1_round2
>>(operands[0], operands[1], GEN_INT (ROUND_MXCSR)));
>>   else if (optimize_insn_for_size_p ())
>> FAIL;
>>   else
>> ix86_expand_rint (operands[0], operands[1]);
>> }
>>
>> This is going to cause problems if we want to make more decisions
>> related to optabs at the gimple level.
>>
>> We've already had the same problem in rtl, where some patterns used
>> to check optimize_insn_for_size_p in their C condition, and would then
>> fail to match if an instruction were moved from a hot block to a cold
>> block.  Now rtl has two attributes, preferred_for_size and
>> preferred_for_speed, that say whether a particular alternative of a
>> particular instruction should be used when optimising for size or speed
>> respectively.  We try to honour a "false" value as much as possible,
>> but it's not an absolute guarantee.
>>
>> The point of this series is to extend preferred_for_size and
>> preferred_for_speed to define_expands, so that the attributes
>> can be tested for optabs too.
>
> So to not re-introduce the same position issue at GIMPLE (passes
> querying optimize_bb_for_speed/size and with that querying
> the optab for IFN support) when expanding an internal function
> you ignore whether it actually "exists"?  That is, IFN expansion
> will skip the define_expand (which is maybe disabled)?

We ignore the size and speed attributes when expanding an existing
function call, yeah.  But to me "exists" means the C condition in the
define_expand or define_insn, which we check even when expanding.
(We effectively check it whenever direct_optab or convert_optab
is called, thanks to caching by init_optabs.)  We should never generate
an expand or insn if the C condition is false.

In other words, the size and speed attributes are additional information
on top of the "exists" condition that we check when deciding whether to
create a call, but not when expanding an existing call.

I forgot to say that the patch only handles optabs that are mapped
to internal functions.  I think in next stage 1 it would make sense to
do the same for all optabs, but that would be quite invasive and I don't
think it would fix a bug as such.

Thanks,
Richard