Re: RFC: patch to build GCC for arm with LRA

2013-08-30 Thread Yvan Roux
Hi,

here is a request for comments on the 2 attached patches which enable
the build of GCC on ARM with LRA.  The patches introduce a new
undocumented option -mlra to use LRA instead of reload, as what was
done on previous LRA support, which is here to ease the test and
comparison with reload and not supposed to be kept in 4.9.

On AArch32:

* The patch includes the previous changes of this thread, add the
handling of ASHIFTRT, LSHIFTRT, ROTATE and ROTATERT in get_index_scale
and let LRA handle the constraint in Thumb.
* The status is that the build complete in ARM mode with a couple of
regressions in the testsuite, that I'll investigate:
  - gcc.c-torture/execute/20060102-1.c
  - gcc.c-torture/execute/961026-1.c
  - gcc.target/arm/atomic-comp-swap-release-acquire.c
and some build failures in libstdc++ at the pass manager level (there
is an invalid read in gt_ggc_mx)
* The build of libraries in Thumb mode still doesn't complete, as
Vladimir said the secondary_reload  modification solves LRA cycling
but we still have some issues.

On AArch64 :

* I modified must_be_index_p to avoid the call of set_address_base
with patterns which contains a MULT.
* The build complete, but there is a couple of regression in the
testsuite I'm looking at on
 - gcc.c-torture/execute/ieee/fp-cmp-4l.c
 - c-c++-common/torture/complex-sign-mul-minus-one.c
for instance.

Any comments ?

Thanks
Yvan




On 6 July 2013 01:12, Vladimir Makarov  wrote:
> On 13-07-05 8:43 AM, Yvan Roux wrote:
>>
>> Hi,
>>
>> for AArch64 it is also needed to take into account SIGN_EXTRACT in the
>> set_address_base and set_address_index routines, as we acan encounter
>> that kind of insn for instance :
>>
>> (insn 29 27 5 7 (set (mem:SI (plus:DI (sign_extract:DI (mult:DI
>> (subreg:DI (reg/v:SI 76 [ elt ]) 0)
>> ...
>
> OK.
>
>> with the attached patch and the LRA enabled, compiler now bootstrap
>> but I've few regressions in the testsuite,
>
> It seems ok to me but I am confused of the following change:
>
>  set_address_base (struct address_info *info, rtx *loc, rtx *inner)
>  {
> -  if (GET_CODE (*inner) == LO_SUM)
> +  if (GET_CODE (*inner) == SIGN_EXTRACT)
> +inner = strip_address_mutations (&XEXP (*inner, 0));
> +
> +  if (GET_CODE (*inner) == LO_SUM || GET_CODE (*inner) == MULT)
>
>  inner = strip_address_mutations (&XEXP (*inner, 0));
>gcc_checking_assert (REG_P (*inner)
> || MEM_P (*inner)
>
> base address should not contain MULT (which you added).  It is controlled by
> the result of must_be_index_p.  So set_address_base should not have code for
> MULT and you need to change must_be_index_p in a way that set_address_base
> is not called for MULT.
>
>
>> gcc.c/torture/execute/fp-cmp-4l.c for instance. I was looking at these
>> issues before submitting  a complete AArch64 LRA enabling patch, but
>> as you are speaking about that...
>>
>> Valdimir, for the ARM target I already had the ASHIFTRT and LSHIFTRT
>> addition on my side, but still had an ICE during bootstrap with LRA
>> when compiling fixed-bit.c (the Max number of generated reload insns
>> we talk about already) is it working for you ?
>>
>>
> I did not check thumb I guess.  If what you are asking about the problem you
> sent me about 2 weeks ago, I guess one solution would be putting
>
>   if (lra_in_progress)
> return NO_REGS;
>
> at the beginning of arm/targhooks.c::default_secondary_reload.  LRA is smart
> enough to figure out what to do from constraints in most cases of when
> reload needs secondary_reload.  In arm case, default_secondary_reload only
> confuses LRA.
>
> I had no time to test this change, but it solves LRA cycling for the test
> case you sent me.
>


aarch32-lra.patch
Description: Binary data


aarch64-lra.patch
Description: Binary data


Re: RFC: patch to build GCC for arm with LRA

2013-08-30 Thread Yvan Roux
Sorry for the previous off-the-list-html-format answer :(

On 30 August 2013 15:18, Richard Earnshaw  wrote:
> On 30/08/13 14:09, Yvan Roux wrote:
>> Hi,
>>
>> here is a request for comments on the 2 attached patches which enable
>> the build of GCC on ARM with LRA.  The patches introduce a new
>> undocumented option -mlra to use LRA instead of reload, as what was
>> done on previous LRA support, which is here to ease the test and
>> comparison with reload and not supposed to be kept in 4.9.
>>
>> On AArch32:
>>
>> * The patch includes the previous changes of this thread, add the
>> handling of ASHIFTRT, LSHIFTRT, ROTATE and ROTATERT in get_index_scale
>> and let LRA handle the constraint in Thumb.
>> * The status is that the build complete in ARM mode with a couple of
>> regressions in the testsuite, that I'll investigate:
>>   - gcc.c-torture/execute/20060102-1.c
>>   - gcc.c-torture/execute/961026-1.c
>>   - gcc.target/arm/atomic-comp-swap-release-acquire.c
>> and some build failures in libstdc++ at the pass manager level (there
>> is an invalid read in gt_ggc_mx)
>> * The build of libraries in Thumb mode still doesn't complete, as
>> Vladimir said the secondary_reload  modification solves LRA cycling
>> but we still have some issues.
>>
>> On AArch64 :
>>
>> * I modified must_be_index_p to avoid the call of set_address_base
>> with patterns which contains a MULT.
>> * The build complete, but there is a couple of regression in the
>> testsuite I'm looking at on
>>  - gcc.c-torture/execute/ieee/fp-cmp-4l.c
>>  - c-c++-common/torture/complex-sign-mul-minus-one.c
>> for instance.
>>
>> Any comments ?
>>
>
> Why are you posting to conflicting sets of patches for rtlanal.c?
> Changes to that file will have to work for all architectures; so while
> it helps a little bit to see which changes are needed for which target,
> ultimately you need one patch for that file that works everywhere.

Yes sorry for that, I've 2 dev branches on my side for AArch32 and
AArch64, and I thought that posting one patch for each will help
people who wants to test one target in particular, but I planned to
make either one patch for switching ARM on LRA at the end or 2 patches
without conflict.

> Also, as a style nit, use
>
>   return (test
>   || test
>   || test);
>
> so that the parenthesis will force correct indentation of continuation
> lines.

Ok, will fix this.

Thanks Richard

>
> R.
>
>> Thanks
>> Yvan
>>
>>
>>
>>
>> On 6 July 2013 01:12, Vladimir Makarov  wrote:
>>> On 13-07-05 8:43 AM, Yvan Roux wrote:
>>>>
>>>> Hi,
>>>>
>>>> for AArch64 it is also needed to take into account SIGN_EXTRACT in the
>>>> set_address_base and set_address_index routines, as we acan encounter
>>>> that kind of insn for instance :
>>>>
>>>> (insn 29 27 5 7 (set (mem:SI (plus:DI (sign_extract:DI (mult:DI
>>>> (subreg:DI (reg/v:SI 76 [ elt ]) 0)
>>>> ...
>>>
>>> OK.
>>>
>>>> with the attached patch and the LRA enabled, compiler now bootstrap
>>>> but I've few regressions in the testsuite,
>>>
>>> It seems ok to me but I am confused of the following change:
>>>
>>>  set_address_base (struct address_info *info, rtx *loc, rtx *inner)
>>>  {
>>> -  if (GET_CODE (*inner) == LO_SUM)
>>> +  if (GET_CODE (*inner) == SIGN_EXTRACT)
>>> +inner = strip_address_mutations (&XEXP (*inner, 0));
>>> +
>>> +  if (GET_CODE (*inner) == LO_SUM || GET_CODE (*inner) == MULT)
>>>
>>>  inner = strip_address_mutations (&XEXP (*inner, 0));
>>>gcc_checking_assert (REG_P (*inner)
>>> || MEM_P (*inner)
>>>
>>> base address should not contain MULT (which you added).  It is controlled by
>>> the result of must_be_index_p.  So set_address_base should not have code for
>>> MULT and you need to change must_be_index_p in a way that set_address_base
>>> is not called for MULT.
>>>
>>>
>>>> gcc.c/torture/execute/fp-cmp-4l.c for instance. I was looking at these
>>>> issues before submitting  a complete AArch64 LRA enabling patch, but
>>>> as you are speaking about that...
>>>>
>>>> Valdimir, for the ARM target I already had the ASHIFTRT and LSHIFTRT
>>>> addition on my side, but still had an ICE during bootstrap with LRA
>>>> when compiling fixed-bit.c (the Max

Re: RFC: patch to build GCC for arm with LRA

2013-09-09 Thread Yvan Roux
Hi,

Thanks for the review.  Here is the fixed self-contained patch to
enable LRA on AAarch32 and AArch64 (for those who want to give a try).
 I'm still working on the issues described previously and will send
rtlanal.c patch separately to prepare the move.

Thanks,
Yvan


On 9 September 2013 01:23, Vladimir Makarov  wrote:
> On 13-09-08 2:04 PM, Richard Sandiford wrote:
>>
>> Yvan Roux  writes:
>>>
>>> @@ -5786,7 +5796,11 @@ get_index_scale (const struct address_info *info)
>>> && info->index_term == &XEXP (index, 0))
>>>   return INTVAL (XEXP (index, 1));
>>>   -  if (GET_CODE (index) == ASHIFT
>>> +  if ((GET_CODE (index) == ASHIFT
>>> +   || GET_CODE (index) == ASHIFTRT
>>> +   || GET_CODE (index) == LSHIFTRT
>>> +   || GET_CODE (index) == ROTATE
>>> +   || GET_CODE (index) == ROTATERT)
>>> && CONST_INT_P (XEXP (index, 1))
>>> && info->index_term == &XEXP (index, 0))
>>>   return (HOST_WIDE_INT) 1 << INTVAL (XEXP (index, 1));
>>
>> This bit doesn't look right.  The scale is only 1 << N for (ashift ... N).
>> I think we have to stick to returning zero for the other codes.
>
> Thanks, Richard.  I missed this.
>
> I am agree it might be a problem although the probability of it is
> negligible as index reg hardly can be equivalent to something + constant
> displacement (except may be case reg+reg when each reg can be base or index
> but in this case we don't use shifts).  Still returning zero is a safe
> solution as it prevents such equivalence substitution completely.
>
> Yvan, could modify this part of code and resend it to Richard Henderson for
> approval.
>
>> The other two (snipped) rtlanal.c hunks like fine to me FWIW.  Maybe now
>> would be a good time to add an "is this a shift code?" predicate though,
>> if we don't have one already.
>>
> Yes, I think we could somehow to promote this info to LRA but I don't think
> it is necessary or urgent.  There is already an alternative solution -- just
> doing nothing as prohibiting equivalence substitution for complicated index
> part hardly worsens the generated code.
>


arm-lra.patch
Description: Binary data


[PATCH, ARM, LRA] Prepare ARM build with LRA

2013-09-09 Thread Yvan Roux
Hi,

here are the modifications, discussed in another thread, needed in
rtlanal.c by ARM targets (AArch32 and AArch64) to build GCC with LRA.

Is it ok for trunk ?

Thanks,
Yvan

2013-09-09  Yvan Roux  
Vladimir Makarov  

* rtlanal.c (must_be_index_p, set_address_index): Add ASHIFTRT,
LSHIFTRT, ROTATE, ROTATERT and SIGN_EXTRACT handling.
(set_address_base): Add SIGN_EXTRACT handling.


arm-lra-rtl.patch
Description: Binary data


Re: RFC: patch to build GCC for arm with LRA

2013-09-09 Thread Yvan Roux
Thanks for noticing it Richard, I made a refactoring mistake and addr
was supposed to be used instead of x.  In fact on AArch64 it occurs
that we don't have stripped rtxes at this step and we have some of the
form below, this if why I added the strip.

(insn 29 27 5 7 (set (mem:SI (plus:DI (sign_extract:DI (mult:DI
(subreg:DI (reg/v:SI 76 [ elt ]) 0)

I'll redo the patch...

Thanks
Yvan

On 9 September 2013 10:58, Richard Sandiford  wrote:
> Richard Sandiford  writes:
>> I think SIGN_EXTRACT from the lsb (i.e. when the third operand is 0 for
>> !BITS_BIG_ENDIAN or GET_MODE_PRECISION (mode) - 1 for BITS_BIG_ENDIAN)
>
> Gah, "GET_MODE_PRECISION (mode) - size" for BITS_BIG_ENDIAN.
>
> Richard


Re: RFC: patch to build GCC for arm with LRA

2013-09-10 Thread Yvan Roux
> Yeah, but that's because strip_address_mutations doesn't consider
> SIGN_EXTRACT to be a "mutation" as things stand.  My point was that
> I think it should, at least for the special extract-from-lsb case.
> It then shouldn't be necessary to handle SIGN_EXTRACT in the other
> address-analysis routines.
>
> (That might be what you meant, sorry, just thought I'd say in case.)

You did well.  I wanted to handle it in strip_address_mutation, but
misread the code and thought that it wasn't called all the time, but
in any case I didn't thought to the endianness issue. I've added
ZERO_EXTRACT too in this treatment, but wonder if for the big endian
case the third operand has to be taken into account, like this:

GET_MODE_PRECISION(mode) - size - pos

Thanks,
Yvan


Re: RFC: patch to build GCC for arm with LRA

2013-09-10 Thread Yvan Roux
> Endianness in the BYTES_BIG_ENDIAN sense shouldn't be a problem AFAIK.
> We just need to worry about BITS_BIG_ENDIAN.  For:
>
>   ({sign,zero}_extract:m X len pos)
>
> "pos" counts from the lsb if !BITS_BIG_ENDIAN and from the msb if
> BITS_BIG_ENDIAN.  So I think the condition should be something like:
>
>   pos == (BITS_BIG_ENDIAN ? GET_MODE_PRECISION (m) - len : 0)
>
> Agreed that it makes sense to handle ZERO_EXTRACT in the same way.

Agreed, here is the new patch which includes the new shifting code
predicate.  I'll post the rtl analysis part in another request.

Thanks,
Yvan


arm-lra.patch
Description: Binary data


Re: [PATCH, ARM, LRA] Prepare ARM build with LRA

2013-09-11 Thread Yvan Roux
New attempt, with fixes from Richard's comments (discussed in the other thread).

Thanks,
Yvan

2013-09-09  Yvan Roux  
Vladimir Makarov  

* rtlanal.c (strip_address_mutations): Add bitfield operations
handling.
(shift_code_p): New predicate for shifting operations.
(must_be_index_p): Add shifting operations handling.
(set_address_index): Likewise.


On 9 September 2013 10:01, Yvan Roux  wrote:
> Hi,
>
> here are the modifications, discussed in another thread, needed in
> rtlanal.c by ARM targets (AArch32 and AArch64) to build GCC with LRA.
>
> Is it ok for trunk ?
>
> Thanks,
> Yvan
>
> 2013-09-09  Yvan Roux  
> Vladimir Makarov  
>
> * rtlanal.c (must_be_index_p, set_address_index): Add ASHIFTRT,
> LSHIFTRT, ROTATE, ROTATERT and SIGN_EXTRACT handling.
> (set_address_base): Add SIGN_EXTRACT handling.


arm-lra-rtl.patch
Description: Binary data


Re: RFC: patch to build GCC for arm with LRA

2013-09-11 Thread Yvan Roux
Yes indeed ! here is a fixed patch.

In strip_address_mutations we now have 3 if/else if statements with
the same body which could be factorized in:

  if ((GET_RTX_CLASS (code) == RTX_UNARY)
  /* Things like SIGN_EXTEND, ZERO_EXTEND and TRUNCATE can be
 used to convert between pointer sizes.  */
  || (lsb_bitfield_op_p (*loc))
  /* Bitfield operations [SIGN|ZERO]_EXTRACT from the least significant
 bit can be used too.  */
  || (code == AND && CONST_INT_P (XEXP (*loc, 1
  /* (and ... (const_int -X)) is used to align to X bytes.  */
loc = &XEXP (*loc, 0);

if you think that it doesn't affect too much the readability.

Many Thanks,
Yvan

On 11 September 2013 09:32, Richard Sandiford
 wrote:
> Yvan Roux  writes:
>> @@ -5454,6 +5454,16 @@ strip_address_mutations (rtx *loc, enum rtx_code 
>> *outer_code)
>>   /* Things like SIGN_EXTEND, ZERO_EXTEND and TRUNCATE can be
>>  used to convert between pointer sizes.  */
>>   loc = &XEXP (*loc, 0);
>> +  else if (GET_RTX_CLASS (code) == RTX_BITFIELD_OPS)
>> + {
>> +   /* Bitfield operations [SIGN|ZERO]_EXTRACT can be used too.  */
>> +   enum machine_mode mode = GET_MODE(*loc);
>> +   unsigned HOST_WIDE_INT len = INTVAL (XEXP (*loc, 1));
>> +   HOST_WIDE_INT pos = INTVAL (XEXP (*loc, 2));
>> +
>> +   if (pos == (BITS_BIG_ENDIAN ? GET_MODE_PRECISION (mode) - len : 0))
>> + loc = &XEXP (*loc, 0);
>> + }
>
> This means that the other values of "pos" bypass the:
>
>   else
> return loc;
>
> so you'll get an infinite loop.  I think it would be neater to split
> this out into:
>
> /* Return true if X is a sign_extract or zero_extract from the least
>significant bit.  */
>
> static bool
> lsb_bitfield_op_p (rtx X)
> {
>   ...;
> }
>
> else if (lsb_bitfield_op_p (*loc))
>   loc = &XEXP (*loc, 0);
>
> Looks good to me otherwise FWIW.
>
> Thanks,
> Richard


arm-lra.patch
Description: Binary data


Re: RFC: patch to build GCC for arm with LRA

2013-09-11 Thread Yvan Roux
> Yeah, good point.  TBH I prefer it with separate ifs though, because the
> three cases are dealing with three different types of rtl (unary, binary
> and ternary).  But I don't mind much either way.

Ok, it's fine for me too.

> The new patch looks good to me, thanks.  Just one minor style nit:
> "return false" rather than "return 0" for the bool.  Maybe also change:
>
> /* Bitfield operations [SIGN|ZERO]_EXTRACT from the least significant
>bit can be used too.  */
>
> to something like:
>
> /* A [SIGN|ZERO]_EXTRACT from the least significant bit effectively
>acts as a combined truncation and extension.  */

Yeah, its clearer.  I'll post the new patch in the other thread.

> I really will try to make that my last comment and leave things open
> for an official review :-)

:-) once again many thanks for your help Richard.

Cheers,
Yvan


Re: [PATCH, ARM, LRA] Prepare ARM build with LRA

2013-09-11 Thread Yvan Roux
Here is the new patch discussed in the other thread.

Thanks
Yvan

2013-09-11  Yvan Roux  
Vladimir Makarov  

* rtlanal.c (lsb_bitfield_op_p): New predicate for bitfield operations
from the least significant bit.
(strip_address_mutations): Add bitfield operations handling.
(shift_code_p): New predicate for shifting operations.
(must_be_index_p): Add shifting operations handling.
(set_address_index): Likewise.


On 11 September 2013 09:00, Yvan Roux  wrote:
> New attempt, with fixes from Richard's comments (discussed in the other 
> thread).
>
> Thanks,
> Yvan
>
> 2013-09-09  Yvan Roux  
> Vladimir Makarov  
>
> * rtlanal.c (strip_address_mutations): Add bitfield operations
> handling.
> (shift_code_p): New predicate for shifting operations.
> (must_be_index_p): Add shifting operations handling.
> (set_address_index): Likewise.
>
>
> On 9 September 2013 10:01, Yvan Roux  wrote:
>> Hi,
>>
>> here are the modifications, discussed in another thread, needed in
>> rtlanal.c by ARM targets (AArch32 and AArch64) to build GCC with LRA.
>>
>> Is it ok for trunk ?
>>
>> Thanks,
>> Yvan
>>
>> 2013-09-09  Yvan Roux  
>> Vladimir Makarov  
>>
>> * rtlanal.c (must_be_index_p, set_address_index): Add ASHIFTRT,
>> LSHIFTRT, ROTATE, ROTATERT and SIGN_EXTRACT handling.
>> (set_address_base): Add SIGN_EXTRACT handling.


arm-lra-rtl.patch
Description: Binary data


Re: [PATCH, ARM, LRA] Prepare ARM build with LRA

2013-09-16 Thread Yvan Roux
Adding Eric and Steven in the loop as it is RTL related.

Thanks
Yvan

On 11 September 2013 21:08, Yvan Roux  wrote:
> Here is the new patch discussed in the other thread.
>
> Thanks
> Yvan
>
> 2013-09-11  Yvan Roux  
> Vladimir Makarov  
>
> * rtlanal.c (lsb_bitfield_op_p): New predicate for bitfield operations
> from the least significant bit.
> (strip_address_mutations): Add bitfield operations handling.
> (shift_code_p): New predicate for shifting operations.
> (must_be_index_p): Add shifting operations handling.
> (set_address_index): Likewise.


Re: RFC: patch to build GCC for arm with LRA

2013-09-23 Thread Yvan Roux
Hi,

here is a status of LRA on ARM, after Vladimir's patch and a rebase of
my branch:

AArch64:

No more issues in the testsuite, the libstdc++ ones weren't LRA
specific, they happened at the pass manager level and due to PCH
inclusion, but were fixed with the trunk update. Thus, I'll post a
patch to enable LRA on AArch64, but it is still needed to have the RTL
analyser patch accepted.

AArch32:

No more issues in libstdc++ as well (same as reasons as AArch64), and
only 3 failures in the testsuite:
- The first one is invalid as the test sans the assembler for
"ldaex\tr\[0-9\]+..." and it fails because with LRA the chosen
register is r12 and thus the instruction is "ldaex  ip,..."
- The two others are the same bug where LRA keeps some REG_NOTES (DEAD
and UNUSED) on a comparison pattern where reload removes then, and the
results is that the comparison is removed.  I'm currently working on
this issue.
- Thumb still doesn't bootstrap.

Thanks,
Yvan


On 11 September 2013 20:57, Yvan Roux  wrote:
>> Yeah, good point.  TBH I prefer it with separate ifs though, because the
>> three cases are dealing with three different types of rtl (unary, binary
>> and ternary).  But I don't mind much either way.
>
> Ok, it's fine for me too.
>
>> The new patch looks good to me, thanks.  Just one minor style nit:
>> "return false" rather than "return 0" for the bool.  Maybe also change:
>>
>> /* Bitfield operations [SIGN|ZERO]_EXTRACT from the least significant
>>bit can be used too.  */
>>
>> to something like:
>>
>> /* A [SIGN|ZERO]_EXTRACT from the least significant bit effectively
>>acts as a combined truncation and extension.  */
>
> Yeah, its clearer.  I'll post the new patch in the other thread.
>
>> I really will try to make that my last comment and leave things open
>> for an official review :-)
>
> :-) once again many thanks for your help Richard.
>
> Cheers,
> Yvan


[PATCH, ARM] Fix assembly scan test.

2013-09-24 Thread Yvan Roux
Hi,

this patch fix the scan-assembler pattern of
gcc.target/arm/atomic-comp-swap-release-acquire.c, which didn't
allowed aliases register and failed  when enabling LRA where 'ip' is
used in the ldaex instruction.

Thanks,
Yvan

2013-09-24  Yvan Roux  

* gcc.target/arm/atomic-comp-swap-release-acquire.c: Correct
scan-assembler pattern to support register aliases.


fix-asm-scan.diff
Description: Binary data


Re: [PATCH, RTL] Prepare ARM build with LRA

2013-09-24 Thread Yvan Roux
Ping

On 16 September 2013 10:57, Yvan Roux  wrote:
> Adding Eric and Steven in the loop as it is RTL related.
>
> Thanks
> Yvan
>
> On 11 September 2013 21:08, Yvan Roux  wrote:
>> Here is the new patch discussed in the other thread.
>>
>> Thanks
>> Yvan
>>
>> 2013-09-11  Yvan Roux  
>> Vladimir Makarov  
>>
>> * rtlanal.c (lsb_bitfield_op_p): New predicate for bitfield 
>> operations
>> from the least significant bit.
>> (strip_address_mutations): Add bitfield operations handling.
>> (shift_code_p): New predicate for shifting operations.
>> (must_be_index_p): Add shifting operations handling.
>> (set_address_index): Likewise.


[PATCH, LRA, AARCH64] Switching LRA on for AArch64

2013-09-24 Thread Yvan Roux
Hi,

The following patch switch LRA on for AArch64.  The patch introduces
an undocumented option -mlra to use LRA instead of  reload, for a
testing purpose.  Please notice that this patch is dependent on the
one submitted in the thread below:

http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00805.html

Thanks,
Yvan

2013-09-24  Yvan Roux  

* config/aarch64/aarch64.opt (mlra): New option.
* config/aarch64/aarch64.c (aarch64_lra_p): New function.
(TARGET_LRA_P): Define.


aarch64-lra.diff
Description: Binary data


Re: RFC: patch to build GCC for arm with LRA

2013-09-24 Thread Yvan Roux
Hi,

> Fair enough - we should just fix the test and move on.

Done.

> I would suggest in addition a transitional command-line option to switch
> between LRA and reload as a temporary measure so that folks can do some more
> experimenting for AArch32.

I've a patch which fixes the REG_NOTE issues, it's still under
validation, but seems to work.  I'll add the -mlra option to switch
between LRA and reload, but my question is, will LRA be enabled by
default or not, as we still have the Thumb issue ? Maybe switching LRA
on only in ARM mode is a good trade off, but in that case what is the
best way to do it ? (I'll pass a full validation when we'll be agree
on that point)

Thanks,
Yvan


Re: [PATCH, ARM, LRA] Prepare ARM build with LRA

2013-09-24 Thread Yvan Roux
Hi Eric,

Thanks for the review.

> +/* Return true if X is a sign_extract or zero_extract from the least
> +   significant bit.  */
> +
> +static bool
> +lsb_bitfield_op_p (rtx x)
> +{
> +  if (GET_RTX_CLASS (GET_CODE (x)) == RTX_BITFIELD_OPS)
> +{
> +  enum machine_mode mode = GET_MODE(x);
> +  unsigned HOST_WIDE_INT len = INTVAL (XEXP (x, 1));
> +  HOST_WIDE_INT pos = INTVAL (XEXP (x, 2));
> +
> +  return (pos == (BITS_BIG_ENDIAN ? GET_MODE_PRECISION (mode) - len :
> 0));
>
> It seems strange to use the destination mode to decide whether this is the LSB
> of the source.

Indeed, I think it has to be the mode of loc, but I just wonder if it
is not always the same, as in the doc it is written that mode m is the
same as the mode that would be used for loc if it were a register.


> +/* Return true if X is a shifting operation.  */
> +
> +static bool
> +shift_code_p (rtx x)
> +{
> +  return (GET_CODE (x) == ASHIFT
> +  || GET_CODE (x) == ASHIFTRT
> +  || GET_CODE (x) == LSHIFTRT
> +  || GET_CODE (x) == ROTATE
> +  || GET_CODE (x) == ROTATERT);
> +}
>
> ROTATE and ROTATERT aren't really shifting operations though, so are they
> really needed here?

According to gcc internals, ROTATE and ROTATERT are similar to the
shifting operations, but to be more accurate maybe we can rename
shif_code_p in shift_and _rotate_code_p rotation are used in arm
address calculation, and thus need to be handle in  must_be_index_p
and set_address_index

Thanks,
Yvan


[PATCH, LRA] Remove REG_DEAD and REG_UNUSED notes.

2013-09-24 Thread Yvan Roux
Hi,

This patch removes REG_DEAD and REG_UNUSED notes in update_inc_notes,
as it is what the function is supposed to do (see the comments) and as
keeping these notes produce some failures, at least on ARM.

Thanks,
Yvan

2013-09-24  Yvan Roux  

* lra.c (update_inc_notes): Remove all REG_DEAD and REG_UNUSED notes.


lra-reg-notes.diff
Description: Binary data


Re: [PATCH, ARM, LRA] Prepare ARM build with LRA

2013-09-24 Thread Yvan Roux
> So can we assert that we have a REG here and use GET_MODE (XEXP (x, 0))?  Or
> else return false if we don't have a REG.

I'm currently testing the patch with the modification below

+static bool
+lsb_bitfield_op_p (rtx x)
+{
+  if (GET_RTX_CLASS (GET_CODE (x)) == RTX_BITFIELD_OPS)
+{
+  enum machine_mode mode = GET_MODE(XEXP (x, 0));
+  unsigned HOST_WIDE_INT len = INTVAL (XEXP (x, 1));
+  HOST_WIDE_INT pos = INTVAL (XEXP (x, 2));
+
+  return (pos == (BITS_BIG_ENDIAN ? GET_MODE_PRECISION (mode) - len : 0));
+}
+  return false;
+}


>> According to gcc internals, ROTATE and ROTATERT are similar to the
>> shifting operations, but to be more accurate maybe we can rename
>> shif_code_p in shift_and _rotate_code_p rotation are used in arm
>> address calculation, and thus need to be handle in  must_be_index_p
>> and set_address_index
>
> Egad.  I guess I just wanted to see it written down. :-)

Sorry, I'm not sure I understand well, it means that you prefer the
shift_and _rotate_code_p notation, right ?

> Btw, are you sure that you don't need to modify must_be_index_p instead?
>
> /* Return true if X must be an index rather than a base.  */
>
> static bool
> must_be_index_p (rtx x)
> {
>   return GET_CODE (x) == MULT || GET_CODE (x) == ASHIFT;
> }
>
> and call it from set_address_index?

Yes, if we don't modify must_be_index_p we'll have failures when
enabling LRA on ARM.  You can find an history if the patch in the
thread named "RFC: patch to build GCC for arm with LRA" which started
more or less here :

http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00526.html

Thanks,
Yvan

> --
> Eric Botcazou


Re: [PATCH, ARM, LRA] Prepare ARM build with LRA

2013-09-24 Thread Yvan Roux
Thanks Eric, here is the new patch, validation is ongoing for ARM.

Yvan

2013-09-24  Yvan Roux  
Vladimir Makarov  

* rtlanal.c (lsb_bitfield_op_p): New predicate for bitfield operations
from the least significant bit.
(strip_address_mutations): Add bitfield operations handling.
(must_be_index_p): Add shifting and rotate operations handling.
(set_address_base): Use must_be_base_p predicate.
(set_address_index):Use must_be_index_p predicate.



On 24 September 2013 17:26, Eric Botcazou  wrote:
>> Sorry, I'm not sure I understand well, it means that you prefer the
>> shift_and _rotate_code_p notation, right ?
>
> Let's do the following in addition to the lsb_bitfield_op_p thing:
>   1. Replace the LO_SUM test in set_address_base by a call to must_be_base_p,
>   2. Replace the MULT || ASHIFT test in set_address_index by a call to
> must_be_index_p,
>   3. Add the new cases to must_be_index_p directly, with a comment saying that
> there are e.g. for the ARM.
>
> --
> Eric Botcazou


rtlanal.patch
Description: Binary data


Re: [PATCH, LRA] Remove REG_DEAD and REG_UNUSED notes.

2013-09-24 Thread Yvan Roux
> The description is too terse.  In the RTL middle-end, you shouldn't have to
> manually deal with the REG_DEAD and REG_UNUSED notes (unlike REG_EQUAL and
> REG_EQUIV notes), as the DF framework is supposed to do it for you.

Sorry, for that.  The description of the LRA function update_inc_notes
explicitly says that it removes all the REG_DEAD and REG_UNUSED notes,
but the code didn't do it, and after the LRA pass we indeed still have
that kind of notes, which wasn't the case with reload, and for
instance with the code below:

int f(int x)
{
  return (x >> (sizeof (x) * __CHAR_BIT__ - 1)) ? -1 : 1;
}

we used to have that kind of insns on ARM:

(insn 8 3 27 2 (parallel [
(set (reg:CC 100 cc)
(compare:CC (reg:SI 0 r0 [ x ])
(const_int 0 [0])))
(set (reg/v:SI 112 [ x ])
(reg:SI 0 r0 [ x ]))
]) cmp-lra.c:4 205 {*movsi_compare0}
 (expr_list:REG_DEAD (reg:SI 0 r0 [ x ])
(expr_list:REG_UNUSED (reg:CC 100 cc)
(nil
(insn 27 8 28 2 (set (reg:CC 100 cc)
(compare:CC (reg/v:SI 112 [ x ])
(const_int 0 [0]))) cmp-lra.c:4 228 {*arm_cmpsi_insn}
 (expr_list:REG_DEAD (reg/v:SI 112 [ x ])
(nil)))
(note 28 27 17 2 NOTE_INSN_DELETED)
(insn 17 28 20 2 (set (reg/i:SI 0 r0)
(if_then_else:SI (ge (reg:CC 100 cc)
(const_int 0 [0]))
(const_int 1 [0x1])
(const_int -1 [0x]))) cmp-lra.c:5 249
{*movsicc_insn}
 (expr_list:REG_DEAD (reg:CC 100 cc)
(nil)))
(insn 20 17 0 2 (use (reg/i:SI 0 r0)) cmp-lra.c:5 -1
 (nil))

with the notes still present after LRA, and the post-reload pass just
drop insns 8 and 27 and the generated code has no more comparison in
it...

Hope it makes more sense, now

Thanks,
Yvan


Re: [PATCH, ARM, LRA] Prepare ARM build with LRA

2013-09-25 Thread Yvan Roux
hmm, I don't see clearly where we loose the XEXP (x, n) information
when calling must_be_base_p(*inner) and/or must_be_index_p(*inner) in
set_address_base and set_address_index.

BTW, the validation on ARM (AARch32 and AARch64) is clean.

Thanks,
Yvan

On 24 September 2013 18:36, Richard Sandiford
 wrote:
> Eric Botcazou  writes:
>>> Sorry, I'm not sure I understand well, it means that you prefer the
>>> shift_and _rotate_code_p notation, right ?
>>
>> Let's do the following in addition to the lsb_bitfield_op_p thing:
>>   1. Replace the LO_SUM test in set_address_base by a call to must_be_base_p,
>>   2. Replace the MULT || ASHIFT test in set_address_index by a call to
>> must_be_index_p,
>>   3. Add the new cases to must_be_index_p directly, with a comment saying 
>> that
>> there are e.g. for the ARM.
>
> FWIW, I'd prefer to keep it as-is, since must_be_base_p (x) and
> must_be_index_p (x) don't imply that we should look specifically at
> XEXP (x, 0) (rather than just X, or XEXP (x, 1), etc.).  I think it's
> better to keep the code tests and the associated XEXPs together.
>
> Thanks,
> Richard


Re: [PATCH, LRA, AARCH64] Switching LRA on for AArch64

2013-09-25 Thread Yvan Roux
Hi,

the needed lra analyser patch was commited as r202914.

Thanks,
Yvan

On 24 September 2013 11:03, Yvan Roux  wrote:
> Hi,
>
> The following patch switch LRA on for AArch64.  The patch introduces
> an undocumented option -mlra to use LRA instead of  reload, for a
> testing purpose.  Please notice that this patch is dependent on the
> one submitted in the thread below:
>
> http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00805.html
>
> Thanks,
> Yvan
>
> 2013-09-24  Yvan Roux  
>
> * config/aarch64/aarch64.opt (mlra): New option.
> * config/aarch64/aarch64.c (aarch64_lra_p): New function.
> (TARGET_LRA_P): Define.


Re: [PATCH, RTL] Prepare ARM build with LRA

2013-09-26 Thread Yvan Roux
(Added Eric and Richard)

Sorry for the inconvenience Iain, It's ok for my side.

Thanks,
Yvan

On 26 September 2013 13:18, Iain Sandoe  wrote:
> Hi Yvan,
>
> On 24 Sep 2013, at 09:29, Yvan Roux wrote:
>
>>> On 11 September 2013 21:08, Yvan Roux  wrote:
>>>> Here is the new patch discussed in the other thread.
>>>>
>>>> Thanks
>>>> Yvan
>>>>
>>>> 2013-09-11  Yvan Roux  
>>>>Vladimir Makarov  
>>>>
>>>>* rtlanal.c (lsb_bitfield_op_p): New predicate for bitfield 
>>>> operations
>>>>from the least significant bit.
>>>>(strip_address_mutations): Add bitfield operations handling.
>>>>(shift_code_p): New predicate for shifting operations.
>>>>(must_be_index_p): Add shifting operations handling.
>>>>(set_address_index): Likewise.
>
> as discussed on irc, this part (applied as r202914) breaks bootstrap on 
> powerpc-darwin9 (and, presumably, other BE targets) with a signed/unsigned 
> compare error at rtlanal.c:5482
>
> below is a trivial patch, which makes both parts of test signed.
> With this, bootstrap completes on powerpc-darwin9 - however, you might want 
> to check that it still does what you intended.
>
> thanks
> Iain
>
> diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
> index 24cbcd2..0349bcc 100644
> --- a/gcc/rtlanal.c
> +++ b/gcc/rtlanal.c
> @@ -5476,7 +5476,7 @@ lsb_bitfield_op_p (rtx x)
>if (GET_RTX_CLASS (GET_CODE (x)) == RTX_BITFIELD_OPS)
>  {
>enum machine_mode mode = GET_MODE (XEXP (x, 0));
> -  unsigned HOST_WIDE_INT len = INTVAL (XEXP (x, 1));
> +  HOST_WIDE_INT len = INTVAL (XEXP (x, 1));
>HOST_WIDE_INT pos = INTVAL (XEXP (x, 2));
>
>return (pos == (BITS_BIG_ENDIAN ? GET_MODE_PRECISION (mode) - len : 
> 0));
>


RFA: Switch on LRA on ARM (AArch32)

2013-10-14 Thread Yvan Roux
Hi,

The status of LRA support for AArch32 is the sequel :
- there is some regressions in the testsuite (gcc/g++, libstdc++ and
fortran) in ARM mode, all due to the same neon legitimate address
issue (tested in hard and softfp mode).
- the compiler doesn't bootstrap with LRA enable for thumb (during
libgcc build).

As stage 1 will be close soon, my first question is can we add the
code to enable LRA before its ending, and fix the issues during stage
2 (according to Vladimir the legitimate address issue is mainly an LRA
bugfix). The attached patch enables LRA by default but disable it for
Thumb and let the opportunity to force LRA with -mlra, but maybe we
can just turn it off by default.

I need your help for the Thumb issue, as suggested by Vladimir I
disabled secondary reload for thumb (with the attached arm.h.diff
patch) to let LRA deal with it and break a cycle, but I now face an
issue exhibited by the attached testcase (thumb-lra.i) and the command
line:

cc1 -O2 -mthumb -mlra thumb-lra.i

Here LRA as to deal with the thumb1_movhi_insn :

(insn 11 5 14 2 (set (reg:HI 0 r0)
(const_int -1318 [0xfada])) /work/tmp/t2.i:14 206
{*thumb1_movhi_insn}
 (nil))

and creates new regs to do it :

 11: r0:HI=r114:HI
Inserting insn reload before:
 18: r115:SI=0xfada
 19: r114:HI=r115:SI#0
  REG_EQUAL 0xfada

 Choosing alt 6 in insn 18:  (0) l  (1) mi {*thumb1_movsi_insn}
 Creating newreg=116 from oldreg=115, assigning class LO_REGS to r116

 18: r116:SI=0xfada
Inserting insn reload after:
 20: r115:SI=r116:SI

Creating newreg=117, assigning class LO_REGS to scratch r117

and during this mov processing, we call gen_thumb_movhi_clobber which
ICE because the first rtx parameter is not a strict memory address but
a register, and here I'm not sure of what to do between fixing this in
the backend or in LRA, and how to do it.

Thanks,
Yvan


arm-lra.diff
Description: Binary data


arm.h.diff
Description: Binary data


thumb-lra.i
Description: Binary data


[COMMITTED] LRA enabling on ARM

2013-10-16 Thread Yvan Roux
Hi,

after the discussion in the thread below I've committed the enabling
of LRA on ARM through the new option -mlra. Notice that we still rely
on reload until the regression has been resolved.

http://gcc.gnu.org/ml/gcc-patches/2013-10/msg00908.html

Thanks,
Yvan

2013-10-16  Yvan Roux  

* config/arm/arm.opt (mlra): New option.
* config/arm/arm.c (arm_lra_p): New function.
(TARGET_LRA_P): Define.


arm-lra.diff
Description: Binary data


Re: [PATCH] [ARM] Fix PR57909 : ICE with internal memcpy and -mno-unaligned-access

2013-10-18 Thread Yvan Roux
Ping^2

I forgot this one was still pending.

On 13 August 2013 14:21, Yvan Roux  wrote:
> Ping.
>
> On 23 July 2013 16:18, Yvan Roux  wrote:
>> Hi,
>>
>> I forgot to add the test case with the PR fix, the attached patch add it.
>>
>> Thanks,
>> Yvan
>>
>> ChangeLog
>>
>> gcc/testsuite
>>
>> 2013-07-23  Yvan Roux  
>>
>> PR target/57909
>> * gcc.target/arm/pr57909.c: New test.
>>
>>
>> On 17 July 2013 10:58, Ramana Radhakrishnan  wrote:
>>> On 07/17/13 09:53, Yvan Roux wrote:
>>>>
>>>> Hi,
>>>>
>>>> this patch fixes the issue described in PR57909, where we have an ICE
>>>> during the internal memcpy, as some UNSPEC_UNALIGNED insns are emitted
>>>> even if -mno-unaligned-access flag is passed. See the link below for a
>>>> more detailled description:
>>>>
>>>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57909
>>>>
>>>> Tested without any regression.
>>>>
>>>> Thanks,
>>>> Yvan
>>>>
>>>> ChangeLog
>>>>
>>>> gcc/
>>>>
>>>> 2013-07-17  Yvan Roux  
>>>>
>>>>  PR target/57909
>>>>  * config/arm/arm.c (gen_movmem_ldrd_strd): Fix unaligned
>>>> load/store
>>>>  usage in HI mode.
>>>>
>>>
>>> Ok.
>>>
>>> regards
>>> Ramana
>>>
>>>


pr57909-test.diff
Description: Binary data


Re: patch to fix PR58784 (ARM LRA crash)

2013-10-31 Thread Yvan Roux
(this time in plain text !)

> Does this mean we can now turn on LRA for ARM state by default and
> start looking at performance issues if any ?

With the other patch for 58785 and a small one I've to post, we are even
about to turn LRA on by default completely, as the compiler now bootstrap
also in thumb and first testsuite are clean :-)

I juste want ton pass a full validation before.

Yvan


>> Ramana
>>
>>
>>
>>
>>
>> > 2013-10-30  Vladimir Makarov  
>> >
>> > PR target/58784
>> > * lra.c (check_rtl): Remove address check before LRA work.
>> >
>> > 2013-10-30  Vladimir Makarov  
>> >
>> > PR target/58784
>> > * gcc.target/arm/pr58784.c: New.
>> >


Re: [PATCH][AArch64] Remove an unused reload hook.

2016-02-25 Thread Yvan Roux
Hi,

On 26 January 2015 at 18:01, Matthew Wahab  wrote:
> Hello,
>
> The LEGITIMIZE_RELOAD_ADDRESS macro is only needed for reload. Since the
> Aarch64 backend no longer supports reload, this macro is not needed and this
> patch removes it.
>
> Tested aarch64-none-linux-gnu with gcc-check. No new failures.
>
> Ok for trunk?
> Matthew
>
> gcc/
> 2015-01-26  Matthew Wahab  
>
> * config/aarch64/aarch64.h (LEGITIMIZE_RELOAD_ADDRESS): Remove.
> * config/aarch64/arch64-protos.h
> (aarch64_legitimize_reload_address): Remove.
> * config/aarch64/aarch64.c (aarch64_legitimize_reload_address):
> Remove.

It seems that this old patch was forgotten, I guess that it'll have to
wait for GCC 7 now, but I think it's a good thing top cleanup the
reload specific hooks and constructions (I've another patch on for
that under on-going).

Cheers,
Yvan


Re: [PATCH, ARM, LRA] Fixed bootstrap failure in Thumb mode

2013-11-27 Thread Yvan Roux
Ping

On 18 November 2013 09:40, Yvan Roux  wrote:
> Ping.
>
> On 7 November 2013 15:56, Yvan Roux  wrote:
>> Hi,
>>
>> this patch fixed an LRA cycling due to secondary reload (Thumb mode).
>> Notice that this patch is a prerequisite to turn on LRA by default on
>> ARM.  Bootstrapped on a9 and a15 without any regression in the
>> testsuite as LRA is off by default and with the regression reported in
>> the thread bellow when LRA is on.
>>
>> http://gcc.gnu.org/ml/gcc-patches/2013-11/msg00725.html
>>
>> Thanks,
>> Yvan
>>
>> 2013-11-07  Yvan Roux  
>>
>> * config/arm/arm.h (THUMB_SECONDARY_INPUT_RELOAD_CLASS): Return 
>> NO_REGS
>> for LRA.


Re: [patch][RFC] make lra.c:check_rtl set maybe_hot_insn_p

2013-11-27 Thread Yvan Roux
Ping.

On 19 November 2013 09:52, Yvan Roux  wrote:
>> yep, all good performance-wise :)
>
> Great, Thanks Kyrill.
>
> Ok for trunk ?
>
> Yvan


Re: RFA: patch to fix PR58785 (an ARM LRA crash)

2013-11-27 Thread Yvan Roux
Ping.

On 20 November 2013 10:22, Yvan Roux  wrote:
> Hi,
>
> as Richard said, only a subset of rclass is allowed to be returned by
> preferred_reload_class.  I've tested the attached patched in Thumb
> mode, on ARMv5, A9 and A9hf and on cross A15 without regression.
>
> Yvan
>
> 2013-11-20  Yvan Roux  
>
> PR target/58785
> * config/arm/arm.c (arm_preferred_reload_class): Only return LO_REGS
> when rclass is GENERAL_REGS.


Re: [patch][RFC] make lra.c:check_rtl set maybe_hot_insn_p

2013-11-27 Thread Yvan Roux
> Please include either the patch you are pinging or at the least a link to it
> in the archives.

Ok, sorry for that, here is the patch and Changelog

Yvan


2013-11-17  Yvan Roux  

* config/arm/arm.md (store_minmaxsi): Use only when
optimize_function_for_size_p.
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index e8d5464..da387fb 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -3642,7 +3642,7 @@
 [(match_operand:SI 1 "s_register_operand" "r")
  (match_operand:SI 2 "s_register_operand" "r")]))
(clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT && optimize_insn_for_size_p()"
+  "TARGET_32BIT && optimize_function_for_size_p (cfun)"
   "*
   operands[3] = gen_rtx_fmt_ee (minmax_code (operands[3]), SImode,
operands[1], operands[2]);


Re: [PATCH, ARM, LRA] Fixed bootstrap failure in Thumb mode

2013-11-27 Thread Yvan Roux
> How can that be correct?
>
> The secondary reload macros/hooks define cases where additional registers
> are needed to reload certain forms of rtl.  I doubt the use of LRA
> completely eliminates the need for secondary reloads.

Vladimir explained me that in that case on arm, secondary reload hook
confuses LRA, and that returning NO_REGS will let LRA deal with
constraints, but I may have badly understand what he said.

Yvan


Re: [PATCH, ARM, LRA] Fixed bootstrap failure in Thumb mode

2013-11-27 Thread Yvan Roux
On 27 November 2013 18:58, Vladimir Makarov  wrote:
> On 11/27/2013, 12:16 PM, Jeff Law wrote:
>>
>> On 11/27/13 03:18, Yvan Roux wrote:
>>>
>>> Ping
>>>
>>> On 18 November 2013 09:40, Yvan Roux  wrote:
>>>>
>>>> Ping.
>>>>
>>>> On 7 November 2013 15:56, Yvan Roux  wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> this patch fixed an LRA cycling due to secondary reload (Thumb mode).
>>>>> Notice that this patch is a prerequisite to turn on LRA by default on
>>>>> ARM.  Bootstrapped on a9 and a15 without any regression in the
>>>>> testsuite as LRA is off by default and with the regression reported in
>>>>> the thread bellow when LRA is on.
>>>>>
>>>>> http://gcc.gnu.org/ml/gcc-patches/2013-11/msg00725.html
>>>>>
>>>>> Thanks,
>>>>> Yvan
>>>>>
>>>>> 2013-11-07  Yvan Roux  
>>>>>
>>>>>  * config/arm/arm.h (THUMB_SECONDARY_INPUT_RELOAD_CLASS):
>>>>> Return NO_REGS
>>>>>  for LRA
>>
>> ?
>>
>> How can that be correct?
>>
>> The secondary reload macros/hooks define cases where additional registers
>> are needed to reload certain forms of rtl.  I doubt the use of LRA
>> completely eliminates the need for secondary reloads.
>
>   When I designed LRA I wanted to remove as many hooks as possible because I
> thought insn constraints as major info source are enough. Unfortunately I
> did not succeed fully with secondary reload hooks and macros.  It is still
> needed for some complicated cases but general rule to port LRA to a target
> is to try to switch these hooks off.
>
>   For example, LRA can generate a move of pseudos of classes for which
> hardware has no real insn.  After that looking on the move constraints,  LRA
> can generate more move insns and additional pseudos to generate moves (loads
> or stores if additional pseudos got NO_REGS) which represent real hardware
> insns.  In complicated cases when these macros are still needed, LRA runs
> into infinite loop of such move generation if the macros are not used.
>
>   I might be return to a project to remove necessity of such hooks and
> macros for LRA but I am not sure when.  I guess I need to write a small
> guidance too how to port LRA.
>
>   Also I found that in many cases as here, the macros although working for
> reload can confuse LRA (because of different reload pass and LRA
> implementation approaches).  So I guess the patch is ok although I can not
> approve it as I am not an arm port maintainer.

Thanks for the clarification Vladimir, and BTW I just find the same
cycling issue on iWMMXT.

Yvan


Re: [patch][RFC] make lra.c:check_rtl set maybe_hot_insn_p

2013-11-27 Thread Yvan Roux
On 27 November 2013 19:13, Jeff Law  wrote:
> On 11/27/13 10:30, Yvan Roux wrote:
>>>
>>> Please include either the patch you are pinging or at the least a link to
>>> it
>>> in the archives.
>>
>>
>> Ok, sorry for that, here is the patch and Changelog
>>
>> Yvan
>>
>>
>> 2013-11-17  Yvan Roux  
>>
>>  * config/arm/arm.md (store_minmaxsi): Use only when
>>  optimize_function_for_size_p.
>
> Thanks.
>
> This is fine for the trunk.  And yes, having an insn's validity change based
> on what block it's in is most definitely bad.
>
> I was a bit concerned have the x86 backend, as I happened to know it uses
> optimize_insn_for_* rather extensively.  But thankfully it's only used in
> splitters, expanders & peephole2 patterns, which should all be safe.

I did the same on ARM and only found one in a peephole2 pattern.

> If you wouldn't mind, could you look at md.texi and see if there's a
> reasonable place to put verbage about this issue in the internals manual?
> It might save someone time in the future :-)

Sure, I'll look at that.

Thanks,
Yvan

> Thanks,
> Jeff
>


Re: RFA: patch to fix PR58785 (an ARM LRA crash)

2013-11-27 Thread Yvan Roux
> I don't think he was quite that definitive.  "One reading of the manual
> suggests ...".  However, Richard's interpretation is the same as I've had
> for eons.  You can return the original class, a narrower class or NO_REGS.

Yes, I also read the manual and discussed the patch with Richard
offline, but was a bit lazy in the description ;)

>   I've tested the attached patched in Thumb
>>>
>>> mode, on ARMv5, A9 and A9hf and on cross A15 without regression.
>>>
>>> Yvan
>>>
>>> 2013-11-20  Yvan Roux  
>>>
>>>  PR target/58785
>>>  * config/arm/arm.c (arm_preferred_reload_class): Only return
>>> LO_REGS
>>>  when rclass is GENERAL_REGS.
>
> This is fine for the trunk as it implements Richard's recommendation and
> brings the code into compliance with the semantics of P_R_C.

I'll also add the testcase I put in bugzilla and that was in Vlad's
first patch, if it's ok for you.

Yvan


Re: [PATCH, ARM, LRA] Fixed bootstrap failure in Thumb mode

2013-11-27 Thread Yvan Roux
On 27 November 2013 19:27, Jeff Law  wrote:
> On 11/27/13 10:49, Yvan Roux wrote:
>>>
>>> How can that be correct?
>>>
>>> The secondary reload macros/hooks define cases where additional registers
>>> are needed to reload certain forms of rtl.  I doubt the use of LRA
>>> completely eliminates the need for secondary reloads.
>>
>>
>> Vladimir explained me that in that case on arm, secondary reload hook
>> confuses LRA, and that returning NO_REGS will let LRA deal with
>> constraints, but I may have badly understand what he said.
>
> So I think with the additional information from Vlad, I think we can go
> forward with your patch -- conditionally approved for the trunk.  The
> condition is giving the ARM maintainers 24hrs to object.
>
> I'm concerned that we don't have good documentation on when these macros are
> still needed in an LRA world.  So it's much more difficult for anyone to
> know if a change of this nature is correct or not.  As you've likely seen
> from my other replies in this thread, I've asked Vlad to work on the
> documentation aspects.

Ok great, as I may play again with these macros to fix the iWMMXT issue.

Yvan


Re: patch for elimination to SP when it is changed in RTL (PR57293)

2013-12-11 Thread Yvan Roux
Hi Vladimir,

I've some regressions on ARM after this SP elimination patch, and they
are execution failures.  Here is the list:

g++.dg/cilk-plus/AN/array_test_ND_tplt.cc  -O3 -fcilkplus
gcc.c-torture/execute/va-arg-22.c  -O2
gcc.dg/atomic/c11-atomic-exec-5.c  -O0
gfortran.dg/direct_io_12.f90  -O[23]
gfortran.dg/elemental_dependency_1.f90  -O2
gfortran.dg/matmul_2.f90  -O2
gfortran.dg/matmul_6.f90  -O2
gfortran.dg/mvbits_7.f90  -O3
gfortran.dg/unlimited_polymorphic_1.f03  -O3

I reduced and looked at var-arg-22.c and the issue is that in
lra_eliminate_regs_1  (called by get_equiv_with_elimination) we
transformed sfp + 0x4c in sp + 0xfc because of a bad sp offset.  What
we try to do here is to change the pseudo 195 of the insn 118 below :

(insn 118 114 112 8 (set (reg:DI 195)
(unspec:DI [
(mem:DI (plus:SI (reg/f:SI 215)
(const_int 8 [0x8])) [7 MEM[(struct A35 *)_12
+ 64B]+8 S8 A8])
] UNSPEC_UNALIGNED_LOAD)) v2.c:49 146 {unaligned_loaddi}
 (expr_list:REG_EQUIV (mem/c:DI (plus:SI (reg/f:SI 192)
(const_int 8 [0x8])) [7 a35+8 S8 A32])
(nil)))

with its equivalent (x arg of lra_eliminate_regs_1):

(mem/c:DI (plus:SI (reg/f:SI 102 sfp)
(const_int 76 [0x4c])) [7 a35+8 S8 A32])

lra_eliminate_regs_1 is called with full_p = true (it is not really
clear for what it means), but in the PLUS switch case, we have offset
= 0xb (given by ep->offset) and  as lra_get_insn_recog_data
(insn)->sp_offset value is 0, we will indeed add 0xb to the original
0x4c offset.

So, here I don't get if it is the sp_offset value of the
lra_insn_recog_data element which is not well updated or if lra_
eliminate_regs_1 has to be called with update_p and not full_p (which
fixed the value in that particular case).  Is it more obvious for you
?

Thanks
Yvan


On 3 December 2013 16:39, Vladimir Makarov  wrote:
> On 12/3/2013, 6:54 AM, Marcus Shawcroft wrote:
>>
>> On 2 December 2013 23:44, Vladimir Makarov  wrote:
>>
>>> If somebody with the rights approves, I can commit it tomorrow.
>>>
>>> 2013-12-02  Vladimir Makarov  
>>>
>>>  * config/aarch64/aarch64.c (aarch64_frame_pointer_required):
>>> Check
>>>  LR_REGNUM.
>>>  (aarch64_can_eliminate): Don't check elimination source when
>>>  frame_pointer_requred is false.
>>>
>>
>>
>> This is fine with me, go ahead and commit it.  Thanks /Marcus
>>
> Committed as rev. 205637 with changelog fix of a typo found by Jeff.
>


Re: patch for elimination to SP when it is changed in RTL (PR57293)

2013-12-11 Thread Yvan Roux
> Pragmatically, I think it's time we turned LRA on by default now that
> we are in stage3 and that would help with getting more issues out of
> the auto-testers quicker than anything else. Given we are now well
> into stage3, we should make sure that the LRA support gets as much
> testing as it can get in the run-up to the release.

I agree.

> Can you prepare a patch for this please ?

I'll post the patch on the list.

Thanks,
Yvan


[PATCH, ARM, LRA] Switch on LRA on ARM.

2013-12-11 Thread Yvan Roux
Hi,

here is the patch to turn LRA on by default on ARM, there is still
some regressions in the testsuite, as reported in the thread below,
but as Ramana said in the same thread, we now need to find the
remaining issue as fast as possible.

http://gcc.gnu.org/ml/gcc-patches/2013-12/msg01074.html

Thanks
Yvan


2013-12-11  Yvan Roux  

* config/arm/arm.opt (mlra): Enable LRA by default.
Index: gcc/config/arm/arm.opt
===
--- gcc/config/arm/arm.opt  (revision 205885)
+++ gcc/config/arm/arm.opt  (working copy)
@@ -144,7 +144,7 @@
 Specify the name of the target floating point hardware/format
 
 mlra
-Target Report Var(arm_lra_flag) Init(0) Save
+Target Report Var(arm_lra_flag) Init(1) Save
 Use LRA instead of reload (transitional)
 
 mhard-float


Re: patch for elimination to SP when it is changed in RTL (PR57293)

2013-12-11 Thread Yvan Roux
On 11 December 2013 19:25, Vladimir Makarov  wrote:
> On 12/11/2013, 5:35 AM, Yvan Roux wrote:
>>
>> Hi Vladimir,
>>
>> I've some regressions on ARM after this SP elimination patch, and they
>> are execution failures.  Here is the list:
>>
>> g++.dg/cilk-plus/AN/array_test_ND_tplt.cc  -O3 -fcilkplus
>> gcc.c-torture/execute/va-arg-22.c  -O2
>> gcc.dg/atomic/c11-atomic-exec-5.c  -O0
>> gfortran.dg/direct_io_12.f90  -O[23]
>> gfortran.dg/elemental_dependency_1.f90  -O2
>> gfortran.dg/matmul_2.f90  -O2
>> gfortran.dg/matmul_6.f90  -O2
>> gfortran.dg/mvbits_7.f90  -O3
>> gfortran.dg/unlimited_polymorphic_1.f03  -O3
>>
>> I reduced and looked at var-arg-22.c and the issue is that in
>> lra_eliminate_regs_1  (called by get_equiv_with_elimination) we
>> transformed sfp + 0x4c in sp + 0xfc because of a bad sp offset.  What
>> we try to do here is to change the pseudo 195 of the insn 118 below :
>>
>> (insn 118 114 112 8 (set (reg:DI 195)
>>  (unspec:DI [
>>  (mem:DI (plus:SI (reg/f:SI 215)
>>  (const_int 8 [0x8])) [7 MEM[(struct A35 *)_12
>> + 64B]+8 S8 A8])
>>  ] UNSPEC_UNALIGNED_LOAD)) v2.c:49 146 {unaligned_loaddi}
>>   (expr_list:REG_EQUIV (mem/c:DI (plus:SI (reg/f:SI 192)
>>  (const_int 8 [0x8])) [7 a35+8 S8 A32])
>>  (nil)))
>>
>> with its equivalent (x arg of lra_eliminate_regs_1):
>>
>> (mem/c:DI (plus:SI (reg/f:SI 102 sfp)
>>  (const_int 76 [0x4c])) [7 a35+8 S8 A32])
>>
>> lra_eliminate_regs_1 is called with full_p = true (it is not really
>> clear for what it means),
>
>
> It means we use full offset between the regs, otherwise we use change in the
> full offset from the previous iteration (it can be changed as we reserve
> stack memory for spilled pseudos and the reservation can be done several
> times).  As equiv value is stored as it was before any elimination, we need
> always to use full offset to make elimination.

Ok thanks it's clearer.

>  but in the PLUS switch case, we have offset
>>
>> = 0xb (given by ep->offset) and  as lra_get_insn_recog_data
>> (insn)->sp_offset value is 0, we will indeed add 0xb to the original
>> 0x4c offset.
>>
>
> 0 value is suspicious because it is default.  We might have not set up it
> from neighbor insns.
>
>
>
>> So, here I don't get if it is the sp_offset value of the
>> lra_insn_recog_data element which is not well updated or if lra_
>> eliminate_regs_1 has to be called with update_p and not full_p (which
>> fixed the value in that particular case).  Is it more obvious for you
>> ?
>>
>
> Yvan, could you send me the reduced preprocessed case and the options for
> cc1 to reproduce it.


Here is cc1 command line :

cc1 -quiet -march=armv7-a -mtune=cortex-a15 -mfloat-abi=hard
-mfpu=neon -mthumb  v2.c  -O2

I use a native build on a chromebook, but it's reproducible with a
cross compiler.

With the attached test case the issue is when processing insn 118.

Thanks,
Yvan
typedef __builtin_va_list __gnuc_va_list;
typedef __gnuc_va_list va_list;

extern void abort (void);
extern void exit (int);


void bar (int n, int c)
{
  static int lastn = -1, lastc = -1;

  if (lastn != n) {
if (lastc != lastn)
  abort ();
lastc = 0;
lastn = n;
  }

  if (c != (char) (lastc ^ (n << 3)))
abort ();
  lastc++;
}

 typedef struct { char x[31]; } A31;
 typedef struct { char x[32]; } A32;
 typedef struct { char x[35]; } A35;
 typedef struct { char x[72]; } A72;

void foo (int size, ...)
{
 A31 a31;
 A32 a32;
 A35 a35;
 A72 a72;


  va_list ap;

  int i;

  if (size != 21) abort ();

  __builtin_va_start(ap,size);

 a31 = __builtin_va_arg(ap,typeof (a31));
 for (i = 0; i < 31; i++) bar (31, a31.x[i]);
 a32 = __builtin_va_arg(ap,typeof (a32));
 for (i = 0; i < 32; i++) bar (32, a32.x[i]);
 a35 = __builtin_va_arg(ap,typeof (a35));
 for (i = 0; i < 35; i++) bar (35, a35.x[i]);
 a72 = __builtin_va_arg(ap,typeof (a72));
 for (i = 0; i < 72; i++) bar (72, a72.x[i]);

  __builtin_va_end(ap);

}

int main (void)
{
 A31 a31;
 A32 a32;
 A35 a35;
 A72 a72;
 int i;
 for (i = 0; i < 31; i++) a31.x[i] = i ^ (31 << 3);
 for (i = 0; i < 32; i++) a32.x[i] = i ^ (32 << 3);
 for (i = 0; i < 35; i++) a35.x[i] = i ^ (35 << 3);
 for (i = 0; i < 72; i++) a72.x[i] = i ^ (72 << 3);

  foo (21, a31, a32, a35, a72);
  exit (0);

}


Re: patch for elimination to SP when it is changed in RTL (PR57293)

2013-12-13 Thread Yvan Roux
Thanks for your help Vlad.  Another bad news about this PR fix, is
that it has resurrected the thumb_movhi_clobber bug (PR 58785) but in
a different manner as the original failing testcase still pass.  I
attached a testcase to be compiled with :

cc1 -mthumb -mcpu=cortex-m0 -O2 m.c

And Thumb bootstrap seems to be broken with an ICE in check_rtl, I'm
checking if it is the same issue.

Yvan


On 12 December 2013 20:18, Vladimir Makarov  wrote:
> On 12/11/2013, 1:59 PM, Yvan Roux wrote:
>>
>> On 11 December 2013 19:25, Vladimir Makarov  wrote:
>>>
>>> On 12/11/2013, 5:35 AM, Yvan Roux wrote:
>>>>
>>>>
>>>> Hi Vladimir,
>>>>
>>>> I've some regressions on ARM after this SP elimination patch, and they
>>>> are execution failures.  Here is the list:
>>>>
>>>> g++.dg/cilk-plus/AN/array_test_ND_tplt.cc  -O3 -fcilkplus
>>>> gcc.c-torture/execute/va-arg-22.c  -O2
>>>> gcc.dg/atomic/c11-atomic-exec-5.c  -O0
>>>> gfortran.dg/direct_io_12.f90  -O[23]
>>>> gfortran.dg/elemental_dependency_1.f90  -O2
>>>> gfortran.dg/matmul_2.f90  -O2
>>>> gfortran.dg/matmul_6.f90  -O2
>>>> gfortran.dg/mvbits_7.f90  -O3
>>>> gfortran.dg/unlimited_polymorphic_1.f03  -O3
>>>>
>>>> I reduced and looked at var-arg-22.c and the issue is that in
>>>> lra_eliminate_regs_1  (called by get_equiv_with_elimination) we
>>>> transformed sfp + 0x4c in sp + 0xfc because of a bad sp offset.  What
>>>> we try to do here is to change the pseudo 195 of the insn 118 below :
>>>>
>>>> (insn 118 114 112 8 (set (reg:DI 195)
>>>>   (unspec:DI [
>>>>   (mem:DI (plus:SI (reg/f:SI 215)
>>>>   (const_int 8 [0x8])) [7 MEM[(struct A35 *)_12
>>>> + 64B]+8 S8 A8])
>>>>   ] UNSPEC_UNALIGNED_LOAD)) v2.c:49 146 {unaligned_loaddi}
>>>>(expr_list:REG_EQUIV (mem/c:DI (plus:SI (reg/f:SI 192)
>>>>   (const_int 8 [0x8])) [7 a35+8 S8 A32])
>>>>   (nil)))
>>>>
>>>> with its equivalent (x arg of lra_eliminate_regs_1):
>>>>
>>>> (mem/c:DI (plus:SI (reg/f:SI 102 sfp)
>>>>   (const_int 76 [0x4c])) [7 a35+8 S8 A32])
>>>>
>>>> lra_eliminate_regs_1 is called with full_p = true (it is not really
>>>> clear for what it means),
>>>
>>>
>>>
>>> It means we use full offset between the regs, otherwise we use change in
>>> the
>>> full offset from the previous iteration (it can be changed as we reserve
>>> stack memory for spilled pseudos and the reservation can be done several
>>> times).  As equiv value is stored as it was before any elimination, we
>>> need
>>> always to use full offset to make elimination.
>>
>>
>> Ok thanks it's clearer.
>>
>>>   but in the PLUS switch case, we have offset
>>>>
>>>>
>>>> = 0xb (given by ep->offset) and  as lra_get_insn_recog_data
>>>> (insn)->sp_offset value is 0, we will indeed add 0xb to the original
>>>> 0x4c offset.
>>>>
>>>
>>> 0 value is suspicious because it is default.  We might have not set up it
>>> from neighbor insns.
>>>
>>>
>>>
>>>> So, here I don't get if it is the sp_offset value of the
>>>> lra_insn_recog_data element which is not well updated or if lra_
>>>> eliminate_regs_1 has to be called with update_p and not full_p (which
>>>> fixed the value in that particular case).  Is it more obvious for you
>>>> ?
>>>>
>>>
>>> Yvan, could you send me the reduced preprocessed case and the options for
>>> cc1 to reproduce it.
>>
>>
>>
>> Here is cc1 command line :
>>
>> cc1 -quiet -march=armv7-a -mtune=cortex-a15 -mfloat-abi=hard
>> -mfpu=neon -mthumb  v2.c  -O2
>>
>> I use a native build on a chromebook, but it's reproducible with a
>> cross compiler.
>>
>> With the attached test case the issue is when processing insn 118.
>
>
> The offset is updated two times and that is wrong.  That is because memory
> in init insn is shared by ira_reg_equiv and the test involves 2 equivalent
> substitutions.  As I wrote equiv should be stored in original form by the
> current patch design.  Simple copying will not work as the first
> substitution is not done in this case.
>
> I need some time t

Re: patch to fix arm testsuite regression

2013-12-16 Thread Yvan Roux
Thanks for the fix Vladimir, I confirm that most of ARM regression are
fixed, and Thumb2 bootstrap is ok. I'm still working on sorting out
the remaining regressions, most of them are on Thumb1.

Thanks
Yvan

On 13 December 2013 22:10, Vladimir Makarov  wrote:
> On 12/13/2013, 4:02 PM, Jakub Jelinek wrote:
>>
>> On Fri, Dec 13, 2013 at 03:48:25PM -0500, Vladimir Makarov wrote:
>>>
>>> 2013-12-12  Vladimir Makarov  
>>>
>>>  * ira.h (struct ira_reg_equiv): Rename to ira_reg_equiv_s.
>>>  * ira.c: Ditto.
>>>  * lra-int.h (lra_init_equiv): New prototype.
>>>  * lra-constraints.c (lra_init_equiv, update_equiv): New
>>> functions.
>>>  (loc_equivalence_callback): Use the 3rd arg.
>>>  (lra_constraints): Update equivalences.  Pass curr_insn to
>>>  simplify_replace_fn_rtx.
>>>  * lra.c (lra): Call lra_init_equiv.
>>
>>
>>> @@ -2988,12 +2988,12 @@
>>>   return;
>>> ira_reg_equiv_len = max_reg_num () * 3 / 2 + 1;
>>> ira_reg_equiv
>>> -= (struct ira_reg_equiv *) xrealloc (ira_reg_equiv,
>>> += (struct ira_reg_equiv_s *) xrealloc (ira_reg_equiv,
>>>  ira_reg_equiv_len
>>> -* sizeof (struct
>>> ira_reg_equiv));
>>> +* sizeof (struct
>>> ira_reg_equiv_s));
>>
>>
>>   ira_reg_equiv = XRESIZEVEC (struct ira_reg_equiv_s, ira_reg_equiv,
>>  ira_reg_equiv_len);
>>
>> instead?
>>
> Yes, it is shorter.  Thanks, Jakub.  I'll commit it in my next patch.
>
>


Re: patch for elimination to SP when it is changed in RTL (PR57293)

2013-12-17 Thread Yvan Roux
On 17 December 2013 00:03, Vladimir Makarov  wrote:
> On 12/13/2013, 8:07 AM, Yvan Roux wrote:
>>
>> Thanks for your help Vlad.  Another bad news about this PR fix, is
>> that it has resurrected the thumb_movhi_clobber bug (PR 58785) but in
>> a different manner as the original failing testcase still pass.  I
>> attached a testcase to be compiled with :
>>
>> cc1 -mthumb -mcpu=cortex-m0 -O2 m.c
>>
>> And Thumb bootstrap seems to be broken with an ICE in check_rtl, I'm
>> checking if it is the same issue.
>>
>
> The compiler crashes because a reload pattern is trying to take address of
> memory which is actually a spilled pseudo for LRA.  The pattern is designed
> for reload which always uses memory not a spilled pseudo as LRA does.
>
> But we don't need to adjust the pattern for LRA.  LRA can manage by itself
> without reload patterns.
>
> I found that I missed to switch off these patterns for LRA fully.  The
> following patch solves the problem.  The same was done for
> THUMB_SECONDARY_INPUT_RELOAD_CLASS long ago.
>
> Yvan, could go from this patch by yourself.  I mean testing and getting its
> approval from an ARM maintainer.  Thanks.


I remember having tested that very same patch when we changed
THUMB_SECONDARY_INPUT_RELOAD_CLASS and having build issues, but with
the fixes made since the summer, the build and the testsuite are now
ok.  Before submitting this patch I wanted to check if  it is not more
general fix which is needed, and modifying
SECONDARY_OUTPUT_RELOAD_CLASS and SECONDARY_INPUT_RELOAD_CLASS instead
of the THUMB macros, because this is here that the target IWMMXT is
handled and that we have the lra loop issue during the constraint
solving for that target.  First results shows that it fixes also some
Thumb1 regressions, but I don't have the full results for the moment.

Thanks
Yvan


[Patch, ARM, LRA] Fix Thumb1 ICE

2013-12-18 Thread Yvan Roux
Hi,

this patch from Vladimir fixes an ICE when compiling newlib in Thumb1.
 It returns NO_REGS in THUMB_SECONDARY_OUTPUT_RELOAD_CLASS, the same
way we did for THUMB_SECONDARY_INPUT_RELOAD_CLASS.

The testsuite is OK with this patch, but as we have also a regression
on iWMMXT, I tried to avoid the secondary reload restriction at a
higher level : in SECONDARY_[INPUT|OUTPUT]_RELOAD_CLASS, as these
macros handle the iWMMXT target.  Unfortunately it doesn't fix the
issue, but the testsuite results are the same as with the attached
patch.

It seems to me that this second solution is more LRA friendly (i.e.
doing less thing on the target side) but I want your opinion.

If the Thumb fix is sufficient, here is the Changelog

2013-12-18  Vladimir Makarov  

* config/arm/arm.h (THUMB_SECONDARY_OUTPUT_RELOAD_CLASS): Return NO_REGS
for LRA.

Thanks,
Yvan
Index: config/arm/arm.h
===
--- config/arm/arm.h(revision 206023)
+++ config/arm/arm.h(working copy)
@@ -1285,11 +1285,12 @@ enum reg_class
   : NO_REGS))
 
 #define THUMB_SECONDARY_OUTPUT_RELOAD_CLASS(CLASS, MODE, X)\
-  ((CLASS) != LO_REGS && (CLASS) != BASE_REGS  \
-   ? ((true_regnum (X) == -1 ? LO_REGS \
-   : (true_regnum (X) + HARD_REGNO_NREGS (0, MODE) > 8) ? LO_REGS  \
-   : NO_REGS)) \
-   : NO_REGS)
+  (lra_in_progress ? NO_REGS   \
+   : (CLASS) != LO_REGS && (CLASS) != BASE_REGS
\
+  ? ((true_regnum (X) == -1 ? LO_REGS  \
+ : (true_regnum (X) + HARD_REGNO_NREGS (0, MODE) > 8) ? LO_REGS
\
+ : NO_REGS))   \
+  : NO_REGS)
 
 /* Return the register class of a scratch register needed to copy IN into
or out of a register in CLASS in MODE.  If it can be done directly,


Re: RFA: patch to fix PR59787 (arm target)

2014-01-14 Thread Yvan Roux
> A quick grep of the arm backend shows 11 instances of reload_in_progress:
>
> arm.c:  && !(reload_in_progress || reload_completed)
> arm.c:  if (! (reload_in_progress || reload_completed)
> arm.c:  if (! (reload_in_progress || reload_completed)
> arm.c:  if (! (reload_in_progress || reload_completed)
> arm.c:   reload_in_progress || reload_completed))
> arm.md:  "TARGET_32BIT && (reload_in_progress || reload_completed)"
> arm.md:  "TARGET_32BIT && (reload_in_progress || reload_completed)"
> arm.md:  "TARGET_32BIT && (reload_in_progress || reload_completed)"
> arm.md:  "TARGET_32BIT && (reload_in_progress || reload_completed)"
> predicates.md:"offsettable_address_p (reload_completed |
> reload_in_progress,
> predicates.md:  (and (match_test "reload_in_progress ||
> reload_completed")
>
> and aarch64 has five more:
>
> aarch64.md:  "reload_completed || reload_in_progress"
> aarch64.md:  "reload_completed || reload_in_progress"
> aarch64.md:  "reload_completed || reload_in_progress"
> aarch64.md:  "reload_completed || reload_in_progress"
> aarch64.md:  "reload_completed || reload_in_progress"
>
> Yvan, could you do a quick audit on these to see if they are also likely
> to need fixing?

Yes, I'll check all of them.


Re: RFA: patch to fix PR59787 (arm target)

2014-01-14 Thread Yvan Roux
Thanks for the hint Vladimir, I'll pass some validation on arm.c and
arm.md/aarch64.md separately.

On 14 January 2014 20:09, Vladimir Makarov  wrote:
> On 01/14/2014 01:41 PM, Yvan Roux wrote:
>>> A quick grep of the arm backend shows 11 instances of reload_in_progress:
>>>
>>> arm.c:  && !(reload_in_progress || reload_completed)
>>> arm.c:  if (! (reload_in_progress || reload_completed)
>>> arm.c:  if (! (reload_in_progress || reload_completed)
>>> arm.c:  if (! (reload_in_progress || reload_completed)
>>> arm.c:   reload_in_progress || reload_completed))
>>> arm.md:  "TARGET_32BIT && (reload_in_progress || reload_completed)"
>>> arm.md:  "TARGET_32BIT && (reload_in_progress || reload_completed)"
>>> arm.md:  "TARGET_32BIT && (reload_in_progress || reload_completed)"
>>> arm.md:  "TARGET_32BIT && (reload_in_progress || reload_completed)"
>>> predicates.md:"offsettable_address_p (reload_completed |
>>> reload_in_progress,
>>> predicates.md:  (and (match_test "reload_in_progress ||
>>> reload_completed")
>>>
>>> and aarch64 has five more:
>>>
>>> aarch64.md:  "reload_completed || reload_in_progress"
>>> aarch64.md:  "reload_completed || reload_in_progress"
>>> aarch64.md:  "reload_completed || reload_in_progress"
>>> aarch64.md:  "reload_completed || reload_in_progress"
>>> aarch64.md:  "reload_completed || reload_in_progress"
>>>
>>> Yvan, could you do a quick audit on these to see if they are also likely
>>> to need fixing?
>> Yes, I'll check all of them.
> I checked these places too.  I'd do analogous change for only arm.c in
> thumb1_legitimate_address_p,  neon_vector_mem_operand, and
> neon_struct_mem_operand.   I guess it is a a bad idea to do it in
> predicates.md.  Changes arm.md and aarch64.md is worth to try but I
> believe LRA will work without the changes.


Re: [PATCH][DRIVER] Wrong C++ include paths when configuring with "--with-sysroot=/"

2015-07-15 Thread Yvan Roux
Hi,

(Sorry for the delay I'm just back from a long sick leave)

>> There is this old patch submitted by Matthias on that same issue, if
>> its logic is the right one for you Joseph I can rebase/validate it
>> Joseph.
>>
>> https://gcc.gnu.org/ml/gcc-patches/2012-02/msg00320.html
>
> Yes, that seems better.

I've rebased the patch on trunk, bootstrap is ok and when configuring
with options:
"--with-sysroot=/ --with-gxx-include-dir=/usr/include/c++/4.9.2"
gcc_gxx_include_dir keeps its leading slash.

Is it ok for trunk ?

Thanks,
Yvan


2015-07-15  Yvan Roux  
  Matthias Klose  

   * configure.ac: Move AC_ARG_WITH checks for native-system-header-dir,
   build-sysroot, sysroot from the `Miscenalleous configure options'
   to the `Directories' section and strip trailing `/' from with_sysroot.
   (gcc_gxx_include_dir): Don't strip a `/' sysroot value.
   * configure: Regenerated.
diff --git a/gcc/configure b/gcc/configure
index 9561e5c..1fc246b 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -769,10 +769,6 @@ REPORT_BUGS_TEXI
 REPORT_BUGS_TO
 PKGVERSION
 CONFIGURE_SPECS
-CROSS_SYSTEM_HEADER_DIR
-TARGET_SYSTEM_ROOT_DEFINE
-TARGET_SYSTEM_ROOT
-SYSROOT_CFLAGS_FOR_TARGET
 enable_shared
 enable_fixed_point
 enable_decimal_float
@@ -812,6 +808,10 @@ LDFLAGS
 CFLAGS
 CC
 GENINSRC
+CROSS_SYSTEM_HEADER_DIR
+TARGET_SYSTEM_ROOT_DEFINE
+TARGET_SYSTEM_ROOT
+SYSROOT_CFLAGS_FOR_TARGET
 target_subdir
 host_subdir
 build_subdir
@@ -873,6 +873,9 @@ ac_user_opts='
 enable_option_checking
 with_build_libsubdir
 with_local_prefix
+with_native_system_header_dir
+with_build_sysroot
+with_sysroot
 with_gxx_include_dir
 with_cpp_install_dir
 enable_generated_files_in_srcdir
@@ -899,9 +902,6 @@ enable_tls
 enable_objc_gc
 with_dwarf2
 enable_shared
-with_native_system_header_dir
-with_build_sysroot
-with_sysroot
 with_specs
 with_pkgversion
 with_bugurl
@@ -1685,6 +1685,12 @@ Optional Packages:
   --without-PACKAGE   do not use PACKAGE (same as --with-PACKAGE=no)
   --with-build-libsubdir=DIR  Directory where to find libraries for build 
system
   --with-local-prefix=DIR specifies directory to put local include
+  --with-native-system-header-dir=dir
+  use dir as the directory to look for standard
+  system header files in.  Defaults to /usr/include.
+  --with-build-sysroot=sysroot
+  use sysroot as the system root during the build
+  --with-sysroot[=DIR]search for usr/lib, usr/include, et al, within DIR
   --with-gxx-include-dir=DIR
   specifies directory to put g++ header files
   --with-cpp-install-dir=DIR
@@ -1697,12 +1703,6 @@ Optional Packages:
   --with-as   arrange to use the specified as (full pathname)
   --with-stabsarrange to use stabs instead of host debug format
   --with-dwarf2   force the default debug format to be DWARF 2
-  --with-native-system-header-dir=dir
-  use dir as the directory to look for standard
-  system header files in.  Defaults to /usr/include.
-  --with-build-sysroot=sysroot
-  use sysroot as the system root during the build
-  --with-sysroot[=DIR]search for usr/lib, usr/include, et al, within DIR
   --with-specs=SPECS  add SPECS to driver command-line processing
   --with-pkgversion=PKG   Use PKG in the version string in place of "GCC"
   --with-bugurl=URL   Direct users to URL to report a bug
@@ -3456,6 +3456,83 @@ if test x$local_prefix = x; then
local_prefix=/usr/local
 fi
 
+
+# Check whether --with-native-system-header-dir was given.
+if test "${with_native_system_header_dir+set}" = set; then :
+  withval=$with_native_system_header_dir;
+ case ${with_native_system_header_dir} in
+ yes|no) as_fn_error "bad value ${withval} given for 
--with-native-system-header-dir" "$LINENO" 5 ;;
+ /* | [A-Za-z]:[\\/]*) ;;
+ *) as_fn_error "--with-native-system-header-dir argument ${withval} must be 
an absolute directory" "$LINENO" 5 ;;
+ esac
+ configured_native_system_header_dir="${withval}"
+
+else
+  configured_native_system_header_dir=
+fi
+
+
+
+# Check whether --with-build-sysroot was given.
+if test "${with_build_sysroot+set}" = set; then :
+  withval=$with_build_sysroot; if test x"$withval" != x ; then
+ SYSROOT_CFLAGS_FOR_TARGET="--sysroot=$withval"
+   fi
+else
+  SYSROOT_CFLAGS_FOR_TARGET=
+fi
+
+
+
+if test "x$prefix" = xNONE; then
+ test_prefix=/usr/local
+else
+ test_prefix=$prefix
+fi
+if test "x$exec_prefix" = xNONE; then
+ test_exec_prefix=$test_prefix
+else
+ test_exec_prefix=$exec_prefix
+fi
+
+
+# Check whether --with-sysroot was given.
+if test "${with_sysroot+set}" = set; then :
+  withv

Re: [PATCH][ARM] PR target/67929 Tighten vfp3_const_double_for_bits checks

2015-11-01 Thread Yvan Roux
Hi Kyrill,

On 27 October 2015 at 13:08, Ramana Radhakrishnan
 wrote:
> On Wed, Oct 14, 2015 at 1:23 PM, Kyrill Tkachov  
> wrote:
>> Hi all,
>>
>> This patch fixes the referenced PR by rewriting the
>> vfp3_const_double_for_bits function in arm.c
>> The function is supposed to accept positive CONST_DOUBLE rtxes whose value
>> is an exact power of 2
>> and whose log2 is between 1 and 32. That is values like 2.0, 4.0, 8.9, 16.0
>> etc...
>>
>> The current implementation seems to have been written under the assumption
>> that exact_real_truncate returns
>> false if the input value is not an exact integer, whereas in fact
>> exact_real_truncate returns false if the
>> truncation operation was not exact, which are different things. This would
>> lead the function to accept any
>> CONST_DOUBLE that can truncate to a power of 2, such as 4.9, 16.2 etc.
>>
>> In any case, I've rewritten this function and used the real_isinteger
>> predicate to check if the real value
>> is an exact integer.
>>
>> The testcase demonstrates the kind of wrong code that this patch addresses.
>>
>> This bug appears on GCC 5 and 4.9 as well, but due to the recent
>> introduction of CONST_DOUBLE_REAL_VALUE
>> this patch doesn't apply on those branches. I will soon post the
>> backportable variant.
>>
>> Bootstrapped and tested on arm-none-linux-gnueabihf.
>>
>> Ok for trunk?
>
>
> Thanks, this is OK for trunk and all release branches.
>
> Ramana
>
>>
>> Thanks,
>> Kyrill
>>
>> 2015-10-12  Kyrylo Tkachov  
>>
>> PR target/67929
>> * config/arm/arm.c (vfp3_const_double_for_bits): Rewrite.
>> * config/arm/constraints.md (Dp): Update callsite.
>> * config/arm/predicates.md (const_double_vcvt_power_of_two): Likewise.
>>
>> 2015-10-12  Kyrylo Tkachov  
>>
>> PR target/67929
>> * gcc.target/arm/pr67929_1.c: New test.

This test fails when tested on hard-float targets, adding the
following line to avoid testing it in such cases will fix the issue,
but I wonder if there is a better dejaGNU directives sequence to do
that.

/* { dg-skip-if "avoid conflicting multilib options" { *-*-*eabihf } {
"*" } { "" } } */

Cheers,
Yvan


Re: [PATCH][ARM] PR target/67929 Tighten vfp3_const_double_for_bits checks

2015-11-02 Thread Yvan Roux
On 2 November 2015 at 09:38, Ramana Radhakrishnan
 wrote:
>

 2015-10-12  Kyrylo Tkachov  

 PR target/67929
 * gcc.target/arm/pr67929_1.c: New test.
>>
>> This test fails when tested on hard-float targets, adding the
>> following line to avoid testing it in such cases will fix the issue,
>> but I wonder if there is a better dejaGNU directives sequence to do
>> that.
>>
>> /* { dg-skip-if "avoid conflicting multilib options" { *-*-*eabihf } {
>> "*" } { "" } } */
>
> No, not without further investigation into why the test is failing.

Sorry, it fails because of the ABI mismatch between the built libs for
HF targets and the testcase which is built with the flag
-mfloat-abi=softfp (which is added by the directive arm_vfpv3_ok)

Yvan

> regards
> Ramana
>
>>
>> Cheers,
>> Yvan
>>


Re: [PATCH][ARM] PR target/67929 Tighten vfp3_const_double_for_bits checks

2015-11-02 Thread Yvan Roux
On 2 November 2015 at 10:24, Ramana Radhakrishnan
 wrote:
>
>
> On 02/11/15 09:01, Christophe Lyon wrote:
>> On 2 November 2015 at 09:51, Yvan Roux  wrote:
>>> On 2 November 2015 at 09:38, Ramana Radhakrishnan
>>>  wrote:
>>>>
>>>>>>>
>>>>>>> 2015-10-12  Kyrylo Tkachov  
>>>>>>>
>>>>>>> PR target/67929
>>>>>>> * gcc.target/arm/pr67929_1.c: New test.
>>>>>
>>>>> This test fails when tested on hard-float targets, adding the
>>>>> following line to avoid testing it in such cases will fix the issue,
>>>>> but I wonder if there is a better dejaGNU directives sequence to do
>>>>> that.
>>>>>
>>>>> /* { dg-skip-if "avoid conflicting multilib options" { *-*-*eabihf } {
>>>>> "*" } { "" } } */
>>>>
>>>> No, not without further investigation into why the test is failing.
>>>
>>> Sorry, it fails because of the ABI mismatch between the built libs for
>>> HF targets and the testcase which is built with the flag
>>> -mfloat-abi=softfp (which is added by the directive arm_vfpv3_ok)
>>>
>> I think that's what I meant in:
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67929#c7
>
> Ah, I see what you mean - instead I would just remove all the special options 
> and move this test into gcc.c-torture/execute.
>
> There are enough testers that test by default to armhf now for us to be 
> worried about testing the exact combination.

Ha yes that's ture and I remember that we ended to that same
conclusion for one testcase I tried to find the exact float ABI flag
combination several months ago.


Yvan
> regards
> Ramana
>
>>
>> Christophe.
>>
>>> Yvan
>>>
>>>> regards
>>>> Ramana
>>>>
>>>>>
>>>>> Cheers,
>>>>> Yvan
>>>>>


Re: [PATCH][ARM] PR 49526: Add support for smmul,smmla,smmls instructions

2015-11-24 Thread Yvan Roux
Hi Kyrill,

On 24 November 2015 at 14:31, Kyrill Tkachov  wrote:
> Ping.
>
> Thanks,
> Kyrill
>
>
>
> On 13/11/15 11:50, Kyrill Tkachov wrote:
>>
>> Ping.
>> https://gcc.gnu.org/ml/gcc-patches/2015-11/msg00686.html
>>
>> Thanks,
>> Kyrill
>>
>> On 06/11/15 17:05, Kyrill Tkachov wrote:
>>>
>>> Hi all,
>>>
>>> This patch introduces support for the smmul, smmla and smmls instructions
>>> that appear in armv6 architecture levels
>>> and higher. To quote the SMMUL description from the ARMARM:
>>> "Signed Most Significant Word Multiply multiplies two signed 32-bit
>>> values, extracts the most significant 32 bits of
>>> the result, and writes those bits to the destination register."
>>>
>>> The smmla and smmls are the multiply-accumulate and multiply-subtract
>>> extensions of that multiply.
>>> There also exists an smmulr variant that rounds the result rather than
>>> truncating it.
>>> However, when I tried adding patterns for those forms I got an LRA ICE
>>> that I was not able to figure out.
>>> I'll try to find a testcase for that, but in the meantime there's no
>>> reason to not have patterns for the
>>> non-rounding variants.

I agree, but would be interested to give a look at the LRA issue when
you have a testcase.

>>> Bootstrapped and tested on arm-none-linux-gnueabihf.
>>> I've seen this trigger in quite a few places in SPEC2006 where it always
>>> made the code better.
>>>
>>> Ok for trunk?

Patch looks good to me (but can't approved it), I just wonder if
adding a subreg_highpart_p predicate in emit-rtl.c would be useful,
looking at other usage of subreg_highpart_offset it doesn't seem to be
the case, but maybe for consistency with the existing
subreg_lowpart_p.

Cheers,
Yvan

>>> Thanks,
>>> Kyrill
>>>
>>> 2015-11-06  Kyrylo Tkachov  
>>>
>>> PR target/49526
>>> * config/arm/arm.md (*mulsidi3si_v6): New pattern.
>>> (*mulsidi3siaddsi_v6): Likewise.
>>> (*mulsidi3sisubsi_v6): Likewise.
>>> * config/arm/predicates.md (subreg_highpart_operator):
>>> New predicate.
>>>
>>> 2015-11-06  Kyrylo Tkachov  
>>>
>>> PR target/49526
>>> * gcc.target/arm/pr49526_1.c: New test.
>>> * gcc.target/arm/pr49526_2.c: Likewise.
>>> * gcc.target/arm/pr49526_3.c: Likewise.
>>
>>
>


Re: [PATCH][ARM] Enable fusion of AES instructions

2015-11-25 Thread Yvan Roux
Hi Wilco,

> Enable instruction fusion of AES instructions on ARM for Cortex-A53 and
> Cortex-A57.

I've a question regarding Cortex-A35, I don't see the same
documentation for it on ARM website as we have for the other cores
yet, but is AES fusion not beneficial for it or is it planned to do it
later ?

BTW, patch looks good to me (but can't approved it)

Cheers,
Yvan

> OK for commit?
>
> ChangeLog:
> 2015-11-20  Wilco Dijkstra  
>
> * gcc/config/arm/arm.c (arm_cortex_a53_tune): Add AES fusion.
> (arm_cortex_a57_tune): Likewise.
> (aarch_macro_fusion_pair_p): Add support for AES fusion.
> * gcc/config/arm/arm-protos.h (fuse_ops): Add FUSE_AES_AESMC.
>
> ---
>  gcc/config/arm/arm-protos.h | 5 +++--
>  gcc/config/arm/arm.c| 9 +++--
>  2 files changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
> index f9b1276..4801bb8 100644
> --- a/gcc/config/arm/arm-protos.h
> +++ b/gcc/config/arm/arm-protos.h
> @@ -302,8 +302,9 @@ struct tune_params
>enum fuse_ops
>{
>  FUSE_NOTHING   = 0,
> -FUSE_MOVW_MOVT = 1 << 0
> -  } fusible_ops: 1;
> +FUSE_MOVW_MOVT = 1 << 0,
> +FUSE_AES_AESMC = 1 << 1
> +  } fusible_ops: 2;
>/* Depth of scheduling queue to check for L2 autoprefetcher.  */
>enum {SCHED_AUTOPREF_OFF, SCHED_AUTOPREF_RANK, SCHED_AUTOPREF_FULL}
>  sched_autopref: 2;
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index 02f5dc3..7077199 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -1969,7 +1969,7 @@ const struct tune_params arm_cortex_a53_tune =
>tune_params::DISPARAGE_FLAGS_NEITHER,
>tune_params::PREF_NEON_64_FALSE,
>tune_params::PREF_NEON_STRINGOPS_TRUE,
> -  FUSE_OPS (tune_params::FUSE_MOVW_MOVT),
> +  FUSE_OPS (tune_params::FUSE_MOVW_MOVT | tune_params::FUSE_AES_AESMC),
>tune_params::SCHED_AUTOPREF_OFF
>  };
>
> @@ -1992,7 +1992,7 @@ const struct tune_params arm_cortex_a57_tune =
>tune_params::DISPARAGE_FLAGS_ALL,
>tune_params::PREF_NEON_64_FALSE,
>tune_params::PREF_NEON_STRINGOPS_TRUE,
> -  FUSE_OPS (tune_params::FUSE_MOVW_MOVT),
> +  FUSE_OPS (tune_params::FUSE_MOVW_MOVT | tune_params::FUSE_AES_AESMC),
>tune_params::SCHED_AUTOPREF_FULL
>  };
>
> @@ -29668,6 +29668,11 @@ aarch_macro_fusion_pair_p (rtx_insn* prev,
> rtx_insn* curr)
>&& REGNO (SET_DEST (curr_set)) == REGNO (SET_DEST
> (prev_set)))
>  return true;
>  }
> +
> +  if (current_tune->fusible_ops & tune_params::FUSE_AES_AESMC
> +  && aarch_crypto_can_dual_issue (prev, curr))
> +return true;
> +
>return false;
>  }
>
> --
> 1.9.1
>
>
>


Re: [PATCH][ARM] PR 49526: Add support for smmul,smmla,smmls instructions

2015-11-25 Thread Yvan Roux
On 25 November 2015 at 11:36, Kyrill Tkachov  wrote:
> Hi Yvan,
>
>
> On 24/11/15 15:05, Yvan Roux wrote:
>>
>> Hi Kyrill,
>>
>> On 24 November 2015 at 14:31, Kyrill Tkachov 
>> wrote:
>>>
>>> Ping.
>>>
>>> Thanks,
>>> Kyrill
>>>
>>>
>>>
>>> On 13/11/15 11:50, Kyrill Tkachov wrote:
>>>>
>>>> Ping.
>>>> https://gcc.gnu.org/ml/gcc-patches/2015-11/msg00686.html
>>>>
>>>> Thanks,
>>>> Kyrill
>>>>
>>>> On 06/11/15 17:05, Kyrill Tkachov wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> This patch introduces support for the smmul, smmla and smmls
>>>>> instructions
>>>>> that appear in armv6 architecture levels
>>>>> and higher. To quote the SMMUL description from the ARMARM:
>>>>> "Signed Most Significant Word Multiply multiplies two signed 32-bit
>>>>> values, extracts the most significant 32 bits of
>>>>> the result, and writes those bits to the destination register."
>>>>>
>>>>> The smmla and smmls are the multiply-accumulate and multiply-subtract
>>>>> extensions of that multiply.
>>>>> There also exists an smmulr variant that rounds the result rather than
>>>>> truncating it.
>>>>> However, when I tried adding patterns for those forms I got an LRA ICE
>>>>> that I was not able to figure out.
>>>>> I'll try to find a testcase for that, but in the meantime there's no
>>>>> reason to not have patterns for the
>>>>> non-rounding variants.
>>
>> I agree, but would be interested to give a look at the LRA issue when
>> you have a testcase.
>
>
> I've filed PR 68536 for the ICE I've been seeing. Any insight is, of course,
> welcome.

Ok great, I'll give a look.

>>>>> Bootstrapped and tested on arm-none-linux-gnueabihf.
>>>>> I've seen this trigger in quite a few places in SPEC2006 where it
>>>>> always
>>>>> made the code better.
>>>>>
>>>>> Ok for trunk?
>>
>> Patch looks good to me (but can't approved it), I just wonder if
>> adding a subreg_highpart_p predicate in emit-rtl.c would be useful,
>> looking at other usage of subreg_highpart_offset it doesn't seem to be
>> the case, but maybe for consistency with the existing
>> subreg_lowpart_p.
>
>
> Perhaps, we can put it there if there's much scope for its reuse in other
> parts of the compiler. I didn't put it there in the first place so as not to
> extend the review scope of the patch to the midend...

Yes it's what I thought, and I don't see other usage. Now on AArch64 I
see that the pattern used to generate smulh and umulh
(muldi3_highpart), which are doing the same kind of operation in
the 64bit world, is doing it in a different manner (lshift and
truncate) and I'm not sure which approach is the best here.

Cheers,
Yvan

> Thanks for looking at this,
> Kyrill
>
>
>>
>> Cheers,
>> Yvan
>>
>>>>> Thanks,
>>>>> Kyrill
>>>>>
>>>>> 2015-11-06  Kyrylo Tkachov  
>>>>>
>>>>>  PR target/49526
>>>>>  * config/arm/arm.md (*mulsidi3si_v6): New pattern.
>>>>>  (*mulsidi3siaddsi_v6): Likewise.
>>>>>  (*mulsidi3sisubsi_v6): Likewise.
>>>>>  * config/arm/predicates.md (subreg_highpart_operator):
>>>>>  New predicate.
>>>>>
>>>>> 2015-11-06  Kyrylo Tkachov  
>>>>>
>>>>>  PR target/49526
>>>>>  * gcc.target/arm/pr49526_1.c: New test.
>>>>>  * gcc.target/arm/pr49526_2.c: Likewise.
>>>>>  * gcc.target/arm/pr49526_3.c: Likewise.
>>>>
>>>>
>


Re: [PR68432 01/22][ARM] Remove operand dependency from "type" attribute

2015-11-25 Thread Yvan Roux
Hi,

On 25 November 2015 at 14:54, Bernd Schmidt  wrote:
> On 11/25/2015 01:21 PM, Richard Sandiford wrote:
>>
>> - (plus:SI (match_operand:SI 0 "s_register_operand" "l,  r")
>> -  (match_operand:SI 1 "arm_add_operand""lPv,rIL"))
>> + (plus:SI (match_operand:SI 0 "s_register_operand" "l,l, r,r")
>> +  (match_operand:SI 1 "arm_add_operand""l,Pv,r,IL"))
>
>
> I'll leave it to ARM maintainers to approve or reject, but I want to point
> out one thing: IMO this sort of thing is better written as "l,lPv,r,rIL",
> allowing a larger set of inputs for the later alternatives while still
> returning the more specific earlier one if it matches. I don't know whether
> LRA takes advantage of that, but it could help reload, and any post-reload
> pass might look at the current alternative for a given insn to see whether
> it can be replaced by a register.

I don't remember the PR number (I can dig if needed) but I remember
that we had an LRA ICE due to redundancy in patterns alternative
description.  Thus, if it is beneficial to give larger set of inputs
in later alternatives, we have to strengthen the support in LRA.

> Bernd


[arm-8-branch] Add Linaro version string and macros

2018-08-10 Thread Yvan Roux
Hi,

This patch adds Linaro version string and release macros to ARM GCC 8
vendor branch.

Ok to commit?

Thanks
Yvan

gcc/ChangeLog
2018-08-10  Yvan Roux  

* LINARO-VERSION: New file.
* configure.ac: Add Linaro version string.
* configure: Regenerate.
* Makefile.in (LINAROVER, LINAROVER_C, LINAROVER_S): Define.
(CFLAGS-cppbuiltin.o): Add LINAROVER macro definition.
(cppbuiltin.o): Depend on $(LINAROVER).
* cppbuiltin.c (parse_linarover): New.
(define_GNUC__): Define __LINARO_RELEASE__ and __LINARO_SPIN__ macros.
Index: gcc/LINARO-VERSION
===
--- gcc/LINARO-VERSION	(nonexistent)
+++ gcc/LINARO-VERSION	(working copy)
@@ -0,0 +1 @@
+8.2-2018.08~dev
Index: gcc/Makefile.in
===
--- gcc/Makefile.in	(revision 263464)
+++ gcc/Makefile.in	(working copy)
@@ -854,10 +854,12 @@
 DEVPHASE:= $(srcdir)/DEV-PHASE # experimental, prerelease, ""
 DATESTAMP   := $(srcdir)/DATESTAMP # MMDD or empty
 REVISION:= $(srcdir)/REVISION  # [BRANCH revision XX]
+LINAROVER   := $(srcdir)/LINARO-VERSION # M.x-.MM[-S][~dev]
 
 BASEVER_c   := $(shell cat $(BASEVER))
 DEVPHASE_c  := $(shell cat $(DEVPHASE))
 DATESTAMP_c := $(shell cat $(DATESTAMP))
+LINAROVER_c := $(shell cat $(LINAROVER))
 
 ifeq (,$(wildcard $(REVISION)))
 REVISION_c  :=
@@ -884,6 +886,7 @@
   "\"$(if $(DEVPHASE_c)$(filter-out 0,$(PATCHLEVEL_c)), $(DATESTAMP_c))\""
 PKGVERSION_s:= "\"@PKGVERSION@\""
 BUGURL_s:= "\"@REPORT_BUGS_TO@\""
+LINAROVER_s := "\"$(LINAROVER_c)\""
 
 PKGVERSION  := @PKGVERSION@
 BUGURL_TEXI := @REPORT_BUGS_TEXI@
@@ -2883,8 +2886,9 @@
   -DSTANDARD_EXEC_PREFIX=\"$(libdir)/gcc/\" \
   @TARGET_SYSTEM_ROOT_DEFINE@
 
-CFLAGS-cppbuiltin.o += $(PREPROCESSOR_DEFINES) -DBASEVER=$(BASEVER_s)
-cppbuiltin.o: $(BASEVER)
+CFLAGS-cppbuiltin.o += $(PREPROCESSOR_DEFINES) -DBASEVER=$(BASEVER_s) \
+	-DLINAROVER=$(LINAROVER_s)
+cppbuiltin.o: $(BASEVER) $(LINAROVER)
 
 CFLAGS-cppdefault.o += $(PREPROCESSOR_DEFINES)
 
Index: gcc/configure
===
--- gcc/configure	(revision 263464)
+++ gcc/configure	(working copy)
@@ -1726,7 +1726,8 @@
   --with-stabsarrange to use stabs instead of host debug format
   --with-dwarf2   force the default debug format to be DWARF 2
   --with-specs=SPECS  add SPECS to driver command-line processing
-  --with-pkgversion=PKG   Use PKG in the version string in place of "GCC"
+  --with-pkgversion=PKG   Use PKG in the version string in place of "Linaro
+  GCC `cat $srcdir/LINARO-VERSION`"
   --with-bugurl=URL   Direct users to URL to report a bug
   --with-multilib-listselect multilibs (AArch64, SH and x86-64 only)
   --with-gnu-ld   assume the C compiler uses GNU ld default=no
@@ -7649,7 +7650,7 @@
   *)   PKGVERSION="($withval) " ;;
  esac
 else
-  PKGVERSION="(GCC) "
+  PKGVERSION="(Linaro GCC `cat $srcdir/LINARO-VERSION`) "
 
 fi
 
@@ -18448,7 +18449,7 @@
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 18451 "configure"
+#line 18452 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -18554,7 +18555,7 @@
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 18557 "configure"
+#line 18558 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
Index: gcc/configure.ac
===
--- gcc/configure.ac	(revision 263464)
+++ gcc/configure.ac	(working copy)
@@ -929,7 +929,7 @@
 )
 AC_SUBST(CONFIGURE_SPECS)
 
-ACX_PKGVERSION([GCC])
+ACX_PKGVERSION([Linaro GCC `cat $srcdir/LINARO-VERSION`])
 ACX_BUGURL([https://gcc.gnu.org/bugs/])
 
 # Sanity check enable_languages in case someone does not run the toplevel
Index: gcc/cppbuiltin.c
===
--- gcc/cppbuiltin.c	(revision 263464)
+++ gcc/cppbuiltin.c	(working copy)
@@ -53,18 +53,41 @@
 *patchlevel = s_patchlevel;
 }
 
+/* Parse a LINAROVER version string of the format "M.m-year.month[-spin][~dev]"
+   to create Linaro release number MM and spin version.  */
+static void
+parse_linarover (int *release, int *spin)
+{
+  static int s_year = -1, s_month, s_spin;
 
+  if (s_year == -1)
+if (sscanf (LINAROVER, "%*[^-]-%d.%d-%d", &s_year, &s_month, &s_spin) != 3)
+  {
+	sscanf (LINAROVER, "%*[^-]-%d.%d", &s_year, &s_month);
+	s_spin = 0;
+  }
+
+  if (release)
+*release = s_year * 100 + s_month;
+
+  if (spin)
+*spin = s_spin;

Re: [arm-8-branch] Add Linaro version string and macros

2018-08-10 Thread Yvan Roux
On Fri, 10 Aug 2018 at 13:45, Ramana Radhakrishnan
 wrote:
>
> On Fri, Aug 10, 2018 at 11:09 AM, Yvan Roux  wrote:
> > Hi,
> >
> > This patch adds Linaro version string and release macros to ARM GCC 8
> > vendor branch.
> >
> > Ok to commit?
> >
>
>
> Ok if no regressions. (I'm assuming you've built and eyeballed that
> the pre-processor macros are being produced). (I have a patch to
> www-docs for this branch that I'm writing up and should try and get
> out today. Though it would be nice to have tests for these if
> possible.

I've not passed the regression testsuite since this patch is part of
Linaro vendor branches for a long time, I've just built an x86_64 c
compiler from arm-8-branch and verified the version string and macros:

$ ~/wip/arm-8-install/bin/gcc --version
gcc (Linaro GCC 8.2-2018.08~dev) 8.2.1 20180802

$ ~/wip/arm-8-install/bin/gcc -E -dM - < /dev/null | grep LINARO
#define __LINARO_SPIN__ 0
#define __LINARO_RELEASE__ 201808

I can add some tests, but it will take some time to remember me how
these kind of thing is tested in the testsuite ;)

> regards
> Ramana
>
>
> > Thanks
> > Yvan
> >
> > gcc/ChangeLog
> > 2018-08-10  Yvan Roux  
> >
> > * LINARO-VERSION: New file.
> > * configure.ac: Add Linaro version string.
> > * configure: Regenerate.
> > * Makefile.in (LINAROVER, LINAROVER_C, LINAROVER_S): Define.
> > (CFLAGS-cppbuiltin.o): Add LINAROVER macro definition.
> > (cppbuiltin.o): Depend on $(LINAROVER).
> > * cppbuiltin.c (parse_linarover): New.
> > (define_GNUC__): Define __LINARO_RELEASE__ and __LINARO_SPIN__ macros.


Re: [arm-8-branch] Add Linaro version string and macros

2018-08-10 Thread Yvan Roux
On Fri, 10 Aug 2018 at 14:31, Yvan Roux  wrote:
>
> On Fri, 10 Aug 2018 at 13:45, Ramana Radhakrishnan
>  wrote:
> >
> > On Fri, Aug 10, 2018 at 11:09 AM, Yvan Roux  wrote:
> > > Hi,
> > >
> > > This patch adds Linaro version string and release macros to ARM GCC 8
> > > vendor branch.
> > >
> > > Ok to commit?
> > >
> >
> >
> > Ok if no regressions. (I'm assuming you've built and eyeballed that
> > the pre-processor macros are being produced). (I have a patch to
> > www-docs for this branch that I'm writing up and should try and get
> > out today. Though it would be nice to have tests for these if
> > possible.
>
> I've not passed the regression testsuite since this patch is part of
> Linaro vendor branches for a long time, I've just built an x86_64 c
> compiler from arm-8-branch and verified the version string and macros:
>
> $ ~/wip/arm-8-install/bin/gcc --version
> gcc (Linaro GCC 8.2-2018.08~dev) 8.2.1 20180802
>
> $ ~/wip/arm-8-install/bin/gcc -E -dM - < /dev/null | grep LINARO
> #define __LINARO_SPIN__ 0
> #define __LINARO_RELEASE__ 201808
>
> I can add some tests, but it will take some time to remember me how
> these kind of thing is tested in the testsuite ;)

Updated version of the patch, with a test case for Linaro macros.  The
test is not strict w/r to the version or spin number to avoid having
to update it every release or re-spin, do you think it is sufficient ?

Testsuite ran with the included test PASS.

gcc/ChangeLog
2018-08-10  Yvan Roux  

* LINARO-VERSION: New file.
* configure.ac: Add Linaro version string.
* configure: Regenerate.
* Makefile.in (LINAROVER, LINAROVER_C, LINAROVER_S): Define.
(CFLAGS-cppbuiltin.o): Add LINAROVER macro definition.
    (cppbuiltin.o): Depend on $(LINAROVER).
* cppbuiltin.c (parse_linarover): New.
(define_GNUC__): Define __LINARO_RELEASE__ and __LINARO_SPIN__ macros.

gcc/testsuite/ChangeLog
2018-08-10  Yvan Roux  

* gcc.dg/cpp/linaro-macros.c: New test.
Index: gcc/LINARO-VERSION
===
--- gcc/LINARO-VERSION	(nonexistent)
+++ gcc/LINARO-VERSION	(working copy)
@@ -0,0 +1 @@
+8.2-2018.08~dev
Index: gcc/Makefile.in
===
--- gcc/Makefile.in	(revision 263464)
+++ gcc/Makefile.in	(working copy)
@@ -854,10 +854,12 @@
 DEVPHASE:= $(srcdir)/DEV-PHASE # experimental, prerelease, ""
 DATESTAMP   := $(srcdir)/DATESTAMP # MMDD or empty
 REVISION:= $(srcdir)/REVISION  # [BRANCH revision XX]
+LINAROVER   := $(srcdir)/LINARO-VERSION # M.x-.MM[-S][~dev]
 
 BASEVER_c   := $(shell cat $(BASEVER))
 DEVPHASE_c  := $(shell cat $(DEVPHASE))
 DATESTAMP_c := $(shell cat $(DATESTAMP))
+LINAROVER_c := $(shell cat $(LINAROVER))
 
 ifeq (,$(wildcard $(REVISION)))
 REVISION_c  :=
@@ -884,6 +886,7 @@
   "\"$(if $(DEVPHASE_c)$(filter-out 0,$(PATCHLEVEL_c)), $(DATESTAMP_c))\""
 PKGVERSION_s:= "\"@PKGVERSION@\""
 BUGURL_s:= "\"@REPORT_BUGS_TO@\""
+LINAROVER_s := "\"$(LINAROVER_c)\""
 
 PKGVERSION  := @PKGVERSION@
 BUGURL_TEXI := @REPORT_BUGS_TEXI@
@@ -2883,8 +2886,9 @@
   -DSTANDARD_EXEC_PREFIX=\"$(libdir)/gcc/\" \
   @TARGET_SYSTEM_ROOT_DEFINE@
 
-CFLAGS-cppbuiltin.o += $(PREPROCESSOR_DEFINES) -DBASEVER=$(BASEVER_s)
-cppbuiltin.o: $(BASEVER)
+CFLAGS-cppbuiltin.o += $(PREPROCESSOR_DEFINES) -DBASEVER=$(BASEVER_s) \
+	-DLINAROVER=$(LINAROVER_s)
+cppbuiltin.o: $(BASEVER) $(LINAROVER)
 
 CFLAGS-cppdefault.o += $(PREPROCESSOR_DEFINES)
 
Index: gcc/configure
===
--- gcc/configure	(revision 263464)
+++ gcc/configure	(working copy)
@@ -1726,7 +1726,8 @@
   --with-stabsarrange to use stabs instead of host debug format
   --with-dwarf2   force the default debug format to be DWARF 2
   --with-specs=SPECS  add SPECS to driver command-line processing
-  --with-pkgversion=PKG   Use PKG in the version string in place of "GCC"
+  --with-pkgversion=PKG   Use PKG in the version string in place of "Linaro
+  GCC `cat $srcdir/LINARO-VERSION`"
   --with-bugurl=URL   Direct users to URL to report a bug
   --with-multilib-listselect multilibs (AArch64, SH and x86-64 only)
   --with-gnu-ld   assume the C compiler uses GNU ld default=no
@@ -7649,7 +7650,7 @@
   *)   PKGVERSION="($withval) " ;;
  esac
 else
-  PKGVERSION="(GCC) "
+  PKGVERSION="(Linaro GCC `cat $srcdir/LINARO-VERSION`) "
 
 fi
 
@@ -18448,7 +18449,7 @@
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_E

Re: [arm-8-branch] Add Linaro version string and macros

2018-08-10 Thread Yvan Roux
On Fri, 10 Aug 2018 at 17:01, Ramana Radhakrishnan
 wrote:
>
> On Fri, Aug 10, 2018 at 3:35 PM, Yvan Roux  wrote:
> > On Fri, 10 Aug 2018 at 14:31, Yvan Roux  wrote:
> >>
> >> On Fri, 10 Aug 2018 at 13:45, Ramana Radhakrishnan
> >>  wrote:
> >> >
> >> > On Fri, Aug 10, 2018 at 11:09 AM, Yvan Roux  wrote:
> >> > > Hi,
> >> > >
> >> > > This patch adds Linaro version string and release macros to ARM GCC 8
> >> > > vendor branch.
> >> > >
> >> > > Ok to commit?
> >> > >
> >> >
> >> >
> >> > Ok if no regressions. (I'm assuming you've built and eyeballed that
> >> > the pre-processor macros are being produced). (I have a patch to
> >> > www-docs for this branch that I'm writing up and should try and get
> >> > out today. Though it would be nice to have tests for these if
> >> > possible.
> >>
> >> I've not passed the regression testsuite since this patch is part of
> >> Linaro vendor branches for a long time, I've just built an x86_64 c
> >> compiler from arm-8-branch and verified the version string and macros:
> >>
> >> $ ~/wip/arm-8-install/bin/gcc --version
> >> gcc (Linaro GCC 8.2-2018.08~dev) 8.2.1 20180802
> >>
> >> $ ~/wip/arm-8-install/bin/gcc -E -dM - < /dev/null | grep LINARO
> >> #define __LINARO_SPIN__ 0
> >> #define __LINARO_RELEASE__ 201808
> >>
> >> I can add some tests, but it will take some time to remember me how
> >> these kind of thing is tested in the testsuite ;)
> >
> > Updated version of the patch, with a test case for Linaro macros.  The
> > test is not strict w/r to the version or spin number to avoid having
> > to update it every release or re-spin, do you think it is sufficient ?
> >
> > Testsuite ran with the included test PASS.
> >
> > gcc/ChangeLog
> > 2018-08-10  Yvan Roux  
> >
> >     * LINARO-VERSION: New file.
> > * configure.ac: Add Linaro version string.
> > * configure: Regenerate.
> > * Makefile.in (LINAROVER, LINAROVER_C, LINAROVER_S): Define.
> > (CFLAGS-cppbuiltin.o): Add LINAROVER macro definition.
> > (cppbuiltin.o): Depend on $(LINAROVER).
> > * cppbuiltin.c (parse_linarover): New.
> > (define_GNUC__): Define __LINARO_RELEASE__ and __LINARO_SPIN__ macros.
> >
> > gcc/testsuite/ChangeLog
> > 2018-08-10  Yvan Roux  
> >
> > * gcc.dg/cpp/linaro-macros.c: New test.
>
>
> Looks good, thanks  - can you put the Changelog in a Changelog.arm file ?

Sure, does it need the FSF Copyright lines at the end like other ChangeLogs ?

Thanks
Yvan

> regards
> Ramana


Re: [PATCH][AArch64][GCC 6] PR target/79041: Correct -mpc-relative-literal-loads logic in aarch64_classify_symbol

2017-06-22 Thread Yvan Roux
Hi all,

On 16 January 2017 at 16:34, Kyrill Tkachov  wrote:
>
> On 13/01/17 16:35, James Greenhalgh wrote:
>>
>> On Wed, Jan 11, 2017 at 04:32:45PM +, Kyrill Tkachov wrote:
>>>
>>> Hi all,
>>>
>>> In this PR we generated ADRP/ADD instructions with :lo12: relocations on
>>> symbols even though -mpc-relative-literal-loads is used. This is due to
>>> the
>>> confusing double-negative logic of the nopcrelative_literal_loads aarch64
>>> variable and its relation to the aarch64_nopcrelative_literal_loads
>>> global
>>> variable in the GCC 6 branch.
>>>
>>> Wilco fixed this on trunk as part of a larger patch (r237607) and parts
>>> of
>>> that patch were backported, but other parts weren't and that patch now
>>> doesn't apply cleanly to the branch.
>>
>> As I commented to Jakub at the time he made the first partial backport,
>> I'd much rather just see all of Wilco's patch backported. We're not on
>> the verge of a 6 release now, so please just backport Wilco's patch (as
>> should have been done all along if this had been correctly identified as
>> a fix rather than a clean-up).
>
>
> Unfortunately simply backporting the patch does not fix this PR.
> The aarch64_classify_symbol changes do not have the desired effect
> and the :lo12: relocations are generated.
> I'll look into it, but I believe that would require a bigger change than
> this one-liner.

Here is the backport of Wilco's patch (r237607) along with Kyrill's
one (r244643, which removed the remaining occurences of
aarch64_nopcrelative_literal_loads).  To fix the issue the original
patch has to be modified, to keep aarch64_pcrelative_literal_loads
test for large models in aarch64_classify_symbol.

On trunk and gcc-7-branch the :lo12: relocations are not generated
because of Wilco's fix for pr78733 (r243456 and 243486), but my
understanding is that the bug is still present since compiling
gcc.target/aarch64/pr78733.c with -mcmodel=large brings back the
:lo12: relocations (I'll submit a patch to add the test back if my
understanding is correct).

Cross built and regtested without issue for aarch64-linux-gnu,
aarch64-none-elf and aarch64_be-none-elf targets

OK for gcc-6-branch ?

Thanks
Yvan

gcc/ChangeLog
2017-06-22  Yvan Roux  

PR target/79041
Backport from mainline
2016-06-20  Wilco Dijkstra  

* config/aarch64/aarch64.opt
(mpc-relative-literal-loads): Rename internal option name.
* config/aarch64/aarch64.c
(aarch64_nopcrelative_literal_loads): Rename to
aarch64_pcrelative_literal_loads.
(aarch64_expand_mov_immediate): Likewise.
(aarch64_secondary_reload): Likewise.
(aarch64_can_use_per_function_literal_pools_p): Likewise.
(aarch64_classify_symbol): Likewise.
(aarch64_override_options_after_change_1): Rename and simplify logic.

2016-01-19  Kyrylo Tkachov  

* config/aarch64/aarch64-protos.h(aarch64_nopcrelative_literal_loads):
Delete.
* config/aarch64/aarch64.md
(aarch64_reload_movcp): Delete reference to
aarch64_nopcrelative_literal_loads.
(aarch64_reload_movcp): Likewise.

gcc/testsuite/ChangeLog
2017-06-22  Yvan Roux  

PR target/79041
* gcc.target/aarch64/pr79041.c: New test.
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index fa97e29..7b10ff6 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -436,7 +436,6 @@ int aarch64_ccmp_mode_to_code (enum machine_mode mode);
 bool extract_base_offset_in_addr (rtx mem, rtx *base, rtx *offset);
 bool aarch64_operands_ok_for_ldpstp (rtx *, bool, enum machine_mode);
 bool aarch64_operands_adjust_ok_for_ldpstp (rtx *, bool, enum machine_mode);
-extern bool aarch64_nopcrelative_literal_loads;
 
 extern void aarch64_asm_output_pool_epilogue (FILE *, const char *,
 	  tree, HOST_WIDE_INT);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index e79165b..9b06320 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -152,7 +152,7 @@ enum aarch64_processor aarch64_tune = cortexa53;
 unsigned long aarch64_tune_flags = 0;
 
 /* Global flag for PC relative loads.  */
-bool aarch64_nopcrelative_literal_loads;
+bool aarch64_pcrelative_literal_loads;
 
 /* Support for command line parsing of boolean flags in the tuning
structures.  */
@@ -1703,7 +1703,7 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
 	 we need to expand the literal pool access carefully.
 	 This is something that needs to be done in a number
 	 of places, so could well live as a separate function.  */
-	  if (aarch64_nopcrelative_literal_loads)
+	  if (!aar

Re: [PATCH][AArch64][GCC 6] PR target/79041: Correct -mpc-relative-literal-loads logic in aarch64_classify_symbol

2017-06-27 Thread Yvan Roux
Hi,

On 22 June 2017 at 20:42, Yvan Roux  wrote:
> Hi all,
>
> On 16 January 2017 at 16:34, Kyrill Tkachov  
> wrote:
>>
>> On 13/01/17 16:35, James Greenhalgh wrote:
>>>
>>> On Wed, Jan 11, 2017 at 04:32:45PM +, Kyrill Tkachov wrote:
>>>>
>>>> Hi all,
>>>>
>>>> In this PR we generated ADRP/ADD instructions with :lo12: relocations on
>>>> symbols even though -mpc-relative-literal-loads is used. This is due to
>>>> the
>>>> confusing double-negative logic of the nopcrelative_literal_loads aarch64
>>>> variable and its relation to the aarch64_nopcrelative_literal_loads
>>>> global
>>>> variable in the GCC 6 branch.
>>>>
>>>> Wilco fixed this on trunk as part of a larger patch (r237607) and parts
>>>> of
>>>> that patch were backported, but other parts weren't and that patch now
>>>> doesn't apply cleanly to the branch.
>>>
>>> As I commented to Jakub at the time he made the first partial backport,
>>> I'd much rather just see all of Wilco's patch backported. We're not on
>>> the verge of a 6 release now, so please just backport Wilco's patch (as
>>> should have been done all along if this had been correctly identified as
>>> a fix rather than a clean-up).
>>
>>
>> Unfortunately simply backporting the patch does not fix this PR.
>> The aarch64_classify_symbol changes do not have the desired effect
>> and the :lo12: relocations are generated.
>> I'll look into it, but I believe that would require a bigger change than
>> this one-liner.
>
> Here is the backport of Wilco's patch (r237607) along with Kyrill's
> one (r244643, which removed the remaining occurences of
> aarch64_nopcrelative_literal_loads).  To fix the issue the original
> patch has to be modified, to keep aarch64_pcrelative_literal_loads
> test for large models in aarch64_classify_symbol.
>
> On trunk and gcc-7-branch the :lo12: relocations are not generated
> because of Wilco's fix for pr78733 (r243456 and 243486), but my
> understanding is that the bug is still present since compiling
> gcc.target/aarch64/pr78733.c with -mcmodel=large brings back the
> :lo12: relocations (I'll submit a patch to add the test back if my
> understanding is correct).
>
> Cross built and regtested without issue for aarch64-linux-gnu,
> aarch64-none-elf and aarch64_be-none-elf targets
>
> OK for gcc-6-branch ?

Sorry for the fast ping, but since a 6.4 rc1 is planned for tomorrow,
do you think that this fix can be part of it ?

> Thanks
> Yvan
>
> gcc/ChangeLog
> 2017-06-22  Yvan Roux  
>
> PR target/79041
> Backport from mainline
> 2016-06-20  Wilco Dijkstra  
>
> * config/aarch64/aarch64.opt
> (mpc-relative-literal-loads): Rename internal option name.
> * config/aarch64/aarch64.c
> (aarch64_nopcrelative_literal_loads): Rename to
> aarch64_pcrelative_literal_loads.
> (aarch64_expand_mov_immediate): Likewise.
> (aarch64_secondary_reload): Likewise.
> (aarch64_can_use_per_function_literal_pools_p): Likewise.
> (aarch64_classify_symbol): Likewise.
> (aarch64_override_options_after_change_1): Rename and simplify logic.
>
> 2016-01-19  Kyrylo Tkachov  
>
> * config/aarch64/aarch64-protos.h(aarch64_nopcrelative_literal_loads):
> Delete.
> * config/aarch64/aarch64.md
> (aarch64_reload_movcp): Delete reference to
> aarch64_nopcrelative_literal_loads.
> (aarch64_reload_movcp): Likewise.
>
> gcc/testsuite/ChangeLog
> 2017-06-22  Yvan Roux  
>
> PR target/79041
> * gcc.target/aarch64/pr79041.c: New test.


Re: [PATCH][AArch64][GCC 6] PR target/79041: Correct -mpc-relative-literal-loads logic in aarch64_classify_symbol

2017-06-27 Thread Yvan Roux
Hi Wilco

On 27 June 2017 at 12:53, Wilco Dijkstra  wrote:
> Hi Yvan,
>
>> Here is the backport of Wilco's patch (r237607) along with Kyrill's
>> one (r244643, which removed the remaining occurences of
>> aarch64_nopcrelative_literal_loads).  To fix the issue the original
>> patch has to be modified, to keep aarch64_pcrelative_literal_loads
>> test for large models in aarch64_classify_symbol.
>
> The patch looks good to me, however I can't approve it.

ok thanks for the review.

>> On trunk and gcc-7-branch the :lo12: relocations are not generated
>> because of Wilco's fix for pr78733 (r243456 and 243486), but my
>> understanding is that the bug is still present since compiling
>> gcc.target/aarch64/pr78733.c with -mcmodel=large brings back the
>> :lo12: relocations (I'll submit a patch to add the test back if my
>> understanding is correct).
>
> You're right, eventhough -mpc-relative-literal-loads doesn't make much sense
> in the large memory model, it seems best to keep the option orthogonal to
> enable the workaround. I've prepared a patch to fix this on trunk/GCC7.
> It also adds a test which we should add to your changes to GCC6 too.

ok, I think it is what kugan's proposed earlier today in:

https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01967.html

I agree that -mpc-relative-literal-loads and large memory model
doesn't make much sense, now it is what is used in kernel build
system, but if you handle that in a bigger fix already, that's awesome
:)

Thanks
Yvan

> Wilco


[Doc, AArch64] Fix/Update AArch64 options.

2017-06-27 Thread Yvan Roux
Hi,

I just noticed that some AArch64 options (-mpc-relative-literal-loads,
-msign-return-address=scope and -moverride=string) are missing in the
option summary part of the manual:

https://gcc.gnu.org/onlinedocs/gcc/Option-Summary.html#Option-Summary

and that the "-no" version of -mpc-relative-literal-loads is missing
in AArch64 options page:

https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html#AArch64-Options

This patch fixes these issues and remove a redundant "Save" property
in mpc-relative-literal-loads description.

Tested by re-generating the manual, Ok for trunk ?

Thanks
Yvan

gcc/ChangeLog
2017-06-27  Yvan Roux  

   * config/aarch64/aarch64.opt
   (mpc-relative-literal-loads): Remove redundant property.
   * doc/invoke.texi (AArch64): Add missing options.
diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
index 942a7d5..0fd1bfa 100644
--- a/gcc/config/aarch64/aarch64.opt
+++ b/gcc/config/aarch64/aarch64.opt
@@ -146,7 +146,7 @@ EnumValue
 Enum(aarch64_abi) String(lp64) Value(AARCH64_ABI_LP64)
 
 mpc-relative-literal-loads
-Target Report Save Var(pcrelative_literal_loads) Init(2) Save
+Target Report Var(pcrelative_literal_loads) Init(2) Save
 PC relative literal loads.
 
 msign-return-address=
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index d1e097b..6e0e776 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -595,7 +595,9 @@ Objective-C and Objective-C++ Dialects}.
 -mlow-precision-recip-sqrt  -mno-low-precision-recip-sqrt@gol
 -mlow-precision-sqrt  -mno-low-precision-sqrt@gol
 -mlow-precision-div  -mno-low-precision-div @gol
--march=@var{name}  -mcpu=@var{name}  -mtune=@var{name}}
+-mpc-relative-literal-loads -mno-pc-relative-literal-loads @gol
+-msign-return-address=@var{scope} @gol
+-march=@var{name}  -mcpu=@var{name}  -mtune=@var{name}  -moverride=@var{string}}
 
 @emph{Adapteva Epiphany Options}
 @gccoptlist{-mhalf-reg-file  -mprefer-short-insn-regs @gol
@@ -14158,8 +14160,10 @@ across releases.
 This option is only intended to be useful when developing GCC.
 
 @item -mpc-relative-literal-loads
+@item -mno-pc-relative-literal-loads
 @opindex mpc-relative-literal-loads
-Enable PC-relative literal loads.  With this option literal pools are
+@opindex mno-pc-relative-literal-loads
+Enable or disable PC-relative literal loads.  With this option literal pools are
 accessed using a single instruction and emitted after each function.  This
 limits the maximum size of functions to 1MB.  This is enabled by default for
 @option{-mcmodel=tiny}.


Re: [PATCH][AArch64] Fix PR79041

2017-06-27 Thread Yvan Roux
Hi

On 27 June 2017 at 16:49, Wilco Dijkstra  wrote:
> As described in PR79041, -mcmodel=large -mpc-relative-literal-loads
> may be used to avoid generating ADRP/ADD or ADRP/LDR.  However both
> trunk and GCC7 may still emit ADRP for some constant pool literals.
> Fix this by adding a aarch64_pcrelative_literal_loads check.
>
> OK for trunk/GCC7 backport?

I can't approve it, but the patch is ok for me, I've built and
regtested the very same patch for aarch64-linux-gnu,
aarch64-none-elf and aarch64_be-none-elf targets :)

Yvan

> ChangeLog:
> 2017-06-27  Wilco Dijkstra  
>
> PR target/79041
> * config/aarch64/aarch64.c (aarch64_classify_symbol):
> Avoid SYMBOL_SMALL_ABSOLUTE .
> * testsuite/gcc.target/aarch64/pr79041-2.c: New test.
> --
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 
> 060cd8476d2954119daac495ecb059c9be73edbe..329d244e9cf16dbdf849e5dd02b3999caf0cd5a7
>  100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -10042,7 +10042,7 @@ aarch64_classify_symbol (rtx x, rtx offset)
>   /* This is alright even in PIC code as the constant
>  pool reference is always PC relative and within
>  the same translation unit.  */
> - if (CONSTANT_POOL_ADDRESS_P (x))
> + if (CONSTANT_POOL_ADDRESS_P (x) && 
> !aarch64_pcrelative_literal_loads)
> return SYMBOL_SMALL_ABSOLUTE;
>   else
> return SYMBOL_FORCE_TO_MEM;
> diff --git a/gcc/testsuite/gcc.target/aarch64/pr79041-2.c 
> b/gcc/testsuite/gcc.target/aarch64/pr79041-2.c
> new file mode 100644
> index 
> ..e7899725bad2b770f8488a07f99792113275bdf2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/pr79041-2.c
> @@ -0,0 +1,10 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mcmodel=large -mpc-relative-literal-loads" } */
> +
> +__int128
> +t (void)
> +{
> +  return (__int128)1 << 80;
> +}
> +
> +/* { dg-final { scan-assembler "adr" } } */


Re: [PATCH][AArch64] Fix PR79041

2017-06-27 Thread Yvan Roux
On 27 June 2017 at 16:55, Yvan Roux  wrote:
> Hi
>
> On 27 June 2017 at 16:49, Wilco Dijkstra  wrote:
>> As described in PR79041, -mcmodel=large -mpc-relative-literal-loads
>> may be used to avoid generating ADRP/ADD or ADRP/LDR.  However both
>> trunk and GCC7 may still emit ADRP for some constant pool literals.
>> Fix this by adding a aarch64_pcrelative_literal_loads check.
>>
>> OK for trunk/GCC7 backport?
>
> I can't approve it, but the patch is ok for me, I've built and
> regtested the very same patch for aarch64-linux-gnu,
> aarch64-none-elf and aarch64_be-none-elf targets :)
>
> Yvan
>
>> ChangeLog:
>> 2017-06-27  Wilco Dijkstra  
>>
>> PR target/79041
>> * config/aarch64/aarch64.c (aarch64_classify_symbol):
>> Avoid SYMBOL_SMALL_ABSOLUTE .
>> * testsuite/gcc.target/aarch64/pr79041-2.c: New test.
>> --
>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>> index 
>> 060cd8476d2954119daac495ecb059c9be73edbe..329d244e9cf16dbdf849e5dd02b3999caf0cd5a7
>>  100644
>> --- a/gcc/config/aarch64/aarch64.c
>> +++ b/gcc/config/aarch64/aarch64.c
>> @@ -10042,7 +10042,7 @@ aarch64_classify_symbol (rtx x, rtx offset)
>>   /* This is alright even in PIC code as the constant
>>  pool reference is always PC relative and within
>>  the same translation unit.  */
>> - if (CONSTANT_POOL_ADDRESS_P (x))
>> + if (CONSTANT_POOL_ADDRESS_P (x) && 
>> !aarch64_pcrelative_literal_loads)

maybe just a small note here, in my backport patch on gcc-6-branch a
kept the existing order in the condition:

- if (nopcrelative_literal_loads
+ if (!aarch64_pcrelative_literal_loads
  && CONSTANT_POOL_ADDRESS_P (x))

Can we keep it, or maybe I should wait for your patch to be committed
on trunk, and include its backport in my patch for GCC 6.


>> return SYMBOL_SMALL_ABSOLUTE;
>>   else
>> return SYMBOL_FORCE_TO_MEM;
>> diff --git a/gcc/testsuite/gcc.target/aarch64/pr79041-2.c 
>> b/gcc/testsuite/gcc.target/aarch64/pr79041-2.c
>> new file mode 100644
>> index 
>> ..e7899725bad2b770f8488a07f99792113275bdf2
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/pr79041-2.c
>> @@ -0,0 +1,10 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -mcmodel=large -mpc-relative-literal-loads" } */
>> +
>> +__int128
>> +t (void)
>> +{
>> +  return (__int128)1 << 80;
>> +}
>> +
>> +/* { dg-final { scan-assembler "adr" } } */


Re: [Doc, AArch64] Fix/Update AArch64 options.

2017-06-28 Thread Yvan Roux
Hi Sandra,

On 27 June 2017 at 18:05, Sandra Loosemore  wrote:
> On 06/27/2017 06:19 AM, Yvan Roux wrote:
>
>> diff --git a/gcc/config/aarch64/aarch64.opt
>> b/gcc/config/aarch64/aarch64.opt
>> index 942a7d5..0fd1bfa 100644
>> --- a/gcc/config/aarch64/aarch64.opt
>> +++ b/gcc/config/aarch64/aarch64.opt
>> @@ -146,7 +146,7 @@ EnumValue
>>  Enum(aarch64_abi) String(lp64) Value(AARCH64_ABI_LP64)
>>
>>  mpc-relative-literal-loads
>> -Target Report Save Var(pcrelative_literal_loads) Init(2) Save
>> +Target Report Var(pcrelative_literal_loads) Init(2) Save
>>  PC relative literal loads.
>>
>>  msign-return-address=
>
>
> I think this qualifies as an obvious fix.  I can't approve it if it isn't,
> anyway  ;-)

Ok, I'll commit it separately unless there is an objection to its obviousness.

>> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
>> index d1e097b..6e0e776 100644
>> --- a/gcc/doc/invoke.texi
>> +++ b/gcc/doc/invoke.texi
>> @@ -595,7 +595,9 @@ Objective-C and Objective-C++ Dialects}.
>>  -mlow-precision-recip-sqrt  -mno-low-precision-recip-sqrt@gol
>>  -mlow-precision-sqrt  -mno-low-precision-sqrt@gol
>>  -mlow-precision-div  -mno-low-precision-div @gol
>> --march=@var{name}  -mcpu=@var{name}  -mtune=@var{name}}
>> +-mpc-relative-literal-loads -mno-pc-relative-literal-loads @gol
>
>
> For options that have both positive and negative variants, we should only be
> listing the one that is not the default in the Option Summary table.  Can
> you please remove the existing redundant options listed for AArch64, instead
> of adding a new one?
>
>> +-msign-return-address=@var{scope} @gol
>> +-march=@var{name}  -mcpu=@var{name}  -mtune=@var{name}
>> -moverride=@var{string}}
>>
>>  @emph{Adapteva Epiphany Options}
>>  @gccoptlist{-mhalf-reg-file  -mprefer-short-insn-regs @gol
>> @@ -14158,8 +14160,10 @@ across releases.
>>  This option is only intended to be useful when developing GCC.
>>
>>  @item -mpc-relative-literal-loads
>> +@item -mno-pc-relative-literal-loads
>
>
> It is OK to list both the positive and negative forms in the full
> description, but in a table with multiple items in the same entry, the
> second and subsequent ones should use @itemx markup instead of @item.
>
>>  @opindex mpc-relative-literal-loads
>> -Enable PC-relative literal loads.  With this option literal pools are
>> +@opindex mno-pc-relative-literal-loads
>> +Enable or disable PC-relative literal loads.  With this option literal
>> pools are
>>  accessed using a single instruction and emitted after each function.
>> This
>>  limits the maximum size of functions to 1MB.  This is enabled by default
>> for
>>  @option{-mcmodel=tiny}.

OK, here is the new patch with the comments addressed.  I've spotted
that there is also some m / -mno  options at least in the ARM section,
I'll make another patch to fix that.

Thanks
Yvan


>
> -Sandra
>
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index d1e097b..e1bb8a8 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -587,15 +587,14 @@ Objective-C and Objective-C++ Dialects}.
 -mgeneral-regs-only @gol
 -mcmodel=tiny  -mcmodel=small  -mcmodel=large @gol
 -mstrict-align @gol
--momit-leaf-frame-pointer  -mno-omit-leaf-frame-pointer @gol
+-momit-leaf-frame-pointer @gol
 -mtls-dialect=desc  -mtls-dialect=traditional @gol
 -mtls-size=@var{size} @gol
--mfix-cortex-a53-835769  -mno-fix-cortex-a53-835769 @gol
--mfix-cortex-a53-843419  -mno-fix-cortex-a53-843419 @gol
--mlow-precision-recip-sqrt  -mno-low-precision-recip-sqrt@gol
--mlow-precision-sqrt  -mno-low-precision-sqrt@gol
--mlow-precision-div  -mno-low-precision-div @gol
--march=@var{name}  -mcpu=@var{name}  -mtune=@var{name}}
+-mfix-cortex-a53-835769  -mfix-cortex-a53-843419 @gol
+-mlow-precision-recip-sqrt  -mlow-precision-sqrt  -mlow-precision-div @gol
+-mpc-relative-literal-loads @gol
+-msign-return-address=@var{scope} @gol
+-march=@var{name}  -mcpu=@var{name}  -mtune=@var{name}  -moverride=@var{string}}
 
 @emph{Adapteva Epiphany Options}
 @gccoptlist{-mhalf-reg-file  -mprefer-short-insn-regs @gol
@@ -14158,8 +14157,10 @@ across releases.
 This option is only intended to be useful when developing GCC.
 
 @item -mpc-relative-literal-loads
+@itemx -mno-pc-relative-literal-loads
 @opindex mpc-relative-literal-loads
-Enable PC-relative literal loads.  With this option literal pools are
+@opindex mno-pc-relative-literal-loads
+Enable or disable PC-relative literal loads.  With this option literal pools are
 accessed using a single instruction and emitted after each function.  This
 limits the maximum size of functions to 1MB.  This is enabled by default for
 @option{-mcmodel=tiny}.


Re: [Libgomp, Fortran] Fix canadian cross build

2017-07-03 Thread Yvan Roux
On 23 June 2017 at 15:44, Yvan Roux  wrote:
> Hello,
>
> Fortran parts of libgomp (omp_lib.mod, openacc.mod, etc...) are
> missing in a canadian cross build, at least when target gfortran
> compiler comes from PATH and not from GFORTRAN_FOR_TARGET.
>
> Back in 2010, executability test of GFORTRAN was added to fix libgomp
> build on cygwin, but when the executable doesn't contain the path,
> "test -x" fails and part of the library are not built.
>
> This patch fixes the issue by using M4 macro AC_PATH_PROG (which
> returns the absolute name) instead of AC_CHECK_PROG in the function
> defined in config/acx.m4: NCN_STRICT_CHECK_TARGET_TOOLS.  I renamed it
> into NCN_STRICT_PATH_TARGET_TOOLS to keep the semantic used in M4.
>
> Tested by building cross and candian cross toolchain (host:
> i686-w64-mingw32) for arm-linux-gnueabihf with issue and with a
> complete libgomp.
>
> ok for trunk ?

ping?

> Thanks
> Yvan
>
> config/ChangeLog
> 2017-06-23  Yvan Roux  
>
> * acx.m4 (NCN_STRICT_CHECK_TARGET_TOOLS): Renamed to ...
> (NCN_STRICT_PATH_TARGET_TOOLS): ... this.  It reflects the replacement
> of AC_CHECK_PROG by AC_PATH_PROG to get the absolute name of the
> program.
> (ACX_CHECK_INSTALLED_TARGET_TOOL): Use renamed function.
>
> ChangeLog
> 2017-06-23  Yvan Roux  
>
> * configure.ac: Use NCN_STRICT_PATH_TARGET_TOOLS instead of
> NCN_STRICT_CHECK_TARGET_TOOLS.
> * configure: Regenerate.


Re: [PATCH][AArch64][GCC 6] PR target/79041: Correct -mpc-relative-literal-loads logic in aarch64_classify_symbol

2017-07-03 Thread Yvan Roux
On 27 June 2017 at 13:14, Yvan Roux  wrote:
> Hi Wilco
>
> On 27 June 2017 at 12:53, Wilco Dijkstra  wrote:
>> Hi Yvan,
>>
>>> Here is the backport of Wilco's patch (r237607) along with Kyrill's
>>> one (r244643, which removed the remaining occurences of
>>> aarch64_nopcrelative_literal_loads).  To fix the issue the original
>>> patch has to be modified, to keep aarch64_pcrelative_literal_loads
>>> test for large models in aarch64_classify_symbol.
>>
>> The patch looks good to me, however I can't approve it.
>
> ok thanks for the review.
>
>>> On trunk and gcc-7-branch the :lo12: relocations are not generated
>>> because of Wilco's fix for pr78733 (r243456 and 243486), but my
>>> understanding is that the bug is still present since compiling
>>> gcc.target/aarch64/pr78733.c with -mcmodel=large brings back the
>>> :lo12: relocations (I'll submit a patch to add the test back if my
>>> understanding is correct).
>>
>> You're right, eventhough -mpc-relative-literal-loads doesn't make much sense
>> in the large memory model, it seems best to keep the option orthogonal to
>> enable the workaround. I've prepared a patch to fix this on trunk/GCC7.
>> It also adds a test which we should add to your changes to GCC6 too.
>
> ok, I think it is what kugan's proposed earlier today in:
>
> https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01967.html
>
> I agree that -mpc-relative-literal-loads and large memory model
> doesn't make much sense, now it is what is used in kernel build
> system, but if you handle that in a bigger fix already, that's awesome
> :)

ping?
https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01708.html

> Thanks
> Yvan
>
>> Wilco


Re: [Libgomp, Fortran] Fix canadian cross build

2018-02-01 Thread Yvan Roux
On 5 October 2017 at 13:38, Petr Ovtchenkov  wrote:
> On Thu, 5 Oct 2017 12:56:36 +0200
> Yvan Roux  wrote:
>
>> On 5 September 2017 at 12:04, Jakub Jelinek  wrote:
>> > On Tue, Sep 05, 2017 at 10:58:22AM +0200, Yvan Roux wrote:
>> >> ping
>> >>
>> >> https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01784.html
>> >
>> > This really needs to be reviewed by a build machinery maintainer.
>>
>> Thanks for the CC list update Jakub, ping again then...
>>
>> https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01784.html

ping


> BTW,
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71212
> https://gcc.gnu.org/ml/gcc-patches/2017-09/msg01332.html
> https://gcc.gnu.org/ml/gcc-patches/2017-09/msg01334.html
>
>>
>> >> >>>>> config/ChangeLog
>> >> >>>>> 2017-06-23  Yvan Roux  
>> >> >>>>>
>> >> >>>>> * acx.m4 (NCN_STRICT_CHECK_TARGET_TOOLS): Renamed to ...
>> >> >>>>> (NCN_STRICT_PATH_TARGET_TOOLS): ... this.  It reflects the 
>> >> >>>>> replacement
>> >> >>>>> of AC_CHECK_PROG by AC_PATH_PROG to get the absolute name 
>> >> >>>>> of the
>> >> >>>>> program.
>> >> >>>>> (ACX_CHECK_INSTALLED_TARGET_TOOL): Use renamed function.
>> >> >>>>>
>> >> >>>>> ChangeLog
>> >> >>>>> 2017-06-23  Yvan Roux  
>> >> >>>>>
>> >> >>>>> * configure.ac: Use NCN_STRICT_PATH_TARGET_TOOLS instead of
>> >> >>>>> NCN_STRICT_CHECK_TARGET_TOOLS.
>> >> >>>>> * configure: Regenerate.
>> >
>> > Jakub
>
> --
>
>- ptr


[linaro/gcc-4_9-branch] Merge from gcc-4_8-branch and backports

2014-08-14 Thread Yvan Roux
Hi all

we have merged the gcc-4_8-branch into linaro/gcc-4_8-branch up to
revision 213802 as r213944.  We have also backported this set of revisions:

r204251 as r213841  PR sanitizer/58543
r206529 as r213842  PR target/59744
r206530 as r213842  PR target/59744 / fix changelog typo

This will be part of our 2014.08 4.8 release.

Thanks,
Yvan


[linaro/gcc-4_9-branch] Merge from gcc-4_9-branch and backports

2014-08-14 Thread Yvan Roux
Hi all

we have merged the gcc-4_9-branch into linaro/gcc-4_9-branch up to
revision 213803 as r213943.  We have also backported this set of revisions:

r211140 as r213455  [AArch64] Drop ISB after FPCR write.
r211270 as r213790  [AArch64] Remove from arm_neon.h functions not
in the spec
r211271 as r213791  Fix check for __FAST_MATH in arm_neon.h
r211273 as r213792  [Patch,testsuite] Fix bind_pic_locally
r211275 as r213792  [Patch,testsuite] Fix tests that fail due to
symbol visibility when -fPIC
r211503 as r213793  [AArch64] fix and enable non-const shuffle for
bigendian using TBL instruction
r211779 as r213793  [AArch64][committed] Delete unused variable i
r212023 as r213794  [AArch64] Fix constraint vec_unpack_trunk
r212024 as r213795  [AArch32] Cortex-A5 rtx costs table
r212142 as r213796  [AArch32] Handle clz, rbit types in arm
pipeline descriptions
r212225 as r213797  [AArch64] Fix argument types for some
high_lane* intrinsics implemented in assembly
r212296 as r213798  [AArch64] Handle fcvta[su] and frint in RTX
cost function
r212358 as r213799  [AArch64] Relocate saved_varargs_size.
r212512 as r213799  [AArch64] Restructure callee save slot allocation logic.
r212752 as r213799  Unify slots x29/x30
r212753 as r213799  [AArch64] Add frame_size and hard_fp_offset to
machine.frame
r212912 as r213799  [AArch64] GNU-Stylize some un-formatted code.
r212913 as r213799  [AArch64] Consistent parameter types in
prologue/epilogue generation.
r212943 as r213799  [AArch64] Remove useless local variable.
r212945 as r213799  [AArch64] Remove useless parameter base_rtx.
r212946 as r213799  [AArch64] Use register offset in
cfun->machine->frame.reg_offset
r212947 as r213799  [AArch64] Remove useless variable 'increment'
r212949 as r213799  [AArch64] Hoist calculation of register rtx.
r212950 as r213799  [AArch64] Refactor code out into
aarch64_next_callee_save
r212951 as r213799  [AArch64] Use helper functions to handle multiple modes.
r212952 as r213799  [AArch64] Unify vector and core register
save/restore code.
r212954 as r213799  [AArch64] Split save restore path.
r212955 as r213799  [AArch64] Simplify prologue expand using new
helper functions.
r212956 as r213799  [AArch64] Simplify epilogue expansion using
new helper functions.
r212957 as r213799  [AArch64] Prologue and epilogue test cases.
r212958 as r213799  [AArch64] Optimize epilogue in the presence of
an outgoing args area.
r212959 as r213799  [AArch64] Extend frame state to track WB candidates.
r212976 as r213799  [AArch64] Infrastructure for optional use of writeback
r212996 as r213799  [AArch64] Optimize prologue when there is no
frame pointer.
r212997 as r213799  [AArch64] Optimize epilogue when there is no frame
r212999 as 213800   [PATCH, ARM] Fix PR61948 (ICE with DImode
shift by 1 bit)
r213000 as r213801  PR 61713: ICE when expanding single-threaded
version of atomic_test_and_set
r213376 as r213817  [AArch64][1/2] Remove UNSPEC_CLS and use clrsb
RTL code in its' place
r213555 as r213817  [AArch64][2/2] Add rtx cost function handling
of clz, clrsb, rbit

This will be part of our 2014.08 4.9 release.

Thanks,
Yvan


[PATCH, ARM] PR62248 - Configure error with --with-fpu=fp-armv8

2014-08-27 Thread Yvan Roux
Hi,

as reported in PR62248 there is a typo in gcc/config.gcc where
--with-fpu doesn't match -mfpu option for fp-armv8 value (fp-arm-v8 in
config.gcc). Here is the patch to fix it.

Thanks,
Yvan

2014-08-27  Yvan Roux  

* config.gcc:  Fix fp-armv8 option for arm*-*-* targets.
diff --git a/gcc/config.gcc b/gcc/config.gcc
index 6862c127..3f68e3e 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -3528,7 +3528,7 @@ case "${target}" in
| vfp | vfp3 | vfpv3 \
| vfpv3-fp16 | vfpv3-d16 | vfpv3-d16-fp16 | vfpv3xd \
| vfpv3xd-fp16 | neon | neon-fp16 | vfpv4 | vfpv4-d16 \
-   | fpv4-sp-d16 | neon-vfpv4 | fp-arm-v8 | neon-fp-armv8 \
+   | fpv4-sp-d16 | neon-vfpv4 | fp-armv8 | neon-fp-armv8 \
 | crypto-neon-fp-armv8)
# OK
;;


Re: [PATCH, ARM] PR62248 - Configure error with --with-fpu=fp-armv8

2014-08-27 Thread Yvan Roux
On 27 August 2014 11:24, Richard Earnshaw  wrote:
> On 27/08/14 09:04, Yvan Roux wrote:
>> Hi,
>>
>> as reported in PR62248 there is a typo in gcc/config.gcc where
>> --with-fpu doesn't match -mfpu option for fp-armv8 value (fp-arm-v8 in
>> config.gcc). Here is the patch to fix it.
>>
>> Thanks,
>> Yvan
>>
>> 2014-08-27  Yvan Roux  
>>
>> * config.gcc:  Fix fp-armv8 option for arm*-*-* targets.
>>
>>
>> pr62248.diff
>>
>>
>> diff --git a/gcc/config.gcc b/gcc/config.gcc
>> index 6862c127..3f68e3e 100644
>> --- a/gcc/config.gcc
>> +++ b/gcc/config.gcc
>> @@ -3528,7 +3528,7 @@ case "${target}" in
>>   | vfp | vfp3 | vfpv3 \
>>   | vfpv3-fp16 | vfpv3-d16 | vfpv3-d16-fp16 | vfpv3xd \
>>   | vfpv3xd-fp16 | neon | neon-fp16 | vfpv4 | vfpv4-d16 \
>> - | fpv4-sp-d16 | neon-vfpv4 | fp-arm-v8 | neon-fp-armv8 \
>> + | fpv4-sp-d16 | neon-vfpv4 | fp-armv8 | neon-fp-armv8 \
>>  | crypto-neon-fp-armv8)
>>   # OK
>>   ;;
>>
>
> Ok; but better still would be to change this to use the official list in
> arm-fpus.def (like we do for CPU names).

Yes indeed, I'll do it that way.


Re: [PATCH, ARM] PR62248 - Configure error with --with-fpu=fp-armv8

2014-08-27 Thread Yvan Roux
Here is the patch that uses the arm-fpus.def list.

Thanks
Yvan

2014-08-27  Yvan Roux  

* config.gcc (arm*-*-*): Check --with-fpu against arm-fpus.def.

On 27 August 2014 12:35, Yvan Roux  wrote:
> On 27 August 2014 11:24, Richard Earnshaw  wrote:
>> On 27/08/14 09:04, Yvan Roux wrote:
>>> Hi,
>>>
>>> as reported in PR62248 there is a typo in gcc/config.gcc where
>>> --with-fpu doesn't match -mfpu option for fp-armv8 value (fp-arm-v8 in
>>> config.gcc). Here is the patch to fix it.
>>>
>>> Thanks,
>>> Yvan
>>>
>>> 2014-08-27  Yvan Roux  
>>>
>>> * config.gcc:  Fix fp-armv8 option for arm*-*-* targets.
>>>
>>>
>>> pr62248.diff
>>>
>>>
>>> diff --git a/gcc/config.gcc b/gcc/config.gcc
>>> index 6862c127..3f68e3e 100644
>>> --- a/gcc/config.gcc
>>> +++ b/gcc/config.gcc
>>> @@ -3528,7 +3528,7 @@ case "${target}" in
>>>   | vfp | vfp3 | vfpv3 \
>>>   | vfpv3-fp16 | vfpv3-d16 | vfpv3-d16-fp16 | vfpv3xd \
>>>   | vfpv3xd-fp16 | neon | neon-fp16 | vfpv4 | vfpv4-d16 \
>>> - | fpv4-sp-d16 | neon-vfpv4 | fp-arm-v8 | neon-fp-armv8 \
>>> + | fpv4-sp-d16 | neon-vfpv4 | fp-armv8 | neon-fp-armv8 \
>>>  | crypto-neon-fp-armv8)
>>>   # OK
>>>   ;;
>>>
>>
>> Ok; but better still would be to change this to use the official list in
>> arm-fpus.def (like we do for CPU names).
>
> Yes indeed, I'll do it that way.
diff --git a/gcc/config.gcc b/gcc/config.gcc
index 6862c127..7434a08 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -3523,20 +3523,17 @@ case "${target}" in
;;
esac
 
-   case "$with_fpu" in
-   "" \
-   | vfp | vfp3 | vfpv3 \
-   | vfpv3-fp16 | vfpv3-d16 | vfpv3-d16-fp16 | vfpv3xd \
-   | vfpv3xd-fp16 | neon | neon-fp16 | vfpv4 | vfpv4-d16 \
-   | fpv4-sp-d16 | neon-vfpv4 | fp-arm-v8 | neon-fp-armv8 \
-| crypto-neon-fp-armv8)
-   # OK
-   ;;
-   *)
-   echo "Unknown fpu used in --with-fpu=$with_fpu" 2>&1
-   exit 1
-   ;;
-   esac
+   # see if it matches any of the entries in arm-fpus.def
+   if [ x"$with_fpu" = x ] \
+   || grep "^ARM_FPU(\"$with_fpu\"," \
+   ${srcdir}/config/arm/arm-fpus.def \
+   > /dev/null; then
+ # OK
+ true
+   else
+ echo "Unknown fpu used in --with-fpu=$with_fpu" 1>&2
+ exit 1
+   fi
 
case "$with_abi" in
"" \


Re: [PATCH, ARM] PR62248 - Configure error with --with-fpu=fp-armv8

2014-08-27 Thread Yvan Roux
with the PR in the ChangeLog:

2014-08-27  Yvan Roux  

PR other/62248
* config.gcc (arm*-*-*): Check --with-fpu against arm-fpus.def.


On 27 August 2014 13:10, Yvan Roux  wrote:
> Here is the patch that uses the arm-fpus.def list.
>
> Thanks
> Yvan
>
> 2014-08-27  Yvan Roux  
>
> * config.gcc (arm*-*-*): Check --with-fpu against arm-fpus.def.
>
> On 27 August 2014 12:35, Yvan Roux  wrote:
>> On 27 August 2014 11:24, Richard Earnshaw  wrote:
>>> On 27/08/14 09:04, Yvan Roux wrote:
>>>> Hi,
>>>>
>>>> as reported in PR62248 there is a typo in gcc/config.gcc where
>>>> --with-fpu doesn't match -mfpu option for fp-armv8 value (fp-arm-v8 in
>>>> config.gcc). Here is the patch to fix it.
>>>>
>>>> Thanks,
>>>> Yvan
>>>>
>>>> 2014-08-27  Yvan Roux  
>>>>
>>>> * config.gcc:  Fix fp-armv8 option for arm*-*-* targets.
>>>>
>>>>
>>>> pr62248.diff
>>>>
>>>>
>>>> diff --git a/gcc/config.gcc b/gcc/config.gcc
>>>> index 6862c127..3f68e3e 100644
>>>> --- a/gcc/config.gcc
>>>> +++ b/gcc/config.gcc
>>>> @@ -3528,7 +3528,7 @@ case "${target}" in
>>>>   | vfp | vfp3 | vfpv3 \
>>>>   | vfpv3-fp16 | vfpv3-d16 | vfpv3-d16-fp16 | vfpv3xd \
>>>>   | vfpv3xd-fp16 | neon | neon-fp16 | vfpv4 | vfpv4-d16 \
>>>> - | fpv4-sp-d16 | neon-vfpv4 | fp-arm-v8 | neon-fp-armv8 \
>>>> + | fpv4-sp-d16 | neon-vfpv4 | fp-armv8 | neon-fp-armv8 \
>>>>  | crypto-neon-fp-armv8)
>>>>   # OK
>>>>   ;;
>>>>
>>>
>>> Ok; but better still would be to change this to use the official list in
>>> arm-fpus.def (like we do for CPU names).
>>
>> Yes indeed, I'll do it that way.


Re: [PATCH, ARM] PR62248 - Configure error with --with-fpu=fp-armv8

2014-08-27 Thread Yvan Roux
Committed on trunk at r214573, and I'll backport it on 4.9 branch.

On 27 August 2014 14:26, Richard Earnshaw  wrote:
> On 27/08/14 12:35, Yvan Roux wrote:
>> with the PR in the ChangeLog:
>>
>> 2014-08-27  Yvan Roux  
>>
>> PR other/62248
>> * config.gcc (arm*-*-*): Check --with-fpu against arm-fpus.def.
>>
>>
>
> OK, thanks.
>
> R.
>
>> On 27 August 2014 13:10, Yvan Roux  wrote:
>>> Here is the patch that uses the arm-fpus.def list.
>>>
>>> Thanks
>>> Yvan
>>>
>>> 2014-08-27  Yvan Roux  
>>>
>>> * config.gcc (arm*-*-*): Check --with-fpu against arm-fpus.def.
>>>
>>> On 27 August 2014 12:35, Yvan Roux  wrote:
>>>> On 27 August 2014 11:24, Richard Earnshaw  wrote:
>>>>> On 27/08/14 09:04, Yvan Roux wrote:
>>>>>> Hi,
>>>>>>
>>>>>> as reported in PR62248 there is a typo in gcc/config.gcc where
>>>>>> --with-fpu doesn't match -mfpu option for fp-armv8 value (fp-arm-v8 in
>>>>>> config.gcc). Here is the patch to fix it.
>>>>>>
>>>>>> Thanks,
>>>>>> Yvan
>>>>>>
>>>>>> 2014-08-27  Yvan Roux  
>>>>>>
>>>>>> * config.gcc:  Fix fp-armv8 option for arm*-*-* targets.
>>>>>>
>>>>>>
>>>>>> pr62248.diff
>>>>>>
>>>>>>
>>>>>> diff --git a/gcc/config.gcc b/gcc/config.gcc
>>>>>> index 6862c127..3f68e3e 100644
>>>>>> --- a/gcc/config.gcc
>>>>>> +++ b/gcc/config.gcc
>>>>>> @@ -3528,7 +3528,7 @@ case "${target}" in
>>>>>>   | vfp | vfp3 | vfpv3 \
>>>>>>   | vfpv3-fp16 | vfpv3-d16 | vfpv3-d16-fp16 | vfpv3xd \
>>>>>>   | vfpv3xd-fp16 | neon | neon-fp16 | vfpv4 | vfpv4-d16 \
>>>>>> - | fpv4-sp-d16 | neon-vfpv4 | fp-arm-v8 | neon-fp-armv8 \
>>>>>> + | fpv4-sp-d16 | neon-vfpv4 | fp-armv8 | neon-fp-armv8 \
>>>>>>  | crypto-neon-fp-armv8)
>>>>>>   # OK
>>>>>>   ;;
>>>>>>
>>>>>
>>>>> Ok; but better still would be to change this to use the official list in
>>>>> arm-fpus.def (like we do for CPU names).
>>>>
>>>> Yes indeed, I'll do it that way.
>>
>
>


Re: [PATCH, ARM] PR62248 - Configure error with --with-fpu=fp-armv8

2014-08-27 Thread Yvan Roux
"true" seems to be used that way for aarch64*-*-* and arm*-*-* is it
preferable to change it to ";" for all occurrences  ?

Thanks,
Yvan

On 27 August 2014 18:51, Bernhard Reutner-Fischer  wrote:
> On 27 August 2014 16:22:28 CEST, Yvan Roux  wrote:
>>Committed on trunk at r214573, and I'll backport it on 4.9 branch.
>
> s/true/:/
>
> ?
> Thanks,
>
>


Re: [PATCH, ARM] PR62248 - Configure error with --with-fpu=fp-armv8

2014-08-27 Thread Yvan Roux
On 27 August 2014 19:07, Yvan Roux  wrote:
> "true" seems to be used that way for aarch64*-*-* and arm*-*-* is it
> preferable to change it to ";" for all occurrences  ?

sorry for the typo, I meant colon and not semicolon.

> Thanks,
> Yvan
>
> On 27 August 2014 18:51, Bernhard Reutner-Fischer  
> wrote:
>> On 27 August 2014 16:22:28 CEST, Yvan Roux  wrote:
>>>Committed on trunk at r214573, and I'll backport it on 4.9 branch.
>>
>> s/true/:/
>>
>> ?
>> Thanks,
>>
>>


Re: patch to fix PR58784 (ARM LRA crash)

2013-11-07 Thread Yvan Roux
Hi,

I've tested LRA on ARM in several configuration and it occurs that
only one regression is observed in Fortran for targets A15 and A9 in
hard and soft fp amd Thumb mode enabled by default (ARMv5 is clean).
The failing test case is gfortran.dg/realloc_on_assign_11.f90 in -O3
and the error seems to be of the same nature of
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58784, in the sens that
LRA ICEs in check_rtl on an insn which doesn't satisfy its
constraints.  Here is the insn :

(insn 633 543 656 24 (parallel [
(set (mem/c:SI (plus:SI (reg/f:SI 907)
(const_int 8 [0x8])) [4 A.54+8 S4 A64])
(smax:SI (reg:SI 562 [ D.5235 ])
(reg:SI 621 [ D.5235 ])))
(clobber (reg:CC 100 cc))
]).../gfortran.dg/realloc_on_assign_11.f90:29 121 {*store_minmaxsi}
 (expr_list:REG_DEAD (reg:SI 621 [ D.5235 ])
(expr_list:REG_UNUSED (reg:CC 100 cc)
(nil

Thanks,
Yvan


On 31 October 2013 17:31, Yvan Roux  wrote:
> (this time in plain text !)
>
>> Does this mean we can now turn on LRA for ARM state by default and
>> start looking at performance issues if any ?
>
> With the other patch for 58785 and a small one I've to post, we are even
> about to turn LRA on by default completely, as the compiler now bootstrap
> also in thumb and first testsuite are clean :-)
>
> I juste want ton pass a full validation before.
>
> Yvan
>
>
>>> Ramana
>>>
>>>
>>>
>>>
>>>
>>> > 2013-10-30  Vladimir Makarov  
>>> >
>>> > PR target/58784
>>> > * lra.c (check_rtl): Remove address check before LRA work.
>>> >
>>> > 2013-10-30  Vladimir Makarov  
>>> >
>>> > PR target/58784
>>> > * gcc.target/arm/pr58784.c: New.
>>> >


[PATCH, ARM, LRA] Fixed bootstrap failure in Thumb mode

2013-11-07 Thread Yvan Roux
Hi,

this patch fixed an LRA cycling due to secondary reload (Thumb mode).
Notice that this patch is a prerequisite to turn on LRA by default on
ARM.  Bootstrapped on a9 and a15 without any regression in the
testsuite as LRA is off by default and with the regression reported in
the thread bellow when LRA is on.

http://gcc.gnu.org/ml/gcc-patches/2013-11/msg00725.html

Thanks,
Yvan

2013-11-07  Yvan Roux  

* config/arm/arm.h (THUMB_SECONDARY_INPUT_RELOAD_CLASS): Return NO_REGS
for LRA.
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 1781b75..d054906 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -1266,11 +1266,12 @@ enum reg_class
 
 /* Must leave BASE_REGS reloads alone */
 #define THUMB_SECONDARY_INPUT_RELOAD_CLASS(CLASS, MODE, X)		\
-  ((CLASS) != LO_REGS && (CLASS) != BASE_REGS\
-   ? ((true_regnum (X) == -1 ? LO_REGS	\
-   : (true_regnum (X) + HARD_REGNO_NREGS (0, MODE) > 8) ? LO_REGS	\
-   : NO_REGS)) 			\
-   : NO_REGS)
+  (lra_in_progress ? NO_REGS		\
+   : ((CLASS) != LO_REGS && (CLASS) != BASE_REGS			\
+  ? ((true_regnum (X) == -1 ? LO_REGS\
+ : (true_regnum (X) + HARD_REGNO_NREGS (0, MODE) > 8) ? LO_REGS	\
+ : NO_REGS)) 			\
+  : NO_REGS))
 
 #define THUMB_SECONDARY_OUTPUT_RELOAD_CLASS(CLASS, MODE, X)		\
   ((CLASS) != LO_REGS && (CLASS) != BASE_REGS\


Re: [patch][RFC] make lra.c:check_rtl set maybe_hot_insn_p

2013-11-15 Thread Yvan Roux
Hi,

I'm agree.  I looked at the ARM backend and it occurs that the usage
of optimize_insn_for_size_p() was added to only use store_minmax in
cold path because of some performance issue.  But in any case its
usage doesn't shrink the number of instruction, if we are in ARM mode
3 are needed : 1 comparison, 1 conditional move and 1 store in O1, O2
and O3 and 1 comparison and 2 conditional sotres in Os :

cmp ip, r0
movlt   r0, ip
strr0, [lr, r3]

or

cmp   r5, r4
strle   r5, [lr, r3]
strgt   r4, [lr, r3]

and in Thumb mode 4 are needed in both cases because of the it
insertion.  Thus I think that this insn can be removed. Is it ok for
you ?

Thanks
Yvan


On 10 November 2013 14:17, Richard Biener  wrote:
> Steven Bosscher  wrote:
>>Hello,
>>
>>This patch is necessary to make ARM pass the test suite with LRA
>>enabled. The symptom is recog failing to recognize a store_minmaxsi
>>insn, see:
>> http://gcc.gnu.org/ml/gcc-patches/2013-11/msg00725.html
>>
>>But I am not sure if that's also the root cause of the problem, or
>>whether the ARM back end should not let recognition of insn patterns
>>be dependent on the state of the profile flags.
>>
>>The pattern for store_minmaxsi (in arm.md) is:
>>
>>(define_insn "*store_minmaxsi"
>>  [(set (match_operand:SI 0 "memory_operand" "=m")
>>(match_operator:SI 3 "minmax_operator"
>> [(match_operand:SI 1 "s_register_operand" "r")
>>  (match_operand:SI 2 "s_register_operand" "r")]))
>>   (clobber (reg:CC CC_REGNUM))]
>>  "TARGET_32BIT && optimize_insn_for_size_p()"
>>  "*
>>  operands[3] = gen_rtx_fmt_ee (minmax_code (operands[3]), SImode,
>>operands[1], operands[2]);
>>  output_asm_insn (\"cmp\\t%1, %2\", operands);
>>  if (TARGET_THUMB2)
>>output_asm_insn (\"ite\t%d3\", operands);
>>  output_asm_insn (\"str%d3\\t%1, %0\", operands);
>>  output_asm_insn (\"str%D3\\t%2, %0\", operands);
>>  return \"\";
>>  "
>>  [(set_attr "conds" "clob")
>>   (set (attr "length")
>>(if_then_else (eq_attr "is_thumb" "yes")
>>  (const_int 14)
>>  (const_int 12)))
>>   (set_attr "type" "store1")]
>>)
>>
>>
>>Note the insn condition uses optimize_insn_for_size_p(). This means
>>the pattern can be valid or invalid dependent on the context where the
>>insn appears: in hot or cold code. IMHO this behavior should not be
>>allowed. The back end cannot expect the middle end to know at all
>>times the context that the insn appears in, and more importantly
>>whether a pattern is valid or not is independent of where the insn
>>appears: That is a *cost* issue!
>>
>>It seems to me that the ARM back end should be fixed here, not LRA's
>>check_rtl.
>>
>>Comments&thoughts?
>
> The intent is to allow this also to control combine and split, but certainly 
> making insns invalid after the fact is bad.  think of sinking a previously 
> hot insn into a cold path...
>
> Honza, how was this supposed to work?
>
> Richard.
>
>>Ciao!
>>Steven
>
>


Re: [patch][RFC] make lra.c:check_rtl set maybe_hot_insn_p

2013-11-15 Thread Yvan Roux
> Sometimes 4 will be needed, since both original register values may
> remain live.

Indeed.

> However, I'm inclined to agree that while it should be possible to
> decide at the *function* level whether or not an insn is valid, doing so
> at the block level is probably unsafe.

Ok, so the attached patch should fix the issue, its validation is ongoing.
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index e8d5464..da387fb 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -3642,7 +3642,7 @@
 [(match_operand:SI 1 "s_register_operand" "r")
  (match_operand:SI 2 "s_register_operand" "r")]))
(clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT && optimize_insn_for_size_p()"
+  "TARGET_32BIT && optimize_function_for_size_p (cfun)"
   "*
   operands[3] = gen_rtx_fmt_ee (minmax_code (operands[3]), SImode,
operands[1], operands[2]);


Re: [patch][RFC] make lra.c:check_rtl set maybe_hot_insn_p

2013-11-18 Thread Yvan Roux
So, the validation is ok with this patch, I'm just not able to say if
the original performance issue is still fixed with it.  Could you
check it Kyrylo ?

Yvan

2013-11-17  Yvan Roux  

* config/arm/arm.md (store_minmaxsi): Use only when
optimize_function_for_size_p.


On 15 November 2013 15:59, Yvan Roux  wrote:
>> Sometimes 4 will be needed, since both original register values may
>> remain live.
>
> Indeed.
>
>> However, I'm inclined to agree that while it should be possible to
>> decide at the *function* level whether or not an insn is valid, doing so
>> at the block level is probably unsafe.
>
> Ok, so the attached patch should fix the issue, its validation is ongoing.


Re: [PATCH, ARM, LRA] Fixed bootstrap failure in Thumb mode

2013-11-18 Thread Yvan Roux
Ping.

On 7 November 2013 15:56, Yvan Roux  wrote:
> Hi,
>
> this patch fixed an LRA cycling due to secondary reload (Thumb mode).
> Notice that this patch is a prerequisite to turn on LRA by default on
> ARM.  Bootstrapped on a9 and a15 without any regression in the
> testsuite as LRA is off by default and with the regression reported in
> the thread bellow when LRA is on.
>
> http://gcc.gnu.org/ml/gcc-patches/2013-11/msg00725.html
>
> Thanks,
> Yvan
>
> 2013-11-07  Yvan Roux  
>
> * config/arm/arm.h (THUMB_SECONDARY_INPUT_RELOAD_CLASS): Return 
> NO_REGS
> for LRA.


Re: [patch][RFC] make lra.c:check_rtl set maybe_hot_insn_p

2013-11-18 Thread Yvan Roux
> Hi Yvan,
> I'll run the benchmark today to confirm the performance, but from compiling
> some code sequences that exhibited the bad behaviour in the past, I see that
> this patch still fixes the issues. store_minmaxsi is not generated when
> optimising for speed.

Ok Cool, Thanks Kyrill

Cheers,
Yvan


Re: [patch][RFC] make lra.c:check_rtl set maybe_hot_insn_p

2013-11-19 Thread Yvan Roux
> yep, all good performance-wise :)

Great, Thanks Kyrill.

Ok for trunk ?

Yvan


Re: RFA: patch to fix PR58785 (an ARM LRA crash)

2013-11-20 Thread Yvan Roux
Hi,

as Richard said, only a subset of rclass is allowed to be returned by
preferred_reload_class.  I've tested the attached patched in Thumb
mode, on ARMv5, A9 and A9hf and on cross A15 without regression.

Yvan

2013-11-20  Yvan Roux  

PR target/58785
* config/arm/arm.c (arm_preferred_reload_class): Only return LO_REGS
when rclass is GENERAL_REGS.
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 5c53440..63f10bd 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -6882,10 +6882,7 @@ arm_preferred_reload_class (rtx x ATTRIBUTE_UNUSED, 
reg_class_t rclass)
 return rclass;
   else
 {
-  if (rclass == GENERAL_REGS
- || rclass == HI_REGS
- || rclass == NO_REGS
- || rclass == STACK_REG)
+  if (rclass == GENERAL_REGS)
return LO_REGS;
   else
return rclass;


[linaro/gcc-4_9-branch] Merge from gcc-4_9-branch and backports

2014-10-17 Thread Yvan Roux
Hi all

we have merged the gcc-4_9-branch into linaro/gcc-4_9-branch up to
revision 216130 as r216256.  We have also backported this set of revisions:

r209643 as 215975 : [AArch64] Define TARGET_FLAGS_REGNUM
r211881 as 215975 : PR target/61565
r213035 as 215846 : [AArch64] libitm: Improve _ITM_beginTransaction
r213090 as 215847 : [AArch64] Fix *extr_insv_lower_reg pattern
r214824 as 215977 : [AArch64] Use CC_Z and CC_NZ with csinc and
similar instructions
r214825 as 216007 : [AArch32][1/2] Implement lceil, lfloor, lround
optabs with new ARMv8-A instructions
r214826 as 216007 : [AArch32][2/2] Vectorise lroundf, lfloorf, lceilf
using the new ARMv8-A vcvt* instructions
r214886 as 215944 : [AArch64] Improve epilogue unwind info rth
r214940 as 215853 : [AArch64] Add a mode to operand 1 of sibcall_value_insn
r214943 as 215946 : [AArch64] Add a builtin for rbit(q?)_p8; add
intrinsics and tests.
r214944 as 215948 : [AArch32/AArch64] Schedule alu_ext for Cortex-A53
r214945 as 215949 : [AArch64] Remove varargs from aarch64_simd_expand_args
r214947 as 215854 : [AArch64] Tidy: remove unused qualifier_const_pointer
r214959 as 215857 : [AArch32/AArch64] Add scheduling info for ARMv8-A
FPU new instructions in Cortex-A53
r215050 as 215858 : [AArch32[1/7] Convert FP mnemonics to UAL | mov patterns.
r215051 as 215858 : [AArch32][2/7] Convert FP mnemonics to UAL |
add/sub/div/abs patterns
r215052 as 215858 : [AARch32][3/7] Convert FP mnemonics to UAL |
mul+add patterns
r215053 as 215858 : [AArch32][4/7] Convert FP mnemonics to UAL | vcvt patterns
r215054 as 215858 : [AArch32][5/7] Convert FP mnemonics to UAL | sqrt
and FP compare patterns
r215055 as 215858 : [AArch32][6/7] Convert FP mnemonics to UAL |
movcc_vfp (fmstat)
r215056 as 215858 : [AArch32][7/7] Convert FP mnemonics to UAL |
f{ld,st}m -> v{ld,st}m
r215067 as 215923 : [AArch32] Enable auto-vectorization for copysignf
r215085 as 216007 : [AArch32][tests] Make input and output arrays
128-bit aligned in vectorisation tests
r215086 as 215928 : [AARch64] Add crtfastmath for AArch64
r215101 as 215929 : PR target/56846 libstdc++
r215136 as 215932 : PR target/63209
r215205 as 215935 : [Ree] Ensure inserted copy don't change the number
of hard registers
r215260 as 215937 : [AArch64] Fix force_simd macro in vdup_lane_2
r215321 as 215938 : Disallow -mfpu=neon for unsuitable architectures
r215346 as 215940 : movmisalign_neon_load
r215385 as 215941 : [AArch64] Add constraint letter for
stack_protect_test pattern
r215471 as 216004 : [AArch64] Auto-generate the "BUILTIN_" macros

This will be part of our 2014.10 4.9 release.

Thanks,
Yvan


[linaro/gcc-4_9-branch] Merge from gcc-4_9-branch and backports

2014-06-12 Thread Yvan Roux
Hi all,

we have merged the gcc-4_9-branch into linaro/gcc-4_9-branch up to
revision 211054 as r211495.  We have also backported this set of revisions:

r209419 as r211497 : PR rtl-optimization/60663
r209457 as r211496 : TRY_EMPTY_VM_SPACE Change aarch64 ilp32
r209559 as r211498 : [AArch64] vrnd<*>_f64 patch
r209561 as r211505 : Suppress Redundant Flag Setting for Cortex-A15.
r209613 as r211506 : AArch32 Support ORN for DIMode
r209614 as r211507 : Optimise NotDI AND/OR ZeroExtendSI for ARMv7A
r209615 as r211508 : [ARM] Allow any register for DImode values in Thumb2
r209617 as r211509 : [AArch64] Fix possible wrong code generation when
comparing DImode values.
r209618 as r211511 : [AArch64] Add a space to memory asm code between
base register and offset.
r209627 as r211512 : [AArch64] Fix indentation.
r209636 as r211512 : [AArch64] Fix aarch64_initial_elimination_offset
calculation.
r209640 as r211514 : [AArch64] vqneg and vqabs intrinsics implementation.
r209641 as r211515 : [AArch64] Vreinterpret re-implemention.
r209642 as r211515 : [AArch64] 64-bit float vreinterpret implemention
r209643 as r211516 : [AArch64] Define TARGET_FLAGS_REGNUM
r209645 as r211517 : [AArch64] Fix TLS for ILP32.
r209649 as r211518 : Merge longlong.h from glibc tree.
r209659 as r211519 : AArch64 add, sub, mul in TImode
r209701 as r211520 : [ARM] Handle FMA code in rtx costs.
r209702 as r211520 : [ARM] Cortex-A8 rtx cost table
r209703 as r211520 : [ARM][1/3] Add rev field to rtx cost tables
r209704 as r211520 : [AArch64][2/3] Recognise rev16 operations on
SImode and DImode data
r209705 as r211520 : [ARM][3/3] Recognise bitwise operations leading
to SImode rev16
r209706 as r211521 : [AArch64] Add handling of bswap operations in rtx costs
r209710 as r211523 : [ARM] Initialize new tune_params values
r209711 as r211524 : [AArch64] Fully support rotate on logical operations.
r209712 as r211530 : [AARCH64] Use standard patterns for stack protection.
r209713 as r211560 : [AArch64] VDUP Testcases
r209736 as r211573 : [AArch64] Vectorise bswap[16,32,64]
r209742 as r211574 : [AArch64] Reverse TBL indices for big-endian.
r209747 as r211575 : Fix warning in libgfortran configure script
r209749 as r211574 : [AArch64] Enable TBL for big-endian.
r209806 as r211576 : [ARM] Initialise T16-related fields in Cortex-A8
tuning struct.
r209808 as r211577 : [ARM] Enable tail call optimization for long call
r209878 as r211578 : [AArch64] Relax modes_tieable_p and
cannot_change_mode_class
r209880 as r211579 : [AArch64] Improve vst4_lane intrinsics
r209893 as r211580 : Add execution + assembler tests of the AArch64
ZIP Intrinsics.
r209897 as r211581 : Remove PUSH_ARGS_REVERSED from the RTL expander.
r209906 as r211582 : [AArch64/ARM 2/3] Rewrite AArch64 ZIP Intrinsics
using __builtin_shuffle
r209908 as r211582 : Add execution tests of ARM ZIP Intrinsics.
r210615 as r211583 : libitm: Enable aarch64
r211211 as r211584 : [AARCH64]Support full addressing modes for
ldr/str in vectorization scenarios

This will be part of our 2014.06 release.

Thanks,
Yvan


[linaro/gcc-4_9-branch] AArch64 costs model backports

2014-06-20 Thread Yvan Roux
Hi all,

we have backported a set of AArch64 costs model related revisions in
the linaro/gcc-4_9-branch at r211843.  The backported revisions are:

210493 : [AArch64 costs 1/18] Refactor aarch64_address_costs.
210494 : [AArch64 costs 2/18] Add cost tables for Cortex-A57
210495 : [AArch64 costs 3/18] Wrap aarch64_rtx_costs to dump verbose output
210496 : [AArch64 costs 4/18] Better estimate cost of building a constant
210497 : [AArch64 costs 5/18] Factor out common MULT cases
210498 : [AArch64 costs 6/18] Set default costs and handle vector modes.
210499 : [AArch64 costs 7/18] Improve SET cost.
210500 : [AArch64 costs 8/18] Cost memory accesses using address costs
210501 : [AArch64 costs 9/18] Better cost logical operations
210502 : [AArch64 costs 10/18] Improve costs for sign/zero extend operations
210503 : [AArch64 costs 11/18] Improve costs for rotate and shift operations.
210504 : [AArch64 costs 12/18] Improve costs for sign/zero extracts
210505 : [AArch64 costs 13/18] Improve costs for div/mod
210506 : [AArch64 costs 14/18] Cost comparisons, flag setting
operators and IF_THEN_ELSE
210507 : [AArch64 costs 15/18] Cost more Floating point RTX.
210508 : [AArch64 costs 16/18] Cost TRUNCATE
210509 : [AArch64 costs 17/18] Cost for SYMBOL_REF, HIGH and LO_SUM
210510 : [AArch64 costs 18/18] Dump a message if we are unable to cost an insn.
210512 : [AArch64 costs] Fixup to costing of FNMUL
211205 : aarch64_if_then_else_costs refactor
211206 : aarch64_if_then_else_costs

Thanks,
Yvan


[linaro/gcc-4_8-branch] Merge from gcc-4_8-branch

2014-06-20 Thread Yvan Roux
Hi,

we have merged the gcc-4_8-branch into linaro/gcc-4_8-branch up to
revision 210799 as r211850.

Thanks
Yvan


Re: [PATCH, AArch64] Add Cortex-A53 erratum 843419 configure-time option

2015-05-12 Thread Yvan Roux
Hi,

On 5 May 2015 at 10:11, Marcus Shawcroft  wrote:
> On 4 May 2015 at 09:58, Yvan Roux  wrote:
>
>> Yes this is a better formulation.
>>
>>> +corresponding flag to the linker.  It can be explicitly disabled
>>> during
>>> +compilation by passing the @option{-mno-fix-cortex-a53-835769} option.
>>>
>>> Copy paste error here with the previous errata number.
>>
>> Here is the patch with the modifications.  Is it needed to backport it
>> into 4.9 and 5.1 branches ?
>
>
> This is OK for trunk.  The back ports make sense, but leave it 48
> hours, if there are no objections then go ahead.

Committed in 5 and 4.9 branches at revisions r223044 and r223046.

Cheers,
Yvan


Re: [PATCH][DRIVER] Wrong C++ include paths when configuring with "--with-sysroot=/"

2015-05-20 Thread Yvan Roux
Hi,

On 8 May 2015 at 00:07, Joseph Myers  wrote:
> On Mon, 20 Apr 2015, Pavel Kopyl wrote:
>
>> Hi all,
>>
>>
>> To build a GCC-4.9.2 ARM cross-compiler for my setting I need to configure it
>> with  "--with-sysroot=/ --with-gxx-include-dir=/usr/include/c++/4.9.2".
>> But I found that gcc driver removes the leading slash from resulting paths:
>>
>> `gcc -print-prog-name=cc1plus` -v
>> ...
>> ignoring nonexistent directory "usr/include/c++/4.9.2"   <- HERE
>> ignoring nonexistent directory
>> "usr/include/c++/4.9.2/armv7l-tizen-linux-gnueabi"   <- AND HERE
>> ignoring nonexistent directory "usr/include/c++/4.9.2/backward" <- AND HERE
>> #include "..." search starts here:
>> #include <...> search starts here:
>> /usr/lib/gcc/armv7l-tizen-linux-gnueabi/4.9.2/include
>> /usr/local/include
>> /usr/lib/gcc/armv7l-tizen-linux-gnueabi/4.9.2/include-fixed
>> /usr/include
>>
>> It's also reproducible on trunk.
>>
>> Attached patch fixes this bug.
>
> You don't explain the rationale for this patch, in terms of the logical
> semantics of the various variables involved, and why, in terms of those
> logical semantics, this patch is the correct approach for fixing the
> observed problems.
>
> As I read the code, it's not the driver that removes the leading slash.
> Rather, it's the code in configure.ac:
>
> gcc_gxx_include_dir_add_sysroot=0
> if test "${with_sysroot+set}" = set; then
>   gcc_gxx_without_sysroot=`expr "${gcc_gxx_include_dir}" : 
> "${with_sysroot}"'\(.*\)'`
>   if test "${gcc_gxx_without_sysroot}"; then
> gcc_gxx_include_dir="${gcc_gxx_without_sysroot}"
> gcc_gxx_include_dir_add_sysroot=1
>   fi
> fi
>
> What I'd say is that this code is mishandling the case of a --with-sysroot
> path that ends with '/' (or, I suppose, '\', on hosts where that's a
> directory separator).  That is, it's producing a gcc_gxx_include_dir
> setting with no leading '/'.  I think it would be more appropriate for
> this configure.ac code to remove (sysroot minus trailing directory
> separator), so that the gcc_gxx_include_dir setting after this code still
> has a leading directory separator whether or not the --with-sysroot
> setting ended with such a separator.  Given such a configure.ac change, I
> wouldn't then expect changes elsewhere in the compiler to be needed.  But
> if that doesn't work to fix the bug, I think you need to elaborate further
> on the semantics of the various variables involved (in configure.ac and in
> the compiler).

There is this old patch submitted by Matthias on that same issue, if
its logic is the right one for you Joseph I can rebase/validate it
Joseph.

https://gcc.gnu.org/ml/gcc-patches/2012-02/msg00320.html

Cheers,
Yvan

> --
> Joseph S. Myers
> jos...@codesourcery.com


[PATCH, ARM] Alternatives and type attributes fixes.

2015-04-27 Thread Yvan Roux
Hi,

This is a follow-up of PR64208 where an LRA loop was due to redundancy
in insn's alternatives.  I've checked all the insns in the arm backend
to avoid latent problems and this patch fixes the issues I've spotted.

Only thumb2_movsicc_insn contained duplicated alternatives, and the
rest of the changes are related to the type attribute, which were not
accurate or used accordingly to their definition.  Notice that the
modifications have only a limited impact as in most of the pipeline
descriptions, multiple/mov_reg/alu_reg/.. types are used the same way,
only cortex a8 seems to have a real difference between alu or multiple
instruction and mov.

Bootstrapped and regtested on arm-linux-gnueabihf.  Ok for trunk ?

Thanks,
Yvan

2015-04-27  Yvan Roux  

* config/arm/arm.mb (*arm_movt): Fix type attribute.
(*movsi_compare0): Likewise.
(*cmpsi_shiftsi): Likewise.
(*cmpsi_shiftsi_swp): Likewise.
(*movsicc_insn): Likewise.
(*cond_move): Likewise.
(*if_plus_move): Likewise.
(*if_move_plus): Likewise.
(*if_arith_move): Likewise.
(*if_move_arith): Likewise.
(*if_shift_move): Likewise.
(*if_move_shift): Likewise.
* config/arm/thumb2.md (*thumb2_movsi_insn): Fix type attribute.
(*thumb2_movsicc_insn): Fix alternative redundancy.
(*thumb2_addsi_short): Split register and immediate alternatives.
(thumb2_addsi3_compare0): Likewise.
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 164ac13..ee23deb 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -5631,7 +5631,7 @@
   [(set_attr "predicable" "yes")
(set_attr "predicable_short_it" "no")
(set_attr "length" "4")
-   (set_attr "type" "mov_imm")]
+   (set_attr "type" "alu_sreg")]
 )
 
 (define_insn "*arm_movsi_insn"
@@ -5919,7 +5919,7 @@
cmp%?\\t%0, #0
sub%.\\t%0, %1, #0"
   [(set_attr "conds" "set")
-   (set_attr "type" "alus_imm,alus_imm")]
+   (set_attr "type" "alus_sreg,alus_sreg")]
 )
 
 ;; Subroutine to store a half word from a register into memory.
@@ -6886,7 +6886,7 @@
   [(set_attr "conds" "set")
(set_attr "shift" "1")
(set_attr "arch" "32,a,a")
-   (set_attr "type" "alus_shift_imm,alu_shift_reg,alus_shift_imm")])
+   (set_attr "type" "alus_shift_imm,alus_shift_reg,alus_shift_imm")])
 
 (define_insn "*cmpsi_shiftsi_swp"
   [(set (reg:CC_SWP CC_REGNUM)
@@ -6899,7 +6899,7 @@
   [(set_attr "conds" "set")
(set_attr "shift" "1")
(set_attr "arch" "32,a,a")
-   (set_attr "type" "alus_shift_imm,alu_shift_reg,alus_shift_imm")])
+   (set_attr "type" "alus_shift_imm,alus_shift_reg,alus_shift_imm")])
 
 (define_insn "*arm_cmpsi_negshiftsi_si"
   [(set (reg:CC_Z CC_REGNUM)
@@ -7492,10 +7492,10 @@
 (const_string "mov_imm")
 (const_string "mov_reg"))
   (const_string "mvn_imm")
-  (const_string "mov_reg")
-  (const_string "mov_reg")
-  (const_string "mov_reg")
-  (const_string "mov_reg")])]
+  (const_string "multiple")
+  (const_string "multiple")
+  (const_string "multiple")
+  (const_string "multiple")])]
 )
 
 (define_insn "*movsfcc_soft_insn"
@@ -8653,7 +8653,14 @@
 return \"\";
   "
   [(set_attr "conds" "use")
-   (set_attr "type" "mov_reg,mov_reg,multiple")
+   (set_attr_alternative "type"
+ [(if_then_else (match_operand 2 "const_int_operand" 
"")
+(const_string "mov_imm")
+(const_string "mov_reg"))
+  (if_then_else (match_operand 1 "const_int_operand" 
"")
+(const_string "mov_imm")
+(const_string "mov_reg"))
+  (const_string "multiple")])
(set_attr "length" "4,4,8")]
 )
 
@@ -9449,8 +9456,8 @@
 (const_string "alu_imm" )
 (const_string "alu_sreg"))
   (const_string "alu_imm")
-  

Re: [PATCH, ARM] Alternatives and type attributes fixes.

2015-04-27 Thread Yvan Roux
On 27 April 2015 at 15:20, Kyrill Tkachov  wrote:
> Hi Yvan,
>
>
> On 27/04/15 13:25, Yvan Roux wrote:
>>
>> Hi,
>>
>> This is a follow-up of PR64208 where an LRA loop was due to redundancy
>> in insn's alternatives.  I've checked all the insns in the arm backend
>> to avoid latent problems and this patch fixes the issues I've spotted.
>>
>> Only thumb2_movsicc_insn contained duplicated alternatives, and the
>> rest of the changes are related to the type attribute, which were not
>> accurate or used accordingly to their definition.  Notice that the
>> modifications have only a limited impact as in most of the pipeline
>> descriptions, multiple/mov_reg/alu_reg/.. types are used the same way,
>> only cortex a8 seems to have a real difference between alu or multiple
>> instruction and mov.
>>
>> Bootstrapped and regtested on arm-linux-gnueabihf.  Ok for trunk ?
>>
>> Thanks,
>> Yvan
>>
>> 2015-04-27  Yvan Roux
>>
>>  * config/arm/arm.mb (*arm_movt): Fix type attribute.
>>  (*movsi_compare0): Likewise.
>>  (*cmpsi_shiftsi): Likewise.
>>  (*cmpsi_shiftsi_swp): Likewise.
>>  (*movsicc_insn): Likewise.
>>  (*cond_move): Likewise.
>>  (*if_plus_move): Likewise.
>>  (*if_move_plus): Likewise.
>>  (*if_arith_move): Likewise.
>>  (*if_move_arith): Likewise.
>>  (*if_shift_move): Likewise.
>>  (*if_move_shift): Likewise.
>>  * config/arm/thumb2.md (*thumb2_movsi_insn): Fix type attribute.
>>  (*thumb2_movsicc_insn): Fix alternative redundancy.
>>  (*thumb2_addsi_short): Split register and immediate alternatives.
>>  (thumb2_addsi3_compare0): Likewise.
>>
>> insn.diff
>>
>>
>> diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
>> index 164ac13..ee23deb 100644
>> --- a/gcc/config/arm/arm.md
>> +++ b/gcc/config/arm/arm.md
>> @@ -5631,7 +5631,7 @@
>> [(set_attr "predicable" "yes")
>>  (set_attr "predicable_short_it" "no")
>>  (set_attr "length" "4")
>> -   (set_attr "type" "mov_imm")]
>> +   (set_attr "type" "alu_sreg")]
>>   )
>
>
> For some context, this is the *arm_movt pattern that generates
> a movt instruction. The documentation in types.md says:
> "This excludes MOV and MVN but includes MOVT." So using alu_sreg
> is correct according to that logic.
> However, don't you want to also update the arm_movtas_ze pattern
> that generates movt but erroneously has the type 'mov_imm'?

Ha, yes I missed this one ! I'll will update it.

>> (define_insn "*arm_movsi_insn"
>> @@ -5919,7 +5919,7 @@
>>  cmp%?\\t%0, #0
>>  sub%.\\t%0, %1, #0"
>> [(set_attr "conds" "set")
>> -   (set_attr "type" "alus_imm,alus_imm")]
>> +   (set_attr "type" "alus_sreg,alus_sreg")]
>>   )
>
>
> Why change that? They are instructions with immediate operands
> so alus_imm should be gorrect?

Oups, I certainly misread #0 and %0 this one is correct.

>> @@ -486,12 +486,12 @@
>>   )
>> (define_insn_and_split "*thumb2_movsicc_insn"
>> -  [(set (match_operand:SI 0 "s_register_operand"
>> "=l,l,r,r,r,r,r,r,r,r,r")
>> +  [(set (match_operand:SI 0 "s_register_operand"
>> "=l,l,r,r,r,r,r,r,r,r,r,r")
>> (if_then_else:SI
>>  (match_operator 3 "arm_comparison_operator"
>>   [(match_operand 4 "cc_register" "") (const_int 0)])
>> -(match_operand:SI 1 "arm_not_operand" "0 ,lPy,0 ,0,rI,K,rI,rI,K
>> ,K,r")
>> -(match_operand:SI 2 "arm_not_operand" "lPy,0 ,rI,K,0 ,0,rI,K
>> ,rI,K,r")))]
>> +(match_operand:SI 1 "arm_not_operand" "0 ,lPy,0 ,0,rI,K,I ,r,rI,K
>> ,K,r")
>> +(match_operand:SI 2 "arm_not_operand" "lPy,0 ,rI,K,0 ,0,rI,I,K
>> ,rI,K,r")))]
>> "TARGET_THUMB2"
>> "@
>>  it\\t%D3\;mov%D3\\t%0, %2
>> @@ -504,12 +504,14 @@
>>  #
>>  #
>>  #
>> +   #
>>  #"
>>  ; alt 6: ite\\t%d3\;mov%d3\\t%0, %1\;mov%D3\\t%0, %2
>> -   ; alt 7: ite\\t%d3\;mov%d3\\t%0, %1\;mvn%D3\\t%0, #%B2
>> -   ; alt 8: ite\\t%d3\;mvn%d3\\t%0, #%B1\;mov%D3\\t%0, %2
>> -   ; alt 9: ite\\t%d3\;mvn%d3\\t%0, #%B1\;mvn%D3\\t%0, #

Re: [PATCH, ARM] Alternatives and type attributes fixes.

2015-04-28 Thread Yvan Roux
Hi,

On 27 April 2015 at 15:58, Yvan Roux  wrote:
> On 27 April 2015 at 15:20, Kyrill Tkachov  wrote:
>> Hi Yvan,
>>
>>
>> On 27/04/15 13:25, Yvan Roux wrote:
>>>
>>> Hi,
>>>
>>> This is a follow-up of PR64208 where an LRA loop was due to redundancy
>>> in insn's alternatives.  I've checked all the insns in the arm backend
>>> to avoid latent problems and this patch fixes the issues I've spotted.
>>>
>>> Only thumb2_movsicc_insn contained duplicated alternatives, and the
>>> rest of the changes are related to the type attribute, which were not
>>> accurate or used accordingly to their definition.  Notice that the
>>> modifications have only a limited impact as in most of the pipeline
>>> descriptions, multiple/mov_reg/alu_reg/.. types are used the same way,
>>> only cortex a8 seems to have a real difference between alu or multiple
>>> instruction and mov.
>>>
>>> Bootstrapped and regtested on arm-linux-gnueabihf.  Ok for trunk ?
>>>
>>> Thanks,
>>> Yvan
>>>
>>> 2015-04-27  Yvan Roux
>>>
>>>  * config/arm/arm.mb (*arm_movt): Fix type attribute.
>>>  (*movsi_compare0): Likewise.
>>>  (*cmpsi_shiftsi): Likewise.
>>>  (*cmpsi_shiftsi_swp): Likewise.
>>>  (*movsicc_insn): Likewise.
>>>  (*cond_move): Likewise.
>>>  (*if_plus_move): Likewise.
>>>  (*if_move_plus): Likewise.
>>>  (*if_arith_move): Likewise.
>>>  (*if_move_arith): Likewise.
>>>  (*if_shift_move): Likewise.
>>>  (*if_move_shift): Likewise.
>>>  * config/arm/thumb2.md (*thumb2_movsi_insn): Fix type attribute.
>>>  (*thumb2_movsicc_insn): Fix alternative redundancy.
>>>  (*thumb2_addsi_short): Split register and immediate alternatives.
>>>  (thumb2_addsi3_compare0): Likewise.
>>>
>>> insn.diff
>>>
>>>
>>> diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
>>> index 164ac13..ee23deb 100644
>>> --- a/gcc/config/arm/arm.md
>>> +++ b/gcc/config/arm/arm.md
>>> @@ -5631,7 +5631,7 @@
>>> [(set_attr "predicable" "yes")
>>>  (set_attr "predicable_short_it" "no")
>>>  (set_attr "length" "4")
>>> -   (set_attr "type" "mov_imm")]
>>> +   (set_attr "type" "alu_sreg")]
>>>   )
>>
>>
>> For some context, this is the *arm_movt pattern that generates
>> a movt instruction. The documentation in types.md says:
>> "This excludes MOV and MVN but includes MOVT." So using alu_sreg
>> is correct according to that logic.
>> However, don't you want to also update the arm_movtas_ze pattern
>> that generates movt but erroneously has the type 'mov_imm'?
>
> Ha, yes I missed this one ! I'll will update it.
>
>>> (define_insn "*arm_movsi_insn"
>>> @@ -5919,7 +5919,7 @@
>>>  cmp%?\\t%0, #0
>>>  sub%.\\t%0, %1, #0"
>>> [(set_attr "conds" "set")
>>> -   (set_attr "type" "alus_imm,alus_imm")]
>>> +   (set_attr "type" "alus_sreg,alus_sreg")]
>>>   )
>>
>>
>> Why change that? They are instructions with immediate operands
>> so alus_imm should be gorrect?
>
> Oups, I certainly misread #0 and %0 this one is correct.
>
>>> @@ -486,12 +486,12 @@
>>>   )
>>> (define_insn_and_split "*thumb2_movsicc_insn"
>>> -  [(set (match_operand:SI 0 "s_register_operand"
>>> "=l,l,r,r,r,r,r,r,r,r,r")
>>> +  [(set (match_operand:SI 0 "s_register_operand"
>>> "=l,l,r,r,r,r,r,r,r,r,r,r")
>>> (if_then_else:SI
>>>  (match_operator 3 "arm_comparison_operator"
>>>   [(match_operand 4 "cc_register" "") (const_int 0)])
>>> -(match_operand:SI 1 "arm_not_operand" "0 ,lPy,0 ,0,rI,K,rI,rI,K
>>> ,K,r")
>>> -(match_operand:SI 2 "arm_not_operand" "lPy,0 ,rI,K,0 ,0,rI,K
>>> ,rI,K,r")))]
>>> +(match_operand:SI 1 "arm_not_operand" "0 ,lPy,0 ,0,rI,K,I ,r,rI,K
>>> ,K,r")
>>> +(match_operand:SI 2 "arm_not_operand" "lPy,0 ,rI,K,0 ,0,rI,I,K
>>> ,rI,K,r")))]
>>> "TARGE

[PATCH, ARM] Fix for pr65924

2015-04-29 Thread Yvan Roux
Hi,

here is the patch for PR65924, only tested on the original testcase so far.

Thanks
Yvan

gcc/
2015-04-29  Yvan Roux  

PR target/65924
* config/arm/thumb2.md (*thumb2_addsi3_compare0_scratch): Fix operand
number in type attribute expression.

gcc/testsuite/
2015-04-29  Yvan Roux  

PR target/65924
gcc.target/arm/pr65924.c: New test.
diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md
index 4f9faac..2c91542 100644
--- a/gcc/config/arm/thumb2.md
+++ b/gcc/config/arm/thumb2.md
@@ -1305,7 +1305,7 @@
   "
   [(set_attr "conds" "set")
(set_attr "length" "2,4")
-   (set (attr "type") (if_then_else (match_operand 2 "const_int_operand" "")
+   (set (attr "type") (if_then_else (match_operand 1 "const_int_operand" "")
 (const_string "alus_imm")
 (const_string "alus_sreg")))]
 )
diff --git a/gcc/testsuite/gcc.target/arm/pr65924.c 
b/gcc/testsuite/gcc.target/arm/pr65924.c
new file mode 100644
index 000..746749f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr65924.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mthumb" } */
+
+int a, b, c;
+int fn1() {
+  if (b + a < 0)
+c = 0;
+}


Re: [PATCH, ARM] Fix for pr65924

2015-04-29 Thread Yvan Roux
Hi Kyrill,

On 29 April 2015 at 12:06, Kyrill Tkachov  wrote:
> Hi Yvan,
>
> On 29/04/15 09:57, Yvan Roux wrote:
>>
>> Hi,
>>
>> here is the patch for PR65924, only tested on the original testcase so
>> far.
>
>
> Can you please double check that the problem doesn't appear
> in the other patterns you touched with your cleanup patch?
> i.e. that you don't do match_operand on an out-of-bounds operand number?

Sure, I just re-check all the patterns and there is no other issue.

Cheers,
Yvan

> Thanks,
> Kyrill
>
>
>>
>> Thanks
>> Yvan
>>
>> gcc/
>> 2015-04-29  Yvan Roux  
>>
>>  PR target/65924
>>  * config/arm/thumb2.md (*thumb2_addsi3_compare0_scratch): Fix operand
>>  number in type attribute expression.
>>
>> gcc/testsuite/
>> 2015-04-29  Yvan Roux  
>>
>>  PR target/65924
>>  gcc.target/arm/pr65924.c: New test.
>
>


  1   2   >