from:"andre.simoesdiasvieira at arm dot com"

[Bug rtl-optimization/78255] [5/6 regression] Indirect sibling call causing wrong code generation for ARM

2017-04-12 Thread andre.simoesdiasvieira at arm dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78255

Andre Vieira  changed:

   What|Removed |Added

 CC||andre.simoesdiasvieira@arm.
   ||com

--- Comment #17 from Andre Vieira  ---
Yes, how do I change this to "verified"?

[Bug c++/77388] Reference to a packed structure member

2016-08-26 Thread andre.simoesdiasvieira at arm dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77388

--- Comment #5 from Andre Vieira  ---
I see, thank you! 

Oh and leaving out the const yields an error:
t.cpp:28:16: error: cannot bind packed field '((B*)this)->B::s->test_struct::c'
to 'short int&'
   return A (s->c);

[Bug c++/77388] Reference to a packed structure member

2016-08-26 Thread andre.simoesdiasvieira at arm dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77388

--- Comment #3 from Andre Vieira  ---
Thank you Richard!

I have a follow up question. Why is this only a problem when passing by
reference and not when passing a pointer?

So say:
#define PACKED __attribute__ ((packed))

#define TYPE_C short

typedef struct {
TYPE_C c;
} PACKED test_struct;

class A
{
  const TYPE_C * c;
public:
  A (const TYPE_C * _c) :
c(_c) {};
};

class B
{
public:
  B();
  A foo ();
private:
  test_struct * s;
};

A B::foo ()
{
  return A (&(s->c));
}

Wouldn't there still be an alignment mismatch between A::c and s->c?

[Bug c++/77388] New: Reference to a packed structure member

2016-08-26 Thread andre.simoesdiasvieira at arm dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77388

Bug ID: 77388
   Summary: Reference to a packed structure member
   Product: gcc
   Version: 7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andre.simoesdiasvieira at arm dot com
  Target Milestone: ---

As initially reported by Michal on
https://answers.launchpad.net/gcc-arm-embedded/+question/345145, gcc seems to
behave strangely when passing a reference to a member of a packed structure.

I was able to reduce the testcase presented in that launchpad ticket to the
following program:

$cat t.cpp:
#define PACKED __attribute__ ((packed))

#define TYPE_C short

typedef struct {
TYPE_C c;
} PACKED test_struct;

class A
{
  const TYPE_C & c;
public:
  A (const TYPE_C & _c) :
c(_c) {};
};

class B
{
public:
  B();
  A foo ();
private:
  test_struct * s;
};

A B::foo ()
{
  return A (s->c);
}


Compiling this with
$arm-none-eabi-g++ -mcpu=cortex-m7 -mthumb -S -O1 t.cpp -fdump-tree-optimized

Will yield the following dump:
;; Function A B::foo() (_ZN1B3fooEv, funcdef_no=3, decl_uid=4607, cgraph_uid=3,
symbol_order=3)

A B::foo() (struct B * const this)
{
  const short int D.4636;
  struct A D.4650;

  :
  MEM[(struct A *)] = 
  D.4636 ={v} {CLOBBER};
  return D.4650;

}

As you can see, it will not load the struct's field.

Changing the 'TYPE_C' define to 'char' will yield the following  dump:
;; Function A B::foo() (_ZN1B3fooEv, funcdef_no=3, decl_uid=4607, cgraph_uid=3,
symbol_order=3)

A B::foo() (struct B * const this)
{
  struct A D.4649;
  struct test_struct * _1;
  char * _2;

  :
  _1 = this_4(D)->s;
  _2 = &_1->c;
  MEM[(struct A *)] = _2;
  return D.4649;

}

Now, when the type is 'char', it does take the field's address.

Can anyone shed some light on this for me? Is referencing a packed structure's
member that is not guaranteed to be aligned (i.e. not a char) undefined behavior?
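A minimal C sketch of one way to sidestep the question, assuming the goal is
simply to read the member: never form a pointer or reference to the packed
field, and copy its value out instead. The struct name 'packed_s' and the
leading 'tag' byte (which forces 'c' to offset 1) are hypothetical, chosen so
the member really is misaligned; this is not the layout from the launchpad
report.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical layout: the leading 'tag' byte pushes 'c' to offset 1, so a
   'short *' pointing at it would be misaligned on most targets. */
struct __attribute__ ((packed)) packed_s
{
  char tag;
  short c;
};

/* Access through the member name: the compiler knows the field is packed
   and emits whatever (possibly byte-wise) loads the target needs. */
static short
read_by_value (const struct packed_s *s)
{
  return s->c;
}

/* Copy the bytes out with memcpy: no misaligned 'short *' is ever formed
   or dereferenced. */
static short
read_by_memcpy (const struct packed_s *s)
{
  short v;
  memcpy (&v, (const char *) s + offsetof (struct packed_s, c), sizeof v);
  return v;
}
```

In the C++ testcase this would roughly amount to class A holding a 'short' by
value instead of a 'const short &', at the cost of a copy.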

[Bug rtl-optimization/70164] [6/7 Regression] Code/performance regression due to poor register allocation on Cortex-M0

2016-07-01 Thread andre.simoesdiasvieira at arm dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70164

--- Comment #16 from Andre Vieira  ---
Any progress on this one?

[Bug tree-optimization/71237] [7 regression] scev tests failing after pass reorganization

2016-05-26 Thread andre.simoesdiasvieira at arm dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71237

--- Comment #3 from Andre Vieira  ---
Created attachment 38576
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=38576=edit
Regular generation at -O2

[Bug tree-optimization/71237] [7 regression] scev tests failing after pass reorganization

2016-05-26 Thread andre.simoesdiasvieira at arm dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71237

--- Comment #2 from Andre Vieira  ---
Created attachment 38575
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=38575=edit
Assembly with the changed passes.def removing one pass of lim

[Bug tree-optimization/71237] [7 regression] scev tests failing after pass reorganization

2016-05-26 Thread andre.simoesdiasvieira at arm dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71237

--- Comment #1 from Andre Vieira  ---
So yes, disabling LIM makes the tests "PASS". However, I couldn't find an
option to do this; I disabled the pass by editing passes.def, so that doesn't
seem like a good way to test SCCP.

However, it might be good to point out that, at least for arm-none-eabi and
x86_64-pc-linux-gnu, these tests are no longer testing SCCP: SCCP does not
change this code. I looked at the dumps and compared the assembly of -O2 with
and without '-fno-tree-scev-cprop'.

On arm-none-eabi, it used to be IVOPTS that made the test pass: it would reuse
the same ivtmp for computing the address used by the memory dereference and the
a_p assignment. Now, due to the reordering of LIM, it no longer does this.

On x86_64 I see the following code coming out of the OPTIMIZED dump for the
scev-4.c case:

...
  :
  # ivtmp.10_14 = PHI <_24(3), ivtmp.10_25(4)>
  i_11 = (int) ivtmp.10_14;
  MEM[symbol: a, index: ivtmp.10_14, step: 8, offset: 4B] = 100;
  ivtmp.10_25 = ivtmp.10_14 + _24;
  i_22 = (int) ivtmp.10_25;
  if (i_22 <= 999)
goto ;
  else
goto ;

  :
  _2 = (sizetype) i_11;
  _3 = _2 * 8;
  _10 = _3 + 4;
  _1 =  + _10;
  a_p = _1;
...

Now, yes, the scan-times  will pass, but that's because the MEM is using
symbol: a instead of base:  I'm not sure this qualifies as a proper PASS.
Disabling LIM here the same way as before, that is, removing the pass_lim
after pass_laddress and before pass_split_crit_edges, generates the following
OPTIMIZED dump:

...
  :
  _16 = (sizetype) k_4(D);
  _15 = _16 * 8;
  _21 = _15 + 4;
  _22 =  + _21;
  ivtmp.9_14 = (unsigned long) _22;

  :
  # i_11 = PHI 
  # ivtmp.9_13 = PHI 
  _1 = (int *) ivtmp.9_13;
  MEM[base: _1, offset: 0B] = 100;
  i_8 = k_4(D) + i_11;
  ivtmp.9_17 = ivtmp.9_13 + _15;
  if (i_8 <= 999)
goto ;
  else
goto ;

  :
  a_p = _1;
...

I prefer this output, since it loses the needless trailing address calculation.
I am not so sure the eventually generated assembly is better in this case,
though. I'll add both as attachments.

[Bug testsuite/52563] FAIL: gcc.dg/tree-ssa/scev-[3,4].c scan-tree-dump-times optimized "" 1

2016-05-23 Thread andre.simoesdiasvieira at arm dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52563

--- Comment #19 from Andre Vieira  ---
> First of all please open a new bug for the FAILs.  Second, the fix will
> be mostly adjusting the testcase expectations (eventually disabling LIM
> for example if we want to test SCCP abilities).

Opened a new ticket for this, PR71237; it makes sense to continue the
discussion there. I also quoted your comment there.

[Bug tree-optimization/71237] New: scev tests failing after pass reorganization

2016-05-23 Thread andre.simoesdiasvieira at arm dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71237

Bug ID: 71237
   Summary: scev tests failing after pass reorganization
   Product: gcc
   Version: 7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andre.simoesdiasvieira at arm dot com
  Target Milestone: ---

Ever since the pass reorganization that moved lim before sink, these tests
fail:

FAIL: gcc.dg/tree-ssa/scev-3.c scan-tree-dump-times optimized "" 1
FAIL: gcc.dg/tree-ssa/scev-4.c scan-tree-dump-times optimized "" 1
FAIL: gcc.dg/tree-ssa/scev-5.c scan-tree-dump-times optimized "" 1

This was first reported in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52563#c11
for sparc*-*-solaris2.*; it also fails on arm-none-eabi.

There was some further discussion on that thread.

Quoting Richard:

rguent...@suse.de 2016-05-23 07:40:31 UTC

>On Fri, 20 May 2016, andre.simoesdiasvieira at arm dot com wrote:
> 
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52563
>> 
>> --- Comment #17 from Andre Vieira  ---
>> Ah yes my bad, its not sccp doing it... got a bit confused there... It is
>> indeed sink that moves that sequence down. Sorry for the noise.
>> 
>> Question remains on how to clean this up though. Ideally you would like to
>> end
>> up with
>> 
>> a_p = ;
>> outside of the loop.
> 
>First of all please open a new bug for the FAILs.  Second, the fix will
>be mostly adjusting the testcase expectations (eventually disabling LIM
>for example if we want to test SCCP abilities).
> 
>As to your question it techincally is a job of CSE though it may be a
>tough job to make it figure out the equivalence.
> 
>Now, in the case of scev-5.c (the only regression I see on x86_64, with
>-m32), SCCP fails to do final value replacement for i_24:
> 
>  :
>  # i_12 = PHI <i_10(3), i_9(5)>
>  MEM[(int *)][i_12] = 100;
>  i_9 = i_5 + i_12;
>  if (i_9 <= 999)
>goto ;
>  else
>goto ;
> 
>  :
>  goto ;
> 
>  :
>  # i_24 = PHI <i_12(4)>
>  _2 = (sizetype) i_24;
>  _3 = _2 * 4;
>  _1 =  + _3;
>  a_p = _1;
> 
>which may or may not be a good thing - this way IVOPTs can see the
>extra use of i_12 on the loop exit and it _could_ look for derived
>uses of that so it _may_ be able to replace the use of i_24 with
>something better.

[Bug testsuite/52563] FAIL: gcc.dg/tree-ssa/scev-[3,4].c scan-tree-dump-times optimized "" 1

2016-05-20 Thread andre.simoesdiasvieira at arm dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52563

--- Comment #17 from Andre Vieira  ---
Ah yes, my bad, it's not sccp doing it; I got a bit confused there. It is
indeed sink that moves that sequence down. Sorry for the noise.

The question remains how to clean this up, though. Ideally you would like to end
up with

a_p = ;
outside of the loop.

[Bug testsuite/52563] FAIL: gcc.dg/tree-ssa/scev-[3,4].c scan-tree-dump-times optimized "" 1

2016-05-20 Thread andre.simoesdiasvieira at arm dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52563

--- Comment #15 from Andre Vieira  ---
So the code change for sccp moves the following sequence out of the loop:

  _2 = (sizetype) i_30;
  _3 = _2 * 8;
  _10 = _3 + 4;
  _1 =  + _10;
  a_p = _1;

This is basically: 
*a_p = [last_i].y;

I agree that this would make sense, were it not for the fact that the sequence
recomputes the address of a[i].y, which is already computed inside the loop
for:

MEM[(int *)][i_11].y = 100;

When IVOPTS comes around it creates a code sequence similar to this one to
calculate the address it writes 100 to, and you end up with a needless
recomputation of the address.

Now I don't know which pass should be responsible for cleaning this up: whether
it's sccp's responsibility to realize that the address computation is the same,
whether there should be some sort of common subexpression elimination step
in between, or something else.
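To make the redundancy concrete, here is a C sketch of the two shapes, loosely
modelled on the scev-4.c loop (8-byte elements with 'y' at offset 4, as the
"step: 8, offset: 4B" in the dumps suggests). The function names and the exact
loop are assumptions, not the testcase. f_recompute mirrors what final value
replacement produces, rebuilding &a[last_i].y after the loop, while f_reuse
keeps the address the store already used. Both assume 1 <= k <= 999 so the
loop body runs at least once.

```c
/* Assumed element type: 8 bytes with 'y' at offset 4 when int is 4 bytes. */
typedef struct
{
  int x;
  int y;
} S;

S a[1000];
int *a_p;

/* After final value replacement: only the index leaves the loop, and
   &a[last_i].y is rebuilt from scratch (a + last_i * 8 + 4).
   Assumes 1 <= k <= 999. */
void
f_recompute (int k)
{
  int last_i = k;
  for (int i = k; i <= 999; i += k)
    {
      a[i].y = 100;
      last_i = i;
    }
  a_p = &a[last_i].y;
}

/* Preferred shape: the address computed for the store is simply kept live
   across the loop exit.  Same assumption on k. */
void
f_reuse (int k)
{
  int *p = &a[k].y;
  for (int i = k; i <= 999; i += k)
    {
      p = &a[i].y;
      *p = 100;
    }
  a_p = p;
}
```

For k = 7 both versions leave a_p pointing at a[994].y; the difference is only
in how much address arithmetic survives after the loop.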

[Bug testsuite/52563] FAIL: gcc.dg/tree-ssa/scev-[3,4].c scan-tree-dump-times optimized "" 1

2016-05-20 Thread andre.simoesdiasvieira at arm dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52563

Andre Vieira  changed:

   What|Removed |Added

 CC||andre.simoesdiasvieira@arm.
   ||com

--- Comment #12 from Andre Vieira  ---
Same regression observed on arm-none-eabi.

[Bug middle-end/71062] [7 regression] r235622 and restrict pointers

2016-05-11 Thread andre.simoesdiasvieira at arm dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71062

Andre Vieira  changed:

   What|Removed |Added

 Target||arm
Summary|[bugzilla] r235622 and  |[7 regression] r235622 and
   |restrict pointers   |restrict pointers

--- Comment #1 from Andre Vieira  ---
The register keyword here is superfluous; it all comes down to the restrict keyword.

[Bug middle-end/71062] New: [bugzilla] r235622 and restrict pointers

2016-05-11 Thread andre.simoesdiasvieira at arm dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71062

Bug ID: 71062
   Summary: [bugzilla] r235622 and restrict pointers
   Product: gcc
   Version: 7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andre.simoesdiasvieira at arm dot com
  Target Milestone: ---

Hi there,

I have encountered a new FAIL when testing newlib-nano for the arm-none-eabi
toolchain which I believe is caused by a change in code generation for
freopen.c.

After some investigation I was able to trace the issue back to a pointer
comparison with a restrict qualified pointer. See below a small piece of code
that illustrates the issue.

$cat t.c
extern const char bar;

int
foo (register char *__restrict p)
{
  if (p == &bar)
return 1;

  return 0;
}

Since revision r235622, the pointer comparison here is evaluated to false
during compilation and the whole basic block is optimized away.

After some inner struggle and quite a few passes over the restrict definition
in the C standard (Committee Draft -- April 12, 2011, N1570, i.e. the C11
draft), I think the assumption that p and &bar can't point to the same object
might be invalid.

Yes, the C standard defines the '&' operator to yield a pointer to the object.
However, the formal definition of restrict only seems to apply to the
dereferencing of pointers. In this case, we do not dereference the pointer
created by '&bar' and thus do not access the object that 'p' might be pointing
to.

Does my reasoning make sense? I find it quite difficult to wrap my head around
the definition of restrict.

Cheers,
Andre
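A small C driver for checking the behaviour at run time, assuming a hosted
test environment. The definition of bar (and its value 'b') is hypothetical
and added only so the file links; const is cast away at the call site only to
match foo's parameter type, and foo never writes through p. Under the reading
of restrict argued above, foo must return 1 when passed &bar; a compiler that
folds the comparison to false would return 0 instead.

```c
extern const char bar;

/* foo as in the report: p is compared against &bar but never dereferenced. */
int
foo (register char *__restrict p)
{
  if (p == &bar)
    return 1;

  return 0;
}

/* Hypothetical definition, added only so the sketch links. */
const char bar = 'b';
```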

[Bug libstdc++/70379] c99_classification_macros_c++98.cc failing with newlib

2016-04-04 Thread andre.simoesdiasvieira at arm dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70379

Andre Vieira  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #2 from Andre Vieira  ---
Yaakov fixed this in newlib, commit hash
b9bbe1bccb1254ce891fc92961be2ec3cd3f6e4a

Thanks, closing ticket.

[Bug libstdc++/70379] New: c99_classification_macros_c++98.cc failing with newlib

2016-03-23 Thread andre.simoesdiasvieira at arm dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70379

Bug ID: 70379
   Summary: c99_classification_macros_c++98.cc failing with newlib
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andre.simoesdiasvieira at arm dot com
  Target Milestone: ---

26_numerics/headers/cmath/c99_classification_macros_c++98.cc fails for newlib
on arm-none-eabi with the following errors (clipped the error messages to only
include a few):
src/gcc/libstdc++-v3/testsuite/26_numerics/headers/cmath/c99_classification_macros_c++98.cc:38:16:
error: macro "isgreater" requires 2 arguments, but only 1 given
src/gcc/libstdc++-v3/testsuite/26_numerics/headers/cmath/c99_classification_macros_c++98.cc:40:21:
error: macro "isgreaterequal" requires 2 arguments, but only 1 given
src/gcc/libstdc++-v3/testsuite/26_numerics/headers/cmath/c99_classification_macros_c++98.cc:42:13:
error: macro "isless" requires 2 arguments, but only 1 given
src/gcc/libstdc++-v3/testsuite/26_numerics/headers/cmath/c99_classification_macros_c++98.cc:44:18:
error: macro "islessequal" requires 2 arguments, but only 1 given
src/gcc/libstdc++-v3/testsuite/26_numerics/headers/cmath/c99_classification_macros_c++98.cc:46:20:
error: macro "islessgreater" requires 2 arguments, but only 1 given
src/gcc/libstdc++-v3/testsuite/26_numerics/headers/cmath/c99_classification_macros_c++98.cc:48:18:
error: macro "isunordered" requires 2 arguments, but only 1 given

This new failure is due to a change made to newlib whereby -std=c++98 no longer
exposes the C99 math functions from math.h, whereas -std=gnu++98 still does. As
a result, _GLIBCXX98_USE_C99_MATH is no longer defined at configuration time,
since it is set by test-compiling with -std=c++98. cmath uses this macro to
know whether the C99 math functions are present and, if so, to undefine the
ones that are implemented as macros. This test uses -std=gnu++98, so these
macros are defined while _GLIBCXX98_USE_C99_MATH is not set, which leads to the
errors above.

This is now also an issue with at least gcc-5.
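The mechanism behind these errors can be reproduced without newlib or
libstdc++. In the C sketch below, 'is_after' is a hypothetical stand-in for a
C99 comparison macro such as isgreater, and the #undef plays the role of the
#undefs that cmath performs when _GLIBCXX98_USE_C99_MATH is set. Removing the
#undef turns the later function definition into a one-argument macro
invocation, which is the same kind of error quoted above.

```c
/* Hypothetical stand-in for a C99 comparison "function" that the C library
   implements as a function-like macro (like isgreater). */
#define is_after(x, y) ((x) > (y))

/* Code compiled while the macro is still visible uses the macro form. */
static int
compare_with_macro (int x, int y)
{
  return is_after (x, y);
}

/* Drop the macro before declaring a real function of the same name.
   Without this line, the definition below would be read as a macro
   invocation with a single argument and fail with:
   macro "is_after" requires 2 arguments, but only 1 given. */
#undef is_after

static int
is_after (const int *pair)
{
  return pair[0] > pair[1];
}
```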

[Bug rtl-optimization/70278] [6 regression] LRA ICE on trunk for ARM Thumb1 with Os

2016-03-21 Thread andre.simoesdiasvieira at arm dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70278

Andre Vieira  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #3 from Andre Vieira  ---
This fixes it on our end. Thank you Bernd. 
Marking this as RESOLVED FIXED.

[Bug rtl-optimization/70278] New: LRA ICE on trunk for ARM Thumb1 with Os

2016-03-20 Thread andre.simoesdiasvieira at arm dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70278

Bug ID: 70278
   Summary: LRA ICE on trunk for ARM Thumb1 with Os
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andre.simoesdiasvieira at arm dot com
  Target Milestone: ---

Created attachment 37999
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37999=edit
reduced e_hypot.c

Hello,

We are running into an ICE in lra on trunk when compiling newlib for an ARM
thumb1 target with -Os.

This happens when compiling newlib/libm/math/e_hypot.c 

See the attached reduced source and the error when executing the command below:
$arm-none-eabi-gcc -march=armv4t -mthumb -Os -S besttry.c
The error:
besttry.c: In function '__ieee754_hypot':
besttry.c:27:1: internal compiler error: in
lra_create_new_reg_with_unique_value, at lra.c:188
 }
 ^
0x94d30b lra_create_new_reg_with_unique_value(machine_mode, rtx_def*,
reg_class, char const*)
../src/gcc/lra.c:188
0x94d358 lra_create_new_reg(machine_mode, rtx_def*, reg_class, char const*)
../src/gcc/lra.c:228
0x9590e4 split_reg
../src/gcc/lra-constraints.c:5034
0x95a2da split_if_necessary
../src/gcc/lra-constraints.c:5142
0x95e08c inherit_in_ebb
../src/gcc/lra-constraints.c:5527
0x95e08c lra_inheritance()
../src/gcc/lra-constraints.c:5813
0x950900 lra(_IO_FILE*)
../src/gcc/lra.c:2312
0x9082d1 do_reload
../src/gcc/ira.c:5408
0x9082d1 execute
../src/gcc/ira.c:5592


My initial investigation into lra shows that this is due to split_reg calling
lra_create_new_reg with mode = VOIDmode. Others with more lra experience might
be able to spot the origin of the issue quicker though.

A bisect shows that this seems to have been introduced with revision r234184,
where split_reg now calls lra_create_new_reg with a 'mode' that seems to be set
from 'lra_reg_info[hard_regno].biggest_mode', which in our case will be SImode.
This mode is passed down to lra_create_new_reg_with_unique_value, which asserts
if it's not VOIDmode.

Cheers,
Andre

PS: Unrelated to this, the code in lra_create_new_reg_with_unique_value looks a
bit odd, I think mode should have been initialized with VOIDmode there.

[Bug rtl-optimization/70164] Code/performance regression due to poor register allocation on Cortex-M0

2016-03-10 Thread andre.simoesdiasvieira at arm dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70164

--- Comment #5 from Andre Vieira  ---
Ah yes, I forgot to mention: this is reproducible with:
$arm-none-eabi-gcc -mcpu=cortex-m0 -mthumb -Os -S pr45701-1.c

[Bug rtl-optimization/64164] [4.9/5/6 Regression] one more stack slot used due to one less inlining level

2016-03-10 Thread andre.simoesdiasvieira at arm dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64164

Andre Vieira  changed:

   What|Removed |Added

 CC||andre.simoesdiasvieira@arm.
   ||com

--- Comment #59 from Andre Vieira  ---
I believe PR70164 is related to this.

[Bug rtl-optimization/70164] Code/performance regression due to poor register allocation on Cortex-M0

2016-03-10 Thread andre.simoesdiasvieira at arm dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70164

--- Comment #4 from Andre Vieira  ---
Revision r226901 is linked to PR64164, so I added Alexandre Oliva to the watch
list.

[Bug rtl-optimization/70164] Code/performance regression due to poor register allocation on Cortex-M0

2016-03-10 Thread andre.simoesdiasvieira at arm dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70164

--- Comment #3 from Andre Vieira  ---
Created attachment 37923
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37923=edit
pre-patch reload dump

[Bug rtl-optimization/70164] Code/performance regression due to poor register allocation on Cortex-M0

2016-03-10 Thread andre.simoesdiasvieira at arm dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70164

--- Comment #2 from Andre Vieira  ---
Created attachment 37922
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37922=edit
pre-patch ira dump

[Bug rtl-optimization/70164] Code/performance regression due to poor register allocation on Cortex-M0

2016-03-10 Thread andre.simoesdiasvieira at arm dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70164

--- Comment #1 from Andre Vieira  ---
Created attachment 37921
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37921=edit
current reload dump

[Bug rtl-optimization/70164] New: Code/performance regression due to poor register allocation on Cortex-M0

2016-03-10 Thread andre.simoesdiasvieira at arm dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70164

Bug ID: 70164
   Summary: Code/performance regression due to poor register
allocation on Cortex-M0
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andre.simoesdiasvieira at arm dot com
  Target Milestone: ---

Created attachment 37920
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37920=edit
current ira dump

After a quick investigation of the testcase in
gcc/testsuite/gcc.target/arm/pr45701-1.c for cortex-m0 on trunk I found out
that the test case was failing due to a change in the register allocation after
revision r226901.

Before this revision, register allocation would choose to load the global
'hist_verify' (representing 'old_verify') into r6, a callee-saved register,
prior to the call to pre_process_line. old_verify is used after the call. With
the patch it decides to load it into r3, a caller-saved register, which means
it has to be spilled before the call and reloaded after it.

Before patch:
history_expand_line_internal:
push {r3, r4, r5, r6, r7, lr}
ldr r3, .L5
ldr r5, .L5+4
ldr r4, [r3]
movs r3, #0
ldr r6, [r5]   ; <--- load of 'hist_verify' onto r6
movs r7, r0
str r3, [r5]
bl  pre_process_line
adds r6, r4, r6
str r6, [r5]
movs r4, r0
cmp r7, r0
bne .L2
bl  str_len
adds r0, r0, #1
bl  x_malloc
movs r1, r4
bl  str_cpy
movs r4, r0
.L2:
movs r0, r4
@ sp needed
pop {r3, r4, r5, r6, r7, pc}

Current:
history_expand_line_internal:
push {r0, r1, r2, r4, r5, r6, r7, lr}
ldr r3, .L3
ldr r5, .L3+4
ldr r6, [r3]
ldr r3, [r5]       ; <--- load of 'hist_verify' onto r3
movs r7, r0
str r3, [sp, #4]   ; <--- Spill
movs r3, #0
str r3, [r5]
bl  pre_process_line
ldr r3, [sp, #4]   ; <--- Reload
movs r4, r0
adds r6, r6, r3
str r6, [r5]
cmp r7, r0
bne .L1
bl  str_len
adds r0, r0, #1
bl  x_malloc
movs r1, r4
bl  str_cpy
movs r4, r0
.L1:
movs r0, r4
@ sp needed
pop {r1, r2, r3, r4, r5, r6, r7, pc}


I have also attached the dumps for ira and reload for both pre-patch and
current. In the current reload dump insn 9 represents the load onto r3 and insn
62 the spill. In pre-patch ira/reload the load is in insn 10.

I am not familiar with RA in GCC, so I'm not entirely sure what code to blame
for this sub-optimal allocation, any comments or pointers would be most
welcome.
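For reference, a C sketch of what the function appears to do, reconstructed
from the assembly above rather than copied from pr45701-1.c, so treat the body
as an approximation. 'other_global' is a hypothetical name for the second
global loaded via .L5, and the helper bodies are stubs so the sketch runs on a
host. The point is that both loaded values are live across the call to
pre_process_line; under the AAPCS r4-r11 are callee-saved and r0-r3 are not,
so keeping both in r4/r6 avoids the spill and reload seen in the current code.

```c
#include <stdlib.h>
#include <string.h>

int hist_verify = 1;
int other_global = 2;   /* hypothetical: the second global loaded via .L5 */

/* Hypothetical stubs for the external helpers called in the assembly. */
static char *pre_process_line (char *line) { return line; }
static int str_len (char *s) { return (int) strlen (s); }
static char *x_malloc (int n) { return malloc ((size_t) n); }
static char *str_cpy (char *d, char *s) { return strcpy (d, s); }

char *
history_expand_line_internal (char *line)
{
  /* Both values are read before the call and used after it, so each must
     survive pre_process_line: either in a callee-saved register (r4/r6 in
     the "before" code) or in a stack slot (the spill in the current code). */
  int old_verify = hist_verify;
  int extra = other_global;
  hist_verify = 0;
  char *new_line = pre_process_line (line);
  hist_verify = old_verify + extra;

  /* Return a fresh copy when pre-processing handed back the same line. */
  return new_line == line ? str_cpy (x_malloc (1 + str_len (line)), line)
                          : new_line;
}
```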

[Bug target/70063] msp430 stack corruption for naked functions

2016-03-03 Thread andre.simoesdiasvieira at arm dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70063

Andre Vieira  changed:

   What|Removed |Added

 CC||andre.simoesdiasvieira@arm.
   ||com

--- Comment #2 from Andre Vieira  ---
I believe PR69979 reports a related issue for ARM targets.

[Bug target/69979] ARM naked function attribute not handling structs bigger than 32 bits correctly

2016-03-03 Thread andre.simoesdiasvieira at arm dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69979

--- Comment #1 from Andre Vieira  ---
I believe expand_function_start is responsible for this code. When it calls
assign_parms it will generate RTL to copy the incoming struct parameter onto
the stack.

[Bug target/69979] New: ARM naked function attribute not handling structs bigger than 32 bits correctly

2016-02-26 Thread andre.simoesdiasvieira at arm dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69979

Bug ID: 69979
   Summary: ARM naked function attribute not handling structs
bigger than 32 bits correctly
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andre.simoesdiasvieira at arm dot com
  Target Milestone: ---

As reported by Cory in https://bugs.launchpad.net/gcc-arm-embedded/+bug/1549542,
it seems that, even with the naked function attribute, GCC for ARM still
generates code for struct parameters passed in registers. This code stores
these structs on the stack, using 'r3' as a scratch register. Apart from being
suboptimal, this writes to 'r3' even though 'r3' might be used to hold a
parameter!

For instance with the following C code:
struct test {
  int a;
  int b;
};

int __attribute__ ((naked))
foo (struct test t, int a, int b)
{
  __asm ("mov r0, r3\n\t"
 "bx  lr");
}

when compiled with
$arm-none-eabi-gcc -mcpu=cortex-m3 -mthumb -S

will yield the following assembly:
foo:
@ Naked Function: prologue and epilogue provided by programmer.
@ args = 0, pretend = 0, frame = 8
@ frame_needed = 1, uses_anonymous_args = 0
mov r3, r7
stm r3, {r0, r1}
.syntax unified
@ 9 "tnaked.c" 1
mov r0, r3
bx lr
@ 0 "" 2
.syntax unified
nop
mov r0, r3

As you can see, 'r3' is overwritten with the frame pointer before being moved
to 'r0' for the return. The final 'mov r0, r3' after the 'nop' also looks a bit
odd!

Something equally weird happens when returning such a struct:

struct test
bar (int a, int b, int c)
{
__asm ("stmia r0, {r2, r3}\n\t"
   "bx lr");
}

One would naturally expect to be storing 'b' and 'c' into '[r0]', the place
where the caller expects the return value to be written to. However the
following assembly is generated, which overwrites r3 (which should contain
argument 'c'):

bar:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
mov r3, r0
@ 16 "tnaked.c" 1
stmia r0, {r2, r3}
bx lr
@ 0 "" 2
.thumb
mov r0, r3
bx  lr

Again with the unexpected epilogue code creeping in.

I have observed this behavior for various ARM targets dating back to gcc 4.8
(haven't tried earlier than that).

[Bug rtl-optimization/69752] New: Reload removing instruction with side-effect

2016-02-10 Thread andre.simoesdiasvieira at arm dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69752

Bug ID: 69752
   Summary: Reload removing instruction with side-effect
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andre.simoesdiasvieira at arm dot com
  Target Milestone: ---

This behavior was caught when debugging the following fail for -mcpu=cortex-m0:
FAIL: g++.dg/torture/vshuf-v2di.C -O2 execution test


After some debugging I noticed that reload removes an insn containing a
post_inc, which causes the shuffle to be off by 4 (the size of the
post-increment).

Using the exact same sources as the testsuite and compiling with
-fdump-rtl-all, you can observe the following sequence of RTL before reload
(in the ira dump):

(insn 455 191 213 6 (set (reg/f:SI 267)
(reg/f:SI 379)) 748 {*thumb1_movsi_insn}
 (nil))
(insn 213 455 216 6 (set (reg:SI 266)
(mem/u/c:SI (post_inc:SI (reg/f:SI 267)) [4  S4 A32])) 748
{*thumb1_movsi_insn}
 (expr_list:REG_EQUIV (const_int -1044200508 [0xc1c2c3c4])
(expr_list:REG_INC (reg/f:SI 267)
(nil))))
(insn 216 213 218 6 (set (reg:SI 268)
(mem/u/c:SI (reg/f:SI 267) [4  S4 A64])) 748 {*thumb1_movsi_insn}
 (expr_list:REG_DEAD (reg/f:SI 267)
(nil)))

Where pseudo register 267 is post_incremented in insn 213 and used in insn 216
right after.

After reload:

...
Removing equiv init insn 443 (freq=107)
  443: r381:SI=sfp:SI+0x10
  REG_EQUIV sfp:SI-0x40
deleting insn with uid = 443.
...
(insn 455 191 213 6 (set (reg/f:SI 5 r5 [267])
(reg/f:SI 2 r2 [379])) 748 {*thumb1_movsi_insn}
 (nil))
(note 213 455 216 6 NOTE_INSN_DELETED)
(insn 216 213 521 6 (set (reg:SI 5 r5 [268])
(mem/u/c:SI (reg/f:SI 5 r5 [267]) [4  S4 A64])) 748
{*thumb1_movsi_insn}
 (nil))

As you can see, pseudo register 268 (now in r5) is loaded from mem (r5), which
still holds the old value of 267 rather than the incremented one, causing an
offset error of 4. I checked and pseudo register 748 is the same in both cases.
Also, adding -fno-auto-inc-dec to the test yields a PASS result.
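In C terms, insns 213 and 216 behave like the sketch below: a load with a
post-increment followed by a load through the updated pointer. Deleting insn
213 also deletes its REG_INC side effect, so the second load reads the first
word again. Presumably reload drops the insn because the REG_EQUIV note lets it
treat reg 266 as the constant 0xc1c2c3c4, but that is a guess; the function
names and the second word's value (0x11223344) are hypothetical.

```c
#include <stdint.h>

/* Insns 213 and 216 as C: a load with post-increment (the REG_INC on
   pseudo 267), then a load through the incremented pointer. */
static void
load_pair (const uint32_t *p, uint32_t *first, uint32_t *second)
{
  *first = *p++;   /* insn 213: (mem (post_inc (reg 267))) */
  *second = *p;    /* insn 216: (mem (reg 267)), 4 bytes further on */
}

/* After reload deletes insn 213, the increment disappears with it, so the
   remaining load reads the un-incremented address. */
static uint32_t
second_after_reload (const uint32_t *p)
{
  return *p;       /* off by one word (4 bytes) */
}
```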

[Bug rtl-optimization/69752] Reload removing instruction with side-effect

2016-02-10 Thread andre.simoesdiasvieira at arm dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69752

--- Comment #1 from Andre Vieira  ---
Tried it with GCC 5.2.1 and 6.0; both show the same behavior. With 4.9 I
couldn't reproduce the issue.

[Bug target/69538] New: gcc.dg/torture/stackalign/builtin-apply-4.c fails with flto for aarch32 targets with single precision FPU

2016-01-28 Thread andre.simoesdiasvieira at arm dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69538

Bug ID: 69538
   Summary: gcc.dg/torture/stackalign/builtin-apply-4.c fails with
flto for aarch32 targets with single precision FPU
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andre.simoesdiasvieira at arm dot com
  Target Milestone: ---

I am getting an execution failure for:
gcc.dg/torture/stackalign/builtin-apply-4.c -O2 -flto -fuse-linker-plugin
-fno-fat-lto-objects

I have tried trunk, 5.2 and 4.9; all fail for various AArch32 targets
(cortex-m4/7, armv7-a) when running with "-mfloat-abi=hard -mfpu=fpv{4,5}-sp-d16".

I inspected the RTL produced with the following command line:
$arm-none-eabi-gcc  -fno-diagnostics-show-caret -fdiagnostics-color=never -O2
-fno-fat-lto-objects -fgnu89-inline -specs=rdimon.specs
-Wa,-mno-warn-deprecated -Wl,-Ttext-segment=0x2100  builtin-apply-4.c  -lm
-mthumb -march=armv7-a -mfloat-abi=hard -mfpu=fpv5-sp-d16 -o
builtin-apply-4_working.exe -save-temps  -fdump-rtl-all

This command line doesn't use -flto, and the result passes the execution test.
It produces the following RTL for the call to bar:
(call_insn/c/i:TI 8 7 38 2 (parallel [
(set (reg:DF 16 s0)
(call (mem:SI (symbol_ref:SI ("bar") [flags 0x3] 
) [0 bar S4 A32])
(const_int 0 [0])))
(use (const_int 0 [0]))
(clobber (reg:SI 14 lr))
]) builtin-apply-4.c:34 209 {*call_value_symbol}


Now the same command line with -flto fails the execution test and produces the
following RTL for the call to bar:
(call_insn/u/i:TI 8 7 39 2 (parallel [
(set (reg:DF 0 r0)
(call (mem:SI (symbol_ref:SI ("bar.constprop.0") [flags 0x3] 
) [0 bar.constprop S4 A32])
(const_int 0 [0])))
(use (const_int 0 [0]))
(clobber (reg:SI 14 lr))
]) builtin-apply-4.c:27 209 {*call_value_symbol}

With the same float ABI, the LTO build expects the return value of bar in
r0-r1, even though it's a double and under the hard-float ABI it should be
returned in s0-s1 (d0), as is the case with the non-LTO version.

[Bug target/69227] FAIL: gcc.dg/torture/builtin-integral-1.c -O1 (test for excess errors)

2016-01-28 Thread andre.simoesdiasvieira at arm dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69227

Andre Vieira  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #3 from Andre Vieira  ---
The test was changed to require a C99 runtime in r232487.

[Bug c++/68385] [6 Regression] ICE building libstdc++ on arm-none-eabi

2016-01-15 Thread andre.simoesdiasvieira at arm dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68385

--- Comment #8 from Andre Vieira  ---
It did fix it for me; sorry for the late reply.

[Bug target/69227] FAIL: gcc.dg/torture/builtin-integral-1.c -O1 (test for excess errors)

2016-01-11 Thread andre.simoesdiasvieira at arm dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69227

--- Comment #2 from Andre Vieira  ---
I have decided to email the newlib mailing list to figure out which function
classes we should and should not support for 'arm-none-eabi'.

See https://sourceware.org/ml/newlib/2016/msg9.html

[Bug target/69227] New: FAIL: gcc.dg/torture/builtin-integral-1.c -O1 (test for excess errors)

2016-01-11 Thread andre.simoesdiasvieira at arm dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69227

Bug ID: 69227
   Summary: FAIL: gcc.dg/torture/builtin-integral-1.c -O1 (test
for excess errors)
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andre.simoesdiasvieira at arm dot com
  Target Milestone: ---

Commit r232191 causes the following failure on the arm-none-eabi target:

FAIL: gcc.dg/torture/builtin-integral-1.c -O1 (test for excess errors)

The test fails because the compiler no longer folds away __builtin_ceill for
__builtin_fabsf. This is because Gerald's patch checks
'targetm.libc_has_function (function_c99_misc)' before applying a
transformation used here, and for arm-none-eabi TARGET_LIBC_HAS_FUNCTION is
defined as 'no_c99_libc_has_function', which always returns false.

The question now is whether we should support function_c99_misc with
'arm-none-eabi', which comes with newlib. I believe newlib does not claim to
fully support C99.

[Bug testsuite/68232] gcc.dg/ifcvt-4.c fails on some arm configurations

2015-12-03 Thread andre.simoesdiasvieira at arm dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68232

Andre Vieira  changed:

   What|Removed |Added

 CC||andre.simoesdiasvieira@arm.
   ||com

--- Comment #3 from Andre Vieira  ---
It also fails on every ARM M-profile arch/CPU combination I've tried (all with
-mthumb):
-march={armv6-m,armv7-m}
or -mcpu=cortex-m{0,0plus,3,4,7}

It does pass for armv7-r and cortex-r4 with and without -mthumb.

All of this is with target 'arm-none-eabi'.

[Bug c++/68385] [6 Regression] ICE building libstdc++ on arm-none-eabi

2015-11-18 Thread andre.simoesdiasvieira at arm dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68385

Andre Vieira  changed:

   What|Removed |Added

 CC||andre.simoesdiasvieira@arm.
   ||com

--- Comment #2 from Andre Vieira  ---
Hi Jason,

I don't fully understand what is going wrong here, but while debugging I found
that the tree it complains about comes from the call to
convert_to_integer_nofold in ocp_convert; that line used to have a
fold_if_not_in_template. I no longer got the ICE after reverting the code there
to fold 'converted'. I'm not sure this actually fixes it; I'd need to look
further into your patch. Hopefully this saves you some debugging.

The issue seems to originate from a nop_expr around a parm_decl, which fold
removes.

Hope this helps.

Cheers,
Andre

[Bug testsuite/67948] xor-and.c needs updating after r228661

2015-10-20 Thread andre.simoesdiasvieira at arm dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67948

Andre Vieira  changed:

   What|Removed |Added

 CC||andre.simoesdiasvieira@arm.
   ||com

--- Comment #2 from Andre Vieira  ---
I am working on this and have proposed a fix on gcc-patches; see
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01899.html

I can't assign the bug to myself as I don't have write access.
