[Bug c/92276] Embedded __attribute__ ((optimize("unroll-loops"))) is not working together with '__attribute__ ((__always_inline__))'

2019-10-30 Thread Lijian.Zhang at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92276

--- Comment #4 from Lijian Zhang  ---
(In reply to Richard Biener from comment #1)
> Instead of trying to force the compiler to unroll with -funroll-loops you can
> use #pragma GCC unroll N on individual loops instead.
> 
> The attributes should not conflict in any way.

Hi Richard,
Does it make sense to you that '__attribute__ ((optimize("unroll-loops")))' has
to be placed on the caller when the callee is defined with '__attribute__
((__always_inline__))'?

Thanks.
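
For reference, the per-loop form suggested in comment #1 might look roughly
like the sketch below. It is not the reporter's code: crc32c_sw and
crc32c_checksum are hypothetical names, and the bitwise software CRC32C stands
in for the __crc32c* intrinsics so that the example builds on any target. Only
'#pragma GCC unroll N' itself comes from the thread; whether GCC actually
unrolls a given loop may still depend on the optimization level in use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable bitwise CRC32C (Castagnoli, reflected polynomial 0x82F63B78).
   Hypothetical stand-in for the __crc32c* intrinsics used in the report. */
static inline __attribute__ ((__always_inline__))
uint32_t crc32c_sw (uint32_t crc, const unsigned char *s, size_t len)
{
  assert (s != NULL || len == 0);

  for (size_t i = 0; i < len; i++)
    {
      crc ^= s[i];
      /* Request unrolling of this one loop only (8 copies, one per bit).
         The pragma has to sit directly in front of the loop it applies to. */
#pragma GCC unroll 8
      for (int bit = 0; bit < 8; bit++)
        crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
  return crc;
}

/* Caller: usual CRC32C framing (initial value and final XOR of 0xFFFFFFFF). */
uint32_t crc32c_checksum (const unsigned char *s, size_t len)
{
  return crc32c_sw (0xFFFFFFFFu, s, len) ^ 0xFFFFFFFFu;
}
```

Because the pragma is attached to the loop rather than to a function, it
travels with the loop body when the always_inline callee is inlined, so no
attribute is needed on the caller in this variant.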

[Bug c/92276] Embedded __attribute__ ((optimize("unroll-loops"))) is not working together with '__attribute__ ((__always_inline__))'

2019-10-30 Thread Lijian.Zhang at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92276

Lijian Zhang  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |FIXED

--- Comment #3 from Lijian Zhang  ---
In my case, the callee is defined with '__attribute__ ((__always_inline__))',
and I want to apply automatic loop unrolling. The '__attribute__
((optimize("unroll-loops")))' has to be added for the caller, not the callee.

[Bug c/92276] Embedded __attribute__ ((optimize("unroll-loops"))) is not working together with '__attribute__ ((__always_inline__))'

2019-10-30 Thread Lijian.Zhang at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92276

--- Comment #2 from Lijian Zhang  ---
(In reply to Richard Biener from comment #1)
> Instead of trying to force the compiler to unroll with -funroll-loops you can
> use #pragma GCC unroll N on individual loops instead.
> 
> The attributes should not conflict in any way.

Sorry, I made a mistake. In my case '__attribute__
((optimize("unroll-loops")))' should be applied to the caller, not the callee.
'#pragma GCC optimize ("unroll-loops")' also works.
Thanks for your suggestion!

#include <stdio.h>
#include <stdlib.h>
#include <arm_acle.h>

static inline __attribute__ ((__always_inline__))
unsigned int clib_crc32c (unsigned int v, unsigned char * s, int len)
{
  for (; len >= 8; len -= 8, s += 8)
    v = __crc32cd (v, *((unsigned long *) s));

  for (; len >= 4; len -= 4, s += 4)
    v = __crc32cw (v, *((unsigned int *) s));

  for (; len >= 2; len -= 2, s += 2)
    v = __crc32ch (v, *((unsigned short *) s));

  for (; len >= 1; len -= 1, s += 1)
    v = __crc32cb (v, *((unsigned char *) s));

  return v;
}

__attribute__ ((optimize("unroll-loops")))
int main (int argc, char *argv[])
{
  unsigned char s[40] = {argc, 0, argc, 0};
  unsigned char ss[32] = {argc, 0, argc, 0, argc, 0};
  unsigned int v = 0xbeefdead, vv = 0xdeadbeef;
  int len = strtol (argv[1], NULL, 10);

  v = clib_crc32c (v, s, 40);
  vv = clib_crc32c (vv, ss, 32);

  printf ("%8X\n", v);
  printf ("%8X\n", vv);
  return 0;
}
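
Comment #2 also notes that '#pragma GCC optimize ("unroll-loops")' works. A
minimal sketch of that variant follows, assuming the pragma region is arranged
to cover the caller as well as the always_inline callee, in line with the
observation above that the option has to apply to the caller. sum_words and
sum_fixed8 are hypothetical names, not taken from the report, and the plain
summing loop stands in for the CRC intrinsics so the example builds on any
target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Save the current option state, then enable loop unrolling for every
   function defined until the matching pop_options. */
#pragma GCC push_options
#pragma GCC optimize ("unroll-loops")

/* Callee: always inlined, so its loop ends up inside the caller's body. */
static inline __attribute__ ((__always_inline__))
uint32_t sum_words (uint32_t v, const uint32_t *w, size_t n)
{
  for (size_t i = 0; i < n; i++)
    v += w[i];
  return v;
}

/* Caller: the trip count is a compile-time constant after inlining. */
uint32_t sum_fixed8 (const uint32_t *w)
{
  assert (w != NULL);
  return sum_words (0, w, 8);
}

/* Restore the options that were in effect before push_options. */
#pragma GCC pop_options
```

Whether the loop was really unrolled can be checked in the -S output, as done
in the original report.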

[Bug c/92276] New: Embedded __attribute__ ((optimize("unroll-loops"))) is not working together with '__attribute__ ((__always_inline__))'

2019-10-30 Thread Lijian.Zhang at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92276

            Bug ID: 92276
           Summary: Embedded __attribute__ ((optimize("unroll-loops"))) is
                    not working together with '__attribute__
                    ((__always_inline__))'
           Product: gcc
           Version: 8.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: Lijian.Zhang at arm dot com
  Target Milestone: ---

Dear experts,
I'm trying to use '__attribute__ ((optimize("unroll-loops")))' to apply
automatic loop unrolling to a static inline function that also has
'__attribute__ ((__always_inline__))'.
However, the assembly output shows that the loop is not unrolled. The compile
command is
'gcc -march=armv8-a+crc -O2 -W -Wall -mtune=cortex-a72 unroll.c -S'.

If I add the -funroll-loops option, i.e., compile with the command
'gcc -march=armv8-a+crc -O2 -W -Wall -mtune=cortex-a72 -funroll-loops
unroll.c -S', the assembly output shows that the loop is unrolled.

And if I compile without the -funroll-loops option but comment out
'__attribute__ ((__always_inline__))', then '__attribute__
((optimize("unroll-loops")))' does take effect.

So it seems those two attribute parameters are not working together, which
seems to be unreasonable to me. I want some functions to be inlined and also
the loops inside those functions unrolled automatically, as the loop iteration
number is fixed.

lijian@net-arm-d05-08:~/C/unroll$ gcc --version
gcc (Ubuntu 8.3.0-6ubuntu1~18.04.1) 8.3.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

lijian@net-arm-d05-08:~/C/unroll$ cat /etc/os-release
NAME="Ubuntu"
VERSION="18.04.1 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.1 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic

lijian@net-arm-d05-08:~/C/unroll$ gcc -march=armv8-a+crc -O2 -W -Wall
-mtune=cortex-a72 unroll.c -S

lijian@net-arm-d05-08:~/C/unroll$ lscpu
Architecture:        aarch64
Byte Order:          Little Endian
CPU(s):              64
On-line CPU(s) list: 0-63
Thread(s) per core:  1
Core(s) per socket:  32
Socket(s):           2
NUMA node(s):        4
Vendor ID:           ARM
Model:               2
Model name:          Cortex-A72
Stepping:            r0p2
BogoMIPS:            100.00
L1d cache:           32K
L1i cache:           48K
L2 cache:            1024K
L3 cache:            16384K
NUMA node0 CPU(s):   0-15
NUMA node1 CPU(s):   16-31
NUMA node2 CPU(s):   32-47
NUMA node3 CPU(s):   48-63
Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid



#include <stdio.h>
#include <stdlib.h>
#include <arm_acle.h>

static inline __attribute__ ((__always_inline__))
__attribute__ ((optimize("unroll-loops")))
unsigned int clib_crc32c (unsigned int v, unsigned char * s, int len)
{
  for (; len >= 8; len -= 8, s += 8)
    v = __crc32cd (v, *((unsigned long *) s));

  for (; len >= 4; len -= 4, s += 4)
    v = __crc32cw (v, *((unsigned int *) s));

  for (; len >= 2; len -= 2, s += 2)
    v = __crc32ch (v, *((unsigned short *) s));

  for (; len >= 1; len -= 1, s += 1)
    v = __crc32cb (v, *((unsigned char *) s));

  return v;
}

int main (int argc, char *argv[])
{
  unsigned char s[40] = {argc, 0, argc, 0};
  unsigned char ss[32] = {argc, 0, argc, 0, argc, 0};
  unsigned int v = 0xbeefdead, vv = 0xdeadbeef;
  int len = strtol (argv[1], NULL, 10);

  for (int i = 0; i < len; i++) {
    v = clib_crc32c (v, s, 40);
    vv = clib_crc32c (vv, ss, 32);
  }

  printf ("%8X\n", v);
  printf ("%8X\n", vv);
  return 0;
}


[Bug target/87358] [8/9 Regression] ICE when -mtune=thunderx2t99 applied

2018-09-24 Thread Lijian.Zhang at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87358

--- Comment #9 from Lijian Zhang  ---
Hi Andrew,
I could only reproduce this issue with gcc-7.3.0; I was not able to reproduce
the failure with gcc-8.2.0 or gcc-8.1.0.
But from your description, gcc-8.2.0 still has this issue, and it is targeted
to be fixed in gcc-8.3.0?
Thanks.

[Bug c/87358] ICE when -mtune=thunderx2t99 applied

2018-09-18 Thread Lijian.Zhang at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87358

Lijian Zhang  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |Lijian.Zhang at arm dot com

--- Comment #1 from Lijian Zhang  ---
Created attachment 44723
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44723&action=edit
pre-process file

[Bug c/87358] New: ICE when -mtune=thunderx2t99 applied

2018-09-18 Thread Lijian.Zhang at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87358

            Bug ID: 87358
           Summary: ICE when -mtune=thunderx2t99 applied
           Product: gcc
           Version: 7.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: Lijian.Zhang at arm dot com
  Target Milestone: ---

lijian@armada8040-1:~/ICE.issue$ gcc --version
gcc (Ubuntu/Linaro 7.3.0-16ubuntu3) 7.3.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

lijian@armada8040-1:~/ICE.issue$ cat /etc/os-release
NAME="Ubuntu"
VERSION="18.04.1 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.1 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic

lijian@armada8040-1:~/ICE.issue$ lscpu
Architecture:        aarch64
Byte Order:          Little Endian
CPU(s):              4
On-line CPU(s) list: 0-3
Thread(s) per core:  1
Core(s) per socket:  2
Socket(s):           2
Vendor ID:           ARM
Model:               1
Model name:          Cortex-A72
Stepping:            r0p1
CPU max MHz:         2000.
CPU min MHz:         100.
BogoMIPS:            50.00
Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32

lijian@armada8040-1:~/ICE.issue$ gcc -c l2_learn.i -O2 
-march=armv8.1-a+crc+crypto -mtune=thunderx2t99
/home/lijian/tasks/dualQuad/origin/src/vnet/l2/l2_learn.c: In function
‘l2learn_node_fn_thunderx2t99’:
/home/lijian/tasks/dualQuad/origin/src/vnet/l2/l2_learn.c:430:1: internal
compiler error: Segmentation fault
Please submit a full bug report,
with preprocessed source if appropriate.
See  for instructions.

lijian@armada8040-1:~/ICE.issue$ gcc -c l2_learn.i -O2 
-march=armv8.1-a+crc+crypto
lijian@armada8040-1:~/ICE.issue$
lijian@armada8040-1:~/ICE.issue$