[Bug target/102294] memset expansion is sometimes slow for small sizes

2021-09-13 Thread bart.vanassche at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102294

--- Comment #13 from Bart Van Assche  ---
Hi H.J. Lu, thank you for having taken a look. I would like to try your patch.
However, I'm not a gcc developer so I don't have a gcc tree checked out on my
development workstation. It may take some time before I can test the patch that
you shared.

[Bug middle-end/102294] memset expansion is sometimes slow for small sizes

2021-09-12 Thread bart.vanassche at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102294

--- Comment #9 from Bart Van Assche  ---
Hmm ... isn't movups a floating-point instruction? I want to avoid floating
point instructions since my understanding is that it is not allowed to use
these in kernel code. See e.g.
https://stackoverflow.com/questions/13886338/use-of-floating-point-in-the-linux-kernel.

[Bug middle-end/102294] memset expansion is sometimes slow for small sizes

2021-09-12 Thread bart.vanassche at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102294

--- Comment #7 from Bart Van Assche  ---
Initializing small data structures via structure assignment is a common
approach in the Linux kernel.

This is the code gcc generates with the no-sse option applied:

(gdb) disas bio_init3
Dump of assembler code for function bio_init3:
   0x004011b0 <+0>:     mov    %rdi,%r8
   0x004011b3 <+3>:     mov    $0xf,%ecx
   0x004011b8 <+8>:     xor    %eax,%eax
   0x004011ba <+10>:    rep stos %rax,%es:(%rdi)
   0x004011bd <+13>:    movl   $0x1,0x20(%r8)
   0x004011c5 <+21>:    mov    %dx,0x62(%r8)
   0x004011ca <+26>:    movl   $0x1,0x64(%r8)
   0x004011d2 <+34>:    mov    %rsi,0x68(%r8)
   0x004011d6 <+38>:    ret

This is the code clang generates with the no-sse option applied:

(gdb) disas bio_init3
Dump of assembler code for function bio_init3:
   0x004012c0 <+0>:     movq   $0x0,0x18(%rdi)
   0x004012c8 <+8>:     movq   $0x0,0x10(%rdi)
   0x004012d0 <+16>:    movq   $0x0,0x8(%rdi)
   0x004012d8 <+24>:    movq   $0x0,(%rdi)
   0x004012df <+31>:    movl   $0x1,0x20(%rdi)
   0x004012e6 <+38>:    movq   $0x0,0x24(%rdi)
   0x004012ee <+46>:    movq   $0x0,0x2c(%rdi)
   0x004012f6 <+54>:    movq   $0x0,0x34(%rdi)
   0x004012fe <+62>:    movq   $0x0,0x3c(%rdi)
   0x00401306 <+70>:    movq   $0x0,0x44(%rdi)
   0x0040130e <+78>:    movq   $0x0,0x4c(%rdi)
   0x00401316 <+86>:    movq   $0x0,0x54(%rdi)
   0x0040131e <+94>:    movq   $0x0,0x5a(%rdi)
   0x00401326 <+102>:   mov    %dx,0x62(%rdi)
   0x0040132a <+106>:   movl   $0x1,0x64(%rdi)
   0x00401331 <+113>:   mov    %rsi,0x68(%rdi)
   0x00401335 <+117>:   movq   $0x0,0x70(%rdi)
   0x0040133d <+125>:   ret

Is there any x86_64 CPU on which the latter code runs slower than the former?

[Bug middle-end/102294] structure assignment slower than memberwise initialization

2021-09-12 Thread bart.vanassche at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102294

--- Comment #5 from Bart Van Assche  ---
Please note that bio_init3() does not use atomic_set() but ATOMIC_INIT(). The
definition of ATOMIC_INIT() is as follows:

#define ATOMIC_INIT(v) (atomic_t){.counter = (v)}
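
To illustrate how that macro ends up inside a structure assignment, here is a minimal sketch. The atomic_t typedef is a simplified stand-in, and struct demo_obj and demo_obj_init() are hypothetical names that do not come from the test program; only the ATOMIC_INIT() definition is the one quoted above.

```c
/* Simplified stand-in for the kernel's atomic_t (assumption). */
typedef struct {
    int counter;
} atomic_t;

/* Definition as quoted in the comment above. */
#define ATOMIC_INIT(v) (atomic_t){.counter = (v)}

/* Hypothetical structure with an embedded counter. */
struct demo_obj {
    atomic_t refcnt;
    int flags;
};

/*
 * Structure assignment from a compound literal: refcnt is set through
 * ATOMIC_INIT() and every member that is not named is zeroed.
 */
void demo_obj_init(struct demo_obj *obj)
{
    *obj = (struct demo_obj){
        .refcnt = ATOMIC_INIT(1),
    };
}
```

Since ATOMIC_INIT() expands to a compound literal and not to a function call, the compiler sees the whole initialization as one aggregate assignment.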

[Bug middle-end/102294] structure assignment slower than memberwise initialization

2021-09-12 Thread bart.vanassche at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102294

--- Comment #3 from Bart Van Assche  ---
Thanks for the quick feedback. I have modified the test program and added
target("no-sse") to the bio_init[123]() functions. With that change applied the
results are as follows:

$ gcc -O2 -o bio_init bio_init.c && ./bio_init
Elapsed time: 0.965606 s
Elapsed time: 0.529943 s
Elapsed time: 0.734645 s
$ clang -O2 -o bio_init-clang bio_init.c && ./bio_init-clang
Elapsed time: 0.633179 s
Elapsed time: 0.605532 s
Elapsed time: 0.504315 s

It seems like clang still generates significantly better code for bio_init3()
than gcc?
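
For reference, a rough sketch of how a single function can be marked with target("no-sse"). The names DEMO_NO_SSE, struct demo_buf and demo_clear() are hypothetical and are not taken from the attached test program; the preprocessor guard is only there so that the sketch also compiles on non-x86 targets.

```c
/* Apply the attribute only where it is meaningful (x86 with GCC or clang). */
#if (defined(__x86_64__) || defined(__i386__)) && \
    (defined(__GNUC__) || defined(__clang__))
#define DEMO_NO_SSE __attribute__((target("no-sse")))
#else
#define DEMO_NO_SSE
#endif

/* Hypothetical small structure that has to be cleared. */
struct demo_buf {
    unsigned long words[15];
};

/*
 * The compiler may not use SSE instructions (e.g. movups) inside this
 * function, so the zeroing has to be done with integer stores, a string
 * instruction or a call to memset().
 */
DEMO_NO_SSE void demo_clear(struct demo_buf *buf)
{
    *buf = (struct demo_buf){ 0 };
}
```

Which of these strategies the compiler picks for a given size is exactly what this bug report is about, so the generated code should be checked with a disassembler.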

[Bug middle-end/102294] structure assignment slower than memberwise initialization

2021-09-12 Thread bart.vanassche at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102294

Bart Van Assche  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #51444|0                           |1
        is obsolete|                            |

--- Comment #2 from Bart Van Assche  ---
Created attachment 51445
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51445&action=edit
Test program that illustrates the issue

[Bug c/102294] New: structure assignment slower than memberwise initialization

2021-09-12 Thread bart.vanassche at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102294

Bug ID: 102294
   Summary: structure assignment slower than memberwise
initialization
   Product: gcc
   Version: 11.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: bart.vanassche at gmail dot com
  Target Milestone: ---

Created attachment 51444
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51444&action=edit
Test program that illustrates the issue

The output of the attached test program is as follows for an Intel Core i7-4790
CPU (3.6 GHz) when compiled with -O2:
$ ~/test/bio_init 
Elapsed time: 0.874763 s
Elapsed time: 0.480335 s
Elapsed time: 0.733273 s

The above output shows that bio_init2() runs faster than bio_init3() and that
bio_init3() runs faster than bio_init1(). bio_init3() uses structure assignment
to initialize struct bio while bio_init2() uses memberwise initialization.
bio_init1() uses memset(). To me it was a big surprise to see that bio_init3()
is slower than bio_init2(). Apparently clang generates better code:

$ clang -O2 -o bio_init-clang bio_init.c
$ ./bio_init-clang 

Elapsed time: 0.446804 s
Elapsed time: 0.455009 s
Elapsed time: 0.407392 s

Can gcc be modified such that bio_init3() runs at least as fast as bio_init2()?

The bio_init[123]() source code comes from the Linux kernel. Optimization level
-O2 has been chosen because that is what the Linux kernel uses.
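
For readers without access to the attachment, the three initialization styles look roughly as follows. This is only a sketch: struct demo_bio is a hypothetical, much smaller stand-in for struct bio, and the demo_init[123]() functions are not the code from the attached test program.

```c
#include <string.h>

/* Hypothetical stand-in for the kernel's struct bio; the real layout differs. */
struct demo_bio {
    void *bi_next;
    unsigned long bi_flags;
    int bi_cnt;                 /* plain int instead of atomic_t */
    unsigned short bi_max_vecs;
    void *bi_io_vec;
    char pad[64];
};

/* Style 1: clear everything with memset(), then set the non-zero members. */
void demo_init1(struct demo_bio *bio, void *table, unsigned short max_vecs)
{
    memset(bio, 0, sizeof(*bio));
    bio->bi_cnt = 1;
    bio->bi_max_vecs = max_vecs;
    bio->bi_io_vec = table;
}

/* Style 2: memberwise initialization, every member is written explicitly. */
void demo_init2(struct demo_bio *bio, void *table, unsigned short max_vecs)
{
    bio->bi_next = NULL;
    bio->bi_flags = 0;
    bio->bi_cnt = 1;
    bio->bi_max_vecs = max_vecs;
    bio->bi_io_vec = table;
    memset(bio->pad, 0, sizeof(bio->pad));
}

/* Style 3: structure assignment; members that are not named are zeroed. */
void demo_init3(struct demo_bio *bio, void *table, unsigned short max_vecs)
{
    *bio = (struct demo_bio){
        .bi_cnt = 1,
        .bi_max_vecs = max_vecs,
        .bi_io_vec = table,
    };
}
```

All three functions leave the named members in the same state; what differs is the instruction sequence the compiler chooses for the zeroing part.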

[Bug middle-end/52925] [4.5/4.6 Regression] var-tracking never terminates

2012-04-10 Thread bart.vanassche at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52925

Bart Van Assche <bart.vanassche at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bart.vanassche at gmail dot
                   |                            |com

--- Comment #2 from Bart Van Assche <bart.vanassche at gmail dot com> 2012-04-10 10:52:49 UTC ---
I ran into this issue too - see also
http://bugzilla.novell.com/show_bug.cgi?id=756235.


[Bug libstdc++/51504] New: Data race hunting instructions in manual do not work

2011-12-11 Thread bart.vanassche at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51504

 Bug #: 51504
   Summary: Data race hunting instructions in manual do not work
Classification: Unclassified
   Product: gcc
   Version: 4.6.2
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: bart.vanass...@gmail.com


Created attachment 26048
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=26048
Test program that allows to reproduce the bug

According to the instructions in the Data Race Hunting paragraph
(http://gcc.gnu.org/onlinedocs/libstdc++/manual/debug.html), the following
should be sufficient to avoid false positive data race reports on multithreaded
programs:

#include <valgrind/drd.h>
#define _GLIBCXX_SYNCHRONIZATION_HAPPENS_BEFORE(addr) \
  ANNOTATE_HAPPENS_BEFORE(addr)
#define _GLIBCXX_SYNCHRONIZATION_HAPPENS_AFTER(addr) \
  ANNOTATE_HAPPENS_AFTER(addr)
#define _GLIBCXX_EXTERN_TEMPLATE -1

Unfortunately that's not sufficient. The output I obtained for a small test
program is:

$ ./vg-in-place --tool=drd drd/tests/std_thread 2>&1 | grep -E 'Confl|SUMMARY'
==18629== Conflicting store by thread 1 at 0x0433e02c size 4
==18629== Conflicting store by thread 1 at 0x0433e02c size 4
==18629== Conflicting load by thread 1 at 0x0433e034 size 4
==18629== ERROR SUMMARY: 3 errors from 3 contexts (suppressed: 16 from 6)

After digging around a little I found out that, in the created thread, the
reference count of the _Impl object is decremented from inside libstdc++. So no
matter which macros are defined in the code that includes <thread>, that
reference count decrementing code won't be annotated. Moving the implementation
of the function execute_native_thread_routine() from src/thread.cc to
include/std/thread might fix this (I haven't tried this).

Detailed information:
$ uname -a
Linux f16 3.1.4-1.fc16.i686.PAE #1 SMP Tue Nov 29 12:23:00 UTC 2011 i686 i686
i386 GNU/Linux
$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/i686-redhat-linux/4.6.2/lto-wrapper
Target: i686-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man
--infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla
--enable-bootstrap --enable-shared --enable-threads=posix
--enable-checking=release --with-system-zlib --enable-__cxa_atexit
--disable-libunwind-exceptions --enable-gnu-unique-object
--enable-linker-build-id
--enable-languages=c,c++,objc,obj-c++,java,fortran,ada,go,lto --enable-plugin
--enable-java-awt=gtk --disable-dssi
--with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre
--enable-libgcj-multifile --enable-java-maintainer-mode
--with-ecj-jar=/usr/share/java/eclipse-ecj.jar --disable-libjava-multilib
--with-ppl --with-cloog --with-tune=generic --with-arch=i686
--build=i686-redhat-linux
Thread model: posix
gcc version 4.6.2 20111027 (Red Hat 4.6.2-1) (GCC)

This bug can be reproduced by running the following commands on a system with
gcc 4.6.x:
svn co -r12291 svn://svn.valgrind.org/valgrind/trunk valgrind
cd valgrind
./autogen.sh
./configure
make -s
make -s check
./vg-in-place --tool=drd drd/tests/std_thread


[Bug libstdc++/51504] Data race hunting instructions in manual do not work

2011-12-11 Thread bart.vanassche at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51504

--- Comment #1 from Bart Van Assche <bart.vanassche at gmail dot com> 2011-12-11 20:26:47 UTC ---
Created attachment 26049
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=26049
Detailed DRD output for the test program