[Bug target/102294] memset expansion is sometimes slow for small sizes
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102294

--- Comment #13 from Bart Van Assche ---

Hi H.J. Lu, thank you for having taken a look. I would like to try your patch. However, I'm not a gcc developer so I don't have a gcc tree checked out on my development workstation. It may take some time before I can test the patch that you shared.
[Bug middle-end/102294] memset expansion is sometimes slow for small sizes
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102294

--- Comment #9 from Bart Van Assche ---

Hmm ... isn't movups a floating-point instruction? I want to avoid floating-point instructions since my understanding is that they are not allowed in kernel code. See e.g. https://stackoverflow.com/questions/13886338/use-of-floating-point-in-the-linux-kernel.
[Bug middle-end/102294] memset expansion is sometimes slow for small sizes
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102294

--- Comment #7 from Bart Van Assche ---

Initializing small data structures via structure assignment is a common approach in the Linux kernel. This is the code gcc generates with the no-sse option applied:

(gdb) disas bio_init3
Dump of assembler code for function bio_init3:
   0x004011b0 <+0>:   mov    %rdi,%r8
   0x004011b3 <+3>:   mov    $0xf,%ecx
   0x004011b8 <+8>:   xor    %eax,%eax
   0x004011ba <+10>:  rep stos %rax,%es:(%rdi)
   0x004011bd <+13>:  movl   $0x1,0x20(%r8)
   0x004011c5 <+21>:  mov    %dx,0x62(%r8)
   0x004011ca <+26>:  movl   $0x1,0x64(%r8)
   0x004011d2 <+34>:  mov    %rsi,0x68(%r8)
   0x004011d6 <+38>:  ret

This is the code clang generates with the no-sse option applied:

(gdb) disas bio_init3
Dump of assembler code for function bio_init3:
   0x004012c0 <+0>:   movq   $0x0,0x18(%rdi)
   0x004012c8 <+8>:   movq   $0x0,0x10(%rdi)
   0x004012d0 <+16>:  movq   $0x0,0x8(%rdi)
   0x004012d8 <+24>:  movq   $0x0,(%rdi)
   0x004012df <+31>:  movl   $0x1,0x20(%rdi)
   0x004012e6 <+38>:  movq   $0x0,0x24(%rdi)
   0x004012ee <+46>:  movq   $0x0,0x2c(%rdi)
   0x004012f6 <+54>:  movq   $0x0,0x34(%rdi)
   0x004012fe <+62>:  movq   $0x0,0x3c(%rdi)
   0x00401306 <+70>:  movq   $0x0,0x44(%rdi)
   0x0040130e <+78>:  movq   $0x0,0x4c(%rdi)
   0x00401316 <+86>:  movq   $0x0,0x54(%rdi)
   0x0040131e <+94>:  movq   $0x0,0x5a(%rdi)
   0x00401326 <+102>: mov    %dx,0x62(%rdi)
   0x0040132a <+106>: movl   $0x1,0x64(%rdi)
   0x00401331 <+113>: mov    %rsi,0x68(%rdi)
   0x00401335 <+117>: movq   $0x0,0x70(%rdi)
   0x0040133d <+125>: ret

Is there any x86_64 CPU on which the latter code runs slower than the former?
[Bug middle-end/102294] structure assignment slower than memberwise initialization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102294

--- Comment #5 from Bart Van Assche ---

Please note that bio_init3() does not use atomic_set() but ATOMIC_INIT(). The definition of ATOMIC_INIT() is as follows:

#define ATOMIC_INIT(v) (atomic_t){.counter = (v)}
[Bug middle-end/102294] structure assignment slower than memberwise initialization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102294

--- Comment #3 from Bart Van Assche ---

Thanks for the quick feedback. I have modified the test program and added target("no-sse") to the bio_init[123]() functions. With that change applied the results are as follows:

$ gcc -O2 -o bio_init bio_init.c && ./bio_init
Elapsed time: 0.965606 s
Elapsed time: 0.529943 s
Elapsed time: 0.734645 s
$ clang -O2 -o bio_init-clang bio_init.c && ./bio_init-clang
Elapsed time: 0.633179 s
Elapsed time: 0.605532 s
Elapsed time: 0.504315 s

It seems like clang still generates significantly better code for bio_init3() than gcc?
[Bug middle-end/102294] structure assignment slower than memberwise initialization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102294

Bart Van Assche changed:

                 What           |Removed |Added
        ------------------------------------------
        Attachment #51444 is    |0       |1
        obsolete                |        |

--- Comment #2 from Bart Van Assche ---

Created attachment 51445
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51445&action=edit
Test program that illustrates the issue
[Bug c/102294] New: structure assignment slower than memberwise initialization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102294

            Bug ID: 102294
           Summary: structure assignment slower than memberwise
                    initialization
           Product: gcc
           Version: 11.2.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: bart.vanassche at gmail dot com
  Target Milestone: ---

Created attachment 51444
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51444&action=edit
Test program that illustrates the issue

The output of the attached test program is as follows for an Intel Core i7-4790 CPU (3.6 GHz) when compiled with -O2:

$ ~/test/bio_init
Elapsed time: 0.874763 s
Elapsed time: 0.480335 s
Elapsed time: 0.733273 s

The above output shows that bio_init2() runs faster than bio_init3() and that bio_init3() runs faster than bio_init1(). bio_init3() uses structure assignment to initialize struct bio while bio_init2() uses memberwise initialization. bio_init1() uses memset(). To me it was a big surprise to see that bio_init3() is slower than bio_init2(). Apparently clang generates better code:

$ clang -O2 -o bio_init-clang bio_init.c
$ ./bio_init-clang
Elapsed time: 0.446804 s
Elapsed time: 0.455009 s
Elapsed time: 0.407392 s

Can gcc be modified such that bio_init3() runs at least as fast as bio_init2()? The bio_init[123]() source code comes from the Linux kernel. Optimization level -O2 has been chosen because that is what the Linux kernel uses.
[Bug middle-end/52925] [4.5/4.6 Regression] var-tracking never terminates
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52925

Bart Van Assche <bart.vanassche at gmail dot com> changed:

                 What           |Removed |Added
        ------------------------------------------
                 CC             |        |bart.vanassche at gmail dot com

--- Comment #2 from Bart Van Assche <bart.vanassche at gmail dot com> 2012-04-10 10:52:49 UTC ---

I ran into this issue too - see also http://bugzilla.novell.com/show_bug.cgi?id=756235.
[Bug libstdc++/51504] New: Data race hunting instructions in manual do not work
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51504

             Bug #: 51504
            Summary: Data race hunting instructions in manual do not work
     Classification: Unclassified
            Product: gcc
            Version: 4.6.2
             Status: UNCONFIRMED
           Severity: normal
           Priority: P3
          Component: libstdc++
         AssignedTo: unassig...@gcc.gnu.org
         ReportedBy: bart.vanass...@gmail.com

Created attachment 26048
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=26048
Test program that allows reproducing the bug

According to the instructions in the Data Race Hunting paragraph (http://gcc.gnu.org/onlinedocs/libstdc++/manual/debug.html), the following should be sufficient to avoid false positive data race reports on multithreaded programs:

#include <valgrind/drd.h>
#define _GLIBCXX_SYNCHRONIZATION_HAPPENS_BEFORE(addr) \
        ANNOTATE_HAPPENS_BEFORE(addr)
#define _GLIBCXX_SYNCHRONIZATION_HAPPENS_AFTER(addr) \
        ANNOTATE_HAPPENS_AFTER(addr)
#define _GLIBCXX_EXTERN_TEMPLATE -1

Unfortunately that's not sufficient. The output I obtained for a small test program is:

$ ./vg-in-place --tool=drd drd/tests/std_thread 2>&1 | grep -E 'Confl|SUMMARY'
==18629== Conflicting store by thread 1 at 0x0433e02c size 4
==18629== Conflicting store by thread 1 at 0x0433e02c size 4
==18629== Conflicting load by thread 1 at 0x0433e034 size 4
==18629== ERROR SUMMARY: 3 errors from 3 contexts (suppressed: 16 from 6)

After digging around a little I found out that in the created thread the reference count of the _Impl object is decremented from inside libstdc++. So no matter which macros are defined in the code that includes <thread>, that reference-count decrementing code won't be annotated. Moving the implementation of the function execute_native_thread_routine() from src/thread.cc to include/std/thread might fix this (I haven't tried this).

Detailed information:

$ uname -a
Linux f16 3.1.4-1.fc16.i686.PAE #1 SMP Tue Nov 29 12:23:00 UTC 2011 i686 i686 i386 GNU/Linux
$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/i686-redhat-linux/4.6.2/lto-wrapper
Target: i686-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --enable-languages=c,c++,objc,obj-c++,java,fortran,ada,go,lto --enable-plugin --enable-java-awt=gtk --disable-dssi --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre --enable-libgcj-multifile --enable-java-maintainer-mode --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --disable-libjava-multilib --with-ppl --with-cloog --with-tune=generic --with-arch=i686 --build=i686-redhat-linux
Thread model: posix
gcc version 4.6.2 20111027 (Red Hat 4.6.2-1) (GCC)

This bug can be reproduced by running the following commands on a system with gcc 4.6.x:

svn co -r12291 svn://svn.valgrind.org/valgrind/trunk valgrind
cd valgrind
./autogen.sh
./configure
make -s
make -s check
./vg-in-place --tool=drd drd/tests/std_thread
[Bug libstdc++/51504] Data race hunting instructions in manual do not work
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51504

--- Comment #1 from Bart Van Assche <bart.vanassche at gmail dot com> 2011-12-11 20:26:47 UTC ---

Created attachment 26049
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=26049
Detailed DRD output for the test program