[Bug target/56564] movdqa on possibly-8-byte-aligned struct with -O3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56564 --- Comment #23 from Jakub Jelinek --- The bug is fixed, you must be running into a different issue, either in the source you're compiling, or in the compiler. So, please open a new bugreport instead of commenting on a different one, and supply all the needed information (see http://gcc.gnu.org/bugs/ for details on what we need).
[Bug target/56564] movdqa on possibly-8-byte-aligned struct with -O3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56564

Thomas Gereke changed:

What |Removed |Added
CC   |        |gcc at thomasgereke dot de

--- Comment #22 from Thomas Gereke ---
Seems the bug does still exist in 6.3.0 20170516 (Debian 6.3.0-18). I get a GP on

 >0x5574d8c8 <...[abi:cxx11]() const+264>  movdqa 0x68(%rsp),%xmm0
  0x5574d8ce <...[abi:cxx11]() const+270>  lea    0x80(%rsp),%r13
  0x5574d8d6 <...[abi:cxx11]() const+278>  movq   $0x0,0x50(%rsp)
  0x5574d8df <...[abi:cxx11]() const+287>  movl   $0x0,0x10(%rsp)
  0x5574d8e7 <...[abi:cxx11]() const+295>  movaps %xmm0,(%rsp)
  0x5574d8eb <...[abi:cxx11]() const+299>  movq   $0x0,0x6(%rsp)
  0x5574d8f4 <...[abi:cxx11]() const+308>  movw   $0x0,0xe(%rsp)
  0x5574d8fb <...[abi:cxx11]() const+315>  movdqa (%rsp),%xmm1
  0x5574d900 <...[abi:cxx11]() const+320>  movaps %xmm1,0x40(%rsp)

The asm code is obviously wrong, because movdqa 0x68(%rsp),%xmm0 followed by movdqa (%rsp),%xmm1 without changes to %rsp has to fail. %rsp was 0x7fffecd477d0.

Code was C++ compiled with -O3 and x86_64. The underlying data structure is boost::asio::ip::address, which consists of an enum (4 bytes), address_v4 (4 bytes) and address_v6 (16 bytes). The GP occurs when accessing the ipv6 address.
[Bug target/56564] movdqa on possibly-8-byte-aligned struct with -O3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56564 --- Comment #21 from H.J. Lu ---
This bug isn't fixed in GCC 4.9. -O3 increases alignment from 64 bits to 128 bits on the original testcase:

Hardware watchpoint 6: *(unsigned int *) 0x7fffee9b4468

Old value = 64
New value = 128
ensure_base_align (stmt_info=0x1c8f990, dr=0x1db5b20)
    at /export/gnu/import/git/gcc-release/gcc/tree-vect-stmts.c:4907
4907          DECL_USER_ALIGN (base_decl) = 1;
(gdb) bt
#0  ensure_base_align (stmt_info=0x1c8f990, dr=0x1db5b20)
    at /export/gnu/import/git/gcc-release/gcc/tree-vect-stmts.c:4907
#1  0x00d33471 in vectorizable_store (stmt=0x7fffed95a280, gsi=0x7fffd830, vec_stmt=0x7fffd790, slp_node=0x1d9e7a0)
    at /export/gnu/import/git/gcc-release/gcc/tree-vect-stmts.c:5131
#2  0x00d38f80 in vect_transform_stmt (stmt=0x7fffed95a280, gsi=0x7fffd830, grouped_store=0x7fffd84a, slp_node=0x1d9e7a0, slp_node_instance=0x1cb3e10)
    at /export/gnu/import/git/gcc-release/gcc/tree-vect-stmts.c:7211
#3  0x00d5a980 in vect_schedule_slp_instance (node=0x1d9e7a0, instance=0x1cb3e10, vectorization_factor=1)
    at /export/gnu/import/git/gcc-release/gcc/tree-vect-slp.c:3084
#4  0x00d5abd0 in vect_schedule_slp (loop_vinfo=0x0, bb_vinfo=0x1ddf410)
    at /export/gnu/import/git/gcc-release/gcc/tree-vect-slp.c:3154
#5  0x00d5aea7 in vect_slp_transform_bb (bb=0x7fffece8ec30)
    at /export/gnu/import/git/gcc-release/gcc/tree-vect-slp.c:3230
#6  0x00d5e41b in execute_vect_slp ()
    at /export/gnu/import/git/gcc-release/gcc/tree-vectorizer.c:605
#7  0x00d5e4c9 in (anonymous namespace)::pass_slp_vectorize::execute (
    this=0x1b97010)
    at /export/gnu/import/git/gcc-release/gcc/tree-vectorizer.c:649
#8  0x00a7da14 in execute_one_pass (pass=0x1b97010)
---Type <return> to continue, or q <return> to quit---q
    at /export/gnu/imporQuit
(gdb) f 1
#1  0x00d33471 in vectorizable_store (stmt=0x7fffed95a280, gsi=0x7fffd830, vec_stmt=0x7fffd790, slp_node=0x1d9e7a0)
    at /export/gnu/import/git/gcc-release/gcc/tree-vect-stmts.c:5131
5131          ensure_base_align (stmt_info, dr);
(gdb) f 2
#2  0x00d38f80 in vect_transform_stmt (stmt=0x7fffed95a280, gsi=0x7fffd830, grouped_store=0x7fffd84a, slp_node=0x1d9e7a0, slp_node_instance=0x1cb3e10)
    at /export/gnu/import/git/gcc-release/gcc/tree-vect-stmts.c:7211
7211          done = vectorizable_store (stmt, gsi, &vec_stmt, slp_node);
(gdb)

This bug may be really fixed by r221268:

diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index aa9d43f..41ff802 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -4956,8 +4956,13 @@ ensure_base_align (stmt_vec_info stmt_info, struct data_reference *dr)
       tree vectype = STMT_VINFO_VECTYPE (stmt_info);
       tree base_decl = ((dataref_aux *)dr->aux)->base_decl;
 
-      DECL_ALIGN (base_decl) = TYPE_ALIGN (vectype);
-      DECL_USER_ALIGN (base_decl) = 1;
+      if (decl_in_symtab_p (base_decl))
+        symtab_node::get (base_decl)->increase_alignment (TYPE_ALIGN (vectype));
+      else
+        {
+          DECL_ALIGN (base_decl) = TYPE_ALIGN (vectype);
+          DECL_USER_ALIGN (base_decl) = 1;
+        }
       ((dataref_aux *)dr->aux)->base_misaligned = false;
     }
 }

in GCC 5.
[Bug target/56564] movdqa on possibly-8-byte-aligned struct with -O3
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56564 --- Comment #20 from Jakub Jelinek ---
Author: jakub
Date: Wed Jun 12 06:43:05 2013
New Revision: 199984

URL: http://gcc.gnu.org/viewcvs?rev=199984&root=gcc&view=rev
Log:
    PR target/56564
    * varasm.c (decl_binds_to_current_def_p): Call binds_local_p
    target hook even for !TREE_PUBLIC decls. If no resolution info
    is available, return false for common and external decls.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/varasm.c

Author: jakub
Date: Wed Jun 12 06:46:53 2013
New Revision: 199985

URL: http://gcc.gnu.org/viewcvs?rev=199985&root=gcc&view=rev
Log:
    PR target/56564
    * gcc.target/i386/pr56564-1.c: Skip on darwin, mingw and cygwin.
    * gcc.target/i386/pr56564-3.c: Likewise.

Modified:
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.target/i386/pr56564-1.c
    trunk/gcc/testsuite/gcc.target/i386/pr56564-3.c
[Bug target/56564] movdqa on possibly-8-byte-aligned struct with -O3
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56564 --- Comment #19 from Jakub Jelinek --- The mingw/cygwin stuff. The testcases assume that the symbols have decl_binds_to_current_def_p false, if that isn't the case (because darwin/mingw apparently don't allow symbol interposition), then the testcase can't work on those.
[Bug target/56564] movdqa on possibly-8-byte-aligned struct with -O3
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56564 --- Comment #18 from Dominique d'Humieres ---
(In reply to comment #17)
> Yeah, MachO is broken by design, guess the tests need to be restricted
> to non-darwin non-PE.

Questions:
(1) What is PE?
(2) Is the second "return 0;" wrong code or valid optimization? If the former, why?
(3) Is the decoration "__emutls_v." the same for all the emutls platforms? If not, where can I find the variants?
[Bug target/56564] movdqa on possibly-8-byte-aligned struct with -O3
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56564 --- Comment #17 from Jakub Jelinek ---
(In reply to Dominique d'Humieres from comment #16)
> On x86_64-apple-darwin10.8 at revision 199935, I get the following failures
> for the tests added at revision 199898:
>
> FAIL: gcc.target/i386/pr56564-1.c scan-tree-dump-times optimized "&s" 1
> FAIL: gcc.target/i386/pr56564-1.c scan-tree-dump-times optimized "return 0" 1
> FAIL: gcc.target/i386/pr56564-3.c scan-tree-dump-times optimized "&s" 1
> FAIL: gcc.target/i386/pr56564-3.c scan-tree-dump-times optimized "&t" 1

Yeah, MachO is broken by design, guess the tests need to be restricted to non-darwin non-PE.
[Bug target/56564] movdqa on possibly-8-byte-aligned struct with -O3
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56564 --- Comment #16 from Dominique d'Humieres ---
On x86_64-apple-darwin10.8 at revision 199935, I get the following failures for the tests added at revision 199898:

FAIL: gcc.target/i386/pr56564-1.c scan-tree-dump-times optimized "&s" 1
FAIL: gcc.target/i386/pr56564-1.c scan-tree-dump-times optimized "return 0" 1
FAIL: gcc.target/i386/pr56564-3.c scan-tree-dump-times optimized "&s" 1
FAIL: gcc.target/i386/pr56564-3.c scan-tree-dump-times optimized "&t" 1

The optimized dumps are (blank lines removed):

[macbook] f90/bug% cat pr56564-1.c.165t.optimized
;; Function foo (foo, funcdef_no=0, decl_uid=1741, symbol_order=2)
foo ()
{
<bb 2>:
  return 0;
}
;; Function bar (bar, funcdef_no=1, decl_uid=1744, symbol_order=3)
bar ()
{
<bb 2>:
  return 0;
}
[macbook] f90/bug% cat pr56564-3.c.165t.optimized
;; Function foo (foo, funcdef_no=0, decl_uid=1741, symbol_order=2)
foo ()
{
  struct S * D.1770;
  long int s.0;
  int _2;
  int _3;
<bb 2>:
  _5 = __builtin___emutls_get_address (&__emutls_v.s);
  s.0_1 = (long int) _5;
  _2 = (int) s.0_1;
  _3 = _2 & 15;
  return _3;
}
;; Function bar (bar, funcdef_no=1, decl_uid=1744, symbol_order=3)
bar ()
{
  char * D.1769;
  char[16] * D.1768;
  long int _1;
  int _2;
  int _3;
<bb 2>:
  _5 = __builtin___emutls_get_address (&__emutls_v.t);
  _6 = &*_5[0];
  _1 = (long int) _6;
  _2 = (int) _1;
  _3 = _2 & 15;
  return _3;
}
[Bug target/56564] movdqa on possibly-8-byte-aligned struct with -O3
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56564 --- Comment #15 from Jakub Jelinek ---
Author: jakub
Date: Tue Jun 11 06:03:46 2013
New Revision: 199934

URL: http://gcc.gnu.org/viewcvs?rev=199934&root=gcc&view=rev
Log:
    PR target/56564
    * varasm.c (get_variable_align): Move #endif to the right place.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/varasm.c
[Bug target/56564] movdqa on possibly-8-byte-aligned struct with -O3
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56564

Jakub Jelinek changed:

What     |Removed                   |Added
Assignee |hubicka at gcc dot gnu.org |jakub at gcc dot gnu.org

--- Comment #14 from Jakub Jelinek ---
Author: jakub
Date: Mon Jun 10 15:41:52 2013
New Revision: 199898

URL: http://gcc.gnu.org/viewcvs?rev=199898&root=gcc&view=rev
Log:
    PR target/56564
    * varasm.c (align_variable): Don't use DATA_ALIGNMENT or
    CONSTANT_ALIGNMENT if !decl_binds_to_current_def_p (decl). Use
    DATA_ABI_ALIGNMENT for that case instead if defined.
    (get_variable_align): New function.
    (get_variable_section, emit_bss, emit_common,
    assemble_variable_contents, place_block_symbol): Use
    get_variable_align instead of DECL_ALIGN.
    (assemble_noswitch_variable): Add align argument, use it
    instead of DECL_ALIGN.
    (assemble_variable): Adjust caller. Use get_variable_align
    instead of DECL_ALIGN.
    * config/i386/i386.h (DATA_ALIGNMENT): Adjust x86_data_alignment
    caller.
    (DATA_ABI_ALIGNMENT): Define.
    * config/i386/i386-protos.h (x86_data_alignment): Adjust prototype.
    * config/i386/i386.c (x86_data_alignment): Add opt argument. If
    opt is false, only return the psABI mandated alignment increase.
    * config/c6x/c6x.h (DATA_ALIGNMENT): Renamed to...
    (DATA_ABI_ALIGNMENT): ... this.
    * config/mmix/mmix.h (DATA_ALIGNMENT): Renamed to...
    (DATA_ABI_ALIGNMENT): ... this.
    * config/mmix/mmix.c (mmix_data_alignment): Adjust function comment.
    * config/s390/s390.h (DATA_ALIGNMENT): Renamed to...
    (DATA_ABI_ALIGNMENT): ... this.
    * doc/tm.texi.in (DATA_ABI_ALIGNMENT): Document.
    * doc/tm.texi: Regenerated.

    * gcc.target/i386/pr56564-1.c: New test.
    * gcc.target/i386/pr56564-2.c: New test.
    * gcc.target/i386/pr56564-3.c: New test.
    * gcc.target/i386/pr56564-4.c: New test.
    * gcc.target/i386/avx256-unaligned-load-4.c: Add -fno-common.
    * gcc.target/i386/avx256-unaligned-store-1.c: Likewise.
    * gcc.target/i386/avx256-unaligned-store-3.c: Likewise.
    * gcc.target/i386/avx256-unaligned-store-4.c: Likewise.
    * gcc.target/i386/vect-sizes-1.c: Likewise.
    * gcc.target/i386/memcpy-1.c: Likewise.
    * gcc.dg/vect/costmodel/i386/costmodel-vect-31.c (tmp): Initialize.
    * gcc.dg/vect/costmodel/x86_64/costmodel-vect-31.c (tmp): Likewise.

Added:
    trunk/gcc/testsuite/gcc.target/i386/pr56564-1.c
    trunk/gcc/testsuite/gcc.target/i386/pr56564-2.c
    trunk/gcc/testsuite/gcc.target/i386/pr56564-3.c
    trunk/gcc/testsuite/gcc.target/i386/pr56564-4.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/c6x/c6x.h
    trunk/gcc/config/i386/i386-protos.h
    trunk/gcc/config/i386/i386.c
    trunk/gcc/config/i386/i386.h
    trunk/gcc/config/mmix/mmix.c
    trunk/gcc/config/mmix/mmix.h
    trunk/gcc/config/s390/s390.h
    trunk/gcc/doc/tm.texi
    trunk/gcc/doc/tm.texi.in
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-31.c
    trunk/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-31.c
    trunk/gcc/testsuite/gcc.target/i386/avx256-unaligned-load-4.c
    trunk/gcc/testsuite/gcc.target/i386/avx256-unaligned-store-1.c
    trunk/gcc/testsuite/gcc.target/i386/avx256-unaligned-store-3.c
    trunk/gcc/testsuite/gcc.target/i386/avx256-unaligned-store-4.c
    trunk/gcc/testsuite/gcc.target/i386/memcpy-1.c
    trunk/gcc/testsuite/gcc.target/i386/vect-sizes-1.c
    trunk/gcc/varasm.c
[Bug target/56564] movdqa on possibly-8-byte-aligned struct with -O3
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56564 --- Comment #13 from Jakub Jelinek --- Created attachment 30275 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30275&action=edit gcc49-pr56564.patch Untested fix. Honza, is the array type >= 16 bytes alignment increase the only ABI mandated one and all the rest is just optimization?
[Bug target/56564] movdqa on possibly-8-byte-aligned struct with -O3
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56564 --- Comment #12 from Jakub Jelinek --- Maybe that was the original purpose of DATA_ALIGNMENT, but it certainly serves both purposes right now, which is wrong. We need one macro for the ABI-mandated alignment and one for the optimization alignment beyond that: the optimization alignment can be used if it can be proved that we'll bind to the optimized decl, but the ABI alignment has to be used otherwise. E.g. the x86_64 ABI says that certain arrays are aligned in a particular way, which is certainly something beyond what TYPE_ALIGN provides (changing TYPE_ALIGN of the arrays would affect the layout of structures, and that would be wrong).
[Bug target/56564] movdqa on possibly-8-byte-aligned struct with -O3
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56564

Sandra Loosemore changed:

What |Removed |Added
CC   |        |sandra at codesourcery dot com

--- Comment #11 from Sandra Loosemore ---
This affects at least PowerPC, too, which implements DATA_ALIGNMENT to add additional alignment beyond that specified by the ABI.

Isn't TYPE_ALIGN already supposed to return the ABI-mandated alignment for objects of a given type? The documentation for DATA_ALIGNMENT already suggests that its purpose is to add additional alignment for optimization purposes and I suspect other targets may be using it that way, too. Perhaps what's needed here is more careful monitoring of the places where DATA_ALIGNMENT is being used, rather than splitting it into two macros or adding an argument to control the two uses. Or at least, we'd have to clarify how the requirements for the ABI-conforming use of DATA_ALIGNMENT differ from what TYPE_ALIGN is supposed to do.

It seems to me that DATA_ALIGNMENT's original purpose was to add additional alignment on variable definitions, and IIUC the problem now is either that it is being used in other contexts or that its intended use is not taking into account common, weak, and/or comdat definitions where the linker may substitute a less-aligned definition from another compilation unit.

Also, somebody should check whether vect_can_force_dr_alignment_p in tree-vect-data-refs.c is catching all the cases it needs to for ABI conformance.
[Bug target/56564] movdqa on possibly-8-byte-aligned struct with -O3
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56564

Jan Hubicka changed:

What       |Removed                       |Added
Status     |NEW                           |ASSIGNED
AssignedTo |unassigned at gcc dot gnu.org |hubicka at gcc dot gnu.org

--- Comment #10 from Jan Hubicka 2013-04-08 15:22:21 UTC ---
Mine.
[Bug target/56564] movdqa on possibly-8-byte-aligned struct with -O3
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56564 --- Comment #9 from Jakub Jelinek 2013-03-08 12:38:20 UTC ---
Smaller testcase (-O2 -fpic):

struct S { long a, b; } s;

int
foo (void)
{
  return ((long) &s) & 15;
}

is since http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=162943 optimized into return 0, even when (probably) the psABI doesn't guarantee that. But e.g. for

  __builtin_memset (&s, 0, sizeof (s));

one can see already in 4.0 RTL dumps with -O2 -fpic that MEM_ALIGN of s is assumed to be 128-bit.
[Bug target/56564] movdqa on possibly-8-byte-aligned struct with -O3
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56564 --- Comment #8 from Jakub Jelinek 2013-03-08 11:35:36 UTC --- Guess we'd need to split DATA_ALIGNMENT into two different macros (or one with an extra argument), so that align_variable would know what alignment is part of ABI and what is just an optimization above that, then align_variable could call targetm.binds_local_p to see if DECL_ALIGN can be increased to the optimization level or needs to stay at the ABI guaranteed level. And then when assembling vars, we'd increase the emitted alignment to the optimization level.
[Bug target/56564] movdqa on possibly-8-byte-aligned struct with -O3
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56564

Richard Biener changed:

What     |Removed |Added
Keywords |        |ABI, wrong-code
Target   |        |x86_64-*-*
Status   |WAITING |NEW

--- Comment #7 from Richard Biener 2013-03-08 11:26:19 UTC ---
Confirmed.