Re: [RFC] Vectorization of indexed elements
On Fri, Sep 27, 2013 at 03:50:08PM +0100, Vidya Praveen wrote:
[...]

I can't really insist on the single lane load.. something like:

vc:V4SI[0] = c
vt:V4SI = vec_duplicate:V4SI (vec_select:SI vc:V4SI 0)
va:V4SI = vb:V4SI op vt:V4SI

Or is there any other way to do this?

Can you elaborate on "I can't really insist on the single lane load"? What's the single lane load in your example?

Loading just one lane of the vector like this:

vc:V4SI[0] = c // from the above scalar example

or

vc:V4SI[0] = c[2]

is what I meant by single lane load. In this example:

t = c[2]
...
vb:v4si = b[0:3]
vc:v4si = { t, t, t, t }
va:v4si = vb:v4si op vc:v4si

If we are expanding the CONSTRUCTOR as vec_duplicate at vec_init, I cannot insist that 't' be a vector and that t = c[2] be vect_t[0] = c[2] (which could be seen as vec_select:SI (vect_t 0)).

I'd expect the instruction pattern as quoted to just work (and I hope we expand a uniform constructor { a, a, a, a } properly using vec_duplicate).

As far as I could tell from the code, this is only done using vec_init. It is not expanded as vec_duplicate from, for example, store_constructor() of expr.c.

Do you see any issues if we expand such a constructor as vec_duplicate directly instead of going through the vec_init way?

VP
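For readers less familiar with the RTL notation, the operation under discussion can be sketched in plain C++. This is only an illustration under stated assumptions: `V4SI` is modelled as a `std::array<int, 4>`, addition stands in for the generic "op", and the helper names `dup_lane`, `add` and `add_indexed` are hypothetical, not GCC functions.

```cpp
#include <array>
#include <cstddef>

// Stand-in for a vector of four 32-bit integers (V4SI); an assumption
// made for illustration only.
using V4SI = std::array<int, 4>;

// Broadcast one lane of `v` to every lane, mirroring
//   vec_duplicate:V4SI (vec_select:SI v lane)
V4SI dup_lane(const V4SI& v, std::size_t lane) {
    V4SI out{};
    out.fill(v[lane]);
    return out;
}

// Element-wise addition, standing in for the generic "op".
V4SI add(const V4SI& x, const V4SI& y) {
    V4SI out{};
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = x[i] + y[i];
    }
    return out;
}

// va = vb op { c[lane], c[lane], c[lane], c[lane] }
V4SI add_indexed(const V4SI& vb, const V4SI& vc, std::size_t lane) {
    return add(vb, dup_lane(vc, lane));
}
```

With `vb = {1, 2, 3, 4}`, `vc = {10, 20, 30, 40}` and lane 2, every element of `vb` has 30 added to it. A target with an indexed-element instruction could presumably do the broadcast and the operation in one step, which is the point of the question above.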
Mothballing C11 atomic work for now.
I don't have the time to finish pushing through the C11 atomic work for this release. Much of the remaining work is in the parser, which I know very little about, and I won't be able to do a sufficient job in the time remaining, so I am switching my focus to the interface work and getting the header files refactored before we end stage 1.

I have put all the work to date in a branch 'C11-atomic', which is based off of trunk on Sept 25th. I have detailed the current status, as well as what else needs doing, on the C11 atomic wiki page http://gcc.gnu.org/wiki/Atomic/C11

I have also uploaded the specific patches that have been applied to the branch, along with their revision numbers. They are also on that wiki page.

The patches are very similar to what I posted here http://gcc.gnu.org/ml/gcc-patches/2013-08/msg00420.html

I addressed jsm's basic comments, but did not address the larger issues of warnings/errors, converting lvalues into rvalues, and other front end issues. I also removed the places where I tried to treat the atomic qualifier like it was volatile. I think that was wrong and was masking other issues.

If this work is important to someone else, you are welcome to pick it up. My parser expertise is minimal, and most of the remaining work is in that part of the compiler. The facilities to expand atomic variables into the appropriate sequences are already provided; they just need to be called from the right places in the parser.

I may well get back to this next spring for the next release, but for now I am mothballing it until I have the time to learn the parts I need to learn to finish it.

Andrew
Re: Mothballing C11 atomic work for now.
If C11 atomics are not going into 4.9, then comments made to reject http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58016 no longer hold and I would ask that the resolution of both it and http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53769 be reconsidered.

Jeff

On Fri, Sep 27, 2013 at 2:10 PM, Andrew MacLeod amacl...@redhat.com wrote:
[...]

-- Jeff Hammond jeff.scie...@gmail.com
[Bug c++/58548] New: ICE with local struct in function with auto parameter
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58548

Bug ID: 58548
Summary: ICE with local struct in function with auto parameter
Product: gcc
Version: 4.9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: reichelt at gcc dot gnu.org

The following code snippet triggers an ICE on trunk (4.9.0 20130926) when compiled with -std=gnu++1y:

===
void foo(auto)
{
  struct A { int i; };
}
===

bug.cc: In function 'void foo(auto1)':
bug.cc:3:18: error: data member 'i' cannot be a member template
   struct A { int i; };
                  ^
neu40.cc:6:18: internal compiler error: in poplevel, at cp/decl.c:560
0x554850 poplevel(int, int, int) ../../gcc/gcc/cp/decl.c:560
0x58e568 end_template_decl() ../../gcc/gcc/cp/pt.c:3786
0x62ed7b finish_fully_implicit_template ../../gcc/gcc/cp/parser.c:29040
0x637ad1 cp_parser_member_declaration ../../gcc/gcc/cp/parser.c:20086
0x6381ee cp_parser_member_specification_opt ../../gcc/gcc/cp/parser.c:19630
0x6381ee cp_parser_class_specifier_1 ../../gcc/gcc/cp/parser.c:18885
0x63ab90 cp_parser_class_specifier ../../gcc/gcc/cp/parser.c:19101
0x63ab90 cp_parser_type_specifier ../../gcc/gcc/cp/parser.c:14080
0x6500a9 cp_parser_decl_specifier_seq ../../gcc/gcc/cp/parser.c:11328
0x654139 cp_parser_simple_declaration ../../gcc/gcc/cp/parser.c:10918
0x656140 cp_parser_block_declaration ../../gcc/gcc/cp/parser.c:10867
0x657230 cp_parser_declaration_statement ../../gcc/gcc/cp/parser.c:10514
0x63fad7 cp_parser_statement ../../gcc/gcc/cp/parser.c:9274
0x640dde cp_parser_statement_seq_opt ../../gcc/gcc/cp/parser.c:9552
0x640f26 cp_parser_compound_statement ../../gcc/gcc/cp/parser.c:9506
0x6522db cp_parser_function_body ../../gcc/gcc/cp/parser.c:18318
0x6522db cp_parser_ctor_initializer_opt_and_function_body ../../gcc/gcc/cp/parser.c:18354
0x65331f cp_parser_function_definition_after_declarator ../../gcc/gcc/cp/parser.c:22338
0x654027 cp_parser_function_definition_from_specifiers_and_declarator ../../gcc/gcc/cp/parser.c:22259
0x654027 cp_parser_init_declarator ../../gcc/gcc/cp/parser.c:16347
Please submit a full bug report, [etc.]

Furthermore, IMHO the error message is bogus and the code should be accepted.
[Bug c++/58549] New: [c++1y] ICE with local function in function with auto parameter
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58549

Bug ID: 58549
Summary: [c++1y] ICE with local function in function with auto parameter
Product: gcc
Version: 4.9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: reichelt at gcc dot gnu.org

The following valid code snippet (compiled with -std=gnu++1y) triggers an ICE on trunk (4.9.0 20130926):

===
void foo(auto)
{
  void bar();
}
===

bug.cc: In function 'void foo(auto1)':
bug.cc:4:1: internal compiler error: in finish_function, at cp/decl.c:13852
 }
 ^
0x56b38f finish_function(int) ../../gcc/gcc/cp/decl.c:13852
0x65333d cp_parser_function_definition_after_declarator ../../gcc/gcc/cp/parser.c:22344
0x654027 cp_parser_function_definition_from_specifiers_and_declarator ../../gcc/gcc/cp/parser.c:22259
0x654027 cp_parser_init_declarator ../../gcc/gcc/cp/parser.c:16347
0x6542df cp_parser_simple_declaration ../../gcc/gcc/cp/parser.c:10986
0x656140 cp_parser_block_declaration ../../gcc/gcc/cp/parser.c:10867
0x65f16e cp_parser_declaration ../../gcc/gcc/cp/parser.c:10764
0x65decd cp_parser_declaration_seq_opt ../../gcc/gcc/cp/parser.c:10650
0x65f7b6 cp_parser_translation_unit ../../gcc/gcc/cp/parser.c:3939
0x65f7b6 c_parse_file() ../../gcc/gcc/cp/parser.c:28898
0x772e94 c_common_parse_file() ../../gcc/gcc/c-family/c-opts.c:1046
Please submit a full bug report, [etc.]
[Bug c++/58548] [4.9 Regression] [c++1y] ICE with local struct in function with auto parameter
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58548

Marek Polacek mpolacek at gcc dot gnu.org changed:

               What|Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2013-09-27
                 CC|                            |mpolacek at gcc dot gnu.org
   Target Milestone|---                         |4.9.0
            Summary|[c++1y] ICE with local      |[4.9 Regression] [c++1y]
                   |struct in function with     |ICE with local struct in
                   |auto parameter              |function with auto
                   |                            |parameter
     Ever confirmed|0                           |1

--- Comment #1 from Marek Polacek mpolacek at gcc dot gnu.org ---
Confirmed. I can't comment on whether it's valid code or not though.
[Bug c++/58549] [4.9 Regression] [c++1y] ICE with local function in function with auto parameter
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58549

Marek Polacek mpolacek at gcc dot gnu.org changed:

               What|Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2013-09-27
                 CC|                            |mpolacek at gcc dot gnu.org
            Summary|[c++1y] ICE with local      |[4.9 Regression] [c++1y]
                   |function in function with   |ICE with local function in
                   |auto parameter              |function with auto
                   |                            |parameter
     Ever confirmed|0                           |1

--- Comment #1 from Marek Polacek mpolacek at gcc dot gnu.org ---
Confirmed with trunk, 4.8:

q.C:1:10: error: parameter declared ‘auto’
 void foo(auto)
          ^

Are these auto parameters really valid? What's their purpose?
[Bug c++/58549] [4.9 Regression] [c++1y] ICE with local function in function with auto parameter
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58549

Marek Polacek mpolacek at gcc dot gnu.org changed:

               What|Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |4.9.0
[Bug target/58546] volatile bug and also larger code at -Os
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58546

Uroš Bizjak ubizjak at gmail dot com changed:

               What|Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hubicka at gcc dot gnu.org

--- Comment #4 from Uroš Bizjak ubizjak at gmail dot com ---
(In reply to Andrew Pinski from comment #3)
> This is a target specific issue as the RTL looks fine from expand:

The splitter in question is the one with the comment:

;; Avoid redundant prefixes by splitting HImode arithmetic to SImode.

The splitter does check for aligned_operand operands, which in turn avoids volatiles. However, outside of the operand, data layout is not known to the predicate.

Let's ask Honza about this.
[Bug c++/58550] New: [4.9 Regression] ][c++0x] ICE with auto in function return type and lto
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58550

Bug ID: 58550
Summary: [4.9 Regression] ][c++0x] ICE with auto in function return type and lto
Product: gcc
Version: 4.9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: reichelt at gcc dot gnu.org

The following (probably invalid) code snippet triggers an ICE on trunk (4.9.0 20130926) when compiled with -std=c++0x -flto:

auto foo();

auto fp = foo;

bug.cc:1:10: warning: 'foo' function uses 'auto' type specifier without trailing return type [enabled by default]
 auto foo();
          ^
bug.cc:3:14: internal compiler error: tree code 'template_type_parm' is not supported in LTO streams
 auto fp = foo;
              ^
0xa17696 DFS_write_tree ../../gcc/gcc/lto-streamer-out.c:1244
0xa165c9 DFS_write_tree_body ../../gcc/gcc/lto-streamer-out.c:461
0xa165c9 DFS_write_tree ../../gcc/gcc/lto-streamer-out.c:1152
0xa165c9 DFS_write_tree_body ../../gcc/gcc/lto-streamer-out.c:461
0xa165c9 DFS_write_tree ../../gcc/gcc/lto-streamer-out.c:1152
0xa18907 lto_output_tree(output_block*, tree_node*, bool, bool) ../../gcc/gcc/lto-streamer-out.c:1334
0xa12cfc write_global_stream ../../gcc/gcc/lto-streamer-out.c:2084
0xa1a990 lto_output_decl_state_streams ../../gcc/gcc/lto-streamer-out.c:2128
0xa1a990 produce_asm_for_decls ../../gcc/gcc/lto-streamer-out.c:2413
0xa4e720 ipa_write_summaries_2 ../../gcc/gcc/passes.c:2283
0xa4f799 ipa_write_summaries_1 ../../gcc/gcc/passes.c:2314
0xa4f799 ipa_write_summaries() ../../gcc/gcc/passes.c:2371
0x807c5b ipa_passes ../../gcc/gcc/cgraphunit.c:2019
0x807c5b compile() ../../gcc/gcc/cgraphunit.c:2115
0x807ee9 finalize_compilation_unit() ../../gcc/gcc/cgraphunit.c:2269
0x61b2b0 cp_write_global_declarations() ../../gcc/gcc/cp/decl2.c:4360
Please submit a full bug report, [etc.]

In GCC 4.8.1 the code was rejected:

bug.cc:1:10: warning: 'foo' function uses 'auto' type specifier without trailing return type [enabled by default]
 auto foo();
          ^
bug.cc:3:11: error: use of 'auto foo()' before deduction of 'auto'
 auto fp = foo;
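For contrast with the rejected snippet, here is a small sketch of two forms that, to the best of my understanding, are well-formed: a C++14 function whose body is visible before its address is taken, and a C++11 declaration with a trailing return type. The function names are made up for illustration.

```cpp
// C++14 return type deduction: the body is visible here, so 'auto'
// is deduced to int before the function is used below.
auto answer() { return 42; }

// Deduction has already happened, so taking the address is fine.
auto fp = answer;  // int (*)()

// C++11 alternative: with a trailing return type nothing needs to be
// deduced, so a declaration alone is enough before the use.
auto declared_only() -> int;
auto fq = declared_only;  // int (*)()

auto declared_only() -> int { return 7; }

int call_both() { return fp() + fq(); }
```

The reported testcase differs in that `foo` is only declared, with neither a body nor a trailing return type, so there is nothing to deduce `auto` from at the point of use.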
[Bug c++/58549] [4.9 Regression] [c++1y] ICE with local function in function with auto parameter
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58549

--- Comment #2 from Volker Reichelt reichelt at gcc dot gnu.org ---
To me they look like a (syntactically simpler) alternative to template parameters. They were introduced here:

2013-09-16 Adam Butcher a...@jessamine.co.uk

	* cp-tree.h (type_uses_auto_or_concept): Declare.
	(is_auto_or_concept): Declare.
	* decl.c (grokdeclarator): Allow 'auto' parameters in lambdas with -std=gnu++1y or -std=c++1y or, as a GNU extension, in plain functions.
	[...]
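To make the "alternative to template parameters" reading concrete, the sketch below writes the same ideas with an explicit template and with a C++14 generic lambda. It also repeats the two reported testcases (local struct, local function declaration) in explicit-template form, which is long-standing valid C++. All names are illustrative, and this says nothing about how the GCC patch implements 'auto' parameters internally.

```cpp
#include <string>

// Explicit template: the long-hand form that an 'auto' parameter
// appears to abbreviate.
template <typename T>
T twice_template(T x) { return x + x; }

// Generic lambda (C++14): its call operator is a template, so 'auto'
// here plays the role of T above.
const auto twice_lambda = [](auto x) { return x + x; };

// The constructs from PR 58548 and PR 58549, written with an explicit
// template parameter instead of 'auto'.
template <typename T>
int with_local_entities(T) {
    struct A { int i; };  // local struct inside a function template
    void bar();           // block-scope function declaration, never called
    A a{7};
    return a.i;
}
```

If the explicit-template form is accepted, it seems reasonable to expect the 'auto' spelling of the same functions to be accepted too, which is what the reporter argues.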
[Bug c++/58550] [4.9 Regression] ][c++0x] ICE with auto in function return type and lto
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58550

Marek Polacek mpolacek at gcc dot gnu.org changed:

               What|Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2013-09-27
                 CC|                            |mpolacek at gcc dot gnu.org
   Target Milestone|---                         |4.9.0
     Ever confirmed|0                           |1

--- Comment #1 from Marek Polacek mpolacek at gcc dot gnu.org ---
Confirmed with trunk. Interestingly, with -std=gnu++1y we get:

w.C:3:11: error: use of ‘auto foo()’ before deduction of ‘auto’
 auto fp = foo;
           ^
[Bug c++/58549] [4.9 Regression] [c++1y] ICE with local function in function with auto parameter
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58549

Marek Polacek mpolacek at gcc dot gnu.org changed:

               What|Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |abutcher at gcc dot gnu.org

--- Comment #3 from Marek Polacek mpolacek at gcc dot gnu.org ---
Started with r202850.
[Bug middle-end/58547] [4.9 Regression] rtlanal.c:5482:19: error: comparison between signed and unsigned integer expressions [-Werror=sign-compare]
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58547

Richard Biener rguenth at gcc dot gnu.org changed:

               What|Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |4.9.0
[Bug other/58545] [4.7/4.8/4.9 Regression] error: unable to find a register to spill in class 'POINTER_REGS'
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58545

Richard Biener rguenth at gcc dot gnu.org changed:

               What|Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |4.7.4
            Summary|[4.7/4.8 Regression] error: |[4.7/4.8/4.9 Regression]
                   |unable to find a register   |error: unable to find a
                   |to spill in class           |register to spill in class
                   |'POINTER_REGS'              |'POINTER_REGS'

--- Comment #2 from Richard Biener rguenth at gcc dot gnu.org ---
Assuming 4.9 doesn't work either.
[Bug tree-optimization/58459] [4.9 regression] Loop invariant is not hoisted out of loop after r202525.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58459

--- Comment #7 from Richard Biener rguenth at gcc dot gnu.org ---
Author: rguenth
Date: Fri Sep 27 08:14:53 2013
New Revision: 202966

URL: http://gcc.gnu.org/viewcvs?rev=202966&root=gcc&view=rev
Log:
2013-09-27 Richard Biener rguent...@suse.de

	PR tree-optimization/58459
	* tree-ssa-forwprop.c (forward_propagate_addr_expr): Remove restriction not propagating into loops.

	* gcc.dg/tree-ssa/ssa-pre-31.c: New testcase.

Added:
    trunk/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-31.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-ssa-forwprop.c
[Bug c++/58550] [4.9 Regression] ][c++0x] ICE with auto in function return type and lto
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58550

Marek Polacek mpolacek at gcc dot gnu.org changed:

               What|Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jason at gcc dot gnu.org

--- Comment #2 from Marek Polacek mpolacek at gcc dot gnu.org ---
This one seems to start with r198099 -- but it might be some other latent issue...
[Bug middle-end/58547] [4.9 Regression] rtlanal.c:5482:19: error: comparison between signed and unsigned integer expressions [-Werror=sign-compare]
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58547

Eric Botcazou ebotcazou at gcc dot gnu.org changed:

               What|Removed                     |Added
----------------------------------------------------------------------------
             Target|hppa-unknown-linux-gnu      |
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2013-09-27
                 CC|                            |ebotcazou at gcc dot gnu.org
               Host|hppa-unknown-linux-gnu      |
     Ever confirmed|0                           |1
              Build|hppa-unknown-linux-gnu      |
           Severity|normal                      |major

--- Comment #1 from Eric Botcazou ebotcazou at gcc dot gnu.org ---
Confirmed on PowerPC.
[Bug middle-end/58551] New: [4.9 Regression] ICE with abort in OpenMP SESE region inside of some loop
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58551

Bug ID: 58551
Summary: [4.9 Regression] ICE with abort in OpenMP SESE region inside of some loop
Product: gcc
Version: 4.9.0
Status: UNCONFIRMED
Keywords: ice-on-valid-code
Severity: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: jakub at gcc dot gnu.org

/* { dg-do compile } */
/* { dg-options "-O0 -fopenmp" } */

void
foo (int *a)
{
  int i;
  for (i = 0; i < 8; i++)
    #pragma omp task
    if (a[i])
      __builtin_abort ();
}

ICEs in 4.9, because the __builtin_abort () bb has a bogus loop_father after the SESE region is outlined.
[Bug middle-end/58547] [4.9 Regression] rtlanal.c:5482:19: error: comparison between signed and unsigned integer expressions [-Werror=sign-compare]
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58547

--- Comment #2 from Iain Sandoe iains at gcc dot gnu.org ---
Author: iains
Date: Fri Sep 27 08:59:18 2013
New Revision: 202967

URL: http://gcc.gnu.org/viewcvs?rev=202967&root=gcc&view=rev
Log:
gcc:

	PR middle-end/58547
	* rtlanal.c (lsb_bitfield_op_p): Make both parts of the comparison signed.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/rtlanal.c
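As background on why -Wsign-compare flags such code at all, the sketch below shows how a mixed signed/unsigned comparison can give a surprising answer. It is a generic illustration with hypothetical function names, not the code in rtlanal.c; the conversion that the language would apply implicitly is spelled out with a cast so the example itself compiles without the warning.

```cpp
// What a mixed comparison 'pos < width' effectively does: the int
// operand is converted to unsigned first, so -1 becomes a very large
// value and the comparison is false.
bool less_as_unsigned(int pos, unsigned width) {
    return static_cast<unsigned>(pos) < width;
}

// Making both sides signed, in the spirit of the fix described in the
// log above: -1 now compares below any positive width.
// Assumes 'width' fits in an int.
bool less_as_signed(int pos, unsigned width) {
    return pos < static_cast<int>(width);
}
```

For non-negative positions both functions agree; they differ only when the signed operand is negative, which is exactly the case the warning is there to highlight.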
[Bug middle-end/58551] [4.9 Regression] ICE with abort in OpenMP SESE region inside of some loop
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58551

--- Comment #1 from Jakub Jelinek jakub at gcc dot gnu.org ---
Another testcase that ICEs even with -O2 -fopenmp:

/* { dg-do compile } */
/* { dg-options "-O2 -fopenmp" } */

void bar (int, int);

void
foo (int *a)
{
  int i;
  for (i = 0; i < 8; i++)
    #pragma omp task
    if (a[i])
      {
        int j, k;
        for (j = 0; j < 10; j++)
          for (k = 0; k < 8; k++)
            bar (j, k);
        for (k = 0; k < 12; k++)
          bar (-1, k);
        __builtin_abort ();
      }
}
[Bug middle-end/58551] [4.9 Regression] ICE with abort in OpenMP SESE region inside of some loop
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58551

Jakub Jelinek jakub at gcc dot gnu.org changed:

               What|Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |ASSIGNED
   Last reconfirmed|                            |2013-09-27
           Assignee|unassigned at gcc dot gnu.org|jakub at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #2 from Jakub Jelinek jakub at gcc dot gnu.org ---
Created attachment 30907
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30907&action=edit
gcc49-pr58551.patch

Untested fix.
[Bug tree-optimization/58459] [4.9 regression] Loop invariant is not hoisted out of loop after r202525.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58459

Richard Biener rguenth at gcc dot gnu.org changed:

               What|Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #8 from Richard Biener rguenth at gcc dot gnu.org ---
Fixed.
[Bug sanitizer/58543] Invalid unpoisoning of stack redzones on ARM
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58543

--- Comment #3 from Yury Gribov y.gribov at samsung dot com ---
Created attachment 30908
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30908&action=edit
Test results

Tests seem to pass both on x86_64 and on ARM (attached).
[Bug tree-optimization/58532] [4.9 Regression] bootstrap failure with BOOT_CFLAGS=-g -O3
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58532

Richard Biener rguenth at gcc dot gnu.org changed:

               What|Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |ASSIGNED

--- Comment #4 from Richard Biener rguenth at gcc dot gnu.org ---
Ok, I reproduced it.

Bootstrap comparison failure!
gcc/dwarf2out.o differs
gcc/fortran/parse.o differs
libiberty/regex.o differs
libiberty/pic/regex.o differs

Somehow GCC has miscompiled itself.
[Bug middle-end/58551] [4.9 Regression] ICE with abort in OpenMP SESE region inside of some loop
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58551

Jakub Jelinek jakub at gcc dot gnu.org changed:

               What|Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |4.9.0
[Bug middle-end/58551] [4.9 Regression] ICE with abort in OpenMP SESE region inside of some loop
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58551

Jakub Jelinek jakub at gcc dot gnu.org changed:

               What|Removed                     |Added
----------------------------------------------------------------------------
  Attachment #30907|0                           |1
        is obsolete|                            |

--- Comment #3 from Jakub Jelinek jakub at gcc dot gnu.org ---
Created attachment 30909
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30909&action=edit
gcc49-pr58551.patch

Updated untested patch that should also fix num_nodes adjustments.
[Bug tree-optimization/58532] [4.9 Regression] bootstrap failure with BOOT_CFLAGS=-g -O3
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58532 --- Comment #5 from Richard Biener rguenth at gcc dot gnu.org --- There is a compare-debug failure on fortran/parse.o at least, reducing that.
[Bug target/58507] Incorrect parsing of `-mmcu=msp430*`
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58507

--- Comment #1 from Nick Clifton nickc at redhat dot com ---
Created attachment 30910
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30910&action=edit
Fix objdump output

Proposed patch to fix objdump output
[Bug tree-optimization/58532] [4.9 Regression] bootstrap failure with BOOT_CFLAGS=-g -O3
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58532

--- Comment #6 from Richard Biener rguenth at gcc dot gnu.org ---
One difference happens in 057.cunrolli already, we create a preheader for a loop depending on -g:

 ;; Function bool gfc_parse_file() (_Z14gfc_parse_filev, funcdef_no=257, decl_uid=17369, symbol_order=156)

 Created preheader block for loop 4
-Created preheader block for loop 5
 ;; 10 loops found
...
@@ -18009,13 +19254,14 @@
   <bb 109>:
   st_228 = parse_spec (131);
-  # st_470 = PHI <st_228(109)>
-  # error_471 = PHI <0(109)>
+  # st_229 = PHI <st_228(109), st_232(124)>
+  # error_233 = PHI <0(109), 1(124)>
 loop:
-
-  <bb 205>:
-  # st_229 = PHI <st_470(110), st_232(207)>
-  # error_233 = PHI <error_471(110), 1(207)>
+  # DEBUG st = NULL
+  # DEBUG error = NULL
+  # DEBUG st = NULL
+  # DEBUG error = error_233
+  # DEBUG st = st_229
   st.280_230 = (int) st_229;
   switch (st.280_230) <default: <L61>, case 12: <L59>, case 33: <L60>, case 131: <L58>>
...
   <bb 124>:
   st_232 = next_statement ();
-
-  <bb 207>:
-  goto <bb 205>;
+  goto <bb 110> (loop);

(to investigate).
[Bug tree-optimization/58552] New: [4.9 Regression] -fcompare-debug failure
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58552

Bug ID: 58552
Summary: [4.9 Regression] -fcompare-debug failure
Product: gcc
Version: 4.9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: rguenth at gcc dot gnu.org

Created attachment 30911
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30911&action=edit
testcase

spuriously reduced from a -O3 bootstrap miscompare. Fails at -O2.
[Bug tree-optimization/58552] [4.9 Regression] -fcompare-debug failure
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58552

Richard Biener rguenth at gcc dot gnu.org changed:

               What|Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |4.9.0
[Bug tree-optimization/58532] [4.9 Regression] bootstrap failure with BOOT_CFLAGS=-g -O3
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58532 --- Comment #7 from Richard Biener rguenth at gcc dot gnu.org --- First testcase reduction ended in PR58552, re-reducing.
[Bug lto/58528] lto1: internal compiler error: in build_abbrev_table, at dwarf2out.c:7478
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58528

--- Comment #6 from Charles charles.frasch at gmail dot com ---
Created attachment 30912
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30912&action=edit
script to reproduce the ICE

This script reproduces the bug. It requires 27 .ii files and one archive file of Google's gtest 1.6.0. If this is acceptable, I will either attach the .ii files or send you a tarball directly.
[Bug c/53001] -Wfloat-conversion should be available to warn about floating point errors
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53001

--- Comment #19 from Joshua Cogliati jjcogliati-r1 at yahoo dot com ---
Created attachment 30913
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30913&action=edit
Patch to add -Wfloat-conversion option against trunk

This version is against gcc trunk (rev 202818). It now bootstraps. It adds about ten casts to make the existing implicit float conversions in gcc explicit, so that gcc can bootstrap even with the new warning enabled.
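The kind of change described (turning an implicit floating-point conversion into an explicit cast) can be sketched as follows. This is a generic illustration with made-up function names, not code taken from the patch, and it assumes the warning targets implicit conversions that may lose precision, as the bug title suggests.

```cpp
// Implicit narrowing from double to float: the sort of conversion a
// float-conversion warning would presumably point at.
float narrow_implicit(double d) {
    float f = d;
    return f;
}

// The same conversion written explicitly; the behaviour is identical,
// but the loss of precision is now visible in the source.
float narrow_explicit(double d) {
    return static_cast<float>(d);
}

// Explicit floating-point to integer truncation (rounds toward zero).
int truncate_explicit(double d) {
    return static_cast<int>(d);
}
```

Both narrowing functions return the same value for any input; the cast only documents intent, which is why adding such casts lets existing code build cleanly with the warning enabled.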
[Bug tree-optimization/58552] [4.9 Regression] -fcompare-debug failure
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58552

Richard Biener rguenth at gcc dot gnu.org changed:

               What|Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2013-09-27
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener rguenth at gcc dot gnu.org ---
starts already with early inlining.
[Bug tree-optimization/58552] [4.9 Regression] -fcompare-debug failure
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58552

--- Comment #2 from Richard Biener rguenth at gcc dot gnu.org ---
Reduced:

extern void fancy_abort () __attribute__ ((__noreturn__));
extern "C" {
    struct __jmp_buf_tag { };
    typedef struct __jmp_buf_tag jmp_buf[1];
    extern int _setjmp (struct __jmp_buf_tag __env[1]) throw ();
}
extern void *gfc_state_stack;
static jmp_buf eof_buf;
static void push_state ()
{
  if (!gfc_state_stack)
    fancy_abort ();
}
bool gfc_parse_file (void)
{
  int seen_program=0;
  if (_setjmp (eof_buf))
    return false;
  if (seen_program)
    goto duplicate_main;
  seen_program = 1;
  push_state ();
  push_state ();
duplicate_main:
  return true;
}
[Bug tree-optimization/58552] [4.9 Regression] -fcompare-debug failure
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58552

--- Comment #3 from Richard Biener rguenth at gcc dot gnu.org ---
Index: gcc/tree-cfg.c
===
--- gcc/tree-cfg.c (revision 202971)
+++ gcc/tree-cfg.c (working copy)
@@ -1013,6 +1013,9 @@ make_abnormal_goto_edges (basic_block bb
           break;
         }
     }
+  if (!gsi_end_p (gsi)
+      && is_gimple_debug (gsi_stmt (gsi)))
+    gsi_next_nondebug (&gsi);
   if (!gsi_end_p (gsi))
     {
       /* Make an edge to every setjmp-like call. */

fixes it.
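The idea behind the patch, as I read it, is that a decision about the first "real" statement of a block must not depend on whether debug statements precede it, otherwise -g changes the generated code. A minimal model of that idea follows; the `Stmt` type and both helpers are hypothetical stand-ins, not GCC's gimple iterator API.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, heavily simplified statement record.
struct Stmt {
    bool is_debug;        // true for debug-only statements (present with -g)
    bool is_setjmp_like;  // true for calls that can return twice
};

// Index of the first non-debug statement, or bb.size() if there is none.
std::size_t first_nondebug(const std::vector<Stmt>& bb) {
    std::size_t i = 0;
    while (i < bb.size() && bb[i].is_debug) {
        ++i;
    }
    return i;
}

// Does the block start with a setjmp-like call, ignoring debug
// statements? The answer is the same with and without debug info.
bool starts_with_setjmp_like(const std::vector<Stmt>& bb) {
    const std::size_t i = first_nondebug(bb);
    return i < bb.size() && bb[i].is_setjmp_like;
}
```

A block consisting of a debug statement followed by a setjmp-like call is classified the same way as the block without the debug statement, which is the property -fcompare-debug checks for.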
[Bug middle-end/58551] [4.9 Regression] ICE with abort in OpenMP SESE region inside of some loop
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58551

--- Comment #4 from Jakub Jelinek jakub at gcc dot gnu.org ---
Author: jakub
Date: Fri Sep 27 13:44:10 2013
New Revision: 202972

URL: http://gcc.gnu.org/viewcvs?rev=202972&root=gcc&view=rev
Log:
	PR middle-end/58551
	* tree-cfg.c (move_sese_region_to_fn): Also move loops that are children of outermost saved_cfun's loop, and set it up to be moved to dest_cfun's outermost loop. Fix up num_nodes adjustments if loop != loop0 and SESE region contains bbs that belong to loop0.

	* c-c++-common/gomp/pr58551.c: New test.

Added:
    trunk/gcc/testsuite/c-c++-common/gomp/pr58551.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-cfg.c
[Bug libstdc++/57465] Failed postcondition for std::function constructed with null function pointer
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57465

--- Comment #1 from Jonathan Wakely redi at gcc dot gnu.org ---
Author: redi
Date: Fri Sep 27 14:06:09 2013
New Revision: 202974

URL: http://gcc.gnu.org/viewcvs?rev=202974&root=gcc&view=rev
Log:
	PR libstdc++/57465
	* include/std/functional (_Function_base::_Base_manager::_M_not_empty_function): Fix overload for pointers.
	* testsuite/20_util/function/cons/57465.cc: New.

Added:
    trunk/libstdc++-v3/testsuite/20_util/function/cons/57465.cc
Modified:
    trunk/libstdc++-v3/ChangeLog
    trunk/libstdc++-v3/include/std/functional
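The postcondition named in the bug title can be checked with a few lines of standard C++: a `std::function` constructed from a null function pointer should be empty. The sketch below is an independent illustration and is not the contents of the new 57465.cc test; the helper names are made up.

```cpp
#include <functional>

using Callback = void (*)();

void noop() {}

// Constructing from a null function pointer should leave the
// std::function empty, i.e. '!f' is true.
bool empty_after_null_init() {
    Callback null_fp = nullptr;
    std::function<void()> f(null_fp);
    return !f;
}

// Constructing from a real function gives a non-empty std::function.
bool callable_after_real_init() {
    std::function<void()> f(&noop);
    return static_cast<bool>(f);
}
```

On a fixed library both helpers return true; calling an empty `std::function` would throw `std::bad_function_call`, which is why the emptiness check matters to callers.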
[Bug libstdc++/57465] Failed postcondition for std::function constructed with null function pointer
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57465 --- Comment #2 from Jonathan Wakely redi at gcc dot gnu.org --- Fixed on the trunk so far.
[Bug libfortran/58015] FAIL: gfortran.dg/round_4.f90: Unsatisfied symbol nextafterl
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58015

--- Comment #4 from dave.anglin at bell dot net ---
On 9/21/2013 11:13 AM, dominiq at lps dot ens.fr wrote:
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58015
>
> --- Comment #2 from Dominique d'Humieres dominiq at lps dot ens.fr ---
> Is this PR different from pr58113 beside the missing nextafterl on
> hppa64-hp-hpux11.11?

I hacked c99_functions.c to provide nextafterl using nextafterq from libquadmath. With this, I see the bug in pr58113.

Regarding nextafterl, I'm thinking about an include hack to math.h for hppa*-*-hpux11*. On all HP-UX systems, the l and q long double and quad math functions are equivalent.

Dave
[Bug tree-optimization/58359] __builtin_unreachable prevents vectorization
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58359

--- Comment #4 from Anatoly Sinyavin a.sinyavin at samsung dot com ---
Created attachment 30914
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30914&action=edit
First patch
[Bug tree-optimization/58359] __builtin_unreachable prevents vectorization
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58359

--- Comment #5 from Anatoly Sinyavin a.sinyavin at samsung dot com ---
Created attachment 30915
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30915&action=edit
Second patch
[Bug tree-optimization/58359] __builtin_unreachable prevents vectorization
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58359

--- Comment #6 from Anatoly Sinyavin a.sinyavin at samsung dot com ---
I have created two patches to fix this problem.

The first patch (bug_fix_58359_builit_unreachable.patch) just moves the functionality of optimize_unreachable from the fab pass to the cfg pass.

The second patch (bug_fix_58359_builit_unreachable.AGGRESSIVE.patch) is a more aggressive variant. The original implementation of optimize_unreachable doesn't delete the basic block if there is a FORCED_LABEL, a non-debug statement, or a function call before __builtin_unreachable in that basic block.

I think we can't delete the basic block if it contains some statement X before __builtin_unreachable, where statement X can potentially transfer control out of this basic block and never return. That is possible in two cases: statement X is a procedure call (one that does not return) or an assembler instruction. (See also the description of __builtin_unreachable.)
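The "statement X" case can be illustrated with a call that never returns placed in front of `__builtin_unreachable()`. In the sketch below (hypothetical names; `__builtin_unreachable` is a GCC/Clang builtin) the block must be kept, because the call in it has an observable effect even though the point after it is unreachable.

```cpp
#include <stdexcept>

// Statement X from the discussion above: a call that transfers control
// out of the block and never returns to it.
[[noreturn]] void fail(const char* msg) {
    throw std::runtime_error(msg);
}

int checked_index(int i, int n) {
    if (i < 0 || i >= n) {
        fail("index out of range");  // leaves the block, never returns
        __builtin_unreachable();     // only reachable if fail() returned
    }
    return i;
}
```

Deleting the whole block here would silently drop the call to `fail`, so an out-of-range index would no longer be reported. A block that contains nothing but `__builtin_unreachable()` has no such effect and is the case that is safe to remove.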
[Bug middle-end/58463] ICE with -fdump-tree-all-all in vector indexed access
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58463

--- Comment #8 from pmatos at gcc dot gnu.org ---
Author: pmatos
Date: Fri Sep 27 14:54:43 2013
New Revision: 202976

URL: http://gcc.gnu.org/viewcvs?rev=202976&root=gcc&view=rev
Log:
	PR middle-end/58463
	* gcc.dg/pr58463.c: New test.

Added:
    trunk/gcc/testsuite/gcc.dg/pr58463.c
Modified:
    trunk/gcc/ChangeLog
[Bug target/58507] Incorrect parsing of `-mmcu=msp430*`
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58507

--- Comment #2 from Nick Clifton nickc at redhat dot com ---
Created attachment 30916
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30916&action=edit
Add parsing of known MSP430 MCU types

I am currently testing this patch to see if it introduces any regressions into the gcc testsuite...
[Bug tree-optimization/58463] ICE with -fdump-tree-all-all in vector indexed access
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58463

--- Comment #9 from pmatos at gcc dot gnu.org ---
Author: pmatos
Date: Fri Sep 27 16:30:15 2013
New Revision: 202978

URL: http://gcc.gnu.org/viewcvs?rev=202978&root=gcc&view=rev
Log:
Backport from mainline.

2013-09-27 Paulo Matos pma...@broadcom.com

	PR middle-end/58463
	* gcc.dg/pr58463.c: New test.

Added:
    branches/gcc-4_8-branch/gcc/testsuite/gcc.dg/pr58463.c
Modified:
    branches/gcc-4_8-branch/gcc/ChangeLog
[Bug target/56716] during gcc 4.8.0 build on Cygwin: bid128_fma.c:4460:1: internal compiler error: Segmentation fault
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56716 --- Comment #11 from pmatos at gcc dot gnu.org --- Author: pmatos Date: Fri Sep 27 16:44:39 2013 New Revision: 202979 URL: http://gcc.gnu.org/viewcvs?rev=202979root=gccview=rev Log: Backport from mainline. PR middle-end/58463 2013-03-27 Richard Biener rguent...@suse.de PR tree-optimization/56716 * tree-ssa-structalias.c (perform_var_substitution): Adjust dumping for ref nodes. Modified: branches/gcc-4_8-branch/gcc/ChangeLog branches/gcc-4_8-branch/gcc/tree-ssa-structalias.c
[Bug middle-end/58463] ICE with -fdump-tree-all-all in vector indexed access
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58463 --- Comment #10 from pmatos at gcc dot gnu.org --- Author: pmatos Date: Fri Sep 27 16:44:39 2013 New Revision: 202979 URL: http://gcc.gnu.org/viewcvs?rev=202979root=gccview=rev Log: Backport from mainline. PR middle-end/58463 2013-03-27 Richard Biener rguent...@suse.de PR tree-optimization/56716 * tree-ssa-structalias.c (perform_var_substitution): Adjust dumping for ref nodes. Modified: branches/gcc-4_8-branch/gcc/ChangeLog branches/gcc-4_8-branch/gcc/tree-ssa-structalias.c
[Bug tree-optimization/58553] New: New fail in PASS-FAIL: gcc.c-torture/execute/memcpy-2.c execution on arm and aarch64
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58553 Bug ID: 58553 Summary: New fail in PASS-FAIL: gcc.c-torture/execute/memcpy-2.c execution on arm and aarch64 Product: gcc Version: 4.9.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: jgreenhalgh at gcc dot gnu.org Created attachment 30917 -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=30917action=edit Preprocessed source Jeff's change to the Jump-Threading code here: http://gcc.gnu.org/ml/gcc-patches/2013-09/msg01910.html Introduced a regression for arm and aarch64 in gcc.c-torture/execute/memcpy-2.c, such that I now see: *** EXIT code emu: host signal 0 When executing the testcase on a model with command line: /work/gcc-clean/build-arm-none-eabi/install/bin/arm-none-eabi-gcc -B/work/gcc-clean/build-arm-none-eabi/obj/gcc2/gcc/ /work/gcc-clean/src/gcc/gcc/testsuite/gcc.c-torture/execute/memcpy-2.c -fno-diagnostics-show-caret -fdiagnostics-color=never -w -O3 -g -Wa,-mno-warn-deprecated -lm -marm -march=armv7-a -mfpu=vfpv3-d16 -mfloat-abi=softfp -o /work/gcc-clean/build-arm-none-eabi/obj/gcc2/gcc/testsuite/gcc/memcpy-2.x -save-temps I've attached the preprocessed source and the output from -fdump-tree-dom1-details
[Bug tree-optimization/58553] New fail in PASS-FAIL: gcc.c-torture/execute/memcpy-2.c execution on arm and aarch64
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58553 --- Comment #1 from jgreenhalgh at gcc dot gnu.org --- Created attachment 30918 -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=30918action=edit Output of dom1
[Bug middle-end/58463] ICE with -fdump-tree-all-all in vector indexed access
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58463 Paulo J. Matos pa...@matos-sorge.com changed: What|Removed |Added Status|WAITING |RESOLVED Resolution|--- |FIXED --- Comment #11 from Paulo J. Matos pa...@matos-sorge.com --- Backported Richard's patch to branch 4.8 under r202979. Will mark as fixed.
[Bug tree-optimization/58554] New: Revision 202619 causes runtime failure in CPU2006 benchmark 445.gobmk
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58554 Bug ID: 58554 Summary: Revision 202619 causes runtime failure in CPU2006 benchmark 445.gobmk Product: gcc Version: 4.9.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: pthaugen at gcc dot gnu.org CC: bergner at gcc dot gnu.org, dje.gcc at gmail dot com, rguenth at gcc dot gnu.org Host: powerpc64-linux Target: powerpc64-linux Build: powerpc64-linux gobmk started failing at runtime with the stated revision. Tracked down the offending code (from benchmark source engine/board.c) and reduced it to the following. The generated code ignores the control dependence and simply calls memset to set the entire array. [pthaugen@igoo build_base_test_32.]$ cat junk.c extern int board_size; extern unsigned char board[421]; void clear_board(void) { int k; for (k = 0; k < 421; k++) { /* Original: if (!((unsigned) (((k) / (19 + 1) - 1)) < (unsigned) board_size && (unsigned) (((k) % (19 + 1) - 1)) < (unsigned) board_size)) */ if (k board_size ) board[k] = 3; } } [pthaugen@igoo build_base_test_32.]$ /home/pthaugen/install/gcc/trunk_work/bin/gcc -S -m32 -O3 junk.c Generated assembler for the function: clear_board: lis 3,board@ha li 4,3 la 3,board@l(3) li 5,421 b memset
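As a rough illustration of why the transformation is wrong, the sketch below contrasts a guarded store loop with the unconditional memset the compiler emitted. It is not the reporter's test case: the comparison operator in the reduced source did not survive in the archive, so the `<` used here is an assumption, and clear_guarded, clear_unconditional and count_threes are hypothetical names.

```c
#include <string.h>

enum { BOARD_MAX = 421 };

/* Guarded store: only indices that pass the test are written.
   The `<` comparison is an assumption made for this sketch. */
static void clear_guarded(unsigned char *board, int limit)
{
    for (int k = 0; k < BOARD_MAX; k++)
        if (k < limit)
            board[k] = 3;
}

/* What the miscompiled code effectively does: write every element. */
static void clear_unconditional(unsigned char *board)
{
    memset(board, 3, BOARD_MAX);
}

/* Count how many elements were set to 3. */
static int count_threes(const unsigned char *board)
{
    int n = 0;
    for (int k = 0; k < BOARD_MAX; k++)
        n += (board[k] == 3);
    return n;
}
```

The two functions agree only when the guard is true for every index, which is exactly the control dependence the generated code dropped.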
[Bug tree-optimization/58553] New fail in PASS-FAIL: gcc.c-torture/execute/memcpy-2.c execution on arm and aarch64
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58553 --- Comment #2 from Jeffrey A. Law law at redhat dot com --- James. Look in the .ldist dump. In particular look at that memset call. We're writing off the end of the structure. Now to walk backwards and figure out why :-)
[Bug middle-end/58551] [4.9 Regression] ICE with abort in OpenMP SESE region inside of some loop
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58551 Jakub Jelinek jakub at gcc dot gnu.org changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #5 from Jakub Jelinek jakub at gcc dot gnu.org --- Fixed.
[Bug tree-optimization/58554] [4.9 Regression] Revision 202619 causes runtime failure in CPU2006 benchmark 445.gobmk
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58554 Andrew Pinski pinskia at gcc dot gnu.org changed: What|Removed |Added Keywords||wrong-code Target Milestone|--- |4.9.0 Summary|Revision 202619 causes runtime failure in CPU2006 benchmark 445.gobmk |[4.9 Regression] Revision 202619 causes runtime failure in CPU2006 benchmark 445.gobmk Severity|normal |blocker
[Bug tree-optimization/58553] New fail in PASS-FAIL: gcc.c-torture/execute/memcpy-2.c execution on arm and aarch64
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58553 Andrew Pinski pinskia at gcc dot gnu.org changed: What|Removed |Added Depends on||58554 --- Comment #3 from Andrew Pinski pinskia at gcc dot gnu.org --- This sounds like bug 58554.
[Bug c++/58555] New: Floating point exception in want_inline_self_recursive_call_p
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58555 Bug ID: 58555 Summary: Floating point exception in want_inline_self_recursive_call_p Product: gcc Version: 4.9.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: dcb314 at hotmail dot com Created attachment 30919 -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=30919action=edit gzipped C++ source code I just tried to compile package flamerobin-0.9.3-4.20130401snap with gcc 4.9 trunk dated 20130925. It said ./src/metadata/root.cpp:375:1: internal compiler error: Floating point exception } ^ 0xafbfff crash_signal ../../src/trunk/gcc/toplev.c:335 0x50ed95 want_inline_self_recursive_call_p ../../src/trunk/gcc/ipa-inline.c:699 0xf72320 inline_small_functions ../../src/trunk/gcc/ipa-inline.c:1756 0xf72320 ipa_inline ../../src/trunk/gcc/ipa-inline.c:2009 0xf72320 execute ../../src/trunk/gcc/ipa-inline.c:2379 Please submit a full bug report, with preprocessed source if appropriate. Preprocessed source code attached. Flag -O3 required. Checking the compiler source code, the offending line is if (!max_count && (edge->frequency * CGRAPH_FREQ_BASE / caller_freq >= max_prob)) I speculate that caller_freq == 0 and that someone has left out a belt'n'braces check for zero before doing the division.
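The kind of belt'n'braces check being suggested might look like the sketch below. This is not the GCC code or a proposed fix: exceeds_max_prob is a hypothetical function, FREQ_BASE is a stand-in for CGRAPH_FREQ_BASE with an assumed value, and returning false for a zero frequency is only one possible policy.

```c
#include <stdbool.h>

/* Stand-in for CGRAPH_FREQ_BASE; the value is an assumption for this sketch. */
enum { FREQ_BASE = 1000 };

/* Hypothetical helper: does the scaled edge frequency reach max_prob? */
static bool exceeds_max_prob(int edge_freq, int caller_freq, int max_prob)
{
    /* Guard first: an integer division by zero raises SIGFPE on x86,
       which is reported as "Floating point exception". */
    if (caller_freq == 0)
        return false;   /* policy choice made only for this sketch */

    /* Widen before multiplying so the product cannot overflow int. */
    return (long long)edge_freq * FREQ_BASE / caller_freq >= max_prob;
}
```

Whether a zero caller frequency should count as "below the limit" or be handled some other way is a question for the inliner maintainers; the sketch only shows where the check would sit.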
[Bug tree-optimization/58553] New fail in PASS-FAIL: gcc.c-torture/execute/memcpy-2.c execution on arm and aarch64
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58553 --- Comment #4 from Jeffrey A. Law law at redhat dot com --- Andrew. Yes it does. I've never looked at the ldist code, but the dump seems a bit strange: Analyzing # of iterations of loop 3 exit condition [1, + , 1](no_overflow) != 96 bounds on difference of bases: 95 ... 95 result: # of iterations 95, bounded by 95 __builtin_memset (MEM[(void *)u1 + 1B], 97, 96); So it determined the right iteration count but mucked up the count in the call to memset ?!? Weird
[Bug tree-optimization/58553] New fail in PASS-FAIL: gcc.c-torture/execute/memcpy-2.c execution on arm and aarch64
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58553 Jeffrey A. Law law at redhat dot com changed: What|Removed |Added CC||pthaugen at gcc dot gnu.org --- Comment #5 from Jeffrey A. Law law at redhat dot com --- *** Bug 58554 has been marked as a duplicate of this bug. ***
[Bug tree-optimization/58554] [4.9 Regression] Revision 202619 causes runtime failure in CPU2006 benchmark 445.gobmk
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58554 Jeffrey A. Law law at redhat dot com changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED CC||law at redhat dot com Resolution|--- |DUPLICATE --- Comment #1 from Jeffrey A. Law law at redhat dot com --- Duplicate. *** This bug has been marked as a duplicate of bug 58553 ***
[Bug tree-optimization/58553] New fail in PASS-FAIL: gcc.c-torture/execute/memcpy-2.c execution on arm and aarch64
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58553 Bug 58553 depends on bug 58554, which changed state. Bug 58554 Summary: [4.9 Regression] Revision 202619 causes runtime failure in CPU2006 benchmark 445.gobmk http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58554 What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |DUPLICATE
[Bug tree-optimization/58553] New fail in PASS-FAIL: gcc.c-torture/execute/memcpy-2.c execution on arm and aarch64
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58553 Bug 58553 depends on bug 58554, which changed state. Bug 58554 Summary: [4.9 Regression] Revision 202619 causes runtime failure in CPU2006 benchmark 445.gobmk http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58554 What|Removed |Added Status|RESOLVED|REOPENED Resolution|DUPLICATE |---
[Bug tree-optimization/58554] [4.9 Regression] Revision 202619 causes runtime failure in CPU2006 benchmark 445.gobmk
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58554 Jeffrey A. Law law at redhat dot com changed: What|Removed |Added Status|RESOLVED|REOPENED Last reconfirmed||2013-09-27 Resolution|DUPLICATE |--- Ever confirmed|0 |1 --- Comment #2 from Jeffrey A. Law law at redhat dot com --- Since this doesn't depend on the recent threading changes to trigger, I'm keeping this open as I'll probably revert a tiny piece of the threading changes which will make 58553 go latent.
[Bug c++/58555] Floating point exception in want_inline_self_recursive_call_p
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58555 Markus Trippelsdorf markus at trippelsdorf dot de changed: What|Removed |Added CC||markus at trippelsdorf dot de --- Comment #1 from Markus Trippelsdorf markus at trippelsdorf dot de --- Created attachment 30920 -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=30920action=edit reduced testcase
[Bug tree-optimization/58556] New: gen-vect-26.c / gen-vect-28.c regression merging from r202839 to r202981
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58556 Bug ID: 58556 Summary: gen-vect-26.c / gen-vect-28.c regression merging from r202839 to r202981 Product: gcc Version: 4.9.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: amylaar at gcc dot gnu.org Target: arc-elf32 Created attachment 30921 -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=30921action=edit gen-vect-26.c.114t.vect dump file I just merged in trunk from https://github.com/mirrors/gcc.git, and I see four new failures (in just four days): 82870c82883 PASS: gcc.dg/tree-ssa/gen-vect-26.c scan-tree-dump-times vect vectorized 1 loops 1 --- FAIL: gcc.dg/tree-ssa/gen-vect-26.c scan-tree-dump-times vect vectorized 1 loops 1 82872c82885 PASS: gcc.dg/tree-ssa/gen-vect-26.c scan-tree-dump-times vect Alignment of access forced using peeling 1 --- FAIL: gcc.dg/tree-ssa/gen-vect-26.c scan-tree-dump-times vect Alignment of access forced using peeling 1 82875c82888 PASS: gcc.dg/tree-ssa/gen-vect-28.c scan-tree-dump-times vect vectorized 1 loops 1 --- FAIL: gcc.dg/tree-ssa/gen-vect-28.c scan-tree-dump-times vect vectorized 1 loops 1 82877c82890 PASS: gcc.dg/tree-ssa/gen-vect-28.c scan-tree-dump-times vect Alignment of access forced using peeling 1 --- FAIL: gcc.dg/tree-ssa/gen-vect-28.c scan-tree-dump-times vect Alignment of access forced using peeling 1
[Bug target/58490] __sync_bool_compare_and_swap sign bit failure
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58490 --- Comment #3 from Erik van der Werf erikvanderwerf at gmail dot com --- I'm sorry, that patch definitely looks relevant, and I'd like to try it, but somehow I did not manage to rebuild the arm-linux-gnueabi-gcc-4.7 package. I'm not a gcc expert, and trying to figure out how to configure the build for cross compilation turns out to be rather time consuming, so for now I'll just stay with gcc-4.6. BTW I also tried the new atomic built-ins (__atomic_compare_exchange) and those show the exact same problem.
RE: [PATCH]Fix computation of offset in ivopt
-Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- ow...@gcc.gnu.org] On Behalf Of bin.cheng Sent: Friday, September 27, 2013 1:07 PM To: 'Richard Biener' Cc: GCC Patches Subject: RE: [PATCH]Fix computation of offset in ivopt -Original Message- From: Richard Biener [mailto:richard.guent...@gmail.com] Sent: Tuesday, September 24, 2013 6:31 PM To: Bin Cheng Cc: GCC Patches Subject: Re: [PATCH]Fix computation of offset in ivopt On Tue, Sep 24, 2013 at 11:13 AM, bin.cheng bin.ch...@arm.com wrote: + field = TREE_OPERAND (expr, 1); + if (DECL_FIELD_BIT_OFFSET (field) +cst_and_fits_in_hwi (DECL_FIELD_BIT_OFFSET (field))) + boffset = int_cst_value (DECL_FIELD_BIT_OFFSET (field)); + + tmp = component_ref_field_offset (expr); + if (top_compref +cst_and_fits_in_hwi (tmp)) + { + /* Strip the component reference completely. */ + op0 = TREE_OPERAND (expr, 0); + op0 = strip_offset_1 (op0, inside_addr, top_compref, off0); + *offset = off0 + int_cst_value (tmp) + boffset / BITS_PER_UNIT; + return op0; + } the failure paths seem mangled, that is, if cst_and_fits_in_hwi is false for either offset part you may end up doing half accounting and not stripping. Btw, DECL_FIELD_BIT_OFFSET is always non-NULL. I suggest to rewrite to if (!inside_addr) return orig_expr; tmp = component_ref_field_offset (expr); field = TREE_OPERAND (expr, 1); if (top_compref cst_and_fits_in_hwi (tmp) cst_and_fits_in_hwi (DECL_FIELD_BIT_OFFSET (field))) { ... } Will be refined. note that this doesn't really handle overflows correctly as + *offset = off0 + int_cst_value (tmp) + boffset / + BITS_PER_UNIT; may still overflow. Since it's unsigned + signed + signed, according to implicit conversion, the signed operand will be converted to unsigned, so the overflow would only happen when off0 is huge number and tmp/boffset is large positive number, right? Do I need to check whether off0 is larger than the overflowed result? Also there is signed-unsigned problem here, see below. 
@@ -4133,6 +4142,9 @@ get_computation_cost_at (struct ivopts_data *data, bitmap_clear (*depends_on); } + /* Sign-extend offset if utype has lower precision than + HOST_WIDE_INT. */ offset = sext_hwi (offset, TYPE_PRECISION + (utype)); + offset is computed elsewhere in difference_cost and the bug to me seems that it is unsigned. sign-extending it here is odd at least (and the extension should probably happen at sizetype precision, not that of utype). I agree. The root cause is in strip_offset_1, in which offset is computed. Every time offset is computed in this function with a signed operand (like int_cst_value (tmp) above), we need to take care of the possible negative number problem. Taking this case as an example, we need to make the change below: case INTEGER_CST: //... *offset = int_cst_value (expr); change to case INTEGER_CST: //... *offset = sext_hwi (int_cst_value (expr), type); and case MULT_EXPR: //... *offset = sext_hwi (int_cst_value (expr), type); to case MULT_EXPR: //... HOST_WIDE_INT xxx = (HOST_WIDE_INT)off0 * int_cst_value (op1); *offset = sext_hwi (xxx, type); Any comments? On second thought, I guess we can compute a signed offset in strip_offset_1 and sign-extend it for strip_offset, so we don't need to change every computation of offset in that function. Thanks. bin
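For readers unfamiliar with the operation under discussion, the following is a standalone sketch of sign-extending the low prec bits of a 64-bit value. It is not GCC's sext_hwi; sign_extend is a hypothetical name, int64_t stands in for HOST_WIDE_INT, and prec is assumed to be at least 1.

```c
#include <stdint.h>

/* Hypothetical helper: treat the low `prec` bits of `src` as a two's
   complement number and widen it to 64 bits.  Assumes 1 <= prec. */
static int64_t sign_extend(int64_t src, unsigned prec)
{
    if (prec >= 64)
        return src;

    uint64_t mask = (UINT64_C(1) << prec) - 1;   /* low prec bits set */
    uint64_t low  = (uint64_t)src & mask;        /* drop the upper bits */
    uint64_t sign = UINT64_C(1) << (prec - 1);   /* sign bit at this width */

    /* XOR then subtract propagates the sign bit upward using only
       unsigned arithmetic.  The final conversion back to int64_t is
       implementation-defined in ISO C; GCC wraps modulo 2^64. */
    return (int64_t)((low ^ sign) - sign);
}
```

With a helper like this, an offset whose 32-bit pattern is 0xFFFFFFFF reads as a small negative displacement rather than a huge positive one, which is the effect the discussion is after.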
Re: OMP4/cilkplus: simd clone function mangling
On Thu, Sep 26, 2013 at 9:35 PM, Aldy Hernandez al...@redhat.com wrote: + /* To distinguish from an OpenMP simd clone, Cilk Plus functions to + be cloned have a distinctive artificial label in addition to omp + declare simd. */ + bool cilk_clone = flag_enable_cilkplus + lookup_attribute (cilk plus elemental, +DECL_ATTRIBUTES (new_node-symbol.decl)); + if (cilk_clone) +remove_attribute (cilk plus elemental, + DECL_ATTRIBUTES (new_node-symbol.decl)); Oh yeah, rth had asked me why I remove the attribute. My initial thoughts were that whether or not a function is a simd clone can be accessed through the cgraph bits (node-simdclone != NULL for the clone, and node-has_simd_clones for the parent). No sense keeping the attribute. But I can leave it if you think it's better. Why have it in the first place if it's marked in the cgraph? Richard. Aldy
Re: [google gcc-4_8] fix size_estimation for builtin_expect
On Fri, Sep 27, 2013 at 12:23 AM, Jan Hubicka hubi...@ucw.cz wrote: Hi, builtin_expect should be a NOP in size_estimation. Indeed, the call stmt itself is 0 weight in size and time. But it may introduce an extra relation expr which has non-zero size/time. The end result is: for w/ and w/o builtin_expect, we have different size/time estimation for early inlining. This patch fixes this problem. -Rong 2013-09-26 Rong Xu x...@google.com * ipa-inline-analysis.c (estimate_function_body_sizes): fix the size estimation for builtin_expect. This seems fine with a comment in the code explaining what it is about. I also think we want to support multiple builtin_expects in a BB, so perhaps we want to have a pointer set of statements to ignore? To avoid spaghetti code, please just move the new logic into separate functions. Looks like this could use tree-ssa.c:walk_use_def_chains (please change its implementation as necessary, make it C++, etc. - you will be the first user again). Richard. Honza Index: ipa-inline-analysis.c === --- ipa-inline-analysis.c (revision 202638) +++ ipa-inline-analysis.c (working copy) @@ -2266,6 +2266,8 @@ estimate_function_body_sizes (struct cgraph_node * /* Estimate static overhead for function prologue/epilogue and alignment. */ int overhead = PARAM_VALUE (PARAM_INLINE_FUNCTION_OVERHEAD_SIZE); int size = overhead; + gimple fix_expect_builtin; + /* Benefits are scaled by probability of elimination that is in range 0,2. 
*/ basic_block bb; @@ -2359,14 +2361,73 @@ estimate_function_body_sizes (struct cgraph_node * } } + fix_expect_builtin = NULL; for (bsi = gsi_start_bb (bb); !gsi_end_p (bsi); gsi_next (bsi)) { gimple stmt = gsi_stmt (bsi); + if (gimple_call_builtin_p (stmt, BUILT_IN_EXPECT)) +{ + tree var = gimple_call_lhs (stmt); + tree arg = gimple_call_arg (stmt, 0); + use_operand_p use_p; + gimple use_stmt; + bool match = false; + bool done = false; + gcc_assert (var arg); + gcc_assert (TREE_CODE (var) == SSA_NAME); + + while (TREE_CODE (arg) == SSA_NAME) +{ + gimple stmt_tmp = SSA_NAME_DEF_STMT (arg); + switch (gimple_assign_rhs_code (stmt_tmp)) +{ + case LT_EXPR: + case LE_EXPR: + case GT_EXPR: + case GE_EXPR: + case EQ_EXPR: + case NE_EXPR: +match = true; +done = true; +break; + case NOP_EXPR: +break; + default: +done = true; +break; +} + if (done) +break; + arg = gimple_assign_rhs1 (stmt_tmp); +} + + if (match single_imm_use (var, use_p, use_stmt)) +{ + if (gimple_code (use_stmt) == GIMPLE_COND) +{ + fix_expect_builtin = use_stmt; +} +} + + /* we should see one builtin_expert call in one bb. */ + break; +} +} + + for (bsi = gsi_start_bb (bb); !gsi_end_p (bsi); gsi_next (bsi)) + { + gimple stmt = gsi_stmt (bsi); int this_size = estimate_num_insns (stmt, eni_size_weights); int this_time = estimate_num_insns (stmt, eni_time_weights); int prob; struct predicate will_be_nonconstant; + if (stmt == fix_expect_builtin) +{ + this_size--; + this_time--; +} + if (dump_file (dump_flags TDF_DETAILS)) { fprintf (dump_file, );
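At the source level, the situation Rong describes could arise from code like the sketch below, where the hint wraps a comparison whose result then feeds a branch. This is an illustration only and says nothing certain about the GIMPLE GCC actually produces; the unlikely macro and checked_div are hypothetical names.

```c
/* Common wrapper idiom: normalise the condition to 0/1 and mark it
   as expected to be false. */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical example: divide unless the divisor is zero. */
static int checked_div(int a, int b, int *out)
{
    /* The hint itself generates no code, but the comparison fed into it
       may be materialised as a separate statement before the branch. */
    if (unlikely(b == 0))
        return -1;

    *out = a / b;
    return 0;
}
```

Written without the hint, as a plain if (b == 0), the comparison can sit directly in the branch condition, which would account for the size and time difference the patch tries to cancel out.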
Re: [PATCH]Fix computation of offset in ivopt
On Fri, Sep 27, 2013 at 7:07 AM, bin.cheng bin.ch...@arm.com wrote: -Original Message- From: Richard Biener [mailto:richard.guent...@gmail.com] Sent: Tuesday, September 24, 2013 6:31 PM To: Bin Cheng Cc: GCC Patches Subject: Re: [PATCH]Fix computation of offset in ivopt On Tue, Sep 24, 2013 at 11:13 AM, bin.cheng bin.ch...@arm.com wrote: + field = TREE_OPERAND (expr, 1); + if (DECL_FIELD_BIT_OFFSET (field) +cst_and_fits_in_hwi (DECL_FIELD_BIT_OFFSET (field))) + boffset = int_cst_value (DECL_FIELD_BIT_OFFSET (field)); + + tmp = component_ref_field_offset (expr); + if (top_compref +cst_and_fits_in_hwi (tmp)) + { + /* Strip the component reference completely. */ + op0 = TREE_OPERAND (expr, 0); + op0 = strip_offset_1 (op0, inside_addr, top_compref, off0); + *offset = off0 + int_cst_value (tmp) + boffset / BITS_PER_UNIT; + return op0; + } the failure paths seem mangled, that is, if cst_and_fits_in_hwi is false for either offset part you may end up doing half accounting and not stripping. Btw, DECL_FIELD_BIT_OFFSET is always non-NULL. I suggest to rewrite to if (!inside_addr) return orig_expr; tmp = component_ref_field_offset (expr); field = TREE_OPERAND (expr, 1); if (top_compref cst_and_fits_in_hwi (tmp) cst_and_fits_in_hwi (DECL_FIELD_BIT_OFFSET (field))) { ... } Will be refined. note that this doesn't really handle overflows correctly as + *offset = off0 + int_cst_value (tmp) + boffset / + BITS_PER_UNIT; may still overflow. Since it's unsigned + signed + signed, according to implicit conversion, the signed operand will be converted to unsigned, so the overflow would only happen when off0 is huge number and tmp/boffset is large positive number, right? Do I need to check whether off0 is larger than the overflowed result? Also there is signed-unsigned problem here, see below. @@ -4133,6 +4142,9 @@ get_computation_cost_at (struct ivopts_data *data, bitmap_clear (*depends_on); } + /* Sign-extend offset if utype has lower precision than + HOST_WIDE_INT. 
*/ offset = sext_hwi (offset, TYPE_PRECISION + (utype)); + offset is computed elsewhere in difference_cost and the bug to me seems that it is unsigned. sign-extending it here is odd at least (and the extension should probably happen at sizetype precision, not that of utype). I agree. The root cause is in strip_offset_1, in which offset is computed. Every time offset is computed in this function with a signed operand (like int_cst_value (tmp) above), we need to take care of the possible negative number problem. Taking this case as an example, we need to make the change below: case INTEGER_CST: //... *offset = int_cst_value (expr); change to case INTEGER_CST: //... *offset = sext_hwi (int_cst_value (expr), type); and case MULT_EXPR: //... *offset = sext_hwi (int_cst_value (expr), type); to case MULT_EXPR: //... HOST_WIDE_INT xxx = (HOST_WIDE_INT)off0 * int_cst_value (op1); *offset = sext_hwi (xxx, type); Any comments? The issue is of course that we end up converting offsets to sizetype at some point which makes them all appear unsigned. The fix for this is to simply interpret them as signed ... but it's really a mess ;) Richard. Thanks. bin
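Richard's remark about offsets "appearing unsigned" can be illustrated with plain C integers. The sketch below is an analogy rather than ivopts code: add_offset and as_signed are hypothetical names and uint64_t stands in for sizetype.

```c
#include <stdint.h>

/* Accumulate a signed delta into an unsigned offset.  The sum wraps
   modulo 2^64 when the true result is negative. */
static uint64_t add_offset(uint64_t base, int64_t delta)
{
    return base + (uint64_t)delta;
}

/* Reinterpret the accumulated bits as a signed quantity.  The
   conversion of out-of-range values is implementation-defined in
   ISO C; GCC wraps modulo 2^64. */
static int64_t as_signed(uint64_t v)
{
    return (int64_t)v;
}
```

Adding -24 to 16 yields a value just below 2^64 when viewed as unsigned, and -8 once it is read back as signed, so the bit pattern is right and only the interpretation differs.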
Re: [PATCH, PR 57748] Check for out of bounds access, Part 2
Sure, but the modifier is not meant to force something into memory, especially when it is already in a register. Remember, we are only talking of structures here, and we only want to access one member. It is more the other way round: It says: You do not have to load the value in a register, if it is already in memory I'm happy. EXPAND_MEMORY means we are interested in a memory result, even if the memory is constant and we could have propagated a constant value. */ We definitely want to propagate constant values here, look at the code below. And it already lists explicit cases where we really need to spill to memory. -- Eric Botcazou
Re: [PATCH, ARM, LRA] Prepare ARM build with LRA
They don't need to be kept synchronised as such. It's fine for the index to allow more than must_be_index_p. But if you're not keen on the current structure, does the following look better? Tested on x86_64-linux-gnu. Thanks, Richard gcc/ * rtlanal.c (must_be_base_p, must_be_index_p): Delete. (binary_scale_code_p, get_base_term, get_index_term): New functions. (set_address_segment, set_address_base, set_address_index) (set_address_disp): Accept the argument unconditionally. (baseness): Remove must_be_base_p and must_be_index_p checks. (decompose_normal_address): Classify as much as possible in the main loop. Yes, fine by me, thanks. -- Eric Botcazou
Re: RFA: Store the REG_BR_PROB probability directly as an int
Thanks for the testing. It also passes bootstrap on x86_64-linux-gnu. OK to install? Yes, thanks. -- Eric Botcazou
Re: [patch] Separate immediate uses and phi routines from tree-flow*.h
On Thu, Sep 26, 2013 at 6:07 PM, Andrew MacLeod amacl...@redhat.com wrote: On 09/25/2013 04:49 AM, Richard Biener wrote: On Tue, Sep 24, 2013 at 4:39 PM, Andrew MacLeod amacl...@redhat.com wrote: This larger patch moves all the immediate use and operand routines from tree-flow.h into tree-ssa-operands.h. It also moves the basic phi routines and prototypes into a newly created tree-phinodes.h, or tree-ssa-operands.h if they belong there. And finally shuffles a couple of other routines which allows tree-ssa-operands.h to be removed from the gimple.h header file. Of note or interest: 1 - dump_decl_set() was defined in tree-into-ssa.c, but isn't really ssa specific. It's tree-specific, so normally I'd throw it into tree.c. Looking forward a little, it's only used in a gimple context, so when we map to gimple_types it will need to be converted to/created for those. If it is in tree.c, I'll have to create a new version for gimple types, and then the routine in tree.c will become unused. Based on that, I figured gimple.c is the place for it. 2 - has_zero_uses_1() and single_imm_use_1() were both in tree-cfg.c for some reason.. they've been moved to tree-ssa-operands.c 3 - a few routines seem like basic gimple routines, but really turn out to require the operand infrastructure to implement... so they are moved to tree-ssa-operands.[ch] as well. This sort of thing showed up when removing tree-ssa-operands.h from the gimple.h include file. These were things like gimple_vuse_op, gimple_vdef_op, update_stmt, and update_stmt_if_modified Note that things like gimple_vuse_op are on the interface border between gimple (where the SSA operands are stored) and SSA operands. So it's not so clear for them given they access internal gimple fields directly but use the regular SSA operand API. I'd prefer gimple_vuse_op and gimple_vdef_op to stay in gimple.[ch]. Ugg. I incorporated what we talked about, and it was much messier than expected :-P. 
I ended up with a chicken and egg problem between the gimple_v{use,def}_op routines in gimple-ssa.h and the operand routines in tree-ssa-operands.h. They both require each other, and I couldn't get things into a consistent state while they are in separate files. It was actually the immediate use iterators which were requiring gimple_vuse_op()... So I have created a new ssa-iterators.h file to resolve this problem. They build on the operand code and clearly have other prerequisites, so that seems reasonable to me... This in fact solves a couple of other little warts. It allows me to put both gimple_phi_arg_imm_use_ptr() and phi_arg_index_from_use() into tree-phinodes.h. It also exposes that gimple.c::walk_stmt_load_store_addr_ops() and friends actually depend on the existence of PHI nodes, meaning it really belongs on the gimple-ssa border as well. So I moved those into gimple-ssa.c It doesn't depend on PHI nodes but it also works for PHI nodes. So I'd rather have it in gimple.c. And finally, it turns out that a lot of files include tree-flow.h and depend on it to include gimple.h rather than including it themselves. Since tree-flow.h is losing its kitchen-sink attribute, and I needed to move it to the bottom of the #include list for tree-ssa.h, I have temporarily included gimple.h at the top of tree-ssa.h to make sure it gets hauled in. When I remove tree-flow.h as the "everyone includes it" file, I'll add gimple.h in all the appropriate .c files and remove it from tree-ssa.h. It would have just made this growing patch even more annoying for now. Does this seem reasonable? Yes - try leaving walk_stmt_load_store_addr_ops in gimple.c though, if that is technically possible. Otherwise I guess I don't mind. Thanks, Richard. Bootstraps on x86_64-unknown-linux-gnu and currently running regressions. Andrew PS Oh and I noticed the macro name for tree-outof-ssa.h wasn't right, so I changed it too. 
Next I'll diverge into trying to sort through putting all the phi-related structs and such into tree-phinodes.h
Re: [PATCH, RTL] Prepare ARM build with LRA
below is a trivial patch, which makes both parts of test signed. With this, bootstrap completes on powerpc-darwin9 - however, you might want to check that it still does what you intended. Please install under PR middle-end/58547 if not already done. -- Eric Botcazou
Re: Commit: MSP430: Pass -md on to assembler
Hi Mike, I must say though, it seems wrong to have to provide a sign-extend pointer pattern when pointers (on the MSP430) are unsigned. Agreed. If we instead ask, is it sane for gcc to ever want to sign extend in this case, the answer appears to be no. Why does it? ptr_mode is SImode, and expand_builtin_next_arg is used to perform the addition in this mode. It 'just' knows that it can be sign extended… and just does it that way. This seems like it is wrong. Index: builtins.c === --- builtins.c (revision 202634) +++ builtins.c (working copy) @@ -4094,7 +4094,7 @@ expand_builtin_next_arg (void) return expand_binop (ptr_mode, add_optab, crtl->args.internal_arg_pointer, crtl->args.arg_offset_rtx, - NULL_RTX, 0, OPTAB_LIB_WIDEN); + NULL_RTX, POINTERS_EXTEND_UNSIGNED > 0, OPTAB_LIB_WIDEN); } /* Make it easier for the backends by protecting the valist argument would fix this problem. If this is done, the unmodified test case then doesn't abort. Arguably, the extension should be done as the port directs. It isn't clear to me why they do not. Ok? OK by me, although I cannot approve that particular patch. I did eventually find some test cases that exercised the sign-extend pointer pattern, so I was able to check the generated code - it worked OK. But I ran into a very strange problem. With your PARTIAL_INT_MODE_NAME patch applied GCC started erroneously eliminating NULL function pointer checks! This was particularly noticeable in libgcc/crtstuff.c where for example: static void __attribute__((used)) frame_dummy (void) { static struct object object; if (__register_frame_info) __register_frame_info (__EH_FRAME_BEGIN__, &object); (this is a simplified version of the real code) ... is compiled as if it had been written as: static void __attribute__((used)) frame_dummy (void) { static struct object object; __register_frame_info (__EH_FRAME_BEGIN__, &object); This only happens for the LARGE model (when pointers are PSImode) but I was baffled as to where it could be happening. 
Have you come across anything like this ? Cheers Nick
Re: [PATCH][RFC] Remove quadratic loop with component_uses_parent_alias_set
Like the following. Bootstrap and regtest running on x86_64-unknown-linux-gnu. Richard. 2013-09-26 Richard Biener rguent...@suse.de * alias.h (component_uses_parent_alias_set): Rename to ... (component_uses_parent_alias_set_from): ... this. * alias.c (component_uses_parent_alias_set): Rename to ... (component_uses_parent_alias_set_from): ... this and return the desired parent. (reference_alias_ptr_type_1): Use the result from component_uses_parent_alias_set_from instead of stripping components one at a time. * emit-rtl.c (set_mem_attributes_minus_bitpos): Adjust. FWIW it looks fine to me. -- Eric Botcazou
Re: [ping] [PATCH] Silence an unused variable warning
Let's CC Vladimir on this easy one.

Cheers.

Jan-Benedict Glaw jbg...@lug-owl.de wrote:

On Fri, 2013-09-20 20:51:37 +0200, Jan-Benedict Glaw jbg...@lug-owl.de wrote:

Hi! With the VAX target, I see this warning:

g++ -c -g -O2 -DIN_GCC -DCROSS_DIRECTORY_STRUCTURE -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -fno-common -DHAVE_CONFIG_H -I. -I. -I../../../../gcc/gcc -I../../../../gcc/gcc/. -I../../../../gcc/gcc/../include -I../../../../gcc/gcc/../libcpp/include -I../../../../gcc/gcc/../libdecnumber -I../../../../gcc/gcc/../libdecnumber/dpd -I../libdecnumber -I../../../../gcc/gcc/../libbacktrace ../../../../gcc/gcc/lra-eliminations.c -o lra-eliminations.o
../../../../gcc/gcc/lra-eliminations.c: In function ‘void init_elim_table()’:
../../../../gcc/gcc/lra-eliminations.c:1162:8: warning: unused variable ‘value_p’ [-Wunused-variable]
   bool value_p;
        ^
[...]

Ping:
http://gcc.gnu.org/ml/gcc-patches/2013-09/msg01568.html
`-- http://gcc.gnu.org/ml/gcc-patches/2013-09/txtnrNwaGiD3x.txt

Regards, JBG

--
Dodji
Generic tuning in x86-tune.def 1/2
Hi,

this is the second part of the generic tuning changes sanitizing the tuning flags. This patch again is supposed to deal with the obvious part only. I will send a separate patch for more changes.

The flags changed agree on all CPUs considered for generic (and their optimization manuals) + amdfam10, core2 and Atom SLM. I also added X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL to bobcat tuning, since it seems like an obvious omission (after double checking in the optimization manual), and dropped X86_TUNE_FOUR_JUMP_LIMIT for bulldozer cores. The implementation of this feature was always a bit weird and its main purpose was to avoid terrible branch predictor degeneration on the older AMD branch predictors.

I benchmarked both spec2k and 2k6 to verify there are no regressions. Especially X86_TUNE_REASSOC_FP_TO_PARALLEL seems to bring nice improvements in specfp benchmarks.

Bootstrapped/regtested x86_64-linux, will wait for comments and commit it during the weekend. I will be happy to revisit any of the generic tuning if regressions pop up. Overall this patch also brings small code size improvements for smaller loads/stores and less padding at -O2. Differences are sub 0.1% however.

Honza

	* x86-tune.def (X86_TUNE_SSE_UNALIGNED_LOAD_OPTIMAL): Enable for generic.
	(X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL): Likewise.
	(X86_TUNE_FOUR_JUMP_LIMIT): Drop for generic and bulldozer.
	(X86_TUNE_PAD_RETURNS): Drop for newer AMD chips.
	(X86_TUNE_AVOID_VECTOR_DECODE): Drop for generic.
	(X86_TUNE_REASSOC_FP_TO_PARALLEL): Enable for generic.

Index: config/i386/x86-tune.def
===================================================================
--- config/i386/x86-tune.def	(revision 202966)
+++ config/i386/x86-tune.def	(working copy)
@@ -115,9 +115,9 @@ DEF_TUNE (X86_TUNE_SSE_PARTIAL_REG_DEPEN
           m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_ATOM | m_SLM | m_AMDFAM10
           | m_BDVER | m_GENERIC)
 DEF_TUNE (X86_TUNE_SSE_UNALIGNED_LOAD_OPTIMAL, sse_unaligned_load_optimal,
-          m_COREI7 | m_AMDFAM10 | m_BDVER | m_BTVER | m_SLM)
+          m_COREI7 | m_AMDFAM10 | m_BDVER | m_BTVER | m_SLM | m_GENERIC)
 DEF_TUNE (X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL, sse_unaligned_store_optimal,
-          m_COREI7 | m_BDVER | m_SLM)
+          m_COREI7 | m_BDVER | m_BTVER | m_SLM | m_GENERIC)
 DEF_TUNE (X86_TUNE_SSE_PACKED_SINGLE_INSN_OPTIMAL, sse_packed_single_insn_optimal,
           m_BDVER)
 /* X86_TUNE_SSE_SPLIT_REGS: Set for machines where the type and dependencies
@@ -146,8 +146,7 @@ DEF_TUNE (X86_TUNE_INTER_UNIT_CONVERSION
 /* X86_TUNE_FOUR_JUMP_LIMIT: Some CPU cores are not able to predict more
    than 4 branch instructions in the 16 byte window.  */
 DEF_TUNE (X86_TUNE_FOUR_JUMP_LIMIT, four_jump_limit,
-          m_PPRO | m_P4_NOCONA | m_ATOM | m_SLM | m_AMD_MULTIPLE
-          | m_GENERIC)
+          m_PPRO | m_P4_NOCONA | m_ATOM | m_SLM | m_ATHLON_K8 | m_AMDFAM10)
 DEF_TUNE (X86_TUNE_SCHEDULE, schedule,
           m_PENT | m_PPRO | m_CORE_ALL | m_ATOM | m_SLM | m_K6_GEODE
           | m_AMD_MULTIPLE | m_GENERIC)
@@ -156,13 +155,13 @@ DEF_TUNE (X86_TUNE_USE_BT, use_bt,
 DEF_TUNE (X86_TUNE_USE_INCDEC, use_incdec,
           ~(m_P4_NOCONA | m_CORE_ALL | m_ATOM | m_SLM | m_GENERIC))
 DEF_TUNE (X86_TUNE_PAD_RETURNS, pad_returns,
-          m_AMD_MULTIPLE | m_GENERIC)
+          m_ATHLON_K8 | m_AMDFAM10 | | m_GENERIC)
 DEF_TUNE (X86_TUNE_PAD_SHORT_FUNCTION, pad_short_function, m_ATOM)
 DEF_TUNE (X86_TUNE_EXT_80387_CONSTANTS, ext_80387_constants,
           m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_ATOM | m_SLM | m_K6_GEODE
           | m_ATHLON_K8 | m_GENERIC)
 DEF_TUNE (X86_TUNE_AVOID_VECTOR_DECODE, avoid_vector_decode,
-          m_K8 | m_GENERIC)
+          m_K8)
 /* X86_TUNE_PROMOTE_HIMODE_IMUL: Modern CPUs have same latency for HImode
    and SImode multiply, but 386 and 486 do HImode multiply faster.  */
 DEF_TUNE (X86_TUNE_PROMOTE_HIMODE_IMUL, promote_himode_imul,
@@ -217,7 +216,7 @@ DEF_TUNE (X86_TUNE_REASSOC_INT_TO_PARALL
 /* X86_TUNE_REASSOC_FP_TO_PARALLEL: Try to produce parallel computations
    during reassociation of fp computation.  */
 DEF_TUNE (X86_TUNE_REASSOC_FP_TO_PARALLEL, reassoc_fp_to_parallel,
-          m_ATOM | m_SLM | m_HASWELL | m_BDVER1 | m_BDVER2)
+          m_ATOM | m_SLM | m_HASWELL | m_BDVER1 | m_BDVER2 | m_GENERIC)
 /* X86_TUNE_GENERAL_REGS_SSE_SPILL: Try to spill general regs to SSE
    regs instead of memory.  */
 DEF_TUNE (X86_TUNE_GENERAL_REGS_SSE_SPILL, general_regs_sse_spill,
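For readers not familiar with how such a .def file is consumed, here is a minimal sketch of the general technique: each entry pairs a tuning flag with a bitmask of processors, and a flag is active when the mask contains the selected processor's bit. This is an illustration under stated assumptions, not the i386 backend code; the processor list, the M_* masks, TUNE_LIST and tune_enabled are all hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical processor list; each processor owns one bit in a mask.
enum Processor { PROC_ATHLON_K8, PROC_AMDFAM10, PROC_BDVER, PROC_GENERIC };

constexpr std::uint32_t M_ATHLON_K8 = 1u << PROC_ATHLON_K8;
constexpr std::uint32_t M_AMDFAM10  = 1u << PROC_AMDFAM10;
constexpr std::uint32_t M_GENERIC   = 1u << PROC_GENERIC;

// Stand-in for a .def file: one line per tuning flag, (name, processor mask).
#define TUNE_LIST(ENTRY)                                   \
  ENTRY(PAD_RETURNS, M_ATHLON_K8 | M_AMDFAM10 | M_GENERIC) \
  ENTRY(AVOID_VECTOR_DECODE, M_ATHLON_K8)

// First expansion: build the enumeration of flags.
#define AS_ENUM(name, mask) TUNE_##name,
enum Tune { TUNE_LIST(AS_ENUM) TUNE_COUNT };
#undef AS_ENUM

// Second expansion: build the table of masks, indexed by flag.
#define AS_MASK(name, mask) (mask),
constexpr std::uint32_t kTuneMask[TUNE_COUNT] = { TUNE_LIST(AS_MASK) };
#undef AS_MASK

// A flag is enabled for a processor when that processor's bit is in the mask.
bool tune_enabled(Tune t, Processor p) {
  return (kTuneMask[t] & (1u << p)) != 0;
}
```

With this layout, "enable for generic" and "drop for generic" in a ChangeLog amount to adding or removing one bit from a mask, which is why the hunks above are one-line changes.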
Re: [Patch] Let ordinary escaping in POSIX regex be valid
On 27 September 2013 03:15, Tim Shen wrote:

POSIX ERE says that escaping an ordinary char, say R\n is not permitted, because 'n' is not a special char. However, they also say that: Implementations are permitted to extend the language to allow these. Conforming applications cannot use such constructs. So let's support it so as not to surprise users. Booted and tested under -m32 and -m64

I'm wondering whether we want to have a stricter mode that doesn't allow them, to help users avoid creating non-portable programs. We could check the value of the preprocessor macro __STRICT_ANSI__, which is set by -std=c++11 but not by -std=gnu++11, although that's not really the right flag. We want something more like the GNU shell utils' POSIXLY_CORRECT.
Re: User-define literals for std::complex.
On 27 September 2013 05:17, Ed Smith-Rowland wrote: The complex user-defined literals finally passed (n3779) with the resolution to DR1473 allowing the suffix id to touch the quotes (Can't find it but I put it in not too long ago). I think it's been approved by the LWG and looks like it will go to a vote by the full committee, but let's wait for that to pass before making any changes.
Re: [gomp4] Library side of depend clause support
On Fri, Sep 27, 2013 at 01:48:36AM +0200, Jakub Jelinek wrote: Perhaps. What if I do just minor cleanup (use flexible array members for the reallocated vectors, and perhaps keep only the last out/inout task in the hash table chains rather than all of them), retest, commit and then we can discuss/incrementally improve it? Here is what I've committed now, the incremental changes were really only using a structure with flex array member for the dependers vectors, removing/making redundant earlier !ent-is_in when adding !is_in into the chain and addition of new testcases. Let's improve it incrementally later. 2013-09-27 Jakub Jelinek ja...@redhat.com * libgomp.h: Include stdlib.h. (struct gomp_task_depend_entry, struct gomp_dependers_vec): New types. (struct gomp_task): Add dependers, depend_hash, depend_count, num_dependees and depend fields. (struct gomp_taskgroup): Add num_children field. (gomp_finish_task): Free depend_hash if non-NULL. * libgomp_g.h (GOMP_task): Add depend argument. * hashtab.h: New file. * task.c: Include hashtab.h. (hash_entry_type): New typedef. (htab_alloc, htab_free, htab_hash, htab_eq): New inlines. (gomp_init_task): Clear dependers, depend_hash and depend_count fields. (GOMP_task): Add depend argument, handle depend clauses. Increment num_children field in taskgroup. (gomp_task_run_pre): Don't increment task_running_count here, nor clear task_pending bit. (gomp_task_run_post_handle_depend_hash, gomp_task_run_post_handle_dependers, gomp_task_run_post_handle_depend): New functions. (gomp_task_run_post_remove_parent): Clear in_taskwait before signalling corresponding semaphore. (gomp_task_run_post_remove_taskgroup): Decrement num_children field and make the decrement to 0 MEMMODEL_RELEASE operation, rather than storing NULL to taskgroup-children. Clear in_taskgroup_wait before signalling corresponding semaphore. (gomp_barrier_handle_tasks): Move task_running_count increment and task_pending bit clearing here. 
Call gomp_task_run_post_handle_depend. If more than one new tasks have been queued, wake other threads if needed. (GOMP_taskwait): Call gomp_task_run_post_handle_depend. If more than one new tasks have been queued, wake other threads if needed. After waiting on taskwait_sem, enter critical section again. (GOMP_taskgroup_start): Initialize num_children field. (GOMP_taskgroup_end): Check num_children instead of children before critical section. If children is NULL, but num_children is non-zero, wait on taskgroup_sem. Call gomp_task_run_post_handle_depend. If more than one new tasks have been queued, wake other threads if needed. After waiting on taskgroup_sem, enter critical section again. * testsuite/libgomp.c/depend-1.c: New test. * testsuite/libgomp.c/depend-2.c: New test. * testsuite/libgomp.c/depend-3.c: New test. * testsuite/libgomp.c/depend-4.c: New test. --- libgomp/libgomp.h.jj2013-09-26 09:43:10.903930832 +0200 +++ libgomp/libgomp.h 2013-09-27 09:05:17.025402127 +0200 @@ -39,6 +39,7 @@ #include pthread.h #include stdbool.h +#include stdlib.h #ifdef HAVE_ATTRIBUTE_VISIBILITY # pragma GCC visibility push(hidden) @@ -253,7 +254,26 @@ enum gomp_task_kind GOMP_TASK_TIED }; +struct gomp_task; struct gomp_taskgroup; +struct htab; + +struct gomp_task_depend_entry +{ + void *addr; + struct gomp_task_depend_entry *next; + struct gomp_task_depend_entry *prev; + struct gomp_task *task; + bool is_in; + bool redundant; +}; + +struct gomp_dependers_vec +{ + size_t n_elem; + size_t allocated; + struct gomp_task *elem[]; +}; /* This structure describes a task to be run by a thread. 
*/ @@ -268,6 +288,10 @@ struct gomp_task struct gomp_task *next_taskgroup; struct gomp_task *prev_taskgroup; struct gomp_taskgroup *taskgroup; + struct gomp_dependers_vec *dependers; + struct htab *depend_hash; + size_t depend_count; + size_t num_dependees; struct gomp_task_icv icv; void (*fn) (void *); void *fn_data; @@ -277,6 +301,7 @@ struct gomp_task bool final_task; bool copy_ctors_done; gomp_sem_t taskwait_sem; + struct gomp_task_depend_entry depend[]; }; struct gomp_taskgroup @@ -286,6 +311,7 @@ struct gomp_taskgroup bool in_taskgroup_wait; bool cancelled; gomp_sem_t taskgroup_sem; + size_t num_children; }; /* This structure describes a team of threads. These are the threads @@ -525,6 +551,8 @@ extern void gomp_barrier_handle_tasks (g static void inline gomp_finish_task (struct gomp_task *task) { + if (__builtin_expect (task-depend_hash != NULL, 0)) +free (task-depend_hash); gomp_sem_destroy (task-taskwait_sem); } ---
[PING] [C++ PATCH] demangler fix (take 2)
Gary Benson wrote: Hi all, This is a resubmission of my previous demangler fix [1] rewritten to avoid using hashtables and other libiberty features. From the above referenced email: d_print_comp maintains a certain amount of scope across calls (namely a stack of templates) which is used when evaluating references in template argument lists. If such a reference is later used from a subtitution then the scope in force at the time of the substitution is used. This appears to be wrong (I say appears because I couldn't find anything in the API [2] to clarify this). The attached patch causes the demangler to capture the scope the first time such a reference is traversed, and to use that captured scope on subsequent traversals. This fixes GDB PR 14963 [3] whereby a reference is resolved against the wrong template, causing an infinite loop and eventual stack overflow and segmentation fault. I've added the result to the demangler test suite, but I know of no way to check the validity of the demangled symbol other than by inspection (and I am no expert here!) If anybody knows a way to check this then please let me know! Otherwise, I hope this not-really-checked demangled version is acceptable. Thanks, Gary [1] http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00215.html [2] http://mentorembedded.github.io/cxx-abi/abi.html#mangling [3] http://sourceware.org/bugzilla/show_bug.cgi?id=14963 -- http://gbenson.net/ diff --git a/libiberty/ChangeLog b/libiberty/ChangeLog index 89e108a..2ff8216 100644 --- a/libiberty/ChangeLog +++ b/libiberty/ChangeLog @@ -1,3 +1,20 @@ +2013-09-17 Gary Benson gben...@redhat.com + + * cp-demangle.c (struct d_saved_scope): New structure. + (struct d_print_info): New fields saved_scopes and + num_saved_scopes. + (d_print_init): Initialize the above. + (d_print_free): New function. + (cplus_demangle_print_callback): Call the above. + (d_copy_templates): New function. + (d_print_comp): New variables saved_templates and + need_template_restore. 
+ [DEMANGLE_COMPONENT_REFERENCE, + DEMANGLE_COMPONENT_RVALUE_REFERENCE]: Capture scope the first + time the component is traversed, and use the captured scope for + subsequent traversals. + * testsuite/demangle-expected: Add regression test. + 2013-09-10 Paolo Carlini paolo.carl...@oracle.com PR bootstrap/58386 diff --git a/libiberty/cp-demangle.c b/libiberty/cp-demangle.c index 70f5438..a199f6d 100644 --- a/libiberty/cp-demangle.c +++ b/libiberty/cp-demangle.c @@ -275,6 +275,18 @@ struct d_growable_string int allocation_failure; }; +/* A demangle component and some scope captured when it was first + traversed. */ + +struct d_saved_scope +{ + /* The component whose scope this is. */ + const struct demangle_component *container; + /* The list of templates, if any, that was current when this + scope was captured. */ + struct d_print_template *templates; +}; + enum { D_PRINT_BUFFER_LENGTH = 256 }; struct d_print_info { @@ -302,6 +314,10 @@ struct d_print_info int pack_index; /* Number of d_print_flush calls so far. */ unsigned long int flush_count; + /* Array of saved scopes for evaluating substitutions. */ + struct d_saved_scope *saved_scopes; + /* Number of saved scopes in the above array. */ + int num_saved_scopes; }; #ifdef CP_DEMANGLE_DEBUG @@ -3665,6 +3681,30 @@ d_print_init (struct d_print_info *dpi, demangle_callbackref callback, dpi-opaque = opaque; dpi-demangle_failure = 0; + + dpi-saved_scopes = NULL; + dpi-num_saved_scopes = 0; +} + +/* Free a print information structure. */ + +static void +d_print_free (struct d_print_info *dpi) +{ + int i; + + for (i = 0; i dpi-num_saved_scopes; i++) +{ + struct d_print_template *ts, *tn; + + for (ts = dpi-saved_scopes[i].templates; ts != NULL; ts = tn) + { + tn = ts-next; + free (ts); + } +} + + free (dpi-saved_scopes); } /* Indicate that an error occurred during printing, and test for error. 
*/ @@ -3749,6 +3789,7 @@ cplus_demangle_print_callback (int options, demangle_callbackref callback, void *opaque) { struct d_print_info dpi; + int success; d_print_init (dpi, callback, opaque); @@ -3756,7 +3797,9 @@ cplus_demangle_print_callback (int options, d_print_flush (dpi); - return ! d_print_saw_error (dpi); + success = ! d_print_saw_error (dpi); + d_print_free (dpi); + return success; } /* Turn components into a human readable string. OPTIONS is the @@ -3913,6 +3956,36 @@ d_print_subexpr (struct d_print_info *dpi, int options, d_append_char (dpi, ')'); } +/* Return a shallow copy of the current list of templates. + On error d_print_error is called and a partial list may + be returned. Whatever is returned must be freed. */ + +static struct d_print_template * +d_copy_templates (struct
[patch] Fix PR bootstrap/58509
Hi, this fixes the ICE during the build of the Ada runtime on the SPARC, a fallout of the recent inliner changes: http://gcc.gnu.org/ml/gcc-patches/2013-09/msg01033.html The ICE is triggered because the ldd peephole merges an MEM with MEM_NOTRAP_P and a contiguous MEM without MEM_NOTRAP_P, keeping the MEM_NOTRAP_P flag on the result. As a consequence, an EH edge is eliminated and a BB is orphaned. I think this shows that my above inliner patch was too gross: when you have successive inlining, you can quickly end up with a mess of trapping and non- trapping memory accesses for the same object. So the attached seriously refines it, restricting it to parameters with reference type and leaning towards being less conservative. Again, this should only affect Ada. Tested on x86_64-suse-linux, OK for the mainline? 2013-09-27 Eric Botcazou ebotca...@adacore.com PR bootstrap/58509 * ipa-prop.h (get_ancestor_addr_info): Declare. * ipa-prop.c (get_ancestor_addr_info): Make public. * tree-inline.c (is_parm): Rename into... (is_ref_parm): ...this. (is_based_on_ref_parm): New predicate. (remap_gimple_op_r): Do not propagate TREE_THIS_NOTRAP on MEM_REF if a parameter with reference type has been remapped and the result is not based on another parameter with reference type. (copy_tree_body_r): Likewise on INDIRECT_REF and MEM_REF. 2013-09-27 Eric Botcazou ebotca...@adacore.com * gnat.dg/specs/opt1.ads: New test. -- Eric BotcazouIndex: tree-inline.c === --- tree-inline.c (revision 202912) +++ tree-inline.c (working copy) @@ -751,10 +751,11 @@ copy_gimple_bind (gimple stmt, copy_body return new_bind; } -/* Return true if DECL is a parameter or a SSA_NAME for a parameter. */ +/* Return true if DECL is a parameter with reference type or a SSA_NAME + for a parameter with reference type. 
*/ static bool -is_parm (tree decl) +is_ref_parm (tree decl) { if (TREE_CODE (decl) == SSA_NAME) { @@ -763,7 +764,40 @@ is_parm (tree decl) return false; } - return (TREE_CODE (decl) == PARM_DECL); + return (TREE_CODE (decl) == PARM_DECL + TREE_CODE (TREE_TYPE (decl)) == REFERENCE_TYPE); +} + +/* Return true if DECL is based on a parameter with reference type or a + SSA_NAME for a parameter with with reference type. */ + +static bool +is_based_on_ref_parm (tree decl) +{ + HOST_WIDE_INT offset; + tree obj, expr; + gimple def_stmt; + + /* First the easy case. */ + if (is_ref_parm (decl)) +return true; + + /* Then look for an SSA name whose defining statement is of the form: + + D.1718_7 = parm_2(D)-f1; + + where parm_2 is a parameter with reference type. */ + if (TREE_CODE (decl) != SSA_NAME) +return false; + def_stmt = SSA_NAME_DEF_STMT (decl); + if (!def_stmt) +return false; + + expr = get_ancestor_addr_info (def_stmt, obj, offset); + if (!expr) +return false; + + return is_ref_parm (TREE_OPERAND (expr, 0)); } /* Remap the GIMPLE operand pointed to by *TP. DATA is really a @@ -865,12 +899,13 @@ remap_gimple_op_r (tree *tp, int *walk_s TREE_THIS_VOLATILE (*tp) = TREE_THIS_VOLATILE (old); TREE_SIDE_EFFECTS (*tp) = TREE_SIDE_EFFECTS (old); TREE_NO_WARNING (*tp) = TREE_NO_WARNING (old); - /* We cannot propagate the TREE_THIS_NOTRAP flag if we have - remapped a parameter as the property might be valid only - for the parameter itself. */ + /* We cannot always propagate the TREE_THIS_NOTRAP flag if we have + remapped a parameter with reference type as the property may be + valid only for the parameter. 
*/ if (TREE_THIS_NOTRAP (old) - (!is_parm (TREE_OPERAND (old, 0)) - || (!id-transform_parameter is_parm (ptr + (!is_ref_parm (TREE_OPERAND (old, 0)) + || !id-transform_parameter + || is_based_on_ref_parm (ptr))) TREE_THIS_NOTRAP (*tp) = 1; *walk_subtrees = 0; return NULL; @@ -1092,12 +1127,13 @@ copy_tree_body_r (tree *tp, int *walk_su TREE_THIS_VOLATILE (*tp) = TREE_THIS_VOLATILE (old); TREE_SIDE_EFFECTS (*tp) = TREE_SIDE_EFFECTS (old); TREE_READONLY (*tp) = TREE_READONLY (old); - /* We cannot propagate the TREE_THIS_NOTRAP flag if we - have remapped a parameter as the property might be - valid only for the parameter itself. */ + /* We cannot always propagate the TREE_THIS_NOTRAP flag + if we have remapped a parameter with reference type as + the property may be valid only for the parameter. */ if (TREE_THIS_NOTRAP (old) - (!is_parm (TREE_OPERAND (old, 0)) - || (!id-transform_parameter is_parm (ptr + (!is_ref_parm (TREE_OPERAND (old, 0)) + || !id-transform_parameter + || is_based_on_ref_parm (ptr))) TREE_THIS_NOTRAP (*tp) = 1; } } @@ -1118,12 +1154,13 @@ copy_tree_body_r (tree *tp, int
Re: [Patch] Let ordinary escaping in POSIX regex be valid
On 9/27/13 4:34 AM, Jonathan Wakely wrote:

On 27 September 2013 03:15, Tim Shen wrote:

POSIX ERE says that escaping an ordinary char, say R\n is not permitted, because 'n' is not a special char. However, they also say that: Implementations are permitted to extend the language to allow these. Conforming applications cannot use such constructs. So let's support it so as not to surprise users. Booted and tested under -m32 and -m64

I'm wondering whether we want to have a stricter mode that doesn't allow them, to help users avoid creating non-portable programs. We could check the value of the preprocessor macro __STRICT_ANSI__, which is set by -std=c++11 but not by -std=gnu++11, although that's not really the right flag. We want something more like the GNU shell utils' POSIXLY_CORRECT.

Indeed. I think that for now __STRICT_ANSI__ can do. It's important to manage to accept those; otherwise, as we discovered yesterday, we easily reject quite a few rather sensible regexes that users can write or find in examples. This started when Tim, upon my suggestion, tried the examples in the new edition of Nicolai Josuttis' book and found in one of those an escaped closing curly bracket (note: closing; opening ones are definitely fine), which apparently most of the other implementations do not reject.

Paolo.
Re: OMP4/cilkplus: simd clone function mangling
On 09/27/13 03:18, Richard Biener wrote:

On Thu, Sep 26, 2013 at 9:35 PM, Aldy Hernandez al...@redhat.com wrote:

+  /* To distinguish from an OpenMP simd clone, Cilk Plus functions to
+     be cloned have a distinctive artificial label in addition to "omp
+     declare simd".  */
+  bool cilk_clone = flag_enable_cilkplus
+    && lookup_attribute ("cilk plus elemental",
+                         DECL_ATTRIBUTES (new_node->symbol.decl));
+  if (cilk_clone)
+    remove_attribute ("cilk plus elemental",
+                      DECL_ATTRIBUTES (new_node->symbol.decl));

Oh yeah, rth had asked me why I remove the attribute. My initial thoughts were that whether or not a function is a simd clone can be accessed through the cgraph bits (node->simdclone != NULL for the clone, and node->has_simd_clones for the parent). No sense keeping the attribute. But I can leave it if you think it's better.

Why have it in the first place if it's marked in the cgraph?

It would be placed there by the front-end when parsing Cilk Plus simd-enabled functions. It's only in the omp stage that we transfer that information to the cgraph bits.
Re: [Patch] Let ordinary escaping in POSIX regex be valid
On 27 September 2013 13:32, Paolo Carlini wrote:

On 9/27/13 4:34 AM, Jonathan Wakely wrote:

On 27 September 2013 03:15, Tim Shen wrote:

POSIX ERE says that escaping an ordinary char, say R\n is not permitted, because 'n' is not a special char. However, they also say that: Implementations are permitted to extend the language to allow these. Conforming applications cannot use such constructs. So let's support it so as not to surprise users. Booted and tested under -m32 and -m64

I'm wondering whether we want to have a stricter mode that doesn't allow them, to help users avoid creating non-portable programs. We could check the value of the preprocessor macro __STRICT_ANSI__, which is set by -std=c++11 but not by -std=gnu++11, although that's not really the right flag. We want something more like the GNU shell utils' POSIXLY_CORRECT.

Indeed. I think that for now __STRICT_ANSI__ can do. It's important to manage to accept those; otherwise, as we discovered yesterday, we easily reject quite a few rather sensible regexes that users can write or find in examples. This started when Tim, upon my suggestion, tried the examples in the new edition of Nicolai Josuttis' book and found in one of those an escaped closing curly bracket (note: closing; opening ones are definitely fine), which apparently most of the other implementations do not reject.

Ah I see. I definitely agree it's good to accept that instead of being unnecessarily strict, but other people will want the option of strict conformance, so I think we can please everyone with something like:

        else
          {
    #ifdef __STRICT_ANSI__
            __throw_regex_error(regex_constants::error_escape);
    #else
            _M_token = _S_token_ord_char;
            _M_value.assign(1, __c);
    #endif
          }
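To make the trade-off discussed in this thread concrete, here is a minimal standalone sketch of the idea: an escaped ordinary character is either rejected (strict mode) or accepted as that literal character (lenient mode). This is not the libstdc++ scanner; Token, is_ere_special and classify_escape are hypothetical names, the strictness is a run-time flag rather than a macro check, and the set of special characters is my reading of the POSIX ERE rules.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical token kinds for an escaped character.
enum class Token { OrdinaryChar, SpecialEscape };

// Characters assumed to be special in POSIX ERE outside bracket expressions.
// Note that '}' is not in the set, while '{' is.
bool is_ere_special(char c) {
  return std::string("^.[$()|*+?{\\").find(c) != std::string::npos;
}

// Classify the character following a backslash.
// strict == true  : escaping an ordinary character is an error.
// strict == false : the escape is accepted and yields the character itself.
Token classify_escape(char c, bool strict) {
  if (is_ere_special(c))
    return Token::SpecialEscape;
  if (strict)
    throw std::invalid_argument("escape of an ordinary character");
  return Token::OrdinaryChar;
}

// Helper for callers that only want to know whether strict mode rejects c.
bool strict_rejects(char c) {
  try {
    classify_escape(c, true);
    return false;
  } catch (const std::invalid_argument&) {
    return true;
  }
}
```

In this sketch an escaped '}' is accepted in lenient mode and rejected in strict mode, while an escaped '{' is accepted in both, which matches the case described above.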
[committed] Fix move_sese_region_to_fn (PR middle-end/58551)
Hi! I've committed the following fix to a regression introduced in 4.9 early loop construction. SESE regions, as documented above move_sese_region_to_fn, are allowed to contain calls to noreturn functions like abort/exit. But, basic blocks leading to noreturn functions aren't actually placed in the loop inside of which the SESE region is present, but directly inside of the outermost loop of the function. So, we can't just move change loop_father of bb's belonging to entry_bb's loop_father to new function's outermost loop and move loops which have their outer loop equal to entry_bb's loop_father and have their header in the SESE region into the new function, but we also have to handle the same way the outermost loop of the original function. Bootstrapped/regtested on x86_64-linux and i686-linux, preapproved by richi on IRC, committed to trunk. 2013-09-27 Jakub Jelinek ja...@redhat.com PR middle-end/58551 * tree-cfg.c (move_sese_region_to_fn): Also move loops that are children of outermost saved_cfun's loop, and set it up to be moved to dest_cfun's outermost loop. Fix up num_nodes adjustments if loop != loop0 and SESE region contains bbs that belong to loop0. * c-c++-common/gomp/pr58551.c: New test. --- gcc/tree-cfg.c.jj 2013-09-13 14:41:28.0 +0200 +++ gcc/tree-cfg.c 2013-09-27 12:23:48.582217401 +0200 @@ -6662,12 +6662,13 @@ move_sese_region_to_fn (struct function struct function *saved_cfun = cfun; int *entry_flag, *exit_flag; unsigned *entry_prob, *exit_prob; - unsigned i, num_entry_edges, num_exit_edges; + unsigned i, num_entry_edges, num_exit_edges, num_nodes; edge e; edge_iterator ei; htab_t new_label_map; struct pointer_map_t *vars_map, *eh_map; struct loop *loop = entry_bb-loop_father; + struct loop *loop0 = get_loop (saved_cfun, 0); struct move_stmt_d d; /* If ENTRY does not strictly dominate EXIT, this cannot be an SESE @@ -6760,16 +6761,29 @@ move_sese_region_to_fn (struct function set_loops_for_fn (dest_cfun, loops); /* Move the outlined loop tree part. 
*/ + num_nodes = bbs.length (); FOR_EACH_VEC_ELT (bbs, i, bb) { - if (bb-loop_father-header == bb - loop_outer (bb-loop_father) == loop) + if (bb-loop_father-header == bb) { struct loop *this_loop = bb-loop_father; - flow_loop_tree_node_remove (bb-loop_father); - flow_loop_tree_node_add (get_loop (dest_cfun, 0), this_loop); - fixup_loop_arrays_after_move (saved_cfun, cfun, this_loop); + struct loop *outer = loop_outer (this_loop); + if (outer == loop + /* If the SESE region contains some bbs ending with +a noreturn call, those are considered to belong +to the outermost loop in saved_cfun, rather than +the entry_bb's loop_father. */ + || outer == loop0) + { + if (outer != loop) + num_nodes -= this_loop-num_nodes; + flow_loop_tree_node_remove (bb-loop_father); + flow_loop_tree_node_add (get_loop (dest_cfun, 0), this_loop); + fixup_loop_arrays_after_move (saved_cfun, cfun, this_loop); + } } + else if (bb-loop_father == loop0 loop0 != loop) + num_nodes--; /* Remove loop exits from the outlined region. */ if (loops_for_fn (saved_cfun)-exits) @@ -6789,6 +6803,7 @@ move_sese_region_to_fn (struct function /* Setup a mapping to be used by move_block_to_fn. */ loop-aux = current_loops-tree_root; + loop0-aux = current_loops-tree_root; pop_cfun (); @@ -6817,11 +6832,13 @@ move_sese_region_to_fn (struct function } loop-aux = NULL; + loop0-aux = NULL; /* Loop sizes are no longer correct, fix them up. 
*/ - loop-num_nodes -= bbs.length (); + loop-num_nodes -= num_nodes; for (struct loop *outer = loop_outer (loop); outer; outer = loop_outer (outer)) -outer-num_nodes -= bbs.length (); +outer-num_nodes -= num_nodes; + loop0-num_nodes -= bbs.length () - num_nodes; if (saved_cfun-has_simduid_loops || saved_cfun-has_force_vect_loops) { --- gcc/testsuite/c-c++-common/gomp/pr58551.c.jj2013-09-27 11:18:20.825251967 +0200 +++ gcc/testsuite/c-c++-common/gomp/pr58551.c 2013-09-27 11:17:56.0 +0200 @@ -0,0 +1,33 @@ +/* PR middle-end/58551 */ +/* { dg-do compile } */ +/* { dg-options -O0 -fopenmp } */ + +void +foo (int *a) +{ + int i; + for (i = 0; i 8; i++) +#pragma omp task +if (a[i]) + __builtin_abort (); +} + +void bar (int, int); + +void +baz (int *a) +{ + int i; + for (i = 0; i 8; i++) +#pragma omp task +if (a[i]) + { + int j, k; + for (j = 0; j 10; j++) + for (k = 0; k 8; k++) + bar (j, k); + for (k = 0; k 12; k++) + bar (-1, k); + __builtin_abort (); + } +} Jakub
[patch] fix libstdc++/57465
	PR libstdc++/57465
	* include/std/functional
	(_Function_base::_Base_manager::_M_not_empty_function): Fix overload
	for pointers.
	* testsuite/20_util/function/cons/57465.cc: New.

Tested x86_64-linux, committed to trunk. I'll apply it to the branches after it's been on trunk without problems for a while.

commit 55531e9c74a5f2b4699250b6b302d49f7dc8c5ae
Author: Jonathan Wakely jwakely@gmail.com
Date: Wed Aug 7 01:38:39 2013 +0100

    PR libstdc++/57465
    * include/std/functional
    (_Function_base::_Base_manager::_M_not_empty_function): Fix overload
    for pointers.
    * testsuite/20_util/function/cons/57465.cc: New.

diff --git a/libstdc++-v3/include/std/functional b/libstdc++-v3/include/std/functional
index 63ba777..73cddfe 100644
--- a/libstdc++-v3/include/std/functional
+++ b/libstdc++-v3/include/std/functional
@@ -1932,7 +1932,7 @@ _GLIBCXX_HAS_NESTED_TYPE(result_type)

 	template<typename _Tp>
 	  static bool
-	  _M_not_empty_function(const _Tp* __fp)
+	  _M_not_empty_function(_Tp* const __fp)
 	  { return __fp; }

 	template<typename _Class, typename _Tp>
diff --git a/libstdc++-v3/testsuite/20_util/function/cons/57465.cc b/libstdc++-v3/testsuite/20_util/function/cons/57465.cc
new file mode 100644
index 000..44413fb
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/function/cons/57465.cc
@@ -0,0 +1,31 @@
+// Copyright (C) 2013 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+//
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+//
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+// libstdc++/57465
+
+// { dg-options "-std=gnu++11" }
+
+#include <functional>
+#include <testsuite_hooks.h>
+
+int main()
+{
+  using F = void();
+  F* f = nullptr;
+  std::function<F> x(f);
+  VERIFY( !x );
+}
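A small standalone sketch of what the test above checks, plus an illustration of the overload shape used by the fix. The not_empty overloads are hypothetical stand-ins written for this example, not the libstdc++ internals; only the std::function behaviour (a null function pointer yields an empty wrapper) is taken from the standard.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-ins for the "is this target non-empty?" helpers.
// Generic case: any callable object counts as non-empty.
template<typename T>
bool not_empty(const T&) { return true; }

// Pointer case, written as `T* const` so that T can deduce as a function
// type such as void(); a null pointer is reported as empty.
template<typename T>
bool not_empty(T* const p) { return p != nullptr; }

void hello() {}

// User-visible behaviour: std::function built from a function pointer is
// empty exactly when that pointer is null.
bool wraps_target(void (*fp)()) {
  std::function<void()> f(fp);
  return static_cast<bool>(f);
}

// Small usage example.
void example_function_empty() {
  void (*null_fp)() = nullptr;
  assert(!not_empty(null_fp));
  assert(not_empty(&hello));
  assert(!wraps_target(null_fp));
  assert(wraps_target(&hello));
}
```

The pointer overload is selected for function pointers because it is the more specialized of the two templates; for non-pointer callables only the generic one is viable.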
[PATCH] Invalid unpoisoning of stack redzones on ARM
Hi all,

I've recently submitted a bug report regarding invalid unpoisoning of stack frame redzones (http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58543). Could someone take a look at the proposed patch (a simple one-liner) and check whether it's ok for commit?

Thanks!

-Yuri

diff --git a/gcc/asan.c b/gcc/asan.c
index 32f1837..acb00ea 100644
--- a/gcc/asan.c
+++ b/gcc/asan.c
@@ -895,7 +895,7 @@ asan_clear_shadow (rtx shadow_mem, HOST_WIDE_INT len)

   gcc_assert ((len & 3) == 0);
   top_label = gen_label_rtx ();
-  addr = force_reg (Pmode, XEXP (shadow_mem, 0));
+  addr = copy_to_reg (force_reg (Pmode, XEXP (shadow_mem, 0)));
   shadow_mem = adjust_automodify_address (shadow_mem, SImode, addr, 0);
   end = force_reg (Pmode, plus_constant (Pmode, addr, len));
   emit_label (top_label);
Re: [PATCH] Sanitize block partitioning under -freorder-blocks-and-partition
On Thu, Sep 26, 2013 at 3:02 PM, Jan Hubicka hubi...@ucw.cz wrote:

Why not just have probably_never_executed_bb_p return simply return
false bb->frequency is non-zero (right now it does the opposite -

We want to have frequencies guessed for functions that was not trained
in the profiling run (that was patch I posted earlier that I think did
not go in, yet).

Right, but for splitting and bb layout purposes, for these statically
guessed unprofiled functions we in fact don't want to do any splitting
or treat the bbs as never executed (which shouldn't be a change from
the status quo since all the bbs in these functions are currently 0
weight, it's only when we inline in the case of comdats that they
appear colder than the surrounding code, but in fact we don't want
this).

The only other caller to probably_never_executed_bb_p is
compute_function_frequency, but in the case of statically guessed
functions they will have profile_status != PROFILE_READ and won't
invoke probably_never_executed_bb_p. But re-reading our most recent
exchange on the comdat profile issue, it sounds like you were
suggesting guessing profiles for all 0-weight functions early, then
dropping them from PROFILE_READ to PROFILE_GUESSED only once we
determine in ipa-inline that there is a potentially non-zero call path
to them. In that case with the change I describe above to
probably_never_executed_bb_p, the 0-weight functions with 0 calls to
them will incorrectly be marked as NODE_FREQUENCY_NORMAL, which would
be bad as they would not be size optimized or moved into the cold
section.

So it seems like we want different handling of these guessed
frequencies in compute_function_frequency and bb-reorder.c. Actually I
think we can handle this by checking if the function entry block has a
0 count. If so, then we just look at the bb counts and not the
frequencies for determining bb hotness as the frequencies would
presumably have been statically-guessed. This will ensure that the
cgraph node continues to be marked unlikely and size-optimized. If the
function entry block has a non-zero count, then we look at both the bb
count and the bb frequency - if they are both zero then the bb is
probably never executed, but if either is non-zero then we should
treat the block as possibly executed (which will come into play for
splitting and bb layout).

Teresa

Currently I return true when frequency indicate that BB is executed at
least in 1/4th of all executions. With the cases discussed I see we
may need to reduce this threshold. In general I do not like much hard
tests for 0 because meaning of 0 depends on REG_BR_FREQ_BASE that is
supposed to be changeable and we may want to make frequencies sreal,
too.

I suppose we may introduce --param for this. You are also right that I
should update probably_never_executed_edge_p (I intended so, but
obviously the code ended up in mainline accidentally).

I however saw at least one case of jump threading where this trick did
not help: the jump threading update confused itself by scaling via
counts rather than frequencies and ended up with dropping everything
to 0. This makes it more tempting to try to go with sreals for those

Honza

returns true when bb->frequency is 0)? Making this change removed a
bunch of other failures. With this change as well, there are only 3
cases that still fail with 1 train run that pass with 100. Need to
look at those.

Will you look into logic of do_jump or shall I try to dive in?

I can take a look, but probably won't have a chance until late this
week. If you don't get to it before then I will see if I can figure
out why it is applying the branch probabilities this way.

Teresa

Honza

-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
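
The rule Teresa proposes above (consult only counts when the function entry block has a zero count, otherwise require both count and frequency to be zero) can be sketched roughly as follows. This is an illustration under stated assumptions, not GCC code: BlockProfile, probably_never_executed and self_check are hypothetical names, and the real basic_block and profile structures are considerably richer.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified view of one basic block's profile data.
struct BlockProfile {
  std::int64_t count;  // execution count read from the training run
  int frequency;       // estimated frequency, possibly statically guessed
};

// Sketch of the proposed test.  entry_count is the count of the
// function's entry block.
bool probably_never_executed(std::int64_t entry_count, const BlockProfile& bb)
{
  // Entry count of zero: the frequencies were presumably guessed
  // statically, so only the counts are trusted.
  if (entry_count == 0)
    return bb.count == 0;

  // Otherwise the block is cold only if both measures agree on zero.
  return bb.count == 0 && bb.frequency == 0;
}

void self_check()
{
  // Unprofiled function with guessed frequencies stays "never executed".
  assert(probably_never_executed(0, BlockProfile{0, 500}));
  // Profiled function: a non-zero frequency keeps the block alive.
  assert(!probably_never_executed(100, BlockProfile{0, 500}));
  assert(probably_never_executed(100, BlockProfile{0, 0}));
  assert(!probably_never_executed(100, BlockProfile{5, 0}));
}
```

Note that Honza's reply argues against hard tests for zero and suggests a threshold controlled by a --param instead; the sketch deliberately shows only the simpler zero test described in Teresa's message.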
Re: OMP4/cilkplus: simd clone function mangling
On Thu, Sep 26, 2013 at 02:31:33PM -0500, Aldy Hernandez wrote:

--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -42806,6 +42806,43 @@ ix86_memmodel_check (unsigned HOST_WIDE_INT val)
   return val;
 }
+/* Return the default vector mangling ISA code when none is specified
+   in a `processor' clause.  */
+
+static char
+ix86_cilkplus_default_vector_mangling_isa_code (struct cgraph_node *clone
+						ATTRIBUTE_UNUSED)
+{
+  return 'x';
+}

I think rth was suggesting using vecsize_mangle, vecsize_modifier or
something else, instead of ISA, because it won't represent the ISA on
all targets. It is just some magic letter used in mangling of the simd
functions.

+
+  /* To distinguish from an OpenMP simd clone, Cilk Plus functions to
+     be cloned have a distinctive artificial label in addition to omp
+     declare simd.  */
+  bool cilk_clone = flag_enable_cilkplus
+    && lookup_attribute ("cilk plus elemental",
+			 DECL_ATTRIBUTES (new_node->symbol.decl));

Formatting. I'd say it should be

  bool cilk_clone = (flag_enable_cilkplus
		     && lookup_attribute ("cilk plus elemental",
					  DECL_ATTRIBUTES (new_node->symbol.decl)));

+  if (cilk_clone)
+    remove_attribute ("cilk plus elemental",
+		      DECL_ATTRIBUTES (new_node->symbol.decl));

I think it doesn't make sense to remove the attribute.

+  pretty_printer vars_pp;

Do you really need two different pretty printers? Can't you just print
"_ZGV%c%c%d" into pp (is pp_printf that cheap, wouldn't it be better to
pp_string (pp, "_ZGV"), 2 pp_character + one pp_decimal_int?), and then
do the loop over the args, which right now writes into vars_pp and
finally pp_underscore and pp_string the normally mangled name?

+/* Create a simd clone of OLD_NODE and return it.  */
+
+static struct cgraph_node *
+simd_clone_create (struct cgraph_node *old_node)
+{
+  struct cgraph_node *new_node;
+  new_node = cgraph_function_versioning (old_node, vNULL, NULL, NULL, false,
+					 NULL, NULL, "simdclone");
+

My understanding of how IPA cloning etc. works is that you first set up
various data structures describing how you change the arguments and
only then actually do cgraph_function_versioning which already during
the copying will do some of the transformations of the IL. But perhaps
those transformations are too complicated to describe for tree-inline.c
to make them for you.

+  tree attr = lookup_attribute ("omp declare simd",
+				DECL_ATTRIBUTES (node->symbol.decl));
+  if (!attr)
+    return;
+  do
+    {
+      struct cgraph_node *new_node = simd_clone_create (node);
+
+      bool inbranch_clause;
+      simd_clone_clauses_extract (new_node, TREE_VALUE (attr),
+				  &inbranch_clause);
+      simd_clone_compute_isa_and_simdlen (new_node);
+      simd_clone_mangle (node, new_node);

As discussed on IRC, I was hoping that for OpenMP simd and selected
targets (e.g. i?86-linux and x86_64-linux) we could do better than
that, creating not just one or two clones as we do for Cilk+ where one
can select which CPU (and thus ISA) he wants to build the clones for,
but creating clones for all ISAs, and just based on command line
options either emit just one of them as the really optimized one and
the others just as thunks that would just call other simd clone
functions or the normal function possibly several times.

	Jakub
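
The single-buffer approach suggested in the review (emit the "_ZGV" prefix, two characters and a decimal, then the per-argument codes, then an underscore and the normally mangled name) can be sketched outside the compiler like this. The function name simd_clone_mangled_name, its parameters and the meaning assigned to the two characters ('M'/'N' for masked or unmasked clones) are assumptions made for illustration; the patch under review uses GCC's pretty_printer rather than std::string.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: build a simd clone name in one pass, appending to
// a single buffer instead of using a second printer for the arguments.
std::string simd_clone_mangled_name(char isa_code,      // target-chosen letter
                                    bool inbranch,      // assumed 'M' vs 'N'
                                    unsigned simdlen,   // number of lanes
                                    const std::string& arg_codes,
                                    const std::string& mangled_name)
{
  std::string out = "_ZGV";          // fixed prefix
  out += isa_code;                   // first %c
  out += inbranch ? 'M' : 'N';       // second %c (assumption)
  out += std::to_string(simdlen);    // %d
  out += arg_codes;                  // one code per argument, in order
  out += '_';                        // separator before the base name
  out += mangled_name;               // normally mangled function name
  return out;
}

void self_check()
{
  // "vv" stands for two vector arguments; the letter is illustrative.
  assert(simd_clone_mangled_name('x', false, 4, "vv", "_Z3fooii")
         == "_ZGVxN4vv__Z3fooii");
}
```

Appending in order makes the second buffer unnecessary, which is the point of the review comment.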
Re: [PATCH] Invalid unpoisoning of stack redzones on ARM
On Fri, Sep 27, 2013 at 06:10:41PM +0400, Yury Gribov wrote:

Hi all, I've recently submitted a bug report regarding invalid
unpoisoning of stack frame redzones
(http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58543). Could someone take
a look at proposed patch (a simple one-liner) and check whether it's ok
for commit?

Can you please be more verbose on why do you think it is the right fix,
what exactly is the problem and why force_reg wasn't sufficient?
What exactly was XEXP (shadow_mem, 0) that force_reg didn't force it
into a pseudo?

Also, you are missing a ChangeLog entry.

diff --git a/gcc/asan.c b/gcc/asan.c
index 32f1837..acb00ea 100644
--- a/gcc/asan.c
+++ b/gcc/asan.c
@@ -895,7 +895,7 @@ asan_clear_shadow (rtx shadow_mem, HOST_WIDE_INT len)
   gcc_assert ((len & 3) == 0);
   top_label = gen_label_rtx ();
-  addr = force_reg (Pmode, XEXP (shadow_mem, 0));
+  addr = copy_to_reg (force_reg (Pmode, XEXP (shadow_mem, 0)));
   shadow_mem = adjust_automodify_address (shadow_mem, SImode, addr, 0);
   end = force_reg (Pmode, plus_constant (Pmode, addr, len));
   emit_label (top_label);

	Jakub
Add value range support into memcpy/memset expansion
Hi,
this patch makes it possible to access value range info from
setmem/movstr that I plan to use in i386 memcpy/memset expansion code.
It is all quite straightforward except that I need to deal with cases
where the max size does not fit in HOST_WIDE_INT, where I use the
maximal value as a marker. It is then translated as a NULL pointer to
the expander, which is a bit inconsistent with other places that use -1
as the marker of an unknown value.

I also think we lose some cases because of TER replacing the SSA_NAME
by something else, but it seems to work in quite a few cases. This can
probably be tracked incrementally by disabling TER here or finally
getting away from expanding calls via the generic route.

Bootstrapped/regtested x86_64-linux, OK?

Honza

	* doc/md.texi (setmem, movstr): Update documentation.
	* builtins.c (determine_block_size): New function.
	(expand_builtin_memcpy): Use it and pass it to
	emit_block_move_hints.
	(expand_builtin_memset_args): Use it and pass it to
	set_storage_via_setmem.
	* expr.c (emit_block_move_via_movmem): Add min_size/max_size
	parameters; update call to expander.
	(emit_block_move_hints): Add min_size/max_size parameters.
	(clear_storage_hints): Likewise.
	(set_storage_via_setmem): Likewise.
	(clear_storage): Update.
	* expr.h (emit_block_move_hints, clear_storage_hints,
	set_storage_via_setmem): Update prototype.

Index: doc/md.texi
===
--- doc/md.texi	(revision 202968)
+++ doc/md.texi	(working copy)
@@ -5198,6 +5198,9 @@ destination and source strings are opera
 the expansion of this pattern should store in operand 0 the address in
 which the @code{NUL} terminator was stored in the destination string.
 
+This patern has also several optional operands that are same as in
+@code{setmem}.
+
 @cindex @code{setmem@var{m}} instruction pattern
 @item @samp{setmem@var{m}}
 Block set instruction.  The destination string is the first operand,
@@ -5217,6 +5220,8 @@ respectively.  The expected alignment di
 in a way that the blocks are not required to be aligned according to it in
 all cases. This expected alignment is also in bytes, just like operand 4.
 Expected size, when unknown, is set to @code{(const_int -1)}.
+Operand 7 is the minimal size of the block and operand 8 is the
+maximal size of the block (NULL if it can not be represented as CONST_INT).
 
 The use for multiple @code{setmem@var{m}} is as for @code{movmem@var{m}}.
 
Index: builtins.c
===
--- builtins.c	(revision 202968)
+++ builtins.c	(working copy)
@@ -3070,6 +3070,51 @@ builtin_memcpy_read_str (void *data, HOS
   return c_readstr (str + offset, mode);
 }
 
+/* LEN specify length of the block of memcpy/memset operation.
+   Figure out its range and put it into MIN_SIZE/MAX_SIZE.  */
+
+static void
+determine_block_size (tree len, rtx len_rtx,
+		      unsigned HOST_WIDE_INT *min_size,
+		      unsigned HOST_WIDE_INT *max_size)
+{
+  if (CONST_INT_P (len_rtx))
+    {
+      *min_size = *max_size = UINTVAL (len_rtx);
+      return;
+    }
+  else
+    {
+      double_int min, max;
+      if (TREE_CODE (len) == SSA_NAME
+	  && get_range_info (len, &min, &max) == VR_RANGE)
+	{
+	  if (min.fits_uhwi ())
+	    *min_size = min.to_uhwi ();
+	  else
+	    *min_size = 0;
+	  if (max.fits_uhwi ())
+	    *max_size = max.to_uhwi ();
+	  else
+	    *max_size = (HOST_WIDE_INT)-1;
+	}
+      else
+	{
+	  if (host_integerp (TYPE_MIN_VALUE (TREE_TYPE (len)), 1))
+	    *min_size = tree_low_cst (TYPE_MIN_VALUE (TREE_TYPE (len)), 1);
+	  else
+	    *min_size = 0;
+	  if (host_integerp (TYPE_MAX_VALUE (TREE_TYPE (len)), 1))
+	    *max_size = tree_low_cst (TYPE_MAX_VALUE (TREE_TYPE (len)), 1);
+	  else
+	    *max_size = GET_MODE_MASK (GET_MODE (len_rtx));
+	}
+    }
+  gcc_checking_assert (*max_size <=
+		       (unsigned HOST_WIDE_INT)
+			  GET_MODE_MASK (GET_MODE (len_rtx)));
+}
+
 /* Expand a call EXP to the memcpy builtin.
    Return NULL_RTX if we failed, the caller should emit a normal call,
    otherwise try to get the result in TARGET, if convenient (and in
@@ -3092,6 +3137,8 @@ expand_builtin_memcpy (tree exp, rtx tar
       rtx dest_mem, src_mem, dest_addr, len_rtx;
       HOST_WIDE_INT expected_size = -1;
       unsigned int expected_align = 0;
+      unsigned HOST_WIDE_INT min_size;
+      unsigned HOST_WIDE_INT max_size;
 
       /* If DEST is not a pointer type, call the normal function.  */
       if (dest_align == 0)
@@ -3111,6 +3158,7 @@ expand_builtin_memcpy (tree exp, rtx tar
       dest_mem = get_memory_rtx (dest, len);
       set_mem_align (dest_mem, dest_align);
       len_rtx = expand_normal (len);
+
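
To make the marker convention from the cover letter concrete, here is a small standalone sketch of narrowing a wide value range to host-word bounds. It is a simplification under stated assumptions and not the patch's code: Wide, SizeBounds, block_size_bounds and self_check are hypothetical names, a 64-bit host word is assumed, and the fallback to the type's own minimum and maximum is reduced to 0 and the mode mask.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical double-width unsigned value standing in for double_int.
struct Wide {
  std::uint64_t high;
  std::uint64_t low;
  bool fits_u64() const { return high == 0; }
};

// Bounds handed on to the expander.  max_size == UINT64_MAX is the
// marker for "maximum does not fit in a host word".
struct SizeBounds {
  std::uint64_t min_size;
  std::uint64_t max_size;
};

// Sketch: when a range is known, use it, falling back to 0 for an
// unrepresentable minimum and to the all-ones marker for an
// unrepresentable maximum.  Without a range, assume [0, mode_mask].
SizeBounds block_size_bounds(bool have_range, Wide lo, Wide hi,
                             std::uint64_t mode_mask)
{
  SizeBounds b{0, mode_mask};
  if (have_range) {
    b.min_size = lo.fits_u64() ? lo.low : 0;
    b.max_size = hi.fits_u64() ? hi.low : UINT64_MAX;
  }
  return b;
}

void self_check()
{
  SizeBounds known = block_size_bounds(true, Wide{0, 16}, Wide{0, 64},
                                       UINT64_MAX);
  assert(known.min_size == 16 && known.max_size == 64);

  // Upper bound needs more than 64 bits: the marker value is returned.
  SizeBounds huge = block_size_bounds(true, Wide{0, 8}, Wide{1, 0},
                                      UINT64_MAX);
  assert(huge.min_size == 8 && huge.max_size == UINT64_MAX);

  // No range information: fall back to the full range of the mode.
  SizeBounds none = block_size_bounds(false, Wide{0, 0}, Wide{0, 0},
                                      0xffffffffu);
  assert(none.min_size == 0 && none.max_size == 0xffffffffu);
}
```

A consumer has to test for the marker before treating max_size as a real bound, which is the inconsistency with the -1 convention that the cover letter mentions.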
Re: Generic tuning in x86-tune.def 1/2
On Fri, Sep 27, 2013 at 1:56 AM, Jan Hubicka hubi...@ucw.cz wrote:

Hi,
this is the second part of the generic tuning changes sanitizing the
tuning flags. This patch again is supposed to deal with the obvious
part only. I will send a separate patch for more changes.

The flags changed agree on all CPUs considered for generic (and their
optimization manuals) + amdfam10, core2 and Atom SLM.

I also added X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL to bobcat tuning,
since it seems like an obvious omission (after double checking in the
optimization manual) and dropped X86_TUNE_FOUR_JUMP_LIMIT for Bulldozer
cores. Implementation of this feature was always a bit weird and its
main purpose was to avoid terrible branch predictor degeneration on
the older AMD branch predictors.

I benchmarked both spec2k and 2k6 to verify there are no regressions.
Especially X86_TUNE_REASSOC_FP_TO_PARALLEL seems to bring nice
improvements in specfp benchmarks.

Bootstrapped/regtested x86_64-linux, will wait for comments and commit
it during the weekend. I will be happy to revisit any of the generic
tuning if regressions pop up.

Overall this patch also brings small code size improvements for
smaller loads/stores and less padding at -O2. Differences are sub 0.1%
however.

Honza

	* x86-tune.def (X86_TUNE_SSE_UNALIGNED_LOAD_OPTIMAL): Enable for
	generic.
	(X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL): Likewise.
	(X86_TUNE_FOUR_JUMP_LIMIT): Drop for generic and bulldozer.
	(X86_TUNE_PAD_RETURNS): Drop for newer AMD chips.

Can we drop generic on X86_TUNE_PAD_RETURNS?

	(X86_TUNE_AVOID_VECTOR_DECODE): Drop for generic.
	(X86_TUNE_REASSOC_FP_TO_PARALLEL): Enable for generic.

-- 
H.J.
Re: [ping] [PATCH] Silence an unused variable warning
On 13-09-27 4:55 AM, Dodji Seketeli wrote:

Let's CC Vladimir on this easy one. Cheers.

All targets I know have ELIMINABLE_REGS defined. Therefore it was not
caught before.

The patch is ok for me. Thanks.

Jan-Benedict Glaw jbg...@lug-owl.de wrote:

On Fri, 2013-09-20 20:51:37 +0200, Jan-Benedict Glaw jbg...@lug-owl.de wrote:

Hi!

With the VAX target, I see this warning:

g++ -c -g -O2 -DIN_GCC -DCROSS_DIRECTORY_STRUCTURE -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -fno-common -DHAVE_CONFIG_H -I. -I. -I../../../../gcc/gcc -I../../../../gcc/gcc/. -I../../../../gcc/gcc/../include -I../../../../gcc/gcc/../libcpp/include -I../../../../gcc/gcc/../libdecnumber -I../../../../gcc/gcc/../libdecnumber/dpd -I../libdecnumber -I../../../../gcc/gcc/../libbacktrace ../../../../gcc/gcc/lra-eliminations.c -o lra-eliminations.o
../../../../gcc/gcc/lra-eliminations.c: In function ‘void init_elim_table()’:
../../../../gcc/gcc/lra-eliminations.c:1162:8: warning: unused variable ‘value_p’ [-Wunused-variable]
   bool value_p;
        ^
[...]
Re: [PATCH] Make jump thread path carry more information
On 09/27/2013 08:42 AM, James Greenhalgh wrote:

On Thu, Sep 26, 2013 at 04:26:35AM +0100, Jeff Law wrote:

Bootstrapped and regression tested on x86_64-unknown-linux-gnu.
Installed on trunk.

Hi Jeff,

This patch caused a regression on Arm and AArch64 in:

PASS->FAIL: gcc.c-torture/execute/memcpy-2.c execution, -O3 -fomit-frame-pointer

From what I can see, the only place the behaviour of the threader has
changed is in this hunk:

Yes. The old code was dropping the tail off the thread path; if we're
seeing failures on the ARM port as a result of fixing that goof we
obviously need to address them.

Let me take a looksie :-) If you could pass along a .i file it'd be
helpful in case I want to look at something under the debugger.

jeff
Re: Generic tuning in x86-tune.def 1/2
On Fri, Sep 27, 2013 at 1:56 AM, Jan Hubicka hubi...@ucw.cz wrote:

Hi,
this is the second part of the generic tuning changes sanitizing the
tuning flags. This patch again is supposed to deal with the obvious
part only. I will send a separate patch for more changes.

The flags changed agree on all CPUs considered for generic (and their
optimization manuals) + amdfam10, core2 and Atom SLM.

I also added X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL to bobcat tuning,
since it seems like an obvious omission (after double checking in the
optimization manual) and dropped X86_TUNE_FOUR_JUMP_LIMIT for Bulldozer
cores. Implementation of this feature was always a bit weird and its
main purpose was to avoid terrible branch predictor degeneration on
the older AMD branch predictors.

I benchmarked both spec2k and 2k6 to verify there are no regressions.
Especially X86_TUNE_REASSOC_FP_TO_PARALLEL seems to bring nice
improvements in specfp benchmarks.

Bootstrapped/regtested x86_64-linux, will wait for comments and commit
it during the weekend. I will be happy to revisit any of the generic
tuning if regressions pop up.

Overall this patch also brings small code size improvements for
smaller loads/stores and less padding at -O2. Differences are sub 0.1%
however.

Honza

	* x86-tune.def (X86_TUNE_SSE_UNALIGNED_LOAD_OPTIMAL): Enable for
	generic.
	(X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL): Likewise.
	(X86_TUNE_FOUR_JUMP_LIMIT): Drop for generic and bulldozer.
	(X86_TUNE_PAD_RETURNS): Drop for newer AMD chips.

Can we drop generic on X86_TUNE_PAD_RETURNS?

It is on my list for not-so-obvious changes. I tested and removed it
from BDVER with the intention to drop it from generic. But after
further testing I lean towards keeping it for some extra time. I
tested it on fam10 machines and it causes over 10% regressions on some
benchmarks, including bzip and botan (where it is up to a 4-fold
regression). Missing a return on amdfam10 hardware is bad, because it
causes the return stack to go out of sync.

At the same time I can not really measure benefits from disabling it -
the code size cost is very small and the runtime cost on non-amdfam10
cores is not important either, since the function call overhead hides
the extra nop quite easily.

So I am inclined to apply extra care on this flag and keep it for an
extra release or two. Most of gcc.opensuse.org testing runs on these
and adding random branch mispredictions will trash them.

On a related note, what would you think of
X86_TUNE_PARTIAL_FLAG_REG_STALL? I benchmarked it on my I5 notebook
and it seems to have no measurable effects on spec2k6.

I also did some benchmarking of the patch to disable alignments you
proposed. Unfortunately I can measure slowdowns on fam10/bdver and on
botan/hand written loops even for core.

I am considering dropping the branch target/function alignment and
keeping only loop alignment, but I did not test this yet.

Honza