Fix gcc.dg/lower-subreg-1.c failure (was: [C Patch]: pr52543)
From: Richard Sandiford rdsandif...@googlemail.com Date: Tue, 1 May 2012 16:46:38 +0200 To repeat: as things stand, very few targets define proper rtx costs for SET. IMHO it's wrong to start blaming targets when rtx_cost doesn't take the mode in account in the first place, for the default cost. (Well, except for the modes-tieable subreg special-case.) The targets where an operation in N * word_mode costs no more than one in word_mode, if there even is one, is a minority, let's adjust the defaults to that. This patch is therefore expected to prevent lower-subreg from running in cases where it's actually benefical. If you see that happening, please check whether the rtx_costs are defined properly. Well, for CRIS (one of the targets of the PR53176 fallout) they are sane, basically. Where cris_rtx_costs returns true, it returns mostly(*) ballparkly-correct costs *where it's passed an rtx for which there's a corresponding insn*, otherwise falling back to the defaults. It shouldn't have to check for validity of the rtx asked about; core GCC already knows which insns there are and can gate that in rtx_cost or its callers. (*) I see a bug in that cris_rtx_costs doesn't check the mode for extendsidi2, to return COSTS_N_INSNS (3) instead of 0 (because a sign-extending move to SImode doesn't cost more than a move; a sign- or zero-extension is also free in an operand for addition and multiplication). But this isn't on the path that lower-subreg.c takes, so only an incidental observation... Of course, if the costs are defined properly and lower-subreg still makes the wrong choice, we need to look at why. By the way, regarding validity of rtx_cost calls: +++ gcc/lower-subreg.c 2012-05-01 09:46:48.473830772 +0100 +/* Return the cost of a CODE shift in mode MODE by OP1 bits, using the + rtxes in RTXES. SPEED_P selects between the speed and size cost. */ + +static int +shift_cost (bool speed_p, struct cost_rtxes *rtxes, enum rtx_code code, + enum machine_mode mode, int op1) +{ + PUT_MODE (rtxes-target, mode); + PUT_CODE (rtxes-shift, code); + PUT_MODE (rtxes-shift, mode); + PUT_MODE (rtxes-source, mode); + XEXP (rtxes-shift, 1) = GEN_INT (op1); + SET_SRC (rtxes-set) = rtxes-shift; + return insn_rtx_cost (rtxes-set, speed_p); +} + +/* For each X in the range [0, BITS_PER_WORD), set SPLITTING[X] + to true if it is profitable to split a double-word CODE shift + of X + BITS_PER_WORD bits. SPEED_P says whether we are testing + for speed or size profitability. + + Use the rtxes in RTXES to calculate costs. WORD_MOVE_ZERO_COST is + the cost of moving zero into a word-mode register. WORD_MOVE_COST + is the cost of moving between word registers. */ + +static void +compute_splitting_shift (bool speed_p, struct cost_rtxes *rtxes, +bool *splitting, enum rtx_code code, +int word_move_zero_cost, int word_move_cost) +{ I think there should be a gating check whether the target implements that kind of shift in that mode at all, before checking the cost. Not sure whether it's generally best to put that test here, or to make the rtx_cost function return the cost of a libcall for that mode when that happens. Similar for the other insns. Isn't the below better than doing virtually the same in each target's rtx_costs? Not tested yet besides make cc1 and checking that lower-subreg.c yields sane costs and that gcc.dg/lower-subreg-1.c passes for cris-elf. Note that untieable SUBREGs still get a higher cost than tieable ones. I'll test this for cris-elf, please tell me if/what other tests and targets are required (simulator or compilefarm targets only, please). * rtlanal.c (rtx_cost): Adjust default cost for X with a UNITS_PER_WORD factor for all X according to the size of its mode, not just for SUBREGs with untieable modes. Index: gcc/rtlanal.c === --- gcc/rtlanal.c (revision 187308) +++ gcc/rtlanal.c (working copy) @@ -3755,10 +3755,17 @@ rtx_cost (rtx x, enum rtx_code outer_cod enum rtx_code code; const char *fmt; int total; + int factor; if (x == 0) return 0; + /* A size N times larger than UNITS_PER_WORD likely needs N times as + many insns, taking N times as long. */ + factor = GET_MODE_SIZE (GET_MODE (x)) / UNITS_PER_WORD; + if (factor == 0) +factor = 1; + /* Compute the default costs of certain things. Note that targetm.rtx_costs can override the defaults. */ @@ -3766,20 +3773,27 @@ rtx_cost (rtx x, enum rtx_code outer_cod switch (code) { case MULT: - total = COSTS_N_INSNS (5); + total = factor * COSTS_N_INSNS (5); break; case DIV: case UDIV: case MOD: case UMOD: - total = COSTS_N_INSNS (7); + total = factor * COSTS_N_INSNS (7); break; case USE: /* Used in combine.c as a
Re: [PATCH] MIPS16: Fix truncated DWARF-2 line information
On Tue, 8 May 2012, Richard Sandiford wrote: Are you using a hard-float multilib for your -mabi=32/-mips16 Linux testing? Yeah. As an example: http://gcc.gnu.org/ml/gcc-testresults/2012-03/msg00393.html which doesn't look to bad. Clean fortran results, which I expect would test the FP interworking fairly heavily. (It's certainly been a source of bug fixes in the past, although I don't remember the results ever being terrible.) Yes, these look very good indeed. Especially with QEMU that I do not feel terribly confident about as far as MIPS16 emulation is concerned (I still need to track down a single piece of real silicon supporting both MIPS64 and MIPS16 code at a time). FAOD, this is with normal MIPS libraries and mips16 executables. There's still no way of building mips16 multilibs out of the box. I've checked some notes and the issue was with MIPS16 FP PIC code (and therefore obviously SVR4 stubs rather than PLT) indeed. I hope these pieces will get submitted eventually. Maciej
[PATCH] Add option for dumping to stderr (issue6190057)
In response to comments, I have updated the patch to support dumps in user provided files via the option -fdump-xxx=filename. The filenames stdout/stderr are treated specially, and are considered standard streams. Also updated documentation and a testcase. Okay for trunk? Thanks, Sharad 2012-05-08 Sharad Singhai sing...@google.com * doc/invoke.texi: Add documentation for new option. * tree-dump.c (dump_stream_p): New function. (dump_files): Update for new field. (dump_switch_p_1): Handle user provided filenames. (dump_begin): Likewise. (get_dump_file_name): Likewise. (dump_enable_all): Add new parameter USER_FILENAME. All callers updated. (dump_end): Remove attribute. * tree-pass.h (enum tree_dump_index): Add new constant. (struct dump_file_info): Add new field USER_FILENAME. * testsuite/g++.dg/other/dump-userfile-1.C: New test. Index: doc/invoke.texi === --- doc/invoke.texi (revision 187265) +++ doc/invoke.texi (working copy) @@ -5322,20 +5322,24 @@ Here are some examples showing uses of these optio @item -d@var{letters} @itemx -fdump-rtl-@var{pass} +@itemx -fdump-rtl-@var{pass}=@var{filename} @opindex d Says to make debugging dumps during compilation at times specified by @var{letters}. This is used for debugging the RTL-based passes of the compiler. The file names for most of the dumps are made by appending a pass number and a word to the @var{dumpname}, and the files are -created in the directory of the output file. Note that the pass -number is computed statically as passes get registered into the pass -manager. Thus the numbering is not related to the dynamic order of -execution of passes. In particular, a pass installed by a plugin -could have a number over 200 even if it executed quite early. -@var{dumpname} is generated from the name of the output file, if -explicitly specified and it is not an executable, otherwise it is the -basename of the source file. These switches may have different effects -when @option{-E} is used for preprocessing. +created in the directory of the output file. If the +@option{=@var{filename}} is appended to the longer form of the dump +option then the dump is done on that file instead of numbered +files. The filenames stdout and stderr are treated specially. Note +that the pass number is computed statically as passes get registered +into the pass manager. Thus the numbering is not related to the +dynamic order of execution of passes. In particular, a pass installed +by a plugin could have a number over 200 even if it executed quite +early. @var{dumpname} is generated from the name of the output file, +if explicitly specified and it is not an executable, otherwise it is +the basename of the source file. These switches may have different +effects when @option{-E} is used for preprocessing. Debug dumps can be enabled with a @option{-fdump-rtl} switch or some @option{-d} option @var{letters}. Here are the possible @@ -5599,6 +5603,10 @@ These dumps are defined but always produce empty f @opindex fdump-rtl-all Produce all the dumps listed above. +@item -fdump-rtl-all=stderr +@opindex fdump-rtl-all=stderr +Produce all RTL dumps on stderr. + @item -dA @opindex dA Annotate the assembler output with miscellaneous debugging information. @@ -5719,15 +5727,19 @@ counters for each function compiled. @item -fdump-tree-@var{switch} @itemx -fdump-tree-@var{switch}-@var{options} +@itemx -fdump-tree-@var{switch}-@var{options}=@var{filename} @opindex fdump-tree Control the dumping at various stages of processing the intermediate language tree to a file. The file name is generated by appending a switch specific suffix to the source file name, and the file is -created in the same directory as the output file. If the -@samp{-@var{options}} form is used, @var{options} is a list of -@samp{-} separated options which control the details of the dump. Not -all options are applicable to all dumps; those that are not -meaningful are ignored. The following options are available +created in the same directory as the output file. In case of +@option{=@var{filename}}, the dump output is on the given file. Note +that the filenames stdout and stderr are treated specially and dumps +are done on standard streams. If the @samp{-@var{options}} form is +used, @var{options} is a list of @samp{-} separated options which +control the details or location of the dump. Not all options are +applicable to all dumps; those that are not meaningful are ignored. +The following options are available @table @samp @item address @@ -5765,9 +5777,56 @@ Enable showing the tree dump for each statement. Enable showing the EH region number holding each statement. @item scev Enable showing scalar evolution analysis details. +@item slim +Inhibit dumping of members of a scope or body of a function merely +because
Re: patch ping: Add static branch predict heuristic of comparing IV to loop_bound variable
On Wed, May 09, 2012 at 09:02:14AM +0800, Dehao Chen wrote: Sorry for the error. Here is a new patch to fix them: gcc/testsuite/ChangeLog: 2012-05-08 Dehao Chen de...@google.com * gcc.dg/predict-1.c: Remove the replicated text in this text. * gcc.dg/predict-2.c: Likewise. * gcc.dg/predict-3.c: Likewise. * gcc.dg/predict-4.c: Likewise. * gcc.dg/predict-5.c: Likewise. * gcc.dg/predict-6.c: Likewise. Ok (you could have committed it as obvious even). --- gcc/ChangeLog (revision 187307) +++ gcc/ChangeLog (working copy) @@ -110,15 +110,15 @@ 2012-05-08 Dehao Chen de...@google.com - * predict.c (find_qualified_ssa_name): New - (find_ssa_name_in_expr): New - (find_ssa_name_in_assign_stmt): New - (is_comparison_with_loop_invariant_p): New - (is_bound_expr_similar): New - (predict_iv_comparison): New + * predict.c (find_qualified_ssa_name): New. + (find_ssa_name_in_expr): New. + (find_ssa_name_in_assign_stmt): New. + (is_comparison_with_loop_invariant_p): New. + (is_bound_expr_similar): New. + (predict_iv_comparison): New. (predict_loops): Add heuristic for loop-nested branches that compare an induction variable to a loop bound variable. - * predict.def (PRED_LOOP_IV_COMPARE): New macro + * predict.def (PRED_LOOP_IV_COMPARE): New macro. 2012-05-08 Uros Bizjak ubiz...@gmail.com Index: gcc/testsuite/gcc.dg/predict-3.c === --- gcc/testsuite/gcc.dg/predict-3.c (revision 187307) +++ gcc/testsuite/gcc.dg/predict-3.c (working copy) @@ -23,28 +23,3 @@ /* { dg-final { scan-tree-dump-times loop iv compare heuristics: 100.0% 4 profile_estimate} } */ /* { dg-final { cleanup-tree-dump profile_estimate } } */ -/* { dg-do compile } */ -/* { dg-options -O2 -fdump-tree-profile_estimate } */ - -extern int global; - -int bar(int); - -void foo (int bound) -{ - int i, ret = 0; - for (i = 0; i = bound; i++) -{ - if (i bound - 2) - global += bar (i); - if (i = bound) - global += bar (i); - if (i + 1 bound) - global += bar (i); - if (i != bound) - global += bar (i); -} -} - -/* { dg-final { scan-tree-dump-times loop iv compare heuristics: 100.0% 4 profile_estimate} } */ -/* { dg-final { cleanup-tree-dump profile_estimate } } */ Index: gcc/testsuite/gcc.dg/predict-4.c === --- gcc/testsuite/gcc.dg/predict-4.c (revision 187307) +++ gcc/testsuite/gcc.dg/predict-4.c (working copy) @@ -17,22 +17,3 @@ /* { dg-final { scan-tree-dump loop iv compare heuristics: 50.0% profile_estimate} } */ /* { dg-final { cleanup-tree-dump profile_estimate } } */ -/* { dg-do compile } */ -/* { dg-options -O2 -fdump-tree-profile_estimate } */ - -extern int global; - -int bar(int); - -void foo (int bound) -{ - int i, ret = 0; - for (i = 0; i 10; i++) -{ - if (i 5) - global += bar (i); -} -} - -/* { dg-final { scan-tree-dump loop iv compare heuristics: 50.0% profile_estimate} } */ -/* { dg-final { cleanup-tree-dump profile_estimate } } */ Index: gcc/testsuite/gcc.dg/predict-1.c === --- gcc/testsuite/gcc.dg/predict-1.c (revision 187307) +++ gcc/testsuite/gcc.dg/predict-1.c (working copy) @@ -25,30 +25,3 @@ /* { dg-final { scan-tree-dump-times loop iv compare heuristics: 0.0% 5 profile_estimate} } */ /* { dg-final { cleanup-tree-dump profile_estimate } } */ -/* { dg-do compile } */ -/* { dg-options -O2 -fdump-tree-profile_estimate } */ - -extern int global; - -int bar(int); - -void foo (int bound) -{ - int i, ret = 0; - for (i = 0; i bound; i++) -{ - if (i bound) - global += bar (i); - if (i = bound + 2) - global += bar (i); - if (i bound - 2) - global += bar (i); - if (i + 2 bound) - global += bar (i); - if (i == 10) - global += bar (i); -} -} - -/* { dg-final { scan-tree-dump-times loop iv compare heuristics: 0.0% 5 profile_estimate} } */ -/* { dg-final { cleanup-tree-dump profile_estimate } } */ Index: gcc/testsuite/gcc.dg/predict-5.c === --- gcc/testsuite/gcc.dg/predict-5.c (revision 187307) +++ gcc/testsuite/gcc.dg/predict-5.c (working copy) @@ -23,28 +23,3 @@ /* { dg-final { scan-tree-dump-times loop iv compare heuristics: 100.0% 4 profile_estimate} } */ /* { dg-final { cleanup-tree-dump profile_estimate } } */ -/* { dg-do compile } */ -/* { dg-options -O2 -fdump-tree-profile_estimate } */ - -extern int global; - -int bar (int); - -void foo (int base, int bound) -{ - int i, ret = 0; - for (i = base; i = bound; i++) -{ - if (i base) - global += bar (i); - if (i
Re: [PATCH] Add option for dumping to stderr (issue6190057)
On Tue, May 8, 2012 at 11:46 PM, Sharad Singhai sing...@google.com wrote: In response to comments, I have updated the patch to support dumps in user provided files via the option -fdump-xxx=filename. The filenames stdout/stderr are treated specially, and are considered standard streams. I think - should also be treated as special (or maybe the only one which should be treated as special). Thanks, Andrew Pinski Also updated documentation and a testcase. Okay for trunk? Thanks, Sharad 2012-05-08 Sharad Singhai sing...@google.com * doc/invoke.texi: Add documentation for new option. * tree-dump.c (dump_stream_p): New function. (dump_files): Update for new field. (dump_switch_p_1): Handle user provided filenames. (dump_begin): Likewise. (get_dump_file_name): Likewise. (dump_enable_all): Add new parameter USER_FILENAME. All callers updated. (dump_end): Remove attribute. * tree-pass.h (enum tree_dump_index): Add new constant. (struct dump_file_info): Add new field USER_FILENAME. * testsuite/g++.dg/other/dump-userfile-1.C: New test. Index: doc/invoke.texi === --- doc/invoke.texi (revision 187265) +++ doc/invoke.texi (working copy) @@ -5322,20 +5322,24 @@ Here are some examples showing uses of these optio @item -d@var{letters} @itemx -fdump-rtl-@var{pass} +@itemx -fdump-rtl-@var{pass}=@var{filename} @opindex d Says to make debugging dumps during compilation at times specified by @var{letters}. This is used for debugging the RTL-based passes of the compiler. The file names for most of the dumps are made by appending a pass number and a word to the @var{dumpname}, and the files are -created in the directory of the output file. Note that the pass -number is computed statically as passes get registered into the pass -manager. Thus the numbering is not related to the dynamic order of -execution of passes. In particular, a pass installed by a plugin -could have a number over 200 even if it executed quite early. -@var{dumpname} is generated from the name of the output file, if -explicitly specified and it is not an executable, otherwise it is the -basename of the source file. These switches may have different effects -when @option{-E} is used for preprocessing. +created in the directory of the output file. If the +@option{=@var{filename}} is appended to the longer form of the dump +option then the dump is done on that file instead of numbered +files. The filenames stdout and stderr are treated specially. Note +that the pass number is computed statically as passes get registered +into the pass manager. Thus the numbering is not related to the +dynamic order of execution of passes. In particular, a pass installed +by a plugin could have a number over 200 even if it executed quite +early. @var{dumpname} is generated from the name of the output file, +if explicitly specified and it is not an executable, otherwise it is +the basename of the source file. These switches may have different +effects when @option{-E} is used for preprocessing. Debug dumps can be enabled with a @option{-fdump-rtl} switch or some @option{-d} option @var{letters}. Here are the possible @@ -5599,6 +5603,10 @@ These dumps are defined but always produce empty f @opindex fdump-rtl-all Produce all the dumps listed above. +@item -fdump-rtl-all=stderr +@opindex fdump-rtl-all=stderr +Produce all RTL dumps on stderr. + @item -dA @opindex dA Annotate the assembler output with miscellaneous debugging information. @@ -5719,15 +5727,19 @@ counters for each function compiled. @item -fdump-tree-@var{switch} @itemx -fdump-tree-@var{switch}-@var{options} +@itemx -fdump-tree-@var{switch}-@var{options}=@var{filename} @opindex fdump-tree Control the dumping at various stages of processing the intermediate language tree to a file. The file name is generated by appending a switch specific suffix to the source file name, and the file is -created in the same directory as the output file. If the -@samp{-@var{options}} form is used, @var{options} is a list of -@samp{-} separated options which control the details of the dump. Not -all options are applicable to all dumps; those that are not -meaningful are ignored. The following options are available +created in the same directory as the output file. In case of +@option{=@var{filename}}, the dump output is on the given file. Note +that the filenames stdout and stderr are treated specially and dumps +are done on standard streams. If the @samp{-@var{options}} form is +used, @var{options} is a list of @samp{-} separated options which +control the details or location of the dump. Not all options are +applicable to all dumps; those that are not meaningful are ignored. +The following options are available @table @samp @item address @@
Re: [PATCH] Add option for dumping to stderr (issue6190057)
On Wed, May 9, 2012 at 1:51 AM, Andrew Pinski pins...@gmail.com wrote: On Tue, May 8, 2012 at 11:46 PM, Sharad Singhai sing...@google.com wrote: In response to comments, I have updated the patch to support dumps in user provided files via the option -fdump-xxx=filename. The filenames stdout/stderr are treated specially, and are considered standard streams. I think - should also be treated as special (or maybe the only one which should be treated as special). He originally wanted only stderr, so treating - only specially would not be in the line with his original goal. - is equivalent to stdout, so it is not like we don't have the functionally with his revised patch. Thanks, Andrew Pinski Also updated documentation and a testcase. Okay for trunk? Thanks, Sharad 2012-05-08 Sharad Singhai sing...@google.com * doc/invoke.texi: Add documentation for new option. * tree-dump.c (dump_stream_p): New function. (dump_files): Update for new field. (dump_switch_p_1): Handle user provided filenames. (dump_begin): Likewise. (get_dump_file_name): Likewise. (dump_enable_all): Add new parameter USER_FILENAME. All callers updated. (dump_end): Remove attribute. * tree-pass.h (enum tree_dump_index): Add new constant. (struct dump_file_info): Add new field USER_FILENAME. * testsuite/g++.dg/other/dump-userfile-1.C: New test. Index: doc/invoke.texi === --- doc/invoke.texi (revision 187265) +++ doc/invoke.texi (working copy) @@ -5322,20 +5322,24 @@ Here are some examples showing uses of these optio @item -d@var{letters} @itemx -fdump-rtl-@var{pass} +@itemx -fdump-rtl-@var{pass}=@var{filename} @opindex d Says to make debugging dumps during compilation at times specified by @var{letters}. This is used for debugging the RTL-based passes of the compiler. The file names for most of the dumps are made by appending a pass number and a word to the @var{dumpname}, and the files are -created in the directory of the output file. Note that the pass -number is computed statically as passes get registered into the pass -manager. Thus the numbering is not related to the dynamic order of -execution of passes. In particular, a pass installed by a plugin -could have a number over 200 even if it executed quite early. -@var{dumpname} is generated from the name of the output file, if -explicitly specified and it is not an executable, otherwise it is the -basename of the source file. These switches may have different effects -when @option{-E} is used for preprocessing. +created in the directory of the output file. If the +@option{=@var{filename}} is appended to the longer form of the dump +option then the dump is done on that file instead of numbered +files. The filenames stdout and stderr are treated specially. Note +that the pass number is computed statically as passes get registered +into the pass manager. Thus the numbering is not related to the +dynamic order of execution of passes. In particular, a pass installed +by a plugin could have a number over 200 even if it executed quite +early. @var{dumpname} is generated from the name of the output file, +if explicitly specified and it is not an executable, otherwise it is +the basename of the source file. These switches may have different +effects when @option{-E} is used for preprocessing. Debug dumps can be enabled with a @option{-fdump-rtl} switch or some @option{-d} option @var{letters}. Here are the possible @@ -5599,6 +5603,10 @@ These dumps are defined but always produce empty f @opindex fdump-rtl-all Produce all the dumps listed above. +@item -fdump-rtl-all=stderr +@opindex fdump-rtl-all=stderr +Produce all RTL dumps on stderr. + @item -dA @opindex dA Annotate the assembler output with miscellaneous debugging information. @@ -5719,15 +5727,19 @@ counters for each function compiled. @item -fdump-tree-@var{switch} @itemx -fdump-tree-@var{switch}-@var{options} +@itemx -fdump-tree-@var{switch}-@var{options}=@var{filename} @opindex fdump-tree Control the dumping at various stages of processing the intermediate language tree to a file. The file name is generated by appending a switch specific suffix to the source file name, and the file is -created in the same directory as the output file. If the -@samp{-@var{options}} form is used, @var{options} is a list of -@samp{-} separated options which control the details of the dump. Not -all options are applicable to all dumps; those that are not -meaningful are ignored. The following options are available +created in the same directory as the output file. In case of +@option{=@var{filename}}, the dump output is on the given file. Note +that the filenames stdout and stderr are treated specially and dumps +are done on standard streams. If the @samp{-@var{options}}
Re: [C++ Patch and pubnames 2/2] Adjust c decl pretty printer to match demangler (issue6195056)
On 9 May 2012 01:34, Cary Coutant ccout...@google.com wrote: A suggestion: Make dwarf_name call the demangler, and then a (new?) a function that converts a mangled decl to a human-readable string. In any case, the pretty-printer does a lot of stuff that is mostly useless for just printing a declaration (translation, wrapping, etc.). Bonus point if all GNU toolchain program use the same functions for demangling and undemagling (because I guess they actually don't, no?) I guess it cannot be so easy, so I apologize in advance for saying nonsense. It makes sense; and I don't think it is as complicated as it might sound. dwarf_name takes a tree; the demangler takes a mangled name. We don't have mangled names for many of the names we want to enter into the pubnames table. Sorry that was a typo. My proposal is to mangle the tree (GCC already does that) and then demangle it to a human-readable string (GCC also does that already, no? Or at least GDB and other programs have to do that anyway, no? In any case, the demangling code should be shared and, hence, consistent.) For the names that the mangler/demangler do not support, I guess it doesn't make sense to extend the demangler, so you will need to handle them explicitly. You may call then the pretty-printer, since anyway GDB and the other programs cannot demangle those names, so whatever is used by GCC should be fine. Or implement a custom pretty-printer for them. A custom pretty-printer will be for sure much simpler than the diagnostics pretty-printer, since you don't have to worry about all kinds of trees, wrapping, inheritance, i18n, etc. Probably I am missing some details that may make the plan above infeasible. In any case, I think your intention to make the output of the different toolchain programs more consistent is worthwhile. If dwarf_name has to use the pretty-printer, then I'd prefer that you changed the demangler to match the diagnostics output, rather than the other way around. Also, I am no maintainer, so these are merely suggestions that you may freely ignore. :-) Cheers, Manuel..
Re: [PATCH] Add option for dumping to stderr (issue6190057)
On Wed, May 9, 2012 at 1:46 AM, Sharad Singhai sing...@google.com wrote: [...] +@item -fdump-rtl-all=stderr +@opindex fdump-rtl-all=stderr You do not need to have a separate index entry for '=stderr' or '=stdout'. Rather, expand the description to state this in all the documentation for -fdump-xxx=yyy. [...] +/* Return non-zero iff the USER_FILENAME corresponds to stdout or + stderr stream. */ + +static int +dump_stream_p (const char *user_filename) +{ + if (user_filename) + return !strncmp (stderr, user_filename, 6) || + !strncmp (stdout, user_filename, 6); + else + return 0; +} The name is ambiguous. This function is testing whether its string argument designates one of the *standard* output streams. Name it to reflect that.. Have it take the dump state context. Also the coding convention: the binary operator || should be on next line. In fact the thing could be simpler. Instead of testing over and over again against stderr (once in this function, then again later), just return the corresponding standard FILE* pointer. Also, this is a case of overuse of strncmp. If you name the function dump_get_standard_stream: return strcmp(stderr, dfi-user_filename) == 0 ? stderr : stdcmp(stdout, dfi-use_filename) ? stdout : NULL; you can simplify: - name = get_dump_file_name (phase); dfi = get_dump_file_info (phase); - stream = fopen (name, dfi-state 0 ? w : a); - if (!stream) - error (could not open dump file %qs: %m, name); + if (dump_stream_p (dfi-user_filename)) + { + if (!strncmp (stderr, dfi-user_filename, 6)) + stream = stderr; + else + if (!strncmp (stdout, dfi-user_filename, 6)) + stream = stdout; + else + error (unknown stream: %qs: %m, dfi-user_filename); + dfi-state = 1; + } else - dfi-state = 1; - free (name); + { + name = get_dump_file_name (phase); + stream = fopen (name, dfi-state 0 ? w : a); + if (!stream) + error (could not open dump file %qs: %m, name); + else + dfi-state = 1; + free (name); + } if (flag_ptr) *flag_ptr = dfi-flags; @@ -987,35 +1017,45 @@ dump_flag_name (int phase) dump_begin. */ void -dump_end (int phase ATTRIBUTE_UNUSED, FILE *stream) +dump_end (int phase, FILE *stream) { - fclose (stream); + struct dump_file_info *dfi = get_dump_file_info (phase); + if (!dump_stream_p (dfi-user_filename)) + fclose (stream); } /* Enable all tree dumps. Return number of enabled tree dumps. */ static int -dump_enable_all (int flags) +dump_enable_all (int flags, const char *user_filename) { int ir_dump_type = (flags (TDF_TREE | TDF_RTL | TDF_IPA)); int n = 0; size_t i; for (i = TDI_none + 1; i (size_t) TDI_end; i++) - if ((dump_files[i].flags ir_dump_type)) - { - dump_files[i].state = -1; - dump_files[i].flags |= flags; - n++; - } + { + if ((dump_files[i].flags ir_dump_type)) + { + dump_files[i].state = -1; + dump_files[i].flags |= flags; + n++; + } + if (user_filename) + dump_files[i].user_filename = user_filename; + } for (i = 0; i extra_dump_files_in_use; i++) - if ((extra_dump_files[i].flags ir_dump_type)) - { - extra_dump_files[i].state = -1; - extra_dump_files[i].flags |= flags; - n++; - } + { + if ((extra_dump_files[i].flags ir_dump_type)) + { + extra_dump_files[i].state = -1; + extra_dump_files[i].flags |= flags; + n++; + } + if (user_filename) + extra_dump_files[i].user_filename = user_filename; + } return n; } @@ -1037,7 +1077,7 @@ dump_switch_p_1 (const char *arg, struct dump_file if (!option_value) return 0; - if (*option_value *option_value != '-') + if (*option_value *option_value != '-' *option_value != '=') return 0; ptr = option_value; @@ -1052,17 +1092,30 @@ dump_switch_p_1 (const char *arg, struct dump_file while (*ptr == '-') ptr++; end_ptr = strchr (ptr, '-'); + if (!end_ptr) end_ptr = ptr + strlen (ptr); length = end_ptr - ptr; + if (*ptr == '=') + { + /* Interpret rest of the argument as a dump filename. The + user provided filename overrides generated dump names as + well as other command line filenames. */ + flags |= TDF_USER_FILENAME; + if (dfi-user_filename) + free (dfi-user_filename); + dfi-user_filename = xstrdup (ptr + 1); + break; + } + for (option_ptr = dump_options; option_ptr-name; option_ptr++) if (strlen (option_ptr-name) == length !memcmp (option_ptr-name, ptr, length)) - { - flags |= option_ptr-value; +
Re: [PATCH] Add -feliminate-malloc to enable/disable elimination of redundant malloc/free pairs
On Tue, May 8, 2012 at 6:18 PM, Xinliang David Li davi...@google.com wrote: To be clear, this flag is for malloc implementation (such as tcmalloc) with side effect unknown to the compiler. Using -fno-builtin-xxx is too conservative for that purpose. I don't think that flys. Btw, the patch also guards alloca - alloca is purely GCC internal. What's the unknown side-effects that are also important to preserve for free(malloc(4))? Richard. David On Tue, May 8, 2012 at 7:43 AM, Dehao Chen de...@google.com wrote: Hello, This patch adds a flag to guard the optimization that optimize the following code away: free (malloc (4)); In some cases, we'd like this type of malloc/free pairs to remain in the optimized code. Tested with bootstrap, and no regression in the gcc testsuite. Is it ok for mainline? Thanks, Dehao gcc/ChangeLog 2012-05-08 Dehao Chen de...@google.com * common.opt (feliminate-malloc): New. * doc/invoke.texi: Document it. * tree-ssa-dce.c (mark_stmt_if_obviously_necessary): Honor it. gcc/testsuite/ChangeLog 2012-05-08 Dehao Chen de...@google.com * gcc.dg/free-malloc.c: Check if -fno-eliminate-malloc is working as expected. Index: gcc/doc/invoke.texi === --- gcc/doc/invoke.texi (revision 187277) +++ gcc/doc/invoke.texi (working copy) @@ -360,7 +360,8 @@ -fcx-limited-range @gol -fdata-sections -fdce -fdelayed-branch @gol -fdelete-null-pointer-checks -fdevirtualize -fdse @gol --fearly-inlining -fipa-sra -fexpensive-optimizations -ffat-lto-objects @gol +-fearly-inlining -feliminate-malloc -fipa-sra -fexpensive-optimizations @gol +-ffat-lto-objects @gol -ffast-math -ffinite-math-only -ffloat-store -fexcess-precision=@var{style} @gol -fforward-propagate -ffp-contract=@var{style} -ffunction-sections @gol -fgcse -fgcse-after-reload -fgcse-las -fgcse-lm -fgraphite-identity @gol @@ -6238,6 +6239,7 @@ -fdefer-pop @gol -fdelayed-branch @gol -fdse @gol +-feliminate-malloc @gol -fguess-branch-probability @gol -fif-conversion2 @gol -fif-conversion @gol @@ -6762,6 +6764,11 @@ Perform dead store elimination (DSE) on RTL@. Enabled by default at @option{-O} and higher. +@item -feliminate-malloc +@opindex feliminate-malloc +Eliminate unnecessary malloc/free pairs. +Enabled by default at @option{-O} and higher. + @item -fif-conversion @opindex fif-conversion Attempt to transform conditional jumps into branch-less equivalents. This Index: gcc/testsuite/gcc.dg/free-malloc.c === --- gcc/testsuite/gcc.dg/free-malloc.c (revision 0) +++ gcc/testsuite/gcc.dg/free-malloc.c (revision 0) @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options -O2 -fno-eliminate-malloc } */ +/* { dg-final { scan-assembler-times malloc 2} } */ +/* { dg-final { scan-assembler-times free 2} } */ + +extern void * malloc (unsigned long); +extern void free (void *); + +void test () +{ + free (malloc (10)); +} Index: gcc/common.opt === --- gcc/common.opt (revision 187277) +++ gcc/common.opt (working copy) @@ -1474,6 +1474,10 @@ Common Var(flag_dce) Init(1) Optimization Use the RTL dead code elimination pass +feliminate-malloc +Common Var(flag_eliminate_malloc) Init(1) Optimization +Eliminate unnecessary malloc/free pairs + fdse Common Var(flag_dse) Init(1) Optimization Use the RTL dead store elimination pass Index: gcc/tree-ssa-dce.c === --- gcc/tree-ssa-dce.c (revision 187277) +++ gcc/tree-ssa-dce.c (working copy) @@ -309,6 +309,8 @@ case BUILT_IN_CALLOC: case BUILT_IN_ALLOCA: case BUILT_IN_ALLOCA_WITH_ALIGN: + if (!flag_eliminate_malloc) + mark_stmt_necessary (stmt, true); return; default:;
Re: [PATCH] Fix endless loop in forwprop (PR tree-optimization/53226)
On Tue, May 8, 2012 at 8:33 PM, Jakub Jelinek ja...@redhat.com wrote: Hi! The attached testcase loops endlessly, using more and more memory. The problem is that the prev stmt iterator sometimes references stmts that remove_prop_source_from_use decides to remove, and since Michael's gimple seq changes that seems to be fatal. Fixed by not keeping an iterator, but instead marking stmts that don't need revisiting and restarting with first stmt that needs revisiting. This assumes that new stmts will have uid 0, but I believe that is the case. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? Using the stmt UID looks odd as you are using it as flag - why not use one of the two stmt flags available to passes? Btw, the other possibility would be to simply change the various combiners to require them to update the iterator to point at the first statement that they have inserted. I have some TLC ideas in this area, let's see if I can get to them. Ok with using gimple_set_plf and friends. Thanks, Richard. 2012-05-08 Jakub Jelinek ja...@redhat.com PR tree-optimization/53226 * tree-ssa-forwprop.c (ssa_forward_propagate_and_combine): Remove prev and prev_initialized vars, gimple_set_uid (stmt, 0) before processing it and gimple_set_uid (stmt, 1) if it doesn't need to be revisited, look for earliest stmt with uid 0 if something changed. * gcc.c-torture/compile/pr53226.c: New test. --- gcc/tree-ssa-forwprop.c.jj 2012-05-03 08:35:52.0 +0200 +++ gcc/tree-ssa-forwprop.c 2012-05-08 18:10:19.662061709 +0200 @@ -2677,8 +2677,7 @@ ssa_forward_propagate_and_combine (void) FOR_EACH_BB (bb) { - gimple_stmt_iterator gsi, prev; - bool prev_initialized; + gimple_stmt_iterator gsi; /* Apply forward propagation to all stmts in the basic-block. Note we update GSI within the loop as necessary. */ @@ -2771,12 +2770,14 @@ ssa_forward_propagate_and_combine (void) /* Combine stmts with the stmts defining their operands. Note we update GSI within the loop as necessary. */ - prev_initialized = false; for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi);) { gimple stmt = gsi_stmt (gsi); bool changed = false; + /* Mark stmt as potentially needing revisiting. */ + gimple_set_uid (stmt, 0); + switch (gimple_code (stmt)) { case GIMPLE_ASSIGN: @@ -2856,18 +2857,18 @@ ssa_forward_propagate_and_combine (void) { /* If the stmt changed then re-visit it and the statements inserted before it. */ - if (!prev_initialized) + for (; !gsi_end_p (gsi); gsi_prev (gsi)) + if (gimple_uid (gsi_stmt (gsi))) + break; + if (gsi_end_p (gsi)) gsi = gsi_start_bb (bb); else - { - gsi = prev; - gsi_next (gsi); - } + gsi_next (gsi); } else { - prev = gsi; - prev_initialized = true; + /* Stmt no longer needs to be revisited. */ + gimple_set_uid (stmt, 1); gsi_next (gsi); } } --- gcc/testsuite/gcc.c-torture/compile/pr53226.c.jj 2012-05-08 18:07:40.007000510 +0200 +++ gcc/testsuite/gcc.c-torture/compile/pr53226.c 2012-05-08 18:07:35.071029578 +0200 @@ -0,0 +1,13 @@ +/* PR tree-optimization/53226 */ + +void +foo (unsigned long *x, char y, char z) +{ + int i; + for (i = y; i z; ++i) + { + unsigned long a = ((unsigned char) i) 63UL; + unsigned long b = 1ULL a; + *x |= b; + } +} Jakub
Re: [PATCH] Add -feliminate-malloc to enable/disable elimination of redundant malloc/free pairs
On Wed, May 9, 2012 at 4:12 PM, Richard Guenther richard.guent...@gmail.com wrote: On Tue, May 8, 2012 at 6:18 PM, Xinliang David Li davi...@google.com wrote: To be clear, this flag is for malloc implementation (such as tcmalloc) with side effect unknown to the compiler. Using -fno-builtin-xxx is too conservative for that purpose. I don't think that flys. Btw, the patch also guards alloca - alloca is purely GCC internal. What's the unknown side-effects that are also important to preserve for free(malloc(4))? Malloc implementation may record some info to a global structure, and a program may use this free(malloc()) pair to simulate the real runs to get some data, such as peak memory requirement. Dehao Richard. David On Tue, May 8, 2012 at 7:43 AM, Dehao Chen de...@google.com wrote: Hello, This patch adds a flag to guard the optimization that optimize the following code away: free (malloc (4)); In some cases, we'd like this type of malloc/free pairs to remain in the optimized code. Tested with bootstrap, and no regression in the gcc testsuite. Is it ok for mainline? Thanks, Dehao gcc/ChangeLog 2012-05-08 Dehao Chen de...@google.com * common.opt (feliminate-malloc): New. * doc/invoke.texi: Document it. * tree-ssa-dce.c (mark_stmt_if_obviously_necessary): Honor it. gcc/testsuite/ChangeLog 2012-05-08 Dehao Chen de...@google.com * gcc.dg/free-malloc.c: Check if -fno-eliminate-malloc is working as expected. Index: gcc/doc/invoke.texi === --- gcc/doc/invoke.texi (revision 187277) +++ gcc/doc/invoke.texi (working copy) @@ -360,7 +360,8 @@ -fcx-limited-range @gol -fdata-sections -fdce -fdelayed-branch @gol -fdelete-null-pointer-checks -fdevirtualize -fdse @gol --fearly-inlining -fipa-sra -fexpensive-optimizations -ffat-lto-objects @gol +-fearly-inlining -feliminate-malloc -fipa-sra -fexpensive-optimizations @gol +-ffat-lto-objects @gol -ffast-math -ffinite-math-only -ffloat-store -fexcess-precision=@var{style} @gol -fforward-propagate -ffp-contract=@var{style} -ffunction-sections @gol -fgcse -fgcse-after-reload -fgcse-las -fgcse-lm -fgraphite-identity @gol @@ -6238,6 +6239,7 @@ -fdefer-pop @gol -fdelayed-branch @gol -fdse @gol +-feliminate-malloc @gol -fguess-branch-probability @gol -fif-conversion2 @gol -fif-conversion @gol @@ -6762,6 +6764,11 @@ Perform dead store elimination (DSE) on RTL@. Enabled by default at @option{-O} and higher. +@item -feliminate-malloc +@opindex feliminate-malloc +Eliminate unnecessary malloc/free pairs. +Enabled by default at @option{-O} and higher. + @item -fif-conversion @opindex fif-conversion Attempt to transform conditional jumps into branch-less equivalents. This Index: gcc/testsuite/gcc.dg/free-malloc.c === --- gcc/testsuite/gcc.dg/free-malloc.c (revision 0) +++ gcc/testsuite/gcc.dg/free-malloc.c (revision 0) @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options -O2 -fno-eliminate-malloc } */ +/* { dg-final { scan-assembler-times malloc 2} } */ +/* { dg-final { scan-assembler-times free 2} } */ + +extern void * malloc (unsigned long); +extern void free (void *); + +void test () +{ + free (malloc (10)); +} Index: gcc/common.opt === --- gcc/common.opt (revision 187277) +++ gcc/common.opt (working copy) @@ -1474,6 +1474,10 @@ Common Var(flag_dce) Init(1) Optimization Use the RTL dead code elimination pass +feliminate-malloc +Common Var(flag_eliminate_malloc) Init(1) Optimization +Eliminate unnecessary malloc/free pairs + fdse Common Var(flag_dse) Init(1) Optimization Use the RTL dead store elimination pass Index: gcc/tree-ssa-dce.c === --- gcc/tree-ssa-dce.c (revision 187277) +++ gcc/tree-ssa-dce.c (working copy) @@ -309,6 +309,8 @@ case BUILT_IN_CALLOC: case BUILT_IN_ALLOCA: case BUILT_IN_ALLOCA_WITH_ALIGN: + if (!flag_eliminate_malloc) + mark_stmt_necessary (stmt, true); return; default:;
Re: Fix gcc.dg/lower-subreg-1.c failure
Hans-Peter Nilsson hans-peter.nils...@axis.com writes: From: Richard Sandiford rdsandif...@googlemail.com Date: Tue, 1 May 2012 16:46:38 +0200 To repeat: as things stand, very few targets define proper rtx costs for SET. IMHO it's wrong to start blaming targets when rtx_cost doesn't take the mode in account in the first place, for the default cost. (Well, except for the modes-tieable subreg special-case.) The targets where an operation in N * word_mode costs no more than one in word_mode, if there even is one, is a minority, let's adjust the defaults to that. I'll pass on approving or disapproving this patch, but for the record: a factor of word_mode seems a bit too simplistic. It's OK for moves and logic ops, but addition of multiword modes is a bit more complicated. Multiplication and division by multiword modes is even more so, of course. Of course, if the costs are defined properly and lower-subreg still makes the wrong choice, we need to look at why. By the way, regarding validity of rtx_cost calls: +++ gcc/lower-subreg.c 2012-05-01 09:46:48.473830772 +0100 +/* Return the cost of a CODE shift in mode MODE by OP1 bits, using the + rtxes in RTXES. SPEED_P selects between the speed and size cost. */ + +static int +shift_cost (bool speed_p, struct cost_rtxes *rtxes, enum rtx_code code, + enum machine_mode mode, int op1) +{ + PUT_MODE (rtxes-target, mode); + PUT_CODE (rtxes-shift, code); + PUT_MODE (rtxes-shift, mode); + PUT_MODE (rtxes-source, mode); + XEXP (rtxes-shift, 1) = GEN_INT (op1); + SET_SRC (rtxes-set) = rtxes-shift; + return insn_rtx_cost (rtxes-set, speed_p); +} + +/* For each X in the range [0, BITS_PER_WORD), set SPLITTING[X] + to true if it is profitable to split a double-word CODE shift + of X + BITS_PER_WORD bits. SPEED_P says whether we are testing + for speed or size profitability. + + Use the rtxes in RTXES to calculate costs. WORD_MOVE_ZERO_COST is + the cost of moving zero into a word-mode register. WORD_MOVE_COST + is the cost of moving between word registers. */ + +static void +compute_splitting_shift (bool speed_p, struct cost_rtxes *rtxes, +bool *splitting, enum rtx_code code, +int word_move_zero_cost, int word_move_cost) +{ I think there should be a gating check whether the target implements that kind of shift in that mode at all, before checking the cost. Not sure whether it's generally best to put that test here, or to make the rtx_cost function return the cost of a libcall for that mode when that happens. Similar for the other insns. This has come up a few times in past discussions about rtx_cost (as I'm sure you remember :-)). On the one hand, I agree it might be nice to shield the backend from invalid insns. That would effectively mean calling expand on each insn though, which would be rather expensive. Especially in a GC world. It would also prevent the target from being able to take things like pipeline characteristics into account. E.g. if you know that something takes 2 operations that must be issued sequentially, you might want to add in an extra (sub-COSTS_N_INSN (1)) cost compared to 2 operations that can be issued together, even if the individual operations take the same number of cycles. The same goes for multiplication in general. On some targets the speed of a multiplication can depend on the second operand, so knowing its value is useful even if the target doesn't have a multiplication instruction that takes constant operands. As things stand, rtx_cost is intentionally used for more than just valid target insns. One of the main uses has always been to decide whether it is better to implement multiplications by a shift-and-add sequence, or whether to use multiplication instructions. The associated expmed code has never tried to decide which shifts are actually representable as matching rtl insns and which aren't. The same goes for division-using-multiplication. So I think this patch is using rtx_cost according to its current interface. If someone wants to change or restrict that interface, than that's a separate change IMO. And it should be done consistently rather than in this one place. In this case it doesn't matter anyway. If we never see a shift in mode M by amount X, we'll never need to make a decision about whether to split it. Isn't the below better than doing virtually the same in each target's rtx_costs? FWIW, MIPS, SH and x86 all used something slightly different (and more complicated). I imagine PowerPC and SPARC will too. So each seems a bit strong. That's not an objection to the patch though. I realise some ports do have very regular architectures where every register is the same width and has the same move cost. Richard
Re: [PATCH] Add -feliminate-malloc to enable/disable elimination of redundant malloc/free pairs
On Wed, May 9, 2012 at 10:38 AM, Dehao Chen de...@google.com wrote: On Wed, May 9, 2012 at 4:12 PM, Richard Guenther richard.guent...@gmail.com wrote: On Tue, May 8, 2012 at 6:18 PM, Xinliang David Li davi...@google.com wrote: To be clear, this flag is for malloc implementation (such as tcmalloc) with side effect unknown to the compiler. Using -fno-builtin-xxx is too conservative for that purpose. I don't think that flys. Btw, the patch also guards alloca - alloca is purely GCC internal. What's the unknown side-effects that are also important to preserve for free(malloc(4))? Malloc implementation may record some info to a global structure, and a program may use this free(malloc()) pair to simulate the real runs to get some data, such as peak memory requirement. So why not use an alternate interface into this special allocator for this purpose? Dehao Richard. David On Tue, May 8, 2012 at 7:43 AM, Dehao Chen de...@google.com wrote: Hello, This patch adds a flag to guard the optimization that optimize the following code away: free (malloc (4)); In some cases, we'd like this type of malloc/free pairs to remain in the optimized code. Tested with bootstrap, and no regression in the gcc testsuite. Is it ok for mainline? Thanks, Dehao gcc/ChangeLog 2012-05-08 Dehao Chen de...@google.com * common.opt (feliminate-malloc): New. * doc/invoke.texi: Document it. * tree-ssa-dce.c (mark_stmt_if_obviously_necessary): Honor it. gcc/testsuite/ChangeLog 2012-05-08 Dehao Chen de...@google.com * gcc.dg/free-malloc.c: Check if -fno-eliminate-malloc is working as expected. Index: gcc/doc/invoke.texi === --- gcc/doc/invoke.texi (revision 187277) +++ gcc/doc/invoke.texi (working copy) @@ -360,7 +360,8 @@ -fcx-limited-range @gol -fdata-sections -fdce -fdelayed-branch @gol -fdelete-null-pointer-checks -fdevirtualize -fdse @gol --fearly-inlining -fipa-sra -fexpensive-optimizations -ffat-lto-objects @gol +-fearly-inlining -feliminate-malloc -fipa-sra -fexpensive-optimizations @gol +-ffat-lto-objects @gol -ffast-math -ffinite-math-only -ffloat-store -fexcess-precision=@var{style} @gol -fforward-propagate -ffp-contract=@var{style} -ffunction-sections @gol -fgcse -fgcse-after-reload -fgcse-las -fgcse-lm -fgraphite-identity @gol @@ -6238,6 +6239,7 @@ -fdefer-pop @gol -fdelayed-branch @gol -fdse @gol +-feliminate-malloc @gol -fguess-branch-probability @gol -fif-conversion2 @gol -fif-conversion @gol @@ -6762,6 +6764,11 @@ Perform dead store elimination (DSE) on RTL@. Enabled by default at @option{-O} and higher. +@item -feliminate-malloc +@opindex feliminate-malloc +Eliminate unnecessary malloc/free pairs. +Enabled by default at @option{-O} and higher. + @item -fif-conversion @opindex fif-conversion Attempt to transform conditional jumps into branch-less equivalents. This Index: gcc/testsuite/gcc.dg/free-malloc.c === --- gcc/testsuite/gcc.dg/free-malloc.c (revision 0) +++ gcc/testsuite/gcc.dg/free-malloc.c (revision 0) @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options -O2 -fno-eliminate-malloc } */ +/* { dg-final { scan-assembler-times malloc 2} } */ +/* { dg-final { scan-assembler-times free 2} } */ + +extern void * malloc (unsigned long); +extern void free (void *); + +void test () +{ + free (malloc (10)); +} Index: gcc/common.opt === --- gcc/common.opt (revision 187277) +++ gcc/common.opt (working copy) @@ -1474,6 +1474,10 @@ Common Var(flag_dce) Init(1) Optimization Use the RTL dead code elimination pass +feliminate-malloc +Common Var(flag_eliminate_malloc) Init(1) Optimization +Eliminate unnecessary malloc/free pairs + fdse Common Var(flag_dse) Init(1) Optimization Use the RTL dead store elimination pass Index: gcc/tree-ssa-dce.c === --- gcc/tree-ssa-dce.c (revision 187277) +++ gcc/tree-ssa-dce.c (working copy) @@ -309,6 +309,8 @@ case BUILT_IN_CALLOC: case BUILT_IN_ALLOCA: case BUILT_IN_ALLOCA_WITH_ALIGN: + if (!flag_eliminate_malloc) + mark_stmt_necessary (stmt, true); return; default:;
Re: [PATCH] MIPS16: Remove DWARF-2 location information from GP accesses
On Tue, 8 May 2012, Richard Sandiford wrote: gcc-mips16-gp-pseudo-loc.patch Index: gcc-fsf-trunk-quilt/gcc/config/mips/mips.c === --- gcc-fsf-trunk-quilt.orig/gcc/config/mips/mips.c 2012-05-02 23:42:46.185566469 +0100 +++ gcc-fsf-trunk-quilt/gcc/config/mips/mips.c 2012-05-03 18:55:28.775580939 +0100 @@ -2622,7 +2622,8 @@ mips16_gp_pseudo_reg (void) scan = NEXT_INSN (scan); insn = gen_load_const_gp (cfun-machine-mips16_gp_pseudo_rtx); - emit_insn_after (insn, scan); + insn = emit_insn_after (insn, scan); + INSN_LOCATOR (insn) = 0; pop_topmost_sequence (); } An alternative would be to use prologue_locator, like ARM does. Is this instruction guaranteed to be emitted once per function only? I'm not sure whether that's an improvement though, so the patch is OK as-is, thanks. Applied now, thanks. Maciej
Re: [PATCH] MIPS16: Remove DWARF-2 location information from GP accesses
Maciej W. Rozycki ma...@codesourcery.com writes: On Tue, 8 May 2012, Richard Sandiford wrote: gcc-mips16-gp-pseudo-loc.patch Index: gcc-fsf-trunk-quilt/gcc/config/mips/mips.c === --- gcc-fsf-trunk-quilt.orig/gcc/config/mips/mips.c2012-05-02 23:42:46.185566469 +0100 +++ gcc-fsf-trunk-quilt/gcc/config/mips/mips.c 2012-05-03 18:55:28.775580939 +0100 @@ -2622,7 +2622,8 @@ mips16_gp_pseudo_reg (void) scan = NEXT_INSN (scan); insn = gen_load_const_gp (cfun-machine-mips16_gp_pseudo_rtx); - emit_insn_after (insn, scan); + insn = emit_insn_after (insn, scan); + INSN_LOCATOR (insn) = 0; pop_topmost_sequence (); } An alternative would be to use prologue_locator, like ARM does. Is this instruction guaranteed to be emitted once per function only? Yes.
Re: [committed] Fix lower-subreg cost calculation
On 08/05/12 22:42, Richard Sandiford wrote: Richard Earnshaw rearn...@arm.com writes: FTR, this caused http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53278 Well, this really has been a brown-paper-bag patch. Fixed as below. Tested on x86_64-linux-gnu and applied as obvious. Richard gcc/ PR rtl-optimization/53278 * lower-subreg.c (decompose_multiword_subregs): Remove left-over speed_p code from earlier patch. Index: gcc/lower-subreg.c === --- gcc/lower-subreg.c2012-05-08 19:45:31.0 +0100 +++ gcc/lower-subreg.c2012-05-08 19:45:31.793855523 +0100 @@ -1487,9 +1487,7 @@ decompose_multiword_subregs (void) FOR_EACH_BB (bb) { rtx insn; - bool speed_p; - speed_p = optimize_bb_for_speed_p (bb); FOR_BB_INSNS (bb, insn) { rtx pat; Which begs the question as to why -wshadow didn't pick this up... R.
Re: [PATCH] Add -feliminate-malloc to enable/disable elimination of redundant malloc/free pairs
On Wed, May 9, 2012 at 5:22 PM, Richard Guenther richard.guent...@gmail.com wrote: On Wed, May 9, 2012 at 10:38 AM, Dehao Chen de...@google.com wrote: On Wed, May 9, 2012 at 4:12 PM, Richard Guenther richard.guent...@gmail.com wrote: On Tue, May 8, 2012 at 6:18 PM, Xinliang David Li davi...@google.com wrote: To be clear, this flag is for malloc implementation (such as tcmalloc) with side effect unknown to the compiler. Using -fno-builtin-xxx is too conservative for that purpose. I don't think that flys. Btw, the patch also guards alloca - alloca is purely GCC internal. What's the unknown side-effects that are also important to preserve for free(malloc(4))? Malloc implementation may record some info to a global structure, and a program may use this free(malloc()) pair to simulate the real runs to get some data, such as peak memory requirement. So why not use an alternate interface into this special allocator for this purpose? There can be the following scenario: We want to add a module to an existing app. Before implementing the module, we want to collect some statistics on real runs. In this scenario, we need: * No change to the legacy code * Optimized build for the simulation run * Provide accurate statistical info We want to collect data for both new module and the legacy code without changing the later, thus we cannot use a new malloc/free interface. Thanks, Dehao Dehao Richard. David On Tue, May 8, 2012 at 7:43 AM, Dehao Chen de...@google.com wrote: Hello, This patch adds a flag to guard the optimization that optimize the following code away: free (malloc (4)); In some cases, we'd like this type of malloc/free pairs to remain in the optimized code. Tested with bootstrap, and no regression in the gcc testsuite. Is it ok for mainline? Thanks, Dehao gcc/ChangeLog 2012-05-08 Dehao Chen de...@google.com * common.opt (feliminate-malloc): New. * doc/invoke.texi: Document it. * tree-ssa-dce.c (mark_stmt_if_obviously_necessary): Honor it. gcc/testsuite/ChangeLog 2012-05-08 Dehao Chen de...@google.com * gcc.dg/free-malloc.c: Check if -fno-eliminate-malloc is working as expected. Index: gcc/doc/invoke.texi === --- gcc/doc/invoke.texi (revision 187277) +++ gcc/doc/invoke.texi (working copy) @@ -360,7 +360,8 @@ -fcx-limited-range @gol -fdata-sections -fdce -fdelayed-branch @gol -fdelete-null-pointer-checks -fdevirtualize -fdse @gol --fearly-inlining -fipa-sra -fexpensive-optimizations -ffat-lto-objects @gol +-fearly-inlining -feliminate-malloc -fipa-sra -fexpensive-optimizations @gol +-ffat-lto-objects @gol -ffast-math -ffinite-math-only -ffloat-store -fexcess-precision=@var{style} @gol -fforward-propagate -ffp-contract=@var{style} -ffunction-sections @gol -fgcse -fgcse-after-reload -fgcse-las -fgcse-lm -fgraphite-identity @gol @@ -6238,6 +6239,7 @@ -fdefer-pop @gol -fdelayed-branch @gol -fdse @gol +-feliminate-malloc @gol -fguess-branch-probability @gol -fif-conversion2 @gol -fif-conversion @gol @@ -6762,6 +6764,11 @@ Perform dead store elimination (DSE) on RTL@. Enabled by default at @option{-O} and higher. +@item -feliminate-malloc +@opindex feliminate-malloc +Eliminate unnecessary malloc/free pairs. +Enabled by default at @option{-O} and higher. + @item -fif-conversion @opindex fif-conversion Attempt to transform conditional jumps into branch-less equivalents. This Index: gcc/testsuite/gcc.dg/free-malloc.c === --- gcc/testsuite/gcc.dg/free-malloc.c (revision 0) +++ gcc/testsuite/gcc.dg/free-malloc.c (revision 0) @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options -O2 -fno-eliminate-malloc } */ +/* { dg-final { scan-assembler-times malloc 2} } */ +/* { dg-final { scan-assembler-times free 2} } */ + +extern void * malloc (unsigned long); +extern void free (void *); + +void test () +{ + free (malloc (10)); +} Index: gcc/common.opt === --- gcc/common.opt (revision 187277) +++ gcc/common.opt (working copy) @@ -1474,6 +1474,10 @@ Common Var(flag_dce) Init(1) Optimization Use the RTL dead code elimination pass +feliminate-malloc +Common Var(flag_eliminate_malloc) Init(1) Optimization +Eliminate unnecessary malloc/free pairs + fdse Common Var(flag_dse) Init(1) Optimization Use the RTL dead store elimination pass Index: gcc/tree-ssa-dce.c === --- gcc/tree-ssa-dce.c (revision 187277) +++ gcc/tree-ssa-dce.c (working copy) @@ -309,6 +309,8 @@ case BUILT_IN_CALLOC: case BUILT_IN_ALLOCA: case BUILT_IN_ALLOCA_WITH_ALIGN: + if (!flag_eliminate_malloc) + mark_stmt_necessary (stmt,
Re: [PATCH] Add -feliminate-malloc to enable/disable elimination of redundant malloc/free pairs
On Wed, May 9, 2012 at 11:48 AM, Dehao Chen de...@google.com wrote: On Wed, May 9, 2012 at 5:22 PM, Richard Guenther richard.guent...@gmail.com wrote: On Wed, May 9, 2012 at 10:38 AM, Dehao Chen de...@google.com wrote: On Wed, May 9, 2012 at 4:12 PM, Richard Guenther richard.guent...@gmail.com wrote: On Tue, May 8, 2012 at 6:18 PM, Xinliang David Li davi...@google.com wrote: To be clear, this flag is for malloc implementation (such as tcmalloc) with side effect unknown to the compiler. Using -fno-builtin-xxx is too conservative for that purpose. I don't think that flys. Btw, the patch also guards alloca - alloca is purely GCC internal. What's the unknown side-effects that are also important to preserve for free(malloc(4))? Malloc implementation may record some info to a global structure, and a program may use this free(malloc()) pair to simulate the real runs to get some data, such as peak memory requirement. So why not use an alternate interface into this special allocator for this purpose? There can be the following scenario: We want to add a module to an existing app. Before implementing the module, we want to collect some statistics on real runs. In this scenario, we need: * No change to the legacy code * Optimized build for the simulation run * Provide accurate statistical info We want to collect data for both new module and the legacy code without changing the later, thus we cannot use a new malloc/free interface. So put in a optimization barrier then. Like free (({ void * x = malloc (4); __asm ( : +m (x)); __x; }); Btw, why can't you simply build the new module (which doesn't something real anyway, just fake stuff) without optimization? Thanks, Dehao Dehao Richard. David On Tue, May 8, 2012 at 7:43 AM, Dehao Chen de...@google.com wrote: Hello, This patch adds a flag to guard the optimization that optimize the following code away: free (malloc (4)); In some cases, we'd like this type of malloc/free pairs to remain in the optimized code. Tested with bootstrap, and no regression in the gcc testsuite. Is it ok for mainline? Thanks, Dehao gcc/ChangeLog 2012-05-08 Dehao Chen de...@google.com * common.opt (feliminate-malloc): New. * doc/invoke.texi: Document it. * tree-ssa-dce.c (mark_stmt_if_obviously_necessary): Honor it. gcc/testsuite/ChangeLog 2012-05-08 Dehao Chen de...@google.com * gcc.dg/free-malloc.c: Check if -fno-eliminate-malloc is working as expected. Index: gcc/doc/invoke.texi === --- gcc/doc/invoke.texi (revision 187277) +++ gcc/doc/invoke.texi (working copy) @@ -360,7 +360,8 @@ -fcx-limited-range @gol -fdata-sections -fdce -fdelayed-branch @gol -fdelete-null-pointer-checks -fdevirtualize -fdse @gol --fearly-inlining -fipa-sra -fexpensive-optimizations -ffat-lto-objects @gol +-fearly-inlining -feliminate-malloc -fipa-sra -fexpensive-optimizations @gol +-ffat-lto-objects @gol -ffast-math -ffinite-math-only -ffloat-store -fexcess-precision=@var{style} @gol -fforward-propagate -ffp-contract=@var{style} -ffunction-sections @gol -fgcse -fgcse-after-reload -fgcse-las -fgcse-lm -fgraphite-identity @gol @@ -6238,6 +6239,7 @@ -fdefer-pop @gol -fdelayed-branch @gol -fdse @gol +-feliminate-malloc @gol -fguess-branch-probability @gol -fif-conversion2 @gol -fif-conversion @gol @@ -6762,6 +6764,11 @@ Perform dead store elimination (DSE) on RTL@. Enabled by default at @option{-O} and higher. +@item -feliminate-malloc +@opindex feliminate-malloc +Eliminate unnecessary malloc/free pairs. +Enabled by default at @option{-O} and higher. + @item -fif-conversion @opindex fif-conversion Attempt to transform conditional jumps into branch-less equivalents. This Index: gcc/testsuite/gcc.dg/free-malloc.c === --- gcc/testsuite/gcc.dg/free-malloc.c (revision 0) +++ gcc/testsuite/gcc.dg/free-malloc.c (revision 0) @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options -O2 -fno-eliminate-malloc } */ +/* { dg-final { scan-assembler-times malloc 2} } */ +/* { dg-final { scan-assembler-times free 2} } */ + +extern void * malloc (unsigned long); +extern void free (void *); + +void test () +{ + free (malloc (10)); +} Index: gcc/common.opt === --- gcc/common.opt (revision 187277) +++ gcc/common.opt (working copy) @@ -1474,6 +1474,10 @@ Common Var(flag_dce) Init(1) Optimization Use the RTL dead code elimination pass +feliminate-malloc +Common Var(flag_eliminate_malloc) Init(1) Optimization +Eliminate unnecessary malloc/free pairs + fdse Common Var(flag_dse) Init(1) Optimization Use the RTL dead store elimination pass Index: gcc/tree-ssa-dce.c
Re: [PATCH] Add -feliminate-malloc to enable/disable elimination of redundant malloc/free pairs
On Wed, May 09, 2012 at 05:48:25PM +0800, Dehao Chen wrote: So why not use an alternate interface into this special allocator for this purpose? There can be the following scenario: We want to add a module to an existing app. Before implementing the module, we want to collect some statistics on real runs. In this scenario, we need: * No change to the legacy code * Optimized build for the simulation run * Provide accurate statistical info We want to collect data for both new module and the legacy code without changing the later, thus we cannot use a new malloc/free interface. Note that even in the glibc testsuite there had to be workarounds for this (asm barrier was used): http://sources.redhat.com/ml/libc-alpha/2012-05/msg00138.html Jakub
[PATCH] ARM/NEON: vld1q_dup_s64 builtin
Hello, On ARM+Neon, the expansion of vld1q_dup_s64() and vld1q_dup_u64() builtins currently fails to load the second vector element. Here is a small patch to address this problem: 2012-05-07 Christophe Lyon christophe.l...@st.com * gcc/config/arm/neon.md (neon_vld1_dup): Fix vld1q_dup_s64. Index: gcc/config/arm/neon.md === --- gcc/config/arm/neon.md(revision 2659) +++ gcc/config/arm/neon.md(revision 2660) @@ -4203,7 +4203,7 @@ if (GET_MODE_NUNITS (MODEmode) 2) return vld1.V_sz_elem\t{%e0[], %f0[]}, %A1; else -return vld1.V_sz_elem\t%h0, %A1; +return vld1.V_sz_elem\t%e0, %A1 \;vmov\t%f0, %e0; } [(set (attr neon_type) (if_then_else (gt (const_string V_mode_nunits) (const_string 1)) OK? Thanks, Christophe.
Backport: fma3 instruction generation for 'march=native' in AMD processors
Hello, Below is the patch that has been committed in trunk (Revision: 187075). We like to backport it to GCC 4.7 branch as couple of AMD processors require this change for fma3 instruction generation. Bootstrapping and testing are successful. Is it OK to commit in GCC 4.7 branch? Regards Ganesh PATCH = * config/i386/driver-i386.c (host_detect_local_cpu): Reset has_fma4 for AMD processors with both fma3 and fma4 support. Index: config/i386/driver-i386.c === --- config/i386/driver-i386.c (revision 186897) +++ config/i386/driver-i386.c (working copy) @@ -472,6 +472,8 @@ has_abm = ecx bit_ABM; has_lwp = ecx bit_LWP; has_fma4 = ecx bit_FMA4; + if (vendor == SIG_AMD has_fma4 has_fma) + has_fma4 = 0; has_xop = ecx bit_XOP; has_tbm = ecx bit_TBM; has_lzcnt = ecx bit_LZCNT;
Re: [RFC] improve caret diagnostics for overload failures
Someone opened a bug about this: http://gcc.gnu.org/PR53289 Pinging: http://gcc.gnu.org/ml/gcc-patches/2012-04/msg01836.html On 29 April 2012 12:28, Manuel López-Ibáñez lopeziba...@gmail.com wrote: A new version using unsigned int for the flag type. It also adds another use in the C FE. I am not asking for approval, only whether this approach/implementation is the way to go. Cheers, Manuel. On 23 April 2012 20:09, Manuel López-Ibáñez lopeziba...@gmail.com wrote: So, apart from the type of the flag, are there any other comments on the patch? Is the approach acceptable? On 21 April 2012 17:51, Gabriel Dos Reis g...@integrable-solutions.net wrote: On Sat, Apr 21, 2012 at 9:42 AM, Jakub Jelinek ja...@redhat.com wrote: On Sat, Apr 21, 2012 at 04:26:32PM +0200, Manuel López-Ibáñez wrote: On 21 April 2012 16:22, Gabriel Dos Reis g...@integrable-solutions.net wrote: Do no use 'char' as the type of a flag. Prefer 'unsigned int'. Thanks, good catch! Should I worry about memory here and use something shorter? If it is a bool flag, you certainly should use bool type, which is shorter. It is a bit flag -- see the patch in his original message and 'enum diagnostic_info_flags'.
Re: [testsuite] Fix gcc.target/i386/hle-* testcases with Sun as
Hi Mike, On May 8, 2012, at 10:19 AM, Rainer Orth wrote: Several /gcc.target/i386/hle-*.c tests are currently failing on Solaris 9/x86 with Sun as: FAIL: gcc.target/i386/hle-add-acq-1.c scan-assembler lock[ \\n\\t]+(xacquire|.byte[ \\t]+0xf2)[ \\t\\n]+add The .s file has lock; .byte 0xf2 but the scan-assembler regex currently doesn't allow for the ; (which is not present with gas 2.22). The patch below does just that. Tested with the appropriate runtest invocation on i386-pc-solaris2.9 configured with as and gas respectively. Ok for mainline? Ok, assuming that the ; has to be there. If it doesn't have to be I just rechecked: this is covered by the gcc_cv_as_ix86_rep_lock_prefix test in gcc/configure.as which fails with the Solaris (8 and) 9 native assembler. there, an alternative patch might be to remove it from the port now instead of the patch. Right, that's why I was asking for review rather than just installing on my own. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
[PATCH] Vectorizer versioning TLC
This continues the series of patches cleaning up versioning / peeling in the vectorizer. This moves computation of whether to do a cost model check to a central place where it is also easy to see where it will eventually end up generated. Apart from this re-org this recognizes that runtime checks are pointless if we could have done the check fully at compile-time. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. Richard. 2012-05-09 Richard Guenther rguent...@suse.de * tree-vectorizer.h (vect_loop_versioning): Adjust prototype. (vect_do_peeling_for_loop_bound): Likewise. (vect_do_peeling_for_alignment): Likewise. * tree-vect-loop-manip.c (conservative_cost_threshold): Remove. (vect_do_peeling_for_loop_bound): Get check_profitability and threshold as parameters. (vect_do_peeling_for_alignment): Likewise. (vect_loop_versioning): Likewise. * tree-vect-loop.c (vect_transform_loop): Compute check_profitability and threshold here. Control where to put the check here. Index: gcc/tree-vectorizer.h === *** gcc/tree-vectorizer.h (revision 187316) --- gcc/tree-vectorizer.h (working copy) *** extern LOC vect_loop_location; *** 807,816 in tree-vect-loop-manip.c. */ extern void slpeel_make_loop_iterate_ntimes (struct loop *, tree); extern bool slpeel_can_duplicate_loop_p (const struct loop *, const_edge); ! extern void vect_loop_versioning (loop_vec_info); extern void vect_do_peeling_for_loop_bound (loop_vec_info, tree *, ! tree, gimple_seq); ! extern void vect_do_peeling_for_alignment (loop_vec_info); extern LOC find_loop_location (struct loop *); extern bool vect_can_advance_ivs_p (loop_vec_info); --- 807,816 in tree-vect-loop-manip.c. */ extern void slpeel_make_loop_iterate_ntimes (struct loop *, tree); extern bool slpeel_can_duplicate_loop_p (const struct loop *, const_edge); ! extern void vect_loop_versioning (loop_vec_info, unsigned int, bool); extern void vect_do_peeling_for_loop_bound (loop_vec_info, tree *, ! unsigned int, bool); ! extern void vect_do_peeling_for_alignment (loop_vec_info, unsigned int, bool); extern LOC find_loop_location (struct loop *); extern bool vect_can_advance_ivs_p (loop_vec_info); Index: gcc/tree-vect-loop-manip.c === *** gcc/tree-vect-loop-manip.c (revision 187316) --- gcc/tree-vect-loop-manip.c (working copy) *** vect_update_ivs_after_vectorizer (loop_v *** 1853,1886 } } - /* Return the more conservative threshold between the -min_profitable_iters returned by the cost model and the user -specified threshold, if provided. */ - - static unsigned int - conservative_cost_threshold (loop_vec_info loop_vinfo, -int min_profitable_iters) - { - unsigned int th; - int min_scalar_loop_bound; - - min_scalar_loop_bound = ((PARAM_VALUE (PARAM_MIN_VECT_LOOP_BOUND) - * LOOP_VINFO_VECT_FACTOR (loop_vinfo)) - 1); - - /* Use the cost model only if it is more conservative than user specified - threshold. */ - th = (unsigned) min_scalar_loop_bound; - if (min_profitable_iters -(!min_scalar_loop_bound - || min_profitable_iters min_scalar_loop_bound)) - th = (unsigned) min_profitable_iters; - - if (th vect_print_dump_info (REPORT_COST)) - fprintf (vect_dump, Profitability threshold is %u loop iterations., th); - - return th; - } - /* Function vect_do_peeling_for_loop_bound Peel the last iterations of the loop represented by LOOP_VINFO. --- 1853,1858 *** conservative_cost_threshold (loop_vec_in *** 1896,1902 void vect_do_peeling_for_loop_bound (loop_vec_info loop_vinfo, tree *ratio, ! tree cond_expr, gimple_seq cond_expr_stmt_list) { tree ni_name, ratio_mult_vf_name; struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo); --- 1868,1874 void vect_do_peeling_for_loop_bound (loop_vec_info loop_vinfo, tree *ratio, ! unsigned int th, bool check_profitability) { tree ni_name, ratio_mult_vf_name; struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo); *** vect_do_peeling_for_loop_bound (loop_vec *** 1904,1913 edge update_e; basic_block preheader; int loop_num; - bool check_profitability = false; - unsigned int th = 0; - int min_profitable_iters; int max_iter; if (vect_print_dump_info (REPORT_DETAILS)) fprintf (vect_dump, === vect_do_peeling_for_loop_bound ===); --- 1876,1884 edge update_e; basic_block preheader; int loop_num; int max_iter; + tree cond_expr = NULL_TREE; + gimple_seq
Ping: [PATCH] Add -fdump-rtl-pass-quiet
Ping. On 02/05/12 10:49, Andrew Stubbs wrote: On 19/04/12 13:58, Andrew Stubbs wrote: In the meantime, Mr Maintainers, can I commit my patch while we wait for the new world order? I'm happy to change the option name quiet to something else if necessary. Ping. I think David may have a point about 'quiet' being an inappropriate name for this option. He proposed 'ir' which I don't like but will do the job just fine. Ok, with that change? Andrew
Re: Backport: fma3 instruction generation for 'march=native' in AMD processors
Hello! Below is the patch that has been committed in trunk (Revision: 187075). We like to backport it to GCC 4.7 branch as couple of AMD processors require this change for fma3 instruction generation. Is it OK to commit in GCC 4.7 branch? * config/i386/driver-i386.c (host_detect_local_cpu): Reset has_fma4 for AMD processors with both fma3 and fma4 support. OK. Thanks, Uros.
Re: [testsuite] Fix gcc.target/i386/hle-* testcases with Sun as
On Wed, May 9, 2012 at 12:53 PM, Rainer Orth r...@cebitec.uni-bielefeld.de wrote: Several /gcc.target/i386/hle-*.c tests are currently failing on Solaris 9/x86 with Sun as: FAIL: gcc.target/i386/hle-add-acq-1.c scan-assembler lock[ \\n\\t]+(xacquire|.byte[ \\t]+0xf2)[ \\t\\n]+add The .s file has lock; .byte 0xf2 but the scan-assembler regex currently doesn't allow for the ; (which is not present with gas 2.22). The patch below does just that. Tested with the appropriate runtest invocation on i386-pc-solaris2.9 configured with as and gas respectively. Ok for mainline? Ok, assuming that the ; has to be there. If it doesn't have to be I just rechecked: this is covered by the gcc_cv_as_ix86_rep_lock_prefix test in gcc/configure.as which fails with the Solaris (8 and) 9 native assembler. there, an alternative patch might be to remove it from the port now instead of the patch. Right, that's why I was asking for review rather than just installing on my own. I'd rather see that we remove semicolon in this case, but please note that xchg with memory operand doesn't need lock prefix. If it is not too much trouble... Uros.
RE: Propose to add a new argument MULTILIB_REQUIRED in multilib framework
-Original Message- From: Joseph Myers [mailto:jos...@codesourcery.com] Sent: Thursday, May 03, 2012 8:49 PM To: Terry Guo Cc: gcc-patches@gcc.gnu.org Subject: RE: Propose to add a new argument MULTILIB_REQUIRED in multilib framework On Thu, 3 May 2012, Terry Guo wrote: 2012-05-03 Terry Guo terry@arm.com * Makefile.in (s-mlib): Add new argument MULTILIB_REQUIRED. * genmultilib (MULTILIB_REQUIRED): New. * doc/fragments.texi: Document the MULTILIB_REQUIRED. This is OK, with copyright dates on genmultilib and fragments.texi updated, in the absence of any build system maintainer objections within 72 hours. Committed into trunk with this patch at revision 187325 and copyright patch at revision 187327. BR, Terry
Re: [C++ Patch] PR 53158
Hi again, On 05/08/2012 03:00 PM, Jason Merrill wrote: On 05/07/2012 11:28 PM, Paolo Carlini wrote: error: could not convert ‘b.main()::lambda()()’ from ‘void’ to ‘bool’ It wouldn't say operator()? I think I'd leave that alone; it is somewhat more informative (about what b() expands to) and we're moving toward replacing %qE with caret anyway. The patch to ocp_convert is OK. As you may have noticed, the patchlet is still unapplied (I'm attaching below what I have tested and ready to get in per your approval). Is unapplied because I was really nervous due to the wrong location (thus caret) of the error call, at the end of the whole condition. Now, I'm wondering, shall we consistently use error_at (location_of (expr), ... for the error messages produced by the *convert* functions? The below quick fix makes me *much* more happy, the caret points to the closed round brace of the b() call. Can I trust all the exprs to come with an embedded location *at least* as accurate as input_location, normally better? In case I can do a pass through all of cvt.c etc and repost a largish patch... Thanks! Paolo. Index: testsuite/g++.dg/cpp0x/lambda/lambda-err2.C === --- testsuite/g++.dg/cpp0x/lambda/lambda-err2.C (revision 0) +++ testsuite/g++.dg/cpp0x/lambda/lambda-err2.C (revision 0) @@ -0,0 +1,12 @@ +// PR c++/53158 +// { dg-do compile { target c++11 } } + +int main() +{ + auto a = []() { return true; }; + auto b = []() { return a(); }; // { dg-error 'a' is not captured } + int c, d; + while (b() c d) // { dg-error could not convert } +{ +} +} Index: cp/cvt.c === --- cp/cvt.c(revision 187280) +++ cp/cvt.c(working copy) @@ -743,6 +743,12 @@ ocp_convert (tree type, tree expr, int convtype, i } if (code == BOOLEAN_TYPE) { + if (TREE_CODE (intype) == VOID_TYPE) + { + error (could not convert %qE from %void% to %bool%, expr); + return error_mark_node; + } + /* We can't implicitly convert a scoped enum to bool, so convert to the underlying type first. */ if (SCOPED_ENUM_P (intype) (convtype CONV_STATIC)) Index: cvt.c === --- cvt.c (revision 187290) +++ cvt.c (working copy) @@ -743,6 +743,14 @@ ocp_convert (tree type, tree expr, int convtype, i } if (code == BOOLEAN_TYPE) { + if (TREE_CODE (intype) == VOID_TYPE) + { + error_at (location_of (expr), + could not convert %qE from %void% to %bool%, + expr); + return error_mark_node; + } + /* We can't implicitly convert a scoped enum to bool, so convert to the underlying type first. */ if (SCOPED_ENUM_P (intype) (convtype CONV_STATIC))
Re: [PATCH] Fix endless loop in forwprop (PR tree-optimization/53226)
On Wed, May 09, 2012 at 10:21:04AM +0200, Richard Guenther wrote: Using the stmt UID looks odd as you are using it as flag - why not use one of the two stmt flags available to passes? Btw, the other possibility would be to simply change the various combiners to require them to update the iterator to point at the first statement that they have inserted. I have some TLC ideas in this area, let's see if I can get to them. Ok with using gimple_set_plf and friends. Here is what I've committed after doing another bootstrap/regtest: 2012-05-09 Jakub Jelinek ja...@redhat.com PR tree-optimization/53226 * tree-ssa-forwprop.c (ssa_forward_propagate_and_combine): Remove prev and prev_initialized vars, gimple_set_plf (stmt, GF_PLF_1, false) before processing it and gimple_set_plf (stmt, GF_PLF_1, true) if it doesn't need to be revisited, look for earliest stmt with !gimple_plf (stmt, GF_PLF_1) if something changed. * gcc.c-torture/compile/pr53226.c: New test. --- gcc/tree-ssa-forwprop.c.jj 2012-05-03 08:35:52.0 +0200 +++ gcc/tree-ssa-forwprop.c 2012-05-08 18:10:19.662061709 +0200 @@ -2677,8 +2677,7 @@ ssa_forward_propagate_and_combine (void) FOR_EACH_BB (bb) { - gimple_stmt_iterator gsi, prev; - bool prev_initialized; + gimple_stmt_iterator gsi; /* Apply forward propagation to all stmts in the basic-block. Note we update GSI within the loop as necessary. */ @@ -2771,12 +2770,14 @@ ssa_forward_propagate_and_combine (void) /* Combine stmts with the stmts defining their operands. Note we update GSI within the loop as necessary. */ - prev_initialized = false; for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi);) { gimple stmt = gsi_stmt (gsi); bool changed = false; + /* Mark stmt as potentially needing revisiting. */ + gimple_set_plf (stmt, GF_PLF_1, false); + switch (gimple_code (stmt)) { case GIMPLE_ASSIGN: @@ -2856,18 +2857,18 @@ ssa_forward_propagate_and_combine (void) { /* If the stmt changed then re-visit it and the statements inserted before it. */ - if (!prev_initialized) + for (; !gsi_end_p (gsi); gsi_prev (gsi)) + if (gimple_plf (gsi_stmt (gsi), GF_PLF_1)) + break; + if (gsi_end_p (gsi)) gsi = gsi_start_bb (bb); else - { - gsi = prev; - gsi_next (gsi); - } + gsi_next (gsi); } else { - prev = gsi; - prev_initialized = true; + /* Stmt no longer needs to be revisited. */ + gimple_set_plf (stmt, GF_PLF_1, true); gsi_next (gsi); } } --- gcc/testsuite/gcc.c-torture/compile/pr53226.c.jj2012-05-08 18:07:40.007000510 +0200 +++ gcc/testsuite/gcc.c-torture/compile/pr53226.c 2012-05-08 18:07:35.071029578 +0200 @@ -0,0 +1,13 @@ +/* PR tree-optimization/53226 */ + +void +foo (unsigned long *x, char y, char z) +{ + int i; + for (i = y; i z; ++i) +{ + unsigned long a = ((unsigned char) i) 63UL; + unsigned long b = 1ULL a; + *x |= b; +} +} Jakub
Fix PR53185 (vectorizer segfault)
Hi, the current code for strided loads can't deal with the situation when a prologue loop (peeling for alignment) is created after analyzing the data refs. There are multiple issues (non-constant steps in DRs mainly), so this is a simple stop gap. Regtesting on x86_64-linux (all langs) in progress. Okay for trunk? Ciao, Michael. PR tree-optimization/53185 * tree-vect-data-refs.c (vect_enhance_data_refs_alignment): Disable peeling when we see strided loads. testsuite/ * gcc.dg/vect/pr53185.c: New test. Index: tree-vect-data-refs.c === --- tree-vect-data-refs.c (revision 187287) +++ tree-vect-data-refs.c (working copy) @@ -1507,6 +1507,17 @@ vect_enhance_data_refs_alignment (loop_v GROUP_FIRST_ELEMENT (stmt_info) != stmt) continue; + /* FORNOW: Any strided load prevents peeling. The induction + variable analysis will fail when the prologue loop is generated, +and so we can't generate the new base for the pointer. */ + if (STMT_VINFO_STRIDE_LOAD_P (stmt_info)) + { + if (vect_print_dump_info (REPORT_DETAILS)) + fprintf (vect_dump, strided load prevents peeling); + do_peeling = false; + break; + } + /* For invariant accesses there is nothing to enhance. */ if (integer_zerop (DR_STEP (dr))) continue; Index: testsuite/gcc.dg/vect/pr53185.c === --- testsuite/gcc.dg/vect/pr53185.c (revision 0) +++ testsuite/gcc.dg/vect/pr53185.c (revision 0) @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options -O3 -ftree-vectorize } */ +unsigned short a, e; +int *b, *d; +int c; +extern int fn2(); +void fn1 () { + void *f; + for (;;) { +fn2 (); +b = f; +e = 0; +for (; e a; ++e) + b[e] = d[e * c]; + } +}
Re: PR 40752: -Wconversion generates false warnings for operands not larger than target type
On 7 May 2012 16:52, Paolo Carlini paolo.carl...@oracle.com wrote: Hi, On 06/11/2010 10:23 PM, Manuel López-Ibáñez wrote: On 11 August 2009 02:01, Joseph S. Myersjos...@codesourcery.com wrote: On Tue, 11 Aug 2009, Manuel López-Ibáñez wrote: Modified the patch to make use of the new c-c++-common testsuite. Bootstrapped and regression tested on x86_64-linux-gnu. OK for trunk? I still think the warnings for these cases are mostly correct (there are cases where you may be able to make deductions about the range of possible values of the expression being converted) and appropriate, and if disabled should be disabled under some separate -Wno-conversion-whatever option. -Wno-conversion-after-promotion ? I am not sure what would be a good name, but I think it is worth to allow silencing these warnings, so suggestions are welcome. it looks like this PR is still open today, and I think resolving it one way or the other isn't much work... Thus I'm asking: shall we actually have a new -Wno-conversion-after-promotion? Or, alternately - the issue came up quite often in recent times, because other widespread compilers don't warn - suppress the warning only in case of conditional expressions: char foo(bool haveBar, char bar_) { return haveBar ? bar_ : 0; }; What do you think? I think Joseph has said before that if the range can be deduced to stay within limits of the target type, then we should never warn, no new option needed. This is exactly the case if you recurse only on conditional expression operands and casts, and it will fix the warning above. The other cases should be deal separately. Cheers, Manuel.
Re: [RFC ivopts] ARM - Make ivopts take into account whether pre and post increments are actually supported on targets.
I would like another set of eyes on the backend specific changes - I am currently regression testing this final version on FSF trunk. After testing and benchmarking and getting some private feedback about the patch, this is what I ended up committing. I have a follow up patch coming to adjust legitimate_address and friends for some of these modes. regards, Ramana 2012-05-09 Ramana Radhakrishnan ramana.radhakrish...@linaro.org * tree-ssa-loop-ivopts.c (add_autoinc_candidates, get_address_cost): Replace use of HAVE_{POST/PRE}_{INCREMENT/DECREMENT} with USE_{LOAD/STORE}_{PRE/POST}_{INCREMENT/DECREMENT} appropriately. * config/arm/arm.h (ARM_AUTOINC_VALID_FOR_MODE_P): New. (USE_LOAD_POST_INCREMENT): Define. (USE_LOAD_PRE_INCREMENT): Define. (USE_LOAD_POST_DECREMENT): Define. (USE_LOAD_PRE_DECREMENT): Define. (USE_STORE_PRE_DECREMENT): Define. (USE_STORE_PRE_INCREMENT): Define. (USE_STORE_POST_DECREMENT): Define. (USE_STORE_POST_INCREMENT): Define. (arm_auto_incmodes): Add enumeration. * config/arm/arm-protos.h (arm_autoinc_modes_ok_p): Declare. * config/arm/arm.c (arm_autoinc_modes_ok_p): Define. 2012-04-10 Ramana Radhakrishnan ramana.radhakrish...@linaro.org * tree-ssa-loop-ivopts.c (add_autoinc_candidates, get_address_cost): Replace use of HAVE_{POST/PRE}_{INCREMENT/DECREMENT} with USE_{LOAD/STORE}_{PRE/POST}_{INCREMENT/DECREMENT} appropriately. * config/arm/arm.h (ARM_AUTOINC_VALID_FOR_MODE_P): New. (USE_LOAD_POST_INCREMENT): Define. (USE_LOAD_PRE_INCREMENT): Define. (USE_LOAD_POST_DECREMENT): Define. (USE_LOAD_PRE_DECREMENT): Define. (USE_STORE_PRE_DECREMENT): Define. (USE_STORE_PRE_INCREMENT): Define. (USE_STORE_POST_DECREMENT): Define. (USE_STORE_POST_INCREMENT): Define. (arm_auto_incmodes): Add enumeration. * config/arm/arm-protos.h (arm_autoinc_modes_ok_p): Declare. * config/arm/arm.c (arm_autoinc_modes_ok_p): Define. (arm_rtx_costs_1): Adjust costs for auto-inc modes and pre / post modify in floating point mode. (arm_size_rtx_costs): Likewise. regards, Ramana Richard. Richard. Ramana Index: gcc/tree-ssa-loop-ivopts.c === --- gcc/tree-ssa-loop-ivopts.c (revision 187327) +++ gcc/tree-ssa-loop-ivopts.c (working copy) @@ -2362,8 +2362,12 @@ cstepi = int_cst_value (step); mem_mode = TYPE_MODE (TREE_TYPE (*use-op_p)); - if ((HAVE_PRE_INCREMENT GET_MODE_SIZE (mem_mode) == cstepi) - || (HAVE_PRE_DECREMENT GET_MODE_SIZE (mem_mode) == -cstepi)) + if (((USE_LOAD_PRE_INCREMENT (mem_mode) + || USE_STORE_PRE_INCREMENT (mem_mode)) +GET_MODE_SIZE (mem_mode) == cstepi) + || ((USE_LOAD_PRE_DECREMENT (mem_mode) + || USE_STORE_PRE_DECREMENT (mem_mode)) + GET_MODE_SIZE (mem_mode) == -cstepi)) { enum tree_code code = MINUS_EXPR; tree new_base; @@ -2380,8 +2384,12 @@ add_candidate_1 (data, new_base, step, important, IP_BEFORE_USE, use, use-stmt); } - if ((HAVE_POST_INCREMENT GET_MODE_SIZE (mem_mode) == cstepi) - || (HAVE_POST_DECREMENT GET_MODE_SIZE (mem_mode) == -cstepi)) + if (((USE_LOAD_POST_INCREMENT (mem_mode) + || USE_STORE_POST_INCREMENT (mem_mode)) +GET_MODE_SIZE (mem_mode) == cstepi) + || ((USE_LOAD_POST_DECREMENT (mem_mode) + || USE_STORE_POST_DECREMENT (mem_mode)) + GET_MODE_SIZE (mem_mode) == -cstepi)) { add_candidate_1 (data, base, step, important, IP_AFTER_USE, use, use-stmt); @@ -3315,25 +3323,29 @@ reg0 = gen_raw_REG (address_mode, LAST_VIRTUAL_REGISTER + 1); reg1 = gen_raw_REG (address_mode, LAST_VIRTUAL_REGISTER + 2); - if (HAVE_PRE_DECREMENT) + if (USE_LOAD_PRE_DECREMENT (mem_mode) + || USE_STORE_PRE_DECREMENT (mem_mode)) { addr = gen_rtx_PRE_DEC (address_mode, reg0); has_predec[mem_mode] = memory_address_addr_space_p (mem_mode, addr, as); } - if (HAVE_POST_DECREMENT) + if (USE_LOAD_POST_DECREMENT (mem_mode) + || USE_STORE_POST_DECREMENT (mem_mode)) { addr = gen_rtx_POST_DEC (address_mode, reg0); has_postdec[mem_mode] = memory_address_addr_space_p (mem_mode, addr, as); } - if (HAVE_PRE_INCREMENT) + if (USE_LOAD_PRE_INCREMENT (mem_mode) + || USE_STORE_PRE_DECREMENT (mem_mode)) { addr = gen_rtx_PRE_INC (address_mode, reg0); has_preinc[mem_mode] = memory_address_addr_space_p (mem_mode, addr, as); } - if (HAVE_POST_INCREMENT) + if (USE_LOAD_POST_INCREMENT (mem_mode) + || USE_STORE_POST_INCREMENT (mem_mode)) { addr =
[PATCH] Move unexecuted vect testcase
Committed as obvious (passes on x86_64). Richard. 2012-05-09 Richard Guenther rguent...@suse.de PR tree-optimization/18437 * gfortran.dg/vect/rnflow-trs2a2.f90: Move ... * gfortran.dg/vect/fast-math-rnflow-trs2a2.f90: ... here.
RFA: Fix detecting of in-tree MPFR 3.1.0 sources
Hi Guys, http://www.mpfr.org/mpfr-current/#changes The current release of the MPFR library (v3.1.0) has reorganized its sources such the mpfr.h header file is now in a sub-directory called 'src', rather than being at the top level. This has broken GCC's use of in-tree MPFR sources. I am asking for permission to apply the patch below to fix the problem. I tested it by building an i686-pc-linux-gnu toolchain on a machine with no MPFR libraries installed, but with a copy of the mpfr 3.1.0 sources installed in-tree. I also built a second toolchain with an in-tree copy of the mpfr 2.4.2 sources, just to make sure that the old paths still worked. Both builds worked. OK to apply ? Cheers Nick gcc/ChangeLog 2012-05-09 Nick Clifton ni...@redhat.com * configure.ac (mpfr-dir): When using in-tree MPFR sources allow for the fact that from release v3.1.0 of MPFR the source files were moved into a src sub-directory. * configure: Regenerate. Index: configure.ac === --- configure.ac(revision 187320) +++ configure.ac(working copy) @@ -1289,9 +1289,16 @@ gmplibs=-L$with_mpfr_lib $gmplibs fi if test x$with_mpfr$with_mpfr_include$with_mpfr_lib = x test -d ${srcdir}/mpfr; then - gmplibs='-L$$r/$(HOST_SUBDIR)/mpfr/'$lt_cv_objdir $gmplibs - gmpinc='-I$$r/$(HOST_SUBDIR)/mpfr -I$$s/mpfr '$gmpinc - extra_mpc_mpfr_configure_flags='--with-mpfr-include=$$s/mpfr --with-mpfr-lib=$$r/$(HOST_SUBDIR)/mpfr/'$lt_cv_objdir + # MPFR v3.1.0 moved the sources into a src sub-directory. + if test -d ${srcdir}/mpfr/src; then +gmplibs='-L$$r/$(HOST_SUBDIR)/mpfr/src/'$lt_cv_objdir $gmplibs +gmpinc='-I$$r/$(HOST_SUBDIR)/mpfr/src -I$$s/mpfr/src '$gmpinc +extra_mpc_mpfr_configure_flags='--with-mpfr-include=$$s/mpfr/src --with-mpfr-lib=$$r/$(HOST_SUBDIR)/mpfr/src/'$lt_cv_objdir + else +gmplibs='-L$$r/$(HOST_SUBDIR)/mpfr/'$lt_cv_objdir $gmplibs +gmpinc='-I$$r/$(HOST_SUBDIR)/mpfr -I$$s/mpfr '$gmpinc +extra_mpc_mpfr_configure_flags='--with-mpfr-include=$$s/mpfr --with-mpfr-lib=$$r/$(HOST_SUBDIR)/mpfr/'$lt_cv_objdir + fi # Do not test the mpfr version. Assume that it is sufficient, since # it is in the source tree, and the library has not been built yet # but it would be included on the link line in the version check below
Re: [C++ Patch] PR 53158
On 05/09/2012 08:21 AM, Paolo Carlini wrote: Is unapplied because I was really nervous due to the wrong location (thus caret) of the error call, at the end of the whole condition. Now, I'm wondering, shall we consistently use error_at (location_of (expr), ... for the error messages produced by the *convert* functions? Sure. Note that as an alternative to error_at (location_of (expr) you can just use %q+E; the + means use the location of this argument. The below quick fix makes me *much* more happy, the caret points to the closed round brace of the b() call. Can I trust all the exprs to come with an embedded location *at least* as accurate as input_location, normally better? I would think so, yes. If an expression doesn't have a location, location_of/%q+E will just use input_location. Jason
Re: [RFC] improve caret diagnostics for overload failures
On 04/29/2012 06:28 AM, Manuel López-Ibáñez wrote: A new version using unsigned int for the flag type. It also adds another use in the C FE. I am not asking for approval, only whether this approach/implementation is the way to go. That looks good. I would still also adjust the caret printer to avoid printing the same caret twice in a row even when we haven't updated the caller. Jason
Re: [C++ Patch] PR 53158
On 9 May 2012 14:59, Jason Merrill ja...@redhat.com wrote: On 05/09/2012 08:21 AM, Paolo Carlini wrote: Is unapplied because I was really nervous due to the wrong location (thus caret) of the error call, at the end of the whole condition. Now, I'm wondering, shall we consistently use error_at (location_of (expr), ... for the error messages produced by the *convert* functions? Sure. Note that as an alternative to error_at (location_of (expr) you can just use %q+E; the + means use the location of this argument. This far less clear than error_at(location, whatever). And it requires the diagnostics machinery to know about input_location. I removed %H precisely for those reasons. Please, let's stop using + in diagnostics and always use explicit locations. Cheers, Manuel.
Re: [RFC] improve caret diagnostics for overload failures
On 9 May 2012 15:04, Jason Merrill ja...@redhat.com wrote: On 04/29/2012 06:28 AM, Manuel López-Ibáńez wrote: A new version using unsigned int for the flag type. It also adds another use in the C FE. I am not asking for approval, only whether this approach/implementation is the way to go. That looks good. I would still also adjust the caret printer to avoid printing the same caret twice in a row even when we haven't updated the caller. I could implement that by storing the last location in the diagnostic_context or using a static location_t in diagnostic_show_locus. What is your preference? Cheers, Manuel.
Re: [C++ Patch] PR 53158
Hi, On 05/09/2012 02:59 PM, Jason Merrill wrote: On 05/09/2012 08:21 AM, Paolo Carlini wrote: Is unapplied because I was really nervous due to the wrong location (thus caret) of the error call, at the end of the whole condition. Now, I'm wondering, shall we consistently use error_at (location_of (expr), ... for the error messages produced by the *convert* functions? Sure. Note that as an alternative to error_at (location_of (expr) you can just use %q+E; the + means use the location of this argument. A great, very concise, didn't know that (but maybe we want to make Manuel happier ;) The below quick fix makes me *much* more happy, the caret points to the closed round brace of the b() call. Can I trust all the exprs to come with an embedded location *at least* as accurate as input_location, normally better? I would think so, yes. If an expression doesn't have a location, location_of/%q+E will just use input_location. In the meanwhile I found a counterexample ;) Consider: enum E { e1 }; E e = e1; void bar(int a) { if (e = a) ; } right now the location for the invalid conversion from ‘int’ to ‘E’ error message seems pretty good, points to the a in if (e = a). If I do: Index: call.c === --- call.c (revision 187290) +++ call.c (working copy) @@ -5704,7 +5704,7 @@ convert_like_real (conversion *convs, tree expr, t break; } - permerror (input_location, invalid conversion from %qT to %qT, + permerror (location_of (expr), invalid conversion from %qT to %qT, TREE_TYPE (expr), totype); if (fn) permerror (DECL_SOURCE_LOCATION (fn), then an horrible thing happen: the error message points to the b in bar in void bar(int a) !!! What the heck has that to do with the actual conversion two lines below?!? Isn't even pointing to the a in void bar(int a)! So... I guess I could start experimenting with my idea, but we should be ready to fix regressions... Or we can already take from the counterexample a general lesson? Or maybe a subset of the conversion* functions, those in cvt.c?, seems first blush safer to tweak? Paolo.
Re: PATCH: Update longlong.h from GLIBC
On Tue, May 8, 2012 at 4:05 AM, Andreas Jaeger a...@suse.com wrote: On Tuesday, May 08, 2012 11:59:34 Richard Earnshaw wrote: On 08/05/12 10:04, Andreas Jaeger wrote: On Tuesday, May 08, 2012 10:43:14 Richard Guenther wrote: On Mon, May 7, 2012 at 11:11 PM, H.J. Lu hongjiu...@intel.com wrote: Hi, I am preparing to update GLIBC longlong.h from GCC. This patch updates GCC longlong.h to use a URL instead of an FSF postal address and replace spaces with tab. OK to install? Since I'd like to simply copy longlong.h from GCC release branch to GLIBC, Is this also OK for 4.7 branch? Why? Does it fix anything there? It makes sharing the file between gcc and glibc easier, Andreas Why should glibc be depending on the GCC release branch? Sounds like the tail wagging the dog. Ah, you discuss the release branch ;) Let HJ defend that one. Changing this file has quite a high potential for introducing regressions. I don't think we should risk that on the release branch. It's only whitespace IMO. I'm arguing for the trunk to take the change, I will take whatever I can get. Ian, is this OK for trunk? Thanks. -- H.J.
Re: [C++ Patch] PR 53158
I think the problem is you really want to use EXPR_LOC_OR_HERE rather than location_of; if the argument happens to be a DECL, location_of will give you the declaration location rather than the use location. Jason
Re: [C++ Patch] PR 53158
On 05/09/2012 09:04 AM, Manuel López-Ibáñez wrote: This far less clear than error_at(location, whatever). And it requires the diagnostics machinery to know about input_location. I removed %H precisely for those reasons. Please, let's stop using + in diagnostics and always use explicit locations. OK. Jason
Re: [RFC] improve caret diagnostics for overload failures
On 05/09/2012 09:07 AM, Manuel López-Ibáñez wrote: I could implement that by storing the last location in the diagnostic_context or using a static location_t in diagnostic_show_locus. What is your preference? diagnostic_context, I guess. Jason
Re: PR 40752: -Wconversion generates false warnings for operands not larger than target type
Hi, On 05/09/2012 02:54 PM, Manuel López-Ibáñez wrote: On 7 May 2012 16:52, Paolo Carlinipaolo.carl...@oracle.com wrote: Hi, On 06/11/2010 10:23 PM, Manuel López-Ibáñez wrote: On 11 August 2009 02:01, Joseph S. Myersjos...@codesourcery.comwrote: On Tue, 11 Aug 2009, Manuel López-Ibáñez wrote: Modified the patch to make use of the new c-c++-common testsuite. Bootstrapped and regression tested on x86_64-linux-gnu. OK for trunk? I still think the warnings for these cases are mostly correct (there are cases where you may be able to make deductions about the range of possible values of the expression being converted) and appropriate, and if disabled should be disabled under some separate -Wno-conversion-whatever option. -Wno-conversion-after-promotion ? I am not sure what would be a good name, but I think it is worth to allow silencing these warnings, so suggestions are welcome. it looks like this PR is still open today, and I think resolving it one way or the other isn't much work... Thus I'm asking: shall we actually have a new -Wno-conversion-after-promotion? Or, alternately - the issue came up quite often in recent times, because other widespread compilers don't warn - suppress the warning only in case of conditional expressions: char foo(bool haveBar, char bar_) { return haveBar ? bar_ : 0; }; What do you think? I think Joseph has said before that if the range can be deduced to stay within limits of the target type, then we should never warn, no new option needed. This is exactly the case if you recurse only on conditional expression operands and casts, and it will fix the warning above. The other cases should be deal separately. Good, I even have a patch essentially ready for that, but can you point me to that detailed feedback? I couldn't find it when I looked for it. Thanks Paolo.
Re: [C++ Patch] PR 53158
On 05/09/2012 03:20 PM, Jason Merrill wrote: I think the problem is you really want to use EXPR_LOC_OR_HERE rather than location_of; if the argument happens to be a DECL, location_of will give you the declaration location rather than the use location. Ah! Thanks a lot, now all those small details I always see in the diagnostics are becoming much more clear ;) Let's see what I can do... Paolo.
Re: Fix PR53185 (vectorizer segfault)
On Wed, May 9, 2012 at 2:38 PM, Michael Matz m...@suse.de wrote: Hi, the current code for strided loads can't deal with the situation when a prologue loop (peeling for alignment) is created after analyzing the data refs. There are multiple issues (non-constant steps in DRs mainly), so this is a simple stop gap. Regtesting on x86_64-linux (all langs) in progress. Okay for trunk? Ok. Thanks, Richard. Ciao, Michael. PR tree-optimization/53185 * tree-vect-data-refs.c (vect_enhance_data_refs_alignment): Disable peeling when we see strided loads. testsuite/ * gcc.dg/vect/pr53185.c: New test. Index: tree-vect-data-refs.c === --- tree-vect-data-refs.c (revision 187287) +++ tree-vect-data-refs.c (working copy) @@ -1507,6 +1507,17 @@ vect_enhance_data_refs_alignment (loop_v GROUP_FIRST_ELEMENT (stmt_info) != stmt) continue; + /* FORNOW: Any strided load prevents peeling. The induction + variable analysis will fail when the prologue loop is generated, + and so we can't generate the new base for the pointer. */ + if (STMT_VINFO_STRIDE_LOAD_P (stmt_info)) + { + if (vect_print_dump_info (REPORT_DETAILS)) + fprintf (vect_dump, strided load prevents peeling); + do_peeling = false; + break; + } + /* For invariant accesses there is nothing to enhance. */ if (integer_zerop (DR_STEP (dr))) continue; Index: testsuite/gcc.dg/vect/pr53185.c === --- testsuite/gcc.dg/vect/pr53185.c (revision 0) +++ testsuite/gcc.dg/vect/pr53185.c (revision 0) @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options -O3 -ftree-vectorize } */ +unsigned short a, e; +int *b, *d; +int c; +extern int fn2(); +void fn1 () { + void *f; + for (;;) { + fn2 (); + b = f; + e = 0; + for (; e a; ++e) + b[e] = d[e * c]; + } +}
Re: RFA: Fix detecting of in-tree MPFR 3.1.0 sources
On Wed, May 9, 2012 at 2:52 PM, Nick Clifton ni...@redhat.com wrote: Hi Guys, http://www.mpfr.org/mpfr-current/#changes The current release of the MPFR library (v3.1.0) has reorganized its sources such the mpfr.h header file is now in a sub-directory called 'src', rather than being at the top level. This has broken GCC's use of in-tree MPFR sources. I am asking for permission to apply the patch below to fix the problem. I tested it by building an i686-pc-linux-gnu toolchain on a machine with no MPFR libraries installed, but with a copy of the mpfr 3.1.0 sources installed in-tree. I also built a second toolchain with an in-tree copy of the mpfr 2.4.2 sources, just to make sure that the old paths still worked. Both builds worked. OK to apply ? I think we only support dropping in exactly the versions we provide in infrastructure/ - which matches the version we require in install.texi. Or did that change? Cheers Nick gcc/ChangeLog 2012-05-09 Nick Clifton ni...@redhat.com * configure.ac (mpfr-dir): When using in-tree MPFR sources allow for the fact that from release v3.1.0 of MPFR the source files were moved into a src sub-directory. * configure: Regenerate. Index: configure.ac === --- configure.ac (revision 187320) +++ configure.ac (working copy) @@ -1289,9 +1289,16 @@ gmplibs=-L$with_mpfr_lib $gmplibs fi if test x$with_mpfr$with_mpfr_include$with_mpfr_lib = x test -d ${srcdir}/mpfr; then - gmplibs='-L$$r/$(HOST_SUBDIR)/mpfr/'$lt_cv_objdir $gmplibs - gmpinc='-I$$r/$(HOST_SUBDIR)/mpfr -I$$s/mpfr '$gmpinc - extra_mpc_mpfr_configure_flags='--with-mpfr-include=$$s/mpfr --with-mpfr-lib=$$r/$(HOST_SUBDIR)/mpfr/'$lt_cv_objdir + # MPFR v3.1.0 moved the sources into a src sub-directory. + if test -d ${srcdir}/mpfr/src; then + gmplibs='-L$$r/$(HOST_SUBDIR)/mpfr/src/'$lt_cv_objdir $gmplibs + gmpinc='-I$$r/$(HOST_SUBDIR)/mpfr/src -I$$s/mpfr/src '$gmpinc + extra_mpc_mpfr_configure_flags='--with-mpfr-include=$$s/mpfr/src --with-mpfr-lib=$$r/$(HOST_SUBDIR)/mpfr/src/'$lt_cv_objdir + else + gmplibs='-L$$r/$(HOST_SUBDIR)/mpfr/'$lt_cv_objdir $gmplibs + gmpinc='-I$$r/$(HOST_SUBDIR)/mpfr -I$$s/mpfr '$gmpinc + extra_mpc_mpfr_configure_flags='--with-mpfr-include=$$s/mpfr --with-mpfr-lib=$$r/$(HOST_SUBDIR)/mpfr/'$lt_cv_objdir + fi # Do not test the mpfr version. Assume that it is sufficient, since # it is in the source tree, and the library has not been built yet # but it would be included on the link line in the version check below
Re: [PATCH] gcc/config/freebsd-spec.h: Fix building PIE executables. Link them with crt{begin,end}S.o and Scrt1.o which are PIC instead of crt{begin,end}.o and crt1.o which are not. Spec synced from g
Hmm, sorry, it seems I forgot to look at MAINTAINERS and CC him... On Tue, 8 May 2012 09:53:43 -0400 Alexis Ballier aball...@gentoo.org wrote: gcc/config/i386/freebsd.h: Likewise. --- gcc/config/freebsd-spec.h |9 +++-- gcc/config/i386/freebsd.h |9 +++-- 2 files changed, 6 insertions(+), 12 deletions(-) diff --git a/gcc/config/freebsd-spec.h b/gcc/config/freebsd-spec.h index 770a3d1..2808582 100644 --- a/gcc/config/freebsd-spec.h +++ b/gcc/config/freebsd-spec.h @@ -64,11 +64,8 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see before entering `main'. */ #define FBSD_STARTFILE_SPEC \ - %{!shared: \ - %{pg:gcrt1.o%s} %{!pg:%{p:gcrt1.o%s} \ -%{!p:%{profile:gcrt1.o%s} \ - %{!profile:crt1.o%s \ - crti.o%s %{!shared:crtbegin.o%s} %{shared:crtbeginS.o%s} + %{!shared: %{pg|p|profile:gcrt1.o%s;pie:Scrt1.o%s;:crt1.o%s}} \ + crti.o%s %{shared|pie:crtbeginS.o%s;:crtbegin.o%s} /* Provide a ENDFILE_SPEC appropriate for FreeBSD. Here we tack on the magical crtend.o file (see crtstuff.c) which provides part of @@ -77,7 +74,7 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see `crtn.o'. */ #define FBSD_ENDFILE_SPEC \ - %{!shared:crtend.o%s} %{shared:crtendS.o%s} crtn.o%s + %{shared|pie:crtendS.o%s;:crtend.o%s} crtn.o%s /* Provide a LIB_SPEC appropriate for FreeBSD as configured and as required by the user-land thread model. Before __FreeBSD_version diff --git a/gcc/config/i386/freebsd.h b/gcc/config/i386/freebsd.h index 649274d..dd69e43 100644 --- a/gcc/config/i386/freebsd.h +++ b/gcc/config/i386/freebsd.h @@ -67,11 +67,8 @@ along with GCC; see the file COPYING3. If not see #undef STARTFILE_SPEC #define STARTFILE_SPEC \ - %{!shared: \ - %{pg:gcrt1.o%s} %{!pg:%{p:gcrt1.o%s} \ -%{!p:%{profile:gcrt1.o%s} \ - %{!profile:crt1.o%s \ - crti.o%s %{!shared:crtbegin.o%s} %{shared:crtbeginS.o%s} + %{!shared: %{pg|p|profile:gcrt1.o%s;pie:Scrt1.o%s;:crt1.o%s}} \ + crti.o%s %{shared|pie:crtbeginS.o%s;:crtbegin.o%s} /* Provide a ENDFILE_SPEC appropriate for FreeBSD. Here we tack on the magical crtend.o file (see crtstuff.c) which provides part of @@ -81,7 +78,7 @@ along with GCC; see the file COPYING3. If not see #undef ENDFILE_SPEC #define ENDFILE_SPEC \ - %{!shared:crtend.o%s} %{shared:crtendS.o%s} crtn.o%s + %{shared|pie:crtendS.o%s;:crtend.o%s} crtn.o%s /* Provide a LINK_SPEC appropriate for FreeBSD. Here we provide support for the special GCC options -static and -shared, which allow us to
Re: RFA: Fix detecting of in-tree MPFR 3.1.0 sources
On Wed, May 9, 2012 at 3:26 PM, Richard Guenther richard.guent...@gmail.com wrote: On Wed, May 9, 2012 at 2:52 PM, Nick Clifton ni...@redhat.com wrote: Hi Guys, http://www.mpfr.org/mpfr-current/#changes The current release of the MPFR library (v3.1.0) has reorganized its sources such the mpfr.h header file is now in a sub-directory called 'src', rather than being at the top level. This has broken GCC's use of in-tree MPFR sources. I am asking for permission to apply the patch below to fix the problem. I tested it by building an i686-pc-linux-gnu toolchain on a machine with no MPFR libraries installed, but with a copy of the mpfr 3.1.0 sources installed in-tree. I also built a second toolchain with an in-tree copy of the mpfr 2.4.2 sources, just to make sure that the old paths still worked. Both builds worked. OK to apply ? I think we only support dropping in exactly the versions we provide in infrastructure/ - which matches the version we require in install.texi. Or did that change? Btw, it would probably be better to make the drop-in compiles doing a staged install during build instead of using the build tree for use. Cheers Nick gcc/ChangeLog 2012-05-09 Nick Clifton ni...@redhat.com * configure.ac (mpfr-dir): When using in-tree MPFR sources allow for the fact that from release v3.1.0 of MPFR the source files were moved into a src sub-directory. * configure: Regenerate. Index: configure.ac === --- configure.ac (revision 187320) +++ configure.ac (working copy) @@ -1289,9 +1289,16 @@ gmplibs=-L$with_mpfr_lib $gmplibs fi if test x$with_mpfr$with_mpfr_include$with_mpfr_lib = x test -d ${srcdir}/mpfr; then - gmplibs='-L$$r/$(HOST_SUBDIR)/mpfr/'$lt_cv_objdir $gmplibs - gmpinc='-I$$r/$(HOST_SUBDIR)/mpfr -I$$s/mpfr '$gmpinc - extra_mpc_mpfr_configure_flags='--with-mpfr-include=$$s/mpfr --with-mpfr-lib=$$r/$(HOST_SUBDIR)/mpfr/'$lt_cv_objdir + # MPFR v3.1.0 moved the sources into a src sub-directory. + if test -d ${srcdir}/mpfr/src; then + gmplibs='-L$$r/$(HOST_SUBDIR)/mpfr/src/'$lt_cv_objdir $gmplibs + gmpinc='-I$$r/$(HOST_SUBDIR)/mpfr/src -I$$s/mpfr/src '$gmpinc + extra_mpc_mpfr_configure_flags='--with-mpfr-include=$$s/mpfr/src --with-mpfr-lib=$$r/$(HOST_SUBDIR)/mpfr/src/'$lt_cv_objdir + else + gmplibs='-L$$r/$(HOST_SUBDIR)/mpfr/'$lt_cv_objdir $gmplibs + gmpinc='-I$$r/$(HOST_SUBDIR)/mpfr -I$$s/mpfr '$gmpinc + extra_mpc_mpfr_configure_flags='--with-mpfr-include=$$s/mpfr --with-mpfr-lib=$$r/$(HOST_SUBDIR)/mpfr/'$lt_cv_objdir + fi # Do not test the mpfr version. Assume that it is sufficient, since # it is in the source tree, and the library has not been built yet # but it would be included on the link line in the version check below
Re: PR 53249: Multiple address modes for same address space
On Tue, May 8, 2012 at 4:05 PM, Richard Henderson r...@redhat.com wrote: On 05/06/2012 11:41 AM, Richard Sandiford wrote: PR middle-end/53249 * dwarf2out.h (get_address_mode): Move declaration to... * rtl.h: ...here. * dwarf2out.c (get_address_mode): Move definition to... * rtlanal.c: ...here. * var-tracking.c (get_address_mode): Delete. * combine.c (find_split_point): Use get_address_mode instead of targetm.addr_space.address_mode. * cselib.c (cselib_record_sets): Likewise. * dse.c (canon_address, record_store): Likewise. * emit-rtl.c (adjust_address_1, offset_address): Likewise. * expr.c (move_by_pieces, emit_block_move_via_loop, store_by_pieces) (store_by_pieces_1, expand_assignment, store_expr, store_constructor) (expand_expr_real_1): Likewise. * ifcvt.c (noce_try_cmove_arith): Likewise. * optabs.c (maybe_legitimize_operand_same_code): Likewise. * reload.c (find_reloads): Likewise. * sched-deps.c (sched_analyze_1, sched_analyze_2): Likewise. * sel-sched-dump.c (debug_mem_addr_value): Likewise. ok. r~ I checked in this testcase. Thanks. -- H.J. -- 2012-05-09 H.J. Lu hongjiu...@intel.com PR middle-end/53249 * gcc.target/i386/pr53249.c: New. diff --git a/gcc/testsuite/gcc.target/i386/pr53249.c b/gcc/testsuite/gcc.target/i386/pr53249.c new file mode 100644 index 000..9eab8bc --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr53249.c @@ -0,0 +1,25 @@ +/* { dg-do compile { target { ! { ia32 } } } } */ +/* { dg-options -O2 -mx32 -ftls-model=initial-exec -maddress-mode=short } */ + +struct gomp_task +{ + struct gomp_task *parent; +}; + +struct gomp_thread +{ + int foo1; + struct gomp_task *task; +}; + +extern __thread struct gomp_thread gomp_tls_data; + +void +__attribute__ ((noinline)) +gomp_end_task (void) +{ + struct gomp_thread *thr = gomp_tls_data; + struct gomp_task *task = thr-task; + + thr-task = task-parent; +}
Re: [patch] support for multiarch systems
Il 09/05/2012 02:38, Matthias Klose ha scritto: Index: gcc/doc/invoke.texi === --- gcc/doc/invoke.texi (revision 187271) +++ gcc/doc/invoke.texi (working copy) @@ -6110,6 +6110,11 @@ @file{../lib32}, or if OS libraries are present in @file{lib/@var{subdir}} subdirectories it prints e.g.@: @file{amd64}, @file{sparcv9} or @file{ev6}. +@item -print-multiarch +@opindex print-multiarch +Print the path to OS libraries for the selected multiarch, +relative to some @file{lib} subdirectory. + @item -print-prog-name=@var{program} @opindex print-prog-name Like @option{-print-file-name}, but searches for a program such as @samp{cpp}. So -print-multiarch is like --print-multi-os-directory? What is the difference, and where is it docuemnted? Should it fail if multiarch support is not compiled in? Paolo
Re: RFA: Fix detecting of in-tree MPFR 3.1.0 sources
On Wed, 9 May 2012, Nick Clifton wrote: Hi Guys, http://www.mpfr.org/mpfr-current/#changes The current release of the MPFR library (v3.1.0) has reorganized its sources such the mpfr.h header file is now in a sub-directory called 'src', rather than being at the top level. This has broken GCC's use of in-tree MPFR sources. I am asking for permission to apply the patch below to fix the problem. I tested it by building an i686-pc-linux-gnu toolchain on a machine with no MPFR libraries installed, but with a copy of the mpfr 3.1.0 sources installed in-tree. I also built a second toolchain with an in-tree copy of the mpfr 2.4.2 sources, just to make sure that the old paths still worked. Both builds worked. Note that this is PR50461, aka PR51935, which includes a patch which was submitted: http://gcc.gnu.org/ml/gcc-patches/2012-03/msg01505.html (your patch is fine as well, from my non-reviewer point of view) On Wed, 9 May 2012, Richard Guenther wrote: I think we only support dropping in exactly the versions we provide in infrastructure/ - which matches the version we require in install.texi. Or did that change? The documentation just says 2.4.2 (or later). Even if it wasn't supported, why not take the patch? And it is a first step to changing the version in infrastructure, which will eventually have to happen. -- Marc Glisse
Re: RFA: Fix detecting of in-tree MPFR 3.1.0 sources
Il 09/05/2012 14:52, Nick Clifton ha scritto: Hi Guys, http://www.mpfr.org/mpfr-current/#changes The current release of the MPFR library (v3.1.0) has reorganized its sources such the mpfr.h header file is now in a sub-directory called 'src', rather than being at the top level. This has broken GCC's use of in-tree MPFR sources. I am asking for permission to apply the patch below to fix the problem. I tested it by building an i686-pc-linux-gnu toolchain on a machine with no MPFR libraries installed, but with a copy of the mpfr 3.1.0 sources installed in-tree. I also built a second toolchain with an in-tree copy of the mpfr 2.4.2 sources, just to make sure that the old paths still worked. Both builds worked. OK to apply ? Ok. I think it's nicer for the users if we enable both builds to work. Paolo
Re: PR 40752: -Wconversion generates false warnings for operands not larger than target type
On 9 May 2012 15:21, Paolo Carlini paolo.carl...@oracle.com wrote: Good, I even have a patch essentially ready for that, but can you point me to that detailed feedback? I couldn't find it when I looked for it. This is how I understood the following paragraph: I still think the warnings for these cases are mostly correct (there are cases where you may be able to make deductions about the range of possible values of the expression being converted) and appropriate, and if disabled should be disabled under some separate -Wno-conversion-whatever option. That is, cases where you not are able to make a deduction about ranges = the warnings are appropriate and should be disabled only under a separate option. cases where you are able to deduce the ranges = warning is not appropriate But I may have misinterpreted it. Cheers, Manuel. Thanks Paolo.
Re: RFA: Fix detecting of in-tree MPFR 3.1.0 sources
Il 09/05/2012 15:31, Richard Guenther ha scritto: Btw, it would probably be better to make the drop-in compiles doing a staged install during build instead of using the build tree for use. Same for GCC while building target library, but not really easy to do... GMP/MPFR are compiled without shared libraries which makes it simpler, but it would also be a bit ad hoc. Paolo
Re: PR 40752: -Wconversion generates false warnings for operands not larger than target type
On Wed, 9 May 2012, Manuel López-Ibáñez wrote: That is, cases where you not are able to make a deduction about ranges = the warnings are appropriate and should be disabled only under a separate option. cases where you are able to deduce the ranges = warning is not appropriate Yes, that's correct. -- Joseph S. Myers jos...@codesourcery.com
Re: RFA: Fix detecting of in-tree MPFR 3.1.0 sources
On Wed, May 9, 2012 at 3:40 PM, Paolo Bonzini bonz...@gnu.org wrote: Il 09/05/2012 15:31, Richard Guenther ha scritto: Btw, it would probably be better to make the drop-in compiles doing a staged install during build instead of using the build tree for use. Same for GCC while building target library, but not really easy to do... GMP/MPFR are compiled without shared libraries which makes it simpler, but it would also be a bit ad hoc. Sure - but support for dropping in support libraries into the bootstrap tree is ad-hoc, too ... Paolo
Re: PR 40752: -Wconversion generates false warnings for operands not larger than target type
On 05/09/2012 03:54 PM, Joseph S. Myers wrote: On Wed, 9 May 2012, Manuel López-Ibáñez wrote: That is, cases where you not are able to make a deduction about ranges = the warnings are appropriate and should be disabled only under a separate option. cases where you are able to deduce the ranges = warning is not appropriate Yes, that's correct. Ah, great, thanks. Then I should have a patchlet almost ready, simply tweaking the conditional expression case with get_unwidened (op1, 0), likewise op2, and recursion. Paolo.
Re: RFA: Fix detecting of in-tree MPFR 3.1.0 sources
Il 09/05/2012 15:57, Richard Guenther ha scritto: Btw, it would probably be better to make the drop-in compiles doing a staged install during build instead of using the build tree for use. Same for GCC while building target library, but not really easy to do... GMP/MPFR are compiled without shared libraries which makes it simpler, but it would also be a bit ad hoc. Sure - but support for dropping in support libraries into the bootstrap tree is ad-hoc, too ... Yes, but pretty much confined in configure.ac. What I'm wary of is adding ad-hoc stuff to the Makefile, which is already enough of a mess... Paolo
Symbol table 20/many: cleanup of cgraph_remove_unreachable_nodes
Hi, this patch reworks cgraph_remove_unreachable_nodes that has grown from somewhat hackish but simple marksweep routine into quite unmaintainable mess. It is because it needs to handle several special cases - extern inline functions, virtual functions, clones and inline clone tree reshaping and these features was not added in particularly nice way and their handling changed several times. Things bacome even more messy when varpool walking was added with ipa-ref infrastructure in 4.6 timeframe. This patch rewrites the function to be similar to what lto-cgraph does. I.e. compute reachable symbols and the boundary. I also made some fixes and made the function to release function bodies that are no longer neccesary more aggressively. This, for about 10th time, made me to run into problems with OMP lowering. OMP create declarations of the artificial functions early, but fills them with bodies only much later. It keeps the functions incomplette for a while, with STRUCT_FUNCTION initialized but left empty. This patch simply makes it to initialize those structures when they are needed. Bootstrapped/regtested x86_64-linux, will commit it shortly. Honza * cgraph.h (cgraph_remove_unreachable_nodes): Rename to ... (symtab_remove_unreachable_nodes): ... this one. * ipa-cp.c (ipcp_driver): Do not remove unreachable nodes. * omp-low.c (create_omp_child_function): Do not initialize DECL_INITIAL and STRUCT_FUNCTION. (remove_exit_barrier): Do not try to walk not created yet function bodies. (expand_omp_taskreg, lower_omp_taskreg): Initialize the STRUCT_FUNCTION and DECL_INITIAL here. * cgraphunit.c (ipa_passes): Update. * cgraphclones.c (cgraph_materialize_all_clones): Update. * ipa-inline.c (ipa_inline): Update. * ipa.c: Include ipa-inline.h (enqueue_cgraph_node, enqueue_varpool_node): Remove. (enqueue_node): New function. (process_references): Update. (symtab_remove_unreachable_nodes): Cleanup. * passes.c (execute_todo, execute_one_pass): Update. Index: cgraph.h === --- cgraph.h(revision 187335) +++ cgraph.h(working copy) @@ -637,7 +637,7 @@ int compute_call_stmt_bb_frequency (tree void record_references_in_initializer (tree, bool); /* In ipa.c */ -bool cgraph_remove_unreachable_nodes (bool, FILE *); +bool symtab_remove_unreachable_nodes (bool, FILE *); cgraph_node_set cgraph_node_set_new (void); cgraph_node_set_iterator cgraph_node_set_find (cgraph_node_set, struct cgraph_node *); Index: ipa-cp.c === --- ipa-cp.c(revision 187335) +++ ipa-cp.c(working copy) @@ -2445,7 +2445,6 @@ ipcp_driver (void) struct cgraph_2edge_hook_list *edge_duplication_hook_holder; struct topo_info topo; - cgraph_remove_unreachable_nodes (true,dump_file); ipa_check_create_node_params (); ipa_check_create_edge_args (); grow_next_edge_clone_vector (); Index: omp-low.c === --- omp-low.c (revision 187335) +++ omp-low.c (working copy) @@ -1572,7 +1572,6 @@ create_omp_child_function (omp_context * DECL_UNINLINABLE (decl) = 1; DECL_EXTERNAL (decl) = 0; DECL_CONTEXT (decl) = NULL_TREE; - DECL_INITIAL (decl) = make_node (BLOCK); t = build_decl (DECL_SOURCE_LOCATION (decl), RESULT_DECL, NULL_TREE, void_type_node); @@ -1605,13 +1604,6 @@ create_omp_child_function (omp_context * DECL_CHAIN (t) = DECL_ARGUMENTS (decl); DECL_ARGUMENTS (decl) = t; } - - /* Allocate memory for the function structure. The call to - allocate_struct_function clobbers CFUN, so we need to restore - it afterward. */ - push_struct_function (decl); - cfun-function_end_locus = gimple_location (ctx-stmt); - pop_cfun (); } @@ -3241,28 +3233,31 @@ remove_exit_barrier (struct omp_region * unsigned ix; any_addressable_vars = 0; - FOR_EACH_LOCAL_DECL (DECL_STRUCT_FUNCTION (child_fun), ix, decl) - if (TREE_ADDRESSABLE (decl)) - { - any_addressable_vars = 1; - break; - } - for (block = gimple_block (stmt); - !any_addressable_vars - block - TREE_CODE (block) == BLOCK; - block = BLOCK_SUPERCONTEXT (block)) + if (DECL_STRUCT_FUNCTION (child_fun)) { - for (local_decls = BLOCK_VARS (block); - local_decls; - local_decls = DECL_CHAIN (local_decls)) - if (TREE_ADDRESSABLE (local_decls)) + FOR_EACH_LOCAL_DECL (DECL_STRUCT_FUNCTION (child_fun), ix, decl) + if
[google] Hide all uses of __float128 from Clang (issue6195066)
Hide all uses of __float128 from Clang. Brackets _GLIBCXX_USE_FLOAT128 with #ifndef __clang__. Clang does not currently support the __float128 builtin, and so will fail to process libstdc++ headers that use it. Tested for full bootstrap and dejagnu testsuite. Okay for google/integration and google/gcc-4_7-integration branches? Thanks. 2012-05-09 Simon Baldwin sim...@google.com * libstdc++-v3/acinclude.m4: Bracket _GLIBCXX_USE_FLOAT128 definition with ifndef __clang__. * libstdc++-v3/config.h.in: Rebuild. Index: libstdc++-v3/config.h.in === --- libstdc++-v3/config.h.in(revision 187148) +++ libstdc++-v3/config.h.in(working copy) @@ -799,8 +799,11 @@ this host. */ #undef _GLIBCXX_USE_DECIMAL_FLOAT -/* Define if __float128 is supported on this host. */ +/* Define if __float128 is supported on this host. + Hide all uses of __float128 from Clang. Google ref b/6422845 */ +#ifndef __clang__ #undef _GLIBCXX_USE_FLOAT128 +#endif /* Defined if gettimeofday is available. */ #undef _GLIBCXX_USE_GETTIMEOFDAY Index: libstdc++-v3/acinclude.m4 === --- libstdc++-v3/acinclude.m4 (revision 187148) +++ libstdc++-v3/acinclude.m4 (working copy) @@ -2529,10 +2529,16 @@ int main() } EOF +AH_VERBATIM([_GLIBCXX_USE_FLOAT128,], +[/* Define if __float128 is supported on this host. + Hide all uses of __float128 from Clang. Google ref b/6422845 */ +#ifndef __clang__ +#undef _GLIBCXX_USE_FLOAT128 +#endif]) + AC_MSG_CHECKING([for __float128]) if AC_TRY_EVAL(ac_compile); then - AC_DEFINE(_GLIBCXX_USE_FLOAT128, 1, - [Define if __float128 is supported on this host.]) + AC_DEFINE(_GLIBCXX_USE_FLOAT128, 1) enable_float128=yes else enable_float128=no -- This patch is available for review at http://codereview.appspot.com/6195066
Re: [google] Hide all uses of __float128 from Clang (issue6195066)
On 12-05-09 08:19 , Simon Baldwin wrote: 2012-05-09 Simon Baldwinsim...@google.com * libstdc++-v3/acinclude.m4: Bracket _GLIBCXX_USE_FLOAT128 definition with ifndef __clang__. * libstdc++-v3/config.h.in: Rebuild. OK. Diego.
Re: [google] Hide all uses of __float128 from Clang (issue6195066)
On Wed, May 9, 2012 at 8:19 AM, Simon Baldwin sim...@google.com wrote: Hide all uses of __float128 from Clang. Brackets _GLIBCXX_USE_FLOAT128 with #ifndef __clang__. Clang does not currently support the __float128 builtin, and so will fail to process libstdc++ headers that use it. Tested for full bootstrap and dejagnu testsuite. Okay for google/integration and google/gcc-4_7-integration branches? Please check this into google/gcc-4_7 as well. Ollie
[PATCH] gcc_update explicit use of svn
contrib/gcc_update currently uses svn explicitly to determine the Revision and Branch instead of the $GCC_SVN shell variable used in other instances in the script. This patch uses the shell variable in all instances. Thanks, David * gcc_update: Use $GCC_SVN to retrieve branch and revision. Index: gcc_update === --- gcc_update (revision 187329) +++ gcc_update (working copy) @@ -372,8 +372,8 @@ exit 1 fi - revision=`svn info | awk '/Revision:/ { print $2 }'` - branch=`svn info | sed -ne /URL:/ { + revision=`$GCC_SVN info | awk '/Revision:/ { print $2 }'` + branch=`$GCC_SVN info | sed -ne /URL:/ { s,.*/trunk,trunk, s,.*/branches/,, s,.*/tags/,,
Re: [patch] support for multiarch systems
On 09.05.2012 15:37, Paolo Bonzini wrote: Il 09/05/2012 02:38, Matthias Klose ha scritto: Index: gcc/doc/invoke.texi === --- gcc/doc/invoke.texi (revision 187271) +++ gcc/doc/invoke.texi (working copy) @@ -6110,6 +6110,11 @@ @file{../lib32}, or if OS libraries are present in @file{lib/@var{subdir}} subdirectories it prints e.g.@: @file{amd64}, @file{sparcv9} or @file{ev6}. +@item -print-multiarch +@opindex print-multiarch +Print the path to OS libraries for the selected multiarch, +relative to some @file{lib} subdirectory. + @item -print-prog-name=@var{program} @opindex print-prog-name Like @option{-print-file-name}, but searches for a program such as @samp{cpp}. So -print-multiarch is like --print-multi-os-directory? the former prints the part before the `:' in the MULTILIB_OSDIRNAMES, the latter the part after the `':', e.g. ../lib32 and i386-linux-gnu. What is the difference, and where is it documented? Not sure how it should be further documented. Should it fail if multiarch support is not compiled in? All the -print options always succeed. I would prefer if it just prints the empty string if it is not configured (as it does now). Matthias
Re: [Patch,AVR,trunk,4.7]: Implement PR53256
2012/5/7 Georg-Johann Lay a...@gjlay.de: AVR-LibC switched from using either signal /or/ interrupt function attribute to using both at the same time. This was never documented or implemented but worked accidentally for some time, but results in wrong code for 4.7+ This patch adds better documentation of these attributes and makes 'interrupt' silently override 'signal'. Besides that, some more sanity checking is done for function attributes. ASM_DECLARE_FUNCTION_NAME just served to check isr names. All the checking is done in the new hook TARGET_SET_CURRENT_FUNCTION now so that ASM_DECLARE_FUNCTION_NAME from defaults.h can be used, thus some clean-up in elf.h Ok for trunk and 4.7? Johann PR target/53256 * config/avr/elf.h (ASM_DECLARE_FUNCTION_NAME): Remove. * config/avr/avr-protos.h (avr_asm_declare_function_name): Remove. * config/avr/avr.h (struct machine_function): Add attributes_checked_p. * config/avr/avr.c (avr_asm_declare_function_name): Remove. (expand_prologue): Move initialization of cfun-machine-is_naked, is_interrupt, is_signal, is_OS_task, is_OS_main from here to... (avr_regs_to_save): Ditto. (avr_set_current_function): ...this new static function. (TARGET_SET_CURRENT_FUNCTION): New define. (avr_function_ok_for_sibcall): Use cfun-machine-is_* instead of checking attributes of current_function_decl. (signal_function_p): Rename to avr_signal_function_p. (interrupt_function_p): Rename to avr_interrupt_function_p. * doc/extend.texi (Function Attributes): Better explanation of 'interrupt' and 'signal for AVR. Move 'ifunc' down for alphabetical order. Approved. Denis.
Re: Optimize calls to functions that return one of their arguments
On 04/28/2012 11:31 AM, Bernd Schmidt wrote: This patch allows us to recognize that even if the argument to memcpy lives across the call, we can allocate it to a call-used register by reusing the return value of the function. First, the patch sets the existing fn spec attribute for memcpy/memmove. This is translated to a new form of CALL_INSN_FUNCTION_USAGE, a (set (returnreg) (argreg)). This is recognized by IRA to adjust costs, and for communicating to caller-save that the register can be restored cheaply. The optimization only triggers if the argument is passed in a register, which should be the case in the majority of sane ABIs. The effect on the new testcase: pushq%rbx |subq$8, %rsp movslq%edx, %rdxmovslq%edx, %rdx movq%rdi, %rbx callmemcpycallmemcpy movq%rbx, %rax |addq$8, %rsp popq%rbx retret Bernd, sorry for some delay. I needed to think about the patch. It is pretty interesting and original idea. My only major objection to the patch is about treatment of ALLOCNO_CHEAP_CALLS_CROSSED_NUM. I think it should be accumulated as ALLOCNO_CALLS_CROSSED_NUM. Otherwise, I am afraid you will have a degradation in many cases instead of improvement. IRA is a regional allocator. The first it tries to do coloring in whole function (seeing a whole picture), then it tries to improve allocation in subregions. When you treat ALLOCNO_CHEAP_CALLS_CROSSED_NUM not accumulated (it means not taking subregions into account) you mislead allocation in the region containing subregions. For example, if the single call is in a subregion, ALLOCNO_CHEAP_CALLS_CROSSED_NUM for the subregion allocno will be 1 but in whole program allocno will be 0. RA in whole function will tend to allocate callee-saved hard register and RA in the subregion will tend to allocate caller-saved hard register. That will increase a possibility to create additional shuffle insns on the subregion borders and as consequence will degrade the code. I don't expect that this micro-optimization improves SPEC2000, but it will improve some tests. So it is good to have it. It would be really interesting to see the optimization impact on SPEC2000. I think I'll do it for myself in a week. So IRA part of the patch is ok for me if you modify treatment of ALLOCNO_CHEAP_CALLS_CROSSED_NUM as it is done for ALLOCNO_CALLS_CROSSED_NUM (when upper region allocnos accumulate the values from the corresponding allocnos from its sub-regions).
Re: [google] Hide all uses of __float128 from Clang (issue6195066)
Hi Simon, Hide all uses of __float128 from Clang. Brackets _GLIBCXX_USE_FLOAT128 with #ifndef __clang__. Clang does not currently support the __float128 builtin, and so will fail to process libstdc++ headers that use it. if one day clang gets support for this type, won't this still turn everything off? Is it possible to test the compiler on some small program using __float128, and turn off use of __float128 if the compiler barfs? Ciao, Duncan.
Re: [google] Hide all uses of __float128 from Clang (issue6195066)
On Wed, May 9, 2012 at 6:08 PM, Duncan Sands baldr...@free.fr wrote: Hi Simon, Hide all uses of __float128 from Clang. Brackets _GLIBCXX_USE_FLOAT128 with #ifndef __clang__. Clang does not currently support the __float128 builtin, and so will fail to process libstdc++ headers that use it. if one day clang gets support for this type, won't this still turn everything off? Is it possible to test the compiler on some small program using __float128, and turn off use of __float128 if the compiler barfs? Does it matter? This is for Google-internal branches only. Ciao! Steven
Re: RFA: Fix detecting of in-tree MPFR 3.1.0 sources
Hi Guys, Ok. I think it's nicer for the users if we enable both builds to work. Thanks. I have applied the patch, along with a reference to PR 50461 and a credit to Paul Smith in the ChangeLog entry since he came up with basically the same patch as mine. Cheers Nick
Re: [patch] support for multiarch systems
Il 09/05/2012 17:34, Matthias Klose ha scritto: So -print-multiarch is like --print-multi-os-directory? the former prints the part before the `:' in the MULTILIB_OSDIRNAMES, the latter the part after the `':', e.g. ../lib32 and i386-linux-gnu. Yes, of course. What is the difference, and where is it documented? Not sure how it should be further documented. No idea, it is a new concept and people need to understand how it relates to multilibbing for example, what shortcomings are addressed etc. I went through the Debian pages (only cursorily, I admit) and I found nothing of this. Another question I have is related to usage of the option. Are you supposed to look for libraries in the multilib directories too if the compiler is multiarch-enabled? Or only in /lib/i386-linux-gnu? Which one takes priority, multiarch or multiosdir? From the patch I can guess the intended search path is /lib/MULTIARCH:/lib/MULTIOSDIR, but I'm not entirely sure about that and it needs documentation. Should it fail if multiarch support is not compiled in? All the -print options always succeed. I would prefer if it just prints the empty string if it is not configured (as it does now). Will the empty string be a valid output for a multiarch-enabled compiler? For example gcc --print-multi-os-directory and gcc --print-multi-directory on a bi-arch x86-64 compiler will never print the empty string. Again, I guess the answer is no but I'm not sure. If the answer is no, returning the empty string is fine. If the answer is yes, and assuming the search path is /lib/MULTIARCH:/lib/MULTIOSDIR, then programs need to know whether they need to omit the /lib/MULTIARCH component of the search path. A failure status code is then required. Paolo
Re: [PATCH] Add -feliminate-malloc to enable/disable elimination of redundant malloc/free pairs
On Wed, May 9, 2012 at 1:12 AM, Richard Guenther richard.guent...@gmail.com wrote: On Tue, May 8, 2012 at 6:18 PM, Xinliang David Li davi...@google.com wrote: To be clear, this flag is for malloc implementation (such as tcmalloc) with side effect unknown to the compiler. Using -fno-builtin-xxx is too conservative for that purpose. I don't think that flys. Btw, the patch also guards alloca - alloca is purely GCC internal. What's the unknown side-effects that are also important to preserve for free(malloc(4))? The side effect is the user registered malloc hooks. The pattern 'free(malloc(4)', or loops like for (..) { malloc(..); // result not used } itself is not interesting. They only appear in the test code and the problem can be worked around by using -fno-builtin-malloc. The option is to avoid disable this and all similar malloc optimizations (in the future) for malloc implementation with hooks. Thanks, David Richard. David On Tue, May 8, 2012 at 7:43 AM, Dehao Chen de...@google.com wrote: Hello, This patch adds a flag to guard the optimization that optimize the following code away: free (malloc (4)); In some cases, we'd like this type of malloc/free pairs to remain in the optimized code. Tested with bootstrap, and no regression in the gcc testsuite. Is it ok for mainline? Thanks, Dehao gcc/ChangeLog 2012-05-08 Dehao Chen de...@google.com * common.opt (feliminate-malloc): New. * doc/invoke.texi: Document it. * tree-ssa-dce.c (mark_stmt_if_obviously_necessary): Honor it. gcc/testsuite/ChangeLog 2012-05-08 Dehao Chen de...@google.com * gcc.dg/free-malloc.c: Check if -fno-eliminate-malloc is working as expected. Index: gcc/doc/invoke.texi === --- gcc/doc/invoke.texi (revision 187277) +++ gcc/doc/invoke.texi (working copy) @@ -360,7 +360,8 @@ -fcx-limited-range @gol -fdata-sections -fdce -fdelayed-branch @gol -fdelete-null-pointer-checks -fdevirtualize -fdse @gol --fearly-inlining -fipa-sra -fexpensive-optimizations -ffat-lto-objects @gol +-fearly-inlining -feliminate-malloc -fipa-sra -fexpensive-optimizations @gol +-ffat-lto-objects @gol -ffast-math -ffinite-math-only -ffloat-store -fexcess-precision=@var{style} @gol -fforward-propagate -ffp-contract=@var{style} -ffunction-sections @gol -fgcse -fgcse-after-reload -fgcse-las -fgcse-lm -fgraphite-identity @gol @@ -6238,6 +6239,7 @@ -fdefer-pop @gol -fdelayed-branch @gol -fdse @gol +-feliminate-malloc @gol -fguess-branch-probability @gol -fif-conversion2 @gol -fif-conversion @gol @@ -6762,6 +6764,11 @@ Perform dead store elimination (DSE) on RTL@. Enabled by default at @option{-O} and higher. +@item -feliminate-malloc +@opindex feliminate-malloc +Eliminate unnecessary malloc/free pairs. +Enabled by default at @option{-O} and higher. + @item -fif-conversion @opindex fif-conversion Attempt to transform conditional jumps into branch-less equivalents. This Index: gcc/testsuite/gcc.dg/free-malloc.c === --- gcc/testsuite/gcc.dg/free-malloc.c (revision 0) +++ gcc/testsuite/gcc.dg/free-malloc.c (revision 0) @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options -O2 -fno-eliminate-malloc } */ +/* { dg-final { scan-assembler-times malloc 2} } */ +/* { dg-final { scan-assembler-times free 2} } */ + +extern void * malloc (unsigned long); +extern void free (void *); + +void test () +{ + free (malloc (10)); +} Index: gcc/common.opt === --- gcc/common.opt (revision 187277) +++ gcc/common.opt (working copy) @@ -1474,6 +1474,10 @@ Common Var(flag_dce) Init(1) Optimization Use the RTL dead code elimination pass +feliminate-malloc +Common Var(flag_eliminate_malloc) Init(1) Optimization +Eliminate unnecessary malloc/free pairs + fdse Common Var(flag_dse) Init(1) Optimization Use the RTL dead store elimination pass Index: gcc/tree-ssa-dce.c === --- gcc/tree-ssa-dce.c (revision 187277) +++ gcc/tree-ssa-dce.c (working copy) @@ -309,6 +309,8 @@ case BUILT_IN_CALLOC: case BUILT_IN_ALLOCA: case BUILT_IN_ALLOCA_WITH_ALIGN: + if (!flag_eliminate_malloc) + mark_stmt_necessary (stmt, true); return; default:;
Re: [PATCH] Add -feliminate-malloc to enable/disable elimination of redundant malloc/free pairs
On Wed, May 9, 2012 at 9:32 AM, Xinliang David Li davi...@google.com wrote: On Wed, May 9, 2012 at 1:12 AM, Richard Guenther richard.guent...@gmail.com wrote: On Tue, May 8, 2012 at 6:18 PM, Xinliang David Li davi...@google.com wrote: To be clear, this flag is for malloc implementation (such as tcmalloc) with side effect unknown to the compiler. Using -fno-builtin-xxx is too conservative for that purpose. I don't think that flys. Btw, the patch also guards alloca - alloca is purely GCC internal. What's the unknown side-effects that are also important to preserve for free(malloc(4))? The side effect is the user registered malloc hooks. IIRC future versions of glibc will have those malloc hooks removed. Because of this problem. Thanks, Andrew Pinski The pattern 'free(malloc(4)', or loops like for (..) { malloc(..); // result not used } itself is not interesting. They only appear in the test code and the problem can be worked around by using -fno-builtin-malloc. The option is to avoid disable this and all similar malloc optimizations (in the future) for malloc implementation with hooks. Thanks, David Richard. David On Tue, May 8, 2012 at 7:43 AM, Dehao Chen de...@google.com wrote: Hello, This patch adds a flag to guard the optimization that optimize the following code away: free (malloc (4)); In some cases, we'd like this type of malloc/free pairs to remain in the optimized code. Tested with bootstrap, and no regression in the gcc testsuite. Is it ok for mainline? Thanks, Dehao gcc/ChangeLog 2012-05-08 Dehao Chen de...@google.com * common.opt (feliminate-malloc): New. * doc/invoke.texi: Document it. * tree-ssa-dce.c (mark_stmt_if_obviously_necessary): Honor it. gcc/testsuite/ChangeLog 2012-05-08 Dehao Chen de...@google.com * gcc.dg/free-malloc.c: Check if -fno-eliminate-malloc is working as expected. Index: gcc/doc/invoke.texi === --- gcc/doc/invoke.texi (revision 187277) +++ gcc/doc/invoke.texi (working copy) @@ -360,7 +360,8 @@ -fcx-limited-range @gol -fdata-sections -fdce -fdelayed-branch @gol -fdelete-null-pointer-checks -fdevirtualize -fdse @gol --fearly-inlining -fipa-sra -fexpensive-optimizations -ffat-lto-objects @gol +-fearly-inlining -feliminate-malloc -fipa-sra -fexpensive-optimizations @gol +-ffat-lto-objects @gol -ffast-math -ffinite-math-only -ffloat-store -fexcess-precision=@var{style} @gol -fforward-propagate -ffp-contract=@var{style} -ffunction-sections @gol -fgcse -fgcse-after-reload -fgcse-las -fgcse-lm -fgraphite-identity @gol @@ -6238,6 +6239,7 @@ -fdefer-pop @gol -fdelayed-branch @gol -fdse @gol +-feliminate-malloc @gol -fguess-branch-probability @gol -fif-conversion2 @gol -fif-conversion @gol @@ -6762,6 +6764,11 @@ Perform dead store elimination (DSE) on RTL@. Enabled by default at @option{-O} and higher. +@item -feliminate-malloc +@opindex feliminate-malloc +Eliminate unnecessary malloc/free pairs. +Enabled by default at @option{-O} and higher. + @item -fif-conversion @opindex fif-conversion Attempt to transform conditional jumps into branch-less equivalents. This Index: gcc/testsuite/gcc.dg/free-malloc.c === --- gcc/testsuite/gcc.dg/free-malloc.c (revision 0) +++ gcc/testsuite/gcc.dg/free-malloc.c (revision 0) @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options -O2 -fno-eliminate-malloc } */ +/* { dg-final { scan-assembler-times malloc 2} } */ +/* { dg-final { scan-assembler-times free 2} } */ + +extern void * malloc (unsigned long); +extern void free (void *); + +void test () +{ + free (malloc (10)); +} Index: gcc/common.opt === --- gcc/common.opt (revision 187277) +++ gcc/common.opt (working copy) @@ -1474,6 +1474,10 @@ Common Var(flag_dce) Init(1) Optimization Use the RTL dead code elimination pass +feliminate-malloc +Common Var(flag_eliminate_malloc) Init(1) Optimization +Eliminate unnecessary malloc/free pairs + fdse Common Var(flag_dse) Init(1) Optimization Use the RTL dead store elimination pass Index: gcc/tree-ssa-dce.c === --- gcc/tree-ssa-dce.c (revision 187277) +++ gcc/tree-ssa-dce.c (working copy) @@ -309,6 +309,8 @@ case BUILT_IN_CALLOC: case BUILT_IN_ALLOCA: case BUILT_IN_ALLOCA_WITH_ALIGN: + if (!flag_eliminate_malloc) + mark_stmt_necessary (stmt, true); return; default:;
Re: Fix gcc.dg/lower-subreg-1.c failure
From: Richard Sandiford rdsandif...@googlemail.com Date: Wed, 9 May 2012 11:14:49 +0200 Hans-Peter Nilsson hans-peter.nils...@axis.com writes: From: Richard Sandiford rdsandif...@googlemail.com Date: Tue, 1 May 2012 16:46:38 +0200 To repeat: as things stand, very few targets define proper rtx costs for SET. IMHO it's wrong to start blaming targets when rtx_cost doesn't take the mode in account in the first place, for the default cost. (Well, except for the modes-tieable subreg special-case.) The targets where an operation in N * word_mode costs no more than one in word_mode, if there even is one, is a minority, let's adjust the defaults to that. I'll pass on approving or disapproving this patch, but for the record: a factor of word_mode seems a bit too simplistic. I'd say it's the right level: simplistic enough for the default, not to mention now linear, without being plainly ignorant as now. It's OK for moves and logic ops, but addition of multiword modes is a bit more complicated. How about (same factor) factor*COSTS_N_INSNS (1)*3/2 to account for carry? Or is 2*factor a better default? Further improvements are welcome, but I see the patch as a strict improvement and I hope it will not be shot down by requests for perfection - at least not without detailing said perfection. Multiplication and division by multiword modes is even more so, of course. Suggestions are welcome, but in the absence of that, I'd say any factor larger than one is is a good start, like in the patch. I think there should be a gating check whether the target implements that kind of shift in that mode at all, before checking the cost. Not sure whether it's generally best to put that test here, or to make the rtx_cost function return the cost of a libcall for that mode when that happens. Similar for the other insns. This has come up a few times in past discussions about rtx_cost (as I'm sure you remember :-)). On the one hand, I agree it might be nice to shield the backend from invalid insns. That would effectively mean calling expand on each insn though, which would be rather expensive. No, nothing that complicated. I'm thinking of just basically checking that there's an operation in that mode, like: if (direct_optab_handler (code_to_optab [GET_CODE (x)], mode) == CODE_FOR_nothing) { ... return tabled default cost; for libcall or open-code ... } Restricting the validity-gating to checking that the mode is valid for the operation wouldn't interfere with fancy pipeline speculative use. So I think this patch is using rtx_cost according to its current interface. The interface use previously ignored the mode for most uses (QED), so that's not completely correct. ;) If someone wants to change or restrict that interface, than that's a separate change IMO. And it should be done consistently rather than in this one place. In this case it doesn't matter anyway. If we never see a shift in mode M by amount X, we'll never need to make a decision about whether to split it. If it's never used, then I don't mind it being wrong if that simplifies the computation. :) Isn't the below better than doing virtually the same in each target's rtx_costs? FWIW, MIPS, SH and x86 all used something slightly different (and more complicated). I imagine PowerPC and SPARC will too. So each seems a bit strong. That's not an objection to the patch though. I realise some ports do have very regular architectures where every register is the same width and has the same move cost. Hence the default should follow a very regular model... brgds, H-P
Re: RFA: PR target/53120, constraint modifier + on operand tied by matching-constraint, 0.
OK to apply ? Ok. Thanks!
Re: [patch] support for multiarch systems
On 09.05.2012 18:29, Paolo Bonzini wrote: Il 09/05/2012 17:34, Matthias Klose ha scritto: So -print-multiarch is like --print-multi-os-directory? the former prints the part before the `:' in the MULTILIB_OSDIRNAMES, the latter the part after the `':', e.g. ../lib32 and i386-linux-gnu. Yes, of course. What is the difference, and where is it documented? Not sure how it should be further documented. No idea, it is a new concept and people need to understand how it relates to multilibbing for example, what shortcomings are addressed etc. I went through the Debian pages (only cursorily, I admit) and I found nothing of this. these are referenced from the http://wiki.debian.org/Multiarch/Tuples https://wiki.ubuntu.com/MultiarchSpec#Filesystem_layout http://err.no/debian/amd64-multiarch-3 http://wiki.debian.org/Multiarch/TheCaseForMultiarch describes use cases for multiarch, and why Debian thinks that the existing approaches are not sufficient (having name collisions for different architectures or ad hoc names for new architectures like libx32). That may be contentious within the Linux community, but I would like to avoid this kind of discussion here. Another question I have is related to usage of the option. Are you supposed to look for libraries in the multilib directories too if the compiler is multiarch-enabled? Debian/Ubuntu always use /lib for the default MULTIOSDIR, and this needs to be looked up in the future too. The use of locations like ../lib32 will be faded out over time, but I don't see any harm not to have looked up as well. Or only in /lib/i386-linux-gnu? Which one takes priority, multiarch or multiosdir? From the patch I can guess the intended search path is /lib/MULTIARCH:/lib/MULTIOSDIR, but I'm not entirely sure about that and it needs documentation. yes, this is the intended search path. I assume such kind of documentation shouldn't go into invoke.texi. Should it fail if multiarch support is not compiled in? All the -print options always succeed. I would prefer if it just prints the empty string if it is not configured (as it does now). Will the empty string be a valid output for a multiarch-enabled compiler? For example gcc --print-multi-os-directory and gcc --print-multi-directory on a bi-arch x86-64 compiler will never print the empty string. Again, I guess the answer is no but I'm not sure. If the answer is no, returning the empty string is fine. The answer is no. If multiarch is enabled, then the string is always non-empty and should return a multiarch tuple as defined in http://wiki.debian.org/Multiarch/Tuples. Anything else should be considered an error. If the answer is yes, and assuming the search path is /lib/MULTIARCH:/lib/MULTIOSDIR, then programs need to know whether they need to omit the /lib/MULTIARCH component of the search path. Unrelated, but why would programs hard code this path and not use something like this? gcc -v -E - /dev/null 21 | grep ^LIBRARY_PATH Matthias
[doc] Fix xref in extend.texi
It is strange that a paragraph talking about GCC pragmas refers to the section in CPP talking about #indent and other such directives (no #pragma there). Wouldn't it better if it pointed to the section about #pragmas? OK? 2012-05-09 Manuel López-Ibáñez m...@gcc.gnu.org * doc/extend.texi (Function Attributes): Point xref to section about Pragmas. Index: doc/extend.texi === --- doc/extend.texi (revision 187299) +++ doc/extend.texi (working copy) @@ -4102,8 +4102,7 @@ found convenient to use @code{__attribute__} to achieve a natural attachment of attributes to their corresponding declarations, whereas @code{#pragma GCC} is of use for constructs that do not naturally form -part of the grammar. @xref{Other Directives,,Miscellaneous -Preprocessing Directives, cpp, The GNU C Preprocessor}. +part of the grammar. @xref{Pragmas,,Pragmas Accepted by GCC}. @node Attribute Syntax @section Attribute Syntax
[PATCH, i386]: Fix PR 44141, Redundant loads and stores generated for AMD bdver1 target
Hello! Attached patch avoids a deficiency in reload, where reload gives up on handling subregs of pseudos (please see the PR [1] for explanation by Ulrich). The patch simply avoids generating V4SF moves with V4SF subregs of V2DF values unless really necessary (i.e. moving SSE2 modes without SSE2 enabled, which shouldn't happen anyway). With patched gcc, expand pass emits (unaligned) moves in their original mode, and this mode is kept until asm is generated. The asm instruction is chosen according to the mode of insn pattern, and the mode is calculated using various influencing conditions. 2012-05-09 Uros Bizjak ubiz...@gmail.com PR target/44141 * config/i386/i386.c (ix86_expand_vector_move_misalign): Do not handle 128 bit vectors specially for TARGET_AVX. Emit sse2_movupd and sse_movupd RTXes for TARGET_AVX, TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL or when optimizing for size. * config/i386/sse.md (*movmode_internal): Remove TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL handling from asm output code. Calculate mode attribute according to optimize_function_for_size_p and TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL flag. (*sse_movussemodesuffixavxsizesuffix): Choose asm template depending on the mode of the instruction. Calculate mode attribute according to optimize_function_for_size_p, TARGET_SSE_TYPELESS_STORES and TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL flags. (*sse2_movdquavxsizesuffix): Ditto. Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32}. The patch also fixes the testcase from the PR. Patch will be committed to mainline SVN. [1] http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44141#c16 Uros. Index: config/i386/sse.md === --- config/i386/sse.md (revision 187286) +++ config/i386/sse.md (working copy) @@ -449,8 +449,6 @@ (misaligned_operand (operands[0], MODEmode) || misaligned_operand (operands[1], MODEmode))) return vmovupd\t{%1, %0|%0, %1}; - else if (TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL) - return %vmovaps\t{%1, %0|%0, %1}; else return %vmovapd\t{%1, %0|%0, %1}; @@ -460,8 +458,6 @@ (misaligned_operand (operands[0], MODEmode) || misaligned_operand (operands[1], MODEmode))) return vmovdqu\t{%1, %0|%0, %1}; - else if (TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL) - return %vmovaps\t{%1, %0|%0, %1}; else return %vmovdqa\t{%1, %0|%0, %1}; @@ -475,19 +471,21 @@ [(set_attr type sselog1,ssemov,ssemov) (set_attr prefix maybe_vex) (set (attr mode) - (cond [(match_test TARGET_AVX) + (cond [(and (eq_attr alternative 1,2) + (match_test TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL)) +(if_then_else + (match_test GET_MODE_SIZE (MODEmode) 16) + (const_string V8SF) + (const_string V4SF)) + (match_test TARGET_AVX) (const_string sseinsnmode) - (ior (ior (match_test optimize_function_for_size_p (cfun)) -(not (match_test TARGET_SSE2))) + (ior (and (eq_attr alternative 1,2) +(match_test optimize_function_for_size_p (cfun))) (and (eq_attr alternative 2) (match_test TARGET_SSE_TYPELESS_STORES))) (const_string V4SF) - (eq (const_string MODEmode) (const_string V4SFmode)) -(const_string V4SF) - (eq (const_string MODEmode) (const_string V2DFmode)) -(const_string V2DF) ] - (const_string TI)))]) + (const_string sseinsnmode)))]) (define_insn sse2_movq128 [(set (match_operand:V2DI 0 register_operand =x) @@ -597,11 +595,33 @@ [(match_operand:VF 1 nonimmediate_operand xm,x)] UNSPEC_MOVU))] TARGET_SSE !(MEM_P (operands[0]) MEM_P (operands[1])) - %vmovussemodesuffix\t{%1, %0|%0, %1} +{ + switch (get_attr_mode (insn)) +{ +case MODE_V8SF: +case MODE_V4SF: + return %vmovups\t{%1, %0|%0, %1}; +default: + return %vmovussemodesuffix\t{%1, %0|%0, %1}; +} +} [(set_attr type ssemov) (set_attr movu 1) (set_attr prefix maybe_vex) - (set_attr mode MODE)]) + (set (attr mode) + (cond [(match_test TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL) +(if_then_else + (match_test GET_MODE_SIZE (MODEmode) 16) + (const_string V8SF) + (const_string V4SF)) + (match_test TARGET_AVX) +(const_string MODE) + (ior (match_test optimize_function_for_size_p (cfun)) + (and (eq_attr alternative 1) +(match_test
Re: [doc] Fix xref in extend.texi
On Wed, May 9, 2012 at 12:50 PM, Manuel López-Ibáñez lopeziba...@gmail.com wrote: It is strange that a paragraph talking about GCC pragmas refers to the section in CPP talking about #indent and other such directives (no #pragma there). Wouldn't it better if it pointed to the section about #pragmas? yes. patch ok. OK? 2012-05-09 Manuel López-Ibáñez m...@gcc.gnu.org * doc/extend.texi (Function Attributes): Point xref to section about Pragmas. Index: doc/extend.texi === --- doc/extend.texi (revision 187299) +++ doc/extend.texi (working copy) @@ -4102,8 +4102,7 @@ found convenient to use @code{__attribute__} to achieve a natural attachment of attributes to their corresponding declarations, whereas @code{#pragma GCC} is of use for constructs that do not naturally form -part of the grammar. @xref{Other Directives,,Miscellaneous -Preprocessing Directives, cpp, The GNU C Preprocessor}. +part of the grammar. @xref{Pragmas,,Pragmas Accepted by GCC}. @node Attribute Syntax @section Attribute Syntax
Re: [PATCH] Add -feliminate-malloc to enable/disable elimination of redundant malloc/free pairs
On Wed, May 9, 2012 at 6:32 PM, Xinliang David Li davi...@google.com wrote: On Wed, May 9, 2012 at 1:12 AM, Richard Guenther richard.guent...@gmail.com wrote: On Tue, May 8, 2012 at 6:18 PM, Xinliang David Li davi...@google.com wrote: To be clear, this flag is for malloc implementation (such as tcmalloc) with side effect unknown to the compiler. Using -fno-builtin-xxx is too conservative for that purpose. I don't think that flys. Btw, the patch also guards alloca - alloca is purely GCC internal. What's the unknown side-effects that are also important to preserve for free(malloc(4))? The side effect is the user registered malloc hooks. The pattern 'free(malloc(4)', or loops like for (..) { malloc(..); // result not used } itself is not interesting. They only appear in the test code and the problem can be worked around by using -fno-builtin-malloc. The option is to avoid disable this and all similar malloc optimizations (in the future) for malloc implementation with hooks. What is then interesting? The above are all the same as if optimization figured out the storage is not used, no? Richard. Thanks, David Richard. David On Tue, May 8, 2012 at 7:43 AM, Dehao Chen de...@google.com wrote: Hello, This patch adds a flag to guard the optimization that optimize the following code away: free (malloc (4)); In some cases, we'd like this type of malloc/free pairs to remain in the optimized code. Tested with bootstrap, and no regression in the gcc testsuite. Is it ok for mainline? Thanks, Dehao gcc/ChangeLog 2012-05-08 Dehao Chen de...@google.com * common.opt (feliminate-malloc): New. * doc/invoke.texi: Document it. * tree-ssa-dce.c (mark_stmt_if_obviously_necessary): Honor it. gcc/testsuite/ChangeLog 2012-05-08 Dehao Chen de...@google.com * gcc.dg/free-malloc.c: Check if -fno-eliminate-malloc is working as expected. Index: gcc/doc/invoke.texi === --- gcc/doc/invoke.texi (revision 187277) +++ gcc/doc/invoke.texi (working copy) @@ -360,7 +360,8 @@ -fcx-limited-range @gol -fdata-sections -fdce -fdelayed-branch @gol -fdelete-null-pointer-checks -fdevirtualize -fdse @gol --fearly-inlining -fipa-sra -fexpensive-optimizations -ffat-lto-objects @gol +-fearly-inlining -feliminate-malloc -fipa-sra -fexpensive-optimizations @gol +-ffat-lto-objects @gol -ffast-math -ffinite-math-only -ffloat-store -fexcess-precision=@var{style} @gol -fforward-propagate -ffp-contract=@var{style} -ffunction-sections @gol -fgcse -fgcse-after-reload -fgcse-las -fgcse-lm -fgraphite-identity @gol @@ -6238,6 +6239,7 @@ -fdefer-pop @gol -fdelayed-branch @gol -fdse @gol +-feliminate-malloc @gol -fguess-branch-probability @gol -fif-conversion2 @gol -fif-conversion @gol @@ -6762,6 +6764,11 @@ Perform dead store elimination (DSE) on RTL@. Enabled by default at @option{-O} and higher. +@item -feliminate-malloc +@opindex feliminate-malloc +Eliminate unnecessary malloc/free pairs. +Enabled by default at @option{-O} and higher. + @item -fif-conversion @opindex fif-conversion Attempt to transform conditional jumps into branch-less equivalents. This Index: gcc/testsuite/gcc.dg/free-malloc.c === --- gcc/testsuite/gcc.dg/free-malloc.c (revision 0) +++ gcc/testsuite/gcc.dg/free-malloc.c (revision 0) @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options -O2 -fno-eliminate-malloc } */ +/* { dg-final { scan-assembler-times malloc 2} } */ +/* { dg-final { scan-assembler-times free 2} } */ + +extern void * malloc (unsigned long); +extern void free (void *); + +void test () +{ + free (malloc (10)); +} Index: gcc/common.opt === --- gcc/common.opt (revision 187277) +++ gcc/common.opt (working copy) @@ -1474,6 +1474,10 @@ Common Var(flag_dce) Init(1) Optimization Use the RTL dead code elimination pass +feliminate-malloc +Common Var(flag_eliminate_malloc) Init(1) Optimization +Eliminate unnecessary malloc/free pairs + fdse Common Var(flag_dse) Init(1) Optimization Use the RTL dead store elimination pass Index: gcc/tree-ssa-dce.c === --- gcc/tree-ssa-dce.c (revision 187277) +++ gcc/tree-ssa-dce.c (working copy) @@ -309,6 +309,8 @@ case BUILT_IN_CALLOC: case BUILT_IN_ALLOCA: case BUILT_IN_ALLOCA_WITH_ALIGN: + if (!flag_eliminate_malloc) + mark_stmt_necessary (stmt, true); return; default:;
[PATCH, i386]: Some further TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL cleanups
Hello! Practically no functional change. 2012-05-09 Uros Bizjak ubiz...@gmail.com * config/i386/i386.c (*movdf_internal_rex64): Remove TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL handling from asm output code. Calculate mode attribute according to TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL flag. (*movdf_internal): Ditto. Tested on x86_64-pc-linux-gnu {,-m32}, committed to mainline SVN. Uros. Index: i386.md === --- i386.md (revision 187347) +++ i386.md (working copy) @@ -2953,8 +2953,7 @@ switch (get_attr_mode (insn)) { case MODE_V2DF: - if (!TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL) - return %vmovapd\t{%1, %0|%0, %1}; + return %vmovapd\t{%1, %0|%0, %1}; case MODE_V4SF: return %vmovaps\t{%1, %0|%0, %1}; @@ -3032,7 +3031,8 @@ movaps encodes one byte shorter. */ (eq_attr alternative 8) (cond - [(match_test optimize_function_for_size_p (cfun)) + [(ior (match_test TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL) +(match_test optimize_function_for_size_p (cfun))) (const_string V4SF) (match_test TARGET_SSE_PARTIAL_REG_DEPENDENCY) (const_string V2DF) @@ -3094,8 +3094,7 @@ switch (get_attr_mode (insn)) { case MODE_V2DF: - if (!TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL) - return %vmovapd\t{%1, %0|%0, %1}; + return %vmovapd\t{%1, %0|%0, %1}; case MODE_V4SF: return %vmovaps\t{%1, %0|%0, %1}; @@ -3167,7 +3166,8 @@ movaps encodes one byte shorter. */ (eq_attr alternative 6,10) (cond - [(match_test optimize_function_for_size_p (cfun)) + [(ior (match_test TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL) +(match_test optimize_function_for_size_p (cfun))) (const_string V4SF) (match_test TARGET_SSE_PARTIAL_REG_DEPENDENCY) (const_string V2DF)
[C++ Patch] PR 53158 (EXPR_LOC_OR_HERE version)
Hi again, thus the below is what I booted and tested. As you can see I had to tweak a bit an existing testcase, for which the location is now a bit smaller, 3 chars, in fact is still not perfect, but arguably a tad better. Otherwise no issues. Ah, I'm touching also a c-common.c function, in fact all the other warning* functions in the same file already use the EXPR_LOC_OR_HERE on the expr argument idea. Tested x86_64-linux. Thanks, Paolo. // /cp 2012-05-09 Paolo Carlini paolo.carl...@oracle.com PR c++/53158 * cvt.c (ocp_convert): Error out early for void - bool conversions. * typeck.c (decay_conversion): Use error_at. * call.c (build_integral_nontype_arg_conv, convert_like_real, convert_arg_to_ellipsis, perform_implicit_conversion_flags, initialize_reference): Likewise. * cvt.c (warn_ref_binding): Add location_t parameter. (cp_convert_to_pointer, convert_to_reference, ocp_convert, convert_to_void, ): Use error_at and warning_at. /c-family 2012-05-09 Paolo Carlini paolo.carl...@oracle.com PR c++/53158 * c-common.c (warnings_for_convert_and_check): Use warning_at. /testsuite 2012-05-09 Paolo Carlini paolo.carl...@oracle.com PR c++/53158 * g++.dg/cpp0x/lambda/lambda-err2.C: New. * g++.dg/parse/error26.C: Tweak dg-error column number. Index: c-family/c-common.c === --- c-family/c-common.c (revision 187335) +++ c-family/c-common.c (working copy) @@ -2329,6 +2329,8 @@ conversion_warning (tree type, tree expr) void warnings_for_convert_and_check (tree type, tree expr, tree result) { + location_t loc = EXPR_LOC_OR_HERE (expr); + if (TREE_CODE (expr) == INTEGER_CST (TREE_CODE (type) == INTEGER_TYPE || TREE_CODE (type) == ENUMERAL_TYPE) @@ -2344,8 +2346,8 @@ warnings_for_convert_and_check (tree type, tree ex /* This detects cases like converting -129 or 256 to unsigned char. */ if (!int_fits_type_p (expr, c_common_signed_type (type))) -warning (OPT_Woverflow, - large integer implicitly truncated to unsigned type); +warning_at (loc, OPT_Woverflow, + large integer implicitly truncated to unsigned type); else conversion_warning (type, expr); } @@ -2357,16 +2359,16 @@ warnings_for_convert_and_check (tree type, tree ex (TREE_CODE (TREE_TYPE (expr)) != INTEGER_TYPE || TYPE_PRECISION (TREE_TYPE (expr)) != TYPE_PRECISION (type))) - warning (OPT_Woverflow, -overflow in implicit constant conversion); + warning_at (loc, OPT_Woverflow, + overflow in implicit constant conversion); else conversion_warning (type, expr); } else if ((TREE_CODE (result) == INTEGER_CST || TREE_CODE (result) == FIXED_CST) TREE_OVERFLOW (result)) -warning (OPT_Woverflow, - overflow in implicit constant conversion); +warning_at (loc, OPT_Woverflow, + overflow in implicit constant conversion); else conversion_warning (type, expr); } Index: testsuite/g++.dg/parse/error26.C === --- testsuite/g++.dg/parse/error26.C(revision 187335) +++ testsuite/g++.dg/parse/error26.C(working copy) @@ -4,7 +4,7 @@ void foo() { if (({int c[2];})) ; // { dg-error 7:ISO C.. forbids 7 } - // { dg-error 20:could not convert 20 { target *-*-* } 6 } + // { dg-error 17:could not convert 17 { target *-*-* } 6 } } void bar() Index: testsuite/g++.dg/cpp0x/lambda/lambda-err2.C === --- testsuite/g++.dg/cpp0x/lambda/lambda-err2.C (revision 0) +++ testsuite/g++.dg/cpp0x/lambda/lambda-err2.C (revision 0) @@ -0,0 +1,12 @@ +// PR c++/53158 +// { dg-do compile { target c++11 } } + +int main() +{ + auto a = []() { return true; }; + auto b = []() { return a(); }; // { dg-error 'a' is not captured } + int c, d; + while (b() c d) // { dg-error could not convert } +{ +} +} Index: cp/typeck.c === --- cp/typeck.c (revision 187335) +++ cp/typeck.c (working copy) @@ -1822,6 +1822,7 @@ decay_conversion (tree exp, tsubst_flags_t complai { tree type; enum tree_code code; + location_t loc = EXPR_LOC_OR_HERE (exp); type = TREE_TYPE (exp); if (type == error_mark_node) @@ -1853,7 +1854,7 @@ decay_conversion (tree exp, tsubst_flags_t complai if (code == VOID_TYPE) { if (complain tf_error) - error (void value not ignored as it ought to be); + error_at (loc, void value not ignored as it ought to be); return error_mark_node; } if (invalid_nonstatic_memfn_p (exp, complain)) @@ -1882,7
Re: Symbol table 19/many: cleanup varpool/front-end/varasm interactions
Jan, This patch is causing a bootstrap failure on AIX. It generates linker errors that TOC symbols are not emitted: ld: 0711-317 ERROR: Undefined symbol: LC..0 ld: 0711-317 ERROR: Undefined symbol: LC..1 ld: 0711-317 ERROR: Undefined symbol: LC..2 ld: 0711-317 ERROR: Undefined symbol: LC..3 ld: 0711-317 ERROR: Undefined symbol: LC..4 Although pLinux uses the same code, GCC is able to bootstrap on Linux. The function in question is gcc/config/rs6000/rs6000.c:output_toc() Do you have any suggestions on what is missing in the label allocation that causes your cgraph change to believe the symbols are unreferenced? I have opened PR bootstrap/53300. Thanks, David
Re: [PATCH] Optimize byte_from_pos, pos_from_bit
This optimizes byte_from_pos and pos_from_bit by noting that we operate on sizes whose computations have no intermediate (or final) overflow. This is the single patch necessary to get Ada to bootstrap and test with TYPE_IS_SIZETYPE removed. Rather than amending size_binop (my original plan) I chose to optimize the above two commonly used accessors. Conveniently normalize_offset can be re-written to use pos_from_bit instead of inlinig it. I also took the liberty to document the functions (sic). Nice, thanks. Could you add a blurb, in the head comment of the first function in which you operate under the no-overflow assumption, stating this fact and why this is necessary (an explicit mention of Ada isn't forbidden ;-), as well as a cross-reference to it in the head comment of the other function(s). Any comments? Would you like different factoring of the optimization (I considered adding a byte_from_bitpos)? Any idea why byte_from_pos is using TRUNC_DIV_EXPR (only positive offsets?) and pos_from_bit FLOOR_DIV_EXPR (also negative offsets?) - that seems inconsistent, and we fold FLOOR_DIV_EXPR of unsigned types (sizetype) to TRUNC_DIV_EXPR anyways. I think that a bit position is treated as non-negative, so bitpos can use TRUNC_DIV_EXPR in byte_from_pos and you need to use FLOOR_MOD_EXPR to get a non-negative *pbitpos in pos_from_bit. Very likely obsolete now indeed. -- Eric Botcazou
Re: [PATCH] Remove TYPE_IS_SIZETYPE
This removes the TYPE_IS_SIZETYPE macro and all its uses (by assuming it returns zero and applying trivial folding). Sizes and bitsizes can still be treat specially by means of knowing what the values represent and by means of using helper functions that assume you are dealing with sizes (in particular size_binop and friends and bit_from_pos, byte_from_pos or pos_from_bit). Fine with me, if you add the blurb I talked about in the other reply. Bootstrapped and tested on x86_64-unknown-linux-gnu for all languages including Ada with the patch optimizing bute_from_pos and pos_from_bit Results on our internal testsuite are clean on x86-64 and almost clean on x86, an exception being: package t is type x (m : natural) is record s : string (1 .. m); r : natural; b : boolean; end record; for x'alignment use 4; pragma Pack (x); end t; Without the patches, compiling the package with -gnatR3 yields: Representation information for unit t (spec) for x'Object_Size use 17179869248; for x'Value_Size use ((#1 + 8) * 8) ; for x'Alignment use 4; for x use record m at 0 range 0 .. 30; s at 4 range 0 .. ((#1 * 8)) - 1; r at bit offset (((#1 + 4) * 8)) size in bits = 31 b at bit offset #1 + 7) * 8) + 7)) size in bits = 1 end record; With the patches, this yields: Representation information for unit t (spec) for x'Object_Size use 17179869248; for x'Value_Size use (((#1 + 7) + 1) * 8) ; for x'Alignment use 4; for x use record m at 0 range 0 .. 30; s at 4 range 0 .. ((#1 * 8)) - 1; r at bit offset (((#1 + 4) * 8)) size in bits = 31 b at bit offset #1 + 7) * 8) + 7)) size in bits = 1 end record; so we have lost a simple folding for x'Value_Size (TYPE_ADA_SIZE field). 2012-05-08 Richard Guenther rguent...@suse.de ada/ * gcc-interface/cuintp.c (UI_From_gnu): Remove TYPE_IS_SIZETYPE use. OK, modulo the formatting: Index: trunk/gcc/ada/gcc-interface/cuintp.c === *** trunk.orig/gcc/ada/gcc-interface/cuintp.c 2011-04-11 17:01:30.0 +0200 --- trunk/gcc/ada/gcc-interface/cuintp.c2012-05-07 16:43:43.497218058 +0200 *** UI_From_gnu (tree Input) *** 178,186 if (host_integerp (Input, 0)) return UI_From_Int (TREE_INT_CST_LOW (Input)); else if (TREE_INT_CST_HIGH (Input) 0 ! TYPE_UNSIGNED (gnu_type) ! !(TREE_CODE (gnu_type) == INTEGER_TYPE ! TYPE_IS_SIZETYPE (gnu_type))) return No_Uint; #endif --- 178,184 if (host_integerp (Input, 0)) return UI_From_Int (TREE_INT_CST_LOW (Input)); else if (TREE_INT_CST_HIGH (Input) 0 ! TYPE_UNSIGNED (gnu_type)) return No_Uint; #endif TYPE_UNSIGNED (gnu_type)) on the same line. -- Eric Botcazou
[PATCH, i386]: Fix PR 52908 - xop-mul-1:f9 miscompiled on bulldozer (-mxop)
Hello! Attached patch fixes PR 52908. There is no need to generate fake multiply instructions after reload, we can expand directly to MAC instructions. This approach even produced better assembly for a couple of testcases in gcc.target/i386 testsuite. 2012-05-09 Uros Bizjak ubiz...@gmail.com PR target/52908 * config/i386/sse.md (vec_widen_smult_hi_v4si): Expand using xop_pmacsdqh insn pattern instead of xop_mulv2div2di3_high. (vec_widen_smult_lo_v4si): Expand using xop_pmacsdql insn pattern instead of xop_mulv2div2di3_low. (xop_pmacsdql): Fix vec_select selector. (xop_pmacsdqh): Ditto. (xop_mulv2div2di3_low): Remove insn_and_split pattern. (xop_mulv2div2di3_high): Ditto. testsuite/ChangeLog: PR target/52908 * gcc.target/i386/xop-imul32widen-vector.c: Update scan-assembler directive to Scan for vpmuldq, not vpmacsdql. Bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32}, and tested on XOP target by Venkataramanan. Patch was committed to mainline SVN. The version, attached to the PR should be backported to other release branches, but another volunteer should do the backport, since I don't have access to XOP target. Also, please note that XOP horizontal add/subtract instructions (and possibly others) have vec_select parallel RTX in wrong endiannes. Uros. Index: config/i386/sse.md === --- config/i386/sse.md (revision 187347) +++ config/i386/sse.md (working copy) @@ -5748,11 +5748,15 @@ if (TARGET_XOP) { + rtx t3 = gen_reg_rtx (V2DImode); + emit_insn (gen_sse2_pshufd_1 (t1, op1, GEN_INT (0), GEN_INT (2), GEN_INT (1), GEN_INT (3))); emit_insn (gen_sse2_pshufd_1 (t2, op2, GEN_INT (0), GEN_INT (2), GEN_INT (1), GEN_INT (3))); - emit_insn (gen_xop_mulv2div2di3_high (operands[0], t1, t2)); + emit_move_insn (t3, CONST0_RTX (V2DImode)); + + emit_insn (gen_xop_pmacsdqh (operands[0], t1, t2, t3)); DONE; } @@ -5777,11 +5781,15 @@ if (TARGET_XOP) { + rtx t3 = gen_reg_rtx (V2DImode); + emit_insn (gen_sse2_pshufd_1 (t1, op1, GEN_INT (0), GEN_INT (2), GEN_INT (1), GEN_INT (3))); emit_insn (gen_sse2_pshufd_1 (t2, op2, GEN_INT (0), GEN_INT (2), GEN_INT (1), GEN_INT (3))); - emit_insn (gen_xop_mulv2div2di3_low (operands[0], t1, t2)); + emit_move_insn (t3, CONST0_RTX (V2DImode)); + + emit_insn (gen_xop_pmacsdql (operands[0], t1, t2, t3)); DONE; } @@ -9792,11 +9800,11 @@ (sign_extend:V2DI (vec_select:V2SI (match_operand:V4SI 1 nonimmediate_operand %x) - (parallel [(const_int 1) (const_int 3)]))) + (parallel [(const_int 0) (const_int 2)]))) (sign_extend:V2DI (vec_select:V2SI (match_operand:V4SI 2 nonimmediate_operand xm) - (parallel [(const_int 1) (const_int 3)] + (parallel [(const_int 0) (const_int 2)] (match_operand:V2DI 3 nonimmediate_operand x)))] TARGET_XOP vpmacsdql\t{%3, %2, %1, %0|%0, %1, %2, %3} @@ -9810,93 +9818,17 @@ (sign_extend:V2DI (vec_select:V2SI (match_operand:V4SI 1 nonimmediate_operand %x) - (parallel [(const_int 0) (const_int 2)]))) + (parallel [(const_int 1) (const_int 3)]))) (sign_extend:V2DI (vec_select:V2SI (match_operand:V4SI 2 nonimmediate_operand xm) - (parallel [(const_int 0) (const_int 2)] + (parallel [(const_int 1) (const_int 3)] (match_operand:V2DI 3 nonimmediate_operand x)))] TARGET_XOP vpmacsdqh\t{%3, %2, %1, %0|%0, %1, %2, %3} [(set_attr type ssemuladd) (set_attr mode TI)]) -;; We don't have a straight 32-bit parallel multiply and extend on XOP, so -;; fake it with a multiply/add. In general, we expect the define_split to -;; occur before register allocation, so we have to handle the corner case where -;; the target is the same as operands 1/2 -(define_insn_and_split xop_mulv2div2di3_low - [(set (match_operand:V2DI 0 register_operand =x) - (mult:V2DI - (sign_extend:V2DI - (vec_select:V2SI - (match_operand:V4SI 1 register_operand %x) - (parallel [(const_int 1) (const_int 3)]))) - (sign_extend:V2DI - (vec_select:V2SI - (match_operand:V4SI 2 nonimmediate_operand xm) - (parallel [(const_int 1) (const_int 3)])] - TARGET_XOP - # - reload_completed - [(set (match_dup 0) - (match_dup 3)) - (set (match_dup 0) - (plus:V2DI -(mult:V2DI - (sign_extend:V2DI - (vec_select:V2SI - (match_dup 1) - (parallel [(const_int 1) (const_int 3)]))) -
[patch] Fix LTO regression in Ada
Hi, this is a regression present on mainline and 4.7 branch. On the attached testcase, the compiler aborts in LTO mode with: eric@atlantis:~/build/gcc/native32 gcc/xgcc -Bgcc -S lto11.adb -O -flto +===GNAT BUG DETECTED==+ | 4.8.0 20120506 (experimental) [trunk revision 187216] (i586-suse-linux) | tree code 'call_expr' is not supported in LTO streams The problem is that the Ada compiler started to use DECL_ORIGINAL_TYPE in 4.7.x and the type in this field can have arbitrary expressions as TYPE_SIZE, for example expressions with CALL_EXPRs. Now the type is both not gimplified and streamed in LTO mode, so the CALL_EXPRs are sent to the streamer as-is. The immediate solution would be not to stream DECL_ORIGINAL_TYPE (and clear it in free_lang_data_in_decl), but this yields a regression in C++ with -flto -g (ICE in splice_child_die). Therefore, the patch implements the alternate solution of gimplifying DECL_ORIGINAL_TYPE. Bootstrapped/regtested on x86_64-suse-linux, OK for mainline and 4.7 branch? 2012-05-09 Eric Botcazou ebotca...@adacore.com * gimplify.c (gimplify_decl_expr): For a TYPE_DECL, gimplify the DECL_ORIGINAL_TYPE if it is present. 2012-05-09 Eric Botcazou ebotca...@adacore.com * gnat.dg/lto11.ad[sb]: New test. -- Eric Botcazou -- { dg-do compile } -- { dg-options -flto { target lto } } with Ada.Streams; use Ada.Streams; package body Lto11 is procedure Write (S : not null access Root_Stream_Type'Class; V : Vector) is subtype M_SEA is Stream_Element_Array (1 .. V'Size / Stream_Element'Size); Bytes : M_SEA; for Bytes'Address use V'Address; pragma Import (Ada, Bytes); begin Ada.Streams.Write (S.all, Bytes); end; end Lto11; with Ada.Streams; use Ada.Streams; package Lto11 is type Vector is array (Positive range ) of Float; procedure Write (S : not null access Root_Stream_Type'Class; V : Vector); end Lto11; diff --git a/gcc/gimplify.c b/gcc/gimplify.c index c4792e6..f69e773 100644 --- a/gcc/gimplify.c +++ b/gcc/gimplify.c @@ -1448,6 +1448,11 @@ gimplify_decl_expr (tree *stmt_p, gimple_seq *seq_p) !TYPE_SIZES_GIMPLIFIED (TREE_TYPE (decl))) gimplify_type_sizes (TREE_TYPE (decl), seq_p); + if (TREE_CODE (decl) == TYPE_DECL + DECL_ORIGINAL_TYPE (decl) + !TYPE_SIZES_GIMPLIFIED (DECL_ORIGINAL_TYPE (decl))) +gimplify_type_sizes (DECL_ORIGINAL_TYPE (decl), seq_p); + if (TREE_CODE (decl) == VAR_DECL !DECL_EXTERNAL (decl)) { tree init = DECL_INITIAL (decl);
Re: [RFC] PR 53063 encode group options in .opt files
On 9 May 2012 00:38, Joseph S. Myers jos...@codesourcery.com wrote: On Wed, 9 May 2012, Manuel López-Ibáñez wrote: which looks correct to me. However, the build fails because now options.h requires input.h which requires line-map.h, which is not included when building for example libgcc. options.h is included by tm.h, so it basically appears everywhere. Any suggestions how to fix this? options.h already has some #if !defined(IN_LIBGCC2) !defined(IN_TARGET_LIBS) !defined(IN_RTS) conditionals, so you could arrange for some more such conditionals to be generated. This works great for libgcc, however, ada's runtime library is compiled with IN_GCC and includes options.h through tm.h, so it breaks: ../../xgcc -B../../ -c -DIN_GCC -g -O2 -W -Wall \ -iquote /home/manuel/test2/src/gcc \ -iquote . -iquote .. -iquote ../.. -iquote /home/manuel/test2/src/gcc/ada -iquote /home/manuel/test2/src/gcc -I/home/manuel/test2/src/gcc/../include \ ../rts/targext.c -o targext.o In file included from ../../options.h:3716:0, from ../../tm.h:19, from ../rts/targext.c:50: /home/manuel/test2/src/gcc/input.h:25:22: fatal error: line-map.h: No such file or directory #include line-map.h ^ That tm.h includes the whole options.h is bad, but I don't want to mess with it. So I am instead only including input.h if options.h was not included from tm.h. This requires surprisingly few changes. The other major issue is that if -Wunused is enabled by -Wall, it has to be enabled by handle_generated_option, so I need to modify every front-end that does that . In the future, we should also generate per-FE _auto function, but I didn't want to make this patch even larger. Bootstrapped and regression tested. OK? 2012-05-09 Manuel López-Ibáñez m...@gcc.gnu.org PR 53063 gcc/ * doc/options.texi (EnabledBy): Document * opts.c: Include opts.h and options.h before tm.h. (finish_options): Do not handle some sub-options here... (common_handle_option): ... instead call common_handle_option_auto here. * optc-gen.awk: Handle EnabledBy. * opth-gen.awk: Declare common_handle_option_auto. * common.opt (Wuninitialized): Use EnabledBy. Delete Init. (Wmaybe-uninitialized): Likewise. (Wunused-but-set-variable): Likewise. (Wunused-function): Likewise. (Wunused-label): Likewise. (Wunused-value): Likewise. (Wunused-variable): Likewise. * opt-read.awk: Create opt_numbers array. ada/ * gcc-interface/misc.c (gnat_parse_file): Move before ... (gnat_handle_option): ... this. Use handle_generated_option. c-family/ * c-opts.c (c_common_handle_option): Use handle_generated_option to enable sub-options. fortran/ * options.c: Include diagnostics.h instead of diagnostics-core.h. (set_Wall): Do not see warn_unused here. (gfc_handle_option): Set it here using handle_generated_option. group-options-2.diff Description: Binary data
Re: Bug 53289 - unnecessary repetition of caret diagnostics
On Wed, May 9, 2012 at 4:02 PM, Manuel López-Ibáñez lopeziba...@gmail.com wrote: Simple enough. Bootstrapped and regression tested. The output for the example in the PR is now: /home/manuel/caret-overload.C:6:6: error: no matching function for call to ‘g(int)’ g(1); ^ /home/manuel/caret-overload.C:6:6: note: candidate is: /home/manuel/caret-overload.C:2:18: note: templateclass T typename T::type g(T) typename T::type g(T); ^ /home/manuel/caret-overload.C:2:18: note: template argument deduction/substitution failed: /home/manuel/caret-overload.C: In substitution of ‘templateclass T typename T::type g(T) [with T = int]’: /home/manuel/caret-overload.C:6:6: required from here /home/manuel/caret-overload.C:2:18: error: ‘int’ is not a class, struct, or union type OK? OK. Thanks, -- Gaby
Re: [PATCH] Add -feliminate-malloc to enable/disable elimination of redundant malloc/free pairs
On Wed, May 9, 2012 at 11:21 AM, Richard Guenther richard.guent...@gmail.com wrote: On Wed, May 9, 2012 at 6:32 PM, Xinliang David Li davi...@google.com wrote: On Wed, May 9, 2012 at 1:12 AM, Richard Guenther richard.guent...@gmail.com wrote: On Tue, May 8, 2012 at 6:18 PM, Xinliang David Li davi...@google.com wrote: To be clear, this flag is for malloc implementation (such as tcmalloc) with side effect unknown to the compiler. Using -fno-builtin-xxx is too conservative for that purpose. I don't think that flys. Btw, the patch also guards alloca - alloca is purely GCC internal. What's the unknown side-effects that are also important to preserve for free(malloc(4))? The side effect is the user registered malloc hooks. The pattern 'free(malloc(4)', or loops like for (..) { malloc(..); // result not used } itself is not interesting. They only appear in the test code and the problem can be worked around by using -fno-builtin-malloc. The option is to avoid disable this and all similar malloc optimizations (in the future) for malloc implementation with hooks. What is then interesting? The above are all the same as if optimization figured out the storage is not used, no? Compiler may choose to convert malloc call into alloca or coalesce multiple malloc calls -- but this may not always be the best choice given that more advanced locality based heap allocation is possible. Nothing like this exist yet, but the point is that it is convenient to have a way to keep malloc call while keeping its aliasing property. David Richard. Thanks, David Richard. David On Tue, May 8, 2012 at 7:43 AM, Dehao Chen de...@google.com wrote: Hello, This patch adds a flag to guard the optimization that optimize the following code away: free (malloc (4)); In some cases, we'd like this type of malloc/free pairs to remain in the optimized code. Tested with bootstrap, and no regression in the gcc testsuite. Is it ok for mainline? Thanks, Dehao gcc/ChangeLog 2012-05-08 Dehao Chen de...@google.com * common.opt (feliminate-malloc): New. * doc/invoke.texi: Document it. * tree-ssa-dce.c (mark_stmt_if_obviously_necessary): Honor it. gcc/testsuite/ChangeLog 2012-05-08 Dehao Chen de...@google.com * gcc.dg/free-malloc.c: Check if -fno-eliminate-malloc is working as expected. Index: gcc/doc/invoke.texi === --- gcc/doc/invoke.texi (revision 187277) +++ gcc/doc/invoke.texi (working copy) @@ -360,7 +360,8 @@ -fcx-limited-range @gol -fdata-sections -fdce -fdelayed-branch @gol -fdelete-null-pointer-checks -fdevirtualize -fdse @gol --fearly-inlining -fipa-sra -fexpensive-optimizations -ffat-lto-objects @gol +-fearly-inlining -feliminate-malloc -fipa-sra -fexpensive-optimizations @gol +-ffat-lto-objects @gol -ffast-math -ffinite-math-only -ffloat-store -fexcess-precision=@var{style} @gol -fforward-propagate -ffp-contract=@var{style} -ffunction-sections @gol -fgcse -fgcse-after-reload -fgcse-las -fgcse-lm -fgraphite-identity @gol @@ -6238,6 +6239,7 @@ -fdefer-pop @gol -fdelayed-branch @gol -fdse @gol +-feliminate-malloc @gol -fguess-branch-probability @gol -fif-conversion2 @gol -fif-conversion @gol @@ -6762,6 +6764,11 @@ Perform dead store elimination (DSE) on RTL@. Enabled by default at @option{-O} and higher. +@item -feliminate-malloc +@opindex feliminate-malloc +Eliminate unnecessary malloc/free pairs. +Enabled by default at @option{-O} and higher. + @item -fif-conversion @opindex fif-conversion Attempt to transform conditional jumps into branch-less equivalents. This Index: gcc/testsuite/gcc.dg/free-malloc.c === --- gcc/testsuite/gcc.dg/free-malloc.c (revision 0) +++ gcc/testsuite/gcc.dg/free-malloc.c (revision 0) @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options -O2 -fno-eliminate-malloc } */ +/* { dg-final { scan-assembler-times malloc 2} } */ +/* { dg-final { scan-assembler-times free 2} } */ + +extern void * malloc (unsigned long); +extern void free (void *); + +void test () +{ + free (malloc (10)); +} Index: gcc/common.opt === --- gcc/common.opt (revision 187277) +++ gcc/common.opt (working copy) @@ -1474,6 +1474,10 @@ Common Var(flag_dce) Init(1) Optimization Use the RTL dead code elimination pass +feliminate-malloc +Common Var(flag_eliminate_malloc) Init(1) Optimization +Eliminate unnecessary malloc/free pairs + fdse Common Var(flag_dse) Init(1) Optimization Use the RTL dead store elimination pass Index: gcc/tree-ssa-dce.c === --- gcc/tree-ssa-dce.c (revision 187277) +++ gcc/tree-ssa-dce.c (working copy) @@ -309,6 +309,8 @@ case
[C++ Patch] PR 53301
Hi, shame on me. I think the patch almost qualifies as obvious. Tested x86_64-linux. Thanks, Paolo. // /cp 2012-05-10 Paolo Carlini paolo.carl...@oracle.com PR c++/53301 * decl.c (check_default_argument): Fix typo (POINTER_TYPE_P instead of TYPE_PTR_P) in zero-as-null-pointer-constant warning. /testsuite 2012-05-10 Paolo Carlini paolo.carl...@oracle.com PR c++/53301 * g++.dg/warn/Wzero-as-null-pointer-constant-6.C: New. Index: testsuite/g++.dg/warn/Wzero-as-null-pointer-constant-6.C === --- testsuite/g++.dg/warn/Wzero-as-null-pointer-constant-6.C(revision 0) +++ testsuite/g++.dg/warn/Wzero-as-null-pointer-constant-6.C(revision 0) @@ -0,0 +1,6 @@ +// PR c++/53301 +// { dg-options -Wzero-as-null-pointer-constant } + +class x { public: x(int v) {} }; + +void foo(const x = 0); Index: cp/decl.c === --- cp/decl.c (revision 187335) +++ cp/decl.c (working copy) @@ -10619,7 +10619,7 @@ check_default_argument (tree decl, tree arg) if (warn_zero_as_null_pointer_constant c_inhibit_evaluation_warnings == 0 - (POINTER_TYPE_P (decl_type) || TYPE_PTR_TO_MEMBER_P (decl_type)) + (TYPE_PTR_P (decl_type) || TYPE_PTR_TO_MEMBER_P (decl_type)) null_ptr_cst_p (arg) !NULLPTR_TYPE_P (TREE_TYPE (arg))) {
Go patch committed: Add -fgo-pkgpath option
This patch to the Go frontend adds a new command line option: -fgo-pkgpath. This permits setting the string that is returned by the PkgPath() method of reflect.Type for types defined in the package being compiled. It also sets the prefix used for symbol names. This largely replaces the existing -fgo-prefix option, although I retained -fgo-prefix for backward compatibility. This will permit libgo to have PkgPath values that are the same as the ones returned by the other Go compiler. Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu. Committed to mainline and 4.7 branch. Ian 2012-05-09 Ian Lance Taylor i...@google.com * lang.opt: Add -fgo-pkgpath. * go-lang.c (go_pkgpath): New static variable. (go_prefix): New static variable. (go_langhook_init): Pass go_pkgpath and go_prefix to go_create_gogo. (go_langhook_handle_option): Handle -fgo-pkgpath. Change -fgo-prefix handling to just set go_prefix. * go-c.h (go_set_prefix): Don't declare. (go_create_gogo): Add pkgpath and prefix to declaration. * go-gcc.cc (Gcc_backend::global_variable): Change unique_prefix to pkgpath. Don't include the package name in the asm name. * gccgo.texi (Invoking gccgo): Document -fgo-pkgpath. Update the docs for -fgo-prefix. Index: gcc/go/lang.opt === --- gcc/go/lang.opt (revision 187020) +++ gcc/go/lang.opt (working copy) @@ -53,6 +53,10 @@ fgo-optimize- Go Joined RejectNegative -fgo-optimize-type Turn on optimization passes in the frontend +fgo-pkgpath= +Go Joined RejectNegative +-fgo-pkgpath=string Set Go package path + fgo-prefix= Go Joined RejectNegative -fgo-prefix=string Set package-specific prefix for exported Go names Index: gcc/go/go-gcc.cc === --- gcc/go/go-gcc.cc (revision 187020) +++ gcc/go/go-gcc.cc (working copy) @@ -271,7 +271,7 @@ class Gcc_backend : public Backend Bvariable* global_variable(const std::string package_name, - const std::string unique_prefix, + const std::string pkgpath, const std::string name, Btype* btype, bool is_external, @@ -1281,7 +1281,7 @@ Gcc_backend::non_zero_size_type(tree typ Bvariable* Gcc_backend::global_variable(const std::string package_name, - const std::string unique_prefix, + const std::string pkgpath, const std::string name, Btype* btype, bool is_external, @@ -1310,9 +1310,9 @@ Gcc_backend::global_variable(const std:: { TREE_PUBLIC(decl) = 1; - std::string asm_name(unique_prefix); + std::string asm_name(pkgpath); asm_name.push_back('.'); - asm_name.append(var_name); + asm_name.append(name); SET_DECL_ASSEMBLER_NAME(decl, get_identifier_from_string(asm_name)); } TREE_USED(decl) = 1; Index: gcc/go/gccgo.texi === --- gcc/go/gccgo.texi (revision 187020) +++ gcc/go/gccgo.texi (working copy) @@ -157,14 +157,32 @@ compile time. When linking, specify a library search directory, as with @command{gcc}. +@item -fgo-pkgpath=@var{string} +@cindex @option{-fgo-pkgpath} +Set the package path to use. This sets the value returned by the +PkgPath method of reflect.Type objects. It is also used for the names +of globally visible symbols. The argument to this option should +normally be the string that will be used to import this package after +it has been installed; in other words, a pathname within the +directories specified by the @option{-I} option. + @item -fgo-prefix=@var{string} @cindex @option{-fgo-prefix} +An alternative to @option{-fgo-pkgpath}. The argument will be +combined with the package name from the source file to produce the +package path. If @option{-fgo-pkgpath} is used, @option{-fgo-prefix} +will be ignored. + Go permits a single program to include more than one package with the -same name. This option is required to make this work with -@command{gccgo}. The argument to this option may be any string. Each -package with the same name must use a distinct @option{-fgo-prefix} -option. The argument is typically the full path under which the -package will be installed, as that must obviously be unique. +same name in the @code{package} clause in the source file, though +obviously the two packages must be imported using different pathnames. +In order for this to work with @command{gccgo}, either +@option{-fgo-pkgpath} or @option{-fgo-prefix} must be specified when +compiling a package. + +Using either @option{-fgo-pkgpath} or @option{-fgo-prefix} disables +the special treatment of the @code{main} package and permits that +package to be imported like any other. @item -frequire-return-statement @itemx -fno-require-return-statement Index: gcc/go/gofrontend/gogo.cc
Re: [PATCH] Add option for dumping to stderr (issue6190057)
Thanks for your suggestions/comments. I have updated the patch and documentation. It supports the following usage: gcc -fdump-tree-all=tree.dump -fdump-tree-pre=stdout -fdump-rtl-ira=ira.dump Here all tree dumps except the PRE are output into tree.dump, PRE dump goes to stdout and the IRA dump goes to ira.dump. Thanks, Sharad 2012-05-09 Sharad Singhai sing...@google.com * doc/invoke.texi: Add documentation for the new option. * tree-dump.c (dump_get_standard_stream): New function. (dump_files): Update for new field. (dump_switch_p_1): Handle dump filenames. (dump_begin): Likewise. (get_dump_file_name): Likewise. (dump_end): Remove attribute. (dump_enable_all): Add new parameter FILENAME. All callers updated. (enable_rtl_dump_file): * tree-pass.h (enum tree_dump_index): Add new constant. (struct dump_file_info): Add new field FILENAME. * testsuite/g++.dg/other/dump-filename-1.C: New test. Index: doc/invoke.texi === --- doc/invoke.texi (revision 187265) +++ doc/invoke.texi (working copy) @@ -5322,20 +5322,23 @@ Here are some examples showing uses of these optio @item -d@var{letters} @itemx -fdump-rtl-@var{pass} +@itemx -fdump-rtl-@var{pass}=@var{filename} @opindex d Says to make debugging dumps during compilation at times specified by @var{letters}. This is used for debugging the RTL-based passes of the compiler. The file names for most of the dumps are made by appending a pass number and a word to the @var{dumpname}, and the files are -created in the directory of the output file. Note that the pass -number is computed statically as passes get registered into the pass -manager. Thus the numbering is not related to the dynamic order of -execution of passes. In particular, a pass installed by a plugin -could have a number over 200 even if it executed quite early. -@var{dumpname} is generated from the name of the output file, if -explicitly specified and it is not an executable, otherwise it is the -basename of the source file. These switches may have different effects -when @option{-E} is used for preprocessing. +created in the directory of the output file. If the +@option{=@var{filename}} is appended to the longer form of the dump +option then the dump is done on that file instead of numbered +files. Note that the pass number is computed statically as passes get +registered into the pass manager. Thus the numbering is not related +to the dynamic order of execution of passes. In particular, a pass +installed by a plugin could have a number over 200 even if it executed +quite early. @var{dumpname} is generated from the name of the output +file, if explicitly specified and it is not an executable, otherwise +it is the basename of the source file. These switches may have +different effects when @option{-E} is used for preprocessing. Debug dumps can be enabled with a @option{-fdump-rtl} switch or some @option{-d} option @var{letters}. Here are the possible @@ -5719,15 +5722,18 @@ counters for each function compiled. @item -fdump-tree-@var{switch} @itemx -fdump-tree-@var{switch}-@var{options} +@itemx -fdump-tree-@var{switch}-@var{options}=@var{filename} @opindex fdump-tree Control the dumping at various stages of processing the intermediate language tree to a file. The file name is generated by appending a switch specific suffix to the source file name, and the file is -created in the same directory as the output file. If the -@samp{-@var{options}} form is used, @var{options} is a list of -@samp{-} separated options which control the details of the dump. Not -all options are applicable to all dumps; those that are not -meaningful are ignored. The following options are available +created in the same directory as the output file. In case of +@option{=@var{filename}} option, the dump is output on the given file +name instead. If the @samp{-@var{options}} form is used, +@var{options} is a list of @samp{-} separated options which control +the details or location of the dump. Not all options are applicable +to all dumps; those that are not meaningful are ignored. The +following options are available @table @samp @item address @@ -5765,9 +5771,46 @@ Enable showing the tree dump for each statement. Enable showing the EH region number holding each statement. @item scev Enable showing scalar evolution analysis details. +@item slim +Inhibit dumping of members of a scope or body of a function merely +because that scope has been reached. Only dump such items when they +are directly reachable by some other path. When dumping pretty-printed +trees, this option inhibits dumping the bodies of control structures. +@item =@var{filename} +Instead of using an auto generated dump file name, use the given file +name. The file names @file{stdout} and @file{stderr} are treated +specially and are considered
[h8300] increase dwarf address size
H8/300 cpus have a larger-than-64k address space, despite 16-bit pointers. OK to apply? Ok for 4.7 branch? See also http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48231 * config/h8300/h8300.h (DWARF2_ADDR_SIZE): Define as 4 bytes. Index: h8300.h === --- h8300.h (revision 187362) +++ h8300.h (working copy) @@ -126,12 +126,13 @@ extern const char * const *h8_reg_names; /* The return address is pushed on the stack. */ #define INCOMING_RETURN_ADDR_RTX gen_rtx_MEM (Pmode, gen_rtx_REG (Pmode, STACK_POINTER_REGNUM)) #define INCOMING_FRAME_SP_OFFSET (POINTER_SIZE / 8) #define DWARF_CIE_DATA_ALIGNMENT 2 +#define DWARF2_ADDR_SIZE 4 /* Define this if addresses of constant functions shouldn't be put through pseudo regs where they can be cse'd. Desirable on machines where ordinary constants are expensive but a CALL with constant address is cheap.
Re: [PATCH] Add option for dumping to stderr (issue6190057)
Bummer. I was thinking to reserve '=' for selective dumping: -fdump-tree-pre=func_list_regexp I guess this can be achieved via @ -fdump-tree-pre@func_list -fdump-tree-pre=file_name@func_list Another issue -- I don't think the current precedence rule is correct. Consider that -fopt-info=2 will be mapped to -fdump-tree-all-transform-verbose2=stderr -fdump-rtl-all-transform-verbose2=stderr then the current precedence rule will cause surprise when the following is used -fopt-info -fdump-tree-pre The PRE dump will be emitted to stderr which is not what user wants. In short, special streams should be treated as 'weak' the same way as your previous implementation. thanks, David On Wed, May 9, 2012 at 4:56 PM, Sharad Singhai sing...@google.com wrote: Thanks for your suggestions/comments. I have updated the patch and documentation. It supports the following usage: gcc -fdump-tree-all=tree.dump -fdump-tree-pre=stdout -fdump-rtl-ira=ira.dump Here all tree dumps except the PRE are output into tree.dump, PRE dump goes to stdout and the IRA dump goes to ira.dump. Thanks, Sharad 2012-05-09 Sharad Singhai sing...@google.com * doc/invoke.texi: Add documentation for the new option. * tree-dump.c (dump_get_standard_stream): New function. (dump_files): Update for new field. (dump_switch_p_1): Handle dump filenames. (dump_begin): Likewise. (get_dump_file_name): Likewise. (dump_end): Remove attribute. (dump_enable_all): Add new parameter FILENAME. All callers updated. (enable_rtl_dump_file): * tree-pass.h (enum tree_dump_index): Add new constant. (struct dump_file_info): Add new field FILENAME. * testsuite/g++.dg/other/dump-filename-1.C: New test. Index: doc/invoke.texi === --- doc/invoke.texi (revision 187265) +++ doc/invoke.texi (working copy) @@ -5322,20 +5322,23 @@ Here are some examples showing uses of these optio @item -d@var{letters} @itemx -fdump-rtl-@var{pass} +@itemx -fdump-rtl-@var{pass}=@var{filename} @opindex d Says to make debugging dumps during compilation at times specified by @var{letters}. This is used for debugging the RTL-based passes of the compiler. The file names for most of the dumps are made by appending a pass number and a word to the @var{dumpname}, and the files are -created in the directory of the output file. Note that the pass -number is computed statically as passes get registered into the pass -manager. Thus the numbering is not related to the dynamic order of -execution of passes. In particular, a pass installed by a plugin -could have a number over 200 even if it executed quite early. -@var{dumpname} is generated from the name of the output file, if -explicitly specified and it is not an executable, otherwise it is the -basename of the source file. These switches may have different effects -when @option{-E} is used for preprocessing. +created in the directory of the output file. If the +@option{=@var{filename}} is appended to the longer form of the dump +option then the dump is done on that file instead of numbered +files. Note that the pass number is computed statically as passes get +registered into the pass manager. Thus the numbering is not related +to the dynamic order of execution of passes. In particular, a pass +installed by a plugin could have a number over 200 even if it executed +quite early. @var{dumpname} is generated from the name of the output +file, if explicitly specified and it is not an executable, otherwise +it is the basename of the source file. These switches may have +different effects when @option{-E} is used for preprocessing. Debug dumps can be enabled with a @option{-fdump-rtl} switch or some @option{-d} option @var{letters}. Here are the possible @@ -5719,15 +5722,18 @@ counters for each function compiled. @item -fdump-tree-@var{switch} @itemx -fdump-tree-@var{switch}-@var{options} +@itemx -fdump-tree-@var{switch}-@var{options}=@var{filename} @opindex fdump-tree Control the dumping at various stages of processing the intermediate language tree to a file. The file name is generated by appending a switch specific suffix to the source file name, and the file is -created in the same directory as the output file. If the -@samp{-@var{options}} form is used, @var{options} is a list of -@samp{-} separated options which control the details of the dump. Not -all options are applicable to all dumps; those that are not -meaningful are ignored. The following options are available +created in the same directory as the output file. In case of +@option{=@var{filename}} option, the dump is output on the given file +name instead. If the @samp{-@var{options}} form is used, +@var{options} is a list of @samp{-} separated options which control +the details or location of the dump. Not all
[Patch / RFC] Improving more locations for binary expressions
Hi, I'm looking into making more progress on those locations, for another class of cases. Consider: struct T { }; T foo(); void bar(int a, int b) { if (foo() a b) ; } thus, in this case, we have a class type T, instead of void. The error message is: a.cc: In function ‘void bar(int, int)’: a.cc:7:20: error: no match for ‘operator’ (operand types are ‘T’ and ‘bool’) if (foo() a b) ^ a.cc:7:20: note: candidate is: a.cc:7:20: note: operator(bool, bool) built-in if (foo() a b) ^ a.cc:7:20: note: no known conversion for argument 1 from ‘T’ to ‘bool’ in case my message ends up garbled, the carets do not point to (column 13), two times point to b (column 20), which is obviously wrong. In other terms, all the columns are 20, all wrong. Now, a first step toward fixing this is improving on the work I did some days ago on cp_parser_binary_expression: make sure that the location we are eventually passing to build_x_binary_op is always, provably, the location of the operator. It seems to me that In order to achieve that, the right thing to do is always handling tree_type and loc (the location of the operator) together, per the attached patch_parser (the initialization of loc to input_location is just to prevent uninitialized warnings) Then, with patch_parser applied (it passes testing), we end up with a correct first caret for the testcase above, that for no match, but the second one remains wrong. What happens is that the location is correct up to the print_z_candidates call. It gets a correct location as argument, and prints the candidate is message with the right column, 13. Then it calls print_z_candidate without passing a location, which, in turn: - Prints the built-in message with input_location as location, which is 20, wrong. - Then calls print_conversion_rejection passing location_of (candidate-fn) which is again 20, again wrong. Now, if I change print_z_candidate to get the location_t from print_z_candidates and use it for both the last two outputs, I finally get: a.cc: In function ‘void bar(int, int)’: a.cc:7:13: error: no match for ‘operator’ (operand types are ‘T’ and ‘bool’) if (foo() a b) ^ a.cc:7:13: note: candidate is: a.cc:7:13: note: operator(bool, bool) built-in a.cc:7:13: note: no known conversion for argument 1 from ‘T’ to ‘bool’ again, in case the text is garbled, all the locations seems finally right to me, and, interestingly there is no caret at the end, which seems Ok. Now, what I suspect we are doing wrong here, is that we are not handling separately, in terms of locations, the *builtin* overloads, which should simply use always the location passed from upstream (as I did in my quick experiment) from all the other overloads, for which the current code seems fine. Seems right? The patch for this second half of the issue seems now also rather simple to me. Thanks, Paolo. /// Index: parser.c === --- parser.c(revision 187362) +++ parser.c(working copy) @@ -1621,6 +1621,8 @@ typedef struct cp_parser_expression_stack_entry enum tree_code tree_type; /* Precedence of the binary operation we are parsing. */ enum cp_parser_prec prec; + /* Location of the binary operation we are parsing. */ + location_t loc; } cp_parser_expression_stack_entry; /* The stack for storing partial expressions. We only need NUM_PREC_VALUES @@ -7277,7 +7279,7 @@ cp_parser_binary_expression (cp_parser* parser, bo cp_parser_expression_stack_entry *sp = stack[0]; tree lhs, rhs; cp_token *token; - location_t loc; + location_t loc = input_location; enum tree_code tree_type, lhs_type, rhs_type; enum cp_parser_prec new_prec, lookahead_prec; tree overload; @@ -7290,7 +7292,6 @@ cp_parser_binary_expression (cp_parser* parser, bo { /* Get an operator token. */ token = cp_lexer_peek_token (parser-lexer); - loc = token-location; if (warn_cxx0x_compat token-type == CPP_RSHIFT @@ -7320,6 +7321,7 @@ cp_parser_binary_expression (cp_parser* parser, bo get_rhs: tree_type = binops_by_token[token-type].tree_type; + loc = token-location; /* We used the operator token. */ cp_lexer_consume_token (parser-lexer); @@ -7349,6 +7351,7 @@ cp_parser_binary_expression (cp_parser* parser, bo stack overflows. */ sp-prec = prec; sp-tree_type = tree_type; + sp-loc = loc; sp-lhs = lhs; sp-lhs_type = lhs_type; sp++; @@ -7370,6 +7373,7 @@ cp_parser_binary_expression (cp_parser* parser, bo --sp; prec = sp-prec; tree_type = sp-tree_type; + loc = sp-loc; rhs = lhs; rhs_type = lhs_type; lhs = sp-lhs;
Re: [Patch / RFC] Improving more locations for binary expressions
Paolo Carlini paolo.carl...@oracle.com writes: in case my message ends up garbled, the carets do not point to (column 13), two times point to b (column 20), which is obviously wrong. In other terms, all the columns are 20, all wrong. The new caret support does seem to have revealed a bunch of places where the column info in error messages was pretty screwy... I guess nobody paid much attention to it before... :] Should these get reported as bugzilla bugs...? -miles -- The trouble with most people is that they think with their hopes or fears or wishes rather than with their minds. -- Will Durant