date:20161103

Re: [patch,testsuite] Support dg-require-effective-target label_offsets.

2016-11-03 Thread Senthil Kumar Selvaraj


Georg-Johann Lay writes:

> On 28.10.2016 17:58, Mike Stump wrote:
>> On Oct 27, 2016, at 3:16 AM, Georg-Johann Lay  wrote:
>>>
>>> Now imagine some arithmetic like & - &  This might result in
>>> one or two stub addresses, and difference between such addresses is a
>>> complete different thing than the difference between the original labels:
>>> The result might differ in absolute value and in sign, i.e. you can't even
>>> determine whether LAB1 or LAB2 comes first in the generated code as the
>>> order of stubs might differ from the order of respective labels.
>>
>> So, I think this all doesn't matter any.  Every address gs(LAB) fits in
>> 16-bits by definition, and every gs(LAB1) - gs(LAB2) fits into 16 bits and
>
> Yes, you are right.  Label differences could be implemented.  AFAIK there is 
> currently no activity by the Binutils folks to add respective support, so it 
> is 
> somewhat pointless to add that support to avr-gcc...
>
> Bottom line is that label offsets are not supported by avr BE, and the 
> intention of the patch is to reduce FAIL noise in testsuite results because 
> of 
> the missing support.
>
> If a dg-require is not appropriate, should I rather add dg-skip-if to every 
> test case?  I'd still prefer the dg-require solution because if the Binutils 
> will ever support label differences, then the testsuite will automatically 
> catch up with that.
>
>> thus is valid for all 16-bit one contexts.  The fact the order between the
>> stub and the actual code is different is irrelevant, it is a private
>
> Only if the code is not executed; there are several test cases that compute 
> relative placements of labels like:
>
> x(){if(&&<0)x();b:goto*&e:;}
>
> If a test ever relies on the fact that & - & tells anything about the 
> ordering of the labels, the code is about to crash.
>
>> implementation detail of the port, the point is the semantics are fixed and
>> constant and useful.  In deed that there is even a stub is a private
>> implementation detail of the port.  I think the `extra' helpful warning from
>> avr_print_operand_address is wrong and should be removed.  Think of the
>
> The following code simple makes absolutely no sense in the presence of stubs:
> int
> test (int foo)
> {
>static void *dummy[] = { &, & };
>goto *((char *) & - 2 * (foo < 0));
> a:
> b:
>return 0;
> }
>
> It's only a compile test (the original PR66123 is about ICE), but the 
> compiler 
> cannot know that it's just a crazy test case that won't be executed...
>
> When you are offsetting from a stub you might end up *anywhere*, even in some 
> data range.
>
>> label as gs(LAB), not LAB, burn LAB from your mind.  Once you do that, you
>> see you can't talk about the order LAB1 > LAB2, the concept doesn't make any
>> sense.  The _only_ think you can talk about is gs(LAB1) > gs(LAB2).  And
>> because of that, it is always consistent and works just fine.
>>
>> Once that misguided complains from gcc and bisutils are fixed, are their any
>> failing cases?
>
> State of matters is that Binutils support is missing, and if I understand you 
> correctly, dg-require is not appropriate to factor out availability of such 
> features?

I'll take a stab at adding the missing binutils support for avr label diffs.

Regards
Senthil

Re: [PATCH], Fix PR 77993, PowerPC bootstrap on 32-bit systems

2016-11-03 Thread Michael Meissner

Unfortunately, I committed an earlier version of the patch that was buggy.  I
have updated the trunk to include the proper version of the file rs6000.h that
does bootstrap.  Sorry about htat.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797

Re: [patch,testsuite] Support dg-require-effective-target label_offsets.

2016-11-03 Thread Mike Stump

On Nov 3, 2016, at 8:25 AM, Georg-Johann Lay  wrote:
> 
> On 28.10.2016 17:58, Mike Stump wrote:
>> On Oct 27, 2016, at 3:16 AM, Georg-Johann Lay  wrote:
>>> 
>>> Now imagine some arithmetic like & - &  This might result in
>>> one or two stub addresses, and difference between such addresses is a
>>> complete different thing than the difference between the original labels:
>>> The result might differ in absolute value and in sign, i.e. you can't even
>>> determine whether LAB1 or LAB2 comes first in the generated code as the
>>> order of stubs might differ from the order of respective labels.
>> 
>> So, I think this all doesn't matter any.  Every address gs(LAB) fits in
>> 16-bits by definition, and every gs(LAB1) - gs(LAB2) fits into 16 bits and
> 
> Yes, you are right.  Label differences could be implemented.  AFAIK there is 
> currently no activity by the Binutils folks to add respective support, so it 
> is somewhat pointless to add that support to avr-gcc...

[ pause, built an avr compiler ] So, I checked code-gen for 
gcc.c-torture/compile/labels-2.c and it looked fine:

ldi r24,lo8(gs(.L2))
ldi r25,hi8(gs(.L2))
std Y+2,r25
std Y+1,r24
ldi r18,lo8(gs(.L3))
ldi r19,hi8(gs(.L3))
ldi r24,lo8(gs(.L4))
ldi r25,hi8(gs(.L4))
mov r20,r18
mov r21,r19
sub r20,r24

So, apparently quite a lot of the code-gen is already wired up.  Since this 
generated code, I wasn't sure what error you're trying to hide?  I tried all 
the test cases your mentioned, none failed to produce code or failed to 
assemble that code with -mmcu=atmega2560 nor any other cpu I could find, so, 
trivially, I must not quite understand what doesn't work yet.

I did notice that gcc.c-torture/execute/pr70460.c produced:

.word   .L3-(.L2)

which seems wrong, given all the other code-gen.  I think this has to be 
gs(.L3)-(gs(.L2)), no?  I tried to get binutils to do that for me directly with 
a .word and it seemed resistant; which is unfortunate.

If you consider the above to be a code-gen bug, then you can do something like:

Index: config/avr/avr.c
===
--- config/avr/avr.c(revision 241837)
+++ config/avr/avr.c(working copy)
@@ -9422,6 +9422,21 @@
   x = plus_constant (Pmode, x, AVR_TINY_PM_OFFSET);
 }

+  /* We generate stubs using gs(), but binutils doesn't support that
+ here.  */
+  if (AVR_3_BYTE_PC
+  && GET_CODE (x) == MINUS
+  && GET_CODE (XEXP (x, 0)) == LABEL_REF
+  && GET_CODE (XEXP (x, 1)) == LABEL_REF)
+return false;
+  if (AVR_3_BYTE_PC
+  && GET_CODE (x) == MINUS
+  && GET_CODE (XEXP (x, 0)) == SUBREG
+  && GET_CODE (XEXP (XEXP (x, 0), 0)) == LABEL_REF
+  && GET_CODE (XEXP (x, 1)) == SUBREG
+  && GET_CODE (XEXP (XEXP (x, 1), 0)) == LABEL_REF)
+return false;
+
   return default_assemble_integer (x, size, aligned_p);
 }

to get the compiler to error out to prevent bad code-gen that binutils can't 
handle.  If you want, you can also declare by fiat that subtraction might be 
performed on the stub or the actual function, and so any code that does this 
isn't supported (runtime unspecified behavior), and then you just mark the 
execute testcase as bad (not supportable on avr, since avr blew the binutils 
support for gs).  The compile only test cases work fine now, so nothing needs 
to be done for them.

If you go for the above and make it a hard error, then a patch in the style of 
the original patch would be fine, but please just mark the ones that fail and 
please key of avr*, as this is an avr-only misfeature.  If avr did it right, 
they would complete gs() support so that gs(LAB)-gs(LAB) works.  gcc has a 
fetish for + and - working with labels and constants.

The completely different solution would be to never use gs(), and generate code 
that is 3 byte aware, but, I suspect you're unlikely to want to do that.

> Bottom line is that label offsets are not supported by avr BE, and the 
> intention of the patch is to reduce FAIL noise in testsuite results because 
> of the missing support.

So, what doesn't work?  I tried all the tests cases on a top of tree compiler, 
and nothing I did ever failed.

>> thus is valid for all 16-bit one contexts.  The fact the order between the
>> stub and the actual code is different is irrelevant, it is a private
> 
> Only if the code is not executed; there are several test cases that compute 
> relative placements of labels like:
> 
> x(){if(&&<0)x();b:goto*&e:;}
> 
> If a test ever relies on the fact that & - & tells anything about the 
> ordering of the labels, the code is about to crash.

No, I checked the code, it's perfectly fine, with the one exception that I 
think .L3 is used in places where gs(.L3) should be used.  The single bug I 
found is a port bug, and it wants to sometimes use gs() and sometimes not use 
gs().  You can't do that.  For the

Re: [Patch, fortran] PR54679 Erroneous "Expected P edit descriptor" in conjunction with L descriptor

2016-11-03 Thread Steve Kargl

On Sat, Oct 29, 2016 at 08:30:46PM -0700, Jerry DeLisle wrote:
> 
> OK for trunk?
> 
> 2016-10-29  Jerry DeLisle  
> 
>   PR fortran/54679
>   * io.c (check_format): Adjust checks for FMT_L to treat a zero
>   width as an extension, giving warnings or error as appropriate.
>   Improve messages.
> 
> 2016-10-24  Jerry DeLisle  
> 
>   PR fortran/54679
>   * io/format.c (parse_format_list): Adjust checks for FMT_L to
>   treat a zero width as an extension, giving warnings or error
>   as appropriate. Improve messages.

OK.

-- 
steve

Re: [PATCH, Fortran, v1] Restructure initialization of allocatable components

2016-11-03 Thread Steve Kargl

On Thu, Nov 03, 2016 at 02:16:48PM +0100, Andre Vehreschild wrote:
> 
> Bootstraps and regtests fine on x86_64-linux/F23. Ok for trunk?
> 
> @Dominique: Would you give it a go on your open patch collection? Maybe it
> fixes one PR, but I am not very hopeful, because the patch is merely removing
> complexity instead of doing new things.
> 

Andre,

I did not see anything that looked dubious.  I think
it is OK to commit, but you may want to see if Paul
has any comment.

-- 
Steve

Re: [PATCH] [ARC] Various small miscellaneous fixes.

2016-11-03 Thread Andrew Burgess

* Claudiu Zissulescu  [2016-11-01 16:28:34 
+0100]:

> This is an updated version of the patch that can be applied as is.
> 
> Ok to apply?
> Claudiu
> 
> gcc/
> 2016-05-09  Claudiu Zissulescu  
> 
>   * config/arc/arc.c (arc_process_double_reg_moves): Change.
>   * config/arc/arc.md (movsi_insn): Disable unsupported move
>   instructions for ARCv2 cores.
>   (movdi): Use prepare_move_operands.
>   (movsf, movdf): Use move_dest_operand predicate.
>   (arc_process_double_reg_moves): Change.

arc_process_double_reg_moves line is duplicated, and "Change" seems a
little vague, even by GCC/ChangeLog standards.

>   * config/arc/constraints.md (Chs): Enable when barrel shifter is
>   present.
>   * config/arc/fpu.md (divsf3): Change to divsf3_fpu.
>   * config/arc/fpx.md (dexcl_3op_peep2_insn): Dx data register is
>   also a destination.
>   (dexcl_3op_peep2_insn_nores): Likewise.
>   * config/arc/arc.h (SHIFT_COUNT_TRUNCATED): Define to one.
>   (LINK_COMMAND_SPEC): Remove.

All the rest looks good.

Thanks,
Andrew






> ---
>  gcc/config/arc/arc.c  |  5 +
>  gcc/config/arc/arc.h  | 27 +++
>  gcc/config/arc/arc.md | 35 +++
>  gcc/config/arc/constraints.md |  3 ++-
>  gcc/config/arc/fpu.md |  6 --
>  gcc/config/arc/fpx.md | 26 --
>  6 files changed, 41 insertions(+), 61 deletions(-)
> 
> diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
> index 0e7b63d..c927d5b 100644
> --- a/gcc/config/arc/arc.c
> +++ b/gcc/config/arc/arc.c
> @@ -9021,10 +9021,7 @@ arc_process_double_reg_moves (rtx *operands)
>rtx srcLow  = simplify_gen_subreg (SImode, src, DFmode,
>   TARGET_BIG_ENDIAN ? 4 : 0);
>  
> -  emit_insn (gen_rtx_UNSPEC_VOLATILE (Pmode,
> -   gen_rtvec (3, dest, srcHigh, srcLow),
> -   VUNSPEC_ARC_DEXCL_NORES));
> -
> +  emit_insn (gen_dexcl_2op (dest, srcHigh, srcLow));
>  }
>else
>  gcc_unreachable ();
> diff --git a/gcc/config/arc/arc.h b/gcc/config/arc/arc.h
> index b146f3a..17285a7 100644
> --- a/gcc/config/arc/arc.h
> +++ b/gcc/config/arc/arc.h
> @@ -128,24 +128,6 @@ along with GCC; see the file COPYING3.  If not see
>  %{!marclinux*: %{pg|p|profile:-marclinux_prof;: -marclinux}} 
> \
>  %{!z:-z max-page-size=0x2000 -z common-page-size=0x2000} \
>  %{shared:-shared}"
> -/* Like the standard LINK_COMMAND_SPEC, but add %G when building
> -   a shared library with -nostdlib, so that the hidden functions of libgcc
> -   will be incorporated.
> -   N.B., we don't want a plain -lgcc, as this would lead to re-exporting
> -   non-hidden functions, so we have to consider libgcc_s.so.* first, which in
> -   turn should be wrapped with --as-needed.  */
> -#define LINK_COMMAND_SPEC "\
> -%{!fsyntax-only:%{!c:%{!M:%{!MM:%{!E:%{!S:\
> -%(linker) %l " LINK_PIE_SPEC "%X %{o*} %{A} %{d} %{e*} %{m} %{N} %{n} 
> %{r}\
> -%{s} %{t} %{u*} %{x} %{z} %{Z} %{!A:%{!nostdlib:%{!nostartfiles:%S}}}\
> -%{static:} %{L*} %(mfwrap) %(link_libgcc) %o\
> -%{fopenacc|fopenmp|%:gt(%{ftree-parallelize-loops=*:%*} 1):\
> - %:include(libgomp.spec)%(link_gomp)}\
> -%(mflib)\
> -%{fprofile-arcs|fprofile-generate|coverage:-lgcov}\
> -%{!nostdlib:%{!nodefaultlibs:%(link_ssp) %(link_gcc_c_sequence)}}\
> -%{!A:%{!nostdlib:%{!nostartfiles:%E}}} %{T*} }}"
> -
>  #else
>  #define LINK_SPEC "%{mbig-endian:-EB} %{EB} %{EL}\
>%{pg|p:-marcelf_prof;mA7|mARC700|mcpu=arc700|mcpu=ARC700: -marcelf}"
> @@ -1570,13 +1552,10 @@ extern int arc_return_address_regs[4];
>  /* Undo the effects of the movmem pattern presence on STORE_BY_PIECES_P .  */
>  #define MOVE_RATIO(SPEED) ((SPEED) ? 15 : 3)
>  
> -/* Define this to be nonzero if shift instructions ignore all but the 
> low-order
> -   few bits. Changed from 1 to 0 for rotate pattern testcases
> -   (e.g. 20020226-1.c). This change truncates the upper 27 bits of a word
> -   while rotating a word. Came to notice through a combine phase
> -   optimization viz. a << (32-b) is equivalent to a << (-b).
> +/* Define this to be nonzero if shift instructions ignore all but the
> +   low-order few bits.
>  */
> -#define SHIFT_COUNT_TRUNCATED 0
> +#define SHIFT_COUNT_TRUNCATED 1
>  
>  /* Value is 1 if truncating an integer of INPREC bits to OUTPREC bits
> is done just by pretending it is already truncated.  */
> diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
> index e127d5b..7147fbd 100644
> --- a/gcc/config/arc/arc.md
> +++ b/gcc/config/arc/arc.md
> @@ -704,9 +704,9 @@
>  ; the iscompact attribute allows the epilogue expander to know for which
>  ; insns it should lengthen the return insn.
>  ; N.B. operand 1 of alternative 7 expands into

Re: [PATCH], PR 78192, Fix PowerPC ISA 3.0 xxextractaw/vextractu{b,h} on little endian

2016-11-03 Thread Segher Boessenkool

On Thu, Nov 03, 2016 at 06:22:11PM -0400, Michael Meissner wrote:
> Aaron has been running tests on the simulator, and some of the tests fails on
> little endian systems.  The failing tests do int extracts from a V4SImode
> vector.  In looking at the code, the vector index was adjusted when the low
> level extract instruction was created, and then adjusted again within the
> insn.  This patch removes the second adjustment.
> 
> I have done bootstraps and make check on both big endian and little endian
> power8 systems with no regressions.  I have verrified that the tests now pass
> in the simulator for both little and big endian targets.  Can I install this
> patch on the trunk?

Certainly!  Thanks for the fix.  One nit, see below.


Segher


> 2016-11-03  Michael Meissner  
> 
>   PR target/78192
>   * config/rs6000/vsx.md (vsx_extract__di): The element number
>   has already been adjusted for endianess, so don't adjust it any
>   further.

Three "n"s in endianness (another instance in the patch itself).

Re: [PATCH] [ARC] New option handling, refurbish multilib support.

2016-11-03 Thread Andrew Burgess

* Claudiu Zissulescu  [2016-10-31 16:46:17 
+0100]:

> Please find the updated patch.
> 
> What is new:
> - The .def files are having a comment block on how to add new lines.
> - The arc_seen_option is not used.
> - The arc_cpu* variables are not used.
> 
> Please let me know if I miss something,

Claudiu,

Thanks for the refresh on this patch, I think that it's looking really
great.  I just had a couple of issues that I think would be worth
addressing before we merge this.

In this hunk:

> diff --git a/gcc/config/arc/arc.opt b/gcc/config/arc/arc.opt
> index 4caf366..20a526b 100644
> --- a/gcc/config/arc/arc.opt
> +++ b/gcc/config/arc/arc.opt
> @@ -53,9 +53,12 @@ mARC700
>  Target Report
>  Same as -mA7.
>  
> +TargetVariable
> +int arc_mpy_option = DEFAULT_arc_mpy_option
> +
>  mmpy-option=
> -Target RejectNegative Joined UInteger Var(arc_mpy_option) Init(2)
> --mmpy-option={0,1,2,3,4,5,6,7,8,9} Compile ARCv2 code with a multiplier 
> design option.  Option 2 is default on.
> +Target RejectNegative Joined
> +-mmpy-option=MPY Compile ARCv2 code with a multiplier design option.
>  
>  mdiv-rem
>  Target Report Mask(DIVREM)
> @@ -100,7 +103,7 @@ Target Report Mask(MUL64_SET)

you create the TargetVariable arc_mpy_option, however, I think it
would be neater to fold this variable into the mmpy-option as it was
before, but, changing the type to Enum.  This would allow the big
option checking switch to be removed from arc-common.c, which I think
is a win overall.

I wasn't 100% certain that the above would actually be possible, so I
put together a prototype, I've included the patch below, that applies
on top of your patch.  Hopefully you'll agree that it's a nice clean
up.

My second question was with this hunk:

> diff --git a/gcc/config/arc/arc-opts.h b/gcc/config/arc/arc-opts.h
> index cbd7898..81446b4 100644
> --- a/gcc/config/arc/arc-opts.h
> +++ b/gcc/config/arc/arc-opts.h
> @@ -48,3 +49,35 @@ enum processor_type
>  /* Double precision floating point assist operations.  */
>  #define FPX_DP0x0100
>  
> +/* fpus option combi.  */
> +#define FPU_FPUS  (FPU_SP | FPU_SC)
> +/* fpud option combi.  */
> +#define FPU_FPUD  (FPU_SP | FPU_SC | FPU_DP | FPU_DC)
> +/* fpuda option combi.  */
> +#define FPU_FPUDA (FPU_SP | FPU_SC | FPX_DP)
> +/* fpuda_div option combi.  */
> +#define FPU_FPUDA_DIV (FPU_SP | FPU_SC | FPU_SD | FPX_DP)
> +/* fpuda_fma option combi.  */
> +#define FPU_FPUDA_FMA (FPU_SP | FPU_SC | FPU_SF | FPX_DP)
> +/* fpuda_all option combi.  */
> +#define FPU_FPUDA_ALL (FPU_SP | FPU_SC | FPU_SF | FPU_SD | FPX_DP)
> +/* fpus_div option combi.  */
> +#define FPU_FPUS_DIV  (FPU_SP | FPU_SC | FPU_SD)
> +/* fpus_fma option combi.  */
> +#define FPU_FPUS_FMA  (FPU_SP | FPU_SC | FPU_SF)
> +/* fpus_all option combi.  */
> +#define FPU_FPUS_ALL  (FPU_SP | FPU_SC | FPU_SF | FPU_SD)
> +/* fpud_div option combi.  */
> +#define FPU_FPUD_DIV  (FPU_FPUS_DIV | FPU_DP | FPU_DC | FPU_DD)
> +/* fpud_fma option combi.  */
> +#define FPU_FPUD_FMA  (FPU_FPUS_FMA | FPU_DP | FPU_DC | FPU_DF)
> +/* fpud_all option combi.  */
> +#define FPU_FPUD_ALL  (FPU_FPUS_ALL | FPU_DP | FPU_DC | FPU_DF | FPU_DD)
> +
> +/* Default FPU option value.  */
> +#define DEFAULT_arc_fpu_build 0x1000
> +
> +/* Default MPY option value.  */
> +#define DEFAULT_arc_mpy_option -1
> +
> +#endif /* ARC_OPTS_H */

I wonder where the vale 0x1000 comes from, what's the significance
of it.  I could ask the same question about the magic -1 constant, but
it's rather more obvious that -1 is just a no-value-selected magic
number.  I guess my questions for 0x1000 are why this specific
value?  What does it mean?

My final question concerns:

> diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
> index 5e8d6b4..8810e91 100644
> --- a/gcc/config/arc/arc.c
> +++ b/gcc/config/arc/arc.c
> @@ -853,7 +782,86 @@ static void
>  arc_override_options (void)
>  {
>if (arc_cpu == PROCESSOR_NONE)
> -arc_cpu = PROCESSOR_ARC700;
> +arc_cpu = TARGET_CPU_DEFAULT;
> +
> +  /* Set the default cpu options.  */
> +  arc_selected_cpu = _cpu_types[(int) arc_cpu];
> +  arc_selected_arch = _arch_types[(int) arc_selected_cpu->arch];
> +  arc_base_cpu = arc_selected_arch->arch;
> +
> +  /* Set the architectures.  */
> +  switch (arc_selected_arch->arch)
> +{
> +case BASE_ARCH_em:
> +  arc_cpu_string = "EM";
> +  break;
> +case BASE_ARCH_hs:
> +  arc_cpu_string = "HS";
> +  break;
> +case BASE_ARCH_700:
> +  arc_cpu_string = "ARC700";
> +  break;
> +case BASE_ARCH_6xx:
> +  arc_cpu_string = "ARC600";
> +  break;
> +default:
> +  gcc_unreachable ();
> +}
> +
> +  /* Set cpu flags accordingly to architecture/selected cpu.  The cpu
> + specific flags are set in arc-common.c.  The architecture forces
> + the default hardware configurations in, regardless what command
> + line options are saying.  The CPU optional hw options can be
> + turned on or off.

[PATCH], PR 78192, Fix PowerPC ISA 3.0 xxextractaw/vextractu{b,h} on little endian

2016-11-03 Thread Michael Meissner

Aaron has been running tests on the simulator, and some of the tests fails on
little endian systems.  The failing tests do int extracts from a V4SImode
vector.  In looking at the code, the vector index was adjusted when the low
level extract instruction was created, and then adjusted again within the
insn.  This patch removes the second adjustment.

I have done bootstraps and make check on both big endian and little endian
power8 systems with no regressions.  I have verrified that the tests now pass
in the simulator for both little and big endian targets.  Can I install this
patch on the trunk?

2016-11-03  Michael Meissner  

PR target/78192
* config/rs6000/vsx.md (vsx_extract__di): The element number
has already been adjusted for endianess, so don't adjust it any
further.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/vsx.md
===
--- gcc/config/rs6000/vsx.md(revision 241828)
+++ gcc/config/rs6000/vsx.md(working copy)
@@ -2586,11 +2586,10 @@ (define_insn  "vsx_extract__di"
  (parallel [(match_operand:QI 2 "" "n")]]
   "VECTOR_MEM_VSX_P (mode) && TARGET_VEXTRACTUB"
 {
-  int element = INTVAL (operands[2]);
+  /* Note, the element number has already been adjusted for endianess, so we
+ don't have to adjust it here.  */
   int unit_size = GET_MODE_UNIT_SIZE (mode);
-  int offset = ((VECTOR_ELT_ORDER_BIG)
-   ? unit_size * element
-   : unit_size * (GET_MODE_NUNITS (mode) - 1 - element));
+  HOST_WIDE_INT offset = unit_size * INTVAL (operands[2]);
 
   operands[2] = GEN_INT (offset);
   if (unit_size == 4)

[PATCH] Fix wrong declarations of builtin functions

2016-11-03 Thread Bernd Edlinger

Hi,

I thought in preparation of the new C++ warning about wrong
declarations of builtin functions it would be good to fix some of the
more obvious bugs in the testsuite first.


Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
Is it OK for trunk?


Thanks
Bernd.
2016-11-03  Bernd Edlinger  

	PR c++/71973
	* g++.dg/cpp1y/lambda-generic-udt.C: Fix builtin function declaration.
	* g++.dg/init/new15.C: Likewise.
	* g++.dg/ipa/inline-1.C: Likewise.
	* g++.dg/ipa/inline-2.C: Likewise.
	* g++.dg/lto/20080908-1_0.C: Likewise.
	* g++.dg/tc1/dr20.C: Likewise.
	* g++.dg/tree-ssa/inline-1.C: Likewise.
	* g++.dg/tree-ssa/inline-2.C: Likewise.
	* g++.old-deja/g++.law/except1.C: Likewise.
	* g++.old-deja/g++.other/vbase5.C: Likewise.
	* obj-c++.dg/lto/trivial-1_0.mm: Likewise.

Index: gcc/testsuite/g++.dg/cpp1y/lambda-generic-udt.C
===
--- gcc/testsuite/g++.dg/cpp1y/lambda-generic-udt.C	(revision 241831)
+++ gcc/testsuite/g++.dg/cpp1y/lambda-generic-udt.C	(working copy)
@@ -14,7 +14,7 @@
   bool shadow = false;
 };
 
-extern "C" void printf(...);
+extern "C" int printf(const char*, ...);
 #define assert(e) if (e); else \
 		 printf ("%s:%d: !(%s)\n", __FILE__, __LINE__, #e), __builtin_abort ();
 
Index: gcc/testsuite/g++.dg/init/new15.C
===
--- gcc/testsuite/g++.dg/init/new15.C	(revision 241831)
+++ gcc/testsuite/g++.dg/init/new15.C	(working copy)
@@ -1,6 +1,6 @@
 // PR c++/9782
 
-extern "C" void printf(const char*, ...);
+extern "C" int printf(const char*, ...);
 
 template 
 struct A {
Index: gcc/testsuite/g++.dg/ipa/inline-1.C
===
--- gcc/testsuite/g++.dg/ipa/inline-1.C	(revision 241831)
+++ gcc/testsuite/g++.dg/ipa/inline-1.C	(working copy)
@@ -3,7 +3,7 @@
 /* { dg-add-options bind_pic_locally } */
 
 namespace std {
-  extern "C" void puts(const char *s);
+  extern "C" int puts(const char *s);
 }
 
 template  void
Index: gcc/testsuite/g++.dg/ipa/inline-2.C
===
--- gcc/testsuite/g++.dg/ipa/inline-2.C	(revision 241831)
+++ gcc/testsuite/g++.dg/ipa/inline-2.C	(working copy)
@@ -3,7 +3,7 @@
 /* { dg-add-options bind_pic_locally } */
 
 namespace std {
-  extern "C" void puts(const char *s);
+  extern "C" int puts(const char *s);
 }
 
 template  void
Index: gcc/testsuite/g++.dg/lto/20080908-1_0.C
===
--- gcc/testsuite/g++.dg/lto/20080908-1_0.C	(revision 241831)
+++ gcc/testsuite/g++.dg/lto/20080908-1_0.C	(working copy)
@@ -1,5 +1,5 @@
 /* { dg-lto-do run }  */
-extern "C" { extern void *memcpy (void *, const void *, unsigned); }
+extern "C" { extern void *memcpy (void *, const void *, __SIZE_TYPE__); }
 
 inline int
 bci (const float )
Index: gcc/testsuite/g++.dg/tc1/dr20.C
===
--- gcc/testsuite/g++.dg/tc1/dr20.C	(revision 241831)
+++ gcc/testsuite/g++.dg/tc1/dr20.C	(working copy)
@@ -2,7 +2,7 @@
 // Origin: Giovanni Bajo 
 // DR20: Some clarifications needed for 12.8 para 15 
 
-extern "C" void printf(const char*, ...);
+extern "C" int printf(const char*, ...);
 extern "C" void abort(void);
 
 int count = 0;
Index: gcc/testsuite/g++.dg/tree-ssa/inline-1.C
===
--- gcc/testsuite/g++.dg/tree-ssa/inline-1.C	(revision 241831)
+++ gcc/testsuite/g++.dg/tree-ssa/inline-1.C	(working copy)
@@ -3,7 +3,7 @@
 /* { dg-add-options bind_pic_locally } */
 
 namespace std {
-  extern "C" void puts(const char *s);
+  extern "C" int puts(const char *s);
 }
 
 template  void
Index: gcc/testsuite/g++.dg/tree-ssa/inline-2.C
===
--- gcc/testsuite/g++.dg/tree-ssa/inline-2.C	(revision 241831)
+++ gcc/testsuite/g++.dg/tree-ssa/inline-2.C	(working copy)
@@ -3,7 +3,7 @@
 /* { dg-add-options bind_pic_locally } */
 
 namespace std {
-  extern "C" void puts(const char *s);
+  extern "C" int puts(const char *s);
 }
 
 template  void
Index: gcc/testsuite/g++.old-deja/g++.law/except1.C
===
--- gcc/testsuite/g++.old-deja/g++.law/except1.C	(revision 241831)
+++ gcc/testsuite/g++.old-deja/g++.law/except1.C	(working copy)
@@ -7,7 +7,7 @@
 // Subject: Bugs
 // Date: Wed, 22 Jul 92 08:29:30 EDT
 
-extern "C" void puts(const char *);
+extern "C" int puts(const char *);
 
 class foo {
 public:
Index: gcc/testsuite/g++.old-deja/g++.other/vbase5.C
===
--- gcc/testsuite/g++.old-deja/g++.other/vbase5.C	(revision 241831)
+++ gcc/testsuite/g++.old-deja/g++.other/vbase5.C	(working copy)
@@ -6,7 +6,7 @@
 // vbases. Normally that's just a pessimization, unfortunately during
 // constructoring it leads to

Re: [PATCH] Generate reproducible output independently of the build-path

2016-11-03 Thread Mike Stump

On Nov 3, 2016, at 1:01 PM, Ximin Luo  wrote:
> Log snippets attached.

Ick.  You missed the utility of contrib/compare_tests.  :-)

If you say:

  contrib/compare_tests oldbuilddir newbuilddir

You can then sit back and see everything as you'd expect and want.  The output 
is priority sorted and usually around 0 lines line or so.

Re: [PATCH], Fix PR 77993, PowerPC bootstrap on 32-bit systems

2016-11-03 Thread Segher Boessenkool

On Thu, Nov 03, 2016 at 04:05:59PM -0400, Michael Meissner wrote:
> --- gcc/config/rs6000/rs6000.md   (revision 241715)
> +++ gcc/config/rs6000/rs6000.md   (working copy)
> @@ -376,7 +376,7 @@
>(TF "TARGET_HARD_FLOAT
> && (TARGET_FPRS || TARGET_E500_DOUBLE)
> && TARGET_LONG_DOUBLE_128")
> -  (IF "TARGET_LONG_DOUBLE_128")
> +  (IF "TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_LONG_DOUBLE_128")
>(KF "TARGET_FLOAT128_TYPE")
>(DD "TARGET_DFP")
>(TD "TARGET_DFP")])

You might want to use FLOAT128_IBM_P here too, like you do below:

> @@ -398,7 +398,7 @@
>  (define_mode_iterator FMOVE64 [DF DD])
>  (define_mode_iterator FMOVE64X [DI DF DD])
>  (define_mode_iterator FMOVE128 [(TF "TARGET_LONG_DOUBLE_128")
> - (IF "TARGET_LONG_DOUBLE_128")
> + (IF "FLOAT128_IBM_P (IFmode)")
>   (TD "TARGET_HARD_FLOAT && TARGET_FPRS")])
>  
>  (define_mode_iterator FMOVE128_FPR [(TF "FLOAT128_2REG_P (TFmode)")

Okay for trunk with or without that.

Thanks!


Segher

[patch] Clean up WORD_REGISTER_OPERATIONS & LOAD_EXTEND_OP tests

2016-11-03 Thread Eric Botcazou

Hi,

WORD_REGISTER_OPERATIONS and LOAD_EXTEND_OP are partially used directly as 
preprocessor macros and partially tested in the code.  This patch brings a bit 
of consistency into this by converting the remaining macro cases.

Tested on SPARC/Solaris and x86-64/Linux, OK for the mainline?


2016-11-03  Eric Botcazou  

* defaults.h (LOAD_EXTEND_OP): Define if not already defined.
* combine.c (LOAD_EXTEND_OP): Delete.
(simplify_comparison): Fix comment on LOAD_EXTEND_OP.
* cse.c (LOAD_EXTEND_OP): Delete.
* fold-const.c (LOAD_EXTEND_OP): Likewise.
* fwprop.c (free_load_extend): Remove #ifdef LOAD_EXTEND_OP/#endif.
* postreload.c (LOAD_EXTEND_OP): Delete.
* reload.c (push_reload): Remove #ifdef LOAD_EXTEND_OP/#endif.
Convert conditional compilation based on WORD_REGISTER_OPERATIONS.
(find_reloads): Likewise.
* reload1.c (eliminate_regs_1): Likewise.
* rtlanal.c (nonzero_bits1): Remove #ifdef LOAD_EXTEND_OP/#endif.
(num_sign_bit_copies1): Likewise.

-- 
Eric BotcazouIndex: combine.c
===
--- combine.c	(revision 241808)
+++ combine.c	(working copy)
@@ -104,10 +104,6 @@ along with GCC; see the file COPYING3.
 #include "rtl-iter.h"
 #include "print-rtl.h"
 
-#ifndef LOAD_EXTEND_OP
-#define LOAD_EXTEND_OP(M) UNKNOWN
-#endif
-
 /* Number of attempts to combine instructions in this function.  */
 
 static int combine_attempts;
@@ -12462,14 +12458,14 @@ simplify_comparison (enum rtx_code code,
  care bits and we can assume they have any convenient value.  So
  making the transformation is safe.
 
- 2. SUBREG_REG (op0) is a memory and LOAD_EXTEND_OP is not defined.
+ 2. SUBREG_REG (op0) is a memory and LOAD_EXTEND_OP is UNKNOWN.
  In this case the upper bits of op0 are undefined.  We should not make
  the simplification in that case as we do not know the contents of
  those bits.
 
- 3. SUBREG_REG (op0) is a memory and LOAD_EXTEND_OP is defined and not
- UNKNOWN.  In that case we know those bits are zeros or ones.  We must
- also be sure that they are the same as the upper bits of op1.
+ 3. SUBREG_REG (op0) is a memory and LOAD_EXTEND_OP is not UNKNOWN.
+ In that case we know those bits are zeros or ones.  We must also be
+ sure that they are the same as the upper bits of op1.
 
  We can never remove a SUBREG for a non-equality comparison because
  the sign bit is in a different place in the underlying object.  */
Index: cse.c
===
--- cse.c	(revision 241808)
+++ cse.c	(working copy)
@@ -43,10 +43,6 @@ along with GCC; see the file COPYING3.
 #include "dbgcnt.h"
 #include "rtl-iter.h"
 
-#ifndef LOAD_EXTEND_OP
-#define LOAD_EXTEND_OP(M) UNKNOWN
-#endif
-
 /* The basic idea of common subexpression elimination is to go
through the code, keeping a record of expressions that would
have the same value at the current scan point, and replacing
Index: defaults.h
===
--- defaults.h	(revision 241808)
+++ defaults.h	(working copy)
@@ -1259,6 +1259,10 @@ see the files COPYING3 and COPYING.RUNTI
 #define WORD_REGISTER_OPERATIONS 0
 #endif
 
+#ifndef LOAD_EXTEND_OP
+#define LOAD_EXTEND_OP(M) UNKNOWN
+#endif
+
 #ifndef CONSTANT_ALIGNMENT
 #define CONSTANT_ALIGNMENT(EXP, ALIGN) ALIGN
 #endif
Index: fold-const.c
===
--- fold-const.c	(revision 241808)
+++ fold-const.c	(working copy)
@@ -80,10 +80,6 @@ along with GCC; see the file COPYING3.
 #include "tree-ssanames.h"
 #include "selftest.h"
 
-#ifndef LOAD_EXTEND_OP
-#define LOAD_EXTEND_OP(M) UNKNOWN
-#endif
-
 /* Nonzero if we are folding constants inside an initializer; zero
otherwise.  */
 int folding_initializer = 0;
Index: fwprop.c
===
--- fwprop.c	(revision 241808)
+++ fwprop.c	(working copy)
@@ -1051,9 +1051,7 @@ free_load_extend (rtx src, rtx_insn *ins
   df_ref def, use;
 
   reg = XEXP (src, 0);
-#ifdef LOAD_EXTEND_OP
   if (LOAD_EXTEND_OP (GET_MODE (reg)) != GET_CODE (src))
-#endif
 return false;
 
   FOR_EACH_INSN_USE (use, insn)
Index: postreload.c
===
--- postreload.c	(revision 241808)
+++ postreload.c	(working copy)
@@ -41,10 +41,6 @@ along with GCC; see the file COPYING3.
 #include "tree-pass.h"
 #include "dbgcnt.h"
 
-#ifndef LOAD_EXTEND_OP
-#define LOAD_EXTEND_OP(M) UNKNOWN
-#endif
-
 static int reload_cse_noop_set_p (rtx);
 static bool reload_cse_simplify (rtx_insn *, rtx);
 static void reload_cse_regs_1 (void);
Index: reload.c
===
--- reload.c	(revision 241808)
+++ reload.c	(working copy)
@@ -1064,7 +1064,6 @@ push_reload

Add missing symbols for versioned namespace

2016-11-03 Thread François Dumont


Hi

I might not be the right one to propose this patch as I am not sure 
that I fully understand gnu-versioned-namespace.ver organization. But 
with it following test failures when using versioned namespace vanish:


FAIL: 20_util/allocator/overaligned.cc (test for excess errors)
FAIL: ext/bitmap_allocator/overaligned.cc (test for excess errors)
FAIL: ext/mt_allocator/overaligned.cc (test for excess errors)
FAIL: ext/new_allocator/overaligned.cc (test for excess errors)
FAIL: ext/pool_allocator/overaligned.cc (test for excess errors)

Ok to commit ?

* config/abi/pre/gnu-versioned-namespace.ver: Export C++17 new of
over-aligned types symbols.

François

diff --git a/libstdc++-v3/config/abi/pre/gnu-versioned-namespace.ver b/libstdc++-v3/config/abi/pre/gnu-versioned-namespace.ver
index 34d58ae..bffb35c 100644
--- a/libstdc++-v3/config/abi/pre/gnu-versioned-namespace.ver
+++ b/libstdc++-v3/config/abi/pre/gnu-versioned-namespace.ver
@@ -348,6 +348,18 @@ CXXABI_2.0 {
 
 # __gnu_cxx::__freeres()
 _ZN9__gnu_cxx9__freeresEv;
+
+# C++17 aligned new/delete
+_Znw[jmy]St11align_val_t;
+_Znw[jmy]St11align_val_tRKSt9nothrow_t;
+_Zna[jmy]St11align_val_t;
+_Zna[jmy]St11align_val_tRKSt9nothrow_t;
+_ZdlPvSt11align_val_t;
+_ZdlPvSt11align_val_tRKSt9nothrow_t;
+_ZdlPv[jmy]St11align_val_t;
+_ZdaPvSt11align_val_t;
+_ZdaPvSt11align_val_tRKSt9nothrow_t;
+_ZdaPv[jmy]St11align_val_t;
 };
 
 # Symbols in the support library (libsupc++) supporting trans-mem.

[PATCH], Fix PR 77993, PowerPC bootstrap on 32-bit systems

2016-11-03 Thread Michael Meissner

This patch fixes PR target/77993, which is a bug preventing bootstrapping on
32-bit PowerPC Linux targets.

The issue is that some of the insns that support IFmode (IBM extended double
mode when long double is IEEE 128-bit floating point in the future) did not
check for -msoft-float, and disable IFmode, while others did get disabled.  On
64-bit Linux target systems, the soft-float multilib is no longer build, but on
32-bit Linux target systems, it is built.

The fix is to add checks for hardware floating point in the 3 macros used by
IFmode instructions.

I built this on a little endian Power8 system (64-bit only), big endian Power8
system (64-bit only), and a big endian Power7 system (32-bit and 64-bit).
There were no regressions in the test.  In addition, Eric Botcazou, who
reported the bug, says that he can now bootstrap his compiler.  Is this patch
ok to install?

2016-11-03  Michael Meissner  

PR target/77993
* config/rs6000/rs6000.h (FLOAT128_IBM_P): Do not allow IFmode or
ICmode unless we have standard PowerPC floating point.
* config/rs6000/rs6000.md (FP iterator): Likewise.
(FMOVE128 iterator): Likewise.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/rs6000.h
===
--- gcc/config/rs6000/rs6000.h  (revision 241715)
+++ gcc/config/rs6000/rs6000.h  (working copy)
@@ -442,14 +442,15 @@
point.  KFmode was added as a way to represent IEEE 128-bit floating point,
even if the default for long double is the IBM long double format.
Similarly IFmode is the IBM long double format even if the default is IEEE
-   128-bit.  */
+   128-bit.  Don't allow IFmode if -msoft-float.  */
 #define FLOAT128_IEEE_P(MODE)  \
   ((TARGET_IEEEQUAD && ((MODE) == TFmode || (MODE) == TCmode)) \
|| ((MODE) == KFmode) || ((MODE) == KCmode))
 
 #define FLOAT128_IBM_P(MODE)   \
   ((!TARGET_IEEEQUAD && ((MODE) == TFmode || (MODE) == TCmode))
\
-   || ((MODE) == IFmode) || ((MODE) == ICmode))
+   || (TARGET_HARD_FLOAT && TARGET_FPRS
\
+   && ((MODE) == IFmode || (MODE) == ICmode)))
 
 /* Helper macros to say whether a 128-bit floating point type can go in a
single vector register, or whether it needs paired scalar values.  */
Index: gcc/config/rs6000/rs6000.md
===
--- gcc/config/rs6000/rs6000.md (revision 241715)
+++ gcc/config/rs6000/rs6000.md (working copy)
@@ -376,7 +376,7 @@
   (TF "TARGET_HARD_FLOAT
&& (TARGET_FPRS || TARGET_E500_DOUBLE)
&& TARGET_LONG_DOUBLE_128")
-  (IF "TARGET_LONG_DOUBLE_128")
+  (IF "TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_LONG_DOUBLE_128")
   (KF "TARGET_FLOAT128_TYPE")
   (DD "TARGET_DFP")
   (TD "TARGET_DFP")])
@@ -398,7 +398,7 @@
 (define_mode_iterator FMOVE64 [DF DD])
 (define_mode_iterator FMOVE64X [DI DF DD])
 (define_mode_iterator FMOVE128 [(TF "TARGET_LONG_DOUBLE_128")
-   (IF "TARGET_LONG_DOUBLE_128")
+   (IF "FLOAT128_IBM_P (IFmode)")
(TD "TARGET_HARD_FLOAT && TARGET_FPRS")])
 
 (define_mode_iterator FMOVE128_FPR [(TF "FLOAT128_2REG_P (TFmode)")

Re: [PATCH] Generate reproducible output independently of the build-path

2016-11-03 Thread Ximin Luo

Ximin Luo:
> Testing
> ===
> 
> I've tested these patches on a Debian testing/unstable x86_64-linux-gnu 
> system.
> So far I've only run the new tests that this patch adds, on a 
> disable-bootstrap
> build. I will do a full bootstrap and run the full testsuite over the next few
> days, both with and without this patch, and report back.
> 

Tested with a full bootstrap on a Debian unstable x86_64-linux-gnu system.

Log snippets attached. These were produced by running the below shell command 
in each of the respective build directories (0 = without, 1 = with the patches):

find . -name '*.log' -a -not -name 'config.log' | sort | while read x; do echo 
"== $x =="; grep -a '^FAIL: \|^# of' "$x"; done > ../gcc-build-X.snippets

Due to make -j22, they didn't run the tests in the same order. However in 
between doing this:

$ diff -ru <(sort gcc-build-0.snippets) <(sort gcc-build-1.snippets)
--- /dev/fd/63  2016-11-03 20:58:02.892334031 +0100
+++ /dev/fd/62  2016-11-03 20:58:02.892334031 +0100
@@ -323,7 +323,7 @@
 # of expected failures 76
 # of expected passes   110511
 # of expected passes   11259
-# of expected passes   127561
+# of expected passes   127573
 # of expected passes   2797
 # of expected passes   42
 # of expected passes   43915
1

and a manual review of the unsorted diffs, hopefully you can see that 
everything is good.

X

-- 
GPG: ed25519/56034877E1F87C35
GPG: rsa4096/1318EFAC5FBBDBCE
https://github.com/infinity0/pubkeys.git
== ./gcc/testsuite/g++/g++.log ==
FAIL: g++.dg/guality/pr55665.C   -O2  line 23 p == 40
FAIL: g++.dg/guality/pr55665.C   -O3 -g  line 23 p == 40
# of expected passes110511
# of unexpected failures2
# of unexpected successes   2
# of expected failures  333
# of unsupported tests  3989
== ./gcc/testsuite/gcc/gcc.log ==
FAIL: gcc.dg/guality/pr54200.c   -Os  line 20 z == 3
FAIL: gcc.dg/guality/pr54519-1.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  line 20 y == 25
FAIL: gcc.dg/guality/pr54519-1.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  line 20 z == 6
FAIL: gcc.dg/guality/pr54519-1.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  line 23 y == 117
FAIL: gcc.dg/guality/pr54519-1.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  line 23 z == 8
FAIL: gcc.dg/guality/pr54519-2.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  line 17 y == 25
FAIL: gcc.dg/guality/pr54519-2.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  line 17 y == 25
FAIL: gcc.dg/guality/pr54519-3.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  line 20 y == 25
FAIL: gcc.dg/guality/pr54519-3.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  line 20 z == 6
FAIL: gcc.dg/guality/pr54519-3.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  line 23 y == 117
FAIL: gcc.dg/guality/pr54519-3.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  line 23 z == 8
FAIL: gcc.dg/guality/pr54519-3.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  line 20 y == 25
FAIL: gcc.dg/guality/pr54519-3.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  line 20 z == 6
FAIL: gcc.dg/guality/pr54519-3.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  line 23 y == 117
FAIL: gcc.dg/guality/pr54519-3.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  line 23 z == 8
FAIL: gcc.dg/guality/pr54519-4.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  line 17 y == 25
FAIL: gcc.dg/guality/pr54519-4.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  line 17 y == 25
FAIL: gcc.dg/guality/pr54519-5.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  line 17 y == 25
FAIL: gcc.dg/guality/pr54519-5.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  line 17 y == 25
FAIL: gcc.dg/guality/pr43051-1.c   -O3 -fomit-frame-pointer -funroll-loops 
-fpeel-loops -ftracer -finline-functions  line 34 c == [0]
FAIL: gcc.dg/guality/pr43051-1.c   -O3 -fomit-frame-pointer -funroll-loops 
-fpeel-loops -ftracer -finline-functions  line 36 e == [1]
FAIL: gcc.dg/guality/pr45882.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  line 16 d == 112
FAIL: gcc.dg/guality/pr45882.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  line 16 e == 142
FAIL: gcc.dg/guality/pr68860-1.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  line 14 arg1 == 1
FAIL: gcc.dg/guality/pr68860-1.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  line 14 arg2 == 2
FAIL: gcc.dg/guality/pr68860-1.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  line 14 arg3 == 3
FAIL: gcc.dg/guality/pr68860-1.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  line 14 arg4 == 4
FAIL: gcc.dg/guality/pr68860-1.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  line 14 arg5 == 5
FAIL: gcc.dg/guality/pr68860-1.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  line 14 arg6 == 6
FAIL:

Re: [PATCH, C++] Warn on redefinition of builtin functions (PR c++/71973)

2016-11-03 Thread Jason Merrill

On Wed, Nov 2, 2016 at 5:15 PM, Bernd Edlinger
 wrote:
> On 11/02/16 18:51, Jason Merrill wrote:
>> On 11/02/2016 02:11 AM, Bernd Edlinger wrote:
>>> On 11/01/16 19:15, Bernd Edlinger wrote:
 On 11/01/16 18:11, Jason Merrill wrote:
> On Tue, Nov 1, 2016 at 11:45 AM, Bernd Edlinger
>  wrote:
>> On 11/01/16 16:20, Jason Merrill wrote:
>>> On 10/17/2016 03:18 PM, Bernd Edlinger wrote:
>>> I'm not even sure we need a new warning.  Can we combine this warning
>>> with the block that currently follows?
>>
>> After 20 years of not having a warning on that,
>> an implicitly enabled warning would at least break lots of bogus
>> test cases.
>
> Would it, though?  Which test cases still break with the current patch?

 Less than before, but there are still at least a few of them.

 I can make a list and send it tomorrow.
>>>
>>> FAIL: g++.dg/cpp1y/lambda-generic-udt.C  -std=gnu++14 (test for excess
>>> errors)
>>> FAIL: g++.dg/cpp1y/lambda-generic-xudt.C  -std=gnu++14 (test for excess
>>> errors)
>>> FAIL: g++.dg/init/new15.C  -std=c++11 (test for excess errors)
>>> FAIL: g++.dg/init/new15.C  -std=c++14 (test for excess errors)
>>> FAIL: g++.dg/init/new15.C  -std=c++98 (test for excess errors)
>>> FAIL: g++.dg/ipa/inline-1.C  -std=gnu++11 (test for excess errors)
>>> FAIL: g++.dg/ipa/inline-1.C  -std=gnu++14 (test for excess errors)
>>> FAIL: g++.dg/ipa/inline-1.C  -std=gnu++98 (test for excess errors)
>>> FAIL: g++.dg/ipa/inline-2.C  -std=gnu++11 (test for excess errors)
>>> FAIL: g++.dg/ipa/inline-2.C  -std=gnu++14 (test for excess errors)
>>> FAIL: g++.dg/ipa/inline-2.C  -std=gnu++98 (test for excess errors)
>>> FAIL: g++.dg/tc1/dr20.C  -std=c++11 (test for excess errors)
>>> FAIL: g++.dg/tc1/dr20.C  -std=c++14 (test for excess errors)
>>> FAIL: g++.dg/tc1/dr20.C  -std=c++98 (test for excess errors)
>>> FAIL: g++.dg/tree-ssa/inline-1.C  -std=gnu++11 (test for excess errors)
>>> FAIL: g++.dg/tree-ssa/inline-1.C  -std=gnu++14 (test for excess errors)
>>> FAIL: g++.dg/tree-ssa/inline-1.C  -std=gnu++98 (test for excess errors)
>>> FAIL: g++.dg/tree-ssa/inline-2.C  -std=gnu++11 (test for excess errors)
>>> FAIL: g++.dg/tree-ssa/inline-2.C  -std=gnu++14 (test for excess errors)
>>> FAIL: g++.dg/tree-ssa/inline-2.C  -std=gnu++98 (test for excess errors)
>>> FAIL: g++.dg/lto/20080908-1 cp_lto_20080908-1_0.o assemble, -O0 -flto
>>> -flto-partition=1to1 -fno-use-linker-plugin
>>> FAIL: g++.dg/lto/20080908-1 cp_lto_20080908-1_0.o assemble, -O0 -flto
>>> -flto-partition=none -fuse-linker-plugin
>>> FAIL: g++.dg/lto/20080908-1 cp_lto_20080908-1_0.o assemble, -O0 -flto
>>> -fuse-linker-plugin -fno-fat-lto-objects
>>> FAIL: g++.dg/lto/20080908-1 cp_lto_20080908-1_0.o assemble, -O2 -flto
>>> -flto-partition=1to1 -fno-use-linker-plugin
>>> FAIL: g++.dg/lto/20080908-1 cp_lto_20080908-1_0.o assemble, -O2 -flto
>>> -flto-partition=none -fuse-linker-plugin -fno-fat-lto-objects
>>> FAIL: g++.dg/lto/20080908-1 cp_lto_20080908-1_0.o assemble, -O2 -flto
>>> -fuse-linker-plugin
>>> FAIL: g++.dg/lto/pr68811 cp_lto_pr68811_0.o assemble, -O2
>>> FAIL: g++.old-deja/g++.law/except1.C  -std=gnu++11 (test for excess
>>> errors)
>>> FAIL: g++.old-deja/g++.law/except1.C  -std=gnu++14 (test for excess
>>> errors)
>>> FAIL: g++.old-deja/g++.law/except1.C  -std=gnu++98 (test for excess
>>> errors)
>>> FAIL: g++.old-deja/g++.mike/p700.C  -std=gnu++11 (test for excess errors)
>>> FAIL: g++.old-deja/g++.mike/p700.C  -std=gnu++14 (test for excess errors)
>>> FAIL: g++.old-deja/g++.mike/p700.C  -std=gnu++98 (test for excess errors)
>>> FAIL: g++.old-deja/g++.other/builtins10.C  -std=c++11 (test for excess
>>> errors)
>>> FAIL: g++.old-deja/g++.other/builtins10.C  -std=c++14 (test for excess
>>> errors)
>>> FAIL: g++.old-deja/g++.other/realloc.C  -std=c++11 (test for excess
>>> errors)
>>> FAIL: g++.old-deja/g++.other/realloc.C  -std=c++14 (test for excess
>>> errors)
>>> FAIL: g++.old-deja/g++.other/realloc.C  -std=c++98 (test for excess
>>> errors)
>>> FAIL: g++.old-deja/g++.other/vbase5.C  -std=c++11 (test for excess
>>> errors)
>>> FAIL: g++.old-deja/g++.other/vbase5.C  -std=c++14 (test for excess
>>> errors)
>>> FAIL: g++.old-deja/g++.other/vbase5.C  -std=c++98 (test for excess
>>> errors)
>>>
>>>
>>> The lto test case does emit the warning when assembling, but
>>> it still produces an executable and even executes it.
>>>
>>> Also g++.dg/cpp1y/lambda-generic-udt.C, g++.dg/tc1/dr20.C
>>> and g++.old-deja/g++.other/vbase5.C are execution tests.
>>>
>>> So I was wrong to assume these were all compile-only tests.
>>>
>>> I think that list should be fixable, if we decide to enable
>>> the warning by default.
>>
>> Yes, either by fixing the prototypes or disabling the warning.
>>
>
> Yes, I am inclined to enable the warning by default now.
>
> Most of the test cases are fixable in a fairly obvious way,
> see attachment.
>
> But I am

Re: [match.pd] Fix for PR35691

2016-11-03 Thread Marc Glisse


On Thu, 3 Nov 2016, Richard Biener wrote:


The transform would also work for vectors (element_precision for
the test but also a value-matching zero which should ensure the
same number of elements).

Um sorry, I didn't get how to check vectors to be of equal length by a
matching zero.
Could you please elaborate on that ?


He may have meant something like:

  (op (cmp @0 integer_zerop@2) (cmp @1 @2))


I meant with one being @@2 to allow signed vs. Unsigned @0/@1 which was the 
point of the pattern.


Oups, that's what I had written first, and then I somehow managed to 
confuse myself enough to remove it so as to remove the call to types_match 
:-(


So the last operand is checked with operand_equal_p instead of 
integer_zerop. But the fact that we could compute bit_ior on the 
comparison results should already imply that the number of elements is 
the same.


Though for equality compares we also allow scalar results IIRC.


Oh, right, I keep forgetting that :-( And I have no idea how to generate 
one for a testcase, at least until the GIMPLE FE lands...


On platforms that have IOR on floats (at least x86 with SSE, maybe some 
vector mode on s390?), it would be cool to do the same for floats (most 
likely at the RTL level).


On GIMPLE view-converts could come to the rescue here as well.  Or we 
cab just allow bit-and/or on floats as much as we allow them on 
pointers.


Would that generate sensible code on targets that do not have logic insns 
for floats? Actually, even on x86_64 that generates inefficient code, so 
there would be some work (for instance grep finds no gen_iordf3, only 
gen_iorv2df3).


I am also a bit wary of doing those obfuscating optimizations too early... 
a==0 is something that other optimizations might use. long 
c=(long&)a|(long&)b; (double&)c==0; less so...


(and I am assuming that signaling NaNs don't make the whole transformation 
impossible, which might be wrong)


--
Marc Glisse

Re: [PATCH] DW_TAG_ptr_to_member_type for PMF and DW_AT_{,rvalue_}reference for those (take 2)

2016-11-03 Thread Jason Merrill

On Thu, Nov 3, 2016 at 12:38 PM, Jakub Jelinek  wrote:
> On Wed, Nov 02, 2016 at 01:19:09PM -0400, Jason Merrill wrote:
>> On Wed, Nov 2, 2016 at 12:33 PM, Jakub Jelinek  wrote:
>> > On Wed, Nov 02, 2016 at 04:44:05PM +0100, Jakub Jelinek wrote:
>> >> which means if gen_type_die or gen_type_die_with_usage doesn't
>> >> use the langhook, then we'd emit a completely useless { __pfn; __delta }
>> >> struct into debug info first, and then in modified_type_die used
>> >> the langhook, get OFFSET_TYPE and probably create the
>> >> DW_TAG_ptr_to_member_type.  So I think we really need that.
>> >>
>> >> > > 2) it is used for something Ada-ish I really don't know how to test 
>> >> > > etc.
>> >> > >to be able to find out if it is safe to call it in
>> >> > >gen_type_die_with_usage too
>> >> >
>> >> > You could find an Ada test that uses the code and verify that the
>> >> > output stays the same?
>> >>
>> >> I can try to find the patch that introduced it and if it contained any
>> >> testcases.
>> >
>> > I couldn't find any unfortunately.  Pierre, could you please test if the
>> > following patch doesn't regress anything in the Ada debug info area?
>> >
>> > Here is updated patch I'm going to bootstrap/regtest; it generates the
>> > same debuginfo on ref-3.C testcase.
>>
>> Looks good.
>
> Keith's testing revealed a bug in the patch, that we don't emit
> DW_TAG_typedef for PMF typedefs.  Fixed by adding && !typedef_variant_p (type)
> check to the cp_get_debug_type hook, so that we emit the proper typedef
> with the right name and only its DECL_ORIGINAL_TYPE is replaced with
> OFFSET_TYPE handling.  I believe that was the only issue, so I think it
> should be ok to enable it always, not just for -gdwarf-5.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

I checked in the hunks introducing check_lang_type separately; the rest is OK.

Jason

Re: [fixincludes, v3] Don't define libstdc++-internal macros in Solaris 10+

2016-11-03 Thread Bruce Korb


On 11/03/16 07:11, Rainer Orth wrote:


Ok for mainline now, and for backports to the gcc-6 and gcc-5 branches
after some soak time?


Yes, please.  Thanks.

Re: [PATCH] Print repeated rtl vector elements in a nicer way

2016-11-03 Thread David Malcolm

On Thu, 2016-11-03 at 17:43 +0100, Bernd Schmidt wrote:
> On 11/03/2016 05:35 PM, Martin Jambor wrote:
> > 
> > * print-rtl.c (print_rtx_operand_codes_E_and_V): Print how many
> > times
> > an element is repeated istead of printing each repeated
> > element.
> 
> "instead"
> 
> > ---
> >  gcc/print-rtl.c | 15 ++-
> >  1 file changed, 14 insertions(+), 1 deletion(-)
> > 
> > diff --git a/gcc/print-rtl.c b/gcc/print-rtl.c
> > index 341ecdf..9752c85 100644
> > --- a/gcc/print-rtl.c
> > +++ b/gcc/print-rtl.c
> > @@ -252,7 +252,20 @@ rtx_writer::print_rtx_operand_codes_E_and_V
> > (const_rtx in_rtx, int idx)
> > m_sawclose = 1;
> > 
> >for (int j = 0; j < XVECLEN (in_rtx, idx); j++)
> > -   print_rtx (XVECEXP (in_rtx, idx, j));
> > +   {
> > + int j1;
> > +
> > + print_rtx (XVECEXP (in_rtx, idx, j));
> > + for (j1 = j + 1; j1 < XVECLEN (in_rtx, idx); j1++)
> > +   if (XVECEXP (in_rtx, idx, j) != XVECEXP (in_rtx, idx,
> > j1))
> > + break;
> > +
> > + if (j1 != j + 1)
> > +   {
> > + fprintf (m_outfile, " repeated %ix", j1 - j);
> > + j = j1;
> > +   }
> > +   }
> Good idea, but can you give an example of how this looks in practice?

It should be possible to create a simple selftest of this (for pre
-existing examples, look for ASSERT_RTL_DUMP_EQ in gcc/rtl-tests.c).

> Also, it would be nice (and necessary for David's rtl-testing) to
> also 
>  
> teach the rtl reader to parse this format.

Yes please, ideally with a selftest, though that may be tricky until
the rtl-testing work goes in.

(I suppose it could be disabled in compact mode, but that seems to be
an abuse of the term "compact").

Re: [match.pd] Fix for PR35691

2016-11-03 Thread Richard Biener

On November 3, 2016 6:11:07 PM GMT+01:00, Marc Glisse  
wrote:
>On Thu, 3 Nov 2016, Prathamesh Kulkarni wrote:
>
>> On 3 November 2016 at 16:13, Richard Biener 
>wrote:
>>> On Thu, 3 Nov 2016, Prathamesh Kulkarni wrote:
>>>
 Hi Richard,
 The attached patch tries to fix PR35691, by adding the following
>two
 transforms to match.pd:
 (x == 0 && y == 0) -> (x | typeof(x)(y)) == 0.
 (x != 0 || y != 0) -> (x | typeof(x)(y)) != 0.

 For GENERIC, the "and" operator is truth_andif_expr, and it seems
>for GIMPLE,
 it gets transformed to bit_and_expr
 so to match for both GENERIC and GIMPLE, I had to guard the
>for-stmt:

 #if GENERIC
 (for op (truth_andif truth_orif)
 #elif GIMPLE
 (for op (bit_and bit_ior)
 #endif

 Is that OK ?
>>>
>>> As you are not removing the fold-const.c variant I'd say you should
>>> simply not look for truth_* and only handle GIMPLE.  Note that we
>>> have tree-ssa-ifcombine.c which should handle the variant with
>>> control-flow (but I guess it does not and your patch wouldn't help
>>> it either).
>>>
>>> The transform would also work for vectors (element_precision for
>>> the test but also a value-matching zero which should ensure the
>>> same number of elements).
>> Um sorry, I didn't get how to check vectors to be of equal length by
>a
>> matching zero.
>> Could you please elaborate on that ?
>
>He may have meant something like:
>
>   (op (cmp @0 integer_zerop@2) (cmp @1 @2))

I meant with one being @@2 to allow signed vs. Unsigned @0/@1 which was the 
point of the pattern.

>So the last operand is checked with operand_equal_p instead of 
>integer_zerop. But the fact that we could compute bit_ior on the 
>comparison results should already imply that the number of elements is
>the 
>same.

Though for equality compares we also allow scalar results IIRC.

 This would also prevent the case where one vector is signed and
>the 
>other unsigned, which requires a view_convert (I don't remember if
>convert 
>automatically becomes view_convert here as in fold_convert or not).

No, but the other way around (for sign changes).

>For (some_int == 0) & (some_long == 0), doing ((long)some_int |
>some_long) 
>== 0 should also work (and it doesn't matter if we pick a sign- or 
>zero-extension), but that's more complicated, not necessary for a first
>
>version.

Yeah.

>On platforms that have IOR on floats (at least x86 with SSE, maybe some
>
>vector mode on s390?), it would be cool to do the same for floats (most
>
>likely at the RTL level).

On GIMPLE view-converts could come to the rescue here as well.  Or we cab just 
allow bit-and/or on floats as much as we allow them on pointers.

Richard.

Re: [match.pd] Fix for PR35691

2016-11-03 Thread Marc Glisse


On Thu, 3 Nov 2016, Richard Biener wrote:


On Thu, 3 Nov 2016, Prathamesh Kulkarni wrote:


Hi Richard,
The attached patch tries to fix PR35691, by adding the following two
transforms to match.pd:
(x == 0 && y == 0) -> (x | typeof(x)(y)) == 0.
(x != 0 || y != 0) -> (x | typeof(x)(y)) != 0.

For GENERIC, the "and" operator is truth_andif_expr, and it seems for GIMPLE,
it gets transformed to bit_and_expr
so to match for both GENERIC and GIMPLE, I had to guard the for-stmt:

#if GENERIC
(for op (truth_andif truth_orif)
#elif GIMPLE
(for op (bit_and bit_ior)
#endif

Is that OK ?


As you are not removing the fold-const.c variant I'd say you should
simply not look for truth_* and only handle GIMPLE.  Note that we
have tree-ssa-ifcombine.c which should handle the variant with
control-flow (but I guess it does not and your patch wouldn't help
it either).

The transform would also work for vectors (element_precision for
the test but also a value-matching zero which should ensure the
same number of elements).


On the other hand, now that we are using VEC_COND_EXPR all over the place, 
it may be hard to write a testcase that makes this kind of pattern 
appear...


(related to PR 68714)

--
Marc Glisse

Re: [PATCH] Add mcpu flag for Qualcomm falkor core

2016-11-03 Thread Siddhesh Poyarekar

On 3 November 2016 at 23:08, Andrew Pinski  wrote:
> This patch no longer applies after the recent changes (starting around
> a month ago) to aarch64-cores.def.  Please updat the patch for the
> recent changes

Sorry about that, I'll post an updated patch shortly.

Siddhesh

[PATCH] [ARM] Eliminate SUBTARGET_CPU_DEFAULT

2016-11-03 Thread Richard Earnshaw (lists)

Over the years, GCC has had various ways of setting the default CPU.  As
things have changed, the ARM back-end hasn't necessarily kept up with
some of the changes and this has resulted in some convoluted logic in
places.  This patch cleans up some of this and eliminates entirely the
need for SUBTARGET_CPU_DEFAULT and for the generic ARM code to provide a
definition of TARGET_CPU_DEFAULT.

Instead, all the default selection logic for the ARM cpus is pushed into
config.gcc, making it much easier to understand which processor is used
for each target.

* config.gcc (arm-wrs-vxworks): Set target_cpu_cname.
(arm*-freebsd*): Likewise.
(arm*-*-netbsdelf*): Likewise.
(arm*-*-linux*): Likewise.
(arm*-*-uclinux*eabi*): Likewise.
(arm*-*-phoenix*): Likewise.
(arm*-*-eabi*, arm*-*-symbianelf*, arm*-*-rtems*): Likewise.
(arm*-*-*): Don't clobber target_cpu_cname when --with-cpu is not
specified.
Default to arm6 if target_cpu_cname is not set.
* arm/arm.c (arm_option_override): Simplify logic.  Assert that the
default cpu has been correctly configured.
* arm/arm.h (TARGET_CPU_DEFAULT): Delete.
(target_cpus): Delete TARGET_CPU_generic, add TARGET_CPU_num_cores.
* arm/freebsd.h (SUBTARGET_CPU_DEFAULT): Delete.
* arm/linux-eabi.h (SUBTARGET_CPU_DEFAULT): Delete.
* arm/linux-elf.h (SUBTARGET_CPU_DEFAULT): Delete.
* arm/symbian.h (SUBTARGET_CPU_DEFAULT): Delete.
* arm/unknown-elf.h (SUBTARGET_CPU_DEFAULT): Delete.

Tested by running builds for all supported configurations to check that
correct default CPUs are selected and by running full bootstrap on
arm-linux-gnueabihf.


Applied to trunk.

diff --git a/gcc/config.gcc b/gcc/config.gcc
index f9148dd..6d98d96 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -1049,6 +1049,7 @@ arm-wrs-vxworks)
tm_file="elfos.h arm/elf.h arm/aout.h ${tm_file} vx-common.h vxworks.h 
arm/vxworks.h"
extra_options="${extra_options} arm/vxworks.opt"
tmake_file="${tmake_file} arm/t-arm arm/t-vxworks"
+   target_cpu_cname="arm6"
;;
 arm*-*-freebsd*)# ARM FreeBSD EABI
tm_file="dbxelf.h elfos.h ${fbsd_tm_file} arm/elf.h"
@@ -1061,11 +1062,15 @@ arm*-*-freebsd*)# ARM FreeBSD EABI
tm_file="${tm_file} arm/bpabi.h arm/freebsd.h arm/aout.h arm/arm.h"
case $target in
armv6*-*-freebsd*)
+   target_cpu_cname="arm1176jzfs"
tm_defines="${tm_defines} TARGET_FREEBSD_ARMv6=1"
 if test $fbsd_major -ge 11; then
tm_defines="${tm_defines} TARGET_FREEBSD_ARM_HARD_FLOAT=1"
 fi
;;
+   *)
+   target_cpu_cname="arm9"
+   ;;
esac
with_tls=${with_tls:-gnu}
;;
@@ -1073,6 +1078,7 @@ arm*-*-netbsdelf*)
tm_file="dbxelf.h elfos.h netbsd.h netbsd-elf.h arm/elf.h arm/aout.h 
${tm_file} arm/netbsd-elf.h"
extra_options="${extra_options} netbsd.opt netbsd-elf.opt"
tmake_file="${tmake_file} arm/t-arm"
+   target_cpu_cname="arm6"
;;
 arm*-*-linux-*)# ARM GNU/Linux with ELF
tm_file="dbxelf.h elfos.h gnu-user.h linux.h linux-android.h 
glibc-stdint.h arm/elf.h arm/linux-gas.h arm/linux-elf.h"
@@ -1084,6 +1090,7 @@ arm*-*-linux-*)   # ARM GNU/Linux with ELF
esac
tmake_file="${tmake_file} arm/t-arm arm/t-arm-elf arm/t-bpabi 
arm/t-linux-eabi"
tm_file="$tm_file arm/bpabi.h arm/linux-eabi.h arm/aout.h 
vxworks-dummy.h arm/arm.h"
+   target_cpu_cname="arm10tdmi"
# Define multilib configuration for arm-linux-androideabi.
case ${target} in
*-androideabi)
@@ -1098,6 +1105,7 @@ arm*-*-uclinux*eabi*) # ARM ucLinux
tm_file="dbxelf.h elfos.h arm/unknown-elf.h arm/elf.h arm/linux-gas.h 
arm/uclinux-elf.h glibc-stdint.h"
tmake_file="${tmake_file} arm/t-arm arm/t-arm-elf arm/t-bpabi"
tm_file="$tm_file arm/bpabi.h arm/uclinux-eabi.h arm/aout.h 
vxworks-dummy.h arm/arm.h"
+   target_cpu_cname="arm7tdmi"
# The EABI requires the use of __cxa_atexit.
default_use_cxa_atexit=yes
;;
@@ -1106,6 +1114,7 @@ arm*-*-phoenix*)
tm_file="${tm_file} newlib-stdint.h phoenix.h"
tm_file="${tm_file} arm/aout.h arm/arm.h"
tmake_file="${tmake_file} arm/t-arm arm/t-bpabi arm/t-phoenix"
+   target_cpu_cname="arm7tdmi"
;;
 arm*-*-eabi* | arm*-*-symbianelf* | arm*-*-rtems*)
case ${target} in
@@ -1115,6 +1124,7 @@ arm*-*-eabi* | arm*-*-symbianelf* | arm*-*-rtems*)
default_use_cxa_atexit=yes
tm_file="dbxelf.h elfos.h arm/unknown-elf.h arm/elf.h arm/bpabi.h"
tmake_file="${tmake_file} arm/t-arm arm/t-arm-elf"
+   target_cpu_cname="arm7tdmi"
case ${target} in
arm*-*-eabi*)
  tm_file="$tm_file newlib-stdint.h"
@@ -1130,6 +1140,7 @@

Re: [gomp4] Un-parallelized OpenACC kernels constructs with nvptx offloading: "avoid offloading" (was: [PATCH] Add fopt-info-oacc)

2016-11-03 Thread Cesar Philippidis

This patch has proven to be effective at warning users when the compiler
falls back to host execution due to insufficient parallelism (at least
from parloop's perspective) inside kernels regions. At the moment, acc
kernels are restricted to gang-level parallelism. Consequently, if
parloops fails to detects any parallelism, the kernels region will only
execute on a single GPU thread, which is several orders of magnitude
slower than a single CPU thread.

Before I rebase this patch, are these changes conceptually OK for GCC7?
Looking back at the old email thread, there were some disagreements on
these types of warnings. One significant example where the parloops
fails to detect any parallelism is in SPEC_ACCEL. A test that should
take an hour or so, takes longer than a week with the host fallback. At
least this warning provides the user with some feedback that his/her
program might run slow.

Cesar

On 01/21/2016 01:54 PM, Thomas Schwinge wrote:
> Hi!
> 
> On Mon, 18 Jan 2016 18:26:49 +0100, Tom de Vries  
> wrote:
>> This patch introduces an option fopt-info-oacc.
>>
>> When using the option like this with a kernels region in kernels-loop.c 
>> that parloops does not manage to parallelize:
>> ...
>> $ gcc kernels-loop.c -S -O2 -fopenacc -fopt-info-oacc-all
>> ...
>>
>> we get a message:
>> ...
>> kernels-loop.c:23:9: note: kernels region executed sequentially. 
>> Consider mapping it to host execution, to avoid data copy penalty.
>> ...
> 
> Yay for helping the user understand what the compiler is doing!
> 
>> Any comments?
> 
> Telling from real-world code that we've been having a look at, when the
> above situation happens, we're -- in the vast majority of all cases -- in
> a situation where we generally want to avoid offloading (unless
> explicitly requested), "to avoid data copy penalty" as well as typically
> much slower single-threaded execution on the GPU.  Obviously, that will
> have to be revisited as parloops (or any other mechanism in GCC) is able
> to better understand/use the parallelism in OpenACC kernels constructs.
> 
> So, building upon Tom's patch, I have implemented an "avoid offloading"
> flag given the presence of one un-parallelized OpenACC kernels construct.
> This is currently only enabled for OpenACC kernels constructs, in
> combination with nvptx offloading, but I think the general scheme will be
> useful also for other constructs as well as other (non-shared memory)
> offloading targets.
> 
> Also, "avoid offloading" is just a default: if a user explicitly
> requested the use of, for example, a Nvidia GPU (with an
> acc_init(acc_device_nvidia) call, or by setting the
> ACC_DEVICE_TYPE=idia environemnt variable, for example), then we cannot
> apply host-fallback execution, because in this case the user can
> rightfully assume Nvidia GPU semantics (async clause works, and so on).
> 
> 
> The new warning (very similar to the one that Tom proposed) also
> uncovered a bunch of OpenACC kernels test cases in libgomp that did not
> have OpenACC kernels processing enabled (-ftree-parallelize-loops), but
> which parloops can handle fine once that is enabled -- and also a bunch
> of OpenACC kernels test cases that parloops doesn't handle but it looked
> as they were meant to be.  (Maybe I'm wrong about that, though.)  Anyway,
> Tom, would you please make a note to audit all use of -foffload-force in
> the libgomp testsuite?  (It is appropriate for all test cases that
> parloops truely is not meant to handle, but for all others, that flag
> should probably be removed and instead an XFAILed dg-bogus directive
> added, so that we will notice (XPASS) once it does handle them.)
> 
> 
> I've also added a new command-line option, -foffload-force, that restores
> the current behavior, inhibits the "avoid offloading" handling.  This is
> primarily meant for GCC (libgomp) testsuite usage, but could occasionally
> also be useful for users.  Considering alternatives (that can be applied
> in a more fine-grained way, case by case per OpenACC kernels construct):
> 
> 1) a new GCC-specific pragma, for example:
> 
> #pragma GCC force offloading
> #pragma acc kernels
>   [un-parallelizable stuff]
> 
> 2) a new GCC-specific clause, for example in the implementation
> namespace, starting with "_":
> 
> #pragma acc kernels _force_offloading
>   [un-parallelizable stuff]
> 
> ..., the -foffload-force flag was the simplest solution.  (Because, if
> you're going to alter the sources anyway, you might as well just remove
> the one offending OpenACC kernels construct...)
> 
> 
> Committed to gomp-4_0-branch in r232709:
> 
> commit 41a76d233e714fd7b79dc1f40823f607c38306ba
> Author: tschwinge 
> Date:   Thu Jan 21 21:52:50 2016 +
> 
> Un-parallelized OpenACC kernels constructs with nvptx offloading: "avoid 
> offloading"
> 
>   gcc/
>   * common.opt: Add -foffload-force.
>   * lto-wrapper.c

Re: [Patch, rtl] PR middle-end/78016, keep REG_NOTE order during insn copy

2016-11-03 Thread Jiong Wang


On 03/11/16 13:01, Bernd Schmidt wrote:


Index: gcc/emit-rtl.c
===
--- gcc/emit-rtl.c  (revision 241233)
+++ gcc/emit-rtl.c  (working copy)
@@ -6169,17 +6169,18 @@ emit_copy_of_insn_after (rtx_insn *insn,
   which may be duplicated by the basic block reordering code.  */
RTX_FRAME_RELATED_P (new_rtx) = RTX_FRAME_RELATED_P (insn);
  
+  /* Locate the end of existing REG_NOTES in NEW_RTX.  */

+  rtx *ptail = _NOTES (new_rtx);
+  gcc_assert (*ptail == NULL_RTX);
+


Looks like new_rtx may contain it's own REG_NOTES when reached here thus
triggered ICE, I guess mark_jump_label may generate REG_LABEL_OPERAND as the
comments says.

After replace the gcc_assert with the following loop, this patch passed
bootstrap on both AArch64 and X86-64, and regression OK on gcc and g++.

+  while (*ptail != NULL_RTX)
+ptail =  (*ptail, 1);

Regards,
Jiong

[PATCH] libiberty: Add Rust symbol demangling.

2016-11-03 Thread Mark Wielaard

Adds Rust symbol demangler. Rust mangles symbols using GNU_V3 style,
adding a hash and various special character subtitutions. This adds
a new rust style to cplus_demangle and adds 3 helper functions
rust_demangle, rust_demangle_sym and rust_is_mangled.

rust-demangle.c was written by David. Mark did the code formatting to
GNU style and integration into the gcc/libiberty build system and
testsuite.

The original code was written for the perf tool. David agreed with
submitting it for gcc/libiberty so binutils, gdb, valgrind, etc.
can also easily reuse it. He has sent request-assign to ass...@gnu.org.
I already have write after approval for gcc.

include/ChangeLog:

2016-11-03  David Tolnay 
   Mark Wielaard  

   * demangle.h (DMGL_RUST): New macro.
   (DMGL_STYLE_MASK): Add DMGL_RUST.
   (demangling_styles): Add dlang_rust.
   (RUST_DEMANGLING_STYLE_STRING): New macro.
   (RUST_DEMANGLING): New macro.
   (rust_demangle): New prototype.
   (rust_is_mangled): Likewise.
   (rust_demangle_sym): Likewise.

libiberty/ChangeLog:

2016-11-03  David Tolnay 
   Mark Wielaard  

   * Makefile.in (CFILES): Add rust-demangle.c.
   (REQUIRED_OFILES): Add rust-demangle.o.
   * cplus-dem.c (libiberty_demanglers): Add rust_demangling case.
   (cplus_demangle): Handle RUST_DEMANGLING.
   (rust_demangle): New function.
   * rust-demangle.c: New file.
   * testsuite/Makefile.in (really-check): Add check-rust-demangle.
   (check-rust-demangle): New rule.
   * testsuite/rust-demangle-expected: New file.
---

diff --git a/include/demangle.h b/include/demangle.h
index 1d7cadf..de9b13f 100644
--- a/include/demangle.h
+++ b/include/demangle.h
@@ -63,9 +63,10 @@ extern "C" {
 #define DMGL_GNU_V3 (1 << 14)
 #define DMGL_GNAT   (1 << 15)
 #define DMGL_DLANG  (1 << 16)
+#define DMGL_RUST   (1 << 17)  /* Rust wraps GNU_V3 style mangling.  */
 
 /* If none of these are set, use 'current_demangling_style' as the default. */
-#define DMGL_STYLE_MASK 
(DMGL_AUTO|DMGL_GNU|DMGL_LUCID|DMGL_ARM|DMGL_HP|DMGL_EDG|DMGL_GNU_V3|DMGL_JAVA|DMGL_GNAT|DMGL_DLANG)
+#define DMGL_STYLE_MASK 
(DMGL_AUTO|DMGL_GNU|DMGL_LUCID|DMGL_ARM|DMGL_HP|DMGL_EDG|DMGL_GNU_V3|DMGL_JAVA|DMGL_GNAT|DMGL_DLANG|DMGL_RUST)
 
 /* Enumeration of possible demangling styles.
 
@@ -88,7 +89,8 @@ extern enum demangling_styles
   gnu_v3_demangling = DMGL_GNU_V3,
   java_demangling = DMGL_JAVA,
   gnat_demangling = DMGL_GNAT,
-  dlang_demangling = DMGL_DLANG
+  dlang_demangling = DMGL_DLANG,
+  rust_demangling = DMGL_RUST
 } current_demangling_style;
 
 /* Define string names for the various demangling styles. */
@@ -104,6 +106,7 @@ extern enum demangling_styles
 #define JAVA_DEMANGLING_STYLE_STRING  "java"
 #define GNAT_DEMANGLING_STYLE_STRING  "gnat"
 #define DLANG_DEMANGLING_STYLE_STRING "dlang"
+#define RUST_DEMANGLING_STYLE_STRING  "rust"
 
 /* Some macros to test what demangling style is active. */
 
@@ -118,6 +121,7 @@ extern enum demangling_styles
 #define JAVA_DEMANGLING (((int) CURRENT_DEMANGLING_STYLE) & DMGL_JAVA)
 #define GNAT_DEMANGLING (((int) CURRENT_DEMANGLING_STYLE) & DMGL_GNAT)
 #define DLANG_DEMANGLING (((int) CURRENT_DEMANGLING_STYLE) & DMGL_DLANG)
+#define RUST_DEMANGLING (((int) CURRENT_DEMANGLING_STYLE) & DMGL_RUST)
 
 /* Provide information about the available demangle styles. This code is
pulled from gdb into libiberty because it is useful to binutils also.  */
@@ -175,6 +179,27 @@ ada_demangle (const char *mangled, int options);
 extern char *
 dlang_demangle (const char *mangled, int options);
 
+/* Returns non-zero iff MANGLED is a rust mangled symbol.  MANGLED must
+   already have been demangled through cplus_demangle_v3.  If this function
+   returns non-zero then MANGLED can be demangled (in-place) using
+   RUST_DEMANGLE_SYM.  */
+extern int
+rust_is_mangled (const char *mangled);
+
+/* Demangles SYM (in-place) if RUST_IS_MANGLED returned non-zero for SYM.
+   If RUST_IS_MANGLED returned zero for SYM then RUST_DEMANGLE_SYM might
+   replace characters that cannot be demangled with '?' and might truncate
+   SYM.  After calling RUST_DEMANGLE_SYM SYM might be shorter, but never
+   larger.  */
+extern void
+rust_demangle_sym (char *sym);
+
+/* Demangles MANGLED if it was GNU_V3 and then RUST mangled, otherwise
+   returns NULL. Uses CPLUS_DEMANGLE_V3, RUST_IS_MANGLED and
+   RUST_DEMANGLE_SYM.  Returns a new string that is owned by the caller.  */
+extern char *
+rust_demangle (const char *mangled, int options);
+
 enum gnu_v3_ctor_kinds {
   gnu_v3_complete_object_ctor = 1,
   gnu_v3_base_object_ctor,
diff --git a/libiberty/Makefile.in b/libiberty/Makefile.in
index c7a4568..0ff9e45 100644
--- a/libiberty/Makefile.in
+++ b/libiberty/Makefile.in
@@ -146,6 +146,7 @@ CFILES = alloca.c argv.c asprintf.c atexit.c
\

Re: [RFC] Handle unary pass-through jump functions for ipa-vrp

2016-11-03 Thread Martin Jambor

Hi,

On Fri, Oct 28, 2016 at 02:03:47PM +1100, kugan wrote:
>
> ...snip...
> 
> I have also separated the constant parameter conversion out and posted as
> https://gcc.gnu.org/ml/gcc-patches/2016-10/msg02309.html. This is now
> handling just unary pass-through jump functions.
> 
> Bootstrapped and regression tested on x86_64-linux-gnu with no new
> regressions.
> 
> Is this OK for trunk?
> 
> Thanks,
> Kugan
> 
> gcc/testsuite/ChangeLog:
> 
> 2016-10-28  Kugan Vivekanandarajah  
> 
>   * gcc.dg/ipa/vrp7.c: New test.
> 
> 
> gcc/ChangeLog:
> 
> 2016-10-28  Kugan Vivekanandarajah  
> 
>   * ipa-cp.c (ipa_get_jf_pass_through_result): Handle unary expressions.
>   (propagate_vr_accross_jump_function): Likewise.
>   * ipa-prop.c (ipa_set_jf_unary_pass_through): New.
>   (load_from_param_1): New.
>   (load_from_unmodified_param): Factor common part into load_from_param_1.
>   (load_from_param): New.
>   (compute_complex_assign_jump_func): Handle unary expressions.
>   (ipa_write_jump_function): Likewise.
>   (ipa_read_jump_function): Likewise.
> 
> 
> > Patch is OK with changes Martin suggested.
> > 
> > Honza
> > 

> From b7d9b413951ba20d156a7801640cc7d7bc57c062 Mon Sep 17 00:00:00 2001
> From: Kugan Vivekanandarajah 
> Date: Fri, 28 Oct 2016 10:16:38 +1100
> Subject: [PATCH 2/2] add unary jump function
> 
> ---
>  gcc/ipa-cp.c| 39 +++---
>  gcc/ipa-prop.c  | 89 
> +++--
>  gcc/testsuite/gcc.dg/ipa/vrp7.c | 32 +++
>  3 files changed, 142 insertions(+), 18 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/ipa/vrp7.c
> 
> diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
> index 9f28557..8fc95dd 100644
> --- a/gcc/ipa-cp.c
> +++ b/gcc/ipa-cp.c
> @@ -1225,13 +1225,21 @@ ipa_get_jf_pass_through_result (struct ipa_jump_func 
> *jfunc, tree input)
>  return NULL_TREE;
>  
>if (TREE_CODE_CLASS (ipa_get_jf_pass_through_operation (jfunc))
> -  == tcc_comparison)
> -restype = boolean_type_node;
> +  == tcc_unary)
> +{
> +  res = fold_unary (ipa_get_jf_pass_through_operation (jfunc),
> + TREE_TYPE (input), input);
> +}

Please do not put curly braces around a single statement.  Apart from
that, no objection from me.

Thanks,

Martin

Re: [PATCH] DW_TAG_ptr_to_member_type for PMF and DW_AT_{,rvalue_}reference for those

2016-11-03 Thread Pierre-Marie de Rodat


Hello all,

On 11/02/2016 05:33 PM, Jakub Jelinek wrote:

On Wed, Nov 02, 2016 at 04:44:05PM +0100, Jakub Jelinek wrote:

You could find an Ada test that uses the code and verify that the
output stays the same?


I can try to find the patch that introduced it and if it contained any
testcases.


I couldn't find any unfortunately.  Pierre, could you please test if the
following patch doesn't regress anything in the Ada debug info area?


Yes, DWARF output is hard to test… or at least I haven’t find a way to 
do so in TCL/DejaGNU. ;-)


I have my own Python-based testsuite, which I use everytime I submit a 
DIE-related patch. As I said on IRC and as richi and I discussed at the 
last cauldron, the plan would be to manage to port the testcase to 
GCC’s, but schedule-wise I’m not able to do that these days… For the 
record I still plan to do it, though! So, just in case someone would 
need/want to run this testsuite, here it is in its current state: 
.


Anyway, that being said: your patch does not introduce regressions in my 
testsuite, so it’s fine as far as I’m concerned. :-) Thank you for asking!


--
Pierre-Marie de Rodat

[PING!] Re: [PATCH, Fortran] DEC Compatibility: Default missing exponents to 0 with -fdec

2016-11-03 Thread Fritz Reese

https://gcc.gnu.org/ml/fortran/2016-10/msg00222.html
On Wed, Oct 26, 2016 at 10:14 AM, Fritz Reese  wrote:
> All,
>
> Attached is a patch to the GNU Fortran front-end and runtime library
> (libgfortran) which accepts real numbers with missing exponents as if
> '0' was given as the exponent when the compile flag -fdec is given,
> for further compatibility with legacy compilers.
...
>
> Bootstraps and regtests on x86_64-redhat-linux. OK for trunk?
>

Just waiting on review for - I can hardly believe it myself - my LAST
planned DEC extension for gfortran!

Note the patch here has been re-based on trunk since the original-
required due to my other recent patches updating the same part of
gfortran.texi.

---
Fritz Reese

From: Fritz Reese 
Date: Wed, 5 Oct 2016 18:27:56 -0400
Subject: [PATCH] Default missing exponents to 0 with -fdec.

gcc/fortran/
* gfortran.texi: Document.
* gfortran.h (gfc_dt): New field default_exp.
* primary.c (match_real_constant): Default exponent with -fdec.
* io.c (match_io): Set dt.default_exp with -fdec.
* ioparm.def (IOPARM_dt_default_exp): New.
* trans-io.c (build_dt): Set IOPARM_dt_default_exp with -fdec.

libgfortran/io/
* io.h (IOPARM_DT_DEFAULT_EXP): New flag bit.
* list_read.c (parse_real, read_real): Allow omission of exponent with
IOPARM_DT_DEFAULT_EXP.
* read.c (read_f): Ditto.

gcc/testsuite/gfortran.dg/
* dec_exp_1.f90, dec_exp_2.f90, dec_exp_3.f90: New testcases.
diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index ea4437c..a0dcf6d 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -2336,6 +2336,7 @@ typedef struct
   gfc_expr *io_unit, *format_expr, *rec, *advance, *iostat, *size, *iomsg,
   *id, *pos, *asynchronous, *blank, *decimal, *delim, *pad, *round,
   *sign, *extra_comma, *dt_io_kind, *udtio;
+  char default_exp;
 
   gfc_symbol *namelist;
   /* A format_label of `format_asterisk' indicates the "*" format */
diff --git a/gcc/fortran/gfortran.texi b/gcc/fortran/gfortran.texi
index cd2c5a5..6de6c9b 100644
--- a/gcc/fortran/gfortran.texi
+++ b/gcc/fortran/gfortran.texi
@@ -1472,6 +1472,7 @@ compatibility extensions along with those enabled by 
@option{-std=legacy}.
 * Bitwise logical operators::
 * Extended I/O specifiers::
 * Legacy PARAMETER statements::
+* Default exponents::
 @end menu
 
 @node Old-style kind specifications
@@ -2713,6 +2714,14 @@ real c
 parameter c = 3.0e8
 @end smallexample
 
+@node Default exponents
+@subsection Default exponents
+@cindex exponent
+
+For compatibility, GNU Fortran supports a default exponent of zero in real
+constants with @option{-fdec}.  For example, @code{9e} would be
+interpreted as @code{9e0}, rather than an error.
+
 
 @node Extensions not implemented in GNU Fortran
 @section Extensions not implemented in GNU Fortran
diff --git a/gcc/fortran/io.c b/gcc/fortran/io.c
index dce0f7c..5f50969 100644
--- a/gcc/fortran/io.c
+++ b/gcc/fortran/io.c
@@ -4163,6 +4163,10 @@ get_io_list:
goto syntax;
 }
 
+  /* See if we want to use defaults for missing exponents in real transfers.  
*/
+  if (flag_dec)
+dt->default_exp = 1;
+
   /* A full IO statement has been matched.  Check the constraints.  spec_end is
  supplied for cases where no locus is supplied.  */
   m = check_io_constraints (k, dt, io_code, _end);
diff --git a/gcc/fortran/ioparm.def b/gcc/fortran/ioparm.def
index f1bf733..4669187 100644
--- a/gcc/fortran/ioparm.def
+++ b/gcc/fortran/ioparm.def
@@ -118,4 +118,5 @@ IOPARM (dt,  round, 1 << 23, char2)
 IOPARM (dt,  sign, 1 << 24, char1)
 #define IOPARM_dt_f2003  (1 << 25)
 #define IOPARM_dt_dtio   (1 << 26)
+#define IOPARM_dt_default_exp(1 << 27)
 IOPARM (dt,  u,0,   pad)
diff --git a/gcc/fortran/primary.c b/gcc/fortran/primary.c
index 2101644..f26740d 100644
--- a/gcc/fortran/primary.c
+++ b/gcc/fortran/primary.c
@@ -483,7 +483,7 @@ backup:
 static match
 match_real_constant (gfc_expr **result, int signflag)
 {
-  int kind, count, seen_dp, seen_digits, is_iso_c;
+  int kind, count, seen_dp, seen_digits, is_iso_c, default_exponent;
   locus old_loc, temp_loc;
   char *p, *buffer, c, exp_char;
   gfc_expr *e;
@@ -494,6 +494,7 @@ match_real_constant (gfc_expr **result, int signflag)
 
   e = NULL;
 
+  default_exponent = 0;
   count = 0;
   seen_dp = 0;
   seen_digits = 0;
@@ -575,8 +576,14 @@ match_real_constant (gfc_expr **result, int signflag)
 
   if (!ISDIGIT (c))
 {
-  gfc_error ("Missing exponent in real number at %C");
-  return MATCH_ERROR;
+  /* With -fdec, default exponent to 0 instead of complaining.  */
+  if (flag_dec)
+   default_exponent = 1;
+  else
+   {
+ gfc_error ("Missing exponent in real number at %C");
+ return MATCH_ERROR;
+   }
 }

Re: [match.pd] Fix for PR35691

2016-11-03 Thread Marc Glisse


On Thu, 3 Nov 2016, Prathamesh Kulkarni wrote:


On 3 November 2016 at 16:13, Richard Biener  wrote:

On Thu, 3 Nov 2016, Prathamesh Kulkarni wrote:


Hi Richard,
The attached patch tries to fix PR35691, by adding the following two
transforms to match.pd:
(x == 0 && y == 0) -> (x | typeof(x)(y)) == 0.
(x != 0 || y != 0) -> (x | typeof(x)(y)) != 0.

For GENERIC, the "and" operator is truth_andif_expr, and it seems for GIMPLE,
it gets transformed to bit_and_expr
so to match for both GENERIC and GIMPLE, I had to guard the for-stmt:

#if GENERIC
(for op (truth_andif truth_orif)
#elif GIMPLE
(for op (bit_and bit_ior)
#endif

Is that OK ?


As you are not removing the fold-const.c variant I'd say you should
simply not look for truth_* and only handle GIMPLE.  Note that we
have tree-ssa-ifcombine.c which should handle the variant with
control-flow (but I guess it does not and your patch wouldn't help
it either).

The transform would also work for vectors (element_precision for
the test but also a value-matching zero which should ensure the
same number of elements).

Um sorry, I didn't get how to check vectors to be of equal length by a
matching zero.
Could you please elaborate on that ?


He may have meant something like:

  (op (cmp @0 integer_zerop@2) (cmp @1 @2))

So the last operand is checked with operand_equal_p instead of 
integer_zerop. But the fact that we could compute bit_ior on the 
comparison results should already imply that the number of elements is the 
same. This would also prevent the case where one vector is signed and the 
other unsigned, which requires a view_convert (I don't remember if convert 
automatically becomes view_convert here as in fold_convert or not).


For (some_int == 0) & (some_long == 0), doing ((long)some_int | some_long) 
== 0 should also work (and it doesn't matter if we pick a sign- or 
zero-extension), but that's more complicated, not necessary for a first 
version.


On platforms that have IOR on floats (at least x86 with SSE, maybe some 
vector mode on s390?), it would be cool to do the same for floats (most 
likely at the RTL level).


--
Marc Glisse

Re: [Patch, testsuite] Add new effective-target_store_merge

2016-11-03 Thread Mike Stump

On Nov 3, 2016, at 2:29 AM, Senthil Kumar Selvaraj 
 wrote:
> 
>  The below patch adds a new effective target keyword (store_merge) for
>  use in the store_merging_xxx.c tests.
> 
>  The tests currently require non_strict_align, but they still fail for the 
> avr.
>  Eyeballing the dump, I found that the pass doesn't attempt merging as it is
>  unprofitable for a target like the avr which has only single byte
>  stores.
> 
>  I figured store merging is unlikely to be profitable for targets with
>  smallish word sizes, and added a check_effective_target_store_merge
>  that combines non_strict_align and int32plus.
> 
>  Is this ok for trunk?

Ok.

If anyone knows of a reason why this would turn off these tests for a target 
that currently tests them and works ok, let us know.

[PATCH, GCC/ARM] Fix PR77904: callee-saved register trashed when clobbering sp

2016-11-03 Thread Thomas Preudhomme


Hi,

When using a callee-saved register to save the frame pointer the Thumb-1 
prologue fails to save the callee-saved register before that. For ARM and 
Thumb-2 targets the frame pointer is handled as a special case but nothing is 
done for Thumb-1 targets. This patch adds the same logic for Thumb-1 targets.


ChangeLog entries are as follow:

*** gcc/ChangeLog ***

2016-11-02  Thomas Preud'homme  

PR target/77904
* config/arm/arm.c (thumb1_compute_save_reg_mask): mark frame pointer
in save register mask if it is needed.


*** gcc/testsuite/ChangeLog ***

2016-11-02  Thomas Preud'homme  

PR target/77904
* gcc.target/arm/pr77904.c: New test.


Testing: Testsuite shows no regression when run with arm-none-eabi GCC 
cross-compiler for Cortex-M0 target.


Is this ok for trunk?

Best regards,

Thomas
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index dd8d5e5db8ca50daab648e58df290969aa794862..c7bf3320a3db5dfc4f33ae145ff2e5f239d6c0f9 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -19495,6 +19495,10 @@ thumb1_compute_save_reg_mask (void)
 if (df_regs_ever_live_p (reg) && callee_saved_reg_p (reg))
   mask |= 1 << reg;
 
+  /* Handle the frame pointer as a special case.  */
+  if (frame_pointer_needed)
+mask |= 1 << HARD_FRAME_POINTER_REGNUM;
+
   if (flag_pic
   && !TARGET_SINGLE_PIC_BASE
   && arm_pic_register != INVALID_REGNUM
diff --git a/gcc/testsuite/gcc.target/arm/pr77904.c b/gcc/testsuite/gcc.target/arm/pr77904.c
new file mode 100644
index ..76728c07e73350ce44160cabff3dd2fa7a6ef021
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr77904.c
@@ -0,0 +1,45 @@
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+__attribute__ ((noinline, noclone)) void
+clobber_sp (void)
+{
+  __asm volatile ("" : : : "sp");
+}
+
+int
+main (void)
+{
+  int ret;
+
+  __asm volatile ("mov\tr4, #0xf4\n\t"
+		  "mov\tr5, #0xf5\n\t"
+		  "mov\tr6, #0xf6\n\t"
+		  "mov\tr7, #0xf7\n\t"
+		  "mov\tr0, #0xf8\n\t"
+		  "mov\tr8, r0\n\t"
+		  "mov\tr0, #0xfa\n\t"
+		  "mov\tr10, r0"
+		  : : : "r0", "r4", "r5", "r6", "r7", "r8", "r10");
+  clobber_sp ();
+
+  __asm volatile ("cmp\tr4, #0xf4\n\t"
+		  "bne\tfail\n\t"
+		  "cmp\tr5, #0xf5\n\t"
+		  "bne\tfail\n\t"
+		  "cmp\tr6, #0xf6\n\t"
+		  "bne\tfail\n\t"
+		  "cmp\tr7, #0xf7\n\t"
+		  "bne\tfail\n\t"
+		  "mov\tr0, r8\n\t"
+		  "cmp\tr0, #0xf8\n\t"
+		  "bne\tfail\n\t"
+		  "mov\tr0, r10\n\t"
+		  "cmp\tr0, #0xfa\n\t"
+		  "bne\tfail\n\t"
+		  "mov\t%0, #1\n"
+		  "fail:\n\t"
+		  "sub\tr0, #1"
+		  : "=r" (ret) : :);
+  return ret;
+}

[PATCH] Fix dwarf2out related bootstrap failure on Solaris (PR debug/78191)

2016-11-03 Thread Jakub Jelinek

Hi!

My recent optimize_abbrev_table optimization apparently broke Solaris
bootstrap.
The bug is specific I think just to -gdwarf-{2,3} -gno-strict-dwarf, where
DW_FORM_exprloc can't be used for location expressions and some location
expression contains DW_OP_GNU_{{const,regval,deref}_type,convert,reinterpret}
and there are at least 128 die abbreviations used by more than one DIE.
Because of the lack of DW_FORM_exprloc we need to decide on
DW_FORM_block{1,2,4} and size the location expression, for that we need to know
the DIE offsets of base offset DIEs that are referenced (earlier code
ensures such types appear very early in the CU, right after the
DW_TAG_compile_unit DIE) and for that their size and their abbreviation values
need to be constant, so the optimize_abbrev_table opts that reorder abbreviation
numbers by decreasing usage count or for -gdwarf-5 attempt to optimize implicit
constants can't be done for the CU and base types.
For -gdwarf-4 and above, we can emit DW_FORM_exprloc, so value_format doesn't
need to compute any sizes and thus calc_base_type_die_sizes shouldn't be
called.

Bootstrapped/regtested on x86_64-linux and i686-linux and Rainer has kindly
tested it on Solaris, ok for trunk?

2016-11-03  Jakub Jelinek  

* dwarf2out.c (size_of_discr_list): Fix typo in function comment.

PR debug/78191
* dwarf2out.c (abbrev_opt_base_type_end): New variable.
(die_abbrev_cmp): Sort dies with die_abbrev smaller than
abbrev_opt_base_type_end only by increasing die_abbrev, before
any other dies.
(optimize_abbrev_table): Don't change abbrev numbers of
base types and CU or optimize implicit consts in them if
calc_base_type_die_sizes has been called during build_abbrev_table.
(calc_base_type_die_sizes): If abbrev_opt_start, set
abbrev_opt_base_type_end to one plus largest base type's
die_abbrev.

--- gcc/dwarf2out.c.jj  2016-11-03 08:47:59.0 +0100
+++ gcc/dwarf2out.c 2016-11-03 12:26:03.192459170 +0100
@@ -1909,7 +1909,7 @@ size_of_discr_value (dw_discr_value *dis
 return size_of_sleb128 (discr_value->v.sval);
 }
 
-/* Return the size of the value in a DW_discr_list attribute.  */
+/* Return the size of the value in a DW_AT_discr_list attribute.  */
 
 static int
 size_of_discr_list (dw_discr_list_ref discr_list)
@@ -8548,6 +8548,11 @@ optimize_external_refs (dw_die_ref die)
 /* First abbrev_id that can be optimized based on usage.  */
 static unsigned int abbrev_opt_start;
 
+/* Maximum abbrev_id of a base type plus one (we can't optimize DIEs with
+   abbrev_id smaller than this, because they must be already sized
+   during build_abbrev_table).  */
+static unsigned int abbrev_opt_base_type_end;
+
 /* Vector of usage counts during build_abbrev_table.  Indexed by
abbrev_id - abbrev_opt_start.  */
 static vec abbrev_usage_count;
@@ -8646,12 +8651,16 @@ die_abbrev_cmp (const void *p1, const vo
   gcc_checking_assert (die1->die_abbrev >= abbrev_opt_start);
   gcc_checking_assert (die2->die_abbrev >= abbrev_opt_start);
 
-  if (abbrev_usage_count[die1->die_abbrev - abbrev_opt_start]
-  > abbrev_usage_count[die2->die_abbrev - abbrev_opt_start])
-return -1;
-  if (abbrev_usage_count[die1->die_abbrev - abbrev_opt_start]
-  < abbrev_usage_count[die2->die_abbrev - abbrev_opt_start])
-return 1;
+  if (die1->die_abbrev >= abbrev_opt_base_type_end
+  && die2->die_abbrev >= abbrev_opt_base_type_end)
+{
+  if (abbrev_usage_count[die1->die_abbrev - abbrev_opt_start]
+ > abbrev_usage_count[die2->die_abbrev - abbrev_opt_start])
+   return -1;
+  if (abbrev_usage_count[die1->die_abbrev - abbrev_opt_start]
+ < abbrev_usage_count[die2->die_abbrev - abbrev_opt_start])
+   return 1;
+}
 
   /* Stabilize the sort.  */
   if (die1->die_abbrev < die2->die_abbrev)
@@ -8729,10 +8738,12 @@ optimize_abbrev_table (void)
   sorted_abbrev_dies.qsort (die_abbrev_cmp);
 
   unsigned int abbrev_id = abbrev_opt_start - 1;
-  unsigned int first_id = 0;
+  unsigned int first_id = ~0U;
   unsigned int last_abbrev_id = 0;
   unsigned int i;
   dw_die_ref die;
+  if (abbrev_opt_base_type_end > abbrev_opt_start)
+   abbrev_id = abbrev_opt_base_type_end - 1;
   /* Reassign abbreviation ids from abbrev_opt_start above, so that
 most commonly used abbreviations come first.  */
   FOR_EACH_VEC_ELT (sorted_abbrev_dies, i, die)
@@ -8740,10 +8751,15 @@ optimize_abbrev_table (void)
  dw_attr_node *a;
  unsigned ix;
 
+ /* If calc_base_type_die_sizes has been called, the CU and
+base types after it can't be optimized, because we've already
+calculated their DIE offsets.  We've sorted them first.  */
+ if (die->die_abbrev < abbrev_opt_base_type_end)
+   continue;
  if (die->die_abbrev != last_abbrev_id)
{

Re: [PATCH] Print repeated rtl vector elements in a nicer way

2016-11-03 Thread Bernd Schmidt


On 11/03/2016 05:35 PM, Martin Jambor wrote:


* print-rtl.c (print_rtx_operand_codes_E_and_V): Print how many times
an element is repeated istead of printing each repeated element.


"instead"


---
 gcc/print-rtl.c | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/gcc/print-rtl.c b/gcc/print-rtl.c
index 341ecdf..9752c85 100644
--- a/gcc/print-rtl.c
+++ b/gcc/print-rtl.c
@@ -252,7 +252,20 @@ rtx_writer::print_rtx_operand_codes_E_and_V (const_rtx 
in_rtx, int idx)
m_sawclose = 1;

   for (int j = 0; j < XVECLEN (in_rtx, idx); j++)
-   print_rtx (XVECEXP (in_rtx, idx, j));
+   {
+ int j1;
+
+ print_rtx (XVECEXP (in_rtx, idx, j));
+ for (j1 = j + 1; j1 < XVECLEN (in_rtx, idx); j1++)
+   if (XVECEXP (in_rtx, idx, j) != XVECEXP (in_rtx, idx, j1))
+ break;
+
+ if (j1 != j + 1)
+   {
+ fprintf (m_outfile, " repeated %ix", j1 - j);
+ j = j1;
+   }
+   }


Good idea, but can you give an example of how this looks in practice? 
Also, it would be nice (and necessary for David's rtl-testing) to also 
teach the rtl reader to parse this format.



Bernd

[PATCH] DW_TAG_ptr_to_member_type for PMF and DW_AT_{,rvalue_}reference for those (take 2)

2016-11-03 Thread Jakub Jelinek

On Wed, Nov 02, 2016 at 01:19:09PM -0400, Jason Merrill wrote:
> On Wed, Nov 2, 2016 at 12:33 PM, Jakub Jelinek  wrote:
> > On Wed, Nov 02, 2016 at 04:44:05PM +0100, Jakub Jelinek wrote:
> >> which means if gen_type_die or gen_type_die_with_usage doesn't
> >> use the langhook, then we'd emit a completely useless { __pfn; __delta }
> >> struct into debug info first, and then in modified_type_die used
> >> the langhook, get OFFSET_TYPE and probably create the
> >> DW_TAG_ptr_to_member_type.  So I think we really need that.
> >>
> >> > > 2) it is used for something Ada-ish I really don't know how to test 
> >> > > etc.
> >> > >to be able to find out if it is safe to call it in
> >> > >gen_type_die_with_usage too
> >> >
> >> > You could find an Ada test that uses the code and verify that the
> >> > output stays the same?
> >>
> >> I can try to find the patch that introduced it and if it contained any
> >> testcases.
> >
> > I couldn't find any unfortunately.  Pierre, could you please test if the
> > following patch doesn't regress anything in the Ada debug info area?
> >
> > Here is updated patch I'm going to bootstrap/regtest; it generates the
> > same debuginfo on ref-3.C testcase.
> 
> Looks good.

Keith's testing revealed a bug in the patch, that we don't emit
DW_TAG_typedef for PMF typedefs.  Fixed by adding && !typedef_variant_p (type)
check to the cp_get_debug_type hook, so that we emit the proper typedef
with the right name and only its DECL_ORIGINAL_TYPE is replaced with
OFFSET_TYPE handling.  I believe that was the only issue, so I think it
should be ok to enable it always, not just for -gdwarf-5.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-11-03  Jakub Jelinek  
Alexandre Oliva  
Jason Merrill  

PR debug/28767
PR debug/56974
* langhooks.h (struct lang_hooks_for_types): Document type_hash_eq
being also called on METHOD_TYPEs.  Add type_dwarf_attribute langhook.
* langhooks.c (lhd_type_dwarf_attribute): New function.
* langhooks-def.h (lhd_type_dwarf_attribute): Declare.
(LANG_HOOKS_TYPE_DWARF_ATTRIBUTE): Define.
(LANG_HOOKS_FOR_TYPES_INITIALIZER): Add
LANG_HOOKS_TYPE_DWARF_ATTRIBUTE.
* tree.h (check_lang_type): Declare.
* tree.c (check_lang_type): New function.
(check_qualified_type, check_aligned_type): Call it.
* dwarf2out.c (modified_type_die): Don't use type_main_variant
for FUNCTION_TYPE or METHOD_TYPE, instead walk over variants with
check_base_type and check_lang_type.
(gen_ptr_to_mbr_type_die): If lookup_type_die is already non-NULL,
return early.  For pointer-to-data-member add DW_AT_use_location
attribute.
(gen_subroutine_type_die): Add DW_AT_{,rvalue_}reference attribute
if needed.
(gen_type_die_with_usage): Don't use type_main_variant
for FUNCTION_TYPE or METHOD_TYPE, instead walk over variants with
check_base_type and check_lang_type.  Formatting fixes. Call
get_debug_type langhook.
cp/
* tree.c (cp_check_qualified_type): Use check_base_type and
TYPE_QUALS comparison instead of check_qualified_type.
(cxx_type_hash_eq): Return false if type_memfn_rqual don't match.
* cp-objcp-common.c (cp_get_debug_type): New function.
(cp_decl_dwarf_attribute): Don't handle types here.
(cp_type_dwarf_attribute): New function.
* cp-objcp-common.h (cp_get_debug_type, cp_type_dwarf_attribute):
Declare.
(LANG_HOOKS_GET_DEBUG_TYPE, LANG_HOOKS_TYPE_DWARF_ATTRIBUTE):
Define.
testsuite/
* g++.dg/debug/dwarf2/ptrdmem-1.C: New test.
* g++.dg/debug/dwarf2/ref-3.C: New test.
* g++.dg/debug/dwarf2/ref-4.C: New test.
* g++.dg/debug/dwarf2/refqual-1.C: New test.
* g++.dg/debug/dwarf2/refqual-2.C: New test.

--- gcc/langhooks.h.jj  2016-11-03 08:47:59.373747992 +0100
+++ gcc/langhooks.h 2016-11-03 13:08:27.003538880 +0100
@@ -120,7 +120,7 @@ struct lang_hooks_for_types
   /* Return TRUE if TYPE1 and TYPE2 are identical for type hashing purposes.
  Called only after doing all language independent checks.
  At present, this function is only called when both TYPE1 and TYPE2 are
- FUNCTION_TYPEs.  */
+ FUNCTION_TYPE or METHOD_TYPE.  */
   bool (*type_hash_eq) (const_tree, const_tree);

   /* Return TRUE if TYPE uses a hidden descriptor and fills in information
@@ -162,6 +162,10 @@ struct lang_hooks_for_types
  for the debugger about scale factor, etc.  */
   bool (*get_fixed_point_type_info) (const_tree,
 struct fixed_point_type_info *);
+
+  /* Returns -1 if dwarf ATTR shouldn't be added for TYPE, or the attribute
+ value otherwise.  */
+  int (*type_dwarf_attribute) (const_tree, int);
 };

 /* Language hooks

[PATCH] Print repeated rtl vector elements in a nicer way

2016-11-03 Thread Martin Jambor

Hi,

this patch comes from our GCN BE branch.  Because vectors of that
architectures have many elements, RTL dumps can get quite unwieldy when
all elements are printed out all the time.  However, pretty much all the
time it is the same value repeated over and over again, which lead us to
the following patch which just prints how many times a given value
occurs.

Honza has asked me yesterday to submit this hunk upstream so here it
is.  It has passed bootstrap and testing on x86_64-linux and
aarch64-linux, OK for trunk?

Thanks,

Martin


2016-11-02  Jan Hubicka  
Martin Jambor  

* print-rtl.c (print_rtx_operand_codes_E_and_V): Print how many times
an element is repeated istead of printing each repeated element.
---
 gcc/print-rtl.c | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/gcc/print-rtl.c b/gcc/print-rtl.c
index 341ecdf..9752c85 100644
--- a/gcc/print-rtl.c
+++ b/gcc/print-rtl.c
@@ -252,7 +252,20 @@ rtx_writer::print_rtx_operand_codes_E_and_V (const_rtx 
in_rtx, int idx)
m_sawclose = 1;
 
   for (int j = 0; j < XVECLEN (in_rtx, idx); j++)
-   print_rtx (XVECEXP (in_rtx, idx, j));
+   {
+ int j1;
+
+ print_rtx (XVECEXP (in_rtx, idx, j));
+ for (j1 = j + 1; j1 < XVECLEN (in_rtx, idx); j1++)
+   if (XVECEXP (in_rtx, idx, j) != XVECEXP (in_rtx, idx, j1))
+ break;
+
+ if (j1 != j + 1)
+   {
+ fprintf (m_outfile, " repeated %ix", j1 - j);
+ j = j1;
+   }
+   }
 
   m_indent -= 2;
 }
-- 
2.10.1

Re: [ipa-vrp] ice in set_value_range

2016-11-03 Thread Martin Jambor

Hi,

On Fri, Oct 28, 2016 at 01:58:13PM +1100, kugan wrote:
> > Do I understand it correctly that extract_range_from_unary_expr deals
> > with any potential type conversions better (compared to what you did
> > before here)?
> 
> Yes, this can be wrong at times too as reported in
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78121. I have separated this
> part of the patch with a testcase.
> 
> Please note that I am using fold_convert in the attached patch.
> 
> Bootstrapped and regression tested on x86_64-linux-gnu with no new
> regressions. Is this OK for trunk?
> 

I have no objections, but we need to wait for Honza.

Thanks,

Martin

> Thanks,
> Kugan
> 
> 
> gcc/ChangeLog:
> 
> 2016-10-28  Kugan Vivekanandarajah  
> 
>   PR ipa/78121
>   * ipa-cp.c (propagate_vr_accross_jump_function): Pass param type.
>   Also fold constant passed as argument while computing value range.
>   (propagate_constants_accross_call): Pass param type.
>   * ipa-prop.c: export ipa_get_callee_param_type.
>   * ipa-prop.h: export ipa_get_callee_param_type.
> 
> gcc/testsuite/ChangeLog:
> 
> 2016-10-28  Kugan Vivekanandarajah  
> 
>   PR ipa/78121
>   * gcc.dg/ipa/pr78121.c: New test.

[openacc] add warnings for unused parallelism

2016-11-03 Thread Cesar Philippidis

OpenACC permits the user to request more gang, worker and vector level
parallelism than what the compiler can utilize. For instance, if the
user writes worker routine without including a worker-partitioned loop,
the compiler will not generate worker-partitioned code for that function.

The intent behind this patch is to warn the user of any potentially
unutilized parallelism as a debugging aid. Users often find it
disconcerting when their code doesn't speed up despite explicitly
setting num_gangs, num_workers and vector_length. This patch at least
warns them not to expect parallelism across a specific axis.

Is this patch OK for trunk? This patch was originally posted by Nathan
here . Most of
the changes in that patch are already in trunk.

Cesar
2016-11-03  Cesar Philippidis  
	Nathan Sidwell 

	gcc/
	* omp-low.c (oacc_validate_dims): Emit warnings about strange
	partitioning choices.

	gcc/testsuite/
	* c-c++-common/goacc/pr70688.c (parallel_reduction): Adjust expected
	warnings.
	* c-c++-common/goacc/routine-1.c: Likewise.
	* c-c++-common/goacc/uninit-dim-clause.c: Likewise.
	* gcc.dg/goacc/loop-processing-1.c (void vector_1): Likewise.
	* gfortran.dg/goacc/parallel-tree.f95: Likewise.
	* gfortran.dg/goacc/routine-4.f90: Likewise.
	* gfortran.dg/goacc/uninit-dim-clause.f95: Likewise.
	* gfortran.dg/goacc/vector_length.f90: Likewise.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/crash-1.c: Adjust to account
	for insufficient parallelism warnings.
	* testsuite/libgomp.oacc-c-c++-common/firstprivate-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-g-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-g-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-g-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-w-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-w-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/mode-transitions.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/private-variables.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-7.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-initial-1.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/routine-g-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/routine-w-1.c: Likewise.
	* testsuite/libgomp.oacc-fortran/private-variables.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/routine-7.f90: Likewise.

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index e5b9e4c..cbf4f3e 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -18882,6 +18882,36 @@ oacc_validate_dims (tree fn, tree attrs, int *dims, int level, unsigned used)
   pos = TREE_CHAIN (pos);
 }
 
+  bool check = true;
+#ifdef ACCEL_COMPILER
+  /* When device_type is implemented, we should also check on the
+ target, if device_type has been used to affect the partitioning
+ and/or dimensions.  */
+  check = false;
+#endif
+  if (!is_kernel && check)
+{
+  static char const *const axes[] =
+  /* Must be kept in sync with GOMP_DIM enumeration.  */
+	{"gang", "worker", "vector" };
+  for (ix = level >= 0 ? level : 0; ix != GOMP_DIM_MAX; ix++)
+	if (dims[ix] < 0)
+	  ; /* Defaulting axis.  */
+	else if ((used & GOMP_DIM_MASK (ix)) && dims[ix] == 1)
+	  /* There is partitioned execution, but the user requested a
+	 dimension size of 1.  They're probably confused.  */
+	  warning_at (DECL_SOURCE_LOCATION (fn), 0,
+		  "region contains %s partitoned code but"
+		  " is not %s partitioned", axes[ix], axes[ix]);
+	else if (!(used & GOMP_DIM_MASK (ix)) && dims[ix] != 1)
+	  /* The dimension is explicitly partitioned to non-unity, but
+	 no use is made within the region.  */
+	  warning_at (DECL_SOURCE_LOCATION (fn), 0,
+		  "region is %s partitioned but"
+		  " does not contain %s partitioned code",
+		  axes[ix], axes[ix]);
+}
+
   bool changed = targetm.goacc.validate_dims (fn, dims, level);
 
   /* Default anything left to 1 or a partitioned default.  */
diff --git a/gcc/testsuite/c-c++-common/goacc/pr70688.c b/gcc/testsuite/c-c++-common/goacc/pr70688.c
index 5a23665..37c3885 100644
--- a/gcc/testsuite/c-c++-common/goacc/pr70688.c
+++ b/gcc/testsuite/c-c++-common/goacc/pr70688.c
@@ -1,3 +1,5 @@
+/* { dg-compile } */
+
 const int n = 100;
 
 int
@@ -21,7 +23,7 @@ parallel_reduction ()
 
 #pragma acc data copy (dummy)
   {
-#pragma acc parallel num_gangs (10) copy (sum) reduction (+:sum)
+#pragma acc parallel num_gangs (10) copy (sum) reduction (+:sum) /* { dg-warning "region is gang partitioned" } */
 {
   int v = 5;
   sum += 10 + v;
@@ -36,11 +38,11 @@ main ()
 {
   int i, s = 0;
 
-#pragma acc parallel num_gangs (10) copy (s) reduction (+:s)
+#pragma acc parallel num_gangs (10) copy (s) reduction (+:s) /* {

Re: [PATCH] Add mcpu flag for Qualcomm falkor core

2016-11-03 Thread Andrew Pinski

On Thu, Nov 3, 2016 at 3:03 AM, Siddhesh Poyarekar
 wrote:
> This adds an mcpu option for the upcoming Qualcomm Falkor core.  This
> is identical to the qdf24xx part that was added earlier and hence
> retains the same tuning structure and continues to have the a57
> pipeline for now.  The part number has also been changed and this
> patch fixes this for both qdf24xx and falkor options.
>
> Tested aarch64-linux-gnu and arm-linux-gnuabihf and did not find any
> regressions resulting from this patch.


This patch no longer applies after the recent changes (starting around
a month ago) to aarch64-cores.def.  Please updat the patch for the
recent changes

Thanks,
Andrew Pinski

>
> Siddhesh
>
> * gcc/config/aarch64/aarch64-cores.def (qdf24xx): Update part
> number.
> (falkor): New core.
> * gcc/config/aarch64/aarch64-tune.md: Regenerated.
> * gcc/config/arm/arm-cores.def (falkor): New core.
> * gcc/config/arm/arm-tables.opt: Regenerated.
> * gcc/config/arm/arm-tune.md: Regenerated.
> * gcc/config/arm/bpabi.h (BE8_LINK_SPEC): Add falkor support.
> * gcc/config/arm/t-aprofile (MULTILIB_MATCHES): Add falkor
> support.
> * gcc/doc/invoke.texi (AArch64 Options/-mtune): Add falkor.
> (ARM Options/-mtune): Add falkor.
>
> ---
>  gcc/config/aarch64/aarch64-cores.def | 3 ++-
>  gcc/config/aarch64/aarch64-tune.md   | 2 +-
>  gcc/config/arm/arm-cores.def | 1 +
>  gcc/config/arm/arm-tables.opt| 3 +++
>  gcc/config/arm/arm-tune.md   | 2 +-
>  gcc/config/arm/bpabi.h   | 2 ++
>  gcc/config/arm/t-aprofile| 1 +
>  gcc/doc/invoke.texi  | 9 +
>  8 files changed, 16 insertions(+), 7 deletions(-)
>
> diff --git a/gcc/config/aarch64/aarch64-cores.def 
> b/gcc/config/aarch64/aarch64-cores.def
> index d9da257..0aad9a9 100644
> --- a/gcc/config/aarch64/aarch64-cores.def
> +++ b/gcc/config/aarch64/aarch64-cores.def
> @@ -46,7 +46,8 @@ AARCH64_CORE("cortex-a57",  cortexa57, cortexa57, 8A,  
> AARCH64_FL_FOR_ARCH8 | AA
>  AARCH64_CORE("cortex-a72",  cortexa72, cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 
> | AARCH64_FL_CRC, cortexa72, "0x41", "0xd08")
>  AARCH64_CORE("cortex-a73",  cortexa73, cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 
> | AARCH64_FL_CRC, cortexa73, "0x41", "0xd09")
>  AARCH64_CORE("exynos-m1",   exynosm1,  exynosm1,  8A,  AARCH64_FL_FOR_ARCH8 
> | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, exynosm1,  "0x53", "0x001")
> -AARCH64_CORE("qdf24xx", qdf24xx,   cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 
> | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, qdf24xx,   "0x51", "0x800")
> +AARCH64_CORE("qdf24xx", qdf24xx,   cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 
> | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, qdf24xx,   "0x51", "0xC00")
> +AARCH64_CORE("falkor",  falkor,cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 
> | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, qdf24xx,   "0x51", "0xC00")
>  AARCH64_CORE("thunderx",thunderx,  thunderx,  8A,  AARCH64_FL_FOR_ARCH8 
> | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx,  "0x43", "0x0a1")
>  AARCH64_CORE("xgene1",  xgene1,xgene1,8A,  AARCH64_FL_FOR_ARCH8, 
> xgene1, "0x50", "0x000")
>
> diff --git a/gcc/config/aarch64/aarch64-tune.md 
> b/gcc/config/aarch64/aarch64-tune.md
> index 022b131..29afcdf 100644
> --- a/gcc/config/aarch64/aarch64-tune.md
> +++ b/gcc/config/aarch64/aarch64-tune.md
> @@ -1,5 +1,5 @@
>  ;; -*- buffer-read-only: t -*-
>  ;; Generated automatically by gentune.sh from aarch64-cores.def
>  (define_attr "tune"
> -   
> "cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,exynosm1,qdf24xx,thunderx,xgene1,vulcan,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53"
> +   
> "cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,exynosm1,falkor,qdf24xx,thunderx,xgene1,vulcan,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53"
> (const (symbol_ref "((enum attr_tune) aarch64_tune)")))
> diff --git a/gcc/config/arm/arm-cores.def b/gcc/config/arm/arm-cores.def
> index 2072e1e..1a382e1 100644
> --- a/gcc/config/arm/arm-cores.def
> +++ b/gcc/config/arm/arm-cores.def
> @@ -174,6 +174,7 @@ ARM_CORE("cortex-a72",  cortexa72, cortexa57,   8A,   
>   ARM_FSET_MAKE_CPU1 (FL_LDSCHED
>  ARM_CORE("cortex-a73", cortexa73, cortexa57,   8A, ARM_FSET_MAKE_CPU1 
> (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), cortex_a73)
>  ARM_CORE("exynos-m1",  exynosm1,  exynosm1,8A, ARM_FSET_MAKE_CPU1 
> (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), exynosm1)
>  ARM_CORE("qdf24xx",qdf24xx,   cortexa57,   8A, ARM_FSET_MAKE_CPU1 
> (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), qdf24xx)
> +ARM_CORE("falkor", falkor,cortexa57,   8A, ARM_FSET_MAKE_CPU1 
> (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), qdf24xx)
>  ARM_CORE("xgene1",  xgene1,xgene1,  8A,ARM_FSET_MAKE_CPU1 
> (FL_LDSCHED | FL_FOR_ARCH8A),xgene1)
>
>  /* V8 big.LITTLE implementations

[SPARC] Remove couple of obsolete patterns

2016-11-03 Thread Eric Botcazou

The vec_interleave_{low, high} standard patterns were removed 5 years ago.

Tested on SPARC/Solaris, applied on the mainline.


2016-11-03  Eric Botcazou  

* config/sparc/sparc.md (vec_interleave_lowv8qi): Delete.
(vec_interleave_highv8qi): Likewise.

-- 
Eric BotcazouIndex: config/sparc/sparc.md
===
--- config/sparc/sparc.md	(revision 241808)
+++ config/sparc/sparc.md	(working copy)
@@ -8844,34 +8844,6 @@ (define_insn "fpmerge_vis"
  [(set_attr "type" "fga")
   (set_attr "fptype" "double")])
 
-(define_insn "vec_interleave_lowv8qi"
-  [(set (match_operand:V8QI 0 "register_operand" "=e")
-(vec_select:V8QI
-  (vec_concat:V16QI (match_operand:V8QI 1 "register_operand" "f")
-(match_operand:V8QI 2 "register_operand" "f"))
-  (parallel [(const_int 0) (const_int 8)
- (const_int 1) (const_int 9)
- (const_int 2) (const_int 10)
-		 (const_int 3) (const_int 11)])))]
- "TARGET_VIS"
- "fpmerge\t%L1, %L2, %0"
- [(set_attr "type" "fga")
-  (set_attr "fptype" "double")])
-
-(define_insn "vec_interleave_highv8qi"
-  [(set (match_operand:V8QI 0 "register_operand" "=e")
-(vec_select:V8QI
-  (vec_concat:V16QI (match_operand:V8QI 1 "register_operand" "f")
-(match_operand:V8QI 2 "register_operand" "f"))
-  (parallel [(const_int 4) (const_int 12)
- (const_int 5) (const_int 13)
- (const_int 6) (const_int 14)
-		 (const_int 7) (const_int 15)])))]
- "TARGET_VIS"
- "fpmerge\t%H1, %H2, %0"
- [(set_attr "type" "fga")
-  (set_attr "fptype" "double")])
-
 ;; Partitioned multiply instructions
 (define_insn "fmul8x16_vis"
   [(set (match_operand:V4HI 0 "register_operand" "=e")

[PATCH] combine lhs zero_extract fix (PR78186)

2016-11-03 Thread Segher Boessenkool

I managed to forget to mask the thing inserted.  Tested on powerpc64-linux
{-m32,-m64}, and Bin tested on arm.  Applying to trunk.


Segher


2016-11-03  Segher Boessenkool  

PR rtl-optimization/78186
* combine.c (change_zero_ext): Mask the RHS of a zero_extract as
well, when converting to IOR.

---
 gcc/combine.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/combine.c b/gcc/combine.c
index 7c21fe4..7ed0a62 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -11224,6 +11224,9 @@ change_zero_ext (rtx pat)
   rtx x = gen_rtx_AND (mode, reg, immed_wide_int_const (mask, mode));
   rtx y = simplify_gen_binary (ASHIFT, mode, SET_SRC (pat),
   GEN_INT (offset));
+  wide_int mask2 = wi::shifted_mask (offset, width, false, reg_width);
+  y = simplify_gen_binary (AND, mode, y,
+  immed_wide_int_const (mask2, mode));
   rtx z = simplify_gen_binary (IOR, mode, x, y);
   SUBST (SET_DEST (pat), reg);
   SUBST (SET_SRC (pat), z);
-- 
1.9.3

Re: [PATCH, RFC] Improve ivopts group costs

2016-11-03 Thread Bin.Cheng

On Thu, Nov 3, 2016 at 1:35 PM, Evgeny Kudryashov  wrote:
> Hello,
>
> I'm facing the following problem related to ivopts. The problem is that GCC
> generates a lot of induction variables and as a result there is an
> unnecessary increase of stack usage and register pressure.
>
> For instance, for the attached testcase (tc_ivopts.c) GCC generates 26
> induction variables (25 for each of lhsX[{0-4}][{0-4}][k] and one for
> rhs[k][j][{0-2}]). You should be able to reproduce this issue on arm target
> using GCC with "-O2 -mcpu=cortex-a9 -mtune=cortex-a9". For reference, I'm
> attaching assembly I get on current trunk.
>
> The reason might be in use groups costs, in particular, in the way of
> estimation. Currently, the cost of a tuple (group, candidate) is a sum of
> per-use costs in the group. So, the cost of a group grows proportional to
> the number of uses in the group. This approach has a negative effect on the
> algorithm for finding the best set of induction variables: the part of a
> total cost related to adding a new candidate is almost washed out by the
> cost of the group. In addition, when there is a lot of groups with many uses
> inside and a target is out of free registers, the cost of spill is washing
> out too. As a result, GCC prefers to use native candidates (candidate
> created for a particular group) for each group of uses instead of
> considering the real cost of introducing a new variable into a set.
>
> The summing approach was added as a part of this patch
> (https://gcc.gnu.org/ml/gcc-patches/2015-05/msg00641.html) and the
> motivation for taking the sum does not seem to be
> specifically discussed.
>
> I propose the following patch that changes a group cost from cost summing to
> selecting the largest one inside a group. For the given test case I have:
> necessary size of stack is decreased by almost 3 times and ldr\str amount
> are decreased by less than 2 times. Also I'm attaching assembly after
> applying the patch.
>
> The essential change in the patch is just:
>
> diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
> index f9211ad..a149418 100644
> --- a/gcc/tree-ssa-loop-ivopts.c
> +++ b/gcc/tree-ssa-loop-ivopts.c
> @@ -5151,7 +5151,8 @@ determine_group_iv_cost_address (struct ivopts_data
> *data,
>  offset and where the iv_use is.  */
> cost = get_computation_cost (data, next, cand, true,
>  NULL, _autoinc, NULL);
> -  sum_cost += cost;
> +  if (sum_cost < cost)
> +sum_cost = cost;
>  }
>set_group_iv_cost (data, group, cand, sum_cost, depends_on,
>  NULL_TREE, ERROR_MARK, inv_expr);
>
> Any suggestions?
Hi Evgeny,

I tend to be practical here.  Given cost is not model that well in
IVOPT, any heuristic tuning in cost is quite likely resulting in
improvement in some cases, while regressions in others.  At least we
need to run good number of benchmarks for such changes.  Specific to
this one, the summary of cost in a group is a natural result of the
original code, in which uses' cost is added up before candidate
selection.  Take your code as an example, choosing loop's original
candidate for group results in lots of memory accesses with [base +
index << scale] addressing mode, which could have higher cost than
[base + offset] mode wrto u-arch, IMHO, it makes sense to add up this
cost.  OTOH, I totally agree that IVOPT tends to choose too many
candidates at the moment, especially for large loops.  Is it possible
to achieve the same effect by penalizing register pressure cost?
Meanwhile, I can collect benchmark data for your patch on AArch64 and
get back to you later.

Thanks,
bin
>
>
> Thanks,
> Evgeny.

Re: [RFC] Check number of uses in simplify_cond_using_ranges().

2016-11-03 Thread Dominik Vogt

On Thu, Nov 03, 2016 at 04:29:05PM +0100, Eric Botcazou wrote:
> > Is VRP the right pass to do this optimisation or should a later
> > pass rather attempt to eliminate the new use of b_5 instead?  Uli
> > has brought up the idea a mini "sign extend elimination" pass that
> > checks if the result of a sign extend could be replaced by the
> > original quantity in all places, and if so, eliminate the ssa
> > name.
> 
> Did you try to enable -free on s390x?  It's a RTL pass.

Just tried it, but it does not seem to make a difference:

  Trying to eliminate extension:
  (insn 8 6 10 2 (set (reg:SI 1 %r1 [orig:63 b+-3 ] [63])
  (zero_extend:SI (reg/v:QI 2 %r2 [orig:60 b ] [60]))) y.c:9
  1258 {*zero_extendqisi2_extimm}
   (nil))
  Elimination opportunities = 1 realized = 0

I guess when in Rtl format, the code is already too convoluted to
be cleaned up.  Before combine - which should make use of some
advanced s390x instruction (risbg) but doesn't - Rtl is this:

(insn 2 4 3 2 (set (reg/v:DI 62 [ a+-7 ])
(reg:DI 2 %r2 [ a+-7 ])) y.c:3 1074 {*movdi_64}
# (SImode)(a & 63)
(insn 6 3 8 2 (parallel [
(set (subreg:SI (reg/v:QI 60 [ b ]) 0)
(and:SI (subreg/s/v:SI (reg/v:DI 62 [ a+-7 ]) 4)
(const_int 63 [0x3f])))
(clobber (reg:CC 33 %cc))
]) y.c:7 1526 {*andsi3_zarch}
(insn 8 6 9 2 (set (reg:SI 63 [ b+-3 ])
(zero_extend:SI (reg/v:QI 60 [ b ]))) y.c:9 1258
{*zero_extendqisi2_extimm}
# Compare quantity in SImode
(insn 9 8 10 2 (set (reg:CCU 33 %cc)
(compare:CCU (reg:SI 63 [ b+-3 ])
(const_int 9 [0x9]))) y.c:9 1047 {*cmpsi_ccu}
(jump_insn 10 9 11 2 (set (pc)
(if_then_else (leu (reg:CCU 33 %cc)
(const_int 0 [0]))
(label_ref:DI 18)
(pc))) y.c:9 1706 {*cjump_64}
# Extend to DImode
(insn 12 11 13 3 (set (reg:DI 64 [ l ])
(zero_extend:DI (reg/v:QI 60 [ b ]))) y.c:8 1257
{*zero_extendqidi2_extimm}
# Load DImode function argument for call to bar()
(insn 13 12 14 3 (set (reg:DI 2 %r2)
(reg:DI 64 [ l ])) y.c:10 1074 {*movdi_64}
...

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany

Re: [PATCH] bb-reorder: Improve compgotos pass (PR71785)

2016-11-03 Thread Segher Boessenkool

On Thu, Nov 03, 2016 at 09:29:07AM +0100, Richard Biener wrote:
> On Wed, Nov 2, 2016 at 4:27 PM, Segher Boessenkool
>  wrote:
> > On Wed, Nov 02, 2016 at 02:51:41PM +0100, Richard Biener wrote:
> >> >> I don't believe it needs a cleanup on every iteration. One cleanup at
> >> >> the end should work fine.
> >> >
> >> > But as the comment there says:
> >> >
> >> >   /* Merge the duplicated blocks into predecessors, when possible.  
> >> > */
> >> >   cleanup_cfg (0);
> >> >
> >> > (this is not a new comment), and without merging blocks this whole
> >> > patch does zilch.
> >> >
> >> > There is no need to create new basic blocks at all (can replace the
> >> > final branch in a block with a copy of the whole block it jumps to,
> >> > instead); and then it is painfully obvious that switching to layout
> >> > mode here is pointless, too.
> >> >
> >> > Do you want me to do that?
> >> >
> >> > Btw, this isn't quadratic anyway; it is a constant number (the maximal
> >> > length allowed of the block to copy) linear.  Unless there are 
> >> > unboundedly
> >> > many forwarder blocks, which there shouldn't be, but I cannot prove that.
> >>
> >> Well, you iterate calling functions (find candidates, merge, cleanup-cfg) 
> >> that
> >> walk the whole function.  So unless the number of iterations is bound
> >> with a constant I call this quadratic (in function size).
> >
> > It is bounded (with that caveat above): new candidates will be bigger than
> > the block merged into it, so there won't be more than
> > PARAM_MAX_GOTO_DUPLICATION_INSNS passes.
> >
> > But I can make it all work simpler, in non-layout mode, if you prefer.
> 
> Iteration is fine but we should restrict where we look for new
> candidates.  Likewise
> block merging opportunities should be easier to exploit than by
> running cfg-cleanup...

Yes, but that was what the original code does already, so I didn't look
deeper.  "It was just a simple bugfix".

> I'm just thinking of those gigantic machine-generated state machines where we
> ATM do quite well (at least at -O1 ...) with respect to compile-time.

Like I said, I tested it on a 2000 node VM, very artificial so almost
no code except the threading itself, and the compgotos pass takes less
than 1% time both before and after the patch.  I ony tested at -O2 though.

I'll get back to it, but first I have some things that need handling during
stage 1.


Segher

Re: [RFC] Check number of uses in simplify_cond_using_ranges().

2016-11-03 Thread Eric Botcazou

> Is VRP the right pass to do this optimisation or should a later
> pass rather attempt to eliminate the new use of b_5 instead?  Uli
> has brought up the idea a mini "sign extend elimination" pass that
> checks if the result of a sign extend could be replaced by the
> original quantity in all places, and if so, eliminate the ssa
> name.

Did you try to enable -free on s390x?  It's a RTL pass.

-- 
Eric Botcazou

Re: [PATCH] PR77359: Properly align local variables in functions calling alloca.

2016-11-03 Thread David Edelsohn

[Please cc me and Segher on PowerPC / AIX patches.]

I bootstrapped successfully with the patch and ran the testsuite.

Without patch:
https://gcc.gnu.org/ml/gcc-testresults/2016-11/msg00186.html

With patch:
https://gcc.gnu.org/ml/gcc-testresults/2016-11/msg00233.html

Not exactly the same revision, but close enough and no differences.

The patch doesn't break anything, but we still need to understand if
it is correct.

Thanks, David

Re: [patch,testsuite] Support dg-require-effective-target label_offsets.

2016-11-03 Thread Georg-Johann Lay


On 28.10.2016 17:58, Mike Stump wrote:

On Oct 27, 2016, at 3:16 AM, Georg-Johann Lay  wrote:


Now imagine some arithmetic like & - &  This might result in
one or two stub addresses, and difference between such addresses is a
complete different thing than the difference between the original labels:
The result might differ in absolute value and in sign, i.e. you can't even
determine whether LAB1 or LAB2 comes first in the generated code as the
order of stubs might differ from the order of respective labels.


So, I think this all doesn't matter any.  Every address gs(LAB) fits in
16-bits by definition, and every gs(LAB1) - gs(LAB2) fits into 16 bits and


Yes, you are right.  Label differences could be implemented.  AFAIK there is 
currently no activity by the Binutils folks to add respective support, so it is 
somewhat pointless to add that support to avr-gcc...


Bottom line is that label offsets are not supported by avr BE, and the 
intention of the patch is to reduce FAIL noise in testsuite results because of 
the missing support.


If a dg-require is not appropriate, should I rather add dg-skip-if to every 
test case?  I'd still prefer the dg-require solution because if the Binutils 
will ever support label differences, then the testsuite will automatically 
catch up with that.



thus is valid for all 16-bit one contexts.  The fact the order between the
stub and the actual code is different is irrelevant, it is a private


Only if the code is not executed; there are several test cases that compute 
relative placements of labels like:


x(){if(&&<0)x();b:goto*&e:;}

If a test ever relies on the fact that & - & tells anything about the 
ordering of the labels, the code is about to crash.



implementation detail of the port, the point is the semantics are fixed and
constant and useful.  In deed that there is even a stub is a private
implementation detail of the port.  I think the `extra' helpful warning from
avr_print_operand_address is wrong and should be removed.  Think of the


The following code simple makes absolutely no sense in the presence of stubs:
int
test (int foo)
{
  static void *dummy[] = { &, & };
  goto *((char *) & - 2 * (foo < 0));
a:
b:
  return 0;
}

It's only a compile test (the original PR66123 is about ICE), but the compiler 
cannot know that it's just a crazy test case that won't be executed...


When you are offsetting from a stub you might end up *anywhere*, even in some 
data range.



label as gs(LAB), not LAB, burn LAB from your mind.  Once you do that, you
see you can't talk about the order LAB1 > LAB2, the concept doesn't make any
sense.  The _only_ think you can talk about is gs(LAB1) > gs(LAB2).  And
because of that, it is always consistent and works just fine.

Once that misguided complains from gcc and bisutils are fixed, are their any
failing cases?


State of matters is that Binutils support is missing, and if I understand you 
correctly, dg-require is not appropriate to factor out availability of such 
features?


Johann

Re: [PATCH] Make direct emission of time profiler counter

2016-11-03 Thread Jan Hubicka

> 
> 2016-10-31  Martin Liska  
> 
>   * libgcov-profiler.c (__gcov_time_profiler): Remove.
>   (__gcov_time_profiler_atomic): Likewise.
> 
> gcc/ChangeLog:
> 
> 2016-10-31  Martin Liska  
> 
>   * profile.c (instrument_values): Fix coding style.
>   (branch_prob): Use renamed function.
>   * tree-profile.c (init_ic_make_global_vars): Likewise.
>   (gimple_init_edge_profiler): Rename to
>   gimple_init_gcov_profiler.
>   tree_time_profiler_counter variable declaration.
>   (gimple_gen_time_profiler): Rewrite to do a direct gimple code
>   emission.
>   * value-prof.h: Remove an argument.
> 
> gcc/testsuite/ChangeLog:
> 
> 2016-11-03  Martin Liska  
> 
>   * gcc.dg/no_profile_instrument_function-attr-1.c: Update scanned
>   output.
>   * gcc.dg/tree-prof/time-profiler-3.c: New test.

OK,
Thanks!
Honza

Re: [RFC] Check number of uses in simplify_cond_using_ranges().

2016-11-03 Thread Dominik Vogt

On Thu, Nov 03, 2016 at 04:03:22PM +0100, Dominik Vogt wrote:
> I've been trying to fix some bad tree-ssa related optimisation for
> s390x and come up with the attached experimental patch.  The patch
> is not really good - it breaks some situations in which the
> optimisation was useful.  With this code:
...

Missing patch attached.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany
>From c1ee9d85c47ce66aa0b5c97739717fefcb07b4ce Mon Sep 17 00:00:00 2001
From: Dominik Vogt 
Date: Wed, 2 Nov 2016 14:01:46 +0100
Subject: [PATCH] Check number of uses in simplify_cond_using_ranges().

---
 gcc/tree-vrp.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index 3c75a0d..e74d935 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -9599,7 +9599,12 @@ simplify_cond_using_ranges (gcond *stmt)
 with strict overflow semantics.  */
  && ((!is_negative_overflow_infinity (vr->min)
   && !is_positive_overflow_infinity (vr->max))
- || TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (innerop
+ || TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (innerop)))
+ /* If the only use of INNEROP is the cast to OP0, and OP0 is also
+used in other places, folding would introduce new uses of the
+otherwise dead INNEROP without eliminating OP0, so do not
+fold.  */
+ && (!has_single_use (innerop) || has_single_use (op0)))
{
  /* If the range overflowed and the user has asked for warnings
 when strict overflow semantics were used to optimize code,
-- 
2.3.0

[RFC] Check number of uses in simplify_cond_using_ranges().

2016-11-03 Thread Dominik Vogt

I've been trying to fix some bad tree-ssa related optimisation for
s390x and come up with the attached experimental patch.  The patch
is not really good - it breaks some situations in which the
optimisation was useful.  With this code:

  void bar(long);
  void foo (char a)
  {
long l;
char b;

b = a & 63;
l = b;
if (l > 9)
  bar(l);
  }

We get this representation before value range propagation:

  ...
  a_4 = *p_3(D);
  b_5 = a_4 & 63;
  l_6 = (long int) b_5;
  if (l_6 > 9)
  ...

Now, there's some code in tree-vrp.c:simplify_cond_using_ranges()
that folds b_5 into the if condition, because l_6 is just a sign
extension of b_5, and the value range of l_6 can also be
represented by the type of b (char).

  if (b_5 > 9)

(On s390x we end up with "a & 63" stored in two separate
registers, extended to 32 bits in one and to 64 bits in the other,
adding up to two unnecessary instructions.)

A naive idea to prevent folding in this situation was to suppress
it if it would introduce a second use of b_5 (i.e. b_5 was only
used in the cast before) while not eliminating all uses of l_6.
However, calling has_single_use() for both purposes proves to be
not good enough, and VRP does not do this kind of optimisation
yet.  It does not catch cases like

  if (l_6 > 9)
...
  else if (l_6 > 7)
...

where all occurences of l_6 could be replaced, and simply looking
at the use counts is too coarse.

--

Is VRP the right pass to do this optimisation or should a later
pass rather attempt to eliminate the new use of b_5 instead?  Uli
has brought up the idea a mini "sign extend elimination" pass that
checks if the result of a sign extend could be replaced by the
original quantity in all places, and if so, eliminate the ssa
name.  (I guess that won't help with the above code because l is
used also as a function argument.)  How could a sensible approach
to deal with the situation look like?

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany

[PATCH] Make direct emission of time profiler counter

2016-11-03 Thread Martin Liška

Hello.

As Honza noticed we spent quite some time in __gcov_time_profiler:

perf report for Postgres make check command:
   4.10%  postgres  postgres[.]__gcov_time_profiler

Thus I rewrote the profiling code directly to GIMPLE statements:

  _4 = __gcov7.main[0];
  if (_4 == 0)
goto ;
  else
goto ;

  :
  time_profile_5 = __gcov_time_profiler_counter;
  time_profile_6 = time_profile_5 + 1;
  __gcov7.main[0] = time_profile_6;
  __gcov_time_profiler_counter = time_profile_6;

thread-safe variant of the patch:

  _3 = __gcov7.foo[0];
  if (_3 == 0)
goto ;
  else
goto ;

  :
  time_profiler_counter_ptr_4 = (long int) &__gcov_time_profiler_counter;
  time_profile_5 = __atomic_add_fetch_8 (time_profiler_counter_ptr_4, 1, 0);
  time_profile_6 = (long int) time_profile_5;
  __gcov7.foo[0] = time_profile_6;


Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Ready to be installed?
Martin
>From 1a277ec6ef25cddbf0518b679fa8f1928cd3d891 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Thu, 27 Oct 2016 09:37:34 +0200
Subject: [PATCH] Make direct emission of time profiler counter

libgcc/ChangeLog:

2016-10-31  Martin Liska  

	* libgcov-profiler.c (__gcov_time_profiler): Remove.
	(__gcov_time_profiler_atomic): Likewise.

gcc/ChangeLog:

2016-10-31  Martin Liska  

	* profile.c (instrument_values): Fix coding style.
	(branch_prob): Use renamed function.
	* tree-profile.c (init_ic_make_global_vars): Likewise.
	(gimple_init_edge_profiler): Rename to
	gimple_init_gcov_profiler.
	tree_time_profiler_counter variable declaration.
	(gimple_gen_time_profiler): Rewrite to do a direct gimple code
	emission.
	* value-prof.h: Remove an argument.

gcc/testsuite/ChangeLog:

2016-11-03  Martin Liska  

	* gcc.dg/no_profile_instrument_function-attr-1.c: Update scanned
	output.
	* gcc.dg/tree-prof/time-profiler-3.c: New test.
---
 gcc/profile.c  |  14 +--
 .../gcc.dg/no_profile_instrument_function-attr-1.c |   2 +-
 gcc/testsuite/gcc.dg/tree-prof/time-profiler-3.c   |  22 +
 gcc/tree-profile.c | 106 -
 gcc/value-prof.h   |   5 +-
 libgcc/libgcov-profiler.c  |  23 +
 6 files changed, 111 insertions(+), 61 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-prof/time-profiler-3.c

diff --git a/gcc/profile.c b/gcc/profile.c
index 2564f07..ef38f98 100644
--- a/gcc/profile.c
+++ b/gcc/profile.c
@@ -192,15 +192,9 @@ instrument_values (histogram_values values)
 	  gimple_gen_ior_profiler (hist, t, 0);
 	  break;
 
-  case HIST_TYPE_TIME_PROFILE:
-{
-  basic_block bb =
- split_edge (single_succ_edge (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
-  gimple_stmt_iterator gsi = gsi_start_bb (bb);
-
-  gimple_gen_time_profiler (t, 0, gsi);
-  break;
-}
+	case HIST_TYPE_TIME_PROFILE:
+	  gimple_gen_time_profiler (t, 0);
+	  break;
 
 	default:
 	  gcc_unreachable ();
@@ -1305,7 +1299,7 @@ branch_prob (void)
 {
   unsigned n_instrumented;
 
-  gimple_init_edge_profiler ();
+  gimple_init_gcov_profiler ();
 
   n_instrumented = instrument_edges (el);
 
diff --git a/gcc/testsuite/gcc.dg/no_profile_instrument_function-attr-1.c b/gcc/testsuite/gcc.dg/no_profile_instrument_function-attr-1.c
index c93d171..e0c2600 100644
--- a/gcc/testsuite/gcc.dg/no_profile_instrument_function-attr-1.c
+++ b/gcc/testsuite/gcc.dg/no_profile_instrument_function-attr-1.c
@@ -19,5 +19,5 @@ int main ()
 
 /* { dg-final { scan-tree-dump-times "__gcov0\\.main.* = PROF_edge_counter" 1 "optimized"} } */
 /* { dg-final { scan-tree-dump-times "__gcov_indirect_call_profiler_v2" 1 "optimized" } } */
-/* { dg-final { scan-tree-dump-times "__gcov_time_profiler" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "__gcov_time_profiler_counter = " 1 "optimized" } } */
 /* { dg-final { scan-tree-dump-times "__gcov_init" 1 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-prof/time-profiler-3.c b/gcc/testsuite/gcc.dg/tree-prof/time-profiler-3.c
new file mode 100644
index 000..69ce026
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-prof/time-profiler-3.c
@@ -0,0 +1,22 @@
+/* { dg-options "-O2 -fdump-ipa-profile -fprofile-update=atomic" } */
+/* { dg-require-effective-target profile_update_atomic } */
+
+__attribute__ ((noinline))
+int foo()
+{
+  return 0;
+}
+
+__attribute__ ((noinline))
+int bar()
+{
+  return 1;
+}
+
+int main ()
+{
+  return foo ();
+}
+/* { dg-final-use-not-autofdo { scan-ipa-dump-times "Read tp_first_run: 0" 1 "profile"} } */
+/* { dg-final-use-not-autofdo { scan-ipa-dump-times "Read tp_first_run: 1" 1 "profile"} } */
+/* { dg-final-use-not-autofdo { scan-ipa-dump-times "Read tp_first_run: 2" 1 "profile"} } */
diff --git a/gcc/tree-profile.c b/gcc/tree-profile.c
index abeee92..09a702f 100644
--- a/gcc/tree-profile.c
+++ b/gcc/tree-profile.c
@@ -56,9 +56,9 @@

Re: [match.pd] Fix for PR35691

2016-11-03 Thread Prathamesh Kulkarni

On 3 November 2016 at 16:13, Richard Biener  wrote:
> On Thu, 3 Nov 2016, Prathamesh Kulkarni wrote:
>
>> Hi Richard,
>> The attached patch tries to fix PR35691, by adding the following two
>> transforms to match.pd:
>> (x == 0 && y == 0) -> (x | typeof(x)(y)) == 0.
>> (x != 0 || y != 0) -> (x | typeof(x)(y)) != 0.
>>
>> For GENERIC, the "and" operator is truth_andif_expr, and it seems for GIMPLE,
>> it gets transformed to bit_and_expr
>> so to match for both GENERIC and GIMPLE, I had to guard the for-stmt:
>>
>> #if GENERIC
>> (for op (truth_andif truth_orif)
>> #elif GIMPLE
>> (for op (bit_and bit_ior)
>> #endif
>>
>> Is that OK ?
>
> As you are not removing the fold-const.c variant I'd say you should
> simply not look for truth_* and only handle GIMPLE.  Note that we
> have tree-ssa-ifcombine.c which should handle the variant with
> control-flow (but I guess it does not and your patch wouldn't help
> it either).
>
> The transform would also work for vectors (element_precision for
> the test but also a value-matching zero which should ensure the
> same number of elements).
Um sorry, I didn't get how to check vectors to be of equal length by a
matching zero.
Could you please elaborate on that ?
Thanks!
>
> Richard.

Re: [patch, libgomp, OpenACC] Additional enter/exit data map handling

2016-11-03 Thread Chung-Lin Tang


Ping this patch again.

On 2016/9/21 12:43 AM, Cesar Philippidis wrote:
>> +/* Returns the number of mappings associated with the pointer or pset. PSET
>> > +   have three mappings, whereas pointer have two.  */
>> > +
>> >  static int
>> > -find_pset (int pos, size_t mapnum, unsigned short *kinds)
>> > +find_pointer (int pos, size_t mapnum, unsigned short *kinds)
>> >  {
>> >if (pos + 1 >= mapnum)
>> >  return 0;
>> >  
>> >unsigned char kind = kinds[pos+1] & 0xff;
>> >  
>> > -  return kind == GOMP_MAP_TO_PSET;
>> > +  if (kind == GOMP_MAP_TO_PSET)
>> > +return 3;
>> > +  else if (kind == GOMP_MAP_POINTER)
>> > +return 2;
>> > +
>> > +  return 0;
>> >  }
> Is this still necessary with the firstprivatization of subarrays
> pointers? Well, it might be for fortran. Conceptually, the gimplifier
> should prune out those unnecessary firstprivate pointer clauses for
> executable constructs such as enter/exit data and update.

It appears that GOMP_MAP_POINTER/GOMP_MAP_TO_PSET maps are currently
created only from the Fortran FE, so I think your description is accurate.

> Actually, this is one area in the spec where the intent of enter/exit
> data conflicts with what it describes. If you look at the runtime
> documentation for, say, acc_create, it states that
> 
>   acc_create (pvar, n*sizeof(var))
> 
> is equivalent to
> 
>   acc enter data create (pvar[n])
> 
> And to free acc_create, you use acc_delete. So in theory, you should be
> able to
> 
>   #pragma acc enter data create (pvar[n])
>   acc_free (pvar)
> 
> but this may result in a memory leak if the pointer mapping isn't freed.

Upon re-reading the OpenACC spec, it appears that acc_malloc/acc_free are 
supposed
to be "dumb" allocation/deallocation interfaces, i.e. the implementation is 
likely
to be something that directly wires to the alloc_func/free_func plugin hooks.
I don't think it's supposed to be something that works with the enter/exit data 
directives,
or anything that works on the maps managed by libgomp.

Chung-Lin

Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

2016-11-03 Thread Martin Liška

On 11/03/2016 03:03 PM, Jakub Jelinek wrote:
> On Thu, Nov 03, 2016 at 03:02:21PM +0100, Martin Liška wrote:
>>> But how would you be able to find out if there isn't any return *ptr; after
>>> the scope or similar (as MEM_REF)?  With is_gimple_reg, they will be turned
>>> into SSA form and you can easily verify (uses of ASAN_POISON are a problem
>>> if they are encountered at runtime).  What would you do for the
>>> must_live_in_memory vars?  Add some pass that detects it, handle it somehow
>>> in addressable pass, handle it in SRA, ... ?
>>
>> If there's return of *ptr, there must be a _char, and it looks
>>   _4 = MEM[(char *)_char];
>>
>> properly identifies that my_char has address taken.
> 
> It doesn't.  MEM_REF's ADDR_EXPR isn't considered to be address taking.
> 
>   Jakub
> 

You are of course right, my mistake in the patch draft.

Thanks for clarification,
Martin

[fixincludes, v3] Don't define libstdc++-internal macros in Solaris 10+

2016-11-03 Thread Rainer Orth

As I've noticed some time ago, recent versions of Solaris 
include this little gem:

#if __cplusplus >= 201103L
#undef  _GLIBCXX_USE_C99_MATH
#undef  _GLIBCXX_USE_C99_MATH_TR1
#endif

This renders a couple of perfectly good libstdc++ tests as UNSUPPORTED
and is completely unsustainable.  Agreement has now been reached with
the Solaris libm maintainers that this wasn't a good idea and the plan
is to do away with it at least in future Solaris 11.3 SRUs and Solaris 12.

Some digging discovered that we have 3 levels of support in  and
 overloads on Solaris 12
https://gcc.gnu.org/ml/gcc-patches/2015-11/msg02075.html

   and the 4.9 backport

[v3, 4.9] Handle C++11  overloads on Solaris 11, 12
https://gcc.gnu.org/ml/gcc-patches/2016-02/msg01477.html

   They were introduced in Solaris 10 Patch 11996[67]-02, Solaris 11.3
   SRU 3.6, and Solaris 12 Build 86.

3) Full C++11 math support, including the templates present in libstdc++
   , and the #undef's above.

   Introduced in the same Solaris 10 Patch, Solaris 11.3 SRU 5.6, and
   Solaris 12 Build 90.

To check for potential fallout, I've added a fixincludes fix to remove
the #undefs.  As expected, there was quite a bit, e.g.

+FAIL: 25_algorithms/pop_heap/complexity.cc (test for excess errors)
+WARNING: 25_algorithms/pop_heap/complexity.cc compilation failed to produce 
executable

/vol/gcc/src/hg/trunk/local/libstdc++-v3/testsuite/25_algorithms/pop_heap/complexity.cc:48:
 error: call of overloaded 'log2(const size_t&)' is ambiguous

In file included from 
/var/gcc/regression/trunk/12-gcc/build/gcc/include-fixed/math.h:25,
 from 
/var/gcc/regression/trunk/12-gcc/build/i386-pc-solaris2.12/libstdc++-v3/include/cmath:45,
 from 
/vol/gcc/src/hg/trunk/local/libstdc++-v3/include/precompiled/stdc++.h:41:
/usr/include/iso/math_c99.h:474: note: candidate: double std::log2(double)
In file included from 
/var/gcc/regression/trunk/12-gcc/build/gcc/include-fixed/math.h:25,
 from 
/var/gcc/regression/trunk/12-gcc/build/i386-pc-solaris2.12/libstdc++-v3/include/cmath:45,
 from 
/vol/gcc/src/hg/trunk/local/libstdc++-v3/include/precompiled/stdc++.h:41:
/usr/include/iso/math_c99.h:763: note: candidate: float std::log2(float)
/usr/include/iso/math_c99.h:805: note: candidate: long double std::log2(long 
double)
/usr/include/iso/math_c99.h:946: note: candidate: typename 
std::__math_impl::__enable_if::__value, double>::__type std::log2(_Tp) [with _Tp = 
unsigned int; typename 
std::__math_impl::__enable_if::__value, double>::__type = double]
In file included from 
/vol/gcc/src/hg/trunk/local/libstdc++-v3/include/precompiled/stdc++.h:41:
/var/gcc/regression/trunk/12-gcc/build/i386-pc-solaris2.12/libstdc++-v3/include/cmath:1530:
 note: candidate: constexpr typename 
__gnu_cxx::__enable_if::__value, double>::__type 
std::log2(_Tp) [with _Tp = unsigned int; typename 
__gnu_cxx::__enable_if::__value, double>::__type = 
double]

due to duplicate log2 templates in  and c_global/cmath,
and many more of the same kind.

Fortunately, this is all easily fixed by wrapping the affected templates
in a new macro.  That's what this patch does.  The new libstdc++
acinclude.m4 test may well need wording changes in comments etc. and can
perhaps be shortened a bit, bit it worked for me.

The fixincludes part passes make check.

The patch has been regtested (C++ only) on

* Solaris 11.2 SRU 13.6 (i.e. without both overloads and templates in
  ),

* Solaris 11.3 SRU 3.6 (i.e. with overloads only), and

* Solaris 12 Build 110 (i.e. with both overloads and templates, and the
  _GLIBCXX_USE_C99_MATH #undef's)

* Linux/x86_64

I've checked that __CORRECT_ISO_CPP11_MATH_H_PROTO[12] were defined in
config.h as expected, and there were no testsuite regressions.  On the
contrary, a couple of tests that before showed up as UNSUPPORTED due to
the _GLIBCXX_USE_C99_MATH* #undef's now PASS as they should.

Ok for mainline now, and for backports to the gcc-6 and gcc-5 branches
after some soak time?

Thanks.
Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2016-10-27  Rainer Orth  

libstdc++-v3:
* acinclude.m4 (GLIBCXX_CHECK_MATH11_PROTO): Update comments.
(__CORRECT_ISO_CPP11_MATH_H_PROTO): Rename to ...
(__CORRECT_ISO_CPP11_MATH_H_PROTO1): ... this.
Add test for C++11  templates.
* configure: Regenerate.
* config.h.in: Regenerate.

* include/c_global/cmath [__cplusplus >= 201103L]: Reflect
__CORRECT_ISO_CPP11_MATH_H_PROTO to
__CORRECT_ISO_CPP11_MATH_H_PROTO1 rename.
* include/c_global/cmath [_GLIBCXX_USE_C99_MATH &&
!_GLIBCXX_USE_C99_FP_MACROS_DYNAMIC && __cplusplus >= 201103L]

[PATCH GCC]Clean pedantic calls and useless lvalue code in fold_cond_expr_with_comparison

2016-11-03 Thread Bin Cheng

Hi,
According to analysis given by 
https://gcc.gnu.org/ml/gcc/2016-10/msg00228.html, calls to 
pedantic_non_lvalue_loc and code handling lvalue in 
fold_cond_expr_with_comparison are useless now.  Given this is complicated 
legacy code, it may be better to change code step by step, rather than doing 
this cleanup together with moving simplification from 
fold_cond_expr_with_comparison to match.pd.  
BTW, after last cleanup of pedantic_lvalues, function pedantic_non_lvalue_loc 
now has nothing to do with lvalue.  It could be further cleaned up, or at least 
renamed into something else.  This patch doesn't do that because that depends 
on the answer to the question of the aforementioned message.

Bootstrap and test on x86_64 and AArch64.  Any comments?

Thanks,
bin

2016-10-27  Bin Cheng  

* fold-const.c (fold_cond_expr_with_comparison): Remove call
to pedantic_non_lvalue_loc.  Remove useless code for lvalue
where cond_expr can't be a lvalue.diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index 89ed89d..2bad470 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -5082,12 +5082,10 @@ fold_cond_expr_with_comparison (location_t loc, tree 
type,
   case EQ_EXPR:
   case UNEQ_EXPR:
tem = fold_convert_loc (loc, arg1_type, arg1);
-   return pedantic_non_lvalue_loc (loc,
-   fold_convert_loc (loc, type,
- negate_expr (tem)));
+   return fold_convert_loc (loc, type, negate_expr (tem));
   case NE_EXPR:
   case LTGT_EXPR:
-   return pedantic_non_lvalue_loc (loc, fold_convert_loc (loc, type, 
arg1));
+   return fold_convert_loc (loc, type, arg1);
   case UNGE_EXPR:
   case UNGT_EXPR:
if (flag_trapping_math)
@@ -5098,7 +5096,7 @@ fold_cond_expr_with_comparison (location_t loc, tree type,
if (TYPE_UNSIGNED (TREE_TYPE (arg1)))
  break;
tem = fold_build1_loc (loc, ABS_EXPR, TREE_TYPE (arg1), arg1);
-   return pedantic_non_lvalue_loc (loc, fold_convert_loc (loc, type, tem));
+   return fold_convert_loc (loc, type, tem);
   case UNLE_EXPR:
   case UNLT_EXPR:
if (flag_trapping_math)
@@ -5124,7 +5122,7 @@ fold_cond_expr_with_comparison (location_t loc, tree type,
   && integer_zerop (arg01) && integer_zerop (arg2))
 {
   if (comp_code == NE_EXPR)
-   return pedantic_non_lvalue_loc (loc, fold_convert_loc (loc, type, 
arg1));
+   return fold_convert_loc (loc, type, arg1);
   else if (comp_code == EQ_EXPR)
return build_zero_cst (type);
 }
@@ -5170,20 +5168,12 @@ fold_cond_expr_with_comparison (location_t loc, tree 
type,
   tree comp_op1 = arg01;
   tree comp_type = TREE_TYPE (comp_op0);
 
-  /* Avoid adding NOP_EXPRs in case this is an lvalue.  */
-  if (TYPE_MAIN_VARIANT (comp_type) == TYPE_MAIN_VARIANT (type))
-   {
- comp_type = type;
- comp_op0 = arg1;
- comp_op1 = arg2;
-   }
-
   switch (comp_code)
{
case EQ_EXPR:
- return pedantic_non_lvalue_loc (loc, fold_convert_loc (loc, type, 
arg2));
+ return fold_convert_loc (loc, type, arg2);
case NE_EXPR:
- return pedantic_non_lvalue_loc (loc, fold_convert_loc (loc, type, 
arg1));
+ return fold_convert_loc (loc, type, arg1);
case LE_EXPR:
case LT_EXPR:
case UNLE_EXPR:
@@ -5200,8 +5190,7 @@ fold_cond_expr_with_comparison (location_t loc, tree type,
? fold_build2_loc (loc, MIN_EXPR, comp_type, comp_op0, 
comp_op1)
: fold_build2_loc (loc, MIN_EXPR, comp_type,
   comp_op1, comp_op0);
- return pedantic_non_lvalue_loc (loc,
- fold_convert_loc (loc, type, tem));
+ return fold_convert_loc (loc, type, tem);
}
  break;
case GE_EXPR:
@@ -5216,19 +5205,16 @@ fold_cond_expr_with_comparison (location_t loc, tree 
type,
? fold_build2_loc (loc, MAX_EXPR, comp_type, comp_op0, 
comp_op1)
: fold_build2_loc (loc, MAX_EXPR, comp_type,
   comp_op1, comp_op0);
- return pedantic_non_lvalue_loc (loc,
- fold_convert_loc (loc, type, tem));
+ return fold_convert_loc (loc, type, tem);
}
  break;
case UNEQ_EXPR:
  if (!HONOR_NANS (arg1))
-   return pedantic_non_lvalue_loc (loc,
-   fold_convert_loc (loc, type, arg2));
+   return fold_convert_loc (loc, type, arg2);
  break;
case LTGT_EXPR:
  if (!HONOR_NANS (arg1))
-   return pedantic_non_lvalue_loc (loc,
-   fold_convert_loc (loc, type, arg1));
+   return fold_convert_loc (loc, type, arg1);
  break;

Re: [PATCH 2/2, i386]: Implement TARGET_EXPAND_DIVMOD_LIBFUNC

2016-11-03 Thread Prathamesh Kulkarni

On 3 November 2016 at 18:36, Uros Bizjak  wrote:
> On Thu, Nov 3, 2016 at 1:58 PM, Eric Botcazou  wrote:
>>> libfunc, as in "__{,u}divmod{di,ti}4 library function" is already
>>> implemented in libgcc. But the enablement of this function inside the
>>> compiler has to be performed by each target.
>>
>> So can we do it generically instead of duplicating it ~50 times?
>
> I guess it can be done. Currently the expander goes:
>
> --cut here--
>   /* Check if optab_handler exists for divmod_optab for given mode.  */
>   if (optab_handler (tab, mode) != CODE_FOR_nothing)
> {
>   quotient = gen_reg_rtx (mode);
>   remainder = gen_reg_rtx (mode);
>   expand_twoval_binop (tab, op0, op1, quotient, remainder, unsignedp);
> }
>
>   /* Generate call to divmod libfunc if it exists.  */
>   else if ((libfunc = optab_libfunc (tab, mode)) != NULL_RTX)
> targetm.expand_divmod_libfunc (libfunc, mode, op0, op1,
>, );
>
>   else
> gcc_unreachable ();
> --cut here--
>
> so, by declaring divmod libfunc, the target also has to provide target hook.
>
> Let's ask authors of the original divmod patch for the details.
Yes, in the initial patch, the default version of the hook targeted
generic divmod functions in libgcc,
however since full set of divmod libfuncs wasn't available
(__divmoddi4 for instance),
we had to drop that.

I have couple of concerns:
a) I am not sure if __udivmoddi4() is generically available for all targets.
For aarch64, I can confirm that __udivmoddi4() isn't available (it has
hardware div, so the divmod transform
should never generate call to __udivmoddi4() on aarch64).
Please see: https://gcc.gnu.org/ml/gcc-patches/2016-10/msg01239.html
Can we safely assume that a target would have generic divmod libfunc
like __udivmoddi4
available if it doesn't support hardware div insn and doesn't define
target-specific divmod libfuncs ?

b) Targets like AVR, have their own divmod patterns in the backend for
doing the divmod transform
(and haven't registered target-specific divmod libfuncs via
set_optab_libfunc() ).
We would end up generating call to generic divmod libfuncs for these
targets even though
they have target-specific divmod libfuncs available. I wonder if these
targets should now use the
generic divmod transform instead ?

Thanks,
Prathamesh
>
> Uros.

Re: [Patch, rtl] PR middle-end/78016, keep REG_NOTE order during insn copy

2016-11-03 Thread Eric Botcazou

> FWIW here's a more complete version of my patch which I'm currently
> testing. Let me know if you think it's at least a good enough
> intermediate step to be installed.

It is, thanks.

> I think, the sel-sched example notwithstanding, this is something that
> normally should not be needed outside of emit_copy_of_insn_after, so having
> that do the right thing ought to be good enough.

reload does direct note copying too (in forward order).

-- 
Eric Botcazou

Re: Change default level for -Wimplicit-fallthrough

2016-11-03 Thread Markus Trippelsdorf

On 2016.11.03 at 14:57 +0100, Jakub Jelinek wrote:
> On Thu, Nov 03, 2016 at 02:55:03PM +0100, Markus Trippelsdorf wrote:
> > On 2016.11.03 at 14:47 +0100, Jakub Jelinek wrote:
> > > On Thu, Nov 03, 2016 at 02:35:57PM +0100, Markus Trippelsdorf wrote:
> > > > I don't have gathered detailed statistics. But for example a simple
> > > > /* drop through */ in a package header file will of course cause many
> > > > bogus warnings during the build on level 2.
> > > > For the Linux kernel false positives decrease ~20% when switching from
> > > > level 3 to 1.
> > > 
> > > One would have to count only warnings with unique locus (i.e. sort -u them
> > > after grepping them from logs).
> > > But even with 20%, if one spends the energy to analyze the 80%, where
> > > one actually has to analyze the code, just mechanically changing a couple 
> > > of
> > > common comment kinds into more standardized one isn't going to be
> > > significant.
> > 
> > I should have written: For the Linux kernel the number of warnings
> > dropped by 20% (going from level 3 to 1) and all of them turned out to
> > be false positives. And yes, I have used "sort -u".
> > I'm not sure if I would call 20% insignificant.
> 
> But we are talking about 2 vs. 1 now, so that is likely smaller than 20%.
> Plus what those comments in that 2 vs. 1 set are where the warnings differ,
> if they are related to fall through or not.

It is still a 12% reduction (2 vs. 1). I've posted a list of them in the
older thread.

-- 
Markus

Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

2016-11-03 Thread Jakub Jelinek

On Thu, Nov 03, 2016 at 03:02:21PM +0100, Martin Liška wrote:
> > But how would you be able to find out if there isn't any return *ptr; after
> > the scope or similar (as MEM_REF)?  With is_gimple_reg, they will be turned
> > into SSA form and you can easily verify (uses of ASAN_POISON are a problem
> > if they are encountered at runtime).  What would you do for the
> > must_live_in_memory vars?  Add some pass that detects it, handle it somehow
> > in addressable pass, handle it in SRA, ... ?
> 
> If there's return of *ptr, there must be a _char, and it looks
>   _4 = MEM[(char *)_char];
> 
> properly identifies that my_char has address taken.

It doesn't.  MEM_REF's ADDR_EXPR isn't considered to be address taking.

Jakub

Re: [PATCH 2/2, i386]: Implement TARGET_EXPAND_DIVMOD_LIBFUNC

2016-11-03 Thread Eric Botcazou

> I guess it can be done. Currently the expander goes:
> 
> --cut here--
>   /* Check if optab_handler exists for divmod_optab for given mode.  */
>   if (optab_handler (tab, mode) != CODE_FOR_nothing)
> {
>   quotient = gen_reg_rtx (mode);
>   remainder = gen_reg_rtx (mode);
>   expand_twoval_binop (tab, op0, op1, quotient, remainder, unsignedp);
> }
> 
>   /* Generate call to divmod libfunc if it exists.  */
>   else if ((libfunc = optab_libfunc (tab, mode)) != NULL_RTX)
> targetm.expand_divmod_libfunc (libfunc, mode, op0, op1,
>, );
> 
>   else
> gcc_unreachable ();
> --cut here--
> 
> so, by declaring divmod libfunc, the target also has to provide target hook.

Right, that's why I also suggested a default associated target hook.

-- 
Eric Botcazou

Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

2016-11-03 Thread Martin Liška

On 11/03/2016 02:44 PM, Jakub Jelinek wrote:
> Hi!
> 
> FYI, I think it is much more important to get the initial patch in, so
> resolve the switch + declarations inside of its body stuff, add testcases
> for that and post for re-review and check in.
> These optimizations can be done even in stage3.

I know that it's much urgent to have it done first. I'm currently testing patch
for the switch + declaration. Hopefully I'll send it today.

> 
> On Thu, Nov 03, 2016 at 02:34:47PM +0100, Martin Liška wrote:
>> I'm having a semi-working patch that comes up with the ASAN_POISON built-in. 
>> Well, to be honest,
>> I still have a feeling that doing the magic with the parallel variable is 
>> bit overkill. Maybe
>> a new runtime call would make it easier for us.
>>
>> However, I still don't fully understand why we want to support just 
>> is_gimple_reg variables.
>> Let's consider following test-case:
>>
>> void foo()
>> {
>> char *ptr;
>>   {
>> char my_char[9];
>> ptr = _char[0];
>>   }
>> }
>>
>> Where I would expect to optimize out:
>>   :
>>   _5 = (unsigned long) 9;
>>   _4 = (unsigned long) _char;
>>   __builtin___asan_unpoison_stack_memory (_4, _5);
>>   _7 = (unsigned long) 9;
>>   _6 = (unsigned long) _char;
>>   __builtin___asan_poison_stack_memory (_6, _7);
>>   return;
>>
>> where address of my_char is taken in the original source code, while not 
>> during tree-ssa
>> optimization, where the address is used only by ASAN_MARK calls.
> 
> But how would you be able to find out if there isn't any return *ptr; after
> the scope or similar (as MEM_REF)?  With is_gimple_reg, they will be turned
> into SSA form and you can easily verify (uses of ASAN_POISON are a problem
> if they are encountered at runtime).  What would you do for the
> must_live_in_memory vars?  Add some pass that detects it, handle it somehow
> in addressable pass, handle it in SRA, ... ?

If there's return of *ptr, there must be a _char, and it looks
  _4 = MEM[(char *)_char];

properly identifies that my_char has address taken.

M.

> 
>>
>> Doing such transformation can rapidly decrease number of 
>> __builtin___asan_{un}poison_stack_memory
>> in tramp3d: from ~36K to ~22K.
>>
>> Thanks for clarification.
>> Martin
> 
>   Jakub
>

Re: Change default level for -Wimplicit-fallthrough

2016-11-03 Thread Jakub Jelinek

On Thu, Nov 03, 2016 at 02:55:03PM +0100, Markus Trippelsdorf wrote:
> On 2016.11.03 at 14:47 +0100, Jakub Jelinek wrote:
> > On Thu, Nov 03, 2016 at 02:35:57PM +0100, Markus Trippelsdorf wrote:
> > > I don't have gathered detailed statistics. But for example a simple
> > > /* drop through */ in a package header file will of course cause many
> > > bogus warnings during the build on level 2.
> > > For the Linux kernel false positives decrease ~20% when switching from
> > > level 3 to 1.
> > 
> > One would have to count only warnings with unique locus (i.e. sort -u them
> > after grepping them from logs).
> > But even with 20%, if one spends the energy to analyze the 80%, where
> > one actually has to analyze the code, just mechanically changing a couple of
> > common comment kinds into more standardized one isn't going to be
> > significant.
> 
> I should have written: For the Linux kernel the number of warnings
> dropped by 20% (going from level 3 to 1) and all of them turned out to
> be false positives. And yes, I have used "sort -u".
> I'm not sure if I would call 20% insignificant.

But we are talking about 2 vs. 1 now, so that is likely smaller than 20%.
Plus what those comments in that 2 vs. 1 set are where the warnings differ,
if they are related to fall through or not.

Jakub

Re: Change default level for -Wimplicit-fallthrough

2016-11-03 Thread Markus Trippelsdorf

On 2016.11.03 at 14:47 +0100, Jakub Jelinek wrote:
> On Thu, Nov 03, 2016 at 02:35:57PM +0100, Markus Trippelsdorf wrote:
> > I don't have gathered detailed statistics. But for example a simple
> > /* drop through */ in a package header file will of course cause many
> > bogus warnings during the build on level 2.
> > For the Linux kernel false positives decrease ~20% when switching from
> > level 3 to 1.
> 
> One would have to count only warnings with unique locus (i.e. sort -u them
> after grepping them from logs).
> But even with 20%, if one spends the energy to analyze the 80%, where
> one actually has to analyze the code, just mechanically changing a couple of
> common comment kinds into more standardized one isn't going to be
> significant.

I should have written: For the Linux kernel the number of warnings
dropped by 20% (going from level 3 to 1) and all of them turned out to
be false positives. And yes, I have used "sort -u".
I'm not sure if I would call 20% insignificant.

-- 
Markus

Re: Change default level for -Wimplicit-fallthrough

2016-11-03 Thread Jakub Jelinek

On Thu, Nov 03, 2016 at 02:48:33PM +0100, Bernd Schmidt wrote:
> 
> On 11/03/2016 02:47 PM, Jakub Jelinek wrote:
> >On Thu, Nov 03, 2016 at 02:35:57PM +0100, Markus Trippelsdorf wrote:
> >>I don't have gathered detailed statistics. But for example a simple
> >>/* drop through */ in a package header file will of course cause many
> >>bogus warnings during the build on level 2.
> >>For the Linux kernel false positives decrease ~20% when switching from
> >>level 3 to 1.
> >
> >One would have to count only warnings with unique locus (i.e. sort -u them
> >after grepping them from logs).
> >But even with 20%, if one spends the energy to analyze the 80%, where
> >one actually has to analyze the code, just mechanically changing a couple of
> >common comment kinds into more standardized one isn't going to be
> >significant.
> 
> But it's a completely pointless exercise which we shouldn't impose on users
> unasked.

It isn't pointless.  Users can have completely unrelated comments in between
case labels, to describe what the case handles etc. and then miss a warning
when the fallthrough is not intentional.

Jakub

Re: Change default level for -Wimplicit-fallthrough

2016-11-03 Thread Bernd Schmidt



On 11/03/2016 02:47 PM, Jakub Jelinek wrote:

On Thu, Nov 03, 2016 at 02:35:57PM +0100, Markus Trippelsdorf wrote:

I don't have gathered detailed statistics. But for example a simple
/* drop through */ in a package header file will of course cause many
bogus warnings during the build on level 2.
For the Linux kernel false positives decrease ~20% when switching from
level 3 to 1.


One would have to count only warnings with unique locus (i.e. sort -u them
after grepping them from logs).
But even with 20%, if one spends the energy to analyze the 80%, where
one actually has to analyze the code, just mechanically changing a couple of
common comment kinds into more standardized one isn't going to be
significant.


But it's a completely pointless exercise which we shouldn't impose on 
users unasked.



Bernd

Re: Change default level for -Wimplicit-fallthrough

2016-11-03 Thread Jakub Jelinek

On Thu, Nov 03, 2016 at 02:35:57PM +0100, Markus Trippelsdorf wrote:
> I don't have gathered detailed statistics. But for example a simple
> /* drop through */ in a package header file will of course cause many
> bogus warnings during the build on level 2.
> For the Linux kernel false positives decrease ~20% when switching from
> level 3 to 1.

One would have to count only warnings with unique locus (i.e. sort -u them
after grepping them from logs).
But even with 20%, if one spends the energy to analyze the 80%, where
one actually has to analyze the code, just mechanically changing a couple of
common comment kinds into more standardized one isn't going to be
significant.

Jakub

Re: [PATCH, GCC/ARM] Fix PR77933: stack corruption on ARM when using high registers and lr

2016-11-03 Thread Wilco Dijkstra

Hi,

The patch looks correct, however I would suggest to rewrite this bit of the code
urgently in separate patch as it is way too complex to assert it is now bug 
free -
there are too many possible failure scenarios to list... Also it generates quite
inefficient code - pushable_regs should include R0-R3 minus the arguments plus
LR (if LR was already saved).

Also it is important to keep track of which registers still need to be saved, 
and
never use pushable_regs for two separate purposes. Finally it seems a good
idea to check the 2 masks passed to thumb1_emit_multi_reg_push have the
same number of set bits - that would have prevented this bug from ever
happening.

Wilco

Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

2016-11-03 Thread Jakub Jelinek

Hi!

FYI, I think it is much more important to get the initial patch in, so
resolve the switch + declarations inside of its body stuff, add testcases
for that and post for re-review and check in.
These optimizations can be done even in stage3.

On Thu, Nov 03, 2016 at 02:34:47PM +0100, Martin Liška wrote:
> I'm having a semi-working patch that comes up with the ASAN_POISON built-in. 
> Well, to be honest,
> I still have a feeling that doing the magic with the parallel variable is bit 
> overkill. Maybe
> a new runtime call would make it easier for us.
> 
> However, I still don't fully understand why we want to support just 
> is_gimple_reg variables.
> Let's consider following test-case:
> 
> void foo()
> {
> char *ptr;
>   {
> char my_char[9];
> ptr = _char[0];
>   }
> }
> 
> Where I would expect to optimize out:
>   :
>   _5 = (unsigned long) 9;
>   _4 = (unsigned long) _char;
>   __builtin___asan_unpoison_stack_memory (_4, _5);
>   _7 = (unsigned long) 9;
>   _6 = (unsigned long) _char;
>   __builtin___asan_poison_stack_memory (_6, _7);
>   return;
> 
> where address of my_char is taken in the original source code, while not 
> during tree-ssa
> optimization, where the address is used only by ASAN_MARK calls.

But how would you be able to find out if there isn't any return *ptr; after
the scope or similar (as MEM_REF)?  With is_gimple_reg, they will be turned
into SSA form and you can easily verify (uses of ASAN_POISON are a problem
if they are encountered at runtime).  What would you do for the
must_live_in_memory vars?  Add some pass that detects it, handle it somehow
in addressable pass, handle it in SRA, ... ?

> 
> Doing such transformation can rapidly decrease number of 
> __builtin___asan_{un}poison_stack_memory
> in tramp3d: from ~36K to ~22K.
> 
> Thanks for clarification.
> Martin

Jakub

Re: Change default level for -Wimplicit-fallthrough

2016-11-03 Thread Markus Trippelsdorf

On 2016.11.03 at 14:24 +0100, Jakub Jelinek wrote:
> On Thu, Nov 03, 2016 at 01:55:33PM +0100, Markus Trippelsdorf wrote:
> > On 2016.11.03 at 13:32 +0100, Jakub Jelinek wrote:
> > > On Thu, Nov 03, 2016 at 01:22:11PM +0100, Bernd Schmidt wrote:
> > > > On 11/03/2016 12:58 PM, Jakub Jelinek wrote:
> > > > >On Thu, Nov 03, 2016 at 12:51:15PM +0100, Bernd Schmidt wrote:
> > > > >>I'm concerned about the number of false positives for this warning, 
> > > > >>and
> > > > >>judging by previous discussions, I'm not alone in this. This patch 
> > > > >>limits it
> > > > >>to level 1 (any comment before the case label disables the warning) 
> > > > >>for
> > > > >>cases where the user specified no explicit level. It'll still generate
> > > > >>enough noise that people will be aware of it and can choose whether 
> > > > >>to use a
> > > > >>higher level or not.
> > > > >>
> > > > >>Bootstrapped and tested on x86_64-linux. Ok?
> > > > >
> > > > >I disagree, I'm ok with changing it to 2, but 1 is too much.
> > > > 
> > > > Well, we have data from our own sources where we had to "fix" lots of
> > > > perfectly good code, and also from the Linux kernel. From an earlier
> > > > discussion:
> > > 
> > > That data wasn't really convincing on this.  All it proved is that most of
> > > the cases are (likely) deliberate fall-throughs without any comment
> > > whatsoever, the rest is in the noise.  As one needs to deal with those
> > > where comments are missing altogether, dealing with the noise is 
> > > acceptable.
> > 
> > Without Bernd's patch to set the default to 1 you will drown in false
> > positives once you start using gcc-7 to build a whole distro. On my
> > Gentoo test box anything but level 1 is simply unacceptable, because you
> > will miss important other warnings in the -Wimplicit-fallthrough noise
> > otherwise.
> 
> That is really strange.  First of all, most of packages aren't compiled with
> -Wextra/-W.  And in those that are, are you sure that that no comment at all
> for the implicit fallthroughs isn't significantly more common than the
> set of comments that are accepted by -Wimplicit-fallthrough=1 and not
> accepted by -Wimplicit-fallthrough=2?

I don't have gathered detailed statistics. But for example a simple
/* drop through */ in a package header file will of course cause many
bogus warnings during the build on level 2.
For the Linux kernel false positives decrease ~20% when switching from
level 3 to 1.

-- 
Markus

Re: Implement -Wduplicated-branches (PR c/64279) (v3)

2016-11-03 Thread Jakub Jelinek

On Thu, Nov 03, 2016 at 09:27:55AM -0400, Jason Merrill wrote:
> On Thu, Nov 3, 2016 at 7:24 AM, Marek Polacek  wrote:
> > On Tue, Nov 01, 2016 at 02:53:58PM +0100, Jakub Jelinek wrote:
> >> On Tue, Nov 01, 2016 at 09:41:20AM -0400, Jason Merrill wrote:
> >> > On Tue, Oct 25, 2016 at 9:59 AM, Marek Polacek  
> >> > wrote:
> >> > > On Mon, Oct 24, 2016 at 04:10:21PM +0200, Marek Polacek wrote:
> >> > >> On Thu, Oct 20, 2016 at 12:28:36PM +0200, Marek Polacek wrote:
> >> > >> > I found a problem with this patch--we can't call 
> >> > >> > do_warn_duplicated_branches in
> >> > >> > build_conditional_expr, because that way the C++-specific codes 
> >> > >> > might leak into
> >> > >> > the hasher.  Instead, I should use operand_equal_p, I think.  Let 
> >> > >> > me rework
> >> > >> > that part of the patch.
> >> >
> >> > Hmm, is there a reason not to use operand_equal_p for
> >> > do_warn_duplicated_branches as well?  I'm concerned about hash
> >> > collisions leading to false positives.
> >>
> >> If the hashing function is iterative_hash_expr / inchash::add_expr, then
> >> that is supposed to pair together with operand_equal_p, we even have
> >> checking verification of that.
> 
> Yes, but there could still be hash collisions; we can't guarantee that
> everything that compares unequal also hashes unequal.

Right, after h0 == h1 is missing && operand_equal_p (thenb, elseb, 0)
or so (the exact last operand needs to be figured out).
OEP_ONLY_CONST is certainly wrong, we want the same VAR_DECLs to mean the
same thing.  0 is a tiny bit better, but still it will give up on e.g. pure
and other calls.  OEP_PURE_SAME is tiny bit better than that, but still
calls with the same arguments to the same function will not be considered
equal, plus likely operand_equal_p doesn't handle STATEMENT_LIST etc.
So maybe we need another OEP_* mode for this.

Jakub

[PATCH, RFC] Improve ivopts group costs

2016-11-03 Thread Evgeny Kudryashov


Hello,

I'm facing the following problem related to ivopts. The problem is that 
GCC generates a lot of induction variables and as a result there is an 
unnecessary increase of stack usage and register pressure.


For instance, for the attached testcase (tc_ivopts.c) GCC generates 26 
induction variables (25 for each of lhsX[{0-4}][{0-4}][k] and one for 
rhs[k][j][{0-2}]). You should be able to reproduce this issue on arm 
target using GCC with "-O2 -mcpu=cortex-a9 -mtune=cortex-a9". For 
reference, I'm attaching assembly I get on current trunk.


The reason might be in use groups costs, in particular, in the way of 
estimation. Currently, the cost of a tuple (group, candidate) is a sum 
of per-use costs in the group. So, the cost of a group grows 
proportional to the number of uses in the group. This approach has a 
negative effect on the algorithm for finding the best set of induction 
variables: the part of a total cost related to adding a new candidate is 
almost washed out by the cost of the group. In addition, when there is a 
lot of groups with many uses inside and a target is out of free 
registers, the cost of spill is washing out too. As a result, GCC 
prefers to use native candidates (candidate created for a particular 
group) for each group of uses instead of considering the real cost of 
introducing a new variable into a set.


The summing approach was added as a part of this patch 
(https://gcc.gnu.org/ml/gcc-patches/2015-05/msg00641.html) and the 
motivation for taking the sum does not seem to be

specifically discussed.

I propose the following patch that changes a group cost from cost 
summing to selecting the largest one inside a group. For the given test 
case I have: necessary size of stack is decreased by almost 3 times and 
ldr\str amount are decreased by less than 2 times. Also I'm attaching 
assembly after applying the patch.


The essential change in the patch is just:

diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index f9211ad..a149418 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -5151,7 +5151,8 @@ determine_group_iv_cost_address (struct 
ivopts_data *data,

 offset and where the iv_use is.  */
cost = get_computation_cost (data, next, cand, true,
 NULL, _autoinc, NULL);
-  sum_cost += cost;
+  if (sum_cost < cost)
+sum_cost = cost;
 }
   set_group_iv_cost (data, group, cand, sum_cost, depends_on,
 NULL_TREE, ERROR_MARK, inv_expr);

Any suggestions?


Thanks,
Evgeny.#define SIZE 20
int gp22 = SIZE;
double rhs [SIZE][SIZE][5];

void x_solve()
{
  int j, k;
  double lhsX[5][5][gp22][gp22];
  for (j = 1; j < gp22; j++) {
for (k = 1; k < gp22; k++) {
  lhsX[0][0][j][k] -= lhsX[1][0][j][k];
  lhsX[0][1][j][k] -= lhsX[1][1][j][k];
  lhsX[0][2][j][k] -= lhsX[1][2][j][k];
  lhsX[0][3][j][k] -= lhsX[1][3][j][k];
  lhsX[0][4][j][k] -= lhsX[1][4][j][k];

  lhsX[2][0][j][k] -= lhsX[1][0][j][k];
  lhsX[2][1][j][k] -= lhsX[1][1][j][k];
  lhsX[2][2][j][k] -= lhsX[1][2][j][k];
  lhsX[2][3][j][k] -= lhsX[1][3][j][k];
  lhsX[2][4][j][k] -= lhsX[1][4][j][k];

  lhsX[3][0][j][k] -= lhsX[1][0][j][k];
  lhsX[3][1][j][k] -= lhsX[1][1][j][k];
  lhsX[3][2][j][k] -= lhsX[1][2][j][k];
  lhsX[3][3][j][k] -= lhsX[1][3][j][k];
  lhsX[3][4][j][k] -= lhsX[1][4][j][k];

  lhsX[4][0][j][k] -= lhsX[1][0][j][k];
  lhsX[4][1][j][k] -= lhsX[1][1][j][k];
  lhsX[4][2][j][k] -= lhsX[1][2][j][k];
  lhsX[4][3][j][k] -= lhsX[1][3][j][k];
  lhsX[4][4][j][k] -= lhsX[1][4][j][k];

  lhsX[0][0][j][k] -= lhsX[3][0][j][k];
  lhsX[0][1][j][k] -= lhsX[3][1][j][k];
  lhsX[0][2][j][k] -= lhsX[3][2][j][k];
  lhsX[0][3][j][k] -= lhsX[3][3][j][k];
  lhsX[0][4][j][k] -= lhsX[3][4][j][k];

  lhsX[2][0][j][k] -= lhsX[3][0][j][k];
  lhsX[2][1][j][k] -= lhsX[3][1][j][k];
  lhsX[2][2][j][k] -= lhsX[3][2][j][k];
  lhsX[2][3][j][k] -= lhsX[3][3][j][k];
  lhsX[2][4][j][k] -= lhsX[3][4][j][k];

  lhsX[4][0][j][k] -= lhsX[3][0][j][k];
  lhsX[4][1][j][k] -= lhsX[3][1][j][k];
  lhsX[4][2][j][k] -= lhsX[3][2][j][k];
  lhsX[4][3][j][k] -= lhsX[3][3][j][k];
  lhsX[4][4][j][k] -= lhsX[3][4][j][k];

  rhs[k][j][0] -= rhs[k][j][1];
  rhs[k][j][1] -= rhs[k][j][2];
}/*end j*/
  }/*end k*/
}
	.cpu cortex-a9
	.eabi_attribute 20, 1
	.eabi_attribute 21, 1
	.eabi_attribute 23, 3
	.eabi_attribute 24, 1
	.eabi_attribute 25, 1
	.eabi_attribute 26, 2
	.eabi_attribute 30, 2
	.eabi_attribute 34, 1
	.eabi_attribute 18, 4
	.file	"bt.c"
	.global	__aeabi_dsub
	.text
	.align	2
	.global	x_solve
	.syntax unified
	.arm
	.fpu softvfp
	.type	x_solve, %function
x_solve:
	@ args = 0, pretend = 0, frame = 216
	@ frame_needed = 1, uses_anonymous_args = 0
	movw	r3, #:lower16:.LANCHOR0
	push	{r4, r5, r6, r7, r8, r9, r10, fp, lr}
	movt	r3, #:upper16:.LANCHOR0
	add	fp, sp, #32

Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

2016-11-03 Thread Martin Liška

On 11/02/2016 03:51 PM, Jakub Jelinek wrote:
> On Wed, Nov 02, 2016 at 03:38:25PM +0100, Martin Liška wrote:
>> it converts:
>> foo ()
>> {
>>   char a;
>>   char * p;
>>   char _1;
>>   int _2;
>>   int _8;
>>   int _9;
>>
>>   :
>>   ASAN_MARK (2, , 1);
>>   a = 0;
>>   p_6 = 
>>   ASAN_MARK (1, , 1);
>>   _1 = *p_6;
> 
> You shouldn't convert if a is addressable (when ignoring  in ASAN_MARK
> calls).  Only if there is  just in ASAN_MARK and MEM_REF, you can convert.
> 
>> to:
>>
>> foo ()
>> {
>>   char a;
>>   char * p;
>>   char _1;
>>   int _2;
>>
>>   :
>>   a_10 = 0;
>>   a_12 = ASAN_POISON ();
>>   _1 = a_12;
>>   if (_1 != 0)
>> goto ;
>>   else
>> goto ;
>>
>>   :
>>
>>   :
>>   # _2 = PHI <1(2), 0(3)>
>>   return _2;
>>
>> }
>>
>> and probably the last goal is to convert the newly added internal fn to a 
>> runtime call.
>> Hope sanopt pass is the right place where to it?
> 
> If ASAN_POISON is ECF_CONST and has any uses during sanopt, perhaps best
> would be to add an artificial variable you give the same name as the
> underlying var of the SSA_NAME (and alignment, locus etc.) and poison it
> right away (keep unpoisoning only to the function epilogue) and then
> ASAN_CHECK replace all uses of that SSA_NAME with ASAN_CHECK + use of
> (D) SSA_NAME.
> 
>   Jakub
> 

Hi.

I'm having a semi-working patch that comes up with the ASAN_POISON built-in. 
Well, to be honest,
I still have a feeling that doing the magic with the parallel variable is bit 
overkill. Maybe
a new runtime call would make it easier for us.

However, I still don't fully understand why we want to support just 
is_gimple_reg variables.
Let's consider following test-case:

void foo()
{
char *ptr;
  {
char my_char[9];
ptr = _char[0];
  }
}

Where I would expect to optimize out:
  :
  _5 = (unsigned long) 9;
  _4 = (unsigned long) _char;
  __builtin___asan_unpoison_stack_memory (_4, _5);
  _7 = (unsigned long) 9;
  _6 = (unsigned long) _char;
  __builtin___asan_poison_stack_memory (_6, _7);
  return;

where address of my_char is taken in the original source code, while not during 
tree-ssa
optimization, where the address is used only by ASAN_MARK calls.

Doing such transformation can rapidly decrease number of 
__builtin___asan_{un}poison_stack_memory
in tramp3d: from ~36K to ~22K.

Thanks for clarification.
Martin

Re: Implement -Wduplicated-branches (PR c/64279) (v3)

2016-11-03 Thread Jason Merrill

On Thu, Nov 3, 2016 at 7:24 AM, Marek Polacek  wrote:
> On Tue, Nov 01, 2016 at 02:53:58PM +0100, Jakub Jelinek wrote:
>> On Tue, Nov 01, 2016 at 09:41:20AM -0400, Jason Merrill wrote:
>> > On Tue, Oct 25, 2016 at 9:59 AM, Marek Polacek  wrote:
>> > > On Mon, Oct 24, 2016 at 04:10:21PM +0200, Marek Polacek wrote:
>> > >> On Thu, Oct 20, 2016 at 12:28:36PM +0200, Marek Polacek wrote:
>> > >> > I found a problem with this patch--we can't call 
>> > >> > do_warn_duplicated_branches in
>> > >> > build_conditional_expr, because that way the C++-specific codes might 
>> > >> > leak into
>> > >> > the hasher.  Instead, I should use operand_equal_p, I think.  Let me 
>> > >> > rework
>> > >> > that part of the patch.
>> >
>> > Hmm, is there a reason not to use operand_equal_p for
>> > do_warn_duplicated_branches as well?  I'm concerned about hash
>> > collisions leading to false positives.
>>
>> If the hashing function is iterative_hash_expr / inchash::add_expr, then
>> that is supposed to pair together with operand_equal_p, we even have
>> checking verification of that.

Yes, but there could still be hash collisions; we can't guarantee that
everything that compares unequal also hashes unequal.

Jason

Re: [C++ Patch/RFC] PR 67980 ("left shift count is negative [-Wshift-count-negative] generated for unreachable code")

2016-11-03 Thread Jason Merrill

On Tue, Oct 18, 2016 at 4:26 PM, Paolo Carlini  wrote:
> Thus, I'm back to one of my first tries earlier today: a much more
> conservative change which uses fold_non_dependent_expr only for the purpose
> of suppressing the unwanted warnings, see the below.

This looks right; in general we use early folding only to control warnings.  OK.

Jason

Re: Change default level for -Wimplicit-fallthrough

2016-11-03 Thread Jakub Jelinek

On Thu, Nov 03, 2016 at 01:55:33PM +0100, Markus Trippelsdorf wrote:
> On 2016.11.03 at 13:32 +0100, Jakub Jelinek wrote:
> > On Thu, Nov 03, 2016 at 01:22:11PM +0100, Bernd Schmidt wrote:
> > > On 11/03/2016 12:58 PM, Jakub Jelinek wrote:
> > > >On Thu, Nov 03, 2016 at 12:51:15PM +0100, Bernd Schmidt wrote:
> > > >>I'm concerned about the number of false positives for this warning, and
> > > >>judging by previous discussions, I'm not alone in this. This patch 
> > > >>limits it
> > > >>to level 1 (any comment before the case label disables the warning) for
> > > >>cases where the user specified no explicit level. It'll still generate
> > > >>enough noise that people will be aware of it and can choose whether to 
> > > >>use a
> > > >>higher level or not.
> > > >>
> > > >>Bootstrapped and tested on x86_64-linux. Ok?
> > > >
> > > >I disagree, I'm ok with changing it to 2, but 1 is too much.
> > > 
> > > Well, we have data from our own sources where we had to "fix" lots of
> > > perfectly good code, and also from the Linux kernel. From an earlier
> > > discussion:
> > 
> > That data wasn't really convincing on this.  All it proved is that most of
> > the cases are (likely) deliberate fall-throughs without any comment
> > whatsoever, the rest is in the noise.  As one needs to deal with those
> > where comments are missing altogether, dealing with the noise is acceptable.
> 
> Without Bernd's patch to set the default to 1 you will drown in false
> positives once you start using gcc-7 to build a whole distro. On my
> Gentoo test box anything but level 1 is simply unacceptable, because you
> will miss important other warnings in the -Wimplicit-fallthrough noise
> otherwise.

That is really strange.  First of all, most of packages aren't compiled with
-Wextra/-W.  And in those that are, are you sure that that no comment at all
for the implicit fallthroughs isn't significantly more common than the
set of comments that are accepted by -Wimplicit-fallthrough=1 and not
accepted by -Wimplicit-fallthrough=2?

Jakub

RE: [PATCH] MIPS: If a test in the MIPS testsuite requires standard library support check the sysroot supports the required test options.

2016-11-03 Thread Toma Tabacu

> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Andrew Bennett
> Sent: 03 November 2016 11:33
> To: Matthew Fortune; 'Moore, Catherine'; 'gcc-patches@gcc.gnu.org'
> Subject: RE: [PATCH] MIPS: If a test in the MIPS testsuite requires standard
> library support check the sysroot supports the required test options.
> 
> Ping.
> 
> 
> Regards,
> 
> 
> 
> Andrew
> 

Hi Andrew,

I believe the inline-memcpy-{1,2,3,4,5}.c tests also qualify for the
(REQUIRES_STDLIB) option.

Regards,
Toma Tabacu

> > -Original Message-
> > From: Andrew Bennett
> > Sent: 28 August 2015 16:50
> > To: Matthew Fortune; Moore, Catherine; gcc-patches@gcc.gnu.org
> > Subject: RE: [PATCH] MIPS: If a test in the MIPS testsuite requires
> > standard library support check the sysroot supports the required test
> options.
> >
> > > I had some comments on this that I hadn't got round to posting. The
> > > fix in this patch is not general enough as the missing header
> > > problem comes in two (related) forms:
> > >
> > > 1) Using the new MTI and IMG sysroot layout we can end up with GCC
> looking
> > >for headers in a sysroot that simply does not exist. The current patch
> > >handles this.
> > > 2) Using any sysroot layout (i.e. a simple mips-linux-gnu) it is possible
> > >for the stdlib.h header to be found but the ABI dependent gnu-stubs
> > >header may not be installed depending on soft/hard nan1985/nan2008.
> > >
> > > The test for stdlib.h needs to therefore verify that preprocessing
> > > succeeds rather than just testing for an error relating to stdlib.h.
> > > This could be done by adding a further option to mips_preprocess to
> > > indicate the processor output should go to a file and that the
> > > caller wants the messages emitted by the compiler instead.
> > >
> > > A second issue is that you have added (REQUIRES_STDLIB) to too many
> tests.
> > > You only need to add it to tests that request a compiler option (via
> > > dg-options) that could potentially lead to forcing soft/hard
> > > nan1985/nan2008 directly or indirectly. So -mips32r6 implies nan2008
> > > so you need it -
> > mips32r5
> > > implies nan1985 so you need it. There are at least two tests which
> > > don't need the option but you need to check them all so we don't run
> > > the check needlessly.
> >
> > The updated patch and ChangeLog that addresses Matthew's comments is
> below.
> >
> > Ok to commit?
> >
> > Regards,
> >
> >
> > Andrew
> >
> >
> > testsuite/
> >
> > * gcc.target/mips/loongson-simd.c (dg-options): Add
> > (REQUIRES_STDLIB).
> > * gcc.target/mips/loongson-shift-count-truncated-1.c: Ditto
> > * gcc.target/mips/mips-3d-1.c: Ditto
> > * gcc.target/mips/mips-3d-2.c: Ditto
> > * gcc.target/mips/mips-3d-3.c: Ditto
> > * gcc.target/mips/mips-3d-4.c: Ditto
> > * gcc.target/mips/mips-3d-5.c: Ditto
> > * gcc.target/mips/mips-3d-6.c: Ditto
> > * gcc.target/mips/mips-3d-7.c: Ditto
> > * gcc.target/mips/mips-3d-8.c: Ditto
> > * gcc.target/mips/mips-3d-9.c: Ditto
> > * gcc.target/mips/mips-ps-1.c: Ditto
> > * gcc.target/mips/mips-ps-2.c: Ditto
> > * gcc.target/mips/mips-ps-3.c: Ditto
> > * gcc.target/mips/mips-ps-4.c: Ditto
> > * gcc.target/mips/mips-ps-6.c: Ditto
> > * gcc.target/mips/mips16-attributes.c: Ditto
> > * gcc.target/mips/mips32-dsp-run.c: Ditto
> > * gcc.target/mips/mips32-dsp.c: Ditto
> > * gcc.target/mips/save-restore-1.c: Ditto
> > * gcc.target/mips/mips.exp (mips_option_groups): Add stdlib.
> > (mips_preprocess): Add ignore_output argument that when set
> > will not return the pre-processed output.
> > (mips_arch_info): Update arguments for the call to
> > mips_preprocess.
> > (mips-dg-init): Ditto.
> > (mips-dg-options): Check if a test having test option
> > (REQUIRES_STDLIB) has the required sysroot support for
> > the current test options.
> >
> >
> >
> > diff --git
> > a/gcc/testsuite/gcc.target/mips/loongson-shift-count-truncated-1.c
> > b/gcc/testsuite/gcc.target/mips/loongson-shift-count-truncated-1.c
> > index f57a18c..baed48c 100644
> > --- a/gcc/testsuite/gcc.target/mips/loongson-shift-count-truncated-1.c
> > +++ b/gcc/testsuite/gcc.target/mips/loongson-shift-count-truncated-1.c
> > @@ -4,7 +4,7 @@
> >  /* loongson.h does not handle or check for MIPS16ness.  There doesn't
> > seem any good reason for it to, given that the Loongson processors
> > do not support MIPS16.  */
> > -/* { dg-options "isa=loongson -mhard-float -mno-mips16" } */
> > +/* { dg-options "isa=loongson -mhard-float -mno-mips16
> > +(REQUIRES_STDLIB)" }
> > */
> >  /* See PR 52155.  */
> >  /* { dg-options "isa=loongson -mhard-float -mno-mips16 -mlong64" {
> > mips*-*-
> > elf* && ilp32 } } */
> >
> > diff --git a/gcc/testsuite/gcc.target/mips/loongson-simd.c
> > b/gcc/testsuite/gcc.target/mips/loongson-simd.c
> > index 6d2ceb6..f263b43 100644
>

Re: [PATCH] Create x.gcov file for binary w/o x.gcda file (PR, gcov-profile/65831)

2016-11-03 Thread Jan Hubicka

> On 08/04/2016 02:52 PM, Nathan Sidwell wrote:
> > On 08/04/16 08:27, Martin Liška wrote:
> >> Hi.
> >>
> >> Following patch is grabbed from the PR, where I just applied the patch
> >> and wrote a test-case which removes x.gcda file before running gcov tool.
> >>
> >> Ready to be installed?
> > 
> > 2016-08-04  Martin Liska  
> > 
> > * g++.dg/gcov/gcov-16.C: New test.
> > * lib/gcov.exp: Support new argument for run-gcov function.
> > 
> > gcc/ChangeLog:
> > 
> > 2016-08-04  Martin Liska  
> > Adam Fineman  
> > 
> > * gcov.c (process_file): Create .gcov file when .gcda
> > file is missing.
> > 
> > ok thanks
> 
> Hi.
> 
> As mentioned by the reporter, it's desired patch for backport to GCC 5 and 6 
> branches.
> 
> May I install the same patch after it survives regression tests?
OK,
thanks
Honza
> Thanks,
> Martin

[PATCH, Fortran, v1] Restructure initialization of allocatable components

2016-11-03 Thread Andre Vehreschild

Hi all,

the attached patch restructures gfortran's way of initializing components of
derived types in ALLOCATE. The old way was to generate a new gfc_code-node and
add it after the ALLOCATE node to initialize the the derived type on certain
conditions (like initializer or allocatable components exist). This patch
proposes to do the initialization as part of the ALLOCATE. This way it makes the
ALLOCATE-statement more atomic in that the ALLOCATE does everything it is
responsible for itself and does rely on other nodes adding to its
responsibilities. The patch furthermore enables to use the knowledge we have in
the allocate, i.e., a freshly allocated object can never have allocated
allocatable components, so no need to check before resetting them.

At the same time I remove some dead code from the resolve_alloc_expr and moved
a loop invariant piece out of the loop iterating over all objects to allocate.
This of course is only cosmetic.

Of course did I not do this out of fun. I have a patch upcoming for allocatable
components in coarrayed derived types. For this I needed to identify the
initialization of the structure and to parameterize it further. This was hard
when for the default initialization an additional code-node was created, but
now that everything necessary for ALLOCATE is done in ALLOCATE parameterizing
the initialization is way easier. The coarray patch is not yet perfect, but I
thought to publish this part already to get your opinions.

Bootstraps and regtests fine on x86_64-linux/F23. Ok for trunk?

@Dominique: Would you give it a go on your open patch collection? Maybe it
fixes one PR, but I am not very hopeful, because the patch is merely removing
complexity instead of doing new things.

Regards,
Andre
-- 
Andre Vehreschild * Email: vehre ad gmx dot de 


rework_derived_alloc.clog
Description: Binary data
diff --git a/gcc/fortran/expr.c b/gcc/fortran/expr.c
index bb183d4..0e94ae8 100644
--- a/gcc/fortran/expr.c
+++ b/gcc/fortran/expr.c
@@ -4131,6 +4131,26 @@ gfc_apply_init (gfc_typespec *ts, symbol_attribute *attr, gfc_expr *init)
 }
 
 
+/* Check whether an expression is a structure constructor and whether it has
+   other values than NULL.  */
+
+bool
+is_non_empty_structure_constructor (gfc_expr * e)
+{
+  if (e->expr_type != EXPR_STRUCTURE)
+return false;
+
+  gfc_constructor *cons = gfc_constructor_first (e->value.constructor);
+  while (cons)
+{
+  if (!cons->expr || cons->expr->expr_type != EXPR_NULL)
+	return true;
+  cons = gfc_constructor_next (cons);
+}
+  return false;
+}
+
+
 /* Check for default initializer; sym->value is not enough
as it is also set for EXPR_NULL of allocatables.  */
 
@@ -4145,7 +4165,9 @@ gfc_has_default_initializer (gfc_symbol *der)
   {
 if (!c->attr.pointer && !c->attr.proc_pointer
 	 && !(c->attr.allocatable && der == c->ts.u.derived)
-	 && gfc_has_default_initializer (c->ts.u.derived))
+	 && ((c->initializer
+		  && is_non_empty_structure_constructor (c->initializer))
+		 || gfc_has_default_initializer (c->ts.u.derived)))
 	  return true;
 	if (c->attr.pointer && c->initializer)
 	  return true;
diff --git a/gcc/fortran/resolve.c b/gcc/fortran/resolve.c
index 14685d2..c341bbc 100644
--- a/gcc/fortran/resolve.c
+++ b/gcc/fortran/resolve.c
@@ -7046,35 +7046,6 @@ conformable_arrays (gfc_expr *e1, gfc_expr *e2)
   return true;
 }
 
-static void
-cond_init (gfc_code *code, gfc_expr *e, int pointer, gfc_expr *init_e)
-{
-  gfc_code *block;
-  gfc_expr *cond;
-  gfc_code *init_st;
-  gfc_expr *e_to_init = gfc_expr_to_initialize (e);
-
-  cond = pointer
-? gfc_build_intrinsic_call (gfc_current_ns, GFC_ISYM_ASSOCIATED,
-	"associated", code->loc, 2, gfc_copy_expr (e_to_init), NULL)
-: gfc_build_intrinsic_call (gfc_current_ns, GFC_ISYM_ALLOCATED,
-	"allocated", code->loc, 1, gfc_copy_expr (e_to_init));
-
-  init_st = gfc_get_code (EXEC_INIT_ASSIGN);
-  init_st->loc = code->loc;
-  init_st->expr1 = e_to_init;
-  init_st->expr2 = init_e;
-
-  block = gfc_get_code (EXEC_IF);
-  block->loc = code->loc;
-  block->block = gfc_get_code (EXEC_IF);
-  block->block->loc = code->loc;
-  block->block->expr1 = cond;
-  block->block->next = init_st;
-  block->next = code->next;
-
-  code->next = block;
-}
 
 /* Resolve the expression in an ALLOCATE statement, doing the additional
checks to see whether the expression is OK or not.  The expression must
@@ -7325,34 +7296,6 @@ resolve_allocate_expr (gfc_expr *e, gfc_code *code, bool *array_alloc_wo_spec)
   /* We have to zero initialize the integer variable.  */
   code->expr3 = gfc_get_int_expr (gfc_default_integer_kind, >where, 0);
 }
-  else if (!code->expr3)
-{
-  /* Set up default initializer if needed.  */
-  gfc_typespec ts;
-  gfc_expr *init_e;
-
-  if (gfc_bt_struct (code->ext.alloc.ts.type))
-	ts = code->ext.alloc.ts;
-  else
-	ts = e->ts;
-
-  if (ts.type == BT_CLASS)
-	ts = ts.u.derived->components->ts;
-
-

Re: [PATCH 2/2, i386]: Implement TARGET_EXPAND_DIVMOD_LIBFUNC

2016-11-03 Thread Uros Bizjak

On Thu, Nov 3, 2016 at 1:58 PM, Eric Botcazou  wrote:
>> libfunc, as in "__{,u}divmod{di,ti}4 library function" is already
>> implemented in libgcc. But the enablement of this function inside the
>> compiler has to be performed by each target.
>
> So can we do it generically instead of duplicating it ~50 times?

I guess it can be done. Currently the expander goes:

--cut here--
  /* Check if optab_handler exists for divmod_optab for given mode.  */
  if (optab_handler (tab, mode) != CODE_FOR_nothing)
{
  quotient = gen_reg_rtx (mode);
  remainder = gen_reg_rtx (mode);
  expand_twoval_binop (tab, op0, op1, quotient, remainder, unsignedp);
}

  /* Generate call to divmod libfunc if it exists.  */
  else if ((libfunc = optab_libfunc (tab, mode)) != NULL_RTX)
targetm.expand_divmod_libfunc (libfunc, mode, op0, op1,
   , );

  else
gcc_unreachable ();
--cut here--

so, by declaring divmod libfunc, the target also has to provide target hook.

Let's ask authors of the original divmod patch for the details.

Uros.

RE: [PATCH,testsuite] MIPS: Downgrade R6 to R5 if tests need branch-likely instructions.

2016-11-03 Thread Matthew Fortune

Toma Tabacu  writes:
> The gcc.target/mips/wrap-delay.c test was failing on mips-img-*
> toolchains
> because it was using -mbranch-likely with an R6 target, and branch-
> likely
> instructions were removed in R6.
> 
> This patch makes the testsuite downgrade to R5 if the -mbranch-likely
> option
> is present and we're targeting R6.
> 
> Tested with mips-img-elf and mips-img-linux-gnu.

Hi Toma,

Welcome to GCC development, thanks for your first patch. As you can see
from Catherine's reply the change looks good. I'll just cover some
housekeeping issues...

> gcc/testsuite/
> * gcc.target/mips/mips.exp: Add check for -mbranch-likely in
> condition for R5 downgrade.

Changelogs are an art form which will take some getting used to. This
is almost there but needs to reference the function affected.

* gcc.target/mips/mips.exp (mips-dg-options): Downgrade to R5
for -mbranch-likely and infer -mno-branch-likely for R6.

I have extended the comment as well as there is an additional change
needed for this patch ideally.

> diff --git a/gcc/testsuite/gcc.target/mips/mips.exp
> b/gcc/testsuite/gcc.target/mips/mips.exp
> index 7c24140..382d69c 100644
> --- a/gcc/testsuite/gcc.target/mips/mips.exp
> +++ b/gcc/testsuite/gcc.target/mips/mips.exp
> @@ -1176,7 +1176,8 @@ proc mips-dg-options { args } {
>|| [mips_have_test_option_p options "-mpaired-
> single"]
>|| [mips_have_test_option_p options "-
> mnan=legacy"]
>|| [mips_have_test_option_p options "-
> mabs=legacy"]
> -  || [mips_have_test_option_p options "!HAS_LSA"])
> } {
> +  || [mips_have_test_option_p options "!HAS_LSA"]
> +  || [mips_have_test_option_p options "-mbranch-
> likely"]) } {
> if { $gp_size == 32 } {
> mips_make_test_option options "-mips32r5"
> } else {

Please can you make sure to retain the original patch formatting
when posting. I suspect you have copied this out of a putty session
or similar and have therefore lost the tabs.

The extra change is that in the post-arch option processing we will
need to infer -mno-branch-likely for the $isa_rev > 5 case much like
we infer -mnan=2008 and -mabs=2008. This is so that when running the
testsuite using -mips32r5 or earlier, with -mbranch-likely as part of
the user-supplied test flags, then a test which is upgraded to
mips32r6 does not attempt to use -mbranch-likely.

Hope that wasn't too cryptic!

Thanks,
Matthew

Re: [Patch, rtl] PR middle-end/78016, keep REG_NOTE order during insn copy

2016-11-03 Thread Bernd Schmidt


On 11/03/2016 01:33 PM, Eric Botcazou wrote:

Thanks for the feedback,  I'll try to work through this.


Thanks, but since there doesn't seem to be a consensus on the goal, feel free
to disregard it and just implement what you need for your initial work.


FWIW here's a more complete version of my patch which I'm currently 
testing. Let me know if you think it's at least a good enough 
intermediate step to be installed. I think, the sel-sched example 
notwithstanding, this is something that normally should not be needed 
outside of emit_copy_of_insn_after, so having that do the right thing 
ought to be good enough.



Bernd

	* emit-rtl.c (emit_copy_of_insn_after): Duplicate notes in order.
	* sel-sched-ir.c (create_copy_of_insn_rtx): Likewise.
	* rtl.h (duplicate_reg_notes): Declare.
	* rtlanal.c (duplicate_reg_note): New function.

Index: gcc/emit-rtl.c
===
--- gcc/emit-rtl.c	(revision 241233)
+++ gcc/emit-rtl.c	(working copy)
@@ -6169,17 +6169,18 @@ emit_copy_of_insn_after (rtx_insn *insn,
  which may be duplicated by the basic block reordering code.  */
   RTX_FRAME_RELATED_P (new_rtx) = RTX_FRAME_RELATED_P (insn);
 
+  /* Locate the end of existing REG_NOTES in NEW_RTX.  */
+  rtx *ptail = _NOTES (new_rtx);
+  gcc_assert (*ptail == NULL_RTX);
+
   /* Copy all REG_NOTES except REG_LABEL_OPERAND since mark_jump_label
  will make them.  REG_LABEL_TARGETs are created there too, but are
  supposed to be sticky, so we copy them.  */
   for (link = REG_NOTES (insn); link; link = XEXP (link, 1))
 if (REG_NOTE_KIND (link) != REG_LABEL_OPERAND)
   {
-	if (GET_CODE (link) == EXPR_LIST)
-	  add_reg_note (new_rtx, REG_NOTE_KIND (link),
-			copy_insn_1 (XEXP (link, 0)));
-	else
-	  add_shallow_copy_of_reg_note (new_rtx, link);
+	*ptail = duplicate_reg_note (link);
+	ptail =  (*ptail, 1);
   }
 
   INSN_CODE (new_rtx) = INSN_CODE (insn);
Index: gcc/rtl.h
===
--- gcc/rtl.h	(revision 241233)
+++ gcc/rtl.h	(working copy)
@@ -3008,6 +3008,7 @@ extern rtx alloc_reg_note (enum reg_note
 extern void add_reg_note (rtx, enum reg_note, rtx);
 extern void add_int_reg_note (rtx, enum reg_note, int);
 extern void add_shallow_copy_of_reg_note (rtx_insn *, rtx);
+extern rtx duplicate_reg_note (rtx);
 extern void remove_note (rtx, const_rtx);
 extern void remove_reg_equal_equiv_notes (rtx_insn *);
 extern void remove_reg_equal_equiv_notes_for_regno (unsigned int);
Index: gcc/rtlanal.c
===
--- gcc/rtlanal.c	(revision 241233)
+++ gcc/rtlanal.c	(working copy)
@@ -2304,6 +2304,20 @@ add_shallow_copy_of_reg_note (rtx_insn *
 add_reg_note (insn, REG_NOTE_KIND (note), XEXP (note, 0));
 }
 
+/* Duplicate NOTE and return the copy.  */
+rtx
+duplicate_reg_note (rtx note)
+{
+  reg_note kind = REG_NOTE_KIND (note);
+
+  if (GET_CODE (note) == INT_LIST)
+return gen_rtx_INT_LIST ((machine_mode) kind, XINT (note, 0), NULL_RTX);
+  else if (GET_CODE (note) == EXPR_LIST)
+return alloc_reg_note (kind, copy_insn_1 (XEXP (note, 0)), NULL_RTX);
+  else
+return alloc_reg_note (kind, XEXP (note, 0), NULL_RTX);
+}
+
 /* Remove register note NOTE from the REG_NOTES of INSN.  */
 
 void
Index: gcc/sel-sched-ir.c
===
--- gcc/sel-sched-ir.c	(revision 241233)
+++ gcc/sel-sched-ir.c	(working copy)
@@ -5762,6 +5762,10 @@ create_copy_of_insn_rtx (rtx insn_rtx)
   res = create_insn_rtx_from_pattern (copy_rtx (PATTERN (insn_rtx)),
   NULL_RTX);
 
+  /* Locate the end of existing REG_NOTES in NEW_RTX.  */
+  rtx *ptail = _NOTES (res);
+  gcc_assert (*ptail == NULL_RTX);
+
   /* Copy all REG_NOTES except REG_EQUAL/REG_EQUIV and REG_LABEL_OPERAND
  since mark_jump_label will make them.  REG_LABEL_TARGETs are created
  there too, but are supposed to be sticky, so we copy them.  */
@@ -5770,11 +5774,8 @@ create_copy_of_insn_rtx (rtx insn_rtx)
 	&& REG_NOTE_KIND (link) != REG_EQUAL
 	&& REG_NOTE_KIND (link) != REG_EQUIV)
   {
-	if (GET_CODE (link) == EXPR_LIST)
-	  add_reg_note (res, REG_NOTE_KIND (link),
-			copy_insn_1 (XEXP (link, 0)));
-	else
-	  add_reg_note (res, REG_NOTE_KIND (link), XEXP (link, 0));
+	*ptail = duplicate_reg_note (link);
+	ptail =  (*ptail, 1);
   }
 
   return res;

Re: [PATCH] Convert character arrays to string csts

2016-11-03 Thread Jan Hubicka

> On 11/03/2016 01:12 PM, Martin Liška wrote:
> >+  tree init = DECL_INITIAL (decl);
> >+  if (init
> >+  && TREE_READONLY (decl)
> >+  && can_convert_ctor_to_string_cst (init))
> >+DECL_INITIAL (decl) = build_string_cst_from_ctor (init);
> 
> I'd merge these two new functions since they're only ever called
> together. We'd then have something like
> 
> if (init && TREE_READONLY (decl))
>   init = convert_ctor_to_string_cst (init);
> if (init)
>   DECL_INITIAL (decl) = init;
> 
> I'll defer to Jan on whether finalize_decl seems like a good place
> to do this.

I think finalize_decl may be bit too early because frontends may expects the
ctors to be in a way they produced them.  We only want to convert those arrays
seen by middle-end so I would move the logic to varpool_node::analyze

Otherwise the patch seems fine to me.

Honza
> 
> >diff --git a/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-gimple.c 
> >b/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-gimple.c
> >index 283bd1c..b2d1fd5 100644
> >--- a/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-gimple.c
> >+++ b/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-gimple.c
> >@@ -4,12 +4,15 @@
> > char *buffer1;
> > char *buffer2;
> >
> >+const char global[] = {'a', 'b', 'c', 'd', '\0'};
> >+
> > #define SIZE 1000
> >
> > int
> > main (void)
> > {
> >   const char* const foo1 = "hello world";
> >+  const char local[] = "abcd";
> >
> >   buffer1 = __builtin_malloc (SIZE);
> >   __builtin_strcpy (buffer1, foo1);
> >@@ -45,6 +48,10 @@ main (void)
> > __builtin_abort ();
> >   if (__builtin_memchr (foo1, null, 12) != foo1 + 11)
> > __builtin_abort ();
> >+  if (__builtin_memchr (global, null, 5) == 0)
> >+__builtin_abort ();
> >+  if (__builtin_memchr (local, null, 5) == 0)
> >+__builtin_abort ();
> 
> How is that a meaningful test? This seems to work even with an
> unpatched gcc. I'd have expected something that shows a benefit for
> doing this conversion, and maybe also a test that shows it isn't
> done in cases where it's not allowed.
> 
> > tree
> >-build_string_literal (int len, const char *str)
> >+build_string_literal (int len, const char *str, bool build_addr_expr)
> 
> New arguments should be documented in the function comment.
> 
> >+/* Return TRUE when CTOR can be converted to a string constant.  */
> 
> "if", not "when".
> 
> >+  unsigned HOST_WIDE_INT elements = CONSTRUCTOR_NELTS (ctor);
> >+  FOR_EACH_CONSTRUCTOR_ELT (CONSTRUCTOR_ELTS (ctor), idx, key, value)
> >+{
> >+  if (key == NULL_TREE
> >+  || TREE_CODE (key) != INTEGER_CST
> >+  || !tree_fits_uhwi_p (value)
> >+  || !useless_type_conversion_p (TREE_TYPE (value), char_type_node))
> >+return false;
> 
> Shouldn't all elements have the same type, or do you really have to
> call useless_type_conversion in a loop?
> 
> >+  /* Allow zero character just at the end of a string.  */
> >+  if (integer_zerop (value))
> >+return idx == elements - 1;
> 
> Don't you also have to explicitly check it's there?
> 
> 
> Bernd

Re: [PATCH 2/2, i386]: Implement TARGET_EXPAND_DIVMOD_LIBFUNC

2016-11-03 Thread Eric Botcazou

> libfunc, as in "__{,u}divmod{di,ti}4 library function" is already
> implemented in libgcc. But the enablement of this function inside the
> compiler has to be performed by each target.

So can we do it generically instead of duplicating it ~50 times?

-- 
Eric Botcazou

Re: Change default level for -Wimplicit-fallthrough

2016-11-03 Thread Markus Trippelsdorf

On 2016.11.03 at 13:32 +0100, Jakub Jelinek wrote:
> On Thu, Nov 03, 2016 at 01:22:11PM +0100, Bernd Schmidt wrote:
> > On 11/03/2016 12:58 PM, Jakub Jelinek wrote:
> > >On Thu, Nov 03, 2016 at 12:51:15PM +0100, Bernd Schmidt wrote:
> > >>I'm concerned about the number of false positives for this warning, and
> > >>judging by previous discussions, I'm not alone in this. This patch limits 
> > >>it
> > >>to level 1 (any comment before the case label disables the warning) for
> > >>cases where the user specified no explicit level. It'll still generate
> > >>enough noise that people will be aware of it and can choose whether to 
> > >>use a
> > >>higher level or not.
> > >>
> > >>Bootstrapped and tested on x86_64-linux. Ok?
> > >
> > >I disagree, I'm ok with changing it to 2, but 1 is too much.
> > 
> > Well, we have data from our own sources where we had to "fix" lots of
> > perfectly good code, and also from the Linux kernel. From an earlier
> > discussion:
> 
> That data wasn't really convincing on this.  All it proved is that most of
> the cases are (likely) deliberate fall-throughs without any comment
> whatsoever, the rest is in the noise.  As one needs to deal with those
> where comments are missing altogether, dealing with the noise is acceptable.

Without Bernd's patch to set the default to 1 you will drown in false
positives once you start using gcc-7 to build a whole distro. On my
Gentoo test box anything but level 1 is simply unacceptable, because you
will miss important other warnings in the -Wimplicit-fallthrough noise
otherwise.

-- 
Markus

RE: [PATCH,testsuite] MIPS: Downgrade R6 to R5 if tests need branch-likely instructions.

2016-11-03 Thread Moore, Catherine



> -Original Message-
> From: Toma Tabacu [mailto:toma.tab...@imgtec.com]
> Sent: Thursday, November 3, 2016 6:58 AM
> Subject: [PATCH,testsuite] MIPS: Downgrade R6 to R5 if tests need
> branch-likely instructions.
> 
> 
> gcc/testsuite/
> * gcc.target/mips/mips.exp: Add check for -mbranch-likely in
> condition for R5 downgrade.
> 

This patch is OK.
Catherine

Re: [patch, avr] Make pmem-wrap-around option as default

2016-11-03 Thread Georg-Johann Lay


On 03.11.2016 08:58, Pitchumani Sivanupandi wrote:

Most of the AVR's 8k memorydevices have only rjmp instruction, not jmp. So, it
is important to wrap around jump destination to check if it can reach backwards.

Currently link specs passes --pmem-wrap-around=xxK when mrelax and
mpmem-wrap-around options are enabled. Attached patch changes the specs so that
option --pmem-wrap-around=8K is passed for 8k memory devices if
-mno-pmem-wrap-around is not enabled.

If OK, could someone commit please?

Note: Currently 8k devices are identified based on name prefix. We are working
on alternative method to incorporate flash memory size.


Currently, "at90usb8" this the only prefix that results in wrap_k = 8, so 
without adding knowledge about flash sizes to the compiler, the change makes 
not much sense... (but does not do harm either).


The right place for flash size info would be avr-mcus.def together with a new 
command option to specify the flash size and draw conclusions from it.  When I 
asked Atmel about a mapping of Device to flash size, the support told me to 
check the data sheets -- not really practical for ~300 devices in avr-mcus.def 
:-)  Atmel *must* have information about this...


The new option would render -mn-flash superfluous, but we should keep it for 
backward compatibility.


Johann



Regards,
Pitchumani

gcc/ChangeLog

2016-11-03  Pitchumani Sivanupandi  

* config/avr/gen-avr-mmcu-specs.c (print_mcu): Update link_pmem_wrap spec
string to add linker option --pmem-wrap-around=8k.
* config/avr/specs.h: Add link_pmem_wrap to linker specs (LINK_SPEC).




diff --git a/gcc/config/avr/specs.h b/gcc/config/avr/specs.h
index 52763cc..fbf0ce6 100644
--- a/gcc/config/avr/specs.h
+++ b/gcc/config/avr/specs.h
@@ -65,6 +65,7 @@ along with GCC; see the file COPYING3.  If not see
   "%(link_data_start) " \
   "%(link_text_start) " \
   "%(link_relax) "  \
+  "%(link_pmem_wrap) "  \
   "%{shared:%eshared is not supported} "

 #undef  LIB_SPEC


Shouldn't link_pmem_wrap then be removed from link_relax, i.e. from 
LINK_RELAX_SPEC?  And what happens if relaxation is off?


Johann

Re: [PATCH] Convert character arrays to string csts

2016-11-03 Thread Bernd Schmidt


On 11/03/2016 01:12 PM, Martin Liška wrote:

+  tree init = DECL_INITIAL (decl);
+  if (init
+  && TREE_READONLY (decl)
+  && can_convert_ctor_to_string_cst (init))
+DECL_INITIAL (decl) = build_string_cst_from_ctor (init);


I'd merge these two new functions since they're only ever called 
together. We'd then have something like


if (init && TREE_READONLY (decl))
  init = convert_ctor_to_string_cst (init);
if (init)
  DECL_INITIAL (decl) = init;

I'll defer to Jan on whether finalize_decl seems like a good place to do 
this.



diff --git a/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-gimple.c 
b/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-gimple.c
index 283bd1c..b2d1fd5 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-gimple.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-gimple.c
@@ -4,12 +4,15 @@
 char *buffer1;
 char *buffer2;

+const char global[] = {'a', 'b', 'c', 'd', '\0'};
+
 #define SIZE 1000

 int
 main (void)
 {
   const char* const foo1 = "hello world";
+  const char local[] = "abcd";

   buffer1 = __builtin_malloc (SIZE);
   __builtin_strcpy (buffer1, foo1);
@@ -45,6 +48,10 @@ main (void)
 __builtin_abort ();
   if (__builtin_memchr (foo1, null, 12) != foo1 + 11)
 __builtin_abort ();
+  if (__builtin_memchr (global, null, 5) == 0)
+__builtin_abort ();
+  if (__builtin_memchr (local, null, 5) == 0)
+__builtin_abort ();


How is that a meaningful test? This seems to work even with an unpatched 
gcc. I'd have expected something that shows a benefit for doing this 
conversion, and maybe also a test that shows it isn't done in cases 
where it's not allowed.



 tree
-build_string_literal (int len, const char *str)
+build_string_literal (int len, const char *str, bool build_addr_expr)


New arguments should be documented in the function comment.


+/* Return TRUE when CTOR can be converted to a string constant.  */


"if", not "when".


+  unsigned HOST_WIDE_INT elements = CONSTRUCTOR_NELTS (ctor);
+  FOR_EACH_CONSTRUCTOR_ELT (CONSTRUCTOR_ELTS (ctor), idx, key, value)
+{
+  if (key == NULL_TREE
+ || TREE_CODE (key) != INTEGER_CST
+ || !tree_fits_uhwi_p (value)
+ || !useless_type_conversion_p (TREE_TYPE (value), char_type_node))
+   return false;


Shouldn't all elements have the same type, or do you really have to call 
useless_type_conversion in a loop?



+  /* Allow zero character just at the end of a string.  */
+  if (integer_zerop (value))
+   return idx == elements - 1;


Don't you also have to explicitly check it's there?


Bernd

Re: [PATCH 2/2, i386]: Implement TARGET_EXPAND_DIVMOD_LIBFUNC

2016-11-03 Thread Uros Bizjak

On Thu, Nov 3, 2016 at 1:18 PM, Eric Botcazou  wrote:
>> libfunc is already implemented for all targets to use, there is also:
>>
>> OPTAB_NC(sdivmod_optab, "divmod$a4", UNKNOWN)
>> OPTAB_NC(udivmod_optab, "udivmod$a4", UNKNOWN)
>>
>> in optabs.def that can probably be changed  for generic expansion.
>
> So what's the purpose of ix86_init_libfuncs if the libfuncs are already there?

libfunc, as in "__{,u}divmod{di,ti}4 library function" is already
implemented in libgcc. But the enablement of this function inside the
compiler has to be performed by each target.

Uros.

Re: [PATCH] Convert character arrays to string csts

2016-11-03 Thread Richard Biener

On Thu, Nov 3, 2016 at 1:12 PM, Martin Liška  wrote:
> Hello.
>
> This is small follow-up of the patches I sent to string folding.
> The patch transforms character array defined in an initializer to string
> constant:
>
> +const char global[] = {'a', 'b', 'c', 'd', '\0'};
>
> Apart from that, it also enables string folding of local symbols like:
> +  const char local[] = "abcd";
>
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Hmm, does this handle

const char global[] = {'a', [4] = 'd', 'b', [3] = '\0'};

correctly?

I think you need to check that 'key' is increasing and there are no gaps
(ISTR constructor elements are sorted but gaps can still appear).

+ || !useless_type_conversion_p (TREE_TYPE (value), char_type_node))

please instead check the element type of the array constructor.  I'd also
use a stricter check like TYPE_MAIN_VARIANT (type) == char_type_node
to avoid changing non-strings like unsigned / signed char.

Finally I'm a bit worried about doing this for all 'char'
constructors.  Maybe we
should restrict this to those with ISPRINT () entries?

Richard.

> Martin

Re: [PATCH, vec-tails] Support loop epilogue vectorization

2016-11-03 Thread Yuri Rumyantsev

Hi Richard,

I did not understand your last remark:

> That is, here (and avoid the FOR_EACH_LOOP change):
>
> @@ -580,12 +586,21 @@ vectorize_loops (void)
>   && dump_enabled_p ())
>   dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
>"loop vectorized\n");
> -   vect_transform_loop (loop_vinfo);
> +   new_loop = vect_transform_loop (loop_vinfo);
> num_vectorized_loops++;
>/* Now that the loop has been vectorized, allow it to be unrolled
>   etc.  */
>  loop->force_vectorize = false;
>
> +   /* Add new loop to a processing queue.  To make it easier
> +  to match loop and its epilogue vectorization in dumps
> +  put new loop as the next loop to process.  */
> +   if (new_loop)
> + {
> +   loops.safe_insert (i + 1, new_loop->num);
> +   vect_loops_num = number_of_loops (cfun);
> + }
>
> simply dispatch to a vectorize_epilogue (loop_vinfo, new_loop)
f> unction which will set up stuff properly (and also perform
> the if-conversion of the epilogue there).
>
> That said, if we can get in non-masked epilogue vectorization
> separately that would be great.

Could you please clarify your proposal.

Thanks.
Yuri.

2016-11-02 15:27 GMT+03:00 Richard Biener :
> On Tue, 1 Nov 2016, Yuri Rumyantsev wrote:
>
>> Hi All,
>>
>> I re-send all patches sent by Ilya earlier for review which support
>> vectorization of loop epilogues and loops with low trip count. We
>> assume that the only patch - vec-tails-07-combine-tail.patch - was not
>> approved by Jeff.
>>
>> I did re-base of all patches and performed bootstrapping and
>> regression testing that did not show any new failures. Also all
>> changes related to new vect_do_peeling algorithm have been changed
>> accordingly.
>>
>> Is it OK for trunk?
>
> I would have prefered that the series up to -03-nomask-tails would
> _only_ contain epilogue loop vectorization changes but unfortunately
> the patchset is oddly separated.
>
> I have a comment on that part nevertheless:
>
> @@ -1608,7 +1614,10 @@ vect_enhance_data_refs_alignment (loop_vec_info
> loop_vinfo)
>/* Check if we can possibly peel the loop.  */
>if (!vect_can_advance_ivs_p (loop_vinfo)
>|| !slpeel_can_duplicate_loop_p (loop, single_exit (loop))
> -  || loop->inner)
> +  || loop->inner
> +  /* Required peeling was performed in prologue and
> +is not required for epilogue.  */
> +  || LOOP_VINFO_EPILOGUE_P (loop_vinfo))
>  do_peeling = false;
>
>if (do_peeling
> @@ -1888,7 +1897,10 @@ vect_enhance_data_refs_alignment (loop_vec_info
> loop_vinfo)
>
>do_versioning =
> optimize_loop_nest_for_speed_p (loop)
> -   && (!loop->inner); /* FORNOW */
> +   && (!loop->inner) /* FORNOW */
> +/* Required versioning was performed for the
> +  original loop and is not required for epilogue.  */
> +   && !LOOP_VINFO_EPILOGUE_P (loop_vinfo);
>
>if (do_versioning)
>  {
>
> please do that check in the single caller of this function.
>
> Otherwise I still dislike the new ->aux use and I believe that simply
> passing down info from the processed parent would be _much_ cleaner.
> That is, here (and avoid the FOR_EACH_LOOP change):
>
> @@ -580,12 +586,21 @@ vectorize_loops (void)
> && dump_enabled_p ())
>dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
> "loop vectorized\n");
> -   vect_transform_loop (loop_vinfo);
> +   new_loop = vect_transform_loop (loop_vinfo);
> num_vectorized_loops++;
> /* Now that the loop has been vectorized, allow it to be unrolled
>etc.  */
> loop->force_vectorize = false;
>
> +   /* Add new loop to a processing queue.  To make it easier
> +  to match loop and its epilogue vectorization in dumps
> +  put new loop as the next loop to process.  */
> +   if (new_loop)
> + {
> +   loops.safe_insert (i + 1, new_loop->num);
> +   vect_loops_num = number_of_loops (cfun);
> + }
>
> simply dispatch to a vectorize_epilogue (loop_vinfo, new_loop)
> function which will set up stuff properly (and also perform
> the if-conversion of the epilogue there).
>
> That said, if we can get in non-masked epilogue vectorization
> separately that would be great.
>
> I'm still torn about all the rest of the stuff and question its
> usability (esp. merging the epilogue with the main vector loop).
> But it has already been approved ... oh well.
>
> Thanks,
> Richard.

Re: [Patch, rtl] PR middle-end/78016, keep REG_NOTE order during insn copy

2016-11-03 Thread Eric Botcazou

> Thanks for the feedback,  I'll try to work through this.

Thanks, but since there doesn't seem to be a consensus on the goal, feel free 
to disregard it and just implement what you need for your initial work.

-- 
Eric Botcazou

Re: Change default level for -Wimplicit-fallthrough

2016-11-03 Thread Jakub Jelinek

On Thu, Nov 03, 2016 at 01:22:11PM +0100, Bernd Schmidt wrote:
> On 11/03/2016 12:58 PM, Jakub Jelinek wrote:
> >On Thu, Nov 03, 2016 at 12:51:15PM +0100, Bernd Schmidt wrote:
> >>I'm concerned about the number of false positives for this warning, and
> >>judging by previous discussions, I'm not alone in this. This patch limits it
> >>to level 1 (any comment before the case label disables the warning) for
> >>cases where the user specified no explicit level. It'll still generate
> >>enough noise that people will be aware of it and can choose whether to use a
> >>higher level or not.
> >>
> >>Bootstrapped and tested on x86_64-linux. Ok?
> >
> >I disagree, I'm ok with changing it to 2, but 1 is too much.
> 
> Well, we have data from our own sources where we had to "fix" lots of
> perfectly good code, and also from the Linux kernel. From an earlier
> discussion:

That data wasn't really convincing on this.  All it proved is that most of
the cases are (likely) deliberate fall-throughs without any comment
whatsoever, the rest is in the noise.  As one needs to deal with those
where comments are missing altogether, dealing with the noise is acceptable.

> Markus Trippelsdorf wrote:
> >>>I randomly looked at the differences and almost all additional
> >>>-Wimplicit-fallthrough=2 warnings are bogus (~5% are genuine).
> >>>And _all_ additional -Wimplicit-fallthrough=3 warnings appear
> >>>to be bogus.
> [...]
> >Actually looking more closely it appears that all of the 136 additional
> >warnings for level 2 are bogus, too.
> 
> Also, levels above 1 enforce English as a language, which isn't something we
> should be doing, even if we could detect fallthrough comments reliably (and
> we can't).

If you use non-english comment, then you need to add lint-like comments for
this, or attributes.  IMHO it is really not very high cost (compared to
all those without any comment at all) to adjust code, what takes time is to
analyze if something is intentional or invalid fallthrough.  The clearer the
comments are on that, the better.

Jakub

[PATCH] AIX xcoff declare object name

2016-11-03 Thread David Edelsohn

During Richi's debugging of the section type conflict, he found a
problem with rs6000_xcoff_declare_object_name trying to access a decl
that did not exist.  Fixed thusly.

* config/rs6000/rs6000.c (rs6000_xcoff_declare_object_name): Use
symtab_node::get_create.

Index: rs6000.c
===
--- rs6000.c(revision 241814)
+++ rs6000.c(working copy)
@@ -35414,8 +35414,8 @@ rs6000_xcoff_declare_object_name (FILE *file, cons
   struct declare_alias_data data = {file, false};
   RS6000_OUTPUT_BASENAME (file, name);
   fputs (":\n", file);
-  symtab_node::get (decl)->call_for_symbol_and_aliases (rs6000_declare_alias,
-   , true);
+  symtab_node::get_create (decl)->call_for_symbol_and_aliases
(rs6000_declare_alias,
+  , true);
 }

 /* Overide the default 'SYMBOL-.' syntax with AIX compatible 'SYMBOL-$'. */

Re: Change default level for -Wimplicit-fallthrough

2016-11-03 Thread Bernd Schmidt


On 11/03/2016 12:58 PM, Jakub Jelinek wrote:

On Thu, Nov 03, 2016 at 12:51:15PM +0100, Bernd Schmidt wrote:

I'm concerned about the number of false positives for this warning, and
judging by previous discussions, I'm not alone in this. This patch limits it
to level 1 (any comment before the case label disables the warning) for
cases where the user specified no explicit level. It'll still generate
enough noise that people will be aware of it and can choose whether to use a
higher level or not.

Bootstrapped and tested on x86_64-linux. Ok?


I disagree, I'm ok with changing it to 2, but 1 is too much.


Well, we have data from our own sources where we had to "fix" lots of 
perfectly good code, and also from the Linux kernel. From an earlier 
discussion:


Markus Trippelsdorf wrote:

I randomly looked at the differences and almost all additional
-Wimplicit-fallthrough=2 warnings are bogus (~5% are genuine).
And _all_ additional -Wimplicit-fallthrough=3 warnings appear
to be bogus.

[...]

Actually looking more closely it appears that all of the 136 additional
warnings for level 2 are bogus, too.


Also, levels above 1 enforce English as a language, which isn't 
something we should be doing, even if we could detect fallthrough 
comments reliably (and we can't).



Bernd

Re: [Patch, rtl] PR middle-end/78016, keep REG_NOTE order during insn copy

2016-11-03 Thread Jiong Wang


On 03/11/16 12:06, Eric Botcazou wrote:

What's your decision on this?

I think that we ought to standardize on a single order for note copying in the
RTL middle-end and the best way to enforce it is to have a single primitive in
rtlanal.c, with an optional filtering.  Bernd's patch is a step in the right
direction, but doesn't enforce the single order.  Maybe something based on a
macro calling duplicate_reg_note, but not clear whether it's really better.


Thanks for the feedback,  I'll try to work through this.

Regards,
Jiong

[PATCH][ARM] Fix ldrd offsets

2016-11-03 Thread Wilco Dijkstra

Fix ldrd offsets of Thumb-2 - for TARGET_LDRD the range is +-1020,
without -255..4091.  This reduces the number of addressing instructions
when using DI mode operations (such as in PR77308).

Bootstrap & regress OK.

ChangeLog:
2015-11-03  Wilco Dijkstra  

gcc/
* config/arm/arm.c (arm_legitimate_index_p): Add comment.
(thumb2_legitimate_index_p): Use correct range for DI/DF mode.
--

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
3c4c7042d9c2101619722b5822b3d1ca37d637b9..5d12cf9c46c27d60a278d90584bde36ec86bb3fe
 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -7486,6 +7486,8 @@ arm_legitimate_index_p (machine_mode mode, rtx index, 
RTX_CODE outer,
{
  HOST_WIDE_INT val = INTVAL (index);
 
+ /* Assume we emit ldrd or 2x ldr if !TARGET_LDRD.
+If vldr is selected it uses arm_coproc_mem_operand.  */
  if (TARGET_LDRD)
return val > -256 && val < 256;
  else
@@ -7613,11 +7615,13 @@ thumb2_legitimate_index_p (machine_mode mode, rtx 
index, int strict_p)
   if (code == CONST_INT)
{
  HOST_WIDE_INT val = INTVAL (index);
- /* ??? Can we assume ldrd for thumb2?  */
- /* Thumb-2 ldrd only has reg+const addressing modes.  */
- /* ldrd supports offsets of +-1020.
-However the ldr fallback does not.  */
- return val > -256 && val < 256 && (val & 3) == 0;
+ /* Thumb-2 ldrd only has reg+const addressing modes.
+Assume we emit ldrd or 2x ldr if !TARGET_LDRD.
+If vldr is selected it uses arm_coproc_mem_operand.  */
+ if (TARGET_LDRD)
+   return IN_RANGE (val, -1020, 1020) && (val & 3) == 0;
+ else
+   return IN_RANGE (val, -255, 4095 - 4);
}
   else
return 0;

Re: [PATCH 2/2, i386]: Implement TARGET_EXPAND_DIVMOD_LIBFUNC

2016-11-03 Thread Eric Botcazou

> libfunc is already implemented for all targets to use, there is also:
> 
> OPTAB_NC(sdivmod_optab, "divmod$a4", UNKNOWN)
> OPTAB_NC(udivmod_optab, "udivmod$a4", UNKNOWN)
> 
> in optabs.def that can probably be changed  for generic expansion.

So what's the purpose of ix86_init_libfuncs if the libfuncs are already there?

-- 
Eric Botcazou

1 2 >

1 - 100 of 130 matches

Mail list logo