date:20130410

Re: [patch][DF] fix df_find_def and df_find_use

2013-04-10 Thread Paolo Bonzini

On 10/04/2013 21:33, Steven Bosscher wrote:
> Hello,
> 
> df_find_def and df_find_use do not work properly for hard registers
> because rtx_equal_p returns false for the case where
> REGNO(x)==REGNO(y) but the modes are different. This happened as
> follows in my case:
> 
> Breakpoint 9, df_reg_used (insn=0x3fffb6083c50, reg=0x3fffb5eb72e0) at
> ../../trunk/gcc/df-core.c:1856
> 1856  return df_find_use (insn, reg) != NULL;
> (gdb) p debug_rtx(reg)
> (reg:CCX 100 %icc)
> $37 = void
> (gdb) p debug_rtx(insn)
> (jump_insn 34 33 35 3 (set (pc)
> (if_then_else (le (reg:CCX 100 %icc)
> (const_int 0 [0]))
> (label_ref 44)
> (pc))) t.c:25 48 {*normal_branch}
>  (expr_list:REG_DEAD (reg:CCX 100 %icc)
> (expr_list:REG_BR_PROB (const_int 3900 [0xf3c])
> (nil)))
>  -> 44)
> $38 = void
> (gdb) step
> df_find_use (insn=0x3fffb6083c50, reg=0x3fffb5eb72e0) at
> ../../trunk/gcc/df-core.c:1829
> ...
> 1837  if (rtx_equal_p (DF_REF_REAL_REG (use), reg))
> (gdb) p debug_rtx(reg)
> (reg:CCX 100 %icc)
> $39 = void
> (gdb) p debug_rtx(DF_REF_REAL_REG(use))
> (reg:CC 100 %icc)
> $40 = void
> (gdb) p rtx_equal_p (DF_REF_REAL_REG (use), reg)
> $41 = 0
> 
> 
> I think we should just compare REGNO instead of going through rtx_equal_p.
> 
> Bootstrapped&tested on x86_64-unknown-linux-gnu.
> OK for trunk?

Ok.  Was this with out-of-tree patches?

Paolo

> Ciao!
> Steven
> 
> 
> 
> * df-core.c (df_find_def): Compare register numbers.
> (df_find_use): Compare register numbers.
> 
> Index: df-core.c
> ===
> --- df-core.c   (revision 197610)
> +++ df-core.c   (working copy)
> @@ -1800,7 +1800,7 @@ df_find_def (rtx insn, rtx reg)
>for (def_rec = DF_INSN_UID_DEFS (uid); *def_rec; def_rec++)
>  {
>df_ref def = *def_rec;
> -  if (rtx_equal_p (DF_REF_REAL_REG (def), reg))
> +  if (DF_REF_REGNO (def) == REGNO (reg))
> return def;
>  }
> 
> @@ -1834,14 +1834,14 @@ df_find_use (rtx insn, rtx reg)
>for (use_rec = DF_INSN_UID_USES (uid); *use_rec; use_rec++)
>  {
>df_ref use = *use_rec;
> -  if (rtx_equal_p (DF_REF_REAL_REG (use), reg))
> +  if (DF_REF_REGNO (use) == REGNO (reg))
> return use;
>  }
>if (df->changeable_flags & DF_EQ_NOTES)
>  for (use_rec = DF_INSN_UID_EQ_USES (uid); *use_rec; use_rec++)
>{
> df_ref use = *use_rec;
> -   if (rtx_equal_p (DF_REF_REAL_REG (use), reg))
> +   if (DF_REF_REGNO (use) == REGNO (reg))
>   return use;
>}
>return NULL;
>
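
A minimal sketch of the mismatch seen in the gdb session above, assuming a toy register type. The names toy_mode, toy_reg, toy_reg_equal_p and toy_same_regno_p are hypothetical stand-ins, not GCC code; they only model the fact that a structural comparison looks at the mode while a REGNO comparison does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for GCC's machine modes and REG rtxes (hypothetical).  */
enum toy_mode { TOY_CCmode, TOY_CCXmode };

struct toy_reg
{
  enum toy_mode mode;   /* plays the role of GET_MODE (x) */
  unsigned int regno;   /* plays the role of REGNO (x) */
};

/* Structural equality in the spirit of rtx_equal_p on two REGs:
   both the mode and the register number must match.  */
static bool
toy_reg_equal_p (const struct toy_reg *x, const struct toy_reg *y)
{
  assert (x != NULL && y != NULL);
  return x->mode == y->mode && x->regno == y->regno;
}

/* Register-number equality in the spirit of the patched check
   DF_REF_REGNO (ref) == REGNO (reg): the mode is ignored.  */
static bool
toy_same_regno_p (const struct toy_reg *x, const struct toy_reg *y)
{
  assert (x != NULL && y != NULL);
  return x->regno == y->regno;
}
```

With a use recorded as (reg:CC 100) and a query for (reg:CCX 100), the first helper reports no match and the second one does, which is the behaviour the patch is after for hard registers.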

Re: RFC: color diagnostics markers

2013-04-10 Thread Jakub Jelinek

On Wed, Apr 10, 2013 at 09:04:06PM -0500, Gabriel Dos Reis wrote:
> We might be saying the same thing using different languages.
> 
> I was saying that the %r/%R markers are ways of implementing the IL
> language I suggested in that message.  So, as such I do not object to it.
> Having an explicit call makes the FE make a "colorful" formatting
> decision way too early -- a FE shouldn't be concerned about color matters.
> That decision should be left to the device doing the formatting.  Separation
> of concerns here isn't just taste; it is good engineering practice.

But the decision is left to the device doing the formatting.
The %r/%R only says, this text in between is of this kind (locus, quote
(well, that is automatically done by the patch also for % and %qs etc.),
etc.), and we either color that using GCC_COLORS (or default) defined color
if requested through command line option and terminal supports it, or we
don't.

As discussed earlier, alternative to the current uses of %r/%R in the
sources would be %U (first and only letter from locus that is still available)
which would take location_t and would perform on it:
  expanded_location el = expand_location (va_arg (ap, location_t));
  pp_string (pp, colorize_start (pp_show_color (pp), "locus"));
  pp_string (pp, el.file);
  pp_colon (pp);
  pp_decimal_int (pp, el.line);
  if (context->show_column && el.column)
    {
      pp_colon (pp);
      pp_decimal_int (pp, el.column);
    }
  pp_string (pp, colorize_stop (pp_show_color (pp)));
or so.  But I wonder if we won't need %r/%R in the future, or add more and
more formatting codes, say if we wanted to highlight something that isn't
to be quoted around, or some substring inside quotes.  With %r/%R we have
the flexibility to do so easily, with just %U we don't.

Jakub
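
To make the %r/%R idea concrete, here is a small self-contained sketch of a formatter in which the caller only names the kind of text and the output side decides whether that turns into escape sequences. Everything in it (toy_format, toy_color_start, toy_color_stop, toy_append, the kind names and the chosen SGR codes) is an assumption made for illustration; it is not the pretty-printer code from the patch.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical table: map a kind of text to an SGR escape sequence.  */
static const char *
toy_color_start (bool show_color, const char *kind)
{
  if (!show_color)
    return "";
  if (strcmp (kind, "locus") == 0)
    return "\033[01m";
  if (strcmp (kind, "quote") == 0)
    return "\033[01;32m";
  return "";
}

static const char *
toy_color_stop (bool show_color)
{
  return show_color ? "\033[m" : "";
}

/* Append SRC to the NUL-terminated string in DST (capacity CAP bytes),
   truncating if it does not fit.  */
static void
toy_append (char *dst, size_t cap, const char *src)
{
  size_t used = strlen (dst);
  if (used + 1 < cap)
    snprintf (dst + used, cap - used, "%s", src);
}

/* Format FMT into OUT.  Understands %s, %%, %r (start of a span whose
   kind is passed as a const char * argument) and %R (end of the span).  */
static void
toy_format (char *out, size_t cap, bool show_color, const char *fmt, ...)
{
  va_list ap;
  const char *p;

  assert (out != NULL && cap > 0);
  out[0] = '\0';
  va_start (ap, fmt);
  for (p = fmt; *p != '\0'; p++)
    {
      char one[2] = { *p, '\0' };
      if (*p != '%')
        {
          toy_append (out, cap, one);
          continue;
        }
      p++;
      if (*p == '\0')
        break;                  /* stray '%' at the end of FMT */
      switch (*p)
        {
        case 'r':
          toy_append (out, cap,
                      toy_color_start (show_color,
                                       va_arg (ap, const char *)));
          break;
        case 'R':
          toy_append (out, cap, toy_color_stop (show_color));
          break;
        case 's':
          toy_append (out, cap, va_arg (ap, const char *));
          break;
        case '%':
          toy_append (out, cap, "%");
          break;
        default:
          break;                /* unknown codes are dropped in this toy */
        }
    }
  va_end (ap);
}
```

A call such as toy_format (buf, sizeof buf, show, "%r%s:%s:%R error", "locus", "t.c", "25") uses the same format string whether or not colors are shown; only the show flag, owned by the output side, changes the result.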

Re: libcpp: registering both a pragma and a pragma namespace with the same name

2013-04-10 Thread Jakub Jelinek

On Wed, Apr 10, 2013 at 05:16:17PM -0700, Andrew Pinski wrote:
> On Wed, Apr 10, 2013 at 3:24 PM, Aldy Hernandez  wrote:
> > Hi Tom.  Hi folks.
> >
> > We've asked Balaji to rewrite the <#pragma simd> handling for cilkplus as we
> > currently do for OMP, etc, in init_pragma().
> >
> > The cilkplus branch currently has something like:
> >
> >   cpp_register_deferred_pragma (parse_in, "simd", "",
> >   PRAGMA_SIMD_EMPTY, true, false);
> >   cpp_register_deferred_pragma (parse_in, "simd", "assert",
> > PRAGMA_SIMD_ASSERT, true, false);
> >   cpp_register_deferred_pragma (parse_in, "simd", "noassert",
> > PRAGMA_SIMD_NOASSERT, true, false);
> >   cpp_register_deferred_pragma (parse_in, "simd", "vectorlength",
> > PRAGMA_SIMD_VECTORLENGTH, true, false);
> 
> What about just registering simd as the pragma and then look for the
> right keyword after that?  Like diagnostic is handled?

Yeah, the above is definitely wrong.  Just
  if (flag_cilkplus)
    cpp_register_deferred_pragma (parse_in, NULL, "simd", PRAGMA_SIMD, true,
                                  false);
and parse the clauses in c/c-parser.c and cp/parser.c, look at how OpenMP
pragmas are parsed (those have the "omp" space, you just use NULL, otherwise
it is not any different).

Also check the standard whether cpp expansion is allowed or not and on what
exactly.  Like:
#define S simd
#define V vectorlength

#pragma simd vectorlength(8)
for (i = 0; i < 16; i++)
  ;

vs.

#pragma simd V(8)
for (i = 0; i < 16; i++)
  ;

vs.

#pragma S V(8)
for (i = 0; i < 16; i++)
  ;

(that's the two last arguments to cpp_register_deferred_pragma).

Jakub
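
A rough sketch of the "register simd once, then look at the keyword that follows" approach suggested in this thread. It works on plain strings rather than libcpp tokens, and every name in it (toy_simd_clause, toy_parse_simd, toy_match_word, toy_skip_blanks) is hypothetical; the clause keywords are the ones from the cilkplus snippet quoted above.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum toy_simd_clause
{
  TOY_SIMD_NONE,          /* "#pragma simd" with nothing after it */
  TOY_SIMD_ASSERT,
  TOY_SIMD_NOASSERT,
  TOY_SIMD_VECTORLENGTH,
  TOY_SIMD_INVALID        /* not a simd pragma, or an unknown clause */
};

/* Skip spaces and tabs.  */
static const char *
toy_skip_blanks (const char *s)
{
  while (*s == ' ' || *s == '\t')
    s++;
  return s;
}

/* If S starts with the identifier WORD (and the identifier ends there),
   return a pointer just past it; otherwise return NULL.  */
static const char *
toy_match_word (const char *s, const char *word)
{
  size_t len = strlen (word);
  if (strncmp (s, word, len) != 0)
    return NULL;
  if (isalnum ((unsigned char) s[len]) || s[len] == '_')
    return NULL;
  return s + len;
}

/* Classify TEXT, the part of the line that follows "#pragma".  */
static enum toy_simd_clause
toy_parse_simd (const char *text)
{
  const char *p;

  assert (text != NULL);
  p = toy_match_word (toy_skip_blanks (text), "simd");
  if (p == NULL)
    return TOY_SIMD_INVALID;
  p = toy_skip_blanks (p);
  if (*p == '\0')
    return TOY_SIMD_NONE;
  if (toy_match_word (p, "assert") != NULL)
    return TOY_SIMD_ASSERT;
  if (toy_match_word (p, "noassert") != NULL)
    return TOY_SIMD_NOASSERT;
  if (toy_match_word (p, "vectorlength") != NULL)
    return TOY_SIMD_VECTORLENGTH;
  return TOY_SIMD_INVALID;
}
```

Because only one pragma name is registered, the lone "#pragma simd" case needs no special handling in the preprocessor; it is just the case where nothing follows the name.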

RE: [PATCH GCC/pr56124] Don't prefer memory if the source of load operation has side effect

2013-04-10 Thread Bin Cheng



> -Original Message-
> From: Vladimir Makarov [mailto:vmaka...@redhat.com]
> Sent: Thursday, April 11, 2013 7:20 AM
> To: Bin Cheng
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH GCC/pr56124] Don't prefer memory if the source of load
> operation has side effect
> 
> On 13-04-06 11:16 PM, Bin Cheng wrote:
> >
> >> -Original Message-
> >> From: gcc-patches-ow...@gcc.gnu.org
> >> [mailto:gcc-patches-ow...@gcc.gnu.org] On
> >> Behalf Of Bin Cheng
> >> Sent: Monday, March 25, 2013 3:15 PM
> >> To: gcc-patches@gcc.gnu.org
> >> Subject: FW: [PATCH GCC/pr56124] Don't prefer memory if the source of
> >> load operation has side effect
> >>
> >> Sorry for the wrong list.
> >>
> >> -Original Message-
> >> From: Bin Cheng [mailto:bin.ch...@arm.com]
> >> Sent: Monday, March 25, 2013 3:00 PM
> >> To: g...@gcc.gnu.org
> >> Subject: [PATCH GCC/pr56124] Don't prefer memory if the source of
> >> load operation has side effect
> >>
> >> Hi,
> >> As reported in PR56124, IRA causes redundant reload by preferring to
> >> put pseudo which is target of loading in memory. Generally this is
> >> good but the
> >> case in which the src of loading has side effect.
> >> This patch fixes this issue by checking whether source of loading has
> >> side effect.
> >>
> >> I tested the patch on x86/thumb2. Is it OK? Thanks.
> >>
> >> 2013-03-25  Bin Cheng  
> >>
> >>PR target/56124
> >>* ira-costs.c (scan_one_insn): Check whether the source rtx of
> >>loading has side effect.
> > Ping.
> This patch is ok for trunk.  Thanks.  And sorry, for the delay with the
> answer.
> 
Committed as r197691.

Thanks.

Re: [Patch] Add -gdwarf option to make gcc generate DWARF with the default version

2013-04-10 Thread Jason Merrill


Applied, thanks.

Jason

Re: [C++ Patch] PR 54216

2013-04-10 Thread Jason Merrill


OK.

Jason

Re: RFC: color diagnostics markers

2013-04-10 Thread Gabriel Dos Reis

On Wed, Apr 10, 2013 at 12:54 PM, Manuel López-Ibáñez
 wrote:
> On 8 April 2013 21:06, Jakub Jelinek  wrote:
>> On Mon, Apr 08, 2013 at 07:54:18PM +0200, Manuel López-Ibáñez wrote:
>>> > can be right now a single call, while you would need several.  Also, if you
>>> > eventually want to colorize something in say error_at, warning_at and
>>> > similar format strings.  For those you really don't have the printer at
>>>
>>> Do we really want to allow that much flexibility? Then the color_dict
>>> needs to be dynamic or the caller is restricted to re-using existing
>>> colornames.
>>
>> Yes, I think we want that flexibility, it certainly isn't that much
>> difficult to support it (a few lines of code, will try to code the %r/%R
>> variant tomorrow), and from time to time it can be useful.
>
> I am still not convinced by the %r/%R. My two concerns are that:
>
> 1) %r/%R rather than explicit function calls make the code harder to
> understand. But I guess this is a matter of taste.
>
> 2) It makes it harder to decouple the diagnostics machinery from the
> actual formatting. The color should be something handled by the
> pretty-printer and transparent to the diagnostics machinery interface.
> (perhaps it should be pretty-printer-color.h instead of
> diagnostics-color.h). I generally agree with the ideas of Gabriel
> exposed here: http://gcc.gnu.org/ml/gcc/2012-04/msg00558.html. The
> difference (and perhaps I misunderstood Gabriel's position in that
> thread) is that I think that hiding the color stuff behind the
> diagnostics machinery interface does not move us farther away from
> those ideas, even though it does not move us closer either. And we
> don't need an internal IL to do that.  However, letting the FEs add
> arbitrary colors to diagnostics does move us farther. Yes, it is a
> nice flexibility, but on the other hand, I don't really see the need
> and I am afraid it will be misused. As Gabriel says: "it would be
> really terrible idea if the intelligibility of a diagnostic -requires-
> colors.". So if the color is not required, the FE should be oblivious
> to whether there is a specific color there or not.
>
> Nonetheless, I am pragmatic. Since you already did the work (and
> improved significantly my original patch), I am fine with your patch
> (for what it is worth). Thanks for working on it.
>
> Cheers,
>
> Manuel.

We might be saying the same thing using different languages.

I was saying that the %r/%R markers are ways of implementing the IL
language I suggested in that message.  So, as such I do not object to it.
Having an explicit call makes the FE make a "colorful" formatting
decision way too early -- a FE shouldn't be concerned about color matters.
That decision should be left to the device doing the formatting.  Separation
of concerns here isn't just taste; it is good engineering practice.

-- Gaby

Re: [PATCH] color diagnostics markers

2013-04-10 Thread Gabriel Dos Reis

On Wed, Apr 10, 2013 at 1:42 PM, Manuel López-Ibáñez
 wrote:
> On 9 April 2013 15:21, Jakub Jelinek  wrote:
>> white).  The default is still -fdiagnostics-color=never, can be changed
>> later on.
>
> Apart from my comments elsewhere
> (http://gcc.gnu.org/ml/gcc-patches/2013-04/msg00614.html), the patch
> looks fine to me. But perhaps we should change the default to auto, at
> least during Stage 1, to find out whether some bug was introduced. If
> agreed, I could do this in a follow-up patch that also disables colors
> for the testsuite.
>
> Cheers,
>
> Manuel.

I am still of the opinion that the default should be discussed differently,
and I strongly suggest that it defaults to "never".  I do not believe we do
need to do otherwise now.

As I stated before, our pursuit of enabling every new thing by default
may have made C++ diagnostics more terrifying.

-- Gaby

Re: [PATCH, AARCH64] Fix unrecognizable insn issue

2013-04-10 Thread Zhenqiang Chen

On 10 April 2013 18:48, Marcus Shawcroft  wrote:
> Zhenqiang, Does James's patch fix your test case?

Thank you all. The patch fixes my test case.

-Zhenqiang

> On 10 April 2013 11:43, Richard Earnshaw  wrote:
>> On 10/04/13 11:31, James Greenhalgh wrote:
>>>
>>>
>>>> -Original Message-
>>>> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
>>>> ow...@gcc.gnu.org] On Behalf Of Zhenqiang Chen
>>>> Sent: 10 April 2013 09:02
>>>> To: gcc-patches@gcc.gnu.org
>>>> Cc: Marcus Shawcroft
>>>> Subject: [PATCH, AARCH64] Fix unrecognizable insn issue
>>>>
>>>> Hi,
>>>>
>>>> During expand, function aarch64_vcond_internal inverses some CMP, e.g.
>>>>
>>>>    a LE b -> b GE a
>>>>
>>>> But if "b" is "CONST0_RTX", "b GE a" will be an illegal insn.
>>>
>>>
>>> Yes it will. We should not be swapping the comparison in these cases.
>>>

>>>> Refer https://bugs.launchpad.net/linaro-toolchain-binaries/+bug/1163942
>>>> for detail about the issue.
>>>>
>>>> The patch is to make "b" a register when inversing LE.
>>>
>>>
>>> This patch is too restrictive. There is an `fcmle v0.2d #0` form which we
>>> should be generating when we can. Also, you are only fixing one
>>> problematic
>>> case where there are a few.
>>>
>>> I don't have access to your reproducer, so I can't be certain this patch
>>> is correct - I have created my own reproducer and added it in with
>>> the other vect-fcm tests.
>>>
>>> Thorough regression tests are ongoing for this patch, but it
>>> passes aarch64.exp and vect.exp with no regressions.
>>>
>>> Thanks,
>>> James
>>>
>>> ---
>>> gcc/
>>>
>>> 2013-04-10  James Greenhalgh  
>>>
>>> * config/aarch64/aarch64-simd.md (aarch64_vcond_internal): Fix
>>> floating-point vector comparisons against 0.
>>>
>>> gcc/testsuite/
>>>
>>> 2013-04-10  James Greenhalgh  
>>>
>>> * gcc.target/aarch64/vect-fcm.x: Add check for zero forms of
>>> inverse operands.
>>> * gcc.target/aarch64/vect-fcm-eq-d.c: Check that new zero form
>>> loop is vectorized.
>>> * gcc.target/aarch64/vect-fcm-eq-f.c: Likewise.
>>> * gcc.target/aarch64/vect-fcm-ge-d.c: Check that new zero form
>>> loop is vectorized and that the correct instruction is generated.
>>> * gcc.target/aarch64/vect-fcm-ge-f.c: Likewise.
>>> * gcc.target/aarch64/vect-fcm-gt-d.c: Likewise.
>>> * gcc.target/aarch64/vect-fcm-gt-f.c: Likewise.
>>>
>>>
>>
>> OK.
>>
>> R.
>>
>>
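
The operand swap discussed in this thread (a LE b -> b GE a) can be sketched with a toy comparison type. The names toy_cmp, toy_swap_condition, toy_eval and toy_may_swap_operands are hypothetical and this is not the aarch64 expander; the last helper only encodes the point made above that the swap should not be done when "b" is the zero constant, since that would put the constant first.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_cmp { TOY_LT, TOY_LE, TOY_GT, TOY_GE, TOY_EQ };

/* Return the code C' such that "a C b" and "b C' a" mean the same.  */
static enum toy_cmp
toy_swap_condition (enum toy_cmp code)
{
  switch (code)
    {
    case TOY_LT: return TOY_GT;
    case TOY_LE: return TOY_GE;
    case TOY_GT: return TOY_LT;
    case TOY_GE: return TOY_LE;
    case TOY_EQ: return TOY_EQ;
    }
  assert (0 && "unknown comparison code");
  return code;
}

/* Evaluate "a CODE b" on scalars.  */
static bool
toy_eval (enum toy_cmp code, double a, double b)
{
  switch (code)
    {
    case TOY_LT: return a < b;
    case TOY_LE: return a <= b;
    case TOY_GT: return a > b;
    case TOY_GE: return a >= b;
    case TOY_EQ: return a == b;
    }
  assert (0 && "unknown comparison code");
  return false;
}

/* Swapping is only acceptable when it does not move a zero constant
   into the first operand position.  */
static bool
toy_may_swap_operands (bool second_operand_is_zero_const)
{
  return !second_operand_is_zero_const;
}
```

The swap itself never changes the truth value; the problem in the thread is purely about which operand forms the resulting instruction accepts.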

Re: libcpp: registering both a pragma and a pragma namespace with the same name

2013-04-10 Thread Andrew Pinski

On Wed, Apr 10, 2013 at 3:24 PM, Aldy Hernandez  wrote:
> Hi Tom.  Hi folks.
>
> We've asked Balaji to rewrite the <#pragma simd> handling for cilkplus as we
> currently do for OMP, etc, in init_pragma().
>
> The cilkplus branch currently has something like:
>
>   cpp_register_deferred_pragma (parse_in, "simd", "",
>   PRAGMA_SIMD_EMPTY, true, false);
>   cpp_register_deferred_pragma (parse_in, "simd", "assert",
> PRAGMA_SIMD_ASSERT, true, false);
>   cpp_register_deferred_pragma (parse_in, "simd", "noassert",
> PRAGMA_SIMD_NOASSERT, true, false);
>   cpp_register_deferred_pragma (parse_in, "simd", "vectorlength",
> PRAGMA_SIMD_VECTORLENGTH, true, false);

What about just registering simd as the pragma and then look for the
right keyword after that?  Like diagnostic is handled?

Thanks,
Andrew Pinski

>
> Notice that #pragma simd can be both a pragma name space, and also a lone
> pragma with no arguments:
>
> #pragma simd assert
> -or-
> #pragma simd
>
> It seems like the code in libcpp's do_pragma(), specifically disallows this.
> If we're looking at a possible pragma name space, the next expected token is
> a CPP_NAME.
>
> Is there a way to handle this scenario with the current infrastructure?  If
> not, is something like the attached (untested) patch reasonable?
>
> Aldy


[Google] Use line offset instead of absolute lineno to represent AutoFDO profile

2013-04-10 Thread Dehao Chen

Hi,

This patch
1. Uses relative line offset (lineno - start_lineno_of_function) to
represent AutoFDO profile. This ensures the profile still works for
modified source code.
2. When matching the profile, adds function name (bfd_name) to match
the inline stack.

Bootstrapped and passed regression tests.

Is it okay for google branches?

Thanks,
Dehao

diff --git a/gcc/auto-profile.c b/gcc/auto-profile.c
index d0ab1dc..07125e1 100644
--- a/gcc/auto-profile.c
+++ b/gcc/auto-profile.c
@@ -267,7 +267,9 @@ afdo_stack_hash (const void *stack)
   for (i = 0; i < s->size; i++) {
 const struct gcov_callsite_pos *p = s->stack + i;
 const char *file = afdo_get_filename (p->file);
+const char *func = afdo_get_bfd_name (p->func);
 h = iterative_hash (file, strlen (file), h);
+h = iterative_hash (func, strlen (func), h);
 h = iterative_hash (&p->line, sizeof (p->line), h);
 if (i == 0)
   h = iterative_hash (&p->discr, sizeof (p->discr), h);
@@ -311,6 +313,7 @@ afdo_stack_eq (const void *p, const void *q)
   const struct gcov_callsite_pos *p1 = s1->stack + i;
   const struct gcov_callsite_pos *p2 = s2->stack + i;
   if (strcmp (afdo_get_filename(p1->file), afdo_get_filename(p2->file))
+  || strcmp (afdo_get_bfd_name(p1->func), afdo_get_bfd_name (p2->func))
   || p1->line != p2->line || (i== 0 && p1->discr != p2->discr))
  return 0;
 }
@@ -538,10 +541,10 @@ get_inline_stack_size_by_edge (struct cgraph_edge *edge)
   return size;
 }

-/* Return the function name of a given lexical BLOCK.  */
+/* Return the function decl of a given lexical BLOCK.  */

-static const char *
-get_function_name_from_block (tree block)
+static tree
+get_function_decl_from_block (tree block)
 {
   tree decl;
   for (decl = BLOCK_ABSTRACT_ORIGIN (block);
@@ -549,7 +552,7 @@ get_function_name_from_block (tree block)
decl = BLOCK_ABSTRACT_ORIGIN (decl))
 if (TREE_CODE (decl) == FUNCTION_DECL)
   break;
-  return decl ? IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl)) : NULL;
+  return decl;
 }

 /* Store the inline stack of STMT to POS_STACK, return the size of the
@@ -583,16 +586,22 @@ get_inline_stack_by_stmt (gimple stmt, tree decl,
block && (TREE_CODE (block) == BLOCK);
block = BLOCK_SUPERCONTEXT (block))
 {
+  tree decl = get_function_decl_from_block (block);
   if (LOCATION_LOCUS (BLOCK_SOURCE_LOCATION (block)) == UNKNOWN_LOCATION)
  continue;
   loc = BLOCK_SOURCE_LOCATION (block);
   pos_stack[idx].file = expand_location (loc).file;
   pos_stack[idx].line = expand_location (loc).line;
-  pos_stack[idx - 1].func = get_function_name_from_block (block);
+  pos_stack[idx - 1].func =
+  decl ? IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl)) : NULL;
+  pos_stack[idx - 1].line -= decl ? DECL_SOURCE_LINE (decl) : 0;
   pos_stack[idx++].discr = 0;
 }
   if (decl)
-pos_stack[idx - 1].func = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
+{
+  pos_stack[idx - 1].func = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
+  pos_stack[idx - 1].line -= DECL_SOURCE_LINE (decl);
+}
   return idx;
 }

@@ -1064,12 +1073,15 @@ read_profile (void)
  * sizeof (struct gcov_callsite_pos));
   for (k = 0; k < gcov_functions[i].stacks[j].size; k++)
 {
+  gcov_unsigned_t line, start_line;
   gcov_functions[i].stacks[j].stack[k].func =
  file_names[gcov_read_unsigned ()];
   gcov_functions[i].stacks[j].stack[k].file =
  file_names[gcov_read_unsigned ()];
+  line = gcov_read_unsigned ();
+  start_line = gcov_read_unsigned ();
   gcov_functions[i].stacks[j].stack[k].line =
- gcov_read_unsigned ();
+ line > start_line ? line - start_line : 0;
   gcov_functions[i].stacks[j].stack[k].discr =
  gcov_read_unsigned ();
 }
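
A simplified sketch of the two ideas in this patch: keying a profile position on the line offset from the start of the function, and including the function name in the comparison. The type and helpers (toy_callsite, toy_line_offset, toy_make_callsite, toy_callsite_eq) are hypothetical; the real code also compares the file name and the discriminator, which this toy leaves out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct toy_callsite
{
  const char *func;          /* name of the enclosing function */
  unsigned int line_offset;  /* line relative to the function's first line */
};

/* Line relative to START_LINE, clamped to 0 as in read_profile above.  */
static unsigned int
toy_line_offset (unsigned int line, unsigned int start_line)
{
  return line > start_line ? line - start_line : 0;
}

static struct toy_callsite
toy_make_callsite (const char *func, unsigned int line,
                   unsigned int start_line)
{
  struct toy_callsite pos;

  assert (func != NULL);
  pos.func = func;
  pos.line_offset = toy_line_offset (line, start_line);
  return pos;
}

/* Two positions match when both the function and the offset match.  */
static bool
toy_callsite_eq (const struct toy_callsite *a, const struct toy_callsite *b)
{
  return strcmp (a->func, b->func) == 0 && a->line_offset == b->line_offset;
}
```

If ten lines are added above a function, a statement that was at line 42 of a function starting at line 40 moves to line 52 of a function starting at line 50, and both give offset 2, so the old profile entry still matches.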

Re: [PATCH GCC/pr56124] Don't prefer memory if the source of load operation has side effect

2013-04-10 Thread Vladimir Makarov


On 13-04-06 11:16 PM, Bin Cheng wrote:



>> -Original Message-
>> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On
>> Behalf Of Bin Cheng
>> Sent: Monday, March 25, 2013 3:15 PM
>> To: gcc-patches@gcc.gnu.org
>> Subject: FW: [PATCH GCC/pr56124] Don't prefer memory if the source of load
>> operation has side effect
>>
>> Sorry for the wrong list.
>>
>> -Original Message-
>> From: Bin Cheng [mailto:bin.ch...@arm.com]
>> Sent: Monday, March 25, 2013 3:00 PM
>> To: g...@gcc.gnu.org
>> Subject: [PATCH GCC/pr56124] Don't prefer memory if the source of load
>> operation has side effect
>>
>> Hi,
>> As reported in PR56124, IRA causes redundant reload by preferring to put
>> pseudo which is target of loading in memory. Generally this is good but the
>> case in which the src of loading has side effect.
>> This patch fixes this issue by checking whether source of loading has side
>> effect.
>>
>> I tested the patch on x86/thumb2. Is it OK? Thanks.
>>
>> 2013-03-25  Bin Cheng  
>>
>> PR target/56124
>> * ira-costs.c (scan_one_insn): Check whether the source rtx of
>> loading has side effect.
>
> Ping.

This patch is ok for trunk.  Thanks.  And sorry, for the delay with the
answer.

Re: [Patch, fortran] PR 40958 Compress module files with zlib

2013-04-10 Thread Steve Kargl

On Wed, Apr 10, 2013 at 09:18:46AM -0700, Mike Stump wrote:
> On Apr 9, 2013, at 11:33 AM, Janne Blomqvist  
> wrote:
> > the attached patch reduces the size of module files on disk
> 
> Do those modules interoperate with C++ modules flawlessly?  :-)

Fortran 2008 became an ISO standard well before C++'s last standard.
You need to ask the C++ guys if C++ modules were designed
to interoperate with Fortran.

-- 
Steve

libcpp: registering both a pragma and a pragma namespace with the same name

2013-04-10 Thread Aldy Hernandez


Hi Tom.  Hi folks.

We've asked Balaji to rewrite the <#pragma simd> handling for cilkplus 
as we currently do for OMP, etc, in init_pragma().


The cilkplus branch currently has something like:

  cpp_register_deferred_pragma (parse_in, "simd", "",
  PRAGMA_SIMD_EMPTY, true, false);
  cpp_register_deferred_pragma (parse_in, "simd", "assert",
PRAGMA_SIMD_ASSERT, true, false);
  cpp_register_deferred_pragma (parse_in, "simd", "noassert",
PRAGMA_SIMD_NOASSERT, true, false);
  cpp_register_deferred_pragma (parse_in, "simd", "vectorlength",
PRAGMA_SIMD_VECTORLENGTH, true, false);

Notice that #pragma simd can be both a pragma name space, and also a 
lone pragma with no arguments:


#pragma simd assert
-or-
#pragma simd

It seems like the code in libcpp's do_pragma(), specifically disallows 
this.  If we're looking at a possible pragma name space, the next 
expected token is a CPP_NAME.


Is there a way to handle this scenario with the current infrastructure? 
 If not, is something like the attached (untested) patch reasonable?


Aldy
diff --git a/libcpp/directives.c b/libcpp/directives.c
index 65b2034..d09d2a4 100644
--- a/libcpp/directives.c
+++ b/libcpp/directives.c
@@ -1373,7 +1373,26 @@ do_pragma (cpp_reader *pfile)
  if (token->type == CPP_NAME)
p = lookup_pragma_entry (p->u.space, token->val.node.node);
  else
-   p = NULL;
+   {
+ /* See if we can handle pragmas that are defined both as
+a pragma namespace, and as an argumentless pragma.
+For example:
+
+#pragma simd vectorlength
+#pragma simd   // empty argument
+ */
+ if (token->type == CPP_EOF)
+   {
+ const cpp_hashnode *node;
+ node = cpp_lookup (pfile, UC "", 0);
+ if (node)
+   p = lookup_pragma_entry (p->u.space, node);
+ else
+   p = NULL;
+   }
+ else
+   p = NULL;
+   }
  if (allow_name_expansion)
pfile->state.prevent_expansion++;
  count = 2;

[Fortran-dev] Merge trunk into the branch

2013-04-10 Thread Tobias Burnus

I took the opportunity to merge the trunk into the branch (Rev. 197683) 
after Janne's touch-all-Fortran-file bool patch.


Tobias

[gomp4] Little compiler OpenMP 4.0 progress

2013-04-10 Thread Jakub Jelinek

Hi!

This patch
1) starts using the new GOMP_parallel* (without _start) APIs for
   #pragma omp parallel
2) makes proc_bind clauses work
3) handles #pragma omp taskgroup (though, on the library side it is just a
   dummy right now)

Tested on x86_64-linux, committed to gomp-4_0-branch.

2013-04-10  Jakub Jelinek  

* builtin-types.def (DEF_FUNCTION_TYPE_8): Document.
(BT_FN_VOID_OMPFN_PTR_UINT, BT_FN_VOID_OMPFN_PTR_UINT_LONG_LONG_LONG,
BT_FN_VOID_OMPFN_PTR_UINT_LONG_LONG_LONG_LONG): Remove.
(BT_FN_VOID_OMPFN_PTR_UINT_UINT_UINT,
BT_FN_VOID_OMPFN_PTR_UINT_LONG_LONG_LONG_UINT,
BT_FN_VOID_OMPFN_PTR_UINT_LONG_LONG_LONG_LONG_UINT): New.
* gimplify.c (gimplify_scan_omp_clauses, gimplify_adjust_omp_clauses):
Handle OMP_CLAUSE_PROC_BIND.
* omp-builtins.def (BUILT_IN_GOMP_TASKGROUP_START,
BUILT_IN_GOMP_TASKGROUP_END, BUILT_IN_GOMP_PARALLEL_LOOP_STATIC,
BUILT_IN_GOMP_PARALLEL_LOOP_DYNAMIC,
BUILT_IN_GOMP_PARALLEL_LOOP_GUIDED,
BUILT_IN_GOMP_PARALLEL_LOOP_RUNTIME, BUILT_IN_GOMP_PARALLEL,
BUILT_IN_GOMP_PARALLEL_SECTIONS): New built-ins.
(BUILT_IN_GOMP_PARALLEL_LOOP_STATIC_START,
BUILT_IN_GOMP_PARALLEL_LOOP_DYNAMIC_START,
BUILT_IN_GOMP_PARALLEL_LOOP_GUIDED_START,
BUILT_IN_GOMP_PARALLEL_LOOP_RUNTIME_START,
BUILT_IN_GOMP_PARALLEL_START, BUILT_IN_GOMP_PARALLEL_END,
BUILT_IN_GOMP_PARALLEL_SECTIONS_START): Remove.
* omp-low.c (scan_sharing_clauses): Handle OMP_CLAUSE_PROC_BIND.
(expand_parallel_call): Expand #pragma omp parallel* as
calls to the new GOMP_parallel_* APIs without _start at the end,
instead of GOMP_parallel_*_start followed by fn.omp_fn.N call,
followed by GOMP_parallel_end.  Handle OMP_CLAUSE_PROC_BIND.
* tree-ssa-alias.c (ref_maybe_used_by_call_p_1,
call_may_clobber_ref_p_1): Handle BUILT_IN_GOMP_TASKGROUP_END
instead of BUILT_IN_GOMP_PARALLEL_END.
c-family/
* c-common.c (DEF_FUNCTION_TYPE_8): Define.
* c-omp.c (c_split_parallel_clauses): Handle OMP_CLAUSE_PROC_BIND.
cp/
* cp-tree.h (finish_omp_taskgroup): New prototype.
* parser.c (cp_parser_omp_clause_proc_bind): Require ) instead of
colon at the end of the clause.
(cp_parser_omp_taskgroup): New function.
(cp_parser_omp_construct, cp_parser_pragma): Handle
PRAGMA_OMP_TASKGROUP.
* semantics.c (finish_omp_taskgroup): New function.
fortran/
* f95-lang.c (DEF_FUNCTION_TYPE_8): Define.
* types.def (DEF_FUNCTION_TYPE_8): Document.
(BT_FN_VOID_OMPFN_PTR_UINT, BT_FN_VOID_OMPFN_PTR_UINT_LONG_LONG_LONG,
BT_FN_VOID_OMPFN_PTR_UINT_LONG_LONG_LONG_LONG): Remove.
(BT_FN_VOID_OMPFN_PTR_UINT_UINT_UINT,
BT_FN_VOID_OMPFN_PTR_UINT_LONG_LONG_LONG_UINT,
BT_FN_VOID_OMPFN_PTR_UINT_LONG_LONG_LONG_LONG_UINT): New.
ada/
* gcc-interface/utils.c (DEF_FUNCTION_TYPE_8): Define.
lto/
* lto-lang.c (DEF_FUNCTION_TYPE_8): Define.
testsuite/
* gcc.dg/gomp/combined-1.c: Look for GOMP_parallel_loop_runtime
instead of GOMP_parallel_loop_runtime_start.

--- gcc/builtin-types.def.jj2013-03-20 10:07:24.0 +0100
+++ gcc/builtin-types.def   2013-04-10 16:55:26.154822356 +0200
@@ -34,6 +34,8 @@ along with GCC; see the file COPYING3.
DEF_FUNCTION_TYPE_5 (ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5)
DEF_FUNCTION_TYPE_6 (ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, ARG6)
DEF_FUNCTION_TYPE_7 (ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, ARG6, ARG7)
+   DEF_FUNCTION_TYPE_8 (ENUM, RETURN, ARG1, ARG2, ARG3, ARG4, ARG5, ARG6, ARG7,
+   ARG8)
 
  These macros describe function types.  ENUM is as above.  The
  RETURN type is one of the enumerals already defined.  ARG1, ARG2,
@@ -409,8 +411,6 @@ DEF_FUNCTION_TYPE_3 (BT_FN_I4_VPTR_I4_I4
 DEF_FUNCTION_TYPE_3 (BT_FN_I8_VPTR_I8_I8, BT_I8, BT_VOLATILE_PTR, BT_I8, BT_I8)
 DEF_FUNCTION_TYPE_3 (BT_FN_I16_VPTR_I16_I16, BT_I16, BT_VOLATILE_PTR,
 BT_I16, BT_I16)
-DEF_FUNCTION_TYPE_3 (BT_FN_VOID_OMPFN_PTR_UINT, BT_VOID, BT_PTR_FN_VOID_PTR,
-BT_PTR, BT_UINT)
 DEF_FUNCTION_TYPE_3 (BT_FN_PTR_CONST_PTR_INT_SIZE, BT_PTR,
 BT_CONST_PTR, BT_INT, BT_SIZE)
 DEF_FUNCTION_TYPE_3 (BT_FN_I1_VPTR_I1_INT, BT_I1, BT_VOLATILE_PTR, BT_I1, 
BT_INT)
@@ -465,6 +465,9 @@ DEF_FUNCTION_TYPE_5 (BT_FN_BOOL_VPTR_PTR
 BT_BOOL, BT_VOLATILE_PTR, BT_PTR, BT_I8, BT_INT, BT_INT)
 DEF_FUNCTION_TYPE_5 (BT_FN_BOOL_VPTR_PTR_I16_INT_INT,
 BT_BOOL, BT_VOLATILE_PTR, BT_PTR, BT_I16, BT_INT, BT_INT)
+DEF_FUNCTION_TYPE_5 (BT_FN_VOID_OMPFN_PTR_UINT_UINT_UINT,
+BT_VOID, BT_PTR_FN_VOID_PTR, BT_PTR, BT_UINT, BT_UINT,
+BT_UINT)
 
 DEF_FUNCTION_TYPE_6 (BT_FN_INT_STRING_SIZE_INT_SIZE_CONST_STRING_VALIST_ARG,
 BT_INT, BT_STRING, BT_SIZE, BT_IN

Re: [PATCH, AArch64] Negate and set flags in shift mode

2013-04-10 Thread Marcus Shawcroft

OK. Thanks.
/Marcus

On 10 April 2013 11:32, Hurugalawadi, Naveen
 wrote:
> Hi,
>
> Please find attached the patch that implements negs instruction
> with shift for aarch64 target.
> A testcase has been added for the negs instruction.
>
> Please review the same and let me know if there should be any
> modifications in the patch.
>
> Build and tested on aarch64-thunder-elf (using Cavium's internal
> simulator). No new regressions.
>
> Thanks,
> Naveen
>
> gcc/
>
> 2013-04-10   Naveen H.S  
>
> * config/aarch64/aarch64.c (aarch64_select_cc_mode): Allow NEG
> code in CC_NZ mode.
> * config/aarch64/aarch64.md (*neg_3_compare0): New
> pattern.
>
> gcc/testsuite/
>
> 2013-04-10   Naveen H.S  
>
> * gcc.target/aarch64/negs.c: New.

[Patch, Fortran] PR56907 - do not 'pack' arrays passed to C_LOC

2013-04-10 Thread Tobias Burnus

Fortran 2008 supports C_LOC(array); if the argument is not simply
contiguous, the current code adds a call to _gfortran_internal_pack.

The pack call shouldn't be there. Fortran 2008 demands that the actual
argument is contiguous, and internal_pack creates a copy only if the
run-time check shows that the argument is not contiguous. Thus, it is
not a wrong-code issue. However, for performance reasons, it makes sense
to avoid the call to _gfortran_internal_pack.


Built and regtested on x86-64-gnu-linux.
OK for the trunk?

Tobias
2013-04-10  Tobias Burnus  

	PR fortran/56907
	* trans-intrinsic.c (conv_isocbinding_function): Don't pack array
	passed to C_LOC

2013-04-10  Tobias Burnus  

	PR fortran/56907
	* gfortran.dg/c_loc_test_22.f90: New.

diff --git a/gcc/fortran/trans-intrinsic.c b/gcc/fortran/trans-intrinsic.c
index 9b2cc19..005dd73 100644
--- a/gcc/fortran/trans-intrinsic.c
+++ b/gcc/fortran/trans-intrinsic.c
@@ -6317,8 +6317,13 @@ conv_isocbinding_function (gfc_se *se, gfc_expr *expr)
 {
   if (arg->expr->rank == 0)
 	gfc_conv_expr_reference (se, arg->expr);
-  else
+  else if (gfc_is_simply_contiguous (arg->expr, false))
 	gfc_conv_array_parameter (se, arg->expr, true, NULL, NULL, NULL);
+  else
+	{
+	  gfc_conv_expr_descriptor (se, arg->expr);
+	  se->expr = gfc_conv_descriptor_data_get (se->expr);
+	}
 
   /* TODO -- the following two lines shouldn't be necessary, but if
 	 they're removed, a bug is exposed later in the code path.
--- /dev/null	2013-04-10 09:49:18.320086712 +0200
+++ gcc/gcc/testsuite/gfortran.dg/c_loc_test_22.f90	2013-04-10 21:42:20.835284814 +0200
@@ -0,0 +1,24 @@
+! { dg-do compile }
+! { dg-options "-fdump-tree-original" }
+!
+! PR fortran/56907
+!
+subroutine sub(xxx, yyy)
+  use iso_c_binding
+  implicit none
+  integer, target, contiguous :: xxx(:)
+  integer, target :: yyy(:)
+  type(c_ptr) :: ptr1, ptr2, ptr3, ptr4
+  ptr1 = c_loc (xxx)
+  ptr2 = c_loc (xxx(5:))
+  ptr3 = c_loc (yyy)
+  ptr4 = c_loc (yyy(5:))
+end
+! { dg-final { scan-tree-dump-not " _gfortran_internal_pack" "original" } }
+! { dg-final { scan-tree-dump-times "parm.\[0-9\]+.data = \\(void .\\) &\\(.xxx.\[0-9\]+\\)\\\[0\\\];" 1 "original" } }
+! { dg-final { scan-tree-dump-times "parm.\[0-9\]+.data = \\(void .\\) &\\(.xxx.\[0-9\]+\\)\\\[D.\[0-9\]+ \\* 4\\\];" 1 "original" } }
+! { dg-final { scan-tree-dump-times "parm.\[0-9\]+.data = \\(void .\\) &\\(.yyy.\[0-9\]+\\)\\\[0\\\];" 1 "original" } }
+! { dg-final { scan-tree-dump-times "parm.\[0-9\]+.data = \\(void .\\) &\\(.yyy.\[0-9\]+\\)\\\[D.\[0-9\]+ \\* 4\\\];" 1 "original" } }
+
+! { dg-final { scan-tree-dump-times "D.\[0-9\]+ = parm.\[0-9\]+.data;\[^;]+ptr\[1-4\] = D.\[0-9\]+;" 4 "original" } }
+! { dg-final { cleanup-tree-dump "optimized" } }

[RFA] patch to fix PR56903

2013-04-10 Thread Vladimir Makarov


The following patch fixes
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56903

In this test case the reload pass gets the correct value from 
HARD_REGNO_MODE_OK because it cannot create pseudos (can_create_pseudo_p), 
and that check was effectively used to tell that we are in reload (or 
after reload).  LRA can create pseudos, therefore it got the wrong 
answer.  The patch fixes it.


OK for the trunk?

2013-04-10  Vladimir Makarov  

PR tree-optimization/56903
* config/i386/i386.c (ix86_hard_regno_mode_ok): Add
lra_in_progress for return.

2013-04-10  Vladimir Makarov  

PR tree-optimization/56903
* gcc.target/i386/pr56903.c: New test.

Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 197679)
+++ config/i386/i386.c  (working copy)
@@ -33976,7 +33976,9 @@ ix86_hard_regno_mode_ok (int regno, enum
return true;
   if (!TARGET_PARTIAL_REG_STALL)
return true;
-  return !can_create_pseudo_p ();
+  /* LRA can create pseudos but it does not mean that it does not
+need to know that the hard register in given mode is OK.  */
+  return lra_in_progress || !can_create_pseudo_p ();
 }
   /* We handle both integer and floats in the general purpose registers.  */
   else if (VALID_INT_MODE_P (mode))
Index: testsuite/gcc.target/i386/pr56903.c
===
--- testsuite/gcc.target/i386/pr56903.c (revision 0)
+++ testsuite/gcc.target/i386/pr56903.c (working copy)
@@ -0,0 +1,18 @@
+/* PR rtl-optimization/56903 */
+/* { dg-do compile } */
+/* { dg-options "-Os" } */
+/* { dg-additional-options "-march=pentium3" { target ia32 } } */
+
+int a, *b, c;
+struct S { int s : 1; } *fn1 (void);
+extern int fn3 (void), fn4 (int *);
+
+void
+fn2 (void)
+{
+  int e = fn3 ();
+  char f = c + fn1 ()->s * 4;
+  if (*b && f == e)
+a = *b;
+  fn4 (b);
+}

Re: [PATCH] Improve cstore code generation on 64-bit sparc.

2013-04-10 Thread David Miller

From: David Miller 
Date: Mon, 08 Apr 2013 21:56:04 -0400 (EDT)

> 
> One major suboptimal area of the sparc back end is cstore generation
> on 64-bit.
> 
> Due to the way arguments and return values of functions must be
> promoted, the ideal mode for cstore's result would be DImode.
> 
> But this hasn't been done because of a fundamental limitation
> of the cstore patterns.  They require a fixed mode be used for
> the boolean result value.
> 
> I've decided to work around this by building a target hook which
> specifies the type to use for conditional store results, and then I
> use a special predicate for operand 0 in the cstore expanders so
> that they still match even when we use DImode.
> 
> The default version of the target hook just does what it does now,
> so no other target should be impacted by this at all.
> 
> Regstrapped on 32-bit sparc-linux-gnu and I've run the testsuite
> with "-m64" to validate the 64-bit side.
> 
> Any major objections?

Since no objections were expressed, I've committed this to trunk.

[patch][DF] fix df_find_def and df_find_use

2013-04-10 Thread Steven Bosscher

Hello,

df_find_def and df_find_use do not work properly for hard registers
because rtx_equal_p returns false for the case where
REGNO(x)==REGNO(y) but the modes are different. This happened as
follows in my case:

Breakpoint 9, df_reg_used (insn=0x3fffb6083c50, reg=0x3fffb5eb72e0) at
../../trunk/gcc/df-core.c:1856
1856  return df_find_use (insn, reg) != NULL;
(gdb) p debug_rtx(reg)
(reg:CCX 100 %icc)
$37 = void
(gdb) p debug_rtx(insn)
(jump_insn 34 33 35 3 (set (pc)
(if_then_else (le (reg:CCX 100 %icc)
(const_int 0 [0]))
(label_ref 44)
(pc))) t.c:25 48 {*normal_branch}
 (expr_list:REG_DEAD (reg:CCX 100 %icc)
(expr_list:REG_BR_PROB (const_int 3900 [0xf3c])
(nil)))
 -> 44)
$38 = void
(gdb) step
df_find_use (insn=0x3fffb6083c50, reg=0x3fffb5eb72e0) at
../../trunk/gcc/df-core.c:1829
...
1837  if (rtx_equal_p (DF_REF_REAL_REG (use), reg))
(gdb) p debug_rtx(reg)
(reg:CCX 100 %icc)
$39 = void
(gdb) p debug_rtx(DF_REF_REAL_REG(use))
(reg:CC 100 %icc)
$40 = void
(gdb) p rtx_equal_p (DF_REF_REAL_REG (use), reg)
$41 = 0


I think we should just compare REGNO instead of going through rtx_equal_p.
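
To make the difference concrete, here is a minimal sketch with toy types. The names toy_mode, toy_reg, toy_reg_equal_p and toy_same_regno_p are hypothetical stand-ins invented for this illustration; they are not GCC's rtx, rtx_equal_p or DF_REF_REGNO, and only mimic the behaviour seen in the gdb session above.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for machine modes and a REG rtx; not GCC types.  */
enum toy_mode { TOY_CC, TOY_CCX };

struct toy_reg
{
  enum toy_mode mode;   /* e.g. CC vs. CCX */
  unsigned int regno;   /* hard register number, e.g. 100 for %icc */
};

/* Structural equality: mode and register number must both match.  This
   mirrors why the full comparison returned 0 in the session above.  */
bool
toy_reg_equal_p (const struct toy_reg *x, const struct toy_reg *y)
{
  return x->mode == y->mode && x->regno == y->regno;
}

/* Register-number comparison only, in the spirit of the proposed fix.  */
bool
toy_same_regno_p (const struct toy_reg *x, const struct toy_reg *y)
{
  return x->regno == y->regno;
}
```

With a (CCX, 100) and a (CC, 100) register, the structural comparison reports a mismatch while the register-number comparison finds the reference.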

Bootstrapped&tested on x86_64-unknown-linux-gnu.
OK for trunk?

Ciao!
Steven



* df-core.c (df_find_def): Compare register numbers.
(df_find_use): Compare register numbers.

Index: df-core.c
===
--- df-core.c   (revision 197610)
+++ df-core.c   (working copy)
@@ -1800,7 +1800,7 @@ df_find_def (rtx insn, rtx reg)
   for (def_rec = DF_INSN_UID_DEFS (uid); *def_rec; def_rec++)
 {
   df_ref def = *def_rec;
-  if (rtx_equal_p (DF_REF_REAL_REG (def), reg))
+  if (DF_REF_REGNO (def) == REGNO (reg))
return def;
 }

@@ -1834,14 +1834,14 @@ df_find_use (rtx insn, rtx reg)
   for (use_rec = DF_INSN_UID_USES (uid); *use_rec; use_rec++)
 {
   df_ref use = *use_rec;
-  if (rtx_equal_p (DF_REF_REAL_REG (use), reg))
+  if (DF_REF_REGNO (use) == REGNO (reg))
return use;
 }
   if (df->changeable_flags & DF_EQ_NOTES)
 for (use_rec = DF_INSN_UID_EQ_USES (uid); *use_rec; use_rec++)
   {
df_ref use = *use_rec;
-   if (rtx_equal_p (DF_REF_REAL_REG (use), reg))
+   if (DF_REF_REGNO (use) == REGNO (reg))
  return use;
   }
   return NULL;

RFC: PR 28865: Fix ELF .size directive for structures with a flexible array member

2013-04-10 Thread H.J. Lu

This patch:

http://gcc.gnu.org/ml/gcc-patches/2009-04/msg01807.html

fixes the ELF .size directive for structures with a flexible array member:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28865

Can someone take a look?

-- 
H.J.

[PATCH] Fix assembler options for -mcpu={supersparc,hypersparc}

2013-04-10 Thread David Miller


This is yet another bug just like PR target/52610: we need to pass
-Av8 to the assembler for every cpu type that will use MASK_V8 in
sparc.c:sparc_option_override().

I found this while testing GMP builds.

I'd like to check this into the various gcc-4_X-branch branches as
well, unless there are major objections.

gcc/

* config/sparc/sparc.h (ASM_CPU_SPEC): Pass -Av8 if -mcpu=supersparc
or -mcpu=hypersparc.

diff --git a/gcc/config/sparc/sparc.h b/gcc/config/sparc/sparc.h
index 6b02b45..c6122c1 100644
--- a/gcc/config/sparc/sparc.h
+++ b/gcc/config/sparc/sparc.h
@@ -327,6 +327,8 @@ extern enum cmodel sparc_cmodel;
 %{mcpu=sparclite86x:-Asparclite} \
 %{mcpu=f930:-Asparclite} %{mcpu=f934:-Asparclite} \
 %{mcpu=v8:-Av8} \
+%{mcpu=supersparc:-Av8} \
+%{mcpu=hypersparc:-Av8} \
 %{mcpu=leon:-Av8} \
 %{mv8plus:-Av8plus} \
 %{mcpu=v9:-Av9} \

Re: [PATCH, x86] Use vector moves in memmove expanding

2013-04-10 Thread Ondřej Bílka

On Wed, Apr 10, 2013 at 09:53:09PM +0400, Michael Zolotukhin wrote:
> > Hi, I am writing memcpy for libc. It avoids a computed jump and is
> > much faster on small strings (variant for Sandy Bridge attached).
> 
> I'm not sure I get what you meant - could you please explain what
> computed jumps are?
Computed goto. See Duff's device; it works almost exactly the same way.
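
For readers unfamiliar with the reference, a sketch of the classic Duff's device follows. It is the textbook construction, not code from the patch or from the attached memcpy variant; the name duff_copy is made up for this illustration. The switch jumps into the middle of an unrolled loop body, which is the kind of computed jump being discussed.

```c
#include <stddef.h>

/* Copy COUNT bytes from FROM to TO with an 8-way unrolled loop.
   The switch selects the entry point into the loop body, so the
   remainder (count % 8) is handled by the first, partial pass.  */
void
duff_copy (unsigned char *to, const unsigned char *from, size_t count)
{
  if (count == 0)
    return;

  size_t n = (count + 7) / 8;   /* number of passes through the loop */

  switch (count % 8)
    {
    case 0: do { *to++ = *from++;
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
               } while (--n > 0);
    }
}
```

The fall-through between the case labels is intentional; a compiler may warn about it when implicit-fallthrough warnings are enabled.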
> 
> > You must also check performance with cold instruction cache.
> > Now memcpy(x,y,128) takes 126 bytes which is too much.
> 
> > Do not align for small sizes. Dependency caused by this erases any gains
> > that you might get. Keep in mind that in 55% of cases data are already
> > aligned.
> 
> Other algorithms are still available and we can use them for small
> sizes. E.g. for sizes <128 we could emit loop with GPR-moves and don't
> use vector instructions in it.

128 is about the upper bound you can expand with SSE moves.
Tuning did not take code size into account and measured only when the code
is in a tight loop.
For GPR moves the limit is around 64.

What matters is which code has the best performance/size ratio.
> But that's tuning and I haven't worked on it yet - I'm going to
> measure performance of all algorithms on all sizes and thus defines on
> which sizes which algorithm is preferable.
> What I did in this patch is introducing some infrastructure to allow
> emitting of vector moves in movmem expanding - tuning is certainly
> possible and needed, but that's out of the scope of the patch.
> 
> On 10 April 2013 21:43, Ondřej Bílka  wrote:
> > On Wed, Apr 10, 2013 at 08:14:30PM +0400, Michael Zolotukhin wrote:
> >> Hi,
> >> This patch adds a new algorithm for expanding movmem on x86 and refactors
> >> the existing implementation a bit. This is a reincarnation of the patch
> >> that was sent but wasn't checked in a couple of years ago - now I reworked
> >> it from scratch and divided it into several more manageable parts.
> >>
> > Hi, I am writing memcpy for libc. It avoids a computed jump and is
> > much faster on small strings (variant for Sandy Bridge attached).
> >
> >> For now this algorithm isn't used, because cost_models are tuned to
> >> use existing ones. I believe the new algorithm will give better
> >> performance, but I'll leave cost-models tuning for a separate patch.
> >>
> > You must also check performance with cold instruction cache.
> > Now memcpy(x,y,128) takes 126 bytes which is too much.
> >
> >> Also, I changed get_mem_align_offset to make it handle MEM_REFs as
> >> well. Probably, there is another way of getting info about alignment -
> >> if so, please let me know.
> >>
> > Do not align for small sizes. Dependency caused by this erases any gains
> > that you might get. Keep in mind that in 55% of cases data are already
> > aligned.
> >
> > Also in my tests best way to handle prologue is first copy last 16
> > bytes and then loop.
> >
> >> Similar improvements could be done in expanding of memset, but that's
> >> in progress now and I'm going to proceed with it if this patch is ok.
> >>
> >> Bootstrap/make check/Specs2k are passing on i686 and x86_64.
> >>
> >> Is it ok for trunk?
> >>
> >> Changelog entry:
> >>
> >> 2013-04-10  Michael Zolotukhin  
> >>
> >> * config/i386/i386-opts.h (enum stringop_alg): Add vector_loop.
> >> * config/i386/i386.c (expand_set_or_movmem_via_loop): Use
> >> adjust_address instead of change_address to keep info about 
> >> alignment.
> >> (emit_strmov): Remove.
> >> (emit_memmov): New function.
> >> (expand_movmem_epilogue): Refactor to properly handle bigger sizes.
> >> (expand_movmem_epilogue): Likewise and return updated rtx for
> >> destination.
> >> (expand_constant_movmem_prologue): Likewise and return updated rtx 
> >> for
> >> destination and source.
> >> (decide_alignment): Refactor, handle vector_loop.
> >> (ix86_expand_movmem): Likewise.
> >> (ix86_expand_setmem): Likewise.
> >> * config/i386/i386.opt (Enum): Add vector_loop to option 
> >> stringop_alg.
> >> * emit-rtl.c (get_mem_align_offset): Compute alignment for MEM_REF.
> >>
> >>
> >> --
> >> ---
> >> Best regards,
> >> Michael V. Zolotukhin,
> >> Software Engineer
> >> Intel Corporation.
> >
> 
> 
> 
> --
> ---
> Best regards,
> Michael V. Zolotukhin,
> Software Engineer
> Intel Corporation.

-- 

Traffic jam on the Information Superhighway.

Re: [PATCH] color diagnostics markers

2013-04-10 Thread Manuel López-Ibáñez

On 9 April 2013 15:21, Jakub Jelinek  wrote:
> white).  The default is still -fdiagnostics-color=never, can be changed
> later on.

Apart from my comments elsewhere
(http://gcc.gnu.org/ml/gcc-patches/2013-04/msg00614.html), the patch
looks fine to me. But perhaps we should change the default to auto, at
least during Stage 1, to find out whether some bug was introduced. If
agreed, I could do this in a follow-up patch that also disables colors
for the testsuite.

Cheers,

Manuel.

Re: [PATCH] Fix extract_muldiv (PR tree-optimization/56899)

2013-04-10 Thread Jeff Law


On 04/10/2013 09:31 AM, Jakub Jelinek wrote:

Hi!

As f1 in the testcase shows, applying distributive law in extract_muldiv_1
isn't safe if overflow behavior isn't defined, if we have
(op0 + c1) * c2
and the type is signed, we can't just try to fold that to
op0 * c2 + (c1 * c2)
even when we know that c1*c2 doesn't overflow, because op0 * c2
might overflow even when (op0 + c1) * c2 doesn't.
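
A concrete instance of the problem, as a sketch: the helper names below are hypothetical, and the check is done in 64-bit arithmetic so that the example itself never triggers signed overflow. It assumes a 32-bit signed type for the operands.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if V is representable in a 32-bit signed integer.  */
bool
fits_i32 (int64_t v)
{
  return v >= INT32_MIN && v <= INT32_MAX;
}

/* Would (op0 + c1) * c2 stay in range if evaluated in 32 bits?  */
bool
original_form_ok (int32_t op0, int32_t c1, int32_t c2)
{
  int64_t sum = (int64_t) op0 + c1;
  return fits_i32 (sum) && fits_i32 (sum * c2);
}

/* Would op0 * c2 + (c1 * c2) stay in range if evaluated in 32 bits?
   Every intermediate result has to fit, not just the final one.  */
bool
distributed_form_ok (int32_t op0, int32_t c1, int32_t c2)
{
  int64_t lhs = (int64_t) op0 * c2;
  int64_t rhs = (int64_t) c1 * c2;
  return fits_i32 (lhs) && fits_i32 (rhs) && fits_i32 (lhs + rhs);
}
```

For op0 = 1073741824, c1 = -1073741824 and c2 = 2, the original form evaluates to 0 and c1 * c2 is representable, yet op0 * c2 is 2^31 and does not fit, so only the distributed form overflows.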

Fixed thusly, after all that hunk of code is often soon undone by
fold_build2 again or later during GIMPLE optimizations,
bootstrapped/regtested on x86_64-linux and i686-linux, ok
for trunk/4.8?

2013-04-10  Jakub Jelinek  

PR tree-optimization/56899
* fold-const.c (extract_muldiv_1): Apply distributive law
only if TYPE_OVERFLOW_WRAPS (ctype).

* gcc.c-torture/execute/pr56899.c: New test.
OK, but I think you should update the comment before the test.  Right 
now it just references the overflow of the constant multiplication.


jeff

Re: RFC: color diagnostics markers

2013-04-10 Thread Manuel López-Ibáñez

On 8 April 2013 21:06, Jakub Jelinek  wrote:
> On Mon, Apr 08, 2013 at 07:54:18PM +0200, Manuel López-Ibáñez wrote:
>> > can be right now a single call, while you would need several.  Also, if you
>> > eventually want to colorize something in say error_at, warning_at and
>> > similar format strings.  For those you really don't have the printer at
>>
>> Do we really want to allow that much flexibility? Then the color_dict
>> needs to be dynamic or the caller is restricted to re-using existing
>> colornames.
>
> Yes, I think we want that flexibility, it certainly isn't that much
> difficult to support it (a few lines of code, will try to code the %r/%R
> variant tomorrow), and from time to time it can be useful.

I am still not convinced by the %r/%R. My two concerns are that:

1) %r/%R rather than explicit function calls make the code harder to
understand. But I guess this is a matter of taste.

2) It makes it harder to decouple the diagnostics machinery from the
actual formatting. The color should be something handled by the
pretty-printer and transparent to the diagnostics machinery interface.
(perhaps it should be pretty-printer-color.h instead of
diagnostics-color.h). I generally agree with the ideas of Gabriel
exposed here: http://gcc.gnu.org/ml/gcc/2012-04/msg00558.html. The
difference (and perhaps I misunderstood Gabriel's position in that
thread) is that I think that hiding the color stuff behind the
diagnostics machinery interface does not move us farther away from
those ideas, even though it does not move us closer either. And we
don't need an internal IL to do that.  However, letting the FEs add
arbitrary colors to diagnostics does move us farther. Yes, it is a
nice flexibility, but on the other hand, I don't really see the need
and I am afraid it will be misused. As Gabriel says: "it would be
really terrible idea if the intelligibility of a diagnostic -requires-
colors.". So if the color is not required, the FE should be oblivious
to whether there is a specific color there or not.

Nonetheless, I am pragmatic. Since you already did the work (and
improved significantly my original patch), I am fine with your patch
(for what it is worth). Thanks for working on it.

Cheers,

Manuel.

Re: [PATCH, x86] Use vector moves in memmove expanding

2013-04-10 Thread Michael Zolotukhin

> Hi, I am writing memcpy for libc. It avoids a computed jump and is
> much faster on small strings (variant for Sandy Bridge attached).

I'm not sure I get what you meant - could you please explain what
computed jumps are?

> You must also check performance with cold instruction cache.
> Now memcpy(x,y,128) takes 126 bytes which is too much.

> Do not align for small sizes. Dependency caused by this erases any gains
> that you might get. Keep in mind that in 55% of cases data are already
> aligned.

Other algorithms are still available and we can use them for small
sizes. E.g. for sizes <128 we could emit a loop with GPR moves and not
use vector instructions in it.
But that's tuning and I haven't worked on it yet - I'm going to
measure the performance of all algorithms on all sizes and thus determine
for which sizes which algorithm is preferable.
What I did in this patch is introduce some infrastructure to allow
emitting vector moves in movmem expansion - tuning is certainly
possible and needed, but that's out of the scope of the patch.

On 10 April 2013 21:43, Ondřej Bílka  wrote:
> On Wed, Apr 10, 2013 at 08:14:30PM +0400, Michael Zolotukhin wrote:
>> Hi,
>> This patch adds a new algorithm for expanding movmem on x86 and refactors
>> the existing implementation a bit. This is a reincarnation of the patch
>> that was sent but wasn't checked in a couple of years ago - now I reworked
>> it from scratch and divided it into several more manageable parts.
>>
> Hi, I am writing memcpy for libc. It avoids a computed jump and is
> much faster on small strings (variant for Sandy Bridge attached).
>
>> For now this algorithm isn't used, because cost_models are tuned to
>> use existing ones. I believe the new algorithm will give better
>> performance, but I'll leave cost-models tuning for a separate patch.
>>
> You must also check performance with cold instruction cache.
> Now memcpy(x,y,128) takes 126 bytes which is too much.
>
>> Also, I changed get_mem_align_offset to make it handle MEM_REFs as
>> well. Probably, there is another way of getting info about alignment -
>> if so, please let me know.
>>
> Do not align for small sizes. Dependency caused by this erases any gains
> that you might get. Keep in mind that in 55% of cases data are already
> aligned.
>
> Also in my tests best way to handle prologue is first copy last 16
> bytes and then loop.
>
>> Similar improvements could be done in expanding of memset, but that's
>> in progress now and I'm going to proceed with it if this patch is ok.
>>
>> Bootstrap/make check/Specs2k are passing on i686 and x86_64.
>>
>> Is it ok for trunk?
>>
>> Changelog entry:
>>
>> 2013-04-10  Michael Zolotukhin  
>>
>> * config/i386/i386-opts.h (enum stringop_alg): Add vector_loop.
>> * config/i386/i386.c (expand_set_or_movmem_via_loop): Use
>> adjust_address instead of change_address to keep info about 
>> alignment.
>> (emit_strmov): Remove.
>> (emit_memmov): New function.
>> (expand_movmem_epilogue): Refactor to properly handle bigger sizes.
>> (expand_movmem_epilogue): Likewise and return updated rtx for
>> destination.
>> (expand_constant_movmem_prologue): Likewise and return updated rtx 
>> for
>> destination and source.
>> (decide_alignment): Refactor, handle vector_loop.
>> (ix86_expand_movmem): Likewise.
>> (ix86_expand_setmem): Likewise.
>> * config/i386/i386.opt (Enum): Add vector_loop to option 
>> stringop_alg.
>> * emit-rtl.c (get_mem_align_offset): Compute alignment for MEM_REF.
>>
>>
>> --
>> ---
>> Best regards,
>> Michael V. Zolotukhin,
>> Software Engineer
>> Intel Corporation.
>



--
---
Best regards,
Michael V. Zolotukhin,
Software Engineer
Intel Corporation.

Re: [PATCH, x86] Use vector moves in memmove expanding

2013-04-10 Thread Ondřej Bílka

On Wed, Apr 10, 2013 at 08:14:30PM +0400, Michael Zolotukhin wrote:
> Hi,
> This patch adds a new algorithm for expanding movmem on x86 and refactors
> the existing implementation a bit. This is a reincarnation of the patch
> that was sent but wasn't checked in a couple of years ago - now I reworked
> it from scratch and divided it into several more manageable parts.
>
Hi, I am writing memcpy for libc. It avoids a computed jump and is
much faster on small strings (variant for Sandy Bridge attached).
 
> For now this algorithm isn't used, because cost_models are tuned to
> use existing ones. I believe the new algorithm will give better
> performance, but I'll leave cost-models tuning for a separate patch.
>
You must also check performance with cold instruction cache.
Now memcpy(x,y,128) takes 126 bytes which is too much.

> Also, I changed get_mem_align_offset to make it handle MEM_REFs as
> well. Probably, there is another way of getting info about alignment -
> if so, please let me know.
> 
Do not align for small sizes. Dependency caused by this erases any gains
that you might get. Keep in mind that in 55% of cases data are already
aligned.

Also in my tests best way to handle prologue is first copy last 16
bytes and then loop.
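
One way to read that suggestion, as a sketch only: the function name copy_tail_first is hypothetical, this is not the attached Sandy Bridge variant, and each fixed-size 16-byte memcpy stands in for a single unaligned vector load/store pair. It assumes the buffers do not overlap and that at least 16 bytes are copied.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy N >= 16 bytes.  The last 16 bytes are copied first with one
   (possibly overlapping) block move, so the loop that follows only has
   to deal with whole 16-byte blocks and needs no remainder handling.  */
void
copy_tail_first (unsigned char *dst, const unsigned char *src, size_t n)
{
  assert (n >= 16);

  /* Tail block: covers bytes [n - 16, n).  */
  memcpy (dst + n - 16, src + n - 16, 16);

  /* Whole blocks from the start; stops once the tail block already
     covers the rest, so no store goes past dst + n.  */
  for (size_t i = 0; i + 16 < n; i += 16)
    memcpy (dst + i, src + i, 16);
}
```

Because the tail does not depend on any alignment computation, it can be issued immediately, which seems to be the point of copying it first.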

> Similar improvements could be done in expanding of memset, but that's
> in progress now and I'm going to proceed with it if this patch is ok.
> 
> Bootstrap/make check/Specs2k are passing on i686 and x86_64.
> 
> Is it ok for trunk?
> 
> Changelog entry:
> 
> 2013-04-10  Michael Zolotukhin  
> 
> * config/i386/i386-opts.h (enum stringop_alg): Add vector_loop.
> * config/i386/i386.c (expand_set_or_movmem_via_loop): Use
> adjust_address instead of change_address to keep info about alignment.
> (emit_strmov): Remove.
> (emit_memmov): New function.
> (expand_movmem_epilogue): Refactor to properly handle bigger sizes.
> (expand_movmem_epilogue): Likewise and return updated rtx for
> destination.
> (expand_constant_movmem_prologue): Likewise and return updated rtx for
> destination and source.
> (decide_alignment): Refactor, handle vector_loop.
> (ix86_expand_movmem): Likewise.
> (ix86_expand_setmem): Likewise.
> * config/i386/i386.opt (Enum): Add vector_loop to option stringop_alg.
> * emit-rtl.c (get_mem_align_offset): Compute alignment for MEM_REF.
> 
> 
> --
> ---
> Best regards,
> Michael V. Zolotukhin,
> Software Engineer
> Intel Corporation.

Re: [PATCH, AArch64] Compare Negative instruction in shift and extend mode

2013-04-10 Thread Marcus Shawcroft


On 10/04/13 11:35, Hurugalawadi, Naveen wrote:


+(define_insn "*cmn_swp__reg"
+  [(set (reg:CC_SWP CC_REGNUM)
+   (compare:CC_SWP (ANY_EXTEND:GPI
+(match_operand:ALLX 0 "register_operand" "r"))
+   (neg:GPI (match_operand:GPI 1 "register_operand" 
"r"]



Umm, I'm not convinced, are you sure this placement of EXTEND and NEG in 
the RTL representation of CMN is correct?



--- gcc/testsuite/gcc.target/aarch64/cmn-1.c	1970-01-01 05:30:00.0 +0530
+++ gcc/testsuite/gcc.target/aarch64/cmn-1.c	2013-04-10 12:27:17.845318216 +0530
@@ -0,0 +1,134 @@
+/* { dg-do run } */
+/* { dg-options "-O2 --save-temps" } */


At the point that these CMN optimizations kick in, GCC is clever enough 
to inline and fold main() to nothing; in effect the generated CMN 
instruction never executes and the test cases all pass...


Try this instead:

> +/* { dg-options "-O2 -fno-inline --save-temps" } */

this will result in the test case failing.

Cheers
/Marcus

[wwwdocs, patch, committed] Updated Fortran part of http://gcc.gnu.org/gcc-4.9/changes.html

2013-04-10 Thread Tobias Burnus


http://gcc.gnu.org/gcc-4.9/changes.html was so empty :-)

I have committed the attached patch - comments and suggestions are welcome.

Tobias
Index: changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.9/changes.html,v
retrieving revision 1.1
diff -u -r1.1 changes.html
--- changes.html	15 Mar 2013 16:39:45 -	1.1
+++ changes.html	10 Apr 2013 17:22:21 -
@@ -45,9 +45,21 @@
 C++
 -->
 
-
+  
+Compatibility notice:
+
+  Note that the argument passing ABI has changed for scalar dummy
+	arguments of type INTEGER, REAL,
+	COMPLEX and LOGICAL, which have
+	both the VALUE and the OPTIONAL
+	attribute.
+
+The deprecated command-line option -fno-whole-file
+  has been removed. (-fwhole-file is the default since
+  GCC 4.6.) -fwhole-file/-fno-whole-file
+  continue to be accepted but do not influence the code generation.
+

[PATCH, testsuite]: Avoid "error: inlining failed in call to always_inline" with -fpic

2013-04-10 Thread Uros Bizjak

Hello!

Attached testsuite patch fixes:

pr33992.c: In function ‘do_test’:
pr33992.c:11:1: error: inlining failed in call to always_inline ‘foo’:
function body can be overwritten at link time
pr33992.c:28:9: error: called from here
 foo (r);
 ^

errors throughout the gcc and g++ testsuites when tested with -fpic.
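
As a small illustration of why adding static helps (a sketch with made-up function names, using the GCC attribute syntax): a function with internal linkage cannot be replaced by another definition at link time, so the always_inline request can be honoured even when compiling with -fpic.

```c
/* Internal linkage: no other definition can take this function's place
   at link time, so always_inline can be satisfied under -fpic.  */
static inline __attribute__ ((always_inline)) int
add_one (int x)
{
  return x + 1;
}

/* External caller; the call to add_one is expected to be inlined here.  */
int
call_add_one (int x)
{
  return add_one (x);
}
```

Dropping static from add_one gives it external linkage and default visibility again, which is the situation the error message above describes.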

2013-04-10  Uros Bizjak  

* g++.dg/ipa/devirt-c-7.C: Require nonpic effective target.
* gcc.c-torture/execute/pr33992.c (foo): Declare as static void.
* gcc.dg/uninit-pred-5_a.c (foo): Ditto.
* gcc.dg/uninit-pred-5_b.c (foo): Ditto.

OK for mainline and release branches?

Uros.
Index: g++.dg/ipa/devirt-c-7.C
===
--- g++.dg/ipa/devirt-c-7.C (revision 197646)
+++ g++.dg/ipa/devirt-c-7.C (working copy)
@@ -1,6 +1,7 @@
 /* Verify that ipa-cp will not get confused by placement new constructing an
object within another one when looking for dynamic type change .  */
 /* { dg-do run } */
+/* { dg-require-effective-target nonpic } */
 /* { dg-options "-O3 -Wno-attributes"  } */
 
 extern "C" void abort (void);
Index: gcc.c-torture/execute/pr33992.c
===
--- gcc.c-torture/execute/pr33992.c (revision 197646)
+++ gcc.c-torture/execute/pr33992.c (working copy)
@@ -7,7 +7,7 @@
 abort ();
 }
 
-void __attribute__((always_inline))
+static void __attribute__((always_inline))
 foo (unsigned long long *r)
 {
   int i;
Index: gcc.dg/uninit-pred-5_a.c
===
--- gcc.dg/uninit-pred-5_a.c(revision 197646)
+++ gcc.dg/uninit-pred-5_a.c(working copy)
@@ -6,8 +6,9 @@
 int blah(int);
 void t(int);
 
+static int
 __attribute__((always_inline)) 
-int foo (int n, int* v, int r)
+foo (int n, int* v, int r)
 {
   int flag = 0;
   if (r > n)
Index: gcc.dg/uninit-pred-5_b.c
===
--- gcc.dg/uninit-pred-5_b.c(revision 197646)
+++ gcc.dg/uninit-pred-5_b.c(working copy)
@@ -6,8 +6,9 @@
 int blah(int);
 void t(int);
 
+static int
 __attribute__((always_inline)) 
-int foo (int n, int* v, int r)
+foo (int n, int* v, int r)
 {
   int flag = 0;
   if (r > n)

Re: [PATCH] Fix PR48184

2013-04-10 Thread Jakub Jelinek

On Wed, Apr 10, 2013 at 06:42:58PM +0200, Marek Polacek wrote:
> On Wed, Apr 10, 2013 at 11:49:18AM +0200, Jakub Jelinek wrote:
> > Shouldn't this be again solved instead by bumping minimum for the param to 1
> > from 0?  Because, the smaller the param is, the bigger freq_threshold is,
> > so if for the smallest param we suddenly set freq_threshold to 0, it isn't
> > consistent.
> 
> Yeah, I'm all for it.  I think it's so obvious that I'll just commit it
> tomorrow to trunk/4.8.

Yes, thanks.

> 2013-04-10  Marek Polacek  
> 
>   PR tree-optimization/48184
>   * params.def (PARAM_ALIGN_THRESHOLD): Increase the minimum
>   value to 1.
> 
> --- gcc/params.def.mp 2013-04-10 18:35:24.983126017 +0200
> +++ gcc/params.def2013-04-10 18:35:36.619165432 +0200
> @@ -376,7 +376,7 @@ DEFPARAM(HOT_BB_FREQUENCY_FRACTION,
>  DEFPARAM (PARAM_ALIGN_THRESHOLD,
> "align-threshold",
> "Select fraction of the maximal frequency of executions of basic 
> block in function given basic block get alignment",
> -   100, 0, 0)
> +   100, 1, 0)
>  
>  DEFPARAM (PARAM_ALIGN_LOOP_ITERATIONS,
> "align-loop-iterations",
> 

Jakub

[google gcc-4_7] offline profile tool (patchset 4) (issue8508048)

2013-04-10 Thread Rong Xu

This patch integrated Martin's comments.

-Rong

2013-04-10  Rong Xu  

* contrib/profile_tool: New
* gcc/Makefile.in (GCC_INSTALL_NAME): install profile_tool

Index: contrib/profile_tool
===
--- contrib/profile_tool(revision 0)
+++ contrib/profile_tool(revision 0)
@@ -0,0 +1,1315 @@
+#!/usr/bin/python2.7
+#
+#Copyright (C) 2013
+#Free Software Foundation, Inc.
+#
+# This file is part of GCC.
+#
+# GCC is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3, or (at your option)
+# any later version.
+#
+# GCC is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# <http://www.gnu.org/licenses/>.
+#
+
+
+"""Merge two or more gcda profiles.
+"""
+
+__author__ = 'Seongbae Park, Rong Xu'
+__author_email__ = 'sp...@google.com, x...@google.com'
+
+import array
+from optparse import OptionParser
+import os
+import struct
+import zipfile
+
+new_histogram = None
+
+
+class Error(Exception):
+  """Exception class for profile module."""
+
+
+def ReadAllAndClose(path):
+  """Return the entire byte content of the specified file.
+
+  Args:
+path: The path to the file to be opened and read.
+
+  Returns:
+The byte sequence of the content of the file.
+  """
+  f = open(path, 'rb')
+  data = f.read()
+  f.close()
+  return data
+
+
+def ReturnMergedCounters(objs, index, multipliers):
+  """Accumulate the counter at "index" from all counters objs."""
+  val = 0
+  for j in xrange(len(objs)):
+val += multipliers[j] * objs[j].counters[index]
+  return val
+
+
+class DataObject(object):
+  """Base class for various datum in GCDA/GCNO file."""
+
+  def __init__(self, tag):
+self.tag = tag
+
+
+class Function(DataObject):
+  """Function and its counters.
+
+  Attributes:
+length: Length of the data on the disk
+ident: Ident field
+line_checksum: Checksum of the line number
+cfg_checksum: Checksum of the control flow graph
+counters: All counters associated with the function
+file: The name of the file the function is defined in. Optional.
+line: The line number the function is defined at. Optional.
+
+  Function object contains other counter objects and block/arc/line objects.
+  """
+
+  def __init__(self, reader, tag, n_words):
+"""Read function record information from a gcda/gcno file.
+
+Args:
+  reader: gcda/gcno file.
+  tag: function tag.
+  n_words: length of function record in unit of 4-byte.
+"""
+DataObject.__init__(self, tag)
+self.length = n_words
+self.counters = []
+
+if reader:
+  pos = reader.pos
+  self.ident = reader.ReadWord()
+  self.line_checksum = reader.ReadWord()
+  self.cfg_checksum = reader.ReadWord()
+
+  # Function name string is in gcno files, but not
+  # in gcda files. Here we make string reading optional.
+  if (reader.pos - pos) < n_words:
+reader.ReadStr()
+
+  if (reader.pos - pos) < n_words:
+self.file = reader.ReadStr()
+self.line_number = reader.ReadWord()
+  else:
+self.file = ''
+self.line_number = 0
+else:
+  self.ident = 0
+  self.line_checksum = 0
+  self.cfg_checksum = 0
+  self.file = None
+  self.line_number = 0
+
+  def Write(self, writer):
+"""Write out the function."""
+
+writer.WriteWord(self.tag)
+writer.WriteWord(self.length)
+writer.WriteWord(self.ident)
+writer.WriteWord(self.line_checksum)
+writer.WriteWord(self.cfg_checksum)
+for c in self.counters:
+  c.Write(writer)
+
+  def EntryCount(self):
+"""Return the number of times the function was called."""
+return self.ArcCounters().counters[0]
+
+  def Merge(self, others, multipliers):
+"""Merge all functions in "others" into self.
+
+Args:
+  others: A sequence of Function objects
+  multipliers: A sequence of integers to be multiplied during merging.
+"""
+for o in others:
+  assert self.ident == o.ident
+  assert self.line_checksum == o.line_checksum
+  assert self.cfg_checksum == o.cfg_checksum
+
+for i in xrange(len(self.counters)):
+  self.counters[i].Merge([o.counters[i] for o in others], multipliers)
+
+  def Print(self):
+"""Print all the attributes in full detail."""
+print 'function: ident %d length %d line_chksum %x cfg_chksum %x' % (
+self.ident, self.length,
+self.line_checksum, self.cfg_checksum)
+if self.file:
+  print 'file: %s' % self.file
+  print 'line_number:   %d' % self.line_number
+

Re: [PATCH] Fix PR48184

2013-04-10 Thread Marek Polacek

On Wed, Apr 10, 2013 at 11:49:18AM +0200, Jakub Jelinek wrote:
> Shouldn't this be again solved instead by bumping minimum for the param to 1
> from 0?  Because, the smaller the param is, the bigger freq_threshold is,
> so if for the smallest param we suddenly set freq_threshold to 0, it isn't
> consistent.

Yeah, I'm all for it.  I think it's so obvious that I'll just commit it
tomorrow to trunk/4.8.
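
To spell out the consistency argument from the quoted remark: assuming the threshold is obtained by dividing the maximal block frequency by the parameter (that is my reading of "the smaller the param is, the bigger freq_threshold is", not a quote of the GCC sources), a parameter value of 0 has no sensible meaning, while 1 is the natural smallest value. The function name below is hypothetical.

```c
#include <assert.h>

/* Sketch: threshold as a fraction of the maximal block frequency.
   ALIGN_THRESHOLD plays the role of the align-threshold param; with a
   minimum of 1 the division is always well defined.  */
int
toy_freq_threshold (int freq_max, int align_threshold)
{
  assert (align_threshold >= 1);
  return freq_max / align_threshold;
}
```

With freq_max = 10000, the default of 100 gives a threshold of 100, and the new minimum of 1 gives the largest threshold, 10000, so the relationship stays monotonic over the whole allowed range.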

2013-04-10  Marek Polacek  

PR tree-optimization/48184
* params.def (PARAM_ALIGN_THRESHOLD): Increase the minimum
value to 1.

--- gcc/params.def.mp   2013-04-10 18:35:24.983126017 +0200
+++ gcc/params.def  2013-04-10 18:35:36.619165432 +0200
@@ -376,7 +376,7 @@ DEFPARAM(HOT_BB_FREQUENCY_FRACTION,
 DEFPARAM (PARAM_ALIGN_THRESHOLD,
  "align-threshold",
  "Select fraction of the maximal frequency of executions of basic 
block in function given basic block get alignment",
- 100, 0, 0)
+ 100, 1, 0)
 
 DEFPARAM (PARAM_ALIGN_LOOP_ITERATIONS,
  "align-loop-iterations",


Marek
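
To illustrate the relationship Jakub describes (the smaller the param, the bigger freq_threshold), here is a minimal Python 3 sketch. It assumes the threshold is the maximal block frequency divided by the parameter; the function and variable names are hypothetical and are not taken from GCC's source.

```python
def freq_threshold(max_freq, align_threshold):
    """Model: threshold = max_freq // align_threshold (assumed relationship).

    The names are illustrative only; they do not mirror GCC's code.
    """
    if align_threshold < 1:
        # With the minimum bumped to 1, this value can no longer be requested.
        raise ValueError("align-threshold must be at least 1")
    return max_freq // align_threshold


# Smaller parameter values give a larger threshold.
for param in (100, 10, 2, 1):
    print(param, freq_threshold(10000, param))
```

With a minimum of 1 the threshold grows monotonically as the param shrinks, and the special case at 0 never arises.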

Re: [Patch, Fortran, OOP] PR 56261: seg fault call procedure pointer on polymorphic array

2013-04-10 Thread Tobias Burnus


Janus Weil wrote:

Ok, here is an updated patch, which does the discussed checking for
procedure pointer assignments. For this I have introduced a new
function 'gfc_explicit_interface_required', which checks all the items
in F08:12.4.2.2 and is loosely based on the present checks in
'resolve_global_procedure' (which are replaced by the new function).

I hope the general idea of the patch is ok and the error messages are
sufficiently comprehensible.


Thanks for working on it. It looks mostly okay.

+ snprintf (errmsg, err_len, "allocatable argument");
+ return true;


You should use "strncpy" instead of "snprintf". (Unless, you want to append ' "%s", 
arg->sym->name'; however, in this context, the argument name does not really matter.) Additionally, please use 
_("...") to mark it for translation.


(I was wondering whether it misses BT_CLASS function results, but the check 
res->attr.pointer should always be true for polymorphic types, whether 
allocatable or pointer.)


(I don't really like the wording, but I cannot come up with something better; 
especially not if it should remain translatable. With the patch one has:

Explicit interface required for 'sub' at (1): allocatable argument
Explicit interface required for 'sub' at (1): pointer or allocatable result
Explicit interface required for 'sub' at (1): pointer or allocatable result
Explicit interface required for 'sub' at (1): elemental procedure )



One leftover problem: The patch currently fails on the auto_char_len_4
test case


The patch is okay with the strncpy/_("...") issues fixed.

Regarding auto_char_len_4.f90: As written in previous email, I think we 
should add a diagnostic for the external declaration ("EXTERNAL, 
CHARACTER(len=", "PROCEDURE(CHARACTER(len=)" (also 
proc-pointer) in resolve_symbol.


I think the best way forward would be to change the declarations to use 
"len=n", add a dummy argument and adapt the call. For the follow-up 
patch, one can then create a new test case.


Tobias

Re: [Patch, fortran] PR 40958 Compress module files with zlib

2013-04-10 Thread Mike Stump

On Apr 9, 2013, at 11:33 AM, Janne Blomqvist  wrote:
> the attached patch reduces the size of module files on disk

Do those modules interoperate with C++ modules flawlessly?  :-)

[PATCH, x86] Use vector moves in memmove expanding

2013-04-10 Thread Michael Zolotukhin

Hi,
This patch adds a new algorithm for expanding movmem on x86 and refactors
the existing implementation a bit. This is a reincarnation of a patch
that was sent a couple of years ago but wasn't checked in - I have now
reworked it from scratch and divided it into several more manageable parts.

For now this algorithm isn't used, because cost_models are tuned to
use existing ones. I believe the new algorithm will give better
performance, but I'll leave cost-models tuning for a separate patch.

Also, I changed get_mem_align_offset to make it handle MEM_REFs as
well. Probably, there is another way of getting info about alignment -
if so, please let me know.

Similar improvements could be done in expanding of memset, but that's
in progress now and I'm going to proceed with it if this patch is ok.

Bootstrap/make check/Specs2k are passing on i686 and x86_64.

Is it ok for trunk?

Changelog entry:

2013-04-10  Michael Zolotukhin  

* config/i386/i386-opts.h (enum stringop_alg): Add vector_loop.
* config/i386/i386.c (expand_set_or_movmem_via_loop): Use
adjust_address instead of change_address to keep info about alignment.
(emit_strmov): Remove.
(emit_memmov): New function.
(expand_movmem_epilogue): Refactor to properly handle bigger sizes.
(expand_movmem_epilogue): Likewise and return updated rtx for
destination.
(expand_constant_movmem_prologue): Likewise and return updated rtx for
destination and source.
(decide_alignment): Refactor, handle vector_loop.
(ix86_expand_movmem): Likewise.
(ix86_expand_setmem): Likewise.
* config/i386/i386.opt (Enum): Add vector_loop to option stringop_alg.
* emit-rtl.c (get_mem_align_offset): Compute alignment for MEM_REF.


--
---
Best regards,
Michael V. Zolotukhin,
Software Engineer
Intel Corporation.


memmov.patch
Description: Binary data

Re: Comments on the suggestion to use infinite precision math for wide int.

2013-04-10 Thread Kenneth Zadeck


On 04/10/2013 12:02 PM, Mike Stump wrote:

On Apr 10, 2013, at 12:38 AM, Richard Biener  wrote:

Yeah, I think we want to test ~(T)0<(T)0 here.

Thanks Lawrence, in the next version of the patch, you will discover this at 
the bottom if you look hard.  :-)

actually closer to the middle.

Re: Comments on the suggestion to use infinite precision math for wide int.

2013-04-10 Thread Mike Stump

On Apr 10, 2013, at 12:38 AM, Richard Biener  wrote:
> Yeah, I think we want to test ~(T)0<(T)0 here.

Thanks Lawrence, in the next version of the patch, you will discover this at 
the bottom if you look hard.  :-)

Re: [Patch, Fortran, OOP] PR 56261: seg fault call procedure pointer on polymorphic array

2013-04-10 Thread Tobias Burnus


On 10.04.2013 16:21, Janus Weil wrote:

2013/4/7 Tobias Burnus :

Thus, the only place where the check can be is for:

f => ff

In your example, the explicit interface of "ff" is known thus it should
be
testable at resolution time of the proc-pointer assignment.

Right. However, strictly speaking, the pointer assignment as such is
probably valid. But of course there is not much one can do with the
proc-ptr afterwards, if it's invalid to call it ...

  Well, if one doesn't want to error on it, one can still warn. However, the
following states that it is invalid:

"If the characteristics of the pointer object or the pointer target are such
that an explicit interface is required, both the pointer object and the
pointer target shall have an explicit interface." (F2008, para 4 of "7.2.2.4
Procedure pointer assignment")

Ok, here is an updated patch, which does the discussed checking for
procedure pointer assignments. For this I have introduced a new
function 'gfc_explicit_interface_required', which checks all the items
in F08:12.4.2.2 and is loosely based on the present checks in
'resolve_global_procedure' (which are replaced by the new function).

I hope the general idea of the patch is ok and the error messages are
sufficiently comprehensible.

One leftover problem: The patch currently fails on the auto_char_len_4
test case, which is not being rejected any more. Actually I'm not
fully convinced that the dg-errors there are correct: If the EXTERNAL
statements in auto_char_len_{1,2} do not trigger an "explicit
interface required" warning, I don't see why the ones in
auto_char_len_4 should.


Regarding auto_char_len_[12].f90: A warning about an "explicit interface 
required" would be a bad joke as it contains an assumed character length 
function, which is a deprecated feature (cf. B.2.6) and it does *not* 
work with an explicit interface:


character(len=*) function func()
  func = 'ABC'
end function func

Question: Which length does the function result have? Answer, it depends 
on the declaration in the caller:

  character(len=2) :: func
  print *, func()
means that it has length 2. If you have len=4 in another procedure, it 
will have len=4 in that scoping unit.



By contrast, auto_char_len_4 uses "normal" function results, which could 
be used with an explicit interface.


As F2008 states ("12.4.2.2 Explicit interface"), an explicit interface 
is required if: "[...] (3) the procedure has a result that [...] (c) has 
a nonassumed type parameter value that is not a constant expression,"


The assumed-type-parameter is used in auto_char_len_{1,2}.f90.

In auto_char_len_4.f90:  The example does not have anything to do with 
whole-file diagnostic; the problem is just that the characteristics do 
not match. (The functions themselves use a constant length.) But if "len=n" 
also had been used in the functions, the error would be correct. Thus, I 
would like to keep such a diagnostic for external function declarations, 
which should trigger even if the called function is not in the same 
file. (One could move the warning to, e.g., resolve_variable.) Quote 
from *_4.f90:


SUBROUTINE s(n)
  CHARACTER(LEN=n), EXTERNAL :: a ! { dg-error "must have an explicit interface" }
  CHARACTER(LEN=n), EXTERNAL :: d ! { dg-error "must have an explicit interface" }


Tobias

[PATCH] Fix extract_muldiv (PR tree-optimization/56899)

2013-04-10 Thread Jakub Jelinek

Hi!

As f1 in the testcase shows, applying distributive law in extract_muldiv_1
isn't safe if overflow behavior isn't defined, if we have
(op0 + c1) * c2
and the type is signed, we can't just try to fold that to
op0 * c2 + (c1 * c2)
even when we know that c1*c2 doesn't overflow, because op0 * c2
might overflow even when (op0 + c1) * c2 doesn't.

Fixed thusly, after all that hunk of code is often soon undone by
fold_build2 again or later during GIMPLE optimizations,
bootstrapped/regtested on x86_64-linux and i686-linux, ok
for trunk/4.8?
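
The f1 case can be checked with plain integer arithmetic. The following Python 3 sketch models 32-bit ranges explicitly (Python integers do not overflow, so the helper names `fits_int32` and `wrap_u32` are illustrative assumptions, not anything from fold-const.c). It uses the values from f1 in the testcase: op0 = 10, c1 = -1, c2 = -214748365.

```python
INT_MIN, INT_MAX = -2**31, 2**31 - 1


def fits_int32(x):
    """True if x is representable as a 32-bit signed integer."""
    return INT_MIN <= x <= INT_MAX


def wrap_u32(x):
    """Reduce x modulo 2**32, modelling unsigned wrap-around."""
    return x % 2**32


# Values from f1 (10): x = -214748365 * (v - 1)
op0, c1, c2 = 10, -1, -214748365

original = (op0 + c1) * c2      # the expression as written
partial = op0 * c2              # first term after distributing
const_product = c1 * c2         # folded constant term

print("original      ", original, fits_int32(original))
print("op0 * c2      ", partial, fits_int32(partial))
print("c1 * c2       ", const_product, fits_int32(const_product))

# For wrapping (unsigned) arithmetic the two forms agree modulo 2**32.
print(wrap_u32(original) == wrap_u32(wrap_u32(partial) + wrap_u32(const_product)))
```

The original product and c1 * c2 both fit in a signed int, but op0 * c2 does not, which is why the transformation is only safe when the type wraps.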

2013-04-10  Jakub Jelinek  

PR tree-optimization/56899
* fold-const.c (extract_muldiv_1): Apply distributive law
only if TYPE_OVERFLOW_WRAPS (ctype).

* gcc.c-torture/execute/pr56899.c: New test.

--- gcc/fold-const.c.jj 2013-04-03 15:46:45.0 +0200
+++ gcc/fold-const.c 2013-04-10 14:45:20.590321561 +0200
@@ -5851,7 +5851,7 @@ extract_muldiv_1 (tree t, tree c, enum t
   /* The last case is if we are a multiply.  In that case, we can
 apply the distributive law to commute the multiply and addition
 if the multiplication of the constants doesn't overflow.  */
-  if (code == MULT_EXPR)
+  if (code == MULT_EXPR && TYPE_OVERFLOW_WRAPS (ctype))
return fold_build2 (tcode, ctype,
fold_build2 (code, ctype,
 fold_convert (ctype, op0),
--- gcc/testsuite/gcc.c-torture/execute/pr56899.c.jj 2013-04-10 14:58:37.015788243 +0200
+++ gcc/testsuite/gcc.c-torture/execute/pr56899.c   2013-04-10 14:58:08.0 +0200
@@ -0,0 +1,47 @@
+/* PR tree-optimization/56899 */
+
+#if __SIZEOF_INT__ == 4 && __CHAR_BIT__ == 8
+__attribute__((noinline, noclone)) void
+f1 (int v)
+{
+  int x = -214748365 * (v - 1);
+  if (x != -1932735285)
+__builtin_abort ();
+}
+
+__attribute__((noinline, noclone)) void
+f2 (int v)
+{
+  int x = 214748365 * (v + 1);
+  if (x != -1932735285)
+__builtin_abort ();
+}
+
+__attribute__((noinline, noclone)) void
+f3 (unsigned int v)
+{
+  unsigned int x = -214748365U * (v - 1);
+  if (x != -1932735285U)
+__builtin_abort ();
+}
+
+__attribute__((noinline, noclone)) void
+f4 (unsigned int v)
+{
+  unsigned int x = 214748365U * (v + 1);
+  if (x != -1932735285U)
+__builtin_abort ();
+}
+#endif
+
+int
+main ()
+{
+#if __SIZEOF_INT__ == 4 && __CHAR_BIT__ == 8
+  f1 (10);
+  f2 (-10);
+  f3 (10);
+  f4 (-10U);
+#endif
+  return 0;
+}

Jakub

Re: [PATCH, AArch64] Fix the generation of .arch and .cpu assembly directives

2013-04-10 Thread Marcus Shawcroft


On 10/04/13 15:44, Yufeng Zhang wrote:

Hi,

This patch changes the compiler to correctly generate .arch and .cpu
assembly directives in order to support the inline assembly of
instructions that are part of a feature, e.g. crypto.

OK for the trunk?

Thanks,
Yufeng


gcc/

  * config/aarch64/aarch64.c (aarch64_print_extension): New function.
  (aarch64_start_file): Use the new function.



OK

[PATCH, AArch64] Fix the generation of .arch and .cpu assembly directives

2013-04-10 Thread Yufeng Zhang


Hi,

This patch changes the compiler to correctly generate .arch and .cpu 
assembly directives in order to support the inline assembly of
instructions that are part of a feature, e.g. crypto.

OK for the trunk?

Thanks,
Yufeng


gcc/

* config/aarch64/aarch64.c (aarch64_print_extension): New function.
(aarch64_start_file): Use the new function.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 827b8df..49016c1 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -7080,12 +7080,30 @@ aarch64_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem,
 }
 
 static void
+aarch64_print_extension (void)
+{
+  const struct aarch64_option_extension *opt = NULL;
+
+  for (opt = all_extensions; opt->name != NULL; opt++)
+if ((aarch64_isa_flags & opt->flags_on) == opt->flags_on)
+  asm_fprintf (asm_out_file, "+%s", opt->name);
+
+  asm_fprintf (asm_out_file, "\n");
+}
+
+static void
 aarch64_start_file (void)
 {
   if (selected_arch)
-asm_fprintf (asm_out_file, "\t.arch %s\n", selected_arch->name);
+{
+  asm_fprintf (asm_out_file, "\t.arch %s", selected_arch->name);
+  aarch64_print_extension ();
+}
   else if (selected_cpu)
-asm_fprintf (asm_out_file, "\t.cpu %s\n", selected_cpu->name);
+{
+  asm_fprintf (asm_out_file, "\t.cpu %s", selected_cpu->name);
+  aarch64_print_extension ();
+}
   default_file_start();
 }
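
For readers who want to see what the directive looks like after this change, here is a small Python 3 model of the loop in aarch64_print_extension. The flag bits and the extension table below are made up for illustration; only the `(flags & flags_on) == flags_on` test and the "+name" suffix format are taken from the patch.

```python
def arch_directive(arch_name, isa_flags, extensions):
    """Build a '.arch' line with a '+ext' suffix for every enabled extension."""
    line = "\t.arch " + arch_name
    for name, flags_on in extensions:
        # Same test as in the patch: all bits of flags_on must be set.
        if (isa_flags & flags_on) == flags_on:
            line += "+" + name
    return line + "\n"


# Hypothetical flag bits and extension table, for illustration only.
FL_FP, FL_SIMD, FL_CRYPTO = 1, 2, 4
EXTENSIONS = [
    ("fp", FL_FP),
    ("simd", FL_FP | FL_SIMD),
    ("crypto", FL_FP | FL_SIMD | FL_CRYPTO),
]

print(repr(arch_directive("armv8-a", FL_FP | FL_SIMD, EXTENSIONS)))
```

An extension is only printed when every bit it requires is present, so with the table above "crypto" is omitted unless all three bits are set.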

Re: [Patch, Fortran] PR56845 - Fix setting of vptr of CLASS(...),SAVE,ALLOCATABLE

2013-04-10 Thread Tobias Burnus


* PING *

Tobias Burnus:
An unallocated polymorphic variable has the declared type; however, 
for static (SAVE) variables, the current code didn't set the value.


(That the end of scope deallocation/_gfortran_caf_deregister is gone 
for coarrays (declared in the main program) was a side effect. The 
sync/deregistering will still happen via the _gfortran_caf_finalize 
call. But that's fine and in the line of the Fortran standard; in 
fact, due to the FINAL handling, the automatic deallocation of the 
main program will be also removed for nonpolymorphic allocatables.)


Built and regtested on x86-64-gnu-linux.
OK for the trunk?

Tobias

Re: [Patch, Fortran, OOP] PR 56261: seg fault call procedure pointer on polymorphic array

2013-04-10 Thread Janus Weil

2013/4/7 Tobias Burnus :
>>> Thus, the only place where the check can be is for:
>>>
>>>f => ff
>>>
>>> In your example, the explicit interface of "ff" is known thus it should
>>> be
>>> testable at resolution time of the proc-pointer assignment.
>>
>> Right. However, strictly speaking, the pointer assignment as such is
>> probably valid. But of course there is not much one can do with the
>> proc-ptr afterwards, if it's invalid to call it ...
>
>  Well, if one doesn't want to error on it, one can still warn. However, the
> following states that it is invalid:
>
> "If the characteristics of the pointer object or the pointer target are such
> that an explicit interface is required, both the pointer object and the
> pointer target shall have an explicit interface." (F2008, para 4 of "7.2.2.4
> Procedure pointer assignment")

Ok, here is an updated patch, which does the discussed checking for
procedure pointer assignments. For this I have introduced a new
function 'gfc_explicit_interface_required', which checks all the items
in F08:12.4.2.2 and is loosely based on the present checks in
'resolve_global_procedure' (which are replaced by the new function).

I hope the general idea of the patch is ok and the error messages are
sufficiently comprehensible.

One leftover problem: The patch currently fails on the auto_char_len_4
test case, which is not being rejected any more. Actually I'm not
fully convinced that the dg-errors there are correct: If the EXTERNAL
statements in auto_char_len_{1,2} do not trigger an "explicit
interface required" warning, I don't see why the ones in
auto_char_len_4 should. Maybe the errors should not be about requiring
an explicit interface, but rather about a mismatch in the interface
(constant vs variable char len)?

Cheers,
Janus


pr56261_v4.diff
Description: Binary data


proc_ptr_40.f90
Description: Binary data

[Patch, Fortran] PR39505 - add support for !GCC$ attributes NO_ARG_CHECK

2013-04-10 Thread Tobias Burnus

Many compilers have some pragma or directive to disable the type, kind 
and rank (TKR) checks. That feature matches C's "void*" pointer and can 
be used in conjunction with passing some byte data to a procedure, which 
only needs to know either the pointer address or pointer address and size.


I think the most useful application is MPI implementations. Currently, 
they do not offer explicit interfaces for their procedures which take a 
"void *buffer" argument. For MPI 3.0, many compilers have started to use 
compiler directives which disable TKR checks - and gfortran is 
left out.


The Fortran standard does not provide such a feature - and it likely 
won't have one in the next standard, either. The Technical Specification 
ISO/IEC TS 29113:2012 provides TYPE(*), which disables the TK part of 
TKR. That's fine if one has either scalars or arrays (including array 
elements) - then one can use "type(*) :: buf" and "type(*),dimension(*) 
:: buf". But that doesn't allow for scalars *and* arrays [1]. The next 
Fortran standard might allow for scalars passed to type(*),dimension(*) 
in Bind(C) procedures - but seemingly not for non-Bind(C) procedures nor 
is a draft in sight [2].


(There is a possibility to pass both scalars and arrays to a dummy 
argument, namely "type(*), dimension(..)", but that does not pass the 
address directly; it passes an array descriptor.)


Other compilers have:

  !DEC$ ATTRIBUTES NO_ARG_CHECK :: buf
  !$PRAGMA IGNORE_TKR buf
  !DIR$ IGNORE_TKR buf
  !IBM* IGNORE_TKR buf

With the attached patch, gfortran does likewise. I essentially use the 
same mechanism as TYPE(*) with the code - after resolving the symbol, I 
even set ts.type = BT_ASSUMED. Contrary to some other compilers, which 
only allow the attribute for interfaces, this patch also allows it for 
Fortran procedures. But due to the TYPE(*) constraints, one can only use 
it with C_LOC or pass it on to another NO_ARG_CHECK dummy.


By the way, the recommended data type with this feature is TYPE(*). In 
order to increase compatibility with other codes, it also accepts 
intrinsic numeric types (and logical) of any kind.


Built and regtested on x86-64-gnu-linux.
OK for the trunk?

Tobias

[1] Generic interfaces are not really a solution as one needs one per 
rank, i.e. scalar+15 ranks = 16 specific functions; with two such 
arguments, up to 16*16 = 256 combinations. As other compilers support 
directives and as, e.g., MPI has many interfaces, MPI vendors won't go 
that route. However, I assume that they will start using gfortran's 
dimension(..) at some point, in line with MPI 3. Either the 4.8+ one 
with gfortran's current descriptor or the one from Fortran-Dev.


[2] Even if a first draft were available, one would have to wait until at 
least the first J3/WG5 vote to be _reasonably_ sure that the proposal is in 
and won't be modified.

2013-04-10  Tobias Burnus  

	PR fortran/39505
	* decl.c (ext_attr_list): Add EXT_ATTR_NO_ARG_CHECK.
	* gfortran.h (ext_attr_id_t): Ditto.
	* gfortran.texi (GNU Fortran Compiler Directives):
	Document it.
	* interface.c (compare_type_rank): Ignore rank for NO_ARG_CHECK.
	(compare_parameter): Ditto - and regard as unlimited polymorphic.
	* resolve.c (resolve_symbol, resolve_variable): Add same constraint
	checks as for TYPE(*); turn dummy to TYPE(*),dimension(*).
	(resolve_global_procedure): Require explicit interface
	for NO_ARG_CHECK.

2013-04-10  Tobias Burnus  

	PR fortran/39505
	* gfortran.dg/no_arg_check_1.f90: New.
	* gfortran.dg/no_arg_check_2.f90: New.
	* gfortran.dg/no_arg_check_3.f90: New.

diff --git a/gcc/fortran/decl.c b/gcc/fortran/decl.c
index 3188eae..afae899 100644
--- a/gcc/fortran/decl.c
+++ b/gcc/fortran/decl.c
@@ -8628,12 +8628,13 @@ gfc_match_final_decl (void)
 
 
 const ext_attr_t ext_attr_list[] = {
-  { "dllimport", EXT_ATTR_DLLIMPORT, "dllimport" },
-  { "dllexport", EXT_ATTR_DLLEXPORT, "dllexport" },
-  { "cdecl", EXT_ATTR_CDECL, "cdecl" },
-  { "stdcall",   EXT_ATTR_STDCALL,   "stdcall"   },
-  { "fastcall",  EXT_ATTR_FASTCALL,  "fastcall"  },
-  { NULL,EXT_ATTR_LAST,  NULL}
+  { "dllimport",EXT_ATTR_DLLIMPORT,"dllimport" },
+  { "dllexport",EXT_ATTR_DLLEXPORT,"dllexport" },
+  { "cdecl",EXT_ATTR_CDECL,"cdecl" },
+  { "stdcall",  EXT_ATTR_STDCALL,  "stdcall"   },
+  { "fastcall", EXT_ATTR_FASTCALL, "fastcall"  },
+  { "no_arg_check", EXT_ATTR_NO_ARG_CHECK, NULL},
+  { NULL,   EXT_ATTR_LAST, NULL}
 };
 
 /* Match a !GCC$ ATTRIBUTES statement of the form:
diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index 4ebe987..ab15cc1 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -687,6 +687,7 @@ typedef enum
   EXT_ATTR_STDCALL,
   EXT_ATTR_CDECL,
   EXT_ATTR_FASTCALL,
+  EXT_ATTR_NO_ARG_CHECK,
   EXT_ATTR_LAST, EXT_ATTR_NUM = EXT_ATTR_LAST
 }
 ext_attr_id_t;
diff --git a/gcc/fortran/gfortran.texi b/gcc/fortran/gfortran.texi
index 61cb3b

[PATCH][RFC] Handle commutative operations in SLP tree build

2013-04-10 Thread Richard Biener


This handles commutative operations during SLP tree build in such a
way that if one configuration does not match, the build will
try again with commutated operands.  This allows removing
the special-casing of commutated loads in a complex addition
that was in the end handled as "permutation".  It of course
also applies more generally.  Permutation is currently limited
to 3 unsuccessful permutes to avoid running into the inherently
exponential complexity of tree matching.
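
As a toy model of the retry idea (not the code in tree-vect-slp.c), the Python 3 sketch below compares two expression trees and, when the direct comparison fails at a commutative node, tries once more with the operands swapped. The tree encoding, the `COMMUTATIVE` set and the budget handling are all assumptions made for illustration; in particular the budget here counts attempted swaps, which is a simplification of "3 unsuccessful permutes".

```python
COMMUTATIVE = {"+", "*"}


def same_shape(a, b, budget):
    """Return True if trees a and b match, allowing operand swaps.

    A tree is either a leaf (a string naming the kind of operand) or a
    tuple (op, left, right).  budget is a one-element list holding the
    number of swap retries still allowed; it is shared across the recursion.
    """
    if isinstance(a, str) or isinstance(b, str):
        return a == b
    op_a, left_a, right_a = a
    op_b, left_b, right_b = b
    if op_a != op_b:
        return False
    if same_shape(left_a, left_b, budget) and same_shape(right_a, right_b, budget):
        return True
    if op_a in COMMUTATIVE and budget[0] > 0:
        budget[0] -= 1  # charge one retry for the swapped attempt
        return same_shape(left_a, right_b, budget) and same_shape(right_a, left_b, budget)
    return False


a = ("+", ("*", "load", "const"), "load")
b = ("+", "load", ("*", "const", "load"))
print(same_shape(a, b, [3]))  # enough budget for both swaps
print(same_shape(a, b, [1]))  # budget runs out before the inner swap
```

Capping the number of retries keeps the search from exploring every combination of swaps in a deep tree.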

The gcc.dg/vect/vect-complex-?.c testcases provide some testing
coverage (previously handled by the special-casing).  I have
seen failed SLP in the wild previously but it's usually on
larger testcases and dependent on operand order of commutative
operands.

I've discussed ideas to restrict the cases where we try a permutation
with Matz, but I'll rather defer that to an eventual followup.
(compute per SSA name a value dependent on the shape of its
use-def tree and use that as a quick check whether sub-trees
can possibly match)

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Any comments?

Thanks,
Richard.

2013-04-10  Richard Biener  

* tree-vect-slp.c (vect_build_slp_tree_1): Split out from ...
(vect_build_slp_tree): ... here.
(vect_build_slp_tree_1): Compute which stmts of the SLP group
match.  Remove special-casing of mismatched complex loads.
(vect_build_slp_tree): Based on the result from vect_build_slp_tree_1
re-try the match with swapped commutative operands.
(vect_supported_load_permutation_p): Remove special-casing of
mismatched complex loads.
(vect_analyze_slp_instance): Adjust.

Index: trunk/gcc/tree-vect-slp.c
===
*** trunk.orig/gcc/tree-vect-slp.c  2013-04-10 13:36:12.0 +0200
--- trunk/gcc/tree-vect-slp.c   2013-04-10 15:39:13.865325388 +0200
*** vect_get_and_check_slp_defs (loop_vec_in
*** 376,400 
  }
  
  
! /* Recursively build an SLP tree starting from NODE.
!Fail (and return FALSE) if def-stmts are not isomorphic, require data
!permutation or are of unsupported types of operation.  Otherwise, return
!TRUE.  */
  
  static bool
! vect_build_slp_tree (loop_vec_info loop_vinfo, bb_vec_info bb_vinfo,
!  slp_tree *node, unsigned int group_size,
!  unsigned int *max_nunits,
!  vec *loads,
!  unsigned int vectorization_factor)
  {
unsigned int i;
-   vec stmts = SLP_TREE_SCALAR_STMTS (*node);
gimple stmt = stmts[0];
enum tree_code first_stmt_code = ERROR_MARK, rhs_code = ERROR_MARK;
enum tree_code first_cond_code = ERROR_MARK;
tree lhs;
!   bool stop_recursion = false, need_same_oprnds = false;
tree vectype, scalar_type, first_op1 = NULL_TREE;
optab optab;
int icode;
--- 376,400 
  }
  
  
! /* Verify if the scalar stmts STMTS are isomorphic, require data
!permutation or are of unsupported types of operation.  Return
!true if they are, otherwise return false and indicate in *MATCHES
!which stmts are not isomorphic to the first one.  If MATCHES[0]
!is false then this indicates the comparison could not be
!carried out or the stmts will never be vectorized by SLP.  */
  
  static bool
! vect_build_slp_tree_1 (loop_vec_info loop_vinfo, bb_vec_info bb_vinfo,
!  vec stmts, unsigned int group_size,
!  unsigned nops, unsigned int *max_nunits,
!  unsigned int vectorization_factor, bool *matches)
  {
unsigned int i;
gimple stmt = stmts[0];
enum tree_code first_stmt_code = ERROR_MARK, rhs_code = ERROR_MARK;
enum tree_code first_cond_code = ERROR_MARK;
tree lhs;
!   bool need_same_oprnds = false;
tree vectype, scalar_type, first_op1 = NULL_TREE;
optab optab;
int icode;
*** vect_build_slp_tree (loop_vec_info loop_
*** 403,429 
struct data_reference *first_dr;
HOST_WIDE_INT dummy;
gimple first_load = NULL, prev_first_load = NULL, old_first_load = NULL;
-   vec oprnds_info;
-   unsigned int nops;
-   slp_oprnd_info oprnd_info;
tree cond;
  
-   if (is_gimple_call (stmt))
- nops = gimple_call_num_args (stmt);
-   else if (is_gimple_assign (stmt))
- {
-   nops = gimple_num_ops (stmt) - 1;
-   if (gimple_assign_rhs_code (stmt) == COND_EXPR)
-   nops++;
- }
-   else
- return false;
- 
-   oprnds_info = vect_create_oprnd_info (nops, group_size);
- 
/* For every stmt in NODE find its def stmt/s.  */
FOR_EACH_VEC_ELT (stmts, i, stmt)
  {
if (dump_enabled_p ())
{
  dump_printf_loc (MSG_NOTE, vect_location, "Build SLP for ");
--- 403,415 
struct data_reference *first_dr;
HOST_WIDE_INT dummy;
gimple first_load = NULL, prev_first_load = NULL, old_first_load = NULL;
tree cond;
  
/* For every stmt in NODE find its def stmt/

[Patch] Add -gdwarf option to make gcc generate DWARF with the default version

2013-04-10 Thread Senthil Kumar Selvaraj

Hi,

This patch adds a -gdwarf option to make gcc generate DWARF with the
default version (currently 4).

A bunch of tests in the dejagnu dwarf test suite fail right now if gcc
is configured/built with support for multiple debugging formats and
DWARF is not the default. The failing tests don't explicitly specify
-gdwarf- but check for DWARF data. 

The tests need a way to ask gcc to generate the latest DWARF version supported.
Instead of hardcoding a specific version, adding this option allows the
tests to run against the latest supported DWARF version. Version specific 
DWARF tests can of course use the -gdwarf- option.  There was a 
discussion on this in the mailing list previously (see
http://gcc.gnu.org/ml/gcc/2013-04/msg7.html and
http://gcc.gnu.org/ml/gcc/2013-03/msg00271.html) and Jason was ok with a
modified version of the patch.

The patch merely forwards control to  gdwarf- option handling, setting
value to the Init value set in common.opt for gdwarf-. In addition, it
raises an error if the user specifies a version with the option.
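
The control flow described above can be modelled roughly as follows in Python 3. This is only a sketch of the behaviour, with hypothetical function names; the default of 4 and the error texts are taken from the patch, everything else is an assumption.

```python
DEFAULT_DWARF_VERSION = 4  # the Init value of gdwarf- mentioned above


def handle_gdwarf_version(value):
    """Model of the existing -gdwarf-N handling: accept versions 2 to 4."""
    if value < 2 or value > 4:
        raise ValueError("dwarf version %d is not supported" % value)
    return value


def handle_gdwarf(arg=None):
    """Model of -gdwarf: reject a joined argument, else use the default."""
    if arg:
        raise ValueError(
            "-gdwarf%s is ambiguous; use -gdwarf-%s for DWARF version "
            "or -gdwarf -g%s for debug level" % (arg, arg, arg))
    # Forward to the versioned handler with the default version.
    return handle_gdwarf_version(DEFAULT_DWARF_VERSION)


print(handle_gdwarf())
```

Rejecting the joined argument avoids guessing whether "-gdwarf2" was meant as a DWARF version or as a debug level.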

If ok, could someone apply it please (I don't have commit access)?
Once committed, I'll send another patch for the failing tests with the -gdwarf
option added to dg-options.

Regards
Senthil

gcc/ChangeLog

2013-04-10  Senthil Kumar Selvaraj  
Jason Merrill  

diff --git gcc/common.opt gcc/common.opt
index e02e7ed..e3645c3 100644
--- gcc/common.opt
+++ gcc/common.opt
@@ -2308,9 +2308,13 @@ Common JoinedOrMissing
 Generate debug information in default format
 
 gcoff
-Common JoinedOrMissing Negative(gdwarf-)
+Common JoinedOrMissing Negative(gdwarf)
 Generate debug information in COFF format
 
+gdwarf
+Common JoinedOrMissing UInteger Negative(gdwarf-)
+Generate debug information in the default DWARF version format
+
 gdwarf-
 Common Joined UInteger Var(dwarf_version) Init(4) Negative(gstabs)
 Generate debug information in DWARF v2 (or later) format
diff --git gcc/opts.c gcc/opts.c
index 45b12fe..f96ed16 100644
--- gcc/opts.c
+++ gcc/opts.c
@@ -1699,6 +1699,18 @@ common_handle_option (struct gcc_options *opts,
   set_debug_level (SDB_DEBUG, false, arg, opts, opts_set, loc);
   break;
 
+case OPT_gdwarf:
+  if (arg && strlen(arg) != 0)
+{
+  error_at (loc, "%<-gdwarf%s%> is ambiguous; "
+"use %<-gdwarf-%s%> for DWARF version "
+"or %<-gdwarf -g%s%> for debug level", arg, arg, arg);
+  break;
+}
+  else
+{
+  value = opts->x_dwarf_version;
+}
 case OPT_gdwarf_:
   if (value < 2 || value > 4)
error_at (loc, "dwarf version %d is not supported", value);

Re: [PATCH, generic] Support printing of escaped curly braces and vertical bar in assembler output

2013-04-10 Thread Maksim Kuznetsov

Ping

2013/4/3 Maksim Kuznetsov :
> Thank you for your feedback!
>
>> For '}' case, can you simply just add
>>
>> /* Skip over any character after a percent sign.  */
>> if (*p == '%' && *(p + 1))
>> {
>>   p += 2;
>>   continue;
>> }
>>
>> without changing the do-while loop to the while loop?
>
> Loop condition (*p++ != '}') must be moved to loop body for it to not
> execute after "continue" (we just want to skip over % with following
> character without any other increments or checks). Although, loop form
> doesn't matter so I changed it back to do-while.
>
>> That's not the same thing though.  Maksim's code is correct,
>> although it could certainly be written more clearly.
>> Maybe something like
>>
>>   if (*p == '%')
>> p++;
>>   if (*p)
>> p++;
>
> Fixed.
>
>
> I also noticed that previous patch broke intel (or any alternative)
> syntax. This was because the original loop:
>
> while (*p && *p != '}' && *p++ != '|');
>
> incremented p after '|' is found, but loop in my patch didn't:
>
> while (*p && *p != '}' && *p != '|')
>   p += (*p == '%' && *(p + 1)) ? 2 : 1;
>
> I fixed it too.
>
> Updated patch is attached. Could you please have a look?
>
> ChangeLog:
>
> 2013-04-03  Maxim Kuznetsov  
>
> * final.c (do_assembler_dialects): Don't handle curly braces
> escaped by % as dialect delimiters.
> * config/i386/i386.c (ix86_print_operand_punct_valid_p): Add '{' and
> '}'.
> (ix86_print_operand): Handle '{' and '}'.
>
> testsuite/ChangeLog:
>
> 2013-04-03  Maxim Kuznetsov  
>
> * gcc.target/i386/asm-dialect-2.c: New testcase.
>
> --
> Maxim Kuznetsov



-- 
Maxim Kuznetsov
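
To make the scanning rule from the quoted discussion concrete, here is a small Python 3 model: advance to the end of the current dialect alternative, consume a '|' delimiter when found, stop at '}' without consuming it, and treat '%' plus the following character as an escaped pair. The function name and string-index interface are hypothetical; this is not the code in final.c.

```python
def skip_alternative(template, i):
    """Return the index where scanning of the current alternative ends.

    Stops at an unescaped '}' (not consumed) or just past an unescaped '|'
    (consumed).  A '%' followed by any character is skipped as a pair.
    """
    while i < len(template) and template[i] != "}":
        if template[i] == "%" and i + 1 < len(template):
            i += 2  # skip the escape pair, e.g. "%}" or "%|"
            continue
        i += 1
        if template[i - 1] == "|":
            break  # the delimiter itself has been consumed
    return i


s = "a%}b|c}"
first = skip_alternative(s, 0)
print(first, s[first:])
print(skip_alternative(s, first))
```

The escaped "%}" inside the first alternative is stepped over, so the scan ends after the real '|' rather than at the escaped brace.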

[C++ Patch] PR 54216

2013-04-10 Thread Paolo Carlini


Hi,

this issue is about some enumeration types which are strictly speaking 
illegal and we are accepting nonetheless:


enum {}; //-std=c++98 or -std=c++11

enum class {}; //-std=c++11

enum class { x }; //-std=c++11

I suppose we want to be less strict about the former thus I'm using a 
pedwarn instead of an error. Not sure about the best wording of the 
warning/error messages.


Anyway, luckily only one existing testcase needs adjusting, I thought we 
had many, in the library too (in fact now I seem to remember I fixed 
some anonymous enum uses)


Tested x86_64-linux.

Thanks,
Paolo.

///



/cp
2013-04-10  Paolo Carlini  

PR c++/54216
* cp/parser.c (cp_parser_enum_specifier): Check for empty
anonymous enums and anonymous scoped enums.

/testsuite
2013-04-10  Paolo Carlini  

PR c++/54216
* g++.dg/cpp0x/enum26.C: New.
* g++.old-deja/g++.pt/mangle1.C: Adjust.
Index: cp/parser.c
===
--- cp/parser.c (revision 197665)
+++ cp/parser.c (working copy)
@@ -14750,6 +14750,9 @@ cp_parser_enum_specifier (cp_parser* parser)
{
  identifier = make_anon_name ();
  is_anonymous = true;
+ if (scoped_enum_p)
+   error_at (type_start_token->location,
+ "anonymous scoped enum is not allowed");
}
 }
   pop_deferring_access_checks ();
@@ -14897,7 +14900,13 @@ cp_parser_enum_specifier (cp_parser* parser)
   if (type == error_mark_node)
cp_parser_skip_to_end_of_block_or_statement (parser);
   /* If the next token is not '}', then there are some enumerators.  */
-  else if (cp_lexer_next_token_is_not (parser->lexer, CPP_CLOSE_BRACE))
+  else if (cp_lexer_next_token_is (parser->lexer, CPP_CLOSE_BRACE))
+   {
+ if (is_anonymous && !scoped_enum_p)
+   pedwarn (type_start_token->location, OPT_Wpedantic,
+"ISO C++ forbids empty anonymous enum");
+   }
+  else
cp_parser_enumerator_list (parser, type);
 
   /* Consume the final '}'.  */
Index: testsuite/g++.dg/cpp0x/enum26.C
===
--- testsuite/g++.dg/cpp0x/enum26.C (revision 0)
+++ testsuite/g++.dg/cpp0x/enum26.C (working copy)
@@ -0,0 +1,8 @@
+// PR c++/54216
+// { dg-options "-std=c++11 -pedantic" }
+
+enum {};// { dg-message "empty anonymous" }
+
+enum class {};  // { dg-error "anonymous" }
+
+enum class { x };   // { dg-error "anonymous" }
Index: testsuite/g++.old-deja/g++.pt/mangle1.C
===
--- testsuite/g++.old-deja/g++.pt/mangle1.C (revision 197665)
+++ testsuite/g++.old-deja/g++.pt/mangle1.C (working copy)
@@ -1,4 +1,5 @@
 // { dg-do assemble  }
+// { dg-options "" }
 // Origin: Mark Mitchell 
 
 typedef enum {} i;

Re: [PATCH][RFC] Remove TODO_ggc_collect, collect unconditionally

2013-04-10 Thread Richard Biener

On Thu, 21 Mar 2013, Richard Biener wrote:

> On Tue, 19 Mar 2013, Richard Biener wrote:
> 
> > On Tue, 19 Mar 2013, Richard Biener wrote:
> > 
> > > On Tue, 19 Mar 2013, Richard Biener wrote:
> > > 
> > > > 
> > > > > This adds a GC collection point after each pass instead of just after
> > > > > those with TODO_ggc_collect in their todo.  The patch will possibly
> > > > > slow down gcac checking a bit (80 passes have TODO_ggc_collect;
> > > > > I didn't try to enumerate those that do not, but a grep shows we
> > > > > may have up to 212 passes).  OTOH gcac checking will now "properly"
> > > > > verify that all pass boundaries are suitable for collection.
> > > > 
> > > > A complete patch will remove TODO_ggc_collect and all its uses
> > > > as well.
> > > > 
> > > > The patch should result in lower peak memory consumption for
> > > > some of the odd testcases that we worked on.
> > > > 
> > > > Bootstrap & regtest scheduled on x86_64-unknown-linux-gnu.
> > > 
> > > Which shows that I need to merge the IRA and reload/lra passes.
> > > > Honza tells me that they are considered "separate" for historical
> > > reasons only.  Given that reload pushes TV_IRA and that the boundary
> > > isn't GC safe I don't think that is too bad (dump files will now
> > > be shared, of course).
> > > 
> > > I'll schedule a gcac checking bootstrap over night as well.
> > 
> > The following is it, changelog omits the boring part (enumerating
> > all files and pass structs touched ...).
> > 
> > Regularly bootstrapped and tested on x86_64-unknown-linux-gnu,
> > the gcac one is still running (as expected ...).
> > 
> > Any objections?
> 
> One testcase, g++.dg/pr55604.C, needs adjustment (it uses
> -fdump-rtl-reload which is gone after the patch).
> 
> I let a gcac bootstrap run into stage3 (and then killed it after
> 3 days ...).  I also built a gcac compiler without bootstrapping
> and ran the testsuite.  Unfortunately that results in a lot of
> timeouts and weird errors.  I have reported PR56673 for a GC issue
> unrelated to this patch (multi-versioning is broken).  Weird errors
> include -frepo linking errors.
> 
> Any hints on how to globally disable the dejagnu timeout so I can
> eventually get a clean gcac testsuite run without the patch to
> compare against (I doubt -frepo errors can be related to this patch)?

I went ahead with this now, given no comments at all and this
both bit-rotting and blocking the IL verifier TODO cleanups
in my pipeline.

Re-bootstrapped and tested on x86_64-unknown-linux-gnu and
committed.
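
For readers skimming the thread, the behavioural change can be modelled roughly as follows. This is only a sketch under stated assumptions: `pass_sketch`, `TODO_COLLECT_SKETCH` and both functions are hypothetical names invented for illustration, not the code in passes.c, and a "collection point" here just means a place where the collector is allowed to run.

```c
#include <stddef.h>

/* Hypothetical stand-in for the removed TODO_ggc_collect flag.  */
#define TODO_COLLECT_SKETCH 1u

/* Hypothetical, minimal pass descriptor.  */
struct pass_sketch
{
  const char *name;
  unsigned todo_flags_finish;
};

/* Old behaviour: a collection point only after passes that ask for one.
   Returns the number of collection points reached.  */
unsigned
collections_conditional (const struct pass_sketch *passes, size_t n)
{
  unsigned collections = 0;
  for (size_t i = 0; i < n; i++)
    if (passes[i].todo_flags_finish & TODO_COLLECT_SKETCH)
      collections++;
  return collections;
}

/* New behaviour: every pass boundary is a collection point, whatever
   the pass put in its todo flags.  */
unsigned
collections_unconditional (const struct pass_sketch *passes, size_t n)
{
  unsigned collections = 0;
  for (size_t i = 0; i < n; i++)
    {
      (void) passes[i];
      collections++;
    }
  return collections;
}
```

With three passes of which one carries the flag, the first function reports one collection point and the second reports three, which is why every pass boundary now has to be GC safe.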

Richard.

> > 2013-03-19  Richard Biener  
> > 
> > * passes.c (execute_todo): Do not call ggc_collect conditional here.
> > (execute_one_ipa_transform_pass): But unconditionally here.
> > (execute_one_pass): And here.
> > (init_optimization_passes): Remove reload pass.
> > * tree-pass.h (TODO_ggc_collect): Remove.
> > (pass_reload): Likewise.
> > * ira.c (do_reload): Merge into ...
> > (ira): ... this.
> > (rest_of_handle_reload): Remove.
> > (pass_reload): Likewise.
> > * config/i386/i386.c (ix86_option_override): Refer to ira instead
> > of reload for vzeroupper pass placement.
> > * : Remove TODO_ggc_collect from todo_flags_start
> > and todo_flags_finish of all passes.

Re: [C++11][4.9] Add missing REDUC_PLUS_EXPR case to potential_constant_expression_1.

2013-04-10 Thread Richard Biener

On Wed, Apr 10, 2013 at 12:50 PM, James Greenhalgh
 wrote:
>
>> -Original Message-
>> From: dosr...@gmail.com [mailto:dosr...@gmail.com] On Behalf Of Gabriel
>> Dos Reis
>> Sent: 20 March 2013 19:09
>> To: James Greenhalgh
>> Cc: Jakub Jelinek; Richard Biener; gcc-patches@gcc.gnu.org; Jason
>> Merrill; m...@codesourcery.com
>> Subject: Re: [C++11][4.9] Add missing REDUC_PLUS_EXPR case to
>> potential_constant_expression_1.
>>
>> On Wed, Mar 20, 2013 at 1:03 PM, James Greenhalgh
>>  wrote:
>>
>> > Is that sensible? It certainly seems like someone intended to
>> > explicitly enumerate all the possible cases and ensure that they were
>> > correctly handled.
>>
>> That someone would be me.
>>
>> We need to catch loudly any front-end tree code, e.g. ASTs, object
>> we may have missed, as opposed to silently ignoring them with
>> possible miscompilation and pray that someone might be sufficiently
>> pissed off and report it as a bug.
>>
>> What is wrong isn't that the front-end inserts internal coverage
>> check; rather it is the fact that we don't have enough separation
>> between front-end asts and middle-end stuff.
>>
>> The convenience of adding a middle-end optimization (which this
>> essentially is) should not trump correctness of the implementation
>> of standard semantics.
>
> So, as far as I can see no decision came out of this thread as to
> what should be done. In that time I had to add another few tree cases
> as I added more things to TARGET_FOLD_BUILTIN.
>
> I'd like to start pushing some of these TARGET_FOLD_BUILTIN patches
> upstream, but they currently all hinge on resolving this discussion.

I still think getting rid of TARGET_FOLD_BUILTIN and replacing it with
TARGET_FOLD_STMT that only operates on GIMPLE is the way to go.

One of the issues we hit is that it's not well-defined what tree codes are
supposed to be part of GENERIC and which only part of GIMPLE
(in case we want to support GENERIC tree codes not being a superset
of GIMPLE tree codes at all).  If they are part of GENERIC then the
C++ frontend needs to handle them as folding can introduce all
GENERIC tree codes.

Richard.

> Would it be OK for this patch to go in? I know the thread started
> well for me with:
>
>> -Original Message-
>> From: Jason Merrill [mailto:ja...@redhat.com]
>> Sent: 14 March 2013 18:52
>> To: James Greenhalgh; gcc-patches@gcc.gnu.org
>> Cc: m...@codesourcery.com
>> Subject: Re: [C++11][4.9] Add missing REDUC_PLUS_EXPR case to
>> potential_constant_expression_1.
>>
>> On 03/14/2013 09:48 AM, James Greenhalgh wrote:
>> > Is this OK to commit to 4.9 when stage 1 opens up?
>>
>> Yes, but please add the other new tree codes as well.
>>
>> Jason
>
> But quickly moved on to discussion, so I didn't commit the patch.
>
> Thanks,
> James Greenhalgh
> Graduate Engineer
> ARM
>
> ---
> gcc/
>
> 2013-04-09  James Greenhalgh  
>
> * cp/semantics.c
> (potential_constant_expression_1): Add cases for REDUC_PLUS_EXPR,
> REDUC_MIN_EXPR, REDUC_MAX_EXPR.

[gomp4] Some libgomp changes

2013-04-10 Thread Jakub Jelinek

Hi!

I've committed the following set of changes to gomp-4_0-branch after
regtesting.  This adds (so far dummy) exports for the new OpenMP 4.0
library functions, and planned entry points for #pragma omp cancel*,
plus, as discussed with Richard privately also new GOMP_parallel* entry
points, because we need to change them anyway to pass in some flags
(right now just proc_bind clause values).  The API of those also changes,
before we used to emit
GOMP_parallel_start (somefn, &data, num_threads);
somefn (&data);
GOMP_parallel_end ();
but the new API will be just
GOMP_parallel (somefn, &data, num_threads, flags);
and the function will take care of calling somefn also in the initial
thread, not just in the other threads.  The advantage of that is that
we can eventually make a transparent unwind info for that with some
DWARF unwind info proglet, so that e.g. backtraces could be nicer.
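
A rough model of the difference between the two calling conventions, for illustration only. The `mock_*` functions below are hypothetical stand-ins written for this sketch, not libgomp code: they run the "threads" sequentially and only show who is responsible for calling the outlined function in the initial thread.

```c
/* Old scheme, modelled: the runtime starts only the other threads
   (simulated here by plain sequential calls).  */
static void
mock_parallel_start (void (*fn) (void *), void *data, unsigned num_threads)
{
  for (unsigned i = 1; i < num_threads; i++)
    fn (data);
}

static void
mock_parallel_end (void)
{
}

/* New scheme, modelled: one entry point also runs FN for the initial
   thread.  FLAGS is accepted but ignored in this sketch.  */
static void
mock_parallel (void (*fn) (void *), void *data, unsigned num_threads,
               unsigned flags)
{
  (void) flags;
  for (unsigned i = 0; i < num_threads; i++)
    fn (data);
}

/* Outlined region body: count how many times it ran.  */
static void
body (void *data)
{
  ++*(int *) data;
}

/* What the compiler used to emit: start, call the body itself, end.  */
int
run_old_style (unsigned num_threads)
{
  int count = 0;
  mock_parallel_start (body, &count, num_threads);
  body (&count);
  mock_parallel_end ();
  return count;
}

/* What the new API needs: a single call.  */
int
run_new_style (unsigned num_threads)
{
  int count = 0;
  mock_parallel (body, &count, num_threads, 0u);
  return count;
}
```

In both variants the body runs once per thread; the difference is only that the new form keeps the initial thread's call inside the library, which is what would make the unwind info trick mentioned above possible.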

No compiler changes for now, those will come up later.

2013-04-10  Jakub Jelinek  

* libgomp.map (omp_get_cancellation, omp_get_cancellation_,
omp_get_proc_bind, omp_get_proc_bind_, omp_set_default_device,
omp_set_default_device_, omp_set_default_device_8_,
omp_get_default_device, omp_get_default_device_,
omp_get_num_devices, omp_get_num_devices_, omp_get_num_teams,
omp_get_num_teams_, omp_get_team_num, omp_get_team_num_): Export
@@OMP_4.0.
(GOMP_cancel, GOMP_cancellation_point, GOMP_parallel_loop_dynamic,
GOMP_parallel_loop_guided, GOMP_parallel_loop_runtime,
GOMP_parallel_loop_static, GOMP_parallel_sections, GOMP_parallel,
GOMP_taskgroup_start, GOMP_taskgroup_end): Export @@GOMP_4.0.
* parallel.c (GOMP_parallel_end): Add ialias.
(GOMP_parallel, GOMP_cancel, GOMP_cancellation_point): New
functions.
* omp.h.in (omp_proc_bind_t): New typedef.
(omp_get_cancellation, omp_get_proc_bind, omp_set_default_device,
omp_get_default_device, omp_get_num_devices, omp_get_num_teams,
omp_get_team_num): New prototypes.
* env.c (omp_get_cancellation, omp_get_proc_bind,
omp_set_default_device, omp_get_default_device, omp_get_num_devices,
omp_get_num_teams, omp_get_team_num): New functions.
* fortran.c (ULP, STR1, STR2, ialias_redirect): Removed.
(omp_get_cancellation_, omp_get_proc_bind_, omp_set_default_device_,
omp_set_default_device_8_, omp_get_default_device_,
omp_get_num_devices_, omp_get_num_teams_, omp_get_team_num_): New
functions.
* libgomp.h (ialias_ulp, ialias_str1, ialias_str2, ialias_redirect,
ialias_call): Define.
* libgomp_g.h (GOMP_parallel_loop_static, GOMP_parallel_loop_dynamic,
GOMP_parallel_loop_guided, GOMP_parallel_loop_runtime, GOMP_parallel,
GOMP_cancel, GOMP_cancellation_point, GOMP_taskgroup_start,
GOMP_taskgroup_end, GOMP_parallel_sections): New prototypes.
* task.c (GOMP_taskgroup_start, GOMP_taskgroup_end): New functions.
* sections.c (GOMP_parallel_sections): New function.
* loop.c (GOMP_parallel_loop_static, GOMP_parallel_loop_dynamic,
GOMP_parallel_loop_guided, GOMP_parallel_loop_runtime): New
functions.
(GOMP_parallel_end): Add ialias_redirect.
* omp_lib.f90.in (omp_proc_bind_kind, omp_proc_bind_false,
omp_proc_bind_true, omp_proc_bind_master, omp_proc_bind_close,
omp_proc_bind_spread): New params.
(omp_get_cancellation, omp_get_proc_bind, omp_set_default_device,
omp_get_default_device, omp_get_num_devices, omp_get_num_teams,
omp_get_team_num): New interfaces.
* omp_lib.h.in (omp_proc_bind_kind, omp_proc_bind_false,
omp_proc_bind_true, omp_proc_bind_master, omp_proc_bind_close,
omp_proc_bind_spread): New params.
(omp_get_cancellation, omp_get_proc_bind, omp_set_default_device,
omp_get_default_device, omp_get_num_devices, omp_get_num_teams,
omp_get_team_num): New externals.

--- libgomp/libgomp.map.jj  2013-03-20 10:02:05.0 +0100
+++ libgomp/libgomp.map 2013-04-10 12:06:47.635559156 +0200
@@ -113,6 +113,25 @@ OMP_3.1 {
omp_in_final_;
 } OMP_3.0;
 
+OMP_4.0 {
+  global:
+   omp_get_cancellation;
+   omp_get_cancellation_;
+   omp_get_proc_bind;
+   omp_get_proc_bind_;
+   omp_set_default_device;
+   omp_set_default_device_;
+   omp_set_default_device_8_;
+   omp_get_default_device;
+   omp_get_default_device_;
+   omp_get_num_devices;
+   omp_get_num_devices_;
+   omp_get_num_teams;
+   omp_get_num_teams_;
+   omp_get_team_num;
+   omp_get_team_num_;
+} OMP_3.1;
+
 GOMP_1.0 {
   global:
GOMP_atomic_end;
@@ -184,3 +203,17 @@ GOMP_3.0 {
   global:
GOMP_taskyield;
 } GOMP_2.0;
+
+GOMP_4.0 {
+  global:
+   GOMP_cancel;
+   GOMP_cancellation_point;
+   GOMP_parallel_loop_dynamic;
+   GOMP_parallel_loop_guided;
+

RE: [C++11][4.9] Add missing REDUC_PLUS_EXPR case to potential_constant_expression_1.

2013-04-10 Thread James Greenhalgh


> -Original Message-
> From: dosr...@gmail.com [mailto:dosr...@gmail.com] On Behalf Of Gabriel
> Dos Reis
> Sent: 20 March 2013 19:09
> To: James Greenhalgh
> Cc: Jakub Jelinek; Richard Biener; gcc-patches@gcc.gnu.org; Jason
> Merrill; m...@codesourcery.com
> Subject: Re: [C++11][4.9] Add missing REDUC_PLUS_EXPR case to
> potential_constant_expression_1.
>
> On Wed, Mar 20, 2013 at 1:03 PM, James Greenhalgh
>  wrote:
>
> > Is that sensible? It certainly seems like someone intended to
> > explicitly enumerate all the possible cases and ensure that they were
> > correctly handled.
>
> That someone would be me.
>
> We need to catch loudly any front-end tree code, e.g. ASTs, object
> we may have missed, as opposed to silently ignoring them with
> possible miscompilation and pray that someone might be sufficiently
> pissed off and report it as a bug.
>
> What is wrong isn't that the front-end inserts internal coverage
> check; rather it is the fact that we don't have enough separation
> between front-end asts and middle-end stuff.
>
> The convenience of adding a middle-end optimization (which this
> essentially is) should not trump correctness of the implementation
> of standard semantics.

So, as far as I can see no decision came out of this thread as to
what should be done. In that time I had to add another few tree cases
as I added more things to TARGET_FOLD_BUILTIN.

I'd like to start pushing some of these TARGET_FOLD_BUILTIN patches
upstream, but they currently all hinge on resolving this discussion.
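
For context, the kind of coverage check being debated can be pictured as below. This is a sketch under assumptions: the enum, its members and `classify_code` are hypothetical names made up for illustration, and returning -1 after a diagnostic merely stands in for whatever loud failure the real function uses.

```c
#include <stdio.h>

/* Hypothetical codes, loosely named after those in the ChangeLog.  */
enum code_sketch
{
  SK_ABS_EXPR,
  SK_REDUC_PLUS_EXPR,
  SK_REDUC_MIN_EXPR,
  SK_REDUC_MAX_EXPR,
  SK_SOME_NEW_CODE
};

/* Returns 1 for the enumerated unary codes, whose single operand would
   be examined next.  Any code nobody listed is reported on stderr and
   yields -1 instead of being silently accepted.  */
int
classify_code (enum code_sketch code)
{
  switch (code)
    {
    case SK_ABS_EXPR:
    case SK_REDUC_PLUS_EXPR:
    case SK_REDUC_MIN_EXPR:
    case SK_REDUC_MAX_EXPR:
      return 1;
    default:
      fprintf (stderr, "unexpected code %d\n", (int) code);
      return -1;
    }
}
```

The point of the explicit list is that a newly introduced code lands in the default branch and gets noticed, rather than flowing through unchecked.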

Would it be OK for this patch to go in? I know the thread started
well for me with:

> -Original Message-
> From: Jason Merrill [mailto:ja...@redhat.com]
> Sent: 14 March 2013 18:52
> To: James Greenhalgh; gcc-patches@gcc.gnu.org
> Cc: m...@codesourcery.com
> Subject: Re: [C++11][4.9] Add missing REDUC_PLUS_EXPR case to
> potential_constant_expression_1.
>
> On 03/14/2013 09:48 AM, James Greenhalgh wrote:
> > Is this OK to commit to 4.9 when stage 1 opens up?
>
> Yes, but please add the other new tree codes as well.
>
> Jason

But quickly moved on to discussion, so I didn't commit the patch.

Thanks,
James Greenhalgh
Graduate Engineer
ARM

---
gcc/

2013-04-09  James Greenhalgh  

* cp/semantics.c
(potential_constant_expression_1): Add cases for REDUC_PLUS_EXPR,
REDUC_MIN_EXPR, REDUC_MAX_EXPR.
diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index 72b884e..880f479 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -8619,6 +8619,9 @@ potential_constant_expression_1 (tree t, bool want_rval, tsubst_flags_t flags)
 case ABS_EXPR:
 case TRUTH_NOT_EXPR:
 case FIXED_CONVERT_EXPR:
+case REDUC_PLUS_EXPR:
+case REDUC_MIN_EXPR:
+case REDUC_MAX_EXPR:
 case UNARY_PLUS_EXPR:
   return potential_constant_expression_1 (TREE_OPERAND (t, 0), rval,
 	  flags);

Re: [PATCH, AARCH64] Fix unrecognizable insn issue

2013-04-10 Thread Marcus Shawcroft

Zhenqiang, does James's patch fix your test case?

/Marcus

On 10 April 2013 11:43, Richard Earnshaw  wrote:
> On 10/04/13 11:31, James Greenhalgh wrote:
>>
>>
>>> -Original Message-
>>> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
>>> ow...@gcc.gnu.org] On Behalf Of Zhenqiang Chen
>>> Sent: 10 April 2013 09:02
>>> To: gcc-patches@gcc.gnu.org
>>> Cc: Marcus Shawcroft
>>> Subject: [PATCH, AARCH64] Fix unrecognizable insn issue
>>>
>>> Hi,
>>>
>>> During expand, function aarch64_vcond_internal inverses some CMP, e.g.
>>>
>>>a LE b -> b GE a
>>>
>>> But if "b" is "CONST0_RTX", "b GE a" will be an illegal insn.
>>
>>
>> Yes it will. We should not be swapping the comparison in these cases.
>>
>>>
>>> Refer https://bugs.launchpad.net/linaro-toolchain-binaries/+bug/1163942
>>> for detail about the issue.
>>>
>>> The patch is to make "b" a register when inversing LE.
>>
>>
>> This patch is too restrictive. There is an `fcmle v0.2d #0` form which we
>> should be generating when we can. Also, you are only fixing one
>> problematic
>> case where there are a few.
>>
>> I don't have access to your reproducer, so I can't be certain this patch
>> is correct - I have created my own reproducer and added it in with
>> the other vect-fcm tests.
>>
>> Thorough regression tests are ongoing for this patch, but it
>> passes aarch64.exp and vect.exp with no regressions.
>>
>> Thanks,
>> James
>>
>> ---
>> gcc/
>>
>> 2013-04-10  James Greenhalgh  
>>
>> * config/aarch64/aarch64-simd.md (aarch64_vcond_internal): Fix
>> floating-point vector comparisons against 0.
>>
>> gcc/testsuite/
>>
>> 2013-04-10  James Greenhalgh  
>>
>> * gcc.target/aarch64/vect-fcm.x: Add check for zero forms of
>> inverse operands.
>> * gcc.target/aarch64/vect-fcm-eq-d.c: Check that new zero form
>> loop is vectorized.
>> * gcc.target/aarch64/vect-fcm-eq-f.c: Likewise.
>> * gcc.target/aarch64/vect-fcm-ge-d.c: Check that new zero form
>> loop is vectorized and that the correct instruction is generated.
>> * gcc.target/aarch64/vect-fcm-ge-f.c: Likewise.
>> * gcc.target/aarch64/vect-fcm-gt-d.c: Likewise.
>> * gcc.target/aarch64/vect-fcm-gt-f.c: Likewise.
>>
>>
>
> OK.
>
> R.
>
>

Re: [PATCH, AARCH64] Fix unrecognizable insn issue

2013-04-10 Thread Richard Earnshaw

On 10/04/13 11:31, James Greenhalgh wrote:

>> -Original Message-
>> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
>> ow...@gcc.gnu.org] On Behalf Of Zhenqiang Chen
>> Sent: 10 April 2013 09:02
>> To: gcc-patches@gcc.gnu.org
>> Cc: Marcus Shawcroft
>> Subject: [PATCH, AARCH64] Fix unrecognizable insn issue
>>
>> Hi,
>>
>> During expand, function aarch64_vcond_internal inverses some CMP, e.g.
>>
>>    a LE b -> b GE a
>>
>> But if "b" is "CONST0_RTX", "b GE a" will be an illegal insn.
>
> Yes it will. We should not be swapping the comparison in these cases.
>
>> Refer https://bugs.launchpad.net/linaro-toolchain-binaries/+bug/1163942
>> for detail about the issue.
>>
>> The patch is to make "b" a register when inversing LE.
>
> This patch is too restrictive. There is an `fcmle v0.2d #0` form which we
> should be generating when we can. Also, you are only fixing one problematic
> case where there are a few.
>
> I don't have access to your reproducer, so I can't be certain this patch
> is correct - I have created my own reproducer and added it in with
> the other vect-fcm tests.
>
> Thorough regression tests are ongoing for this patch, but it
> passes aarch64.exp and vect.exp with no regressions.
>
> Thanks,
> James
>
> ---
> gcc/
>
> 2013-04-10  James Greenhalgh  
>
> * config/aarch64/aarch64-simd.md (aarch64_vcond_internal): Fix
> floating-point vector comparisons against 0.
>
> gcc/testsuite/
>
> 2013-04-10  James Greenhalgh  
>
> * gcc.target/aarch64/vect-fcm.x: Add check for zero forms of
> inverse operands.
> * gcc.target/aarch64/vect-fcm-eq-d.c: Check that new zero form
> loop is vectorized.
> * gcc.target/aarch64/vect-fcm-eq-f.c: Likewise.
> * gcc.target/aarch64/vect-fcm-ge-d.c: Check that new zero form
> loop is vectorized and that the correct instruction is generated.
> * gcc.target/aarch64/vect-fcm-ge-f.c: Likewise.
> * gcc.target/aarch64/vect-fcm-gt-d.c: Likewise.
> * gcc.target/aarch64/vect-fcm-gt-f.c: Likewise.

OK.

R.

[PATCH, AArch64] Compare Negative instruction in shift and extend mode

2013-04-10 Thread Hurugalawadi, Naveen

Hi,

Please find attached the patch that implements compare negative
instruction with shift and extend mode for aarch64 target.
Testcases have been added for the compare and compare negative instructions.
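
As background, a compare negative tests a value against the negation of another by looking at their sum. The sketch below models only the zero (Z) flag, with hypothetical helper names invented for illustration; ordered comparisons depend on further flags and on the CC_SWP handling in the patch, which this sketch does not attempt to model.

```c
#include <stdint.h>

/* Z flag as a CMN-style instruction would set it: the sum wraps to 0.  */
int
cmn_sets_z (uint32_t lhs, uint32_t rhs)
{
  return (uint32_t) (lhs + rhs) == 0u;
}

/* The same test written as a compare against the negated operand.  */
int
cmp_neg_sets_z (uint32_t lhs, uint32_t rhs)
{
  return lhs == (uint32_t) (0u - rhs);
}

/* Shifted-operand form: the second operand is shifted left first.
   SHIFT must be below 32.  */
int
cmn_lsl_sets_z (uint32_t lhs, uint32_t rhs, unsigned shift)
{
  return (uint32_t) (lhs + (uint32_t) (rhs << shift)) == 0u;
}
```

The first two functions agree for every pair of inputs because both are evaluated modulo 2^32, which is the equivalence that lets a comparison against a negated register be emitted as a single cmn.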

Please review the same and let me know if there should be any 
modifications in the patch.
 
Built and tested on aarch64-thunder-elf (using Cavium's internal
simulator). No new regressions.

Thanks,
Naveen

gcc/

2013-04-10   Naveen H.S  

* config/aarch64/aarch64.c (aarch64_select_cc_mode): Use NEG
code in CC_SWP mode.
* config/aarch64/aarch64.md (*cmn_swp__reg): New
pattern.
(*cmn_swp__reg): New pattern.

gcc/testsuite/

2013-04-10   Naveen H.S  

* gcc.target/aarch64/cmn-1.c: New.
* gcc.target/aarch64/cmp.c: New.

--- gcc/config/aarch64/aarch64.c	2013-04-09 11:58:48.650789435 +0530
+++ gcc/config/aarch64/aarch64.c	2013-04-10 13:20:12.625311883 +0530
@@ -3094,7 +3094,8 @@ aarch64_select_cc_mode (RTX_CODE code, r
  the comparison will have to be swapped when we emit the assembly
  code.  */
   if ((GET_MODE (x) == SImode || GET_MODE (x) == DImode)
-  && (GET_CODE (y) == REG || GET_CODE (y) == SUBREG)
+  && (GET_CODE (y) == REG || GET_CODE (y) == SUBREG
+	  || GET_CODE (y) == NEG)
   && (GET_CODE (x) == ASHIFT || GET_CODE (x) == ASHIFTRT
 	  || GET_CODE (x) == LSHIFTRT
 	  || GET_CODE (x) == ZERO_EXTEND || GET_CODE (x) == SIGN_EXTEND))
--- gcc/config/aarch64/aarch64.md	2013-04-09 11:58:48.646789435 +0530
+++ gcc/config/aarch64/aarch64.md	2013-04-10 12:31:40.393317692 +0530
@@ -2190,7 +2190,28 @@
(set_attr "mode" "")]
 )
 
+(define_insn "*cmn_swp__reg"
+  [(set (reg:CC_SWP CC_REGNUM)
+	(compare:CC_SWP (ASHIFT:GPI
+			 (match_operand:GPI 0 "register_operand" "r")
+			 (match_operand:QI 1 "aarch64_shift_imm_" "n"))
+			(neg:GPI (match_operand:GPI 2 "aarch64_reg_or_zero" "rZ"]
+  ""
+  "cmn\\t%2, %0,  %1"
+  [(set_attr "v8type" "alus_shift")
+   (set_attr "mode" "")]
+)
 
+(define_insn "*cmn_swp__reg"
+  [(set (reg:CC_SWP CC_REGNUM)
+	(compare:CC_SWP (ANY_EXTEND:GPI
+			 (match_operand:ALLX 0 "register_operand" "r"))
+			(neg:GPI (match_operand:GPI 1 "register_operand" "r"]
+  ""
+  "cmn\\t%1, %0, xt"
+  [(set_attr "v8type" "alus_ext")
+   (set_attr "mode" "")]
+)
 ;; ---
 ;; Store-flag and conditional select insns
 ;; ---
--- gcc/testsuite/gcc.target/aarch64/cmn-1.c	1970-01-01 05:30:00.0 +0530
+++ gcc/testsuite/gcc.target/aarch64/cmn-1.c	2013-04-10 12:27:17.845318216 +0530
@@ -0,0 +1,134 @@
+/* { dg-do run } */
+/* { dg-options "-O2 --save-temps" } */
+
+extern void abort (void);
+
+int
+cmn_si_test1 (int a, int b, int c)
+{
+  /* { dg-final { scan-assembler "cmn\tw\[0-9\]+, w\[0-9\]+" } } */
+  if (a + b)
+return a + c;
+  else
+return a + b + c;
+}
+
+int
+cmn_si_test2 (int a, int b, int c)
+{
+  /* { dg-final { scan-assembler "cmn\tw\[0-9\]+, w\[0-9\]+, asr 3" } } */
+  if ((a >> 3) + b)
+return a + c;
+  else
+return a + b + c;
+}
+
+int
+cmn_si_test3 (char a, int b, int c)
+{
+  /* { dg-final { scan-assembler "cmn\tw\[0-9\]+, w\[0-9\]+, uxtb" } } */
+  if (a > -b)
+return a + c;
+  else
+return a + b + c;
+}
+
+typedef long long s64;
+
+s64
+cmn_di_test1 (s64 a, s64 b, s64 c)
+{
+  /* { dg-final { scan-assembler "cmn\tx\[0-9\]+, x\[0-9\]+" } } */
+  if (a + b)
+return a + c;
+  else
+return a + b + c;
+}
+
+s64
+cmn_di_test2 (s64 a, s64 b, s64 c)
+{
+  /* { dg-final { scan-assembler "cmn\tx\[0-9\]+, x\[0-9\]+, asr 3" } } */
+  if ((a >> 3) + b)
+return a + c;
+  else
+return a + b + c;
+}
+
+s64
+cmn_di_test3 (int a, s64 b, s64 c)
+{
+  /* { dg-final { scan-assembler "cmn\tx\[0-9\]+, x\[0-9\]+, sxtw" } } */
+  if (a > -b)
+return a + c;
+  else
+return a + b + c;
+}
+
+int main ()
+{
+  int x;
+  s64 y;
+
+  x = cmn_si_test1 (2, 12, 5);
+  if (x != 7)
+abort ();
+
+  x = cmn_si_test1 (1, 2, 32);
+  if (x != 33)
+abort ();
+
+  x = cmn_si_test2 (7, 5, 15);
+  if (x != 22)
+abort ();
+
+  x = cmn_si_test2 (12, 1, 3);
+  if (x != 15)
+abort ();
+
+  x = cmn_si_test3 (13, 14, 5);
+  if (x != 18)
+abort ();
+
+  x = cmn_si_test3 (15, 21, 2);
+  if (x != 17)
+abort ();
+
+  y = cmn_di_test1 (0x20202020ll,
+		0x65161611ll,
+		0x42434243ll);
+  if (y != 0x62636263ll)
+abort ();
+
+  y = cmn_di_test1 (0x1010101010101ll,
+		0x123456789abcdll,
+		0x5ll);
+  if (y != 0x6565656565656ll)
+abort ();
+
+  y = cmn_di_test2 (0x31313131ll,
+		0x35466561ll,
+		0x42434243ll);
+  if (y != 0x73747374ll)
+abort ();
+
+  y = cmn_di_test2 (0x101010101ll,
+		0x123456789ll,
+		0x5ll);
+  if (y != 0x656565656ll)
+abort ();
+
+  y = cmn_di_test3 (0x62523781ll,
+		0x64234978ll,
+		0x12345123ll);
+  if (y != 0x748688a4ll)
+abort ();
+
+  y = cmn_di_test3 (0x76352626ll,
+

[PATCH, AArch64] Negate and set flags in shift mode

2013-04-10 Thread Hurugalawadi, Naveen

Hi,

Please find attached the patch that implements negs instruction
with shift for aarch64 target.
A testcase has been added for the negs instruction.
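
As a rough model of what a flag-setting negate of a shifted operand produces (a sketch only; `negs_result` and `negs_lsl` are hypothetical names, and only the N and Z flags are modelled, in line with the CC_NZ mode used by the patch):

```c
#include <stdint.h>

/* Result value plus the two flags this sketch cares about.  */
struct negs_result
{
  uint32_t value;
  int n;  /* Result is negative when read as two's complement.  */
  int z;  /* Result is zero.  */
};

/* Negate SRC shifted left by SHIFT (which must be below 32) and
   derive N and Z from the 32-bit result.  */
struct negs_result
negs_lsl (uint32_t src, unsigned shift)
{
  struct negs_result r;
  r.value = 0u - (uint32_t) (src << shift);
  r.n = (int) (r.value >> 31);
  r.z = r.value == 0u;
  return r;
}
```

Comparisons of the result against zero with EQ, NE, LT or GE can be answered from these two flags alone, which matches the set of codes that the aarch64_select_cc_mode hunk allows for CC_NZ.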

Please review the same and let me know if there should be any 
modifications in the patch.
 
Built and tested on aarch64-thunder-elf (using Cavium's internal
simulator). No new regressions.

Thanks,
Naveen

gcc/

2013-04-10   Naveen H.S  

* config/aarch64/aarch64.c (aarch64_select_cc_mode): Allow NEG
code in CC_NZ mode.
* config/aarch64/aarch64.md (*neg_3_compare0): New
pattern.

gcc/testsuite/

2013-04-10   Naveen H.S  

* gcc.target/aarch64/negs.c: New.

--- gcc/config/aarch64/aarch64.c	2013-04-09 11:58:48.650789435 +0530
+++ gcc/config/aarch64/aarch64.c	2013-04-10 15:42:42.493294822 +0530
@@ -3087,7 +3087,8 @@ aarch64_select_cc_mode (RTX_CODE code, r
   if ((GET_MODE (x) == SImode || GET_MODE (x) == DImode)
   && y == const0_rtx
   && (code == EQ || code == NE || code == LT || code == GE)
-  && (GET_CODE (x) == PLUS || GET_CODE (x) == MINUS || GET_CODE (x) == AND))
+  && (GET_CODE (x) == PLUS || GET_CODE (x) == MINUS || GET_CODE (x) == AND
+	  || GET_CODE (x) == NEG))
 return CC_NZmode;
 
   /* A compare with a shifted operand.  Because of canonicalization,
--- gcc/config/aarch64/aarch64.md	2013-04-09 11:58:48.646789435 +0530
+++ gcc/config/aarch64/aarch64.md	2013-04-10 15:43:31.213294725 +0530
@@ -1901,6 +1901,21 @@
(set_attr "mode" "SI")]
 )
 
+(define_insn "*neg_3_compare0"
+  [(set (reg:CC_NZ CC_REGNUM)
+	(compare:CC_NZ
+	 (neg:GPI (ASHIFT:GPI
+		   (match_operand:GPI 1 "register_operand" "r")
+		   (match_operand:QI 2 "aarch64_shift_imm_" "n")))
+	 (const_int 0)))
+   (set (match_operand:GPI 0 "register_operand" "=r")
+	(neg:GPI (ASHIFT:GPI (match_dup 1) (match_dup 2]
+  ""
+  "negs\\t%0, %1,  %2"
+  [(set_attr "v8type" "alus_shift")
+   (set_attr "mode" "")]
+)
+
 (define_insn "*neg__2"
   [(set (match_operand:GPI 0 "register_operand" "=r")
 	(neg:GPI (ASHIFT:GPI
--- gcc/testsuite/gcc.target/aarch64/negs.c	1970-01-01 05:30:00.0 +0530
+++ gcc/testsuite/gcc.target/aarch64/negs.c	2013-04-10 15:44:28.981294610 +0530
@@ -0,0 +1,108 @@
+/* { dg-do run } */
+/* { dg-options "-O2 --save-temps" } */
+
+extern void abort (void);
+int z;
+
+int
+negs_si_test1 (int a, int b, int c)
+{
+  int d = -b;
+
+  /* { dg-final { scan-assembler "negs\tw\[0-9\]+, w\[0-9\]+" } } */
+  if (d < 0)
+return a + c;
+
+  z = d;
+return b + c + d;
+}
+
+int
+negs_si_test3 (int a, int b, int c)
+{
+  int d = -(b) << 3;
+
+  /* { dg-final { scan-assembler "negs\tw\[0-9\]+, w\[0-9\]+, lsl 3" } } */
+  if (d == 0)
+return a + c;
+
+  z = d;
+return b + c + d;
+}
+
+typedef long long s64;
+s64 zz;
+
+s64
+negs_di_test1 (s64 a, s64 b, s64 c)
+{
+  s64 d = -b;
+
+  /* { dg-final { scan-assembler "negs\tx\[0-9\]+, x\[0-9\]+" } } */
+  if (d < 0)
+return a + c;
+
+  zz = d;
+return b + c + d;
+}
+
+s64
+negs_di_test3 (s64 a, s64 b, s64 c)
+{
+  s64 d = -(b) << 3;
+
+  /* { dg-final { scan-assembler "negs\tx\[0-9\]+, x\[0-9\]+, lsl 3" } } */
+  if (d == 0)
+return a + c;
+
+  zz = d;
+return b + c + d;
+}
+
+int main ()
+{
+  int x;
+  s64 y;
+
+  x = negs_si_test1 (2, 12, 5);
+  if (x != 7)
+abort ();
+
+  x = negs_si_test1 (1, 2, 32);
+  if (x != 33)
+abort ();
+
+  x = negs_si_test3 (13, 14, 5);
+  if (x != -93)
+abort ();
+
+  x = negs_si_test3 (15, 21, 2);
+  if (x != -145)
+abort ();
+
+  y = negs_di_test1 (0x20202020ll,
+		 0x65161611ll,
+		 0x42434243ll);
+  if (y != 0x62636263ll)
+abort ();
+
+  y = negs_di_test1 (0x1010101010101ll,
+		 0x123456789abcdll,
+		 0x5ll);
+  if (y != 0x6565656565656ll)
+abort ();
+
+  y = negs_di_test3 (0x62523781ll,
+		 0x64234978ll,
+		 0x12345123ll);
+  if (y != 0xfffd553d4edbll)
+abort ();
+
+  y = negs_di_test3 (0x763526268ll,
+		 0x101010101ll,
+		 0x2ll);
+  if (y != 0xfffb1b1b1b1bll)
+abort ();
+
+  return 0;
+}

RE: [PATCH, AARCH64] Fix unrecognizable insn issue

2013-04-10 Thread James Greenhalgh


> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Zhenqiang Chen
> Sent: 10 April 2013 09:02
> To: gcc-patches@gcc.gnu.org
> Cc: Marcus Shawcroft
> Subject: [PATCH, AARCH64] Fix unrecognizable insn issue
>
> Hi,
>
> During expand, function aarch64_vcond_internal inverses some CMP, e.g.
>
>   a LE b -> b GE a
>
> But if "b" is "CONST0_RTX", "b GE a" will be an illegal insn.

Yes it will. We should not be swapping the comparison in these cases.

>
> Refer https://bugs.launchpad.net/linaro-toolchain-binaries/+bug/1163942
> for detail about the issue.
>
> The patch is to make "b" a register when inversing LE.

This patch is too restrictive. There is an `fcmle v0.2d #0` form which we
should be generating when we can. Also, you are only fixing one problematic
case where there are a few.
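
The selection logic I have in mind can be sketched in plain C as follows. This is only an illustration under assumptions: `cmp_code`, `cmp_plan` and `plan_compare` are hypothetical names, not what the expander uses, and the real code works on RTL operands rather than a boolean.

```c
/* Comparison codes handled by this sketch.  */
enum cmp_code { CMP_GE, CMP_GT, CMP_LE, CMP_LT, CMP_EQ };

/* Which comparison to emit and how to feed it.  */
struct cmp_plan
{
  enum cmp_code insn;   /* Comparison instruction to generate.  */
  int swap_operands;    /* Emit "b OP a" instead of "a OP b".  */
  int zero_form;        /* Use the compare-against-zero form.  */
};

/* Decide how to emit "a CODE b".  When b is zero, keep the comparison
   as written and use the zero form.  Otherwise express LE and LT by
   swapping the operands of GE and GT.  */
struct cmp_plan
plan_compare (enum cmp_code code, int rhs_is_zero)
{
  struct cmp_plan plan = { code, 0, 0 };

  if (rhs_is_zero)
    {
      plan.zero_form = 1;
      return plan;
    }

  if (code == CMP_LE)
    {
      plan.insn = CMP_GE;
      plan.swap_operands = 1;
    }
  else if (code == CMP_LT)
    {
      plan.insn = CMP_GT;
      plan.swap_operands = 1;
    }
  return plan;
}
```

So "a LE 0" stays an LE against zero, while "a LE b" with a register b still becomes "b GE a".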

I don't have access to your reproducer, so I can't be certain this patch
is correct - I have created my own reproducer and added it in with
the other vect-fcm tests.

Thorough regression tests are ongoing for this patch, but it
passes aarch64.exp and vect.exp with no regressions.

Thanks,
James

---
gcc/

2013-04-10  James Greenhalgh  

* config/aarch64/aarch64-simd.md (aarch64_vcond_internal): Fix
floating-point vector comparisons against 0.

gcc/testsuite/

2013-04-10  James Greenhalgh  

* gcc.target/aarch64/vect-fcm.x: Add check for zero forms of
inverse operands.
* gcc.target/aarch64/vect-fcm-eq-d.c: Check that new zero form
loop is vectorized.
* gcc.target/aarch64/vect-fcm-eq-f.c: Likewise.
* gcc.target/aarch64/vect-fcm-ge-d.c: Check that new zero form
loop is vectorized and that the correct instruction is generated.
* gcc.target/aarch64/vect-fcm-ge-f.c: Likewise.
* gcc.target/aarch64/vect-fcm-gt-d.c: Likewise.
* gcc.target/aarch64/vect-fcm-gt-f.c: Likewise.
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index ad3f4a4..9b42365 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1622,6 +1622,7 @@
   "TARGET_SIMD"
 {
   int inverse = 0;
+  int use_zero_form = 0;
   int swap_bsl_operands = 0;
   rtx mask = gen_reg_rtx (mode);
   rtx tmp = gen_reg_rtx (mode);
@@ -1632,12 +1633,16 @@
   switch (GET_CODE (operands[3]))
 {
 case GE:
+case GT:
 case LE:
+case LT:
 case EQ:
-  if (!REG_P (operands[5])
-	  && (operands[5] != CONST0_RTX (mode)))
-	operands[5] = force_reg (mode, operands[5]);
-  break;
+  if (operands[5] == CONST0_RTX (mode))
+	{
+	  use_zero_form = 1;
+	  break;
+	}
+  /* Fall through.  */
 default:
   if (!REG_P (operands[5]))
 	operands[5] = force_reg (mode, operands[5]);
@@ -1688,7 +1693,26 @@
 	 a GT b -> a GT b
 	 a LE b -> b GE a
 	 a LT b -> b GT a
-	 a EQ b -> a EQ b  */
+	 a EQ b -> a EQ b
+	 Note that there also exist direct comparison against 0 forms,
+	 so catch those as a special case.  */
+  if (use_zero_form)
+	{
+	  inverse = 0;
+	  switch (GET_CODE (operands[3]))
+	{
+	case LT:
+	  base_comparison = gen_aarch64_cmlt;
+	  break;
+	case LE:
+	  base_comparison = gen_aarch64_cmle;
+	  break;
+	default:
+	  /* Do nothing, other zero form cases already have the correct
+		 base_comparison.  */
+	  break;
+	}
+	}
 
   if (!inverse)
 	emit_insn (base_comparison (mask, operands[4], operands[5]));
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-fcm-eq-d.c b/gcc/testsuite/gcc.target/aarch64/vect-fcm-eq-d.c
index b6fb5ae..19ecd63 100644
--- a/gcc/testsuite/gcc.target/aarch64/vect-fcm-eq-d.c
+++ b/gcc/testsuite/gcc.target/aarch64/vect-fcm-eq-d.c
@@ -7,7 +7,7 @@
 
 #include "vect-fcm.x"
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 3 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 4 "vect" } } */
 /* { dg-final { scan-assembler "fcmeq\\tv\[0-9\]+\.2d, v\[0-9\]+\.2d, v\[0-9\]+\.2d" } } */
 /* { dg-final { scan-assembler "fcmeq\\tv\[0-9\]+\.2d, v\[0-9\]+\.2d, 0" } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-fcm-eq-f.c b/gcc/testsuite/gcc.target/aarch64/vect-fcm-eq-f.c
index 283d34f..30be5ad 100644
--- a/gcc/testsuite/gcc.target/aarch64/vect-fcm-eq-f.c
+++ b/gcc/testsuite/gcc.target/aarch64/vect-fcm-eq-f.c
@@ -7,7 +7,7 @@
 
 #include "vect-fcm.x"
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 3 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 4 "vect" } } */
 /* { dg-final { scan-assembler "fcmeq\\tv\[0-9\]+\.\[24\]s, v\[0-9\]+\.\[24\]s, v\[0-9\]+\.\[24\]s" } } */
 /* { dg-final { scan-assembler "fcmeq\\tv\[0-9\]+\.\[24\]s, v\[0-9\]+\.\[24\]s, 0" } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-fcm-ge-d.c b/gcc/testsuite/gcc.target/aarch64/vect-fcm-ge-d.c
index 868e1f8..b922833

Re: [PATCH v3]IPA: fixing inline fail report caused by overwritable functions.

2013-04-10 Thread Zhouyi Zhou

Thanks Richard, nice addition; the regression tests passed on my x86_64 GNU/Linux machine.

On Wed, Apr 10, 2013 at 5:22 PM, Richard Biener
 wrote:
> On Tue, Apr 9, 2013 at 5:10 PM, Zhouyi Zhou  wrote:
>> Hi Richard,
>> I do not have write access to GCC SVN repository, can you commit it for me?
>> Thanks alot
>
> Committed with adding { dg-require-weak "" } to the testcase.
>
> Richard.
>
>> Cheers
>> Zhouyi
>>
>>
>> On Tue, Apr 9, 2013 at 5:04 PM, Richard Biener 
>> wrote:
>>>
>>> On Tue, Apr 9, 2013 at 5:40 AM, Zhouyi Zhou  wrote:
>>> > On Mon, Apr 8, 2013 at 5:48 PM, Richard Biener
>>> >  wrote:
>>> >>Can you trigger this message to show up with -Winline before/after the
>>> >> patch?
>>> >>Can you please add a testcase then?
>>> > Thanks Richard for reviewing, from my point of view about gcc and my
>>> > invoking of gcc, -Winline
>>> > only works on callees that be declared "inline", but if the callee is
>>> > declared
>>> > "inline", it will be AVAIL_AVAILABLE in function can_inline_edge_p, thus
>>> > out of the
>>> > range of my patch.
>>>
>>> Ah, indeed ...
>>>
>>> > So I only add a testcase for fixing the tree dump, are there any thing
>>> > more I can do?
>>>
>>> No.  Patch is ok.
>>>
>>> Thanks,
>>> Richard.
>>>
>>> > Regtested/bootstrapped on x86_64-linux
>>> >
>>> > ChangeLog:
>>> > 2013-04-08 Zhouyi Zhou 
>>> >* cif-code.def (OVERWRITABLE): correct the comment for
>>> > overwritable
>>> > function
>>> >* ipa-inline.c (can_inline_edge_p): let dump mechanism report
>>> > the inline
>>> >fail caused by overwritable functions.
>>> >* gcc.dg/tree-ssa/inline-11.c: New test
>>> >
>>> > Index: gcc/cif-code.def
>>> > ===
>>> > --- gcc/cif-code.def(revision 197549)
>>> > +++ gcc/cif-code.def(working copy)
>>> > @@ -48,7 +48,7 @@ DEFCIFCODE(REDEFINED_EXTERN_INLINE,
>>> >  /* Function is not inlinable.  */
>>> >  DEFCIFCODE(FUNCTION_NOT_INLINABLE, N_("function not inlinable"))
>>> >
>>> > -/* Function is not overwritable.  */
>>> > +/* Function is overwritable.  */
>>> >  DEFCIFCODE(OVERWRITABLE, N_("function body can be overwritten at link
>>> > time"))
>>> >
>>> >  /* Function is not an inlining candidate.  */
>>> > Index: gcc/testsuite/gcc.dg/tree-ssa/inline-11.c
>>> > ===
>>> > --- gcc/testsuite/gcc.dg/tree-ssa/inline-11.c   (revision 0)
>>> > +++ gcc/testsuite/gcc.dg/tree-ssa/inline-11.c   (working copy)
>>> > @@ -0,0 +1,13 @@
>>> > +/* { dg-do compile } */
>>> > +/* { dg-options "-O2 -fdump-tree-einline" } */
>>> > +int w;
>>> > +int bar (void) __attribute__ ((weak));
>>> > +int bar (){
>>> > +  w++;
>>> > +}
>>> > +void foo()
>>> > +{
>>> > +  bar();
>>> > +}
>>> > +/* { dg-final { scan-tree-dump-times "function body can be overwritten
>>> > at link time" 1 "einline" } } */
>>> > +/* { dg-final { cleanup-tree-dump "einline" } } */
>>> > Index: gcc/ipa-inline.c
>>> > ===
>>> > --- gcc/ipa-inline.c(revision 197549)
>>> > +++ gcc/ipa-inline.c(working copy)
>>> > @@ -266,7 +266,7 @@ can_inline_edge_p (struct cgraph_edge *e
>>> >else if (avail <= AVAIL_OVERWRITABLE)
>>> >  {
>>> >e->inline_failed = CIF_OVERWRITABLE;
>>> > -  return false;
>>> > +  inlinable = false;
>>> >  }
>>> >else if (e->call_stmt_cannot_inline_p)
>>> >  {
>>
>>

Re: [PATCH] Fix PR48184

2013-04-10 Thread Jakub Jelinek

On Wed, Apr 10, 2013 at 11:44:18AM +0200, Marek Polacek wrote:
> Ping.
> 
> On Thu, Apr 04, 2013 at 04:01:09PM +0200, Marek Polacek wrote:
> > This fixes an ICE in case we use weirdo --param value for 
> > PARAM_ALIGN_THRESHOLD.  The patch is by Andrew; I added CL, testcase
> > and tested.
> > 
> > Bootstrapped/regtested on x86_64-linux, ok for trunk?  And what about other
> > branches?
> > 
> > 2013-04-04  Marek Polacek  
> > Andrew Pinski  
> > 
> > PR tree-optimization/48184
> > * final.c (compute_alignments): Set threshold to 0
> > if PARAM_ALIGN_THRESHOLD is 0.
> > 
> > * gcc.dg/pr48184.c: New test.

Shouldn't this instead be solved, again, by bumping the minimum for the
param from 0 to 1?  The smaller the param is, the bigger freq_threshold is,
so suddenly setting freq_threshold to 0 for the smallest param value isn't
consistent.
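
To make the consistency argument concrete, here is a minimal sketch in C.
It is not the final.c code: threshold_patched and threshold_min_one are
hypothetical helpers that model the two options on plain integers, and the
clamping in the second one only stands in for a parameter minimum of 1.

```c
/* Models the posted patch: a parameter of 0 is special-cased to a
   threshold of 0 instead of dividing by zero.  */
static int
threshold_patched (int freq_max, int param)
{
  if (param == 0)
    return 0;
  return freq_max / param;
}

/* Models the alternative (an assumption for illustration): the
   parameter is never allowed below 1, so the division is always
   defined and the smallest parameter gives the largest threshold.  */
static int
threshold_min_one (int freq_max, int param)
{
  if (param < 1)
    param = 1;
  return freq_max / param;
}
```

With freq_max fixed, the first helper grows as the parameter shrinks from 2
to 1 and then drops to 0 at parameter 0, while the second one stays
monotonic.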

> > --- gcc/final.c.mp  2013-04-04 14:09:04.626020852 +0200
> > +++ gcc/final.c 2013-04-04 14:09:05.672024174 +0200
> > @@ -701,7 +701,10 @@ compute_alignments (void)
> >FOR_EACH_BB (bb)
> >  if (bb->frequency > freq_max)
> >freq_max = bb->frequency;
> > -  freq_threshold = freq_max / PARAM_VALUE (PARAM_ALIGN_THRESHOLD);
> > +  if (PARAM_VALUE (PARAM_ALIGN_THRESHOLD) == 0)
> > +freq_threshold = 0;
> > +  else
> > +freq_threshold = freq_max / PARAM_VALUE (PARAM_ALIGN_THRESHOLD);
> >  
> >if (dump_file)
> >  fprintf(dump_file, "freq_max: %i\n",freq_max);

Jakub

Re: [PATCH, AARCH64] Fix unrecognizable insn issue

2013-04-10 Thread Zhenqiang Chen

On 10 April 2013 17:18, Andrew Pinski  wrote:
> On Wed, Apr 10, 2013 at 2:02 AM, Zhenqiang Chen
>  wrote:
>> On 10 April 2013 16:05, Andrew Pinski  wrote:
>>> On Wed, Apr 10, 2013 at 1:02 AM, Zhenqiang Chen
>>>  wrote:
>>>> Hi,
>>>>
>>>> During expand, function aarch64_vcond_internal inverses some CMP, e.g.
>>>>
>>>>   a LE b -> b GE a
>>>>
>>>> But if "b" is "CONST0_RTX", "b GE a" will be an illegal insn.
>>>>
>>>> Refer https://bugs.launchpad.net/linaro-toolchain-binaries/+bug/1163942
>>>> for detail about the issue.
>>>>
>>>> The patch is to make "b" a register when inversing LE.
>>>>
>>>> Is it OK for trunk, 4.8 and arm/aarch64-4.7-branch?
>>>
>>>
>>> Can you add a testcase also?  It would be best to add one that is a
>>> reduced testcase.
>>
>> Sorry. Due to licence, I can not post the code from SPEC 2006. And it
>> is hard for me to rewrite a program to reproduce it.
>
> Actually most of the code inside of SPEC 2006 is free/open source
> software.  I think that is true of wrf also.  What is not considered
> part of that is the data that is used for benchmarking (except maybe
> the GCC input which is IIRC preprocessed source from the other
> benchmarks).
> There are ways to get a reduced testcase without the full source also:
> http://gcc.gnu.org/wiki/A_guide_to_testcase_reduction
> Again a testcase is still highly recommended here.

Thank you for the comment. I will check the licence issue.

-Zhenqiang

>>
>>> Also can you expand on what is going on here?  There is not enough
>>> information in either this email or the bug report to figure out if
>>> this is the correct patch.
>>
>> Here is more detail about the issue:
>>
>> A VEC_COND_EXPR is generated during tree-level optimization, like
>>
>> vect_var_.21092_16406 = VEC_COND_EXPR <vect_var_.21091_16404 <= { 0.0, 0.0, 0.0, 0.0 }, vect_var_.21086_16391, vect_var_.21091_16404>;
>>
>> During expand, function aarch64_vcond_internal inverse "LE" to "GE", i.e.
>> reverse "vect_var_.21091_16404 <= { 0.0, 0.0, 0.0, 0.0 }" to " { 0.0,
>> 0.0, 0.0, 0.0 } >= vect_var_.21091_16404".
>>
>> The insn after expand is like
>>
>> (insn 2909 2908 2910 165 (set (reg:V4SI 16777)
>> (unspec:V4SI [
>> (const_vector:V4SF [
>> (const_double:SF 0.0 [0x0.0p+0])
>> (const_double:SF 0.0 [0x0.0p+0])
>> (const_double:SF 0.0 [0x0.0p+0])
>> (const_double:SF 0.0 [0x0.0p+0])
>> ])
>> (reg:V4SF 9594 [ vect_var_.21059 ])
>> ] UNSPEC_CMGE))
>>
>> But this is illegal. FCMGE has two formats:
>>
>> FCMGE d, n, m
>> FCMGE d, n, #0
>>
>> Both require "n" be a register. "#0" can only be the last argument.
>> So when inversing LE to GE, function aarch64_vcond_internal should
>> make sure operands[5] is a register.
>>
>> Thanks!
>> -Zhenqiang
>>

>>>> Thanks!
>>>> -Zhenqiang
>>>>
>>>> ChangeLog:
>>>> 2013-04-10  Zhenqiang Chen 
>>>>
>>>> * config/aarch64/aarch64-simd.md (aarch64_vcond_internal): Set
>>>> operands[5] to register when inversing LE.
>>>>
>>>> diff --git a/gcc/config/aarch64/aarch64-simd.md
>>>> b/gcc/config/aarch64/aarch64-simd.md
>>>> index 92dcfc0..d08d23a 100644
>>>> --- a/gcc/config/aarch64/aarch64-simd.md
>>>> +++ b/gcc/config/aarch64/aarch64-simd.md
>>>> @@ -1657,6 +1657,8 @@
>>>>complimentary_comparison = gen_aarch64_cmgt;
>>>>break;
>>>>  case LE:
>>>> +  if (!REG_P (operands[5]))
>>>> +   operands[5] = force_reg (mode, operands[5]);
>>>>  case UNLE:
>>>>inverse = 1;
>>>>/* Fall through.  */

Re: [PATCH] Fix PR48184

2013-04-10 Thread Marek Polacek

Ping.

On Thu, Apr 04, 2013 at 04:01:09PM +0200, Marek Polacek wrote:
> This fixes an ICE in case we use weirdo --param value for 
> PARAM_ALIGN_THRESHOLD.  The patch is by Andrew; I added CL, testcase
> and tested.
> 
> Bootstrapped/regtested on x86_64-linux, ok for trunk?  And what about other
> branches?
> 
> 2013-04-04  Marek Polacek  
>   Andrew Pinski  
> 
>   PR tree-optimization/48184
>   * final.c (compute_alignments): Set threshold to 0
>   if PARAM_ALIGN_THRESHOLD is 0.
> 
>   * gcc.dg/pr48184.c: New test.
> 
> --- gcc/final.c.mp2013-04-04 14:09:04.626020852 +0200
> +++ gcc/final.c   2013-04-04 14:09:05.672024174 +0200
> @@ -701,7 +701,10 @@ compute_alignments (void)
>FOR_EACH_BB (bb)
>  if (bb->frequency > freq_max)
>freq_max = bb->frequency;
> -  freq_threshold = freq_max / PARAM_VALUE (PARAM_ALIGN_THRESHOLD);
> +  if (PARAM_VALUE (PARAM_ALIGN_THRESHOLD) == 0)
> +freq_threshold = 0;
> +  else
> +freq_threshold = freq_max / PARAM_VALUE (PARAM_ALIGN_THRESHOLD);
>  
>if (dump_file)
>  fprintf(dump_file, "freq_max: %i\n",freq_max);
> --- gcc/testsuite/gcc.dg/pr48184.c.mp 2013-04-04 13:23:15.356769534 +0200
> +++ gcc/testsuite/gcc.dg/pr48184.c2013-04-04 14:05:05.506241464 +0200
> @@ -0,0 +1,5 @@
> +/* PR tree-optimization/48184 */
> +/* { dg-do compile } */
> +/* { dg-options "-O --param align-threshold=0" } */
> +
> +void foo (void) { }
> 
>   Marek

Re: [PATCH v3]IPA: fixing inline fail report caused by overwritable functions.

2013-04-10 Thread Richard Biener

On Tue, Apr 9, 2013 at 5:10 PM, Zhouyi Zhou  wrote:
> Hi Richard,
> I do not have write access to GCC SVN repository, can you commit it for me?
> Thanks alot

Committed with adding { dg-require-weak "" } to the testcase.

Richard.

> Cheers
> Zhouyi
>
>
> On Tue, Apr 9, 2013 at 5:04 PM, Richard Biener 
> wrote:
>>
>> On Tue, Apr 9, 2013 at 5:40 AM, Zhouyi Zhou  wrote:
>> > On Mon, Apr 8, 2013 at 5:48 PM, Richard Biener
>> >  wrote:
>> >>Can you trigger this message to show up with -Winline before/after the
>> >> patch?
>> >>Can you please add a testcase then?
>> > Thanks Richard for reviewing, from my point of view about gcc and my
>> > invoking of gcc, -Winline
>> > only works on callees that be declared "inline", but if the callee is
>> > declared
>> > "inline", it will be AVAIL_AVAILABLE in function can_inline_edge_p, thus
>> > out of the
>> > range of my patch.
>>
>> Ah, indeed ...
>>
>> > So I only add a testcase for fixing the tree dump, are there any thing
>> > more I can do?
>>
>> No.  Patch is ok.
>>
>> Thanks,
>> Richard.
>>
>> > Regtested/bootstrapped on x86_64-linux
>> >
>> > ChangeLog:
>> > 2013-04-08 Zhouyi Zhou 
>> >* cif-code.def (OVERWRITABLE): correct the comment for
>> > overwritable
>> > function
>> >* ipa-inline.c (can_inline_edge_p): let dump mechanism report
>> > the inline
>> >fail caused by overwritable functions.
>> >* gcc.dg/tree-ssa/inline-11.c: New test
>> >
>> > Index: gcc/cif-code.def
>> > ===
>> > --- gcc/cif-code.def(revision 197549)
>> > +++ gcc/cif-code.def(working copy)
>> > @@ -48,7 +48,7 @@ DEFCIFCODE(REDEFINED_EXTERN_INLINE,
>> >  /* Function is not inlinable.  */
>> >  DEFCIFCODE(FUNCTION_NOT_INLINABLE, N_("function not inlinable"))
>> >
>> > -/* Function is not overwritable.  */
>> > +/* Function is overwritable.  */
>> >  DEFCIFCODE(OVERWRITABLE, N_("function body can be overwritten at link
>> > time"))
>> >
>> >  /* Function is not an inlining candidate.  */
>> > Index: gcc/testsuite/gcc.dg/tree-ssa/inline-11.c
>> > ===
>> > --- gcc/testsuite/gcc.dg/tree-ssa/inline-11.c   (revision 0)
>> > +++ gcc/testsuite/gcc.dg/tree-ssa/inline-11.c   (working copy)
>> > @@ -0,0 +1,13 @@
>> > +/* { dg-do compile } */
>> > +/* { dg-options "-O2 -fdump-tree-einline" } */
>> > +int w;
>> > +int bar (void) __attribute__ ((weak));
>> > +int bar (){
>> > +  w++;
>> > +}
>> > +void foo()
>> > +{
>> > +  bar();
>> > +}
>> > +/* { dg-final { scan-tree-dump-times "function body can be overwritten
>> > at link time" 1 "einline" } } */
>> > +/* { dg-final { cleanup-tree-dump "einline" } } */
>> > Index: gcc/ipa-inline.c
>> > ===
>> > --- gcc/ipa-inline.c(revision 197549)
>> > +++ gcc/ipa-inline.c(working copy)
>> > @@ -266,7 +266,7 @@ can_inline_edge_p (struct cgraph_edge *e
>> >else if (avail <= AVAIL_OVERWRITABLE)
>> >  {
>> >e->inline_failed = CIF_OVERWRITABLE;
>> > -  return false;
>> > +  inlinable = false;
>> >  }
>> >else if (e->call_stmt_cannot_inline_p)
>> >  {
>
>

Re: [PATCH, AARCH64] Fix unrecognizable insn issue

2013-04-10 Thread Andrew Pinski

On Wed, Apr 10, 2013 at 2:02 AM, Zhenqiang Chen
 wrote:
> On 10 April 2013 16:05, Andrew Pinski  wrote:
>> On Wed, Apr 10, 2013 at 1:02 AM, Zhenqiang Chen
>>  wrote:
>>> Hi,
>>>
>>> During expand, function aarch64_vcond_internal inverses some CMP, e.g.
>>>
>>>   a LE b -> b GE a
>>>
>>> But if "b" is "CONST0_RTX", "b GE a" will be an illegal insn.
>>>
>>> Refer https://bugs.launchpad.net/linaro-toolchain-binaries/+bug/1163942
>>> for detail about the issue.
>>>
>>> The patch is to make "b" a register when inversing LE.
>>>
>>> Is it OK for trunk, 4.8 and arm/aarch64-4.7-branch?
>>
>>
>> Can you add a testcase also?  It would be best to add one that is a
>> reduced testcase.
>
> Sorry. Due to licence, I can not post the code from SPEC 2006. And it
> is hard for me to rewrite a program to reproduce it.

Actually most of the code inside of SPEC 2006 is free/open source
software.  I think that is true of wrf also.  What is not considered
part of that is the data that is used for benchmarking (except maybe
the GCC input which is IIRC preprocessed source from the other
benchmarks).
There are ways to get a reduced testcase without the full source also:
http://gcc.gnu.org/wiki/A_guide_to_testcase_reduction
Again a testcase is still highly recommended here.

Thanks,
Andrew Pinski


>
>> Also can you expand on what is going on here?  There is not enough
>> information in either this email or the bug report to figure out if
>> this is the correct patch.
>
> Here is more detail about the issue:
>
> A VEC_COND_EXPR is generated during tree-level optimization, like
>
> vect_var_.21092_16406 = VEC_COND_EXPR <vect_var_.21091_16404 <= { 0.0, 0.0, 0.0, 0.0 }, vect_var_.21086_16391, vect_var_.21091_16404>;
>
> During expand, function aarch64_vcond_internal inverse "LE" to "GE", i.e.
> reverse "vect_var_.21091_16404 <= { 0.0, 0.0, 0.0, 0.0 }" to " { 0.0,
> 0.0, 0.0, 0.0 } >= vect_var_.21091_16404".
>
> The insn after expand is like
>
> (insn 2909 2908 2910 165 (set (reg:V4SI 16777)
> (unspec:V4SI [
> (const_vector:V4SF [
> (const_double:SF 0.0 [0x0.0p+0])
> (const_double:SF 0.0 [0x0.0p+0])
> (const_double:SF 0.0 [0x0.0p+0])
> (const_double:SF 0.0 [0x0.0p+0])
> ])
> (reg:V4SF 9594 [ vect_var_.21059 ])
> ] UNSPEC_CMGE))
>
> But this is illegal. FCMGE has two formats:
>
> FCMGE d, n, m
> FCMGE d, n, #0
>
> Both require "n" be a register. "#0" can only be the last argument.
> So when inversing LE to GE, function aarch64_vcond_internal should
> make sure operands[5] is a register.
>
> Thanks!
> -Zhenqiang
>
>>>
>>> Thanks!
>>> -Zhenqiang
>>>
>>> ChangeLog:
>>> 2013-04-10  Zhenqiang Chen 
>>>
>>> * config/aarch64/aarch64-simd.md (aarch64_vcond_internal): Set
>>> operands[5] to register when inversing LE.
>>>
>>> diff --git a/gcc/config/aarch64/aarch64-simd.md
>>> b/gcc/config/aarch64/aarch64-simd.md
>>> index 92dcfc0..d08d23a 100644
>>> --- a/gcc/config/aarch64/aarch64-simd.md
>>> +++ b/gcc/config/aarch64/aarch64-simd.md
>>> @@ -1657,6 +1657,8 @@
>>>complimentary_comparison = gen_aarch64_cmgt;
>>>break;
>>>  case LE:
>>> +  if (!REG_P (operands[5]))
>>> +   operands[5] = force_reg (mode, operands[5]);
>>>  case UNLE:
>>>inverse = 1;
>>>/* Fall through.  */

Re: [PATCH, combine] Fix host-specific behavior in simplify_compare_const()

2013-04-10 Thread Chung-Ju Wu

2013/4/9 Eric Botcazou :
>> On behalf of Andes Technology Co., we have signed FSF agreement.
>> However, so far I don't have svn write access yet.
>> Would you please help to commit this patch?
>
> Jeff has kindly offered to sponsor you for write access, so you should be able
> to install it yourself.  Otherwise I'll be happy to do it for you.
>
> Thanks for fixing this problem in the combiner.
>
> --
> Eric Botcazou

With Jeff's help, I got my svn write access yesterday
and added myself to the "Write After Approval" section of MAINTAINERS.

With your approval and suggestions, the revised patch
(http://gcc.gnu.org/ml/gcc-patches/2013-04/msg00354.html)
was committed as Rev. 197666.

I appreciate your approval and Jeff's help. :)


Best regards,
jasonwucj

[PATCH] Handle mixed constant / invariant ops in SLP

2013-04-10 Thread Richard Biener


This makes us handle mixed constant / invariant ops in SLP
(constant ops are a subset of invariant ops).  This allows us
to vectorize both foo and bar (the vect_get_constant_vectors
adjustment is only necessary for foo).
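
As an illustration of what "mixed constant / invariant ops" means, a loop
of the following shape has one SLP lane that adds a literal constant and
another that adds a loop-invariant value.  This is a hypothetical sketch,
not the contents of the new slp-39.c testcase; the array names, the size
and the function names const_first and invariant_first are assumptions.

```c
#define N 1024

double x[N], y[N];

/* Lane 0 uses a constant operand, lane 1 a loop-invariant operand.  */
void
const_first (double w)
{
  for (int i = 0; i < N; i += 2)
    {
      y[i] = x[i] + 1.0;
      y[i + 1] = x[i + 1] + w;
    }
}

/* Same group with the two kinds of operand in the opposite order.  */
void
invariant_first (double w)
{
  for (int i = 0; i < N; i += 2)
    {
      y[i] = x[i] + w;
      y[i + 1] = x[i + 1] + 1.0;
    }
}
```

In both functions the second operand of the two additions would have to be
combined into one vector built from a constant and an invariant, which is
the case the description above is about.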

The patch also simplifies type checking (which in the end I think
is not required ...) as the type of the operand is the same as
the type of its definition (well, they are the same ...).

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

2013-04-10  Richard Biener  

* tree-vectorizer.h (struct _slp_oprnd_info): Remove
first_const_oprnd field, rename first_def_type to first_op_type.
* tree-vect-slp.c (vect_create_oprnd_info): Adjust.
(vect_get_and_check_slp_defs): Always use the type of the
operand.  Allow mixed vect_external_def, vect_constant_def types.
(vect_get_constant_vectors): Handle mixed vect_external_def,
vect_constant_def types.

* gcc.dg/vect/slp-39.c: New testcase.

Index: gcc/tree-vectorizer.h
===
*** gcc/tree-vectorizer.h   (revision 197665)
--- gcc/tree-vectorizer.h   (working copy)
*** typedef struct _slp_oprnd_info
*** 169,176 
   operand itself in case it's constant, and an indication if it's a pattern
   stmt.  */
enum vect_def_type first_dt;
!   tree first_def_type;
!   tree first_const_oprnd;
bool first_pattern;
  } *slp_oprnd_info;
  
--- 169,175 
   operand itself in case it's constant, and an indication if it's a pattern
   stmt.  */
enum vect_def_type first_dt;
!   tree first_op_type;
bool first_pattern;
  } *slp_oprnd_info;
  
Index: gcc/tree-vect-slp.c
===
*** gcc/tree-vect-slp.c (revision 197665)
--- gcc/tree-vect-slp.c (working copy)
*** vect_create_oprnd_info (int nops, int gr
*** 140,147 
oprnd_info = XNEW (struct _slp_oprnd_info);
oprnd_info->def_stmts.create (group_size);
oprnd_info->first_dt = vect_uninitialized_def;
!   oprnd_info->first_def_type = NULL_TREE;
!   oprnd_info->first_const_oprnd = NULL_TREE;
oprnd_info->first_pattern = false;
oprnds_info.quick_push (oprnd_info);
  }
--- 140,146 
oprnd_info = XNEW (struct _slp_oprnd_info);
oprnd_info->def_stmts.create (group_size);
oprnd_info->first_dt = vect_uninitialized_def;
!   oprnd_info->first_op_type = NULL_TREE;
oprnd_info->first_pattern = false;
oprnds_info.quick_push (oprnd_info);
  }
*** vect_get_and_check_slp_defs (loop_vec_in
*** 321,336 
{
  oprnd_info->first_dt = dt;
  oprnd_info->first_pattern = pattern;
! if (def)
!   {
! oprnd_info->first_def_type = TREE_TYPE (def);
! oprnd_info->first_const_oprnd = NULL_TREE;
!   }
! else
! {
!   oprnd_info->first_def_type = NULL_TREE;
!   oprnd_info->first_const_oprnd = oprnd;
! }
}
else
{
--- 320,326 
{
  oprnd_info->first_dt = dt;
  oprnd_info->first_pattern = pattern;
! oprnd_info->first_op_type = TREE_TYPE (oprnd);
}
else
{
*** vect_get_and_check_slp_defs (loop_vec_in
*** 341,354 
 vect_internal_def.  */
  if (((oprnd_info->first_dt != dt
  && !(oprnd_info->first_dt == vect_reduction_def
!  && dt == vect_internal_def))
!|| (oprnd_info->first_def_type != NULL_TREE
!  && def
!  && !types_compatible_p (oprnd_info->first_def_type,
!  TREE_TYPE (def
!  || (!def
!  && !types_compatible_p (TREE_TYPE 
(oprnd_info->first_const_oprnd),
!  TREE_TYPE (oprnd
{
  if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
--- 331,343 
 vect_internal_def.  */
  if (((oprnd_info->first_dt != dt
  && !(oprnd_info->first_dt == vect_reduction_def
!  && dt == vect_internal_def)
!   && !((oprnd_info->first_dt == vect_external_def
! || oprnd_info->first_dt == vect_constant_def)
!&& (dt == vect_external_def
!|| dt == vect_constant_def)))
!|| !types_compatible_p (oprnd_info->first_op_type,
!  TREE_TYPE (oprnd
{
  if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
*** vect_get_constant_vectors (tree op, slp_
*** 2471,2477 
   the lhs, so make sure the scalar is the right type

Re: [PATCH, AARCH64] Fix unrecognizable insn issue

2013-04-10 Thread Zhenqiang Chen

On 10 April 2013 16:05, Andrew Pinski  wrote:
> On Wed, Apr 10, 2013 at 1:02 AM, Zhenqiang Chen
>  wrote:
>> Hi,
>>
>> During expand, function aarch64_vcond_internal inverses some CMP, e.g.
>>
>>   a LE b -> b GE a
>>
>> But if "b" is "CONST0_RTX", "b GE a" will be an illegal insn.
>>
>> Refer https://bugs.launchpad.net/linaro-toolchain-binaries/+bug/1163942
>> for detail about the issue.
>>
>> The patch is to make "b" a register when inversing LE.
>>
>> Is it OK for trunk, 4.8 and arm/aarch64-4.7-branch?
>
>
> Can you add a testcase also?  It would be best to add one that is a
> reduced testcase.

Sorry. Due to licensing, I cannot post the code from SPEC 2006, and it
is hard for me to write a new program that reproduces it.

> Also can you expand on what is going on here?  There is not enough
> information in either this email or the bug report to figure out if
> this is the correct patch.

Here is more detail about the issue:

A VEC_COND_EXPR is generated during tree-level optimization, like

vect_var_.21092_16406 = VEC_COND_EXPR <vect_var_.21091_16404 <= { 0.0, 0.0, 0.0, 0.0 }, vect_var_.21086_16391, vect_var_.21091_16404>;

During expand, function aarch64_vcond_internal inverts "LE" to "GE", i.e.
it rewrites "vect_var_.21091_16404 <= { 0.0, 0.0, 0.0, 0.0 }" as
"{ 0.0, 0.0, 0.0, 0.0 } >= vect_var_.21091_16404".

The insn after expand is like

(insn 2909 2908 2910 165 (set (reg:V4SI 16777)
(unspec:V4SI [
(const_vector:V4SF [
(const_double:SF 0.0 [0x0.0p+0])
(const_double:SF 0.0 [0x0.0p+0])
(const_double:SF 0.0 [0x0.0p+0])
(const_double:SF 0.0 [0x0.0p+0])
])
(reg:V4SF 9594 [ vect_var_.21059 ])
] UNSPEC_CMGE))

But this is illegal. FCMGE has two formats:

FCMGE d, n, m
FCMGE d, n, #0

Both require "n" to be a register; "#0" can only be the last operand.
So when inverting LE to GE, function aarch64_vcond_internal should
make sure operands[5] is a register.
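
The operand constraint can be modelled in a few lines of C.  This is only a
sketch under stated assumptions: enum operand_kind, fcmge_operands_valid,
force_to_reg and le_as_swapped_ge_valid are hypothetical stand-ins for RTL
operands, insn recognition and force_reg; none of them is a GCC interface.

```c
#include <stdbool.h>

/* Hypothetical stand-in for an RTL operand: a register or the
   all-zero constant vector.  */
enum operand_kind { OP_REG, OP_ZERO };

/* FCMGE d, n, m and FCMGE d, n, #0: the first source "n" must be a
   register; only the last operand may be the zero immediate.  */
static bool
fcmge_operands_valid (enum operand_kind n, enum operand_kind m)
{
  (void) m;			/* Register or #0 are both accepted.  */
  return n == OP_REG;
}

/* Models the effect of force_reg: whatever the operand was, the
   result lives in a register.  */
static enum operand_kind
force_to_reg (enum operand_kind op)
{
  (void) op;
  return OP_REG;
}

/* "a LE b" is emitted as "b GE a".  With FIX set, b is forced into a
   register first, as in the patch.  Returns whether the emitted
   comparison has valid operands.  */
static bool
le_as_swapped_ge_valid (enum operand_kind a, enum operand_kind b, bool fix)
{
  if (fix && b != OP_REG)
    b = force_to_reg (b);
  return fcmge_operands_valid (b, a);
}
```

Without the fix, "a LE 0" turns into "0 GE a" with the constant in the
first source position, which the model rejects; with the fix the constant
is in a register before the operands are swapped.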

Thanks!
-Zhenqiang

>>
>> Thanks!
>> -Zhenqiang
>>
>> ChangeLog:
>> 2013-04-10  Zhenqiang Chen 
>>
>> * config/aarch64/aarch64-simd.md (aarch64_vcond_internal): Set
>> operands[5] to register when inversing LE.
>>
>> diff --git a/gcc/config/aarch64/aarch64-simd.md
>> b/gcc/config/aarch64/aarch64-simd.md
>> index 92dcfc0..d08d23a 100644
>> --- a/gcc/config/aarch64/aarch64-simd.md
>> +++ b/gcc/config/aarch64/aarch64-simd.md
>> @@ -1657,6 +1657,8 @@
>>complimentary_comparison = gen_aarch64_cmgt;
>>break;
>>  case LE:
>> +  if (!REG_P (operands[5]))
>> +   operands[5] = force_reg (mode, operands[5]);
>>  case UNLE:
>>inverse = 1;
>>/* Fall through.  */

Re: RFA: Fix tree-optimization/55524

2013-04-10 Thread Richard Biener

On Tue, Apr 9, 2013 at 6:24 PM, Joern Rennecke
 wrote:
> Quoting Richard Biener :
>
>> I don't see that.  It's merely a complication of optimal handling of
>> a * b +- c * d vs. just a * b +- c.  The pass does simple pattern matching
>> only, not doing a global optimal transform, so adding another special-case
>> is reasonable.  Special-casing just for single-use 2nd multiplication
>> simplifies the cases for example.
>
>
> I have attached a version of the patch that uses this simpler test.
> Currently bootstrapping / regtesting on i686-pc-linux-gnu .

Ok if the testing succeeds.

Thanks,
Richard.

>
> gcc:
> 2013-04-09  Joern Rennecke 
>
> PR tree-optimization/55524
> * tree-ssa-math-opts.c
> (convert_mult_to_fma): Don't use an fms construct
> when we don't have an fms operation, but fnma, and it looks
> likely that we'll be able to use the latter.
>
> gcc/testsuite:
> 2013-04-09  Joern Rennecke 
>
> PR tree-optimization/55524
> * gcc.target/epiphany/fnma-1.c: New test.
>
> Index: testsuite/gcc.target/epiphany/fnma-1.c
> ===
> --- testsuite/gcc.target/epiphany/fnma-1.c  (revision 0)
> +++ testsuite/gcc.target/epiphany/fnma-1.c  (working copy)
> @@ -0,0 +1,9 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +/* { dg-final { scan-assembler-times "fmsub\[ \ta-zA-Z0-9\]*," 1 } } */
> +
> +float
> +f (float ar, float ai, float br, float bi)
> +{
> +  return ar * br - ai * bi;
> +}
> Index: tree-ssa-math-opts.c
> ===
> --- tree-ssa-math-opts.c(revision 197578)
> +++ tree-ssa-math-opts.c(working copy)
> @@ -2570,6 +2570,24 @@ convert_mult_to_fma (gimple mul_stmt, tr
>   return false;
> }
>
> +  /* If the subtrahend (gimple_assign_rhs2 (use_stmt)) is computed
> +by a MULT_EXPR that we'll visit later, we might be able to
> +get a more profitable match with fnma.
> +OTOH, if we don't, a negate / fma pair has likely lower latency
> +that a mult / subtract pair.  */
> +  if (use_code == MINUS_EXPR && !negate_p
> + && gimple_assign_rhs1 (use_stmt) == result
> + && optab_handler (fms_optab, TYPE_MODE (type)) == CODE_FOR_nothing
> + && optab_handler (fnma_optab, TYPE_MODE (type)) !=
> CODE_FOR_nothing)
> +   {
> + tree rhs2 = gimple_assign_rhs2 (use_stmt);
> + gimple stmt2 = SSA_NAME_DEF_STMT (rhs2);
> +
> + if (has_single_use (rhs2)
> + && gimple_assign_rhs_code (stmt2) == MULT_EXPR)
> +   return false;
> +   }
> +
>/* We can't handle a * b + a * b.  */
>if (gimple_assign_rhs1 (use_stmt) == gimple_assign_rhs2 (use_stmt))
> return false;
>
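
The condition added by the quoted patch can be restated as a single
predicate.  The following is a sketch for illustration only: struct
fma_candidate and defer_to_fnma are hypothetical, and each field just names
one of the tests that the patch performs on the gimple statements and the
optabs.

```c
#include <stdbool.h>

/* Hypothetical flattened view of what the new check looks at.  */
struct fma_candidate
{
  bool use_is_minus;		/* use_code == MINUS_EXPR && !negate_p  */
  bool mult_is_minuend;		/* rhs1 of the use is the mult result  */
  bool target_has_fms;		/* fms_optab has a handler  */
  bool target_has_fnma;		/* fnma_optab has a handler  */
  bool subtrahend_single_use;	/* has_single_use (rhs2)  */
  bool subtrahend_is_mult;	/* rhs2 is defined by a MULT_EXPR  */
};

/* True when the conversion should be skipped for this multiplication,
   so that the subtraction can later be combined with the other
   multiplication as an fnma.  */
static bool
defer_to_fnma (const struct fma_candidate *c)
{
  return c->use_is_minus
	 && c->mult_is_minuend
	 && !c->target_has_fms
	 && c->target_has_fnma
	 && c->subtrahend_single_use
	 && c->subtrahend_is_mult;
}
```

For "ar * br - ai * bi" on a target with fnma but no fms, every field is
set except target_has_fms, so the first multiplication is left alone.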

[PATCH] Vectorizer TLC, load permutation handling

2013-04-10 Thread Richard Biener


This splits out load permutation computation from SLP tree building.
It also removes the broken support for swapping mismatched operands.
If it ever triggers we'll ICE later because:

case vect_internal_def:
!   if (different_types)
! {
! oprnd0_info = (*oprnds_info)[0];
! oprnd1_info = (*oprnds_info)[0];
!   if (i == 0)
! oprnd1_info->def_stmts.quick_push (def_stmt);
!   else
! oprnd0_info->def_stmts.quick_push (def_stmt);
! }

pushes to the same operand vector twice ...  The cases this
tries to handle should all be canonicalized by reassoc earlier.
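
The aliasing problem is easy to reproduce outside the vectorizer.  Below is
a minimal sketch, assuming a hypothetical struct oprnd_info with a small
fixed-size vector in place of slp_oprnd_info; push and push_swapped_buggy
are made-up names, and the remark that index 1 was intended is a guess.

```c
#include <stddef.h>

#define MAX_STMTS 4

/* Hypothetical stand-in for slp_oprnd_info: a small vector of
   statement ids.  */
struct oprnd_info
{
  int def_stmts[MAX_STMTS];
  size_t len;
};

static void
push (struct oprnd_info *info, int stmt)
{
  if (info->len < MAX_STMTS)
    info->def_stmts[info->len++] = stmt;
}

/* Mirrors the removed code: both pointers are taken from index 0, so
   the "swapped" pushes for operand 0 and operand 1 land in the same
   vector.  */
static void
push_swapped_buggy (struct oprnd_info infos[2], int i, int stmt)
{
  struct oprnd_info *oprnd0_info = &infos[0];
  struct oprnd_info *oprnd1_info = &infos[0];	/* Presumably meant [1].  */

  if (i == 0)
    push (oprnd1_info, stmt);
  else
    push (oprnd0_info, stmt);
}
```

After one call for each operand index, infos[0] holds both statements and
infos[1] holds none, which leaves the two operand vectors with mismatched
lengths.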

I am going to reinstate more complete support for handling
commutative operations in the next patch.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2013-04-10  Richard Biener  

* tree-vect-slp.c (vect_get_and_check_slp_defs): Remove
broken code swapping operands.
(vect_build_slp_tree): Do not compute load permutations here.
(vect_analyze_slp_instance): Compute load permutations here,
after building the SLP tree.

Index: gcc/tree-vect-slp.c
===
*** gcc/tree-vect-slp.c (revision 197635)
--- gcc/tree-vect-slp.c (working copy)
*** vect_get_and_check_slp_defs (loop_vec_in
*** 204,218 
  {
tree oprnd;
unsigned int i, number_of_oprnds;
!   tree def, def_op0 = NULL_TREE;
gimple def_stmt;
enum vect_def_type dt = vect_uninitialized_def;
-   enum vect_def_type dt_op0 = vect_uninitialized_def;
struct loop *loop = NULL;
-   enum tree_code rhs_code;
-   bool different_types = false;
bool pattern = false;
!   slp_oprnd_info oprnd_info, oprnd0_info, oprnd1_info;
int op_idx = 1;
tree compare_rhs = NULL_TREE;
  
--- 204,215 
  {
tree oprnd;
unsigned int i, number_of_oprnds;
!   tree def;
gimple def_stmt;
enum vect_def_type dt = vect_uninitialized_def;
struct loop *loop = NULL;
bool pattern = false;
!   slp_oprnd_info oprnd_info;
int op_idx = 1;
tree compare_rhs = NULL_TREE;
  
*** vect_get_and_check_slp_defs (loop_vec_in
*** 334,345 
oprnd_info->first_def_type = NULL_TREE;
oprnd_info->first_const_oprnd = oprnd;
  }
- 
- if (i == 0)
-   {
- def_op0 = def;
- dt_op0 = dt;
-   }
}
else
{
--- 331,336 
*** vect_get_and_check_slp_defs (loop_vec_in
*** 357,413 
   TREE_TYPE (def
   || (!def
   && !types_compatible_p (TREE_TYPE 
(oprnd_info->first_const_oprnd),
!  TREE_TYPE (oprnd)))
!  || different_types)
{
! if (number_of_oprnds != 2)
!   {
! if (dump_enabled_p ())
!   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
!"Build SLP failed: different types ");
! 
! return false;
! }
! 
! /* Try to swap operands in case of binary operation.  */
!   if (i == 0)
! different_types = true;
!   else
!   {
! oprnd0_info = (*oprnds_info)[0];
! if (is_gimple_assign (stmt)
! && (rhs_code = gimple_assign_rhs_code (stmt))
! && TREE_CODE_CLASS (rhs_code) == tcc_binary
! && commutative_tree_code (rhs_code)
! && oprnd0_info->first_dt == dt
! && oprnd_info->first_dt == dt_op0
! && def_op0 && def
! && !(oprnd0_info->first_def_type
!  && !types_compatible_p (oprnd0_info->first_def_type,
!  TREE_TYPE (def)))
!   && !(oprnd_info->first_def_type
!&& !types_compatible_p (oprnd_info->first_def_type,
!TREE_TYPE (def_op0
! {
!   if (dump_enabled_p ())
!   {
! dump_printf_loc (MSG_NOTE, vect_location,
!  "Swapping operands of ");
! dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt, 0);
!   }
! 
! swap_tree_operands (stmt, gimple_assign_rhs1_ptr (stmt),
! gimple_assign_rhs2_ptr (stmt));
!   }
!   else
! {
! if (dump_enabled_p ())
!   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
!"Build SLP failed: differen

Re: [PATCH, AARCH64] Fix unrecognizable insn issue

2013-04-10 Thread Andrew Pinski

On Wed, Apr 10, 2013 at 1:02 AM, Zhenqiang Chen
 wrote:
> Hi,
>
> During expand, function aarch64_vcond_internal inverts some comparisons, e.g.
>
>   a LE b -> b GE a
>
> But if "b" is "CONST0_RTX", "b GE a" will be an illegal insn.
>
> Refer to https://bugs.launchpad.net/linaro-toolchain-binaries/+bug/1163942
> for details about the issue.
>
> The patch makes "b" a register when inverting LE.
>
> Is it OK for trunk, 4.8 and arm/aarch64-4.7-branch?


Can you also add a testcase?  It would be best to add one that is a
reduced testcase.
Also, can you expand on what is going on here?  There is not enough
information in either this email or the bug report to figure out if
this is the correct patch.

Thanks,
Andrew Pinski

>
> Thanks!
> -Zhenqiang
>
> ChangeLog:
> 2013-04-10  Zhenqiang Chen 
>
> * config/aarch64/aarch64-simd.md (aarch64_vcond_internal): Set
> operands[5] to register when inversing LE.
>
> diff --git a/gcc/config/aarch64/aarch64-simd.md
> b/gcc/config/aarch64/aarch64-simd.md
> index 92dcfc0..d08d23a 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -1657,6 +1657,8 @@
>complimentary_comparison = gen_aarch64_cmgt;
>break;
>  case LE:
> +  if (!REG_P (operands[5]))
> +   operands[5] = force_reg (mode, operands[5]);
>  case UNLE:
>inverse = 1;
>/* Fall through.  */



[PATCH, AARCH64] Fix unrecognizable insn issue

2013-04-10 Thread Zhenqiang Chen

Hi,

During expand, function aarch64_vcond_internal inverts some comparisons, e.g.

  a LE b -> b GE a

But if "b" is "CONST0_RTX", "b GE a" will be an illegal insn.

Refer to https://bugs.launchpad.net/linaro-toolchain-binaries/+bug/1163942
for details about the issue.

The patch makes "b" a register when inverting LE.

Is it OK for trunk, 4.8 and arm/aarch64-4.7-branch?

Thanks!
-Zhenqiang

ChangeLog:
2013-04-10  Zhenqiang Chen 

* config/aarch64/aarch64-simd.md (aarch64_vcond_internal): Set
operands[5] to register when inversing LE.

diff --git a/gcc/config/aarch64/aarch64-simd.md
b/gcc/config/aarch64/aarch64-simd.md
index 92dcfc0..d08d23a 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1657,6 +1657,8 @@
   complimentary_comparison = gen_aarch64_cmgt;
   break;
 case LE:
+  if (!REG_P (operands[5]))
+   operands[5] = force_reg (mode, operands[5]);
 case UNLE:
   inverse = 1;
   /* Fall through.  */
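
The following is only a toy model of the situation described above, not GCC
code.  It assumes (this is an assumption, not something stated in the patch)
that the comparison pattern requires its first operand to be a register, so
that rewriting "a LE b" as "b GE a" with b being constant zero produces
something the pattern cannot match.  Operand, recognizable, force_into_reg
and emit_le_as_swapped_ge are hypothetical names; force_into_reg merely stands
in for the role force_reg plays in the patch.

```cpp
#include <string>

// Toy operand: either a register or the constant zero.
// Hypothetical type; it is not GCC's rtx.
struct Operand {
  bool is_reg;       // true for a register, false for constant zero
  std::string name;  // printable name, for illustration only
};

// Assumed constraint of the toy comparison pattern: the first operand
// must be a register.  The second operand is not restricted here.
bool recognizable(const Operand &first, const Operand &second) {
  (void)second;
  return first.is_reg;
}

// Stand-in for force_reg: a non-register operand is copied into a
// fresh temporary register, a register is returned unchanged.
Operand force_into_reg(const Operand &op) {
  if (op.is_reg)
    return op;
  Operand tmp = { true, "tmp" };
  return tmp;
}

// Rewrite "a LE b" as "b GE a" and report whether the swapped form is
// recognizable.  With force_b set, b is made a register before the swap,
// which mirrors the idea of the patch.
bool emit_le_as_swapped_ge(Operand a, Operand b, bool force_b) {
  if (force_b)
    b = force_into_reg(b);
  return recognizable(b, a);
}
```

In this model, swapping a register against constant zero is rejected unless
the constant is first moved into a register.  Whether that matches the real
aarch64 patterns is exactly what a reduced testcase would have to show.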

Re: [i386] Replace builtins with vector extensions

2013-04-10 Thread Richard Biener

On Tue, Apr 9, 2013 at 9:15 PM, Marc Glisse  wrote:
> On Tue, 9 Apr 2013, Marc Glisse wrote:
>
>> On Tue, 9 Apr 2013, Richard Biener wrote:
>>
>>> I seem to remember discussion in the PR(s) that the intrinsics should
>>> (and do for other compilers) expand to the desired instructions even when
>>> the corresponding instruction set is disabled.
>>
>>
>> emmintrin.h starts with:
>> #ifndef __SSE2__
>> # error "SSE2 instruction set not enabled"
>
>
> Oh, re-reading your post, it looks like you mean we should change the
> current behavior, not just avoid regressions...
>
> My opinion on the intrinsics is that they are the portable way to use
> vectors on x86, but they are not equivalent to asm (which people should use
> if they don't want the compiler looking at their code). Knowingly generating
> SSE code with -mno-sse is not very appealing.
>
> However, the arguments in:
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56298
> make sense. I guess I'll forget about this patch.

Note that to fully support emitting intrinsics "correctly" even without -msse,
x86-specific builtins need to be used, and they need to conditionally expand
to either UNSPECs (if the required instruction set / modes are not available)
or regular RTL (in which case they can also be folded to generic GIMPLE
earlier).
A complication is register allocation, which would need to understand how to
allocate registers for the UNSPECs - even if some of the modes would not
be "available".  So it's indeed a mess ...

That said, folding of the x86 builtins to GIMPLE looks like a more viable
approach that would not interfere too much with any possible route we would
go here.
As suggested previously, please add a new target hook with the same interface
as fold_stmt in case you want to work on this.

Thanks,
Richard.

> --
> Marc Glisse

Re: Comments on the suggestion to use infinite precision math for wide int.

2013-04-10 Thread Richard Biener

On Tue, Apr 9, 2013 at 5:36 PM, Lawrence Crowl  wrote:
>
> On Apr 9, 2013 2:02 AM, "Richard Biener"  wrote:
>>
>> On Mon, Apr 8, 2013 at 10:39 PM, Lawrence Crowl  wrote:
>> > On 4/8/13, Kenneth Zadeck  wrote:
>> >> The other problem, which i invite you to use the full power of
>> >> your c++ sorcery on, is the one where defining an operator so
>> >> that wide-int + unsigned hwi is either rejected or properly
>> >> zero extended.  If you can do this, I will go along with
>> >> your suggestion that the internal rep should be sign extended.
>> >> Saying that constants are always sign extended seems ok, but there
>> >> are a huge number of places where we convert unsigned hwis as
>> >> the second operand and i do not want that to be a trap.  I went
>> >> thru a round of this, where i did not post the patch because i
>> >> could not make this work.  And the number of places where you
>> >> want to use an hwi as the second operand dwarfs the number of
>> >> places where you want to use a small integer constant.
>> >
>> > You can use overloading, as in the following, which actually ignores
>> > handling the sign in the representation.
>> >
>> > class number {
>> > unsigned int rep1;
>> > int representation;
>> > public:
>> > number(int arg) : representation(arg) {}
>> > number(unsigned int arg) : representation(arg) {}
>> > friend number operator+(number, int);
>> > friend number operator+(number, unsigned int);
>> > friend number operator+(int, number);
>> > friend number operator+(unsigned int, number);
>> > };
>> >
>> > number operator+(number n, int si) {
>> > return n.representation + si;
>> > }
>> >
>> > number operator+(number n, unsigned int ui) {
>> > return n.representation + ui;
>> > }
>> >
>> > number operator+(int si, number n) {
>> > return n.representation + si;
>> > }
>> >
>> > number operator+(unsigned int ui, number n) {
>> > return n.representation + ui;
>> > }
>>
>> That does not work for types larger than int/unsigned int as HOST_WIDE_INT
>> usually is (it's long / unsigned long).  When you pass an int or unsigned
>> int to
>>
>> number operator+(unsigned long ui, number n);
>> number operator+(long ui, number n);
>>
>> you get an ambiguity.  You can "fix" that by relying on template argument
>> deduction and specialization instead of on overloading and integer
>> conversion rules.
>
> Ah, I hadn't quite gotten the point. This problem is being fixed in the
> standard, but that won't help GCC anytime soon.
>
>>
>> > If the argument type is of a template type parameter, then
>> > you can test the template type via
>> >
>> > if (std::is_signed<T>::value)
>> >    // sign extend
>> > else
>> >    // zero extend
>> >
>> > See http://www.cplusplus.com/reference/type_traits/is_signed/.
>>
>> Yes, if we want to use the standard library.  For what integer types
>> is std::is_signed required to be implemented in C++98 (long long?)?
>
> It is in C++03/TR1, which is our base requirement. Otherwise, we can test
> ~(T)0<(T)0.

Yeah, I think we want to test ~(T)0<(T)0 here.  Relying on C++03/TR1 is
too obscure if there is an easy workaround.

Richard.

>> Consider non-GCC host compilers.
>>
>> Richard.
>>
>> > If you want to handle non-builtin types that are signed or unsigned,
>> > then you need to add a specialization for is_signed.
>> >
>> > --
>> > Lawrence Crowl
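
The two ideas from this thread (let template argument deduction pick the
operand type instead of relying on overloads, and decide between sign and
zero extension with a ~(T)0<(T)0 style test) can be sketched as below.  This
is a minimal illustration under stated assumptions, not the wide-int
implementation: wide2, is_signed_int and extend are hypothetical names, and
wide2 is just a two-word stand-in.  One detail goes beyond what was written
above: the result of ~(T)0 is cast back to T, because for types narrower than
int integer promotion would otherwise make ~(T)0 a negative int even when T
is unsigned.

```cpp
// Hypothetical two-word integer; it is not GCC's wide_int.
struct wide2 {
  unsigned long long lo;  // low word
  unsigned long long hi;  // high word, filled by sign or zero extension
};

// Signedness test that needs no standard library support.  For int-sized
// and wider types this is the ~(T)0 < (T)0 test; the extra cast back to T
// keeps it correct for types narrower than int.
template <typename T>
struct is_signed_int {
  enum { value = (T)(~(T)0) < (T)0 };
};

// Widen v to two words: sign extend signed types, zero extend unsigned ones.
template <typename T>
wide2 extend(T v) {
  wide2 w;
  w.lo = (unsigned long long)v;  // modular conversion keeps the low bits
  w.hi = (is_signed_int<T>::value && v < (T)0) ? ~0ULL : 0ULL;
  return w;
}

// T is deduced from the argument, so int, unsigned int, long and
// unsigned long each bind exactly and no overload ambiguity can arise.
template <typename T>
wide2 operator+(wide2 a, T b) {
  wide2 e = extend(b);
  wide2 r;
  r.lo = a.lo + e.lo;
  r.hi = a.hi + e.hi + (r.lo < a.lo ? 1ULL : 0ULL);  // carry out of low word
  return r;
}
```

With this sketch, adding -1 to a zero wide2 sets both words to all ones, while
adding 0xFFFFFFFFu leaves the high word zero, so an unsigned second operand is
zero extended rather than silently sign extended.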

Add myself to MAINTAINERS as Write After Approval

2013-04-10 Thread Chung-Ju Wu

Adding myself to the list of members in "Write After Approval" section.


Index: ChangeLog
===
--- ChangeLog   (revision 197662)
+++ ChangeLog   (working copy)
@@ -1,3 +1,7 @@
+2013-04-10  Chung-Ju Wu  
+
+   * MAINTAINERS (Write After Approval): Add myself.
+
 2013-03-30  Matthias Klose  

* Makefile.def (target_modules): Don't install libffi.
Index: MAINTAINERS
===
--- MAINTAINERS (revision 197662)
+++ MAINTAINERS (working copy)
@@ -548,6 +548,7 @@
 Ollie Wild a...@google.com
 Kevin Williams kevin.willi...@inria.fr
 Carlo Wood ca...@alinoe.com
+Chung-Ju Wu    jasonw...@gmail.com
 Le-Chun Wu l...@google.com
 Mingjie Xing   mingjie.x...@gmail.com
 Canqun Yang    can...@nudt.edu.cn


Best regards,
jasonwucj

Re: [Debug, Fortran] RFC patch for DW_TAG_namelist (PR fortran/37132)

2013-04-10 Thread Jakub Jelinek

On Wed, Apr 10, 2013 at 12:17:31AM +0200, Tobias Burnus wrote:
> +/* Output debug information for namelists.   */
> +
> +void
> +dwarf2out_namelist_decl (const char *name, tree context,
> +  vec *item_decls)
> +{
> +  dw_die_ref scope_die, nml_die, nml_item_die, nml_item_ref_die;
> +  tree item;
> +  int i;
> +
> +  if (debug_info_level <= DINFO_LEVEL_TERSE)
> +return;
> +
> +  if (!(dwarf_version >= 2))
> +return;

Just a nit: GCC only supports DWARF {2,3,4} right now.  DWARF1 support used
to be done using a different source file that is long removed, and as
DW_TAG_namelist* is already in DWARF2, there is no point in testing whether
dwarf_version >= 2; it always is.

Also, if you are including a new header in debug.h, you need to adjust
dependencies in Makefile.in.  As it didn't have any includes before,
replace all occurrences of debug.h with $(DEBUG_H) and add
DEBUG_H = debug.h $(VEC_H)
somewhere in between lines for other headers.

Jakub
