date:20150923

[Bug target/67391] [SH] Convert clrt addc to normal add insn

2015-09-23 Thread olegendo at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67391

--- Comment #7 from Oleg Endo  ---
(In reply to Kazumoto Kojima from comment #6)
> Test completed with no new failures on sh4-unknown-linux-gnu.

Thanks!

[Bug rtl-optimization/66790] Invalid uninitialized register handling in REE

2015-09-23 Thread bernds at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66790

--- Comment #36 from Bernd Schmidt  ---
This looks better. I still don't quite understand why you're treating
MUST_CLOBBER and MAY_CLOBBER defs differently in simulate. It looks like a
MUST_CLOBBER produces a bit in gen which I think is not what is wanted.

Anything wrong with writing this simply as follows?

  if (DF_REF_FLAGS_IS_SET (def, DF_REF_MUST_CLOBBER | DF_REF_MAY_CLOBBER))
    {
      bitmap_set_bit (kill, regno);
      bitmap_clear_bit (gen, regno);
    }
  /* In the worst case, partial and conditional defs can leave bits
     uninitialized, so assume they do not change anything.  */
  else if (!DF_REF_FLAGS_IS_SET (def, DF_REF_PARTIAL | DF_REF_CONDITIONAL))
    {
      bitmap_set_bit (gen, regno);
      bitmap_clear_bit (kill, regno);
    }

And, not as a requirement for your patch, but as a point for discussion - do we
want a special all_ones_bitmap that doesn't take up memory for purposes like
this? It would add two additional tests to each bitmap_{and,ior} operation.
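
The classification suggested above can be tried outside the compiler. The following is only a sketch under stated assumptions: the DEF_* flags, struct mir_sets and mir_simulate_def are hypothetical stand-ins for the DF_REF_* flags and the gen/kill bitmaps, and a 32-bit word replaces GCC's bitmap type.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for GCC's DF_REF_* flags; the values are
   arbitrary and chosen only for this sketch.  */
enum def_flags
{
  DEF_MUST_CLOBBER = 1 << 0,
  DEF_MAY_CLOBBER  = 1 << 1,
  DEF_PARTIAL      = 1 << 2,
  DEF_CONDITIONAL  = 1 << 3
};

/* One bit per register.  A bit in GEN means "initialized by this block",
   a bit in KILL means "possibly left uninitialized by this block".  */
struct mir_sets
{
  uint32_t gen;
  uint32_t kill;
};

/* Apply one definition of register REGNO carrying FLAGS to the sets.  */
static void
mir_simulate_def (struct mir_sets *s, unsigned regno, unsigned flags)
{
  uint32_t bit;

  assert (regno < 32);
  bit = UINT32_C (1) << regno;

  if (flags & (DEF_MUST_CLOBBER | DEF_MAY_CLOBBER))
    {
      /* Any clobber may leave the register uninitialized.  */
      s->kill |= bit;
      s->gen &= ~bit;
    }
  else if (!(flags & (DEF_PARTIAL | DEF_CONDITIONAL)))
    {
      /* A full, unconditional definition initializes the register.  */
      s->gen |= bit;
      s->kill &= ~bit;
    }
  /* Partial and conditional defs change neither set.  */
}
```

With this sketch, a plain def of register 3 sets its gen bit, a later partial def leaves both sets alone, and a later clobber moves the bit from gen to kill.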

Re: [PATCH c-family/49654/49655] reject invalid options in pragma diagnostic

2015-09-23 Thread Bernd Schmidt


On 09/22/2015 08:08 PM, Manuel López-Ibáñez wrote:

Use find_opt instead of linear search through options in
handle_pragma_diagnostic (PR 49654) and reject non-warning options and
options not valid for the current language (PR 49655).



+  /* option_string + 1 to skip the initial '-' */
+  unsigned int lang_mask = c_common_option_lang_mask () | CL_COMMON;
+  unsigned int option_index = find_opt (option_string + 1, lang_mask);


Swap the first two lines to have the comment in the right spot.


+  else if (!(cl_options[option_index].flags & lang_mask))
+    {
+      char * ok_langs = write_langs (cl_options[option_index].flags);
+      char * bad_lang = write_langs (c_common_option_lang_mask ());
+      warning_at (loc, OPT_Wpragmas,
+                  "option %qs is valid for %s but not for %s",
+                  option_string, ok_langs, bad_lang);
+      free (ok_langs);
+      free (bad_lang);
+      return;
+    }


Slightly surprising, but I checked and find_opt is documented to return 
an option for a different front end if it can't find a valid one 
matching lang_mask.


Patch is ok.


Bernd
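
The reason the extra language check is needed can be shown with a toy lookup that behaves like the find_opt documentation described above: it prefers an entry valid for lang_mask but falls back to an entry for another front end. Everything here is hypothetical (the LANG_* bits, the three-entry table, lookup_option, check_pragma_option), and the toy search is linear, unlike the real find_opt.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical language bits, loosely modeled on GCC's CL_* masks.  */
#define LANG_C      (1u << 0)
#define LANG_CXX    (1u << 1)
#define LANG_COMMON (1u << 2)

struct option
{
  const char *name;
  unsigned flags;
};

/* Illustrative table only; not GCC's cl_options.  */
static const struct option options[] =
{
  { "Wall",          LANG_COMMON },
  { "Wpointer-sign", LANG_C },
  { "Wreorder",      LANG_CXX }
};

#define N_OPTIONS (sizeof options / sizeof options[0])
#define NOT_FOUND N_OPTIONS

/* Return the index of NAME.  An entry valid for LANG_MASK wins, but an
   entry for another language is still returned if nothing better
   exists, so the caller has to check the flags itself.  */
static size_t
lookup_option (const char *name, unsigned lang_mask)
{
  size_t fallback = NOT_FOUND;

  for (size_t i = 0; i < N_OPTIONS; i++)
    if (strcmp (options[i].name, name) == 0)
      {
        if (options[i].flags & lang_mask)
          return i;
        fallback = i;
      }
  return fallback;
}

enum pragma_result { PRAGMA_OK, PRAGMA_UNKNOWN, PRAGMA_WRONG_LANG };

/* Classify OPTION_STRING (including its leading '-') for LANG_MASK.  */
static enum pragma_result
check_pragma_option (const char *option_string, unsigned lang_mask)
{
  size_t idx;

  assert (option_string[0] == '-');
  /* option_string + 1 skips the initial '-'.  */
  idx = lookup_option (option_string + 1, lang_mask);
  if (idx == NOT_FOUND)
    return PRAGMA_UNKNOWN;
  if (!(options[idx].flags & lang_mask))
    return PRAGMA_WRONG_LANG;
  return PRAGMA_OK;
}
```

In this sketch, "-Wreorder" looked up with a C mask is found but classified as PRAGMA_WRONG_LANG, which corresponds to the new warning path in the patch.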

[Bug driver/47785] GCC with -flto does not pass options to the assembler

2015-09-23 Thread dominiq at lps dot ens.fr

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=47785

--- Comment #7 from Dominique d'Humieres  ---
Another instance on x86_64-apple-darwin14 with Xcode 7

[Book15] f90/bug% gcc6 /opt/gcc/_clean/gcc/testsuite/gcc.dg/debug/pr41893-1.c
-gstabs1 -Wa,-Q -flto -fwhole-program -O
/opt/gcc/_clean/gcc/testsuite/gcc.dg/debug/pr41893-2.c
/var/folders/8q/sh_swgz96r7f5vnn08f7fxr0gn/T//ccJ7keUF.s:1:9: error:
unsupported directive '.stabs'
.stabs 
"/var/folders/8q/sh_swgz96r7f5vnn08f7fxr0gn/T//ccnRWDSD.ltrans0.o",100,0,0,Ltext0
^
/var/folders/8q/sh_swgz96r7f5vnn08f7fxr0gn/T//ccJ7keUF.s:4:9: error:
unsupported directive '.stabs'
.stabs  "gcc2_compiled.",60,0,0,0
^
/var/folders/8q/sh_swgz96r7f5vnn08f7fxr0gn/T//ccJ7keUF.s:8:9: error:
unsupported directive '.stabs'
.stabs 
"/opt/gcc/_clean/gcc/testsuite/gcc.dg/debug/pr41893-1.c",132,0,0,Ltext1
^
/var/folders/8q/sh_swgz96r7f5vnn08f7fxr0gn/T//ccJ7keUF.s:10:2: error:
unknown directive
.stabd  68,0,16
^
/var/folders/8q/sh_swgz96r7f5vnn08f7fxr0gn/T//ccJ7keUF.s:13:2: error:
unknown directive
.stabd  68,0,16
^
/var/folders/8q/sh_swgz96r7f5vnn08f7fxr0gn/T//ccJ7keUF.s:17:9: error:
unsupported directive '.stabs'
.stabs  "main:F(0,1)=r(0,1);-2147483648;2147483647;",36,0,0,_main
^
/var/folders/8q/sh_swgz96r7f5vnn08f7fxr0gn/T//ccJ7keUF.s:22:2: error:
unknown directive
.stabd  68,0,12
^
/var/folders/8q/sh_swgz96r7f5vnn08f7fxr0gn/T//ccJ7keUF.s:27:9: error:
unsupported directive '.stabs'
.stabs  "func1:F(0,2)=(0,2)",36,0,0,_func1
^
/var/folders/8q/sh_swgz96r7f5vnn08f7fxr0gn/T//ccJ7keUF.s:72:9: error:
unsupported directive '.stabs'
.stabs  "",100,0,0,Letext0
^
lto-wrapper: fatal error: gcc6 returned 1 exit status
compilation terminated.
collect2: fatal error: lto-wrapper returned 1 exit status
compilation terminated.

Re: [PATCH 0/5] RFC: Overhaul of diagnostics (v2)

2015-09-23 Thread Michael Matz

Hi,

On Tue, 22 Sep 2015, David Malcolm wrote:

> The drawback is that it could bloat the ad-hoc table.  Can the ad-hoc
> table ever get smaller, or does it only ever get inserted into?

It only ever grows.

> An idea I had is that we could stash short ranges directly into the 32 
> bits of location_t, by offsetting the per-column-bits somewhat.

It's certainly worth an experiment: let's say you restrict yourself to 
tokens less than 8 characters, you need an additional 3 bits (using one 
value, e.g. zero, as the escape value).  That leaves 20 bits for the line 
numbers (for the normal 8 bit columns), which might be enough for most 
single-file compilations.  For LTO compilation this often won't be enough.
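
The bit budget above can be written out as a sketch. This is a toy fixed layout and not how libcpp actually encodes location_t: it assumes one reserved top bit (the remaining bit of the 32 is not accounted for above, so reserving it is a guess), 20 line bits, 8 column bits and 3 token-length bits with 0 as the escape value. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Toy layout, high to low: 1 reserved bit, 20 line bits, 8 column
   bits, 3 token-length bits.  */
enum { LEN_BITS = 3, COL_BITS = 8, LINE_BITS = 20 };

/* Length value meaning "range does not fit, look it up elsewhere".  */
#define LEN_ESCAPE 0u

/* Pack LINE, COL and token length LEN into *OUT.  Return 0 if the line
   or column does not fit in its field, 1 otherwise.  */
static int
pack_loc (uint32_t line, uint32_t col, uint32_t len, uint32_t *out)
{
  uint32_t len_field;

  assert (out != 0);
  if (line >= (1u << LINE_BITS) || col >= (1u << COL_BITS))
    return 0;
  /* Lengths 1..7 are stored inline; anything else uses the escape.  */
  len_field = (len >= 1 && len < (1u << LEN_BITS)) ? len : LEN_ESCAPE;
  *out = (line << (COL_BITS + LEN_BITS)) | (col << LEN_BITS) | len_field;
  return 1;
}

static uint32_t
loc_line (uint32_t loc)
{
  return loc >> (COL_BITS + LEN_BITS);
}

static uint32_t
loc_col (uint32_t loc)
{
  return (loc >> LEN_BITS) & ((1u << COL_BITS) - 1);
}

static uint32_t
loc_len (uint32_t loc)
{
  return loc & ((1u << LEN_BITS) - 1);
}
```

With 20 line bits the sketch runs out at line 1048575, which matches the concern that a concatenated LTO compilation could exceed the budget.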

> My plan is to investigate the impact these patches have on the time and 
> memory consumption of the compiler,

When you do so, make sure you're also measuring an LTO compilation with 
debug info of something big (firefox).  I know that we already had issues 
with the size of the linemap data in the past for these cases (probably 
when we added columns).

Ciao,
Michael.

Re: [RFC] PR tree-optimization/67628: Make tree ifcombine more symmetric and interactions with dom

2015-09-23 Thread Richard Biener

On Wed, 23 Sep 2015, Kyrill Tkachov wrote:

> 
> On 23/09/15 10:09, Pinski, Andrew wrote:
> > > On Sep 23, 2015, at 1:59 AM, Kyrill Tkachov 
> > > wrote:
> > > 
> > > 
> > > > On 22/09/15 20:31, Jeff Law wrote:
> > > > > On 09/22/2015 07:36 AM, Kyrill Tkachov wrote:
> > > > > Hi all,
> > > > > Unfortunately, I see a testsuite regression with this patch:
> > > > > FAIL: gcc.dg/pr66299-2.c scan-tree-dump-not optimized "<<"
> > > > > 
> > > > > The reduced part of that test is:
> > > > > void
> > > > > test1 (int x, unsigned u)
> > > > > {
> > > > > if ((1U << x) != 64
> > > > > || (2 << x) != u
> > > > > || (x << x) != 384
> > > > > || (3 << x) == 9
> > > > > || (x << 14) != 98304U
> > > > > || (1 << x) == 14
> > > > > || (3 << 2) != 12)
> > > > >   __builtin_abort ();
> > > > > }
> > > > > 
> > > > > The patched ifcombine pass works more or less as expected and produces
> > > > > fewer basic blocks.
> > > > > Before this patch a relevant part of the ifcombine dump for test1 is:
> > > > > ;;   basic block 2, loop depth 0, count 0, freq 1, maybe hot
> > > > > if (x_1(D) != 6)
> > > > >   goto ;
> > > > > else
> > > > >   goto ;
> > > > > 
> > > > > ;;   basic block 3, loop depth 0, count 0, freq 9996, maybe hot
> > > > > _2 = 2 << x_1(D);
> > > > > _3 = (unsigned intD.10) _2;
> > > > > if (_3 != u_4(D))
> > > > >   goto ;
> > > > > else
> > > > >   goto ;
> > > > > 
> > > > > 
> > > > > After this patch it is:
> > > > > ;;   basic block 2, loop depth 0, count 0, freq 1, maybe hot
> > > > > _2 = 2 << x_1(D);
> > > > > _3 = (unsigned intD.10) _2;
> > > > > _9 = _3 != u_4(D);
> > > > > _10 = x_1(D) != 6;
> > > > > _11 = _9 | _10;
> > > > > if (_11 != 0)
> > > > >   goto ;
> > > > > else
> > > > >   goto ;
> > > > > 
> > > > > The second form ends up generating worse codegen however, and the
> > > > > badness starts with the dom1 pass.
> > > > > In the unpatched case it manages to deduce that x must be 6 by the
> > > > > time
> > > > > it reaches basic block 3 and
> > > > > uses that information to eliminate the shift in "_2 = 2 << x_1(D)"
> > > > > from
> > > > > basic block 3
> > > > > In the patched case it is unable to make that call, I think because
> > > > > the
> > > > > x != 6 condition is IORed
> > > > > with another test.
> > > > > 
> > > > > I'm not familiar with the internals of the dom pass, so I'm not sure
> > > > > where to go looking for a fix for this.
> > > > > Is the ifcombine change a step in the right direction? If so, what
> > > > > would
> > > > > need to be done to fix the issue with
> > > > > the dom pass?
> > > > I don't see how you can reasonably fix this in DOM.  if _9 or _10 is
> > > > true, then _11 is true.  But we can't reasonably record any kind of
> > > > equivalence for _9 or _10 individually.
> > > > 
> > > > If the statement
> > > > _11 = _9 | _10;
> > > > 
> > > > Were changed to
> > > > 
> > > > _11 = _9 & _10;
> > > > 
> > > > Then we could record something useful about _9 and _10.
> > > > 
> > > > 
> > > > > I suppose what we want is to not combine basic blocks if the sequence
> > > > > and conditions of the basic blocks are
> > > > > such that dom can potentially exploit them, but how do we express
> > > > > that?
> > > > I don't think there's going to be a way to directly express that.  You
> > > > could essentially claim that TRUTH_OR is more expensive than TRUTH_AND
> > > > because of the impact on DOM, but that in and of itself may not resolve
> > > > the situation either.
> > > > 
> > > > I think the question we need to answer is whether or not your changes
> > > > are generally better, even if there's specific instances where they make
> > > > things worse.  If the benefits outweigh the negatives then we can xfail
> > > > that test.
> > > Ok, I'll investigate and benchmark some more.
> > > Andrew, this transformation to ifcombine (together with the restriction
> > > that the inner condition block
> > > has to be a single comparison) was added by you with r204194.
> > > Is there a particular reason for that restriction and why it is applied to
> > > the inner block and not either?
> > My reasoning at the time was there might be an "expensive" instruction or
> > one that might trap (I did not check to see if the other part of the code
> > was detecting that).
> > The outer block did not need any checks as we have something like
> > ...
> > If (a)
> >If (b)
> > 
> > Or
> > 
> > If (a)
> >Goto f
> > else if (b)
> >   
> > Else
> > {
> > F:
> > 
> > }
> > 
> > And there was no need to check what was before the if (a) part just what is
> > in between the two ifs.
> 
> Ah, because the code in outer_cond_bb would have to be executed anyway whether
> we perform the conversion or not, right?

All ifcombine transforms make the outer condition unconditionally 
true/false thus the check should

[Bug rtl-optimization/66790] Invalid uninitialized register handling in REE

2015-09-23 Thread bonzini at gnu dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66790

--- Comment #35 from Paolo Bonzini  ---
Comment on attachment 36377
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36377
Updated candidate patch

> + This problem determines which registers may be uninitialized. It first
> + assumes these are all initialized and then it eliminates the ones 
> reached
> + by paths without crossing a definition.  The IN bitmap is clear at first
> + (i.e. all registers are assumed not to be initialized) so don't consider
> + its value the first time.  */
> +  return bitmap_and_into (op1, op2);

Is this comment obsolete?  The IN bitmap is all set at first.  Otherwise looks
good.

Re: New post-LTO OpenACC pass

2015-09-23 Thread Bernd Schmidt


On 09/22/2015 05:16 PM, Nathan Sidwell wrote:

+  if (gimple_call_builtin_p (call, BUILT_IN_ACC_ON_DEVICE))
+    /* acc_on_device must be evaluated at compile time for
+       constant arguments.  */
+    {
+      oacc_xform_on_device (call);
+      rescan = true;
+    }


Is there a reason this is not done as part of pass_fold_builtins? (It 
looks like maybe adding this to fold_call_stmt in builtins.c would be 
sufficient too).



Bernd

Re: [PATCH, i386, AVX-512] Fix iterator for k, introduce kshift[lr][bwdq].

2015-09-23 Thread Kirill Yukhin

Hello,
On 22 Sep 18:14, Kirill Yukhin wrote:
> Hello,
> Patch in the bottom fixes iterator for k insns
> since QI mode is only available for AVX-512DQ.
> 
> It also adds support for kshift[rl][bwdq]. These patterns
> will be used for mask load/store autogeneration, on which
> Ilya Enkovich is working.
> 
> gcc/
>   * config/i386/i386.md (define_code_attr mshift): New.
>   (define_mode_iterator SWI1248_AVX512BW): Rename ...
>   (SWI1248_AVX512BW): ... to this. Make QI enabled for TARGET_AVX512DQ
>   only.
>   (define_insn "*k"): Use new iterator name.
>   (define_insn "*3"): New.
> 
> Bootstrapped and regtest in progress
> 
> Is it ok for trunk (if regtest pass)?
Emit pattern was wrong (caught by Spec2k6 autogeneration).

Committed to main trunk as obvious.

gcc/
* config/i386/i386.md (define_insn "*3"): Fix
insn emit.

--
Thanks, K

commit 254e3b944ac96441544d36c438e92a9a09b963b1
Author: Kirill Yukhin 
Date:   Wed Sep 23 16:24:50 2015 +0300

AVX-512. Fix emit in '*3' pattern.

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index c0911d4..ba5ab32 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -9366,7 +9366,7 @@
(any_lshift:SWI1248_AVX512BWDQ (match_operand:SWI1248_AVX512BWDQ 1 
"register_operand" "k")
   (match_operand:QI 2 "immediate_operand" 
"i")))]
   "TARGET_AVX512F"
-  "k %2, %1, %0|%0, %1, %2"
+  "k\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "type" "msklog")
(set_attr "prefix" "vex")])

[Bug target/67391] [SH] Convert clrt addc to normal add insn

2015-09-23 Thread olegendo at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67391

--- Comment #10 from Oleg Endo  ---
The core issue should be fixed.  I'd like to keep this PR open though for a
while.

[Bug objc/67694] New: ICE on returning undefined enum in must_pass_in_stack_var_size_or_pad

2015-09-23 Thread miyuki at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67694

Bug ID: 67694
   Summary: ICE on returning undefined enum in
must_pass_in_stack_var_size_or_pad
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Keywords: ice-on-invalid-code
  Severity: normal
  Priority: P3
 Component: objc
  Assignee: unassigned at gcc dot gnu.org
  Reporter: miyuki at gcc dot gnu.org
  Target Milestone: ---
Target: x86_64-linux-gnu

$ cat test.m 
@interface F
@end
@implementation F
- (enum e) e { return E; };
@end

$ cc1obj test.m 
 -[F e]
test.m: In function '-[F e]':
test.m:4:9: internal compiler error: Segmentation fault
 - (enum e) e { return E; };
 ^
0xaf3ecf crash_signal
/home/miyuki/gcc/src/gcc/toplev.c:353
0x730abf must_pass_in_stack_var_size_or_pad(machine_mode, tree_node const*)
/home/miyuki/gcc/src/gcc/calls.c:5068
0xdeb2af ix86_must_pass_in_stack
/home/miyuki/gcc/src/gcc/config/i386/i386.c:6301
0xdfb530 classify_argument
/home/miyuki/gcc/src/gcc/config/i386/i386.c:6880
0xdfc8e9 examine_argument
/home/miyuki/gcc/src/gcc/config/i386/i386.c:7274
0xe05065 ix86_return_in_memory
/home/miyuki/gcc/src/gcc/config/i386/i386.c:8682
0x8963d5 aggregate_value_p(tree_node const*, tree_node const*)
/home/miyuki/gcc/src/gcc/function.c:2089
0x89cf3e allocate_struct_function(tree_node*, bool)
/home/miyuki/gcc/src/gcc/function.c:4989
0x6155af store_parm_decls()
/home/miyuki/gcc/src/gcc/c/c-decl.c:8866
0x5c3716 objc_start_function(tree_node*, tree_node*, tree_node*, c_arg_info*)
/home/miyuki/gcc/src/gcc/objc/objc-act.c:8630
0x5c7e89 really_start_method
/home/miyuki/gcc/src/gcc/objc/objc-act.c:8683
0x5c946f start_method_def
/home/miyuki/gcc/src/gcc/objc/objc-act.c:8398
0x5c946f objc_start_method_definition(bool, tree_node*, tree_node*, tree_node*)
/home/miyuki/gcc/src/gcc/objc/objc-act.c:2073
0x668ff0 c_parser_objc_method_definition
/home/miyuki/gcc/src/gcc/c/c-parser.c:8645
0x66f929 c_parser_translation_unit
/home/miyuki/gcc/src/gcc/c/c-parser.c:1323
0x66f929 c_parse_file()
/home/miyuki/gcc/src/gcc/c/c-parser.c:15509
0x6cb832 c_common_parse_file()
/home/miyuki/gcc/src/gcc/c-family/c-opts.c:1058
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.

I'm not sure whether the problem is in the Objective C FE or somewhere in
the ix86_return_in_memory target hook (and its callees), but this ICE does not
occur when compiling for i?86:

$ cc1obj -m32 test.m
 -[F e]
test.m: In function '-[F e]':
test.m:4:23: error: 'E' undeclared (first use in this function)
 - (enum e) e { return E; };
   ^
test.m:4:23: note: each undeclared identifier is reported only once for each
function it appears in

[Bug c/49654] Linear search through options in handle_pragma_diagnostic

2015-09-23 Thread manu at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49654

Manuel López-Ibáñez  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 CC||manu at gcc dot gnu.org
 Resolution|--- |FIXED

--- Comment #3 from Manuel López-Ibáñez  ---
Fixed in GCC 6.0

[Bug target/67439] [4.9/5/6 Regression] ICE: unrecognizable insn compiling arm-fp16 testcases with -march=armv7-a and -mrestrict-it

2015-09-23 Thread ktkachov at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67439

--- Comment #8 from ktkachov at gcc dot gnu.org ---
Author: ktkachov
Date: Wed Sep 23 10:36:48 2015
New Revision: 228039

URL: https://gcc.gnu.org/viewcvs?rev=228039&root=gcc&view=rev
Log:
[ARM] PR 67439: Allow matching of *arm32_movhf when -mrestrict-it is on

Backport from mainline
2015-09-10  Kyrylo Tkachov  

PR target/67439
* config/arm/arm.md (*arm32_movhf): Remove !arm_restrict_it from
predicate.  Set predicable_short_it attr to "no".

PR target/67439
* gcc.target/arm/pr67439_1.c: New test.


Added:
branches/gcc-5-branch/gcc/testsuite/gcc.target/arm/pr67439_1.c
Modified:
branches/gcc-5-branch/gcc/ChangeLog
branches/gcc-5-branch/gcc/config/arm/arm.md
branches/gcc-5-branch/gcc/testsuite/ChangeLog

Re: [gomp4, wip] remove references to ganglocal shared memory inside gcc

2015-09-23 Thread Thomas Schwinge

Hi!

On Fri, 18 Sep 2015 06:51:18 -0700, Cesar Philippidis  
wrote:
> On 09/18/2015 01:39 AM, Thomas Schwinge wrote:
> 
> > On Tue, 1 Sep 2015 18:29:55 +0200, Tom de Vries  
> > wrote:
> >> On 27/08/15 03:37, Cesar Philippidis wrote:
> >>> -  ctx->ganglocal_size_host = align_and_expand (_host, host_size, 
> >>> align);
> >>
> >> I suspect this caused a bootstrap failure (align_and_expand unused). 
> >> Worked-around as attached.

> > If I remember correctly, this has only ever been used in the "ganglocal"
> > implementation -- which is now gone.  So, should align_and_expand also be
> > elided (Cesar)?
> 
> Most likely. I probably overlooked it when I was working on that
> ganglocal removal patch. Can you remove it please? I'm already juggling
> a couple of patches right now.

Together with removal of printing the declarator for sdata, committed to
gomp-4_0-branch in r228038:

commit f5890b47c1b6f09134c4bfadcc7ece0d5403a1d7
Author: tschwinge 
Date:   Wed Sep 23 10:35:31 2015 +

More "ganglocal" cleanup

gcc/
* config/nvptx/nvptx.c (nvptx_file_start): Don't print declaration
of sdata.
* omp-low.c (align_and_expand): Remove function.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@228038 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog.gomp   |  6 ++
 gcc/config/nvptx/nvptx.c |  1 -
 gcc/omp-low.c| 15 ---
 3 files changed, 6 insertions(+), 16 deletions(-)

diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
index 21c6fa0..c66f80a 100644
--- gcc/ChangeLog.gomp
+++ gcc/ChangeLog.gomp
@@ -1,3 +1,9 @@
+2015-09-23  Thomas Schwinge  
+
+   * config/nvptx/nvptx.c (nvptx_file_start): Don't print declaration
+   of sdata.
+   * omp-low.c (align_and_expand): Remove function.
+
 2015-09-22  Cesar Philippidis  
 
* gimplify.c (oacc_default_clause): Inspect pointer types when
diff --git gcc/config/nvptx/nvptx.c gcc/config/nvptx/nvptx.c
index 5640e34..37b50a3 100644
--- gcc/config/nvptx/nvptx.c
+++ gcc/config/nvptx/nvptx.c
@@ -4063,7 +4063,6 @@ nvptx_file_start (void)
   else
 fputs ("\t.target\tsm_30\n", asm_out_file);
   fprintf (asm_out_file, "\t.address_size %d\n", GET_MODE_BITSIZE (Pmode));
-  fprintf (asm_out_file, "\t.extern .shared .u8 sdata[];\n");
   fputs ("// END PREAMBLE\n", asm_out_file);
 }
 
diff --git gcc/omp-low.c gcc/omp-low.c
index ee527d0..ec76096 100644
--- gcc/omp-low.c
+++ gcc/omp-low.c
@@ -1446,21 +1446,6 @@ omp_copy_decl (tree var, copy_body_data *cb)
   return error_mark_node;
 }
 
-/* Modify the old size *POLDSZ to align it up to ALIGN, and then return
-   a value with SIZE added to it.  */
-static tree ATTRIBUTE_UNUSED
-align_and_expand (tree *poldsz, tree size, unsigned int align)
-{
-  tree oldsz = *poldsz;
-  oldsz = fold_build2 (BIT_AND_EXPR, size_type_node,
-  fold_build2 (PLUS_EXPR, size_type_node,
-   oldsz, size_int (align - 1)),
-  fold_build1 (BIT_NOT_EXPR, size_type_node,
-   size_int (align - 1)));
-  *poldsz = oldsz;
-  return fold_build2 (PLUS_EXPR, size_type_node, oldsz, size);
-}
-
 /* Debugging dumps for parallel regions.  */
 void dump_omp_region (FILE *, struct omp_region *, int);
 void debug_omp_region (struct omp_region *);


Grüße,
 Thomas
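
For reference, the arithmetic of the removed align_and_expand helper can be reproduced on plain integers. This is a sketch only: align_and_expand_int is a hypothetical name, it works on size_t instead of trees, and it assumes a power-of-two alignment, which the mask in the removed code needs as well.

```c
#include <assert.h>
#include <stddef.h>

/* Round *POLDSZ up to a multiple of ALIGN, store the rounded value
   back, and return that value with SIZE added to it.  */
static size_t
align_and_expand_int (size_t *poldsz, size_t size, size_t align)
{
  size_t oldsz;

  /* The mask trick only works when ALIGN is a power of two.  */
  assert (align != 0 && (align & (align - 1)) == 0);

  oldsz = (*poldsz + (align - 1)) & ~(align - 1);
  *poldsz = oldsz;
  return oldsz + size;
}
```

Starting from an old size of 13 with an alignment of 8 and a size of 24, the old size becomes 16 and the returned value is 40.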



Re: [RFC] PR tree-optimization/67628: Make tree ifcombine more symmetric and interactions with dom

2015-09-23 Thread Kyrill Tkachov



On 23/09/15 11:10, Richard Biener wrote:

On Wed, 23 Sep 2015, Kyrill Tkachov wrote:


On 23/09/15 10:09, Pinski, Andrew wrote:

On Sep 23, 2015, at 1:59 AM, Kyrill Tkachov 
wrote:



On 22/09/15 20:31, Jeff Law wrote:

On 09/22/2015 07:36 AM, Kyrill Tkachov wrote:
Hi all,
Unfortunately, I see a testsuite regression with this patch:
FAIL: gcc.dg/pr66299-2.c scan-tree-dump-not optimized "<<"

The reduced part of that test is:
void
test1 (int x, unsigned u)
{
 if ((1U << x) != 64
 || (2 << x) != u
 || (x << x) != 384
 || (3 << x) == 9
 || (x << 14) != 98304U
 || (1 << x) == 14
 || (3 << 2) != 12)
   __builtin_abort ();
}

The patched ifcombine pass works more or less as expected and produces
fewer basic blocks.
Before this patch a relevant part of the ifcombine dump for test1 is:
;;   basic block 2, loop depth 0, count 0, freq 1, maybe hot
 if (x_1(D) != 6)
   goto ;
 else
   goto ;

;;   basic block 3, loop depth 0, count 0, freq 9996, maybe hot
 _2 = 2 << x_1(D);
 _3 = (unsigned intD.10) _2;
 if (_3 != u_4(D))
   goto ;
 else
   goto ;


After this patch it is:
;;   basic block 2, loop depth 0, count 0, freq 1, maybe hot
 _2 = 2 << x_1(D);
 _3 = (unsigned intD.10) _2;
 _9 = _3 != u_4(D);
 _10 = x_1(D) != 6;
 _11 = _9 | _10;
 if (_11 != 0)
   goto ;
 else
   goto ;

The second form ends up generating worse codegen however, and the
badness starts with the dom1 pass.
In the unpatched case it manages to deduce that x must be 6 by the
time
it reaches basic block 3 and
uses that information to eliminate the shift in "_2 = 2 << x_1(D)"
from
basic block 3
In the patched case it is unable to make that call, I think because
the
x != 6 condition is IORed
with another test.

I'm not familiar with the internals of the dom pass, so I'm not sure
where to go looking for a fix for this.
Is the ifcombine change a step in the right direction? If so, what
would
need to be done to fix the issue with
the dom pass?

I don't see how you can reasonably fix this in DOM.  if _9 or _10 is
true, then _11 is true.  But we can't reasonably record any kind of
equivalence for _9 or _10 individually.

If the statement
_11 = _9 | _10;

Were changed to

_11 = _9 & _10;

Then we could record something useful about _9 and _10.



I suppose what we want is to not combine basic blocks if the sequence
and conditions of the basic blocks are
such that dom can potentially exploit them, but how do we express
that?

I don't think there's going to be a way to directly express that.  You
could essentially claim that TRUTH_OR is more expensive than TRUTH_AND
because of the impact on DOM, but that in and of itself may not resolve
the situation either.

I think the question we need to answer is whether or not your changes
are generally better, even if there's specific instances where they make
things worse.  If the benefits outweigh the negatives then we can xfail
that test.

Ok, I'll investigate and benchmark some more.
Andrew, this transformation to ifcombine (together with the restriction
that the inner condition block
has to be a single comparison) was added by you with r204194.
Is there a particular reason for that restriction and why it is applied to
the inner block and not either?

My reasoning at the time was there might be an "expensive" instruction or
one that might trap (I did not check to see if the other part of the code
was detecting that).
The outer block did not need any checks as we have something like
...
If (a)
If (b)

Or

If (a)
Goto f
else if (b)
   
Else
{
F:

}

And there was no need to check what was before the if (a) part just what is
in between the two ifs.

Ah, because the code in outer_cond_bb would have to be executed anyway whether
we perform the conversion or not, right?

All ifcombine transforms make the outer condition unconditionally
true/false thus the check should have been on whether the outer
cond BB is "empty".  Which would solve your problem, right?


I'm not sure I follow. Why does the cond bb have to be empty?



Note that other transforms (bit test recognition) don't care (sth
we might want to fix?).

In general this needs a better cost function, maybe simply use
estimate_num_insns with speed estimates and compare against a
new --param.


Thanks, that looks like a starting point.
If we were to add some kind of costing check here, would we even need
the checks mentioned above? I don't think it will affect correctness
(the inner cond bb is checked for no side-effects before entering this
function).

Thanks,
Kyrill



Thanks,
Richard.


Thanks,
Kyrill


What I mean by expensive for an example is division or some function call.

Thanks,
Andrew



Thanks,
Kyrill




jeff
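
The two shapes discussed in this thread can be compared in plain C. This is a hedged source-level illustration and not the GIMPLE that ifcombine produces: aborts_nested and aborts_combined are hypothetical names covering only the first two tests of test1, and each returns 1 where test1 would call __builtin_abort.

```c
#include <assert.h>

/* Shape before the patch: the second test is only reached when x == 6,
   so a pass such as DOM can treat 2 << x as the constant 128 there.  */
static int
aborts_nested (int x, unsigned u)
{
  if (x != 6)
    return 1;
  /* Here x == 6 is known, so this is (unsigned) 128 != u.  */
  if ((unsigned) (2 << x) != u)
    return 1;
  return 0;
}

/* Shape after the patch: both conditions are computed unconditionally
   and IORed, so nothing is known about x when the shift is evaluated.  */
static int
aborts_combined (int x, unsigned u)
{
  int c9, c10;

  /* In C the shift count must be in range; this sketch restricts it.  */
  assert (x >= 0 && x < 30);

  c9 = (unsigned) (2 << x) != u;
  c10 = x != 6;
  return (c9 | c10) != 0;
}
```

Both functions agree on every in-range input, but only the nested one gives a later pass a block in which x == 6 holds on entry.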

[Bug c/48885] missed optimization with restrict qualifier?

2015-09-23 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=48885

Richard Biener  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2015-09-23
 Ever confirmed|0   |1

--- Comment #11 from Richard Biener  ---
It's just a matter of implementing the missing ???

/* Mark "other" loads and stores as belonging to CLIQUE and with
   base zero.  */

static bool
visit_loadstore (gimple *, tree base, tree ref, void *clique_)
{
  unsigned short clique = (uintptr_t)clique_;
  if (TREE_CODE (base) == MEM_REF
      || TREE_CODE (base) == TARGET_MEM_REF)
    {
      tree ptr = TREE_OPERAND (base, 0);
      if (TREE_CODE (ptr) == SSA_NAME)
        {
          /* ???  We need to make sure 'ptr' doesn't include any of
             the restrict tags in its points-to set.  */
          return false;

(well, we could handle default-defs without adding that).  Thus for the
particular testcase:

@@ -6952,7 +7047,8 @@ visit_loadstore (gimple *, tree base, tr
   || TREE_CODE (base) == TARGET_MEM_REF)
 {
   tree ptr = TREE_OPERAND (base, 0);
-  if (TREE_CODE (ptr) == SSA_NAME)
+  if (TREE_CODE (ptr) == SSA_NAME
+ && ! SSA_NAME_IS_DEFAULT_DEF (ptr))
{
  /* ???  We need to make sure 'ptr' doesn't include any of
 the restrict tags in its points-to set.  */

but we can of course do better.  Remember all 'restrict_var' we added
bases for and above lookup the points-to solution for 'ptr' and intersect
it with the restrict_var set.  If that's empty - fine - if not, we have
to continue failing.

I'm testing the above simple fix and will amend the comment.
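
The "do better" variant described above amounts to a set intersection test. A minimal sketch, assuming points-to sets are modeled as 64-bit masks with one bit per variable ID; var_set, var_set_add and ptr_avoids_restrict_vars are hypothetical names and not part of the points-to code in GCC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Each bit stands for one points-to variable ID (hypothetical
   encoding; the real solution uses bitmaps of variable IDs).  */
typedef uint64_t var_set;

/* Return S with variable ID added.  */
static var_set
var_set_add (var_set s, unsigned id)
{
  assert (id < 64);
  return s | (UINT64_C (1) << id);
}

/* Return true if the points-to set of a pointer shares no member with
   the remembered restrict variables, i.e. the access could be marked.
   A non-empty intersection means we have to continue failing.  */
static bool
ptr_avoids_restrict_vars (var_set points_to, var_set restrict_vars)
{
  return (points_to & restrict_vars) == 0;
}
```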

[SH][committed] Fix PR 67391

2015-09-23 Thread Oleg Endo

Hi,

The attached patch fixes PR 67391.  Some additional reg overlap checks were
added to the addsi3 patterns while making LRA on SH work, but not all of
them seem to be good.  Removing them seems to work just fine.
Tested on sh-elf (LRA enabled) with make -k check
RUNTESTFLAGS="--target_board=sh-sim
\{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"
and by Kaz on sh4-linux.

Committed to trunk as r228046 and to the GCC 5 branch as r228047.

Cheers,
Oleg

gcc/ChangeLog:
PR target/67391
* config/sh/sh.md (addsi3, *addsi3_compact): Don't check for overlapping
regs when matching the pattern.
Index: gcc/config/sh/sh.md
===
--- gcc/config/sh/sh.md	(revision 228020)
+++ gcc/config/sh/sh.md	(working copy)
@@ -2129,11 +2129,6 @@
 {
   if (TARGET_SHMEDIA)
 operands[1] = force_reg (SImode, operands[1]);
-  else if (! arith_operand (operands[2], SImode))
-{
-  if (reg_overlap_mentioned_p (operands[0], operands[1]))
-	FAIL;
-}
 })
 
 (define_insn "addsi3_media"
@@ -2172,10 +2167,7 @@
   [(set (match_operand:SI 0 "arith_reg_dest" "=r,")
 	(plus:SI (match_operand:SI 1 "arith_operand" "%0,r")
 		 (match_operand:SI 2 "arith_or_int_operand" "rI08,rn")))]
-  "TARGET_SH1
-   && ((rtx_equal_p (operands[0], operands[1])
-&& arith_operand (operands[2], SImode))
-   || ! reg_overlap_mentioned_p (operands[0], operands[1]))"
+  "TARGET_SH1"
   "@
 	add	%2,%0
 	#"

Re: New post-LTO OpenACC pass

2015-09-23 Thread Nathan Sidwell


On 09/23/15 06:59, Bernd Schmidt wrote:

On 09/22/2015 05:16 PM, Nathan Sidwell wrote:

+  if (gimple_call_builtin_p (call, BUILT_IN_ACC_ON_DEVICE))
+    /* acc_on_device must be evaluated at compile time for
+       constant arguments.  */
+    {
+      oacc_xform_on_device (call);
+      rescan = true;
+    }


Is there a reason this is not done as part of pass_fold_builtins? (It looks like
maybe adding this to fold_call_stmt in builtins.c would be sufficient too).


Perhaps it could be.  I'll need to check where  that pass happens.  Anyway, the 
main thrust of this patch is the new pass, which I thought might be easier to 
review with minimal additional  clutter.


nathan

[Bug rtl-optimization/66790] Invalid uninitialized register handling in REE

2015-09-23 Thread derodat at adacore dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66790

--- Comment #32 from Pierre-Marie de Rodat  ---
(In reply to Bernd Schmidt from comment #28)
> It is sufficient for OUT(3) to be all-zeros. And I don't think the
> LAST_CHANGE_AGE mechanism does anything to prevent it. Please try it
> out. I think you have to initialize your bitmaps correctly rather than
> rely on "visited".
Thank you very much for the reproducer: I confirm that the
LAST_CHANGE_AGE mechanism does not have the effect I thought. I just
updated my patch locally to initialize bitmaps to 1 for all registers
and thus remove the visited field: this fixes the issue you found.

> > (In reply to Bernd Schmidt from comment #14)
> > > I do have to say that I am still uncomfortable with changing RRE to
> 
> I did not write this.
Indeed: Kenneth Zadeck said this in comment #20. I made a pasto, sorry
for that!



(In reply to Paolo Bonzini from comment #29)
> BTW, are you sure that
> 
> +  if (DF_REF_FLAGS_IS_SET (def,
> +  DF_REF_PARTIAL | DF_REF_CONDITIONAL))
> +   /* All partial or conditional def
> +  seen are included in the gen set. */
> +   bitmap_set_bit (gen, regno);
> 
> ?  I would have thought they don't belong in any set, and on the
> contrary I would have thought that may-clobber definitions count as
> kills.
For partial and conditional defs in the context of MIR:

  * if the register was initialized, then it is still initialized
afterwards, whatever happens;

  * if the register was uninitialized, then in the case of partial def
there will still be bits uninitialized and in the case of
conditional def it is possible that the instruction leaves the
register uninitialized: in both cases there is a possibility to leave
part of the register uninitialized.

So I would agree with: they don't belong in any set.

While thinking about this, I also realized with REE in mind that since
we need a conservative computation for MIR, we must set KILL/clear GEN
for register refs with DF_REF_MAY_CLOBBER: it may leave the register
uninitialized.

(In reply to Paolo Bonzini from comment #31)
> Ah, I see now.  I think you're right that the DF_REF_MUST_CLOBBER case
> should also clear GEN in df_live_bb_local_compute.
> 
> However, regarding the "BTW" I am fairly sure now that
> df_live_bb_local_compute and the corresponding function for MIR should
> handle may-clobber and may-sets differently.  If you think of
> may-clobber and may-set as a diamond-shaped CFG:
> 
> […]
> 
> Then at the join point you have an "OR" for LIVE (so the clobber's
> KILL disappears and the set's GEN remains), and an "AND" for MIR.  For
> MIR the clobber's KILL remains and the set's GEN disappears.
Agreed, thank you! I have updated both MIR and LIVE in the light of
this. (bootstrapped and regtested on x86_64-linux)
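
The diamond-shaped argument quoted above can be checked with a small model. This is a sketch under assumptions: RegSet, join_any, join_all, may_set and may_clobber are hypothetical names, and a "may" definition is modelled as a two-armed diamond where one arm applies the def and the other bypasses it.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

constexpr std::size_t kNumRegs = 8;  // assumed register count for the sketch
using RegSet = std::bitset<kNumRegs>;

// Join for an "any path" problem such as LIVE: OR at the join point.
inline RegSet join_any(RegSet a, RegSet b) { return a | b; }

// Join for an "all paths" problem such as MIR: AND at the join point.
inline RegSet join_all(RegSet a, RegSet b) { return a & b; }

// May-set of REGNO: one arm sets the bit, the other arm leaves IN alone.
inline RegSet may_set(RegSet in, unsigned regno, bool all_paths) {
  assert(regno < kNumRegs);
  RegSet taken = in;
  taken.set(regno);
  return all_paths ? join_all(taken, in) : join_any(taken, in);
}

// May-clobber of REGNO: one arm clears the bit, the other leaves IN alone.
inline RegSet may_clobber(RegSet in, unsigned regno, bool all_paths) {
  assert(regno < kNumRegs);
  RegSet taken = in;
  taken.reset(regno);
  return all_paths ? join_all(taken, in) : join_any(taken, in);
}
```

In this model the OR join keeps the may-set's GEN and drops the may-clobber's KILL, while the AND join does the opposite, which matches the behaviour described for LIVE and MIR respectively.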

Re: [RFC, PR target/65105] Use vector instructions for scalar 64bit computations on 32bit target

2015-09-23 Thread Ilya Enkovich

On 14 Sep 17:50, Uros Bizjak wrote:
> 
> +(define_insn_and_split "*zext_doubleword"
> +  [(set (match_operand:DI 0 "register_operand" "=r")
> + (zero_extend:DI (match_operand:SWI24 1 "nonimmediate_operand" "rm")))]
> +  "!TARGET_64BIT && TARGET_STV && TARGET_SSE2"
> +  "#"
> +  "&& reload_completed && GENERAL_REG_P (operands[0])"
> +  [(set (match_dup 0) (zero_extend:SI (match_dup 1)))
> +   (set (match_dup 2) (const_int 0))]
> +  "split_double_mode (DImode, [0], 1, [0], [2]);")
> +
> +(define_insn_and_split "*zextqi_doubleword"
> +  [(set (match_operand:DI 0 "register_operand" "=r")
> + (zero_extend:DI (match_operand:QI 1 "nonimmediate_operand" "qm")))]
> +  "!TARGET_64BIT && TARGET_STV && TARGET_SSE2"
> +  "#"
> +  "&& reload_completed && GENERAL_REG_P (operands[0])"
> +  [(set (match_dup 0) (zero_extend:SI (match_dup 1)))
> +   (set (match_dup 2) (const_int 0))]
> +  "split_double_mode (DImode, [0], 1, [0], [2]);")
> +
> 
> Please put the above patterns together with other zero_extend
> patterns. You can also merge these two patterns using SWI124 mode
> iterator with  mode attribute as a register constraint. Also, no
> need to check for GENERAL_REG_P after reload, when "r" constraint is
> in effect:
> 
> (define_insn_and_split "*zext_doubleword"
>   [(set (match_operand:DI 0 "register_operand" "=r")
>  (zero_extend:DI (match_operand:SWI124 1 "nonimmediate_operand" "m")))]
>   "!TARGET_64BIT && TARGET_STV && TARGET_SSE2"
>   "#"
>   "&& reload_completed"
>   [(set (match_dup 0) (zero_extend:SI (match_dup 1)))
>(set (match_dup 2) (const_int 0))]
>   "split_double_mode (DImode, [0], 1, [0], [2]);")

The register constraint doesn't affect the split, and I need GENERAL_REG_P
to filter out the cases where other registers are used.

I merged the QI and HI cases of zext but made a separate pattern for the SI
case because it doesn't need a zero_extend in the resulting code.
Bootstrapped and regtested for x86_64-unknown-linux-gnu.
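
For readers unfamiliar with the split, the effect of the two resulting sets can be modelled in plain C++: the low 32-bit half receives the zero-extended source and the high half receives zero. This is only an illustrative model; DoubleWord, zext_doubleword and combine are hypothetical names, not part of the patch.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical model of a 64-bit value held as two 32-bit halves.
struct DoubleWord {
  std::uint32_t lo;
  std::uint32_t hi;
};

// Zero-extend a narrow unsigned value into a double word: the low half
// gets the zero-extended source, the high half is set to zero.  For a
// 32-bit source the low half is a plain copy, with no extension needed.
template <typename Narrow>
constexpr DoubleWord zext_doubleword(Narrow value) {
  static_assert(std::is_unsigned<Narrow>::value && sizeof(Narrow) <= 4,
                "source must be an unsigned type of at most 32 bits");
  return DoubleWord{static_cast<std::uint32_t>(value), 0u};
}

// Reassemble the two halves into one 64-bit value.
constexpr std::uint64_t combine(DoubleWord d) {
  return (static_cast<std::uint64_t>(d.hi) << 32) | d.lo;
}
```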

Thanks,
Ilya
--
gcc/

2015-09-23  Ilya Enkovich  

* config/i386/i386.c: Include dbgcnt.h.
(has_non_address_hard_reg): New.
(convertible_comparison_p): New.
(scalar_to_vector_candidate_p): New.
(remove_non_convertible_regs): New.
(scalar_chain): New.
(scalar_chain::scalar_chain): New.
(scalar_chain::~scalar_chain): New.
(scalar_chain::add_to_queue): New.
(scalar_chain::mark_dual_mode_def): New.
(scalar_chain::analyze_register_chain): New.
(scalar_chain::add_insn): New.
(scalar_chain::build): New.
(scalar_chain::compute_convert_gain): New.
(scalar_chain::replace_with_subreg): New.
(scalar_chain::replace_with_subreg_in_insn): New.
(scalar_chain::emit_conversion_insns): New.
(scalar_chain::make_vector_copies): New.
(scalar_chain::convert_reg): New.
(scalar_chain::convert_op): New.
(scalar_chain::convert_insn): New.
(scalar_chain::convert): New.
(convert_scalars_to_vector): New.
(pass_data_stv): New.
(pass_stv): New.
(make_pass_stv): New.
(ix86_option_override): Create and register stv pass.
(flag_opts): Add -mstv.
(ix86_option_override_internal): Likewise.
* config/i386/i386.md (SWIM1248x): New.
(*movdi_internal): Add xmm to mem alternative for TARGET_STV.
(and3): Use SWIM1248x iterator instead of SWIM.
(*anddi3_doubleword): New.
(*zext_doubleword): New.
(*zextsi_doubleword): New.
(3): Use SWIM1248x iterator instead of SWIM.
(*di3_doubleword): New.
* config/i386/i386.opt (mstv): New.
* dbgcnt.def (stv_conversion): New.

gcc/testsuite/

2015-09-23  Ilya Enkovich  

* gcc.target/i386/pr65105-1.c: New.
* gcc.target/i386/pr65105-2.c: New.
* gcc.target/i386/pr65105-3.c: New.
* gcc.target/i386/pr65105-4.C: New.
* gcc.dg/lower-subreg-1.c: Add -mno-stv options for ia32.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index d547cfd..2663f85 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -87,6 +87,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-iterator.h"
 #include "tree-chkp.h"
 #include "rtl-chkp.h"
+#include "dbgcnt.h"

 /* This file should be included last.  */
 #include "target-def.h"
@@ -2600,6 +2601,908 @@ rest_of_handle_insert_vzeroupper (void)
   return 0;
 }

+/* Return 1 if INSN uses or defines a hard register.
+   Hard register uses in a memory address are ignored.
+   Clobbers and flags definitions are ignored.  */
+
+static bool
+has_non_address_hard_reg (rtx_insn *insn)
+{
+  df_ref ref;
+  FOR_EACH_INSN_DEF (ref, insn)
+if (HARD_REGISTER_P (DF_REF_REAL_REG (ref))
+   && !DF_REF_FLAGS_IS_SET (ref, DF_REF_MUST_CLOBBER)
+   && DF_REF_REGNO (ref) != FLAGS_REG)
+  return true;
+
+  FOR_EACH_INSN_USE (ref, insn)
+if (!DF_REF_REG_MEM_P (ref) &&

Re: [Patch/ccmp] Cost instruction sequences to choose better expand order

2015-09-23 Thread Bernd Schmidt


> No. Please see NOTE part of the description. AArch64 doesn't cost ccmp
> currently. It will be fixed by a separate patch later. The testcase is
> thus marked as XFAIL.


I'd prefer to do things in the right order. Your patch is approved, but 
please commit only after you can remove the xfail from the testcase.



Bernd

Re: [patch] libstdc++/67173 Fix filesystem::canonical for Solaris 10.

2015-09-23 Thread Jonathan Wakely


On 17/09/15 09:37 -0600, Martin Sebor wrote:

On 09/17/2015 05:16 AM, Jonathan Wakely wrote:

On 16/09/15 17:42 -0600, Martin Sebor wrote:

I see now the first exists test will detect symlink loops in
the original path. But I'm not convinced there isn't a corner
case that's subject to a TOCTOU race condition between the first
exists test and the while loop during which a symlink loop can
be introduced.

Suppose we call the function with /foo/bar as an argument and
the path exists and contains no symlinks. result is / and cmpts
is set to { foo, bar }. Just as the loop is entered, /foo/bar
is replaced with a symlink containing /foo/bar. The loop then
proceeds like so:

1. The first iteration removes foo from cmpts and sets result
to /foo. cmpts is { bar }.

2. The second iteration removes bar from cmpts, sets result to
/foo/bar, determines it's a symlink, reads its contents, sees
it's an absolute pathname and replaces result with /. It then
inserts the symlink's components { foo, bar } into cmpts. cmpts
becomes { foo, bar }. exists(result) succeeds.

3. The next iteration of the loop has the same initial state
as the first.

But I could have very easily missed something that takes care
of this corner case. If I did, sorry for the false alarm!


No, you're right. The TS says such filesystem races are undefined:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4099.html#fs.race.behavior

but it would be nice to fail gracefully rather than DOS the
application.

The simplest approach would be to increment a counter every time we
follow a symlink, and if it reaches some limit decide something is
wrong and fail with ELOOP.

I don't see how anything else can be 100% bulletproof, because a truly
evil attacker could just keep altering the contents of symlinks so we
keep ping-ponging between two or more paths. If we keep track of paths
we've seen before the attacker could just keep changing the contents
to a unique path each time, that initially exists as a file, but by
the time we get to is_symlink() it's become a symlink to a new path.

So if we use a counter, what's a sane maximum? Is MAXSYMLINKS in
 the value the kernel uses? 20 seems quite low, I was
thinking of a much higher number.


Yes, it is a corner case, and it's not really avoidable in the case
of hard links. For symlinks, POSIX defines the SYMLOOP_MAX constant
as the maximum, with the _SC_SYMLOOP_MAX and _PC_SYMLOOP_MAX
sysconf and pathconf variables. Otherwise 40 seems reasonable.
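
The counter approach discussed above can be sketched without touching a real filesystem. This is a toy model under stated assumptions: LinkTable, split_path and resolve are hypothetical, only absolute symlink targets are modelled, and the default limit of 40 is simply the value floated in this thread.

```cpp
#include <cassert>
#include <cerrno>
#include <deque>
#include <map>
#include <sstream>
#include <string>

// Hypothetical in-memory "filesystem": maps the path of a symlink to the
// (absolute) target stored in it.
using LinkTable = std::map<std::string, std::string>;

// Split "/foo/bar" into {"foo", "bar"}.
inline std::deque<std::string> split_path(const std::string &p) {
  std::deque<std::string> out;
  std::istringstream in(p);
  std::string part;
  while (std::getline(in, part, '/'))
    if (!part.empty())
      out.push_back(part);
  return out;
}

// Resolve the absolute PATH, following at most MAX_SYMLINKS links.
// Returns 0 and stores the result in RESOLVED, or returns ELOOP.
inline int resolve(const LinkTable &links, const std::string &path,
                   std::string &resolved, int max_symlinks = 40) {
  assert(!path.empty() && path[0] == '/');
  std::deque<std::string> cmpts = split_path(path);
  std::string result;  // the empty string stands for the root "/"
  while (!cmpts.empty()) {
    std::string candidate = result + "/" + cmpts.front();
    cmpts.pop_front();
    auto it = links.find(candidate);
    if (it == links.end()) {
      result = candidate;  // ordinary component: keep it
      continue;
    }
    if (--max_symlinks < 0) {
      resolved.clear();
      return ELOOP;  // too many links followed: give up
    }
    // Absolute target: restart from the root and queue the target's
    // components ahead of whatever is still left to process.
    std::deque<std::string> target = split_path(it->second);
    cmpts.insert(cmpts.begin(), target.begin(), target.end());
    result.clear();
  }
  resolved = result.empty() ? "/" : result;
  return 0;
}
```

A link at /foo/bar that points back at /foo/bar reproduces the loop from the scenario above; with the counter the function stops with ELOOP instead of spinning forever.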

With this, I'll let you get back to work -- I think we've beat this
function to death ;)


Here's what I committed. Similar to the last patch, but using the new
is_dot and is_dotdot helpers.


commit 8128173a00c234ccf34e258115747fa0e3b4457a
Author: Jonathan Wakely 
Date:   Wed Sep 23 02:00:57 2015 +0100

Limit number of symlinks that canonical() will resolve

* src/filesystem/ops.cc (canonical): Simplify error handling and
limit number of symlinks that can be resolved.

diff --git a/libstdc++-v3/src/filesystem/ops.cc 
b/libstdc++-v3/src/filesystem/ops.cc
index 5ff8120..7b261fb 100644
--- a/libstdc++-v3/src/filesystem/ops.cc
+++ b/libstdc++-v3/src/filesystem/ops.cc
@@ -116,6 +116,7 @@ fs::canonical(const path& p, const path& base, error_code& 
ec)
 {
   const path pa = absolute(p, base);
   path result;
+
 #ifdef _GLIBCXX_USE_REALPATH
   char_ptr buf{ nullptr };
 # if _XOPEN_VERSION < 700
@@ -137,18 +138,9 @@ fs::canonical(const path& p, const path& base, error_code& 
ec)
 }
 #endif
 
-  auto fail = [&ec, &result](int e) mutable {
-  if (!ec.value())
-   ec.assign(e, std::generic_category());
-  result.clear();
-  };
-
   if (!exists(pa, ec))
-{
-  fail(ENOENT);
-  return result;
-}
-  // else we can assume no unresolvable symlink loops
+return result;
+  // else: we know there are (currently) no unresolvable symlink loops
 
   result = pa.root_path();
 
@@ -156,20 +148,19 @@ fs::canonical(const path& p, const path& base, 
error_code& ec)
   for (auto& f : pa.relative_path())
 cmpts.push_back(f);
 
-  while (!cmpts.empty())
+  int max_allowed_symlinks = 40;
+
+  while (!cmpts.empty() && !ec)
 {
   path f = std::move(cmpts.front());
   cmpts.pop_front();
 
-  if (f.compare(".") == 0)
+  if (is_dot(f))
{
- if (!is_directory(result, ec))
-   {
- fail(ENOTDIR);
- break;
-   }
+ if (!is_directory(result, ec) && !ec)
+   ec.assign(ENOTDIR, std::generic_category());
}
-  else if (f.compare("..") == 0)
+  else if (is_dotdot(f))
{
  auto parent = result.parent_path();
  if (parent.empty())
@@ -184,27 +175,30 @@ fs::canonical(const path& p, const path& base, 
error_code& ec)
  if (is_symlink(result, ec))
{
  path link = read_symlink(result, ec);
- if (!ec.value())
+ if (!ec)
{
- if

[Bug c/48885] missed optimization with restrict qualifier?

2015-09-23 Thread vries at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=48885

--- Comment #12 from vries at gcc dot gnu.org ---
(In reply to Richard Biener from comment #11)
> I'm testing the above simple fix and amend the comment.

Consider the example with functions f and g I gave in comment 10. Using the
patch from comment 11, I get at ealias:
...
void f(int* __restrict__&, int*) (intD.9 * restrict & restrict pD.2252, intD.9
* p2D.2253)
{
  intD.9 * _3;

  # VUSE <.MEM_1(D)>
  # PT = { D.2265 } (nonlocal)
  _3 = MEM[(intD.9 * restrict &)p_2(D) clique 1 base 1];

  # .MEM_4 = VDEF <.MEM_1(D)>
  MEM[(intD.9 *)_3 clique 1 base 2] = 1;

  # .MEM_6 = VDEF <.MEM_4>
  MEM[(intD.9 *)p2_5(D) clique 1 base 0] = 2;
...

AFAIU, this is incorrect. The two stores can be now disambiguated based on same
clique/different base, but in fact the stores can alias (in fact they do, in
the  "f (gp, gp)" call from g).

[gomp4] vector reductions

2015-09-23 Thread Nathan Sidwell

I've committed this reimplementation of the vector shuffling code.  In preparing 
a fix for the worker reductions (to use a lockless scheme), I wanted to check 
VIEW_CONVERT_EXPR DTRT.  Use of gimplify_assign also reduces the code size.


nathan
2015-09-23  Nathan Sidwell  

	* config/nvptx/nvptx.c (nvptx_generate_vector_shuffle):
	Reimplement using integer builtins and VIEW_CONVERT_EXPR.
	(nvptx_goacc_reduction_fini): Pass location to
	nvptx_generate_vector_shuffle.

Index: config/nvptx/nvptx.c
===
--- config/nvptx/nvptx.c	(revision 228021)
+++ config/nvptx/nvptx.c	(working copy)
@@ -4478,68 +4478,43 @@ nvptx_get_worker_red_addr_fn (tree var,
will cast the variable if necessary.  */
 
 static void
-nvptx_generate_vector_shuffle (tree dest_var, tree var, int shfl,
+nvptx_generate_vector_shuffle (location_t loc,
+			   tree dest_var, tree var, unsigned shift,
 			   gimple_seq *seq)
 {
-  tree vartype = TREE_TYPE (var);
-  enum nvptx_builtins fn = NVPTX_BUILTIN_SHUFFLE_DOWN;
-  machine_mode mode = TYPE_MODE (vartype);
-  tree casted_dest = dest_var;
-  tree casted_var = var;
-  tree call_arg_type;
+  unsigned fn = NVPTX_BUILTIN_SHUFFLE_DOWN;
+  tree_code code = NOP_EXPR;
+  tree type = unsigned_type_node;
 
-  switch (mode)
+  switch (TYPE_MODE (TREE_TYPE (var)))
 {
+case SFmode:
+  code = VIEW_CONVERT_EXPR;
+  /* FALLTHROUGH */
 case QImode:
 case HImode:
 case SImode:
-  fn = NVPTX_BUILTIN_SHUFFLE_DOWN;
-  call_arg_type = unsigned_type_node;
   break;
+
+case DFmode:
+  code = VIEW_CONVERT_EXPR;
+  /* FALLTHROUGH  */
 case DImode:
+  type = long_long_unsigned_type_node;
   fn = NVPTX_BUILTIN_SHUFFLE_DOWNLL;
-  call_arg_type = long_long_unsigned_type_node;
-  break;
-case DFmode:
-  fn = NVPTX_BUILTIN_SHUFFLE_DOWND;
-  call_arg_type = double_type_node;
-  break;
-case SFmode:
-  fn = NVPTX_BUILTIN_SHUFFLE_DOWNF;
-  call_arg_type = float_type_node;
   break;
+
 default:
   gcc_unreachable ();
 }
 
-  /* All of the integral types need to be unsigned.  Furthermore, small
- integral types may need to be extended to 32-bits.  */
-  bool need_conversion = !types_compatible_p (vartype, call_arg_type);
+  tree call = build_call_expr_loc (loc, nvptx_builtin_decl (fn, true),
+   2, build1 (code, type, var),
+   build_int_cst (unsigned_type_node, shift));
 
-  if (need_conversion)
-{
-  casted_var = make_ssa_name (call_arg_type);
-  tree t1 = fold_build1 (NOP_EXPR, call_arg_type, var);
-  gassign *conv1 = gimple_build_assign (casted_var, t1);
-  gimple_seq_add_stmt (seq, conv1);
-}
-
-  tree fndecl = nvptx_builtin_decl (fn, true);
-  tree shift =  build_int_cst (unsigned_type_node, shfl);
-  gimple call = gimple_build_call (fndecl, 2, casted_var, shift);
-
-  gimple_seq_add_stmt (seq, call);
-
-  if (need_conversion)
-{
-  casted_dest = make_ssa_name (call_arg_type);
-  tree t2 = fold_build1 (NOP_EXPR, vartype, casted_dest);
-  gassign *conv2 = gimple_build_assign (dest_var, t2);
-  gimple_seq_add_stmt (seq, conv2);
-}
+  call = fold_build1 (code, TREE_TYPE (dest_var), call);
 
-  update_stmt (call);
-  gimple_call_set_lhs (call, casted_dest);
+  gimplify_assign (dest_var, call, seq);
 }
 
 /* NVPTX implementation of GOACC_REDUCTION_SETUP.  Reserve shared
@@ -4770,11 +4745,12 @@ nvptx_goacc_reduction_fini (gimple call)
   for (int shfl = PTX_VECTOR_LENGTH / 2; shfl > 0; shfl = shfl >> 1)
 	{
 	  tree other_var = make_ssa_name (TREE_TYPE (var));
-	  nvptx_generate_vector_shuffle (other_var, var, shfl, );
+	  nvptx_generate_vector_shuffle (gimple_location (call),
+	 other_var, var, shfl, );
 
 	  r = make_ssa_name (TREE_TYPE (var));
 	  gimplify_assign (r, fold_build2 (op, TREE_TYPE (var),
-	 var, other_var), );
+	   var, other_var), );
 	  var = r;
 	}
 }
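
As a host-side illustration of the idea in this patch, the sketch below models a shuffle-down that only moves integer bits, with floats reinterpreted on the way in and out (the role VIEW_CONVERT_EXPR plays above), and then runs the halving loop from nvptx_goacc_reduction_fini. Everything here is an assumption made for illustration: kVectorLength = 32, the helper names, and the rule that a lane without a source keeps its own value.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

constexpr unsigned kVectorLength = 32;  // assumed lane count for this model
static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits");

// Reinterpret the bits of a float as an unsigned 32-bit integer and back.
inline std::uint32_t to_bits(float f) {
  std::uint32_t u;
  std::memcpy(&u, &f, sizeof u);
  return u;
}

inline float from_bits(std::uint32_t u) {
  float f;
  std::memcpy(&f, &u, sizeof f);
  return f;
}

using Lanes = std::array<float, kVectorLength>;

// Model of an integer shuffle-down: lane i receives the bits held by lane
// i + shift; lanes with no source lane keep their own value (an assumption
// of this model).
inline Lanes shuffle_down(const Lanes &in, unsigned shift) {
  assert(shift < kVectorLength);
  Lanes out{};
  for (unsigned lane = 0; lane < kVectorLength; ++lane) {
    unsigned src = lane + shift < kVectorLength ? lane + shift : lane;
    out[lane] = from_bits(to_bits(in[src]));
  }
  return out;
}

// Sum reduction using the same halving loop as the patch; the total ends
// up in lane 0.
inline float reduce_sum(Lanes var) {
  for (unsigned shfl = kVectorLength / 2; shfl > 0; shfl >>= 1) {
    Lanes other = shuffle_down(var, shfl);
    for (unsigned lane = 0; lane < kVectorLength; ++lane)
      var[lane] += other[lane];
  }
  return var[0];
}
```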

[PATCH] Fix testcase from PR48885

2015-09-23 Thread Richard Biener


I am currently testing the following patch enabling us to optimize

void
test (int *a, int *b, int * restrict v)
{
*a = *v;
*b = *v;
}

there is a simple case we can handle without implementing ??? from
visit_loadstore.

Richard.

2015-09-23  Richard Biener  

PR tree-optimization/48885
* tree-ssa-structalias.c (visit_loadstore): Handle default defs
as not including any restrict tags from other pointers.

* gcc.dg/tree-ssa/restrict-6.c: New testcase.

Index: gcc/tree-ssa-structalias.c
===
*** gcc/tree-ssa-structalias.c  (revision 228037)
--- gcc/tree-ssa-structalias.c  (working copy)
*** visit_loadstore (gimple *, tree base, tr
*** 6952,6961 
|| TREE_CODE (base) == TARGET_MEM_REF)
  {
tree ptr = TREE_OPERAND (base, 0);
!   if (TREE_CODE (ptr) == SSA_NAME)
{
  /* ???  We need to make sure 'ptr' doesn't include any of
!the restrict tags in its points-to set.  */
  return false;
}
  
--- 7047,7057 
|| TREE_CODE (base) == TARGET_MEM_REF)
  {
tree ptr = TREE_OPERAND (base, 0);
!   if (TREE_CODE (ptr) == SSA_NAME
! && ! SSA_NAME_IS_DEFAULT_DEF (ptr))
{
  /* ???  We need to make sure 'ptr' doesn't include any of
!the restrict tags we added bases for in its points-to set.  */
  return false;
}
  
Index: gcc/testsuite/gcc.dg/tree-ssa/restrict-6.c
===
*** gcc/testsuite/gcc.dg/tree-ssa/restrict-6.c  (revision 0)
--- gcc/testsuite/gcc.dg/tree-ssa/restrict-6.c  (working copy)
***
*** 0 
--- 1,11 
+ /* { dg-do compile } */
+ /* { dg-options "-O -fdump-tree-fre1" } */
+ 
+ void
+ test (int *a, int *b, int * __restrict__ v)
+ {
+   *a = *v;
+   *b = *v;
+ }
+ 
+ /* { dg-final { scan-tree-dump-times "= \\*v" 1 "fre1" } } */

Re: New post-LTO OpenACC pass

2015-09-23 Thread Bernd Schmidt


On 09/23/2015 02:14 PM, Nathan Sidwell wrote:

On 09/23/15 06:59, Bernd Schmidt wrote:

On 09/22/2015 05:16 PM, Nathan Sidwell wrote:

+if (gimple_call_builtin_p (call, BUILT_IN_ACC_ON_DEVICE))
+  /* acc_on_device must be evaluated at compile time for
+ constant arguments.  */
+  {
+oacc_xform_on_device (call);
+rescan = true;
+  }


Is there a reason this is not done as part of pass_fold_builtins? (It
looks like
maybe adding this to fold_call_stmt in builtins.c would be sufficient
too).


Perhaps it could be.  I'll need to check where  that pass happens.
Anyway, the main thrust of this patch is the new pass, which I thought
might be easier to review with minimal additional  clutter.


There's no issue adding a new pass if there's a demonstrated need for 
it, but I think builtin folding doesn't quite meet that criterion given 
that we already have a pass that does that. Unless you really need it to 
happen very early in the pipeline - fold_builtins runs pretty late, but 
I checked and fold_call_stmt gets called from pass_forwprop and possibly 
from elsewhere too.



Bernd

[Bug c/49655] diagnostic pragma accepts non-warning options

2015-09-23 Thread manu at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49655

--- Comment #2 from Manuel López-Ibáñez  ---
Author: manu
Date: Wed Sep 23 13:07:07 2015
New Revision: 228049

URL: https://gcc.gnu.org/viewcvs?rev=228049&root=gcc&view=rev
Log:
[c-family/49654/49655] reject invalid options in pragma diagnostic

Use find_opt instead of linear search through options in
handle_pragma_diagnostic (PR 49654) and reject non-warning options and
options not valid for the current language (PR 49655).

gcc/testsuite/ChangeLog:

2015-09-23  Manuel López-Ibáñez  

PR c/49655
* gcc.dg/pragma-diag-6.c: New test.

gcc/ChangeLog:

2015-09-23  Manuel López-Ibáñez  

PR c/49655
* opts.h (write_langs): Declare.
* opts-global.c (write_langs): Make it extern.

gcc/c-family/ChangeLog:

2015-09-23  Manuel López-Ibáñez  

PR c/49654
PR c/49655
* c-pragma.c (handle_pragma_diagnostic): Detect non-warning
options and options not valid for the current language.


Added:
trunk/gcc/testsuite/gcc.dg/pragma-diag-6.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/c-family/ChangeLog
trunk/gcc/c-family/c-pragma.c
trunk/gcc/opts-global.c
trunk/gcc/opts.h
trunk/gcc/testsuite/ChangeLog

[Bug c/49654] Linear search through options in handle_pragma_diagnostic

2015-09-23 Thread manu at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49654

--- Comment #2 from Manuel López-Ibáñez  ---
Author: manu
Date: Wed Sep 23 13:07:07 2015
New Revision: 228049

URL: https://gcc.gnu.org/viewcvs?rev=228049&root=gcc&view=rev
Log:
[c-family/49654/49655] reject invalid options in pragma diagnostic

Use find_opt instead of linear search through options in
handle_pragma_diagnostic (PR 49654) and reject non-warning options and
options not valid for the current language (PR 49655).

gcc/testsuite/ChangeLog:

2015-09-23  Manuel López-Ibáñez  

PR c/49655
* gcc.dg/pragma-diag-6.c: New test.

gcc/ChangeLog:

2015-09-23  Manuel López-Ibáñez  

PR c/49655
* opts.h (write_langs): Declare.
* opts-global.c (write_langs): Make it extern.

gcc/c-family/ChangeLog:

2015-09-23  Manuel López-Ibáñez  

PR c/49654
PR c/49655
* c-pragma.c (handle_pragma_diagnostic): Detect non-warning
options and options not valid for the current language.


Added:
trunk/gcc/testsuite/gcc.dg/pragma-diag-6.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/c-family/ChangeLog
trunk/gcc/c-family/c-pragma.c
trunk/gcc/opts-global.c
trunk/gcc/opts.h
trunk/gcc/testsuite/ChangeLog

[Bug tree-optimization/67681] Missed vectorization: induction variable used after loop

2015-09-23 Thread alalaw01 at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67681

--- Comment #2 from alalaw01 at gcc dot gnu.org ---
Being stupid here, but why does the outer loop having multiple exits matter -
it's the inner loop that should be vectorized?

FOO was a macro used to selectively make the test i>max disappear (enabling
vectorization) - the two commandlines had -DFOO=0 (vectorizes) and -DFOO=1
(doesn't).

Re: [ubsan PATCH] Fix uninitialized var issue (PR sanitizer/64906)

2015-09-23 Thread Bernd Schmidt


On 09/22/2015 05:11 PM, Marek Polacek wrote:


diff --git gcc/c-family/c-ubsan.c gcc/c-family/c-ubsan.c
index e0cce84..d2bc264 100644
--- gcc/c-family/c-ubsan.c
+++ gcc/c-family/c-ubsan.c
@@ -104,6 +104,7 @@ ubsan_instrument_division (location_t loc, tree op0, tree 
op1)
}
  }
t = fold_build2 (COMPOUND_EXPR, TREE_TYPE (t), unshare_expr (op0), t);
+  t = fold_build2 (COMPOUND_EXPR, TREE_TYPE (t), unshare_expr (op1), t);
if (flag_sanitize_undefined_trap_on_error)
  tt = build_call_expr_loc (loc, builtin_decl_explicit (BUILT_IN_TRAP), 0);
else


I really don't know this code, but just before the location you're 
patching, there's this:


  /* In case we have a SAVE_EXPR in a conditional context, we need to
 make sure it gets evaluated before the condition.  If the OP0 is
 an instrumented array reference, mark it as having side effects so
 it's not folded away.  */
  if (flag_sanitize & SANITIZE_BOUNDS)
{
  tree xop0 = op0;
  while (CONVERT_EXPR_P (xop0))
xop0 = TREE_OPERAND (xop0, 0);
  if (TREE_CODE (xop0) == ARRAY_REF)
{
  TREE_SIDE_EFFECTS (xop0) = 1;
  TREE_SIDE_EFFECTS (op0) = 1;
}
}

Does that need to be done for op1 as well? (I really wonder why this is 
needed or whether it's sufficient to find such an ARRAY_REF if you can 
have more complex operands).


The same pattern occurs in another function, so it may be best to break 
it out into a new function if additional occurrences are necessary.



Bernd

[Bug c/48885] missed optimization with restrict qualifier?

2015-09-23 Thread rguenther at suse dot de

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=48885

--- Comment #13 from rguenther at suse dot de  ---
On Wed, 23 Sep 2015, vries at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=48885
> 
> --- Comment #12 from vries at gcc dot gnu.org ---
> (In reply to Richard Biener from comment #11)
> > I'm testing the above simple fix and amend the comment.
> 
> Consider the example with functions f and g I gave in comment 10. Using the
> patch from comment 11, I get at ealias:
> ...
> void f(int* __restrict__&, int*) (intD.9 * restrict & restrict pD.2252, intD.9
> * p2D.2253)
> {
>   intD.9 * _3;
> 
>   # VUSE <.MEM_1(D)>
>   # PT = { D.2265 } (nonlocal)
>   _3 = MEM[(intD.9 * restrict &)p_2(D) clique 1 base 1];
> 
>   # .MEM_4 = VDEF <.MEM_1(D)>
>   MEM[(intD.9 *)_3 clique 1 base 2] = 1;
> 
>   # .MEM_6 = VDEF <.MEM_4>
>   MEM[(intD.9 *)p2_5(D) clique 1 base 0] = 2;
> ...
> 
> AFAIU, this is incorrect. The two stores can be now disambiguated based on 
> same
> clique/different base, but in fact the stores can alias (in fact they do, in
> the  "f (gp, gp)" call from g).

How is this a valid testcase?  You are accessing g()s *gp through
p and p2 even though p is marked as restrict.  Did you mean to write

void
f (int *&__restrict__ p, int *p2)

?

Re: [gomp4] lock/unlock internal fn

2015-09-23 Thread Nathan Sidwell


On 09/23/15 05:27, Thomas Schwinge wrote:

Hi Nathan!

On Mon, 17 Aug 2015 15:30:16 -0400, Nathan Sidwell  wrote:

I've committed this patch to add a new pair of internal functions.  These will
be used in implementing reductions.

They'll be emitted around reduction finalization, and implement the locking
required for the general case of combining reduction values.  They may be
transformed in the oacc_xform pass, and the default behaviour is to delete them,
if there is no RTL expander.  For PTX we delete them if they are at the vector
level.

This avoids needing machine-specific builtins to expand to, and thus should
result in less backend code duplication.


With the __builtin_nvptx_lock and __builtin_nvptx_unlock builtins
removed, should the gcc.target/nvptx/spinlock-1.c and
gcc.target/nvptx/spinlock-2.c test cases then be removed, too, or should
these be re-written differently?


Confused.  I don't think I removed those locks.  Certainly didn't intend to, and
I would have expected massive test fails if I had.


nathan

--
Nathan Sidwell

Re: [gomp4] Another oacc reduction simplification

2015-09-23 Thread Nathan Sidwell


On 09/23/15 04:02, Thomas Schwinge wrote:

Hi!

On Tue, 22 Sep 2015 11:29:37 -0400, Nathan Sidwell  wrote:

I've committed this patch, which simplifies the generation of openacc reduction
code.


Aside from the progression mentioned in
,
this change is also causing a regression:

 [-PASS:-]{+FAIL:+} c-c++-common/goacc/routine-7.c  (test for errors, line 
35)
 [-PASS:-]{+FAIL:+} c-c++-common/goacc/routine-7.c  (test for errors, line 
58)
 [-PASS:-]{+FAIL:+} c-c++-common/goacc/routine-7.c  (test for errors, line 
62)
 [-PASS:-]{+FAIL:+} c-c++-common/goacc/routine-7.c  (test for errors, line 
81)
 [-PASS:-]{+FAIL:+} c-c++-common/goacc/routine-7.c  (test for errors, line 
85)
 [-PASS:-]{+FAIL:+} c-c++-common/goacc/routine-7.c  (test for errors, line 
89)
 [-PASS:-]{+FAIL: c-c++-common/goacc/routine-7.c (internal compiler error)+}
 {+FAIL:+} c-c++-common/goacc/routine-7.c (test for excess errors)


Odd.   I didn't see any new fails.  Will look

nathan
--
Nathan Sidwell

[Bug rtl-optimization/66790] Invalid uninitialized register handling in REE

2015-09-23 Thread bonzini at gnu dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66790

--- Comment #37 from Paolo Bonzini  ---
Bernd is right that you have a missing 'else'.

[Bug middle-end/67662] -fsanitize=undefined cries wolf for X - 1 + X when X is 2**30

2015-09-23 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67662

--- Comment #3 from Richard Biener  ---
Yeah, r122414 fixed the PR30364 issue incompletely, leaving a special-case that
still mishandles this case.  Testing a patch.

[Bug c++/67693] New: Spurious warning: control reaches end of non-void function [-Wreturn-type]

2015-09-23 Thread larsch at belunktum dot dk

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67693

Bug ID: 67693
   Summary: Spurious warning: control reaches end of non-void
function [-Wreturn-type]
   Product: gcc
   Version: 5.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: larsch at belunktum dot dk
  Target Milestone: ---

gcc 4.5.3 through to 5.2.0 all warn about reaching end of non-void function on
this code:

  struct foo { ~foo(); };
  int f(int n, int m) {
foo x; // Remove this to get rid of warning
switch (n) {
case 0:
  if (n == 4)
return 1;
  else
return 2;
  break; // Or remove this
default:
  return 0;
}
  }

Output:

  source.cpp: In function 'int f(int, int)':
  14 : warning: control reaches end of non-void function [-Wreturn-type]

Command line: 

  gcc -Wall -O0 source.cpp

Platforms: x86/arm/arm64.

Other info:

Removing the variable 'x' gets rid of the warning. So does removing the
(unnecessary, but valid) 'break' statement. Adding optimization (-O1 or more)
eliminates the warning.

[v3 patch] Fix Filesystem TS directory iterators

2015-09-23 Thread Jonathan Wakely


directory_iterator and recursive_directory_iterator fail to meet this
requirement in http://wg21.link/n4099#Class-directory_iterator

 The directory_iterator default constructor shall create an iterator
 equal to the end iterator value, and this shall be the only valid
 iterator for the end condition.

The current code creates the end iterator when an error occurs during
construction and an error_code parameter was used (so an exception
is not thrown, but construction finishes normally and sets the
error_code).

This fixes it by creating a distinct error state that is not the end
iterator state:

 // An error occurred, we need a non-empty shared_ptr so that *this will
 // not compare equal to the end iterator.
 _M_dir.reset(static_cast<_Dir*>(nullptr));

This way the shared_ptr owns a null pointer, so (bool)_M_dir is false
(and we don't allow incrementing or dereferencing) but it can be
distinguished from an empty shared_ptr by comparing them using
shared_ptr::owner_before.

(The order of the owner_before checks is chosen so that the common
case of testing iter != directory_iterator() should short-circuit and
only check the first condition).
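
The distinction between an empty shared_ptr and one that owns a null pointer can be shown in isolation. This is a minimal sketch, not the library code: Dir is a hypothetical stand-in for the internal _Dir type, and same_owner and make_error_state are names invented for the example.

```cpp
#include <cassert>
#include <memory>

struct Dir {};  // hypothetical stand-in for the library's internal _Dir

// Two shared_ptrs are owner-equivalent when neither orders before the other.
inline bool same_owner(const std::shared_ptr<Dir> &a,
                       const std::shared_ptr<Dir> &b) {
  return !a.owner_before(b) && !b.owner_before(a);
}

// Build the "error" state: a non-empty shared_ptr that owns a null pointer.
inline std::shared_ptr<Dir> make_error_state() {
  std::shared_ptr<Dir> p;
  p.reset(static_cast<Dir *>(nullptr));
  // Converts to false like an empty pointer, yet has a control block.
  assert(!p && p.use_count() == 1);
  return p;
}
```

Both states convert to false, so neither can be incremented or dereferenced, but only a default-constructed pointer compares owner-equivalent to another default-constructed one.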

There were a few other problems with directory iterators, including
the fact that the get_file_type function never worked because autoconf
was defining _GLIBCXX_GLIBCXX_HAVE_STRUCT_DIRENT_D_TYPE instead of
the macro I was checking, _GLIBCXX_HAVE_STRUCT_DIRENT_D_TYPE.

I've removed the ErrorCode utility that was meant to simplify
clearing/setting an error_code that may or may not be present, but
really just obfuscated things.

I'm also now consistently checking the skip_permission_denied flag
everywhere it matters.

Tested x86_64-linux, powerpc64le-linux, x86_64-dragonfly4.1, committed
to trunk.


commit 8d08e1c6724cb433e1ca4f975ce85bd277ba2389
Author: Jonathan Wakely 
Date:   Wed Sep 23 00:28:19 2015 +0100

Fix semantics of Filesystem TS directory iterators

[class.directory_iterator] p4 and [directory_iterator.members] p4
require that only the default constructor and ignored permission denied
errors can create the end iterator.

* acinclude.m4 (GLIBCXX_CHECK_FILESYSTEM_DEPS): Remove _GLIBCXX_
prefix from HAVE_STRUCT_DIRENT_D_TYPE.
* config.h.in: Regenerate.
* configure: Regenerate.
* include/experimental/fs_dir.h (operator==, operator==):
Use owner_before instead of pointer equality.
(directory_iterator(std::shared_ptr<_Dir>, error_code*)): Remove.
* src/filesystem/dir.cc (ErrorCode): Remove.
(_Dir::advance): Change ErrorCode parameter to error_code*, add
directory_options parameter and check it on error.
(opendir): Rename to open_dir to avoid clashing with macro. Change
ErrorCode parameter to error_code*.
(make_shared_dir): Remove.
(native_readdir) [_GLIBCXX_FILESYSTEM_IS_WINDOWS]: Don't set errno.
(directory_iterator(std::shared_ptr<_Dir>, error_code*)): Remove.
(directory_iterator(const path&, directory_options, error_code*)):
Pass options to _Dir::advance and create non-end iterator on error.
(recursive_directory_iterator(const path&, directory_options,
error_code*)): Clear error_code on ignored error, create non-end
iterator otherwise.
(recursive_directory_iterator::increment): Pass _M_options to
_Dir::advance.
(recursive_directory_iterator::pop): Likewise.
* testsuite/experimental/filesystem/iterators/directory_iterator.cc:
New.
* testsuite/experimental/filesystem/iterators/
recursive_directory_iterator.cc: New.

diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
index c133c25..4b031f7 100644
--- a/libstdc++-v3/acinclude.m4
+++ b/libstdc++-v3/acinclude.m4
@@ -3940,7 +3940,7 @@ dnl
   [glibcxx_cv_dirent_d_type=no])
   ])
   if test $glibcxx_cv_dirent_d_type = yes; then
-AC_DEFINE(_GLIBCXX_HAVE_STRUCT_DIRENT_D_TYPE, 1, [Define to 1 if `d_type' is a member of `struct dirent'.])
+AC_DEFINE(HAVE_STRUCT_DIRENT_D_TYPE, 1, [Define to 1 if `d_type' is a member of `struct dirent'.])
   fi
   AC_MSG_RESULT($glibcxx_cv_dirent_d_type)
 dnl
diff --git a/libstdc++-v3/include/experimental/fs_dir.h b/libstdc++-v3/include/experimental/fs_dir.h
index d46d41b..0c5253f 100644
--- a/libstdc++-v3/include/experimental/fs_dir.h
+++ b/libstdc++-v3/include/experimental/fs_dir.h
@@ -201,14 +201,12 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
   return __tmp;
 }
 
-friend bool
-operator==(const directory_iterator& __lhs,
-   const directory_iterator& __rhs)
-{ return __lhs._M_dir == __rhs._M_dir; }
-
   private:
 directory_iterator(const path&, directory_options, error_code*);
-directory_iterator(std::shared_ptr<_Dir>, error_code*);
+
+friend bool
+operator==(const directory_iterator& __lhs,
+   const directory_iterator& __rhs);
 
 friend class

[PATCH] Preserve restrict dependence info in FRE/PRE

2015-09-23 Thread Richard Biener


I noticed we don't handle secondary effects of restrict in FRE when
looking at another testcase from PR48885:

int
f (int *__restrict__ &__restrict__ p, int *p2)
{
  *p = 1;
  *p2 = 2;
  return *p;
}

with the previously posted patch to improve the handling for p2
we should be able to optimize the return stmt to return 1
in FRE1.  Without the following patch we remove the redundant
load of 'p' but not the load from *p.  This is because the SCCVN
IL didn't record dependence info and did not reconstruct it for
the alias walks or final PRE insert.
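
To make the expected result concrete, here is a small sketch built around the testcase above. The function f is copied from the testcase; call_f is a hypothetical driver added only for illustration. It passes two distinct objects, which is the only kind of call the restrict qualifiers allow, so the store through p2 cannot change *p and the function returns 1 whether or not FRE performs the fold.

```cpp
#include <cassert>

// Same shape as the testcase: p is a restrict-qualified reference to a
// restrict-qualified pointer (GNU extension spelling).
int f(int *__restrict__ &__restrict__ p, int *p2)
{
  *p = 1;
  *p2 = 2;    // promised by the caller not to alias *p
  return *p;  // may therefore be folded to "return 1"
}

// Hypothetical driver, not part of the patch or the testsuite.
int call_f()
{
  int a = 0, b = 0;
  int *__restrict__ pa = &a;
  return f(pa, &b);
}
```

Passing the same object through both parameters would break the restrict promise, so the sketch deliberately does not try that.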

The following fixes that - bootstrap and regtest running on
x86_64-unknown-linux-gnu.

Richard.

2015-09-23  Richard Biener  

* tree-ssa-sccvn.h (vn_reference_op_struct): Add clique and base
members.
* tree-ssa-sccvn.c (copy_reference_ops_from_ref): Record clique
and base for MEM_REF and TARGET_MEM_REF.  Handle BIT_FIELD_REF
offset.
(ao_ref_init_from_vn_reference): Record clique and base in the
built base.
* tree-ssa-pre.c (create_component_ref_by_pieces_1): Likewise

* g++.dg/tree-ssa/restrict3.C: New testcase.

Index: gcc/tree-ssa-sccvn.h
===
*** gcc/tree-ssa-sccvn.h(revision 228037)
--- gcc/tree-ssa-sccvn.h(working copy)
*** typedef struct vn_reference_op_struct
*** 83,88 
--- 83,91 
ENUM_BITFIELD(tree_code) opcode : 16;
/* 1 for instrumented calls.  */
unsigned with_bounds : 1;
+   /* Dependence info, used for [TARGET_]MEM_REF only.  */
+   unsigned short clique;
+   unsigned short base;
/* Constant offset this op adds or -1 if it is variable.  */
HOST_WIDE_INT off;
tree type;
Index: gcc/tree-ssa-sccvn.c
===
*** gcc/tree-ssa-sccvn.c(revision 228037)
--- gcc/tree-ssa-sccvn.c(working copy)
*** copy_reference_ops_from_ref (tree ref, v
*** 773,778 
--- 783,790 
temp.op1 = TMR_STEP (ref);
temp.op2 = TMR_OFFSET (ref);
temp.off = -1;
+   temp.clique = MR_DEPENDENCE_CLIQUE (ref);
+   temp.base = MR_DEPENDENCE_BASE (ref);
result->quick_push (temp);
  
memset (&temp, 0, sizeof (temp));
*** copy_reference_ops_from_ref (tree ref, v
*** 816,826 
--- 828,846 
  temp.op0 = TREE_OPERAND (ref, 1);
  if (tree_fits_shwi_p (TREE_OPERAND (ref, 1)))
temp.off = tree_to_shwi (TREE_OPERAND (ref, 1));
+ temp.clique = MR_DEPENDENCE_CLIQUE (ref);
+ temp.base = MR_DEPENDENCE_BASE (ref);
  break;
case BIT_FIELD_REF:
  /* Record bits and position.  */
  temp.op0 = TREE_OPERAND (ref, 1);
  temp.op1 = TREE_OPERAND (ref, 2);
+ if (tree_fits_shwi_p (TREE_OPERAND (ref, 2)))
+   {
+ HOST_WIDE_INT off = tree_to_shwi (TREE_OPERAND (ref, 2));
+ if (off % BITS_PER_UNIT == 0)
+   temp.off = off / 8;
+   }
  break;
case COMPONENT_REF:
  /* The field decl is enough to unambiguously specify the field,
*** ao_ref_init_from_vn_reference (ao_ref *r
*** 1017,1022 
--- 1037,1044 
  base_alias_set = get_deref_alias_set (op->op0);
  *op0_p = build2 (MEM_REF, op->type,
   NULL_TREE, op->op0);
+ MR_DEPENDENCE_CLIQUE (*op0_p) = op->clique;
+ MR_DEPENDENCE_BASE (*op0_p) = op->base;
  op0_p = &TREE_OPERAND (*op0_p, 0);
  break;
  
Index: gcc/tree-ssa-pre.c
===
*** gcc/tree-ssa-pre.c  (revision 228037)
--- gcc/tree-ssa-pre.c  (working copy)
*** create_component_ref_by_pieces_1 (basic_
*** 2531,2537 
 off));
baseop = build_fold_addr_expr (base);
  }
!   return fold_build2 (MEM_REF, currop->type, baseop, offset);
}
  
  case TARGET_MEM_REF:
--- 2531,2540 
 off));
baseop = build_fold_addr_expr (base);
  }
!   genop = build2 (MEM_REF, currop->type, baseop, offset);
!   MR_DEPENDENCE_CLIQUE (genop) = currop->clique;
!   MR_DEPENDENCE_BASE (genop) = currop->base;
!   return genop;
}
  
  case TARGET_MEM_REF:
*** create_component_ref_by_pieces_1 (basic_
*** 2554,2561 
if (!genop1)
  return NULL_TREE;
  }
!   return build5 (TARGET_MEM_REF, currop->type,
!  baseop, currop->op2, genop0, currop->op1, genop1);
}
  
  case ADDR_EXPR:
--- 2557,2568 
if (!genop1)
  return NULL_TREE;
  }
!   genop = build5 (TARGET_MEM_REF, currop->type,
!   baseop, currop->op2, genop0, currop->op1, genop1);
! 
!

Re: [RFC, PR target/65105] Use vector instructions for scalar 64bit computations on 32bit target

2015-09-23 Thread Uros Bizjak

On Wed, Sep 23, 2015 at 12:19 PM, Ilya Enkovich  wrote:
> On 14 Sep 17:50, Uros Bizjak wrote:
>>
>> +(define_insn_and_split "*zext_doubleword"
>> +  [(set (match_operand:DI 0 "register_operand" "=r")
>> + (zero_extend:DI (match_operand:SWI24 1 "nonimmediate_operand" "rm")))]
>> +  "!TARGET_64BIT && TARGET_STV && TARGET_SSE2"
>> +  "#"
>> +  "&& reload_completed && GENERAL_REG_P (operands[0])"
>> +  [(set (match_dup 0) (zero_extend:SI (match_dup 1)))
>> +   (set (match_dup 2) (const_int 0))]
>> +  "split_double_mode (DImode, &operands[0], 1, &operands[0], &operands[2]);")
>> +
>> +(define_insn_and_split "*zextqi_doubleword"
>> +  [(set (match_operand:DI 0 "register_operand" "=r")
>> + (zero_extend:DI (match_operand:QI 1 "nonimmediate_operand" "qm")))]
>> +  "!TARGET_64BIT && TARGET_STV && TARGET_SSE2"
>> +  "#"
>> +  "&& reload_completed && GENERAL_REG_P (operands[0])"
>> +  [(set (match_dup 0) (zero_extend:SI (match_dup 1)))
>> +   (set (match_dup 2) (const_int 0))]
>> +  "split_double_mode (DImode, &operands[0], 1, &operands[0], &operands[2]);")
>> +
>>
>> Please put the above patterns together with other zero_extend
>> patterns. You can also merge these two patterns using SWI124 mode
>> iterator with  mode attribute as a register constraint. Also, no
>> need to check for GENERAL_REG_P after reload, when "r" constraint is
>> in effect:
>>
>> (define_insn_and_split "*zext_doubleword"
>>   [(set (match_operand:DI 0 "register_operand" "=r")
>>  (zero_extend:DI (match_operand:SWI124 1 "nonimmediate_operand" "m")))]
>>   "!TARGET_64BIT && TARGET_STV && TARGET_SSE2"
>>   "#"
>>   "&& reload_completed"
>>   [(set (match_dup 0) (zero_extend:SI (match_dup 1)))
>>(set (match_dup 2) (const_int 0))]
>>   "split_double_mode (DImode, &operands[0], 1, &operands[0], &operands[2]);")
>
> Register constraint doesn't affect split and I need GENERAL_REG_P to filter 
> other registers case.

OK.

> I merged QI and HI cases of zext but made a separate pattern for SI case 
> because it doesn't need zero_extend in resulting code.  Bootstrapped and 
> regtested for x86_64-unknown-linux-gnu.

This change is OK.

The patch LGTM, but please wait a couple of days if Jeff has some
comment on algorithmic aspect of the patch.

Thanks,
Uros.
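
As an illustration of what the doubleword split discussed above computes, here is a rough C++ model. It is only a sketch: the DoubleWord struct and both function names are made up for this note, and nothing here is GCC code. On a 32-bit target the DImode destination occupies two 32-bit halves; the split writes the zero-extended source into the low half and constant zero into the high half.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// A 64-bit value held as two 32-bit halves, the way a DImode value
// occupies a register pair on a 32-bit target.
struct DoubleWord
{
  std::uint32_t lo;
  std::uint32_t hi;
};

// Model of the split: zero-extend the narrow source into the low half
// and set the high half to constant zero.
template <typename Narrow>
DoubleWord zext_doubleword(Narrow src)
{
  static_assert(std::is_unsigned<Narrow>::value,
                "zero extension is modelled on unsigned sources only");
  return DoubleWord{static_cast<std::uint32_t>(src), 0u};
}

// Reassemble the halves to compare against a plain 64-bit value.
std::uint64_t join(DoubleWord d)
{
  return (static_cast<std::uint64_t>(d.hi) << 32) | d.lo;
}
```

For a 32-bit source the low half is a plain copy, which fits the remark in the quoted reply that the SI case needs no zero_extend in the resulting code.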

>
> Thanks,
> Ilya
> --
> gcc/
>
> 2015-09-23  Ilya Enkovich  
>
> * config/i386/i386.c: Include dbgcnt.h.
> (has_non_address_hard_reg): New.
> (convertible_comparison_p): New.
> (scalar_to_vector_candidate_p): New.
> (remove_non_convertible_regs): New.
> (scalar_chain): New.
> (scalar_chain::scalar_chain): New.
> (scalar_chain::~scalar_chain): New.
> (scalar_chain::add_to_queue): New.
> (scalar_chain::mark_dual_mode_def): New.
> (scalar_chain::analyze_register_chain): New.
> (scalar_chain::add_insn): New.
> (scalar_chain::build): New.
> (scalar_chain::compute_convert_gain): New.
> (scalar_chain::replace_with_subreg): New.
> (scalar_chain::replace_with_subreg_in_insn): New.
> (scalar_chain::emit_conversion_insns): New.
> (scalar_chain::make_vector_copies): New.
> (scalar_chain::convert_reg): New.
> (scalar_chain::convert_op): New.
> (scalar_chain::convert_insn): New.
> (scalar_chain::convert): New.
> (convert_scalars_to_vector): New.
> (pass_data_stv): New.
> (pass_stv): New.
> (make_pass_stv): New.
> (ix86_option_override): Create and register stv pass.
> (flag_opts): Add -mstv.
> (ix86_option_override_internal): Likewise.
> * config/i386/i386.md (SWIM1248x): New.
> (*movdi_internal): Add xmm to mem alternative for TARGET_STV.
> (and3): Use SWIM1248x iterator instead of SWIM.
> (*anddi3_doubleword): New.
> (*zext_doubleword): New.
> (*zextsi_doubleword): New.
> (3): Use SWIM1248x iterator instead of SWIM.
> (*di3_doubleword): New.
> * config/i386/i386.opt (mstv): New.
> * dbgcnt.def (stv_conversion): New.
>
> gcc/testsuite/
>
> 2015-09-23  Ilya Enkovich  
>
> * gcc.target/i386/pr65105-1.c: New.
> * gcc.target/i386/pr65105-2.c: New.
> * gcc.target/i386/pr65105-3.c: New.
> * gcc.target/i386/pr65105-4.C: New.
> * gcc.dg/lower-subreg-1.c: Add -mno-stv options for ia32.
>
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index d547cfd..2663f85 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -87,6 +87,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree-iterator.h"
>  #include "tree-chkp.h"
>  #include "rtl-chkp.h"
> +#include "dbgcnt.h"
>
>  /* This file should be included last.  */
>  #include "target-def.h"
> @@ -2600,6 +2601,908 @@ rest_of_handle_insert_vzeroupper (void)
>return 0;
>  }
>
> +/* Return 1 if INSN uses or defines a hard register.
> +   Hard register uses in a memory

Re: [RFC] PR tree-optimization/67628: Make tree ifcombine more symmetric and interactions with dom

2015-09-23 Thread Richard Biener

On Wed, 23 Sep 2015, Kyrill Tkachov wrote:

> 
> On 23/09/15 11:10, Richard Biener wrote:
> > On Wed, 23 Sep 2015, Kyrill Tkachov wrote:
> > 
> > > On 23/09/15 10:09, Pinski, Andrew wrote:
> > > > > On Sep 23, 2015, at 1:59 AM, Kyrill Tkachov 
> > > > > wrote:
> > > > > 
> > > > > 
> > > > > > On 22/09/15 20:31, Jeff Law wrote:
> > > > > > > On 09/22/2015 07:36 AM, Kyrill Tkachov wrote:
> > > > > > > Hi all,
> > > > > > > Unfortunately, I see a testsuite regression with this patch:
> > > > > > > FAIL: gcc.dg/pr66299-2.c scan-tree-dump-not optimized "<<"
> > > > > > > 
> > > > > > > The reduced part of that test is:
> > > > > > > void
> > > > > > > test1 (int x, unsigned u)
> > > > > > > {
> > > > > > >  if ((1U << x) != 64
> > > > > > >  || (2 << x) != u
> > > > > > >  || (x << x) != 384
> > > > > > >  || (3 << x) == 9
> > > > > > >  || (x << 14) != 98304U
> > > > > > >  || (1 << x) == 14
> > > > > > >  || (3 << 2) != 12)
> > > > > > >__builtin_abort ();
> > > > > > > }
> > > > > > > 
> > > > > > > The patched ifcombine pass works more or less as expected and
> > > > > > > produces
> > > > > > > fewer basic blocks.
> > > > > > > Before this patch a relevant part of the ifcombine dump for test1
> > > > > > > is:
> > > > > > > ;;   basic block 2, loop depth 0, count 0, freq 1, maybe hot
> > > > > > >  if (x_1(D) != 6)
> > > > > > >goto ;
> > > > > > >  else
> > > > > > >goto ;
> > > > > > > 
> > > > > > > ;;   basic block 3, loop depth 0, count 0, freq 9996, maybe hot
> > > > > > >  _2 = 2 << x_1(D);
> > > > > > >  _3 = (unsigned intD.10) _2;
> > > > > > >  if (_3 != u_4(D))
> > > > > > >goto ;
> > > > > > >  else
> > > > > > >goto ;
> > > > > > > 
> > > > > > > 
> > > > > > > After this patch it is:
> > > > > > > ;;   basic block 2, loop depth 0, count 0, freq 1, maybe hot
> > > > > > >  _2 = 2 << x_1(D);
> > > > > > >  _3 = (unsigned intD.10) _2;
> > > > > > >  _9 = _3 != u_4(D);
> > > > > > >  _10 = x_1(D) != 6;
> > > > > > >  _11 = _9 | _10;
> > > > > > >  if (_11 != 0)
> > > > > > >goto ;
> > > > > > >  else
> > > > > > >goto ;
> > > > > > > 
> > > > > > > The second form ends up generating worse codegen however, and the
> > > > > > > badness starts with the dom1 pass.
> > > > > > > In the unpatched case it manages to deduce that x must be 6 by the
> > > > > > > time
> > > > > > > it reaches basic block 3 and
> > > > > > > uses that information to eliminate the shift in "_2 = 2 << x_1(D)"
> > > > > > > from
> > > > > > > basic block 3
> > > > > > > In the patched case it is unable to make that call, I think
> > > > > > > because
> > > > > > > the
> > > > > > > x != 6 condition is IORed
> > > > > > > with another test.
> > > > > > > 
> > > > > > > I'm not familiar with the internals of the dom pass, so I'm not
> > > > > > > sure
> > > > > > > where to go looking for a fix for this.
> > > > > > > Is the ifcombine change a step in the right direction? If so, what
> > > > > > > would
> > > > > > > need to be done to fix the issue with
> > > > > > > the dom pass?
> > > > > > I don't see how you can reasonably fix this in DOM.  if _9 or _10 is
> > > > > > true, then _11 is true.  But we can't reasonably record any kind of
> > > > > > equivalence for _9 or _10 individually.
> > > > > > 
> > > > > > If the statement
> > > > > > _11 = _9 | _10;
> > > > > > 
> > > > > > Were changed to
> > > > > > 
> > > > > > _11 = _9 & _10;
> > > > > > 
> > > > > > Then we could record something useful about _9 and _10.
> > > > > > 
> > > > > > 
> > > > > > > I suppose what we want is to not combine basic blocks if the
> > > > > > > sequence
> > > > > > > and conditions of the basic blocks are
> > > > > > > such that dom can potentially exploit them, but how do we express
> > > > > > > that?
> > > > > > I don't think there's going to be a way to directly express that.
> > > > > > You
> > > > > > could essentially claim that TRUTH_OR is more expensive than
> > > > > > TRUTH_AND
> > > > > > because of the impact on DOM, but that in and of itself may not
> > > > > > resolve
> > > > > > the situation either.
> > > > > > 
> > > > > > I think the question we need to answer is whether or not your
> > > > > > changes
> > > > > > are generally better, even if there's specific instances where they
> > > > > > make
> > > > > > things worse.  If the benefits outweigh the negatives then we can
> > > > > > xfail
> > > > > > that test.
> > > > > Ok, I'll investigate and benchmark some more.
> > > > > Andrew, this transformation to ifcombine (together with the
> > > > > restriction
> > > > > that the inner condition block
> > > > > has to be a single comparison) was added by you with r204194.
> > > > > Is there a particular reason for that restriction and why it is
> > > > > applied to
> > > > > the inner block and not either?
> > > > My reasoning at the
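
To illustrate the two shapes from the dumps quoted in this thread, here is a sketch in plain C++. The function names are made up for this note. Both functions return the same value, but only the first has a path on which x is known to be 6 before the shift is evaluated, and that is the fact dom1 was exploiting.

```cpp
#include <cassert>

// Shape before the patch: two separate tests.  On the path that
// reaches the second test x is known to be 6, so "2 << x" can fold
// to 128.
bool separate_tests(int x, unsigned u)
{
  if (x != 6)
    return true;
  return static_cast<unsigned>(2 << x) != u;
}

// Shape after the patch: both comparisons are evaluated and ORed, so
// the shift is computed before anything is known about x.
bool combined_test(int x, unsigned u)
{
  bool shift_differs = static_cast<unsigned>(2 << x) != u;
  bool x_differs = x != 6;
  return shift_differs | x_differs;
}
```

In the combined form x == 6 can only be concluded on the edge where the ORed value is false, which is after the shift has already been computed.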

[v3 patch] Fix filesystem::create_directories() function

2015-09-23 Thread Jonathan Wakely


This function wasn't working properly (testing is useful!)

Tested x86_64-linux, powerpc64le-linux and x86_64-dragonfly4.1,
committed to trunk.

commit 9f9ee62dc3e3d5a1cc825298b93afedc2eaf0aeb
Author: Jonathan Wakely 
Date:   Tue Sep 22 23:43:59 2015 +0100

Fix filesystem::create_directories() function

* src/filesystem/ops.cc (is_dot, is_dotdot): Define new helpers.
(create_directories): Fix error handling.
* testsuite/experimental/filesystem/operations/create_directories.cc:
New.

diff --git a/libstdc++-v3/src/filesystem/ops.cc b/libstdc++-v3/src/filesystem/ops.cc
index b5c8eb9..5ff8120 100644
--- a/libstdc++-v3/src/filesystem/ops.cc
+++ b/libstdc++-v3/src/filesystem/ops.cc
@@ -85,6 +85,24 @@ fs::absolute(const path& p, const path& base)
 
 namespace
 {
+#ifdef _GLIBCXX_FILESYSTEM_IS_WINDOWS
+  inline bool is_dot(wchar_t c) { return c == L'.'; }
+#else
+  inline bool is_dot(char c) { return c == '.'; }
+#endif
+
+  inline bool is_dot(const fs::path& path)
+  {
+const auto& filename = path.native();
+return filename.size() == 1 && is_dot(filename[0]);
+  }
+
+  inline bool is_dotdot(const fs::path& path)
+  {
+const auto& filename = path.native();
+return filename.size() == 2 && is_dot(filename[0]) && is_dot(filename[1]);
+  }
+
   struct free_as_in_malloc
   {
 void operator()(void* p) const { ::free(p); }
@@ -576,19 +594,36 @@ fs::create_directories(const path& p)
 bool
 fs::create_directories(const path& p, error_code& ec) noexcept
 {
+  if (p.empty())
+{
+  ec = std::make_error_code(errc::invalid_argument);
+  return false;
+}
   std::stack<path> missing;
   path pp = p;
-  ec.clear();
-  while (!p.empty() && !exists(pp, ec) && !ec.value())
+
+  while (!pp.empty() && status(pp, ec).type() == file_type::not_found)
 {
-  missing.push(pp);
-  pp = pp.parent_path();
+  ec.clear();
+  const auto& filename = pp.filename();
+  if (!is_dot(filename) && !is_dotdot(filename))
+   missing.push(pp);
+  pp.remove_filename();
 }
-  while (!missing.empty() && !ec.value())
+
+  if (ec || missing.empty())
+return false;
+
+  do
 {
-  create_directory(missing.top(), ec);
+  const path& top = missing.top();
+  create_directory(top, ec);
+  if (ec && is_directory(top))
+   ec.clear();
   missing.pop();
 }
+  while (!missing.empty() && !ec);
+
   return missing.empty();
 }
 
diff --git a/libstdc++-v3/testsuite/experimental/filesystem/operations/create_directories.cc b/libstdc++-v3/testsuite/experimental/filesystem/operations/create_directories.cc
new file mode 100644
index 000..b84d966
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/filesystem/operations/create_directories.cc
@@ -0,0 +1,75 @@
+// Copyright (C) 2015 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+// { dg-options "-std=gnu++11 -lstdc++fs" }
+// { dg-require-filesystem-ts "" }
+
+#include <experimental/filesystem>
+#include <testsuite_hooks.h>
+#include <testsuite_fs.h>
+
+namespace fs = std::experimental::filesystem;
+
+void
+test01()
+{
+  bool test __attribute__((unused)) = false;
+  std::error_code ec;
+
+  // Test empty path.
+  bool b = fs::create_directories( "", ec );
+  VERIFY( ec );
+  VERIFY( !b );
+
+  // Test existing path.
+  b = fs::create_directories( fs::current_path(), ec );
+  VERIFY( !ec );
+  VERIFY( !b );
+
+  // Test non-existent path.
+  const auto p = __gnu_test::nonexistent_path();
+  b = fs::create_directories( p, ec );
+  VERIFY( !ec );
+  VERIFY( b );
+  VERIFY( is_directory(p) );
+
+  b = fs::create_directories( p/".", ec );
+  VERIFY( !ec );
+  VERIFY( !b );
+
+  b = fs::create_directories( p/"..", ec );
+  VERIFY( !ec );
+  VERIFY( !b );
+
+  b = fs::create_directories( p/"d1/d2/d3", ec );
+  VERIFY( !ec );
+  VERIFY( b );
+  VERIFY( is_directory(p/"d1/d2/d3") );
+
+  b = fs::create_directories( p/"./d4/../d5", ec );
+  VERIFY( !ec );
+  VERIFY( b );
+  VERIFY( is_directory(p/"./d4/../d5") );
+
+  remove_all(p, ec);
+}
+
+int
+main()
+{
+  test01();
+}
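
The first loop in the patched create_directories can be sketched without touching a real filesystem. In the sketch below the set of existing directories, the '/'-separated string paths and all function names are stand-ins invented for this note; the real code uses fs::path, status() and remove_filename(). One simplification to keep in mind: the set does not resolve "." or "..", so such components simply look missing and are skipped by name.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for the filesystem: the set of directories
// that already exist, as '/'-separated strings.
using Existing = std::set<std::string>;

// Last component of the path.
static std::string filename_of(const std::string& p)
{
  auto pos = p.rfind('/');
  return pos == std::string::npos ? p : p.substr(pos + 1);
}

// The path with its last component removed.
static std::string without_filename(const std::string& p)
{
  auto pos = p.rfind('/');
  return pos == std::string::npos ? std::string() : p.substr(0, pos);
}

// Walk from the full path towards the root and collect every component
// that does not exist yet, skipping "." and "..".  The result is
// ordered innermost first, so directories must be created by iterating
// it in reverse (the patch uses a std::stack for the same effect).
std::vector<std::string> missing_components(std::string p, const Existing& fs)
{
  std::vector<std::string> missing;
  while (!p.empty() && fs.count(p) == 0)
    {
      const std::string name = filename_of(p);
      if (name != "." && name != "..")
        missing.push_back(p);
      p = without_filename(p);
    }
  return missing;
}
```

With only "tmp" present, asking for "tmp/./d4/../d5" yields "tmp/./d4/../d5" and "tmp/./d4", so d4 is created first and d5 second, which is the order exercised by the last case in the new test.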

[Bug target/61578] [4.9 regression] Code size increase for ARM thumb compared to 4.8.x when compiling with -Os

2015-09-23 Thread vogt at linux dot vnet.ibm.com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61578

--- Comment #29 from Dominik Vogt  ---
I think I understand what's going on:

Consider the patched code in match_reload():

+   = (ins[1] < 0 && REG_P (in_rtx)
+  && (int) REGNO (in_rtx) < lra_new_regno_start
+  && find_regno_note (curr_insn, REG_DEAD, REGNO (in_rtx))
+  ? lra_create_new_reg (inmode, in_rtx, goal_class, "")
+  : lra_create_new_reg_with_unique_value (outmode, out_rtx,
+  goal_class, ""));

(1) This code normally makes a unique copy of the register in in_rtx, but if
the register is marked as REG_DEAD in the curr_insn, it just makes a copy of
the register using lra_create_new_reg(), with the same .val and .offset in the
reg_info structure.

(2) Further down in match_reload, new insns are generated and stored in
*before and *after.  However, the new "after" insn still references the old
register.  In other words, in step (1) the code has made the assumption that
the old register is no longer used, then generates an insn that uses it after
it was marked as REG_DEAD.

(3) Based on the bogus decision in (1), the condition in lra-lives.c decides
that the two registers are identical copies and can be mapped to the same hard
register:

+ && (((src_regno >= FIRST_PSEUDO_REGISTER
+   && (! sparseset_bit_p (pseudos_live, src_regno)
+   || (dst_regno >= FIRST_PSEUDO_REGISTER
+   && lra_reg_val_equal_p (src_regno,
+   lra_reg_info[dst_regno].val,
+   lra_reg_info[dst_regno].offset))

[Bug c/49655] diagnostic pragma accepts non-warning options

2015-09-23 Thread manu at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49655

Manuel López-Ibáñez  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
                 CC|                            |manu at gcc dot gnu.org
         Resolution|---                         |FIXED

--- Comment #3 from Manuel López-Ibáñez  ---
Fixed in GCC 6.0

[Bug rtl-optimization/66790] Invalid uninitialized register handling in REE

2015-09-23 Thread derodat at adacore dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66790

Pierre-Marie de Rodat  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #36098|0                           |1
        is obsolete|                            |

--- Comment #33 from Pierre-Marie de Rodat  ---
Created attachment 36377
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36377&action=edit
Updated candidate patch

[Bug rtl-optimization/66790] Invalid uninitialized register handling in REE

2015-09-23 Thread derodat at adacore dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66790

--- Comment #34 from Pierre-Marie de Rodat  ---
Created attachment 36378
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36378&action=edit
Fix for DF_LIVE local BB information

[PATCH] Improve restrict handling further

2015-09-23 Thread Richard Biener


The following fixes

int
f5 (S *__restrict x, S *__restrict y)
{
  x->p[0] = 5;
  y->p[0] = 0;
// { dg-final { scan-tree-dump-times "return 5" 1 "optimized" { xfail *-*-* } } }
  return x->p[0];
}

which requires building representatives for restrict qualified pointers
(as opposed to references or decl-by-references).  The fear here was
that as we can access that representative with out-of-bound objects
(we eventually point to an array) we'd miscompute points-to sets.
I verified we do the obvious thing here, namely glob those accesses
to the first/last subfield of the representative (that code was added
to compensate for pointer arithmetic going out-of-bounds).

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2015-09-23  Richard Biener  

* tree-ssa-structalias.c (intra_create_variable_infos): Build
representatives for all restrict qualified pointer destinations.

* g++.dg/tree-ssa/restrict2.C: Un-XFAIL testcase.

Index: gcc/tree-ssa-structalias.c
===
*** gcc/tree-ssa-structalias.c  (revision 228014)
--- gcc/tree-ssa-structalias.c  (working copy)
*** intra_create_variable_infos (struct func
*** 5854,5865 
  {
varinfo_t p = get_vi_for_tree (t);
  
!   /* For restrict qualified pointers to objects passed by
!  reference build a real representative for the pointed-to object.
!Treat restrict qualified references the same.  */
!   if (TYPE_RESTRICT (TREE_TYPE (t))
! && ((DECL_BY_REFERENCE (t) && POINTER_TYPE_P (TREE_TYPE (t)))
! || TREE_CODE (TREE_TYPE (t)) == REFERENCE_TYPE)
  && !type_contains_placeholder_p (TREE_TYPE (TREE_TYPE (t))))
{
  struct constraint_expr lhsc, rhsc;
--- 5854,5865 
  {
varinfo_t p = get_vi_for_tree (t);
  
!   /* For restrict qualified pointers build a representative for
!the pointed-to object.  Note that this ends up handling
!out-of-bound references conservatively by aggregating them
!in the first/last subfield of the object.  */
!   if (POINTER_TYPE_P (TREE_TYPE (t))
! && TYPE_RESTRICT (TREE_TYPE (t))
  && !type_contains_placeholder_p (TREE_TYPE (TREE_TYPE (t))))
{
  struct constraint_expr lhsc, rhsc;
Index: gcc/testsuite/g++.dg/tree-ssa/restrict2.C
===
*** gcc/testsuite/g++.dg/tree-ssa/restrict2.C   (revision 228014)
--- gcc/testsuite/g++.dg/tree-ssa/restrict2.C   (working copy)
*** f5 (S *__restrict x, S *__restrict y)
*** 45,52 
  {
x->p[0] = 5;
y->p[0] = 0;
! // We might handle this some day
! // { dg-final { scan-tree-dump-times "return 5" 1 "optimized" { xfail *-*-* } } }
return x->p[0];
  }
  
--- 45,51 
  {
x->p[0] = 5;
y->p[0] = 0;
! // { dg-final { scan-tree-dump-times "return 5" 1 "optimized" } }
return x->p[0];
  }

Re: [PATCH c-family/49654/49655] reject invalid options in pragma diagnostic

2015-09-23 Thread Marek Polacek

On Tue, Sep 22, 2015 at 08:08:28PM +0200, Manuel López-Ibáñez wrote:
> +  else if (!(cl_options[option_index].flags & lang_mask))
> +{
> +  char * ok_langs = write_langs (cl_options[option_index].flags);
> +  char * bad_lang = write_langs (c_common_option_lang_mask ());

Please remove the spaces after * when you commit the patch.

Thanks,

Marek

[Bug target/67391] [SH] Convert clrt addc to normal add insn

2015-09-23 Thread olegendo at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67391

--- Comment #9 from Oleg Endo  ---
Author: olegendo
Date: Wed Sep 23 11:57:27 2015
New Revision: 228047

URL: https://gcc.gnu.org/viewcvs?rev=228047&root=gcc&view=rev
Log:
gcc/
Backport from mainline
2015-09-23  Oleg Endo  

PR target/67391
* config/sh/sh.md (addsi3, *addsi3_compact): Don't check for
overlapping regs when matching the pattern.

Modified:
branches/gcc-5-branch/gcc/ChangeLog
branches/gcc-5-branch/gcc/config/sh/sh.md

[Bug target/67391] [SH] Convert clrt addc to normal add insn

2015-09-23 Thread olegendo at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67391

--- Comment #8 from Oleg Endo  ---
Author: olegendo
Date: Wed Sep 23 11:55:45 2015
New Revision: 228046

URL: https://gcc.gnu.org/viewcvs?rev=228046&root=gcc&view=rev
Log:
gcc/
PR target/67391
* config/sh/sh.md (addsi3, *addsi3_compact): Don't check for
overlapping regs when matching the pattern.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/sh/sh.md

[Bug middle-end/67662] -fsanitize=undefined cries wolf for X - 1 + X when X is 2**30

2015-09-23 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67662

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to work|                            |6.0

--- Comment #4 from Richard Biener  ---
Fixed on trunk; I am considering backporting the fix to GCC 5 at least.
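
For anyone reading along, the arithmetic behind the false positive can be checked with a small sketch. It assumes the problematic fold regrouped X - 1 + X along the lines of (X + X) - 1; the exact form produced inside fold_binary_loc is not shown in this report, and the function names below are invented for this note. Everything is evaluated in 64 bits so the sketch itself never overflows.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Would this value fit in a 32-bit signed int?
bool fits_int32(std::int64_t v)
{
  return v >= std::numeric_limits<std::int32_t>::min()
      && v <= std::numeric_limits<std::int32_t>::max();
}

// Source order: (x - 1) + x.
bool source_order_ok(std::int32_t x)
{
  std::int64_t t = std::int64_t{x} - 1;
  return fits_int32(t) && fits_int32(t + x);
}

// Assumed reassociated order: (x + x) - 1.
bool reassociated_ok(std::int32_t x)
{
  std::int64_t t = std::int64_t{x} + x;
  return fits_int32(t) && fits_int32(t - 1);
}
```

With X equal to 2**30 the source order stays within range at every step and ends at INT_MAX, while the regrouped form passes through 2**31, which is exactly the kind of intermediate overflow the sanitizer reports.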

Re: [PATCH 6/n] OpenMP 4.0 offloading infrastructure: option handling

2015-09-23 Thread Thomas Schwinge

Hi!

I clarified the -foffload usage:
.

On Wed, 23 Sep 2015 00:23:50 +0200, Bernd Schmidt  wrote:
> On 09/22/2015 02:02 PM, Thomas Schwinge wrote:
> >
> > gcc/
> > * gcc.c (handle_foffload_option): Don't lose the trailing NUL
> > character when appending to offload_targets.
> >
> > gcc/
> > * configure.ac (offload_targets, OFFLOAD_TARGETS): Separate
> > offload targets by commas, not colons.
> > * config.in: Regenerate.
> > * configure: Likewise.
> > * gcc.c (driver::maybe_putenv_COLLECT_LTO_WRAPPER): Due to that,
> > instead of setting up the default offload targets here...
> > (process_command): ..., do it here.
> > libgomp/
> > * plugin/configfrag.ac (OFFLOAD_TARGETS): Clarify that offload
> > targets are separated by commas.
> > * config.h.in: Regenerate.
> 
> Looks ok to me

Thanks for the prompt review!

> except this double ChangeLog seems messed up.

Hmm, I thought that was the standard way to format ChangeLogs for
several/independent changes?  Anyway, to avoid that, I've split the patch
into two separate commits; r228053 and r228054:

commit daa8f58fd840e8d35f362306fb54e1963f4cbd0f
Author: tschwinge 
Date:   Wed Sep 23 14:52:50 2015 +

Fix --enable-offload-targets/-foffload handling, pt. 1

gcc/
* configure.ac (offload_targets, OFFLOAD_TARGETS): Separate
offload targets by commas, not colons.
* config.in: Regenerate.
* configure: Likewise.
* gcc.c (driver::maybe_putenv_COLLECT_LTO_WRAPPER): Due to that,
instead of setting up the default offload targets here...
(process_command): ..., do it here.
libgomp/
* plugin/configfrag.ac (OFFLOAD_TARGETS): Clarify that offload
targets are separated by commas.
* config.h.in: Regenerate.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@228053 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog|   14 ++
 gcc/config.in|2 +-
 gcc/configure|2 +-
 gcc/configure.ac |4 ++--
 gcc/gcc.c|   23 +--
 gcc/lto-wrapper.c|4 
 libgomp/config.h.in  |2 +-
 libgomp/plugin/configfrag.ac |2 +-
 8 files changed, 37 insertions(+), 16 deletions(-)

diff --git gcc/ChangeLog gcc/ChangeLog
index 0e9b728..df71558 100644
--- gcc/ChangeLog
+++ gcc/ChangeLog
@@ -1,4 +1,18 @@
 2015-09-23  Thomas Schwinge  
+
+   * configure.ac (offload_targets, OFFLOAD_TARGETS): Separate
+   offload targets by commas, not colons.
+   * config.in: Regenerate.
+   * configure: Likewise.
+   * gcc.c (driver::maybe_putenv_COLLECT_LTO_WRAPPER): Due to that,
+   instead of setting up the default offload targets here...
+   (process_command): ..., do it here.
+   libgomp/
+   * plugin/configfrag.ac (OFFLOAD_TARGETS): Clarify that offload
+   targets are separated by commas.
+   * config.h.in: Regenerate.
+
+2015-09-23  Thomas Schwinge  
Nathan Sidwell  
 
* omp-low.h (omp_reduction_init_op): Declare.
diff --git gcc/config.in gcc/config.in
index 431d262..c5c1be4 100644
--- gcc/config.in
+++ gcc/config.in
@@ -1913,7 +1913,7 @@
 #endif
 
 
-/* Define to hold the list of target names suitable for offloading. */
+/* Define to offload targets, separated by commas. */
 #ifndef USED_FOR_TARGET
 #undef OFFLOAD_TARGETS
 #endif
diff --git gcc/configure gcc/configure
index 6fb11a7..7493c80 100755
--- gcc/configure
+++ gcc/configure
@@ -7696,7 +7696,7 @@ for tgt in `echo $enable_offload_targets | sed 's/,/ /g'`; do
   if test x"$offload_targets" = x; then
 offload_targets=$tgt
   else
-offload_targets="$offload_targets:$tgt"
+offload_targets="$offload_targets,$tgt"
   fi
 done
 
diff --git gcc/configure.ac gcc/configure.ac
index a6e078a..9d1f6f1 100644
--- gcc/configure.ac
+++ gcc/configure.ac
@@ -941,11 +941,11 @@ for tgt in `echo $enable_offload_targets | sed 's/,/ /g'`; do
   if test x"$offload_targets" = x; then
 offload_targets=$tgt
   else
-offload_targets="$offload_targets:$tgt"
+offload_targets="$offload_targets,$tgt"
   fi
 done
 AC_DEFINE_UNQUOTED(OFFLOAD_TARGETS, "$offload_targets",
-  [Define to hold the list of target names suitable for offloading.])
+  [Define to offload targets, separated by commas.])
 if test x"$offload_targets" != x; then
   AC_DEFINE(ENABLE_OFFLOADING, 1,
 [Define this to enable support for offloading.])
diff --git gcc/gcc.c gcc/gcc.c
index 757bfc9..78b68e2 100644
--- gcc/gcc.c
+++ gcc/gcc.c
@@ -284,7 +284,8 @@ static const char *const spec_version =

Re: [PATCH 0/5] RFC: Overhaul of diagnostics (v2)

2015-09-23 Thread Richard Biener

On Wed, Sep 23, 2015 at 3:19 PM, Michael Matz  wrote:
> Hi,
>
> On Tue, 22 Sep 2015, David Malcolm wrote:
>
>> The drawback is that it could bloat the ad-hoc table.  Can the ad-hoc
>> table ever get smaller, or does it only ever get inserted into?
>
> It only ever grows.
>
>> An idea I had is that we could stash short ranges directly into the 32
>> bits of location_t, by offsetting the per-column-bits somewhat.
>
> It's certainly worth an experiment: let's say you restrict yourself to
> tokens less than 8 characters, you need an additional 3 bits (using one
> value, e.g. zero, as the escape value).  That leaves 20 bits for the line
> numbers (for the normal 8 bit columns), which might be enough for most
> single-file compilations.  For LTO compilation this often won't be enough.
>
>> My plan is to investigate the impact these patches have on the time and
>> memory consumption of the compiler,
>
> When you do so, make sure you're also measuring an LTO compilation with
> debug info of something big (firefox).  I know that we already had issues
> with the size of the linemap data in the past for these cases (probably
> when we added columns).

The issue we have with LTO is that the linemap gets populated in quite
random order and thus we repeatedly switch files (we've mitigated this
somewhat for GCC 5).  We also considered dropping column info
(and would drop range info) as diagnostics are from optimizers only
with LTO and we keep locations merely for debug info.

Richard.

>
> Ciao,
> Michael.
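
The bit budget Michael describes can be sketched in a few lines of C. This is only an illustration under stated assumptions, not GCC's actual location_t encoding: the names (packed_loc, pack_loc, loc_line, loc_col, loc_len) are hypothetical, and it assumes the one bit left over after 20 + 8 + 3 stays reserved for something like an ad-hoc flag.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout, NOT GCC's real location_t encoding:
     bit  31      reserved (assumed, e.g. for an ad-hoc flag)
     bits 30..11  line number (20 bits)
     bits 10..3   column (8 bits)
     bits 2..0    token length; 0 is the escape value meaning
                  "range does not fit, look it up elsewhere".  */
enum { LEN_BITS = 3, COL_BITS = 8, LINE_BITS = 20 };

typedef uint32_t packed_loc;

/* Pack a line, a column and a token length into 32 bits.  */
static packed_loc
pack_loc (uint32_t line, uint32_t col, uint32_t len)
{
  assert (line < (1u << LINE_BITS));
  assert (col < (1u << COL_BITS));
  /* Lengths 1..7 are stored inline; everything else uses the escape.  */
  uint32_t len_field = (len >= 1 && len < (1u << LEN_BITS)) ? len : 0;
  return (line << (COL_BITS + LEN_BITS)) | (col << LEN_BITS) | len_field;
}

/* Extract the line number.  */
static uint32_t
loc_line (packed_loc loc)
{
  return (loc >> (COL_BITS + LEN_BITS)) & ((1u << LINE_BITS) - 1);
}

/* Extract the column.  */
static uint32_t
loc_col (packed_loc loc)
{
  return (loc >> LEN_BITS) & ((1u << COL_BITS) - 1);
}

/* Extract the inline token length; 0 means "escaped".  */
static uint32_t
loc_len (packed_loc loc)
{
  return loc & ((1u << LEN_BITS) - 1);
}
```

With 20 line bits the sketch tops out at 1048575 lines per map, which matches the concern that single-file compilations may fit while LTO often will not.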

Re: patch for PR61578

2015-09-23 Thread Dominik Vogt

On Tue, Sep 01, 2015 at 03:39:19PM -0400, Vladimir Makarov wrote:
>   The following patch is for
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61578
> 
>   The patch was bootstrapped and tested on x86 and x86-64.
> 
>   Committed as rev. 227382.
> 
> 2015-09-01  Vladimir Makarov  
> 
> PR target/61578
> * lra-lives.c (process_bb_lives): Process move pseudos with the
> same value for copies and preferences
> * lra-constraints.c (match_reload): Create match reload pseudo
> with the same value from single dying input pseudo.

This check-in caused a regression on s390, please see
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61578 for details.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany

[PATCH] Fix PR67662

2015-09-23 Thread Richard Biener


Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2015-09-23   Richard Biener  

PR middle-end/67662
* fold-const.c (fold_binary_loc): Do not reassociate two vars with
undefined overflow unless they will cancel out.

* gcc.dg/ubsan/pr67662.c: New testcase.

Index: gcc/fold-const.c
===
--- gcc/fold-const.c(revision 228037)
+++ gcc/fold-const.c(working copy)
@@ -9493,25 +9511,32 @@ fold_binary_loc (location_t loc,
{
  tree tmp0 = var0;
  tree tmp1 = var1;
+ bool one_neg = false;
 
  if (TREE_CODE (tmp0) == NEGATE_EXPR)
-   tmp0 = TREE_OPERAND (tmp0, 0);
+   {
+ tmp0 = TREE_OPERAND (tmp0, 0);
+ one_neg = !one_neg;
+   }
  if (CONVERT_EXPR_P (tmp0)
  && INTEGRAL_TYPE_P (TREE_TYPE (TREE_OPERAND (tmp0, 0)))
  && (TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (tmp0, 0)))
  <= TYPE_PRECISION (atype)))
tmp0 = TREE_OPERAND (tmp0, 0);
  if (TREE_CODE (tmp1) == NEGATE_EXPR)
-   tmp1 = TREE_OPERAND (tmp1, 0);
+   {
+ tmp1 = TREE_OPERAND (tmp1, 0);
+ one_neg = !one_neg;
+   }
  if (CONVERT_EXPR_P (tmp1)
  && INTEGRAL_TYPE_P (TREE_TYPE (TREE_OPERAND (tmp1, 0)))
  && (TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (tmp1, 0)))
  <= TYPE_PRECISION (atype)))
tmp1 = TREE_OPERAND (tmp1, 0);
  /* The only case we can still associate with two variables
-is if they are the same, modulo negation and bit-pattern
-preserving conversions.  */
- if (!operand_equal_p (tmp0, tmp1, 0))
+is if they cancel out.  */
+ if (!one_neg
+ || !operand_equal_p (tmp0, tmp1, 0))
ok = false;
}
}
Index: gcc/testsuite/gcc.dg/ubsan/pr67662.c
===
--- gcc/testsuite/gcc.dg/ubsan/pr67662.c(revision 0)
+++ gcc/testsuite/gcc.dg/ubsan/pr67662.c(working copy)
@@ -0,0 +1,14 @@
+/* { dg-do run } */
+/* { dg-options "-fsanitize=undefined" } */
+
+extern void abort (void);
+
+int
+main (void)
+{
+  int halfmaxval = __INT_MAX__ / 2 + 1;
+  int maxval = halfmaxval - 1 + halfmaxval;
+  if (maxval != __INT_MAX__)
+abort ();
+  return 0;
+}
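
For readers following along, the reason the testcase is sensitive to reassociation can be shown with the overflow-checking builtins available in GCC 5 and later. This is a minimal sketch, not part of the patch; the two helper names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Evaluate (a - 1) + b in source order.
   Returns true if any step overflows a signed int.  */
static bool
source_order_overflows (int a, int b)
{
  int tmp, res;
  if (__builtin_sub_overflow (a, 1, &tmp))
    return true;
  return __builtin_add_overflow (tmp, b, &res);
}

/* Evaluate (a + b) - 1, i.e. the form obtained by associating the
   two variables first.  Returns true if any step overflows.  */
static bool
reassociated_overflows (int a, int b)
{
  int tmp, res;
  if (__builtin_add_overflow (a, b, &tmp))
    return true;
  return __builtin_sub_overflow (tmp, 1, &res);
}
```

With a = b = __INT_MAX__ / 2 + 1 the source-order form stays in range and yields __INT_MAX__, while the reassociated form overflows in its first addition. That is the situation the new one_neg check rules out: two variables are only associated when one of them is negated, so that they cancel.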

Re: Refactor omp_reduction_init: omp_reduction_init_op

2015-09-23 Thread Bernd Schmidt


gcc/
* omp-low.h (omp_reduction_init_op): Declare.
* omp-low.c (omp_reduction_init_op): New, broken out of ...
(omp_reduction_init): ... here.  Call it.
* tree-parloops.c (initialize_reductions): Use
omp_reduction_init_op.


That looks ok.


Bernd

Re: Powerpc atomic_load

2015-09-23 Thread Sebastian Huber


On 10/09/15 19:52, David Edelsohn wrote:

https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html


Is there a specific reason why the SYNC L,E (Elemental Memory Barriers) 
defined by Power-ISA V2.07 doesn't appear in this table?


--
Sebastian Huber, embedded brains GmbH

Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone   : +49 89 189 47 41-16
Fax : +49 89 189 47 41-09
E-Mail  : sebastian.hu...@embedded-brains.de
PGP : Public key available on request.

This message is not a business communication within the meaning of the EHUG.

Re: [RFC] Masking vectorized loops with bound not aligned to VF.

2015-09-23 Thread Richard Biener

On Fri, Sep 18, 2015 at 6:07 PM, Kirill Yukhin  wrote:
> Hello,
> On 18 Sep 10:31, Richard Biener wrote:
>> On Thu, 17 Sep 2015, Ilya Enkovich wrote:
>>
>> > 2015-09-16 15:30 GMT+03:00 Richard Biener :
>> > > On Mon, 14 Sep 2015, Kirill Yukhin wrote:
>> > >
>> > >> Hello,
>> > >> I'd like to initiate discussion on vectorization of loops which
>> > >> boundaries are not aligned to VF. Main target for this optimization
>> > >> right now is x86's AVX-512, which features per-element embedded masking
>> > >> for all instructions. The main goal for this mail is to agree on overall
>> > >> design of the feature.
>> > >>
>> > >> This approach was presented @ GNU Cauldron 2015 by Ilya Enkovich [1].
>> > >>
>> > >> Here's a sketch of the algorithm:
>> > >>   1. Add check on basic stmts for masking: possibility to introduce 
>> > >> index vector and
>> > >>  corresponding mask
>> > >>   2. At the check if statements are vectorizable we additionally check 
>> > >> if stmts
>> > >>  need and can be masked and compute masking cost. Result is stored 
>> > >> in `stmt_vinfo`.
>> > >>  We are going  to mask only mem. accesses, reductions and modify 
>> > >> mask for already
>> > >>  masked stmts (mask load, mask store and vect. condition)
>> > >
>> > > I think you also need to mask divisions (for integer divide by zero) and
>> > > want to mask FP ops which may result in NaNs or denormals (because that's
>> > > generally going to slow down execution a lot in my experience).
>> > >
>> > > Why not simply mask all stmts?
>> >
>> > Hi,
>> >
>> > Statement masking may be not free. Especially if we need to transform
>> > mask somehow to do it. It also may be unsupported on a platform (e.g.
>> > for AVX-512 not all instructions support masking) but still not be a
>> > problem to mask a loop. BTW for AVX-512 masking doesn't boost
>> > performance even if we have some special cases like NaNs. We don't
>> > consider exceptions in vector code (and it seems to be a case now?)
>> > otherwise we would need to mask them also.
>>
>> Well, we do need to honor
>>
>>   if (x != 0.)
>>y[i] = z[i] / x;
>>
>> in some way.  I think if-conversion currently simply gives up here.
>> So if we have the epilogue and using masked loads what are the
>> contents of the 'masked' elements (IIRC they are zero or all-ones,
>> right)?  If the end up as zero then even simple code like
>>
>>   for (i;;)
>>a[i] = b[i] / c[i];
>>
>> cannot be transformed in the suggested way with -ftrapping-math
>> and the remainder iteration might get slow if processing NaN
>> operands is still as slow as it was 10 years ago.
>>
>> IMHO for if-converting possibly trapping stmts (like the above
>> example) we need some masking support anyway (and a way to express
>> the masking in GIMPLE).
> We'll use if-cvt technique. If op is trapping - we do not apply masking for 
> loop remainder
> This is subject for further development. Currently we don't try truly mask 
> existing GIMPLE
> stmts. All masking is achieved using `vec_cond` and we're not sure that 
> trapping is really
> useful feature while vectorization is on.

Ok.  And yes, we'd need to have a way to predicate such stmts directly.
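
To make the concern concrete, here is a scalar emulation of a masked remainder for a[i] = b[i] / c[i]. It is a sketch under assumptions: VF, the helper names and the choice of 1 as the substitute divisor are hypothetical, and real vector code would use target masking rather than per-lane loops.

```c
#include <assert.h>
#include <stddef.h>

enum { VF = 4 };  /* hypothetical vectorization factor */

/* Emulate one masked vector iteration of a[i] = b[i] / c[i] starting
   at index BASE.  Lanes at or beyond N are inactive.  */
static void
masked_div_step (int *a, const int *b, const int *c, size_t base, size_t n)
{
  int mask[VF], num[VF], den[VF], quot[VF];

  for (int lane = 0; lane < VF; lane++)
    {
      mask[lane] = base + lane < n;
      /* Masked load: inactive lanes touch no memory and see zero.  */
      num[lane] = mask[lane] ? b[base + lane] : 0;
      /* Inactive lanes get a harmless divisor so they cannot trap.  */
      den[lane] = mask[lane] ? c[base + lane] : 1;
    }

  /* The division itself runs on every lane, active or not.  */
  for (int lane = 0; lane < VF; lane++)
    quot[lane] = num[lane] / den[lane];

  /* Masked store: only active lanes are written back.  */
  for (int lane = 0; lane < VF; lane++)
    if (mask[lane])
      a[base + lane] = quot[lane];
}

/* Run the whole loop with a masked final iteration.  */
static void
masked_div_loop (int *a, const int *b, const int *c, size_t n)
{
  for (size_t base = 0; base < n; base += VF)
    masked_div_step (a, b, c, base, n);
}

/* Small self-check: six elements, so the second step has two
   inactive lanes.  Returns 1 if results and the sentinel are intact.  */
static int
masked_div_selfcheck (void)
{
  int b[6] = { 10, 20, 30, 40, 50, 60 };
  int c[6] = { 1, 2, 3, 4, 5, 6 };
  int a[7] = { 0, 0, 0, 0, 0, 0, -1 };  /* a[6] is a sentinel */

  masked_div_loop (a, b, c, 6);
  for (int i = 0; i < 6; i++)
    if (a[i] != 10)
      return 0;
  return a[6] == -1;
}
```

If the inactive divisor lanes were left at zero, as a zeroing masked load would produce, the unmasked division in the middle loop would divide by zero. That is the case where the statement itself needs predication.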

>> > >>   3. Make a decision about masking: take computed costs and est. 
>> > >> iterations count
>> > >>  into consideration
>> > >>   4. Modify prologue/epilogue generation according decision made at 
>> > >> analysis. Three
>> > >>  options available:
>> > >> a. Use scalar remainder
>> > >> b. Use masked remainder. Won't be supported in first version
>> > >> c. Mask main loop
>> > >>   5.Support vectorized loop masking:
>> > >> - Create stmts for mask generation
>> > >> - Support generation of masked vector code (create generic vector 
>> > >> code then
>> > >>   patch it w/ masks)
>> > >>   -  Mask loads/stores/vconds/reductions only
>> > >>
>> > >>  In first version (targeted v6) we're not going to support 4.b and loop
>> > >> mask pack/unpack. No `pack/unpack` means that masking will be supported
>> > >> only for types w/ the same size as index variable
>> > >
>> > > This means that if ncopies for any stmt is > 1 masking won't be 
>> > > supported,
>> > > right?  (you'd need two or more different masks)
>> >
>> > We don't think it is a very important feature to have in initial
>> > version. It can be added later and shouldn't affect overall
>> > implementation design much. BTW currently masked loads and stores
>> > don't support masks of other sizes and don't do masks pack/unpack.
>>
>> I think masked loads/stores support this just fine.  Remember the
>> masks are regular vectors generated by cond exprs in the current code.
> Not quite true, mask load/stores are not supported for different size.
> E.g. this example is not vectorized:
>   int a[LENGTH], b[LENGTH];
>   long long c[LENGTH];
>
>   int test ()
>   {
> int i;
> #pragma omp simd safelen(16)
> for (i = 0; i < LENGTH; i++)

Refactor omp_reduction_init: omp_reduction_init_op (was: [gomp4] ptx reduction simplification)

2015-09-23 Thread Thomas Schwinge

Hi!

On Tue, 22 Sep 2015 11:11:59 -0400, Nathan Sidwell  
wrote:
> On 09/22/15 11:10, Thomas Schwinge wrote:
> > On Fri, 18 Sep 2015 20:05:48 -0400, Nathan Sidwell  wrote:
> >> I've committed this patch to rework and simplify [...]
> >> the reduction lowering hooks.
> >>
> >> The current implementation [...]
> >> [was] overcomplicated in a number of ways.
> >
> >>* omp-low.h (omp_reduction_init_op): Declare.
> >>* omp-low.c (omp_reduction_init_op): New, broken out of ...
> >>(omp_reduction_init): ... here.  Call it.
> >>* tree-parloops.c (initialize_reductions): Use
> >>omp_reduction_init_op.
> >
> > Should this go into trunk already?  (I can test it, if you'd like me to.)
> 
> go  for it!

Tested on x86_64-pc-linux-gnu; no changes.  OK for trunk?

commit de2726ef46b8d875239ccb445c784c56e1a716dc
Author: Thomas Schwinge 
Date:   Tue Sep 22 17:30:40 2015 +0200

Refactor omp_reduction_init: omp_reduction_init_op

2015-09-23  Thomas Schwinge  
Nathan Sidwell  

gcc/
* omp-low.h (omp_reduction_init_op): Declare.
* omp-low.c (omp_reduction_init_op): New, broken out of ...
(omp_reduction_init): ... here.  Call it.
* tree-parloops.c (initialize_reductions): Use
omp_reduction_init_op.
---
 gcc/omp-low.c   |   16 
 gcc/omp-low.h   |1 +
 gcc/tree-parloops.c |   16 +---
 3 files changed, 18 insertions(+), 15 deletions(-)

diff --git gcc/omp-low.c gcc/omp-low.c
index 88a5149..fae407d 100644
--- gcc/omp-low.c
+++ gcc/omp-low.c
@@ -3372,13 +3372,12 @@ maybe_lookup_decl_in_outer_ctx (tree decl, omp_context *ctx)
 }
 
 
-/* Construct the initialization value for reduction CLAUSE.  */
+/* Construct the initialization value for reduction operation OP.  */
 
 tree
-omp_reduction_init (tree clause, tree type)
+omp_reduction_init_op (location_t loc, enum tree_code op, tree type)
 {
-  location_t loc = OMP_CLAUSE_LOCATION (clause);
-  switch (OMP_CLAUSE_REDUCTION_CODE (clause))
+  switch (op)
 {
 case PLUS_EXPR:
 case MINUS_EXPR:
@@ -3451,6 +3450,15 @@ omp_reduction_init (tree clause, tree type)
 }
 }
 
+/* Construct the initialization value for reduction CLAUSE.  */
+
+tree
+omp_reduction_init (tree clause, tree type)
+{
+  return omp_reduction_init_op (OMP_CLAUSE_LOCATION (clause),
+   OMP_CLAUSE_REDUCTION_CODE (clause), type);
+}
+
 /* Return alignment to be assumed for var in CLAUSE, which should be
OMP_CLAUSE_ALIGNED.  */
 
diff --git gcc/omp-low.h gcc/omp-low.h
index 8a4052e..44e35a3 100644
--- gcc/omp-low.h
+++ gcc/omp-low.h
@@ -25,6 +25,7 @@ struct omp_region;
 extern tree find_omp_clause (tree, enum omp_clause_code);
 extern void omp_expand_local (basic_block);
 extern void free_omp_regions (void);
+extern tree omp_reduction_init_op (location_t, enum tree_code, tree);
 extern tree omp_reduction_init (tree, tree);
 extern bool make_gimple_omp_edges (basic_block, struct omp_region **, int *);
 extern void omp_finish_file (void);
diff --git gcc/tree-parloops.c gcc/tree-parloops.c
index c164121..94cacb6 100644
--- gcc/tree-parloops.c
+++ gcc/tree-parloops.c
@@ -565,8 +565,8 @@ reduc_stmt_res (gimple stmt)
 int
 initialize_reductions (reduction_info **slot, struct loop *loop)
 {
-  tree init, c;
-  tree bvar, type, arg;
+  tree init;
+  tree type, arg;
   edge e;
 
   struct reduction_info *const reduc = *slot;
@@ -577,16 +577,10 @@ initialize_reductions (reduction_info **slot, struct loop *loop)
   /* In the phi node at the header, replace the argument coming
  from the preheader with the reduction initialization value.  */
 
-  /* Create a new variable to initialize the reduction.  */
+  /* Initialize the reduction.  */
   type = TREE_TYPE (PHI_RESULT (reduc->reduc_phi));
-  bvar = create_tmp_var (type, "reduction");
-
-  c = build_omp_clause (gimple_location (reduc->reduc_stmt),
-   OMP_CLAUSE_REDUCTION);
-  OMP_CLAUSE_REDUCTION_CODE (c) = reduc->reduction_code;
-  OMP_CLAUSE_DECL (c) = SSA_NAME_VAR (reduc_stmt_res (reduc->reduc_stmt));
-
-  init = omp_reduction_init (c, TREE_TYPE (bvar));
+  init = omp_reduction_init_op (gimple_location (reduc->reduc_stmt),
+   reduc->reduction_code, type);
   reduc->init = init;
 
   /* Replace the argument representing the initialization value


Regards,
 Thomas
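
As a rough illustration of the shape of this refactoring, outside of GCC's tree machinery: the initial value of a reduction is the identity element of its operation, and once the helper takes the operation code directly, a caller that only knows the code no longer has to build a clause first. All names below (red_op, red_clause, red_init_op, red_init, red_combine) are hypothetical stand-ins, and the sketch only handles int.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-ins for a few of the tree codes involved.  */
enum red_op
{
  RED_PLUS, RED_MULT, RED_BIT_AND, RED_BIT_IOR, RED_BIT_XOR,
  RED_MIN, RED_MAX
};

/* Hypothetical stand-in for a reduction clause.  */
struct red_clause
{
  enum red_op code;
};

/* Identity element for OP: combining it with x yields x.  */
static int
red_init_op (enum red_op op)
{
  switch (op)
    {
    case RED_PLUS:
    case RED_BIT_IOR:
    case RED_BIT_XOR:
      return 0;
    case RED_MULT:
      return 1;
    case RED_BIT_AND:
      return ~0;
    case RED_MIN:
      return INT_MAX;
    case RED_MAX:
      return INT_MIN;
    }
  return 0;  /* not reached for valid codes */
}

/* Clause-based entry point, now a thin wrapper.  */
static int
red_init (const struct red_clause *clause)
{
  return red_init_op (clause->code);
}

/* Apply OP to an accumulator and a new value.  */
static int
red_combine (enum red_op op, int acc, int x)
{
  switch (op)
    {
    case RED_PLUS:    return acc + x;
    case RED_MULT:    return acc * x;
    case RED_BIT_AND: return acc & x;
    case RED_BIT_IOR: return acc | x;
    case RED_BIT_XOR: return acc ^ x;
    case RED_MIN:     return x < acc ? x : acc;
    case RED_MAX:     return x > acc ? x : acc;
    }
  return acc;  /* not reached for valid codes */
}
```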



Re: [gomp4] lock/unlock internal fn

2015-09-23 Thread Nathan Sidwell


On 09/23/15 10:16, Thomas Schwinge wrote:

Hi Nathan!

On Wed, 23 Sep 2015 08:40:51 -0400, Nathan Sidwell  
wrote:

On 09/23/15 05:27, Thomas Schwinge wrote:

On Mon, 17 Aug 2015 15:30:16 -0400, Nathan Sidwell  wrote:

I've committed this patch to add a new pair of internal functions.  These will
be used in implementing reductions.

They'll be emitted around reduction finalization, and implement the locking
required for the general case of combining reduction values.  They may be
transformed in the oacc_xform pass, and the default behaviour is to delete them,
if there is no RTL expander.  For PTX we delete them if they are at the vector
level.

This avoids needing machine-specific builtins to expand to, and thus should
result in less backend code duplication.


With the __builtin_nvptx_lock and __builtin_nvptx_unlock builtins
removed, should the gcc.target/nvptx/spinlock-1.c and
gcc.target/nvptx/spinlock-2.c test cases then be removed, too, or should
these be re-written differently?


confused.  I don't think I removed those locks.  Certainly didn't intend to, and
I would have expected massive test fails if I had.


You didn't remove the functionality, but you did remove the
__builtin_nvptx_lock and __builtin_nvptx_unlock builtins (which the two
test cases were written for), replacing them with GOACC_LOCK/GOACC_UNLOCK
internal functions, nvptx_expand_oacc_lock_unlock.


ah, thanks. I expect even these are going to go away soon. the spinlock 
testcases should be removed.


nathan

--
Nathan Sidwell

Re: [PATCH] Add new hooks ASM_OUTPUT_START_FUNCTION_HEADER ...

2015-09-23 Thread Dominik Vogt

On Tue, Sep 22, 2015 at 01:56:15PM -0600, Jeff Law wrote:
> Is there some good reason these aren't hooks?

No, that was just an oversight.  New version attached.  Would it be
preferable to initialize the hooks with a NULL pointer and test
the pointer before calling them?  (That way the changes to
hooks.[ch] could be dropped.)
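
The two alternatives in that question can be compared in a small standalone sketch. It is not GCC code: the real hooks take (FILE *, tree), whereas here a const char * stands in for the decl, and target_hooks, hook_noop, emit_start_a and emit_start_b are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical hook type; a name string stands in for the tree decl.  */
typedef void (*fn_hook) (FILE *, const char *);

struct target_hooks
{
  fn_hook function_start;
};

static int noop_calls;  /* instrumentation for this sketch only */

/* Generic hook that does nothing (apart from counting calls here).  */
static void
hook_noop (FILE *file, const char *name)
{
  (void) file;
  (void) name;
  noop_calls++;
}

/* Variant A: the hook always points at a function, so the call site
   is unconditional and a generic no-op must exist.  */
static const struct target_hooks hooks_a = { hook_noop };

static void
emit_start_a (const struct target_hooks *t, FILE *file, const char *name)
{
  t->function_start (file, name);
}

/* Variant B: NULL means "nothing to do", so no generic no-op is
   needed but every call site has to test the pointer.  */
static const struct target_hooks hooks_b = { NULL };

static void
emit_start_b (const struct target_hooks *t, FILE *file, const char *name)
{
  if (t->function_start)
    t->function_start (file, name);
}
```

Variant A costs one extra generic function but keeps call sites simple; variant B saves that function at the price of a test at each call.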

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany
gcc/ChangeLog

* target.def: Add function_start and function_end hooks.
* hooks.c (hook_void_FILEptr_tree): New function.
* hooks.h: Ditto.
* varasm.c (assemble_start_function): Call hook at start of function.
(assemble_end_function): Call hook at end of function.
* doc/tm.texi.in: Document new hooks.
* doc/tm.texi: Regenerate.
From 791b0dc5ba32ace51fb8214cdb0cf769b91a024c Mon Sep 17 00:00:00 2001
From: Dominik Vogt 
Date: Wed, 29 Jul 2015 16:14:23 +0100
Subject: [PATCH] Add new hooks asm_out.function_start and
 asm_out.function_end.

They are used by the implementation of __attribute__ ((target(...))) on S390.
---
 gcc/doc/tm.texi| 10 ++
 gcc/doc/tm.texi.in |  4 
 gcc/hooks.c|  7 +++
 gcc/hooks.h|  1 +
 gcc/target.def | 16 
 gcc/varasm.c   |  2 ++
 6 files changed, 40 insertions(+)

diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index d548d96..62d83db 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -7348,6 +7348,16 @@ Output to @code{asm_out_file} any text which the assembler expects
 to find at the end of a file.  The default is to output nothing.
 @end deftypefn
 
+@deftypefn {Target Hook} void TARGET_ASM_FUNCTION_START (FILE *@var{}, @var{tree})
+Output to @code{asm_out_file} any text which is necessary at the start of
+a function.  The default is to output nothing.
+@end deftypefn
+
+@deftypefn {Target Hook} void TARGET_ASM_FUNCTION_END (FILE *@var{}, @var{tree})
+Output to @code{asm_out_file} any text which is necessary at the end of a
+function.  The default is to output nothing.
+@end deftypefn
+
 @deftypefun void file_end_indicate_exec_stack ()
 Some systems use a common convention, the @samp{.note.GNU-stack}
 special section, to indicate whether or not an object file relies on
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 9bef4a5..b1c4b96 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -5122,6 +5122,10 @@ This describes the overall framework of an assembly file.
 
 @hook TARGET_ASM_FILE_END
 
+@hook TARGET_ASM_FUNCTION_START
+
+@hook TARGET_ASM_FUNCTION_END
+
 @deftypefun void file_end_indicate_exec_stack ()
 Some systems use a common convention, the @samp{.note.GNU-stack}
 special section, to indicate whether or not an object file relies on
diff --git a/gcc/hooks.c b/gcc/hooks.c
index 0fb9add..3440e06 100644
--- a/gcc/hooks.c
+++ b/gcc/hooks.c
@@ -146,6 +146,13 @@ hook_void_FILEptr_constcharptr_const_tree (FILE *, const char *, const_tree)
 {
 }
 
+/* Generic hook that takes (FILE *, tree) and does
+   nothing.  */
+void
+hook_void_FILEptr_tree (FILE *, tree)
+{
+}
+
 /* Generic hook that takes (FILE *, rtx) and returns false.  */
 bool
 hook_bool_FILEptr_rtx_false (FILE *a ATTRIBUTE_UNUSED,
diff --git a/gcc/hooks.h b/gcc/hooks.h
index c3d4bd3..bbd26cb 100644
--- a/gcc/hooks.h
+++ b/gcc/hooks.h
@@ -70,6 +70,7 @@ extern void hook_void_rtx_insn_int (rtx_insn *, int);
 extern void hook_void_FILEptr_constcharptr (FILE *, const char *);
 extern void hook_void_FILEptr_constcharptr_const_tree (FILE *, const char *,
 		   const_tree);
+extern void hook_void_FILEptr_tree (FILE *, tree);
 extern bool hook_bool_FILEptr_rtx_false (FILE *, rtx);
 extern void hook_void_rtx_tree (rtx, tree);
 extern void hook_void_tree (tree);
diff --git a/gcc/target.def b/gcc/target.def
index aa5a1f1..4a18be5 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -672,6 +672,22 @@ to find at the end of a file.  The default is to output nothing.",
  void, (void),
  hook_void_void)
 
+/* Output additional text at the start of a function.  */
+DEFHOOK
+(function_start,
+ "Output to @code{asm_out_file} any text which is necessary at the start of\n\
+a function.  The default is to output nothing.",
+ void, (FILE *, tree),
+ hook_void_FILEptr_tree)
+
+/* Output additional text at the end of a function.  */
+DEFHOOK
+(function_end,
+ "Output to @code{asm_out_file} any text which is necessary at the end of a\n\
+function.  The default is to output nothing.",
+ void, (FILE *, tree),
+ hook_void_FILEptr_tree)
+
 /* Output any boilerplate text needed at the beginning of an
LTO output stream.  */
 DEFHOOK
diff --git a/gcc/varasm.c b/gcc/varasm.c
index 706e652..1b6f7b7 100644
--- a/gcc/varasm.c
+++ b/gcc/varasm.c
@@ -1701,6 +1701,7 @@ assemble_start_function (tree decl, const char *fnname)
   char tmp_label[100];
   bool hot_label_written = false;
 
+  targetm.asm_out.function_start (asm_out_file, current_function_decl);
   if (flag_reorder_blocks_and_partition)
 {

Re: [RFC] Try vector as a new representation for vector masks

2015-09-23 Thread Richard Biener

On Wed, Sep 23, 2015 at 3:41 PM, Ilya Enkovich  wrote:
> 2015-09-18 16:40 GMT+03:00 Ilya Enkovich :
>> 2015-09-18 15:22 GMT+03:00 Richard Biener :
>>>
>>> I was thinking about targets not supporting generating vec
>>> (of whatever mode) from a comparison directly but only via
>>> a COND_EXPR.
>>
>> Where may these direct comparisons come from? Vectorizer never
>> generates unsupported statements. It means we get them from
>> gimplifier? So touch optabs in gimplifier to avoid direct comparisons?
>> Actually vect lowering checks if we are able to make comparison and
>> expand also uses vec_cond to expand vector comparison, so probably we
>> may live with them.
>>
>>>
>>> Not sure if we are always talking about the same thing for
>>> "bool patterns".  I'd remove bool patterns completely, IMHO
>>> they are not necessary at all.
>>
>> I refer to transformations made by vect_recog_bool_pattern. Don't see
>> how to remove them completely for targets not supporting comparison
>> vectorization.
>>
>>>
>>> I think we do allow this, just the vectorizer doesn't expect it.  In the 
>>> long
>>> run I want to get rid of the GENERIC exprs in both COND_EXPR and
>>> VEC_COND_EXPR.  Just didn't have the time to do this...
>>
>> That would be nice. As a first step I'd like to support optabs for
>> VEC_COND_EXPR directly using vec.
>>
>> Thanks,
>> Ilya
>>
>>>
>>> Richard.
>
> Hi Richard,
>
> Do you think we have enough confidence approach is working and we may
> start integrating it into trunk? What would be integration plan then?

I'm still worried about the vec vector size vs. element size
issue (well, somewhat).

Otherwise the integration plan would be

 1) put in the vector GIMPLE type support and change the vector
comparison type IL requirement to be vector,
fixing all fallout

 2) get support for directly expanding vector comparisons to
vector and make use of that from the x86 backend

 3) make the vectorizer generate the above if supported

I think independent improvements are

 1) remove (most) of the bool patterns from the vectorizer

 2) make VEC_COND_EXPR not have a GENERIC comparison embedded

(same for COND_EXPR?)

Richard.

> Thanks,
> Ilya

Re: [gomp4] lock/unlock internal fn

2015-09-23 Thread Thomas Schwinge

Hi Nathan!

On Wed, 23 Sep 2015 08:40:51 -0400, Nathan Sidwell  
wrote:
> On 09/23/15 05:27, Thomas Schwinge wrote:
> > On Mon, 17 Aug 2015 15:30:16 -0400, Nathan Sidwell  wrote:
> >> I've committed this patch to add a new pair of internal functions.  These 
> >> will
> >> be used in implementing reductions.
> >>
> >> They'll be emitted around reduction finalization, and implement the locking
> >> required for the general case of combining reduction values.  They may be
> >> transformed in the oacc_xform pass, and the default behaviour is to delete 
> >> them,
> >> if there is no RTL expander.  For PTX we delete them if they are at the 
> >> vector
> >> level.
> >>
> >> This avoids needing machine-specific builtins to expand to, and thus should
> >> result in less backend code duplication.
> >
> > With the __builtin_nvptx_lock and __builtin_nvptx_unlock builtins
> > removed, should the gcc.target/nvptx/spinlock-1.c and
> > gcc.target/nvptx/spinlock-2.c test cases then be removed, too, or should
> > these be re-written differently?
> 
> confused.  I don't think I removed those locks.  Certainly didn't intend to, 
> and 
> I would have expected massive test fails if I had.

You didn't remove the functionality, but you did remove the
__builtin_nvptx_lock and __builtin_nvptx_unlock builtins (which the two
test cases were written for), replacing them with GOACC_LOCK/GOACC_UNLOCK
internal functions, nvptx_expand_oacc_lock_unlock.


Regards,
 Thomas



Re: [RFC] Try vector as a new representation for vector masks

2015-09-23 Thread Richard Biener

On Fri, Sep 18, 2015 at 3:40 PM, Ilya Enkovich  wrote:
> 2015-09-18 15:22 GMT+03:00 Richard Biener :
>> On Thu, Sep 3, 2015 at 3:57 PM, Ilya Enkovich  wrote:
>>> 2015-09-03 15:11 GMT+03:00 Richard Biener :
>>>> On Thu, Sep 3, 2015 at 2:03 PM, Ilya Enkovich  
>>>> wrote:
>>>>> Adding CCs.
>>>>>
>>>>> 2015-09-03 15:03 GMT+03:00 Ilya Enkovich :
>>>>>> 2015-09-01 17:25 GMT+03:00 Richard Biener :
>>>>>>
>>>>>> Totally disabling old style vector comparison and bool pattern is a
>>>>>> goal but doing that would mean a lot of regressions for many targets.
>>>>>> Do you want to it to be tried to estimate amount of changes required
>>>>>> and reveal possible issues? What would be integration plan for these
>>>>>> changes? Do you want to just introduce new vector in GIMPLE
>>>>>> disabling bool patterns and then resolving vectorization regression on
>>>>>> all targets or allow them live together with following target switch
>>>>>> one by one from bool patterns with finally removing them? Not all
>>>>>> targets are likely to be adopted fast I suppose.
>>>>
>>>> Well, the frontends already create vec_cond exprs I believe.  So for
>>>> bool patterns the vectorizer would have to do the same, but the
>>>> comparison result in there would still use vec.  Thus the scalar
>>>>
>>>>  _Bool a = b < c;
>>>>  _Bool c = a || d;
>>>>  if (c)
>>>>
>>>> would become
>>>>
>>>>  vec a = VEC_COND ;
>>>>  vec c = a | d;
>>>
>>> This should be identical to
>>>
>>> vec<_Bool> a = a < b;
>>> vec<_Bool> c = a | d;
>>>
>>> where vec<_Bool> has VxSI mode. And we should prefer it in case target
>>> supports vector comparison into vec, right?
>>>

>>>> when the target does not have vecs directly and otherwise
>>>> vec directly (dropping the VEC_COND).
>>>>
>>>> Just the vector comparison inside the VEC_COND would always
>>>> have vec type.
>>>
>>> I don't really understand what you mean by 'doesn't have vecs
>>> directly' here. Currently I have a hook to ask for a vec mode
>>> and assume target doesn't support it in case it returns VOIDmode. But
>>> in such case I have no mode to use for vec inside VEC_COND
>>> either.
>>
>> I was thinking about targets not supporting generating vec
>> (of whatever mode) from a comparison directly but only via
>> a COND_EXPR.
>
> Where may these direct comparisons come from? Vectorizer never
> generates unsupported statements. It means we get them from
> gimplifier?

That's what I say - the vectorizer wouldn't generate them.

> So touch optabs in gimplifier to avoid direct comparisons?
> Actually vect lowering checks if we are able to make comparison and
> expand also uses vec_cond to expand vector comparison, so probably we
> may live with them.
>
>>
>>> In default implementation of the new target hook I always return
>>> integer vector mode (to have default behavior similar to the current
>>> one). It should allow me to use vec for conditions in all
>>> vec_cond. But we'd need some other trigger for bool patterns to apply.
>>> Probably check vec_cmp optab in check_bool_pattern and don't convert
>>> in case comparison is supported by target? Or control it via
>>> additional hook.
>>
>> Not sure if we are always talking about the same thing for
>> "bool patterns".  I'd remove bool patterns completely, IMHO
>> they are not necessary at all.
>
> I refer to transformations made by vect_recog_bool_pattern. Don't see
> how to remove them completely for targets not supporting comparison
> vectorization.

The vectorizer can vectorize comparisons by emitting a VEC_COND_EXPR
(the bool pattern would turn the comparison into a COND_EXPR).  I don't
see how the pattern intermediate step is necessary.  The important part
is to get the desired vector type of the comparison determined.
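
For illustration, the integer-vector mask representation under discussion can be written with GCC's generic vector extension, where a lane-wise comparison yields -1 (all ones) for true and 0 for false. This is a sketch with hypothetical helper names, not vectorizer output; the select is spelled with bit operations because it only shows the role VEC_COND_EXPR plays.

```c
#include <assert.h>

/* Four 32-bit lanes, using GCC's vector_size extension.  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Lane-wise b < c: each lane is -1 when true and 0 when false.  */
static v4si
cmp_lt (v4si b, v4si c)
{
  return b < c;
}

/* Lane-wise mask ? x : y, assuming MASK lanes are all ones or zero.  */
static v4si
select_lanes (v4si mask, v4si x, v4si y)
{
  return (mask & x) | (~mask & y);
}

/* Vector counterpart of the scalar "a = b < c; c = a || d;" when d is
   already a 0/-1 mask: the logical or becomes a bitwise or.  */
static v4si
cmp_lt_or (v4si b, v4si c, v4si d)
{
  return cmp_lt (b, c) | d;
}

/* Read one lane; a helper so callers need not subscript temporaries.  */
static int
lane (v4si v, int i)
{
  return v[i];
}
```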

>>

>>>> And the "bool patterns" I am talking about are those in
>>>> tree-vect-patterns.c, not any targets instruction patterns.
>>>
>>> I refer to them also. BTW bool patterns also pull comparison into
>>> vec_cond. Thus we cannot have SSA_NAME in vec_cond as a condition. I
>>> think with vector comparisons in place we should allow SSA_NAME as
>>> conditions in VEC_COND for better CSE. That should require new vcond
>>> optabs though.
>>
>> I think we do allow this, just the vectorizer doesn't expect it.  In the long
>> run I want to get rid of the GENERIC exprs in both COND_EXPR and
>> VEC_COND_EXPR.  Just didn't have the time to do this...
>
> That would be nice. As a first step I'd like to support optabs for
> VEC_COND_EXPR directly using vec.
>
> Thanks,
> Ilya
>
>>
>> Richard.
>>
>>> Ilya
>>>

>>>> Richard.

>>
>> Ilya

Re: [PATCH 0/5] RFC: Overhaul of diagnostics (v2)

2015-09-23 Thread Michael Matz

Hi,

On Wed, 23 Sep 2015, Richard Biener wrote:

> The issue we have with LTO is that the linemap gets populated in quite 
> random order and thus we repeatedly switch files (we've mitigated this 
> somewhat for GCC 5).

Yes.

> We also considered dropping column info (and would drop range info) as 
> diagnostics are from optimizers only with LTO and we keep locations 
> merely for debug info.

That would be the obvious mitigations, yes.  I do like the fact that we'd 
be able to do all this without enlarging location_t.

Ciao,
Micha.

[Bug c/48885] missed optimization with restrict qualifier?

2015-09-23 Thread vries at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=48885

--- Comment #14 from vries at gcc dot gnu.org ---
(In reply to rguent...@suse.de from comment #13)
> On Wed, 23 Sep 2015, vries at gcc dot gnu.org wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=48885
> > 
> > --- Comment #12 from vries at gcc dot gnu.org ---
> > (In reply to Richard Biener from comment #11)
> > > I'm testing the above simple fix and amend the comment.
> > 
> > Consider the example with functions f and g I gave in comment 10. Using the
> > patch from comment 11, I get at ealias:
> > ...
> > void f(int* __restrict__&, int*) (intD.9 * restrict & restrict pD.2252, 
> > intD.9
> > * p2D.2253)
> > {
> >   intD.9 * _3;
> > 
> >   # VUSE <.MEM_1(D)>
> >   # PT = { D.2265 } (nonlocal)
> >   _3 = MEM[(intD.9 * restrict &)p_2(D) clique 1 base 1];
> > 
> >   # .MEM_4 = VDEF <.MEM_1(D)>
> >   MEM[(intD.9 *)_3 clique 1 base 2] = 1;
> > 
> >   # .MEM_6 = VDEF <.MEM_4>
> >   MEM[(intD.9 *)p2_5(D) clique 1 base 0] = 2;
> > ...
> > 
> > AFAIU, this is incorrect. The two stores can be now disambiguated based on 
> > same
> > clique/different base, but in fact the stores can alias (in fact they do, in
> > the  "f (gp, gp)" call from g).
> 
> How is this a valid testcase?
> You are accessing g()s *gp through
> p and p2 even though p is marked as restrict.

To be exact, p is a restrict reference to a restrict pointer.
And AFAIU it's a valid test-case.

>  Did you mean to write
> 
> void
> f (int *&__restrict__ p, int *p2)
> 
> ?

No. I'll try explain, renaming variables to help clarification, and adding a
call to g for completeness:
...
void
f (int *__restrict__ &__restrict__ fp, int *fp2)
{
  *fp = 1;
  *fp2 = 2;
}

void
g (int *__restrict__ gp)
{
  f (gp, gp);
}

void
h (void)
{
  int ha;
  g (&ha);
}
...

Let's look at the three restricts in the example.

First, there's the second restrict in "int *__restrict &__restrict fp", which
is a reference to object gp. Since object gp is not modified during f, the
restrict has no consequence.

Then there's the restrict in "int *__restrict__ gp". The object pointed to is
ha, and it's modified during g. So all accesses to ha during g need to be based
on gp. And that is the case. The '*fp = 1' is based on gp. And the '*fp2 = 2'
is based on gp.

Finally, there's the first restrict in "int *__restrict &__restrict fp". That's
a copy of the type of gp.

Re: [PATCH 0/5] RFC: Overhaul of diagnostics (v2)

2015-09-23 Thread Jeff Law


On 09/23/2015 07:47 AM, Michael Matz wrote:

Hi,

On Wed, 23 Sep 2015, Richard Biener wrote:


The issue we have with LTO is that the linemap gets populated in quite
random order and thus we repeatedly switch files (we've mitigated this
somewhat for GCC 5).


Yes.


We also considered dropping column info (and would drop range info) as
diagnostics are from optimizers only with LTO and we keep locations
merely for debug info.


That would be the obvious mitigations, yes.  I do like the fact that we'd
be able to do all this without enlarging location_t.

That's the hope.

However, I did ask David to ponder the effects if ultimately we did need 
to extend location_t to 64 bits.


Jeff

Re: Powerpc atomic_load

2015-09-23 Thread Peter Bergner

On Wed, 2015-09-23 at 16:15 +0200, Sebastian Huber wrote:
> On 10/09/15 19:52, David Edelsohn wrote:
> > https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html
> 
> Is there a specific reason why the SYNC L,E (Elemental Memory Barriers) 
> defined by Power-ISA V2.07 doesn't appear in this table?

Probably because that category is only implemented on some (one?) cpus
(eg, E6500) and not on any of the server cpus (eg, power[45678]), so no
one cared enough to add that info? :-)  It would probably be useful to
add though.

Peter

Re: [RFC] Try vector<bool> as a new representation for vector masks

2015-09-23 Thread Ilya Enkovich

2015-09-18 16:40 GMT+03:00 Ilya Enkovich :
> 2015-09-18 15:22 GMT+03:00 Richard Biener :
>>
>> I was thinking about targets not supporting generating vec
>> (of whatever mode) from a comparison directly but only via
>> a COND_EXPR.
>
> Where may these direct comparisons come from? Vectorizer never
> generates unsupported statements. It means we get them from
> gimplifier? So touch optabs in gimplifier to avoid direct comparisons?
> Actually vect lowering checks if we are able to make comparison and
> expand also uses vec_cond to expand vector comparison, so probably we
> may live with them.
>
>>
>> Not sure if we are always talking about the same thing for
>> "bool patterns".  I'd remove bool patterns completely, IMHO
>> they are not necessary at all.
>
> I refer to transformations made by vect_recog_bool_pattern. Don't see
> how to remove them completely for targets not supporting comparison
> vectorization.
>
>>
>> I think we do allow this, just the vectorizer doesn't expect it.  In the long
>> run I want to get rid of the GENERIC exprs in both COND_EXPR and
>> VEC_COND_EXPR.  Just didn't have the time to do this...
>
> That would be nice. As a first step I'd like to support optabs for
> VEC_COND_EXPR directly using vec.
>
> Thanks,
> Ilya
>
>>
>> Richard.

Hi Richard,

Do you think we have enough confidence that the approach is working and
that we may start integrating it into trunk? What would the integration
plan be then?

Thanks,
Ilya

[Bug middle-end/67662] -fsanitize=undefined cries wolf for X - 1 + X when X is 2**30

2015-09-23 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67662

--- Comment #5 from Richard Biener  ---
Author: rguenth
Date: Wed Sep 23 14:09:48 2015
New Revision: 228051

URL: https://gcc.gnu.org/viewcvs?rev=228051&root=gcc&view=rev
Log:
2015-09-23   Richard Biener  

PR middle-end/67662
* fold-const.c (fold_binary_loc): Do not reassociate two vars with
undefined overflow unless they will cancel out.

* gcc.dg/ubsan/pr67662.c: New testcase.

Added:
trunk/gcc/testsuite/gcc.dg/ubsan/pr67662.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/fold-const.c
trunk/gcc/testsuite/ChangeLog
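
To illustrate why the evaluation order matters for the case in the bug title, here is a minimal sketch that checks both orders of X - 1 + X for signed overflow. It assumes a 32-bit int; the helper names are made up for illustration and are not taken from fold-const.c.

```cpp
#include <climits>

// True if a + b would overflow a signed int (checked without overflowing).
static bool add_overflows(int a, int b)
{
  return (b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b);
}

// (x - 1) + x, evaluated in source order.
bool source_order_overflows(int x)
{
  return add_overflows(x, -1) || add_overflows(x - 1, x);
}

// (x + x) - 1, the reassociated form.
bool reassociated_overflows(int x)
{
  return add_overflows(x, x) || add_overflows(x + x, -1);
}
```

With x equal to 2**30, the source order stays within range (the result is exactly INT_MAX), while the reassociated form overflows in the intermediate x + x.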

Re: [gomp4] lock/unlock internal fn

2015-09-23 Thread Thomas Schwinge

Hi!

On Wed, 23 Sep 2015 10:19:15 -0400, Nathan Sidwell  
wrote:
> On 09/23/15 10:16, Thomas Schwinge wrote:
> > On Wed, 23 Sep 2015 08:40:51 -0400, Nathan Sidwell 
> >  wrote:
> >> On 09/23/15 05:27, Thomas Schwinge wrote:
> >>> On Mon, 17 Aug 2015 15:30:16 -0400, Nathan Sidwell  wrote:
>  I've committed this patch to add a new pair of internal functions.  
>  These will
>  be used in implementing reductions.

> >>> With the __builtin_nvptx_lock and __builtin_nvptx_unlock builtins
> >>> removed, should the gcc.target/nvptx/spinlock-1.c and
> >>> gcc.target/nvptx/spinlock-2.c test cases then be removed, too, or should
> >>> these be re-written differently?
> >>
> >> confused.  I don't think I removed those locks.  Certainly didn't intend 
> >> to, and
> >> I would have expected massive test fails if I had.
> >
> > You didn't remove the functionality, but you did remove the
> > __builtin_nvptx_lock and __builtin_nvptx_unlock builtins (which the two
> > test cases were written for), replacing them with GOACC_LOCK/GOACC_UNLOCK
> > internal functions, nvptx_expand_oacc_lock_unlock.
> 
> ah, thanks. I expect even these are going to go away soon. the spinlock 
> testcases should be removed.

Committed to gomp-4_0-branch in r228055:

commit fa0a1ef0b746e6f2f7c54f5516ee2c8ebe05cf25
Author: tschwinge 
Date:   Wed Sep 23 15:16:05 2015 +

[nvptx] Remove obsolete spinlock test cases

gcc/testsuite/
* gcc.target/nvptx/spinlock-1.c: Remove file.
* gcc.target/nvptx/spinlock-2.c: Likewise.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@228055 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/testsuite/ChangeLog.gomp|5 +
 gcc/testsuite/gcc.target/nvptx/spinlock-1.c |   11 ---
 gcc/testsuite/gcc.target/nvptx/spinlock-2.c |   10 --
 3 files changed, 5 insertions(+), 21 deletions(-)

diff --git gcc/testsuite/ChangeLog.gomp gcc/testsuite/ChangeLog.gomp
index b14167e..1e7667d 100644
--- gcc/testsuite/ChangeLog.gomp
+++ gcc/testsuite/ChangeLog.gomp
@@ -1,3 +1,8 @@
+2015-09-23  Thomas Schwinge  
+
+   * gcc.target/nvptx/spinlock-1.c: Remove file.
+   * gcc.target/nvptx/spinlock-2.c: Likewise.
+
 2015-09-18  Thomas Schwinge  
 
* gcc.target/nvptx/spinlock-1.c: Fix DejaGnu directives.
diff --git gcc/testsuite/gcc.target/nvptx/spinlock-1.c 
gcc/testsuite/gcc.target/nvptx/spinlock-1.c
deleted file mode 100644
index b464ad9..000
--- gcc/testsuite/gcc.target/nvptx/spinlock-1.c
+++ /dev/null
@@ -1,11 +0,0 @@
-/* { dg-do compile } */
-void Foo ()
-{
-  __builtin_nvptx_lock (0);
-  __builtin_nvptx_unlock (0);
-}
-
-
-/* { dg-final { scan-assembler-times ".atom.global.cas.b32" 2 } } */
-/* { dg-final { scan-assembler ".global .u32 __global_lock;" } } */
-/* { dg-final { scan-assembler-not ".shared .u32 __shared_lock;" } } */
diff --git gcc/testsuite/gcc.target/nvptx/spinlock-2.c 
gcc/testsuite/gcc.target/nvptx/spinlock-2.c
deleted file mode 100644
index 9a51d3f..000
--- gcc/testsuite/gcc.target/nvptx/spinlock-2.c
+++ /dev/null
@@ -1,10 +0,0 @@
-/* { dg-do compile } */
-void Foo ()
-{
-  __builtin_nvptx_lock (1);
-  __builtin_nvptx_unlock (1);
-}
-
-/* { dg-final { scan-assembler-times ".atom.shared.cas.b32" 2 } } */
-/* { dg-final { scan-assembler ".shared .u32 __shared_lock;" } } */
-/* { dg-final { scan-assembler-not ".global .u32 __global_lock;" } } */


Grüße,
 Thomas



[Bug other/67590] libcc1 cannot find objdump when cross build native

2015-09-23 Thread doko at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67590

Matthias Klose  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |doko at gcc dot gnu.org

--- Comment #1 from Matthias Klose  ---
please send the patch together with an appropriate changelog entry to the
gcc-patches ML, maybe referencing this PR.

[Bug c/43651] add warning for duplicate qualifier

2015-09-23 Thread miyuki at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43651

Mikhail Maltsev  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |miyuki at gcc dot gnu.org

--- Comment #6 from Mikhail Maltsev  ---
Clang also emits a warning:

$ cat test.c
int foo(const char const *data);

$ /opt/clang-3.6.2/bin/clang -S -std=c89 test.c
test.c:1:20: warning: duplicate 'const' declaration specifier
[-Wduplicate-decl-specifier]
int foo(const char const *data);
   ^~
1 warning generated.
$ /opt/clang-3.6.2/bin/clang -S -std=c99 test.c
test.c:1:20: warning: duplicate 'const' declaration specifier
[-Wduplicate-decl-specifier]
int foo(const char const *data);
   ^
1 warning generated.
$ /opt/clang-3.6.2/bin/clang -xc++ -S -std=c++11 test.c
test.c:1:20: warning: duplicate 'const' declaration specifier
[-Wduplicate-decl-specifier]
int foo(const char const *data);
   ^~
1 warning generated.

"-pedantic-errors" turns this warning into an error in C++ and C89 (but not
C99).

Recently I came across such a problem in the code base that I work with. In
that case it was clearly a mistake, because the author meant 'const char *const
data', so it would be nice if GCC could warn about this.

[patch] Reduce space and time overhead of std::thread

2015-09-23 Thread Jonathan Wakely


For PR 65393 I avoided some unnecessary shared_ptr copies while
launching a std::thread. This goes further and avoids shared_ptr
entirely, using unique_ptr instead. This reduces the memory overhead
of a std::thread by 32 bytes (on 64-bit) and avoids any
reference-count updates.

The downside is it exports some new symbols, and we have to keep the
old code for backwards compatibility, but I think it's worth doing.

Does anybody disagree?
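
For readers who want the shape of the idea without reading the diff, here is a minimal sketch of a type-erased callable owned by a unique_ptr. The names State, StateImpl and make_state are simplified stand-ins for the patch's _State, _State_impl and _S_make_state, and no thread is actually started here; it only shows the ownership pattern, under the assumption that the launcher needs nothing beyond a virtual run() entry point.

```cpp
#include <memory>
#include <utility>

// Abstract base: the only interface the launcher needs to know about.
struct State
{
  virtual ~State() = default;
  virtual void run() = 0;
};

// Wraps an arbitrary callable behind the State interface.
template<typename Callable>
struct StateImpl : State
{
  Callable func;

  explicit StateImpl(Callable&& f) : func(std::forward<Callable>(f)) { }

  void run() override { func(); }
};

// Single owner, no reference count: the result is a plain unique_ptr.
template<typename Callable>
std::unique_ptr<State> make_state(Callable&& f)
{
  return std::unique_ptr<State>(
      new StateImpl<Callable>(std::forward<Callable>(f)));
}
```

Because ownership is unique, handing the state to the new thread is a single pointer transfer and nothing needs to be atomically incremented or decremented.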



commit 2d7e89aae8ac12dd7a6b2083e5169679c1200cc5
Author: Jonathan Wakely 
Date:   Thu Mar 12 13:23:23 2015 +

Reduce space and time overhead of std::thread

	PR libstdc++/65393
	* config/abi/pre/gnu.ver: Export new symbols.
	* include/std/thread (thread::_State, thread::_State_impl): New types.
	(thread::_M_start_thread): Add overload taking unique_ptr<_State>.
	(thread::_M_make_routine): Remove.
	(thread::_S_make_state): Add.
	(thread::_Impl_base, thread::_Impl, thread::_M_start_thread)
	[_GLIBCXX_THREAD_ABI_COMPAT] Only declare conditionally.
	* src/c++11/thread.cc (execute_native_thread_routine): Rename to
	execute_native_thread_routine_compat and re-define to use _State.
	(thread::_State::~_State()): Define.
	(thread::_M_make_thread): Define new overload.
	(thread::_M_make_thread) [_GLIBCXX_THREAD_ABI_COMPAT]: Only define old
	overloads conditionally.

diff --git a/libstdc++-v3/config/abi/pre/gnu.ver b/libstdc++-v3/config/abi/pre/gnu.ver
index d42cd37..08d9bc6 100644
--- a/libstdc++-v3/config/abi/pre/gnu.ver
+++ b/libstdc++-v3/config/abi/pre/gnu.ver
@@ -1870,6 +1870,11 @@ GLIBCXX_3.4.22 {
 # std::uncaught_exceptions()
 _ZSt19uncaught_exceptionsv;
 
+# std::thread::_State::~_State()
+_ZT[ISV]NSt6thread6_StateE;
+_ZNSt6thread6_StateD[012]Ev;
+_ZNSt6thread15_M_start_threadESt10unique_ptrINS_6_StateESt14default_deleteIS1_EEPFvvE;
+
 } GLIBCXX_3.4.21;
 
 # Symbols in the support library (libsupc++) have their own tag.
diff --git a/libstdc++-v3/include/std/thread b/libstdc++-v3/include/std/thread
index ebbda62..c67ec46 100644
--- a/libstdc++-v3/include/std/thread
+++ b/libstdc++-v3/include/std/thread
@@ -60,9 +60,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   class thread
   {
   public:
+// Abstract base class for types that wrap arbitrary functors to be
+// invoked in the new thread of execution.
+struct _State
+{
+  virtual ~_State();
+  virtual void _M_run() = 0;
+};
+using _State_ptr = unique_ptr<_State>;
+
 typedef __gthread_t			native_handle_type;
-struct _Impl_base;
-typedef shared_ptr<_Impl_base>	__shared_base_type;
 
 /// thread::id
 class id
@@ -92,29 +99,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	operator<<(basic_ostream<_CharT, _Traits>& __out, thread::id __id);
 };
 
-// Simple base type that the templatized, derived class containing
-// an arbitrary functor can be converted to and called.
-struct _Impl_base
-{
-  __shared_base_type	_M_this_ptr;
-
-  inline virtual ~_Impl_base();
-
-  virtual void _M_run() = 0;
-};
-
-template<typename _Callable>
-  struct _Impl : public _Impl_base
-  {
-	_Callable		_M_func;
-
-	_Impl(_Callable&& __f) : _M_func(std::forward<_Callable>(__f))
-	{ }
-
-	void
-	_M_run() { _M_func(); }
-  };
-
   private:
 id_M_id;
 
@@ -133,16 +117,15 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   thread(_Callable&& __f, _Args&&... __args)
   {
 #ifdef GTHR_ACTIVE_PROXY
-	// Create a reference to pthread_create, not just the gthr weak symbol
-_M_start_thread(_M_make_routine(std::__bind_simple(
-std::forward<_Callable>(__f),
-std::forward<_Args>(__args)...)),
-	reinterpret_cast(_create));
+	// Create a reference to pthread_create, not just the gthr weak symbol.
+	auto __depend = reinterpret_cast<void(*)()>(&pthread_create);
 #else
-_M_start_thread(_M_make_routine(std::__bind_simple(
-std::forward<_Callable>(__f),
-std::forward<_Args>(__args)...)));
+	auto __depend = nullptr;
 #endif
+_M_start_thread(_S_make_state(
+	  std::__bind_simple(std::forward<_Callable>(__f),
+ std::forward<_Args>(__args)...)),
+	__depend);
   }
 
 ~thread()
@@ -190,23 +173,48 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 hardware_concurrency() noexcept;
 
   private:
+template<typename _Callable>
+  struct _State_impl : public _State
+  {
+	_Callable		_M_func;
+
+	_State_impl(_Callable&& __f) : _M_func(std::forward<_Callable>(__f))
+	{ }
+
+	void
+	_M_run() { _M_func(); }
+  };
+
+void
+_M_start_thread(_State_ptr, void (*)());
+
+template<typename _Callable>
+  static _State_ptr
+  _S_make_state(_Callable&& __f)
+  {
+	using _Impl = _State_impl<_Callable>;
+	return _State_ptr{new _Impl{std::forward<_Callable>(__f)}};
+  }
+#if _GLIBCXX_THREAD_ABI_COMPAT
+  public:
+struct _Impl_base;
+typedef shared_ptr<_Impl_base>	__shared_base_type;
+struct _Impl_base
+

[gomp4 6/8] libgomp: provide stub bar.h on nvptx

2015-09-23 Thread Alexander Monakov

This stub header only provides empty struct gomp_barrier_t.  For now I've
punted on providing a minimally-correct implementation.

* config/nvptx/bar.h: New file.
---
 libgomp/config/nvptx/bar.h | 38 ++
 1 file changed, 38 insertions(+)
 create mode 100644 libgomp/config/nvptx/bar.h

diff --git a/libgomp/config/nvptx/bar.h b/libgomp/config/nvptx/bar.h
new file mode 100644
index 000..009d85f
--- /dev/null
+++ b/libgomp/config/nvptx/bar.h
@@ -0,0 +1,38 @@
+/* Copyright (C) 2015 Free Software Foundation, Inc.
+
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* This is an NVPTX specific implementation of a barrier synchronization
+   mechanism for libgomp.  This type is private to the library.  This
+   implementation is a stub, for now.  */
+
+#ifndef GOMP_BARRIER_H
+#define GOMP_BARRIER_H 1
+
+typedef struct
+{
+} gomp_barrier_t;
+
+typedef unsigned int gomp_barrier_state_t;
+
+#endif /* GOMP_BARRIER_H */

[gomp4 7/8] libgomp: work around missing pthread_attr_t on nvptx

2015-09-23 Thread Alexander Monakov

Although newlib headers define most pthreads types, pthread_attr_t is not
available.  Macro-replace it by 'void' to keep the prototype of
gomp_init_thread_affinity unchanged, and do not declare gomp_thread_attr.

* libgomp.h: Define pthread_attr_t to void on NVPTX.
---
 libgomp/libgomp.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index d51b08b..f4255b4 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -510,8 +510,13 @@ static inline struct gomp_task_icv *gomp_icv (bool write)
     return &gomp_global_icv;
 }
 
+#ifdef __nvptx__
+/* pthread_attr_t is not provided by newlib on NVPTX.  */
+#define pthread_attr_t void
+#else
 /* The attributes to be used during thread creation.  */
 extern pthread_attr_t gomp_thread_attr;
+#endif
 
 /* Function prototypes.  */

[gomp4 8/8] libgomp: provide ICVs via env.c on nvptx

2015-09-23 Thread Alexander Monakov

This patch ports env.c to NVPTX.  It drops all environment parsing routines
since there's no "environment" on the device.  For now, the useful effect of
the patch is providing 'omp_is_initial_device' to distinguish host execution
from target execution in user code.

Several functions use gomp_icv, which is not adjusted for NVPTX and thus will
try to use EMUTLS.  The intended way forward is to provide a custom
implementation of gomp_icv on NVPTX, likely via pre-allocating storage prior
to spawning a team.

* config/nvptx/env.c: New file.
---
 libgomp/config/nvptx/env.c | 219 +
 1 file changed, 219 insertions(+)

diff --git a/libgomp/config/nvptx/env.c b/libgomp/config/nvptx/env.c
index e69de29..f964b29 100644
--- a/libgomp/config/nvptx/env.c
+++ b/libgomp/config/nvptx/env.c
@@ -0,0 +1,219 @@
+/* Copyright (C) 2015 Free Software Foundation, Inc.
+
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* This file defines the OpenMP internal control variables.  There is
+   no environment on the accelerator, so the variables can be changed
+   only via OpenMP API in target regions.  */
+
+#include "libgomp.h"
+#include "libgomp_f.h"
+
+#include <limits.h>
+
+struct gomp_task_icv gomp_global_icv = {
+  .nthreads_var = 1,
+  .thread_limit_var = UINT_MAX,
+  .run_sched_var = GFS_DYNAMIC,
+  .run_sched_modifier = 1,
+  .default_device_var = 0,
+  .dyn_var = false,
+  .nest_var = false,
+  .bind_var = omp_proc_bind_false,
+  .target_data = NULL
+};
+
+unsigned long gomp_max_active_levels_var = INT_MAX;
+unsigned long gomp_available_cpus = 1, gomp_managed_threads = 1;
+unsigned long long gomp_spin_count_var, gomp_throttled_spin_count_var;
+unsigned long *gomp_nthreads_var_list, gomp_nthreads_var_list_len;
+char *gomp_bind_var_list;
+unsigned long gomp_bind_var_list_len;
+void **gomp_places_list;
+unsigned long gomp_places_list_len;
+int gomp_debug_var;
+
+void
+omp_set_num_threads (int n)
+{
+  struct gomp_task_icv *icv = gomp_icv (true);
+  icv->nthreads_var = (n > 0 ? n : 1);
+}
+
+void
+omp_set_dynamic (int val)
+{
+  struct gomp_task_icv *icv = gomp_icv (true);
+  icv->dyn_var = val;
+}
+
+int
+omp_get_dynamic (void)
+{
+  struct gomp_task_icv *icv = gomp_icv (false);
+  return icv->dyn_var;
+}
+
+void
+omp_set_nested (int val)
+{
+  struct gomp_task_icv *icv = gomp_icv (true);
+  icv->nest_var = val;
+}
+
+int
+omp_get_nested (void)
+{
+  struct gomp_task_icv *icv = gomp_icv (false);
+  return icv->nest_var;
+}
+
+void
+omp_set_schedule (omp_sched_t kind, int modifier)
+{
+  struct gomp_task_icv *icv = gomp_icv (true);
+  switch (kind)
+{
+case omp_sched_static:
+  if (modifier < 1)
+   modifier = 0;
+  icv->run_sched_modifier = modifier;
+  break;
+case omp_sched_dynamic:
+case omp_sched_guided:
+  if (modifier < 1)
+   modifier = 1;
+  icv->run_sched_modifier = modifier;
+  break;
+case omp_sched_auto:
+  break;
+default:
+  return;
+}
+  icv->run_sched_var = kind;
+}
+
+void
+omp_get_schedule (omp_sched_t *kind, int *modifier)
+{
+  struct gomp_task_icv *icv = gomp_icv (false);
+  *kind = icv->run_sched_var;
+  *modifier = icv->run_sched_modifier;
+}
+
+int
+omp_get_max_threads (void)
+{
+  struct gomp_task_icv *icv = gomp_icv (false);
+  return icv->nthreads_var;
+}
+
+int
+omp_get_thread_limit (void)
+{
+  struct gomp_task_icv *icv = gomp_icv (false);
+  return icv->thread_limit_var > INT_MAX ? INT_MAX : icv->thread_limit_var;
+}
+
+void
+omp_set_max_active_levels (int max_levels)
+{
+  if (max_levels >= 0)
+gomp_max_active_levels_var = max_levels;
+}
+
+int
+omp_get_max_active_levels (void)
+{
+  return gomp_max_active_levels_var;
+}
+
+int
+omp_get_cancellation (void)
+{
+  return 0;
+}
+
+omp_proc_bind_t
+omp_get_proc_bind (void)
+{
+  return omp_proc_bind_false;
+}
+
+void
+omp_set_default_device (int device_num __attribute__((unused)))
+{
+}
+
+int
+omp_get_default_device (void)
+{
+  return 0;
+}
+
+int
+omp_get_num_devices

[Bug c/67661] Wrong warning when declare VLAs: operation on 'x' may be undefined [-Wsequence-point]

2015-09-23 Thread joseph at codesourcery dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67661

--- Comment #1 from joseph at codesourcery dot com  ---
You'll need to give a full testcase (complete compilable file and options 
used to compile it).  What you gave isn't a compilable testcase; it gives 
"error: variably modified 'y' at file scope".  Put inside a function, it 
gives "warning: unused variable 'y' [-Wunused-variable]", but does not 
give the warning you mention.  And there's no variable 'b' in your example 
at all.

[Bug c/43651] add warning for duplicate qualifier

2015-09-23 Thread manu at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43651

--- Comment #7 from Manuel López-Ibáñez  ---
> Recently I came across such problem in the code base, which I work with. In
> that case it was clearly a mistake, because the author meant 'const char
> *const data', so it would be nice if GCC could warn about this.

I think that a patch that warns for 

const Type const x;

but not for:

typedef Type const X;
const X x;

and neither for

#define TYPE Type const
const TYPE;

would be accepted. But someone has to write it.

Re: (patch,rfc) s/gimple/gimple */

2015-09-23 Thread Thomas Schwinge

Hi!

On Sat, 19 Sep 2015 20:55:35 -0400, Trevor Saunders  
wrote:
> On Fri, Sep 18, 2015 at 09:32:37AM -0600, Jeff Law wrote:
> > On 09/18/2015 07:32 AM, Trevor Saunders wrote:
> > >On Wed, Sep 16, 2015 at 03:11:14PM -0400, David Malcolm wrote:
> > >>On Wed, 2015-09-16 at 09:16 -0400, Trevor Saunders wrote:
> > >>>I gave changing from gimple to gimple * a shot last week.

> ok, its committed now :)

I guess the following should also be adjusted?

gcc/doc/gimple.texi:@subsection @code{gimple_statement_base} (gsbase)
gcc/doc/gimple.texi:@cindex gimple_statement_base
gcc/doc/gimple.texi:Inherited from @code{struct gimple_statement_base}.
gcc/doc/gimple.texi:   gimple_statement_base
gcc/doc/gimple.texi:codes.  Then you must add a corresponding 
gimple_statement_base subclass
gcc/doc/gimple.texi:as a pointer to the appropriate gimple_statement_base 
subclass.
gcc/gdbhooks.py:pp.add_printer_for_types(['gimple', 
'gimple_statement_base *',
gcc/gimple.h:   always stored in gimple_statement_base.subcode and they may 
only be
gcc/gimple.h: This is different than the BLOCK field in 
gimple_statement_base,
gcc/gimple.h:   Note: This is based on gimple_statement_base, not g_s_omp, 
because g_s_omp

gcc/doc/gimple.texi:@subsection @code{gimple_statement_base} (gsbase)
gcc/doc/gimple.texi:@item@code{gsbase}   @tab 256
gcc/doc/gimple.texi:@item @code{gsbase}
gcc/doc/gimple.texi:@item @code{gsbase}  @tab 256

Grüße,
 Thomas


[gomp4] remap variables inside gang, worker, vector and collapse clauses

2015-09-23 Thread Cesar Philippidis

Gang, worker, vector and collapse all contain optional arguments which
may be used during loop expansion. In OpenACC, those expressions could
contain variables, but those variables aren't always getting remapped
automatically. This patch remaps those variables inside lower_omp_loop.

Note that I didn't need to use a tree walker for more complicated
expressions because it's not required. By the time those clauses reach
lower_omp_loop, only the result of the expression is available. So the
other variables in those expressions get remapped with everything else
during omplow. Therefore, the only problematic case is when the
optional expression is just a decl, e.g. gang(static:foo).

I've applied this patch to gomp-4_0-branch.

Cesar

Re: [C/C++ PATCH] RFC: Implement -Wduplicated-cond (PR c/64249) (take

2015-09-23 Thread Marek Polacek

On Tue, Sep 22, 2015 at 03:33:34PM -0600, Martin Sebor wrote:
> It's fine by me (for whatever it's worth).

Thanks.  Let's wait if Jason/Joseph or anyone else wants to chime in.

> Btw., if you're unhappy about having to wipe out the whole chain
> after every side-effect it occurred to me that it might be possible
> to do better: instead of deleting the whole chain, only remove from
> it the elements that may be affected by the side-effect. This should
> make it possible to keep on the chain all conditions involving local
> variables whose address hasn't been taken, which I would expect to
> be most in most cases.

I'm not unhappy about deleting the chain ;).  I'd rather not do that
because that might get somewhat hairy.  First, I don't think we have
the capability to easily detect variables whose address hasn't been
taken, second, consider e.g.

  if (j == 4) // ...
  else if ((j++, --k, ++l)) // ...
  else if (bar (j, )) // ...

we'd probably need some walk_tree, save the variables temporarily somewhere
etc.; that might slow down and complicate things for a corner case.  Or am I being
just too lazy? ;)
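
A small runnable sketch of why a side effect has to invalidate the recorded conditions: once an intermediate condition modifies j, a later test that is textually identical to an earlier one is no longer a duplicate. The function name and return codes are purely illustrative.

```cpp
int classify(int j)
{
  if (j == 4)
    return 1;
  else if ((j++, j == 10))  // side effect: j changes here
    return 2;
  else if (j == 4)          // same text as the first test, but reachable
    return 3;               // when j was 3 on entry
  return 0;
}
```

Calling classify (3) takes the third branch, so warning about the second `j == 4` would be a false positive.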

Thanks,

Marek

[Bug go/67695] New: Please improve POSIX shell compatibility of libgo/mksysinfo.sh

2015-09-23 Thread ryo_on at yk dot rim.or.jp

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67695

Bug ID: 67695
   Summary: Please improve POSIX shell compatibility of
libgo/mksysinfo.sh
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: go
  Assignee: ian at airs dot com
  Reporter: ryo_on at yk dot rim.or.jp
CC: cmang at google dot com
  Target Milestone: ---

Created attachment 36379
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36379&action=edit
Use test = instead of test == in libgo/mksysinfo.sh

Please use POSIX test = notation instead of BASH test == in libgo/mksysinfo.sh.

[Bug boehm-gc/66848] boehm-gc fails test suite on x86_64-apple-darwin15

2015-09-23 Thread howarth.at.gcc at gmail dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66848

--- Comment #9 from Jack Howarth  ---
Note that the earliest upstream boehm-gc release which builds and passes make
check on 10.11 is gc-7.2.tar.gz from http://www.hboehm.info/gc/gc_source/.
Diffing the current boehm-gc sources in gcc trunk suggests that the current FSF
boehm-gc is based on 7.1 (for at least mach_dep.c, etc).

Re: [PATCH] vectorizing conditional expressions (PR tree-optimization/65947)

2015-09-23 Thread Alan Hayward



On 18/09/2015 14:53, "Alan Hayward"  wrote:

>
>
>On 18/09/2015 14:26, "Alan Lawrence"  wrote:
>
>>On 18/09/15 13:17, Richard Biener wrote:
>>>
>>> Ok, I see.
>>>
>>> That this case is already vectorized is because it implements MAX_EXPR,
>>> modifying it slightly to
>>>
>>> int foo (int *a)
>>> {
>>>int val = 0;
>>>for (int i = 0; i < 1024; ++i)
>>>  if (a[i] > val)
>>>val = a[i] + 1;
>>>return val;
>>> }
>>>
>>> makes it no longer handled by current code.
>>>
>>
>>Yes. I believe the idea for the patch is to handle arbitrary expressions
>>like
>>
>>int foo (int *a)
>>{
>>int val = 0;
>>for (int i = 0; i < 1024; ++i)
>>  if (some_expression (i))
>>val = another_expression (i);
>>return val;
>>}
>
>Yes, that’s correct. Hopefully my new test cases should cover everything.
>

Attached is a new version of the patch containing all the changes
requested by Richard.


Thanks,
Alan.




0001-Support-for-vectorizing-conditional-expressions.patch
Description: Binary data

[Bug sanitizer/64888] ubsan doesn't work with openmp

2015-09-23 Thread mpolacek at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64888

--- Comment #2 from Marek Polacek  ---
Started looking into this.  Seems that we should handle specially those
artificial ubsan decls such as *.Lubsan_data in omp_default_clause, i.e. treat
them as shared?  But I hardly know what I'm talking about; inviting Jakub to
comment.

libgo patch committed: rewrite lfstack to look more like gc code

2015-09-23 Thread Ian Lance Taylor

This patch by Michael Hudson-Doyle rewrites the lfstack code in libgo
to look more like that in the gc library.  It also fixes it for arm64.
Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu.
Committed to mainline.

Ian
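
For orientation, the 32-bit branch of the patch below packs a node pointer and a push counter into one 64-bit word so that a single compare-and-swap covers both. A standalone sketch of that packing follows; it uses a plain 32-bit integer in place of the node pointer, and the names lf_pack32 and lf_unpack32 are hypothetical, chosen only to mirror lfPack and lfUnpack.

```cpp
#include <cstdint>

// Node address in the high 32 bits, push counter in the low 32 bits.
inline std::uint64_t lf_pack32(std::uint32_t node_addr, std::uint32_t cnt)
{
  return (static_cast<std::uint64_t>(node_addr) << 32) | cnt;
}

// Recover the node address; the counter is simply discarded.
inline std::uint32_t lf_unpack32(std::uint64_t val)
{
  return static_cast<std::uint32_t>(val >> 32);
}
```

Two pushes of the same node with different counters produce different packed words, which is what lets the compare-and-swap detect the ABA case.
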
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 227863)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-e069d4417a692c1261df99fe3323277e1a0193d2
+2087b95180caea3477647c449772b7fecc01a71c
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/runtime/lfstack.goc
===
--- libgo/runtime/lfstack.goc   (revision 227696)
+++ libgo/runtime/lfstack.goc   (working copy)
@@ -9,25 +9,41 @@ package runtime
 #include "arch.h"
 
 #if __SIZEOF_POINTER__ == 8
-// Amd64 uses 48-bit virtual addresses, 47-th bit is used as kernel/user flag.
-// So we use 17msb of pointers as ABA counter.
-# define PTR_BITS 47
-#else
-# define PTR_BITS 32
-#endif
-#define PTR_MASK ((1ull<> CNT_BITS) << 3);
+}
+#else
+static inline uint64 lfPack(LFNode *node, uintptr cnt) {
+   return ((uint64)(uintptr)(node)<<32) | cnt;
+}
+static inline LFNode* lfUnpack(uint64 val) {
+   return (LFNode*)(uintptr)(val >> 32);
+}
 #endif
 
 void
@@ -35,16 +51,16 @@ runtime_lfstackpush(uint64 *head, LFNode
 {
uint64 old, new;
 
-   if((uintptr)node != ((uintptr)node&PTR_MASK)) {
+   if(node != lfUnpack(lfPack(node, 0))) {
runtime_printf("p=%p\n", node);
runtime_throw("runtime_lfstackpush: invalid pointer");
}
 
node->pushcnt++;
-   new = 
(uint64)(uintptr)node|(((uint64)node->pushcnt_MASK)<pushcnt);
for(;;) {
old = runtime_atomicload64(head);
-   node->next = (LFNode*)(uintptr)(old&PTR_MASK);
+   node->next = lfUnpack(old);
if(runtime_cas64(head, old, new))
break;
}
@@ -60,11 +76,11 @@ runtime_lfstackpop(uint64 *head)
old = runtime_atomicload64(head);
if(old == 0)
return nil;
-   node = (LFNode*)(uintptr)(old&PTR_MASK);
+   node = lfUnpack(old);
    node2 = runtime_atomicloadp(&node->next);
new = 0;
if(node2 != nil)
-   new = 
(uint64)(uintptr)node2|(((uint64)node2->pushcnt_MASK)<pushcnt);
if(runtime_cas64(head, old, new))
return node;
}

[PATCH][tree-inline][obvious] Delete redundant count_insns_seq

2015-09-23 Thread Kyrill Tkachov


Hi all,

I notice that the functions count_insns_seq and estimate_num_insns_seq perform 
the exact same function for exactly the same arguments.
It's redundant to keep both around. I've decided to delete count_insns_seq and 
replace its one use by estimate_num_insns_seq.

Bootstrapped and tested on aarch64, x86_64.
I think this change is obvious, so I'll commit it in 24 hours unless someone 
objects.

Thanks,
Kyrill

2015-09-23  Kyrylo Tkachov  

* tree-inline.h (count_insns_seq): Delete prototype.
(estimate_num_insns_seq): Define prototype.
* tree-inline.c (count_insns_seq): Delete.
(estimate_num_insns_seq): Remove static qualifier.
* tree-eh.c (decide_copy_try_finally): Replace use of count_insns_seq
with estimate_num_insns_seq.
commit b4266c4bd350628fe5d333998b7a76a7d4ab2ad5
Author: Kyrylo Tkachov 
Date:   Wed Sep 23 12:14:46 2015 +0100

[tree-inline] Delete redundant count_insns_seq

diff --git a/gcc/tree-eh.c b/gcc/tree-eh.c
index c19d2be..cb1f08a 100644
--- a/gcc/tree-eh.c
+++ b/gcc/tree-eh.c
@@ -1621,7 +1621,7 @@ decide_copy_try_finally (int ndests, bool may_throw, gimple_seq finally)
 }
 
   /* Finally estimate N times, plus N gotos.  */
-  f_estimate = count_insns_seq (finally, &eni_size_weights);
+  f_estimate = estimate_num_insns_seq (finally, &eni_size_weights);
   f_estimate = (f_estimate + 1) * ndests;
 
   /* Switch statement (cost 10), N variable assignments, N gotos.  */
diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
index abaea3f..36075b2 100644
--- a/gcc/tree-inline.c
+++ b/gcc/tree-inline.c
@@ -3972,8 +3972,8 @@ estimate_operator_cost (enum tree_code code, eni_weights *weights,
the statements in the statement sequence STMTS.
WEIGHTS contains weights attributed to various constructs.  */
 
-static
-int estimate_num_insns_seq (gimple_seq stmts, eni_weights *weights)
+int
+estimate_num_insns_seq (gimple_seq stmts, eni_weights *weights)
 {
   int cost;
   gimple_stmt_iterator gsi;
@@ -4262,19 +4262,6 @@ init_inline_once (void)
   eni_time_weights.return_cost = 2;
 }
 
-/* Estimate the number of instructions in a gimple_seq. */
-
-int
-count_insns_seq (gimple_seq seq, eni_weights *weights)
-{
-  gimple_stmt_iterator gsi;
-  int n = 0;
-  for (gsi = gsi_start (seq); !gsi_end_p (gsi); gsi_next (&gsi))
-n += estimate_num_insns (gsi_stmt (gsi), weights);
-
-  return n;
-}
-
 
 /* Install new lexical TREE_BLOCK underneath 'current_block'.  */
 
diff --git a/gcc/tree-inline.h b/gcc/tree-inline.h
index f0e5436..b8fb2a2 100644
--- a/gcc/tree-inline.h
+++ b/gcc/tree-inline.h
@@ -207,7 +207,7 @@ tree copy_decl_no_change (tree decl, copy_body_data *id);
 int estimate_move_cost (tree type, bool);
 int estimate_num_insns (gimple *, eni_weights *);
 int estimate_num_insns_fn (tree, eni_weights *);
-int count_insns_seq (gimple_seq, eni_weights *);
+int estimate_num_insns_seq (gimple_seq, eni_weights *);
 bool tree_versionable_function_p (tree);
 extern tree remap_decl (tree decl, copy_body_data *id);
 extern tree remap_type (tree type, copy_body_data *id);

[Bug tree-optimization/67690] [5/6 Regression] wrong code with -O2 on x86_64/Linux

2015-09-23 Thread mpolacek at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67690

Marek Polacek  changed:

   What|Removed |Added

 CC||mpolacek at gcc dot gnu.org
   Target Milestone|--- |5.3
Summary|wrong code with -O2 on  |[5/6 Regression] wrong code
   |x86_64/Linux|with -O2 on x86_64/Linux

--- Comment #2 from Marek Polacek  ---
Confirmed.  Started with r220164.

[Bug tree-optimization/67690] [5/6 Regression] wrong code with -O2 on x86_64/Linux

2015-09-23 Thread mpolacek at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67690

Marek Polacek  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2015-09-23
 Ever confirmed|0   |1

[gomp4 2/8] nvptx mkoffload: do not restrict to OpenACC

2015-09-23 Thread Alexander Monakov

This patch allows mkoffload to be invoked meaningfully with -fopenmp.  The check
for the -fopenacc flag is specific to the gomp4 branch; trunk does not have it.

* config/nvptx/mkoffload.c (main): Do not check for -fopenacc.
---
 gcc/config/nvptx/mkoffload.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/gcc/config/nvptx/mkoffload.c b/gcc/config/nvptx/mkoffload.c
index 0114394..8c15686 100644
--- a/gcc/config/nvptx/mkoffload.c
+++ b/gcc/config/nvptx/mkoffload.c
@@ -468,15 +468,12 @@ main (int argc, char **argv)
   obstack_ptr_grow (&argv_obstack, str);
 }
 
-  bool fopenacc = false;
   for (int ix = 1; ix != argc; ix++)
 {
   if (!strcmp (argv[ix], "-v"))
verbose = true;
   else if (!strcmp (argv[ix], "-save-temps"))
save_temps = true;
-  else if (!strcmp (argv[ix], "-fopenacc"))
-   fopenacc = true;
 
   if (!strcmp (argv[ix], "-o") && ix + 1 != argc)
outname = argv[++ix];
@@ -491,8 +488,8 @@ main (int argc, char **argv)
 fatal_error (input_location, "cannot open '%s'", ptx_cfile_name);
 
   /* PR libgomp/65099: Currently, we only support offloading in 64-bit
- configurations, and only for OpenACC offloading.  */
-  if (!target_ilp32 && fopenacc)
+ configurations.  */
+  if (!target_ilp32)
 {
   ptx_name = make_temp_file (".mkoffload");
   obstack_ptr_grow (&argv_obstack, "-o");

[gomp4 0/8] NVPTX: initial OpenMP offloading

2015-09-23 Thread Alexander Monakov

Hello,

This patch series implements some minimally required changes to have OpenMP
offloading working for NVPTX target on the gomp4 branch.  '#pragma omp target'
and data updates should work, but all parallel execution functionality remains
stubbed out (uses of '#pragma omp parallel' in target regions yield a link
error).

I'd like to get feedback on the patches, and approval for the gomp-4_0-branch
where possible.

Patches 1-2 unbreak compilation with offloading, patch 4 allows a target
region to be invoked on the accelerator, and patches 5-8 unbreak libgomp.h
and allow env.c to be compiled for the accelerator.

  nvptx: remove assumption of OpenACC attrs presence
  nvptx mkoffload: do not restrict to OpenACC
  libgomp: provide target-to-host fallback diagnostic
  libgomp: minimal OpenMP support in plugin-nvptx.c
  libgomp: provide sem.h, mutex.h, ptrlock.h on nvptx
  libgomp: provide stub bar.h on nvptx
  libgomp: work around missing pthread_attr_t on nvptx
  libgomp: provide ICVs via env.c on nvptx

 gcc/config/nvptx/mkoffload.c   |   7 +-
 gcc/config/nvptx/nvptx.c   |  19 ++--
 libgomp/config/nvptx/bar.h |  38 +++
 libgomp/config/nvptx/env.c | 219 +
 libgomp/config/nvptx/mutex.h   |  67 +
 libgomp/config/nvptx/ptrlock.h |  73 ++
 libgomp/config/nvptx/sem.h |  65 
 libgomp/libgomp.h  |   5 +
 libgomp/plugin/plugin-nvptx.c  |  30 +-
 libgomp/target.c   |   1 +
 10 files changed, 508 insertions(+), 16 deletions(-)
 create mode 100644 libgomp/config/nvptx/bar.h
 create mode 100644 libgomp/config/nvptx/mutex.h
 create mode 100644 libgomp/config/nvptx/ptrlock.h
 create mode 100644 libgomp/config/nvptx/sem.h

[gomp4 3/8] libgomp: provide target-to-host fallback diagnostic

2015-09-23 Thread Alexander Monakov

This patch makes it possible to see when target regions are executed on the
host, by setting GOMP_DEBUG=1 in the environment.
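
As a rough illustration of environment-gated diagnostics of this kind, the sketch below prints a message only when GOMP_DEBUG holds a nonzero integer. It is not the libgomp implementation: `debug_value_enabled` and `debug_log` are hypothetical names, and the parsing rule (any nonzero decimal integer enables output) is an assumption made for the example.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: nonzero when VALUE is a nonzero decimal integer.
   The exact rule is an assumption made for this sketch.  */
static int
debug_value_enabled (const char *value)
{
  if (value == NULL || *value == '\0')
    return 0;
  char *end;
  long n = strtol (value, &end, 10);
  return *end == '\0' && n != 0;
}

/* Hypothetical logger: writes to stderr only when GOMP_DEBUG enables it.
   Returns the vfprintf result, or 0 when debugging is off.  */
static int
debug_log (const char *fmt, ...)
{
  if (!debug_value_enabled (getenv ("GOMP_DEBUG")))
    return 0;
  va_list ap;
  va_start (ap, fmt);
  int written = vfprintf (stderr, fmt, ap);
  va_end (ap);
  return written;
}
```

A call such as debug_log ("%s: target region executing on host\n", __func__) on the fallback path then stays silent unless the variable is set.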

* target.c (GOMP_target): Use gomp_debug on fallback path.
---
 libgomp/target.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/libgomp/target.c b/libgomp/target.c
index 6ca80ad..1cc2098 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -1008,6 +1008,7 @@ GOMP_target (int device, void (*fn) (void *), const void 
*unused,
   || !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))
 {
   /* Host fallback.  */
+  gomp_debug (0, "%s: target region executing on host\n", __FUNCTION__);
   struct gomp_thread old_thr, *thr = gomp_thread ();
   old_thr = *thr;
   memset (thr, '\0', sizeof (*thr));

[gomp4 4/8] libgomp: minimal OpenMP support in plugin-nvptx.c

2015-09-23 Thread Alexander Monakov

This is a minimal patch for NVPTX OpenMP offloading, using Jakub's initial
implementation.  It allows '#pragma omp target' to run successfully, but without
any parallel execution: 1 team of 1 thread is spawned on the device, and
target regions with '#pragma omp parallel' will fail with a link error.

* plugin/plugin-nvptx.c (nvptx_host2dev): Allow NULL 'nvthd'.
(nvptx_dev2host): Ditto.
(GOMP_OFFLOAD_get_caps): Add GOMP_OFFLOAD_CAP_OPENMP_400.
(GOMP_OFFLOAD_run): New.
---
 libgomp/plugin/plugin-nvptx.c | 30 +++---
 1 file changed, 27 insertions(+), 3 deletions(-)

diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 52c49c7..a3eaafa 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -1052,7 +1052,7 @@ nvptx_host2dev (void *d, const void *h, size_t s)
 GOMP_PLUGIN_fatal ("invalid size");
 
 #ifndef DISABLE_ASYNC
-  if (nvthd->current_stream != nvthd->ptx_dev->null_stream)
+  if (nvthd && nvthd->current_stream != nvthd->ptx_dev->null_stream)
 {
   CUevent *e;
 
@@ -1117,7 +1117,7 @@ nvptx_dev2host (void *h, const void *d, size_t s)
 GOMP_PLUGIN_fatal ("invalid size");
 
 #ifndef DISABLE_ASYNC
-  if (nvthd->current_stream != nvthd->ptx_dev->null_stream)
+  if (nvthd && nvthd->current_stream != nvthd->ptx_dev->null_stream)
 {
   CUevent *e;
 
@@ -1451,7 +1451,7 @@ GOMP_OFFLOAD_get_name (void)
 unsigned int
 GOMP_OFFLOAD_get_caps (void)
 {
-  return GOMP_OFFLOAD_CAP_OPENACC_200;
+  return GOMP_OFFLOAD_CAP_OPENACC_200 | GOMP_OFFLOAD_CAP_OPENMP_400;
 }
 
 int
@@ -1788,3 +1788,27 @@ GOMP_OFFLOAD_openacc_set_cuda_stream (int async, void 
*stream)
 {
   return nvptx_set_cuda_stream (async, stream);
 }
+
+void
+GOMP_OFFLOAD_run (int ord, void *tgt_fn, void *tgt_vars)
+{
+  CUfunction function = ((struct targ_fn_descriptor *) tgt_fn)->fn;
+  CUresult r;
+  struct ptx_device *ptx_dev = ptx_devices[ord];
+  const char *maybe_abort_msg = "(perhaps abort was called)";
+  void *args = &tgt_vars;
+
+  r = cuLaunchKernel (function,
+ 1, 1, 1,
+ 1, 1, 1,
+ 0, ptx_dev->null_stream->stream, &args, 0);
+  if (r != CUDA_SUCCESS)
+GOMP_PLUGIN_fatal ("cuLaunchKernel error: %s", cuda_error (r));
+
+  r = cuCtxSynchronize ();
+  if (r == CUDA_ERROR_LAUNCH_FAILED)
+GOMP_PLUGIN_fatal ("cuCtxSynchronize error: %s %s\n", cuda_error (r),
+  maybe_abort_msg);
+  else if (r != CUDA_SUCCESS)
+GOMP_PLUGIN_fatal ("cuCtxSynchronize error: %s", cuda_error (r));
+}

[gomp4 1/8] nvptx: remove assumption of OpenACC attrs presence

2015-09-23 Thread Alexander Monakov

This patch makes one OpenACC-specific path in nvptx_record_offload_symbol
optional.

* config/nvptx/nvptx.c (nvptx_record_offload_symbol): Allow missing
OpenACC attributes.
---
 gcc/config/nvptx/nvptx.c | 19 +++
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 53850a1..21c59ef 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -4026,19 +4026,22 @@ nvptx_record_offload_symbol (tree decl)
 
 case FUNCTION_DECL:
   {
-   tree attr = get_oacc_fn_attrib (decl);
-   tree dims = TREE_VALUE (attr);
-   unsigned ix;
-   
fprintf (asm_out_file, "//:FUNC_MAP \"%s\"",
 IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl)));
 
-   for (ix = 0; ix != GOMP_DIM_MAX; ix++, dims = TREE_CHAIN (dims))
+   tree attr = get_oacc_fn_attrib (decl);
+   if (attr)
  {
-   int size = TREE_INT_CST_LOW (TREE_VALUE (dims));
+   tree dims = TREE_VALUE (attr);
+   unsigned ix;
 
-   gcc_assert (!TREE_PURPOSE (dims));
-   fprintf (asm_out_file, ", %#x", size);
+   for (ix = 0; ix != GOMP_DIM_MAX; ix++, dims = TREE_CHAIN (dims))
+   {
+ int size = TREE_INT_CST_LOW (TREE_VALUE (dims));
+
+ gcc_assert (!TREE_PURPOSE (dims));
+ fprintf (asm_out_file, ", %#x", size);
+   }
  }
 
fprintf (asm_out_file, "\n");

Re: [ubsan PATCH] Fix uninitialized var issue (PR sanitizer/64906)

2015-09-23 Thread Marek Polacek

On Wed, Sep 23, 2015 at 01:08:53PM +0200, Bernd Schmidt wrote:
> On 09/22/2015 05:11 PM, Marek Polacek wrote:
> 
> >diff --git gcc/c-family/c-ubsan.c gcc/c-family/c-ubsan.c
> >index e0cce84..d2bc264 100644
> >--- gcc/c-family/c-ubsan.c
> >+++ gcc/c-family/c-ubsan.c
> >@@ -104,6 +104,7 @@ ubsan_instrument_division (location_t loc, tree op0, 
> >tree op1)
> > }
> >  }
> >t = fold_build2 (COMPOUND_EXPR, TREE_TYPE (t), unshare_expr (op0), t);
> >+  t = fold_build2 (COMPOUND_EXPR, TREE_TYPE (t), unshare_expr (op1), t);
> >if (flag_sanitize_undefined_trap_on_error)
> >  tt = build_call_expr_loc (loc, builtin_decl_explicit (BUILT_IN_TRAP), 
> > 0);
> >else
> 
> I really don't know this code, but just before the location you're patching,
> there's this:
> 
>   /* In case we have a SAVE_EXPR in a conditional context, we need to
>  make sure it gets evaluated before the condition.  If the OP0 is
>  an instrumented array reference, mark it as having side effects so
>  it's not folded away.  */
>   if (flag_sanitize & SANITIZE_BOUNDS)
> {
>   tree xop0 = op0;
>   while (CONVERT_EXPR_P (xop0))
> xop0 = TREE_OPERAND (xop0, 0);
>   if (TREE_CODE (xop0) == ARRAY_REF)
> {
>   TREE_SIDE_EFFECTS (xop0) = 1;
>   TREE_SIDE_EFFECTS (op0) = 1;
> }
> }
> 
> Does that need to be done for op1 as well? (I really wonder why this is
> needed or whether it's sufficient to find such an ARRAY_REF if you can have
> more complex operands).
 
Good point.  I've dug into this and that hunk doesn't seem to be needed
(anymore?).  I suppose there was a reason I added that, but removing it
doesn't seem to break anything.  It can be triggered with code like:

struct S
{
  unsigned long a[1];
  int l;
};

static inline unsigned long
fn (const struct S *s, int i)
{
  return s->a[i] / i;
}

int
main ()
{
  struct S s;
  fn (&s, 1);
}

With the hunk, we sanitize the same array twice -- that's "suboptimal".  With
the hunk removed, we sanitize the array just once as expected.
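
For readers who don't know the instrumentation, here is a plain-C sketch of what a division check does at run time: each operand is evaluated once, then the divisor is tested, then the division happens.  It is only an analogy, not the tree-building code in c-ubsan.c; `checked_div`, `next_divisor` and `divisor_calls` are hypothetical names, and the two rejected cases (zero divisor, INT_MIN / -1) are the usual undefined cases for signed int division.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical counter, used only to show how often an operand is
   evaluated.  */
static int divisor_calls;

/* Hypothetical operand with a side effect.  */
static int
next_divisor (void)
{
  divisor_calls++;
  return 2;
}

/* Hypothetical checked division.  A and B have already been evaluated,
   exactly once each, by the time the checks run.  Returns false and
   leaves *RESULT untouched when A / B would be undefined.  */
static bool
checked_div (int a, int b, int *result)
{
  if (b == 0)
    return false;
  if (a == INT_MIN && b == -1)
    return false;
  *result = a / b;
  return true;
}
```

Calling checked_div (10, next_divisor (), &r) runs the side effect in next_divisor a single time, and that happens before either check.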

> The same pattern occurs in another function, so it may be best to break it
> out into a new function if additional occurrences are necessary.

Given that the code above seems to be useless now, I think let's put this
patch in as-is, backport it to gcc-5, then remove those redundant hunks on
trunk and add the testcase above.  Do you agree?

Marek

[gomp4 5/8] libgomp: provide sem.h, mutex.h, ptrlock.h on nvptx

2015-09-23 Thread Alexander Monakov

This patch provides minimal non-stub implementations for libgomp
mutex/ptrlock/semaphore, using atomic ops and busy waiting.  The goal here is
to at least provide stub struct declarations necessary to unbreak libgomp.h.

Atomics with busy waiting seem to be the only way to provide such primitives
for inter-team synchronization, but for intra-team ops a more efficient
implementation may be possible.

(all functionality is unused since consumers are stubbed out in config/nvptx)
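
As a standalone illustration of the "atomic ops and busy waiting" approach, here is a minimal test-and-test-and-set spin lock built on the GCC __atomic builtins.  It is a sketch under assumptions rather than the code in this patch: the `spin_mutex_*` names are hypothetical, and the convention that 0 means unlocked and 1 means locked is chosen for the example.

```c
#include <stdbool.h>

/* Hypothetical type for this sketch: 0 = unlocked, 1 = locked.  */
typedef int spin_mutex_t;

static inline void
spin_mutex_init (spin_mutex_t *m)
{
  __atomic_store_n (m, 0, __ATOMIC_RELAXED);
}

/* Try to take the lock once; returns true on success.  */
static inline bool
spin_mutex_trylock (spin_mutex_t *m)
{
  int expected = 0;
  return __atomic_compare_exchange_n (m, &expected, 1, false,
				      __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
}

static inline void
spin_mutex_lock (spin_mutex_t *m)
{
  for (;;)
    {
      /* Spin with plain loads while another thread holds the lock, so
	 waiters do not hammer the location with read-modify-write ops.  */
      while (__atomic_load_n (m, __ATOMIC_RELAXED) != 0)
	;
      if (spin_mutex_trylock (m))
	return;
    }
}

static inline void
spin_mutex_unlock (spin_mutex_t *m)
{
  __atomic_store_n (m, 0, __ATOMIC_RELEASE);
}
```

The acquire ordering on the successful exchange pairs with the release store in unlock, so writes made inside the critical section are visible to the next owner.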

* config/nvptx/mutex.h: New file.
* config/nvptx/ptrlock.h: New file.
* config/nvptx/sem.h: New file.
---
 libgomp/config/nvptx/mutex.h   | 67 ++
 libgomp/config/nvptx/ptrlock.h | 73 ++
 libgomp/config/nvptx/sem.h | 65 +
 3 files changed, 205 insertions(+)
 create mode 100644 libgomp/config/nvptx/mutex.h
 create mode 100644 libgomp/config/nvptx/ptrlock.h
 create mode 100644 libgomp/config/nvptx/sem.h

diff --git a/libgomp/config/nvptx/mutex.h b/libgomp/config/nvptx/mutex.h
new file mode 100644
index 000..a98d5a9
--- /dev/null
+++ b/libgomp/config/nvptx/mutex.h
@@ -0,0 +1,67 @@
+/* Copyright (C) 2015 Free Software Foundation, Inc.
+   Contributed by Alexander Monakov 
+
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* This is an NVPTX specific implementation of a mutex synchronization
+   mechanism for libgomp.  This type is private to the library.  This
+   implementation uses atomic instructions and busy waiting.  */
+
+#ifndef GOMP_MUTEX_H
+#define GOMP_MUTEX_H 1
+
+typedef int gomp_mutex_t;
+
+#define GOMP_MUTEX_INIT_0 1
+
+static inline void
+gomp_mutex_init (gomp_mutex_t *mutex)
+{
+  *mutex = 0;
+}
+
+static inline void
+gomp_mutex_destroy (gomp_mutex_t *mutex)
+{
+}
+
+static inline void
+gomp_mutex_lock (gomp_mutex_t *mutex)
+{
+  int value = __atomic_load_n (mutex, MEMMODEL_ACQUIRE);
+  for (;;)
+{
+  while (value == 0)
+   value = __atomic_load_n (mutex, MEMMODEL_ACQUIRE);
+  if (__atomic_compare_exchange_n (mutex, &value, 1, false,
+  MEMMODEL_ACQUIRE, MEMMODEL_RELAXED))
+   return;
+}
+}
+
+static inline void
+gomp_mutex_unlock (gomp_mutex_t *mutex)
+{
+  __atomic_store_n (mutex, 0, MEMMODEL_RELEASE);
+}
+#endif /* GOMP_MUTEX_H */
diff --git a/libgomp/config/nvptx/ptrlock.h b/libgomp/config/nvptx/ptrlock.h
new file mode 100644
index 000..c4ff033
--- /dev/null
+++ b/libgomp/config/nvptx/ptrlock.h
@@ -0,0 +1,73 @@
+/* Copyright (C) 2015 Free Software Foundation, Inc.
+   Contributed by Alexander Monakov 
+
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* This is an NVPTX specific implementation of a mutex synchronization
+   mechanism for libgomp.  This type is private to the library.  This
+   implementation uses atomic instructions and busy waiting.
+
+   A ptrlock has four states:
+   0/NULL Initial
+   1  Owned by me, I get to write a

[gomp4] oacc xform updates

2015-09-23 Thread Nathan Sidwell

I've committed this patch to change all the OACC hooks to take a gcall * rather 
than 'gimple'.  Mainline has changed the type of 'gimple', and we know we're 
passing a call anyway.  Also updated the rescanning to be more straightforward.


nathan
2015-09-23  Nathan Sidwell  

	* target.def: GOACC hooks take gcall arg.
	* doc/tm.texi: Regenerate.
	* targhooks.h (default_goacc_reduction, default_goacc_fork_join,
	default_goacc_lock): Gimple is a gcall pointer.
	* omp-low.c (oacc_xform_on_device): Arg is a gcall.  Adjust.
	(oacc_xform_dim): Likewise.
	(execute_oacc_transform): Adjust to pass gcall pointer to worker
	functions.  Handle rescan immediately.
	(default_goacc_reduction, default_goacc_fork_join,
	default_goacc_lock): Gimple is a gcall pointer.
	* config/nvptx/nvptx.c (nvptx_xform_fork_join,
	nvptx_xform_lock, nvptx_goacc_reduction_setup,
	nvptx_goacc_reduction_init, nvptx_goacc_reduction_fini, 
	nvptx_goacc_reduction_teardown, nvptx_goacc_reduction): Argument
	is a gcall, adjust.

Index: target.def
===
--- target.def	(revision 228058)
+++ target.def	(working copy)
@@ -1667,7 +1667,7 @@ DEFHOOK
 calls to target-specific gimple.  It is executed during the oacc_xform\n\
 pass.  It should return true, if the functions should be deleted.  The\n\
 default hook returns true, if there are no RTL expanders for them.",
-bool, (gimple stmt, const int dims[], bool is_fork),
+bool, (gcall *call, const int dims[], bool is_fork),
 default_goacc_fork_join)
 
 DEFHOOK
@@ -1677,7 +1677,7 @@ IFN_GOACC_LOCK_INIT function calls to ta
 executed during the oacc_xform pass.  It should return true, if the\n\
 functions should be deleted.  The default hook returns true, if there\n\
 are no RTL expanders for them.",
-bool, (gimple stmt, const int dims[], unsigned ifn_code),
+bool, (gcall *call, const int dims[], unsigned ifn_code),
 default_goacc_lock)
 
 DEFHOOK
@@ -1692,7 +1692,7 @@ hook removes statement @var{call} after
 inserted.  This hook is also responsible for allocating any storage for\n\
 reductions when necessary.  It returns @var{true} if the expanded\n\
 sequence introduces any calls to OpenACC-specific internal functions.",
-bool, (gimple call),
+bool, (gcall *call),
 default_goacc_reduction)
 
 HOOK_VECTOR_END (goacc)
Index: omp-low.c
===
--- omp-low.c	(revision 228058)
+++ omp-low.c	(working copy)
@@ -14689,9 +14660,9 @@ make_pass_late_lower_omp (gcc::context *
offloaded function we're never 'none'.  */
 
 static void
-oacc_xform_on_device (gimple stmt)
+oacc_xform_on_device (gcall *call)
 {
-  tree arg = gimple_call_arg (stmt, 0);
+  tree arg = gimple_call_arg (call, 0);
   unsigned val = GOMP_DEVICE_HOST;
 	  
 #ifdef ACCEL_COMPILER
@@ -14708,14 +14679,14 @@ oacc_xform_on_device (gimple stmt)
   }
 #endif
   result = fold_convert (integer_type_node, result);
-  tree lhs = gimple_call_lhs (stmt);
+  tree lhs = gimple_call_lhs (call);
   gimple_seq seq = NULL;
 
   push_gimplify_context (true);
   gimplify_assign (lhs, result, &seq);
   pop_gimplify_context (NULL);
 
-  gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
+  gimple_stmt_iterator gsi = gsi_for_stmt (call);
   gsi_replace_with_seq (&gsi, seq, false);
 }
 
@@ -14723,9 +14694,9 @@ oacc_xform_on_device (gimple stmt)
constants, where possible.  */
 
 static void
-oacc_xform_dim (gimple stmt, const int dims[], bool is_pos)
+oacc_xform_dim (gcall *call, const int dims[], bool is_pos)
 {
-  tree arg = gimple_call_arg (stmt, 0);
+  tree arg = gimple_call_arg (call, 0);
   unsigned axis = (unsigned)TREE_INT_CST_LOW (arg);
   int size = dims[axis];
 
@@ -14742,11 +14713,11 @@ oacc_xform_dim (gimple stmt, const int d
 }
 
   /* Replace the internal call with a constant.  */
-  tree lhs = gimple_call_lhs (stmt);
+  tree lhs = gimple_call_lhs (call);
   gimple g = gimple_build_assign
 (lhs, build_int_cst (integer_type_node, size));
 
-  gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
+  gimple_stmt_iterator gsi = gsi_for_stmt (call);
   gsi_replace (&gsi, g, false);
 }
 
@@ -14815,10 +14786,8 @@ oacc_validate_dims (tree fn, tree attrs,
 static unsigned int
 execute_oacc_transform ()
 {
-  basic_block bb;
   tree attrs = get_oacc_fn_attrib (current_function_decl);
   int dims[GOMP_DIM_MAX];
-  bool needs_rescan;
   
   if (!attrs)
 /* Not an offloaded function.  */
@@ -14830,80 +14799,97 @@ execute_oacc_transform ()
  dominance information to update SSA.  */
   calculate_dominance_info (CDI_DOMINATORS);
 
-  do
-{
-  needs_rescan = false;
-
-  FOR_ALL_BB_FN (bb, cfun)
-	for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi);)
+  basic_block bb;
+  FOR_ALL_BB_FN (bb, cfun)
+for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi);)
+  {
+	gimple stmt = gsi_stmt (gsi);
+	int rescan = 0;
+	
+	if (!is_gimple_call (stmt))
 	  {
-	gimple stmt =

Re: [gomp4] remap variables inside gang, worker, vector and collapse clauses

2015-09-23 Thread Cesar Philippidis

On 09/23/2015 10:42 AM, Cesar Philippidis wrote:

> I've applied this patch to gomp-4_0-branch.

This patch, that is.

Cesar

2015-09-23  Cesar Philippidis  

	gcc/
	* omp-low.c (lower_omp_for): Remap any variables present in
	OMP_CLAUSE_GANG, OMP_CLAUSE_WORKER, OMP_CLAUSE_VECTOR and
	OMP_CLAUSE_COLLAPSE because they will be used later by expand_omp_for.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/gang-static-2.c: Test if
	static gang expressions containing variables work.
	* testsuite/libgomp.oacc-fortran/gang-static-1.f90: Likewise.

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index ec76096..3f36b7a 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -11325,6 +11325,35 @@ lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx)
   if (oacc_tail)
 gimple_seq_add_seq (, oacc_tail);
 
+  /* Update the variables inside any clauses which may be involved in loop
+ expansion later on.  */
+  for (tree c = gimple_omp_for_clauses (stmt); c; c = OMP_CLAUSE_CHAIN (c))
+{
+  int args;
+
+  switch (OMP_CLAUSE_CODE (c))
+	{
+	default:
+	  args = 0;
+	  break;
+	case OMP_CLAUSE_GANG:
+	  args = 2;
+	  break;
+	case OMP_CLAUSE_VECTOR:
+	case OMP_CLAUSE_WORKER:
+	case OMP_CLAUSE_COLLAPSE:
+	  args = 1;
+	  break;
+	}
+
+  for (int i = 0; i < args; i++)
+	{
+	  tree expr = OMP_CLAUSE_OPERAND (c, i);
+	  if (expr && DECL_P (expr))
+	OMP_CLAUSE_OPERAND (c, i) = build_outer_var_ref (expr, ctx);
+	}
+}
+
   pop_gimplify_context (new_stmt);
 
   gimple_bind_append_vars (new_stmt, ctx->block_vars);
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/gang-static-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/gang-static-2.c
index 3a9a508..20a866d 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/gang-static-2.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/gang-static-2.c
@@ -39,7 +39,7 @@ int
 main ()
 {
   int a[N];
-  int i;
+  int i, x;
 
 #pragma acc parallel loop gang (static:*) num_gangs (10)
   for (i = 0; i < 100; i++)
@@ -78,5 +78,21 @@ main ()
 
   test_nonstatic (a, 10);
 
+  /* Static arguments with a variable expression.  */
+
+  x = 20;
+#pragma acc parallel loop gang (static:0+x) num_gangs (10)
+  for (i = 0; i < 100; i++)
+a[i] = GANG_ID (i);
+
+  test_static (a, 10, 20);
+
+  x = 20;
+#pragma acc parallel loop gang (static:x) num_gangs (10)
+  for (i = 0; i < 100; i++)
+a[i] = GANG_ID (i);
+
+  test_static (a, 10, 20);
+
   return 0;
 }
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/gang-static-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/gang-static-1.f90
index e562535..7d56060 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/gang-static-1.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/gang-static-1.f90
@@ -3,6 +3,7 @@
 program main
   integer, parameter :: n = 100
   integer i, a(n), b(n)
+  integer x
 
   do i = 1, n
  b(i) = i
@@ -48,6 +49,23 @@ program main
 
   call test (a, b, 20, n)
 
+  x = 5
+  !$acc parallel loop gang (static:0+x) num_gangs (10)
+  do i = 1, n
+ a(i) = b(i) + 5
+  end do
+  !$acc end parallel loop
+
+  call test (a, b, 5, n)
+
+  x = 10
+  !$acc parallel loop gang (static:x) num_gangs (10)
+  do i = 1, n
+ a(i) = b(i) + 10
+  end do
+  !$acc end parallel loop
+
+  call test (a, b, 10, n)
 end program main
 
 subroutine test (a, b, sarg, n)

[Bug c++/67693] Spurious warning: control reaches end of non-void function [-Wreturn-type]

2015-09-23 Thread miyuki at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67693

Mikhail Maltsev  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 CC||miyuki at gcc dot gnu.org
 Resolution|--- |DUPLICATE

--- Comment #1 from Mikhail Maltsev  ---
dup

*** This bug has been marked as a duplicate of bug 66590 ***

Re: [gomp4] remap variables inside gang, worker, vector and collapse clauses

2015-09-23 Thread Thomas Schwinge

Hi!

On Wed, 23 Sep 2015 10:57:40 -0700, Cesar Philippidis  
wrote:
> On 09/23/2015 10:42 AM, Cesar Philippidis wrote:
> | Gang, worker, vector and collapse all contain optional arguments which
> | may be used during loop expansion. In OpenACC, those expressions could
> | contain variables

I'm fairly sure that at least the collapse clause needs to be a
compile-time constant?

> | but those variables aren't always getting remapped
> | automatically. This patch remaps those variables inside lower_omp_loop.

Shouldn't that be done in lower_rec_input_clauses?  (Maybe I'm confused
-- it's been a long time since I looked at this code.)  (Jakub?)

> | Note that I didn't need to use a tree walker for more complicated
> | expressions because it's not required. By the time those clauses reach
> | lower_omp_loop, only the result of the expression is available. So the
> | other variables in those expressions get remapped with everything else
> | during omplow. Therefore, the only problematic case is when the
> | optional expression is just a decl, e.g. gang(static:foo).
> 
> > I've applied this patch to gomp-4_0-branch.

> 2015-09-23  Cesar Philippidis  
> 
>   gcc/
>   * omp-low.c (lower_omp_for): Remap any variables present in
>   OMP_CLAUSE_GANG, OMP_CLAUSE_WORKER, OMP_CLAUSE_VECTOR and
>   OMP_CLAUSE_COLLAPSE because they will be used later by expand_omp_for.
> 
>   libgomp/
>   * testsuite/libgomp.oacc-c-c++-common/gang-static-2.c: Test if
>   static gang expressions containing variables work.
>   * testsuite/libgomp.oacc-fortran/gang-static-1.f90: Likewise.
> 
> diff --git a/gcc/omp-low.c b/gcc/omp-low.c
> index ec76096..3f36b7a 100644
> --- a/gcc/omp-low.c
> +++ b/gcc/omp-low.c
> @@ -11325,6 +11325,35 @@ lower_omp_for (gimple_stmt_iterator *gsi_p, 
> omp_context *ctx)
>if (oacc_tail)
>  gimple_seq_add_seq (, oacc_tail);
>  
> +  /* Update the variables inside any clauses which may be involved in loop
> + expansion later on.  */
> +  for (tree c = gimple_omp_for_clauses (stmt); c; c = OMP_CLAUSE_CHAIN (c))
> +{
> +  int args;
> +
> +  switch (OMP_CLAUSE_CODE (c))
> + {
> + default:
> +   args = 0;
> +   break;
> + case OMP_CLAUSE_GANG:
> +   args = 2;
> +   break;
> + case OMP_CLAUSE_VECTOR:
> + case OMP_CLAUSE_WORKER:
> + case OMP_CLAUSE_COLLAPSE:
> +   args = 1;
> +   break;
> + }
> +
> +  for (int i = 0; i < args; i++)
> + {
> +   tree expr = OMP_CLAUSE_OPERAND (c, i);
> +   if (expr && DECL_P (expr))
> + OMP_CLAUSE_OPERAND (c, i) = build_outer_var_ref (expr, ctx);
> + }
> +}
> +
>pop_gimplify_context (new_stmt);
>  
>gimple_bind_append_vars (new_stmt, ctx->block_vars);
> diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/gang-static-2.c 
> b/libgomp/testsuite/libgomp.oacc-c-c++-common/gang-static-2.c
> index 3a9a508..20a866d 100644
> --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/gang-static-2.c
> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/gang-static-2.c
> @@ -39,7 +39,7 @@ int
>  main ()
>  {
>int a[N];
> -  int i;
> +  int i, x;
>  
>  #pragma acc parallel loop gang (static:*) num_gangs (10)
>for (i = 0; i < 100; i++)
> @@ -78,5 +78,21 @@ main ()
>  
>test_nonstatic (a, 10);
>  
> +  /* Static arguments with a variable expression.  */
> +
> +  x = 20;
> +#pragma acc parallel loop gang (static:0+x) num_gangs (10)
> +  for (i = 0; i < 100; i++)
> +a[i] = GANG_ID (i);
> +
> +  test_static (a, 10, 20);
> +
> +  x = 20;
> +#pragma acc parallel loop gang (static:x) num_gangs (10)
> +  for (i = 0; i < 100; i++)
> +a[i] = GANG_ID (i);
> +
> +  test_static (a, 10, 20);
> +
>return 0;
>  }
> diff --git a/libgomp/testsuite/libgomp.oacc-fortran/gang-static-1.f90 
> b/libgomp/testsuite/libgomp.oacc-fortran/gang-static-1.f90
> index e562535..7d56060 100644
> --- a/libgomp/testsuite/libgomp.oacc-fortran/gang-static-1.f90
> +++ b/libgomp/testsuite/libgomp.oacc-fortran/gang-static-1.f90
> @@ -3,6 +3,7 @@
>  program main
>integer, parameter :: n = 100
>integer i, a(n), b(n)
> +  integer x
>  
>do i = 1, n
>   b(i) = i
> @@ -48,6 +49,23 @@ program main
>  
>call test (a, b, 20, n)
>  
> +  x = 5
> +  !$acc parallel loop gang (static:0+x) num_gangs (10)
> +  do i = 1, n
> + a(i) = b(i) + 5
> +  end do
> +  !$acc end parallel loop
> +
> +  call test (a, b, 5, n)
> +
> +  x = 10
> +  !$acc parallel loop gang (static:x) num_gangs (10)
> +  do i = 1, n
> + a(i) = b(i) + 10
> +  end do
> +  !$acc end parallel loop
> +
> +  call test (a, b, 10, n)
>  end program main
>  
>  subroutine test (a, b, sarg, n)


Grüße,
 Thomas

