Re: [PATCH, OpenACC] C++ reference mapping (PR middle-end/86336)

2018-09-10 Thread Julian Brown
On Mon, 10 Sep 2018 22:22:15 +0100
Jason Merrill  wrote:

> On Mon, Sep 10, 2018 at 7:07 PM, Julian Brown
>  wrote:
> > I think it's more accurate to say that OpenACC says nothing about
> > C++ references at all, nor about how unadorned pointers are mapped
> > in copy/copyin/copyout clauses. So arguably we get to choose
> > whatever we want, preferably based on the principle of least
> > surprise. (ICE'ing definitely counts as a surprise!)
> >
> > As noted in a previous email, PGI seems to treat pointers to
> > aggregates specially, mapping them as ptr[0:1], but it's unclear if
> > the same is true for pointers to scalars with their compiler.
> > Neither behaviour seems to be standard-mandated, but this patch
> > extends the idea to references to scalars nonetheless.  
> 
> That certainly seems like the most sensible way of handling references
> to non-arrays.  [...]

To try to clarify things for myself a bit, I tried to figure out better
what the current OpenMP behaviour in GCC is, and what the equivalent
OpenACC behaviour should be. I think the handling of references can and
should match between the two APIs (though implementation details of the
patch to make that so need a little work still).

Pointers (without array sections) are a little more awkward: going by
what OpenMP 4.5 and OpenACC 2.5 say, there does seem to be a deliberate
difference in mapping behaviour, at least for cases that are specified.

Previously, I was confusing the cases marked (*) and (**) below a
little. So, we have:

== OpenMP 4.5 =

#include 

int
main (int argc, char* argv[])
{
  int arr[32];
  int  = arr[16];
  int *myptr = [18];
  const char *sep = "";

  for (int i = 0; i < 32; i++)
arr[i] = i;

//#pragma omp target // mapped as firstprivate: no effect on host
//#pragma omp target defaultmap(tofrom:scalar) // works
#pragma omp target map(tofrom:myref) // works
  {
myref = 1000;
  }

#pragma omp target enter data map(to:arr[0:32])

//#pragma omp target // works, mapped as zero-length array section (*)
//#pragma omp target map(tofrom:myptr) // crashes (**)
#pragma omp target map(tofrom:myptr[0:1]) // works
  {
*myptr = 2000;
  }

#pragma omp target exit data map(from:arr[0:32])

  for (int i = 0; i < 32; i++, sep = ", ")
printf ("%s%d", sep, arr[i]);

  printf ("\n");

  return 0;
}


== OpenACC 2.5 

#include 

int
main (int argc, char* argv[])
{
  int arr[32];
  int  = arr[16];
  int *myptr = [18];
  const char *sep = "";

  for (int i = 0; i < 32; i++)
arr[i] = i;

//#pragma acc parallel // mapped as firstprivate: no effect on host
#pragma acc parallel copy(myref) // works
  {
myref = 1000;
  }

#pragma acc enter data copyin(arr[0:32])

//#pragma acc parallel // crashes (*)
//#pragma acc parallel copy(myptr) // crashes (**)
//#pragma acc parallel copy(myptr[0:1]) // works
//#pragma acc parallel present(myptr) // runtime error, not present
#pragma acc parallel present(myptr[0:1]) // works
  {
*myptr = 2000;
  }

#pragma acc exit data copyout(arr[0:32])

  for (int i = 0; i < 32; i++, sep = ", ")
printf ("%s%d", sep, arr[i]);

  printf ("\n");

  return 0;
}

===

The pointer-mapping cases marked (*), implicit mapping, are the ones
specified in OpenMP 4.5 to map as zero-length array sections. For
OpenACC the pointer is considered a scalar so is mapped as bits (so the
host pointer causes the target to crash on dereference).

The cases marked (**) -- also maybe applicable to C++ "this" --
currently copy as bits on OpenMP and on OpenACC, but could be changed
to map like length-one array sections. Or, they could raise a warning.
There's no apparent difference between OpenMP and OpenACC there though
(in specified behaviour and/or implementation? Despite what I thought
previously) so that's probably a decision for another day.

Cheers,

Julian


Re: [PATCH, OpenACC] C++ reference mapping (PR middle-end/86336)

2018-09-10 Thread Jakub Jelinek
On Mon, Sep 10, 2018 at 10:22:15PM +0100, Jason Merrill wrote:
> > As noted in a previous email, PGI seems to treat pointers to
> > aggregates specially, mapping them as ptr[0:1], but it's unclear if the
> > same is true for pointers to scalars with their compiler. Neither
> > behaviour seems to be standard-mandated, but this patch extends the
> > idea to references to scalars nonetheless.
> 
> That certainly seems like the most sensible way of handling references
> to non-arrays.  And the 'this' pointer, incidentally.  Should we not
> do the same for OpenMP?  Jakub?

OpenMP specifies what to do, though for 4.0, 4.5 and 5.0 it is all different
(and also depends on defaultmap clause), I believe currently we implement
what 4.5 says and when I'll try to implement the 5.0 version, I'll certainly
try to follow the standard.  With defaultmap, one can specify what will
happen with various kinds of implicit mappings (map them as bits,
firstprivatize them, for pointers handle them as zero length array sections,
refuse to do any implicit mapping).

E.g. part of what OpenMP 5.0 says is:
...
- If a defaultmap clause is present for the category of the variable and
specifies an implicit behavior other than default, the data-mapping attribute
is determined by that clause.
- If the target construct is within a class non-static member function, and
a variable is an accessible data member of the object for which the
non-static data member function is invoked, the variable is treated as if
the this[:1] expression had appeared in a map clause with a map-type of
tofrom.  Additionally, if the variable is of a type pointer or reference to
pointer, it is also treated as if it has appeared in a map clause as a
zero-length array section.
- If the this keyword is referenced inside a target construct within a class
non-static member function, it is treated as if the this[:1] expression had
appeared in a map clause with a map-type of tofrom.
- A variable that is of type pointer is treated as if it is the base pointer
of a zero-length array section that appeared as a list item in a map clause.
- A variable that is of type reference to pointer is treated as if it had
appeared in a map clause as a zero-length array section.
...
- If the type of a list item is a reference to a type T then the reference
in the device data environment is initialized to refer to the object in the
device data environment that corresponds to the object referenced by the
list item.  If mapping occurs, it occurs as though the object were mapped
through a pointer with an array section of type T and length one.
- No type mapped through a reference can contain a reference to its own
type, or any cycle of references to types that could produce a cycle of
references.
...

Jakub


Re: [wwwdocs] projects/cfo.html -- simplify

2018-09-10 Thread Gerald Pfeifer
On Sun, 2 Sep 2018, Gerald Pfeifer wrote:
> On Sun, 2 Sep 2018, Gerald Pfeifer wrote:
>> This uses a new CSS class *once* instead of attributing most elements
>> in the table with align="right" and also brings us closer to HTML 5
>> compliance.
> And this also takes care of align="center" by using the CSS of the
> same name I introduced earlier today.

And this is a final cleanup patch.

Committed.

Gerald

Omit border="1" from the performance table.  Make an empty  really so.

Index: projects/cfo.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/projects/cfo.html,v
retrieving revision 1.17
diff -u -r1.17 cfo.html
--- projects/cfo.html   2 Sep 2018 19:20:55 -   1.17
+++ projects/cfo.html   10 Sep 2018 21:27:51 -
@@ -164,9 +164,9 @@
 http://szeged.github.io/csibe/;>CSiBE benchmark with respect
 to the mainline at the last merge (2005-07-11).
 
-
+
 
-  
+  
   Code size save
   Compilation time multiplier
 


Re: [Patch, Fortran, F03] PR 85395: private clause contained in derived type acquires spurious scope

2018-09-10 Thread Janus Weil
Am Mo., 10. Sep. 2018 um 08:55 Uhr schrieb Paul Richard Thomas
:
>
> Hi Janus,
>
> That's OK for trunk and, I would suggest 8-branch.

Thanks, Paul. Committed to trunk as r264196. Will backport to 8-branch
in a week or so.

Cheers,
Janus



> On 9 September 2018 at 17:31, Janus Weil  wrote:
> > Hi all,
> >
> > the attached patch fixes a problem with the accessibility of procedure
> > pointer components (private/public attribute). It is rather
> > straightforward and regtest cleanly on x86_64-linux-gnu (for further
> > details see the PR). Ok for trunk?
> >
> > Cheers,
> > Janus
>
>
>
> --
> "If you can't explain it simply, you don't understand it well enough"
> - Albert Einstein


Re: [PATCH, OpenACC] C++ reference mapping (PR middle-end/86336)

2018-09-10 Thread Jason Merrill
On Mon, Sep 10, 2018 at 7:07 PM, Julian Brown  wrote:
> On Mon, 10 Sep 2018 10:52:47 -0700
> Cesar Philippidis  wrote:
>
>> On 09/10/2018 10:37 AM, Jason Merrill wrote:
>> > On Mon, Sep 10, 2018 at 4:05 AM, Julian Brown
>> >  wrote:
>> >> This patch (by Cesar) changes the way C++ references are mapped in
>> >> OpenACC regions, fixing an ICE in the non-scalar-data.C testcase.
>> >>
>> >> Post-patch, references are mapped like this (from the omplower
>> >> dump):
>> >>
>> >> map(force_present:*x [len: 4]) map(firstprivate ref:x [pointer
>> >> assign, bias: 0])
>> >>
>> >> Tested with offloading to NVPTX and bootstrapped. OK for trunk?
>> >>
>> >> Thanks,
>> >>
>> >> Julian
>> >>
>> >> ChangeLog
>> >>
>> >> 2018-09-09  Cesar Philippidis  
>> >> Julian Brown  
>> >>
>> >> PR middle-end/86336
>> >>
>> >> (gimplify_adjust_omp_clauses_1): Update handling of
>> >> mapping of C++ references.
>> >
>> > How is reference handling specified differently between OpenMP and
>> > OpenACC?  It seems strange for them to differ.
>>
>> Both OpenACC and OpenMP privatize mapped array pointers on the
>> accelerator for subarrays in the same way. However, for pointers
>> without subarrays, OpenMP treats them as zero-length arrays, whereas
>> OpenACC treats them as ordinary scalars so that the pointer target
>> will not get remapped on the accelerator (which is odd because
>> there's a deviceptr clause for that). Scalars in C++ are special,
>> because references must treated like an array of length one, for lack
>> of a better terminology.
>
> I think it's more accurate to say that OpenACC says nothing about C++
> references at all, nor about how unadorned pointers are mapped in
> copy/copyin/copyout clauses. So arguably we get to choose whatever we
> want, preferably based on the principle of least surprise. (ICE'ing
> definitely counts as a surprise!)
>
> As noted in a previous email, PGI seems to treat pointers to
> aggregates specially, mapping them as ptr[0:1], but it's unclear if the
> same is true for pointers to scalars with their compiler. Neither
> behaviour seems to be standard-mandated, but this patch extends the
> idea to references to scalars nonetheless.

That certainly seems like the most sensible way of handling references
to non-arrays.  And the 'this' pointer, incidentally.  Should we not
do the same for OpenMP?  Jakub?

Jason


Re: Keep std::deque algos specializations in Debug mode

2018-09-10 Thread François Dumont
One more update, running more tests show that template alias was badly 
defined when users are using directly .


So I enventually defined it in std::__detail namespace where it is 
easier to use _GLIBCXX_STD_C. Too bad that _Deque_iterator didn't got 
defined in __detail namespace so that we could get it from here whatever 
the mode activated.


François


On 09/07/2018 06:30 PM, François Dumont wrote:

I realized that I was not checking the Debug implementations.

Doing so also have the advantage to show clearly which overload of the 
algos is being used. If it is using the correct Debug overload I 
consider that there are chances that it is also using the correct 
normal overload. This is easier to check than using gdb.


This way I found out that I was not calling the expected std::fill 
because I pass 0 rather than '\0'. I wonder if we could have the 
correct behavior if we were simply using 0. It would be great if gcc 
was using the fill deque iterator overload even if warning in case the 
pass int value do not match the deque value type.


Ok to commit ?

François


On 09/06/2018 10:07 PM, François Dumont wrote:

On 09/04/2018 02:59 PM, Jonathan Wakely wrote:




  template
    void
-    fill(const _Deque_iterator<_Tp, _Tp&, _Tp*>& __first,
- const _Deque_iterator<_Tp, _Tp&, _Tp*>& __last, const _Tp& 
__value)
+    fill(const _GLIBCXX_STD_C::_Deque_iterator<_Tp, _Tp&, _Tp*>& 
__first,

+ const _GLIBCXX_STD_C::_Deque_iterator<_Tp, _Tp&, _Tp*>& __last,
+ const _Tp& __value)
    {
-  typedef typename _Deque_iterator<_Tp, _Tp&, _Tp*>::_Self _Self;
-
-  for (typename _Self::_Map_pointer __node = __first._M_node + 1;
-   __node < __last._M_node; ++__node)
-    std::fill(*__node, *__node + _Self::_S_buffer_size(), __value);
+  typedef typename _GLIBCXX_STD_C::_Deque_iterator<_Tp, _Tp&, 
_Tp*>::_Self

+    _Self;

  if (__first._M_node != __last._M_node)
{
  std::fill(__first._M_cur, __first._M_last, __value);
+
+  for (typename _Self::_Map_pointer __node = __first._M_node + 1;
+   __node != __last._M_node; ++__node)


Is there any particular reason to change this from using < to != for
the comparison?


I consider that the reason for having a < comparison was that this 
loop was done before checking __first._M_node != __last._M_node. As I 
moved it inside the block I also prefer to use a usual condition when 
iterating other iterators/pointers.


Isn't it a simpler operation ? Do you fear a compiler warning about 
it like we used to have in vector implementation before introducing 
the __builtin_unreachable calls ?




(This change is part of the reason I asked for the ChangeLog, but you
didn't mention it in the ChangeLog).

I had forgotten about it but I can add it in ChangeLog.


Moving it inside the condition makes sense (not only does it avoid a
branch in the single-page case, but means we fill the elements in
order).
Yes, it is the main reason I moved it, I should have signal it when I 
submit the patch.



+    std::fill(*__node, *__node + _Self::_S_buffer_size(), 
__value);

+
  std::fill(__last._M_first, __last._M_cur, __value);
}
  else


The rest of the code changes look fine, I just wondered about that
bit.

I do have some comments on the new tests though ...



+
+void test01()
+{
+  std::deque d;
+  for (char c = 0; c != std::numeric_limits::max(); ++c)
+    d.push_back(c);
+
+  std::deque dest(std::numeric_limits::max(), '\0');


These deques only have 127 or 255 elements (depending on
is_signed) which will fit on a single page of a deque (the
default is 512 bytes per page).

That means the tests don't exercise the logic for handling
non-contiguous blocks of memory.

Ideally we'd want to test multiple cases:

- a single page, with/without empty capacity at front/back
- multiple pages, with/without empty capacity at front/back

That would be 8 cases. I think we want to test at least a single
page and multiple pages.

I think I started to create the fill.cc which require usage of char 
to make sure it uses __builtin_memset and then extrapolated to other 
algos.


But I had already reviewed those tests for a patch I'll submit after 
this one so here is the revisited tests.


In this new proposal I also introduce a template alias to simplify 
the C++11 overloads. I define it in __gnu_debug to avoid polluting 
std namespace with a non-Standard thing.


    * include/bits/stl_deque.h
    (fill, copy, copy_backward, move, move_backward): Move overloads for
    std::deque iterators in std namespace.
    * include/bits/deque.tcc: Likewise.
    (fill): Move loop on nodes inside branch when first and last 
nodes are

    different. Replace for loop < condition on nodes with a !=.
    * include/debug/deque
    (__gnu_debug::_SDeque_iterator<>, 
__gnu_debug::_SDeque_const_iterator):

    New template aliases.
    (fill, copy, copy_backward, move, move_backward):
    New overloads for std::__debug::deque iterators. Forward to 
normal 

[PATCH, i386]: Rewrite x87 trigonometric patterns

2018-09-10 Thread Uros Bizjak
Hello!

Attached patch removes a bunch of sincos -> sin, cos splitters. These
are not necessary anymore since middle end handles the transformation
in a generic way.
Additionally, the patch removes unnecessary mixed-mode patters
(SFmode/DFmode input operand and XFmode outputs). These are not
needed, because all x87 float_extend RTXes degenerate to a plain move
(or a no-op move). The patch also includes a couple of cleanups with
no functional changes.

2018-09-10  Uros Bizjak  

* config/i386/i386.md (xf2): Rename from *xf2_i387.
(*_extendxf2_i387): Remove insn pattern.
(mode2): New expander.
(sincos_extendxf3_i387): Remove insn pattern.
(sincos -> sin, cos splitters): Remove splitter patterns.
(sincos3): Change operand 2 predicate to general_operand.
Extend operand 2 to XFmode and generate sincosxf3 insn.
(fptanxf4_i387): Change mode of operands 0 and 3 to SFmode.
Change operand 3 predicate to const1_operand.
(fptan_extendxf4_i387): Remove insn pattern.
(tanxf2): Update operands in the call to fptanxf4_i387.
(tan2): Change operand 1 predicate to general_operand.
Extend operand 1 to XFmode and generate tanxf3 insn.
(atan2xf3): Rename from *fpatanxf3_i387.
(fpatan_extendxf3_i387): Remove insn pattern.
(atan2xf3): Remove expander.
(atan22): Change operand 1 predicate to general_operand.
Extend operand 1 to XFmode and generate atanxf3 insn.

Patch was bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline SVN.

Uros.
Index: config/i386/i386.md
===
--- config/i386/i386.md (revision 264185)
+++ config/i386/i386.md (working copy)
@@ -15375,7 +15375,7 @@
[(UNSPEC_SIN "sin")
 (UNSPEC_COS "cos")])
 
-(define_insn "*xf2_i387"
+(define_insn "xf2"
   [(set (match_operand:XF 0 "register_operand" "=f")
(unspec:XF [(match_operand:XF 1 "register_operand" "0")]
   SINCOS))]
@@ -15386,25 +15386,23 @@
(set_attr "znver1_decode" "vector")
(set_attr "mode" "XF")])
 
-(define_insn "*_extendxf2_i387"
-  [(set (match_operand:XF 0 "register_operand" "=f")
-   (unspec:XF [(float_extend:XF
- (match_operand:MODEF 1 "register_operand" "0"))]
-  SINCOS))]
+(define_expand "2"
+  [(set (match_operand:MODEF 0 "register_operand")
+   (unspec:MODEF [(match_operand:MODEF 1 "general_operand")]
+ SINCOS))]
   "TARGET_USE_FANCY_MATH_387
&& (!(SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
|| TARGET_MIX_SSE_I387)
&& flag_unsafe_math_optimizations"
-  "f"
-  [(set_attr "type" "fpspc")
-   (set_attr "znver1_decode" "vector")
-   (set_attr "mode" "XF")])
+{
+  rtx op0 = gen_reg_rtx (XFmode);
+  rtx op1 = gen_reg_rtx (XFmode);
 
-;; When sincos pattern is defined, sin and cos builtin functions will be
-;; expanded to sincos pattern with one of its outputs left unused.
-;; CSE pass will figure out if two sincos patterns can be combined,
-;; otherwise sincos pattern will be split back to sin or cos pattern,
-;; depending on the unused output.
+  emit_insn (gen_extendxf2 (op1, operands[1]));
+  emit_insn (gen_xf2 (op0, op1));
+  emit_insn (gen_truncxf2 (operands[0], op0));
+  DONE;
+})
 
 (define_insn "sincosxf3"
   [(set (match_operand:XF 0 "register_operand" "=f")
@@ -15419,70 +15417,10 @@
(set_attr "znver1_decode" "vector")
(set_attr "mode" "XF")])
 
-(define_split
-  [(set (match_operand:XF 0 "register_operand")
-   (unspec:XF [(match_operand:XF 2 "register_operand")]
-  UNSPEC_SINCOS_COS))
-   (set (match_operand:XF 1 "register_operand")
-   (unspec:XF [(match_dup 2)] UNSPEC_SINCOS_SIN))]
-  "find_regno_note (insn, REG_UNUSED, REGNO (operands[0]))
-   && can_create_pseudo_p ()"
-  [(set (match_dup 1) (unspec:XF [(match_dup 2)] UNSPEC_SIN))])
-
-(define_split
-  [(set (match_operand:XF 0 "register_operand")
-   (unspec:XF [(match_operand:XF 2 "register_operand")]
-  UNSPEC_SINCOS_COS))
-   (set (match_operand:XF 1 "register_operand")
-   (unspec:XF [(match_dup 2)] UNSPEC_SINCOS_SIN))]
-  "find_regno_note (insn, REG_UNUSED, REGNO (operands[1]))
-   && can_create_pseudo_p ()"
-  [(set (match_dup 0) (unspec:XF [(match_dup 2)] UNSPEC_COS))])
-
-(define_insn "sincos_extendxf3_i387"
-  [(set (match_operand:XF 0 "register_operand" "=f")
-   (unspec:XF [(float_extend:XF
- (match_operand:MODEF 2 "register_operand" "0"))]
-  UNSPEC_SINCOS_COS))
-   (set (match_operand:XF 1 "register_operand" "=u")
-(unspec:XF [(float_extend:XF (match_dup 2))] UNSPEC_SINCOS_SIN))]
-  "TARGET_USE_FANCY_MATH_387
-   && (!(SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
-   || TARGET_MIX_SSE_I387)
-   && flag_unsafe_math_optimizations"
-  "fsincos"
-  [(set_attr "type" "fpspc")
-   (set_attr "znver1_decode" "vector")
-   (set_attr "mode" "XF")])
-
-(define_split
-  [(set (match_operand:XF 0 

[wwwdocs] Use plain

2018-09-10 Thread Gerald Pfeifer
Per the HTML 5 validator used by w3.org.

Applied.

Gerald

Index: projects/cxx0x.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/projects/cxx0x.html,v
retrieving revision 1.76
diff -u -r1.76 cxx0x.html
--- projects/cxx0x.html 1 Sep 2018 23:42:09 -   1.76
+++ projects/cxx0x.html 10 Sep 2018 19:04:48 -
@@ -3,7 +3,7 @@

 
 
-
+