Re: [PATCH, OpenACC] C++ reference mapping (PR middle-end/86336)
On Mon, 10 Sep 2018 22:22:15 +0100 Jason Merrill wrote: > On Mon, Sep 10, 2018 at 7:07 PM, Julian Brown > wrote: > > I think it's more accurate to say that OpenACC says nothing about > > C++ references at all, nor about how unadorned pointers are mapped > > in copy/copyin/copyout clauses. So arguably we get to choose > > whatever we want, preferably based on the principle of least > > surprise. (ICE'ing definitely counts as a surprise!) > > > > As noted in a previous email, PGI seems to treat pointers to > > aggregates specially, mapping them as ptr[0:1], but it's unclear if > > the same is true for pointers to scalars with their compiler. > > Neither behaviour seems to be standard-mandated, but this patch > > extends the idea to references to scalars nonetheless. > > That certainly seems like the most sensible way of handling references > to non-arrays. [...] To try to clarify things for myself a bit, I tried to figure out better what the current OpenMP behaviour in GCC is, and what the equivalent OpenACC behaviour should be. I think the handling of references can and should match between the two APIs (though implementation details of the patch to make that so need a little work still). Pointers (without array sections) are a little more awkward: going by what OpenMP 4.5 and OpenACC 2.5 say, there does seem to be a deliberate difference in mapping behaviour, at least for cases that are specified. Previously, I was confusing the cases marked (*) and (**) below a little. So, we have: == OpenMP 4.5 = #include int main (int argc, char* argv[]) { int arr[32]; int = arr[16]; int *myptr = [18]; const char *sep = ""; for (int i = 0; i < 32; i++) arr[i] = i; //#pragma omp target // mapped as firstprivate: no effect on host //#pragma omp target defaultmap(tofrom:scalar) // works #pragma omp target map(tofrom:myref) // works { myref = 1000; } #pragma omp target enter data map(to:arr[0:32]) //#pragma omp target // works, mapped as zero-length array section (*) //#pragma omp target map(tofrom:myptr) // crashes (**) #pragma omp target map(tofrom:myptr[0:1]) // works { *myptr = 2000; } #pragma omp target exit data map(from:arr[0:32]) for (int i = 0; i < 32; i++, sep = ", ") printf ("%s%d", sep, arr[i]); printf ("\n"); return 0; } == OpenACC 2.5 #include int main (int argc, char* argv[]) { int arr[32]; int = arr[16]; int *myptr = [18]; const char *sep = ""; for (int i = 0; i < 32; i++) arr[i] = i; //#pragma acc parallel // mapped as firstprivate: no effect on host #pragma acc parallel copy(myref) // works { myref = 1000; } #pragma acc enter data copyin(arr[0:32]) //#pragma acc parallel // crashes (*) //#pragma acc parallel copy(myptr) // crashes (**) //#pragma acc parallel copy(myptr[0:1]) // works //#pragma acc parallel present(myptr) // runtime error, not present #pragma acc parallel present(myptr[0:1]) // works { *myptr = 2000; } #pragma acc exit data copyout(arr[0:32]) for (int i = 0; i < 32; i++, sep = ", ") printf ("%s%d", sep, arr[i]); printf ("\n"); return 0; } === The pointer-mapping cases marked (*), implicit mapping, are the ones specified in OpenMP 4.5 to map as zero-length array sections. For OpenACC the pointer is considered a scalar so is mapped as bits (so the host pointer causes the target to crash on dereference). The cases marked (**) -- also maybe applicable to C++ "this" -- currently copy as bits on OpenMP and on OpenACC, but could be changed to map like length-one array sections. Or, they could raise a warning. There's no apparent difference between OpenMP and OpenACC there though (in specified behaviour and/or implementation? Despite what I thought previously) so that's probably a decision for another day. Cheers, Julian
Re: [PATCH, OpenACC] C++ reference mapping (PR middle-end/86336)
On Mon, Sep 10, 2018 at 10:22:15PM +0100, Jason Merrill wrote: > > As noted in a previous email, PGI seems to treat pointers to > > aggregates specially, mapping them as ptr[0:1], but it's unclear if the > > same is true for pointers to scalars with their compiler. Neither > > behaviour seems to be standard-mandated, but this patch extends the > > idea to references to scalars nonetheless. > > That certainly seems like the most sensible way of handling references > to non-arrays. And the 'this' pointer, incidentally. Should we not > do the same for OpenMP? Jakub? OpenMP specifies what to do, though for 4.0, 4.5 and 5.0 it is all different (and also depends on defaultmap clause), I believe currently we implement what 4.5 says and when I'll try to implement the 5.0 version, I'll certainly try to follow the standard. With defaultmap, one can specify what will happen with various kinds of implicit mappings (map them as bits, firstprivatize them, for pointers handle them as zero length array sections, refuse to do any implicit mapping). E.g. part of what OpenMP 5.0 says is: ... - If a defaultmap clause is present for the category of the variable and specifies an implicit behavior other than default, the data-mapping attribute is determined by that clause. - If the target construct is within a class non-static member function, and a variable is an accessible data member of the object for which the non-static data member function is invoked, the variable is treated as if the this[:1] expression had appeared in a map clause with a map-type of tofrom. Additionally, if the variable is of a type pointer or reference to pointer, it is also treated as if it has appeared in a map clause as a zero-length array section. - If the this keyword is referenced inside a target construct within a class non-static member function, it is treated as if the this[:1] expression had appeared in a map clause with a map-type of tofrom. - A variable that is of type pointer is treated as if it is the base pointer of a zero-length array section that appeared as a list item in a map clause. - A variable that is of type reference to pointer is treated as if it had appeared in a map clause as a zero-length array section. ... - If the type of a list item is a reference to a type T then the reference in the device data environment is initialized to refer to the object in the device data environment that corresponds to the object referenced by the list item. If mapping occurs, it occurs as though the object were mapped through a pointer with an array section of type T and length one. - No type mapped through a reference can contain a reference to its own type, or any cycle of references to types that could produce a cycle of references. ... Jakub
Re: [wwwdocs] projects/cfo.html -- simplify
On Sun, 2 Sep 2018, Gerald Pfeifer wrote: > On Sun, 2 Sep 2018, Gerald Pfeifer wrote: >> This uses a new CSS class *once* instead of attributing most elements >> in the table with align="right" and also brings us closer to HTML 5 >> compliance. > And this also takes care of align="center" by using the CSS of the > same name I introduced earlier today. And this is a final cleanup patch. Committed. Gerald Omit border="1" from the performance table. Make an empty really so. Index: projects/cfo.html === RCS file: /cvs/gcc/wwwdocs/htdocs/projects/cfo.html,v retrieving revision 1.17 diff -u -r1.17 cfo.html --- projects/cfo.html 2 Sep 2018 19:20:55 - 1.17 +++ projects/cfo.html 10 Sep 2018 21:27:51 - @@ -164,9 +164,9 @@ http://szeged.github.io/csibe/;>CSiBE benchmark with respect to the mainline at the last merge (2005-07-11). - + - + Code size save Compilation time multiplier
Re: [Patch, Fortran, F03] PR 85395: private clause contained in derived type acquires spurious scope
Am Mo., 10. Sep. 2018 um 08:55 Uhr schrieb Paul Richard Thomas : > > Hi Janus, > > That's OK for trunk and, I would suggest 8-branch. Thanks, Paul. Committed to trunk as r264196. Will backport to 8-branch in a week or so. Cheers, Janus > On 9 September 2018 at 17:31, Janus Weil wrote: > > Hi all, > > > > the attached patch fixes a problem with the accessibility of procedure > > pointer components (private/public attribute). It is rather > > straightforward and regtest cleanly on x86_64-linux-gnu (for further > > details see the PR). Ok for trunk? > > > > Cheers, > > Janus > > > > -- > "If you can't explain it simply, you don't understand it well enough" > - Albert Einstein
Re: [PATCH, OpenACC] C++ reference mapping (PR middle-end/86336)
On Mon, Sep 10, 2018 at 7:07 PM, Julian Brown wrote: > On Mon, 10 Sep 2018 10:52:47 -0700 > Cesar Philippidis wrote: > >> On 09/10/2018 10:37 AM, Jason Merrill wrote: >> > On Mon, Sep 10, 2018 at 4:05 AM, Julian Brown >> > wrote: >> >> This patch (by Cesar) changes the way C++ references are mapped in >> >> OpenACC regions, fixing an ICE in the non-scalar-data.C testcase. >> >> >> >> Post-patch, references are mapped like this (from the omplower >> >> dump): >> >> >> >> map(force_present:*x [len: 4]) map(firstprivate ref:x [pointer >> >> assign, bias: 0]) >> >> >> >> Tested with offloading to NVPTX and bootstrapped. OK for trunk? >> >> >> >> Thanks, >> >> >> >> Julian >> >> >> >> ChangeLog >> >> >> >> 2018-09-09 Cesar Philippidis >> >> Julian Brown >> >> >> >> PR middle-end/86336 >> >> >> >> (gimplify_adjust_omp_clauses_1): Update handling of >> >> mapping of C++ references. >> > >> > How is reference handling specified differently between OpenMP and >> > OpenACC? It seems strange for them to differ. >> >> Both OpenACC and OpenMP privatize mapped array pointers on the >> accelerator for subarrays in the same way. However, for pointers >> without subarrays, OpenMP treats them as zero-length arrays, whereas >> OpenACC treats them as ordinary scalars so that the pointer target >> will not get remapped on the accelerator (which is odd because >> there's a deviceptr clause for that). Scalars in C++ are special, >> because references must treated like an array of length one, for lack >> of a better terminology. > > I think it's more accurate to say that OpenACC says nothing about C++ > references at all, nor about how unadorned pointers are mapped in > copy/copyin/copyout clauses. So arguably we get to choose whatever we > want, preferably based on the principle of least surprise. (ICE'ing > definitely counts as a surprise!) > > As noted in a previous email, PGI seems to treat pointers to > aggregates specially, mapping them as ptr[0:1], but it's unclear if the > same is true for pointers to scalars with their compiler. Neither > behaviour seems to be standard-mandated, but this patch extends the > idea to references to scalars nonetheless. That certainly seems like the most sensible way of handling references to non-arrays. And the 'this' pointer, incidentally. Should we not do the same for OpenMP? Jakub? Jason
Re: Keep std::deque algos specializations in Debug mode
One more update, running more tests show that template alias was badly defined when users are using directly . So I enventually defined it in std::__detail namespace where it is easier to use _GLIBCXX_STD_C. Too bad that _Deque_iterator didn't got defined in __detail namespace so that we could get it from here whatever the mode activated. François On 09/07/2018 06:30 PM, François Dumont wrote: I realized that I was not checking the Debug implementations. Doing so also have the advantage to show clearly which overload of the algos is being used. If it is using the correct Debug overload I consider that there are chances that it is also using the correct normal overload. This is easier to check than using gdb. This way I found out that I was not calling the expected std::fill because I pass 0 rather than '\0'. I wonder if we could have the correct behavior if we were simply using 0. It would be great if gcc was using the fill deque iterator overload even if warning in case the pass int value do not match the deque value type. Ok to commit ? François On 09/06/2018 10:07 PM, François Dumont wrote: On 09/04/2018 02:59 PM, Jonathan Wakely wrote: template void - fill(const _Deque_iterator<_Tp, _Tp&, _Tp*>& __first, - const _Deque_iterator<_Tp, _Tp&, _Tp*>& __last, const _Tp& __value) + fill(const _GLIBCXX_STD_C::_Deque_iterator<_Tp, _Tp&, _Tp*>& __first, + const _GLIBCXX_STD_C::_Deque_iterator<_Tp, _Tp&, _Tp*>& __last, + const _Tp& __value) { - typedef typename _Deque_iterator<_Tp, _Tp&, _Tp*>::_Self _Self; - - for (typename _Self::_Map_pointer __node = __first._M_node + 1; - __node < __last._M_node; ++__node) - std::fill(*__node, *__node + _Self::_S_buffer_size(), __value); + typedef typename _GLIBCXX_STD_C::_Deque_iterator<_Tp, _Tp&, _Tp*>::_Self + _Self; if (__first._M_node != __last._M_node) { std::fill(__first._M_cur, __first._M_last, __value); + + for (typename _Self::_Map_pointer __node = __first._M_node + 1; + __node != __last._M_node; ++__node) Is there any particular reason to change this from using < to != for the comparison? I consider that the reason for having a < comparison was that this loop was done before checking __first._M_node != __last._M_node. As I moved it inside the block I also prefer to use a usual condition when iterating other iterators/pointers. Isn't it a simpler operation ? Do you fear a compiler warning about it like we used to have in vector implementation before introducing the __builtin_unreachable calls ? (This change is part of the reason I asked for the ChangeLog, but you didn't mention it in the ChangeLog). I had forgotten about it but I can add it in ChangeLog. Moving it inside the condition makes sense (not only does it avoid a branch in the single-page case, but means we fill the elements in order). Yes, it is the main reason I moved it, I should have signal it when I submit the patch. + std::fill(*__node, *__node + _Self::_S_buffer_size(), __value); + std::fill(__last._M_first, __last._M_cur, __value); } else The rest of the code changes look fine, I just wondered about that bit. I do have some comments on the new tests though ... + +void test01() +{ + std::deque d; + for (char c = 0; c != std::numeric_limits::max(); ++c) + d.push_back(c); + + std::deque dest(std::numeric_limits::max(), '\0'); These deques only have 127 or 255 elements (depending on is_signed) which will fit on a single page of a deque (the default is 512 bytes per page). That means the tests don't exercise the logic for handling non-contiguous blocks of memory. Ideally we'd want to test multiple cases: - a single page, with/without empty capacity at front/back - multiple pages, with/without empty capacity at front/back That would be 8 cases. I think we want to test at least a single page and multiple pages. I think I started to create the fill.cc which require usage of char to make sure it uses __builtin_memset and then extrapolated to other algos. But I had already reviewed those tests for a patch I'll submit after this one so here is the revisited tests. In this new proposal I also introduce a template alias to simplify the C++11 overloads. I define it in __gnu_debug to avoid polluting std namespace with a non-Standard thing. * include/bits/stl_deque.h (fill, copy, copy_backward, move, move_backward): Move overloads for std::deque iterators in std namespace. * include/bits/deque.tcc: Likewise. (fill): Move loop on nodes inside branch when first and last nodes are different. Replace for loop < condition on nodes with a !=. * include/debug/deque (__gnu_debug::_SDeque_iterator<>, __gnu_debug::_SDeque_const_iterator): New template aliases. (fill, copy, copy_backward, move, move_backward): New overloads for std::__debug::deque iterators. Forward to normal
[PATCH, i386]: Rewrite x87 trigonometric patterns
Hello! Attached patch removes a bunch of sincos -> sin, cos splitters. These are not necessary anymore since middle end handles the transformation in a generic way. Additionally, the patch removes unnecessary mixed-mode patters (SFmode/DFmode input operand and XFmode outputs). These are not needed, because all x87 float_extend RTXes degenerate to a plain move (or a no-op move). The patch also includes a couple of cleanups with no functional changes. 2018-09-10 Uros Bizjak * config/i386/i386.md (xf2): Rename from *xf2_i387. (*_extendxf2_i387): Remove insn pattern. (mode2): New expander. (sincos_extendxf3_i387): Remove insn pattern. (sincos -> sin, cos splitters): Remove splitter patterns. (sincos3): Change operand 2 predicate to general_operand. Extend operand 2 to XFmode and generate sincosxf3 insn. (fptanxf4_i387): Change mode of operands 0 and 3 to SFmode. Change operand 3 predicate to const1_operand. (fptan_extendxf4_i387): Remove insn pattern. (tanxf2): Update operands in the call to fptanxf4_i387. (tan2): Change operand 1 predicate to general_operand. Extend operand 1 to XFmode and generate tanxf3 insn. (atan2xf3): Rename from *fpatanxf3_i387. (fpatan_extendxf3_i387): Remove insn pattern. (atan2xf3): Remove expander. (atan22): Change operand 1 predicate to general_operand. Extend operand 1 to XFmode and generate atanxf3 insn. Patch was bootstrapped and regression tested on x86_64-linux-gnu {,-m32}. Committed to mainline SVN. Uros. Index: config/i386/i386.md === --- config/i386/i386.md (revision 264185) +++ config/i386/i386.md (working copy) @@ -15375,7 +15375,7 @@ [(UNSPEC_SIN "sin") (UNSPEC_COS "cos")]) -(define_insn "*xf2_i387" +(define_insn "xf2" [(set (match_operand:XF 0 "register_operand" "=f") (unspec:XF [(match_operand:XF 1 "register_operand" "0")] SINCOS))] @@ -15386,25 +15386,23 @@ (set_attr "znver1_decode" "vector") (set_attr "mode" "XF")]) -(define_insn "*_extendxf2_i387" - [(set (match_operand:XF 0 "register_operand" "=f") - (unspec:XF [(float_extend:XF - (match_operand:MODEF 1 "register_operand" "0"))] - SINCOS))] +(define_expand "2" + [(set (match_operand:MODEF 0 "register_operand") + (unspec:MODEF [(match_operand:MODEF 1 "general_operand")] + SINCOS))] "TARGET_USE_FANCY_MATH_387 && (!(SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH) || TARGET_MIX_SSE_I387) && flag_unsafe_math_optimizations" - "f" - [(set_attr "type" "fpspc") - (set_attr "znver1_decode" "vector") - (set_attr "mode" "XF")]) +{ + rtx op0 = gen_reg_rtx (XFmode); + rtx op1 = gen_reg_rtx (XFmode); -;; When sincos pattern is defined, sin and cos builtin functions will be -;; expanded to sincos pattern with one of its outputs left unused. -;; CSE pass will figure out if two sincos patterns can be combined, -;; otherwise sincos pattern will be split back to sin or cos pattern, -;; depending on the unused output. + emit_insn (gen_extendxf2 (op1, operands[1])); + emit_insn (gen_xf2 (op0, op1)); + emit_insn (gen_truncxf2 (operands[0], op0)); + DONE; +}) (define_insn "sincosxf3" [(set (match_operand:XF 0 "register_operand" "=f") @@ -15419,70 +15417,10 @@ (set_attr "znver1_decode" "vector") (set_attr "mode" "XF")]) -(define_split - [(set (match_operand:XF 0 "register_operand") - (unspec:XF [(match_operand:XF 2 "register_operand")] - UNSPEC_SINCOS_COS)) - (set (match_operand:XF 1 "register_operand") - (unspec:XF [(match_dup 2)] UNSPEC_SINCOS_SIN))] - "find_regno_note (insn, REG_UNUSED, REGNO (operands[0])) - && can_create_pseudo_p ()" - [(set (match_dup 1) (unspec:XF [(match_dup 2)] UNSPEC_SIN))]) - -(define_split - [(set (match_operand:XF 0 "register_operand") - (unspec:XF [(match_operand:XF 2 "register_operand")] - UNSPEC_SINCOS_COS)) - (set (match_operand:XF 1 "register_operand") - (unspec:XF [(match_dup 2)] UNSPEC_SINCOS_SIN))] - "find_regno_note (insn, REG_UNUSED, REGNO (operands[1])) - && can_create_pseudo_p ()" - [(set (match_dup 0) (unspec:XF [(match_dup 2)] UNSPEC_COS))]) - -(define_insn "sincos_extendxf3_i387" - [(set (match_operand:XF 0 "register_operand" "=f") - (unspec:XF [(float_extend:XF - (match_operand:MODEF 2 "register_operand" "0"))] - UNSPEC_SINCOS_COS)) - (set (match_operand:XF 1 "register_operand" "=u") -(unspec:XF [(float_extend:XF (match_dup 2))] UNSPEC_SINCOS_SIN))] - "TARGET_USE_FANCY_MATH_387 - && (!(SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH) - || TARGET_MIX_SSE_I387) - && flag_unsafe_math_optimizations" - "fsincos" - [(set_attr "type" "fpspc") - (set_attr "znver1_decode" "vector") - (set_attr "mode" "XF")]) - -(define_split - [(set (match_operand:XF 0
[wwwdocs] Use plain
Per the HTML 5 validator used by w3.org. Applied. Gerald Index: projects/cxx0x.html === RCS file: /cvs/gcc/wwwdocs/htdocs/projects/cxx0x.html,v retrieving revision 1.76 diff -u -r1.76 cxx0x.html --- projects/cxx0x.html 1 Sep 2018 23:42:09 - 1.76 +++ projects/cxx0x.html 10 Sep 2018 19:04:48 - @@ -3,7 +3,7 @@ - +