[PATCH] Fix free_lang_data on asm stmts (PR lto/91572)
Hi! The following testcase ICEs, because on the inline asm LTO streaming streams the constraint strings ("g" in this case), including their type, but the fld type discovery doesn't see that type and so we end up streaming const char type turned into its own main variant. The strings for asm are in TREE_PURPOSE of the TREE_LIST args. walk_tree doesn't walk TREE_PURPOSE though. Tried to change that, but it breaks way too much, tried to walk TREE_PURPOSE of TREE_LIST just for the fld walking (find_decls_types_r), but that doesn't work either, most of the TREE_PURPOSE we do not want to walk, usually it contains C++ default arguments which fld clears. So, this directed patch walks the TREE_PURPOSE solely for the asm stmt arguments. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2019-08-31 Jakub Jelinek PR lto/91572 * tree.c (find_decls_types_in_node): Also walk TREE_PURPOSE of GIMPLE_ASM TREE_LIST operands. * g++.dg/lto/pr91572_0.C: New test. --- gcc/tree.c.jj 2019-08-29 10:22:06.337702323 +0200 +++ gcc/tree.c 2019-08-29 11:07:16.120107950 +0200 @@ -6142,6 +6142,13 @@ find_decls_types_in_node (struct cgraph_ { tree arg = gimple_op (stmt, i); find_decls_types (arg, fld); + /* find_decls_types doesn't walk TREE_PURPOSE of TREE_LISTs, +which we need for asm stmts. */ + if (arg + && TREE_CODE (arg) == TREE_LIST + && TREE_PURPOSE (arg) + && gimple_code (stmt) == GIMPLE_ASM) + find_decls_types (TREE_PURPOSE (arg), fld); } } } --- gcc/testsuite/g++.dg/lto/pr91572_0.C.jj 2019-08-28 18:13:47.718349087 +0200 +++ gcc/testsuite/g++.dg/lto/pr91572_0.C2019-08-28 18:13:41.695436342 +0200 @@ -0,0 +1,12 @@ +// PR lto/91572 +// { dg-lto-do link } +// { dg-lto-options { { -O -fPIC -flto } } } +// { dg-require-effective-target shared } +// { dg-require-effective-target fpic } +// { dg-extra-ld-options "-shared" } + +void foo (char); +namespace N { + class A { A (); }; + A::A () { asm ("" : : "g" (0)); } +} Jakub
Go patch committed: Check for notinheap struct at each struct field
This patch to the Go frontend checks for a notinheap struct at each struct field. When generating write barriers, we were only checking for a notinheap struct at the outermost struct. That mishandled the case of setting a pointer to a notinheap struct as a field of another struct that not notinheap. This caused an invalid write barrier error when building the 1.13 version of the runtime. Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu. Committed to mainline. Ian Index: gcc/go/gofrontend/MERGE === --- gcc/go/gofrontend/MERGE (revision 275239) +++ gcc/go/gofrontend/MERGE (working copy) @@ -1,4 +1,4 @@ -289d94b9e6303ec74649d1f08d418300f2b4d0fd +3b8a505824abb2a69f4c04c555a4ba29ab8b102b The first line of this file holds the git revision number of the last merge done from the gofrontend repository. Index: gcc/go/gofrontend/wb.cc === --- gcc/go/gofrontend/wb.cc (revision 274890) +++ gcc/go/gofrontend/wb.cc (working copy) @@ -733,6 +733,31 @@ Gogo::assign_needs_write_barrier( && !lhs->type()->points_to()->in_heap()) return false; + // For a struct assignment, we don't need a write barrier if all + // the field types can not be in the heap. + Struct_type* st = lhs->type()->struct_type(); + if (st != NULL) + { + bool in_heap = false; + const Struct_field_list* fields = st->fields(); + for (Struct_field_list::const_iterator p = fields->begin(); + p != fields->end(); + p++) + { + Type* ft = p->type(); + if (!ft->has_pointer()) + continue; + if (!ft->in_heap()) + continue; + if (ft->points_to() != NULL && !ft->points_to()->in_heap()) + continue; + in_heap = true; + break; + } + if (!in_heap) + return false; + } + Field_reference_expression* fre = lhs->field_reference_expression(); if (fre != NULL) { @@ -788,31 +813,6 @@ Gogo::assign_needs_write_barrier( && this->is_nonwb_pointer(ue->operand(), nonwb_pointers)) return false; - // For a struct assignment, we don't need a write barrier if all the - // pointer types can not be in the heap. - Struct_type* st = lhs->type()->struct_type(); - if (st != NULL) -{ - bool in_heap = false; - const Struct_field_list* fields = st->fields(); - for (Struct_field_list::const_iterator p = fields->begin(); - p != fields->end(); - p++) - { - Type* ft = p->type(); - if (!ft->has_pointer()) - continue; - if (!ft->in_heap()) - continue; - if (ft->points_to() != NULL && !ft->points_to()->in_heap()) - continue; - in_heap = true; - break; - } - if (!in_heap) - return false; -} - // Write barrier needed in other cases. return true; }
Go patch committed: Support and use single argument go:linkname
The gc compiler has started permitting go:linkname comments with a single argument to mean that a function should be externally visible outside the package. This patch implements this in the Go frontend. We also change the libgo runtime package to use it, rather than repeating the name just to export a function. We also remove a couple of unnecessary go:linkname comments on declarations. Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu. Committed to mainline. Ian Index: gcc/go/gofrontend/MERGE === --- gcc/go/gofrontend/MERGE (revision 275238) +++ gcc/go/gofrontend/MERGE (working copy) @@ -1,4 +1,4 @@ -80403eb9e95c9642ebabb4d7c43deedaa763211f +289d94b9e6303ec74649d1f08d418300f2b4d0fd The first line of this file holds the git revision number of the last merge done from the gofrontend repository. Index: gcc/go/gofrontend/gogo.cc === --- gcc/go/gofrontend/gogo.cc (revision 275227) +++ gcc/go/gofrontend/gogo.cc (working copy) @@ -2531,9 +2531,22 @@ Gogo::add_linkname(const std::string& go if (no == NULL) go_error_at(loc, "%s is not defined", go_name.c_str()); else if (no->is_function()) -no->func_value()->set_asm_name(ext_name); +{ + if (ext_name.empty()) + no->func_value()->set_is_exported_by_linkname(); + else + no->func_value()->set_asm_name(ext_name); +} else if (no->is_function_declaration()) -no->func_declaration_value()->set_asm_name(ext_name); +{ + if (ext_name.empty()) + go_error_at(loc, + ("//% missing external name " +"for declaration of %s"), + go_name.c_str()); + else + no->func_declaration_value()->set_asm_name(ext_name); +} else go_error_at(loc, ("%s is not a function; " @@ -5465,7 +5478,8 @@ Function::Function(Function_type* type, calls_recover_(false), is_recover_thunk_(false), has_recover_thunk_(false), calls_defer_retaddr_(false), is_type_specific_function_(false), in_unique_section_(false), export_for_inlining_(false), -is_inline_only_(false), is_referenced_by_inline_(false) +is_inline_only_(false), is_referenced_by_inline_(false), +is_exported_by_linkname_(false) { } @@ -6220,6 +6234,11 @@ Function::get_or_make_decl(Gogo* gogo, N if (this->is_referenced_by_inline_) flags |= Backend::function_is_visible; + // A go:linkname directive can be used to force a function to be + // visible. + if (this->is_exported_by_linkname_) + flags |= Backend::function_is_visible; + // If a function calls the predeclared recover function, we // can't inline it, because recover behaves differently in a // function passed directly to defer. If this is a recover Index: gcc/go/gofrontend/gogo.h === --- gcc/go/gofrontend/gogo.h(revision 274890) +++ gcc/go/gofrontend/gogo.h(working copy) @@ -547,7 +547,9 @@ class Gogo { this->file_block_names_[name] = location; } // Add a linkname, from the go:linkname compiler directive. This - // changes the externally visible name of go_name to be ext_name. + // changes the externally visible name of GO_NAME to be EXT_NAME. + // If EXT_NAME is the empty string, GO_NAME is unchanged, but the + // symbol is made publicly visible. void add_linkname(const std::string& go_name, bool is_exported, const std::string& ext_name, Location location); @@ -1359,6 +1361,11 @@ class Function set_asm_name(const std::string& asm_name) { this->asm_name_ = asm_name; } + // Mark this symbol as exported by a linkname directive. + void + set_is_exported_by_linkname() + { this->is_exported_by_linkname_ = true; } + // Return the pragmas for this function. unsigned int pragmas() const @@ -1706,6 +1713,9 @@ class Function // True if this function is referenced from an inlined body that // will be put into the export data. bool is_referenced_by_inline_ : 1; + // True if we should make this function visible to other packages + // because of a go:linkname directive. + bool is_exported_by_linkname_ : 1; }; // A snapshot of the current binding state. Index: gcc/go/gofrontend/lex.cc === --- gcc/go/gofrontend/lex.cc(revision 274614) +++ gcc/go/gofrontend/lex.cc(working copy) @@ -1996,7 +1996,7 @@ Lex::skip_cpp_comment() while (ps < pend && *ps != ' ' && *ps != '\t') ++ps; - if (ps < pend) + if (ps <= pend) go_name = std::string(pg, ps - pg); while (ps < pend && (*ps == ' ' || *ps == '\t')) ++ps; @@ -2015,8 +2015,8 @@ Lex::skip_cpp_comment() ext_name.clear(); } } -
Go patch committed: Don't report runtime escapes if we've seen errors
If we get errors during a Go compilation, we skip the escape analysis pass. If we are compiling the runtime package, we report an error if a bound method expression escapes. The effect is that if we get an error while compiling the runtime package, we would report confusing and meaningless errors about bound method expressions escaping. This patch stops doing that. Bootstrapped and ran Go tests on x86_64-pc-linux-gnu. Committed to mainline. Ian Index: gcc/go/gofrontend/MERGE === --- gcc/go/gofrontend/MERGE (revision 275237) +++ gcc/go/gofrontend/MERGE (working copy) @@ -1,4 +1,4 @@ -2444eb1e8c697531f8beb403679e4ab00b16dbf5 +80403eb9e95c9642ebabb4d7c43deedaa763211f The first line of this file holds the git revision number of the last merge done from the gofrontend repository. Index: gcc/go/gofrontend/expressions.cc === --- gcc/go/gofrontend/expressions.cc(revision 274998) +++ gcc/go/gofrontend/expressions.cc(working copy) @@ -7948,7 +7948,9 @@ Bound_method_expression::do_flatten(Gogo Node* n = Node::make_node(this); if ((n->encoding() & ESCAPE_MASK) == Node::ESCAPE_NONE) ret->heap_expression()->set_allocate_on_stack(); - else if (gogo->compiling_runtime() && gogo->package_name() == "runtime") + else if (gogo->compiling_runtime() + && gogo->package_name() == "runtime" + && !saw_errors()) go_error_at(loc, "%s escapes to heap, not allowed in runtime", n->ast_format(gogo).c_str());
Re: Go patch committed: Provide index information on bounds check failure
On Thu, Aug 29, 2019 at 1:50 PM Andreas Schwab wrote: > > On Aug 28 2019, Ian Lance Taylor wrote: > > > This patch to the Go frontend and libgo changes the panic message > > reported for an out of bounds index or slice operation to include the > > invalid values. > > This breaks aarch64/-mabi=ilp32. > > aarch64-suse-linux/ilp32/libgo/archive/tar/check-testlog: > > /usr/aarch64-suse-linux/bin/ld: _gotest_.o: in function > `archive..z2ftar.Reader.next': > /opt/gcc/gcc-20190829/Build/aarch64-suse-linux/ilp32/libgo/gotest1086/test/reader.go:72: > undefined reference to `runtime.goPanicExtendSliceAlen' We should probably use a different GOARCH value for that build, such as arm64p32. But for now I'm committing this patch which will fix both problems by just always building panic32.go. Bootstrapped and tested on x86_64-pc-linux-gnu. Ian Index: gcc/go/gofrontend/MERGE === --- gcc/go/gofrontend/MERGE (revision 275227) +++ gcc/go/gofrontend/MERGE (working copy) @@ -1,4 +1,4 @@ -11fd9208f8545e882f945d3ed86fcc33abf1a61b +2444eb1e8c697531f8beb403679e4ab00b16dbf5 The first line of this file holds the git revision number of the last merge done from the gofrontend repository. Index: libgo/go/runtime/panic32.go === --- libgo/go/runtime/panic32.go (revision 274998) +++ libgo/go/runtime/panic32.go (working copy) @@ -2,8 +2,6 @@ // Use of this source code is governed by a BSD-style // license that can be found in the LICENSE file. -// +build 386 amd64p32 arm mips mipsle m68k nios2 sh shbe - package runtime import _ "unsafe" // for go:linkname
Re: [PATCH, V3, #4 of 10], Add general prefixed/pcrel support
On Fri, Aug 30, 2019 at 11:35:11AM -0500, Segher Boessenkool wrote: > > - unsigned HOST_WIDE_INT val = INTVAL (XEXP (addr, 1)); > > - return val + 0x8000 >= 0x1 - (TARGET_POWERPC64 ? 8 : 12); > > + HOST_WIDE_INT val = INTVAL (XEXP (addr, 1)); > > + HOST_WIDE_INT extra = TARGET_POWERPC64 ? 8 : 12; > > The 8 vs. 12 could use a comment (yes, I know it was there already). Do you > know what this is about, why it is 8 and 12? "extra" here covers the increase in offset needed to access the memory using multiple registers. For example, when loading a TImode mem to gprs you will load at offset+0 and offset+8 when powerpc64, and offset+0, offset+4, offset+8, and offset+12 when powerpc32. -- Alan Modra Australia Development Lab, IBM
Re: [PATCH, i386]: Do not limit the cost of moves to/from XMM register to minimum 8.
On Fri, Aug 30, 2019 at 09:42:06AM +0200, Uros Bizjak wrote: > On Fri, Aug 30, 2019 at 9:22 AM Richard Biener > wrote: > > > > On Thu, Aug 29, 2019 at 9:54 AM Uros Bizjak wrote: > > > > > > On Wed, Aug 28, 2019 at 5:12 PM Uros Bizjak wrote: > > > > > > > > Attached patch improves costing for STV shifts and corrects reject > > > > condition for out of range shift count operands. > > > > > > > > 2019-08-28 Uroš Bizjak > > > > > > > > * config/i386/i386-features.c > > > > (general_scalar_chain::compute_convert_gain): > > > > Correct cost for double-word shifts. > > > > (general_scalar_to_vector_candidate_p): Reject count operands > > > > greater or equal to mode bitsize. > > > > > > > > Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}. > > > > > > > > Committed to mainline SVN. > > > > > > Ouch... I mixed up patches and actually committed the patch that > > > removes maximum from cost of sse<->int moves. > > > > > > I can leave the patch for a day, so we can see the effects of the cost > > > change, and if the patch creates problems, I'll revert it. > > > > Regresses gromacs and namd quite a bit on Haswell, also perl in SPEC 2000. > > I guess we should try understand why rather than reverting immediately > > (I'd leave it in at least another few days to get more testers pick up the > > rev.). > > The correct patch is actually [1] (I have updated the subject): > > 2019-08-28 Uroš Bizjak > > * config/i386/i386.c (ix86_register_move_cost): Do not > limit the cost of moves to/from XMM register to minimum 8. > > There is no technical reason to limit the costs to minimum 8 anymore, > and several targets (e.g skylake) also claim that moves between SSE > and general regs are as cheap as moves between general regs. However, > these values were never benchmarked properly due to the mentioned > limitation and apparently cause some unwanted secondary effects > (lowering the clock). > > I agree with your proposal to leave the change in the tree some more > time. At the end, we could raise the costs for all targets to 8 to > effectively revert to the previous state. > > [1] https://gcc.gnu.org/ml/gcc-patches/2019-08/msg02009.html Those SPEC regressions sound similar to what I saw when trying to give proper costing to moves between general regs and vsx regs on Power9. rs6000.c:rs6000_ira_change_pseudo_allocno_class is the hack I used to work around bad register allocation decisions due to poor register pressure calculation. -- Alan Modra Australia Development Lab, IBM
[Committed] PR fortran/91587 -- A syntax error should occur
This is obvious. 2019-08-30 Steven G. Kargl PR fortran/91587 * io.c (match_filepos): MATCH_ERROR should branch to a syntax error. 2019-08-30 Steven G. Kargl PR fortran/91587 * gfortran.dg/pr91587.f90: New test. svn commit gcc/fortran/io.c gcc/fortran/ChangeLog gcc/testsuite/ChangeLog \ gcc/testsuite/gfortran.dg/pr91587.f90 Index: gcc/fortran/io.c === --- gcc/fortran/io.c(revision 275048) +++ gcc/fortran/io.c(working copy) @@ -2845,7 +2845,7 @@ match_filepos (gfc_statement st, gfc_exec_op op) m = match_file_element (fp); if (m == MATCH_ERROR) -goto done; +goto syntax; if (m == MATCH_NO) { m = gfc_match_expr (&fp->unit); Index: gcc/testsuite/gfortran.dg/pr91587.f90 === --- gcc/testsuite/gfortran.dg/pr91587.f90 (nonexistent) +++ gcc/testsuite/gfortran.dg/pr91587.f90 (working copy) @@ -0,0 +1,12 @@ +! { dg-do compile } +! PR fortran/91587 +! Code contributed by Gerhard Steinmetz +program p + backspace(err=!) ! { dg-error "Syntax error in" } + flush(err=!) ! { dg-error "Syntax error in" } + rewind(err=!) ! { dg-error "Syntax error in" } +end + +subroutine bar ! An other matcher runs, and gives a different error. + endfile(err=!)! { dg-error "Expecting END" } +end -- Steve
Re: [PATCH 2/2] gcc/riscv: Add a mechanism to remove some calls to _riscv_save_0
On Sun, Aug 25, 2019 at 9:40 AM Jim Wilson wrote: > The problem with the linux toolchain support for -msave-restore is > that we forgot to add a version script to export the symbols. I made > the obvious change to add a RISC-V specific libgcc version script, and > now everything in the g++ and gfortran testsuites are linking, but the > execution tests are all timing out. So there is still something > wrong. I need to do some more work here. Following up on this, I found a linker bug where it was emitting error messages for non-pic code in shared libraries, but wasn't exiting with an error, so bad shared libraries were being produced without killing the build. I wrote and committed a binutils patch to fix that. The underlying problem is that the save/restore functions were always called as non-pic, even for a shared library. But fixing that produced shared libraries that failed again, which turned out because the save/restore functions use the alternate link register t0/x5 which is clobbered by plts, so we can't call them in shared libraries at all. I wrote and committed a gcc patch to disable -msave-restore when -fpic is used, and emit a warning message if the user explicitly turned on -msave-restore. Since we can't use the save/restore functions in shared libraries, we don't need to export them or support pic calls to them, and I dropped those patches. With these changes I'm now able to do -msave-restore testing with a linux toolchain. Testing with and without your patch for a riscv64-linux toolchain, I see 11 extra gcc failures, 2 extra g++ failures, and 8 extra gfortran failures. I didn't look at the details. I suspect that most of these are failing for the same reason, and there is only a couple of minor bugs that need to be fixed to make your patch work. Jim
Re: [ PATCH ] C++20
Ahem -- we were supposed to use the 20 version of the constexpr macro, not the base one. I will note that, for some reason, the default constructor was already constexpr, so we don't change that one! On Fri, Aug 30, 2019 at 5:11 PM JeanHeyd Meneide wrote: > > On Fri, Aug 30, 2019 at 3:41 PM Jonathan Wakely wrote: > > > > On 30/08/19 15:22 -0400, JeanHeyd Meneide wrote: > > >This patch implements as it currently exists in the C++20 Working > > >Draft. > > > > Nice! > > > > > > >Notes: > > >- __gnu_cxx::__normal_iterator is not fully constexpr, so its not used here > > > > I'd prefer to make __normal_iterator constexpr, and use it. > > It needs to be constexpr anyway for string and vector. > > ... > > Alright, thank you for the feedback. I fixed it up! > > Tested x86_64-pc-linux-gnu. > > 2019-08-30 JeanHeyd "ThePhD" Meneide > > * include/std/span: Implement the entirety of span. > * include/bits/stl_iterator.h: __gnu_cxx::__normal_iterator C> is now constexpr-qualified for C++11+. >* include/bits/range_access.h: Add __adl_* versions of access > functions. > * testsuite/23_containers/span/everything.cc: constexpr and > non-constexpr tests. > * include/Makefile.in: Add span to install. > * include/Makefile.am: Likewise diff --git a/.gitignore b/.gitignore index d9d3967a12c..fd116c362ac 100644 --- a/.gitignore +++ b/.gitignore @@ -57,3 +57,7 @@ REVISION /mpc* /gmp* /isl* + +# ignore sprinkled in dev files that keep popping up +.vscode/ +.vs/ diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am index 3fe80f32cc4..b8b786d9260 100644 --- a/libstdc++-v3/include/Makefile.am +++ b/libstdc++-v3/include/Makefile.am @@ -68,6 +68,7 @@ std_headers = \ ${std_srcdir}/scoped_allocator \ ${std_srcdir}/set \ ${std_srcdir}/shared_mutex \ + ${std_srcdir}/span \ ${std_srcdir}/sstream \ ${std_srcdir}/stack \ ${std_srcdir}/stdexcept \ diff --git a/libstdc++-v3/include/Makefile.in b/libstdc++-v3/include/Makefile.in index b675d356cd4..cd1e9df5482 100644 --- a/libstdc++-v3/include/Makefile.in +++ b/libstdc++-v3/include/Makefile.in @@ -412,6 +412,7 @@ std_headers = \ ${std_srcdir}/scoped_allocator \ ${std_srcdir}/set \ ${std_srcdir}/shared_mutex \ + ${std_srcdir}/span \ ${std_srcdir}/sstream \ ${std_srcdir}/stack \ ${std_srcdir}/stdexcept \ diff --git a/libstdc++-v3/include/bits/range_access.h b/libstdc++-v3/include/bits/range_access.h index d1e74711433..3acaebadcf1 100644 --- a/libstdc++-v3/include/bits/range_access.h +++ b/libstdc++-v3/include/bits/range_access.h @@ -318,6 +318,72 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION #endif // C++17 +#if __cplusplus > 201703L + // "why are these in namespace std:: and not __gnu_cxx:: ?" + // because if we don't put them here it's impossible to + // have implicit ADL with "using std::begin/end/size/data;". + template +constexpr auto +__adl_begin(_Container& __cont) noexcept(noexcept(begin(__cont))) +{ return begin(__cont); } + + template +constexpr auto +__adl_end(_Container& __cont) noexcept(noexcept(end(__cont))) +{ return end(__cont); } + + template +constexpr auto +__adl_cbegin(_Container& __cont) noexcept(noexcept(cbegin(__cont))) +{ return cbegin(__cont); } + + template +constexpr auto +__adl_cend(_Container& __cont) noexcept(noexcept(cend(__cont))) +{ return cend(__cont); } + + template +constexpr auto +__adl_rbegin(_Container& __cont) noexcept(noexcept(rbegin(__cont))) +{ return rbegin(__cont); } + + template +constexpr auto +__adl_rend(_Container& __cont) noexcept(noexcept(rend(__cont))) +{ return rend(__cont); } + + template +constexpr auto +__adl_crbegin(_Container& __cont) noexcept(noexcept(crbegin(__cont))) +{ return crbegin(__cont); } + + template +constexpr auto +__adl_crend(_Container& __cont) noexcept(noexcept(crend(__cont))) +{ return crend(__cont); } + + template +constexpr auto +__adl_data(_Container& __cont) noexcept(noexcept(data(__cont))) +{ return data(__cont); } + + template +constexpr auto +__adl_cdata(_Container& __cont) noexcept(noexcept(cdata(__cont))) +{ return cdata(__cont); } + + template +constexpr auto +__adl_size(_Container& __cont) noexcept(noexcept(size(__cont))) +{ return size(__cont); } + + template +constexpr auto +__adl_empty(_Container& __cont) noexcept(noexcept(empty(__cont))) +{ return empty(__cont); } + +#endif // C++20 + _GLIBCXX_END_NAMESPACE_VERSION } // namespace diff --git a/libstdc++-v3/include/bits/stl_iterator.h b/libstdc++-v3/include/bits/stl_iterator.h index 8ab0d72b0c2..f6a6060f2e3 100644 --- a/libstdc++-v3/include/bits/stl_iterator.h +++ b/libstdc++-v3/include/bits/stl_iterator.h @@ -799,55 +799,58 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION typedef type
[PATCH] RISC-V: Disable -msave-restore for shared libraries.
This was noticed while trying to test -msave-restore support. The save/restore routines use the alternate return register t0/x5 which is clobbered by the PLT header, so we can't use them in shared libraries. This patch disables -msave-restore when -fpic (and -mplt), and emits a warning if the user explicitly turned on -msave-restore. Tested with a riscv64-linux cross compiler build and make check with the option -msave-restore hardwired on. There are thousands of failures without the patch. With the patch I get a reasonable number of failures. Also tested by hand to verify I get the warning when the user specifies the option. Committed. Jim gcc/ * config/riscv/riscv.c (riscv_option_override): If -msave-restore and -fpic and -mplt then disable -msave-restore and warn. --- gcc/config/riscv/riscv.c | 10 ++ 1 file changed, 10 insertions(+) diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c index 9b16a1eb9c2..1e7528f1cb9 100644 --- a/gcc/config/riscv/riscv.c +++ b/gcc/config/riscv/riscv.c @@ -4636,6 +4636,16 @@ riscv_option_override (void) error ("%<-mriscv-attribute%> RISC-V ELF attribute requires GNU as 2.32" " [%<-mriscv-attribute%>]"); #endif + + /* The save-restore routines use t0 which is clobbered by the plt header, + so we can't use them when building shared libraries. */ + if (TARGET_SAVE_RESTORE && flag_pic && TARGET_PLT) +{ + target_flags &= ~MASK_SAVE_RESTORE; + if (target_flags_explicit & MASK_SAVE_RESTORE) + warning (0, "%<-msave-restore%> disabled; not supported with PLT " +"based shared libraries"); +} } /* Implement TARGET_CONDITIONAL_REGISTER_USAGE. */ -- 2.17.1
Go patch committed: Permit anonymous and empty fields in C header
This patch to the Go frontend permits putting structs with anonymous and empty fields in the C header file runtime.inc that is used to build the C runtime code. This is required for upcoming 1.13 support, as the m struct has picked up an anonymous field. Doing this lets the C header contain all the type descriptor structs, so start using those in the C code. This cuts the number of copies of type descriptor definitions from 3 to 2. Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu. Committed to mainline. Ian Index: gcc/go/gofrontend/MERGE === --- gcc/go/gofrontend/MERGE (revision 275010) +++ gcc/go/gofrontend/MERGE (working copy) @@ -1,4 +1,4 @@ -db738935c77443840994e5a9f77e619e67a4c43a +11fd9208f8545e882f945d3ed86fcc33abf1a61b The first line of this file holds the git revision number of the last merge done from the gofrontend repository. Index: gcc/go/gofrontend/gogo.cc === --- gcc/go/gofrontend/gogo.cc (revision 274998) +++ gcc/go/gofrontend/gogo.cc (working copy) @@ -5238,11 +5238,11 @@ Gogo::write_c_header() // package they are mostly types defined by mkrsysinfo.sh based // on the C system header files. We don't need to translate // types to C and back to Go. But do accept the special cases - // _defer and _panic. + // _defer, _panic, and _type. std::string name = Gogo::unpack_hidden_name(no->name()); if (name[0] == '_' && (name[1] < 'A' || name[1] > 'Z') - && (name != "_defer" && name != "_panic")) + && (name != "_defer" && name != "_panic" && name != "_type")) continue; if (no->is_type() && no->type_value()->struct_type() != NULL) Index: gcc/go/gofrontend/types.cc === --- gcc/go/gofrontend/types.cc (revision 274935) +++ gcc/go/gofrontend/types.cc (working copy) @@ -6777,8 +6777,6 @@ Struct_type::can_write_to_c_header( p != fields->end(); ++p) { - if (p->is_anonymous()) - return false; if (!this->can_write_type_to_c_header(p->type(), requires, declare)) return false; if (Gogo::message_name(p->field_name()) == "_") @@ -6847,6 +6845,9 @@ Struct_type::can_write_type_to_c_header( } if (t->struct_type() != NULL) { + // We will accept empty struct fields, but not print them. + if (t->struct_type()->total_field_count() == 0) + return true; requires->push_back(no); return t->struct_type()->can_write_to_c_header(requires, declare); } @@ -6871,6 +6872,12 @@ Struct_type::write_to_c_header(std::ostr p != fields->end(); ++p) { + // Skip fields that are empty struct types. The C code can't + // refer to them anyhow. + if (p->type()->struct_type() != NULL + && p->type()->struct_type()->total_field_count() == 0) + continue; + os << '\t'; this->write_field_to_c_header(os, p->field_name(), p->type()); os << ';' << std::endl; Index: libgo/go/reflect/makefunc_ffi_c.c === --- libgo/go/reflect/makefunc_ffi_c.c (revision 274169) +++ libgo/go/reflect/makefunc_ffi_c.c (working copy) @@ -3,7 +3,6 @@ // license that can be found in the LICENSE file. #include "runtime.h" -#include "go-type.h" #ifdef USE_LIBFFI Index: libgo/mkruntimeinc.sh === --- libgo/mkruntimeinc.sh (revision 274998) +++ libgo/mkruntimeinc.sh (working copy) @@ -29,6 +29,12 @@ do sed -e '/struct '${TYPE}' {/,/^}/s/^.*$//' runtime.inc.tmp2 > runtime.inc.tmp3; mv runtime.inc.tmp3 runtime.inc.tmp2 done -sed -e 's/sigset/sigset_go/' runtime.inc.tmp2 > ${OUT} +sed -e 's/sigset/sigset_go/' runtime.inc.tmp2 > runtime.inc.tmp3 +mv runtime.inc.tmp3 runtime.inc.tmp2 + +# Make all the fields of type structs const. +sed -e '/struct .*type {/,/^}/ s/ \(.*;\)/const \1/' < runtime.inc.tmp2 > runtime.inc.tmp3 +mv -f runtime.inc.tmp3 ${OUT} + rm -f runtime.inc.tmp2 runtime.inc.tmp3 exit 0 Index: libgo/runtime/go-construct-map.c === --- libgo/runtime/go-construct-map.c(revision 274169) +++ libgo/runtime/go-construct-map.c(working copy) @@ -9,20 +9,17 @@ #include #include "runtime.h" -#include "go-type.h" -extern void *makemap (const struct __go_map_type *, intgo hint, - void *) +extern void *makemap (const struct maptype *, intgo hint, void *) __asm__ (GOSYM_PREFIX "runtime.makemap"); -extern void *mapassign (const struct __go_map_type *, void *hmap, - const void *key) +extern void *mapassign (const struct maptype *, void *hmap, const void *key) __asm__ (G
Re: [PATCH] or1k: Fix issue with set_got clobbering r9
On Fri, Aug 30, 2019 at 08:21:56AM -0700, Richard Henderson wrote: > LGTM. Thank you. > On 8/30/19 2:31 AM, Stafford Horne wrote: > > Hello, any comments on this? > > > > If nothing I will commit in a few days. > > > > On Thu, Aug 22, 2019 at 08:44:04PM +0900, Stafford Horne wrote: > >> When compiling glibc we found that the GOT register was being allocated > >> r9 when the instruction was still set_got_tmp. That caused set_got to > >> clobber r9. We cannot simply say set_got_tmp clobbers r9 as this is the > >> reason for having the temporary set_got_tmp. > >> > >> Fix by using a register class constraint that does not allow r9 during > >> register allocation. > >> > >> gcc/ChangeLog: > >> > >>* config/or1k/constraints.md (t): New constraint. > >>* config/or1k/or1k.h (GOT_REGS): New register class. > >>* config/or1k/or1k.md (set_got_tmp, set_got): Use t contraint. > >> --- > >> gcc/config/or1k/constraints.md | 4 > >> gcc/config/or1k/or1k.h | 3 +++ > >> gcc/config/or1k/or1k.md| 4 ++-- > >> 3 files changed, 9 insertions(+), 2 deletions(-) > >> > >> diff --git a/gcc/config/or1k/constraints.md > >> b/gcc/config/or1k/constraints.md > >> index 8cac7eb5329..ba330c6b8c2 100644 > >> --- a/gcc/config/or1k/constraints.md > >> +++ b/gcc/config/or1k/constraints.md > >> @@ -25,6 +25,7 @@ > >> ; We use: > >> ; c - sibcall registers > >> ; d - double pair base registers (excludes r0, r30 and r31 which > >> overflow) > >> +; t - got address registers (excludes r9 is clobbered by set_got) > > > > I will changee this to (... r9 which is clobbered ...) > > > >> ; I - constant signed 16-bit > >> ; K - constant unsigned 16-bit > >> ; M - constant signed 16-bit shifted left 16-bits (l.movhi) > >> @@ -36,6 +37,9 @@ > >> (define_register_constraint "d" "DOUBLE_REGS" > >>"Registers which can be used for double reg pairs.") > >> > >> +(define_register_constraint "t" "GOT_REGS" > >> + "Registers which can be used to store the Global Offset Table (GOT) > >> address.") > >> + > >> ;; Immediates > >> (define_constraint "I" > >>"A signed 16-bit immediate in the range -32768 to 32767." > >> diff --git a/gcc/config/or1k/or1k.h b/gcc/config/or1k/or1k.h > >> index 2b29e62fdd3..4c32607bac1 100644 > >> --- a/gcc/config/or1k/or1k.h > >> +++ b/gcc/config/or1k/or1k.h > >> @@ -190,6 +190,7 @@ enum reg_class > >>NO_REGS, > >>SIBCALL_REGS, > >>DOUBLE_REGS, > >> + GOT_REGS, > >>GENERAL_REGS, > >>FLAG_REGS, > >>ALL_REGS, > >> @@ -202,6 +203,7 @@ enum reg_class > >>"NO_REGS", \ > >>"SIBCALL_REGS", \ > >>"DOUBLE_REGS", \ > >> + "GOT_REGS", \ > >>"GENERAL_REGS", \ > >>"FLAG_REGS",\ > >>"ALL_REGS" } > >> @@ -215,6 +217,7 @@ enum reg_class > >> { { 0x, 0x }, \ > >>{ SIBCALL_REGS_MASK, 0 }, \ > >>{ 0x7f7e, 0x }, \ > >> + { 0xfdff, 0x }, \ > >>{ 0x, 0x0003 }, \ > >>{ 0x, 0x0004 }, \ > >>{ 0x, 0x0007 } \ > >> diff --git a/gcc/config/or1k/or1k.md b/gcc/config/or1k/or1k.md > >> index cee11d078cc..36bcee336ab 100644 > >> --- a/gcc/config/or1k/or1k.md > >> +++ b/gcc/config/or1k/or1k.md > >> @@ -706,7 +706,7 @@ > >> ;; set_got pattern below. This works because the set_got_tmp insn is the > >> ;; first insn in the stream and that it isn't moved during RA. > >> (define_insn "set_got_tmp" > >> - [(set (match_operand:SI 0 "register_operand" "=r") > >> + [(set (match_operand:SI 0 "register_operand" "=t") > >>(unspec_volatile:SI [(const_int 0)] UNSPECV_SET_GOT))] > >>"" > >> { > >> @@ -715,7 +715,7 @@ > >> > >> ;; The insn to initialize the GOT. > >> (define_insn "set_got" > >> - [(set (match_operand:SI 0 "register_operand" "=r") > >> + [(set (match_operand:SI 0 "register_operand" "=t") > >>(unspec:SI [(const_int 0)] UNSPEC_SET_GOT)) > >> (clobber (reg:SI LR_REGNUM))] > >>"" > >> -- > >> 2.21.0 > >> >
Re: [ PATCH ] C++20
On Fri, Aug 30, 2019 at 3:41 PM Jonathan Wakely wrote: > > On 30/08/19 15:22 -0400, JeanHeyd Meneide wrote: > >This patch implements as it currently exists in the C++20 Working > >Draft. > > Nice! > > > >Notes: > >- __gnu_cxx::__normal_iterator is not fully constexpr, so its not used here > > I'd prefer to make __normal_iterator constexpr, and use it. > It needs to be constexpr anyway for string and vector. > ... Alright, thank you for the feedback. I fixed it up! Tested x86_64-pc-linux-gnu. 2019-08-30 JeanHeyd "ThePhD" Meneide * include/std/span: Implement the entirety of span. * include/bits/stl_iterator.h: __gnu_cxx::__normal_iterator is now constexpr-qualified for C++11+. * include/bits/range_access.h: Add __adl_* versions of access functions. * testsuite/23_containers/span/everything.cc: constexpr and non-constexpr tests. * include/Makefile.in: Add span to install. * include/Makefile.am: Likewise diff --git a/.gitignore b/.gitignore index d9d3967a12c..fd116c362ac 100644 --- a/.gitignore +++ b/.gitignore @@ -57,3 +57,7 @@ REVISION /mpc* /gmp* /isl* + +# ignore sprinkled in dev files that keep popping up +.vscode/ +.vs/ diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am index 3fe80f32cc4..b8b786d9260 100644 --- a/libstdc++-v3/include/Makefile.am +++ b/libstdc++-v3/include/Makefile.am @@ -68,6 +68,7 @@ std_headers = \ ${std_srcdir}/scoped_allocator \ ${std_srcdir}/set \ ${std_srcdir}/shared_mutex \ + ${std_srcdir}/span \ ${std_srcdir}/sstream \ ${std_srcdir}/stack \ ${std_srcdir}/stdexcept \ diff --git a/libstdc++-v3/include/Makefile.in b/libstdc++-v3/include/Makefile.in index b675d356cd4..cd1e9df5482 100644 --- a/libstdc++-v3/include/Makefile.in +++ b/libstdc++-v3/include/Makefile.in @@ -412,6 +412,7 @@ std_headers = \ ${std_srcdir}/scoped_allocator \ ${std_srcdir}/set \ ${std_srcdir}/shared_mutex \ + ${std_srcdir}/span \ ${std_srcdir}/sstream \ ${std_srcdir}/stack \ ${std_srcdir}/stdexcept \ diff --git a/libstdc++-v3/include/bits/range_access.h b/libstdc++-v3/include/bits/range_access.h index d1e74711433..3acaebadcf1 100644 --- a/libstdc++-v3/include/bits/range_access.h +++ b/libstdc++-v3/include/bits/range_access.h @@ -318,6 +318,72 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION #endif // C++17 +#if __cplusplus > 201703L + // "why are these in namespace std:: and not __gnu_cxx:: ?" + // because if we don't put them here it's impossible to + // have implicit ADL with "using std::begin/end/size/data;". + template +constexpr auto +__adl_begin(_Container& __cont) noexcept(noexcept(begin(__cont))) +{ return begin(__cont); } + + template +constexpr auto +__adl_end(_Container& __cont) noexcept(noexcept(end(__cont))) +{ return end(__cont); } + + template +constexpr auto +__adl_cbegin(_Container& __cont) noexcept(noexcept(cbegin(__cont))) +{ return cbegin(__cont); } + + template +constexpr auto +__adl_cend(_Container& __cont) noexcept(noexcept(cend(__cont))) +{ return cend(__cont); } + + template +constexpr auto +__adl_rbegin(_Container& __cont) noexcept(noexcept(rbegin(__cont))) +{ return rbegin(__cont); } + + template +constexpr auto +__adl_rend(_Container& __cont) noexcept(noexcept(rend(__cont))) +{ return rend(__cont); } + + template +constexpr auto +__adl_crbegin(_Container& __cont) noexcept(noexcept(crbegin(__cont))) +{ return crbegin(__cont); } + + template +constexpr auto +__adl_crend(_Container& __cont) noexcept(noexcept(crend(__cont))) +{ return crend(__cont); } + + template +constexpr auto +__adl_data(_Container& __cont) noexcept(noexcept(data(__cont))) +{ return data(__cont); } + + template +constexpr auto +__adl_cdata(_Container& __cont) noexcept(noexcept(cdata(__cont))) +{ return cdata(__cont); } + + template +constexpr auto +__adl_size(_Container& __cont) noexcept(noexcept(size(__cont))) +{ return size(__cont); } + + template +constexpr auto +__adl_empty(_Container& __cont) noexcept(noexcept(empty(__cont))) +{ return empty(__cont); } + +#endif // C++20 + _GLIBCXX_END_NAMESPACE_VERSION } // namespace diff --git a/libstdc++-v3/include/bits/stl_iterator.h b/libstdc++-v3/include/bits/stl_iterator.h index 8ab0d72b0c2..4d432da0c6b 100644 --- a/libstdc++-v3/include/bits/stl_iterator.h +++ b/libstdc++-v3/include/bits/stl_iterator.h @@ -803,51 +803,51 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION : _M_current(_Iterator()) { } explicit - __normal_iterator(const _Iterator& __i) _GLIBCXX_NOEXCEPT + _GLIBCXX_CONSTEXPR __normal_iterator(const _Iterator& __i) _GLIBCXX_NOEXCEPT : _M_current(__i) { } // Allow iterator to const_iterator conversion template -__normal_iterator(const
Re: [PATCH] Optimize to_chars
On 30/08/19 22:46 +0300, Antony Polukhin wrote: пт, 30 авг. 2019 г. в 19:01, Jonathan Wakely : <...> Have you tried comparing the improved code to libc++'s implementation? I believe they use precomputed arrays of digits, but they use larger arrays that allow 4 bytes to be written at once, which is considerably faster (and those precomputed arrays libe in libc++.so not in the header). Would we be better off keeping the precomputed arrays and expanding them to do 4-byte writes? This would not do good for bases 2, 8 and 16. For power of two bases there is no costly `mod` or `div` instructions, only bit operations. By increasing the digits table size the cache misses become more likely. OK, thanks. I'll try benchmarking your improved code against the libc++ version next week.
Re: [PATCH] Optimize to_chars
пт, 30 авг. 2019 г. в 19:01, Jonathan Wakely : <...> > Have you tried comparing the improved code to libc++'s implementation? > I believe they use precomputed arrays of digits, but they use larger > arrays that allow 4 bytes to be written at once, which is considerably > faster (and those precomputed arrays libe in libc++.so not in the > header). Would we be better off keeping the precomputed arrays and > expanding them to do 4-byte writes? This would not do good for bases 2, 8 and 16. For power of two bases there is no costly `mod` or `div` instructions, only bit operations. By increasing the digits table size the cache misses become more likely. For base 10... A few decades ago it was definitely a good idea to have a big array of digits. Nowadays compilers know how to replace `__val / 10` and `__val % 10` with much cheaper multiplications and bit magic. This is not as cheap as bit operations for power of two bases, but some cache misses could nullify the advantage. I need to experiment with base 10. There are ways to compress the digits table, but it requires benchmarking. Because of that this patch does not touch the __detail::__to_chars_10. > > Since we don't have a patch to do that, I think I'll commit yours. We > can always go back to precomputed arrays later if somebody does that > work. > > My only comments are on the changelog: > > >Changelog: > >* include/std/charconv (__detail::__to_chars_8, > >__detail::__to_chars_16): Replace array of precomputed digits > > When the list of changed functions is split across lines it should be > like this: > > * include/std/charconv (__detail::__to_chars_8) > (__detail::__to_chars_16): Replace array of precomputed digits > > i.e close the parentheses before the line break, and reopen on the > next line. > > >with arithmetic operations to avoid CPU cache misses. Remove > >zero termination from array of digits to allow symbol merge with > >generic implementation of __detail::__to_chars. Replace final > >offsets with constants. Use __detail::__to_chars_len_2 instead > >of a generic __detail::__to_chars_len. > >* include/std/charconv (__detail::__to_chars): Remove > > Don't repeat the asterisk and filename for later changes in the same > file, i.e. > > (__detail::__to_chars): Remove zero termination from array of digits. > (__detail::__to_chars_2): Leading digit is always '1'. > > There's no changelog entry for the changes to __to_chars_len_8 and > __to_chars_len_16. > > Also, please don't include the ChangeLog diff in the patch, because it > just makes it hard to apply the patch (the ChangeLog part will almost > always fail to apply because somebody else will have modified the > ChangeLog file since you created the patch, and so that hunk won't > apply. The ChangeLog text should be sent as a separate (plain text) > attachment, or just in the email body (as you did above). Thanks! I'll take it into account next time. -- Best regards, Antony Polukhin
Re: [ PATCH ] C++20
On 30/08/19 15:22 -0400, JeanHeyd Meneide wrote: This patch implements as it currently exists in the C++20 Working Draft. Nice! Notes: - __gnu_cxx::__normal_iterator is not fully constexpr, so its not used here I'd prefer to make __normal_iterator constexpr, and use it. It needs to be constexpr anyway for string and vector. - P1394 might be slated to end up in C++20 per National Body Comments. Therefore, an early implementation is left in an implementation-defined _GLIBCXX_P1394 define. 2019-08-30 JeanHeyd "ThePhD" Meneide * include/std/span: Implement the entirety of span. * include/bits/range_access.h: Add __adl_* versions of access functions. * testsuite/23_containers/span/everything.cc: constexpr and non-constexpr tests. * include/Makefile.in: Add span to install. * include/Makefile.am: Likewise +++ b/libstdc++-v3/include/std/span @@ -0,0 +1,549 @@ +// Components for manipulating non-owning sequences of objects -*- C++ -*- + +// Copyright (C) 2019-2019 Free Software Foundation, Inc. Just 2019 please, not 2019-2019. +// WARNING: they forgot this feature test macro +// get on someone's back about it in Belfast!!! Please use FIXME: instead of WARNING: (for consistency with the rest of the sources, so people grepping for FIXME: can find this). The new feature test macro should be in too. There's no need to qualify ::std::true_type and ::std::size_t when within namespace std already. There's no ADL for type names, and normal unqualified lookup will find the right ones (and is easier to read). + static_assert( +_Count == ::std::dynamic_extent || _Extent == ::std::dynamic_extent || _Count <= _Extent, +"bad span length"); There are a number of lines that are too long, they need to be broken before 80 columns. Our static_assert messages should be stated as the positive condition that is being asserted. So the diagnostic reads like "assertion failed: thing being asserted" So "bad span length" makes it look like we asserted the length is bad, but actually it was good. I prefer to write something saying "X must be true", e.g. "count must be equal to dynamic_extent, or less than the span's extent". Do we need to check _Extent == ::std::dynamic_extent here, give nthat if it's true then _Count <= _Extent will be true as well? + std::vector value{ 0 }; Your new testcase uses std::vector without including .
Re: [PATCH] Optimize to_chars
On 30/08/19 11:03 -0600, Martin Sebor wrote: On 8/30/19 8:27 AM, Antony Polukhin wrote: Bunch of micro optimizations for std::to_chars: * For base == 8 replacing the lookup in __digits table with arithmetic computations leads to a same CPU cycles for a loop (exchanges two movzx with 3 bit ops https://godbolt.org/z/RTui7m ). However this saves 129 bytes of data and totally avoids a chance of cache misses on __digits. * For base == 16 replacing the lookup in __digits table with arithmetic computations leads to a few additional instructions, but totally avoids a chance of cache misses on __digits (- ~9 cache misses for worst case) and saves 513 bytes of const data. * Replacing __first[pos] and __first[pos - 1] with __first[1] and __first[0] on final iterations saves ~2% of code size. * Removing trailing '\0' from arrays of digits allows the linker to merge the symbols (so that "0123456789abcdefghijklmnopqrstuvwxyz" and "0123456789abcdef" could share the same address). This improves data locality and reduces binary sizes. * Using __detail::__to_chars_len_2 instead of a generic __detail::__to_chars_len makes the operation O(1) instead of O(N). It also makes the code two times shorter ( https://godbolt.org/z/Peq_PG) . In sum: this significantly reduces the size of a binary (for about 4KBs only for base-8 conversion https://godbolt.org/z/WPKijS ), deals with latency (CPU cache misses) without changing the iterations count and without adding costly instructions into the loops. Would it make sense to move some of this code into GCC as a built-in so that it could also be used by GCC to expand some strtol and sprintf calls? Makes sense, although we'd still need it in libstdc++ until Clang and EDG implement the same built-in.
[ PATCH ] C++20
This patch implements as it currently exists in the C++20 Working Draft. Notes: - __gnu_cxx::__normal_iterator is not fully constexpr, so its not used here - P1394 might be slated to end up in C++20 per National Body Comments. Therefore, an early implementation is left in an implementation-defined _GLIBCXX_P1394 define. 2019-08-30 JeanHeyd "ThePhD" Meneide * include/std/span: Implement the entirety of span. * include/bits/range_access.h: Add __adl_* versions of access functions. * testsuite/23_containers/span/everything.cc: constexpr and non-constexpr tests. * include/Makefile.in: Add span to install. * include/Makefile.am: Likewise ThePhD.span.patch Description: Binary data
Re: [PATCH, V3, #5 of 10], Make -mpcrel default on little endian Linux systems
On Mon, Aug 26, 2019 at 05:07:25PM -0400, Michael Meissner wrote: > +/* By default enable support for pc-relative and numeric prefixed addressing > on > + the 'future' system, unless it is overriden at build time. */ > +#ifndef TARGET_PREFIXED_ADDR_DEFAULT > +#define TARGET_PREFIXED_ADDR_DEFAULT 1 > +#endif > + > +#if !defined (TARGET_PCREL_DEFAULT) && TARGET_PREFIXED_ADDR_DEFAULT > +#define TARGET_PCREL_DEFAULT 1 > +#endif Spelling ("overridden"). How can it be overridden at build time? How can it be defined already, when linux64.h is included? Don't put in guards against things that cannot happen. > + if (TARGET_FUTURE) > +{ > + bool explicit_prefixed = ((rs6000_isa_flags_explicit > + & OPTION_MASK_PREFIXED_ADDR) != 0); > + bool explicit_pcrel = ((rs6000_isa_flags_explicit > + & OPTION_MASK_PCREL) != 0); > + > + /* Prefixed addressing requires 64-bit registers. */ Does it? Don't disable things just because you do not want to think about if and how to support them. Be much more exact in the comment here if you do have a reason to disable it here. > + if (!TARGET_POWERPC64) > + { > + if (TARGET_PCREL && explicit_pcrel) > + error ("%qs requires %qs", "-mpcrel", "-m64"); TARGET_POWERPC64 is -mpowerpc64. -m64 is TARGET_64BIT. > + /* Enable defaults if desired. */ > + else > + { > + if (!explicit_prefixed > + && (TARGET_PREFIXED_ADDR_DEFAULT > + || TARGET_PCREL > + || TARGET_PCREL_DEFAULT)) > + rs6000_isa_flags |= OPTION_MASK_PREFIXED_ADDR; > + > + if (!explicit_pcrel && TARGET_PCREL_DEFAULT > + && TARGET_CMODEL == CMODEL_MEDIUM) > + rs6000_isa_flags |= OPTION_MASK_PCREL; > + } Should these be the other way around? Segher
Re: [PATCH] Optimize to_chars
On 8/30/19 8:27 AM, Antony Polukhin wrote: Bunch of micro optimizations for std::to_chars: * For base == 8 replacing the lookup in __digits table with arithmetic computations leads to a same CPU cycles for a loop (exchanges two movzx with 3 bit ops https://godbolt.org/z/RTui7m ). However this saves 129 bytes of data and totally avoids a chance of cache misses on __digits. * For base == 16 replacing the lookup in __digits table with arithmetic computations leads to a few additional instructions, but totally avoids a chance of cache misses on __digits (- ~9 cache misses for worst case) and saves 513 bytes of const data. * Replacing __first[pos] and __first[pos - 1] with __first[1] and __first[0] on final iterations saves ~2% of code size. * Removing trailing '\0' from arrays of digits allows the linker to merge the symbols (so that "0123456789abcdefghijklmnopqrstuvwxyz" and "0123456789abcdef" could share the same address). This improves data locality and reduces binary sizes. * Using __detail::__to_chars_len_2 instead of a generic __detail::__to_chars_len makes the operation O(1) instead of O(N). It also makes the code two times shorter ( https://godbolt.org/z/Peq_PG) . In sum: this significantly reduces the size of a binary (for about 4KBs only for base-8 conversion https://godbolt.org/z/WPKijS ), deals with latency (CPU cache misses) without changing the iterations count and without adding costly instructions into the loops. Would it make sense to move some of this code into GCC as a built-in so that it could also be used by GCC to expand some strtol and sprintf calls? Martin Changelog: * include/std/charconv (__detail::__to_chars_8, __detail::__to_chars_16): Replace array of precomputed digits with arithmetic operations to avoid CPU cache misses. Remove zero termination from array of digits to allow symbol merge with generic implementation of __detail::__to_chars. Replace final offsets with constants. Use __detail::__to_chars_len_2 instead of a generic __detail::__to_chars_len. * include/std/charconv (__detail::__to_chars): Remove zero termination from array of digits. * include/std/charconv (__detail::__to_chars_2): Leading digit is always '1'.
Re: [PATCH 2/2] rs6000: Use unspec_volatile for darn (PR91481)
On Thu, Aug 22, 2019 at 07:20:48PM +, Segher Boessenkool wrote: > Every call to darn should deliver a *new* random number; such calls > shouldnot be CSEd together. So they should be unspec_volatile, not > plain unspec. > > Tested on powerpc64-linux {-m32,-m64}. (New testcase coming soon). > Committing to trunk, and will backport to 9, 8, 7 as well. Done now, all three patches (including the testcase). Segher
Re: [PATCH, V3, #4 of 10], Add general prefixed/pcrel support
Hi! (Please split off paddi to a separate patch?) On Mon, Aug 26, 2019 at 04:43:37PM -0400, Michael Meissner wrote: > (prefixed_paddi_p): Fix thinkos in last patch. Do that separately please. Don't hide this in another patch like this. Hrm, this is not in this patch at all? Fix the changelog, then :-) > --- gcc/config/rs6000/predicates.md (revision 274870) > +++ gcc/config/rs6000/predicates.md (working copy) > @@ -839,7 +839,8 @@ (define_special_predicate "indexed_addre > (define_predicate "add_operand" >(if_then_else (match_code "const_int") > (match_test "satisfies_constraint_I (op) > - || satisfies_constraint_L (op)") > + || satisfies_constraint_L (op) > + || satisfies_constraint_eI (op)") > (match_operand 0 "gpc_reg_operand"))) > > ;; Return 1 if the operand is either a non-special register, or 0, or -1. > @@ -852,7 +853,8 @@ (define_predicate "adde_operand" > (define_predicate "non_add_cint_operand" >(and (match_code "const_int") > (match_test "!satisfies_constraint_I (op) > - && !satisfies_constraint_L (op)"))) > + && !satisfies_constraint_L (op) > + && !satisfies_constraint_eI (op)"))) (define_predicate "non_add_cint_operand" (and (match_code "const_int") (not (match_operand 0 "add_operand" ? You can do that *now*, and it is pre-approved. (This could use a better name btw., I always have to look up what it means; a longer name is fine as well of course, it is used only once or so). > @@ -933,6 +935,13 @@ (define_predicate "lwa_operand" > return false; > >addr = XEXP (inner, 0); > + > + /* The LWA instruction uses the DS-form format where the bottom two bits of > + the offset must be 0. The prefixed PLWA does not have this > + restriction. */ > + if (prefixed_local_addr_p (addr, mode, TRAD_INSN_DS)) > +return true; Why does the decision whether something is a valid prefixed lwa_operand need to know the non-prefixed lwa is a DS-form instruction? And "local" is a head-scratcher for this condition, too. > +;; Return 1 if op is a memory operand that is not prefixed. > +(define_predicate "non_prefixed_mem_operand" > + (match_code "mem") > +{ > + if (!memory_operand (op, mode)) > +return false; > + > + return !prefixed_local_addr_p (XEXP (op, 0), GET_MODE (op), > + TRAD_INSN_DEFAULT); > +}) Use match_operand for the first condition please (and then match_test for the second?) This does make it seem like we need a prefixed_local_mem_p as well? So that we need neither that XEXP nor that GET_MODE. > @@ -5735,6 +5735,10 @@ num_insns_constant_gpr (HOST_WIDE_INT va > && (value >> 31 == -1 || value >> 31 == 0)) > return 1; > > + /* PADDI can support up to 34 bit signed integers. */ > + else if (TARGET_PREFIXED_ADDR && SIGNED_34BIT_OFFSET_P (value)) > +return 1; Write this earlier, together with the 16BIT one? > @@ -6905,6 +6909,7 @@ rs6000_adjust_vec_address (rtx scalar_re >rtx element_offset; >rtx new_addr; >bool valid_addr_p; > + bool pcrel_p = TARGET_PCREL && pcrel_local_address (addr, Pmode); This is used 159 lines later. Please refactor things. That would make a separate patch *before* this one. > + /* Optimize pc-relative addresses. */ > + else if (pcrel_p) > +{ > + if (CONST_INT_P (element_offset)) > + { > + rtx addr2 = addr; This var needs a better name and/or comments. Or maybe just factoring. > @@ -7007,9 +7050,8 @@ rs6000_adjust_vec_address (rtx scalar_re > >/* If we have a PLUS, we need to see whether the particular register class > allows for D-FORM or X-FORM addressing. */ > - if (GET_CODE (new_addr) == PLUS) > + if (GET_CODE (new_addr) == PLUS || pcrel_p) That second condition needs a comment. > @@ -7609,7 +7675,7 @@ mem_operand_ds_form (rtx op, machine_mod > causes a wrap, so test only the low 16 bits. */ > offset = ((offset & 0x) ^ 0x8000) - 0x8000; > > - return offset + 0x8000 < 0x1u - extra; > + return SIGNED_16BIT_OFFSET_EXTRA_P (offset, extra); Please do all these things first too, as a separate patch. > - offset += 0x8000; > - return offset < 0x1 - extra; > + if (TARGET_PREFIXED_ADDR) > +return SIGNED_34BIT_OFFSET_EXTRA_P (offset, extra); > + else > +return SIGNED_16BIT_OFFSET_EXTRA_P (offset, extra); So this you could just do the 16BIT first, and then *this* patch will add the 34BIT thing, in an easy-to-read patch. > +{ > + /* There is no prefixed version of the load/store with update. */ > + return !prefixed_local_addr_p (XEXP (x, 1), mode, TRAD_INSN_DEFAULT); > +} If you pass the actual MEM, the prefixed_local_mem_p function can return false, itself. > - unsigned HOST_WIDE_INT val = INTVAL (XEXP (addr, 1)); > - return val + 0x8000 >= 0x1 - (TARGET_POWERPC64 ? 8 : 12); > + HOST_WIDE_INT val = INTVAL (
Re: [PATCH] Fix thinko in early bail out in tree-switch-conversion.
On 8/30/19 2:41 AM, Martin Liška wrote: > Hi. > > Thanks to Jakub, the patch addresses one memory leak in > bit_test_cluster::find_bit_tests (allocation of output). > And moreover, it implements proper guard when clustering > is not successful. In such situation we want a quick bail out. > > Patch can bootstrap on x86_64-linux-gnu and survives regression tests. > > Ready to be installed? > Thanks, > Martin > > gcc/ChangeLog: > > 2019-08-29 Martin Liska > > * tree-switch-conversion.c (jump_table_cluster::find_jump_tables): > Bail out when we'll end up with the same number of clusters as > at the beginning. > (bit_test_cluster::find_bit_tests): Likewise for bit tests. > (jump_table_cluster::can_be_handled): Remove the guard > as it's already handled in ::is_enabled. Allocate output > after early bail out. > --- > gcc/tree-switch-conversion.c | 18 -- > 1 file changed, 8 insertions(+), 10 deletions(-) > > OK jeff
Re: [PATCH] PR libstdc++/89164 enforce constraints for uninitialized algos
On 30/08/19 14:54 +0100, Jonathan Wakely wrote: The memmove optimizations for std::uninitialized_copy/fill/_n will compile even if the type is not copy constructible, because std::copy doesn't require copy construction to work. But the uninitialized algorithms do require it. This adds explicit static assertions to ensure we don't allow ill-formed initializations. PR libstdc++/89164 * include/bits/stl_algobase.h (__copy_move): Give descriptive names to template parameters. * include/bits/stl_uninitialized.h (uninitialized_copy) (uninitialized_fill, uninitialized_fill_n): Add static assertions to diagnose invalid uses. * testsuite/20_util/specialized_algorithms/uninitialized_copy/1.cc: Adjust expected error. * testsuite/20_util/specialized_algorithms/uninitialized_copy/89164.cc: New test. * testsuite/20_util/specialized_algorithms/uninitialized_copy_n/ 89164.cc: New test. * testsuite/20_util/specialized_algorithms/uninitialized_fill/89164.cc: New test. * testsuite/20_util/specialized_algorithms/uninitialized_fill_n/ 89164.cc: New test. * testsuite/23_containers/vector/cons/89164.cc: New test. * testsuite/23_containers/vector/cons/89164_c++17.cc: New test. This is the patch sent in https://gcc.gnu.org/ml/libstdc++/2019-02/msg00034.html but with the dg-error directives in the tests fixed. Tested x86_64-linux, committed to trunk. Oops, looks like what I tested didn't have the new test checked in. This fixes it. commit 01e0d7b3957c4bd8cdac86e0e4f16135dc61d51c Author: Jonathan Wakely Date: Fri Aug 30 17:18:17 2019 +0100 Fix errors in new test * testsuite/23_containers/vector/cons/89164_c++17.cc: Fix errors. diff --git a/libstdc++-v3/testsuite/23_containers/vector/cons/89164_c++17.cc b/libstdc++-v3/testsuite/23_containers/vector/cons/89164_c++17.cc index 8b3afba4867..db7d8d5c850 100644 --- a/libstdc++-v3/testsuite/23_containers/vector/cons/89164_c++17.cc +++ b/libstdc++-v3/testsuite/23_containers/vector/cons/89164_c++17.cc @@ -22,6 +22,12 @@ // PR libstdc++/89164 +struct X +{ + X() = default; + X(const X&) = delete; +}; + void test01() { X x[1];
Re: [PATCH] Optimize to_chars
On 30/08/19 17:01 +0100, Jonathan Wakely wrote: On 30/08/19 17:27 +0300, Antony Polukhin wrote: Bunch of micro optimizations for std::to_chars: * For base == 8 replacing the lookup in __digits table with arithmetic computations leads to a same CPU cycles for a loop (exchanges two movzx with 3 bit ops https://godbolt.org/z/RTui7m ). However this saves 129 bytes of data and totally avoids a chance of cache misses on __digits. * For base == 16 replacing the lookup in __digits table with arithmetic computations leads to a few additional instructions, but totally avoids a chance of cache misses on __digits (- ~9 cache misses for worst case) and saves 513 bytes of const data. * Replacing __first[pos] and __first[pos - 1] with __first[1] and __first[0] on final iterations saves ~2% of code size. * Removing trailing '\0' from arrays of digits allows the linker to merge the symbols (so that "0123456789abcdefghijklmnopqrstuvwxyz" and "0123456789abcdef" could share the same address). This improves data locality and reduces binary sizes. * Using __detail::__to_chars_len_2 instead of a generic __detail::__to_chars_len makes the operation O(1) instead of O(N). It also makes the code two times shorter ( https://godbolt.org/z/Peq_PG) . In sum: this significantly reduces the size of a binary (for about 4KBs only for base-8 conversion https://godbolt.org/z/WPKijS ), deals with latency (CPU cache misses) without changing the iterations count and without adding costly instructions into the loops. This is great, thanks. Have you tried comparing the improved code to libc++'s implementation? I believe they use precomputed arrays of digits, but they use larger arrays that allow 4 bytes to be written at once, which is considerably faster (and those precomputed arrays libe in libc++.so not in the header). Would we be better off keeping the precomputed arrays and expanding them to do 4-byte writes? Since we don't have a patch to do that, I think I'll commit yours. We can always go back to precomputed arrays later if somebody does that work. My only comments are on the changelog: Changelog: * include/std/charconv (__detail::__to_chars_8, __detail::__to_chars_16): Replace array of precomputed digits When the list of changed functions is split across lines it should be like this: * include/std/charconv (__detail::__to_chars_8) (__detail::__to_chars_16): Replace array of precomputed digits i.e close the parentheses before the line break, and reopen on the next line. with arithmetic operations to avoid CPU cache misses. Remove zero termination from array of digits to allow symbol merge with generic implementation of __detail::__to_chars. Replace final offsets with constants. Use __detail::__to_chars_len_2 instead of a generic __detail::__to_chars_len. * include/std/charconv (__detail::__to_chars): Remove Don't repeat the asterisk and filename for later changes in the same file, i.e. (__detail::__to_chars): Remove zero termination from array of digits. (__detail::__to_chars_2): Leading digit is always '1'. There's no changelog entry for the changes to __to_chars_len_8 and __to_chars_len_16. Oh, there's no separate __to_chars_len_16 function anyway, and you did mention it as "Use __detail::__to_chars_len_2 instead ..." - sorry! I think we might as well inline __to_chars_len_8 into __to_chars_8, there's not much benefit to having a separate function. I'll do that as a follow up patch.
Re: [PATCH] Optimize to_chars
On 30/08/19 17:27 +0300, Antony Polukhin wrote: Bunch of micro optimizations for std::to_chars: * For base == 8 replacing the lookup in __digits table with arithmetic computations leads to a same CPU cycles for a loop (exchanges two movzx with 3 bit ops https://godbolt.org/z/RTui7m ). However this saves 129 bytes of data and totally avoids a chance of cache misses on __digits. * For base == 16 replacing the lookup in __digits table with arithmetic computations leads to a few additional instructions, but totally avoids a chance of cache misses on __digits (- ~9 cache misses for worst case) and saves 513 bytes of const data. * Replacing __first[pos] and __first[pos - 1] with __first[1] and __first[0] on final iterations saves ~2% of code size. * Removing trailing '\0' from arrays of digits allows the linker to merge the symbols (so that "0123456789abcdefghijklmnopqrstuvwxyz" and "0123456789abcdef" could share the same address). This improves data locality and reduces binary sizes. * Using __detail::__to_chars_len_2 instead of a generic __detail::__to_chars_len makes the operation O(1) instead of O(N). It also makes the code two times shorter ( https://godbolt.org/z/Peq_PG) . In sum: this significantly reduces the size of a binary (for about 4KBs only for base-8 conversion https://godbolt.org/z/WPKijS ), deals with latency (CPU cache misses) without changing the iterations count and without adding costly instructions into the loops. This is great, thanks. Have you tried comparing the improved code to libc++'s implementation? I believe they use precomputed arrays of digits, but they use larger arrays that allow 4 bytes to be written at once, which is considerably faster (and those precomputed arrays libe in libc++.so not in the header). Would we be better off keeping the precomputed arrays and expanding them to do 4-byte writes? Since we don't have a patch to do that, I think I'll commit yours. We can always go back to precomputed arrays later if somebody does that work. My only comments are on the changelog: Changelog: * include/std/charconv (__detail::__to_chars_8, __detail::__to_chars_16): Replace array of precomputed digits When the list of changed functions is split across lines it should be like this: * include/std/charconv (__detail::__to_chars_8) (__detail::__to_chars_16): Replace array of precomputed digits i.e close the parentheses before the line break, and reopen on the next line. with arithmetic operations to avoid CPU cache misses. Remove zero termination from array of digits to allow symbol merge with generic implementation of __detail::__to_chars. Replace final offsets with constants. Use __detail::__to_chars_len_2 instead of a generic __detail::__to_chars_len. * include/std/charconv (__detail::__to_chars): Remove Don't repeat the asterisk and filename for later changes in the same file, i.e. (__detail::__to_chars): Remove zero termination from array of digits. (__detail::__to_chars_2): Leading digit is always '1'. There's no changelog entry for the changes to __to_chars_len_8 and __to_chars_len_16. Also, please don't include the ChangeLog diff in the patch, because it just makes it hard to apply the patch (the ChangeLog part will almost always fail to apply because somebody else will have modified the ChangeLog file since you created the patch, and so that hunk won't apply. The ChangeLog text should be sent as a separate (plain text) attachment, or just in the email body (as you did above). I'll test this and commit it, thanks!
Re: [committed] Suppress warnings for mips/r10k-cache-barrier-9.c
Jeff Law writes: > The improvements in our ability to issue diagnostics for out of bounds > accesses to trailing arrays is causing r10k-cache-barrier-9.c to fail on > MIPS targets. > > This is a bit of an odd test. It purposefully does out of bounds > accesses. Some accesses cross sub-object boundaries, others leave the > object entirely. Apparently the MIPS backend is supposed to detect this > stuff and issue some kind of cache barrier instruction. > > Additionally mips.exp overrides dg-options and makes it harder to pass > additional options, like -Wno-. You can use "-w", but not > "-Wno-<...>". We can't really use dg-warning markers because we don't > get warnings at all optimization levels. > > One might argue the test is bogus, but I'm going to assume it serves > some useful purpose. I'm adding "-w" to suppress warnings. Huh, yeah. Certainly can't remember now why this seemed worth testing, but I agree this patch is the best fix. Richard > Installing on the trunk, > > JEff > > diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog > index 0c5ec5325f5..5044002c09c 100644 > --- a/gcc/testsuite/ChangeLog > +++ b/gcc/testsuite/ChangeLog > @@ -1,3 +1,7 @@ > +2019-08-30 Jeff Law > + > + * gcc.target/mips/r10k-cache-barrier-9.c: Suppress warnings. > + > 2019-08-30 Martin Jambor > > tree-optimization/91579 > diff --git a/gcc/testsuite/gcc.target/mips/r10k-cache-barrier-9.c > b/gcc/testsuite/gcc.target/mips/r10k-cache-barrier-9.c > index 2f83968aad6..2516b663ca1 100644 > --- a/gcc/testsuite/gcc.target/mips/r10k-cache-barrier-9.c > +++ b/gcc/testsuite/gcc.target/mips/r10k-cache-barrier-9.c > @@ -1,4 +1,4 @@ > -/* { dg-options "-mr10k-cache-barrier=store -G8" } */ > +/* { dg-options "-mr10k-cache-barrier=store -G8 -w" } */ > > /* Test that out-of-range stores to components of static objects > are protected by a cache barrier. */
[Ada] Saturate the sizes for the target in -gnatR output
This changes the saturation from the host to the target for the large sizes displayed in the the -gnatR output. Tested on x86_64-suse-linux, applied on the mainline. 2019-08-30 Eric Botcazou * gcc-interface/decl.c (maybe_saturate_size): New function. (gnat_to_gnu_entity): Invoke it on the Esize of types before sending it for back-annotations. * gcc-interface/trans.c: Fix typo. -- Eric BotcazouIndex: gcc-interface/decl.c === --- gcc-interface/decl.c (revision 275198) +++ gcc-interface/decl.c (working copy) @@ -232,6 +232,7 @@ static tree build_position_list (tree, bool, tree, static vec build_subst_list (Entity_Id, Entity_Id, bool); static vec build_variant_list (tree, vec, vec); +static tree maybe_saturate_size (tree); static tree validate_size (Uint, tree, Entity_Id, enum tree_code, bool, bool); static void set_rm_size (Uint, tree, Entity_Id); static unsigned int validate_alignment (Uint, Entity_Id, unsigned int); @@ -4327,9 +4328,10 @@ gnat_to_gnu_entity (Entity_Id gnat_entity, tree gn { tree gnu_size = TYPE_SIZE (gnu_type); - /* If the size is self-referential, annotate the maximum value. */ + /* If the size is self-referential, annotate the maximum value + after saturating it, if need be, to avoid a No_Uint value. */ if (CONTAINS_PLACEHOLDER_P (gnu_size)) - gnu_size = max_size (gnu_size, true); + gnu_size = maybe_saturate_size (max_size (gnu_size, true)); /* If we are just annotating types and the type is tagged, the tag and the parent components are not generated by the front-end so @@ -4365,7 +4367,7 @@ gnat_to_gnu_entity (Entity_Id gnat_entity, tree gn gnu_size = size_binop (PLUS_EXPR, gnu_size, offset); } - gnu_size = round_up (gnu_size, align); + gnu_size = maybe_saturate_size (round_up (gnu_size, align)); Set_Esize (gnat_entity, annotate_value (gnu_size)); /* Tagged types are Strict_Alignment so RM_Size = Esize. */ @@ -8723,6 +8725,19 @@ build_variant_list (tree qual_union_type, vec::create_ggc (512); } -/* Destroy data structures of the decl.c module. */ +/* Destroy the data structures of the decl.c module. */ void destroy_gnat_decl (void) Index: gcc-interface/trans.c === --- gcc-interface/trans.c (revision 275198) +++ gcc-interface/trans.c (working copy) @@ -8790,7 +8790,7 @@ gnat_to_gnu (Node_Id gnat_node) 5. If this is a reference to an unconstrained array which is used as the prefix of an attribute reference that requires an lvalue, return the - result unmodified because we want return the original bounds. + result unmodified because we want to return the original bounds. 6. Finally, if the type of the result is already correct. */
Re: [PATCH v2 0/9] S/390: Use signaling FP comparison instructions
> Am 30.08.2019 um 16:40 schrieb Ilya Leoshkevich : > >> Am 30.08.2019 um 09:12 schrieb Richard Biener : >> >> On Thu, Aug 29, 2019 at 5:39 PM Ilya Leoshkevich wrote: >>> Am 22.08.2019 um 15:45 schrieb Ilya Leoshkevich : Bootstrap and regtest running on x86_64-redhat-linux and s390x-redhat-linux. This patch series adds signaling FP comparison support (both scalar and vector) to s390 backend. >>> >>> I'm running into a problem on ppc64 with this patch, and it would be >>> great if someone could help me figure out the best way to resolve it. >>> >>> vector36.C test is failing because gimplifier produces the following >>> >>> _5 = _4 > { 2.0e+0, 2.0e+0, 2.0e+0, 2.0e+0 }; >>> _6 = VEC_COND_EXPR <_5, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }>; >>> >>> from >>> >>> VEC_COND_EXPR < (*b > { 2.0e+0, 2.0e+0, 2.0e+0, 2.0e+0 }) , >>> { -1, -1, -1, -1 } , >>> { 0, 0, 0, 0 } > >>> >>> Since the comparison tree code is now hidden behind a temporary, my code >>> does not have anything to pass to the backend. The reason for creating >>> a temporary is that the comparison can trap, and so the following check >>> in gimplify_expr fails: >>> >>> if (gimple_seq_empty_p (internal_post) && (*gimple_test_f) (*expr_p)) >>> goto out; >>> >>> gimple_test_f is is_gimple_condexpr, and it eventually calls >>> operation_could_trap_p (GT). >>> >>> My current solution is to simply state that backend does not support >>> SSA_NAME in vector comparisons, however, I don't like it, since it may >>> cause performance regressions due to having to fall back to scalar >>> comparisons. >>> >>> I was thinking about two other possible solutions: >>> >>> 1. Change the gimplifier to allow trapping vector comparisons. That's >>> a bit complicated, because tree_could_throw_p checks not only for >>> floating point traps, but also e.g. for array index out of bounds >>> traps. So I would have to create a tree_could_throw_p version which >>> disregards specific kinds of traps. >>> >>> 2. Change expand_vector_condition to follow SSA_NAME_DEF_STMT and use >>> its tree_code instead of SSA_NAME. The potential problem I see with >>> this is that there appears to be no guarantee that _5 will be inlined >>> into _6 at a later point. So if we say that we don't need to fall >>> back to scalar comparisons based on availability of vector > >>> instruction and inlining does not happen, then what's actually will >>> be required is vector selection (vsel on S/390), which might not be >>> available in general case. >>> >>> What would be a better way to proceed here? >> >> On GIMPLE there isn't a good reason to split out trapping comparisons >> from [VEC_]COND_EXPR - the gimplifier does this for GIMPLE_CONDs >> where it is important because we'd have no way to represent EH info >> when not done. It might be a bit awkward to preserve EH across RTL >> expansion though in case the [VEC_]COND_EXPR are not expanded >> as a single pattern, but I'm not sure. > > Ok, so I'm testing the following now - for the problematic test that > helped: > > diff --git a/gcc/gimple-expr.c b/gcc/gimple-expr.c > index b0c9f9b671a..940aa394769 100644 > --- a/gcc/gimple-expr.c > +++ b/gcc/gimple-expr.c > @@ -602,17 +602,33 @@ is_gimple_lvalue (tree t) > || TREE_CODE (t) == BIT_FIELD_REF); > } > > -/* Return true if T is a GIMPLE condition. */ > +/* Helper for is_gimple_condexpr and is_possibly_trapping_gimple_condexpr. > */ > > -bool > -is_gimple_condexpr (tree t) > +static bool > +is_gimple_condexpr_1 (tree t, bool allow_traps) > { > return (is_gimple_val (t) || (COMPARISON_CLASS_P (t) > - && !tree_could_throw_p (t) > + && (allow_traps || !tree_could_throw_p (t)) > && is_gimple_val (TREE_OPERAND (t, 0)) > && is_gimple_val (TREE_OPERAND (t, 1; > } > > +/* Return true if T is a GIMPLE condition. */ > + > +bool > +is_gimple_condexpr (tree t) > +{ > + return is_gimple_condexpr_1 (t, false); > +} > + > +/* Like is_gimple_condexpr, but allow the T to trap. */ > + > +bool > +is_possibly_trapping_gimple_condexpr (tree t) > +{ > + return is_gimple_condexpr_1 (t, true); > +} > + > /* Return true if T is a gimple address. */ > > bool > diff --git a/gcc/gimple-expr.h b/gcc/gimple-expr.h > index 1ad1432bd17..20546ca5b99 100644 > --- a/gcc/gimple-expr.h > +++ b/gcc/gimple-expr.h > @@ -41,6 +41,7 @@ extern void gimple_cond_get_ops_from_tree (tree, enum > tree_code *, tree *, > tree *); > extern bool is_gimple_lvalue (tree); > extern bool is_gimple_condexpr (tree); > +extern bool is_possibly_trapping_gimple_condexpr (tree); > extern bool is_gimple_address (const_tree); > extern bool is_gimple_invariant_address (const_tree); > extern bool is_gimple_ip_invariant_address (const_tree); > diff --git a/gcc/gimplify.c b/gcc/gimplify.c > in
[Ada] Add warning for explicit by-reference mechanism
This instructs gigi to issue a warning when an explicit by-reference mechanism specified by means of pragma Export_Function cannot be honored. Tested on x86_64-suse-linux, applied on the mainline. 2019-08-30 Eric Botcazou * gcc-interface/ada-tree.h (DECL_FORCED_BY_REF_P): New macro. * gcc-interface/decl.c (gnat_to_gnu_param): Set it on parameters whose mechanism was forced to by-reference. * gcc-interface/trans.c (Call_to_gnu): Do not issue a warning about a misaligned actual parameter if it is based on a CONSTRUCTOR. Remove obsolete warning for users of Starlet. Issue a warning if a temporary is make around the call for a parameter with DECL_FORCED_BY_REF_P set. (addressable_p): Return true for REAL_CST and ADDR_EXPR. -- Eric BotcazouIndex: gcc-interface/ada-tree.h === --- gcc-interface/ada-tree.h (revision 275062) +++ gcc-interface/ada-tree.h (working copy) @@ -482,6 +482,9 @@ do { \ value of a function call or 'reference to a function call. */ #define DECL_RETURN_VALUE_P(NODE) DECL_LANG_FLAG_5 (VAR_DECL_CHECK (NODE)) +/* Nonzero in a PARM_DECL if its mechanism was forced to by-reference. */ +#define DECL_FORCED_BY_REF_P(NODE) DECL_LANG_FLAG_5 (PARM_DECL_CHECK (NODE)) + /* In a FIELD_DECL corresponding to a discriminant, contains the discriminant number. */ #define DECL_DISCRIMINANT_NUMBER(NODE) DECL_INITIAL (FIELD_DECL_CHECK (NODE)) Index: gcc-interface/decl.c === --- gcc-interface/decl.c (revision 275196) +++ gcc-interface/decl.c (working copy) @@ -5208,6 +5208,7 @@ gnat_to_gnu_param (Entity_Id gnat_param, tree gnu_ bool ro_param = in_param && !Address_Taken (gnat_param); bool by_return = false, by_component_ptr = false; bool by_ref = false; + bool forced_by_ref = false; bool restricted_aliasing_p = false; location_t saved_location = input_location; tree gnu_param; @@ -5235,7 +5236,11 @@ gnat_to_gnu_param (Entity_Id gnat_param, tree gnu_ /* Or else, see if a Mechanism was supplied that forced this parameter to be passed one way or another. */ else if (mech == Default || mech == By_Copy || mech == By_Reference) -; +forced_by_ref + = (mech == By_Reference + && !foreign + && !TYPE_IS_BY_REFERENCE_P (gnu_param_type) + && !Is_Aliased (gnat_param)); /* Positive mechanism means by copy for sufficiently small parameters. */ else if (mech > 0) @@ -5368,6 +5373,7 @@ gnat_to_gnu_param (Entity_Id gnat_param, tree gnu_ gnu_param = create_param_decl (gnu_param_name, gnu_param_type); TREE_READONLY (gnu_param) = ro_param || by_ref || by_component_ptr; DECL_BY_REF_P (gnu_param) = by_ref; + DECL_FORCED_BY_REF_P (gnu_param) = forced_by_ref; DECL_BY_COMPONENT_PTR_P (gnu_param) = by_component_ptr; DECL_POINTS_TO_READONLY_P (gnu_param) = (ro_param && (by_ref || by_component_ptr)); Index: gcc-interface/trans.c === --- gcc-interface/trans.c (revision 275197) +++ gcc-interface/trans.c (working copy) @@ -5257,30 +5257,20 @@ Call_to_gnu (Node_Id gnat_node, tree *gnu_result_t /* Do not issue warnings for CONSTRUCTORs since this is not a copy but sort of an instantiation for them. */ - if (TREE_CODE (gnu_name) == CONSTRUCTOR) + if (TREE_CODE (remove_conversions (gnu_name, true)) == CONSTRUCTOR) ; - /* If the type is passed by reference, a copy is not allowed. */ - else if (TYPE_IS_BY_REFERENCE_P (gnu_formal_type)) + /* If the formal is passed by reference, a copy is not allowed. */ + else if (TYPE_IS_BY_REFERENCE_P (gnu_formal_type) + || Is_Aliased (gnat_formal)) post_error ("misaligned actual cannot be passed by reference", gnat_actual); - /* For users of Starlet we issue a warning because the interface - apparently assumes that by-ref parameters outlive the procedure - invocation. The code still will not work as intended, but we - cannot do much better since low-level parts of the back-end - would allocate temporaries at will because of the misalignment - if we did not do so here. */ - else if (Is_Valued_Procedure (Entity (Name (gnat_node - { - post_error - ("?possible violation of implicit assumption", gnat_actual); - post_error_ne - ("?made by pragma Import_Valued_Procedure on &", gnat_actual, - Entity (Name (gnat_node))); - post_error_ne ("?because of misalignment of &", gnat_actual, - gnat_formal); - } + /* If the mechanism was forced to by-ref, a copy is not allowed but + we issue only a warning because this case is not strict Ada. */ + else if (DECL_FORCED_BY_REF_P (gnu_formal)) + post_error ("misaligned actual cannot be passed by reference??", + gnat_actual); /* If the actual type of the object is already the
Re: [PATCH] or1k: Fix issue with set_got clobbering r9
LGTM. On 8/30/19 2:31 AM, Stafford Horne wrote: > Hello, any comments on this? > > If nothing I will commit in a few days. > > On Thu, Aug 22, 2019 at 08:44:04PM +0900, Stafford Horne wrote: >> When compiling glibc we found that the GOT register was being allocated >> r9 when the instruction was still set_got_tmp. That caused set_got to >> clobber r9. We cannot simply say set_got_tmp clobbers r9 as this is the >> reason for having the temporary set_got_tmp. >> >> Fix by using a register class constraint that does not allow r9 during >> register allocation. >> >> gcc/ChangeLog: >> >> * config/or1k/constraints.md (t): New constraint. >> * config/or1k/or1k.h (GOT_REGS): New register class. >> * config/or1k/or1k.md (set_got_tmp, set_got): Use t contraint. >> --- >> gcc/config/or1k/constraints.md | 4 >> gcc/config/or1k/or1k.h | 3 +++ >> gcc/config/or1k/or1k.md| 4 ++-- >> 3 files changed, 9 insertions(+), 2 deletions(-) >> >> diff --git a/gcc/config/or1k/constraints.md b/gcc/config/or1k/constraints.md >> index 8cac7eb5329..ba330c6b8c2 100644 >> --- a/gcc/config/or1k/constraints.md >> +++ b/gcc/config/or1k/constraints.md >> @@ -25,6 +25,7 @@ >> ; We use: >> ; c - sibcall registers >> ; d - double pair base registers (excludes r0, r30 and r31 which overflow) >> +; t - got address registers (excludes r9 is clobbered by set_got) > > I will changee this to (... r9 which is clobbered ...) > >> ; I - constant signed 16-bit >> ; K - constant unsigned 16-bit >> ; M - constant signed 16-bit shifted left 16-bits (l.movhi) >> @@ -36,6 +37,9 @@ >> (define_register_constraint "d" "DOUBLE_REGS" >>"Registers which can be used for double reg pairs.") >> >> +(define_register_constraint "t" "GOT_REGS" >> + "Registers which can be used to store the Global Offset Table (GOT) >> address.") >> + >> ;; Immediates >> (define_constraint "I" >>"A signed 16-bit immediate in the range -32768 to 32767." >> diff --git a/gcc/config/or1k/or1k.h b/gcc/config/or1k/or1k.h >> index 2b29e62fdd3..4c32607bac1 100644 >> --- a/gcc/config/or1k/or1k.h >> +++ b/gcc/config/or1k/or1k.h >> @@ -190,6 +190,7 @@ enum reg_class >>NO_REGS, >>SIBCALL_REGS, >>DOUBLE_REGS, >> + GOT_REGS, >>GENERAL_REGS, >>FLAG_REGS, >>ALL_REGS, >> @@ -202,6 +203,7 @@ enum reg_class >>"NO_REGS",\ >>"SIBCALL_REGS", \ >>"DOUBLE_REGS",\ >> + "GOT_REGS", \ >>"GENERAL_REGS", \ >>"FLAG_REGS", \ >>"ALL_REGS" } >> @@ -215,6 +217,7 @@ enum reg_class >> { { 0x, 0x }, \ >>{ SIBCALL_REGS_MASK, 0 }, \ >>{ 0x7f7e, 0x }, \ >> + { 0xfdff, 0x }, \ >>{ 0x, 0x0003 }, \ >>{ 0x, 0x0004 }, \ >>{ 0x, 0x0007 }\ >> diff --git a/gcc/config/or1k/or1k.md b/gcc/config/or1k/or1k.md >> index cee11d078cc..36bcee336ab 100644 >> --- a/gcc/config/or1k/or1k.md >> +++ b/gcc/config/or1k/or1k.md >> @@ -706,7 +706,7 @@ >> ;; set_got pattern below. This works because the set_got_tmp insn is the >> ;; first insn in the stream and that it isn't moved during RA. >> (define_insn "set_got_tmp" >> - [(set (match_operand:SI 0 "register_operand" "=r") >> + [(set (match_operand:SI 0 "register_operand" "=t") >> (unspec_volatile:SI [(const_int 0)] UNSPECV_SET_GOT))] >>"" >> { >> @@ -715,7 +715,7 @@ >> >> ;; The insn to initialize the GOT. >> (define_insn "set_got" >> - [(set (match_operand:SI 0 "register_operand" "=r") >> + [(set (match_operand:SI 0 "register_operand" "=t") >> (unspec:SI [(const_int 0)] UNSPEC_SET_GOT)) >> (clobber (reg:SI LR_REGNUM))] >>"" >> -- >> 2.21.0 >>
[Ada] Minor tweak for coverage
Tested on x86_64-suse-linux, applied on the mainline. 2019-08-30 Eric Botcazou * gcc-interface/trans.c (gnat_to_gnu): Do not set the location on an expression used for a tag. -- Eric BotcazouIndex: gcc-interface/trans.c === --- gcc-interface/trans.c (revision 275191) +++ gcc-interface/trans.c (working copy) @@ -8727,10 +8727,16 @@ gnat_to_gnu (Node_Id gnat_node) set_gnu_expr_location_from_node (gnu_result, gnat_node); } - /* Set the location information on the result if it's not a simple name. + /* Set the location information on the result if it's not a simple name + or something that contains a simple name, for example a tag, because + we don"t want all the references to get the location of the first use. Note that we may have no result if we tried to build a CALL_EXPR node to a procedure with no side-effects and optimization is enabled. */ - else if (kind != N_Identifier && gnu_result && EXPR_P (gnu_result)) + else if (kind != N_Identifier + && !(kind == N_Selected_Component + && Chars (Selector_Name (gnat_node)) == Name_uTag) + && gnu_result + && EXPR_P (gnu_result)) set_gnu_expr_location_from_node (gnu_result, gnat_node); /* If we're supposed to return something of void_type, it means we have
[Ada] Allow tighter packing for component with variant part
This lifts an old restriction pertaining to the layout of components of record types whose nominal subtype contains a variant part: they couldn't be packed. Tested on x86_64-suse-linux, applied on the mainline. 2019-08-30 Eric Botcazou * gcc-interface/gigi.h (aggregate_type_contains_array_p): Declare. * gcc-interface/decl.c (gnat_to_gnu_entity) : For an extension, test Has_Record_Rep_Clause instead of Has_Specified_Layout. (adjust_packed): Return 0 if the type of the field is an aggregate type that contains (or is) a self-referential array. (type_has_variable_size): Delete. * gcc-interface/utils.c (inish_record_type): Constify a variable. (aggregate_type_contains_array_p): Add parameter SELF_REFERENTIAL. : Pass it in the recursive call. : If it is true, return true only if the array type is self-referential. (create_field_decl): Streamline the setting of the alignment on the field. Pass false to aggregate_type_contains_array_p. 2019-08-30 Eric Botcazou * gnat.dg/pack24.adb: New test. -- Eric BotcazouIndex: gcc-interface/decl.c === --- gcc-interface/decl.c (revision 275188) +++ gcc-interface/decl.c (working copy) @@ -202,7 +202,6 @@ static void prepend_one_attribute_pragma (struct a static void prepend_attributes (struct attrib **, Entity_Id); static tree elaborate_expression (Node_Id, Entity_Id, const char *, bool, bool, bool); -static bool type_has_variable_size (tree); static tree elaborate_expression_1 (tree, Entity_Id, const char *, bool, bool); static tree elaborate_expression_2 (tree, Entity_Id, const char *, bool, bool, unsigned int); @@ -2953,10 +2952,13 @@ gnat_to_gnu_entity (Entity_Id gnat_entity, tree gn : 0; const bool has_align = Known_Alignment (gnat_entity); const bool has_discr = Has_Discriminants (gnat_entity); - const bool has_rep = Has_Specified_Layout (gnat_entity); const bool is_extension = (Is_Tagged_Type (gnat_entity) && Nkind (record_definition) == N_Derived_Type_Definition); + const bool has_rep + = is_extension + ? Has_Record_Rep_Clause (gnat_entity) + : Has_Specified_Layout (gnat_entity); const bool is_unchecked_union = Is_Unchecked_Union (gnat_entity); bool all_rep = has_rep; @@ -6865,11 +6867,13 @@ choices_to_gnu (tree gnu_operand, Node_Id gnat_cho static int adjust_packed (tree field_type, tree record_type, int packed) { - /* If the field contains an item of variable size, we cannot pack it - because we cannot create temporaries of non-fixed size in case - we need to take the address of the field. See addressable_p and - the notes on the addressability issues for further details. */ - if (type_has_variable_size (field_type)) + /* If the field contains an array with self-referential size, we'd better + not pack it because this would misalign it and, therefore, cause large + temporaries to be created in case we need to take the address of the + field. See addressable_p and the notes on the addressability issues + for further details. */ + if (AGGREGATE_TYPE_P (field_type) + && aggregate_type_contains_array_p (field_type, true)) return 0; /* In the other cases, we can honor the packing. */ @@ -7274,31 +7278,6 @@ components_need_strict_alignment (Node_Id componen return false; } -/* Return true if TYPE is a type with variable size or a padding type with a - field of variable size or a record that has a field with such a type. */ - -static bool -type_has_variable_size (tree type) -{ - tree field; - - if (!TREE_CONSTANT (TYPE_SIZE (type))) -return true; - - if (TYPE_IS_PADDING_P (type) - && !TREE_CONSTANT (DECL_SIZE (TYPE_FIELDS (type -return true; - - if (!RECORD_OR_UNION_TYPE_P (type)) -return false; - - for (field = TYPE_FIELDS (type); field; field = DECL_CHAIN (field)) -if (type_has_variable_size (TREE_TYPE (field))) - return true; - - return false; -} - /* Return true if FIELD is an artificial field. */ static bool Index: gcc-interface/gigi.h === --- gcc-interface/gigi.h (revision 275174) +++ gcc-interface/gigi.h (working copy) @@ -835,6 +835,11 @@ extern tree get_base_type (tree type); in bits. If we don't know anything about the alignment, return 0. */ extern unsigned int known_alignment (tree exp); +/* Return true if TYPE, an aggregate type, contains (or is) an array. + If SELF_REFERENTIAL is true, then an additional requirement on the + array is that it be self-referential. */ +extern bool aggregate_type_contains_array_p (tree type, bool self_referential); + /* Return true if VALUE is a multiple of FACTOR. FACTOR must be a power of 2. */ extern bool value_factor_p (tree value, unsigned HOST_WIDE_INT factor); Index: gcc-interface/utils.c
[committed] Suppress warnings for mips/r10k-cache-barrier-9.c
The improvements in our ability to issue diagnostics for out of bounds accesses to trailing arrays is causing r10k-cache-barrier-9.c to fail on MIPS targets. This is a bit of an odd test. It purposefully does out of bounds accesses. Some accesses cross sub-object boundaries, others leave the object entirely. Apparently the MIPS backend is supposed to detect this stuff and issue some kind of cache barrier instruction. Additionally mips.exp overrides dg-options and makes it harder to pass additional options, like -Wno-. You can use "-w", but not "-Wno-<...>". We can't really use dg-warning markers because we don't get warnings at all optimization levels. One might argue the test is bogus, but I'm going to assume it serves some useful purpose. I'm adding "-w" to suppress warnings. Installing on the trunk, JEff diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index 0c5ec5325f5..5044002c09c 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,3 +1,7 @@ +2019-08-30 Jeff Law + + * gcc.target/mips/r10k-cache-barrier-9.c: Suppress warnings. + 2019-08-30 Martin Jambor tree-optimization/91579 diff --git a/gcc/testsuite/gcc.target/mips/r10k-cache-barrier-9.c b/gcc/testsuite/gcc.target/mips/r10k-cache-barrier-9.c index 2f83968aad6..2516b663ca1 100644 --- a/gcc/testsuite/gcc.target/mips/r10k-cache-barrier-9.c +++ b/gcc/testsuite/gcc.target/mips/r10k-cache-barrier-9.c @@ -1,4 +1,4 @@ -/* { dg-options "-mr10k-cache-barrier=store -G8" } */ +/* { dg-options "-mr10k-cache-barrier=store -G8 -w" } */ /* Test that out-of-range stores to components of static objects are protected by a cache barrier. */
Re: [PATCH] Sanitizing the middle-end interface to the back-end for strict alignment
On Thu, 29 Aug 2019 at 23:26, Bernd Edlinger wrote: > > On 8/29/19 11:08 AM, Christophe Lyon wrote: > > On Thu, 29 Aug 2019 at 10:58, Kyrill Tkachov > > wrote: > >> > >> Hi Bernd, > >> > >> On 8/28/19 10:36 PM, Bernd Edlinger wrote: > >>> On 8/28/19 2:07 PM, Christophe Lyon wrote: > Hi, > > This patch causes an ICE when building libgcc's unwind-arm.o > when configuring GCC: > --target arm-none-linux-gnueabihf --with-mode thumb --with-cpu > cortex-a15 --with-fpu neon-vfpv4: > > The build works for the same target, but --with-mode arm --with-cpu > cortex a9 --with-fpu vfp > > In file included from > /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/config/arm/unwind-arm.c:144: > /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/unwind-arm-common.inc: > In function 'get_eit_entry': > /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/unwind-arm-common.inc:245:29: > warning: cast discards 'const' qualifier from pointer target type > [-Wcast-qual] > 245 | ucbp->pr_cache.ehtp = (_Unwind_EHT_Header > *)&eitp->content; > | ^ > during RTL pass: expand > /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/unwind-arm-common.inc: > In function 'unwind_phase2_forced': > /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/unwind-arm-common.inc:319:18: > internal compiler error: in gen_movdi, at config/arm/arm.md:5235 > 319 | saved_vrs.core = entry_vrs->core; > | ~~~^ > 0x126530f gen_movdi(rtx_def*, rtx_def*) > > /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.md:5235 > 0x896d92 insn_gen_fn::operator()(rtx_def*, rtx_def*) const > > /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/recog.h:318 > 0x896d92 emit_move_insn_1(rtx_def*, rtx_def*) > > /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:3694 > 0x897083 emit_move_insn(rtx_def*, rtx_def*) > > /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:3790 > 0xfc25d6 gen_cpymem_ldrd_strd(rtx_def**) > > /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.c:14582 > 0x126a1f1 gen_cpymemqi(rtx_def*, rtx_def*, rtx_def*, rtx_def*) > > /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.md:6688 > 0xb0bc08 maybe_expand_insn(insn_code, unsigned int, expand_operand*) > > /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/optabs.c:7440 > 0x89ba1e emit_block_move_via_cpymem > > /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:1808 > 0x89ba1e emit_block_move_hints(rtx_def*, rtx_def*, rtx_def*, > block_op_methods, unsigned int, long, unsigned long, unsigned long, > unsigned long, bool, bool*) > > /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:1627 > 0x89c383 emit_block_move(rtx_def*, rtx_def*, rtx_def*, block_op_methods) > > /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:1667 > 0x89fb4e store_expr(tree_node*, rtx_def*, int, bool, bool) > > /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:5845 > 0x88c1f9 store_field > > /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:7149 > 0x8a0c22 expand_assignment(tree_node*, tree_node*, bool) > > /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:5304 > 0x761964 expand_gimple_stmt_1 > > /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfgexpand.c:3779 > 0x761964 expand_gimple_stmt > > /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfgexpand.c:3875 > 0x768583 expand_gimple_basic_block > > /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfgexpand.c:5915 > 0x76abc6 execute > > /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfgexpand.c:6538 > > Christophe > > >>> Okay, sorry for the breakage. > >>> > >>> What is happening in gen_cpymem_ldrd_strd is of course against the rules: > >>> > >>> It uses emit_move_insn on only 4-byte aligned DI-mode memory operands. > >>> > >>> I have a patch for this, which is able to fix the libgcc build on a > >>> cross, but have no > >>> possibility to bootstrap the affected target. > >>> > >>> Could you please help? > >> > >> Well it's good that the sanitisation is catching the bugs! > >> > > Yes, more than expected, though ;) > > >> Bootstrapping this patch I get another assert with the backtrace: > > >
Re: [ARM/FDPIC v5 02/21] [ARM] FDPIC: Handle arm*-*-uclinuxfdpiceabi in configure scripts
Christophe Lyon writes: > On Fri, 30 Aug 2019 at 11:00, Richard Sandiford > wrote: >> >> Christophe Lyon writes: >> > @@ -785,7 +785,7 @@ case ${target} in >> >esac >> >tmake_file="t-slibgcc" >> >case $target in >> > -*-*-linux* | frv-*-*linux* | *-*-kfreebsd*-gnu | >> > *-*-kopensolaris*-gnu) >> > +*-*-linux* | frv-*-*linux* | *-*-kfreebsd*-gnu | >> > *-*-kopensolaris*-gnu | *-*-uclinuxfdpiceabi) >> >:;; >> > *-*-gnu*) >> >native_system_header_dir=/include >> >> I don't think this is necessary, since this target will never match the >> following *-*-gnu*) stanza anyway. > OK (I thought it was clearer to add the fdpic config where we already > have linux that would not match) I think the idea is to match pure GNU systems only in the second stanza (i.e. GNU/Hurd). So we need the first stanza to exclude hybrid-GNU systems like GNU/Linux, GNU/Solaris, GNU/FreeBSD, etc. Since uclinuxfdpiceabi isn't a GNU-based system, I don't think it needs to appear at all. >> > diff --git a/libtool.m4 b/libtool.m4 >> > index 8966762..64e507a 100644 >> > --- a/libtool.m4 >> > +++ b/libtool.m4 >> > @@ -3734,7 +3739,7 @@ m4_if([$1], [CXX], [ >> > ;; >> > esac >> > ;; >> > - linux* | k*bsd*-gnu | kopensolaris*-gnu) >> > + linux* | k*bsd*-gnu | kopensolaris*-gnu | uclinux*) >> > case $cc_basename in >> > KCC*) >> > # KAI C++ Compiler >> >> Is this needed? It seems to be in the !GCC branch of an if/else. > I must admit I didn't test this case. I thought it was needed because > this target does not match "linux*", in case someone tries to compile > with another compiler... > > >> >> If it is needed, the default: >> >> _LT_TAGVAR(lt_prog_compiler_can_build_shared, $1)=no >> >> seems correct for non-FDPIC uclinux. >> > So, either use uclinuxfdpiceabi above, or do nothing and do not try to > support other compilers? Yeah. I think the latter's better, since in this context we only need libtool.m4 to support building with GCC. The decision might be different for upstream libtool, but do any commercial compilers support Arm FDPIC yet? >> > @@ -4032,7 +4037,7 @@ m4_if([$1], [CXX], [ >> >_LT_TAGVAR(lt_prog_compiler_static, $1)='-non_shared' >> >;; >> > >> > -linux* | k*bsd*-gnu | kopensolaris*-gnu) >> > +linux* | k*bsd*-gnu | kopensolaris*-gnu | uclinux*) >> >case $cc_basename in >> ># old Intel for x86_64 which still supported -KPIC. >> >ecc*) >> >> Same here. >> >> > @@ -5946,7 +5951,7 @@ if test "$_lt_caught_CXX_error" != yes; then >> > _LT_TAGVAR(inherit_rpath, $1)=yes >> > ;; >> > >> > - linux* | k*bsd*-gnu | kopensolaris*-gnu) >> > + linux* | k*bsd*-gnu | kopensolaris*-gnu | uclinuxfdpiceabi) >> > case $cc_basename in >> >KCC*) >> > # Kuck and Associates, Inc. (KAI) C++ Compiler >> >> Here too the code seems to be dealing specifically with non-GCC compilers. >> >> > @@ -6598,7 +6603,7 @@ interix[[3-9]]*) >> >_LT_TAGVAR(postdeps,$1)= >> >;; >> > >> > -linux*) >> > +linux* | uclinux*) >> >case `$CC -V 2>&1 | sed 5q` in >> >*Sun\ C*) >> > # Sun C++ 5.9 >> >> Here too. (It only seems to do anything for Sun's C compiler.) >> >> The fewer hunks we have to maintain downstream the better :-) >> > Sure. > > I thought safer/cleaner to prepare the cases for non-GCC compilers, I > guess it's better not to add that until proven useful? Yeah, I think so. I guess it depends on your POV. To me, it seems cleaner to add uclinux* and uclinuxfdpiceabi only where we know there's a specific need, since that's also how we decide which of uclinux* and uclinuxfdpiceabi to use. Thanks, Richard
[Ada] Fix minor inaccuracy in lvalue_required_p
It can only result in a missed early optimization. Tested on x86_64-suse-linux, applied on the mainline. 2019-08-30 Eric Botcazou * gcc-interface/trans.c (lvalue_required_p) : Adjust GNU_TYPE in the recursive call. : Likewise. -- Eric BotcazouIndex: gcc-interface/trans.c === --- gcc-interface/trans.c (revision 275187) +++ gcc-interface/trans.c (working copy) @@ -873,12 +873,14 @@ lvalue_required_p (Node_Id gnat_node, tree gnu_typ if (Prefix (gnat_parent) != gnat_node) return 0; - return lvalue_required_p (gnat_parent, gnu_type, constant, -address_of_constant); + return lvalue_required_p (gnat_parent, +get_unpadded_type (Etype (gnat_parent)), +constant, address_of_constant); case N_Selected_Component: - return lvalue_required_p (gnat_parent, gnu_type, constant, -address_of_constant); + return lvalue_required_p (gnat_parent, +get_unpadded_type (Etype (gnat_parent)), +constant, address_of_constant); case N_Object_Renaming_Declaration: /* We need to preserve addresses through a renaming. */
[Ada] Fix crash on unconstrained array with convention C
This happens when the array has multiple dimensions. Tested on x86_64-suse-linux, applied on the mainline. 2019-08-30 Eric Botcazou * gcc-interface/utils.c (build_template): Deal with parameters passed by pointer to component of multi-dimensional arrays. -- Eric BotcazouIndex: gcc-interface/utils.c === --- gcc-interface/utils.c (revision 275062) +++ gcc-interface/utils.c (working copy) @@ -3953,15 +3953,12 @@ build_template (tree template_type, tree array_typ && TYPE_HAS_ACTUAL_BOUNDS_P (array_type))) bound_list = TYPE_ACTUAL_BOUNDS (array_type); - /* First make the list for a CONSTRUCTOR for the template. Go down the - field list of the template instead of the type chain because this - array might be an Ada array of arrays and we can't tell where the - nested arrays stop being the underlying object. */ - - for (field = TYPE_FIELDS (template_type); field; - (bound_list - ? (bound_list = TREE_CHAIN (bound_list)) - : (array_type = TREE_TYPE (array_type))), + /* First make the list for a CONSTRUCTOR for the template. Go down + the field list of the template instead of the type chain because + this array might be an Ada array of array and we can't tell where + the nested array stop being the underlying object. */ + for (field = TYPE_FIELDS (template_type); + field; field = DECL_CHAIN (DECL_CHAIN (field))) { tree bounds, min, max; @@ -3968,12 +3965,18 @@ build_template (tree template_type, tree array_typ /* If we have a bound list, get the bounds from there. Likewise for an ARRAY_TYPE. Otherwise, if expr is a PARM_DECL with - DECL_BY_COMPONENT_PTR_P, use the bounds of the field in the template. - This will give us a maximum range. */ + DECL_BY_COMPONENT_PTR_P, use the bounds of the field in the + template, but this will only give us a maximum range. */ if (bound_list) - bounds = TREE_VALUE (bound_list); + { + bounds = TREE_VALUE (bound_list); + bound_list = TREE_CHAIN (bound_list); + } else if (TREE_CODE (array_type) == ARRAY_TYPE) - bounds = TYPE_INDEX_TYPE (TYPE_DOMAIN (array_type)); + { + bounds = TYPE_INDEX_TYPE (TYPE_DOMAIN (array_type)); + array_type = TREE_TYPE (array_type); + } else if (expr && TREE_CODE (expr) == PARM_DECL && DECL_BY_COMPONENT_PTR_P (expr)) bounds = TREE_TYPE (field);
Re: [PATCH v2 0/9] S/390: Use signaling FP comparison instructions
> Am 30.08.2019 um 09:12 schrieb Richard Biener : > > On Thu, Aug 29, 2019 at 5:39 PM Ilya Leoshkevich wrote: >> >>> Am 22.08.2019 um 15:45 schrieb Ilya Leoshkevich : >>> >>> Bootstrap and regtest running on x86_64-redhat-linux and >>> s390x-redhat-linux. >>> >>> This patch series adds signaling FP comparison support (both scalar and >>> vector) to s390 backend. >> >> I'm running into a problem on ppc64 with this patch, and it would be >> great if someone could help me figure out the best way to resolve it. >> >> vector36.C test is failing because gimplifier produces the following >> >> _5 = _4 > { 2.0e+0, 2.0e+0, 2.0e+0, 2.0e+0 }; >> _6 = VEC_COND_EXPR <_5, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }>; >> >> from >> >> VEC_COND_EXPR < (*b > { 2.0e+0, 2.0e+0, 2.0e+0, 2.0e+0 }) , >> { -1, -1, -1, -1 } , >> { 0, 0, 0, 0 } > >> >> Since the comparison tree code is now hidden behind a temporary, my code >> does not have anything to pass to the backend. The reason for creating >> a temporary is that the comparison can trap, and so the following check >> in gimplify_expr fails: >> >> if (gimple_seq_empty_p (internal_post) && (*gimple_test_f) (*expr_p)) >>goto out; >> >> gimple_test_f is is_gimple_condexpr, and it eventually calls >> operation_could_trap_p (GT). >> >> My current solution is to simply state that backend does not support >> SSA_NAME in vector comparisons, however, I don't like it, since it may >> cause performance regressions due to having to fall back to scalar >> comparisons. >> >> I was thinking about two other possible solutions: >> >> 1. Change the gimplifier to allow trapping vector comparisons. That's >> a bit complicated, because tree_could_throw_p checks not only for >> floating point traps, but also e.g. for array index out of bounds >> traps. So I would have to create a tree_could_throw_p version which >> disregards specific kinds of traps. >> >> 2. Change expand_vector_condition to follow SSA_NAME_DEF_STMT and use >> its tree_code instead of SSA_NAME. The potential problem I see with >> this is that there appears to be no guarantee that _5 will be inlined >> into _6 at a later point. So if we say that we don't need to fall >> back to scalar comparisons based on availability of vector > >> instruction and inlining does not happen, then what's actually will >> be required is vector selection (vsel on S/390), which might not be >> available in general case. >> >> What would be a better way to proceed here? > > On GIMPLE there isn't a good reason to split out trapping comparisons > from [VEC_]COND_EXPR - the gimplifier does this for GIMPLE_CONDs > where it is important because we'd have no way to represent EH info > when not done. It might be a bit awkward to preserve EH across RTL > expansion though in case the [VEC_]COND_EXPR are not expanded > as a single pattern, but I'm not sure. Ok, so I'm testing the following now - for the problematic test that helped: diff --git a/gcc/gimple-expr.c b/gcc/gimple-expr.c index b0c9f9b671a..940aa394769 100644 --- a/gcc/gimple-expr.c +++ b/gcc/gimple-expr.c @@ -602,17 +602,33 @@ is_gimple_lvalue (tree t) || TREE_CODE (t) == BIT_FIELD_REF); } -/* Return true if T is a GIMPLE condition. */ +/* Helper for is_gimple_condexpr and is_possibly_trapping_gimple_condexpr. */ -bool -is_gimple_condexpr (tree t) +static bool +is_gimple_condexpr_1 (tree t, bool allow_traps) { return (is_gimple_val (t) || (COMPARISON_CLASS_P (t) - && !tree_could_throw_p (t) + && (allow_traps || !tree_could_throw_p (t)) && is_gimple_val (TREE_OPERAND (t, 0)) && is_gimple_val (TREE_OPERAND (t, 1; } +/* Return true if T is a GIMPLE condition. */ + +bool +is_gimple_condexpr (tree t) +{ + return is_gimple_condexpr_1 (t, false); +} + +/* Like is_gimple_condexpr, but allow the T to trap. */ + +bool +is_possibly_trapping_gimple_condexpr (tree t) +{ + return is_gimple_condexpr_1 (t, true); +} + /* Return true if T is a gimple address. */ bool diff --git a/gcc/gimple-expr.h b/gcc/gimple-expr.h index 1ad1432bd17..20546ca5b99 100644 --- a/gcc/gimple-expr.h +++ b/gcc/gimple-expr.h @@ -41,6 +41,7 @@ extern void gimple_cond_get_ops_from_tree (tree, enum tree_code *, tree *, tree *); extern bool is_gimple_lvalue (tree); extern bool is_gimple_condexpr (tree); +extern bool is_possibly_trapping_gimple_condexpr (tree); extern bool is_gimple_address (const_tree); extern bool is_gimple_invariant_address (const_tree); extern bool is_gimple_ip_invariant_address (const_tree); diff --git a/gcc/gimplify.c b/gcc/gimplify.c index daa0b71c191..4e6256390c0 100644 --- a/gcc/gimplify.c +++ b/gcc/gimplify.c @@ -12973,6 +12973,7 @@ gimplify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p, else if (gimple_test_f == is
[Ada] Fix infinite recursion on dynamic record type with -gnatR4
An oversight when -gnatR4 was invented. Tested on x86_64-suse-linux, applied on the mainline. 2019-08-30 Eric Botcazou * gcc-interface/decl.c (annotate_value) : Inline the call also if List_Representation_Info is greater than 3. -- Eric BotcazouIndex: gcc-interface/decl.c === --- gcc-interface/decl.c (revision 275174) +++ gcc-interface/decl.c (working copy) @@ -8398,7 +8398,7 @@ annotate_value (tree gnu_size) /* In regular mode, inline back only if symbolic annotation is requested in order to avoid memory explosion on big discriminated record types. But not in ASIS mode, as symbolic annotation is required for DDA. */ - if (List_Representation_Info == 3 || type_annotate_only) + if (List_Representation_Info >= 3 || type_annotate_only) { tree t = maybe_inline_call_in_expr (gnu_size); return t ? annotate_value (t) : No_Uint;
Re: [ARM/FDPIC v5 02/21] [ARM] FDPIC: Handle arm*-*-uclinuxfdpiceabi in configure scripts
On Fri, 30 Aug 2019 at 11:00, Richard Sandiford wrote: > > Christophe Lyon writes: > > @@ -785,7 +785,7 @@ case ${target} in > >esac > >tmake_file="t-slibgcc" > >case $target in > > -*-*-linux* | frv-*-*linux* | *-*-kfreebsd*-gnu | *-*-kopensolaris*-gnu) > > +*-*-linux* | frv-*-*linux* | *-*-kfreebsd*-gnu | *-*-kopensolaris*-gnu > > | *-*-uclinuxfdpiceabi) > >:;; > > *-*-gnu*) > >native_system_header_dir=/include > > I don't think this is necessary, since this target will never match the > following *-*-gnu*) stanza anyway. OK (I thought it was clearer to add the fdpic config where we already have linux that would not match) > > > diff --git a/libtool.m4 b/libtool.m4 > > index 8966762..64e507a 100644 > > --- a/libtool.m4 > > +++ b/libtool.m4 > > @@ -3734,7 +3739,7 @@ m4_if([$1], [CXX], [ > > ;; > > esac > > ;; > > - linux* | k*bsd*-gnu | kopensolaris*-gnu) > > + linux* | k*bsd*-gnu | kopensolaris*-gnu | uclinux*) > > case $cc_basename in > > KCC*) > > # KAI C++ Compiler > > Is this needed? It seems to be in the !GCC branch of an if/else. I must admit I didn't test this case. I thought it was needed because this target does not match "linux*", in case someone tries to compile with another compiler... > > If it is needed, the default: > > _LT_TAGVAR(lt_prog_compiler_can_build_shared, $1)=no > > seems correct for non-FDPIC uclinux. > So, either use uclinuxfdpiceabi above, or do nothing and do not try to support other compilers? > > @@ -4032,7 +4037,7 @@ m4_if([$1], [CXX], [ > >_LT_TAGVAR(lt_prog_compiler_static, $1)='-non_shared' > >;; > > > > -linux* | k*bsd*-gnu | kopensolaris*-gnu) > > +linux* | k*bsd*-gnu | kopensolaris*-gnu | uclinux*) > >case $cc_basename in > ># old Intel for x86_64 which still supported -KPIC. > >ecc*) > > Same here. > > > @@ -5946,7 +5951,7 @@ if test "$_lt_caught_CXX_error" != yes; then > > _LT_TAGVAR(inherit_rpath, $1)=yes > > ;; > > > > - linux* | k*bsd*-gnu | kopensolaris*-gnu) > > + linux* | k*bsd*-gnu | kopensolaris*-gnu | uclinuxfdpiceabi) > > case $cc_basename in > >KCC*) > > # Kuck and Associates, Inc. (KAI) C++ Compiler > > Here too the code seems to be dealing specifically with non-GCC compilers. > > > @@ -6598,7 +6603,7 @@ interix[[3-9]]*) > >_LT_TAGVAR(postdeps,$1)= > >;; > > > > -linux*) > > +linux* | uclinux*) > >case `$CC -V 2>&1 | sed 5q` in > >*Sun\ C*) > > # Sun C++ 5.9 > > Here too. (It only seems to do anything for Sun's C compiler.) > > The fewer hunks we have to maintain downstream the better :-) > Sure. I thought safer/cleaner to prepare the cases for non-GCC compilers, I guess it's better not to add that until proven useful? Thanks, Christophe > Richard
Re: [ARM/FDPIC v5 06/21] [ARM] FDPIC: Add support for c++ exceptions
On 30/08/19 10:02 +0100, Kyrill Tkachov wrote: As Richard mentioned in an earlier post the generic libgcc and libstdc++ changes will need approval from the relevant maintainers. CC'ing the libstdc++ list and the libgcc maintainer. The libstdc++ change is OK for trunk. On 5/15/19 1:39 PM, Christophe Lyon wrote: The main difference with existing support is that function addresses are function descriptor addresses instead. This means that all code dealing with function pointers now has to cope with function descriptors instead. For the same reason, Linux kernel helpers can no longer be called by dereferencing their address, so we implement wrappers that directly call the kernel helpers. When restoring a function address, we also have to restore the FDPIC register value (r9). 2019-XX-XX Christophe Lyon Mickaël Guêné gcc/ * ginclude/unwind-arm-common.h (unwinder_cache): Add reserved5 field. (FDPIC_REGNUM): New define. libgcc/ * config/arm/linux-atomic.c (__kernel_cmpxchg): Add FDPIC support. (__kernel_dmb): Likewise. (__fdpic_cmpxchg): New function. (__fdpic_dmb): New function. * config/arm/unwind-arm.h (FDPIC_REGNUM): New define. (gnu_Unwind_Find_got): New function. (_Unwind_decode_typeinfo_ptr): Add FDPIC support. * unwind-arm-common.inc (UCB_PR_GOT): New. (funcdesc_t): New struct. (get_eit_entry): Add FDPIC support. (unwind_phase2): Likewise. (unwind_phase2_forced): Likewise. (__gnu_Unwind_RaiseException): Likewise. (__gnu_Unwind_Resume): Likewise. (__gnu_Unwind_Backtrace): Likewise. * unwind-pe.h (read_encoded_value_with_base): Likewise. libstdc++/ * libsupc++/eh_personality.cc (get_ttype_entry): Add FDPIC support. Change-Id: I64b81cfaf390a05f2fd121f44ba1912cb4b47cae diff --git a/gcc/ginclude/unwind-arm-common.h b/gcc/ginclude/unwind-arm-common.h index 6df783e..d4eb03e 100644 --- a/gcc/ginclude/unwind-arm-common.h +++ b/gcc/ginclude/unwind-arm-common.h @@ -91,7 +91,7 @@ extern "C" { _uw reserved2; /* Personality routine address */ _uw reserved3; /* Saved callsite address */ _uw reserved4; /* Forced unwind stop arg */ - _uw reserved5; + _uw reserved5; /* Personality routine GOT value in FDPIC mode. */ } unwinder_cache; /* Propagation barrier cache (valid after phase 1): */ diff --git a/libgcc/config/arm/linux-atomic.c b/libgcc/config/arm/linux-atomic.c index 06a6d46..565f829 100644 --- a/libgcc/config/arm/linux-atomic.c +++ b/libgcc/config/arm/linux-atomic.c @@ -25,11 +25,62 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see /* Kernel helper for compare-and-exchange. */ typedef int (__kernel_cmpxchg_t) (int oldval, int newval, int *ptr); -#define __kernel_cmpxchg (*(__kernel_cmpxchg_t *) 0x0fc0) + +#define STR(X) #X +#define XSTR(X) STR(X) + +#define KERNEL_CMPXCHG 0x0fc0 + +#if __FDPIC__ +/* Non-FDPIC ABIs call __kernel_cmpxchg directly by dereferencing its + address, but under FDPIC we would generate a broken call + sequence. That's why we have to implement __kernel_cmpxchg and + __kernel_dmb here: this way, the FDPIC call sequence works. */ +#define __kernel_cmpxchg __fdpic_cmpxchg +#else +#define __kernel_cmpxchg (*(__kernel_cmpxchg_t *) KERNEL_CMPXCHG) +#endif /* Kernel helper for memory barrier. */ typedef void (__kernel_dmb_t) (void); -#define __kernel_dmb (*(__kernel_dmb_t *) 0x0fa0) + +#define KERNEL_DMB 0x0fa0 + +#if __FDPIC__ +#define __kernel_dmb __fdpic_dmb +#else +#define __kernel_dmb (*(__kernel_dmb_t *) KERNEL_DMB) +#endif + +#if __FDPIC__ +static int __fdpic_cmpxchg (int oldval, int newval, int *ptr) +{ + int result; + + asm volatile ( + "ldr ip, 1f\n\t" + "bx ip\n\t" + "1:\n\t" + ".word " XSTR(KERNEL_CMPXCHG) "\n\t" + : "=r" (result) + : "r" (oldval) , "r" (newval), "r" (ptr) + : "r3", "memory"); + /* The result is actually returned by the kernel helper, we need + this to avoid a warning. */ + return result; +} + +static void __fdpic_dmb (void) +{ + asm volatile ( + "ldr ip, 1f\n\t" + "bx ip\n\t" + "1:\n\t" + ".word " XSTR(KERNEL_DMB) "\n\t" + ); +} + +#endif /* Note: we implement byte, short and int versions of atomic operations using the above kernel helpers; see linux-atomic-64bit.c for "long long" (64-bit) diff --git a/libgcc/config/arm/unwind-arm.h b/libgcc/config/arm/unwind-arm.h index 43c5379..2bf320a 100644 --- a/libgcc/config/arm/unwind-arm.h +++ b/libgcc/config/arm/unwind-arm.h @@ -33,9 +33,33 @@ /* Use IP as a scratch register within the personality routine. */ #define UNWIND_POINTER_REG 12 +#define FDPIC_REGNUM 9 + +#define
Re: [ARM/FDPIC v5 02/21] [ARM] FDPIC: Handle arm*-*-uclinuxfdpiceabi in configure scripts
On 29/08/19 16:54 +0200, Christophe Lyon wrote: On 12/07/2019 08:49, Richard Sandiford wrote: Christophe Lyon writes: The new arm-uclinuxfdpiceabi target behaves pretty much like arm-linux-gnueabi. In order the enable the same set of features, we have to update several configure scripts that generally match targets like *-*-linux*: in most places, we add *-uclinux* where there is already *-linux*, or uclinux* when there is already linux*. In gcc/config.gcc and libgcc/config.host we use *-*-uclinuxfdpiceabi because there is already a different behaviour for *-*uclinux* target. In libtool.m4, we use uclinuxfdpiceabi in cases where ELF shared libraries support is required, as uclinux does not guarantee that. 2019-XX-XX Christophe Lyon config/ * futex.m4: Handle *-uclinux*. * tls.m4 (GCC_CHECK_TLS): Likewise. gcc/ * config.gcc: Handle *-*-uclinuxfdpiceabi. libatomic/ * configure.tgt: Handle arm*-*-uclinux*. * configure: Regenerate. libgcc/ * config.host: Handle *-*-uclinuxfdpiceabi. libitm/ * configure.tgt: Handle *-*-uclinux*. * configure: Regenerate. libstdc++-v3/ * acinclude.m4: Handle uclinux*. * configure: Regenerate. * configure.host: Handle uclinux* * libtool.m4: Handle uclinux*. Has the libtool.m4 patch been submitted to upstream libtool? I think this is supposed to be handled by submitting there first and then cherry-picking into gcc, so that the change isn't lost by a future import. I added a comment to libtool.m4 about this. [...] diff --git a/config/tls.m4 b/config/tls.m4 index 1a5fc59..a487aa4 100644 --- a/config/tls.m4 +++ b/config/tls.m4 @@ -76,7 +76,7 @@ AC_DEFUN([GCC_CHECK_TLS], [ dnl Shared library options may depend on the host; this check dnl is only known to be needed for GNU/Linux. case $host in - *-*-linux*) + *-*-linux* | -*-uclinux*) LDFLAGS="-shared -Wl,--no-undefined $LDFLAGS" ;; esac Is this right for all uclinux targets? I don't think so, now restricted to -*-uclinuxfdpic* diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4 index 84258d8..cb0fdc5 100644 --- a/libstdc++-v3/acinclude.m4 +++ b/libstdc++-v3/acinclude.m4 It'd probably be worth splitting out the libstdc++-v3 bits and submitting them separately, cc:ing libstd...@gcc.gnu.org. But... I've now split the patch into two parts (both attached here) @@ -1404,7 +1404,7 @@ AC_DEFUN([GLIBCXX_ENABLE_LIBSTDCXX_TIME], [ ac_has_nanosleep=yes ac_has_sched_yield=yes ;; - gnu* | linux* | kfreebsd*-gnu | knetbsd*-gnu) + gnu* | linux* | kfreebsd*-gnu | knetbsd*-gnu | uclinux*) AC_MSG_CHECKING([for at least GNU libc 2.17]) AC_TRY_COMPILE( [#include ], is this the right thing to do? It seems odd to be testing the glibc version for uclibc. Do you want to support multiple possible settings of ac_has_clock_monotonic and ac_has_clock_realtime? Or could you just hard-code the values, given particular baseline assumptions about the version of uclibc etc.? Hard-coding would then make @@ -1526,7 +1526,7 @@ AC_DEFUN([GLIBCXX_ENABLE_LIBSTDCXX_TIME], [ if test x"$ac_has_clock_monotonic" != x"yes"; then case ${target_os} in - linux*) + linux* | uclinux*) AC_MSG_CHECKING([for clock_gettime syscall]) AC_TRY_COMPILE( [#include ...this redundant. Right, now fixed. @@ -2415,7 +2415,7 @@ AC_DEFUN([GLIBCXX_ENABLE_CLOCALE], [ # Default to "generic". if test $enable_clocale_flag = auto; then case ${target_os} in - linux* | gnu* | kfreebsd*-gnu | knetbsd*-gnu) + linux* | gnu* | kfreebsd*-gnu | knetbsd*-gnu | uclinux*) enable_clocale_flag=gnu ;; darwin*) This too seems to be choosing a glibc setting for a uclibc target. Indeed. @@ -2661,7 +2661,7 @@ AC_DEFUN([GLIBCXX_ENABLE_ALLOCATOR], [ # Default to "new". if test $enable_libstdcxx_allocator_flag = auto; then case ${target_os} in - linux* | gnu* | kfreebsd*-gnu | knetbsd*-gnu) + linux* | gnu* | kfreebsd*-gnu | knetbsd*-gnu | uclinux*) enable_libstdcxx_allocator_flag=new ;; *) The full case is: # Probe for host-specific support if no specific model is specified. # Default to "new". if test $enable_libstdcxx_allocator_flag = auto; then case ${target_os} in linux* | gnu* | kfreebsd*-gnu | knetbsd*-gnu) enable_libstdcxx_allocator_flag=new ;; *) enable_libstdcxx_allocator_flag=new ;; esac fi which looks a bit redundant :-) Right :-) Thanks, Christophe Thanks, Richard . From 81c84839b8f004b7b52317850f27f58e05bec6ad Mon Sep 17 00:00:00 2001 From: Christophe Lyon Date: Fri, 4 May 2018 15:11:35 + Subject: [ARM/FDPIC v6 02/24] [ARM] FDPIC: Han
[PATCH] Optimize to_chars
Bunch of micro optimizations for std::to_chars: * For base == 8 replacing the lookup in __digits table with arithmetic computations leads to a same CPU cycles for a loop (exchanges two movzx with 3 bit ops https://godbolt.org/z/RTui7m ). However this saves 129 bytes of data and totally avoids a chance of cache misses on __digits. * For base == 16 replacing the lookup in __digits table with arithmetic computations leads to a few additional instructions, but totally avoids a chance of cache misses on __digits (- ~9 cache misses for worst case) and saves 513 bytes of const data. * Replacing __first[pos] and __first[pos - 1] with __first[1] and __first[0] on final iterations saves ~2% of code size. * Removing trailing '\0' from arrays of digits allows the linker to merge the symbols (so that "0123456789abcdefghijklmnopqrstuvwxyz" and "0123456789abcdef" could share the same address). This improves data locality and reduces binary sizes. * Using __detail::__to_chars_len_2 instead of a generic __detail::__to_chars_len makes the operation O(1) instead of O(N). It also makes the code two times shorter ( https://godbolt.org/z/Peq_PG) . In sum: this significantly reduces the size of a binary (for about 4KBs only for base-8 conversion https://godbolt.org/z/WPKijS ), deals with latency (CPU cache misses) without changing the iterations count and without adding costly instructions into the loops. Changelog: * include/std/charconv (__detail::__to_chars_8, __detail::__to_chars_16): Replace array of precomputed digits with arithmetic operations to avoid CPU cache misses. Remove zero termination from array of digits to allow symbol merge with generic implementation of __detail::__to_chars. Replace final offsets with constants. Use __detail::__to_chars_len_2 instead of a generic __detail::__to_chars_len. * include/std/charconv (__detail::__to_chars): Remove zero termination from array of digits. * include/std/charconv (__detail::__to_chars_2): Leading digit is always '1'. -- Best regards, Antony Polukhin diff --git a/libstdc++-v3/ChangeLog b/libstdc++-v3/ChangeLog index eaa6f74..35706d0 100644 --- a/libstdc++-v3/ChangeLog +++ b/libstdc++-v3/ChangeLog @@ -1,3 +1,17 @@ +2019-08-30 Antony Polukhin + + * include/std/charconv (__detail::__to_chars_8, + __detail::__to_chars_16): Replace array of precomputed digits + with arithmetic operations to avoid CPU cache misses. Remove + zero termination from array of digits to allow symbol merge with + generic implementation of __detail::__to_chars. Replace final + offsets with constants. Use __detail::__to_chars_len_2 instead + of a generic __detail::__to_chars_len. + * include/std/charconv (__detail::__to_chars): Remove + zero termination from array of digits. + * include/std/charconv (__detail::__to_chars_2): Leading digit + is always '1'. + 2019-08-29 Jonathan Wakely PR libstdc++/91067 diff --git a/libstdc++-v3/include/std/charconv b/libstdc++-v3/include/std/charconv index 53aa63e..4e94c39 100644 --- a/libstdc++-v3/include/std/charconv +++ b/libstdc++-v3/include/std/charconv @@ -131,7 +131,7 @@ namespace __detail : 1u; } else - return __to_chars_len(__value, 8); + return (__to_chars_len_2(__value) + 2) / 3; } // Generic implementation for arbitrary bases. @@ -155,8 +155,12 @@ namespace __detail unsigned __pos = __len - 1; - static constexpr char __digits[] - = "0123456789abcdefghijklmnopqrstuvwxyz"; + static constexpr char __digits[] = { + '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', + 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', + 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', + 'u', 'v', 'w', 'x', 'y', 'z' + }; while (__val >= __base) { @@ -181,7 +185,7 @@ namespace __detail to_chars_result __res; - const unsigned __len = __to_chars_len(__val, 0x10); + const unsigned __len = (__to_chars_len_2(__val) + 3) / 4; if (__builtin_expect((__last - __first) < __len, 0)) { @@ -190,32 +194,30 @@ namespace __detail return __res; } - static constexpr char __digits[513] = - "000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f" - "202122232425262728292a2b2c2d2e2f303132333435363738393a3b3c3d3e3f" - "404142434445464748494a4b4c4d4e4f505152535455565758595a5b5c5d5e5f" - "606162636465666768696a6b6c6d6e6f707172737475767778797a7b7c7d7e7f" - "808182838485868788898a8b8c8d8e8f909192939495969798999a9b9c9d9e9f" - "a0a1a2a3a4a5a6a7a8a9aaabacadaeafb0b1b2b3b4b5b6b7b8b9babbbcbdbebf" - "c0c1c2c3c4c5c6c7c8c9cacbcccdcecfd0d1d2d3d4d5d6d7d8d9dadbdcdddedf" - "e0e1e2e3e4e5e6e7e8e9eaebecedeeeff0f1f2f3f4f5f6f7f8f9fafbfcfdfeff"; + static constexpr char __digits[] = { + '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', + 'a',
[Ada] Add assertion for Size attribute et al.
The type to which, or the type of the object to which, it is applied must be frozen when the Size attribute, or a variant thereof, is processed. Tested on x86_64-suse-linux, applied on the mainline. 2019-08-30 Eric Botcazou * gcc-interface/trans.c (Attribute_to_gnu) : Add assertion. -- Eric BotcazouIndex: gcc-interface/trans.c === --- gcc-interface/trans.c (revision 275174) +++ gcc-interface/trans.c (working copy) @@ -2351,6 +2351,9 @@ Attribute_to_gnu (Node_Id gnat_node, tree *gnu_res gnu_type = TREE_TYPE (DECL_CHAIN (TYPE_FIELDS (gnu_type))); } + /* The type must be frozen at this point. */ + gcc_assert (COMPLETE_TYPE_P (gnu_type)); + /* If we're looking for the size of a field, return the field size. */ if (TREE_CODE (gnu_prefix) == COMPONENT_REF) gnu_result = DECL_SIZE (TREE_OPERAND (gnu_prefix, 1));
Re: [PATCH v2 0/9] S/390: Use signaling FP comparison instructions
> Am 30.08.2019 um 09:16 schrieb Richard Biener : > > On Fri, Aug 30, 2019 at 9:12 AM Richard Biener > wrote: >> >> On Thu, Aug 29, 2019 at 5:39 PM Ilya Leoshkevich wrote: >>> Am 22.08.2019 um 15:45 schrieb Ilya Leoshkevich : Bootstrap and regtest running on x86_64-redhat-linux and s390x-redhat-linux. This patch series adds signaling FP comparison support (both scalar and vector) to s390 backend. >>> >>> I'm running into a problem on ppc64 with this patch, and it would be >>> great if someone could help me figure out the best way to resolve it. >>> >>> vector36.C test is failing because gimplifier produces the following >>> >>> _5 = _4 > { 2.0e+0, 2.0e+0, 2.0e+0, 2.0e+0 }; >>> _6 = VEC_COND_EXPR <_5, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }>; >>> >>> from >>> >>> VEC_COND_EXPR < (*b > { 2.0e+0, 2.0e+0, 2.0e+0, 2.0e+0 }) , >>> { -1, -1, -1, -1 } , >>> { 0, 0, 0, 0 } > >>> >>> Since the comparison tree code is now hidden behind a temporary, my code >>> does not have anything to pass to the backend. The reason for creating >>> a temporary is that the comparison can trap, and so the following check >>> in gimplify_expr fails: >>> >>> if (gimple_seq_empty_p (internal_post) && (*gimple_test_f) (*expr_p)) >>>goto out; >>> >>> gimple_test_f is is_gimple_condexpr, and it eventually calls >>> operation_could_trap_p (GT). >>> >>> My current solution is to simply state that backend does not support >>> SSA_NAME in vector comparisons, however, I don't like it, since it may >>> cause performance regressions due to having to fall back to scalar >>> comparisons. >>> >>> I was thinking about two other possible solutions: >>> >>> 1. Change the gimplifier to allow trapping vector comparisons. That's >>> a bit complicated, because tree_could_throw_p checks not only for >>> floating point traps, but also e.g. for array index out of bounds >>> traps. So I would have to create a tree_could_throw_p version which >>> disregards specific kinds of traps. >>> >>> 2. Change expand_vector_condition to follow SSA_NAME_DEF_STMT and use >>> its tree_code instead of SSA_NAME. The potential problem I see with >>> this is that there appears to be no guarantee that _5 will be inlined >>> into _6 at a later point. So if we say that we don't need to fall >>> back to scalar comparisons based on availability of vector > >>> instruction and inlining does not happen, then what's actually will >>> be required is vector selection (vsel on S/390), which might not be >>> available in general case. >>> >>> What would be a better way to proceed here? >> >> On GIMPLE there isn't a good reason to split out trapping comparisons >> from [VEC_]COND_EXPR - the gimplifier does this for GIMPLE_CONDs >> where it is important because we'd have no way to represent EH info >> when not done. It might be a bit awkward to preserve EH across RTL >> expansion though in case the [VEC_]COND_EXPR are not expanded >> as a single pattern, but I'm not sure. >> >> To go this route you'd have to split the is_gimple_condexpr check >> I guess and eventually users turning [VEC_]COND_EXPR into conditional >> code (do we have any?) have to be extra careful then. > > Oh, btw - the fact that we have an expression embedded in [VEC_]COND_EXPR > is something that bothers me for quite some time already and it makes > things like VN awkward and GIMPLE fincky. We've discussed alternatives > to dead with the simplest being moving the comparison out to a separate > stmt and others like having four operand [VEC_]COND_{EQ,NE,...}_EXPR > codes or simply treating {EQ,NE,...}_EXPR as quarternary on GIMPLE > with either optional 3rd and 4th operand (defaulting to > boolean_true/false_node) > or always explicit ones (and thus dropping [VEC_]COND_EXPR). > > What does LLVM do here? For void f(long long * restrict w, double * restrict x, double * restrict y, int n) { for (int i = 0; i < n; i++) w[i] = x[i] == y[i] ? x[i] : y[i]; } LLVM does %26 = fcmp oeq <2 x double> %21, %25 %27 = extractelement <2 x i1> %26, i32 0 %28 = select <2 x i1> %26, <2 x double> %21, <2 x double> %25 So they have separate operations for comparisons and ternary operator (fcmp + select).
Backports to 7.5 (part 2)
Hi! I've backported 97 commits from trunk to 7.5, bootstrapped/regtested them on x86_64-linux and i686-linux and committed. Here is the second half of them. Jakub 2019-08-30 Jakub Jelinek Backported from mainline 2019-02-20 Jakub Jelinek PR middle-end/88074 PR middle-end/89415 * toplev.c (do_compile): Double the emin/emax exponents to workaround buggy mpc_norm. 2019-02-19 Richard Biener PR middle-end/88074 * toplev.c (do_compile): Initialize mpfr's exponent range based on available float modes. 2019-02-20 Jakub Jelinek PR middle-end/88074 PR middle-end/89415 * gcc.dg/pr88074-2.c: New test. 2019-02-19 Richard Biener PR middle-end/88074 * gcc.dg/pr88074.c: New testcase. --- gcc/toplev.c(revision 270711) +++ gcc/toplev.c(revision 270712) @@ -1981,6 +1981,36 @@ do_compile () else int_n_enabled_p[i] = false; + /* Initialize mpfrs exponent range. This is important to get + underflow/overflow in a reasonable timeframe. */ + machine_mode mode; + int min_exp = -1; + int max_exp = 1; + for (mode = GET_CLASS_NARROWEST_MODE (MODE_FLOAT); + mode != VOIDmode; + mode = GET_MODE_WIDER_MODE (mode)) + if (SCALAR_FLOAT_MODE_P (mode)) + { + const real_format *fmt = REAL_MODE_FORMAT (mode); + if (fmt) + { + /* fmt->emin - fmt->p + 1 should be enough but the + back-and-forth dance in real_to_decimal_for_mode we + do for checking fails due to rounding effects then. */ + if ((fmt->emin - fmt->p) < min_exp) + min_exp = fmt->emin - fmt->p; + if (fmt->emax > max_exp) + max_exp = fmt->emax; + } + } + /* E.g. mpc_norm assumes it can square a number without bothering with +with range scaling, so until that is fixed, double the minimum +and maximum exponents, plus add some buffer for arithmetics +on the squared numbers. */ + if (mpfr_set_emin (2 * (min_exp - 1)) + || mpfr_set_emax (2 * (max_exp + 1))) + sorry ("mpfr not configured to handle all float modes"); + /* Set up the back-end if requested. */ if (!no_backend) backend_init (); --- gcc/testsuite/gcc.dg/pr88074.c (nonexistent) +++ gcc/testsuite/gcc.dg/pr88074.c (revision 270712) @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-options "-O" } */ + +#include + +int main() +{ + _Complex double x; + __real x = 3.091e+8; + __imag x = -4.045e+8; + /* This used to spend huge amounts of compile-time inside mpc. */ + volatile _Complex double y = ctan (x); + return 0; +} --- gcc/testsuite/gcc.dg/pr88074-2.c(nonexistent) +++ gcc/testsuite/gcc.dg/pr88074-2.c(revision 270712) @@ -0,0 +1,17 @@ +/* PR middle-end/88074 */ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-optimized" } */ +/* { dg-add-options float128 } */ +/* { dg-require-effective-target float128 } */ +/* { dg-final { scan-tree-dump-not "link_error " "optimized" } } */ + +extern void link_error (void); +int +main () +{ + if (((__FLT128_MAX__ * 0.5 + __FLT128_MAX__ * 0.5i) + / (__FLT128_MAX__ * 0.25 + __FLT128_MAX__ * 0.25i)) + != (_Complex _Float128) 2) +link_error (); + return 0; +} 2019-08-30 Jakub Jelinek Backported from mainline 2019-02-20 Jakub Jelinek David Malcolm PR middle-end/89091 * fold-const.c (decode_field_reference): Return NULL_TREE if lang_hooks.types.type_for_size returns NULL. Check it before overwriting *exp_. Use return NULL_TREE instead of return 0. * gcc.dg/torture/pr89091.c: New test. --- gcc/fold-const.c(revision 270712) +++ gcc/fold-const.c(revision 270713) @@ -4239,7 +4239,7 @@ decode_field_reference (location_t loc, There are problems with FP fields since the type_for_size call below can fail for, e.g., XFmode. */ if (! INTEGRAL_TYPE_P (TREE_TYPE (exp))) -return 0; +return NULL_TREE; /* We are interested in the bare arrangement of bits, so strip everything that doesn't affect the machine mode. However, record the type of the @@ -4255,7 +4255,7 @@ decode_field_reference (location_t loc, exp = TREE_OPERAND (exp, 0); STRIP_NOPS (exp); STRIP_NOPS (and_mask); if (TREE_CODE (and_mask) != INTEGER_CST) - return 0; + return NULL_TREE; } poly_int64 poly_bitsize, poly_bitpos; @@ -4271,7 +4271,11 @@ decode_field_reference (location_t loc, || (! AGGREGATE_TYPE_P (TREE_TYPE (inner)) && compare_tree_int (TYPE_SIZE (TREE_TYPE (inner)), *pbitpos + *pbitsize) < 0)) -return 0; +return NULL_TREE; + + unsigned_type = lang_hooks.types.type_for_siz
Backports to 7.5 (part 1)
Hi! I've backported 97 commits from trunk to 7.5, bootstrapped/regtested them on x86_64-linux and i686-linux and committed. Here is the first half of them, second half will follow in another mail. Jakub 2019-08-30 Jakub Jelinek Backported from mainline 2018-10-19 Jakub Jelinek PR middle-end/85488 PR middle-end/87649 * omp-low.c (check_omp_nesting_restrictions): Diagnose ordered without depend closely nested inside of loop with ordered clause with a parameter. * c-c++-common/gomp/doacross-2.c: New test. * c-c++-common/gomp/sink-3.c: Expect another error during error recovery. --- gcc/omp-low.c (revision 265800) +++ gcc/omp-low.c (revision 265801) @@ -2835,14 +2835,25 @@ check_omp_nesting_restrictions (gimple * case GIMPLE_OMP_FOR: if (gimple_omp_for_kind (ctx->stmt) == GF_OMP_FOR_KIND_TASKLOOP) goto ordered_in_taskloop; - if (omp_find_clause (gimple_omp_for_clauses (ctx->stmt), -OMP_CLAUSE_ORDERED) == NULL) + tree o; + o = omp_find_clause (gimple_omp_for_clauses (ctx->stmt), +OMP_CLAUSE_ORDERED); + if (o == NULL) { error_at (gimple_location (stmt), "% region must be closely nested inside " "a loop region with an % clause"); return false; } + if (OMP_CLAUSE_ORDERED_EXPR (o) != NULL_TREE + && omp_find_clause (c, OMP_CLAUSE_DEPEND) == NULL_TREE) + { + error_at (gimple_location (stmt), + "% region without % clause may " + "not be closely nested inside a loop region with " + "an % clause with a parameter"); + return false; + } return true; case GIMPLE_OMP_TARGET: if (gimple_omp_target_kind (ctx->stmt) --- gcc/testsuite/c-c++-common/gomp/sink-3.c(revision 265800) +++ gcc/testsuite/c-c++-common/gomp/sink-3.c(revision 265801) @@ -14,7 +14,7 @@ foo () for (i=0; i < 100; ++i) { #pragma omp ordered depend(sink:poo-1,paa+1) /* { dg-error "poo.*declared.*paa.*declared" } */ -bar(&i); +bar(&i);/* { dg-error "may not be closely nested" "" { target *-*-* } .-1 } */ #pragma omp ordered depend(source) } } --- gcc/testsuite/c-c++-common/gomp/doacross-2.c(nonexistent) +++ gcc/testsuite/c-c++-common/gomp/doacross-2.c(revision 265801) @@ -0,0 +1,49 @@ +/* PR middle-end/87649 */ + +void +foo (void) +{ + int i; + #pragma omp for ordered(1) + for (i = 0; i < 64; i++) +{ + #pragma omp ordered /* { dg-error "'ordered' region without 'depend' clause may not be closely nested inside a loop region with an 'ordered' clause with a parameter" } */ + ; +} + #pragma omp for ordered(1) + for (i = 0; i < 64; i++) +{ + #pragma omp ordered threads /* { dg-error "'ordered' region without 'depend' clause may not be closely nested inside a loop region with an 'ordered' clause with a parameter" } */ + ; +} +} + +void +bar (void) +{ + int i; + #pragma omp for ordered + for (i = 0; i < 64; i++) +{ + #pragma omp ordered depend(source) /* { dg-error "'ordered' construct with 'depend' clause must be closely nested inside a loop with 'ordered' clause with a parameter" } */ + #pragma omp ordered depend(sink: i - 1) /* { dg-error "'ordered' construct with 'depend' clause must be closely nested inside a loop with 'ordered' clause with a parameter" } */ +} + #pragma omp for + for (i = 0; i < 64; i++) +{ + #pragma omp ordered depend(source) /* { dg-error "'ordered' construct with 'depend' clause must be closely nested inside a loop with 'ordered' clause with a parameter" } */ + #pragma omp ordered depend(sink: i - 1) /* { dg-error "'ordered' construct with 'depend' clause must be closely nested inside a loop with 'ordered' clause with a parameter" } */ +} + #pragma omp for + for (i = 0; i < 64; i++) +{ + #pragma omp ordered /* { dg-error "'ordered' region must be closely nested inside a loop region with an 'ordered' clause" } */ + ; +} + #pragma omp for + for (i = 0; i < 64; i++) +{ + #pragma omp ordered threads /* { dg-error "'ordered' region must be closely nested inside a loop region with an 'ordered' clause" } */ + ; +} +} 2019-08-30 Jakub Jelinek Backported from mainline 2018-10-20 Jakub Jelinek PR middle-end/87647 * varasm.c (decode_addr_const): Handle COMPOUND_LITERAL_EXPR. * gcc.c-torture/compile/pr87647.c: New test. --- gcc/varasm.c(revision 265801) +++ gcc/varasm.
Re: C++ PATCH for P1152R4: Deprecating some uses of volatile (PR c++/91361)
On Thu, Aug 29, 2019 at 03:14:36PM -0600, Martin Sebor wrote: > On 8/28/19 5:56 PM, Marek Polacek wrote: > > --- gcc/doc/invoke.texi > > +++ gcc/doc/invoke.texi > > @@ -3516,6 +3516,19 @@ result in a call to @code{terminate}. > > Disable the warning about the case when a conversion function converts an > > object to the same type, to a base class of that type, or to void; such > > a conversion function will never be called. > > + > > +@item -Wvolatile @r{(C++ and Objective-C++ only)} > > +@opindex Wvolatile > > +@opindex Wno-volatile > > +Warn about deprecated uses of the volatile qualifier. This includes > > postfix > > +and prefix @code{++} and @code{--} expressions of volatile-qualified types, > > +using simple assignments where the left operand is a volatile-qualified > > +non-class type for their value, compound assignments where the left operand > > +is a volatile-qualified non-class type, volatile-qualified function return > > +type, volatile-qualified parameter type, and structured bindings of a > > +volatile-qualified type. This usage was deprecated in C++20. > > Just a minor thing: Since the text uses volatile as a keyword > (as opposed to an adjective) it should be @code{volatile}, > analogously how it's quoted in warnings. Oh right, thanks. Fixed with: 2019-08-30 Marek Polacek * doc/invoke.texi (-Wvolatile): Use @code for volatile. diff --git gcc/doc/invoke.texi gcc/doc/invoke.texi index aa9886ee09f..44a8801e558 100644 --- gcc/doc/invoke.texi +++ gcc/doc/invoke.texi @@ -3520,13 +3520,14 @@ a conversion function will never be called. @item -Wvolatile @r{(C++ and Objective-C++ only)} @opindex Wvolatile @opindex Wno-volatile -Warn about deprecated uses of the volatile qualifier. This includes postfix -and prefix @code{++} and @code{--} expressions of volatile-qualified types, -using simple assignments where the left operand is a volatile-qualified -non-class type for their value, compound assignments where the left operand -is a volatile-qualified non-class type, volatile-qualified function return -type, volatile-qualified parameter type, and structured bindings of a -volatile-qualified type. This usage was deprecated in C++20. +Warn about deprecated uses of the @code{volatile} qualifier. This includes +postfix and prefix @code{++} and @code{--} expressions of +@code{volatile}-qualified types, using simple assignments where the left +operand is a @code{volatile}-qualified non-class type for their value, +compound assignments where the left operand is a @code{volatile}-qualified +non-class type, @code{volatile}-qualified function return type, +@code{volatile}-qualified parameter type, and structured bindings of a +@code{volatile}-qualified type. This usage was deprecated in C++20. Enabled by default with @option{-std=c++2a}. @end table
[PATCH] PR libstdc++/89164 enforce constraints for uninitialized algos
The memmove optimizations for std::uninitialized_copy/fill/_n will compile even if the type is not copy constructible, because std::copy doesn't require copy construction to work. But the uninitialized algorithms do require it. This adds explicit static assertions to ensure we don't allow ill-formed initializations. PR libstdc++/89164 * include/bits/stl_algobase.h (__copy_move): Give descriptive names to template parameters. * include/bits/stl_uninitialized.h (uninitialized_copy) (uninitialized_fill, uninitialized_fill_n): Add static assertions to diagnose invalid uses. * testsuite/20_util/specialized_algorithms/uninitialized_copy/1.cc: Adjust expected error. * testsuite/20_util/specialized_algorithms/uninitialized_copy/89164.cc: New test. * testsuite/20_util/specialized_algorithms/uninitialized_copy_n/ 89164.cc: New test. * testsuite/20_util/specialized_algorithms/uninitialized_fill/89164.cc: New test. * testsuite/20_util/specialized_algorithms/uninitialized_fill_n/ 89164.cc: New test. * testsuite/23_containers/vector/cons/89164.cc: New test. * testsuite/23_containers/vector/cons/89164_c++17.cc: New test. This is the patch sent in https://gcc.gnu.org/ml/libstdc++/2019-02/msg00034.html but with the dg-error directives in the tests fixed. Tested x86_64-linux, committed to trunk. commit dde7756ade2ca2d9d886a4da66cdea8ddd493294 Author: Jonathan Wakely Date: Fri Aug 30 14:46:42 2019 +0100 PR libstdc++/89164 enforce constraints for uninitialized algos The memmove optimizations for std::uninitialized_copy/fill/_n will compile even if the type is not copy constructible, because std::copy doesn't require copy construction to work. But the uninitialized algorithms do require it. This adds explicit static assertions to ensure we don't allow ill-formed initializations. PR libstdc++/89164 * include/bits/stl_algobase.h (__copy_move): Give descriptive names to template parameters. * include/bits/stl_uninitialized.h (uninitialized_copy) (uninitialized_fill, uninitialized_fill_n): Add static assertions to diagnose invalid uses. * testsuite/20_util/specialized_algorithms/uninitialized_copy/1.cc: Adjust expected error. * testsuite/20_util/specialized_algorithms/uninitialized_copy/89164.cc: New test. * testsuite/20_util/specialized_algorithms/uninitialized_copy_n/ 89164.cc: New test. * testsuite/20_util/specialized_algorithms/uninitialized_fill/89164.cc: New test. * testsuite/20_util/specialized_algorithms/uninitialized_fill_n/ 89164.cc: New test. * testsuite/23_containers/vector/cons/89164.cc: New test. * testsuite/23_containers/vector/cons/89164_c++17.cc: New test. diff --git a/libstdc++-v3/include/bits/stl_algobase.h b/libstdc++-v3/include/bits/stl_algobase.h index 36bb9ccb777..98d324827ed 100644 --- a/libstdc++-v3/include/bits/stl_algobase.h +++ b/libstdc++-v3/include/bits/stl_algobase.h @@ -359,7 +359,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION // (2) If we're using random access iterators, then write the loop as // a for loop with an explicit count. - template + template struct __copy_move { template diff --git a/libstdc++-v3/include/bits/stl_uninitialized.h b/libstdc++-v3/include/bits/stl_uninitialized.h index b29395cb7c0..cf42cbd9c76 100644 --- a/libstdc++-v3/include/bits/stl_uninitialized.h +++ b/libstdc++-v3/include/bits/stl_uninitialized.h @@ -130,9 +130,15 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION #if __cplusplus < 201103L const bool __assignable = true; #else - // trivial types can have deleted assignment + // Trivial types can have deleted copy constructor, but the std::copy + // optimization that uses memmove would happily "copy" them anyway. + static_assert(is_constructible<_ValueType2, decltype(*__first)>::value, + "result type must be constructible from value type of input range"); + typedef typename iterator_traits<_InputIterator>::reference _RefType1; typedef typename iterator_traits<_ForwardIterator>::reference _RefType2; + // Trivial types can have deleted assignment, so using std::copy + // would be ill-formed. Require assignability before using std::copy: const bool __assignable = is_assignable<_RefType2, _RefType1>::value; #endif @@ -197,7 +203,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION #if __cplusplus < 201103L const bool __assignable = true; #else - // trivial types can have deleted assignment + // Trivial types can have deleted copy constructor, but the std::fill + // optimization that uses memmove would happily "copy" them anyway. + static_assert(is_constructible<_ValueType, c
[Ada] Remove range check generation code in gigi
The generation of range checks is now entirely done by the front-end proper so the code to that effect in gigi is obsolete and can be removed, but the patch implements a watchdog mechanism to make sure no check is lost in the process. Tested on x86_64-suse-linux, applied on the mainline. 2019-08-30 Eric Botcazou * gcc-interface/gigi.h (gigi_checking_assert): New macro. * gcc-interface/decl.c (gnat_to_gnu_entity) : Remove redundant test and adjust comments. Minor tweaks. * gcc-interface/trans.c (Call_to_gnu): Do not generate range checks, instead assert that the Do_Range_Check flag is not set. Adjust call to convert_with_check. (gnat_to_gnu): Likewise. (assoc_to_constructor): Likewise. (pos_to_constructor): Likewise. Remove GNAT_COMPONENT_TYPE parameter. (emit_range_check): Delete. (convert_with_check): Remove RANGE_P parameter and adjust. Do a single overflow check for modular types. -- Eric BotcazouIndex: gcc-interface/decl.c === --- gcc-interface/decl.c (revision 275062) +++ gcc-interface/decl.c (working copy) @@ -447,13 +447,13 @@ gnat_to_gnu_entity (Entity_Id gnat_entity, tree gn If we are not defining it, it must be a type or an entity that is defined elsewhere or externally, otherwise we should have defined it already. */ gcc_assert (definition - || type_annotate_only || is_type || kind == E_Discriminant || kind == E_Component || kind == E_Label || (kind == E_Constant && Present (Full_View (gnat_entity))) - || Is_Public (gnat_entity)); + || Is_Public (gnat_entity) + || type_annotate_only); /* Get the name of the entity and set up the line number and filename of the original definition for use in any decl we make. Make sure we do @@ -1758,34 +1758,29 @@ gnat_to_gnu_entity (Entity_Id gnat_entity, tree gn case E_Modular_Integer_Type: { - /* For modular types, make the unsigned type of the proper number - of bits and then set up the modulus, if required. */ - tree gnu_modulus, gnu_high = NULL_TREE; - /* Packed Array Impl. Types are supposed to be subtypes only. */ gcc_assert (!Is_Packed_Array_Impl_Type (gnat_entity)); + /* For modular types, make the unsigned type of the proper number + of bits and then set up the modulus, if required. */ gnu_type = make_unsigned_type (esize); - /* Get the modulus in this type. If it overflows, assume it is because - it is equal to 2**Esize. Note that there is no overflow checking - done on unsigned type, so we detect the overflow by looking for - a modulus of zero, which is otherwise invalid. */ - gnu_modulus = UI_To_gnu (Modulus (gnat_entity), gnu_type); + /* Get the modulus in this type. If the modulus overflows, assume + that this is because it was equal to 2**Esize. Note that there + is no overflow checking done on unsigned types, so we detect the + overflow by looking for a modulus of zero, which is invalid. */ + tree gnu_modulus = UI_To_gnu (Modulus (gnat_entity), gnu_type); + /* If the modulus is not 2**Esize, then this also means that the upper + bound of the type, i.e. modulus - 1, is not maximal, so we create an + extra subtype to carry it and set the modulus on the base type. */ if (!integer_zerop (gnu_modulus)) { + TYPE_NAME (gnu_type) = create_concat_name (gnat_entity, "UMT"); TYPE_MODULAR_P (gnu_type) = 1; SET_TYPE_MODULUS (gnu_type, gnu_modulus); - gnu_high = fold_build2 (MINUS_EXPR, gnu_type, gnu_modulus, -build_int_cst (gnu_type, 1)); - } - - /* If the upper bound is not maximal, make an extra subtype. */ - if (gnu_high - && !tree_int_cst_equal (gnu_high, TYPE_MAX_VALUE (gnu_type))) - { - TYPE_NAME (gnu_type) = create_concat_name (gnat_entity, "UMT"); + tree gnu_high = fold_build2 (MINUS_EXPR, gnu_type, gnu_modulus, + build_int_cst (gnu_type, 1)); gnu_type = create_extra_subtype (gnu_type, TYPE_MIN_VALUE (gnu_type), gnu_high); @@ -2987,8 +2982,8 @@ gnat_to_gnu_entity (Entity_Id gnat_entity, tree gn || Present (Record_Extension_Part (record_definition))) record_definition = Record_Extension_Part (record_definition); - gcc_assert (type_annotate_only - || Present (Parent_Subtype (gnat_entity))); + gcc_assert (Present (Parent_Subtype (gnat_entity)) + || type_annotate_only); } /* Make a node for the record. If we are not defining the record, Index: gcc-interface/gigi.h === --- gcc-interface/gigi.h (revision 275062) +++ gcc-interface/gigi.h (working copy) @@ -6,7 +6,7 @@ * * * C Header File * *
Re: [PATCH] Fix unused malloc return value warning
On 29/08/19 21:54 +0200, François Dumont wrote: Hi I am having this warning: /home/fdt/dev/gcc/git/libstdc++-v3/testsuite/util/testsuite_performance.h:170: attention: ignoring return value of « void* malloc(size_t) » declared with attribute « warn_unused_result » [-Wunused-result] 170 | malloc(0); // Needed for some implementations. Ok to fix it with attached patch ? OK for trunk. It seems trivial but I wonder if I shouldn't keep the malloc returned pointer and free it properly ? It's not causing any problems (it's only the testsuite) so let's not worry about it. Or maybe just remove the malloc cause there is not clear comment explaining why it's needed and I haven't found much in SVN audit trail. The comment says it's needed, so let's assume that's true. * testsuite_files/util/testsuite_performance.h (resource_counter::start): Ignore unused malloc(0) result. François diff --git a/libstdc++-v3/testsuite/util/testsuite_performance.h b/libstdc++-v3/testsuite/util/testsuite_performance.h index 556c78159be..8abc77cf31a 100644 --- a/libstdc++-v3/testsuite/util/testsuite_performance.h +++ b/libstdc++-v3/testsuite/util/testsuite_performance.h @@ -167,7 +167,7 @@ namespace __gnu_test { if (getrusage(who, &rusage_begin) != 0 ) memset(&rusage_begin, 0, sizeof(rusage_begin)); - malloc(0); // Needed for some implementations. + void* p __attribute__((unused)) = malloc(0); // Needed for some implementations. allocation_begin = mallinfo(); }
[preprocessor] Popping "" file names
On the modules branch, I needed a bunch of tests checking the interaction of #include nesting and the declarations therein. Doing this with the current testsuite infrastructure is quite awkward. Using # "name" [12] directives works just fine, /except/ in the unnesting case. There we need "name" to match the name of the main file. But we don't know that, because it's /some/directory/src/gcc//mytest.C Hence this change to the directive. when unnesting, if the given filename is "", we simply use the filename we already expect to be popped to. I've wanted this occasionally in the past, but my current requirements pushed me to implement this. 1) if the popped-to filename is actually "", the right thing happens anyway. So existing sources using this form of line control continue to be accepted. 2) the IS_MAIN_FILE test about mismatches isn't needed -- if we're in the main file, we'll have a NULL 'from' linemap pointer. 3) We do not document this form of the line control. Only the standard-mandated '#line' form, and that doesn't permit the trailing push/pop/system-header flags. I intend to commit this next week, to allow for comments. nathan -- Nathan Sidwell 2019-08-30 Nathan Sidwell New # semantics for popping to "" name. libcpp/ * directives.c (do_linemarker): Popping to "" name means get the name from the include stack.. Index: libcpp/directives.c === --- libcpp/directives.c (revision 275032) +++ libcpp/directives.c (working copy) @@ -1085,7 +1085,15 @@ do_linemarker (cpp_reader *pfile) const line_map_ordinary *from = linemap_included_from_linemap (line_table, map); - if (MAIN_FILE_P (map) - || (from - && filename_cmp (ORDINARY_MAP_FILE_NAME (from), new_file) != 0)) + + if (!from) + /* Not nested. */; + else if (!new_file[0]) + /* Leaving to "" means fill in the popped-to name. */ + new_file = ORDINARY_MAP_FILE_NAME (from); + else if (filename_cmp (ORDINARY_MAP_FILE_NAME (from), new_file) != 0) + /* It's the wrong name, Grommit! */ + from = NULL; + + if (!from) { cpp_warning (pfile, CPP_W_NONE, @@ -1095,4 +1103,5 @@ do_linemarker (cpp_reader *pfile) } } + /* Compensate for the increment in linemap_add that occurs in _cpp_do_file_change. We're currently at the start of the line Index: gcc/testsuite/c-c++-common/cpp/line-1.c === --- gcc/testsuite/c-c++-common/cpp/line-1.c (revision 0) +++ gcc/testsuite/c-c++-common/cpp/line-1.c (working copy) @@ -0,0 +1,20 @@ +/* { dg-do preprocess } */ +/* { dg-additional-options -Wno-pedantic } */ + +main-1 __FILE__ + +# 7 "inner.h" 1 +inner-1 __FILE__ +# 9 "inside.h" 1 +inside-1 __FILE__ +# 11 "" 2 +inner-2 __FILE__ +#13 "" 2 +main-2 __FILE__ + + +/* { dg-final { scan-file line-1.i "main-1 \"\[^\n]*line-1.c\"\n" } } */ +/* { dg-final { scan-file line-1.i "main-2 \"\[^\n]*line-1.c\"\n" } } */ +/* { dg-final { scan-file line-1.i "inner-1 \"inner.h\"\n" } } */ +/* { dg-final { scan-file line-1.i "inner-2 \"inner.h\"\n" } } */ +/* { dg-final { scan-file line-1.i "inside-1 \"inside.h\"\n" } } */
Re: [PATCH] Add .pd extension to c_exts.
On Fri, Aug 30, 2019 at 2:31 PM Alexander Monakov wrote: > > > > On Fri, 30 Aug 2019, Richard Biener wrote: > > > On Fri, Aug 30, 2019 at 12:58 PM Martin Liška wrote: > > > > > > Hi. > > > > > > I would like to add .pd to c_exts so that one > > > can have correctly set tab width, etc. > > > > But then it will auto-indent with too much spaces, no? > > I think it's fine, the script does > > setlocal cindent > setlocal cinoptions=>4,n-2,{2,^-2,:2,=2,g0,f0,h2,p4,t0,+2,(0,u0,w1,m0 > > so it's a bit smarter than indenting everything by 2 spaces :) > > So +1 from me for the patch, but note that now we have a contrib/vim-gcc-dev > "plugin" directory, it might be more natural to add gcc-match indent rules > there. Yeah, that would be better than claiming .pd is C ... Richard. > > Alexander
Re: [PATCH] Add .pd extension to c_exts.
On Fri, 30 Aug 2019, Richard Biener wrote: > On Fri, Aug 30, 2019 at 12:58 PM Martin Liška wrote: > > > > Hi. > > > > I would like to add .pd to c_exts so that one > > can have correctly set tab width, etc. > > But then it will auto-indent with too much spaces, no? I think it's fine, the script does setlocal cindent setlocal cinoptions=>4,n-2,{2,^-2,:2,=2,g0,f0,h2,p4,t0,+2,(0,u0,w1,m0 so it's a bit smarter than indenting everything by 2 spaces :) So +1 from me for the patch, but note that now we have a contrib/vim-gcc-dev "plugin" directory, it might be more natural to add gcc-match indent rules there. Alexander
Re: [PATCH] Add .pd extension to c_exts.
On 8/30/19 2:09 PM, Richard Biener wrote: > But then it will auto-indent with too much spaces, no? It's doesn't understand the lisp-like notation (foo (bar (baz so no auto-indentation is happening. Martin
Re: [PATCH] Add .pd extension to c_exts.
On Fri, Aug 30, 2019 at 12:58 PM Martin Liška wrote: > > Hi. > > I would like to add .pd to c_exts so that one > can have correctly set tab width, etc. But then it will auto-indent with too much spaces, no? Richard. > Ready for trunk? > Thanks, > Martin > > contrib/ChangeLog: > > 2019-08-30 Martin Liska > > * vimrc: Add .pd extension. > --- > contrib/vimrc | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > >
Re: [PATCH][AArch64] Vectorize MULH(R)S patterns with SVE2 instructions
Thanks for doing this. The patch looks good, so this review is mostly a list of very minor formatting comments, sorry. Yuliang Wang writes: > 2019-08-22 Yuliang Wang > Please add a line here pointing at the PR: PR tree-optimization/89386 The commit hooks pick this up automatically and link the commit to the bugzilla ticket. (The PR was filed for SSSE3, but the target-independent bits are still needed there.) > * config/aarch64/aarch64-sve2.md: support for SVE2 instructions > [S/U]MULL[T/B] + [R]SHRN[T/B] and MULHRS pattern variants Unfortunately the changelog format is pretty strict here. Lines have to be 80 chars or shorter, indented by tabs, and each pattern, function, variable or type needs to be listed individually regardless of how useful that seems. So I think this should be something like: * config/aarch64/aarch64-sve2.md (mull) (shrnb, shrnt, mulhs3): New patterns. (See below for why the "*" patterns aren't listed.) > * config/aarch64/iterators.md: iterators and attributes for above Here too the iterators need to be listed: * config/aarch64/iterators.md (UNSPEC_SMULLB, UNSPEC_SMULLT) (UNSPEC_UMULLB, UNSPEC_UMULLT, UNSPEC_SHRNB, UNSPEC_SHRNT) (UNSPEC_RSHRNB, UNSPEC_RSHRNT, UNSPEC_SMULHS, UNSPEC_SMULHRS) UNSPEC_UMULHS, UNSPEC_UMULHRS): New unspecs. (MULLBT, SHRNB, SHRNT, MULHRS): New int iterators. (su, r): Handle the unspecs above. (bt): New int attribute. > * internal-fn.def: internal functions for MULH[R]S patterns * internal-fn.def (IFN_MULHS, IFN_MULHRS): New internal functions. > * optabs.def: optabs definitions for above and sign variants * optabs.def (smulhs_optab, smulhrs_optab, umulhs_optab) (umulhrs_optab): New optabs. > * tree-vect-patterns.c (vect_recog_multhi_pattern): pattern > recognition function for MULHRS * tree-vect-patterns.c (vect_recog_multhi_pattern): New function. (vect_vect_recog_func_ptrs): Add it. > * gcc.target/aarch64/sve2/mulhrs_1.c: new test for all variants Just: * gcc.target/aarch64/sve2/mulhrs_1.c: New test. (Sorry that this is so finicky. I'm just the messenger. :-)) > diff --git a/gcc/config/aarch64/aarch64-sve2.md > b/gcc/config/aarch64/aarch64-sve2.md > index > 2334e5a7b7dc524bbd1f4d0a48ba5cd991970118..51783604ad8f83eb1d070c133009ed41a2a0252d > 100644 > --- a/gcc/config/aarch64/aarch64-sve2.md > +++ b/gcc/config/aarch64/aarch64-sve2.md > @@ -63,3 +63,89 @@ > movprfx\t%0, %2\;h\t%0., %1/m, %0., > %3." >[(set_attr "movprfx" "*,yes")] > ) > + > +;; Multiply long top / bottom Very minor, but: GCC comments traditionally end with "." even if they're not full sentences. > +(define_insn "*mull" > + [(set (match_operand: 0 "register_operand" "=w") > +(unspec: [(match_operand:SVE_BHSI 1 "register_operand" "w") > + (match_operand:SVE_BHSI 2 "register_operand" "w")] > + MULLBT))] > + "TARGET_SVE2" > + "mull\t%0., %1., %2." > +) > + > +(define_expand "mull" > + [(set (match_operand: 0 "register_operand") > +(unspec: [(match_operand:SVE_BHSI 1 "register_operand") > + (match_operand:SVE_BHSI 2 "register_operand")] > + MULLBT))] > + "TARGET_SVE2" > + "" > +) It's more usual to put the define_expand first. But in this case we don't need separate define_expands, we can just make the define_insn the named pattern itself: ;; Multiply long top / bottom. (define_insn "mull" [(set (match_operand: 0 "register_operand" "=w") (unspec: [(match_operand:SVE_BHSI 1 "register_operand" "w") (match_operand:SVE_BHSI 2 "register_operand" "w")] MULLBT))] "TARGET_SVE2" "mull\t%0., %1., %2." ) > + > +;; (Rounding) Right shift narrow bottom > +(define_insn "*shrnb" > + [(set (match_operand:SVE_BHSI 0 "register_operand" "=w") > + (unspec:SVE_BHSI [(match_operand: 1 "register_operand" "w") > + (match_operand 2 > "aarch64_simd_shift_imm_offset_" "i")] This is more a personal thing, but I think it's better to leave out the "i" (just remove it rather than replace it with "") when the operand doesn't in fact accept all the things that "i" allows. > + SHRNB))] > + "TARGET_SVE2" > + "shrnb\t%0., %1., #%2" > +) > + > +(define_expand "shrnb" > + [(set (match_operand:SVE_BHSI 0 "register_operand") > + (unspec:SVE_BHSI [(match_operand: 1 "register_operand") > + (match_operand 2 > "aarch64_simd_shift_imm_offset_")] > + SHRNB))] > + "TARGET_SVE2" > + "" > +) > + > +;; (Rounding) Right shift narrow top > +(define_insn "*shrnt" > + [(set (match_operand:SVE_BHSI 0 "register_operand" "=w") > + (unspec:SVE_BHSI [(match_operand:SVE_BHSI 1 "register_operand" "%0") "%" indicates that operand 1 is commutative with operand 2, which isn't t
[C++ PATCH] vtable decl marking
I noticed that DECL_VTABLE_OR_VTT_P and DECL_VIRTUAL_P have the same semantics for VAR_DECLs. We set them both at the same time, and the C++ FE tests the former, but (I presume) the optimizers test the latter. It doesn't seem to matter that we set DECL_VIRTUAL_P for the VTT. The attached patch survives bootstraping, so I'll install it next week, if there's no objections. nathan -- Nathan Sidwell 2019-08-30 Nathan Sidwell * cp-tree.h (DECL_VTABLE_OR_VTT_P): Forward to DECL_VIRTUAL_P. Index: gcc/cp/cp-tree.h === --- gcc/cp/cp-tree.h (revision 275032) +++ gcc/cp/cp-tree.h (working copy) @@ -463,5 +463,4 @@ extern GTY(()) tree cp_global_trees[CPTI LOOKUP_FOUND_P (in RECORD_TYPE, UNION_TYPE, NAMESPACE_DECL) 5: IDENTIFIER_VIRTUAL_P (in IDENTIFIER_NODE) - DECL_VTABLE_OR_VTT_P (in VAR_DECL) FUNCTION_RVALUE_QUALIFIED (in FUNCTION_TYPE, METHOD_TYPE) CALL_EXPR_REVERSE_ARGS (in CALL_EXPR, AGGR_INIT_EXPR) @@ -3255,6 +3254,8 @@ struct GTY(()) lang_decl { #define DECL_TINFO_P(NODE) TREE_LANG_FLAG_4 (VAR_DECL_CHECK (NODE)) -/* 1 iff VAR_DECL node NODE is virtual table or VTT. */ -#define DECL_VTABLE_OR_VTT_P(NODE) TREE_LANG_FLAG_5 (VAR_DECL_CHECK (NODE)) +/* 1 iff VAR_DECL node NODE is virtual table or VTT. We forward to + DECL_VIRTUAL_P from the common code, as that has the semantics we + need. But we want a more descriptive name. */ +#define DECL_VTABLE_OR_VTT_P(NODE) DECL_VIRTUAL_P (VAR_DECL_CHECK (NODE)) /* 1 iff FUNCTION_TYPE or METHOD_TYPE has a ref-qualifier (either & or &&). */
[PATCH] Add .pd extension to c_exts.
Hi. I would like to add .pd to c_exts so that one can have correctly set tab width, etc. Ready for trunk? Thanks, Martin contrib/ChangeLog: 2019-08-30 Martin Liska * vimrc: Add .pd extension. --- contrib/vimrc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/contrib/vimrc b/contrib/vimrc index 7c0c5878c63..a9ec46bf52f 100644 --- a/contrib/vimrc +++ b/contrib/vimrc @@ -32,7 +32,7 @@ function! SetStyle() return endif let l:ext = fnamemodify(l:fname, ":e") - let l:c_exts = ['c', 'h', 'cpp', 'cc', 'C', 'H', 'def', 'java'] + let l:c_exts = ['c', 'h', 'cpp', 'cc', 'C', 'H', 'def', 'java', 'pd'] if index(l:c_exts, l:ext) != -1 setlocal cindent setlocal tabstop=8
Re: [SVE] PR86753
On Wed, Aug 28, 2019 at 11:02 AM Richard Sandiford wrote: > > Prathamesh Kulkarni writes: > > On Tue, 27 Aug 2019 at 21:14, Richard Sandiford > > wrote: > >> > >> Richard should have the final say, but some comments... > >> > >> Prathamesh Kulkarni writes: > >> > diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c > >> > index 1e2dfe5d22d..862206b3256 100644 > >> > --- a/gcc/tree-vect-stmts.c > >> > +++ b/gcc/tree-vect-stmts.c > >> > @@ -1989,17 +1989,31 @@ check_load_store_masking (loop_vec_info > >> > loop_vinfo, tree vectype, > >> > > >> > static tree > >> > prepare_load_store_mask (tree mask_type, tree loop_mask, tree vec_mask, > >> > - gimple_stmt_iterator *gsi) > >> > + gimple_stmt_iterator *gsi, tree mask, > >> > + cond_vmask_map_type *cond_to_vec_mask) > >> > >> "scalar_mask" might be a better name. But maybe we should key off the > >> vector mask after all, now that we're relying on the code having no > >> redundancies. > >> > >> Passing the vinfo would be better than passing the cond_vmask_map_type > >> directly. > >> > >> > { > >> >gcc_assert (useless_type_conversion_p (mask_type, TREE_TYPE > >> > (vec_mask))); > >> >if (!loop_mask) > >> > return vec_mask; > >> > > >> >gcc_assert (TREE_TYPE (loop_mask) == mask_type); > >> > + > >> > + tree *slot = 0; > >> > + if (cond_to_vec_mask) > >> > >> The pointer should never be null in this context. > > Disabling check for NULL results in segfault with cond_arith_4.c because we > > reach prepare_load_store_mask via vect_schedule_slp, called from > > here in vect_transform_loop: > > /* Schedule the SLP instances first, then handle loop vectorization > > below. */ > > if (!loop_vinfo->slp_instances.is_empty ()) > > { > > DUMP_VECT_SCOPE ("scheduling SLP instances"); > > vect_schedule_slp (loop_vinfo); > > } > > > > which is before bb processing loop. > > We want this optimisation to be applied to SLP too though. Especially > since non-SLP will be going away at some point. > > But as Richard says, the problem with SLP is that the statements aren't > traversed in block order, so I guess we can't do the on-the-fly > redundancy elimination there... And the current patch AFAICS can generate wrong SSA for this reason. > Maybe an alternative would be to record during the analysis phase which > scalar conditions need which loop masks. Statements that need a loop > mask currently do: > > vect_record_loop_mask (loop_vinfo, masks, ncopies, vectype); > > If we also pass the scalar condition, we can maintain a hash_set of > pairs, representing the conditions that have > loop masks applied at some point in the vectorised code. The COND_EXPR > code can use that set to decide whether to apply the loop mask or not. Yeah, that sounds better. Note that I don't like the extra "helpers" in fold-const.c/h, they do not look useful in general so put them into vectorizer private code. The decomposing also doesn't look too nice, instead prepare_load_store_mask could get such decomposed representation - possibly quite natural with the suggestion from Richard above. Richard. > Trying to avoid duplicate ANDs with the loop mask would then become a > separate follow-on change. Not sure whether it's worth it on its own. > > Thanks, > Richard
Re: [PATCH v2 0/9] S/390: Use signaling FP comparison instructions
On Fri, Aug 30, 2019 at 12:06 PM Segher Boessenkool wrote: > > On Fri, Aug 30, 2019 at 09:12:11AM +0200, Richard Biener wrote: > > On GIMPLE there isn't a good reason to split out trapping comparisons > > from [VEC_]COND_EXPR - the gimplifier does this for GIMPLE_CONDs > > where it is important because we'd have no way to represent EH info > > when not done. It might be a bit awkward to preserve EH across RTL > > expansion though in case the [VEC_]COND_EXPR are not expanded > > as a single pattern, but I'm not sure. > > The *language* specifies which comparisons trap on unordered and which > do not. (Hopefully it is similar for all or this is going to be a huge > mess :-) ) So Gimple needs to at least keep track of this. There also > are various optimisations that can be done -- two signaling comparisons > with the same arguments can be folded to just one, for example -- so it > seems to me you want to represent this in Gimple, never mind if you do > EH for them or just a magical trap or whatever. The issue with GIMPLE_CONDs is that we can't have EH edges out of blocks ending in them so to represent the compare-and-jump single GIMPLE insn raising exceptions we need to split the actually trapping compare away. For [VEC_]COND_EXPR we'd need to say the whole expression can throw/trap if the embedded comparison can. On GIMPLE this is enough precision but we of course have to handle it that way then. Currently operation_could_trap* do not consider [VEC_]COND_EXPR trapping nor does tree_could_trap_p. stmt_could_throw_1_p looks fine by means of falling back to checking individual operands and then running into the embedded comparison. But we do have operation_could_trap* callers which would need to be adjusted on the GIMPLE side. Richard. > > Segher
Re: [PATCH][middle-end/88784] Middle end is missing some optimizations about unsigned
@Richi: PING^1 On 7/16/19 8:35 AM, Li Jia He wrote: > > > On 2019/7/2 4:51 PM, Richard Biener wrote: >> On Tue, 2 Jul 2019, Richard Biener wrote: >> >>> On Tue, 2 Jul 2019, Li Jia He wrote: >>> On 2019/7/1 3:30 PM, Richard Biener wrote: > On Fri, 28 Jun 2019, Andrew Pinski wrote: > >> On Thu, Jun 27, 2019 at 9:55 PM Li Jia He wrote: >>> >>> >>> >>> On 2019/6/27 11:48 PM, Jeff Law wrote: On 6/27/19 12:11 AM, Li Jia He wrote: > Hi, > > According to the optimizable case described by Qi Feng on > issue 88784, we can combine the cases into the following: > > 1. x > y && x != XXX_MIN --> x > y > 2. x > y && x == XXX_MIN --> false > 3. x <= y && x == XXX_MIN --> x == XXX_MIN > > 4. x < y && x != XXX_MAX --> x < y > 5. x < y && x == XXX_MAX --> false > 6. x >= y && x == XXX_MAX --> x == XXX_MAX > > 7. x > y || x != XXX_MIN --> x != XXX_MIN > 8. x <= y || x != XXX_MIN --> true > 9. x <= y || x == XXX_MIN --> x <= y > > 10. x < y || x != XXX_MAX --> x != UXXX_MAX > 11. x >= y || x != XXX_MAX --> true > 12. x >= y || x == XXX_MAX --> x >= y > > Note: XXX_MIN represents the minimum value of type x. > XXX_MAX represents the maximum value of type x. > > Here we don't need to care about whether the operation is > signed or unsigned. For example, in the below equation: > > 'x > y && x != XXX_MIN --> x > y' > > If the x type is signed int and XXX_MIN is INT_MIN, we can > optimize it to 'x > y'. However, if the type of x is unsigned > int and XXX_MIN is 0, we can still optimize it to 'x > y'. > > The regression testing for the patch was done on GCC mainline on > > powerpc64le-unknown-linux-gnu (Power 9 LE) > > with no regressions. Is it OK for trunk ? > > Thanks, > Lijia He > > gcc/ChangeLog > > 2019-06-27 Li Jia He > Qi Feng > > PR middle-end/88784 > * gimple-fold.c (and_comparisons_contain_equal_operands): New > function. > (and_comparisons_1): Use > and_comparisons_contain_equal_operands. > (or_comparisons_contain_equal_operands): New function. > (or_comparisons_1): Use or_comparisons_contain_equal_operands. Would this be better done via match.pd? ISTM this transformation would be well suited for that framework. >>> >>> Hi, Jeff >>> >>> I did this because of the following test case: >>> ` >>> _Bool comp(unsigned x, unsigned y) >>> { >>> return x > y && x != 0; >>> } >>> ` >>> The gimple file dumped on the power platform is: >>> ` >>> comp (unsigned int x, unsigned int y) >>> { >>> _Bool D.2837; >>> int iftmp.0; >>> >>> if (x > y) goto ; else goto ; >>> : >>> if (x != 0) goto ; else goto ; >>> : >>> iftmp.0 = 1; >>> goto ; >>> : >>> iftmp.0 = 0; >>> : >>> D.2837 = (_Bool) iftmp.0; >>> return D.2837; >>> } >>> ` >>> However, the gimple file dumped on x86 is >>> ` >>> comp (unsigned int x, unsigned int y) >>> { >>> _Bool D.2837; >>> >>> _1 = x > y; >>> _2 = x != 0; >>> _3 = _1 & _2; >>> _4 = (int) _3; >>> D.2837 = (_Bool) _4; >>> return D.2837; >>> } >>> ` >>> >>> The reason for the inconsistency between these two behaviors is param >>> logical-op-non-short-circuit. If we add the pattern to the match.pd >>> file, we can only optimize the situation in which the statement is in >>> the same basic block (logical-op-non-short-circuit=1, x86). But for >>> a cross-basic block (logical-op-non-short-circuit=0, power), match.pd >>> can't handle this situation. >>> >>> Another reason is that I found out maybe_fold_and_comparisons and >>> maybe_fold_or_comparisons are not only called by ifcombine pass but >>> also by reassoc pass. Using this method can basically unify param >>> logical-op-non-short-circuit=0 or 1. >> >> >> As mentioned before ifcombine pass should be using gimple-match >> instead of fold_build. Try converting ifcombine over to gimple-match >> infrastructure and add these to match.pd. > > Yes, I mentioned that in the PR. The issue is that at the moment > to combine x > y with x <= y you'd have to build GENERIC trees > for both or temporary GIMPLE assign with a SSA def (and then feed > that into the GENERIC o
[PATCH 3/3][MSP430] Use default_elf_select_section to select sections for data where possible
With the "noinit" attribute now handled generically, direct assignment of data with the "noinit" attribute to the ".noinit" attribute can be removed from the msp430 back end, and default_elf_select_section can be used instead. default_elf_select_section can also be used for SECCAT_RODATA_MERGE_* sections, to enable the linker to merge constant data. >From 2d2bc7f11b7d5bfc918351a5963b041f69aac673 Mon Sep 17 00:00:00 2001 From: Jozef Lawrynowicz Date: Wed, 28 Aug 2019 17:54:22 +0100 Subject: [PATCH 3/3] MSP430: Use default_elf_select_section to determine sections for data where possible gcc/ChangeLog: 2019-08-29 Jozef Lawrynowicz * config/msp430/msp430.c (msp430_init_sections): Remove handling of the noinit section. (msp430_select_section): Handle decls with the "noinit" attribute with default_elf_select_section. Handle SECCAT_RODATA_MERGE_* section types with default_elf_select_section. Add comments about handling of unsupported section types. (msp430_section_type_flags): Remove handling of the noinit section. --- gcc/config/msp430/msp430.c | 81 +++--- 1 file changed, 50 insertions(+), 31 deletions(-) diff --git a/gcc/config/msp430/msp430.c b/gcc/config/msp430/msp430.c index 1b2e9683a94..d409ea15df2 100644 --- a/gcc/config/msp430/msp430.c +++ b/gcc/config/msp430/msp430.c @@ -1785,7 +1785,6 @@ gen_prefix (tree decl) return NULL; } -static section * noinit_section; static section * persist_section; #undef TARGET_ASM_INIT_SECTIONS @@ -1794,8 +1793,6 @@ static section * persist_section; static void msp430_init_sections (void) { - noinit_section = get_unnamed_section (0, output_section_asm_op, - ".section .noinit,\"aw\""); persist_section = get_unnamed_section (0, output_section_asm_op, ".section .persistent,\"aw\""); } @@ -1806,6 +1803,10 @@ msp430_init_sections (void) static section * msp430_select_section (tree decl, int reloc, unsigned HOST_WIDE_INT align) { + const char * prefix; + const char * sec_name; + const char * base_sec_name; + gcc_assert (decl != NULL_TREE); if (TREE_CODE (decl) == STRING_CST @@ -1821,51 +1822,71 @@ msp430_select_section (tree decl, int reloc, unsigned HOST_WIDE_INT align) && is_interrupt_func (decl)) return get_section (".lowtext", SECTION_CODE | SECTION_WRITE , decl); - const char * prefix = gen_prefix (decl); - if (prefix == NULL) -{ - if (TREE_CODE (decl) == FUNCTION_DECL) - return text_section; - /* FIXME: ATTR_NOINIT is handled generically in - default_elf_select_section. */ - else if (has_attr (ATTR_NOINIT, decl)) - return noinit_section; - else if (has_attr (ATTR_PERSIST, decl)) - return persist_section; - else - return default_select_section (decl, reloc, align); -} + if (has_attr (ATTR_PERSIST, decl)) +return persist_section; + + /* ATTR_NOINIT is handled generically. */ + if (has_attr (ATTR_NOINIT, decl)) +return default_elf_select_section (decl, reloc, align); + + prefix = gen_prefix (decl); - const char * sec; switch (categorize_decl_for_section (decl, reloc)) { -case SECCAT_TEXT: sec = ".text"; break; -case SECCAT_DATA: sec = ".data"; break; -case SECCAT_BSS:sec = ".bss";break; -case SECCAT_RODATA: sec = ".rodata"; break; +case SECCAT_TEXT: + if (!prefix) + return text_section; + base_sec_name = ".text"; + break; +case SECCAT_DATA: + if (!prefix) + return data_section; + base_sec_name = ".data"; + break; +case SECCAT_BSS: + if (!prefix) + return bss_section; + base_sec_name = ".bss"; + break; +case SECCAT_RODATA: + if (!prefix) + return readonly_data_section; + base_sec_name = ".rodata"; + break; +/* Enable merging of constant data by the GNU linker using + default_elf_select_section and therefore enabling creation of + sections with the SHF_MERGE flag. */ case SECCAT_RODATA_MERGE_STR: case SECCAT_RODATA_MERGE_STR_INIT: case SECCAT_RODATA_MERGE_CONST: + return default_elf_select_section (decl, reloc, align); + +/* The sections listed below are are not supported for MSP430. + They should not be generated, but in case they are, we use + default_select_section so they get placed in sections + the msp430 assembler and linker understand. */ +/* "small data" sections are not supported. */ case SECCAT_SRODATA: -case SECCAT_DATA_REL: -case SECCAT_DATA_REL_LOCAL: -case SECCAT_DATA_REL_RO: -case SECCAT_DATA_REL_RO_LOCAL: case SECCAT_SDATA: case SECCAT_SBSS: +/* Thread-local storage (TLS) is not supported. */ case SECCAT_TDATA: case SECCAT_TBSS: +/* Sections used by a dynamic linker are not supported. */ +case SECCAT_DATA_REL: +case SECCAT_DATA_REL_LOCAL: +case SECCAT_DATA_REL_RO: +case SECCAT_DATA_REL_RO_LOCAL: return default_select_section (decl, reloc, align);
[PATCH 2/3][MSP430] Setup exclusion tables for function and data attributes
The attached patch removes hard-coded warnings from msp430 attribute handlers, and replaces them with exclusion rules specified in the attribute_spec table. Where msp430 attributes conflict with generic attributes, hard-coded warnings are still necessary. >From c6def571a47df5394ca9f5c5c4918f0ffcf97375 Mon Sep 17 00:00:00 2001 From: Jozef Lawrynowicz Date: Wed, 28 Aug 2019 17:44:16 +0100 Subject: [PATCH 2/3] MSP430: Setup exclusion tables for function and data attributes gcc/ChangeLog: 2019-08-29 Jozef Lawrynowicz * config/msp430/msp430.c (msp430_attr): Remove warnings about conflicting msp430-specific attributes. (msp430_section_attr): Likewise. Add warnings about conflicts with generic "noinit" and "section" attributes. Fix grammar in -mlarge error message. (msp430_data_attr): Rename to msp430_persist_attr. Add warnings about conflicts with generic "noinit" and "section" attributes. Add warning for when variable is not initialized. Chain conditionals which prevent the attribute being added. (ATTR_EXCL): New helper. (attr_reent_exclusions): New exclusion table. (attr_naked_exclusions): Likewise. (attr_crit_exclusions): Likewise. (attr_lower_exclusions): Likewise. (attr_upper_exclusions): Likewise. (attr_either_exclusions): Likewise. (attr_persist_exclusions): Likewise. (msp430_attribute_table): Update with exclusion rules. (msp430_output_aligned_decl_common): Don't output common symbol if decl has a section. gcc/testsuite/ChangeLog: 2019-08-29 Jozef Lawrynowicz * gcc.target/msp430/data-attributes-2.c: New test. * gcc.target/msp430/function-attributes-4.c: Update dg-warning strings. * gcc.target/msp430/region-attribute-misuse.c: Likewise. --- gcc/config/msp430/msp430.c| 199 +++--- .../gcc.target/msp430/data-attributes-2.c | 51 + .../gcc.target/msp430/function-attributes-4.c | 27 +-- .../msp430/region-attribute-misuse.c | 6 +- 4 files changed, 189 insertions(+), 94 deletions(-) create mode 100644 gcc/testsuite/gcc.target/msp430/data-attributes-2.c diff --git a/gcc/config/msp430/msp430.c b/gcc/config/msp430/msp430.c index d54a3749437..1b2e9683a94 100644 --- a/gcc/config/msp430/msp430.c +++ b/gcc/config/msp430/msp430.c @@ -1345,35 +1345,19 @@ msp430_attr (tree * node, } if (is_critical_func (* node)) { + /* We always ignore the critical attribute when interrupt and + critical are used together. */ warning (OPT_Wattributes, "critical attribute has no effect on interrupt functions"); DECL_ATTRIBUTES (*node) = remove_attribute (ATTR_CRIT, DECL_ATTRIBUTES (* node)); } } - else if (TREE_NAME_EQ (name, ATTR_REENT)) -{ - if (is_naked_func (* node)) - message = "naked functions cannot be reentrant"; - else if (is_critical_func (* node)) - message = "critical functions cannot be reentrant"; -} else if (TREE_NAME_EQ (name, ATTR_CRIT)) { - if (is_naked_func (* node)) - message = "naked functions cannot be critical"; - else if (is_reentrant_func (* node)) - message = "reentrant functions cannot be critical"; - else if (is_interrupt_func ( *node)) + if (is_interrupt_func ( *node)) message = "critical attribute has no effect on interrupt functions"; } - else if (TREE_NAME_EQ (name, ATTR_NAKED)) -{ - if (is_critical_func (* node)) - message = "critical functions cannot be naked"; - else if (is_reentrant_func (* node)) - message = "reentrant functions cannot be naked"; -} if (message) { @@ -1396,32 +1380,15 @@ msp430_section_attr (tree * node, const char * message = NULL; - if (TREE_NAME_EQ (name, ATTR_UPPER)) -{ - if (has_attr (ATTR_LOWER, * node)) - message = "already marked with 'lower' attribute"; - else if (has_attr (ATTR_EITHER, * node)) - message = "already marked with 'either' attribute"; - else if (! msp430x) - message = "upper attribute needs a 430X cpu"; -} - else if (TREE_NAME_EQ (name, ATTR_LOWER)) -{ - if (has_attr (ATTR_UPPER, * node)) - message = "already marked with 'upper' attribute"; - else if (has_attr (ATTR_EITHER, * node)) - message = "already marked with 'either' attribute"; -} - else -{ - gcc_assert (TREE_NAME_EQ (name, ATTR_EITHER)); - - if (has_attr (ATTR_LOWER, * node)) - message = "already marked with 'lower' attribute"; - else if (has_attr (ATTR_UPPER, * node)) - message = "already marked with 'upper' attribute"; -} - + /* The "noinit" and "section" attributes are handled generically, so we + cannot set up additional target-specific attribute exclusions using the + existing mechanism. */ + if (has_attr (ATTR_NOINIT, * node)) +message = G_("ignoring attribute %qE because it conflicts with " + "attribute %"); + else if (has_attr ("section", * node)) +message = G_("ignoring attribute %qE because it conflicts with " + "attribute %"); /* It does not make sense to use up
[PATCH 1/3] Implement TARGET_HANDLE_GENERIC_ATTRIBUTE
The attached patch adds a new target hook "TARGET_HANDLE_GENERIC_ATTRIBUTE" which enables a back end to perform additional processing of an attribute that is normally handled by a front end. So far only the "section" and "noinit" attribute make use of this hook, as the msp430 back end requires additional attribute conflict checking to be performed on these generic attributes. >From e693da709114df378e2ea8b1d3729b105c99a495 Mon Sep 17 00:00:00 2001 From: Jozef Lawrynowicz Date: Wed, 28 Aug 2019 14:09:20 +0100 Subject: [PATCH 1/3] Implement TARGET_HANDLE_GENERIC_ATTRIBUTE gcc/ChangeLog: 2019-08-29 Jozef Lawrynowicz * config/msp430/msp430.c (TARGET_HANDLE_GENERIC_ATTRIBUTE): Define. (msp430_handle_generic_attribute): New function. * doc/tm.texi: Regenerate. * doc/tm.texi.in: Add TARGET_HANDLE_GENERIC_ATTRIBUTE. * hooks.c (hook_tree_treeptr_tree_tree_int_boolptr_null): New. * hooks.h (hook_tree_treeptr_tree_tree_int_boolptr_null): New. * target.def: Define new hook TARGET_HANDLE_GENERIC_ATTRIBUTE. gcc/c-family/ChangeLog: 2019-08-29 Jozef Lawrynowicz * c-attribs.c (handle_section_attribute): Call the handle_generic_attribute target hook after performing target independent processing. (handle_noinit_attribute): Likewise. --- gcc/c-family/c-attribs.c | 39 +++-- gcc/config/msp430/msp430.c | 40 ++ gcc/doc/tm.texi| 8 gcc/doc/tm.texi.in | 2 ++ gcc/hooks.c| 6 ++ gcc/hooks.h| 1 + gcc/target.def | 11 +++ 7 files changed, 97 insertions(+), 10 deletions(-) diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c index 820c83fa3b9..e572c6502bb 100644 --- a/gcc/c-family/c-attribs.c +++ b/gcc/c-family/c-attribs.c @@ -1875,6 +1875,7 @@ handle_section_attribute (tree *node, tree ARG_UNUSED (name), tree args, int ARG_UNUSED (flags), bool *no_add_attrs) { tree decl = *node; + tree res = NULL_TREE; if (!targetm_common.have_named_sections) { @@ -1922,12 +1923,20 @@ handle_section_attribute (tree *node, tree ARG_UNUSED (name), tree args, goto fail; } - set_decl_section_name (decl, TREE_STRING_POINTER (TREE_VALUE (args))); - return NULL_TREE; + res = targetm.handle_generic_attribute (node, name, args, flags, + no_add_attrs); + + /* If the back end confirms the attribute can be added then continue onto + final processing. */ + if (!(* no_add_attrs)) +{ + set_decl_section_name (decl, TREE_STRING_POINTER (TREE_VALUE (args))); + return res; +} fail: *no_add_attrs = true; - return NULL_TREE; + return res; } /* If in c++-11, check if the c++-11 alignment constraint with respect @@ -2249,6 +2258,7 @@ handle_noinit_attribute (tree * node, bool *no_add_attrs) { const char *message = NULL; + tree res = NULL_TREE; gcc_assert (DECL_P (*node)); gcc_assert (args == NULL); @@ -2271,14 +2281,23 @@ handle_noinit_attribute (tree * node, *no_add_attrs = true; } else -/* If this var is thought to be common, then change this. Common - variables are assigned to sections before the backend has a - chance to process them. Do this only if the attribute is - valid. */ -if (DECL_COMMON (*node)) - DECL_COMMON (*node) = 0; +{ + res = targetm.handle_generic_attribute (node, name, args, flags, + no_add_attrs); + /* If the back end confirms the attribute can be added then continue onto + final processing. */ + if (!(* no_add_attrs)) + { + /* If this var is thought to be common, then change this. Common + variables are assigned to sections before the backend has a + chance to process them. Do this only if the attribute is + valid. */ + if (DECL_COMMON (* node)) + DECL_COMMON (* node) = 0; + } +} - return NULL_TREE; + return res; } diff --git a/gcc/config/msp430/msp430.c b/gcc/config/msp430/msp430.c index c5cf7044aef..d54a3749437 100644 --- a/gcc/config/msp430/msp430.c +++ b/gcc/config/msp430/msp430.c @@ -1518,6 +1518,46 @@ const struct attribute_spec msp430_attribute_table[] = { NULL, 0, 0, false, false, false, false, NULL, NULL } }; +#undef TARGET_HANDLE_GENERIC_ATTRIBUTE +#define TARGET_HANDLE_GENERIC_ATTRIBUTE msp430_handle_generic_attribute + +tree +msp430_handle_generic_attribute (tree * node, + tree name, + tree args ATTRIBUTE_UNUSED, + intflags ATTRIBUTE_UNUSED, + bool * no_add_attrs) + +{ + const char * message = NULL; + + if (!(TREE_NAME_EQ (name, ATTR_NOINIT) || TREE_NAME_EQ (name, "section"))) +return NULL_TREE; + + /* The front end has set up an exclusion between the "noinit" and "section" + attributes. */ + if (has_attr (ATTR_LOWER, * node)) +message = G_("ignoring attribute %qE because it conflicts with " + "attribute %"); + else if (has_attr (ATTR_UPPER, * node)) +message = G_("ign
[PATCH 0/3] MSP430: Improve attribute handling
The following series of patches improves the handling of msp430-specific attributes by making use of generic mechanisms for performing common tasks (i.e. handling attribute conflicts, putting data objects in sections). The patches also transition the msp430 back end to fully use the generic handling of the "noinit" attribute. Successfully bootstrapped and regtested on x86_64-pc-linux-gnu. Successfully regtested for msp430-elf. As a further sanity test I built GCC for arm-eabi and ran execute.exp=noinit-attribute.c to confirm the noinit attribute still works as expected for ARM. Ok for trunk? Jozef Lawrynowicz (3): Implement TARGET_HANDLE_GENERIC_ATTRIBUTE MSP430: Setup exclusion tables for function and data attributes MSP430: Use default_elf_select_section to determine sections for data where possible gcc/c-family/c-attribs.c | 39 ++- gcc/config/msp430/msp430.c| 320 -- gcc/doc/tm.texi | 8 + gcc/doc/tm.texi.in| 2 + gcc/hooks.c | 6 + gcc/hooks.h | 1 + gcc/target.def| 11 + .../gcc.target/msp430/data-attributes-2.c | 51 +++ .../gcc.target/msp430/function-attributes-4.c | 27 +- .../msp430/region-attribute-misuse.c | 6 +- 10 files changed, 336 insertions(+), 135 deletions(-) create mode 100644 gcc/testsuite/gcc.target/msp430/data-attributes-2.c
Re: [PATCH v2 0/9] S/390: Use signaling FP comparison instructions
On Fri, Aug 30, 2019 at 09:12:11AM +0200, Richard Biener wrote: > On GIMPLE there isn't a good reason to split out trapping comparisons > from [VEC_]COND_EXPR - the gimplifier does this for GIMPLE_CONDs > where it is important because we'd have no way to represent EH info > when not done. It might be a bit awkward to preserve EH across RTL > expansion though in case the [VEC_]COND_EXPR are not expanded > as a single pattern, but I'm not sure. The *language* specifies which comparisons trap on unordered and which do not. (Hopefully it is similar for all or this is going to be a huge mess :-) ) So Gimple needs to at least keep track of this. There also are various optimisations that can be done -- two signaling comparisons with the same arguments can be folded to just one, for example -- so it seems to me you want to represent this in Gimple, never mind if you do EH for them or just a magical trap or whatever. Segher
Re: [PATCH V6 05/11] bpf: new GCC port
> This patch adds a port for the Linux kernel eBPF architecture to GCC. > > ChangeLog: > > * configure.ac: Support for bpf-*-* targets. > * configure: Regenerate. > > contrib/ChangeLog: > > * config-list.mk (LIST): Disable go in bpf-*-* targets. > > gcc/ChangeLog: > > * config.gcc: Support for bpf-*-* targets. > * common/config/bpf/bpf-common.c: New file. > * config/bpf/t-bpf: Likewise. > * config/bpf/predicates.md: Likewise. > * config/bpf/constraints.md: Likewise. > * config/bpf/bpf.opt: Likewise. > * config/bpf/bpf.md: Likewise. > * config/bpf/bpf.h: Likewise. > * config/bpf/bpf.c: Likewise. > * config/bpf/bpf-protos.h: Likewise. > * config/bpf/bpf-opts.h: Likewise. > * config/bpf/bpf-helpers.h: Likewise. > * config/bpf/bpf-helpers.def: Likewise. Looks good to me, thanks. Thanks to you for the throughful, super useful review. Jeff also had detailed comments, so please wait for an ack from him too. Sure.
Re: [PATCH] or1k: Fix issue with set_got clobbering r9
Hello, any comments on this? If nothing I will commit in a few days. On Thu, Aug 22, 2019 at 08:44:04PM +0900, Stafford Horne wrote: > When compiling glibc we found that the GOT register was being allocated > r9 when the instruction was still set_got_tmp. That caused set_got to > clobber r9. We cannot simply say set_got_tmp clobbers r9 as this is the > reason for having the temporary set_got_tmp. > > Fix by using a register class constraint that does not allow r9 during > register allocation. > > gcc/ChangeLog: > > * config/or1k/constraints.md (t): New constraint. > * config/or1k/or1k.h (GOT_REGS): New register class. > * config/or1k/or1k.md (set_got_tmp, set_got): Use t contraint. > --- > gcc/config/or1k/constraints.md | 4 > gcc/config/or1k/or1k.h | 3 +++ > gcc/config/or1k/or1k.md| 4 ++-- > 3 files changed, 9 insertions(+), 2 deletions(-) > > diff --git a/gcc/config/or1k/constraints.md b/gcc/config/or1k/constraints.md > index 8cac7eb5329..ba330c6b8c2 100644 > --- a/gcc/config/or1k/constraints.md > +++ b/gcc/config/or1k/constraints.md > @@ -25,6 +25,7 @@ > ; We use: > ; c - sibcall registers > ; d - double pair base registers (excludes r0, r30 and r31 which overflow) > +; t - got address registers (excludes r9 is clobbered by set_got) I will changee this to (... r9 which is clobbered ...) > ; I - constant signed 16-bit > ; K - constant unsigned 16-bit > ; M - constant signed 16-bit shifted left 16-bits (l.movhi) > @@ -36,6 +37,9 @@ > (define_register_constraint "d" "DOUBLE_REGS" >"Registers which can be used for double reg pairs.") > > +(define_register_constraint "t" "GOT_REGS" > + "Registers which can be used to store the Global Offset Table (GOT) > address.") > + > ;; Immediates > (define_constraint "I" >"A signed 16-bit immediate in the range -32768 to 32767." > diff --git a/gcc/config/or1k/or1k.h b/gcc/config/or1k/or1k.h > index 2b29e62fdd3..4c32607bac1 100644 > --- a/gcc/config/or1k/or1k.h > +++ b/gcc/config/or1k/or1k.h > @@ -190,6 +190,7 @@ enum reg_class >NO_REGS, >SIBCALL_REGS, >DOUBLE_REGS, > + GOT_REGS, >GENERAL_REGS, >FLAG_REGS, >ALL_REGS, > @@ -202,6 +203,7 @@ enum reg_class >"NO_REGS", \ >"SIBCALL_REGS",\ >"DOUBLE_REGS", \ > + "GOT_REGS",\ >"GENERAL_REGS",\ >"FLAG_REGS", \ >"ALL_REGS" } > @@ -215,6 +217,7 @@ enum reg_class > { { 0x, 0x },\ >{ SIBCALL_REGS_MASK, 0 },\ >{ 0x7f7e, 0x },\ > + { 0xfdff, 0x },\ >{ 0x, 0x0003 },\ >{ 0x, 0x0004 },\ >{ 0x, 0x0007 } \ > diff --git a/gcc/config/or1k/or1k.md b/gcc/config/or1k/or1k.md > index cee11d078cc..36bcee336ab 100644 > --- a/gcc/config/or1k/or1k.md > +++ b/gcc/config/or1k/or1k.md > @@ -706,7 +706,7 @@ > ;; set_got pattern below. This works because the set_got_tmp insn is the > ;; first insn in the stream and that it isn't moved during RA. > (define_insn "set_got_tmp" > - [(set (match_operand:SI 0 "register_operand" "=r") > + [(set (match_operand:SI 0 "register_operand" "=t") > (unspec_volatile:SI [(const_int 0)] UNSPECV_SET_GOT))] >"" > { > @@ -715,7 +715,7 @@ > > ;; The insn to initialize the GOT. > (define_insn "set_got" > - [(set (match_operand:SI 0 "register_operand" "=r") > + [(set (match_operand:SI 0 "register_operand" "=t") > (unspec:SI [(const_int 0)] UNSPEC_SET_GOT)) > (clobber (reg:SI LR_REGNUM))] >"" > -- > 2.21.0 >
Re: [PATCH] Sanitizing the middle-end interface to the back-end for strict alignment
Hi Bernd, On 8/29/19 10:26 PM, Bernd Edlinger wrote: On 8/29/19 11:08 AM, Christophe Lyon wrote: On Thu, 29 Aug 2019 at 10:58, Kyrill Tkachov wrote: Hi Bernd, On 8/28/19 10:36 PM, Bernd Edlinger wrote: On 8/28/19 2:07 PM, Christophe Lyon wrote: Hi, This patch causes an ICE when building libgcc's unwind-arm.o when configuring GCC: --target arm-none-linux-gnueabihf --with-mode thumb --with-cpu cortex-a15 --with-fpu neon-vfpv4: The build works for the same target, but --with-mode arm --with-cpu cortex a9 --with-fpu vfp In file included from /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/config/arm/unwind-arm.c:144: /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/unwind-arm-common.inc: In function 'get_eit_entry': /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/unwind-arm-common.inc:245:29: warning: cast discards 'const' qualifier from pointer target type [-Wcast-qual] 245 | ucbp->pr_cache.ehtp = (_Unwind_EHT_Header *)&eitp->content; | ^ during RTL pass: expand /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/unwind-arm-common.inc: In function 'unwind_phase2_forced': /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/unwind-arm-common.inc:319:18: internal compiler error: in gen_movdi, at config/arm/arm.md:5235 319 | saved_vrs.core = entry_vrs->core; | ~~~^ 0x126530f gen_movdi(rtx_def*, rtx_def*) /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.md:5235 0x896d92 insn_gen_fn::operator()(rtx_def*, rtx_def*) const /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/recog.h:318 0x896d92 emit_move_insn_1(rtx_def*, rtx_def*) /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:3694 0x897083 emit_move_insn(rtx_def*, rtx_def*) /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:3790 0xfc25d6 gen_cpymem_ldrd_strd(rtx_def**) /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.c:14582 0x126a1f1 gen_cpymemqi(rtx_def*, rtx_def*, rtx_def*, rtx_def*) /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.md:6688 0xb0bc08 maybe_expand_insn(insn_code, unsigned int, expand_operand*) /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/optabs.c:7440 0x89ba1e emit_block_move_via_cpymem /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:1808 0x89ba1e emit_block_move_hints(rtx_def*, rtx_def*, rtx_def*, block_op_methods, unsigned int, long, unsigned long, unsigned long, unsigned long, bool, bool*) /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:1627 0x89c383 emit_block_move(rtx_def*, rtx_def*, rtx_def*, block_op_methods) /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:1667 0x89fb4e store_expr(tree_node*, rtx_def*, int, bool, bool) /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:5845 0x88c1f9 store_field /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:7149 0x8a0c22 expand_assignment(tree_node*, tree_node*, bool) /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:5304 0x761964 expand_gimple_stmt_1 /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfgexpand.c:3779 0x761964 expand_gimple_stmt /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfgexpand.c:3875 0x768583 expand_gimple_basic_block /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfgexpand.c:5915 0x76abc6 execute /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfgexpand.c:6538 Christophe Okay, sorry for the breakage. What is happening in gen_cpymem_ldrd_strd is of course against the rules: It uses emit_move_insn on only 4-byte aligned DI-mode memory operands. I have a patch for this, which is able to fix the libgcc build on a cross, but have no possibility to bootstrap the affected target. Could you please help? Well it's good that the sanitisation is catching the bugs! Yes, more than expected, though ;) Bootstrapping this patch I get another assert with the backtrace: Thanks for the additional testing, Kyrill! FWIW, my original report was with a failure to just build GCC for cortex-a15. I later got the reports of testing cross-toolchains, and saw other problems on cortex-a9 for instance. But I guess, you have noticed them with your bootstrap? on arm-linux-gnueabi gcc.target/arm/aapcs/align4.c (internal compiler error) gcc.target/arm/aapcs/align_rec4.c (internal compiler error) This appears to be yet unknown middle-end bug (not fixed by current patch) $ arm-linux-gnueabihf-gcc align4.c during RTL pass: expand In file included from align4.c:22: align4.c: In function 'testfunc': abitest.h:73:42: internal compiler error: in gen
Re: [ARM/FDPIC v5 01/21] [ARM] FDPIC: Add -mfdpic option support
Christophe Lyon writes: > On 16/07/2019 12:11, Richard Sandiford wrote: >> [This isn't really something that should be reviewed under global >> reviewership, but if it's either that or nothing, I'll do it anyway...] >> >> Christophe Lyon writes: >>> 2019-XX-XX Christophe Lyon >>> Mickaël Guêné >>> >>> gcc/ >>> * config/arm/arm.opt: Add -mfdpic option. >>> * doc/invoke.texi: Add documentation for -mfdpic. >>> >>> Change-Id: I0eabd1d11c9406fd4a43c4333689ebebbfcc4fe8 >>> >>> diff --git a/gcc/config/arm/arm.opt b/gcc/config/arm/arm.opt >>> index 9067d49..2ed3bd5 100644 >>> --- a/gcc/config/arm/arm.opt >>> +++ b/gcc/config/arm/arm.opt >>> @@ -306,3 +306,7 @@ Cost to assume for a branch insn. >>> mgeneral-regs-only >>> Target Report RejectNegative Mask(GENERAL_REGS_ONLY) Save >>> Generate code which uses the core registers only (r0-r14). >>> + >>> +mfdpic >>> +Target Report Mask(FDPIC) >>> +Enable Function Descriptor PIC mode. >>> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi >>> index 29585cf..805d7cc 100644 >>> --- a/gcc/doc/invoke.texi >>> +++ b/gcc/doc/invoke.texi >>> @@ -703,7 +703,8 @@ Objective-C and Objective-C++ Dialects}. >>> -mrestrict-it @gol >>> -mverbose-cost-dump @gol >>> -mpure-code @gol >>> --mcmse} >>> +-mcmse @gol >>> +-mfdpic} >>> >>> @emph{AVR Options} >>> @gccoptlist{-mmcu=@var{mcu} -mabsdata -maccumulate-args @gol >>> @@ -17912,6 +17913,23 @@ MOVT instruction. >>> Generate secure code as per the "ARMv8-M Security Extensions: >>> Requirements on >>> Development Tools Engineering Specification", which can be found on >>> >>> @url{http://infocenter.arm.com/help/topic/com.arm.doc.ecm0359818/ECM0359818_armv8m_security_extensions_reqs_on_dev_tools_1_0.pdf}. >>> + >>> +@item -mfdpic >>> +@itemx -mno-fdpic >>> +@opindex mfdpic >>> +@opindex mno-fdpic >>> +Select the FDPIC ABI, which uses function descriptors to represent >> >> Maybe "64-bit function descriptors"? Just a suggestion, might not be useful. >> >> OK with that change, thanks. > > OK, here is a new version, where I added a few words to explain that -static > is not supported. > > Thanks, > Christophe > >> >> Richard >> >>> +pointers to functions. When the compiler is configured for >>> +@code{arm-*-uclinuxfdpiceabi} targets, this option is on by default >>> +and implies @option{-fPIE} if none of the PIC/PIE-related options is >>> +provided. On other targets, it only enables the FDPIC-specific code >>> +generation features, and the user should explicitly provide the >>> +PIC/PIE-related options as needed. >>> + >>> +The opposite @option{-mno-fdpic} option is useful (and required) to >>> +build the Linux kernel using the same (@code{arm-*-uclinuxfdpiceabi}) >>> +toolchain as the one used to build the userland programs. >>> + >>> @end table >>> >>> @node AVR Options >> . >> > > From c936684e2b77ff5716bd8b67c617dcad088c72e0 Mon Sep 17 00:00:00 2001 > From: Christophe Lyon > Date: Thu, 8 Feb 2018 10:44:32 +0100 > Subject: [ARM/FDPIC v6 01/24] [ARM] FDPIC: Add -mfdpic option support > MIME-Version: 1.0 > Content-Type: text/plain; charset=UTF-8 > Content-Transfer-Encoding: 8bit > > 2019-XX-XX Christophe Lyon > Micka«l Guªn© > > gcc/ > * config/arm/arm.opt: Add -mfdpic option. > * doc/invoke.texi: Add documentation for -mfdpic. > > Change-Id: I05b98d6ae87c2b3fc04dd7fba415c730accdf33e > > diff --git a/gcc/config/arm/arm.opt b/gcc/config/arm/arm.opt > index 9067d49..2ed3bd5 100644 > --- a/gcc/config/arm/arm.opt > +++ b/gcc/config/arm/arm.opt > @@ -306,3 +306,7 @@ Cost to assume for a branch insn. > mgeneral-regs-only > Target Report RejectNegative Mask(GENERAL_REGS_ONLY) Save > Generate code which uses the core registers only (r0-r14). > + > +mfdpic > +Target Report Mask(FDPIC) > +Enable Function Descriptor PIC mode. > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi > index 29585cf..b77fa06 100644 > --- a/gcc/doc/invoke.texi > +++ b/gcc/doc/invoke.texi > @@ -703,7 +703,8 @@ Objective-C and Objective-C++ Dialects}. > -mrestrict-it @gol > -mverbose-cost-dump @gol > -mpure-code @gol > --mcmse} > +-mcmse @gol > +-mfdpic} > > @emph{AVR Options} > @gccoptlist{-mmcu=@var{mcu} -mabsdata -maccumulate-args @gol > @@ -17912,6 +17913,27 @@ MOVT instruction. > Generate secure code as per the "ARMv8-M Security Extensions: Requirements on > Development Tools Engineering Specification", which can be found on > > @url{http://infocenter.arm.com/help/topic/com.arm.doc.ecm0359818/ECM0359818_armv8m_security_extensions_reqs_on_dev_tools_1_0.pdf}. > + > +@item -mfdpic > +@itemx -mno-fdpic > +@opindex mfdpic > +@opindex mno-fdpic > +Select the FDPIC ABI, which uses 64-bit function descriptors to > +represent pointers to functions. When the compiler is configured for > +@code{arm-*-uclinuxfdpiceabi} targets, this option is on by default > +and implies @option{-fPIE} if none of the PIC/PIE-related options is > +provided. On other targets, it o
Re: [ARM/FDPIC v5 03/21] [ARM] FDPIC: Force FDPIC related options unless -mno-fdpic is provided
Christophe Lyon writes: > 2019-XX-XX Christophe Lyon > Micka«l Guªn© > > gcc/ > * config.gcc: Handle arm*-*-uclinuxfdpiceabi. > * config/arm/bpabi.h (TARGET_FDPIC_ASM_SPEC): New. > (SUBTARGET_EXTRA_ASM_SPEC): Use TARGET_FDPIC_ASM_SPEC. > * config/arm/linux-eabi.h (FDPIC_CC1_SPEC): New. > (CC1_SPEC): Use FDPIC_CC1_SPEC. > (MUSL_DYNAMIC_LINKER): Add -fdpic suffix when needed. > * config/arm/uclinuxfdpiceabi.h: New file. > > libsanitizer/ > * configure.tgt (arm*-*-*fdpiceabi): Sanitizers are > unsupported in this configuration. > > Change-Id: I74ac1fbb2e809e864d2b0acce66b173e76bcf92b > > diff --git a/gcc/config/arm/uclinuxfdpiceabi.h > b/gcc/config/arm/uclinuxfdpiceabi.h > new file mode 100644 > index 000..2d0c04b > --- /dev/null > +++ b/gcc/config/arm/uclinuxfdpiceabi.h > @@ -0,0 +1,54 @@ > +/* Configuration file for ARM GNU/Linux FDPIC EABI targets. > + Copyright (C) 2018 Free Software Foundation, Inc. Copyright year should be/include 2019. OK with that change if no Arm maintainer objects before the rest of the patch series is approved. > 2019-XX-XX Christophe Lyon > > gcc/testsuite/ > * lib/target-supports.exp (check_effective_target_static): Disable > for ARM FDPIC target. OK, thanks. Richard
Re: [ARM/FDPIC v5 06/21] [ARM] FDPIC: Add support for c++ exceptions
As Richard mentioned in an earlier post the generic libgcc and libstdc++ changes will need approval from the relevant maintainers. CC'ing the libstdc++ list and the libgcc maintainer. On 5/15/19 1:39 PM, Christophe Lyon wrote: The main difference with existing support is that function addresses are function descriptor addresses instead. This means that all code dealing with function pointers now has to cope with function descriptors instead. For the same reason, Linux kernel helpers can no longer be called by dereferencing their address, so we implement wrappers that directly call the kernel helpers. When restoring a function address, we also have to restore the FDPIC register value (r9). 2019-XX-XX Christophe Lyon Mickaël Guêné gcc/ * ginclude/unwind-arm-common.h (unwinder_cache): Add reserved5 field. (FDPIC_REGNUM): New define. libgcc/ * config/arm/linux-atomic.c (__kernel_cmpxchg): Add FDPIC support. (__kernel_dmb): Likewise. (__fdpic_cmpxchg): New function. (__fdpic_dmb): New function. * config/arm/unwind-arm.h (FDPIC_REGNUM): New define. (gnu_Unwind_Find_got): New function. (_Unwind_decode_typeinfo_ptr): Add FDPIC support. * unwind-arm-common.inc (UCB_PR_GOT): New. (funcdesc_t): New struct. (get_eit_entry): Add FDPIC support. (unwind_phase2): Likewise. (unwind_phase2_forced): Likewise. (__gnu_Unwind_RaiseException): Likewise. (__gnu_Unwind_Resume): Likewise. (__gnu_Unwind_Backtrace): Likewise. * unwind-pe.h (read_encoded_value_with_base): Likewise. libstdc++/ * libsupc++/eh_personality.cc (get_ttype_entry): Add FDPIC support. Change-Id: I64b81cfaf390a05f2fd121f44ba1912cb4b47cae diff --git a/gcc/ginclude/unwind-arm-common.h b/gcc/ginclude/unwind-arm-common.h index 6df783e..d4eb03e 100644 --- a/gcc/ginclude/unwind-arm-common.h +++ b/gcc/ginclude/unwind-arm-common.h @@ -91,7 +91,7 @@ extern "C" { _uw reserved2; /* Personality routine address */ _uw reserved3; /* Saved callsite address */ _uw reserved4; /* Forced unwind stop arg */ - _uw reserved5; + _uw reserved5; /* Personality routine GOT value in FDPIC mode. */ } unwinder_cache; /* Propagation barrier cache (valid after phase 1): */ diff --git a/libgcc/config/arm/linux-atomic.c b/libgcc/config/arm/linux-atomic.c index 06a6d46..565f829 100644 --- a/libgcc/config/arm/linux-atomic.c +++ b/libgcc/config/arm/linux-atomic.c @@ -25,11 +25,62 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see /* Kernel helper for compare-and-exchange. */ typedef int (__kernel_cmpxchg_t) (int oldval, int newval, int *ptr); -#define __kernel_cmpxchg (*(__kernel_cmpxchg_t *) 0x0fc0) + +#define STR(X) #X +#define XSTR(X) STR(X) + +#define KERNEL_CMPXCHG 0x0fc0 + +#if __FDPIC__ +/* Non-FDPIC ABIs call __kernel_cmpxchg directly by dereferencing its + address, but under FDPIC we would generate a broken call + sequence. That's why we have to implement __kernel_cmpxchg and + __kernel_dmb here: this way, the FDPIC call sequence works. */ +#define __kernel_cmpxchg __fdpic_cmpxchg +#else +#define __kernel_cmpxchg (*(__kernel_cmpxchg_t *) KERNEL_CMPXCHG) +#endif /* Kernel helper for memory barrier. */ typedef void (__kernel_dmb_t) (void); -#define __kernel_dmb (*(__kernel_dmb_t *) 0x0fa0) + +#define KERNEL_DMB 0x0fa0 + +#if __FDPIC__ +#define __kernel_dmb __fdpic_dmb +#else +#define __kernel_dmb (*(__kernel_dmb_t *) KERNEL_DMB) +#endif + +#if __FDPIC__ +static int __fdpic_cmpxchg (int oldval, int newval, int *ptr) +{ + int result; + + asm volatile ( + "ldr ip, 1f\n\t" + "bx ip\n\t" + "1:\n\t" + ".word " XSTR(KERNEL_CMPXCHG) "\n\t" + : "=r" (result) + : "r" (oldval) , "r" (newval), "r" (ptr) + : "r3", "memory"); + /* The result is actually returned by the kernel helper, we need + this to avoid a warning. */ + return result; +} + +static void __fdpic_dmb (void) +{ + asm volatile ( + "ldr ip, 1f\n\t" + "bx ip\n\t" + "1:\n\t" + ".word " XSTR(KERNEL_DMB) "\n\t" + ); +} + +#endif /* Note: we implement byte, short and int versions of atomic operations using the above kernel helpers; see linux-atomic-64bit.c for "long long" (64-bit) diff --git a/libgcc/config/arm/unwind-arm.h b/libgcc/config/arm/unwind-arm.h index 43c5379..2bf320a 100644 --- a/libgcc/config/arm/unwind-arm.h +++ b/libgcc/config/arm/unwind-arm.h @@ -33,9 +33,33 @@ /* Use IP as a scratch register within the personality routine. */ #define UNWIND_POINTER_REG 12 +#define FDPIC_REGNUM 9 + +#define STR(x) #x +#define XSTR(x) STR(x) + #ifdef __cplusplus extern "C" { #endif +_Unwind_Pt
Re: [ARM/FDPIC v5 02/21] [ARM] FDPIC: Handle arm*-*-uclinuxfdpiceabi in configure scripts
Christophe Lyon writes: > @@ -785,7 +785,7 @@ case ${target} in >esac >tmake_file="t-slibgcc" >case $target in > -*-*-linux* | frv-*-*linux* | *-*-kfreebsd*-gnu | *-*-kopensolaris*-gnu) > +*-*-linux* | frv-*-*linux* | *-*-kfreebsd*-gnu | *-*-kopensolaris*-gnu > | *-*-uclinuxfdpiceabi) >:;; > *-*-gnu*) >native_system_header_dir=/include I don't think this is necessary, since this target will never match the following *-*-gnu*) stanza anyway. > diff --git a/libtool.m4 b/libtool.m4 > index 8966762..64e507a 100644 > --- a/libtool.m4 > +++ b/libtool.m4 > @@ -3734,7 +3739,7 @@ m4_if([$1], [CXX], [ > ;; > esac > ;; > - linux* | k*bsd*-gnu | kopensolaris*-gnu) > + linux* | k*bsd*-gnu | kopensolaris*-gnu | uclinux*) > case $cc_basename in > KCC*) > # KAI C++ Compiler Is this needed? It seems to be in the !GCC branch of an if/else. If it is needed, the default: _LT_TAGVAR(lt_prog_compiler_can_build_shared, $1)=no seems correct for non-FDPIC uclinux. > @@ -4032,7 +4037,7 @@ m4_if([$1], [CXX], [ >_LT_TAGVAR(lt_prog_compiler_static, $1)='-non_shared' >;; > > -linux* | k*bsd*-gnu | kopensolaris*-gnu) > +linux* | k*bsd*-gnu | kopensolaris*-gnu | uclinux*) >case $cc_basename in ># old Intel for x86_64 which still supported -KPIC. >ecc*) Same here. > @@ -5946,7 +5951,7 @@ if test "$_lt_caught_CXX_error" != yes; then > _LT_TAGVAR(inherit_rpath, $1)=yes > ;; > > - linux* | k*bsd*-gnu | kopensolaris*-gnu) > + linux* | k*bsd*-gnu | kopensolaris*-gnu | uclinuxfdpiceabi) > case $cc_basename in >KCC*) > # Kuck and Associates, Inc. (KAI) C++ Compiler Here too the code seems to be dealing specifically with non-GCC compilers. > @@ -6598,7 +6603,7 @@ interix[[3-9]]*) >_LT_TAGVAR(postdeps,$1)= >;; > > -linux*) > +linux* | uclinux*) >case `$CC -V 2>&1 | sed 5q` in >*Sun\ C*) > # Sun C++ 5.9 Here too. (It only seems to do anything for Sun's C compiler.) The fewer hunks we have to maintain downstream the better :-) Richard
Re: [PATCH 0/3] mklog improvements
PING^1 On 8/13/19 9:49 AM, Martin Liska wrote: > Hi. > > I'm sending format independent changes to mklog that should > improve the script. It addresses couple of improvement > mentioned here: > https://gcc.gnu.org/ml/gcc-patches/2019-08/msg00031.html > > Martin > > Martin Liska (3): > Use argparse.ArgumentParser for mklog. > mklog: parse PR references from new test files > mklog: Do not print changed functions in testsuite > > contrib/mklog | 98 --- > 1 file changed, 47 insertions(+), 51 deletions(-) >
Re: [PATCH] Come up with json::integer_number and use it in GCOV.
PING^2 On 8/26/19 2:34 PM, Martin Liška wrote: > PING^1 > > On 8/13/19 1:51 PM, Martin Liška wrote: >> On 8/2/19 2:40 PM, David Malcolm wrote: >>> Something that occurred to me reading the updated patch: maybe it would >>> make things easier to have utility member functions of json::object to >>> implicitly make the child, e.g.: >>> >>> void >>> json::object::set (const char *key, long v) >>> { >>>set (key, new json::integer_number (v)); >>> } >>> >>> so that all those calls can be just: >>> >>> obj->set ("line", exploc.line); >>> obj->set ("column", exploc.column); >>> >>> etc (assuming overloading is unambiguous). >>> >>> But that's probably orthogonal to this patch. >> >> Looks good to me. It's a candidate for a follow up patch. >> >>> >>> And I changed all occurrences of float_number with integer_number as you suggested. >>> Thanks. >>> I'm currently testing the updated patch. Martin >>> The updated patch looks good to me, but technically I'm not a reviewer >>> for these files. >> >> Sure, I hope @Jakub or @Richi can approve me that? >> Thanks, >> Martin >> >>> >>> Dave >> >
Re: [PATCH] Generalized predicate/condition for parameter reference in IPA (PR ipa/91088)
> I was merely suggesting a better comment describing the function you are > introducing. Oh. I know. Good wording. > Thinking about it a bit more, I think you simply do not want to ever > push the extra VIEW_CONVERT_EXPR to the vector and in > evaluate_conditions_for_known_args always do a fold_convert to the > desired type (similarly like we do today). Yes. It is, but at cost of creating an extra memory slot. Feng
Re: [PATCH] Generalized predicate/condition for parameter reference in IPA (PR ipa/91088)
Hi, On Fri, Aug 30 2019, Feng Xue OS wrote: >> (It's a bad idea to make ChangeLog entries part of the patch, it won't >> apply to anyone, not even to you nowadays. ) > Got it. Will not include this kind of info in later patches. > >> I understand describing these things is difficult, but flatten is >> strange way to describe what the function does. What about somthing >> like the following? >> >> Analyze EXPR if it represents a series of simple operations performed on >> a function parameter and return true if so. FBI, STMT, INDEX_P, SIZE_P >> and AGGPOS have the same meaning like in >> unmodified_parm_or_parm_agg_item. Operations on the parameter are >> recorded to PARAM_OPS_P if it is not NULL. > Operations should be recorded in some place, and this is why PARAM_OPS_P > is used. Not quite understand this point. I was merely suggesting a better comment describing the function you are introducing. > >>> + /* Find use of parameter, add a convert operation to describe >>> + result type, which may not be same as parameter type. */ >>> + eval_op.val_is_rhs = false; >>> + eval_op.val = NULL_TREE; >>> + eval_op.code = VIEW_CONVERT_EXPR; >>> + eval_op.type = TREE_TYPE (expr); >>> + >>> + vec_safe_insert (*param_ops_p, 0, eval_op); > >> If we get here in the first iteration of the loop, could we not insert >> anything into the vector and handle such cases in >> evaluate_conditions_for_known_args like we do today (well, with >> fold_convert might be better)? It could save quite some memory and it >> is important to try keep the memory footprint down in IPA summaries. > Here is a little trick to make code of folding in > evaluate_conditions_for_known_args () > be simple. It does consume some memory for most cases. Will consider other way > and remove this. Thinking about it a bit more, I think you simply do not want to ever push the extra VIEW_CONVERT_EXPR to the vector and in evaluate_conditions_for_known_args always do a fold_convert to the desired type (similarly like we do today). Thanks, Martin > >> Also, I think you want a parameter to limit the maximum length of >> param_ops_p, at some point someone will come with some crazy >> machine-generated code that will create huge vectors. > Yes. Exactly. > > Thanks, > > Martin
[PATCH] Fix thinko in early bail out in tree-switch-conversion.
Hi. Thanks to Jakub, the patch addresses one memory leak in bit_test_cluster::find_bit_tests (allocation of output). And moreover, it implements proper guard when clustering is not successful. In such situation we want a quick bail out. Patch can bootstrap on x86_64-linux-gnu and survives regression tests. Ready to be installed? Thanks, Martin gcc/ChangeLog: 2019-08-29 Martin Liska * tree-switch-conversion.c (jump_table_cluster::find_jump_tables): Bail out when we'll end up with the same number of clusters as at the beginning. (bit_test_cluster::find_bit_tests): Likewise for bit tests. (jump_table_cluster::can_be_handled): Remove the guard as it's already handled in ::is_enabled. Allocate output after early bail out. --- gcc/tree-switch-conversion.c | 18 -- 1 file changed, 8 insertions(+), 10 deletions(-) diff --git a/gcc/tree-switch-conversion.c b/gcc/tree-switch-conversion.c index 776db77f53a..988e923d7e8 100644 --- a/gcc/tree-switch-conversion.c +++ b/gcc/tree-switch-conversion.c @@ -1213,14 +1213,14 @@ jump_table_cluster::find_jump_tables (vec &clusters) } /* No result. */ - if (min[l].m_count == INT_MAX) + if (min[l].m_count == l) return clusters.copy (); vec output; output.create (4); /* Find and build the clusters. */ - for (int end = l;;) + for (unsigned int end = l;;) { int start = min[end].m_start; @@ -1257,11 +1257,9 @@ jump_table_cluster::can_be_handled (const vec &clusters, make a sequence of conditional branches instead of a dispatch. The definition of "much bigger" depends on whether we are - optimizing for size or for speed. */ - if (!flag_jump_tables) -return false; + optimizing for size or for speed. - /* For algorithm correctness, jump table for a single case must return + For algorithm correctness, jump table for a single case must return true. We bail out in is_beneficial if it's called just for a single case. */ if (start == end) @@ -1311,9 +1309,6 @@ jump_table_cluster::is_beneficial (const vec &, vec bit_test_cluster::find_bit_tests (vec &clusters) { - vec output; - output.create (4); - unsigned l = clusters.length (); auto_vec min; min.reserve (l + 1); @@ -1336,9 +1331,12 @@ bit_test_cluster::find_bit_tests (vec &clusters) } /* No result. */ - if (min[l].m_count == INT_MAX) + if (min[l].m_count == l) return clusters.copy (); + vec output; + output.create (4); + /* Find and build the clusters. */ for (unsigned end = l;;) {
[PATCH] Consider also negative edges in cycle detection.
Hi. The patch is enhancement of r271117 where I started detecting zero cycles. We also need consider negative edges. Patch survives gcov.exp and I'll install it after proper testing next week. Thanks, Martin gcc/ChangeLog: 2019-08-30 Martin Liska * gcov.c (path_contains_zero_cycle_arc): Rename to ... (path_contains_zero_or_negative_cycle_arc): ... this and handle also negative edges. (circuit): Handle also negative edges as they can happen in some situations. --- gcc/gcov.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/gcc/gcov.c b/gcc/gcov.c index c65b7153765..f4e65ee46da 100644 --- a/gcc/gcov.c +++ b/gcc/gcov.c @@ -730,10 +730,10 @@ unblock (const block_info *u, block_vector_t &blocked, /* Return true when PATH contains a zero cycle arc count. */ static bool -path_contains_zero_cycle_arc (arc_vector_t &path) +path_contains_zero_or_negative_cycle_arc (arc_vector_t &path) { for (unsigned i = 0; i < path.size (); i++) -if (path[i]->cs_count == 0) +if (path[i]->cs_count <= 0) return true; return false; } @@ -759,7 +759,7 @@ circuit (block_info *v, arc_vector_t &path, block_info *start, { block_info *w = arc->dst; if (w < start - || arc->cs_count == 0 + || arc->cs_count <= 0 || !linfo.has_block (w)) continue; @@ -770,7 +770,7 @@ circuit (block_info *v, arc_vector_t &path, block_info *start, handle_cycle (path, count); loop_found = true; } - else if (!path_contains_zero_cycle_arc (path) + else if (!path_contains_zero_or_negative_cycle_arc (path) && find (blocked.begin (), blocked.end (), w) == blocked.end ()) loop_found |= circuit (w, path, start, blocked, block_lists, linfo, count); @@ -785,7 +785,7 @@ circuit (block_info *v, arc_vector_t &path, block_info *start, { block_info *w = arc->dst; if (w < start - || arc->cs_count == 0 + || arc->cs_count <= 0 || !linfo.has_block (w)) continue;
Re: [ARM/FDPIC v5 09/21] [ARM] FDPIC: Add support for taking address of nested function
On 8/29/19 4:36 PM, Christophe Lyon wrote: On 31/07/2019 16:44, Christophe Lyon wrote: On Tue, 16 Jul 2019 at 14:42, Kyrill Tkachov wrote: On 7/16/19 12:18 PM, Kyrill Tkachov wrote: Hi Christophe On 5/15/19 1:39 PM, Christophe Lyon wrote: In FDPIC mode, the trampoline generated to support pointers to nested functions looks like: .word trampoline address .word trampoline GOT address ldr r12, [pc, #8] ldr r9, [pc, #8] ldr pc, [pc, #8] .word static chain value .word GOT address .word function's address because in FDPIC function pointers are actually pointers to function descriptors, we have to actually generate a function descriptor for the trampoline. 2019-XX-XX Christophe Lyon Mickaël Guêné gcc/ * config/arm/arm.c (arm_asm_trampoline_template): Add FDPIC support. (arm_trampoline_init): Likewise. (arm_trampoline_init): Likewise. * config/arm/arm.h (TRAMPOLINE_SIZE): Likewise. Change-Id: Idc4d5f629ae4f8d79bdf9623517481d524a0c144 diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 40e3f3b..99d13bf 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -3976,13 +3976,50 @@ arm_warn_func_return (tree decl) .word static chain value .word function's address XXX FIXME: When the trampoline returns, r8 will be clobbered. */ +/* In FDPIC mode, the trampoline looks like: + .word trampoline address + .word trampoline GOT address + ldr r12, [pc, #8] ; #4 for Thumb2 + ldr r9, [pc, #8] ; #4 for Thumb2 + ldr pc, [pc, #8] ; #4 for Thumb2 + .word static chain value + .word GOT address + .word function's address +*/ I think this comment is not right for Thumb2. These load instructionshave 32-bit encodings, even in Thumb2 (they use high registers). Andre and Wilco pointed out to me offline that the offset should be #4 for Arm mode. The Arm ARM at E1.2.3 says: PC, the program counter * When executing an A32 instruction, PC reads as the address of the current instruction plus 8. * When executing a T32 instruction, PC reads as the address of the current instruction plus 4. Yes, it looks like the code is right, and the comment is wrong: - offset 8 for thumb2 mode - offset 4 for arm mode Here is the updated version Ok with a fixed ChangeLog (it currently mentions arm_trampoline_init twice but doesn't mention arm_trampoline_adjust_address) Thanks, Kyrill Thanks, Christophe Thanks, Kyrill Also, please merge this comment with the one above (no separate /**/) static void arm_asm_trampoline_template (FILE *f) { fprintf (f, "\t.syntax unified\n"); - if (TARGET_ARM) + if (TARGET_FDPIC) + { + /* The first two words are a function descriptor pointing to the + trampoline code just below. */ + if (TARGET_ARM) + fprintf (f, "\t.arm\n"); + else if (TARGET_THUMB2) + fprintf (f, "\t.thumb\n"); + else + /* Only ARM and Thumb-2 are supported. */ + gcc_unreachable (); + + assemble_aligned_integer (UNITS_PER_WORD, const0_rtx); + assemble_aligned_integer (UNITS_PER_WORD, const0_rtx); + /* Trampoline code which sets the static chain register but also + PIC register before jumping into real code. */ + asm_fprintf (f, "\tldr\t%r, [%r, #%d]\n", + STATIC_CHAIN_REGNUM, PC_REGNUM, + TARGET_THUMB2 ? 8 : 4); + asm_fprintf (f, "\tldr\t%r, [%r, #%d]\n", + PIC_OFFSET_TABLE_REGNUM, PC_REGNUM, + TARGET_THUMB2 ? 8 : 4); + asm_fprintf (f, "\tldr\t%r, [%r, #%d]\n", + PC_REGNUM, PC_REGNUM, + TARGET_THUMB2 ? 8 : 4); As above, I think the offset should be 8 for both Arm and Thumb2. Thanks, Kyrill + assemble_aligned_integer (UNITS_PER_WORD, const0_rtx); + } + else if (TARGET_ARM) { fprintf (f, "\t.arm\n"); asm_fprintf (f, "\tldr\t%r, [%r, #0]\n", STATIC_CHAIN_REGNUM, PC_REGNUM); @@ -4023,12 +4060,40 @@ arm_trampoline_init (rtx m_tramp, tree fndecl, rtx chain_value) emit_block_move (m_tramp, assemble_trampoline_template (), GEN_INT (TRAMPOLINE_SIZE), BLOCK_OP_NORMAL); - mem = adjust_address (m_tramp, SImode, TARGET_32BIT ? 8 : 12); - emit_move_insn (mem, chain_value); + if (TARGET_FDPIC) + { + rtx funcdesc = XEXP (DECL_RTL (fndecl), 0); + rtx fnaddr = gen_rtx_MEM (Pmode, funcdesc); + rtx gotaddr = gen_rtx_MEM (Pmode, plus_constant (Pmode, funcdesc, 4)); + /* The function start address is at offset 8, but in Thumb mode + we want bit 0 set to 1 to indicate Thumb-ness, hence 9 + below. */ + rtx trampoline_code_start + = plus_constant (Pmode, XEXP (m_tr
Re: [PATCH] Generalized predicate/condition for parameter reference in IPA (PR ipa/91088)
> (It's a bad idea to make ChangeLog entries part of the patch, it won't > apply to anyone, not even to you nowadays. ) Got it. Will not include this kind of info in later patches. > I understand describing these things is difficult, but flatten is > strange way to describe what the function does. What about somthing > like the following? > > Analyze EXPR if it represents a series of simple operations performed on > a function parameter and return true if so. FBI, STMT, INDEX_P, SIZE_P > and AGGPOS have the same meaning like in > unmodified_parm_or_parm_agg_item. Operations on the parameter are > recorded to PARAM_OPS_P if it is not NULL. Operations should be recorded in some place, and this is why PARAM_OPS_P is used. Not quite understand this point. >> + /* Find use of parameter, add a convert operation to describe >> + result type, which may not be same as parameter type. */ >> + eval_op.val_is_rhs = false; >> + eval_op.val = NULL_TREE; >> + eval_op.code = VIEW_CONVERT_EXPR; >> + eval_op.type = TREE_TYPE (expr); >> + >> + vec_safe_insert (*param_ops_p, 0, eval_op); > If we get here in the first iteration of the loop, could we not insert > anything into the vector and handle such cases in > evaluate_conditions_for_known_args like we do today (well, with > fold_convert might be better)? It could save quite some memory and it > is important to try keep the memory footprint down in IPA summaries. Here is a little trick to make code of folding in evaluate_conditions_for_known_args () be simple. It does consume some memory for most cases. Will consider other way and remove this. > Also, I think you want a parameter to limit the maximum length of > param_ops_p, at some point someone will come with some crazy > machine-generated code that will create huge vectors. Yes. Exactly. Thanks, Martin
Re: [ARM/FDPIC v5 05/21] [ARM] FDPIC: Fix __do_global_dtors_aux and frame_dummy generation
Christophe Lyon writes: > On 12/07/2019 08:06, Richard Sandiford wrote: >> Christophe Lyon writes: >>> In FDPIC, we need to make sure __do_global_dtors_aux and frame_dummy >>> are referenced by their address, not by pointers to the function >>> descriptors. >>> >>> 2019-XX-XX Christophe Lyon >>> Mickaël Guêné >>> >>> * libgcc/crtstuff.c: Add support for FDPIC. >>> >>> Change-Id: I0bc4b1232fbf3c69068fb23a1b9cafc895d141b1 >>> >>> diff --git a/libgcc/crtstuff.c b/libgcc/crtstuff.c >>> index 4927a9f..159b461 100644 >>> --- a/libgcc/crtstuff.c >>> +++ b/libgcc/crtstuff.c >>> @@ -429,9 +429,18 @@ __do_global_dtors_aux (void) >>> #ifdef FINI_SECTION_ASM_OP >>> CRT_CALL_STATIC_FUNCTION (FINI_SECTION_ASM_OP, __do_global_dtors_aux) >>> #elif defined (FINI_ARRAY_SECTION_ASM_OP) >>> +#if defined(__FDPIC__) >>> +__asm__( >>> +" .section .fini_array\n" >>> +" .align 2\n" >>> +" .word __do_global_dtors_aux\n" >>> +); >>> +asm (TEXT_SECTION_ASM_OP); >>> +#else /* defined(__FDPIC__) */ >>> static func_ptr __do_global_dtors_aux_fini_array_entry[] >>> __attribute__ ((__used__, section(".fini_array"), >>> aligned(sizeof(func_ptr >>> = { __do_global_dtors_aux }; >>> +#endif /* defined(__FDPIC__) */ >>> #else /* !FINI_SECTION_ASM_OP && !FINI_ARRAY_SECTION_ASM_OP */ >>> static void __attribute__((used)) >>> __do_global_dtors_aux_1 (void) >> >> It'd be good to avoid hard-coding the pointer size. Would it work to do: >> >> __asm__("\t.equ\.t__do_global_dtors_aux_alias, __do_global_dtors_aux\n"); >> extern char __do_global_dtors_aux_alias; >> static void *__do_global_dtors_aux_fini_array_entry[] >> __attribute__ ((__used__, section(".fini_array"), aligned(sizeof(void >> * >> = { &__do_global_dtors_aux_alias }; >> >> ? Similarly for the init_array. >> > OK, done. > >> AFAICT this and 02/21 are the only patches that aren't Arm-specific, >> is that right? >> >> Thanks, >> Richard >> . >> > > From ea0eee1ddeddef92277ae68eac4af28994c2902c Mon Sep 17 00:00:00 2001 > From: Christophe Lyon > Date: Thu, 8 Feb 2018 11:12:52 +0100 > Subject: [ARM/FDPIC v6 05/24] [ARM] FDPIC: Fix __do_global_dtors_aux and > frame_dummy generation > MIME-Version: 1.0 > Content-Type: text/plain; charset=UTF-8 > Content-Transfer-Encoding: 8bit > > In FDPIC, we need to make sure __do_global_dtors_aux and frame_dummy > are referenced by their address, not by pointers to the function > descriptors. > > 2019-XX-XX Christophe Lyon > Micka«l Guªn© > > libgcc/ > * libgcc/crtstuff.c: Add support for FDPIC. OK, thanks. Richard > > Change-Id: I0bc4b1232fbf3c69068fb23a1b9cafc895d141b1 > > diff --git a/libgcc/crtstuff.c b/libgcc/crtstuff.c > index 4927a9f..6659039 100644 > --- a/libgcc/crtstuff.c > +++ b/libgcc/crtstuff.c > @@ -429,9 +429,17 @@ __do_global_dtors_aux (void) > #ifdef FINI_SECTION_ASM_OP > CRT_CALL_STATIC_FUNCTION (FINI_SECTION_ASM_OP, __do_global_dtors_aux) > #elif defined (FINI_ARRAY_SECTION_ASM_OP) > +#if defined(__FDPIC__) > +__asm__("\t.equ\t__do_global_dtors_aux_alias, __do_global_dtors_aux\n"); > +extern char __do_global_dtors_aux_alias; > +static void *__do_global_dtors_aux_fini_array_entry[] > +__attribute__ ((__used__, section(".fini_array"), aligned(sizeof(void * > + = { &__do_global_dtors_aux_alias }; > +#else /* defined(__FDPIC__) */ > static func_ptr __do_global_dtors_aux_fini_array_entry[] >__attribute__ ((__used__, section(".fini_array"), > aligned(sizeof(func_ptr >= { __do_global_dtors_aux }; > +#endif /* defined(__FDPIC__) */ > #else /* !FINI_SECTION_ASM_OP && !FINI_ARRAY_SECTION_ASM_OP */ > static void __attribute__((used)) > __do_global_dtors_aux_1 (void) > @@ -473,9 +481,17 @@ frame_dummy (void) > #ifdef __LIBGCC_INIT_SECTION_ASM_OP__ > CRT_CALL_STATIC_FUNCTION (__LIBGCC_INIT_SECTION_ASM_OP__, frame_dummy) > #else /* defined(__LIBGCC_INIT_SECTION_ASM_OP__) */ > +#if defined(__FDPIC__) > +__asm__("\t.equ\t__frame_dummy_alias, frame_dummy\n"); > +extern char __frame_dummy_alias; > +static void *__frame_dummy_init_array_entry[] > +__attribute__ ((__used__, section(".init_array"), aligned(sizeof(void * > + = { &__frame_dummy_alias }; > +#else /* defined(__FDPIC__) */ > static func_ptr __frame_dummy_init_array_entry[] >__attribute__ ((__used__, section(".init_array"), > aligned(sizeof(func_ptr >= { frame_dummy }; > +#endif /* defined(__FDPIC__) */ > #endif /* !defined(__LIBGCC_INIT_SECTION_ASM_OP__) */ > #endif /* USE_EH_FRAME_REGISTRY || USE_TM_CLONE_REGISTRY */
Re: [PATCH V6 05/11] bpf: new GCC port
jema...@gnu.org (Jose E. Marchesi) writes: > This patch adds a port for the Linux kernel eBPF architecture to GCC. > > ChangeLog: > > * configure.ac: Support for bpf-*-* targets. > * configure: Regenerate. > > contrib/ChangeLog: > > * config-list.mk (LIST): Disable go in bpf-*-* targets. > > gcc/ChangeLog: > > * config.gcc: Support for bpf-*-* targets. > * common/config/bpf/bpf-common.c: New file. > * config/bpf/t-bpf: Likewise. > * config/bpf/predicates.md: Likewise. > * config/bpf/constraints.md: Likewise. > * config/bpf/bpf.opt: Likewise. > * config/bpf/bpf.md: Likewise. > * config/bpf/bpf.h: Likewise. > * config/bpf/bpf.c: Likewise. > * config/bpf/bpf-protos.h: Likewise. > * config/bpf/bpf-opts.h: Likewise. > * config/bpf/bpf-helpers.h: Likewise. > * config/bpf/bpf-helpers.def: Likewise. Looks good to me, thanks. Jeff also had detailed comments, so please wait for an ack from him too. Richard
Re: [PATCH] Setup predicate for switch default case in IPA (PR ipa/91089)
That's good. Thanks for your comments. Feng From: Martin Jambor Sent: Thursday, August 29, 2019 11:00 PM To: Feng Xue OS; gcc-patches@gcc.gnu.org; Jan Hubicka Subject: Re: [PATCH] Setup predicate for switch default case in IPA (PR ipa/91089) Hi, On Fri, Jul 12 2019, Feng Xue OS wrote: > IPA does not construct executability predicate for default case of switch > statement. > So execution cost of default case is not properly evaluated in IPA-cp, this > might > prevent function clone for function containing switch statement, if certain > non-default > case is proved to be executed after constant propagation. > > This patch is composed to deduce predicate for default case, if it turns out > to be a > relative simple one, for example, we can try to merge case range, and use > comparison upon range bounds, and also range analysis information to simplify > predicate. > I have read through the patch and it looks OK to me but I cannot approve it, you have to ping Honza for that. Since you decided to use the value range info, it would be nice if you could also add a testcase where it plays a role. Also, please don't post changelog entries as a part of the patch, it basically guarantees it will not apply for anybody, not even for you when you update your trunk. Thanks for working on this, Martin > Feng > > > diff --git a/gcc/ChangeLog b/gcc/ChangeLog > index 3d92250b520..4de2f568990 100644 > --- a/gcc/ChangeLog > +++ b/gcc/ChangeLog > @@ -1,3 +1,10 @@ > +2019-07-12 Feng Xue > + > + PR ipa/91089 > + * ipa-fnsummary.c (set_switch_stmt_execution_predicate): Add predicate > + for switch default case using range analysis information. > + * params.def (PARAM_IPA_MAX_SWITCH_PREDICATE_BOUNDS): New. > + > 2019-07-11 Sunil K Pandey >
Re: [PATCH, i386]: Do not limit the cost of moves to/from XMM register to minimum 8.
On Fri, Aug 30, 2019 at 9:22 AM Richard Biener wrote: > > On Thu, Aug 29, 2019 at 9:54 AM Uros Bizjak wrote: > > > > On Wed, Aug 28, 2019 at 5:12 PM Uros Bizjak wrote: > > > > > > Attached patch improves costing for STV shifts and corrects reject > > > condition for out of range shift count operands. > > > > > > 2019-08-28 Uroš Bizjak > > > > > > * config/i386/i386-features.c > > > (general_scalar_chain::compute_convert_gain): > > > Correct cost for double-word shifts. > > > (general_scalar_to_vector_candidate_p): Reject count operands > > > greater or equal to mode bitsize. > > > > > > Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}. > > > > > > Committed to mainline SVN. > > > > Ouch... I mixed up patches and actually committed the patch that > > removes maximum from cost of sse<->int moves. > > > > I can leave the patch for a day, so we can see the effects of the cost > > change, and if the patch creates problems, I'll revert it. > > Regresses gromacs and namd quite a bit on Haswell, also perl in SPEC 2000. > I guess we should try understand why rather than reverting immediately > (I'd leave it in at least another few days to get more testers pick up the > rev.). The correct patch is actually [1] (I have updated the subject): 2019-08-28 Uroš Bizjak * config/i386/i386.c (ix86_register_move_cost): Do not limit the cost of moves to/from XMM register to minimum 8. There is no technical reason to limit the costs to minimum 8 anymore, and several targets (e.g skylake) also claim that moves between SSE and general regs are as cheap as moves between general regs. However, these values were never benchmarked properly due to the mentioned limitation and apparently cause some unwanted secondary effects (lowering the clock). I agree with your proposal to leave the change in the tree some more time. At the end, we could raise the costs for all targets to 8 to effectively revert to the previous state. [1] https://gcc.gnu.org/ml/gcc-patches/2019-08/msg02009.html Uros.
Re: [PR 91579] Avoid creating redundant PHI nodes in tail-call pass
On Thu, Aug 29, 2019 at 3:56 PM Martin Jambor wrote: > > Hi, > > On Thu, Aug 29 2019, Richard Biener wrote: > > On Thu, Aug 29, 2019 at 11:04 AM Martin Jambor wrote: > >> > >> Hi, > >> > >> when turning a tail-recursive call into a loop, the tail-call pass > >> creates a phi node for each gimple_reg function parameter that has any > >> use at all, even when the value passed to the original call is the same > >> as the received one, when it is the parameter's default definition. > >> This results in a redundant phi node in which one argument is the > >> default definition and all others are the LHS of the same phi node. See > >> the Bugzilla entry for an example. These phi nodes in turn confuses > >> ipa-prop.c which cannot skip them and may not create a pass-through jump > >> function when it should. > >> > >> Fixed by the following patch which just adds a bitmap to remember where > >> there are non-default-defs passed to a tail-recursive call and then > >> creates phi nodes only for such parameters. It has passed bootstrap and > >> testing on x86_64-linux. > >> > >> OK for trunk? > > > > OK. Eventually arg_needs_copy_p can be elided completely > > and pre-computed into the bitmap so we can just check > > the positional? And rename the bitmap to arg_need_copy itself. > > Like this? Bootstrapped and tested on x86_64-linux. Yes. This is OK. Thanks, Richard. > Thanks, > > Martin > > > 2019-08-29 Martin Jambor > > tree-optimization/91579 > * tree-tailcall.c (tailr_arg_needs_copy): New variable. > (find_tail_calls): Allocate tailr_arg_needs_copy and set its bits as > appropriate. > (arg_needs_copy_p): Removed. > (eliminate_tail_call): Test tailr_arg_needs_copy instead of calling > arg_needs_copy_p. > (tree_optimize_tail_calls_1): Likewise. Free tailr_arg_needs_copy. > > testsuite/ > * gcc.dg/tree-ssa/pr91579.c: New test. > --- > gcc/testsuite/gcc.dg/tree-ssa/pr91579.c | 22 > gcc/tree-tailcall.c | 48 + > 2 files changed, 47 insertions(+), 23 deletions(-) > create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr91579.c > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr91579.c > b/gcc/testsuite/gcc.dg/tree-ssa/pr91579.c > new file mode 100644 > index 000..ee752be1a85 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr91579.c > @@ -0,0 +1,22 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fdump-tree-tailr1" } */ > + > +typedef long unsigned int size_t; > +typedef int (*compare_t)(const void *, const void *); > + > +int partition (void *base, size_t nmemb, size_t size, compare_t cmp); > + > +void > +my_qsort (void *base, size_t nmemb, size_t size, compare_t cmp) > +{ > + int pt; > + if (nmemb > 1) > +{ > + pt = partition (base, nmemb, size, cmp); > + my_qsort (base, pt + 1, size, cmp); > + my_qsort ((void*)((char*) base + (pt + 1) * size), > + nmemb - pt - 1, size, cmp); > +} > +} > + > +/* { dg-final { scan-tree-dump-not "cmp\[^\r\n\]*PHI" "tailr1" } } */ > diff --git a/gcc/tree-tailcall.c b/gcc/tree-tailcall.c > index a4b563efd73..4824a5e650f 100644 > --- a/gcc/tree-tailcall.c > +++ b/gcc/tree-tailcall.c > @@ -126,6 +126,11 @@ struct tailcall > accumulator. */ > static tree m_acc, a_acc; > > +/* Bitmap with a bit for each function parameter which is set to true if we > + have to copy the parameter for conversion of tail-recursive calls. */ > + > +static bitmap tailr_arg_needs_copy; > + > static bool optimize_tail_call (struct tailcall *, bool); > static void eliminate_tail_call (struct tailcall *); > > @@ -727,6 +732,18 @@ find_tail_calls (basic_block bb, struct tailcall **ret) > gimple_stmt_iterator mgsi = gsi_for_stmt (stmt); > gsi_move_before (&mgsi, &gsi); > } > + if (!tailr_arg_needs_copy) > + tailr_arg_needs_copy = BITMAP_ALLOC (NULL); > + for (param = DECL_ARGUMENTS (current_function_decl), idx = 0; > + param; > + param = DECL_CHAIN (param), idx++) > + { > + tree ddef, arg = gimple_call_arg (call, idx); > + if (is_gimple_reg (param) > + && (ddef = ssa_default_def (cfun, param)) > + && (arg != ddef)) > + bitmap_set_bit (tailr_arg_needs_copy, idx); > + } > } > >nw = XNEW (struct tailcall); > @@ -905,25 +922,6 @@ decrease_profile (basic_block bb, profile_count count) > } > } > > -/* Returns true if argument PARAM of the tail recursive call needs to be > copied > - when the call is eliminated. */ > - > -static bool > -arg_needs_copy_p (tree param) > -{ > - tree def; > - > - if (!is_gimple_reg (param)) > -return false; > - > - /* Parameters that are only defined but never used need not be copied. */ > - def = ssa_default_def (cfun, param); > - if (!def) > -return false; > - > - return true; > -} > - > /* Eliminates tail call described by
Re: [PATCH] use fallback location for warning (PR 91599)
On Fri, Aug 30, 2019 at 3:58 AM Martin Sebor wrote: > > warning_at() calls with the %G directive rely on the gimple statement > for both their location and the inlining context. When the statement > is not associated with a location, the warning doesn't point at any > line even if the location argument passed to the call was valid. > The attached patch changes the percent_G_percent handler to fall back > on the provided location in that case, and the recently added warning > for char assignments to pass to the function a fallback location if > the statement doesn't have one. > > Tested on x86_64-linux. OK unless David has any further comments. Richard. > > Martin
Re: [PATCH] correct MEM_REF bounds checking of arrays (PR 91584)
On Fri, Aug 30, 2019 at 12:36 AM Martin Sebor wrote: > > The -Warray-bounds enhancement I added to GCC 9 causes false > positives in languages like Fortran whose first array element > is at a non-zero index. The attached patch has the function > responsible for the warning normalize the array bounds to > always start at zero to avoid these false positives. > > Tested on x86_64-linux. OK. Thanks, Richard. > Martin
Re: [PR91598] Improve autoprefetcher heuristic in haifa-sched.c
On Thu, Aug 29, 2019 at 7:36 PM Alexander Monakov wrote: > > On Thu, 29 Aug 2019, Maxim Kuvyrkov wrote: > > > >> r1 = [rb + 0] > > >> > > >> r2 = [rb + 8] > > >> > > >> r3 = [rb + 16] > > >> > > >> > > >> which, apparently, cortex-a53 autoprefetcher doesn't recognize. This > > >> schedule happens because r2= load gets lower priority than the > > >> "irrelevant" due to the above patch. > > >> > > >> If we think about it, the fact that "r1 = [rb + 0]" can be scheduled > > >> means that true dependencies of all similar base+offset loads are > > >> resolved. Therefore, for autoprefetcher-friendly schedule we should > > >> prioritize memory reads before "irrelevant" instructions. > > > > > > But isn't there also max number of load issues in a fetch window to > > > consider? > > > So interleaving arithmetic with loads might be profitable. > > > > It appears that cores with autoprefetcher hardware prefer loads and stores > > bundled together, not interspersed with other instructions to occupy the > > rest > > of CPU units. > > Let me point out that the motivating example has a bigger effect in play: > > (1) r1 = [rb + 0] > (2) > (3) r2 = [rb + 8] > (4) > (5) r3 = [rb + 16] > (6) > > here Cortex-A53, being an in-order core, cannot issue the load at (3) until > after the load at (1) has completed, because the use at (2) depends on it. > The good schedule allows the three loads to issue in a pipelined fashion. OK, so with dispatch/issue issues I was thinking of scheduling independent work like AGU ops inbetween the loads. It might be that some in-order cores like to see two adjacent loads to fire auto-prefetching but any such heuristic should probably be very sub-architecture specific. > So essentially the main issue is not a hardware peculiarity, but rather the > bad schedule being totally wrong (it could only make sense if loads had > 1-cycle > latency, which they do not). > > I think this highlights how implementing this autoprefetch heuristic via the > dfa_lookahead_guard interface looks questionable in the first place, but the > patch itself makes sense to me. > > Alexander
Re: [PATCH, i386]: Improve STV conversion of shifts
On Thu, Aug 29, 2019 at 9:54 AM Uros Bizjak wrote: > > On Wed, Aug 28, 2019 at 5:12 PM Uros Bizjak wrote: > > > > Attached patch improves costing for STV shifts and corrects reject > > condition for out of range shift count operands. > > > > 2019-08-28 Uroš Bizjak > > > > * config/i386/i386-features.c > > (general_scalar_chain::compute_convert_gain): > > Correct cost for double-word shifts. > > (general_scalar_to_vector_candidate_p): Reject count operands > > greater or equal to mode bitsize. > > > > Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}. > > > > Committed to mainline SVN. > > Ouch... I mixed up patches and actually committed the patch that > removes maximum from cost of sse<->int moves. > > I can leave the patch for a day, so we can see the effects of the cost > change, and if the patch creates problems, I'll revert it. Regresses gromacs and namd quite a bit on Haswell, also perl in SPEC 2000. I guess we should try understand why rather than reverting immediately (I'd leave it in at least another few days to get more testers pick up the rev.). Richard. > Sorry for the mixup, > Uros.
Re: [PATCH, i386]: Do not limit the cost of moves to/from XMM register to minimum 8.
On Fri, Aug 30, 2019 at 2:18 PM Uros Bizjak wrote: > > On Fri, Aug 30, 2019 at 2:08 AM Hongtao Liu wrote: > > > > On Fri, Aug 30, 2019 at 2:09 AM Uros Bizjak wrote: > > > > > > 2019-08-28 Uroš Bizjak > > > > > > * config/i386/i386.c (ix86_register_move_cost): Do not > > > limit the cost of moves to/from XMM register to minimum 8. > > > > > > Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}. > > > > > > Actually committed as r274994 with the wrong ChangeLog. > > > > > > Uros. > > > > There is 11% regression in 548.exchange_r of SPEC2017. > > > > Reason for the regression: > > For 548.exchange_r, a lot of movements between gpr and xmm are > > generated as expected, > > and it reduced clocksticks by 3%. > > This is OK, and expected from the patch. > > > But however maybe too many xmm registers are used, > > a frequency reduction issue is triggered(average frequency reduced by 13%). > > So totally it takes more time. > > This is a secondary effect that is currently not modelled by the compiler. > > However, I expected that SSE <-> int moves in x86-tune-cost.h will > have to be retuned. Up to now, both directions were limited to minimum > 8, so any value lower than 8 was ignored. However, minimum was set to > work-around certain limitation in reload, which is not needed anymore. > > You can simply set the values of SSE <-> int moves to 8 (which is an > arbitrary value!) to restore the previous behaviour, but I think that > a more precise cost value should be determined, probably a different > one for each direction. But until register pressure effects are > modelled, any artificially higher value will represent a workaround > and not the true reg-reg move cost. > > Uros. Yes, we're still working on skylake_cost model. -- BR, Hongtao
Re: [PATCH v2 0/9] S/390: Use signaling FP comparison instructions
On Fri, Aug 30, 2019 at 9:12 AM Richard Biener wrote: > > On Thu, Aug 29, 2019 at 5:39 PM Ilya Leoshkevich wrote: > > > > > Am 22.08.2019 um 15:45 schrieb Ilya Leoshkevich : > > > > > > Bootstrap and regtest running on x86_64-redhat-linux and > > > s390x-redhat-linux. > > > > > > This patch series adds signaling FP comparison support (both scalar and > > > vector) to s390 backend. > > > > I'm running into a problem on ppc64 with this patch, and it would be > > great if someone could help me figure out the best way to resolve it. > > > > vector36.C test is failing because gimplifier produces the following > > > > _5 = _4 > { 2.0e+0, 2.0e+0, 2.0e+0, 2.0e+0 }; > > _6 = VEC_COND_EXPR <_5, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }>; > > > > from > > > > VEC_COND_EXPR < (*b > { 2.0e+0, 2.0e+0, 2.0e+0, 2.0e+0 }) , > > { -1, -1, -1, -1 } , > > { 0, 0, 0, 0 } > > > > > Since the comparison tree code is now hidden behind a temporary, my code > > does not have anything to pass to the backend. The reason for creating > > a temporary is that the comparison can trap, and so the following check > > in gimplify_expr fails: > > > > if (gimple_seq_empty_p (internal_post) && (*gimple_test_f) (*expr_p)) > > goto out; > > > > gimple_test_f is is_gimple_condexpr, and it eventually calls > > operation_could_trap_p (GT). > > > > My current solution is to simply state that backend does not support > > SSA_NAME in vector comparisons, however, I don't like it, since it may > > cause performance regressions due to having to fall back to scalar > > comparisons. > > > > I was thinking about two other possible solutions: > > > > 1. Change the gimplifier to allow trapping vector comparisons. That's > >a bit complicated, because tree_could_throw_p checks not only for > >floating point traps, but also e.g. for array index out of bounds > >traps. So I would have to create a tree_could_throw_p version which > >disregards specific kinds of traps. > > > > 2. Change expand_vector_condition to follow SSA_NAME_DEF_STMT and use > >its tree_code instead of SSA_NAME. The potential problem I see with > >this is that there appears to be no guarantee that _5 will be inlined > >into _6 at a later point. So if we say that we don't need to fall > >back to scalar comparisons based on availability of vector > > >instruction and inlining does not happen, then what's actually will > >be required is vector selection (vsel on S/390), which might not be > >available in general case. > > > > What would be a better way to proceed here? > > On GIMPLE there isn't a good reason to split out trapping comparisons > from [VEC_]COND_EXPR - the gimplifier does this for GIMPLE_CONDs > where it is important because we'd have no way to represent EH info > when not done. It might be a bit awkward to preserve EH across RTL > expansion though in case the [VEC_]COND_EXPR are not expanded > as a single pattern, but I'm not sure. > > To go this route you'd have to split the is_gimple_condexpr check > I guess and eventually users turning [VEC_]COND_EXPR into conditional > code (do we have any?) have to be extra careful then. Oh, btw - the fact that we have an expression embedded in [VEC_]COND_EXPR is something that bothers me for quite some time already and it makes things like VN awkward and GIMPLE fincky. We've discussed alternatives to dead with the simplest being moving the comparison out to a separate stmt and others like having four operand [VEC_]COND_{EQ,NE,...}_EXPR codes or simply treating {EQ,NE,...}_EXPR as quarternary on GIMPLE with either optional 3rd and 4th operand (defaulting to boolean_true/false_node) or always explicit ones (and thus dropping [VEC_]COND_EXPR). What does LLVM do here? Richard. > Richard. > > >
Re: [PATCH v2 0/9] S/390: Use signaling FP comparison instructions
On Thu, Aug 29, 2019 at 5:39 PM Ilya Leoshkevich wrote: > > > Am 22.08.2019 um 15:45 schrieb Ilya Leoshkevich : > > > > Bootstrap and regtest running on x86_64-redhat-linux and > > s390x-redhat-linux. > > > > This patch series adds signaling FP comparison support (both scalar and > > vector) to s390 backend. > > I'm running into a problem on ppc64 with this patch, and it would be > great if someone could help me figure out the best way to resolve it. > > vector36.C test is failing because gimplifier produces the following > > _5 = _4 > { 2.0e+0, 2.0e+0, 2.0e+0, 2.0e+0 }; > _6 = VEC_COND_EXPR <_5, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }>; > > from > > VEC_COND_EXPR < (*b > { 2.0e+0, 2.0e+0, 2.0e+0, 2.0e+0 }) , > { -1, -1, -1, -1 } , > { 0, 0, 0, 0 } > > > Since the comparison tree code is now hidden behind a temporary, my code > does not have anything to pass to the backend. The reason for creating > a temporary is that the comparison can trap, and so the following check > in gimplify_expr fails: > > if (gimple_seq_empty_p (internal_post) && (*gimple_test_f) (*expr_p)) > goto out; > > gimple_test_f is is_gimple_condexpr, and it eventually calls > operation_could_trap_p (GT). > > My current solution is to simply state that backend does not support > SSA_NAME in vector comparisons, however, I don't like it, since it may > cause performance regressions due to having to fall back to scalar > comparisons. > > I was thinking about two other possible solutions: > > 1. Change the gimplifier to allow trapping vector comparisons. That's >a bit complicated, because tree_could_throw_p checks not only for >floating point traps, but also e.g. for array index out of bounds >traps. So I would have to create a tree_could_throw_p version which >disregards specific kinds of traps. > > 2. Change expand_vector_condition to follow SSA_NAME_DEF_STMT and use >its tree_code instead of SSA_NAME. The potential problem I see with >this is that there appears to be no guarantee that _5 will be inlined >into _6 at a later point. So if we say that we don't need to fall >back to scalar comparisons based on availability of vector > >instruction and inlining does not happen, then what's actually will >be required is vector selection (vsel on S/390), which might not be >available in general case. > > What would be a better way to proceed here? On GIMPLE there isn't a good reason to split out trapping comparisons from [VEC_]COND_EXPR - the gimplifier does this for GIMPLE_CONDs where it is important because we'd have no way to represent EH info when not done. It might be a bit awkward to preserve EH across RTL expansion though in case the [VEC_]COND_EXPR are not expanded as a single pattern, but I'm not sure. To go this route you'd have to split the is_gimple_condexpr check I guess and eventually users turning [VEC_]COND_EXPR into conditional code (do we have any?) have to be extra careful then. Richard. >