Re: [PATCH v2] IBM Z: Define NO_PROFILE_COUNTERS

2021-06-23 Thread Andreas Krebbel via Gcc-patches
On 6/24/21 12:42 AM, Ilya Leoshkevich wrote:
> Bootstrapped and regtested on s390x-redhat-linux.  Ok for master?
> 
> v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573348.html
> v1 -> v2: Use ATTRIBUTE_UNUSED, compact op[] array (Andreas).
>   I've also noticed that one of the nops that we generate for
>   -mnop-mcount is not needed now and removed it.  A couple
>   tests needed to be adjusted after that.
> 
> 
> 
> 
> s390 glibc does not need counters in the .data section, since it stores
> edge hits in its own data structure.  Therefore counters only waste
> space and confuse diffing tools (e.g. kpatch), so don't generate them.
> 
> gcc/ChangeLog:
> 
>   * config/s390/s390.c (s390_function_profiler): Ignore labelno
>   parameter.
>   * config/s390/s390.h (NO_PROFILE_COUNTERS): Define.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/s390/mnop-mcount-m31-mzarch.c: Adapt to the new
>   prologue size.
>   * gcc.target/s390/mnop-mcount-m64.c: Likewise.

Ok. Thanks!

Andreas


Re: [PATCH 1/13] v2 [PATCH 1/13] Add support for per-location warning groups (PR 74765)

2021-06-23 Thread Jeff Law via Gcc-patches




On 6/23/2021 1:47 PM, Martin Sebor via Gcc-patches wrote:

On 6/22/21 5:28 PM, David Malcolm wrote:

On Tue, 2021-06-22 at 19:18 -0400, David Malcolm wrote:

On Fri, 2021-06-04 at 15:41 -0600, Martin Sebor wrote:

The attached patch introduces the suppress_warning(),
warning_suppressed(), and copy_no_warning() APIs without making
use of them in the rest of GCC.  They are in three files:

    diagnostic-spec.{h,c}: Location-centric overloads.
    warning-control.cc: Tree- and gimple*-centric overloads.

The location-centric overloads are suitable to use from the
diagnostic
subsystem.  The rest can be used from the front ends and the middle
end.


[...snip...]


+/* Return true if warning OPT is suppressed for decl/expression
EXPR.
+   By default tests the disposition for any warning.  */
+
+bool
+warning_suppressed_p (const_tree expr, opt_code opt /* =
all_warnings */)
+{
+  const nowarn_spec_t *spec = get_nowarn_spec (expr);
+
+  if (!spec)
+    return get_no_warning_bit (expr);
+
+  const nowarn_spec_t optspec (opt);
+  bool dis = *spec & optspec;
+  gcc_assert (get_no_warning_bit (expr) || !dis);
+  return dis;


Looking through the patch, I don't like the use of "dis" for the "is
suppressed" bool, since...


It's an argument to a a function, unlikely to confuse anyone.  But
I also don't think it's important enough to argue about so I changed
it along with the comment below.



[...snip...]


+
+/* Enable, or by default disable, a warning for the statement STMT.
+   The wildcard OPT of -1 controls all warnings.  */


...I find the above comment to be confusingly worded due to the double-
negative.

If I'm reading your intent correctly, how about this wording:

/* Change the supression of warnings for statement STMT.
    OPT controls which warnings are affected.
    The wildcard OPT of -1 controls all warnings.
    If SUPP is true (the default), enable the suppression of the
warnings.
    If SUPP is false, disable the suppression of the warnings.  */

...and rename "dis" to "supp".

Or have I misread the intent of the patch?


+
+void
+suppress_warning (gimple *stmt, opt_code opt /* = all_warnings */,
+ bool dis /* = true */)



+{
+  if (opt == no_warning)
+    return;
+
+  const key_type_t key = convert_to_key (stmt);
+
+  dis = suppress_warning_at (key, opt, dis) || dis;
+  set_no_warning_bit (stmt, dis);
+}


[...snip...]


Also, I think I prefer having a separate entrypoints for the "all
warnings" case; on reading through the various patches I think that in
e.g.:

-  TREE_NO_WARNING (*expr_p) = 1;
+  suppress_warning (*expr_p);

I prefer:

   suppress_warnings (*expr_p);

(note the plural) since that way we can grep for them, and it seems
like better grammar to me.

Both entrypoints could be implemented by a static suppress_warning_1
internally if that makes it easier.

In that vein, "unsuppress_warning" seems far clearer to me that
"suppress_warning (FOO, false)"; IIRC there are very few uses of this
non-default arg (I couldn't find any in a quick look through the v2
kit).

Does this make sense?


In my experience, names that differ only slightly (such as in just
one letter) tend to easily lead to frustrating mistakes.  The new
warning_suppressed_p() added in this patch also handles multiple
warnings (via a wildcard) and uses a singular form, and as do
the gimple_{get,set}no_warning() functions that are being replaced.
So I don't think the suggested change is needed or would be
an improvement.

Similarly, there is no gimple_unset_no_warning() today and I don't
think adding unsuppress_warning() is necessary or would add value,
especially since there are just a handful of callers that re-enable
warnings.  For the callers that might need to enable or disable
a warning based on some condition, it's more convenient to do so
in a single call then by having to call different functions based
the value of the condition.

In the attached patch I've changed the comment as you asked and
also renamed the dis argument to supp but kept the function names
the same.  I've also moved a few changes to tree.h from a later
patch to this one to let it stand alone and regtested it on its
own on x86_64-linux.  I'll plan to go ahead with this version
unless requestsfor substantive changes come up in the review of
the rest  of the changes.

Thanks
Martin

gcc-no-warning-1.diff

Add support for per-location warning groups.

gcc/ChangeLog:

* Makefile.in (OBJS-libcommon): Add diagnostic-spec.o.
* gengtype.c (open_base_files): Add diagnostic-spec.h.
* diagnostic-spec.c: New file.
* diagnostic-spec.h: New file.
* tree.h (no_warning, all_warnings, suppress_warning_at): New
declarations.
* warning-control.cc: New file.

diff --git a/gcc/diagnostic-spec.h b/gcc/diagnostic-spec.h
new file mode 100644
index 000..62d9270ca6d
--- /dev/null
+++ b/gcc/diagnostic-spec.h
@@ -0,0 +1,140 @@
+/* Language-independent APIs to enable/disable 

Re: [PATCH 10/13] v2 Use new per-location warning APIs in the middle end

2021-06-23 Thread Jeff Law via Gcc-patches




On 6/4/2021 3:43 PM, Martin Sebor via Gcc-patches wrote:

The attached patch introduces declarations of the new
suppress_warning(), warning_suppressed_p(), and copy_warning() APIs,
and replaces the uses of TREE_NO_WARNING in the middle end with them.

gcc-no-warning-middle-end.diff

Add support for per-location warning groups.

gcc/ChangeLog:

* builtins.c (warn_string_no_nul): Replace uses of TREE_NO_WARNING,
gimple_no_warning_p and gimple_set_no_warning with
warning_suppressed_p, and suppress_warning.
(c_strlen): Same.
(maybe_warn_for_bound): Same.
(warn_for_access): Same.
(check_access): Same.
(expand_builtin_strncmp): Same.
(fold_builtin_varargs): Same.
* calls.c (maybe_warn_nonstring_arg): Same.
(maybe_warn_rdwr_sizes): Same.
* cfgexpand.c (expand_call_stmt): Same.
* cgraphunit.c (check_global_declaration): Same.
* fold-const.c (fold_undefer_overflow_warnings): Same.
(fold_truth_not_expr): Same.
(fold_unary_loc): Same.
(fold_checksum_tree): Same.
* gengtype.c (open_base_files): Same.
* gimple-array-bounds.cc (array_bounds_checker::check_array_ref): Same.
(array_bounds_checker::check_mem_ref): Same.
(array_bounds_checker::check_addr_expr): Same.
(array_bounds_checker::check_array_bounds): Same.
* gimple-expr.c (copy_var_decl): Same.
* gimple-fold.c (gimple_fold_builtin_strcpy): Same.
(gimple_fold_builtin_strncat): Same.
(gimple_fold_builtin_stxcpy_chk): Same.
(gimple_fold_builtin_stpcpy): Same.
(gimple_fold_builtin_sprintf): Same.
(fold_stmt_1): Same.
* gimple-ssa-isolate-paths.c (diag_returned_locals): Same.
* gimple-ssa-nonnull-compare.c (do_warn_nonnull_compare): Same.
* gimple-ssa-sprintf.c (handle_printf_call): Same.
* gimple-ssa-store-merging.c 
(imm_store_chain_info::output_merged_store): Same.
* gimple-ssa-warn-restrict.c (maybe_diag_overlap): Same.
(maybe_diag_access_bounds): Same.
(check_call): Same.
(check_bounds_or_overlap): Same.
* gimple.c (gimple_build_call_from_tree): Same.
* gimplify.c (gimplify_return_expr): Same.
(gimplify_cond_expr): Same.
(gimplify_modify_expr_complex_part): Same.
(gimplify_modify_expr): Same.
(gimple_push_cleanup): Same.
(gimplify_expr): Same.
* omp-expand.c (expand_omp_for_generic): Same.
(expand_omp_taskloop_for_outer): Same.
* omp-low.c (lower_rec_input_clauses): Same.
(lower_lastprivate_clauses): Same.
(lower_send_clauses): Same.
(lower_omp_target): Same.
* tree-cfg.c (pass_warn_function_return::execute): Same.
* tree-complex.c (create_one_component_var): Same.
* tree-inline.c (remap_gimple_op_r): Same.
(copy_tree_body_r): Same.
(declare_return_variable): Same.
(expand_call_inline): Same.
* tree-nested.c (lookup_field_for_decl): Same.
* tree-sra.c (create_access_replacement): Same.
(generate_subtree_copies): Same.
* tree-ssa-ccp.c (pass_post_ipa_warn::execute): Same.
* tree-ssa-forwprop.c (combine_cond_expr_cond): Same.
* tree-ssa-loop-ch.c (ch_base::copy_headers): Same.
* tree-ssa-loop-im.c (execute_sm): Same.
* tree-ssa-phiopt.c (cond_store_replacement): Same.
* tree-ssa-strlen.c (maybe_warn_overflow): Same.
(handle_builtin_strcpy): Same.
(maybe_diag_stxncpy_trunc): Same.
(handle_builtin_stxncpy_strncat): Same.
(handle_builtin_strcat): Same.
* tree-ssa-uninit.c (get_no_uninit_warning): Same.
(set_no_uninit_warning): Same.
(uninit_undefined_value_p): Same.
(warn_uninit): Same.
(maybe_warn_operand): Same.
* tree-vrp.c (compare_values_warnv): Same.
* vr-values.c (vr_values::extract_range_for_var_from_comparison_expr): 
Same.
(test_for_singularity): Same.

* gimple.h (warning_suppressed_p): New function.
(suppress_warning): Same.
(copy_no_warning): Same.
(gimple_set_block): Call gimple_set_location.
(gimple_set_location): Call copy_warning.
* tree.h (no_warning, all_warnings): New constants.
(warning_suppressed_p): New function.
(suppress_warning): Same.
(copy_no_warning): Same.

OK once prereqs are approved.

jeff



Re: [PATCH 6/13] v2 Use new per-location warning APIs in the C++ front end

2021-06-23 Thread Jeff Law via Gcc-patches




On 6/4/2021 3:42 PM, Martin Sebor via Gcc-patches wrote:

The attached patch replaces the uses of TREE_NO_WARNING in the C++
front end with the new suppress_warning(), warning_suppressed_p(),
and copy_warning() APIs.

gcc-no-warning-cp.diff

Add support for per-location warning groups.

* call.c (build_over_call): Replace direct uses of TREE_NO_WARNING
with warning_suppressed_p, suppress_warning, and copy_no_warning, or
nothing if not necessary.
(set_up_extended_ref_temp): Same.
* class.c (layout_class_type): Same.
* constraint.cc (constraint_satisfaction_value): Same.
* coroutines.cc (finish_co_await_expr): Same.
(finish_co_yield_expr): Same.
(finish_co_return_stmt): Same.
(build_actor_fn): Same.
(coro_rewrite_function_body): Same.
(morph_fn_to_coro): Same.
* cp-gimplify.c (genericize_eh_spec_block): Same.
(gimplify_expr_stmt): Same.
(cp_genericize_r): Same.
(cp_fold): Same.
* cp-ubsan.c (cp_ubsan_instrument_vptr): Same.
* cvt.c (cp_fold_convert): Same.
(convert_to_void): Same.
* decl.c (wrapup_namespace_globals): Same.
(grokdeclarator): Same.
(finish_function): Same.
(require_deduced_type): Same.
* decl2.c (no_linkage_error): Same.
(c_parse_final_cleanups): Same.
* except.c (expand_end_catch_block): Same.
* init.c (build_new_1): Same.
(build_new): Same.
(build_vec_delete_1): Same.
(build_vec_init): Same.
(build_delete): Same.
* method.c (defaultable_fn_check): Same.
* parser.c (cp_parser_fold_expression): Same.
(cp_parser_primary_expression): Same.
* pt.c (push_tinst_level_loc): Same.
(tsubst_copy): Same.
(tsubst_omp_udr): Same.
(tsubst_copy_and_build): Same.
* rtti.c (build_if_nonnull): Same.
* semantics.c (maybe_convert_cond): Same.
(finish_return_stmt): Same.
(finish_parenthesized_expr): Same.
(cp_check_omp_declare_reduction): Same.
* tree.c (build_cplus_array_type): Same.
* typeck.c (build_ptrmemfunc_access_expr): Same.
(cp_build_indirect_ref_1): Same.
(cp_build_function_call_vec): Same.
(warn_for_null_address): Same.
(cp_build_binary_op): Same.
(unary_complex_lvalue): Same.
(cp_build_modify_expr): Same.
(build_x_modify_expr): Same.
(convert_for_assignment): Same.
OK once prereqs are approved.  Note that I wouldn't be terribly 
surprised if changes in the C++ front-end over the last couple weeks 
create a conflict.  Consider resolution of such conflicts (in the C++ 
front-end or elsewhere) or removing additional uses of TREE_NO_WARNING 
that have crept in over the last two weeks pre-approved.


Jeff



Re: [PATCH 3/13] v2 Use new per-location warning APIs in C front end

2021-06-23 Thread Jeff Law via Gcc-patches




On 6/4/2021 3:41 PM, Martin Sebor via Gcc-patches wrote:

The attached patch replaces the uses of TREE_NO_WARNING in the C front
end with the new suppress_warning(), warning_suppressed_p(), and
copy_warning() APIs.

gcc-no-warning-c.diff

Add support for per-location warning groups.

gcc/c/ChangeLog:

* c-decl.c (pop_scope): Replace direct uses of TREE_NO_WARNING with
warning_suppressed_p, suppress_warning, and copy_no_warning.
(diagnose_mismatched_decls): Same.
(duplicate_decls): Same.
(grokdeclarator): Same.
(finish_function): Same.
(c_write_global_declarations_1): Same.
* c-fold.c (c_fully_fold_internal): Same.
* c-parser.c (c_parser_expr_no_commas): Same.
(c_parser_postfix_expression): Same.
* c-typeck.c (array_to_pointer_conversion): Same.
(function_to_pointer_conversion): Same.
(default_function_array_conversion): Same.
(convert_lvalue_to_rvalue): Same.
(default_conversion): Same.
(build_indirect_ref): Same.
(build_function_call_vec): Same.
(build_atomic_assign): Same.
(build_unary_op): Same.
(c_finish_return): Same.
(emit_side_effect_warnings): Same.
(c_finish_stmt_expr): Same.
(c_omp_clause_copy_ctor): Same.

OK once prereqs are approved.
jeff



Re: [PATCH 2/13] v2 Use new per-location warning APIs in Ada.

2021-06-23 Thread Jeff Law via Gcc-patches




On 6/4/2021 3:41 PM, Martin Sebor via Gcc-patches wrote:

The attached patch replaces the uses of TREE_NO_WARNING in the Ada front
end with the new suppress_warning(), warning_suppressed_p(), and
copy_warning() APIs.

gcc-no-warning-ada.diff

Add support for per-location warning groups.

gcc/ada/ChangeLog:

* gcc-interface/trans.c (Handled_Sequence_Of_Statements_to_gnu):
Replace TREE_NO_WARNING with suppress_warning.
(gnat_gimplify_expr): Same.
* gcc-interface/utils.c (gnat_pushdecl): Same.

OK once prereqs are approved.
jeff



Re: [PATCH 4/13] v2 Use new per-location warning APIs in C family code

2021-06-23 Thread Jeff Law via Gcc-patches




On 6/4/2021 3:42 PM, Martin Sebor via Gcc-patches wrote:

The attached patch replaces the uses of TREE_NO_WARNING in the shared
C family front end with the new suppress_warning(),
warning_suppressed_p(), and copy_warning() APIs.

gcc-no-warning-c-family.diff

Add support for per-location warning groups.

gcc/c-family/ChangeLog:

* c-common.c (c_wrap_maybe_const): Remove TREE_NO_WARNING.
(c_common_truthvalue_conversion): Replace direct uses of
TREE_NO_WARNING with warning_suppressed_p, suppress_warning, and
copy_no_warning.
(check_function_arguments_recurse): Same.
* c-gimplify.c (c_gimplify_expr): Same.
* c-warn.c (overflow_warning): Same.
(warn_logical_operator): Same.
(warn_if_unused_value): Same.
(do_warn_unused_parameter): Same.

OK once prereqs are approved.
jeff



Re: [PATCH 5/13] v2 Use new per-location warning APIs in the RL78 back end

2021-06-23 Thread Jeff Law via Gcc-patches




On 6/4/2021 3:42 PM, Martin Sebor via Gcc-patches wrote:

The attached patch replaces the uses of TREE_NO_WARNING in the RL78
back end with the new suppress_warning() and warning_suppressed_p()
APIs.

gcc-no-warning-rl78.diff

Add support for per-location warning groups.

gcc/ChangeLog:

* config/rl78/rl78.c (rl78_handle_naked_attribute): Replace a direct
use of TREE_NO_WARNING with suppress_warning.

OK once prereqs are approved.

Jeff


Re: [PATCH 7/13] v2 Use new per-location warning APIs in the FORTRAN front end

2021-06-23 Thread Jeff Law via Gcc-patches




On 6/4/2021 3:42 PM, Martin Sebor via Gcc-patches wrote:

The attached patch replaces the uses of TREE_NO_WARNING in the FORTRAN
front end with the new suppress_warning() API.

gcc-no-warning-fortran.diff

Add support for per-location warning groups.

gcc/fortran/ChangeLog:

* trans-array.c (trans_array_constructor): Replace direct uses
of TREE_NO_WARNING with warning_suppressed_p, and suppress_warning.
* trans-decl.c (gfc_build_qualified_array): Same.
(gfc_build_dummy_array_decl): Same.
(generate_local_decl): Same.
(gfc_generate_function_code): Same.
* trans-openmp.c (gfc_omp_clause_default_ctor): Same.
(gfc_omp_clause_copy_ctor): Same.
* trans-types.c (get_dtype_type_node): Same.
(gfc_get_desc_dim_type): Same.
(gfc_get_array_descriptor_base): Same.
(gfc_get_caf_vector_type): Same.
(gfc_get_caf_reference_type): Same.
* trans.c (gfc_create_var_np): Same.

OK once prereqs are approved.
jeff



Re: [PATCH 8/13] v2 Use new per-location warning APIs in libcc1

2021-06-23 Thread Jeff Law via Gcc-patches




On 6/4/2021 3:42 PM, Martin Sebor via Gcc-patches wrote:

The attached patch replaces the uses of TREE_NO_WARNING in libcc1 with
the new suppress_warning() API.

gcc-no-warning-libcc1.diff

Add support for per-location warning groups.

libcc1/ChangeLog:

* libcp1plugin.cc (record_decl_address): Replace a direct use
of TREE_NO_WARNING with suppress_warning.

OK once prereqs are approved.
jeff



Re: [PATCH 9/13] v2 Use new per-location warning APIs in LTO

2021-06-23 Thread Jeff Law via Gcc-patches




On 6/4/2021 3:43 PM, Martin Sebor via Gcc-patches wrote:

The attached patch replaces the uses of TREE_NO_WARNING in the LTO
front end with the new suppress_warning() API.  It adds a couple of
FIXMEs that I plan to take care of in a follow up.

gcc-no-warning-lto.diff

Add support for per-location warning groups.

gcc/lto/ChangeLog:

* gimple-streamer-out.c (output_gimple_stmt): Same.
* lto-common.c (compare_tree_sccs_1): Expand use of TREE_NO_WARNING.
* lto-streamer-out.c (hash_tree): Same.
* tree-streamer-in.c (unpack_ts_base_value_fields): Same.
* tree-streamer-out.c (pack_ts_base_value_fields): Same.

OK once prereqs are approved.

jeff



Re: [PATCH 11/13] v2 Use new per-location warning APIs in the Objective-C front end

2021-06-23 Thread Jeff Law via Gcc-patches




On 6/4/2021 3:43 PM, Martin Sebor via Gcc-patches wrote:

The attached patch replaces the uses of TREE_NO_WARNING in
the Objective-C front end with the new suppress_warning(),
warning_suppressed_p(), and copy_warning() APIs.

gcc-no-warning-objc.diff

Add support for per-location warning groups.

gcc/objc/ChangeLog:

* objc-act.c (objc_maybe_build_modify_expr): Replace direct uses
of TREE_NO_WARNING with warning_suppressed_p, and suppress_warning.
(objc_build_incr_expr_for_property_ref): Same.
(objc_build_struct): Same.
(synth_module_prologue): Same.
* objc-gnu-runtime-abi-01.c (gnu_runtime_01_initialize): Same.
* objc-next-runtime-abi-01.c (next_runtime_01_initialize): Same.
* objc-next-runtime-abi-02.c (next_runtime_02_initialize): Same.

OK once prereqs are approved.

Jeff


Re: [PATCH 12/13] v2 Remove TREE_NO_WARNING and gimple*no_warning* APIs

2021-06-23 Thread Jeff Law via Gcc-patches




On 6/4/2021 3:43 PM, Martin Sebor via Gcc-patches wrote:

The attached patch removes the definitions of the TREE_NO_WARNING macro
and the gimple_get_no_warning_p() and gimple_set_no_warning() functions.

gcc-no-warning-remove-api.diff

Remove legacy TREE_NO_WARNING amd gimple_*no_warning* APIs.

gcc/ChangeLog:

* tree.h (TREE_NO_WARNING): Remove.
* gimple.h (gimple_no_warning_p): Remove.
(gimple_suppress_warning): Same.
Is there any value in marking these as poisoned?  We've done that for 
some macros in the past, but we haven't been consistent with it.    If 
you think it's worth poisoning, the place to do that is system.h.


Approved with or without poisoning once prereqs are approved.

jeff



Re: [PATCH] [i386] Support avx512 vector shift with vector [PR98434]

2021-06-23 Thread Hongtao Liu via Gcc-patches
On Wed, Jun 23, 2021 at 5:08 PM Richard Biener
 wrote:
>
> On Wed, Jun 23, 2021 at 10:01 AM Jakub Jelinek  wrote:
> >
> > On Wed, Jun 23, 2021 at 09:53:27AM +0200, Richard Biener via Gcc-patches 
> > wrote:
> > > On Wed, Jun 23, 2021 at 9:19 AM Hongtao Liu via Gcc-patches
> > >  wrote:
> > > >
> > > > Here's the patch I'm going to check in.
> > > >
> > > > The patch will regress pr91838.C with extra options: -march=cascadelake
> > > >
> > > > using T = unsigned char; // or ushort, or uint
> > > > using V [[gnu::vector_size(8)]] = T;
> > > > V f(V x) { return x >> 8 * sizeof(T); }
> > > >
> > > > Basically, the testcase is UB with logic right shift more than 8 bits
> > >
> > > I don't see any UB here, it's just x >> 8 but we indeed fail to constant
> > > fold this.  For scalars bit CCP does this, but we seem to lack a
> >
> > For scalar T y = ...; y >> 8 is not UB because of integral promotion to
> > int, but we do not perform such integral promotions for vector types,
> > so arguably x >> 8 is UB.
>
> But this vector shift is a GCC extension and we dont document this UB
> which means the natural reading would be that the shift is applied
> to the promoted element and then the result truncated?  We document:
>
> It is possible to use shifting operators @code{<<}, @code{>>} on
> integer-type vectors. The operation is defined as following: @code{@{a0,
> a1, @dots{}, an@} >> @{b0, b1, @dots{}, bn@} == @{a0 >> b0, a1 >> b1,
> @dots{}, an >> bn@}}@. Vector operands must have the same number of
> elements.
>
> so IMHO it's not UB and nothing optimizes it since VRP and bit CCP only
> operate on scalars, not vectors.  Of course it would take quite some
> work in those passes to also fold sth like
>
>__builtin_convertvector (v4qi, v4si) >> 8
>
> thus where a unsigned char vector is widened to int and the int vector
> shifted.
>
> Richard.
>
> > Jakub
> >

I've committed the patch, and opened PR101187 for the failed case.


-- 
BR,
Hongtao


Re: [PATCH 13/13] v2 Add regression tests for PR 74765 and 74762

2021-06-23 Thread Jeff Law via Gcc-patches




On 6/4/2021 3:43 PM, Martin Sebor via Gcc-patches wrote:

The attached patch adds regression tests for two closely related bugs
resolved by the patch series.

gcc-no-warning-tests.diff

Regression tests for TREE_NO_WARNING enhancement to warning groups.

PR middle-end/74765 - missing uninitialized warning (parenthesis, 
TREE_NO_WARNING abuse)
PR middle-end/74762 - [9/10/11/12 Regression] missing uninitialized warning 
(C++, parenthesized expr, TREE_NO_WARNING)

gcc/testsuite/ChangeLog:

* g++.dg/uninit-pr74762.C: New test.
* g++.dg/warn/uninit-pr74765.C: Same.

diff --git a/gcc/testsuite/g++.dg/uninit-pr74762.C 
b/gcc/testsuite/g++.dg/uninit-pr74762.C

This is OK once the prereqs are all approved.
jeff



Re: [PATCH] Disparage slightly the mask register alternative for bitwise operations. [PR target/101142]

2021-06-23 Thread Hongtao Liu via Gcc-patches
On Wed, Jun 23, 2021 at 5:55 PM Uros Bizjak  wrote:
>
> On Wed, Jun 23, 2021 at 11:41 AM Uros Bizjak  wrote:
> >
> > On Wed, Jun 23, 2021 at 11:32 AM Hongtao Liu  wrote:
> >
> > > > > > > > Also when allocano cost of GENERAL_REGS is same as MASK_REGS, 
> > > > > > > > allocate
> > > > > > > > MASK_REGS first since it has already been disparaged.
> > > > > > > >
> > > > > > > > gcc/ChangeLog:
> > > > > > > >
> > > > > > > > PR target/101142
> > > > > > > > * config/i386/i386.md: (*anddi_1): Disparage slightly 
> > > > > > > > the mask
> > > > > > > > register alternative.
> > > > > > > > (*and_1): Ditto.
> > > > > > > > (*andqi_1): Ditto.
> > > > > > > > (*andn_1): Ditto.
> > > > > > > > (*_1): Ditto.
> > > > > > > > (*qi_1): Ditto.
> > > > > > > > (*one_cmpl2_1): Ditto.
> > > > > > > > (*one_cmplsi2_1_zext): Ditto.
> > > > > > > > (*one_cmplqi2_1): Ditto.
> > > > > > > > * config/i386/i386.c (x86_order_regs_for_local_alloc): 
> > > > > > > > Change
> > > > > > > > the order of mask registers to be before general 
> > > > > > > > registers.
> > > > > > > >
> > > > > > > > gcc/testsuite/ChangeLog:
> > > > > > > >
> > > > > > > > PR target/101142
> > > > > > > > * gcc.target/i386/spill_to_mask-1.c: Adjust testcase.
> > > > > > > > * gcc.target/i386/spill_to_mask-2.c: Adjust testcase.
> > > > > > > > * gcc.target/i386/spill_to_mask-3.c: Adjust testcase.
> > > > > > > > * gcc.target/i386/spill_to_mask-4.c: Adjust testcase.
> > > > > > >
> > > > > > > OK with a comment addition, see inline.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Uros.
> > > > > > >
> > > > > > > > ---
> > > > > > > >  gcc/config/i386/i386.c|  8 +-
> > > > > > > >  gcc/config/i386/i386.md   | 20 ++---
> > > > > > > >  .../gcc.target/i386/spill_to_mask-1.c | 89 
> > > > > > > > +--
> > > > > > > >  .../gcc.target/i386/spill_to_mask-2.c | 11 ++-
> > > > > > > >  .../gcc.target/i386/spill_to_mask-3.c | 11 ++-
> > > > > > > >  .../gcc.target/i386/spill_to_mask-4.c | 11 ++-
> > > > > > > >  6 files changed, 91 insertions(+), 59 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> > > > > > > > index a61255857ff..a651853ca3b 100644
> > > > > > > > --- a/gcc/config/i386/i386.c
> > > > > > > > +++ b/gcc/config/i386/i386.c
> > > > > > > > @@ -20463,6 +20463,10 @@ x86_order_regs_for_local_alloc (void)
> > > > > > > > int pos = 0;
> > > > > > > > int i;
> > > > > > > >
> > > > > > > > +   /* Mask register.  */
> > > > > > > > +   for (i = FIRST_MASK_REG; i <= LAST_MASK_REG; i++)
> > > > > > > > + reg_alloc_order [pos++] = i;
> > > > > > >
> > > > > > > Please add a comment why mask registers should come first.
> > > > > > Thanks for the review, this is the patch i'm check in.
> > > > >
> > > > > This patch again caused unwanted mask instructions with -m32 in cpuid
> > > > > check code, e.g.:
> > > > >
> > > > > Running target unix/-m32
> > > > > FAIL: gcc.target/i386/avx512bw-pr70329-1.c execution test
> > > > > FAIL: gcc.target/i386/pr96814.c execution test
> > > > >
> > > > > Debugging pr96814 failure:
> > > > >
> > > > >   0x0804921d <+66>:mov%edx,%ecx
> > > > >   0x0804921f <+68>:cpuid
> > > > > => 0x08049221 <+70>:kmovd  %edx,%k0
> > > > >   0x08049225 <+74>:mov%eax,-0x8(%ebp)
> > > > >   0x08049228 <+77>:mov%ebx,-0xc(%ebp)
> > > > >
> > > > > It looks to me that putting mask registers in front of GPR is looking
> > > > So it's not functionality but performance issue here, under 32-bit
> > > > mode there are only 8 gprs which result in higher register pressure,
> > > > and for this we do have mask->integer and integer->mask cost, with
> > > > -mtune=bdver1 where cost of kmovd is quite high(16, 20 /*
> > > > mask->integer and integer->mask moves */), there's no mask
> > > > instructions in cpuid.
> > > > I guess we can adjust mask->integer and integer->mask for 32-bit mode
> > > > to avoid such a situation?
> > > I notice the default option is O0, with -O there's no mask instructions.
> > > IMHO, We don't need to change cost unless there's -O2 cases where mask
> > > instructions regress performance here.
> >
> > No, this reasoning is not acceptable. The compiled code will SIGILL on
> > targets where unsupported mask registers are involved, so GPR should
> > always have priority compared to mask registers. Based on these
> > findings, x86_order_regs_for_local_alloc change should be reverted,
> > and register move costs compensated elsewhere.
>
> Longterm, I think that introducing vector BImode and VxBI vectors
> should solve this issue. This would make a clear cut between mask and
> GPR operations. I think that generic vector infrastructure supports
> BImode vectors, so using e.g. "a & b" instead of builtins 

Re: [PATCH 1/3] Add IEEE 128-bit min/max support on PowerPC.

2021-06-23 Thread Segher Boessenkool
Hi!

On Thu, Jun 17, 2021 at 03:18:48PM -0400, Michael Meissner wrote:
> > The actual insns only check TARGET_POWER10 (so no TARGET_FLOAT128_HW).
> > Which is right, this or that?
> 
> It should include TARGET_FLOAT128_HW.

Okay, so fix that :-)

> The problem area is a power10 running in
> big endian mode and running 32-bit code.  Because we don't have TImode, we
> can't enable the IEEE 128-bit hardware instructions.

I don't see why not?

> > > +/* { dg-require-effective-target ppc_float128_hw } */
> > > +/* { dg-require-effective-target power10_ok } */
> > > +/* { dg-options "-mdejagnu-cpu=power10 -O2 -ffast-math" } */
> > 
> > In testcases we can assume that float128_hw is set whenever we have a
> > p10; we don't manually disable it to make live hard for ourselves ;-)
> 
> Again, I put it in case somebody builds a BE power10 compiler.

This should still be fixed.  And yes, people do test BE p10, of course.
And BE p10 *should* enable the QP float insns.  Does it not currently?


Segher


Re: [PATCH] tree-optimization/101173 - fix interchange dependence checking

2021-06-23 Thread Jeff Law via Gcc-patches




On 6/23/2021 7:16 AM, Richard Biener wrote:

This adjusts the loop interchange dependence checking to disallow
an outer loop dependence distance of zero.

Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?

Thanks,
Richard.

2021-06-23  Richard Biener  

PR tree-optimization/101173
* gimple-loop-interchange.cc
(tree_loop_interchange::valid_data_dependences): Disallow outer
loop dependence distance of zero.

* gcc.dg/torture/pr101173.c: New testcase.

OK
jeff



Re: [PATCH] tree-optimization/101105 - fix runtime alias test optimization

2021-06-23 Thread Jeff Law via Gcc-patches




On 6/23/2021 4:46 AM, Richard Biener wrote:

We were ignoring DR_STEP for VF == 1 which is OK only in case
the scalar order is preserved or both DR steps are the same.

Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?

Thanks,
Richard.

2021-06-23  Richard Biener  

PR tree-optimization/101105
* tree-vect-data-refs.c (vect_prune_runtime_alias_test_list):
Only ignore steps when they are equal or scalar order is preserved.

* gcc.dg/torture/pr101105.c: New testcase.

OK
jeff



Re: [PATCH 5/6] make get_domminated_by_region return a auto_vec

2021-06-23 Thread Martin Sebor via Gcc-patches

On 6/22/21 11:23 PM, Trevor Saunders wrote:

On Tue, Jun 22, 2021 at 02:01:24PM -0600, Martin Sebor wrote:

On 6/21/21 1:15 AM, Richard Biener wrote:

On Fri, Jun 18, 2021 at 6:03 PM Martin Sebor  wrote:


On 6/18/21 4:38 AM, Richard Biener wrote:

On Thu, Jun 17, 2021 at 4:43 PM Martin Sebor  wrote:


On 6/17/21 12:03 AM, Richard Biener wrote:

On Wed, Jun 16, 2021 at 6:01 PM Martin Sebor  wrote:


On 6/16/21 6:46 AM, Richard Sandiford via Gcc-patches wrote:

Richard Biener via Gcc-patches  writes:

On Tue, Jun 15, 2021 at 8:02 AM Trevor Saunders  wrote:


This makes it clear the caller owns the vector, and ensures it is cleaned up.

Signed-off-by: Trevor Saunders 

bootstrapped and regtested on x86_64-linux-gnu, ok?


OK.

Btw, are "standard API" returns places we can use 'auto'?  That would avoid
excessive indent for

-  dom_bbs = get_dominated_by_region (CDI_DOMINATORS,
-bbs.address (),
-bbs.length ());
+  auto_vec dom_bbs = get_dominated_by_region (CDI_DOMINATORS,
+  bbs.address (),
+  bbs.length ());

and just uses

   auto dom_bbs = get_dominated_by_region (...

Not asking you to do this, just a question for the audience.


Personally I think this would be surprising for something that doesn't
have copy semantics.  (Not that I'm trying to reopen that debate here :-)
FWIW, I agree not having copy semantics is probably the most practical
way forward for now.)


But you did open the door for me to reiterate my strong disagreement
with that.  The best C++ practice going back to the early 1990's is
to make types safely copyable and assignable.  It is the default for
all types, in both C++ and C, and so natural and expected.

Preventing copying is appropriate in special and rare circumstances
(e.g, a mutex may not be copyable, or a file or iostream object may
not be because they represent a unique physical resource.)

In the absence of such special circumstances preventing copying is
unexpected, and in the case of an essential building block such as
a container, makes the type difficult to use.

The only argument for disabling copying that has been given is
that it could be surprising(*).  But because all types are copyable
by default the "surprise" is usually when one can't be.

I think Richi's "surprising" has to do with the fact that it lets
one inadvertently copy a large amount of data, thus leading to
an inefficiency.  But by analogy, there are infinitely many ways
to end up with inefficient code (e.g., deep recursion, or heap
allocation in a loop), and they are not a reason to ban the coding
constructs that might lead to it.

IIUC, Jason's comment about surprising effects was about implicit
conversion from auto_vec to vec.  I share that concern, and agree
that it should be addressed by preventing the conversion (as Jason
suggested).


But fact is that how vec<> and auto_vec<> are used today in GCC
do not favor that.  In fact your proposed vec<> would be quite radically
different (and IMHO vec<> and auto_vec<> should be unified then to
form your proposed new container).  auto_vec<> at the moment simply
maintains ownership like a smart pointer - which is _also_ not copyable.


Yes, as we discussed in the review below, vec is not a good model
because (as you note again above) it's constrained by its legacy
uses.  The best I think we can do for it is to make it safer to
use.
https://gcc.gnu.org/pipermail/gcc-patches/2021-June/571622.html


Which is what Trevors patches do by simply disallowing things
that do not work at the moment.


(Smart pointers don't rule out copying.  A std::unique_ptr does
and std::shared_ptr doesn't.  But vec and especially auto_vec
are designed to be containers, not "unique pointers" so any
relationship there is purely superficial and a distraction.)

That auto_vec and vec share a name and an is-a relationship is
incidental, an implementation detail leaked into the API.  A better
name than vector is hard to come up with, but the public inheritance
is a design flaw, a bug waiting to be introduced due to the conversion
and the assumptions the base vec makes about POD-ness and shallow
copying.  Hindsight is always 20/20 but past mistakes should not
dictate the design of a general purpose vector-like container in
GCC.


That auto_vec<> "decays" to vec<> was on purpose design.

By-value passing of vec<> is also on purpose to avoid an extra
pointer indirection on each access.


I think you may have misunderstood what I mean by is-a relationship.
It's fine to convert an auto_vec to another interface.  The danger
is in allowing that to happen implicitly because that tends to let
it happen even when it's not intended.  The usual way to avoid
that risk is to provide a conversion function, like
auto_vec::to_vec().  This is also why standard classes like
std::vector or std::string don't allow such 

[PATCH] c++: requires-expression folding [PR101182]

2021-06-23 Thread Patrick Palka via Gcc-patches
Here we're crashing because cp_fold_function walks into the (templated)
requirements of a requires-expression outside a template, but the
folding routines aren't prepared to handle templated trees.  This patch
fixes this by making cp_fold use evaluate_requires_expr to fold a
requires-expression as a whole, which also means we no longer need to
explicitly do so during gimplification.  (We don't immediately fold
non-dependent requires-expressions at parse time for sake of better
diagnostics in case one appears as the condition of a failed
static_assert.)

Bootstrapped and regtested on x86_64-pc-linux-gnu, also tested on
cmcstl2 and range-v3, does this look OK for trunk/11?

PR c++/101182

gcc/cp/ChangeLog:

* constraint.cc (evaluate_requires_expr): Adjust function comment.
* cp-gimplify.c (cp_genericize_r) : Move to ...
(cp_fold) : ... here.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-requires27.C: New test.
---
 gcc/cp/constraint.cc |  2 +-
 gcc/cp/cp-gimplify.c | 10 --
 gcc/testsuite/g++.dg/cpp2a/concepts-requires27.C | 10 ++
 3 files changed, 15 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-requires27.C

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index 74b16d27101..286332ba2a7 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -3340,7 +3340,7 @@ evaluate_concept_check (tree check)
 }
 
 /* Evaluate the requires-expression T, returning either boolean_true_node
-   or boolean_false_node.  This is used during gimplification and constexpr
+   or boolean_false_node.  This is used during folding and constexpr
evaluation.  */
 
 tree
diff --git a/gcc/cp/cp-gimplify.c b/gcc/cp/cp-gimplify.c
index 96d91b6ef2f..33e10556e32 100644
--- a/gcc/cp/cp-gimplify.c
+++ b/gcc/cp/cp-gimplify.c
@@ -1465,12 +1465,6 @@ cp_genericize_r (tree *stmt_p, int *walk_subtrees, void 
*data)
TARGET_EXPR_NO_ELIDE (stmt) = 1;
   break;
 
-case REQUIRES_EXPR:
-  /* Emit the value of the requires-expression.  */
-  *stmt_p = evaluate_requires_expr (stmt);
-  *walk_subtrees = 0;
-  break;
-
 case TEMPLATE_ID_EXPR:
   gcc_assert (concept_check_p (stmt));
   /* Emit the value of the concept check.  */
@@ -2708,6 +2702,10 @@ cp_fold (tree x)
x = r;
   break;
 
+case REQUIRES_EXPR:
+  x = evaluate_requires_expr (x);
+  break;
+
 default:
   return org_x;
 }
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-requires27.C 
b/gcc/testsuite/g++.dg/cpp2a/concepts-requires27.C
new file mode 100644
index 000..87a759a689a
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-requires27.C
@@ -0,0 +1,10 @@
+// PR c++/101182
+// { dg-do compile { target concepts } }
+
+int a;
+int g(bool);
+
+int f() {
+  g(requires { a++; });
+  return requires { a++; };
+}
-- 
2.32.0.93.g670b81a890



Re: [PATCH 5/6] make get_domminated_by_region return a auto_vec

2021-06-23 Thread Martin Sebor via Gcc-patches

On 6/23/21 1:43 AM, Richard Biener wrote:

On Wed, Jun 23, 2021 at 7:23 AM Trevor Saunders  wrote:


On Tue, Jun 22, 2021 at 02:01:24PM -0600, Martin Sebor wrote:

On 6/21/21 1:15 AM, Richard Biener wrote:

[...]


But maybe I'm misunderstanding C++ too much :/

Well, I guess b) from above means auto_vec<> passing to
vec<> taking functions will need changes?


Converting an auto_vec object to a vec slices off its data members.
The auto_vec specialization has no data members so that's not
a bug in and of itself, but auto_vec does have data members
so that would be a bug.  The risk is not just passing it to
functions by value but also returning it.  That risk was made
worse by the addition of the move ctor.


I would agree that the conversion from auto_vec<> to vec<> is
questionable, and should get some work at some point, perhaps just
passingauto_vec references is good enough, or perhaps there is value in
some const_vec view to avoid having to rely on optimizations, I'm not
sure without looking more at the usage.


We do need to be able to provide APIs that work with both auto_vec<>
and vec<>, I agree that those currently taking a vec<> by value are
fragile (and we've had bugs there before), but I'm not ready to say
that changing them all to [const] vec<>& is OK.  The alternative
would be passing a const_vec<> by value, passing that along to
const vec<>& APIs should be valid then (I can see quite some API
boundary cleanups being necessary here ...).

But with all this I don't know how to adjust auto_vec<> to no
longer "decay" to vec<> but still being able to pass it to vec<>&
and being able to call vec<> member functions w/o jumping through
hoops.  Any hints on that?  private inheritance achieves the first
but also hides all the API ...


The simplest way to do that is by preventing the implicit conversion
between the two types and adding a to_vec() member function to
auto_vec as Jason suggested.  This exposes the implicit conversions
to the base vec, forcing us to review each and "fix" it either by
changing the argument to either vec& or const vec&, or less often
to avoid the indirection if necessary, by converting the auto_vec
to vec explicitly by calling to_vec().  The attached diff shows
a very rough POC of what  that might look like.  A side benefit
is that it improves const-safety and makes the effects of functions
on their callers easier to understand.

But that's orthogonal to making auto_vec copy constructible and copy
assignable.  That change can be made independently and with much less
effort and no risk.

Martin
diff --git a/gcc/c/c-tree.h b/gcc/c/c-tree.h
index a671a3eb740..ab6db3860f5 100644
--- a/gcc/c/c-tree.h
+++ b/gcc/c/c-tree.h
@@ -759,8 +759,9 @@ extern tree c_finish_omp_clauses (tree, enum c_omp_region_type);
 extern tree c_build_va_arg (location_t, tree, location_t, tree);
 extern tree c_finish_transaction (location_t, tree, int);
 extern bool c_tree_equal (tree, tree);
-extern tree c_build_function_call_vec (location_t, vec, tree,
-   vec *, vec *);
+extern tree c_build_function_call_vec (location_t, const vec&,
+   tree, vec *,
+   vec *);
 extern tree c_omp_clause_copy_ctor (tree, tree, tree);
 
 /* Set to 0 at beginning of a function definition, set to 1 if
diff --git a/gcc/dominance.c b/gcc/dominance.c
index 6a262ce8283..cc63391a39a 100644
--- a/gcc/dominance.c
+++ b/gcc/dominance.c
@@ -1227,7 +1227,7 @@ recompute_dominator (enum cdi_direction dir, basic_block bb)
from BBS.  */
 
 static void
-prune_bbs_to_update_dominators (vec bbs,
+prune_bbs_to_update_dominators (vec ,
 bool conservative)
 {
   unsigned i;
@@ -1379,7 +1379,7 @@ determine_dominators_for_sons (struct graph *g, vec bbs,
a block of BBS in the current dominance tree dominate it.  */
 
 void
-iterate_fix_dominators (enum cdi_direction dir, vec bbs,
+iterate_fix_dominators (enum cdi_direction dir, vec ,
 			bool conservative)
 {
   unsigned i;
diff --git a/gcc/dominance.h b/gcc/dominance.h
index 1a8c248ee98..970da02c594 100644
--- a/gcc/dominance.h
+++ b/gcc/dominance.h
@@ -78,7 +78,7 @@ checking_verify_dominators (cdi_direction dir)
 
 basic_block recompute_dominator (enum cdi_direction, basic_block);
 extern void iterate_fix_dominators (enum cdi_direction,
-vec , bool);
+vec &, bool);
 extern void add_to_dominance_info (enum cdi_direction, basic_block);
 extern void delete_from_dominance_info (enum cdi_direction, basic_block);
 extern basic_block first_dom_son (enum cdi_direction, basic_block);
diff --git a/gcc/genautomata.c b/gcc/genautomata.c
index 6bbfc684afa..e488c5f28ef 100644
--- a/gcc/genautomata.c
+++ b/gcc/genautomata.c
@@ -6137,7 +6137,7 @@ evaluate_equiv_classes (automaton_t automaton, vec *equiv_classes)
 
 /* The function merges equivalent states of AUTOMATON.  */
 static void
-merge_states (automaton_t automaton, vec equiv_classes)
+merge_states (automaton_t automaton, const vec _classes)
 {
   state_t curr_state;
   state_t new_state;
diff --git 

Re: [PATCH] rs6000: Fix restored rs6000_long_double_type_size.

2021-06-23 Thread Segher Boessenkool
Hi!

On Wed, Jun 23, 2021 at 03:22:34PM +0200, Martin Liška wrote:
> As mentioned in the "Fallout: save/restore target options in 
> handle_optimize_attribute"
> thread, we need to support target option restore of 
> rs6000_long_double_type_size == FLOAT_PRECISION_TFmode.

I have no idea?  Could you explain please?

> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -4185,6 +4185,8 @@ rs6000_option_override_internal (bool global_init_p)
>else
>   rs6000_long_double_type_size = default_long_double_size;
>  }
> +  else if (rs6000_long_double_type_size == FLOAT_PRECISION_TFmode)
> +; /* The option can be restored a TREE_TARGET_OPTION.  */

What does that mean?  It is not grammatical, and not obvious what it
should mean.

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pragma-optimize.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile { target { powerpc*-*-linux* } } } */

Why on Linux only?  That doesn't sound right.  Do you need some other
selector(s)?

> +/* { dg-options "-O2 -mlong-double-128 -mabi=ibmlongdouble" } */
> +
> +extern unsigned long int x;
> +extern float f (float);
> +extern __typeof (f) f_power8;
> +extern __typeof (f) f_power9;
> +extern __typeof (f) f __attribute__ ((ifunc ("f_ifunc")));
> +static __attribute__ ((optimize ("-fno-stack-protector"))) __typeof (f) *

-fno-stack-protector is default.

> +f_ifunc (void)
> +{
> +  __typeof (f) *res = x ? f_power9 : f_power8;
> +  return res;
> +}

The testcase should say what it is testing for, it is not obvious?


Segher


[PATCH v2] IBM Z: Define NO_PROFILE_COUNTERS

2021-06-23 Thread Ilya Leoshkevich via Gcc-patches
Bootstrapped and regtested on s390x-redhat-linux.  Ok for master?

v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573348.html
v1 -> v2: Use ATTRIBUTE_UNUSED, compact op[] array (Andreas).
  I've also noticed that one of the nops that we generate for
  -mnop-mcount is not needed now and removed it.  A couple
  tests needed to be adjusted after that.




s390 glibc does not need counters in the .data section, since it stores
edge hits in its own data structure.  Therefore counters only waste
space and confuse diffing tools (e.g. kpatch), so don't generate them.

gcc/ChangeLog:

* config/s390/s390.c (s390_function_profiler): Ignore labelno
parameter.
* config/s390/s390.h (NO_PROFILE_COUNTERS): Define.

gcc/testsuite/ChangeLog:

* gcc.target/s390/mnop-mcount-m31-mzarch.c: Adapt to the new
prologue size.
* gcc.target/s390/mnop-mcount-m64.c: Likewise.
---
 gcc/config/s390/s390.c| 42 +++
 gcc/config/s390/s390.h|  2 +
 .../gcc.target/s390/mnop-mcount-m31-mzarch.c  |  2 +-
 .../gcc.target/s390/mnop-mcount-m64.c |  2 +-
 4 files changed, 20 insertions(+), 28 deletions(-)

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 6bbeb640e1f..590dd8f35bc 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -13110,33 +13110,25 @@ output_asm_nops (const char *user, int hw)
 }
 }
 
-/* Output assembler code to FILE to increment profiler label # LABELNO
-   for profiling a function entry.  */
+/* Output assembler code to FILE to call a profiler hook.  */
 
 void
-s390_function_profiler (FILE *file, int labelno)
+s390_function_profiler (FILE *file, int labelno ATTRIBUTE_UNUSED)
 {
-  rtx op[8];
-
-  char label[128];
-  ASM_GENERATE_INTERNAL_LABEL (label, "LP", labelno);
+  rtx op[4];
 
   fprintf (file, "# function profiler \n");
 
   op[0] = gen_rtx_REG (Pmode, RETURN_REGNUM);
   op[1] = gen_rtx_REG (Pmode, STACK_POINTER_REGNUM);
   op[1] = gen_rtx_MEM (Pmode, plus_constant (Pmode, op[1], UNITS_PER_LONG));
-  op[7] = GEN_INT (UNITS_PER_LONG);
-
-  op[2] = gen_rtx_REG (Pmode, 1);
-  op[3] = gen_rtx_SYMBOL_REF (Pmode, label);
-  SYMBOL_REF_FLAGS (op[3]) = SYMBOL_FLAG_LOCAL;
+  op[3] = GEN_INT (UNITS_PER_LONG);
 
-  op[4] = gen_rtx_SYMBOL_REF (Pmode, flag_fentry ? "__fentry__" : "_mcount");
+  op[2] = gen_rtx_SYMBOL_REF (Pmode, flag_fentry ? "__fentry__" : "_mcount");
   if (flag_pic)
 {
-  op[4] = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, op[4]), UNSPEC_PLT);
-  op[4] = gen_rtx_CONST (Pmode, op[4]);
+  op[2] = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, op[2]), UNSPEC_PLT);
+  op[2] = gen_rtx_CONST (Pmode, op[2]);
 }
 
   if (flag_record_mcount)
@@ -13150,20 +13142,19 @@ s390_function_profiler (FILE *file, int labelno)
warning (OPT_Wcannot_profile, "nested functions cannot be profiled "
 "with %<-mfentry%> on s390");
   else
-   output_asm_insn ("brasl\t0,%4", op);
+   output_asm_insn ("brasl\t0,%2", op);
 }
   else if (TARGET_64BIT)
 {
   if (flag_nop_mcount)
-   output_asm_nops ("-mnop-mcount", /* stg */ 3 + /* larl */ 3 +
-/* brasl */ 3 + /* lg */ 3);
+   output_asm_nops ("-mnop-mcount", /* stg */ 3 + /* brasl */ 3 +
+/* lg */ 3);
   else
{
  output_asm_insn ("stg\t%0,%1", op);
  if (flag_dwarf2_cfi_asm)
-   output_asm_insn (".cfi_rel_offset\t%0,%7", op);
- output_asm_insn ("larl\t%2,%3", op);
- output_asm_insn ("brasl\t%0,%4", op);
+   output_asm_insn (".cfi_rel_offset\t%0,%3", op);
+ output_asm_insn ("brasl\t%0,%2", op);
  output_asm_insn ("lg\t%0,%1", op);
  if (flag_dwarf2_cfi_asm)
output_asm_insn (".cfi_restore\t%0", op);
@@ -13172,15 +13163,14 @@ s390_function_profiler (FILE *file, int labelno)
   else
 {
   if (flag_nop_mcount)
-   output_asm_nops ("-mnop-mcount", /* st */ 2 + /* larl */ 3 +
-/* brasl */ 3 + /* l */ 2);
+   output_asm_nops ("-mnop-mcount", /* st */ 2 + /* brasl */ 3 +
+/* l */ 2);
   else
{
  output_asm_insn ("st\t%0,%1", op);
  if (flag_dwarf2_cfi_asm)
-   output_asm_insn (".cfi_rel_offset\t%0,%7", op);
- output_asm_insn ("larl\t%2,%3", op);
- output_asm_insn ("brasl\t%0,%4", op);
+   output_asm_insn (".cfi_rel_offset\t%0,%3", op);
+ output_asm_insn ("brasl\t%0,%2", op);
  output_asm_insn ("l\t%0,%1", op);
  if (flag_dwarf2_cfi_asm)
output_asm_insn (".cfi_restore\t%0", op);
diff --git a/gcc/config/s390/s390.h b/gcc/config/s390/s390.h
index 3b876160420..fb16a455a03 100644
--- a/gcc/config/s390/s390.h
+++ b/gcc/config/s390/s390.h
@@ -787,6 +787,8 @@ CUMULATIVE_ARGS;
 
 #define PROFILE_BEFORE_PROLOGUE 1
 
+#define NO_PROFILE_COUNTERS 1
+
 
 /* Trampolines 

[PATCH 7/7] Port most of the A CMP 0 ? A : -A to match

2021-06-23 Thread apinski--- via Gcc-patches
From: Andrew Pinski 

To improve phiopt and be able to remove abs_replacement, this ports
most of "A CMP 0 ? A : -A" from fold_cond_expr_with_comparison to
match.pd.  There is a few extra changes that are needed to remove
the "A CMP 0 ? A : -A" part from fold_cond_expr_with_comparison:
   * Need to handle (A - B) case
   * Need to handle UN* comparisons.

I will handle those in a different patch.

Note phi-opt-15.c test needed to be updated as we get ABSU now
instead of not getting ABS.  When ABSU was added phiopt was not
updated even to use ABSU instead of not creating ABS.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* match.pd (A CMP 0 ? A : -A): New patterns.
* tree-ssa-phiopt.c (abs_replacement): Delete function.
(tree_ssa_phiopt_worker): Don't call abs_replacement.
Update comment about abs_replacement.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/phi-opt-15.c: Update test to expect
ABSU and still not expect ABS_EXPR.
---
 gcc/match.pd   |  60 +
 gcc/testsuite/gcc.dg/tree-ssa/phi-opt-15.c |   4 +-
 gcc/tree-ssa-phiopt.c  | 134 +
 3 files changed, 64 insertions(+), 134 deletions(-)

diff --git a/gcc/match.pd b/gcc/match.pd
index 39fb57ee1f4..0c790dfa741 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3976,6 +3976,66 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (cnd (logical_inverted_value truth_valued_p@0) @1 @2)
   (cnd @0 @2 @1)))
 
+/* abs/negative simplifications moved from fold_cond_expr_with_comparison,
+   Need to handle (A - B) case as fold_cond_expr_with_comparison does.
+   Need to handle UN* comparisons.
+
+   None of these transformations work for modes with signed
+   zeros.  If A is +/-0, the first two transformations will
+   change the sign of the result (from +0 to -0, or vice
+   versa).  The last four will fix the sign of the result,
+   even though the original expressions could be positive or
+   negative, depending on the sign of A.
+
+   Note that all these transformations are correct if A is
+   NaN, since the two alternatives (A and -A) are also NaNs.  */
+
+(for cnd (cond vec_cond)
+ /* A == 0? A : -Asame as -A */
+ (for cmp (eq uneq)
+  (simplify
+   (cnd (cmp @0 zerop) @0 (negate@1 @0))
+(if (!HONOR_SIGNED_ZEROS (element_mode (type)))
+ @1))
+  (simplify
+   (cnd (cmp @0 zerop) zerop (negate@1 @0))
+(if (!HONOR_SIGNED_ZEROS (element_mode (type)))
+ @1))
+ )
+ /* A != 0? A : -Asame as A */
+ (for cmp (ne ltgt)
+  (simplify
+   (cnd (cmp @0 zerop) @0 (negate @0))
+(if (!HONOR_SIGNED_ZEROS (element_mode (type)))
+ @0))
+  (simplify
+   (cnd (cmp @0 zerop) @0 zerop)
+(if (!HONOR_SIGNED_ZEROS (element_mode (type)))
+ @0))
+ )
+ /* A >=/> 0? A : -Asame as abs (A) */
+ (for cmp (ge gt)
+  (simplify
+   (cnd (cmp @0 zerop) @0 (negate @0))
+(if (!HONOR_SIGNED_ZEROS (element_mode (type))
+&& !TYPE_UNSIGNED (type))
+ (abs @0
+ /* A <=/< 0? A : -Asame as -abs (A) */
+ (for cmp (le lt)
+  (simplify
+   (cnd (cmp @0 zerop) @0 (negate @0))
+(if (!HONOR_SIGNED_ZEROS (element_mode (type))
+&& !TYPE_UNSIGNED (type))
+ (if (ANY_INTEGRAL_TYPE_P (type)
+ && !TYPE_OVERFLOW_WRAPS (type))
+  (with {
+   tree utype = unsigned_type_for (type);
+   }
+   (convert (negate (absu:utype @0
+   (negate (abs @0)
+ )
+)
+
 /* -(type)!A -> (type)A - 1.  */
 (simplify
  (negate (convert?:s (logical_inverted_value:s @0)))
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-15.c 
b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-15.c
index ac3018ef533..6aec68961cf 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-15.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-15.c
@@ -9,4 +9,6 @@ foo (int i)
   return i;
 }
 
-/* { dg-final { scan-tree-dump-not "ABS" "optimized" } } */
+/* We should not have ABS_EXPR but ABSU_EXPR instead. */
+/* { dg-final { scan-tree-dump-not "ABS_EXPR" "optimized" } } */
+/* { dg-final { scan-tree-dump "ABSU" "optimized" } } */
diff --git a/gcc/tree-ssa-phiopt.c b/gcc/tree-ssa-phiopt.c
index 5f099eca343..cd582d08dfc 100644
--- a/gcc/tree-ssa-phiopt.c
+++ b/gcc/tree-ssa-phiopt.c
@@ -63,8 +63,6 @@ static int value_replacement (basic_block, basic_block,
  edge, edge, gphi *, tree, tree);
 static bool minmax_replacement (basic_block, basic_block,
edge, edge, gphi *, tree, tree);
-static bool abs_replacement (basic_block, basic_block,
-edge, edge, gphi *, tree, tree);
 static bool spaceship_replacement (basic_block, basic_block,
   edge, edge, gphi *, tree, tree);
 static bool cond_removal_in_popcount_clz_ctz_pattern (basic_block, basic_block,
@@ -350,8 +348,6 @@ tree_ssa_phiopt_worker (bool do_store_elim, bool 
do_hoist_loads, bool early_p)
   arg0, arg1,

[PATCH 6/7] Lower for loops before lowering cond in genmatch

2021-06-23 Thread apinski--- via Gcc-patches
From: Andrew Pinski 

While converting some fold_cond_expr_with_comparison
to match, I found that I wanted to use "for cnd (cond vec_cond)"
but that was not causing the lowering of cond to happen.
What was happening was the lowering of the for loop
was happening after the lowering of the cond. So
swapping was the correct thing to do but it also
means we need to copy for_subst_vec in lower_cond.

OK?  Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* genmatch.c (lower_cond): Copy for_subst_vec
for the simplify also.
(lower): Swap the order for lower_for and lower_cond.
---
 gcc/genmatch.c | 28 
 1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/gcc/genmatch.c b/gcc/genmatch.c
index 4d476720c9e..970a2ebba68 100644
--- a/gcc/genmatch.c
+++ b/gcc/genmatch.c
@@ -1306,6 +1306,7 @@ lower_cond (simplify *s, vec& simplifiers)
 {
   simplify *ns = new simplify (s->kind, s->id, matchers[i], s->result,
   s->for_vec, s->capture_ids);
+  ns->for_subst_vec.safe_splice (s->for_subst_vec);
   simplifiers.safe_push (ns);
 }
 }
@@ -1543,24 +1544,27 @@ static void
 lower (vec& simplifiers, bool gimple)
 {
   auto_vec out_simplifiers;
-  for (unsigned i = 0; i < simplifiers.length (); ++i)
-lower_opt (simplifiers[i], out_simplifiers);
+  for (auto s: simplifiers)
+lower_opt (s, out_simplifiers);
 
   simplifiers.truncate (0);
-  for (unsigned i = 0; i < out_simplifiers.length (); ++i)
-lower_commutative (out_simplifiers[i], simplifiers);
+  for (auto s: out_simplifiers)
+lower_commutative (s, simplifiers);
 
+  /* Lower for needs to happen before lowering cond
+ to support (for cnd (cond vec_cond)).  This is
+ safe as substitution delay does not happen for
+ cond or vec_cond. */
   out_simplifiers.truncate (0);
-  if (gimple)
-for (unsigned i = 0; i < simplifiers.length (); ++i)
-  lower_cond (simplifiers[i], out_simplifiers);
-  else
-out_simplifiers.safe_splice (simplifiers);
-
+  for (auto s: simplifiers)
+lower_for (s, out_simplifiers);
 
   simplifiers.truncate (0);
-  for (unsigned i = 0; i < out_simplifiers.length (); ++i)
-lower_for (out_simplifiers[i], simplifiers);
+  if (gimple)
+for (auto s: out_simplifiers)
+  lower_cond (s, simplifiers);
+  else
+simplifiers.safe_splice (out_simplifiers);
 }
 
 
-- 
2.27.0



[PATCH 4/7] Allow match-and-simplified phiopt to run in early phiopt

2021-06-23 Thread apinski--- via Gcc-patches
From: Andrew Pinski 

To move a few things more to match-and-simplify from phiopt,
we need to allow match_simplify_replacement to run in early
phiopt. To do this we add a replacement for gimple_simplify
that is explictly for phiopt.

OK? Bootstrapped and tested on x86_64-linux-gnu with no
regressions.

gcc/ChangeLog:

* tree-ssa-phiopt.c (match_simplify_replacement):
Add early_p argument. Call gimple_simplify_phiopt
instead of gimple_simplify.
(tree_ssa_phiopt_worker): Update call to
match_simplify_replacement and allow unconditionally.
(phiopt_early_allow): New function.
(gimple_simplify_phiopt): New function.
---
 gcc/tree-ssa-phiopt.c | 89 ++-
 1 file changed, 70 insertions(+), 19 deletions(-)

diff --git a/gcc/tree-ssa-phiopt.c b/gcc/tree-ssa-phiopt.c
index 147397ad82c..8b0e68c1e90 100644
--- a/gcc/tree-ssa-phiopt.c
+++ b/gcc/tree-ssa-phiopt.c
@@ -50,12 +50,13 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple-fold.h"
 #include "internal-fn.h"
 #include "gimple-range.h"
+#include "gimple-match.h"
 
 static unsigned int tree_ssa_phiopt_worker (bool, bool, bool);
 static bool two_value_replacement (basic_block, basic_block, edge, gphi *,
   tree, tree);
 static bool match_simplify_replacement (basic_block, basic_block,
-   edge, edge, gphi *, tree, tree);
+   edge, edge, gphi *, tree, tree, bool);
 static gphi *factor_out_conditional_conversion (edge, edge, gphi *, tree, tree,
gimple *);
 static int value_replacement (basic_block, basic_block,
@@ -345,9 +346,9 @@ tree_ssa_phiopt_worker (bool do_store_elim, bool 
do_hoist_loads, bool early_p)
  /* Do the replacement of conditional if it can be done.  */
  if (!early_p && two_value_replacement (bb, bb1, e2, phi, arg0, arg1))
cfgchanged = true;
- else if (!early_p
-  && match_simplify_replacement (bb, bb1, e1, e2, phi,
- arg0, arg1))
+ else if (match_simplify_replacement (bb, bb1, e1, e2, phi,
+  arg0, arg1,
+  early_p))
cfgchanged = true;
  else if (abs_replacement (bb, bb1, e1, e2, phi, arg0, arg1))
cfgchanged = true;
@@ -803,6 +804,67 @@ two_value_replacement (basic_block cond_bb, basic_block 
middle_bb,
   return true;
 }
 
+/* Return TRUE if CODE should be allowed during early phiopt.
+   Currently this is to allow MIN/MAX and ABS/NEGATE.  */
+static bool
+phiopt_early_allow (enum tree_code code)
+{
+  switch (code)
+{
+  case MIN_EXPR:
+  case MAX_EXPR:
+  case ABS_EXPR:
+  case ABSU_EXPR:
+  case NEGATE_EXPR:
+  case SSA_NAME:
+   return true;
+  default:
+   return false;
+}
+}
+
+/* gimple_simplify_phiopt is like gimple_simplify but designed for PHIOPT.
+   Return NULL if nothing can be simplified or the resulting simplified value
+   with parts pushed if EARLY_P was true. Also rejects non allowed tree code
+   if EARLY_P is set.
+   Takes the comparison from COMP_STMT and two args, ARG0 and ARG1 and tries
+   to simplify CMP ? ARG0 : ARG1.  */
+static tree
+gimple_simplify_phiopt (bool early_p, tree type, gimple *comp_stmt,
+   tree arg0, tree arg1,
+   gimple_seq *seq)
+{
+  tree result;
+  enum tree_code comp_code = gimple_cond_code (comp_stmt);
+  location_t loc = gimple_location (comp_stmt);
+  tree cmp0 = gimple_cond_lhs (comp_stmt);
+  tree cmp1 = gimple_cond_rhs (comp_stmt);
+  /* To handle special cases like floating point comparison, it is easier and
+ less error-prone to build a tree and gimplify it on the fly though it is
+ less efficient.
+ Don't use fold_build2 here as that might create (bool)a instead of just
+ "a != 0".  */
+  tree cond = build2_loc (loc, comp_code, boolean_type_node,
+ cmp0, cmp1);
+  gimple_match_op op (gimple_match_cond::UNCOND,
+ COND_EXPR, type, cond, arg0, arg1);
+
+  if (op.resimplify (early_p ? NULL : seq, follow_all_ssa_edges))
+{
+  /* Early we want only to allow some generated tree codes. */
+  if (!early_p
+ || op.code.is_tree_code ()
+ || phiopt_early_allow ((tree_code)op.code))
+   {
+ result = maybe_push_res_to_seq (, seq);
+ if (result)
+   return result;
+   }
+}
+
+  return NULL;
+}
+
 /*  The function match_simplify_replacement does the main work of doing the
 replacement using match and simplify.  Return true if the replacement is 
done.
 Otherwise return false.
@@ -812,10 +874,9 @@ two_value_replacement (basic_block cond_bb, basic_block 
middle_bb,
 static bool
 match_simplify_replacement 

[PATCH 5/7] Try inverted comparison for match_simplify in phiopt

2021-06-23 Thread apinski--- via Gcc-patches
From: Andrew Pinski 

Since match and simplify does not have all of the inverted
comparison patterns, it make sense to just have
phi-opt try to do the inversion and try match and simplify again.

OK? Bootstrapped and tested on x86_64-linux-gnu.

Thanks,
Andrew Pinski

gcc/ChangeLog:

* tree-ssa-phiopt.c (gimple_simplify_phiopt):
If "A ? B : C" fails to simplify, try "(!A) ? C : B".
---
 gcc/tree-ssa-phiopt.c | 27 ++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-ssa-phiopt.c b/gcc/tree-ssa-phiopt.c
index 8b0e68c1e90..5f099eca343 100644
--- a/gcc/tree-ssa-phiopt.c
+++ b/gcc/tree-ssa-phiopt.c
@@ -828,7 +828,8 @@ phiopt_early_allow (enum tree_code code)
with parts pushed if EARLY_P was true. Also rejects non allowed tree code
if EARLY_P is set.
Takes the comparison from COMP_STMT and two args, ARG0 and ARG1 and tries
-   to simplify CMP ? ARG0 : ARG1.  */
+   to simplify CMP ? ARG0 : ARG1.
+   Also try to simplify (!CMP) ? ARG0 : ARG1 if the non-inverse failed.  */
 static tree
 gimple_simplify_phiopt (bool early_p, tree type, gimple *comp_stmt,
tree arg0, tree arg1,
@@ -861,6 +862,30 @@ gimple_simplify_phiopt (bool early_p, tree type, gimple 
*comp_stmt,
return result;
}
 }
+  /* Try the inverted comparison, that is !COMP ? ARG1 : ARG0. */
+  comp_code = invert_tree_comparison (comp_code, HONOR_NANS (cmp0));
+
+  if (comp_code == ERROR_MARK)
+return NULL;
+
+  cond = build2_loc (loc,
+comp_code, boolean_type_node,
+cmp0, cmp1);
+  gimple_match_op op1 (gimple_match_cond::UNCOND,
+  COND_EXPR, type, cond, arg1, arg0);
+
+  if (op1.resimplify (early_p ? NULL : seq, follow_all_ssa_edges))
+{
+  /* Early we want only to allow some generated tree codes. */
+  if (!early_p
+ || op1.code.is_tree_code ()
+ || phiopt_early_allow ((tree_code)op1.code))
+   {
+ result = maybe_push_res_to_seq (, seq);
+ if (result)
+   return result;
+   }
+}
 
   return NULL;
 }
-- 
2.27.0



[PATCH 3/7] Duplicate the range information of the phi onto the new ssa_name

2021-06-23 Thread apinski--- via Gcc-patches
From: Andrew Pinski 

Since match_simplify_replacement uses gimple_simplify, there is a new
ssa name created sometimes and then we go and replace the phi edge with
this new ssa name, the range information on the phi is lost.
Placing this in replace_phi_edge_with_variable is the best option instead
of doing it in each time replace_phi_edge_with_variable is called which is
what is done today.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* tree-ssa-phiopt.c (replace_phi_edge_with_variable): Duplicate range
info if we're the only things setting the target PHI.
(value_replacement): Don't duplicate range here.
(minmax_replacement): Likewise.
---
 gcc/tree-ssa-phiopt.c | 35 ++-
 1 file changed, 18 insertions(+), 17 deletions(-)

diff --git a/gcc/tree-ssa-phiopt.c b/gcc/tree-ssa-phiopt.c
index 24cbce9955a..147397ad82c 100644
--- a/gcc/tree-ssa-phiopt.c
+++ b/gcc/tree-ssa-phiopt.c
@@ -391,6 +391,24 @@ replace_phi_edge_with_variable (basic_block cond_block,
   basic_block bb = gimple_bb (phi);
   basic_block block_to_remove;
   gimple_stmt_iterator gsi;
+  tree phi_result = PHI_RESULT (phi);
+
+  /* Duplicate range info as needed.  */
+  if (TREE_CODE (new_tree) == SSA_NAME
+  && EDGE_COUNT (gimple_bb (phi)->preds) == 2
+  && INTEGRAL_TYPE_P (TREE_TYPE (phi_result)))
+{
+  if (!SSA_NAME_RANGE_INFO (new_tree)
+ && SSA_NAME_RANGE_INFO (phi_result))
+   duplicate_ssa_name_range_info (new_tree,
+  SSA_NAME_RANGE_TYPE (phi_result),
+  SSA_NAME_RANGE_INFO (phi_result));
+  if (SSA_NAME_RANGE_INFO (new_tree)
+ && !SSA_NAME_RANGE_INFO (phi_result))
+   duplicate_ssa_name_range_info (phi_result,
+  SSA_NAME_RANGE_TYPE (new_tree),
+  SSA_NAME_RANGE_INFO (new_tree));
+}
 
   /* Change the PHI argument to new.  */
   SET_USE (PHI_ARG_DEF_PTR (phi, e->dest_idx), new_tree);
@@ -1385,16 +1403,6 @@ value_replacement (basic_block cond_bb, basic_block 
middle_bb,
   :
   # u_3 = PHI   */
   reset_flow_sensitive_info (lhs);
-  if (INTEGRAL_TYPE_P (TREE_TYPE (lhs)))
-   {
- /* If available, we can use VR of phi result at least.  */
- tree phires = gimple_phi_result (phi);
- struct range_info_def *phires_range_info
-   = SSA_NAME_RANGE_INFO (phires);
- if (phires_range_info)
-   duplicate_ssa_name_range_info (lhs, SSA_NAME_RANGE_TYPE (phires),
-  phires_range_info);
-   }
   gimple_stmt_iterator gsi_from;
   for (int i = prep_cnt - 1; i >= 0; --i)
{
@@ -1794,13 +1802,6 @@ minmax_replacement (basic_block cond_bb, basic_block 
middle_bb,
   gimple_seq stmts = NULL;
   tree phi_result = PHI_RESULT (phi);
   result = gimple_build (, minmax, TREE_TYPE (phi_result), arg0, arg1);
-  /* Duplicate range info if we're the only things setting the target PHI.  */
-  if (!gimple_seq_empty_p (stmts)
-  && EDGE_COUNT (gimple_bb (phi)->preds) == 2
-  && !POINTER_TYPE_P (TREE_TYPE (phi_result))
-  && SSA_NAME_RANGE_INFO (phi_result))
-duplicate_ssa_name_range_info (result, SSA_NAME_RANGE_TYPE (phi_result),
-  SSA_NAME_RANGE_INFO (phi_result));
 
   gsi = gsi_last_bb (cond_bb);
   gsi_insert_seq_before (, stmts, GSI_NEW_STMT);
-- 
2.27.0



[PATCH 1/7] Expand the comparison argument of fold_cond_expr_with_comparison

2021-06-23 Thread apinski--- via Gcc-patches
From: Andrew Pinski 

To make things slightly easiler to convert fold_cond_expr_with_comparison
over to match.pd, expanding the arg0 argument into 3 different arguments
is done. Also this was simple because we don't use arg0 after grabbing
the code and the two operands.
Also since we do this, we don't need to fold the comparison to
get the inverse but just use invert_tree_comparison directly.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* fold-const.c (fold_cond_expr_with_comparison):
Exand arg0 into comp_code, arg00, and arg01.
(fold_ternary_loc): Use invert_tree_comparison
instead of fold_invert_truthvalue for the case
where we have A CMP B ? C : A.
---
 gcc/fold-const.c | 39 ++-
 1 file changed, 22 insertions(+), 17 deletions(-)

diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index 0b33ee99a81..5e4bdeace1e 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -126,7 +126,8 @@ static tree range_binop (enum tree_code, tree, tree, int, 
tree, int);
 static tree range_predecessor (tree);
 static tree range_successor (tree);
 static tree fold_range_test (location_t, enum tree_code, tree, tree, tree);
-static tree fold_cond_expr_with_comparison (location_t, tree, tree, tree, 
tree);
+static tree fold_cond_expr_with_comparison (location_t, tree, enum tree_code,
+   tree, tree, tree, tree);
 static tree unextend (tree, int, int, tree);
 static tree extract_muldiv (tree, tree, enum tree_code, tree, bool *);
 static tree extract_muldiv_1 (tree, tree, enum tree_code, tree, bool *);
@@ -5735,20 +5736,19 @@ merge_ranges (int *pin_p, tree *plow, tree *phigh, int 
in0_p, tree low0,
 
 
 /* Subroutine of fold, looking inside expressions of the form
-   A op B ? A : C, where ARG0, ARG1 and ARG2 are the three operands
-   of the COND_EXPR.  This function is being used also to optimize
-   A op B ? C : A, by reversing the comparison first.
+   A op B ? A : C, where (ARG00, COMP_CODE, ARG01), ARG1 and ARG2
+   are the three operands of the COND_EXPR.  This function is
+   being used also to optimize A op B ? C : A, by reversing the
+   comparison first.
 
Return a folded expression whose code is not a COND_EXPR
anymore, or NULL_TREE if no folding opportunity is found.  */
 
 static tree
 fold_cond_expr_with_comparison (location_t loc, tree type,
-   tree arg0, tree arg1, tree arg2)
+   enum tree_code comp_code,
+   tree arg00, tree arg01, tree arg1, tree arg2)
 {
-  enum tree_code comp_code = TREE_CODE (arg0);
-  tree arg00 = TREE_OPERAND (arg0, 0);
-  tree arg01 = TREE_OPERAND (arg0, 1);
   tree arg1_type = TREE_TYPE (arg1);
   tree tem;
 
@@ -12822,7 +12822,10 @@ fold_ternary_loc (location_t loc, enum tree_code code, 
tree type,
  && operand_equal_for_comparison_p (TREE_OPERAND (arg0, 0), op1)
  && !HONOR_SIGNED_ZEROS (element_mode (op1)))
{
- tem = fold_cond_expr_with_comparison (loc, type, arg0, op1, op2);
+ tem = fold_cond_expr_with_comparison (loc, type, TREE_CODE (arg0),
+   TREE_OPERAND (arg0, 0),
+   TREE_OPERAND (arg0, 1),
+   op1, op2);
  if (tem)
return tem;
}
@@ -12831,14 +12834,16 @@ fold_ternary_loc (location_t loc, enum tree_code 
code, tree type,
  && operand_equal_for_comparison_p (TREE_OPERAND (arg0, 0), op2)
  && !HONOR_SIGNED_ZEROS (element_mode (op2)))
{
- location_t loc0 = expr_location_or (arg0, loc);
- tem = fold_invert_truthvalue (loc0, arg0);
- if (tem && COMPARISON_CLASS_P (tem))
-   {
- tem = fold_cond_expr_with_comparison (loc, type, tem, op2, op1);
- if (tem)
-   return tem;
-   }
+ enum tree_code comp_code = TREE_CODE (arg0);
+ tree arg00 = TREE_OPERAND (arg0, 0);
+ tree arg01 = TREE_OPERAND (arg0, 1);
+ comp_code = invert_tree_comparison (comp_code, HONOR_NANS (arg00));
+ tem = fold_cond_expr_with_comparison (loc, type, comp_code,
+   arg00,
+   arg01,
+   op2, op1);
+ if (tem)
+   return tem;
}
 
   /* If the second operand is simpler than the third, swap them
-- 
2.27.0



[PATCH 2/7] Reset the range info on the moved instruction in PHIOPT

2021-06-23 Thread apinski--- via Gcc-patches
From: Andrew Pinski 

I had missed this when wrote the patch which allowed the
gimple to be moved from inside the conditional as it.  It
was also missed in the review.  Anyways the range information
needs to be reset for the moved gimple as it was under a
conditional and the flow has changed to be unconditional.
I have not seen any testcase in the wild that produces wrong code
yet which is why there is no testcase but this is similar to what
the other code in phiopt does so after moving those to match, there
might be some.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* tree-ssa-phiopt.c (match_simplify_replacement): Reset
flow senatitive info on the moved ssa set.
---
 gcc/tree-ssa-phiopt.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-ssa-phiopt.c b/gcc/tree-ssa-phiopt.c
index 02e26f974a5..24cbce9955a 100644
--- a/gcc/tree-ssa-phiopt.c
+++ b/gcc/tree-ssa-phiopt.c
@@ -836,7 +836,7 @@ match_simplify_replacement (basic_block cond_bb, 
basic_block middle_bb,
   if (!is_gimple_assign (stmt_to_move))
return false;
 
-  tree lhs = gimple_assign_lhs  (stmt_to_move);
+  tree lhs = gimple_assign_lhs (stmt_to_move);
   gimple *use_stmt;
   use_operand_p use_p;
 
@@ -892,6 +892,7 @@ match_simplify_replacement (basic_block cond_bb, 
basic_block middle_bb,
}
   gimple_stmt_iterator gsi1 = gsi_for_stmt (stmt_to_move);
   gsi_move_before (, );
+  reset_flow_sensitive_info (gimple_assign_lhs (stmt_to_move));
 }
   if (seq)
 gsi_insert_seq_before (, seq, GSI_SAME_STMT);
-- 
2.27.0



[PATCH 0/7] PHI-OPT move abs_replacement to match.pd

2021-06-23 Thread apinski--- via Gcc-patches
From: Andrew Pinski 

To able to move PHI-OPT's abs_replacement to match.pd, a bunch
of support needed to be added to PHI-OPT.
This is a set of 7 patches which allows us to remove abs_replacement
and even does one set further and does a few extra transformations
that abs_replacement did not do (just because of the moving from
fold to match).

Andrew Pinski (7):
  Expand the comparison argument of fold_cond_expr_with_comparison
  Reset the range info on the moved instruction in PHIOPT
  Duplicate the range information of the phi onto the new ssa_name
  Allow match-and-simplified phiopt to run in early phiopt
  Try inverted comparison for match_simplify in phiopt
  Lower for loops before lowering cond in genmatch
  Port most of the A CMP 0 ? A : -A to match

 gcc/fold-const.c   |  39 +--
 gcc/genmatch.c |  28 +-
 gcc/match.pd   |  60 +
 gcc/testsuite/gcc.dg/tree-ssa/phi-opt-15.c |   4 +-
 gcc/tree-ssa-phiopt.c  | 286 +
 5 files changed, 217 insertions(+), 200 deletions(-)

-- 
2.27.0



Re: [PATCH] c++: excessive instantiation during CTAD [PR101174]

2021-06-23 Thread Jason Merrill via Gcc-patches

On 6/23/21 4:49 PM, Patrick Palka wrote:

On Wed, 23 Jun 2021, Jason Merrill wrote:


On 6/23/21 2:18 PM, Patrick Palka wrote:

We set DECL_CONTEXT on implicitly generated deduction guides so that
their access is consistent with that of the constructor.  But this
apparently leads to excessive instantiation in some cases, ultimately
because instantiation of a deduction guide should be independent of
instantiation of the resulting class specialization, but setting the
DECL_CONTEXT of the former to the latter breaks this independence.

To fix this, this patch makes push_access_scope handle artificial
deduction guides specifically instead of setting their DECL_CONTEXT
in build_deduction_guide.  We could alternatively make the class
befriend the guide via DECL_BEFRIENDING_CLASSES, but that'd break
class-deduction-access3.C below since friendship isn't transitive.

Bootstrapped and regtested on x86_64-pc-linxu-gnu, does this look OK for
trunk?

PR c++/101174

gcc/cp/ChangeLog:

* pt.c (push_access_scope): For artificial deduction guides,
set the access scope to that of the constructor.
(pop_access_scope): Likewise.
(build_deduction_guide): Don't set DECL_CONTEXT.

libstdc++-v3/ChangeLog:

* testsuite/23_containers/multiset/cons/deduction.cc:
Uncomment CTAD example that was rejected by this bug.
* testsuite/23_containers/set/cons/deduction.cc: Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/class-deduction-access3.C: New test.
* g++.dg/cpp1z/class-deduction91.C: New test.
---
   gcc/cp/pt.c   | 11 ++
   .../g++.dg/cpp1z/class-deduction-access3.C| 22 +++
   .../g++.dg/cpp1z/class-deduction91.C  | 16 ++
   .../23_containers/multiset/cons/deduction.cc  |  2 --
   .../23_containers/set/cons/deduction.cc   |  2 --
   5 files changed, 45 insertions(+), 8 deletions(-)
   create mode 100644 gcc/testsuite/g++.dg/cpp1z/class-deduction-access3.C
   create mode 100644 gcc/testsuite/g++.dg/cpp1z/class-deduction91.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 732fb405adf..5c55507203a 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -236,6 +236,10 @@ push_access_scope (tree t)
   push_nested_class (DECL_FRIEND_CONTEXT (t));
 else if (DECL_CLASS_SCOPE_P (t))
   push_nested_class (DECL_CONTEXT (t));
+  else if (deduction_guide_p (t) && DECL_ARTIFICIAL (t))
+/* An artificial deduction guide should have the same access as
+   the constructor.  */
+push_nested_class (TREE_TYPE (TREE_TYPE (t)));


Let's be more specific and actually push into the access scope of the
constructor (DECL_ABSTRACT_ORIGIN) if set.


Hmm, IIUC this doesn't seem to work because DECL_ABSTRACT_ORIGIN is
not set on specializations of a deduction guide, only on the
TEMPLATE_DECL.


Ah, true.  Your patch is OK, then.


 else
   push_to_top_level ();
   @@ -255,7 +259,9 @@ pop_access_scope (tree t)
 if (TREE_CODE (t) == FUNCTION_DECL)
   current_function_decl = saved_access_scope->pop();
   -  if (DECL_FRIEND_CONTEXT (t) || DECL_CLASS_SCOPE_P (t))
+  if (DECL_FRIEND_CONTEXT (t)
+  || DECL_CLASS_SCOPE_P (t)
+  || (deduction_guide_p (t) && DECL_ARTIFICIAL (t)))
   pop_nested_class ();
 else
   pop_from_top_level ();
@@ -28804,9 +28810,6 @@ build_deduction_guide (tree type, tree ctor, tree
outer_args, tsubst_flags_t com
   DECL_ABSTRACT_ORIGIN (ded_tmpl) = fn_tmpl;
 if (ci)
   set_constraints (ded_tmpl, ci);
-  /* The artificial deduction guide should have same access as the
- constructor.  */
-  DECL_CONTEXT (ded_fn) = type;
   return ded_tmpl;
   }
diff --git a/gcc/testsuite/g++.dg/cpp1z/class-deduction-access3.C
b/gcc/testsuite/g++.dg/cpp1z/class-deduction-access3.C
new file mode 100644
index 000..44b4b5c455f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/class-deduction-access3.C
@@ -0,0 +1,22 @@
+// { dg-do compile { target c++17 } }
+
+template
+struct Cont;
+
+template
+struct Base
+{
+private:
+  using type = T;
+  friend Cont;
+};
+
+template
+struct Cont
+{
+  using argument_type = typename Base::type;
+
+  Cont(T, argument_type) { }
+};
+
+Cont c(1, 1);
diff --git a/gcc/testsuite/g++.dg/cpp1z/class-deduction91.C
b/gcc/testsuite/g++.dg/cpp1z/class-deduction91.C
new file mode 100644
index 000..f474c8e35ec
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/class-deduction91.C
@@ -0,0 +1,16 @@
+// PR c++/101174
+// { dg-do compile { target c++17 } }
+
+struct S { using type = int; };
+
+template
+struct multiset {
+  using type = typename U::type;
+  multiset(T);
+  multiset(U);
+};
+
+template
+multiset(T) -> multiset;
+
+multiset c(42);
diff --git a/libstdc++-v3/testsuite/23_containers/multiset/cons/deduction.cc
b/libstdc++-v3/testsuite/23_containers/multiset/cons/deduction.cc
index a4ccc6fa467..8b7a16042a4 100644
--- a/libstdc++-v3/testsuite/23_containers/multiset/cons/deduction.cc
+++ 

Re: [PATCH] c++: excessive instantiation during CTAD [PR101174]

2021-06-23 Thread Patrick Palka via Gcc-patches
On Wed, 23 Jun 2021, Jason Merrill wrote:

> On 6/23/21 2:18 PM, Patrick Palka wrote:
> > We set DECL_CONTEXT on implicitly generated deduction guides so that
> > their access is consistent with that of the constructor.  But this
> > apparently leads to excessive instantiation in some cases, ultimately
> > because instantiation of a deduction guide should be independent of
> > instantiation of the resulting class specialization, but setting the
> > DECL_CONTEXT of the former to the latter breaks this independence.
> > 
> > To fix this, this patch makes push_access_scope handle artificial
> > deduction guides specifically instead of setting their DECL_CONTEXT
> > in build_deduction_guide.  We could alternatively make the class
> > befriend the guide via DECL_BEFRIENDING_CLASSES, but that'd break
> > class-deduction-access3.C below since friendship isn't transitive.
> > 
> > Bootstrapped and regtested on x86_64-pc-linxu-gnu, does this look OK for
> > trunk?
> > 
> > PR c++/101174
> > 
> > gcc/cp/ChangeLog:
> > 
> > * pt.c (push_access_scope): For artificial deduction guides,
> > set the access scope to that of the constructor.
> > (pop_access_scope): Likewise.
> > (build_deduction_guide): Don't set DECL_CONTEXT.
> > 
> > libstdc++-v3/ChangeLog:
> > 
> > * testsuite/23_containers/multiset/cons/deduction.cc:
> > Uncomment CTAD example that was rejected by this bug.
> > * testsuite/23_containers/set/cons/deduction.cc: Likewise.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/cpp1z/class-deduction-access3.C: New test.
> > * g++.dg/cpp1z/class-deduction91.C: New test.
> > ---
> >   gcc/cp/pt.c   | 11 ++
> >   .../g++.dg/cpp1z/class-deduction-access3.C| 22 +++
> >   .../g++.dg/cpp1z/class-deduction91.C  | 16 ++
> >   .../23_containers/multiset/cons/deduction.cc  |  2 --
> >   .../23_containers/set/cons/deduction.cc   |  2 --
> >   5 files changed, 45 insertions(+), 8 deletions(-)
> >   create mode 100644 gcc/testsuite/g++.dg/cpp1z/class-deduction-access3.C
> >   create mode 100644 gcc/testsuite/g++.dg/cpp1z/class-deduction91.C
> > 
> > diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
> > index 732fb405adf..5c55507203a 100644
> > --- a/gcc/cp/pt.c
> > +++ b/gcc/cp/pt.c
> > @@ -236,6 +236,10 @@ push_access_scope (tree t)
> >   push_nested_class (DECL_FRIEND_CONTEXT (t));
> > else if (DECL_CLASS_SCOPE_P (t))
> >   push_nested_class (DECL_CONTEXT (t));
> > +  else if (deduction_guide_p (t) && DECL_ARTIFICIAL (t))
> > +/* An artificial deduction guide should have the same access as
> > +   the constructor.  */
> > +push_nested_class (TREE_TYPE (TREE_TYPE (t)));
> 
> Let's be more specific and actually push into the access scope of the
> constructor (DECL_ABSTRACT_ORIGIN) if set.

Hmm, IIUC this doesn't seem to work because DECL_ABSTRACT_ORIGIN is
not set on specializations of a deduction guide, only on the
TEMPLATE_DECL.

> 
> > else
> >   push_to_top_level ();
> >   @@ -255,7 +259,9 @@ pop_access_scope (tree t)
> > if (TREE_CODE (t) == FUNCTION_DECL)
> >   current_function_decl = saved_access_scope->pop();
> >   -  if (DECL_FRIEND_CONTEXT (t) || DECL_CLASS_SCOPE_P (t))
> > +  if (DECL_FRIEND_CONTEXT (t)
> > +  || DECL_CLASS_SCOPE_P (t)
> > +  || (deduction_guide_p (t) && DECL_ARTIFICIAL (t)))
> >   pop_nested_class ();
> > else
> >   pop_from_top_level ();
> > @@ -28804,9 +28810,6 @@ build_deduction_guide (tree type, tree ctor, tree
> > outer_args, tsubst_flags_t com
> >   DECL_ABSTRACT_ORIGIN (ded_tmpl) = fn_tmpl;
> > if (ci)
> >   set_constraints (ded_tmpl, ci);
> > -  /* The artificial deduction guide should have same access as the
> > - constructor.  */
> > -  DECL_CONTEXT (ded_fn) = type;
> >   return ded_tmpl;
> >   }
> > diff --git a/gcc/testsuite/g++.dg/cpp1z/class-deduction-access3.C
> > b/gcc/testsuite/g++.dg/cpp1z/class-deduction-access3.C
> > new file mode 100644
> > index 000..44b4b5c455f
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/cpp1z/class-deduction-access3.C
> > @@ -0,0 +1,22 @@
> > +// { dg-do compile { target c++17 } }
> > +
> > +template
> > +struct Cont;
> > +
> > +template
> > +struct Base
> > +{
> > +private:
> > +  using type = T;
> > +  friend Cont;
> > +};
> > +
> > +template
> > +struct Cont
> > +{
> > +  using argument_type = typename Base::type;
> > +
> > +  Cont(T, argument_type) { }
> > +};
> > +
> > +Cont c(1, 1);
> > diff --git a/gcc/testsuite/g++.dg/cpp1z/class-deduction91.C
> > b/gcc/testsuite/g++.dg/cpp1z/class-deduction91.C
> > new file mode 100644
> > index 000..f474c8e35ec
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/cpp1z/class-deduction91.C
> > @@ -0,0 +1,16 @@
> > +// PR c++/101174
> > +// { dg-do compile { target c++17 } }
> > +
> > +struct S { using type = int; };
> > +
> > +template
> > +struct multiset {
> > +  using type = typename U::type;
> > +  

[committed] [PATCH v2] doc/lto.texi: List slim object format as the default

2021-06-23 Thread Dimitar Dimitrov
On Mon, Jun 21, 2021 at 08:09:20AM +0200, Richard Biener wrote:
> On Sun, 20 Jun 2021, Dimitar Dimitrov wrote:
> 
> > Slim LTO object files have been the default for quite a while, since:
> >   commit e9f67e625c2a4225a7169d7220dcb85b6fdd7ca9
> >   Author: Jan Hubicka 
> >   common.opt (ffat-lto-objects): Disable by default.
> > 
> > That commit did not update lto.texi, so do it now.
> 
> LGTM.  Btw, on targets where linker plugin support is not detected
> by configury fat objects are still the default.
> 

Thanks. I added a comment about fat objects being default if ar/nm
plugin is not detected when GCC is configured.  I committed the
following:


[PATCH v2] doc/lto.texi: List slim object format as the default

Slim LTO object files have been the default for quite a while, since:
  commit e9f67e625c2a4225a7169d7220dcb85b6fdd7ca9
  Author: Jan Hubicka 
  common.opt (ffat-lto-objects): Disable by default.

That commit did not update lto.texi, so do it now.

gcc/ChangeLog:

* doc/lto.texi (Design Overview): Update that slim objects are
the default.

Signed-off-by: Dimitar Dimitrov 
---
 gcc/doc/lto.texi | 25 -
 1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/gcc/doc/lto.texi b/gcc/doc/lto.texi
index 1f55216328a..3c5de2144d0 100644
--- a/gcc/doc/lto.texi
+++ b/gcc/doc/lto.texi
@@ -36,11 +36,18 @@ bytecode representation of GIMPLE that is emitted in 
special sections
 of @code{.o} files.  Currently, LTO support is enabled in most
 ELF-based systems, as well as darwin, cygwin and mingw systems.
 
-Since GIMPLE bytecode is saved alongside final object code, object
-files generated with LTO support are larger than regular object files.
-This ``fat'' object format makes it easy to integrate LTO into
-existing build systems, as one can, for instance, produce archives of
-the files.  Additionally, one might be able to ship one set of fat
+By default, object files generated with LTO support contain only GIMPLE
+bytecode.  Such objects are called ``slim'', and they require that
+tools like @code{ar} and @code{nm} understand symbol tables of LTO
+sections.  For most targets these tools have been extended to use the
+plugin infrastructure, so GCC can support ``slim'' objects consisting
+of the intermediate code alone.
+
+GIMPLE bytecode could also be saved alongside final object code if
+the @option{-ffat-lto-objects} option is passed, or if no plugin support
+is detected for @code{ar} and @code{nm} when GCC is configured.  It makes
+the object files generated with LTO support larger than regular object
+files.  This ``fat'' object format allows to ship one set of fat
 objects which could be used both for development and the production of
 optimized builds.  A, perhaps surprising, side effect of this feature
 is that any mistake in the toolchain leads to LTO information not
@@ -49,14 +56,6 @@ This is both an advantage, as the system is more robust, and 
a
 disadvantage, as the user is not informed that the optimization has
 been disabled.
 
-The current implementation only produces ``fat'' objects, effectively
-doubling compilation time and increasing file sizes up to 5x the
-original size.  This hides the problem that some tools, such as
-@code{ar} and @code{nm}, need to understand symbol tables of LTO
-sections.  These tools were extended to use the plugin infrastructure,
-and with these problems solved, GCC will also support ``slim'' objects
-consisting of the intermediate code alone.
-
 At the highest level, LTO splits the compiler in two.  The first half
 (the ``writer'') produces a streaming representation of all the
 internal data structures needed to optimize and generate code.  This
-- 
2.31.1



[committed] fortran/dump-parse-tree.c: Use proper enum type (was: Re: [Patch ]Fortran/OpenMP: Extend defaultmap clause for OpenMP 5 [PR92568])

2021-06-23 Thread Tobias Burnus

Indeed, I somehow managed to use the wrong of the two enums in the cast ...
Fixed in  r12-1760-gcac3527793b38164e2a83c7ccbfe0cfcf5ac95b8

On 23.06.21 16:13, Martin Liška wrote:


I noticed the patch causes the following clang warnings:
gcc/fortran/dump-parse-tree.c:1786:11: warning: comparison of
different enumeration types in switch statement ('enum
gfc_omp_defaultmap' and 'gfc_omp_defaultmap_category')
[-Wenum-compare-switch]


Thanks for reporting!

Tobias


-
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München 
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank 
Thürauf
commit cac3527793b38164e2a83c7ccbfe0cfcf5ac95b8
Author: Tobias Burnus 
Date:   Wed Jun 23 22:10:43 2021 +0200

fortran/dump-parse-tree.c: Use proper enum type

gcc/fortran/ChangeLog:

* dump-parse-tree.c (show_omp_clauses): Fix enum type used
for dumping gfc_omp_defaultmap_category.

diff --git a/gcc/fortran/dump-parse-tree.c b/gcc/fortran/dump-parse-tree.c
index 07e98b7e30d..26841eefb7d 100644
--- a/gcc/fortran/dump-parse-tree.c
+++ b/gcc/fortran/dump-parse-tree.c
@@ -1781,7 +1781,7 @@ show_omp_clauses (gfc_omp_clauses *omp_clauses)
   if (i != OMP_DEFAULTMAP_CAT_UNCATEGORIZED)
 	{
 	  fputc (':', dumpfile);
-	  switch ((enum gfc_omp_defaultmap) i)
+	  switch ((enum gfc_omp_defaultmap_category) i)
 	{
 	case OMP_DEFAULTMAP_CAT_SCALAR: dfltmap = "SCALAR"; break;
 	case OMP_DEFAULTMAP_CAT_AGGREGATE: dfltmap = "AGGREGATE"; break;


Re: [PATCH] c++: excessive instantiation during CTAD [PR101174]

2021-06-23 Thread Jason Merrill via Gcc-patches

On 6/23/21 2:18 PM, Patrick Palka wrote:

We set DECL_CONTEXT on implicitly generated deduction guides so that
their access is consistent with that of the constructor.  But this
apparently leads to excessive instantiation in some cases, ultimately
because instantiation of a deduction guide should be independent of
instantiation of the resulting class specialization, but setting the
DECL_CONTEXT of the former to the latter breaks this independence.

To fix this, this patch makes push_access_scope handle artificial
deduction guides specifically instead of setting their DECL_CONTEXT
in build_deduction_guide.  We could alternatively make the class
befriend the guide via DECL_BEFRIENDING_CLASSES, but that'd break
class-deduction-access3.C below since friendship isn't transitive.

Bootstrapped and regtested on x86_64-pc-linxu-gnu, does this look OK for
trunk?

PR c++/101174

gcc/cp/ChangeLog:

* pt.c (push_access_scope): For artificial deduction guides,
set the access scope to that of the constructor.
(pop_access_scope): Likewise.
(build_deduction_guide): Don't set DECL_CONTEXT.

libstdc++-v3/ChangeLog:

* testsuite/23_containers/multiset/cons/deduction.cc:
Uncomment CTAD example that was rejected by this bug.
* testsuite/23_containers/set/cons/deduction.cc: Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/class-deduction-access3.C: New test.
* g++.dg/cpp1z/class-deduction91.C: New test.
---
  gcc/cp/pt.c   | 11 ++
  .../g++.dg/cpp1z/class-deduction-access3.C| 22 +++
  .../g++.dg/cpp1z/class-deduction91.C  | 16 ++
  .../23_containers/multiset/cons/deduction.cc  |  2 --
  .../23_containers/set/cons/deduction.cc   |  2 --
  5 files changed, 45 insertions(+), 8 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp1z/class-deduction-access3.C
  create mode 100644 gcc/testsuite/g++.dg/cpp1z/class-deduction91.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 732fb405adf..5c55507203a 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -236,6 +236,10 @@ push_access_scope (tree t)
  push_nested_class (DECL_FRIEND_CONTEXT (t));
else if (DECL_CLASS_SCOPE_P (t))
  push_nested_class (DECL_CONTEXT (t));
+  else if (deduction_guide_p (t) && DECL_ARTIFICIAL (t))
+/* An artificial deduction guide should have the same access as
+   the constructor.  */
+push_nested_class (TREE_TYPE (TREE_TYPE (t)));


Let's be more specific and actually push into the access scope of the 
constructor (DECL_ABSTRACT_ORIGIN) if set.



else
  push_to_top_level ();
  
@@ -255,7 +259,9 @@ pop_access_scope (tree t)

if (TREE_CODE (t) == FUNCTION_DECL)
  current_function_decl = saved_access_scope->pop();
  
-  if (DECL_FRIEND_CONTEXT (t) || DECL_CLASS_SCOPE_P (t))

+  if (DECL_FRIEND_CONTEXT (t)
+  || DECL_CLASS_SCOPE_P (t)
+  || (deduction_guide_p (t) && DECL_ARTIFICIAL (t)))
  pop_nested_class ();
else
  pop_from_top_level ();
@@ -28804,9 +28810,6 @@ build_deduction_guide (tree type, tree ctor, tree 
outer_args, tsubst_flags_t com
  DECL_ABSTRACT_ORIGIN (ded_tmpl) = fn_tmpl;
if (ci)
  set_constraints (ded_tmpl, ci);
-  /* The artificial deduction guide should have same access as the
- constructor.  */
-  DECL_CONTEXT (ded_fn) = type;
  
return ded_tmpl;

  }
diff --git a/gcc/testsuite/g++.dg/cpp1z/class-deduction-access3.C 
b/gcc/testsuite/g++.dg/cpp1z/class-deduction-access3.C
new file mode 100644
index 000..44b4b5c455f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/class-deduction-access3.C
@@ -0,0 +1,22 @@
+// { dg-do compile { target c++17 } }
+
+template
+struct Cont;
+
+template
+struct Base
+{
+private:
+  using type = T;
+  friend Cont;
+};
+
+template
+struct Cont
+{
+  using argument_type = typename Base::type;
+
+  Cont(T, argument_type) { }
+};
+
+Cont c(1, 1);
diff --git a/gcc/testsuite/g++.dg/cpp1z/class-deduction91.C 
b/gcc/testsuite/g++.dg/cpp1z/class-deduction91.C
new file mode 100644
index 000..f474c8e35ec
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/class-deduction91.C
@@ -0,0 +1,16 @@
+// PR c++/101174
+// { dg-do compile { target c++17 } }
+
+struct S { using type = int; };
+
+template
+struct multiset {
+  using type = typename U::type;
+  multiset(T);
+  multiset(U);
+};
+
+template
+multiset(T) -> multiset;
+
+multiset c(42);
diff --git a/libstdc++-v3/testsuite/23_containers/multiset/cons/deduction.cc 
b/libstdc++-v3/testsuite/23_containers/multiset/cons/deduction.cc
index a4ccc6fa467..8b7a16042a4 100644
--- a/libstdc++-v3/testsuite/23_containers/multiset/cons/deduction.cc
+++ b/libstdc++-v3/testsuite/23_containers/multiset/cons/deduction.cc
@@ -19,11 +19,9 @@ static_assert(std::is_same_v<
  decltype(std::multiset{{1, 2, 3}, std::less{}, {}}),
  std::multiset>);
  
-/* FIXME: GCC 12 rejects this due to PR c++/101174

  

Re: [PATCH 1/13] v2 [PATCH 1/13] Add support for per-location warning groups (PR 74765)

2021-06-23 Thread Martin Sebor via Gcc-patches

On 6/22/21 5:28 PM, David Malcolm wrote:

On Tue, 2021-06-22 at 19:18 -0400, David Malcolm wrote:

On Fri, 2021-06-04 at 15:41 -0600, Martin Sebor wrote:

The attached patch introduces the suppress_warning(),
warning_suppressed(), and copy_no_warning() APIs without making
use of them in the rest of GCC.  They are in three files:

    diagnostic-spec.{h,c}: Location-centric overloads.
    warning-control.cc: Tree- and gimple*-centric overloads.

The location-centric overloads are suitable to use from the
diagnostic
subsystem.  The rest can be used from the front ends and the middle
end.


[...snip...]


+/* Return true if warning OPT is suppressed for decl/expression
EXPR.
+   By default tests the disposition for any warning.  */
+
+bool
+warning_suppressed_p (const_tree expr, opt_code opt /* =
all_warnings */)
+{
+  const nowarn_spec_t *spec = get_nowarn_spec (expr);
+
+  if (!spec)
+    return get_no_warning_bit (expr);
+
+  const nowarn_spec_t optspec (opt);
+  bool dis = *spec & optspec;
+  gcc_assert (get_no_warning_bit (expr) || !dis);
+  return dis;


Looking through the patch, I don't like the use of "dis" for the "is
suppressed" bool, since...


It's an argument to a a function, unlikely to confuse anyone.  But
I also don't think it's important enough to argue about so I changed
it along with the comment below.



[...snip...]


+
+/* Enable, or by default disable, a warning for the statement STMT.
+   The wildcard OPT of -1 controls all warnings.  */


...I find the above comment to be confusingly worded due to the double-
negative.

If I'm reading your intent correctly, how about this wording:

/* Change the supression of warnings for statement STMT.
    OPT controls which warnings are affected.
    The wildcard OPT of -1 controls all warnings.
    If SUPP is true (the default), enable the suppression of the
warnings.
    If SUPP is false, disable the suppression of the warnings.  */

...and rename "dis" to "supp".

Or have I misread the intent of the patch?


+
+void
+suppress_warning (gimple *stmt, opt_code opt /* = all_warnings */,
+ bool dis /* = true */)



+{
+  if (opt == no_warning)
+    return;
+
+  const key_type_t key = convert_to_key (stmt);
+
+  dis = suppress_warning_at (key, opt, dis) || dis;
+  set_no_warning_bit (stmt, dis);
+}


[...snip...]


Also, I think I prefer having a separate entrypoints for the "all
warnings" case; on reading through the various patches I think that in
e.g.:

- TREE_NO_WARNING (*expr_p) = 1;
+ suppress_warning (*expr_p);

I prefer:

   suppress_warnings (*expr_p);

(note the plural) since that way we can grep for them, and it seems
like better grammar to me.

Both entrypoints could be implemented by a static suppress_warning_1
internally if that makes it easier.

In that vein, "unsuppress_warning" seems far clearer to me that
"suppress_warning (FOO, false)"; IIRC there are very few uses of this
non-default arg (I couldn't find any in a quick look through the v2
kit).

Does this make sense?


In my experience, names that differ only slightly (such as in just
one letter) tend to easily lead to frustrating mistakes.  The new
warning_suppressed_p() added in this patch also handles multiple
warnings (via a wildcard) and uses a singular form, and as do
the gimple_{get,set}no_warning() functions that are being replaced.
So I don't think the suggested change is needed or would be
an improvement.

Similarly, there is no gimple_unset_no_warning() today and I don't
think adding unsuppress_warning() is necessary or would add value,
especially since there are just a handful of callers that re-enable
warnings.  For the callers that might need to enable or disable
a warning based on some condition, it's more convenient to do so
in a single call then by having to call different functions based
the value of the condition.

In the attached patch I've changed the comment as you asked and
also renamed the dis argument to supp but kept the function names
the same.  I've also moved a few changes to tree.h from a later
patch to this one to let it stand alone and regtested it on its
own on x86_64-linux.  I'll plan to go ahead with this version
unless requestsfor substantive changes come up in the review of
the rest  of the changes.

Thanks
Martin
Add support for per-location warning groups.

gcc/ChangeLog:

	* Makefile.in (OBJS-libcommon): Add diagnostic-spec.o.
	* gengtype.c (open_base_files): Add diagnostic-spec.h.
	* diagnostic-spec.c: New file.
	* diagnostic-spec.h: New file.
	* tree.h (no_warning, all_warnings, suppress_warning_at): New
	declarations.
	* warning-control.cc: New file.

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index d32de22e4f2..3d8c2b94d70 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1698,6 +1698,7 @@ OBJS = \
 	vmsdbgout.o \
 	vr-values.o \
 	vtable-verify.o \
+	warning-control.o \
 	web.o \
 	wide-int.o \
 	wide-int-print.o \
@@ -1709,8 +1710,8 @@ OBJS = \
 
 # Objects in libcommon.a, potentially 

Re: [RFC] Return NULL from gimple_call_return_type if no return available.

2021-06-23 Thread Andrew MacLeod via Gcc-patches

On 6/23/21 2:37 PM, Richard Biener via Gcc-patches wrote:

On June 23, 2021 5:03:05 PM GMT+02:00, Aldy Hernandez via Gcc-patches 
 wrote:

The call to gimple_call_fntype() in gimple_call_return_type() may
return
NULL, which causes the TREE_TYPE(lhs) to ICE.  I think it would be best
to
return NULL (or void_type_node) rather than aborting.

I'm running into this because fold_using_range::range_of_call, calls
gimple_call_return_type which may ICE for builtins with no LHS.
Instead
of special casing things in range_of_call, perhaps it's best to plug
the
source.

Does this sound reasonable?

No, you need to make sure to not call this on an internal function call instead.
Otherwise it is never NULL.

Richard.


Out of curiosity, why is it not OK to call this on an internal function 
call?   Shouldn't all calls return something at least? like VOIDmode if 
they don't return anything?  It just seems like it needs to be special 
cased either at any possible call site, or simply in the routine.   we 
stumbled across it and it wasn't obvious why.


Andrew




Re: Ping: [PATCH] Generate 128-bit divide/modulus

2021-06-23 Thread Michael Meissner via Gcc-patches
Note, as I mentioned previously, this patch needs to go into GCC 11.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797


Re: Ping: [PATCH 2/3 V2] Fix IEEE 128-bit min/max test.

2021-06-23 Thread Michael Meissner via Gcc-patches
Note, this patch should go into GCC 11 after baking on the master branch.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797


Ping: [PATCH 3/3] Add IEEE 128-bit fp conditional move on PowerPC.

2021-06-23 Thread Michael Meissner via Gcc-patches
Ping this patch.

| Date: Tue, 8 Jun 2021 20:24:47 -0400
| Subject: [PATCH 3/3] Add IEEE 128-bit fp conditional move on PowerPC.
| Message-ID: <20210609002447.gc18...@ibm-toto.the-meissners.org>

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797


Ping: [PATCH 2/3 V2] Fix IEEE 128-bit min/max test.

2021-06-23 Thread Michael Meissner via Gcc-patches
Ping this patch:

| Date: Thu, 17 Jun 2021 18:56:09 -0400
| Subject: [PATCH 2/3 V2] Fix IEEE 128-bit min/max test.
| Message-ID: <20210617225609.ga4...@ibm-toto.the-meissners.org>

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797


Ping: [PATCH] Generate 128-bit divide/modulus

2021-06-23 Thread Michael Meissner via Gcc-patches
Ping this patch.

| Date: Fri, 4 Jun 2021 11:10:37 -0400
| Subject: Generate 128-bit divide/modulus
| Message-ID: <20210604151037.ga27...@ibm-toto.the-meissners.org>

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797


Re: [PATCH] testsuite: add -fwrapv for 950704-1.c

2021-06-23 Thread Jeff Law via Gcc-patches




On 6/22/2021 3:41 AM, Xi Ruoyao via Gcc-patches wrote:

On Tue, 2021-06-22 at 10:37 +0200, Richard Biener wrote:

On Mon, Jun 21, 2021 at 6:53 PM Xi Ruoyao via Gcc-patches
 wrote:

This test relies on wrap behavior of signed overflow.  Without -
fwrapv
it is known to fail on mips (and maybe some other targets as well).

OK.

I don't have write access, requiring a maintainer to commit the change.
If you're going to contribute regularly, then it's probably best to go 
ahead and get you write access so that you can apply approved patches.


https://gcc.gnu.org/gitwrite.html#authenticated


There should be a link to a form you'll need to fill out.  You can list 
me as a sponsor.


jeff



Re: [PATCH] testsuite: add -fwrapv for 950704-1.c

2021-06-23 Thread Jeff Law via Gcc-patches




On 6/21/2021 10:51 AM, Xi Ruoyao via Gcc-patches wrote:

This test relies on wrap behavior of signed overflow.  Without -fwrapv
it is known to fail on mips (and maybe some other targets as well).

gcc/testsuite/

* gcc.c-torture/execute/950704-1.c: Add -fwrapv to avoid
  undefined behavior.

Thanks.  I've installed this on the trunk.
jeff



Re: [RFC] Return NULL from gimple_call_return_type if no return available.

2021-06-23 Thread Richard Biener via Gcc-patches
On June 23, 2021 5:03:05 PM GMT+02:00, Aldy Hernandez via Gcc-patches 
 wrote:
>The call to gimple_call_fntype() in gimple_call_return_type() may
>return
>NULL, which causes the TREE_TYPE(lhs) to ICE.  I think it would be best
>to
>return NULL (or void_type_node) rather than aborting.
>
>I'm running into this because fold_using_range::range_of_call, calls
>gimple_call_return_type which may ICE for builtins with no LHS. 
>Instead
>of special casing things in range_of_call, perhaps it's best to plug
>the
>source.
>
>Does this sound reasonable?

No, you need to make sure to not call this on an internal function call 
instead. 
Otherwise it is never NULL. 

Richard. 

>gcc/ChangeLog:
>
>   * gimple.h (gimple_call_return_type): Return NULL when no type
>   and no lhs is available.
>---
> gcc/gimple.h | 5 -
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
>diff --git a/gcc/gimple.h b/gcc/gimple.h
>index e7dc2a45a13..2a01fe631ec 100644
>--- a/gcc/gimple.h
>+++ b/gcc/gimple.h
>@@ -3182,7 +3182,10 @@ gimple_call_return_type (const gcall *gs)
>   tree type = gimple_call_fntype (gs);
> 
>   if (type == NULL_TREE)
>-return TREE_TYPE (gimple_call_lhs (gs));
>+{
>+  tree lhs = gimple_call_lhs (gs);
>+  return lhs ? TREE_TYPE (lhs) : NULL_TREE;
>+}
> 
>   /* The type returned by a function is the type of its
>  function type.  */



[PATCH] c++: excessive instantiation during CTAD [PR101174]

2021-06-23 Thread Patrick Palka via Gcc-patches
We set DECL_CONTEXT on implicitly generated deduction guides so that
their access is consistent with that of the constructor.  But this
apparently leads to excessive instantiation in some cases, ultimately
because instantiation of a deduction guide should be independent of
instantiation of the resulting class specialization, but setting the
DECL_CONTEXT of the former to the latter breaks this independence.

To fix this, this patch makes push_access_scope handle artificial
deduction guides specifically instead of setting their DECL_CONTEXT
in build_deduction_guide.  We could alternatively make the class
befriend the guide via DECL_BEFRIENDING_CLASSES, but that'd break
class-deduction-access3.C below since friendship isn't transitive.

Bootstrapped and regtested on x86_64-pc-linxu-gnu, does this look OK for
trunk?

PR c++/101174

gcc/cp/ChangeLog:

* pt.c (push_access_scope): For artificial deduction guides,
set the access scope to that of the constructor.
(pop_access_scope): Likewise.
(build_deduction_guide): Don't set DECL_CONTEXT.

libstdc++-v3/ChangeLog:

* testsuite/23_containers/multiset/cons/deduction.cc:
Uncomment CTAD example that was rejected by this bug.
* testsuite/23_containers/set/cons/deduction.cc: Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/class-deduction-access3.C: New test.
* g++.dg/cpp1z/class-deduction91.C: New test.
---
 gcc/cp/pt.c   | 11 ++
 .../g++.dg/cpp1z/class-deduction-access3.C| 22 +++
 .../g++.dg/cpp1z/class-deduction91.C  | 16 ++
 .../23_containers/multiset/cons/deduction.cc  |  2 --
 .../23_containers/set/cons/deduction.cc   |  2 --
 5 files changed, 45 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1z/class-deduction-access3.C
 create mode 100644 gcc/testsuite/g++.dg/cpp1z/class-deduction91.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 732fb405adf..5c55507203a 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -236,6 +236,10 @@ push_access_scope (tree t)
 push_nested_class (DECL_FRIEND_CONTEXT (t));
   else if (DECL_CLASS_SCOPE_P (t))
 push_nested_class (DECL_CONTEXT (t));
+  else if (deduction_guide_p (t) && DECL_ARTIFICIAL (t))
+/* An artificial deduction guide should have the same access as
+   the constructor.  */
+push_nested_class (TREE_TYPE (TREE_TYPE (t)));
   else
 push_to_top_level ();
 
@@ -255,7 +259,9 @@ pop_access_scope (tree t)
   if (TREE_CODE (t) == FUNCTION_DECL)
 current_function_decl = saved_access_scope->pop();
 
-  if (DECL_FRIEND_CONTEXT (t) || DECL_CLASS_SCOPE_P (t))
+  if (DECL_FRIEND_CONTEXT (t)
+  || DECL_CLASS_SCOPE_P (t)
+  || (deduction_guide_p (t) && DECL_ARTIFICIAL (t)))
 pop_nested_class ();
   else
 pop_from_top_level ();
@@ -28804,9 +28810,6 @@ build_deduction_guide (tree type, tree ctor, tree 
outer_args, tsubst_flags_t com
 DECL_ABSTRACT_ORIGIN (ded_tmpl) = fn_tmpl;
   if (ci)
 set_constraints (ded_tmpl, ci);
-  /* The artificial deduction guide should have same access as the
- constructor.  */
-  DECL_CONTEXT (ded_fn) = type;
 
   return ded_tmpl;
 }
diff --git a/gcc/testsuite/g++.dg/cpp1z/class-deduction-access3.C 
b/gcc/testsuite/g++.dg/cpp1z/class-deduction-access3.C
new file mode 100644
index 000..44b4b5c455f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/class-deduction-access3.C
@@ -0,0 +1,22 @@
+// { dg-do compile { target c++17 } }
+
+template
+struct Cont;
+
+template
+struct Base
+{
+private:
+  using type = T;
+  friend Cont;
+};
+
+template
+struct Cont
+{
+  using argument_type = typename Base::type;
+
+  Cont(T, argument_type) { }
+};
+
+Cont c(1, 1);
diff --git a/gcc/testsuite/g++.dg/cpp1z/class-deduction91.C 
b/gcc/testsuite/g++.dg/cpp1z/class-deduction91.C
new file mode 100644
index 000..f474c8e35ec
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/class-deduction91.C
@@ -0,0 +1,16 @@
+// PR c++/101174
+// { dg-do compile { target c++17 } }
+
+struct S { using type = int; };
+
+template
+struct multiset {
+  using type = typename U::type;
+  multiset(T);
+  multiset(U);
+};
+
+template
+multiset(T) -> multiset;
+
+multiset c(42);
diff --git a/libstdc++-v3/testsuite/23_containers/multiset/cons/deduction.cc 
b/libstdc++-v3/testsuite/23_containers/multiset/cons/deduction.cc
index a4ccc6fa467..8b7a16042a4 100644
--- a/libstdc++-v3/testsuite/23_containers/multiset/cons/deduction.cc
+++ b/libstdc++-v3/testsuite/23_containers/multiset/cons/deduction.cc
@@ -19,11 +19,9 @@ static_assert(std::is_same_v<
  decltype(std::multiset{{1, 2, 3}, std::less{}, {}}),
  std::multiset>);
 
-/* FIXME: GCC 12 rejects this due to PR c++/101174
 static_assert(std::is_same_v<
  decltype(std::multiset{{1, 2, 3}, std::less{}}),
  std::multiset>);
-*/
 
 static_assert(std::is_same_v<
  decltype(std::multiset{{1, 2, 3}, 

Re: [PATCH] libstdc++: More efficient std::chrono::year::leap.

2021-06-23 Thread Jonathan Wakely via Gcc-patches

On 23/06/21 18:51 +0100, Jonathan Wakely wrote:

Here's what I've committed. Tested x86_64-linux and powerpc64le-linux.
Pushed to trunk.






commit b92d12d3fe3f1aa56d190d960e40c62869a6cfbb
Author: Cassio Neri 
Date:   Wed Jun 23 15:32:16 2021

   libstdc++: More efficient std::chrono::year::leap
   
   Simple change to std::chrono::year::is_leap. If a year is multiple of 100,

   then it's divisible by 400 if and only if it's divisible by 16. The latter
   allows for better code generation.
   
   The expression is then either y%16 or y%4 which are both powers of two

   and so it can be rearranged to use simple bitmask operations.
   
   Co-authored-by: Jonathan Wakely 

   Co-authored-by: Ulrich Drepper 
   
   libstdc++-v3/ChangeLog:
   
   * include/std/chrono (chrono::year::is_leap()): Optimize.


diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index 4631a727d73..863b6a27bdf 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -1606,13 +1606,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
// [1] https://github.com/cassioneri/calendar
// [2] https://accu.org/journals/overload/28/155/overload155.pdf#page=16

+   // Furthermore, if y%100 != 0, then y%400==0 is equivalent to y%16==0,
+   // so we can rearrange the expression to (mult_100 ? y % 4 : y % 16)==0


But Ulrich pointed out I got my boolean logic all muddled up in the
comment. Fixed with the attached patch!

commit 4a404f66b09d661bdc60e7e0d5d08f00c4e194db
Author: Jonathan Wakely 
Date:   Wed Jun 23 18:50:03 2021

libstdc++: Fix comment in chrono::year::is_leap()

libstdc++-v3/ChangeLog:

* include/std/chrono (chrono::year::is_leap()): Fix incorrect
logic in comment.

diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index 863b6a27bdf..0b597be1944 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -1606,8 +1606,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	// [1] https://github.com/cassioneri/calendar
 	// [2] https://accu.org/journals/overload/28/155/overload155.pdf#page=16
 
-	// Furthermore, if y%100 != 0, then y%400==0 is equivalent to y%16==0,
-	// so we can rearrange the expression to (mult_100 ? y % 4 : y % 16)==0
+	// Furthermore, if y%100 == 0, then y%400==0 is equivalent to y%16==0,
+	// so we can simplify it to (!mult_100 && y % 4 == 0) || y % 16 == 0,
 	// which is equivalent to (y & (mult_100 ? 15 : 3)) == 0.
 	// See https://gcc.gnu.org/pipermail/libstdc++/2021-June/052815.html
 


Re: [PATCH] libstdc++: More efficient std::chrono::year::leap.

2021-06-23 Thread Jonathan Wakely via Gcc-patches

On 23/06/21 14:16 +0100, Jonathan Wakely wrote:

On 23/06/21 12:45 +0100, Jonathan Wakely wrote:

On 21/05/21 19:44 +0100, Cassio Neri via Libstdc++ wrote:

I've checked the generated code and the compiler doesn't figure out
the logic. I added a comment to explain.

(Revised patch below and attached.)

Best wishes,
Cassio.

---

Simple change to std::chrono::year::is_leap. If a year is multiple of 100,
then it's divisible by 400 if and only if it's divisible by 16. The latter
allows for better code generation.

Tested on x86_64-pc-linux-gnu.

libstdc++-v3/ChangeLog:
libstdc++-v3/ChangeLog:

 * include/std/chrono:

diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index 4631a727d73..85aa0379432 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -1612,7 +1612,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  constexpr uint32_t __offset   = __max_dividend / 2 / 100 * 100;
  const bool __is_multiple_of_100
= __multiplier * (_M_y + __offset) < __bound;
-return (!__is_multiple_of_100 || _M_y % 400 == 0) && _M_y % 4 == 0;
+// Usually we test _M_y % 400 == 0 but, when it's already known that
+// _M_y%100 == 0, then _M_y % 400==0 is equivalent to _M_y % 16 == 0.

^^
N.B. this comment should say !=


+return (!__is_multiple_of_100 || _M_y % 16 == 0) && _M_y % 4 == 0;


If y % 16 == 0 then y % 4 == 0 too. So we could write that as:

return (!__is_multiple_of_100 && _M_y % 4 == 0) || _M_y % 16 == 0;

This seems to perform even better over a wide range of inputs, can you
confirm that result with your own tests?

However, my microbenchmark also shows that the simplistic code using
y%100 often performs even better than the current code calculating
__is_multiple_of_100 to avoid the modulus operation. So maybe my tests
are bad.

My rearranged expression above is equivalent to:

return _M_y % (__is_multiple_of_100 ? 16 : 4) == 0;

which can be written without branches:

return _M_y % (4 << (2 * __is_multiple_of_100)) == 0;

However, both Clang and GCC already remove the branch for (x ? 16 : 4)
and the conditional expression produces slightly smaller code with 
GCC (see https://gcc.gnu.org/PR101179 regarding that). But neither 
of

these seems to improve compared to my first rearrangement above.


This version from Ulrich Drepper produces the smallest code of all
(and also performs well, if not the absolute fastest) in my
benchmarks:

 return (y & (__is_multiple_of_100 ? 15 : 3)) == 0;


Here's what I've committed. Tested x86_64-linux and powerpc64le-linux.
Pushed to trunk.



commit b92d12d3fe3f1aa56d190d960e40c62869a6cfbb
Author: Cassio Neri 
Date:   Wed Jun 23 15:32:16 2021

libstdc++: More efficient std::chrono::year::leap

Simple change to std::chrono::year::is_leap. If a year is multiple of 100,
then it's divisible by 400 if and only if it's divisible by 16. The latter
allows for better code generation.

The expression is then either y%16 or y%4 which are both powers of two
and so it can be rearranged to use simple bitmask operations.

Co-authored-by: Jonathan Wakely 
Co-authored-by: Ulrich Drepper 

libstdc++-v3/ChangeLog:

* include/std/chrono (chrono::year::is_leap()): Optimize.

diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index 4631a727d73..863b6a27bdf 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -1606,13 +1606,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	// [1] https://github.com/cassioneri/calendar
 	// [2] https://accu.org/journals/overload/28/155/overload155.pdf#page=16
 
+	// Furthermore, if y%100 != 0, then y%400==0 is equivalent to y%16==0,
+	// so we can rearrange the expression to (mult_100 ? y % 4 : y % 16)==0
+	// which is equivalent to (y & (mult_100 ? 15 : 3)) == 0.
+	// See https://gcc.gnu.org/pipermail/libstdc++/2021-June/052815.html
+
 	constexpr uint32_t __multiplier   = 42949673;
 	constexpr uint32_t __bound= 42949669;
 	constexpr uint32_t __max_dividend = 1073741799;
 	constexpr uint32_t __offset   = __max_dividend / 2 / 100 * 100;
 	const bool __is_multiple_of_100
 	  = __multiplier * (_M_y + __offset) < __bound;
-	return (!__is_multiple_of_100 || _M_y % 400 == 0) && _M_y % 4 == 0;
+	return (_M_y & (__is_multiple_of_100 ? 15 : 3)) == 0;
   }
 
   explicit constexpr


Re: [EXTERNAL] Re: rs6000: Fix typos in float128 ISA3.1 support

2021-06-23 Thread Segher Boessenkool
On Wed, Jun 23, 2021 at 12:17:07PM +0800, Kewen.Lin wrote:
> >> +#ifdef FLOAT128_HW_INSNS_ISA3_1
> >>  TFtype __floattikf (TItype_ppc)
> >>__attribute__ ((__ifunc__ ("__floattikf_resolve")));
> > 
> > I wonder if we now need TItype_ppc at all anymore, btw?
> 
> Sorry that I don't quite follow this question.

I thought it may do the same as just TItype now, but the ifunc stuff
probably makes it different still :-)


Segher


Re: [PATCH] Modula-2 into the GCC tree on master

2021-06-23 Thread Segher Boessenkool
On Tue, Jun 22, 2021 at 12:41:27AM +0200, Jakub Jelinek wrote:
> On Mon, Jun 21, 2021 at 11:36:48PM +0100, Gaius Mulley via Gcc-patches wrote:
> > > : error: the file containing the definition module 
> > > <80><98>M2RTS
> > > <80><99> cannot be found
> > > compiler exited with status 1
> > > output is:
> > > : error: the file containing the definition module 
> > > <80><98>M2RTS
> > > <80><99> cannot be found
> > 
> > ah yes, it would be good to make it autoconf locale utf-8
> 
> No, whether gcc is configured on an UTF-8 capable terminal or using UTF-8
> locale doesn't imply whether it will actually be used in such a terminal
> later on.
> See e.g. gcc/intl.c (gcc_init_libintl) how it decides whether to use UTF-8
> or normal quotes.

Right, and I build and regcheck GCC with LANG=C (and no LC_*) so I
should never see anything translated at all, and nothing should generate
UTF-8 for me.


Segher


Re: GCC documentation: porting to Sphinx

2021-06-23 Thread Joseph Myers
On Wed, 23 Jun 2021, Martin Liška wrote:

> @Joseph: Can you share your thoughts about the used Makefile integration? What
> do you suggest for 2)
> (note that explicit listing of all .rst file would be crazy)?

You can write dependencies on e.g. doc/gcc/*.rst (which might be more 
files than actually are relevant in some cases, if the directory includes 
some common files shared by some but not all manuals, but should be 
conservatively safe if you list appropriate directories there), rather 
than needing to name all the individual files.  Doing things with makefile 
dependencies seems better than relying on what sphinx-build does when 
rerun unnecessarily (if sphinx-build avoids rebuilding in some cases where 
the makefiles think a rebuild is needed, that's fine as an optimization).

It looks like this makefile integration loses some of the srcinfo / srcman 
support.  That support should stay (be updated for the use of Sphinx) so 
that release tarballs (as generated by maintainer-scripts/gcc_release, 
which uses --enable-generated-files-in-srcdir) continue to include man 
pages / info files (and make sure that, if those files are present in the 
source directory, then building and installing GCC does install them even 
when sphinx-build is absent at build/install time).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] [libstdc++] Cleanup atomic timed wait implementation

2021-06-23 Thread Jonathan Wakely via Gcc-patches
On Fri, 4 Jun 2021 at 23:31, Thomas Rodgers  wrote:
>
> This cleans up the implementation of atomic_timed_wait.h and fixes the
> accidental pessimization of spinning after waiting in
> __timed_waiter_pool::_M_do_wait_until.

This one's still pending review, right?

>
> libstdc++-v3/ChangeLog:
>
> * include/bits/atomic_timed_wait.h (__wait_clock_t): Define
> conditionally.
> (__cond_wait_until_impl): Define conditionally.


> (__cond_wait_until): Define conditionally. Simplify clock
> type detection/conversion.
> (__timed_waiter_pool::_M_do_wait_until): Move the spin above
> the wait.
>
> ---
>  libstdc++-v3/include/bits/atomic_timed_wait.h | 26 ++-
>  1 file changed, 14 insertions(+), 12 deletions(-)
>
> diff --git a/libstdc++-v3/include/bits/atomic_timed_wait.h 
> b/libstdc++-v3/include/bits/atomic_timed_wait.h
> index ec7ff51cdbc..19386e5806a 100644
> --- a/libstdc++-v3/include/bits/atomic_timed_wait.h
> +++ b/libstdc++-v3/include/bits/atomic_timed_wait.h
> @@ -51,7 +51,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>
>namespace __detail
>{
> +#ifdef _GLIBCXX_HAVE_LINUX_FUTEX || _GLIBCXX_USE_PTHREAD_COND_CLOCKWAIT
>  using __wait_clock_t = chrono::steady_clock;
> +#else
> +using __wait_clock_t = chrono::system_clock;
> +#endif
>
>  template
>__wait_clock_t::time_point
> @@ -133,11 +137,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> return false;
>   }
>}
> -#else
> +// #elsif 

"elsif" should be "elif" in this comment.

>  // define _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT and implement 
> __platform_wait_until()
>  // if there is a more efficient primitive supported by the platform
>  // (e.g. __ulock_wait())which is better than pthread_cond_clockwait
> -#endif // ! PLATFORM_TIMED_WAIT
> +#else
> +// Use wait on condition variable
>
>  // Returns true if wait ended before timeout.
>  // _Clock must be either steady_clock or system_clock.
> @@ -173,12 +178,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>__cond_wait_until(__condvar& __cv, mutex& __mx,
>   const chrono::time_point<_Clock, _Dur>& __atime)
>{
> -#ifdef _GLIBCXX_USE_PTHREAD_COND_CLOCKWAIT
> -   if constexpr (is_same_v<_Clock, chrono::steady_clock>)
> - return __detail::__cond_wait_until_impl(__cv, __mx, __atime);
> -   else
> -#endif
> -   if constexpr (is_same_v<_Clock, chrono::system_clock>)
> +   if constexpr (is_same_v<__wait_clock_t, _Clock>)

This doesn't seem quite right. __cond_wait_until_impl can always
accept time points using system_clock, even when __wait_clock is
steady_clock. With this change a system_clock timeout will be
converted to steady_clock if pthread_cond_clockwait is available. That
will happen even though __cond_wait_until_impl can always work with a
system_clock timeout. Is that what's intended?

If somebody calls try_acquire_until(system_clock::now() + 1s) then
they're asking to wait on the system clock, right? So the underlying
wait on a condition variable should use that clock if possible, right?
But when pthread_cond_clockwait is available, all absolute time points
are converted to steady_clock, even though system_clock is also
supported.


>   return __detail::__cond_wait_until_impl(__cv, __mx, __atime);
> else
>   {
> @@ -194,6 +194,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> return false;
>   }
>}
> +#endif // ! PLATFORM_TIMED_WAIT
>
>  struct __timed_waiter_pool : __waiter_pool_base
>  {
> @@ -300,17 +301,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>   const chrono::time_point<_Clock, _Dur>&
>   __atime) 
> noexcept
>   {
> +
> for (auto __now = _Clock::now(); __now < __atime;
>   __now = _Clock::now())
>   {
> +   if (__base_type::_M_do_spin(__pred, __val,
> +  __timed_backoff_spin_policy(__atime, __now)))
> + return true;
> +
> if (__base_type::_M_w._M_do_wait_until(
>   __base_type::_M_addr, __val, __atime)
> && __pred())
>   return true;
> -
> -   if (__base_type::_M_do_spin(__pred, __val,
> -  __timed_backoff_spin_policy(__atime, __now)))
> - return true;
>   }
> return false;
>   }
> --
> 2.26.2
>



Re: [PATCH 0/3] Improve and document stdx::simd testsuite

2021-06-23 Thread Jonathan Wakely via Gcc-patches
On Tue, 8 Jun 2021 at 09:57, Matthias Kretz  wrote:
>
> As discussed a long time ago on IRC, this improves (i.e. decreases by default)
> the verbosity of make check-simd, gives more verbosity options, and finally
> documents how the simd testsuite is used and how it works. In addition, after
> PR98834 was resolved, remove the -fno-tree-vrp workaround.
>
> Tested on x86_64-linux (and more).

Great, thanks, I'm about to push all 3 patches to trunk.


>
>
> Matthias Kretz (3):
>   libstdc++: Remove -fno-tree-vrp after PR98834 was resolved
>   libstdc++: Improve output verbosity options and default
>   libstdc++: Document simd testsuite
>
>  libstdc++-v3/testsuite/Makefile.am|   3 +-
>  libstdc++-v3/testsuite/Makefile.in|   3 +-
>  .../testsuite/experimental/simd/README.md | 257 ++
>  .../testsuite/experimental/simd/driver.sh | 137 +++---
>  .../experimental/simd/generate_makefile.sh|  33 ++-
>  5 files changed, 380 insertions(+), 53 deletions(-)
>  create mode 100644 libstdc++-v3/testsuite/experimental/simd/README.md
>
> --
> ──
>  Dr. Matthias Kretz   https://mattkretz.github.io
>  GSI Helmholtz Centre for Heavy Ion Research   https://gsi.de
>  std::experimental::simd  https://github.com/VcDevel/std-simd
> ──
>


[RFC] Return NULL from gimple_call_return_type if no return available.

2021-06-23 Thread Aldy Hernandez via Gcc-patches
The call to gimple_call_fntype() in gimple_call_return_type() may return
NULL, which causes the TREE_TYPE(lhs) to ICE.  I think it would be best to
return NULL (or void_type_node) rather than aborting.

I'm running into this because fold_using_range::range_of_call, calls
gimple_call_return_type which may ICE for builtins with no LHS.  Instead
of special casing things in range_of_call, perhaps it's best to plug the
source.

Does this sound reasonable?

gcc/ChangeLog:

* gimple.h (gimple_call_return_type): Return NULL when no type
and no lhs is available.
---
 gcc/gimple.h | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/gcc/gimple.h b/gcc/gimple.h
index e7dc2a45a13..2a01fe631ec 100644
--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -3182,7 +3182,10 @@ gimple_call_return_type (const gcall *gs)
   tree type = gimple_call_fntype (gs);
 
   if (type == NULL_TREE)
-return TREE_TYPE (gimple_call_lhs (gs));
+{
+  tree lhs = gimple_call_lhs (gs);
+  return lhs ? TREE_TYPE (lhs) : NULL_TREE;
+}
 
   /* The type returned by a function is the type of its
  function type.  */
-- 
2.31.1



Re: [PATCH, rs6000] Do not enable pcrel-opt by default

2021-06-23 Thread Segher Boessenkool
Hi!

On Tue, Jun 22, 2021 at 04:13:36PM -0500, Aaron Sawdey wrote:
> SPEC2017 testing on p10 shows that this optimization does not have a
> positive impact on performance. So we are no longer going to enable it
> by default. The test cases for it needed to be updated so they always
> enable it to test it.

> gcc/
>   * config/rs6000/rs6000-cpus.def: Take OPTION_MASK_PCREL_OPT out
>of OTHER_POWER10_MASKS so it will not be enabled by default.

Stray space at start of line.

> --- a/gcc/config/rs6000/rs6000-cpus.def
> +++ b/gcc/config/rs6000/rs6000-cpus.def
> @@ -75,9 +75,10 @@
>| OPTION_MASK_P9_VECTOR)
>  
>  /* Flags that need to be turned off if -mno-power10.  */
> +/* PCREL_OPT is now disabled by default so we comment it out here.  */

"Now" is meaningless to future readers ;-)

Please also say *why* it is disabled.

Okay for trunk like that.  Also okay for 11 if it regstraps.  Thanks!


Segher


Re: [PATCH] AArch64: Add support for __builtin_roundeven[f] [PR100966]

2021-06-23 Thread Richard Sandiford via Gcc-patches
Wilco Dijkstra  writes:
> Hi Richard,
>
>> So rather than have two patterns that generate frintn, I think
>> it would be better to change the existing frint_pattern entry to
>> "roundeven" instead, and fix whatever the fallout is.  Hopefully it
>> shouldn't be too bad, since we already use the optab names for the
>> other UNSPEC_FRINT* codes.
>
> Well it requires various changes to the arm_neon headers since they use
> existing intrinsics. If that is not considered a risky ABI change then here is
> v2:
>
> Enable __builtin_roundeven[f] by changing existing frintn to roundeven.

Yeah, changing the arm_neon.h->compiler interface is OK, since we only
support using the version of arm_neon.h that ships with the compiler,
not an older or newer version.  There have already been quite a few
changes like that since GCC 11 branched (to get better optimisation).

> Bootstrap OK and passes regress.
>
> ChangeLog:
> 2021-06-18  Wilco Dijkstra  
>
> PR target/100966
> * config/aarch64/aarch64.md (frint_pattern): Update comment.
> * config/aarch64/aarch64-simd-builtins.def: Change frintn to 
> roundeven.
> * config/aarch64/arm_fp16.h: Change frintn to roundeven.
> * config/aarch64/arm_neon.h: Likewise.
> * config/aarch64/iterators.md (frint_pattern): Use roundeven for
> FRINTN.
>
> gcc/testsuite
> PR target/100966
> * gcc.target/aarch64/frint.x: Add roundeven tests.
> * gcc.target/aarch64/frint_double.c: Likewise.
> * gcc.target/aarch64/frint_float.c: Likewise.

OK, thanks.

Richard

>
> ---
>
> diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def 
> b/gcc/config/aarch64/
> aarch64-simd-builtins.def
> index
> b885bd5b38bf7ad83eb9d801284bf9b34db17210..534cc8ccb538c4fb0c208a7035020e131656260d
> 100644
> --- a/gcc/config/aarch64/aarch64-simd-builtins.def
> +++ b/gcc/config/aarch64/aarch64-simd-builtins.def
> @@ -485,7 +485,7 @@
>BUILTIN_VHSDF (UNOP, nearbyint, 2, FP)
>BUILTIN_VHSDF (UNOP, rint, 2, FP)
>BUILTIN_VHSDF (UNOP, round, 2, FP)
> -  BUILTIN_VHSDF_HSDF (UNOP, frintn, 2, FP)
> +  BUILTIN_VHSDF_HSDF (UNOP, roundeven, 2, FP)
>  
>VAR1 (UNOP, btrunc, 2, FP, hf)
>VAR1 (UNOP, ceil, 2, FP, hf)
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index
> 30effca6f3562f6870a6cc8097750e63bb0d424d..8977330b142d2cde1d2faa4a01282b01a68e25c5
> 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -5922,7 +5922,7 @@ (define_insn "*bswapsi2_uxtw"
>  ;; ---
>  
>  ;; frint floating-point round to integral standard patterns.
> -;; Expands to btrunc, ceil, floor, nearbyint, rint, round, frintn.
> +;; Expands to btrunc, ceil, floor, nearbyint, rint, round, roundeven.
>  
>  (define_insn "2"
>[(set (match_operand:GPF_F16 0 "register_operand" "=w")
> diff --git a/gcc/config/aarch64/arm_fp16.h b/gcc/config/aarch64/arm_fp16.h
> index
> 2afbd1203361b54d6e1315ffaa1bec21834c060e..3efa7e1f19817df1409bf781266f4e238c128f0b
> 100644
> --- a/gcc/config/aarch64/arm_fp16.h
> +++ b/gcc/config/aarch64/arm_fp16.h
> @@ -333,7 +333,7 @@ vrndmh_f16 (float16_t __a)
>  __extension__ static __inline float16_t __attribute__ ((__always_inline__))
>  vrndnh_f16 (float16_t __a)
>  {
> -  return __builtin_aarch64_frintnhf (__a);
> +  return __builtin_aarch64_roundevenhf (__a);
>  }
>  
>  __extension__ static __inline float16_t __attribute__ ((__always_inline__))
> diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
> index
> baa30bd5a9d96c1bf04a37fb105091ea56a6444a..c88a8a627d3082d3577b0fe222381e93c35d7251
> 100644
> --- a/gcc/config/aarch64/arm_neon.h
> +++ b/gcc/config/aarch64/arm_neon.h
> @@ -24657,35 +24657,35 @@ __extension__ extern __inline float32_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vrndns_f32 (float32_t __a)
>  {
> -  return __builtin_aarch64_frintnsf (__a);
> +  return __builtin_aarch64_roundevensf (__a);
>  }
>  
>  __extension__ extern __inline float32x2_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vrndn_f32 (float32x2_t __a)
>  {
> -  return __builtin_aarch64_frintnv2sf (__a);
> +  return __builtin_aarch64_roundevenv2sf (__a);
>  }
>  
>  __extension__ extern __inline float64x1_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vrndn_f64 (float64x1_t __a)
>  {
> -  return (float64x1_t) {__builtin_aarch64_frintndf (__a[0])};
> +  return (float64x1_t) {__builtin_aarch64_roundevendf (__a[0])};
>  }
>  
>  __extension__ extern __inline float32x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vrndnq_f32 (float32x4_t __a)
>  {
> -  return __builtin_aarch64_frintnv4sf (__a);
> +  return __builtin_aarch64_roundevenv4sf (__a);
>  }
>  
>  __extension__ extern __inline float64x2_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vrndnq_f64 (float64x2_t __a)
>  

Re: [PATCH] Fix --gdwarf-5 configure tests for Windows

2021-06-23 Thread Jeff Law via Gcc-patches




On 6/23/2021 3:53 AM, Eric Botcazou wrote:

The issues are that 1) they use readelf instead of objdump and 2) they use
ELF syntax in the assembly code.

Tested on x86-64/Linux and x86[-64]/Windows, OK for mainline and 11 branch?


2021-06-23  Eric Botcazou  

* configure.ac (--gdwarf-5 option): Use objdump instead of readelf.
(working --gdwarf-4/--gdwarf-5 for all sources): Likewise.
(--gdwarf-4 not refusing generated .debug_line): Adjust for Windows.
* configure: Regenerate.

OK for trunk and backport (again your call on timing of the backport).

jeff



Re: [PATCH] Emit .file 0 marker earlier in DWARF 5

2021-06-23 Thread Jeff Law via Gcc-patches




On 6/23/2021 4:06 AM, Eric Botcazou wrote:

Hi,

when the assembler supports it, the compiler automatically passes --gdwarf-5
to it, which has an interesting side effect: any assembly instruction prior
to the first .file directive defines a new line associated with .file 0 in
the .debug_line section and, of course, the numbering of these implicit lines
has nothing to do with that of the source code.  This can be problematic in
Ada where we do not generate .file/.loc directives for some compiled-generated
functions to avoid too jumpy a debugging experience.

The attached patch moves the .file 0 marker from the .debug_line section to the
beginning of the file (next to the .cfi_sections marker).

Tested on x86-64/Linux and x86[-64]/Windows, OK for mainline and 11 branch?


2021-06-23  Eric Botcazou  

* dwarf2out.c (dwarf2out_assembly_start): Emit .file 0 marker here...
(dwarf2out_finish): ...instead of here.
OK for the trunk now and 11 backport (your call on backporting now or 
after soak time).


jeff



[COMMITTED] Split gimple-range into gimple-range-fold and gimple-range.

2021-06-23 Thread Andrew MacLeod via Gcc-patches
This file removes all the fold_using_range related code from 
gimple-range.{h,cc} and moves it into its own file, gimple-range-fold.{h,cc}


A couple of routines were also relocated to gimple-range-gori.{h,cc} as 
they were strictly related to unwind calculations and belong there.


This leaves gimple-range to contain mostly the ranger, and any derived 
versions.  This is purely a structural change, no other impact.  Cleaned 
up the include file list in gimple-range as well.


Bootstrap on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew

>From 4c85ff754927c518ed97da5e0221eeea742c9aa7 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Tue, 22 Jun 2021 11:41:30 -0400
Subject: [PATCH 4/4] Split gimple-range into gimple-range-fold and
 gimple-range.

Split the fold_using_range functions from gimple-range into gimple-range-fold.
Also move the gimple_range_calc* routines into gimple-range-gori.

	* Makefile.in (OBJS): Add gimple-range-fold.o
	* gimple-range-fold.cc: New.
	* gimple-range-fold.h: New.
	* gimple-range-gori.cc (gimple_range_calc_op1): Move to here.
	(gimple_range_calc_op2): Ditto.
	* gimple-range-gori.h: Move prototypes to here.
	* gimple-range.cc: Adjust include files.
	(fur_source:fur_source): Relocate to gimple-range-fold.cc.
	(fur_source::get_operand): Ditto.
	(fur_source::get_phi_operand): Ditto.
	(fur_source::query_relation): Ditto.
	(fur_source::register_relation): Ditto.
	(class fur_edge): Ditto.
	(fur_edge::fur_edge): Ditto.
	(fur_edge::get_operand): Ditto.
	(fur_edge::get_phi_operand): Ditto.
	(fur_stmt::fur_stmt): Ditto.
	(fur_stmt::get_operand): Ditto.
	(fur_stmt::get_phi_operand): Ditto.
	(fur_stmt::query_relation): Ditto.
	(class fur_depend): Relocate to gimple-range-fold.h.
	(fur_depend::fur_depend): Relocate to gimple-range-fold.cc.
	(fur_depend::register_relation): Ditto.
	(fur_depend::register_relation): Ditto.
	(class fur_list): Ditto.
	(fur_list::fur_list): Ditto.
	(fur_list::get_operand): Ditto.
	(fur_list::get_phi_operand): Ditto.
	(fold_range): Ditto.
	(adjust_pointer_diff_expr): Ditto.
	(gimple_range_adjustment): Ditto.
	(gimple_range_base_of_assignment): Ditto.
	(gimple_range_operand1): Ditto.
	(gimple_range_operand2): Ditto.
	(gimple_range_calc_op1): Relocate to gimple-range-gori.cc.
	(gimple_range_calc_op2): Ditto.
	(fold_using_range::fold_stmt): Relocate to gimple-range-fold.cc.
	(fold_using_range::range_of_range_op): Ditto.
	(fold_using_range::range_of_address): Ditto.
	(fold_using_range::range_of_phi): Ditto.
	(fold_using_range::range_of_call): Ditto.
	(fold_using_range::range_of_builtin_ubsan_call): Ditto.
	(fold_using_range::range_of_builtin_call): Ditto.
	(fold_using_range::range_of_cond_expr): Ditto.
	(fold_using_range::range_of_ssa_name_with_loop_info): Ditto.
	(fold_using_range::relation_fold_and_or): Ditto.
	(fold_using_range::postfold_gcond_edges): Ditto.
	* gimple-range.h: Add gimple-range-fold.h to include files. Change
	GIMPLE_RANGE_STMT_H to GIMPLE_RANGE_H.
	(gimple_range_handler): Relocate to gimple-range-fold.h.
	(gimple_range_ssa_p): Ditto.
	(range_compatible_p): Ditto.
	(class fur_source): Ditto.
	(class fur_stmt): Ditto.
	(class fold_using_range): Ditto.
	(gimple_range_calc_op1): Relocate to gimple-range-gori.h
	(gimple_range_calc_op2): Ditto.
---
 gcc/Makefile.in  |1 +
 gcc/gimple-range-fold.cc | 1331 
 gcc/gimple-range-fold.h  |  163 +
 gcc/gimple-range-gori.cc |   66 ++
 gcc/gimple-range-gori.h  |9 +
 gcc/gimple-range.cc  | 1379 --
 gcc/gimple-range.h   |  144 +---
 7 files changed, 1574 insertions(+), 1519 deletions(-)
 create mode 100644 gcc/gimple-range-fold.cc
 create mode 100644 gcc/gimple-range-fold.h

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index ebf26442992..d32de22e4f2 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1398,6 +1398,7 @@ OBJS = \
 	gimple-range.o \
 	gimple-range-cache.o \
 	gimple-range-edge.o \
+	gimple-range-fold.o \
 	gimple-range-gori.o \
 	gimple-ssa-backprop.o \
 	gimple-ssa-evrp.o \
diff --git a/gcc/gimple-range-fold.cc b/gcc/gimple-range-fold.cc
new file mode 100644
index 000..583348e6e36
--- /dev/null
+++ b/gcc/gimple-range-fold.cc
@@ -0,0 +1,1331 @@
+/* Code for GIMPLE range related routines.
+   Copyright (C) 2019-2021 Free Software Foundation, Inc.
+   Contributed by Andrew MacLeod 
+   and Aldy Hernandez .
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file 

[COMMITTED 2/2] tree-optimization/101148 - Do not continue propagating values which cannot be set properly.

2021-06-23 Thread Andrew MacLeod via Gcc-patches

As described in PR 101014, this patch should fix both PRs.

The cache propagation engine works by combining incoming ranges to a 
block and if that is different than that current on-entry range, 
proceeds to push this new value on outgoing edges.


What was happening here is this new value that was calculated was beyond 
the 15 allowed by the sparse bitmap implem,entation. When it was stored, 
it was storing a different value (VARYING).  This block was in a cycle 
feeding back to itself.  When it calculated the on-entry value again and 
compared, it thought it needed to update again.  Which of course failed 
again... and the endless loop of trying to propagate was born.


This patch checks that the value being stored to the cache was 
successful.  If it was not, then this block is flagged to deny future 
attempts to propagate from this block (during this update cycle).


Bootstrap on x86_64-pc-linux-gnu with no regressions.  pushed.

Andrew

>From a03e944e92ee51ae583382079d4739b64bd93b35 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Tue, 22 Jun 2021 17:46:05 -0400
Subject: [PATCH 3/4] Do not continue propagating values which cannot be set
 properly.

If the on-entry cache cannot properly represent a range, do not continue
trying to propagate it.

	PR tree-optimization/101148
	PR tree-optimization/101014
	* gimple-range-cache.cc (ranger_cache::ranger_cache): Adjust.
	(ranger_cache::~ranger_cache): Adjust.
	(ranger_cache::block_range): Check if propagation disallowed.
	(ranger_cache::propagate_cache): Disallow propagation if new value
	can't be stored properly.
	* gimple-range-cache.h (ranger_cache::m_propfail): New member.
---
 gcc/gimple-range-cache.cc | 11 ++-
 gcc/gimple-range-cache.h  |  1 +
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/gcc/gimple-range-cache.cc b/gcc/gimple-range-cache.cc
index bdecd212691..a377261c5c7 100644
--- a/gcc/gimple-range-cache.cc
+++ b/gcc/gimple-range-cache.cc
@@ -731,10 +731,12 @@ ranger_cache::ranger_cache ()
   if (bb)
 	m_gori.exports (bb);
 }
+  m_propfail = BITMAP_ALLOC (NULL);
 }
 
 ranger_cache::~ranger_cache ()
 {
+  BITMAP_FREE (m_propfail);
   if (m_oracle)
 delete m_oracle;
   delete m_temporal;
@@ -990,7 +992,9 @@ ranger_cache::block_range (irange , basic_block bb, tree name, bool calc)
 void
 ranger_cache::add_to_update (basic_block bb)
 {
-  if (!m_update_list.contains (bb))
+  // If propagation has failed for BB, or its already in the list, don't
+  // add it again.
+  if (!bitmap_bit_p (m_propfail, bb->index) &&  !m_update_list.contains (bb))
 m_update_list.quick_push (bb);
 }
 
@@ -1007,6 +1011,7 @@ ranger_cache::propagate_cache (tree name)
   int_range_max current_range;
   int_range_max e_range;
 
+  gcc_checking_assert (bitmap_empty_p (m_propfail));
   // Process each block by seeing if its calculated range on entry is
   // the same as its cached value. If there is a difference, update
   // the cache to reflect the new value, and check to see if any
@@ -1063,6 +1068,9 @@ ranger_cache::propagate_cache (tree name)
   if (new_range != current_range)
 	{
 	  bool ok_p = m_on_entry.set_bb_range (name, bb, new_range);
+	  // If the cache couldn't set the value, mark it as failed.
+	  if (!ok_p)
+	bitmap_set_bit (m_propfail, bb->index);
 	  if (DEBUG_RANGE_CACHE) 
 	{
 	  if (!ok_p)
@@ -1092,6 +1100,7 @@ ranger_cache::propagate_cache (tree name)
   print_generic_expr (dump_file, name, TDF_SLIM);
   fprintf (dump_file, "\n");
 }
+  bitmap_clear (m_propfail);
 }
 
 // Check to see if an update to the value for NAME in BB has any effect
diff --git a/gcc/gimple-range-cache.h b/gcc/gimple-range-cache.h
index 1d2e1b99200..ecf63dc01b3 100644
--- a/gcc/gimple-range-cache.h
+++ b/gcc/gimple-range-cache.h
@@ -121,6 +121,7 @@ private:
 
   void propagate_updated_value (tree name, basic_block bb);
 
+  bitmap m_propfail;
   vec m_workback;
   vec m_update_list;
 };
-- 
2.17.2



[COMMITTED 1/2] Adjust on_entry cache to indicate if the value was set properly.

2021-06-23 Thread Andrew MacLeod via Gcc-patches
The introduction of the sparse bitmap on-entry cache for large BB 
functions also introduced the concept that the value we ask to be 
written to the cache may not be properly represented.  It is limited to 
15 unique ranges for any given ssa-name, then  it reverts to varying for 
any additional values to be safe.


This patch adds a boolean return value to the cache set routines, 
allowing us to check if the value was properly written.


Other than that, no functional change.

Bootstrap on x86_64-pc-linux-gnu with no regressions.  pushed.

Andrew

>From ca4d381662c37733b2a1d49d6c8f5fcfc1348f3d Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Tue, 22 Jun 2021 17:21:32 -0400
Subject: [PATCH 2/4] Adjust on_entry cache to indicate if the value was set
 properly.

	* gimple-range-cache.cc (class ssa_block_ranges): Adjust prototype.
	(sbr_vector::set_bb_range): Return true.
	(class sbr_sparse_bitmap): Adjust.
	(sbr_sparse_bitmap::set_bb_range): Return value.
	(block_range_cache::set_bb_range): Return value.
	(ranger_cache::propagate_cache): Use return value to print msg.
	* gimple-range-cache.h (class block_range_cache): Adjust.
---
 gcc/gimple-range-cache.cc | 44 ++-
 gcc/gimple-range-cache.h  |  2 +-
 2 files changed, 26 insertions(+), 20 deletions(-)

diff --git a/gcc/gimple-range-cache.cc b/gcc/gimple-range-cache.cc
index 4347485cf98..bdecd212691 100644
--- a/gcc/gimple-range-cache.cc
+++ b/gcc/gimple-range-cache.cc
@@ -132,7 +132,7 @@ non_null_ref::process_name (tree name)
 class ssa_block_ranges
 {
 public:
-  virtual void set_bb_range (const basic_block bb, const irange ) = 0;
+  virtual bool set_bb_range (const basic_block bb, const irange ) = 0;
   virtual bool get_bb_range (irange , const basic_block bb) = 0;
   virtual bool bb_range_p (const basic_block bb) = 0;
 
@@ -165,7 +165,7 @@ class sbr_vector : public ssa_block_ranges
 public:
   sbr_vector (tree t, irange_allocator *allocator);
 
-  virtual void set_bb_range (const basic_block bb, const irange ) OVERRIDE;
+  virtual bool set_bb_range (const basic_block bb, const irange ) OVERRIDE;
   virtual bool get_bb_range (irange , const basic_block bb) OVERRIDE;
   virtual bool bb_range_p (const basic_block bb) OVERRIDE;
 protected:
@@ -196,7 +196,7 @@ sbr_vector::sbr_vector (tree t, irange_allocator *allocator)
 
 // Set the range for block BB to be R.
 
-void
+bool
 sbr_vector::set_bb_range (const basic_block bb, const irange )
 {
   irange *m;
@@ -208,6 +208,7 @@ sbr_vector::set_bb_range (const basic_block bb, const irange )
   else
 m = m_irange_allocator->allocate (r);
   m_tab[bb->index] = m;
+  return true;
 }
 
 // Return the range associated with block BB in R.  Return false if
@@ -252,7 +253,7 @@ class sbr_sparse_bitmap : public ssa_block_ranges
 {
 public:
   sbr_sparse_bitmap (tree t, irange_allocator *allocator, bitmap_obstack *bm);
-  virtual void set_bb_range (const basic_block bb, const irange ) OVERRIDE;
+  virtual bool set_bb_range (const basic_block bb, const irange ) OVERRIDE;
   virtual bool get_bb_range (irange , const basic_block bb) OVERRIDE;
   virtual bool bb_range_p (const basic_block bb) OVERRIDE;
 private:
@@ -312,13 +313,13 @@ sbr_sparse_bitmap::bitmap_get_quad (const_bitmap head, int quad)
 
 // Set the range on entry to basic block BB to R.
 
-void
+bool
 sbr_sparse_bitmap::set_bb_range (const basic_block bb, const irange )
 {
   if (r.undefined_p ())
 {
   bitmap_set_quad (bitvec, bb->index, SBR_UNDEF);
-  return;
+  return true;
 }
 
   // Loop thru the values to see if R is already present.
@@ -328,11 +329,11 @@ sbr_sparse_bitmap::set_bb_range (const basic_block bb, const irange )
 	if (!m_range[x])
 	  m_range[x] = m_irange_allocator->allocate (r);
 	bitmap_set_quad (bitvec, bb->index, x + 1);
-	return;
+	return true;
   }
   // All values are taken, default to VARYING.
   bitmap_set_quad (bitvec, bb->index, SBR_VARYING);
-  return;
+  return false;
 }
 
 // Return the range associated with block BB in R.  Return false if
@@ -387,7 +388,7 @@ block_range_cache::~block_range_cache ()
 // Set the range for NAME on entry to block BB to R.
 // If it has not been accessed yet, allocate it first.
 
-void
+bool
 block_range_cache::set_bb_range (tree name, const basic_block bb,
  const irange )
 {
@@ -413,7 +414,7 @@ block_range_cache::set_bb_range (tree name, const basic_block bb,
 		m_irange_allocator);
 	}
 }
-  m_ssa_ranges[v]->set_bb_range (bb, r);
+  return m_ssa_ranges[v]->set_bb_range (bb, r);
 }
 
 
@@ -1061,13 +1062,18 @@ ranger_cache::propagate_cache (tree name)
   // If the range on entry has changed, update it.
   if (new_range != current_range)
 	{
+	  bool ok_p = m_on_entry.set_bb_range (name, bb, new_range);
 	  if (DEBUG_RANGE_CACHE) 
 	{
-	  fprintf (dump_file, "  Updating range to ");
-	  new_range.dump (dump_file);
+	  if (!ok_p)
+		fprintf (dump_file, " Cache failure to store value.");
+	  

[COMMITTED] Dump should be read only. Do not trigger new lookups.

2021-06-23 Thread Andrew MacLeod via Gcc-patches
When we are dumping the rangers state, we tend to do it at the end when 
EVRP is shutting down.  The dumping routine is checking to see if the 
cache has an on-entry value or the edge generates a range before 
dumping, but when it does the actual dump, its calling range_on_edge 
which can go and do additional lookups. That late in the game, it can 
cause a trap in SCEV if a PHI node attempts to be rcalculated when a 
loop has been rewritten.


Instead, query the range_on_edge from the cache, which will only show 
information we know, and not cause additional lookups.


Bootstrap on x86_64-pc-linux-gnu with no regressions.  pushed.

Andrew

>From 9d674b735f22aa9cf85629851783ce38f25087b5 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Tue, 22 Jun 2021 09:20:47 -0400
Subject: [PATCH 1/4] Dump should be read only. Do not trigger new lookups.

	* gimple-range.cc (dump_bb): Use range_on_edge from the cache.
---
 gcc/gimple-range.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/gimple-range.cc b/gcc/gimple-range.cc
index 385cecf330b..49d26509230 100644
--- a/gcc/gimple-range.cc
+++ b/gcc/gimple-range.cc
@@ -1457,7 +1457,7 @@ gimple_ranger::dump_bb (FILE *f, basic_block bb)
 		  m_cache.block_range (range, bb, name, false) ||
 		  m_cache.block_range (range, e->dest, name, false))
 		{
-		  range_on_edge (range, e, name);
+		  m_cache.range_on_edge (range, e, name);
 		  if (!range.varying_p ())
 		{
 		  fprintf (f, "%d->%d ", e->src->index,
-- 
2.17.2



[committed] Use more logicals to eliminate useless test/compare instructions

2021-06-23 Thread Jeff Law via Gcc-patches
Nothing too special here, this allows using generic logicals to 
eliminate redundant test/compare instructions.


We need to pass in the INSN to the support routines so that it can be 
examined to see if CC is relevant after this insn.  If the condition 
codes are relevant, then we can't use the various sub-word sequences 
that are used when the 2nd source is a constant with limited bits 
set/clear.  We need that for both the output template and the length 
calculation.  These cases don't happen often and even when they do we 
still win, particularly for SImode.


There is one case I'm aware of where we could use a special sequence and 
have valid condition codes.  An IOR with a constant operand with at 
least 0x8000 set in the mask.  In this case we an IOR the low word 
and use extrs to propagate the bit into the high word.  But while it 
does set ZN in the expected way, they're both compile time constants in 
this case (0,1 respectively) so it's not particularly useful to try and 
express the CC status.


Committed to the trunk.

Jeff

commit 402c818ac0b19d168e9ffc0b3413344dd6020f6a
Author: Jeff Law 
Date:   Tue Jun 22 15:25:11 2021 -0400

Use more logicals to eliminate useless test/compare instructions

gcc/
* config/h8300/logical.md (3): Use 
so this pattern can be used for test/compare removal.  Pass
current insn to compute_logical_op_length and output_logical_op.
* config/h8300/h8300.c (compute_logical_op_cc): Remove.
(h8300_and_costs): Add argument to compute_logical_op_length.
(output_logical_op): Add new argument.  Use it to determine if the
condition codes are used and adjust the output accordingly.
(compute_logical_op_length): Add new argument and update length
computations when condition codes are used.
* config/h8300/h8300-protos.h (compute_logical_op_length): Update
prototype.
(output_logical_op): Likewise.

diff --git a/gcc/config/h8300/h8300-protos.h b/gcc/config/h8300/h8300-protos.h
index af653292a9d..d7efa978aa0 100644
--- a/gcc/config/h8300/h8300-protos.h
+++ b/gcc/config/h8300/h8300-protos.h
@@ -36,10 +36,11 @@ extern const char *output_simode_bld (int, rtx[]);
 extern void final_prescan_insn (rtx_insn *, rtx *, int);
 extern int h8300_expand_movsi (rtx[]);
 extern machine_mode  h8300_select_cc_mode (RTX_CODE, rtx, rtx);
-extern const char *output_logical_op (machine_mode, rtx_code code, rtx *);
-extern unsigned int compute_logical_op_length (machine_mode, rtx_code, rtx *);
+extern const char *output_logical_op (machine_mode, rtx_code code,
+ rtx *, rtx_insn *);
+extern unsigned int compute_logical_op_length (machine_mode, rtx_code,
+ rtx *, rtx_insn *);
 
-extern int compute_logical_op_cc (machine_mode, rtx *);
 extern int compute_a_shift_cc (rtx, rtx *);
 #ifdef HAVE_ATTR_cc
 extern enum attr_cc compute_plussi_cc (rtx *);
diff --git a/gcc/config/h8300/h8300.c b/gcc/config/h8300/h8300.c
index 2b88325d2f7..511c2b28e40 100644
--- a/gcc/config/h8300/h8300.c
+++ b/gcc/config/h8300/h8300.c
@@ -1100,7 +1100,7 @@ h8300_and_costs (rtx x)
   operands[1] = XEXP (x, 0);
   operands[2] = XEXP (x, 1);
   operands[3] = x;
-  return compute_logical_op_length (GET_MODE (x), AND, operands) / 2;
+  return compute_logical_op_length (GET_MODE (x), AND, operands, NULL) / 2;
 }
 
 /* Compute the cost of a shift insn.  */
@@ -2881,7 +2881,7 @@ compute_plussi_cc (rtx *operands)
 /* Output a logical insn.  */
 
 const char *
-output_logical_op (machine_mode mode, rtx_code code, rtx *operands)
+output_logical_op (machine_mode mode, rtx_code code, rtx *operands, rtx_insn 
*insn)
 {
   /* Pretend that every byte is affected if both operands are registers.  */
   const unsigned HOST_WIDE_INT intval =
@@ -2906,6 +2906,19 @@ output_logical_op (machine_mode mode, rtx_code code, rtx 
*operands)
   const char *opname;
   char insn_buf[100];
 
+  /* INSN is the current insn, we examine its overall form to see if we're
+ supposed to set or clobber the condition codes.
+
+ This is important to know.  If we are setting condition codes, then we
+ must do the operation in MODE and not in some smaller size.
+
+ The key is to look at the second object in the PARALLEL. If it is not
+ a CLOBBER, then we care about the condition codes.  */
+  rtx pattern = PATTERN (insn);
+  gcc_assert (GET_CODE (pattern) == PARALLEL);
+  rtx second_op = XVECEXP (pattern, 0, 1);
+  bool cc_meaningful = (GET_CODE (second_op) != CLOBBER);
+
   switch (code)
 {
 case AND:
@@ -2928,8 +2941,9 @@ output_logical_op (machine_mode mode, rtx_code code, rtx 
*operands)
   output_asm_insn (insn_buf, operands);
   break;
 case E_HImode:
-  /* First, see if we can finish with one insn.  */
-  if (b0 != 0 && b1 != 0)
+  /* First, see if we can (or must) finish with one 

Re: [Patch, fortran V3] PR fortran/100683 - Array initialization refuses valid (list of pending patches)

2021-06-23 Thread dhumieres.dominique--- via Gcc-patches

Hi José,


> Thus: Do you have a list of patches pending review?



https://gcc.gnu.org/pipermail/fortran/2021-April/055924.html


PRs 100029 and 100040. I have the patch in my working tree
for a long time. It works as expected. OK to commit.


https://gcc.gnu.org/pipermail/fortran/2021-April/055933.html


PRs 100097 and 100098. New patch at

https://gcc.gnu.org/pipermail/fortran/2021-June/056169.html

OK to commit the new patch.


https://gcc.gnu.org/pipermail/fortran/2021-June/056168.html


PR 96870.


https://gcc.gnu.org/pipermail/fortran/2021-June/056167.html


PR 96724. OK to commit.


https://gcc.gnu.org/pipermail/fortran/2021-June/056163.html


PRs 93308, 93963, 94327, 94331, and 97046.
Already OKed at
https://gcc.gnu.org/pipermail/fortran/2021-June/056184.html
and
https://gcc.gnu.org/pipermail/fortran/2021-June/056193.html


https://gcc.gnu.org/pipermail/fortran/2021-June/056162.html


PR 94104.


https://gcc.gnu.org/pipermail/fortran/2021-June/056155.html


PR 100948.


https://gcc.gnu.org/pipermail/fortran/2021-June/056154.html


PRs 100906, 100907, 100911, 100914, 100915, and 100916.
Tis patch works for me when applied to GCC12 (not GCC11)
but seems to conflict with the patch for
PRs 93308, 93963, 94327, 94331, and 97046.


https://gcc.gnu.org/pipermail/fortran/2021-June/056152.html


PR 10148.


https://gcc.gnu.org/pipermail/fortran/2021-June/056159.html


PR 92621.


https://gcc.gnu.org/pipermail/fortran/2021-April/055982.html


PR 100245.


https://gcc.gnu.org/pipermail/fortran/2021-April/055949.html


PR 100136.


https://gcc.gnu.org/pipermail/fortran/2021-April/055946.html


PR 100132.


https://gcc.gnu.org/pipermail/fortran/2021-April/055934.html


PR 100103.


https://gcc.gnu.org/pipermail/fortran/2021-June/056169.html


PRs 100097 and 100098. OK for me.


https://gcc.gnu.org/pipermail/fortran/2021-April/055921.html


PRs 100024 and 100025. OK for me.

Thanks for the great work, but please don't forget to mark the PRs
as ASSIGNED and don't hesitate to PING after a week.

Dominique



[PATCH] i386: Add PPERM two-operand 64bit vector permutation [PR89021]

2021-06-23 Thread Uros Bizjak via Gcc-patches
Add emulation of V8QI PPERM permutations for TARGET_XOP target.  Similar
to PSHUFB, the permutation is performed with V16QI PPERM instruction,
where selector is defined in V16QI mode with inactive elements set to 0x80.
Specific to two operand permutations is the remapping of elements from
the second operand (e.g. e[8] -> e[16]), as we have to account for the
inactive elements from the first operand.

2021-06-23  Uroš Bizjak  

gcc/
PR target/89021
* config/i386/i386-expand.c (expand_vec_perm_pshufb):
Handle 64bit modes for TARGET_XOP.  Use indirect gen_* functions.
* config/i386/mmx.md (mmx_ppermv64): New insn pattern.
* config/i386/i386.md (unspec): Move UNSPEC_XOP_PERMUTE from ...
* config/i386/sse.md (unspec): ... here.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Pushed to master.

Uros.
diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index 2986b49065c..9c922bf1bf1 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -17467,10 +17467,23 @@ expand_vec_perm_pshufb (struct expand_vec_perm_d *d)
 
   if (!d->one_operand_p)
 {
-  if (!TARGET_XOP || GET_MODE_SIZE (d->vmode) != 16)
+  if (GET_MODE_SIZE (d->vmode) == 8)
+   {
+ if (!TARGET_XOP)
+   return false;
+ vmode = V8QImode;
+   }
+  else if (GET_MODE_SIZE (d->vmode) == 16)
+   {
+ if (!TARGET_XOP)
+   return false;
+   }
+  else if (GET_MODE_SIZE (d->vmode) == 32)
{
- if (TARGET_AVX2
- && valid_perm_using_mode_p (V2TImode, d))
+ if (!TARGET_AVX2)
+   return false;
+
+ if (valid_perm_using_mode_p (V2TImode, d))
{
  if (d->testing_p)
return true;
@@ -17492,6 +17505,8 @@ expand_vec_perm_pshufb (struct expand_vec_perm_d *d)
}
  return false;
}
+  else
+   return false;
 }
   else
 {
@@ -17651,8 +17666,22 @@ expand_vec_perm_pshufb (struct expand_vec_perm_d *d)
 {
   rtx m128 = GEN_INT (-128);
 
+  /* Remap elements from the second operand, as we have to
+account for inactive top 8 elements from the first operand.  */
+  if (!d->one_operand_p)
+   for (i = 0; i < nelt; ++i)
+ {
+   int ival = INTVAL (rperm[i]);
+   if (ival >= 8)
+ ival += 8;
+   rperm[i] = GEN_INT (ival);
+ }
+
+  /* V8QI is emulated with V16QI instruction, fill inactive
+elements in the top 8 positions with zeros.  */
   for (i = nelt; i < 16; ++i)
rperm[i] = m128;
+
   vpmode = V16QImode;
 }
 
@@ -17660,36 +17689,54 @@ expand_vec_perm_pshufb (struct expand_vec_perm_d *d)
gen_rtvec_v (GET_MODE_NUNITS (vpmode), rperm));
   vperm = force_reg (vpmode, vperm);
 
-  target = d->target;
-  if (d->vmode != vmode)
+  if (vmode == d->vmode)
+target = d->target;
+  else
 target = gen_reg_rtx (vmode);
+
   op0 = gen_lowpart (vmode, d->op0);
+
   if (d->one_operand_p)
 {
+  rtx (*gen) (rtx, rtx, rtx);
+
   if (vmode == V8QImode)
-   emit_insn (gen_mmx_pshufbv8qi3 (target, op0, vperm));
+   gen = gen_mmx_pshufbv8qi3;
   else if (vmode == V16QImode)
-   emit_insn (gen_ssse3_pshufbv16qi3 (target, op0, vperm));
+   gen = gen_ssse3_pshufbv16qi3;
   else if (vmode == V32QImode)
-   emit_insn (gen_avx2_pshufbv32qi3 (target, op0, vperm));
+   gen = gen_avx2_pshufbv32qi3;
   else if (vmode == V64QImode)
-   emit_insn (gen_avx512bw_pshufbv64qi3 (target, op0, vperm));
+   gen = gen_avx512bw_pshufbv64qi3;
   else if (vmode == V8SFmode)
-   emit_insn (gen_avx2_permvarv8sf (target, op0, vperm));
+   gen = gen_avx2_permvarv8sf;
   else if (vmode == V8SImode)
-   emit_insn (gen_avx2_permvarv8si (target, op0, vperm));
+   gen = gen_avx2_permvarv8si;
   else if (vmode == V16SFmode)
-   emit_insn (gen_avx512f_permvarv16sf (target, op0, vperm));
+   gen = gen_avx512f_permvarv16sf;
   else if (vmode == V16SImode)
-   emit_insn (gen_avx512f_permvarv16si (target, op0, vperm));
+   gen = gen_avx512f_permvarv16si;
   else
gcc_unreachable ();
+
+  emit_insn (gen (target, op0, vperm));
 }
   else
 {
+  rtx (*gen) (rtx, rtx, rtx, rtx);
+
   op1 = gen_lowpart (vmode, d->op1);
-  emit_insn (gen_xop_pperm (target, op0, op1, vperm));
+
+  if (vmode == V8QImode)
+   gen = gen_mmx_ppermv64;
+  else if (vmode == V16QImode)
+   gen = gen_xop_pperm;
+  else
+   gcc_unreachable ();
+
+  emit_insn (gen (target, op0, op1, vperm));
 }
+
   if (target != d->target)
 emit_move_insn (d->target, gen_lowpart (d->vmode, target));
 
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 4e242105719..9043be3105d 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -120,6 +120,7 @@ 

Re: [Patch ]Fortran/OpenMP: Extend defaultmap clause for OpenMP 5 [PR92568]

2021-06-23 Thread Martin Liška

Hello.

I noticed the patch causes the following clang warnings:

gcc/fortran/dump-parse-tree.c:1786:11: warning: comparison of different 
enumeration types in switch statement ('enum gfc_omp_defaultmap' and 
'gfc_omp_defaultmap_category') [-Wenum-compare-switch]
gcc/fortran/dump-parse-tree.c:1787:11: warning: comparison of different 
enumeration types in switch statement ('enum gfc_omp_defaultmap' and 
'gfc_omp_defaultmap_category') [-Wenum-compare-switch]
gcc/fortran/dump-parse-tree.c:1788:11: warning: comparison of different 
enumeration types in switch statement ('enum gfc_omp_defaultmap' and 
'gfc_omp_defaultmap_category') [-Wenum-compare-switch]
gcc/fortran/dump-parse-tree.c:1789:11: warning: comparison of different 
enumeration types in switch statement ('enum gfc_omp_defaultmap' and 
'gfc_omp_defaultmap_category') [-Wenum-compare-switch]

Can you please take a look?
Martin


Re: [PATCH] Try LTO partial linking. (Was: Speed of compiling gimple-match.c)

2021-06-23 Thread Martin Liška

On 5/21/21 10:29 AM, Martin Liška wrote:

On 5/20/21 5:55 PM, Jan Hubicka wrote:

Quick solution is to also modify partitioner to use the local symbol
names when doing incremental linking (those mixing in source code and
random seeds) to avoid clashes.


Good hint. I added hash based on object file name (I don't want to handle
proper string escaping) and -frandom-seed.

What do you think about the patch?
Thanks,
Martin


@Honza: Can you please take a look at this patch?

Cheers,
Martin


[PATCH] refactor SLP permute propagation

2021-06-23 Thread Richard Biener
This refactors SLP permute propagation to record the outgoing permute
separately from the incoming/materialized one.  Instead of separate
arrays/bitmaps I've now created a struct to represent the state.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

WDYT?

Richard.

2021-06-23  Richard Biener  

* tree-vect-slp.c (slpg_vertex): New struct.
(vect_slp_build_vertices): Adjust.
(vect_optimize_slp): Likewise.  Maintain an outgoing permute
and a materialized one.
---
 gcc/tree-vect-slp.c | 91 -
 1 file changed, 49 insertions(+), 42 deletions(-)

diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index 6c98acbe722..29db56ed532 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -3467,11 +3467,27 @@ vect_analyze_slp (vec_info *vinfo, unsigned 
max_tree_size)
   return opt_result::success ();
 }
 
+struct slpg_vertex
+{
+  slpg_vertex (slp_tree node_)
+: node (node_), visited (0), perm_out (0), materialize (0) {}
+
+  int get_perm_in () const { return materialize ? materialize : perm_out; }
+
+  slp_tree node;
+  unsigned visited : 1;
+  /* The permutation on the outgoing lanes (towards SLP parents).  */
+  int perm_out;
+  /* The permutation that is applied by this node.  perm_out is
+ relative to this.  */
+  int materialize;
+};
+
 /* Fill the vertices and leafs vector with all nodes in the SLP graph.  */
 
 static void
 vect_slp_build_vertices (hash_set , slp_tree node,
-vec , vec )
+vec , vec )
 {
   unsigned i;
   slp_tree child;
@@ -3480,7 +3496,7 @@ vect_slp_build_vertices (hash_set , 
slp_tree node,
 return;
 
   node->vertex = vertices.length ();
-  vertices.safe_push (node);
+  vertices.safe_push (slpg_vertex (node));
 
   bool leaf = true;
   FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
@@ -3496,7 +3512,7 @@ vect_slp_build_vertices (hash_set , 
slp_tree node,
 /* Fill the vertices and leafs vector with all nodes in the SLP graph.  */
 
 static void
-vect_slp_build_vertices (vec_info *info, vec ,
+vect_slp_build_vertices (vec_info *info, vec ,
 vec )
 {
   hash_set visited;
@@ -3568,39 +3584,28 @@ vect_optimize_slp (vec_info *vinfo)
 
   slp_tree node;
   unsigned i;
-  auto_vec vertices;
+  auto_vec vertices;
   auto_vec leafs;
   vect_slp_build_vertices (vinfo, vertices, leafs);
 
   struct graph *slpg = new_graph (vertices.length ());
-  FOR_EACH_VEC_ELT (vertices, i, node)
-{
-  unsigned j;
-  slp_tree child;
-  FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), j, child)
-   if (child)
- add_edge (slpg, i, child->vertex);
-}
+  for (slpg_vertex  : vertices)
+for (slp_tree child : SLP_TREE_CHILDREN (v.node))
+  if (child)
+   add_edge (slpg, v.node->vertex, child->vertex);
 
   /* Compute (reverse) postorder on the inverted graph.  */
   auto_vec ipo;
   graphds_dfs (slpg, [0], leafs.length (), , false, NULL, NULL);
 
-  auto_sbitmap n_visited (vertices.length ());
-  auto_sbitmap n_materialize (vertices.length ());
-  auto_vec n_perm (vertices.length ());
   auto_vec > perms;
-
-  bitmap_clear (n_visited);
-  bitmap_clear (n_materialize);
-  n_perm.quick_grow_cleared (vertices.length ());
   perms.safe_push (vNULL); /* zero is no permute */
 
   /* Produce initial permutations.  */
   for (i = 0; i < leafs.length (); ++i)
 {
   int idx = leafs[i];
-  slp_tree node = vertices[idx];
+  slp_tree node = vertices[idx].node;
 
   /* Handle externals and constants optimistically throughout the
 iteration.  */
@@ -3611,7 +3616,7 @@ vect_optimize_slp (vec_info *vinfo)
   /* Leafs do not change across iterations.  Note leafs also double
 as entries to the reverse graph.  */
   if (!slpg->vertices[idx].succ)
-   bitmap_set_bit (n_visited, idx);
+   vertices[idx].visited = 1;
   /* Loads are the only thing generating permutes.  */
   if (!SLP_TREE_LOAD_PERMUTATION (node).exists ())
continue;
@@ -3660,7 +3665,7 @@ vect_optimize_slp (vec_info *vinfo)
   for (unsigned j = 0; j < SLP_TREE_LANES (node); ++j)
perm[j] = SLP_TREE_LOAD_PERMUTATION (node)[j] - imin;
   perms.safe_push (perm);
-  n_perm[idx] = perms.length () - 1;
+  vertices[idx].perm_out = perms.length () - 1;
 }
 
   /* Propagate permutes along the graph and compute materialization points.  */
@@ -3674,13 +3679,13 @@ vect_optimize_slp (vec_info *vinfo)
   for (i = vertices.length (); i > 0 ; --i)
{
  int idx = ipo[i-1];
- slp_tree node = vertices[idx];
+ slp_tree node = vertices[idx].node;
  /* For leafs there's nothing to do - we've seeded permutes
 on those above.  */
  if (SLP_TREE_DEF_TYPE (node) != vect_internal_def)
continue;
 
- bitmap_set_bit (n_visited, idx);
+ vertices[idx].visited = 1;
 
  /* We cannot move a 

Re: [PATCH] ARM: reset arm_fp16_format

2021-06-23 Thread Martin Liška

On 6/1/21 5:35 PM, Richard Earnshaw wrote:

My initial reaction is 'that can't be right'.

How would -mfp16-format=alternative ever work in this case?


You are right. I'm going to revert the hunk done
in g:ebd5e86c0f41dc1d692f9b2b68a510b1f6835a3e

Martin
>From 371c1992624c9269e2d5747561a8b27b30e485ee Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Wed, 23 Jun 2021 15:30:17 +0200
Subject: [PATCH] arm: Revert partially
 ebd5e86c0f41dc1d692f9b2b68a510b1f6835a3e

	PR target/98636

gcc/ChangeLog:

	* optc-save-gen.awk: Put back arm_fp16_format to
	checked_options.
---
 gcc/optc-save-gen.awk | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/optc-save-gen.awk b/gcc/optc-save-gen.awk
index e2a9a496bfd..e363ac731e1 100644
--- a/gcc/optc-save-gen.awk
+++ b/gcc/optc-save-gen.awk
@@ -1442,6 +1442,8 @@ checked_options["flag_omit_frame_pointer"]++
 checked_options["TARGET_ALIGN_CALL"]++
 checked_options["TARGET_CASE_VECTOR_PC_RELATIVE"]++
 checked_options["arc_size_opt_level"]++
+# arm exceptions
+checked_options["arm_fp16_format"]++
 
 
 for (i = 0; i < n_opts; i++) {
-- 
2.32.0



Re: [PATCH] tree-inline: Fix TREE_READONLY of parameter replacements

2021-06-23 Thread Richard Biener
On Wed, 23 Jun 2021, Martin Jambor wrote:

> Hi,
> 
> On Wed, Jun 23 2021, Richard Biener wrote:
> > On Wed, 23 Jun 2021, Martin Jambor wrote:
> >
> >> Hi,
> >> 
> >> tree-inline leaves behind VAR_DECLs which are TREE_READONLY (because
> >> they are copies of const parameters) but are written to because they
> >> need to be initialized.  This patch resets the flag if any
> >> initialization is performed, regardless of if the type needs
> >> construction or not.
> >
> > I wonder if even that is premature optimization - it also removes
> > a still useful comment.  So why not at the same place simply do
> >
> >   /* Even if P was TREE_READONLY, the new VAR should not be.
> >  In the original code, we would have constructed a
> >  temporary, and then the function body would have never
> >  changed the value of P.  However, now, we will be
> >  constructing VAR directly.  Therefore, it must not
> >  be TREE_READONLY.  */ 
> >   TREE_READONLY (var) = 0;
> >
> > ?  Since technically when in SSA form you wouldn't need to bother
> > either, so your placement looks both premature and not good enough
> > optimization.
> >
> > Did you check the above suggestion already and it failed for some
> > reason?
> >
> 
> No, I believe the above would work fine.  I just looked at the code and
> the TREE_READONLY resetting is now just above two early exits which only
> do an insert_init_debug_bind, so I thought it made sense to move the
> reset of TREE_READONLY only to the same branch which actually generates
> a write to var.

Yeah, but we _do_ create the var and register it as copy of p earlier

  insert_decl_map (id, p, var);

so it seems consistent to adjust things whether or not there are
actual writes.

> But only because I thought it would be more consistent, and "feel" more
> logical, not as an optimization.  I do not think it makes any difference
> in practice.  So if your feelings are different, I can certainly leave
> it where it is.
> 
> As far as the comment is concerned, I like having comments about almost
> everything, but this one seemed to me a bit too obvious, especially with
> the TYPE_NEEDS_CONSTRUCTING condition removed.  But I can certainly keep
> it too.

Yes to both please.  OK with those changes.

Thanks,
Richard.


[PATCH] rs6000: Fix restored rs6000_long_double_type_size.

2021-06-23 Thread Martin Liška

Hello.

As mentioned in the "Fallout: save/restore target options in 
handle_optimize_attribute"
thread, we need to support target option restore of 
rs6000_long_double_type_size == FLOAT_PRECISION_TFmode.

Patch can bootstrap on ppc64le-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

gcc/ChangeLog:

* config/rs6000/rs6000.c (rs6000_option_override_internal): When
a target option is restored, it can have
rs6000_long_double_type_size set to FLOAT_PRECISION_TFmode.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pragma-optimize.c: New test.
---
 gcc/config/rs6000/rs6000.c |  2 ++
 gcc/testsuite/gcc.target/powerpc/pragma-optimize.c | 14 ++
 2 files changed, 16 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pragma-optimize.c

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 835af7708f9..2c811480db9 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -4185,6 +4185,8 @@ rs6000_option_override_internal (bool global_init_p)
   else
rs6000_long_double_type_size = default_long_double_size;
 }
+  else if (rs6000_long_double_type_size == FLOAT_PRECISION_TFmode)
+; /* The option can be restored a TREE_TARGET_OPTION.  */
   else if (rs6000_long_double_type_size == 128)
 rs6000_long_double_type_size = FLOAT_PRECISION_TFmode;
   else if (global_options_set.x_rs6000_ieeequad)
diff --git a/gcc/testsuite/gcc.target/powerpc/pragma-optimize.c 
b/gcc/testsuite/gcc.target/powerpc/pragma-optimize.c
new file mode 100644
index 000..629bfcee0ec
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pragma-optimize.c
@@ -0,0 +1,14 @@
+/* { dg-do compile { target { powerpc*-*-linux* } } } */
+/* { dg-options "-O2 -mlong-double-128 -mabi=ibmlongdouble" } */
+
+extern unsigned long int x;
+extern float f (float);
+extern __typeof (f) f_power8;
+extern __typeof (f) f_power9;
+extern __typeof (f) f __attribute__ ((ifunc ("f_ifunc")));
+static __attribute__ ((optimize ("-fno-stack-protector"))) __typeof (f) *
+f_ifunc (void)
+{
+  __typeof (f) *res = x ? f_power9 : f_power8;
+  return res;
+}
--
2.32.0



Re: [PATCH] libstdc++: More efficient std::chrono::year::leap.

2021-06-23 Thread Jonathan Wakely via Gcc-patches

On 23/06/21 12:45 +0100, Jonathan Wakely wrote:

On 21/05/21 19:44 +0100, Cassio Neri via Libstdc++ wrote:

I've checked the generated code and the compiler doesn't figure out
the logic. I added a comment to explain.

(Revised patch below and attached.)

Best wishes,
Cassio.

---

Simple change to std::chrono::year::is_leap. If a year is multiple of 100,
then it's divisible by 400 if and only if it's divisible by 16. The latter
allows for better code generation.

Tested on x86_64-pc-linux-gnu.

libstdc++-v3/ChangeLog:
libstdc++-v3/ChangeLog:

  * include/std/chrono:

diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index 4631a727d73..85aa0379432 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -1612,7 +1612,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   constexpr uint32_t __offset   = __max_dividend / 2 / 100 * 100;
   const bool __is_multiple_of_100
 = __multiplier * (_M_y + __offset) < __bound;
-return (!__is_multiple_of_100 || _M_y % 400 == 0) && _M_y % 4 == 0;
+// Usually we test _M_y % 400 == 0 but, when it's already known that
+// _M_y%100 == 0, then _M_y % 400==0 is equivalent to _M_y % 16 == 0.

 ^^
 N.B. this comment should say !=


+return (!__is_multiple_of_100 || _M_y % 16 == 0) && _M_y % 4 == 0;


If y % 16 == 0 then y % 4 == 0 too. So we could write that as:

 return (!__is_multiple_of_100 && _M_y % 4 == 0) || _M_y % 16 == 0;

This seems to perform even better over a wide range of inputs, can you
confirm that result with your own tests?

However, my microbenchmark also shows that the simplistic code using
y%100 often performs even better than the current code calculating
__is_multiple_of_100 to avoid the modulus operation. So maybe my tests
are bad.

My rearranged expression above is equivalent to:

 return _M_y % (__is_multiple_of_100 ? 16 : 4) == 0;

which can be written without branches:

 return _M_y % (4 << (2 * __is_multiple_of_100)) == 0;

However, both Clang and GCC already remove the branch for (x ? 16 : 4)
and the conditional expression produces slightly smaller code with GCC 
(see https://gcc.gnu.org/PR101179 regarding that). But neither of

these seems to improve compared to my first rearrangement above.


This version from Ulrich Drepper produces the smallest code of all
(and also performs well, if not the absolute fastest) in my
benchmarks:

  return (y & (__is_multiple_of_100 ? 15 : 3)) == 0;




[PATCH] tree-optimization/101173 - fix interchange dependence checking

2021-06-23 Thread Richard Biener
This adjusts the loop interchange dependence checking to disallow
an outer loop dependence distance of zero.

Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?

Thanks,
Richard.

2021-06-23  Richard Biener  

PR tree-optimization/101173
* gimple-loop-interchange.cc
(tree_loop_interchange::valid_data_dependences): Disallow outer
loop dependence distance of zero.

* gcc.dg/torture/pr101173.c: New testcase.
---
 gcc/gimple-loop-interchange.cc  |  4 ++--
 gcc/testsuite/gcc.dg/torture/pr101173.c | 18 ++
 2 files changed, 20 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr101173.c

diff --git a/gcc/gimple-loop-interchange.cc b/gcc/gimple-loop-interchange.cc
index 80f749b6071..43045c5455e 100644
--- a/gcc/gimple-loop-interchange.cc
+++ b/gcc/gimple-loop-interchange.cc
@@ -1043,8 +1043,8 @@ tree_loop_interchange::valid_data_dependences (unsigned 
i_idx, unsigned o_idx,
continue;
 
  /* Be conservative, skip case if either direction at i_idx/o_idx
-levels is not '=' or '<'.  */
- if (dist_vect[i_idx] < 0 || dist_vect[o_idx] < 0)
+levels is not '=' (for the inner loop) or '<'.  */
+ if (dist_vect[i_idx] < 0 || dist_vect[o_idx] <= 0)
return false;
}
 }
diff --git a/gcc/testsuite/gcc.dg/torture/pr101173.c 
b/gcc/testsuite/gcc.dg/torture/pr101173.c
new file mode 100644
index 000..0c9090d6686
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr101173.c
@@ -0,0 +1,18 @@
+/* { dg-do run } */
+/* { dg-additional-options "-floop-interchange" } */
+
+int a[6][9];
+int main()
+{
+  a[1][3] = 8;
+  for (int b = 1; b <= 5; b++)
+for (int d = 0; d <= 5; d++)
+#pragma GCC unroll 0
+  for (int c = 0; c <= 5; c++)
+a[b][c] = a[b][c + 2] & 216;
+  for (int e = 0; e < 6; e++)
+for (int f = 0; f < 9; f++)
+  if (a[e][f] != 0)
+__builtin_abort ();
+  return 0;
+}
-- 
2.26.2


Re: [PATCH] inline: do not inline when no_profile_instrument_function is different

2021-06-23 Thread Martin Liška

On 6/23/21 2:38 PM, Jan Hubicka wrote:

Is there reason to prevent the inlining once instrumentation is done?


No ;)


I think you can just block it for early inliner.


Sure. Do you have a handy predicate function that tells if einliner is done?

Thanks,
Martin


Re: GCC documentation: porting to Sphinx

2021-06-23 Thread Martin Liška

Hello.

I've just made a first version of the patchset that sits in GCC source tree:
https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;a=log;h=refs/users/marxin/heads/sphinx-v2

Changes since the last submission:
1) I made a brief proofreading and fixed most of the formatting and other issues
2) target hook documentation was ported to RST format and gcc/genhooks 
generated a tm.rst.in
   that is then used via .. include:: directives
3) sphinx-build detection is implemented in gcc/configure.ac
3) make integration is done (currently covering info and man pages) for gcc and 
gcc/fortran targets

As before, one can see the result of generated output here:
https://splichal.eu/scripts/sphinx/

Known limitations:
1) I found a bug in man page generation that is currently fixed:
https://github.com/sphinx-doc/sphinx/issues/1860#issuecomment-861793094
Note the fix will appear in the upcoming 4.1.0 release. Without the patch, one 
can see wrong
font selection in a generated manual page

2) Right now, I rely on caching capability of sphinx-build. That means when no 
source change is detected,
sphinx-build exits immediately. However, it's not working for all targets 
(info, man) and I've suggested
a patch for it:
https://github.com/sphinx-doc/sphinx/issues/9359

3) I haven't prepared patch for .texi removal (should be quite easily doable).

4) the currently existing Sphinx manuals (Ada and libgccjit) are not integrated 
yet.

@Joseph: Can you share your thoughts about the used Makefile integration? What 
do you suggest for 2)
(note that explicit listing of all .rst file would be crazy)?

Thoughts?
Thanks,
Martin




Re: [PATCH] inline: do not inline when no_profile_instrument_function is different

2021-06-23 Thread Jan Hubicka
> Hello.
> 
> Similarly to e.g. sanitizer attributes, we sould prevent inlining when one 
> function
> is marked as not instrumented. We should do that with -fprofile-generate only.
> 
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> 
> Ready to be installed?
> Thanks,
> Martin
> 
> Adds test-case for:
>   PR gcov-profile/80223
> 
> gcc/ChangeLog:
> 
>   * ipa-inline.c (can_inline_edge_p): Similarly to sanitizer
>   options, do not inline when no_profile_instrument_function
>   attributes are different.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/no_profile_instrument_function-attr-2.c: New test.
> ---
>  gcc/ipa-inline.c  | 10 ++
>  .../no_profile_instrument_function-attr-2.c   | 15 +++
>  2 files changed, 25 insertions(+)
>  create mode 100644 
> gcc/testsuite/gcc.dg/no_profile_instrument_function-attr-2.c
> 
> diff --git a/gcc/ipa-inline.c b/gcc/ipa-inline.c
> index 9d896beb2ac..786ab2e7f7f 100644
> --- a/gcc/ipa-inline.c
> +++ b/gcc/ipa-inline.c
> @@ -396,6 +396,16 @@ can_inline_edge_p (struct cgraph_edge *e, bool report,
>e->inline_failed = CIF_SANITIZE_ATTRIBUTE_MISMATCH;
>inlinable = false;
>  }
> +  else if (profile_arc_flag
> +&& lookup_attribute ("no_profile_instrument_function",
> + DECL_ATTRIBUTES (caller->decl))
> +   != lookup_attribute ("no_profile_instrument_function",
> +DECL_ATTRIBUTES (callee->decl)))
> +{
> +  e->inline_failed = CIF_UNSPECIFIED;
> +  inlinable = false;
> +}

Is there reason to prevent the inlining once instrumentation is done?
I think you can just block it for early inliner.

Honza
> +
>if (!inlinable && report)
>  report_inline_failed_reason (e);
>return inlinable;
> diff --git a/gcc/testsuite/gcc.dg/no_profile_instrument_function-attr-2.c 
> b/gcc/testsuite/gcc.dg/no_profile_instrument_function-attr-2.c
> new file mode 100644
> index 000..586962a1c76
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/no_profile_instrument_function-attr-2.c
> @@ -0,0 +1,15 @@
> +/* { dg-require-effective-target global_constructor } */
> +/* { dg-options "-O2 -fprofile-generate -fprofile-update=single 
> -fdump-tree-optimized" } */
> +
> +__attribute__ ((no_profile_instrument_function))
> +int foo()
> +{
> +  return 0;
> +}
> +
> +int bar()
> +{
> +  return foo();
> +}
> +
> +/* { dg-final { scan-tree-dump-times " = foo \\(\\)" 1 "optimized"} } */
> -- 
> 2.32.0
> 


[PATCH] inline: do not inline when no_profile_instrument_function is different

2021-06-23 Thread Martin Liška

Hello.

Similarly to e.g. sanitizer attributes, we sould prevent inlining when one 
function
is marked as not instrumented. We should do that with -fprofile-generate only.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

Adds test-case for:
PR gcov-profile/80223

gcc/ChangeLog:

* ipa-inline.c (can_inline_edge_p): Similarly to sanitizer
options, do not inline when no_profile_instrument_function
attributes are different.

gcc/testsuite/ChangeLog:

* gcc.dg/no_profile_instrument_function-attr-2.c: New test.
---
 gcc/ipa-inline.c  | 10 ++
 .../no_profile_instrument_function-attr-2.c   | 15 +++
 2 files changed, 25 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/no_profile_instrument_function-attr-2.c

diff --git a/gcc/ipa-inline.c b/gcc/ipa-inline.c
index 9d896beb2ac..786ab2e7f7f 100644
--- a/gcc/ipa-inline.c
+++ b/gcc/ipa-inline.c
@@ -396,6 +396,16 @@ can_inline_edge_p (struct cgraph_edge *e, bool report,
   e->inline_failed = CIF_SANITIZE_ATTRIBUTE_MISMATCH;
   inlinable = false;
 }
+  else if (profile_arc_flag
+  && lookup_attribute ("no_profile_instrument_function",
+   DECL_ATTRIBUTES (caller->decl))
+ != lookup_attribute ("no_profile_instrument_function",
+  DECL_ATTRIBUTES (callee->decl)))
+{
+  e->inline_failed = CIF_UNSPECIFIED;
+  inlinable = false;
+}
+
   if (!inlinable && report)
 report_inline_failed_reason (e);
   return inlinable;
diff --git a/gcc/testsuite/gcc.dg/no_profile_instrument_function-attr-2.c 
b/gcc/testsuite/gcc.dg/no_profile_instrument_function-attr-2.c
new file mode 100644
index 000..586962a1c76
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/no_profile_instrument_function-attr-2.c
@@ -0,0 +1,15 @@
+/* { dg-require-effective-target global_constructor } */
+/* { dg-options "-O2 -fprofile-generate -fprofile-update=single 
-fdump-tree-optimized" } */
+
+__attribute__ ((no_profile_instrument_function))
+int foo()
+{
+  return 0;
+}
+
+int bar()
+{
+  return foo();
+}
+
+/* { dg-final { scan-tree-dump-times " = foo \\(\\)" 1 "optimized"} } */
--
2.32.0



Re: [PATCH] libstdc++: More efficient std::chrono::year::leap.

2021-06-23 Thread Jonathan Wakely via Gcc-patches

On 21/05/21 19:44 +0100, Cassio Neri via Libstdc++ wrote:

I've checked the generated code and the compiler doesn't figure out
the logic. I added a comment to explain.

(Revised patch below and attached.)

Best wishes,
Cassio.

---

Simple change to std::chrono::year::is_leap. If a year is multiple of 100,
then it's divisible by 400 if and only if it's divisible by 16. The latter
allows for better code generation.

Tested on x86_64-pc-linux-gnu.

libstdc++-v3/ChangeLog:
libstdc++-v3/ChangeLog:

   * include/std/chrono:

diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index 4631a727d73..85aa0379432 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -1612,7 +1612,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
constexpr uint32_t __offset   = __max_dividend / 2 / 100 * 100;
const bool __is_multiple_of_100
  = __multiplier * (_M_y + __offset) < __bound;
-return (!__is_multiple_of_100 || _M_y % 400 == 0) && _M_y % 4 == 0;
+// Usually we test _M_y % 400 == 0 but, when it's already known that
+// _M_y%100 == 0, then _M_y % 400==0 is equivalent to _M_y % 16 == 0.

  ^^
  N.B. this comment should say !=


+return (!__is_multiple_of_100 || _M_y % 16 == 0) && _M_y % 4 == 0;


If y % 16 == 0 then y % 4 == 0 too. So we could write that as:

  return (!__is_multiple_of_100 && _M_y % 4 == 0) || _M_y % 16 == 0;

This seems to perform even better over a wide range of inputs, can you
confirm that result with your own tests?

However, my microbenchmark also shows that the simplistic code using
y%100 often performs even better than the current code calculating
__is_multiple_of_100 to avoid the modulus operation. So maybe my tests
are bad.

My rearranged expression above is equivalent to:

  return _M_y % (__is_multiple_of_100 ? 16 : 4) == 0;

which can be written without branches:

  return _M_y % (4 << (2 * __is_multiple_of_100)) == 0;

However, both Clang and GCC already remove the branch for (x ? 16 : 4)
and the conditional expression produces slightly smaller code with GCC 
(see https://gcc.gnu.org/PR101179 regarding that). But neither of

these seems to improve compared to my first rearrangement above.




[PATCH] combine: Check for paradoxical subreg

2021-06-23 Thread Robin Dapp via Gcc-patches

Hi,

while evaluating another patch that introduces more lvalue paradoxical 
subregs I ran into an ICE in combine at


  wide_int o = wi::insert (rtx_mode_t (outer, temp_mode),
   rtx_mode_t (inner, dest_mode),
   offset, width);

because width (=GET_MODE_PRECISION (dest_mode)) > GET_MODE_PRECISION 
(temp_mode).


From the comments and the code it looks like we do not want to handle a 
paradoxical subreg here so I quickly added a check for it which prevents 
the ICE.  The check could also be added to reg_subword_p () I guess.


Is this the right thing to do or am I missing something, i.e. my other 
patch might be doing something it should not?


Bootstrap on s390x was successful and testsuite is looking good so far.

Regards
 Robin

--

gcc/ChangeLog:

* combine.c (try_combine): Check for paradoxical subreg.
>From 2b74c66f8d97c30c45e99336c9871cccd8cf94e1 Mon Sep 17 00:00:00 2001
From: Robin Dapp 
Date: Tue, 22 Jun 2021 17:35:37 +0200
Subject: [PATCH 11/11] combine paradoxical subreg fix.

---
 gcc/combine.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/combine.c b/gcc/combine.c
index 6476812a212..de04560db21 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -2782,7 +2782,8 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0,
   && CONST_SCALAR_INT_P (SET_SRC (temp_expr))
   && GET_CODE (PATTERN (i3)) == SET
   && CONST_SCALAR_INT_P (SET_SRC (PATTERN (i3)))
-  && reg_subword_p (SET_DEST (PATTERN (i3)), SET_DEST (temp_expr)))
+  && reg_subword_p (SET_DEST (PATTERN (i3)), SET_DEST (temp_expr))
+  && !paradoxical_subreg_p (SET_DEST (PATTERN (i3
 {
   rtx dest = SET_DEST (PATTERN (i3));
   rtx temp_dest = SET_DEST (temp_expr);
-- 
2.31.1



Re: [PATCH] tree-inline: Fix TREE_READONLY of parameter replacements

2021-06-23 Thread Martin Jambor
Hi,

On Wed, Jun 23 2021, Richard Biener wrote:
> On Wed, 23 Jun 2021, Martin Jambor wrote:
>
>> Hi,
>> 
>> tree-inline leaves behind VAR_DECLs which are TREE_READONLY (because
>> they are copies of const parameters) but are written to because they
>> need to be initialized.  This patch resets the flag if any
>> initialization is performed, regardless of if the type needs
>> construction or not.
>
> I wonder if even that is premature optimization - it also removes
> a still useful comment.  So why not at the same place simply do
>
>   /* Even if P was TREE_READONLY, the new VAR should not be.
>  In the original code, we would have constructed a
>  temporary, and then the function body would have never
>  changed the value of P.  However, now, we will be
>  constructing VAR directly.  Therefore, it must not
>  be TREE_READONLY.  */ 
>   TREE_READONLY (var) = 0;
>
> ?  Since technically when in SSA form you wouldn't need to bother
> either, so your placement looks both premature and not good enough
> optimization.
>
> Did you check the above suggestion already and it failed for some
> reason?
>

No, I believe the above would work fine.  I just looked at the code and
the TREE_READONLY resetting is now just above two early exits which only
do an insert_init_debug_bind, so I thought it made sense to move the
reset of TREE_READONLY only to the same branch which actually generates
a write to var.

But only because I thought it would be more consistent, and "feel" more
logical, not as an optimization.  I do not think it makes any difference
in practice.  So if your feelings are different, I can certainly leave
it where it is.

As far as the comment is concerned, I like having comments about almost
everything, but this one seemed to me a bit too obvious, especially with
the TYPE_NEEDS_CONSTRUCTING condition removed.  But I can certainly keep
it too.

Martin



Re: [PATCH v3] libstdc++: Improve std::lock algorithm

2021-06-23 Thread Christophe LYON via Gcc-patches



On 23/06/2021 12:11, Jonathan Wakely wrote:

On Wed, 23 Jun 2021 at 10:43, Christophe LYON wrote:


On 23/06/2021 11:17, Jonathan Wakely via Libstdc++ wrote:

On Wed, 23 Jun 2021 at 08:21, Christophe Lyon wrote:

This patch causes GCC build failures for bare-metal targets
(aarch64-elf, arm-eabi). For instance on arm-eabi, I'm seeing:

In file included from
/tmp/1229695_7.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc3/arm-none-eabi/libstdc++-v3/include/future:38,
   from
/tmp/1229695_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libstdc++-v3/include/precompiled/stdc++.h:105:
/tmp/1229695_7.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc3/arm-none-eabi/libstdc++-v3/include/mutex:
In function 'int std::__detail::__try_lock_impl(_Lockable&)':
/tmp/1229695_7.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc3/arm-none-eabi/libstdc++-v3/include/mutex:522:53:
error: expected primary-expression before ',' token
522 | if (unique_lock<_Lockable> __lock{__lockable, try_to_lock})
| ^
In file included from
/tmp/1229695_7.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc3/arm-none-eabi/libstdc++-v3/include/future:38,
   from
/tmp/1229695_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libstdc++-v3/include/precompiled/stdc++.h:105:
/tmp/1229695_7.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc3/arm-none-eabi/libstdc++-v3/include/mutex:
In function 'int std::__detail::__try_lock_impl(_Lockable&)':
/tmp/1229695_7.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc3/arm-none-eabi/libstdc++-v3/include/mutex:522:53:
error: expected primary-expression before ',' token
522 | if (unique_lock<_Lockable> __lock{__lockable, try_to_lock})
| ^
make[4]: *** [Makefile:1862:
arm-none-eabi/bits/stdc++.h.gch/O2ggnu++0x.gch] Error 1

Sigh, __lockable is a macro in newlib:

newlib/libc/include/sys/cdefs.h:#define __lockable
__lock_annotate(lockable)

I'll change the name.

Thanks (sorry for being back ;-)

Fixed with this patch, tested x86_64 and pushed to trunk.


Thanks for the prompt fix, I confirm it's OK again on my side.


Christophe




[PATCH] i386: Prevent unwanted combine from LZCNT to BSR [PR101175]

2021-06-23 Thread Uros Bizjak via Gcc-patches
The current RTX pattern for BSR allows combine pass to convert LZCNT insn
to BSR. Note that the LZCNT has a defined behavior to return the operand
size when operand is zero, where BSR has not.

Add a BSR specific setting of zero-flag to RTX pattern of BSR insn
in order to avoid matching unwanted combinations.

2021-06-23  Uroš Bizjak  

gcc/
PR target/101175
* config/i386/i386.md (bsr_rex64): Add zero-flag setting RTX.
(bsr): Ditto.
(*bsrhi): Remove.
(clz2): Update RTX pattern for additions.

gcc/testsuite/

PR target/101175
* gcc.target/i386/pr101175.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Pushed to master, will be backported to release branches.

Uros.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 700c158a432..4e242105719 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -14533,10 +14533,12 @@ (define_insn "*ctzsi2_zext_falsedep"
(set_attr "mode" "SI")])
 
 (define_insn "bsr_rex64"
-  [(set (match_operand:DI 0 "register_operand" "=r")
+  [(set (reg:CCZ FLAGS_REG)
+   (compare:CCZ (match_operand:DI 1 "nonimmediate_operand" "rm")
+(const_int 0)))
+   (set (match_operand:DI 0 "register_operand" "=r")
(minus:DI (const_int 63)
- (clz:DI (match_operand:DI 1 "nonimmediate_operand" "rm"
-   (clobber (reg:CC FLAGS_REG))]
+ (clz:DI (match_dup 1]
   "TARGET_64BIT"
   "bsr{q}\t{%1, %0|%0, %1}"
   [(set_attr "type" "alu1")
@@ -14545,10 +14547,12 @@ (define_insn "bsr_rex64"
(set_attr "mode" "DI")])
 
 (define_insn "bsr"
-  [(set (match_operand:SI 0 "register_operand" "=r")
+  [(set (reg:CCZ FLAGS_REG)
+   (compare:CCZ (match_operand:SI 1 "nonimmediate_operand" "rm")
+(const_int 0)))
+   (set (match_operand:SI 0 "register_operand" "=r")
(minus:SI (const_int 31)
- (clz:SI (match_operand:SI 1 "nonimmediate_operand" "rm"
-   (clobber (reg:CC FLAGS_REG))]
+ (clz:SI (match_dup 1]
   ""
   "bsr{l}\t{%1, %0|%0, %1}"
   [(set_attr "type" "alu1")
@@ -14556,25 +14560,15 @@ (define_insn "bsr"
(set_attr "znver1_decode" "vector")
(set_attr "mode" "SI")])
 
-(define_insn "*bsrhi"
-  [(set (match_operand:HI 0 "register_operand" "=r")
-   (minus:HI (const_int 15)
- (clz:HI (match_operand:HI 1 "nonimmediate_operand" "rm"
-   (clobber (reg:CC FLAGS_REG))]
-  ""
-  "bsr{w}\t{%1, %0|%0, %1}"
-  [(set_attr "type" "alu1")
-   (set_attr "prefix_0f" "1")
-   (set_attr "znver1_decode" "vector")
-   (set_attr "mode" "HI")])
-
 (define_expand "clz2"
   [(parallel
- [(set (match_operand:SWI48 0 "register_operand")
+ [(set (reg:CCZ FLAGS_REG)
+   (compare:CCZ (match_operand:SWI48 1 "nonimmediate_operand" "rm")
+(const_int 0)))
+  (set (match_operand:SWI48 0 "register_operand")
   (minus:SWI48
 (match_dup 2)
-(clz:SWI48 (match_operand:SWI48 1 "nonimmediate_operand"
-  (clobber (reg:CC FLAGS_REG))])
+(clz:SWI48 (match_dup 1])
(parallel
  [(set (match_dup 0) (xor:SWI48 (match_dup 0) (match_dup 2)))
   (clobber (reg:CC FLAGS_REG))])]
diff --git a/gcc/testsuite/gcc.target/i386/pr101175.c 
b/gcc/testsuite/gcc.target/i386/pr101175.c
new file mode 100644
index 000..ed7a08110a5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr101175.c
@@ -0,0 +1,28 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mlzcnt" } */
+/* { dg-require-effective-target lzcnt } */
+
+#include "lzcnt-check.h"
+
+static int
+foo (unsigned int v)
+{
+  return v ? __builtin_clz (v) : 32;
+}
+
+/* returns -1 if x == 0 */
+int
+__attribute__ ((noinline, noclone))
+bar (unsigned int x)
+{
+  return 31 - foo (x);
+}
+
+static void
+lzcnt_test ()
+{
+  int r = bar (0);
+
+  if (r != -1)
+abort ();
+}


[PATCH] tree-optimization/101105 - fix runtime alias test optimization

2021-06-23 Thread Richard Biener
We were ignoring DR_STEP for VF == 1 which is OK only in case
the scalar order is preserved or both DR steps are the same.

Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?

Thanks,
Richard.

2021-06-23  Richard Biener  

PR tree-optimization/101105
* tree-vect-data-refs.c (vect_prune_runtime_alias_test_list):
Only ignore steps when they are equal or scalar order is preserved.

* gcc.dg/torture/pr101105.c: New testcase.
---
 gcc/testsuite/gcc.dg/torture/pr101105.c | 19 +++
 gcc/tree-vect-data-refs.c   |  9 +++--
 2 files changed, 26 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr101105.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr101105.c 
b/gcc/testsuite/gcc.dg/torture/pr101105.c
new file mode 100644
index 000..9222351683d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr101105.c
@@ -0,0 +1,19 @@
+/* { dg-do run } */
+
+short a;
+int b[5][4] = {2, 2};
+int d;
+short e(int f) { return f == 0 || a && f == 1 ? 0 : a; }
+int main() {
+  int g, h;
+  g = 3;
+  for (; g >= 0; g--) {
+h = 3;
+for (; h >= 0; h--)
+  b[g][h] = b[0][1] && e(1);
+  }
+  d = b[0][1];
+  if (d != 0)
+__builtin_abort ();
+  return 0;
+}
diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index be067c8923b..579149dfd61 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -3479,9 +3479,9 @@ vect_prune_runtime_alias_test_list (loop_vec_info 
loop_vinfo)
   /* Step values are irrelevant for aliasing if the number of vector
  iterations is equal to the number of scalar iterations (which can
  happen for fully-SLP loops).  */
-  bool ignore_step_p = known_eq (LOOP_VINFO_VECT_FACTOR (loop_vinfo), 1U);
+  bool vf_one_p = known_eq (LOOP_VINFO_VECT_FACTOR (loop_vinfo), 1U);
 
-  if (!ignore_step_p)
+  if (!vf_one_p)
 {
   /* Convert the checks for nonzero steps into bound tests.  */
   tree value;
@@ -3534,6 +3534,11 @@ vect_prune_runtime_alias_test_list (loop_vec_info 
loop_vinfo)
 
   bool preserves_scalar_order_p
= vect_preserves_scalar_order_p (dr_info_a, dr_info_b);
+  bool ignore_step_p
+ = (vf_one_p
+&& (preserves_scalar_order_p
+|| operand_equal_p (DR_STEP (dr_info_a->dr),
+DR_STEP (dr_info_b->dr;
 
   /* Skip the pair if inter-iteration dependencies are irrelevant
 and intra-iteration dependencies are guaranteed to be honored.  */
-- 
2.26.2


Re: [PATCH 5/6] make get_domminated_by_region return a auto_vec

2021-06-23 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> On Wed, Jun 23, 2021 at 7:23 AM Trevor Saunders  wrote:
>>
>> On Tue, Jun 22, 2021 at 02:01:24PM -0600, Martin Sebor wrote:
>> > On 6/21/21 1:15 AM, Richard Biener wrote:
> [...]
>> > >
>> > > But maybe I'm misunderstanding C++ too much :/
>> > >
>> > > Well, I guess b) from above means auto_vec<> passing to
>> > > vec<> taking functions will need changes?
>> >
>> > Converting an auto_vec object to a vec slices off its data members.
>> > The auto_vec specialization has no data members so that's not
>> > a bug in and of itself, but auto_vec does have data members
>> > so that would be a bug.  The risk is not just passing it to
>> > functions by value but also returning it.  That risk was made
>> > worse by the addition of the move ctor.
>>
>> I would agree that the conversion from auto_vec<> to vec<> is
>> questionable, and should get some work at some point, perhaps just
>> passingauto_vec references is good enough, or perhaps there is value in
>> some const_vec view to avoid having to rely on optimizations, I'm not
>> sure without looking more at the usage.
>
> We do need to be able to provide APIs that work with both auto_vec<>
> and vec<>, I agree that those currently taking a vec<> by value are
> fragile (and we've had bugs there before), but I'm not ready to say
> that changing them all to [const] vec<>& is OK.  The alternative
> would be passing a const_vec<> by value, passing that along to
> const vec<>& APIs should be valid then (I can see quite some API
> boundary cleanups being necessary here ...).

FWIW, as far as const_vec<> goes, we already have array_slice, which is
mostly a version of std::span.  (The only real reason for not using
std::span itself is that it requires a later version of C++.)

Taking those as arguments has the advantage that we can pass normal
arrays as well.

Thanks,
Richard


Re: [PATCH] tree-inline: Fix TREE_READONLY of parameter replacements

2021-06-23 Thread Richard Biener
On Wed, 23 Jun 2021, Martin Jambor wrote:

> Hi,
> 
> tree-inline leaves behind VAR_DECLs which are TREE_READONLY (because
> they are copies of const parameters) but are written to because they
> need to be initialized.  This patch resets the flag if any
> initialization is performed, regardless of if the type needs
> construction or not.

I wonder if even that is premature optimization - it also removes
a still useful comment.  So why not at the same place simply do

  /* Even if P was TREE_READONLY, the new VAR should not be.
 In the original code, we would have constructed a
 temporary, and then the function body would have never
 changed the value of P.  However, now, we will be
 constructing VAR directly.  Therefore, it must not
 be TREE_READONLY.  */ 
  TREE_READONLY (var) = 0;

?  Since technically when in SSA form you wouldn't need to bother
either, so your placement looks both premature and not good enough
optimization.

Did you check the above suggestion already and it failed for some
reason?

Thanks,
Richard

> There are other sources of variables which are incorrectly marked as
> TREE_READOLY, but with this patch and a verifier catching them I can
> at least compile the Ada run-time library.
> 
> Bootstrapped and tested on x86_64-linux and aarch64-linux, LTO bootstrap
> on x86_64-linux is ongoing.  OK for trunk if it passes?
> 
> Thanks,
> 
> Martin
> 
> 
> gcc/ChangeLog:
> 
> 2021-06-22  Martin Jambor  
> 
>   * tree-inline.c (setup_one_parameter): Set TREE_READONLY of the
>   param replacement to zero if it is initialized, regardless if it
>   needs constructing.
> ---
>  gcc/tree-inline.c | 13 +
>  1 file changed, 1 insertion(+), 12 deletions(-)
> 
> diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
> index 4a0dc3b6b60..ac8abcd400b 100644
> --- a/gcc/tree-inline.c
> +++ b/gcc/tree-inline.c
> @@ -3491,18 +3491,6 @@ setup_one_parameter (copy_body_data *id, tree p, tree 
> value, tree fn,
>   automatically replaced by the VAR_DECL.  */
>insert_decl_map (id, p, var);
>  
> -  /* Even if P was TREE_READONLY, the new VAR should not be.
> - In the original code, we would have constructed a
> - temporary, and then the function body would have never
> - changed the value of P.  However, now, we will be
> - constructing VAR directly.  The constructor body may
> - change its value multiple times as it is being
> - constructed.  Therefore, it must not be TREE_READONLY;
> - the back-end assumes that TREE_READONLY variable is
> - assigned to only once.  */
> -  if (TYPE_NEEDS_CONSTRUCTING (TREE_TYPE (p)))
> -TREE_READONLY (var) = 0;
> -
>tree rhs = value;
>if (value
>&& value != error_mark_node
> @@ -3549,6 +3537,7 @@ setup_one_parameter (copy_body_data *id, tree p, tree 
> value, tree fn,
> return insert_init_debug_bind (id, bb, var, rhs, NULL);
>   }
>  
> +  TREE_READONLY (var) = 0;
>STRIP_USELESS_TYPE_CONVERSION (rhs);
>  
>/* If we are in SSA form properly remap the default definition
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


Re: [gitignore] Add cscope.out to .gitignore

2021-06-23 Thread Martin Liška

On 6/23/21 11:30 AM, Prathamesh Kulkarni via Gcc-patches wrote:

Hi,
This patch adds an entry for cscope.out in .gitignore.
OK to commit ?


Sure.

Note that clangd provides much better support than legacy C cscope.

Cheers,
Martin



Thanks,
Prathamesh





Re: [PATCH v3] libstdc++: Improve std::lock algorithm

2021-06-23 Thread Jonathan Wakely via Gcc-patches
On Wed, 23 Jun 2021 at 10:43, Christophe LYON wrote:
>
>
> On 23/06/2021 11:17, Jonathan Wakely via Libstdc++ wrote:
> > On Wed, 23 Jun 2021 at 08:21, Christophe Lyon wrote:
> >> This patch causes GCC build failures for bare-metal targets
> >> (aarch64-elf, arm-eabi). For instance on arm-eabi, I'm seeing:
> >>
> >> In file included from
> >> /tmp/1229695_7.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc3/arm-none-eabi/libstdc++-v3/include/future:38,
> >>   from
> >> /tmp/1229695_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libstdc++-v3/include/precompiled/stdc++.h:105:
> >> /tmp/1229695_7.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc3/arm-none-eabi/libstdc++-v3/include/mutex:
> >> In function 'int std::__detail::__try_lock_impl(_Lockable&)':
> >> /tmp/1229695_7.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc3/arm-none-eabi/libstdc++-v3/include/mutex:522:53:
> >> error: expected primary-expression before ',' token
> >>522 | if (unique_lock<_Lockable> __lock{__lockable, 
> >> try_to_lock})
> >>| ^
> >> In file included from
> >> /tmp/1229695_7.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc3/arm-none-eabi/libstdc++-v3/include/future:38,
> >>   from
> >> /tmp/1229695_7.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libstdc++-v3/include/precompiled/stdc++.h:105:
> >> /tmp/1229695_7.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc3/arm-none-eabi/libstdc++-v3/include/mutex:
> >> In function 'int std::__detail::__try_lock_impl(_Lockable&)':
> >> /tmp/1229695_7.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc3/arm-none-eabi/libstdc++-v3/include/mutex:522:53:
> >> error: expected primary-expression before ',' token
> >>522 | if (unique_lock<_Lockable> __lock{__lockable, 
> >> try_to_lock})
> >>| ^
> >> make[4]: *** [Makefile:1862:
> >> arm-none-eabi/bits/stdc++.h.gch/O2ggnu++0x.gch] Error 1
> > Sigh, __lockable is a macro in newlib:
> >
> > newlib/libc/include/sys/cdefs.h:#define __lockable
> > __lock_annotate(lockable)
> >
> > I'll change the name.
>
> Thanks (sorry for being back ;-)

Fixed with this patch, tested x86_64 and pushed to trunk.
commit 75404109dce57d2f8dac0f90808010233928418f
Author: Jonathan Wakely 
Date:   Wed Jun 23 11:05:51 2021

libstdc++: Avoid "__lockable" name defined as macro by newlib

libstdc++-v3/ChangeLog:

* include/std/mutex (__detail::__try_lock_impl): Rename
parameter to avoid clashing with newlib's __lockable macro.
(try_lock): Add 'inline' specifier.
* testsuite/17_intro/names.cc: Add check for __lockable.
* testsuite/30_threads/try_lock/5.cc: Add options for pthreads.

diff --git a/libstdc++-v3/include/std/mutex b/libstdc++-v3/include/std/mutex
index c18ca1a1955..eeb51fdb840 100644
--- a/libstdc++-v3/include/std/mutex
+++ b/libstdc++-v3/include/std/mutex
@@ -517,9 +517,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 // Lock the last lockable, after all previous ones are locked.
 template
   inline int
-  __try_lock_impl(_Lockable& __lockable)
+  __try_lock_impl(_Lockable& __l)
   {
-   if (unique_lock<_Lockable> __lock{__lockable, try_to_lock})
+   if (unique_lock<_Lockable> __lock{__l, try_to_lock})
  {
__lock.release();
return -1;
@@ -585,7 +585,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*  Sequentially calls try_lock() on each argument.
*/
   template
-int
+inline int
 try_lock(_L1& __l1, _L2& __l2, _L3&... __l3)
 {
   return __detail::__try_lock_impl(__l1, __l2, __l3...);
diff --git a/libstdc++-v3/testsuite/17_intro/names.cc 
b/libstdc++-v3/testsuite/17_intro/names.cc
index 624e3ed9ccf..534dab70ff5 100644
--- a/libstdc++-v3/testsuite/17_intro/names.cc
+++ b/libstdc++-v3/testsuite/17_intro/names.cc
@@ -16,6 +16,7 @@
 // .
 
 // { dg-do compile }
+// { dg-add-options no_pch }
 
 // Define macros for some common variables names that we must not use for
 // naming variables, parameters etc. in the library.
@@ -216,6 +217,11 @@
 #undef y
 #endif
 
+#if ! __has_include()
+// newlib's  defines __lockable as a macro, so we can't use it.
+# define __lockablecannot be used as an identifier
+#endif
+
 #ifdef __sun__
 // See https://gcc.gnu.org/ml/libstdc++/2019-05/msg00175.html
 #undef ptr
diff --git a/libstdc++-v3/testsuite/30_threads/try_lock/5.cc 
b/libstdc++-v3/testsuite/30_threads/try_lock/5.cc
index a5574ff01fb..b9ce1cc9b90 100644
--- a/libstdc++-v3/testsuite/30_threads/try_lock/5.cc
+++ b/libstdc++-v3/testsuite/30_threads/try_lock/5.cc
@@ -1,4 +1,7 @@
-// { dg-do run { target c++11 } }
+// { dg-do run }
+// { dg-additional-options "-pthread" { target pthread } }
+// { dg-require-effective-target c++11 }
+// { dg-require-gthreads "" 

[PATCH] Emit .file 0 marker earlier in DWARF 5

2021-06-23 Thread Eric Botcazou
Hi,

when the assembler supports it, the compiler automatically passes --gdwarf-5
to it, which has an interesting side effect: any assembly instruction prior
to the first .file directive defines a new line associated with .file 0 in
the .debug_line section and, of course, the numbering of these implicit lines
has nothing to do with that of the source code.  This can be problematic in
Ada where we do not generate .file/.loc directives for some compiled-generated
functions to avoid too jumpy a debugging experience.

The attached patch moves the .file 0 marker from the .debug_line section to the
beginning of the file (next to the .cfi_sections marker).

Tested on x86-64/Linux and x86[-64]/Windows, OK for mainline and 11 branch?


2021-06-23  Eric Botcazou  

* dwarf2out.c (dwarf2out_assembly_start): Emit .file 0 marker here...
(dwarf2out_finish): ...instead of here.

-- 
Eric Botcazoudiff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index 88eb3f9c455..9a91981acb0 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -29363,6 +29363,30 @@ dwarf2out_assembly_start (void)
   && dwarf2out_do_cfi_asm ()
   && !dwarf2out_do_eh_frame ())
 fprintf (asm_out_file, "\t.cfi_sections\t.debug_frame\n");
+
+#if defined(HAVE_AS_GDWARF_5_DEBUG_FLAG) && defined(HAVE_AS_WORKING_DWARF_N_FLAG)
+  if (output_asm_line_debug_info () && dwarf_version >= 5)
+{
+  /* When gas outputs DWARF5 .debug_line[_str] then we have to
+	 tell it the comp_dir and main file name for the zero entry
+	 line table.  */
+  const char *comp_dir, *filename0;
+
+  comp_dir = comp_dir_string ();
+  if (comp_dir == NULL)
+	comp_dir = "";
+
+  filename0 = get_AT_string (comp_unit_die (), DW_AT_name);
+  if (filename0 == NULL)
+	filename0 = "";
+
+  fprintf (asm_out_file, "\t.file 0 ");
+  output_quoted_string (asm_out_file, remap_debug_filename (comp_dir));
+  fputc (' ', asm_out_file);
+  output_quoted_string (asm_out_file, remap_debug_filename (filename0));
+  fputc ('\n', asm_out_file);
+}
+#endif
 }
 
 /* A helper function for dwarf2out_finish called through
@@ -32315,27 +32339,6 @@ dwarf2out_finish (const char *filename)
   ASM_OUTPUT_LABEL (asm_out_file, debug_line_section_label);
   if (! output_asm_line_debug_info ())
 output_line_info (false);
-  else if (asm_outputs_debug_line_str ())
-{
-  /* When gas outputs DWARF5 .debug_line[_str] then we have to
-	 tell it the comp_dir and main file name for the zero entry
-	 line table.  */
-  const char *comp_dir, *filename0;
-
-  comp_dir = comp_dir_string ();
-  if (comp_dir == NULL)
-	comp_dir = "";
-
-  filename0 = get_AT_string (comp_unit_die (), DW_AT_name);
-  if (filename0 == NULL)
-	filename0 = "";
-
-  fprintf (asm_out_file, "\t.file 0 ");
-  output_quoted_string (asm_out_file, remap_debug_filename (comp_dir));
-  fputc (' ', asm_out_file);
-  output_quoted_string (asm_out_file, remap_debug_filename (filename0));
-  fputc ('\n', asm_out_file);
-}
 
   if (dwarf_split_debug_info && info_section_emitted)
 {


Re: [wwwdocs] gcc-12/changes.html: OpenMP + GCN update

2021-06-23 Thread Andrew Stubbs

On 23/06/2021 10:53, Tobias Burnus wrote:

+  additionally the following features which were available in C and C++
+  before:  depobj, mutexinoutset and


I realise that you did not invent this awkward wording, but I'd prefer ...

"the following features that were previously only available in C and 
C++: "


Andrew


Re: [PATCH] Disparage slightly the mask register alternative for bitwise operations. [PR target/101142]

2021-06-23 Thread Uros Bizjak via Gcc-patches
On Wed, Jun 23, 2021 at 11:41 AM Uros Bizjak  wrote:
>
> On Wed, Jun 23, 2021 at 11:32 AM Hongtao Liu  wrote:
>
> > > > > > > Also when allocano cost of GENERAL_REGS is same as MASK_REGS, 
> > > > > > > allocate
> > > > > > > MASK_REGS first since it has already been disparaged.
> > > > > > >
> > > > > > > gcc/ChangeLog:
> > > > > > >
> > > > > > > PR target/101142
> > > > > > > * config/i386/i386.md: (*anddi_1): Disparage slightly the 
> > > > > > > mask
> > > > > > > register alternative.
> > > > > > > (*and_1): Ditto.
> > > > > > > (*andqi_1): Ditto.
> > > > > > > (*andn_1): Ditto.
> > > > > > > (*_1): Ditto.
> > > > > > > (*qi_1): Ditto.
> > > > > > > (*one_cmpl2_1): Ditto.
> > > > > > > (*one_cmplsi2_1_zext): Ditto.
> > > > > > > (*one_cmplqi2_1): Ditto.
> > > > > > > * config/i386/i386.c (x86_order_regs_for_local_alloc): 
> > > > > > > Change
> > > > > > > the order of mask registers to be before general 
> > > > > > > registers.
> > > > > > >
> > > > > > > gcc/testsuite/ChangeLog:
> > > > > > >
> > > > > > > PR target/101142
> > > > > > > * gcc.target/i386/spill_to_mask-1.c: Adjust testcase.
> > > > > > > * gcc.target/i386/spill_to_mask-2.c: Adjust testcase.
> > > > > > > * gcc.target/i386/spill_to_mask-3.c: Adjust testcase.
> > > > > > > * gcc.target/i386/spill_to_mask-4.c: Adjust testcase.
> > > > > >
> > > > > > OK with a comment addition, see inline.
> > > > > >
> > > > > > Thanks,
> > > > > > Uros.
> > > > > >
> > > > > > > ---
> > > > > > >  gcc/config/i386/i386.c|  8 +-
> > > > > > >  gcc/config/i386/i386.md   | 20 ++---
> > > > > > >  .../gcc.target/i386/spill_to_mask-1.c | 89 
> > > > > > > +--
> > > > > > >  .../gcc.target/i386/spill_to_mask-2.c | 11 ++-
> > > > > > >  .../gcc.target/i386/spill_to_mask-3.c | 11 ++-
> > > > > > >  .../gcc.target/i386/spill_to_mask-4.c | 11 ++-
> > > > > > >  6 files changed, 91 insertions(+), 59 deletions(-)
> > > > > > >
> > > > > > > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> > > > > > > index a61255857ff..a651853ca3b 100644
> > > > > > > --- a/gcc/config/i386/i386.c
> > > > > > > +++ b/gcc/config/i386/i386.c
> > > > > > > @@ -20463,6 +20463,10 @@ x86_order_regs_for_local_alloc (void)
> > > > > > > int pos = 0;
> > > > > > > int i;
> > > > > > >
> > > > > > > +   /* Mask register.  */
> > > > > > > +   for (i = FIRST_MASK_REG; i <= LAST_MASK_REG; i++)
> > > > > > > + reg_alloc_order [pos++] = i;
> > > > > >
> > > > > > Please add a comment why mask registers should come first.
> > > > > Thanks for the review, this is the patch i'm check in.
> > > >
> > > > This patch again caused unwanted mask instructions with -m32 in cpuid
> > > > check code, e.g.:
> > > >
> > > > Running target unix/-m32
> > > > FAIL: gcc.target/i386/avx512bw-pr70329-1.c execution test
> > > > FAIL: gcc.target/i386/pr96814.c execution test
> > > >
> > > > Debugging pr96814 failure:
> > > >
> > > >   0x0804921d <+66>:mov%edx,%ecx
> > > >   0x0804921f <+68>:cpuid
> > > > => 0x08049221 <+70>:kmovd  %edx,%k0
> > > >   0x08049225 <+74>:mov%eax,-0x8(%ebp)
> > > >   0x08049228 <+77>:mov%ebx,-0xc(%ebp)
> > > >
> > > > It looks to me that putting mask registers in front of GPR is looking
> > > So it's not functionality but performance issue here, under 32-bit
> > > mode there are only 8 gprs which result in higher register pressure,
> > > and for this we do have mask->integer and integer->mask cost, with
> > > -mtune=bdver1 where cost of kmovd is quite high(16, 20 /*
> > > mask->integer and integer->mask moves */), there's no mask
> > > instructions in cpuid.
> > > I guess we can adjust mask->integer and integer->mask for 32-bit mode
> > > to avoid such a situation?
> > I notice the default option is O0, with -O there's no mask instructions.
> > IMHO, We don't need to change cost unless there's -O2 cases where mask
> > instructions regress performance here.
>
> No, this reasoning is not acceptable. The compiled code will SIGILL on
> targets where unsupported mask registers are involved, so GPR should
> always have priority compared to mask registers. Based on these
> findings, x86_order_regs_for_local_alloc change should be reverted,
> and register move costs compensated elsewhere.

Longterm, I think that introducing vector BImode and VxBI vectors
should solve this issue. This would make a clear cut between mask and
GPR operations. I think that generic vector infrastructure supports
BImode vectors, so using e.g. "a & b" instead of builtins should work
well, too.

This approach would also avoid costly moves between register sets.

Uros.


[wwwdocs] gcc-12/changes.html: OpenMP + GCN update

2021-06-23 Thread Tobias Burnus

Hi all,

this patch updates OpenMP for the non-bug-fix commits which have
been done in the last weeks.

It also updates GCN. The change assumes that the just approved
patch is committed ...

Comments, thoughts, wording suggestions? Did I miss some commit/feature?

Tobias

-
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München 
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank 
Thürauf
gcc-12/changes.html: OpenMP + GCN update

diff --git a/htdocs/gcc-12/changes.html b/htdocs/gcc-12/changes.html
index 07f70b8b..b854c4e6 100644
--- a/htdocs/gcc-12/changes.html
+++ b/htdocs/gcc-12/changes.html
@@ -49,10 +49,14 @@ a work-in-progress.
 New Languages and Language specific improvements
 
 
-  OpenMP 5.0 support for Fortran has been extended by the following features
-  which were available in C and C++ before:  depobj
-  and mutexinoutset can now also be used with the
-  depend clause.
+  OpenMP 5.0 support has been extended: The close map modifier
+  and the affinity clause are now supported and for Fortran
+  additionally the following features which were available in C and C++
+  before:  depobj, mutexinoutset and
+   iterator can now also be used with the depend
+  clause, defaultmap has been updated for OpenMP 5.0, and the
+  loop directive and combined directives
+  involving master directive have been added.
   
   The new warning flag -Wopenacc-parallelism was added for
   OpenACC. It warns about potentially suboptimal choices related to
@@ -97,14 +101,17 @@ a work-in-progress.
 
 
 
+AMD Radeon (GCN)
+
+  Debug experience with ROCGDB has been improved.
+
+
 
 
 
 
 
 
-
-
 
 
 


[PATCH] Fix --gdwarf-5 configure tests for Windows

2021-06-23 Thread Eric Botcazou
The issues are that 1) they use readelf instead of objdump and 2) they use
ELF syntax in the assembly code.

Tested on x86-64/Linux and x86[-64]/Windows, OK for mainline and 11 branch?


2021-06-23  Eric Botcazou  

* configure.ac (--gdwarf-5 option): Use objdump instead of readelf.
(working --gdwarf-4/--gdwarf-5 for all sources): Likewise.
(--gdwarf-4 not refusing generated .debug_line): Adjust for Windows.
* configure: Regenerate.

-- 
Eric Botcazoudiff --git a/gcc/configure.ac b/gcc/configure.ac
index 5f30f80833e..70089394429 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -5457,13 +5457,23 @@ if test x"$insn" != x; then
  gcc_GAS_CHECK_FEATURE([--gdwarf-5 option],
   gcc_cv_as_gdwarf_5_flag,
   [elf,2,36,0], [--gdwarf-5], [$insn],
-  [if test x$gcc_cv_readelf != x \
-  && $gcc_cv_readelf -wi conftest.o 2>&1 \
+  [if test x$gcc_cv_objdump != x \
+  && $gcc_cv_objdump -Wi conftest.o 2>&1 \
 	 | grep DW_TAG_compile_unit > /dev/null 2>&1; then
  gcc_cv_as_gdwarf_5_flag=yes;
fi],[AC_DEFINE(HAVE_AS_GDWARF_5_DEBUG_FLAG, 1,
 [Define if your assembler supports the --gdwarf-5 option.])])
 
+ case $target_os in
+   win32 | pe | cygwin* | mingw32*)
+ section_flags=\"dr\"
+ function_type=".def foo; .scl 2; .type 32; .endef"
+ function_size="";;
+   *)
+ section_flags=\"\",%progbits
+ function_type=".type foo, %function"
+ function_size=".size foo, .-foo";;
+ esac
  dwarf4_debug_info_size=0x46
  dwarf4_high_pc_form=7
  dwarf4_debug_aranges_size=0x2c
@@ -5475,16 +5485,16 @@ if test x"$insn" != x; then
 .Ltext0:
 	.p2align 4
 	.globl	foo
-	.type	foo, %function
+	$function_type
 foo:
 .LFB0:
 .LM1:
 	$insn
 .LM2:
 .LFE0:
-	.size	foo, .-foo
+	$function_size
 .Letext0:
-	.section	.debug_info,\"\",%progbits
+	.section	.debug_info,$section_flags
 .Ldebug_info0:
 	.4byte	$dwarf4_debug_info_size
 	.2byte	0x4
@@ -5508,7 +5518,7 @@ foo:
 	.byte	0x1
 	.byte	0x9c
 	.byte	0
-	.section	.debug_abbrev,\"\",%progbits
+	.section	.debug_abbrev,$section_flags
 .Ldebug_abbrev0:
 	.byte	0x1
 	.byte	0x11
@@ -5551,7 +5561,7 @@ foo:
 	.byte	0
 	.byte	0
 	.byte	0
-	.section	.debug_aranges,\"\",%progbits
+	.section	.debug_aranges,$section_flags
 	.4byte	$dwarf4_debug_aranges_size
 	.2byte	0x2
 	.4byte	.Ldebug_info0
@@ -5563,7 +5573,7 @@ foo:
 	.${dwarf4_addr_size}byte	.Letext0-.Ltext0
 	.${dwarf4_addr_size}byte	0
 	.${dwarf4_addr_size}byte	0
-	.section	.debug_line,\"\",%progbits
+	.section	.debug_line,$section_flags
 .Ldebug_line0:
 	.4byte	.LELT0-.LSLT0
 .LSLT0:
@@ -5617,7 +5627,7 @@ foo:
 	.byte	0x1
 	.byte	0x1
 .LELT0:
-	.section	.debug_str,\"\",%progbits
+	.section	.debug_str,$section_flags
 	.ident	\"GCC\"
 "
dwarf4_success=no
@@ -5673,10 +5683,10 @@ foo:
conftest_s="\
 	.text
 	.globl	foo
-	.type	foo, %function
+	$function_type
 foo:
 	$insn
-	.size	foo, .-foo
+	$function_size
 	.file	1 \"foo.c\"
 "
gcc_GAS_CHECK_FEATURE([working --gdwarf-4/--gdwarf-5 for all sources],
@@ -5684,8 +5694,8 @@ foo:
  [--gdwarf-4],
  [$conftest_s],
  [changequote(,)dnl
-  if test x$gcc_cv_readelf != x \
-	 && $gcc_cv_readelf -w conftest.o 2>&1 \
+  if test x$gcc_cv_objdump != x \
+	 && $gcc_cv_objdump -W conftest.o 2>&1 \
 		| grep conftest.s > /dev/null 2>&1; then
 	gcc_cv_as_working_gdwarf_n_flag=no
   else


Re: [PATCH] Disparage slightly the mask register alternative for bitwise operations. [PR target/101142]

2021-06-23 Thread Uros Bizjak via Gcc-patches
On Wed, Jun 23, 2021 at 11:32 AM Hongtao Liu  wrote:

> > > > > > Also when allocano cost of GENERAL_REGS is same as MASK_REGS, 
> > > > > > allocate
> > > > > > MASK_REGS first since it has already been disparaged.
> > > > > >
> > > > > > gcc/ChangeLog:
> > > > > >
> > > > > > PR target/101142
> > > > > > * config/i386/i386.md: (*anddi_1): Disparage slightly the 
> > > > > > mask
> > > > > > register alternative.
> > > > > > (*and_1): Ditto.
> > > > > > (*andqi_1): Ditto.
> > > > > > (*andn_1): Ditto.
> > > > > > (*_1): Ditto.
> > > > > > (*qi_1): Ditto.
> > > > > > (*one_cmpl2_1): Ditto.
> > > > > > (*one_cmplsi2_1_zext): Ditto.
> > > > > > (*one_cmplqi2_1): Ditto.
> > > > > > * config/i386/i386.c (x86_order_regs_for_local_alloc): 
> > > > > > Change
> > > > > > the order of mask registers to be before general registers.
> > > > > >
> > > > > > gcc/testsuite/ChangeLog:
> > > > > >
> > > > > > PR target/101142
> > > > > > * gcc.target/i386/spill_to_mask-1.c: Adjust testcase.
> > > > > > * gcc.target/i386/spill_to_mask-2.c: Adjust testcase.
> > > > > > * gcc.target/i386/spill_to_mask-3.c: Adjust testcase.
> > > > > > * gcc.target/i386/spill_to_mask-4.c: Adjust testcase.
> > > > >
> > > > > OK with a comment addition, see inline.
> > > > >
> > > > > Thanks,
> > > > > Uros.
> > > > >
> > > > > > ---
> > > > > >  gcc/config/i386/i386.c|  8 +-
> > > > > >  gcc/config/i386/i386.md   | 20 ++---
> > > > > >  .../gcc.target/i386/spill_to_mask-1.c | 89 
> > > > > > +--
> > > > > >  .../gcc.target/i386/spill_to_mask-2.c | 11 ++-
> > > > > >  .../gcc.target/i386/spill_to_mask-3.c | 11 ++-
> > > > > >  .../gcc.target/i386/spill_to_mask-4.c | 11 ++-
> > > > > >  6 files changed, 91 insertions(+), 59 deletions(-)
> > > > > >
> > > > > > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> > > > > > index a61255857ff..a651853ca3b 100644
> > > > > > --- a/gcc/config/i386/i386.c
> > > > > > +++ b/gcc/config/i386/i386.c
> > > > > > @@ -20463,6 +20463,10 @@ x86_order_regs_for_local_alloc (void)
> > > > > > int pos = 0;
> > > > > > int i;
> > > > > >
> > > > > > +   /* Mask register.  */
> > > > > > +   for (i = FIRST_MASK_REG; i <= LAST_MASK_REG; i++)
> > > > > > + reg_alloc_order [pos++] = i;
> > > > >
> > > > > Please add a comment why mask registers should come first.
> > > > Thanks for the review, this is the patch i'm check in.
> > >
> > > This patch again caused unwanted mask instructions with -m32 in cpuid
> > > check code, e.g.:
> > >
> > > Running target unix/-m32
> > > FAIL: gcc.target/i386/avx512bw-pr70329-1.c execution test
> > > FAIL: gcc.target/i386/pr96814.c execution test
> > >
> > > Debugging pr96814 failure:
> > >
> > >   0x0804921d <+66>:mov%edx,%ecx
> > >   0x0804921f <+68>:cpuid
> > > => 0x08049221 <+70>:kmovd  %edx,%k0
> > >   0x08049225 <+74>:mov%eax,-0x8(%ebp)
> > >   0x08049228 <+77>:mov%ebx,-0xc(%ebp)
> > >
> > > It looks to me that putting mask registers in front of GPR is looking
> > So it's not functionality but performance issue here, under 32-bit
> > mode there are only 8 gprs which result in higher register pressure,
> > and for this we do have mask->integer and integer->mask cost, with
> > -mtune=bdver1 where cost of kmovd is quite high(16, 20 /*
> > mask->integer and integer->mask moves */), there's no mask
> > instructions in cpuid.
> > I guess we can adjust mask->integer and integer->mask for 32-bit mode
> > to avoid such a situation?
> I notice the default option is O0, with -O there's no mask instructions.
> IMHO, We don't need to change cost unless there's -O2 cases where mask
> instructions regress performance here.

No, this reasoning is not acceptable. The compiled code will SIGILL on
targets where unsupported mask registers are involved, so GPR should
always have priority compared to mask registers. Based on these
findings, x86_order_regs_for_local_alloc change should be reverted,
and register move costs compensated elsewhere.

Uros.


Re: [PATCH 3/3] [amdgcn] Add hook for DWARF address spaces.

2021-06-23 Thread Andrew Stubbs

On 22/06/2021 18:14, Hafiz Abid Qadeer wrote:

Map GCN address spaces to the proposed DWARF address spaces defined by AMD at
https://llvm.org/docs/AMDGPUUsage.html#amdgpu-dwarf-address-class-mapping-table

gcc/

* config/gcn/gcn.c: Include dwarf2.h.
(gcn_addr_space_debug): New function.
(TARGET_ADDR_SPACE_DEBUG): New hook.
---
  gcc/config/gcn/gcn.c | 29 +
  1 file changed, 29 insertions(+)

diff --git a/gcc/config/gcn/gcn.c b/gcc/config/gcn/gcn.c
index 0eac3aa3844..25996dc83de 100644
--- a/gcc/config/gcn/gcn.c
+++ b/gcc/config/gcn/gcn.c
@@ -50,6 +50,7 @@
  #include "varasm.h"
  #include "intl.h"
  #include "rtl-iter.h"
+#include "dwarf2.h"
  
  /* This file should be included last.  */

  #include "target-def.h"
@@ -1497,6 +1498,32 @@ gcn_addr_space_convert (rtx op, tree from_type, tree 
to_type)
  gcc_unreachable ();
  }
  
+/* Implement TARGET_ADDR_SPACE_DEBUG.

+
+   Return the dwarf address space class for each hardware address space.  */
+
+static int
+gcn_addr_space_debug (addr_space_t as)
+{
+  switch (as)
+{
+  case ADDR_SPACE_DEFAULT:
+  case ADDR_SPACE_FLAT:
+  case ADDR_SPACE_SCALAR_FLAT:
+  case ADDR_SPACE_FLAT_SCRATCH:
+   return DW_ADDR_none;
+  case ADDR_SPACE_GLOBAL:
+   return 1;  // DW_ADDR_LLVM_global
+  case ADDR_SPACE_LDS:
+   return 3;  // DW_ADDR_LLVM_group
+  case ADDR_SPACE_SCRATCH:
+   return 4;  // DW_ADDR_LLVM_private
+  case ADDR_SPACE_GDS:
+   return 0x8000; // DW_ADDR_AMDGPU_region
+}
+  gcc_unreachable ();
+}
+
  
  /* Implement REGNO_MODE_CODE_OK_FOR_BASE_P via gcn.h
 
@@ -6354,6 +6381,8 @@ gcn_dwarf_register_span (rtx rtl)
  
  #undef  TARGET_ADDR_SPACE_ADDRESS_MODE

  #define TARGET_ADDR_SPACE_ADDRESS_MODE gcn_addr_space_address_mode
+#undef  TARGET_ADDR_SPACE_DEBUG
+#define TARGET_ADDR_SPACE_DEBUG gcn_addr_space_debug
  #undef  TARGET_ADDR_SPACE_LEGITIMATE_ADDRESS_P
  #define TARGET_ADDR_SPACE_LEGITIMATE_ADDRESS_P \
gcn_addr_space_legitimate_address_p



OK.

Andrew


Re: [PATCH 2/3] [amdgcn] Use frame pointer for CFA expressions.

2021-06-23 Thread Andrew Stubbs

On 22/06/2021 18:14, Hafiz Abid Qadeer wrote:

As size of address is bigger than registers in amdgcn, we are forced to use
DW_CFA_def_cfa_expression to make an expression that concatenates multiple
registers for the value of the CFA.  This then prohibits us from using many
of the dwarf ops which expect CFA rule to be a single regsiter plus an offset.
Using frame pointer in the CFA rule is only real possibility as it is saved
in every frame and it is easy to unwind its value.

So unless user gives fomit-frame-pointer, we use frame pointer for the
cfi information.  This options also has a different default now.

gcc/

* common/config/gcn/gcn-common.c
(gcn_option_optimization_table): Change OPT_fomit_frame_pointer to -O3.
(gcn_expand_prologue): Prefer the frame pointer when emitting CFI.
(gcn_frame_pointer_rqd): New function.
(TARGET_FRAME_POINTER_REQUIRED): New hook.
---
  gcc/common/config/gcn/gcn-common.c |  2 +-
  gcc/config/gcn/gcn.c   | 60 +++---
  2 files changed, 47 insertions(+), 15 deletions(-)

diff --git a/gcc/common/config/gcn/gcn-common.c 
b/gcc/common/config/gcn/gcn-common.c
index 305c310f940..695eb467e34 100644
--- a/gcc/common/config/gcn/gcn-common.c
+++ b/gcc/common/config/gcn/gcn-common.c
@@ -27,7 +27,7 @@
  /* Set default optimization options.  */
  static const struct default_options gcn_option_optimization_table[] =
{
-{ OPT_LEVELS_1_PLUS, OPT_fomit_frame_pointer, NULL, 1 },
+{ OPT_LEVELS_3_PLUS, OPT_fomit_frame_pointer, NULL, 1 },
  { OPT_LEVELS_NONE, 0, NULL, 0 }
};
  
diff --git a/gcc/config/gcn/gcn.c b/gcc/config/gcn/gcn.c

index 3ab16548aad..0eac3aa3844 100644
--- a/gcc/config/gcn/gcn.c
+++ b/gcc/config/gcn/gcn.c
@@ -2900,10 +2900,14 @@ gcn_expand_prologue ()
  rtx adjustment = gen_int_mode (sp_adjust, SImode);
  rtx insn = emit_insn (gen_addsi3_scalar_carry (sp_lo, sp_lo,
 adjustment, scc));
- RTX_FRAME_RELATED_P (insn) = 1;
- add_reg_note (insn, REG_FRAME_RELATED_EXPR,
-   gen_rtx_SET (sp,
-gen_rtx_PLUS (DImode, sp, adjustment)));
+ if (!offsets->need_frame_pointer)
+   {
+ RTX_FRAME_RELATED_P (insn) = 1;
+ add_reg_note (insn, REG_FRAME_RELATED_EXPR,
+   gen_rtx_SET (sp,
+gen_rtx_PLUS (DImode, sp,
+  adjustment)));
+   }
  emit_insn (gen_addcsi3_scalar_zero (sp_hi, sp_hi, scc));
}
  
@@ -2917,25 +2921,24 @@ gcn_expand_prologue ()

  rtx adjustment = gen_int_mode (fp_adjust, SImode);
  rtx insn = emit_insn (gen_addsi3_scalar_carry(fp_lo, sp_lo,
adjustment, scc));
- RTX_FRAME_RELATED_P (insn) = 1;
- add_reg_note (insn, REG_FRAME_RELATED_EXPR,
-   gen_rtx_SET (fp,
-gen_rtx_PLUS (DImode, sp, adjustment)));
  emit_insn (gen_addcsi3_scalar (fp_hi, sp_hi,
 (fp_adjust < 0 ? GEN_INT (-1)
  : const0_rtx),
 scc, scc));
+
+ /* Set the CFA to the entry stack address, as an offset from the
+frame pointer.  This is preferred because the frame pointer is
+saved in each frame, whereas the stack pointer is not.  */
+ RTX_FRAME_RELATED_P (insn) = 1;
+ add_reg_note (insn, REG_CFA_DEF_CFA,
+   gen_rtx_PLUS (DImode, fp,
+ GEN_INT (-(offsets->pretend_size
++ offsets->callee_saves;
}
  
rtx_insn *seq = get_insns ();

end_sequence ();
  
-  /* FIXME: Prologue insns should have this flag set for debug output, etc.

-but it causes issues for now.
-  for (insn = seq; insn; insn = NEXT_INSN (insn))
-if (INSN_P (insn))
- RTX_FRAME_RELATED_P (insn) = 1;*/
-
emit_insn (seq);
  }
else
@@ -3011,6 +3014,16 @@ gcn_expand_prologue ()
gen_rtx_SET (sp, gen_rtx_PLUS (DImode, sp,
   dbg_adjustment)));
  
+  if (offsets->need_frame_pointer)

+   {
+ /* Set the CFA to the entry stack address, as an offset from the
+frame pointer.  This is necessary when alloca is used, and
+harmless otherwise.  */
+ rtx neg_adjust = gen_int_mode (-offsets->callee_saves, DImode);
+ add_reg_note (insn, REG_CFA_DEF_CFA,
+   gen_rtx_PLUS (DImode, fp, neg_adjust));
+   }
+
/* Make sure the flat scratch reg doesn't get optimised away.  */
emit_insn (gen_prologue_use (gen_rtx_REG 

Re: [PATCH 1/3] [amdgcn] Update CFI configuration

2021-06-23 Thread Andrew Stubbs

On 22/06/2021 18:14, Hafiz Abid Qadeer wrote:

Currently we don't get any call frame information for the amdgcn target.
This patch makes necessary adjustments to generate CFI that can work with
ROCGDB (ROCm 3.8+).

gcc/

* config/gcn/gcn.c (move_callee_saved_registers): Emit CFI notes for
prologue register saves.
(gcn_debug_unwind_info): Use UI_DWARF2.
(gcn_dwarf_register_number): Map DWARF_LINK_REGISTER to DWARF PC.
(gcn_dwarf_register_span): DWARF_LINK_REGISTER doesn't span.
* config/gcn/gcn.h: (DWARF_FRAME_RETURN_COLUMN): New define.
(DWARF_LINK_REGISTER): New define.
(FIRST_PSEUDO_REGISTER): Increment.
(FIXED_REGISTERS): Add entry for DWARF_LINK_REGISTER.
(CALL_USED_REGISTERS): Likewise.
(REGISTER_NAMES): Likewise.
---
  gcc/config/gcn/gcn.c | 82 
  gcc/config/gcn/gcn.h | 10 +++---
  2 files changed, 81 insertions(+), 11 deletions(-)

diff --git a/gcc/config/gcn/gcn.c b/gcc/config/gcn/gcn.c
index 283a91fe50a..3ab16548aad 100644
--- a/gcc/config/gcn/gcn.c
+++ b/gcc/config/gcn/gcn.c
@@ -2649,6 +2649,7 @@ move_callee_saved_registers (rtx sp, machine_function 
*offsets,
rtx as = gen_rtx_CONST_INT (VOIDmode, STACK_ADDR_SPACE);
HOST_WIDE_INT exec_set = 0;
int offreg_set = 0;
+  auto_vec saved_sgprs;
  
start_sequence ();
  
@@ -2665,7 +2666,10 @@ move_callee_saved_registers (rtx sp, machine_function *offsets,

int lane = saved_scalars % 64;
  
  	if (prologue)

- emit_insn (gen_vec_setv64si (vreg, reg, GEN_INT (lane)));
+ {
+   emit_insn (gen_vec_setv64si (vreg, reg, GEN_INT (lane)));
+   saved_sgprs.safe_push (regno);
+ }
else
  emit_insn (gen_vec_extractv64sisi (reg, vreg, GEN_INT (lane)));
  
@@ -2698,7 +2702,7 @@ move_callee_saved_registers (rtx sp, machine_function *offsets,

  gcn_gen_undef (V64SImode), exec));
  
/* Move vectors.  */

-  for (regno = FIRST_VGPR_REG, offset = offsets->pretend_size;
+  for (regno = FIRST_VGPR_REG, offset = 0;
 regno < FIRST_PSEUDO_REGISTER; regno++)
  if ((df_regs_ever_live_p (regno) && !call_used_or_fixed_reg_p (regno))
|| (regno == VGPR_REGNO (6) && saved_scalars > 0)
@@ -2719,8 +2723,67 @@ move_callee_saved_registers (rtx sp, machine_function 
*offsets,
  }
  
  	if (prologue)

- emit_insn (gen_scatterv64si_insn_1offset_exec (vsp, const0_rtx, reg,
-as, const0_rtx, exec));
+ {
+   rtx insn = emit_insn (gen_scatterv64si_insn_1offset_exec
+ (vsp, const0_rtx, reg, as, const0_rtx,
+  exec));
+
+   /* Add CFI metadata.  */
+   rtx note;
+   if (regno == VGPR_REGNO (6) || regno == VGPR_REGNO (7))
+ {
+   int start = (regno == VGPR_REGNO (7) ? 64 : 0);
+   int count = MIN (saved_scalars - start, 64);
+   int add_lr = (regno == VGPR_REGNO (6)
+ && df_regs_ever_live_p (LINK_REGNUM));
+   int lrdest = -1;
+   rtvec seq = rtvec_alloc (count + add_lr);
+
+   /* Add an REG_FRAME_RELATED_EXPR entry for each scalar
+  register that was saved in this batch.  */
+   for (int idx = 0; idx < count; idx++)
+ {
+   int stackaddr = offset + idx * 4;
+   rtx dest = gen_rtx_MEM (SImode,
+   gen_rtx_PLUS
+   (DImode, sp,
+GEN_INT (stackaddr)));
+   rtx src = gen_rtx_REG (SImode, saved_sgprs[start + idx]);
+   rtx set = gen_rtx_SET (dest, src);
+   RTX_FRAME_RELATED_P (set) = 1;
+   RTVEC_ELT (seq, idx) = set;
+
+   if (saved_sgprs[start + idx] == LINK_REGNUM)
+ lrdest = stackaddr;
+ }
+
+   /* Add an additional expression for DWARF_LINK_REGISTER if
+  LINK_REGNUM was saved.  */
+   if (lrdest != -1)
+ {
+   rtx dest = gen_rtx_MEM (DImode,
+   gen_rtx_PLUS
+   (DImode, sp,
+GEN_INT (lrdest)));
+   rtx src = gen_rtx_REG (DImode, DWARF_LINK_REGISTER);
+   rtx set = gen_rtx_SET (dest, src);
+   RTX_FRAME_RELATED_P (set) = 1;
+   RTVEC_ELT (seq, count) = set;
+ }
+
+   note = gen_rtx_SEQUENCE (VOIDmode, seq);
+ }
+   else
+ {
+   rtx dest = gen_rtx_MEM (V64SImode,
+   

Re: [PATCH] Disparage slightly the mask register alternative for bitwise operations. [PR target/101142]

2021-06-23 Thread Hongtao Liu via Gcc-patches
On Wed, Jun 23, 2021 at 4:50 PM Hongtao Liu  wrote:
>
> On Wed, Jun 23, 2021 at 3:59 PM Uros Bizjak  wrote:
> >
> > On Mon, Jun 21, 2021 at 10:08 AM Hongtao Liu  wrote:
> > >
> > > On Mon, Jun 21, 2021 at 3:28 PM Uros Bizjak via Gcc-patches
> > >  wrote:
> > > >
> > > > On Mon, Jun 21, 2021 at 6:56 AM liuhongt  wrote:
> > > > >
> > > > > The avx512 supports bitwise operations with mask registers, but the
> > > > > throughput of those instructions is much lower than that of the
> > > > > corresponding gpr version, so we would additionally disparages
> > > > > slightly the mask register alternative for bitwise operations in the
> > > > > LRA.
> > > >
> > > > This was the reason for UNSPEC tagged instructions with mask
> > > > registers, used mainly for builtins.
> > > >
> > > > Also, care should be taken if we want mask registers to be used under
> > > > GPR pressure, or it is better to spill GPR registers. In the past,
> > > > DImode values caused a lot of problems with MMX registers on x86-64,
> > > > but we were able to hand-tune the allocator in the way you propose.
> > > >
> > > > Let's try the proposed approach to see what happens.
> > > >
> > > > > Also when allocano cost of GENERAL_REGS is same as MASK_REGS, allocate
> > > > > MASK_REGS first since it has already been disparaged.
> > > > >
> > > > > gcc/ChangeLog:
> > > > >
> > > > > PR target/101142
> > > > > * config/i386/i386.md: (*anddi_1): Disparage slightly the mask
> > > > > register alternative.
> > > > > (*and_1): Ditto.
> > > > > (*andqi_1): Ditto.
> > > > > (*andn_1): Ditto.
> > > > > (*_1): Ditto.
> > > > > (*qi_1): Ditto.
> > > > > (*one_cmpl2_1): Ditto.
> > > > > (*one_cmplsi2_1_zext): Ditto.
> > > > > (*one_cmplqi2_1): Ditto.
> > > > > * config/i386/i386.c (x86_order_regs_for_local_alloc): Change
> > > > > the order of mask registers to be before general registers.
> > > > >
> > > > > gcc/testsuite/ChangeLog:
> > > > >
> > > > > PR target/101142
> > > > > * gcc.target/i386/spill_to_mask-1.c: Adjust testcase.
> > > > > * gcc.target/i386/spill_to_mask-2.c: Adjust testcase.
> > > > > * gcc.target/i386/spill_to_mask-3.c: Adjust testcase.
> > > > > * gcc.target/i386/spill_to_mask-4.c: Adjust testcase.
> > > >
> > > > OK with a comment addition, see inline.
> > > >
> > > > Thanks,
> > > > Uros.
> > > >
> > > > > ---
> > > > >  gcc/config/i386/i386.c|  8 +-
> > > > >  gcc/config/i386/i386.md   | 20 ++---
> > > > >  .../gcc.target/i386/spill_to_mask-1.c | 89 
> > > > > +--
> > > > >  .../gcc.target/i386/spill_to_mask-2.c | 11 ++-
> > > > >  .../gcc.target/i386/spill_to_mask-3.c | 11 ++-
> > > > >  .../gcc.target/i386/spill_to_mask-4.c | 11 ++-
> > > > >  6 files changed, 91 insertions(+), 59 deletions(-)
> > > > >
> > > > > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> > > > > index a61255857ff..a651853ca3b 100644
> > > > > --- a/gcc/config/i386/i386.c
> > > > > +++ b/gcc/config/i386/i386.c
> > > > > @@ -20463,6 +20463,10 @@ x86_order_regs_for_local_alloc (void)
> > > > > int pos = 0;
> > > > > int i;
> > > > >
> > > > > +   /* Mask register.  */
> > > > > +   for (i = FIRST_MASK_REG; i <= LAST_MASK_REG; i++)
> > > > > + reg_alloc_order [pos++] = i;
> > > >
> > > > Please add a comment why mask registers should come first.
> > > Thanks for the review, this is the patch i'm check in.
> >
> > This patch again caused unwanted mask instructions with -m32 in cpuid
> > check code, e.g.:
> >
> > Running target unix/-m32
> > FAIL: gcc.target/i386/avx512bw-pr70329-1.c execution test
> > FAIL: gcc.target/i386/pr96814.c execution test
> >
> > Debugging pr96814 failure:
> >
> >   0x0804921d <+66>:mov%edx,%ecx
> >   0x0804921f <+68>:cpuid
> > => 0x08049221 <+70>:kmovd  %edx,%k0
> >   0x08049225 <+74>:mov%eax,-0x8(%ebp)
> >   0x08049228 <+77>:mov%ebx,-0xc(%ebp)
> >
> > It looks to me that putting mask registers in front of GPR is looking
> So it's not functionality but performance issue here, under 32-bit
> mode there are only 8 gprs which result in higher register pressure,
> and for this we do have mask->integer and integer->mask cost, with
> -mtune=bdver1 where cost of kmovd is quite high(16, 20 /*
> mask->integer and integer->mask moves */), there's no mask
> instructions in cpuid.
> I guess we can adjust mask->integer and integer->mask for 32-bit mode
> to avoid such a situation?
I notice the default option is O0, with -O there's no mask instructions.
IMHO, We don't need to change cost unless there's -O2 cases where mask
instructions regress performance here.
> > for trouble, since apparently disparaging mask registers does not
> > offset register preference of the new order.
> Hmm, w/o order preference,it will fail bitwise_mask_op-3.c.(costs of
> gpr and mask 

[gitignore] Add cscope.out to .gitignore

2021-06-23 Thread Prathamesh Kulkarni via Gcc-patches
Hi,
This patch adds an entry for cscope.out in .gitignore.
OK to commit ?

Thanks,
Prathamesh


ignore-cscope.diff
Description: Binary data


  1   2   >