Re: [PATCH] C/C++: add fix-it hints for missing '&' and '*' (PR c++/87850)

2018-11-09 Thread Eric Gallager
On 11/9/18, David Malcolm  wrote:
> This patch adds a fix-it hint to various pointer-vs-non-pointer
> diagnostics, suggesting the addition of a leading '&' or '*'.
>
> For example, note the ampersand fix-it hint in the following:
>
> demo.c:5:22: error: invalid conversion from 'pthread_key_t' {aka 'unsigned
> int'}
>to 'pthread_key_t*' {aka 'unsigned int*'} [-fpermissive]
> 5 |   pthread_key_create(key, NULL);
>   |  ^~~
>   |  |
>   |  pthread_key_t {aka unsigned int}
>   |  &

Having both the type and the fixit underneath the caret looks kind of confusing

> In file included from demo.c:1:
> /usr/include/pthread.h:1122:47: note:   initializing argument 1 of 'int
>pthread_key_create(pthread_key_t*, void (*)(void*))'
>  1122 | extern int pthread_key_create (pthread_key_t *__key,
>   |~~~^
>
> Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
>
> OK for trunk?
>
> gcc/c-family/ChangeLog:
>   PR c++/87850
>   * c-common.c (maybe_add_indirection_token): New function.
>   * c-common.h (maybe_add_indirection_token): New decl.
>
> gcc/c/ChangeLog:
>   PR c++/87850
>   * c-typeck.c (convert_for_assignment): Call
>   maybe_add_indirection_token for pointer vs non-pointer
>   diagnostics.
>
> gcc/cp/ChangeLog:
>   PR c++/87850
>   * call.c (convert_like_real): Call maybe_add_indirection_token
>   for "invalid conversion" diagnostic.
>
> gcc/testsuite/ChangeLog:
>   PR c++/87850
>   * c-c++-common/indirection-fixits.c: New test.
> ---
>  gcc/c-family/c-common.c |  25 +++
>  gcc/c-family/c-common.h |   2 +
>  gcc/c/c-typeck.c|   2 +
>  gcc/cp/call.c   |   1 +
>  gcc/testsuite/c-c++-common/indirection-fixits.c | 250
> 
>  5 files changed, 280 insertions(+)
>  create mode 100644 gcc/testsuite/c-c++-common/indirection-fixits.c
>
> diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
> index 534d928..3756e6d 100644
> --- a/gcc/c-family/c-common.c
> +++ b/gcc/c-family/c-common.c
> @@ -8382,6 +8382,31 @@ maybe_suggest_missing_token_insertion (rich_location
> *richloc,
>  }
>  }
>
> +/* Potentially add a fix-it hint for a missing '&' or '*' to RICHLOC,
> +   depending on EXPR and EXPECTED_TYPE.  */
> +
> +void
> +maybe_add_indirection_token (rich_location *richloc,
> +  tree expr, tree expected_type)
> +{
> +  gcc_assert (richloc);
> +  gcc_assert (expr);
> +  gcc_assert (expected_type);
> +
> +  tree actual_type = TREE_TYPE (expr);
> +
> +  /* Fix-it hint for missing '&'.  */
> +  if (TREE_CODE (expected_type) == POINTER_TYPE
> +  && actual_type == TREE_TYPE (expected_type)
> +  && lvalue_p (expr))
> +richloc->add_fixit_insert_before ("&");
> +
> +  /* Fix-it hint for missing '*'.  */
> +  if (TREE_CODE (actual_type) == POINTER_TYPE
> +  && TREE_TYPE (actual_type) == expected_type)
> +richloc->add_fixit_insert_before ("*");
> +}
> +
>  #if CHECKING_P
>
>  namespace selftest {
> diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
> index a218432..e362137 100644
> --- a/gcc/c-family/c-common.h
> +++ b/gcc/c-family/c-common.h
> @@ -1340,6 +1340,8 @@ extern void maybe_add_include_fixit (rich_location *,
> const char *, bool);
>  extern void maybe_suggest_missing_token_insertion (rich_location *richloc,
>  enum cpp_ttype token_type,
>  location_t prev_token_loc);
> +extern void maybe_add_indirection_token (rich_location *richloc,
> +  tree expr, tree expected_type);
>  extern tree braced_list_to_string (tree, tree);
>
>  #if CHECKING_P
> diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
> index 144977e..f407025 100644
> --- a/gcc/c/c-typeck.c
> +++ b/gcc/c/c-typeck.c
> @@ -7058,6 +7058,7 @@ convert_for_assignment (location_t location,
> location_t expr_loc, tree type,
> auto_diagnostic_group d;
> range_label_for_type_mismatch rhs_label (rhstype, type);
> gcc_rich_location richloc (expr_loc, &rhs_label);
> +   maybe_add_indirection_token (&richloc, rhs, type);
> if (pedwarn (&richloc, OPT_Wint_conversion,
>  "passing argument %d of %qE makes pointer from "
>  "integer without a cast", parmnum, rname))
> @@ -7094,6 +7095,7 @@ convert_for_assignment (location_t location,
> location_t expr_loc, tree type,
>   auto_diagnostic_group d;
>   range_label_for_type_mismatch rhs_label (rhstype, type);
>   gcc_rich_location richloc (expr_loc, &rhs_label);
> + maybe_add_indirection_token (&richloc, rhs, type);
>   if (pedwarn (&richloc, OPT_Wint_conversion,
>"passing 

Re: [PATCH] x86: Optimize VFIXUPIMM* patterns with multiple-alternative constraints

2018-11-09 Thread Terry Guo
On Fri, Nov 9, 2018 at 6:05 PM Uros Bizjak  wrote:
>
> On Fri, Nov 9, 2018 at 10:54 AM Wei Xiao  wrote:
> >
> > Hi Uros
> >
> > Thanks for the remarks!
> > I improved the patch as attached to address the issues you mentioned:
> > 1. No changes to substs any more.
> > 2. Adopt the established approach (e.g. "rcp14") to handle zero masks.
> >
> > I'd like to explain our motivation for combining vfixupimm patterns: there
> > will be a lot of new x86 instructions with both masking and rounding like
> > vfixupimm in the future, but we still want to keep the x86 MD as short as
> > possible and don't want to write 2 patterns for each of these new
> > instructions, which would also raise the code review cost for the
> > maintainer. We want to make sure the new pattern paradigm is OK for the
> > x86 maintainer through this patch.
>
> Yes, the patch looks much nicer now.
>
> +2018-11-09 Wei Xiao 
> + *config/i386/sse.md: Combine VFIXUPIMM* patterns
> + (_fixupimm_maskz): Update.
> + (_fixupimm): Update.
> + (_fixupimm_mask): Remove.
> + (avx512f_sfixupimm_maskz): Update.
> + (avx512f_sfixupimm): Update.
> + (avx512f_sfixupimm_mask): Remove.
>
> (In future,  please add ChangeLog entry to the text of the mail).
>
> OK for mainline.
>
> Thanks,
> Uros.

Thanks for the review. In future we will pay more attention to following
the convention.

BR,
Terry


Re: [PATCH] Add sinh(tanh(x)) and cosh(tanh(x)) rules

2018-11-09 Thread Segher Boessenkool
On Fri, Nov 09, 2018 at 01:03:55PM -0700, Jeff Law wrote:
> >> And signed zeroes.  Yeah.  I think it would have to be
> >> flag_unsafe_math_optimizations + some more.
> > 
> > Indeed.
> So we need to give Giuliano some clear guidance on guarding.  This is
> out of my area of expertise, so looking to y'all to help here.

IMO, it needs flag_unsafe_math_optimizations, as above; and it needs to be
investigated which (if any) options like flag_signed_zeros it needs in
addition to that.  It needs an option like that whenever the new expression
can give a zero with a different sign than the original expression, etc.
Although it could be said that flag_unsafe_math_optimizations supersedes all
of that.  It isn't clear.
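For reference, the identities behind the proposed simplifications (the thread body discusses the atanh forms); whether the right-hand sides preserve the sign of zero and the behaviour at x -> +-1 is exactly what the guarding question above is about:

\[ \sinh(\operatorname{atanh} x) = \frac{x}{\sqrt{(1-x)(1+x)}}, \qquad
   \cosh(\operatorname{atanh} x) = \frac{1}{\sqrt{(1-x)(1+x)}} \]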


Segher


Re: [PATCH] combine: Do not combine moves from hard registers

2018-11-09 Thread Segher Boessenkool
On Fri, Nov 09, 2018 at 02:12:22PM -0700, Jeff Law wrote:
> On 11/8/18 1:34 PM, Segher Boessenkool wrote:
> > On Thu, Nov 08, 2018 at 03:44:44PM +, Sam Tebbs wrote:
> >> Does your patch fix the incorrect generation of "scvtf s1, s1"? I was
> >> looking at the issue as well and don't want to do any overlapping work.
> > 
> > I don't know.  Well, there are no incorrect code issues I know of at all
> > now; but you mean that it is taking an instruction more than you would
> > like to see, I suppose?
> Which is ultimately similar to the 3 regressions HJ has reported on x86_64.

I just wish those bug reports would mature from "oh hey this patch changed
some stuff" to "oh yeah this made some existing problems (in the backend,
or common code, hey let's blame RA because it's most obvious here, why not!)
more obvious".

All those PRs show situations where we could generate better code.  None
of those PRs show situations where the combine patches were at fault.

Blaming others is not going to improve your code one bit.


(And yes, I'll commit one further improvement patch _right now_).


Segher


Re: [PATCH 1/3] Add PTWRITE builtins for x86

2018-11-09 Thread Segher Boessenkool
On Thu, Nov 08, 2018 at 06:30:21PM +0100, Uros Bizjak wrote:
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/ptwrite2.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mptwrite " } */
> +/* { dg-final { scan-assembler "ptwrite.*r" } } */
> +/* { dg-final { scan-assembler "ptwrite.*e" } } */
> 
> Better use \[^\n\r\] instead of .* to avoid unwanted multi-line matches.

Or better, write it as

/* { dg-final { scan-assembler {(?n)ptwrite.*r} } } */
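For illustration, the two directives with that suggestion applied; (?n) makes the Tcl regular expression newline-sensitive, so '.' cannot match across lines:

/* { dg-final { scan-assembler {(?n)ptwrite.*r} } } */
/* { dg-final { scan-assembler {(?n)ptwrite.*e} } } */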


Segher


Re: [PATCH] minor FDO profile related fixes

2018-11-09 Thread Indu Bhagat

Changelog entry attached. Sorry about that.

Comments inline.

Thanks

On 11/09/2018 04:23 PM, Jeff Law wrote:

On 11/7/18 5:49 PM, Indu Bhagat wrote:

diff --git a/gcc/coverage.c b/gcc/coverage.c
index 599a3bb..7595e6c 100644
--- a/gcc/coverage.c
+++ b/gcc/coverage.c
@@ -358,7 +358,7 @@ get_coverage_counts (unsigned counter, unsigned 
cfg_checksum,
if (warning_printed && dump_enabled_p ())
{
  dump_user_location_t loc
-   = dump_user_location_t::from_location_t (input_location);
+   = dump_user_location_t::from_function_decl (current_function_decl);
dump_printf_loc (MSG_MISSED_OPTIMIZATION, loc,
 "use -Wno-error=coverage-mismatch to tolerate "
 "the mismatch but performance may drop if the "

Patches should include a ChangeLog entry.  Because your patch didn't
include one, it's unclear if this hunk is something you really intended
to submit or not.  What's the point behind going from using
input_location to the function decl?


This is intended to be part of the patch.  The patch intends to make opt-info
and dumps related to ipa-profile more precise.  Using from_function_decl gives
the precise location of the function.  from_location_t gives the same incorrect
location for each function with the coverage-mismatch issue in a compilation
unit (the location appears to be the end of the file).
 


diff --git a/gcc/profile.c b/gcc/profile.c
index 2130319..57e3f3c 100644
--- a/gcc/profile.c
+++ b/gcc/profile.c
@@ -698,6 +698,11 @@ compute_branch_probabilities (unsigned cfg_checksum, 
unsigned lineno_checksum)
}
  }
  
+  if (exec_counts)

+{
+  profile_status_for_fn (cfun) = PROFILE_READ;
+}
+

Extraneous curly braces.  In general if you have a single statement,
don't bother with curly braces.


I will correct this.


@@ -705,7 +710,7 @@ compute_branch_probabilities (unsigned cfg_checksum, 
unsigned lineno_checksum)
bb->count = profile_count::from_gcov_type (bb_gcov_count (bb));
/* If function was not trained, preserve local estimates including 
statically
   determined zero counts.  */
-  else
+  else if (profile_status_for_fn (cfun) == PROFILE_READ)
  FOR_ALL_BB_FN (bb, cfun)
if (!(bb->count == profile_count::zero ()))
  bb->count = bb->count.global0 ();

So I'm guessing the point here is to only set the count to global0 when
we read in profile data for the function and the profile data didn't
have any hits for the function.  Right?


Yes, correct.
(Longer answer: Without this more selective conditional, count is set
to global0 even for functions which were added between profile-generate
and profile-use.)


@@ -1317,12 +1327,12 @@ branch_prob (void)
values.release ();
free_edge_list (el);
coverage_end_function (lineno_checksum, cfg_checksum);
-  if (flag_branch_probabilities && profile_info)
+  if (flag_branch_probabilities
+  && (profile_status_for_fn (cfun) == PROFILE_READ))

So what's the point of this change?  Under what circumstances do the two
conditionals do something different.  I don't know this code well, but
ISTM that if profile_info is nonzero, then the profile status is going
to be PROFILE_READ.  Is that not the case?


On trunk, there is a direct relationship between the three:

1. (.gcda file) is present --implies--> 2. profile_info set
--implies--> 3. profile_status set to PROFILE_READ

But for functions added between profile-generate and profile-use, 3. should
not be done based on 2.  For such functions, profile_info may be set (because
there is a .gcda file), but profile_status should not be set to PROFILE_READ,
because such functions have no feedback data.


Jeff


gcc/ChangeLog:

2018-11-07  "Indu Bhagat"  <"indu.bha...@oracle.com">

* coverage.c (get_coverage_counts): Use from_function_decl for precise
function location.
* profile-count.c (profile_count::dump): Add handling for precise
profile quality.
* profile.c (compute_branch_probabilities): Rely on exec_counts instead
of profile_info to set x_profile_status of function.
(branch_prob): Do not set x_profile_status of function based on
profile_info. Done above based on exec_counts.



Re: [PATCH, GCC, AArch64] Branch Dilution Pass

2018-11-09 Thread Andrew Pinski
On Fri, Nov 9, 2018 at 9:23 AM Sudakshina Das  wrote:
>
> Hi
>
> I am posting this patch on behalf of Carey (cc'ed). I also have some
> review comments that I will make as a reply to this later.
>
>
> This implements a new AArch64 specific back-end pass that helps optimize
> branch-dense code, which can be a bottleneck for performance on some Arm
> cores. This is achieved by padding out the branch-dense sections of the
> instruction stream with nops.
>
> This has proven to show up to a ~2.61% improvement on the Cortex-A72
> (SPEC CPU 2006: sjeng).
>
> The implementation includes the addition of a new RTX instruction class
> FILLER_INSN, which has been white listed to allow placement of NOPs
> outside of a basic block. This is to allow padding after unconditional
> branches. This is favorable so that any performance gained from
> diluting branches is not paid straight back via excessive eating of nops.
>
> It was deemed that a new RTX class was less invasive than modifying
> behavior in regards to standard UNSPEC nops.

Maybe you should split this up into two patches, one of the
FILLER_INSN part and one for the aarch64 parts for easier review for
the maintainers.

- extra_objs="aarch64-builtins.o aarch-common.o
cortex-a57-fma-steering.o aarch64-speculation.o
falkor-tag-collision-avoidance.o"
+ extra_objs="aarch64-builtins.o aarch-common.o
cortex-a57-fma-steering.o aarch64-speculation.o
falkor-tag-collision-avoidance.o aarch64-branch-dilution.o"
Also, this line is already long, but that does not mean it should get
even longer; maybe split it up like:
extra_objs="aarch64-builtins.o aarch-common.o cortex-a57-fma-steering.o"
extra_objs="${extra_objs} aarch64-speculation.o
falkor-tag-collision-avoidance.o aarch64-branch-dilution.o"

I think you should be using some anonymous namespaces for insn_info,
insn_granule and maybe a few more things too.  Since those are all
local to the pass only (someone could make the same mistake as you and
use those names and you run into ODR violations then).
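A minimal sketch of that suggestion (illustrative only, not the patch code):

namespace
{
  struct insn_info;     /* per-insn bookkeeping, now local to this pass  */
  class insn_granule;   /* one window of instructions  */

  /* ... definitions exactly as in the patch ...  */
} /* anonymous namespace */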

You might want to use the new dump_printf_loc functions instead of
fprintf where you are printing into dump_file.

+unsigned MAX_BRANCH = 0;
+unsigned GRANULE_SIZE = 0;
Most likely these should not be capitalized at all and definitely
either static or in anonymous namespaces.

+inline bool
+is_branch (rtx_insn *insn)

Likewise about static or in an anonymous namespace.
Maybe it could be rewritten to be easier to understand:
{
  if (insn == NULL)
    return false;
  return JUMP_P (insn) || CALL_P (insn) || ANY_RETURN_P (insn);
}

That is swap the order there.

+  /* Pointers to the first/last instructions in granule.  */
+  insn_info *m_first = NULL;
+  insn_info *m_last = NULL;

Even though C++14 is the default compiler in newer versions, we
support C++98 to compile GCC with.  So please fix that so we don't
depend on C++11 features.  Simple way is to test native bootstrapping
with GCC 4.8 on CentOS 7.

+  if (insn->is_unconditional)
+{
+  value += 2;
+}

I noticed in some places you use {} around one statement and in others
not.  The style is to go without braces except when statements are nested
and it would be confusing which else belongs to which if.


+void
+insn_granule::update_indexes ()
This is missing a comment before the function.  Also this seems more
like fixup indexes rather than update.


+  if (MAX_BRANCH < 1)
+error ("branch dilution: max branch must be greater than zero");
+  else if (MAX_BRANCH >= GRANULE_SIZE)
+{
+  error ("branch dilution: max branches (%d) must be \
+   less than granule size (%d)", MAX_BRANCH, GRANULE_SIZE);
+}

You should almost never use error in a backend (there are a few
exceptions to that rule); use sorry or internal_error instead.  Also,
the way you wrapped the last error message is incorrect, as the tabs
will show up in the error message.
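A sketch of that check rewritten along those lines (illustrative; the message text is a placeholder, and the string is split per line so no tabs end up in the output):

  if (MAX_BRANCH < 1)
    internal_error ("branch dilution: max branch must be greater than zero");
  else if (MAX_BRANCH >= GRANULE_SIZE)
    internal_error ("branch dilution: max branches (%u) must be less than "
                    "granule size (%u)", MAX_BRANCH, GRANULE_SIZE);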

Thanks,
Andrew Pinski

>
> ## Command Line Options
>
> Three new target-specific options are provided:
> - mbranch-dilution
> - mbranch-dilution-granularity={num}
> - mbranch-dilution-max-branches={num}
>
> A number of cores known to be able to benefit from this pass have been
> given default tuning values for their granularity and max-branches.
> Each affected core has a very specific granule size and associated
> max-branch limit. This is a microarchitecture specific optimization.
> Typical usage should be -mdilute-branches with a specified -mcpu. Cores
> with a granularity tuned to 0 will be ignored. Options are provided for
> experimentation.
>
> ## Algorithm and Heuristic
>
> The pass takes a very simple 'sliding window' approach to the problem.
> We crawl through each instruction (starting at the first branch) and
> keep track of the number of branches within the current "granule" (or
> window). When this exceeds the max-branch value, the pass will dilute
> the current granule, inserting nops to push out some of the branches.
> The heuristic will favour unconditional branches (for performance
> reasons), or branches that are between two other 
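A rough sketch of the sliding-window idea described above (the names mirror those quoted from the patch; the first/next/insn accessors and emit_nop_after are hypothetical helpers, and this is not the patch code):

static void
dilute_granule (insn_granule *granule)
{
  unsigned branches = 0;
  for (insn_info *info = granule->first (); info; info = info->next ())
    {
      if (is_branch (info->insn ()))
        branches++;
      if (branches > MAX_BRANCH)
        {
          /* The window is over budget: pad with a nop so that later
             branches fall into the next granule.  */
          emit_nop_after (info->insn ());
          branches = MAX_BRANCH;
        }
    }
}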

Re: [PATCH] minor FDO profile related fixes

2018-11-09 Thread Jeff Law
On 11/7/18 5:49 PM, Indu Bhagat wrote:
> I have been looking at -fdump-ipa-profile dump with an intention to sanitize
> 
> bits of information so that one may use it to judge the "quality of a profile"
> 
> in FDO.
> 
> The overall question I want to address is - are there ways to know which
> functions were not run in the training run, i.e. have ZERO profile ?
> (This patch corrects some dumped info; in a subsequent patch I would like to 
> add
> 
> some more explicit information/diagnostics.)
> 
> Towards that end, I noticed that there are a couple of misleading bits of
> information (so I think) in the symbol table dump listing all functions in the
> 
> compilation unit :
>    --- "globally 0" appears even when profile data has not been fed by 
> feedback
> 
> profile (not the intent as the documentation of 
> profile_guessed_global0
> 
>  in profile-count.h suggests).
>    --- "unlikely_executed" should appear only when there is profile feedback 
> or
> 
>    a function attribute is specified (as per documentation of
>    node_frequency in coretypes.h). "unlikely_executed" in case of STALE or
> 
>    NO profile is misleading in my opinion.
> 
> Summary of changes :
> 
> 1. This patch makes some adjustments around how x_profile_status of a function
> 
> is set - x_profile_status should be set to PROFILE_READ only when there is a
> 
> profile for a function read from the .gcda file. So, instead of relying on
> profile_info (set whenever the gcda feedback file is present, even if the
> function does not have a profile available in the file), use exec_counts
> (non null when function has a profile (the latter may or may not be zero)). In
> 
> essence, x_profile_status and profile_count::m_quality
> are set consistent to the stated intent (in code comments.)
> 
> 2. A minor change in coverage.c is for more precise location of the message
> 
> Following -fdump-ipa-profile dump excerpts show the effect :
> 
> 
>  -O1, -O2, -O3
> 
> 
> 0. APPLICABLE PROFILE
> Trunk : Function flags: count:224114269 body hot
> After Patch : Function flags: count:224114269 (precise) body hot
> 
> 1. STALE PROFILE
> (i.e., those cases covered by Wcoverage-mismatch; when control flow changes
>  between profile-generate and profile-use)
> Trunk : Function flags: count:224114269 body hot
> After Patch : Function flags: count:224114269 (precise) body hot
> 
> 2. NO PROFILE
> (i.e., those cases covered by Wmissing-profile; when function has no profile
> 
>  available in the .gcda file)
> Trunk (missing .gcda file) : Function flags: count:1073741824 (estimated 
> locally) body
> 
> Trunk (missing function) : Function flags: count: 1073741824 (estimated 
> locally, globally 0) body unlikely_executed
> 
> After Patch (missing .gcda file) : Function flags: count:1073741824 
> (estimated locally) body
> 
> After Patch (missing function) : Function flags: count:1073741824 (estimated 
> locally) body
> 
> 
> 3. ZERO PROFILE (functions not run in training run)
> Trunk : Function flags: count: 1073741824 (estimated locally, globally 0) 
> body unlikely_executed
> 
> After Patch (remains the same) : count: 1073741824 (estimated locally, 
> globally 0) body unlikely_executed
> 
> 
> --
> O0
> --
> In O0, flag_guess_branch_prob is not set. This makes the profile_quality set 
> to
> 
> (precise) for most of the above cases.
> 
> 0. APPLICABLE PROFILE
> Trunk : Function flags: count:224114269 body hot
> After Patch : Function flags: count:224114269 (precise) body hot
> 
> 1. STALE PROFILE
> (i.e., those cases covered by Wcoverage-mismatch; when control flow changes
>  between profile-generate and profile-use)
> Trunk : Function flags: count:224114269 body hot
> After Patch : Function flags: count:224114269 (precise) body hot
> 
> 2. NO PROFILE
> (i.e., those cases covered by Wmissing-profile; when function has no profile
> 
>  available in the .gcda file)
> Trunk (missing file) : Function flags: body
> Trunk (missing function) : Function flags: count:0 body unlikely_executed
> After Patch (missing file) :  Function flags: body
> *** After Patch (missing function) : Function flags: count:0 (precise) body
> (*** This remains misleading, and I do not have a solution for this; as use 
> of heuristics
> 
>  to guess branch probability is not allowed in O0)
> 
> 3. ZERO PROFILE (functions not run in training run)
> Trunk : Function flags: count:0 body unlikely_executed
> After Patch : Function flags: count:0 (precise) body
> 
> --
> 
> make check-gcc on x86_64 shows no new failures.
> 
> (A related PR was https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86957 where we 
> added diagnostics for the NO PROFILE case.)
> 
> 
> 
> precise-ipa-dump-optinfo.patch.ver1
> 
> diff --git 

Re: [PATCH 2/3] Add a pass to automatically add ptwrite instrumentation

2018-11-09 Thread Martin Sebor

On 11/04/2018 12:32 AM, Andi Kleen wrote:

From: Andi Kleen 

Add a new pass to automatically instrument changes to variables
with the new PTWRITE instruction on x86. PTWRITE writes a 4 or 8 byte
field into an Processor Trace log, which allows log over head
logging of informatin.

This allows to reconstruct how values later, which can be useful for
debugging or other analysis of the program behavior. With the compiler
support this can be done with without having to manually add instrumentation
to the code.

Using dwarf information this can be later mapped back to the variables.

There are new options to enable instrumentation for different types,
and also a new attribute to control instrumentation at a fine-grained
per-function or per-variable level. The attributes can be set on both
the variable and the type level, and also on structure fields.
This allows tracing to be enabled only for specific code in large
programs.
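An illustrative use of the attributes described above (assuming the "vartrace"/"no_vartrace" spellings added by the patch, compiled with -mptwrite):

struct state
{
  long counter __attribute__((vartrace));    /* writes to this field are traced */
  long scratch __attribute__((no_vartrace)); /* never traced */
};

__attribute__((vartrace)) void
update (struct state *s, long v)
{
  s->counter = v;   /* instrumented with PTWRITE */
  s->scratch = v;   /* left alone */
}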

The pass is generic, but only the x86 backend enables the necessary
hooks. When the backend enables the necessary hooks (with -mptwrite)
there is an additional pass that looks through the code for
attribute vartrace enabled functions or variables.

The -fvartrace-locals options is experimental: it works, but it
generates redundant ptwrites because the pass doesn't use
the SSA information to minimize instrumentation. This could be optimized
later.

Currently the code can be tested with SDE, or on an Intel
Gemini Lake system with a new enough Linux kernel (v4.10+)
that supports PTWRITE for PT. Linux perf can be used to
record the values

perf record -e intel_pt/ptw=1,branch=0/ program
perf script --itrace=crw -F +synth ...

I have an experimental version of perf that can also use
dwarf information to symbolize many[1] values back to their variable
names. So far it is not in standard perf, but available at

https://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc.git/log/?h=perf/var-resolve-4

It is currently not able to decode all variable locations to names,
but a large subset.

Longer term hopefully gdb will support this information too.

The CPU can potentially generate very high data bandwidths when
code doing a lot of computation is heavily instrumented.
This can cause some data loss in both the CPU and also in perf
logging the data when the disk cannot keep up.

Running some larger workloads most workloads do not cause
CPU level overflows, but I've seen it with -fvartrace
with crafty, and with more workloads with -fvartrace-locals.

The recommendation is not to fully instrument programs,
but only areas of interest, either at the file level or using
the attributes.

The other thing is that perf and the disk often cannot keep up
with the data bandwidth for longer computations. In this case
it's possible to use perf snapshot mode (add --snapshot
to the command line above). The data will be only logged to
a memory ring buffer then, and only dump the buffers on events
of interest by sending SIGUSR2 to the perf binrary.

In the future this will be hopefully better supported with
core files and gdb.

Passes bootstrap and test suite on x86_64-linux, also
bootstrapped and tested gcc itself with full -fvartrace
and -fvartrace-locals instrumentation.


(I initially meant to just suggest detecting and rejecting the two
mutually exclusive attributes but as I read the rest of the patch
to better understand what it's about I noticed a few other issues
I thought would be useful to point out.)

...

diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
index 4416b5042f7..66bbd87921f 100644
--- a/gcc/c-family/c-attribs.c
+++ b/gcc/c-family/c-attribs.c
@@ -325,6 +327,12 @@ const struct attribute_spec c_common_attribute_table[] =
   { "no_instrument_function", 0, 0, true,  false, false, false,
  handle_no_instrument_function_attribute,
  NULL },
+  { "vartrace",0, 0, false,  false, false, false,
+ handle_vartrace_attribute,
+ NULL },
+  { "no_vartrace", 0, 0, false,  false, false, false,
+ handle_vartrace_attribute,
+ NULL },
   { "no_profile_instrument_function",  0, 0, true, false, false, false,
  handle_no_profile_instrument_function_attribute,
  NULL },


Unless mixing these attributes on the same declaration makes sense
I would suggest to either define the exclusions that should be
automatically applied to the attributes (see attribute exclusions),
or to enforce them in the handler.  Judging only by the names it
looks to me like vartrace should be mutually exclusive with
no_vartrace.
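One way to express that, modelled on the existing exclusion tables in c-attribs.c (a sketch, not part of the patch; the table would then be referenced from the two attribute_spec entries above instead of their trailing NULL):

static const struct attribute_spec::exclusions attr_vartrace_exclusions[] =
{
  { "vartrace", true, true, true },
  { "no_vartrace", true, true, true },
  { NULL, false, false, false },
};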


@@ -767,6 +775,21 @@ handle_no_sanitize_undefined_attribute (tree *node, tree 
name, tree, int,
   return NULL_TREE;
 }

+/* Handle "vartrace"/"no_vartrace" attributes; arguments as in
+   struct attribute_spec.handler.  */
+
+static tree
+handle_vartrace_attribute (tree *node, tree, 

Re: [patch] allow target config to state r18 is fixed on aarch64

2018-11-09 Thread Andrew Pinski
On Fri, Nov 9, 2018 at 10:09 AM Olivier Hainque  wrote:
>
> Hello Wilco,
>
> Would you have further thoughts on the patches proposed in
>
>  https://gcc.gnu.org/ml/gcc-patches/2018-10/msg01453.html
>
> ?
>
> There was:
>
> 1)  * config/aarch64/aarch64.c (PROBE_STACK_FIRST_REG) : Redefine as
> R9_REGNUM instead of 9.
> (PROBE_STACK_SECOND_REG): Redefine as R10_REGNUM instead of 10.
>
>
> 2)  * config/aarch64/aarch64.h (FIXED_R18): New internal
> configuration macro, defaulted to 0.
> (FIXED_REGISTERS): Use it.
> (STATIC_CHAIN_REGNUM): Use r11 instead of r18.
>
> and
>
> 3)  * config/aarch64/aarch64.c (aarch64_override_options): Once
> arch, cpu and tune were validated, insert SUBTARGET_OVERRIDE_OPTIONS
>if defined.

I remember hearing from some folks, they would like this (the ability
to mark r18 as being fixed) to be a target option so they could
support wine on AARCH64 too; I hear clang has this option already too.
https://bugs.winehq.org/show_bug.cgi?id=38780 for the wine issue.

Thanks,
Andrew

>
>
>
> Thanks a lot in advance!
>
> With Kind Regards,
>
> Olivier
>
>
> > On 23 Oct 2018, at 19:13, Olivier Hainque  wrote:
> >
> > Hi Wilco,
> >
> >> On 18 Oct 2018, at 19:08, Wilco Dijkstra  wrote:
> >
> >>> I wondered if we could set it to R11 unconditionally and picked
> >>> the way ensuring no change for !vxworks ports, especially since I
> >>> don't have means to test more than what I described above.
> >>
> >> Yes it should always be the same register, there is no gain in switching
> >> it dynamically. I'd suggest to use X9 since X8 is the last register used 
> >> for
> >> arguments (STATIC_CHAIN_REGNUM is passed when calling a nested
> >> function) and some of the higher registers may be used as temporaries in
> >> prolog/epilog.
> >
> > Thanks for your feedback!  I ported the patches
> > to gcc-8 and was able to get a functional toolchain
> > for aarch64-wrs-vxworks7 and aarch64-elf, passing
> > full Acats for a couple of runtime variants on VxWorks
> > (compiled for RTP or kernel mode) as well as a small
> > internal testsuite we have, dedicated to cross configurations.
> >
> > All the patches apply directly on mainline.
> >
> > As for the original patch, I also sanity checked that
> > "make all-gcc" passes (self tests included) there for
> > --target=aarch64-elf --enable-languages=c
> >
> >
> > There are three changes to the common aarch64 port files.
> >
> > It turns out that X9 doesn't work for STATIC_CHAIN_REGNUM
> > because this conflicts with the registers used for -fstack-check:
> >
> >  /* The pair of scratch registers used for stack probing.  */
> >  #define PROBE_STACK_FIRST_REG  9
> >  #define PROBE_STACK_SECOND_REG 10
> >
> > I didn't find that immediately (read, I first experienced a few
> > badly crashing test runs) because I searched for R9_REGNUM to check
> > for other uses, so the first patchlet attached simply adjusts
> > the two #defines above to use R9/R10_REGNUM.
> >
> >
> > 2018-10-23  Olivier Hainque  
> >
> >* config/aarch64/aarch64.c (PROBE_STACK_FIRST_REG) : Redefine as
> >R9_REGNUM instead of 9.
> >(PROBE_STACK_SECOND_REG): Redefine as R10_REGNUM instead of 10.
> >
> >
> > The second patch is the one which I proposed a few days ago
> > to allow a subtarget (in my case, the VxWorks port) to state that
> > R18 is to be considered fixed. Two changes compared to the original
> > patch: a comment attached to the default definition of FIXED_R18,
> > and the unconditional use of R11_REGNUM as an alternate STATIC_CHAIN_REGNUM.
> >
> > I suppose the latter could require extra testing than what I was
> > able to put in (since this is also changing for !vxworks configurations),
> > which Sam very kindly did on the first instance.
> >
> > I didn't touch CALL_SAVED_REGISTERS since this is 1 for r18 already.
> > I also didn't see a strong reason to move to a more dynamic scheme,
> > through conditional_register_usage.
> >
> > 2018-03-18  Olivier Hainque  
> >
> >   * config/aarch64/aarch64.h (FIXED_R18): New internal
> >   configuration macro, defaulted to 0.
> >   (FIXED_REGISTERS): Use it.
> >   (STATIC_CHAIN_REGNUM): Use r11 instead of r18.
> >
> >
> > The third patch proposes the introduction of support for a
> > conditional SUBTARGET_OVERRIDE_OPTIONS macro, as many other
> > architectures have, and which is needed by all VxWorks ports.
> >
> > In the current state, this one could possibly impact only
> > VxWorks, as no other config file would define the macro.
> >
> > I'm not 100% clear on the possible existence of rules regarding
> > the placement of this within the override_options functions. We used
> > something similar to what other ports do, and it worked just fine
> > for VxWorks.
> >
> > 2018-10-23  Olivier Hainque  
> >
> >* config/aarch64/aarch64.c (aarch64_override_options): Once
> >arch, cpu and tune were validated, insert SUBTARGET_OVERRIDE_OPTIONS
> >if defined.
> >
> > I'm 

Re: [PATCH v2] [aarch64] Correct the maximum shift amount for shifted operands.

2018-11-09 Thread Andrew Pinski
On Fri, Nov 9, 2018 at 3:39 AM  wrote:
>
> From: Christoph Muellner 
>
> The aarch64 ISA specification allows a left shift amount to be applied
> after extension in the range of 0 to 4 (encoded in the imm3 field).
>
> This is true for at least the following instructions:
>
>  * ADD (extend register)
>  * ADDS (extended register)
>  * SUB (extended register)
>
> The result of this patch can be seen, when compiling the following code:
>
> uint64_t myadd(uint64_t a, uint64_t b)
> {
>   return a+(((uint8_t)b)<<4);
> }
>
> Without the patch the following sequence will be generated:
>
>  :
>0:   d37c1c21    ubfiz   x1, x1, #4, #8
>4:   8b20        add     x0, x1, x0
>8:   d65f03c0    ret
>
> With the patch the ubfiz will be merged into the add instruction:
>
>  :
>0:   8b211000    add     x0, x0, w1, uxtb #4
>4:   d65f03c0    ret
>
> Tested with "make check" and no regressions found.
>
> *** gcc/ChangeLog ***
>
> 2018-xx-xx  Christoph Muellner  
> Philipp Tomsich  
>
> * config/aarch64/aarch64.c (aarch64_uxt_size): Correct the maximum
> shift amount for shifted operands.
>
> *** gcc/testsuite/ChangeLog ***
>
> 2018-xx-xx  Christoph Muellner  
> Philipp Tomsich  
>
> * gcc.target/aarch64/extend.c: Adjust the testcases to cover
> the changed shift amount.

I think it might be better if we added new testcases instead of
modifying old ones in this case.  The main reason is that if we test
an older compiler with the newer testsuite (which you can do), you
should get the tests failing.
The main reason why you should be modifying testcases is if the tests
were not testing what they should be testing ...  This is not one of
those cases.

Thanks,
Andrew Pinski

> ---
>  gcc/config/aarch64/aarch64.c  |  2 +-
>  gcc/testsuite/gcc.target/aarch64/extend.c | 16 
>  2 files changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index c82c7b6..c85988a 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -8190,7 +8190,7 @@ aarch64_output_casesi (rtx *operands)
>  int
>  aarch64_uxt_size (int shift, HOST_WIDE_INT mask)
>  {
> -  if (shift >= 0 && shift <= 3)
> +  if (shift >= 0 && shift <= 4)
>  {
>int size;
>for (size = 8; size <= 32; size *= 2)
> diff --git a/gcc/testsuite/gcc.target/aarch64/extend.c 
> b/gcc/testsuite/gcc.target/aarch64/extend.c
> index f399e55..7986c5b 100644
> --- a/gcc/testsuite/gcc.target/aarch64/extend.c
> +++ b/gcc/testsuite/gcc.target/aarch64/extend.c
> @@ -32,8 +32,8 @@ ldr_sxtw0 (char *arr, int i)
>  unsigned long long
>  adddi_uxtw (unsigned long long a, unsigned int i)
>  {
> -  /* { dg-final { scan-assembler "add\tx\[0-9\]+,.*uxtw #?3" } } */
> -  return a + ((unsigned long long)i << 3);
> +  /* { dg-final { scan-assembler "add\tx\[0-9\]+,.*uxtw #?4" } } */
> +  return a + ((unsigned long long)i << 4);
>  }
>
>  unsigned long long
> @@ -46,8 +46,8 @@ adddi_uxtw0 (unsigned long long a, unsigned int i)
>  long long
>  adddi_sxtw (long long a, int i)
>  {
> -  /* { dg-final { scan-assembler "add\tx\[0-9\]+,.*sxtw #?3" } } */
> -  return a + ((long long)i << 3);
> +  /* { dg-final { scan-assembler "add\tx\[0-9\]+,.*sxtw #?4" } } */
> +  return a + ((long long)i << 4);
>  }
>
>  long long
> @@ -60,8 +60,8 @@ adddi_sxtw0 (long long a, int i)
>  unsigned long long
>  subdi_uxtw (unsigned long long a, unsigned int i)
>  {
> -  /* { dg-final { scan-assembler "sub\tx\[0-9\]+,.*uxtw #?3" } } */
> -  return a - ((unsigned long long)i << 3);
> +  /* { dg-final { scan-assembler "sub\tx\[0-9\]+,.*uxtw #?4" } } */
> +  return a - ((unsigned long long)i << 4);
>  }
>
>  unsigned long long
> @@ -74,8 +74,8 @@ subdi_uxtw0 (unsigned long long a, unsigned int i)
>  long long
>  subdi_sxtw (long long a, int i)
>  {
> -  /* { dg-final { scan-assembler "sub\tx\[0-9\]+,.*sxtw #?3" } } */
> -  return a - ((long long)i << 3);
> +  /* { dg-final { scan-assembler "sub\tx\[0-9\]+,.*sxtw #?4" } } */
> +  return a - ((long long)i << 4);
>  }
>
>  long long
> --
> 2.9.5
>


Re: Tweak ALAP calculation in SCHED_PRESSURE_MODEL

2018-11-09 Thread Jeff Law
On 11/8/18 5:10 AM, Kyrill Tkachov wrote:
> 
> 
> This patch fixes a flaw in the relationship between the way that
> SCHED_PRESSURE_MODEL calculates the alap and depth vs how it uses
> them in model_order_p.  A comment in model_order_p says:
> 
>   /* Combine the length of the longest path of satisfied true dependencies
>  that leads to each instruction (depth) with the length of the longest
>  path of any dependencies that leads from the instruction (alap).
>  Prefer instructions with the greatest combined length.  If the
> combined
>  lengths are equal, prefer instructions with the greatest depth.
> 
>  The idea is that, if we have a set S of "equal" instructions that each
>  have ALAP value X, and we pick one such instruction I, any
> true-dependent
>  successors of I that have ALAP value X - 1 should be preferred over S.
>  This encourages the schedule to be "narrow" rather than "wide".
>  However, if I is a low-priority instruction that we decided to
>  schedule because of its model_classify_pressure, and if there
>  is a set of higher-priority instructions T, the aforementioned
>  successors of I should not have the edge over T.  */
> 
> The expectation was that scheduling an instruction X would give a
> greater priority to the highest-priority successor instructions Y than
> X had: Y.depth would be X.depth + 1 and Y.alap would be X.alap - 1,
> giving an equal combined height, but with the greater depth winning as
> a tie-breaker. But this doesn't work if the alap value was ultimately
> determined by an anti-dependence.
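A conceptual sketch of the ordering being described (not the actual haifa-sched.c code; structure and field names are approximations):

static bool
model_order_p_sketch (const struct model_insn_info *a,
                      const struct model_insn_info *b)
{
  unsigned int ha = a->depth + a->alap;   /* combined length */
  unsigned int hb = b->depth + b->alap;
  if (ha != hb)
    return ha > hb;                       /* prefer the greater combined length */
  return a->depth > b->depth;             /* tie-break on depth */
}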
> 
> This is particularly bad when --param max-pending-list-length kicks in,
> since we then start adding fake anti-dependencies in order to keep the
> list length down.  These fake dependencies tend to be on the critical
> path.
> 
> The attached patch avoids that by making the alap calculation only
> look at true dependencies.  This shouldn't be too bad, since we use
> INSN_PRIORITY as the final tie-breaker than that does take
> anti-dependencies into account.
> 
> This reduces the number of spills in the hot function from 436.cactusADM
> by 14% on aarch64 at -O3 (and the number of instructions in general).
> SPEC2017 shows a minor improvement on Cortex-A72 (about 0.1% overall).
> Thanks to Wilco for the benchmarking.
> 
> Bootstrapped and tested on aarch64-none-linux-gnu.
> 
> Is this ok for trunk?
> 
> Thanks,
> Kyrill
> 
> 2018-11-08  Richard Sandiford  
> 
> gcc/
>     * haifa-sched.c (model_analyze_insns): Only add 1 to the consumer's
>     ALAP if the dependence is a true dependence.
So at the least the documentation of the ALAP field would need to be
updated as well as the comment you referenced (the "any dependencies").

But more importantly, it seems like blindly ignoring anti dependencies
is just a hack that happens to work.  I wonder if we could somehow mark
the fake dependencies we add, and avoid bumping the ALAP when we
encounter those fake dependencies.

It probably wouldn't be a bad idea to look at the default for
MAX_PENDING_LIST_LENGTH.  Based on the current default value and the
comments in the code that value could well have been tuned 25 or more
years ago!  How often are we falling over that during a bootstrap and
during spec builds?


Jeff


Re: [PATCH] add simple attribute introspection

2018-11-09 Thread Martin Sebor

On 11/09/2018 12:59 PM, Jeff Law wrote:

On 10/31/18 10:27 AM, Martin Sebor wrote:

Ping: https://gcc.gnu.org/ml/gcc-patches/2018-10/msg01473.html

With the C++ bits approved I'm still looking for a review/approval
of the remaining changes: the C front end and the shared c-family
handling of the new built-in.

I thought I acked those with just a couple minor doc nits:


I don't see a formal approval for the rest in my Inbox or in
the archive.


diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 8ffb0cd..dcf4747 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -2649,8 +2649,9 @@ explicit @code{externally_visible} attributes

are still necessary.

 @cindex @code{flatten} function attribute
 Generally, inlining into a function is limited.  For a function

marked with

 this attribute, every call inside this function is inlined, if possible.
-Whether the function itself is considered for inlining depends on its

size and

-the current inlining parameters.
+Functions declared with attribute @code{noinline} and similar are not
+inlined.  Whether the function itself is considered for inlining depends
+on its size and the current inlining parameters.

Guessing this was from another doc patch that I think has already been
approved


Yes.  It shouldn't be in the latest patch at the link above.


@@ -11726,6 +11728,33 @@ check its compatibility with @var{size}.

 @end deftypefn

+@deftypefn {Built-in Function} bool __builtin_has_attribute

(@var{type-or-expression}, @var{attribute})

+The @code{__builtin_has_attribute} function evaluates to an integer

constant

+expression equal to @code{true} if the symbol or type referenced by
+the @var{type-or-expression} argument has been declared with
+the @var{attribute} referenced by the second argument.  Neither argument
+is valuated.  The @var{type-or-expression} argument is subject to the

same
s/valuated/evaluated/ ?


This should also be fixed in the latest patch at the link above.
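For reference, a minimal illustration of what the built-in evaluates (hypothetical example, not taken from the patch):

struct __attribute__ ((packed)) P { char c; int i; };
struct Q { char c; int i; };

int a = __builtin_has_attribute (struct P, packed);   /* 1 */
int b = __builtin_has_attribute (struct Q, packed);   /* 0 */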


Did the implementation change significantly requiring another review
iteration?


I don't think it changed too significantly between the last two
revisions but I don't have a record of anyone having approved
the C FE and the middle-end bits.  (Sorry if I missed it.) Other
than this response from you, all I see in the archive is this:

  https://gcc.gnu.org/ml/gcc-patches/2018-10/msg00606.html

Please let me know if the last revision is okay to commit.

Thanks
Martin


Re: Relocation (= move+destroy)

2018-11-09 Thread Marc Glisse

On Fri, 9 Nov 2018, Jonathan Wakely wrote:


Here's the fix for the regression this introduced.


Thanks. I was going to handle it, but I was waiting for you to review the 
second relocation patch first to avoid having several patches in flight 
over the same piece of code. Anyway, having the regression fixed is good.


--
Marc Glisse


[PATCH] C/C++: add fix-it hints for missing '&' and '*' (PR c++/87850)

2018-11-09 Thread David Malcolm
This patch adds a fix-it hint to various pointer-vs-non-pointer
diagnostics, suggesting the addition of a leading '&' or '*'.

For example, note the ampersand fix-it hint in the following:

demo.c:5:22: error: invalid conversion from 'pthread_key_t' {aka 'unsigned int'}
   to 'pthread_key_t*' {aka 'unsigned int*'} [-fpermissive]
5 |   pthread_key_create(key, NULL);
  |  ^~~
  |  |
  |  pthread_key_t {aka unsigned int}
  |  &
In file included from demo.c:1:
/usr/include/pthread.h:1122:47: note:   initializing argument 1 of 'int
   pthread_key_create(pthread_key_t*, void (*)(void*))'
 1122 | extern int pthread_key_create (pthread_key_t *__key,
  |~~~^
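For reference, the kind of code the diagnostic above is produced for (a minimal reproducer; the fix-it suggests inserting '&' before "key"):

#include <pthread.h>

pthread_key_t key;

void
create_key (void)
{
  pthread_key_create (key, NULL);   /* error; the fix-it suggests &key */
}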

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.

OK for trunk?

gcc/c-family/ChangeLog:
PR c++/87850
* c-common.c (maybe_add_indirection_token): New function.
* c-common.h (maybe_add_indirection_token): New decl.

gcc/c/ChangeLog:
PR c++/87850
* c-typeck.c (convert_for_assignment): Call
maybe_add_indirection_token for pointer vs non-pointer
diagnostics.

gcc/cp/ChangeLog:
PR c++/87850
* call.c (convert_like_real): Call maybe_add_indirection_token
for "invalid conversion" diagnostic.

gcc/testsuite/ChangeLog:
PR c++/87850
* c-c++-common/indirection-fixits.c: New test.
---
 gcc/c-family/c-common.c |  25 +++
 gcc/c-family/c-common.h |   2 +
 gcc/c/c-typeck.c|   2 +
 gcc/cp/call.c   |   1 +
 gcc/testsuite/c-c++-common/indirection-fixits.c | 250 
 5 files changed, 280 insertions(+)
 create mode 100644 gcc/testsuite/c-c++-common/indirection-fixits.c

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 534d928..3756e6d 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -8382,6 +8382,31 @@ maybe_suggest_missing_token_insertion (rich_location 
*richloc,
 }
 }
 
+/* Potentially add a fix-it hint for a missing '&' or '*' to RICHLOC,
+   depending on EXPR and EXPECTED_TYPE.  */
+
+void
+maybe_add_indirection_token (rich_location *richloc,
+tree expr, tree expected_type)
+{
+  gcc_assert (richloc);
+  gcc_assert (expr);
+  gcc_assert (expected_type);
+
+  tree actual_type = TREE_TYPE (expr);
+
+  /* Fix-it hint for missing '&'.  */
+  if (TREE_CODE (expected_type) == POINTER_TYPE
+  && actual_type == TREE_TYPE (expected_type)
+  && lvalue_p (expr))
+richloc->add_fixit_insert_before ("&");
+
+  /* Fix-it hint for missing '*'.  */
+  if (TREE_CODE (actual_type) == POINTER_TYPE
+  && TREE_TYPE (actual_type) == expected_type)
+richloc->add_fixit_insert_before ("*");
+}
+
 #if CHECKING_P
 
 namespace selftest {
diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index a218432..e362137 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -1340,6 +1340,8 @@ extern void maybe_add_include_fixit (rich_location *, 
const char *, bool);
 extern void maybe_suggest_missing_token_insertion (rich_location *richloc,
   enum cpp_ttype token_type,
   location_t prev_token_loc);
+extern void maybe_add_indirection_token (rich_location *richloc,
+tree expr, tree expected_type);
 extern tree braced_list_to_string (tree, tree);
 
 #if CHECKING_P
diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
index 144977e..f407025 100644
--- a/gcc/c/c-typeck.c
+++ b/gcc/c/c-typeck.c
@@ -7058,6 +7058,7 @@ convert_for_assignment (location_t location, location_t 
expr_loc, tree type,
  auto_diagnostic_group d;
  range_label_for_type_mismatch rhs_label (rhstype, type);
  gcc_rich_location richloc (expr_loc, &rhs_label);
+ maybe_add_indirection_token (&richloc, rhs, type);
  if (pedwarn (&richloc, OPT_Wint_conversion,
   "passing argument %d of %qE makes pointer from "
   "integer without a cast", parmnum, rname))
@@ -7094,6 +7095,7 @@ convert_for_assignment (location_t location, location_t 
expr_loc, tree type,
auto_diagnostic_group d;
range_label_for_type_mismatch rhs_label (rhstype, type);
gcc_rich_location richloc (expr_loc, &rhs_label);
+   maybe_add_indirection_token (&richloc, rhs, type);
if (pedwarn (&richloc, OPT_Wint_conversion,
 "passing argument %d of %qE makes integer from "
 "pointer without a cast", parmnum, rname))
diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index 6f40156..85d722c 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -6814,6 +6814,7 @@ convert_like_real (conversion *convs, tree 

RE: [PATCH] MIPS: Add `-mfix-r5900' option for the R5900 short loop erratum

2018-11-09 Thread mfortune
Hi Fredrik,

> Many thanks for your prompt review, Maciej!

Maciej, I am equally grateful for you stepping in to get Fredrik's
contribution reviewed.  I have taken a look through and have no
additional comments.  I'm going to be travelling this weekend so
I'd like to ask Maciej to take a quick look at the v2 version but
otherwise pre-approve it.

Has your copyright assignment come through for GCC? I can't access
the copyright info at the moment to check myself. Obviously that will
be a blocker otherwise.

I hope you do manage to work through other r5900 issues and if so the
changes will be most welcome.

Thanks,
Matthew



Re: [PATCH] Add sinh(tanh(x)) and cosh(tanh(x)) rules

2018-11-09 Thread Giuliano Augusto Faulin Belinassi
Hi. Sorry for the late reply :P

> But the max. error in sinh/cosh/atanh is less than 2 ULP, with some math
> libraries.  It could be < 1 ULP, in theory, so sinh(atanh(x)) less than
>2 ULP even.

Sorry, but doesn't the user agree to sacrifice precision for
performance when -ffast-math is enabled?

>> The question is more like whether errno and trapping/exception behaviour
>> is identical - I guess it is not so I would expect this to be fastmath only.
>> Which particular flag one uses is a detail given there isn't a clear 
>> definition
>> for most of them.

> And signed zeroes.  Yeah.  I think it would have to be
> flag_unsafe_math_optimizations + some more.

From my point of view, this optimization is OK for IEEE 754.
So I have to check if the target has signed zeroes and supports signed
infinity. I will look into that.

> So we need to give Giuliano some clear guidance on guarding.  This is
> out of my area of expertise, so looking to y'all to help here.

At this point I don't know how to check that, but I will look into it.
On Fri, Nov 9, 2018 at 6:03 PM Jeff Law  wrote:
>
> On 11/8/18 6:33 AM, Wilco Dijkstra wrote:
> > Hi,
> >
> >> But the max. error in sinh/cosh/atanh is less than 2 ULP, with some math
> >> libraries.  It could be < 1 ULP, in theory, so sinh(atanh(x)) less than
> >> 2 ULP even.
> >
> > You can't add ULP errors in general - a tiny difference in the input can
> > make a huge difference in the result if the derivative is > 1.
> >
> > Even with perfect implementations of 0.501ULP on easy functions with
> > no large derivatives you could get a 2ULP total error if the perfectly 
> > rounded
> > and actual result end up rounding in different directions in the 2nd 
> > function...
> >
> > So you have to measure ULP error since it is quite counter intuitive.
> >
> >> And signed zeroes.  Yeah.  I think it would have to be
> >> flag_unsafe_math_optimizations + some more.
> >
> > Indeed.
> So we need to give Giuliano some clear guidance on guarding.  This is
> out of my area of expertise, so looking to y'all to help here.
>
> jeff


RE: [PATCH v2] MIPS: Default to --with-llsc for the R5900 Linux target as well

2018-11-09 Thread mfortune
Hi Fredrik,

> Would it be possible to apply the reviewed patch below?

Apologies for the total lack of response from me. Thank-you for your
efforts to get this noticed!

The patch looks OK to me. I didn't see a ChangeLog entry anywhere but
something like the following would be appropriate.

gcc/
* config.gcc: Update with-llsc defaults for MIPS r5900.

I'll do what I can to help you get the various changes done for r5900;
I know there have been attempts before but they faded away for one
reason or another. As Maciej has said, your contribution is really
appreciated, and for this one, it is obvious enough to go in prior to
your FSF copyright assignment coming through.

Maciej: I'm not able to commit this for Fredrik at the moment, would
you mind doing that for him?

Thanks,
Matthew

> 
> Thank you,
> Fredrik
> 
> On Fri, Oct 19, 2018 at 08:33:33PM +0200, Fredrik Noring wrote:
> > The Linux kernel requires and emulates LL and SC for the R5900 too.
> > The special --without-llsc default for the R5900 is therefore not
> > applicable in that case.
> >
> > Reviewed-by: Maciej W. Rozycki 
> > ---
> > Changes in v2:
> > - Double spacing instead of single spacing in commit message
> >
> > ---
> >  gcc/config.gcc | 8 
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> >
> > diff --git a/gcc/config.gcc b/gcc/config.gcc
> > index 720e6a7373d..68c34b16123 100644
> > --- a/gcc/config.gcc
> > +++ b/gcc/config.gcc
> > @@ -3711,14 +3711,14 @@ fi
> >  # Infer a default setting for --with-llsc.
> >  if test x$with_llsc = x; then
> >case ${target} in
> > -mips64r5900-*-* | mips64r5900el-*-* | mipsr5900-*-* | mipsr5900el-*-*)
> > -  # The R5900 doesn't support LL(D) and SC(D).
> > -  with_llsc=no
> > -  ;;
> >  mips*-*-linux*)
> ># The kernel emulates LL and SC where necessary.
> >with_llsc=yes
> >;;
> > +mips64r5900-*-* | mips64r5900el-*-* | mipsr5900-*-* | mipsr5900el-*-*)
> > +  # The R5900 doesn't support LL(D) and SC(D).
> > +  with_llsc=no
> > +  ;;
> >esac
> >  fi
> >
> > --
> > 2.18.1
> >



Re: [PATCH] Instrument only selected files (PR gcov-profile/87442).

2018-11-09 Thread Jeff Law
On 11/8/18 6:42 AM, Martin Liška wrote:
> Hi.
> 
> The patch is about possibility to filter which files are instrumented. The 
> usage
> is explained in the PR.
> 
> Patch can bootstrap and survives regression tests on x86_64-linux-gnu.
> 
> Ready for trunk?
> Thanks,
> Martin
> 
> gcc/ChangeLog:
> 
> 2018-11-08  Martin Liska  
> 
>   PR gcov-profile/87442
>   * common.opt: Add -fprofile-filter-files and -fprofile-exclude-files
>   options.
>   * doc/invoke.texi: Document them.
>   * tree-profile.c (parse_profile_filter): New.
>   (parse_profile_file_filtering): Likewise.
>   (release_profile_file_filtering): Likewise.
>   (include_source_file_for_profile): Likewise.
>   (tree_profiling): Filter source files based on the
>   newly added options.
> 
> gcc/testsuite/ChangeLog:
> 
> 2018-11-08  Martin Liska  
> 
>   PR gcov-profile/87442
>   * gcc.dg/profile-filtering-1.c: New test.
>   * gcc.dg/profile-filtering-2.c: New test.
Extra credit if we could also do this on a function level.  I've
certainly talked to developers that want finer grained control over what
gets instrumented and what doesn't.  This is probably enough to help
them, but I'm sure they'll want more :-)


OK.
jeff


Re: [PATCH] MAINTAINERS: add myself as or1k maintainer

2018-11-09 Thread Stafford Horne
On Sat, Nov 10, 2018 at 06:09:56AM +0900, Stafford Horne wrote:
> Hello,
> 
> I will just commit the below, I don't expect it to cause any problem as I am
> maintainer :).
> 
>   https://gcc.gnu.org/ml/gcc/2018-08/msg00216.html
> 
> On Fri, Nov 09, 2018 at 09:44:07PM +0900, Stafford Horne wrote:
> > ChangeLog:
> > 
> > -mm-dd  Stafford Horne  
> > 
> > * MAINTAINERS (CPU Port Maintainers): Add myself for or1k.
> > ---
> >  MAINTAINERS | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index ba8c8040967..cc7d4bddee8 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -90,6 +90,7 @@ nds32 portShiva Chen  
> > 
> >  nios2 port Chung-Lin Tang  
> >  nios2 port Sandra Loosemore
> >  nvptx port Tom de Vries
> > +or1k port  Stafford Horne  
> >  pdp11 port Paul Koning 
> >  powerpcspe portAndrew Jenner   
> > 
> >  riscv port Kito Cheng  
> > -- 
> > 2.17.2

FYI, the actual patch I committed was the below.  I was reminded offline that I
also needed to remove myself from Write After Approval.

* MAINTAINERS (CPU Port Maintainers): Add myself for or1k.
(Write After Approval): Remove myself.
---
 ChangeLog   | 5 +
 MAINTAINERS | 2 +-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/ChangeLog b/ChangeLog
index e0662790934..1d71ed6538e 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,8 @@
+2018-11-10  Stafford Horne  
+
+   * MAINTAINERS (CPU Port Maintainers): Add myself for or1k.
+   (Write After Approval): Remove myself.
+
 2018-11-06  Hafiz Abid Qadeer  
 
* config/iconv.m4 (AM_ICONV_LINK): Don't overwrite CPPFLAGS.
diff --git a/MAINTAINERS b/MAINTAINERS
index ba8c8040967..35a815bad97 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -90,6 +90,7 @@ nds32 portShiva Chen  

 nios2 port Chung-Lin Tang  
 nios2 port Sandra Loosemore
 nvptx port Tom de Vries
+or1k port  Stafford Horne  
 pdp11 port Paul Koning 
 powerpcspe portAndrew Jenner   

 riscv port Kito Cheng  
@@ -415,7 +416,6 @@ Stuart Henderson

 Matthew Hiller 
 Kazu Hirata
 Manfred Hollstein  
-Stafford Horne 
 Cong Hou   
 Falk Hueffner  
 Andrew John Hughes 
-- 
2.17.2



Re: [PATCH] combine: Do not combine moves from hard registers

2018-11-09 Thread Jeff Law
On 11/8/18 1:34 PM, Segher Boessenkool wrote:
> On Thu, Nov 08, 2018 at 03:44:44PM +, Sam Tebbs wrote:
>> Does your patch fix the incorrect generation of "scvtf s1, s1"? I was
>> looking at the issue as well and don't want to do any overlapping work.
> 
> I don't know.  Well, there are no incorrect code issues I know of at all
> now; but you mean that it is taking an instruction more than you would
> like to see, I suppose?
Which is ultimately similar to the 3 regressions HJ has reported on x86_64.

jeff


Re: [PATCH] MAINTAINERS: add myself as or1k maintainer

2018-11-09 Thread Stafford Horne
Hello,

I will just commit the below, I don't expect it to cause any problem as I am
maintainer :).

  https://gcc.gnu.org/ml/gcc/2018-08/msg00216.html

On Fri, Nov 09, 2018 at 09:44:07PM +0900, Stafford Horne wrote:
> ChangeLog:
> 
> -mm-dd  Stafford Horne  
> 
>   * MAINTAINERS (CPU Port Maintainers): Add myself for or1k.
> ---
>  MAINTAINERS | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index ba8c8040967..cc7d4bddee8 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -90,6 +90,7 @@ nds32 port  Shiva Chen  
> 
>  nios2 port   Chung-Lin Tang  
>  nios2 port   Sandra Loosemore
>  nvptx port   Tom de Vries
> +or1k portStafford Horne  
>  pdp11 port   Paul Koning 
>  powerpcspe port  Andrew Jenner   
> 
>  riscv port   Kito Cheng  
> -- 
> 2.17.2
> 


Re: [PATCH] -fsave-optimization-record: compress the output using zlib

2018-11-09 Thread Jeff Law
On 11/9/18 10:51 AM, David Malcolm wrote:
> One of the concerns noted at Cauldron about -fsave-optimization-record
> was the size of the output files.
> 
> This file implements compression of the -fsave-optimization-record
> output, using zlib.
> 
> I did some before/after testing of this patch, using SPEC 2017's
> 502.gcc_r with -O3, looking at the sizes of the generated
> FILENAME.opt-record.json[.gz] files.
> 
> The largest file was for insn-attrtab.c:
>   before:  171736285 bytes (164M)
>   after: 5304015 bytes (5.1M)
> 
> Smallest file was for vasprintf.c:
>   before:  30567 bytes
>   after:4485 bytes
> 
> Median file by size before was lambda-mat.c:
>   before:2266738 bytes (2.2M)
>   after:   75988 bytes (15K)
> 
> Total of all files in the benchmark:
>   before: 2041720713 bytes (1.9G)
>   after:66870770 bytes (63.8M)
> 
> ...so clearly compression is a big win in terms of file size, at the
> cost of making the files slightly more awkward to work with. [1]
> I also wonder if we want to support any pre-filtering of the output
> (FWIW roughly half of the biggest file seems to be "Adding assert for "
> messages from tree-vrp.c).
> 
> Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> 
> OK for trunk?
> 
> [1] I've updated my optrecord.py module to deal with this, which
> simplifies things; it's still not clear to me if that should live
> in "contrib/" or not.
> 
> gcc/ChangeLog:
>   * doc/invoke.texi (-fsave-optimization-record): Note that the
>   output is compressed.
>   * optinfo-emit-json.cc: Include <zlib.h>.
>   (optrecord_json_writer::write): Compress the output.
OK.
jeff
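
For anyone inspecting the compressed output by hand, nothing GCC-specific
is needed to read it back, assuming the .gz suffix means a regular gzip
stream: zlib's gzFile API is enough.  A minimal sketch (the file name below
is only an example, and the JSON itself is not interpreted here):

/* Build with: g++ dump-opt-record.cc -lz   (illustrative sketch only).  */
#include <zlib.h>
#include <cstdio>

int
main (int argc, char **argv)
{
  const char *path = argc > 1 ? argv[1] : "example.c.opt-record.json.gz";
  gzFile f = gzopen (path, "rb");
  if (!f)
    {
      std::fprintf (stderr, "cannot open %s\n", path);
      return 1;
    }
  char buf[4096];
  int n;
  /* gzread decompresses transparently; the loop just streams the JSON
     text to stdout.  */
  while ((n = gzread (f, buf, sizeof buf)) > 0)
    std::fwrite (buf, 1, n, stdout);
  gzclose (f);
  return 0;
}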


Re: [PATCH] Add value_range_base (w/o equivs)

2018-11-09 Thread Jeff Law
On 11/9/18 7:19 AM, Richard Biener wrote:
> 
> This adds value_range_base, a base class of class value_range
> with all members but the m_equiv one.
> 
> I have looked into the sole GC user, IPA propagation, and replaced
> the value_range use there with value_range_base since it also
> asserted the equiv member is always NULL.
> 
> This in turn means I have written down that GC users only can
> use value_range_base (and fixed the accessability issue with
> adding a bunch of friends).
> 
> I have moved all methods that are required for this single user
> sofar (without looking with others are trivially movable because
> they do not look at equivs - that's for a followup).  I ended
> up basically duplicating the ::union_ wrapper around union_ranges
> (boo).  Some more refactoring might save us a few lines here.
> 
> There are two places where IPA prop calls extract_range_from_unary_expr.
> I've temporarily made it use value_range temporaries there given
> I need to think about the few cases of ->deep_copy () we have in
> there.  IMHO equiv handling would need to move to the callers or
> rather "copies" shouldn't be handled by extract_range_from_unary_expr.
> Well, I need to think about it.  (no, I don't want to make that
> function virtual)
> 
> The other workers might be even easier to massage to work on 
> value_range_base only.
> 
> Bootstrap and regtest running on x86_64-unknown-linux-gnu.
> 
> If that goes well I want to apply this even in its incomplete form.
> Unless there are any complaints?
> 
> Thanks,
> Richard.
> 
> From 525e2e74bbb666399fafcffe8f17e928f45ca898 Mon Sep 17 00:00:00 2001
> From: Richard Guenther 
> Date: Fri, 9 Nov 2018 15:00:50 +0100
> Subject: [PATCH] add-value_range_base
> 
>   * tree-vrp.h (class value_range_base): New base class for
>   value_range containing all but the m_equiv member.
>   (dump_value_range_base): Add.
>   (range_includes_zero_p): Work on value_range_base.
>   * tree-vrp.c (value_range_base::set): Split out base handling
>   from...
>   (value_range::set): this.
>   (value_range::set_equiv): New.
>   (value_range_base::value_range_base): New constructors.
>   (value_range_base::check): Split out base handling from...
>   (value_range::check): this.
>   (value_range::equal_p): Refactor in terms of
>   ignore_equivs_equal_p which is now member of the base.
>   (value_range_base::set_undefined): New.
>   (value_range_base::set_varying): Likewise.
>   (value_range_base::dump): Split out base handling from...
>   (value_range::dump): this.
>   (value_range_base::set_and_canonicalize): Split out base handling
>   from...
>   (value_range::set_and_canonicalize): this.
>   (value_range_base::union_): New.
> 
>   * ipa-prop.h (struct ipa_jump_func): Use value_range_base *
>   for m_vr.
>   * ipa-cp.c (class ipcp_vr_lattice): Use value_range_base
>   instead of value_range everywhere.
>   (ipcp_vr_lattice::print): Use dump_value_range_base.
>   (ipcp_vr_lattice::meet_with): Adjust.
>   (ipcp_vr_lattice::meet_with_1): Likewise.
>   (ipa_vr_operation_and_type_effects): Likewise.
>   (propagate_vr_across_jump_function): Likewise.
>   * ipa-prop.c (struct ipa_vr_ggc_hash_traits): Likewise.
>   (ipa_get_value_range): Likewise.
>   (ipa_set_jfunc_vr): Likewise.
>   (ipa_compute_jump_functions_for_edge): Likewise.
I like it.  Hell, I wish someone had suggested it last year, I could
have taken care of it then.

jeff
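
For reference, the shape of the change is the classic "pull the common
members into a base class" refactoring.  A heavily simplified,
self-contained sketch (the typedefs and member list below are illustrative
stand-ins, not the actual tree-vrp.h contents):

/* Stand-ins for GCC's real types, only so the sketch compiles on its own.  */
typedef void *tree;
typedef void *bitmap;

enum value_range_kind { VR_UNDEFINED, VR_RANGE, VR_ANTI_RANGE, VR_VARYING };

/* Everything except the equivalence set lives in the base, so GC users
   such as IPA propagation can store this part directly.  */
class value_range_base
{
public:
  void set_undefined () { m_kind = VR_UNDEFINED; m_min = m_max = nullptr; }
  void set_varying () { m_kind = VR_VARYING; m_min = m_max = nullptr; }
protected:
  value_range_kind m_kind = VR_UNDEFINED;
  tree m_min = nullptr;
  tree m_max = nullptr;
};

/* Only the derived class carries m_equiv, so only the VRP-style users that
   actually track equivalences pay for it.  */
class value_range : public value_range_base
{
public:
  void set_equiv (bitmap e) { m_equiv = e; }
private:
  bitmap m_equiv = nullptr;
};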


Re: Ping: [PATCH, testsuite] ignore some "conflicting types for built-in" messages

2018-11-09 Thread Jeff Law
On 11/9/18 7:21 AM, Paul Koning wrote:
> Ping.
> 
> I'd like to commit this.  The discussion seems to have ended up with the 
> conclusion that this is a reasonable approach.
> 
>   paul
> 
> 
>> On Nov 1, 2018, at 3:13 PM, Paul Koning  wrote:
>>
>> A number of test cases contain declarations like:
>>  void *memcpy();
>> which currently are silently accepted on most platforms but not on all; 
>> pdp11 (and possibly some others) generate a "conflicting types for built-in 
>> function" warning.
>>
>> It was suggested to prune those messages because the test cases where these 
>> occur are not looking for the message but are testing some other issue, so 
>> the message is not relevant.  The attached patch adds dg-prune-output 
>> directives to do so.
>>
>> Ok for trunk?
>>
>>  paul
>>
>> ChangeLog:
>>
>> 2018-11-01  Paul Koning  
>>
>>  * gcc.dg/Walloca-16.c: Ignore conflicting types for built-in
>>  warnings.
>>  * gcc.dg/Wrestrict-4.c: Ditto.
>>  * gcc.dg/Wrestrict-5.c: Ditto.
>>  * gcc.dg/pr83463.c: Ditto.
>>  * gcc.dg/torture/pr55890-2.c: Ditto.
>>  * gcc.dg/torture/pr55890-3.c: Ditto.
>>  * gcc.dg/torture/pr71816.c: Ditto.
OK.  Thanks for being patient with the round and round on this one.

jeff


Re: [PATCH, libgfortran] Replace sync builtins with atomic builtins

2018-11-09 Thread Jakub Jelinek
On Fri, Nov 09, 2018 at 01:13:28PM +0200, Janne Blomqvist wrote:
> The old __sync builtins have been deprecated for a long time now in
> favor of the __atomic builtins following the C++11/C11 memory model.
> This patch converts libgfortran to use the modern __atomic builtins.
> 
> At the same time I weakened the consistency to acquire-release as that
> should be enough for this particular case.  Jakub, as the original
> author of the algorithm, do you concur?

As mentioned on IRC, I think this could use just __ATOMIC_RELAXED,
but am not 100% sure.

> 2018-11-09  Janne Blomqvist  
> 
>   * acinclude.m4 (LIBGFOR_CHECK_ATOMIC_FETCH_ADD): Rename and test
>   presence of atomic builtins instead of sync builtins.
>   * configure.ac (LIBGFOR_CHECK_ATOMIC_FETCH_ADD): Call new test.
>   * io/io.h (inc_waiting_locked): Use __atomic_fetch_add.
>   (predec_waiting_locked): Use __atomic_add_fetch.
>   (dec_waiting_unlocked): Use __atomic_fetch_add.
>   * config.h.in: Regenerated.
>   * configure: Regenerated.

Jakub
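
For reference, the flavour of change under review, reduced to a stand-alone
sketch (the counter and the memory orders below are illustrative, not
libgfortran's actual code):

static int waiting_users;

void
inc_waiting_old (void)
{
  /* Deprecated __sync builtin: always a full sequentially-consistent
     barrier.  */
  __sync_fetch_and_add (&waiting_users, 1);
}

void
inc_waiting_new (void)
{
  /* C11-style __atomic builtin: the ordering is explicit and can be
     weakened (acquire-release here, or even relaxed as discussed above)
     when the surrounding locking already provides the ordering.  */
  __atomic_fetch_add (&waiting_users, 1, __ATOMIC_ACQ_REL);
}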


Re: [PATCH] Fix PIE on netbsd (PR target/87221)

2018-11-09 Thread Jeff Law
On 11/9/18 1:59 AM, co...@sdf.org wrote:
> Re-sending because my patch doesn't seem to appear on the archive
> 
> 
> This matches to what netbsd is doing with its own copy of GCC,
> it can be simpler.
> 
> PR target/87221:
> config/netbsd-elf.h (NETBSD_STARTFILE_SPEC): use crtbeginS.o for PIE
> (NETBSD_ENDFILE_SPEC): use crtendS.o for PIE
Thanks.  Installed on the trunk.
jeff


[doc, committed] Clarify defaults for some -fno-xxx options

2018-11-09 Thread Sandra Loosemore
This patch fixes the problems with the docs for the defaults for some 
-fno- options as noted in PR 41179.  I double-checked the behavior 
against the code in opts.c and didn't just base this on trying to 
interpret the previous contorted wording.  :-)


This also addresses part of PR 65703.  My next patch will address the 
other part of that issue, relating to not being able to find the 
documentation for the positive form of options like -fno-defer-pop.


-Sandra
2018-11-09  Sandra Loosemore  

	PR driver/41179
	PR middle-end/65703

	gcc/
	* doc/invoke.texi (Optimize Options): Clarify default behavior
	for -fno-toplevel-reorder, -fno-defer-pop, and -fno-branch-count-reg.
Index: doc/invoke.texi
===
--- doc/invoke.texi	(revision 265990)
+++ doc/invoke.texi	(working copy)
@@ -8062,12 +8062,11 @@ optimizations to be performed is desired
 @table @gcctabopt
 @item -fno-defer-pop
 @opindex fno-defer-pop
-Always pop the arguments to each function call as soon as that function
-returns.  For machines that must pop arguments after a function call,
-the compiler normally lets arguments accumulate on the stack for several
-function calls and pops them all at once.
-
-Disabled at levels @option{-O}, @option{-O2}, @option{-O3}, @option{-Os}.
+For machines that must pop arguments after a function call, always pop 
+the arguments as soon as each function returns.  
+At levels @option{-O1} and higher, @option{-fdefer-pop} is the default;
+this allows the compiler to let arguments accumulate on the stack for several
+function calls and pop them all at once.
 
 @item -fforward-propagate
 @opindex fforward-propagate
@@ -8284,18 +8283,16 @@ life-range analysis.  This option is eff
 
 @item -fno-branch-count-reg
 @opindex fno-branch-count-reg
-Avoid running a pass scanning for opportunities to use ``decrement and
-branch'' instructions on a count register instead of generating sequences
-of instructions that decrement a register, compare it against zero, and
+Disable the optimization pass that scans for opportunities to use 
+``decrement and branch'' instructions on a count register instead of
+instruction sequences that decrement a register, compare it against zero, and
 then branch based upon the result.  This option is only meaningful on
 architectures that support such instructions, which include x86, PowerPC,
 IA-64 and S/390.  Note that the @option{-fno-branch-count-reg} option
 doesn't remove the decrement and branch instructions from the generated
 instruction stream introduced by other optimization passes.
 
-Enabled by default at @option{-O1} and higher.
-
-The default is @option{-fbranch-count-reg}.
+The default is @option{-fbranch-count-reg} at @option{-O1} and higher.
 
 @item -fno-function-cse
 @opindex fno-function-cse
@@ -9684,9 +9681,10 @@ are not removed.  This option is intende
 that relies on a particular ordering.  For new code, it is better to
 use attributes when possible.
 
-Enabled at level @option{-O0}.  When disabled explicitly, it also implies
-@option{-fno-section-anchors}, which is otherwise enabled at @option{-O0} on some
-targets.
+@option{-ftoplevel-reorder} is the default at @option{-O1} and higher, and
+also at @option{-O0} if @option{-fsection-anchors} is explicitly requested.
+Additionally @option{-fno-toplevel-reorder} implies
+@option{-fno-section-anchors}.
 
 @item -fweb
 @opindex fweb


[committed] Change parsing of various expressions in OpenMP clauses to assignment-expression

2018-11-09 Thread Jakub Jelinek
Hi!

As mentioned recently, in OpenMP 5.0 the general rule is that the grammar
has assignment-expressions inside of clauses, unless otherwise specified
(e.g. in array sections, array shape operator).

In some cases we were already doing that; in other cases we parsed the
expression non-terminal instead, or whatever is accepted in an if
condition.  This changes it to match the standard.
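
Concretely, the visible difference is at a top-level comma.  A sketch (not
from the testsuite; compile with -fopenmp):

void
f (int a, int b)
{
  /* The clause argument is a single assignment-expression: fine.  */
  #pragma omp parallel num_threads (a + 1)
  { }

  /* With assignment-expression parsing, a bare top-level comma as in
       #pragma omp parallel num_threads (a, b)
     is no longer accepted.  If comma-operator semantics are really wanted,
     extra parentheses turn the argument back into a single
     assignment-expression.  */
  #pragma omp parallel num_threads ((a, b))
  { }
}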

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

Thomas, if OpenACC needs to do something different in some cases, we'll need
is_omp arguments where the same parsing routine is used for both.  And I
haven't touched any of the OpenACC-only parsing routines.

2018-11-09  Jakub Jelinek  

c/
* c-parser.c (c_parser_omp_clause_final): Use
c_parser_expr_no_commas, convert_lvalue_to_rvalue,
c_objc_common_truthvalue_conversion, c_fully_fold and parentheses
parsing instead of c_parser_paren_condition.
(c_parser_omp_clause_if): Use c_parser_expr_no_commas,
convert_lvalue_to_rvalue, c_objc_common_truthvalue_conversion and
c_fully_fold instead of c_parser_condition.
(c_parser_omp_clause_num_threads, c_parser_omp_clause_num_tasks,
c_parser_omp_clause_grainsize, c_parser_omp_clause_priority,
c_parser_omp_clause_hint, c_parser_omp_clause_num_teams,
c_parser_omp_clause_thread_limit, c_parser_omp_clause_linear): Use
c_parser_expr_no_commas instead of c_parser_expression.
cp/
* parser.c (cp_parser_omp_clause_final, cp_parser_omp_clause_if): Use
cp_parser_assignment_expression instead of cp_parser_condition.
(cp_parser_omp_clause_num_threads, cp_parser_omp_clause_num_tasks,
cp_parser_omp_clause_grainsize, cp_parser_omp_clause_priority,
cp_parser_omp_clause_num_teams, cp_parser_omp_clause_thread_limit,
cp_parser_omp_clause_linear, cp_parser_omp_clause_device): Use
cp_parser_assignment_expression instead of cp_parser_expression.
(cp_parser_omp_clause_hint): Likewise.  Formatting fix.
testsuite/
* c-c++-common/gomp/clauses-5.c: New test.

--- gcc/c/c-parser.c.jj 2018-11-09 17:17:05.454216557 +0100
+++ gcc/c/c-parser.c2018-11-09 19:32:49.210991843 +0100
@@ -12184,8 +12184,19 @@ c_parser_omp_clause_final (c_parser *par
   location_t loc = c_parser_peek_token (parser)->location;
   if (c_parser_next_token_is (parser, CPP_OPEN_PAREN))
 {
-  tree t = c_parser_paren_condition (parser);
-  tree c;
+  matching_parens parens;
+  tree t, c;
+  if (!parens.require_open (parser))
+   t = error_mark_node;
+  else
+   {
+ location_t eloc = c_parser_peek_token (parser)->location;
+ c_expr expr = c_parser_expr_no_commas (parser, NULL);
+ t = convert_lvalue_to_rvalue (eloc, expr, true, true).value;
+ t = c_objc_common_truthvalue_conversion (eloc, t);
+ t = c_fully_fold (t, false, NULL);
+ parens.skip_until_found_close (parser);
+   }
 
   check_no_duplicate_clause (list, OMP_CLAUSE_FINAL, "final");
 
@@ -12305,7 +12316,11 @@ c_parser_omp_clause_if (c_parser *parser
}
 }
 
-  tree t = c_parser_condition (parser), c;
+  location_t loc = c_parser_peek_token (parser)->location;
+  c_expr expr = c_parser_expr_no_commas (parser, NULL);
+  expr = convert_lvalue_to_rvalue (loc, expr, true, true);
+  tree t = c_objc_common_truthvalue_conversion (loc, expr.value), c;
+  t = c_fully_fold (t, false, NULL);
   parens.skip_until_found_close (parser);
 
   for (c = list; c ; c = OMP_CLAUSE_CHAIN (c))
@@ -12440,7 +12455,7 @@ c_parser_omp_clause_num_threads (c_parse
   if (parens.require_open (parser))
 {
   location_t expr_loc = c_parser_peek_token (parser)->location;
-  c_expr expr = c_parser_expression (parser);
+  c_expr expr = c_parser_expr_no_commas (parser, NULL);
   expr = convert_lvalue_to_rvalue (expr_loc, expr, false, true);
   tree c, t = expr.value;
   t = c_fully_fold (t, false, NULL);
@@ -12486,7 +12501,7 @@ c_parser_omp_clause_num_tasks (c_parser
   if (parens.require_open (parser))
 {
   location_t expr_loc = c_parser_peek_token (parser)->location;
-  c_expr expr = c_parser_expression (parser);
+  c_expr expr = c_parser_expr_no_commas (parser, NULL);
   expr = convert_lvalue_to_rvalue (expr_loc, expr, false, true);
   tree c, t = expr.value;
   t = c_fully_fold (t, false, NULL);
@@ -12532,7 +12547,7 @@ c_parser_omp_clause_grainsize (c_parser
   if (parens.require_open (parser))
 {
   location_t expr_loc = c_parser_peek_token (parser)->location;
-  c_expr expr = c_parser_expression (parser);
+  c_expr expr = c_parser_expr_no_commas (parser, NULL);
   expr = convert_lvalue_to_rvalue (expr_loc, expr, false, true);
   tree c, t = expr.value;
   t = c_fully_fold (t, false, NULL);
@@ -12578,7 +12593,7 @@ c_parser_omp_clause_priority (c_parser *
   if (parens.require_open (parser))
 {
  

[committed] Emit sorry for inscan reduction modifier

2018-11-09 Thread Jakub Jelinek
Hi!

In this case, even the parsing isn't finished (the modifier itself is parsed,
but if it is present there must also be a scan directive in the body, etc.).

This patch emits a sorry.
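
For reference, the construct that now draws the sorry is the OpenMP 5.0
scan pattern, roughly like the following sketch (not one of the new
testcases):

/* Inclusive prefix sum; the syntax is accepted, but GCC 9 reports
   "sorry, unimplemented" instead of silently mishandling it.  */
void
prefix_sum (int *a, int *b, int n)
{
  int sum = 0;
  #pragma omp parallel for reduction (inscan, +: sum)
  for (int i = 0; i < n; i++)
    {
      sum += a[i];
      #pragma omp scan inclusive (sum)
      b[i] = sum;
    }
}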

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2018-11-09  Jakub Jelinek  

* c-parser.c (c_parser_omp_clause_reduction): Call sorry_at on
reduction clause with inscan modifier.

* parser.c (cp_parser_omp_clause_reduction): Call sorry_at on
reduction clause with inscan modifier.

--- gcc/c/c-parser.c.jj 2018-11-09 16:01:53.406548059 +0100
+++ gcc/c/c-parser.c2018-11-09 17:17:05.454216557 +0100
@@ -13280,7 +13280,11 @@ c_parser_omp_clause_reduction (c_parser
  if (strcmp (p, "task") == 0)
task = true;
  else if (strcmp (p, "inscan") == 0)
-   inscan = true;
+   {
+ inscan = true;
+   sorry ("%<inscan%> modifier on %<reduction%> clause "
+"not supported yet");
+   }
  if (task || inscan)
{
  c_parser_consume_token (parser);
--- gcc/cp/parser.c.jj  2018-11-09 16:02:30.582935064 +0100
+++ gcc/cp/parser.c 2018-11-09 17:18:47.887528746 +0100
@@ -33115,7 +33115,11 @@ cp_parser_omp_clause_reduction (cp_parse
  if (strcmp (p, "task") == 0)
task = true;
  else if (strcmp (p, "inscan") == 0)
-   inscan = true;
+   {
+ inscan = true;
+ sorry ("%<inscan%> modifier on %<reduction%> clause "
+"not supported yet");
+   }
  if (task || inscan)
{
  cp_lexer_consume_token (parser->lexer);

Jakub


[committed] Emit sorry on most requires clauses

2018-11-09 Thread Jakub Jelinek
Hi!

Similarly, the parsing here is implemented, but we don't do anything with it
later on (mainly propagate to the runtime library and filter the available
devices to the selected subset).

So, this patch again emits a sorry on it for now.
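
For illustration, the kind of directive that now draws the sorry, next to
the one clause that remains fully handled (sketch only, not a testcase):

/* Parsed, but reported as "sorry, unimplemented" by this patch.  */
#pragma omp requires unified_shared_memory

/* Still accepted and honoured.  */
#pragma omp requires atomic_default_mem_order (seq_cst)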

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2018-11-09  Jakub Jelinek  

* c-parser.c (c_parser_omp_requires): Call sorry_at on requires
clauses other than atomic_default_mem_order.

* parser.c (cp_parser_omp_requires): Call sorry_at on requires
clauses other than atomic_default_mem_order.

* c-c++-common/gomp/requires-1.c: Prune not supported yet messages.
* c-c++-common/gomp/requires-2.c: Likewise.
* c-c++-common/gomp/requires-4.c: Likewise.

--- gcc/c/c-parser.c.jj 2018-11-09 14:23:41.843755140 +0100
+++ gcc/c/c-parser.c2018-11-09 16:01:53.406548059 +0100
@@ -19017,6 +19017,9 @@ c_parser_omp_requires (c_parser *parser)
  return;
}
  if (p)
+   sorry_at (cloc, "%qs clause on %<requires%> directive not "
+   "supported yet", p);
+ if (p)
c_parser_consume_token (parser);
  if (this_req)
{
--- gcc/cp/parser.c.jj  2018-11-09 14:23:57.143502422 +0100
+++ gcc/cp/parser.c 2018-11-09 16:02:30.582935064 +0100
@@ -39235,6 +39235,9 @@ cp_parser_omp_requires (cp_parser *parse
  return false;
}
  if (p)
+   sorry_at (cloc, "%qs clause on %<requires%> directive not "
+   "supported yet", p);
+ if (p)
cp_lexer_consume_token (parser->lexer);
  if (this_req)
{
--- gcc/testsuite/c-c++-common/gomp/requires-1.c.jj 2018-11-08 
18:08:05.638918103 +0100
+++ gcc/testsuite/c-c++-common/gomp/requires-1.c2018-11-09 
16:06:52.232619382 +0100
@@ -13,3 +13,5 @@ foo ()
 i++;
   #pragma omp requries atomic_default_mem_order(seq_cst)
 }
+
+/* { dg-prune-output "not supported yet" } */
--- gcc/testsuite/c-c++-common/gomp/requires-2.c.jj 2018-11-08 
18:08:05.638918103 +0100
+++ gcc/testsuite/c-c++-common/gomp/requires-2.c2018-11-09 
16:47:12.998743476 +0100
@@ -16,3 +16,5 @@ foo ()
 }
 
 #pragma omp requires atomic_default_mem_order (seq_cst)/* { dg-error 
"more than one 'atomic_default_mem_order' clause in a single compilation unit" 
} */
+
+/* { dg-prune-output "not supported yet" } */
--- gcc/testsuite/c-c++-common/gomp/requires-4.c.jj 2018-11-08 
18:08:05.638918103 +0100
+++ gcc/testsuite/c-c++-common/gomp/requires-4.c2018-11-09 
16:47:21.838597944 +0100
@@ -9,3 +9,5 @@ foo (void)
 
 #pragma omp requires unified_address   /* { dg-error "'unified_address' clause 
used lexically after first target construct or offloading API" } */
 #pragma omp requires reverse_offload   /* { dg-error "'reverse_offload' clause 
used lexically after first target construct or offloading API" } */
+
+/* { dg-prune-output "not supported yet" } */

Jakub


[committed] Sorry on lastprivate with conditional modifier

2018-11-09 Thread Jakub Jelinek
Hi!

As mentioned earlier, for this feature the branch had just parsing support
and limited diagnostics, plus preparations on the runtime library side,
not the actual lowering/expansion, which I think will need 2 weeks full
time, so deferring that to GCC 10.  Instead of backing out the parsing, this
emits a sorry.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2018-11-09  Jakub Jelinek  

* gimplify.c (gimplify_scan_omp_clauses): Call sorry_at for valid
but unsupported lastprivate with conditional modifier.

* c-c++-common/gomp/lastprivate-conditional-1.c: New test.
* c-c++-common/gomp/lastprivate-conditional-2.c: New test.

--- gcc/gimplify.c.jj   2018-11-08 18:07:56.327070930 +0100
+++ gcc/gimplify.c  2018-11-09 15:50:30.194813418 +0100
@@ -8065,6 +8065,10 @@ gimplify_scan_omp_clauses (tree *list_p,
"% clause", decl);
  OMP_CLAUSE_LASTPRIVATE_CONDITIONAL (c) = 0;
}
+ if (OMP_CLAUSE_LASTPRIVATE_CONDITIONAL (c))
+   sorry_at (OMP_CLAUSE_LOCATION (c),
+ "% modifier on % clause "
+ "not supported yet");
  if (outer_ctx
  && (outer_ctx->region_type == ORT_COMBINED_PARALLEL
  || ((outer_ctx->region_type & ORT_COMBINED_TEAMS)
--- gcc/testsuite/c-c++-common/gomp/lastprivate-conditional-1.c.jj  
2018-11-09 14:40:24.416193799 +0100
+++ gcc/testsuite/c-c++-common/gomp/lastprivate-conditional-1.c 2018-11-09 
15:55:42.043671395 +0100
@@ -0,0 +1,64 @@
+void
+foo (int *p)
+{
+  int a = -1, b = -1, c = -1, d = -1, e = -1, f = -1, g = -1, h = -1;
+  int i;
+  #pragma omp teams
+  {
+#pragma omp distribute lastprivate (conditional: a) /* { dg-error 
"conditional 'lastprivate' clause on 'distribute' construct" } */
+for (i = 0; i < 32; i++)
+  if (p[i])
+   a = i;
+#pragma omp distribute simd lastprivate (conditional: b) /* { dg-error 
"conditional 'lastprivate' clause on 'distribute' construct" } */
+for (i = 0; i < 32; i++)
+  if (p[i])
+   b = i;
+#pragma omp distribute parallel for lastprivate (conditional: c) /* { 
dg-error "conditional 'lastprivate' clause on 'distribute' construct" } */
+for (i = 0; i < 32; i++)
+  if (p[i])
+   c = i;
+#pragma omp distribute parallel for simd lastprivate (conditional: d) /* { 
dg-error "conditional 'lastprivate' clause on 'distribute' construct" } */
+for (i = 0; i < 32; i++)
+  if (p[i])
+   d = i;
+  }
+  #pragma omp teams distribute parallel for lastprivate (conditional: e) /* { 
dg-error "conditional 'lastprivate' clause on 'distribute' construct" } */
+  for (i = 0; i < 32; i++)
+if (p[i])
+  e = i;
+  #pragma omp parallel
+  {
+#pragma omp master
+#pragma omp taskloop lastprivate (conditional: f) /* { dg-error 
"conditional 'lastprivate' clause on 'taskloop' construct" } */
+for (i = 0; i < 32; i++)
+  if (p[i])
+   f = i;
+#pragma omp master taskloop simd lastprivate (conditional: g) /* { 
dg-error "conditional 'lastprivate' clause on 'taskloop' construct" } */
+for (i = 0; i < 32; i++)
+  if (p[i])
+   g = i;
+  }
+  #pragma omp parallel master taskloop simd lastprivate (conditional: h) /* { 
dg-error "conditional 'lastprivate' clause on 'taskloop' construct" } */
+  for (i = 0; i < 32; i++)
+if (p[i])
+  h = i;
+}
+
+struct S { int a, b; };
+
+void
+bar (int *p)
+{
+  struct S s = { -1, -1 }, t = { 1, 2 };
+  int i;
+  #pragma omp parallel for lastprivate (conditional: s) /* { dg-error 
"non-scalar variable 's' in conditional 'lastprivate' clause" } */
+  for (i = 0; i < 32; i++)
+if (p[i])
+  {
+   struct S u = t;
+   u.b = i;
+   s = u;
+  }
+}
+
+/* { dg-prune-output "not supported yet" } */
--- gcc/testsuite/c-c++-common/gomp/lastprivate-conditional-2.c.jj  
2018-11-09 15:43:31.781711861 +0100
+++ gcc/testsuite/c-c++-common/gomp/lastprivate-conditional-2.c 2018-11-09 
15:52:09.920169067 +0100
@@ -0,0 +1,28 @@
+void
+foo (int *p)
+{
+  int a = -1, b = -1, c = -1, d = -1, e = -1, f = -1, g = -1, h = -1;
+  int i;
+  #pragma omp parallel
+  #pragma omp for lastprivate (conditional: a) /* { dg-message "not supported 
yet" } */
+  for (i = 0; i < 32; i++)
+if (p[i])
+  a = i;
+  #pragma omp simd lastprivate (conditional: b) /* { dg-message "not supported 
yet" } */
+  for (i = 0; i < 32; i++)
+if (p[i])
+  b = i;
+  #pragma omp parallel
+  #pragma omp for simd lastprivate (conditional: c) /* { dg-message "not 
supported yet" } */
+  for (i = 0; i < 32; i++)
+if (p[i])
+  c = i;
+  #pragma omp parallel for lastprivate (conditional: d) /* { dg-message "not 
supported yet" } */
+  for (i = 0; i < 32; i++)
+if (p[i])
+  d = i;
+  #pragma omp parallel for simd lastprivate (conditional: e) /* { dg-message 
"not supported yet" } */
+  for (i = 0; i < 32; i++)
+if (p[i])
+  e = i;
+}


Re: [PATCH] Fix up affinity-fmt.c (was Re: [committed 0/4] (Partial) OpenMP 5.0 support for GCC 9)

2018-11-09 Thread Jakub Jelinek
On Fri, Nov 09, 2018 at 10:23:24AM -0800, Iain Sandoe wrote:
> Works For Me (as amended, bootstrap succeeds):
> 
> display-affinity-1.c:
> libgomp: Affinity not supported on this configuration
> L:0%0>xx.local14527_0x7fffe64ed3c0_0x7fffe64ed3c0_-01 0-7   
> L:0%0>xx.local14527_0x7fffe64ed3c0_0x7fffe64ed3c0_-01 0-7   
> %1
> 00!0!   1!4; 0;01;0;1;0-7
> 00!1!   1!4; 0;01;0;1;0-7
> 00!2!   1!4; 0;01;0;1;0-7
> 00!3!   1!4; 0;01;0;1;0-7
> 
> =

Thanks, I've also successfully bootstrapped/regtested it on x86_64-linux and
i686-linux, verified the display-affinity-1.c output on both and committed
to trunk.  Here it is again because of the missing %:

2018-11-09  Jakub Jelinek  

* affinity-fmt.c: Include inttypes.h if HAVE_INTTYPES_H.
(gomp_display_affinity): Use __builtin_choose_expr to handle
properly handle argument having integral, or pointer or some other
type.  If inttypes.h is available and PRIx64 is defined, use PRIx64
with uint64_t type instead of %llx and unsigned long long.

--- libgomp/affinity-fmt.c.jj   2018-11-08 18:08:01.412987460 +0100
+++ libgomp/affinity-fmt.c  2018-11-09 15:24:52.049169494 +0100
@@ -30,6 +30,9 @@
 #ifdef HAVE_UNISTD_H
 #include <unistd.h>
 #endif
+#ifdef HAVE_INTTYPES_H
+# include <inttypes.h>  /* For PRIx64.  */
+#endif
 #ifdef HAVE_UNAME
 #include <sys/utsname.h>
 #endif
@@ -356,37 +359,42 @@ gomp_display_affinity (char *buffer, siz
  goto do_int;
case 'i':
 #if defined(LIBGOMP_USE_PTHREADS) && defined(__GNUC__)
- /* Handle integral pthread_t.  */
- if (__builtin_classify_type (handle) == 1)
-   {
- char buf[3 * (sizeof (handle) + sizeof (int)) + 4];
-
- if (sizeof (handle) == sizeof (long))
-   sprintf (buf, "0x%lx", (long) handle);
- else if (sizeof (handle) == sizeof (long long))
-   sprintf (buf, "0x%llx", (long long) handle);
- else
-   sprintf (buf, "0x%x", (int) handle);
- gomp_display_num (buffer, size, , zero, right, sz, buf);
- break;
-   }
- /* And pointer pthread_t.  */
- else if (__builtin_classify_type (handle) == 5)
-   {
- char buf[3 * (sizeof (uintptr_t) + sizeof (int)) + 4];
-
- if (sizeof (uintptr_t) == sizeof (long))
-   sprintf (buf, "0x%lx", (long) (uintptr_t) handle);
- else if (sizeof (uintptr_t) == sizeof (long long))
-   sprintf (buf, "0x%llx", (long long) (uintptr_t) handle);
- else
-   sprintf (buf, "0x%x", (int) (uintptr_t) handle);
- gomp_display_num (buffer, size, , zero, right, sz, buf);
- break;
-   }
+ {
+   char buf[3 * (sizeof (handle) + sizeof (uintptr_t) + sizeof (int))
++ 4];
+   /* This macro returns expr unmodified for integral or pointer
+  types and 0 for anything else (e.g. aggregates).  */
+#define gomp_nonaggregate(expr) \
+  __builtin_choose_expr (__builtin_classify_type (expr) == 1   \
+|| __builtin_classify_type (expr) == 5, expr, 0)
+   /* This macro returns expr unmodified for integral types,
+  (uintptr_t) (expr) for pointer types and 0 for anything else
+  (e.g. aggregates).  */
+#define gomp_integral(expr) \
+  __builtin_choose_expr (__builtin_classify_type (expr) == 5,  \
+(uintptr_t) gomp_nonaggregate (expr),  \
+gomp_nonaggregate (expr))
+
+   if (sizeof (gomp_integral (handle)) == sizeof (unsigned long))
+ sprintf (buf, "0x%lx", (unsigned long) gomp_integral (handle));
+#if defined (HAVE_INTTYPES_H) && defined (PRIx64)
+   else if (sizeof (gomp_integral (handle)) == sizeof (uint64_t))
+ sprintf (buf, "0x%" PRIx64, (uint64_t) gomp_integral (handle));
+#else
+   else if (sizeof (gomp_integral (handle))
+== sizeof (unsigned long long))
+ sprintf (buf, "0x%llx",
+  (unsigned long long) gomp_integral (handle));
 #endif
+   else
+ sprintf (buf, "0x%x", (unsigned int) gomp_integral (handle));
+   gomp_display_num (buffer, size, , zero, right, sz, buf);
+   break;
+ }
+#else
  val = 0;
  goto do_int;
+#endif
case 'A':
  if (sz == (size_t) -1)
gomp_display_affinity_place (buffer, size, ,


Jakub


Re: [PATCH v5] S/390: Allow relative addressing of literal pool entries

2018-11-09 Thread Ulrich Weigand
Ilya Leoshkevich wrote:
 
>   PR target/87762
>   * config/s390/s390.c (s390_safe_relative_long_p): New function.
>   (annotate_constant_pool_refs): Skip insns which support
>   relative addressing.
>   (annotate_constant_pool_refs_1): New helper function.
>   (find_constant_pool_ref): Skip insns which support relative
>   addressing.
>   (find_constant_pool_ref_1): New helper function.
>   (replace_constant_pool_ref): Skip insns which support
>   relative addressing.
>   (replace_constant_pool_ref_1): New helper function.
>   (s390_mainpool_start): Adapt to the new signature.
>   (s390_mainpool_finish): Likewise.
>   (s390_chunkify_start): Likewise.
>   (s390_chunkify_finish): Likewise.
>   (pass_s390_early_mach::execute): Likewise.
>   (s390_prologue_plus_offset): Likewise.
>   (s390_emit_prologue): Likewise.
>   (s390_emit_epilogue): Likewise.

This version is OK.

Thanks,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



[PATCH v5] S/390: Allow relative addressing of literal pool entries

2018-11-09 Thread Ilya Leoshkevich
Bootstrapped and regtested on s390x-redhat-linux.

r265490 allowed the compiler to choose in a more flexible way whether to
use the load or the load-address-relative-long (LARL) instruction.  When it
chose LARL for literal pool references, the latter ones were rewritten
by pass_s390_early_mach to use UNSPEC_LTREF, which assumes base register
usage, which in turn is not compatible with LARL.  The end result was an
ICE because of unrecognizable insn.

UNSPEC_LTREF and friends are necessary in order to communicate the
dependency on the base register to pass_sched2.  When relative
addressing is used, no base register is necessary, so in such cases the
rewrite must be avoided.

gcc/ChangeLog:

2018-11-09  Ilya Leoshkevich  

PR target/87762
* config/s390/s390.c (s390_safe_relative_long_p): New function.
(annotate_constant_pool_refs): Skip insns which support
relative addressing.
(annotate_constant_pool_refs_1): New helper function.
(find_constant_pool_ref): Skip insns which support relative
addressing.
(find_constant_pool_ref_1): New helper function.
(replace_constant_pool_ref): Skip insns which support
relative addressing.
(replace_constant_pool_ref_1): New helper function.
(s390_mainpool_start): Adapt to the new signature.
(s390_mainpool_finish): Likewise.
(s390_chunkify_start): Likewise.
(s390_chunkify_finish): Likewise.
(pass_s390_early_mach::execute): Likewise.
(s390_prologue_plus_offset): Likewise.
(s390_emit_prologue): Likewise.
(s390_emit_epilogue): Likewise.
---
 gcc/config/s390/s390.c | 115 -
 1 file changed, 80 insertions(+), 35 deletions(-)

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 1be85727b73..484eba55aa9 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -2731,6 +2731,17 @@ s390_safe_attr_type (rtx_insn *insn)
 return TYPE_NONE;
 }
 
+/* Return attribute relative_long of insn.  */
+
+static bool
+s390_safe_relative_long_p (rtx_insn *insn)
+{
+  if (recog_memoized (insn) >= 0)
+return get_attr_relative_long (insn) == RELATIVE_LONG_YES;
+  else
+return false;
+}
+
 /* Return true if DISP is a valid short displacement.  */
 
 static bool
@@ -8116,11 +8127,8 @@ s390_first_cycle_multipass_dfa_lookahead (void)
   return 4;
 }
 
-/* Annotate every literal pool reference in X by an UNSPEC_LTREF expression.
-   Fix up MEMs as required.  */
-
 static void
-annotate_constant_pool_refs (rtx *x)
+annotate_constant_pool_refs_1 (rtx *x)
 {
   int i, j;
   const char *fmt;
@@ -8199,26 +8207,31 @@ annotate_constant_pool_refs (rtx *x)
 {
   if (fmt[i] == 'e')
{
- annotate_constant_pool_refs ( (*x, i));
+ annotate_constant_pool_refs_1 ( (*x, i));
}
   else if (fmt[i] == 'E')
{
  for (j = 0; j < XVECLEN (*x, i); j++)
-   annotate_constant_pool_refs ( (*x, i, j));
+   annotate_constant_pool_refs_1 ( (*x, i, j));
}
 }
 }
 
-/* Find an annotated literal pool symbol referenced in RTX X,
-   and store it at REF.  Will abort if X contains references to
-   more than one such pool symbol; multiple references to the same
-   symbol are allowed, however.
+/* Annotate every literal pool reference in INSN by an UNSPEC_LTREF expression.
+   Fix up MEMs as required.
+   Skip insns which support relative addressing, because they do not use a base
+   register.  */
 
-   The rtx pointed to by REF must be initialized to NULL_RTX
-   by the caller before calling this routine.  */
+static void
+annotate_constant_pool_refs (rtx_insn *insn)
+{
+  if (s390_safe_relative_long_p (insn))
+return;
+  annotate_constant_pool_refs_1 ( (insn));
+}
 
 static void
-find_constant_pool_ref (rtx x, rtx *ref)
+find_constant_pool_ref_1 (rtx x, rtx *ref)
 {
   int i, j;
   const char *fmt;
@@ -8250,21 +8263,37 @@ find_constant_pool_ref (rtx x, rtx *ref)
 {
   if (fmt[i] == 'e')
{
- find_constant_pool_ref (XEXP (x, i), ref);
+ find_constant_pool_ref_1 (XEXP (x, i), ref);
}
   else if (fmt[i] == 'E')
{
  for (j = 0; j < XVECLEN (x, i); j++)
-   find_constant_pool_ref (XVECEXP (x, i, j), ref);
+   find_constant_pool_ref_1 (XVECEXP (x, i, j), ref);
}
 }
 }
 
-/* Replace every reference to the annotated literal pool
-   symbol REF in X by its base plus OFFSET.  */
+/* Find an annotated literal pool symbol referenced in INSN,
+   and store it at REF.  Will abort if INSN contains references to
+   more than one such pool symbol; multiple references to the same
+   symbol are allowed, however.
+
+   The rtx pointed to by REF must be initialized to NULL_RTX
+   by the caller before calling this routine.
+
+   Skip insns which support relative addressing, because they do not use a base
+   register.  */
+
+static void
+find_constant_pool_ref (rtx_insn 

[committed] Fix minor bugs in loongson_ext2_prefetch_cookie

2018-11-09 Thread Jeff Law

The recent change to mips_loongson_ext2_prefetch_cookie had a couple
minor bugs which showed up in a mips bootstrap.

First it has an unused argument "locality".  Assuming the argument is
mandated by an API/ABI, the easiest way to shut up the compiler for the
unused argument is to drop its name from the function signature, which
is precisely what this patch does.

Second, the compiler thinks we can drop off the end of the function
without returning a value.  Based on my reading we're only going to get
a load or store here, so that can never happen in practice.  So I've put
a gcc_unreachable at the end of the function.

This silences both warnings.  My tester has bootstrapped mipsisa32r2 as
well as built glibc and a linux kernel with this patch.

Installing on the trunk momentarily.

Jeff
diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index e0225c58ffb..b7f3b71d897 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2018-11-09  Jeff Law  
+
+   * config/mips/mips.c (mips_loongson_ext2_prefetch_cookie): Handle
+   unused argument better.  Add gcc_unreachable to silence warning.
+
 2018-11-09  Martin Sebor  
 
PR middle-end/81824
diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index d78e2056ec2..17a2a66956e 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -15155,7 +15155,7 @@ mips_prefetch_cookie (rtx write, rtx locality)
(prefetch for store), other hint just scale to hint = 0 and hint = 1.  */
 
 rtx
-mips_loongson_ext2_prefetch_cookie (rtx write, rtx locality)
+mips_loongson_ext2_prefetch_cookie (rtx write, rtx)
 {
   /* store.  */
   if (INTVAL (write) == 1)
@@ -15164,6 +15164,8 @@ mips_loongson_ext2_prefetch_cookie (rtx write, rtx 
locality)
   /* load.  */
   if (INTVAL (write) == 0)
 return GEN_INT (INTVAL (write));
+
+  gcc_unreachable ();
 }
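
The same two idioms in a generic, stand-alone form (the function below is
made up for illustration; inside GCC the gcc_unreachable () macro plays the
role that __builtin_unreachable () plays here):

/* Only KIND is operative; the second parameter is required by the
   interface but deliberately left unnamed so -Wunused-parameter stays
   quiet.  */
int
classify (int kind, int /* locality */)
{
  if (kind == 0)
    return 10;
  if (kind == 1)
    return 20;
  /* Callers only ever pass 0 or 1; saying so explicitly also silences the
     "control reaches end of non-void function" warning.  */
  __builtin_unreachable ();
}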
 
 


Re: Relocation (= move+destroy)

2018-11-09 Thread Jonathan Wakely

On 25/10/18 13:36 +0100, Jonathan Wakely wrote:

On 25/10/18 14:30 +0200, Marc Glisse wrote:

On Tue, 23 Oct 2018, Jonathan Wakely wrote:


+  template
+inline void
+__relocate_a(_Tp* __dest, _Up* __orig, _Allocator& __alloc)


I find it a little surprising that this overload for single objects
uses the memmove argument ordering (dest, source) but the range overload
below uses the STL ordering (source_begin, source_end, dest).

But I wouldn't be surprised if we're already doing that somewhere that
I've forgotten about.

WOuld it make sense to either rename this overload, or to use
consistent argument ordering for the two __relocate_a overloads?


The functions were not meant as overloads, it just happened that 
I arrived at the same name for both, but it would make perfect 
sense to give them different names. I started from 
__relocate(dest, source) for one element, and later added an 
allocator to it. The other one corresponds to 
__uninitialized_move_a, and naming it __uninitialized_relocate_a 
would be silly since "uninitialized" is included in the 
definition of relocate.


Yes, my first thought was to add "uninitialized" and I rejected it for
that same reason.

I think I'd rather rename than change the order. Do you have 
suggestions? __relocate_range_a?


I was thinking the single object one could be __relocate_1_a with the
_1 being like copy_n, fill_n etc. but that's going to be confusing
with __relocate_a_1 instead!

It seems unfortunate to have to put "range" in the name when no other
algos that work on ranges bother to say that in the name.

Maybe the single object one could be __relocate_single_a? Or
__do_relocate_a? __relocate_obj_a? None of them really makes me happy.


I went with __relocate_object_a, but that's easy to change.


I'm OK with that. If anybody thinks of a better name we can change it.

We may want to specialize / overload __relocate_object_a for some 
specific types in the future, although we would more likely have 
__relocate_object_a call __relocate_object (no allocator) when it 
receives the default allocator, and specialize that one. And this 
should anyway be less common than specializing 
__is_trivially_relocatable.


I just realized that __relocate_a and construct don't take the 
allocator argument in the same position... I can switch if it helps.


No, leaving it at the end is consistent with __uninitialized_copy_a
et al, and it's closer in spirit to them.

OK for trunk, thanks!


Here's the fix for the regression this introduced.

Tested x86_64-linux, committed to trunk.


commit caf2adcb563d7dd62c3de7bb49f528753f958ba3
Author: Jonathan Wakely 
Date:   Fri Nov 9 19:19:16 2018 +

PR libstdc++/87787 fix UBsan error in std::vector

PR libstdc++/87787
* include/bits/stl_uninitialized.h (__relocate_a_1): Do not call
memmove when there's nothing to copy (and pointers could be null).
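
The rule being respected here: memmove and memcpy require valid pointers
even for a zero length, which is exactly what UBsan flags for an empty
vector whose data pointer is null.  A generic sketch of the guard, outside
libstdc++:

#include <cstring>
#include <cstddef>

/* DST and SRC may both be null when COUNT is zero (for instance the
   storage of an empty container), so the call must be skipped rather than
   made with a zero size.  */
void
copy_objects (void *dst, const void *src, std::size_t count, std::size_t size)
{
  if (count > 0)
    std::memmove (dst, src, count * size);
}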

diff --git a/libstdc++-v3/include/bits/stl_uninitialized.h b/libstdc++-v3/include/bits/stl_uninitialized.h
index 94c7e151e29..8839bfdcc90 100644
--- a/libstdc++-v3/include/bits/stl_uninitialized.h
+++ b/libstdc++-v3/include/bits/stl_uninitialized.h
@@ -904,7 +904,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 		   _Tp* __result, allocator<_Up>& __alloc)
 {
   ptrdiff_t __count = __last - __first;
-  __builtin_memmove(__result, __first, __count * sizeof(_Tp));
+  if (__count > 0)
+	__builtin_memmove(__result, __first, __count * sizeof(_Tp));
   return __result + __count;
 }
 


Re: [RFC] support --with-multilib-list=@/path/name

2018-11-09 Thread Palmer Dabbelt

On Fri, 09 Nov 2018 07:07:06 PST (-0800), ol...@adacore.com wrote:

Richard,

Olivier tells me he talked to you briefly at the Cauldron about allowing
custom multilib sets to be configured from custom Makefile fragments
supplied at configure time, and that you were somewhat receptive to the
idea.  I have implemented this as follows, for ARM targets only so far,
using "@/path/name" in the argument to --with-multilib-list to mean
/path/name is to be used as a multilib-list Makefile fragment.

Does this look like an acceptable interface?  Is it ok as far as ARM
maintainers are concerned?

Any objections from maintainers of other targets that support
--with-multilib-list (or from anyone else) to the syntax, to the
implementation, or to extending this possibility to their ports?


We've had requests for this feature from users of the RISC-V port, but haven't 
had time to look into it.  One wrinkle in RISC-V land is that we have a script 
in GCC that generates our fragments, so I'm not sure if this is exactly what we 
want.  It's certainly better than our current "just patch GCC" flow, though!



for  gcc/ChangeLog

* config.gcc (tmake_file): Add /path/name to tmake_file for
each @/path/name in --with-multilib-list on arm-*-* targets.
* configure.ac: Accept full pathnames in tmake_file.
* configure: Rebuilt.
* doc/install.texi (with-multilib-list): Document it.
---
 gcc/config.gcc   |   13 +
 gcc/configure|9 ++---
 gcc/configure.ac |5 -
 gcc/doc/install.texi |   22 +++---
 4 files changed, 38 insertions(+), 11 deletions(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 7578ff03825e..f1363c41f989 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -3998,6 +3998,19 @@ case "${target}" in
aprofile|rmprofile)

tmake_profile_file="arm/t-multilib"
;;
+   @/*)
+   ml=`echo "X$arm_multilib" | sed 
'1s,^X@,,'`
+   if test -f "${ml}"; then
+   tmake_file="${tmake_file} 
${ml}"
+   else
+   echo "Error: ${ml} does not 
exist" >&2
+   exit 1
+   fi
+   ;;
+   @*)
+   echo "Error: multilib config file must 
start with /" >&2
+   exit 1
+   ;;
*)
echo "Error: 
--with-multilib-list=${with_multilib_list} not supported." 1>&2
exit 1
diff --git a/gcc/configure b/gcc/configure
index b814484ea25b..5f15c7a1ff02 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -12244,7 +12244,10 @@ done
 tmake_file_=
 for f in ${tmake_file}
 do
-   if test -f ${srcdir}/config/$f
+   if test -n `echo "X$f" | sed -n '1s,^X/.*,/,p` && test -f "$f"
+   then
+   tmake_file_="${tmake_file_} $f"
+   elif test -f ${srcdir}/config/$f
then
tmake_file_="${tmake_file_} \$(srcdir)/config/$f"
fi
@@ -18572,7 +18575,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 18575 "configure"
+#line 18578 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H

@@ -18678,7 +18681,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 18681 "configure"
+#line 18684 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H

diff --git a/gcc/configure.ac b/gcc/configure.ac
index 59585912556b..99a3e6f8f52f 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -1940,7 +1940,10 @@ done
 tmake_file_=
 for f in ${tmake_file}
 do
-   if test -f ${srcdir}/config/$f
+   if test -n `echo "X$f" | sed -n '1s,^X/.*,/,p` && test -f "$f"
+   then
+   tmake_file_="${tmake_file_} $f"
+   elif test -f ${srcdir}/config/$f
then
tmake_file_="${tmake_file_} \$(srcdir)/config/$f"
fi
diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index be9b07b5d23b..fd19fc590ec8 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -1078,13 +1078,21 @@ values and meaning for each target is given below.
 
 @table @code

 @item arm*-*-*
-@var{list} is a comma separated list of @code{aprofile} and @code{rmprofile}
-to build multilibs for A or R and M architecture profiles 

[committed] Minor testsuite cleanup from summer's v850 work

2018-11-09 Thread Jeff Law

I've had this lying around.  Recent changes in that file reminded me to
push it.

Essentially the v850 pushes arguments for variadic functions rather than
passing them in args, similarly to the nds32.  So we need to guard the
test in a similar manner.

Committed to the trunk.

Jeff
commit cc9a0d23a784738d5121c8eb9f097acb0cf476bf
Author: law 
Date:   Fri Nov 9 20:08:20 2018 +

gcc.dg/torture/stackalign/builtin-apply-2.c: Skip on v850.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@265982 
138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 1556e8a7d19..285d0e1384c 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,7 @@
+2018-11-09  Jeff Law  
+
+   gcc.dg/torture/stackalign/builtin-apply-2.c: Skip on v850.
+
 2018-11-09  Martin Sebor  
 
PR middle-end/81824
diff --git a/gcc/testsuite/gcc.dg/torture/stackalign/builtin-apply-2.c 
b/gcc/testsuite/gcc.dg/torture/stackalign/builtin-apply-2.c
index 3f8d350ba8f..dde2600c4d1 100644
--- a/gcc/testsuite/gcc.dg/torture/stackalign/builtin-apply-2.c
+++ b/gcc/testsuite/gcc.dg/torture/stackalign/builtin-apply-2.c
@@ -10,7 +10,7 @@
avr: Variadic funcs don't pass arguments in registers, while normal funcs
 do.  */
 /* { dg-skip-if "Variadic funcs use different argument passing from normal 
funcs" { arm_hf_eabi || { avr-*-* riscv*-*-* or1k*-*-* } } } */
-/* { dg-skip-if "Variadic funcs have all args on stack. Normal funcs have args 
in registers." { nds32*-*-* } } */
+/* { dg-skip-if "Variadic funcs have all args on stack. Normal funcs have args 
in registers." { nds32*-*-* } { v850*-*-* } } */
 /* { dg-require-effective-target untyped_assembly } */

 


Re: [PATCH] Add sinh(tanh(x)) and cosh(tanh(x)) rules

2018-11-09 Thread Jeff Law
On 11/8/18 6:33 AM, Wilco Dijkstra wrote:
> Hi,
> 
>> But the max. error in sinh/cosh/atanh is less than 2 ULP, with some math
>> libraries.  It could be < 1 ULP, in theory, so sinh(atanh(x)) less than
>> 2 ULP even.
> 
> You can't add ULP errors in general - a tiny difference in the input can 
> make a huge difference in the result if the derivative is > 1. 
> 
> Even with perfect implementations of 0.501ULP on easy functions with
> no large derivatives you could get a 2ULP total error if the perfectly rounded
> and actual result end up rounding in different directions in the 2nd 
> function...
> 
> So you have to measure ULP error since it is quite counter intuitive.
> 
>> And signed zeroes.  Yeah.  I think it would have to be
>> flag_unsafe_math_optimizations + some more.
> 
> Indeed.
So we need to give Giuliano some clear guidance on guarding.  This is
out of my area of expertise, so looking to y'all to help here.

jeff
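
One way to put numbers on that argument is to probe the composed
double-precision result against a wider-precision reference.  The following
is only a rough measurement sketch (a real ULP test needs a correctly
rounded reference such as MPFR), not something proposed for the patch:

#include <cmath>
#include <cstdio>
#include <random>

int
main ()
{
  std::mt19937_64 gen (42);
  std::uniform_real_distribution<double> dist (-0.999, 0.999);
  double worst = 0.0;
  for (int i = 0; i < 1000000; i++)
    {
      double x = dist (gen);
      /* The composition whose total error is being discussed.  */
      double got = std::sinh (std::atanh (x));
      long double ref = std::sinh (std::atanh ((long double) x));
      /* Approximate 1 ulp as the spacing just above |got|.  */
      double ulp = std::nextafter (std::fabs (got), INFINITY) - std::fabs (got);
      double err = (double) (std::fabs ((long double) got - ref) / ulp);
      if (err > worst)
        worst = err;
    }
  std::printf ("max observed error: %.2f ulp\n", worst);
}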


Re: [PATCH] Fix PR 86572

2018-11-09 Thread Jeff Law
On 11/5/18 4:20 PM, Bernd Edlinger wrote:
> On 11/5/18 1:28 AM, H.J. Lu wrote:
>> On Sun, Nov 4, 2018 at 10:02 AM Jeff Law  wrote:
>>> On 10/22/18 9:08 AM, Bernd Edlinger wrote:
 Hi!

 This makes c_strlen avoid an unsafe strlen folding of const arguments
 with non-const offset.  Currently a negative out of bounds offset
 makes the strlen function return an extremely large number, and
 at the same time prevents the VRP machinery from determining the correct
 range of the strlen function in this case.

 Fixed by doing the whole computation in size_t and casting the
 result back to ssize_t.


 Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
 Is it OK for trunk?


 Thanks
 Bernd.


 patch-pr86572.diff

 gcc:
 2018-10-22  Bernd Edlinger  

PR tree-optimization/86572
* builtins.c (c_strlen): Handle negative offsets in a safe way.

 testsuite:
 2018-10-22  Bernd Edlinger  

PR tree-optimization/86572
* gcc.dg/pr86572.c: New test.
>>> OK.
>>> jeff
>> This caused:
>>
>> /export/gnu/import/git/gcc-test-ia32/src-trunk/gcc/testsuite/gcc.dg/warn-strlen-no-nul.c:56:1:
>> internal compiler error: verify_gimple failed^M
>> 0x8922dc4 verify_gimple_in_seq(gimple*)^M
>>  ../../src-trunk/gcc/tree-cfg.c:5082^M
>> 0x86899d7 gimplify_body(tree_node*, bool)^M
>>  ../../src-trunk/gcc/gimplify.c:12859^M
>> 0x8689b8b gimplify_function_tree(tree_node*)^M
>>  ../../src-trunk/gcc/gimplify.c:12949^M
>> 0x84f7690 cgraph_node::analyze()^M
>>  ../../src-trunk/gcc/cgraphunit.c:667^M
>> 0x84fa1d8 analyze_functions^M
>>  ../../src-trunk/gcc/cgraphunit.c:1126^M
>> 0x84fadd3 symbol_table::finalize_compilation_unit()^M
>>  ../../src-trunk/gcc/cgraphunit.c:2833^M
>> Please submit a full bug report,^M
>> with preprocessed source if appropriate.^M
>> Please include the complete backtrace with any bug report.^M
>> See  for instructions.^M
>> compiler exited with status 1
>> FAIL: gcc.dg/warn-strlen-no-nul.c (internal compiler error)
>>
>> on i386.
>>
> Ah yes thanks.
> 
> This is caused by an incorrect folding in string_constant.
> After stripping the type casts in the POINTER_PLUS_EXPR
> we add the offset which is sizetype to what is left
> over from arg1, which is probably even a correctness issue,
> if the type cast was a narrowing one.
> 
> Bootstrapped and reg-tested on x86_64-pc-linux-gnu (this time
> tested with RUNTESTFLAGS="--target_board=unix\{-m32,\}")
> Is it OK for trunk?
> 
> 
> Thanks
> Bernd.
> 
> 
> 
> patch-fix-string-cst.diff
> 
> 2018-11-05  Bernd Edlinger  
> 
>   * expr.c (string_constant): Don't strip NOPS in subexpressions.
>   Fold PLUS_EXPR correctly.
OK.
jeff


Re: [PATCH] add simple attribute introspection

2018-11-09 Thread Jeff Law
On 10/31/18 10:27 AM, Martin Sebor wrote:
> Ping: https://gcc.gnu.org/ml/gcc-patches/2018-10/msg01473.html
> 
> With the C++ bits approved I'm still looking for a review/approval
> of the remaining changes: the C front end and the shared c-family
> handling of the new built-in.
I thought I acked those with just a couple minor doc nits:

> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index 8ffb0cd..dcf4747 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -2649,8 +2649,9 @@ explicit @code{externally_visible} attributes
are still necessary.
>  @cindex @code{flatten} function attribute
>  Generally, inlining into a function is limited.  For a function
marked with
>  this attribute, every call inside this function is inlined, if possible.
> -Whether the function itself is considered for inlining depends on its
size and
> -the current inlining parameters.
> +Functions declared with attribute @code{noinline} and similar are not
> +inlined.  Whether the function itself is considered for inlining depends
> +on its size and the current inlining parameters.
Guessing this was from another doc patch that I think has already been
approved


> @@ -11726,6 +11728,33 @@ check its compatibility with @var{size}.
>
>  @end deftypefn
>
> +@deftypefn {Built-in Function} bool __builtin_has_attribute
(@var{type-or-expression}, @var{attribute})
> +The @code{__builtin_has_attribute} function evaluates to an integer
constant
> +expression equal to @code{true} if the symbol or type referenced by
> +the @var{type-or-expression} argument has been declared with
> +the @var{attribute} referenced by the second argument.  Neither argument
> +is valuated.  The @var{type-or-expression} argument is subject to the
same
s/valuated/evaluated/ ?


Did the implementation change significantly requiring another review
iteration?

jeff
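
For reference, the intended usage pattern, based on the documentation text
quoted above (the variable name and alignment are arbitrary examples):

/* Declared with an explicit attribute...  */
int counters[4] __attribute__ ((aligned (8)));

/* ...so the built-in yields an integer constant expression equal to true,
   without evaluating either argument, and can feed a static_assert.  */
static_assert (__builtin_has_attribute (counters, aligned),
               "counters should carry an explicit alignment attribute");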


Re: [PATCH 21/25] GCN Back-end (part 2/2).

2018-11-09 Thread Jeff Law
On 9/5/18 7:42 AM, Andrew Stubbs wrote:
> This part initially failed to send due to size.
> 
> Here's part 2.
> 
> 0021-gcn-port-pt2.patch
[ ... ]
You've already addressed Joseph's comments in a follow-up.


> 
> diff --git a/gcc/config/gcn/gcn.c b/gcc/config/gcn/gcn.c
> new file mode 100644
> index 000..7e59b06
> --- /dev/null
> +++ b/gcc/config/gcn/gcn.c
> @@ -0,0 +1,6161 @@
> +/* Copyright (C) 2016-2018 Free Software Foundation, Inc.
> +
> +   This file is free software; you can redistribute it and/or modify it under
> +   the terms of the GNU General Public License as published by the Free
> +   Software Foundation; either version 3 of the License, or (at your option)
> +   any later version.
> +
> +   This file is distributed in the hope that it will be useful, but WITHOUT
> +   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +   FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +   for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with GCC; see the file COPYING3.  If not see
> +   .  */
> +
> +/* {{{ Includes.  */
> +
> +/* We want GET_MODE_SIZE et al to return integers, please.  */

> +
> +static tree
> +gcn_handle_amdgpu_hsa_kernel_attribute (tree *node, tree name,
> + tree args, int, bool *no_add_attrs)
> +{
> +  if (TREE_CODE (*node) != FUNCTION_TYPE
> +  && TREE_CODE (*node) != METHOD_TYPE
> +  && TREE_CODE (*node) != METHOD_TYPE
> +  && TREE_CODE (*node) != FIELD_DECL
> +  && TREE_CODE (*node) != TYPE_DECL)
METHOD_TYPE tested twice here.  Might as well use FUNC_OR_METHOD_TYPE_P.


>> +
> +/* Return true is REG is a valid place to store a pointer,
> +   for instructions that require an SGPR.
> +   FIXME rename. */
> +
> +static bool
> +gcn_address_register_p (rtx reg, machine_mode mode, bool strict)
> +{
> +  if (GET_CODE (reg) == SUBREG)
> +reg = SUBREG_REG (reg);
> +
> +  if (!REG_P (reg))
> +return false;
> +
> +  if (GET_MODE (reg) != mode)
> +return false;
> +
> +  int regno = REGNO (reg);
> +
> +  if (regno >= FIRST_PSEUDO_REGISTER)
> +{
> +  if (!strict)
> + return true;
> +
> +  if (!reg_renumber)
> + return false;
> +
> +  regno = reg_renumber[regno];
> +}
> +
> +  return (regno < 102 || regno == M0_REG
> +   || regno == ARG_POINTER_REGNUM || regno == FRAME_POINTER_REGNUM);
Consider using a symbolic name for "102" :-)



> +
> +/* Generate epilogue.  Called from gen_epilogue during pro_and_epilogue pass.
> +
> +   See gcn_expand_prologue for stack details.  */
> +
> +void
> +gcn_expand_epilogue (void)
You probably need a barrier in here to ensure that the scheduler doesn't
move an aliased memory reference into the local stack beyond the stack
adjustment.

You're less likely to run into it because you eliminate frame pointers
fairly aggressively, but it's still the right thing to do.

> +
> +/* Implement TARGET_LEGITIMATE_COMBINED_INSN.
> +
> +   Return false if the instruction is not appropriate as a combination of two
> +   or more instructions.  */
> +
> +bool
> +gcn_legitimate_combined_insn (rtx_insn *insn)
> +{
> +  rtx pat = PATTERN (insn);
> +
> +  /* The combine pass tends to strip (use (exec)) patterns from insns.  This
> + means it basically switches everything to use the *_scalar form of the
> + instructions, which is not helpful.  So, this function disallows such
> + combinations.  Unfortunately, this also disallows combinations of 
> genuine
> + scalar-only patterns, but those only come from explicit expand code.
> +
> + Possible solutions:
> + - Invent TARGET_LEGITIMIZE_COMBINED_INSN.
> + - Remove all (use (EXEC)) and rely on md_reorg with "exec" attribute.
> +   */
This seems a bit hokey.  Why specifically is combine removing the USE?



> +
> +/* If INSN sets the EXEC register to a constant value, return the value,
> +   otherwise return zero.  */
> +
> +static
> +int64_t gcn_insn_exec_value (rtx_insn *insn)
Nit.  Make sure the function's name is in column 0 by moving the return
type to the previous line.


> +{
> +  if (!NONDEBUG_INSN_P (insn))
> +return 0;
> +
> +  rtx pattern = PATTERN (insn);
> +
> +  if (GET_CODE (pattern) == SET)
> +{
> +  rtx dest = XEXP (pattern, 0);
> +  rtx src = XEXP (pattern, 1);
> +
> +  if (GET_MODE (dest) == DImode
> +   && REG_P (dest) && REGNO (dest) == EXEC_REG
> +   && CONST_INT_P (src))
> + return INTVAL (src);
> +}
> +
> +  return 0;
> +}
> +
> +/* Sets the EXEC register before INSN to the value that it had after
> +   LAST_EXEC_DEF.  The constant value of the EXEC register is returned if
> +   known, otherwise it returns zero.  */
> +
> +static
> +int64_t gcn_restore_exec (rtx_insn *insn, rtx_insn *last_exec_def,
> +   int64_t curr_exec, bool curr_exec_known,
> +   bool _exec_def_saved)
Similarly.  

[PATCH v2] MIPS: Add `-mfix-r5900' option for the R5900 short loop erratum

2018-11-09 Thread Fredrik Noring
The short loop bug under certain conditions causes loops to
execute only once or twice, due to a hardware bug in the R5900 chip.

`-march=r5900' already enables the R5900 short loop workaround.
However, the R5900 ISA and most other MIPS ISAs are mutually
exclusive since R5900-specific instructions are generated as well.

The `-mfix-r5900' option can be used in combination with e.g.
`-mips2' or `-mips3' to generate generic MIPS binaries that also
work with the R5900 target.  The workaround is implemented by GAS
rather than by GCC.

The following small `shortloop.c' file has been used as a test
with GCC 8.2.0:

void shortloop(void)
{
__asm__ __volatile__ (
"   li $3, 300\n"
"loop:\n"
"   addi $3, -1\n"
"   addi $4, -1\n"
"   bne $3, $0, loop\n"
"   li $4, 3\n"
::);
}

The following six combinations have been tested:

% mipsr5900el-unknown-linux-gnu-gcc -O1 -c shortloop.c
% mipsr5900el-unknown-linux-gnu-gcc -O1 -c shortloop.c -mfix-r5900
% mipsr5900el-unknown-linux-gnu-gcc -O1 -c shortloop.c -mno-fix-r5900

% mipsr4000el-unknown-linux-gnu-gcc -O1 -c shortloop.c
% mipsr4000el-unknown-linux-gnu-gcc -O1 -c shortloop.c -mfix-r5900
% mipsr4000el-unknown-linux-gnu-gcc -O1 -c shortloop.c -mno-fix-r5900

The R5900 short loop erratum is corrected in exactly three cases:

1. for the target `mipsr5900el' by default;

2. for the target `mipsr5900el' with `-mfix-r5900';

3. for any other MIPS target (e.g. `mipsr4000el') with `-mfix-r5900'.

In all other cases the correction is not made.

gcc/
* config/mips/mips.c (mips_reorg_process_insns)
(mips_option_override): Handle `-mfix-r5900'.
* config/mips/mips.h (ASM_SPEC): Add `mfix-r5900' and
`mno-fix-r5900'.
* config/mips/mips.opt (mfix-r5900): New option.
* doc/invoke.texi: Document the `r5900' processor name, and
`-mfix-r5900' and `-mno-fix-r5900' options.
---
Thanks a lot for your review, Maciej!

Changes in v2:
- ChangeLog entry corrections
- Rebase to origin/trunk
---
 gcc/config/mips/mips.c   | 14 ++
 gcc/config/mips/mips.h   |  1 +
 gcc/config/mips/mips.opt |  4 
 gcc/doc/invoke.texi  | 14 +-
 4 files changed, 28 insertions(+), 5 deletions(-)

diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index d78e2056ec2..04de85a3ba2 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -18906,13 +18906,13 @@ mips_reorg_process_insns (void)
   if (crtl->profile)
 cfun->machine->all_noreorder_p = false;
 
-  /* Code compiled with -mfix-vr4120, -mfix-rm7000 or -mfix-24k can't be
- all noreorder because we rely on the assembler to work around some
- errata.  The R5900 too has several bugs.  */
+  /* Code compiled with -mfix-vr4120, -mfix-r5900, -mfix-rm7000 or
+ -mfix-24k can't be all noreorder because we rely on the assembler
+ to work around some errata.  The R5900 target has several bugs.  */
   if (TARGET_FIX_VR4120
   || TARGET_FIX_RM7000
   || TARGET_FIX_24K
-  || TARGET_MIPS5900)
+  || TARGET_FIX_R5900)
 cfun->machine->all_noreorder_p = false;
 
   /* The same is true for -mfix-vr4130 if we might generate MFLO or
@@ -20287,6 +20287,12 @@ mips_option_override (void)
   && strcmp (mips_arch_info->name, "r4400") == 0)
 target_flags |= MASK_FIX_R4400;
 
+  /* Default to working around R5900 errata only if the processor
+ was selected explicitly.  */
+  if ((target_flags_explicit & MASK_FIX_R5900) == 0
+  && strcmp (mips_arch_info->name, "r5900") == 0)
+target_flags |= MASK_FIX_R5900;
+
   /* Default to working around R1 errata only if the processor
  was selected explicitly.  */
   if ((target_flags_explicit & MASK_FIX_R1) == 0
diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h
index 11ca364d752..d2205f08972 100644
--- a/gcc/config/mips/mips.h
+++ b/gcc/config/mips/mips.h
@@ -1411,6 +1411,7 @@ struct mips_cpu_info {
 %{mloongson-ext2} %{mno-loongson-ext2} \
 %{msmartmips} %{mno-smartmips} \
 %{mmt} %{mno-mt} \
+%{mfix-r5900} %{mno-fix-r5900} \
 %{mfix-rm7000} %{mno-fix-rm7000} \
 %{mfix-vr4120} %{mfix-vr4130} \
 %{mfix-24k} \
diff --git a/gcc/config/mips/mips.opt b/gcc/config/mips/mips.opt
index 16c33d12e22..75332100435 100644
--- a/gcc/config/mips/mips.opt
+++ b/gcc/config/mips/mips.opt
@@ -165,6 +165,10 @@ mfix-r4400
 Target Report Mask(FIX_R4400)
 Work around certain R4400 errata.
 
+mfix-r5900
+Target Report Mask(FIX_R5900)
+Work around the R5900 short loop erratum.
+
 mfix-rm7000
 Target Report Var(TARGET_FIX_RM7000)
 Work around certain RM7000 errata.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index e071f125efe..44ecc241fd7 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -945,6 +945,7 @@ Objective-C and Objective-C++ Dialects}.
 -mmad  -mno-mad  -mimadd  -mno-imadd  -mfused-madd  -mno-fused-madd  -nocpp 
@gol
 -mfix-24k  -mno-fix-24k @gol
 -mfix-r4000  -mno-fix-r4000  

Re: [PATCH 21/25] GCN Back-end (part 1/2).

2018-11-09 Thread Jeff Law
On 9/5/18 7:40 AM, Andrew Stubbs wrote:
> This part initially failed to send due to size.
> 
> This is the main portion of the GCN back-end, plus the configuration
> adjustments needed to build it.
> 
> The config.sub patch is here so people can try it, but I'm aware that
> needs to
> be committed elsewhere first.
> 
> The back-end contains various bits that support OpenACC and OpenMP, but the
> middle-end and libgomp patches are missing.  I included them here because
> they're harmless and carving up the files seems like unnecessary effort.
>  The
> remaining offload support will be posted at a later date.
> 
> The gcn-run.c is a separate tool that can run a GCN program on a GPU using
> the ROCm drivers and HSA runtime libraries.
> 
> 2018-09-05  Andrew Stubbs  
>> ...Kwok Cheung Yeung  
>> ...Julian Brown  
>> ...Tom de Vries  
>> ...Jan Hubicka  
>> ...Martin Jambor  
> 
>> ...* config.sub: Recognize amdgcn*-*-amdhsa.
>> ...* configure.ac: Likewise.
>> ...* configure: Regenerate.
> 
>> ...gcc/
>> ...* common/config/gcn/gcn-common.c: New file.
>> ...* config.gcc: Add amdgcn*-*-amdhsa configuration.
>> ...* config/gcn/constraints.md: New file.
>> ...* config/gcn/driver-gcn.c: New file.
>> ...* config/gcn/gcn-builtins.def: New file.
>> ...* config/gcn/gcn-hsa.h: New file.
>> ...* config/gcn/gcn-modes.def: New file.
>> ...* config/gcn/gcn-opts.h: New file.
>> ...* config/gcn/gcn-passes.def: New file.
>> ...* config/gcn/gcn-protos.h: New file.
>> ...* config/gcn/gcn-run.c: New file.
>> ...* config/gcn/gcn-tree.c: New file.
>> ...* config/gcn/gcn-valu.md: New file.
>> ...* config/gcn/gcn.c: New file.
>> ...* config/gcn/gcn.h: New file.
>> ...* config/gcn/gcn.md: New file.
>> ...* config/gcn/gcn.opt: New file.
>> ...* config/gcn/mkoffload.c: New file.
>> ...* config/gcn/offload.h: New file.
>> ...* config/gcn/predicates.md: New file.
>> ...* config/gcn/t-gcn-hsa: New file.
> 
> 0021-gcn-port-pt1.patch
> 

> +amdgcn-*-amdhsa)
> + tm_file="dbxelf.h elfos.h gcn/gcn-hsa.h gcn/gcn.h newlib-stdint.h"
Please consider killing dbxelf.h :-)  I assume your default debugging
format is dwarf2, but do you really need to support embedded stabs?



> +
> +/* FIXME: review debug info settings */
> +#define PREFERRED_DEBUGGING_TYPE   DWARF2_DEBUG
> +#define DWARF2_DEBUGGING_INFO  1
> +#define DWARF2_ASM_LINE_DEBUG_INFO 1
> +#define EH_FRAME_THROUGH_COLLECT2  1
These look reasonable.  Essentially you're doing dwarf2 by default.
Maybe just look at EH_FRAME_THROUGH_COLLECT2 more closely to make sure
it still makes sense and isn't a remnant of early port hackery to get
things stumbling along.


> diff --git a/gcc/config/gcn/gcn-run.c b/gcc/config/gcn/gcn-run.c
> new file mode 100644
> index 000..3dea343
> --- /dev/null
> +++ b/gcc/config/gcn/gcn-run.c
I'm going to assume this is largely correct.  It looks like all the glue
code to run kernels on the unit.  It loads the code to be run AFAICT, so
it doesn't need an exception clause as it's not linked against the code
that is to be run IIUC.



> diff --git a/gcc/config/gcn/gcn-tree.c b/gcc/config/gcn/gcn-tree.c
> new file mode 100644
> index 000..0365baf
> --- /dev/null
> +++ b/gcc/config/gcn/gcn-tree.c
> @@ -0,0 +1,715 @@
> +/* Copyright (C) 2017-2018 Free Software Foundation, Inc.
> +
> +   This file is part of GCC.
> +   
> +   GCC is free software; you can redistribute it and/or modify it under
> +   the terms of the GNU General Public License as published by the Free
> +   Software Foundation; either version 3, or (at your option) any later
> +   version.
> +   
> +   GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +   WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +   FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +   for more details.
> +   
> +   You should have received a copy of the GNU General Public License
> +   along with GCC; see the file COPYING3.  If not see
> +   .  */
> +
> +/* {{{ Includes.  */
> +
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "backend.h"
> +#include "target.h"
> +#include "tree.h"
> +#include "gimple.h"
> +#include "tree-pass.h"
> +#include "gimple-iterator.h"
> +#include "cfghooks.h"
> +#include "cfgloop.h"
> +#include "tm_p.h"
> +#include "stringpool.h"
> +#include "fold-const.h"
> +#include "varasm.h"
> +#include "omp-low.h"
> +#include "omp-general.h"
> +#include "internal-fn.h"
> +#include "tree-vrp.h"
> +#include "tree-ssanames.h"
> +#include "tree-ssa-operands.h"
> +#include "gimplify.h"
> +#include "tree-phinodes.h"
> +#include "cgraph.h"
> +#include "targhooks.h"
> +#include "langhooks-def.h"
> +
> +/* }}}  */
> +/* {{{ OMP GCN pass.  */
> +
> +unsigned int
> +execute_omp_gcn (void)
So some documentation about what this pass is supposed to be doing would be helpful here.

libgo patch committed: Fix typo in gccgo name mangling in cgo

2018-11-09 Thread Ian Lance Taylor
This patch by Than McIntosh fixes a typo in cmd/cgo in the gccgo name
mangling recipe.  The code to implement new-style gccgo name mangling
had a recipe that didn't quite match the one in the compiler
(incorrect handling for '.'). This showed up as a failure in the
gotools cgo test if the directory containing the test run included a
"." character.  Bootstrapped and ran Go tests on x86_64-pc-linux-gnu.
Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 265974)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-da8c968474690d1e77442ac3361b2302ea8e1f36
+559fae430b81595efe151222385192a07a9fc3c3
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/go/cmd/cgo/out.go
===
--- libgo/go/cmd/cgo/out.go (revision 265710)
+++ libgo/go/cmd/cgo/out.go (working copy)
@@ -1261,7 +1261,7 @@ func gccgoPkgpathToSymbolNew(ppath strin
 	for _, c := range []byte(ppath) {
 		switch {
 		case 'A' <= c && c <= 'Z', 'a' <= c && c <= 'z',
-			'0' <= c && c <= '9', '_' == c:
+			'0' <= c && c <= '9', c == '_', c == '.':
 			bsl = append(bsl, c)
 		default:
 			changed = true


Re: [PATCH 20/25] GCN libgcc.

2018-11-09 Thread Jeff Law
On 9/5/18 5:52 AM, a...@codesourcery.com wrote:
> This patch contains the GCN port of libgcc.  I've broken it out just to keep
> both parts more manageable.
> 
> We have the usual stuff, plus a "gomp_print" implementation intended to 
> provide
> a means to output text to console without using the full printf.  Originally
> this was because we did not have a working Newlib port, but now it provides 
> the
> underlying mechanism for printf.  It's also much lighter than printf, and
> therefore more suitable for debugging offload kernels (for which there is no
> debugger, yet).
> 
> In order to work in offload kernels the same function must be present in both
> host and GCN toolchains.  Therefore it needs to live in libgomp (hence the
> name).  However, having found it also useful in stand alone testing I have
> moved the GCN implementation to libgcc.
> 
> It was also necessary to provide a means to disable EMUTLS.
> 
> 2018-09-05  Andrew Stubbs  
>   Kwok Cheung Yeung  
>   Julian Brown  
>   Tom de Vries  
> 
>   libgcc/
>   * Makefile.in: Don't add emutls.c when --enable-emutls is "no".
>   * config.host: Recognize amdgcn*-*-amdhsa.
>   * config/gcn/crt0.c: New file.
>   * config/gcn/gomp_print.c: New file.
>   * config/gcn/lib2-divmod-hi.c: New file.
>   * config/gcn/lib2-divmod.c: New file.
>   * config/gcn/lib2-gcn.h: New file.
>   * config/gcn/reduction.c: New file.
>   * config/gcn/sfp-machine.h: New file.
>   * config/gcn/t-amdgcn: New file.
> ---
> 
> 
> 0020-GCN-libgcc.patch
> 
> diff --git a/libgcc/Makefile.in b/libgcc/Makefile.in
> index 0c5b264..6f68257 100644
> --- a/libgcc/Makefile.in
> +++ b/libgcc/Makefile.in
> @@ -429,9 +429,11 @@ LIB2ADD += enable-execute-stack.c
>  # While emutls.c has nothing to do with EH, it is in LIB2ADDEH*
>  # instead of LIB2ADD because that's the way to be sure on some targets
>  # (e.g. *-*-darwin*) only one copy of it is linked.
> +ifneq ($(enable_emutls),no)
>  LIB2ADDEH += $(srcdir)/emutls.c
>  LIB2ADDEHSTATIC += $(srcdir)/emutls.c
>  LIB2ADDEHSHARED += $(srcdir)/emutls.c
> +endif
Why is this needed? Are you just trying to cut out stuff you don't need
in the quest for smaller code or does this cause a more direct problem?


> diff --git a/libgcc/config/gcn/crt0.c b/libgcc/config/gcn/crt0.c
> new file mode 100644
> index 000..f4f367b
> --- /dev/null
> +++ b/libgcc/config/gcn/crt0.c
> @@ -0,0 +1,23 @@
> +/* Copyright (C) 2017 Free Software Foundation, Inc.
> +
> +   This file is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published by the
> +   Free Software Foundation; either version 3, or (at your option) any
> +   later version.
> +
> +   This file is distributed in the hope that it will be useful, but
> +   WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   General Public License for more details.
> +
> +   Under Section 7 of GPL version 3, you are granted additional
> +   permissions described in the GCC Runtime Library Exception, version
> +   3.1, as published by the Free Software Foundation.
> +
> +   You should have received a copy of the GNU General Public License and
> +   a copy of the GCC Runtime Library Exception along with this program;
> +   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> +   .  */
> +
> +/* Provide an entry point symbol to silence a linker warning.  */
> +void _start() {}
This seems wrong.   I realize you're trying to quiet a linker warning
here, but for the case where you're creating GCN executables (testing?)
this should probably be handled by the C-runtime or linker script.
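For instance, a conventional C-runtime entry point would look something like the sketch below (illustrative only; the real GCN kernel entry sequence is presumably more involved, and a linker-script ENTRY() setting may be the better way to silence the warning):

extern int main (void);
extern void exit (int);

void
_start (void)
{
  exit (main ());
}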


> diff --git a/libgcc/config/gcn/gomp_print.c b/libgcc/config/gcn/gomp_print.c
> new file mode 100644
> index 000..41f50c3
> --- /dev/null
> +++ b/libgcc/config/gcn/gomp_print.c
[ ... ]
Would this be better in libgomp?  Oh, you addressed that in the
prologue.  Feels like libgomp would be better to me, but I can
understand the rationale behind wanting it in libgcc.


I won't comment on the static sizes since this apparently has to match
something already in existence.



> +
> +void
> +gomp_print_string (const char *msg, const char *value)
> +{
> +  struct printf_data *output = reserve_print_slot ();
> +  output->type = 2; /* String.  */
> +
> +  strncpy (output->msg, msg, 127);
> +  output->msg[127] = '\0';
> +  strncpy (output->text, value, 127);
> +  output->text[127] = '\0';
> +
> +  asm ("" ::: "memory");
> +  output->written = 1;
> +}
I'm not familiar with the GCN memory model, but your asm is really just
a barrier for the compiler.  Do you need any kind of hardware fencing
here?  Similarly for other instances.
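If a hardware fence does turn out to be necessary, something along these lines would cover both the compiler and the hardware side (sketch; whether GCN actually needs it is exactly the open question):

  /* Make sure the message payload is globally visible before the
     consumer can observe 'written' becoming nonzero.  */
  __atomic_thread_fence (__ATOMIC_RELEASE);
  output->written = 1;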

All these functions probably need a little comment on their purpose and
arguments.

Note some of the divmod stuff recently changed.  You may need minor adjustments there.

Re: [PATCH] MIPS: Add `-mfix-r5900' option for the R5900 short loop erratum

2018-11-09 Thread Fredrik Noring
Many thanks for your prompt review, Maciej!

> > * gcc/config/mips/mips.c (mips_reorg_process_insns)
> >   (mips_option_override): Default to working around R5900
> >   errata only if the processor was selected explicitly.
> 
>  I think this only describes the `mips_option_override' part.  ChangeLog 
> entries are best concise, so how about just:
> 
>   * config/mips/mips.c (mips_reorg_process_insns)
>   (mips_option_override): Handle `-mfix-r5900'.
> 
> ?

Done!

> > * gcc/config/mips/mips.h: Declare `mfix-r5900' and
> >   `mno-fix-r5900'.
> 
>  This has to be:
> 
>   * config/mips/mips.h (ASM_SPEC): Add `mfix-r5900' and 
>   `mno-fix-r5900'.
> 
> as you're adding to the ASM_SPEC definition.

Yes, done!

> Also your patch lines are 
> way off and I had to search for the place this change applies to -- have 
> you been using current upstream trunk with this change?

Hmm... I thought so, but obviously 2 November was a week old. I'm sorry
about that, I have rebased v2 now. [ As mentioned in the commit message,
the test was done with GCC 8.2.0 since there was an unrelated problem
compiling some sanitizer component at the time. I will try to test a
current GCC version during the weekend. ]

> > * gcc/config/mips/mips.opt: Define MASK_FIX_R5900.
> 
>  Likewise:
> 
>   * config/mips/mips.opt (mfix-r5900): New option.
> 
> as it's an entry for `mfix-r5900' that you're adding.

Done!

> > * gcc/doc/invoke.texi: Document the R5900, `mfix-r5900' and
> >   `mno-fix-r5900'.
> 
>  It looks like missing bits, did you mean:
> 
>   * doc/invoke.texi: Document the `r5900' processor name, and 
>   `-mfix-r5900' and `-mno-fix-r5900' options.
> 
> ?

Yes, done!

>  All these entries apply to the gcc/ChangeLog rather than the top-level 
> one, so please drop the leading gcc/ part of the paths, and indicate it at 
> the beginning instead.

Done!

>  Please also fix indentation throughout; the subsequent lines of entries 
> are horizontally aligned with the asterisk above and not indented any 
> further.

Ah, yes, of course.

>  Please resubmit your change with the ChangeLog entry updated.  
> Regrettably I have no authority to approve your submission, but hopefully 
> a maintainer will so that it can make it to GCC 9.

I will post v2 shortly. Let's hope for the best. Thanks again!

Fredrik


Re: [PATCH][RFC] Come up with -flive-patching master option.

2018-11-09 Thread Bernhard Reutner-Fischer
On 9 November 2018 16:33:22 CET, "Martin Liška"  wrote:
>Hi.
>
>After I added 2 new options, I would like to include a new master
>option.
>It's minimal version which only disables optimizations that we are
>aware of
>and can potentially cause problems for live-patching.

I think you attached the wrong patch, AFAICS you attached the gfortran 
vectorized_builtins patch..

thanks,


Re: [RFT PATCH, i386]: Fix PR87928, ICE in ix86_compute_frame_layout

2018-11-09 Thread Iain Sandoe
Hi Uros,

> On 8 Nov 2018, at 23:53, Uros Bizjak  wrote:
> 

> Attached patch fixes PR87928, where we ICE in ix86_compute_frame_layout in


> I will commit the patch to mainline SVN early next week to allow
> Darwin and Ming/Cygwin maintainers some time to test the patch on
> their targets.

Bootstrap succeeds and the new test passes for Darwin.

This does not alter that Darwin has breakage in this area 
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78444)
see : https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00884.html and HJ’s reply.

Essentially, determining any case that demands anything other than 128b 
alignment on Darwin looks hard to reason about***

(I will try to get to more investigation there when time permits)

Iain

*** Quotes from "Mac OS X ABI Function Call Guide”

m32:

The function calling conventions used in the IA-32 environment are the same as 
those used in the System V IA-32 ABI, with the following exceptions: 

   …

• The stack is 16-byte aligned at the point of function calls
…

m64:

Calls out to the x86_64 psABI





Re: Small typo in iconv.m4

2018-11-09 Thread Jeff Law
On 11/9/18 10:57 AM, Eric Gallager wrote:
> On 11/7/18, Jeff Law  wrote:
>> On 11/6/18 9:37 AM, Hafiz Abid Qadeer wrote:
>>> Hi All,
>>> I was investigating a character set related problem with windows hosted
>>> GDB and I tracked it down to a typo in iconv.m4. This typo caused
>>> libiconv detection to fail and related support was not built into gdb.
>>>
>>> The problem is with the following line.
>>> CPPFLAGS="$LIBS $INCICONV"
>>> which should have been
>>> CPPFLAGS="$CPPFLAGS $INCICONV"
>>>
>>> OK to commit the attached patch?
>>>
>>> 2018-11-06  Hafiz Abid Qadeer  
>>>
>>> * config/iconv.m4 (AM_ICONV_LINK): Don't overwrite CPPFLAGS.
>>> Append $INCICONV to it.
>>> * gcc/configure: Regenerate.
>>> * libcpp/configure: Likewise.
>>> * libstdc++-v3/configure: Likewise.
>>> * intl/configure: Likewise.
>>>
>>> Thanks,
>>>
>> Thanks.  I wasn't sure if you had commit privs, so I went ahead and
>> installed the patch.
>>
>> Jeff
>>
> 
> Does this have any effect on GCC bug 78251?
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78251
> Or any of the related bugs under "See Also" for that matter?
Certainly looks related.  Though I think we're still going to see
pollution, just in a slightly different way.

jeff


Re: [PATCH] Fix up affinity-fmt.c (was Re: [committed 0/4] (Partial) OpenMP 5.0 support for GCC 9)

2018-11-09 Thread Iain Sandoe
Hi Jakub,

> On 9 Nov 2018, at 07:40, Jakub Jelinek  wrote:
> 
> On Fri, Nov 09, 2018 at 07:33:44AM -0800, Iain Sandoe wrote:
>>> On 9 Nov 2018, at 06:37, Jakub Jelinek  wrote:
>> 
>>> +#if defined (HAVE_INTTYPES_H) && defined (PRIx64)
>>> +   else if (sizeof (gomp_integral (handle)) == sizeof (uint64_t))
>>> + sprintf (buf, "0x" PRIx64, (uint64_t) gomp_integral (handle));
>> 
>> s/0x/0x%/?
> 
> Oops, consider it adjusted.  Haven't started my bootstrap/regtest of it
> yet…

Works For Me (as amended, bootstrap succeeds):

display-affinity-1.c:
libgomp: Affinity not supported on this configuration
L:0%0>xx.local   xx.local   

Re: [PATCH v2] [aarch64] Correct the maximum shift amount for shifted operands.

2018-11-09 Thread Kyrill Tkachov

Hi Christoph,


On 09/11/18 11:38, christoph.muell...@theobroma-systems.com wrote:

From: Christoph Muellner 

The aarch64 ISA specification allows a left shift amount to be applied
after extension in the range of 0 to 4 (encoded in the imm3 field).

This is true for at least the following instructions:

  * ADD (extend register)
  * ADDS (extended register)
  * SUB (extended register)

The result of this patch can be seen, when compiling the following code:

uint64_t myadd(uint64_t a, uint64_t b)
{
   return a+(((uint8_t)b)<<4);
}

Without the patch the following sequence will be generated:

 <myadd>:
0:  d37c1c21    ubfiz   x1, x1, #4, #8
4:  8b20        add     x0, x1, x0
8:  d65f03c0    ret

With the patch the ubfiz will be merged into the add instruction:

 <myadd>:
0:  8b211000    add     x0, x0, w1, uxtb #4
4:  d65f03c0    ret

Tested with "make check" and no regressions found.

*** gcc/ChangeLog ***

2018-xx-xx  Christoph Muellner  
Philipp Tomsich  

* config/aarch64/aarch64.c (aarch64_uxt_size): Correct the maximum
shift amount for shifted operands.

*** gcc/testsuite/ChangeLog ***

2018-xx-xx  Christoph Muellner  
Philipp Tomsich  

* gcc.target/aarch64/extend.c: Adjust the testcases to cover
the changed shift amount.


Thanks, this looks good to me.
You'll still need a maintainer to approve.

Kyrill



---
  gcc/config/aarch64/aarch64.c  |  2 +-
  gcc/testsuite/gcc.target/aarch64/extend.c | 16 
  2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index c82c7b6..c85988a 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -8190,7 +8190,7 @@ aarch64_output_casesi (rtx *operands)
  int
  aarch64_uxt_size (int shift, HOST_WIDE_INT mask)
  {
-  if (shift >= 0 && shift <= 3)
+  if (shift >= 0 && shift <= 4)
  {
int size;
for (size = 8; size <= 32; size *= 2)
diff --git a/gcc/testsuite/gcc.target/aarch64/extend.c 
b/gcc/testsuite/gcc.target/aarch64/extend.c
index f399e55..7986c5b 100644
--- a/gcc/testsuite/gcc.target/aarch64/extend.c
+++ b/gcc/testsuite/gcc.target/aarch64/extend.c
@@ -32,8 +32,8 @@ ldr_sxtw0 (char *arr, int i)
  unsigned long long
  adddi_uxtw (unsigned long long a, unsigned int i)
  {
-  /* { dg-final { scan-assembler "add\tx\[0-9\]+,.*uxtw #?3" } } */
-  return a + ((unsigned long long)i << 3);
+  /* { dg-final { scan-assembler "add\tx\[0-9\]+,.*uxtw #?4" } } */
+  return a + ((unsigned long long)i << 4);
  }
  
  unsigned long long

@@ -46,8 +46,8 @@ adddi_uxtw0 (unsigned long long a, unsigned int i)
  long long
  adddi_sxtw (long long a, int i)
  {
-  /* { dg-final { scan-assembler "add\tx\[0-9\]+,.*sxtw #?3" } } */
-  return a + ((long long)i << 3);
+  /* { dg-final { scan-assembler "add\tx\[0-9\]+,.*sxtw #?4" } } */
+  return a + ((long long)i << 4);
  }
  
  long long

@@ -60,8 +60,8 @@ adddi_sxtw0 (long long a, int i)
  unsigned long long
  subdi_uxtw (unsigned long long a, unsigned int i)
  {
-  /* { dg-final { scan-assembler "sub\tx\[0-9\]+,.*uxtw #?3" } } */
-  return a - ((unsigned long long)i << 3);
+  /* { dg-final { scan-assembler "sub\tx\[0-9\]+,.*uxtw #?4" } } */
+  return a - ((unsigned long long)i << 4);
  }
  
  unsigned long long

@@ -74,8 +74,8 @@ subdi_uxtw0 (unsigned long long a, unsigned int i)
  long long
  subdi_sxtw (long long a, int i)
  {
-  /* { dg-final { scan-assembler "sub\tx\[0-9\]+,.*sxtw #?3" } } */
-  return a - ((long long)i << 3);
+  /* { dg-final { scan-assembler "sub\tx\[0-9\]+,.*sxtw #?4" } } */
+  return a - ((long long)i << 4);
  }
  
  long long




Re: [PATCH, GCC, ARM] Enable armv8.5-a and add +sb and +predres for previous ARMv8-a in ARM

2018-11-09 Thread Kyrill Tkachov

Hi Sudi,

On 09/11/18 15:33, Sudakshina Das wrote:

Hi

This patch adds -march=armv8.5-a to the Arm backend.
(https://developer.arm.com/products/architecture/cpu-architecture/a-profile/exploration-tools)
Armv8.5-A also adds two new security features:
- Speculation Barrier instruction
- Execution and Data Prediction Restriction Instructions
These are made optional to all older Armv8-A versions. Thus we are
adding two new options "+sb" and "+predres" to all older Armv8-A. These
are passed on to the assembler and have no code generation effects and
have already gone in the trunk of binutils.

Bootstrapped and regression tested with arm-none-linux-gnueabihf.

Is this ok for trunk?
Sudi

*** gcc/ChangeLog ***

2018-xx-xx  Sudakshina Das  

* config/arm/arm-cpus.in (armv8_5, sb, predres): New features.
(ARMv8_5a): New fgroup.
(armv8.5-a): New arch.
(armv8-a, armv8.1-a, armv8.2-a, armv8.3-a, armv8.4-a): New
options sb and predres.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/t-aprofile: Add matching rules for -march=armv8.5-a
* config/arm/t-arm-elf (all_v8_archs): Add armv8.5-a.
* config/arm/t-multilib (v8_5_a_simd_variants): New variable.
Add matching rules for -march=armv8.5-a and extensions.
* doc/invoke.texi (ARM options): Document -march=armv8.5-a.
Add sb and predres to all armv8-a except armv8.5-a.

*** gcc/testsuite/ChangeLog ***

2018-xx-xx  Sudakshina Das  

* gcc.target/arm/multilib.exp: Add some -march=armv8.5-a
combination tests.


Hi

This patch adds -march=armv8.5-a to the Arm backend.
(https://developer.arm.com/products/architecture/cpu-architecture/a-profile/exploration-tools)
Armv8.5-A also adds two new security features:
- Speculation Barrier instruction
- Execution and Data Prediction Restriction Instructions
These are made optional to all older Armv8-A versions. Thus we are
adding two new options "+sb" and "+predres" to all older Armv8-A. These
are passed on to the assembler and have no code generation effects and
have already gone in the trunk of binutils.

Bootstrapped and regression tested with arm-none-linux-gnueabihf.

Is this ok for trunk?
Sudi

*** gcc/ChangeLog ***

2018-xx-xx  Sudakshina Das

* config/arm/arm-cpus.in (armv8_5, sb, predres): New features.
(ARMv8_5a): New fgroup.
(armv8.5-a): New arch.
(armv8-a, armv8.1-a, armv8.2-a, armv8.3-a, armv8.4-a): New
options sb and predres.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/t-aprofile: Add matching rules for -march=armv8.5-a
* config/arm/t-arm-elf (all_v8_archs): Add armv8.5-a.
* config/arm/t-multilib (v8_5_a_simd_variants): New variable.
Add matching rules for -march=armv8.5-a and extensions.
* doc/invoke.texi (ARM options): Document -march=armv8.5-a.
Add sb and predres to all armv8-a except armv8.5-a.

*** gcc/testsuite/ChangeLog ***

2018-xx-xx  Sudakshina Das

* gcc.target/arm/multilib.exp: Add some -march=armv8.5-a
combination tests.



This is ok modulo a typo fix below.

Thanks,
Kyrill



index 
25788ad09851daf41038b1578307bf23b7f34a94..eba038f9d20bc54bef7bdb7fa1c0e7028d954ed7
 100644
--- a/gcc/config/arm/t-multilib
+++ b/gcc/config/arm/t-multilib
@@ -70,7 +70,8 @@ v8_a_simd_variants:= $(call all_feat_combs, simd crypto)
 v8_1_a_simd_variants   := $(call all_feat_combs, simd crypto)
 v8_2_a_simd_variants   := $(call all_feat_combs, simd fp16 fp16fml crypto 
dotprod)
 v8_4_a_simd_variants   := $(call all_feat_combs, simd fp16 crypto)
-v8_r_nosimd_variants   := +crc
+v8_5_a_simd_variants   := $(call all_feat_combs, simd fp16 crypto)
+v8_r_nosimd_variants   := +cr5
 


Typo, should be +crc





Re: [PATCH 2/3] Add a pass to automatically add ptwrite instrumentation

2018-11-09 Thread Andi Kleen
Hi Richard,

On Fri, Nov 09, 2018 at 04:27:22PM +0100, Richard Biener wrote:
> > Passes bootstrap and test suite on x86_64-linux, also
> > bootstrapped and tested gcc itself with full -fvartrace
> > and -fvartrace-locals instrumentation.
> 
> So how is this supposed to be used?  I guess in a
> edit-debug cycle and not for production code?

It can actually be used for production code.

When processor trace is disabled the PTWRITE
instructions acts as nops. So it's only increasing
the code foot print. Since the instrumentation
should only log values which are already computed
it normally doesn't cause any other code.

Even when it is enabled the primary overhead is the
additional memory bandwidth, since the CPU can
do the logging in parallel to other code. As long
as the instrumentation is not so excessive that it generates
too much memory bandwidth, it might actually be
quite reasonable to keep the logging on even for
production code, and use it as a "flight recorder",
which is dumped on failures. 

This would also be the model in gdb, once we have support
in it. You would run the program in the debugger
and it just logs the data to a memory buffer,
but when stopping the value history can be examined.

There's also some ongoing work to add (optional) support
for PT to Linux crash dumps, so eventually that will
work without having to always run the debugger.

Today it can be done by running perf in the background
to record the PT, however there the setup is a bit
more complicated.

The primary use case I was envisioning was to set
the attribute on some critical functions/structures/types
of interest and then have a very overhead logging
option for them (generally cheaper than
equivalent software instrumentation). And then
they automatically gets logged without the programmer
needing to add lots of instrumentation code to 
catch every instance. So think of it as a 
"hardware accelerated printf"

> 
> What do you actually write with PTWRITE?  I suppose
> you need to keep a ID to something mapping somewhere
> so you can make sense of the perf records?

PTWRITE writes 32bit/64bit values. The CPU reports the
IP of PTWRITE in the log, either explicitly or implicitly if branch
trace is enabled too. The IP can then be used to look up
the DWARF scope for that IP. Then the decoder
decodes the operand of PTWRITE and maps it back using 
the dwarf information. So it all works using 
existing debugger infrastructure, and a quite simple
instruction decoder.

I'll clarify that in the description.
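To make that concrete: for a traced store, the emitted instrumentation is roughly equivalent to the hand-written sketch below (illustrative only -- the pass does this at the RTL level, and the inline asm assumes an assembler that knows PTWRITE):

long traced_value;

void
set_traced_value (long v)
{
  traced_value = v;
  /* Log the value into the PT stream; the decoder later maps this
     PTWRITE's IP back to 'traced_value' via the DWARF info.  */
  __asm__ volatile ("ptwrite %0" : : "r" (traced_value));
}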

> 
> Few comments inline below, but I'm not sure if this
> whole thing is interesting for GCC (as opposed to being
> a instrumentation plugin)

I'm biased, but I think automatic data tracing is a very exciting
use case, so hopefully it can be considered for mainstream gcc.

> >   
> > handle_no_profile_instrument_function_attribute,
> >   NULL },
> > @@ -767,6 +775,21 @@ handle_no_sanitize_undefined_attribute (tree *node, 
> > tree name, tree, int,
> >return NULL_TREE;
> >  }
> >
> > +/* Handle "vartrace"/"no_vartrace" attributes; arguments as in
> > +   struct attribute_spec.handler.  */
> > +
> > +static tree
> > +handle_vartrace_attribute (tree *node, tree, tree, int flags,
> > +  bool *)
> > +{
> > +  if (TYPE_P (*node) && !(flags & (int) ATTR_FLAG_TYPE_IN_PLACE))
> > +*node = build_variant_type_copy (*node);
> 
> I don't think you want the attribute on types.  As far as I understood your
> descriptions it should only be on variables and functions.

The idea was that it's possible to trace all instances of a type,
especially structure members. Otherwise it will be harder for
the programmer to hunt down every instance.

For example if I have a structure that is used over a program,
and one member gets the wrong value.

I can do then in the header:

struct foo {
int member __attribute__(("vartrace"));
};

and then recompile the program. Every instance of writing to
member will then be automatically instrumented (assuming
the program stays type safe)

Makes sense?

[BTW I considered adding an address trace
too for pointer writes to hunt down the non type safe
instances and possibly some other use cases. 
That might be possible follow on work]

> > +
> >  #undef TARGET_GIMPLIFY_VA_ARG_EXPR
> >  #define TARGET_GIMPLIFY_VA_ARG_EXPR ix86_gimplify_va_arg
> >
> > diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> > index 1eca009e255..08286aa4591 100644
> > --- a/gcc/doc/extend.texi
> > +++ b/gcc/doc/extend.texi
> > @@ -3193,6 +3193,13 @@ the standard C library can be guaranteed not to 
> > throw an exception
> >  with the notable exceptions of @code{qsort} and @code{bsearch} that
> >  take function pointer arguments.
> >
> > +@item no_vartrace
> > +@cindex @code{no_vartrace} function or variable attribute
> > +Disable data tracing for the function or variable or structured field
> > +marked with this attribute. Applies to types. 

Re: [patch] allow target config to state r18 is fixed on aarch64

2018-11-09 Thread Olivier Hainque
Hello Wilco,

Would you have further thoughts on the patches proposed in 

 https://gcc.gnu.org/ml/gcc-patches/2018-10/msg01453.html

?

There was:

1)  * config/aarch64/aarch64.c (PROBE_STACK_FIRST_REG) : Redefine as
R9_REGNUM instead of 9.
(PROBE_STACK_SECOND_REG): Redefine as R10_REGNUM instead of 10.


2)  * config/aarch64/aarch64.h (FIXED_R18): New internal
configuration macro, defaulted to 0.
(FIXED_REGISTERS): Use it.
(STATIC_CHAIN_REGNUM): Use r11 instead of r18.

and

3)  * config/aarch64/aarch64.c (aarch64_override_options): Once
arch, cpu and tune were validated, insert SUBTARGET_OVERRIDE_OPTIONS
   if defined.



Thanks a lot in advance!

With Kind Regards,

Olivier


> On 23 Oct 2018, at 19:13, Olivier Hainque  wrote:
> 
> Hi Wilco,
> 
>> On 18 Oct 2018, at 19:08, Wilco Dijkstra  wrote:
> 
>>> I wondered if we could set it to R11 unconditionally and picked
>>> the way ensuring no change for !vxworks ports, especially since I
>>> don't have means to test more than what I described above.
>> 
>> Yes it should always be the same register, there is no gain in switching
>> it dynamically. I'd suggest to use X9 since X8 is the last register used for
>> arguments (STATIC_CHAIN_REGNUM is passed when calling a nested
>> function) and some of the higher registers may be used as temporaries in
>> prolog/epilog.
> 
> Thanks for your feedback!  I ported the patches
> to gcc-8 and was able to get a functional toolchain
> for aarch64-wrs-vxworks7 and aarch64-elf, passing
> full Acats for a couple of runtime variants on VxWorks
> (compiled for RTP or kernel mode) as well as a small
> internal testsuite we have, dedicated to cross configurations.
> 
> All the patches apply directly on mainline.
> 
> As for the original patch, I also sanity checked that
> "make all-gcc" passes (self tests included) there for
> --target=aarch64-elf --enable-languages=c
> 
> 
> There are three changes to the common aarch64 port files.
> 
> It turns out that X9 doesn't work for STATIC_CHAIN_REGNUM
> because this conflicts with the registers used for -fstack-check:
> 
>  /* The pair of scratch registers used for stack probing.  */
>  #define PROBE_STACK_FIRST_REG  9
>  #define PROBE_STACK_SECOND_REG 10
> 
> I didn't find that immediately (read, I first experienced a few
> badly crashing test runs) because I searched for R9_REGNUM to check
> for other uses, so the first patchlet attached simply adjusts
> the two #defines above to use R9/R10_REGNUM.
> 
> 
> 2018-10-23  Olivier Hainque  
> 
>* config/aarch64/aarch64.c (PROBE_STACK_FIRST_REG) : Redefine as
>R9_REGNUM instead of 9.
>(PROBE_STACK_SECOND_REG): Redefine as R10_REGNUM instead of 10.
> 
> 
> The second patch is the one which I proposed a few days ago
> to allow a subtarget (in my case, the VxWorks port) to state that
> R18 is to be considered fixed. Two changes compared to the original
> patch: a comment attached to the default definition of FIXED_R18,
> and the unconditional use of R11_REGNUM as an alternate STATIC_CHAIN_REGNUM.
> 
> I suppose the latter could require extra testing than what I was
> able to put in (since this is also changing for !vxworks configurations),
> which Sam very kindly did on the first instance.
> 
> I didn't touch CALL_SAVED_REGISTERS since this is 1 for r18 already.
> I also didn't see a strong reason to move to a more dynamic scheme,
> through conditional_register_usage.
> 
> 2018-03-18  Olivier Hainque  
> 
>   * config/aarch64/aarch64.h (FIXED_R18): New internal
>   configuration macro, defaulted to 0.
>   (FIXED_REGISTERS): Use it.
>   (STATIC_CHAIN_REGNUM): Use r11 instead of r18.
> 
> 
> The third patch proposes the introduction of support for a
> conditional SUBTARGET_OVERRIDE_OPTIONS macro, as many other
> architectures have, and which is needed by all VxWorks ports.
> 
> In the current state, this one could possibly impact only
> VxWorks, as no other config file would define the macro.
> 
> I'm not 100% clear on the possible existence of rules regarding
> the placement of this within the override_options functions. We used
> something similar to what other ports do, and it worked just fine
> for VxWorks.
> 
> 2018-10-23  Olivier Hainque  
> 
>* config/aarch64/aarch64.c (aarch64_override_options): Once
>arch, cpu and tune were validated, insert SUBTARGET_OVERRIDE_OPTIONS
>if defined.
> 
> I'm happy to adjust any of all this if needed of course.
> 
> Thanks in advance for your feedback!
> 
> With Kind Regards,
> 
> Olivier
> 
> 
> <0001-Use-R9-R10_REGNUM-to-designate-stack-checking-regist.patch.txt><0002-Introduce-FIXED_R18-in-aarch64.h.patch.txt><0003-Support-for-SUBTARGET_OVERRIDE_OPTIONS-in-aarch64.c.patch.txt>



Re: [PATCH][cunroll] Add unroll-known-loop-iterations-only param and use it in aarch64

2018-11-09 Thread Kyrill Tkachov

On 09/11/18 12:18, Richard Biener wrote:

On Fri, Nov 9, 2018 at 11:47 AM Kyrill Tkachov
 wrote:


Hi all,

In this testcase the codegen for VLA SVE is worse than it could be due to 
unrolling:

fully_peel_me:
 mov x1, 5
 ptrue   p1.d, all
 whilelo p0.d, xzr, x1
 ld1d    z0.d, p0/z, [x0]
 fadd    z0.d, z0.d, z0.d
 st1d    z0.d, p0, [x0]
 cntd    x2
 addvl   x3, x0, #1
 whilelo p0.d, x2, x1
 beq .L1
 ld1d    z0.d, p0/z, [x0, #1, mul vl]
 fadd    z0.d, z0.d, z0.d
 st1d    z0.d, p0, [x3]
 cntw    x2
 incb    x0, all, mul #2
 whilelo p0.d, x2, x1
 beq .L1
 ld1d    z0.d, p0/z, [x0]
 fadd    z0.d, z0.d, z0.d
 st1d    z0.d, p0, [x0]
.L1:
 ret

In this case, due to the vector-length-agnostic nature of SVE the compiler 
doesn't know the loop iteration count.
For such loops we don't want to unroll if we don't end up eliminating branches 
as this just bloats code size
and hurts icache performance.

This patch introduces a new unroll-known-loop-iterations-only param that 
disables cunroll when the loop iteration
count is unknown (SCEV_NOT_KNOWN). This case occurs much more often for SVE VLA 
code, but it does help some
Advanced SIMD cases as well where loops with an unknown iteration count are not 
unrolled when it doesn't eliminate
the branches.

So for the above testcase we generate now:
fully_peel_me:
 mov x2, 5
 mov x3, x2
 mov x1, 0
 whilelo p0.d, xzr, x2
 ptrue   p1.d, all
.L2:
 ld1d    z0.d, p0/z, [x0, x1, lsl 3]
 fadd    z0.d, z0.d, z0.d
 st1d    z0.d, p0, [x0, x1, lsl 3]
 incd    x1
 whilelo p0.d, x1, x3
 bne .L2
 ret

Not perfect still, but it's preferable to the original code.
The new param is enabled by default on aarch64 but disabled for other targets, 
leaving their behaviour unchanged
(until other target people experiment with it and set it, if appropriate).

Bootstrapped and tested on aarch64-none-linux-gnu.
Benchmarked on SPEC2017 on a Cortex-A57 and there are no differences in 
performance.

Ok for trunk?


Hum.  Why introduce a new --param and not simply key on
flag_peel_loops instead?  That is
enabled by default at -O3 and with FDO but you of course can control
that in your targets
post-option-processing hook.


You mean like this?
It's certainly a simpler patch, but I was just a bit hesitant of making this 
change for all targets :)
But I suppose it's a reasonable change.
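For reference, keying off flag_peel_loops from the target side could be as simple as this in the option-override hook (sketch; exact placement is up to the backend):

  /* Enable loop peeling by default, but respect an explicit
     -f(no-)peel-loops on the command line.  */
  if (!global_options_set.x_flag_peel_loops)
    flag_peel_loops = 1;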



It might also make sense to have more fine-grained control for this
and allow a target
to say whether it wants to peel a specific loop or not when the
middle-end thinks that
would be profitable.


Can be worth looking at as a follow-up. Do you envisage the target analysing
the gimple statements of the loop to figure out its cost?

Thanks,
Kyrill


2018-11-09  Kyrylo Tkachov  

* tree-ssa-loop-ivcanon.c (try_unroll_loop_completely): Do not unroll
loop when number of iterations is not known and flag_peel_loops is in
effect.

2018-11-09  Kyrylo Tkachov  

* gcc.target/aarch64/sve/unroll-1.c: New test.

diff --git a/gcc/testsuite/gcc.target/aarch64/sve/unroll-1.c b/gcc/testsuite/gcc.target/aarch64/sve/unroll-1.c
new file mode 100644
index ..7f53d20cbf8e18a4389b86c037f56f024bac22a5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/unroll-1.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+
+/* Check that simple loop is not fully unrolled.  */
+
+void
+fully_peel_me (double *x)
+{
+  for (int i = 0; i < 5; i++)
+x[i] = x[i] * 2;
+}
+
+/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[.+]\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tst1d\tz[0-9]+\.d, p[0-7], \[.+\]\n} 1 } } */
diff --git a/gcc/tree-ssa-loop-ivcanon.c b/gcc/tree-ssa-loop-ivcanon.c
index c2953059fb9218d4bc4cf12fe9277a552b4a04bd..daeddb384254775d6482ed5580e8262e4b26a87f 100644
--- a/gcc/tree-ssa-loop-ivcanon.c
+++ b/gcc/tree-ssa-loop-ivcanon.c
@@ -883,6 +883,16 @@ try_unroll_loop_completely (struct loop *loop,
 			 loop->num);
 	  return false;
 	}
+	  else if (TREE_CODE (niter) == SCEV_NOT_KNOWN
+		   && flag_peel_loops)
+	{
+	  if (dump_enabled_p ())
+		dump_printf (MSG_NOTE, "Not unrolling loop %d: "
+			 "exact number of iterations not known "
+			 "(-fpeel-loops).\n",
+			 loop->num);
+	  return false;
+	}
 	}
 
   initialize_original_copy_tables ();


Re: Small typo in iconv.m4

2018-11-09 Thread Eric Gallager
On 11/7/18, Jeff Law  wrote:
> On 11/6/18 9:37 AM, Hafiz Abid Qadeer wrote:
>> Hi All,
>> I was investigating a character set related problem with windows hosted
>> GDB and I tracked it down to a typo in iconv.m4. This typo caused
>> libiconv detection to fail and related support was not built into gdb.
>>
>> The problem is with the following line.
>> CPPFLAGS="$LIBS $INCICONV"
>> which should have been
>> CPPFLAGS="$CPPFLAGS $INCICONV"
>>
>> OK to commit the attached patch?
>>
>> 2018-11-06  Hafiz Abid Qadeer  
>>
>>  * config/iconv.m4 (AM_ICONV_LINK): Don't overwrite CPPFLAGS.
>>  Append $INCICONV to it.
>>  * gcc/configure: Regenerate.
>>  * libcpp/configure: Likewise.
>>  * libstdc++-v3/configure: Likewise.
>>  * intl/configure: Likewise.
>>
>> Thanks,
>>
> Thanks.  I wasn't sure if you had commit privs, so I went ahead and
> installed the patch.
>
> Jeff
>

Does this have any effect on GCC bug 78251?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78251
Or any of the related bugs under "See Also" for that matter?


Re: [PATCH v4] S/390: Allow relative addressing of literal pool entries

2018-11-09 Thread Ilya Leoshkevich
> Am 09.11.2018 um 18:30 schrieb Ulrich Weigand :
> 
> Ilya Leoshkevich wrote:
> 
>>   gcc_assert (GET_CODE (x) != SYMBOL_REF
>> -  || !CONSTANT_POOL_ADDRESS_P (x));
>> +  || !CONSTANT_POOL_ADDRESS_P (x)
>> +  || s390_symbol_relative_long_p (x));
> 
> Hmm, it's a bit weird that this routine now uses a different check
> than the other two.  It would look more straightforward for
> find_constant_pool_ref to use the same s390_safe_relative_long_p
> check as the others.
> 
> (This would then also make s390_symbol_relative_long_p redundant.)

Ok, sounds good.

> If we do that, it might even make sense to pull the 
> s390_safe_relative_long_p check up into the callers:
> 
> E.g. in s390_mainpool_finish, replace
> 
>  /* Replace all literal pool references.  */
> 
>  for (rtx_insn *insn = get_insns (); insn; insn = NEXT_INSN (insn))
>{
>  if (NONJUMP_INSN_P (insn) || CALL_P (insn))
>{
> 
> with
> 
>  if ((NONJUMP_INSN_P (insn) || CALL_P (insn))
>  && !s390_safe_relative_long_p (insn))
> 
> (This last change is just a suggestion, only if it makes the
> overall code simpler and more readable.)

I would have to do that in 8 places (4 for find/replace and 4 for 
annotate), so it’s probably better to keep the current 3 checks.


[patch, libgfortran] Patch committed as obvious

2018-11-09 Thread Jerry DeLisle
Committed as obvious. I had inserted this line to check effects of the 
-fdec flags and forgot to delete it before my previous commit.


Author: jvdelisle
Date: Fri Nov  9 17:29:33 2018
New Revision: 265979

URL: https://gcc.gnu.org/viewcvs?rev=265979&root=gcc&view=rev
Log:
2018-11-09  Jerry DeLisle  

PR libfortran/78351
* io/transfer.c (read_sf_internal): Delete leftover
debug code.

Modified:
trunk/libgfortran/ChangeLog
trunk/libgfortran/io/transfer.c


Re: [PATCH] Add value_range_base (w/o equivs)

2018-11-09 Thread Aldy Hernandez

On 11/9/18 9:19 AM, Richard Biener wrote:


This adds value_range_base, a base class of class value_range
with all members but the m_equiv one.


First of all, thanks so much for doing this!



I have looked into the sole GC user, IPA propagation, and replaced
the value_range use there with value_range_base since it also
asserted the equiv member is always NULL.

This in turn means I have written down that GC users only can
use value_range_base (and fixed the accessability issue with
adding a bunch of friends).



+
 /* Range of values that can be associated with an SSA_NAME after VRP
-   has executed.  */
-class GTY((for_user)) value_range
+   has executed.  Note you may only use value_range_base with GC memory.  */
+class GTY((for_user)) value_range_base
+{


GC users cannot use the derived value_range?  Either way could you 
document the "why" this is the case above?


And thanks for adding those accessibility friends, they're a pain to get 
right.  It's a pity we have to go to such lengths to deal with this shit.




I have moved all methods that are required for this single user
sofar (without looking with others are trivially movable because
they do not look at equivs - that's for a followup).  I ended


That would be great.


up basically duplicating the ::union_ wrapper around union_ranges
(boo).  Some more refactoring might save us a few lines here.

There are two places where IPA prop calls extract_range_from_unary_expr.
I've temporarily made it use value_range temporaries there given
I need to think about the few cases of ->deep_copy () we have in
there.  IMHO equiv handling would need to move to the callers or
rather "copies" shouldn't be handled by extract_range_from_unary_expr.
Well, I need to think about it.  (no, I don't want to make that
function virtual)

The other workers might be even easier to massage to work on
value_range_base only.


If value_range_base is the predominant idiom, perhaps it should be 
called value_range and the derived class be called 
value_range_with_equiv or some such?




Bootstrap and regtest running on x86_64-unknown-linux-gnu.

If that goes well I want to apply this even in its incomplete form.
Unless there are any complaints?


Fine by me.  Again, thanks.
Aldy


Re: [PATCH][RFC] Come up with -flive-patching master option.

2018-11-09 Thread Qing Zhao
Hi, Martin,

thanks a lot for the previous two new options for live-patching. 


I have two more questions below:

1. do we still need new options to disable the following:
   A. unreachable code/variable removal? 
   B. Visibility changes with -flto and/or -fwhole-program?

2. for this new patch, could you please explain a little bit more on the 
problem?

Thanks.

Qing

> On Nov 9, 2018, at 9:33 AM, Martin Liška  wrote:
> 
> Hi.
> 
> After I added 2 new options, I would like to include a new master option.
> It's minimal version which only disables optimizations that we are aware of
> and can potentially cause problems for live-patching.
> 
> Martin
> <0001-Come-up-with-fvectorized-functions.patch>



Re: [PATCH][AArch64] PR79262: Adjust vector cost

2018-11-09 Thread Wilco Dijkstra
Hi James,

>On Mon, Jan 22, 2018 at 09:22:27AM -0600, Richard Biener wrote:
>> It would be better to dissect this cost into vec_to_scalar and vec_extract 
>> where
>> vec_to_scalar really means getting at the scalar value of a vector of
>> uniform values
>> which most targets can do without any instruction (just use a subreg).
>> 
>> I suppose we could also make vec_to_scalar equal to vector extraction and 
>> remove
>> the uses for the other case (reduction vector result to scalar reg).
>
> I have dug up Richard's comments from last year, which you appear to have
> ignored and made no reference to when resubmitting the patch.
>
> Please don't do that. Carefully consider Richard's review feedback before
> resubmitting this patch.

I seem to have missed this comment... However I can't see how it relates to my
patch, particularly since Richard claimed in PR79262 that this PR is a backend
issue. Sure it *would* be great to fix the vector cost infrastructure, but 
that's a lot
of work, and wasn't my goal here, nor is it required to fix PR79262. The 
existing
costs allow the issue to be solved, just like the other targets/tunings have 
done
already, so that's clearly the best approach for this PR.

Wilco

Re: [PATCH] detect attribute mismatches in alias declarations (PR 81824)

2018-11-09 Thread Martin Sebor

+/* Handle the "copy" attribute by copying the set of attributes
+   from the symbol referenced by ARGS to the declaration of *NODE.  */
+
+static tree
+handle_copy_attribute (tree *node, tree name, tree args,
+  int flags, bool *no_add_attrs)
+{
+  /* Break cycles in circular references.  */
+  static hash_set attr_copy_visited;

Does this really need to be static?


The variable was intended to break cycles in recursive calls to
the function for self-referential applications of attribute copy
but since the attribute itself is not applied (anymore) such cycles
can no longer form.  I have removed the variable and simplified
the handlers (there are tests to verify this works correctly).


diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index cfe6a8e..8ffb0cd 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 5c95f67..c027acd 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi

[ ... ]


+
+In C++, the warning is issued when an explicitcspecialization of a primary

"explicitcspecialization" ? :-)



Fixed.



Looks pretty good.  There's the explicit specialization nit and the
static vs auto question for attr_copy_visited.  Otherwise it's OK.


Thanks.  I've retested a revision with the changes discussed here
and committed it as r265980.

Martin


Re: [PATCH][AArch64] PR79262: Adjust vector cost

2018-11-09 Thread Wilco Dijkstra
Hi James,

> We have 7 unique target tuning structures in the AArch64 backend, of which
> only one has a 2x ratio between scalar_int_cost and vec_to_scalar_cost. Other
> ratios are 1, 3, 8, 3, 4, 6.

I wouldn't read too much into the exact value here - the costs are simply 
relative to
other values for the same tuning, ie. cores that use 4 or 6 also tend to use 
larger
values for the other entries.

> What makes this choice correct? What makes it more correct than what we
> have now? On which of the 28 entries in config/aarch64/aarch64-cores.def does
> performance improve? 

It's correct since it's the minimum value that stops vectorization for this 
particular
bug without increasing costs too much and accidentally blocking useful loops 
from
being vectorized (as confirmed by benchmarking). Given that all other vector 
cost
tunings already block vectorization for this case, this is clearly what is 
required
for best performance. So this improves performance for all 28 entries.

> Please validate the performance effect of this patch, which changes default
> code generation for everyone, on more than one testcase in a bug report.

I did validate the performance like I mentioned in the patch. Since it has been 
a
while, I can easily rerun the benchmarks using latest trunk to verify it's 
still a gain.

> Are the Spec benchmarks sufficiently representative to change the generic
> vectorisation costs?

SPEC is sufficiently large and complex to show there is a gain without 
regressions.
Do you have reason to believe other benchmarks might not see the same gain?

Wilco

Re: [PATCH v4] S/390: Allow relative addressing of literal pool entries

2018-11-09 Thread Ulrich Weigand
Ilya Leoshkevich wrote:

>gcc_assert (GET_CODE (x) != SYMBOL_REF
> -   || !CONSTANT_POOL_ADDRESS_P (x));
> +   || !CONSTANT_POOL_ADDRESS_P (x)
> +   || s390_symbol_relative_long_p (x));

Hmm, it's a bit weird that this routine now uses a different check
than the other two.  It would look more straightforward for
find_constant_pool_ref to use the same s390_safe_relative_long_p
check as the others.

(This would then also make s390_symbol_relative_long_p redundant.)

If we do that, it might even make sense to pull the 
s390_safe_relative_long_p check up into the callers:

E.g. in s390_mainpool_finish, replace

  /* Replace all literal pool references.  */

  for (rtx_insn *insn = get_insns (); insn; insn = NEXT_INSN (insn))
{
  if (NONJUMP_INSN_P (insn) || CALL_P (insn))
{

with

  if ((NONJUMP_INSN_P (insn) || CALL_P (insn))
  && !s390_safe_relative_long_p (insn))

(This last change is just a suggestion, only if it makes the
overall code simpler and more readable.)

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



arm - minor cleanups for CPU alias support

2018-11-09 Thread Richard Earnshaw (lists)
A couple of very minor issues with the new support for CPU
aliases.

* config/arm/parsecpu.awk (/alias/): Tighten invisible alias
matching criteria.  Remove unused array initializer.

Committed.
diff --git a/gcc/config/arm/parsecpu.awk b/gcc/config/arm/parsecpu.awk
index ba2dee5fdcb..87411847ec6 100644
--- a/gcc/config/arm/parsecpu.awk
+++ b/gcc/config/arm/parsecpu.awk
@@ -684,7 +684,7 @@ BEGIN {
 for (n = 2; n <= alias_count; n++) {
 	visible = "true"
 	alias = $n
-	if (alias ~ /!.*/) {
+	if (alias ~ /^!.*/) {
 	visible = "false"
 	gsub(/^!/, "", alias)
 	}
@@ -700,7 +700,6 @@ BEGIN {
 	}
 	cpu_all_aliases[alias] = cpu_name
 }
-cpu_has_alias[cpu_name] = 1
 parse_ok = 1
 }
 


[PATCH, GCC, AArch64] Branch Dilution Pass

2018-11-09 Thread Sudakshina Das
Hi

I am posting this patch on behalf of Carey (cc'ed). I also have some
review comments that I will make as a reply to this later.


This implements a new AArch64 specific back-end pass that helps optimize
branch-dense code, which can be a bottleneck for performance on some Arm
cores. This is achieved by padding out the branch-dense sections of the
instruction stream with nops.

This has proven to show up to a 2.61%~ improvement on the Cortex A-72
(SPEC CPU 2006: sjeng).

The implementation includes the addition of a new RTX instruction class
FILLER_INSN, which has been white listed to allow placement of NOPs
outside of a basic block. This is to allow padding after unconditional
branches. This is favorable so that any performance gained from
diluting branches is not paid straight back via excessive eating of nops.

It was deemed that a new RTX class was less invasive than modifying
behavior in regards to standard UNSPEC nops.

## Command Line Options

Three new target-specific options are provided:
- mbranch-dilution
- mbranch-dilution-granularity={num}
- mbranch-dilution-max-branches={num}

A number of cores known to benefit from this pass have been given
default tuning values for their granularity and max-branches.  Each
affected core has a very specific granule size and associated
max-branch limit.  This is a microarchitecture-specific optimization.
Typical usage should be -mbranch-dilution with a specified -mcpu.  Cores
with a granularity tuned to 0 will be ignored.  The options are provided
for experimentation.

## Algorithm and Heuristic

The pass takes a very simple 'sliding window' approach to the problem.
We crawl through each instruction (starting at the first branch) and
keep track of the number of branches within the current "granule" (or
window). When this exceeds the max-branch value, the pass will dilute
the current granule, inserting nops to push out some of the branches.
The heuristic will favour unconditional branches (for performance
reasons), or branches that are between two other branches (in order to
decrease the likelihood of another dilution call being needed).
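
To make the window bookkeeping concrete, here is a toy model of the
heuristic (plain arrays instead of RTL insns; the names are made up, and
the real pass also picks *which* branch to push out rather than always
padding at the point the limit is exceeded):

  #include <cstdio>

  enum kind { OTHER, BRANCH };

  /* Slide a granule-sized window over the instruction stream; once more
     than max_branches branches land in the current granule, report where
     nops would be inserted and start a fresh granule there.  */
  static void
  dilute (const kind *seq, int n, int granule, int max_branches)
  {
    int start = 0;      /* first slot of the current granule */
    int branches = 0;
    for (int i = 0; i < n; i++)
      {
        if (i - start == granule)
          {
            /* Granule filled up without exceeding the limit: slide on.  */
            start = i;
            branches = 0;
          }
        if (seq[i] == BRANCH && ++branches > max_branches)
          {
            std::printf ("pad with nops before insn %d\n", i);
            start = i;  /* the padding pushes this branch into a new granule */
            branches = 1;
          }
      }
  }

  int
  main ()
  {
    const kind seq[] = { BRANCH, OTHER, BRANCH, BRANCH, BRANCH, OTHER, BRANCH };
    dilute (seq, 7, 4, 2);   /* 4-slot granule, at most 2 branches per granule */
    return 0;
  }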

Each branch type required a different method for nop insertion due to
RTL/basic_block restrictions:

- Returning calls do not end a basic block, so they can be handled by
  emitting a generic nop.
- Unconditional branches must be the end of a basic block, and nops
  cannot be outside of a basic block.  Thus the need for FILLER_INSN,
  which allows placement outside of a basic block and translates to a nop.
- For most conditional branches we've taken a simple approach and only
  handle the fallthru edge for simplicity, which we do by inserting a
  "nop block" of nops on the fallthru edge, mapping that back to the
  original destination block.
- asm gotos and pcsets are going to be tricky to analyse from a dilution
  perspective, so they are ignored at present.


## Changelog

gcc/testsuite/ChangeLog:

2018-11-09  Carey Williams  

* gcc.target/aarch64/branch-dilution-off.c: New test.
* gcc.target/aarch64/branch-dilution-on.c: New test.


gcc/ChangeLog:

2018-11-09  Carey Williams  

* cfgbuild.c (inside_basic_block_p): Add FILLER_INSN case.
* cfgrtl.c (rtl_verify_bb_layout): Whitelist FILLER_INSN outside
basic blocks.
* config.gcc (extra_objs): Add aarch64-branch-dilution.o.
* config/aarch64/aarch64-branch-dilution.c: New file.
* config/aarch64/aarch64-passes.def (branch-dilution): Register
pass.
* config/aarch64/aarch64-protos.h (struct tune_params): Declare
tuning parameters bdilution_gsize and bdilution_maxb.
(make_pass_branch_dilution): New declaration.
* config/aarch64/aarch64.c (generic_tunings,cortexa35_tunings,
cortexa53_tunings,cortexa57_tunings,cortexa72_tunings,
cortexa73_tunings,exynosm1_tunings,thunderxt88_tunings,
thunderx_tunings,tsv110_tunings,xgene1_tunings,
qdf24xx_tunings,saphira_tunings,thunderx2t99_tunings):
Provide default tunings for bdilution_gsize and bdilution_maxb.
* config/aarch64/aarch64.md (filler_insn): Define new insn.
* config/aarch64/aarch64.opt (mbranch-dilution,
mbranch-dilution-granularity,
mbranch-dilution-max-branches): Define new branch dilution
options.
* config/aarch64/t-aarch64 (aarch64-branch-dilution.c): New rule
for aarch64-branch-dilution.c.
* coretypes.h (rtx_filler_insn): New rtx class.
* doc/invoke.texi (mbranch-dilution,
mbranch-dilution-granularity,
mbranch-dilution-max-branches): Document branch dilution
options.
* emit-rtl.c (emit_filler_after): New emit function.
* rtl.def (FILLER_INSN): New RTL EXPR of type RTX_INSN.
* rtl.h (class GTY): New class for rtx_filler_insn.
(is_a_helper ::test): New test helper for rtx_filler_insn.
(macro FILLER_INSN_P(X)): New predicate.
* target-insns.def 

Re: PING [PATCH] use MAX_OFILE_ALIGNMENT to validate attribute aligned (PR 87795)

2018-11-09 Thread Martin Sebor

On 11/07/2018 03:43 PM, Jeff Law wrote:

On 11/6/18 5:06 PM, Martin Sebor wrote:

Ping: https://gcc.gnu.org/ml/gcc-patches/2018-10/msg02081.html

I thought I'd already ACK'd this one...


OK.


You did but I made changes afterwards in response to Joseph's
comment.  I've committed the final patch in r265977.

Thanks
Martin


Re: [PATCH] Simplify floating point comparisons

2018-11-09 Thread Wilco Dijkstra
Richard Biener wrote:
>Marc Glisse wrote:
>> Let's try with C = DBL_MIN and x = ±DBL_MAX. I don't believe it involves
>> signed zeros or infinities, just an underflow. First, the result depends on
>> the rounding mode. And in the default round-to-nearest, both divisions give
>> 0, and thus compare the same with 0, but we replace that with a sign test on
>> x, where they clearly give opposite answers.
>>
>> What would be the proper flag to test to check if we care about underflow?
>
> We have none specific so this makes it flag_unsafe_math_optimizations.

Right, I have added the unsafe math check again, as in the previous version:


The patch implements some of the optimizations discussed in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71026.

Simplify (C / x >= 0.0) into x >= 0.0 with -funsafe-math-optimizations
(since C / x can underflow to zero if x is huge, it's not safe otherwise).
If C is negative the comparison is reversed.
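
For example (a standalone illustration, not part of the patch), with
C = DBL_MIN and x = -DBL_MAX the division underflows to -0.0 under
round-to-nearest, so the original and simplified forms disagree:

  #include <cfloat>
  #include <cstdio>

  int
  main ()
  {
    double C = DBL_MIN, x = -DBL_MAX;
    std::printf ("%d\n", C / x >= 0.0);  /* 1: C / x underflows to -0.0, and -0.0 >= 0.0 holds */
    std::printf ("%d\n", x >= 0.0);      /* 0: the simplified form gives the opposite answer */
    return 0;
  }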


Simplify (x * C1) > C2 into x > (C2 / C1) with -funsafe-math-optimizations.
If C1 is negative the comparison is reversed.

OK for commit?

ChangeLog
2018-11-09  Wilco Dijkstra
Jackson Woodruff  

gcc/
PR tree-optimization/71026
* match.pd: Simplify floating point comparisons.

gcc/testsuite/
PR tree-optimization/71026
* gcc.dg/div-cmp-1.c: New test.
* gcc.dg/div-cmp-2.c: New test.

--
diff --git a/gcc/match.pd b/gcc/match.pd
index 
94fbab841f5e36bd33fda849a686fd80886ee1ff..f6c76510f95be2485e5bacd07edab336705cbd25
 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -405,6 +405,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (rdiv @0 (negate @1))
  (rdiv (negate @0) @1))
 
+(if (flag_unsafe_math_optimizations)
+ /* Simplify (C / x op 0.0) to x op 0.0 for C != 0, C != Inf/Nan.
+Since C / x may underflow to zero, do this only for unsafe math.  */
+ (for op (lt le gt ge)
+  neg_op (gt ge lt le)
+  (simplify
+   (op (rdiv REAL_CST@0 @1) real_zerop@2)
+   (if (!HONOR_SIGNED_ZEROS (@1) && !HONOR_INFINITIES (@1))
+(switch
+ (if (real_less (&dconst0, TREE_REAL_CST_PTR (@0)))
+  (op @1 @2))
+ /* For C < 0, use the inverted operator.  */
+ (if (real_less (TREE_REAL_CST_PTR (@0), &dconst0))
+  (neg_op @1 @2)))
+
 /* Optimize (X & (-A)) / A where A is a power of 2, to X >> log2(A) */
 (for div (trunc_div ceil_div floor_div round_div exact_div)
  (simplify
@@ -4049,6 +4064,22 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
(rdiv @2 @1))
(rdiv (op @0 @2) @1)))
 
+ (for cmp (lt le gt ge)
+  neg_cmp (gt ge lt le)
+  /* Simplify (x * C1) cmp C2 -> x cmp (C2 / C1), where C1 != 0.  */
+  (simplify
+   (cmp (mult @0 REAL_CST@1) REAL_CST@2)
+   (with
+{ tree tem = const_binop (RDIV_EXPR, type, @2, @1); }
+(if (tem
+&& !(REAL_VALUE_ISINF (TREE_REAL_CST (tem))
+ || (real_zerop (tem) && !real_zerop (@1
+ (switch
+  (if (real_less (&dconst0, TREE_REAL_CST_PTR (@1)))
+   (cmp @0 { tem; }))
+  (if (real_less (TREE_REAL_CST_PTR (@1), &dconst0))
+   (neg_cmp @0 { tem; })))
+
  /* Simplify sqrt(x) * sqrt(y) -> sqrt(x*y).  */
  (for root (SQRT CBRT)
   (simplify
diff --git a/gcc/testsuite/gcc.dg/div-cmp-1.c b/gcc/testsuite/gcc.dg/div-cmp-1.c
new file mode 100644
index 
..cd1a5cd3d6fee5a10e9859ca99b344fa3fdb7f5f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/div-cmp-1.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -funsafe-math-optimizations -fdump-tree-optimized-raw" } 
*/
+
+int
+cmp_mul_1 (float x)
+{
+  return x * 3 <= 100;
+}
+
+int
+cmp_mul_2 (float x)
+{
+  return x * -5 > 100;
+}
+
+int
+div_cmp_1 (float x, float y)
+{
+  return x / 3 <= y;
+}
+
+int
+div_cmp_2 (float x, float y)
+{
+  return x / 3 <= 1;
+}
+
+/* { dg-final { scan-tree-dump-times "mult_expr" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-not "rdiv_expr" "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/div-cmp-2.c b/gcc/testsuite/gcc.dg/div-cmp-2.c
new file mode 100644
index 
..f4ac42a196a804747d0b578e0aa2131671c8d3cf
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/div-cmp-2.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -funsafe-math-optimizations -ffinite-math-only 
-fdump-tree-optimized-raw" } */
+
+int
+cmp_1 (float x)
+{
+  return 5 / x >= 0;
+}
+
+int
+cmp_2 (float x)
+{
+  return 1 / x <= 0;
+}
+
+int
+cmp_3 (float x)
+{
+  return -2 / x >= 0;
+}
+
+int
+cmp_4 (float x)
+{
+  return -5 / x <= 0;
+}
+
+/* { dg-final { scan-tree-dump-not "rdiv_expr" "optimized" } } */
+

[PATCH] -fsave-optimization-record: compress the output using zlib

2018-11-09 Thread David Malcolm
One of the concerns noted at Cauldron about -fsave-optimization-record
was the size of the output files.

This patch implements compression of the -fsave-optimization-record
output, using zlib.

I did some before/after testing of this patch, using SPEC 2017's
502.gcc_r with -O3, looking at the sizes of the generated
FILENAME.opt-record.json[.gz] files.

The largest file was for insn-attrtab.c:
  before: 171736285 bytes (164M)
  after:    5304015 bytes (5.1M)

Smallest file was for vasprintf.c:
  before: 30567 bytes
  after:   4485 bytes

Median file by size before was lambda-mat.c:
  before: 2266738 bytes (2.2M)
  after:    75988 bytes (74K)

Total of all files in the benchmark:
  before: 2041720713 bytes (1.9G)
  after:    66870770 bytes (63.8M)

...so clearly compression is a big win in terms of file size, at the
cost of making the files slightly more awkward to work with. [1]
I also wonder if we want to support any pre-filtering of the output
(FWIW roughly half of the biggest file seems to be "Adding assert for "
messages from tree-vrp.c).

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.

OK for trunk?

[1] I've updated my optrecord.py module to deal with this, which
simplifies things; it's still not clear to me if that should live
in "contrib/" or not.

gcc/ChangeLog:
* doc/invoke.texi (-fsave-optimization-record): Note that the
output is compressed.
* optinfo-emit-json.cc: Include .
(optrecord_json_writer::write): Compress the output.
---
 gcc/doc/invoke.texi  |  6 +++---
 gcc/optinfo-emit-json.cc | 35 +++
 2 files changed, 30 insertions(+), 11 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 4ff3a150..f26ada0 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -14408,11 +14408,11 @@ dumps from the vectorizer about missed opportunities.
 
 @item -fsave-optimization-record
 @opindex fsave-optimization-record
-Write a SRCFILE.opt-record.json file detailing what optimizations
+Write a SRCFILE.opt-record.json.gz file detailing what optimizations
 were performed, for those optimizations that support @option{-fopt-info}.
 
-This option is experimental and the format of the data within the JSON
-file is subject to change.
+This option is experimental and the format of the data within the
+compressed JSON file is subject to change.
 
 It is roughly equivalent to a machine-readable version of
 @option{-fopt-info-all}, as a collection of messages with source file,
diff --git a/gcc/optinfo-emit-json.cc b/gcc/optinfo-emit-json.cc
index 31029ad..6d4502c 100644
--- a/gcc/optinfo-emit-json.cc
+++ b/gcc/optinfo-emit-json.cc
@@ -45,6 +45,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "pass_manager.h"
 #include "selftest.h"
 #include "dump-context.h"
+#include 
 
 /* A class for writing out optimization records in JSON format.  */
 
@@ -133,16 +134,34 @@ optrecord_json_writer::~optrecord_json_writer ()
 void
 optrecord_json_writer::write () const
 {
-  char *filename = concat (dump_base_name, ".opt-record.json", NULL);
-  FILE *outfile = fopen (filename, "w");
-  if (outfile)
+  pretty_printer pp;
+  m_root_tuple->print (&pp);
+
+  bool emitted_error = false;
+  char *filename = concat (dump_base_name, ".opt-record.json.gz", NULL);
+  gzFile outfile = gzopen (filename, "w");
+  if (outfile == NULL)
 {
-  m_root_tuple->dump (outfile);
-  fclose (outfile);
+  error_at (UNKNOWN_LOCATION, "cannot open file %qs for writing 
optimization records",
+   filename); // FIXME: more info?
+  goto cleanup;
 }
-  else
-error_at (UNKNOWN_LOCATION, "unable to write optimization records to %qs",
- filename); // FIXME: more info?
+
+  if (gzputs (outfile, pp_formatted_text (&pp)) <= 0)
+{
+  int tmp;
+  error_at (UNKNOWN_LOCATION, "error writing optimization records to %qs: 
%s",
+   filename, gzerror (outfile, &tmp));
+  emitted_error = true;
+}
+
+ cleanup:
+  if (outfile)
+if (gzclose (outfile) != Z_OK)
+  if (!emitted_error)
+   error_at (UNKNOWN_LOCATION, "error closing optimization records %qs",
+ filename);
+
   free (filename);
 }
 
-- 
1.8.5.3



[PATCH v4] S/390: Allow relative addressing of literal pool entries

2018-11-09 Thread Ilya Leoshkevich
Bootstrapped and regtested on s390x-redhat-linux.

r265490 allowed the compiler to choose in a more flexible way whether to
use load or load-address-relative-long (LARL) instruction.  When it
chose LARL for literal pool references, the latter ones were rewritten
by pass_s390_early_mach to use UNSPEC_LTREF, which assumes base register
usage, which in turn is not compatible with LARL.  The end result was an
ICE because of unrecognizable insn.

UNSPEC_LTREF and friends are necessary in order to communicate the
dependency on the base register to pass_sched2.  When relative
addressing is used, no base register is necessary, so in such cases the
rewrite must be avoided.

gcc/ChangeLog:

2018-11-05  Ilya Leoshkevich  

PR target/87762
* config/s390/predicates.md (larl_operand): Use
s390_symbol_relative_long_p () to reduce code duplication.
* config/s390/s390-protos.h (s390_symbol_relative_long_p): New
function.
* config/s390/s390.c (s390_safe_relative_long_p): Likewise.
(annotate_constant_pool_refs): Skip insns which support relative
addressing.
(annotate_constant_pool_refs_1): New helper function.
(s390_symbol_relative_long_p): New function.
(find_constant_pool_ref): Assert that unannotated constant pool
references must use relative addressing.
(replace_constant_pool_ref): Skip insns which support relative
addressing.
(replace_constant_pool_ref_1): New helper function.
(s390_mainpool_finish): Adjust to the new signature of
replace_constant_pool_ref ().
(s390_chunkify_finish): Likewise.
(pass_s390_early_mach::execute): Likewise.
(s390_prologue_plus_offset): Likewise.
(s390_emit_prologue): Likewise.
(s390_emit_epilogue): Likewise.
---
 gcc/config/s390/predicates.md |  8 +--
 gcc/config/s390/s390-protos.h |  1 +
 gcc/config/s390/s390.c| 93 ++-
 3 files changed, 72 insertions(+), 30 deletions(-)

diff --git a/gcc/config/s390/predicates.md b/gcc/config/s390/predicates.md
index 97f717c558d..bb66c8a6bcb 100644
--- a/gcc/config/s390/predicates.md
+++ b/gcc/config/s390/predicates.md
@@ -151,9 +151,7 @@
   if (GET_CODE (op) == LABEL_REF)
 return true;
   if (SYMBOL_REF_P (op))
-return (!SYMBOL_FLAG_NOTALIGN2_P (op)
-   && SYMBOL_REF_TLS_MODEL (op) == 0
-   && s390_rel_address_ok_p (op));
+return s390_symbol_relative_long_p (op);
 
   /* Everything else must have a CONST, so strip it.  */
   if (GET_CODE (op) != CONST)
@@ -176,9 +174,7 @@
   if (GET_CODE (op) == LABEL_REF)
 return true;
   if (SYMBOL_REF_P (op))
-return (!SYMBOL_FLAG_NOTALIGN2_P (op)
-   && SYMBOL_REF_TLS_MODEL (op) == 0
-   && s390_rel_address_ok_p (op));
+return s390_symbol_relative_long_p (op);
 
 
   /* Now we must have a @GOTENT offset or @PLT stub
diff --git a/gcc/config/s390/s390-protos.h b/gcc/config/s390/s390-protos.h
index 96fa705f879..d3c2cc55e28 100644
--- a/gcc/config/s390/s390-protos.h
+++ b/gcc/config/s390/s390-protos.h
@@ -157,6 +157,7 @@ extern void s390_indirect_branch_via_thunk (unsigned int 
regno,
rtx comparison_operator,
enum s390_indirect_branch_type 
type);
 extern void s390_indirect_branch_via_inline_thunk (rtx execute_target);
+extern bool s390_symbol_relative_long_p (rtx);
 #endif /* RTX_CODE */
 
 /* s390-c.c routines */
diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 1be85727b73..58f3163b274 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -2731,6 +2731,17 @@ s390_safe_attr_type (rtx_insn *insn)
 return TYPE_NONE;
 }
 
+/* Return attribute relative_long of insn.  */
+
+static bool
+s390_safe_relative_long_p (rtx_insn *insn)
+{
+  if (recog_memoized (insn) >= 0)
+return get_attr_relative_long (insn) == RELATIVE_LONG_YES;
+  else
+return false;
+}
+
 /* Return true if DISP is a valid short displacement.  */
 
 static bool
@@ -8116,11 +8127,8 @@ s390_first_cycle_multipass_dfa_lookahead (void)
   return 4;
 }
 
-/* Annotate every literal pool reference in X by an UNSPEC_LTREF expression.
-   Fix up MEMs as required.  */
-
 static void
-annotate_constant_pool_refs (rtx *x)
+annotate_constant_pool_refs_1 (rtx *x)
 {
   int i, j;
   const char *fmt;
@@ -8199,16 +8207,39 @@ annotate_constant_pool_refs (rtx *x)
 {
   if (fmt[i] == 'e')
{
- annotate_constant_pool_refs (&XEXP (*x, i));
+ annotate_constant_pool_refs_1 (&XEXP (*x, i));
}
   else if (fmt[i] == 'E')
{
  for (j = 0; j < XVECLEN (*x, i); j++)
-   annotate_constant_pool_refs (&XVECEXP (*x, i, j));
+   annotate_constant_pool_refs_1 (&XVECEXP (*x, i, j));
}
 }
 }
 
+/* Annotate every literal pool reference in INSN by an UNSPEC_LTREF expression.
+   Fix up MEMs as required.
+   Skip insns which support relative 

Re: [PATCH] Fix not properly nul-terminated string constants in JIT

2018-11-09 Thread Bernd Edlinger
Hi Dave,

is the patch OK, or do you still have questions?


Thanks
Bernd.

On 11/2/18 10:48 PM, Bernd Edlinger wrote:
> On 11/2/18 9:40 PM, David Malcolm wrote:
>> On Sun, 2018-08-05 at 16:59 +, Bernd Edlinger wrote:
>>> Hi!
>>>
>>>
>>> My other patch with adds assertions to varasm.c regarding correct
>>> nul termination of sting literals did make these incorrect string
>>> constants in JIT frontend fail.
>>>
>>> The string constants are not nul terminated if their length exceeds
>>> 200 characters.  The test cases do not use strings of that size where
>>> that would make a difference.  But using a fixed index type is
>>> clearly
>>> wrong.
>>>
>>> This patch removes the fixed char[200] array type from
>>> playback::context,
>>> and uses build_string_literal instead of using build_string directly.
>>>
>>>
>>> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
>>> Is it OK for trunk?
>>
>> Sorry for the belated response.
>>
>> Was this tested with --enable-host-shared and --enable-languages=jit ?
>> Note that "jit" is not included in --enable-languages=all.
>>
> 
> Yes, of course.  The test suite contains a few string constants, just
> all of them are shorter than 200 characters.  But I think removing this
> artificial limit enables the existing test cases to test that the
> shorter string is in fact zero terminated.
> 
>> The patch seems reasonable, but I'm a little confused over the meaning
>> of "len" in build_string_literal and build_string: does it refer to the
>> length or the size of the string?
>>
> 
> build_string_literal:
> For languages that use zero-terminated strings, len is strlen(str)+1, and
> str is a zero terminated single-byte character string.
> For languages that don't use zero-terminated strings, len is the size of
> the string and str is not zero terminated.
> 
> build_string:
> constructs a STRING_CST tree object, which is usable as is in some contexts,
> like for asm constraints, but as a string literal it is incomplete, and
> needs an index type.  The index type defines the memory size which must
> be larger than the string precision.  Excess memory is implicitly cleared.
> 
> This means currently all jit strings shorter than 200 characters
> are filled with zero up to the limit of 200 chars as imposed by
> m_char_array_type_node.  Strings of exactly 200 chars are not zero terminated,
> and larger strings should result in an assertion (excess precision was 
> previously
> allowed, but no zero termination was appended, when that is not part of
> the original string constant).
> 
> Previously it was allowed to have memory size less than the string len, which
> had complicated the STRING_CST semantics in the middle-end, but with the
> string_cst semantic rework I did for gcc-9 this is no longer allowed and
> results in (checking) assertions in varasm.c.
> 
>>> @@ -617,16 +616,9 @@ playback::rvalue *
>>>   playback::context::
>>>   new_string_literal (const char *value)
>>>   {
>>> -  tree t_str = build_string (strlen (value), value);
>>> -  gcc_assert (m_char_array_type_node);
>>> -  TREE_TYPE (t_str) = m_char_array_type_node;
>>> -
>>> -  /* Convert to (const char*), loosely based on
>>> - c/c-typeck.c: array_to_pointer_conversion,
>>> - by taking address of start of string.  */
>>> -  tree t_addr = build1 (ADDR_EXPR, m_const_char_ptr, t_str);
>>> +  tree t_str = build_string_literal (strlen (value) + 1, value);
>>> -  return new rvalue (this, t_addr);
>>> +  return new rvalue (this, t_str);
>>>   }
>>
>> In the above, the call to build_string with strlen is replaced with
>> build_string_literal with strlen + 1.
>>
>>
>> build_string's comment says:
>>
>> "Note that for a C string literal, LEN should include the trailing
>> NUL."
>>
>> but has:
>>
>>    length = len + offsetof (struct tree_string, str) + 1;
>>
>> and:
>>
>>    TREE_STRING_LENGTH (s) = len;
>>    memcpy (s->string.str, str, len);
>>    s->string.str[len] = '\0';
>>
>> suggesting that the "len" parameter is in fact the length *without* the
>> trailing NUL, and that a trailing NUL is added by build_string.
>>
> 
> Yes, string constants in tree objects have another zero termination,
> but varasm.c does something different; there the index range takes
> precedence.
> The index range is built in build_string_literal as follows:
> 
>    elem = build_type_variant (char_type_node, 1, 0);
>    index = build_index_type (size_int (len - 1));
>    type = build_array_type (elem, index);
> 
> therefore the string constant has the type char[0..len-1],
> thus only len bytes are significant for code generation; the extra
> nul is just for "convenience".
> 
>> However build_string_literal has:
>>
>>    t = build_string (len, str);
>>    elem = build_type_variant (char_type_node, 1, 0);
>>    index = build_index_type (size_int (len - 1));
>>
>> suggesting that the len is passed directly to build_string (and thus
>> ought to be strlen), but the build_index_type uses len - 1 (which
>> suggests that len is the size of the 

Re: [PATCH v3] S/390: Allow relative addressing of literal pool entries

2018-11-09 Thread Ilya Leoshkevich
> Am 09.11.2018 um 14:43 schrieb Ulrich Weigand :
> 
> Ilya Leoshkevich wrote:
> 
>> +  /* Return unannotated constant pool references, so that the corresponding
>> + entries are added to the back-end-managed pool.  Not doing so would 
>> result
>> + in those entries being placed in the middle-end-managed pool, which 
>> would
>> + in turn prevent merging them with identical ones in the 
>> back-end-managed
>> + pool.  */
> 
> I'm still not convinced this is necessary; how frequently does the case occur
> that a constant is duplicated, and is it worth the extra code complexity?
> (Also, note that the middle-end pool is shared across the whole translation
> unit, while the back-end pool is duplicated in every function ... so if the
> same constant is used by many functions, forcing it into the back-end pool
> may *increase* overall size.)
> 
> In any case, does this even happen with your current code?  You find the
> unannotated reference here, and therefore copy the constant into the local
> pool, but then replace_constant_pool_ref will still ignore the insn, and
> thus the insn will actually continue to refer to the middle-end pool
> constant, *not* the copy created in the local pool.  Right?

Whoops, that’s right.  This code must be currently achieving nothing at
all!  I did not run into such cases and I don’t have a test case for
them, so they are hypothetical.  So let’s keep it simple here.

> 
>>   if (DISP_IN_RANGE (INTVAL (frame_off)))
>>  {
>> -  insn = gen_rtx_SET (frame_pointer,
>> -  gen_rtx_PLUS (Pmode, frame_pointer, frame_off));
>> -  insn = emit_insn (insn);
>> +  insn = emit_insn (copy_rtx (cfa));
>>  }
> 
> This seems unrelated?

Now that insn is of type rtx_insn *, gen_rtx_SET () result cannot be
assigned to it.  Let me rewrite this by creating an rtx temporary,
which should be more readable.
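
Something along these lines (just a sketch of what I mean, not the final
patch):

  if (DISP_IN_RANGE (INTVAL (frame_off)))
    {
      /* Build the SET in an rtx temporary; emit_insn () then gives us
         the rtx_insn * we actually want to keep in INSN.  */
      rtx set = gen_rtx_SET (frame_pointer,
                             gen_rtx_PLUS (Pmode, frame_pointer, frame_off));
      insn = emit_insn (set);
    }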


Re: [committed 0/4] (Partial) OpenMP 5.0 support for GCC 9

2018-11-09 Thread David Edelsohn
> 2018-11-08  Rainer Orth  
>
> * affinity.c: Include , .
> (gomp_display_affinity_place): Remove cpusetp.
> * teams.c: Include .

This fixed the AIX bootstrap failure as well.

Thanks, David


Re: [PATCH][GCC] Make DR_TARGET_ALIGNMENT compile time variable

2018-11-09 Thread Andre Vieira (lists)
On 05/11/18 12:41, Richard Biener wrote:
> On Mon, Nov 5, 2018 at 1:07 PM Andre Vieira (lists)
>  wrote:
>>
>>
>> Hi,
>>
Hi,

Thank you for the quick response! See inline responses below.

>> This patch enables targets to describe DR_TARGET_ALIGNMENT as a
>> compile-time variable.  It does so by turning the variable into a
>> 'poly_uint64'.  This should not affect the current code-generation for
>> any target.
>>
>> We hope to use this in the near future for SVE using the
>> current_vector_size as the preferred target alignment for vectors.  In
>> fact I have a patch to do just this, but I am still trying to figure out
>> whether and when it is beneficial to peel for alignment with a runtime
>> misalignment.
> 
> In fact in most cases I have seen the issue is that it's not visible whether
> peeling will be able to align _all_ references and doing peeling only to
> align some is hardly beneficial.  To improve things the vectorizer would
> have to version the loop for the case where peeling can reach alignment
> for a group of DRs and then vectorize one copy with peeling for alignment
> and one copy with unaligned accesses.


So I have seen code being peeled for alignment even when it only knows
how to align one of a group (only checked 2 or 3) and I think this may
still be beneficial in some cases.  I am more worried about cases where
the number of iterations isn't enough to justify the initial peeling
cost or when the loop isn't memory bound, i.e. very arithmetic heavy
loops.  This is a bigger vectorization problem though, that would
require some kind of cost-model.

> 
>>  The patch I am working on will change the behavior of
>> auto-vectorization for SVE when building vector-length agnostic code for
>> targets that benefit from aligned vector loads/stores.  The patch will
>> result in  the generation of a runtime computation of misalignment and
>> the construction of a corresponding mask for the first iteration of the
>> loop.
>>
>> I have decided to not offer support for prolog/epilog peeling when the
>> target alignment is not compile-time constant, as this didn't seem
>> useful; this is why 'vect_do_peeling' returns early if
>> DR_TARGET_ALIGNMENT is not constant.
>>
>> I bootstrapped and tested this on aarch64 and x86 basically
>> bootstrapping one target that uses this hook and one that doesn't.
>>
>> Is this OK for trunk?
> 
> The patch looks good but I wonder wheter it is really necessary at this
> point.

The goal of this patch is really to enable future work; on its own it
does nothing.  I am working on a small target-specific patch to enable
this for SVE, but I need to do a bit more analysis and benchmarking to
be able to determine whether it's beneficial, which I will not be able to
finish before the end of stage 1.  That is why I split them up and sent this
one upstream to see if I could get the middle-end change in.

> 
> Thanks,
> Richard.
> 
>> Cheers,
>> Andre
>>
>> 2018-11-05  Andre Vieira  
>>
>> * config/aarch64/aarch64.c 
>> (aarch64_vectorize_preferred_vector_alignment):
>> Change return type to poly_uint64.
>> (aarch64_simd_vector_alignment_reachable): Adapt to preferred vector
>> alignment being a poly int.
>> * doc/tm.texi (TARGET_VECTORIZE_PREFERRED_VECTOR_ALIGNMENT): Change 
>> return
>> type to poly_uint64.
>> * target.def (default_preferred_vector_alignment): Likewise.
>> * targhooks.c (default_preferred_vector_alignment): Likewise.
>> * targhooks.h (default_preferred_vector_alignment): Likewise.
>> * tree-vect-data-refs.c
>> (vect_calculate_target_alignment): Likewise.
>> (vect_compute_data_ref_alignment): Adapt to vector alignment
>> being a poly int.
>> (vect_update_misalignment_for_peel): Likewise.
>> (vect_enhance_data_refs_alignment): Likewise.
>> (vect_find_same_alignment_drs): Likewise.
>> (vect_duplicate_ssa_name_ptr_info): Likewise.
>> (vect_setup_realignment): Likewise.
>> (vect_can_force_dr_alignment_p): Change alignment parameter type to
>> poly_uint64.
>> * tree-vect-loop-manip.c (get_misalign_in_elems): Learn to construct 
>> a mask
>> with a compile time variable vector alignment.
>> (vect_gen_prolog_loop_niters): Adapt to vector alignment being a 
>> poly int.
>> (vect_do_peeling): Exit early if vector alignment is not constant.
>> * tree-vect-stmts.c (ensure_base_align): Adapt to vector alignment 
>> being a
>> poly int.
>> (vectorizable_store): Likewise.
>> (vectorizable_load): Likewise.
>> * tree-vectorizer.h (struct dr_vec_info): Make target_alignment 
>> field a
>> poly_uint64.
>> (vect_known_alignment_in_bytes): Adapt to vector alignment being a 
>> poly
>> int.
>> (vect_can_force_dr_alignment_p): Change alignment parameter type to
>> poly_uint64.



Re: [RFC][PATCH]Merge VEC_COND_EXPR into MASK_STORE after loop vectorization

2018-11-09 Thread Renlin Li

Hi Richard,

On 11/09/2018 11:48 AM, Richard Biener wrote:

On Thu, Nov 8, 2018 at 5:55 PM Renlin Li  wrote:


Hi Richard,


*However*, after I rebased my patch on the latest trunk.
Got the following dump from ifcvt:
 [local count: 1006632961]:
# i_20 = PHI 
# ivtmp_18 = PHI 
a_10 = array[i_20];
_1 = a_10 & 1;
_2 = a_10 + 1;
_ifc__34 = _1 != 0 ? _2 : a_10;
array[i_20] = _ifc__34;
_4 = a_10 + 2;
_ifc__37 = _ifc__34 > 10 ? _4 : _ifc__34;
array[i_20] = _ifc__37;
i_13 = i_20 + 1;
ivtmp_5 = ivtmp_18 - 1;
if (ivtmp_5 != 0)
  goto ; [93.33%]
else
  goto ; [6.67%]

the redundant load is not generated, but you could still see the unconditional 
store.


Yes, I fixed the redundant loads recently and indeed dead stores
remain (for the particular
testcase they would be easy to remove).


Right.




After loop vectorization, the following is generated (without my change):


Huh.  But that's not because of if-conversion but because SVE needs to
mask _all_
loop operations that are not safe to execute with the loop_mask!


vect_a_10.6_6 = .MASK_LOAD (vectp_array.4_35, 4B, loop_mask_7);
a_10 = array[i_20];
vect__1.7_39 = vect_a_10.6_6 & vect_cst__38;
_1 = a_10 & 1;
vect__2.8_41 = vect_a_10.6_6 + vect_cst__40;
_2 = a_10 + 1;
vect__ifc__34.9_43 = VEC_COND_EXPR ;
_ifc__34 = _1 != 0 ? _2 : a_10;
.MASK_STORE (vectp_array.10_45, 4B, loop_mask_7, vect__ifc__34.9_43);
vect__4.12_49 = vect_a_10.6_6 + vect_cst__48;
_4 = a_10 + 2;
vect__ifc__37.13_51 = VEC_COND_EXPR  vect_cst__50, 
vect__4.12_49, vect__ifc__34.9_43>;
_ifc__37 = _ifc__34 > 10 ? _4 : _ifc__34;
.MASK_STORE (vectp_array.14_53, 4B, loop_mask_7, vect__ifc__37.13_51);

With the old ifcvt code, my change here could improve it a little bit,
eliminating some redundant loads.
With the new code, it could not improve it further.  I'll adjust the patch
based on the latest trunk.


So what does the patch change the above to?  The code has little to no
comments apart from a
small picture with code _before_ the transform...

It is like this:
  vect_a_10.6_6 = .MASK_LOAD (vectp_array.4_35, 4B, loop_mask_7);
  a_10 = array[i_20];
  vect__1.7_39 = vect_a_10.6_6 & vect_cst__38;
  _1 = a_10 & 1;
  vect__2.8_41 = vect_a_10.6_6 + vect_cst__40;
  _2 = a_10 + 1;
  _60 = vect__1.7_39 != vect_cst__42;
  vect__ifc__34.9_43 = VEC_COND_EXPR <_60, vect__2.8_41, vect_a_10.6_6>;
  _ifc__34 = _1 != 0 ? _2 : a_10;
  vec_mask_and_61 = _60 & loop_mask_7;
  .MASK_STORE (vectp_array.10_45, 4B, vec_mask_and_61, vect__2.8_41);
  vect__4.12_49 = vect_a_10.6_6 + vect_cst__48;
  _4 = a_10 + 2;
  vect__ifc__37.13_51 = VEC_COND_EXPR  vect_cst__50, 
vect__4.12_49, vect__ifc__34.9_43>;
  _ifc__37 = _ifc__34 > 10 ? _4 : _ifc__34;
  .MASK_STORE (vectp_array.14_53, 4B, loop_mask_7, vect__ifc__37.13_51);

As the loaded value is used later, it could not be removed.

With the change, ideally, less data is stored.
However, it might generate more instructions.

1. The load is not eliminable.  Apparently, your change eliminates most of the
   redundant loads; the rest are necessary or not easy to remove.
2. An additional AND instruction is needed.

With a simpler test case like this:

static int array[100];
int test (int a, int i)
{
  for (unsigned i = 0; i < 16; i++)
{
  if (a & 1)
array[i] = a + 1;
}
  return array[i];
}

The new code-gen will be:
  vect__2.4_29 = vect_cst__27 + vect_cst__28;
  _44 = vect_cst__34 != vect_cst__35;
  vec_mask_and_45 = _44 & loop_mask_32;
  .MASK_STORE (vectp_array.9_37, 4B, vec_mask_and_45, vect__2.4_29);

While the old one is:

  vect__2.4_29 = vect_cst__27 + vect_cst__28;
  vect__ifc__24.7_33 = .MASK_LOAD (vectp_array.5_30, 4B, loop_mask_32);
  vect__ifc__26.8_36 = VEC_COND_EXPR ;
  .MASK_STORE (vectp_array.9_37, 4B, loop_mask_32, vect__ifc__26.8_36);




I was wondering whether we can implement

   l = [masked]load;
   tem = cond ? x : l;
   masked-store = tem;

pattern matching in a regular pass - forwprop for example.  Note the
load doesn't need to be masked,
correct?  In fact if it is masked you need to make sure the
conditional never accesses parts that
are masked in the load, no?  Or require the mask to be the same as
that used by the store.  But then
you still cannot simply replace the store mask with a new mask
generated from the conditional?


Yes, this would require the mask for the load and the store to be the same.
This matches the pattern before loop vectorization.
The mask here is the loop mask, to ensure we are bounded by the number of
iterations.

The new mask is (original mask & condition mask) (example shown above).
In this case, fewer lanes will be stored.

It is possible to do that in forwprop.
I could try to integrate the change into it if that is the correct place for it.

As the pattern is initially generated by loop vectorizer, I did the change 
right after it before it got
converted into other forms. For example, forwprop will transform the original 

Re: [RFC] support --with-multilib-list=@/path/name

2018-11-09 Thread Richard Earnshaw (lists)
On 09/11/2018 15:07, Alexandre Oliva wrote:
> Richard,
> 
> Olivier tells me he talked to you briefly at the Cauldron about allowing
> custom multilib sets to be configured from custom Makefile fragments
> supplied at configure time, and that you were somewhat receptive to the
> idea.  I have implemented this as follows, for ARM targets only so far,
> using "@/path/name" in the argument to --with-multilib-list to mean
> /path/name is to be used as a multilib-list Makefile fragment.
> 
> Does this look like an acceptable interface?  Is it ok as far as ARM
> maintainers are concerned?
> 
> Any objections from maintainers of other targets that support
> --with-multilib-list (or from anyone else) to the syntax, to the
> implementation, or to extending this possibility to their ports?
> 
> for  gcc/ChangeLog
> 
>   * config.gcc (tmake_file): Add /path/name to tmake_file for
> each @/path/name in --with-multilib-list on arm-*-* targets.
>   * configure.ac: Accept full pathnames in tmake_file.
>   * configure: Rebuilt.
>   * doc/install.texi (with-multilib-list): Document it.

I'm not opposed to such an extension, but I do have some caveats on the
current implementation (mostly on the implied contract between the
compiler and the person building GCC):

- I'm not sure if we really want to allow combinations of an arbitrary
multilib config with the builtin configurations.  Those are tricky
enough to get right as it is, and requests to support additional libs as
well as those with changes to these makefile fragments might be an
issue.  As such, I think I'd want to say that you either use the builtin
lists *or* you supply your own fragment.

- I'd also be concerned about implying that this interface into the
compiler build system is in any way stable, so I think we'd want to
document explicitly that makefile fragments supplied this way may have
to be tailored to a specific release of GCC.

Given the second point, there's nothing to stop a user copying the
internal makefile fragments into their own fragment and then adjusting
it to work with their additional features; so I don't think the first
restriction is too onerous.

R.

> ---
>  gcc/config.gcc   |   13 +
>  gcc/configure|9 ++---
>  gcc/configure.ac |5 -
>  gcc/doc/install.texi |   22 +++---
>  4 files changed, 38 insertions(+), 11 deletions(-)
> 
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index 7578ff03825e..f1363c41f989 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -3998,6 +3998,19 @@ case "${target}" in
>   aprofile|rmprofile)
>   
> tmake_profile_file="arm/t-multilib"
>   ;;
> + @/*)
> + ml=`echo "X$arm_multilib" | sed 
> '1s,^X@,,'`
> + if test -f "${ml}"; then
> + 
> tmake_file="${tmake_file} ${ml}"
> + else
> + echo "Error: ${ml} does 
> not exist" >&2
> + exit 1
> + fi
> + ;;
> + @*)
> + echo "Error: multilib config 
> file must start with /" >&2
> + exit 1
> + ;;
>   *)
>   echo "Error: 
> --with-multilib-list=${with_multilib_list} not supported." 1>&2
>   exit 1
> diff --git a/gcc/configure b/gcc/configure
> index b814484ea25b..5f15c7a1ff02 100755
> --- a/gcc/configure
> +++ b/gcc/configure
> @@ -12244,7 +12244,10 @@ done
>  tmake_file_=
>  for f in ${tmake_file}
>  do
> - if test -f ${srcdir}/config/$f
> + if test -n `echo "X$f" | sed -n '1s,^X/.*,/,p'` && test -f "$f"
> + then
> + tmake_file_="${tmake_file_} $f"
> + elif test -f ${srcdir}/config/$f
>   then
>   tmake_file_="${tmake_file_} \$(srcdir)/config/$f"
>   fi
> @@ -18572,7 +18575,7 @@ else
>lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
>lt_status=$lt_dlunknown
>cat > conftest.$ac_ext <<_LT_EOF
> -#line 18575 "configure"
> +#line 18578 "configure"
>  #include "confdefs.h"
>  
>  #if HAVE_DLFCN_H
> @@ -18678,7 +18681,7 @@ else
>lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
>lt_status=$lt_dlunknown
>cat > conftest.$ac_ext <<_LT_EOF
> -#line 18681 "configure"
> +#line 18684 "configure"
>  #include "confdefs.h"
>  
>  #if HAVE_DLFCN_H
> diff --git a/gcc/configure.ac b/gcc/configure.ac
> index 

Re: [PATCH] Fix up affinity-fmt.c (was Re: [committed 0/4] (Partial) OpenMP 5.0 support for GCC 9)

2018-11-09 Thread Jakub Jelinek
On Fri, Nov 09, 2018 at 07:33:44AM -0800, Iain Sandoe wrote:
> > On 9 Nov 2018, at 06:37, Jakub Jelinek  wrote:
> 
> > +#if defined (HAVE_INTTYPES_H) && defined (PRIx64)
> > +   else if (sizeof (gomp_integral (handle)) == sizeof (uint64_t))
> > + sprintf (buf, "0x" PRIx64, (uint64_t) gomp_integral (handle));
> 
> s/0x/0x%/?

Oops, consider it adjusted.  Haven't started my bootstrap/regtest of it
yet...

Thanks.

Jakub


Re: Implement {get,set}_range_info() variants that work with value_range's

2018-11-09 Thread Martin Sebor

On 11/09/2018 03:02 AM, Aldy Hernandez wrote:



On 11/8/18 4:47 PM, Martin Sebor wrote:

On 11/08/2018 04:52 AM, Aldy Hernandez wrote:

get/set_range_info() currently returns the extremes of a range.  I have
implemented overloaded variants that return a proper range.  In the
future we should use actual ranges throughout, and not depend on range
extremes, as depending on this behavior could cause us to lose
precision.

I am also including changes to size_must_be_zero_p() to show how we
should be using the range API, as opposed to performing error prone
ad-hoc calculations on ranges and anti-ranges.

Martin, I'm not saying your code is wrong.  There are numerous other
places in the compiler where we manipulate ranges/anti-ranges directly,
all of which should be adapted in the future.  Every mention of
VR_RANGE/VR_ANTI_RANGE in the compiler is suspect.  We should
ideally be using intersect/union/may_contain_p/null_p, etc.

OK pending another round of tests?  (As I had tested this patch along
with a whole slew of other upcoming changes ;-)).


I'm all for simplification! :)

I'd love it if the API made it possible to write these expressions
using native operators and intuitive names, without having to worry
about details like what types they are represented in or how to
convert between them.  It would be great if a function like
size_must_be_zero_p could be coded along the lines similar to:

   return intersect (size, value_range (ssize_type_node) - 0) == 0;

It shouldn't have to twiddle bits to compute the wide_int value
of SSIZE_MAX or wrestle with convertibility warts by calling
build_int_cst and and wide_int_to_tree.

Martin

PS I think we should be able to get close to the syntax above
with a few value_range ctors:

   value_range::value_range (tree type_or_expression);


Would this create the range for [type, type] or [-MIN, type] or what?


For a TYPE it would create [TYPE_MIN, TYPE_MAX].


   template 
   value_range::value_range (const IntegerType&);

a non-member minus operator to compute the difference:

   value_range operator- (const value_range&, const value_range&);

a non-member intersect function like this:

   value_range intersect (const value_range&, const value_range&);

and a non-member equality operator:

   bool operator== (const value_range&, const value_range&);

With the above, because a tree and any integer (including wide_int,
offset_int, and all the other flavors du jour) implicitly convert
to a value_range (possibly representing a single value),
the subtraction, intersection, and equality would apply to any and
all of those types as well without callers having to spell out or
even know what representation they're dealing with.
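
To make that concrete, here is a toy, self-contained sketch of the shape
I have in mind (plain 64-bit bounds instead of trees/wide_int, no
anti-ranges or empty-range handling; none of this is the actual GCC class):

  #include <algorithm>

  struct value_range
  {
    long long lo, hi;
    value_range (long long l, long long h) : lo (l), hi (h) {}
    value_range (long long v) : lo (v), hi (v) {}  /* single values convert implicitly */
  };

  static value_range
  operator- (const value_range &a, const value_range &b)
  {
    /* [a.lo, a.hi] - [b.lo, b.hi] = [a.lo - b.hi, a.hi - b.lo].  */
    return value_range (a.lo - b.hi, a.hi - b.lo);
  }

  static value_range
  intersect (const value_range &a, const value_range &b)
  {
    return value_range (std::max (a.lo, b.lo), std::min (a.hi, b.hi));
  }

  static bool
  operator== (const value_range &a, const value_range &b)
  {
    return a.lo == b.lo && a.hi == b.hi;
  }

  int
  main ()
  {
    /* intersect ([0,7], [5,100] - 2) == [3,7]  */
    return intersect (value_range (0, 7), value_range (5, 100) - 2)
           == value_range (3, 7) ? 0 : 1;
  }

The real thing would of course have to cope with anti-ranges, overflow,
and symbolic bounds, which is where wide_int and trees come back in.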


I like it.


I'm glad!  When can we get it? ;-)  Seriously, if this is
something we would like to see I'm happy to help put it
together.

Martin


Re: [PATCH] Fix up affinity-fmt.c (was Re: [committed 0/4] (Partial) OpenMP 5.0 support for GCC 9)

2018-11-09 Thread Iain Sandoe
Hi Jakub,


> On 9 Nov 2018, at 06:37, Jakub Jelinek  wrote:

> +#if defined (HAVE_INTTYPES_H) && defined (PRIx64)
> + else if (sizeof (gomp_integral (handle)) == sizeof (uint64_t))
> +   sprintf (buf, "0x" PRIx64, (uint64_t) gomp_integral (handle));

s/0x/0x%/?

Iain



[PATCH][RFC] Come up with -flive-patching master option.

2018-11-09 Thread Martin Liška
Hi.

After I added 2 new options, I would like to include a new master option.
It's a minimal version which only disables optimizations that we are aware of
that can potentially cause problems for live-patching.

Martin
>From dd52cd0249fc30cf6d7bf01a8826323277817b78 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 7 Nov 2018 12:41:19 +0100
Subject: [PATCH] Come up with -fvectorized-functions.

---
 gcc/fortran/decl.c| 33 +++
 gcc/fortran/gfortran.h|  2 ++
 gcc/fortran/match.h   |  1 +
 gcc/fortran/parse.c   |  3 +++
 gcc/fortran/trans-intrinsic.c | 43 +++
 5 files changed, 82 insertions(+)

diff --git a/gcc/fortran/decl.c b/gcc/fortran/decl.c
index 2b77d950abb..938c35c508f 100644
--- a/gcc/fortran/decl.c
+++ b/gcc/fortran/decl.c
@@ -98,6 +98,9 @@ bool gfc_matching_function;
 /* Set upon parsing a !GCC$ unroll n directive for use in the next loop.  */
 int directive_unroll = -1;
 
+/* List middle-end built-ins that should be vectorized.  */
+vec vectorized_builtins;
+
 /* If a kind expression of a component of a parameterized derived type is
parameterized, temporarily store the expression here.  */
 static gfc_expr *saved_kind_expr = NULL;
@@ -11243,3 +11246,33 @@ gfc_match_gcc_unroll (void)
   gfc_error ("Syntax error in !GCC$ UNROLL directive at %C");
   return MATCH_ERROR;
 }
+
+/* Match a !GCC$ builtin b attributes flags form:
+
+   The parameter b is name of a middle-end built-in.
+   Flags are one of:
+ - omp-simd-notinbranch.
+
+   When we come here, we have already matched the !GCC$ builtin string.  */
+match
+gfc_match_gcc_builtin (void)
+{
+  char builtin[GFC_MAX_SYMBOL_LEN + 1];
+
+  if (gfc_match_name (builtin) != MATCH_YES)
+return MATCH_ERROR;
+
+  gfc_gobble_whitespace ();
+  if (gfc_match ("attributes") != MATCH_YES)
+return MATCH_ERROR;
+
+  gfc_gobble_whitespace ();
+  if (gfc_match ("omp_simd_notinbranch") != MATCH_YES)
+return MATCH_ERROR;
+
+  char *r = XNEWVEC (char, strlen (builtin) + 32);
+  sprintf (r, "__builtin_%s", builtin);
+  vectorized_builtins.safe_push (r);
+
+  return MATCH_YES;
+}
diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index d8ef35d9d6c..bb7f4dd0c4b 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -2763,6 +2763,7 @@ gfc_finalizer;
 bool gfc_in_match_data (void);
 match gfc_match_char_spec (gfc_typespec *);
 extern int directive_unroll;
+extern vec vectorized_builtins;
 
 /* Handling Parameterized Derived Types  */
 bool gfc_insert_kind_parameter_exprs (gfc_expr *);
@@ -3501,5 +3502,6 @@ bool gfc_is_reallocatable_lhs (gfc_expr *);
 /* trans-decl.c */
 
 void finish_oacc_declare (gfc_namespace *, gfc_symbol *, bool);
+void gfc_adjust_builtins (void);
 
 #endif /* GCC_GFORTRAN_H  */
diff --git a/gcc/fortran/match.h b/gcc/fortran/match.h
index 418542bd5a6..f25ed860c06 100644
--- a/gcc/fortran/match.h
+++ b/gcc/fortran/match.h
@@ -247,6 +247,7 @@ match gfc_match_dimension (void);
 match gfc_match_external (void);
 match gfc_match_gcc_attributes (void);
 match gfc_match_gcc_unroll (void);
+match gfc_match_gcc_builtin (void);
 match gfc_match_import (void);
 match gfc_match_intent (void);
 match gfc_match_intrinsic (void);
diff --git a/gcc/fortran/parse.c b/gcc/fortran/parse.c
index 13cc6f5fccd..56d0d050bc3 100644
--- a/gcc/fortran/parse.c
+++ b/gcc/fortran/parse.c
@@ -1072,6 +1072,7 @@ decode_gcc_attribute (void)
 
   match ("attributes", gfc_match_gcc_attributes, ST_ATTR_DECL);
   match ("unroll", gfc_match_gcc_unroll, ST_NONE);
+  match ("builtin", gfc_match_gcc_builtin, ST_NONE);
 
   /* All else has failed, so give up.  See if any of the matchers has
  stored an error message of some sort.  */
@@ -5663,6 +5664,8 @@ parse_progunit (gfc_statement st)
   gfc_state_data *p;
   int n;
 
+  gfc_adjust_builtins ();
+
   if (gfc_new_block
   && gfc_new_block->abr_modproc_decl
   && gfc_new_block->attr.function)
diff --git a/gcc/fortran/trans-intrinsic.c b/gcc/fortran/trans-intrinsic.c
index 4ae2b3252b5..0417e039a39 100644
--- a/gcc/fortran/trans-intrinsic.c
+++ b/gcc/fortran/trans-intrinsic.c
@@ -597,7 +597,50 @@ define_quad_builtin (const char *name, tree type, bool is_const)
   return fndecl;
 }
 
+/* Add SIMD attribute for FNDECL built-in if the built-in
+   name is in VECTORIZED_BUILTINS.  */
 
+static void
+add_simd_flag_for_built_in (tree fndecl)
+{
+  if (fndecl == NULL_TREE)
+return;
+
+  const char *name = IDENTIFIER_POINTER (DECL_NAME (fndecl));
+  for (unsigned i = 0; i < vectorized_builtins.length (); i++)
+if (strcmp (vectorized_builtins[i], name) == 0)
+  {
+	DECL_ATTRIBUTES (fndecl)
+	  = tree_cons (get_identifier ("omp declare simd"),
+		   build_tree_list (NULL_TREE,
+	build_omp_clause (UNKNOWN_LOCATION,
+			  OMP_CLAUSE_NOTINBRANCH)),
+		   DECL_ATTRIBUTES (fndecl));
+	return;
+  }
+}
+
+void
+gfc_adjust_builtins (void)
+{
+  gfc_intrinsic_map_t *m;
+  for (m = 

[PATCH, GCC, ARM] Enable armv8.5-a and add +sb and +predres for previous ARMv8-a in ARM

2018-11-09 Thread Sudakshina Das
Hi

This patch adds -march=armv8.5-a to the Arm backend.
(https://developer.arm.com/products/architecture/cpu-architecture/a-profile/exploration-tools)
Armv8.5-A also adds two new security features:
- Speculation Barrier instruction
- Execution and Data Prediction Restriction Instructions
These are made optional for all older Armv8-A versions. Thus we are
adding two new options, "+sb" and "+predres", to all older Armv8-A
architectures. These are passed on to the assembler, have no code
generation effects, and have already gone into binutils trunk.

Bootstrapped and regression tested with arm-none-linux-gnueabihf.

Is this ok for trunk?
Sudi

*** gcc/ChangeLog ***

2018-xx-xx  Sudakshina Das  

* config/arm/arm-cpus.in (armv8_5, sb, predres): New features.
(ARMv8_5a): New fgroup.
(armv8.5-a): New arch.
(armv8-a, armv8.1-a, armv8.2-a, armv8.3-a, armv8.4-a): New
options sb and predres.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/t-aprofile: Add matching rules for -march=armv8.5-a
* config/arm/t-arm-elf (all_v8_archs): Add armv8.5-a.
* config/arm/t-multilib (v8_5_a_simd_variants): New variable.
Add matching rules for -march=armv8.5-a and extensions.
* doc/invoke.texi (ARM options): Document -march=armv8.5-a.
Add sb and predres to all armv8-a except armv8.5-a.

*** gcc/testsuite/ChangeLog ***

2018-xx-xx  Sudakshina Das  

* gcc.target/arm/multilib.exp: Add some -march=armv8.5-a
combination tests.
diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
index d82e95a226659948e59b317f07e0fd386ed674a2..e6bcc3c720b64f4c80d9bff101e756de82d760e6 100644
--- a/gcc/config/arm/arm-cpus.in
+++ b/gcc/config/arm/arm-cpus.in
@@ -114,6 +114,9 @@ define feature armv8_3
 # Architecture rel 8.4.
 define feature armv8_4
 
+# Architecture rel 8.5.
+define feature armv8_5
+
 # M-Profile security extensions.
 define feature cmse
 
@@ -174,6 +177,14 @@ define feature quirk_cm3_ldrd
 # (Very) slow multiply operations.  Should probably be a tuning bit.
 define feature smallmul
 
+# Speculation Barrier Instruction for v8-A architectures, added by
+# default to v8.5-A
+define feature sb
+
+# Execution and Data Prediction Restriction Instruction for
+# v8-A architectures, added by default from v8.5-A
+define feature predres
+
 # Feature groups.  Conventionally all (or mostly) upper case.
 # ALL_FPU lists all the feature bits associated with the floating-point
 # unit; these will all be removed if the floating-point unit is disabled
@@ -235,6 +246,7 @@ define fgroup ARMv8_1aARMv8a crc32 armv8_1
 define fgroup ARMv8_2aARMv8_1a armv8_2
 define fgroup ARMv8_3aARMv8_2a armv8_3
 define fgroup ARMv8_4aARMv8_3a armv8_4
+define fgroup ARMv8_5aARMv8_4a armv8_5 sb predres
 define fgroup ARMv8m_base ARMv6m armv8 cmse tdiv
 define fgroup ARMv8m_main ARMv7m armv8 cmse
 define fgroup ARMv8r  ARMv8a
@@ -505,6 +517,8 @@ begin arch armv8-a
  option crypto add FP_ARMv8 CRYPTO
  option nocrypto remove ALL_CRYPTO
  option nofp remove ALL_FP
+ option sb add sb
+ option predres add predres
 end arch armv8-a
 
 begin arch armv8.1-a
@@ -517,6 +531,8 @@ begin arch armv8.1-a
  option crypto add FP_ARMv8 CRYPTO
  option nocrypto remove ALL_CRYPTO
  option nofp remove ALL_FP
+ option sb add sb
+ option predres add predres
 end arch armv8.1-a
 
 begin arch armv8.2-a
@@ -532,6 +548,8 @@ begin arch armv8.2-a
  option nocrypto remove ALL_CRYPTO
  option nofp remove ALL_FP
  option dotprod add FP_ARMv8 DOTPROD
+ option sb add sb
+ option predres add predres
 end arch armv8.2-a
 
 begin arch armv8.3-a
@@ -547,6 +565,8 @@ begin arch armv8.3-a
  option nocrypto remove ALL_CRYPTO
  option nofp remove ALL_FP
  option dotprod add FP_ARMv8 DOTPROD
+ option sb add sb
+ option predres add predres
 end arch armv8.3-a
 
 begin arch armv8.4-a
@@ -560,8 +580,23 @@ begin arch armv8.4-a
  option crypto add FP_ARMv8 CRYPTO DOTPROD
  option nocrypto remove ALL_CRYPTO
  option nofp remove ALL_FP
+ option sb add sb
+ option predres add predres
 end arch armv8.4-a
 
+begin arch armv8.5-a
+ tune for cortex-a53
+ tune flags CO_PROC
+ base 8A
+ profile A
+ isa ARMv8_5a
+ option simd add FP_ARMv8 DOTPROD
+ option fp16 add fp16 fp16fml FP_ARMv8 DOTPROD
+ option crypto add FP_ARMv8 CRYPTO DOTPROD
+ option nocrypto remove ALL_CRYPTO
+ option nofp remove ALL_FP
+end arch armv8.5-a
+
 begin arch armv8-m.base
  tune for cortex-m23
  base 8M_BASE
diff --git a/gcc/config/arm/arm-tables.opt b/gcc/config/arm/arm-tables.opt
index eacee746a39912d04aa03c636f9a95e0e72ce43b..dde6e137db5598d92df6a1e69a63140146bf7372 100644
--- a/gcc/config/arm/arm-tables.opt
+++ b/gcc/config/arm/arm-tables.opt
@@ -377,19 +377,22 @@ EnumValue
 Enum(arm_arch) String(armv8.4-a) Value(24)
 
 EnumValue
-Enum(arm_arch) String(armv8-m.base) Value(25)
+Enum(arm_arch) String(armv8.5-a) Value(25)
 
 EnumValue
-Enum(arm_arch) String(armv8-m.main) Value(26)
+Enum(arm_arch) 

libgo patch committed: Change RLIM_INFINITY from 0xffffffffffffffff to -1

2018-11-09 Thread Ian Lance Taylor
This libgo patch changes RLIM_INFINITY from 0xffffffffffffffff to -1
on GNU/Linux.  0xffffffffffffffff is arguably the correct value, but
in the gc toolchain's syscall package the value is -1.  So we are
compatible.  New programs should be using the golang.org/x/sys/unix
package anyhow.  This fixes https://golang.org/issue/28665.
Bootstrapped and ran Go tests on x86_64-pc-linux-gnu.  Committed to
trunk and to 7 and 8 branches.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 265820)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-5fcfe352ad91945a4f4d0dcfb6309df9bd072c7d
+da8c968474690d1e77442ac3361b2302ea8e1f36
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/mksysinfo.sh
===
--- libgo/mksysinfo.sh  (revision 265710)
+++ libgo/mksysinfo.sh  (working copy)
@@ -1152,10 +1152,17 @@ grep '^const _RLIMIT_' gen-sysinfo.go |
 grep '^const _RLIM_' gen-sysinfo.go |
 grep -v '^const _RLIM_INFINITY ' |
 sed -e 's/^\(const \)_\(RLIM_[^= ]*\)\(.*\)$/\1\2 = _\2/' >> ${OUT}
+rliminf=""
 if test "${rlimit}" = "_rlimit64" && grep '^const _RLIM64_INFINITY ' 
gen-sysinfo.go > /dev/null 2>&1; then
-  echo 'const RLIM_INFINITY = _RLIM64_INFINITY' >> ${OUT}
-elif grep '^const _RLIM_INFINITY ' gen-sysinfo.go > /dev/null 2>&1; then
-  echo 'const RLIM_INFINITY = _RLIM_INFINITY' >> ${OUT}
+  rliminf=`grep '^const _RLIM64_INFINITY ' gen-sysinfo.go | sed -e 's/.* //'`
+else
+  rliminf=`grep '^const _RLIM_INFINITY ' gen-sysinfo.go | sed -e 's/.* //'`
+fi
+# For compatibility with the gc syscall package, treat 0xffffffffffffffff as -1.
+if test "$rliminf" = "0xffffffffffffffff"; then
+  echo "const RLIM_INFINITY = -1" >> ${OUT}
+elif test -n "$rliminf"; then
+  echo "const RLIM_INFINITY = $rliminf" >> ${OUT}
 fi
 
 # The sysinfo struct.


Re: [PATCH 2/3] Add a pass to automatically add ptwrite instrumentation

2018-11-09 Thread Richard Biener
On Sun, Nov 4, 2018 at 7:33 AM Andi Kleen  wrote:
>
> From: Andi Kleen 
>
> Add a new pass to automatically instrument changes to variables
> with the new PTWRITE instruction on x86. PTWRITE writes a 4 or 8 byte
> field into an Processor Trace log, which allows log over head
> logging of informatin.
>
> This allows reconstructing how values changed later, which can be useful for
> debugging or other analysis of the program behavior. With the compiler
> support this can be done without having to manually add instrumentation
> to the code.
>
> Using dwarf information this can be later mapped back to the variables.
>
> There are new options to enable instrumentation for different types,
> and also a new attribute to control analysis fine grained per
> function or variable level. The attributes can be set on both
> the variable and the type level, and also on structure fields.
> This allows tracing to be enabled only for specific code in large
> programs.
>
> The pass is generic, but only the x86 backend enables the necessary
> hooks. When the backend enables the necessary hooks (with -mptwrite)
> there is an additional pass that looks through the code for
> attribute vartrace enabled functions or variables.
>
> The -fvartrace-locals options is experimental: it works, but it
> generates redundant ptwrites because the pass doesn't use
> the SSA information to minimize instrumentation. This could be optimized
> later.
>
> Currently the code can be tested with SDE, or on a Intel
> Gemini Lake system with a new enough Linux kernel (v4.10+)
> that supports PTWRITE for PT. Linux perf can be used to
> record the values
>
> perf record -e intel_pt/ptw=1,branch=0/ program
> perf script --itrace=crw -F +synth ...
>
> I have an experimental version of perf that can also use
> dwarf information to symbolize many[1] values back to their variable
> names. So far it is not in standard perf, but available at
>
> https://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc.git/log/?h=perf/var-resolve-4
>
> It is currently not able to decode all variable locations to names,
> but a large subset.
>
> Longer term hopefully gdb will support this information too.
>
> The CPU can potentially generate very high data bandwidths when
> code doing a lot of computation is heavily instrumented.
> This can cause some data loss in both the CPU and also in perf
> logging the data when the disk cannot keep up.
>
> Running some larger workloads, most of them do not cause
> CPU level overflows, but I've seen it with -fvartrace
> with crafty, and with more workloads with -fvartrace-locals.
>
> The recommendation is to not fully instrument programs,
> but only areas of interest either at the file level or using
> the attributes.
>
> The other thing is that perf and the disk often cannot keep up
> with the data bandwidth for longer computations. In this case
> it's possible to use perf snapshot mode (add --snapshot
> to the command line above). The data will then only be logged to
> a memory ring buffer, and the buffers are only dumped on events
> of interest by sending SIGUSR2 to the perf binary.
>
> In the future this will be hopefully better supported with
> core files and gdb.
>
> Passes bootstrap and test suite on x86_64-linux, also
> bootstrapped and tested gcc itself with full -fvartrace
> and -fvartrace-locals instrumentation.

So how is this supposed to be used?  I guess in an
edit-debug cycle and not for production code?

What do you actually write with PTWRITE?  I suppose
you need to keep an ID-to-something mapping somewhere
so you can make sense of the perf records?

A few comments inline below, but I'm not sure if this
whole thing is interesting for GCC (as opposed to being
an instrumentation plugin).

> gcc/:
>
> 2018-11-03  Andi Kleen  
>
> * Makefile.in: Add tree-vartrace.o.
> * common.opt: Add -fvartrace, -fvartrace-returns,
> -fvartrace-args, -fvartrace-reads, -fvartrace-writes,
> -fvartrace-locals
> * config/i386/i386.c (ix86_vartrace_func): Add.
> (TARGET_VARTRACE_FUNC): Add.
> * doc/extend.texi: Document vartrace/no_vartrace
> attributes.
> * doc/invoke.texi: Document -fvartrace, -fvartrace-returns,
> -fvartrace-args, -fvartrace-reads, -fvartrace-writes,
> -fvartrace-locals
> * doc/tm.texi (TARGET_VARTRACE_FUNC): Add.
> * passes.def: Add vartrace pass.
> * target.def (vartrace_func): Add.
> * tree-pass.h (make_pass_vartrace): Add.
> * tree-vartrace.c: New file to implement vartrace pass.
>
> gcc/c-family/:
>
> 2018-11-03  Andi Kleen  
>
> * c-attribs.c (handle_vartrace_attribute): New function.
>
> config/:
>
> 2018-11-03  Andi Kleen  
>
> * bootstrap-vartrace.mk: New.
> * bootstrap-vartrace-locals.mk: New.
> ---
>  config/bootstrap-vartrace-locals.mk |   3 +
>  config/bootstrap-vartrace.mk|   3 +
>  gcc/Makefile.in |   1 +
>  gcc/c-family/c-attribs.c|  23 
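
To picture the interface under discussion, here is a minimal sketch of how
the vartrace/no_vartrace attributes described above might be applied.  The
attribute spellings come from the ChangeLog; the code itself is illustrative
only and is not taken from the patch.

/* Ask for writes to this global to be instrumented with PTWRITE
   (assumes -mptwrite and the new pass, as described above).  */
int hits __attribute__((vartrace));

/* The attribute can also be placed on types and on structure fields.  */
struct packet
{
  int len __attribute__((vartrace));
  int flags;
};

/* Opt a specific function out of instrumentation.  */
__attribute__((no_vartrace)) void
reset (struct packet *p)
{
  p->flags = 0;
}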

[RFC] support --with-multilib-list=@/path/name

2018-11-09 Thread Alexandre Oliva
Richard,

Olivier tells me he talked to you briefly at the Cauldron about allowing
custom multilib sets to be configured from custom Makefile fragments
supplied at configure time, and that you were somewhat receptive to the
idea.  I have implemented this as follows, for ARM targets only so far,
using "@/path/name" in the argument to --with-multilib-list to mean
/path/name is to be used as a multilib-list Makefile fragment.

Does this look like an acceptable interface?  Is it ok as far as ARM
maintainers are concerned?

Any objections from maintainers of other targets that support
--with-multilib-list (or from anyone else) to the syntax, to the
implementation, or to extending this possibility to their ports?

for  gcc/ChangeLog

* config.gcc (tmake_file): Add /path/name to tmake_file for
each @/path/name in --with-multilib-list on arm-*-* targets.
* configure.ac: Accept full pathnames in tmake_file.
* configure: Rebuilt.
* doc/install.texi (with-multilib-list): Document it.
---
 gcc/config.gcc   |   13 +
 gcc/configure|9 ++---
 gcc/configure.ac |5 -
 gcc/doc/install.texi |   22 +++---
 4 files changed, 38 insertions(+), 11 deletions(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 7578ff03825e..f1363c41f989 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -3998,6 +3998,19 @@ case "${target}" in
aprofile|rmprofile)

tmake_profile_file="arm/t-multilib"
;;
+   @/*)
+   ml=`echo "X$arm_multilib" | sed '1s,^X@,,'`
+   if test -f "${ml}"; then
+   tmake_file="${tmake_file} ${ml}"
+   else
+   echo "Error: ${ml} does not exist" >&2
+   exit 1
+   fi
+   ;;
+   @*)
+   echo "Error: multilib config file must start with /" >&2
+   exit 1
+   ;;
*)
echo "Error: --with-multilib-list=${with_multilib_list} not supported." 1>&2
exit 1
diff --git a/gcc/configure b/gcc/configure
index b814484ea25b..5f15c7a1ff02 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -12244,7 +12244,10 @@ done
 tmake_file_=
 for f in ${tmake_file}
 do
-   if test -f ${srcdir}/config/$f
+   if test -n `echo "X$f" | sed -n '1s,^X/.*,/,p` && test -f "$f"
+   then
+   tmake_file_="${tmake_file_} $f"
+   elif test -f ${srcdir}/config/$f
then
tmake_file_="${tmake_file_} \$(srcdir)/config/$f"
fi
@@ -18572,7 +18575,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 18575 "configure"
+#line 18578 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -18678,7 +18681,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 18681 "configure"
+#line 18684 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
diff --git a/gcc/configure.ac b/gcc/configure.ac
index 59585912556b..99a3e6f8f52f 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -1940,7 +1940,10 @@ done
 tmake_file_=
 for f in ${tmake_file}
 do
-   if test -f ${srcdir}/config/$f
+   if test -n `echo "X$f" | sed -n '1s,^X/.*,/,p` && test -f "$f"
+   then
+   tmake_file_="${tmake_file_} $f"
+   elif test -f ${srcdir}/config/$f
then
tmake_file_="${tmake_file_} \$(srcdir)/config/$f"
fi
diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index be9b07b5d23b..fd19fc590ec8 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -1078,13 +1078,21 @@ values and meaning for each target is given below.
 
 @table @code
 @item arm*-*-*
-@var{list} is a comma separated list of @code{aprofile} and @code{rmprofile}
-to build multilibs for A or R and M architecture profiles respectively.  Note
-that, due to some limitation of the current multilib framework, using the
-combined @code{aprofile,rmprofile} multilibs selects in some cases a less
-optimal multilib than when using the multilib profile for the architecture
-targetted.  The special value @code{default} is also accepted and is equivalent
-to omitting the option, ie. only the default run-time library will be 

Re: [PATCH][AArch64] PR79262: Adjust vector cost

2018-11-09 Thread James Greenhalgh
On Mon, Jan 22, 2018 at 09:22:27AM -0600, Richard Biener wrote:
> On Mon, Jan 22, 2018 at 4:01 PM, Wilco Dijkstra  
> wrote:
> > PR79262 has been fixed for almost all AArch64 cpus, however the example is 
> > still
> > vectorized in a few cases, resulting in lower performance.  Increase the 
> > cost of
> > vector-to-scalar moves so it is more similar to the other vector costs. As 
> > a result
> > -mcpu=cortex-a53 no longer vectorizes the testcase - libquantum and SPECv6
> > performance improves.
> >
> > OK for commit?
> 
> It would be better to dissect this cost into vec_to_scalar and vec_extract 
> where
> vec_to_scalar really means getting at the scalar value of a vector of
> uniform values
> which most targets can do without any instruction (just use a subreg).
> 
> I suppose we could also make vec_to_scalar equal to vector extraction and 
> remove
> the uses for the other case (reduction vector result to scalar reg).

I have dug up Richard's comments from last year, which you appear to have
ignored and made no reference to when resubmitting the patch.

Please don't do that. Carefully consider Richard's review feedback before
resubmitting this patch.

To reiterate, it is not OK for trunk.

Thanks,
James

> 
> Richard.
> 
> > ChangeLog:
> > 2018-01-22  Wilco Dijkstra  
> >
> > PR target/79262
> > * config/aarch64/aarch64.c (generic_vector_cost): Adjust 
> > vec_to_scalar_cost.


Re: [PATCH][AArch64] PR79262: Adjust vector cost

2018-11-09 Thread James Greenhalgh
On Fri, Nov 09, 2018 at 08:14:27AM -0600, Wilco Dijkstra wrote:
> PR79262 has been fixed for almost all AArch64 cpus, however the example is 
> still
> vectorized in a few cases, resulting in lower performance.  Increase the cost 
> of
> vector-to-scalar moves so it is more similar to the other vector costs. As a 
> result
> -mcpu=cortex-a53 no longer vectorizes the testcase - libquantum and SPECv6
> performance improves.
> 
> OK for commit?

No.

We have 7 unique target tuning structures in the AArch64 backend, of which
only one has a 2x ratio between scalar_int_cost and vec_to_scalar_cost. Other
ratios are 1, 3, 8, 3, 4, 6.

What makes this choice correct? What makes it more correct than what we
have now? On which of the 28 entries in config/aarch64/aarch64-cores.def does
performance improve? Are the Spec benchmarks sufficiently representative to
change the generic vectorisation costs?

Please validate the performance effect of this patch, which changes default
code generation for everyone, on more than one testcase in a bug report.

Thanks,
James

> ChangeLog:
> 2018-01-22  Wilco Dijkstra  
> 
>     PR target/79262
>     * config/aarch64/aarch64.c (generic_vector_cost): Adjust 
> vec_to_scalar_cost.
> --


Re: [committed 0/4] (Partial) OpenMP 5.0 support for GCC 9

2018-11-09 Thread David Malcolm
On Fri, 2018-11-09 at 12:11 +0100, Jakub Jelinek wrote:
> On Fri, Nov 09, 2018 at 12:04:54PM +0100, Jakub Jelinek wrote:
> > On Fri, Nov 09, 2018 at 11:34:48AM +0100, Dominique d'Humières
> > wrote:
> > > Bootstrapping r265942 on darwin failed with
> > > 
> > > In file included from /Applications/Xcode-6.2.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.9.sdk/usr/include/stdio.h:490,
> > >  from ../../../work/libgomp/affinity-fmt.c:28:
> > > ../../../work/libgomp/affinity-fmt.c: In function
> > > 'gomp_display_affinity':
> > > ../../../work/libgomp/affinity-fmt.c:369:17: error: format '%x'
> > > expects argument of type 'unsigned int', but argument 5 has type
> > > 'long unsigned int' [-Werror=format=]
> > >   369 |   sprintf (buf, "0x%x", (uintptr_t) handle);
> > >   | ^~  ~~
> > 
> > Weird, the above prints a line which just isn't there, line 369 is:
> > sprintf (buf, "0x%x", (int) handle);
> > not
> > sprintf (buf, "0x%x", (uintptr_t) handle);
> > 
> > What is pthread_t on your platform?  An integral type (which one),
> > pointer
> > or something different (e.g. an aggregate)?
> 
> The "but argument 5" in there is also weird, sprintf has 3 arguments.
> Doesn't current gcc emit warnings like:
> /tmp/j.c: In function ‘main’:
> /tmp/j.c:8:21: warning: format ‘%x’ expects argument of type
> ‘unsigned int’, but argument 3 has type ‘long unsigned int’ [-
> Wformat=]
> 8 |   sprintf (buf, "0x%x", l);
>   |~^   ~
>   | |   |
>   | |   long unsigned int
>   | unsigned int
>   |%lx
> ?
> So, what exact compiler is compiling this code?

I agree that that looks weird.  I wonder if this is showing a bug in
the substring-locations.c or c-format.c code?  (I've changed these in
the last couple of months; maybe I messed up?)  Or is the compiler
picking up the wrong code?

The fact that it fails to use the location *within* the string in the
initial error but then prints a supporting "note" with it suggests to
me that there are macros involved.  Is sprintf a macro on this
configuration?  Maybe that's got something to do with it.

Dave


[PATCH][AArch64] Fix symbol offset limit

2018-11-09 Thread Wilco Dijkstra
In aarch64_classify_symbol symbols are allowed full-range offsets on
relocations.  This means the offset can use all of the +/-4GB offset,
leaving no offset available for the symbol itself.  This results in
relocation overflow and link-time errors for simple expressions like
_char + 0xff00.

To avoid this, limit the offset to +/-1MB so that the symbol needs to be
within a 3.9GB offset from its references.  For the tiny code model use
a 64KB offset, allowing most of the 1MB range for code/data between the
symbol and its references.
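
To illustrate the failure mode (the array and offset below are made up for
this sketch and are not taken from the PR):

/* With the old +/-4GB per-relocation limit the whole constant offset is
   folded into the adrp/add relocation against "table"; unless table plus
   that offset still lies within the adrp range of the referencing code,
   the link fails with a relocation overflow.  Capping the offset forces
   such addresses to memory (SYMBOL_FORCE_TO_MEM) instead.  */
extern char table[];

char *
far_element (void)
{
  return table + 0xfff00000;
}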

Bootstrapped on AArch64, passes regress, OK for commit?

ChangeLog:
2018-11-09  Wilco Dijkstra  

gcc/
* config/aarch64/aarch64.c (aarch64_classify_symbol):
Apply reasonable limit to symbol offsets.

testsuite/
* gcc.target/aarch64/symbol-range.c (foo): Set new limit.
* gcc.target/aarch64/symbol-range-tiny.c (foo): Likewise.

---

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 
b6ea6d3f14b212b69f76d21b72527b6a8ea8cb0e..be03aeea8cd9bab07a01a161c4a91283bd25c99d
 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -11794,26 +11794,26 @@ aarch64_classify_symbol (rtx x, HOST_WIDE_INT offset)
 the offset does not cause overflow of the final address.  But
 we have no way of knowing the address of symbol at compile time
 so we can't accurately say if the distance between the PC and
-symbol + offset is outside the addressible range of +/-1M in the
-TINY code model.  So we rely on images not being greater than
-1M and cap the offset at 1M and anything beyond 1M will have to
-be loaded using an alternative mechanism.  Furthermore if the
-symbol is a weak reference to something that isn't known to
-resolve to a symbol in this module, then force to memory.  */
+symbol + offset is outside the addressible range of +/-1MB in the
+TINY code model.  So we limit the maximum offset to +/-64KB and
+assume the offset to the symbol is not larger than +/-(1MB - 64KB).
+Furthermore force to memory if the symbol is a weak reference to
+something that doesn't resolve to a symbol in this module.  */
  if ((SYMBOL_REF_WEAK (x)
   && !aarch64_symbol_binds_local_p (x))
- || !IN_RANGE (offset, -1048575, 1048575))
+ || !IN_RANGE (offset, -0x1, 0x1))
return SYMBOL_FORCE_TO_MEM;
+
  return SYMBOL_TINY_ABSOLUTE;
 
case AARCH64_CMODEL_SMALL:
  /* Same reasoning as the tiny code model, but the offset cap here is
-4G.  */
+1MB, allowing +/-3.9GB for the offset to the symbol.  */
  if ((SYMBOL_REF_WEAK (x)
   && !aarch64_symbol_binds_local_p (x))
- || !IN_RANGE (offset, HOST_WIDE_INT_C (-4294967263),
-   HOST_WIDE_INT_C (4294967264)))
+ || !IN_RANGE (offset, -0x10, 0x10))
return SYMBOL_FORCE_TO_MEM;
+
  return SYMBOL_SMALL_ABSOLUTE;
 
case AARCH64_CMODEL_TINY_PIC:
diff --git a/gcc/testsuite/gcc.target/aarch64/symbol-range-tiny.c 
b/gcc/testsuite/gcc.target/aarch64/symbol-range-tiny.c
index 
d7e46b059e41f2672b3a1da5506fa8944e752e01..d49ff4dbe5786ef6d343d2b90052c09676dd7fe5
 100644
--- a/gcc/testsuite/gcc.target/aarch64/symbol-range-tiny.c
+++ b/gcc/testsuite/gcc.target/aarch64/symbol-range-tiny.c
@@ -1,12 +1,12 @@
-/* { dg-do compile } */
+/* { dg-do link } */
 /* { dg-options "-O3 -save-temps -mcmodel=tiny" } */
 
-int fixed_regs[0x0020];
+char fixed_regs[0x0020];
 
 int
-foo()
+main ()
 {
-  return fixed_regs[0x0008];
+  return fixed_regs[0x000ff000];
 }
 
 /* { dg-final { scan-assembler-not "adr\tx\[0-9\]+, fixed_regs\\\+" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/symbol-range.c 
b/gcc/testsuite/gcc.target/aarch64/symbol-range.c
index 
6574cf4310430b847e77ea56bf8f20ef312d53e4..75c87c12f08004c153efc5192e5cfab566c089db
 100644
--- a/gcc/testsuite/gcc.target/aarch64/symbol-range.c
+++ b/gcc/testsuite/gcc.target/aarch64/symbol-range.c
@@ -1,12 +1,12 @@
-/* { dg-do compile } */
+/* { dg-do link } */
 /* { dg-options "-O3 -save-temps -mcmodel=small" } */
 
-int fixed_regs[0x2ULL];
+char fixed_regs[0x2ULL];
 
 int
-foo()
+main ()
 {
-  return fixed_regs[0x1ULL];
+  return fixed_regs[0xf000ULL];
 }
 
 /* { dg-final { scan-assembler-not "adrp\tx\[0-9\]+, fixed_regs\\\+" } } */


[PATCH] Fix up affinity-fmt.c (was Re: [committed 0/4] (Partial) OpenMP 5.0 support for GCC 9)

2018-11-09 Thread Jakub Jelinek
Hi!

The earlier patch doesn't work, because there were still expressions
where handle could still be cast to an integer of different size if it
happened to be a pointer, or an invalid cast of e.g. aggregate to
integer.

The following patch, so far only tested on simplified code, should
handle it properly though: gomp_integral should yield an integral type
without a warning (for aggregates etc. it yields 0, but that is what I wanted
to print; I don't know what else to print if pthread_t is an aggregate, as I
have no idea what it contains).  Plus I've added some portability stuff for
mingw %llx vs. %I64x.

Can you please give it a whirl on Darwin and see what the
display-affinity-1.c testcase prints (in libgomp/testsuite/libgomp.log) ?

Thanks.

2018-11-09  Jakub Jelinek  

* affinity-fmt.c: Include inttypes.h if HAVE_INTTYPES_H.
(gomp_display_affinity): Use __builtin_choose_expr to
properly handle the argument having integral, pointer or some other
type.  If inttypes.h is available and PRIx64 is defined, use PRIx64
with uint64_t type instead of %llx and unsigned long long.

--- libgomp/affinity-fmt.c.jj   2018-11-08 18:08:01.412987460 +0100
+++ libgomp/affinity-fmt.c  2018-11-09 15:24:52.049169494 +0100
@@ -30,6 +30,9 @@
 #ifdef HAVE_UNISTD_H
 #include <unistd.h>
 #endif
+#ifdef HAVE_INTTYPES_H
+# include <inttypes.h>  /* For PRIx64.  */
+#endif
 #ifdef HAVE_UNAME
 #include <sys/utsname.h>
 #endif
@@ -356,37 +359,42 @@ gomp_display_affinity (char *buffer, siz
  goto do_int;
case 'i':
 #if defined(LIBGOMP_USE_PTHREADS) && defined(__GNUC__)
- /* Handle integral pthread_t.  */
- if (__builtin_classify_type (handle) == 1)
-   {
- char buf[3 * (sizeof (handle) + sizeof (int)) + 4];
-
- if (sizeof (handle) == sizeof (long))
-   sprintf (buf, "0x%lx", (long) handle);
- else if (sizeof (handle) == sizeof (long long))
-   sprintf (buf, "0x%llx", (long long) handle);
- else
-   sprintf (buf, "0x%x", (int) handle);
- gomp_display_num (buffer, size, , zero, right, sz, buf);
- break;
-   }
- /* And pointer pthread_t.  */
- else if (__builtin_classify_type (handle) == 5)
-   {
- char buf[3 * (sizeof (uintptr_t) + sizeof (int)) + 4];
-
- if (sizeof (uintptr_t) == sizeof (long))
-   sprintf (buf, "0x%lx", (long) (uintptr_t) handle);
- else if (sizeof (uintptr_t) == sizeof (long long))
-   sprintf (buf, "0x%llx", (long long) (uintptr_t) handle);
- else
-   sprintf (buf, "0x%x", (int) (uintptr_t) handle);
- gomp_display_num (buffer, size, , zero, right, sz, buf);
- break;
-   }
+ {
+   char buf[3 * (sizeof (handle) + sizeof (uintptr_t) + sizeof (int))
++ 4];
+   /* This macro returns expr unmodified for integral or pointer
+  types and 0 for anything else (e.g. aggregates).  */
+#define gomp_nonaggregate(expr) \
+  __builtin_choose_expr (__builtin_classify_type (expr) == 1   \
+|| __builtin_classify_type (expr) == 5, expr, 0)
+   /* This macro returns expr unmodified for integral types,
+  (uintptr_t) (expr) for pointer types and 0 for anything else
+  (e.g. aggregates).  */
+#define gomp_integral(expr) \
+  __builtin_choose_expr (__builtin_classify_type (expr) == 5,  \
+(uintptr_t) gomp_nonaggregate (expr),  \
+gomp_nonaggregate (expr))
+
+   if (sizeof (gomp_integral (handle)) == sizeof (unsigned long))
+ sprintf (buf, "0x%lx", (unsigned long) gomp_integral (handle));
+#if defined (HAVE_INTTYPES_H) && defined (PRIx64)
+   else if (sizeof (gomp_integral (handle)) == sizeof (uint64_t))
+ sprintf (buf, "0x%" PRIx64, (uint64_t) gomp_integral (handle));
+#else
+   else if (sizeof (gomp_integral (handle))
+== sizeof (unsigned long long))
+ sprintf (buf, "0x%llx",
+  (unsigned long long) gomp_integral (handle));
 #endif
+   else
+ sprintf (buf, "0x%x", (unsigned int) gomp_integral (handle));
+   gomp_display_num (buffer, size, , zero, right, sz, buf);
+   break;
+ }
+#else
  val = 0;
  goto do_int;
+#endif
case 'A':
  if (sz == (size_t) -1)
gomp_display_affinity_place (buffer, size, ,


Jakub
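
A standalone sketch of the trick (not part of the patch; the macro and
variable names are local to this example).  __builtin_classify_type
returns 1 for integral and 5 for pointer expressions, so the branch that
__builtin_choose_expr selects never casts an aggregate:

#include <stdio.h>
#include <stdint.h>

/* Pass through integral and pointer expressions, degrade anything else
   (e.g. an aggregate pthread_t) to 0.  */
#define nonaggregate(expr) \
  __builtin_choose_expr (__builtin_classify_type (expr) == 1 \
                         || __builtin_classify_type (expr) == 5, (expr), 0)
/* Yield an integral view: pointers go through uintptr_t.  */
#define integral(expr) \
  __builtin_choose_expr (__builtin_classify_type (expr) == 5, \
                         (uintptr_t) nonaggregate (expr), nonaggregate (expr))

struct fake_handle { int a, b; };

int
main (void)
{
  long l = 42;
  void *p = &l;
  struct fake_handle h = { 1, 2 };

  printf ("0x%lx\n", (unsigned long) integral (l));  /* integer: unchanged */
  printf ("0x%lx\n", (unsigned long) integral (p));  /* pointer: via uintptr_t */
  printf ("0x%lx\n", (unsigned long) integral (h));  /* aggregate: prints 0x0 */
  return 0;
}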


Ping: [PATCH, testsuite] ignore some "conflicting types for built-in" messages

2018-11-09 Thread Paul Koning
Ping.

I'd like to commit this.  The discussion seems to have ended up with the 
conclusion that this is a reasonable approach.

paul


> On Nov 1, 2018, at 3:13 PM, Paul Koning  wrote:
> 
> A number of test cases contain declarations like:
>  void *memcpy();
> which currently are silently accepted on most platforms but not on all; pdp11 
> (and possibly some others) generate a "conflicting types for built-in 
> function" warning.
> 
> It was suggested to prune those messages because the test cases where these 
> occur are not looking for the message but are testing some other issue, so 
> the message is not relevant.  The attached patch adds dg-prune-output 
> directives to do so.
> 
> Ok for trunk?
> 
>   paul
> 
> ChangeLog:
> 
> 2018-11-01  Paul Koning  
> 
>   * gcc.dg/Walloca-16.c: Ignore conflicting types for built-in
>   warnings.
>   * gcc.dg/Wrestrict-4.c: Ditto.
>   * gcc.dg/Wrestrict-5.c: Ditto.
>   * gcc.dg/pr83463.c: Ditto.
>   * gcc.dg/torture/pr55890-2.c: Ditto.
>   * gcc.dg/torture/pr55890-3.c: Ditto.
>   * gcc.dg/torture/pr71816.c: Ditto.
> 
> Index: gcc.dg/Walloca-16.c
> ===
> --- gcc.dg/Walloca-16.c   (revision 265727)
> +++ gcc.dg/Walloca-16.c   (working copy)
> @@ -1,5 +1,6 @@
> /* PR tree-optimization/84224 */
> /* { dg-do compile } */
> +/* { dg-prune-output "conflicting types for built-in" } */
> /* { dg-options "-O0 -Walloca" } */
> 
> void *alloca ();
> Index: gcc.dg/Wrestrict-4.c
> ===
> --- gcc.dg/Wrestrict-4.c  (revision 265727)
> +++ gcc.dg/Wrestrict-4.c  (working copy)
> @@ -3,6 +3,7 @@
>Test to verify that invalid calls to built-in functions declared
>without a prototype don't cause an ICE.
>{ dg-do compile }
> +   { dg-prune-output "conflicting types for built-in" }
>{ dg-options "-O2 -Warray-bounds -Wrestrict" } */
> 
> void* memcpy ();
> Index: gcc.dg/Wrestrict-5.c
> ===
> --- gcc.dg/Wrestrict-5.c  (revision 265727)
> +++ gcc.dg/Wrestrict-5.c  (working copy)
> @@ -2,6 +2,7 @@
>functions declared with no prototype are checked for overlap, and that
>invalid calls are ignored.
>   { dg-do compile }
> +  { dg-prune-output "conflicting types for built-in" }
>   { dg-options "-O2 -Wrestrict" }  */
> 
> typedef __SIZE_TYPE__ size_t;
> Index: gcc.dg/pr83463.c
> ===
> --- gcc.dg/pr83463.c  (revision 265727)
> +++ gcc.dg/pr83463.c  (working copy)
> @@ -1,5 +1,6 @@
> /* PR middle-end/83463 */
> /* { dg-do compile } */
> +/* { dg-prune-output "conflicting types for built-in" } */
> /* { dg-options "-O2 -Wrestrict -Wno-pointer-to-int-cast" } */
> 
> int *a;
> Index: gcc.dg/torture/pr55890-2.c
> ===
> --- gcc.dg/torture/pr55890-2.c(revision 265727)
> +++ gcc.dg/torture/pr55890-2.c(working copy)
> @@ -1,4 +1,5 @@
> /* { dg-do compile } */
> +/* { dg-prune-output "conflicting types for built-in" } */
> 
> extern void *memcpy();
> int main() { memcpy(); }
> Index: gcc.dg/torture/pr55890-3.c
> ===
> --- gcc.dg/torture/pr55890-3.c(revision 265727)
> +++ gcc.dg/torture/pr55890-3.c(working copy)
> @@ -1,4 +1,5 @@
> /* { dg-do compile } */
> +/* { dg-prune-output "conflicting types for built-in" } */
> 
> void *memmove ();
> 
> Index: gcc.dg/torture/pr71816.c
> ===
> --- gcc.dg/torture/pr71816.c  (revision 265727)
> +++ gcc.dg/torture/pr71816.c  (working copy)
> @@ -1,4 +1,5 @@
> /* { dg-do compile } */
> +/* { dg-prune-output "conflicting types for built-in" } */
> 
> void *ext2fs_resize_mem_p;
> struct ext2_icount_el {
> 



[PATCH] Add value_range_base (w/o equivs)

2018-11-09 Thread Richard Biener


This adds value_range_base, a base class of class value_range
with all members but the m_equiv one.

I have looked into the sole GC user, IPA propagation, and replaced
the value_range use there with value_range_base since it also
asserted the equiv member is always NULL.

This in turn means I have written down that GC users can only
use value_range_base (and fixed the accessibility issue by
adding a bunch of friends).

I have moved all methods that are required for this single user
so far (without looking into which others are trivially movable because
they do not look at equivs - that's for a followup).  I ended
up basically duplicating the ::union_ wrapper around union_ranges
(boo).  Some more refactoring might save us a few lines here.

There are two places where IPA prop calls extract_range_from_unary_expr.
I've temporarily made it use value_range temporaries there given
I need to think about the few cases of ->deep_copy () we have in
there.  IMHO equiv handling would need to move to the callers or
rather "copies" shouldn't be handled by extract_range_from_unary_expr.
Well, I need to think about it.  (no, I don't want to make that
function virtual)

The other workers might be even easier to massage to work on 
value_range_base only.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

If that goes well I want to apply this even in its incomplete form.
Unless there are any complaints?

Thanks,
Richard.

From 525e2e74bbb666399fafcffe8f17e928f45ca898 Mon Sep 17 00:00:00 2001
From: Richard Guenther 
Date: Fri, 9 Nov 2018 15:00:50 +0100
Subject: [PATCH] add-value_range_base

* tree-vrp.h (class value_range_base): New base class for
value_range containing all but the m_equiv member.
(dump_value_range_base): Add.
(range_includes_zero_p): Work on value_range_base.
* tree-vrp.c (value_range_base::set): Split out base handling
from...
(value_range::set): this.
(value_range::set_equiv): New.
(value_range_base::value_range_base): New constructors.
(value_range_base::check): Split out base handling from...
(value_range::check): this.
(value_range::equal_p): Refactor in terms of
ignore_equivs_equal_p which is now member of the base.
(value_range_base::set_undefined): New.
(value_range_base::set_varying): Likewise.
(value_range_base::dump):Split out base handling from...
(value_range::dump): this.
(value_range_base::set_and_canonicalize): Split out base handling
from...
(value_range::set_and_canonicalize): this.
(value_range_base::union_): New.

* ipa-prop.h (struct ipa_jump_func): Use value_range_base *
for m_vr.
* ipa-cp.c (class ipcp_vr_lattice): Use value_range_base
instead of value_range everywhere.
(ipcp_vr_lattice::print): Use dump_value_range_base.
(ipcp_vr_lattice::meet_with): Adjust.
(ipcp_vr_lattice::meet_with_1): Likewise.
(ipa_vr_operation_and_type_effects): Likewise.
(propagate_vr_across_jump_function): Likewise.
* ipa-prop.c (struct ipa_vr_ggc_hash_traits): Likewise.
(ipa_get_value_range): Likewise.
(ipa_set_jfunc_vr): Likewise.
(ipa_compute_jump_functions_for_edge): Likewise.

diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index 4471bae11c7..4f147eb37cc 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -307,18 +307,18 @@ private:
 class ipcp_vr_lattice
 {
 public:
-  value_range m_vr;
+  value_range_base m_vr;
 
   inline bool bottom_p () const;
   inline bool top_p () const;
   inline bool set_to_bottom ();
-  bool meet_with (const value_range *p_vr);
+  bool meet_with (const value_range_base *p_vr);
   bool meet_with (const ipcp_vr_lattice );
   void init () { gcc_assert (m_vr.undefined_p ()); }
   void print (FILE * f);
 
 private:
-  bool meet_with_1 (const value_range *other_vr);
+  bool meet_with_1 (const value_range_base *other_vr);
 };
 
 /* Structure containing lattices for a parameter itself and for pieces of
@@ -522,7 +522,7 @@ ipcp_bits_lattice::print (FILE *f)
 void
 ipcp_vr_lattice::print (FILE * f)
 {
-  dump_value_range (f, &m_vr);
+  dump_value_range_base (f, &m_vr);
 }
 
 /* Print all ipcp_lattices of all functions to F.  */
@@ -909,7 +909,7 @@ ipcp_vr_lattice::meet_with (const ipcp_vr_lattice )
lattice.  */
 
 bool
-ipcp_vr_lattice::meet_with (const value_range *p_vr)
+ipcp_vr_lattice::meet_with (const value_range_base *p_vr)
 {
   return meet_with_1 (p_vr);
 }
@@ -918,7 +918,7 @@ ipcp_vr_lattice::meet_with (const value_range *p_vr)
OTHER_VR lattice.  Return TRUE if anything changed.  */
 
 bool
-ipcp_vr_lattice::meet_with_1 (const value_range *other_vr)
+ipcp_vr_lattice::meet_with_1 (const value_range_base *other_vr)
 {
   if (bottom_p ())
 return false;
@@ -926,7 +926,7 @@ ipcp_vr_lattice::meet_with_1 (const value_range *other_vr)
   if (other_vr->varying_p ())
 return set_to_bottom ();
 
-  value_range 

[PATCH][AArch64] Set SLOW_BYTE_ACCESS

2018-11-09 Thread Wilco Dijkstra
Contrary to all documentation, SLOW_BYTE_ACCESS simply means accessing
bitfields by their declared type, which results in better code generation
on practically any target.
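
For illustration (not from the patch): with SLOW_BYTE_ACCESS set, a function
like the one below can read the declared "unsigned int" container once and
test both fields, rather than issuing two narrower loads -- the "better code
when accessing multiple bitfields" that the patch's new comment refers to.

struct flags
{
  unsigned int ready : 1;
  unsigned int mode  : 11;
  unsigned int valid : 1;
};

int
both_set (const struct flags *f)
{
  return f->ready && f->valid;
}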

I'm thinking we should completely remove all trace of SLOW_BYTE_ACCESS
from GCC as it's confusing and useless.

OK for commit until we get rid of it?

ChangeLog:
2017-11-17  Wilco Dijkstra  

    gcc/
    * config/aarch64/aarch64.h (SLOW_BYTE_ACCESS): Set to 1.
--
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 
056110afb228fb919e837c04aa5e5552a4868ec3..d8f4d129a02fb89eb00d256aba8c4764d6026078
 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -769,14 +769,9 @@ typedef struct
    if given data not on the nominal alignment.  */
 #define STRICT_ALIGNMENT    TARGET_STRICT_ALIGN
 
-/* Define this macro to be non-zero if accessing less than a word of
-   memory is no faster than accessing a word of memory, i.e., if such
-   accesses require more than one instruction or if there is no
-   difference in cost.
-   Although there's no difference in instruction count or cycles,
-   in AArch64 we don't want to expand to a sub-word to a 64-bit access
-   if we don't have to, for power-saving reasons.  */
-#define SLOW_BYTE_ACCESS   0
+/* Contrary to all documentation, this enables wide bitfield accesses,
+   which results in better code when accessing multiple bitfields.  */
+#define SLOW_BYTE_ACCESS   1
 
 #define NO_FUNCTION_CSE 1



[PATCH][AArch64] PR79262: Adjust vector cost

2018-11-09 Thread Wilco Dijkstra
PR79262 has been fixed for almost all AArch64 cpus, however the example is still
vectorized in a few cases, resulting in lower performance.  Increase the cost of
vector-to-scalar moves so it is more similar to the other vector costs.  As a
result -mcpu=cortex-a53 no longer vectorizes the testcase - libquantum and
SPECv6 performance improves.

OK for commit?

ChangeLog:
2018-01-22  Wilco Dijkstra  

    PR target/79262
    * config/aarch64/aarch64.c (generic_vector_cost): Adjust 
vec_to_scalar_cost.
--

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 
c6a83c881038873d8b68e36f906783be63ddde56..43f5b7162152ca92a916f4febee01f624c375202
 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -403,7 +403,7 @@ static const struct cpu_vector_cost generic_vector_cost =
   1, /* vec_int_stmt_cost  */
   1, /* vec_fp_stmt_cost  */
   2, /* vec_permute_cost  */
-  1, /* vec_to_scalar_cost  */
+  2, /* vec_to_scalar_cost  */
   1, /* scalar_to_vec_cost  */
   1, /* vec_align_load_cost  */
   1, /* vec_unalign_load_cost  */


[PATCH][ARM] Update max_cond_insns settings

2018-11-09 Thread Wilco Dijkstra
The existing setting of max_cond_insns for most cores is non-optimal.
Thumb-2 IT has a maximum limit of 4, so 5 means emitting 2 IT sequences.
Also such long sequences of conditional instructions can increase the number
of executed instructions significantly, so using 5 for max_cond_insns is
non-optimal.
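
To make the arithmetic concrete (a sketch only, not a claim about what any
particular tuning emits today): a conditional arm with five simple
instructions, if fully if-converted, needs two Thumb-2 IT blocks, since one
IT block predicates at most four instructions.

int
f (int c, int x, int y)
{
  if (c)
    {
      x += y;
      x <<= 1;
      x -= y;
      x ^= 3;
      x += 7;
    }
  return x;
}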

Previous benchmarking showed that setting max_cond_insn to 2 was the best value
for Cortex-A15 and Cortex-A57.  All ARMv8-A cores use 2 - apart from Cortex-A35
and Cortex-A53.  Given that using 5 is worse, set it to 2.  This also has the
advantage of producing more uniform code.

Bootstrap and regress OK on arm-none-linux-gnueabihf.

OK for stage 1?

ChangeLog:
2017-04-12  Wilco Dijkstra  

    * gcc/config/arm/arm.c (arm_cortex_a53_tune): Set max_cond_insns to 2.
    (arm_cortex_a35_tune): Likewise.
---

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
29e8d1d07d918fbb2a627a653510dfc8587ee01a..1a6d552aa322114795acbb3667c6ea36963bf193
 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -1967,7 +1967,7 @@ const struct tune_params arm_cortex_a35_tune =
   arm_default_branch_cost,
   &arm_default_vec_cost,
   1,   /* Constant limit.  */
-  5,   /* Max cond insns.  */
+  2,   /* Max cond insns.  */
   8,   /* Memset max inline.  */
   1,   /* Issue rate.  */
   ARM_PREFETCH_NOT_BENEFICIAL,
@@ -1990,7 +1990,7 @@ const struct tune_params arm_cortex_a53_tune =
   arm_default_branch_cost,
   &arm_default_vec_cost,
   1,   /* Constant limit.  */
-  5,   /* Max cond insns.  */
+  2,   /* Max cond insns.  */
   8,   /* Memset max inline.  */
   2,   /* Issue rate.  */
   ARM_PREFETCH_NOT_BENEFICIAL,


Re: [PATCH v3] S/390: Allow relative addressing of literal pool entries

2018-11-09 Thread Ulrich Weigand
Ilya Leoshkevich wrote:

> +  /* Return unannotated constant pool references, so that the corresponding
> + entries are added to the back-end-managed pool.  Not doing so would 
> result
> + in those entries being placed in the middle-end-managed pool, which 
> would
> + in turn prevent merging them with identical ones in the back-end-managed
> + pool.  */

I'm still not convinced this is necessary; how frequently does the case occur
that a constant is duplicated, and is it worth the extra code complexity?
(Also, note that the middle-end pool is shared across the whole translation
unit, while the back-end pool is duplicated in every function ... so if the
same constant is used by many functions, forcing it into the back-end pool
may *increase* overall size.)

In any case, does this even happen with your current code?  You find the
unannotated reference here, and therefore copy the constant into the local
pool, but then replace_constant_pool_ref will still ignore the insn, and
thus the insn will actually continue to refer to the middle-end pool
constant, *not* the copy created in the local pool.  Right?

>if (DISP_IN_RANGE (INTVAL (frame_off)))
>   {
> -   insn = gen_rtx_SET (frame_pointer,
> -   gen_rtx_PLUS (Pmode, frame_pointer, frame_off));
> -   insn = emit_insn (insn);
> +   insn = emit_insn (copy_rtx (cfa));
>   }

This seems unrelated?

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH, arm] Backport -- Fix ICE during thunk generation with -mlong-calls

2018-11-09 Thread Mihail Ionescu

Hi Sudi

On 11/08/2018 06:53 PM, Sudakshina Das wrote:

Hi Mihail

On 08/11/18 10:02, Ramana Radhakrishnan wrote:

On 07/11/2018 17:49, Mihail Ionescu wrote:

Hi All,

This is a backport from trunk for GCC 8 and 7.

SVN revision: r264595.

Regression tested on arm-none-eabi.


gcc/ChangeLog

2018-11-02  Mihail Ionescu  

Backport from mainline
2018-09-26  Eric Botcazou  

* config/arm/arm.c (arm_reorg): Skip Thumb reorg pass for thunks.
(arm32_output_mi_thunk): Deal with long calls.

gcc/testsuite/ChangeLog

2018-11-02  Mihail Ionescu  

Backport from mainline
2018-09-17  Eric Botcazou  

* g++.dg/other/thunk2a.C: New test.
* g++.dg/other/thunk2b.C: Likewise.


If everything is ok, could someone commit it on my behalf?

Best regards,
   Mihail



It is a regression since my rewrite of this code.

Ok to backport to the release branches, it's been on trunk for a while
and not shown any issues - please give the release managers a day or so
to object.

regards
Ramana



Does this fix PR87867 you reported? If yes, then it would be easier
to add the PR tag in the ChangeLog so that the ticket gets updated once
committed.

Thanks
Sudi



I've seen that the patch has been committed (Ramana added the PR tag), but I
now know it has to be added and I will do it in the future.

Regards
Mihail


Re: [PATCH] Come up with htab_hash_string_vptr and use string-specific if possible.

2018-11-09 Thread Michael Matz
Hi,

On Thu, 8 Nov 2018, Martin Liška wrote:

> > That seems better.  But still, why declare this in system.h?  It seems
> > hash-table.h would be more appropriate.
> 
> I need to declare it before I'll poison it. As system.h is included very 
> early, one can guarantee that there will be no usage before the 
> poisoning happens.

Yes but it's also included everywhere, so adding anything to it comes at a 
cost, and conceptually it simply doesn't belong there.

There's no fundamental reason why we can't poison identifiers in other 
headers.  Indeed we do in vec.h.  So move the whole thing including 
poisoning to hash-table.h?
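
For reference, the mechanism itself is just the following (the identifier is
made up for this example):

/* In the header, after the last line that may legitimately mention it:  */
#pragma GCC poison legacy_string_hash

/* Any later use is a hard error:
     unsigned h = legacy_string_hash (s);
   error: attempt to use poisoned "legacy_string_hash"  */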


Ciao,
Michael.

[committed] json.cc: fix comment

2018-11-09 Thread David Malcolm
I've committed this obvious fix to trunk as r265968 (a copy&paste error).

gcc/ChangeLog:
* json.cc (selftest::test_writing_literals): Fix comment.
---
 gcc/json.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/json.cc b/gcc/json.cc
index a0c43956..46b6ef6 100644
--- a/gcc/json.cc
+++ b/gcc/json.cc
@@ -288,7 +288,7 @@ test_writing_strings ()
   assert_print_eq (contains_quotes, "\"before \\\"quoted\\\" after\"");
 }
 
-/* Verify that JSON strings are written correctly.  */
+/* Verify that JSON literals are written correctly.  */
 
 static void
 test_writing_literals ()
-- 
1.8.5.3


