[Patch] S/390: Fix symbol ref alignment

2015-10-23 Thread Robin Dapp
not always be generated. This patch uses separate flags for 2-, 4-, and 8-byte alignment to fix the problem. Bootstrapped, no regressions on s390. Regards Robin gcc/testsuite/ChangeLog: 2015-10-23 Robin Dapp <rd...@linux.vnet.ibm.com> * gcc.target/s390/load-relative-check.c: Ne
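A minimal sketch of why a single alignment flag is not enough. It assumes, hypothetically, that each symbol carries one bit per guaranteed alignment and that a PC-relative access of N bytes needs an N-byte-aligned target. The flag names and helpers are illustrative only, not the actual s390 back-end macros.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-symbol flags: one bit per guaranteed alignment. */
enum sym_align_flags
{
  SYM_ALIGN2 = 1u << 0,  /* address is a multiple of 2 */
  SYM_ALIGN4 = 1u << 1,  /* address is a multiple of 4 */
  SYM_ALIGN8 = 1u << 2   /* address is a multiple of 8 */
};

/* Derive the flags from a known alignment in bytes. */
static unsigned sym_flags_from_align (size_t align)
{
  unsigned flags = 0;
  if (align % 2 == 0)
    flags |= SYM_ALIGN2;
  if (align % 4 == 0)
    flags |= SYM_ALIGN4;
  if (align % 8 == 0)
    flags |= SYM_ALIGN8;
  return flags;
}

/* Assumed rule: an access of SIZE bytes needs a SIZE-aligned target. */
static bool symbol_ok_for_access (unsigned flags, size_t size)
{
  switch (size)
    {
    case 2: return (flags & SYM_ALIGN2) != 0;
    case 4: return (flags & SYM_ALIGN4) != 0;
    case 8: return (flags & SYM_ALIGN8) != 0;
    default: return false;
    }
}
```

With one "aligned" bit, a 4-byte-aligned symbol and an 8-byte-aligned one look the same, so an 8-byte access could be allowed on a target that is only 4-byte aligned; separate bits keep those cases apart.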

CSE pass prevents loop-invariant motion

2015-09-15 Thread Robin Dapp
Hi, recently, I came across a problem that keeps a load instruction in a loop although it is loop-invariant. A simple example is: #include #define SZ 256 int a[SZ], b[SZ], c[SZ]; int main() { int i; for (i = 0; i < SZ; i++) { a[i] = b[i] + c[i]; } printf("%d\n", a[0]); } The
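The example from this message, laid out as a self-contained unit. The header name was lost from the archived text; `<stdio.h>` is assumed because `printf` needs it, and `main` is renamed to `example_main` (hypothetical) so the loop can be exercised from elsewhere.

```c
#include <stdio.h>

#define SZ 256

int a[SZ], b[SZ], c[SZ];

/* The loop from the message.  Any load that does not depend on i
   (for example, materialising a global array's address, depending on
   the target) is loop-invariant and could be hoisted out of the loop. */
void add_arrays (void)
{
  int i;
  for (i = 0; i < SZ; i++)
    {
      a[i] = b[i] + c[i];
    }
}

/* Stand-in for the original main (). */
int example_main (void)
{
  add_arrays ();
  printf ("%d\n", a[0]);
  return 0;
}
```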

Re: CSE pass prevents loop-invariant motion

2015-09-24 Thread Robin Dapp
On 09/15/2015 05:25 PM, Jeff Law wrote: > On 09/15/2015 06:11 AM, Robin Dapp wrote: >> Hi, >> >> recently, I came across a problem that keeps a load instruction in a >> loop although it is loop-invariant. [..] > You might want to check your costing model -- cprop

Re: [Patch] S/390: Simplify vector conditionals

2015-12-17 Thread Robin Dapp
Hi, the attached patch renames the constm1_operand predicate to all_ones_operand and introduces a check for int mode. It should be applied on top of the last patch ([Patch] S/390: Simplify vector conditionals). Regtested on s390. Regards Robin gcc/ChangeLog: 2015-12-15 Robin Dapp <

[Patch] S/390: Simplify vector conditionals

2015-12-15 Thread Robin Dapp
ree-level. Bootstrapped and regression-tested on s390. Regards Robin gcc/ChangeLog: 2015-12-15 Robin Dapp <rd...@linux.vnet.ibm.com> * config/s390/s390.c (s390_expand_vcond): Convert vector conditional into shift. * config/s390/vector.md: Change operand predicate.
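The ChangeLog says `s390_expand_vcond` converts a vector conditional into a shift, but the excerpt does not show which conditional. The scalar sketch below assumes the sign-mask case suggested by the `all_ones_operand` predicate: selecting all ones or zero by the sign of a 32-bit integer equals an arithmetic right shift by 31, and selecting 1 or 0 equals a logical right shift by 31. Right-shifting a negative signed value is implementation-defined in C; GCC documents it as sign-extending.

```c
#include <stdint.h>

/* Conditional form: all ones if x is negative, else zero. */
static int32_t mask_if_negative_cond (int32_t x)
{
  return x < 0 ? -1 : 0;
}

/* Shift form: replicate the sign bit (arithmetic shift under GCC). */
static int32_t mask_if_negative_shift (int32_t x)
{
  return x >> 31;
}

/* Conditional form: 1 if x is negative, else 0. */
static int32_t one_if_negative_cond (int32_t x)
{
  return x < 0 ? 1 : 0;
}

/* Shift form: move the sign bit down with a logical shift. */
static int32_t one_if_negative_shift (int32_t x)
{
  return (int32_t) ((uint32_t) x >> 31);
}
```

On vectors the same identities hold lane by lane, so a compare-and-select pair can become a single element-wise shift.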

[PATCH] Some tree-vect-data-refs.c cleanup

2016-04-13 Thread Robin Dapp
? No regressions on s390x and amd64. Regards Robin -- gcc/ChangeLog: 2016-04-13 Robin Dapp <rd...@linux.vnet.ibm.com> * tree-vectorizer.h (dr_misalignment): Introduce named DR_MISALIGNMENT constants. (aligned_access_p): Use constants. (known_alignment_for_ac

[RFC] 69526 - ivopts candidate strangeness

2016-03-20 Thread Robin Dapp
et perform bootstrapping and more testing due to the premature nature of the patch. Thanks Robin gcc/ChangeLog: 2016-03-17 Robin Dapp <rd...@linux.vnet.ibm.com> * cfgloop.h (struct GTY): Add second number of iterations * loop-doloop.c (doloop_condition_get)

[PATCH] Tree-level fix for PR 69526

2016-07-21 Thread Robin Dapp
As described in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69526, we currently fail to simplify cases like (unsigned long)(a - 1) + 1 to (unsigned long)a when VRP knows that (a - 1) does not overflow. This patch introduces a match.pd pattern as well as a helper function that checks for
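A small check of why the simplification needs range information. It assumes `a` is a 32-bit unsigned value and `unsigned long` is 64 bits (as on LP64 targets such as s390x and x86-64); fixed-width types are used to make that explicit. `inner_sub_wraps` is a hypothetical helper standing in for what VRP would have to prove.

```c
#include <stdbool.h>
#include <stdint.h>

/* The original expression: subtract in 32 bits, widen, then add. */
static uint64_t widened_expr (uint32_t a)
{
  return (uint64_t) (a - 1) + 1;
}

/* The simplified form the patch aims for. */
static uint64_t simplified_expr (uint32_t a)
{
  return (uint64_t) a;
}

/* a - 1 wraps exactly when a == 0; the fold is only valid if the
   value range of a excludes 0. */
static bool inner_sub_wraps (uint32_t a)
{
  return a == 0;
}
```

For `a == 0` the inner subtraction wraps to `UINT32_MAX`, so the original expression yields 2^32 while the simplified one yields 0; for every other value both agree.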

Re: [PATCH] Tree-level fix for PR 69526

2017-02-02 Thread Robin Dapp
I skimmed through the code to see where transformations like (a - 1) -> (a + UINT_MAX) are performed. It seems there are only two places, match.pd (/* A - B -> A + (-B) if B is easily negatable. */) and fold-const.c. In order to be able to reliably know whether to zero-extend or to sign-extend

Re: [PATCH] Tree-level fix for PR 69526

2017-01-16 Thread Robin Dapp
Ping. To put it shortly, I'm not sure how to differentiate between: example range of a: [3,3] (ulong)(a + UINT_MAX) + 1 --> (ulong)(a) + (ulong)(-1 + 1), sign extend example range of a: [0,0] (ulong)(a + UINT_MAX) + 1 --> (ulong)(a) + (ulong)(UINT_MAX + 1), no sign extend In this case, there
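Using the two ranges from the message, the sketch below shows how the widened constant must depend on whether the inner 32-bit addition wraps. The function names are hypothetical. If `a + c1` wraps, the 32-bit result lost 2^32, so the combined constant must be built from `c1 - 2^32` (for `c1 = UINT_MAX` that is exactly the sign extension of -1); otherwise `c1` is simply zero-extended.

```c
#include <stdbool.h>
#include <stdint.h>

/* Reference: evaluate (uint64_t)(a + c1) + c2 literally. */
static uint64_t eval_original (uint32_t a, uint32_t c1, uint64_t c2)
{
  return (uint64_t) (uint32_t) (a + c1) + c2;
}

/* Folded form (uint64_t)a + C, where C depends on whether a + c1
   wraps in 32 bits. */
static uint64_t eval_folded (uint32_t a, uint32_t c1, uint64_t c2)
{
  bool wraps = (uint32_t) (a + c1) < a;
  /* Wrapped: subtract the lost 2^32.  Not wrapped: zero-extend. */
  uint64_t ext = wraps ? (uint64_t) c1 - (UINT64_C (1) << 32)
                       : (uint64_t) c1;
  return (uint64_t) a + (ext + c2);
}
```

For a in [3,3] the inner addition wraps and C becomes 0 (the sign-extend view); for a in [0,0] it does not wrap and C becomes 2^32 (the zero-extend view). A single range can only be folded if all of its values fall on the same side.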

Re: [PATCH] Tree-level fix for PR 69526

2016-08-22 Thread Robin Dapp
for now because I find extract_range_from_binary_expr_1 somewhat lengthy and hard to follow already :) Wouldn't it be better to "separate concerns"/split it up in the long run and merge the functionality needed here at some time? Bootstrapped and reg-tested on s390x, bootstrap on x86 running.

[PATCH] S/390: Change 2-byte NOPs

2017-03-01 Thread Robin Dapp
Hi, the following patch changes "nopr %r7" to "nopr %r0" which is advantageous from a hardware perspective. It will only be emitted for hotpatching and should not impact normal code. Bootstrapped and regression tested on s390 and s390x. Regards Robin gcc/ChangeLog: 20

Re: [PATCH] Tree-level fix for PR 69526

2016-09-05 Thread Robin Dapp
Ping. diff --git a/gcc/gimple-match-head.c b/gcc/gimple-match-head.c index 2beadbc..d66fcb1 100644 --- a/gcc/gimple-match-head.c +++ b/gcc/gimple-match-head.c @@ -39,6 +39,7 @@ along with GCC; see the file COPYING3. If not see #include "internal-fn.h" #include "case-cfn-macros.h" #include

Re: [PATCH] Use RPO order for fwprop iteration

2016-09-02 Thread Robin Dapp
This causes a performance regression in the xalancbmk SPECint2006 benchmark on s390x. At first sight, the produced asm output doesn't look too different but I'll have a closer look. Is the fwprop order supposed to have major performance implications? Regards Robin > This changes it from PRE on

Re: [PATCH GCC][v2]Simplify alias check code generation in vectorizer

2016-09-26 Thread Robin Dapp
idn't manage to run it independently in this directory via RUNTESTFLAGS=vect.exp=... or otherwise) Bootstrapped on x86 and s390. -- gcc/ChangeLog: 2016-09-26 Robin Dapp <rd...@linux.vnet.ibm.com> * tree-vect-loop-manip.c (create_intersect_range_checks_index): Add tree

Re: [PATCH GCC][v2]Simplify alias check code generation in vectorizer

2016-09-26 Thread Robin Dapp
i_p(). ok to commit? Regards Robin -- gcc/ChangeLog: 2016-09-26 Robin Dapp <rd...@linux.vnet.ibm.com> * tree-vect-loop-manip.c (create_intersect_range_checks_index): Add tree_fits_uhwi_p check. diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c inde

Re: [PATCH] Tree-level fix for PR 69526

2016-10-05 Thread Robin Dapp
Ping.

Re: [PATCH] Tree-level fix for PR 69526

2016-09-20 Thread Robin Dapp
flow. Do you think it should be handled differently? Revised version attached. Regards Robin -- gcc/ChangeLog: 2016-09-20 Robin Dapp <rd...@linux.vnet.ibm.com> PR middle-end/69526 This enables combining of wrapped binary operations and fixes the tree l

Re: [PATCH] Tree-level fix for PR 69526

2016-08-23 Thread Robin Dapp
gah, this
+ return true;
+ if (TREE_CODE (t1) != SSA_NAME)
should of course be like this
+ if (TREE_CODE (t1) != SSA_NAME)
+   return true;
in the last patch.

Re: [PATCH GCC][v2]Simplify alias check code generation in vectorizer

2016-09-27 Thread Robin Dapp
> Also the '=' in the split line goes to the next line according to > coding conventions. fixed, I had only looked at an instance one function above which had it wrong as well. Also changed comment grammar slightly. Regards Robin -- gcc/ChangeLog: 2016-09-27 Robin Dap

Re: [PATCH] Fix PR77407

2016-10-01 Thread Robin Dapp
This introduces an ICE ("bogus comparison result type") on s390 for the following test case: #include void foo(int dim) { int ba, sign; ba = abs (dim); sign = dim / ba; } Doing diff --git a/gcc/match.pd b/gcc/match.pd index ba7e013..2455592 100644 --- a/gcc/match.pd +++
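For reference, the test case computes `dim / abs (dim)`. For any `dim` other than 0 (division by zero) and `INT_MIN` (where `abs` overflows), that quotient is simply the sign of `dim`, which is the kind of identity a match.pd pattern can fold. The sketch only checks the identity at run time, assuming `<stdlib.h>` was the header stripped from the archived text; it says nothing about the actual pattern in the patch.

```c
#include <limits.h>
#include <stdlib.h>

/* The test case's computation. */
static int sign_by_division (int dim)
{
  int ba = abs (dim);
  return dim / ba;
}

/* Comparison form of the same value. */
static int sign_by_compare (int dim)
{
  return dim < 0 ? -1 : 1;
}

/* Preconditions under which the two forms agree. */
static int sign_identity_applies (int dim)
{
  return dim != 0 && dim != INT_MIN;
}
```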

Re: [PATCH] Tree-level fix for PR 69526

2016-10-14 Thread Robin Dapp
Ping :)

Re: [PATCH] Tree-level fix for PR 69526

2016-11-24 Thread Robin Dapp
Ping.

Re: [PATCH] Tree-level fix for PR 69526

2016-11-28 Thread Robin Dapp
>> + /* Sign-extend @1 to TYPE. */ >> + w1 = w1.from (w1, TYPE_PRECISION (type), SIGNED); >> >> not sure why you do always sign-extend. If the inner op is unsigned >> and we widen then that's certainly bogus considering your UINT_MAX >> example above. Does >> >>

Re: [PATCH] Tree-level fix for PR 69526

2016-11-16 Thread Robin Dapp
Found some time to look into this again. > Index: tree-ssa-propagate.c > === > --- tree-ssa-propagate.c(revision 240133) > +++ tree-ssa-propagate.c(working copy) > @@ -1105,10 +1105,10 @@

Re: [PATCH] Tree-level fix for PR 69526

2016-12-04 Thread Robin Dapp
Ping. Any idea how to tackle this?

Re: [PATCH] Tree-level fix for PR 69526

2017-01-10 Thread Robin Dapp
Perhaps I'm still missing how some cases are handled or not handled, sorry for the noise. > I'm not sure there is anything to "interpret" -- the operation is unsigned > and overflow is when the operation may wrap around zero. There might > be clever ways of re-writing the expression to >

Re: [PATCH] Tree-level fix for PR 69526

2016-12-07 Thread Robin Dapp
> So we have (uint64_t)(uint32 + -1U) + 1 and using TYPE_SIGN (inner_type) > produces (uint64_t)uint32 + -1U + 1. This simply means that we cannot ignore > overflow of the inner operation and for some reason your change > to extract_range_from_binary_expr didn't catch this. That is _8 +

[PATCH] Fix s390 testcase vcond-shift

2017-03-27 Thread Robin Dapp
Hi, this patch fixes the vcond shift testcase that failed since setting PARAM_MIN_VECT_LOOP_BOUND in the s390 backend. Regards Robin -- gcc/testsuite/ChangeLog: 2017-03-27 Robin Dapp <rd...@linux.vnet.ibm.com> * gcc.target/s390/vector/vcond-shift.c (void foo): In

[RFC] S/390: Alignment peeling prolog generation

2017-04-11 Thread Robin Dapp
Hi, when looking at various vectorization examples on s390x I noticed that we still peel vf/2 iterations for alignment even though vectorization costs of unaligned loads and stores are the same as normal loads/stores. A simple example is void foo(int *restrict a, int *restrict b, unsigned int
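A sketch of the arithmetic behind an alignment-peeling prologue when the address is known. It assumes the pointer is already element-aligned and the vector alignment is a multiple of the element size; `peel_iterations` is a hypothetical helper, not a vectorizer function. With unknown misalignment this count is only known at run time, which is why the cost model has to estimate it (the message mentions vf/2).

```c
#include <stddef.h>
#include <stdint.h>

/* Number of scalar iterations to peel so that P + npeel * ELEM_SIZE is
   a multiple of VEC_ALIGN.  Assumes P is ELEM_SIZE-aligned and
   VEC_ALIGN is a multiple of ELEM_SIZE. */
static size_t peel_iterations (uintptr_t p, size_t elem_size,
                               size_t vec_align)
{
  size_t misalign = p % vec_align;   /* bytes past the last boundary */
  size_t bytes_to_boundary = (vec_align - misalign) % vec_align;
  return bytes_to_boundary / elem_size;
}
```

Assuming 4-byte ints and 16-byte vectors (VF = 4), the prologue runs 0 to 3 scalar iterations. If unaligned vector accesses cost the same as aligned ones, as the message says for s390x, those iterations buy nothing.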

Re: [RFC] S/390: Alignment peeling prolog generation

2017-04-11 Thread Robin Dapp
Hi Bin, > Seems Richi added code like below comparing costs between aligned and > unsigned loads, and only peeling if it's beneficial: > > /* In case there are only loads with different unknown misalignments, > use > peeling only if it may help to align other accesses in the loop

Re: [RFC] S/390: Alignment peeling prolog generation

2017-04-12 Thread Robin Dapp
> Note I was very conservative here to allow store bandwidth starved > CPUs to benefit from aligning a store. > > I think it would be reasonable to apply the same heuristic to the > store case that we only peel for same cost if peeling would at least > align two refs. Do you mean checking if

[PATCH] S/390: Disable vectorization for loops with few iterations

2017-03-02 Thread Robin Dapp
gards Robin [1] https://gcc.gnu.org/ml/gcc/2017-01/msg00234.html [2] https://gcc.gnu.org/ml/gcc-patches/2016-05/msg01562.html -- gcc/ChangeLog: 2017-03-02 Robin Dapp <rd...@linux.vnet.ibm.com> * config/s390/s390.c (s390_option_override_internal): Set PARAM_MIN_VECT_LOOP_

[PATCH, committed] Add myself to MAINTAINERS

2017-07-31 Thread Robin Dapp
ChangeLog: 2017-07-31 Robin Dapp <rd...@linux.vnet.ibm.com> * MAINTAINERS (write after approval): Add myself. Index: MAINTAINERS === --- MAINTAINERS (revision 250740) +++ MAINTAINERS (working copy) @@ -356,6

[PATCH] Fix PR81362: Vector peeling

2017-07-12 Thread Robin Dapp
the body_cost_vec parameter which is not used elsewhere. Regards Robin -- gcc/ChangeLog: 2017-07-12 Robin Dapp <rd...@linux.vnet.ibm.com> * (vect_enhance_data_refs_alignment): Remove body_cost_vec from _vect_peel_extended_info. tree-vect-data-

[RFC] If conversion min/max search, costs and problems

2017-07-25 Thread Robin Dapp
Hi, recently I wondered why a snippet like the following is not being if-converted at all on s390: int foo (int *a, unsigned int n) { int min = 99; int bla = 0; for (int i = 0; i < n; i++) { if (a[i] < min) { min = a[i]; bla = 1; } }
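The snippet is cut off before its return, so the sketch below returns `min` and `bla` through pointers (an assumption). It contrasts the branchy loop with a hand-written if-converted form in which one comparison drives two conditional selects. It is written with conditional expressions; whether the compiler actually emits conditional moves for them is exactly what the RFC is about.

```c
/* Branchy form, as in the message (return values are an assumption). */
static void min_search_branchy (const int *a, unsigned int n,
                                int *min_out, int *bla_out)
{
  int min = 99;
  int bla = 0;
  for (unsigned int i = 0; i < n; i++)
    {
      if (a[i] < min)
        {
          min = a[i];
          bla = 1;
        }
    }
  *min_out = min;
  *bla_out = bla;
}

/* If-converted form: one comparison, two conditional selects. */
static void min_search_ifcvt (const int *a, unsigned int n,
                              int *min_out, int *bla_out)
{
  int min = 99;
  int bla = 0;
  for (unsigned int i = 0; i < n; i++)
    {
      int lt = a[i] < min;
      min = lt ? a[i] : min;
      bla = lt ? 1 : bla;
    }
  *min_out = min;
  *bla_out = bla;
}
```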

Re: [RFC] If conversion min/max search, costs and problems

2017-07-26 Thread Robin Dapp
> Do you have an example where wrong code is generated through the > noce_convert_multiple_sets_p path (with or without bodged costs)? > > Both AArch64 and x86-64 reject your testcase along this codepath because > of the constant set of 1. If we work around that by setting bla = n rather > than

Re: [PATCH 2/3] Simplify wrapped binops

2017-07-05 Thread Robin Dapp
[3/3] Tests -- gcc/testsuite/ChangeLog: 2017-07-05 Robin Dapp <rd...@linux.vnet.ibm.com> * gcc.dg/wrapped-binop-simplify-signed-1.c: New test. * gcc.dg/wrapped-binop-simplify-signed-2.c: New test. * gcc.dg/wrapped-binop-simplify-unsigned-1.c: Ne

Re: [PATCH 2/3] Simplify wrapped binops

2017-07-05 Thread Robin Dapp
> While the initialization value doesn't matter (wi::add will overwrite it) > better initialize both to false ;) Ah, you mean because we want to > transform only if get_range_info returned VR_RANGE. Indeed somewhat > unintuitive (but still the best variant for now). > so I'm still missing a

Re: [PATCH 2/3] Simplify wrapped binops

2017-06-28 Thread Robin Dapp
> ideally you'd use a wide-int here and defer the tree allocation to the result Did that in the attached version. > So I guess we never run into the outer_op == minus case as the above is > clearly wrong for that? Right, damn, not only was the treatment for this missing but it was bogus in the

[PATCH 4/5 v2] Vect peeling cost model

2017-05-11 Thread Robin Dapp
Included the workaround for SLP now. With it, testsuite is clean on x86 as well. gcc/ChangeLog: 2017-05-11 Robin Dapp <rd...@linux.vnet.ibm.com> * tree-vect-data-refs.c (vect_get_data_access_cost): Workaround for SLP handling. (vect_enhance_data_refs_ali

[PATCH 3/5] Vect peeling cost model

2017-05-11 Thread Robin Dapp
gcc/ChangeLog: 2017-05-11 Robin Dapp <rd...@linux.vnet.ibm.com> * tree-vect-data-refs.c (vect_peeling_hash_choose_best_peeling): Return peeling info and set costs to zero for unlimited cost model. (vect_enhance_data_refs_alignment): Also inspect all da

[PATCH 1/5] Vect peeling cost model

2017-05-11 Thread Robin Dapp
gcc/ChangeLog: 2017-05-11 Robin Dapp <rd...@linux.vnet.ibm.com> * tree-vectorizer.h (dr_misalignment): Introduce DR_MISALIGNMENT_UNKNOWN. * tree-vect-data-refs.c (vect_compute_data_ref_alignment): Refactoring. (vect_update_misalignment_for_peel

[PATCH 2/5] Vect peeling cost model

2017-05-11 Thread Robin Dapp
gcc/ChangeLog: 2017-05-11 Robin Dapp <rd...@linux.vnet.ibm.com> * tree-vect-data-refs.c (vect_update_misalignment_for_peel): Change comment and rename variable. (vect_get_peeling_costs_all_drs): New function. (vect_peeling_hash_get_lowest_cost

[PATCH 4/5] Vect peeling cost model

2017-05-11 Thread Robin Dapp
gcc/ChangeLog: 2017-05-11 Robin Dapp <rd...@linux.vnet.ibm.com> * tree-vect-data-refs.c (vect_enhance_data_refs_alignment): Remove check for supportable_dr_alignment, compute costs for doing no peeling at all, compare to the best peeling costs so far
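A minimal sketch of the decision this ChangeLog entry describes: treat "no peeling at all" as one more candidate and keep it unless some peeling is strictly cheaper. The struct, the single-number costs and `choose_peeling` are hypothetical simplifications, not the vectorizer's data structures.

```c
#include <stddef.h>

/* Hypothetical candidate: iterations to peel and the estimated loop
   cost (prologue plus vector body) if we do so. */
struct peel_candidate
{
  unsigned npeel;
  unsigned cost;
};

/* Return the number of iterations to peel; 0 means "do not peel".
   NO_PEEL_COST is the cost of leaving all accesses as they are.
   Ties favour not peeling. */
static unsigned choose_peeling (const struct peel_candidate *cands,
                                size_t ncands, unsigned no_peel_cost)
{
  unsigned best_npeel = 0;
  unsigned best_cost = no_peel_cost;
  for (size_t i = 0; i < ncands; i++)
    if (cands[i].cost < best_cost)
      {
        best_cost = cands[i].cost;
        best_npeel = cands[i].npeel;
      }
  return best_npeel;
}
```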

[PATCH 5/5] Vect peeling cost model

2017-05-11 Thread Robin Dapp
gcc/testsuite/ChangeLog: 2017-05-11 Robin Dapp <rd...@linux.vnet.ibm.com> * gcc.target/s390/vector/vec-nopeel-2.c: New test. diff --git a/gcc/testsuite/gcc.target/s390/vector/vec-nopeel-2.c b/gcc/testsuite/gcc.target/s390/vector/vec-nopeel-2.c new file mode 100644 index 0

Re: [RFC] S/390: Alignment peeling prolog generation

2017-05-11 Thread Robin Dapp
Included the requested changes in the patches (to follow). I removed the alignment count check now altogether. > I'm not sure why you test for unlimited_cost_model here as I said > elsewhere I'm not sure > what not cost modeling means for static decisions. The purpose of > unlimited_cost_model

Re: [PATCH] Tree-level fix for PR 69526

2017-05-09 Thread Robin Dapp
ping.

Re: [PATCH 2/3] Simplify wrapped binops

2017-06-21 Thread Robin Dapp
> use INTEGRAL_TYPE_P. Done. > but you do not actually _use_ vr_outer. Do you think that if > vr_outer is a VR_RANGE then the outer operation may not > possibly have wrapped? That's a false conclusion. These were remains of a previous version. vr_outer is indeed not needed anymore; removed.

Re: [PATCH 2/3] Simplify wrapped binops

2017-06-20 Thread Robin Dapp
r max overflow, split/anti range). Test suite on s390x has no regressions, bootstrap is ok, x86 running. Regards Robin -- gcc/ChangeLog: 2017-06-19 Robin Dapp <rd...@linux.vnet.ibm.com> * match.pd: Simplify wrapped binary operations. diff --git a/gcc/match.pd b/gcc/match.pd in

Re: [PATCH 2/3] Simplify wrapped binops

2017-06-27 Thread Robin Dapp
Ping.

Re: [PATCH 2/3] Simplify wrapped binops

2017-05-19 Thread Robin Dapp
> I can guess what is happening here. It's a 40 bits unsigned long long > field, (s.b-8) will be like: > _1 = s.b > _2 = _1 + 0xf8 > Also get_range_info returns value range [0, 0xFF] for _1. > You'd need to check if _1(with range [0, 0xFF]) + 0xf8 > overflows

Re: [PATCH] Tree-level fix for PR 69526

2017-05-18 Thread Robin Dapp
> Hmm, won't (uint32_t + uint32_t-CST) doesn't overflow be sufficient > condition for such transformation? Yes, in principle this should suffice. What we're actually looking for is something like a "proper" (or no) overflow, i.e. an overflow in both min and max of the value range. In (a +
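A sketch of the "proper (or no) overflow" condition for a 32-bit unsigned inner type and a constant addend, as in the (a + UINT_MAX) examples in this thread. The helper names are hypothetical. Because x + C grows monotonically with x over the mathematical integers, checking the two endpoints of the value range is enough to know whether every value wraps or none does.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the 32-bit unsigned addition X + C wraps around. */
static bool add_wraps_u32 (uint32_t x, uint32_t c)
{
  return (uint32_t) (x + c) < x;
}

/* For x in [MIN, MAX], x + C either wraps for all values or for none
   iff both endpoints agree. */
static bool wrap_is_uniform (uint32_t min, uint32_t max, uint32_t c)
{
  return add_wraps_u32 (min, c) == add_wraps_u32 (max, c);
}
```

A range such as [0, 10] with C = UINT_MAX is mixed (0 does not wrap, 10 does), so no single widened constant is correct for it.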

[PATCH 1/3] Simplify wrapped binops

2017-05-18 Thread Robin Dapp
This tries to fold unconditionally and fixes some test cases. gcc/ChangeLog: 2017-05-18 Robin Dapp <rd...@linux.vnet.ibm.com> * tree-ssa-propagate.c (substitute_and_fold_dom_walker::before_dom_children): Always try to fold. gcc/testsuite/ChangeLog: 2017-05-18 Robi

Re: [PATCH 2/3] Simplify wrapped binops

2017-05-18 Thread Robin Dapp
> Any reason to expose tree-vrp.c internal interface here? The function > looks quite expensive. Overflow check can be done by get_range_info > and simple wi::cmp calls. Existing code like in > tree-ssa-loop-niters.c already does that. Also could you avoid using > comma expressions in

[PATCH 2/3] Simplify wrapped binops

2017-05-18 Thread Robin Dapp
match.pd part of the patch. gcc/ChangeLog: 2017-05-18 Robin Dapp <rd...@linux.vnet.ibm.com> * match.pd: Simplify wrapped binary operations. * tree-vrp.c (extract_range_from_binary_expr_1): Add overflow parameter. (extract_range_from_binary_expr): Li

[PATCH 3/3] Simplify wrapped binops

2017-05-18 Thread Robin Dapp
New testcases. gcc/testsuite/ChangeLog: 2017-05-18 Robin Dapp <rd...@linux.vnet.ibm.com> * gcc.dg/wrapped-binop-simplify-signed-1.c: New test. * gcc.dg/wrapped-binop-simplify-unsigned-1.c: New test. * gcc.dg/wrapped-binop-simplify-unsigned-2.c: New test. diff

[PATCH 0/5 v3] Vect peeling cost model

2017-05-23 Thread Robin Dapp
The last version of the patch series caused some regressions for ppc64. This was largely due to incorrect handling of unsupportable alignment and should be fixed with the new version. p2 and p5 have not changed but I'm posting the whole series again for reference. p1 only changed comment

[PATCH 5/5 v3] Vect peeling cost model

2017-05-23 Thread Robin Dapp
gcc/testsuite/ChangeLog: 2017-05-23 Robin Dapp <rd...@linux.vnet.ibm.com> * gcc.target/s390/vector/vec-nopeel-2.c: New test. diff --git a/gcc/testsuite/gcc.target/s390/vector/vec-nopeel-2.c b/gcc/testsuite/gcc.target/s390/vector/vec-nopeel-2.c new file mode 100644 index 0

[PATCH 1/5 v3] Vect peeling cost model

2017-05-23 Thread Robin Dapp
gcc/ChangeLog: 2017-05-23 Robin Dapp <rd...@linux.vnet.ibm.com> * tree-vect-data-refs.c (vect_compute_data_ref_alignment): Create DR_HAS_NEGATIVE_STEP. (vect_update_misalignment_for_peel): Define DR_MISALIGNMENT. (vect_enhance_data_refs_alignment

[PATCH 2/5 v3] Vect peeling cost model

2017-05-23 Thread Robin Dapp
gcc/ChangeLog: 2017-05-23 Robin Dapp <rd...@linux.vnet.ibm.com> * tree-vect-data-refs.c (vect_update_misalignment_for_peel): Rename. (vect_get_peeling_costs_all_drs): Create function. (vect_peeling_hash_get_lowest_cost):

[PATCH 3/5 v3] Vect peeling cost model

2017-05-23 Thread Robin Dapp
gcc/ChangeLog: 2017-05-23 Robin Dapp <rd...@linux.vnet.ibm.com> * tree-vect-data-refs.c (vect_peeling_hash_choose_best_peeling): Return peeling info and set costs to zero for unlimited cost model. (vect_enhance_data_refs_alignment): Also inspect all da

[PATCH 4/5 v3] Vect peeling cost model

2017-05-23 Thread Robin Dapp
gcc/ChangeLog: 2017-05-23 Robin Dapp <rd...@linux.vnet.ibm.com> * tree-vect-data-refs.c (vect_get_data_access_cost): Workaround for SLP handling. (vect_enhance_data_refs_alignment): Compute costs for doing no peeling at all, compare to the best p

Re: [PATCH 0/5 v3] Vect peeling cost model

2017-05-24 Thread Robin Dapp
ld series itself (-p3) doesn't apply to trunk anymore (because of the change in vect_enhance_data_refs_alignment). Regards Robin -- gcc/ChangeLog: 2017-05-24 Robin Dapp <rd...@linux.vnet.ibm.com> * tree-vect-data-refs.c (vect_get_peeling_costs_all_drs): Introduce unkno

Re: [PATCH 2/5 v3] Vect peeling cost model

2017-05-24 Thread Robin Dapp
> Not sure I've understood the series TBH, but is the npeel == vf / 2 > there specifically for the "unknown number of peels" case? How do > we distinguish that from the case in which the number of peels is > known to be vf / 2 at compile time? Or have I missed the point > completely? (probably

Re: [PATCH 0/5 v3] Vect peeling cost model

2017-06-07 Thread Robin Dapp
> http://gcc.gnu.org/ml/gcc-testresults/2017-06/msg00297.html What machine is this running on? power4 BE? The tests are compiled with --with-cpu-64=power4 apparently. I cannot reproduce this on power7 -m32. Is it possible to get more detailed logs or machine access to reproduce? Regards Robin

Re: [PATCH 0/5 v3] Vect peeling cost model

2017-06-06 Thread Robin Dapp
> Patch 6 breaks no-vfa-vect-57.c on powerpc. Which CPU model (power6/7/8?) and which compile options (-maltivec/ -mpower8-vector?) have been used for running and compiling the test? As discussed in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80925 this has an influence on the cost function and

[PATCH 3/3] Vect peeling cost model

2017-05-04 Thread Robin Dapp
gcc/ChangeLog: 2017-04-26 Robin Dapp <rd...@linux.vnet.ibm.com> * tree-vect-data-refs.c (vect_peeling_hash_get_lowest_cost): Change cost model. (vect_peeling_hash_choose_best_peeling): Return extended peel info. (vect_peeling_supportable): Return peeling

[PATCH 2/3] Vect peeling cost model

2017-05-04 Thread Robin Dapp
Wrap some frequently used snippets in separate functions. gcc/ChangeLog: 2017-04-26 Robin Dapp <rd...@linux.vnet.ibm.com> * tree-vect-data-refs.c (vect_update_misalignment_for_peel): Rename. (vect_get_peeling_costs_all_drs): Create fu

[PATCH 1/3] Vect peeling cost model

2017-05-04 Thread Robin Dapp
Some refactoring and definitions to use for (unknown) DR_MISALIGNMENT, gcc/ChangeLog: 2017-04-26 Robin Dapp <rd...@linux.vnet.ibm.com> * tree-data-ref.h (struct data_reference): Create DR_HAS_NEGATIVE_STEP. * tree-vectorizer.h (dr_misalignment): Define DR_MISALI

Re: [RFC] S/390: Alignment peeling prolog generation

2017-05-04 Thread Robin Dapp
Hi, > This one only works for known misalignment, otherwise it's overkill. > > OTOH if with some refactoring we can end up using a single cost model > that would be great. That is for the SAME_ALIGN_REFS we want to > choose the unknown misalignment with the maximum number of > SAME_ALIGN_REFS.

[PATCH 4/4] Vect peeling cost model

2017-05-08 Thread Robin Dapp
gcc/ChangeLog: 2017-05-08 Robin Dapp <rd...@linux.vnet.ibm.com> * tree-vect-data-refs.c (vect_peeling_hash_get_lowest_cost): Remove unused variable. (vect_enhance_data_refs_alignment): Compare best peelings costs to doing no peeling and choose no p

Re: [RFC] S/390: Alignment peeling prolog generation

2017-05-08 Thread Robin Dapp
> So the new part is the last point? There's a lot of refactoring in 3/3 that > makes it hard to see what is actually changed ... you need to resist > in doing this, it makes review very hard. The new part is actually spread across the three last "-"s. Attached is a new version of [3/3] split

[PATCH 3/4] Vect peeling cost model

2017-05-08 Thread Robin Dapp
gcc/ChangeLog: 2017-05-08 Robin Dapp <rd...@linux.vnet.ibm.com> * tree-vect-data-refs.c (vect_peeling_hash_choose_best_peeling): Return peel info. (vect_enhance_data_refs_alignment): Compute full costs when peeling for unknown alignment, compare to

Re: [PATCH 4/5 v3] Vect peeling cost model

2017-05-31 Thread Robin Dapp
> Since this commit (r248678), I've noticed regressions on some arm targets. > Executed from: gcc.dg/tree-ssa/tree-ssa.exp > gcc.dg/tree-ssa/gen-vect-26.c scan-tree-dump-times vect "Alignment > of access forced using peeling" 1 > gcc.dg/tree-ssa/gen-vect-26.c scan-tree-dump-times vect >

Re: [PATCH 2/2] S/390: Do not end groups after fallthru edge

2017-10-17 Thread Robin Dapp
gcc/ChangeLog: 2017-10-17 Robin Dapp <rd...@linux.vnet.ibm.com> * config/s390/s390.c (s390_bb_fallthru_entry_likely): New function. (s390_sched_init): Do not reset s390_sched_state if we entered the current basic block via a fallthru edge and all othe

[PATCH 1/2] S/390: Handle long-running instructions

2017-10-11 Thread Robin Dapp
This patch introduces balancing of long-running instructions that may clog the pipeline. gcc/ChangeLog: 2017-10-11 Robin Dapp <rd...@linux.vnet.ibm.com> * config/s390/s390.c (NUM_SIDES): New constant. (LONGRUNNING_THRESHOLD): New constant. (LATENCY_FACTOR
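A heavily simplified, hypothetical model of balancing long-running instructions: only instructions above a latency threshold are steered, and they go to the side with the least outstanding long-running work. `NUM_SIDES` and `LONGRUNNING_THRESHOLD` are named in the ChangeLog, but the values, the struct and `pick_side` here are assumptions for illustration, not the patch's scheduler logic.

```c
/* Illustrative values only; the patch's actual values are not shown. */
#define NUM_SIDES 2
#define LONGRUNNING_THRESHOLD 20

/* Hypothetical per-side count of cycles still occupied by
   long-running instructions. */
struct side_state
{
  unsigned busy[NUM_SIDES];
};

/* Return the side for an instruction with LATENCY.  Short instructions
   keep PREFERRED; long-running ones go to the least-loaded side. */
static int pick_side (const struct side_state *s, unsigned latency,
                      int preferred)
{
  if (latency < LONGRUNNING_THRESHOLD)
    return preferred;
  int best = 0;
  for (int i = 1; i < NUM_SIDES; i++)
    if (s->busy[i] < s->busy[best])
      best = i;
  return best;
}
```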

[PATCH 2/2] S/390: Do not end groups after fallthru edge

2017-10-11 Thread Robin Dapp
This patch fixes cases where we start a new group although the previous one has not ended. Regression tested on s390x. gcc/ChangeLog: 2017-10-11 Robin Dapp <rd...@linux.vnet.ibm.com> * config/s390/s390.c (s390_has_ok_fallthru): New function. (s390_sched_score): Tempo

Re: [PATCH 2/2] S/390: Do not end groups after fallthru edge

2017-10-18 Thread Robin Dapp
> Preserving the sched state across basic blocks for your case works only if > the BBs are traversed > with the fall through edges coming first. Is that the case? We probably > should have a description > for s390_last_sched_state stating this. Committed as attached with an additional comment

[PATCH, S390] Change mtune default

2018-06-04 Thread Robin Dapp
explicitly state -march=z13 -mtune=zEC12. Regards Robin -- gcc/ChangeLog: 2018-06-04 Robin Dapp * config/s390/s390.h (enum processor_flags): Do not use default tune parameter when -march was specified. diff --git a/gcc/config/s390/s390.h b/gcc/config/s390/s390.h index a372981ff3a

fwprop addressing costs

2018-06-01 Thread Robin Dapp
Hi, when investigating a regression, I realized that we create a superfluous load on S390. The snippet looks something like LA %r10, 0(%r8,%r9) LLH %r4, 0(%r10) meaning the address in r10 is computed by an LA even though LLH supports the addressing already. The same address is used multiple

[PATCH, S390] Avoid LA with base and index on z13

2018-07-16 Thread Robin Dapp
d the comment. Regards Robin -- gcc/ChangeLog: 2018-07-16 Robin Dapp * config/s390/s390.c (preferred_la_operand_p): Do not use LA with index register on z196 or later. diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c index 23c3f3db621..d8b47c6fe67 100644 --- a/g

[PATCH 1/2] zEC12 pipeline

2018-09-06 Thread Robin Dapp
Hi, this patch increases the latency of some floating point instructions to better match the real machine's behavior. Regards Robin -- gcc/ChangeLog: 2018-09-06 Robin Dapp * config/s390/2827.md: Increase latencies for some FP instructions. --- gcc/config/s390/2827.md | 14

[S/390] Re: [PATCH 1/2] zEC12 pipeline

2018-09-06 Thread Robin Dapp
Sorry, forgot the [S/390] tag in the subject.

[PATCH 2/2] z13 pipeline

2018-09-06 Thread Robin Dapp
Similar to zEC12, the change in latencies helps match the real machine's behavior better. -- gcc/ChangeLog: 2018-09-06 Robin Dapp * config/s390/2964.md: Increase latencies for some FP instructions. --- gcc/config/s390/2964.md | 80 ++--- 1 file

[PATCH, S390] Avoid LA with base and index on z13

2018-07-05 Thread Robin Dapp
Hi, this patch avoids emitting LA on z13 and later when the address has both an index and a base since a regular add is faster in that case. Regtested on s390x. Regards Robin -- gcc/ChangeLog: 2018-07-05 Robin Dapp * config/s390/s390.c (preferred_la_operand_p): Do not use

[RFC] fwprop address cost changes

2018-07-11 Thread Robin Dapp
Hi, we recently hit a problem where fwprop would not propagate a memory address into an insn because our backend (s390) tells it that the address_cost ()s for an address with index are higher than for one without. Subsequently, should_replace_address () returns false and no propagation is

[PATCH, S390] Increase function alignment to 16 bytes

2018-07-11 Thread Robin Dapp
ot;Os"))) void bar () {}; I did not observe that the default alignment, once set, was reset anywhere. Regards Robin -- gcc/ChangeLog: 2018-07-11 Robin Dapp * config/s390/s390.c (s390_default_align): Set default function alignment. (s390_override_options_after_ch

Re: [PATCH, S390] Increase function alignment to 16 bytes

2018-07-12 Thread Robin Dapp
il without the patch as we can get lucky with the alignment). Regtested on s390x. Regards Robin -- gcc/ChangeLog: 2018-07-12 Robin Dapp * config/s390/s390.c (s390_default_align): Set default function alignment to 16. (s390_override_options_after_change
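The arithmetic behind a 16-byte function alignment, as a small sketch: the start address is rounded up to the next multiple of the alignment, so at most 15 padding bytes are inserted before each function. `align_up` is a hypothetical helper and assumes the alignment is a power of two.

```c
#include <stdint.h>

/* Round ADDR up to the next multiple of ALIGN (a power of two). */
static uintptr_t align_up (uintptr_t addr, uintptr_t align)
{
  return (addr + align - 1) & ~(align - 1);
}
```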

Re: sched2 priorities and replacements

2018-10-08 Thread Robin Dapp
ping, any insight on this? Regards Robin

Re: [PATCH 1/2] zEC12 pipeline

2018-10-08 Thread Robin Dapp
Hi, committed only the zEC12 part for now. Performance behavior of z13 with the patch is still unclear and will be tackled separately. Regards Robin

Re: [PATCH] Reset insn priority after inc/ref replacement in haifa sched

2018-10-15 Thread Robin Dapp
ng. Regards Robin gcc/ChangeLog: 2018-10-15 Robin Dapp * haifa-sched.c (priority): Add force_recompute parameter. (apply_replacement): Call priority () with force_recompute = true. (restore_pattern): Likewise. diff --git a/gcc/haifa-sched.c b/gcc/haifa-s

Re: [PATCH] Reset insn priority after inc/ref replacement in haifa sched

2018-10-16 Thread Robin Dapp
> A C++ style nit/question: instead of adding a new overload > > priority (rtx_insn *, bool) > > you can add a parameter with a default value in the existing > static function > > priority (rtx_insn *insn, bool force_recompute = false) Sometimes I'm still stuck in C land with GCC :),

Re: [PATCH] Reset insn priority after inc/ref replacement in haifa sched

2018-10-19 Thread Robin Dapp
> Still OK :-) Committed as r265304. Regards Robin

Re: [PATCH] Reset insn priority after inc/ref replacement in haifa sched

2018-10-18 Thread Robin Dapp
/ChangeLog: 2018-10-16 Robin Dapp * haifa-sched.c (priority): Add force_recompute parameter. (apply_replacement): Call priority () with force_recompute = true. (restore_pattern): Likewise. diff --git a/gcc/haifa-sched.c b/gcc/haifa-sched.c index 1fdc9df9fb2..2c84ce38143 100644

[PATCH] S/390: Add loc patterns for QImode and HImode

2018-10-18 Thread Robin Dapp
Hi, this enables QImode and HImode for load on condition. For SPEC2006 this reduces code size overall, performance impact is negligible. Regtested on s390x. Regards Robin -- gcc/ChangeLog: 2018-10-18 Robin Dapp * config/s390/s390.md: Add movcc for QImode and HImode. diff --git

[PATCH] S/390: Allow immediates in loc expander

2018-10-17 Thread Robin Dapp
Hi, this allows immediates in the load-on-condition expander on z13 or later. Regtested on z14. Regards Robin -- gcc/ChangeLog: 2018-10-17 Robin Dapp * config/s390/predicates.md: Allow immediate operand in loc_operand for z13. * config/s390/s390.md: Use

Re: [PATCH] S/390: Allow immediates in loc expander

2018-10-26 Thread Robin Dapp
/ChangeLog: 2018-10-26 Robin Dapp * config/s390/predicates.md: Fix typo. * config/s390/s390.md: Allow immediates for load on condition. gcc/testsuite/ChangeLog: 2018-10-26 Robin Dapp * gcc.dg/loop-8.c: On s390, always run the test with -march=zEC12. diff --git a/gcc

Re: [PATCH] S/390: Add loc patterns for QImode and HImode

2018-10-26 Thread Robin Dapp
Hi, this is v2 of the patch with less quirky pattern syntax and two tests. Regards Robin -- gcc/ChangeLog: 2018-10-26 Robin Dapp * config/s390/s390.md: QImode and HImode for load on condition. gcc/testsuite/ChangeLog: 2018-10-26 Robin Dapp * gcc.target/s390/ifcvt

[PATCH] S/390: Increase register move costs for CC_REGS

2018-11-05 Thread Robin Dapp
Hi, the attached patch increases the move costs for moves involving the CC register. This saves us some instructions in SPEC CPU2006. Regards Robin -- gcc/ChangeLog: 2018-11-05 Robin Dapp * config/s390/s390.c (s390_register_move_cost): Increase costs for moves involving
