Re: [PATCH 2/3] Simplify wrapped binops

2017-07-05 Thread Robin Dapp
> While the initialization value doesn't matter (wi::add will overwrite it) > better initialize both to false ;) Ah, you mean because we want to > transform only if get_range_info returned VR_RANGE. Indeed somewhat > unintuitive (but still the best variant for now). > so I'm still missing a

Re: [PATCH 2/3] Simplify wrapped binops

2017-06-28 Thread Robin Dapp
> ideally you'd use a wide-int here and defer the tree allocation to the result Did that in the attached version. > So I guess we never run into the outer_op == minus case as the above is > clearly wrong for that? Right, damn, not only was the treatment for this missing but it was bogus in the

Re: [PATCH 2/3] Simplify wrapped binops

2017-06-27 Thread Robin Dapp
Ping.

Re: [PATCH 2/3] Simplify wrapped binops

2017-06-21 Thread Robin Dapp
> use INTEGRAL_TYPE_P. Done. > but you do not actually _use_ vr_outer. Do you think that if > vr_outer is a VR_RANGE then the outer operation may not > possibly have wrapped? That's a false conclusion. These were remains of a previous version. vr_outer is indeed not needed anymore; removed.

Re: [PATCH 2/3] Simplify wrapped binops

2017-06-20 Thread Robin Dapp
r max overflow, split/anti range). Test suite on s390x has no regressions, bootstrap is ok, x86 running. Regards Robin -- gcc/ChangeLog: 2017-06-19 Robin Dapp <rd...@linux.vnet.ibm.com> * match.pd: Simplify wrapped binary operations. diff --git a/gcc/match.pd b/gcc/match.pd in

Re: [PATCH 0/5 v3] Vect peeling cost model

2017-06-07 Thread Robin Dapp
> http://gcc.gnu.org/ml/gcc-testresults/2017-06/msg00297.html What machine is this running on? power4 BE? The tests are compiled with --with-cpu-64=power4 apparently. I cannot reproduce this on power7 -m32. Is it possible to get more detailed logs or machine access to reproduce? Regards Robin

Re: [PATCH 0/5 v3] Vect peeling cost model

2017-06-06 Thread Robin Dapp
> Patch 6 breaks no-vfa-vect-57.c on powerpc. Which CPU model (power6/7/8?) and which compile options (-maltivec/ -mpower8-vector?) have been used for running and compiling the test? As discussed in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80925 this has an influence on the cost function and

Re: [PATCH 4/5 v3] Vect peeling cost model

2017-05-31 Thread Robin Dapp
> Since this commit (r248678), I've noticed regressions on some arm targets. > Executed from: gcc.dg/tree-ssa/tree-ssa.exp > gcc.dg/tree-ssa/gen-vect-26.c scan-tree-dump-times vect "Alignment > of access forced using peeling" 1 > gcc.dg/tree-ssa/gen-vect-26.c scan-tree-dump-times vect >

Re: [PATCH 0/5 v3] Vect peeling cost model

2017-05-24 Thread Robin Dapp
ld series itself (-p3) doesn't apply to trunk anymore (because of the change in vect_enhance_data_refs_alignment). Regards Robin -- gcc/ChangeLog: 2017-05-24 Robin Dapp <rd...@linux.vnet.ibm.com> * tree-vect-data-refs.c (vect_get_peeling_costs_all_drs): Introduce unkno

Re: [PATCH 2/5 v3] Vect peeling cost model

2017-05-24 Thread Robin Dapp
> Not sure I've understood the series TBH, but is the npeel == vf / 2 > there specifically for the "unknown number of peels" case? How do > we distinguish that from the case in which the number of peels is > known to be vf / 2 at compile time? Or have I missed the point > completely? (probably

[PATCH 5/5 v3] Vect peeling cost model

2017-05-23 Thread Robin Dapp
gcc/testsuite/ChangeLog: 2017-05-23 Robin Dapp <rd...@linux.vnet.ibm.com> * gcc.target/s390/vector/vec-nopeel-2.c: New test. diff --git a/gcc/testsuite/gcc.target/s390/vector/vec-nopeel-2.c b/gcc/testsuite/gcc.target/s390/vector/vec-nopeel-2.c new file mode 100644 index 0

[PATCH 4/5 v3] Vect peeling cost model

2017-05-23 Thread Robin Dapp
gcc/ChangeLog: 2017-05-23 Robin Dapp <rd...@linux.vnet.ibm.com> * tree-vect-data-refs.c (vect_get_data_access_cost): Workaround for SLP handling. (vect_enhance_data_refs_alignment): Compute costs for doing no peeling at all, compare to the best p

[PATCH 3/5 v3] Vect peeling cost model

2017-05-23 Thread Robin Dapp
gcc/ChangeLog: 2017-05-23 Robin Dapp <rd...@linux.vnet.ibm.com> * tree-vect-data-refs.c (vect_peeling_hash_choose_best_peeling): Return peeling info and set costs to zero for unlimited cost model. (vect_enhance_data_refs_alignment): Also inspect all da

[PATCH 2/5 v3] Vect peeling cost model

2017-05-23 Thread Robin Dapp
gcc/ChangeLog: 2017-05-23 Robin Dapp <rd...@linux.vnet.ibm.com> * tree-vect-data-refs.c (vect_update_misalignment_for_peel): Rename. (vect_get_peeling_costs_all_drs): Create function. (vect_peeling_hash_get_lowest_cost):

[PATCH 1/5 v3] Vect peeling cost model

2017-05-23 Thread Robin Dapp
gcc/ChangeLog: 2017-05-23 Robin Dapp <rd...@linux.vnet.ibm.com> * tree-vect-data-refs.c (vect_compute_data_ref_alignment): Create DR_HAS_NEGATIVE_STEP. (vect_update_misalignment_for_peel): Define DR_MISALIGNMENT. (vect_enhance_data_refs_alignment

[PATCH 0/5 v3] Vect peeling cost model

2017-05-23 Thread Robin Dapp
The last version of the patch series caused some regressions for ppc64. This was largely due to incorrect handling of unsupportable alignment and should be fixed with the new version. p2 and p5 have not changed but I'm posting the whole series again for reference. p1 only changed comment

Re: [PATCH 2/3] Simplify wrapped binops

2017-05-19 Thread Robin Dapp
> I can guess what is happening here. It's a 40 bits unsigned long long > field, (s.b-8) will be like: > _1 = s.b > _2 = _1 + 0xf8 > Also get_range_info returns value range [0, 0xFF] for _1. > You'd need to check if _1(with range [0, 0xFF]) + 0xf8 > overflows

Re: [PATCH 2/3] Simplify wrapped binops

2017-05-18 Thread Robin Dapp
> Any reason to expose tree-vrp.c internal interface here? The function > looks quite expensive. Overflow check can be done by get_range_info > and simple wi::cmp calls. Existing code like in > tree-ssa-loop-niters.c already does that. Also could you avoid using > comma expressions in

[PATCH 3/3] Simplify wrapped binops

2017-05-18 Thread Robin Dapp
New testcases. gcc/testsuite/ChangeLog: 2017-05-18 Robin Dapp <rd...@linux.vnet.ibm.com> * gcc.dg/wrapped-binop-simplify-signed-1.c: New test. * gcc.dg/wrapped-binop-simplify-unsigned-1.c: New test. * gcc.dg/wrapped-binop-simplify-unsigned-2.c: New test. diff

[PATCH 2/3] Simplify wrapped binops

2017-05-18 Thread Robin Dapp
match.pd part of the patch. gcc/ChangeLog: 2017-05-18 Robin Dapp <rd...@linux.vnet.ibm.com> * match.pd: Simplify wrapped binary operations. * tree-vrp.c (extract_range_from_binary_expr_1): Add overflow parameter. (extract_range_from_binary_expr): Li

[PATCH 1/3] Simplify wrapped binops

2017-05-18 Thread Robin Dapp
This tries to fold unconditionally and fixes some test cases. gcc/ChangeLog: 2017-05-18 Robin Dapp <rd...@linux.vnet.ibm.com> * tree-ssa-propagate.c (substitute_and_fold_dom_walker::before_dom_children): Always try to fold. gcc/testsuite/ChangeLog: 2017-05-18 Robi

Re: [PATCH] Tree-level fix for PR 69526

2017-05-18 Thread Robin Dapp
> Hmm, won't (uint32_t + uint32_t-CST) doesn't overflow be sufficient > condition for such transformation? Yes, in principle this should suffice. What we're actually looking for is something like a "proper" (or no) overflow, i.e. an overflow in both min and max of the value range. In (a +
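
A minimal sketch of the "overflow in both min and max of the value range" condition mentioned above, assuming a 32-bit unsigned inner type. The helper names are hypothetical and are not taken from the patch. Because unsigned addition of a constant wraps for every value at or above a single threshold, checking the two range bounds is enough to know whether the whole range behaves uniformly.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if a + cst wraps around in 32-bit unsigned arithmetic.  */
static bool
add_wraps (uint32_t a, uint32_t cst)
{
  return (uint32_t) (a + cst) < a;
}

/* Hypothetical helper: for a range [min, max] with min <= max, the
   inner addition behaves uniformly if both bounds wrap or neither
   bound wraps.  */
bool
uniform_overflow_p (uint32_t min, uint32_t max, uint32_t cst)
{
  return add_wraps (min, cst) == add_wraps (max, cst);
}
```

For example, with cst == UINT32_MAX the range [1, 10] wraps at both ends, while [0, 10] wraps only at the upper end, so the latter would not qualify under this check.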

[PATCH 4/5 v2] Vect peeling cost model

2017-05-11 Thread Robin Dapp
Included the workaround for SLP now. With it, testsuite is clean on x86 as well. gcc/ChangeLog: 2017-05-11 Robin Dapp <rd...@linux.vnet.ibm.com> * tree-vect-data-refs.c (vect_get_data_access_cost): Workaround for SLP handling. (vect_enhance_data_refs_ali

[PATCH 5/5] Vect peeling cost model

2017-05-11 Thread Robin Dapp
gcc/testsuite/ChangeLog: 2017-05-11 Robin Dapp <rd...@linux.vnet.ibm.com> * gcc.target/s390/vector/vec-nopeel-2.c: New test. diff --git a/gcc/testsuite/gcc.target/s390/vector/vec-nopeel-2.c b/gcc/testsuite/gcc.target/s390/vector/vec-nopeel-2.c new file mode 100644 index 0

[PATCH 4/5] Vect peeling cost model

2017-05-11 Thread Robin Dapp
gcc/ChangeLog: 2017-05-11 Robin Dapp <rd...@linux.vnet.ibm.com> * tree-vect-data-refs.c (vect_enhance_data_refs_alignment): Remove check for supportable_dr_alignment, compute costs for doing no peeling at all, compare to the best peeling costs so far

[PATCH 3/5] Vect peeling cost model

2017-05-11 Thread Robin Dapp
gcc/ChangeLog: 2017-05-11 Robin Dapp <rd...@linux.vnet.ibm.com> * tree-vect-data-refs.c (vect_peeling_hash_choose_best_peeling): Return peeling info and set costs to zero for unlimited cost model. (vect_enhance_data_refs_alignment): Also inspect all da

[PATCH 2/5] Vect peeling cost model

2017-05-11 Thread Robin Dapp
gcc/ChangeLog: 2017-05-11 Robin Dapp <rd...@linux.vnet.ibm.com> * tree-vect-data-refs.c (vect_update_misalignment_for_peel): Change comment and rename variable. (vect_get_peeling_costs_all_drs): New function. (vect_peeling_hash_get_lowest_cost

[PATCH 1/5] Vect peeling cost model

2017-05-11 Thread Robin Dapp
gcc/ChangeLog: 2017-05-11 Robin Dapp <rd...@linux.vnet.ibm.com> * tree-vectorizer.h (dr_misalignment): Introduce DR_MISALIGNMENT_UNKNOWN. * tree-vect-data-refs.c (vect_compute_data_ref_alignment): Refactoring. (vect_update_misalignment_for_peel

Re: [RFC] S/390: Alignment peeling prolog generation

2017-05-11 Thread Robin Dapp
Included the requested changes in the patches (to follow). I removed the alignment count check now altogether. > I'm not sure why you test for unlimited_cost_model here as I said > elsewhere I'm not sure > what not cost modeling means for static decisions. The purpose of > unlimited_cost_model

Re: [PATCH] Tree-level fix for PR 69526

2017-05-09 Thread Robin Dapp
ping.

[PATCH 4/4] Vect peeling cost model

2017-05-08 Thread Robin Dapp
gcc/ChangeLog: 2017-05-08 Robin Dapp <rd...@linux.vnet.ibm.com> * tree-vect-data-refs.c (vect_peeling_hash_get_lowest_cost): Remove unused variable. (vect_enhance_data_refs_alignment): Compare best peelings costs to doing no peeling and choose no p

[PATCH 3/4] Vect peeling cost model

2017-05-08 Thread Robin Dapp
gcc/ChangeLog: 2017-05-08 Robin Dapp <rd...@linux.vnet.ibm.com> * tree-vect-data-refs.c (vect_peeling_hash_choose_best_peeling): Return peel info. (vect_enhance_data_refs_alignment): Compute full costs when peeling for unknown alignment, compare to

Re: [RFC] S/390: Alignment peeling prolog generation

2017-05-08 Thread Robin Dapp
> So the new part is the last point? There's a lot of refactoring in 3/3 that > makes it hard to see what is actually changed ... you need to resist > in doing this, it makes review very hard. The new part is actually spread across the three last "-"s. Attached is a new version of [3/3] split

[PATCH 3/3] Vect peeling cost model

2017-05-04 Thread Robin Dapp
gcc/ChangeLog: 2017-04-26 Robin Dapp <rd...@linux.vnet.ibm.com> * tree-vect-data-refs.c (vect_peeling_hash_get_lowest_cost): Change cost model. (vect_peeling_hash_choose_best_peeling): Return extended peel info. (vect_peeling_supportable): Return peeling

[PATCH 2/3] Vect peeling cost model

2017-05-04 Thread Robin Dapp
Wrap some frequently used snippets in separate functions. gcc/ChangeLog: 2017-04-26 Robin Dapp <rd...@linux.vnet.ibm.com> * tree-vect-data-refs.c (vect_update_misalignment_for_peel): Rename. (vect_get_peeling_costs_all_drs): Create fu

[PATCH 1/3] Vect peeling cost model

2017-05-04 Thread Robin Dapp
Some refactoring and definitions to use for (unknown) DR_MISALIGNMENT, gcc/ChangeLog: 2017-04-26 Robin Dapp <rd...@linux.vnet.ibm.com> * tree-data-ref.h (struct data_reference): Create DR_HAS_NEGATIVE_STEP. * tree-vectorizer.h (dr_misalignment): Define DR_MISALI

Re: [RFC] S/390: Alignment peeling prolog generation

2017-05-04 Thread Robin Dapp
Hi, > This one only works for known misalignment, otherwise it's overkill. > > OTOH if with some refactoring we can end up using a single cost model > that would be great. That is for the SAME_ALIGN_REFS we want to > choose the unknown misalignment with the maximum number of > SAME_ALIGN_REFS.

Re: [RFC] S/390: Alignment peeling prolog generation

2017-04-12 Thread Robin Dapp
> Note I was very conservative here to allow store bandwidth starved > CPUs to benefit from aligning a store. > > I think it would be reasonable to apply the same heuristic to the > store case that we only peel for same cost if peeling would at least > align two refs. Do you mean checking if

Re: [RFC] S/390: Alignment peeling prolog generation

2017-04-11 Thread Robin Dapp
Hi Bin, > Seems Richi added code like below comparing costs between aligned and > unsigned loads, and only peeling if it's beneficial: > > /* In case there are only loads with different unknown misalignments, > use > peeling only if it may help to align other accesses in the loop

[RFC] S/390: Alignment peeling prolog generation

2017-04-11 Thread Robin Dapp
Hi, when looking at various vectorization examples on s390x I noticed that we still peel vf/2 iterations for alignment even though vectorization costs of unaligned loads and stores are the same as normal loads/stores. A simple example is void foo(int *restrict a, int *restrict b, unsigned int

[PATCH] Fix s390 testcase vcond-shift

2017-03-27 Thread Robin Dapp
Hi, this patch fixes the vcond shift testcase that failed since setting PARAM_MIN_VECT_LOOP_BOUND in the s390 backend. Regards Robin -- gcc/testsuite/ChangeLog: 2017-03-27 Robin Dapp <rd...@linux.vnet.ibm.com> * gcc.target/s390/vector/vcond-shift.c (void foo): In

[PATCH] S/390: Disable vectorization for loops with few iterations

2017-03-02 Thread Robin Dapp
gards Robin [1] https://gcc.gnu.org/ml/gcc/2017-01/msg00234.html [2] https://gcc.gnu.org/ml/gcc-patches/2016-05/msg01562.html -- gcc/ChangeLog: 2017-03-02 Robin Dapp <rd...@linux.vnet.ibm.com> * config/s390/s390.c (s390_option_override_internal): Set PARAM_MIN_VECT_LOOP_

[PATCH] S/390: Change 2-byte NOPs

2017-03-01 Thread Robin Dapp
Hi, the following patch changes "nopr %r7" to "nopr %r0" which is advantageous from a hardware perspective. It will only be emitted for hotpatching and should not impact normal code. Bootstrapped and regression tested on s390 and s390x. Regards Robin gcc/ChangeLog: 20

Re: [PATCH] Tree-level fix for PR 69526

2017-02-02 Thread Robin Dapp
I skimmed through the code to see where transformation like (a - 1) -> (a + UINT_MAX) are performed. It seems there are only two places, match.pd (/* A - B -> A + (-B) if B is easily negatable. */) and fold-const.c. In order to be able to reliably know whether to zero-extend or to sign-extend

Vectorization regression on s390x GCC6 vs GCC5

2017-01-26 Thread Robin Dapp
Hi, while analyzing a test case with a lot of nested loops (>7) and double floating point operations I noticed a performance regression of GCC 6/7 vs GCC 5 on s390x. It seems due to GCC 6 vectorizing something GCC 5 couldn't. Basically, each loop iterates over three dimensions, we fully unroll

Re: [PATCH] Tree-level fix for PR 69526

2017-01-16 Thread Robin Dapp
Ping. To put it shortly, I'm not sure how to differentiate between: example range of a: [3,3] (ulong)(a + UINT_MAX) + 1 --> (ulong)(a) + (ulong)(-1 + 1), sign extend example range of a: [0,0] (ulong)(a + UINT_MAX) + 1 --> (ulong)(a) + (ulong)(UINT_MAX + 1), no sign extend In this case, there
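
The two example ranges above can be written out as a small sketch, assuming a 32-bit inner type and a 64-bit outer type in place of unsigned int and unsigned long. The function names are hypothetical. Which of the two folded forms matches the original expression depends on whether the inner addition wrapped.

```c
#include <stdint.h>

/* Original expression: add UINT32_MAX in 32 bits (may wrap), widen
   to 64 bits, then add 1.  */
uint64_t
original (uint32_t a)
{
  return (uint64_t) (uint32_t) (a + UINT32_MAX) + 1;
}

/* Folded form with the inner constant sign-extended, i.e. treated
   as -1.  Matches the original when the inner addition wrapped.  */
uint64_t
folded_sign_extended (uint32_t a)
{
  return (uint64_t) a + (uint64_t) ((int64_t) -1 + 1);
}

/* Folded form with the inner constant zero-extended.  Matches the
   original when the inner addition did not wrap.  */
uint64_t
folded_zero_extended (uint32_t a)
{
  return (uint64_t) a + ((uint64_t) UINT32_MAX + 1);
}
```

With a == 3 the inner addition wraps and only the sign-extended form agrees with the original; with a == 0 it does not wrap and only the zero-extended form agrees.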

Re: k-byte memset/memcpy/strlen builtins

2017-01-12 Thread Robin Dapp
> Yes, for memset with larger element we could add an optab plus > internal function combination and use that when the target wants. Or > always use such IFN and fall back to loopy expansion. So, adding additional patterns in tree-loop-distribute.c (and mapping them to dedicated optabs) is fine?

k-byte memset/memcpy/strlen builtins

2017-01-11 Thread Robin Dapp
Hi, When examining the performance of some test cases on s390 I realized that we could do better for constructs like 2-byte memcpys or 2-byte/4-byte memsets. Due to some s390-specific architectural properties, we could be faster by e.g. avoiding excessive unrolling and using dedicated memory
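
As an illustration of the kind of construct meant by a "2-byte memset", here is a hypothetical loop (not taken from the original message) that stores the same 16-bit value into every element. Whether a given compiler version recognizes such a loop as a single pattern is not claimed here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: fill n 16-bit elements with one value.
   Plain memset cannot express this because it replicates a single
   byte, not a 2-byte pattern.  */
void
fill_u16 (uint16_t *dst, uint16_t value, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = value;
}
```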

Re: [PATCH] Tree-level fix for PR 69526

2017-01-10 Thread Robin Dapp
Perhaps I'm still missing how some cases are handled or not handled, sorry for the noise. > I'm not sure there is anything to "interpret" -- the operation is unsigned > and overflow is when the operation may wrap around zero. There might > be clever ways of re-writing the expression to >

Re: [PATCH] Tree-level fix for PR 69526

2016-12-07 Thread Robin Dapp
> So we have (uint64_t)(uint32 + -1U) + 1 and using TYPE_SIGN (inner_type) > produces (uint64_t)uint32 + -1U + 1. This simply means that we cannot ignore > overflow of the inner operation and for some reason your change > to extract_range_from_binary_expr didn't catch this. That is _8 +

Re: [PATCH] Tree-level fix for PR 69526

2016-12-04 Thread Robin Dapp
Ping. Any idea how to tackle this?

Re: [PATCH] Tree-level fix for PR 69526

2016-11-28 Thread Robin Dapp
>> + /* Sign-extend @1 to TYPE. */ >> + w1 = w1.from (w1, TYPE_PRECISION (type), SIGNED); >> >> not sure why you do always sign-extend. If the inner op is unsigned >> and we widen then that's certainly bogus considering your UINT_MAX >> example above. Does >> >>

Re: [PATCH] Tree-level fix for PR 69526

2016-11-24 Thread Robin Dapp
Ping.

Re: [PATCH] Tree-level fix for PR 69526

2016-11-16 Thread Robin Dapp
Found some time to look into this again. > Index: tree-ssa-propagate.c > === > --- tree-ssa-propagate.c(revision 240133) > +++ tree-ssa-propagate.c(working copy) > @@ -1105,10 +1105,10 @@

Re: [PATCH] Tree-level fix for PR 69526

2016-10-14 Thread Robin Dapp
Ping :)

Re: [PATCH] Tree-level fix for PR 69526

2016-10-05 Thread Robin Dapp
Ping.

Re: [PATCH] Fix PR77407

2016-10-01 Thread Robin Dapp
This introduces an ICE ("bogus comparison result type") on s390 for the following test case: #include <stdlib.h> void foo(int dim) { int ba, sign; ba = abs (dim); sign = dim / ba; } Doing diff --git a/gcc/match.pd b/gcc/match.pd index ba7e013..2455592 100644 --- a/gcc/match.pd +++

Re: [PATCH GCC][v2]Simplify alias check code generation in vectorizer

2016-09-27 Thread Robin Dapp
> Also the '=' in the split line goes to the next line according to > coding conventions. fixed, I had only looked at an instance one function above which had it wrong as well. Also changed comment grammar slightly. Regards Robin -- gcc/ChangeLog: 2016-09-27 Robin Dap

Re: [PATCH GCC][v2]Simplify alias check code generation in vectorizer

2016-09-26 Thread Robin Dapp
idn't manage to run it independently in this directory via RUNTESTFLAGS=vect.exp=... or otherwise) Bootstrapped on x86 and s390. -- gcc/ChangeLog: 2016-09-26 Robin Dapp <rd...@linux.vnet.ibm.com> * tree-vect-loop-manip.c (create_intersect_range_checks_index): Add tree

Re: [PATCH GCC][v2]Simplify alias check code generation in vectorizer

2016-09-26 Thread Robin Dapp
i_p(). ok to commit? Regards Robin -- gcc/ChangeLog: 2016-09-26 Robin Dapp <rd...@linux.vnet.ibm.com> * tree-vect-loop-manip.c (create_intersect_range_checks_index): Add tree_fits_uhwi_p check. diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c inde

Re: [PATCH] Tree-level fix for PR 69526

2016-09-20 Thread Robin Dapp
flow. Do you think it should be handled differently? Revised version attached. Regards Robin -- gcc/ChangeLog: 2016-09-20 Robin Dapp <rd...@linux.vnet.ibm.com> PR middle-end/69526 This enables combining of wrapped binary operations and fixes the tree l

Re: [PATCH] Tree-level fix for PR 69526

2016-09-05 Thread Robin Dapp
Ping. diff --git a/gcc/gimple-match-head.c b/gcc/gimple-match-head.c index 2beadbc..d66fcb1 100644 --- a/gcc/gimple-match-head.c +++ b/gcc/gimple-match-head.c @@ -39,6 +39,7 @@ along with GCC; see the file COPYING3. If not see #include "internal-fn.h" #include "case-cfn-macros.h" #include

Re: [PATCH] Use RPO order for fwprop iteration

2016-09-02 Thread Robin Dapp
This causes a performance regression in the xalancbmk SPECint2006 benchmark on s390x. At first sight, the produced asm output doesn't look too different but I'll have a closer look. Is the fwprop order supposed to have major performance implications? Regards Robin > This changes it from PRE on

Re: [PATCH] Tree-level fix for PR 69526

2016-08-23 Thread Robin Dapp
gah, this + return true; + if (TREE_CODE (t1) != SSA_NAME) should of course be like this + if (TREE_CODE (t1) != SSA_NAME) + return true; in the last patch.

Re: [PATCH] Tree-level fix for PR 69526

2016-08-22 Thread Robin Dapp
for now because I find extract_range_from_binary_expr_1 somewhat lengthy and hard to follow already :) Wouldn't it be better to "separate concerns"/split it up in the long run and merge the functionality needed here at some time? Bootstrapped and reg-tested on s390x, bootstrap on x86 running.

[PATCH] Tree-level fix for PR 69526

2016-07-21 Thread Robin Dapp
As described in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69526, we currently fail to simplify cases like (unsigned long)(a - 1) + 1 to (unsigned long)a when VRP knows that (a - 1) does not overflow. This patch introduces a match.pd pattern as well as a helper function that checks for
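
A minimal sketch of the simplification described above, assuming a 32-bit unsigned a and a 64-bit result type in place of unsigned long. The function names are hypothetical. The two forms agree whenever a - 1 does not wrap, and differ for a == 0, which is why range information is needed before folding.

```c
#include <stdint.h>

/* Unfolded form: the subtraction is done in 32 bits and may wrap,
   then the result is widened to 64 bits and 1 is added.  */
uint64_t
widen_unfolded (uint32_t a)
{
  return (uint64_t) (uint32_t) (a - 1) + 1;
}

/* Folded form that the simplification aims for.  */
uint64_t
widen_folded (uint32_t a)
{
  return (uint64_t) a;
}
```

For a == 0 the unfolded form yields 4294967296 (2^32) while the folded form yields 0, so the fold is only valid when the range of a excludes 0.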

[PATCH] Some tree-vect-data-refs.c cleanup

2016-04-13 Thread Robin Dapp
? No regressions on s390x and amd64. Regards Robin -- gcc/ChangeLog: 2016-04-13 Robin Dapp <rd...@linux.vnet.ibm.com> * tree-vectorizer.h (dr_misalignment): Introduce named DR_MISALIGNMENT constants. (aligned_access_p): Use constants. (known_alignment_for_ac

[RFC] 69526 - ivopts candidate strangeness

2016-03-20 Thread Robin Dapp
et perform bootstrapping and more testing due to the premature nature of the patch. Thanks Robin gcc/ChangeLog: 2016-03-17 Robin Dapp <rd...@linux.vnet.ibm.com> * cfgloop.h (struct GTY): Add second number of iterations * loop-doloop.c (doloop_condition_get)

Re: [Patch] S/390: Simplify vector conditionals

2015-12-17 Thread Robin Dapp
Hi, the attached patch renames the constm1_operand predicate to all_ones_operand and introduces a check for int mode. It should be applied on top of the last patch ([Patch] S/390: Simplify vector conditionals). Regtested on s390. Regards Robin gcc/ChangeLog: 2015-12-15 Robin Dapp <

[Patch] S/390: Simplify vector conditionals

2015-12-15 Thread Robin Dapp
ree-level. Bootstrapped and regression-tested on s390. Regards Robin gcc/ChangeLog: 2015-12-15 Robin Dapp <rd...@linux.vnet.ibm.com> * config/s390/s390.c (s390_expand_vcond): Convert vector conditional into shift. * config/s390/vector.md: Change operand predicate.

[Patch] S/390: Fix symbol ref alignment

2015-10-23 Thread Robin Dapp
not always be generated. This patch uses separate flags for 2-, 4-, and 8-byte alignment to fix the problem. Bootstrapped, no regressions on s390. Regards Robin gcc/testsuite/ChangeLog: 2015-10-23 Robin Dapp <rd...@linux.vnet.ibm.com> * gcc.target/s390/load-relative-check.c: Ne

Re: CSE pass prevents loop-invariant motion

2015-09-24 Thread Robin Dapp
On 09/15/2015 05:25 PM, Jeff Law wrote: > On 09/15/2015 06:11 AM, Robin Dapp wrote: >> Hi, >> >> recently, I came across a problem that keeps a load instruction in a >> loop although it is loop-invariant. [..] > You might want to check your costing model -- cprop

CSE pass prevents loop-invariant motion

2015-09-15 Thread Robin Dapp
Hi, recently, I came across a problem that keeps a load instruction in a loop although it is loop-invariant. A simple example is: #include <stdio.h> #define SZ 256 int a[SZ], b[SZ], c[SZ]; int main() { int i; for (i = 0; i < SZ; i++) { a[i] = b[i] + c[i]; } printf("%d\n", a[0]); } The
