> While the initialization value doesn't matter (wi::add will overwrite it)
> better initialize both to false ;)
Ah, you mean because we want to transform only if get_range_info returned
VR_RANGE. Indeed somewhat unintuitive (but still the best variant for now).
> so I'm still missing a
> ideally you'd use a wide-int here and defer the tree allocation to the result
Did that in the attached version.
> So I guess we never run into the outer_op == minus case as the above is
> clearly wrong for that?
Right, damn, not only was the treatment for this missing but it was
bogus in the
Ping.
> use INTEGRAL_TYPE_P.
Done.
> but you do not actually _use_ vr_outer. Do you think that if
> vr_outer is a VR_RANGE then the outer operation may not
> possibly have wrapped? That's a false conclusion.
These were remains of a previous version. vr_outer is indeed not needed
anymore; removed.
r max overflow, split/anti range). Test
suite on s390x has no regressions, bootstrap is ok, x86 running.
Regards
Robin
--
gcc/ChangeLog:
2017-06-19 Robin Dapp <rd...@linux.vnet.ibm.com>
* match.pd: Simplify wrapped binary operations.
diff --git a/gcc/match.pd b/gcc/match.pd
in
> http://gcc.gnu.org/ml/gcc-testresults/2017-06/msg00297.html
What machine is this running on? power4 BE? The tests are compiled with
--with-cpu-64=power4 apparently. I cannot reproduce this on power7
-m32. Is it possible to get more detailed logs or machine access to
reproduce?
Regards
Robin
> Patch 6 breaks no-vfa-vect-57.c on powerpc.
Which CPU model (power6/7/8?) and which compile options (-maltivec/
-mpower8-vector?) have been used for running and compiling the test? As
discussed in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80925
this has an influence on the cost function and
> Since this commit (r248678), I've noticed regressions on some arm targets.
> Executed from: gcc.dg/tree-ssa/tree-ssa.exp
> gcc.dg/tree-ssa/gen-vect-26.c scan-tree-dump-times vect "Alignment
> of access forced using peeling" 1
> gcc.dg/tree-ssa/gen-vect-26.c scan-tree-dump-times vect
>
ld series itself (-p3)
doesn't apply to trunk anymore (because of the change in
vect_enhance_data_refs_alignment).
Regards
Robin
--
gcc/ChangeLog:
2017-05-24 Robin Dapp <rd...@linux.vnet.ibm.com>
* tree-vect-data-refs.c (vect_get_peeling_costs_all_drs):
Introduce unkno
> Not sure I've understood the series TBH, but is the npeel == vf / 2
> there specifically for the "unknown number of peels" case? How do
> we distinguish that from the case in which the number of peels is
> known to be vf / 2 at compile time? Or have I missed the point
> completely? (probably
gcc/testsuite/ChangeLog:
2017-05-23 Robin Dapp <rd...@linux.vnet.ibm.com>
* gcc.target/s390/vector/vec-nopeel-2.c: New test.
diff --git a/gcc/testsuite/gcc.target/s390/vector/vec-nopeel-2.c b/gcc/testsuite/gcc.target/s390/vector/vec-nopeel-2.c
new file mode 100644
index 0
gcc/ChangeLog:
2017-05-23 Robin Dapp <rd...@linux.vnet.ibm.com>
* tree-vect-data-refs.c (vect_get_data_access_cost):
Workaround for SLP handling.
(vect_enhance_data_refs_alignment):
Compute costs for doing no peeling at all, compare to the best
p
gcc/ChangeLog:
2017-05-23 Robin Dapp <rd...@linux.vnet.ibm.com>
* tree-vect-data-refs.c (vect_peeling_hash_choose_best_peeling):
Return peeling info and set costs to zero for unlimited cost
model.
(vect_enhance_data_refs_alignment): Also inspect all da
gcc/ChangeLog:
2017-05-23 Robin Dapp <rd...@linux.vnet.ibm.com>
* tree-vect-data-refs.c (vect_update_misalignment_for_peel):
Rename.
(vect_get_peeling_costs_all_drs): Create function.
(vect_peeling_hash_get_lowest_cost):
gcc/ChangeLog:
2017-05-23 Robin Dapp <rd...@linux.vnet.ibm.com>
* tree-vect-data-refs.c (vect_compute_data_ref_alignment):
Create DR_HAS_NEGATIVE_STEP.
(vect_update_misalignment_for_peel): Define DR_MISALIGNMENT.
(vect_enhance_data_refs_alignment
The last version of the patch series caused some regressions for ppc64.
This was largely due to incorrect handling of unsupportable alignment
and should be fixed with the new version.
p2 and p5 have not changed but I'm posting the whole series again for
reference. p1 only changed comment
> I can guess what is happening here. It's a 40 bits unsigned long long
> field, (s.b-8) will be like:
> _1 = s.b
> _2 = _1 + 0xf8
> Also get_range_info returns value range [0, 0xFF] for _1.
> You'd need to check if _1(with range [0, 0xFF]) + 0xf8
> overflows
> Any reason to expose tree-vrp.c internal interface here? The function
> looks quite expensive. Overflow check can be done by get_range_info
> and simple wi::cmp calls. Existing code like in
> tree-ssa-loop-niters.c already does that. Also could you avoid using
> comma expressions in
New testcases.
gcc/testsuite/ChangeLog:
2017-05-18 Robin Dapp <rd...@linux.vnet.ibm.com>
* gcc.dg/wrapped-binop-simplify-signed-1.c: New test.
* gcc.dg/wrapped-binop-simplify-unsigned-1.c: New test.
* gcc.dg/wrapped-binop-simplify-unsigned-2.c: New test.
diff
match.pd part of the patch.
gcc/ChangeLog:
2017-05-18 Robin Dapp <rd...@linux.vnet.ibm.com>
* match.pd: Simplify wrapped binary operations.
* tree-vrp.c (extract_range_from_binary_expr_1): Add overflow
parameter.
(extract_range_from_binary_expr): Li
This tries to fold unconditionally and fixes some test cases.
gcc/ChangeLog:
2017-05-18 Robin Dapp <rd...@linux.vnet.ibm.com>
* tree-ssa-propagate.c
(substitute_and_fold_dom_walker::before_dom_children):
Always try to fold.
gcc/testsuite/ChangeLog:
2017-05-18 Robi
> Hmm, won't (uint32_t + uint32_t-CST) doesn't overflow be sufficient
> condition for such transformation?
Yes, in principle this should suffice. What we're actually looking for
is something like a "proper" (or no) overflow, i.e. an overflow in both
min and max of the value range. In
(a +
Included the workaround for SLP now. With it, testsuite is clean on x86
as well.
gcc/ChangeLog:
2017-05-11 Robin Dapp <rd...@linux.vnet.ibm.com>
* tree-vect-data-refs.c (vect_get_data_access_cost):
Workaround for SLP handling.
(vect_enhance_data_refs_ali
gcc/testsuite/ChangeLog:
2017-05-11 Robin Dapp <rd...@linux.vnet.ibm.com>
* gcc.target/s390/vector/vec-nopeel-2.c: New test.
diff --git a/gcc/testsuite/gcc.target/s390/vector/vec-nopeel-2.c b/gcc/testsuite/gcc.target/s390/vector/vec-nopeel-2.c
new file mode 100644
index 0
gcc/ChangeLog:
2017-05-11 Robin Dapp <rd...@linux.vnet.ibm.com>
* tree-vect-data-refs.c (vect_enhance_data_refs_alignment):
Remove check for supportable_dr_alignment, compute costs for
doing no peeling at all, compare to the best peeling costs so
far
gcc/ChangeLog:
2017-05-11 Robin Dapp <rd...@linux.vnet.ibm.com>
* tree-vect-data-refs.c (vect_peeling_hash_choose_best_peeling):
Return peeling info and set costs to zero for unlimited cost
model.
(vect_enhance_data_refs_alignment): Also inspect all da
gcc/ChangeLog:
2017-05-11 Robin Dapp <rd...@linux.vnet.ibm.com>
* tree-vect-data-refs.c (vect_update_misalignment_for_peel): Change
comment and rename variable.
(vect_get_peeling_costs_all_drs): New function.
(vect_peeling_hash_get_lowest_cost
gcc/ChangeLog:
2017-05-11 Robin Dapp <rd...@linux.vnet.ibm.com>
* tree-vectorizer.h (dr_misalignment): Introduce
DR_MISALIGNMENT_UNKNOWN.
* tree-vect-data-refs.c (vect_compute_data_ref_alignment): Refactoring.
(vect_update_misalignment_for_peel
Included the requested changes in the patches (to follow). I removed
the alignment count check now altogether.
> I'm not sure why you test for unlimited_cost_model here as I said
> elsewhere I'm not sure
> what not cost modeling means for static decisions. The purpose of
> unlimited_cost_model
ping.
gcc/ChangeLog:
2017-05-08 Robin Dapp <rd...@linux.vnet.ibm.com>
* tree-vect-data-refs.c (vect_peeling_hash_get_lowest_cost):
Remove unused variable.
(vect_enhance_data_refs_alignment):
Compare best peelings costs to doing no peeling and choose no
p
gcc/ChangeLog:
2017-05-08 Robin Dapp <rd...@linux.vnet.ibm.com>
* tree-vect-data-refs.c (vect_peeling_hash_choose_best_peeling):
Return peel info.
(vect_enhance_data_refs_alignment):
Compute full costs when peeling for unknown alignment, compare
to
> So the new part is the last point? There's a lot of refactoring in
> 3/3 that
> makes it hard to see what is actually changed ... you need to resist
> in doing this, it makes review very hard.
The new part is actually spread across the last three "-"s. Attached is
a new version of [3/3] split
gcc/ChangeLog:
2017-04-26 Robin Dapp <rd...@linux.vnet.ibm.com>
* tree-vect-data-refs.c (vect_peeling_hash_get_lowest_cost):
Change cost model.
(vect_peeling_hash_choose_best_peeling): Return extended peel info.
(vect_peeling_supportable): Return peeling
Wrap some frequently used snippets in separate functions.
gcc/ChangeLog:
2017-04-26 Robin Dapp <rd...@linux.vnet.ibm.com>
* tree-vect-data-refs.c (vect_update_misalignment_for_peel): Rename.
(vect_get_peeling_costs_all_drs): Create fu
Some refactoring and definitions to use for (unknown) DR_MISALIGNMENT,
gcc/ChangeLog:
2017-04-26 Robin Dapp <rd...@linux.vnet.ibm.com>
* tree-data-ref.h (struct data_reference): Create DR_HAS_NEGATIVE_STEP.
* tree-vectorizer.h (dr_misalignment): Define DR_MISALI
Hi,
> This one only works for known misalignment, otherwise it's overkill.
>
> OTOH if with some refactoring we can end up using a single cost model
> that would be great. That is for the SAME_ALIGN_REFS we want to
> choose the unknown misalignment with the maximum number of
> SAME_ALIGN_REFS.
> Note I was very conservative here to allow store bandwidth starved
> CPUs to benefit from aligning a store.
>
> I think it would be reasonable to apply the same heuristic to the
> store case that we only peel for same cost if peeling would at least
> align two refs.
Do you mean checking if
Hi Bin,
> Seems Richi added code like below comparing costs between aligned and
> unaligned loads, and only peeling if it's beneficial:
>
> /* In case there are only loads with different unknown misalignments,
> use
> peeling only if it may help to align other accesses in the loop
Hi,
when looking at various vectorization examples on s390x I noticed that
we still peel vf/2 iterations for alignment even though vectorization
costs of unaligned loads and stores are the same as normal loads/stores.
A simple example is
void foo(int *restrict a, int *restrict b, unsigned int
Hi,
this patch fixes the vcond shift testcase that failed since setting
PARAM_MIN_VECT_LOOP_BOUND in the s390 backend.
Regards
Robin
--
gcc/testsuite/ChangeLog:
2017-03-27 Robin Dapp <rd...@linux.vnet.ibm.com>
* gcc.target/s390/vector/vcond-shift.c (void foo): In
Regards
Robin
[1] https://gcc.gnu.org/ml/gcc/2017-01/msg00234.html
[2] https://gcc.gnu.org/ml/gcc-patches/2016-05/msg01562.html
--
gcc/ChangeLog:
2017-03-02 Robin Dapp <rd...@linux.vnet.ibm.com>
* config/s390/s390.c (s390_option_override_internal): Set
PARAM_MIN_VECT_LOOP_
Hi,
the following patch changes "nopr %r7" to "nopr %r0" which is
advantageous from a hardware perspective. It will only be emitted for
hotpatching and should not impact normal code.
Bootstrapped and regression tested on s390 and s390x.
Regards
Robin
gcc/ChangeLog:
20
I skimmed through the code to see where transformations like
(a - 1) -> (a + UINT_MAX) are performed. It seems there are only two
places, match.pd (/* A - B -> A + (-B) if B is easily negatable. */)
and fold-const.c.
In order to be able to reliably know whether to zero-extend or to
sign-extend
Hi,
while analyzing a test case with a lot of nested loops (>7) and double
floating point operations I noticed a performance regression of GCC 6/7
vs GCC 5 on s390x. It seems due to GCC 6 vectorizing something GCC 5
couldn't.
Basically, each loop iterates over three dimensions, we fully unroll
Ping.
In short, I'm not sure how to differentiate between:

example range of a: [3,3]
(ulong)(a + UINT_MAX) + 1 --> (ulong)(a) + (ulong)(-1 + 1), sign extend

example range of a: [0,0]
(ulong)(a + UINT_MAX) + 1 --> (ulong)(a) + (ulong)(UINT_MAX + 1), no sign extend

In this case, there
> Yes, for memset with larger element we could add an optab plus
> internal function combination and use that when the target wants. Or
> always use such IFN and fall back to loopy expansion.
So, adding additional patterns in tree-loop-distribute.c (and mapping
them to dedicated optabs) is fine?
Hi,
When examining the performance of some test cases on s390 I realized
that we could do better for constructs like 2-byte memcpys or
2-byte/4-byte memsets. Due to some s390-specific architectural
properties, we could be faster by e.g. avoiding excessive unrolling and
using dedicated memory
Perhaps I'm still missing how some cases are handled or not handled,
sorry for the noise.
> I'm not sure there is anything to "interpret" -- the operation is unsigned
> and overflow is when the operation may wrap around zero. There might
> be clever ways of re-writing the expression to
>
> So we have (uint64_t)(uint32 + -1U) + 1 and using TYPE_SIGN (inner_type)
> produces (uint64_t)uint32 + -1U + 1. This simply means that we cannot ignore
> overflow of the inner operation and for some reason your change
> to extract_range_from_binary_expr didn't catch this. That is _8 +
Ping. Any idea how to tackle this?
>> + /* Sign-extend @1 to TYPE. */
>> + w1 = w1.from (w1, TYPE_PRECISION (type), SIGNED);
>>
>> not sure why you do always sign-extend. If the inner op is unsigned
>> and we widen then that's certainly bogus considering your UINT_MAX
>> example above. Does
>>
>>
Ping.
Found some time to look into this again.
> Index: tree-ssa-propagate.c
> ===
> --- tree-ssa-propagate.c(revision 240133)
> +++ tree-ssa-propagate.c(working copy)
> @@ -1105,10 +1105,10 @@
Ping :)
Ping.
This introduces an ICE ("bogus comparison result type") on s390 for the
following test case:
#include
void foo (int dim)
{
  int ba, sign;
  ba = abs (dim);
  sign = dim / ba;
}
Doing
diff --git a/gcc/match.pd b/gcc/match.pd
index ba7e013..2455592 100644
--- a/gcc/match.pd
+++
> Also the '=' in the split line goes to the next line according to
> coding conventions.
fixed, I had only looked at an instance one function above which had it
wrong as well. Also changed comment grammar slightly.
Regards
Robin
--
gcc/ChangeLog:
2016-09-27 Robin Dap
idn't manage to run it independently in this
directory via RUNTESTFLAGS=vect.exp=... or otherwise)
Bootstrapped on x86 and s390.
--
gcc/ChangeLog:
2016-09-26 Robin Dapp <rd...@linux.vnet.ibm.com>
* tree-vect-loop-manip.c (create_intersect_range_checks_index):
Add tree
i_p().
ok to commit?
Regards
Robin
--
gcc/ChangeLog:
2016-09-26 Robin Dapp <rd...@linux.vnet.ibm.com>
* tree-vect-loop-manip.c (create_intersect_range_checks_index):
Add tree_fits_uhwi_p check.
diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
inde
flow. Do you think it
should be handled differently?
Revised version attached.
Regards
Robin
--
gcc/ChangeLog:
2016-09-20 Robin Dapp <rd...@linux.vnet.ibm.com>
PR middle-end/69526
This enables combining of wrapped binary operations and fixes
the tree l
Ping.
diff --git a/gcc/gimple-match-head.c b/gcc/gimple-match-head.c
index 2beadbc..d66fcb1 100644
--- a/gcc/gimple-match-head.c
+++ b/gcc/gimple-match-head.c
@@ -39,6 +39,7 @@ along with GCC; see the file COPYING3. If not see
#include "internal-fn.h"
#include "case-cfn-macros.h"
#include
This causes a performance regression in the xalancbmk SPECint2006
benchmark on s390x. At first sight, the produced asm output doesn't look
too different but I'll have a closer look. Is the fwprop order supposed
to have major performance implications?
Regards
Robin
> This changes it from PRE on
gah, this
+ return true;
+ if (TREE_CODE (t1) != SSA_NAME)
should of course be like this
+ if (TREE_CODE (t1) != SSA_NAME)
+ return true;
in the last patch.
for now because I find
extract_range_from_binary_expr_1 somewhat lengthy and hard to follow
already :) Wouldn't it be better to "separate concerns"/split it up in
the long run and merge the functionality needed here at some time?
Bootstrapped and reg-tested on s390x, bootstrap on x86 running.
As described in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69526, we
currently fail to simplify cases like
(unsigned long)(a - 1) + 1
to
(unsigned long)a
when VRP knows that (a - 1) does not overflow.
This patch introduces a match.pd pattern as well as a helper function
that checks for
?
No regressions on s390x and amd64.
Regards
Robin
--
gcc/ChangeLog:
2016-04-13 Robin Dapp <rd...@linux.vnet.ibm.com>
* tree-vectorizer.h
(dr_misalignment): Introduce named DR_MISALIGNMENT constants.
(aligned_access_p): Use constants.
(known_alignment_for_ac
et perform bootstrapping
and more testing due to the premature nature of the patch.
Thanks
Robin
gcc/ChangeLog:
2016-03-17 Robin Dapp <rd...@linux.vnet.ibm.com>
* cfgloop.h (struct GTY): Add second number of iterations
* loop-doloop.c (doloop_condition_get)
Hi,
the attached patch renames the constm1_operand predicate to
all_ones_operand and introduces a check for int mode.
It should be applied on top of the last patch ([Patch] S/390: Simplify
vector conditionals).
Regtested on s390.
Regards
Robin
gcc/ChangeLog:
2015-12-15 Robin Dapp <
tree-level.
Bootstrapped and regression-tested on s390.
Regards
Robin
gcc/ChangeLog:
2015-12-15 Robin Dapp <rd...@linux.vnet.ibm.com>
* config/s390/s390.c (s390_expand_vcond): Convert vector
conditional into shift.
* config/s390/vector.md: Change operand predicate.
not always be generated. This patch uses separate flags
for 2-, 4-, and 8-byte alignment to fix the problem.
Bootstrapped, no regressions on s390.
Regards
Robin
gcc/testsuite/ChangeLog:
2015-10-23 Robin Dapp <rd...@linux.vnet.ibm.com>
* gcc.target/s390/load-relative-check.c: Ne
On 09/15/2015 05:25 PM, Jeff Law wrote:
> On 09/15/2015 06:11 AM, Robin Dapp wrote:
>> Hi,
>>
>> recently, I came across a problem that keeps a load instruction in a
>> loop although it is loop-invariant.
[..]
> You might want to check your costing model -- cprop
Hi,
recently, I came across a problem that keeps a load instruction in a
loop although it is loop-invariant.
A simple example is:
#include
#define SZ 256
int a[SZ], b[SZ], c[SZ];
int main ()
{
  int i;
  for (i = 0; i < SZ; i++) {
    a[i] = b[i] + c[i];
  }
  printf ("%d\n", a[0]);
}
The