[Bug target/101908] [12 regression] cray regression with -O2 -ftree-slp-vectorize compared to -O2

2022-04-05 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908 Richard Biener changed:
What       |Removed |Added
Status     |NEW     |RESOLVED
Resolution |---

[Bug target/101908] [12 regression] cray regression with -O2 -ftree-slp-vectorize compared to -O2

2022-04-04 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908 --- Comment #48 from CVS Commits --- The master branch has been updated by hongtao Liu : https://gcc.gnu.org/g:e3174d6183e5c042e822d9feabb670235b737441 commit r12-7990-ge3174d6183e5c042e822d9feabb670235b737441 Author: liuhongt Date: Wed

[Bug target/101908] [12 regression] cray regression with -O2 -ftree-slp-vectorize compared to -O2

2022-03-29 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908 --- Comment #47 from rguenther at suse dot de --- On Tue, 29 Mar 2022, crazylht at gmail dot com wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908 > > --- Comment #46 from Hongtao.liu --- > Another issue is splitting vector load

[Bug target/101908] [12 regression] cray regression with -O2 -ftree-slp-vectorize compared to -O2

2022-03-29 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908 --- Comment #46 from Hongtao.liu --- Another issue is splitting vector load into halves or elements: the latter requires scratch registers, which may not be available; the former doesn't require an extra register but may still trigger STLF stalls.
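A minimal C sketch of the three load shapes contrasted in comment #46, assuming the GCC/Clang vector extensions. The function names (load_whole, load_halves, load_elements) are hypothetical and do not come from the bug report. The compiler is free to merge or split these accesses, so the source only illustrates the intended access widths relative to earlier 4-byte scalar stores.

```c
#include <string.h>

typedef float v4sf __attribute__((vector_size(16)));

/* Whole: one 16-byte load. If p[0..3] were just written by four separate
   4-byte stores, this load may fail store-to-load forwarding (STLF). */
static v4sf load_whole(const float *p)
{
    v4sf v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Halves: two 8-byte loads. No per-element scalar temporaries are needed,
   but each half still spans two earlier 4-byte stores, so it may still stall. */
static v4sf load_halves(const float *p)
{
    v4sf v;
    memcpy(&v, p, 2 * sizeof(float));
    memcpy((char *)&v + 2 * sizeof(float), p + 2, 2 * sizeof(float));
    return v;
}

/* Elements: four 4-byte loads, each matching one earlier store exactly,
   at the price of assembling the vector from separate scalar values. */
static v4sf load_elements(const float *p)
{
    return (v4sf){ p[0], p[1], p[2], p[3] };
}
```

All three functions return the same vector; they differ only in the width of the memory accesses the source asks for.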

[Bug target/101908] [12 regression] cray regression with -O2 -ftree-slp-vectorize compared to -O2

2022-03-29 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908 --- Comment #45 from rguenther at suse dot de --- On Tue, 29 Mar 2022, crazylht at gmail dot com wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908 > > --- Comment #43 from Hongtao.liu --- > One thing I found by experiments: >

[Bug target/101908] [12 regression] cray regression with -O2 -ftree-slp-vectorize compared to -O2

2022-03-28 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908 --- Comment #44 from Hongtao.liu --- (In reply to Hongtao.liu from comment #43) > One thing I found by experiments: > Insert 64 vaddps %xmm18, %xmm19, %xmm20(no dependence between each other, > just emulate for pipeline) before stalled load,

[Bug target/101908] [12 regression] cray regression with -O2 -ftree-slp-vectorize compared to -O2

2022-03-28 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908 --- Comment #43 from Hongtao.liu --- One thing I found by experiments: Insert 64 vaddps %xmm18, %xmm19, %xmm20(no dependence between each other, just emulate for pipeline) before stalled load, stlf stall case is as fast as no stall cases on

[Bug target/101908] [12 regression] cray regression with -O2 -ftree-slp-vectorize compared to -O2

2022-03-15 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908 --- Comment #42 from rguenther at suse dot de --- On Tue, 15 Mar 2022, crazylht at gmail dot com wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908 > > --- Comment #41 from Hongtao.liu --- > (In reply to Richard Biener from comment

[Bug target/101908] [12 regression] cray regression with -O2 -ftree-slp-vectorize compared to -O2

2022-03-14 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908 --- Comment #41 from Hongtao.liu --- (In reply to Richard Biener from comment #22) > (In reply to Hongtao.liu from comment #21) > > Now we have SLP node available in vector cost hook, maybe we can do sth in > > cost model to prevent

[Bug target/101908] [12 regression] cray regression with -O2 -ftree-slp-vectorize compared to -O2

2022-03-14 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908 --- Comment #40 from rguenther at suse dot de --- On Mon, 14 Mar 2022, crazylht at gmail dot com wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908 > > --- Comment #39 from Hongtao.liu --- > > > I'll see if I get around to

[Bug target/101908] [12 regression] cray regression with -O2 -ftree-slp-vectorize compared to -O2

2022-03-14 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908 --- Comment #39 from Hongtao.liu --- > I'll see if I get around to prototype some argument classification > in the vectorizer (looking how hard it is to use > INIT_CUMULATIVE_ARGS in a context where we are not expanding to RTL), >

[Bug target/101908] [12 regression] cray regression with -O2 -ftree-slp-vectorize compared to -O2

2022-03-11 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908 --- Comment #38 from rguenther at suse dot de --- On Fri, 11 Mar 2022, crazylht at gmail dot com wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908 > > --- Comment #37 from Hongtao.liu --- > > There is not much value in the

[Bug target/101908] [12 regression] cray regression with -O2 -ftree-slp-vectorize compared to -O2

2022-03-11 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908 --- Comment #37 from Hongtao.liu --- > There is not much value in the vectorization we do in this function > (when manually fixing the STLF issue the speed is as good as with the > scalar code). We cost > > ray.dir.x 1 times scalar_load costs

[Bug target/101908] [12 regression] cray regression with -O2 -ftree-slp-vectorize compared to -O2

2022-03-11 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908 --- Comment #36 from Richard Biener --- As additional observation for the c-ray case we end up with [local count: 1073741824]: vect_ray_orig_x_87.270_173 = MEM [(double *)]; _170 = BIT_FIELD_REF ; _171 = BIT_FIELD_REF ; # DEBUG

[Bug target/101908] [12 regression] cray regression with -O2 -ftree-slp-vectorize compared to -O2

2022-03-11 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908 --- Comment #35 from Hongtao.liu --- (In reply to Richard Biener from comment #34) > I can confirm this observation on Zen2. Note perf still records STLF > failures penalty is much higher on Znver3 than zen2 for the same case(v2df).

[Bug target/101908] [12 regression] cray regression with -O2 -ftree-slp-vectorize compared to -O2

2022-03-11 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908 --- Comment #34 from Richard Biener --- I can confirm this observation on Zen2. Note perf still records STLF failures for these cases it just seems that the penalties are well hidden with the high store load on the caller side for small NUM?

[Bug target/101908] [12 regression] cray regression with -O2 -ftree-slp-vectorize compared to -O2

2022-03-10 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908 --- Comment #33 from Hongtao.liu --- (In reply to Hongtao.liu from comment #32) > (In reply to Hongtao.liu from comment #31) > > Created attachment 52595 [details] > > microbenchmark > The interesting the microbenchmark didn't hit store

[Bug target/101908] [12 regression] cray regression with -O2 -ftree-slp-vectorize compared to -O2

2022-03-10 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908 --- Comment #32 from Hongtao.liu --- (In reply to Hongtao.liu from comment #31) > Created attachment 52595 [details] > microbenchmark The microbenchmark is used to test penalty for STFS, I've run it on CLX, and find 1 stalled vector load is

[Bug target/101908] [12 regression] cray regression with -O2 -ftree-slp-vectorize compared to -O2

2022-03-10 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908 --- Comment #31 from Hongtao.liu --- Created attachment 52595 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52595&action=edit microbenchmark

[Bug target/101908] [12 regression] cray regression with -O2 -ftree-slp-vectorize compared to -O2

2022-03-10 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908 --- Comment #30 from Hongtao.liu --- Created attachment 52594 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52594&action=edit tar -xvf micro.tar.gz Num/Type char/s char/v char/vn short/s short/v short/vn int/s int/v int/vn

[Bug target/101908] [12 regression] cray regression with -O2 -ftree-slp-vectorize compared to -O2

2022-03-01 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908 --- Comment #29 from Hongtao.liu --- From Agner Fog's excellent optimization manuals (https://www.agner.org/optimize/microarchitecture.pdf). For ICX/TGL: An aligned write of 128 bits or more followed by a read of one or both of the two halves

[Bug target/101908] [12 regression] cray regression with -O2 -ftree-slp-vectorize compared to -O2

2022-02-27 Thread lili.cui at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908 --- Comment #28 from cuilili --- (In reply to H.J. Lu from comment #25) > Can this be mitigated by removing redundant load and store? Yes, inlining ray_sphere can remove redundant loads and stores, O3 does inlining, but O2 is more sensitive to

[Bug target/101908] [12 regression] cray regression with -O2 -ftree-slp-vectorize compared to -O2

2022-02-27 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908 --- Comment #27 from Hongtao.liu --- > We can start with disabling vectorization with very cheap cost model to fix Of course only for (>=)16-byte struct passing.

[Bug target/101908] [12 regression] cray regression with -O2 -ftree-slp-vectorize compared to -O2

2022-02-27 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908 --- Comment #26 from Hongtao.liu --- (In reply to Richard Biener from comment #22) > (In reply to Hongtao.liu from comment #21) > > Now we have SLP node available in vector cost hook, maybe we can do sth in > > cost model to prevent

[Bug target/101908] [12 regression] cray regression with -O2 -ftree-slp-vectorize compared to -O2

2022-02-25 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908 --- Comment #25 from H.J. Lu --- (In reply to cuilili from comment #24) > (In reply to cuilili from comment #23) > > (In reply to Richard Biener from comment #17) > > > I do wonder though how CLX is fine with such access pattern ;) (did you >

[Bug target/101908] [12 regression] cray regression with -O2 -ftree-slp-vectorize compared to -O2

2022-02-25 Thread lili.cui at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908 --- Comment #24 from cuilili --- (In reply to cuilili from comment #23) > (In reply to Richard Biener from comment #17) > > I do wonder though how CLX is fine with such access pattern ;) (did you > > test > > with just -O2?) > Sorry, correct

[Bug target/101908] [12 regression] cray regression with -O2 -ftree-slp-vectorize compared to -O2

2022-02-25 Thread lili.cui at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908 cuilili changed:
What |Removed |Added
CC   |        |lili.cui at intel dot com
--- Comment #23

[Bug target/101908] [12 regression] cray regression with -O2 -ftree-slp-vectorize compared to -O2

2022-02-24 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908 --- Comment #22 from Richard Biener --- (In reply to Hongtao.liu from comment #21) > Now we have SLP node available in vector cost hook, maybe we can do sth in > cost model to prevent vectorization when node's definition from big-size >

[Bug target/101908] [12 regression] cray regression with -O2 -ftree-slp-vectorize compared to -O2

2022-02-24 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908 --- Comment #21 from Hongtao.liu --- Now we have SLP node available in vector cost hook, maybe we can do sth in cost model to prevent vectorization when node's definition from big-size parameter.
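The idea in comment #21 can be sketched as a toy cost adjustment in C. This is not GCC code: struct load_candidate, adjusted_vector_cost, and the STLF_PENALTY value are hypothetical and chosen only for illustration. The 16-byte threshold follows the "(>=)16-byte struct passing" remark in comment #27.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one vector load considered by a cost model;
   this is an illustrative type, not a GCC data structure. */
struct load_candidate {
    bool from_by_value_param; /* data comes from a struct passed by value */
    size_t param_size;        /* size of that parameter in bytes */
    unsigned base_cost;       /* cost before any adjustment */
};

enum {
    BIG_PARAM_BYTES = 16, /* threshold taken from comment #27 */
    STLF_PENALTY = 8      /* arbitrary illustrative penalty */
};

/* Penalize vector loads that read a large by-value parameter, since the
   caller probably wrote it with narrower stores just before the call. */
static unsigned adjusted_vector_cost(const struct load_candidate *c)
{
    if (c->from_by_value_param && c->param_size >= BIG_PARAM_BYTES)
        return c->base_cost + STLF_PENALTY;
    return c->base_cost;
}
```

With a large enough penalty, a vectorizer comparing this adjusted cost against the scalar cost would keep the scalar loads for such parameters and vectorize everything else as before.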

[Bug target/101908] [12 regression] cray regression with -O2 -ftree-slp-vectorize compared to -O2

2022-02-24 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908 Richard Biener changed:
What |Removed |Added
CC   |        |aros at gmx dot com
--- Comment #20

[Bug target/101908] [12 regression] cray regression with -O2 -ftree-slp-vectorize compared to -O2

2022-01-20 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908 --- Comment #19 from Richard Biener ---
#include
struct X { double x[3]; };
typedef double v2df __attribute__((vector_size(16)));
v2df __attribute__((noipa)) foo (struct X x) { return (v2df) {x.x[1], x.x[2] }; }
struct X y;
int main(int

[Bug target/101908] [12 regression] cray regression with -O2 -ftree-slp-vectorize compared to -O2

2022-01-20 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908 Richard Biener changed:
What     |Removed |Added
Priority |P3      |P1