https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908
Richard Biener changed:
What|Removed |Added
Status|NEW |RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908
--- Comment #48 from CVS Commits ---
The master branch has been updated by hongtao Liu :
https://gcc.gnu.org/g:e3174d6183e5c042e822d9feabb670235b737441
commit r12-7990-ge3174d6183e5c042e822d9feabb670235b737441
Author: liuhongt
Date: Wed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908
--- Comment #47 from rguenther at suse dot de ---
On Tue, 29 Mar 2022, crazylht at gmail dot com wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908
>
> --- Comment #46 from Hongtao.liu ---
> Another issue is splitting vector load
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908
--- Comment #46 from Hongtao.liu ---
Another issue is splitting the vector load into halves or elements. The latter
requires scratch registers, which may not be available; the former doesn't
require an extra register but may still trigger STLF stalls.
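For illustration, the two load shapes discussed here can be sketched in C using the `struct X` and `v2df` definitions from comment #19. This is a minimal sketch, not code from the bug: the function names `load_whole`, `load_elements` and `loads_agree` are hypothetical, and whether the compiler keeps either shape depends on the optimization level and target.

```c
#include <string.h>

typedef double v2df __attribute__((vector_size(16)));

struct X { double x[3]; };

/* One 16-byte load covering x[1] and x[2]. If those fields were just
   written by two separate 8-byte stores, a single earlier store cannot
   supply all the bytes, so store-to-load forwarding may fail. */
static v2df load_whole(const struct X *p)
{
    v2df v;
    memcpy(&v, &p->x[1], sizeof v);
    return v;
}

/* Two 8-byte loads, each the same width as one earlier 8-byte store,
   combined into a vector afterwards. This is the "split to elements"
   shape; it needs temporaries to hold the scalar values. */
static v2df load_elements(const struct X *p)
{
    double lo = p->x[1];
    double hi = p->x[2];
    return (v2df){ lo, hi };
}

/* Both shapes must produce the same vector; only the memory access
   pattern differs. Returns 1 when they agree. */
static int loads_agree(void)
{
    struct X x = { { 0.0, 1.5, 2.5 } };
    v2df a = load_whole(&x);
    v2df b = load_elements(&x);
    return a[0] == b[0] && a[1] == b[1];
}
```

Note that an optimizing compiler is free to turn `load_elements` back into a single vector load, which is the behavior this bug is about.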
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908
--- Comment #45 from rguenther at suse dot de ---
On Tue, 29 Mar 2022, crazylht at gmail dot com wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908
>
> --- Comment #43 from Hongtao.liu ---
> One thing I found by experiments:
>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908
--- Comment #44 from Hongtao.liu ---
(In reply to Hongtao.liu from comment #43)
> One thing I found by experiments:
> Insert 64 vaddps %xmm18, %xmm19, %xmm20(no dependence between each other,
> just emulate for pipeline) before stalled load,
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908
--- Comment #43 from Hongtao.liu ---
One thing I found by experiments:
Insert 64 vaddps %xmm18, %xmm19, %xmm20 (no dependence between each other, just
to emulate pipeline work) before the stalled load, and the STLF stall case is as
fast as the no-stall cases on
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908
--- Comment #42 from rguenther at suse dot de ---
On Tue, 15 Mar 2022, crazylht at gmail dot com wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908
>
> --- Comment #41 from Hongtao.liu ---
> (In reply to Richard Biener from comment
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908
--- Comment #41 from Hongtao.liu ---
(In reply to Richard Biener from comment #22)
> (In reply to Hongtao.liu from comment #21)
> > Now we have SLP node available in vector cost hook, maybe we can do sth in
> > cost model to prevent
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908
--- Comment #40 from rguenther at suse dot de ---
On Mon, 14 Mar 2022, crazylht at gmail dot com wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908
>
> --- Comment #39 from Hongtao.liu ---
>
> > I'll see if I get around to
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908
--- Comment #39 from Hongtao.liu ---
> I'll see if I get around to prototype some argument classification
> in the vectorizer (looking how hard it is to use
> INIT_CUMULATIVE_ARGS in a context where we are not expanding to RTL),
>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908
--- Comment #38 from rguenther at suse dot de ---
On Fri, 11 Mar 2022, crazylht at gmail dot com wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908
>
> --- Comment #37 from Hongtao.liu ---
> > There is not much value in the
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908
--- Comment #37 from Hongtao.liu ---
> There is not much value in the vectorization we do in this function
> (when manually fixing the STLF issue the speed is as good as with the
> scalar code). We cost
>
> ray.dir.x 1 times scalar_load costs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908
--- Comment #36 from Richard Biener ---
As additional observation for the c-ray case we end up with
[local count: 1073741824]:
vect_ray_orig_x_87.270_173 = MEM [(double *)];
_170 = BIT_FIELD_REF ;
_171 = BIT_FIELD_REF ;
# DEBUG
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908
--- Comment #35 from Hongtao.liu ---
(In reply to Richard Biener from comment #34)
> I can confirm this observation on Zen2. Note perf still records STLF
> failures
The penalty is much higher on Znver3 than on Zen2 for the same case (v2df).
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908
--- Comment #34 from Richard Biener ---
I can confirm this observation on Zen2. Note that perf still records STLF
failures for these cases; it just seems that the penalties are well hidden by
the high store load on the caller side for small NUM?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908
--- Comment #33 from Hongtao.liu ---
(In reply to Hongtao.liu from comment #32)
> (In reply to Hongtao.liu from comment #31)
> > Created attachment 52595 [details]
> > microbenchmark
>
Interestingly, the microbenchmark didn't hit store
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908
--- Comment #32 from Hongtao.liu ---
(In reply to Hongtao.liu from comment #31)
> Created attachment 52595 [details]
> microbenchmark
The microbenchmark is used to test the penalty for STFS. I've run it on CLX and
found that 1 stalled vector load is
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908
--- Comment #31 from Hongtao.liu ---
Created attachment 52595
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52595&action=edit
microbenchmark
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908
--- Comment #30 from Hongtao.liu ---
Created attachment 52594
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52594&action=edit
tar -xvf micro.tar.gz
Num/Type  char/s  char/v  char/vn  short/s  short/v  short/vn  int/s  int/v  int/vn
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908
--- Comment #29 from Hongtao.liu ---
From Agner Fog's excellent optimization
manuals (https://www.agner.org/optimize/microarchitecture.pdf).
For ICX/TGL:
An aligned write of 128 bits or more followed by a read of one or both of the
two halves
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908
--- Comment #28 from cuilili ---
(In reply to H.J. Lu from comment #25)
> Can this be mitigated by removing redundant load and store?
Yes, inlining say_sphere can remove redundant loads and stores, O3 does
inlining, but O2 is more sensitive to
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908
--- Comment #27 from Hongtao.liu ---
> We can start with disabling vectorization with very cheap cost model to fix
Of course only for (>=)16-byte struct passing.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908
--- Comment #26 from Hongtao.liu ---
(In reply to Richard Biener from comment #22)
> (In reply to Hongtao.liu from comment #21)
> > Now we have SLP node available in vector cost hook, maybe we can do sth in
> > cost model to prevent
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908
--- Comment #25 from H.J. Lu ---
(In reply to cuilili from comment #24)
> (In reply to cuilili from comment #23)
> > (In reply to Richard Biener from comment #17)
> > > I do wonder though how CLX is fine with such access pattern ;) (did you
>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908
--- Comment #24 from cuilili ---
(In reply to cuilili from comment #23)
> (In reply to Richard Biener from comment #17)
> > I do wonder though how CLX is fine with such access pattern ;) (did you
> > test
> > with just -O2?)
>
Sorry, correct
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908
cuilili changed:
What|Removed |Added
CC||lili.cui at intel dot com
--- Comment #23
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908
--- Comment #22 from Richard Biener ---
(In reply to Hongtao.liu from comment #21)
> Now we have SLP node available in vector cost hook, maybe we can do sth in
> cost model to prevent vectorization when node's definition from big-size
>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908
--- Comment #21 from Hongtao.liu ---
Now that we have the SLP node available in the vector cost hook, maybe we can do
something in the cost model to prevent vectorization when a node's definition
comes from a big-size parameter.
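The idea can be sketched as a simple predicate. This is a hypothetical, self-contained illustration only: `struct load_candidate` and `should_reject_vector_load` are invented names, not GCC types or hooks, and the 16-byte threshold is an assumption taken from the "(>=)16-byte struct passing" remark in comment #27.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of a vector load the vectorizer is considering.
   Not a GCC data structure. */
struct load_candidate {
    bool   from_by_value_param;  /* data comes from a struct parameter
                                    passed by value in memory */
    size_t param_size;           /* size of that parameter in bytes */
};

/* Assumed threshold: only parameters of at least 16 bytes are treated
   as "big-size" here. */
#define BIG_PARAM_BYTES 16

/* Reject the vector load when it would read a big by-value parameter,
   since the caller most likely wrote it with narrower stores. */
static bool should_reject_vector_load(const struct load_candidate *c)
{
    return c->from_by_value_param && c->param_size >= BIG_PARAM_BYTES;
}
```

A real cost-model change would have to derive this information from the SLP node itself; the sketch only shows the shape of the decision.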
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908
Richard Biener changed:
What|Removed |Added
CC||aros at gmx dot com
--- Comment #20
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908
--- Comment #19 from Richard Biener ---
#include
struct X { double x[3]; };
typedef double v2df __attribute__((vector_size(16)));
v2df __attribute__((noipa))
foo (struct X x)
{
return (v2df) {x.x[1], x.x[2] };
}
struct X y;
int main(int
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908
Richard Biener changed:
What|Removed |Added
Priority|P3 |P1