[Bug target/96789] x264: sub4x4_dct() improves when vectorization is disabled

2020-11-04 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789 Kewen Lin changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|---

[Bug target/96789] x264: sub4x4_dct() improves when vectorization is disabled

2020-11-02 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789 --- Comment #36 from CVS Commits --- The master branch has been updated by Kewen Lin : https://gcc.gnu.org/g:f5e18dd9c7dacc9671044fc669bd5c1b26b6bdba commit r11-4637-gf5e18dd9c7dacc9671044fc669bd5c1b26b6bdba Author: Kewen Lin Date: Tue Nov 3

[Bug target/96789] x264: sub4x4_dct() improves when vectorization is disabled

2020-09-29 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789 rsandifo at gcc dot gnu.org changed: What|Removed |Added CC||rsandifo at gcc dot gnu.org

[Bug target/96789] x264: sub4x4_dct() improves when vectorization is disabled

2020-09-28 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789 --- Comment #34 from Hongtao.liu --- (In reply to Kewen Lin from comment #29) > (In reply to Hongtao.liu from comment #28) > > > Probably you can try to tweak it in ix86_add_stmt_cost? when the statement > > > > Yes, it's the place. > > > > > i

[Bug target/96789] x264: sub4x4_dct() improves when vectorization is disabled

2020-09-28 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789 --- Comment #33 from Richard Biener --- (In reply to Kewen Lin from comment #32) > (In reply to Richard Biener from comment #31) > > (In reply to Kewen Lin from comment #29) > > > (In reply to Hongtao.liu from comment #28) > > > > > Probably you

[Bug target/96789] x264: sub4x4_dct() improves when vectorization is disabled

2020-09-28 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789 --- Comment #32 from Kewen Lin --- (In reply to Richard Biener from comment #31) > (In reply to Kewen Lin from comment #29) > > (In reply to Hongtao.liu from comment #28) > > > > Probably you can try to tweak it in ix86_add_stmt_cost? when the >

[Bug target/96789] x264: sub4x4_dct() improves when vectorization is disabled

2020-09-27 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789 --- Comment #31 from Richard Biener --- (In reply to Kewen Lin from comment #29) > (In reply to Hongtao.liu from comment #28) > > > Probably you can try to tweak it in ix86_add_stmt_cost? when the statement > > > > Yes, it's the place. > > > >

[Bug target/96789] x264: sub4x4_dct() improves when vectorization is disabled

2020-09-27 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789 --- Comment #30 from Richard Biener --- (In reply to Hongtao.liu from comment #23) > > _813 = {_437, _448, _459, _470, _490, _501, _512, _523, _543, _554, _565, > > _576, _125, _143, _161, _179}; > > The cost of vec_construct in i386 backend i

[Bug target/96789] x264: sub4x4_dct() improves when vectorization is disabled

2020-09-27 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789 --- Comment #29 from Kewen Lin --- (In reply to Hongtao.liu from comment #28) > > Probably you can try to tweak it in ix86_add_stmt_cost? when the statement > > Yes, it's the place. > > > is UB to UH conversion statement, further check if the d

[Bug target/96789] x264: sub4x4_dct() improves when vectorization is disabled

2020-09-27 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789 --- Comment #28 from Hongtao.liu --- > Probably you can try to tweak it in ix86_add_stmt_cost? when the statement Yes, it's the place. > is UB to UH conversion statement, further check if the def of the input UB > is MEM. Only if there's no m

[Bug target/96789] x264: sub4x4_dct() improves when vectorization is disabled

2020-09-27 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789 --- Comment #27 from Kewen Lin --- (In reply to Hongtao.liu from comment #22) > > One of my workmates found that if we disable vectorization for SPEC2017 > > 525.x264_r function sub4x4_dct in source file x264_src/common/dct.c with > > explicit

[Bug target/96789] x264: sub4x4_dct() improves when vectorization is disabled

2020-09-27 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789 --- Comment #26 from Kewen Lin --- > > By following this idea, to release the restriction on loop_outer > > (loop_father) when setting the father_bbs, I can see FRE works as > > expectedly. But it actually does the rpo_vn from cfun's entry to it

[Bug target/96789] x264: sub4x4_dct() improves when vectorization is disabled

2020-09-27 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789 --- Comment #25 from Kewen Lin --- > > > > Got it! For > > > > else if (vect_nop_conversion_p (stmt_info)) > > continue; > > > > Is it a good idea to change it to call record_stmt_cost like the others? > > 1) introduce one ve

[Bug target/96789] x264: sub4x4_dct() improves when vectorization is disabled

2020-09-26 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789 --- Comment #24 from rguenther at suse dot de --- On September 27, 2020 4:56:43 AM GMT+02:00, crazylht at gmail dot com wrote: >https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789 > >--- Comment #22 from Hongtao.liu --- >>One of my workmates fou

[Bug target/96789] x264: sub4x4_dct() improves when vectorization is disabled

2020-09-26 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789 --- Comment #23 from Hongtao.liu --- > _813 = {_437, _448, _459, _470, _490, _501, _512, _523, _543, _554, _565, > _576, _125, _143, _161, _179}; The cost of vec_construct in i386 backend is 64, calculated as 16 x 4 cut from i386.c --- /* N e

[Bug target/96789] x264: sub4x4_dct() improves when vectorization is disabled

2020-09-26 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789 --- Comment #22 from Hongtao.liu --- > One of my workmates found that if we disable vectorization for SPEC2017 > 525.x264_r function sub4x4_dct in source file x264_src/common/dct.c with > explicit function attribute __attribute__((optimize("no-

[Bug target/96789] x264: sub4x4_dct() improves when vectorization is disabled

2020-09-25 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789 --- Comment #21 from Richard Biener --- (In reply to Kewen Lin from comment #18) > (In reply to Richard Biener from comment #10) > > (In reply to Kewen Lin from comment #9) > > > (In reply to Richard Biener from comment #8) > > > > (In reply to K

[Bug target/96789] x264: sub4x4_dct() improves when vectorization is disabled

2020-09-25 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789 --- Comment #20 from Richard Biener --- (In reply to Kewen Lin from comment #19) > (In reply to rguent...@suse.de from comment #17) > > On Fri, 18 Sep 2020, linkw at gcc dot gnu.org wrote: > > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=9

[Bug target/96789] x264: sub4x4_dct() improves when vectorization is disabled

2020-09-25 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789 --- Comment #19 from Kewen Lin --- (In reply to rguent...@suse.de from comment #17) > On Fri, 18 Sep 2020, linkw at gcc dot gnu.org wrote: > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789 > > > > --- Comment #15 from Kewen Lin --- > >

[Bug target/96789] x264: sub4x4_dct() improves when vectorization is disabled

2020-09-25 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789 --- Comment #18 from Kewen Lin --- (In reply to Richard Biener from comment #10) > (In reply to Kewen Lin from comment #9) > > (In reply to Richard Biener from comment #8) > > > (In reply to Kewen Lin from comment #7) > > > > Two questions in min

[Bug target/96789] x264: sub4x4_dct() improves when vectorization is disabled

2020-09-21 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789 --- Comment #17 from rguenther at suse dot de --- On Fri, 18 Sep 2020, linkw at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789 > > --- Comment #15 from Kewen Lin --- > (In reply to rguent...@suse.de from comment #1

[Bug target/96789] x264: sub4x4_dct() improves when vectorization is disabled

2020-09-19 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789 --- Comment #16 from Hongtao.liu --- I notice 0x5561dc0 _36 * 2 1 times scalar_stmt costs 16 in body 0x5561dc0 _38 * 2 1 times scalar_stmt costs 16 in body 0x5562df0 _36 * 2 1 times vector_stmt costs 16 in body 0x5562df0 _38 * 2 1 times vector_s

[Bug target/96789] x264: sub4x4_dct() improves when vectorization is disabled

2020-09-18 Thread linkw at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789 --- Comment #15 from Kewen Lin --- (In reply to rguent...@suse.de from comment #14) > On Fri, 18 Sep 2020, linkw at gcc dot gnu.org wrote: > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789 > > > > --- Comment #13 from Kewen Lin --- > >

[Bug target/96789] x264: sub4x4_dct() improves when vectorization is disabled

2020-09-18 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789 --- Comment #14 from rguenther at suse dot de --- On Fri, 18 Sep 2020, linkw at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789 > > --- Comment #13 from Kewen Lin --- > > 2) on Power, the conversion from unsigned

[Bug target/96789] x264: sub4x4_dct() improves when vectorization is disabled

2020-09-18 Thread linkw at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789 --- Comment #13 from Kewen Lin --- > 2) on Power, the conversion from unsigned char to unsigned short is nop > conversion, when we counting scalar cost, it's counted, then add costs 32 > totally onto scalar cost. Meanwhile, the conversion from

[Bug target/96789] x264: sub4x4_dct() improves when vectorization is disabled

2020-09-16 Thread linkw at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789 --- Comment #12 from Kewen Lin --- > Thanks for the explanation! I'll look at it after checking 2). IIUC, the > advantage to eliminate stores here looks able to get those things which is > fed to stores and stores' consumers bundled, then get mo

[Bug target/96789] x264: sub4x4_dct() improves when vectorization is disabled

2020-09-16 Thread linkw at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789 --- Comment #11 from Kewen Lin --- (In reply to Richard Biener from comment #10) > (In reply to Kewen Lin from comment #9) > > (In reply to Richard Biener from comment #8) > > > (In reply to Kewen Lin from comment #7) > > > > Two questions in min

[Bug target/96789] x264: sub4x4_dct() improves when vectorization is disabled

2020-09-16 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789 --- Comment #10 from Richard Biener --- (In reply to Kewen Lin from comment #9) > (In reply to Richard Biener from comment #8) > > (In reply to Kewen Lin from comment #7) > > > Two questions in mind, need to dig into it further: > > > 1) from t

[Bug target/96789] x264: sub4x4_dct() improves when vectorization is disabled

2020-09-16 Thread linkw at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789 --- Comment #9 from Kewen Lin --- (In reply to Richard Biener from comment #8) > (In reply to Kewen Lin from comment #7) > > Two questions in mind, need to dig into it further: > > 1) from the assembly of scalar/vector code, I don't see any sto

[Bug target/96789] x264: sub4x4_dct() improves when vectorization is disabled

2020-09-16 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789 --- Comment #8 from Richard Biener --- (In reply to Kewen Lin from comment #7) > Two questions in mind, need to dig into it further: > 1) from the assembly of scalar/vector code, I don't see any stores needed > into temp array d (array diff in

[Bug target/96789] x264: sub4x4_dct() improves when vectorization is disabled

2020-09-16 Thread linkw at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789 Kewen Lin changed: What|Removed |Added Last reconfirmed||2020-09-16 Status|UNCONFIRMED

[Bug target/96789] x264: sub4x4_dct() improves when vectorization is disabled

2020-08-30 Thread linkw at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789 Kewen Lin changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |linkw at gcc dot gnu.org --- Comment

[Bug target/96789] x264: sub4x4_dct() improves when vectorization is disabled

2020-08-27 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789 --- Comment #5 from Richard Biener --- testcase from https://github.com/mirror/x264/blob/master/common/dct.c where FENC_STRIDE is 16 and FDEC_STRIDE 32 pixel is unsigned char, dctcoef is unsigned short static inline void pixel_sub_wxh( dctcoef
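
The testcase in Comment #5 is cut off in this index. For orientation, the following is a rough sketch of what the two functions under discussion look like. It is written from the general shape of x264's common/dct.c and is an assumption, not the exact testcase attached to the bug. The typedefs and stride values follow the comment's own description (pixel is unsigned char, dctcoef is unsigned short, FENC_STRIDE is 16, FDEC_STRIDE is 32). The temporaries d[] and tmp[] are the arrays the thread refers to when it discusses eliding stores.

```c
/* Sketch only: modeled on x264's common/dct.c, not copied from the bug's testcase. */
#define FENC_STRIDE 16
#define FDEC_STRIDE 32

typedef unsigned char  pixel;    /* per Comment #5 */
typedef unsigned short dctcoef;  /* per Comment #5 */

/* Store the element-wise difference of two i_size x i_size pixel blocks. */
static inline void pixel_sub_wxh(dctcoef *diff, int i_size,
                                 pixel *pix1, int i_pix1,
                                 pixel *pix2, int i_pix2)
{
    for (int y = 0; y < i_size; y++) {
        for (int x = 0; x < i_size; x++)
            diff[x + y * i_size] = pix1[x] - pix2[x];
        pix1 += i_pix1;
        pix2 += i_pix2;
    }
}

/* 4x4 integer transform of the residual, done as two butterfly passes. */
void sub4x4_dct(dctcoef dct[16], pixel *pix1, pixel *pix2)
{
    dctcoef d[16];    /* residual block */
    dctcoef tmp[16];  /* result of the first pass, stored transposed */

    pixel_sub_wxh(d, 4, pix1, FENC_STRIDE, pix2, FDEC_STRIDE);

    for (int i = 0; i < 4; i++) {
        const int s03 = d[i * 4 + 0] + d[i * 4 + 3];
        const int s12 = d[i * 4 + 1] + d[i * 4 + 2];
        const int d03 = d[i * 4 + 0] - d[i * 4 + 3];
        const int d12 = d[i * 4 + 1] - d[i * 4 + 2];

        tmp[0 * 4 + i] =     s03 +     s12;
        tmp[1 * 4 + i] = 2 * d03 +     d12;
        tmp[2 * 4 + i] =     s03 -     s12;
        tmp[3 * 4 + i] =     d03 - 2 * d12;
    }

    for (int i = 0; i < 4; i++) {
        const int s03 = tmp[i * 4 + 0] + tmp[i * 4 + 3];
        const int s12 = tmp[i * 4 + 1] + tmp[i * 4 + 2];
        const int d03 = tmp[i * 4 + 0] - tmp[i * 4 + 3];
        const int d12 = tmp[i * 4 + 1] - tmp[i * 4 + 2];

        dct[i * 4 + 0] =     s03 +     s12;
        dct[i * 4 + 1] = 2 * d03 +     d12;
        dct[i * 4 + 2] =     s03 - 2 * 0 - s12;
        dct[i * 4 + 3] =     d03 - 2 * d12;
    }
}
```

With a flat residual (every pixel of pix1 exceeding the matching pixel of pix2 by the same amount k), only dct[0] is nonzero and equals 16 * k, which is a quick way to sanity-check a build with and without vectorization.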

[Bug target/96789] x264: sub4x4_dct() improves when vectorization is disabled

2020-08-26 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789 --- Comment #4 from Richard Biener --- This delays some checks to eventually support part of the BB vectorization which is what succeeds here. I suspect that w/o vectorization we manage to elide the tmp[] array but with the part vectorization th

[Bug target/96789] x264: sub4x4_dct() improves when vectorization is disabled

2020-08-26 Thread linkw at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789 --- Comment #3 from Kewen Lin --- Bisection shows it started to fail from r11-205.

[Bug target/96789] x264: sub4x4_dct() improves when vectorization is disabled

2020-08-26 Thread linkw at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789 --- Comment #2 from Kewen Lin --- Created attachment 49124 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49124&action=edit sub4x4_dct SLP dumping

[Bug target/96789] x264: sub4x4_dct() improves when vectorization is disabled

2020-08-25 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789 Richard Biener changed: What|Removed |Added Component|tree-optimization |target Keywords|