Re: [PATCH] Fix PR91522
On Fri, Aug 23, 2019 at 3:57 AM Richard Biener wrote: > > On Fri, 23 Aug 2019, Uros Bizjak wrote: > > > On Thu, Aug 22, 2019 at 3:35 PM Richard Biener wrote: > > > > > > > > > This fixes quadraticness in STV and makes > > > > > > machine dep reorg : 89.07 ( 95%) 0.02 ( 18%) 89.10 ( > > > 95%) 54 kB ( 0%) > > > > > > drop to zero. Anybody remembers why it is the way it is now? > > > > > > Bootstrap / regtest running on x86_64-unknown-linux-gnu. > > > > > > OK? > > > > Looking at the PR, comment #3 [1], I assume this patch is obsoltete > > and will be updated? > > > > [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91522#c3 > > Yes. I'm still learning how STV operates (and learing DF and RTL...). > The following is a rewrite of the non-TImode chain conversion > according to I think how it should operate als allowing the hunk > that fixes the compile-time and fixing PR91527 on the way > (which I ran into during more extensive testing of the patch myself). > > So compared to the state before which I still do not 100% understand > we now do the following. Chain detection works as before including > recording of all defs (both defined by the insns in the chain and > insns outside) that need copy-in or copy-out operations. > > But then the patch changes things as to guarantee that > after the conversion all uses/defs of a pseudo are > of the (subreg:Vmode ..) form or of the original scalar form. > In particular it avoids the need to change any insns that > are not part of the chain (besides emitting the extra copy > instructions). For this all defs that were marked as needing > copies (thus they have uses/defs both in the chain and outside) > the chain will use a new pseudo that we copy to from scalar sources > and that we copy from for scalar uses. There's the new defs_map > which records the mapping of old to new reg. pseudos that are > only used in the chain already are not remapped. > > The conversion itself then happens in two stages, first, > in make_vector_copies, we emit the copy-in insns and > allocate all pseudos we need. Then the rest of the > conversion happens fully inside of convert_insn where > we generate the copy-outs of the insns def, replace > defs and uses according to the mapping and replace uses > and defs with the (subreg:Vmode ..) style. > > For PR91527 IRA doesn't like the REG_EQUIV note in > > (insn 4 24 5 2 (set (subreg:V4SI (reg/v:SI 90 [ c ]) 0) > (subreg:V4SI (reg:SI 100) 0)) > "/space/rguenther/src/svn/trunk2/gcc/testsuite/g++.dg/tree-ssa/pr21463.C":11:4 > 1248 {movv4si_internal} > (expr_list:REG_DEAD (reg:SI 100) > (expr_list:REG_EQUIV (mem/c:SI (plus:DI (reg/f:DI 16 argp) > (const_int 16 [0x10])) [1 c+0 S4 A64]) > (nil > > because the SET_DEST is not a REG_P. I'm not sure if this > is invalid RTL, docs say SET_DEST might be a strict_low_part > or a zero_extract but doesn't mention a subreg. So I opted > to simply remove equal/equiv notes on insns we convert > and since the above has a REG_DEAD note I took the liberty > to update that according to the mapping (so that would have > been not needed before this patch) rather than dropping it. > > Bootstrapped with and without --with-march=westmere (to get > some STV coverage, this included all languages) on > x88_64-unknown-linux-gnu, testing in progress. > > OK if testing succeeds? > > It still solves the compile-time issue (which is a latent issue, > btw, and with a carefully crafted testcase can be triggered > since STV exists for DImode chains with !TARGET_64BIT). > > Thanks, > Richard. > > 2019-08-22 Richard Biener > > PR target/91522 > PR target/91527 > * config/i386/i386-features.h (general_scalar_chain::defs_map): > New member. > (general_scalar_chain::replace_with_subreg): Remove. > (general_scalar_chain::replace_with_subreg_in_insn): Likewise. > (general_scalar_chain::convert_reg): Adjust signature. > * config/i386/i386-features.c (scalar_chain::add_insn): Do not > iterate over all defs of a reg. > (general_scalar_chain::replace_with_subreg): Remove. > (general_scalar_chain::replace_with_subreg_in_insn): Likewise. > (general_scalar_chain::make_vector_copies): Populate defs_map, > place copy only after defs that are used as vectors in the chain. > (general_scalar_chain::convert_reg): Emit a copy for a specific > def in a specific instruction. > (general_scalar_chain::convert_op): All reg uses are converted here. > (general_scalar_chain::convert_insn): Emit copies for scalar > uses of defs here. Replace uses with the copies we created. > Replace and convert the def. Adjust REG_DEAD notes, remove > REG_EQUIV/EQUAL notes. > (general_scalar_chain::convert_registers): Only handle copies > into the chain here. > This caused: https://gcc.gnu.org/bugzilla/sho
Re: [PATCH] Fix PR91522
On Wed, Aug 28, 2019 at 3:54 PM Richard Biener wrote: > > >>> 2019-08-23 Richard Biener > > >>> > > >>> PR target/91522 > > >>> PR target/91527 > > >>> * config/i386/i386-features.h (general_scalar_chain::defs_map): > > >>> New member. > > >>> (general_scalar_chain::replace_with_subreg): Remove. > > >>> (general_scalar_chain::replace_with_subreg_in_insn): Likewise. > > >>> (general_scalar_chain::convert_reg): Adjust signature. > > >>> * config/i386/i386-features.c (scalar_chain::add_insn): Do not > > >>> iterate over all defs of a reg. > > >>> (general_scalar_chain::replace_with_subreg): Remove. > > >>> (general_scalar_chain::replace_with_subreg_in_insn): Likewise. > > >>> (general_scalar_chain::make_vector_copies): Populate defs_map, > > >>> place copy only after defs that are used as vectors in the chain. > > >>> (general_scalar_chain::convert_reg): Emit a copy for a specific > > >>> def in a specific instruction. > > >>> (general_scalar_chain::convert_op): All reg uses are converted > > >>> here. > > >>> (general_scalar_chain::convert_insn): Emit copies for scalar > > >>> uses of defs here. Replace uses with the copies we created. > > >>> Replace and convert the def. Adjust REG_DEAD notes, remove > > >>> REG_EQUIV/EQUAL notes. > > >>> (general_scalar_chain::convert_registers): Only handle copies > > >>> into the chain here. > > > > > > Rubberstamped with LGTM. It looks you are the master of this domain now ;) > > > > This breaks bootstrap for i686-darwin (and most likely is the cause of > > bootstrap fail on > > i686-linux i686-linux-gnu at > > https://gcc.gnu.org/ml/gcc-regression/2019-08/msg00398.html > > et al) > > It gives a bunch of compare errors spread around the tree - so no specific > > pointer there. > > > > There’s a secondary fail overlaying it between 274933-or so and 274980 > > which confused > > my initial search. > > Please file a bugreport. Don't have time today to dig into but will do > tomorrow (but now at least try to reproduce). Looking at HJ's testreports, it looks configuring for i686-linux --with-fpmath=sse should be enough to trigger comparison failure. Uros.
Re: [PATCH] Fix PR91522
On Wed, 28 Aug 2019, Iain Sandoe wrote: > > > > > On 26 Aug 2019, at 11:06, Uros Bizjak wrote: > > > > On Mon, Aug 26, 2019 at 10:40 AM Richard Biener wrote: > >> > >> On Fri, 23 Aug 2019, Richard Biener wrote: > >> > >>> On Fri, 23 Aug 2019, Richard Biener wrote: > >>> > On Fri, 23 Aug 2019, Uros Bizjak wrote: > > > On Thu, Aug 22, 2019 at 3:35 PM Richard Biener > > wrote: > >> > >> > >> This fixes quadraticness in STV and makes > >> > >> machine dep reorg : 89.07 ( 95%) 0.02 ( 18%) > >> 89.10 ( > >> 95%) 54 kB ( 0%) > >> > >> drop to zero. Anybody remembers why it is the way it is now? > >> > >> Bootstrap / regtest running on x86_64-unknown-linux-gnu. > >> > >> OK? > > > > Looking at the PR, comment #3 [1], I assume this patch is obsoltete > > and will be updated? > > > > [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91522#c3 > > Yes. I'm still learning how STV operates (and learing DF and RTL...). > The following is a rewrite of the non-TImode chain conversion > according to I think how it should operate als allowing the hunk > that fixes the compile-time and fixing PR91527 on the way > (which I ran into during more extensive testing of the patch myself). > > So compared to the state before which I still do not 100% understand > we now do the following. Chain detection works as before including > recording of all defs (both defined by the insns in the chain and > insns outside) that need copy-in or copy-out operations. > > But then the patch changes things as to guarantee that > after the conversion all uses/defs of a pseudo are > of the (subreg:Vmode ..) form or of the original scalar form. > In particular it avoids the need to change any insns that > are not part of the chain (besides emitting the extra copy > instructions). For this all defs that were marked as needing > copies (thus they have uses/defs both in the chain and outside) > the chain will use a new pseudo that we copy to from scalar sources > and that we copy from for scalar uses. There's the new defs_map > which records the mapping of old to new reg. pseudos that are > only used in the chain already are not remapped. > > The conversion itself then happens in two stages, first, > in make_vector_copies, we emit the copy-in insns and > allocate all pseudos we need. Then the rest of the > conversion happens fully inside of convert_insn where > we generate the copy-outs of the insns def, replace > defs and uses according to the mapping and replace uses > and defs with the (subreg:Vmode ..) style. > > For PR91527 IRA doesn't like the REG_EQUIV note in > > (insn 4 24 5 2 (set (subreg:V4SI (reg/v:SI 90 [ c ]) 0) > (subreg:V4SI (reg:SI 100) 0)) > "/space/rguenther/src/svn/trunk2/gcc/testsuite/g++.dg/tree-ssa/pr21463.C":11:4 > 1248 {movv4si_internal} > (expr_list:REG_DEAD (reg:SI 100) > (expr_list:REG_EQUIV (mem/c:SI (plus:DI (reg/f:DI 16 argp) > (const_int 16 [0x10])) [1 c+0 S4 A64]) > (nil > > because the SET_DEST is not a REG_P. I'm not sure if this > is invalid RTL, docs say SET_DEST might be a strict_low_part > or a zero_extract but doesn't mention a subreg. So I opted > to simply remove equal/equiv notes on insns we convert > and since the above has a REG_DEAD note I took the liberty > to update that according to the mapping (so that would have > been not needed before this patch) rather than dropping it. > > Bootstrapped with and without --with-march=westmere (to get > some STV coverage, this included all languages) on > x88_64-unknown-linux-gnu, testing in progress. > > OK if testing succeeds? > >>> > >>> Testing revealed I made an error in general_scalar_chain::convert_insn > >>> failing to move down SET_SRC extraction below replacing with > >>> the defs map. This showed in 4 execute FAILs in 32bit fortran > >>> testing (only). Fixed by moving down the whole def_set/src/dst > >>> extraction. > >>> > >>> Re-testing on x86_64-unknown-linux-gnu. > >> > >> Bootstrapped / tested on x86_64-unknown-linux-gnu. The "no-costmodel" > >> run runs into the latent PR91528 building 32bit libada in stage3 > >> for a few sources, I've manually built those with -mno-stv and > >> bootstrap finishes successfully. I hope HJ can help with this > >> dynamic stack-alignment issue. > >> > >> So - OK for trunk? > >> > >> As followup we can now remove general_remove_non_convertible_regs > >> because we can handle defs that cannot be converted just fine > >> with the patch since we're splitting live-ranges of all defs at > >> the chain boundary. > >> > >> Thanks, > >> Richa
Re: [PATCH] Fix PR91522
> On 26 Aug 2019, at 11:06, Uros Bizjak wrote: > > On Mon, Aug 26, 2019 at 10:40 AM Richard Biener wrote: >> >> On Fri, 23 Aug 2019, Richard Biener wrote: >> >>> On Fri, 23 Aug 2019, Richard Biener wrote: >>> On Fri, 23 Aug 2019, Uros Bizjak wrote: > On Thu, Aug 22, 2019 at 3:35 PM Richard Biener wrote: >> >> >> This fixes quadraticness in STV and makes >> >> machine dep reorg : 89.07 ( 95%) 0.02 ( 18%) 89.10 ( >> 95%) 54 kB ( 0%) >> >> drop to zero. Anybody remembers why it is the way it is now? >> >> Bootstrap / regtest running on x86_64-unknown-linux-gnu. >> >> OK? > > Looking at the PR, comment #3 [1], I assume this patch is obsoltete > and will be updated? > > [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91522#c3 Yes. I'm still learning how STV operates (and learing DF and RTL...). The following is a rewrite of the non-TImode chain conversion according to I think how it should operate als allowing the hunk that fixes the compile-time and fixing PR91527 on the way (which I ran into during more extensive testing of the patch myself). So compared to the state before which I still do not 100% understand we now do the following. Chain detection works as before including recording of all defs (both defined by the insns in the chain and insns outside) that need copy-in or copy-out operations. But then the patch changes things as to guarantee that after the conversion all uses/defs of a pseudo are of the (subreg:Vmode ..) form or of the original scalar form. In particular it avoids the need to change any insns that are not part of the chain (besides emitting the extra copy instructions). For this all defs that were marked as needing copies (thus they have uses/defs both in the chain and outside) the chain will use a new pseudo that we copy to from scalar sources and that we copy from for scalar uses. There's the new defs_map which records the mapping of old to new reg. pseudos that are only used in the chain already are not remapped. The conversion itself then happens in two stages, first, in make_vector_copies, we emit the copy-in insns and allocate all pseudos we need. Then the rest of the conversion happens fully inside of convert_insn where we generate the copy-outs of the insns def, replace defs and uses according to the mapping and replace uses and defs with the (subreg:Vmode ..) style. For PR91527 IRA doesn't like the REG_EQUIV note in (insn 4 24 5 2 (set (subreg:V4SI (reg/v:SI 90 [ c ]) 0) (subreg:V4SI (reg:SI 100) 0)) "/space/rguenther/src/svn/trunk2/gcc/testsuite/g++.dg/tree-ssa/pr21463.C":11:4 1248 {movv4si_internal} (expr_list:REG_DEAD (reg:SI 100) (expr_list:REG_EQUIV (mem/c:SI (plus:DI (reg/f:DI 16 argp) (const_int 16 [0x10])) [1 c+0 S4 A64]) (nil because the SET_DEST is not a REG_P. I'm not sure if this is invalid RTL, docs say SET_DEST might be a strict_low_part or a zero_extract but doesn't mention a subreg. So I opted to simply remove equal/equiv notes on insns we convert and since the above has a REG_DEAD note I took the liberty to update that according to the mapping (so that would have been not needed before this patch) rather than dropping it. Bootstrapped with and without --with-march=westmere (to get some STV coverage, this included all languages) on x88_64-unknown-linux-gnu, testing in progress. OK if testing succeeds? >>> >>> Testing revealed I made an error in general_scalar_chain::convert_insn >>> failing to move down SET_SRC extraction below replacing with >>> the defs map. This showed in 4 execute FAILs in 32bit fortran >>> testing (only). Fixed by moving down the whole def_set/src/dst >>> extraction. >>> >>> Re-testing on x86_64-unknown-linux-gnu. >> >> Bootstrapped / tested on x86_64-unknown-linux-gnu. The "no-costmodel" >> run runs into the latent PR91528 building 32bit libada in stage3 >> for a few sources, I've manually built those with -mno-stv and >> bootstrap finishes successfully. I hope HJ can help with this >> dynamic stack-alignment issue. >> >> So - OK for trunk? >> >> As followup we can now remove general_remove_non_convertible_regs >> because we can handle defs that cannot be converted just fine >> with the patch since we're splitting live-ranges of all defs at >> the chain boundary. >> >> Thanks, >> Richard. >> >>> Updated patch below. I'm feeling adventurous and will run >>> the "westmere" bootstrap with costing disabled (aka always >>> convert detected chains...). >>> >>> Richard. >>> >>> 2019-08-23 Richard Biener >>> >>> PR target/91522 >>> PR target/91527 >>>
Re: [PATCH] Fix PR91522
On Mon, Aug 26, 2019 at 10:40 AM Richard Biener wrote: > > On Fri, 23 Aug 2019, Richard Biener wrote: > > > On Fri, 23 Aug 2019, Richard Biener wrote: > > > > > On Fri, 23 Aug 2019, Uros Bizjak wrote: > > > > > > > On Thu, Aug 22, 2019 at 3:35 PM Richard Biener > > > > wrote: > > > > > > > > > > > > > > > This fixes quadraticness in STV and makes > > > > > > > > > > machine dep reorg : 89.07 ( 95%) 0.02 ( 18%) > > > > > 89.10 ( > > > > > 95%) 54 kB ( 0%) > > > > > > > > > > drop to zero. Anybody remembers why it is the way it is now? > > > > > > > > > > Bootstrap / regtest running on x86_64-unknown-linux-gnu. > > > > > > > > > > OK? > > > > > > > > Looking at the PR, comment #3 [1], I assume this patch is obsoltete > > > > and will be updated? > > > > > > > > [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91522#c3 > > > > > > Yes. I'm still learning how STV operates (and learing DF and RTL...). > > > The following is a rewrite of the non-TImode chain conversion > > > according to I think how it should operate als allowing the hunk > > > that fixes the compile-time and fixing PR91527 on the way > > > (which I ran into during more extensive testing of the patch myself). > > > > > > So compared to the state before which I still do not 100% understand > > > we now do the following. Chain detection works as before including > > > recording of all defs (both defined by the insns in the chain and > > > insns outside) that need copy-in or copy-out operations. > > > > > > But then the patch changes things as to guarantee that > > > after the conversion all uses/defs of a pseudo are > > > of the (subreg:Vmode ..) form or of the original scalar form. > > > In particular it avoids the need to change any insns that > > > are not part of the chain (besides emitting the extra copy > > > instructions). For this all defs that were marked as needing > > > copies (thus they have uses/defs both in the chain and outside) > > > the chain will use a new pseudo that we copy to from scalar sources > > > and that we copy from for scalar uses. There's the new defs_map > > > which records the mapping of old to new reg. pseudos that are > > > only used in the chain already are not remapped. > > > > > > The conversion itself then happens in two stages, first, > > > in make_vector_copies, we emit the copy-in insns and > > > allocate all pseudos we need. Then the rest of the > > > conversion happens fully inside of convert_insn where > > > we generate the copy-outs of the insns def, replace > > > defs and uses according to the mapping and replace uses > > > and defs with the (subreg:Vmode ..) style. > > > > > > For PR91527 IRA doesn't like the REG_EQUIV note in > > > > > > (insn 4 24 5 2 (set (subreg:V4SI (reg/v:SI 90 [ c ]) 0) > > > (subreg:V4SI (reg:SI 100) 0)) > > > "/space/rguenther/src/svn/trunk2/gcc/testsuite/g++.dg/tree-ssa/pr21463.C":11:4 > > > 1248 {movv4si_internal} > > > (expr_list:REG_DEAD (reg:SI 100) > > > (expr_list:REG_EQUIV (mem/c:SI (plus:DI (reg/f:DI 16 argp) > > > (const_int 16 [0x10])) [1 c+0 S4 A64]) > > > (nil > > > > > > because the SET_DEST is not a REG_P. I'm not sure if this > > > is invalid RTL, docs say SET_DEST might be a strict_low_part > > > or a zero_extract but doesn't mention a subreg. So I opted > > > to simply remove equal/equiv notes on insns we convert > > > and since the above has a REG_DEAD note I took the liberty > > > to update that according to the mapping (so that would have > > > been not needed before this patch) rather than dropping it. > > > > > > Bootstrapped with and without --with-march=westmere (to get > > > some STV coverage, this included all languages) on > > > x88_64-unknown-linux-gnu, testing in progress. > > > > > > OK if testing succeeds? > > > > Testing revealed I made an error in general_scalar_chain::convert_insn > > failing to move down SET_SRC extraction below replacing with > > the defs map. This showed in 4 execute FAILs in 32bit fortran > > testing (only). Fixed by moving down the whole def_set/src/dst > > extraction. > > > > Re-testing on x86_64-unknown-linux-gnu. > > Bootstrapped / tested on x86_64-unknown-linux-gnu. The "no-costmodel" > run runs into the latent PR91528 building 32bit libada in stage3 > for a few sources, I've manually built those with -mno-stv and > bootstrap finishes successfully. I hope HJ can help with this > dynamic stack-alignment issue. > > So - OK for trunk? > > As followup we can now remove general_remove_non_convertible_regs > because we can handle defs that cannot be converted just fine > with the patch since we're splitting live-ranges of all defs at > the chain boundary. > > Thanks, > Richard. > > > Updated patch below. I'm feeling adventurous and will run > > the "westmere" bootstrap with costing disabled (aka always > > convert detected chains...). > > > > Richard. > > > > 2019-08-23 Richard Biener > > > > PR target/91522 > >
Re: [PATCH] Fix PR91522
On Fri, 23 Aug 2019, Richard Biener wrote: > On Fri, 23 Aug 2019, Richard Biener wrote: > > > On Fri, 23 Aug 2019, Uros Bizjak wrote: > > > > > On Thu, Aug 22, 2019 at 3:35 PM Richard Biener wrote: > > > > > > > > > > > > This fixes quadraticness in STV and makes > > > > > > > > machine dep reorg : 89.07 ( 95%) 0.02 ( 18%) > > > > 89.10 ( > > > > 95%) 54 kB ( 0%) > > > > > > > > drop to zero. Anybody remembers why it is the way it is now? > > > > > > > > Bootstrap / regtest running on x86_64-unknown-linux-gnu. > > > > > > > > OK? > > > > > > Looking at the PR, comment #3 [1], I assume this patch is obsoltete > > > and will be updated? > > > > > > [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91522#c3 > > > > Yes. I'm still learning how STV operates (and learing DF and RTL...). > > The following is a rewrite of the non-TImode chain conversion > > according to I think how it should operate als allowing the hunk > > that fixes the compile-time and fixing PR91527 on the way > > (which I ran into during more extensive testing of the patch myself). > > > > So compared to the state before which I still do not 100% understand > > we now do the following. Chain detection works as before including > > recording of all defs (both defined by the insns in the chain and > > insns outside) that need copy-in or copy-out operations. > > > > But then the patch changes things as to guarantee that > > after the conversion all uses/defs of a pseudo are > > of the (subreg:Vmode ..) form or of the original scalar form. > > In particular it avoids the need to change any insns that > > are not part of the chain (besides emitting the extra copy > > instructions). For this all defs that were marked as needing > > copies (thus they have uses/defs both in the chain and outside) > > the chain will use a new pseudo that we copy to from scalar sources > > and that we copy from for scalar uses. There's the new defs_map > > which records the mapping of old to new reg. pseudos that are > > only used in the chain already are not remapped. > > > > The conversion itself then happens in two stages, first, > > in make_vector_copies, we emit the copy-in insns and > > allocate all pseudos we need. Then the rest of the > > conversion happens fully inside of convert_insn where > > we generate the copy-outs of the insns def, replace > > defs and uses according to the mapping and replace uses > > and defs with the (subreg:Vmode ..) style. > > > > For PR91527 IRA doesn't like the REG_EQUIV note in > > > > (insn 4 24 5 2 (set (subreg:V4SI (reg/v:SI 90 [ c ]) 0) > > (subreg:V4SI (reg:SI 100) 0)) > > "/space/rguenther/src/svn/trunk2/gcc/testsuite/g++.dg/tree-ssa/pr21463.C":11:4 > > > > 1248 {movv4si_internal} > > (expr_list:REG_DEAD (reg:SI 100) > > (expr_list:REG_EQUIV (mem/c:SI (plus:DI (reg/f:DI 16 argp) > > (const_int 16 [0x10])) [1 c+0 S4 A64]) > > (nil > > > > because the SET_DEST is not a REG_P. I'm not sure if this > > is invalid RTL, docs say SET_DEST might be a strict_low_part > > or a zero_extract but doesn't mention a subreg. So I opted > > to simply remove equal/equiv notes on insns we convert > > and since the above has a REG_DEAD note I took the liberty > > to update that according to the mapping (so that would have > > been not needed before this patch) rather than dropping it. > > > > Bootstrapped with and without --with-march=westmere (to get > > some STV coverage, this included all languages) on > > x88_64-unknown-linux-gnu, testing in progress. > > > > OK if testing succeeds? > > Testing revealed I made an error in general_scalar_chain::convert_insn > failing to move down SET_SRC extraction below replacing with > the defs map. This showed in 4 execute FAILs in 32bit fortran > testing (only). Fixed by moving down the whole def_set/src/dst > extraction. > > Re-testing on x86_64-unknown-linux-gnu. Bootstrapped / tested on x86_64-unknown-linux-gnu. The "no-costmodel" run runs into the latent PR91528 building 32bit libada in stage3 for a few sources, I've manually built those with -mno-stv and bootstrap finishes successfully. I hope HJ can help with this dynamic stack-alignment issue. So - OK for trunk? As followup we can now remove general_remove_non_convertible_regs because we can handle defs that cannot be converted just fine with the patch since we're splitting live-ranges of all defs at the chain boundary. Thanks, Richard. > Updated patch below. I'm feeling adventurous and will run > the "westmere" bootstrap with costing disabled (aka always > convert detected chains...). > > Richard. > > 2019-08-23 Richard Biener > > PR target/91522 > PR target/91527 > * config/i386/i386-features.h (general_scalar_chain::defs_map): > New member. > (general_scalar_chain::replace_with_subreg): Remove. > (general_scalar_chain::replace_with_subreg_in_insn): Likewise. > (general_sca
Re: [PATCH] Fix PR91522
On Fri, 23 Aug 2019, Richard Biener wrote: > On Fri, 23 Aug 2019, Uros Bizjak wrote: > > > On Thu, Aug 22, 2019 at 3:35 PM Richard Biener wrote: > > > > > > > > > This fixes quadraticness in STV and makes > > > > > > machine dep reorg : 89.07 ( 95%) 0.02 ( 18%) 89.10 ( > > > 95%) 54 kB ( 0%) > > > > > > drop to zero. Anybody remembers why it is the way it is now? > > > > > > Bootstrap / regtest running on x86_64-unknown-linux-gnu. > > > > > > OK? > > > > Looking at the PR, comment #3 [1], I assume this patch is obsoltete > > and will be updated? > > > > [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91522#c3 > > Yes. I'm still learning how STV operates (and learing DF and RTL...). > The following is a rewrite of the non-TImode chain conversion > according to I think how it should operate als allowing the hunk > that fixes the compile-time and fixing PR91527 on the way > (which I ran into during more extensive testing of the patch myself). > > So compared to the state before which I still do not 100% understand > we now do the following. Chain detection works as before including > recording of all defs (both defined by the insns in the chain and > insns outside) that need copy-in or copy-out operations. > > But then the patch changes things as to guarantee that > after the conversion all uses/defs of a pseudo are > of the (subreg:Vmode ..) form or of the original scalar form. > In particular it avoids the need to change any insns that > are not part of the chain (besides emitting the extra copy > instructions). For this all defs that were marked as needing > copies (thus they have uses/defs both in the chain and outside) > the chain will use a new pseudo that we copy to from scalar sources > and that we copy from for scalar uses. There's the new defs_map > which records the mapping of old to new reg. pseudos that are > only used in the chain already are not remapped. > > The conversion itself then happens in two stages, first, > in make_vector_copies, we emit the copy-in insns and > allocate all pseudos we need. Then the rest of the > conversion happens fully inside of convert_insn where > we generate the copy-outs of the insns def, replace > defs and uses according to the mapping and replace uses > and defs with the (subreg:Vmode ..) style. > > For PR91527 IRA doesn't like the REG_EQUIV note in > > (insn 4 24 5 2 (set (subreg:V4SI (reg/v:SI 90 [ c ]) 0) > (subreg:V4SI (reg:SI 100) 0)) > "/space/rguenther/src/svn/trunk2/gcc/testsuite/g++.dg/tree-ssa/pr21463.C":11:4 > > 1248 {movv4si_internal} > (expr_list:REG_DEAD (reg:SI 100) > (expr_list:REG_EQUIV (mem/c:SI (plus:DI (reg/f:DI 16 argp) > (const_int 16 [0x10])) [1 c+0 S4 A64]) > (nil > > because the SET_DEST is not a REG_P. I'm not sure if this > is invalid RTL, docs say SET_DEST might be a strict_low_part > or a zero_extract but doesn't mention a subreg. So I opted > to simply remove equal/equiv notes on insns we convert > and since the above has a REG_DEAD note I took the liberty > to update that according to the mapping (so that would have > been not needed before this patch) rather than dropping it. > > Bootstrapped with and without --with-march=westmere (to get > some STV coverage, this included all languages) on > x88_64-unknown-linux-gnu, testing in progress. > > OK if testing succeeds? Testing revealed I made an error in general_scalar_chain::convert_insn failing to move down SET_SRC extraction below replacing with the defs map. This showed in 4 execute FAILs in 32bit fortran testing (only). Fixed by moving down the whole def_set/src/dst extraction. Re-testing on x86_64-unknown-linux-gnu. Updated patch below. I'm feeling adventurous and will run the "westmere" bootstrap with costing disabled (aka always convert detected chains...). Richard. 2019-08-23 Richard Biener PR target/91522 PR target/91527 * config/i386/i386-features.h (general_scalar_chain::defs_map): New member. (general_scalar_chain::replace_with_subreg): Remove. (general_scalar_chain::replace_with_subreg_in_insn): Likewise. (general_scalar_chain::convert_reg): Adjust signature. * config/i386/i386-features.c (scalar_chain::add_insn): Do not iterate over all defs of a reg. (general_scalar_chain::replace_with_subreg): Remove. (general_scalar_chain::replace_with_subreg_in_insn): Likewise. (general_scalar_chain::make_vector_copies): Populate defs_map, place copy only after defs that are used as vectors in the chain. (general_scalar_chain::convert_reg): Emit a copy for a specific def in a specific instruction. (general_scalar_chain::convert_op): All reg uses are converted here. (general_scalar_chain::convert_insn): Emit copies for scalar uses of defs here. Replace uses with the copies we created. Replace and convert the
Re: [PATCH] Fix PR91522
On Fri, 23 Aug 2019, Uros Bizjak wrote: > On Thu, Aug 22, 2019 at 3:35 PM Richard Biener wrote: > > > > > > This fixes quadraticness in STV and makes > > > > machine dep reorg : 89.07 ( 95%) 0.02 ( 18%) 89.10 ( > > 95%) 54 kB ( 0%) > > > > drop to zero. Anybody remembers why it is the way it is now? > > > > Bootstrap / regtest running on x86_64-unknown-linux-gnu. > > > > OK? > > Looking at the PR, comment #3 [1], I assume this patch is obsoltete > and will be updated? > > [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91522#c3 Yes. I'm still learning how STV operates (and learing DF and RTL...). The following is a rewrite of the non-TImode chain conversion according to I think how it should operate als allowing the hunk that fixes the compile-time and fixing PR91527 on the way (which I ran into during more extensive testing of the patch myself). So compared to the state before which I still do not 100% understand we now do the following. Chain detection works as before including recording of all defs (both defined by the insns in the chain and insns outside) that need copy-in or copy-out operations. But then the patch changes things as to guarantee that after the conversion all uses/defs of a pseudo are of the (subreg:Vmode ..) form or of the original scalar form. In particular it avoids the need to change any insns that are not part of the chain (besides emitting the extra copy instructions). For this all defs that were marked as needing copies (thus they have uses/defs both in the chain and outside) the chain will use a new pseudo that we copy to from scalar sources and that we copy from for scalar uses. There's the new defs_map which records the mapping of old to new reg. pseudos that are only used in the chain already are not remapped. The conversion itself then happens in two stages, first, in make_vector_copies, we emit the copy-in insns and allocate all pseudos we need. Then the rest of the conversion happens fully inside of convert_insn where we generate the copy-outs of the insns def, replace defs and uses according to the mapping and replace uses and defs with the (subreg:Vmode ..) style. For PR91527 IRA doesn't like the REG_EQUIV note in (insn 4 24 5 2 (set (subreg:V4SI (reg/v:SI 90 [ c ]) 0) (subreg:V4SI (reg:SI 100) 0)) "/space/rguenther/src/svn/trunk2/gcc/testsuite/g++.dg/tree-ssa/pr21463.C":11:4 1248 {movv4si_internal} (expr_list:REG_DEAD (reg:SI 100) (expr_list:REG_EQUIV (mem/c:SI (plus:DI (reg/f:DI 16 argp) (const_int 16 [0x10])) [1 c+0 S4 A64]) (nil because the SET_DEST is not a REG_P. I'm not sure if this is invalid RTL, docs say SET_DEST might be a strict_low_part or a zero_extract but doesn't mention a subreg. So I opted to simply remove equal/equiv notes on insns we convert and since the above has a REG_DEAD note I took the liberty to update that according to the mapping (so that would have been not needed before this patch) rather than dropping it. Bootstrapped with and without --with-march=westmere (to get some STV coverage, this included all languages) on x88_64-unknown-linux-gnu, testing in progress. OK if testing succeeds? It still solves the compile-time issue (which is a latent issue, btw, and with a carefully crafted testcase can be triggered since STV exists for DImode chains with !TARGET_64BIT). Thanks, Richard. 2019-08-22 Richard Biener PR target/91522 PR target/91527 * config/i386/i386-features.h (general_scalar_chain::defs_map): New member. (general_scalar_chain::replace_with_subreg): Remove. (general_scalar_chain::replace_with_subreg_in_insn): Likewise. (general_scalar_chain::convert_reg): Adjust signature. * config/i386/i386-features.c (scalar_chain::add_insn): Do not iterate over all defs of a reg. (general_scalar_chain::replace_with_subreg): Remove. (general_scalar_chain::replace_with_subreg_in_insn): Likewise. (general_scalar_chain::make_vector_copies): Populate defs_map, place copy only after defs that are used as vectors in the chain. (general_scalar_chain::convert_reg): Emit a copy for a specific def in a specific instruction. (general_scalar_chain::convert_op): All reg uses are converted here. (general_scalar_chain::convert_insn): Emit copies for scalar uses of defs here. Replace uses with the copies we created. Replace and convert the def. Adjust REG_DEAD notes, remove REG_EQUIV/EQUAL notes. (general_scalar_chain::convert_registers): Only handle copies into the chain here. Index: gcc/config/i386/i386-features.c === --- gcc/config/i386/i386-features.c (revision 274843) +++ gcc/config/i386/i386-features.c (working copy) @@ -416,13 +416,9 @@ scalar_chain::add_insn (bitmap candidate iterates over
Re: [PATCH] Fix PR91522
On Thu, Aug 22, 2019 at 3:35 PM Richard Biener wrote: > > > This fixes quadraticness in STV and makes > > machine dep reorg : 89.07 ( 95%) 0.02 ( 18%) 89.10 ( > 95%) 54 kB ( 0%) > > drop to zero. Anybody remembers why it is the way it is now? > > Bootstrap / regtest running on x86_64-unknown-linux-gnu. > > OK? Looking at the PR, comment #3 [1], I assume this patch is obsoltete and will be updated? [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91522#c3 Uros. > Thanks, > Richard. > > 2019-08-22 Richard Biener > > PR target/91522 > * config/i386/i386-features.c (scalar_chain::add_insn): Do not > iterate over all defs of a reg. > > Index: gcc/config/i386/i386-features.c > === > --- gcc/config/i386/i386-features.c (revision 274764) > +++ gcc/config/i386/i386-features.c (working copy) > @@ -419,10 +419,7 @@ scalar_chain::add_insn (bitmap candidate >df_ref def; >for (ref = DF_INSN_UID_DEFS (insn_uid); ref; ref = DF_REF_NEXT_LOC (ref)) > if (!HARD_REGISTER_P (DF_REF_REG (ref))) > - for (def = DF_REG_DEF_CHAIN (DF_REF_REGNO (ref)); > - def; > - def = DF_REF_NEXT_REG (def)) > - analyze_register_chain (candidates, def); > + analyze_register_chain (candidates, ref); >for (ref = DF_INSN_UID_USES (insn_uid); ref; ref = DF_REF_NEXT_LOC (ref)) > if (!DF_REF_REG_MEM_P (ref)) >analyze_register_chain (candidates, ref); >
[PATCH] Fix PR91522
This fixes quadraticness in STV and makes machine dep reorg : 89.07 ( 95%) 0.02 ( 18%) 89.10 ( 95%) 54 kB ( 0%) drop to zero. Anybody remembers why it is the way it is now? Bootstrap / regtest running on x86_64-unknown-linux-gnu. OK? Thanks, Richard. 2019-08-22 Richard Biener PR target/91522 * config/i386/i386-features.c (scalar_chain::add_insn): Do not iterate over all defs of a reg. Index: gcc/config/i386/i386-features.c === --- gcc/config/i386/i386-features.c (revision 274764) +++ gcc/config/i386/i386-features.c (working copy) @@ -419,10 +419,7 @@ scalar_chain::add_insn (bitmap candidate df_ref def; for (ref = DF_INSN_UID_DEFS (insn_uid); ref; ref = DF_REF_NEXT_LOC (ref)) if (!HARD_REGISTER_P (DF_REF_REG (ref))) - for (def = DF_REG_DEF_CHAIN (DF_REF_REGNO (ref)); - def; - def = DF_REF_NEXT_REG (def)) - analyze_register_chain (candidates, def); + analyze_register_chain (candidates, ref); for (ref = DF_INSN_UID_USES (insn_uid); ref; ref = DF_REF_NEXT_LOC (ref)) if (!DF_REF_REG_MEM_P (ref)) analyze_register_chain (candidates, ref);