RE: [Patch,rtl Optimization]: Better register pressure estimate for Loop Invariant Code Motion.
The patch is modified based on the review comments from Richard, Bernd and David. The following changes are done to incorporate the comments received on the previous mail. 1. With this patch the liveness of the Loop is not stored on the LOOPDATA structures. The liveness is calculated based on the current loops for which the invariant is checked and the regs_used is calculated. 2. Memory leaks are fixed. 3. Reworked on the comments section based on Bernd's comments. Bootstrapped and regtested for i386 and Microblaze target. SPEC CPU 2000 benchmarks are run on i386 target and following is the summary of the results. SPEC CPU 2000 INT benchmarks. ( Gemoean Score without the change vs Geomean score with reg pressure change = 3745.193 vs 3745.328) SPEC CPU 2000 FP benchmarks. ( Gemoean Score without the change vs Geomean score with reg pressure change = 4741.825 vs 4748.364). [Patch,rtl Optimization]: Better register pressure estimate for loop invariant code motion Calculate the loop liveness used for regs for calculating the register pressure in the cost estimation. Loop liveness is based on the following properties. We only need to find the set of objects that are live at the birth or the header of the loop. We don't need to calculate the live through the loop by considering live in and live out of all the basic blocks of the loop. This is based on the point that the set of objects that are live-in at the birth or header of the loop will be live-in at every node in the loop. If a v live is out at the header of the loop then the variable is live-in at every node in the loop. To prove this, consider a loop L with header h such that the variable v defined at d is live-in at h. Since v is live at h, d is not part of L. This follows i from the dominance property, i.e. h is strictly dominated by d. Furthermore, there exists a path from h to a use of v which does not go through d. For every node p in the loop, since the loop is strongly connected and node is a component of the CFG, there exists a path, consisting only of nodes of L from p to h. Concatenating these two paths proves that v is live-in and live-out of p. Calculate the live-out and live-in for the exit edge of the loop. This patch considers liveness for not only the loop latch but also the liveness outside the loops. ChangeLog: 2015-12-15 Ajit Agarwal <ajit...@xilinx.com> * loop-invariant.c (find_invariants_to_move): Add the logic of regs_used based on liveness. * cfgloopanal.c (estimate_reg_pressure_cost): Update the heuristics in presence of call_p. Signed-off-by:Ajit Agarwal ajit...@xilinx.com. Thanks & Regards Ajit -Original Message- From: Bernd Schmidt [mailto:bschm...@redhat.com] Sent: Wednesday, December 09, 2015 7:34 PM To: Ajit Kumar Agarwal; Richard Biener Cc: Jeff Law; GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala Subject: Re: [Patch,rtl Optimization]: Better register pressure estimate for Loop Invariant Code Motion. On 12/09/2015 12:22 PM, Ajit Kumar Agarwal wrote: > > This is because the available_regs = 6 and the regs_needed = 1 and > new_regs = 0 and the regs_used = 10. As the reg_used that are based > on the Liveness given above is greater than the available_regs, then > it's candidate of spill and the estimate_register_pressure calculates > the > spill cost. This spill cost is greater than inv_cost and gain > comes to be > negative. The disables the loop invariant for the above > testcase. As far as I can tell this loop does not lead to a spill. Hence, failure of this testcase would suggest there is something wrong with your idea. >> + FOR_EACH_LOOP (loop, LI_FROM_INNERMOST) { Formatting. >> +/* Loop Liveness is based on the following proprties. "properties" >> + we only require to calculate the set of objects that are live at >> + the birth or the header of the loop. >> + We don't need to calculate the live through the Loop considering >> + Live in and Live out of all the basic blocks of the Loop. This is >> + because the set of objects. That are live-in at the birth or header >> + of the loop will be live-in at every node in the Loop. >> + If a v live out at the header of the loop then the variable is >> live-in >> + at every node in the Loop. To prove this, Consider a Loop L with >> header >> + h such that The variable v defined at d is live-in at h. Since v is >> live >> + at h, d is not part of L. This follows from the dominance property, >> i.e. >> + h is strictly dominated by d. Furthermore, there exists a path from >> h to >> + a use of v whic
Re: [Patch,rtl Optimization]: Better register pressure estimate for Loop Invariant Code Motion.
On Tue, Dec 8, 2015 at 11:24 PM, Ajit Kumar Agarwalwrote: > Based on the comments on RFC patch this patch incorporates all the comments > from Jeff. Thanks Jeff for the valuable feedback. > > This patch enables the better register pressure estimate for Loop Invariant > code motion. This patch Calculate the Loop Liveness used for regs_used > used to calculate the register pressure used in the cost estimation. > > Loop Liveness is based on the following properties. we only require to > calculate the set of objects that are live at the birth or the header of the > loop. We > don't need to calculate the live through the Loop considering Live in and > Live out of all the basic blocks of the Loop. This is because the set of > objects. That > are live-in at the birth or header of the loop will be live-in at every node > in the Loop. > > If a v live out at the header of the loop then the variable is live-in at > every node in the Loop. To prove this, Consider a Loop L with header h such > that The > variable v defined at d is live-in at h. Since v is live at h, d is not part > of L. This follows from the dominance property, i.e. h is strictly dominated > by d. > Furthermore, there exists a path from h to a use of v which does not go > through d. For every node of the loop, p, since the loop is strongly connected > Component of the CFG, there exists a path, consisting only of nodes of L from > p to h. Concatenating those two paths prove that v is live-in and live-out Of > p. > > Also Calculate the Live-Out and Live-In for the exit edge of the loop. This > considers liveness for not only the Loop latch but also the liveness outside > the Loops. > > Bootstrapped and Regtested for x86_64 and microblaze target. > > There is an extra failure for the testcase gcc.dg/loop-invariant.c with the > change that looks correct to me. > > This is because the available_regs = 6 and the regs_needed = 1 and new_regs = > 0 and the regs_used = 10. As the reg_used that are based on the Liveness > given above is greater than the available_regs, then it's candidate of spill > and the estimate_register_pressure calculates the spill cost. This spill cost > is greater > than inv_cost and gain comes to be negative. The disables the loop invariant > for the above testcase. > > Disabling of the Loop invariant for the testcase loop-invariant.c with this > patch looks correct to me considering the calculation of available_regs in > cfgloopanal.c > is correct. You keep a lot of data (bitmaps) live just to use the number of bits set in the end. I'll note that this also increases the complexity of the pass which is enabled at -O1+ where -O1 should limit itself to O(n*log n) algorithms but live is quadratic. So I don't think doing this unconditionally is a good idea (and we have -fira-loop-pressure after all). Please watch not making -O1 worse again after we spent so much time making it suitable for all the weird autogenerated code. Richard. > ChangeLog: > 2015-12-09 Ajit Agarwal > > * loop-invariant.c > (calculate_loop_liveness): New. > (move_loop_invariants): Use of calculate_loop_liveness. > > Signed-off-by:Ajit Agarwal ajit...@xilinx.com > > --- > gcc/loop-invariant.c | 77 > ++ > 1 files changed, 65 insertions(+), 12 deletions(-) > > diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c > index 53d1377..ac08594 100644 > --- a/gcc/loop-invariant.c > +++ b/gcc/loop-invariant.c > @@ -1464,18 +1464,7 @@ find_invariants_to_move (bool speed, bool call_p) > registers used; we put some initial bound here to stand for > induction variables etc. that we do not detect. */ > { > - unsigned int n_regs = DF_REG_SIZE (df); > - > - regs_used = 2; > - > - for (i = 0; i < n_regs; i++) > - { > - if (!DF_REGNO_FIRST_DEF (i) && DF_REGNO_LAST_USE (i)) > - { > - /* This is a value that is used but not changed inside loop. */ > - regs_used++; > - } > - } > + regs_used = bitmap_count_bits (_DATA (curr_loop)->regs_live) + 2; > } > >if (! flag_ira_loop_pressure) > @@ -1966,7 +1955,63 @@ mark_ref_regs (rtx x) >} > } > > +/* Calculate the Loop Liveness used for regs_used used in > + heuristics to calculate the register pressure. */ > + > +static void > +calculate_loop_liveness (void) > +{ > + struct loop *loop; > + > + FOR_EACH_LOOP (loop, 0) > +if (loop->aux == NULL) > + { > +loop->aux = xcalloc (1, sizeof (struct loop_data)); > +bitmap_initialize (_DATA (loop)->regs_live, _obstack); > + } > + > + FOR_EACH_LOOP (loop, LI_FROM_INNERMOST) > + { > +int i; > +edge e; > +vec edges; > +edges = get_loop_exit_edges (loop); > + > +/* Loop Liveness is based on the following proprties. > + we only require to calculate the
RE: [Patch,rtl Optimization]: Better register pressure estimate for Loop Invariant Code Motion.
-Original Message- From: Richard Biener [mailto:richard.guent...@gmail.com] Sent: Wednesday, December 09, 2015 4:06 PM To: Ajit Kumar Agarwal Cc: Jeff Law; GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala Subject: Re: [Patch,rtl Optimization]: Better register pressure estimate for Loop Invariant Code Motion. On Tue, Dec 8, 2015 at 11:24 PM, Ajit Kumar Agarwal <ajit.kumar.agar...@xilinx.com> wrote: > Based on the comments on RFC patch this patch incorporates all the comments > from Jeff. Thanks Jeff for the valuable feedback. > > This patch enables the better register pressure estimate for Loop > Invariant code motion. This patch Calculate the Loop Liveness used for > regs_used used to calculate the register pressure used in the cost > estimation. > > Loop Liveness is based on the following properties. we only require > to calculate the set of objects that are live at the birth or the > header of the loop. We don't need to calculate the live through the Loop > considering Live in and Live out of all the basic blocks of the Loop. This is > because the set of objects. That are live-in at the birth or header of the > loop will be live-in at every node in the Loop. > > If a v live out at the header of the loop then the variable is live-in > at every node in the Loop. To prove this, Consider a Loop L with header h > such that The variable v defined at d is live-in at h. Since v is live at h, > d is not part of L. This follows from the dominance property, i.e. h is > strictly dominated by d. > Furthermore, there exists a path from h to a use of v which does not > go through d. For every node of the loop, p, since the loop is strongly > connected Component of the CFG, there exists a path, consisting only of nodes > of L from p to h. Concatenating those two paths prove that v is live-in and > live-out Of p. > > Also Calculate the Live-Out and Live-In for the exit edge of the loop. This > considers liveness for not only the Loop latch but also the liveness outside > the Loops. > > Bootstrapped and Regtested for x86_64 and microblaze target. > > There is an extra failure for the testcase gcc.dg/loop-invariant.c with the > change that looks correct to me. > > This is because the available_regs = 6 and the regs_needed = 1 and > new_regs = 0 and the regs_used = 10. As the reg_used that are based > on the Liveness given above is greater than the available_regs, then it's > candidate of spill and the estimate_register_pressure calculates the spill > cost. This spill cost is greater than inv_cost and gain comes to be negative. > The disables the loop invariant for the above testcase. > > Disabling of the Loop invariant for the testcase loop-invariant.c with > this patch looks correct to me considering the calculation of available_regs > in cfgloopanal.c is correct. >>You keep a lot of data (bitmaps) live just to use the number of bits set in >>the end. >>I'll note that this also increases the complexity of the pass which is >>enabled at -O1+ where >>-O1 should limit itself to O(n*log n) algorithms but live is quadratic. >.So I don't think doing this unconditionally is a good idea (and we have >-fira-loop-pressure after all). >>Please watch not making -O1 worse again after we spent so much time making it >>suitable for all the weird autogenerated code. I can also implement without keeping the bitmaps live data and would not require to set the bits in the end. I am interested in the Liveness in the loop header and the loop exit which I can calculate where I am setting the regs_used for the curr_loop. This will not require to set and keep the bitmaps for liveness and also make it available for the curr_loop that are candidates of loop Invariant. Thanks & Regards Ajit Richard. > ChangeLog: > 2015-12-09 Ajit Agarwal <ajit...@xilinx.com> > > * loop-invariant.c > (calculate_loop_liveness): New. > (move_loop_invariants): Use of calculate_loop_liveness. > > Signed-off-by:Ajit Agarwal ajit...@xilinx.com > > --- > gcc/loop-invariant.c | 77 > ++ > 1 files changed, 65 insertions(+), 12 deletions(-) > > diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c index > 53d1377..ac08594 100644 > --- a/gcc/loop-invariant.c > +++ b/gcc/loop-invariant.c > @@ -1464,18 +1464,7 @@ find_invariants_to_move (bool speed, bool call_p) > registers used; we put some initial bound here to stand for > induction variables etc. that we do not detect. */ > { > - unsigned int n_regs = DF_REG_SIZE (df); > - > - regs_used = 2; > - > - for (i = 0; i <
RE: [Patch,rtl Optimization]: Better register pressure estimate for Loop Invariant Code Motion.
-Original Message- From: Bernd Schmidt [mailto:bschm...@redhat.com] Sent: Wednesday, December 09, 2015 7:34 PM To: Ajit Kumar Agarwal; Richard Biener Cc: Jeff Law; GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala Subject: Re: [Patch,rtl Optimization]: Better register pressure estimate for Loop Invariant Code Motion. On 12/09/2015 12:22 PM, Ajit Kumar Agarwal wrote: > > This is because the available_regs = 6 and the regs_needed = 1 and > new_regs = 0 and the regs_used = 10. As the reg_used that are based > on the Liveness given above is greater than the available_regs, then > it's candidate of spill and the estimate_register_pressure calculates > the > spill cost. This spill cost is greater than inv_cost and gain > comes to be > negative. The disables the loop invariant for the above > testcase. >>As far as I can tell this loop does not lead to a spill. Hence, failure of >>this testcase would suggest there is something wrong with your idea. As far as I can see there is nothing wrong with the idea, the existing implementation of the calculation of available_regs results to 6 which is not the case for this testcase. The estimate of regs_used based on the Liveness ( idea behind this patch) is correct, but the incorrect estimate of the available_regs (6) results in the gain to be negative. >> + FOR_EACH_LOOP (loop, LI_FROM_INNERMOST) { >>Formatting. >> +/* Loop Liveness is based on the following proprties. >>"properties" I will incorporate the change. >> + we only require to calculate the set of objects that are live at >> + the birth or the header of the loop. >> + We don't need to calculate the live through the Loop considering >> + Live in and Live out of all the basic blocks of the Loop. This is >> + because the set of objects. That are live-in at the birth or header >> + of the loop will be live-in at every node in the Loop. >> + If a v live out at the header of the loop then the variable is >> live-in >> + at every node in the Loop. To prove this, Consider a Loop L with >> header >> + h such that The variable v defined at d is live-in at h. Since v is >> live >> + at h, d is not part of L. This follows from the dominance property, >> i.e. >> + h is strictly dominated by d. Furthermore, there exists a path from >> h to >> + a use of v which does not go through d. For every node of the loop, >> p, >> + since the loop is strongly connected Component of the CFG, there >> exists >> + a path, consisting only of nodes of L from p to h. Concatenating >> those >> + two paths prove that v is live-in and live-out Of p. */ >>Please Refrain From Randomly Capitalizing Words, and please also fix the >>grammar ("set of objects. That are live-in"). These problems make this >>comment (and also your emails) >>extremely hard to read. >>Partly for that reason, I can't quite make up my mind whether this patch >>makes things better or just different. The testcase failure makes me think >>it's probably not an improvement. I'll correct it. I am seeing the gains for Mibench/EEMBC benchmarks for Microblaze target with this patch. I will run SPEC CPU 2000 benchmarks for i386 with this patch. >> +bitmap_ior_into (_DATA (loop)->regs_live, DF_LR_IN (loop->header)); >> +bitmap_ior_into (_DATA (loop)->regs_live, DF_LR_OUT >> + (loop->header)); >>Formatting. I'll incorporate the change. Thanks & Regards Ajit Bernd
Re: [Patch,rtl Optimization]: Better register pressure estimate for Loop Invariant Code Motion.
On Wed, 2015-12-09 at 11:35 +0100, Richard Biener wrote: > On Tue, Dec 8, 2015 at 11:24 PM, Ajit Kumar Agarwal >wrote: > > Based on the comments on RFC patch this patch incorporates all the comments > > from Jeff. Thanks Jeff for the valuable feedback. > > > > This patch enables the better register pressure estimate for Loop Invariant > > code motion. This patch Calculate the Loop Liveness used for regs_used > > used to calculate the register pressure used in the cost estimation. > > > > Loop Liveness is based on the following properties. we only require to > > calculate the set of objects that are live at the birth or the header of > > the loop. We > > don't need to calculate the live through the Loop considering Live in and > > Live out of all the basic blocks of the Loop. This is because the set of > > objects. That > > are live-in at the birth or header of the loop will be live-in at every > > node in the Loop. > > > > If a v live out at the header of the loop then the variable is live-in at > > every node in the Loop. To prove this, Consider a Loop L with header h such > > that The > > variable v defined at d is live-in at h. Since v is live at h, d is not > > part of L. This follows from the dominance property, i.e. h is strictly > > dominated by d. > > Furthermore, there exists a path from h to a use of v which does not go > > through d. For every node of the loop, p, since the loop is strongly > > connected > > Component of the CFG, there exists a path, consisting only of nodes of L > > from p to h. Concatenating those two paths prove that v is live-in and > > live-out Of p. > > > > Also Calculate the Live-Out and Live-In for the exit edge of the loop. This > > considers liveness for not only the Loop latch but also the liveness > > outside the Loops. > > > > Bootstrapped and Regtested for x86_64 and microblaze target. > > > > There is an extra failure for the testcase gcc.dg/loop-invariant.c with > > the change that looks correct to me. > > > > This is because the available_regs = 6 and the regs_needed = 1 and new_regs > > = 0 and the regs_used = 10. As the reg_used that are based on the Liveness > > given above is greater than the available_regs, then it's candidate of > > spill and the estimate_register_pressure calculates the spill cost. This > > spill cost is greater > > than inv_cost and gain comes to be negative. The disables the loop > > invariant for the above testcase. > > > > Disabling of the Loop invariant for the testcase loop-invariant.c with this > > patch looks correct to me considering the calculation of available_regs in > > cfgloopanal.c > > is correct. > > You keep a lot of data (bitmaps) live just to use the number of bits > set in the end. > > I'll note that this also increases the complexity of the pass which is > enabled at -O1+ where > -O1 should limit itself to O(n*log n) algorithms but live is quadratic. > > So I don't think doing this unconditionally is a good idea (and we > have -fira-loop-pressure > after all). > > Please watch not making -O1 worse again after we spent so much time > making it suitable > for all the weird autogenerated code. > > Richard. > > > ChangeLog: > > 2015-12-09 Ajit Agarwal > > > > * loop-invariant.c > > (calculate_loop_liveness): New. > > (move_loop_invariants): Use of calculate_loop_liveness. > > > > Signed-off-by:Ajit Agarwal ajit...@xilinx.com > > > > --- > > gcc/loop-invariant.c | 77 > > ++ > > 1 files changed, 65 insertions(+), 12 deletions(-) > > > > diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c > > index 53d1377..ac08594 100644 > > --- a/gcc/loop-invariant.c > > +++ b/gcc/loop-invariant.c [...snip...] > > @@ -1966,7 +1955,63 @@ mark_ref_regs (rtx x) > >} > > } > > > > +/* Calculate the Loop Liveness used for regs_used used in > > + heuristics to calculate the register pressure. */ > > + > > +static void > > +calculate_loop_liveness (void) > > +{ > > + struct loop *loop; > > + > > + FOR_EACH_LOOP (loop, 0) > > +if (loop->aux == NULL) > > + { > > +loop->aux = xcalloc (1, sizeof (struct loop_data)); > > +bitmap_initialize (_DATA (loop)->regs_live, _obstack); > > + } > > + > > + FOR_EACH_LOOP (loop, LI_FROM_INNERMOST) > > + { > > +int i; > > +edge e; > > +vec edges; > > +edges = get_loop_exit_edges (loop); Doesn't this leak "edges"? (Could/should it be an auto_vec?) > > + > > +/* Loop Liveness is based on the following proprties. > > + we only require to calculate the set of objects that are live at > > + the birth or the header of the loop. > > + We don't need to calculate the live through the Loop considering > > + Live in and Live out of all the basic blocks of the Loop. This is > > + because the set of objects. That are live-in at the birth or header >
Re: [Patch,rtl Optimization]: Better register pressure estimate for Loop Invariant Code Motion.
On 12/09/2015 12:22 PM, Ajit Kumar Agarwal wrote: This is because the available_regs = 6 and the regs_needed = 1 and new_regs = 0 and the regs_used = 10. As the reg_used that are based on the Liveness given above is greater than the available_regs, then > it's candidate of spill and the estimate_register_pressure calculates > the spill cost. This spill cost is greater than inv_cost and gain > comes to be negative. The disables the loop invariant for the above > testcase. As far as I can tell this loop does not lead to a spill. Hence, failure of this testcase would suggest there is something wrong with your idea. + FOR_EACH_LOOP (loop, LI_FROM_INNERMOST) { Formatting. +/* Loop Liveness is based on the following proprties. "properties" + we only require to calculate the set of objects that are live at + the birth or the header of the loop. + We don't need to calculate the live through the Loop considering + Live in and Live out of all the basic blocks of the Loop. This is + because the set of objects. That are live-in at the birth or header + of the loop will be live-in at every node in the Loop. + If a v live out at the header of the loop then the variable is live-in + at every node in the Loop. To prove this, Consider a Loop L with header + h such that The variable v defined at d is live-in at h. Since v is live + at h, d is not part of L. This follows from the dominance property, i.e. + h is strictly dominated by d. Furthermore, there exists a path from h to + a use of v which does not go through d. For every node of the loop, p, + since the loop is strongly connected Component of the CFG, there exists + a path, consisting only of nodes of L from p to h. Concatenating those + two paths prove that v is live-in and live-out Of p. */ Please Refrain From Randomly Capitalizing Words, and please also fix the grammar ("set of objects. That are live-in"). These problems make this comment (and also your emails) extremely hard to read. Partly for that reason, I can't quite make up my mind whether this patch makes things better or just different. The testcase failure makes me think it's probably not an improvement. +bitmap_ior_into (_DATA (loop)->regs_live, DF_LR_IN (loop->header)); +bitmap_ior_into (_DATA (loop)->regs_live, DF_LR_OUT + (loop->header)); Formatting. Bernd