On Fri, Jan 13, 2023 at 04:23:20PM -0500, Andrew MacLeod via Gcc-patches wrote: > fold_range() already invokes wi_fold_in_parts to try to get more refined > information. If the subranges are quite small, it will do each individual > calculation and combine the results. > > x * y with x = [1,3] and y = [1,3] is broken down and we calculate each > possibility and we end up with [1,4][6,6][9,9] instead of [1,9] > > We limit this as the time is between quadratic to exponential depending on > the number of elements in x and y. > > If we also check the relation and determine that x == y, we don't need to > worry about that growth as this process is linear. The above case will be > broken down to just 1*1, 2*2 and 3*3, resulting in a range of [1, > 1][4,4][9,9]. > > In the testcase, it happens to be the right_shift operation, but this > solution is generic and applies to all range-op operations. I added a > testcase which checks >>, *, + and %. > > I also arbitrarily chose 8 elements as the limit for breaking down > individual operations. The overall compile time change for this is > negligible. > > Although this is a regression fix, it will affect all operations where x == > y, which is where my initial hesitancy arose. > > Regardless, bootstrapped on x86_64-pc-linux-gnu with no regressions. OK for > trunk?
Will defer to Aldy, just some nits. > + // if there are 1 to 8 values in the LH range, split them up. > + r.set_undefined (); > + if (lh_range >= 0 && lh_range <= 7) > + { > + unsigned x; > + for (x = 0; x <= lh_range; x++) Nothing uses x after the loop, so why not for (unsigned x = 0; x <= lh_range; x++) instead? > @@ -234,6 +264,26 @@ range_operator::fold_range (irange &r, tree type, > unsigned num_lh = lh.num_pairs (); > unsigned num_rh = rh.num_pairs (); > > + // If op1 and op2 are equivalences, then we don't need a complete cross > + // product, just pairs of matching elements. > + if (relation_equiv_p (rel) && (lh == rh)) The ()s around lh == rh look superfluous to me. > + { > + int_range_max tmp; > + r.set_undefined (); > + for (unsigned x = 0; x < num_lh; ++x) fold_range has an upper bound of num_lh * num_rh > 12, shouldn't something like that be there for this case too? I mean, every wi_fold_in_parts_equiv can result in 8 subranges, but num_lh could be up to 255 here, it is true it is linear and union_ should merge excess ones, but still I wonder if some larger num_lh upper bound like 20 or 32 wouldn't be useful. Up to you... > + { > + wide_int lh_lb = lh.lower_bound (x); > + wide_int lh_ub = lh.upper_bound (x); > + wi_fold_in_parts_equiv (tmp, type, lh_lb, lh_ub); > + r.union_ (tmp); > + if (r.varying_p ()) > + break; > + } > + op1_op2_relation_effect (r, type, lh, rh, rel); > + update_known_bitmask (r, m_code, lh, rh); > + return true; > + } > + Jakub