On Tue, Oct 13, 2020 at 3:48 PM Amit Langote <amitlangot...@gmail.com> wrote:
> On Thu, Oct 8, 2020 at 8:25 PM Ashutosh Bapat
> <ashutosh.bapat....@gmail.com> wrote:
> > On Wed, Oct 7, 2020 at 7:00 PM Andy Fan <zhihui.fan1...@gmail.com> wrote:
> > > I can understand Robert's idea now: he intended to resolve both the
> > > "initial partition prune" case and the "runtime partition prune"
> > > case, while I was always thinking about the former (Amit reminded me
> > > about that at the beginning, but I only just realized it..)
> > >
> > > With Robert's idea, I think we can do some hacking in
> > > create_append_path: we can figure out the pruneinfo (currently done
> > > in create_append_plan) at that stage, and then do a fix_append_path
> > > with that pruneinfo. We need to fix not only the cost of the
> > > AppendPath but also its rows. For example, given
> > > SELECT .. FROM t1, t2, p WHERE t1.a = p.partkey AND t1.b = t2.b,
> > > if the plan is:
> > >
> > >   Nest Loop
> > >     Nest Loop
> > >       t1
> > >       Append (p)
> > >     t2
> > >
> > > then the rows of Append (p) will be very important.
> > >
> > > For this idea, how to estimate how many partitions/rows can be
> > > pruned is a key part. Robert said: "I was thinking, rather, that if
> > > we know for example that we're doing pruning on
> > > partition_column = $1, then we know that only one partition will
> > > match. That's probably a common case. If we've got
> > > partition_column > $1, we could assume that, say, 75% of the
> > > partitions would match. partition_column BETWEEN $1 and $2 is
> > > probably a bit more selective, so maybe we assume 50% of the
> > > partitions would match." I don't think we can say that 75% or 50% is
> > > a good number, but the key point may be that "partition_column = $1"
> > > is a common case. And for the other cases, we probably won't make
> > > things worse.
> > >
> > > I think we need to do something here; any thoughts? Personally, I
> > > like this idea more now.
> >
> > Yes, I think we have to do something along those lines.
> > But I am wondering why our stats machinery is failing to do that
> > already. For equality I think it's straightforward. But even for
> > other operators we should use the statistics from the partitioned
> > table to estimate the selectivity and then adjust: number of scanned
> > partitions = (number of partitions * fraction of rows scanned). That
> > might be better than 50% or 75%.
>
> This is an interesting idea, that is, consulting the parent relation's
> stats to guess at the number of partitions that might be scanned.
>
> However, we don't currently consult an inheritance parent relation's
> stats at all during "base" relation planning, which is why you don't
> see the parent relation's statistics reflected in the row count and
> costs assigned to its (Append et al.) paths. The hard-coded rule is to
> sum up the children's rows and their paths' costs; see cost_append().
>
> My thinking on this was that we'd just extend that hard-coded rule to
> take into account that run-time pruning, if applicable in a given
> plan, will cause some of those child paths to be discarded. Maybe
> Andy is headed in that direction?

Yes, that's what I am trying to do. The minimum code in my mind is:

    create_append_path(...)
    {
        double  run_time_prune_est;

        if (enable_partition_prune && ..)
        {
            List   *partition_prune_step_ops =
                cal_prune_step_ops(rel, partitioned_rels);

            run_time_prune_est =
                estimate_runtime_prune_ratio(partition_prune_step_ops);
        }

        /* adjust the rows/costs of the AppendPath based on run_time_prune_est */
    }

The difference between '=' and '<' will be handled in the function
estimate_runtime_prune_ratio. Since I haven't made this runnable yet,
some data structures may change, but the overall idea is as above.

--
Best Regards
Andy Fan
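[Editor's illustration] The two estimation ideas in the thread above (Robert's fixed per-operator ratios and Ashutosh's selectivity-based partition count) could be sketched as below. This is a self-contained, hypothetical sketch: the names PruneOpKind, estimate_runtime_prune_ratio, and partitions_scanned_from_selectivity are illustrative only and are not actual PostgreSQL planner APIs.

```c
#include <stddef.h>

/* Hypothetical classification of the pruning qual on the partition key.
 * Illustrative names, not PostgreSQL internals. */
typedef enum
{
    PRUNE_OP_EQ,        /* partkey = $1 */
    PRUNE_OP_GT,        /* partkey > $1 */
    PRUNE_OP_BETWEEN,   /* partkey BETWEEN $1 AND $2 */
    PRUNE_OP_NONE       /* no usable pruning qual */
} PruneOpKind;

/*
 * Robert's heuristic: a fixed fraction of partitions assumed to survive
 * run-time pruning, chosen by operator kind.
 */
double
estimate_runtime_prune_ratio(PruneOpKind op, int nparts)
{
    switch (op)
    {
        case PRUNE_OP_EQ:
            return 1.0 / nparts;    /* only one partition can match */
        case PRUNE_OP_GT:
            return 0.75;            /* guessed: 75% of partitions match */
        case PRUNE_OP_BETWEEN:
            return 0.50;            /* guessed: a bit more selective */
        default:
            return 1.0;             /* no pruning: scan all partitions */
    }
}

/*
 * Ashutosh's alternative: derive the number of scanned partitions from
 * the parent table's estimated row selectivity instead of constants:
 * scanned partitions = nparts * fraction of rows scanned.
 */
double
partitions_scanned_from_selectivity(double row_selectivity, int nparts)
{
    double est = row_selectivity * (double) nparts;

    return (est < 1.0) ? 1.0 : est;     /* always scan at least one */
}
```

A real implementation would have to derive the selectivity from the parent's statistics, which, as Amit notes above, are not consulted during base relation planning today.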