On Tue, Oct 13, 2020 at 3:48 PM Amit Langote <amitlangot...@gmail.com> wrote:
> On Thu, Oct 8, 2020 at 8:25 PM Ashutosh Bapat
> <ashutosh.bapat....@gmail.com> wrote:
> > On Wed, Oct 7, 2020 at 7:00 PM Andy Fan <zhihui.fan1...@gmail.com> wrote:
> > > I can understand Robert's idea now: he intended to resolve both the
> > > "initial partition prune" case and the "runtime partition prune"
> > > case, while I was always thinking about the former (Amit reminded me
> > > about that at the beginning, but I only just realized it..)
> > >
> > > With Robert's idea, I think we can do some hacking in
> > > create_append_path: we can figure out the pruneinfo (currently done
> > > in create_append_plan) at that stage, and then do a fix_append_path
> > > with that pruneinfo. We need to fix not only the cost of the
> > > AppendPath but also its rows. For example, given
> > > SELECT .. FROM t1, t2, p WHERE t1.a = p.partkey AND t1.b = t2.b,
> > > if the plan is:
> > >
> > >   Nest Loop
> > >     Nest Loop
> > >       t1
> > >       Append (p)
> > >     t2
> > >
> > > then the rows of Append (p) will be very important.
> > >
> > > For this idea, how to estimate how many partitions/rows can be
> > > pruned is a key part. Robert said: "I was thinking, rather, that if
> > > we know for example that we're doing pruning on
> > > partition_column = $1, then we know that only one partition will
> > > match. That's probably a common case. If we've got
> > > partition_column > $1, we could assume that, say, 75% of the
> > > partitions would match. partition_column BETWEEN $1 and $2 is
> > > probably a bit more selective, so maybe we assume 50% of the
> > > partitions would match." I don't think we can say that 75% or 50% is
> > > a good number, but the key point may be that "partition_column = $1"
> > > is a common case. And for the other cases, we probably won't make
> > > things worse.
> > >
> > > I think we need to do something here; any thoughts? Personally, I
> > > like this idea more now.
> >
> > Yes, I think we have to do something along those lines.
> > But I am wondering why our stats machinery is failing to do that
> > already. For equality I think it's straightforward. But even for
> > other operators we should use the statistics from the partitioned
> > table to estimate the selectivity and then adjust: number of scanned
> > partitions = (number of partitions * fraction of rows scanned). That
> > might be better than 50% or 75%.
>
> This is an interesting idea, that is, consulting the parent relation's
> stats to guess at the number of partitions that might be scanned.
>
> However, we don't currently consult an inheritance parent relation's
> stats at all during "base" relation planning, which is why you don't
> see the parent relation's statistics reflected in the row count and
> costs assigned to its (Append et al.) paths. The hard-coded rule is to
> sum up the children's rows and their paths' costs; see cost_append().
>
> My thinking on this was that we'd just extend that hard-coded rule to
> take into account that run-time pruning, if applicable in a given
> plan, will cause some of those child paths to be discarded. Maybe
> Andy is headed in that direction?

Yes, that's what I am trying to do. The minimum code in my mind is:

    create_append_path(...)
    {
        double  run_time_prune_est;

        if (enable_partition_prune && ..)
        {
            List   *partition_prune_step_ops =
                cal_prune_step_ops(rel, partitioned_rels);

            run_time_prune_est =
                estimate_runtime_prune_ratio(partition_prune_step_ops);
        }

        /* adjust the rows/costs of the AppendPath based on run_time_prune_est */
    }

The difference between '=' and '<' will be handled in the function
estimate_runtime_prune_ratio. Since I haven't made this runnable yet,
some data structures may change, but the overall idea is as above.

--
Best Regards
Andy Fan
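[Editor's illustration] The two estimation ideas in the thread above (Robert's fixed per-operator ratios and Ashutosh's selectivity-based partition count) could be sketched as below. This is a self-contained, hypothetical sketch: the names PruneOpKind, estimate_runtime_prune_ratio, and partitions_scanned_from_selectivity are illustrative only and are not actual PostgreSQL planner APIs.

```c
#include <stddef.h>

/* Hypothetical classification of the pruning qual on the partition key.
 * Illustrative names, not PostgreSQL internals. */
typedef enum
{
    PRUNE_OP_EQ,        /* partkey = $1 */
    PRUNE_OP_GT,        /* partkey > $1 */
    PRUNE_OP_BETWEEN,   /* partkey BETWEEN $1 AND $2 */
    PRUNE_OP_NONE       /* no usable pruning qual */
} PruneOpKind;

/*
 * Robert's heuristic: a fixed fraction of partitions assumed to survive
 * run-time pruning, chosen by operator kind.
 */
double
estimate_runtime_prune_ratio(PruneOpKind op, int nparts)
{
    switch (op)
    {
        case PRUNE_OP_EQ:
            return 1.0 / nparts;    /* only one partition can match */
        case PRUNE_OP_GT:
            return 0.75;            /* guessed: 75% of partitions match */
        case PRUNE_OP_BETWEEN:
            return 0.50;            /* guessed: a bit more selective */
        default:
            return 1.0;             /* no pruning: scan all partitions */
    }
}

/*
 * Ashutosh's alternative: derive the number of scanned partitions from
 * the parent table's estimated row selectivity instead of constants:
 * scanned partitions = nparts * fraction of rows scanned.
 */
double
partitions_scanned_from_selectivity(double row_selectivity, int nparts)
{
    double est = row_selectivity * (double) nparts;

    return (est < 1.0) ? 1.0 : est;     /* always scan at least one */
}
```

A real implementation would have to derive the selectivity from the parent's statistics, which, as Amit notes above, are not consulted during base relation planning today.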