On 18 May 2015 at 14:50, Tom Lane <t...@sss.pgh.pa.us> wrote:

> Robert Haas <robertmh...@gmail.com> writes:
> > On Sun, May 17, 2015 at 12:11 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> >> Rather than adding tlists per se to Paths, I've been vaguely toying with
> >> a notion of identifying all the "interesting" subexpressions in a query
> >> (expensive functions, aggregates, etc), giving them indexes 1..n, and then
> >> marking Paths with bitmapsets showing which interesting subexpressions
> >> they can produce values for.  This would make things like "does this Path
> >> compute all the needed aggregates" much cheaper to deal with than a raw
> >> tlist representation would do.  But maybe that's still not the best way.
>
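
For illustration only, the bitmapset idea quoted above might look roughly like the sketch below. It is not PostgreSQL code: plain Python integers stand in for Bitmapsets, and the names `index_subexpressions`, `to_bitmask`, `Path.produces` and `computes_all` are all hypothetical.

```python
# Hypothetical sketch: integer bitmasks stand in for PostgreSQL's Bitmapset.
from dataclasses import dataclass


def index_subexpressions(exprs):
    """Assign indexes 1..n to the distinct 'interesting' subexpressions."""
    return {expr: i for i, expr in enumerate(dict.fromkeys(exprs), start=1)}


def to_bitmask(index, exprs):
    """Build a bitmask with bit i set for each listed subexpression."""
    mask = 0
    for expr in exprs:
        mask |= 1 << index[expr]
    return mask


@dataclass
class Path:
    name: str
    produces: int  # bitmask of subexpression indexes this path can compute


def computes_all(path, needed):
    """True if the path produces every needed subexpression (a subset test)."""
    return needed & ~path.produces == 0


index = index_subexpressions(["sum(x)", "count(*)", "expensive_fn(y)"])
needed_aggs = to_bitmask(index, ["sum(x)", "count(*)"])

scan = Path("seqscan", to_bitmask(index, ["expensive_fn(y)"]))
agg = Path("hashagg", to_bitmask(index, ["sum(x)", "count(*)", "expensive_fn(y)"]))
```

The point of the representation is that "does this Path compute all the needed aggregates" becomes a single mask operation (`computes_all(agg, needed_aggs)`) rather than a walk over a target list.
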
> > I don't know, but it seems like this might be pulling in the opposite
> > direction from your previously-stated desire to get subquery_planner
> > to output Paths rather than Plans as soon as possible.
>
> Sorry, I didn't mean to suggest that that necessarily had to happen right
> away.
>
> What we do need right away, though, is *some* design for distinguishing
> Paths for the different possible upper-level steps.  I won't cry if we
> change it around later, but we have to have something to start with.
>
> So for the moment, let's assume that we still rigidly follow the sequence
> of upper-level steps currently embodied in grouping_planner.  (I'm not
> sure if it even makes sense to consider other orderings of those
> processing steps, but in any case we don't need to allow it on day zero.)
> Then, make a dummy RelOptInfo corresponding to the result of each step,
> and insert links to those in new fields in PlannerInfo.  (We create these
> *before* starting scan/join planning, so that FDWs, custom scans, etc, can
> inject paths into these RelOptInfos if they want, so as to represent cases
> like remote aggregation.)  Then just use add_path with the appropriate
> target RelOptInfo when producing different ways to do grouping etc.
>
> This is a bit ad-hoc but it would be a place to start.
>
> Comments?
>
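
As a rough sketch of the proposal quoted above (one dummy RelOptInfo per upper-level step, created before scan/join planning, with add_path used to collect competing ways of doing each step), something like the following captures the shape. It is a simplified model, not the planner's actual data structures: the step names, the `upper_rels` field and the two-cost dominance rule are assumptions made for the example, and the real add_path also weighs sort order, parameterization and more.

```python
from dataclasses import dataclass, field

# Illustrative step names only; they loosely mirror the fixed sequence of
# upper-level steps in grouping_planner and are not PostgreSQL identifiers.
UPPER_STEPS = ["group_agg", "window", "distinct", "ordered", "final"]


@dataclass
class Path:
    desc: str
    startup_cost: float
    total_cost: float


@dataclass
class RelOptInfo:
    name: str
    pathlist: list = field(default_factory=list)


@dataclass
class PlannerInfo:
    upper_rels: dict = field(default_factory=dict)  # hypothetical new field


def dominates(a, b):
    """a is at least as cheap as b on both startup and total cost."""
    return a.startup_cost <= b.startup_cost and a.total_cost <= b.total_cost


def add_path(rel, new_path):
    """Simplified add_path: reject dominated paths, evict paths we dominate."""
    if any(dominates(old, new_path) for old in rel.pathlist):
        return False
    rel.pathlist = [old for old in rel.pathlist if not dominates(new_path, old)]
    rel.pathlist.append(new_path)
    return True


def make_planner_info():
    """Create the dummy upper rels up front, before scan/join planning."""
    root = PlannerInfo()
    for step in UPPER_STEPS:
        root.upper_rels[step] = RelOptInfo(step)
    return root


root = make_planner_info()
grouped = root.upper_rels["group_agg"]

# An FDW or custom scan provider could inject a path early ...
add_path(grouped, Path("remote aggregate via FDW", 50.0, 120.0))
# ... and the core planner adds its own alternatives later.
add_path(grouped, Path("HashAggregate over join", 10.0, 300.0))
add_path(grouped, Path("GroupAggregate over sort", 60.0, 400.0))
```

In this example the third path is rejected because the FDW path is cheaper on both costs, while the first two survive because neither dominates the other.
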

My thinking was to push aggregation down to the lowest level possible in
the plan, hopefully a single relation. That way we can generate paths for
the current grouping_planner options as well as others, such as these:

* Push down aggregate prior to a join (which might then affect join
planning)
* Allow parallel queries to follow a
scan-aggregate-collectfromslaves-aggregate strategy (hence need for double
aggregation semantics)
* Allow a lookaside to a Mat View rather than do a scan-aggregate (assume
for now these are maintained correctly)
* Allow a lookaside to an alternate datastore/mechanism via CustomScan
(assume these are maintained correctly)

all of which need to be costed against each other and the current
strategies (aggregate last).
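
To make the "double aggregation semantics" in the parallel case concrete, here is a minimal sketch of a scan-aggregate-collect-aggregate flow for avg(). Everything in it is hypothetical (the function names, the (sum, count) partial state, the in-memory "workers"); it only shows why the second aggregation step has to combine partial states rather than re-run the original aggregate.

```python
from collections import defaultdict


def partial_aggregate(rows):
    """Per-worker step: group by key, keeping (sum, count) as partial state."""
    state = defaultdict(lambda: [0.0, 0])
    for key, value in rows:
        state[key][0] += value
        state[key][1] += 1
    return {key: tuple(acc) for key, acc in state.items()}


def combine(partials):
    """Collecting step: merge the partial states from all workers."""
    merged = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for key, (total, count) in partial.items():
            merged[key][0] += total
            merged[key][1] += count
    return merged


def finalize_avg(merged):
    """Final step: turn each merged (sum, count) state into an average."""
    return {key: total / count for key, (total, count) in merged.items()}


# Each inner list stands for the rows scanned by one worker.
worker_chunks = [
    [("a", 1.0), ("b", 10.0)],
    [("a", 3.0)],
    [("b", 20.0), ("a", 5.0)],
]

result = finalize_avg(combine(partial_aggregate(chunk) for chunk in worker_chunks))
```

Averaging the per-worker averages would give the wrong answer for key "a" here, which is why the partial state has to carry both the sum and the count up to the combining step.
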

The above proposal sounds like it will do that, but I'm not completely sure.

I'm assuming the O(N^2) Mat View planning problem can be solved in part by
recognizing that many MVs are just single-table plus aggregates, and that
we'd have a small enough number of MVs in play that search would not be a
problem in practice.

I'm also aware that LIMIT is still very badly optimized, so I'm hoping this
design helps there too.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
