>>>>> "Tom" == Tom Lane <t...@sss.pgh.pa.us> writes:

 >> What that code does is produce plans that look like this:

 >> GroupAggregate
 >> -> Sort
 >>    -> ChainAggregate
 >>       -> Sort
 >>          -> ChainAggregate

 >> in much the same way that WindowAgg nodes are generated.

 Tom> That seems pretty messy, especially given your further comments
 Tom> that these plan nodes are interconnected and know about each
 Tom> other (though you failed to say exactly how).

I'd already explained this in more detail back when we posted the
patch. But to reiterate: the ChainAggregate nodes pass through their
input data unchanged, but on group boundaries they write aggregated
result rows to a tuplestore shared by the whole chain. The top node
returns the data from the tuplestore after its own output is
completed.
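
To make that concrete, here is a rough sketch of the data flow in
Python (illustrative only, not the patch's C code: count() stands in
for arbitrary aggregates, a plain list stands in for the tuplestore,
and each node's input is assumed to arrive already sorted on its
grouping columns):

  def chain_aggregate(rows, group_cols, shared_store):
      # Pass every input row through unchanged; at each group
      # boundary, park one aggregated result row in the shared store.
      key, n = None, 0
      for row in rows:
          k = tuple(row[c] for c in group_cols)
          if key is not None and k != key:
              shared_store.append({'group': key, 'count': n})
              n = 0
          key, n = k, n + 1
          yield row                    # unchanged, for the node above
      if key is not None:
          shared_store.append({'group': key, 'count': n})

  def head_aggregate(rows, group_cols, shared_store):
      # The top node aggregates normally, then returns whatever the
      # ChainAggregate nodes below it parked in the shared store.
      key, n = None, 0
      for row in rows:
          k = tuple(row[c] for c in group_cols)
          if key is not None and k != key:
              yield {'group': key, 'count': n}
              n = 0
          key, n = k, n + 1
      if key is not None:
          yield {'group': key, 'count': n}
      for result in shared_store:      # drain the shared tuplestore
          yield result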

The chain_head pointer in the ChainAggregate nodes is used for:

  - obtaining the head node's targetlist and qual, to use to project
    rows into the tuplestore (the ChainAggregate nodes don't do
    ordinary projection so they have dummy targetlists like the Sort
    nodes do)

  - obtaining the pointer to the tuplestore itself

  - on rescan without parameter change, to inform the parent node
    whether or not the child nodes are also being rescanned (since
    the Sort nodes may or may not block this)

 Tom> The claimed analogy to WindowAgg therefore seems bogus since
 Tom> stacked WindowAggs are independent, AFAIR anyway.

The analogy is only in that they need to see the same input rows but
in different sort orders.

 Tom> I'm also wondering about performance: doesn't this imply more
 Tom> rows passing through some of the plan steps than really
 Tom> necessary?

There's no way to cut down the number of rows seen by intermediate
nodes unless you implement (and require) associative aggregates, which
we do not do in this patch (that's left for possible future
optimization efforts). Our approach makes no new demands on the
implementation of aggregate functions.
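
To illustrate what associativity would buy, a toy sketch in Python
(sum() as the aggregate; the rows and column names are made up):

  rows = [{'a': 1, 'b': 1, 'x': 10},
          {'a': 1, 'b': 2, 'x': 5},
          {'a': 2, 'b': 1, 'x': 3}]

  def sum_by(rows, cols):
      out = {}
      for row in rows:
          k = tuple(row[c] for c in cols)
          out[k] = out.get(k, 0) + row['x']
      return out

  # Without associativity, each grouping level has to see every raw
  # row:
  totals_ab = sum_by(rows, ('a', 'b'))
  totals_a  = sum_by(rows, ('a',))     # scans all the rows again

  # With an associative (combinable) aggregate, the coarser level
  # could instead be built from the finer level's results:
  totals_a2 = {}
  for (a, b), s in totals_ab.items():
      totals_a2[(a,)] = totals_a2.get((a,), 0) + s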

 Tom> Also, how would this extend to preferring hashed aggregation in
 Tom> some of the grouping steps?

My suggestion for extending it to hashed aggs: have a (single)
HashAggregate node keep one hash table per grouping set; then any
arbitrary collection of grouping sets can be handled in one node,
provided that memory permits and no non-hashable features are used.
So the normal plan for CUBE(a,b) under this scheme would be just:

  HashAggregate
      Grouping Sets: (), (a), (b), (a,b)
  -> (input path in unsorted order)
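
A rough sketch of what that single node would be doing internally
(Python again, purely illustrative; count() stands in for arbitrary
aggregates):

  def hash_aggregate(rows, grouping_sets):
      # One pass over unsorted input, one hash table per grouping set.
      tables = {gs: {} for gs in grouping_sets}
      for row in rows:
          for gs, table in tables.items():
              key = tuple(row[c] for c in gs)     # project onto the set
              table[key] = table.get(key, 0) + 1  # count(), for example
      for gs, table in tables.items():
          for key, count in table.items():
              yield {'grouping_set': gs, 'key': key, 'count': count}

  # e.g. for CUBE(a,b):
  #   results = list(hash_aggregate(rows, ((), ('a',), ('b',), ('a','b'))))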

If a mixture of hashable and non-hashable data types is used, for
example CUBE(hashable,unhashable), then a plan of this form could be
constructed:

  HashAggregate
      Grouping Sets: (), (hashable)
  -> ChainAggregate
         Grouping Sets: (unhashable), (unhashable,hashable)
     -> (input path sorted by (unhashable,hashable))

Likewise, plans of this form could be considered for cases like
CUBE(low_card, high_card) where hashed grouping on high_card would
require excessive memory:

  HashAggregate
      Grouping Sets: (), (low_card)
  -> ChainAggregate
         Grouping Sets: (high_card), (high_card, low_card)
     -> (input path sorted by (high_card, low_card))

 Tom> ISTM that maybe what we should do is take a cue from the SQL
 Tom> spec, which defines these things in terms of UNION ALL of
 Tom> plain-GROUP-BY operations reading from a common CTE.

I looked at that; in fact it was our original plan, but it became
clear quite quickly that it was not going to be easy.
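
For reference, that spec-level definition amounts to something like
the following, sketched here in Python rather than SQL: materializing
the input once plays the part of the common CTE, and each grouping set
then gets its own ordinary GROUP BY pass over it, with count() again
standing in for the real aggregates:

  def grouping_sets_per_spec(rows, grouping_sets):
      src = list(rows)            # the common CTE, materialized once
      for gs in grouping_sets:    # one plain GROUP BY per grouping set
          groups = {}
          for row in src:
              key = tuple(row[c] for c in gs)
              groups[key] = groups.get(key, 0) + 1
          for key, count in groups.items():
              yield {'grouping_set': gs, 'key': key, 'count': count}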

I tried two different approaches. The first was to actually re-plan
the input (i.e. to run query_planner more than once) for different
sort orders; that crashed and burned quickly, thanks to the extent to
which the planner assumes it will be run only once on any given input.

The second was to generate a CTE for the input data. This didn't get
very far, because everything that already exists to handle CTEs
assumes that they are explicit in the planner's input (that they have
their own Query node, etc.), and I was not able to find a good
solution.
If you have any suggestions for how to approach the problem, I'm happy
to have another go at it.

-- 
Andrew (irc:RhodiumToad)

