Re: [HACKERS] Final Patch for GROUPING SETS

Andrew Gierth Fri, 12 Dec 2014 20:39:06 -0800

>>>>> "Tom" == Tom Lane <[email protected]> writes:


 >> I'd already explained in more detail way back when we posted the
 >> patch. But to reiterate: the ChainAggregate nodes pass through
 >> their input data unchanged, but on group boundaries they write
 >> aggregated result rows to a tuplestore shared by the whole
 >> chain. The top node returns the data from the tuplestore after its
 >> own output is completed.

 Tom> That seems pretty grotty from a performance+memory consumption
 Tom> standpoint.  At peak memory usage, each one of the Sort nodes
 Tom> will contain every input row,

Has this objection ever been raised for WindowAgg, which has the same
issue?

 Tom> and the shared tuplestore will contain every output row.

Every output row except those produced by the top node, and since this
is after grouping, that's expected to be smaller than the input.

 Tom> That will lead to either a lot of memory eaten, or a lot of
 Tom> temp-file I/O, depending on how big work_mem is.

Yes. Though note that this code only kicks in when dealing with
grouping sets more complex than a simple rollup. A CUBE of two
dimensions uses only one Sort node above whatever is needed to produce
sorted input, and a CUBE of three dimensions uses only two. (It does
increase quite a lot for large cubes though.)

 Tom> In principle, with the CTE+UNION approach I was suggesting, the
 Tom> peak memory consumption would be one copy of the input rows in
 Tom> the CTE's tuplestore plus one copy in the active branch's Sort
 Tom> node.  I think a bit of effort would be needed to get there (ie,
 Tom> shut down one branch's Sort node before starting the next,
 Tom> something I'm pretty sure doesn't happen today).

Correct, it doesn't.

However, I notice that having ChainAggregate shut down its input would
also have the effect of bounding the memory usage (to two copies,
which is as good as the append+sorts+CTE case).

Is shutting down and reinitializing parts of the plan really feasible
here? Or would it be a case of forcing a rescan?

 >> Second was to generate a CTE for the input data. This didn't get
 >> very far because everything that already exists to handle CTE
 >> nodes assumes that they are explicit in the planner's input (that
 >> they have their own Query node, etc.) and I was not able to
 >> determine a good solution.

 Tom> Seems like restructuring that wouldn't be *that* hard.  We
 Tom> probably don't want it to be completely like a CTE for planning
 Tom> purposes anyway --- that would foreclose passing down any
 Tom> knowledge of desired sort order, which we don't want.  But it
 Tom> seems like we could stick a variant of CtePath atop the chosen
 Tom> result path of the scan/join planning phase.  If you like I can
 Tom> poke into this a bit.

Please do.

That seems to cover the high-priority issues from our point of view.

We will continue working on the other issues, on the assumption that
when we have some idea how to do it your way, we will rip out the
ChainAggregate stuff in favour of an Append-based solution.

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Final Patch for GROUPING SETS

Reply via email to