On Thu, 2006-01-19 at 18:38 -0500, Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
> > This seems to lead to a super-geometric progression in the number of
> > files required,
>
> But we double the number of batches at each step, so there are going to
> be at most 20 or so levels, and that's only assuming a *horridly* wrong
> initial guess by the planner. In practice I think it's reasonable to
> assume at most a couple rounds of doubling. If you have more than that,
> the extra data-shuffling is going to exhaust your patience anyway.
What I'm saying is that if we start from 1 batch and move dynamically
upwards, we quickly get an unmanageable number of files. If instead we
start at a particular number N, we begin with N-1 files, then move to at
most 2N(N-1) files, etc. So we can only "get it wrong" and double the
number of batches about twice before we are swamped with files; i.e. if
we start at 1 we can only reasonably get to 8 batches. We should
therefore start at a number higher than 1, attempting to make an
accurate guess at the number of batches (N) required.

If we have R rows to aggregate and we get N correct, then the cost of
the HashAgg is 2*R*(N-1)/N I/Os, which is cheaper than a sort for *any*
value of R, in both CPU and I/O costs. If we get it wrong, we have to
read and re-write more and more rows, which could eventually surpass the
sort cost, especially if we have growing transition state data from the
aggregate. I think the cost will be to re-write half of all rows already
written each time we double N. If we fail early because we got Ndistinct
wrong, this could be cheap; if we fail later on because of a growing
aggregate, it could easily be very expensive and quickly exceed the cost
of a sort.

My thought is to collect statistics about an aggregate at CREATE
AGGREGATE time: simply send the aggregate 100 data values and see
whether the output varies in size according to the input. If it does, we
take much greater care about selecting HashAgg plans with that
aggregate. That way we don't need the user to define the aggregate type
directly. This would only work with aggregates that return well-known
datatypes such as int or char.

So getting the number of groups correct would be critical to making this
work, but HashAgg could then be effective for even very large
aggregates.

Any holes in that thinking?
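To make the arithmetic concrete, here is a toy model of the cost
argument above. This is my own sketch, not PostgreSQL code; the function
names and the re-spill model (each doubling reads and re-writes half of
the rows already spilled, per the estimate above) are assumptions for
illustration only.

```python
def hashagg_io_cost(rows: int, batches: int) -> float:
    """I/Os for a HashAgg whose initial batch count N is correct.

    The 1/N fraction of rows hashing to the in-memory batch never
    touches disk; the remaining (N-1)/N fraction is written once and
    read back once, giving the 2*R*(N-1)/N figure from the post."""
    return 2 * rows * (batches - 1) / batches


def respill_io_cost(rows: int, batches: int, doublings: int) -> float:
    """Extra I/O when the initial guess N was too low (toy model).

    Assumes each doubling of N re-writes about half of the rows
    already written, i.e. read + write of spilled/2 rows per round."""
    spilled = rows * (batches - 1) / batches
    return doublings * 2 * (spilled / 2)


# With R = 1,000,000 rows and a correct guess of N = 4 batches,
# the base cost is 2 * 1e6 * 3/4 = 1.5M I/Os; two rounds of
# doubling add roughly another 1.5M I/Os in this model.
if __name__ == "__main__":
    print(hashagg_io_cost(1_000_000, 4))
    print(respill_io_cost(1_000_000, 4, 2))
```

Note how quickly the re-spill term erodes the advantage over a sort:
after only a couple of wrong doublings the extra I/O is comparable to
the base cost itself, which is why getting N right up front matters.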
Best Regards, Simon Riggs