On Thu, 2006-01-19 at 18:38 -0500, Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
> > This seems to lead to a super-geometric progression in the number of
> > files required,
>
> But we double the number of batches at each step, so there are going to
> be at most 20 or so levels, and that's only assuming a *horridly* wrong
> initial guess by the planner. In practice I think it's reasonable to
> assume at most a couple rounds of doubling. If you have more than that,
> the extra data-shuffling is going to exhaust your patience anyway.
What I'm saying is that if we start from 1 batch and move dynamically
upwards, we quickly get an unmanageable number of files. If instead we
start at a particular number N, we begin with N-1 files, then move to at
most 2N(N-1) files, etc. So we can only "get it wrong" and double the
number of batches about twice before we are swamped with files; i.e. if
we start at 1 we can only reasonably get to 8 batches. We should
therefore start at a number higher than 1, attempting to make an
accurate guess at the number of batches (N) required.

If we have R rows to aggregate and we get N correct, then the cost of
the HashAgg is 2*R*(N-1)/N I/Os, which is cheaper than a sort for *any*
value of R, in both CPU and I/O costs. If we get it wrong, we have to
read and re-write more and more rows, which could eventually surpass the
sort cost, especially if we have growing transition state data from the
aggregate. I think the cost will be to re-write half of all rows already
written each time we double N. If we fail early because we got Ndistinct
wrong, this could be cheap; if we fail later on because of a growing
aggregate, it could easily be very expensive and quickly exceed the cost
of a sort.

My thought is to collect statistics about an aggregate at CREATE
AGGREGATE time: simply send the aggregate 100 data values and see
whether the output varies in size according to the input. If it does, we
take much greater care about selecting HashAgg plans with that
aggregate. That way we don't need the user to define the aggregate type
directly. This would only work with aggregates that return well-known
datatypes such as int or char.

So getting the number of groups correct would be critical to making this
work, but HashAgg could then be effective for even very large
aggregates.

Any holes in that thinking?
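To make the arithmetic concrete, here is a toy model of the cost
argument above. This is my own sketch, not PostgreSQL code; the function
names and the re-spill model (each doubling reads and re-writes half of
the rows already spilled, per the estimate above) are assumptions for
illustration only.

```python
def hashagg_io_cost(rows: int, batches: int) -> float:
    """I/Os for a HashAgg whose initial batch count N is correct.

    The 1/N fraction of rows hashing to the in-memory batch never
    touches disk; the remaining (N-1)/N fraction is written once and
    read back once, giving the 2*R*(N-1)/N figure from the post."""
    return 2 * rows * (batches - 1) / batches


def respill_io_cost(rows: int, batches: int, doublings: int) -> float:
    """Extra I/O when the initial guess N was too low (toy model).

    Assumes each doubling of N re-writes about half of the rows
    already written, i.e. read + write of spilled/2 rows per round."""
    spilled = rows * (batches - 1) / batches
    return doublings * 2 * (spilled / 2)


# With R = 1,000,000 rows and a correct guess of N = 4 batches,
# the base cost is 2 * 1e6 * 3/4 = 1.5M I/Os; two rounds of
# doubling add roughly another 1.5M I/Os in this model.
if __name__ == "__main__":
    print(hashagg_io_cost(1_000_000, 4))
    print(respill_io_cost(1_000_000, 4, 2))
```

Note how quickly the re-spill term erodes the advantage over a sort:
after only a couple of wrong doublings the extra I/O is comparable to
the base cost itself, which is why getting N right up front matters.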
Best Regards, Simon Riggs