On 14 August 2014, 18:12, Tom Lane wrote:
> Jeff Davis <pg...@j-davis.com> writes:
>> HashJoin only deals with tuples. With HashAgg, you have to deal with a
>> mix of tuples and partially-computed aggregate state values. Not
>> impossible, but it is a little more awkward than HashJoin.
>
> Not sure that I follow your point. You're going to have to deal with that
> no matter what, no?
That is not how the patch works. Once the memory consumption hits work_mem, it keeps the already existing groups in memory and only stops creating new groups. For each tuple, hashagg does a lookup - if the group is already in memory, it performs the transition; otherwise it writes the tuple to disk (and does some batching, but that's mostly irrelevant here).

This way it's not necessary to dump the partially-computed states, and for fixed-size states it actually limits the amount of consumed memory. For variable-length aggregates (array_agg et al.) not so much.

> I guess in principle you could avoid the need to dump agg state to disk.
> What you'd have to do is write out tuples to temp files even when you
> think you've processed them entirely, so that if you later realize you
> need to split the current batch, you can recompute the states of the
> postponed aggregates from scratch (ie from the input tuples) when you get
> around to processing the batch they got moved to. This would avoid
> confronting the how-to-dump-agg-state problem, but it seems to have little
> else to recommend it. Even if splitting a batch is a rare occurrence,
> the killer objection here is that even a totally in-memory HashAgg would
> have to write all its input to a temp file, on the small chance that it
> would exceed work_mem and need to switch to batching.

Yeah, I think putting this burden on each hashagg is not a good thing. What I was thinking about is an automatic fall-back - try to do an in-memory hashagg, and when you hit the work_mem limit, see how far we are (have we scanned 10% or 90% of the tuples?) and decide whether to restart with batching.

But I think there's no single solution fixing all the possible cases. I think the patch proposed here is a solid starting point, which may be improved and extended by further patches. Eventually, what I think might work is this combination of approaches:

1) fixed-size states and states with serialize/deserialize methods
   => hashjoin-like batching (i.e. dumping both tuples and states)

2) variable-size states without serialize/deserialize
   => Jeff's approach (keep states in memory, dump tuples)
   => possibly with the rescan fall-back, for quickly growing states

Tomas

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
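To make the "keep existing groups, spill tuples of new groups" behaviour concrete, here is a minimal sketch. This is Python pseudocode, not the actual patch: the tuple format, the transition function, and a simple group-count cap standing in for the work_mem accounting are all simplified assumptions.

```python
def hashagg_pass(tuples, transition, init_state, max_groups):
    """One pass: build groups in memory until the limit is hit; after
    that, only advance groups already in memory and spill the rest."""
    assert max_groups >= 1  # otherwise no progress is ever made
    groups = {}
    spilled = []  # stands in for the temp file(s) on disk
    for key, value in tuples:
        if key in groups:
            # group already in memory: just run the transition function
            groups[key] = transition(groups[key], value)
        elif len(groups) < max_groups:
            # still under the limit: create a new group
            groups[key] = transition(init_state, value)
        else:
            # over the limit and group not in memory: spill the raw
            # tuple, never the partially-computed aggregate state
            spilled.append((key, value))
    return groups, spilled

def hashagg(tuples, transition, init_state, max_groups):
    """Repeat passes over the spilled tuples until none remain,
    emitting finished groups (and freeing their memory) each pass."""
    results = {}
    pending = list(tuples)
    while pending:
        groups, pending = hashagg_pass(
            pending, transition, init_state, max_groups)
        results.update(groups)
    return results
```

For example, summing `[("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]` with `max_groups = 2` spills only the `("c", 4)` tuple in the first pass and picks it up in the second. The point of the sketch is the one that matters for the serialize/deserialize discussion above: only input tuples ever hit the disk, so the scheme works even for aggregate states that cannot be serialized.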