On Thu, 2014-08-14 at 12:53 -0400, Tom Lane wrote:
> Oh? So if we have aggregates like array_agg whose memory footprint
> increases over time, the patch completely fails to avoid bloat?
Yes, in its current form.

> I might think a patch with such a limitation was useful, if it weren't
> for the fact that aggregates of that nature are a large part of the
> cases where the planner misestimates the table size in the first place.
> Any complication that we add to nodeAgg should be directed towards
> dealing with cases that the planner is likely to get wrong.

In my experience, the planner has a lot of difficulty estimating the cardinality unless it's coming from a base table without any operators above it (other than perhaps a simple predicate). This is probably a lot more common than array_agg problems, simply because array_agg is relatively rare compared with GROUP BY in general.

Also, there are cases where my patch should win against Sort even when it does go to disk: for instance, when the cardinality is high enough to exceed work_mem but the group sizes are also large. Sort has to deal with all of the tuples before it can group any of them, whereas HashAgg can group at least some of them along the way. Consider the skew case where the cardinality is 2M, work_mem fits 1M groups, and the input consists of the keys 1..1999999 mixed randomly into one billion zeros. (Aside: if the input is non-random, you may not see the skew value before the hash table fills up, in which case HashAgg is just as bad as Sort.)

That being said, we can hold out for an array_agg fix if desired. As I pointed out in another email, my proposal is compatible with the idea of dumping groups out of the hash table, and it does take some steps in that direction.

Regards,
	Jeff Davis

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
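[Editor's illustration] The skew argument above can be sketched with a small simulation. This is not the nodeAgg implementation, just a toy model of a capacity-limited hash aggregate; the cardinality, capacity, and skew-key counts are scaled-down stand-ins for the 2M / 1M / one-billion figures in the example.

```python
import random

def hashagg_absorbed(stream, max_groups):
    """Toy model of a capacity-limited hash aggregate: a tuple is
    grouped in memory ("absorbed") if its key is already in the table
    or the table still has room; otherwise it must be handled later
    ("spilled"), which is the work Sort would do for every tuple."""
    table = set()
    absorbed = spilled = 0
    for key in stream:
        if key in table:
            absorbed += 1
        elif len(table) < max_groups:
            table.add(key)
            absorbed += 1
        else:
            spilled += 1
    return absorbed, spilled

# Scaled-down skew case: 1,999 distinct rare keys plus a skew key (0)
# that dominates the input, with a table that holds only 1,000 groups.
random.seed(42)
stream = [0] * 100_000 + list(range(1, 2_000))
random.shuffle(stream)

absorbed, spilled = hashagg_absorbed(stream, max_groups=1_000)
print(f"absorbed={absorbed} spilled={spilled}")
```

Because the input is shuffled, the skew key lands in the table almost immediately, so nearly all of its repetitions are grouped in memory and only a fraction of the rare keys spill; Sort, by contrast, would have to push the entire input to disk before producing any group. If the skew key's tuples all arrived after the table filled (the non-random aside in the text), the absorbed count would collapse accordingly.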