Hi,

On 06/14/15 22:21, Tom Lane wrote:
> Jeff Davis <pg...@j-davis.com> writes:
>> This patch tracks memory usage (at the block level) for all memory
>> contexts. Individual palloc()s aren't tracked; only the new blocks
>> allocated to the memory context with malloc().
>> ...
>> My general answer to the performance concerns is that they aren't a
>> reason to block this patch, unless someone has a suggestion about how to
>> fix them. Adding one field to a struct and a few arithmetic operations
>> for each malloc() or realloc() seems reasonable to me.
>
> Maybe, but this here is a pretty weak argument:
>
>> The current state, where HashAgg just blows up the memory, is just not
>> reasonable, and we need to track the memory to fix that problem.
>
> Meh. HashAgg could track its memory usage without loading the entire
> system with a penalty.

+1 to a solution like that, although I don't think that's doable without digging the information out of the memory contexts somehow.
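
What I mean by digging the information out of the memory contexts is something like this - a toy sketch, not the actual patch and not the real MemoryContextData, assuming each context keeps a counter of the bytes it has malloc'ed for its blocks (the names are made up):

#include <stddef.h>

/*
 * Hypothetical, simplified context. The only accounting is a per-context
 * count of bytes obtained from malloc() for the context's blocks, which
 * is essentially what Jeff's patch adds.
 */
typedef struct MemCtx
{
    size_t          blocks_allocated;   /* bytes malloc'ed for blocks */
    struct MemCtx  *firstchild;
    struct MemCtx  *nextsibling;
} MemCtx;

/* Total block-level allocations of a context and all of its children. */
size_t
MemCtxTreeAllocated(const MemCtx *ctx)
{
    size_t          total = ctx->blocks_allocated;
    const MemCtx   *child;

    for (child = ctx->firstchild; child != NULL; child = child->nextsibling)
        total += MemCtxTreeAllocated(child);

    return total;
}

HashAgg could then compare the total for its aggregation context (and children) against work_mem every few tuples - but that only works if the contexts maintain the counter in the first place, which is pretty much what the patch does.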

> Moreover, this is about fourth or fifth down the
> list of the implementation problems with spilling hash aggregation to
> disk.  It would be good to see credible solutions for the bigger issues
> before we buy into adding overhead for a mechanism with no uses in core.

I don't think so. If we don't have an acceptable solution for tracking memory in hashagg, there's not really much point in doing any of the following steps. That's why Jeff is tackling this problem first.

Also, Jeff already posted a PoC of the memory-bounded hashagg, although it only worked for aggregates with a fixed-size state and didn't handle cases like array_agg, where the state grows, all that well. Maybe the improvements to aggregate functions David Rowley proposed last week might help there, as they'd allow spilling those states to disk.
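
Just to be explicit about what spilling the states might look like, here's a very rough sketch - a toy count-style aggregate with a linear scan instead of a real hash table, made-up names, error handling omitted, nothing like the actual PoC. Once the accounted memory would exceed the limit, the fixed-size states are written out to a temp file and the table is reset; the spilled states then have to be merged in a later pass:

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct GroupState
{
    int64_t     key;
    int64_t     count;          /* fixed-size transition state */
} GroupState;

typedef struct ToyHashAgg
{
    GroupState *groups;         /* pre-allocated with mem_limit bytes */
    size_t      ngroups;
    size_t      mem_used;       /* what the memory accounting would report */
    size_t      mem_limit;      /* work_mem, in bytes */
    FILE       *spill;          /* temp file holding spilled states */
} ToyHashAgg;

void
toy_init(ToyHashAgg *agg, size_t mem_limit)
{
    agg->groups = malloc(mem_limit);
    agg->ngroups = 0;
    agg->mem_used = 0;
    agg->mem_limit = mem_limit;
    agg->spill = NULL;
}

/* Write all in-memory states to the spill file and reset the table. */
static void
toy_spill(ToyHashAgg *agg)
{
    if (agg->spill == NULL)
        agg->spill = tmpfile();
    fwrite(agg->groups, sizeof(GroupState), agg->ngroups, agg->spill);
    agg->ngroups = 0;
    agg->mem_used = 0;
}

/* Advance the aggregate for one input key, spilling when over the limit. */
void
toy_advance(ToyHashAgg *agg, int64_t key)
{
    size_t      i;

    for (i = 0; i < agg->ngroups; i++)
    {
        if (agg->groups[i].key == key)
        {
            agg->groups[i].count++;
            return;
        }
    }

    /* new group - spill first if it would not fit under the limit */
    if (agg->mem_used + sizeof(GroupState) > agg->mem_limit)
        toy_spill(agg);

    agg->groups[agg->ngroups].key = key;
    agg->groups[agg->ngroups].count = 1;
    agg->ngroups++;
    agg->mem_used += sizeof(GroupState);
}

For something like array_agg the states obviously can't be fwrite'd like this, which is exactly where the serialize/deserialize support would come in.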

So while the approach proposed by Jeff may turn out not to be the right one, I think memory accounting needs to be solved first (even if it's not committed until the whole feature is ready).

>> Others have also mentioned that we might want to use this mechanism
>> to track memory for other operators, like Sort or HashJoin, which
>> might be simpler and more accurate.
>
> That would imply redefining the meaning of work_mem for those
> operations; it would suddenly include a lot more overhead space than
> it used to, causing them to spill to disk much more quickly than
> before, unless one changes the work_mem setting to compensate. Not
> sure that people would like such a change.

Don't we tweak the meaning of work_mem over time anyway? Maybe not to such an extent, but the 9.5 hashjoin improvements, for example, certainly change it by packing the tuples more densely, sizing the buckets differently, etc. It's true those changes go in the other direction (i.e. batching less often), though.

OTOH this would make the accounting more accurate (eliminating, for example, the differences between cases with different amounts of allocation overhead due to tuple width). Wouldn't that be a good thing, even if people need to tweak work_mem? Isn't that kind of acceptable when upgrading to the next major version?
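
To make the overhead point a bit more concrete: with one palloc per tuple, aset.c rounds each small request up to the next power of two and adds a chunk header, so (roughly, on a 64-bit non-assert build) a 72-byte tuple occupies about 128 + 16 bytes - nearly twice the payload - while a 120-byte tuple in the same 128-byte chunk wastes almost nothing. The dense packing in the 9.5 hashjoin mostly eliminates that; a simplified sketch of the idea (not the actual nodeHash.c code, sizes made up):

#include <stdlib.h>
#include <string.h>

#define DENSE_CHUNK_SIZE (32 * 1024)    /* made-up size, for illustration */

typedef struct DenseChunk
{
    struct DenseChunk *next;
    size_t      used;
    char        data[DENSE_CHUNK_SIZE];
} DenseChunk;

/*
 * Copy a tuple into the current chunk with a simple bump-pointer,
 * starting a new chunk when the current one fills up. No per-tuple
 * header, no power-of-two rounding.
 */
void *
dense_copy(DenseChunk **head, const void *tuple, size_t len)
{
    DenseChunk *chunk = *head;
    void       *dst;

    if (len > DENSE_CHUNK_SIZE)
        return NULL;            /* oversized tuples not handled in this sketch */

    if (chunk == NULL || chunk->used + len > DENSE_CHUNK_SIZE)
    {
        chunk = malloc(sizeof(DenseChunk));
        if (chunk == NULL)
            return NULL;
        chunk->next = *head;
        chunk->used = 0;
        *head = chunk;
    }

    dst = chunk->data + chunk->used;
    memcpy(dst, tuple, len);
    chunk->used += len;
    return dst;
}

With block-level accounting both styles get charged for what was actually malloc'ed, so the numbers stop depending on how lucky the tuple width happens to be.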

> An even bigger problem is that it would pretty much break the logic
> around LACKMEM() in tuplesort.c, which supposes that spilling
> individual tuples to disk is a reliable and linear way to decrease
> the accounted-for memory. Individual pfree's would typically not
> decrease the accounted total in this implementation. Depending on
> what you have in mind for the spill-to-disk behavior, this issue
> could be a fail for HashAgg as well, which is another reason not to
> commit this patch in advance of seeing the use-case.

That's true, but I think the plan was always to wait for the whole feature, and only then commit all the pieces.
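
Just to spell out the difference for anyone following along (simplified, hypothetical names - tuplesort.c really just adjusts availMem and LACKMEM() checks whether it went negative): with per-chunk accounting every pfree immediately lowers the total, so spilling tuples reliably gets you back under the limit; with block-level accounting a pfree'd chunk typically just goes back on the context's freelist while the malloc'ed block stays put, so the total only drops when whole blocks are actually returned, e.g. at context reset.

#include <stddef.h>
#include <stdbool.h>

typedef struct MemAccounting
{
    size_t      chunk_total;    /* per-chunk total, tuplesort-style */
    size_t      block_total;    /* per-block total, as in the patch */
    size_t      limit;          /* work_mem */
} MemAccounting;

/* Rough equivalent of LACKMEM(): are we over the limit? */
bool
over_limit(const MemAccounting *acct, bool use_blocks)
{
    return (use_blocks ? acct->block_total : acct->chunk_total) > acct->limit;
}

/*
 * A tuple is pfree'd: the per-chunk total drops right away, but the
 * per-block total does not - the chunk goes back on the freelist while
 * the underlying malloc'ed block is still held by the context.
 */
void
account_pfree(MemAccounting *acct, size_t chunk_size)
{
    acct->chunk_total -= chunk_size;
    /* acct->block_total intentionally unchanged */
}

/* Only freeing a whole block lowers the per-block total. */
void
account_block_free(MemAccounting *acct, size_t block_size)
{
    acct->block_total -= block_size;
}

So whatever the spill logic ends up looking like, it can't rely on the block-level number dropping tuple by tuple.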


--
Tomas Vondra                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

