On Thu, Apr 2, 2026, 10:38 Tom Lane <[email protected]> wrote: > > I did some bisecting using the attached simplified test case, and found > that the query execution time jumps from circa 60ms to circa 7500ms here: > > 1f39bce021540fde00990af55b4432c55ef4b3c7 is the first bad commit > commit 1f39bce021540fde00990af55b4432c55ef4b3c7 > Author: Jeff Davis <[email protected]> > Date: Wed Mar 18 15:42:02 2020 -0700 > > Disk-based Hash Aggregation. > > While performing hash aggregation, track memory usage when adding new > groups to a hash table. If the memory usage exceeds work_mem, enter > "spill mode". > > (Times quoted are on a Mac M4 Pro, but in assert-enabled builds so > maybe not directly comparable to production.) > > I'm bemused as to why: the test case has work_mem set high enough that > we shouldn't be triggering spill mode, so why did this change affect > it at all? >
Even stranger, the more spills induced via smaller work_mem the faster it goes. This suggests something getting more expensive as the hash table gets larger. Significantly more, like O(n^2) or worse. I wonder if it is the size of the hash table itself (entry count) or the size of the entries? Does a table with one row matching each entry have the problem or only when the hash bucket is hit multiple times and values aggregated? Why is the reported size used so much larger with the custom function? I have some experiments in mind that could answer some of these. Tracking hash table memory usage dynamically can be tricky. I would imagine that user defined aggregates make it more difficult. > regards, tom lane > >
