On Sat, Jul 25, 2020 at 5:05 PM Tomas Vondra
<[email protected]> wrote:
> I'm not sure what you mean by "reported memory usage doesn't reflect the
> space used for transition state"? Surely it does include that, we've
> built the memory accounting stuff pretty much exactly to do that.
>
> I think it's pretty clear what's happening - in the sorted case there's
> only a single group getting new values at any moment, so when we decide
> to spill we'll only add rows to that group and everything else will be
> spilled to disk.
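
To make that concrete, here is a toy model of the spill behavior being
described (plain Python with made-up sizes, not actual PostgreSQL code):
once the hash table reaches the limit, rows for groups already in the
table are still advanced in memory, while rows for unseen groups go to a
spill file. With sorted input only one group is live at a time, so
in-memory usage stays pinned at the limit:

# Toy model of hashagg spilling -- illustrative only, not PostgreSQL code.
# Assumption: each group's transition state costs one "unit" of memory.

MEM_LIMIT = 4  # stand-in for work_mem, measured in groups

def toy_hashagg(rows):
    table = {}     # in-memory hash table: group key -> row count
    spilled = []   # rows for groups we couldn't fit, "written to disk"
    for key in rows:
        if key in table:
            table[key] += 1           # existing group: update in memory
        elif len(table) < MEM_LIMIT:
            table[key] = 1            # still room: initialize a new group
        else:
            spilled.append(key)       # full: spill rows for unseen groups
    return table, spilled

# Sorted input: one group at a time; after the limit is hit, every new
# group's rows are spilled and in-memory usage stays at MEM_LIMIT.
sorted_rows = [k for k in range(8) for _ in range(3)]
print(toy_hashagg(sorted_rows))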
Right.

> In the unsorted case however we manage to initialize all groups in the
> hash table, but at that point the groups are tiny and fit into work_mem.
> As we process more and more data the groups grow, but we can't evict
> them - at the moment we don't have that capability. So we end up
> processing everything in memory, but significantly exceeding work_mem.

work_mem was set to 200MB, which is more than the reported "Peak Memory
Usage: 1605334kB". So either the random case significantly exceeds
work_mem and the "Peak Memory Usage" accounting is wrong (because it
doesn't report this excess), or the random case really doesn't exceed
work_mem but has a surprising advantage over the sorted case.

-- 
Peter Geoghegan
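
The unsorted case Tomas describes can be sketched the same way (again a
toy Python model with invented numbers, not PostgreSQL code): every group
is initialized while its transition state is still tiny, so nothing needs
to spill at first; the states then grow with no way to evict them, and
peak usage sails past the limit. Whether the reported "Peak Memory Usage"
captures that growth is exactly the question above:

# Toy model of the unsorted case -- illustrative only, not PostgreSQL code.
# Assumption: a transition state starts at size 1 and grows by 1 per row
# (think array_agg-style growth).

import random

MEM_LIMIT = 300    # stand-in for work_mem, in arbitrary units

def toy_hashagg_random(rows):
    sizes = {}     # group key -> current transition state size
    spilled = 0
    peak = used = 0
    for key in rows:
        if key in sizes:
            sizes[key] += 1    # initialized groups grow in place: no eviction
            used += 1
        elif used + 1 <= MEM_LIMIT:
            sizes[key] = 1     # new group, still fits while states are tiny
            used += 1
        else:
            spilled += 1       # only rows for unseen groups can spill
        peak = max(peak, used)
    return peak, spilled

random.seed(1)
rows = [random.randrange(50) for _ in range(5000)]
peak, spilled = toy_hashagg_random(rows)
print(f"peak={peak} vs limit={MEM_LIMIT}, spilled rows={spilled}")
# With random input all 50 groups are typically initialized before the
# limit is reached, so almost nothing spills and peak far exceeds the limit.

If the accounting does include the grown transition states (as the quoted
text says it was built to do), the random case's reported peak should land
well above work_mem rather than below it.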
