On Mon, Aug 31, 2020 at 11:34:34PM -0700, Jeff Davis wrote:
On Sun, 2020-08-30 at 17:03 +0200, Tomas Vondra wrote:
So I'm wondering if the hashagg is not ignoring similar non-I/O costs
for the spilling case. In particular, the initial section computing
startup_cost seems to ignore that we may need to so some of the stuff
repeatedly - for example we'll repeat hash lookups for spilled
tuples,
and so on.

To fix that, we'd also need to change the cost of in-memory HashAgg,
right?


Why? I don't think we need to change costing of in-memory HashAgg. My
assumption was we'd only tweak startup_cost for cases with spilling by
adding something like (cpu_operator_cost * npartitions * ntuples).

The other thing is that sort seems to be doing only about half the
physical I/O (as measured by iosnoop) compared to hashagg, even
though
the estimates of pages / input_bytes are exactly the same. For
hashagg
the iosnoop shows 5921MB reads and 7185MB writes, while sort only
does
2895MB reads and 3655MB writes. Which kinda matches the observed
sizes
of temp files in the two cases, so the input_bytes for sort seems to
be
a bit overestimated.

Hmm, interesting.


FWIW I suspect some of this difference may be due to logical vs.
physical I/O. iosnoop only tracks physical I/O sent to the device, but
maybe we do much more logical I/O and it simply does not expire from
page cache for the sort. It might behave differently for larger data
set, longer query, ...

How reasonable is it to be making these kinds of changes to the cost
model right now? I think your analysis is solid, but I'm worried about
making more intrusive changes very late in the cycle.

I had originally tried to limit the cost model changes to the new plans
I am introducing -- that is, HashAgg plans expected to require disk.
That's why I came up with a rather arbitrary penalty.


I don't know. I certainly understand the desire not to change things
this late. OTOH I'm worried that we'll end up receiving a lot of poor
plans post release.


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply via email to