On Sun, Jul 19, 2020 at 02:17:15PM -0700, Jeff Davis wrote:
> On Sat, 2020-07-18 at 21:15 -0400, Tom Lane wrote:
> > Jeff Davis <[email protected]> writes:
> > > What is your opinion about pessimizing the HashAgg disk costs (not
> > > affecting HashAgg plans expected to stay in memory)? Tomas Vondra
> > > presented some evidence that Sort had some better IO patterns in some
> > > cases that weren't easily reflected in a principled way in the cost
> > > model.
> >
> > Hm, was that in some other thread? I didn't find any such info
> > in a quick look through this one.
>
> https://www.postgresql.org/message-id/2df2e0728d48f498b9d6954b5f9080a34535c385.camel%40j-davis.com
FWIW, the two messages to look at are these:
1) report with initial data
https://www.postgresql.org/message-id/20200519151202.u2p2gpiawoaznsv2%40development
2) updated stats, with the block pre-allocation and tlist projection
https://www.postgresql.org/message-id/20200521001255.kfaihp3afv6vy6uq%40development
But I'm not convinced we actually need to tweak the costing - we've
ended up fixing two things (block pre-allocation and tlist projection),
and I think a lot of the differences in I/O patterns disappeared as a
result.
For sort, the stats of request sizes look like this:
type | bytes | count | pct
------+---------+-------+-------
RA | 131072 | 26034 | 59.92
RA | 16384 | 6160 | 14.18
RA | 8192 | 3636 | 8.37
RA | 32768 | 3406 | 7.84
RA | 65536 | 3270 | 7.53
RA | 24576 | 361 | 0.83
...
W | 1310720 | 8070 | 34.26
W | 262144 | 1213 | 5.15
W | 524288 | 1056 | 4.48
W | 1056768 | 689 | 2.93
W | 786432 | 292 | 1.24
W | 802816 | 199 | 0.84
...
And for the hashagg, it looks like this:
type | bytes | count | pct
------+---------+--------+--------
RA | 131072 | 200816 | 70.93
RA | 8192 | 23640 | 8.35
RA | 16384 | 19324 | 6.83
RA | 32768 | 19279 | 6.81
RA | 65536 | 19273 | 6.81
...
W | 1310720 | 18000 | 65.91
W | 524288 | 2074 | 7.59
W | 1048576 | 660 | 2.42
W | 8192 | 409 | 1.50
W | 786432 | 354 | 1.30
...
so it's actually a tad better than sort, because a larger proportion of
both reads and writes happens in larger chunks (128kB reads, 1280kB
writes). I think the device had the default read-ahead setting, which I
assume explains the 128kB reads.
As for the statistics of deltas between consecutive requests, for sort
they look like this:
type | block_delta | count | pct
------+-------------+-------+-------
RA | 256 | 13432 | 30.91
RA | 16 | 3291 | 7.57
RA | 32 | 3272 | 7.53
RA | 64 | 3266 | 7.52
RA | 128 | 2877 | 6.62
RA | 1808 | 1278 | 2.94
RA | -2320 | 483 | 1.11
RA | 28928 | 386 | 0.89
...
W | 2560 | 7856 | 33.35
W | 2064 | 4921 | 20.89
W | 2080 | 586 | 2.49
W | 30960 | 300 | 1.27
W | 2160 | 253 | 1.07
W | 1024 | 248 | 1.05
...
and for hashagg:
type | block_delta | count | pct
------+-------------+--------+-------
RA | 256 | 180955 | 63.91
RA | 32 | 19274 | 6.81
RA | 64 | 19273 | 6.81
RA | 128 | 19264 | 6.80
RA | 16 | 19203 | 6.78
RA | 30480 | 9835 | 3.47
At first this might look worse than sort, but a delta of 256 sectors
matches the 128kB from the request size stats, and it's a good match
(64% vs. 70%).
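To spell out why the 256-sector delta is the good case (assuming
512-byte sectors): the gap between consecutive read-aheads equals
exactly one 128kB request, i.e. each read starts where the previous
one ended, so the reads are back-to-back sequential.

```python
# A block_delta of 256 sectors between consecutive read-aheads means
# the next request starts exactly where the previous 128 kB request
# ended - i.e. the reads are sequential.  Assumes 512-byte sectors.

SECTOR_BYTES = 512
delta_sectors = 256
request_bytes = 131072  # dominant read size from the stats above

print(delta_sectors * SECTOR_BYTES == request_bytes)  # True: sequential
```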
There's a minor problem here, though - these stats were collected before
we fixed the tlist issue, so hashagg was spilling about 10x the amount
of data compared to sort+groupagg. So maybe the first thing we should
do, before contemplating changes to the costing, is collect fresh data.
I can do that, if needed.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services