Re: Default setting for enable_hashagg_disk

Tomas Vondra Thu, 25 Jun 2020 16:12:54 -0700

On Thu, Jun 25, 2020 at 02:28:02PM -0700, Jeff Davis wrote:

On Thu, 2020-06-25 at 15:56 -0400, Bruce Momjian wrote:

It is my understanding that spill of sorts is mostly read
sequentially,
while hash reads are random.  Is that right?  Is that not being
costed
properly?


I don't think there's a major problem with the cost model, but it could
probably use some tweaking.

Hash writes are random. The hash reads should be mostly sequential (for
large partitions it will be 128-block extents, or 1MB). The cost model
assumes 50% sequential and 50% random.


The important bit here is that while the logical writes are random,
those are effectively combined in page cache and the physical writes are
pretty sequential. So I think the cost model is fairly reasonable.

Note: Judging by iosnoop stats shared in the thread linked by Jeff.

Sorts are written sequentially and read randomly, but there's
prefetching to keep the reads from being too random. The cost model
assumes 75% sequential and 25% random.

Overall, the IO pattern is better for Sort, but not dramatically so.
Tomas Vondra did some nice analysis here:


https://www.postgresql.org/message-id/20200525021045.dilgcsmgiu4l5jpa@development

That resulted in getting the prealloc and projection patches in.

Regards,
        Jeff Davis


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Default setting for enable_hashagg_disk

Reply via email to