Thank you for the very detailed analysis. Comments inline.

On 7/15/22 7:12 PM, David Rowley wrote:
On Sat, 16 Jul 2022 at 10:40, Jonathan S. Katz <jk...@postgresql.org> wrote:
What I find interesting is the resistance to adding any documentation
around this feature to guide users in case they hit the regression. I
understand it can be difficult to provide guidance on issues related to
adjusting work_mem, but even just a hint in the release notes to say "if
you see a performance regression you may need to adjust work_mem" would
be helpful. This would help people who are planning upgrades to at least
know what to watch out for.

Looking back at the final graph in the blog [1], l see that work_mem
is a pretty surprising GUC.  I'm sure many people would expect that
setting work_mem to some size that allows the sort to be entirely done
in RAM would be the fastest way. And that does appear to be the case,
as 16GB was the only setting which allowed that.  However, I bet it
would surprise many people to see that 8GB wasn't 2nd fastest. Even
128MB was faster than 8GB!

Yeah that is interesting. And while some of those settings are less likely in the wild, I do think we are going to see larger and larger "work_mem" settings as instance sizes continue to grow. That said, your PG15 benchmarks are overall faster than the PG14, and that is what I am looking at in the context of this release.

Most likely that's because the machine I tested that on has lots of
RAM spare for kernel buffers which would allow all that disk activity
for batching not actually to cause physical reads or writes.  I bet
that would have looked different if I'd run a few concurrent sorts
with 128MB of work_mem. They'd all be competing for kernel buffers in
that case.

So I agree with Andres here. It seems weird to me to try to document
this new thing that I caused when we don't really make any attempt to
document all the other weird stuff with work_mem.

I can't argue with this.

My note on the documentation was primarily around to seeing countless user issues post-upgrade where queries that "once performed well no longer do so." I want to ensure that our users at least have a starting point to work on resolving the issues, even if they end up being very nuanced.

Perhaps a next step (and a separate step from this) is to assess the guidance we give on the upgrade page[1] about some common things they should check for. Then we can have the "boilerplate" there.

I think the problem can actually be worse with work_mem sizes in
regards to hash tables.  The probing phase of a hash join causes
memory access patterns that the CPU cannot determine which can result
in poor performance when the hash table size is larger than the CPU's
L3 cache size.  If you have fast enough disks, it seems realistic that
given the right workload (most likely much more than 1 probe per
bucket) that you could also get better performance by having lower
values of work_mem.

If we're going to document the generic context anomaly then we should
go all out and document all of the above, plus all the other weird
stuff I've not thought of.  However, I think, short of having an
actual patch to review, it might be better to leave it until someone
can come up with some text that's comprehensive enough to be worthy of
reading.  I don't think I could do the topic justice.  I'm also not
sure any wisdom we write about this would be of much use in the real
world given that its likely concurrency has a larger effect, and we
don't have much ability to control that.

Understood. I don't think that is fair to ask for this release, but don't sell your short on explaining the work_mem nuances.

FWIW, I think it would be better for us just to solve these problems
in code instead.  Having memory gating to control the work_mem from a
pool and teaching sort about CPU caches might be better than
explaining to users that tuning work_mem is hard.

+1. Again thank you for taking the time for the thorough explanation and of course, working on the patch and fixes.

Jonathan

[1] https://www.postgresql.org/docs/current/upgrading.html

Attachment: OpenPGP_signature
Description: OpenPGP digital signature

Reply via email to