Hi,
On 8/11/23 14:05, Merlin Moncure wrote:
On Thu, Jul 27, 2023 at 8:28 AM David Geier <geidav...@gmail.com> wrote:
Hi,
On 6/7/23 23:37, Andres Freund wrote:
> I think we're starting to hit quite a few limits related to the process model,
> particularly on bigger machines. The overhead of cross-process context
> switches is inherently higher than switching between threads in the same
> process - and my suspicion is that that overhead will continue to
> increase. Once you have a significant number of connections we end up spending
> a *lot* of time in TLB misses, and that's inherent to the process model,
> because you can't share the TLB across processes.
Another problem I haven't seen mentioned yet is the excessive kernel
memory usage because every process has its own set of page table entries
(PTEs). Without huge pages the amount of wasted memory can be huge if
shared buffers are big.
Hm, I noted this upthread, but asking again: does this help
interactions with the operating system and make OOM kill
situations less likely? These things are the bane of my existence,
and I'm having a hard time finding a solution that prevents them other
than running pgbouncer and lowering max_connections, which adds
complexity. I suspect I'm not the only one dealing with this.
What's really scary about these situations is they come without
warning. Here's a pretty typical example per sar -r.
The conjecture here is that lots of idle connections leave the server
with less memory available than it appears to have, and sudden
transient demands can cause it to destabilize.
It does, in the sense that your server will have more memory available
if you have many long-lived connections around. Every connection has
less kernel memory overhead, if you will. Of course even then a runaway
query will be able to invoke the OOM killer. The unfortunate thing with
the OOM killer is that, in my experience, it often kills the
checkpointer. That's because the checkpointer will touch all of shared
buffers over time, which inflates its resident memory and makes it
likely to get selected by the OOM killer. Have you tried disabling
memory overcommit?
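
To put a rough number on the per-connection kernel memory overhead, here is a back-of-the-envelope sketch in Python. It assumes x86-64 with 8-byte page table entries, 4 KiB base pages versus 2 MiB huge pages, and the worst case where every backend eventually touches every page of shared buffers. It counts only the lowest-level entries and ignores the upper page table levels. The 32 GiB of shared buffers and the 1000 backends are made-up illustrative values, not measurements from this thread.

```python
# Worst-case estimate of page table memory spent on mapping shared buffers.
# Assumptions (not from the thread): x86-64, 8 bytes per page table entry,
# every backend has touched every page, upper page table levels ignored.

KiB = 1024
MiB = 1024 * KiB
GiB = 1024 * MiB

PAGE_4K = 4 * KiB    # base page size
PAGE_2M = 2 * MiB    # huge page size
ENTRY_BYTES = 8      # size of one page table entry on x86-64


def page_table_bytes(shared_buffers_bytes, page_size, backends):
    """Bytes of lowest-level page table entries summed over all backends."""
    # Ceiling division: a partially used page still needs an entry.
    entries_per_backend = -(-shared_buffers_bytes // page_size)
    return entries_per_backend * ENTRY_BYTES * backends


# Hypothetical configuration, chosen only for illustration.
shared_buffers = 32 * GiB
backends = 1000

small = page_table_bytes(shared_buffers, PAGE_4K, backends)
huge = page_table_bytes(shared_buffers, PAGE_2M, backends)

print(f"4 KiB pages: {small / GiB:.1f} GiB of page tables")
print(f"2 MiB pages: {huge / MiB:.1f} MiB of page tables")
```

Under these assumptions each backend needs 64 MiB of page table entries with 4 KiB pages, so 1000 backends add up to 62.5 GiB, versus 125 MiB in total with 2 MiB huge pages. With threads the address space, and therefore the page tables, would be shared, so the cost would be paid once rather than per connection.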
--
David Geier
(ServiceNow)