On Thu, 8 Jun 2023 at 14:44, Hannu Krosing <han...@google.com> wrote:
>
> On Thu, Jun 8, 2023 at 2:15 PM Matthias van de Meent
> <boekewurm+postg...@gmail.com> wrote:
> >
> > On Thu, 8 Jun 2023 at 11:54, Hannu Krosing <han...@google.com> wrote:
> > >
> > > > This part was touched on in the "AMA with a Linux Kernel Hacker"
> > > > Unconference session, where he mentioned that he had proposed a
> > > > 'mshare' syscall for this.
> > >
> > > > So maybe a more fruitful way of fixing the perceived issues with
> > > > the process model is to push for small changes in Linux to overcome
> > > > these, avoiding a wholesale rewrite?
> >
> > We support not just Linux, but also Windows and several (?) BSDs. I'm
> > not against pushing Linux to make things easier for us, but Linux is
> > an open source project, too, where someone needs to put in time to get
> > the shiny things that you want. And I'd rather see our time spent in
> > PostgreSQL, as Linux is only used by a part of our user base.
>
> Do we have any statistics for the distribution of our user base ?
>
> My gut feeling says that for performance-critical use the non-Linux
> share is in the low single digits at best.
>
> My fascination with open source started with the realisation that
> instead of workarounds you can actually fix the problem at the source.
> So if the specific problem is that the TLB is not shared, then the
> proper fix is making it shared instead of rewriting everything else to
> get around it. None of us is limited to writing code in PostgreSQL
> only. If the easiest and most generic fix can be done in Linux, then
> so be it.

The TLB is a CPU hardware facility, not something that the OS can decide
to share between processes. While sharing (some) OS memory management
facilities across processes might be possible (as you mention, that
mshare syscall would be an example), that doesn't solve the issue of
the hardware not supporting sharing TLB entries across processes. We'd
use less kernel memory for memory management, but the CPU would still
stall on TLB misses every time we switch processes on the CPU (unless
we somehow were able to use non-process-namespaced TLB entries, which
would make our processes not meaningfully different from threads
w.r.t. address space).

> > >
> > > Maybe we can already remove the distinction between static and dynamic
> > > shared memory ?
> >
> > That sounds like a bad idea: dynamic shared memory is more expensive
> > to maintain than our static shared memory systems, not least
> > because DSM is not guaranteed to share the same addresses in each
> > process' address space.
>
> Then this too needs to be fixed

That needs kernel facilities in all (most?) supported OSes, and I
think that's much more work than moving to threads:
The kernel places allocations at effectively arbitrary locations in
the available address space, so a DSM segment that is allocated in one
backend might overlap with unshared allocations of a different
backend, giving those backends conflicting address spaces.
The only way to make that work is to have a single shared address
space in which some backends simply don't have a given allocation
mapped; that seems only slightly more isolated than threads, and much
more effort to maintain.
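
To illustrate the address problem, here is a minimal sketch. It is not
PostgreSQL code: every demo_* name is made up, a temporary file stands
in for a DSM segment, and two mappings inside one process stand in for
two backends. The kernel picks a different address for each mapping, so
only a segment-relative offset can be handed from one to the other, and
each side has to translate it before use (which is, as far as I
understand it, the job dsa_get_address does for real DSA areas).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical stand-in for one backend's view of a DSM segment. */
typedef struct demo_segment
{
    void   *base;   /* address the kernel chose for this mapping */
    size_t  size;
} demo_segment;

/* Map the file behind fd as shared memory; returns 0 on success. */
int demo_attach(int fd, size_t size, demo_segment *seg)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    if (p == MAP_FAILED)
        return -1;
    seg->base = p;
    seg->size = size;
    return 0;
}

/* Translate a segment-relative offset into a pointer for this mapping. */
void *demo_get_address(const demo_segment *seg, size_t offset)
{
    assert(offset < seg->size);
    return (char *) seg->base + offset;
}

/*
 * Attach the same segment twice, write through the first mapping and read
 * through the second using only the offset.  Returns 1 if the two mappings
 * sit at different addresses and the data is still visible, else 0.
 */
int demo_roundtrip(void)
{
    const size_t size = 4096;
    const size_t offset = 128;
    demo_segment a, b;
    int          ok = 0;
    FILE        *f = tmpfile();

    if (f == NULL)
        return 0;
    if (ftruncate(fileno(f), (off_t) size) == 0 &&
        demo_attach(fileno(f), size, &a) == 0)
    {
        if (demo_attach(fileno(f), size, &b) == 0)
        {
            strcpy(demo_get_address(&a, offset), "tuple");
            ok = a.base != b.base &&
                strcmp(demo_get_address(&b, offset), "tuple") == 0;
            munmap(b.base, b.size);
        }
        munmap(a.base, a.size);
    }
    fclose(f);
    return ok;
}
```

Calling demo_roundtrip() exercises both mappings. A raw pointer taken
from mapping a would be meaningless in mapping b, which is why every
access has to go through the offset translation.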

> > > Though I already heard some complaints at the conference discussions
> > > that having the dynamic version available has made some developers
> > > sloppy in using it resulting in wastefulness.
> >
> > Do you know any examples of this wastefulness?
>
> No. Just somebody mentioned it in a hallway conversation and the rest
> of the developers present mumbled approvingly :)

The only "wastefulness" that I know of in our use of DSM is the queue,
and that's by design: We need to move data from a backend's private
memory to memory that's accessible to other backends; i.e. shared
memory. You can't do that without copying or exposing your private
memory.
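
As a rough sketch of why that copy is unavoidable (this is not the
actual queue code; the demo_ring type and its size are invented for
illustration, and in real use the ring would live in a shared segment):
the sender copies bytes out of its private memory into a fixed-size
ring, and once the ring is full it cannot make progress until the
receiver has copied data out again.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_RING_SIZE 8    /* deliberately tiny, hypothetical size */

/* Hypothetical byte queue; would be placed in shared memory in practice. */
typedef struct demo_ring
{
    size_t head;            /* total bytes written so far */
    size_t tail;            /* total bytes read so far */
    char   data[DEMO_RING_SIZE];
} demo_ring;

/* Copy up to len bytes from private memory into the ring.
 * Returns the number of bytes accepted, which is less than len
 * when the ring fills up. */
size_t demo_ring_send(demo_ring *r, const char *src, size_t len)
{
    size_t free_bytes = DEMO_RING_SIZE - (r->head - r->tail);
    size_t n = len < free_bytes ? len : free_bytes;

    for (size_t i = 0; i < n; i++)
        r->data[(r->head + i) % DEMO_RING_SIZE] = src[i];
    r->head += n;
    return n;
}

/* Copy up to len bytes out of the ring into the receiver's private memory.
 * Returns the number of bytes delivered. */
size_t demo_ring_receive(demo_ring *r, char *dst, size_t len)
{
    size_t avail = r->head - r->tail;
    size_t n = len < avail ? len : avail;

    for (size_t i = 0; i < n; i++)
        dst[i] = r->data[(r->tail + i) % DEMO_RING_SIZE];
    r->tail += n;
    return n;
}
```

Sending a 12-byte message through this 8-byte ring takes two rounds:
the first demo_ring_send() accepts only 8 bytes, and the remaining 4 go
through only after demo_ring_receive() has drained the ring. Every byte
is copied twice, once in and once out.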

> > > Still we should be focusing our attention at solving the issues and
> > > not at "moving to threads" and hoping this will fix the issues by
> > > itself.
> >
> > I suspect that it is much easier to solve some of the issues when
> > working in a shared address space.
>
> Probably. But it would come at the cost of needing to change a lot of
> other parts of PostgreSQL.
>
> I am not against making code cleaner for potential threaded model
> support. I am just a bit sceptical about the actual switch being easy,
> or doable in the next 10-15 years.

PostgreSQL only has a support cycle of 5 years. 5 years after the last
release of un-threaded PostgreSQL we could drop support for "legacy"
extension models that don't support threading.

> > E.g. resizing shared_buffers is difficult right now due to the use of
> > a static allocation of shared memory, but if we had access to a single
> > shared address space, it'd be easier to do any cleanup necessary for
> > dynamically increasing/decreasing its size.
>
> This again could be done with shared memory mapping + dynamic shared memory.

Yes, but as I said, that's much more difficult than lock and/or atomic
operations on shared-between-backends static variables, because if
these variables aren't in shared memory you need to pass the messages
to update the variables to all backends.
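
A minimal sketch of the shared-variable side of that comparison, using
C11 atomics and POSIX threads as a stand-in for backends that share an
address space. All demo_* names are hypothetical and nothing here is
existing PostgreSQL code. The point is only that a single atomic store
becomes visible to every worker, with no per-worker message.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

#define DEMO_MAX_WORKERS 16

/* Hypothetical setting that every worker needs to observe. */
static atomic_int demo_setting;
/* Number of workers that have noticed the new value. */
static atomic_int demo_seen;

/* Wait until the setting differs from the old value, then record it. */
static void *demo_worker(void *arg)
{
    int old_value = *(int *) arg;

    while (atomic_load(&demo_setting) == old_value)
        sched_yield();
    atomic_fetch_add(&demo_seen, 1);
    return NULL;
}

/*
 * Start up to nworkers workers, publish new_value with one atomic store,
 * and return how many workers observed the change (-1 on bad arguments).
 */
int demo_publish(int nworkers, int old_value, int new_value)
{
    pthread_t workers[DEMO_MAX_WORKERS];
    int       started = 0;

    if (old_value == new_value)
        return -1;          /* workers would wait forever */
    if (nworkers > DEMO_MAX_WORKERS)
        nworkers = DEMO_MAX_WORKERS;

    atomic_store(&demo_setting, old_value);
    atomic_store(&demo_seen, 0);

    for (int i = 0; i < nworkers; i++)
    {
        if (pthread_create(&workers[started], NULL, demo_worker,
                           &old_value) != 0)
            break;
        started++;
    }

    /* The only "notification": one store to the shared variable. */
    atomic_store(&demo_setting, new_value);

    for (int i = 0; i < started; i++)
        pthread_join(workers[i], NULL);
    return atomic_load(&demo_seen);
}
```

For example, demo_publish(4, 128, 256) starts four workers and changes
the setting once. With process-local variables the same update would
need a message to each backend, plus handling for backends that start
or exit while the messages are in flight.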

> > Same with parallel workers - if we have a shared address space, the
> > workers can pass any sized objects around without being required to
> > move the tuples through DSM and waiting for the leader process to
> > empty that buffer when it gets full.
>
> Larger shared memory :)
>
> Same for shared plan cache and shared schema cache.

Shared memory in processes is not free, if only because the TLB gets
saturated much faster.

> > Sure, most of that is probably possible with DSM as well, it's just
> > that I see a lot more issues that you need to take care of when you
> > don't have a shared address space (such as the pointer translation we
> > do in dsa_get_address).
>
> All of the above seems to point to the need for a single thing: having
> an option for shared memory mappings.
>
> So let's focus on fixing things with minimal required change.

That seems logical, but not all kernels support dynamic shared memory
mappings. And, as for your suggested solution, I couldn't find much
info on this mshare syscall (or its successor mmap/VM_SHARED_PT), nor
on whether it would actually fix the TLB issue.

> And this would not have an adverse effect on systems that cannot
> share mappings; they just won't become faster. And they are all welcome
> to add the option for shared mappings too if they see enough value in
> it.
>
> It could sound like the same thing as the threaded model, but it should
> need far fewer changes, and likely no changes for most out-of-tree
> extensions.

We can't expect the kernel to fix everything for us - that's what we
build PostgreSQL for. Where possible, we do want to rely on OS
primitives, but I'm not sure that it would be easy to share memory
address mappings across backends, for reasons including the above
("That needs kernel facilities in all [...] more effort to maintain").

Kind regards,

Matthias van de Meent
Neon, Inc.

