Hi,

On 2026-02-09 20:45:28 +0530, Ashutosh Bapat wrote:
> 2. Address space reservation for shared memory
> ============================================
>
> Currently the shared memory layout is designed to pack everything tight
> together, leaving no space between mappings for resizing. Here is how it
> looks for one mapping in /proc/$PID/maps, /dev/zero represents the
> anonymous shared memory we talk about:
>
>     00400000-00490000 /path/bin/postgres
>     ...
>     012d9000-0133e000 [heap]
>     7f443a800000-7f470a800000 /dev/zero (deleted)
>     7f470a800000-7f471831d000 /usr/lib/locale/locale-archive
>     7f4718400000-7f4718401000 /usr/lib64/libstdc++.so.6.0.34
>     ...
>
> Make the layout more dynamic via splitting every shared memory segment
> into two parts:
>
> * An anonymous file, which actually contains shared memory content.
>   Such an anonymous file is created via memfd_create, it lives in
>   memory, behaves like a regular file and is semantically equivalent to
>   anonymous memory allocated via mmap with MAP_ANONYMOUS.
>
> * A reservation mapping, whose size is much larger than the required
>   shared segment size. This mapping is created with flag MAP_NORESERVE
>   (to not count the reserved space against memory limits). The anonymous
>   file is mapped into this reservation mapping.
>
> If we have to change the address maps while resizing the shared buffer
> pool, that needs to be done in Postmaster too, so that the new
> backends will inherit the resized address space from the Postmaster.
> However, Postmaster is not involved in the ProcSignalBarrier mechanism
> and we don't want it to spend time in things other than its core
> functionality. To achieve that, maximum required address space maps are
> set up upfront with read and write access when starting the server. When
> resizing the buffer pool only the backing file object is resized from
> the coordinator. This also makes the ProcSignalBarrier handling code
> light for backends other than the coordinator.
>
> The resulting layout looks like this:
>
>     00400000-00490000 /path/bin/postgres
>     ...
>     3f526000-3f590000 rw-p [heap]
>     7fbd827fe000-7fbd8bdde000 rw-s /memfd:main (deleted) -- anon file
>     7fbd8bdde000-7fbe82800000 ---s /memfd:main (deleted) -- reservation
>     7fbe82800000-7fbe90670000 r--p /usr/lib/locale/locale-archive
>     7fbe90800000-7fbe90941000 r-xp /usr/lib64/libstdc++.so.6.0.34
>
> To resize a shared memory segment in this layout it's possible to use
> ftruncate on the memory mapped file.
>
> This approach also does not impact the actual memory usage as reported by
> the kernel.

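For reference, the quoted scheme can be sketched roughly as follows. This is
only a minimal illustration, assuming Linux with glibc 2.27 or later for
memfd_create; the ShmemSegment struct and the segment_create/segment_resize
helpers are hypothetical names, not code from the patch. It maps the memfd
once over the whole reserved range with MAP_NORESERVE and later resizes only
the backing file with ftruncate:

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical bookkeeping for one resizable shared memory segment. */
typedef struct ShmemSegment
{
    int     fd;        /* memfd backing the segment */
    char   *base;      /* start of the reserved address range */
    size_t  reserved;  /* size of the address space reservation */
    size_t  used;      /* current size of the backing file */
} ShmemSegment;

/*
 * Create the anonymous file with its initial size and map it over the
 * full reservation. Only [base, base + used) is backed by the file.
 * Returns 0 on success, -1 on failure.
 */
int
segment_create(ShmemSegment *seg, size_t initial, size_t reserved)
{
    if (initial > reserved)
        return -1;

    seg->fd = memfd_create("main", 0);
    if (seg->fd < 0)
        return -1;

    if (ftruncate(seg->fd, (off_t) initial) != 0)
    {
        close(seg->fd);
        return -1;
    }

    /* MAP_NORESERVE: do not account the reservation against memory limits. */
    seg->base = mmap(NULL, reserved, PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_NORESERVE, seg->fd, 0);
    if (seg->base == MAP_FAILED)
    {
        close(seg->fd);
        return -1;
    }

    seg->reserved = reserved;
    seg->used = initial;
    return 0;
}

/*
 * Resize by changing only the backing file; the mapping itself is left
 * untouched, so no address map changes are needed in other processes.
 */
int
segment_resize(ShmemSegment *seg, size_t new_size)
{
    if (new_size > seg->reserved)
        return -1;
    if (ftruncate(seg->fd, (off_t) new_size) != 0)
        return -1;
    seg->used = new_size;
    return 0;
}
```

In this sketch, touching an address between the current file size and the end
of the reservation raises SIGBUS, so callers have to stay within `used`.
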
I still don't see what the point of having multiple mappings and using memfd
is. We need to reserve the address space for the maximum sized allocation in
postmaster, otherwise there's absolutely no guarantee that it's available at
those addresses in all the children - which you do as you explain here.
Therefore, the maximum size of each "suballocation" needs to be reserved ahead
of time. At which point I don't see the point of having multiple mmaps. It
just makes things more complicated and expensive (each mmap makes fork & exit
slower).

Even if we decide to use memfd, because we consider MADV_DONTNEED to not be
suitable for some reason, what's the point of having more than one mapping
using memfd?

Greetings,

Andres Freund
