Hi,

On 2026-02-10 11:02:13 +0530, Ashutosh Bapat wrote:
> > I still don't see what the point of having multiple mappings and using memfd
> > is. We need to reserve the address space for the maximum sized allocation in
> > postmaster, otherwise there's absolutely no guarantee that it's available at
> > those addresses in all the children - which you do as you explain
> > here. Therefore, the maximum size of each "suballocation" needs to be
> > reserved ahead of time. At which point I don't see the point of having
> > multiple mmaps. It just makes things more complicated and expensive (each
> > mmap makes fork & exit slower).
> >
> > Even if we decide to use memfd, because we consider MADV_DONTNEED to not be
> > suitable for some reason, what's the point of having more than one mapping
> > using memfd?
(this should reference MADV_REMOVE, not MADV_DONTNEED)

> There are just two mappings now compared to 6 earlier. If I am reading
> Jakub's benchmarking correctly, even 6 segments didn't show much
> regression in his benchmarks. Having just two should not see much
> regression. If we use multiple mappings we could control the
> properties of each segment separately - e.g. use huge pages for some
> (buffer blocks) and not for others. On Windows it seems easier to
> create multiple segments than to punch holes in an existing segment.
> When we port the feature to Windows or other platforms, being able to
> treat all the segments in the same way would be an advantage.
>
> That said, I am not discarding the idea of using a single fd and then
> punching holes using fallocate() altogether; we will use it if
> multiple mappings do not bring any advantages. Let's also see how the
> on-demand shared memory segment feature being discussed in this thread
> with Heikki gets shaped.

I think the multiple memory mappings approach is just too restrictive. If we
e.g. eventually want to make some of the other major allocations that depend
on NBuffers react to resizing shared buffers, it's very easy to do if all it
requires is calling

    madvise((void *) TYPEALIGN(page_size, start),
            TYPEALIGN_DOWN(page_size, end) - TYPEALIGN(page_size, start),
            MADV_REMOVE);

(there's a standalone sketch of this at the end of this mail)

There are several cases that are pretty easy to handle that way:

- Buffer Blocks
- Buffer Descriptors
- Sync request queue (part of the "Checkpointer Data" allocation)
- Checkpoint BufferIds (for sorting the to-be-checkpointed data)
- Buffer IO Condition Variables

But if you want to support making these resizable with the separate mappings
approach, it gets considerably more complicated, and the number of mappings
increases substantially.

We'd also need a lot less infrastructure in shmem.c that way. We could e.g.
make ShmemInitStruct() reserve the entire requested size (to avoid OOM killer
issues) and have a ShmemInitStructExt() that allows the caller to choose
whether to reserve. No different segment IDs etc. are needed.
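To make the madvise() part concrete, here is a minimal standalone sketch of
punching a hole inside one large mapping. It assumes Linux and a memfd-backed
MAP_SHARED mapping, and uses local ALIGN_UP()/ALIGN_DOWN() macros standing in
for our TYPEALIGN()/TYPEALIGN_DOWN(); it's illustrative only, not the patch's
actual code:

    #define _GNU_SOURCE
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define ALIGN_UP(p, a)   (((uintptr_t) (p) + ((a) - 1)) & ~((uintptr_t) (a) - 1))
    #define ALIGN_DOWN(p, a) ((uintptr_t) (p) & ~((uintptr_t) (a) - 1))

    int
    main(void)
    {
        size_t  page_size = (size_t) sysconf(_SC_PAGESIZE);
        size_t  total = 64 * page_size; /* stand-in for the maximum size */
        int     fd;
        char   *base, *start, *end;

        /* reserve the maximum size up front, as the postmaster would */
        fd = memfd_create("resize-demo", 0);
        if (fd < 0 || ftruncate(fd, (off_t) total) < 0)
            return 1;
        base = mmap(NULL, total, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (base == MAP_FAILED)
            return 1;
        memset(base, 'x', total);   /* fault all pages in */

        /* later, release an arbitrary (unaligned) sub-range when shrinking */
        start = base + 10 * page_size + 100;
        end = base + 40 * page_size - 100;
        if (madvise((void *) ALIGN_UP(start, page_size),
                    ALIGN_DOWN(end, page_size) - ALIGN_UP(start, page_size),
                    MADV_REMOVE) != 0)
            perror("madvise");

        /* the range stays mapped; reading it now faults in zeroed pages */
        printf("byte inside punched range: %d\n", base[20 * page_size]);
        return 0;
    }

The important property is that no addresses change: the hole is punched in the
backing store while the mapping itself stays put, so pointers into the segment
remain valid, and growing again is just a matter of touching the pages.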
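And for illustration, one possible shape of that shmem.c API - to be clear,
ShmemInitStructExt() and its "reserve" flag are made up here to show the idea,
they don't exist in the tree:

    /* hypothetical: lets a caller opt out of reserving backing storage,
     * so the allocation can start small and be grown/shrunk later */
    extern void *ShmemInitStructExt(const char *name, Size size,
                                    bool *foundPtr, bool reserve);

    /* the existing entry point would keep today's behavior and always
     * reserve the full requested size, avoiding OOM killer issues */
    void *
    ShmemInitStruct(const char *name, Size size, bool *foundPtr)
    {
        return ShmemInitStructExt(name, size, foundPtr, true);
    }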
Greetings,

Andres Freund