Re: Changing shared_buffers without restart

Jakub Wartak Thu, 12 Feb 2026 06:13:26 -0800

On Tue, Feb 10, 2026 at 4:21 PM Ashutosh Bapat
<[email protected]> wrote:
>
> Hi Jakub,
>
> On Tue, Feb 10, 2026 at 8:07 PM Jakub Wartak
> <[email protected]> wrote:
> >
> > > I see the bug. Fixed in the attached diff. Please apply it on top of
> > > 20260209 and let me know if it fixes the issue for you. I will include
> > > it in the next set of patches.
> >
> > Yes, it fixes that "bug1", given
> >     shared_buffers = '32 GB'
> >     max_shared_buffers = '32 GB'
> >     max_connections = 1000
> >     huge_pages = 'on'
>
> PrepareHugePages() seems like a kludge but I haven't yet gotten time
> to do something about it.
>
> >
> > That must be some logic error there in the patch, because if I have
> > huge_pages=on I want it to fail to start instead of silently falling
> > back to off in huge_pages_status.
>
> Thanks a lot for all your tests. I will come around to fixing HP bugs
> once I have tackled the high level items mentioned in [1] and shared
> memory management rewrite that Heikki is suggesting. May I request you
> to keep these tests with you till then. Or if you could investigate
> these bugs and provide patches containing fixes, that will help as
> well. But it may not be as easy as the bug 1 fix.


Hi Ashutosh,

OK, so with huge_page_fix.diff.no_ci applied, and with just shared_buffers=1GB,
max_shared_buffers=2GB and sysctl hugepages=634 (that's what
shared_memory_size_in_huge_pages told me), startup was failing on fallocate()
for the small "main" segment of ~230MB (not even the big buffers):

2026-02-12 14:08:42.924 CET [314850] DEBUG:  segment[main]: mmap(241172480)
2026-02-12 14:08:42.936 CET [314850] FATAL:  segment[main]: could not
allocate space for anonymous file: No space left on device

[pid 314850] mmap(NULL, 1335885824, PROT_NONE,
MAP_SHARED|MAP_ANONYMOUS|MAP_HUGETLB, -1, 0) = 0x7ce247a00000
[pid 314850] openat(AT_FDCWD, "/proc/meminfo", O_RDONLY) = 4
[pid 314850] close(4)                   = 0
[pid 314850] memfd_create("main", MFD_HUGETLB) = 4
[pid 314850] mmap(NULL, 241172480, PROT_NONE,
MAP_SHARED|MAP_NORESERVE|MAP_HUGETLB, 4, 0) = 0x7ce239400000
[pid 314850] ftruncate(4, 241172480)    = 0
[pid 314850] fallocate(4, 0, 0, 241172480) = -1 ENOSPC (No space left on device)

but before the fallocate() failure, these mappings had already succeeded:
    1335885824/1024/1024 = 1274MB
    241172480/1024/1024 = 230MB

so roughly (1274+230)/2MB = 752 HPs are needed. If I raise hugepages to that,
it turns out that there are three (!) mmap() calls:

[pid 317348] mmap(NULL, 1335885824, PROT_NONE,
MAP_SHARED|MAP_ANONYMOUS|MAP_HUGETLB, -1, 0) = 0x722157800000
[pid 317348] openat(AT_FDCWD, "/proc/meminfo", O_RDONLY) = 4
[pid 317348] close(4)                   = 0
[pid 317348] memfd_create("main", MFD_HUGETLB) = 4
[pid 317348] mmap(NULL, 241172480, PROT_NONE,
MAP_SHARED|MAP_NORESERVE|MAP_HUGETLB, 4, 0) = 0x722149200000
[pid 317348] ftruncate(4, 241172480)    = 0
[pid 317348] fallocate(4, 0, 0, 241172480) = 0
[pid 317348] openat(AT_FDCWD, "postmaster.pid", O_RDWR) = 5
[pid 317348] close(5)                   = 0
[pid 317348] openat(AT_FDCWD, "/proc/meminfo", O_RDONLY) = 5
[pid 317348] close(5)                   = 0
[pid 317348] memfd_create("buffers", MFD_HUGETLB) = 5
[pid 317348] mmap(NULL, 2149580800, PROT_NONE,
MAP_SHARED|MAP_NORESERVE|MAP_HUGETLB, 5, 0) = 0x7220c9000000
[pid 317348] ftruncate(5, 1075838976)   = 0
[pid 317348] fallocate(5, 0, 0, 1075838976) = -1 ENOSPC (No space left
on device)

So it looks like the patch requests huge pages like this:
- it maps 1335885824 = 1274MB with reservation right from the start
- it adds 230MB for memfd "main" (mapped without reservation, then sized
  with ftruncate() and fully allocated with fallocate(), all ok)
- then it maps 2149580800 = 2050MB without reservation (ok, as that is
  max_shared_buffers) and sizes it to ~1GB, but when it really tries to
  allocate those pages, fallocate() fails because the pages were already
  consumed by the first mmap() probe call

That gives us already: 1.2 + 0.2 + 1 = 2.4GB for just shared_buffers=1GB
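
To make the accounting above easier to re-check, here is a minimal sketch in
C. It assumes 2MB huge pages; huge_pages_for() and total_huge_pages() are
hypothetical helper names of mine, not something from the patch. The byte
counts used in the note below are taken from the strace output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 2MB huge pages, as in the (1274+230)/2MB estimate above. */
#define HUGE_PAGE_SIZE_2MB	(UINT64_C(2) * 1024 * 1024)

/* Hypothetical helper: round a byte count up to whole huge pages. */
uint64_t
huge_pages_for(uint64_t bytes, uint64_t huge_page_size)
{
	assert(huge_page_size > 0);
	return (bytes + huge_page_size - 1) / huge_page_size;
}

/* Hypothetical helper: sum the huge pages needed by several mappings. */
uint64_t
total_huge_pages(const uint64_t *sizes, size_t n, uint64_t huge_page_size)
{
	uint64_t	total = 0;

	for (size_t i = 0; i < n; i++)
		total += huge_pages_for(sizes[i], huge_page_size);
	return total;
}
```

For the three sizes in the trace (1335885824 for the probe, 241172480 for
"main", 1075838976 for "buffers") this gives 637 + 115 + 513 = 1265 huge
pages, i.e. about 2.47GB.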

I don't know the rationale, but it appears that PrepareHugePages() should simply
free the huge page memory once it has validated it is there:
    /* Map total amount of memory to test its availability. */

The server starts properly with HPs if I just add this munmap() there
based on that above comment (assuming PrepareHugePages() is just testing
stuff):

@@ -927,8 +913,7 @@ PrepareHugePages()
 #else
        if (huge_pages == HUGE_PAGES_ON || huge_pages == HUGE_PAGES_TRY)
        {
-               Size            hugepagesize,
-                                       total_size = 0;
+               Size            hugepagesize;
                int                     huge_mmap_flags;

                GetHugePageSize(&hugepagesize, &huge_mmap_flags, NULL);
@@ -964,6 +949,8 @@ PrepareHugePages()
        SetConfigOption("huge_pages_status", (ptr == MAP_FAILED) ? "off" : "on",
                                        PGC_INTERNAL, PGC_S_DYNAMIC_DEFAULT);
        huge_pages_on = ptr != MAP_FAILED;
+       if (ptr != MAP_FAILED)
+               munmap(ptr, total_size);
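
The same probe-then-release idea as a standalone sketch (Linux only).
probe_mapping() is a hypothetical helper, not a function from the patch; the
assumption is that the probe is only meant to answer "is this much memory
available?" and should not keep anything reserved afterwards.

```c
#define _GNU_SOURCE
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: check whether "size" bytes can be mapped with the
 * given extra mmap() flags (e.g. MAP_HUGETLB), then release the mapping
 * again so that the probe itself does not hold on to any pages.
 */
bool
probe_mapping(size_t size, int extra_flags)
{
	void	   *ptr = mmap(NULL, size, PROT_NONE,
						   MAP_SHARED | MAP_ANONYMOUS | extra_flags, -1, 0);

	if (ptr == MAP_FAILED)
		return false;

	/* The mapping was only a test, so give the memory back. */
	munmap(ptr, size);
	return true;
}
```

Calling probe_mapping(total_size, MAP_HUGETLB) would then correspond to the
test mapping in the trace, except that the reserved huge pages are free again
for the later fallocate() calls.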

Hope that helps a little! (I ran out of time while trying to see whether and
why shared_memory_size_in_huge_pages is calculated wrongly.)

To sum up, I think these are the issues with the patchset as it stands:
- huge_page_fix.diff.no_ci (wrong calculation)
- lack of munmap() of HPs as per above
- probably the logic for huge_pages=on vs huge_pages_status=off
  should be reviewed (it should fall back only with huge_pages=try)
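
For the last item, this is the behavior I would expect, as a minimal sketch.
The enum and function names are hypothetical and only mirror the huge_pages
settings; this is not how the patch or the server actually structures it.

```c
#include <stdbool.h>

/* Hypothetical names, mirroring huge_pages = off / try / on. */
typedef enum { SETTING_OFF, SETTING_TRY, SETTING_ON } HugePagesSetting;

typedef enum
{
	USE_HUGE_PAGES,
	USE_REGULAR_PAGES,
	REFUSE_TO_START
} StartupDecision;

/* Decide what to do once we know whether huge pages could be mapped. */
StartupDecision
decide_startup(HugePagesSetting setting, bool huge_pages_available)
{
	if (setting == SETTING_OFF)
		return USE_REGULAR_PAGES;
	if (huge_pages_available)
		return USE_HUGE_PAGES;

	/* Only "try" may fall back silently; "on" has to fail startup. */
	return (setting == SETTING_TRY) ? USE_REGULAR_PAGES : REFUSE_TO_START;
}
```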

TBH, I haven't really looked at the code outside of that region, I'm just
a trespasser who was interested in memfd ;)

-J.
