On 16.09.2013 13:15, Andres Freund wrote:
> On 2013-09-16 11:15:28 +0300, Heikki Linnakangas wrote:
>> On 14.09.2013 02:41, Richard Poole wrote:
>>> The attached patch adds the MAP_HUGETLB flag to mmap() for shared memory
>>> on systems that support it. It's based on Christian Kruse's patch from
>>> last year, incorporating suggestions from Andres Freund.
>>
>> I don't understand the logic in figuring out the pagesize, and the smallest
>> supported hugepage size. First of all, even without the patch, why do we
>> round up the size passed to mmap() to the _SC_PAGE_SIZE? Surely the kernel
>> will round up the request all by itself. The mmap() man page doesn't say
>> anything about length having to be a multiple of the page size.
>
> I think it does:
>     EINVAL We don't like addr, length, or offset (e.g., they are too
>            large, or not aligned on a page boundary).

That doesn't mean that they *all* have to be aligned on a page boundary.
It's understandable that 'addr' and 'offset' have to be, but it doesn't
make much sense for 'length'.

> and
>     A file is mapped in multiples of the page size. For a file that is not
>     a multiple of the page size, the remaining memory is zeroed when mapped,
>     and writes to that region are not written out to the file. The effect of
>     changing the size of the underlying file of a mapping on the pages that
>     correspond to added or removed regions of the file is unspecified.
>
> And no, according to my past experience, the kernel does *not* do any
> such rounding up. It will just fail.

I wrote a little test program to play with different values (attached).
I tried this on my laptop with a 3.2 kernel (uname -r: 3.10-2-amd64), and
on a VM with a fresh Centos 6.4 install with 2.6.32 kernel
(2.6.32-358.18.1.el6.x86_64), and they both work the same:
$ ./mmaptest 100 # mmap 100 bytes
in a different terminal:
$ cat /proc/meminfo | grep HugePages_Rsvd
HugePages_Rsvd: 1
So even a tiny allocation, much smaller than any page size, succeeds,
and it reserves a huge page. I tried the same with larger values; the
kernel always uses huge pages, and rounds up the allocation to a
multiple of the huge page size.
So, let's just get rid of the /sys scanning code.
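
The HugePages_Rsvd check above can also be done from C instead of with grep, which may be handy if the test program should report the reservation itself. The following is only a sketch: read_meminfo_value() is a made-up helper name, and it assumes the usual "Key:   value" line layout of /proc/meminfo.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Return the numeric value on the line that starts with "key" (for example
 * "HugePages_Rsvd:") in the meminfo-style file at "path". Returns -1 if the
 * file cannot be opened or no such line with a number is found.
 * Hypothetical helper, for illustration only.
 */
long
read_meminfo_value(const char *path, const char *key)
{
    FILE   *f;
    char    line[256];
    long    result = -1;
    size_t  keylen;

    assert(path != NULL && key != NULL);
    keylen = strlen(key);

    f = fopen(path, "r");
    if (f == NULL)
        return -1;

    while (fgets(line, sizeof(line), f) != NULL)
    {
        long    value;

        /* %ld skips the padding spaces between the key and the number. */
        if (strncmp(line, key, keylen) == 0 &&
            sscanf(line + keylen, "%ld", &value) == 1)
        {
            result = value;
            break;
        }
    }
    fclose(f);
    return result;
}
```

Calling read_meminfo_value("/proc/meminfo", "HugePages_Rsvd:") while the test program is sleeping should then report the same number that grep shows.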
Robert, do you remember why you put the "pagesize =
sysconf(_SC_PAGE_SIZE);" call in the new mmap() shared memory allocator?
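
For context, rounding a request up to a multiple of the page size, which is what that sysconf() call feeds into, looks roughly like this. It is a minimal sketch only: round_up_to_multiple() and round_up_to_os_page() are hypothetical names, not the functions in the PostgreSQL source.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Round "size" up to the next multiple of "page_size".
 * A size that is already a multiple is returned unchanged.
 * Hypothetical helper, for illustration only.
 */
size_t
round_up_to_multiple(size_t size, size_t page_size)
{
    assert(page_size > 0);
    if (size % page_size != 0)
        size += page_size - (size % page_size);
    return size;
}

/* Same, using the normal (non-huge) page size reported by the OS. */
size_t
round_up_to_os_page(size_t size)
{
    long    page_size = sysconf(_SC_PAGE_SIZE);

    assert(page_size > 0);
    return round_up_to_multiple(size, (size_t) page_size);
}
```

Assuming a 2 MB huge page size, round_up_to_multiple(100, 2 * 1024 * 1024) gives 2097152, i.e. exactly one huge page, which is consistent with the single reserved huge page seen for the 100-byte request.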
- Heikki
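#define _GNU_SOURCE     /* for MAP_ANONYMOUS and MAP_HUGETLB */
#include <sys/mman.h>
#include <stdio.h>
#include <stdlib.h>     /* atoi */
#include <unistd.h>     /* sleep */
#include <errno.h>
#include <string.h>

int main(int argc, char **argv)
{
    char *ptr;
    int size;

    /* Size in bytes from the command line; the default is 100 * 4096. */
    size = (argc > 1) ? atoi(argv[1]) : (100 * 4096);

    ptr = mmap(NULL, size, PROT_READ | PROT_WRITE,
               MAP_SHARED | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (ptr != MAP_FAILED)
        printf("success: %p\n", (void *) ptr);
    else
        printf("failure: %s\n", strerror(errno));

    /* Keep the mapping alive long enough to look at /proc/meminfo. */
    sleep(10);
    return 0;
}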
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)