On 16.09.2013 13:15, Andres Freund wrote:
On 2013-09-16 11:15:28 +0300, Heikki Linnakangas wrote:
On 14.09.2013 02:41, Richard Poole wrote:
The attached patch adds the MAP_HUGETLB flag to mmap() for shared memory
on systems that support it. It's based on Christian Kruse's patch from
last year, incorporating suggestions from Andres Freund.

I don't understand the logic in figuring out the pagesize, and the smallest
supported hugepage size. First of all, even without the patch, why do we
round up the size passed to mmap() to the _SC_PAGE_SIZE? Surely the kernel
will round up the request all by itself. The mmap() man page doesn't say
anything about length having to be a multiple of the page size.

I think it does:
        EINVAL We don't like addr, length, or offset (e.g., they are too
               large, or not aligned on a page boundary).

That doesn't mean that they *all* have to be aligned on a page boundary. It's understandable that 'addr' and 'offset' have to be, but it doesn't make much sense for 'length'.

and
        A file is mapped in multiples of the page size.  For a file that is
        not a multiple of the page size, the remaining memory is zeroed when
        mapped, and writes to that region are not written out to the file.
        The effect of changing the size of the underlying file of a mapping
        on the pages that correspond to added or removed regions of the file
        is unspecified.

And no, according to my past experience, the kernel does *not* do any
such rounding up. It will just fail.

I wrote a little test program to play with different values (attached). I tried this on my laptop with a 3.2 kernel (uname -r: 3.10-2-amd6), and on a VM with a fresh CentOS 6.4 install with a 2.6.32 kernel (2.6.32-358.18.1.el6.x86_64), and both behave the same:

$ ./mmaptest 100 # mmap 100 bytes

in a different terminal:
$ cat /proc/meminfo  | grep HugePages_Rsvd
HugePages_Rsvd:        1

So even a tiny allocation, much smaller than any page size, succeeds, and it reserves a huge page. I tried the same with larger values; the kernel always uses huge pages, and rounds up the allocation to a multiple of the huge page size.
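
To make the observed rounding concrete, here is a minimal sketch of the arithmetic, not anything taken from the patch. The helper name huge_pages_needed is made up for illustration, and the 2 MB huge page size in the usage note is an assumption (a common x86_64 default; the real value is system-dependent).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: how many huge pages are needed to back a request of
 * `length` bytes when the request is rounded up to a multiple of
 * `huge_page_size`. This only models the rounding described above; it does
 * not query the kernel.
 */
static size_t
huge_pages_needed(size_t length, size_t huge_page_size)
{
	assert(huge_page_size > 0);

	/* Ceiling division: any partial page counts as a whole page. */
	return (length + huge_page_size - 1) / huge_page_size;
}
```

Assuming 2 MB huge pages, huge_pages_needed(100, 2 * 1024 * 1024) evaluates to 1, which matches the single reserved page seen in HugePages_Rsvd for the 100-byte request.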

So, let's just get rid of the /sys scanning code.

Robert, do you remember why you put the "pagesize = sysconf(_SC_PAGE_SIZE);" call in the new mmap() shared memory allocator?
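
For reference, the kind of rounding being asked about looks roughly like the sketch below. This is an illustration of the technique only, not a copy of the allocator code; the function names round_up_to_multiple and round_up_to_os_page are invented here.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: round `size` up to the next multiple of `pagesize`. */
static size_t
round_up_to_multiple(size_t size, size_t pagesize)
{
	assert(pagesize > 0);

	if (size % pagesize != 0)
		size += pagesize - (size % pagesize);
	return size;
}

/*
 * Hypothetical helper: round `size` up to the OS page size reported by
 * sysconf(). If sysconf() fails, the size is returned unchanged.
 */
static size_t
round_up_to_os_page(size_t size)
{
	long		pagesize = sysconf(_SC_PAGE_SIZE);

	if (pagesize <= 0)
		return size;
	return round_up_to_multiple(size, (size_t) pagesize);
}
```

If the kernel already rounds the length up by itself, as the test above suggests, a step like this before mmap() would be redundant but harmless.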

- Heikki
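#include <sys/mman.h>
#include <stdio.h>
#include <stdlib.h>		/* atoi() */
#include <errno.h>
#include <string.h>
#include <unistd.h>		/* sleep() */

int main(int argc, char **argv)
{
	char *ptr;
	int size;

	/* Size in bytes from the command line, defaulting to 100 * 4096. */
	size = (argc > 1) ? atoi(argv[1]) : (100 * 4096);

	/* Request an anonymous shared mapping backed by huge pages. */
	ptr = mmap(NULL, size, PROT_READ | PROT_WRITE,
			   MAP_SHARED | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);

	if (ptr != MAP_FAILED)
		printf("success: %p\n", ptr);
	else
		printf("failure: %s\n", strerror(errno));

	/* Keep the mapping alive long enough to inspect /proc/meminfo. */
	sleep(10);

	return 0;
}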