On (25/06/08 16:50), Andrew Hastings didst pronounce:
> Mel Gorman wrote:
> >HUGETLB_MORECORE currently exists to allow glibc to back malloc() with
> >large pages instead of small pages. However, not all applications use glibc
> >malloc() nor is it always desirable to back malloc() with huge pages. There
> >exists a requirement that a hugepage-aware application be able to allocate
> >hugepages directly.
> >
> >Currently, each application is expected to discover the filesystem
> >themselves,
> >mmap() the file and other house-keeping tasks. libhugetlbfs already
> >implements
> >much of this complex logic internally. This patch exposes a simple API for
> >the
> >allocation and freeing of regions backed by hugepages that applications
> >linking
> >directly to libhugetlbfs can use. The implementation is a little simplistic
> >but can be optimised later if and when applications perceive its
> >performance
> >to be a bottleneck. The API itself should not need to change as a
> >multi-page
> >aware API would be an additional rather than a replacement interface.
> >
> >Ideally, this or something like it would be merged for 2.0. Comments?
>
> Sure, this seems like a reasonable idea.
>
Sweet
> >Signed-off-by: Mel Gorman <[EMAIL PROTECTED]>
> >---
> > Makefile | 5 +
> > alloc.c | 152
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > hugetlbfs.h | 11 ++++
> > version.lds | 6 ++
> > 4 files changed, 173 insertions(+), 1 deletion(-)
> >
> >diff -rup -X /usr/src/patchset-0.6/bin//dontdiff
> >libhugetlbfs-clean/alloc.c libhugetlbfs-userspace-alloc/alloc.c
> >--- libhugetlbfs-clean/alloc.c 2008-06-24 11:15:54.000000000 -0700
> >+++ libhugetlbfs-userspace-alloc/alloc.c 2008-06-25
> >13:40:37.000000000 -0700
> >@@ -0,0 +1,131 @@
> >+/*
> >+ * libhugetlbfs - Easy use of Linux hugepages
> >+ * alloc.c - Simple allocator of regions backed by hugepages
> >+ *
> >+ * This library is free software; you can redistribute it and/or
> >+ * modify it under the terms of the GNU Lesser General Public License
> >+ * as published by the Free Software Foundation; either version 2.1 of
> >+ * the License, or (at your option) any later version.
> >+ *
> >+ * This library is distributed in the hope that it will be useful, but
> >+ * WITHOUT ANY WARRANTY; without even the implied warranty of
> >+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> >+ * Lesser General Public License for more details.
> >+ *
> >+ * You should have received a copy of the GNU Lesser General Public
> >+ * License along with this library; if not, write to the Free Software
> >+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301
> >USA
> >+ */
> >+
> >+#define _GNU_SOURCE
> >+#include <errno.h>
> >+#include <stdio.h>
> >+#include <stdlib.h>
> >+#include <string.h>
> >+#include <unistd.h>
> >+#include <sys/mman.h>
> >+#include <sys/types.h>
> >+
> >+#include "hugetlbfs.h"
> >+#include "libhugetlbfs_internal.h"
> >+
> >+/**
> >+ * hugepage_malloc - Allocate an amount of memory backed by huge pages
>
> The function name in the comment doesn't match the function definition.
> (Personally, I prefer "hugepage_malloc" over "hugepages_malloc" but
> perhaps this is an Americanism.)
>
I have poor naming taste. I changed it to hugepages_malloc() because it
can be multiple pages depending on the value of len. I do not feel
stronly and am open to suggestion until a release with the symbol
occurs.
> >+ * len: Size of the region to allocate
> >+ * flags: Flags specifying the behaviour of the function
> >+ *
> >+ * This function allocates a buffer similar in principal to malloc()
> >except
> >+ * that the region is backed by huge pages and at least hugepage-aligned.
> >+ * This is not a suitable drop-in for malloc() and is only suitable in the
> >+ * event the length is expected to be hugepage-aligned. However, a
> >+ * malloc-like library could use this function to create additional heap
> >+ * similar in principal to what morecore does for glibc malloc.
> >+ */
> >+void *hugepages_malloc(size_t len, ghp_t flags)
> >+{
> >+ void *buf;
> >+ int heap_fd;
> >+
> >+ /* Create a file descriptor for the new region */
> >+ heap_fd = hugetlbfs_unlinked_fd();
> >+ if (heap_fd < 0) {
> >+ ERROR("Couldn't open hugetlbfs file for %d sized heap\n",
> >len);
>
> I think you need "%zd" for size_t or the value will be truncated on
> 64-bit platforms. Grammar nit: "%d-sized" or "size %d".
Good spot. Fixed
> >+ return NULL;
> >+ }
> >+
> >+ /* Map the requested region */
> >+ buf = mmap(NULL, len, PROT_READ|PROT_WRITE,
> >+ MAP_PRIVATE, heap_fd, len);
> >+ if (buf == MAP_FAILED) {
> >+ WARNING("New heap segment map failed: %s\n",
> >strerror(errno));
> >+ return NULL;
>
> Still need a close(heap_fd) on mmap failure.
>
Very good spot, fixed.
> >+ }
> >+
> >+ /* Close the file so we do not have to track the descriptor */
> >+ if (close(heap_fd) != 0) {
> >+ WARNING("Failed to close new heap fd: %s\n",
> >strerror(errno));
> >+ munmap(buf, len);
> >+ return NULL;
> >+ }
> >+
> >+ /* woo, new buffer of shiny */
> >+ return buf;
> >+}
> >+
> >+#define MAPS_BUF_SZ 4096
> >+/**
> >+ * hugepages_free - Free a region allocated that was backed by large pages
> >+ * ptr - The pointer to the buffer returned by hugepages_malloc()
> >+ *
> >+ * This function finds a region to free based on the contents of
> >+ * /proc/pid/maps. The assumption is made that the ptr is the start of
> >+ * a hugepage region allocated with hugepages_malloc. No checking is made
> >+ * that the pointer is to a hugepage backed region.
> >+ */
> >+void hugepages_free(void *ptr)
> >+{
> >+ FILE *fd;
> >+ char line[MAPS_BUF_SZ];
> >+ unsigned long start = 0, end = 0;
> >+
> >+ /*
> >+ * /proc/self/maps is used to determine the length of the original
> >+ * allocation. As mappings are based on different files, we can
> >+ * assume that maps will not merge. If the hugepages were truely
>
> "truly"
Fixed
>
> >+ * anonymous, this assumption would be broken.
> >+ */
> >+ fd = fopen("/proc/self/maps", "r");
> >+ if (!fd) {
> >+ ERROR("Failed to open /proc/self/maps\n");
> >+ return;
> >+ }
> >+
> >+ /* Parse /proc/maps for address ranges line by line */
> >+ while (!feof(fd)) {
> >+ char *bufptr;
> >+ char *saveptr = NULL;
> >+
> >+ /* Read a line of input */
> >+ if (fgets(line, MAPS_BUF_SZ, fd) == NULL)
> >+ break;
> >+
> >+ /* Parse the line to get the start and end of each mapping */
> >+ bufptr = strtok_r(line, " ", &saveptr);
> >+ bufptr = strtok_r(bufptr, "-", &saveptr);
> >+ start = strtoull(bufptr, NULL, 16);
> >+ bufptr = strtok_r(NULL, "-", &saveptr);
> >+
> >+ /* If the correct mapping is found, remove it */
> >+ if (start == (unsigned long)ptr) {
> >+ end = strtoull(bufptr, NULL, 16);
> >+ munmap(ptr, end - start);
> >+ break;
> >+ }
> >+ }
> >+
> >+ /* Print a warning if the ptr appeared to point nowhere */
> >+ if (end == 0)
> >+ ERROR("hugepages_free using invalid or double free\n");
> >+
> >+ fclose(fd);
> >+}
> >diff -rup -X /usr/src/patchset-0.6/bin//dontdiff
> >libhugetlbfs-clean/hugetlbfs.h libhugetlbfs-userspace-alloc/hugetlbfs.h
> >--- libhugetlbfs-clean/hugetlbfs.h 2008-05-16 13:43:11.000000000 -0700
> >+++ libhugetlbfs-userspace-alloc/hugetlbfs.h 2008-06-25
> >13:28:51.000000000 -0700
> >@@ -33,4 +33,15 @@ long dump_proc_pid_maps(void);
> >
> > #define PF_LINUX_HUGETLB 0x100000
> >
> >+/*
> >+ * Direct alloc flags and types
> >+ * GHP_FORCELARGE - Returns NULL if a huge page cannot be allocated
> >(default)
> >+ */
> >+typedef unsigned long ghp_t;
> >+#define GHP_FORCELARGE 0x01UL
>
> Looks like ghp* and GHP* might be part of an unrelated patch?
>
No, it just happens to be the default behaviour right now. Later I
envision that flags could be used that say "allocate a region with huge
pages if possible and small pages otherwise, but do not return NULL".
This was a bad choice though. Maybe specifying GHP_DEFAULT would be a
better option as with a number of flags, there may be a desire to
specify what the default flag combination would be. Would that be
better?
I wanted to have the flags arguement for the API either way for
extending later without altering the function signature.
> >+
> >+/* Direct alloc functions */
> >+void *hugepages_malloc(size_t len, ghp_t flags);
> >+void hugepages_free(void *ptr);
> >+
> > #endif /* _HUGETLBFS_H */
> >diff -rup -X /usr/src/patchset-0.6/bin//dontdiff
> >libhugetlbfs-clean/Makefile libhugetlbfs-userspace-alloc/Makefile
> >--- libhugetlbfs-clean/Makefile 2008-05-16 13:43:11.000000000 -0700
> >+++ libhugetlbfs-userspace-alloc/Makefile 2008-06-25
> >10:33:38.000000000 -0700
> >@@ -1,7 +1,8 @@
> > PREFIX = /usr/local
> >
> >-LIBOBJS = hugeutils.o version.o init.o morecore.o debug.o
> >+LIBOBJS = hugeutils.o version.o init.o morecore.o debug.o alloc.o
> > INSTALL_OBJ_LIBS = libhugetlbfs.so libhugetlbfs.a
> >+INSTALL_HEADERS = hugetlbfs.h
> > LDSCRIPT_TYPES = B BDT
> > LDSCRIPT_DIST_ELF = elf32ppclinux elf64ppc elf_i386 elf_x86_64
> > INSTALL_OBJSCRIPT = ld.hugetlbfs
> >@@ -89,6 +90,7 @@ endif
> > LIBOBJS32 += $(LIBOBJS:%=obj32/%)
> > LIBOBJS64 += $(LIBOBJS:%=obj64/%)
> >
> >+HEADERDIR = $(PREFIX)/include
> > LIBDIR32 = $(PREFIX)/$(LIB32)
> > LIBDIR64 = $(PREFIX)/$(LIB64)
> > LDSCRIPTDIR = $(PREFIX)/share/libhugetlbfs/ldscripts
> >@@ -254,6 +256,7 @@ objscript.%: %
> > install: libs $(OBJDIRS:%=%/install) $(INSTALL_OBJSCRIPT:%=objscript.%)
> > @$(VECHO) INSTALL
> > $(INSTALL) -d $(DESTDIR)$(LDSCRIPTDIR)
> >+ $(INSTALL) -m 644 $(INSTALL_HEADERS) $(HEADERDIR)
> > $(INSTALL) -m 644 $(INSTALL_LDSCRIPTS:%=ldscripts/%)
> > $(DESTDIR)$(LDSCRIPTDIR)
> > $(INSTALL) -d $(DESTDIR)$(BINDIR)
> > for x in $(INSTALL_OBJSCRIPT); do \
> >diff -rup -X /usr/src/patchset-0.6/bin//dontdiff
> >libhugetlbfs-clean/version.lds libhugetlbfs-userspace-alloc/version.lds
> >--- libhugetlbfs-clean/version.lds 2008-05-16 13:43:11.000000000 -0700
> >+++ libhugetlbfs-userspace-alloc/version.lds 2008-06-25
> >10:21:25.000000000 -0700
> >@@ -7,3 +7,9 @@ VERS_1.0 {
> > local:
> > *;
> > };
> >+
> >+HTLBFS_2.0 {
> >+ global:
> >+ hugepages_malloc;
> >+ hugepages_free;
> >+};
>
> -Andrew Hastings
> Cray Inc.
>
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
-------------------------------------------------------------------------
Check out the new SourceForge.net Marketplace.
It's the best place to buy or sell services for
just about anything Open Source.
http://sourceforge.net/services/buy/index.php
_______________________________________________
Libhugetlbfs-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/libhugetlbfs-devel