Hi Anthony,
Thanks for your comments.
On Sat, Feb 23, 2008 at 12:29:27PM -0600, Anthony Liguori wrote:
> In general, I don't think it causes any real harm if we always align the
> ram address to a large page boundary. If we aren't on Linux (and can't
> determine what the large page size is), we can just set hpagesize to
> getpagesize(). I think there's a good reason for this that I'll explain
> below.
I thought about doing that (gets rid of the 4GB+ special casing) but we
lose the ability to compact smaller allocations in a single largepage.
Right now the VGA BIOS and the BIOS fit in the same largepage, for
example.
> > +
> > +void *alloc_huge_area(unsigned long memory, const char *path)
> > +{
> > + void *area;
> > + int fd;
> > + char *filename;
> > + char *tmpfile = "/kvm.XXXXXX";
> > +
> > + filename = qemu_malloc(4096);
> > + if (!filename)
> > + return NULL;
> > +
> > + memset(filename, 0, 4096);
> > + strncpy(filename, path, 4096 - strlen(tmpfile) - 1);
> > + strcat(filename, tmpfile);
> > +
> > + hpagesize = gethugepagesize() * 1024;
> > + if (!hpagesize)
> > + return NULL;
> > +
> > + mkstemp(filename);
> >
>
> mkstemp returns a file descriptor so the following open is not required.
Right, will fix.
> > + fd = open(filename, O_RDWR);
> > + if (fd < 0) {
> > + perror("open");
> > + hpagesize = 0;
> > + exit(0);
> > + }
> > + memory = (memory+hpagesize-1) & ~(hpagesize-1);
> >
>
> I'm a little surprised that hugetlbfs doesn't require an ftruncate()
> before the mmap(). Does an ftruncate() do any harm? If so, it would be
> better to have one.
Its not needed because hugetlbfs automatically adjusts the inode size on
demand.
Don't think it will do any harm and its compliant with normal
filesystems.
> > + area = mmap(0, memory, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, 0);
> >
>
> I think a MAP_SHARED may have some advantages in that it then becomes
> possible to pass this file descriptor around to other processes so they
> can mmap() the same memory region. I don't know if that works with
> hugetlbfs but it certainly does with tmpfs. My thinking is that this
> code can be made generic so it works with either hugetlbfs or tmpfs.
Yes, makes sense.
> Furthermore, I think it would be interesting if we defaulted to trying
> to create the memory file in something like /dev/kvm-mem or something
> more appropriately named. An administrator can then either mount a
> hugetlbfs or tmpfs mount there. We still probably want to provide an
> option to override where to create the file and we want to be able to
> fall back to a normal malloc() of course but at least this makes it
> possible for the distros to set things up so that it Just Work without
> the user having to understand things like hugetlbfs and tmpfs.
I like the idea. I'll change "-hugetlb-path" to "-mem-path" for now (and
test it with tmpfs). Not so fond of hardcoding a path in QEMU though.
> Maybe we'll even see something that merges the two filesystems in the
> future so that if a huge page allocation fails, it falls back to
> creating a normal tmpfs file. Perhaps that's a reasonable mount option
> to add to hugetlbfs.
> Instead of registering an atexit() handler, I think it would be better
> to unlink immediately after doing the mkstemp(). This reduces the
> possibility of leaking the file in the event of catastrophe (like a kill
> -SIGKILL).
Right, will fix.
-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
kvm-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/kvm-devel