Hi Anthony, 

Thanks for your comments.

On Sat, Feb 23, 2008 at 12:29:27PM -0600, Anthony Liguori wrote:

> In general, I don't think it causes any real harm if we always align the 
> ram address to a large page boundary.  If we aren't on Linux (and can't 
> determine what the large page size is), we can just set hpagesize to 
> getpagesize().  I think there's a good reason for this that I'll explain 
> below.

I thought about doing that (gets rid of the 4GB+ special casing) but we
lose the ability to compact smaller allocations in a single largepage.

Right now the VGA BIOS and the BIOS fit in the same largepage, for
example.

> > +
> > +void *alloc_huge_area(unsigned long memory, const char *path)
> > +{
> > +   void *area;
> > +   int fd;
> > +   char *filename;
> > +   char *tmpfile = "/kvm.XXXXXX";
> > +
> > +   filename = qemu_malloc(4096);
> > +   if (!filename) 
> > +           return NULL;
> > +
> > +   memset(filename, 0, 4096);
> > +   strncpy(filename, path, 4096 - strlen(tmpfile) - 1);
> > +   strcat(filename, tmpfile);
> > +   
> > +   hpagesize = gethugepagesize() * 1024;
> > +   if (!hpagesize)
> > +           return NULL;
> > +
> > +   mkstemp(filename);
> >   
> 
> mkstemp returns a file descriptor so the following open is not required.

Right, will fix.

> > +   fd = open(filename, O_RDWR);
> > +   if (fd < 0) {
> > +           perror("open");
> > +           hpagesize = 0;
> > +           exit(0);
> > +   }
> > +   memory = (memory+hpagesize-1) & ~(hpagesize-1);
> >   
> 
> I'm a little surprised that hugetlbfs doesn't require an ftruncate() 
> before the mmap().  Does an ftruncate() do any harm?  If so, it would be 
> better to have one.

Its not needed because hugetlbfs automatically adjusts the inode size on
demand.

Don't think it will do any harm and its compliant with normal
filesystems.

> > +   area = mmap(0, memory, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, 0);
> >   
> 
> I think a MAP_SHARED may have some advantages in that it then becomes 
> possible to pass this file descriptor around to other processes so they 
> can mmap() the same memory region.  I don't know if that works with 
> hugetlbfs but it certainly does with tmpfs.  My thinking is that this 
> code can be made generic so it works with either hugetlbfs or tmpfs.

Yes, makes sense.

> Furthermore, I think it would be interesting if we defaulted to trying 
> to create the memory file in something like /dev/kvm-mem or something 
> more appropriately named.  An administrator can then either mount a 
> hugetlbfs or tmpfs mount there.  We still probably want to provide an 
> option to override where to create the file and we want to be able to 
> fall back to a normal malloc() of course but at least this makes it 
> possible for the distros to set things up so that it Just Work without 
> the user having to understand things like hugetlbfs and tmpfs.

I like the idea. I'll change "-hugetlb-path" to "-mem-path" for now (and
test it with tmpfs). Not so fond of hardcoding a path in QEMU though.

> Maybe we'll even see something that merges the two filesystems in the 
> future so that if a huge page allocation fails, it falls back to 
> creating a normal tmpfs file.  Perhaps that's a reasonable mount option 
> to add to hugetlbfs.

> Instead of registering an atexit() handler, I think it would be better 
> to unlink immediately after doing the mkstemp().  This reduces the 
> possibility of leaking the file in the event of catastrophe (like a kill 
> -SIGKILL).

Right, will fix.

-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
kvm-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/kvm-devel

Reply via email to