This trivial patch never did manage to find its way
in. Marcelo called it to my attention earlier in
the week. I've tweaked it to apply to kvm-83 and
the resulting patch is attached. I've left the
prior e-mail discussion below for reference.
-john
john cooper wrote:
This patch from over a month ago doesn't seem to have
made it into kvm-73 and may have been lost in the
shuffle. Attached is essentially the same patch but
as applied to kvm-73, and validated relative to that
version.
In a nutshell, the intention here is to allow
preallocation of guest huge page backed memory at
qemu initialization time, working around a quirk in the
kernel's huge page accounting that allows overcommit
of huge pages. If the kernel cannot resolve a
guest fault to overcommitted huge page memory at
runtime, the guest is terminated with SIGKILL.
This patch provides the option of avoiding such
behavior at the cost of up-front preallocation of
physical huge pages backing the guest.
-john
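For readers skimming the thread, the mapping choice described above can be sketched in isolation. This is only an illustration and not the code in the patch: map_backing_file() and the "sketch.XXXXXX" template are hypothetical names, and the flag selection simply mirrors what the attached diff does with mem_prealloc. Whether MAP_POPULATE really backs every page depends on the kernel and the filesystem under the given directory.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for MAP_POPULATE and asprintf on glibc */
#endif
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>

/* Hypothetical helper, not the patch's alloc_mem_area(): map `len` bytes of
 * a temporary file created under `dir`. With prealloc != 0 the pages are
 * requested up front via MAP_POPULATE|MAP_SHARED; otherwise they are
 * faulted in lazily through a MAP_PRIVATE mapping. Returns NULL on failure. */
static void *map_backing_file(const char *dir, size_t len, int prealloc)
{
    char *filename;
    void *area;
    int fd, flags;

    assert(dir != NULL && len > 0);
    if (asprintf(&filename, "%s/sketch.XXXXXX", dir) == -1)
        return NULL;
    fd = mkstemp(filename);
    if (fd < 0) {
        perror("mkstemp");
        free(filename);
        return NULL;
    }
    unlink(filename);   /* the file disappears once the mapping is gone */
    free(filename);

    if (ftruncate(fd, (off_t)len) != 0) {
        perror("ftruncate");
        close(fd);
        return NULL;
    }
    flags = prealloc ? MAP_POPULATE | MAP_SHARED : MAP_PRIVATE;
    area = mmap(NULL, len, PROT_READ | PROT_WRITE, flags, fd, 0);
    close(fd);          /* the mapping keeps its own reference to the file */
    if (area == MAP_FAILED) {
        perror("mmap");
        return NULL;
    }
    return area;
}
```

Calling map_backing_file("/dev/hugepages", size, 1) on a hugetlbfs mount (the mount point is an assumption, and size would have to be a multiple of the huge page size) should make a shortage of huge pages show up as an mmap failure at startup rather than as a fault at runtime.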
Anthony Liguori wrote:
john cooper wrote:
Anthony Liguori wrote:
john cooper wrote:
As it currently exists alloc_hpage_mem() is tied to
the notion of huge page allocation as it will reference
gethugepagesize() irrespective of *mem_path. So even
in the case of tmpfs backed files, if the host kernel
has been configured with CONFIG_HUGETLBFS we will wind
up doing allocations of /dev/shm mapped files at
/proc/meminfo:Hugepagesize granularity.
Which is fine. It just means we round -m values up to even numbers.
Well, yes, it will round the allocation, but from a
minimally sufficient 4KB boundary up to a 4MB/2MB
boundary on a 32/64-bit x86 host, which is excessive.
That is probably not what was intended, but probably
not much of a concern either, as "-mem-path /dev/shm"
is likely only used when debugging this flag and the
associated logic.
I don't see it currently being worth the trouble to
correct from a squeaky clean POV, and doing so may
drag in far more than the header file we've just
booted above to deal with this architecture/config
dependency.
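To put numbers on the rounding being discussed, here is a small sketch of the arithmetic only. round_up() is a hypothetical helper and is not taken from vl.c; the 2MB and 4MB figures are the huge page sizes mentioned above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: round `size` up to the next multiple of `align`.
 * `align` must be a nonzero power of two (4KB, 2MB and 4MB all qualify). */
static size_t round_up(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}
```

With a 2MB huge page size a 129MB request becomes 130MB, and with 4MB it becomes 132MB, whereas at 4KB granularity it would stay at 129MB.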
Renaming a function to a name that's less accurate seems bad to me.
I don't mean to be pedantic, but it seems like a strange thing to
do. I prefer it the way it was before.
I don't see any harm in reverting the name. But I do
believe it is largely cosmetic since, given the above,
the current code does require some work to make it
independent of huge page assumptions. Update attached.
-john
Looks good to me.
Acked-by: Anthony Liguori <aligu...@us.ibm.com>
--
john.coo...@third-harmonic.com
kernel/x86/Kbuild | 4 ++--
qemu/vl.c | 27 ++++++++++++++++++++-------
2 files changed, 22 insertions(+), 9 deletions(-)
=================================================================
--- a/qemu/vl.c
+++ b/qemu/vl.c
@@ -237,6 +237,7 @@ int semihosting_enabled = 0;
int time_drift_fix = 0;
unsigned int kvm_shadow_memory = 0;
const char *mem_path = NULL;
+int mem_prealloc = 1; /* force preallocation of physical target memory */
int hpagesize = 0;
const char *cpu_vendor_string;
#ifdef TARGET_ARM
@@ -4116,7 +4117,10 @@ static void help(int exitcode)
#endif
"-tdf inject timer interrupts that got lost\n"
"-kvm-shadow-memory megs set the amount of shadow pages to be allocated\n"
- "-mem-path set the path to hugetlbfs/tmpfs mounted directory, also enables allocation of guest memory with huge pages\n"
+ "-mem-path set the path to hugetlbfs/tmpfs mounted directory, also\n"
+ " enables allocation of guest memory with huge pages\n"
+ "-mem-prealloc toggles preallocation of -mem-path backed physical memory\n"
+ " at startup. Default is enabled.\n"
"-option-rom rom load a file, rom, into the option ROM space\n"
#ifdef TARGET_SPARC
"-prom-env variable=value set OpenBIOS nvram variables\n"
@@ -4246,6 +4250,7 @@ enum {
QEMU_OPTION_tdf,
QEMU_OPTION_kvm_shadow_memory,
QEMU_OPTION_mempath,
+ QEMU_OPTION_mem_prealloc
};
typedef struct QEMUOption {
@@ -4381,6 +4386,7 @@ static const QEMUOption qemu_options[] =
{ "icount", HAS_ARG, QEMU_OPTION_icount },
{ "incoming", HAS_ARG, QEMU_OPTION_incoming },
{ "mem-path", HAS_ARG, QEMU_OPTION_mempath },
+ { "mem-prealloc", 0, QEMU_OPTION_mem_prealloc },
{ NULL },
};
@@ -4662,7 +4668,7 @@ void *alloc_mem_area(size_t memory, unsi
{
char *filename;
void *area;
- int fd;
+ int fd, flags;
if (asprintf(&filename, "%s/kvm.XXXXXX", path) == -1)
return NULL;
@@ -4690,13 +4696,17 @@ void *alloc_mem_area(size_t memory, unsi
*/
ftruncate(fd, memory);
- area = mmap(0, memory, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, 0);
+ /* NB: MAP_POPULATE won't exhaustively alloc all phys pages in the case
+ * MAP_PRIVATE is requested. For mem_prealloc we mmap as MAP_SHARED
+ * to sidestep this quirk.
+ */
+ flags = mem_prealloc ? MAP_POPULATE|MAP_SHARED : MAP_PRIVATE;
+ area = mmap(0, memory, PROT_READ|PROT_WRITE, flags, fd, 0);
if (area == MAP_FAILED) {
- perror("mmap");
- close(fd);
- return NULL;
+ perror("alloc_mem_area: can't mmap hugetlbfs pages");
+ close(fd);
+ return (NULL);
}
-
*len = memory;
return area;
}
@@ -5377,6 +5387,9 @@ int main(int argc, char **argv, char **e
case QEMU_OPTION_mempath:
mem_path = optarg;
break;
+ case QEMU_OPTION_mem_prealloc:
+ mem_prealloc = !mem_prealloc;
+ break;
case QEMU_OPTION_name:
qemu_name = optarg;
break;
=================================================================
--- a/kernel/x86/Kbuild
+++ b/kernel/x86/Kbuild
@@ -9,8 +9,8 @@ kvm-objs := kvm_main.o x86.o mmu.o x86_e
ifeq ($(EXT_CONFIG_KVM_TRACE),y)
kvm-objs += kvm_trace.o
endif
-ifeq ($(CONFIG_DMAR),y)
-kvm-objs += vtd.o
+ifeq ($(CONFIG_IOMMU_API),y)
+kvm-objs += iommu.o
endif
kvm-intel-objs := vmx.o vmx-debug.o ../external-module-compat.o
kvm-amd-objs := svm.o ../external-module-compat.o