Re: [RFC PATCH 0/6] option to not remove files inside -mem-path dir (v2)
On Mon, Jul 02, 2012 at 04:54:03PM -0300, Eduardo Habkost wrote:
> On Mon, Jul 02, 2012 at 07:56:58PM +0100, Daniel P. Berrange wrote:
> > On Mon, Jul 02, 2012 at 03:06:32PM -0300, Eduardo Habkost wrote:
> > > [...]
> >
> > I'd suggest that instead of making the memory file name into a public
> > ABI that QEMU needs to maintain, QEMU could expose the info via a
> > monitor command, e.g.:
> >
> > $ qemu-system-x86_64 [...] -m 2048 -smp 4 \
> >     -numa node,cpus=0-1,mem=1024 -numa node,cpus=2-3,mem=1024 \
> >     -mem-prealloc -mem-path /mnt/hugetlbfs/FOO \
> >     -monitor stdio
> > (qemu) info mem-nodes
> > node0: file=/proc/self/fd/3, offset=0G, length=1G
> > node1: file=/proc/self/fd/3, offset=1G, length=1G
> >
> > This example takes advantage of the fact that with Linux you can still
> > access a deleted file via /proc/self/fd/NNN, which, AFAICT, would avoid
> > the need for a -keep-mem-path-files option.
>
> I like the suggestion. But other processes still need to be able to open
> those files if we want to do anything useful with them. In this case, I
> guess it's better to let QEMU itself build a "/proc/<getpid()>/fd/<fd>"
> string, instead of using /proc/self and forcing the client to find out
> what the right PID is?

Oops, yes of course. I did intend that client apps could use the files,
so I should have used /proc/$PID and not /proc/self.

> Anyway, even if we want to avoid file-descriptor and /proc tricks, we
> can still use the interface you suggest. Then we wouldn't need to have
> any filename assumptions: the filenames could be completely random, as
> they would be reported using the new monitor command.
>
> > By returning the info via a monitor command you also avoid hardcoding
> > the use of a single file for all of memory, and avoid hardcoding the
> > fact that QEMU stores the nodes in contiguous order inside the file.
> > E.g. QEMU could easily return data like this:
> >
> > $ qemu-system-x86_64 [...] -m 2048 -smp 4 \
> >     -numa node,cpus=0-1,mem=1024 -numa node,cpus=2-3,mem=1024 \
> >     -mem-prealloc -mem-path /mnt/hugetlbfs/FOO \
> >     -monitor stdio
> > (qemu) info mem-nodes
> > node0: file=/proc/self/fd/3, offset=0G, length=1G
> > node1: file=/proc/self/fd/4, offset=0G, length=1G
> >
> > or more ingenious options.
>
> Sounds good.
>
> --
> Eduardo

--
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC PATCH 0/6] option to not remove files inside -mem-path dir (v2)
Resending the series after fixing some coding style issues. Does anybody
have any feedback about this proposal?

Changes v1 -> v2:
- Coding style fixes

Original cover letter:

I was investigating whether there are any mechanisms that allow manual
pinning of guest RAM to specific host NUMA nodes, in the case of
multi-node KVM guests, and noticed that -mem-path could be used for that,
except that it currently removes any files it creates (using mkstemp())
immediately, not allowing numactl to be used on the backing files as a
result. This patch series adds a -keep-mem-path-files option to make QEMU
create the files inside -mem-path with more predictable names, and not
remove them after creation.

Some previous discussions about the subject, for reference:
- Message-ID: 1281534738-8310-1-git-send-email-andre.przyw...@amd.com
  http://article.gmane.org/gmane.comp.emulators.kvm.devel/57684
- Message-ID: 4c7d7c2a.7000...@codemonkey.ws
  http://article.gmane.org/gmane.comp.emulators.kvm.devel/58835

A more recent thread can be found at:
- Message-ID: 20111029184502.gh11...@in.ibm.com
  http://article.gmane.org/gmane.comp.emulators.qemu/123001

Note that this is just a mechanism to facilitate manual static binding
using numactl on hugetlbfs later, for optimization. It may be especially
useful for single large multi-node guest use-cases (and, of course, has
to be used with care).

I don't know if it is a good idea to use the memory range names as a
publicly-visible interface. Another option may be to use a single file
instead, and mmap different regions inside the same file for each memory
region. I am open to comments and suggestions.

Example (untested) usage to manually bind each half of the RAM of a guest
to a different NUMA node:

$ qemu-system-x86_64 [...] -m 2048 -smp 4 \
    -numa node,cpus=0-1,mem=1024 -numa node,cpus=2-3,mem=1024 \
    -mem-prealloc -keep-mem-path-files -mem-path /mnt/hugetlbfs/FOO
$ numactl --offset=1G --length=1G --membind=1 --file /mnt/hugetlbfs/FOO/pc.ram
$ numactl --offset=0 --length=1G --membind=2 --file /mnt/hugetlbfs/FOO/pc.ram

Eduardo Habkost (6):
  file_ram_alloc(): coding style fixes
  file_ram_alloc(): use g_strdup_printf() instead of asprintf()
  vl.c: change mem_prealloc to bool (v2)
  file_ram_alloc: change length argument to size_t (v2)
  file_ram_alloc(): extract temporary-file creation code to separate
    function (v2)
  add -keep-mem-path-files option (v2)

 cpu-all.h       |  3 ++-
 exec.c          | 68 +++
 qemu-options.hx | 12 ++
 vl.c            |  9 ++--
 4 files changed, 75 insertions(+), 17 deletions(-)

--
1.7.10.4
Re: [RFC PATCH 0/6] option to not remove files inside -mem-path dir (v2)
On Mon, Jul 02, 2012 at 03:06:32PM -0300, Eduardo Habkost wrote:
> [...]
>
> Example (untested) usage to manually bind each half of the RAM of a
> guest to a different NUMA node:
>
> $ qemu-system-x86_64 [...] -m 2048 -smp 4 \
>     -numa node,cpus=0-1,mem=1024 -numa node,cpus=2-3,mem=1024 \
>     -mem-prealloc -keep-mem-path-files -mem-path /mnt/hugetlbfs/FOO
> $ numactl --offset=1G --length=1G --membind=1 --file /mnt/hugetlbfs/FOO/pc.ram
> $ numactl --offset=0 --length=1G --membind=2 --file /mnt/hugetlbfs/FOO/pc.ram

I'd suggest that instead of making the memory file name into a public ABI
that QEMU needs to maintain, QEMU could expose the info via a monitor
command, e.g.:

$ qemu-system-x86_64 [...] -m 2048 -smp 4 \
    -numa node,cpus=0-1,mem=1024 -numa node,cpus=2-3,mem=1024 \
    -mem-prealloc -mem-path /mnt/hugetlbfs/FOO \
    -monitor stdio
(qemu) info mem-nodes
node0: file=/proc/self/fd/3, offset=0G, length=1G
node1: file=/proc/self/fd/3, offset=1G, length=1G

This example takes advantage of the fact that with Linux you can still
access a deleted file via /proc/self/fd/NNN, which, AFAICT, would avoid
the need for a -keep-mem-path-files option.

By returning the info via a monitor command you also avoid hardcoding the
use of a single file for all of memory, and avoid hardcoding the fact
that QEMU stores the nodes in contiguous order inside the file. E.g. QEMU
could easily return data like this:

$ qemu-system-x86_64 [...] -m 2048 -smp 4 \
    -numa node,cpus=0-1,mem=1024 -numa node,cpus=2-3,mem=1024 \
    -mem-prealloc -mem-path /mnt/hugetlbfs/FOO \
    -monitor stdio
(qemu) info mem-nodes
node0: file=/proc/self/fd/3, offset=0G, length=1G
node1: file=/proc/self/fd/4, offset=0G, length=1G

or more ingenious options.

Regards,
Daniel
Re: [RFC PATCH 0/6] option to not remove files inside -mem-path dir (v2)
On Mon, Jul 02, 2012 at 07:56:58PM +0100, Daniel P. Berrange wrote:
> On Mon, Jul 02, 2012 at 03:06:32PM -0300, Eduardo Habkost wrote:
> > [...]
>
> I'd suggest that instead of making the memory file name into a public
> ABI that QEMU needs to maintain, QEMU could expose the info via a
> monitor command, e.g.:
>
> $ qemu-system-x86_64 [...] -m 2048 -smp 4 \
>     -numa node,cpus=0-1,mem=1024 -numa node,cpus=2-3,mem=1024 \
>     -mem-prealloc -mem-path /mnt/hugetlbfs/FOO \
>     -monitor stdio
> (qemu) info mem-nodes
> node0: file=/proc/self/fd/3, offset=0G, length=1G
> node1: file=/proc/self/fd/3, offset=1G, length=1G
>
> This example takes advantage of the fact that with Linux you can still
> access a deleted file via /proc/self/fd/NNN, which, AFAICT, would avoid
> the need for a -keep-mem-path-files option.

I like the suggestion. But other processes still need to be able to open
those files if we want to do anything useful with them. In this case, I
guess it's better to let QEMU itself build a "/proc/<getpid()>/fd/<fd>"
string, instead of using /proc/self and forcing the client to find out
what the right PID is?

Anyway, even if we want to avoid file-descriptor and /proc tricks, we can
still use the interface you suggest. Then we wouldn't need to have any
filename assumptions: the filenames could be completely random, as they
would be reported using the new monitor command.

> By returning the info via a monitor command you also avoid hardcoding
> the use of a single file for all of memory, and avoid hardcoding the
> fact that QEMU stores the nodes in contiguous order inside the file.
> E.g. QEMU could easily return data like this:
>
> $ qemu-system-x86_64 [...] -m 2048 -smp 4 \
>     -numa node,cpus=0-1,mem=1024 -numa node,cpus=2-3,mem=1024 \
>     -mem-prealloc -mem-path /mnt/hugetlbfs/FOO \
>     -monitor stdio
> (qemu) info mem-nodes
> node0: file=/proc/self/fd/3, offset=0G, length=1G
> node1: file=/proc/self/fd/4, offset=0G, length=1G
>
> or more ingenious options.

Sounds good.

--
Eduardo