Re: [RFC PATCH 0/6] option to not remove files inside -mem-path dir (v2)

2012-07-03 Thread Daniel P. Berrange
On Mon, Jul 02, 2012 at 04:54:03PM -0300, Eduardo Habkost wrote:
 On Mon, Jul 02, 2012 at 07:56:58PM +0100, Daniel P. Berrange wrote:
  On Mon, Jul 02, 2012 at 03:06:32PM -0300, Eduardo Habkost wrote:
   Resending series, after fixing some coding style issues. Does anybody have
   any feedback about this proposal?
   
   Changes v1 - v2:
- Coding style fixes
   
   Original cover letter:
   
   I was investigating if there are any mechanisms that allow manual
   pinning of guest RAM to specific host NUMA nodes, in the case of
   multi-node KVM guests, and noticed that -mem-path could be used for
   that, except that it currently removes any files it creates (using
   mkstemp()) immediately, not allowing numactl to be used on the backing
   files, as a result. These patches add a -keep-mem-path-files option to
   make QEMU create the files inside -mem-path with more predictable
   names, and not remove them after creation.
   
   Some previous discussions about the subject, for reference:
- Message-ID: 1281534738-8310-1-git-send-email-andre.przyw...@amd.com
  http://article.gmane.org/gmane.comp.emulators.kvm.devel/57684
- Message-ID: 4c7d7c2a.7000...@codemonkey.ws
  http://article.gmane.org/gmane.comp.emulators.kvm.devel/58835
   
   A more recent thread can be found at:
- Message-ID: 20111029184502.gh11...@in.ibm.com
  http://article.gmane.org/gmane.comp.emulators.qemu/123001
   
   Note that this is just a mechanism to facilitate manual static binding
   using numactl on hugetlbfs later, for optimization. This may be
   especially useful for single large multi-node guest use cases (and, of
   course, has to be used with care).
   
   I don't know if it is a good idea to use the memory range names as a
   publicly visible interface. Another option may be to use a single file
   instead, and mmap different regions inside the same file for each
   memory region. I am open to comments and suggestions.
   
   Example (untested) usage to manually bind each half of the RAM of a
   guest to a different NUMA node:
   
    $ qemu-system-x86_64 [...] -m 2048 -smp 4 \
      -numa node,cpus=0-1,mem=1024 -numa node,cpus=2-3,mem=1024 \
      -mem-prealloc -keep-mem-path-files -mem-path /mnt/hugetlbfs/FOO
    $ numactl --offset=1G --length=1G --membind=1 --file /mnt/hugetlbfs/FOO/pc.ram
    $ numactl --offset=0  --length=1G --membind=2 --file /mnt/hugetlbfs/FOO/pc.ram
  
  I'd suggest that instead of making the memory file name into a
  public ABI that QEMU needs to maintain, QEMU could expose the info
  via a monitor command, e.g.:
  
 $ qemu-system-x86_64 [...] -m 2048 -smp 4 \
   -numa node,cpus=0-1,mem=1024 -numa node,cpus=2-3,mem=1024 \
   -mem-prealloc -mem-path /mnt/hugetlbfs/FOO \
   -monitor stdio
 (qemu) info mem-nodes
  node0: file=/proc/self/fd/3, offset=0G, length=1G
  node1: file=/proc/self/fd/3, offset=1G, length=1G
  
  This example takes advantage of the fact that with Linux, you can
  still access a deleted file via /proc/self/fd/NNN, which, AFAICT,
  would avoid the need for -keep-mem-path-files.
 
 I like the suggestion.
 
 But other processes still need to be able to open those files if we want
 to do anything useful with them. In this case, I guess it's better to
 let QEMU itself build a /proc/getpid()/fd/fd string instead of
 using /proc/self and forcing the client to find out what's the right
 PID?
 
 Anyway, even if we want to avoid file-descriptor and /proc tricks, we
 can still use the interface you suggest. Then we wouldn't need to have
 any filename assumptions: the filenames could be completely random, as
 they would be reported using the new monitor command.

Oops, yes of course. I did intend that client apps could use the
files, so I should have used /proc/$PID and not /proc/self.

 
  
  By returning info via a monitor command you also avoid hardcoding
  the use of a single file for all of memory. You also avoid hardcoding
  the fact that QEMU stores the nodes in contiguous order inside the
  file. E.g. QEMU could easily return data like this:
  
  
 $ qemu-system-x86_64 [...] -m 2048 -smp 4 \
   -numa node,cpus=0-1,mem=1024 -numa node,cpus=2-3,mem=1024 \
   -mem-prealloc -mem-path /mnt/hugetlbfs/FOO \
   -monitor stdio
 (qemu) info mem-nodes
  node0: file=/proc/self/fd/3, offset=0G, length=1G
  node1: file=/proc/self/fd/4, offset=0G, length=1G
  
  or more ingenious options.
 
 Sounds good.
 
 -- 
 Eduardo

-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[RFC PATCH 0/6] option to not remove files inside -mem-path dir (v2)

2012-07-02 Thread Eduardo Habkost
Resending series, after fixing some coding style issues. Does anybody have any
feedback about this proposal?

Changes v1 - v2:
 - Coding style fixes

Original cover letter:

I was investigating if there are any mechanisms that allow manual pinning of
guest RAM to specific host NUMA nodes, in the case of multi-node KVM guests, and
noticed that -mem-path could be used for that, except that it currently removes
any files it creates (using mkstemp()) immediately, not allowing numactl to be
used on the backing files, as a result. These patches add a -keep-mem-path-files
option to make QEMU create the files inside -mem-path with more predictable
names, and not remove them after creation.
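
To illustrate why the unlinked files are out of reach for path-based tools such as numactl, here is a minimal Python sketch of the create-then-unlink pattern described above. It is not the QEMU code (which is C, in file_ram_alloc()); the helper name create_unlinked_file and the "guest_ram." prefix are made up for this example.

```python
import os
import tempfile


def create_unlinked_file(directory, size):
    """Create a temporary file, remove its name at once, then size it.

    Mirrors the mkstemp()-then-unlink() pattern: the descriptor stays
    usable, but no path inside `directory` refers to the file any more.
    The "guest_ram." prefix is a placeholder, not QEMU's real naming.
    """
    fd, path = tempfile.mkstemp(prefix="guest_ram.", dir=directory)
    os.unlink(path)          # the name is gone; the open fd keeps the data
    os.ftruncate(fd, size)   # give the backing file its size
    return fd, path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        fd, path = create_unlinked_file(d, 4096)
        print(os.path.exists(path))   # nothing left for a path-based tool
        print(os.fstat(fd).st_size)   # the file itself is still there
        os.close(fd)
```

Any tool that takes a file name, as numactl --file does, has nothing to open once the unlink has happened, which is the gap the proposed option is meant to close.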

Some previous discussions about the subject, for reference:
 - Message-ID: 1281534738-8310-1-git-send-email-andre.przyw...@amd.com
   http://article.gmane.org/gmane.comp.emulators.kvm.devel/57684
 - Message-ID: 4c7d7c2a.7000...@codemonkey.ws
   http://article.gmane.org/gmane.comp.emulators.kvm.devel/58835

A more recent thread can be found at:
 - Message-ID: 20111029184502.gh11...@in.ibm.com
   http://article.gmane.org/gmane.comp.emulators.qemu/123001

Note that this is just a mechanism to facilitate manual static binding using
numactl on hugetlbfs later, for optimization. This may be especially useful for
single large multi-node guest use cases (and, of course, has to be used with
care).

I don't know if it is a good idea to use the memory range names as a publicly
visible interface. Another option may be to use a single file instead, and mmap
different regions inside the same file for each memory region. I am open to
comments and suggestions.
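
The single-file alternative mentioned above could look roughly like this. A minimal Python sketch, assuming one backing file split into equally sized regions; map_regions is a hypothetical helper, and a real implementation would work in hugetlbfs page-sized units rather than mmap.ALLOCATIONGRANULARITY.

```python
import mmap
import tempfile


def map_regions(fd, region_size, count):
    """Map `count` consecutive regions of one file as separate mappings.

    Each mapping starts at a different file offset, so every memory
    region is addressable as (file, offset, length).
    """
    return [
        mmap.mmap(fd, region_size, offset=i * region_size)
        for i in range(count)
    ]


if __name__ == "__main__":
    region = mmap.ALLOCATIONGRANULARITY  # offsets must be multiples of this
    with tempfile.TemporaryFile() as f:
        f.truncate(2 * region)
        low, high = map_regions(f.fileno(), region, 2)
        high[:4] = b"node"               # write through the second mapping
        high.flush()
        f.seek(region)
        print(f.read(4))                 # the data sits at offset `region`
        low.close()
        high.close()
```

With this layout a per-region policy would be expressed through offset and length within one file, which is the shape the numactl example in this mail already assumes.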

Example (untested) usage to manually bind each half of the RAM of a guest to a
different NUMA node:

 $ qemu-system-x86_64 [...] -m 2048 -smp 4 \
   -numa node,cpus=0-1,mem=1024 -numa node,cpus=2-3,mem=1024 \
   -mem-prealloc -keep-mem-path-files -mem-path /mnt/hugetlbfs/FOO
 $ numactl --offset=1G --length=1G --membind=1 --file /mnt/hugetlbfs/FOO/pc.ram
 $ numactl --offset=0  --length=1G --membind=2 --file /mnt/hugetlbfs/FOO/pc.ram


Eduardo Habkost (6):
  file_ram_alloc(): coding style fixes
  file_ram_alloc(): use g_strdup_printf() instead of asprintf()
  vl.c: change mem_prealloc to bool (v2)
  file_ram_alloc: change length argument to size_t (v2)
  file_ram_alloc(): extract temporary-file creation code to separate
    function (v2)
  add -keep-mem-path-files option (v2)

 cpu-all.h       |    3 ++-
 exec.c          |   68 +++
 qemu-options.hx |   12 ++
 vl.c            |    9 ++--
 4 files changed, 75 insertions(+), 17 deletions(-)

-- 
1.7.10.4


Re: [RFC PATCH 0/6] option to not remove files inside -mem-path dir (v2)

2012-07-02 Thread Daniel P. Berrange
On Mon, Jul 02, 2012 at 03:06:32PM -0300, Eduardo Habkost wrote:
 Resending series, after fixing some coding style issues. Does anybody have any
 feedback about this proposal?
 
 Changes v1 - v2:
  - Coding style fixes
 
 Original cover letter:
 
 I was investigating if there are any mechanisms that allow manual pinning of
 guest RAM to specific host NUMA nodes, in the case of multi-node KVM guests,
 and noticed that -mem-path could be used for that, except that it currently
 removes any files it creates (using mkstemp()) immediately, not allowing
 numactl to be used on the backing files, as a result. These patches add a
 -keep-mem-path-files option to make QEMU create the files inside -mem-path
 with more predictable names, and not remove them after creation.
 
 Some previous discussions about the subject, for reference:
  - Message-ID: 1281534738-8310-1-git-send-email-andre.przyw...@amd.com
http://article.gmane.org/gmane.comp.emulators.kvm.devel/57684
  - Message-ID: 4c7d7c2a.7000...@codemonkey.ws
http://article.gmane.org/gmane.comp.emulators.kvm.devel/58835
 
 A more recent thread can be found at:
  - Message-ID: 20111029184502.gh11...@in.ibm.com
http://article.gmane.org/gmane.comp.emulators.qemu/123001
 
 Note that this is just a mechanism to facilitate manual static binding using
 numactl on hugetlbfs later, for optimization. This may be especially useful
 for single large multi-node guest use cases (and, of course, has to be used
 with care).
 
 I don't know if it is a good idea to use the memory range names as a publicly
 visible interface. Another option may be to use a single file instead, and
 mmap different regions inside the same file for each memory region. I am open
 to comments and suggestions.
 
 Example (untested) usage to manually bind each half of the RAM of a guest to a
 different NUMA node:
 
  $ qemu-system-x86_64 [...] -m 2048 -smp 4 \
    -numa node,cpus=0-1,mem=1024 -numa node,cpus=2-3,mem=1024 \
    -mem-prealloc -keep-mem-path-files -mem-path /mnt/hugetlbfs/FOO
  $ numactl --offset=1G --length=1G --membind=1 --file /mnt/hugetlbfs/FOO/pc.ram
  $ numactl --offset=0  --length=1G --membind=2 --file /mnt/hugetlbfs/FOO/pc.ram

I'd suggest that instead of making the memory file name into a
public ABI that QEMU needs to maintain, QEMU could expose the info
via a monitor command, e.g.:

   $ qemu-system-x86_64 [...] -m 2048 -smp 4 \
     -numa node,cpus=0-1,mem=1024 -numa node,cpus=2-3,mem=1024 \
     -mem-prealloc -mem-path /mnt/hugetlbfs/FOO \
     -monitor stdio
   (qemu) info mem-nodes
   node0: file=/proc/self/fd/3, offset=0G, length=1G
   node1: file=/proc/self/fd/3, offset=1G, length=1G

This example takes advantage of the fact that with Linux, you can
still access a deleted file via /proc/self/fd/NNN, which, AFAICT,
would avoid the need for -keep-mem-path-files.
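
The /proc behaviour referred to above can be checked with a short Linux-only Python sketch. The helper name reopen_deleted is invented for the illustration; nothing here is QEMU code.

```python
import os
import tempfile


def reopen_deleted(fd):
    """Open a second descriptor for `fd` through the Linux /proc filesystem."""
    return os.open("/proc/self/fd/%d" % fd, os.O_RDWR)


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    os.unlink(path)                  # the file no longer has a name
    os.write(fd, b"guest ram")
    fd2 = reopen_deleted(fd)         # still reachable via /proc (Linux only)
    print(os.pread(fd2, 9, 0))
    os.close(fd2)
    os.close(fd)
```

Note that /proc/self only works from inside the process that owns the descriptor; a separate tool would need the owner's PID in the path.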

By returning info via a monitor command you also avoid hardcoding
the use of a single file for all of memory. You also avoid hardcoding
the fact that QEMU stores the nodes in contiguous order inside the
file. E.g. QEMU could easily return data like this:


   $ qemu-system-x86_64 [...] -m 2048 -smp 4 \
     -numa node,cpus=0-1,mem=1024 -numa node,cpus=2-3,mem=1024 \
     -mem-prealloc -mem-path /mnt/hugetlbfs/FOO \
     -monitor stdio
   (qemu) info mem-nodes
   node0: file=/proc/self/fd/3, offset=0G, length=1G
   node1: file=/proc/self/fd/4, offset=0G, length=1G

or more ingenious options.
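
If a monitor command along these lines existed, a management tool could turn its output into per-region records. The following Python sketch parses the sample output shown above; note that "info mem-nodes" and its output format are only the proposal made in this mail, not an existing QEMU command, and parse_mem_nodes is a hypothetical helper.

```python
import re

# One line of the proposed (hypothetical) "info mem-nodes" output, e.g.
#   node1: file=/proc/self/fd/4, offset=0G, length=1G
LINE_RE = re.compile(
    r"^\s*node(?P<node>\d+):\s*file=(?P<file>[^,]+),\s*"
    r"offset=(?P<offset>\d+)G,\s*length=(?P<length>\d+)G\s*$"
)


def parse_mem_nodes(text):
    """Return one dict per node line; lines that do not match are skipped."""
    regions = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            regions.append({
                "node": int(m.group("node")),
                "file": m.group("file"),
                "offset_gib": int(m.group("offset")),
                "length_gib": int(m.group("length")),
            })
    return regions


if __name__ == "__main__":
    sample = (
        "node0: file=/proc/self/fd/3, offset=0G, length=1G\n"
        "node1: file=/proc/self/fd/4, offset=0G, length=1G\n"
    )
    for region in parse_mem_nodes(sample):
        print(region)
```

Because the client reads file, offset and length from the reply, it does not care whether QEMU uses one file or several, which is the flexibility argued for here.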

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|


Re: [RFC PATCH 0/6] option to not remove files inside -mem-path dir (v2)

2012-07-02 Thread Eduardo Habkost
On Mon, Jul 02, 2012 at 07:56:58PM +0100, Daniel P. Berrange wrote:
 On Mon, Jul 02, 2012 at 03:06:32PM -0300, Eduardo Habkost wrote:
  Resending series, after fixing some coding style issues. Does anybody have
  any feedback about this proposal?
  
  Changes v1 - v2:
   - Coding style fixes
  
  Original cover letter:
  
  I was investigating if there are any mechanisms that allow manual pinning
  of guest RAM to specific host NUMA nodes, in the case of multi-node KVM
  guests, and noticed that -mem-path could be used for that, except that it
  currently removes any files it creates (using mkstemp()) immediately, not
  allowing numactl to be used on the backing files, as a result. These
  patches add a -keep-mem-path-files option to make QEMU create the files
  inside -mem-path with more predictable names, and not remove them after
  creation.
  
  Some previous discussions about the subject, for reference:
   - Message-ID: 1281534738-8310-1-git-send-email-andre.przyw...@amd.com
 http://article.gmane.org/gmane.comp.emulators.kvm.devel/57684
   - Message-ID: 4c7d7c2a.7000...@codemonkey.ws
 http://article.gmane.org/gmane.comp.emulators.kvm.devel/58835
  
  A more recent thread can be found at:
   - Message-ID: 20111029184502.gh11...@in.ibm.com
 http://article.gmane.org/gmane.comp.emulators.qemu/123001
  
  Note that this is just a mechanism to facilitate manual static binding using
  numactl on hugetlbfs later, for optimization. This may be especially useful
  for single large multi-node guest use cases (and, of course, has to be used
  with care).
  
  I don't know if it is a good idea to use the memory range names as a
  publicly visible interface. Another option may be to use a single file
  instead, and mmap different regions inside the same file for each memory
  region. I am open to comments and suggestions.
  
  Example (untested) usage to manually bind each half of the RAM of a guest
  to a different NUMA node:
  
   $ qemu-system-x86_64 [...] -m 2048 -smp 4 \
     -numa node,cpus=0-1,mem=1024 -numa node,cpus=2-3,mem=1024 \
     -mem-prealloc -keep-mem-path-files -mem-path /mnt/hugetlbfs/FOO
   $ numactl --offset=1G --length=1G --membind=1 --file /mnt/hugetlbfs/FOO/pc.ram
   $ numactl --offset=0  --length=1G --membind=2 --file /mnt/hugetlbfs/FOO/pc.ram
 
 I'd suggest that instead of making the memory file name into a
 public ABI that QEMU needs to maintain, QEMU could expose the info
 via a monitor command, e.g.:
 
$ qemu-system-x86_64 [...] -m 2048 -smp 4 \
  -numa node,cpus=0-1,mem=1024 -numa node,cpus=2-3,mem=1024 \
  -mem-prealloc -mem-path /mnt/hugetlbfs/FOO \
  -monitor stdio
(qemu) info mem-nodes
 node0: file=/proc/self/fd/3, offset=0G, length=1G
 node1: file=/proc/self/fd/3, offset=1G, length=1G
 
 This example takes advantage of the fact that with Linux, you can
 still access a deleted file via /proc/self/fd/NNN, which, AFAICT,
 would avoid the need for -keep-mem-path-files.

I like the suggestion.

But other processes still need to be able to open those files if we want
to do anything useful with them. In this case, I guess it's better to
let QEMU itself build a /proc/getpid()/fd/fd string instead of
using /proc/self and forcing the client to find out what's the right
PID?
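
A rough sketch of the string that would be reported, in Python for brevity (QEMU itself would build it in C). proc_fd_path is a hypothetical name, and the second half assumes Linux and a reader process running as the same user.

```python
import os
import subprocess
import sys
import tempfile


def proc_fd_path(fd, pid=None):
    """Return a /proc path that other processes can use to reach `fd`."""
    if pid is None:
        pid = os.getpid()   # the owning process fills in its own PID
    return "/proc/%d/fd/%d" % (pid, fd)


if __name__ == "__main__":
    fd, name = tempfile.mkstemp()
    os.unlink(name)                  # the file is now nameless
    os.write(fd, b"guest ram")
    path = proc_fd_path(fd)
    # A separate process opens the path. /proc/self would resolve to the
    # reader's own descriptors, which is why the PID is spelled out.
    reader = "import sys; sys.stdout.write(open(sys.argv[1]).read())"
    out = subprocess.run([sys.executable, "-c", reader, path],
                         capture_output=True, text=True, check=True)
    print(out.stdout)
    os.close(fd)
```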

Anyway, even if we want to avoid file-descriptor and /proc tricks, we
can still use the interface you suggest. Then we wouldn't need to have
any filename assumptions: the filenames could be completely random, as
they would be reported using the new monitor command.

 
 By returning info via a monitor command you also avoid hardcoding
 the use of a single file for all of memory. You also avoid hardcoding
 the fact that QEMU stores the nodes in contiguous order inside the
 file. E.g. QEMU could easily return data like this:
 
 
$ qemu-system-x86_64 [...] -m 2048 -smp 4 \
  -numa node,cpus=0-1,mem=1024 -numa node,cpus=2-3,mem=1024 \
  -mem-prealloc -mem-path /mnt/hugetlbfs/FOO \
  -monitor stdio
(qemu) info mem-nodes
 node0: file=/proc/self/fd/3, offset=0G, length=1G
 node1: file=/proc/self/fd/4, offset=0G, length=1G
 
 or more ingenious options.

Sounds good.

-- 
Eduardo