Re: [libvirt] [PATCH 5/6] qemuProcessHook: Call virNuma*() iff needed

2015-03-30 Thread Ján Tomko
On Fri, Mar 27, 2015 at 03:26:53PM +0100, Michal Privoznik wrote:
 Once upon a time, there was a little domain. And the domain was pinned
 onto a NUMA node and hasn't fully allocated its memory:
 
   memory unit='KiB'2355200/memory
   currentMemory unit='KiB'1048576/currentMemory
 
   numatune
 memory mode='strict' nodeset='0'/
   /numatune
 
 Oh little me, said the domain, what will I do with so few memory.

s/few/little/

 If I only had a few megabytes more. But the old admin noticed
 whimpering, barely audible to untrained human ear. And good admin he

the whimpering
the good admin (?)

 was, he gave the domain yet more memory. But the old NUMA topology
 witch forbidden to allocate more memory on the node zero. So he

forbidden - forbade or forbid

 decided to allocate it on a different node:
 
 virsh # numatune little_domain --nodeset 0-1
 
 virsh # setmem little_domain 2355200
 
 The little domain was happy. For a while. Until bad, sharp teeth

a bad? the bad?

 shaped creature came. Every process in the system was afraid of him.
 The OOM Killer they called him. Oh no, he's after the little domain.
 There's no escape.
 
 Do you kids know why? Because when the little domain was born, her
 father, Libvirt, called numa_set_membind(). So even if the admin
 allowed her to allocate memory from other nodes in the cgroups, the
 membind() forbid it.
 
 So what's the lesson? Libvirt should rely on cgroups, whenever
 possible and use numa_set_membind() as the last ditch effort.
 
 Signed-off-by: Michal Privoznik mpriv...@redhat.com
 ---
  src/qemu/qemu_process.c | 32 
  1 file changed, 28 insertions(+), 4 deletions(-)
 

I don't have much experience proofreading children's books,
but the logic looks okay to me.

 diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
 index 79f763e..cba042d 100644
 --- a/src/qemu/qemu_process.c
 +++ b/src/qemu/qemu_process.c
 @@ -3154,6 +3154,7 @@ static int qemuProcessHook(void *data)
  int fd;
  virBitmapPtr nodeset = NULL;
  virDomainNumatuneMemMode mode;
 +bool doNuma = true;
  
  /* This method cannot use any mutexes, which are not
   * protected across fork()
 @@ -3185,11 +3186,34 @@ static int qemuProcessHook(void *data)
  goto cleanup;
  
  mode = virDomainNumatuneGetMode(h-vm-def-numa, -1);
 -nodeset = virDomainNumatuneGetNodeset(h-vm-def-numa,
 -  priv-autoNodeset, -1);
  
 -if (virNumaSetupMemoryPolicy(mode, nodeset)  0)
 -goto cleanup;
 +if (mode == VIR_DOMAIN_NUMATUNE_MEM_STRICT) {
 +virCgroupPtr cgroup = NULL;
 +
 +/* Create dummy cgroup ... */
 +if (virCgroupNewSelf(cgroup)  0)
 +goto cleanup;

The domain's cgroup is accessible under priv-cgroup here, you can use
that one instead.

 +
 +/* ... just to detect if cpuset cgroup is present */
 +if (virCgroupHasController(cgroup, VIR_CGROUP_CONTROLLER_CPUSET)) {
 +/* Because if it's not, we must use virNuma* APIs to bind
 + * memory onto desired nodes. CGroup way is preferred, as
 + * it allows runtime tuning, while virNuma - well, once
 + * set and child (qemu) is exec()-ed, we can't do
 + * anything about the settings. virNuma* does not take
 + * any PID argument after all. */

Can this comment be shortened?

Jan


signature.asc
Description: Digital signature
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH 5/6] qemuProcessHook: Call virNuma*() iff needed

2015-03-30 Thread Michal Privoznik
On 30.03.2015 15:15, Ján Tomko wrote:
 On Fri, Mar 27, 2015 at 03:26:53PM +0100, Michal Privoznik wrote:
 Once upon a time, there was a little domain. And the domain was pinned
 onto a NUMA node and hasn't fully allocated its memory:

   memory unit='KiB'2355200/memory
   currentMemory unit='KiB'1048576/currentMemory

   numatune
 memory mode='strict' nodeset='0'/
   /numatune

 Oh little me, said the domain, what will I do with so few memory.
 
 s/few/little/
 
 If I only had a few megabytes more. But the old admin noticed
 whimpering, barely audible to untrained human ear. And good admin he
 
 the whimpering
 the good admin (?)
 
 was, he gave the domain yet more memory. But the old NUMA topology
 witch forbidden to allocate more memory on the node zero. So he
 
 forbidden - forbade or forbid
 
 decided to allocate it on a different node:

 virsh # numatune little_domain --nodeset 0-1

 virsh # setmem little_domain 2355200

 The little domain was happy. For a while. Until bad, sharp teeth
 
 a bad? the bad?
 
 shaped creature came. Every process in the system was afraid of him.
 The OOM Killer they called him. Oh no, he's after the little domain.
 There's no escape.

 Do you kids know why? Because when the little domain was born, her
 father, Libvirt, called numa_set_membind(). So even if the admin
 allowed her to allocate memory from other nodes in the cgroups, the
 membind() forbid it.

 So what's the lesson? Libvirt should rely on cgroups, whenever
 possible and use numa_set_membind() as the last ditch effort.

 Signed-off-by: Michal Privoznik mpriv...@redhat.com
 ---
  src/qemu/qemu_process.c | 32 
  1 file changed, 28 insertions(+), 4 deletions(-)

 
 I don't have much experience proofreading children's books,
 but the logic looks okay to me.
 
 diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
 index 79f763e..cba042d 100644
 --- a/src/qemu/qemu_process.c
 +++ b/src/qemu/qemu_process.c
 @@ -3154,6 +3154,7 @@ static int qemuProcessHook(void *data)
  int fd;
  virBitmapPtr nodeset = NULL;
  virDomainNumatuneMemMode mode;
 +bool doNuma = true;
  
  /* This method cannot use any mutexes, which are not
   * protected across fork()
 @@ -3185,11 +3186,34 @@ static int qemuProcessHook(void *data)
  goto cleanup;
  
  mode = virDomainNumatuneGetMode(h-vm-def-numa, -1);
 -nodeset = virDomainNumatuneGetNodeset(h-vm-def-numa,
 -  priv-autoNodeset, -1);
  
 -if (virNumaSetupMemoryPolicy(mode, nodeset)  0)
 -goto cleanup;
 +if (mode == VIR_DOMAIN_NUMATUNE_MEM_STRICT) {
 +virCgroupPtr cgroup = NULL;
 +
 +/* Create dummy cgroup ... */
 +if (virCgroupNewSelf(cgroup)  0)
 +goto cleanup;
 
 The domain's cgroup is accessible under priv-cgroup here, you can use
 that one instead.

Good point, I though that CGroups are created later in the process. But
due to handshaking with child, we are guaranteed that we can access
priv-cgroup. Will send v2 in a while.

Michal

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


[libvirt] [PATCH 5/6] qemuProcessHook: Call virNuma*() iff needed

2015-03-27 Thread Michal Privoznik
Once upon a time, there was a little domain. And the domain was pinned
onto a NUMA node and hasn't fully allocated its memory:

  memory unit='KiB'2355200/memory
  currentMemory unit='KiB'1048576/currentMemory

  numatune
memory mode='strict' nodeset='0'/
  /numatune

Oh little me, said the domain, what will I do with so few memory.
If I only had a few megabytes more. But the old admin noticed
whimpering, barely audible to untrained human ear. And good admin he
was, he gave the domain yet more memory. But the old NUMA topology
witch forbidden to allocate more memory on the node zero. So he
decided to allocate it on a different node:

virsh # numatune little_domain --nodeset 0-1

virsh # setmem little_domain 2355200

The little domain was happy. For a while. Until bad, sharp teeth
shaped creature came. Every process in the system was afraid of him.
The OOM Killer they called him. Oh no, he's after the little domain.
There's no escape.

Do you kids know why? Because when the little domain was born, her
father, Libvirt, called numa_set_membind(). So even if the admin
allowed her to allocate memory from other nodes in the cgroups, the
membind() forbid it.

So what's the lesson? Libvirt should rely on cgroups, whenever
possible and use numa_set_membind() as the last ditch effort.

Signed-off-by: Michal Privoznik mpriv...@redhat.com
---
 src/qemu/qemu_process.c | 32 
 1 file changed, 28 insertions(+), 4 deletions(-)

diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 79f763e..cba042d 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -3154,6 +3154,7 @@ static int qemuProcessHook(void *data)
 int fd;
 virBitmapPtr nodeset = NULL;
 virDomainNumatuneMemMode mode;
+bool doNuma = true;
 
 /* This method cannot use any mutexes, which are not
  * protected across fork()
@@ -3185,11 +3186,34 @@ static int qemuProcessHook(void *data)
 goto cleanup;
 
 mode = virDomainNumatuneGetMode(h-vm-def-numa, -1);
-nodeset = virDomainNumatuneGetNodeset(h-vm-def-numa,
-  priv-autoNodeset, -1);
 
-if (virNumaSetupMemoryPolicy(mode, nodeset)  0)
-goto cleanup;
+if (mode == VIR_DOMAIN_NUMATUNE_MEM_STRICT) {
+virCgroupPtr cgroup = NULL;
+
+/* Create dummy cgroup ... */
+if (virCgroupNewSelf(cgroup)  0)
+goto cleanup;
+
+/* ... just to detect if cpuset cgroup is present */
+if (virCgroupHasController(cgroup, VIR_CGROUP_CONTROLLER_CPUSET)) {
+/* Because if it's not, we must use virNuma* APIs to bind
+ * memory onto desired nodes. CGroup way is preferred, as
+ * it allows runtime tuning, while virNuma - well, once
+ * set and child (qemu) is exec()-ed, we can't do
+ * anything about the settings. virNuma* does not take
+ * any PID argument after all. */
+doNuma = false;
+}
+virCgroupFree(cgroup);
+}
+
+if (doNuma) {
+nodeset = virDomainNumatuneGetNodeset(h-vm-def-numa,
+  priv-autoNodeset, -1);
+
+if (virNumaSetupMemoryPolicy(mode, nodeset)  0)
+goto cleanup;
+}
 
 ret = 0;
 
-- 
2.0.5

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list