[libvirt] [RFC PATCH v2 1/1] qemu: host NUMA hugepage policy without guest NUMA

2016-10-30 Thread Sam Bobroff
At the moment, guests that are backed by hugepages in the host are
only able to use policy to control the placement of those hugepages
on a per-(guest-)CPU basis. Policy applied globally is ignored.

Such guests would use <memoryBacking><hugepages/></memoryBacking> and
a <numatune> block with <memory .../> but no <memnode .../> elements.

This patch corrects this by, in this specific case, changing the QEMU
command line from "-mem-prealloc -mem-path=..." (which cannot
specify NUMA policy) to "-object memory-backend-file ..." (which can).

Note: This is not visible to the guest and does not appear to create
a migration incompatibility.

Signed-off-by: Sam Bobroff 
---
Hello libvirt community,

This patch is a RFC of an attempt to understand and fix a problem with NUMA
memory placement policy being ignored in a specific situation.

Previous discussion on this issue:
https://www.redhat.com/archives/libvir-list/2016-October/msg00033.html
https://www.redhat.com/archives/libvir-list/2016-October/msg00514.html

(Sorry, this is a long description because the situation is complicated and
there are several ways of approaching it and I need to explain my reasoning for
choosing this specific method.)

The issue I'm trying to fix:

The problem is apparent when using this combination of features:
* The guest is backed by hugepages in the host.
* The backing memory is constrained by NUMA memory placement policy.
* The guest either does not use NUMA nodes at all, or does but does not specify 
a per-node policy.

The guest XML would therefore contain blocks similar to the following:

  <memoryBacking>
    <hugepages/>
  </memoryBacking>

  <numatune>
    <memory mode='strict' nodeset='...'/>
  </numatune>

(And does not use <numa> ... <cell .../> ... </numa>.)

(Policy works correctly without hugepages, or when memnode is used.)

What happens is that the guest runs successfully but the NUMA memory placement
policy is ignored, allowing memory to be allocated on any (host) node.

Attempted solutions:

The most obvious approach to me is a trivial patch to
qemuBuildMemoryBackendStr().  The test that checks to see if a memory-backend
is necessary (around line 3332 in qemu_command.c) is this:

    /* If none of the following is requested... */
    if (!needHugepage && !userNodeset && !memAccess && !nodeSpecified &&
        !force) {
        /* report back that using the new backend is not necessary
         * to achieve the desired configuration */
        ret = 1;
    } else {

This test does not consider the nodemask, so when no other reason exists to use
a memory-backend (which is the case in point) the policy is lost.

Adding the nodemask to this test appears to fix the issue easily (see the
sketch after this list). However:
* It only works when the guest uses NUMA nodes (because otherwise the whole
  function is skipped at a higher level).
* When the guest does use NUMA nodes, it causes a memory-backend object to be
  added to each one (when they did not previously have one) and this causes
  migration incompatibility.
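
For reference, here is a minimal sketch of that change (illustrative only, not
the actual patch); it simply adds the nodemask, which the surrounding code
already computes, to the list of conditions:

    /* Sketch: also treat a global NUMA policy (nodemask) as a reason to
     * use a memory-backend object. */
    if (!needHugepage && !userNodeset && !memAccess && !nodeSpecified &&
        !nodemask && !force) {
        /* report back that using the new backend is not necessary
         * to achieve the desired configuration */
        ret = 1;
    } else {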

So it seems worthwhile to look for other options.

Other ways of controlling NUMA memory placement are numactl (libnuma) or
cgroups but in practice neither can work in this situation due to the
interaction of several features:
* numactl and cgroups can only apply policy to all memory allocated by that 
thread.
* When a guest is backed by hugepages, memory needs to be preallocated.
* QEMU performs memory preallocation from its main thread, not the VCPU
  threads.

This seems to mean that policy applied to the QEMU VCPU threads does not apply
to pre-allocated memory, and policy applied to the QEMU main thread incorrectly
applies to all of QEMU's allocations, not just the VCPU ones. My testing seems
to confirm this.
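
(Aside, to make the distinction concrete: below is a standalone sketch of the
two mechanisms involved, assuming a Linux host with libnuma's headers installed
(build with -lnuma). set_mempolicy() sets the calling thread's default policy,
so it covers everything the thread allocates; mbind() attaches a policy to one
mapping only, which appears to be what QEMU's memory-backend "host-nodes"
property arranges for the guest RAM region. This is not libvirt or QEMU code.)

#define _GNU_SOURCE
#include <numaif.h>      /* set_mempolicy(), mbind(), MPOL_BIND */
#include <sys/mman.h>
#include <stddef.h>

#define GUEST_RAM_SIZE (4UL << 30)

int main(void)
{
    unsigned long mask = 1UL << 1;          /* host node 1, for example */

    /* Thread-wide default policy: covers the heap, stack growth and
     * everything else this thread allocates from now on. */
    if (set_mempolicy(MPOL_BIND, &mask, sizeof(mask) * 8) != 0)
        return 1;

    /* Per-mapping policy: covers only this region (the guest RAM
     * analogue) and leaves other allocations alone. */
    void *ram = mmap(NULL, GUEST_RAM_SIZE, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (ram == MAP_FAILED)
        return 1;
    if (mbind(ram, GUEST_RAM_SIZE, MPOL_BIND, &mask, sizeof(mask) * 8, 0) != 0)
        return 1;

    return 0;
}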

If that is true, then neither of those approaches can work (at least, not
without changes to QEMU).

At this point it appeared that the only ways to proceed would all require a
migration-breaking change, so I started to look at QEMU to see if that could be
worked around somehow.

... another option:

While investigating QEMU I discovered that it is possible to use a QEMU memory
backend object without attaching it to any guest NUMA node and when it is used
that way it does not break migration. This is (to me at least) a new way of
using a memory-backend object, but it appears to directly replace -mem-path and
-mem-prealloc and additionally allow NUMA memory placement policy to be
specified. (The ID field is still required even though it is not used.)

For example, this...

-m 4G -mem-path /dev/hugepages -mem-prealloc

... seems to be equivalent to this:

-m 4G -object memory-backend-file,id=xxx,prealloc=yes,path=/dev/hugepages,size=4G

But this is now possible as well:

-m 4G -object memory-backend-file,id=xxx,prealloc=yes,path=/dev/hugepages,host-nodes=0-1,size=4G

... so it seems to be a way of solving this issue without breaking migration.

Additionally, there seems to be code specifically in QEMU to prevent the
migration data from changing in this case, which leads me to believe that this
is an intended use of the option but I haven't directly consulted with the QEMU
community.

Impleme

Re: [libvirt] [PATCH 1/1] qemu: host NUMA hugepage policy without guest NUMA

2016-10-26 Thread Sam Bobroff
On Tue, Oct 25, 2016 at 02:13:07PM +0200, Martin Kletzander wrote:
> On Tue, Oct 25, 2016 at 01:10:23PM +1100, Sam Bobroff wrote:
> >On Tue, Oct 18, 2016 at 10:43:31PM +0200, Martin Kletzander wrote:
> >>On Mon, Oct 17, 2016 at 03:45:09PM +1100, Sam Bobroff wrote:
> >>>On Fri, Oct 14, 2016 at 10:19:42AM +0200, Martin Kletzander wrote:
> >>>>On Fri, Oct 14, 2016 at 11:52:22AM +1100, Sam Bobroff wrote:
> >>>>>I did look at the libnuma and cgroups approaches, but I was concerned 
> >>>>>they
> >>>>>wouldn't work in this case, because of the way QEMU allocates memory when
> >>>>>mem-prealloc is used: the memory is allocated in the main process, 
> >>>>>before the
> >>>>>CPU threads are created. (This is based only on a bit of hacking and 
> >>>>>debugging
> >>>>>in QEMU, but it does seem explain the behaviour I've seen so far.)
> >>>>>
> >>>>
> >>>>But we use numactl before QEMU is exec()'d.
> >>>
> >>>Sorry, I jumped ahead a bit. I'll try to explain what I mean:
> >>>
> >>>I think the problem with using this method would be that the NUMA policy is
> >>>applied to all allocations by QEMU, not just ones related to the memory
> >>>backing. I'm not sure if that would cause a serious problem but it seems 
> >>>untidy,
> >>>and it doesn't happen in other situations (i.e. with separate memory 
> >>>backend
> >>>objects, QEMU sets up the policy specifically for each one and other
> >>>allocations aren't affected, AFAIK).  Presumably, if memory were very
> >>>restricted it could prevent the guest from starting.
> >>>
> >>
> >>Yes, it is, that's what <numatune><memory .../> does if you don't have any
> >>other (<memnode .../>) specifics set.
> >>
> >>>>>I think QEMU could be altered to move the preallocations into the VCPU
> >>>>>threads but it didn't seem trivial and I suspected the QEMU community 
> >>>>>would
> >>>>>point out that there was already a way to do it using backend objects.  
> >>>>>Another
> >>>>>option would be to add a -host-nodes parameter to QEMU so that the 
> >>>>>policy can
> >>>>>be given without adding a memory backend object. (That seems like a more
> >>>>>reasonable change to QEMU.)
> >>>>>
> >>>>
> >>>>I think upstream won't like that, mostly because there is already a
> >>>>way.  And that is using memory-backend object.  I think we could just
> >>>>use that and disable changing it live.  But upstream will probably want
> >>>>that to be configurable or something.
> >>>
> >>>Right, but isn't this already an issue in the cases where libvirt is 
> >>>already
> >>>using memory backend objects and NUMA policy? (Or does libvirt already 
> >>>disable
> >>>changing it live in those situations?)
> >>>
> >>
> >>It is.  I'm not trying to say libvirt is perfect.  There are bugs,
> >>e.g. like this one.  The problem is that we tried to do *everything*,
> >>but it's not currently possible.  I'm trying to explain how stuff works
> >>now.  It definitely needs some fixing, though.
> >
> >OK :-)
> >
> >Well, given our discussion, do you think it's worth a v2 of my original patch
> >or would it be better to drop it in favour of some broader change?
> >
> 
> Honestly, I thought about the approaches so much I'm now not sure I'll
> make a good decision.  RFC could do.  If I were to pick, I would go with
> a new setting that would control whether we want the binding to be
> changeable throughout the domain's lifetime or not so that we can make
> better decisions (and don't feel bad about the bad ones).

I feel the same way.

OK, I'll try an RFC patch with a lot of description.

I'm specifically trying to address the issue I originally raised, which isn't
quite the same thing as the changeability of the bindings but I'll keep that in
mind. I think your point about changing the bindings will apply in the
same way whenever QEMU's memory-backend objects are used with their
"host-nodes" attribute (since they are what causes QEMU to apply policy), so I
don't think I'm suggesting any significant change there.

If you want to add the new setting you mention above, I'd be happy to base my
patch on top of that work. ;-)

Cheers,
Sam.



Re: [libvirt] [PATCH 1/1] qemu: host NUMA hugepage policy without guest NUMA

2016-10-24 Thread Sam Bobroff
On Tue, Oct 18, 2016 at 10:43:31PM +0200, Martin Kletzander wrote:
> On Mon, Oct 17, 2016 at 03:45:09PM +1100, Sam Bobroff wrote:
> >On Fri, Oct 14, 2016 at 10:19:42AM +0200, Martin Kletzander wrote:
> >>On Fri, Oct 14, 2016 at 11:52:22AM +1100, Sam Bobroff wrote:
> >>>I did look at the libnuma and cgroups approaches, but I was concerned they
> >>>wouldn't work in this case, because of the way QEMU allocates memory when
> >>>mem-prealloc is used: the memory is allocated in the main process, before 
> >>>the
> >>>CPU threads are created. (This is based only on a bit of hacking and 
> >>>debugging
> >>>in QEMU, but it does seem explain the behaviour I've seen so far.)
> >>>
> >>
> >>But we use numactl before QEMU is exec()'d.
> >
> >Sorry, I jumped ahead a bit. I'll try to explain what I mean:
> >
> >I think the problem with using this method would be that the NUMA policy is
> >applied to all allocations by QEMU, not just ones related to the memory
> >backing. I'm not sure if that would cause a serious problem but it seems 
> >untidy,
> >and it doesn't happen in other situations (i.e. with separate memory backend
> >objects, QEMU sets up the policy specifically for each one and other
> >allocations aren't affected, AFAIK).  Presumably, if memory were very
> >restricted it could prevent the guest from starting.
> >
> 
> Yes, it is, that's what <numatune><memory .../> does if you don't have any
> other (<memnode .../>) specifics set.
> 
> >>>I think QEMU could be altered to move the preallocations into the VCPU
> >>>threads but it didn't seem trivial and I suspected the QEMU community would
> >>>point out that there was already a way to do it using backend objects.  
> >>>Another
> >>>option would be to add a -host-nodes parameter to QEMU so that the policy 
> >>>can
> >>>be given without adding a memory backend object. (That seems like a more
> >>>reasonable change to QEMU.)
> >>>
> >>
> >>I think upstream won't like that, mostly because there is already a
> >>way.  And that is using memory-backend object.  I think we could just
> >>use that and disable changing it live.  But upstream will probably want
> >>that to be configurable or something.
> >
> >Right, but isn't this already an issue in the cases where libvirt is already
> >using memory backend objects and NUMA policy? (Or does libvirt already 
> >disable
> >changing it live in those situations?)
> >
> 
> It is.  I'm not trying to say libvirt is perfect.  There are bugs,
> e.g. like this one.  The problem is that we tried to do *everything*,
> but it's not currently possible.  I'm trying to explain how stuff works
> now.  It definitely needs some fixing, though.

OK :-)

Well, given our discussion, do you think it's worth a v2 of my original patch
or would it be better to drop it in favour of some broader change?

Cheers,
Sam.



Re: [libvirt] [PATCH 1/1] qemu: host NUMA hugepage policy without guest NUMA

2016-10-16 Thread Sam Bobroff
On Fri, Oct 14, 2016 at 10:19:42AM +0200, Martin Kletzander wrote:
> On Fri, Oct 14, 2016 at 11:52:22AM +1100, Sam Bobroff wrote:
> >On Thu, Oct 13, 2016 at 11:34:43AM +0200, Martin Kletzander wrote:
> >>On Thu, Oct 13, 2016 at 11:34:16AM +1100, Sam Bobroff wrote:
> >>>On Wed, Oct 12, 2016 at 10:27:50AM +0200, Martin Kletzander wrote:
> >>>>On Wed, Oct 12, 2016 at 03:04:53PM +1100, Sam Bobroff wrote:
> >>>>>At the moment, guests that are backed by hugepages in the host are
> >>>>>only able to use policy to control the placement of those hugepages
> >>>>>on a per-(guest-)CPU basis. Policy applied globally is ignored.
> >>>>>
> >>>>>Such guests would use <memoryBacking><hugepages/></memoryBacking> and
> >>>>>a <numatune> block with <memory .../> but no <memnode .../> elements.
> >>>>>
> >>>>>This patch corrects this by, in this specific case, changing the QEMU
> >>>>>command line from "-mem-prealloc -mem-path=..." (which cannot
> >>>>>specify NUMA policy) to "-object memory-backend-file ..." (which can).
> >>>>>
> >>>>>Note: This is not visible to the guest and does not appear to create
> >>>>>a migration incompatibility.
> >>>>>
> >>>>
> >>>>It could make sense, I haven't tried yet, though.  However, I still
> >>>>don't see the point in using memory-backend-file.  Is it that when you
> >>>>don't have cpuset cgroup the allocation doesn't work well?  Because it
> >>>>certainly does work for me.
> >>>
> >>>Thanks for taking a look at this :-)
> >>>
> >>>The point of using a memory-backend-file is that with it, the NUMA policy 
> >>>can
> >>>be specified to QEMU, but with -mem-path it can't. It seems to be a way to 
> >>>tell
> >>>QEMU to apply NUMA policy in the right place. It does seem odd to me to use
> >>>memory-backend-file without attaching the backend to a guest NUMA node, 
> >>>but it
> >>>seems to do the right thing in this case. (If there are guest NUMA nodes, 
> >>>or if
> >>>hugepages aren't being used, policy is correctly applied.)
> >>>
> >>>I'll describe my test case in detail, perhaps there's something I don't 
> >>>understand
> >>>happening.
> >>>
> >>>* I set up a machine with two (fake) NUMA nodes (0 and 1), with 2G of 
> >>>hugepages
> >>> on node 1, and none on node 0.
> >>>
> >>>* I create a 2G guest using virt-install:
> >>>
> >>>virt-install --name ppc --memory=2048 --disk ~/tmp/tmp.qcow2 --cdrom 
> >>>~/tmp/ubuntu-16.04-server-ppc64el.iso --wait 0 --virt-type qemu 
> >>>--memorybacking hugepages=on --graphics vnc --arch ppc64le
> >>>
> >>>* I "virsh destroy" and then "virsh edit" to add this block to the guest 
> >>>XML:
> >>>
> >>> 
> >>>
> >>> 
> >>>
> >>>* "virsh start", and the machine starts (I believe it should fail due to 
> >>>insufficient memory satasfying the policy).
> >>>* "numastat -p $(pidof qemu-system-ppc64)" shows something like this:
> >>>
> >>>Per-node process memory usage (in MBs) for PID 8048 (qemu-system-ppc)
> >>>                   Node 0          Node 1           Total
> >>>          --------------- --------------- ---------------
> >>>Huge                 0.00         2048.00         2048.00
> >>>Heap                 8.12            0.00            8.12
> >>>Stack                0.03            0.00            0.03
> >>>Private             35.80            6.10           41.90
> >>>          --------------- --------------- ---------------
> >>>Total               43.95         2054.10         2098.05
> >>>
> >>>So it looks like it's allocated hugepages from node 1, isn't this 
> >>>violating the
> >>>policy I set via numatune?
> >>>
> >>
> >>Oh, now I get it.  We are doing our best to apply that policy to qemu
> >>even when we don't have this option.  However, using this works even
> >>better (which is probably* what we want).  And that's the reasoning
> >>behind this.
> >>

Re: [libvirt] [PATCH 1/1] qemu: host NUMA hugepage policy without guest NUMA

2016-10-13 Thread Sam Bobroff
On Thu, Oct 13, 2016 at 11:34:43AM +0200, Martin Kletzander wrote:
> On Thu, Oct 13, 2016 at 11:34:16AM +1100, Sam Bobroff wrote:
> >On Wed, Oct 12, 2016 at 10:27:50AM +0200, Martin Kletzander wrote:
> >>On Wed, Oct 12, 2016 at 03:04:53PM +1100, Sam Bobroff wrote:
> >>>At the moment, guests that are backed by hugepages in the host are
> >>>only able to use policy to control the placement of those hugepages
> >>>on a per-(guest-)CPU basis. Policy applied globally is ignored.
> >>>
> >>>Such guests would use <memoryBacking><hugepages/></memoryBacking> and
> >>>a <numatune> block with <memory .../> but no <memnode .../> elements.
> >>>
> >>>This patch corrects this by, in this specific case, changing the QEMU
> >>>command line from "-mem-prealloc -mem-path=..." (which cannot
> >>>specify NUMA policy) to "-object memory-backend-file ..." (which can).
> >>>
> >>>Note: This is not visible to the guest and does not appear to create
> >>>a migration incompatibility.
> >>>
> >>
> >>It could make sense, I haven't tried yet, though.  However, I still
> >>don't see the point in using memory-backend-file.  Is it that when you
> >>don't have cpuset cgroup the allocation doesn't work well?  Because it
> >>certainly does work for me.
> >
> >Thanks for taking a look at this :-)
> >
> >The point of using a memory-backend-file is that with it, the NUMA policy can
> >be specified to QEMU, but with -mem-path it can't. It seems to be a way to 
> >tell
> >QEMU to apply NUMA policy in the right place. It does seem odd to me to use
> >memory-backend-file without attaching the backend to a guest NUMA node, but 
> >it
> >seems to do the right thing in this case. (If there are guest NUMA nodes, or 
> >if
> >hugepages aren't being used, policy is correctly applied.)
> >
> >I'll describe my test case in detail, perhaps there's something I don't 
> >understand
> >happening.
> >
> >* I set up a machine with two (fake) NUMA nodes (0 and 1), with 2G of 
> >hugepages
> > on node 1, and none on node 0.
> >
> >* I create a 2G guest using virt-install:
> >
> >virt-install --name ppc --memory=2048 --disk ~/tmp/tmp.qcow2 --cdrom 
> >~/tmp/ubuntu-16.04-server-ppc64el.iso --wait 0 --virt-type qemu 
> >--memorybacking hugepages=on --graphics vnc --arch ppc64le
> >
> >* I "virsh destroy" and then "virsh edit" to add this block to the guest XML:
> >
> > 
> >
> > 
> >
> >* "virsh start", and the machine starts (I believe it should fail due to 
> >insufficient memory satasfying the policy).
> >* "numastat -p $(pidof qemu-system-ppc64)" shows something like this:
> >
> >Per-node process memory usage (in MBs) for PID 8048 (qemu-system-ppc)
> >                   Node 0          Node 1           Total
> >          --------------- --------------- ---------------
> >Huge                 0.00         2048.00         2048.00
> >Heap                 8.12            0.00            8.12
> >Stack                0.03            0.00            0.03
> >Private             35.80            6.10           41.90
> >          --------------- --------------- ---------------
> >Total               43.95         2054.10         2098.05
> >
> >So it looks like it's allocated hugepages from node 1, isn't this violating 
> >the
> >policy I set via numatune?
> >
> 
> Oh, now I get it.  We are doing our best to apply that policy to qemu
> even when we don't have this option.  However, using this works even
> better (which is probably* what we want).  And that's the reasoning
> behind this.
> 
> * I'm saying probably because when I was adding numactl binding to be
>   used together with cgroups, I was told that we couldn't change the
>   binding afterwards and it's bad.  I feel like we could do something
>   with that and it would help us in the future, but there needs to be a
>   discussion, I guess.  Because I might be one of the few =)
> 
> So to recapitulate that, there are three options how to affect the
> allocation of qemu's memory:
> 
> 1) numactl (libnuma): it works as expected, but cannot be changed later
> 
> 2) cgroups: so strict it has to be applied after qemu started, due to
>that it doesn't work right, especially for stuff that gets all
>pre-allocated (like hugepages).  it can be changed later, but it
> won't

Re: [libvirt] [PATCH 1/1] qemu: host NUMA hugepage policy without guest NUMA

2016-10-12 Thread Sam Bobroff
On Wed, Oct 12, 2016 at 09:48:07AM +0200, Peter Krempa wrote:
> On Wed, Oct 12, 2016 at 15:04:53 +1100, Sam Bobroff wrote:
> > At the moment, guests that are backed by hugepages in the host are
> > only able to use policy to control the placement of those hugepages
> > on a per-(guest-)CPU basis. Policy applied globally is ignored.
> > 
> > Such guests would use <memoryBacking><hugepages/></memoryBacking> and
> > a <numatune> block with <memory .../> but no <memnode .../> elements.
> > 
> > This patch corrects this by, in this specific case, changing the QEMU
> > command line from "-mem-prealloc -mem-path=..." (which cannot
> > specify NUMA policy) to "-object memory-backend-file ..." (which can).
> > 
> > Note: This is not visible to the guest and does not appear to create
> > a migration incompatibility.
> > 
> > Signed-off-by: Sam Bobroff 
> > ---
> > There was some discussion leading up to this patch, here:
> > 
> > https://www.redhat.com/archives/libvir-list/2016-October/msg00033.html
> > 
> > During that discussion it seemed that fixing this issue would cause
> > migration incompatibility but after some careful testing, it appears
> > that it only does when a memory-backend object is attached to a guest
> > NUMA node and that is not the case here. If only one is created, and
> > used globally (not attached via mem=), the migration data does not
> > seem to be changed and so it seems reasonable to try something like
> > this patch.
> > 
> > This patch does work for my test cases but I don't claim a deep
> > understanding of the libvirt code so this is at least partly a RFC.
> > Comments welcome :-)
> > 
> > Cheers,
> > Sam.
> > 
> >  src/qemu/qemu_command.c | 55 
> > -
> >  src/qemu/qemu_command.h |  2 +-
> >  2 files changed, 46 insertions(+), 11 deletions(-)
> > 
> > diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c
> > index 0804133..c28c8f2 100644
> > --- a/src/qemu/qemu_command.c
> > +++ b/src/qemu/qemu_command.c
> > @@ -3143,7 +3143,7 @@ qemuBuildMemoryBackendStr(unsigned long long size,
> >int guestNode,
> >virBitmapPtr userNodeset,
> >virBitmapPtr autoNodeset,
> > -  virDomainDefPtr def,
> > +  const virDomainDefPtr def,
> 
> This does not work as expected. Using 'const' with types that hide the
> pointer makes the pointer const and not the structure that the pointer
> points to.

Ah, of course you're right.

> >virQEMUCapsPtr qemuCaps,
> >virQEMUDriverConfigPtr cfg,
> >const char **backendType,
> > @@ -7129,12 +7129,18 @@ qemuBuildSmpCommandLine(virCommandPtr cmd,
> >  
> >  static int
> >  qemuBuildMemPathStr(virQEMUDriverConfigPtr cfg,
> > -const virDomainDef *def,
> > +const virDomainDefPtr def,
> 
> Same as above. This change is wrong.

OK.

> >  virQEMUCapsPtr qemuCaps,
> > -virCommandPtr cmd)
> > +virCommandPtr cmd,
> > +virBitmapPtr auto_nodeset)
> >  {
> >  const long system_page_size = virGetSystemPageSizeKB();
> >  char *mem_path = NULL;
> > +virBitmapPtr nodemask = NULL;
> > +const char *backendType = NULL;
> > +char *backendStr = NULL;
> > +virJSONValuePtr props;
> > +int rv = -1;
> >  
> >  /*
> >   *  No-op if hugepages were not requested.
> > @@ -7159,18 +7165,47 @@ qemuBuildMemPathStr(virQEMUDriverConfigPtr cfg,
> >  if (qemuGetHupageMemPath(cfg, def->mem.hugepages[0].size, &mem_path) < 
> > 0)
> >  return -1;
> >  
> > -virCommandAddArgList(cmd, "-mem-prealloc", "-mem-path", mem_path, 
> > NULL);
> > +if (virDomainNumatuneMaybeGetNodeset(def->numa, auto_nodeset,
> > + &nodemask, -1) < 0)
> > +return -1;
> > +if (nodemask && virQEMUCapsGet(qemuCaps, 
> > QEMU_CAPS_OBJECT_MEMORY_FILE)) {
> > +props = virJSONValueNewObject();
> 
> This is leaked, qemuBuildMemoryBackendStr allocates one itself.

Ah of course, sorry. I think I can just remove that line.

> > +if (qemuBuildMemoryBackendStr(virDomainDefGetMemoryInitial(def),
> > +  0, -1, NULL, auto_nodeset,

Re: [libvirt] [PATCH 1/1] qemu: host NUMA hugepage policy without guest NUMA

2016-10-12 Thread Sam Bobroff
On Wed, Oct 12, 2016 at 10:27:50AM +0200, Martin Kletzander wrote:
> On Wed, Oct 12, 2016 at 03:04:53PM +1100, Sam Bobroff wrote:
> >At the moment, guests that are backed by hugepages in the host are
> >only able to use policy to control the placement of those hugepages
> >on a per-(guest-)CPU basis. Policy applied globally is ignored.
> >
> >Such guests would use <memoryBacking><hugepages/></memoryBacking> and
> >a <numatune> block with <memory .../> but no <memnode .../> elements.
> >
> >This patch corrects this by, in this specific case, changing the QEMU
> >command line from "-mem-prealloc -mem-path=..." (which cannot
> >specify NUMA policy) to "-object memory-backend-file ..." (which can).
> >
> >Note: This is not visible to the guest and does not appear to create
> >a migration incompatibility.
> >
> 
> It could make sense, I haven't tried yet, though.  However, I still
> don't see the point in using memory-backend-file.  Is it that when you
> don't have cpuset cgroup the allocation doesn't work well?  Because it
> certainly does work for me.

Thanks for taking a look at this :-)

The point of using a memory-backend-file is that with it, the NUMA policy can
be specified to QEMU, but with -mem-path it can't. It seems to be a way to tell
QEMU to apply NUMA policy in the right place. It does seem odd to me to use
memory-backend-file without attaching the backend to a guest NUMA node, but it
seems to do the right thing in this case. (If there are guest NUMA nodes, or if
hugepages aren't being used, policy is correctly applied.)

I'll describe my test case in detail; perhaps there's something happening that
I don't understand.

* I set up a machine with two (fake) NUMA nodes (0 and 1), with 2G of hugepages
  on node 1, and none on node 0.

* I create a 2G guest using virt-install:

virt-install --name ppc --memory=2048 --disk ~/tmp/tmp.qcow2 --cdrom ~/tmp/ubuntu-16.04-server-ppc64el.iso --wait 0 --virt-type qemu --memorybacking hugepages=on --graphics vnc --arch ppc64le

* I "virsh destroy" and then "virsh edit" to add this block to the guest XML:

  
 
  

* "virsh start", and the machine starts (I believe it should fail due to 
insufficient memory satasfying the policy).
* "numastat -p $(pidof qemu-system-ppc64)" shows something like this:

Per-node process memory usage (in MBs) for PID 8048 (qemu-system-ppc)
                   Node 0          Node 1           Total
          --------------- --------------- ---------------
Huge                 0.00         2048.00         2048.00
Heap                 8.12            0.00            8.12
Stack                0.03            0.00            0.03
Private             35.80            6.10           41.90
          --------------- --------------- ---------------
Total               43.95         2054.10         2098.05

So it looks like it's allocated hugepages from node 1, isn't this violating the
policy I set via numatune?

> >Signed-off-by: Sam Bobroff 
> >---
> >There was some discussion leading up to this patch, here:
> >
> >https://www.redhat.com/archives/libvir-list/2016-October/msg00033.html
> >
> >During that discussion it seemed that fixing this issue would cause
> >migration incompatibility but after some careful testing, it appears
> >that it only does when a memory-backend object is attached to a guest
> >NUMA node and that is not the case here. If only one is created, and
> >used globally (not attached via mem=), the migration data does not
> >seem to be changed and so it seems reasonable to try something like
> >this patch.
> >
> >This patch does work for my test cases but I don't claim a deep
> >understanding of the libvirt code so this is at least partly a RFC.
> >Comments welcome :-)
> >
> >Cheers,
> >Sam.
> >
> >src/qemu/qemu_command.c | 55 
> >-
> >src/qemu/qemu_command.h |  2 +-
> >2 files changed, 46 insertions(+), 11 deletions(-)
> >
> >diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c
> >index 0804133..c28c8f2 100644
> >--- a/src/qemu/qemu_command.c
> >+++ b/src/qemu/qemu_command.c
> >@@ -3143,7 +3143,7 @@ qemuBuildMemoryBackendStr(unsigned long long size,
> >  int guestNode,
> >  virBitmapPtr userNodeset,
> >  virBitmapPtr autoNodeset,
> >-  virDomainDefPtr def,
> >+  const virDomainDefPtr def,
> >  virQEMUCapsPtr qemuCaps,
> >  virQEMUDriverConfigPtr cfg,
> 

[libvirt] [PATCH 1/1] qemu: host NUMA hugepage policy without guest NUMA

2016-10-11 Thread Sam Bobroff
At the moment, guests that are backed by hugepages in the host are
only able to use policy to control the placement of those hugepages
on a per-(guest-)CPU basis. Policy applied globally is ignored.

Such guests would use <memoryBacking><hugepages/></memoryBacking> and
a <numatune> block with <memory .../> but no <memnode .../> elements.

This patch corrects this by, in this specific case, changing the QEMU
command line from "-mem-prealloc -mem-path=..." (which cannot
specify NUMA policy) to "-object memory-backend-file ..." (which can).

Note: This is not visible to the guest and does not appear to create
a migration incompatibility.

Signed-off-by: Sam Bobroff 
---
There was some discussion leading up to this patch, here:

https://www.redhat.com/archives/libvir-list/2016-October/msg00033.html

During that discussion it seemed that fixing this issue would cause
migration incompatibility but after some careful testing, it appears
that it only does when a memory-backend object is attached to a guest
NUMA node and that is not the case here. If only one is created, and
used globally (not attached via mem=), the migration data does not
seem to be changed and so it seems reasonable to try something like
this patch.

This patch does work for my test cases but I don't claim a deep
understanding of the libvirt code so this is at least partly a RFC.
Comments welcome :-)

Cheers,
Sam.

 src/qemu/qemu_command.c | 55 -
 src/qemu/qemu_command.h |  2 +-
 2 files changed, 46 insertions(+), 11 deletions(-)

diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c
index 0804133..c28c8f2 100644
--- a/src/qemu/qemu_command.c
+++ b/src/qemu/qemu_command.c
@@ -3143,7 +3143,7 @@ qemuBuildMemoryBackendStr(unsigned long long size,
   int guestNode,
   virBitmapPtr userNodeset,
   virBitmapPtr autoNodeset,
-  virDomainDefPtr def,
+  const virDomainDefPtr def,
   virQEMUCapsPtr qemuCaps,
   virQEMUDriverConfigPtr cfg,
   const char **backendType,
@@ -7129,12 +7129,18 @@ qemuBuildSmpCommandLine(virCommandPtr cmd,
 
 static int
 qemuBuildMemPathStr(virQEMUDriverConfigPtr cfg,
-const virDomainDef *def,
+const virDomainDefPtr def,
 virQEMUCapsPtr qemuCaps,
-virCommandPtr cmd)
+virCommandPtr cmd,
+virBitmapPtr auto_nodeset)
 {
 const long system_page_size = virGetSystemPageSizeKB();
 char *mem_path = NULL;
+virBitmapPtr nodemask = NULL;
+const char *backendType = NULL;
+char *backendStr = NULL;
+virJSONValuePtr props;
+int rv = -1;
 
 /*
  *  No-op if hugepages were not requested.
@@ -7159,18 +7165,47 @@ qemuBuildMemPathStr(virQEMUDriverConfigPtr cfg,
 if (qemuGetHupageMemPath(cfg, def->mem.hugepages[0].size, &mem_path) < 0)
 return -1;
 
-virCommandAddArgList(cmd, "-mem-prealloc", "-mem-path", mem_path, NULL);
+if (virDomainNumatuneMaybeGetNodeset(def->numa, auto_nodeset,
+ &nodemask, -1) < 0)
+return -1;
+if (nodemask && virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_FILE)) {
+props = virJSONValueNewObject();
+if (qemuBuildMemoryBackendStr(virDomainDefGetMemoryInitial(def),
+  0, -1, NULL, auto_nodeset,
+  def, qemuCaps, cfg, &backendType,
+  &props, false) < 0)
+goto cleanup;
+if (!(backendStr = virQEMUBuildObjectCommandlineFromJSON(backendType,
+ "mem",
+ props)))
+goto cleanup;
+virCommandAddArgList(cmd, "-object", backendStr, NULL);
+rv = 0;
+cleanup:
+VIR_FREE(backendStr);
+VIR_FREE(props);
+}
+else {
+if (nodemask || 1)
+virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
+   _("Memory file backend objects are "
+ "unsupported by QEMU binary. Global NUMA "
+ "hugepage policy will be ignored."));
+virCommandAddArgList(cmd, "-mem-prealloc", "-mem-path", mem_path, 
NULL);
+rv = 0;
+}
 VIR_FREE(mem_path);
 
-return 0;
+return rv;
 }
 
 
 static int
 qemuBuildMemCommandLine(virCommandPtr cmd,
 virQEMUDriverConfigPtr cfg,
-const virDomainDef *def,
-virQEMUCapsPtr qemuCaps)
+const virDomainDefPtr def,
+virQEMUCapsPtr q

Re: [libvirt] Question about QEMU + hugepages + NUMA memory + migration

2016-10-05 Thread Sam Bobroff
On Tue, Oct 04, 2016 at 11:28:15AM +0200, Martin Kletzander wrote:
> On Tue, Oct 04, 2016 at 02:34:57PM +1100, Sam Bobroff wrote:
> >On Mon, Oct 03, 2016 at 04:27:25PM +0200, Martin Kletzander wrote:
> >>On Mon, Aug 01, 2016 at 02:01:22PM +1000, Sam Bobroff wrote:
> >>>Hi libvirt people,
> >>>
> >>>I've been looking at a (probable) bug and I'm not sure how to progress. The
> >>>situation is a bit complicated and involves both QEMU and libvirt (and I 
> >>>think
> >>>it may have been looked at already) so I would really appreciate some 
> >>>advice on
> >>>how to approach it. I'm using a pretty recent master version of libvirt 
> >>>from
> >>>git and I'm testing on a ppc64le host with a similar guest but this doesn't
> >>>seem to be arch-specific.
> >>>
> >>
> >>Sorry I haven't replied earlier and I'm respawning this thread now, I
> >>just noticed this thread being marked for replying only after I fixed
> >>similar thing to what you describe.
> >
> >No problem! :-)
> >
> >>>If I create a QEMU guest (e.g. via virt-install) that requests both 
> >>>hugepage
> >>>backing on the host and NUMA memory placement on the host, the NUMA 
> >>>placement
> >>>seems to be ignored. If I do:
> >>>
> >>># echo 0 > /proc/sys/vm/nr_hugepages
> >>># echo 512 > 
> >>>/sys/devices/system/node/node0/hugepages/hugepages-16384kB/nr_hugepages
> >>># virt-install --name tmp --memory=4096 --graphics none --memorybacking 
> >>>hugepages=yes --disk none --import --wait 0 --numatune=8
> >>>
> >>
> >>So to be clear, we want to use 16M hugepages, but only allocated from
> >>node 8 while the only allocated ones are on node 0.
> >
> >Yes.
> >
> >>>... then hugepages are allocated on node 0 and the machine starts 
> >>>successfully,
> >>>which seems like a bug.
> >>>
> >>
> >>This definitely is a bug.  But I'm afraid it's not in libvirt.  I'll do
> >>some explaining first.
> >
> >OK.
> >
> >>>I believe it should fail to start due to insufficient memory, and in fact 
> >>>that
> >>>is what happens if cgroup support isn't detected in the host: there seems 
> >>>to be
> >>>a fall-back path in libvirt (probably using mbind()) that works as I would
> >>>expect.
> >>>
> >>
> >>Yes, we are using multiple things to enforce NUMA binding:
> >>
> >> 1) cgroups - This restricts all the allocations made on the account of
> >>  the process, even when KVM does that.  We cannot use that
> >>  until the QEMU process is running because it needs to
> >>  allocate some data from the DMA region which is usually
> >>  only on one node.
> >>
> >> 2) numactl's mbind() - it doesn't apply to kernel allocations, so
> >>whatever KVM allocates it's not restricted by
> >>this.  We use this always on the process before
> >>exec()-ing qemu binary (if compiled with
> >>support to numactl, of course)
> >>
> >> 3) memory-backend-file's host-nodes parameter - This is the best
> >> option that is used when QEMU
> >> supports it, but due to migration
> >> compatibility we use it only when
> >> requested with <memnode .../> elements.
> >
> >OK good, that matches what I thought I'd worked out :-)
> >
> >>>Note: the relevant part of the guest XML seems to be this:
> >>>    <memoryBacking>
> >>>        <hugepages/>
> >>>    </memoryBacking>
> >>>    <numatune>
> >>>        <memory mode='strict' nodeset='8'/>
> >>>    </numatune>
> >>>
> >>>It seems fairly clear what is happening: although QEMU is capable of 
> >>>allocating
> >>>hugepages on specific NUMA nodes (using "memory-backend-file") libvirt is 
> >>>not
> >>>passing those options to QEMU in this situation.
> >>>
> >>
> >>I'm guessing you're talking about the host-nodes= parameter.
> >
> >I am, yes.

Re: [libvirt] Question about QEMU + hugepages + NUMA memory + migration

2016-10-03 Thread Sam Bobroff
On Mon, Oct 03, 2016 at 04:27:25PM +0200, Martin Kletzander wrote:
> On Mon, Aug 01, 2016 at 02:01:22PM +1000, Sam Bobroff wrote:
> >Hi libvirt people,
> >
> >I've been looking at a (probable) bug and I'm not sure how to progress. The
> >situation is a bit complicated and involves both QEMU and libvirt (and I 
> >think
> >it may have been looked at already) so I would really appreciate some advice 
> >on
> >how to approach it. I'm using a pretty recent master version of libvirt from
> >git and I'm testing on a ppc64le host with a similar guest but this doesn't
> >seem to be arch-specific.
> >
> 
> Sorry I haven't replied earlier and I'm respawning this thread now, I
> just noticed this thread being marked for replying only after I fixed
> similar thing to what you describe.

No problem! :-)

> >If I create a QEMU guest (e.g. via virt-install) that requests both hugepage
> >backing on the host and NUMA memory placement on the host, the NUMA placement
> >seems to be ignored. If I do:
> >
> ># echo 0 > /proc/sys/vm/nr_hugepages
> ># echo 512 > 
> >/sys/devices/system/node/node0/hugepages/hugepages-16384kB/nr_hugepages
> ># virt-install --name tmp --memory=4096 --graphics none --memorybacking 
> >hugepages=yes --disk none --import --wait 0 --numatune=8
> >
> 
> So to be clear, we want to use 16M hugepages, but only allocated from
> node 8 while the only allocated ones are on node 0.

Yes.

> >... then hugepages are allocated on node 0 and the machine starts 
> >successfully,
> >which seems like a bug.
> >
> 
> This definitely is a bug.  But I'm afraid it's not in libvirt.  I'll do
> some explaining first.

OK.

> >I believe it should fail to start due to insufficient memory, and in fact 
> >that
> >is what happens if cgroup support isn't detected in the host: there seems to 
> >be
> >a fall-back path in libvirt (probably using mbind()) that works as I would
> >expect.
> >
> 
> Yes, we are using multiple things to enforce NUMA binding:
> 
>  1) cgroups - This restricts all the allocations made on the account of
>   the process, even when KVM does that.  We cannot use that
>   until the QEMU process is running because it needs to
>   allocate some data from the DMA region which is usually
>   only on one node.
> 
>  2) numactl's mbind() - it doesn't apply to kernel allocations, so
> whatever KVM allocates it's not restricted by
> this.  We use this always on the process before
> exec()-ing qemu binary (if compiled with
> support to numactl, of course)
> 
>  3) memory-backend-file's host-nodes parameter - This is the best
>  option that is used when QEMU
>  supports it, but due to migration
>  compatibility we use it only when
>  requested with <memnode .../> elements.

OK good, that matches what I thought I'd worked out :-)

> >Note: the relevant part of the guest XML seems to be this:
> >    <memoryBacking>
> >        <hugepages/>
> >    </memoryBacking>
> >    <numatune>
> >        <memory mode='strict' nodeset='8'/>
> >    </numatune>
> >
> >It seems fairly clear what is happening: although QEMU is capable of 
> >allocating
> >hugepages on specific NUMA nodes (using "memory-backend-file") libvirt is not
> >passing those options to QEMU in this situation.
> >
> 
> I'm guessing you're talking about the host-nodes= parameter.

I am, yes.

> >I investigated this line of reasoning and if I hack libvirt to pass those
> >options to QEMU it does indeed fix the problem... but it renders the machine
> >state migration-incompatible with unfixed versions. This seems to have been 
> >why
> >this hasn't been fixed already :-(
> >
> >So what can we do?
> >
> 
> From virt-install POV I would suggest adding some functionality that
> would probe whether it works with  and use it if possible.  Or
> dig deep down in kvm or qemu to see why the allocations do not conform
> to the mbind().

Sorry, I don't think I understand your answer.

You seem to be saying that it doesn't matter that the above XML causes memory
to be allocated incorrectly, as long as we don't use it (by changing
virt-install to use something else). But shouldn't the XML work correctly even
if it was generated by hand? (Sorry if I'm misunderstanding!)

And

Re: [libvirt] [PATCH] qemu: Only use memory-backend-file with NUMA if needed

2016-09-27 Thread Sam Bobroff
On Fri, Sep 23, 2016 at 03:20:56PM +0200, Martin Kletzander wrote:
> If this reminds you of a commit message from around a year ago, it's
> 41c2aa729f0af084ede95ee9a06219a2dd5fb5df and yes, we're dealing with
> "the same thing" again.  Or f309db1f4d51009bad0d32e12efc75530b66836b and
> it's similar.
> 
> There is a logic in place that if there is no real need for
> memory-backend-file, qemuBuildMemoryBackendStr() returns 0.  However
> that wasn't the case with hugepage backing.  The reason for that was
> that we abused the 'pagesize' variable for storing that information, but
> we should rather have a separate one that specifies whether we really
> need the new object for hugepage backing.  And that variable should be
> set only if this particular NUMA cell needs special treatment WRT
> hugepages.
> 
> Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1372153
> 
> Signed-off-by: Martin Kletzander 

Hi Martin,

I tested your patch and it works but it seems to make the snapshot/migration
information incompatible with the current master code. e.g. after applying the
patch I can't load a snapshot that was created before applying the patch
(obviously only for a guest configuration that is affected by the patch). The
error I get is this:

$ virsh snapshot-revert tmp --snapshotname 1475026515
error: operation failed: Unknown ramblock "/objects/ram-node0", cannot accept migration
error while loading state for instance 0x0 of device 'ram'
Error -22 while loading VM state

Presumably this is caused by the same change that fixes some other migrations,
but is this case a problem?

Cheers,
Sam.
> ---
> 
> Notes:
> This fixes migration from older libvirts.  By "older", I mean
> pre-(circa-)1.2.7, also in some cases pre-1.2.11, in some other cases
> pre-v1.2.20.  It's pretty messy.  It could be back-ported as far as it's
> easy to do.
> 
>  src/qemu/qemu_command.c   |  8 +---
>  tests/qemuxml2argvdata/qemuxml2argv-hugepages-pages2.args | 10 --
>  2 files changed, 9 insertions(+), 9 deletions(-)
> 
> diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c
> index 1ac0fa116310..316aced5124a 100644
> --- a/src/qemu/qemu_command.c
> +++ b/src/qemu/qemu_command.c
> @@ -3202,6 +3202,7 @@ qemuBuildMemoryBackendStr(unsigned long long size,
>  int ret = -1;
>  virJSONValuePtr props = NULL;
>  bool nodeSpecified = virDomainNumatuneNodeSpecified(def->numa, 
> guestNode);
> +bool needHugepage = !!pagesize;
> 
>  *backendProps = NULL;
>  *backendType = NULL;
> @@ -3224,10 +3225,10 @@ qemuBuildMemoryBackendStr(unsigned long long size,
>  mode = VIR_DOMAIN_NUMATUNE_MEM_STRICT;
> 
>  if (pagesize == 0) {
> +bool thisHugepage = false;
> +
>  /* Find the huge page size we want to use */
>  for (i = 0; i < def->mem.nhugepages; i++) {
> -bool thisHugepage = false;
> -
>  hugepage = &def->mem.hugepages[i];
> 
>  if (!hugepage->nodemask) {
> @@ -3249,6 +3250,7 @@ qemuBuildMemoryBackendStr(unsigned long long size,
> 
>  if (thisHugepage) {
>  /* Hooray, we've found the page size */
> +needHugepage = true;
>  break;
>  }
>  }
> @@ -3335,7 +3337,7 @@ qemuBuildMemoryBackendStr(unsigned long long size,
>  }
> 
>  /* If none of the following is requested... */
> -if (!pagesize && !userNodeset && !memAccess && !nodeSpecified && !force) 
> {
> +if (!needHugepage && !userNodeset && !memAccess && !nodeSpecified && 
> !force) {
>  /* report back that using the new backend is not necessary
>   * to achieve the desired configuration */
>  ret = 1;
> diff --git a/tests/qemuxml2argvdata/qemuxml2argv-hugepages-pages2.args 
> b/tests/qemuxml2argvdata/qemuxml2argv-hugepages-pages2.args
> index 447bb52d00b7..a37f03d0d2ed 100644
> --- a/tests/qemuxml2argvdata/qemuxml2argv-hugepages-pages2.args
> +++ b/tests/qemuxml2argvdata/qemuxml2argv-hugepages-pages2.args
> @@ -10,12 +10,10 @@ QEMU_AUDIO_DRV=none \
>  -M pc \
>  -m 1024 \
>  -smp 2,sockets=2,cores=1,threads=1 \
> --object memory-backend-file,id=ram-node0,prealloc=yes,\
> -mem-path=/dev/hugepages2M/libvirt/qemu,size=268435456 \
> --numa node,nodeid=0,cpus=0,memdev=ram-node0 \
> --object memory-backend-file,id=ram-node1,prealloc=yes,\
> -mem-path=/dev/hugepages2M/libvirt/qemu,size=805306368 \
> --numa node,nodeid=1,cpus=1,memdev=ram-node1 \
> +-mem-prealloc \
> +-mem-path /dev/hugepages2M/libvirt/qemu \
> +-numa node,nodeid=0,cpus=0,mem=256 \
> +-numa node,nodeid=1,cpus=1,mem=768 \
>  -uuid ef1bdff4-27f3-4e85-a807-5fb4d58463cc \
>  -nographic \
>  -nodefaults \
> -- 
> 2.10.0
> 



[libvirt] qemu: Bug: NUMA memory policy ignored in some situations

2016-08-16 Thread Sam Bobroff
Hi all,

I've found that if I create a QEMU guest that:

(a) is backed by host hugepages and
(b) specifies a host NUMA memory placement policy

... then the NUMA policy (b) is ignored.  It's even possible to start the guest
if the policy specifies that memory must come from non-existent host nodes.

Is this a bug?

(I ask because I suspect that it is a bug, and that it can't be fixed trivially
due to migration considerations. Is that true?)

A few more details:

I do (a) using this in the guest XML:
  <memoryBacking>
    <hugepages/>
  </memoryBacking>

And (b) using:
  <numatune>
    <memory mode='strict' nodeset='...'/>
  </numatune>

If the XML contains <numa> <cell .../> elements and <numatune> <memnode .../>
elements to bind guest NUMA nodes to host NUMA nodes, then the policy is
followed and this behaviour isn't seen. The guest can't be started if the
policy specifies invalid host nodes.

(Strangely, if the host doesn't have the cpuset cgroup mount set up or otherwise
lacks cgroup support, then the policy is not ignored. This appears to be due to
a fallback path in libvirt so it seems like NUMA memory policy is intended to
work in this situation.)

Cheers,
Sam.



[libvirt] Question about QEMU + hugepages + NUMA memory + migration

2016-07-31 Thread Sam Bobroff
Hi libvirt people,

I've been looking at a (probable) bug and I'm not sure how to progress. The
situation is a bit complicated and involves both QEMU and libvirt (and I think
it may have been looked at already) so I would really appreciate some advice on
how to approach it. I'm using a pretty recent master version of libvirt from
git and I'm testing on a ppc64le host with a similar guest but this doesn't
seem to be arch-specific.

If I create a QEMU guest (e.g. via virt-install) that requests both hugepage
backing on the host and NUMA memory placement on the host, the NUMA placement
seems to be ignored. If I do:

# echo 0 > /proc/sys/vm/nr_hugepages
# echo 512 > 
/sys/devices/system/node/node0/hugepages/hugepages-16384kB/nr_hugepages
# virt-install --name tmp --memory=4096 --graphics none --memorybacking 
hugepages=yes --disk none --import --wait 0 --numatune=8

... then hugepages are allocated on node 0 and the machine starts successfully,
which seems like a bug.

I believe it should fail to start due to insufficient memory, and in fact that
is what happens if cgroup support isn't detected in the host: there seems to be
a fall-back path in libvirt (probably using mbind()) that works as I would
expect.

Note: the relevant part of the guest XML seems to be this:
    <memoryBacking>
        <hugepages/>
    </memoryBacking>
    <numatune>
        <memory mode='strict' nodeset='8'/>
    </numatune>

It seems fairly clear what is happening: although QEMU is capable of allocating
hugepages on specific NUMA nodes (using "memory-backend-file") libvirt is not
passing those options to QEMU in this situation.

I investigated this line of reasoning and if I hack libvirt to pass those
options to QEMU it does indeed fix the problem... but it renders the machine
state migration-incompatible with unfixed versions. This seems to have been why
this hasn't been fixed already :-(

So what can we do?

I assume it's not acceptable to just break migration with a bugfix, and I can
only think of two ways to fix migration:

(a) Add a new flag to the XML, and for guests without the flag, maintain the
old buggy behaviour (and therefore migration compatability).

(b) Hack QEMU so that migration can succeed between un-fixed and fixed
versions. (And possibly also in the reverse direction?)

I don't like (a) because it's visible in the XML, and would have to be carried
forever (or at least a long time?).

I don't really like (b) either because it's tricky, and even if it could be
made to work reliably, it would add mess and risk to the migration code. I'm
not sure how the QEMU community would feel about it either. However, I did hack
up some code and it worked at least in some simple cases.

Can anyone see a better approach? Is anyone already working on this?

Thanks,
Sam.



Re: [libvirt] [PATCH] fix

2014-08-14 Thread Sam Bobroff
On 14/08/14 20:14, Ján Tomko wrote:
> On 08/14/2014 11:52 AM, Michal Privoznik wrote:
>> On 14.08.2014 10:41, Ján Tomko wrote:
>>> Also add qemuDomainChangeGraphicsPasswords,
>>> qemuProcessVerifyGuestCPU and qemuProcessInitPCIAddresses.
>>>
>>> Replace tabs by spaces.
>>> ---
>>> I'll do some testing on the patch and push it later with this squashed in
>>> if it works well.
>>>
>>>   src/qemu/qemu_domain.c  |  2 +-
>>>   src/qemu/qemu_hotplug.c | 12 
>>>   src/qemu/qemu_hotplug.h |  3 ++-
>>>   src/qemu/qemu_process.c | 33 -
>>>   4 files changed, 31 insertions(+), 19 deletions(-)
>>>
>>
>> ACK to the original with this squashed in.
>>
> 
> Now pushed.
> 
> Jan
> 
> 
Jan, Michal,

Thanks for your help :-)

(I'll be more careful with formatting in future.)

Cheers,
Sam.



[libvirt] [PATCH v2 1/1] qemu: Tidy up job handling during live migration

2014-08-11 Thread Sam Bobroff
During a QEMU live migration several warning messages about job
handling could be written to syslog on the destination host:

"entering monitor without asking for a nested job is dangerous"

The messages are written because the job handling during migration
uses hard coded asyncJob values in several places that are incorrect.

This patch passes the required asyncJob value around and prevents
the warnings as well as any issues that the warnings may be referring
to.

Signed-off-by: Sam Bobroff 
---

v2:
* Handle failures from qemuDomainObjEnterMonitorAsync() rather than
ignoring them.
* Include qemuDomainChangeGraphicsPasswords().

 src/qemu/qemu_domain.c|  6 --
 src/qemu/qemu_domain.h|  2 +-
 src/qemu/qemu_driver.c| 21 -
 src/qemu/qemu_migration.c |  3 ++-
 src/qemu/qemu_process.c   | 46 --
 src/qemu/qemu_process.h   |  1 +
 6 files changed, 48 insertions(+), 31 deletions(-)

diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index 4f63c88..59b2647 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -2497,7 +2497,8 @@ qemuDomainDetermineDiskChain(virQEMUDriverPtr driver,
 
 int
 qemuDomainUpdateDeviceList(virQEMUDriverPtr driver,
-   virDomainObjPtr vm)
+   virDomainObjPtr vm,
+   int asyncJob)
 {
 qemuDomainObjPrivatePtr priv = vm->privateData;
 char **aliases;
@@ -2505,7 +2506,8 @@ qemuDomainUpdateDeviceList(virQEMUDriverPtr driver,
 if (!virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_DEVICE_DEL_EVENT))
 return 0;
 
-qemuDomainObjEnterMonitor(driver, vm);
+if (qemuDomainObjEnterMonitorAsync(driver, vm, asyncJob) < 0)
+   return -1;
 if (qemuMonitorGetDeviceAliases(priv->mon, &aliases) < 0) {
 qemuDomainObjExitMonitor(driver, vm);
 return -1;
diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h
index 67972b9..8736889 100644
--- a/src/qemu/qemu_domain.h
+++ b/src/qemu/qemu_domain.h
@@ -369,7 +369,7 @@ extern virDomainXMLNamespace 
virQEMUDriverDomainXMLNamespace;
 extern virDomainDefParserConfig virQEMUDriverDomainDefParserConfig;
 
 int qemuDomainUpdateDeviceList(virQEMUDriverPtr driver,
-   virDomainObjPtr vm);
+   virDomainObjPtr vm, int asyncJob);
 
 bool qemuDomainDefCheckABIStability(virQEMUDriverPtr driver,
 virDomainDefPtr src,
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 33541d3..b0439d2 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -1616,7 +1616,8 @@ static virDomainPtr qemuDomainCreateXML(virConnectPtr 
conn,
 goto cleanup;
 }
 
-if (qemuProcessStart(conn, driver, vm, NULL, -1, NULL, NULL,
+if (qemuProcessStart(conn, driver, vm, QEMU_ASYNC_JOB_NONE,
+ NULL, -1, NULL, NULL,
  VIR_NETDEV_VPORT_PROFILE_OP_CREATE,
  start_flags) < 0) {
 virDomainAuditStart(vm, "booted", false);
@@ -5446,7 +5447,8 @@ qemuDomainSaveImageStartVM(virConnectPtr conn,
 }
 
 /* Set the migration source and start it up. */
-ret = qemuProcessStart(conn, driver, vm, "stdio", *fd, path, NULL,
+ret = qemuProcessStart(conn, driver, vm, QEMU_ASYNC_JOB_NONE,
+   "stdio", *fd, path, NULL,
VIR_NETDEV_VPORT_PROFILE_OP_RESTORE,
VIR_QEMU_PROCESS_START_PAUSED);
 
@@ -6143,7 +6145,8 @@ qemuDomainObjStart(virConnectPtr conn,
 }
 }
 
-ret = qemuProcessStart(conn, driver, vm, NULL, -1, NULL, NULL,
+ret = qemuProcessStart(conn, driver, vm, QEMU_ASYNC_JOB_NONE,
+   NULL, -1, NULL, NULL,
VIR_NETDEV_VPORT_PROFILE_OP_CREATE, start_flags);
 virDomainAuditStart(vm, "booted", ret >= 0);
 if (ret >= 0) {
@@ -6500,7 +6503,7 @@ qemuDomainAttachDeviceLive(virDomainObjPtr vm,
 }
 
 if (ret == 0)
-qemuDomainUpdateDeviceList(driver, vm);
+qemuDomainUpdateDeviceList(driver, vm, QEMU_ASYNC_JOB_NONE);
 
 return ret;
 }
@@ -6560,7 +6563,7 @@ qemuDomainDetachDeviceLive(virDomainObjPtr vm,
 }
 
 if (ret == 0)
-qemuDomainUpdateDeviceList(driver, vm);
+qemuDomainUpdateDeviceList(driver, vm, QEMU_ASYNC_JOB_NONE);
 
 return ret;
 }
@@ -14101,8 +14104,8 @@ static int 
qemuDomainRevertToSnapshot(virDomainSnapshotPtr snapshot,
 if (config)
 virDomainObjAssignDef(vm, config, false, NULL);
 
-rc = qemuProcessStart(snapshot->domain->conn,
-  driver, vm, NULL, -1, NULL, snap,
+rc = qemuProcessStart(snapshot->domain->conn, driver, vm,
+  QEMU_ASYNC_JOB_NONE, N

Re: [libvirt] [PATCH 1/1] qemu: Tidy up job handling during live migration

2014-08-11 Thread Sam Bobroff
On 02/08/14 00:04, Ján Tomko wrote:
> On 08/01/2014 03:12 AM, Sam Bobroff wrote:
>> On 30/07/14 01:52, Ján Tomko wrote:
>>> On 07/29/2014 02:41 AM, Sam Bobroff wrote:
>>>> During a QEMU live migration several warning messages about job
>>>> handling could be written to syslog on the destination host:
>>>>
>>>> "entering monitor without asking for a nested job is dangerous"
>>>>
>>>> The messages are written because the job handling during migration
>>>> uses hard coded asyncJob values in several places that are incorrect.
>>>>
>>>> This patch passes the required asyncJob value around and prevents
>>>> the warnings as well as any issues that the warnings may be referring
>>>> to.
>>>>
>>>> Signed-off-by: Sam Bobroff 
>>>> --
>>>
>>> This patch seems to fix the deadlock that can happen if the migrated domain 
>>> is
>>> destroyed at the destination reported here:
>>> https://www.redhat.com/archives/libvir-list/2014-May/msg00236.html
>>>
>>> It looks good to me, but it seems there are more functions calling
>>> qemuDomainObjEnterMonitor that can be called from qemuProcessStart,
>>> for example qemuDomainChangeGraphicsPasswords.
>>
>> Yes, I was fairly sure there would be other cases; I just fixed all the
>> ones that actually occurred during my tests.
>>
>> In fact it seems like for the cases I'm looking at here, where it's the
>> async job owner thread that's using the EnterMonitor functions, passing
>> asyncJob around is a waste of time anyway because we know the correct
>> value of asyncJob to use: it's stored in priv->job.asyncJob.
>>
>> Why not have qemuDomainObjEnterMonitorInternal() automatically switch to
>> creating a nested job in this case?
>>
>> It seems easy to do and would simplify some code as well; what do you think?
> 
> We've had it that way before - it didn't work that well:
> http://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=193cd0f3

Interesting. I'll stick to the simpler fix for now then.

>>
>>>> @@ -2505,7 +2506,7 @@ qemuDomainUpdateDeviceList(virQEMUDriverPtr driver,
>>>>  if (!virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_DEVICE_DEL_EVENT))
>>>>  return 0;
>>>>  
>>>> -qemuDomainObjEnterMonitor(driver, vm);
>>>> +ignore_value(qemuDomainObjEnterMonitorAsync(driver, vm, asyncJob));
>>>
>>> Also, the return value of this call could be dangerous to ignore if asyncJob
>>> != NONE.
>>
>> True, but the patch hasn't introduced this, and the full story is even
>> worse ;-)
>>
>>  void qemuDomainObjEnterMonitor(virQEMUDriverPtr driver,
>>virDomainObjPtr obj)
>>  {
>> ignore_value(qemuDomainObjEnterMonitorInternal(driver, obj,
>>QEMU_ASYNC_JOB_NONE));
>>  }
>>
> 
> qemuDomainObjEnterMonitorInternal is called with QEMU_ASYNC_JOB_NONE here. It
> always returns 0 in that case and it's safe to ignore. The problem is when you
> use other async jobs:
> 
> static int
> qemuDomainObjEnterMonitorInternal(virQEMUDriverPtr driver,
>   virDomainObjPtr obj,
>   qemuDomainAsyncJob asyncJob)
> {
> qemuDomainObjPrivatePtr priv = obj->privateData;
> 
> if (asyncJob != QEMU_ASYNC_JOB_NONE) {
> int ret;
> if ((ret = qemuDomainObjBeginNestedJob(driver, obj, asyncJob)) < 0)
> return ret;
> if (!virDomainObjIsActive(obj)) {
> virReportError(VIR_ERR_OPERATION_FAILED, "%s",
>_("domain is no longer running"));
> /* Still referenced by the containing async job.  */
> ignore_value(qemuDomainObjEndJob(driver, obj));
> return -1;
> }
> }
> ...

Ah thanks, I see what you mean.

I'll post an updated version of my patch that handles error results and
I'll include the qemuDomainChangeGraphicsPasswords() (via
qemuProcessInitPasswords()) function you mentioned above as well.

I'd be happy to update other functions as well, but I can't see a simple
way of finding every way that qemuProcessStart() might end up calling
qemuDomainObjEnterMonitorInternal(). If you can recommend one I'll
handle them as well, but otherwise would you accept the patch with only
the known ones fixed? (It would at least be an improvement and would
make it easy to fix the other cases as they are found.)
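
To make the error handling concrete, the shape I have in mind for the hunk
quoted above is roughly the following (a sketch only, not the final patch;
the remainder of the function is unchanged):

    int
    qemuDomainUpdateDeviceList(virQEMUDriverPtr driver,
                               virDomainObjPtr vm,
                               int asyncJob)
    {
        qemuDomainObjPrivatePtr priv = vm->privateData;
        char **aliases;

        if (!virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_DEVICE_DEL_EVENT))
            return 0;

        /* Propagate the failure instead of ignoring it: with asyncJob !=
         * QEMU_ASYNC_JOB_NONE the nested job can legitimately fail to
         * start. */
        if (qemuDomainObjEnterMonitorAsync(driver, vm, asyncJob) < 0)
            return -1;

        if (qemuMonitorGetDeviceAliases(priv->mon, &aliases) < 0) {
            qemuDomainObjExitMonitor(driver, vm);
            return -1;
        }
        ...
    }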

> Jan
> 

Cheers,
Sam.



Re: [libvirt] [PATCH 1/1] qemu: Tidy up job handling during live migration

2014-07-31 Thread Sam Bobroff
On 30/07/14 01:52, Ján Tomko wrote:
> On 07/29/2014 02:41 AM, Sam Bobroff wrote:
>> During a QEMU live migration several warning messages about job
>> handling could be written to syslog on the destination host:
>>
>> "entering monitor without asking for a nested job is dangerous"
>>
>> The messages are written because the job handling during migration
>> uses hard coded asyncJob values in several places that are incorrect.
>>
>> This patch passes the required asyncJob value around and prevents
>> the warnings as well as any issues that the warnings may be referring
>> to.
>>
>> Signed-off-by: Sam Bobroff 
>> --
> 
> This patch seems to fix the deadlock that can happen if the migrated domain is
> destroyed at the destination reported here:
> https://www.redhat.com/archives/libvir-list/2014-May/msg00236.html
> 
> It looks good to me, but it seems there are more functions calling
> qemuDomainObjEnterMonitor that can be called from qemuProcessStart,
> for example qemuDomainChangeGraphicsPasswords.

Yes, I was fairly sure there would be other cases; I just fixed all the
ones that actually occurred during my tests.

In fact it seems like for the cases I'm looking at here, where it's the
async job owner thread that's using the EnterMonitor functions, passing
asyncJob around is a waste of time anyway because we know the correct
value of asyncJob to use: it's stored in priv->job.asyncJob.

Why not have qemuDomainObjEnterMonitorInternal() automatically switch to
creating a nested job in this case?

It seems easy to do and would simplify some code as well; what do you think?
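
Sketching it out (illustrative only; the owner check mirrors the code that
currently prints the warning, and the exact field and helper names would
need checking against the real tree):

    static int
    qemuDomainObjEnterMonitorInternal(virQEMUDriverPtr driver,
                                      virDomainObjPtr obj,
                                      qemuDomainAsyncJob asyncJob)
    {
        qemuDomainObjPrivatePtr priv = obj->privateData;

        /* If the caller did not ask for a nested job but this thread owns
         * the running async job, upgrade to a nested job automatically
         * instead of warning. */
        if (asyncJob == QEMU_ASYNC_JOB_NONE &&
            priv->job.asyncJob != QEMU_ASYNC_JOB_NONE &&
            priv->job.asyncOwner == virThreadSelfID())
            asyncJob = priv->job.asyncJob;

        if (asyncJob != QEMU_ASYNC_JOB_NONE) {
            int ret;
            if ((ret = qemuDomainObjBeginNestedJob(driver, obj, asyncJob)) < 0)
                return ret;
            ...
        }
        ...
    }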

>> @@ -2505,7 +2506,7 @@ qemuDomainUpdateDeviceList(virQEMUDriverPtr driver,
>>  if (!virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_DEVICE_DEL_EVENT))
>>  return 0;
>>  
>> -qemuDomainObjEnterMonitor(driver, vm);
>> +ignore_value(qemuDomainObjEnterMonitorAsync(driver, vm, asyncJob));
> 
> Also, the return value of this call could be dangerous to ignore if asyncJob
> != NONE.

True, but the patch hasn't introduced this, and the full story is even
worse ;-)

 void qemuDomainObjEnterMonitor(virQEMUDriverPtr driver,
   virDomainObjPtr obj)
 {
ignore_value(qemuDomainObjEnterMonitorInternal(driver, obj,
   QEMU_ASYNC_JOB_NONE));
 }

> 
> Jan

Cheers,
Sam.



[libvirt] [PATCH 1/1] qemu: Tidy up job handling during live migration

2014-07-28 Thread Sam Bobroff
During a QEMU live migration several warning messages about job
handling could be written to syslog on the destination host:

"entering monitor without asking for a nested job is dangerous"

The messages are written because the job handling during migration
uses hard coded asyncJob values in several places that are incorrect.

This patch passes the required asyncJob value around and prevents
the warnings as well as any issues that the warnings may be referring
to.

Signed-off-by: Sam Bobroff 
---

 src/qemu/qemu_domain.c    |  5 +++--
 src/qemu/qemu_domain.h    |  2 +-
 src/qemu/qemu_driver.c    | 21 -
 src/qemu/qemu_migration.c |  3 ++-
 src/qemu/qemu_process.c   | 33 ++---
 src/qemu/qemu_process.h   |  1 +
 6 files changed, 37 insertions(+), 28 deletions(-)

diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index 4f63c88..3abbb14 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -2497,7 +2497,8 @@ qemuDomainDetermineDiskChain(virQEMUDriverPtr driver,
 
 int
 qemuDomainUpdateDeviceList(virQEMUDriverPtr driver,
-   virDomainObjPtr vm)
+   virDomainObjPtr vm,
+   int asyncJob)
 {
 qemuDomainObjPrivatePtr priv = vm->privateData;
 char **aliases;
@@ -2505,7 +2506,7 @@ qemuDomainUpdateDeviceList(virQEMUDriverPtr driver,
 if (!virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_DEVICE_DEL_EVENT))
 return 0;
 
-qemuDomainObjEnterMonitor(driver, vm);
+ignore_value(qemuDomainObjEnterMonitorAsync(driver, vm, asyncJob));
 if (qemuMonitorGetDeviceAliases(priv->mon, &aliases) < 0) {
 qemuDomainObjExitMonitor(driver, vm);
 return -1;
diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h
index 67972b9..8736889 100644
--- a/src/qemu/qemu_domain.h
+++ b/src/qemu/qemu_domain.h
@@ -369,7 +369,7 @@ extern virDomainXMLNamespace virQEMUDriverDomainXMLNamespace;
 extern virDomainDefParserConfig virQEMUDriverDomainDefParserConfig;
 
 int qemuDomainUpdateDeviceList(virQEMUDriverPtr driver,
-   virDomainObjPtr vm);
+   virDomainObjPtr vm, int asyncJob);
 
 bool qemuDomainDefCheckABIStability(virQEMUDriverPtr driver,
 virDomainDefPtr src,
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 33541d3..b0439d2 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -1616,7 +1616,8 @@ static virDomainPtr qemuDomainCreateXML(virConnectPtr conn,
 goto cleanup;
 }
 
-if (qemuProcessStart(conn, driver, vm, NULL, -1, NULL, NULL,
+if (qemuProcessStart(conn, driver, vm, QEMU_ASYNC_JOB_NONE,
+ NULL, -1, NULL, NULL,
  VIR_NETDEV_VPORT_PROFILE_OP_CREATE,
  start_flags) < 0) {
 virDomainAuditStart(vm, "booted", false);
@@ -5446,7 +5447,8 @@ qemuDomainSaveImageStartVM(virConnectPtr conn,
 }
 
 /* Set the migration source and start it up. */
-ret = qemuProcessStart(conn, driver, vm, "stdio", *fd, path, NULL,
+ret = qemuProcessStart(conn, driver, vm, QEMU_ASYNC_JOB_NONE,
+   "stdio", *fd, path, NULL,
VIR_NETDEV_VPORT_PROFILE_OP_RESTORE,
VIR_QEMU_PROCESS_START_PAUSED);
 
@@ -6143,7 +6145,8 @@ qemuDomainObjStart(virConnectPtr conn,
 }
 }
 
-ret = qemuProcessStart(conn, driver, vm, NULL, -1, NULL, NULL,
+ret = qemuProcessStart(conn, driver, vm, QEMU_ASYNC_JOB_NONE,
+   NULL, -1, NULL, NULL,
VIR_NETDEV_VPORT_PROFILE_OP_CREATE, start_flags);
 virDomainAuditStart(vm, "booted", ret >= 0);
 if (ret >= 0) {
@@ -6500,7 +6503,7 @@ qemuDomainAttachDeviceLive(virDomainObjPtr vm,
 }
 
 if (ret == 0)
-qemuDomainUpdateDeviceList(driver, vm);
+qemuDomainUpdateDeviceList(driver, vm, QEMU_ASYNC_JOB_NONE);
 
 return ret;
 }
@@ -6560,7 +6563,7 @@ qemuDomainDetachDeviceLive(virDomainObjPtr vm,
 }
 
 if (ret == 0)
-qemuDomainUpdateDeviceList(driver, vm);
+qemuDomainUpdateDeviceList(driver, vm, QEMU_ASYNC_JOB_NONE);
 
 return ret;
 }
@@ -14101,8 +14104,8 @@ static int qemuDomainRevertToSnapshot(virDomainSnapshotPtr snapshot,
 if (config)
 virDomainObjAssignDef(vm, config, false, NULL);
 
-rc = qemuProcessStart(snapshot->domain->conn,
-  driver, vm, NULL, -1, NULL, snap,
+rc = qemuProcessStart(snapshot->domain->conn, driver, vm,
+  QEMU_ASYNC_JOB_NONE, NULL, -1, NULL, snap,
   VIR_NETDEV_VPORT_PROFILE_OP_CREATE,
   VIR_QEMU_PROCESS_START_PAUSED);
   

Re: [libvirt] Syslog "nested job is dangerous" message

2014-07-21 Thread Sam Bobroff
On 17/07/14 17:54, Ján Tomko wrote:
> On 07/17/2014 08:51 AM, Jiri Denemark wrote:
>> On Thu, Jul 17, 2014 at 13:29:52 +1000, Sam Bobroff wrote:
>>> On 17/07/14 12:50, Eric Blake wrote:
>>>> On 07/16/2014 07:52 PM, Sam Bobroff wrote:
>>>>> Hello everyone,
>>>>
>>>> [Can you configure your mailer to wrap long lines?]
>>>
>>> [No problem, done.]
>>>
>>>>> After performing a migration, the syslog often contains several
>>>>> messages like this:
>>>>>
>>>>> "This thread seems to be the async job owner; entering monitor
>>>>> without asking for a nested job is dangerous"
>>>>
>>>> The sign of a bug that we need to fix.  What version of libvirt are
>>>> you using?  We may have already fixed it.
>>>
>>> I've been testing on 1.2.6, but I will re-test on the latest code from git.
>>
>> Hmm, what were you doing when the message appeared in the log? I
>> fixed wrong usage of our job tracking APIs a long time ago
>> (e4e28220b572cf4c71d07740c24150411f0704a6 in 1.0.3) so I guess there
>> must be another place where we don't do it correctly.
>>
> 
> That commit was essentially reverted by my commit 9846402 when fixing
> migration crashes: https://bugzilla.redhat.com/show_bug.cgi?id=1018267
> 
> I think the message was harmless during migration as we don't allow other jobs
> (except destroy).
> 
> Jan

I've re-tested this on the latest libvirt from git (1.2.7) and I still
see several occurrences of the "dangerous" message on the destination
machine after performing a live migration with --copy-storage-all.

It seems like qemuDomainObjEnterMonitorInternal() is expecting the
asyncJob parameter to have the value of the currently running async job,
and in the cases that cause the warning message, it's being called via
qemuDomainObjEnterMonitor(), which hard codes it to QEMU_ASYNC_JOB_NONE.

It looks like these calls should be changed to
qemuDomainObjEnterMonitorAsync() with the right asyncJob parameter,
which I believe is QEMU_ASYNC_JOB_MIGRATION_IN for the cases I'm looking
at (and the old value of QEMU_ASYNC_JOB_NONE for all other callers). There
aren't many cases of this, so fixing it this way is fairly simple but it
does involve passing the asyncJob through some intermediate functions.
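
For example (illustrative only; the surrounding function is made up, but
this is the call pattern I have in mind for the incoming-migration path):

    if (qemuDomainObjEnterMonitorAsync(driver, vm,
                                       QEMU_ASYNC_JOB_MIGRATION_IN) < 0)
        return -1;   /* the nested job could not be started */

    /* ... monitor commands ... */

    qemuDomainObjExitMonitor(driver, vm);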

Another approach would be to change qemuDomainObjEnterMonitorInternal()
so that it can automatically do the same thing: It could see that there
was an async job running, and that it's owned by the current thread, and
then call qemuDomainObjBeginNestedJob() (or directly call
qemuDomainObjBeginJobInternal() since we know that the checks in
qemuDomainObjBeginNestedJob() are going to succeed). It seems like this
might also simplify some other code that is doing pretty much just what
I described above.

The second approach does fix all the cases I'm looking at, and would
also fix other similar cases I haven't yet found, but I'm not sure that
it would be entirely safe or if it's circumventing some sort of safety
checking.

Any comments?
Which approach should I pursue?

Cheers,
Sam.



Re: [libvirt] Syslog "nested job is dangerous" message

2014-07-16 Thread Sam Bobroff
On 17/07/14 12:50, Eric Blake wrote:
> On 07/16/2014 07:52 PM, Sam Bobroff wrote:
>> Hello everyone,
> 
> [Can you configure your mailer to wrap long lines?]

[No problem, done.]

>> After performing a migration, the syslog often contains several
>> messages like this:
>> 
>> "This thread seems to be the async job owner; entering monitor
>> without asking for a nested job is dangerous"
> 
> The sign of a bug that we need to fix.  What version of libvirt are
> you using?  We may have already fixed it.

I've been testing on 1.2.6, but I will re-test on the latest code from git.

>> The message makes it sound as if the user has done something
>> dangerous but, after looking at the code that produces the message,
>> it appears to be more to do with how the libvirt job code is used.
> 
> Rather, libvirt itself has done something dangerous.
> 
>> 
>> The commit that added the warning
>> (6eede368bc8e3df2c94c2ec1a913885ce08e99db) doesn't explain why it
>> might be dangerous...
> 
> Jiri can probably explain better, but it means there is a race
> condition where libvirt can lose track of a long-running job and
> cause memory corruption in its management of the guest.
> 
>> Would it be appropriate to re-word the message, or perhaps change
>> it to a debug message so that it's not normally seen by users?
> 
> No.
> 
>> 
>> Is it important to track down the cases that are generating the
>> warning and fix them? (Could it cause some kind of significant
>> problem?)
> 
> Yes.

Acknowledged.

If they still occur on the latest code I'll start tracking them down :-)

Cheers,
Sam.



[libvirt] Syslog "nested job is dangerous" message

2014-07-16 Thread Sam Bobroff
Hello everyone,

After performing a migration, the syslog often contains several messages like 
this:

"This thread seems to be the async job owner; entering monitor without asking 
for a nested job is dangerous"

The message makes it sound as if the user has done something dangerous but, 
after looking at the code that produces the message, it appears to be more to 
do with how the libvirt job code is used.

The commit that added the warning (6eede368bc8e3df2c94c2ec1a913885ce08e99db) 
doesn't explain why it might be dangerous...

    qemu: Warn on possibly incorrect usage of EnterMonitor*

    qemuDomainObjEnterMonitor{,WithDriver} should not be called from async
    jobs, only EnterMonitorAsync variant is allowed.

... so I would appreciate some advice from people who understand that area.

Would it be appropriate to re-word the message, or perhaps change it to a debug 
message so that it's not normally seen by users?

Is it important to track down the cases that are generating the warning and fix 
them? (Could it cause some kind of significant problem?)

Cheers,
Sam.
