Re: [libvirt] [PATCH v3] lxc: fix show the wrong xml when guest start failed

2015-02-12 Thread John Ferlan


On 02/04/2015 08:42 AM, Luyao Huang wrote:
 https://bugzilla.redhat.com/show_bug.cgi?id=1176503
 
 When guest start failed, libvirt will keep the current vm-def,
 this will make a issue that we cannot get a right xml after guest
 start failed. And don't call the stop/release hook to do some
 other clean work.
 
 Call virLXCProcessCleanup to help us clean the source and call
 the hooks if start a vm failed
 
 Signed-off-by: Luyao Huang lhu...@redhat.com
 ---
 v2: use virLXCProcessCleanup to free the source and call the hook.
 v3: rework the patch to suit the virLXCProcessStart code changed.
 
  src/lxc/lxc_process.c | 76 
 ++-
  1 file changed, 32 insertions(+), 44 deletions(-)
 



FWIW:

v1 review starts here:

http://www.redhat.com/archives/libvir-list/2015-January/msg00112.html


At first I was 'concerned' about the number of changes, but after doing
more investigation - I think the patch is correct in general; however, I
think it needs some adjustments as there are really two bugs here...
I'll send an update shortly for comparison/review...

Essentially though - I think the console check*s* could be done earlier
before we get into other setup that requires going thru cleanup: (or
what error: was). That's one bug - which is a configuration error
which will prevent startup. Since it's easier to check early, provide an
error, and return -1 - that's what I think is cleaner solution.

Second, the other bug which you were trying to clean up at the same time
is that existing cleanup paths don't properly clean all things up. The
error path worked only when/once the container started and some
pre-container start cleanup was done, but a few important ones were
missing - so that's a separate patch.

I also will add a patch to add/modify the debugging to help future such
efforts...

BTW:
It seems commit id '88a1b542' from Cédric Bosdonnat caused yet another
merge conflict and seemingly from the description might have been in the
same area, but alas a quick check shows, it wasn't the same issue.

I'll leave the notes I was developing on this patch prior to just biting
the bullet and reformatting so you can perhaps see my thought process.

 diff --git a/src/lxc/lxc_process.c b/src/lxc/lxc_process.c
 index 01da344..1a6cfbb 100644
 --- a/src/lxc/lxc_process.c
 +++ b/src/lxc/lxc_process.c
 @@ -1022,6 +1022,7 @@ int virLXCProcessStart(virConnectPtr conn,
  virCgroupPtr selfcgroup;
  int status;
  char *pidfile = NULL;
 +bool need_stop = false;
  


I think the check:

if (vm-def-nconsoles == 0) {
virReportError(VIR_ERR_CONFIG_UNSUPPORTED, %s,
   _(At least one PTY console is required));
goto cleanup;
}

Should perhaps be moved to just before this code:

nttyFDs = vm-def-nconsoles;
if (VIR_ALLOC_N(ttyFDs, nttyFDs)  0)
goto cleanup;
for (i = 0; i  vm-def-nconsoles; i++)
ttyFDs[i] = -1;

and that would fix the bug from the bz... as well as ensuring we don't
have a if (VIR_ALLOC_N(ttdFDs, 0)  0)... In fact is there a reason
why that check cannot move much higher and be after the cgroup checks
which return -1?  While t it, why not move the following too:

for (i = 0; i  vm-def-nconsoles; i++) {
if (vm-def-consoles[i]-source.type != VIR_DOMAIN_CHR_TYPE_PTY) {
virReportError(VIR_ERR_CONFIG_UNSUPPORTED, %s,
   _(Only PTY console types are supported));
goto cleanup;
}
}

and then remove that from the loop later on.

This way checks that go to cleanup are a result of some error in
processing rather than some container configuration error...

Then the remainder of the code could be perhaps patch 2 which is fixing
a different, although related problem.

  if (virCgroupNewSelf(selfcgroup)  0)
  return -1;
 @@ -1259,6 +1260,7 @@ int virLXCProcessStart(virConnectPtr conn,
  goto cleanup;
  }
  
 +need_stop = true;
  priv-stopReason = VIR_DOMAIN_EVENT_STOPPED_FAILED;
  priv-wantReboot = false;
  vm-def-id = vm-pid;

I was going to suggest using vm-pid (e.g.  0) instead of 'need_stop';
however, I couldn't convince myself that would be 'safe enough'...

 @@ -1272,20 +1274,20 @@ int virLXCProcessStart(virConnectPtr conn,
  
  if (VIR_CLOSE(handshakefds[1])  0) {
  virReportSystemError(errno, %s, _(could not close handshake fd));
 -goto error;
 +goto cleanup;
  }
  
  if (virCommandHandshakeWait(cmd)  0)
 -goto error;
 +goto cleanup;
  
  /* Write domain status to disk for the controller to
   * read when it starts */
  if (virDomainSaveStatus(driver-xmlopt, cfg-stateDir, vm)  0)
 -goto error;
 +goto cleanup;
  
  /* Allow the child to exec the controller */
  if (virCommandHandshakeNotify(cmd)  0)
 -goto error;
 +goto cleanup;
  
  if (virAtomicIntInc(driver-nactive) == 1  

Re: [libvirt] [PATCH v3] lxc: fix show the wrong xml when guest start failed

2015-02-12 Thread lhuang


On 02/13/2015 04:45 AM, John Ferlan wrote:


On 02/04/2015 08:42 AM, Luyao Huang wrote:

https://bugzilla.redhat.com/show_bug.cgi?id=1176503

When guest start failed, libvirt will keep the current vm-def,
this will make a issue that we cannot get a right xml after guest
start failed. And don't call the stop/release hook to do some
other clean work.

Call virLXCProcessCleanup to help us clean the source and call
the hooks if start a vm failed

Signed-off-by: Luyao Huang lhu...@redhat.com
---
v2: use virLXCProcessCleanup to free the source and call the hook.
v3: rework the patch to suit the virLXCProcessStart code changed.

  src/lxc/lxc_process.c | 76 ++-
  1 file changed, 32 insertions(+), 44 deletions(-)




FWIW:

v1 review starts here:

http://www.redhat.com/archives/libvir-list/2015-January/msg00112.html


At first I was 'concerned' about the number of changes, but after doing
more investigation - I think the patch is correct in general; however, I
think it needs some adjustments as there are really two bugs here...
I'll send an update shortly for comparison/review...


Yes, i agree about these patch need more adjustment, because i think 
maybe there is a better way to fix these issue but i cannot find them ;)


and thanks for your patches.

Essentially though - I think the console check*s* could be done earlier
before we get into other setup that requires going thru cleanup: (or
what error: was). That's one bug - which is a configuration error
which will prevent startup. Since it's easier to check early, provide an
error, and return -1 - that's what I think is cleaner solution.


Looks good for me

Second, the other bug which you were trying to clean up at the same time
is that existing cleanup paths don't properly clean all things up. The
error path worked only when/once the container started and some
pre-container start cleanup was done, but a few important ones were
missing - so that's a separate patch.

I also will add a patch to add/modify the debugging to help future such
efforts...


Thanks


BTW:
It seems commit id '88a1b542' from Cédric Bosdonnat caused yet another
merge conflict and seemingly from the description might have been in the
same area, but alas a quick check shows, it wasn't the same issue.

I'll leave the notes I was developing on this patch prior to just biting
the bullet and reformatting so you can perhaps see my thought process.


Yes, commit 88a1b542 is different from the issue i try to fix.

diff --git a/src/lxc/lxc_process.c b/src/lxc/lxc_process.c
index 01da344..1a6cfbb 100644
--- a/src/lxc/lxc_process.c
+++ b/src/lxc/lxc_process.c
@@ -1022,6 +1022,7 @@ int virLXCProcessStart(virConnectPtr conn,
  virCgroupPtr selfcgroup;
  int status;
  char *pidfile = NULL;
+bool need_stop = false;
  


I think the check:

 if (vm-def-nconsoles == 0) {
 virReportError(VIR_ERR_CONFIG_UNSUPPORTED, %s,
_(At least one PTY console is required));
 goto cleanup;
 }

Should perhaps be moved to just before this code:

 nttyFDs = vm-def-nconsoles;
 if (VIR_ALLOC_N(ttyFDs, nttyFDs)  0)
 goto cleanup;
 for (i = 0; i  vm-def-nconsoles; i++)
 ttyFDs[i] = -1;

and that would fix the bug from the bz... as well as ensuring we don't
have a if (VIR_ALLOC_N(ttdFDs, 0)  0)... In fact is there a reason
why that check cannot move much higher and be after the cgroup checks
which return -1?  While t it, why not move the following too:

 for (i = 0; i  vm-def-nconsoles; i++) {
 if (vm-def-consoles[i]-source.type != VIR_DOMAIN_CHR_TYPE_PTY) {
 virReportError(VIR_ERR_CONFIG_UNSUPPORTED, %s,
_(Only PTY console types are supported));
 goto cleanup;
 }
 }

and then remove that from the loop later on.


Yes, after your words, i think console check in this place should be 
improved.

This way checks that go to cleanup are a result of some error in
processing rather than some container configuration error...

Then the remainder of the code could be perhaps patch 2 which is fixing
a different, although related problem.


  if (virCgroupNewSelf(selfcgroup)  0)
  return -1;
@@ -1259,6 +1260,7 @@ int virLXCProcessStart(virConnectPtr conn,
  goto cleanup;
  }
  


...

-VIR_FREE(vm-def-seclabels[0]-imagelabel);
-VIR_DELETE_ELEMENT(vm-def-seclabels, 0, vm-def-nseclabels);
+err = virSaveLastError();
+if (need_stop) {
+virLXCProcessStop(driver, vm, VIR_DOMAIN_SHUTOFF_FAILED);
+} else {
+virSecurityManagerRestoreAllLabel(driver-securityManager,
+  vm-def, false);
+virSecurityManagerReleaseLabel(driver-securityManager, vm-def);
+/* Clear out dynamically assigned labels */
+if (vm-def-nseclabels 
+vm-def-seclabels[0]-type == 

[libvirt] [PATCH v3] lxc: fix show the wrong xml when guest start failed

2015-02-04 Thread Luyao Huang
https://bugzilla.redhat.com/show_bug.cgi?id=1176503

When guest start failed, libvirt will keep the current vm-def,
this will make a issue that we cannot get a right xml after guest
start failed. And don't call the stop/release hook to do some
other clean work.

Call virLXCProcessCleanup to help us clean the source and call
the hooks if start a vm failed

Signed-off-by: Luyao Huang lhu...@redhat.com
---
v2: use virLXCProcessCleanup to free the source and call the hook.
v3: rework the patch to suit the virLXCProcessStart code changed.

 src/lxc/lxc_process.c | 76 ++-
 1 file changed, 32 insertions(+), 44 deletions(-)

diff --git a/src/lxc/lxc_process.c b/src/lxc/lxc_process.c
index 01da344..1a6cfbb 100644
--- a/src/lxc/lxc_process.c
+++ b/src/lxc/lxc_process.c
@@ -1022,6 +1022,7 @@ int virLXCProcessStart(virConnectPtr conn,
 virCgroupPtr selfcgroup;
 int status;
 char *pidfile = NULL;
+bool need_stop = false;
 
 if (virCgroupNewSelf(selfcgroup)  0)
 return -1;
@@ -1259,6 +1260,7 @@ int virLXCProcessStart(virConnectPtr conn,
 goto cleanup;
 }
 
+need_stop = true;
 priv-stopReason = VIR_DOMAIN_EVENT_STOPPED_FAILED;
 priv-wantReboot = false;
 vm-def-id = vm-pid;
@@ -1272,20 +1274,20 @@ int virLXCProcessStart(virConnectPtr conn,
 
 if (VIR_CLOSE(handshakefds[1])  0) {
 virReportSystemError(errno, %s, _(could not close handshake fd));
-goto error;
+goto cleanup;
 }
 
 if (virCommandHandshakeWait(cmd)  0)
-goto error;
+goto cleanup;
 
 /* Write domain status to disk for the controller to
  * read when it starts */
 if (virDomainSaveStatus(driver-xmlopt, cfg-stateDir, vm)  0)
-goto error;
+goto cleanup;
 
 /* Allow the child to exec the controller */
 if (virCommandHandshakeNotify(cmd)  0)
-goto error;
+goto cleanup;
 
 if (virAtomicIntInc(driver-nactive) == 1  driver-inhibitCallback)
 driver-inhibitCallback(true, driver-inhibitOpaque);
@@ -1298,7 +1300,7 @@ int virLXCProcessStart(virConnectPtr conn,
_(guest failed to start: %s), out);
 }
 
-goto error;
+goto cleanup;
 }
 
 /* We know the cgroup must exist by this synchronization
@@ -1310,13 +1312,13 @@ int virLXCProcessStart(virConnectPtr conn,
   vm-def-resource-partition :
   NULL,
   -1, priv-cgroup)  0)
-goto error;
+goto cleanup;
 
 if (!priv-cgroup) {
 virReportError(VIR_ERR_INTERNAL_ERROR,
_(No valid cgroup for machine %s),
vm-def-name);
-goto error;
+goto cleanup;
 }
 
 /* And we can get the first monitor connection now too */
@@ -1329,17 +1331,17 @@ int virLXCProcessStart(virConnectPtr conn,
 virReportError(VIR_ERR_INTERNAL_ERROR,
_(guest failed to start: %s), ebuf);
 }
-goto error;
+goto cleanup;
 }
 
 if (autoDestroy 
 virCloseCallbacksSet(driver-closeCallbacks, vm,
  conn, lxcProcessAutoDestroy)  0)
-goto error;
+goto cleanup;
 
 if (virDomainObjSetDefTransient(caps, driver-xmlopt,
 vm, false)  0)
-goto error;
+goto cleanup;
 
 /* We don't need the temporary NIC names anymore, clear them */
 virLXCProcessCleanInterfaces(vm-def);
@@ -1358,47 +1360,38 @@ int virLXCProcessStart(virConnectPtr conn,
  * If the script raised an error abort the launch
  */
 if (hookret  0)
-goto error;
+goto cleanup;
 }
 
 rc = 0;
 
  cleanup:
-if (rc != 0  !err)
-err = virSaveLastError();
-virCommandFree(cmd);
 if (VIR_CLOSE(logfd)  0) {
 virReportSystemError(errno, %s, _(could not close logfile));
 rc = -1;
 }
-for (i = 0; i  nveths; i++) {
-if (rc != 0  veths[i])
-ignore_value(virNetDevVethDelete(veths[i]));
-VIR_FREE(veths[i]);
-}
 if (rc != 0) {
-if (vm-newDef) {
-virDomainDefFree(vm-newDef);
-vm-newDef = NULL;
-}
-if (priv-monitor) {
-virObjectUnref(priv-monitor);
-priv-monitor = NULL;
-}
-virDomainConfVMNWFilterTeardown(vm);
-
-virSecurityManagerRestoreAllLabel(driver-securityManager,
-  vm-def, false);
-virSecurityManagerReleaseLabel(driver-securityManager, vm-def);
-/* Clear out dynamically assigned labels */
-if (vm-def-nseclabels 
-vm-def-seclabels[0]-type == VIR_DOMAIN_SECLABEL_DYNAMIC) {
-VIR_FREE(vm-def-seclabels[0]-model);
-VIR_FREE(vm-def-seclabels[0]-label);
-