Re: [PATCH] vm: forbid starting a VM that is being removed

2021-04-29 Thread Hogan Wang
> On 3/11/21 2:13 AM, Hogan Wang wrote:
> > From: Zhuang Shengen 
> > 
> > When a vm is in the confirm phase of migration and is started 
> > concurrently, the vm ends up outside libvirtd's control.
> > 
> > Cause Analysis:
> > 1. thread1 migrates the vm out.
> > 2. thread2 starts the migrating vm.
> > 3. thread1 removes the vm from the domain list after the migration succeeds.
> 
> I guess I'm puzzled here. Has thread1 merely started the migration, or finished it too?
> How can thread2 start the VM during migration? Is that an offline migration?
> 
> To me it seems like there's a problem with which jobs are allowed during 
> migration, or with their order. Looking at qemuMigrationSrcConfirm(), the 
> function that performs the final step of migration (at the source): it ends the 
> migration job, then starts a MODIFY job (which potentially unlocks the 
> domain object), and only then sets vm->removing = true and removes the domain from 
> the list.
> 
> What I am trying to say is that because the vm object can be briefly unlocked, 
> another thread might lock it successfully and proceed to BeginJob(), where 
> it unlocks the object again, only to let the first thread remove it from 
> the list.
> 
> It's not only virDomainCreateWithFlags() that's affected - basically any 
> other API that grabs a job is. That said, plenty of them require the domain to be 
> running and therefore exit early with a "domain not running" error message.
> 
> I wonder whether something along these lines fixes the problem for you 
> (compile tested only, but it shows the idea):
>

Thanks for your suggestion, but the proposal cannot fix this issue: the 
virDomainCreateWithFlags task can still finish its job after the 
qemuMigrationSrcConfirm task has removed the vm object from the domain list, 
so the qemu process gets started anyway while libvirtd has lost the vm object.

I think qemuDomainObjBeginJobInternal should return an error when a task 
acquires a job for a removed vm object (vm->removing is true).
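
For reference, here is a minimal standalone analogue of the race (plain
pthreads, not libvirt code; the dom struct and the begin_job()/confirm_thread()
names are invented for illustration). Thread B waits for the "job" while the
object is effectively unlocked; thread A marks the object removed in that
window. Without re-checking the removing flag after the wait, B happily
"starts" an object that is no longer tracked anywhere:

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

struct dom {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool job_busy;      /* analogue of another thread holding the job */
    bool removing;      /* analogue of vm->removing */
};

static struct dom d = {
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, true, false
};

/* plays the role of qemuMigrationSrcConfirm finishing up */
static void *confirm_thread(void *arg)
{
    (void)arg;
    sleep(1);
    pthread_mutex_lock(&d.lock);
    d.removing = true;               /* domain leaves the list here ... */
    d.job_busy = false;              /* ... and the job is released */
    pthread_cond_broadcast(&d.cond);
    pthread_mutex_unlock(&d.lock);
    return NULL;
}

/* plays the role of qemuDomainObjBeginJobInternal */
static int begin_job(void)
{
    pthread_mutex_lock(&d.lock);
    while (d.job_busy)
        pthread_cond_wait(&d.cond, &d.lock);  /* object unlocked while waiting */
    if (d.removing) {                /* the re-check argued for above */
        pthread_mutex_unlock(&d.lock);
        fprintf(stderr, "domain is being removed, refusing job\n");
        return -1;
    }
    d.job_busy = true;
    pthread_mutex_unlock(&d.lock);
    return 0;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, confirm_thread, NULL);
    if (begin_job() == 0)
        printf("job acquired, starting domain\n");  /* the buggy outcome when the check is absent */
    pthread_join(t, NULL);
    return 0;
}

With the removing re-check in place, begin_job() refuses the job; drop the
check and the program prints the "buggy outcome" line, which is exactly the
situation described above.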

diff --git a/src/qemu/qemu_domainjob.c b/src/qemu/qemu_domainjob.c
index 50cfc45f5b..246c3208fc 100644
--- a/src/qemu/qemu_domainjob.c
+++ b/src/qemu/qemu_domainjob.c
@@ -880,6 +880,9 @@ qemuDomainObjBeginJobInternal(virQEMUDriverPtr driver,
     if (!nested && !qemuDomainNestedJobAllowed(&priv->job, job))
         goto retry;
 
+    if (obj->removing)
+        goto error;
+
     ignore_value(virTimeMillisNow(&now));
 
     if (job) {
@@ -1011,6 +1014,10 @@ qemuDomainObjBeginJobInternal(virQEMUDriverPtr driver,
                                  "due to max_queued limit"));
         }
         ret = -2;
+    } else if (obj->removing) {
+        virReportError(VIR_ERR_OPERATION_FAILED, "%s",
+                       _("cannot acquire job mutex for removed domain"));
+        ret = -2;
     } else {
         virReportSystemError(errno, "%s", _("cannot acquire job mutex"));
     }
--

> diff --git i/src/qemu/qemu_migration.c w/src/qemu/qemu_migration.c
> index 79dcb4a15d..4b1bb77155 100644
> --- i/src/qemu/qemu_migration.c
> +++ w/src/qemu/qemu_migration.c
> @@ -3468,6 +3468,7 @@ qemuMigrationSrcConfirm(virQEMUDriverPtr driver,
>      qemuMigrationJobPhase phase;
>      g_autoptr(virQEMUDriverConfig) cfg = virQEMUDriverGetConfig(driver);
>      int ret = -1;
> +    bool haveJob;
> 
>      if (!qemuMigrationJobIsActive(vm, QEMU_ASYNC_JOB_MIGRATION_OUT))
>          goto cleanup;
> @@ -3485,15 +3486,21 @@ qemuMigrationSrcConfirm(virQEMUDriverPtr driver,
>                                          cookiein, cookieinlen,
>                                          flags, cancelled);
> 
> -    qemuMigrationJobFinish(driver, vm);
> +    haveJob = qemuDomainObjBeginJob(driver, vm, QEMU_JOB_MODIFY) >= 0;
> +
>      if (!virDomainObjIsActive(vm)) {
>          if (!cancelled && ret == 0 && flags & VIR_MIGRATE_UNDEFINE_SOURCE) {
>              virDomainDeleteConfig(cfg->configDir, cfg->autostartDir, vm);
>              vm->persistent = 0;
>          }
> -        qemuDomainRemoveInactiveJob(driver, vm);
> +        qemuDomainRemoveInactive(driver, vm);
>      }
> 
> +    if (haveJob)
> +        qemuDomainObjEndJob(driver, vm);
> +
> +    qemuMigrationJobFinish(driver, vm);
> +
>  cleanup:
>      virDomainObjEndAPI(&vm);
>      return ret;
> 
> 
> 
> > 4. thread2 acquires the vm job successfully and starts the vm.
> > 5. The vm can no longer be found with the 'virsh list' command. Actually,
> >    the started vm does not exist in the domain list.
> > 
> > Solution:
> > Check the vm->removing state before starting.
> > 
> > 
> > Signed-off-by: Zhuang Shengen 

[PATCH] do not allow an already removed domain object to acquire a job

2021-03-30 Thread Hogan Wang
From: Zhuang Shengen 

A removed vm beginning a job may cause unanticipated results, so add a
check in qemuDomainObjBeginJobInternal to forbid a removed obj from
acquiring a job.

Signed-off-by: Zhuang Shengen 
Reviewed-by: Hogan Wang 
---
 src/qemu/qemu_domainjob.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/src/qemu/qemu_domainjob.c b/src/qemu/qemu_domainjob.c
index 50cfc45f5b..246c3208fc 100644
--- a/src/qemu/qemu_domainjob.c
+++ b/src/qemu/qemu_domainjob.c
@@ -880,6 +880,9 @@ qemuDomainObjBeginJobInternal(virQEMUDriverPtr driver,
     if (!nested && !qemuDomainNestedJobAllowed(&priv->job, job))
         goto retry;
 
+    if (obj->removing == 1)
+        goto error;
+
     ignore_value(virTimeMillisNow(&now));
 
     if (job) {
@@ -1011,6 +1014,10 @@ qemuDomainObjBeginJobInternal(virQEMUDriverPtr driver,
                                  "due to max_queued limit"));
         }
         ret = -2;
+    } else if (obj->removing == 1) {
+        virReportError(VIR_ERR_OPERATION_FAILED, "%s",
+                       _("cannot acquire job mutex for removed domain"));
+        ret = -2;
     } else {
         virReportSystemError(errno, "%s", _("cannot acquire job mutex"));
     }
-- 
2.23.0




Re: [PATCH] vm: forbid starting a VM that is being removed

2021-03-26 Thread Hogan Wang
> On Fri, Mar 05, 2021 at 09:19:52AM +0800, Hogan Wang wrote:
> >From: Zhuang Shengen 
> >
> >When a vm is in the confirm phase of migration and is started 
> >concurrently, the vm ends up outside libvirtd's control.
> >
> >Cause Analysis:
> >1. thread1 migrates the vm out.
> >2. thread2 starts the migrating vm.
> >3. thread1 removes the vm from the domain list after the migration succeeds.
> >4. thread2 acquires the vm job successfully and starts the vm.
> >5. The vm can no longer be found with the 'virsh list' command. Actually,
> >   the started vm does not exist in the domain list.
> >
> >Solution:
> >Check the vm->removing state before starting.
> >
> 
> Well, this would only fix starting it, but there could be other ways
> that the domain can be started, right?  Like restoring it from a save.
> 
> Anyway, I think the issue here is that CreateWithFlags is even
> able to get a job started; I think you should look into that.
> 

Yes, it is; a removed vm beginning a job may cause unanticipated results.
Therefore, I think qemuDomainObjBeginJobInternal should return an error
when a removed vm tries to acquire a job. I will send a new patch to
fix it.
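
To illustrate why a single check in qemuDomainObjBeginJobInternal also covers
the other entry points (restore from save and friends), here is a rough sketch
of the pattern the qemu driver APIs follow; the wrapper function name is made
up for illustration, while qemuDomainObjBeginJob/qemuDomainObjEndJob and
QEMU_JOB_MODIFY are the same helpers used in the reviewer's diff earlier in
this thread:

static int
someQemuDriverAPI(virQEMUDriverPtr driver, virDomainObjPtr vm)
{
    int ret = -1;

    /* every API that modifies the domain funnels through here, so a
     * vm->removing check inside BeginJobInternal rejects them all */
    if (qemuDomainObjBeginJob(driver, vm, QEMU_JOB_MODIFY) < 0)
        return -1;

    /* ... do the actual work on the locked, job-owning domain ... */
    ret = 0;

    qemuDomainObjEndJob(driver, vm);
    return ret;
}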

> >
> >Signed-off-by: Zhuang Shengen 
> >Reviewed-by: Hogan Wang 
> >---
> > src/qemu/qemu_driver.c | 6 ++++++
> > 1 file changed, 6 insertions(+)
> >
> >diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
> >index d1a3659774..a5dfea94cb 100644
> >--- a/src/qemu/qemu_driver.c
> >+++ b/src/qemu/qemu_driver.c
> >@@ -6637,6 +6637,12 @@ qemuDomainCreateWithFlags(virDomainPtr dom, unsigned int flags)
> >         goto endjob;
> >     }
> >
> >+    if (vm->removing) {
> >+        virReportError(VIR_ERR_OPERATION_INVALID,
> >+                       "%s", _("domain is already removing"));
> >+        goto endjob;
> >+    }
> >+
> >     if (qemuDomainObjStart(dom->conn, driver, vm, flags,
> >                            QEMU_ASYNC_JOB_START) < 0)
> >         goto endjob;
> >--
> >2.23.0
> >
> >




[PATCH] vm: forbid starting a VM that is being removed

2021-03-10 Thread Hogan Wang
From: Zhuang Shengen 

When a vm is in the confirm phase of migration and is started concurrently,
the vm ends up outside libvirtd's control.

Cause Analysis:
1. thread1 migrates the vm out.
2. thread2 starts the migrating vm.
3. thread1 removes the vm from the domain list after the migration succeeds.
4. thread2 acquires the vm job successfully and starts the vm.
5. The vm can no longer be found with the 'virsh list' command. Actually,
   the started vm does not exist in the domain list.

Solution:
Check the vm->removing state before starting.


Signed-off-by: Zhuang Shengen 
Reviewed-by: Hogan Wang 
---
 src/qemu/qemu_driver.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index d1a3659774..a5dfea94cb 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -6637,6 +6637,12 @@ qemuDomainCreateWithFlags(virDomainPtr dom, unsigned int flags)
         goto endjob;
     }
 
+    if (vm->removing) {
+        virReportError(VIR_ERR_OPERATION_INVALID,
+                       "%s", _("domain is already removing"));
+        goto endjob;
+    }
+
     if (qemuDomainObjStart(dom->conn, driver, vm, flags,
                            QEMU_ASYNC_JOB_START) < 0)
         goto endjob;
-- 
2.23.0




[PATCH] vm: forbid starting a VM that is being removed

2021-03-04 Thread Hogan Wang
From: Zhuang Shengen 

When a vm is in the confirm phase of migration and is started concurrently,
the vm ends up outside libvirtd's control.

Cause Analysis:
1. thread1 migrates the vm out.
2. thread2 starts the migrating vm.
3. thread1 removes the vm from the domain list after the migration succeeds.
4. thread2 acquires the vm job successfully and starts the vm.
5. The vm can no longer be found with the 'virsh list' command. Actually,
   the started vm does not exist in the domain list.

Solution:
Check the vm->removing state before starting.


Signed-off-by: Zhuang Shengen 
Reviewed-by: Hogan Wang 
---
 src/qemu/qemu_driver.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index d1a3659774..a5dfea94cb 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -6637,6 +6637,12 @@ qemuDomainCreateWithFlags(virDomainPtr dom, unsigned int flags)
         goto endjob;
     }
 
+    if (vm->removing) {
+        virReportError(VIR_ERR_OPERATION_INVALID,
+                       "%s", _("domain is already removing"));
+        goto endjob;
+    }
+
     if (qemuDomainObjStart(dom->conn, driver, vm, flags,
                            QEMU_ASYNC_JOB_START) < 0)
         goto endjob;
-- 
2.23.0




[PATCH python] iothread: fix out-of-bounds memory access

2021-02-23 Thread Hogan Wang
From: suruifeng 

When 'pcpu' is larger than the last bit covered by 'iothr->cpumap', set the
list element to False to avoid out-of-bounds access to 'iothr->cpumap'.

Signed-off-by: suruifeng 
Reviewed-by: Hogan Wang 

---
 libvirt-override.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/libvirt-override.c b/libvirt-override.c
index 1f55864..b099f51 100644
--- a/libvirt-override.c
+++ b/libvirt-override.c
@@ -1625,10 +1625,14 @@ libvirt_virDomainGetIOThreadInfo(PyObject *self ATTRIBUTE_UNUSED,
         VIR_PY_TUPLE_SET_GOTO(iothrtpl, 1, iothrmap, cleanup);
 
         for (pcpu = 0; pcpu < cpunum; pcpu++)
-            VIR_PY_LIST_SET_GOTO(iothrmap, pcpu,
-                                 PyBool_FromLong(VIR_CPU_USED(iothr->cpumap,
-                                                              pcpu)),
-                                 cleanup);
+            if (VIR_CPU_MAPLEN(pcpu + 1) > iothr->cpumaplen) {
+                VIR_PY_LIST_SET_GOTO(iothrmap, pcpu, PyBool_FromLong(0), cleanup);
+            } else {
+                VIR_PY_LIST_SET_GOTO(iothrmap, pcpu,
+                                     PyBool_FromLong(VIR_CPU_USED(iothr->cpumap,
+                                                                  pcpu)),
+                                     cleanup);
+            }
     }
 
     py_retval = py_iothrinfo;
-- 
2.23.0
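
For anyone reviewing the bounds condition: VIR_CPU_MAPLEN(cpu) and
VIR_CPU_USED(cpumap, cpu) come from libvirt-host.h, and
VIR_CPU_MAPLEN(pcpu + 1) > cpumaplen is equivalent to pcpu / 8 >= cpumaplen,
i.e. the byte that VIR_CPU_USED would index lies past the map. Below is a
small standalone check of that arithmetic (the macros are reproduced so it
compiles on its own; the map contents and the cpumaplen/cpunum values are
just example data):

#include <assert.h>
#include <stdio.h>

#define VIR_CPU_MAPLEN(cpu) (((cpu) + 7) / 8)
#define VIR_CPU_USED(cpumap, cpu) ((cpumap)[(cpu) / 8] & (1 << ((cpu) % 8)))

int main(void)
{
    unsigned char cpumap[2] = { 0xff, 0x01 };  /* example map covering 16 CPUs */
    int cpumaplen = 2;                         /* analogue of iothr->cpumaplen */
    int pcpu;

    for (pcpu = 0; pcpu < 24; pcpu++) {        /* pretend the host reports 24 CPUs */
        if (VIR_CPU_MAPLEN(pcpu + 1) > cpumaplen) {
            assert(pcpu / 8 >= cpumaplen);     /* byte index is outside the map */
            printf("pcpu %d: beyond cpumap, report False\n", pcpu);
        } else {
            printf("pcpu %d: %s\n", pcpu,
                   VIR_CPU_USED(cpumap, pcpu) ? "used" : "unused");
        }
    }
    return 0;
}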