Re: [libvirt] [BUG] libvirtd on destination crash frequently while migrating vms concurrently

2013-10-11 Thread Wangyufei (A)
Thanks a lot, I'll give it a try.

> -Original Message-
> From: Michal Privoznik [mailto:mpriv...@redhat.com]
> Sent: Friday, October 11, 2013 8:58 PM
> To: Wangyufei (A)
> Cc: libvir-list@redhat.com; jdene...@redhat.com; Wangrui (K)
> Subject: Re: [libvirt] [BUG] libvirtd on destination crash frequently while
> migrating vms concurrently
> 
> On 27.09.2013 09:55, Wangyufei (A) wrote:
> > Hello,
> > I found a problem where libvirtd on the destination crashes frequently
> > while migrating VMs concurrently. For example, if I migrate 10 VMs
> > concurrently and ceaselessly, then after about 30 minutes libvirtd on
> > the destination will crash. So I analyzed it and found two bugs in the
> > migration process.
> > First, during the migration prepare phase on the destination, libvirtd
> > assigns ports to the QEMU processes to be started there. But the port
> > increment operation is not atomic, so there's a chance that multiple
> > VMs get the same port, and only the first one can start successfully;
> > the others will fail to start. I've written a patch to solve this bug
> > and tested it; it works well. If only this bug existed, libvirtd would
> > not crash. The second bug is fatal.
> > Second, I found that libvirtd crashes because of a segmentation fault
> > produced by accessing a VM object that has been released. Apparently
> > it's caused by multi-threaded operation: thread A accesses VM data
> > that has been released by thread B. In the end I proved this right.
> 
> So I've just pushed the patch upstream. Please give it a try and see
> whether it resolves your problem.
> 
> Michal


--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [BUG] libvirtd on destination crash frequently while migrating vms concurrently

2013-10-11 Thread Michal Privoznik
On 27.09.2013 09:55, Wangyufei (A) wrote:
> Hello,
> I found a problem where libvirtd on the destination crashes frequently
> while migrating VMs concurrently. For example, if I migrate 10 VMs
> concurrently and ceaselessly, then after about 30 minutes libvirtd on
> the destination will crash. So I analyzed it and found two bugs in the
> migration process.
> First, during the migration prepare phase on the destination, libvirtd
> assigns ports to the QEMU processes to be started there. But the port
> increment operation is not atomic, so there's a chance that multiple
> VMs get the same port, and only the first one can start successfully;
> the others will fail to start. I've written a patch to solve this bug
> and tested it; it works well. If only this bug existed, libvirtd would
> not crash. The second bug is fatal.
> Second, I found that libvirtd crashes because of a segmentation fault
> produced by accessing a VM object that has been released. Apparently
> it's caused by multi-threaded operation: thread A accesses VM data
> that has been released by thread B. In the end I proved this right.

So I've just pushed the patch upstream. Please give it a try and see
whether it resolves your problem.

Michal



Re: [libvirt] [BUG] libvirtd on destination crash frequently while migrating vms concurrently

2013-10-11 Thread Michal Privoznik
On 27.09.2013 09:55, Wangyufei (A) wrote:
> Hello,
> I found a problem where libvirtd on the destination crashes frequently
> while migrating VMs concurrently. For example, if I migrate 10 VMs
> concurrently and ceaselessly, then after about 30 minutes libvirtd on
> the destination will crash. So I analyzed it and found two bugs in the
> migration process.
> First, during the migration prepare phase on the destination, libvirtd
> assigns ports to the QEMU processes to be started there. But the port
> increment operation is not atomic, so there's a chance that multiple
> VMs get the same port, and only the first one can start successfully;
> the others will fail to start. I've written a patch to solve this bug
> and tested it; it works well. If only this bug existed, libvirtd would
> not crash. The second bug is fatal.
> Second, I found that libvirtd crashes because of a segmentation fault
> produced by accessing a VM object that has been released. Apparently
> it's caused by multi-threaded operation: thread A accesses VM data
> that has been released by thread B. In the end I proved this right.
>  
> Step 1. Because of bug one, the port is already occupied, so QEMU on the
> destination failed to start and sent a HANGUP signal to libvirtd. When
> libvirtd received this VIR_EVENT_HANDLE_HANGUP event, thread A, which
> deals with events, called qemuProcessHandleMonitorEOF as follows:
>  
> #0  qemuProcessHandleMonitorEOF (mon=0x7f4dcd9c3130, vm=0x7f4dcd9c9780)
> at qemu/qemu_process.c:399
> #1  0x7f4dc18d9e87 in qemuMonitorIO (watch=68, fd=27, events=8,
> opaque=0x7f4dcd9c3130) at qemu/qemu_monitor.c:668
> #2  0x7f4dccae6604 in virEventPollDispatchHandles (nfds=18,
> fds=0x7f4db4017e70) at util/vireventpoll.c:500
> #3  0x7f4dccae7ff2 in virEventPollRunOnce () at util/vireventpoll.c:646
> #4  0x7f4dccae60e4 in virEventRunDefaultImpl () at util/virevent.c:273
> #5  0x7f4dccc40b25 in virNetServerRun (srv=0x7f4dcd8d26b0)
> at rpc/virnetserver.c:1106
> #6  0x7f4dcd6164c9 in main (argc=3, argv=0x7fff8d8f9f88)
> at libvirtd.c:1518
> 

In fact I saw the very same issue and I proposed a patch:

https://www.redhat.com/archives/libvir-list/2013-October/msg00347.html

It got ACKed; however, prior to pushing it I did some testing, and it
seems that under heavy load it doesn't play nicely (the qemuhotplug test
is getting a NULL monitor, ouch). But if you could apply the patch and
see whether it fixes your problem, that would be helpful - at least I'd
know I'm going the right way.

Michal



Re: [libvirt] [BUG] libvirtd on destination crash frequently while migrating vms concurrently

2013-09-29 Thread Wangyufei (A)
Hi guys,
   Is there any problem with my analysis? Am I right?

   If my analysis is right, do we have any plan to solve this kind of problem, 
caused by the removal of the driver lock?

   Thanks for your time and kindness in replying.

   _
   From: Wangyufei (A)
   Sent: Friday, September 27, 2013 3:56 PM
   To: libvir-list@redhat.com
   Cc: Wangrui (K); Wangyufei (A); Michal Privoznik; jdene...@redhat.com
   Subject: [BUG] libvirtd on destination crash frequently while migrating vms 
concurrently


   Hello,
   I found a problem where libvirtd on the destination crashes frequently while 
migrating VMs concurrently. For example, if I migrate 10 VMs concurrently and 
ceaselessly, then after about 30 minutes libvirtd on the destination will 
crash. So I analyzed it and found two bugs in the migration process.
   First, during the migration prepare phase on the destination, libvirtd 
assigns ports to the QEMU processes to be started there. But the port 
increment operation is not atomic, so there's a chance that multiple VMs get 
the same port, and only the first one can start successfully; the others will 
fail to start. I've written a patch to solve this bug and tested it; it works 
well. If only this bug existed, libvirtd would not crash. The second bug is 
fatal.
   Second, I found that libvirtd crashes because of a segmentation fault 
produced by accessing a VM object that has been released. Apparently it's 
caused by multi-threaded operation: thread A accesses VM data that has been 
released by thread B. In the end I proved this right.

   Step 1. Because of bug one, the port is already occupied, so QEMU on the 
destination failed to start and sent a HANGUP signal to libvirtd. When 
libvirtd received this VIR_EVENT_HANDLE_HANGUP event, thread A, which deals 
with events, called qemuProcessHandleMonitorEOF as follows:

#0  qemuProcessHandleMonitorEOF (mon=0x7f4dcd9c3130, vm=0x7f4dcd9c9780)
at qemu/qemu_process.c:399
#1  0x7f4dc18d9e87 in qemuMonitorIO (watch=68, fd=27, events=8,
opaque=0x7f4dcd9c3130) at qemu/qemu_monitor.c:668
#2  0x7f4dccae6604 in virEventPollDispatchHandles (nfds=18,
fds=0x7f4db4017e70) at util/vireventpoll.c:500
#3  0x7f4dccae7ff2 in virEventPollRunOnce () at util/vireventpoll.c:646
#4  0x7f4dccae60e4 in virEventRunDefaultImpl () at util/virevent.c:273
#5  0x7f4dccc40b25 in virNetServerRun (srv=0x7f4dcd8d26b0)
at rpc/virnetserver.c:1106
#6  0x7f4dcd6164c9 in main (argc=3, argv=0x7fff8d8f9f88)
at libvirtd.c:1518


static int virEventPollDispatchHandles(int nfds, struct pollfd *fds) {
    ...
    /* The deleted flag is still false at this point, so we fall through
     * to qemuProcessHandleMonitorEOF. */
    if (eventLoop.handles[i].deleted) {
        EVENT_DEBUG("Skip deleted n=%d w=%d f=%d", i,
                    eventLoop.handles[i].watch, eventLoop.handles[i].fd);
        continue;
    }


   Step 2: Thread B, dealing with the migration on the destination, set the 
deleted flag in virEventPollRemoveHandle as follows:

#0  virEventPollRemoveHandle (watch=74) at util/vireventpoll.c:176
#1  0x7f4dccae5e6f in virEventRemoveHandle (watch=74)
at util/virevent.c:97
#2  0x7f4dc18d8ca8 in qemuMonitorClose (mon=0x7f4dbc030910)
at qemu/qemu_monitor.c:831
#3  0x7f4dc18bec63 in qemuProcessStop (driver=0x7f4dcd9bd400,
vm=0x7f4dbc00ed20, reason=VIR_DOMAIN_SHUTOFF_FAILED, flags=0)
at qemu/qemu_process.c:4302
#4  0x7f4dc18c1a83 in qemuProcessStart (conn=0x7f4dbc031020,
driver=0x7f4dcd9bd400, vm=0x7f4dbc00ed20,
migrateFrom=0x7f4dbc01af90 "tcp:[::]:49152", stdin_fd=-1,
stdin_path=0x0, snapshot=0x0,
vmop=VIR_NETDEV_VPORT_PROFILE_OP_MIGRATE_IN_START, flags=6)
at qemu/qemu_process.c:4145
#5  0x7f4dc18cc688 in qemuMigrationPrepareAny (driver=0x7f4dcd9bd400,

   Step 3: Thread B cleaned up the VM in qemuMigrationPrepareAny after 
qemuProcessStart failed.

#0  virDomainObjDispose (obj=0x7f4dcd9c9780) at conf/domain_conf.c:2009
#1  0x7f4dccb0ccd9 in virObjectUnref (anyobj=0x7f4dcd9c9780)
at util/virobject.c:266
#2  0x7f4dccb42340 in virDomainObjListRemove (doms=0x7f4dcd9bd4f0,
dom=0x7f4dcd9c9780) at conf/domain_conf.c:2342
#3  0x7f4dc189ac33 in qemuDomainRemoveInactive (driver=0x7f4dcd9bd400,
vm=0x7f4dcd9c9780) at qemu/qemu_domain.c:1993
#4  0x7f4dc18ccad5 in qemuMigrationPrepareAny (driver=0x7f4dcd9bd400,

   Step 4: Thread A accesses priv, which was released by thread B earlier, and 
libvirtd crashes. Bomb!

static void
qemuProcessHandleMonitorEOF(qemuMonitorPtr mon ATTRIBUTE_UNUSED,
                            virDomainObjPtr vm)
{
    virQEMUDriverPtr driver = qemu_driver;
    virDomainEventPtr event = NULL;
    qemuDomainObjPrivatePtr priv;
    int eventReason = VIR_DOMAIN_EVENT_STOPPED_SHUTDOWN;
    int stopReason = VIR_DOMAIN_SHUTOFF_SHUTDOWN;
    const char *auditReason = "shutdown";

    VIR_DEBUG("Received EOF on %p '%s'", vm, vm->def->name);

    virObjectLock(vm);

    priv = vm->privateData;
(gdb) p priv
$1 = (qemuDomainObjPrivatePtr) 0x0
    if (priv->beingDestroyed) {

[libvirt] [BUG] libvirtd on destination crash frequently while migrating vms concurrently

2013-09-27 Thread Wangyufei (A)
Hello,
I found a problem where libvirtd on the destination crashes frequently while 
migrating VMs concurrently. For example, if I migrate 10 VMs concurrently and 
ceaselessly, then after about 30 minutes libvirtd on the destination will 
crash. So I analyzed it and found two bugs in the migration process.
First, during the migration prepare phase on the destination, libvirtd assigns 
ports to the QEMU processes to be started there. But the port increment 
operation is not atomic, so there's a chance that multiple VMs get the same 
port, and only the first one can start successfully; the others will fail to 
start. I've written a patch to solve this bug and tested it; it works well. If 
only this bug existed, libvirtd would not crash. The second bug is fatal.
Second, I found that libvirtd crashes because of a segmentation fault produced 
by accessing a VM object that has been released. Apparently it's caused by 
multi-threaded operation: thread A accesses VM data that has been released by 
thread B. In the end I proved this right.
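
The port race is a classic read-modify-write problem: two prepare phases read the same counter value before either increments it. A minimal sketch of the fix, with the counter guarded by a mutex, is below; the function name and port range are illustrative (QEMU incoming migration traditionally starts at 49152), not libvirt's actual port allocator.

```c
#include <pthread.h>
#include <assert.h>

/* Illustrative migration port range; not libvirt's real allocator. */
#define PORT_MIN 49152
#define PORT_MAX 49215

static pthread_mutex_t port_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned int next_port = PORT_MIN;

/* Reserve the next port atomically; returns 0 when the range is
 * exhausted. The read and the increment happen under one lock, so
 * two concurrent prepare phases can never observe the same value. */
static unsigned int
reserve_migration_port(void)
{
    unsigned int port = 0;

    pthread_mutex_lock(&port_lock);
    if (next_port <= PORT_MAX)
        port = next_port++;
    pthread_mutex_unlock(&port_lock);

    return port;
}
```

Without the lock, two threads can both read the same next_port before either stores the incremented value back, which is exactly how two VMs ended up bound to one port.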

Step 1. Because of bug one, the port is already occupied, so QEMU on the 
destination failed to start and sent a HANGUP signal to libvirtd. When 
libvirtd received this VIR_EVENT_HANDLE_HANGUP event, thread A, which deals 
with events, called qemuProcessHandleMonitorEOF as follows:

#0  qemuProcessHandleMonitorEOF (mon=0x7f4dcd9c3130, vm=0x7f4dcd9c9780)
at qemu/qemu_process.c:399
#1  0x7f4dc18d9e87 in qemuMonitorIO (watch=68, fd=27, events=8,
opaque=0x7f4dcd9c3130) at qemu/qemu_monitor.c:668
#2  0x7f4dccae6604 in virEventPollDispatchHandles (nfds=18,
fds=0x7f4db4017e70) at util/vireventpoll.c:500
#3  0x7f4dccae7ff2 in virEventPollRunOnce () at util/vireventpoll.c:646
#4  0x7f4dccae60e4 in virEventRunDefaultImpl () at util/virevent.c:273
#5  0x7f4dccc40b25 in virNetServerRun (srv=0x7f4dcd8d26b0)
at rpc/virnetserver.c:1106
#6  0x7f4dcd6164c9 in main (argc=3, argv=0x7fff8d8f9f88)
at libvirtd.c:1518


static int virEventPollDispatchHandles(int nfds, struct pollfd *fds) {
    ...
    /* The deleted flag is still false at this point, so we fall through
     * to qemuProcessHandleMonitorEOF. */
    if (eventLoop.handles[i].deleted) {
        EVENT_DEBUG("Skip deleted n=%d w=%d f=%d", i,
                    eventLoop.handles[i].watch, eventLoop.handles[i].fd);
        continue;
    }


Step 2: Thread B, dealing with the migration on the destination, set the 
deleted flag in virEventPollRemoveHandle as follows:

#0  virEventPollRemoveHandle (watch=74) at util/vireventpoll.c:176
#1  0x7f4dccae5e6f in virEventRemoveHandle (watch=74)
at util/virevent.c:97
#2  0x7f4dc18d8ca8 in qemuMonitorClose (mon=0x7f4dbc030910)
at qemu/qemu_monitor.c:831
#3  0x7f4dc18bec63 in qemuProcessStop (driver=0x7f4dcd9bd400,
vm=0x7f4dbc00ed20, reason=VIR_DOMAIN_SHUTOFF_FAILED, flags=0)
at qemu/qemu_process.c:4302
#4  0x7f4dc18c1a83 in qemuProcessStart (conn=0x7f4dbc031020,
driver=0x7f4dcd9bd400, vm=0x7f4dbc00ed20,
migrateFrom=0x7f4dbc01af90 "tcp:[::]:49152", stdin_fd=-1,
stdin_path=0x0, snapshot=0x0,
vmop=VIR_NETDEV_VPORT_PROFILE_OP_MIGRATE_IN_START, flags=6)
at qemu/qemu_process.c:4145
#5  0x7f4dc18cc688 in qemuMigrationPrepareAny (driver=0x7f4dcd9bd400,

Step 3: Thread B cleaned up the VM in qemuMigrationPrepareAny after 
qemuProcessStart failed.

#0  virDomainObjDispose (obj=0x7f4dcd9c9780) at conf/domain_conf.c:2009
#1  0x7f4dccb0ccd9 in virObjectUnref (anyobj=0x7f4dcd9c9780)
at util/virobject.c:266
#2  0x7f4dccb42340 in virDomainObjListRemove (doms=0x7f4dcd9bd4f0,
dom=0x7f4dcd9c9780) at conf/domain_conf.c:2342
#3  0x7f4dc189ac33 in qemuDomainRemoveInactive (driver=0x7f4dcd9bd400,
vm=0x7f4dcd9c9780) at qemu/qemu_domain.c:1993
#4  0x7f4dc18ccad5 in qemuMigrationPrepareAny (driver=0x7f4dcd9bd400,

Step 4: Thread A accesses priv, which was released by thread B earlier, and 
libvirtd crashes. Bomb!

static void
qemuProcessHandleMonitorEOF(qemuMonitorPtr mon ATTRIBUTE_UNUSED,
                            virDomainObjPtr vm)
{
    virQEMUDriverPtr driver = qemu_driver;
    virDomainEventPtr event = NULL;
    qemuDomainObjPrivatePtr priv;
    int eventReason = VIR_DOMAIN_EVENT_STOPPED_SHUTDOWN;
    int stopReason = VIR_DOMAIN_SHUTOFF_SHUTDOWN;
    const char *auditReason = "shutdown";

    VIR_DEBUG("Received EOF on %p '%s'", vm, vm->def->name);

    virObjectLock(vm);

    priv = vm->privateData;
(gdb) p priv
$1 = (qemuDomainObjPrivatePtr) 0x0
    if (priv->beingDestroyed) {

In the end, if anything bad happens that makes qemuProcessStart fail during 
migration on the destination, we are in big trouble: we access memory that has 
already been freed. I didn't find any existing locks or flags that could stop 
this from happening. Please help me out, thanks a lot.

Best Regards,
-WangYufei


