Oops, forgot to CC qemu-devel; adding it.
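To make the "O(1) at the stopping stage" idea in my reply below a bit more concrete, here is a rough, untested sketch. It relies only on the existing memory_region_transaction_begin()/memory_region_transaction_commit() nesting from include/exec/memory.h: keep one transaction open across the whole incoming device-state load, so the deferred commits collapse into a single rebuild instead of ~310,000. Wrapping qemu_loadvm_state() is just an assumption for illustration, and load_state_body() below is a placeholder for its current body, not a real function:

/* migration/savevm.c -- illustrative sketch only, not a tested patch.
 *
 * memory_region_transaction_begin() bumps a depth counter; commits
 * issued while the depth is nonzero are deferred, and the matching
 * memory_region_transaction_commit() performs one rebuild at the end.
 */
#include "exec/memory.h"

int qemu_loadvm_state(QEMUFile *f)
{
    int ret;

    memory_region_transaction_begin();

    /*
     * ... the existing load loop goes here unchanged; every memory
     * region update triggered while device state is loaded is now
     * deferred instead of rebuilding the memory topology each time ...
     */
    ret = load_state_body(f);           /* placeholder, see above */

    memory_region_transaction_commit(); /* single rebuild happens here */

    return ret;
}

The obvious caveat is that anything inspecting the flat view during the load would see stale mappings until the final commit; the assumption is that with the CPU stopped nothing depends on those intermediate states, which is exactly the question I raise below.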
> -----Original Message-----
> From: Gonglei (Arei)
> Sent: Friday, May 19, 2017 8:17 PM
> To: 'Paolo Bonzini'; yanghongyang; m...@redhat.com
> Cc: quint...@redhat.com; Dr. David Alan Gilbert; Huangzhichao
> Subject: RE: Migration downtime more than 5s when migrating guest with
> massive disks
>
> > -----Original Message-----
> > From: Paolo Bonzini [mailto:pbonz...@redhat.com]
> > Sent: Friday, May 19, 2017 6:19 PM
> >
> > On 19/05/2017 12:00, Yang Hongyang wrote:
> > > We found that migration downtime is unacceptable when migrating a
> > > guest with 60 disks: more than 5.5 seconds.
> > > By debugging, we found that the problem is that there are too many
> > > memory_region_transaction_commit() operations during guest load,
> > > about 310,000+ times.
> > > Any idea how to optimize the migration downtime in this scenario?
> > > Maybe reduce the number of memory_region_transaction_commit()
> > > calls, but how? Or we could optimize the time cost of each
> > > memory_region_transaction_commit() call, but I don't think that
> > > would help much.
> >
> > It would. Right now memory_region_transaction_commit() is roughly
> > O(n^2) (n devices * n BARs), and there are n of them.
> >
> > Reducing memory_region_transaction_commit() to O(n) would be a large
> > change. One idea is to share the AddressSpaceDispatch for
> > AddressSpaces that have the same root memory region (after resolving
> > aliases). The starting point would be to change
> > mem_begin/mem_commit/mem_add from a MemoryListener to a loop on the
> > FlatView, storing the AddressSpaceDispatch in the FlatView.
>
> How about doing O(1) for the stopping stage of live migration?
> Because the CPU is stopped in that phase, it wouldn't cause any
> side effects IMHO, right?
>
> Thanks,
> -Gonglei
>
> > One bandaid solution is to use virtio-scsi in the guest, with
> > multiple disks behind one controller.
> >
> > Thanks,
> >
> > Paolo
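For reference, the virtio-scsi bandaid Paolo mentions would look something like this on the command line (the file names and ids here are made up): all the disks sit behind a single controller, so the guest sees one PCI device instead of 60 and the per-BAR commit cost drops accordingly.

-device virtio-scsi-pci,id=scsi0 \
-drive file=/path/disk00.qcow2,if=none,id=drive0,format=qcow2 \
-device scsi-hd,drive=drive0,bus=scsi0.0 \
-drive file=/path/disk01.qcow2,if=none,id=drive1,format=qcow2 \
-device scsi-hd,drive=drive1,bus=scsi0.0 \
... (and so on for the remaining disks)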