On 01/24/2018 02:16 PM, Eric Blake wrote:
> On 01/24/2018 12:17 AM, Liang Li wrote:
>> We found that when doing drive mirror to a low speed shared storage,
>> if there was heavy BLK IO write workload in VM after the 'ready' event,
>> drive mirror block job can't be canceled immediately, it would keep
>> running until the heavy BLK IO workload stopped in the VM. This patch
>> fixed this issue.
>
> I think you are breaking semantics here. Libvirt relies on
> 'block-job-cancel' after the 'ready' event to be a clean point-in-time
> snapshot, but that is only possible if there is no out-of-order pending
> I/O at the time the action takes place. Breaking in the middle of the
> loop, without using bdrv_drain(), risks leaving an inconsistent copy of
> data in the mirror not corresponding to any point-in-time on the source.
>
> There's ongoing work on adding async mirroring; this may be a better
> solution to the issue you are seeing.
>
> https://lists.gnu.org/archive/html/qemu-devel/2018-01/msg05419.html
>

Sounds like another point for the idea of using a "completion mode" in a
2.0 API instead of treating "cancel" like a valid way of completing a
job.

(Kevin: If you're taking this up, it would be *very* nice to have jobs
have an option via job-set-property or some such command that allows us
to change our desired completion mode on the fly, which frees up cancel
to be simply a cancel.)

--js
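
A toy sketch of the "completion mode" idea, in case it helps the
discussion. This is not QEMU code and not a proposed interface: the
class, the mode names and set_completion_mode() are all hypothetical,
and the "drain" is modelled as nothing more than zeroing a counter. The
only point is that the job always finishes from a consistent state and
the mode, which can be changed while the job runs, decides what happens
next, so cancel no longer needs to double as a way of completing.

```python
from enum import Enum


class CompletionMode(Enum):
    """Hypothetical modes; the names are illustrative only."""
    PIVOT = "pivot"              # finish and switch over to the target
    KEEP_SOURCE = "keep-source"  # finish with a consistent copy, stay on source


class ToyMirrorJob:
    """Toy model of a mirror job; not related to any real QEMU structure."""

    def __init__(self, mode=CompletionMode.PIVOT):
        self.mode = mode
        self.state = "running"
        self.pending_writes = 0  # stand-in for in-flight I/O

    def set_completion_mode(self, mode):
        # Stand-in for something like the suggested job-set-property:
        # the mode may be changed for as long as the job is still active.
        if self.state != "running":
            raise RuntimeError("job is no longer active")
        self.mode = mode

    def complete(self):
        # Completion always settles pending I/O first (the "drain"),
        # then reports which mode was in effect at that point.
        self.pending_writes = 0
        self.state = "concluded"
        return self.mode

    def cancel(self):
        # Plain cancel: stop now, make no promise about the copy.
        self.state = "cancelled"


job = ToyMirrorJob()
job.pending_writes = 3
job.set_completion_mode(CompletionMode.KEEP_SOURCE)  # changed on the fly
print(job.complete().value, job.pending_writes)
```

With something along these lines, what libvirt gets today from cancel
after 'ready' would map onto completing in the keep-source mode, and a
plain cancel() would be free to return immediately.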