On 3/7/22 10:51 AM, Daniel P. Berrangé wrote: > On Mon, Mar 07, 2022 at 10:44:56AM +0100, Claudio Fontana wrote: >> Hello Daniel, >> >> On 3/7/22 10:27 AM, Daniel P. Berrangé wrote: >>> On Sat, Mar 05, 2022 at 02:19:39PM +0100, Claudio Fontana wrote: >>>> >>>> Hello all, >>>> >>>> I have been looking at some reports of bad qemu savevm performance in >>>> large VMs (around 20+ Gb), >>>> when used in libvirt commands like: >>>> >>>> >>>> virsh save domain /dev/null >>>> >>>> >>>> >>>> I have written a simple test to run in a Linux centos7-minimal-2009 guest, >>>> which allocates and touches 20G mem. >>>> >>>> With any qemu version since around 2020, I am not seeing more than 580 >>>> Mb/Sec even in the most ideal of situations. >>>> >>>> This drops to around 122 Mb/sec after commit: >>>> cbde7be900d2a2279cbc4becb91d1ddd6a014def . >>>> >>>> Here is the bisection for this particular drop in throughput: >>>> >>>> commit cbde7be900d2a2279cbc4becb91d1ddd6a014def (HEAD, refs/bisect/bad) >>>> Author: Daniel P. Berrangé <berra...@redhat.com> >>>> Date: Fri Feb 19 18:40:12 2021 +0000 >>>> >>>> migrate: remove QMP/HMP commands for speed, downtime and cache size >>>> >>>> The generic 'migrate_set_parameters' command handle all types of param. >>>> >>>> Only the QMP commands were documented in the deprecations page, but the >>>> rationale for deprecating applies equally to HMP, and the replacements >>>> exist. Furthermore the HMP commands are just shims to the QMP commands, >>>> so removing the latter breaks the former unless they get >>>> re-implemented. >>>> >>>> Reviewed-by: Dr. David Alan Gilbert <dgilb...@redhat.com> >>>> Signed-off-by: Daniel P. Berrangé <berra...@redhat.com> >>> >>> That doesn't make a whole lot of sense as a bisect result. >>> How reliable is that bisect end point ? Have you bisected >>> to that point more than once ? >> >> I did run through the bisect itself only once, so I'll double check that. >> The results seem to be reproducible almost to the second though, a savevm >> that took 35 seconds before the commit takes 2m 48 seconds after. >> >> For this test I am using libvirt v6.0.0. >> >> If it helps, these are the current_migration->parameters pre-commit >> (captured in qemu_savevm_state_iterate): >> >> >> pre-commit: in qemu_savevm_state_iterate: >> >> (gdb) p current_migration->parameters > >> tls_authz = 0x0, has_max_bandwidth = true, max_bandwidth = >> 9223372036853727232, has_downtime_limit = true, downtime_limit = 300, > > snip > >> and post-commit: in qemu_savevm_state_iterate: >> >> (gdb) p current_migration->parameters > > snip > >> tls_authz = 0x0, has_max_bandwidth = true, max_bandwidth = 134217728, >> has_downtime_limit = true, downtime_limit = 300, > >> so there seems to be a difference in the max_bandwidth parameter, >> do we have a limit suddenly having effect for max_bandwidth after the commit? > > Yes, that's very strange. I think we'll need to capture the QMP commands that > libvirt is sending to QEMU, so see if there';s a difference in what it sends. > This might indicate a latent bug in libvirt.
In the pre-commit case I see: 2022-03-07 10:41:00.928+0000: 132544: info : qemuMonitorJSONIOProcessLine:235 : QEMU_MONITOR_RECV_EVENT: mon=0x7f0fd00028a0 event={"timestamp": {"seconds": 1646649660, "microseconds": 927920}, "event": "STOP"} 2022-03-07 10:41:00.929+0000: 132544: info : qemuMonitorJSONIOProcessLine:240 : QEMU_MONITOR_RECV_REPLY: mon=0x7f0fd00028a0 reply={"return": {}, "id": "libvirt-13"} 2022-03-07 10:41:00.934+0000: 132549: info : qemuMonitorSend:995 : QEMU_MONITOR_SEND_MSG: mon=0x7f0fd00028a0 msg={"execute":"migrate_set_speed","arguments":{"value":9223372036853727232},"id":"libvirt-14"}^M fd=-1 2022-03-07 10:41:00.934+0000: 132544: info : qemuMonitorIOWrite:452 : QEMU_MONITOR_IO_WRITE: mon=0x7f0fd00028a0 buf={"execute":"migrate_set_speed","arguments":{"value":9223372036853727232},"id":"libvirt-14"}^M len=93 ret=93 errno=0 2022-03-07 10:41:00.935+0000: 132544: info : qemuMonitorJSONIOProcessLine:240 : QEMU_MONITOR_RECV_REPLY: mon=0x7f0fd00028a0 reply={"return": {}, "id": "libvirt-14"} 2022-03-07 10:41:00.936+0000: 132549: info : qemuMonitorSend:995 : QEMU_MONITOR_SEND_MSG: mon=0x7f0fd00028a0 msg={"execute":"getfd","arguments":{"fdname":"migrate"},"id":"libvirt-15"}^M fd=32 2022-03-07 10:41:00.936+0000: 132544: info : qemuMonitorIOWrite:452 : QEMU_MONITOR_IO_WRITE: mon=0x7f0fd00028a0 buf={"execute":"getfd","arguments":{"fdname":"migrate"},"id":"libvirt-15"}^M len=72 ret=72 errno=0 2022-03-07 10:41:00.936+0000: 132544: info : qemuMonitorIOWrite:457 : QEMU_MONITOR_IO_SEND_FD: mon=0x7f0fd00028a0 fd=32 ret=72 errno=0 2022-03-07 10:41:00.937+0000: 132544: info : qemuMonitorJSONIOProcessLine:240 : QEMU_MONITOR_RECV_REPLY: mon=0x7f0fd00028a0 reply={"return": {}, "id": "libvirt-15"} 2022-03-07 10:41:00.937+0000: 132549: info : qemuMonitorSend:995 : QEMU_MONITOR_SEND_MSG: mon=0x7f0fd00028a0 msg={"execute":"migrate","arguments":{"detach":true,"blk":false,"inc":false,"uri":"fd:migrate"},"id":"libvirt-16"}^M fd=-1 2022-03-07 10:41:00.937+0000: 132544: info : qemuMonitorIOWrite:452 : QEMU_MONITOR_IO_WRITE: mon=0x7f0fd00028a0 buf={"execute":"migrate","arguments":{"detach":true,"blk":false,"inc":false,"uri":"fd:migrate"},"id":"libvirt-16"}^M len=112 ret=112 errno=0 In the post-commit case I see: 2022-03-07 10:47:07.316+0000: 134386: info : qemuMonitorJSONIOProcessLine:235 : QEMU_MONITOR_RECV_EVENT: mon=0x7fa4380028a0 event={"timestamp": {"seconds": 1646650027, "microseconds": 316537}, "event": "STOP"} 2022-03-07 10:47:07.317+0000: 134386: info : qemuMonitorJSONIOProcessLine:240 : QEMU_MONITOR_RECV_REPLY: mon=0x7fa4380028a0 reply={"return": {}, "id": "libvirt-13"} 2022-03-07 10:47:07.322+0000: 134391: info : qemuMonitorSend:995 : QEMU_MONITOR_SEND_MSG: mon=0x7fa4380028a0 msg={"execute":"migrate_set_speed","arguments":{"value":9223372036853727232},"id":"libvirt-14"}^M fd=-1 2022-03-07 10:47:07.322+0000: 134386: info : qemuMonitorIOWrite:452 : QEMU_MONITOR_IO_WRITE: mon=0x7fa4380028a0 buf={"execute":"migrate_set_speed","arguments":{"value":9223372036853727232},"id":"libvirt-14"}^M len=93 ret=93 errno=0 2022-03-07 10:47:07.324+0000: 134386: info : qemuMonitorJSONIOProcessLine:240 : QEMU_MONITOR_RECV_REPLY: mon=0x7fa4380028a0 reply={"id": "libvirt-14", "error": {"class": "CommandNotFound", "desc": "The command migrate_set_speed has not been found"}} 2022-03-07 10:47:07.324+0000: 134391: error : qemuMonitorJSONCheckError:412 : internal error: unable to execute QEMU command 'migrate_set_speed': The command migrate_set_speed has not been found 2022-03-07 10:47:07.324+0000: 134391: info : qemuMonitorSend:995 : QEMU_MONITOR_SEND_MSG: mon=0x7fa4380028a0 msg={"execute":"getfd","arguments":{"fdname":"migrate"},"id":"libvirt-15"}^M fd=32 2022-03-07 10:47:07.324+0000: 134386: info : qemuMonitorIOWrite:452 : QEMU_MONITOR_IO_WRITE: mon=0x7fa4380028a0 buf={"execute":"getfd","arguments":{"fdname":"migrate"},"id":"libvirt-15"}^M len=72 ret=72 errno=0 2022-03-07 10:47:07.324+0000: 134386: info : qemuMonitorIOWrite:457 : QEMU_MONITOR_IO_SEND_FD: mon=0x7fa4380028a0 fd=32 ret=72 errno=0 2022-03-07 10:47:07.325+0000: 134386: info : qemuMonitorJSONIOProcessLine:240 : QEMU_MONITOR_RECV_REPLY: mon=0x7fa4380028a0 reply={"return": {}, "id": "libvirt-15"} 2022-03-07 10:47:07.326+0000: 134391: info : qemuMonitorSend:995 : QEMU_MONITOR_SEND_MSG: mon=0x7fa4380028a0 msg={"execute":"migrate","arguments":{"detach":true,"blk":false,"inc":false,"uri":"fd:migrate"},"id":"libvirt-16"}^M fd=-1 2022-03-07 10:47:07.326+0000: 134386: info : qemuMonitorIOWrite:452 : QEMU_MONITOR_IO_WRITE: mon=0x7fa4380028a0 buf={"execute":"migrate","arguments":{"detach":true,"blk":false,"inc":false,"uri":"fd:migrate"},"id":"libvirt-16"}^M len=112 ret=112 errno=0 2022-03-07 10:47:07.328+0000: 134386: info : qemuMonitorJSONIOProcessLine:235 : QEMU_MONITOR_RECV_EVENT: mon=0x7fa4380028a0 event={"timestamp": {"seconds": 1646650027, "microseconds": 327843}, "event": "MIGRATION", "data": {"status": "setup"}} 2022-03-07 10:47:07.328+0000: 134386: info : qemuMonitorJSONIOProcessLine:240 : QEMU_MONITOR_RECV_REPLY: mon=0x7fa4380028a0 reply={"return": {}, "id": "libvirt-16"} 2022-03-07 10:47:07.449+0000: 134386: info : qemuMonitorJSONIOProcessLine:235 : QEMU_MONITOR_RECV_EVENT: mon=0x7fa4380028a0 event={"timestamp": {"seconds": 1646650027, "microseconds": 449199}, "event": "MIGRATION_PASS", "data": {"pass": 1}} 2022-03-07 10:47:07.449+0000: 134386: info : qemuMonitorJSONIOProcessLine:235 : QEMU_MONITOR_RECV_EVENT: mon=0x7fa4380028a0 event={"timestamp": {"seconds": 1646650027, "microseconds": 449363}, "event": "MIGRATION", "data": {"status": "active"}} 2022-03-07 10:47:07.807+0000: 134387: info : qemuMonitorSend:995 : QEMU_MONITOR_SEND_MSG: mon=0x7fa4380028a0 msg={"execute":"query-migrate","id":"libvirt-17"}^M fd=-1 2022-03-07 10:47:07.807+0000: 134386: info : qemuMonitorIOWrite:452 : QEMU_MONITOR_IO_WRITE: mon=0x7fa4380028a0 buf={"execute":"query-migrate","id":"libvirt-17"}^M len=47 ret=47 errno=0 2022-03-07 10:47:07.809+0000: 134386: info : qemuMonitorJSONIOProcessLine:240 : QEMU_MONITOR_RECV_REPLY: mon=0x7fa4380028a0 reply={"return": {"blocked": false, "expected-downtime": 300, "status": "active", "setup-time": 121, "total-time": 481, "ram": {"total": 32213049344, "postcopy-requests": 0, "dirty-sync-count": 1, "multifd-bytes": 0, "pages-per-second": 971380, "page-size": 4096, "remaining": 31357165568, "mbps": 70.597440000000006, "transferred": 28723376, "duplicate": 202401, "dirty-pages-rate": 0, "skipped": 0, "normal-bytes": 26849280, "normal": 6555}}, "id": "libvirt-17"} 2022-03-07 10:47:20.063+0000: 134386: info : qemuMonitorJSONIOProcessLine:235 : QEMU_MONITOR_RECV_EVENT: mon=0x7fa4380028a0 event={"timestamp": {"seconds": 1646650040, "microseconds": 63299}, "event": "MIGRATION_PASS", "data": {"pass": 2}} 2022-03-07 10:47:20.068+0000: 134386: info : qemuMonitorJSONIOProcessLine:235 : QEMU_MONITOR_RECV_EVENT: mon=0x7fa4380028a0 event={"timestamp": {"seconds": 1646650040, "microseconds": 68660}, "event": "MIGRATION_PASS", "data": {"pass": 3}} 2022-03-07 10:47:20.142+0000: 134386: info : qemuMonitorJSONIOProcessLine:235 : QEMU_MONITOR_RECV_EVENT: mon=0x7fa4380028a0 event={"timestamp": {"seconds": 1646650040, "microseconds": 142735}, "event": "MIGRATION", "data": {"status": "completed"}} 2022-03-07 10:47:20.143+0000: 134391: info : qemuMonitorSend:995 : QEMU_MONITOR_SEND_MSG: mon=0x7fa4380028a0 msg={"execute":"query-migrate","id":"libvirt-18"}^M fd=-1 2022-03-07 10:47:20.143+0000: 134386: info : qemuMonitorIOWrite:452 : QEMU_MONITOR_IO_WRITE: mon=0x7fa4380028a0 buf={"execute":"query-migrate","id":"libvirt-18"}^M len=47 ret=47 errno=0 2022-03-07 10:47:20.145+0000: 134386: info : qemuMonitorJSONIOProcessLine:240 : QEMU_MONITOR_RECV_REPLY: mon=0x7fa4380028a0 reply={"return": {"blocked": false, "status": "completed", "setup-time": 121, "downtime": 79, "total-time": 12815, "ram": {"total": 32213049344, "postcopy-requests": 0, "dirty-sync-count": 3, "multifd-bytes": 0, "pages-per-second": 32710, "page-size": 4096, "remaining": 0, "mbps": 584.63040491570825, "transferred": 927267953, "duplicate": 7655360, "dirty-pages-rate": 0, "skipped": 0, "normal-bytes": 856694784, "normal": 209154}}, "id": "libvirt-18"} 2022-03-07 10:47:20.145+0000: 134391: info : qemuMonitorSend:995 : QEMU_MONITOR_SEND_MSG: mon=0x7fa4380028a0 msg={"execute":"migrate_set_speed","arguments":{"value":9223372036853727232},"id":"libvirt-19"}^M fd=-1 2022-03-07 10:47:20.145+0000: 134386: info : qemuMonitorIOWrite:452 : QEMU_MONITOR_IO_WRITE: mon=0x7fa4380028a0 buf={"execute":"migrate_set_speed","arguments":{"value":9223372036853727232},"id":"libvirt-19"}^M len=93 ret=93 errno=0 2022-03-07 10:47:20.146+0000: 134386: info : qemuMonitorJSONIOProcessLine:240 : QEMU_MONITOR_RECV_REPLY: mon=0x7fa4380028a0 reply={"id": "libvirt-19", "error": {"class": "CommandNotFound", "desc": "The command migrate_set_speed has not been found"}} 2022-03-07 10:47:20.147+0000: 134391: error : qemuMonitorJSONCheckError:412 : internal error: unable to execute QEMU command 'migrate_set_speed': The command migrate_set_speed has not been found 2022-03-07 10:47:20.150+0000: 134391: info : qemuMonitorClose:917 : QEMU_MONITOR_CLOSE: mon=0x7fa4380028a0 refs=2 > > If you libvirt_log_filters=2:qemu_monitor then it ought to capture the > QMP commands. > > Regards, > Daniel >