Re: [libvirt] [RFC 0/8] Implement async QEMU event handling in libvirtd.
Hi Marc, currently the block job handling needs to be sorted out before events can assume independent handling from RPC contexts. Sorry, I have not been able to revisit this in the past 2 months, but this is something I would very much like to fix. I will try looking up the exact dependency on the block layer so that this series can make progress. Regards, Prerna On Wed, Mar 21, 2018 at 1:24 PM, Marc Hartmayer wrote: > On Tue, Oct 24, 2017 at 07:34 PM +0200, Prerna Saxena < > saxenap@gmail.com> wrote: > > As noted in > > https://www.redhat.com/archives/libvir-list/2017-May/msg00016.html > > the libvirt QEMU driver handles all async events from the main loop. > > Handling each event needs the per-VM lock to make forward progress. In > > the case where an async event is received for the same VM which has an > > RPC running, the main loop is held up contending for the same lock. > > > > This impacts scalability, and should be addressed as a priority. > > > > Note that libvirt does have a 2-step deferred handling for a few event > > categories, but (1) that is insufficient, since blocking happens before > > the handler can disambiguate which events need to be posted to this > > other queue, and (2) there needs to be homogeneity. > > > > The current series builds a framework for recording and handling VM > > events. > > It initializes a per-VM event queue, and a global event queue pointing to > > events from all the VMs. Event handling is staggered in 2 stages: > > - When an event is received, it is enqueued in the per-VM queue as well > > as the global queue. > > - The global queue is built into the QEMU driver as a threadpool > > (currently with a single thread). > > - Enqueuing of a new event triggers the global event worker thread, which > > then attempts to take the lock for this event's VM. > > - If the lock is available, the event worker runs the function handling > > this event type. Once done, it dequeues this event from the global > > as well as per-VM queues.
> > - If the lock is unavailable (i.e. taken by an RPC thread), the event worker > > thread leaves this as-is and picks up the next event. > > - Once the RPC thread completes, it looks for events pertaining to the > > VM in the per-VM event queue. It then processes the events serially > > (holding the VM lock) until there are no more events remaining for > > this VM. At this point, the per-VM lock is relinquished. > > > > Patch series status: > > Strictly RFC only. No compilation issues. I have not had a chance to > > (stress) test it after rebasing to the latest master. > > Note that documentation and test coverage are TBD, since a few open > > points remain. > > > > Known issues / caveats: > > - RPC handling time will become non-deterministic. > > - An event will only be "notified" to a client once the RPC for the same VM > > completes. > > - Needs careful consideration in all cases where a QMP event is used to > > "signal" an RPC thread, else it will deadlock. > > > > I will be happy to drive more discussion in the community and completely > > implement it. > > > > Prerna Saxena (8): > > Introduce virObjectTrylock() > > QEMU event handling: Introduce async event helpers in qemu_event.[ch] > > Set up global and per-VM event queues. Also initialize per-VM queues > > when libvirt reconnects to an existing VM. > > Events: Allow the monitor to "enqueue" events to a queue. Also introduce a > > framework of handlers for each event type, that can be called when > > the handler is running an event. > > Events: Plumb event handling calls before a domain's APIs complete. > > Code refactor: Move helper functions of doCoreDump*, syncNicRxFilter*, > > and qemuOpenFile* to qemu_process.[ch] > > Fold back the 2-stage event implementation for a few events (Watchdog, > > Monitor EOF, Serial changed, Guest panic, NIC RX filter changed) into a > > single level. > > Initialize the per-VM event queues in the context of domain init.
> > > > src/Makefile.am |1 + > > src/conf/domain_conf.h |3 + > > src/libvirt_private.syms |1 + > > src/qemu/qemu_conf.h |4 + > > src/qemu/qemu_driver.c | 1710 +++ > > src/qemu/qemu_event.c| 317 +++ > > src/qemu/qemu_event.h| 231 + > > src/qemu/qemu_monitor.c | 592 ++-- > > src/qemu/qemu_monitor.h | 80 +- > > src/qemu/qemu_monitor_json.c | 291 +++--- > > src/qemu/qemu_process.c | 2031 ++ > > > src/qemu/qemu_process.h | 88 ++ > > src/util/virobject.c | 26 + > > src/util/virobject.h |4 + > > src/util/virthread.c |5 + > > src/util/virthread.h |1 + > > tests/qemumonitortestutils.c |2 +- > > 17 files changed, 3411 insertions(+), 1976 deletions(-) > > create mode 100644 src/qemu/qemu_event.c > > create mode 100644 src/qemu/qemu_event.h >
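The staged design the cover letter describes can be sketched in miniature. This is an illustrative sketch only — the names (vmEventEnqueue, workerTryProcess, and the structs) are mine, not the symbols the series introduces — showing how a worker can skip a VM whose lock an RPC thread holds, leaving that RPC thread to drain the per-VM queue before it unlocks.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

typedef struct Event {
    int type;
    struct Event *next;          /* per-VM queue link */
} Event;

typedef struct Vm {
    pthread_mutex_t lock;        /* the per-VM lock contended with RPC threads */
    Event *head, *tail;          /* per-VM event queue */
    int handled;                 /* events processed (demo stand-in for handlers) */
} Vm;

/* Stage 1: record the event on the VM's queue. The real series also links
   it into a global queue and wakes a worker thread in the driver's pool. */
static void vmEventEnqueue(Vm *vm, int type)
{
    Event *ev = calloc(1, sizeof(*ev));
    ev->type = type;
    if (vm->tail)
        vm->tail->next = ev;
    else
        vm->head = ev;
    vm->tail = ev;
}

/* Drain the per-VM queue while holding the lock; called both by the worker
   (after a successful trylock) and by an RPC thread before it drops the
   VM lock, which is the "process serially until empty" step above. */
static void vmEventDrainLocked(Vm *vm)
{
    while (vm->head) {
        Event *ev = vm->head;
        vm->head = ev->next;
        if (!vm->head)
            vm->tail = NULL;
        vm->handled++;           /* stand-in for the per-type handler */
        free(ev);
    }
}

/* Stage 2: the global worker attempts the per-VM lock; on contention it
   leaves the events queued and moves on to the next event. */
static int workerTryProcess(Vm *vm)
{
    if (pthread_mutex_trylock(&vm->lock) != 0)
        return 0;                /* RPC holds the lock; skip this VM */
    vmEventDrainLocked(vm);
    pthread_mutex_unlock(&vm->lock);
    return 1;
}
```

The key property is that no thread ever blocks on the per-VM lock while holding the queue: the worker backs off with trylock, and the RPC thread guarantees the queue is drained before the lock is released.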
Re: [libvirt] [RFC 0/8] Implement async QEMU event handling in libvirtd.
On Tue, Oct 24, 2017 at 07:34 PM +0200, Prerna Saxena wrote: > As noted in > https://www.redhat.com/archives/libvir-list/2017-May/msg00016.html > the libvirt QEMU driver handles all async events from the main loop. > Handling each event needs the per-VM lock to make forward progress. In > the case where an async event is received for the same VM which has an > RPC running, the main loop is held up contending for the same lock. > > This impacts scalability, and should be addressed as a priority. > > Note that libvirt does have a 2-step deferred handling for a few event > categories, but (1) that is insufficient, since blocking happens before > the handler can disambiguate which events need to be posted to this > other queue, and (2) there needs to be homogeneity. > > The current series builds a framework for recording and handling VM > events. > It initializes a per-VM event queue, and a global event queue pointing to > events from all the VMs. Event handling is staggered in 2 stages: > - When an event is received, it is enqueued in the per-VM queue as well > as the global queue. > - The global queue is built into the QEMU driver as a threadpool > (currently with a single thread). > - Enqueuing of a new event triggers the global event worker thread, which > then attempts to take the lock for this event's VM. > - If the lock is available, the event worker runs the function handling > this event type. Once done, it dequeues this event from the global > as well as per-VM queues. > - If the lock is unavailable (i.e. taken by an RPC thread), the event worker > thread leaves this as-is and picks up the next event. > - Once the RPC thread completes, it looks for events pertaining to the > VM in the per-VM event queue. It then processes the events serially > (holding the VM lock) until there are no more events remaining for > this VM. At this point, the per-VM lock is relinquished. > > Patch series status: > Strictly RFC only. No compilation issues.
I have not had a chance to > (stress) test it after rebasing to the latest master. > Note that documentation and test coverage are TBD, since a few open > points remain. > > Known issues / caveats: > - RPC handling time will become non-deterministic. > - An event will only be "notified" to a client once the RPC for the same VM > completes. > - Needs careful consideration in all cases where a QMP event is used to > "signal" an RPC thread, else it will deadlock. > > I will be happy to drive more discussion in the community and completely > implement it. > > Prerna Saxena (8): > Introduce virObjectTrylock() > QEMU event handling: Introduce async event helpers in qemu_event.[ch] > Set up global and per-VM event queues. Also initialize per-VM queues > when libvirt reconnects to an existing VM. > Events: Allow the monitor to "enqueue" events to a queue. Also introduce a > framework of handlers for each event type, that can be called when > the handler is running an event. > Events: Plumb event handling calls before a domain's APIs complete. > Code refactor: Move helper functions of doCoreDump*, syncNicRxFilter*, > and qemuOpenFile* to qemu_process.[ch] > Fold back the 2-stage event implementation for a few events (Watchdog, > Monitor EOF, Serial changed, Guest panic, NIC RX filter changed) into a > single level. > Initialize the per-VM event queues in the context of domain init.
> > src/Makefile.am |1 + > src/conf/domain_conf.h |3 + > src/libvirt_private.syms |1 + > src/qemu/qemu_conf.h |4 + > src/qemu/qemu_driver.c | 1710 +++ > src/qemu/qemu_event.c| 317 +++ > src/qemu/qemu_event.h| 231 + > src/qemu/qemu_monitor.c | 592 ++-- > src/qemu/qemu_monitor.h | 80 +- > src/qemu/qemu_monitor_json.c | 291 +++--- > src/qemu/qemu_process.c | 2031 ++ > src/qemu/qemu_process.h | 88 ++ > src/util/virobject.c | 26 + > src/util/virobject.h |4 + > src/util/virthread.c |5 + > src/util/virthread.h |1 + > tests/qemumonitortestutils.c |2 +- > 17 files changed, 3411 insertions(+), 1976 deletions(-) > create mode 100644 src/qemu/qemu_event.c > create mode 100644 src/qemu/qemu_event.h > > -- > 2.9.5 > > -- > libvir-list mailing list > libvir-list@redhat.com > https://www.redhat.com/mailman/listinfo/libvir-list Hey Prerna, any updates so far? :) I'm just curious about the status of this series, as it would fix a performance problem we have. Thank you. Kind regards (Beste Grüße) Marc Hartmayer IBM Deutschland Research & Development GmbH Chairwoman of the Supervisory Board: Martina Koederitz Managing Director: Dirk Wittkopp Registered office: Böblingen Registration court: Amtsgericht Stuttgart, HRB 243294
Re: [libvirt] [RFC 0/8] Implement async QEMU event handling in libvirtd.
I spent a while trying to work through this proposal; here are a few points which need more thought: On Wed, Nov 8, 2017 at 7:22 PM, Daniel P. Berrange wrote: > On Mon, Nov 06, 2017 at 06:43:12AM +0100, Prerna wrote: > > Thanks for your reply Daniel. I am still on vacation all of this week so > > have not been able to respond. > > Few questions inline: > > > > On Thu, Oct 26, 2017 at 2:43 PM, Daniel P. Berrange > > wrote: > > > > > On Tue, Oct 24, 2017 at 10:34:53AM -0700, Prerna Saxena wrote: > > > > > > > > As noted in > > > > https://www.redhat.com/archives/libvir-list/2017-May/msg00016.html > > > > the libvirt QEMU driver handles all async events from the main loop. > > > > Handling each event needs the per-VM lock to make forward progress. In > > > > the case where an async event is received for the same VM which has an > > > > RPC running, the main loop is held up contending for the same lock. > > > > > > > > This impacts scalability, and should be addressed as a priority. > > > > > > > > Note that libvirt does have a 2-step deferred handling for a few event > > > > categories, but (1) that is insufficient, since blocking happens before > > > > the handler can disambiguate which events need to be posted to this > > > > other queue, and (2) there needs to be homogeneity. > > > > > > > > The current series builds a framework for recording and handling VM > > > > events. > > > > It initializes a per-VM event queue, and a global event queue pointing to > > > > events from all the VMs. Event handling is staggered in 2 stages: > > > > - When an event is received, it is enqueued in the per-VM queue as well > > > > as the global queue. > > > > - The global queue is built into the QEMU driver as a threadpool > > > > (currently with a single thread). > > > > - Enqueuing of a new event triggers the global event worker thread, which > > > > then attempts to take the lock for this event's VM.
> > > > - If the lock is available, the event worker runs the function handling > > > > this event type. Once done, it dequeues this event from the global > > > > as well as per-VM queues. > > > > - If the lock is unavailable (i.e. taken by an RPC thread), the event worker > > > > thread leaves this as-is and picks up the next event. > > > > - Once the RPC thread completes, it looks for events pertaining to the > > > > VM in the per-VM event queue. It then processes the events serially > > > > (holding the VM lock) until there are no more events remaining for > > > > this VM. At this point, the per-VM lock is relinquished. > > > > > > One of the nice aspects of processing the QEMU events in the main event > > > loop is that handling of them is self-throttling, i.e. if one QEMU process > > > goes mental and issues lots of events, we'll spend a lot of time processing > > > them all, but our memory usage is still bounded. > > > > > > If we take events from the VM and put them on a queue that is processed > > > asynchronously, and the event processing thread gets stuck for some > > > reason, then libvirt will end up queuing an unbounded number of events. > > > This could cause considerable memory usage in libvirt. This could be > > > exploited by a malicious VM to harm libvirt, e.g. a malicious QEMU could > > > stop responding to monitor RPC calls, which would tie up the RPC threads > > > and handling of RPC calls. This would in turn prevent events being > > > processed due to being unable to acquire the state lock. Now the VM > > > can flood libvirtd with events which will all be read off the wire > > > and queued in memory, potentially forever. So libvirt memory usage > > > will balloon to an arbitrary level. > > > > > > Now, the current code isn't great when faced with a malicious QEMU, > > > because with the same approach QEMU can cause libvirtd's main event > > > loop to stall, as you found.
This feels less bad than unbounded > > > memory usage though - if libvirt uses lots of memory, this will > > > push other apps on the host into swap, and/or trigger the OOM > > > killer. > > > > > > So I agree that we need to make event processing asynchronous from > > > the main loop. When doing that though, I think we need to put an > > > upper limit on the number of events we're willing to queue from a > > > VM. When we hit that limit, we should update the monitor event loop > > > watch so that we stop reading further events, until existing events > > > have been processed. > > > > We can update the watch at the monitor socket level. I.e., if we have hit > > threshold limits reading events off the monitor socket, we ignore this > > socket FD going forward. Now, this also means that we will miss any replies > > coming off the same socket. It will mean that legitimate RPC replies coming > > off the same socket will get ignored too. And in this case, we deadlock, > > since event notifications will not be processed until the ongoing RPC > > completes. > An RPC waiting for a reply should release the mutex, but have the job change lock.
Re: [libvirt] [RFC 0/8] Implement async QEMU event handling in libvirtd.
On Wed, Nov 15, 2017 at 12:07 AM, Marc Hartmayer < mhart...@linux.vnet.ibm.com> wrote: > On Tue, Oct 24, 2017 at 07:34 PM +0200, Prerna Saxena < > saxenap@gmail.com> wrote: > > As noted in > > https://www.redhat.com/archives/libvir-list/2017-May/msg00016.html > > the libvirt QEMU driver handles all async events from the main loop. > > Handling each event needs the per-VM lock to make forward progress. In > > the case where an async event is received for the same VM which has an > > RPC running, the main loop is held up contending for the same lock. > > What about the remaining qemuMonitorCallbacks? The main event loop can > still be 'blocked' by e.g. qemuProcessHandleMonitorEOF if the VM is > already locked by a worker thread. In fact, we currently have a problem > with D-Bus which causes a D-Bus call (of a worker thread) to run into > its timeout of 30 seconds. During these 30 seconds the main event loop > is stuck. > > EOF handling in the current series is yet another event, and hence is done only once the worker threads complete. It needs to go into a priority thread pool, and should wake up RPC workers. > I tried the patch series and got a segmentation fault: > > Thread 1 "libvirtd" received signal SIGSEGV, Segmentation fault.
> 0x03ff98faa452 in virEnqueueVMEvent (qlist=0x3ff908ce760, > ev=<optimized out>) at ../../src/qemu/qemu_event.c:153 > 153 vmq_entry->ev->ev_id = vmq->last->ev->ev_id + 1; > (gdb) bt > #0 0x03ff98faa452 in virEnqueueVMEvent (qlist=0x3ff908ce760, > ev=<optimized out>) at ../../src/qemu/qemu_event.c:153 > #1 0x03ff98fc3564 in qemuProcessEnqueueEvent (mon=<optimized out>, > vm=<optimized out>, ev=<optimized out>, opaque=0x3ff90548ec0) at > ../../src/qemu/qemu_process.c:1864 > #2 0x03ff98fe4804 in qemuMonitorEnqueueEvent > (mon=mon@entry=0x3ff4c007440, ev=0x2aa1e0104c0) at > ../../src/qemu/qemu_monitor.c:1325 > #3 0x03ff98fe7102 in qemuMonitorEmitShutdown > (mon=mon@entry=0x3ff4c007440, guest=<optimized out>, > seconds=seconds@entry=1510683878, micros=micros@entry=703956) at > ../../src/qemu/qemu_monitor.c:1365 > #4 0x03ff98ffc19a in qemuMonitorJSONHandleShutdown > (mon=0x3ff4c007440, data=<optimized out>, seconds=1510683878, > micros=<optimized out>) at ../../src/qemu/qemu_monitor_json.c:552 > #5 0x03ff98ffbb8a in qemuMonitorJSONIOProcessEvent > (mon=mon@entry=0x3ff4c007440, obj=obj@entry=0x2aa1e012030) at > ../../src/qemu/qemu_monitor_json.c:208 > #6 0x03ff99002138 in qemuMonitorJSONIOProcessLine > (mon=mon@entry=0x3ff4c007440, line=0x2aa1e010460 "{\"timestamp\": > {\"seconds\": 1510683878, \"microseconds\": 703956}, \"event\": > \"SHUTDOWN\"}", msg=msg@entry=0x0) at > ../../src/qemu/qemu_monitor_json.c:237 > #7 0x03ff990022b4 in qemuMonitorJSONIOProcess > (mon=mon@entry=0x3ff4c007440, data=0x2aa1e014bc0 "{\"timestamp\": > {\"seconds\": 1510683878, \"microseconds\": 703956}, \"event\": > \"SHUTDOWN\"}\r\n", len=85, msg=msg@entry=0x0) at > ../../src/qemu/qemu_monitor_json.c:279 > #8 0x03ff98fe4b44 in qemuMonitorIOProcess > (mon=mon@entry=0x3ff4c007440) at ../../src/qemu/qemu_monitor.c:443 > #9 0x03ff98fe5d00 in qemuMonitorIO (watch=<optimized out>, > fd=<optimized out>, events=0, opaque=0x3ff4c007440) at > ../../src/qemu/qemu_monitor.c:697 > #10 0x03ffa68d6442 in virEventPollDispatchHandles (nfds=<optimized out>, fds=0x2aa1e013990) at ../../src/util/vireventpoll.c:508 > #11 0x03ffa68d66c8 in virEventPollRunOnce () at >
../../src/util/vireventpoll.c:657 > #12 0x03ffa68d44e4 in virEventRunDefaultImpl () at > ../../src/util/virevent.c:327 > #13 0x03ffa6a83c5e in virNetDaemonRun (dmn=0x2aa1dfe3eb0) at > ../../src/rpc/virnetdaemon.c:838 > #14 0x02aa1df29cc4 in main (argc=<optimized out>, argv=<optimized out>) at ../../daemon/libvirtd.c:1494 > > Thanks for trying it out. Let me look into this. > > > > This impacts scalability, and should be addressed as a priority. > > > > Note that libvirt does have a 2-step deferred handling for a few event > > categories, but (1) that is insufficient, since blocking happens before > > the handler can disambiguate which events need to be posted to this > > other queue, and (2) there needs to be homogeneity. > > > > The current series builds a framework for recording and handling VM > > events. > > It initializes a per-VM event queue, and a global event queue pointing to > > events from all the VMs. Event handling is staggered in 2 stages: > > - When an event is received, it is enqueued in the per-VM queue as well > > as the global queue. > > - The global queue is built into the QEMU driver as a threadpool > > (currently with a single thread). > > - Enqueuing of a new event triggers the global event worker thread, which > > then attempts to take the lock for this event's VM. > > - If the lock is available, the event worker runs the function handling > > this event type. Once done, it dequeues this event from the global > > as well as per-VM queues. > > - If the lock is unavailable (i.e. taken by an RPC thread), the event worker > > thread leaves this as-is and picks up the next event. > > - Once the RPC thread completes, it looks for events pertaining to the > > VM in the per-VM event queue.
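The faulting line in the backtrace (qemu_event.c:153, `vmq_entry->ev->ev_id = vmq->last->ev->ev_id + 1;`) dereferences `vmq->last`, which is presumably NULL when the queue is empty — e.g. for the first event a VM ever posts, as with this SHUTDOWN event. A guard along these lines would avoid the crash; note the struct layout below is a hypothetical reconstruction from the backtrace, not the series' actual definitions.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct VmEvent {
    int ev_id;                       /* monotonic per-VM event id */
} VmEvent;

typedef struct VmEventQueueEntry {
    VmEvent *ev;
    struct VmEventQueueEntry *next;
} VmEventQueueEntry;

typedef struct VmEventQueue {
    VmEventQueueEntry *head;
    VmEventQueueEntry *last;
} VmEventQueue;

/* The faulting line blindly read vmq->last->ev->ev_id, which crashes when
   the queue is empty (vmq->last == NULL). Guarding the first enqueue
   avoids the NULL dereference. */
static void vmEventQueuePush(VmEventQueue *vmq, VmEvent *ev)
{
    VmEventQueueEntry *entry = calloc(1, sizeof(*entry));
    entry->ev = ev;
    if (vmq->last) {
        entry->ev->ev_id = vmq->last->ev->ev_id + 1;
        vmq->last->next = entry;
    } else {
        entry->ev->ev_id = 0;        /* first event on this VM's queue */
        vmq->head = entry;
    }
    vmq->last = entry;
}
```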
Re: [libvirt] [RFC 0/8] Implement async QEMU event handling in libvirtd.
On Tue, Oct 24, 2017 at 07:34 PM +0200, Prerna Saxena wrote: > As noted in > https://www.redhat.com/archives/libvir-list/2017-May/msg00016.html > the libvirt QEMU driver handles all async events from the main loop. > Handling each event needs the per-VM lock to make forward progress. In > the case where an async event is received for the same VM which has an > RPC running, the main loop is held up contending for the same lock. What about the remaining qemuMonitorCallbacks? The main event loop can still be 'blocked' by e.g. qemuProcessHandleMonitorEOF if the VM is already locked by a worker thread. In fact, we currently have a problem with D-Bus which causes a D-Bus call (of a worker thread) to run into its timeout of 30 seconds. During these 30 seconds the main event loop is stuck. I tried the patch series and got a segmentation fault: Thread 1 "libvirtd" received signal SIGSEGV, Segmentation fault. 0x03ff98faa452 in virEnqueueVMEvent (qlist=0x3ff908ce760, ev=<optimized out>) at ../../src/qemu/qemu_event.c:153 153 vmq_entry->ev->ev_id = vmq->last->ev->ev_id + 1; (gdb) bt #0 0x03ff98faa452 in virEnqueueVMEvent (qlist=0x3ff908ce760, ev=<optimized out>) at ../../src/qemu/qemu_event.c:153 #1 0x03ff98fc3564 in qemuProcessEnqueueEvent (mon=<optimized out>, vm=<optimized out>, ev=<optimized out>, opaque=0x3ff90548ec0) at ../../src/qemu/qemu_process.c:1864 #2 0x03ff98fe4804 in qemuMonitorEnqueueEvent (mon=mon@entry=0x3ff4c007440, ev=0x2aa1e0104c0) at ../../src/qemu/qemu_monitor.c:1325 #3 0x03ff98fe7102 in qemuMonitorEmitShutdown (mon=mon@entry=0x3ff4c007440, guest=<optimized out>, seconds=seconds@entry=1510683878, micros=micros@entry=703956) at ../../src/qemu/qemu_monitor.c:1365 #4 0x03ff98ffc19a in qemuMonitorJSONHandleShutdown (mon=0x3ff4c007440, data=<optimized out>, seconds=1510683878, micros=<optimized out>) at ../../src/qemu/qemu_monitor_json.c:552 #5 0x03ff98ffbb8a in qemuMonitorJSONIOProcessEvent (mon=mon@entry=0x3ff4c007440, obj=obj@entry=0x2aa1e012030) at ../../src/qemu/qemu_monitor_json.c:208 #6 0x03ff99002138 in qemuMonitorJSONIOProcessLine (mon=mon@entry=0x3ff4c007440,
line=0x2aa1e010460 "{\"timestamp\": {\"seconds\": 1510683878, \"microseconds\": 703956}, \"event\": \"SHUTDOWN\"}", msg=msg@entry=0x0) at ../../src/qemu/qemu_monitor_json.c:237 #7 0x03ff990022b4 in qemuMonitorJSONIOProcess (mon=mon@entry=0x3ff4c007440, data=0x2aa1e014bc0 "{\"timestamp\": {\"seconds\": 1510683878, \"microseconds\": 703956}, \"event\": \"SHUTDOWN\"}\r\n", len=85, msg=msg@entry=0x0) at ../../src/qemu/qemu_monitor_json.c:279 #8 0x03ff98fe4b44 in qemuMonitorIOProcess (mon=mon@entry=0x3ff4c007440) at ../../src/qemu/qemu_monitor.c:443 #9 0x03ff98fe5d00 in qemuMonitorIO (watch=<optimized out>, fd=<optimized out>, events=0, opaque=0x3ff4c007440) at ../../src/qemu/qemu_monitor.c:697 #10 0x03ffa68d6442 in virEventPollDispatchHandles (nfds=<optimized out>, fds=0x2aa1e013990) at ../../src/util/vireventpoll.c:508 #11 0x03ffa68d66c8 in virEventPollRunOnce () at ../../src/util/vireventpoll.c:657 #12 0x03ffa68d44e4 in virEventRunDefaultImpl () at ../../src/util/virevent.c:327 #13 0x03ffa6a83c5e in virNetDaemonRun (dmn=0x2aa1dfe3eb0) at ../../src/rpc/virnetdaemon.c:838 #14 0x02aa1df29cc4 in main (argc=<optimized out>, argv=<optimized out>) at ../../daemon/libvirtd.c:1494 > > This impacts scalability, and should be addressed as a priority. > > Note that libvirt does have a 2-step deferred handling for a few event > categories, but (1) that is insufficient, since blocking happens before > the handler can disambiguate which events need to be posted to this > other queue, and (2) there needs to be homogeneity. > > The current series builds a framework for recording and handling VM > events. > It initializes a per-VM event queue, and a global event queue pointing to > events from all the VMs. Event handling is staggered in 2 stages: > - When an event is received, it is enqueued in the per-VM queue as well > as the global queue. > - The global queue is built into the QEMU driver as a threadpool > (currently with a single thread).
> - Enqueuing of a new event triggers the global event worker thread, which > then attempts to take the lock for this event's VM. > - If the lock is available, the event worker runs the function handling > this event type. Once done, it dequeues this event from the global > as well as per-VM queues. > - If the lock is unavailable (i.e. taken by an RPC thread), the event worker > thread leaves this as-is and picks up the next event. > - Once the RPC thread completes, it looks for events pertaining to the > VM in the per-VM event queue. It then processes the events serially > (holding the VM lock) until there are no more events remaining for > this VM. At this point, the per-VM lock is relinquished. > > Patch series status: > Strictly RFC only. No compilation issues. I have not had a chance to > (stress) test it after rebasing to the latest master. > Note that documentation and test coverage are TBD, since a few open > points remain. > > Known issues / caveats: > - RPC handling time will become non-deterministic.
Re: [libvirt] [RFC 0/8] Implement async QEMU event handling in libvirtd.
On Mon, Nov 06, 2017 at 06:43:12AM +0100, Prerna wrote: > Thanks for your reply Daniel. I am still on vacation all of this week so > have not been able to respond. > Few questions inline: > > On Thu, Oct 26, 2017 at 2:43 PM, Daniel P. Berrange > wrote: > > On Tue, Oct 24, 2017 at 10:34:53AM -0700, Prerna Saxena wrote: > > > > > > As noted in > > > https://www.redhat.com/archives/libvir-list/2017-May/msg00016.html > > > the libvirt QEMU driver handles all async events from the main loop. > > > Handling each event needs the per-VM lock to make forward progress. In > > > the case where an async event is received for the same VM which has an > > > RPC running, the main loop is held up contending for the same lock. > > > > > > This impacts scalability, and should be addressed as a priority. > > > > > > Note that libvirt does have a 2-step deferred handling for a few event > > > categories, but (1) that is insufficient, since blocking happens before > > > the handler can disambiguate which events need to be posted to this > > > other queue, and (2) there needs to be homogeneity. > > > > > > The current series builds a framework for recording and handling VM > > > events. > > > It initializes a per-VM event queue, and a global event queue pointing to > > > events from all the VMs. Event handling is staggered in 2 stages: > > > - When an event is received, it is enqueued in the per-VM queue as well > > > as the global queue. > > > - The global queue is built into the QEMU driver as a threadpool > > > (currently with a single thread). > > > - Enqueuing of a new event triggers the global event worker thread, which > > > then attempts to take the lock for this event's VM. > > > - If the lock is available, the event worker runs the function handling > > > this event type. Once done, it dequeues this event from the global > > > as well as per-VM queues.
> > > - If the lock is unavailable (i.e. taken by an RPC thread), the event worker > > > thread leaves this as-is and picks up the next event. > > > - Once the RPC thread completes, it looks for events pertaining to the > > > VM in the per-VM event queue. It then processes the events serially > > > (holding the VM lock) until there are no more events remaining for > > > this VM. At this point, the per-VM lock is relinquished. > > > > One of the nice aspects of processing the QEMU events in the main event > > loop is that handling of them is self-throttling, i.e. if one QEMU process > > goes mental and issues lots of events, we'll spend a lot of time processing > > them all, but our memory usage is still bounded. > > > > If we take events from the VM and put them on a queue that is processed > > asynchronously, and the event processing thread gets stuck for some > > reason, then libvirt will end up queuing an unbounded number of events. > > This could cause considerable memory usage in libvirt. This could be > > exploited by a malicious VM to harm libvirt, e.g. a malicious QEMU could > > stop responding to monitor RPC calls, which would tie up the RPC threads > > and handling of RPC calls. This would in turn prevent events being > > processed due to being unable to acquire the state lock. Now the VM > > can flood libvirtd with events which will all be read off the wire > > and queued in memory, potentially forever. So libvirt memory usage > > will balloon to an arbitrary level. > > > > Now, the current code isn't great when faced with a malicious QEMU, > > because with the same approach QEMU can cause libvirtd's main event > > loop to stall, as you found. This feels less bad than unbounded > > memory usage though - if libvirt uses lots of memory, this will > > push other apps on the host into swap, and/or trigger the OOM > > killer. > > > > So I agree that we need to make event processing asynchronous from > > the main loop.
When doing that though, I think we need to put an > > upper limit on the number of events we're willing to queue from a > > VM. When we hit that limit, we should update the monitor event loop > > watch so that we stop reading further events, until existing events > > have been processed. > > We can update the watch at the monitor socket level. I.e., if we have hit > threshold limits reading events off the monitor socket, we ignore this > socket FD going forward. Now, this also means that we will miss any replies > coming off the same socket. It will mean that legitimate RPC replies coming > off the same socket will get ignored too. And in this case, we deadlock, > since event notifications will not be processed until the ongoing RPC > completes. An RPC waiting for a reply should release the mutex, but have the job change lock. So the risk would be if processing an event required obtaining the job change lock. IIUC, the current code should already suffer from that risk though, because events processed directly in the main loop thread would also need to acquire the job change lock. One approach would be to require that whi
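Daniel's throttling idea can be sketched as a small state machine around the monitor's FD watch. The cap, the counter, and the type names here are illustrative; virEventUpdateHandle() is the libvirt API a full implementation would call to change the watch's event mask. As Prerna notes, masking the FD also defers RPC replies arriving on the same socket, which is exactly the deadlock risk under discussion.

```c
#include <assert.h>
#include <stdbool.h>

enum { QUEUE_MAX = 4 };           /* illustrative cap, not from the series */

typedef struct Monitor {
    int queued;                   /* events waiting for the per-VM lock */
    bool watch_enabled;           /* mirrors the FD watch's event mask */
} Monitor;

/* Called when an event has been parsed off the wire and queued. */
static void monitorEventQueued(Monitor *mon)
{
    mon->queued++;
    if (mon->queued >= QUEUE_MAX && mon->watch_enabled) {
        /* real code: virEventUpdateHandle(watch, 0) so the main loop
           stops polling the monitor socket for readability */
        mon->watch_enabled = false;
    }
}

/* Called after a handler ran with the VM lock held and the event
   was dequeued. */
static void monitorEventProcessed(Monitor *mon)
{
    if (mon->queued > 0)
        mon->queued--;
    if (mon->queued < QUEUE_MAX && !mon->watch_enabled) {
        /* real code: virEventUpdateHandle(watch,
           VIR_EVENT_HANDLE_READABLE | VIR_EVENT_HANDLE_ERROR | ...) */
        mon->watch_enabled = true;
    }
}
```

This bounds per-VM memory at QUEUE_MAX pending events, at the cost that a full queue also stops RPC replies being read from the same socket until at least one event drains.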
Re: [libvirt] [RFC 0/8] Implement async QEMU event handling in libvirtd.
Thanks for your reply Daniel. I am still on vacation all of this week, so I have not been able to respond. A few questions inline: On Thu, Oct 26, 2017 at 2:43 PM, Daniel P. Berrange wrote: > On Tue, Oct 24, 2017 at 10:34:53AM -0700, Prerna Saxena wrote: > > > > As noted in > > https://www.redhat.com/archives/libvir-list/2017-May/msg00016.html > > the libvirt QEMU driver handles all async events from the main loop. > > Handling each event needs the per-VM lock to make forward progress. In > > the case where an async event is received for the same VM which has an > > RPC running, the main loop is held up contending for the same lock. > > > > This impacts scalability, and should be addressed as a priority. > > > > Note that libvirt does have a 2-step deferred handling for a few event > > categories, but (1) that is insufficient, since blocking happens before > > the handler can disambiguate which events need to be posted to this > > other queue, and (2) there needs to be homogeneity. > > > > The current series builds a framework for recording and handling VM > > events. > > It initializes a per-VM event queue, and a global event queue pointing to > > events from all the VMs. Event handling is staggered in 2 stages: > > - When an event is received, it is enqueued in the per-VM queue as well > > as the global queue. > > - The global queue is built into the QEMU driver as a threadpool > > (currently with a single thread). > > - Enqueuing of a new event triggers the global event worker thread, which > > then attempts to take the lock for this event's VM. > > - If the lock is available, the event worker runs the function handling > > this event type. Once done, it dequeues this event from the global > > as well as per-VM queues. > > - If the lock is unavailable (i.e. taken by an RPC thread), the event worker > > thread leaves this as-is and picks up the next event. > > - Once the RPC thread completes, it looks for events pertaining to the > > VM in the per-VM event queue.
It then processes the events serially > > (holding the VM lock) until there are no more events remaining for > > this VM. At this point, the per-VM lock is relinquished. > > One of the nice aspects of processing the QEMU events in the main event > loop is that handling of them is self-throttling, i.e. if one QEMU process > goes mental and issues lots of events, we'll spend a lot of time processing > them all, but our memory usage is still bounded. > > If we take events from the VM and put them on a queue that is processed > asynchronously, and the event processing thread gets stuck for some > reason, then libvirt will end up queuing an unbounded number of events. > This could cause considerable memory usage in libvirt. This could be > exploited by a malicious VM to harm libvirt, e.g. a malicious QEMU could > stop responding to monitor RPC calls, which would tie up the RPC threads > and handling of RPC calls. This would in turn prevent events being > processed due to being unable to acquire the state lock. Now the VM > can flood libvirtd with events which will all be read off the wire > and queued in memory, potentially forever. So libvirt memory usage > will balloon to an arbitrary level. > > Now, the current code isn't great when faced with a malicious QEMU, > because with the same approach QEMU can cause libvirtd's main event > loop to stall, as you found. This feels less bad than unbounded > memory usage though - if libvirt uses lots of memory, this will > push other apps on the host into swap, and/or trigger the OOM > killer. > > So I agree that we need to make event processing asynchronous from > the main loop. When doing that though, I think we need to put an > upper limit on the number of events we're willing to queue from a > VM. When we hit that limit, we should update the monitor event loop > watch so that we stop reading further events, until existing events > have been processed. > > We can update the watch at the monitor socket level.
I.e., if we have hit threshold limits reading events off the monitor socket, we ignore this socket FD going forward. But this also means that we will miss any replies coming off the same socket: legitimate RPC replies will get ignored too. And in this case we deadlock, since event notifications will not be processed until the ongoing RPC completes. > The other attractive thing about the way events currently work is that > it also automatically avoids lock acquisition priority problems > wrt incoming RPC calls. I.e., we are guaranteed that once the thread > workers have finished all currently queued RPC calls, we will be able > to acquire the VM lock to process the event. All further RPC calls > on the wire won't be read off the wire until events have been > processed. > > With events being processed asynchronously from RPC calls, there is > a risk of starvation where the event loop thread constantly loses > the race to acquire the VM lock vs other incoming RPC calls.
Re: [libvirt] [[RFC] 0/8] Implement async QEMU event handling in libvirtd.
On Tue, Oct 24, 2017 at 10:34:53AM -0700, Prerna Saxena wrote: > > As noted in > https://www.redhat.com/archives/libvir-list/2017-May/msg00016.html > the libvirt QEMU driver handles all async events from the main loop. > Each event handler needs the per-VM lock to make forward progress. In > the case where an async event is received for a VM which has an > RPC running, the main loop is held up contending for the same lock. > > This impacts scalability, and should be addressed with priority. > > Note that libvirt does have a 2-step deferred handling for a few event > categories, but (1) that is insufficient, since blocking happens before > the handler can disambiguate which events need to be posted to this > other queue, and (2) there needs to be homogeneity. > > The current series builds a framework for recording and handling VM > events. > It initializes a per-VM event queue, and a global event queue pointing to > events from all the VMs. Event handling is staggered in 2 stages: > - When an event is received, it is enqueued in the per-VM queue as well > as the global queue. > - The global queue is built into the QEMU driver as a threadpool > (currently with a single thread). > - Enqueuing a new event triggers the global event worker thread, which > then attempts to take the lock for this event's VM. > - If the lock is available, the event worker runs the function handling > this event type. Once done, it dequeues this event from the global > as well as per-VM queues. > - If the lock is unavailable (i.e. taken by an RPC thread), the event worker > thread leaves the event as-is and picks up the next event. > - Once the RPC thread completes, it looks for events pertaining to the > VM in the per-VM event queue. It then processes the events serially > (holding the VM lock) until there are no more events remaining for > this VM. At this point, the per-VM lock is relinquished.
One of the nice aspects of processing the QEMU events in the main event loop is that handling them is self-throttling. I.e. if one QEMU process goes mental and issues lots of events, we'll spend a lot of time processing them all, but our memory usage is still bounded. If we take events from the VM and put them on a queue that is processed asynchronously, and the event processing thread gets stuck for some reason, then libvirt will end up queuing an unbounded number of events. This could cause considerable memory usage in libvirt. This could be exploited by a malicious VM to harm libvirt. E.g. a malicious QEMU could stop responding to monitor RPC calls, which would tie up the RPC threads and handling of RPC calls. This would in turn prevent events from being processed, due to being unable to acquire the state lock. Now the VM can flood libvirtd with events, which will all be read off the wire and queued in memory, potentially forever. So libvirt memory usage will balloon to an arbitrary level. Now, the current code isn't great when faced with a malicious QEMU, because with the same approach QEMU can cause the libvirtd main event loop to stall, as you found. This feels less bad than unbounded memory usage though - if libvirt uses lots of memory, this will push other apps on the host into swap, and/or trigger the OOM killer. So I agree that we need to make event processing asynchronous from the main loop. When doing that though, I think we need to put an upper limit on the number of events we're willing to queue from a VM. When we hit that limit, we should update the monitor event loop watch so that we stop reading further events, until existing events have been processed. The other attractive thing about the way events currently work is that it also automatically avoids lock acquisition priority problems wrt incoming RPC calls. I.e., we are guaranteed that once the thread workers have finished all currently queued RPC calls, we will be able to acquire the VM lock to process the event.
All further RPC calls on the wire won't be read off the wire until events have been processed. With events being processed asynchronously from RPC calls, there is a risk of starvation where the event loop thread constantly loses the race to acquire the VM lock vs other incoming RPC calls. I guess this is one reason why you chose to process pending events at the end of the RPC call processing. > Known issues/ caveats: > - RPC handling time will become non-deterministic. > - An event will only be "notified" to a client once the RPC for the same VM > completes. Yeah, these two scenarios are not very nice IMHO. I'm also pretty wary of having 2 completely different places in which events are processed. I.e. some events are processed by the dedicated event thread, while other events are processed immediately after an RPC call. When you have these kinds of distinct code paths it tends to lead to bugs, because one code path will be taken most of the time, and the other code path only taken in unusual circumstances (and is thus rarely tested and liable to bugs).
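The back-pressure Daniel suggests above - stop reading the monitor socket once too many events are queued, and re-arm the watch when the queue drains - can be sketched as a watermark toggle. All names and the threshold value here are invented for illustration; in libvirt the actual watch update would go through something like virEventUpdateHandle(), not a bool field:

```c
#include <stdbool.h>

enum { EVENT_QUEUE_LIMIT = 64 };   /* assumed per-VM cap, not a libvirt value */

typedef struct {
    unsigned int nqueued;   /* events read off the wire but not yet processed */
    bool watchEnabled;      /* are we still watching the monitor FD for reads? */
} monitorState;

/* Called when an event is read off the monitor socket: once the queue
 * reaches the limit, stop watching the FD so no further events are read. */
static void
monitorEventQueued(monitorState *mon)
{
    mon->nqueued++;
    if (mon->nqueued >= EVENT_QUEUE_LIMIT && mon->watchEnabled)
        mon->watchEnabled = false;
}

/* Called after the worker processes one event: once below the limit,
 * re-arm the watch so reading resumes. */
static void
monitorEventProcessed(monitorState *mon)
{
    mon->nqueued--;
    if (mon->nqueued < EVENT_QUEUE_LIMIT && !mon->watchEnabled)
        mon->watchEnabled = true;
}
```

Note this is exactly the scheme Prerna pushes back on below: if replies share the socket with events, disabling the watch also blocks legitimate RPC replies.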
Re: [libvirt] [[RFC] 0/8] Implement async QEMU event handling in libvirtd.
On Thu, Oct 26, 2017 at 10:21:17 +0530, Prerna wrote: > On Wed, Oct 25, 2017 at 4:12 PM, Jiri Denemark wrote: > > On Tue, Oct 24, 2017 at 10:34:53 -0700, Prerna Saxena wrote: [...] > > > Patch Series status: > > > Strictly RFC only. No compilation issues. I have not had a chance to > > > (stress) test it after rebasing to latest master. > > > Note that documentation and test coverage are TBD, since a few open > > > points remain. > > > > > > Known issues/ caveats: > > > - RPC handling time will become non-deterministic. > > > - An event will only be "notified" to a client once the RPC for the same VM > > completes. > > > - Needs careful consideration in all cases where a QMP event is used to > > > "signal" an RPC thread, else it will deadlock. > > > > This last issue is actually a show stopper here. We need to make sure > > QMP events are processed while a job is still active on the same domain. > > Otherwise things like block jobs and migration, which are long-running > > jobs driven by events, will break. > > > > Jirka > > > Completely agree, which is why I have explicitly mentioned this. > However, I do not completely follow why it needs to be this way. Can the > block job APIs between QEMU <--> libvirt be fixed so that such behaviour is > avoided? Not really. Events from qemu are a big improvement over the times when we were polling for the state of jobs. Additionally, migration with storage in libvirt uses blockjobs, which are asynchronous in qemu but which libvirt needs to wait for synchronously. Since blockjobs in qemu need to be asynchronous (they take a long time and libvirt needs to be able to use the monitor meanwhile), we are required to handle incoming events while a libvirt API is active. -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
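The point made above - that long-running jobs are driven by events - can be illustrated with a minimal sketch: the RPC thread blocks on a condition that only an event handler can signal, so if events are deferred until the RPC returns, the wait can never be satisfied. All names here are invented for illustration and simplify libvirt's actual job-condition machinery:

```c
#include <pthread.h>
#include <stdbool.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool completed;        /* set when the completion event arrives */
} blockJobSync;

/* Runs when the QMP event (e.g. BLOCK_JOB_COMPLETED) is handled. Under a
 * scheme that defers events until the RPC ends, this never gets a chance
 * to run while blockJobWait() below is still blocking. */
static void
blockJobEventHandler(blockJobSync *job)
{
    pthread_mutex_lock(&job->lock);
    job->completed = true;
    pthread_cond_signal(&job->cond);
    pthread_mutex_unlock(&job->lock);
}

/* Runs in the RPC thread driving the job: it blocks until the event
 * handler fires, so the handler must be able to run mid-API. */
static void
blockJobWait(blockJobSync *job)
{
    pthread_mutex_lock(&job->lock);
    while (!job->completed)
        pthread_cond_wait(&job->cond, &job->lock);
    pthread_mutex_unlock(&job->lock);
}
```

This is why Jirka calls the deferral a show stopper: the event handler and the waiting API must be able to interleave.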
Re: [libvirt] [[RFC] 0/8] Implement async QEMU event handling in libvirtd.
On Wed, Oct 25, 2017 at 4:12 PM, Jiri Denemark wrote: > On Tue, Oct 24, 2017 at 10:34:53 -0700, Prerna Saxena wrote: > > > > As noted in > > https://www.redhat.com/archives/libvir-list/2017-May/msg00016.html > > the libvirt QEMU driver handles all async events from the main loop. > > Each event handler needs the per-VM lock to make forward progress. In > > the case where an async event is received for a VM which has an > > RPC running, the main loop is held up contending for the same lock. > > > > This impacts scalability, and should be addressed with priority. > > > > Note that libvirt does have a 2-step deferred handling for a few event > > categories, but (1) that is insufficient, since blocking happens before > > the handler can disambiguate which events need to be posted to this > > other queue, and (2) there needs to be homogeneity. > > > > The current series builds a framework for recording and handling VM > > events. > > It initializes a per-VM event queue, and a global event queue pointing to > > events from all the VMs. Event handling is staggered in 2 stages: > > - When an event is received, it is enqueued in the per-VM queue as well > > as the global queue. > > - The global queue is built into the QEMU driver as a threadpool > > (currently with a single thread). > > - Enqueuing a new event triggers the global event worker thread, which > > then attempts to take the lock for this event's VM. > > - If the lock is available, the event worker runs the function handling > > this event type. Once done, it dequeues this event from the global > > as well as per-VM queues. > > - If the lock is unavailable (i.e. taken by an RPC thread), the event worker > > thread leaves the event as-is and picks up the next event. > > If I get it right, the event is either processed immediately, when its VM > object is unlocked, or it has to wait until the current job running on > the VM object finishes, even though the lock may be released before that. > Correct?
If so, this needs to be addressed. In most cases, the lock is released just before we end the API. However, this is a small change that can be made. > > - Once the RPC thread completes, it looks for events pertaining to the > > VM in the per-VM event queue. It then processes the events serially > > (holding the VM lock) until there are no more events remaining for > > this VM. At this point, the per-VM lock is relinquished. > > > > Patch Series status: > > Strictly RFC only. No compilation issues. I have not had a chance to > > (stress) test it after rebasing to latest master. > > Note that documentation and test coverage are TBD, since a few open > > points remain. > > > > Known issues/ caveats: > > - RPC handling time will become non-deterministic. > > - An event will only be "notified" to a client once the RPC for the same VM > completes. > > - Needs careful consideration in all cases where a QMP event is used to > > "signal" an RPC thread, else it will deadlock. > > This last issue is actually a show stopper here. We need to make sure > QMP events are processed while a job is still active on the same domain. > Otherwise things like block jobs and migration, which are long-running > jobs driven by events, will break. > > Jirka > Completely agree, which is why I have explicitly mentioned this. However, I do not completely follow why it needs to be this way. Can the block job APIs between QEMU <--> libvirt be fixed so that such behaviour is avoided? Regards, Prerna
Re: [libvirt] [[RFC] 0/8] Implement async QEMU event handling in libvirtd.
On Tue, Oct 24, 2017 at 10:34:53 -0700, Prerna Saxena wrote: > > As noted in > https://www.redhat.com/archives/libvir-list/2017-May/msg00016.html > the libvirt QEMU driver handles all async events from the main loop. > Each event handler needs the per-VM lock to make forward progress. In > the case where an async event is received for a VM which has an > RPC running, the main loop is held up contending for the same lock. > > This impacts scalability, and should be addressed with priority. > > Note that libvirt does have a 2-step deferred handling for a few event > categories, but (1) that is insufficient, since blocking happens before > the handler can disambiguate which events need to be posted to this > other queue, and (2) there needs to be homogeneity. > > The current series builds a framework for recording and handling VM > events. > It initializes a per-VM event queue, and a global event queue pointing to > events from all the VMs. Event handling is staggered in 2 stages: > - When an event is received, it is enqueued in the per-VM queue as well > as the global queue. > - The global queue is built into the QEMU driver as a threadpool > (currently with a single thread). > - Enqueuing a new event triggers the global event worker thread, which > then attempts to take the lock for this event's VM. > - If the lock is available, the event worker runs the function handling > this event type. Once done, it dequeues this event from the global > as well as per-VM queues. > - If the lock is unavailable (i.e. taken by an RPC thread), the event worker > thread leaves the event as-is and picks up the next event. If I get it right, the event is either processed immediately, when its VM object is unlocked, or it has to wait until the current job running on the VM object finishes, even though the lock may be released before that. Correct? If so, this needs to be addressed. > - Once the RPC thread completes, it looks for events pertaining to the > VM in the per-VM event queue.
It then processes the events serially > (holding the VM lock) until there are no more events remaining for > this VM. At this point, the per-VM lock is relinquished. > > Patch Series status: > Strictly RFC only. No compilation issues. I have not had a chance to > (stress) test it after rebasing to latest master. > Note that documentation and test coverage are TBD, since a few open > points remain. > > Known issues/ caveats: > - RPC handling time will become non-deterministic. > - An event will only be "notified" to a client once the RPC for the same VM > completes. > - Needs careful consideration in all cases where a QMP event is used to > "signal" an RPC thread, else it will deadlock. This last issue is actually a show stopper here. We need to make sure QMP events are processed while a job is still active on the same domain. Otherwise things like block jobs and migration, which are long-running jobs driven by events, will break. Jirka
[libvirt] [[RFC] 0/8] Implement async QEMU event handling in libvirtd.
As noted in https://www.redhat.com/archives/libvir-list/2017-May/msg00016.html the libvirt QEMU driver handles all async events from the main loop. Each event handler needs the per-VM lock to make forward progress. In the case where an async event is received for a VM which has an RPC running, the main loop is held up contending for the same lock. This impacts scalability, and should be addressed with priority. Note that libvirt does have a 2-step deferred handling for a few event categories, but (1) that is insufficient, since blocking happens before the handler can disambiguate which events need to be posted to this other queue, and (2) there needs to be homogeneity. The current series builds a framework for recording and handling VM events. It initializes a per-VM event queue, and a global event queue pointing to events from all the VMs. Event handling is staggered in 2 stages: - When an event is received, it is enqueued in the per-VM queue as well as the global queue. - The global queue is built into the QEMU driver as a threadpool (currently with a single thread). - Enqueuing a new event triggers the global event worker thread, which then attempts to take the lock for this event's VM. - If the lock is available, the event worker runs the function handling this event type. Once done, it dequeues this event from the global as well as per-VM queues. - If the lock is unavailable (i.e. taken by an RPC thread), the event worker thread leaves the event as-is and picks up the next event. - Once the RPC thread completes, it looks for events pertaining to the VM in the per-VM event queue. It then processes the events serially (holding the VM lock) until there are no more events remaining for this VM. At this point, the per-VM lock is relinquished. Patch Series status: Strictly RFC only. No compilation issues. I have not had a chance to (stress) test it after rebasing to latest master. Note that documentation and test coverage are TBD, since a few open points remain.
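The two-stage scheme above - enqueue first, then have a worker that skips VMs whose lock is held by an RPC thread - can be sketched roughly as follows. All names (vmEvent, eventQueue, eventWorkerStep) are invented for illustration and are not the actual API from the series; the sketch uses a single LIFO list where the series uses per-VM plus global queues:

```c
#include <pthread.h>
#include <stddef.h>

typedef struct vmEvent vmEvent;
struct vmEvent {
    int type;                  /* event type, e.g. a QMP event ID */
    pthread_mutex_t *vmLock;   /* the per-VM lock gating this event */
    vmEvent *next;
    int handled;
};

typedef struct {
    vmEvent *head;
} eventQueue;

/* Stage 1: record the event on the queue (LIFO push, for brevity). */
static void
eventEnqueue(eventQueue *q, vmEvent *ev)
{
    ev->next = q->head;
    q->head = ev;
}

/* Stage 2: the worker tries each queued event. If the VM lock is taken
 * by an RPC thread, the event stays queued and the worker moves on;
 * otherwise the handler runs and the event is dequeued. */
static int
eventWorkerStep(eventQueue *q)
{
    int processed = 0;
    vmEvent **link = &q->head;

    while (*link) {
        vmEvent *ev = *link;
        if (pthread_mutex_trylock(ev->vmLock) == 0) {
            ev->handled = 1;                 /* per-type handler would run here */
            pthread_mutex_unlock(ev->vmLock);
            *link = ev->next;                /* dequeue once processed */
            processed++;
        } else {
            link = &ev->next;                /* skip; an RPC thread owns the VM */
        }
    }
    return processed;
}
```

In the series, the "skip" branch is what makes the RPC thread responsible for draining the per-VM queue when it finishes.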
Known issues/ caveats: - RPC handling time will become non-deterministic. - An event will only be "notified" to a client once the RPC for the same VM completes. - Needs careful consideration in all cases where a QMP event is used to "signal" an RPC thread, else it will deadlock. I will be happy to drive more discussion in the community and completely implement it. Prerna Saxena (8): Introduce virObjectTrylock() QEMU Event handling: Introduce async event helpers in qemu_event.[ch] Setup global and per-VM event queues. Also initialize per-VM queues when libvirt reconnects to an existing VM. Events: Allow monitor to "enqueue" events to a queue. Also introduce a framework of handlers for each event type, that can be called when the handler is running an event. Events: Plumb event handling calls before a domain's APIs complete. Code refactor: Move helper functions of doCoreDump*, syncNicRxFilter*, and qemuOpenFile* to qemu_process.[ch] Fold back the 2-stage event implementation for a few events: Watchdog, Monitor EOF, Serial changed, Guest panic, Nic RX filter changed .. into single level. Initialize the per-VM event queues in context of domain init. src/Makefile.am |1 + src/conf/domain_conf.h |3 + src/libvirt_private.syms |1 + src/qemu/qemu_conf.h |4 + src/qemu/qemu_driver.c | 1710 +++ src/qemu/qemu_event.c| 317 +++ src/qemu/qemu_event.h| 231 + src/qemu/qemu_monitor.c | 592 ++-- src/qemu/qemu_monitor.h | 80 +- src/qemu/qemu_monitor_json.c | 291 +++--- src/qemu/qemu_process.c | 2031 ++ src/qemu/qemu_process.h | 88 ++ src/util/virobject.c | 26 + src/util/virobject.h |4 + src/util/virthread.c |5 + src/util/virthread.h |1 + tests/qemumonitortestutils.c |2 +- 17 files changed, 3411 insertions(+), 1976 deletions(-) create mode 100644 src/qemu/qemu_event.c create mode 100644 src/qemu/qemu_event.h -- 2.9.5
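The virObjectTrylock() primitive that patch 1 introduces is the building block for the worker's "skip busy VMs" behaviour. A simplified sketch of the idea - not the actual libvirt implementation, which wraps virMutex inside virObjectLockable - is a thin non-blocking variant of the lock call:

```c
#include <pthread.h>

/* Simplified stand-in for virObjectLockable; the real object also
 * carries reference counting and class information. */
typedef struct {
    pthread_mutex_t lock;
} virObjectLockableSketch;

/* Non-blocking lock attempt: returns 0 on success, or an error value
 * (e.g. EBUSY) when another thread holds the lock, mirroring
 * pthread_mutex_trylock() semantics. The caller must not block here,
 * which is what lets the event worker move on to the next event. */
static int
virObjectTrylockSketch(virObjectLockableSketch *obj)
{
    return pthread_mutex_trylock(&obj->lock);
}
```

The design choice matters: a blocking virObjectLock() in the worker would recreate exactly the main-loop stall this series sets out to remove.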