On 10/23/2012 05:26 PM, Paolo Bonzini wrote:
>>> Yes, that's the point of doing things asynchronously---you do not need
>>> to do everything within stop_machine, you can start canceling AIO as
>>> soon as the OS sends the hot-unplug request. Then you only proceed with
>>> stop_machine and freeing device memory when the first part.
>>
>> You cannot always cancel I/O (for example threaded I/O already in progress).
>
> Yep, but we try to do this anyway today and nothing changes really. The
> difference is between hotplug never completing and blocking
> synchronously, vs. hotplug never completing and not invoking the
> asynchronous callback. I.e. really no difference at all.

Not cancelling is not the same as not completing; the request will
complete on its own eventually. The question is whether the programming
model is synchronous or callback-based.

>
>>> In other words, isolate can complete asynchronously.
>>
>> Can it? I don't think so.
>>
>> Here's how I see it:
>>
>> 1. non-malicious guest stops driving device
>> 2. isolate()
>> 3. a malicious guest cannot drive the device at this point
>> 4. some kind of barrier to let the device, or drive activity from a
>>    malicious guest, wind down
>> 5. destroy()
>>
>> If you need to report the completion of step 2, it cannot be done
>> asynchronously.
>
> In hardware everything is asynchronous anyway. It will *look*
> synchronous, because if CPU#0 is stuck in a synchronous isolate(), and
> CPU#1 polls for the outcome, CPU#1 will lock on the BQL held by CPU#0.

That is fine. isolate() is expensive but it is CPU-bound; it does not
involve any I/O (unlike the barrier afterwards, which has to wait on any
I/O which we were not able to cancel).

> But our interfaces had better support asynchronicity, and indeed they
> do: after you write to the "eject" register, the "up" will show the
> device as present until after destroy is done. This can be changed to
> show the device as present only until after step 4 is done.

Let's say we want to eject the hotplug hardware itself (just as an
example). With refcounts, the callback that updates "up" will hold on
to it via refcounts. With stop_machine(), you need to cancel that
callback, or wait for it somehow, or it can arrive after the
stop_machine() and bite you.

>
>> We may also want notification after step 4 (or 5); if the device holds
>> some host resource someone may want to know that it is ready for reuse.
>
> I think guest notification should be after (4), while management
> notification should be after (5).

Yes. After (2) we can return from the eject mmio.

--
error compiling committee.c: too many arguments to function
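
For illustration only, the refcount-based ordering discussed above
(isolate synchronously, let uncancellable I/O wind down, destroy when
the last reference goes away) might be sketched roughly as follows.
Every name here (Device, device_submit, device_eject, and so on) is
hypothetical and not an existing QEMU interface; the sketch is
single-threaded and only marks the device as destroyed instead of
freeing it, so the ordering can be inspected.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device model; none of these names are real QEMU APIs. */
typedef struct Device {
    int refcount;    /* one reference for the bus, one per in-flight request */
    int inflight;    /* requests that are running and cannot be cancelled */
    bool isolated;   /* step 2 done: the guest can no longer drive the device */
    bool destroyed;  /* step 5 done (a real implementation would free here) */
} Device;

static void device_ref(Device *d)
{
    d->refcount++;
}

static void device_unref(Device *d)
{
    assert(d->refcount > 0);
    if (--d->refcount == 0) {
        /* Last user is gone, so it is now safe to destroy (step 5). */
        d->destroyed = true;
    }
}

/* Guest-initiated I/O; refused once the device has been isolated (step 3). */
static bool device_submit(Device *d)
{
    if (d->isolated) {
        return false;
    }
    device_ref(d);   /* the completion callback keeps the device alive */
    d->inflight++;
    return true;
}

/* Completion callback for a request that could not be cancelled. */
static void device_complete(Device *d)
{
    assert(d->inflight > 0);
    d->inflight--;
    device_unref(d); /* may trigger destruction if eject already happened */
}

/* Eject path: isolate synchronously (step 2), then drop the bus reference.
 * The "barrier" (step 4) is implicit: destruction waits for the refcount. */
static void device_eject(Device *d)
{
    d->isolated = true;
    device_unref(d);
}
```

With this shape, a completion that arrives after the eject simply drops
the last reference, so there is no late callback to cancel or wait for.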