Am 10.03.2023 um 14:05 hat Fiona Ebner geschrieben: > Am 09.03.23 um 18:46 schrieb Kevin Wolf: > > Am 09.03.2023 um 14:59 hat Paolo Bonzini geschrieben: > >> On 3/9/23 13:31, Hanna Czenczek wrote: > >>> On 09.03.23 13:08, Paolo Bonzini wrote: > >>>> On Thu, Mar 9, 2023 at 1:05 PM Paolo Bonzini <pbonz...@redhat.com> wrote: > >>>>> I think having to do this is problematic, because the blk_drain should > >>>>> leave no pending operation. > >>>>> > >>>>> Here it seems okay because you do it in a controlled situation, but the > >>>>> main thread can also issue blk_drain(), or worse bdrv_drained_begin(), > >>>>> and there would be pending I/O operations when it returns. > >>> > >>> Not really. We would stop in the middle of a trim that processes a list > >>> of discard requests. So I see it more like stopping in the middle of > >>> anything that processes guest requests. Once drain ends, we continue > >>> processing them, and that’s not exactly pending I/O. > >>> > >>> There is a pending object in s->bus->dma->aiocb on the IDE side, so > >>> there is a pending DMA operation, but naïvely, I don’t see that as a > >>> problem. > >> > >> What about the bdrv_drain_all() when a VM stops, would the guest continue > >> to > >> access memory and disks after bdrv_drain() return? > > > > That one shouldn't be a problem because the devices are stopped before > > the backends. > > > >> Migration could also be a problem, because the partial TRIM would not be > >> recorded in the s->bus->error_status field of IDEState (no surprise there, > >> it's not an error). Also, errors happening after bdrv_drain() might not be > >> migrated correctly. > > > > Yes, I think it would be good to have the I/O operation fully completed > > on the IDE level rather than just in the block layer. > > > >>> Or the issue is generally that IDE uses dma_* functions, which might > >>> cause I/O functions to be run from new BHs (I guess through > >>> reschedule_dma()?). > > > > None of those complicated scenarios actually. The problem solved by the > > request queuing is simply that nothing else stops the guest from > > submitting new requests to drained nodes if the CPUs are still running. > > > > Drain uses aio_disable_external() to disable processing of external > > events, in particular the ioeventfd used by virtio-blk and virtio-scsi. > > But IDE submits requests through MMIO or port I/O, i.e. the vcpu thread > > exits to userspace and calls directly into the IDE code, so it's > > completely unaffected by aio_disable_external(). > > > >> Ah, you mean that you can have pending I/O operations while blk->in_flight > >> is zero? That would be a problem indeed. We already have BlockDevOps for > >> ide-cd and ide-hd, should we add a .drained_poll callback there? > > > > To be more precise, you suggested in the call that .drained_poll should > > return that IDE is still busy while aiocb != NULL. Without having looked > > at the code in detail yet, that seems to make sense to me. And maybe > > even the blk_inc/dec_in_flight() pair can then go completely away > > because IDE takes care of its drain state itself then. > > > > I assume the addition of drained_poll is meant to be orthogonal to the > fix of the deadlock? At least I can't see how it would help there?
You're right, it doesn't. Basically we're running into the old problem again that draining is overloaded with two different meanings: I want exclusive access to the backend or I want to wait for all my requests to complete. IDE or more generally blk_drain() wants the latter, but queuing is done for implementing the former. > If we have the assumptions: > 1. The TRIM operation should be completed on the IDE level before > draining ends. > 2. Block layer requests issued after draining has begun are queued. > > To me, the conclusion seems to be: > Issue all block layer requests belonging to the IDE TRIM operation up > front. > > The other alternative I see is to break assumption 2, introduce a way > to not queue certain requests while drained, and use it for the > recursive requests issued by ide_issue_trim_cb. But not the initial > one, if that would defeat the purpose of request queuing. Of course > this can't be done if QEMU relies on the assumption in other places > already. I feel like this should be allowed because if anyone has exclusive access in this scenario, it's IDE, so it should be able to bypass the queuing. Of course, the queuing is still needed if someone else drained the backend, so we can't just make TRIM bypass it in general. And if you make it conditional on IDE being in blk_drain(), it already starts to become ugly again... So maybe the while loop is unavoidable. Hmm... But could ide_cancel_dma_sync() just directly use AIO_WAIT_WHILE(s->bus->dma->aiocb) instead of using blk_drain()? Kevin