On Thu, Jan 23, 2025 at 09:47:24AM +0800, Jason Wang wrote: > On Thu, Jan 23, 2025 at 6:34 AM Boyer, Andrew <[email protected]> wrote: > > > > > > > On Jan 22, 2025, at 5:25 PM, Michael S. Tsirkin <[email protected]> wrote: > > > > > > Caution: This message originated from an External Source. Use proper > > > caution when opening attachments, clicking links, or responding. > > > > > > > > > On Wed, Jan 22, 2025 at 10:14:52PM +0000, Boyer, Andrew wrote: > > >> > > >>> On Jan 22, 2025, at 5:07 PM, Michael S. Tsirkin <[email protected]> wrote: > > >>> > > >>> Caution: This message originated from an External Source. Use proper > > >>> caution when opening attachments, clicking links, or responding. > > >>> > > >>> > > >>> On Wed, Jan 22, 2025 at 06:33:04PM +0100, Christian Borntraeger wrote: > > >>>> Am 22.01.25 um 15:44 schrieb Boyer, Andrew: > > >>>> [...] > > >>>> > > >>>>>>>> --- a/drivers/block/virtio_blk.c > > >>>>>>>> +++ b/drivers/block/virtio_blk.c > > >>>>>>>> @@ -379,14 +379,10 @@ static void virtio_commit_rqs(struct > > >>>>>>>> blk_mq_hw_ctx *hctx) > > >>>>>>>> { > > >>>>>>>> struct virtio_blk *vblk = hctx->queue->queuedata; > > >>>>>>>> struct virtio_blk_vq *vq = &vblk->vqs[hctx->queue_num]; > > >>>>>>>> - bool kick; > > >>>>>>>> spin_lock_irq(&vq->lock); > > >>>>>>>> - kick = virtqueue_kick_prepare(vq->vq); > > >>>>>>>> + virtqueue_kick(vq->vq); > > >>>>>>>> spin_unlock_irq(&vq->lock); > > >>>>>>>> - > > >>>>>>>> - if (kick) > > >>>>>>>> - virtqueue_notify(vq->vq); > > >>>>>>>> } > > >>>>>>> > > >>>>>>> I would assume this will be a performance nightmare for normal IO. > > >>>>>> > > >>>>> > > >>>>> Hello Michael and Christian and Jason, > > >>>>> Thank you for taking a look. > > >>>>> > > >>>>> Is the performance concern that the vmexit might lead to the > > >>>>> underlying virtual storage stack doing the work immediately? Any > > >>>>> other job posting to the same queue would presumably be blocked on a > > >>>>> vmexit when it goes to attempt its own notification. That would be > > >>>>> almost the same as having the other job block on a lock during the > > >>>>> operation, although I guess if you are skipping notifications somehow > > >>>>> it would look different. > > >>>> > > >>>> The performance concern is that you hold a lock and then exit. Exits > > >>>> are expensive and can schedule so you will increase the lock holding > > >>>> time significantly. This is begging for lock holder preemption. > > >>>> Really, dont do it. > > >>> > > >>> > > >>> The issue is with hardware that wants a copy of an index sent to > > >>> it in a notification. Now, you have a race: > > >>> > > >>> thread 1: > > >>> > > >>> index = 1 > > >>> -> -> send 1 to hardware > > >>> > > >>> > > >>> thread 2: > > >>> > > >>> index = 2 > > >>> -> send 2 to hardware > > >>> > > >>> the spec unfortunately does not say whether that is legal. > > >>> > > >>> As far as I could tell, the device can easily use the > > >>> wrap counter inside the notification to detect this > > >>> and simply discard the "1" notification. > > >>> > > >>> > > >>> If not, I'd like to understand why. > > >> > > >> "Easily"? > > >> > > >> This is a hardware doorbell block used for many different interfaces and > > >> devices. When the notification write comes through, the doorbell block > > >> updates the queue state and schedules the queue for work. If a second > > >> notification comes in and overwrites that update before the queue is > > >> able to run (going backwards but not wrapping), we'll have no way of > > >> detecting it. > > >> > > >> -Andrew > > >> > > > > > > Does not this work? > > > > > > notification includes two values: > > > > > > 1. offset > > > 2. wrap_counter > > > > > > if ((offset2 < offset1 && wrap_counter2 == wrap_counter1) || > > > offset1 > offset1 && wrap_counter2 != wrap_counter1)) { > > > printf("going backwards, discard offset2"); > > > } > > > > > > > No, Michael, this does not work in our programmable hardware device, > > because it is not software. > > > > -Andrew > > > > Should we send a patch to disable this feature and cc stable first > then consider how to fix this? > > Thanks
Given this has been out there for a long time, I do not see how this will contribute anything useful. -- MST
