On Fri, Apr 5, 2024 at 3:59 PM Jonah Palmer <jonah.pal...@oracle.com> wrote:
>
> On 4/4/24 12:33 PM, Eugenio Perez Martin wrote:
> > On Thu, Apr 4, 2024 at 4:42 PM Jonah Palmer <jonah.pal...@oracle.com> wrote:
> >>
> >> On 4/4/24 7:35 AM, Eugenio Perez Martin wrote:
> >>> On Wed, Apr 3, 2024 at 6:51 PM Jonah Palmer <jonah.pal...@oracle.com>
> >>> wrote:
> >>>>
> >>>> On 4/3/24 6:18 AM, Eugenio Perez Martin wrote:
> >>>>> On Thu, Mar 28, 2024 at 5:22 PM Jonah Palmer <jonah.pal...@oracle.com>
> >>>>> wrote:
> >>>>>>
> >>>>>> Initialize sequence variables for VirtQueue and VirtQueueElement
> >>>>>> structures. A VirtQueue's sequence variables are initialized when a
> >>>>>> VirtQueue is being created or reset. A VirtQueueElement's sequence
> >>>>>> variable is initialized when a VirtQueueElement is being initialized.
> >>>>>> These variables will be used to support the VIRTIO_F_IN_ORDER feature.
> >>>>>>
> >>>>>> A VirtQueue's used_seq_idx represents the next expected index in a
> >>>>>> sequence of VirtQueueElements to be processed (put on the used ring).
> >>>>>> The next VirtQueueElement added to the used ring must match this
> >>>>>> sequence number before additional elements can be safely added to the
> >>>>>> used ring. It's also particularly useful for finding the number of
> >>>>>> new elements added to the used ring.
> >>>>>>
> >>>>>> A VirtQueue's current_seq_idx represents the current sequence index.
> >>>>>> This value is essentially a counter whose value is assigned to a new
> >>>>>> VirtQueueElement and then incremented. Given its uint16_t type, this
> >>>>>> sequence number can be between 0 and 65,535.
> >>>>>>
> >>>>>> A VirtQueueElement's seq_idx represents the sequence number assigned
> >>>>>> to the VirtQueueElement when it was created. This value must match
> >>>>>> the VirtQueue's used_seq_idx before the element can be put on the
> >>>>>> used ring by the device.
> >>>>>>
> >>>>>> Signed-off-by: Jonah Palmer <jonah.pal...@oracle.com>
> >>>>>> ---
> >>>>>>  hw/virtio/virtio.c         | 18 ++++++++++++++++++
> >>>>>>  include/hw/virtio/virtio.h |  1 +
> >>>>>>  2 files changed, 19 insertions(+)
> >>>>>>
> >>>>>> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> >>>>>> index fb6b4ccd83..069d96df99 100644
> >>>>>> --- a/hw/virtio/virtio.c
> >>>>>> +++ b/hw/virtio/virtio.c
> >>>>>> @@ -132,6 +132,10 @@ struct VirtQueue
> >>>>>>      uint16_t used_idx;
> >>>>>>      bool used_wrap_counter;
> >>>>>>
> >>>>>> +    /* In-Order sequence indices */
> >>>>>> +    uint16_t used_seq_idx;
> >>>>>> +    uint16_t current_seq_idx;
> >>>>>> +
> >>>>>
> >>>>> I'm having a hard time understanding the difference between these and
> >>>>> last_avail_idx and used_idx. It seems to me if we replace them
> >>>>> everything will work? What am I missing?
> >>>>>
> >>>>
> >>>> For used_seq_idx, it does work like used_idx, except the difference is
> >>>> when their values get updated, specifically for the split VQ case.
> >>>>
> >>>> As you know, for the split VQ case, the used_idx is updated during
> >>>> virtqueue_split_flush. However, imagine a batch of elements coming in
> >>>> where virtqueue_split_fill is called multiple times before
> >>>> virtqueue_split_flush. We want to make sure we write these elements to
> >>>> the used ring in order, and we'll know their order based on
> >>>> used_seq_idx.
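(To restate the mechanism of this patch for anyone following along:
roughly, current_seq_idx is a ticket dispenser at pop time and
used_seq_idx is the "now serving" counter at the used ring. A minimal
standalone sketch, with simplified stand-in types rather than the
actual QEMU structs:)

    #include <stdbool.h>
    #include <stdint.h>

    typedef struct {
        uint16_t seq_idx;          /* ticket stamped on the element at pop */
    } Elem;

    typedef struct {
        uint16_t current_seq_idx;  /* next ticket to hand out (pop side) */
        uint16_t used_seq_idx;     /* next ticket expected on the used ring */
    } VQ;

    /* pop: stamp each new element with the next sequence number */
    static void pop_assign_seq(VQ *vq, Elem *e)
    {
        e->seq_idx = vq->current_seq_idx++;  /* uint16_t, wraps at 65535 */
    }

    /* fill: an element may be written to the used ring only when its
     * ticket matches the next expected one; otherwise it must wait */
    static bool in_order_ready(const VQ *vq, const Elem *e)
    {
        return e->seq_idx == vq->used_seq_idx;
    }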
> >>>>
> >>>> Alternatively, I thought about replicating the logic for the packed
> >>>> VQ case (where this used_seq_idx isn't used), where we start looking
> >>>> at vq->used_elems[vq->used_idx] and iterate through until we find a
> >>>> used element, but I wasn't sure how to handle the case where elements
> >>>> get used (written to the used ring) and new elements get put in
> >>>> used_elems before the used_idx is updated, since this search would
> >>>> require us to always start at index vq->used_idx.
> >>>>
> >>>> For example, say, of three elements getting filled (elem0 - elem2),
> >>>> elem1 and elem0 come back first (vq->used_idx = 0):
> >>>>
> >>>> elem1 - not in-order
> >>>> elem0 - in-order, vq->used_elems[vq->used_idx + 1] (elem1) also now
> >>>>         in-order, write elem0 and elem1 to used ring, mark elements
> >>>>         as used
> >>>>
> >>>> Then elem2 comes back, but vq->used_idx is still 0, so how do we know
> >>>> to ignore the used elements at vq->used_idx (elem0) and vq->used_idx
> >>>> + 1 (elem1) and iterate to vq->used_idx + 2 (elem2)?
> >>>>
> >>>> Hmm... now that I'm thinking about it, maybe for the split VQ case we
> >>>> could continue looking through the vq->used_elems array until we find
> >>>> an unused element... but then again, how would we (1) know if the
> >>>> element is in-order and (2) know when to stop searching?
> >>>>
> >>>
> >>> Ok, I think I understand the problem now. It is aggravated if we add
> >>> chained descriptors to the mix.
> >>>
> >>> We know that the order of used descriptors must be exactly the same
> >>> as the order they were made available, leaving out in-order batching.
> >>> What if we add elements to vq->used_elems at virtqueue_pop, and then
> >>> virtqueue_push just marks them as used somehow? Two booleans (or a
> >>> flag) would do for a first iteration.
> >>>
> >>> If we go with this approach, I think used_elems should be renamed,
> >>> actually.
> >>>
> >>
> >> If I'm understanding correctly, I don't think adding newly created
> >> elements to vq->used_elems at virtqueue_pop will do much for us.
> >
> > It does help, by letting us know what descriptor id must go in each
> > position of the used ring.
> >
> > Following your example, let's say avail_idx is 10 at that moment.
> > Then, the driver makes available the three elements you mention, so:
> >
> > used_elems[10] = elem0
> > used_elems[11] = elem1
> > used_elems[12] = elem2
> >
> > Now the device uses elem1. virtqueue_push can search linearly for
> > elem->index in the used_elems[used_idx]...used_elems[avail_idx] range.
> > As the device is misbehaving, there is no need to optimize this
> > behavior. used_elems[11].index == elem->index, so we mark it as used
> > somehow. Let's say we add a boolean to VirtQueueElement.
> >
> > virtqueue_flush can scan used_elems[used_idx]...used_elems[avail_idx]
> > looking for used elements. At this moment used_elems[used_idx] is not
> > used, so it stops there.
> >
> > Now elem0 is pushed. It is the first one in the
> > used_elems[used_idx]...used_elems[avail_idx] range, so we can write it
> > to the used ring at fill. used_idx++. We use the rest of the
> > descriptors until we find one in used_elems that is not used, which is
> > used_elems[12].
> >
> > After that, virtqueue_flush is called. At its scanning, used_elems[10]
> > is used, so it writes it to the used ring. After that, used_elems[11]
> > is also used, so it is written also. used_elems[12] is not used, so it
> > stops there.
> >
> > Finally, elem2 is pushed, so used_elems[12] is written.
> > virtqueue_flush detects it, so it writes it to the guest's used ring.
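To make the linear search concrete, the marking step could look roughly
like the following. This is only a sketch: Elem/VQ are simplified
stand-ins for VirtQueueElement/VirtQueue, and the "used" boolean is the
flag proposed above, not an existing field:

    #include <stdbool.h>
    #include <stdint.h>

    typedef struct {
        unsigned int index;  /* head descriptor id */
        bool used;           /* proposed flag: completed by the device */
    } Elem;

    typedef struct {
        uint16_t used_idx;   /* free-running uint16_t; ring sizes are
                              * powers of two, so % num is safe at wrap */
        uint16_t avail_idx;
        uint16_t num;        /* ring size */
        Elem *used_elems;
    } VQ;

    /* On completion: find the element's slot between used_idx and
     * avail_idx and mark it used; nothing guest-visible happens yet.
     * A well-behaved in-order device completes at used_idx, so the
     * search normally stops on the first iteration. */
    static void mark_used(VQ *vq, unsigned int head_idx)
    {
        for (uint16_t i = vq->used_idx; i != vq->avail_idx; i++) {
            Elem *e = &vq->used_elems[i % vq->num];
            if (e->index == head_idx) {
                e->used = true;
                break;
            }
        }
    }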
> >
> > Let me know what you think. I find it simpler than declaring new
> > indexes, but I may be wrong.
> >
>
> I think I see what you're getting at, but I just have a few clarifying
> questions about your proposal here.
>
> So you're proposing to add entries to used_elems at virtqueue_pop, ok.
>
> avail_idx = 10, then the driver makes some new entries (elems)
> available in the avail ring:
>
> used_elems[10] = elem0
> used_elems[11] = elem1
> used_elems[12] = elem2
>
> At this point, avail_idx = 13, used_idx = 10.
>
> elem1 gets used first, ok.
>
> Now, if I'm understanding correctly, you're saying that in
> virtqueue_push explicitly (not virtqueue_fill/virtqueue_flush), we scan
> used_elems[used_idx] - used_elems[avail_idx] to find
> used_elems[i].index == elem->index and mark it as used, e.g.
> used_elems[i].used = true. Okay, now used_elems[11] has been marked as
> used.
>
> Now we make it to virtqueue_fill. What's the role you want
> virtqueue_fill to take here (if any)?
>
Sorry, I meant virtqueue_fill here.

> You say that after we mark this element as used, we go to
> virtqueue_flush and scan for used elements in used_elems[used_idx] -
> used_elems[avail_idx]. Of course, the first one in this range will be
> used_elems[used_idx], which is currently showing the element as
> unused, so we're done with this element for now.
>
> So, what exactly is the role of virtqueue_flush here? I'm inferring
> here that you want the virtqueue_flush role (for split VQs) to do both
> the writing to the used ring (normally done in virtqueue_fill) as well
> as updating the used_idx (normally done in virtqueue_flush). Is this
> correct?
>

I modelled this after the packed vq scenario, where everything is
updated at _flush. But yes, I think you got it totally right.

> Next, elem0 gets used second.
>
> Again, in virtqueue_push we scan used_elems[used_idx] -
> used_elems[avail_idx] to find used_elems[i].index == elem->index and
> mark it as used. Okay, used_elems[10] has been marked as used.
>
> Then you say, "It is the first one in the used_elems[used_idx] -
> used_elems[avail_idx] range, so we can write it to the used ring at
> fill. used_idx++. We use the rest of the descriptors until we find one
> in used_elems that is not used, which is used_elems[12]."
>
> This, to me, sounds like "in virtqueue_fill, when we find an in-order
> element (e.g. used_elems[used_idx].index == elem->index), write it to
> the used ring AND increment the used_idx. Keep writing elements to the
> used ring if used_elems[used_idx].used == true and, for each element
> being written, increment used_idx."
>
> This is a bit confusing to me since next you say "After that,
> virtqueue_flush is called. At its scanning, used_elems[10] is used, so
> it writes it to the used ring. After that, used_elems[11] is also
> used, so it is written also. used_elems[12] is not used, so it stops
> there."
>
> This sounds very similar to what you proposed for virtqueue_fill,
> except it looks like you're also saying to do this in virtqueue_flush,
> hence my confusion.
>
> If you wouldn't mind, could you clarify the roles of virtqueue_fill
> and virtqueue_flush here for me? Thanks :)!
>

I see how they're confusing if following the split vq way, sorry!

* _fill: Only update used_elems (or equivalent).
* _flush: Write the information to the used ring or descriptor ring.

A rough sketch of this split is below.

> >
> > This makes it difficult to actually batch used descriptors. My
> > proposal is to address it in another series, by delegating it to the
> > caller and recovering proper usage of the virtqueue_push(idx)
> > parameter. The caller can specify either to batch as usual, or to
> > delegate the automatic (and potentially inefficient) ordering I'm
> > proposing here.
> >
>
> Just to be clear, for this series, you'd like me to implement a
> solution that does *not* consider the case where virtqueue_fill is
> called multiple times before virtqueue_flush (and to put a solution
> for this in a separate series)?
>

No, it is supported. It is just not very efficient because of the
linear search. For it to be supported properly, the caller should
indicate the virtqueue_fill idx properly. But that requires
modifications to all devices, so I'm proposing to do it on top.

> Are we not concerned that we might shoot ourselves in the foot here by
> implementing a process that may not work well for a batching solution,
> especially when we have an almost-working solution for batching and
> non-batching cases?
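Here is the rough sketch of that split, for concreteness. Again,
illustrative only: Elem/VQ are simplified stand-ins for
VirtQueueElement/VirtQueue, and write_used_ring() is a placeholder for
the real split/packed used-ring write, not a QEMU function:

    #include <stdbool.h>
    #include <stdint.h>

    typedef struct {
        unsigned int index;
        bool used;            /* set by _fill, consumed by _flush */
    } Elem;

    typedef struct {
        uint16_t used_idx;
        uint16_t avail_idx;
        uint16_t num;         /* power-of-two ring size */
        Elem *used_elems;
    } VQ;

    /* Placeholder: the real version writes the element's id/len into
     * the guest-visible used ring (split) or descriptor ring (packed) */
    static void write_used_ring(VQ *vq, Elem *e)
    {
        (void)vq;
        (void)e;
    }

    /* _fill: only mark the completed element in used_elems (the linear
     * search from the earlier sketch); no guest-visible write here */
    static void in_order_fill(VQ *vq, unsigned int head_idx)
    {
        for (uint16_t i = vq->used_idx; i != vq->avail_idx; i++) {
            Elem *e = &vq->used_elems[i % vq->num];
            if (e->index == head_idx) {
                e->used = true;
                break;
            }
        }
    }

    /* _flush: publish the longest completed prefix, advancing used_idx;
     * stop at the first element the device has not used yet */
    static void in_order_flush(VQ *vq)
    {
        while (vq->used_idx != vq->avail_idx &&
               vq->used_elems[vq->used_idx % vq->num].used) {
            Elem *e = &vq->used_elems[vq->used_idx % vq->num];
            write_used_ring(vq, e);
            e->used = false;  /* slot can be reused by a new avail entry */
            vq->used_idx++;
        }
    }

Batching then falls out of _flush naturally: however many fills happened
before it, _flush publishes the whole completed prefix of used_elems in
one pass.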
>
> >> We could just keep adding processed elements to vq->used_elems at
> >> virtqueue_fill, but instead of:
> >>
> >> vq->used_elems[seq_idx].in_num = elem->in_num;
> >> vq->used_elems[seq_idx].out_num = elem->out_num;
> >>
> >> we could do:
> >>
> >> vq->used_elems[seq_idx].in_num = 1;
> >> vq->used_elems[seq_idx].out_num = 1;
> >>
> >> We'd use in_num and out_num as separate flags: in_num could indicate
> >> whether this element has been written to the used ring, while out_num
> >> could indicate whether this element has been flushed (1 for no, 0 for
> >> yes). In other words, when we go to write to the used ring, start at
> >> index vq->used_idx and iterate through the used elements.
> >>
> >> If a used element's in_num and out_num == 0, then this element is
> >> invalid (not yet processed) and we stop the search.
> >>
> >> If a used element's in_num and out_num == 1, then this element is
> >> valid: it gets written to the used ring, its in_num is set to 0, and
> >> the search continues.
> >>
> >> Lastly, if a used element's in_num == 0 but out_num == 1, then this
> >> element has already been written to the used ring but not yet flushed,
> >> so ignore this element and continue searching.
> >>
> >> There should never be a case where in_num == 1 and out_num == 0.
> >>
> >> However, this would probably mean that before (or right after) we
> >> actually perform the flush, we'll have to iterate through the
> >> used_elems array one more time and set their out_num's to 0 to
> >> indicate the elements have been flushed.
> >>
> >> Again, this is just for the batched split VQ case, where we have to
> >> keep track of elements that have been written but not flushed and
> >> elements that have been written and flushed, given that we don't know
> >> which elements have actually been written to the used ring until the
> >> used_idx is updated.
> >>
> >> This approach appears more costly, though, if we're really trying to
> >> avoid having this new used_seq_idx VirtQueue member.
> >>
> >>>> In any case, the use of this variable could be seen as an
> >>>> optimization, as its value will tell us where to start looking in
> >>>> vq->used_elems instead of always starting at vq->used_idx.
> >>>>
> >>>> If this is like a one-shot scenario where one element gets written
> >>>> and then flushed right after, then yes, in this case used_seq_idx ==
> >>>> used_idx.
> >>>>
> >>>> ------
> >>>>
> >>>> For current_seq_idx, this is pretty much just a counter. Every new
> >>>> VirtQueueElement created from virtqueue_pop is given a number and
> >>>> the counter is incremented. Like grabbing a ticket number and
> >>>> waiting for your number to be called. The next person to grab a
> >>>> ticket number will be your number + 1.
> >>>>
> >>>
> >>> So it's like last_avail_idx, isn't it?
> >>>
> >>
> >> For the split VQ case, pretty much. Though if we hit this case in
> >> virtqueue_split_pop, we may get into some trouble:
> >>
> >> if (!virtqueue_get_head(vq, vq->last_avail_idx++, &head)) {
> >>     goto done;
> >> }
> >>
> >
> > That's a fatal mistake and the device breaks, vdev->broken = true. It
> > cannot be used anymore from that point on, because of all the checks
> > of that variable.
> >
> > Does that solve the problem?
> >
>
> Ah, it does. My apologies, I should've recognized this would result in
> the device breaking.
>
> >> However, for the packed VQ case, last_avail_idx might not work in the
> >> way we'd need it to for this implementation. In virtqueue_packed_pop,
> >> we see this:
> >>
> >> elem->ndescs = (desc_cache == &indirect_desc_cache) ?
> >>                1 : elem_entries;
> >> vq->last_avail_idx += elem->ndescs;
> >>
> >> It would appear as though last_avail_idx is incremented by the total
> >> number of descriptors associated with the element, which can be
> >> greater than 1. This implementation increments by 1 for each element.
> >>
> >> Actually... it's good you mentioned this, because I think my packed VQ
> >> implementation is wrong. For packed VQs, vq->used_idx is incremented
> >> by the total number of descriptors in the flushed elements and not
> >> necessarily the number of elements being flushed like in the split VQ
> >> case. I'm adding elements to vq->used_elems in a per-element sequence
> >> rather than going by the number of descriptors an element holds,
> >> which should be the case for packed VQs.
> >>
> >
> > If you keep it by your proposed index, I think you can increase it one
> > per head, as they are the entries that are written in both cases.
> > used_idx should increment properly already.
> >
> > If you move to my proposal, both should increase by elem->ndescs as
> > you suggest here.
> >
>
> Ack! Thanks!
>
> >>>> Let me know if I'm making any sense. Thanks :)
> >>>>
> >>>> Jonah
> >>>>
> >>>>>>      /* Last used index value we have signalled on */
> >>>>>>      uint16_t signalled_used;
> >>>>>>
> >>>>>> @@ -1621,6 +1625,11 @@ static void *virtqueue_split_pop(VirtQueue *vq, size_t sz)
> >>>>>>          elem->in_sg[i] = iov[out_num + i];
> >>>>>>      }
> >>>>>>
> >>>>>> +    /* Assign sequence index for in-order processing */
> >>>>>> +    if (virtio_vdev_has_feature(vdev, VIRTIO_F_IN_ORDER)) {
> >>>>>> +        elem->seq_idx = vq->current_seq_idx++;
> >>>>>> +    }
> >>>>>> +
> >>>>>>      vq->inuse++;
> >>>>>>
> >>>>>>      trace_virtqueue_pop(vq, elem, elem->in_num, elem->out_num);
> >>>>>> @@ -1760,6 +1769,11 @@ static void *virtqueue_packed_pop(VirtQueue *vq, size_t sz)
> >>>>>>      vq->shadow_avail_idx = vq->last_avail_idx;
> >>>>>>      vq->shadow_avail_wrap_counter = vq->last_avail_wrap_counter;
> >>>>>>
> >>>>>> +    /* Assign sequence index for in-order processing */
> >>>>>> +    if (virtio_vdev_has_feature(vdev, VIRTIO_F_IN_ORDER)) {
> >>>>>> +        elem->seq_idx = vq->current_seq_idx++;
> >>>>>> +    }
> >>>>>> +
> >>>>>>      trace_virtqueue_pop(vq, elem, elem->in_num, elem->out_num);
> >>>>>>  done:
> >>>>>>      address_space_cache_destroy(&indirect_desc_cache);
> >>>>>> @@ -2087,6 +2101,8 @@ static void __virtio_queue_reset(VirtIODevice *vdev, uint32_t i)
> >>>>>>      vdev->vq[i].notification = true;
> >>>>>>      vdev->vq[i].vring.num = vdev->vq[i].vring.num_default;
> >>>>>>      vdev->vq[i].inuse = 0;
> >>>>>> +    vdev->vq[i].used_seq_idx = 0;
> >>>>>> +    vdev->vq[i].current_seq_idx = 0;
> >>>>>>      virtio_virtqueue_reset_region_cache(&vdev->vq[i]);
> >>>>>>  }
> >>>>>>
> >>>>>> @@ -2334,6 +2350,8 @@ VirtQueue *virtio_add_queue(VirtIODevice *vdev, int queue_size,
> >>>>>>      vdev->vq[i].vring.align = VIRTIO_PCI_VRING_ALIGN;
> >>>>>>      vdev->vq[i].handle_output = handle_output;
> >>>>>>      vdev->vq[i].used_elems = g_new0(VirtQueueElement, queue_size);
> >>>>>> +    vdev->vq[i].used_seq_idx = 0;
> >>>>>> +    vdev->vq[i].current_seq_idx = 0;
> >>>>>>
> >>>>>>      return &vdev->vq[i];
> >>>>>>  }
> >>>>>> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> >>>>>> index b3c74a1bca..910b2a3427 100644
> >>>>>> --- a/include/hw/virtio/virtio.h
> >>>>>> +++ b/include/hw/virtio/virtio.h
> >>>>>> @@ -75,6 +75,7 @@ typedef struct VirtQueueElement
> >>>>>>      hwaddr *out_addr;
> >>>>>>      struct iovec *in_sg;
> >>>>>>      struct iovec *out_sg;
> >>>>>> +    uint16_t seq_idx;
> >>>>>>  } VirtQueueElement;
> >>>>>>
> >>>>>>  #define VIRTIO_QUEUE_MAX 1024
> >>>>>>
> >>>>>> --
> >>>>>> 2.39.3
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> >
>
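PS: since the split vs packed accounting discussed above is easy to
trip over, here is a tiny sketch of the difference (stand-in types and
a hypothetical helper, not the patch code):

    #include <stdbool.h>
    #include <stdint.h>

    typedef struct {
        uint16_t ndescs;  /* number of descriptors the element occupies */
    } Elem;

    typedef struct {
        uint16_t used_idx;
        bool packed;
    } VQ;

    /* Split rings consume one used-ring entry per element head; packed
     * rings reclaim every descriptor the element occupied. */
    static void advance_used_idx(VQ *vq, const Elem *elem)
    {
        vq->used_idx += vq->packed ? elem->ndescs : 1;
    }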