On Tue, Mar 26, 2024 at 5:49 PM Jonah Palmer <jonah.pal...@oracle.com> wrote: > > > > On 3/25/24 4:33 PM, Eugenio Perez Martin wrote: > > On Mon, Mar 25, 2024 at 5:52 PM Jonah Palmer <jonah.pal...@oracle.com> > > wrote: > >> > >> > >> > >> On 3/22/24 7:18 AM, Eugenio Perez Martin wrote: > >>> On Thu, Mar 21, 2024 at 4:57 PM Jonah Palmer <jonah.pal...@oracle.com> > >>> wrote: > >>>> > >>>> The goal of these patches is to add support to a variety of virtio and > >>>> vhost devices for the VIRTIO_F_IN_ORDER transport feature. This feature > >>>> indicates that all buffers are used by the device in the same order in > >>>> which they were made available by the driver. > >>>> > >>>> These patches attempt to implement a generalized, non-device-specific > >>>> solution to support this feature. > >>>> > >>>> The core feature behind this solution is a buffer mechanism in the form > >>>> of GLib's GHashTable. The decision behind using a hash table was to > >>>> leverage their ability for quick lookup, insertion, and removal > >>>> operations. Given that our keys are simply numbers of an ordered > >>>> sequence, a hash table seemed like the best choice for a buffer > >>>> mechanism. > >>>> > >>>> --------------------- > >>>> > >>>> The strategy behind this implementation is as follows: > >>>> > >>>> We know that buffers that are popped from the available ring and enqueued > >>>> for further processing will always done in the same order in which they > >>>> were made available by the driver. Given this, we can note their order > >>>> by assigning the resulting VirtQueueElement a key. This key is a number > >>>> in a sequence that represents the order in which they were popped from > >>>> the available ring, relative to the other VirtQueueElements. > >>>> > >>>> For example, given 3 "elements" that were popped from the available > >>>> ring, we assign a key value to them which represents their order (elem0 > >>>> is popped first, then elem1, then lastly elem2): > >>>> > >>>> elem2 -- elem1 -- elem0 ---> Enqueue for processing > >>>> (key: 2) (key: 1) (key: 0) > >>>> > >>>> Then these elements are enqueued for further processing by the host. > >>>> > >>>> While most devices will return these completed elements in the same > >>>> order in which they were enqueued, some devices may not (e.g. > >>>> virtio-blk). To guarantee that these elements are put on the used ring > >>>> in the same order in which they were enqueued, we can use a buffering > >>>> mechanism that keeps track of the next expected sequence number of an > >>>> element. > >>>> > >>>> In other words, if the completed element does not have a key value that > >>>> matches the next expected sequence number, then we know this element is > >>>> not in-order and we must stash it away in a hash table until an order > >>>> can be made. The element's key value is used as the key for placing it > >>>> in the hash table. > >>>> > >>>> If the completed element has a key value that matches the next expected > >>>> sequence number, then we know this element is in-order and we can push > >>>> it on the used ring. Then we increment the next expected sequence number > >>>> and check if the hash table contains an element at this key location. > >>>> > >>>> If so, we retrieve this element, push it to the used ring, delete the > >>>> key-value pair from the hash table, increment the next expected sequence > >>>> number, and check the hash table again for an element at this new key > >>>> location. This process is repeated until we're unable to find an element > >>>> in the hash table to continue the order. > >>>> > >>>> So, for example, say the 3 elements we enqueued were completed in the > >>>> following order: elem1, elem2, elem0. The next expected sequence number > >>>> is 0: > >>>> > >>>> exp-seq-num = 0: > >>>> > >>>> elem1 --> elem1.key == exp-seq-num ? --> No, stash it > >>>> (key: 1) | > >>>> | > >>>> v > >>>> ================ > >>>> |key: 1 - elem1| > >>>> ================ > >>>> --------------------- > >>>> exp-seq-num = 0: > >>>> > >>>> elem2 --> elem2.key == exp-seq-num ? --> No, stash it > >>>> (key: 2) | > >>>> | > >>>> v > >>>> ================ > >>>> |key: 1 - elem1| > >>>> |--------------| > >>>> |key: 2 - elem2| > >>>> ================ > >>>> --------------------- > >>>> exp-seq-num = 0: > >>>> > >>>> elem0 --> elem0.key == exp-seq-num ? --> Yes, push to used ring > >>>> (key: 0) > >>>> > >>>> exp-seq-num = 1: > >>>> > >>>> lookup(table, exp-seq-num) != NULL ? --> Yes, push to used ring, > >>>> remove elem from table > >>>> | > >>>> v > >>>> ================ > >>>> |key: 2 - elem2| > >>>> ================ > >>>> > >>>> exp-seq-num = 2: > >>>> > >>>> lookup(table, exp-seq-num) != NULL ? --> Yes, push to used ring, > >>>> remove elem from table > >>>> | > >>>> v > >>>> ================ > >>>> | *empty* | > >>>> ================ > >>>> > >>>> exp-seq-num = 3: > >>>> > >>>> lookup(table, exp-seq-num) != NULL ? --> No, done > >>>> --------------------- > >>>> > >>> > >>> I think to use a hashtable to handle this has an important drawback: > >>> it hurts performance on the devices that are using right in-order > >>> because of hash calculus, to benefit devices that are using it badly > >>> by using descriptors out of order. We should use data structs that are > >>> as free as possible for the first, and we don't care to worse the > >>> experience of the devices that enable in_order and they shouldn't. > >>> > >> > >> Right, because if descriptors are coming in in-order, we still search > >> the (empty) hash table. > >> > >> Hmm... what if we introduced a flag to see if we actually should bother > >> searching the hash table? That way we avoid the cost of searching when > >> we really don't need to. > >> > >>> So I suggest reusing vq->used_elems array vq. At each used descriptor > >>> written in the used ring, you know the next head is elem->index + > >>> elem->ndescs, so you can check if that element has been filled or not. > >>> If used, it needs to be flushed too. If not used, just return. > >>> > >>> Of course virtqueue_flush also needs to take this into account. > >>> > >>> What do you think, does it make sense to you? > >>> > >> > >> I'm having a bit of trouble understanding the suggestion here. Would you > >> mind elaborating a bit more for me on this? > >> > >> For example, say elem0, elem1, and elem2 were enqueued in-order (elem0 > >> being first, elem2 last) and then elem2 finishes first, elem1 second, > >> and elem0 third. Given that these elements finish out-of-order, how > >> would you handle these out-of-order elements using your suggestion? > >> > > > > virtqueue_fill is called first with elem2. So vq->used_elems[2 % > > vq->num] is filled with the needed information of the descriptor: > > index, len and ndescs. idx function parameter is ignored. > > > > Optionally, virtqueue_push is called. It checks if > > vq->used_elems[vq->used_idx] is valid. valid can be elem->in_num + > > elem->out_num > 0, and reset them on every used ring write. If it is > > not valid, this is a no-op. Currently, it is not valid. > > > > Same process for elem1. > > > > virtqueue_fill is the same for elem0. But now virtqueue_flush gets > > interesting, as it detects vq->used_elems[0] is used. It scans for the > > first not-used element, and it finds it is vq->used_elems[3]. So it > > needs to write an used elem with id = 2 and the corresponding length. > > > > Maybe it is interesting to implement ways to improve the look for the > > last used descriptor, but if any I'd go for a bitmap and always on top > > of the basis series. > > > > The algorithm has not been tested, so maybe I've missed something. > > > > Thanks! > > > > Thank you for taking the time to clarify for this for me, I appreciate it. > > I spent some time yesterday and this morning working this over in my > head. I believe I understand what you're trying to do here and it makes > more sense than employing a data structure like a hash table for this > kind of job. However, I have a few questions regarding this implementation. > > So, one question is on the reuse of the VirtQueue's used_elems array. > Wont reusing this array cause issues with packed VQ operations, since it > also uses this array? If we want to stick with using this array > specifically, perhaps we may need to rewrite its logic if the device has > negotiated the in_order feature? E.g. > > virtqueue_packed_flush (...) { > if (virtio_vdev_has_feature(vdev, VIRTIO_F_IN_ORDER) { > // new logic > } else { > // current logic > } > } > ----------- >
That's right. > Regarding this paragraph: > > "virtqueue_fill is called first with elem2. So vq->used_elems[2 % > vq->num] is filled with the needed information of the descriptor: > index, len and ndescs. idx function parameter is ignored." > > This looks exactly like virtqueue_packed_fill except for the idx > parameter we'd pass in (sequence_num % vq->vring.num). > > In any case, regardless of whether this element being passed in is > considered to be in-order or not, we still add this element to > vq->used_elems in virtqueue_fill. Ok, got it. > > Then you say "Optionally, virtqueue_push is called". I assume by > "optionally" you mean we need to know if this is a single-shot operation > or a batched operation. A single-shot operation would call for > virtqueue_push whereas a batched operation would just use > virtqueue_fill. If this is what you meant by that then ok, I understand > that too. > Totally correct. > However, I think before we start considering whether or not we need to > call virtqueue_push or continue with virtqueue_fill, we first should > know whether or not this element is in-order. And I think to do that we > should use the check you mentioned: > > if (vq->used_elems[vq->used_idx].in_num + > vq->used_elems[vq->used_idx].out_num > 0) > > or perhaps: > > if (vq->used_elems[vq->used_idx] != NULL) > > If the element is found not to be in-order, I assume we return and we > are done with the handling of this element for now. > > Now my confusion with this part comes from calling virtqueue_push inside > of the virtqueue_fill function. Wouldn't calling virtqueue_push inside > of virtqueue_fill present some kind of recursive execution path? Unless > I'm missing something here, this probably isn't something we need to do, > right? Maybe I explained something wrong, but virtqueue_fill should not call virtqueue_push. It's up to the caller (virtio-net, virtio-blk, etc) to call one or another. Can you elaborate? > ----------- > > Lastly, when execution reaches virtqueue_flush, what would define an > element as unused? Perhaps... > > if (vq->used_elems[i] == NULL) > used_elems is not an array of pointers but an array of VirtQueueElement so we cannot do this way. > or > > if (vq->used_elems[i].in_num + vq->used_elems[i].out_num > 0) > Right, I propose to reset both in_num = out_num = 0. Thanks! > Thanks Eugenio! > > >> Thanks :) > >> > >>> Thanks! > >>> > >>> > >>>> Jonah Palmer (8): > >>>> virtio: Define InOrderVQElement > >>>> virtio: Create/destroy/reset VirtQueue In-Order hash table > >>>> virtio: Define order variables > >>>> virtio: Implement in-order handling for virtio devices > >>>> virtio-net: in-order handling > >>>> vhost-svq: in-order handling > >>>> vhost/vhost-user: Add VIRTIO_F_IN_ORDER to vhost feature bits > >>>> virtio: Add VIRTIO_F_IN_ORDER property definition > >>>> > >>>> hw/block/vhost-user-blk.c | 1 + > >>>> hw/net/vhost_net.c | 2 + > >>>> hw/net/virtio-net.c | 6 +- > >>>> hw/scsi/vhost-scsi.c | 1 + > >>>> hw/scsi/vhost-user-scsi.c | 1 + > >>>> hw/virtio/vhost-shadow-virtqueue.c | 15 ++++- > >>>> hw/virtio/vhost-user-fs.c | 1 + > >>>> hw/virtio/vhost-user-vsock.c | 1 + > >>>> hw/virtio/virtio.c | 103 ++++++++++++++++++++++++++++- > >>>> include/hw/virtio/virtio.h | 20 +++++- > >>>> net/vhost-vdpa.c | 1 + > >>>> 11 files changed, 145 insertions(+), 7 deletions(-) > >>>> > >>>> -- > >>>> 2.39.3 > >>>> > >>> > >> > > >