Re: [RFC 0/8] virtio,vhost: Add VIRTIO_F_IN_ORDER support
On 3/26/24 2:34 PM, Eugenio Perez Martin wrote:
> On Tue, Mar 26, 2024 at 5:49 PM Jonah Palmer wrote:
>> On 3/25/24 4:33 PM, Eugenio Perez Martin wrote:
>>> On Mon, Mar 25, 2024 at 5:52 PM Jonah Palmer wrote:
>>>> On 3/22/24 7:18 AM, Eugenio Perez Martin wrote:
>>>>> On Thu, Mar 21, 2024 at 4:57 PM Jonah Palmer wrote:
>>>>>> [...]
Re: [RFC 0/8] virtio,vhost: Add VIRTIO_F_IN_ORDER support
On Tue, Mar 26, 2024 at 5:49 PM Jonah Palmer wrote:
> On 3/25/24 4:33 PM, Eugenio Perez Martin wrote:
>> On Mon, Mar 25, 2024 at 5:52 PM Jonah Palmer wrote:
>>> On 3/22/24 7:18 AM, Eugenio Perez Martin wrote:
>>>> On Thu, Mar 21, 2024 at 4:57 PM Jonah Palmer wrote:
>>>>> [...]
Re: [RFC 0/8] virtio,vhost: Add VIRTIO_F_IN_ORDER support
On 3/25/24 4:33 PM, Eugenio Perez Martin wrote:
> On Mon, Mar 25, 2024 at 5:52 PM Jonah Palmer wrote:
>> On 3/22/24 7:18 AM, Eugenio Perez Martin wrote:
>>> On Thu, Mar 21, 2024 at 4:57 PM Jonah Palmer wrote:
>>>> [...]
Re: [RFC 0/8] virtio,vhost: Add VIRTIO_F_IN_ORDER support
On Mon, Mar 25, 2024 at 5:52 PM Jonah Palmer wrote:
> On 3/22/24 7:18 AM, Eugenio Perez Martin wrote:
>> On Thu, Mar 21, 2024 at 4:57 PM Jonah Palmer wrote:
>>> [...]
Re: [RFC 0/8] virtio,vhost: Add VIRTIO_F_IN_ORDER support
On 3/22/24 7:18 AM, Eugenio Perez Martin wrote:
> On Thu, Mar 21, 2024 at 4:57 PM Jonah Palmer wrote:
>> [...]
Re: [RFC 0/8] virtio,vhost: Add VIRTIO_F_IN_ORDER support
On Thu, Mar 21, 2024 at 4:57 PM Jonah Palmer wrote:
> [...]
Re: [RFC 0/8] virtio,vhost: Add VIRTIO_F_IN_ORDER support
On 3/21/24 3:48 PM, Dongli Zhang wrote:
> Hi Jonah,
>
> Would you mind explaining how VIRTIO_F_IN_ORDER improves performance?
>
> https://lore.kernel.org/all/20240321155717.1392787-1-jonah.pal...@oracle.com/#t
>
> I tried to find the reason in prior discussions but could not.
>
> https://lore.kernel.org/all/byapr18mb2791df7e6c0f61e2d8698e8fa0...@byapr18mb2791.namprd18.prod.outlook.com/
>
> Thank you very much!
>
> Dongli Zhang

Hey Dongli,

So VIRTIO_F_IN_ORDER can theoretically improve performance under
certain conditions. Whether it can improve performance today, I'm not
sure.

But, if we can guarantee that all buffers are used by the device in the
same order in which they're made available by the driver (enforcing
strict in-order processing and completion of requests), then we can
leverage this to our advantage.

For example, we could simplify device and driver logic, since there
would be no need for complex mechanisms to track the completion of
out-of-order requests (reducing request management overhead). Though
the need for complex mechanisms to force this data to be in order kind
of defeats this benefit.

It could also improve cache utilization, since sequential access
patterns are more cache-friendly than random access patterns.

Also, in-order processing is more predictable, making it easier to
optimize device and driver performance. E.g. it can allow us to
fine-tune things without having to account for the variability of
out-of-order completions.

But again, the actual performance impact will vary depending on the use
case and workload. In scenarios that require high levels of parallelism,
or where out-of-order completions are already managed efficiently, the
flexibility of out-of-order processing can still be preferable.

Jonah

> On 3/21/24 08:57, Jonah Palmer wrote:
>> [...]
Re: [RFC 0/8] virtio,vhost: Add VIRTIO_F_IN_ORDER support
Hi Jonah,

Would you mind explaining how VIRTIO_F_IN_ORDER improves performance?

https://lore.kernel.org/all/20240321155717.1392787-1-jonah.pal...@oracle.com/#t

I tried to find the reason in prior discussions but could not.

https://lore.kernel.org/all/byapr18mb2791df7e6c0f61e2d8698e8fa0...@byapr18mb2791.namprd18.prod.outlook.com/

Thank you very much!

Dongli Zhang

On 3/21/24 08:57, Jonah Palmer wrote:
> [...]
[RFC 0/8] virtio,vhost: Add VIRTIO_F_IN_ORDER support
The goal of these patches is to add support to a variety of virtio and
vhost devices for the VIRTIO_F_IN_ORDER transport feature. This feature
indicates that all buffers are used by the device in the same order in
which they were made available by the driver.

These patches attempt to implement a generalized, non-device-specific
solution to support this feature.

The core of this solution is a buffer mechanism in the form of GLib's
GHashTable. A hash table was chosen for its quick lookup, insertion,
and removal operations. Given that our keys are simply numbers of an
ordered sequence, a hash table seemed like the best choice for a buffer
mechanism.

---------------------

The strategy behind this implementation is as follows:

We know that buffers are always popped from the available ring and
enqueued for further processing in the same order in which they were
made available by the driver. Given this, we can note their order by
assigning the resulting VirtQueueElement a key. This key is a number in
a sequence that represents the order in which the element was popped
from the available ring, relative to the other VirtQueueElements.

For example, given 3 "elements" that were popped from the available
ring, we assign each a key value that represents its order (elem0 is
popped first, then elem1, then lastly elem2):

     elem2   --  elem1   --  elem0   ---> Enqueue for processing
    (key: 2)    (key: 1)    (key: 0)

Then these elements are enqueued for further processing by the host.

While most devices will return these completed elements in the same
order in which they were enqueued, some devices may not (e.g.
virtio-blk). To guarantee that these elements are put on the used ring
in the same order in which they were enqueued, we can use a buffering
mechanism that keeps track of the next expected sequence number of an
element.

In other words, if the completed element does not have a key value that
matches the next expected sequence number, then we know this element is
not in order and we must stash it away in a hash table until the
elements before it have completed. The element's key value is used as
the key for placing it in the hash table.

If the completed element has a key value that matches the next expected
sequence number, then we know this element is in order and we can push
it on the used ring. Then we increment the next expected sequence number
and check if the hash table contains an element at this key location.

If so, we retrieve this element, push it to the used ring, delete the
key-value pair from the hash table, increment the next expected sequence
number, and check the hash table again for an element at this new key
location. This process is repeated until we're unable to find an element
in the hash table to continue the order.

So, for example, say the 3 elements we enqueued were completed in the
following order: elem1, elem2, elem0. The next expected sequence number
is 0:

exp-seq-num = 0:

     elem1   --> elem1.key == exp-seq-num ? --> No, stash it
    (key: 1)                                         |
                                                     |
                                                     v
                                             |key: 1 - elem1|

---------------------
exp-seq-num = 0:

     elem2   --> elem2.key == exp-seq-num ? --> No, stash it
    (key: 2)                                         |
                                                     |
                                                     v
                                             |key: 1 - elem1|
                                             |--------------|
                                             |key: 2 - elem2|

---------------------
exp-seq-num = 0:

     elem0   --> elem0.key == exp-seq-num ? --> Yes, push to used ring
    (key: 0)

exp-seq-num = 1:

    lookup(table, exp-seq-num) != NULL ? --> Yes, push to used ring,
                                             remove elem from table
                                                     |
                                                     v
                                             |key: 2 - elem2|

exp-seq-num = 2:

    lookup(table, exp-seq-num) != NULL ? --> Yes, push to used ring,
                                             remove elem from table
                                                     |
                                                     v
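
The stash-and-drain procedure described in the cover letter can be
sketched in a few lines of Python. This is only an illustration under
stated assumptions, not the code from the patches: a dict stands in for
the GHashTable, a plain list stands in for the used ring, and the names
InOrderBuffer, pop, complete and replay are hypothetical and exist only
for this sketch.

```python
class InOrderBuffer:
    """Toy model of the reordering scheme (hypothetical names throughout)."""

    def __init__(self):
        self.next_key = 0    # key given to the next element popped
        self.expected = 0    # next sequence number allowed on the used ring
        self.stash = {}      # key -> element that completed too early
        self.used_ring = []  # stand-in for the real used ring

    def pop(self):
        """Return the sequence key for an element leaving the available ring."""
        key = self.next_key
        self.next_key += 1
        return key

    def complete(self, key, elem):
        """Handle one completion; return the elements pushed by this call."""
        if key != self.expected:
            # Not in order yet: stash it under its own key.
            self.stash[key] = elem
            return []
        pushed = [elem]
        self.expected += 1
        # Drain any stashed elements that now continue the sequence.
        while self.expected in self.stash:
            pushed.append(self.stash.pop(self.expected))
            self.expected += 1
        self.used_ring.extend(pushed)
        return pushed


def replay(completion_order, count=3):
    """Pop elem0..elem{count-1}, complete them in the given order,
    and return the resulting used-ring contents."""
    buf = InOrderBuffer()
    keys = {f"elem{i}": buf.pop() for i in range(count)}
    for name in completion_order:
        buf.complete(keys[name], name)
    return buf.used_ring
```

Replaying the example from the cover letter, replay(["elem1", "elem2",
"elem0"]) stashes elem1 and elem2, and the completion of elem0 then
pushes all three, so the used ring ends up as elem0, elem1, elem2. The
sketch ignores sequence-number wraparound, which Python integers never
hit.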