On 9/8/2015 4:21 PM, Tetsuya Mukawa wrote:
> On 2015/09/05 1:50, Xie, Huawei wrote:
>> There is some format issue with the ascii chart of the tx ring. Update
>> that chart.
>> Sorry for the trouble.
> Hi Xie,
>
> Thanks for sharing a way to optimize virtio.
> I have a few questions.
>
>> On 9/4/2015 4:25 PM, Xie, Huawei wrote:
>>> Hi:
>>>
>>> Recently I have done one virtio optimization proof of concept. The
>>> optimization includes two parts:
>>> 1) avail ring set with fixed descriptors
>>> 2) RX vectorization
>>> With the optimizations, we could have several times of performance boost
>>> for purely vhost-virtio throughput.
> When you check performance, have you optimized only virtio-net driver?
> If so, can we optimize vhost backend(librte_vhost) also using your
> optimization way?

We could do some optimization to vhost based on the same vring layout,
but as vhost needs to support legacy virtio as well, it couldn't make
this assumption.

>>> Here I will only cover the first part, which is the prerequisite for the
>>> second part.
>>> Let us first take RX for example. Currently when we fill the avail ring
>>> with guest mbuf, we need
>>> a) allocate one descriptor(for non sg mbuf) from free descriptors
>>> b) set the idx of the desc into the entry of avail ring
>>> c) set the addr/len field of the descriptor to point to guest blank mbuf
>>> data area
>>>
>>> Those operations take time, and especially step b results in modified (M)
>>> state of the cache line for the avail ring in the virtio processing
>>> core. When vhost processes the avail ring, the cache line transfer from
>>> virtio processing core to vhost processing core takes pretty much CPU
>>> cycles.
>>> To solve this problem, this is the arrangement of RX ring for DPDK
>>> pmd(for non-mergeable case).
>>>
>>>                   avail
>>>                    idx
>>>                     +
>>>                     |
>>> +----+----+---+-------------+------+
>>> | 0  | 1  | 2 | ... |  254  | 255  |  avail ring
>>> +-+--+-+--+-+-+---------+---+--+---+
>>>   |    |    |       |   |      |
>>>   |    |    |       |   |      |
>>>   v    v    v       |   v      v
>>> +-+--+-+--+-+-+---------+---+--+---+
>>> | 0  | 1  | 2 | ... |  254  | 255  |  desc ring
>>> +----+----+---+-------------+------+
>>>                     |
>>>                     |
>>> +----+----+---+-------------+------+
>>> | 0  | 1  | 2 |     |  254  | 255  |  used ring
>>> +----+----+---+-------------+------+
>>>                     |
>>>                     +
>>> Avail ring is initialized with fixed descriptor and is never changed,
>>> i.e., the index value of the nth avail ring entry is always n, which
>>> means virtio PMD is actually refilling desc ring only, without having to
>>> change avail ring.
> For example, avail ring is like below.
> struct vring_avail {
>         uint16_t flags;
>         uint16_t idx;
>         uint16_t ring[QUEUE_SIZE];
> };
>
> My understanding is that virtio-net driver still needs to change
> avail_ring.idx, but doesn't need to change avail_ring.ring[].
> Is this correct?

Yes, avail ring is initialized once and never gets updated.
It is like virtio frontend is only using descriptor ring.

>
> Tetsuya
>
>>> When vhost fetches avail ring, if not evicted, it is always in its first
>>> level cache.
>>>
>>> When RX receives packets from used ring, we use the used->idx as the
>>> desc idx. This requires that vhost processes and returns descs from
>>> avail ring to used ring in order, which is true for both current dpdk
>>> vhost and kernel vhost implementation. In my understanding, there is no
>>> necessity for vhost net to process descriptors OOO. One case could be
>>> zero copy, for example, if one descriptor doesn't meet zero copy
>>> requirement, we could directly return it to used ring, earlier than the
>>> descriptors in front of it.
>>> To enforce this, I want to use a reserved bit to indicate in order
>>> processing of descriptors.
>>>
>>> For tx ring, the arrangement is like below. Each transmitted mbuf needs
>>> a desc for virtio_net_hdr, so actually we have only 128 free slots.
>>>
>>>                          ++
>>>                          ||
>>>                          ||
>>> +-----+-----+-----+--------------+------+------+------+
>>> |  0  |  1  | ... |  127 || 128  | 129  | ...  | 255  |  avail ring
>>> +--+--+--+--+-----+---+------+---+--+---+------+--+---+
>>>    |     |            |  ||  |      |             |
>>>    v     v            v  ||  v      v             v
>>> +--+--+--+--+-----+---+------+---+--+---+------+--+---+
>>> | 127 | 128 | ... |  255 || 127  | 128  | ...  | 255  |  desc ring for virtio_net_hdr
>>> +--+--+--+--+-----+---+------+---+--+---+------+--+---+
>>>    |     |            |  ||  |      |             |
>>>    v     v            v  ||  v      v             v
>>> +--+--+--+--+-----+---+------+---+--+---+------+--+---+
>>> |  0  |  1  | ... |  127 ||  0   |  1   | ...  | 127  |  desc ring for tx data
>>>
>>> /huawei
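
For readers following the thread, here is a minimal C sketch of the RX side of the fixed avail ring idea discussed above. It is an illustration only, not the actual PMD code: the struct layouts are simplified stand-ins for the real vring definitions, and the names rxq, used_cons, rxq_init_fixed_avail, rxq_refill_one and rxq_next_used_desc are hypothetical. It assumes a queue size of 256 as in the charts, and that vhost returns descriptors in order.

```c
#include <stdint.h>

#define QUEUE_SIZE 256  /* ring size used in the charts; an assumption here */

/* Simplified stand-ins for the vring structures, not the real virtio headers. */
struct desc {
    uint64_t addr;
    uint32_t len;
};

struct avail_ring {
    uint16_t flags;
    uint16_t idx;
    uint16_t ring[QUEUE_SIZE];
};

struct rxq {
    struct desc desc[QUEUE_SIZE];
    struct avail_ring avail;
    uint16_t used_cons;  /* hypothetical: number of used entries consumed so far */
};

/* One-time setup: entry n of the avail ring always names descriptor n. */
static void rxq_init_fixed_avail(struct rxq *q)
{
    for (uint16_t i = 0; i < QUEUE_SIZE; i++)
        q->avail.ring[i] = i;
    q->avail.idx = 0;
    q->used_cons = 0;
}

/* Refill one slot: only the descriptor and avail.idx are written,
 * avail.ring[] is never touched after initialization. */
static void rxq_refill_one(struct rxq *q, uint64_t buf_addr, uint32_t buf_len)
{
    uint16_t slot = q->avail.idx % QUEUE_SIZE;

    q->desc[slot].addr = buf_addr;
    q->desc[slot].len = buf_len;
    /* A real driver would need a write barrier here before publishing idx. */
    q->avail.idx++;
}

/* With in-order completion, the position in the used ring is also the
 * descriptor index, so no lookup of the used element id is needed. */
static uint16_t rxq_next_used_desc(struct rxq *q)
{
    return (uint16_t)(q->used_cons++ % QUEUE_SIZE);
}
```

The point of the sketch is that the refill path writes the descriptor and avail.idx only, so the cache lines holding avail.ring[] are not dirtied on the virtio core after setup.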