On 6/1/2016 2:53 PM, Yuanhan Liu wrote:
> On Wed, Jun 01, 2016 at 06:40:41AM +0000, Xie, Huawei wrote:
>>>     /* Retrieve all of the head indexes first to avoid caching issues. */
>>>     for (i = 0; i < count; i++) {
>>> -           desc_indexes[i] = vq->avail->ring[(vq->last_used_idx + i) &
>>> -                                   (vq->size - 1)];
>>> +           used_idx = (vq->last_used_idx + i) & (vq->size - 1);
>>> +           desc_indexes[i] = vq->avail->ring[used_idx];
>>> +
>>> +           vq->used->ring[used_idx].id  = desc_indexes[i];
>>> +           vq->used->ring[used_idx].len = 0;
>>> +           vhost_log_used_vring(dev, vq,
>>> +                           offsetof(struct vring_used, ring[used_idx]),
>>> +                           sizeof(vq->used->ring[used_idx]));
>>>     }
>>>  
>>>     /* Prefetch descriptor index. */
>>>     rte_prefetch0(&vq->desc[desc_indexes[0]]);
>>> -   rte_prefetch0(&vq->used->ring[vq->last_used_idx & (vq->size - 1)]);
>>> -
>>>     for (i = 0; i < count; i++) {
>>>             int err;
>>>  
>>> -           if (likely(i + 1 < count)) {
>>> +           if (likely(i + 1 < count))
>>>                     rte_prefetch0(&vq->desc[desc_indexes[i + 1]]);
>>> -                   rte_prefetch0(&vq->used->ring[(used_idx + 1) &
>>> -                                                 (vq->size - 1)]);
>>> -           }
>>>  
>>>             pkts[i] = rte_pktmbuf_alloc(mbuf_pool);
>>>             if (unlikely(pkts[i] == NULL)) {
>>> @@ -916,18 +920,12 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
>>>                     rte_pktmbuf_free(pkts[i]);
>>>                     break;
>>>             }
>>> -
>>> -           used_idx = vq->last_used_idx++ & (vq->size - 1);
>>> -           vq->used->ring[used_idx].id  = desc_indexes[i];
>>> -           vq->used->ring[used_idx].len = 0;
>>> -           vhost_log_used_vring(dev, vq,
>>> -                           offsetof(struct vring_used, ring[used_idx]),
>>> -                           sizeof(vq->used->ring[used_idx]));
>>>     }
>> I had tried post-updating the used ring in batch, but I forget what
>> the perf change was.
> I would assume pre-updating gives a better performance gain, as we are
> fiddling with the avail and used rings together, which would be more
> cache friendly.

The distance between the avail ring entry and the used ring entry for
the same index is at least 8 cache lines.
The benefit, if any, comes from batching the updates.
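
To see where the 8 cache lines come from, here is a back-of-the-envelope
sketch, assuming the split virtqueue layout from the virtio spec, a queue
size of 256 and 64-byte cache lines (simplified; not the DPDK structs
verbatim):

    /* avail ring: le16 flags + le16 idx + qsz x le16 entries; the
     * used ring follows it (page-aligned in practice), so for the
     * same index i, avail->ring[i] and used->ring[i] are separated
     * by at least the size of the avail ring itself. */
    #include <stdio.h>

    int main(void)
    {
        const unsigned qsz = 256;               /* typical queue size */
        unsigned avail_bytes = 2 + 2 + qsz * 2; /* flags + idx + ring */

        printf("distance >= %u bytes = %u cache lines\n",
               avail_bytes, avail_bytes / 64);  /* 516 B -> 8 lines */
        return 0;
    }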

>
>> One optimization would be on vhost_log_used_vring.
>> I have two ideas,
>> a) On the QEMU side, we could always assume the used ring has been
>> changed, so that we don't need to log the used ring in vhost.
>>
>> Michael: is this feasible in QEMU? Any comments on this?
>>
>> b) We could mark the whole used ring as modified, rather than logging
>> it entry by entry.
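
A rough sketch of what (b) could look like (untested, written against the
patched loop above and reusing its dev/vq/count variables; the ring-wrap
split is the only subtle part):

    /* Log the whole range of used entries touched by this burst in
     * one call (two when the burst wraps the ring) instead of once
     * per entry. */
    used_idx = vq->last_used_idx & (vq->size - 1);
    if (used_idx + count <= vq->size) {
            vhost_log_used_vring(dev, vq,
                            offsetof(struct vring_used, ring[used_idx]),
                            count * sizeof(vq->used->ring[0]));
    } else {
            uint16_t tail = vq->size - used_idx;

            /* the burst wraps: log the tail part, then the head part */
            vhost_log_used_vring(dev, vq,
                            offsetof(struct vring_used, ring[used_idx]),
                            tail * sizeof(vq->used->ring[0]));
            vhost_log_used_vring(dev, vq,
                            offsetof(struct vring_used, ring[0]),
                            (count - tail) * sizeof(vq->used->ring[0]));
    }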
> I doubt it's worthwhile. One fact is that vhost_log_used_vring is
> a no-op most of the time: it takes action only during the short
> window of live migration.
>
> And FYI, I even tried with all of the vhost_log_xxx calls removed; it
> showed no performance boost at all. Therefore, it's not a factor that
> will impact performance.

I knew this.
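
For reference, the reason the logging is nearly free outside migration:
a simplified sketch of the early-return guard, approximating what the
vhost library does (field names are illustrative, not verbatim):

    #include <stdint.h>

    #define VHOST_F_LOG_ALL 26      /* dirty-log feature bit */

    struct virtio_net {
            uint64_t features;
            uint64_t log_base;
            /* ... */
    };

    static inline void
    vhost_log_write(struct virtio_net *dev, uint64_t addr, uint64_t len)
    {
            /* QEMU turns on the log feature and supplies a log base
             * only while migrating; the common case returns at once. */
            if ((dev->features & (1ULL << VHOST_F_LOG_ALL)) == 0 ||
                dev->log_base == 0 || len == 0)
                    return;

            /* migrating: mark pages covering [addr, addr + len) dirty */
            (void)addr;
    }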

>       --yliu
>
