On 18/04/17 23:44, Bodireddy, Bhanuprakash wrote:
Hi Bhanuprakash,

I was doing some Physical to Virtual tests, and whenever the number of flows
reached the rx batch size, performance dropped a lot. I created an
experimental patch where I added an intermediate queue and flushed it at the
end of the rx batch.

When I found your patch I decided to give it a try to see how it behaves.
I also modified your patch in such a way that it flushes the queue after every
call to dp_netdev_process_rxq_port().
I presume you were doing something like the below in the pmd_thread_main receive loop?

for (i = 0; i < poll_cnt; i++) {
    dp_netdev_process_rxq_port(pmd, poll_list[i].rx,
                               poll_list[i].port_no);
    dp_netdev_drain_txq_ports(pmd);
}
Yes this is exactly what I did. It would be interesting to see what IXIA thinks
of this change ;)
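
For anyone following the thread, below is a minimal, standalone sketch of the
intermediate-queue idea being discussed. The names (struct txq_cache,
port_tx_burst(), TXQ_BURST, and so on) are placeholders of my own and are not
taken from the patch or from the OVS/DPDK sources; the sketch only illustrates
the queue-then-flush pattern that the loop above drives:

    /* Illustrative sketch only: queue packets per egress port and either
     * transmit once a full burst has accumulated or flush at the end of
     * each rx batch.  None of these names come from the actual patch. */
    #define TXQ_BURST 32                 /* flush threshold, mirrors the rx batch size */

    struct pkt;                          /* opaque packet handle for the sketch */

    /* Placeholder for the real transmit call (e.g. rte_eth_tx_burst()). */
    int port_tx_burst(int port_id, struct pkt **pkts, int count);

    struct txq_cache {
        struct pkt *pkts[TXQ_BURST];     /* packets queued since the last flush */
        int count;
    };

    /* Queue a packet; transmit only once a full burst has accumulated. */
    static void
    txq_cache_send(struct txq_cache *q, int port_id, struct pkt *p)
    {
        q->pkts[q->count++] = p;
        if (q->count == TXQ_BURST) {
            port_tx_burst(port_id, q->pkts, q->count);
            q->count = 0;
        }
    }

    /* Called after every rx batch (the "patch + flush" variant measured
     * below) so queued packets never wait for a full burst to build up. */
    static void
    txq_cache_flush(struct txq_cache *q, int port_id)
    {
        if (q->count) {
            port_tx_burst(port_id, q->pkts, q->count);
            q->count = 0;
        }
    }

The latency trade-off discussed below follows directly from this: without the
per-batch flush, a packet on a lightly loaded egress port can sit in the cache
until a full burst accumulates.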
Here are some packet forwarding stats, in packets per second, for the Physical to
Physical scenario, with two 82599ES 10G ports and 64-byte packets being sent at
wire speed:

Number                            patch +
of flows  git clone      patch      flush
========  =========  =========  =========
      10   10727283   13527752   13393844
      32    7042253   11285572   11228799
      50    7515491    9642650    9607791
     100    5838699    9461239    9430730
     500    5285066    7859123    7845807
    1000    5226477    7146404    7135601
Thanks for sharing the numbers; I agree with your findings and saw very
similar results with our v3 patch.
In any case, we see a significant throughput improvement with the patch.


I do not have an IXIA to do the latency tests you performed; however, I do
have a XENA tester which has a basic latency measurement feature.
I used the following script to get the latency numbers:

https://github.com/chaudron/XenaPythonLib/blob/latency/examples/latency.py
Thanks for pointing this out; it could be useful for users with no IXIA setup.


As you can see in the numbers below, the default queue introduces quite
some latency; however, flushing after every rx batch brings the latency down
to almost the original values. The results mimic your test case 2, sending 10G
traffic at wire speed:

   ===== GIT CLONE
   Pkt size  min(ns)  avg(ns)  max(ns)
    512      4,631      5,022    309,914
   1024      5,545      5,749    104,294
   1280      5,978      6,159     45,306
   1518      6,419      6,774    946,850

   ===== PATCH
   Pkt size  min(ns)  avg(ns)  max(ns)
    512      4,928    492,228  1,995,026
   1024      5,761    499,206  2,006,628
   1280      6,186    497,975  1,986,175
   1518      6,579    494,434  2,005,947

   ===== PATCH + FLUSH
   Pkt size  min(ns)  avg(ns)  max(ns)
    512      4,711      5,064    182,477
   1024      5,601      5,888    701,654
   1280      6,018      6,491    533,037
   1518      6,467      6,734    312,471
The latency numbers above are very encouraging indeed. However, with RFC2544
tests, especially on IXIA, we have a lot of parameters to tune.
I see that the latency stats fluctuate a lot with changes in the acceptable 'Frame
Loss'. I am not an IXIA expert myself, but I am trying to figure out acceptable
settings and to measure latency/throughput.
I just figured out that XENA also has the RFC2544 tests, so I decided to give it a
shot. I also noticed that if packets get dropped, the results are really off. In the
end I ran the tests at 99% of 10G wire speed; no packets were lost and the results
are stable.
Here are the results for test 2, 30 flows, 512 byte packets:

           Avg      Min      Max
PLAIN   15.397    5.288  880.598
PATCH   28.521   11.358  925.001
FLUSH   15.958    5.352  917.889

Maybe it would be good to re-run your latency tests with the flush after every rx
batch. This might get rid of the huge latency while still increasing the
performance in the case where the rx batch shares the same egress port.

The overall patchset looks fine to me; see some comments inline.
Thanks for reviewing the patch.

+#define MAX_LOOP_TO_DRAIN 128
Is defining this inline OK?
I see that this convention is used in OVS.
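
As an aside for anyone skimming the thread, my (unverified) reading is that a
constant like this simply bounds how many times the drain routine retries when
the tx queue cannot accept the whole burst at once. A purely hypothetical sketch,
with names of my own invention rather than anything from the patch:

    /* Hypothetical illustration only; names and logic are not from the patch. */
    #define MAX_LOOP_TO_DRAIN 128

    struct pkt;                                           /* opaque packet handle */
    int tx_burst(int qid, struct pkt **pkts, int count);  /* returns number sent */

    static void
    bounded_drain(int qid, struct pkt **pkts, int count)
    {
        int sent = 0;

        /* Retry at most MAX_LOOP_TO_DRAIN times so a congested or stuck
         * tx queue cannot stall the PMD thread indefinitely. */
        for (int i = 0; i < MAX_LOOP_TO_DRAIN && sent < count; i++) {
            sent += tx_burst(qid, pkts + sent, count - sent);
        }
        /* Any packets still unsent at this point would have to be dropped. */
    }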

           NULL,
           NULL,
           netdev_dpdk_vhost_reconfigure,
-        netdev_dpdk_vhost_rxq_recv);
+        netdev_dpdk_vhost_rxq_recv,
+        NULL);
We need this patch even more in the vhost case, as there is an even bigger
drop in performance when we exceed the rx batch size. I measured around
40% when reducing the rx batch size to 4 and using 1 vs 5 flows (single PMD).
Completely agree. In fact, we did a quick patch doing batching for vhost ports as
well and found a significant performance improvement (though it's not thoroughly
tested for all corner cases).
We have that in our backlog and will try posting that patch as an RFC at least
to get feedback from the community.
Thanks! Looking forward to it. I will definitely review and test it!
-Bhanuprakash.

