@Maxim: After PR327 is merged, I'll make the changes to replace scheduler mode with direct pktio mode.

@Honnappa: Right now the sender mode uses one pktout_queue per thread. It can be modified for more interfaces (more pktout_queues), of course, but I would vote for faster interfaces (40G?).

@Bill: Nice! But it is kind of tricky to get right (in limited time). The simple workaround is to use 40G interfaces for this benchmark (if I remember correctly from Nokia's results, it goes to around 20 mpps per core on a regular Xeon).

On 7 December 2017 at 21:32, Bill Fischofer <bill.fischo...@linaro.org> wrote:
>
> On Thu, Dec 7, 2017 at 12:55 PM, Honnappa Nagarahalli
> <honnappa.nagaraha...@linaro.org> wrote:
>>
>> On 7 December 2017 at 08:01, Bogdan Pricope <bogdan.pric...@linaro.org> wrote:
>> > TX is at line rate. Probably will get RX at line rate in direct mode, too.
>> > Problem is how can you see the performance increase/degradation if you
>> > can process more than line rate with one core?
>>
>> Any possibility to add one more port?
>>
>
> The usual way to measure this is to insert a process_packet() routine in the
> loop that consumes a configurable number of cycles. Real applications do
> more than just RX/TX processing but do something with the packets. The lower
> the system overhead the larger the cycle budget process_packet() has while
> maintaining line rate. A good benchmarking tool will self-tune this to find
> the number of cycles process_packet() can consume at line rate. That's the
> measure of efficiency of most interest from a data plane application
> perspective.
>
>> >
>> > I guess .. enable csum option... ?
>> >
>> > On 7 December 2017 at 15:46, Maxim Uvarov <maxim.uva...@linaro.org> wrote:
>> >> nice. TX is on line rate, right? Next step probably to add RX path without
>> >> scheduler. And we will have good testing environment.
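
A rough sketch of the self-tuning idea Bill describes above, in case it helps the discussion. This is not odp_generator code. The core model (`core_model_t`), the stand-in `measure_pps()` and the `tune_budget()` helper are all hypothetical names; a real tool would replace `measure_pps()` with an actual timed RX/TX run in which process_packet() spins for `budget` cycles. The search itself is a plain binary search for the largest budget that still sustains line rate, assuming the achieved rate never increases as the budget grows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one core (an assumption, not ODP or DPDK API):
 * every packet costs overhead_cycles of RX/TX work plus whatever
 * process_packet() is allowed to burn. */
typedef struct {
	uint64_t cpu_hz;          /* core clock in cycles per second */
	uint64_t overhead_cycles; /* per-packet RX/TX cost, must be > 0 */
} core_model_t;

/* Stand-in for a real measurement run: packets per second achieved when
 * process_packet() consumes 'budget' cycles per packet. */
static uint64_t measure_pps(const core_model_t *m, uint64_t budget)
{
	assert(m != NULL && m->overhead_cycles > 0);
	return m->cpu_hz / (m->overhead_cycles + budget);
}

/* Binary search for the largest budget in [0, max_budget] that still
 * reaches line_rate_pps. Returns -1 if line rate is not reached even
 * with a zero budget. */
static int64_t tune_budget(const core_model_t *m, uint64_t line_rate_pps,
			   uint64_t max_budget)
{
	uint64_t lo = 0, hi = max_budget;

	if (measure_pps(m, 0) < line_rate_pps)
		return -1;

	while (lo < hi) {
		/* Round up so the loop always makes progress */
		uint64_t mid = lo + (hi - lo + 1) / 2;

		if (measure_pps(m, mid) >= line_rate_pps)
			lo = mid;     /* still at line rate, try a bigger budget */
		else
			hi = mid - 1; /* too slow, shrink the budget */
	}
	return (int64_t)lo;
}
```

With assumed numbers (2.4 GHz core, 100 cycles of per-packet overhead, 14,880,952 pps for 64-byte frames on 10GbE), `tune_budget()` returns 61 cycles. A lower overhead gives a larger budget, which is the efficiency figure Bill is pointing at.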
>> >>
>> >> On 7 December 2017 at 16:12, Bogdan Pricope <bogdan.pric...@linaro.org> wrote:
>> >>>
>> >>> More results with odp_generator in lava setup:
>> >>>
>> >>> 7.6 mpps (TX) / 5.9 mpps (RX) - api-next with PR313 (Petri):
>> >>> 8.3 mpps (TX) / 6.3 mpps (RX) - api-next with PR313 (Petri) +
>> >>> remove 1m sleep + replace atomic counters
>> >>> 14.8 mpps (TX) / 6.5 mpps (RX) - api-next with PR313 (Petri) + remove
>> >>> 1m sleep + replace atomic counters + remove csum calculation/validation
>> >>> 14.8 mpps (TX) / 6.8 mpps (RX) - master with PR327 (remove 1m sleep +
>> >>> replace atomic counters + remove csum calculation/validation)
>> >>>
>> >>> /Bogdan
>> >>>
>> >>> On 6 December 2017 at 13:49, Maxim Uvarov <maxim.uva...@linaro.org> wrote:
>> >>> > small update. Double checked that increasing num of desc does not
>> >>> > give any effect in odp_generator.
>> >>> >
>> >>> > Disable check sums in odp_generator increases TX from 7M to 13M pps
>> >>> > and RX from 5.9M to 6.1M pps.
>> >>> > Because of generator uses predefined packets with calculated
>> >>> > checksum - there is no need to enable checksum inside generator.
>> >>> >
>> >>> > It looks like problem inside DPDK driver itself.
>> >>> >
>> >>> > For this PR I think we need to merge it together with changes to
>> >>> > odp_generator (the same as for l2fwd) to enable hw check sum,
>> >>> > which has to be disabled by default.
>> >>> >
>> >>> > Maxim.
>> >>> >
>> >>> > On 6 December 2017 at 10:46, Maxim Uvarov <maxim.uva...@linaro.org> wrote:
>> >>> >>
>> >>> >> skip this message. I will recheck. Pushed to lava wrong branch.
>> >>> >>
>> >>> >> On 6 December 2017 at 10:42, Maxim Uvarov <maxim.uva...@linaro.org> wrote:
>> >>> >>>
>> >>> >>> Ilias was right yesterday. If number of descriptors increased to
>> >>> >>> 1024 then TX became again 10M.
>> >>> >>>
>> >>> >>> +       ret = rte_eth_tx_queue_setup(port_id, i,
>> >>> >>> +               dev_info.tx_desc_lim.nb_max > 1024 ? 1024 : dev_info.tx_desc_lim.nb_max,
>> >>> >>>                 rte_eth_dev_socket_id(port_id),
>> >>> >>>                 txconf);
>> >>> >>>
>> >>> >>> +       ret = rte_eth_rx_queue_setup(port_id, i,
>> >>> >>> +               dev_info.rx_desc_lim.nb_max > 1024 ? 1024 : dev_info.rx_desc_lim.nb_max,
>> >>> >>>                 rte_eth_dev_socket_id(port_id),
>> >>> >>>                 NULL,
>> >>> >>>                 pkt_dpdk->pkt_pool);
>> >>> >>>
>> >>> >>> Maxim.
>> >>> >>>
>> >>> >>> On 5 December 2017 at 11:20, Elo, Matias (Nokia - FI/Espoo)
>> >>> >>> <matias....@nokia.com> wrote:
>> >>> >>>>
>> >>> >>>> When I tested enabling HW checksum with Fortville NICs (i40e) the
>> >>> >>>> slower driver path alone caused ~20% throughput drop on l2fwd test.
>> >>> >>>> This was without actually calculating the checksums, I simply forced
>> >>> >>>> the slower driver path (no vectorization).
>> >>> >>>>
>> >>> >>>> -Matias
>> >>> >>>>
>> >>> >>>> > On 5 Dec 2017, at 8:59, Bogdan Pricope <bogdan.pric...@linaro.org> wrote:
>> >>> >>>> >
>> >>> >>>> > On RX side is kind-of expected result since it uses scheduler mode.
>> >>> >>>> >
>> >>> >>>> > On TX side there is this drop from 10 mpps to 7.69 mpps that is
>> >>> >>>> > unexpected.
>> >>> >>>> >
>> >>> >>>> > So Petri, when you said:
>> >>> >>>> > "DPDK uses less optimized driver code (on Intel NICs at least) when
>> >>> >>>> > any of the L4 checksum offloads is enabled."
>> >>> >>>> >
>> >>> >>>> > you were referring to this kind of drop in performance?
>> >>> >>>> >
>> >>> >>>> > There is that 'folklore' that SW csum is faster on small packets
>> >>> >>>> > while HW csum is faster on bigger packets. Do you have this kind of
>> >>> >>>> > data?
>> >>> >>>> >
>> >>> >>>> > Anyway, for this particular case (odp_generator), since UDP
>> >>> >>>> > header/payload is not changing during the test (for now), csum is
>> >>> >>>> > calculated only once at the beginning of the test: so we are
>> >>> >>>> > comparing HW IPv4 + HW UDP csum vs. SW IPv4 csum.... yet, the
>> >>> >>>> > differences in performance is huge...
>> >>> >>>> >
>> >>> >>>> > On 4 December 2017 at 20:37, Maxim Uvarov <maxim.uva...@linaro.org> wrote:
>> >>> >>>> >> I added isocpus and mounted huge page TX became more stable at
>> >>> >>>> >> 7.6M. But anyway it's better to test performance for this PR
>> >>> >>>> >> because previous speed was 10M.
>> >>> >>>> >>
>> >>> >>>> >> Maxim.
>> >>> >>>> >>
>> >>> >>>> >> On 12/04/17 19:42, Honnappa Nagarahalli wrote:
>> >>> >>>> >>> Can you run with Linux-DPDK in ODP 2.0?
>> >>> >>>> >>>
>> >>> >>>> >>> On 4 December 2017 at 09:54, Maxim Uvarov <maxim.uva...@linaro.org> wrote:
>> >>> >>>> >>>> after clean patches apply and fix in run scripts I made it run.
>> >>> >>>> >>>>
>> >>> >>>> >>>> But results is really bad. --enable-dpdk-zero-copy
>> >>> >>>> >>>>
>> >>> >>>> >>>> TX rate is:
>> >>> >>>> >>>> 7673155 pps
>> >>> >>>> >>>>
>> >>> >>>> >>>> RX rate is:
>> >>> >>>> >>>> 5989846 pps
>> >>> >>>> >>>>
>> >>> >>>> >>>> Before patch PR 313 TX was 10M pps.
>> >>> >>>> >>>>
>> >>> >>>> >>>> I re run task and TX is 3.3M pps. All tests are single core.
>> >>> >>>> >>>> So something strange happens in lava or this PR.
>> >>> >>>> >>>>
>> >>> >>>> >>>> Maxim.
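
On the point above that the generator only needs the checksum computed once for an unchanging packet template: a minimal sketch of what that one-time software IPv4 header checksum amounts to (the RFC 1071 ones' complement sum). This is not the odp_generator implementation; `inet_csum()` and `ipv4_set_csum()` are hypothetical helper names, and the header is treated as a raw byte array with the checksum field at offsets 10 and 11.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Ones' complement Internet checksum (RFC 1071) over a byte buffer.
 * Bytes are combined as big-endian 16-bit words; the result is the
 * value to store in the checksum field, high byte first. */
static uint16_t inet_csum(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)data[i] << 8) | data[i + 1];
	if (len & 1) /* odd trailing byte is padded with zero */
		sum += (uint32_t)data[len - 1] << 8;
	while (sum >> 16) /* fold carries back into the low 16 bits */
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Fill in the checksum of an IPv4 header template once, before the TX
 * loop starts. hdr_len is the header length in bytes (at least 20). */
static void ipv4_set_csum(uint8_t *hdr, size_t hdr_len)
{
	uint16_t c;

	assert(hdr != NULL && hdr_len >= 20);
	hdr[10] = 0; /* checksum field must be zero while summing */
	hdr[11] = 0;
	c = inet_csum(hdr, hdr_len);
	hdr[10] = (uint8_t)(c >> 8);
	hdr[11] = (uint8_t)(c & 0xff);
}
```

Since this runs once per template rather than once per packet, its cost should not show up in the per-packet TX rate, which fits the observation that the large difference comes from the driver path selected when offloads are enabled.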
>> >>> >>>> >>>>
>> >>> >>>> >>>> On 12/04/17 17:03, Bogdan Pricope wrote:
>> >>> >>>> >>>>> On TX (https://lng.validation.linaro.org/scheduler/job/23252.0)
>> >>> >>>> >>>>> I see:
>> >>> >>>> >>>>>
>> >>> >>>> >>>>> ODP_REPO='https://github.com/muvarov/odp'
>> >>> >>>> >>>>> ODP_BRANCH='api-next'
>> >>> >>>> >>>>>
>> >>> >>>> >>>>> On RX (https://lng.validation.linaro.org/scheduler/job/23252.1)
>> >>> >>>> >>>>> I see:
>> >>> >>>> >>>>>
>> >>> >>>> >>>>> ODP_REPO='https://github.com/muvarov/odp'
>> >>> >>>> >>>>> ODP_BRANCH='devel/api-next_shsum'
>> >>> >>>> >>>>>
>> >>> >>>> >>>>> or are you referring to other test?
>> >>> >>>> >>>>>
>> >>> >>>> >>>>> On 4 December 2017 at 15:53, Maxim Uvarov
>> >>> >>>> >>>>> <maxim.uva...@linaro.org> wrote:
>> >>> >>>> >>>>>>
>> >>> >>>> >>>>>> On 4 December 2017 at 15:11, Bogdan Pricope
>> >>> >>>> >>>>>> <bogdan.pric...@linaro.org> wrote:
>> >>> >>>> >>>>>>>
>> >>> >>>> >>>>>>> You need to put 313 on TX side (not RX).
>> >>> >>>> >>>>>>
>> >>> >>>> >>>>>> both rx and tx have patches from 313. l2fwd works on recv side.
>> >>> >>>> >>>>>> Generator does not work.
>> >>> >>>> >>>>>>
>> >>> >>>> >>>>>> Maxim.
>> >>> >>>> >>>>>>
>> >>> >>>> >>>>>>>
>> >>> >>>> >>>>>>> On 4 December 2017 at 13:19, Savolainen, Petri (Nokia - FI/Espoo)
>> >>> >>>> >>>>>>> <petri.savolai...@nokia.com> wrote:
>> >>> >>>> >>>>>>>> Is the DPDK version 17.08 ? Other versions might not work
>> >>> >>>> >>>>>>>> properly.
>> >>> >>>> >>>>>>>>
>> >>> >>>> >>>>>>>> -Petri
>> >>> >>>> >>>>>>>>
>> >>> >>>> >>>>>>>> From: Maxim Uvarov [mailto:maxim.uva...@linaro.org]
>> >>> >>>> >>>>>>>> Sent: Monday, December 04, 2017 1:10 PM
>> >>> >>>> >>>>>>>> To: Savolainen, Petri (Nokia - FI/Espoo) <petri.savolai...@nokia.com>
>> >>> >>>> >>>>>>>> Cc: Bogdan Pricope <bogdan.pric...@linaro.org>; lng-odp-forward <lng-odp@lists.linaro.org>
>> >>> >>>> >>>>>>>> Subject: Re: [lng-odp] odp dpdk
>> >>> >>>> >>>>>>>>
>> >>> >>>> >>>>>>>> 313 does not work also:
>> >>> >>>> >>>>>>>>
>> >>> >>>> >>>>>>>> https://lng.validation.linaro.org/scheduler/job/23242.1
>> >>> >>>> >>>>>>>>
>> >>> >>>> >>>>>>>> I will replace RX side to l2fwd and see that will be there.
>> >>> >>>> >>>>>>>>
>> >>> >>>> >>>>>>>> Maxim.
>> >>> >>>> >>>>>>>>
>> >>> >>>> >>>>>>>> On 4 December 2017 at 13:46, Savolainen, Petri (Nokia - FI/Espoo)
>> >>> >>>> >>>>>>>> <petri.savolai...@nokia.com> wrote:
>> >>> >>>> >>>>>>>>
>> >>> >>>> >>>>>>>> Maxim, try https://github.com/Linaro/odp/pull/313 It has been
>> >>> >>>> >>>>>>>> tested to fix checksum insert for 10/40GE Intel NICs.
>> >>> >>>> >>>>>>>>
>> >>> >>>> >>>>>>>> -Petri
>> >>> >>>> >>>>>>>>
>> >>> >>>> >>>>>>>>> -----Original Message-----
>> >>> >>>> >>>>>>>>> From: lng-odp [mailto:lng-odp-boun...@lists.linaro.org] On Behalf Of
>> >>> >>>> >>>>>>>>> Bogdan Pricope
>> >>> >>>> >>>>>>>>> Sent: Monday, December 04, 2017 12:21 PM
>> >>> >>>> >>>>>>>>> To: Maxim Uvarov <maxim.uva...@linaro.org>
>> >>> >>>> >>>>>>>>> Cc: lng-odp-forward <lng-odp@lists.linaro.org>
>> >>> >>>> >>>>>>>>> Subject: Re: [lng-odp] odp dpdk
>> >>> >>>> >>>>>>>>>
>> >>> >>>> >>>>>>>>> I suspect this is actually caused by csum issue in TX side: on
>> >>> >>>> >>>>>>>>> RX, socket pktio does not validate csum (and accept the packets)
>> >>> >>>> >>>>>>>>> but on dpdk pktio the csum is validated and packets are dropped.
>> >>> >>>> >>>>>>>>>
>> >>> >>>> >>>>>>>>> I am not seeing this in my setup because default txq_flags for
>> >>> >>>> >>>>>>>>> igb driver (1G interface) is
>> >>> >>>> >>>>>>>>>
>> >>> >>>> >>>>>>>>> .txq_flags = 0
>> >>> >>>> >>>>>>>>>
>> >>> >>>> >>>>>>>>> while for ixgbe (10G interface) is:
>> >>> >>>> >>>>>>>>>
>> >>> >>>> >>>>>>>>> .txq_flags = ETH_TXQ_FLAGS_NOMULTSEGS |
>> >>> >>>> >>>>>>>>> ETH_TXQ_FLAGS_NOOFFLOADS,
>> >>> >>>> >>>>>>>>>
>> >>> >>>> >>>>>>>>> /B
>> >>> >>>> >>>>>>>>>
>> >>> >>>> >>>>>>>>> On 1 December 2017 at 23:47, Maxim Uvarov
>> >>> >>>> >>>>>>>>> <maxim.uva...@linaro.org> wrote:
>> >>> >>>> >>>>>>>>>>
>> >>> >>>> >>>>>>>>>> Looking to dpdk pktio support and generator. It looks like
>> >>> >>>> >>>>>>>>>> receive part is broken.
>> >>> >>>> >>>>>>>>>> If for receive I use sockets it works well but receive with
>> >>> >>>> >>>>>>>>>> dpdk does not get any packets. For both master and api-next.
>> >>> >>>> >>>>>>>>>> Can somebody confirm please that it's so. Lava is not supper
>> >>> >>>> >>>>>>>>>> friendly to debug issue.
>> >>> >>>> >>>>>>>>>>
>> >>> >>>> >>>>>>>>>> 1. Recv
>> >>> >>>> >>>>>>>>>> odp_generator -I 0 -m r -c 0x4
>> >>> >>>> >>>>>>>>>>
>> >>> >>>> >>>>>>>>>> https://lng.validation.linaro.org/scheduler/job/23206.1
>> >>> >>>> >>>>>>>>>> Network devices using DPDK-compatible driver
>> >>> >>>> >>>>>>>>>> ============================================
>> >>> >>>> >>>>>>>>>> 0000:07:00.1 '82599ES 10-Gigabit SFI/SFP+ Network Connection 10fb' drv=igb_uio unused=
>> >>> >>>> >>>>>>>>>>
>> >>> >>>> >>>>>>>>>> 2. Send
>> >>> >>>> >>>>>>>>>> odp_generator -I 0 --srcmac 38:ea:a7:93:98:94 --dstmac 38:ea:a7:93:83:a0 --srcip 192.168.100.2 --dstip 192.168.100.1 -m u -i 0 -c 0x8 -p 18 -e 5000 -f 5001 -n 800000000
>> >>> >>>> >>>>>>>>>>
>> >>> >>>> >>>>>>>>>> https://lng.validation.linaro.org/scheduler/job/23206.0
>> >>> >>>> >>>>>>>>>>
>> >>> >>>> >>>>>>>>>> Thank you,
>> >>> >>>> >>>>>>>>>> Maxim.