[dpdk-dev] [PATCH] The VMXNET3 PMD can't receive packet suddenly after a lot of traffic coming in. The root cause is due to mbuf allocation fail in vmxnet3_post_rx_bufs() and there is no error handlin
How about the changes below? I have been using them, and they help resolve the issue.

===
= dpdk/lib/librte_pmd_vmxnet3/vmxnet3_ring.h#3 edit (text) =
@@ -155,10 +155,11 @@ typedef struct vmxnet3_tx_queue {
 struct vmxnet3_rxq_stats {
     uint64_t drop_total;
     uint64_t drop_err;
     uint64_t drop_fcs;
     uint64_t rx_buf_alloc_failure;
+    uint64_t rx_buf_replenish;
 };

 typedef struct vmxnet3_rx_queue {
     struct rte_mempool *mp;
     struct vmxnet3_hw *hw;
= dpdk/lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c#5 edit (text) =
@@ -645,10 +645,32 @@ rcd_done:
             break;
         }
     }
 }
+/* VMXNET3
+ * In the above loop, vmxnet3_post_rx_bufs would fail if all the mbufs are currently allocated.
+ * In such scenarios, where the hw device is left without any 'rx' descriptors, packets from
+ * the network will not be DMA'd to the driver. The only way to refresh 'rxd' back to hw is
+ * through the loop above, i.e. when a packet is received from hw. So there is a potential deadlock.
+ *
+ * To break the deadlock, vmxnet3_post_rx_bufs() is triggered below when the poll
+ * finds an empty 'rcd'. vmxnet3_post_rx_bufs() is a no-op if all the descriptors are allocated
+ * in hw.
+ */
+
+if (rcd->gen != rxq->comp_ring.gen) {
+    ring_idx = (uint8_t)((rcd->rqID == rxq->qid1) ? 0 : 1);
+    if (vmxnet3_post_rx_bufs(rxq, ring_idx) > 0) {
+        if (unlikely(rxq->shared->ctrl.updateRxProd)) {
+            VMXNET3_WRITE_BAR0_REG(hw, rxprod_reg[ring_idx] + (rxq->queue_id * VMXNET3_REG_ALIGN),
+                                   rxq->cmd_ring[ring_idx].next2fill);
+        }
+        rxq->stats.rx_buf_replenish++;
+    }
+}
+
 return (nb_rx);
 }
===

-----Original Message-----
From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Stephen Hemminger
Sent: 23 July 2015 AM 10:58
To: mac_leehk at yahoo.com.hk
Cc: dev at dpdk.org
Subject: Re: [dpdk-dev] [PATCH] The VMXNET3 PMD can't receive packet suddenly after a lot of traffic coming in. The root cause is due to mbuf allocation fail in vmxnet3_post_rx_bufs() and there is no error handling when it is called from vmxnet3_recv_pkts(). Th...
On Thu, 23 Jul 2015 09:48:55 +0800 mac_leehk at yahoo.com.hk wrote:

> From: marco

Thank you for addressing a real bug.

But there are several issues with the patch as submitted:

 * the standard way to handle allocation failure in network drivers is to drop the received packet and reuse the available data buffer (mbuf) for the next packet. It looks like your code would just stop receiving, which could cause deadlock.

 * the mail is formatted in a manner that is incompatible with merging into git. All submissions should have a short (< 60 character) Subject with a summary, followed by a description. I don't know what mail client you used, but everything is smashed into the Subject.

 * all patches require a Signed-off-by with a real name for the Developer's Certificate Of Origin.

 * the style is wrong and the indentation is a mess; please indent with tabs, not spaces.

 * avoid extra comments; in code, too many comments are often worse than too few.

Please rework your patch and resubmit it.

> drivers/net/vmxnet3/vmxnet3_rxtx.c | 54 +++-
> 1 file changed, 53 insertions(+), 1 deletion(-)
> mode change 100644 => 100755 drivers/net/vmxnet3/vmxnet3_rxtx.c
>
> diff --git a/drivers/net/vmxnet3/vmxnet3_rxtx.c b/drivers/net/vmxnet3/vmxnet3_rxtx.c
> old mode 100644
> new mode 100755
> index 39ad6ef..d560bbb
> --- a/drivers/net/vmxnet3/vmxnet3_rxtx.c
> +++ b/drivers/net/vmxnet3/vmxnet3_rxtx.c
> @@ -421,6 +421,51 @@ vmxnet3_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
>      return nb_tx;
>  }
>
> +static inline void
> +vmxnet3_renew_desc(vmxnet3_rx_queue_t *rxq, uint8_t ring_id,struct
> +rte_mbuf *mbuf) {
> +    uint32_t val = 0;
> +    struct vmxnet3_cmd_ring *ring = &rxq->cmd_ring[ring_id];
> +
> +    struct Vmxnet3_RxDesc *rxd;
> +    vmxnet3_buf_info_t *buf_info = &ring->buf_info[ring->next2fill];
> +
> +    rxd = (struct Vmxnet3_RxDesc *)(ring->base + ring->next2fill);
> +
> +    if (ring->rid == 0) {
> +        /* Usually: One HEAD type buf per packet
> +         * val = (ring->next2fill % rxq->hw->bufs_per_pkt) ?
> +         * VMXNET3_RXD_BTYPE_BODY : VMXNET3_RXD_BTYPE_HEAD;
> +         */
> +
> +        /* We use single packet buffer so all heads here */
> +        val = VMXNET3_RXD_BTYPE_HEAD;
> +    } else {
> +        /* All BODY type buffers for 2nd ring; which won't be used at all by ESXi *
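
The review's first point (on allocation failure, drop the received packet and reuse its buffer) can be sketched roughly as follows. This is a toy model, not vmxnet3 code: `struct buf`, `struct pool`, `struct rx_ring`, `rx_one()` and `demo_rx_drops()` are all hypothetical names made up for illustration, and the "device" is just an array of slots.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy types; these are NOT the vmxnet3 PMD structures. */
#define RING_SIZE 4

struct buf { int id; };

struct pool {
    struct buf *free_list[8];
    int n_free;
};

/* Returns a free buffer, or NULL when the pool is exhausted. */
static struct buf *pool_alloc(struct pool *p)
{
    return p->n_free > 0 ? p->free_list[--p->n_free] : NULL;
}

struct rx_ring {
    struct buf *slot[RING_SIZE]; /* buffers currently posted to the "device" */
    unsigned next;               /* next slot the device completes */
    unsigned long drops;         /* packets dropped because the pool was empty */
};

/*
 * Consume one completed slot. If a replacement buffer is available, hand the
 * filled buffer to the caller and post the replacement. If not, drop the
 * packet and leave the same buffer posted, so the ring never runs dry.
 */
static struct buf *rx_one(struct rx_ring *r, struct pool *p)
{
    unsigned i = r->next;
    struct buf *filled = r->slot[i];
    struct buf *fresh = pool_alloc(p);

    assert(filled != NULL);
    r->next = (i + 1) % RING_SIZE;
    if (fresh == NULL) {
        r->drops++;      /* slot[i] still holds 'filled': buffer is recycled */
        return NULL;
    }
    r->slot[i] = fresh;
    return filled;
}

static int ring_is_full(const struct rx_ring *r)
{
    for (int i = 0; i < RING_SIZE; i++)
        if (r->slot[i] == NULL)
            return 0;
    return 1;
}

/* Post 4 buffers, leave 2 in the pool, receive 5 packets without freeing. */
static unsigned long demo_rx_drops(void)
{
    static struct buf bufs[RING_SIZE + 2];
    struct pool p = { .n_free = 0 };
    struct rx_ring r = { .next = 0, .drops = 0 };

    for (int i = 0; i < RING_SIZE; i++)
        r.slot[i] = &bufs[i];
    p.free_list[p.n_free++] = &bufs[RING_SIZE];
    p.free_list[p.n_free++] = &bufs[RING_SIZE + 1];

    for (int i = 0; i < 5; i++)
        (void)rx_one(&r, &p); /* caller never frees, so the pool drains */

    return ring_is_full(&r) ? r.drops : (unsigned long)-1;
}
```

In this model the ring stays fully posted however long the pool stays empty, so reception resumes by itself once buffers are freed; the cost is counted drops rather than a stalled queue.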
[dpdk-dev] KNI performance numbers...
Hi,

I am running the DPDK KNI application on a Linux (3.18 kernel) VM (ESXi 5.5), directly connected to another Linux box, to measure throughput using the iperf tool. Link speed: 1 Gbps. The maximum throughput I get is 50% with 1470-byte packets. With 512-byte packets, throughput drops to 282 Mbps. I tried the KNI loopback modes (with traffic from Ixia), but there was no change in throughput. KNI is running in single-thread mode: one lcore for rx, one for tx and another for the KNI thread.

Is this result expected? Has anybody got better numbers? I would appreciate any input and relevant info.

Thanks,
-Vithal
[dpdk-dev] Support for MS Hyper-v hypervisor
Hi,

I would like to know whether the currently released versions of DPDK (1.7.x) support the MS Hyper-V hypervisor. The DPDK roadmap document lists it for version 2.1. Please confirm whether we need to wait for the release of DPDK 2.1 for this support.

Thanks,
-Vithal
[dpdk-dev] rte_memcpy optimization patch to dpdk ver 1.7
Hi Neil,

While most of the newer CPUs support SSSE3, I found an i7 that does not report it. So DPDK can't run on these CPUs? Is this restriction acceptable?

-sh-3.2$ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 26
model name      : Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz
stepping        : 5
cpu MHz         : 2660.068
cache size      : 8192 KB
physical id     : 0
siblings        : 8
core id         : 0
cpu cores       : 4
apicid          : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx rdtscp lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr popcnt lahf_lm

Thanks,
-Vithal

-----Original Message-----
From: Neil Horman [mailto:nhor...@tuxdriver.com]
Sent: Thursday, February 19, 2015 6:13 PM
To: Vithal S Mohare
Cc: dev at dpdk.org
Subject: Re: [dpdk-dev] rte_memcpy optimization patch to dpdk ver 1.7

On Wed, Feb 18, 2015 at 04:09:25AM +0000, Vithal S Mohare wrote:
> Ok, crash, as expected. So, now dpdk mandates either AVX2 or SSSE2 supported CPUs. OR applications needs to handle it run-time.
>
No, sse3 is the minimum, but I think that's been the case for quite some time now.

Neil

> Thanks,
> -Vithal
>
> -----Original Message-----
> From: Neil Horman [mailto:nhorman at tuxdriver.com]
> Sent: Tuesday, February 17, 2015 6:32 PM
> To: Vithal S Mohare
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] rte_memcpy optimization patch to dpdk ver 1.7
>
> On Tue, Feb 17, 2015 at 08:39:22AM +, Vithal S Mohare wrote:
> > Hi,
> >
> > I am trying to use rte_memcpy optimization patch along with dpdk version 1.7. With the patch, while dpdk itself is compiled, applications failed with below error:
> > ---
> > include/rte_memcpy.h:629:2: error: implicit declaration of function '_mm_alignr_epi8' [-Werror=implicit-function-declaration]
> > /home/vithals/adu_src/build/x-men_dev/Default/shumway/infra/dpdk/shumway_obj/lib/../include/rte_memcpy.h:629:2: error: incompatible type for argument 2 of '_mm_storeu_si128'
> > ---
> >
> > After including -mssse3 flags, compilation (cross compiled for a x86 linux based platform) went through. Now the question is, when this binary is loaded on system that doesn't support SSSE3 instruction set (but just sse2 etc), what would be the behavior?
> >
> A crash. You'll attempt to send an unknown binary instruction into the execution pipeline and the processor will fault.
>
> Neil
[dpdk-dev] rte_memcpy optimization patch to dpdk ver 1.7
OK, a crash, as expected. So DPDK now mandates CPUs that support either AVX2 or SSSE3, or applications need to handle it at run time.

Thanks,
-Vithal

-----Original Message-----
From: Neil Horman [mailto:nhor...@tuxdriver.com]
Sent: Tuesday, February 17, 2015 6:32 PM
To: Vithal S Mohare
Cc: dev at dpdk.org
Subject: Re: [dpdk-dev] rte_memcpy optimization patch to dpdk ver 1.7

On Tue, Feb 17, 2015 at 08:39:22AM +, Vithal S Mohare wrote:
> Hi,
>
> I am trying to use rte_memcpy optimization patch along with dpdk version 1.7. With the patch, while dpdk itself is compiled, applications failed with below error:
> ---
> include/rte_memcpy.h:629:2: error: implicit declaration of function '_mm_alignr_epi8' [-Werror=implicit-function-declaration]
> /home/vithals/adu_src/build/x-men_dev/Default/shumway/infra/dpdk/shumway_obj/lib/../include/rte_memcpy.h:629:2: error: incompatible type for argument 2 of '_mm_storeu_si128'
> ---
>
> After including -mssse3 flags, compilation (cross compiled for a x86 linux based platform) went through. Now the question is, when this binary is loaded on system that doesn't support SSSE3 instruction set (but just sse2 etc), what would be the behavior?
>
A crash. You'll attempt to send an unknown binary instruction into the execution pipeline and the processor will fault.

Neil
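
For the "handle it at run time" option mentioned in this thread, one possible shape is a startup check that refuses to continue instead of faulting later. The sketch below is an illustration under assumptions, not something taken from the rte_memcpy patch: it relies on the GCC/Clang x86 builtins `__builtin_cpu_init()` and `__builtin_cpu_supports()`, and `cpu_has_ssse3()` and `require_ssse3()` are hypothetical helper names.

```c
#include <stdio.h>

/*
 * Returns 1 if the running CPU reports SSSE3, 0 otherwise.
 * __builtin_cpu_supports() is a GCC/Clang builtin for x86 targets;
 * with other compilers or architectures this sketch simply reports 0.
 */
static int cpu_has_ssse3(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("ssse3") ? 1 : 0;
#else
    return 0;
#endif
}

/*
 * Intended to be called early in main(), before any code path that was
 * built with -mssse3 runs. Returns 0 when it is safe to continue, -1 if not.
 */
static int require_ssse3(void)
{
    if (!cpu_has_ssse3()) {
        fprintf(stderr,
                "this binary needs SSSE3, which the CPU does not report\n");
        return -1;
    }
    return 0;
}
```

One caveat: if the whole program is compiled with -mssse3, the compiler is free to emit SSSE3 instructions anywhere, including before the check. Keeping the check in a translation unit built without that flag is the safer arrangement. DPDK also has its own CPU flag query in rte_cpuflags.h, which may be a better fit inside a DPDK application.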
[dpdk-dev] rte_memcpy optimization patch to dpdk ver 1.7
Hi,

I am trying to use the rte_memcpy optimization patch with DPDK version 1.7. With the patch, DPDK itself compiles, but applications fail with the error below:
---
include/rte_memcpy.h:629:2: error: implicit declaration of function '_mm_alignr_epi8' [-Werror=implicit-function-declaration]
/home/vithals/adu_src/build/x-men_dev/Default/shumway/infra/dpdk/shumway_obj/lib/../include/rte_memcpy.h:629:2: error: incompatible type for argument 2 of '_mm_storeu_si128'
---

After adding the -mssse3 flag, compilation (cross-compiled for an x86 Linux-based platform) went through. Now the question is: when this binary is loaded on a system that doesn't support the SSSE3 instruction set (but just SSE2 etc.), what would the behavior be?

Thanks,
-Vithal
[dpdk-dev] Callbacks after buffer (mbuf) sent out
To elaborate more...

Taking the vmxnet3 driver as an example: at present, when rte_eth_tx_burst() is called to send mbufs, the vmxnet3 driver calls vmxnet3_tq_tx_complete() to free completed buffers. Instead, I am wondering whether vmxnet3 (or the corresponding driver) could trigger an application callback for each buffer (mbuf) that has been transmitted out of the network interface. The application callback could then decide whether to reuse the mbuf and continue multicasting on other tunnels.

Thanks,
-Vithal

-----Original Message-----
From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Vithal S Mohare
Sent: Tuesday, January 13, 2015 9:21 AM
To: dev at dpdk.org
Subject: [dpdk-dev] Callbacks after buffer (mbuf) sent out

Hi,

I am looking for application callbacks after mbufs are sent (tx) out successfully. One of the use cases is for async multicast (over different gre tunnels etc). Using direct/indirect buffers along with ref-count itself is not enough, as actual 'pkt-data' itself changes while flooding on list of tunnels.

Thanks,
-Vithal
[dpdk-dev] Callbacks after buffer (mbuf) sent out
Hi,

I am looking for application callbacks that fire after mbufs have been sent (tx) out successfully. One of the use cases is async multicast (over different GRE tunnels etc.). Using direct/indirect buffers along with the ref-count is not enough by itself, as the actual 'pkt-data' changes while flooding over the list of tunnels.

Thanks,
-Vithal
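
To make the request above concrete, here is a rough sketch of what a tx-completion callback could look like from the application side. It is a toy model only: `struct txq`, `txq_send()`, `txq_complete()` and the multicast helper are hypothetical names, and nothing here is an existing DPDK or vmxnet3 interface. The point is the control flow: the queue hands each completed buffer to a callback instead of freeing it, and the callback either rewrites and resends it or lets it go.

```c
#include <assert.h>
#include <stddef.h>

#define TXQ_SIZE 8

/* Hypothetical packet: only records which tunnel this copy was sent on. */
struct pkt { int tunnel; };

typedef void (*tx_done_cb)(struct pkt *p, void *arg);

struct txq {
    struct pkt *inflight[TXQ_SIZE];
    unsigned head, tail;  /* head: next free slot; tail: next to complete */
    tx_done_cb on_done;   /* invoked instead of freeing the buffer */
    void *cb_arg;
};

static int txq_send(struct txq *q, struct pkt *p)
{
    if (q->head - q->tail == TXQ_SIZE)
        return -1; /* queue full */
    q->inflight[q->head++ % TXQ_SIZE] = p;
    return 0;
}

/* Mimic "hardware finished up to n descriptors": pass each to the callback. */
static unsigned txq_complete(struct txq *q, unsigned n)
{
    unsigned done = 0;

    assert(q->on_done != NULL);
    while (done < n && q->tail != q->head) {
        struct pkt *p = q->inflight[q->tail++ % TXQ_SIZE];
        q->on_done(p, q->cb_arg);
        done++;
    }
    return done;
}

struct mcast { struct txq *q; int n_tunnels; int sent; };

/* Completion callback: reuse the same buffer for the next tunnel, if any. */
static void mcast_next(struct pkt *p, void *arg)
{
    struct mcast *m = arg;

    m->sent++;
    if (p->tunnel + 1 < m->n_tunnels) {
        p->tunnel++; /* a real app would rewrite the outer header here */
        (void)txq_send(m->q, p);
    }
    /* else: last tunnel done; a real app would free the buffer here */
}

/* Flood one packet over n_tunnels, one tunnel per completion. */
static int demo_mcast(int n_tunnels)
{
    struct txq q = { .head = 0, .tail = 0 };
    struct mcast m = { .q = &q, .n_tunnels = n_tunnels, .sent = 0 };
    struct pkt p = { .tunnel = 0 };

    q.on_done = mcast_next;
    q.cb_arg = &m;
    (void)txq_send(&q, &p);
    while (txq_complete(&q, 1) > 0)
        ;
    return m.sent;
}
```

Because the buffer is only rewritten after its previous copy has completed, a single buffer can serve every tunnel in turn, which is the case that ref-counting alone does not cover.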
[dpdk-dev] [PATCH RFC 0/3] DPDK ethdev callback support
Agreed. As the mbuf has already been received in the rx-q, it may not yield a great advantage. On a side note, are there any plans to support RSS for L2 packets?

-----Original Message-----
From: Bruce Richardson [mailto:bruce.richard...@intel.com]
Sent: Tuesday, December 23, 2014 3:00 PM
To: Vithal S Mohare
Cc: dev at dpdk.org
Subject: Re: [dpdk-dev] [PATCH RFC 0/3] DPDK ethdev callback support

On Tue, Dec 23, 2014 at 04:23:21AM +, Vithal S Mohare wrote:
> Hi Bruce,
>
> > For example, for a port type that does not support RSS, a callback on RX can be configured to calculate a hash in software.
>
> Wondering if this callback will also be useful to bridge the gap of no RSS support for L2 packets. i.e. in the rx call-back handler, can applications calculate hash and feed it back so that spraying happens based on this? Now, all pure L2 packets (e.g. arp pkts) comes to rx-q 0 of the 'port'. Adding callback to [port][rx-q:0] would help?
>
> Thanks,
> -Vithal

Yes, that could work. The downside is that it is no faster than having an app do the calculation itself, it's just perhaps a little easier to work with in the app.

/Bruce

>
> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Bruce Richardson
> Sent: Monday, December 22, 2014 10:17 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH RFC 0/3] DPDK ethdev callback support
>
> This RFC is for a small addition to the ethdev library, to add in support for callbacks at the RX and TX stages. This allows packet processing to be done on packets before they get returned to applications using rte_eth_rx_burst call.
>
> Use case: the first use case for this is to enable a consistent set of packet mbufs to be received by applications irrespective of the NIC used to receive those. For example, for a port type that does not support RSS, a callback on RX can be configured to calculate a hash in software. Similarly, this mechanism can be used to add other information to mbufs as they are received, such as timestamps or sequence numbers, without cluttering up the main packet processing path with checks for whether packets have these fields filled in or not.
> A second use case is ease of instrumenting existing code. The example application shows how combining a timestamp insertion callback on RX can be paired with a latency calculation callback on TX to easily instrument any application for packet latency.
> A third use case is to potentially extend existing NIC capabilities beyond what is currently supported. For example, where flow director capabilities can match up to a certain limit of flows - in the thousands, in the case of NICs using the ixgbe driver - a callback can extend this to potentially millions of flows by using a software hash table lookup inline for packets that miss the hardware lookup filters. It would all appear transparent to the packet handling code in the main application.
>
> Future extensions: in future the ethdev library can be extended to provide a standard set of callbacks for use by drivers.
>
> For now this patch set is RFC and still needs additional work for creating a remove function for callbacks and to add in additional testing code.
> Since this adds in new code into the critical data path, I have run some performance tests using testpmd with the ixgbe vector drivers (i.e. the fastest, fast-path we have :-) ). Performance drops due to this patch seem minimal to non-existent, rough tests on my system indicate a drop of perhaps 1%.
>
> All feedback welcome.
>
> Bruce Richardson (3):
>   ethdev: rename callbacks field to intr_cbs
>   ethdev: Add in data rxtx callback support
>   examples: example showing use of callbacks.
>
>  app/test/virtual_pmd.c                 |   2 +-
>  examples/rxtx_callbacks/Makefile       |  57 +
>  examples/rxtx_callbacks/basicfwd.c     | 222 +
>  examples/rxtx_callbacks/basicfwd.h     |  46 +++
>  lib/librte_ether/rte_ethdev.c          | 103 +--
>  lib/librte_ether/rte_ethdev.h          | 125 ++-
>  lib/librte_pmd_bond/rte_eth_bond_api.c |   2 +-
>  7 files changed, 543 insertions(+), 14 deletions(-)
>  create mode 100644 examples/rxtx_callbacks/Makefile
>  create mode 100644 examples/rxtx_callbacks/basicfwd.c
>  create mode 100644 examples/rxtx_callbacks/basicfwd.h
>
> --
> 1.9.3
>
[dpdk-dev] [PATCH RFC 0/3] DPDK ethdev callback support
Hi Bruce,

> For example, for a port type that does not support RSS, a callback on RX can be configured to calculate a hash in software.

I am wondering whether this callback would also be useful to bridge the gap of no RSS support for L2 packets, i.e. in the rx callback handler, can applications calculate a hash and feed it back so that spraying happens based on it? At present, all pure L2 packets (e.g. ARP packets) arrive on rx-q 0 of the 'port'. Would adding a callback to [port][rx-q:0] help?

Thanks,
-Vithal

-----Original Message-----
From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Bruce Richardson
Sent: Monday, December 22, 2014 10:17 PM
To: dev at dpdk.org
Subject: [dpdk-dev] [PATCH RFC 0/3] DPDK ethdev callback support

This RFC is for a small addition to the ethdev library, to add in support for callbacks at the RX and TX stages. This allows packet processing to be done on packets before they get returned to applications using rte_eth_rx_burst call.

Use case: the first use case for this is to enable a consistent set of packet mbufs to be received by applications irrespective of the NIC used to receive those. For example, for a port type that does not support RSS, a callback on RX can be configured to calculate a hash in software. Similarly, this mechanism can be used to add other information to mbufs as they are received, such as timestamps or sequence numbers, without cluttering up the main packet processing path with checks for whether packets have these fields filled in or not.
A second use case is ease of instrumenting existing code. The example application shows how combining a timestamp insertion callback on RX can be paired with a latency calculation callback on TX to easily instrument any application for packet latency.
A third use case is to potentially extend existing NIC capabilities beyond what is currently supported. For example, where flow director capabilities can match up to a certain limit of flows - in the thousands, in the case of NICs using the ixgbe driver - a callback can extend this to potentially millions of flows by using a software hash table lookup inline for packets that miss the hardware lookup filters. It would all appear transparent to the packet handling code in the main application.

Future extensions: in future the ethdev library can be extended to provide a standard set of callbacks for use by drivers.

For now this patch set is RFC and still needs additional work for creating a remove function for callbacks and to add in additional testing code.
Since this adds in new code into the critical data path, I have run some performance tests using testpmd with the ixgbe vector drivers (i.e. the fastest, fast-path we have :-) ). Performance drops due to this patch seem minimal to non-existent, rough tests on my system indicate a drop of perhaps 1%.

All feedback welcome.

Bruce Richardson (3):
  ethdev: rename callbacks field to intr_cbs
  ethdev: Add in data rxtx callback support
  examples: example showing use of callbacks.

 app/test/virtual_pmd.c                 |   2 +-
 examples/rxtx_callbacks/Makefile       |  57 +
 examples/rxtx_callbacks/basicfwd.c     | 222 +
 examples/rxtx_callbacks/basicfwd.h     |  46 +++
 lib/librte_ether/rte_ethdev.c          | 103 +--
 lib/librte_ether/rte_ethdev.h          | 125 ++-
 lib/librte_pmd_bond/rte_eth_bond_api.c |   2 +-
 7 files changed, 543 insertions(+), 14 deletions(-)
 create mode 100644 examples/rxtx_callbacks/Makefile
 create mode 100644 examples/rxtx_callbacks/basicfwd.c
 create mode 100644 examples/rxtx_callbacks/basicfwd.h

--
1.9.3
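
As an illustration of the software-hash use case discussed in this thread, the sketch below runs a callback over a received burst and fills in a hash computed from the Ethernet header, which an application could then use to spread pure L2 packets (such as ARP) over worker cores. Everything here is a stand-in: `struct toy_mbuf`, `toy_rx_burst()`, `fill_l2_hash()` and `pick_worker()` are hypothetical names rather than the ethdev API from the RFC, and FNV-1a is just one convenient hash, not what any NIC computes for RSS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_HDR_LEN 14 /* dst MAC (6) + src MAC (6) + ethertype (2) */

/* Hypothetical stand-in for an mbuf: packet bytes plus a hash field. */
struct toy_mbuf {
    const uint8_t *data;
    uint16_t len;
    uint32_t hash;
};

typedef uint16_t (*rx_cb)(struct toy_mbuf **pkts, uint16_t nb, void *arg);

/* 32-bit FNV-1a over the first n bytes of the frame. */
static uint32_t l2_hash(const uint8_t *hdr, size_t n)
{
    uint32_t h = 2166136261u;

    for (size_t i = 0; i < n; i++) {
        h ^= hdr[i];
        h *= 16777619u;
    }
    return h;
}

/* RX callback: compute a software hash for every packet in the burst. */
static uint16_t fill_l2_hash(struct toy_mbuf **pkts, uint16_t nb, void *arg)
{
    (void)arg;
    for (uint16_t i = 0; i < nb; i++)
        if (pkts[i]->len >= ETH_HDR_LEN)
            pkts[i]->hash = l2_hash(pkts[i]->data, ETH_HDR_LEN);
    return nb;
}

/*
 * Stand-in for a burst receive: the packets are assumed to be in 'pkts'
 * already; the callback runs before the burst is returned to the caller.
 */
static uint16_t toy_rx_burst(struct toy_mbuf **pkts, uint16_t nb,
                             rx_cb cb, void *arg)
{
    return cb ? cb(pkts, nb, arg) : nb;
}

/* Map the software hash to one of nb_workers software queues. */
static unsigned pick_worker(const struct toy_mbuf *m, unsigned nb_workers)
{
    assert(nb_workers > 0);
    return m->hash % nb_workers;
}
```

As Bruce notes in his reply, the callback runs after the packet has already landed on rx-q 0, so this does not change which hardware queue receives the packet; it only gives the application a consistent hash field to distribute on.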
[dpdk-dev] Walk through a given mbuf-pool elements
Hi Konstantin,

Thanks for the reply. I was trying to find something more intuitive than rte_mempool_obj_iter(). Once a mempool has been created with its objects, shouldn't we be able to walk through them by passing just the 'mp' object, avoiding other params like vaddr, elt_size etc.?

For now, I can use the callback that is invoked during mempool creation itself.

Thanks,
-Vithal

-----Original Message-----
From: Ananyev, Konstantin [mailto:konstantin.anan...@intel.com]
Sent: Monday, December 15, 2014 3:42 PM
To: Vithal S Mohare; dev at dpdk.org
Subject: RE: Walk through a given mbuf-pool elements

Hi Vithal,

> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Vithal S Mohare
> Sent: Monday, December 15, 2014 5:08 AM
> To: dev at dpdk.org
> Subject: [dpdk-dev] Walk through a given mbuf-pool elements
>
> [Re-sending the mail after registering to dpdk.org]
> Team,
>
> I am looking for a code/api to walk through a dpdk mbuf-pool elements (similar to rte_mempool_walk() but for elements within mempool). Calling pkt_mbuf_alloc for 'n' elements and then _free is not an option. Rte_mempool_obj_itr() walks through, but does lot more than walking itself. Please suggest if anybody has better alternatives.

Not sure why rte_mempool_obj_iter() wouldn't work for you?
It just walks through all elements of the pool and for each calls a user-provided callback. Nothing else.
Konstantin

>
> Thanks,
> -Vithal
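
The interface asked for above (walk every element given only the pool and a callback) might look roughly like the following. This is a hypothetical toy pool, not rte_mempool: `struct toy_pool`, `toy_pool_walk()` and `count_cb()` are made-up names, and the sketch assumes the elements sit in one contiguous array, which a real mempool does not guarantee.

```c
#include <stddef.h>

/* Hypothetical pool: n fixed-size elements stored contiguously at base. */
struct toy_pool {
    void *base;
    size_t elt_size;
    unsigned n;
};

typedef void (*obj_cb)(void *obj, unsigned idx, void *arg);

/*
 * Call cb once per element. The pool itself carries the base address,
 * element size and count, so the caller passes only the pool and callback.
 * Returns the number of elements visited.
 */
static unsigned toy_pool_walk(struct toy_pool *tp, obj_cb cb, void *arg)
{
    char *p = tp->base;

    for (unsigned i = 0; i < tp->n; i++, p += tp->elt_size)
        cb(p, i, arg);
    return tp->n;
}

/* Example callback: count the elements seen. */
static void count_cb(void *obj, unsigned idx, void *arg)
{
    (void)obj;
    (void)idx;
    ++*(unsigned *)arg;
}
```

A caller would write something like `unsigned seen = 0; toy_pool_walk(&pool, count_cb, &seen);`. Note that such a walk visits every element whether or not it is currently allocated, so the callback cannot assume an element is idle.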
[dpdk-dev] Walk through a given mbuf-pool elements
[Re-sending the mail after registering to dpdk.org]

Team,

I am looking for code or an API to walk through the elements of a DPDK mbuf pool (similar to rte_mempool_walk(), but for the elements within a mempool). Calling pkt_mbuf_alloc for 'n' elements and then _free is not an option. rte_mempool_obj_iter() walks through them, but does a lot more than just walking. Please suggest any better alternatives.

Thanks,
-Vithal