[dpdk-dev] [PATCH] The VMXNET3 PMD can't receive packet suddenly after a lot of traffic coming in. The root cause is due to mbuf allocation fail in vmxnet3_post_rx_bufs() and there is no error handlin

2015-07-23 Thread Vithal S Mohare
How about the changes below? I have been using them, and they help to
resolve the issue.

===

= dpdk/lib/librte_pmd_vmxnet3/vmxnet3_ring.h#3 edit (text) =  

@@ -155,10 +155,11 @@ typedef struct vmxnet3_tx_queue {
 struct vmxnet3_rxq_stats {
uint64_t drop_total;
uint64_t drop_err;
uint64_t drop_fcs;
uint64_t rx_buf_alloc_failure;
+uint64_t rx_buf_replenish;
 };

 typedef struct vmxnet3_rx_queue {
struct rte_mempool  *mp;
struct vmxnet3_hw   *hw;

= dpdk/lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c#5 edit (text) =  

@@ -645,10 +645,32 @@ rcd_done:
break;
}
}
}

+/* VMXNET3
+ * In the above loop, vmxnet3_post_rx_bufs() would fail if all the mbufs
+ * are currently allocated.  In such scenarios, where the hw device is left
+ * without any 'rx' descriptors, packets from the network will not be DMA'd
+ * to the driver.  The only way to refresh 'rxd' back to hw is through the
+ * loop above, i.e. when a packet is received from hw.  So there is a
+ * potential deadlock.
+ *
+ * To break the deadlock, vmxnet3_post_rx_bufs() is triggered below when the
+ * poll finds an empty 'rcd'.  vmxnet3_post_rx_bufs() is a no-op if all the
+ * descriptors are already allocated in hw.
+ */
+
+if (rcd->gen != rxq->comp_ring.gen) {
+	ring_idx = (uint8_t)((rcd->rqID == rxq->qid1) ? 0 : 1);
+	if (vmxnet3_post_rx_bufs(rxq, ring_idx) > 0) {
+		if (unlikely(rxq->shared->ctrl.updateRxProd)) {
+			VMXNET3_WRITE_BAR0_REG(hw,
+				rxprod_reg[ring_idx] + (rxq->queue_id * VMXNET3_REG_ALIGN),
+				rxq->cmd_ring[ring_idx].next2fill);
+		}
+		rxq->stats.rx_buf_replenish++;
+	}
+}
+
return (nb_rx);
 }

===
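
For readers following the thread, the deadlock and the "refill on an idle poll" idea from the patch above can be reproduced with a toy model. This is only a minimal sketch: every toy_* name, RING_SIZE, and the counters are hypothetical stand-ins, not the real vmxnet3 structures or functions.

```c
#include <stddef.h>

#define RING_SIZE 4

/* Toy stand-ins only; these are NOT the vmxnet3 driver structures. */
struct toy_pool {
	int free_bufs;            /* buffers the application has given back */
};

struct toy_rxq {
	struct toy_pool *pool;
	int posted;               /* descriptors currently owned by the device */
	unsigned long replenish;  /* refills performed after the receive loop */
};

/* Post as many buffers as the pool allows; returns how many were posted. */
static int toy_post_rx_bufs(struct toy_rxq *q)
{
	int n = 0;

	while (q->posted < RING_SIZE && q->pool->free_bufs > 0) {
		q->pool->free_bufs--;
		q->posted++;
		n++;
	}
	return n;
}

/* Device side: a packet can only land in a posted descriptor. */
static int toy_device_deliver(struct toy_rxq *q, int *completed)
{
	if (q->posted == 0)
		return 0;         /* no descriptor available: packet is lost */
	q->posted--;
	(*completed)++;
	return 1;
}

/* Poll: consume completions, then try one more refill once the ring is idle. */
static int toy_recv_pkts(struct toy_rxq *q, int *completed)
{
	int nb_rx = 0;

	while (*completed > 0) {
		(*completed)--;
		nb_rx++;              /* buffer is handed to the application */
		toy_post_rx_bufs(q);  /* may post nothing if the pool is empty */
	}
	/* Without this step an empty ring could never be refilled. */
	if (toy_post_rx_bufs(q) > 0)
		q->replenish++;
	return nb_rx;
}
```

If the pool is exhausted while packets are being received, the ring drains to zero posted descriptors and the device can deliver nothing further. Once the application returns buffers, the next (empty) poll posts them and reception resumes.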

-Original Message-
From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Stephen Hemminger
Sent: 23 July 2015 AM 10:58
To: mac_leehk at yahoo.com.hk
Cc: dev at dpdk.org
Subject: Re: [dpdk-dev] [PATCH] The VMXNET3 PMD can't receive packet suddenly 
after a lot of traffic coming in. The root cause is due to mbuf allocation fail 
in vmxnet3_post_rx_bufs() and there is no error handling when it is called from 
vmxnet3_recv_pkts(). Th...

On Thu, 23 Jul 2015 09:48:55 +0800
mac_leehk at yahoo.com.hk wrote:

> From: marco 

Thank you for addressing a real bug. 

But there are several issues with the patch as submitted:

 * the standard way to handle allocation failure in network drivers is to drop
   the received packet and reuse the available data buffer (mbuf) for the next
   packet. It looks like your code would just stop receiving, which could cause
   deadlock.

 * the mail is formatted in a manner that is incompatible with merging into git.
   All submissions should have a short (< 60 character) Subject with a summary,
   followed by a description.  I don't know what mail client you used, but
   everything is smashed into the Subject.

 * all patches require a Signed-off-by with a real name for the Developer's
   Certificate of Origin

 * the style is wrong and the indentation is a mess; please indent with tabs,
   not spaces.

 * avoid extra comments; in code, too many comments are often worse than too few


Please rework your patch and resubmit it.
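
The "drop the packet and reuse the buffer" approach from the first point above can be sketched roughly as follows. This is an illustration under stated assumptions, not driver code: the toy_* types, the allocator, and the counter names are all hypothetical.

```c
#include <stddef.h>

struct toy_buf {
	int len;
};

/* Hypothetical allocator: returns NULL when the pool is exhausted. */
struct toy_pool {
	struct toy_buf *bufs;
	int nfree;
};

static struct toy_buf *toy_alloc(struct toy_pool *p)
{
	return p->nfree > 0 ? &p->bufs[--p->nfree] : NULL;
}

struct toy_slot {
	struct toy_buf *buf;      /* buffer currently attached to the descriptor */
};

struct toy_stats {
	unsigned long rx_ok;
	unsigned long rx_nombuf;
};

/*
 * Handle one completed RX slot.  Returns the buffer to pass up to the
 * application, or NULL if the packet had to be dropped.
 */
static struct toy_buf *
toy_rx_one(struct toy_slot *slot, struct toy_pool *pool,
	   struct toy_stats *st, int pkt_len)
{
	struct toy_buf *fresh = toy_alloc(pool);
	struct toy_buf *done;

	if (fresh == NULL) {
		/* Drop: leave the old buffer in the slot so the ring stays full. */
		st->rx_nombuf++;
		return NULL;
	}
	done = slot->buf;
	done->len = pkt_len;
	slot->buf = fresh;        /* descriptor always has a buffer attached */
	st->rx_ok++;
	return done;
}
```

The design point is that the replacement buffer is allocated before the received one is detached, so a failed allocation costs one packet but never leaves a descriptor without a buffer.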

>  drivers/net/vmxnet3/vmxnet3_rxtx.c |   54 +++-
>  1 file changed, 53 insertions(+), 1 deletion(-)
>  mode change 100644 => 100755 drivers/net/vmxnet3/vmxnet3_rxtx.c
> 
> diff --git a/drivers/net/vmxnet3/vmxnet3_rxtx.c b/drivers/net/vmxnet3/vmxnet3_rxtx.c
> old mode 100644
> new mode 100755
> index 39ad6ef..d560bbb
> --- a/drivers/net/vmxnet3/vmxnet3_rxtx.c
> +++ b/drivers/net/vmxnet3/vmxnet3_rxtx.c
> @@ -421,6 +421,51 @@ vmxnet3_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
>   return nb_tx;
>  }
>  
> +static inline void
> +vmxnet3_renew_desc(vmxnet3_rx_queue_t *rxq, uint8_t ring_id,
> +		struct rte_mbuf *mbuf)
> +{
> + uint32_t  val = 0;
> + struct vmxnet3_cmd_ring *ring = &rxq->cmd_ring[ring_id];
> +
> + struct Vmxnet3_RxDesc *rxd;
> + vmxnet3_buf_info_t *buf_info = &ring->buf_info[ring->next2fill];
> +
> + rxd = (struct Vmxnet3_RxDesc *)(ring->base + ring->next2fill);
> +
> + if (ring->rid == 0) {
> + /* Usually: One HEAD type buf per packet
> +  * val = (ring->next2fill % rxq->hw->bufs_per_pkt) ?
> +  * VMXNET3_RXD_BTYPE_BODY : VMXNET3_RXD_BTYPE_HEAD;
> +  */
> +
> + /* We use single packet buffer so all heads here */
> + val = VMXNET3_RXD_BTYPE_HEAD;
> + } else {
> + /* All BODY type buffers for 2nd ring; which won't be used at all by 
> ESXi *

[dpdk-dev] KNI performance numbers...

2015-06-24 Thread Vithal S Mohare
Hi,

I am running the DPDK KNI application on a Linux (3.18 kernel) VM (ESXi 5.5), 
directly connected to another Linux box, to measure throughput using the iperf 
tool.  Link speed: 1 Gbps.  The maximum throughput I get is 50% of link speed 
with 1470-byte packets.  With 512-byte packets, throughput drops to 282 Mbps.
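
As a rough aid for comparing the two figures, they can be converted to packet rates. The helper below is only a back-of-the-envelope sketch: it assumes that 50% means 500 Mbps, treats the quoted sizes as the full packet size, and ignores header overhead. The function name is made up for illustration.

```c
/* Packets per second for a given bit rate and packet size in bytes. */
static double pkts_per_sec(double bits_per_sec, double pkt_bytes)
{
	return bits_per_sec / (pkt_bytes * 8.0);
}
```

Under those assumptions, pkts_per_sec(500e6, 1470) is about 42.5 thousand packets per second and pkts_per_sec(282e6, 512) is about 68.8 thousand, so the smaller packets move fewer bits but more packets per second.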

Tried using KNI loopback modes (and traffic from Ixia), but no change in 
throughput.

KNI is running in single thread mode.  One lcore for rx, one for tx and another 
for the kni thread.

Is this result expected?  Has anybody got better numbers?  I would appreciate 
any input and relevant info.

Thanks,
-Vithal


[dpdk-dev] Support for MS Hyper-v hypervisor

2015-03-17 Thread Vithal S Mohare
Hi,

I would like to know whether the presently released versions of DPDK (1.7.x) 
support the MS Hyper-V hypervisor.  The DPDK roadmap document has it in the 
list for version 2.1.  Please confirm whether we need to wait for the release 
of DPDK 2.1 for this support.

Thanks,
-Vithal


[dpdk-dev] rte_memcpy optimization patch to dpdk ver 1.7

2015-02-24 Thread Vithal S Mohare
Hi Neil,


While most of the newer CPUs support SSSE3, I found an i7 that does not report it.  
So DPDK can't run on these CPUs?  Is this restriction acceptable?  

-sh-3.2$ cat /proc/cpuinfo 
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model   : 26
model name  : Intel(R) Core(TM) i7 CPU 920  @ 2.67GHz
stepping: 5
cpu MHz : 2660.068
cache size  : 8192 KB
physical id : 0
siblings: 8
core id : 0
cpu cores   : 4
apicid  : 0
fpu : yes
fpu_exception   : yes
cpuid level : 11
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx rdtscp lm 
constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr popcnt lahf_lm
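
A small sketch of how such a flags line could be checked programmatically. The helper name has_cpu_flag is hypothetical; it only does whole-token matching on a string, so that "sse" does not match "sse2" and "ssse3" is not confused with "sse3". In Linux cpuinfo output, SSE3 is reported as "pni".

```c
#include <string.h>

/* Return 1 if `flag` appears as a whole whitespace-separated token. */
static int has_cpu_flag(const char *flags, const char *flag)
{
	size_t n = strlen(flag);
	const char *p = flags;

	if (n == 0)
		return 0;
	while ((p = strstr(p, flag)) != NULL) {
		int starts = (p == flags) || p[-1] == ' ' ||
			     p[-1] == '\t' || p[-1] == '\n';
		int ends = p[n] == '\0' || p[n] == ' ' ||
			   p[n] == '\t' || p[n] == '\n';

		if (starts && ends)
			return 1;
		p++;              /* keep searching past this partial match */
	}
	return 0;
}
```

Applied to the flags listed above, a lookup for "pni" or "sse2" succeeds, while a lookup for "ssse3" finds nothing.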

Thanks,
-Vithal

-Original Message-
From: Neil Horman [mailto:nhor...@tuxdriver.com] 
Sent: Thursday, February 19, 2015 6:13 PM
To: Vithal S Mohare
Cc: dev at dpdk.org
Subject: Re: [dpdk-dev] rte_memcpy optimization patch to dpdk ver 1.7

On Wed, Feb 18, 2015 at 04:09:25AM +0000, Vithal S Mohare wrote:
> Ok, crash, as expected.  So DPDK now mandates CPUs that support either AVX2 
> or SSSE3, or else applications need to handle it at run time.
> 
No, sse3 is the minimum, but I think that's been the case for quite some time 
now.

Neil

> Thanks,
> -Vithal
> 
> -Original Message-
> From: Neil Horman [mailto:nhorman at tuxdriver.com]
> Sent: Tuesday, February 17, 2015 6:32 PM
> To: Vithal S Mohare
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] rte_memcpy optimization patch to dpdk ver 1.7
> 
> On Tue, Feb 17, 2015 at 08:39:22AM +, Vithal S Mohare wrote:
> > Hi,
> > 
> > I am trying to use rte_memcpy optimization patch along with dpdk version 
> > 1.7.  With the patch, while dpdk itself is compiled, applications failed 
> > with below error:
> > ---
> > include/rte_memcpy.h:629:2: error: implicit declaration of function 
> > '_mm_alignr_epi8' [-Werror=implicit-function-declaration]
> > /home/vithals/adu_src/build/x-men_dev/Default/shumway/infra/dpdk/shumway_obj/lib/../include/rte_memcpy.h:629:2:
> >  error: incompatible type for argument 2 of '_mm_storeu_si128'
> > ---
> > 
> > After including -mssse3 flags, compilation (cross compiled for a x86 linux 
> > based platform) went through.  Now the question is, when this binary is 
> > loaded on system that doesn't support SSSE3 instruction set (but just sse2 
> > etc), what would be the behavior?
> > 
> A crash.  You'll attempt to send an unknown binary instruction into the 
> execution pipeline and the processor will fault.
> 
> Neil
> 
> 
> 


[dpdk-dev] rte_memcpy optimization patch to dpdk ver 1.7

2015-02-18 Thread Vithal S Mohare
Ok, crash, as expected.  So DPDK now mandates CPUs that support either AVX2 
or SSSE3, or else applications need to handle it at run time.
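
One way an application might handle this at run time is to pick a copy routine once at start-up, based on what the CPU reports. The sketch below assumes GCC or Clang on x86, where __builtin_cpu_supports() is available; on other compilers or architectures it conservatively reports no SSSE3. The names copy_fn, copy_generic, copy_ssse3_placeholder and select_copy are made up for illustration, and the "SSSE3" routine here is only a placeholder that calls memcpy().

```c
#include <stddef.h>
#include <string.h>

typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Portable fallback that is safe on any CPU. */
static void *copy_generic(void *dst, const void *src, size_t n)
{
	return memcpy(dst, src, n);
}

/* Placeholder for an SSSE3-optimised routine; here it just calls memcpy(). */
static void *copy_ssse3_placeholder(void *dst, const void *src, size_t n)
{
	return memcpy(dst, src, n);
}

static int cpu_has_ssse3(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
	(defined(__x86_64__) || defined(__i386__))
	__builtin_cpu_init();
	return __builtin_cpu_supports("ssse3") ? 1 : 0;
#else
	return 0;                 /* unknown toolchain or CPU: assume absent */
#endif
}

/* Choose the implementation once; callers then use the returned pointer. */
static copy_fn select_copy(void)
{
	return cpu_has_ssse3() ? copy_ssse3_placeholder : copy_generic;
}
```

In a real build the optimised routine would have to be the only code compiled with SSSE3 enabled (for example in its own source file), otherwise the compiler may still emit SSSE3 instructions in the common path.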

Thanks,
-Vithal

-Original Message-
From: Neil Horman [mailto:nhor...@tuxdriver.com] 
Sent: Tuesday, February 17, 2015 6:32 PM
To: Vithal S Mohare
Cc: dev at dpdk.org
Subject: Re: [dpdk-dev] rte_memcpy optimization patch to dpdk ver 1.7

On Tue, Feb 17, 2015 at 08:39:22AM +, Vithal S Mohare wrote:
> Hi,
> 
> I am trying to use rte_memcpy optimization patch along with dpdk version 1.7. 
>  With the patch, while dpdk itself is compiled, applications failed with 
> below error:
> ---
> include/rte_memcpy.h:629:2: error: implicit declaration of function 
> '_mm_alignr_epi8' [-Werror=implicit-function-declaration]
> /home/vithals/adu_src/build/x-men_dev/Default/shumway/infra/dpdk/shumway_obj/lib/../include/rte_memcpy.h:629:2:
>  error: incompatible type for argument 2 of '_mm_storeu_si128'
> ---
> 
> After including -mssse3 flags, compilation (cross compiled for a x86 linux 
> based platform) went through.  Now the question is, when this binary is 
> loaded on system that doesn't support SSSE3 instruction set (but just sse2 
> etc), what would be the behavior?
> 
A crash.  You'll attempt to send an unknown binary instruction into the 
execution pipeline and the processor will fault.

Neil




[dpdk-dev] rte_memcpy optimization patch to dpdk ver 1.7

2015-02-17 Thread Vithal S Mohare
Hi,

I am trying to use rte_memcpy optimization patch along with dpdk version 1.7.  
With the patch, while dpdk itself is compiled, applications failed with below 
error:
---
include/rte_memcpy.h:629:2: error: implicit declaration of function 
'_mm_alignr_epi8' [-Werror=implicit-function-declaration]
/home/vithals/adu_src/build/x-men_dev/Default/shumway/infra/dpdk/shumway_obj/lib/../include/rte_memcpy.h:629:2:
 error: incompatible type for argument 2 of '_mm_storeu_si128'
---

After including -mssse3 flags, compilation (cross compiled for a x86 linux 
based platform) went through.  Now the question is, when this binary is loaded 
on system that doesn't support SSSE3 instruction set (but just sse2 etc), what 
would be the behavior?

Thanks,
-Vithal


[dpdk-dev] Callbacks after buffer (mbuf) sent out

2015-01-16 Thread Vithal S Mohare
To elaborate more...
Taking the vmxnet3 driver as an example: at present, when rte_eth_tx_burst() is 
called to send mbufs, the vmxnet3 driver calls vmxnet3_tq_tx_complete() to free 
up completed buffers.  Instead, I am wondering whether vmxnet3 (or the 
corresponding driver) could trigger an application callback for each buffer 
(mbuf) that has been transmitted out of the n/w interface.  The application 
callback can then decide whether to reuse the mbuf and continue multicasting on 
other tunnels.  
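
A toy sketch of the idea, to make the request concrete. Nothing here is an existing DPDK or vmxnet3 interface: toy_mbuf, tx_done_cb, toy_tx_complete() and the flood_* names are hypothetical, and tunnel_id stands in for the packet data that is rewritten per tunnel.

```c
#include <stddef.h>

struct toy_mbuf;
typedef void (*tx_done_cb)(struct toy_mbuf *m, void *arg);

struct toy_mbuf {
	int tunnel_id;            /* stands in for the rewritten packet data */
	int in_flight;
	tx_done_cb cb;            /* NULL means the driver frees the buffer */
	void *cb_arg;
	int freed;
};

/* Hand the buffer to the (imaginary) device. */
static void toy_tx(struct toy_mbuf *m)
{
	m->in_flight = 1;
}

/* Driver side: called for each buffer the device has finished sending. */
static void toy_tx_complete(struct toy_mbuf *m)
{
	m->in_flight = 0;
	if (m->cb != NULL)
		m->cb(m, m->cb_arg);  /* application decides what happens next */
	else
		m->freed = 1;         /* default: return the buffer to the pool */
}

/* Application side: flood one buffer over a list of tunnels, one at a time. */
struct flood_state {
	const int *tunnels;
	int count;
	int next;
};

static void flood_next(struct toy_mbuf *m, void *arg)
{
	struct flood_state *st = arg;

	if (st->next < st->count) {
		m->tunnel_id = st->tunnels[st->next++]; /* rewrite in place */
		toy_tx(m);
	} else {
		m->cb = NULL;
		m->freed = 1;         /* all tunnels done: release the buffer */
	}
}
```

The application starts the flood by calling flood_next() once; each completion then rewrites the same buffer for the next tunnel, which is safe only because the previous transmission has finished.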

Thanks,
-Vithal

-Original Message-
From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Vithal S Mohare
Sent: Tuesday, January 13, 2015 9:21 AM
To: dev at dpdk.org
Subject: [dpdk-dev] Callbacks after buffer (mbuf) sent out

Hi,

I am looking for application callbacks after mbufs are sent (tx) out 
successfully.   One of the use cases is for async multicast (over different gre 
tunnels etc).   Using direct/indirect buffers along with ref-count itself is 
not enough, as actual 'pkt-data' itself changes while flooding on list of 
tunnels.

Thanks,
-Vithal


[dpdk-dev] Callbacks after buffer (mbuf) sent out

2015-01-13 Thread Vithal S Mohare
Hi,

I am looking for application callbacks after mbufs are sent (tx) out 
successfully.   One of the use cases is for async multicast (over different gre 
tunnels etc).   Using direct/indirect buffers along with ref-count itself is 
not enough, as actual 'pkt-data' itself changes while flooding on list of 
tunnels.

Thanks,
-Vithal


[dpdk-dev] [PATCH RFC 0/3] DPDK ethdev callback support

2014-12-23 Thread Vithal S Mohare
Agreed.  As the mbuf has already been received in the rx-q, it may not yield a 
great advantage.
On a side note, are there any plans to support RSS for L2 packets?

-Original Message-
From: Bruce Richardson [mailto:bruce.richard...@intel.com] 
Sent: Tuesday, December 23, 2014 3:00 PM
To: Vithal S Mohare
Cc: dev at dpdk.org
Subject: Re: [dpdk-dev] [PATCH RFC 0/3] DPDK ethdev callback support

On Tue, Dec 23, 2014 at 04:23:21AM +, Vithal S Mohare wrote:
> Hi Bruce,
> 
> 
> For example, for a port type that does not support RSS, a callback on RX can 
> be configured to calculate a hash in software.
> 
> 
> Wondering if this callback will also be useful to bridge the gap of no RSS 
> support for L2 packets, i.e. in the rx callback handler, can applications 
> calculate a hash and feed it back so that spraying happens based on it?  Now, 
> all pure L2 packets (e.g. arp pkts) come to rx-q 0 of the 'port'.  Would 
> adding a callback to [port][rx-q:0] help?
> 
> Thanks,
> -Vithal

Yes, that could work. The downside is that it is no faster than having an app 
do the calculation itself, it's just perhaps a little easier to work with in 
the app.

/Bruce

> 
> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Bruce Richardson
> Sent: Monday, December 22, 2014 10:17 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH RFC 0/3] DPDK ethdev callback support
> 
> This RFC is for a small addition to the ethdev library, to add in support for 
> callbacks at the RX and TX stages. This allows packet processing to be done 
> on packets before they get returned to applications using rte_eth_rx_burst 
> call.
> 
> Use case: the first use case for this is to enable a consistent set of 
> packets mbufs to be received by applications irrespective of the NIC used to 
> receive those. For example, for a port type that does not support RSS, a 
> callback on RX can be configured to calculate a hash in software. 
> Similarly, this mechanism can be used to add other information to mbufs as 
> they are received, such as timestamps or sequence numbers, without cluttering 
> up the main packet processing path with checks for whether packets have these 
> fields filled in or not.
> A second use case is ease of instrumenting existing code. The example 
> application shows how combining a timestamp insertion callback on RX can be 
> paired with a latency calculation callback on TX to easily instrument any 
> application for packet latency.
> A third use case is to potentially extend existing NIC capabilities beyond 
> what is currently supported. For example, where flow director capabilities 
> can match up to a certain limit of flows - in the thousands, in the case of 
> NICs using the ixgbe driver - a callback can extend this to potentially 
> millions of flows by using a software hash table lookup inline for packets 
> that miss the hardware lookup filters. It would all appear transparent to 
> the packet handling code in the main application.
> 
> Future extensions: in future the ethdev library can be extended to provide a 
> standard set of callbacks for use by drivers. 
> 
> For now this patch set is RFC and still needs additional work for creating a 
> remove function for callbacks and to add in additional testing code.
> Since this adds in new code into the critical data path, I have run some 
> performance tests using testpmd with the ixgbe vector drivers (i.e. the 
> fastest, fast-path we have :-) ). Performance drops due to this patch seem 
> minimal to non-existent; rough tests on my system indicate a drop of perhaps 
> 1%.
> 
> All feedback welcome.
> 
> Bruce Richardson (3):
>   ethdev: rename callbacks field to intr_cbs
>   ethdev: Add in data rxtx callback support
>   examples: example showing use of callbacks.
> 
>  app/test/virtual_pmd.c |   2 +-
>  examples/rxtx_callbacks/Makefile   |  57 +
>  examples/rxtx_callbacks/basicfwd.c | 222 +
>  examples/rxtx_callbacks/basicfwd.h |  46 +++
>  lib/librte_ether/rte_ethdev.c  | 103 +--
>  lib/librte_ether/rte_ethdev.h  | 125 ++-
>  lib/librte_pmd_bond/rte_eth_bond_api.c |   2 +-
>  7 files changed, 543 insertions(+), 14 deletions(-)
>  create mode 100644 examples/rxtx_callbacks/Makefile
>  create mode 100644 examples/rxtx_callbacks/basicfwd.c
>  create mode 100644 examples/rxtx_callbacks/basicfwd.h
> 
> --
> 1.9.3
> 


[dpdk-dev] [PATCH RFC 0/3] DPDK ethdev callback support

2014-12-23 Thread Vithal S Mohare
Hi Bruce,


For example, for a port type that does not support RSS, a callback on RX can be 
configured to calculate a hash in software.


Wondering if this callback will also be useful to bridge the gap of no RSS 
support for L2 packets, i.e. in the rx callback handler, can applications 
calculate a hash and feed it back so that spraying happens based on it?  Now, 
all pure L2 packets (e.g. arp pkts) come to rx-q 0 of the 'port'.  Would 
adding a callback to [port][rx-q:0] help?
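
A minimal sketch of what such a software hash could look like. It is not RSS and not a DPDK API: it uses FNV-1a (chosen here only because it is short) over the 14-byte Ethernet header, and the names l2_hash and pick_worker are hypothetical. Since the packet has already landed on rx-q 0 by the time a callback runs, the hash can only steer the packet to a software worker, not to a different hardware queue.

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_HDR_LEN 14  /* dst MAC (6) + src MAC (6) + EtherType (2) */

/* FNV-1a over the Ethernet header (or the whole frame if it is shorter). */
static uint32_t l2_hash(const uint8_t *frame, size_t len)
{
	uint32_t h = 2166136261u;         /* FNV offset basis */
	size_t n = len < ETH_HDR_LEN ? len : ETH_HDR_LEN;
	size_t i;

	for (i = 0; i < n; i++) {
		h ^= frame[i];
		h *= 16777619u;           /* FNV prime */
	}
	return h;
}

/* Map a frame to one of nb_workers software workers. */
static unsigned pick_worker(const uint8_t *frame, size_t len,
			    unsigned nb_workers)
{
	return nb_workers ? l2_hash(frame, len) % nb_workers : 0;
}
```

Frames with the same MAC pair and EtherType always map to the same worker, which keeps per-flow ordering at L2.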

Thanks,
-Vithal

-Original Message-
From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Bruce Richardson
Sent: Monday, December 22, 2014 10:17 PM
To: dev at dpdk.org
Subject: [dpdk-dev] [PATCH RFC 0/3] DPDK ethdev callback support

This RFC is for a small addition to the ethdev library, to add in support for 
callbacks at the RX and TX stages. This allows packet processing to be done on 
packets before they get returned to applications using rte_eth_rx_burst call.

Use case: the first use case for this is to enable a consistent set of packets 
mbufs to be received by applications irrespective of the NIC used to receive 
those. For example, for a port type that does not support RSS, a callback on RX 
can be configured to calculate a hash in software. 
Similarly, this mechanism can be used to add other information to mbufs as they 
are received, such as timestamps or sequence numbers, without cluttering up the 
main packet processing path with checks for whether packets have these fields 
filled in or not.
A second use case is ease of instrumenting existing code. The example 
application shows how combining a timestamp insertion callback on RX can be 
paired with a latency calculation callback on TX to easily instrument any 
application for packet latency.
A third use case is to potentially extend existing NIC capabilities beyond what 
is currently supported. For example, where flow director capabilities can match 
up to a certain limit of flows - in the thousands, in the case of NICs using 
the ixgbe driver - a callback can extend this to potentially millions of flows 
by using a software hash table lookup inline for packets that miss the 
hardware lookup filters. It would all appear transparent to the packet handling 
code in the main application.
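
The second use case above (a timestamp callback on RX paired with a latency callback on TX) can be illustrated with a toy model. This is a sketch only and does not use the ethdev API from the patch set: toy_port, burst_cb, toy_rx_burst(), toy_tx_burst() and the toy_cycles counter are hypothetical stand-ins.

```c
#include <stddef.h>
#include <stdint.h>

struct toy_pkt {
	uint64_t rx_cycles;       /* filled in by the RX callback */
};

typedef uint16_t (*burst_cb)(struct toy_pkt **pkts, uint16_t nb, void *arg);

struct toy_port {
	burst_cb rx_cb;
	void *rx_arg;
	burst_cb tx_cb;
	void *tx_arg;
};

/* Hypothetical cycle counter; the caller advances it by hand. */
static uint64_t toy_cycles;

/* RX callback: stamp every received packet with the current cycle count. */
static uint16_t stamp_on_rx(struct toy_pkt **pkts, uint16_t nb, void *arg)
{
	uint16_t i;

	(void)arg;
	for (i = 0; i < nb; i++)
		pkts[i]->rx_cycles = toy_cycles;
	return nb;
}

struct lat_stats {
	uint64_t total_cycles;
	uint64_t pkts;
};

/* TX callback: accumulate the time each packet spent in the application. */
static uint16_t latency_on_tx(struct toy_pkt **pkts, uint16_t nb, void *arg)
{
	struct lat_stats *st = arg;
	uint16_t i;

	for (i = 0; i < nb; i++)
		st->total_cycles += toy_cycles - pkts[i]->rx_cycles;
	st->pkts += nb;
	return nb;
}

/* A real driver would fill pkts[] here before running the callback. */
static uint16_t toy_rx_burst(struct toy_port *p, struct toy_pkt **pkts,
			     uint16_t nb)
{
	return p->rx_cb ? p->rx_cb(pkts, nb, p->rx_arg) : nb;
}

static uint16_t toy_tx_burst(struct toy_port *p, struct toy_pkt **pkts,
			     uint16_t nb)
{
	return p->tx_cb ? p->tx_cb(pkts, nb, p->tx_arg) : nb;
}
```

The application code between the two burst calls is untouched, which is the point of the use case: the instrumentation lives entirely in the callbacks.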

Future extensions: in future the ethdev library can be extended to provide a 
standard set of callbacks for use by drivers. 

For now this patch set is RFC and still needs additional work for creating a 
remove function for callbacks and to add in additional testing code.
Since this adds in new code into the critical data path, I have run some 
performance tests using testpmd with the ixgbe vector drivers (i.e. the 
fastest, fast-path we have :-) ). Performance drops due to this patch seem 
minimal to non-existent; rough tests on my system indicate a drop of perhaps 1%.

All feedback welcome.

Bruce Richardson (3):
  ethdev: rename callbacks field to intr_cbs
  ethdev: Add in data rxtx callback support
  examples: example showing use of callbacks.

 app/test/virtual_pmd.c |   2 +-
 examples/rxtx_callbacks/Makefile   |  57 +
 examples/rxtx_callbacks/basicfwd.c | 222 +
 examples/rxtx_callbacks/basicfwd.h |  46 +++
 lib/librte_ether/rte_ethdev.c  | 103 +--
 lib/librte_ether/rte_ethdev.h  | 125 ++-
 lib/librte_pmd_bond/rte_eth_bond_api.c |   2 +-
 7 files changed, 543 insertions(+), 14 deletions(-)
 create mode 100644 examples/rxtx_callbacks/Makefile
 create mode 100644 examples/rxtx_callbacks/basicfwd.c
 create mode 100644 examples/rxtx_callbacks/basicfwd.h

--
1.9.3



[dpdk-dev] Walk through a given mbuf-pool elements

2014-12-15 Thread Vithal S Mohare
Hi Konstantin,

Thanks for the reply.
I was trying to find something more intuitive compared to 
rte_mempool_obj_iter().  Once a mempool is created with objects, shouldn't we 
be able to walk through it by just passing the 'mp' object, avoiding other 
params like vaddr, elt_size etc.?  For now, I can use the callback called 
during mempool creation itself.


Thanks,
-Vithal

-Original Message-
From: Ananyev, Konstantin [mailto:konstantin.anan...@intel.com] 
Sent: Monday, December 15, 2014 3:42 PM
To: Vithal S Mohare; dev at dpdk.org
Subject: RE: Walk through a given mbuf-pool elements

Hi Vithal,

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Vithal S Mohare
> Sent: Monday, December 15, 2014 5:08 AM
> To: dev at dpdk.org
> Subject: [dpdk-dev] Walk through a given mbuf-pool elements
> 
> [Re-sending the mail after registering to dpdk.org] Team,
> 
> I am looking for code or an API to walk through the elements of a DPDK 
> mbuf pool (similar to rte_mempool_walk(), but for the elements within a 
> mempool).  Calling pkt_mbuf_alloc for 'n' elements and then _free is not an 
> option.  rte_mempool_obj_iter() walks through them, but does a lot more than 
> just walking.  Please suggest if anybody has better alternatives.

Not sure why rte_mempool_obj_iter() wouldn't work for you?
It just walks through all elements of the pool and for each calls a 
user-provided callback.
Nothing else.
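
The walk-with-callback pattern described here, reduced to a toy pool that keeps its own base address and element size so the caller passes only the pool and a callback. This is a sketch under stated assumptions and does not reproduce the rte_mempool API: toy_mempool, toy_pool_walk(), toy_elem and count_in_use() are hypothetical names, and the pool is assumed to be one contiguous array.

```c
#include <stddef.h>
#include <stdint.h>

typedef void (*obj_cb)(void *obj, unsigned idx, void *arg);

/* Toy pool: one contiguous array of fixed-size elements. */
struct toy_mempool {
	void *base;
	size_t elt_size;
	unsigned n;
};

/* Call cb once per element; returns the number of elements visited. */
static unsigned toy_pool_walk(const struct toy_mempool *mp, obj_cb cb,
			      void *arg)
{
	unsigned i;

	for (i = 0; i < mp->n; i++)
		cb((uint8_t *)mp->base + (size_t)i * mp->elt_size, i, arg);
	return mp->n;
}

/* Example element and callback: count elements that are in use. */
struct toy_elem {
	int refcnt;
};

static void count_in_use(void *obj, unsigned idx, void *arg)
{
	const struct toy_elem *e = obj;

	(void)idx;
	if (e->refcnt > 0)
		(*(unsigned *)arg)++;
}
```

A caller would fill in a toy_mempool describing its element array and call toy_pool_walk(&mp, count_in_use, &counter); no allocation from the pool is needed to visit the elements.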

Konstantin

> 
> Thanks,
> -Vithal


[dpdk-dev] Walk through a given mbuf-pool elements

2014-12-15 Thread Vithal S Mohare
[Re-sending the mail after registering to dpdk.org]
Team,

I am looking for code or an API to walk through the elements of a DPDK mbuf 
pool (similar to rte_mempool_walk(), but for the elements within a mempool).  
Calling pkt_mbuf_alloc for 'n' elements and then _free is not an option.  
rte_mempool_obj_iter() walks through them, but does a lot more than just 
walking.  Please suggest if anybody has better alternatives.

Thanks,
-Vithal