[dpdk-dev] [PATCH] bnx2x: tx_start_bd->vlan_or_ethertype is le16

2015-12-01 Thread Charles (Chas) Williams
On Wed, 2015-12-02 at 00:34 +0100, Thomas Monjalon wrote:
> 2015-12-01 14:37, Stephen Hemminger:
> > Harish Patil  wrote:
> > > >2015-11-03 12:26, Chas Williams:  
> > > >> --- a/drivers/net/bnx2x/bnx2x.c
> > > >> +++ b/drivers/net/bnx2x/bnx2x.c
> > > >> -tx_start_bd->vlan_or_ethertype = 
> > > >> eh->ether_type;
> > > >> +tx_start_bd->vlan_or_ethertype
> > > >> += 
> > > >> rte_cpu_to_le_16(rte_be_to_cpu_16(eh->ether_type));
> > > 
> > > Minor question - any specific reason to use rte_be_to_cpu_16() on
> > > ether_type alone before converting from native order to le16?
> > 
> > ether_type is in network byte order (big endian)
> > and hardware wants little endian. On x86 the second step is a nop.
> 
> Doesn't it deserve a macro rte_ntole16()?
> It may be in lib/librte_eal/common/include/generic/rte_byteorder.h.

I looked I didn't see anything.  This value, according to the linux
driver, wants to be little endian regardless of the host endian.



[dpdk-dev] [PATCH] bnx2x: tx_start_bd->vlan_or_ethertype is le16

2015-12-02 Thread Charles (Chas) Williams
On Wed, 2015-12-02 at 02:04 +0100, Thomas Monjalon wrote:
> 2015-12-01 18:58, Charles  Williams:
> > On Wed, 2015-12-02 at 00:34 +0100, Thomas Monjalon wrote:
> > > 2015-12-01 14:37, Stephen Hemminger:
> > > > Harish Patil  wrote:
> > > > > >2015-11-03 12:26, Chas Williams:  
> > > > > >> --- a/drivers/net/bnx2x/bnx2x.c
> > > > > >> +++ b/drivers/net/bnx2x/bnx2x.c
> > > > > >> -tx_start_bd->vlan_or_ethertype = 
> > > > > >> eh->ether_type;
> > > > > >> +tx_start_bd->vlan_or_ethertype
> > > > > >> += 
> > > > > >> rte_cpu_to_le_16(rte_be_to_cpu_16(eh->ether_type));
> > > > > 
> > > > > Minor question - any specific reason to use rte_be_to_cpu_16() on
> > > > > ether_type alone before converting from native order to le16?
> > > > 
> > > > ether_type is in network byte order (big endian)
> > > > and hardware wants little endian. On x86 the second step is a nop.
> > > 
> > > Doesn't it deserve a macro rte_ntole16()?
> > > It may be in lib/librte_eal/common/include/generic/rte_byteorder.h.
> > 
> > I looked I didn't see anything.  This value, according to the linux
> > driver, wants to be little endian regardless of the host endian.
> 
> Yes, that's why I suggest to create some macros to do this kind of conversion.
> Example: rte_ntole16 means "network to little endian 16-bit".
> Do you think it would be clearer to use?

This is the only example of this kind of conversion in the source code
so it would be a macro for one user.  If you create rte_ntole16() you
might feel obligated to create the various permutations for which there
are no consumers.



[dpdk-dev] [PATCH] bnx2x: set Ethernet address type during transmit for VF's

2015-12-07 Thread Charles (Chas) Williams
On Sun, 2015-12-06 at 23:34 +, Harish Patil wrote:
> >
> >The original was always setting unicast.  While here, clean up some
> >other references that also point into the Ethernet header.
> >
> >Signed-off-by: Chas Williams <3chas3 at gmail.com>
> >---
> > drivers/net/bnx2x/bnx2x.c | 23 +++
> > drivers/net/bnx2x/ecore_hsi.h |  5 +++--
> > 2 files changed, 18 insertions(+), 10 deletions(-)
> >
> >diff --git a/drivers/net/bnx2x/bnx2x.c b/drivers/net/bnx2x/bnx2x.c
> >index 76444eb..294711f 100644
> >--- a/drivers/net/bnx2x/bnx2x.c
> >+++ b/drivers/net/bnx2x/bnx2x.c
> >@@ -2177,25 +2177,32 @@ int bnx2x_tx_encap(struct bnx2x_tx_queue *txq,
> >struct rte_mbuf **m_head, int m_p
> >   bd_prod = NEXT_TX_BD(bd_prod);
> >   if (IS_VF(sc)) {
> >   struct eth_tx_parse_bd_e2 *tx_parse_bd;
> >-  uint8_t *data = rte_pktmbuf_mtod(m0, uint8_t *);
> >+  const struct ether_hdr *eh = rte_pktmbuf_mtod(m0, 
> >struct ether_hdr *);
> >+  uint8_t mac_type = UNICAST_ADDRESS;
> >
> >   tx_parse_bd =
> >   &txq->tx_ring[TX_BD(bd_prod, txq)].parse_bd_e2;
> >+  if (is_multicast_ether_addr(&eh->d_addr)) {
> 
> Minor comment. unlikely() may be used here to keep it consistent with base
> driver.

It wasn't clear to me that this code path is all that unlikely().

> >+  if (is_broadcast_ether_addr(&eh->d_addr))
> >+  mac_type = BROADCAST_ADDRESS;
> >+  else
> >+  mac_type = MULTICAST_ADDRESS;
> >+  }




[dpdk-dev] [PATCH] bnx2x: set Ethernet address type during transmit for VF's

2015-12-08 Thread Charles (Chas) Williams
On Mon, 2015-12-07 at 17:29 +, Harish Patil wrote:
> >
> >On Sun, 2015-12-06 at 23:34 +, Harish Patil wrote:
> >> >
> >> >The original was always setting unicast.  While here, clean up some
> >> >other references that also point into the Ethernet header.
> >> >
> >> >Signed-off-by: Chas Williams <3chas3 at gmail.com>
> >> >---
> >> > drivers/net/bnx2x/bnx2x.c | 23 +++
> >> > drivers/net/bnx2x/ecore_hsi.h |  5 +++--
> >> > 2 files changed, 18 insertions(+), 10 deletions(-)
> >> >
> >> >diff --git a/drivers/net/bnx2x/bnx2x.c b/drivers/net/bnx2x/bnx2x.c
> >> >index 76444eb..294711f 100644
> >> >--- a/drivers/net/bnx2x/bnx2x.c
> >> >+++ b/drivers/net/bnx2x/bnx2x.c
> >> >@@ -2177,25 +2177,32 @@ int bnx2x_tx_encap(struct bnx2x_tx_queue *txq,
> >> >struct rte_mbuf **m_head, int m_p
> >> >   bd_prod = NEXT_TX_BD(bd_prod);
> >> >   if (IS_VF(sc)) {
> >> >   struct eth_tx_parse_bd_e2 *tx_parse_bd;
> >> >-  uint8_t *data = rte_pktmbuf_mtod(m0, uint8_t *);
> >> >+  const struct ether_hdr *eh =
> >>rte_pktmbuf_mtod(m0, struct ether_hdr *);
> >> >+  uint8_t mac_type = UNICAST_ADDRESS;
> >> >
> >> >   tx_parse_bd =
> >> >   &txq->tx_ring[TX_BD(bd_prod,
> >>txq)].parse_bd_e2;
> >> >+  if (is_multicast_ether_addr(&eh->d_addr)) {
> >>
> >> Minor comment. unlikely() may be used here to keep it consistent with
> >>base
> >> driver.
> >
> >It wasn't clear to me that this code path is all that unlikely().
> 
> Its an optional comment, unlikely() is because fast path traffic is mostly
> unicast.

Multicast traffic isn't all that uncommon.  I also don't see that we gain
much from branch prediction here regardless.  My understanding is that
unlikely() should be used for really unlikely situations since the branch
will not be optimized.

> BTW, have you tested the patches? Another question, not related to your
> change though.

Yes.  This patch is necessary (for VF anyway).  Without it, you need to
manually assign arp addresses.

> I guess this block of code does not have to been under the IS_VF() check.

I don't know.  It is under IS_VF() in the linux driver and it looks like
an attempt to "program" some internal switch on the card.  I don't have
a programmer's guide for this card so it's a complete guess.

> >> >+  if
> >>(is_broadcast_ether_addr(&eh->d_addr))
> >> >+  mac_type = BROADCAST_ADDRESS;
> >> >+  else
> >> >+  mac_type = MULTICAST_ADDRESS;
> >> >+  }




[dpdk-dev] Status of bnx2x pmd?

2015-11-04 Thread Charles (Chas) Williams
On Fri, 2015-10-30 at 18:10 -0400, Mussar, Gary wrote:
> I have been attempting to build testpmd with the bnx2x pmd enabled to try and 
> talk through a bcm57810 10G NIC card. I have tried using SR-IOV VFs and I 
> have tried using the PFs. Neither seem to be able to be properly initialized 
> in testpmd and/or testpmd aborts.
> 
> Is this PMD actually usable or is it trash?
> 
> Gary

See the two patches I have posted to this list recently w/respect to the
bnx2x.  At this point, I have a VF on a 57810 working in the TX direction.
I am unable to figure out what is wrong with the RX direction though.

The rx packet counters increment but I don't seem to get packet delivery.

There is probably a minor issue with bnx2x_fill_accept_flags() being
used in both the PF and VF drivers when setting the rx filter but fixing
that didn't seem to help.

At this point it is a long slog comparing the linux driver to the
PMD driver.




[dpdk-dev] [PATCH v2] devargs: add blacklisting by linux interface name

2015-11-05 Thread Charles (Chas) Williams
On Wed, 2015-11-04 at 23:40 +0100, Thomas Monjalon wrote:
> 2015-10-14 09:41, Charles  Williams:
> > On Tue, 2015-10-13 at 14:49 +0200, Olivier MATZ wrote:
> > > For PCI devices that have several interfaces (I think it's the case for
> > > some Mellanox boards), maybe we should not store the interface name?
> > 
> > I am not sure what you mean here.  If a device has multiple ethernet
> > interfaces, then it should a have seperate PCI device address space for
> > each interface (I dont know of any DPDK drivers that don't make this
> > assumption as well).
> 
> mlx4 and cxgbe?

OK, I see now.  I don't know of a way to tell if a device has multiple
ports just from the pci vendor/device id without maintaining some
sort of table.

Do these devices have multiple interfaces listed in their
/sys/devices/.../net diretory?  If so, matching one of the listed
interfaces can just blacklist the whole device similar to blacklisting
by the device id.




[dpdk-dev] [PATCH v2] devargs: add blacklisting by linux interface name

2015-11-10 Thread Charles (Chas) Williams
On Thu, 2015-11-05 at 11:23 -0800, Stephen Hemminger wrote:
> On Thu, 05 Nov 2015 11:39:04 -0500
> "Charles (Chas) Williams" <3chas3 at gmail.com> wrote:
> 
> > On Wed, 2015-11-04 at 23:40 +0100, Thomas Monjalon wrote:
> > > 2015-10-14 09:41, Charles  Williams:  
> > > > On Tue, 2015-10-13 at 14:49 +0200, Olivier MATZ wrote:  
> > > > > For PCI devices that have several interfaces (I think it's the case 
> > > > > for
> > > > > some Mellanox boards), maybe we should not store the interface name?  
> > > > 
> > > > I am not sure what you mean here.  If a device has multiple ethernet
> > > > interfaces, then it should a have seperate PCI device address space for
> > > > each interface (I dont know of any DPDK drivers that don't make this
> > > > assumption as well).  
> > > 
> > > mlx4 and cxgbe?  
> > 
> > OK, I see now.  I don't know of a way to tell if a device has multiple
> > ports just from the pci vendor/device id without maintaining some
> > sort of table.
> > 
> > Do these devices have multiple interfaces listed in their
> > /sys/devices/.../net diretory?  If so, matching one of the listed
> > interfaces can just blacklist the whole device similar to blacklisting
> > by the device id.
> 
> Devices with multiple ports are supposed to report the port via 
> /sys/class/net/xxx/portid

But I want to find the ports associated by the PCI devices.


> But you aren't going to be able to blacklist only one port of these devices.
> The two drivers would be fighting over registers and IRQ management.
> Plus kernel bind/unbind is by PCI id.

I understand that.  Blacklisting an interface on a multiple port device
would be essentially the same as blacklist by the PCI device id.  You
can't split the PCI device.  I just need to find the list of ports
associated with a single PCI device.




[dpdk-dev] [PATCH] devargs: add blacklisting by linux interface name

2015-10-02 Thread Charles (Chas) Williams
On Fri, 2015-10-02 at 16:18 +0100, Bruce Richardson wrote:
> On Fri, Oct 02, 2015 at 11:00:07AM -0400, Chas Williams wrote:
> > If a system is using deterministic interface names, it may be easier in
> > some cases to use the interface name to blacklist an interface.
> >
> 
> Is it possible to do this using the existing arguments, i.e. have the -b flag
> detect if it's a pci address or name automatically, rather than having to use
> a separate command-line arg for it?

You might be able to distinguish names by context.  I doubt interface
names ever look like PCI addresses.  But that's going to be a bigger
change since -b will need to be updated to 'blacklist' intead of
'pci-blacklist' to prevent confusion.  Or do you just want to overload
'-b' and keep both long options?




[dpdk-dev] [PATCH] devargs: add blacklisting by linux interface name

2015-10-02 Thread Charles (Chas) Williams
On Fri, 2015-10-02 at 16:44 +, Richardson, Bruce wrote:
> > -Original Message-
> > From: Charles (Chas) Williams [mailto:3chas3 at gmail.com]
> > 
> > On Fri, 2015-10-02 at 16:18 +0100, Bruce Richardson wrote:
> > > On Fri, Oct 02, 2015 at 11:00:07AM -0400, Chas Williams wrote:
> > > > If a system is using deterministic interface names, it may be easier
> > > > in some cases to use the interface name to blacklist an interface.
> > > >
> > >
> > > Is it possible to do this using the existing arguments, i.e. have the
> > > -b flag detect if it's a pci address or name automatically, rather
> > > than having to use a separate command-line arg for it?
> > 
> > You might be able to distinguish names by context.  I doubt interface
> > names ever look like PCI addresses.  But that's going to be a bigger
> > change since -b will need to be updated to 'blacklist' intead of 'pci-
> > blacklist' to prevent confusion.  Or do you just want to overload '-b' and
> > keep both long options?
> >
> I'm not sure about that, to be honest. However, I'd rather not have
> too many cmd line options to be maintained in the code. 
> 
> Does you proposed blacklisting patch work with non-pci devices as well
> as with PCI ones as now?

Unfortunately, the devargs API is rather PCI specific -- it takes a PCI
device.  Nothing prevents you from writing a device specific version of
the devargs API though for your device class since the devargs list isn't
static but checking for certain devargs wouldn't make sense in some cases.
Checking to see if a USB device matched a blacklisted PCI device would
be pointless.

Other devices (like Xen or hyperv) have a net/ directory/link in their /sys
entry that lets you determine an interface name.  I think it's the same
for USB ethernet devices -- I don't happen to have one to check.




[dpdk-dev] [PATCH] devargs: add blacklisting by linux interface name

2015-10-05 Thread Charles (Chas) Williams
On Fri, 2015-10-02 at 16:44 +, Richardson, Bruce wrote:
> I'm not sure about that, to be honest. However, I'd rather not have
> too many cmd line options to be maintained in the code. 
> 
> Does you proposed blacklisting patch work with non-pci devices as well
> as with PCI ones as now?

I refactored this a bit to fit it into the existing pci blacklisting.
Better?




[dpdk-dev] [PATCH v2] devargs: add blacklisting by linux interface name

2015-10-06 Thread Charles (Chas) Williams
On Tue, 2015-10-06 at 08:35 +0100, Stephen Hemminger wrote:
> On Mon,  5 Oct 2015 11:26:08 -0400
> Chas Williams <3chas3 at gmail.com> wrote:
> 
> > diff --git a/lib/librte_eal/common/include/rte_pci.h 
> > b/lib/librte_eal/common/include/rte_pci.h
> > index 83e3c28..852c149 100644
> > --- a/lib/librte_eal/common/include/rte_pci.h
> > +++ b/lib/librte_eal/common/include/rte_pci.h
> > @@ -161,6 +161,7 @@ struct rte_pci_device {
> > struct rte_pci_resource mem_resource[PCI_MAX_RESOURCE];   /**< PCI 
> > Memory Resource */
> > struct rte_intr_handle intr_handle; /**< Interrupt handle */
> > struct rte_pci_driver *driver;  /**< Associated driver */
> > +   char name[32];
> 
> Why not use IFNAMSIZ rather than magic constant here?

No particular reason.  It just matches the virtual device name size.
I will change it.



[dpdk-dev] [PATCH v2] devargs: add blacklisting by linux interface name

2015-10-14 Thread Charles (Chas) Williams
On Tue, 2015-10-13 at 14:49 +0200, Olivier MATZ wrote:
> Hi Chas,
> 
> > @@ -352,6 +354,19 @@ pci_scan_one(const char *dirname, uint16_t domain, 
> > uint8_t bus,
> > return -1;
> > }
> >  
> > +   /* get network interface name */
> > +   snprintf(filename, sizeof(filename), "%s/net", dirname);
> > +   dir = opendir(filename);
> > +   if (dir) {
> > +   while ((e = readdir(dir)) != NULL) {
> > +   if (e->d_name[0] == '.')
> > +   continue;
> > +
> > +   strncpy(dev->name, e->d_name, sizeof(dev->name));
> > +   }
> > +   closedir(dir);
> > +   }
> > +
> > if (!ret) {
> > if (!strcmp(driver, "vfio-pci"))
> > dev->kdrv = RTE_KDRV_VFIO;
> > 
> 
> For PCI devices that have several interfaces (I think it's the case for
> some Mellanox boards), maybe we should not store the interface name?

I am not sure what you mean here.  If a device has multiple ethernet
interfaces, then it should a have seperate PCI device address space for
each interface (I dont know of any DPDK drivers that don't make this
assumption as well).  If the device is multiprotocol, say Infiniband,
the device might have a net/ subdirectory, but it will be called something
like ib0 which you might want to blacklist for some reason.

> Another small comment about the strncpy(): it's maybe safer to ensure
> that dev->name is properly nul-terminated.

A good idea but it shouldn't happen in practice since dev.name will
be IFNAMSIZ.  I will fix it in the next version.




[dpdk-dev] [PATCH v3] mbuf: use refcnt = 0 when debugging

2017-09-07 Thread Charles (Chas) Williams
After commit 8f094a9ac5d7 ("mbuf: set mbuf fields while in pool") is it
much harder to detect a "double free".  If the developer makes a copy
of an mbuf pointer and frees it twice, this condition is never detected
and the mbuf gets returned to the pool twice.

Since this requires extra work to track, make this behavior conditional
on CONFIG_RTE_LIBRTE_MBUF_DEBUG.

Signed-off-by: Chas Williams 
---
 lib/librte_mbuf/rte_mbuf.c |  2 +-
 lib/librte_mbuf/rte_mbuf.h | 40 ++--
 2 files changed, 35 insertions(+), 7 deletions(-)

diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
index 26a62b8..b0d222c 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -145,7 +145,7 @@ rte_pktmbuf_init(struct rte_mempool *mp,
m->pool = mp;
m->nb_segs = 1;
m->port = 0xff;
-   rte_mbuf_refcnt_set(m, 1);
+   rte_mbuf_refcnt_set(m, RTE_MBUF_UNUSED_CNT);
m->next = NULL;
 }
 
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index eaed7ee..1400b35 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -671,11 +671,15 @@ struct rte_pktmbuf_pool_private {
 
 #ifdef RTE_LIBRTE_MBUF_DEBUG
 
+#define RTE_MBUF_UNUSED_CNT 0
+
 /**  check mbuf type in debug mode */
 #define __rte_mbuf_sanity_check(m, is_h) rte_mbuf_sanity_check(m, is_h)
 
 #else /*  RTE_LIBRTE_MBUF_DEBUG */
 
+#define RTE_MBUF_UNUSED_CNT 1
+
 /**  check mbuf type in debug mode */
 #define __rte_mbuf_sanity_check(m, is_h) do { } while (0)
 
@@ -721,6 +725,9 @@ rte_mbuf_refcnt_set(struct rte_mbuf *m, uint16_t new_value)
 static inline uint16_t
 rte_mbuf_refcnt_update(struct rte_mbuf *m, int16_t value)
 {
+#ifdef RTE_LIBRTE_MBUF_DEBUG
+   RTE_ASSERT(rte_mbuf_refcnt_read(m) != 0);
+#endif
/*
 * The atomic_add is an expensive operation, so we don't want to
 * call it in the case where we know we are the uniq holder of
@@ -791,10 +798,9 @@ void
 rte_mbuf_sanity_check(const struct rte_mbuf *m, int is_header);
 
 #define MBUF_RAW_ALLOC_CHECK(m) do {   \
-   RTE_ASSERT(rte_mbuf_refcnt_read(m) == 1);   \
+   RTE_ASSERT(rte_mbuf_refcnt_read(m) == RTE_MBUF_UNUSED_CNT); \
RTE_ASSERT((m)->next == NULL);  \
RTE_ASSERT((m)->nb_segs == 1);  \
-   __rte_mbuf_sanity_check(m, 0);  \
 } while (0)
 
 /**
@@ -825,6 +831,10 @@ static inline struct rte_mbuf *rte_mbuf_raw_alloc(struct 
rte_mempool *mp)
return NULL;
m = (struct rte_mbuf *)mb;
MBUF_RAW_ALLOC_CHECK(m);
+#ifdef RTE_LIBRTE_MBUF_DEBUG
+   rte_mbuf_refcnt_set(m, 1);
+#endif
+   __rte_mbuf_sanity_check(m, 0);
return m;
 }
 
@@ -846,10 +856,9 @@ static __rte_always_inline void
 rte_mbuf_raw_free(struct rte_mbuf *m)
 {
RTE_ASSERT(RTE_MBUF_DIRECT(m));
-   RTE_ASSERT(rte_mbuf_refcnt_read(m) == 1);
+   RTE_ASSERT(rte_mbuf_refcnt_read(m) == RTE_MBUF_UNUSED_CNT);
RTE_ASSERT(m->next == NULL);
RTE_ASSERT(m->nb_segs == 1);
-   __rte_mbuf_sanity_check(m, 0);
rte_mempool_put(m->pool, m);
 }
 
@@ -1159,21 +1168,37 @@ static inline int rte_pktmbuf_alloc_bulk(struct 
rte_mempool *pool,
case 0:
while (idx != count) {
MBUF_RAW_ALLOC_CHECK(mbufs[idx]);
+#ifdef RTE_LIBRTE_MBUF_DEBUG
+   rte_mbuf_refcnt_set(mbufs[idx], 1);
+#endif
+   __rte_mbuf_sanity_check(mbufs[idx], 0);
rte_pktmbuf_reset(mbufs[idx]);
idx++;
/* fall-through */
case 3:
MBUF_RAW_ALLOC_CHECK(mbufs[idx]);
+#ifdef RTE_LIBRTE_MBUF_DEBUG
+   rte_mbuf_refcnt_set(mbufs[idx], 1);
+#endif
+   __rte_mbuf_sanity_check(mbufs[idx], 0);
rte_pktmbuf_reset(mbufs[idx]);
idx++;
/* fall-through */
case 2:
MBUF_RAW_ALLOC_CHECK(mbufs[idx]);
+#ifdef RTE_LIBRTE_MBUF_DEBUG
+   rte_mbuf_refcnt_set(mbufs[idx], 1);
+#endif
+   __rte_mbuf_sanity_check(mbufs[idx], 0);
rte_pktmbuf_reset(mbufs[idx]);
idx++;
/* fall-through */
case 1:
MBUF_RAW_ALLOC_CHECK(mbufs[idx]);
+#ifdef RTE_LIBRTE_MBUF_DEBUG
+   rte_mbuf_refcnt_set(mbufs[idx], 1);
+#endif
+   __rte_mbuf_sanity_check(mbufs[idx], 0);
rte_pktmbuf_reset(mbufs[idx]);
idx++;
/* fall-through */
@@ -1271,7 +1296,7 @@ static inline void rte_pktmbuf_detach(struct rte_mbuf *m)
if (rte_mbuf_refcnt_update(md, -1) == 0) {
md->next = NULL;
md->nb_segs = 1;
-   

[dpdk-dev] [PATCH 1/2] vhost: reference count fix for nb_started_ports

2016-12-28 Thread Charles (Chas) Williams
From: Wen Chiu 

Only increment and decrement nb_started_ports on the first and last
device start and stop.  Otherwise, nb_started_ports can become negative
if a device is stopped multiple times.

Fixes: ee584e9710b9 ("vhost: add driver on top of the library")

Signed-off-by: Wen Chiu 
Reviewed-by: Chas Williams 
---
 drivers/net/vhost/rte_eth_vhost.c | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/drivers/net/vhost/rte_eth_vhost.c 
b/drivers/net/vhost/rte_eth_vhost.c
index 60b0f51..ba559ff 100644
--- a/drivers/net/vhost/rte_eth_vhost.c
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -782,11 +782,11 @@ eth_dev_start(struct rte_eth_dev *dev)
internal->flags);
if (ret)
return ret;
-   }
 
-   /* We need only one message handling thread */
-   if (rte_atomic16_add_return(&nb_started_ports, 1) == 1)
-   ret = vhost_driver_session_start();
+   /* We need only one message handling thread */
+   if (rte_atomic16_add_return(&nb_started_ports, 1) == 1)
+   ret = vhost_driver_session_start();
+   }
 
return ret;
 }
@@ -796,11 +796,12 @@ eth_dev_stop(struct rte_eth_dev *dev)
 {
struct pmd_internal *internal = dev->data->dev_private;
 
-   if (rte_atomic16_cmpset(&internal->once, 1, 0))
+   if (rte_atomic16_cmpset(&internal->once, 1, 0)) {
rte_vhost_driver_unregister(internal->iface_name);
 
-   if (rte_atomic16_sub_return(&nb_started_ports, 1) == 0)
-   vhost_driver_session_stop();
+   if (rte_atomic16_sub_return(&nb_started_ports, 1) == 0)
+   vhost_driver_session_stop();
+   }
 }
 
 static int
-- 
2.1.4



[dpdk-dev] [PATCH 2/2] vhost: start vhost servers once

2016-12-28 Thread Charles (Chas) Williams
Start a vhost server once during devinit instead of during device start
and stop.  Some vhost clients, QEMU, don't re-attaching to sockets when
the vhost server is stopped and later started.  Preserve existing behavior
for vhost clients.

Fixes: ee584e9710b9 ("vhost: add driver on top of the library")

Signed-off-by: Chas Williams 
---
 drivers/net/vhost/rte_eth_vhost.c | 36 +++-
 1 file changed, 31 insertions(+), 5 deletions(-)

diff --git a/drivers/net/vhost/rte_eth_vhost.c 
b/drivers/net/vhost/rte_eth_vhost.c
index ba559ff..914d83f 100644
--- a/drivers/net/vhost/rte_eth_vhost.c
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -772,9 +772,8 @@ vhost_driver_session_stop(void)
 }
 
 static int
-eth_dev_start(struct rte_eth_dev *dev)
+vhost_dev_start(struct pmd_internal *internal)
 {
-   struct pmd_internal *internal = dev->data->dev_private;
int ret = 0;
 
if (rte_atomic16_cmpset(&internal->once, 0, 1)) {
@@ -792,10 +791,8 @@ eth_dev_start(struct rte_eth_dev *dev)
 }
 
 static void
-eth_dev_stop(struct rte_eth_dev *dev)
+vhost_dev_stop(struct pmd_internal *internal)
 {
-   struct pmd_internal *internal = dev->data->dev_private;
-
if (rte_atomic16_cmpset(&internal->once, 1, 0)) {
rte_vhost_driver_unregister(internal->iface_name);
 
@@ -805,6 +802,27 @@ eth_dev_stop(struct rte_eth_dev *dev)
 }
 
 static int
+eth_dev_start(struct rte_eth_dev *dev)
+{
+   struct pmd_internal *internal = dev->data->dev_private;
+   int ret = 0;
+
+   if (internal->flags & RTE_VHOST_USER_CLIENT)
+   ret = vhost_dev_start(internal);
+
+   return ret;
+}
+
+static void
+eth_dev_stop(struct rte_eth_dev *dev)
+{
+   struct pmd_internal *internal = dev->data->dev_private;
+
+   if (internal->flags & RTE_VHOST_USER_CLIENT)
+   vhost_dev_stop(internal);
+}
+
+static int
 eth_rx_queue_setup(struct rte_eth_dev *dev, uint16_t rx_queue_id,
   uint16_t nb_rx_desc __rte_unused,
   unsigned int socket_id,
@@ -1079,6 +1097,11 @@ eth_dev_vhost_create(const char *name, char *iface_name, 
int16_t queues,
eth_dev->rx_pkt_burst = eth_vhost_rx;
eth_dev->tx_pkt_burst = eth_vhost_tx;
 
+   if ((flags & RTE_VHOST_USER_CLIENT) == 0) {
+   if (vhost_dev_start(internal))
+   goto error;
+   }
+
return data->port_id;
 
 error:
@@ -1216,6 +1239,9 @@ rte_pmd_vhost_remove(const char *name)
 
eth_dev_stop(eth_dev);
 
+   if ((internal->flags & RTE_VHOST_USER_CLIENT) == 0)
+   vhost_dev_stop(internal);
+
rte_free(vring_states[eth_dev->data->port_id]);
vring_states[eth_dev->data->port_id] = NULL;
 
-- 
2.1.4



Re: [dpdk-dev] [PATCH 1/2] vhost: reference count fix for nb_started_ports

2016-12-29 Thread Charles (Chas) Williams


On 12/29/2016 03:51 AM, Yuanhan Liu wrote:

On Wed, Dec 28, 2016 at 04:10:51PM -0500, Charles (Chas) Williams wrote:

From: Wen Chiu 

Only increment and decrement nb_started_ports on the first and last
device start and stop.  Otherwise, nb_started_ports can become negative
if a device is stopped multiple times.


How could you be able to stop dev (precisely, invoke eth_dev_stop)
multiple times, judging that eth_dev_stop() will be invoked once
only?

void
rte_eth_dev_stop(uint8_t port_id)
{
struct rte_eth_dev *dev;

RTE_ETH_VALID_PORTID_OR_RET(port_id);
dev = &rte_eth_devices[port_id];

RTE_FUNC_PTR_OR_RET(*dev->dev_ops->dev_stop);

==> if (dev->data->dev_started == 0) {
RTE_PMD_DEBUG_TRACE("Device with port_id=%" PRIu8
" already stopped\n",
port_id);
return;
}

==> dev->data->dev_started = 0;
(*dev->dev_ops->dev_stop)(dev);
}

Multiple threads?


No, we aren't using multiple threads for control.  But eth_dev_stop()
is called in rte_pmd_vhost_remove():

static int
rte_pmd_vhost_remove(const char *name)
{
...
pthread_mutex_lock(&internal_list_lock);
TAILQ_REMOVE(&internal_list, list, next);
pthread_mutex_unlock(&internal_list_lock);
rte_free(list);

eth_dev_stop(eth_dev);
...

So, if we .dev_stop() and deatch the virtual device, eth_dev_stop()
gets called twice.  Calling .dev_stop() when you are about to detach
the device seems completely reasonable.  It also seems reasonable to
call eth_dev_stop() inside rte_pmd_vhost_remove() in case the end
user didn't do a .dev_stop().


Re: [dpdk-dev] [PATCH 2/2] vhost: start vhost servers once

2016-12-29 Thread Charles (Chas) Williams

On 12/29/2016 03:52 AM, Yuanhan Liu wrote:

On Wed, Dec 28, 2016 at 04:10:52PM -0500, Charles (Chas) Williams wrote:

Start a vhost server once during devinit instead of during device start
and stop.  Some vhost clients, QEMU, don't re-attaching to sockets when
the vhost server is stopped and later started.  Preserve existing behavior
for vhost clients.


I didn't quite get the idea what you are going to fix.


The issue I am trying to fix is QEMU interaction when DPDK's vhost is
acting as a server to QEMU vhost clients.  If you create a vhost server
device, it doesn't create the actual datagram socket until you call
.dev_start().  If you call .dev_stop() is also deletes those sockets.
For QEMU, this is a problem since QEMU doesn't know how to re-attach to
datagram sockets that have gone away.

.dev_start()/.dev_stop() seems to roughly means link up and link down
so I understand why you might want to add/remove the datagram sockets.
However, in practice, this doesn't seem to make much sense for a DPDK
vhost server.  This doesn't seem like the right way to indicate link
status to vhost clients.

It seems like it would just be easier to do this for both clients and
servers, but I don't know why it was done this way originally so I
choose to keep the client behavior.


Re: [dpdk-dev] [PATCH 2/2] vhost: start vhost servers once

2016-12-30 Thread Charles (Chas) Williams



On 12/29/2016 10:15 PM, Yuanhan Liu wrote:

On Thu, Dec 29, 2016 at 10:58:11AM -0500, Charles (Chas) Williams wrote:

On 12/29/2016 03:52 AM, Yuanhan Liu wrote:

On Wed, Dec 28, 2016 at 04:10:52PM -0500, Charles (Chas) Williams wrote:

Start a vhost server once during devinit instead of during device start
and stop.  Some vhost clients, QEMU, don't re-attaching to sockets when
the vhost server is stopped and later started.  Preserve existing behavior
for vhost clients.


I didn't quite get the idea what you are going to fix.


The issue I am trying to fix is QEMU interaction when DPDK's vhost is
acting as a server to QEMU vhost clients.  If you create a vhost server
device, it doesn't create the actual datagram socket until you call
.dev_start().  If you call .dev_stop() is also deletes those sockets.
For QEMU, this is a problem since QEMU doesn't know how to re-attach to
datagram sockets that have gone away.


Thanks! And I'd appreciate it if you could have written the commit log
this way firstly.


.dev_start()/.dev_stop() seems to roughly means link up and link down
so I understand why you might want to add/remove the datagram sockets.
However, in practice, this doesn't seem to make much sense for a DPDK
vhost server.


Agree.


This doesn't seem like the right way to indicate link
status to vhost clients.

It seems like it would just be easier to do this for both clients and
servers, but I don't know why it was done this way originally so I
choose to keep the client behavior.


I don't think there are any differences between DPDK acting as client or
server. To me, the right logic seems to be (for both DPDK as server and
client).

For register,
- register the vhost socket at probe stage (either at rte_pmd_vhost_probe
  or at eth_dev_vhost_create).
- start the vhost session right after the register when we haven't started
  it before.

For unregister,
- invoke rte_vhost_driver_unregister() at rte_pmd_vhost_remove().


OK. This will be much easier than what I submitted.


For dev_start/stop,
- set allow_queuing to 1/0 for start/stop, respectively.


Unfortunately, I don't think this will work.  new_device() doesn't happen
until a client connects.  allow_queueing seems to be following the status
of the "wire" as it where.  .dev_start()/.dev_stop() is the link of local
port connected to the wire (administratively up or down as it where).

.dev_start() can happen before new_device() and attempting to RX for a
client that doesn't exist doesn't seem like a good idea.  Perhaps another
flag that follows dev_started, but for the queues?


[dpdk-dev] [PATCH v2 1/2] net/vhost: create datagram sockets immediately

2017-01-01 Thread Charles (Chas) Williams
If you create a vhost server device, it doesn't create the actual datagram
socket until you call .dev_start().  If you call .dev_stop() is also
deletes those sockets.  For QEMU clients, this is a problem since QEMU
doesn't know how to re-attach to datagram sockets that have gone away.

To work around this, register and unregister the datagram sockets during
device creation and removal.

Fixes: ee584e9710b9 ("vhost: add driver on top of the library")

Signed-off-by: Chas Williams 
---
 drivers/net/vhost/rte_eth_vhost.c | 43 ---
 1 file changed, 17 insertions(+), 26 deletions(-)

diff --git a/drivers/net/vhost/rte_eth_vhost.c 
b/drivers/net/vhost/rte_eth_vhost.c
index 60b0f51..6b11e40 100644
--- a/drivers/net/vhost/rte_eth_vhost.c
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -114,8 +114,6 @@ struct pmd_internal {
char *iface_name;
uint16_t max_queues;
uint64_t flags;
-
-   volatile uint16_t once;
 };
 
 struct internal_list {
@@ -772,35 +770,14 @@ vhost_driver_session_stop(void)
 }
 
 static int
-eth_dev_start(struct rte_eth_dev *dev)
+eth_dev_start(struct rte_eth_dev *dev __rte_unused)
 {
-   struct pmd_internal *internal = dev->data->dev_private;
-   int ret = 0;
-
-   if (rte_atomic16_cmpset(&internal->once, 0, 1)) {
-   ret = rte_vhost_driver_register(internal->iface_name,
-   internal->flags);
-   if (ret)
-   return ret;
-   }
-
-   /* We need only one message handling thread */
-   if (rte_atomic16_add_return(&nb_started_ports, 1) == 1)
-   ret = vhost_driver_session_start();
-
-   return ret;
+   return 0;
 }
 
 static void
-eth_dev_stop(struct rte_eth_dev *dev)
+eth_dev_stop(struct rte_eth_dev *dev __rte_unused)
 {
-   struct pmd_internal *internal = dev->data->dev_private;
-
-   if (rte_atomic16_cmpset(&internal->once, 1, 0))
-   rte_vhost_driver_unregister(internal->iface_name);
-
-   if (rte_atomic16_sub_return(&nb_started_ports, 1) == 0)
-   vhost_driver_session_stop();
 }
 
 static int
@@ -1078,6 +1055,15 @@ eth_dev_vhost_create(const char *name, char *iface_name, 
int16_t queues,
eth_dev->rx_pkt_burst = eth_vhost_rx;
eth_dev->tx_pkt_burst = eth_vhost_tx;
 
+   if (rte_vhost_driver_register(iface_name, flags))
+   goto error;
+
+   /* We need only one message handling thread */
+   if (rte_atomic16_add_return(&nb_started_ports, 1) == 1) {
+   if (vhost_driver_session_start())
+   goto error;
+   }
+
return data->port_id;
 
 error:
@@ -1215,6 +1201,11 @@ rte_pmd_vhost_remove(const char *name)
 
eth_dev_stop(eth_dev);
 
+   rte_vhost_driver_unregister(internal->iface_name);
+
+   if (rte_atomic16_sub_return(&nb_started_ports, 1) == 0)
+   vhost_driver_session_stop();
+
rte_free(vring_states[eth_dev->data->port_id]);
vring_states[eth_dev->data->port_id] = NULL;
 
-- 
2.1.4



[dpdk-dev] [PATCH v2 2/2] net/vhost: emulate device start/stop behavior

2017-01-01 Thread Charles (Chas) Williams
.dev_start()/.dev_stop() roughly corresponds to the local device's
port being up or down.  This is different from the remote client being
connected which is roughtly link up or down.  Emulate the behavior by
separately tracking the local start/stop state to determine if we should
allow packets to be queued to the remote client.

Signed-off-by: Chas Williams 
---
 drivers/net/vhost/rte_eth_vhost.c | 65 ---
 1 file changed, 54 insertions(+), 11 deletions(-)

diff --git a/drivers/net/vhost/rte_eth_vhost.c 
b/drivers/net/vhost/rte_eth_vhost.c
index 6b11e40..c9e45eb 100644
--- a/drivers/net/vhost/rte_eth_vhost.c
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -100,7 +100,8 @@ struct vhost_stats {
 
 struct vhost_queue {
int vid;
-   rte_atomic32_t allow_queuing;
+   rte_atomic32_t connected;
+   rte_atomic32_t ready;
rte_atomic32_t while_queuing;
struct pmd_internal *internal;
struct rte_mempool *mb_pool;
@@ -383,18 +384,25 @@ vhost_update_packet_xstats(struct vhost_queue *vq,
}
 }
 
+static inline bool
+queuing_stopped(struct vhost_queue *r)
+{
+   return unlikely(rte_atomic32_read(&r->connected) == 0 ||
+   rte_atomic32_read(&r->ready) == 0);
+}
+
 static uint16_t
 eth_vhost_rx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
 {
struct vhost_queue *r = q;
uint16_t i, nb_rx = 0;
 
-   if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+   if (queuing_stopped(r))
return 0;
 
rte_atomic32_set(&r->while_queuing, 1);
 
-   if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+   if (queuing_stopped(r))
goto out;
 
/* Dequeue packets from guest TX queue */
@@ -422,12 +430,12 @@ eth_vhost_tx(void *q, struct rte_mbuf **bufs, uint16_t 
nb_bufs)
struct vhost_queue *r = q;
uint16_t i, nb_tx = 0;
 
-   if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+   if (queuing_stopped(r))
return 0;
 
rte_atomic32_set(&r->while_queuing, 1);
 
-   if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+   if (queuing_stopped(r))
goto out;
 
/* Enqueue packets to guest RX queue */
@@ -546,13 +554,13 @@ new_device(int vid)
vq = eth_dev->data->rx_queues[i];
if (vq == NULL)
continue;
-   rte_atomic32_set(&vq->allow_queuing, 1);
+   rte_atomic32_set(&vq->connected, 1);
}
for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
vq = eth_dev->data->tx_queues[i];
if (vq == NULL)
continue;
-   rte_atomic32_set(&vq->allow_queuing, 1);
+   rte_atomic32_set(&vq->connected, 1);
}
 
RTE_LOG(INFO, PMD, "New connection established\n");
@@ -585,7 +593,7 @@ destroy_device(int vid)
vq = eth_dev->data->rx_queues[i];
if (vq == NULL)
continue;
-   rte_atomic32_set(&vq->allow_queuing, 0);
+   rte_atomic32_set(&vq->connected, 0);
while (rte_atomic32_read(&vq->while_queuing))
rte_pause();
}
@@ -593,7 +601,7 @@ destroy_device(int vid)
vq = eth_dev->data->tx_queues[i];
if (vq == NULL)
continue;
-   rte_atomic32_set(&vq->allow_queuing, 0);
+   rte_atomic32_set(&vq->connected, 0);
while (rte_atomic32_read(&vq->while_queuing))
rte_pause();
}
@@ -770,14 +778,49 @@ vhost_driver_session_stop(void)
 }
 
 static int
-eth_dev_start(struct rte_eth_dev *dev __rte_unused)
+eth_dev_start(struct rte_eth_dev *dev)
 {
+   struct vhost_queue *vq;
+   unsigned i;
+
+   for (i = 0; i < dev->data->nb_rx_queues; i++) {
+   vq = dev->data->rx_queues[i];
+   if (vq == NULL)
+   continue;
+   rte_atomic32_set(&vq->ready, 1);
+   }
+   for (i = 0; i < dev->data->nb_tx_queues; i++) {
+   vq = dev->data->tx_queues[i];
+   if (vq == NULL)
+   continue;
+   rte_atomic32_set(&vq->ready, 1);
+   }
+
return 0;
 }
 
 static void
-eth_dev_stop(struct rte_eth_dev *dev __rte_unused)
+eth_dev_stop(struct rte_eth_dev *dev)
 {
+   struct vhost_queue *vq;
+   unsigned i;
+
+   for (i = 0; i < dev->data->nb_rx_queues; i++) {
+   vq = dev->data->rx_queues[i];
+   if (vq == NULL)
+   continue;
+   rte_atomic32_set(&vq->ready, 0);
+   while (rte_atomic32_read(&vq->while_queuing))
+   rte_pause();
+   }
+   for (i = 0; i < dev->data->nb_tx_queues; i++) {
+   vq = dev->data->tx_queues[i];
+   if (vq == NULL)
+   

[dpdk-dev] [PATCH v3 1/2] net/vhost: create datagram sockets immediately

2017-01-01 Thread Charles (Chas) Williams
If you create a vhost server device, it doesn't create the actual datagram
socket until you call .dev_start().  If you call .dev_stop() is also
deletes those sockets.  For QEMU clients, this is a problem since QEMU
doesn't know how to re-attach to datagram sockets that have gone away.

To work around this, register and unregister the datagram sockets during
device creation and removal.

Fixes: ee584e9710b9 ("vhost: add driver on top of the library")

Signed-off-by: Chas Williams 
---
 drivers/net/vhost/rte_eth_vhost.c | 43 ---
 1 file changed, 17 insertions(+), 26 deletions(-)

diff --git a/drivers/net/vhost/rte_eth_vhost.c 
b/drivers/net/vhost/rte_eth_vhost.c
index 60b0f51..6b11e40 100644
--- a/drivers/net/vhost/rte_eth_vhost.c
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -114,8 +114,6 @@ struct pmd_internal {
char *iface_name;
uint16_t max_queues;
uint64_t flags;
-
-   volatile uint16_t once;
 };
 
 struct internal_list {
@@ -772,35 +770,14 @@ vhost_driver_session_stop(void)
 }
 
 static int
-eth_dev_start(struct rte_eth_dev *dev)
+eth_dev_start(struct rte_eth_dev *dev __rte_unused)
 {
-   struct pmd_internal *internal = dev->data->dev_private;
-   int ret = 0;
-
-   if (rte_atomic16_cmpset(&internal->once, 0, 1)) {
-   ret = rte_vhost_driver_register(internal->iface_name,
-   internal->flags);
-   if (ret)
-   return ret;
-   }
-
-   /* We need only one message handling thread */
-   if (rte_atomic16_add_return(&nb_started_ports, 1) == 1)
-   ret = vhost_driver_session_start();
-
-   return ret;
+   return 0;
 }
 
 static void
-eth_dev_stop(struct rte_eth_dev *dev)
+eth_dev_stop(struct rte_eth_dev *dev __rte_unused)
 {
-   struct pmd_internal *internal = dev->data->dev_private;
-
-   if (rte_atomic16_cmpset(&internal->once, 1, 0))
-   rte_vhost_driver_unregister(internal->iface_name);
-
-   if (rte_atomic16_sub_return(&nb_started_ports, 1) == 0)
-   vhost_driver_session_stop();
 }
 
 static int
@@ -1078,6 +1055,15 @@ eth_dev_vhost_create(const char *name, char *iface_name, 
int16_t queues,
eth_dev->rx_pkt_burst = eth_vhost_rx;
eth_dev->tx_pkt_burst = eth_vhost_tx;
 
+   if (rte_vhost_driver_register(iface_name, flags))
+   goto error;
+
+   /* We need only one message handling thread */
+   if (rte_atomic16_add_return(&nb_started_ports, 1) == 1) {
+   if (vhost_driver_session_start())
+   goto error;
+   }
+
return data->port_id;
 
 error:
@@ -1215,6 +1201,11 @@ rte_pmd_vhost_remove(const char *name)
 
eth_dev_stop(eth_dev);
 
+   rte_vhost_driver_unregister(internal->iface_name);
+
+   if (rte_atomic16_sub_return(&nb_started_ports, 1) == 0)
+   vhost_driver_session_stop();
+
rte_free(vring_states[eth_dev->data->port_id]);
vring_states[eth_dev->data->port_id] = NULL;
 
-- 
2.1.4



[dpdk-dev] [PATCH v3 2/2] net/vhost: emulate device start/stop behavior

2017-01-01 Thread Charles (Chas) Williams
.dev_start()/.dev_stop() roughly corresponds to the local device's
port being up or down.  This is different from the remote client being
connected which is roughtly link up or down.  Emulate the behavior by
separately tracking the local start/stop state to determine if we should
allow packets to be queued to the remote client.

Signed-off-by: Chas Williams 
---
 drivers/net/vhost/rte_eth_vhost.c | 65 ---
 1 file changed, 54 insertions(+), 11 deletions(-)

diff --git a/drivers/net/vhost/rte_eth_vhost.c 
b/drivers/net/vhost/rte_eth_vhost.c
index 6b11e40..d5a4540 100644
--- a/drivers/net/vhost/rte_eth_vhost.c
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -100,7 +100,8 @@ struct vhost_stats {
 
 struct vhost_queue {
int vid;
-   rte_atomic32_t allow_queuing;
+   rte_atomic32_t connected;
+   rte_atomic32_t ready;
rte_atomic32_t while_queuing;
struct pmd_internal *internal;
struct rte_mempool *mb_pool;
@@ -383,18 +384,25 @@ vhost_update_packet_xstats(struct vhost_queue *vq,
}
 }
 
+static inline bool
+queuing_stopped(struct vhost_queue *r)
+{
+   return unlikely(rte_atomic32_read(&r->connected) == 0 ||
+   rte_atomic32_read(&r->ready) == 0);
+}
+
 static uint16_t
 eth_vhost_rx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
 {
struct vhost_queue *r = q;
uint16_t i, nb_rx = 0;
 
-   if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+   if (queuing_stopped(r))
return 0;
 
rte_atomic32_set(&r->while_queuing, 1);
 
-   if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+   if (queuing_stopped(r))
goto out;
 
/* Dequeue packets from guest TX queue */
@@ -422,12 +430,12 @@ eth_vhost_tx(void *q, struct rte_mbuf **bufs, uint16_t 
nb_bufs)
struct vhost_queue *r = q;
uint16_t i, nb_tx = 0;
 
-   if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+   if (queuing_stopped(r))
return 0;
 
rte_atomic32_set(&r->while_queuing, 1);
 
-   if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
+   if (queuing_stopped(r))
goto out;
 
/* Enqueue packets to guest RX queue */
@@ -546,13 +554,13 @@ new_device(int vid)
vq = eth_dev->data->rx_queues[i];
if (vq == NULL)
continue;
-   rte_atomic32_set(&vq->allow_queuing, 1);
+   rte_atomic32_set(&vq->connected, 1);
}
for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
vq = eth_dev->data->tx_queues[i];
if (vq == NULL)
continue;
-   rte_atomic32_set(&vq->allow_queuing, 1);
+   rte_atomic32_set(&vq->connected, 1);
}
 
RTE_LOG(INFO, PMD, "New connection established\n");
@@ -585,7 +593,7 @@ destroy_device(int vid)
vq = eth_dev->data->rx_queues[i];
if (vq == NULL)
continue;
-   rte_atomic32_set(&vq->allow_queuing, 0);
+   rte_atomic32_set(&vq->connected, 0);
while (rte_atomic32_read(&vq->while_queuing))
rte_pause();
}
@@ -593,7 +601,7 @@ destroy_device(int vid)
vq = eth_dev->data->tx_queues[i];
if (vq == NULL)
continue;
-   rte_atomic32_set(&vq->allow_queuing, 0);
+   rte_atomic32_set(&vq->connected, 0);
while (rte_atomic32_read(&vq->while_queuing))
rte_pause();
}
@@ -770,14 +778,49 @@ vhost_driver_session_stop(void)
 }
 
 static int
-eth_dev_start(struct rte_eth_dev *dev __rte_unused)
+eth_dev_start(struct rte_eth_dev *dev)
 {
+   struct vhost_queue *vq;
+   unsigned int i;
+
+   for (i = 0; i < dev->data->nb_rx_queues; i++) {
+   vq = dev->data->rx_queues[i];
+   if (vq == NULL)
+   continue;
+   rte_atomic32_set(&vq->ready, 1);
+   }
+   for (i = 0; i < dev->data->nb_tx_queues; i++) {
+   vq = dev->data->tx_queues[i];
+   if (vq == NULL)
+   continue;
+   rte_atomic32_set(&vq->ready, 1);
+   }
+
return 0;
 }
 
 static void
-eth_dev_stop(struct rte_eth_dev *dev __rte_unused)
+eth_dev_stop(struct rte_eth_dev *dev)
 {
+   struct vhost_queue *vq;
+   unsigned int i;
+
+   for (i = 0; i < dev->data->nb_rx_queues; i++) {
+   vq = dev->data->rx_queues[i];
+   if (vq == NULL)
+   continue;
+   rte_atomic32_set(&vq->ready, 0);
+   while (rte_atomic32_read(&vq->while_queuing))
+   rte_pause();
+   }
+   for (i = 0; i < dev->data->nb_tx_queues; i++) {
+   vq = dev->data->tx_queues[i];
+   if (vq == NULL)

Re: [dpdk-dev] [PATCH v3 1/2] net/vhost: create datagram sockets immediately

2017-01-03 Thread Charles (Chas) Williams



On 01/03/2017 03:22 AM, Yuanhan Liu wrote:

On Sun, Jan 01, 2017 at 02:01:56PM -0500, Charles (Chas) Williams wrote:

If you create a vhost server device, it doesn't create the actual datagram
socket until you call .dev_start().  If you call .dev_stop() is also
deletes those sockets.  For QEMU clients, this is a problem since QEMU
doesn't know how to re-attach to datagram sockets that have gone away.

To work around this, register and unregister the datagram sockets during


I will not call it's a "workaround", instead, it's a "fix" to me.


OK.


device creation and removal.

Fixes: ee584e9710b9 ("vhost: add driver on top of the library")

Signed-off-by: Chas Williams 
---
 drivers/net/vhost/rte_eth_vhost.c | 43 ---
 1 file changed, 17 insertions(+), 26 deletions(-)

diff --git a/drivers/net/vhost/rte_eth_vhost.c 
b/drivers/net/vhost/rte_eth_vhost.c
index 60b0f51..6b11e40 100644
--- a/drivers/net/vhost/rte_eth_vhost.c
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -114,8 +114,6 @@ struct pmd_internal {
char *iface_name;
uint16_t max_queues;
uint64_t flags;


I think the "flags" could also be dropped in this patch: no user any
more.


Sorry, I hadn't noticed that -- Yes, it can go away.


[dpdk-dev] [PATCH v4 2/2] net/vhost: emulate device start/stop behavior

2017-01-03 Thread Charles (Chas) Williams
.dev_start()/.dev_stop() roughly corresponds to the local device's port
being ready.  This is different from the remote client being connected
which is roughly link up or down.  Emulate the device start/stop behavior
by separately tracking the start/stop state to determine if we should
allow packets to be queued to/from the remote client.

Signed-off-by: Chas Williams 
---
 drivers/net/vhost/rte_eth_vhost.c | 82 ---
 1 file changed, 51 insertions(+), 31 deletions(-)

diff --git a/drivers/net/vhost/rte_eth_vhost.c 
b/drivers/net/vhost/rte_eth_vhost.c
index c669e79..0b5b80a 100644
--- a/drivers/net/vhost/rte_eth_vhost.c
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -110,9 +110,11 @@ struct vhost_queue {
 };
 
 struct pmd_internal {
+   rte_atomic32_t dev_attached;
char *dev_name;
char *iface_name;
uint16_t max_queues;
+   rte_atomic32_t started;
 };
 
 struct internal_list {
@@ -490,6 +492,38 @@ find_internal_resource(char *ifname)
return list;
 }
 
+static void
+update_queuing_status(struct rte_eth_dev *dev)
+{
+   struct pmd_internal *internal = dev->data->dev_private;
+   struct vhost_queue *vq;
+   unsigned int i;
+   int allow_queuing = 1;
+
+   if (rte_atomic32_read(&internal->started) == 0 ||
+   rte_atomic32_read(&internal->dev_attached) == 0)
+   allow_queuing = 0;
+
+   /* Wait until rx/tx_pkt_burst stops accessing vhost device */
+   for (i = 0; i < dev->data->nb_rx_queues; i++) {
+   vq = dev->data->rx_queues[i];
+   if (vq == NULL)
+   continue;
+   rte_atomic32_set(&vq->allow_queuing, allow_queuing);
+   while (rte_atomic32_read(&vq->while_queuing))
+   rte_pause();
+   }
+
+   for (i = 0; i < dev->data->nb_tx_queues; i++) {
+   vq = dev->data->tx_queues[i];
+   if (vq == NULL)
+   continue;
+   rte_atomic32_set(&vq->allow_queuing, allow_queuing);
+   while (rte_atomic32_read(&vq->while_queuing))
+   rte_pause();
+   }
+}
+
 static int
 new_device(int vid)
 {
@@ -541,18 +575,8 @@ new_device(int vid)
 
eth_dev->data->dev_link.link_status = ETH_LINK_UP;
 
-   for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
-   vq = eth_dev->data->rx_queues[i];
-   if (vq == NULL)
-   continue;
-   rte_atomic32_set(&vq->allow_queuing, 1);
-   }
-   for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
-   vq = eth_dev->data->tx_queues[i];
-   if (vq == NULL)
-   continue;
-   rte_atomic32_set(&vq->allow_queuing, 1);
-   }
+   rte_atomic32_set(&internal->dev_attached, 1);
+   update_queuing_status(eth_dev);
 
RTE_LOG(INFO, PMD, "New connection established\n");
 
@@ -565,6 +589,7 @@ static void
 destroy_device(int vid)
 {
struct rte_eth_dev *eth_dev;
+   struct pmd_internal *internal;
struct vhost_queue *vq;
struct internal_list *list;
char ifname[PATH_MAX];
@@ -578,24 +603,10 @@ destroy_device(int vid)
return;
}
eth_dev = list->eth_dev;
+   internal = eth_dev->data->dev_private;
 
-   /* Wait until rx/tx_pkt_burst stops accessing vhost device */
-   for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
-   vq = eth_dev->data->rx_queues[i];
-   if (vq == NULL)
-   continue;
-   rte_atomic32_set(&vq->allow_queuing, 0);
-   while (rte_atomic32_read(&vq->while_queuing))
-   rte_pause();
-   }
-   for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
-   vq = eth_dev->data->tx_queues[i];
-   if (vq == NULL)
-   continue;
-   rte_atomic32_set(&vq->allow_queuing, 0);
-   while (rte_atomic32_read(&vq->while_queuing))
-   rte_pause();
-   }
+   rte_atomic32_set(&internal->dev_attached, 0);
+   update_queuing_status(eth_dev);
 
eth_dev->data->dev_link.link_status = ETH_LINK_DOWN;
 
@@ -769,14 +780,23 @@ vhost_driver_session_stop(void)
 }
 
 static int
-eth_dev_start(struct rte_eth_dev *dev __rte_unused)
+eth_dev_start(struct rte_eth_dev *dev)
 {
+   struct pmd_internal *internal = dev->data->dev_private;
+
+   rte_atomic32_set(&internal->started, 1);
+   update_queuing_status(dev);
+
return 0;
 }
 
 static void
-eth_dev_stop(struct rte_eth_dev *dev __rte_unused)
+eth_dev_stop(struct rte_eth_dev *dev)
 {
+   struct pmd_internal *internal = dev->data->dev_private;
+
+   rte_atomic32_set(&internal->started, 0);
+   update_queuing_status(dev);
 }
 
 static int
-- 
2.1.4



[dpdk-dev] [PATCH v4 1/2] net/vhost: create datagram sockets immediately

2017-01-03 Thread Charles (Chas) Williams
If you create a vhost server device, it doesn't create the actual datagram
socket until you call .dev_start().  If you call .dev_stop() is also
deletes those sockets.  For QEMU clients, this is a problem since QEMU
doesn't know how to re-attach to datagram sockets that have gone away.

To fix this, register and unregister the datagram sockets during device
creation and removal.

Fixes: ee584e9710b9 ("vhost: add driver on top of the library")

Signed-off-by: Chas Williams 
---
 drivers/net/vhost/rte_eth_vhost.c | 45 +++
 1 file changed, 17 insertions(+), 28 deletions(-)

diff --git a/drivers/net/vhost/rte_eth_vhost.c 
b/drivers/net/vhost/rte_eth_vhost.c
index 60b0f51..c669e79 100644
--- a/drivers/net/vhost/rte_eth_vhost.c
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -113,9 +113,6 @@ struct pmd_internal {
char *dev_name;
char *iface_name;
uint16_t max_queues;
-   uint64_t flags;
-
-   volatile uint16_t once;
 };
 
 struct internal_list {
@@ -772,35 +769,14 @@ vhost_driver_session_stop(void)
 }
 
 static int
-eth_dev_start(struct rte_eth_dev *dev)
+eth_dev_start(struct rte_eth_dev *dev __rte_unused)
 {
-   struct pmd_internal *internal = dev->data->dev_private;
-   int ret = 0;
-
-   if (rte_atomic16_cmpset(&internal->once, 0, 1)) {
-   ret = rte_vhost_driver_register(internal->iface_name,
-   internal->flags);
-   if (ret)
-   return ret;
-   }
-
-   /* We need only one message handling thread */
-   if (rte_atomic16_add_return(&nb_started_ports, 1) == 1)
-   ret = vhost_driver_session_start();
-
-   return ret;
+   return 0;
 }
 
 static void
-eth_dev_stop(struct rte_eth_dev *dev)
+eth_dev_stop(struct rte_eth_dev *dev __rte_unused)
 {
-   struct pmd_internal *internal = dev->data->dev_private;
-
-   if (rte_atomic16_cmpset(&internal->once, 1, 0))
-   rte_vhost_driver_unregister(internal->iface_name);
-
-   if (rte_atomic16_sub_return(&nb_started_ports, 1) == 0)
-   vhost_driver_session_stop();
 }
 
 static int
@@ -1043,7 +1019,6 @@ eth_dev_vhost_create(const char *name, char *iface_name, 
int16_t queues,
internal->iface_name = strdup(iface_name);
if (internal->iface_name == NULL)
goto error;
-   internal->flags = flags;
 
list->eth_dev = eth_dev;
pthread_mutex_lock(&internal_list_lock);
@@ -1078,6 +1053,15 @@ eth_dev_vhost_create(const char *name, char *iface_name, 
int16_t queues,
eth_dev->rx_pkt_burst = eth_vhost_rx;
eth_dev->tx_pkt_burst = eth_vhost_tx;
 
+   if (rte_vhost_driver_register(iface_name, flags))
+   goto error;
+
+   /* We need only one message handling thread */
+   if (rte_atomic16_add_return(&nb_started_ports, 1) == 1) {
+   if (vhost_driver_session_start())
+   goto error;
+   }
+
return data->port_id;
 
 error:
@@ -1215,6 +1199,11 @@ rte_pmd_vhost_remove(const char *name)
 
eth_dev_stop(eth_dev);
 
+   rte_vhost_driver_unregister(internal->iface_name);
+
+   if (rte_atomic16_sub_return(&nb_started_ports, 1) == 0)
+   vhost_driver_session_stop();
+
rte_free(vring_states[eth_dev->data->port_id]);
vring_states[eth_dev->data->port_id] = NULL;
 
-- 
2.1.4



[dpdk-dev] [PATCH 1/4] net/af_packet: add iface name to internals

2017-01-03 Thread Charles (Chas) Williams
This will be used by later changes to determine the underlying linux
interface.

Signed-off-by: Chas Williams 
---
 drivers/net/af_packet/rte_eth_af_packet.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/af_packet/rte_eth_af_packet.c 
b/drivers/net/af_packet/rte_eth_af_packet.c
index a1e13ff..5541fd7 100644
--- a/drivers/net/af_packet/rte_eth_af_packet.c
+++ b/drivers/net/af_packet/rte_eth_af_packet.c
@@ -99,6 +99,7 @@ struct pmd_internals {
unsigned nb_queues;
 
int if_index;
+   char *if_name;
struct ether_addr eth_addr;
 
struct tpacket_req req;
@@ -533,6 +534,7 @@ rte_pmd_init_internals(const char *name,
name);
goto error_early;
}
+   (*internals)->if_name = strdup(pair->value);
(*internals)->if_index = ifr.ifr_ifindex;
 
if (ioctl(sockfd, SIOCGIFHWADDR, &ifr) == -1) {
@@ -724,6 +726,7 @@ rte_pmd_init_internals(const char *name,
((*internals)->rx_queue[q].sockfd != qsockfd))
close((*internals)->rx_queue[q].sockfd);
}
+   rte_free((*internals)->if_name);
rte_free(*internals);
 error_early:
rte_free(data);
@@ -892,6 +895,7 @@ rte_pmd_af_packet_remove(const char *name)
rte_free(internals->rx_queue[q].rd);
rte_free(internals->tx_queue[q].rd);
}
+   rte_free(internals->if_name);
 
rte_free(eth_dev->data->dev_private);
rte_free(eth_dev->data);
-- 
2.1.4



[dpdk-dev] [PATCH 3/4] net/af_packet: promisicuous support

2017-01-03 Thread Charles (Chas) Williams
Add promiscuous support to the AF_PACKET PMD.  The underlying linux
device's IF_PROMISC flag is toggled to enable or disable.

Signed-off-by: Charles (Chas) Williams 
---
 drivers/net/af_packet/rte_eth_af_packet.c | 40 +++
 1 file changed, 40 insertions(+)

diff --git a/drivers/net/af_packet/rte_eth_af_packet.c 
b/drivers/net/af_packet/rte_eth_af_packet.c
index d8ac5c6..b01a8d1 100644
--- a/drivers/net/af_packet/rte_eth_af_packet.c
+++ b/drivers/net/af_packet/rte_eth_af_packet.c
@@ -441,6 +441,44 @@ eth_dev_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
return 0;
 }
 
+static void
+eth_dev_change_flags(char *if_name, uint32_t flags, uint32_t mask)
+{
+   struct ifreq ifr;
+   int s;
+
+   s = socket(PF_INET, SOCK_DGRAM, 0);
+   if (s < 0)
+   return;
+
+   strncpy(ifr.ifr_name, if_name, IFNAMSIZ);
+   if (ioctl(s, SIOCGIFFLAGS, &ifr) < 0)
+   goto out;
+   ifr.ifr_flags &= mask;
+   ifr.ifr_flags |= flags;
+   if (ioctl(s, SIOCSIFFLAGS, &ifr) < 0)
+   goto out;
+out:
+   close(s);
+   return;
+}
+
+static void
+eth_dev_promiscuous_enable(struct rte_eth_dev *dev)
+{
+   struct pmd_internals *internals = dev->data->dev_private;
+
+   eth_dev_change_flags(internals->if_name, IFF_PROMISC, ~0);
+}
+
+static void
+eth_dev_promiscuous_disable(struct rte_eth_dev *dev)
+{
+   struct pmd_internals *internals = dev->data->dev_private;
+
+   eth_dev_change_flags(internals->if_name, 0, ~IFF_PROMISC);
+}
+
 static const struct eth_dev_ops ops = {
.dev_start = eth_dev_start,
.dev_stop = eth_dev_stop,
@@ -448,6 +486,8 @@ static const struct eth_dev_ops ops = {
.dev_configure = eth_dev_configure,
.dev_infos_get = eth_dev_info,
.mtu_set = eth_dev_mtu_set,
+   .promiscuous_disable = eth_dev_promiscuous_disable,
+   .promiscuous_enable = eth_dev_promiscuous_enable,
.rx_queue_setup = eth_rx_queue_setup,
.tx_queue_setup = eth_tx_queue_setup,
.rx_queue_release = eth_queue_release,
-- 
2.1.4



[dpdk-dev] [PATCH 4/4] net/af_packet: add 802.1Q (VLAN) support

2017-01-03 Thread Charles (Chas) Williams
AF_PACKET has some flags to check on the receive side for 802.1Q
information.  If present, we copy into the mbuf.  For transmit, we
insert any 802.1Q information into the packet before copying to the ring.

Signed-off-by: Charles (Chas) Williams 
---
 drivers/net/af_packet/rte_eth_af_packet.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/drivers/net/af_packet/rte_eth_af_packet.c 
b/drivers/net/af_packet/rte_eth_af_packet.c
index b01a8d1..7f1df92 100644
--- a/drivers/net/af_packet/rte_eth_af_packet.c
+++ b/drivers/net/af_packet/rte_eth_af_packet.c
@@ -161,6 +161,12 @@ eth_af_packet_rx(void *queue, struct rte_mbuf **bufs, 
uint16_t nb_pkts)
pbuf = (uint8_t *) ppd + ppd->tp_mac;
memcpy(rte_pktmbuf_mtod(mbuf, void *), pbuf, 
rte_pktmbuf_data_len(mbuf));
 
+   /* check for vlan info */
+   if (ppd->tp_status & TP_STATUS_VLAN_VALID) {
+   mbuf->vlan_tci = ppd->tp_vlan_tci;
+   mbuf->ol_flags |= (PKT_RX_VLAN_PKT | 
PKT_RX_VLAN_STRIPPED);
+   }
+
/* release incoming frame and advance ring buffer */
ppd->tp_status = TP_STATUS_KERNEL;
if (++framenum >= framecount)
@@ -214,6 +220,14 @@ eth_af_packet_tx(void *queue, struct rte_mbuf **bufs, 
uint16_t nb_pkts)
continue;
}
 
+   /* insert vlan info if necessary */
+   if (mbuf->ol_flags & PKT_TX_VLAN_PKT) {
+   if (rte_vlan_insert(&mbuf)) {
+   rte_pktmbuf_free(mbuf);
+   continue;
+   }
+   }
+
/* point at the next incoming frame */
if ((ppd->tp_status != TP_STATUS_AVAILABLE) &&
(poll(&pfd, 1, -1) < 0))
-- 
2.1.4



[dpdk-dev] [PATCH 2/4] net/af_packet: add support to change mtu

2017-01-03 Thread Charles (Chas) Williams
The underlying linux device's MTU is changed subject to the frame size
limitations during device creation.

Signed-off-by: Charles (Chas) Williams 
---
 drivers/net/af_packet/rte_eth_af_packet.c | 29 +
 1 file changed, 29 insertions(+)

diff --git a/drivers/net/af_packet/rte_eth_af_packet.c 
b/drivers/net/af_packet/rte_eth_af_packet.c
index 5541fd7..d8ac5c6 100644
--- a/drivers/net/af_packet/rte_eth_af_packet.c
+++ b/drivers/net/af_packet/rte_eth_af_packet.c
@@ -413,12 +413,41 @@ eth_tx_queue_setup(struct rte_eth_dev *dev,
return 0;
 }
 
+static int
+eth_dev_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
+{
+   struct pmd_internals *internals = dev->data->dev_private;
+   struct ifreq ifr = { .ifr_mtu = mtu };
+   int ret;
+   int s;
+   unsigned int data_size = internals->req.tp_frame_size -
+TPACKET2_HDRLEN -
+sizeof(struct sockaddr_ll);
+
+   if (mtu > data_size)
+   return -EINVAL;
+
+   s = socket(PF_INET, SOCK_DGRAM, 0);
+   if (s < 0)
+   return -EINVAL;
+
+   strncpy(ifr.ifr_name, internals->if_name, IFNAMSIZ);
+   ret = ioctl(s, SIOCSIFMTU, &ifr);
+   close(s);
+
+   if (ret < 0)
+   return -EINVAL;
+
+   return 0;
+}
+
 static const struct eth_dev_ops ops = {
.dev_start = eth_dev_start,
.dev_stop = eth_dev_stop,
.dev_close = eth_dev_close,
.dev_configure = eth_dev_configure,
.dev_infos_get = eth_dev_info,
+   .mtu_set = eth_dev_mtu_set,
.rx_queue_setup = eth_rx_queue_setup,
.tx_queue_setup = eth_tx_queue_setup,
.rx_queue_release = eth_queue_release,
-- 
2.1.4



[dpdk-dev] [PATCH v2 1/4] net/af_packet: add iface name to internals

2017-01-05 Thread Charles (Chas) Williams
This will be used by later changes to determine the underlying linux
interface.

Signed-off-by: Charles (Chas) Williams 
---
 drivers/net/af_packet/rte_eth_af_packet.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/af_packet/rte_eth_af_packet.c 
b/drivers/net/af_packet/rte_eth_af_packet.c
index 17b4ffe..4ef61a2 100644
--- a/drivers/net/af_packet/rte_eth_af_packet.c
+++ b/drivers/net/af_packet/rte_eth_af_packet.c
@@ -99,6 +99,7 @@ struct pmd_internals {
unsigned nb_queues;
 
int if_index;
+   char *if_name;
struct ether_addr eth_addr;
 
struct tpacket_req req;
@@ -532,6 +533,7 @@ rte_pmd_init_internals(const char *name,
name);
goto error_early;
}
+   (*internals)->if_name = strdup(pair->value);
(*internals)->if_index = ifr.ifr_ifindex;
 
if (ioctl(sockfd, SIOCGIFHWADDR, &ifr) == -1) {
@@ -723,6 +725,7 @@ rte_pmd_init_internals(const char *name,
((*internals)->rx_queue[q].sockfd != qsockfd))
close((*internals)->rx_queue[q].sockfd);
}
+   free((*internals)->if_name);
rte_free(*internals);
 error_early:
rte_free(data);
@@ -891,6 +894,7 @@ rte_pmd_af_packet_remove(const char *name)
rte_free(internals->rx_queue[q].rd);
rte_free(internals->tx_queue[q].rd);
}
+   free(internals->if_name);
 
rte_free(eth_dev->data->dev_private);
rte_free(eth_dev->data);
-- 
2.1.4



[dpdk-dev] [PATCH v2 2/4] net/af_packet: add support to change mtu

2017-01-05 Thread Charles (Chas) Williams
The underlying linux device's MTU is changed subject to the frame size
limitations during device creation.

Signed-off-by: Charles (Chas) Williams 
---
 drivers/net/af_packet/rte_eth_af_packet.c | 29 +
 1 file changed, 29 insertions(+)

diff --git a/drivers/net/af_packet/rte_eth_af_packet.c 
b/drivers/net/af_packet/rte_eth_af_packet.c
index 4ef61a2..a700b96 100644
--- a/drivers/net/af_packet/rte_eth_af_packet.c
+++ b/drivers/net/af_packet/rte_eth_af_packet.c
@@ -412,12 +412,41 @@ eth_tx_queue_setup(struct rte_eth_dev *dev,
return 0;
 }
 
+static int
+eth_dev_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
+{
+   struct pmd_internals *internals = dev->data->dev_private;
+   struct ifreq ifr = { .ifr_mtu = mtu };
+   int ret;
+   int s;
+   unsigned int data_size = internals->req.tp_frame_size -
+TPACKET2_HDRLEN -
+sizeof(struct sockaddr_ll);
+
+   if (mtu > data_size)
+   return -EINVAL;
+
+   s = socket(PF_INET, SOCK_DGRAM, 0);
+   if (s < 0)
+   return -EINVAL;
+
+   strncpy(ifr.ifr_name, internals->if_name, IFNAMSIZ);
+   ret = ioctl(s, SIOCSIFMTU, &ifr);
+   close(s);
+
+   if (ret < 0)
+   return -EINVAL;
+
+   return 0;
+}
+
 static const struct eth_dev_ops ops = {
.dev_start = eth_dev_start,
.dev_stop = eth_dev_stop,
.dev_close = eth_dev_close,
.dev_configure = eth_dev_configure,
.dev_infos_get = eth_dev_info,
+   .mtu_set = eth_dev_mtu_set,
.rx_queue_setup = eth_rx_queue_setup,
.tx_queue_setup = eth_tx_queue_setup,
.rx_queue_release = eth_queue_release,
-- 
2.1.4



[dpdk-dev] [PATCH v2 4/4] net/af_packet: add 802.1Q (VLAN) support

2017-01-05 Thread Charles (Chas) Williams
AF_PACKET has some flags to check on the receive side for 802.1Q
information.  If present, we copy into the mbuf.  For transmit, we
insert any 802.1Q information into the packet before copying to the ring.

Signed-off-by: Charles (Chas) Williams 
---
 drivers/net/af_packet/rte_eth_af_packet.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/drivers/net/af_packet/rte_eth_af_packet.c 
b/drivers/net/af_packet/rte_eth_af_packet.c
index 6aa8132..ee51c17 100644
--- a/drivers/net/af_packet/rte_eth_af_packet.c
+++ b/drivers/net/af_packet/rte_eth_af_packet.c
@@ -161,6 +161,12 @@ eth_af_packet_rx(void *queue, struct rte_mbuf **bufs, 
uint16_t nb_pkts)
pbuf = (uint8_t *) ppd + ppd->tp_mac;
memcpy(rte_pktmbuf_mtod(mbuf, void *), pbuf, 
rte_pktmbuf_data_len(mbuf));
 
+   /* check for vlan info */
+   if (ppd->tp_status & TP_STATUS_VLAN_VALID) {
+   mbuf->vlan_tci = ppd->tp_vlan_tci;
+   mbuf->ol_flags |= (PKT_RX_VLAN_PKT | 
PKT_RX_VLAN_STRIPPED);
+   }
+
/* release incoming frame and advance ring buffer */
ppd->tp_status = TP_STATUS_KERNEL;
if (++framenum >= framecount)
@@ -214,6 +220,14 @@ eth_af_packet_tx(void *queue, struct rte_mbuf **bufs, 
uint16_t nb_pkts)
continue;
}
 
+   /* insert vlan info if necessary */
+   if (mbuf->ol_flags & PKT_TX_VLAN_PKT) {
+   if (rte_vlan_insert(&mbuf)) {
+   rte_pktmbuf_free(mbuf);
+   continue;
+   }
+   }
+
/* point at the next incoming frame */
if ((ppd->tp_status != TP_STATUS_AVAILABLE) &&
(poll(&pfd, 1, -1) < 0))
-- 
2.1.4



[dpdk-dev] [PATCH v2 3/4] net/af_packet: promisicuous support

2017-01-05 Thread Charles (Chas) Williams
Add promiscuous support to the AF_PACKET PMD.  The underlying linux
device's IF_PROMISC flag is toggled to enable or disable.

Signed-off-by: Charles (Chas) Williams 
---
 drivers/net/af_packet/rte_eth_af_packet.c | 39 +++
 1 file changed, 39 insertions(+)

diff --git a/drivers/net/af_packet/rte_eth_af_packet.c 
b/drivers/net/af_packet/rte_eth_af_packet.c
index a700b96..6aa8132 100644
--- a/drivers/net/af_packet/rte_eth_af_packet.c
+++ b/drivers/net/af_packet/rte_eth_af_packet.c
@@ -440,6 +440,43 @@ eth_dev_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
return 0;
 }
 
+static void
+eth_dev_change_flags(char *if_name, uint32_t flags, uint32_t mask)
+{
+   struct ifreq ifr;
+   int s;
+
+   s = socket(PF_INET, SOCK_DGRAM, 0);
+   if (s < 0)
+   return;
+
+   strncpy(ifr.ifr_name, if_name, IFNAMSIZ);
+   if (ioctl(s, SIOCGIFFLAGS, &ifr) < 0)
+   goto out;
+   ifr.ifr_flags &= mask;
+   ifr.ifr_flags |= flags;
+   if (ioctl(s, SIOCSIFFLAGS, &ifr) < 0)
+   goto out;
+out:
+   close(s);
+}
+
+static void
+eth_dev_promiscuous_enable(struct rte_eth_dev *dev)
+{
+   struct pmd_internals *internals = dev->data->dev_private;
+
+   eth_dev_change_flags(internals->if_name, IFF_PROMISC, ~0);
+}
+
+static void
+eth_dev_promiscuous_disable(struct rte_eth_dev *dev)
+{
+   struct pmd_internals *internals = dev->data->dev_private;
+
+   eth_dev_change_flags(internals->if_name, 0, ~IFF_PROMISC);
+}
+
 static const struct eth_dev_ops ops = {
.dev_start = eth_dev_start,
.dev_stop = eth_dev_stop,
@@ -447,6 +484,8 @@ static const struct eth_dev_ops ops = {
.dev_configure = eth_dev_configure,
.dev_infos_get = eth_dev_info,
.mtu_set = eth_dev_mtu_set,
+   .promiscuous_enable = eth_dev_promiscuous_enable,
+   .promiscuous_disable = eth_dev_promiscuous_disable,
.rx_queue_setup = eth_rx_queue_setup,
.tx_queue_setup = eth_tx_queue_setup,
.rx_queue_release = eth_queue_release,
-- 
2.1.4



[dpdk-dev] [PATCH] net/bnx2x: Fix transmit queue free threshold

2017-02-10 Thread Charles (Chas) Williams
The default tx_free_thresh is potentially larger than the allocated queue
which will result in TX queue cleanup never happening.  To fix this,
lower the default free threshold and ensure that the free threshold is
never greater than the maximum outstanding transmit buffers.

Fixes: 827ed2a118cc ("net/bnx2x: restructure Tx routine")

Signed-off-by: Chas Williams 
---
 drivers/net/bnx2x/bnx2x_rxtx.c | 2 ++
 drivers/net/bnx2x/bnx2x_rxtx.h | 2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/bnx2x/bnx2x_rxtx.c b/drivers/net/bnx2x/bnx2x_rxtx.c
index 170e48f..adf0309 100644
--- a/drivers/net/bnx2x/bnx2x_rxtx.c
+++ b/drivers/net/bnx2x/bnx2x_rxtx.c
@@ -273,6 +273,8 @@ bnx2x_dev_tx_queue_setup(struct rte_eth_dev *dev,
 
txq->tx_free_thresh = tx_conf->tx_free_thresh ?
tx_conf->tx_free_thresh : DEFAULT_TX_FREE_THRESH;
+   txq->tx_free_thresh = min(txq->tx_free_thresh,
+ txq->nb_tx_desc - BDS_PER_TX_PKT);
 
PMD_INIT_LOG(DEBUG, "fp[%02d] req_bd=%u, thresh=%u, usable_bd=%lu, "
 "total_bd=%lu, tx_pages=%u",
diff --git a/drivers/net/bnx2x/bnx2x_rxtx.h b/drivers/net/bnx2x/bnx2x_rxtx.h
index dd251aa..2e38ec2 100644
--- a/drivers/net/bnx2x/bnx2x_rxtx.h
+++ b/drivers/net/bnx2x/bnx2x_rxtx.h
@@ -11,7 +11,7 @@
 #ifndef _BNX2X_RXTX_H_
 #define _BNX2X_RXTX_H_
 
-#define DEFAULT_TX_FREE_THRESH   512
+#define DEFAULT_TX_FREE_THRESH   64
 #define RTE_PMD_BNX2X_TX_MAX_BURST 1
 
 /**
-- 
2.1.4



[dpdk-dev] [PATCH] net/vmxnet3: fix queue size changes

2017-03-13 Thread Charles (Chas) Williams
If the user reconfigures the queues size, then the previosly allocated
memzone may potentially be too small.  Instead, always free the old
memzone and allocate a new one.

Fixes: dfaff37fc46d ("vmxnet3: import new vmxnet3 poll mode driver 
implementation")

Signed-off-by: Chas Williams 
---
 drivers/net/vmxnet3/vmxnet3_rxtx.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/vmxnet3/vmxnet3_rxtx.c 
b/drivers/net/vmxnet3/vmxnet3_rxtx.c
index 6649c3f..104e040 100644
--- a/drivers/net/vmxnet3/vmxnet3_rxtx.c
+++ b/drivers/net/vmxnet3/vmxnet3_rxtx.c
@@ -893,8 +893,8 @@ vmxnet3_recv_pkts(void *rx_queue, struct rte_mbuf 
**rx_pkts, uint16_t nb_pkts)
 
 /*
  * Create memzone for device rings. malloc can't be used as the physical 
address is
- * needed. If the memzone is already created, then this function returns a ptr
- * to the old one.
+ * needed. If the memzone already exists, we free it since it may have been 
created
+ * with a different size.
  */
 static const struct rte_memzone *
 ring_dma_zone_reserve(struct rte_eth_dev *dev, const char *ring_name,
@@ -909,7 +909,7 @@ ring_dma_zone_reserve(struct rte_eth_dev *dev, const char 
*ring_name,
 
mz = rte_memzone_lookup(z_name);
if (mz)
-   return mz;
+   rte_memzone_free(mz);
 
return rte_memzone_reserve_aligned(z_name, ring_size,
   socket_id, 0, VMXNET3_RING_BA_ALIGN);
-- 
2.1.4



Re: [dpdk-dev] [PATCH] net/vmxnet3: fix queue size changes

2017-03-14 Thread Charles (Chas) Williams



On 03/14/2017 12:11 PM, Jan Blunck wrote:

On Mon, Mar 13, 2017 at 11:41 PM, Charles (Chas) Williams
 wrote:

If the user reconfigures the queues size, then the previosly allocated
memzone may potentially be too small.  Instead, always free the old
memzone and allocate a new one.

Fixes: dfaff37fc46d ("vmxnet3: import new vmxnet3 poll mode driver 
implementation")

Signed-off-by: Chas Williams 
---
 drivers/net/vmxnet3/vmxnet3_rxtx.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/vmxnet3/vmxnet3_rxtx.c 
b/drivers/net/vmxnet3/vmxnet3_rxtx.c
index 6649c3f..104e040 100644
--- a/drivers/net/vmxnet3/vmxnet3_rxtx.c
+++ b/drivers/net/vmxnet3/vmxnet3_rxtx.c
@@ -893,8 +893,8 @@ vmxnet3_recv_pkts(void *rx_queue, struct rte_mbuf 
**rx_pkts, uint16_t nb_pkts)

 /*
  * Create memzone for device rings. malloc can't be used as the physical 
address is
- * needed. If the memzone is already created, then this function returns a ptr
- * to the old one.
+ * needed. If the memzone already exists, we free it since it may have been 
created
+ * with a different size.
  */
 static const struct rte_memzone *
 ring_dma_zone_reserve(struct rte_eth_dev *dev, const char *ring_name,
@@ -909,7 +909,7 @@ ring_dma_zone_reserve(struct rte_eth_dev *dev, const char 
*ring_name,

mz = rte_memzone_lookup(z_name);
if (mz)
-   return mz;
+   rte_memzone_free(mz);

return rte_memzone_reserve_aligned(z_name, ring_size,
   socket_id, 0, VMXNET3_RING_BA_ALIGN);


Chas,

Thanks for hunting this one down. Wouldn't the rte_memzone_free()
better fit into vmxnet3_cmd_ring_release() ?


I don't care which way it goes.  I just did what is basically done in
gpa_zone_reserve() to match the "style".  Tracking the current ring size
and avoiding reallocating a potentially large chunk of memory seems like
a better idea.


Also the ring_dma_zone_reserve() could get replaced by
rte_eth_dma_zone_reserve() (see also


Yes, it probably should get changed to that along with tracking the size.


Re: [dpdk-dev] [PATCH] net/vmxnet3: fix queue size changes

2017-03-15 Thread Charles (Chas) Williams



On 03/15/2017 04:18 AM, Jan Blunck wrote:

On Tue, Mar 14, 2017 at 5:38 PM, Charles (Chas) Williams
 wrote:



On 03/14/2017 12:11 PM, Jan Blunck wrote:


On Mon, Mar 13, 2017 at 11:41 PM, Charles (Chas) Williams
 wrote:


If the user reconfigures the queues size, then the previosly allocated
memzone may potentially be too small.  Instead, always free the old
memzone and allocate a new one.

Fixes: dfaff37fc46d ("vmxnet3: import new vmxnet3 poll mode driver
implementation")

Signed-off-by: Chas Williams 
---
 drivers/net/vmxnet3/vmxnet3_rxtx.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/vmxnet3/vmxnet3_rxtx.c
b/drivers/net/vmxnet3/vmxnet3_rxtx.c
index 6649c3f..104e040 100644
--- a/drivers/net/vmxnet3/vmxnet3_rxtx.c
+++ b/drivers/net/vmxnet3/vmxnet3_rxtx.c
@@ -893,8 +893,8 @@ vmxnet3_recv_pkts(void *rx_queue, struct rte_mbuf
**rx_pkts, uint16_t nb_pkts)

 /*
  * Create memzone for device rings. malloc can't be used as the physical
address is
- * needed. If the memzone is already created, then this function returns
a ptr
- * to the old one.
+ * needed. If the memzone already exists, we free it since it may have
been created
+ * with a different size.
  */
 static const struct rte_memzone *
 ring_dma_zone_reserve(struct rte_eth_dev *dev, const char *ring_name,
@@ -909,7 +909,7 @@ ring_dma_zone_reserve(struct rte_eth_dev *dev, const
char *ring_name,

mz = rte_memzone_lookup(z_name);
if (mz)
-   return mz;
+   rte_memzone_free(mz);

return rte_memzone_reserve_aligned(z_name, ring_size,
   socket_id, 0,
VMXNET3_RING_BA_ALIGN);



Chas,

Thanks for hunting this one down. Wouldn't the rte_memzone_free()
better fit into vmxnet3_cmd_ring_release() ?



I don't care which way it goes.  I just did what is basically done in
gpa_zone_reserve() to match the "style".  Tracking the current ring size
and avoiding reallocating a potentially large chunk of memory seems like
a better idea.


Also the ring_dma_zone_reserve() could get replaced by
rte_eth_dma_zone_reserve() (see also



Yes, it probably should get changed to that along with tracking the size.


Why don't we always allocate VMXNET3_RX_RING_MAX_SIZE entries? That
way we don't need to reallocate on a later queue setup change?


That's using more memory than it needs to use.  It might break someone's 
application
that already runs in some tightly constrained machine.  Failing to shrink the 
memzone
isn't likely to break anything since they have (apparently) already dealt with 
having
less memory available before switching to a smaller queue size.

Still, it can be a matter for another day.




Re: [dpdk-dev] [PATCH] net/vmxnet3: fix queue size changes

2017-03-15 Thread Charles (Chas) Williams



On 03/15/2017 06:05 AM, Jan Blunck wrote:

On Wed, Mar 15, 2017 at 10:45 AM, Charles (Chas) Williams
 wrote:



On 03/15/2017 04:18 AM, Jan Blunck wrote:


On Tue, Mar 14, 2017 at 5:38 PM, Charles (Chas) Williams
 wrote:




On 03/14/2017 12:11 PM, Jan Blunck wrote:



On Mon, Mar 13, 2017 at 11:41 PM, Charles (Chas) Williams
 wrote:



If the user reconfigures the queues size, then the previosly allocated
memzone may potentially be too small.  Instead, always free the old
memzone and allocate a new one.

Fixes: dfaff37fc46d ("vmxnet3: import new vmxnet3 poll mode driver
implementation")

Signed-off-by: Chas Williams 
---
 drivers/net/vmxnet3/vmxnet3_rxtx.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/vmxnet3/vmxnet3_rxtx.c
b/drivers/net/vmxnet3/vmxnet3_rxtx.c
index 6649c3f..104e040 100644
--- a/drivers/net/vmxnet3/vmxnet3_rxtx.c
+++ b/drivers/net/vmxnet3/vmxnet3_rxtx.c
@@ -893,8 +893,8 @@ vmxnet3_recv_pkts(void *rx_queue, struct rte_mbuf
**rx_pkts, uint16_t nb_pkts)

 /*
  * Create memzone for device rings. malloc can't be used as the
physical
address is
- * needed. If the memzone is already created, then this function
returns
a ptr
- * to the old one.
+ * needed. If the memzone already exists, we free it since it may have
been created
+ * with a different size.
  */
 static const struct rte_memzone *
 ring_dma_zone_reserve(struct rte_eth_dev *dev, const char *ring_name,
@@ -909,7 +909,7 @@ ring_dma_zone_reserve(struct rte_eth_dev *dev,
const
char *ring_name,

mz = rte_memzone_lookup(z_name);
if (mz)
-   return mz;
+   rte_memzone_free(mz);

return rte_memzone_reserve_aligned(z_name, ring_size,
   socket_id, 0,
VMXNET3_RING_BA_ALIGN);




Chas,

Thanks for hunting this one down. Wouldn't the rte_memzone_free()
better fit into vmxnet3_cmd_ring_release() ?




I don't care which way it goes.  I just did what is basically done in
gpa_zone_reserve() to match the "style".  Tracking the current ring size
and avoiding reallocating a potentially large chunk of memory seems like
a better idea.


Also the ring_dma_zone_reserve() could get replaced by
rte_eth_dma_zone_reserve() (see also




Yes, it probably should get changed to that along with tracking the size.



Why don't we always allocate VMXNET3_RX_RING_MAX_SIZE entries? That
way we don't need to reallocate on a later queue setup change?



That's using more memory than it needs to use.  It might break someone's
application
that already runs in some tightly constrained machine.  Failing to shrink
the memzone
isn't likely to break anything since they have (apparently) already dealt
with having
less memory available before switching to a smaller queue size.

Still, it can be a matter for another day.



Other drivers (ixgbe, e1000, ...) always allocate based on the max
ring size too. Since the VMXNET3_RX_RING_MAX_SIZE is 4096 I don't
think it is a huge waste of memory.


OK.  BTW, the RX queues have the same issue.


Re: [dpdk-dev] [PATCH] net/vmxnet3: fix queue size changes

2017-03-15 Thread Charles (Chas) Williams



On 03/15/2017 06:05 AM, Jan Blunck wrote:

On Wed, Mar 15, 2017 at 10:45 AM, Charles (Chas) Williams
 wrote:



On 03/15/2017 04:18 AM, Jan Blunck wrote:


On Tue, Mar 14, 2017 at 5:38 PM, Charles (Chas) Williams
 wrote:




On 03/14/2017 12:11 PM, Jan Blunck wrote:



On Mon, Mar 13, 2017 at 11:41 PM, Charles (Chas) Williams
 wrote:



If the user reconfigures the queues size, then the previosly allocated
memzone may potentially be too small.  Instead, always free the old
memzone and allocate a new one.

Fixes: dfaff37fc46d ("vmxnet3: import new vmxnet3 poll mode driver
implementation")

Signed-off-by: Chas Williams 
---
 drivers/net/vmxnet3/vmxnet3_rxtx.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/vmxnet3/vmxnet3_rxtx.c
b/drivers/net/vmxnet3/vmxnet3_rxtx.c
index 6649c3f..104e040 100644
--- a/drivers/net/vmxnet3/vmxnet3_rxtx.c
+++ b/drivers/net/vmxnet3/vmxnet3_rxtx.c
@@ -893,8 +893,8 @@ vmxnet3_recv_pkts(void *rx_queue, struct rte_mbuf
**rx_pkts, uint16_t nb_pkts)

 /*
  * Create memzone for device rings. malloc can't be used as the
physical
address is
- * needed. If the memzone is already created, then this function
returns
a ptr
- * to the old one.
+ * needed. If the memzone already exists, we free it since it may have
been created
+ * with a different size.
  */
 static const struct rte_memzone *
 ring_dma_zone_reserve(struct rte_eth_dev *dev, const char *ring_name,
@@ -909,7 +909,7 @@ ring_dma_zone_reserve(struct rte_eth_dev *dev,
const
char *ring_name,

mz = rte_memzone_lookup(z_name);
if (mz)
-   return mz;
+   rte_memzone_free(mz);

return rte_memzone_reserve_aligned(z_name, ring_size,
   socket_id, 0,
VMXNET3_RING_BA_ALIGN);




Chas,

Thanks for hunting this one down. Wouldn't the rte_memzone_free()
better fit into vmxnet3_cmd_ring_release() ?




I don't care which way it goes.  I just did what is basically done in
gpa_zone_reserve() to match the "style".  Tracking the current ring size
and avoiding reallocating a potentially large chunk of memory seems like
a better idea.


Also the ring_dma_zone_reserve() could get replaced by
rte_eth_dma_zone_reserve() (see also




Yes, it probably should get changed to that along with tracking the size.



Why don't we always allocate VMXNET3_RX_RING_MAX_SIZE entries? That
way we don't need to reallocate on a later queue setup change?



That's using more memory than it needs to use.  It might break someone's
application
that already runs in some tightly constrained machine.  Failing to shrink
the memzone
isn't likely to break anything since they have (apparently) already dealt
with having
less memory available before switching to a smaller queue size.

Still, it can be a matter for another day.



Other drivers (ixgbe, e1000, ...) always allocate based on the max
ring size too. Since the VMXNET3_RX_RING_MAX_SIZE is 4096 I don't
think it is a huge waste of memory.


I changed my mind.  Our typical application only uses a queue length of
512 for the TX side which would mean we don't use more than half of the
allocated elements.  It's easy enough to just allocate what we need and
release in the queue releases as you suggested earlier.


[dpdk-dev] [PATCH] net/vmxnet3: fix queue size changes

2017-03-15 Thread Charles (Chas) Williams
If the user reconfigures the queue size, then the previously allocated
memzone may potentially be too small.  Release the memzone when a queue
is released and allocate a new one each time a queue is setup.

While here convert to rte_eth_dma_zone_reserve() which does basically
the same things as the private function.

Fixes: dfaff37fc46d ("vmxnet3: import new vmxnet3 poll mode driver 
implementation")

Signed-off-by: Chas Williams 
---
 drivers/net/vmxnet3/vmxnet3_ring.h |  2 ++
 drivers/net/vmxnet3/vmxnet3_rxtx.c | 37 +++--
 2 files changed, 13 insertions(+), 26 deletions(-)

diff --git a/drivers/net/vmxnet3/vmxnet3_ring.h 
b/drivers/net/vmxnet3/vmxnet3_ring.h
index 0ce46c5..d2e8323 100644
--- a/drivers/net/vmxnet3/vmxnet3_ring.h
+++ b/drivers/net/vmxnet3/vmxnet3_ring.h
@@ -141,6 +141,7 @@ typedef struct vmxnet3_tx_queue {
uint32_t qid;
struct Vmxnet3_TxQueueDesc   *shared;
struct vmxnet3_txq_stats stats;
+   const struct rte_memzone *mz;
bool stopped;
uint16_t queue_id;  /**< Device TX queue index. 
*/
uint8_t  port_id;   /**< Device port 
identifier. */
@@ -175,6 +176,7 @@ typedef struct vmxnet3_rx_queue {
struct rte_mbuf *start_seg;
struct rte_mbuf *last_seg;
struct vmxnet3_rxq_statsstats;
+   const struct rte_memzone*mz;
boolstopped;
uint16_tqueue_id;  /**< Device RX queue index. 
*/
uint8_t port_id;   /**< Device port identifier. 
*/
diff --git a/drivers/net/vmxnet3/vmxnet3_rxtx.c 
b/drivers/net/vmxnet3/vmxnet3_rxtx.c
index 6649c3f..e865c67 100644
--- a/drivers/net/vmxnet3/vmxnet3_rxtx.c
+++ b/drivers/net/vmxnet3/vmxnet3_rxtx.c
@@ -201,6 +201,8 @@ vmxnet3_dev_tx_queue_release(void *txq)
vmxnet3_tx_cmd_ring_release_mbufs(&tq->cmd_ring);
/* Release the cmd_ring */
vmxnet3_cmd_ring_release(&tq->cmd_ring);
+   /* Release the memzone */
+   rte_memzone_free(tq->mz);
}
 }
 
@@ -218,6 +220,9 @@ vmxnet3_dev_rx_queue_release(void *rxq)
/* Release both the cmd_rings */
for (i = 0; i < VMXNET3_RX_CMDRING_SIZE; i++)
vmxnet3_cmd_ring_release(&rq->cmd_ring[i]);
+
+   /* Release the memzone */
+   rte_memzone_free(rq->mz);
}
 }
 
@@ -891,30 +896,6 @@ vmxnet3_recv_pkts(void *rx_queue, struct rte_mbuf 
**rx_pkts, uint16_t nb_pkts)
return nb_rx;
 }
 
-/*
- * Create memzone for device rings. malloc can't be used as the physical 
address is
- * needed. If the memzone is already created, then this function returns a ptr
- * to the old one.
- */
-static const struct rte_memzone *
-ring_dma_zone_reserve(struct rte_eth_dev *dev, const char *ring_name,
- uint16_t queue_id, uint32_t ring_size, int socket_id)
-{
-   char z_name[RTE_MEMZONE_NAMESIZE];
-   const struct rte_memzone *mz;
-
-   snprintf(z_name, sizeof(z_name), "%s_%s_%d_%d",
-dev->driver->pci_drv.driver.name, ring_name,
-dev->data->port_id, queue_id);
-
-   mz = rte_memzone_lookup(z_name);
-   if (mz)
-   return mz;
-
-   return rte_memzone_reserve_aligned(z_name, ring_size,
-  socket_id, 0, VMXNET3_RING_BA_ALIGN);
-}
-
 int
 vmxnet3_dev_tx_queue_setup(struct rte_eth_dev *dev,
   uint16_t queue_idx,
@@ -983,11 +964,13 @@ vmxnet3_dev_tx_queue_setup(struct rte_eth_dev *dev,
size += sizeof(struct Vmxnet3_TxCompDesc) * comp_ring->size;
size += txq->txdata_desc_size * data_ring->size;
 
-   mz = ring_dma_zone_reserve(dev, "txdesc", queue_idx, size, socket_id);
+   mz = rte_eth_dma_zone_reserve(dev, "txdesc", queue_idx, size,
+ VMXNET3_RING_BA_ALIGN, socket_id);
if (mz == NULL) {
PMD_INIT_LOG(ERR, "ERROR: Creating queue descriptors zone");
return -ENOMEM;
}
+   txq->mz = mz;
memset(mz->addr, 0, mz->len);
 
/* cmd_ring initialization */
@@ -1092,11 +1075,13 @@ vmxnet3_dev_rx_queue_setup(struct rte_eth_dev *dev,
if (VMXNET3_VERSION_GE_3(hw) && rxq->data_desc_size)
size += rxq->data_desc_size * data_ring->size;
 
-   mz = ring_dma_zone_reserve(dev, "rxdesc", queue_idx, size, socket_id);
+   mz = rte_eth_dma_zone_reserve(dev, "rxdesc", queue_idx, size,
+ VMXNET3_RING_BA_ALIGN, socket_id);
if (mz == NULL) {
PMD_INIT_LOG(ERR, "ERROR: Creating queue descriptors zone");
return -ENOMEM;
}
+   rxq->mz = mz;
memset(mz->addr, 0, mz->len);
 
/* cmd_ring0 initialization *

[dpdk-dev] [PATCH 1/3] net/af_packet: handle strdup() failures

2017-05-17 Thread Charles (Chas) Williams
Fixes: 1b93c2aa81b4 ("net/af_packet: add interface name to internals")

Signed-off-by: Chas Williams 
---
 drivers/net/af_packet/rte_eth_af_packet.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/af_packet/rte_eth_af_packet.c 
b/drivers/net/af_packet/rte_eth_af_packet.c
index a03966a..ce4dc07 100644
--- a/drivers/net/af_packet/rte_eth_af_packet.c
+++ b/drivers/net/af_packet/rte_eth_af_packet.c
@@ -630,6 +630,8 @@ rte_pmd_init_internals(struct rte_vdev_device *dev,
goto error_early;
}
(*internals)->if_name = strdup(pair->value);
+   if ((*internals)->if_name == NULL)
+   goto error_early;
(*internals)->if_index = ifr.ifr_ifindex;
 
if (ioctl(sockfd, SIOCGIFHWADDR, &ifr) == -1) {
-- 
2.1.4



[dpdk-dev] [PATCH 2/3] net/af_packet: add accessor for iface

2017-05-17 Thread Charles (Chas) Williams
Provide an accessor for the name of the underlying linux interface
used by the AF_PACKET-based interface.

Signed-off-by: Charles (Chas) Williams 
---
 drivers/net/af_packet/Makefile |  2 +
 drivers/net/af_packet/rte_eth_af_packet.c  | 17 +++
 drivers/net/af_packet/rte_eth_af_packet.h  | 55 ++
 .../net/af_packet/rte_pmd_af_packet_version.map|  6 +++
 4 files changed, 80 insertions(+)
 create mode 100644 drivers/net/af_packet/rte_eth_af_packet.h

diff --git a/drivers/net/af_packet/Makefile b/drivers/net/af_packet/Makefile
index 70d517c..5ea058c 100644
--- a/drivers/net/af_packet/Makefile
+++ b/drivers/net/af_packet/Makefile
@@ -50,4 +50,6 @@ CFLAGS += $(WERROR_FLAGS)
 #
 SRCS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET) += rte_eth_af_packet.c
 
+SYMLINK-y-include += rte_eth_af_packet.h
+
 include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/af_packet/rte_eth_af_packet.c 
b/drivers/net/af_packet/rte_eth_af_packet.c
index ce4dc07..6927f70 100644
--- a/drivers/net/af_packet/rte_eth_af_packet.c
+++ b/drivers/net/af_packet/rte_eth_af_packet.c
@@ -43,6 +43,8 @@
 #include 
 #include 
 
+#include "rte_eth_af_packet.h"
+
 #include 
 #include 
 #include 
@@ -125,6 +127,21 @@ static struct rte_eth_link pmd_link = {
.link_autoneg = ETH_LINK_SPEED_AUTONEG
 };
 
+int
+rte_af_packet_get_ifname(uint8_t port, char *buf, size_t len)
+{
+   struct rte_eth_dev *dev;
+   struct pmd_internals *internals;
+
+   RTE_ETH_VALID_PORTID_OR_ERR_RET(port, -ENODEV);
+
+   dev = &rte_eth_devices[port];
+   internals = dev->data->dev_private;
+   snprintf(buf, len, "%s", internals->if_name);
+
+   return 0;
+}
+
 static uint16_t
 eth_af_packet_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 {
diff --git a/drivers/net/af_packet/rte_eth_af_packet.h 
b/drivers/net/af_packet/rte_eth_af_packet.h
new file mode 100644
index 000..c5276f5
--- /dev/null
+++ b/drivers/net/af_packet/rte_eth_af_packet.h
@@ -0,0 +1,55 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright (c) 2017 Brocade Communications Systems, Inc.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ *   * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ *   * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ *   * Neither the name of Intel Corporation nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_ETH_AF_PACKET_H_
+#define _RTE_ETH_AF_PACKET_H_
+
+#include 
+
+/**
+ * Get the name of the underlying kernel interface assocated
+ * with this AF_PACKET device.
+ *
+ * @param port
+ *  The port identifier of the AF_PACKET device.
+ * @param buf
+ *  The buffer to stored the queried ifname
+ * @param len
+ *  The length of buf
+ *
+ * @return
+ *  0 on success, -1 on failure
+ */
+int rte_af_packet_get_ifname(uint8_t port, char *buf, size_t len);
+
+#endif /* _RTE_ETH_AF_PACKET_H_ */
diff --git a/drivers/net/af_packet/rte_pmd_af_packet_version.map 
b/drivers/net/af_packet/rte_pmd_af_packet_version.map
index ef35398..2231699 100644
--- a/drivers/net/af_packet/rte_pmd_af_packet_version.map
+++ b/drivers/net/af_packet/rte_pmd_af_packet_version.map
@@ -2,3 +2,9 @@ DPDK_2.0 {
 
local: *;
 };
+
+DPDK_17.08 {
+   global:
+
+   rte_af_packet_get_ifname;
+};
-- 
2.1.4



[dpdk-dev] [PATCH 3/3] net/af_packet: fix packet bytes counting

2017-05-17 Thread Charles (Chas) Williams
On error, we also need to zero the bytes transmitted.

Signed-off-by: Charles (Chas) Williams 
---
 drivers/net/af_packet/rte_eth_af_packet.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/net/af_packet/rte_eth_af_packet.c 
b/drivers/net/af_packet/rte_eth_af_packet.c
index 6927f70..e456836 100644
--- a/drivers/net/af_packet/rte_eth_af_packet.c
+++ b/drivers/net/af_packet/rte_eth_af_packet.c
@@ -269,8 +269,11 @@ eth_af_packet_tx(void *queue, struct rte_mbuf **bufs, 
uint16_t nb_pkts)
}
 
/* kick-off transmits */
-   if (sendto(pkt_q->sockfd, NULL, 0, MSG_DONTWAIT, NULL, 0) == -1)
-   num_tx = 0; /* error sending -- no packets transmitted */
+   if (sendto(pkt_q->sockfd, NULL, 0, MSG_DONTWAIT, NULL, 0) == -1) {
+   /* error sending -- no packets transmitted */
+   num_tx = 0;
+   num_tx_bytes = 0;
+   }
 
pkt_q->framenum = framenum;
pkt_q->tx_pkts += num_tx;
-- 
2.1.4



[dpdk-dev] [PATCH 1/6] net/vmxnet3: retain counters on restart

2017-05-19 Thread Charles (Chas) Williams
From: Nachiketa Prachanda 

Most nics like virtio, igb/ixgbe etc. don't reset counters on
dev_start and arguably this helps in monitoring the counters
across a longer time span with multiple device start/stops.
vmxnet3 behavior is opposite to that and counters are reset by
the host side implementation each time the device is restarted.

Change the driver to save the counters in its private context
before it is reset by writing CMD_ACTIVATE to REG_CMD.

Signed-off-by: Nachiketa Prachanda 
---
 drivers/net/vmxnet3/vmxnet3_ethdev.c | 102 ---
 drivers/net/vmxnet3/vmxnet3_ethdev.h |   2 +
 2 files changed, 86 insertions(+), 18 deletions(-)

diff --git a/drivers/net/vmxnet3/vmxnet3_ethdev.c 
b/drivers/net/vmxnet3/vmxnet3_ethdev.c
index 17b471f..4d2c024 100644
--- a/drivers/net/vmxnet3/vmxnet3_ethdev.c
+++ b/drivers/net/vmxnet3/vmxnet3_ethdev.c
@@ -85,6 +85,7 @@ static void vmxnet3_dev_allmulticast_enable(struct 
rte_eth_dev *dev);
 static void vmxnet3_dev_allmulticast_disable(struct rte_eth_dev *dev);
 static int vmxnet3_dev_link_update(struct rte_eth_dev *dev,
   int wait_to_complete);
+static void vmxnet3_hw_stats_save(struct vmxnet3_hw *hw);
 static void vmxnet3_dev_stats_get(struct rte_eth_dev *dev,
  struct rte_eth_stats *stats);
 static void vmxnet3_dev_info_get(struct rte_eth_dev *dev,
@@ -351,6 +352,10 @@ eth_vmxnet3_dev_init(struct rte_eth_dev *eth_dev)
RTE_ASSERT((hw->rxdata_desc_size & ~VMXNET3_RXDATA_DESC_SIZE_MASK) ==
   hw->rxdata_desc_size);
 
+   /* clear shadow stats */
+   memset(hw->saved_tx_stats, 0, sizeof(hw->saved_tx_stats));
+   memset(hw->saved_rx_stats, 0, sizeof(hw->saved_rx_stats));
+
return 0;
 }
 
@@ -707,6 +712,9 @@ vmxnet3_dev_start(struct rte_eth_dev *dev)
 
PMD_INIT_FUNC_TRACE();
 
+   /* Save stats before it is reset by CMD_ACTIVATE */
+   vmxnet3_hw_stats_save(hw);
+
ret = vmxnet3_setup_driver_shared(dev);
if (ret != VMXNET3_SUCCESS)
return ret;
@@ -820,47 +828,105 @@ vmxnet3_dev_close(struct rte_eth_dev *dev)
 }
 
 static void
+vmxnet3_hw_tx_stats_get(struct vmxnet3_hw *hw, unsigned int q,
+   struct UPT1_TxStats *res)
+{
+#define VMXNET3_UPDATE_TX_STAT(h, i, f, r) \
+   ((r)->f = (h)->tqd_start[(i)].stats.f + \
+   (h)->saved_tx_stats[(i)].f)
+
+   VMXNET3_UPDATE_TX_STAT(hw, q, ucastPktsTxOK, res);
+   VMXNET3_UPDATE_TX_STAT(hw, q, mcastPktsTxOK, res);
+   VMXNET3_UPDATE_TX_STAT(hw, q, bcastPktsTxOK, res);
+   VMXNET3_UPDATE_TX_STAT(hw, q, ucastBytesTxOK, res);
+   VMXNET3_UPDATE_TX_STAT(hw, q, mcastBytesTxOK, res);
+   VMXNET3_UPDATE_TX_STAT(hw, q, bcastBytesTxOK, res);
+   VMXNET3_UPDATE_TX_STAT(hw, q, pktsTxError, res);
+   VMXNET3_UPDATE_TX_STAT(hw, q, pktsTxDiscard, res);
+
+#undef VMXNET3_UPDATE_TX_STAT
+}
+
+static void
+vmxnet3_hw_rx_stats_get(struct vmxnet3_hw *hw, unsigned int q,
+   struct UPT1_RxStats *res)
+{
+#define VMXNET3_UPDATE_RX_STAT(h, i, f, r) \
+   ((r)->f = (h)->rqd_start[(i)].stats.f + \
+   (h)->saved_rx_stats[(i)].f)
+
+   VMXNET3_UPDATE_RX_STAT(hw, q, ucastPktsRxOK, res);
+   VMXNET3_UPDATE_RX_STAT(hw, q, mcastPktsRxOK, res);
+   VMXNET3_UPDATE_RX_STAT(hw, q, bcastPktsRxOK, res);
+   VMXNET3_UPDATE_RX_STAT(hw, q, ucastBytesRxOK, res);
+   VMXNET3_UPDATE_RX_STAT(hw, q, mcastBytesRxOK, res);
+   VMXNET3_UPDATE_RX_STAT(hw, q, bcastBytesRxOK, res);
+   VMXNET3_UPDATE_RX_STAT(hw, q, pktsRxError, res);
+   VMXNET3_UPDATE_RX_STAT(hw, q, pktsRxOutOfBuf, res);
+
+#undef VMXNET3_UPDATE_RX_STATS
+}
+
+static void
+vmxnet3_hw_stats_save(struct vmxnet3_hw *hw)
+{
+   unsigned int i;
+
+   VMXNET3_WRITE_BAR1_REG(hw, VMXNET3_REG_CMD, VMXNET3_CMD_GET_STATS);
+
+   RTE_BUILD_BUG_ON(RTE_ETHDEV_QUEUE_STAT_CNTRS < VMXNET3_MAX_TX_QUEUES);
+
+   for (i = 0; i < hw->num_tx_queues; i++)
+   vmxnet3_hw_tx_stats_get(hw, i, &hw->saved_tx_stats[i]);
+   for (i = 0; i < hw->num_rx_queues; i++)
+   vmxnet3_hw_rx_stats_get(hw, i, &hw->saved_rx_stats[i]);
+}
+
+static void
 vmxnet3_dev_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
 {
unsigned int i;
struct vmxnet3_hw *hw = dev->data->dev_private;
+   struct UPT1_TxStats txStats;
+   struct UPT1_RxStats rxStats;
 
VMXNET3_WRITE_BAR1_REG(hw, VMXNET3_REG_CMD, VMXNET3_CMD_GET_STATS);
 
RTE_BUILD_BUG_ON(RTE_ETHDEV_QUEUE_STAT_CNTRS < VMXNET3_MAX_TX_QUEUES);
for (i = 0; i < hw->num_tx_queues; i++) {
-   struct UPT1_TxStats *txStats = &hw->tqd_start[i].stats;
+   vmxnet3_hw_tx_stats_get(hw, i, &txStats);
+
+   stats->q_opackets[i] = txStats.ucastPktsTxOK +
+   txStats.mcastPktsTxOK +
+

[dpdk-dev] [PATCH 2/6] net/vmxnet3: Implement retrieval of extended stats

2017-05-19 Thread Charles (Chas) Williams
From: Robert Shearman 

Implement xstats_get() to allow a number of driver-specific tx and rx
stats to be retrieved.

Signed-off-by: Robert Shearman 
---
 drivers/net/vmxnet3/vmxnet3_ethdev.c | 111 +++
 1 file changed, 111 insertions(+)

diff --git a/drivers/net/vmxnet3/vmxnet3_ethdev.c 
b/drivers/net/vmxnet3/vmxnet3_ethdev.c
index 4d2c024..a3378ad 100644
--- a/drivers/net/vmxnet3/vmxnet3_ethdev.c
+++ b/drivers/net/vmxnet3/vmxnet3_ethdev.c
@@ -88,6 +88,11 @@ static int vmxnet3_dev_link_update(struct rte_eth_dev *dev,
 static void vmxnet3_hw_stats_save(struct vmxnet3_hw *hw);
 static void vmxnet3_dev_stats_get(struct rte_eth_dev *dev,
  struct rte_eth_stats *stats);
+static int vmxnet3_dev_xstats_get_names(struct rte_eth_dev *dev,
+   struct rte_eth_xstat_name *xstats,
+   unsigned int n);
+static int vmxnet3_dev_xstats_get(struct rte_eth_dev *dev,
+ struct rte_eth_xstat *xstats, unsigned int n);
 static void vmxnet3_dev_info_get(struct rte_eth_dev *dev,
 struct rte_eth_dev_info *dev_info);
 static const uint32_t *
@@ -122,6 +127,8 @@ static const struct eth_dev_ops vmxnet3_eth_dev_ops = {
.allmulticast_disable = vmxnet3_dev_allmulticast_disable,
.link_update  = vmxnet3_dev_link_update,
.stats_get= vmxnet3_dev_stats_get,
+   .xstats_get_names = vmxnet3_dev_xstats_get_names,
+   .xstats_get   = vmxnet3_dev_xstats_get,
.mac_addr_set = vmxnet3_mac_addr_set,
.dev_infos_get= vmxnet3_dev_info_get,
.dev_supported_ptypes_get = vmxnet3_dev_supported_ptypes_get,
@@ -133,6 +140,27 @@ static const struct eth_dev_ops vmxnet3_eth_dev_ops = {
.tx_queue_release = vmxnet3_dev_tx_queue_release,
 };
 
+struct vmxnet3_xstats_name_off {
+   char name[RTE_ETH_XSTATS_NAME_SIZE];
+   unsigned int offset;
+};
+
+/* tx_qX_ is prepended to the name string here */
+static const struct vmxnet3_xstats_name_off vmxnet3_txq_stat_strings[] = {
+   {"drop_total", offsetof(struct vmxnet3_txq_stats, drop_total)},
+   {"drop_too_many_segs", offsetof(struct vmxnet3_txq_stats, 
drop_too_many_segs)},
+   {"drop_tso",   offsetof(struct vmxnet3_txq_stats, drop_tso)},
+   {"tx_ring_full",   offsetof(struct vmxnet3_txq_stats, 
tx_ring_full)},
+};
+
+/* rx_qX_ is prepended to the name string here */
+static const struct vmxnet3_xstats_name_off vmxnet3_rxq_stat_strings[] = {
+   {"drop_total",   offsetof(struct vmxnet3_rxq_stats, 
drop_total)},
+   {"drop_err", offsetof(struct vmxnet3_rxq_stats, drop_err)},
+   {"drop_fcs", offsetof(struct vmxnet3_rxq_stats, drop_fcs)},
+   {"rx_buf_alloc_failure", offsetof(struct vmxnet3_rxq_stats, 
rx_buf_alloc_failure)},
+};
+
 static const struct rte_memzone *
 gpa_zone_reserve(struct rte_eth_dev *dev, uint32_t size,
 const char *post_string, int socket_id,
@@ -882,6 +910,89 @@ vmxnet3_hw_stats_save(struct vmxnet3_hw *hw)
vmxnet3_hw_rx_stats_get(hw, i, &hw->saved_rx_stats[i]);
 }
 
+static int
+vmxnet3_dev_xstats_get_names(struct rte_eth_dev *dev,
+struct rte_eth_xstat_name *xstats_names,
+unsigned int n)
+{
+   unsigned int i, t, count = 0;
+   unsigned int nstats =
+   dev->data->nb_tx_queues * RTE_DIM(vmxnet3_txq_stat_strings) +
+   dev->data->nb_rx_queues * RTE_DIM(vmxnet3_rxq_stat_strings);
+
+   if (!xstats_names || n < nstats)
+   return nstats;
+
+   for (i = 0; i < dev->data->nb_rx_queues; i++) {
+   if (!dev->data->rx_queues[i])
+   continue;
+
+   for (t = 0; t < RTE_DIM(vmxnet3_rxq_stat_strings); t++) {
+   snprintf(xstats_names[count].name,
+sizeof(xstats_names[count].name),
+"rx_q%u_%s", i,
+vmxnet3_rxq_stat_strings[t].name);
+   count++;
+   }
+   }
+
+   for (i = 0; i < dev->data->nb_tx_queues; i++) {
+   if (!dev->data->tx_queues[i])
+   continue;
+
+   for (t = 0; t < RTE_DIM(vmxnet3_txq_stat_strings); t++) {
+   snprintf(xstats_names[count].name,
+sizeof(xstats_names[count].name),
+"tx_q%u_%s", i,
+vmxnet3_txq_stat_strings[t].name);
+   count++;
+   }
+   }
+
+   return count;
+}
+
+static int
+vmxnet3_dev_xstats_get(struct rte_eth_dev *dev, struct rte_eth_xstat *xstats,
+  unsigned int n)
+{
+   unsigned int i, t, count = 0;
+   unsigned 

[dpdk-dev] [PATCH 5/6] net/vmxnet3: receive queue lockup and memleak

2017-05-19 Thread Charles (Chas) Williams
From: Mandeep Rohilla 

The receive queue can lockup if all the rx descriptors have lost
their mbufs and temporarily there are no mbufs available. This
can happen if there is an mbuf leak or if the application holds
on to the mbuf for a while.

This also addresses an mbuf leak in an error condition during
packet receive.

Signed-off-by: Mandeep Rohilla 
---
 drivers/net/vmxnet3/vmxnet3_rxtx.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/drivers/net/vmxnet3/vmxnet3_rxtx.c 
b/drivers/net/vmxnet3/vmxnet3_rxtx.c
index d8713a1..d21679d 100644
--- a/drivers/net/vmxnet3/vmxnet3_rxtx.c
+++ b/drivers/net/vmxnet3/vmxnet3_rxtx.c
@@ -731,6 +731,7 @@ vmxnet3_recv_pkts(void *rx_queue, struct rte_mbuf 
**rx_pkts, uint16_t nb_pkts)
uint16_t nb_rx;
uint32_t nb_rxd, idx;
uint8_t ring_idx;
+   uint8_t i;
vmxnet3_rx_queue_t *rxq;
Vmxnet3_RxCompDesc *rcd;
vmxnet3_buf_info_t *rbi;
@@ -800,6 +801,12 @@ vmxnet3_recv_pkts(void *rx_queue, struct rte_mbuf 
**rx_pkts, uint16_t nb_pkts)
   (int)(rcd - (struct Vmxnet3_RxCompDesc *)
 rxq->comp_ring.base), rcd->rxdIdx);
rte_pktmbuf_free_seg(rxm);
+   if (rxq->start_seg) {
+   struct rte_mbuf *start = rxq->start_seg;
+
+   rxq->start_seg = NULL;
+   rte_pktmbuf_free(start);
+   }
goto rcd_done;
}
 
@@ -893,6 +900,18 @@ vmxnet3_recv_pkts(void *rx_queue, struct rte_mbuf 
**rx_pkts, uint16_t nb_pkts)
}
}
 
+   /*
+* Try to replenish the rx descriptors with the new mbufs
+*/
+   for (i = 0; i < VMXNET3_RX_CMDRING_SIZE; i++) {
+   vmxnet3_post_rx_bufs(rxq, i);
+   if (unlikely(rxq->shared->ctrl.updateRxProd)) {
+   VMXNET3_WRITE_BAR0_REG(hw,
+   rxprod_reg[i] +
+   (rxq->queue_id * VMXNET3_REG_ALIGN),
+   rxq->cmd_ring[i].next2fill);
+   }
+   }
return nb_rx;
 }
 
-- 
2.1.4



[dpdk-dev] [PATCH 3/6] net/vmxnet3: Generate link-state change notifications

2017-05-19 Thread Charles (Chas) Williams
From: Robert Shearman 

Generate link-state change notifications by listening to interrupts
generated by the device. Make use of the existing
vmxnet3_process_events function that was compiled out, but change it
to call vmxnet3_dev_link_update on a VMXNET3_ECR_LINK event and to not
be so noisy in its log messages.

Enable interrupts on starting the device, using a new helper function,
vmxnet3_enable_intr, based on vmxnet3_disable_intr and validated
against the FreeBSD driver.

Keep track of the number of interrupts registered for to avoid
hardcoding these in vmxnet3_enable/disable_intr and to provision for
any future rxq intr support.

Factor out the guts of vmxnet3_dev_link_update minus the started check
to allow the new function to be called from vmxnet3_dev_start in the
lsc-enabled case to ensure that the link state is correctly set from
the actual state at that point.

Signed-off-by: Robert Shearman 
---
 drivers/net/vmxnet3/vmxnet3_ethdev.c | 116 +++
 drivers/net/vmxnet3/vmxnet3_ethdev.h |   2 +
 2 files changed, 91 insertions(+), 27 deletions(-)

diff --git a/drivers/net/vmxnet3/vmxnet3_ethdev.c 
b/drivers/net/vmxnet3/vmxnet3_ethdev.c
index a3378ad..8f6e0fc 100644
--- a/drivers/net/vmxnet3/vmxnet3_ethdev.c
+++ b/drivers/net/vmxnet3/vmxnet3_ethdev.c
@@ -83,6 +83,8 @@ static void vmxnet3_dev_promiscuous_enable(struct rte_eth_dev 
*dev);
 static void vmxnet3_dev_promiscuous_disable(struct rte_eth_dev *dev);
 static void vmxnet3_dev_allmulticast_enable(struct rte_eth_dev *dev);
 static void vmxnet3_dev_allmulticast_disable(struct rte_eth_dev *dev);
+static int __vmxnet3_dev_link_update(struct rte_eth_dev *dev,
+int wait_to_complete);
 static int vmxnet3_dev_link_update(struct rte_eth_dev *dev,
   int wait_to_complete);
 static void vmxnet3_hw_stats_save(struct vmxnet3_hw *hw);
@@ -102,10 +104,8 @@ static int vmxnet3_dev_vlan_filter_set(struct rte_eth_dev 
*dev,
 static void vmxnet3_dev_vlan_offload_set(struct rte_eth_dev *dev, int mask);
 static void vmxnet3_mac_addr_set(struct rte_eth_dev *dev,
 struct ether_addr *mac_addr);
+static void vmxnet3_interrupt_handler(void *param);
 
-#if PROCESS_SYS_EVENTS == 1
-static void vmxnet3_process_events(struct vmxnet3_hw *);
-#endif
 /*
  * The set of PCI devices this driver supports
  */
@@ -250,10 +250,22 @@ vmxnet3_disable_intr(struct vmxnet3_hw *hw)
PMD_INIT_FUNC_TRACE();
 
hw->shared->devRead.intrConf.intrCtrl |= VMXNET3_IC_DISABLE_ALL;
-   for (i = 0; i < VMXNET3_MAX_INTRS; i++)
+   for (i = 0; i < hw->num_intrs; i++)
VMXNET3_WRITE_BAR0_REG(hw, VMXNET3_REG_IMR + i * 8, 1);
 }
 
+static void
+vmxnet3_enable_intr(struct vmxnet3_hw *hw)
+{
+   int i;
+
+   PMD_INIT_FUNC_TRACE();
+
+   hw->shared->devRead.intrConf.intrCtrl &= ~VMXNET3_IC_DISABLE_ALL;
+   for (i = 0; i < hw->num_intrs; i++)
+   VMXNET3_WRITE_BAR0_REG(hw, VMXNET3_REG_IMR + i * 8, 0);
+}
+
 /*
  * Gets tx data ring descriptor size.
  */
@@ -425,7 +437,7 @@ static int eth_vmxnet3_pci_remove(struct rte_pci_device 
*pci_dev)
 
 static struct rte_pci_driver rte_vmxnet3_pmd = {
.id_table = pci_id_vmxnet3_map,
-   .drv_flags = RTE_PCI_DRV_NEED_MAPPING,
+   .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
.probe = eth_vmxnet3_pci_probe,
.remove = eth_vmxnet3_pci_remove,
 };
@@ -648,11 +660,11 @@ vmxnet3_setup_driver_shared(struct rte_eth_dev *dev)
 
/*
 * Set number of interrupts to 1
-* PMD disables all the interrupts but this is MUST to activate device
-* It needs at least one interrupt for link events to handle
-* So we'll disable it later after device activation if needed
+* PMD by default disables all the interrupts but this is MUST
+* to activate device. It needs at least one interrupt for
+* link events to handle
 */
-   devRead->intrConf.numIntrs = 1;
+   hw->num_intrs = devRead->intrConf.numIntrs = 1;
devRead->intrConf.intrCtrl |= VMXNET3_IC_DISABLE_ALL;
 
for (i = 0; i < hw->num_tx_queues; i++) {
@@ -747,6 +759,20 @@ vmxnet3_dev_start(struct rte_eth_dev *dev)
if (ret != VMXNET3_SUCCESS)
return ret;
 
+   /* check if lsc interrupt feature is enabled */
+   if (dev->data->dev_conf.intr_conf.lsc) {
+   struct rte_pci_device *pci_dev = RTE_DEV_TO_PCI(dev->device);
+
+   /* Setup interrupt callback  */
+   rte_intr_callback_register(&pci_dev->intr_handle,
+  vmxnet3_interrupt_handler, dev);
+
+   if (rte_intr_enable(&pci_dev->intr_handle) < 0) {
+   PMD_INIT_LOG(ERR, "interrupt enable failed");
+   return -EIO;
+   }
+   }
+
/* Exchange shared data with device */
VMXNET3_WRITE_B

[dpdk-dev] [PATCH 4/6] net/vmxnet3: Make vmxnet3_process_events less noisy

2017-05-19 Thread Charles (Chas) Williams
From: Robert Shearman 

Make vmxnet3_process_events less noisy by removing logging when there
are no events to process and by making link, device-change and debug
events DEBUG level rather than ERR.

Change these to use PMD_DRV_LOG instead of PMD_INIT_LOG since they
don't happen at device init.

Signed-off-by: Robert Shearman 
---
 drivers/net/vmxnet3/vmxnet3_ethdev.c | 18 +++---
 1 file changed, 7 insertions(+), 11 deletions(-)

diff --git a/drivers/net/vmxnet3/vmxnet3_ethdev.c 
b/drivers/net/vmxnet3/vmxnet3_ethdev.c
index 8f6e0fc..d241499 100644
--- a/drivers/net/vmxnet3/vmxnet3_ethdev.c
+++ b/drivers/net/vmxnet3/vmxnet3_ethdev.c
@@ -1305,10 +1305,8 @@ vmxnet3_process_events(struct rte_eth_dev *dev)
struct vmxnet3_hw *hw = dev->data->dev_private;
uint32_t events = hw->shared->ecr;
 
-   if (!events) {
-   PMD_INIT_LOG(ERR, "No events to process");
+   if (!events)
return;
-   }
 
/*
 * ECR bits when written with 1b are cleared. Hence write
@@ -1318,9 +1316,7 @@ vmxnet3_process_events(struct rte_eth_dev *dev)
 
/* Check if link state has changed */
if (events & VMXNET3_ECR_LINK) {
-   PMD_INIT_LOG(ERR,
-"Process events in %s(): VMXNET3_ECR_LINK event",
-__func__);
+   PMD_DRV_LOG(DEBUG, "Process events: VMXNET3_ECR_LINK event");
if (vmxnet3_dev_link_update(dev, 0) == 0)
_rte_eth_dev_callback_process(dev,
  RTE_ETH_EVENT_INTR_LSC,
@@ -1333,11 +1329,11 @@ vmxnet3_process_events(struct rte_eth_dev *dev)
   VMXNET3_CMD_GET_QUEUE_STATUS);
 
if (hw->tqd_start->status.stopped)
-   PMD_INIT_LOG(ERR, "tq error 0x%x",
-hw->tqd_start->status.error);
+   PMD_DRV_LOG(ERR, "tq error 0x%x",
+   hw->tqd_start->status.error);
 
if (hw->rqd_start->status.stopped)
-   PMD_INIT_LOG(ERR, "rq error 0x%x",
+   PMD_DRV_LOG(ERR, "rq error 0x%x",
 hw->rqd_start->status.error);
 
/* Reset the device */
@@ -1345,10 +1341,10 @@ vmxnet3_process_events(struct rte_eth_dev *dev)
}
 
if (events & VMXNET3_ECR_DIC)
-   PMD_INIT_LOG(ERR, "Device implementation change event.");
+   PMD_DRV_LOG(DEBUG, "Device implementation change event.");
 
if (events & VMXNET3_ECR_DEBUG)
-   PMD_INIT_LOG(ERR, "Debug event generated by device.");
+   PMD_DRV_LOG(DEBUG, "Debug event generated by device.");
 }
 
 static void
-- 
2.1.4



[dpdk-dev] [PATCH 6/6] net/vmxnet3: preserve configured MAC address

2017-05-19 Thread Charles (Chas) Williams
From: George Wilkie 

When starting a vmxnet3 device, it is always writing the permanent MAC
address, even if a different MAC address was configured.  Write from
the device data instead which holds the current one.

Signed-off-by: George Wilkie 
---
 drivers/net/vmxnet3/vmxnet3_ethdev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/vmxnet3/vmxnet3_ethdev.c 
b/drivers/net/vmxnet3/vmxnet3_ethdev.c
index d241499..2e80cb1 100644
--- a/drivers/net/vmxnet3/vmxnet3_ethdev.c
+++ b/drivers/net/vmxnet3/vmxnet3_ethdev.c
@@ -734,7 +734,7 @@ vmxnet3_setup_driver_shared(struct rte_eth_dev *dev)
vmxnet3_dev_vlan_offload_set(dev,
 ETH_VLAN_STRIP_MASK | 
ETH_VLAN_FILTER_MASK);
 
-   vmxnet3_write_mac(hw, hw->perm_addr);
+   vmxnet3_write_mac(hw, dev->data->mac_addrs->addr_bytes);
 
return VMXNET3_SUCCESS;
 }
-- 
2.1.4



Re: [dpdk-dev] [PATCH 2/3] net/af_packet: add accessor for iface

2017-05-22 Thread Charles (Chas) Williams



On 05/22/2017 05:39 AM, Ferruh Yigit wrote:

Hi Chas,

On 5/17/2017 7:03 PM, Charles (Chas) Williams wrote:

Provide an accessor for the name of the underlying linux interface
used by the AF_PACKET-based interface.


This patch provides a PMD specific API to get PMD internal data
(internals->if_name).

I think we should avoid PMD specific APIs if possible.

internals->if_name is provided by application, as devargs to af_packet
(iface=..), so app already knows port_id -> if_name mapping.

OR

With rte_eth_dev_info_get() af_packet returns the if_index for
underlying interface, it is possible to use if_indextoname() to get ifname.

Because above methods exists to get if_name, I think this API is not
necessary.


I will go this route then.



Signed-off-by: Charles (Chas) Williams 


<...>



[dpdk-dev] [PATCH] net/ring: fix adding MAC addresses

2017-05-26 Thread Charles (Chas) Williams
When .mac_addr_add() was changed to allow a return code, ring was changed
to return -ENOTSUP.  This changes the behavior of the rte_ring driver in
a way that does not maintain backward compatibility.  Additionally, if
the intent is to return -ENOTSUP, you could simply not define the stubs.

Fixes: 6d01e580ac5d ("ethdev: fix adding invalid MAC address")

Signed-off-by: Chas Williams 
---
 drivers/net/ring/rte_eth_ring.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ring/rte_eth_ring.c b/drivers/net/ring/rte_eth_ring.c
index 87d2258..d4dce95 100644
--- a/drivers/net/ring/rte_eth_ring.c
+++ b/drivers/net/ring/rte_eth_ring.c
@@ -230,7 +230,7 @@ eth_mac_addr_add(struct rte_eth_dev *dev __rte_unused,
uint32_t index __rte_unused,
uint32_t vmdq __rte_unused)
 {
-   return -ENOTSUP;
+   return 0;
 }
 
 static void
-- 
2.1.4



Re: [dpdk-dev] [PATCH 5/6] net/vmxnet3: receive queue lockup and memleak

2017-06-01 Thread Charles (Chas) Williams

While looking at another issue, I think one of the issues fixed in this
commit has already been fixed in the last DPDK release by:

commit 8fce14b789aecdb4345a62f6980e7b6e7f4ba245
Author: Stefan Puiu 
Date:   Mon Dec 19 11:40:53 2016 +0200

net/vmxnet3: fix Rx deadlock

Our use case is that we have an app that needs to keep mbufs around
for a while. We've seen cases when calling vmxnet3_post_rx_bufs() 
from
vmxet3_recv_pkts(), it might not succeed to add any mbufs to any RX
descriptors (where it returns -err). Since there are no mbufs that 
the
virtual hardware can use, no packets will be received after this; 
the
driver won't refill the mbuf after this so it gets stuck in this
state. I call this a deadlock for lack of a better term - the 
virtual
HW waits for free mbufs, while the app waits for the hardware to
notify it for data (by flipping the generation bit on the used Rx
descriptors). Note that after this, the app can't recover.
...

The mbuf leak due to an error during receive still exists.  That can be
refactored into a new commit.

On 05/19/2017 01:55 PM, Charles (Chas) Williams wrote:

From: Mandeep Rohilla 

The receive queue can lockup if all the rx descriptors have lost
their mbufs and temporarily there are no mbufs available. This
can happen if there is an mbuf leak or if the application holds
on to the mbuf for a while.

This also addresses an mbuf leak in an error condition during
packet receive.

Signed-off-by: Mandeep Rohilla 
---
 drivers/net/vmxnet3/vmxnet3_rxtx.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/drivers/net/vmxnet3/vmxnet3_rxtx.c 
b/drivers/net/vmxnet3/vmxnet3_rxtx.c
index d8713a1..d21679d 100644
--- a/drivers/net/vmxnet3/vmxnet3_rxtx.c
+++ b/drivers/net/vmxnet3/vmxnet3_rxtx.c
@@ -731,6 +731,7 @@ vmxnet3_recv_pkts(void *rx_queue, struct rte_mbuf 
**rx_pkts, uint16_t nb_pkts)
uint16_t nb_rx;
uint32_t nb_rxd, idx;
uint8_t ring_idx;
+   uint8_t i;
vmxnet3_rx_queue_t *rxq;
Vmxnet3_RxCompDesc *rcd;
vmxnet3_buf_info_t *rbi;
@@ -800,6 +801,12 @@ vmxnet3_recv_pkts(void *rx_queue, struct rte_mbuf 
**rx_pkts, uint16_t nb_pkts)
   (int)(rcd - (struct Vmxnet3_RxCompDesc *)
 rxq->comp_ring.base), rcd->rxdIdx);
rte_pktmbuf_free_seg(rxm);
+   if (rxq->start_seg) {
+   struct rte_mbuf *start = rxq->start_seg;
+
+   rxq->start_seg = NULL;
+   rte_pktmbuf_free(start);
+   }
goto rcd_done;
}

@@ -893,6 +900,18 @@ vmxnet3_recv_pkts(void *rx_queue, struct rte_mbuf 
**rx_pkts, uint16_t nb_pkts)
}
}

+   /*
+* Try to replenish the rx descriptors with the new mbufs
+*/
+   for (i = 0; i < VMXNET3_RX_CMDRING_SIZE; i++) {
+   vmxnet3_post_rx_bufs(rxq, i);
+   if (unlikely(rxq->shared->ctrl.updateRxProd)) {
+   VMXNET3_WRITE_BAR0_REG(hw,
+   rxprod_reg[i] +
+   (rxq->queue_id * VMXNET3_REG_ALIGN),
+   rxq->cmd_ring[i].next2fill);
+   }
+   }
return nb_rx;
 }




[dpdk-dev] [PATCH] net/e1000: always enable receive and transmit

2017-07-17 Thread Charles (Chas) Williams
The transmit and receive controller state machines are only enabled after
receiving an interrupt and the link status is now valid.  If an adapter
is being used in conjuction with NC-SI, network controller sideband
interface, the adapter may never get a link state change interrupt since
the adapter's PHY is always link up and never changes state.

To fix this, always enable and disable the transmit and receive with
.dev_start and .dev_stop.  This is a better match for what is typically
done with the other PMD's.  Since we may never get an interrupt to check
the link state, we also poll once at the end of .dev_start to get the
current link status.

Signed-off-by: Chas Williams 
---
 drivers/net/e1000/em_ethdev.c  | 44 ++---
 drivers/net/e1000/igb_ethdev.c | 45 --
 2 files changed, 58 insertions(+), 31 deletions(-)

diff --git a/drivers/net/e1000/em_ethdev.c b/drivers/net/e1000/em_ethdev.c
index 3d4ab93..11e0bb8 100644
--- a/drivers/net/e1000/em_ethdev.c
+++ b/drivers/net/e1000/em_ethdev.c
@@ -586,6 +586,30 @@ em_set_pba(struct e1000_hw *hw)
E1000_WRITE_REG(hw, E1000_PBA, pba);
 }
 
+static void
+eth_em_rxtx_control(struct rte_eth_dev *dev,
+   bool enable)
+{
+   struct e1000_hw *hw =
+   E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   uint32_t tctl, rctl;
+
+   tctl = E1000_READ_REG(hw, E1000_TCTL);
+   rctl = E1000_READ_REG(hw, E1000_RCTL);
+   if (enable) {
+   /* enable Tx/Rx */
+   tctl |= E1000_TCTL_EN;
+   rctl |= E1000_RCTL_EN;
+   } else {
+   /* disable Tx/Rx */
+   tctl &= ~E1000_TCTL_EN;
+   rctl &= ~E1000_RCTL_EN;
+   }
+   E1000_WRITE_REG(hw, E1000_TCTL, tctl);
+   E1000_WRITE_REG(hw, E1000_RCTL, rctl);
+   E1000_WRITE_FLUSH(hw);
+}
+
 static int
 eth_em_start(struct rte_eth_dev *dev)
 {
@@ -754,6 +778,9 @@ eth_em_start(struct rte_eth_dev *dev)
 
adapter->stopped = 0;
 
+   eth_em_rxtx_control(dev, true);
+   eth_em_link_update(dev, 0);
+
PMD_INIT_LOG(DEBUG, "<<");
 
return 0;
@@ -779,6 +806,7 @@ eth_em_stop(struct rte_eth_dev *dev)
struct rte_pci_device *pci_dev = RTE_ETH_DEV_TO_PCI(dev);
struct rte_intr_handle *intr_handle = &pci_dev->intr_handle;
 
+   eth_em_rxtx_control(dev, false);
em_rxq_intr_disable(hw);
em_lsc_intr_disable(hw);
 
@@ -1602,7 +1630,6 @@ eth_em_interrupt_action(struct rte_eth_dev *dev,
E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
struct e1000_interrupt *intr =
E1000_DEV_PRIVATE_TO_INTR(dev->data->dev_private);
-   uint32_t tctl, rctl;
struct rte_eth_link link;
int ret;
 
@@ -1634,21 +1661,6 @@ eth_em_interrupt_action(struct rte_eth_dev *dev,
 pci_dev->addr.domain, pci_dev->addr.bus,
 pci_dev->addr.devid, pci_dev->addr.function);
 
-   tctl = E1000_READ_REG(hw, E1000_TCTL);
-   rctl = E1000_READ_REG(hw, E1000_RCTL);
-   if (link.link_status) {
-   /* enable Tx/Rx */
-   tctl |= E1000_TCTL_EN;
-   rctl |= E1000_RCTL_EN;
-   } else {
-   /* disable Tx/Rx */
-   tctl &= ~E1000_TCTL_EN;
-   rctl &= ~E1000_RCTL_EN;
-   }
-   E1000_WRITE_REG(hw, E1000_TCTL, tctl);
-   E1000_WRITE_REG(hw, E1000_RCTL, rctl);
-   E1000_WRITE_FLUSH(hw);
-
return 0;
 }
 
diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
index da03d9b..7e2d31a 100644
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -1300,6 +1300,31 @@ eth_igb_configure(struct rte_eth_dev *dev)
return 0;
 }
 
+static void
+eth_igb_rxtx_control(struct rte_eth_dev *dev,
+bool enable)
+{
+   struct e1000_hw *hw =
+   E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   uint32_t tctl, rctl;
+
+   tctl = E1000_READ_REG(hw, E1000_TCTL);
+   rctl = E1000_READ_REG(hw, E1000_RCTL);
+
+   if (enable) {
+   /* enable Tx/Rx */
+   tctl |= E1000_TCTL_EN;
+   rctl |= E1000_RCTL_EN;
+   } else {
+   /* disable Tx/Rx */
+   tctl &= ~E1000_TCTL_EN;
+   rctl &= ~E1000_RCTL_EN;
+   }
+   E1000_WRITE_REG(hw, E1000_TCTL, tctl);
+   E1000_WRITE_REG(hw, E1000_RCTL, rctl);
+   E1000_WRITE_FLUSH(hw);
+}
+
 static int
 eth_igb_start(struct rte_eth_dev *dev)
 {
@@ -1496,6 +1521,9 @@ eth_igb_start(struct rte_eth_dev *dev)
/* restore all types filter */
igb_filter_restore(dev);
 
+   eth_igb_rxtx_control(dev, true);
+   eth_igb_link_update(dev, 0);
+
PMD_INIT_LOG(DEBUG, "<<");
 
return 0;
@@ -1521,6 +1549,8 @@ eth_igb_stop(struct rte_eth_dev *dev)
struct rte_eth_link link;
struct rte_intr_handle *intr_h

[dpdk-dev] [PATCH 0/2] Some small changes to net/virtio

2017-07-17 Thread Charles (Chas) Williams
Just a couple small changes to net/virtio that make it a little more
well behaved.

Brian Russell (1):
  net/virtio: register/unregister intr handler on start/stop

Charles (Chas) Williams (1):
  net/virtio: do not re-enter clean up routines

 drivers/net/virtio/virtio_ethdev.c | 41 +-
 drivers/net/virtio/virtio_pci.h|  4 +++-
 2 files changed, 30 insertions(+), 15 deletions(-)

-- 
2.1.4



[dpdk-dev] [PATCH 1/2] net/virtio: do not re-enter clean up routines

2017-07-17 Thread Charles (Chas) Williams
.dev_uninit calls .dev_stop and .dev_close.  The work that is done in
those routines doesn't need repeated.  Use started and opened to track
the adapter's status.

Signed-off-by: Chas Williams 
---
 drivers/net/virtio/virtio_ethdev.c | 15 ---
 drivers/net/virtio/virtio_pci.h|  4 +++-
 2 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c 
b/drivers/net/virtio/virtio_ethdev.c
index 00a3122..eff0545 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -609,6 +609,10 @@ virtio_dev_close(struct rte_eth_dev *dev)
 
PMD_INIT_LOG(DEBUG, "virtio_dev_close");
 
+   if (!hw->opened)
+   return;
+   hw->opened = false;
+
/* reset the NIC */
if (dev->data->dev_flags & RTE_ETH_DEV_INTR_LSC)
VTPCI_OPS(hw)->set_config_irq(hw, VIRTIO_MSI_NO_VECTOR);
@@ -1696,6 +1700,8 @@ virtio_dev_configure(struct rte_eth_dev *dev)
return -EBUSY;
}
 
+   hw->opened = true;
+
return 0;
 }
 
@@ -1759,7 +1765,7 @@ virtio_dev_start(struct rte_eth_dev *dev)
VIRTQUEUE_DUMP(txvq->vq);
}
 
-   hw->started = 1;
+   hw->started = true;
 
/* Initialize Link state */
virtio_dev_link_update(dev, 0);
@@ -1824,10 +1830,13 @@ virtio_dev_stop(struct rte_eth_dev *dev)
 
PMD_INIT_LOG(DEBUG, "stop");
 
+   if (!hw->started)
+   return;
+   hw->started = false;
+
if (intr_conf->lsc || intr_conf->rxq)
rte_intr_disable(dev->intr_handle);
 
-   hw->started = 0;
memset(&link, 0, sizeof(link));
virtio_dev_atomic_write_link_status(dev, &link);
 }
@@ -1844,7 +1853,7 @@ virtio_dev_link_update(struct rte_eth_dev *dev, 
__rte_unused int wait_to_complet
link.link_duplex = ETH_LINK_FULL_DUPLEX;
link.link_speed  = SPEED_10G;
 
-   if (hw->started == 0) {
+   if (!hw->started) {
link.link_status = ETH_LINK_DOWN;
} else if (vtpci_with_feature(hw, VIRTIO_NET_F_STATUS)) {
PMD_INIT_LOG(DEBUG, "Get link status from hw");
diff --git a/drivers/net/virtio/virtio_pci.h b/drivers/net/virtio/virtio_pci.h
index 18caebd..65bad2d 100644
--- a/drivers/net/virtio/virtio_pci.h
+++ b/drivers/net/virtio/virtio_pci.h
@@ -35,6 +35,7 @@
 #define _VIRTIO_PCI_H_
 
 #include 
+#include 
 
 #include 
 #include 
@@ -253,7 +254,7 @@ struct virtio_hw {
uint64_treq_guest_features;
uint64_tguest_features;
uint32_tmax_queue_pairs;
-   uint16_tstarted;
+   boolstarted;
uint16_tmax_mtu;
uint16_tvtnet_hdr_size;
uint8_t vlan_strip;
@@ -268,6 +269,7 @@ struct virtio_hw {
struct virtio_pci_common_cfg *common_cfg;
struct virtio_net_config *dev_cfg;
void*virtio_user_dev;
+   boolopened;
 
struct virtqueue **vqs;
 };
-- 
2.1.4



[dpdk-dev] [PATCH 2/2] net/virtio: register/unregister intr handler on start/stop

2017-07-17 Thread Charles (Chas) Williams
From: Brian Russell 

Register and unregister the virtio interrupt handler when the device is
started and stopped.

Signed-off-by: Brian Russell 
---
 drivers/net/virtio/virtio_ethdev.c | 26 +++---
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c 
b/drivers/net/virtio/virtio_ethdev.c
index eff0545..103e778 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -1581,11 +1581,6 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
if (ret < 0)
return ret;
 
-   /* Setup interrupt callback  */
-   if (eth_dev->data->dev_flags & RTE_ETH_DEV_INTR_LSC)
-   rte_intr_callback_register(eth_dev->intr_handle,
-   virtio_interrupt_handler, eth_dev);
-
return 0;
 }
 
@@ -1607,11 +1602,6 @@ eth_virtio_dev_uninit(struct rte_eth_dev *eth_dev)
rte_free(eth_dev->data->mac_addrs);
eth_dev->data->mac_addrs = NULL;
 
-   /* reset interrupt callback  */
-   if (eth_dev->data->dev_flags & RTE_ETH_DEV_INTR_LSC)
-   rte_intr_callback_unregister(eth_dev->intr_handle,
-   virtio_interrupt_handler,
-   eth_dev);
if (eth_dev->device)
rte_pci_unmap_device(RTE_ETH_DEV_TO_PCI(eth_dev));
 
@@ -1730,6 +1720,12 @@ virtio_dev_start(struct rte_eth_dev *dev)
dev->data->dev_conf.intr_conf.rxq) {
rte_intr_disable(dev->intr_handle);
 
+   /* Setup interrupt callback  */
+   if (dev->data->dev_flags & RTE_ETH_DEV_INTR_LSC)
+   rte_intr_callback_register(dev->intr_handle,
+  virtio_interrupt_handler,
+  dev);
+
if (rte_intr_enable(dev->intr_handle) < 0) {
PMD_DRV_LOG(ERR, "interrupt enable failed");
return -EIO;
@@ -1834,9 +1830,17 @@ virtio_dev_stop(struct rte_eth_dev *dev)
return;
hw->started = false;
 
-   if (intr_conf->lsc || intr_conf->rxq)
+   if (intr_conf->lsc || intr_conf->rxq) {
rte_intr_disable(dev->intr_handle);
 
+   /* Reset interrupt callback  */
+   if (dev->data->dev_flags & RTE_ETH_DEV_INTR_LSC) {
+   rte_intr_callback_unregister(dev->intr_handle,
+virtio_interrupt_handler,
+dev);
+   }
+   }
+
memset(&link, 0, sizeof(link));
virtio_dev_atomic_write_link_status(dev, &link);
 }
-- 
2.1.4



Re: [dpdk-dev] [PATCH 0/2] Some small changes to net/virtio

2017-07-18 Thread Charles (Chas) Williams



On 07/18/2017 07:50 AM, Ferruh Yigit wrote:

On 7/18/2017 12:05 AM, Charles (Chas) Williams wrote:

Just a couple small changes to net/virtio that make it a little more
well behaved.


Same question here, is this patchset targets the 17.08-rc2? Can this be
postponed?


Yes, these can all be postponed to 17.11.  They are too late for a -rc2.





Brian Russell (1):
  net/virtio: register/unregister intr handler on start/stop

Charles (Chas) Williams (1):
  net/virtio: do not re-enter clean up routines

 drivers/net/virtio/virtio_ethdev.c | 41 +-
 drivers/net/virtio/virtio_pci.h|  4 +++-
 2 files changed, 30 insertions(+), 15 deletions(-)





[dpdk-dev] [PATCH] eal: implement rte_log_get_level

2017-07-22 Thread Charles (Chas) Williams
This function wasn't provided in the original commit.

Fixes: 4f0981e6ec6c ("eal: deprecate log functions")
Cc: sta...@dpdk.org

Signed-off-by: Chas Williams 
---
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  1 +
 lib/librte_eal/common/eal_common_log.c  |  9 +
 lib/librte_eal/common/include/rte_log.h | 10 ++
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |  1 +
 4 files changed, 21 insertions(+)

diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map 
b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
index 480ad23..4687143 100644
--- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
@@ -200,6 +200,7 @@ DPDK_17.08 {
rte_bus_find;
rte_bus_find_by_device;
rte_bus_find_by_name;
+   rte_log_get_level;
 
 } DPDK_17.05;
 
diff --git a/lib/librte_eal/common/eal_common_log.c 
b/lib/librte_eal/common/eal_common_log.c
index 41ea924..0e3b932 100644
--- a/lib/librte_eal/common/eal_common_log.c
+++ b/lib/librte_eal/common/eal_common_log.c
@@ -112,6 +112,15 @@ rte_get_log_level(void)
return rte_log_get_global_level();
 }
 
+int
+rte_log_get_level(uint32_t type)
+{
+   if (type >= rte_logs.dynamic_types_len)
+   return -1;
+
+   return rte_logs.dynamic_types[type].loglevel;
+}
+
 /* Set global log type */
 __rte_deprecated void
 rte_set_log_type(uint32_t type, int enable)
diff --git a/lib/librte_eal/common/include/rte_log.h 
b/lib/librte_eal/common/include/rte_log.h
index 3419138..ec8dba7 100644
--- a/lib/librte_eal/common/include/rte_log.h
+++ b/lib/librte_eal/common/include/rte_log.h
@@ -175,6 +175,16 @@ __rte_deprecated
 uint32_t rte_get_log_type(void);
 
 /**
+ * Get the log level for a given type.
+ *
+ * @param logtype
+ *   The log type identifier.
+ * @return
+ *   0 on success, a negative value if logtype is invalid.
+ */
+int rte_log_get_level(uint32_t logtype);
+
+/**
  * Set the log level for a given type.
  *
  * @param pattern
diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map 
b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
index fbaec39..0ecadd7 100644
--- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
@@ -205,6 +205,7 @@ DPDK_17.08 {
rte_bus_find;
rte_bus_find_by_device;
rte_bus_find_by_name;
+   rte_log_get_level;
 
 } DPDK_17.05;
 
-- 
2.1.4



[dpdk-dev] [PATCH] net/vmxnet3: restore correct filtering

2017-07-24 Thread Charles (Chas) Williams
We should only restore shadow_vfta when hw_vlan_filter is active.
Otherwise, we should restore the previous filtering behavior.

Fixes: f003fc383487("vmxnet3: enable vlan filtering")
Cc: sta...@dpdk.org

Signed-off-by: Chas Williams 
---
 drivers/net/vmxnet3/vmxnet3_ethdev.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/vmxnet3/vmxnet3_ethdev.c 
b/drivers/net/vmxnet3/vmxnet3_ethdev.c
index ab5a824..3910991 100644
--- a/drivers/net/vmxnet3/vmxnet3_ethdev.c
+++ b/drivers/net/vmxnet3/vmxnet3_ethdev.c
@@ -1219,7 +1219,10 @@ vmxnet3_dev_promiscuous_disable(struct rte_eth_dev *dev)
struct vmxnet3_hw *hw = dev->data->dev_private;
uint32_t *vf_table = hw->shared->devRead.rxFilterConf.vfTable;
 
-   memcpy(vf_table, hw->shadow_vfta, VMXNET3_VFT_TABLE_SIZE);
+   if (dev->data->dev_conf.rxmode.hw_vlan_filter)
+   memcpy(vf_table, hw->shadow_vfta, VMXNET3_VFT_TABLE_SIZE);
+   else
+   memset(vf_table, 0xff, VMXNET3_VFT_TABLE_SIZE);
vmxnet3_dev_set_rxmode(hw, VMXNET3_RXM_PROMISC, 0);
VMXNET3_WRITE_BAR1_REG(hw, VMXNET3_REG_CMD,
   VMXNET3_CMD_UPDATE_VLAN_FILTERS);
-- 
2.1.4



Re: [dpdk-dev] [PATCH] net/vmxnet3: restore correct filtering

2017-08-01 Thread Charles (Chas) Williams



On 08/01/2017 07:41 PM, Shrikrishna Khare wrote:



On Mon, 24 Jul 2017, Charles (Chas) Williams wrote:


We should only restore shadow_vfta when hw_vlan_filter is active.
Otherwise, we should restore the previous filtering behavior.

Fixes: f003fc383487("vmxnet3: enable vlan filtering")
Cc: sta...@dpdk.org

Signed-off-by: Chas Williams 
---
 drivers/net/vmxnet3/vmxnet3_ethdev.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/vmxnet3/vmxnet3_ethdev.c 
b/drivers/net/vmxnet3/vmxnet3_ethdev.c
index ab5a824..3910991 100644
--- a/drivers/net/vmxnet3/vmxnet3_ethdev.c
+++ b/drivers/net/vmxnet3/vmxnet3_ethdev.c
@@ -1219,7 +1219,10 @@ vmxnet3_dev_promiscuous_disable(struct rte_eth_dev *dev)
struct vmxnet3_hw *hw = dev->data->dev_private;
uint32_t *vf_table = hw->shared->devRead.rxFilterConf.vfTable;

-   memcpy(vf_table, hw->shadow_vfta, VMXNET3_VFT_TABLE_SIZE);
+   if (dev->data->dev_conf.rxmode.hw_vlan_filter)
+   memcpy(vf_table, hw->shadow_vfta, VMXNET3_VFT_TABLE_SIZE);
+   else
+   memset(vf_table, 0xff, VMXNET3_VFT_TABLE_SIZE);


From the device emulation standpoint, vf_table array set to all 1s means
that rx filtering is enabled but all vlan ID are allowed, while vf_table
array set to all 0s means that no vlan ID filtering is done.

So, it is better to go with all 0s rather than all 1s, it is also more
efficient for the emulation.

This is an existing problem in vmxnet3_dev_vlan_offload_set as well.


That may be the case, but this patch isn't the place to address that.
If this needs changed, there should be a follow on patch that makes
this change.




vmxnet3_dev_set_rxmode(hw, VMXNET3_RXM_PROMISC, 0);
VMXNET3_WRITE_BAR1_REG(hw, VMXNET3_REG_CMD,
   VMXNET3_CMD_UPDATE_VLAN_FILTERS);
--
2.1.4




[dpdk-dev] [PATCH] mbuf: use refcnt = 0 when debugging

2017-08-07 Thread Charles (Chas) Williams
After commit 8f094a9ac5d7 is it much harder to detect a "double free".
If the developer makes a copy of an mbuf pointer and frees it twice, this
condition is never detected and the mbuf gets returned to the pool twice.

Since this requires extra work to track, make this behavior conditional
on CONFIG_RTE_LIBRTE_MBUF_DEBUG.

Signed-off-by: Chas Williams 
---
 lib/librte_mbuf/rte_mbuf.c |  2 +-
 lib/librte_mbuf/rte_mbuf.h | 42 +++---
 2 files changed, 36 insertions(+), 8 deletions(-)

diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
index 26a62b8..b0d222c 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -145,7 +145,7 @@ rte_pktmbuf_init(struct rte_mempool *mp,
m->pool = mp;
m->nb_segs = 1;
m->port = 0xff;
-   rte_mbuf_refcnt_set(m, 1);
+   rte_mbuf_refcnt_set(m, RTE_MBUF_UNUSED_CNT);
m->next = NULL;
 }
 
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index eaed7ee..9d87081 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -671,11 +671,15 @@ struct rte_pktmbuf_pool_private {
 
 #ifdef RTE_LIBRTE_MBUF_DEBUG
 
+#define RTE_MBUF_UNUSED_CNT 0
+
 /**  check mbuf type in debug mode */
 #define __rte_mbuf_sanity_check(m, is_h) rte_mbuf_sanity_check(m, is_h)
 
 #else /*  RTE_LIBRTE_MBUF_DEBUG */
 
+#define RTE_MBUF_UNUSED_CNT 1
+
 /**  check mbuf type in debug mode */
 #define __rte_mbuf_sanity_check(m, is_h) do { } while (0)
 
@@ -721,6 +725,9 @@ rte_mbuf_refcnt_set(struct rte_mbuf *m, uint16_t new_value)
 static inline uint16_t
 rte_mbuf_refcnt_update(struct rte_mbuf *m, int16_t value)
 {
+#ifdef RTE_LIBRTE_MBUF_DEBUG
+   RTE_ASSERT(rte_mbuf_refcnt_read(m) != 0);
+#endif
/*
 * The atomic_add is an expensive operation, so we don't want to
 * call it in the case where we know we are the uniq holder of
@@ -791,10 +798,9 @@ void
 rte_mbuf_sanity_check(const struct rte_mbuf *m, int is_header);
 
 #define MBUF_RAW_ALLOC_CHECK(m) do {   \
-   RTE_ASSERT(rte_mbuf_refcnt_read(m) == 1);   \
+   RTE_ASSERT(rte_mbuf_refcnt_read(m) == RTE_MBUF_UNUSED_CNT); \
RTE_ASSERT((m)->next == NULL);  \
RTE_ASSERT((m)->nb_segs == 1);  \
-   __rte_mbuf_sanity_check(m, 0);  \
 } while (0)
 
 /**
@@ -825,6 +831,10 @@ static inline struct rte_mbuf *rte_mbuf_raw_alloc(struct 
rte_mempool *mp)
return NULL;
m = (struct rte_mbuf *)mb;
MBUF_RAW_ALLOC_CHECK(m);
+#ifdef RTE_LIBRTE_MBUF_DEBUG
+   rte_mbuf_refcnt_set(m, 1);
+#endif
+   __rte_mbuf_sanity_check(m, 0);
return m;
 }
 
@@ -846,10 +856,9 @@ static __rte_always_inline void
 rte_mbuf_raw_free(struct rte_mbuf *m)
 {
RTE_ASSERT(RTE_MBUF_DIRECT(m));
-   RTE_ASSERT(rte_mbuf_refcnt_read(m) == 1);
+   RTE_ASSERT(rte_mbuf_refcnt_read(m) == RTE_MBUF_UNUSED_CNT);
RTE_ASSERT(m->next == NULL);
RTE_ASSERT(m->nb_segs == 1);
-   __rte_mbuf_sanity_check(m, 0);
rte_mempool_put(m->pool, m);
 }
 
@@ -1159,21 +1168,37 @@ static inline int rte_pktmbuf_alloc_bulk(struct 
rte_mempool *pool,
case 0:
while (idx != count) {
MBUF_RAW_ALLOC_CHECK(mbufs[idx]);
+#ifdef RTE_LIBRTE_MBUF_DEBUG
+   rte_mbuf_refcnt_set(mbufs[idx], 1);
+#endif
+   __rte_mbuf_sanity_check(mbufs[idx], 0);
rte_pktmbuf_reset(mbufs[idx]);
idx++;
/* fall-through */
case 3:
MBUF_RAW_ALLOC_CHECK(mbufs[idx]);
+#ifdef RTE_LIBRTE_MBUF_DEBUG
+   rte_mbuf_refcnt_set(mbufs[idx], 1);
+#endif
+   __rte_mbuf_sanity_check(mbufs[idx], 0);
rte_pktmbuf_reset(mbufs[idx]);
idx++;
/* fall-through */
case 2:
MBUF_RAW_ALLOC_CHECK(mbufs[idx]);
+#ifdef RTE_LIBRTE_MBUF_DEBUG
+   rte_mbuf_refcnt_set(mbufs[idx], 1);
+#endif
+   __rte_mbuf_sanity_check(mbufs[idx], 0);
rte_pktmbuf_reset(mbufs[idx]);
idx++;
/* fall-through */
case 1:
MBUF_RAW_ALLOC_CHECK(mbufs[idx]);
+#ifdef RTE_LIBRTE_MBUF_DEBUG
+   rte_mbuf_refcnt_set(mbufs[idx], 1);
+#endif
+   __rte_mbuf_sanity_check(mbufs[idx], 0);
rte_pktmbuf_reset(mbufs[idx]);
idx++;
/* fall-through */
@@ -1271,7 +1296,7 @@ static inline void rte_pktmbuf_detach(struct rte_mbuf *m)
if (rte_mbuf_refcnt_update(md, -1) == 0) {
md->next = NULL;
md->nb_segs = 1;
-   rte_mbuf_refcnt_set(md, 1);

[dpdk-dev] [PATCH v2] mbuf: use refcnt = 0 when debugging

2017-08-07 Thread Charles (Chas) Williams
After commit 8f094a9ac5d7 ("mbuf: set mbuf fields while in pool") is it
much harder to detect a "double free".  If the developer makes a copy
of an mbuf pointer and frees it twice, this condition is never detected
and the mbuf gets returned to the pool twice.

Since this requires extra work to track, make this behavior conditional
on CONFIG_RTE_LIBRTE_MBUF_DEBUG.

Signed-off-by: Chas Williams 
---
 lib/librte_mbuf/rte_mbuf.c |  2 +-
 lib/librte_mbuf/rte_mbuf.h | 42 +++---
 2 files changed, 36 insertions(+), 8 deletions(-)

diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
index 26a62b8..b0d222c 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -145,7 +145,7 @@ rte_pktmbuf_init(struct rte_mempool *mp,
m->pool = mp;
m->nb_segs = 1;
m->port = 0xff;
-   rte_mbuf_refcnt_set(m, 1);
+   rte_mbuf_refcnt_set(m, RTE_MBUF_UNUSED_CNT);
m->next = NULL;
 }
 
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index eaed7ee..3e00373 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -671,11 +671,15 @@ struct rte_pktmbuf_pool_private {
 
 #ifdef RTE_LIBRTE_MBUF_DEBUG
 
+#define RTE_MBUF_UNUSED_CNT 0
+
 /**  check mbuf type in debug mode */
 #define __rte_mbuf_sanity_check(m, is_h) rte_mbuf_sanity_check(m, is_h)
 
 #else /*  RTE_LIBRTE_MBUF_DEBUG */
 
+#define RTE_MBUF_UNUSED_CNT 1
+
 /**  check mbuf type in debug mode */
 #define __rte_mbuf_sanity_check(m, is_h) do { } while (0)
 
@@ -721,6 +725,9 @@ rte_mbuf_refcnt_set(struct rte_mbuf *m, uint16_t new_value)
 static inline uint16_t
 rte_mbuf_refcnt_update(struct rte_mbuf *m, int16_t value)
 {
+#ifdef RTE_LIBRTE_MBUF_DEBUG
+   RTE_ASSERT(rte_mbuf_refcnt_read(m) != 0);
+#endif
/*
 * The atomic_add is an expensive operation, so we don't want to
 * call it in the case where we know we are the uniq holder of
@@ -791,10 +798,9 @@ void
 rte_mbuf_sanity_check(const struct rte_mbuf *m, int is_header);
 
 #define MBUF_RAW_ALLOC_CHECK(m) do {   \
-   RTE_ASSERT(rte_mbuf_refcnt_read(m) == 1);   \
+   RTE_ASSERT(rte_mbuf_refcnt_read(m) == RTE_MBUF_UNUSED_CNT); \
RTE_ASSERT((m)->next == NULL);  \
RTE_ASSERT((m)->nb_segs == 1);  \
-   __rte_mbuf_sanity_check(m, 0);  \
 } while (0)
 
 /**
@@ -825,6 +831,10 @@ static inline struct rte_mbuf *rte_mbuf_raw_alloc(struct 
rte_mempool *mp)
return NULL;
m = (struct rte_mbuf *)mb;
MBUF_RAW_ALLOC_CHECK(m);
+#ifdef RTE_LIBRTE_MBUF_DEBUG
+   rte_mbuf_refcnt_set(m, 1);
+#endif
+   __rte_mbuf_sanity_check(m, 0);
return m;
 }
 
@@ -846,10 +856,9 @@ static __rte_always_inline void
 rte_mbuf_raw_free(struct rte_mbuf *m)
 {
RTE_ASSERT(RTE_MBUF_DIRECT(m));
-   RTE_ASSERT(rte_mbuf_refcnt_read(m) == 1);
+   RTE_ASSERT(rte_mbuf_refcnt_read(m) == RTE_MBUF_UNUSED_CNT);
RTE_ASSERT(m->next == NULL);
RTE_ASSERT(m->nb_segs == 1);
-   __rte_mbuf_sanity_check(m, 0);
rte_mempool_put(m->pool, m);
 }
 
@@ -1159,21 +1168,37 @@ static inline int rte_pktmbuf_alloc_bulk(struct 
rte_mempool *pool,
case 0:
while (idx != count) {
MBUF_RAW_ALLOC_CHECK(mbufs[idx]);
+#ifdef RTE_LIBRTE_MBUF_DEBUG
+   rte_mbuf_refcnt_set(mbufs[idx], 1);
+#endif
+   __rte_mbuf_sanity_check(mbufs[idx], 0);
rte_pktmbuf_reset(mbufs[idx]);
idx++;
/* fall-through */
case 3:
MBUF_RAW_ALLOC_CHECK(mbufs[idx]);
+#ifdef RTE_LIBRTE_MBUF_DEBUG
+   rte_mbuf_refcnt_set(mbufs[idx], 1);
+#endif
+   __rte_mbuf_sanity_check(mbufs[idx], 0);
rte_pktmbuf_reset(mbufs[idx]);
idx++;
/* fall-through */
case 2:
MBUF_RAW_ALLOC_CHECK(mbufs[idx]);
+#ifdef RTE_LIBRTE_MBUF_DEBUG
+   rte_mbuf_refcnt_set(mbufs[idx], 1);
+#endif
+   __rte_mbuf_sanity_check(mbufs[idx], 0);
rte_pktmbuf_reset(mbufs[idx]);
idx++;
/* fall-through */
case 1:
MBUF_RAW_ALLOC_CHECK(mbufs[idx]);
+#ifdef RTE_LIBRTE_MBUF_DEBUG
+   rte_mbuf_refcnt_set(mbufs[idx], 1);
+#endif
+   __rte_mbuf_sanity_check(mbufs[idx], 0);
rte_pktmbuf_reset(mbufs[idx]);
idx++;
/* fall-through */
@@ -1271,7 +1296,7 @@ static inline void rte_pktmbuf_detach(struct rte_mbuf *m)
if (rte_mbuf_refcnt_update(md, -1) == 0) {
md->next = NULL;
md->nb_segs = 1;
- 

[dpdk-dev] [PATCH 1/2] bnx2x: fix error handling in bnx2x_loop_obtain_resources()

2016-02-08 Thread Charles (Chas) Williams
I am afraid I don't understand what you are asking. This was broken in 
the commit that added bnx2x, 540a211084a7695a1c7bc43068934c140d6989be

On 02/08/2016 05:51 AM, Bruce Richardson wrote:
> On Wed, Dec 30, 2015 at 07:37:50PM -0500, Chas Williams wrote:
>> From: "Charles (Chas) Williams" 
>>
>> bnx2x_loop_obtain_resources() returns a struct containing the status and
>> the error message.  If bnx2x_do_req4pf() fails, it shouldn't return both
>> of these fields set to 0 indicating failure and no error.
>>
>> Further, bnx2x_do_req4pf() needs to be able fail and return NO_RESOURCES
>> so that bnx2x_loop_obtain_resources() can negotiate reduced resource
>> requirments.  This requires additional checking around bnx2x_do_req4pf().
>>
>> Signed-off-by: Chas Williams <3chas3 at gmail.com>
>> ---
>
> Hi Chas,
>
> can you please provide a commit reference for a fixes line for this patch. [No
> need for a V2 just for that alone, just the reference is enough to save me
> diving into the git history. I'll manually add the fixes line when committing
> to the next-net tree]
>
> Maintainers, can you please review this patchset and provide feedback or an 
> ACK.
>
> Regards,
> /Bruce
>


Re: [dpdk-dev] [PATCH] maintainers: add Xavier for bonding

2020-06-07 Thread Charles (Chas) Williams



On 5/18/2020 9:00 AM, Ferruh Yigit wrote:

On 5/18/2020 12:20 PM, Wei Hu (Xavier) wrote:

Adding Xavier as additional maintainer to bonding.

Signed-off-by: Wei Hu (Xavier) 
---
  MAINTAINERS | 1 +
  1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 6a14622..46b6535 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -529,6 +529,7 @@ F: doc/guides/nics/features/default.ini
  
  Link bonding

  M: Chas Williams 
+M: Wei Hu (Xavier) 
  F: drivers/net/bonding/
  F: doc/guides/prog_guide/link_bonding_poll_mode_drv_lib.rst
  F: app/test/test_link_bonding*


Acked-by: Ferruh Yigit 

Thanks for volunteering.


Acked-by: Chas Williams 




[dpdk-dev] [PATCH V2 0/6] some local vmxnet3 patches

2017-06-15 Thread Charles (Chas) Williams
This series addresses some local issues with the vmxnet3 driver that
others might find of interest.

Changes in v2:

- net/vmxnet3: Implement retrieval of extended stats

.id field wasn't being filled in xstats.

- net/vmxnet3: receive queue memory leak

The leak and buffer reallcation were split.  The buffer reallocation
patch may be superceded by another commit so isn't included in this
series while under discussion.


[dpdk-dev] [PATCH v2 1/6] net/vmxnet3: retain counters on restart

2017-06-15 Thread Charles (Chas) Williams
From: Nachiketa Prachanda 

Most nics like virtio, igb/ixgbe etc. don't reset counters on
dev_start and arguably this helps in monitoring the counters
across a longer time span with multiple device start/stops.
vmxnet3 behavior is opposite to that and counters are reset by
the host side implementation each time the device is restarted.

Change the driver to save the counters in its private context
before it is reset by writing CMD_ACTIVATE to REG_CMD.

Signed-off-by: Nachiketa Prachanda 
Acked-by: Shrikrishna Khare 
---
 drivers/net/vmxnet3/vmxnet3_ethdev.c | 102 ---
 drivers/net/vmxnet3/vmxnet3_ethdev.h |   2 +
 2 files changed, 86 insertions(+), 18 deletions(-)

diff --git a/drivers/net/vmxnet3/vmxnet3_ethdev.c 
b/drivers/net/vmxnet3/vmxnet3_ethdev.c
index 292a671..1301f50 100644
--- a/drivers/net/vmxnet3/vmxnet3_ethdev.c
+++ b/drivers/net/vmxnet3/vmxnet3_ethdev.c
@@ -85,6 +85,7 @@ static void vmxnet3_dev_allmulticast_enable(struct 
rte_eth_dev *dev);
 static void vmxnet3_dev_allmulticast_disable(struct rte_eth_dev *dev);
 static int vmxnet3_dev_link_update(struct rte_eth_dev *dev,
   int wait_to_complete);
+static void vmxnet3_hw_stats_save(struct vmxnet3_hw *hw);
 static void vmxnet3_dev_stats_get(struct rte_eth_dev *dev,
  struct rte_eth_stats *stats);
 static void vmxnet3_dev_info_get(struct rte_eth_dev *dev,
@@ -351,6 +352,10 @@ eth_vmxnet3_dev_init(struct rte_eth_dev *eth_dev)
RTE_ASSERT((hw->rxdata_desc_size & ~VMXNET3_RXDATA_DESC_SIZE_MASK) ==
   hw->rxdata_desc_size);
 
+   /* clear shadow stats */
+   memset(hw->saved_tx_stats, 0, sizeof(hw->saved_tx_stats));
+   memset(hw->saved_rx_stats, 0, sizeof(hw->saved_rx_stats));
+
return 0;
 }
 
@@ -707,6 +712,9 @@ vmxnet3_dev_start(struct rte_eth_dev *dev)
 
PMD_INIT_FUNC_TRACE();
 
+   /* Save stats before it is reset by CMD_ACTIVATE */
+   vmxnet3_hw_stats_save(hw);
+
ret = vmxnet3_setup_driver_shared(dev);
if (ret != VMXNET3_SUCCESS)
return ret;
@@ -820,47 +828,105 @@ vmxnet3_dev_close(struct rte_eth_dev *dev)
 }
 
 static void
+vmxnet3_hw_tx_stats_get(struct vmxnet3_hw *hw, unsigned int q,
+   struct UPT1_TxStats *res)
+{
+#define VMXNET3_UPDATE_TX_STAT(h, i, f, r) \
+   ((r)->f = (h)->tqd_start[(i)].stats.f + \
+   (h)->saved_tx_stats[(i)].f)
+
+   VMXNET3_UPDATE_TX_STAT(hw, q, ucastPktsTxOK, res);
+   VMXNET3_UPDATE_TX_STAT(hw, q, mcastPktsTxOK, res);
+   VMXNET3_UPDATE_TX_STAT(hw, q, bcastPktsTxOK, res);
+   VMXNET3_UPDATE_TX_STAT(hw, q, ucastBytesTxOK, res);
+   VMXNET3_UPDATE_TX_STAT(hw, q, mcastBytesTxOK, res);
+   VMXNET3_UPDATE_TX_STAT(hw, q, bcastBytesTxOK, res);
+   VMXNET3_UPDATE_TX_STAT(hw, q, pktsTxError, res);
+   VMXNET3_UPDATE_TX_STAT(hw, q, pktsTxDiscard, res);
+
+#undef VMXNET3_UPDATE_TX_STAT
+}
+
+static void
+vmxnet3_hw_rx_stats_get(struct vmxnet3_hw *hw, unsigned int q,
+   struct UPT1_RxStats *res)
+{
+#define VMXNET3_UPDATE_RX_STAT(h, i, f, r) \
+   ((r)->f = (h)->rqd_start[(i)].stats.f + \
+   (h)->saved_rx_stats[(i)].f)
+
+   VMXNET3_UPDATE_RX_STAT(hw, q, ucastPktsRxOK, res);
+   VMXNET3_UPDATE_RX_STAT(hw, q, mcastPktsRxOK, res);
+   VMXNET3_UPDATE_RX_STAT(hw, q, bcastPktsRxOK, res);
+   VMXNET3_UPDATE_RX_STAT(hw, q, ucastBytesRxOK, res);
+   VMXNET3_UPDATE_RX_STAT(hw, q, mcastBytesRxOK, res);
+   VMXNET3_UPDATE_RX_STAT(hw, q, bcastBytesRxOK, res);
+   VMXNET3_UPDATE_RX_STAT(hw, q, pktsRxError, res);
+   VMXNET3_UPDATE_RX_STAT(hw, q, pktsRxOutOfBuf, res);
+
+#undef VMXNET3_UPDATE_RX_STATS
+}
+
+static void
+vmxnet3_hw_stats_save(struct vmxnet3_hw *hw)
+{
+   unsigned int i;
+
+   VMXNET3_WRITE_BAR1_REG(hw, VMXNET3_REG_CMD, VMXNET3_CMD_GET_STATS);
+
+   RTE_BUILD_BUG_ON(RTE_ETHDEV_QUEUE_STAT_CNTRS < VMXNET3_MAX_TX_QUEUES);
+
+   for (i = 0; i < hw->num_tx_queues; i++)
+   vmxnet3_hw_tx_stats_get(hw, i, &hw->saved_tx_stats[i]);
+   for (i = 0; i < hw->num_rx_queues; i++)
+   vmxnet3_hw_rx_stats_get(hw, i, &hw->saved_rx_stats[i]);
+}
+
+static void
 vmxnet3_dev_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
 {
unsigned int i;
struct vmxnet3_hw *hw = dev->data->dev_private;
+   struct UPT1_TxStats txStats;
+   struct UPT1_RxStats rxStats;
 
VMXNET3_WRITE_BAR1_REG(hw, VMXNET3_REG_CMD, VMXNET3_CMD_GET_STATS);
 
RTE_BUILD_BUG_ON(RTE_ETHDEV_QUEUE_STAT_CNTRS < VMXNET3_MAX_TX_QUEUES);
for (i = 0; i < hw->num_tx_queues; i++) {
-   struct UPT1_TxStats *txStats = &hw->tqd_start[i].stats;
+   vmxnet3_hw_tx_stats_get(hw, i, &txStats);
+
+   stats->q_opackets[i] = txStats.ucastPktsTxOK +
+   txStats.mcas

[dpdk-dev] [PATCH v2 4/6] net/vmxnet3: Make vmxnet3_process_events less noisy

2017-06-15 Thread Charles (Chas) Williams
From: Robert Shearman 

Make vmxnet3_process_events less noisy by removing logging when there
are no events to process and by making link, device-change and debug
events DEBUG level rather than ERR.

Change these to use PMD_DRV_LOG instead of PMD_INIT_LOG since they
don't happen at device init.

Signed-off-by: Robert Shearman 
Acked-by: Shrikrishna Khare 
---
 drivers/net/vmxnet3/vmxnet3_ethdev.c | 18 +++---
 1 file changed, 7 insertions(+), 11 deletions(-)

diff --git a/drivers/net/vmxnet3/vmxnet3_ethdev.c 
b/drivers/net/vmxnet3/vmxnet3_ethdev.c
index 73277fb..e8cbc85 100644
--- a/drivers/net/vmxnet3/vmxnet3_ethdev.c
+++ b/drivers/net/vmxnet3/vmxnet3_ethdev.c
@@ -1307,10 +1307,8 @@ vmxnet3_process_events(struct rte_eth_dev *dev)
struct vmxnet3_hw *hw = dev->data->dev_private;
uint32_t events = hw->shared->ecr;
 
-   if (!events) {
-   PMD_INIT_LOG(ERR, "No events to process");
+   if (!events)
return;
-   }
 
/*
 * ECR bits when written with 1b are cleared. Hence write
@@ -1320,9 +1318,7 @@ vmxnet3_process_events(struct rte_eth_dev *dev)
 
/* Check if link state has changed */
if (events & VMXNET3_ECR_LINK) {
-   PMD_INIT_LOG(ERR,
-"Process events in %s(): VMXNET3_ECR_LINK event",
-__func__);
+   PMD_DRV_LOG(DEBUG, "Process events: VMXNET3_ECR_LINK event");
if (vmxnet3_dev_link_update(dev, 0) == 0)
_rte_eth_dev_callback_process(dev,
  RTE_ETH_EVENT_INTR_LSC,
@@ -1335,11 +1331,11 @@ vmxnet3_process_events(struct rte_eth_dev *dev)
   VMXNET3_CMD_GET_QUEUE_STATUS);
 
if (hw->tqd_start->status.stopped)
-   PMD_INIT_LOG(ERR, "tq error 0x%x",
-hw->tqd_start->status.error);
+   PMD_DRV_LOG(ERR, "tq error 0x%x",
+   hw->tqd_start->status.error);
 
if (hw->rqd_start->status.stopped)
-   PMD_INIT_LOG(ERR, "rq error 0x%x",
+   PMD_DRV_LOG(ERR, "rq error 0x%x",
 hw->rqd_start->status.error);
 
/* Reset the device */
@@ -1347,10 +1343,10 @@ vmxnet3_process_events(struct rte_eth_dev *dev)
}
 
if (events & VMXNET3_ECR_DIC)
-   PMD_INIT_LOG(ERR, "Device implementation change event.");
+   PMD_DRV_LOG(DEBUG, "Device implementation change event.");
 
if (events & VMXNET3_ECR_DEBUG)
-   PMD_INIT_LOG(ERR, "Debug event generated by device.");
+   PMD_DRV_LOG(DEBUG, "Debug event generated by device.");
 }
 
 static void
-- 
2.1.4



[dpdk-dev] [PATCH v2 3/6] net/vmxnet3: Generate link-state change notifications

2017-06-15 Thread Charles (Chas) Williams
From: Robert Shearman 

Generate link-state change notifications by listening to interrupts
generated by the device. Make use of the existing
vmxnet3_process_events function that was compiled out, but change it
to call vmxnet3_dev_link_update on a VMXNET3_ECR_LINK event and to not
be so noisy in its log messages.

Enable interrupts on starting the device, using a new helper function,
vmxnet3_enable_intr, based on vmxnet3_disable_intr and validated
against the FreeBSD driver.

Keep track of the number of interrupts registered for to avoid
hardcoding these in vmxnet3_enable/disable_intr and to provision for
any future rxq intr support.

Factor out the guts of vmxnet3_dev_link_update minus the started check
to allow the new function to be called from vmxnet3_dev_start in the
lsc-enabled case to ensure that the link state is correctly set from
the actual state at that point.

Signed-off-by: Robert Shearman 
---
 drivers/net/vmxnet3/vmxnet3_ethdev.c | 116 +++
 drivers/net/vmxnet3/vmxnet3_ethdev.h |   2 +
 2 files changed, 91 insertions(+), 27 deletions(-)

diff --git a/drivers/net/vmxnet3/vmxnet3_ethdev.c 
b/drivers/net/vmxnet3/vmxnet3_ethdev.c
index f5a9832..73277fb 100644
--- a/drivers/net/vmxnet3/vmxnet3_ethdev.c
+++ b/drivers/net/vmxnet3/vmxnet3_ethdev.c
@@ -83,6 +83,8 @@ static void vmxnet3_dev_promiscuous_enable(struct rte_eth_dev 
*dev);
 static void vmxnet3_dev_promiscuous_disable(struct rte_eth_dev *dev);
 static void vmxnet3_dev_allmulticast_enable(struct rte_eth_dev *dev);
 static void vmxnet3_dev_allmulticast_disable(struct rte_eth_dev *dev);
+static int __vmxnet3_dev_link_update(struct rte_eth_dev *dev,
+int wait_to_complete);
 static int vmxnet3_dev_link_update(struct rte_eth_dev *dev,
   int wait_to_complete);
 static void vmxnet3_hw_stats_save(struct vmxnet3_hw *hw);
@@ -102,10 +104,8 @@ static int vmxnet3_dev_vlan_filter_set(struct rte_eth_dev 
*dev,
 static void vmxnet3_dev_vlan_offload_set(struct rte_eth_dev *dev, int mask);
 static void vmxnet3_mac_addr_set(struct rte_eth_dev *dev,
 struct ether_addr *mac_addr);
+static void vmxnet3_interrupt_handler(void *param);
 
-#if PROCESS_SYS_EVENTS == 1
-static void vmxnet3_process_events(struct vmxnet3_hw *);
-#endif
 /*
  * The set of PCI devices this driver supports
  */
@@ -250,10 +250,22 @@ vmxnet3_disable_intr(struct vmxnet3_hw *hw)
PMD_INIT_FUNC_TRACE();
 
hw->shared->devRead.intrConf.intrCtrl |= VMXNET3_IC_DISABLE_ALL;
-   for (i = 0; i < VMXNET3_MAX_INTRS; i++)
+   for (i = 0; i < hw->num_intrs; i++)
VMXNET3_WRITE_BAR0_REG(hw, VMXNET3_REG_IMR + i * 8, 1);
 }
 
+static void
+vmxnet3_enable_intr(struct vmxnet3_hw *hw)
+{
+   int i;
+
+   PMD_INIT_FUNC_TRACE();
+
+   hw->shared->devRead.intrConf.intrCtrl &= ~VMXNET3_IC_DISABLE_ALL;
+   for (i = 0; i < hw->num_intrs; i++)
+   VMXNET3_WRITE_BAR0_REG(hw, VMXNET3_REG_IMR + i * 8, 0);
+}
+
 /*
  * Gets tx data ring descriptor size.
  */
@@ -425,7 +437,7 @@ static int eth_vmxnet3_pci_remove(struct rte_pci_device 
*pci_dev)
 
 static struct rte_pci_driver rte_vmxnet3_pmd = {
.id_table = pci_id_vmxnet3_map,
-   .drv_flags = RTE_PCI_DRV_NEED_MAPPING,
+   .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
.probe = eth_vmxnet3_pci_probe,
.remove = eth_vmxnet3_pci_remove,
 };
@@ -648,11 +660,11 @@ vmxnet3_setup_driver_shared(struct rte_eth_dev *dev)
 
/*
 * Set number of interrupts to 1
-* PMD disables all the interrupts but this is MUST to activate device
-* It needs at least one interrupt for link events to handle
-* So we'll disable it later after device activation if needed
+* PMD by default disables all the interrupts but this is MUST
+* to activate device. It needs at least one interrupt for
+* link events to handle
 */
-   devRead->intrConf.numIntrs = 1;
+   hw->num_intrs = devRead->intrConf.numIntrs = 1;
devRead->intrConf.intrCtrl |= VMXNET3_IC_DISABLE_ALL;
 
for (i = 0; i < hw->num_tx_queues; i++) {
@@ -747,6 +759,20 @@ vmxnet3_dev_start(struct rte_eth_dev *dev)
if (ret != VMXNET3_SUCCESS)
return ret;
 
+   /* check if lsc interrupt feature is enabled */
+   if (dev->data->dev_conf.intr_conf.lsc) {
+   struct rte_pci_device *pci_dev = RTE_DEV_TO_PCI(dev->device);
+
+   /* Setup interrupt callback  */
+   rte_intr_callback_register(&pci_dev->intr_handle,
+  vmxnet3_interrupt_handler, dev);
+
+   if (rte_intr_enable(&pci_dev->intr_handle) < 0) {
+   PMD_INIT_LOG(ERR, "interrupt enable failed");
+   return -EIO;
+   }
+   }
+
/* Exchange shared data with device */
VMXNET3_WRITE_B

[dpdk-dev] [PATCH v2 2/6] net/vmxnet3: Implement retrieval of extended stats

2017-06-15 Thread Charles (Chas) Williams
From: Robert Shearman 

Implement xstats_get() to allow a number of driver-specific tx and rx
stats to be retrieved.

Signed-off-by: Robert Shearman 
---
 drivers/net/vmxnet3/vmxnet3_ethdev.c | 113 +++
 1 file changed, 113 insertions(+)

diff --git a/drivers/net/vmxnet3/vmxnet3_ethdev.c 
b/drivers/net/vmxnet3/vmxnet3_ethdev.c
index 1301f50..f5a9832 100644
--- a/drivers/net/vmxnet3/vmxnet3_ethdev.c
+++ b/drivers/net/vmxnet3/vmxnet3_ethdev.c
@@ -88,6 +88,11 @@ static int vmxnet3_dev_link_update(struct rte_eth_dev *dev,
 static void vmxnet3_hw_stats_save(struct vmxnet3_hw *hw);
 static void vmxnet3_dev_stats_get(struct rte_eth_dev *dev,
  struct rte_eth_stats *stats);
+static int vmxnet3_dev_xstats_get_names(struct rte_eth_dev *dev,
+   struct rte_eth_xstat_name *xstats,
+   unsigned int n);
+static int vmxnet3_dev_xstats_get(struct rte_eth_dev *dev,
+ struct rte_eth_xstat *xstats, unsigned int n);
 static void vmxnet3_dev_info_get(struct rte_eth_dev *dev,
 struct rte_eth_dev_info *dev_info);
 static const uint32_t *
@@ -122,6 +127,8 @@ static const struct eth_dev_ops vmxnet3_eth_dev_ops = {
.allmulticast_disable = vmxnet3_dev_allmulticast_disable,
.link_update  = vmxnet3_dev_link_update,
.stats_get= vmxnet3_dev_stats_get,
+   .xstats_get_names = vmxnet3_dev_xstats_get_names,
+   .xstats_get   = vmxnet3_dev_xstats_get,
.mac_addr_set = vmxnet3_mac_addr_set,
.dev_infos_get= vmxnet3_dev_info_get,
.dev_supported_ptypes_get = vmxnet3_dev_supported_ptypes_get,
@@ -133,6 +140,27 @@ static const struct eth_dev_ops vmxnet3_eth_dev_ops = {
.tx_queue_release = vmxnet3_dev_tx_queue_release,
 };
 
+struct vmxnet3_xstats_name_off {
+   char name[RTE_ETH_XSTATS_NAME_SIZE];
+   unsigned int offset;
+};
+
+/* tx_qX_ is prepended to the name string here */
+static const struct vmxnet3_xstats_name_off vmxnet3_txq_stat_strings[] = {
+   {"drop_total", offsetof(struct vmxnet3_txq_stats, drop_total)},
+   {"drop_too_many_segs", offsetof(struct vmxnet3_txq_stats, 
drop_too_many_segs)},
+   {"drop_tso",   offsetof(struct vmxnet3_txq_stats, drop_tso)},
+   {"tx_ring_full",   offsetof(struct vmxnet3_txq_stats, 
tx_ring_full)},
+};
+
+/* rx_qX_ is prepended to the name string here */
+static const struct vmxnet3_xstats_name_off vmxnet3_rxq_stat_strings[] = {
+   {"drop_total",   offsetof(struct vmxnet3_rxq_stats, 
drop_total)},
+   {"drop_err", offsetof(struct vmxnet3_rxq_stats, drop_err)},
+   {"drop_fcs", offsetof(struct vmxnet3_rxq_stats, drop_fcs)},
+   {"rx_buf_alloc_failure", offsetof(struct vmxnet3_rxq_stats, 
rx_buf_alloc_failure)},
+};
+
 static const struct rte_memzone *
 gpa_zone_reserve(struct rte_eth_dev *dev, uint32_t size,
 const char *post_string, int socket_id,
@@ -882,6 +910,91 @@ vmxnet3_hw_stats_save(struct vmxnet3_hw *hw)
vmxnet3_hw_rx_stats_get(hw, i, &hw->saved_rx_stats[i]);
 }
 
+static int
+vmxnet3_dev_xstats_get_names(struct rte_eth_dev *dev,
+struct rte_eth_xstat_name *xstats_names,
+unsigned int n)
+{
+   unsigned int i, t, count = 0;
+   unsigned int nstats =
+   dev->data->nb_tx_queues * RTE_DIM(vmxnet3_txq_stat_strings) +
+   dev->data->nb_rx_queues * RTE_DIM(vmxnet3_rxq_stat_strings);
+
+   if (!xstats_names || n < nstats)
+   return nstats;
+
+   for (i = 0; i < dev->data->nb_rx_queues; i++) {
+   if (!dev->data->rx_queues[i])
+   continue;
+
+   for (t = 0; t < RTE_DIM(vmxnet3_rxq_stat_strings); t++) {
+   snprintf(xstats_names[count].name,
+sizeof(xstats_names[count].name),
+"rx_q%u_%s", i,
+vmxnet3_rxq_stat_strings[t].name);
+   count++;
+   }
+   }
+
+   for (i = 0; i < dev->data->nb_tx_queues; i++) {
+   if (!dev->data->tx_queues[i])
+   continue;
+
+   for (t = 0; t < RTE_DIM(vmxnet3_txq_stat_strings); t++) {
+   snprintf(xstats_names[count].name,
+sizeof(xstats_names[count].name),
+"tx_q%u_%s", i,
+vmxnet3_txq_stat_strings[t].name);
+   count++;
+   }
+   }
+
+   return count;
+}
+
+static int
+vmxnet3_dev_xstats_get(struct rte_eth_dev *dev, struct rte_eth_xstat *xstats,
+  unsigned int n)
+{
+   unsigned int i, t, count = 0;
+   unsigned 

[dpdk-dev] [PATCH v2 5/6] net/vmxnet3: receive queue memory leak

2017-06-15 Thread Charles (Chas) Williams
From: Mandeep Rohilla 

This addresses an mbuf leak in an error condition during packet
receive.

Fixes: dfaff37fc46d ("vmxnet3: import new vmxnet3 poll mode driver 
implementation")

Signed-off-by: Mandeep Rohilla 
---
 drivers/net/vmxnet3/vmxnet3_rxtx.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/vmxnet3/vmxnet3_rxtx.c 
b/drivers/net/vmxnet3/vmxnet3_rxtx.c
index d8713a1..13c73f6 100644
--- a/drivers/net/vmxnet3/vmxnet3_rxtx.c
+++ b/drivers/net/vmxnet3/vmxnet3_rxtx.c
@@ -800,6 +800,12 @@ vmxnet3_recv_pkts(void *rx_queue, struct rte_mbuf 
**rx_pkts, uint16_t nb_pkts)
   (int)(rcd - (struct Vmxnet3_RxCompDesc *)
 rxq->comp_ring.base), rcd->rxdIdx);
rte_pktmbuf_free_seg(rxm);
+   if (rxq->start_seg) {
+   struct rte_mbuf *start = rxq->start_seg;
+
+   rxq->start_seg = NULL;
+   rte_pktmbuf_free(start);
+   }
goto rcd_done;
}
 
-- 
2.1.4



[dpdk-dev] [PATCH v2 6/6] net/vmxnet3: preserve configured MAC address

2017-06-15 Thread Charles (Chas) Williams
From: George Wilkie 

When starting a vmxnet3 device, it is always writing the permanent MAC
address, even if a different MAC address was configured.  Write from
the device data instead which holds the current one.

Signed-off-by: George Wilkie 
Acked-by: Shrikrishna Khare 
---
 drivers/net/vmxnet3/vmxnet3_ethdev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/vmxnet3/vmxnet3_ethdev.c 
b/drivers/net/vmxnet3/vmxnet3_ethdev.c
index e8cbc85..265de35 100644
--- a/drivers/net/vmxnet3/vmxnet3_ethdev.c
+++ b/drivers/net/vmxnet3/vmxnet3_ethdev.c
@@ -734,7 +734,7 @@ vmxnet3_setup_driver_shared(struct rte_eth_dev *dev)
vmxnet3_dev_vlan_offload_set(dev,
 ETH_VLAN_STRIP_MASK | 
ETH_VLAN_FILTER_MASK);
 
-   vmxnet3_write_mac(hw, hw->perm_addr);
+   vmxnet3_write_mac(hw, dev->data->mac_addrs->addr_bytes);
 
return VMXNET3_SUCCESS;
 }
-- 
2.1.4



Re: [dpdk-dev] [PATCH V2 0/6] some local vmxnet3 patches

2017-06-28 Thread Charles (Chas) Williams



On 06/28/2017 08:52 AM, Ferruh Yigit wrote:

On 6/28/2017 12:30 PM, Ferruh Yigit wrote:

On 6/15/2017 1:16 PM, Charles (Chas) Williams wrote:

This series addresses some local issues with the vmxnet3 driver that
others might find of interest.

Changes in v2:

- net/vmxnet3: Implement retrieval of extended stats

.id field wasn't being filled in xstats.

- net/vmxnet3: receive queue memory leak

The leak and buffer reallcation were split.  The buffer reallocation
patch may be superceded by another commit so isn't included in this
series while under discussion.



Series applied to dpdk-next-net/master, thanks.


Hi Charles,

Patches require vmxnet3.ini update, "Extended stats" and "Link status
event" ones I guess. I missed them to before getting patchset.

Would you mind sending another patch to update features document?

Thanks,
ferruh



I will get to it today.


[dpdk-dev] [PATCH] doc: add extended stats to features file

2017-06-28 Thread Charles (Chas) Williams
Update vmxnet3.ini to indicate extended stats support

Signed-off-by: Chas Williams 
---
 doc/guides/nics/features/vmxnet3.ini | 1 +
 1 file changed, 1 insertion(+)

diff --git a/doc/guides/nics/features/vmxnet3.ini 
b/doc/guides/nics/features/vmxnet3.ini
index 8f7f8e0..4e60183 100644
--- a/doc/guides/nics/features/vmxnet3.ini
+++ b/doc/guides/nics/features/vmxnet3.ini
@@ -27,3 +27,4 @@ Linux VFIO   = Y
 x86-32   = Y
 x86-64   = Y
 Usage doc= Y
+Extended stats   = Y
-- 
2.1.4



Re: [dpdk-dev] [PATCH V2 0/6] some local vmxnet3 patches

2017-06-28 Thread Charles (Chas) Williams



On 06/28/2017 08:52 AM, Ferruh Yigit wrote:

On 6/28/2017 12:30 PM, Ferruh Yigit wrote:

On 6/15/2017 1:16 PM, Charles (Chas) Williams wrote:

This series addresses some local issues with the vmxnet3 driver that
others might find of interest.

Changes in v2:

- net/vmxnet3: Implement retrieval of extended stats

.id field wasn't being filled in xstats.

- net/vmxnet3: receive queue memory leak

The leak and buffer reallcation were split.  The buffer reallocation
patch may be superceded by another commit so isn't included in this
series while under discussion.



Series applied to dpdk-next-net/master, thanks.


Hi Charles,

Patches require vmxnet3.ini update, "Extended stats" and "Link status
event" ones I guess. I missed them to before getting patchset.

Would you mind sending another patch to update features document?

Thanks,
ferruh



Curiously, the vmxnet3 already indicates that it is handling link
status event interrupts.


; Supported features of the 'vmxnet3' network poll mode driver.
;
; Refer to default.ini for the full list of available PMD features.
;
[Features]
Speed capabilities   = P
Link status  = Y
Link status event= Y


[dpdk-dev] [PATCH] net/vmxnet3: avoid code duplication

2017-06-29 Thread Charles (Chas) Williams
Refactor vmxnet3_post_rx_bufs() to call vmxnet3_renew_desc()
to update the newly allocated mbufs.  While here, relocate the
relevant comments to vmxnet3_renew_desc().

Signed-off-by: Chas Williams 
---
 drivers/net/vmxnet3/vmxnet3_rxtx.c | 61 +-
 1 file changed, 21 insertions(+), 40 deletions(-)

diff --git a/drivers/net/vmxnet3/vmxnet3_rxtx.c 
b/drivers/net/vmxnet3/vmxnet3_rxtx.c
index 13c73f6..d9cf437 100644
--- a/drivers/net/vmxnet3/vmxnet3_rxtx.c
+++ b/drivers/net/vmxnet3/vmxnet3_rxtx.c
@@ -593,24 +593,40 @@ static inline void
 vmxnet3_renew_desc(vmxnet3_rx_queue_t *rxq, uint8_t ring_id,
   struct rte_mbuf *mbuf)
 {
-   uint32_t val = 0;
+   uint32_t val;
struct vmxnet3_cmd_ring *ring = &rxq->cmd_ring[ring_id];
struct Vmxnet3_RxDesc *rxd =
(struct Vmxnet3_RxDesc *)(ring->base + ring->next2fill);
vmxnet3_buf_info_t *buf_info = &ring->buf_info[ring->next2fill];
 
-   if (ring_id == 0)
+   if (ring_id == 0) {
+   /* Usually: One HEAD type buf per packet
+* val = (ring->next2fill % rxq->hw->bufs_per_pkt) ?
+* VMXNET3_RXD_BTYPE_BODY : VMXNET3_RXD_BTYPE_HEAD;
+*/
+
+   /* We use single packet buffer so all heads here */
val = VMXNET3_RXD_BTYPE_HEAD;
-   else
+   } else {
+   /* All BODY type buffers for 2nd ring */
val = VMXNET3_RXD_BTYPE_BODY;
+   }
 
+   /*
+* Load mbuf pointer into buf_info[ring_size]
+* buf_info structure is equivalent to cookie for virtio-virtqueue
+*/
buf_info->m = mbuf;
buf_info->len = (uint16_t)(mbuf->buf_len - RTE_PKTMBUF_HEADROOM);
buf_info->bufPA = rte_mbuf_data_dma_addr_default(mbuf);
 
+   /* Load Rx Descriptor with the buffer's GPA */
rxd->addr = buf_info->bufPA;
+
+   /* After this point rxd->addr MUST not be NULL */
rxd->btype = val;
rxd->len = buf_info->len;
+   /* Flip gen bit at the end to change ownership */
rxd->gen = ring->gen;
 
vmxnet3_cmd_ring_adv_next2fill(ring);
@@ -629,28 +645,11 @@ static int
 vmxnet3_post_rx_bufs(vmxnet3_rx_queue_t *rxq, uint8_t ring_id)
 {
int err = 0;
-   uint32_t i = 0, val = 0;
+   uint32_t i = 0;
struct vmxnet3_cmd_ring *ring = &rxq->cmd_ring[ring_id];
 
-   if (ring_id == 0) {
-   /* Usually: One HEAD type buf per packet
-* val = (ring->next2fill % rxq->hw->bufs_per_pkt) ?
-* VMXNET3_RXD_BTYPE_BODY : VMXNET3_RXD_BTYPE_HEAD;
-*/
-
-   /* We use single packet buffer so all heads here */
-   val = VMXNET3_RXD_BTYPE_HEAD;
-   } else {
-   /* All BODY type buffers for 2nd ring */
-   val = VMXNET3_RXD_BTYPE_BODY;
-   }
-
while (vmxnet3_cmd_ring_desc_avail(ring) > 0) {
-   struct Vmxnet3_RxDesc *rxd;
struct rte_mbuf *mbuf;
-   vmxnet3_buf_info_t *buf_info = &ring->buf_info[ring->next2fill];
-
-   rxd = (struct Vmxnet3_RxDesc *)(ring->base + ring->next2fill);
 
/* Allocate blank mbuf for the current Rx Descriptor */
mbuf = rte_mbuf_raw_alloc(rxq->mp);
@@ -661,25 +660,7 @@ vmxnet3_post_rx_bufs(vmxnet3_rx_queue_t *rxq, uint8_t 
ring_id)
break;
}
 
-   /*
-* Load mbuf pointer into buf_info[ring_size]
-* buf_info structure is equivalent to cookie for 
virtio-virtqueue
-*/
-   buf_info->m = mbuf;
-   buf_info->len = (uint16_t)(mbuf->buf_len -
-  RTE_PKTMBUF_HEADROOM);
-   buf_info->bufPA = rte_mbuf_data_dma_addr_default(mbuf);
-
-   /* Load Rx Descriptor with the buffer's GPA */
-   rxd->addr = buf_info->bufPA;
-
-   /* After this point rxd->addr MUST not be NULL */
-   rxd->btype = val;
-   rxd->len = buf_info->len;
-   /* Flip gen bit at the end to change ownership */
-   rxd->gen = ring->gen;
-
-   vmxnet3_cmd_ring_adv_next2fill(ring);
+   vmxnet3_renew_desc(rxq, ring_id, mbuf);
i++;
}
 
-- 
2.1.4