Re: [dpdk-dev] [RFC 2/2] net/tap: add eBPF instructions

2017-12-05 Thread Pascal Mazon
Indeed, the spirit of the initial work was to publish source code,
compiled BPF bytecode, and Makefile for re-generating the latter.
I have no clue regarding licensing, I'll trust you guys on that.

On 30/11/2017 18:39, Thomas Monjalon wrote:
> 30/11/2017 18:20, Stephen Hemminger:
>> On Thu, 30 Nov 2017 18:05:22 +0100
>> Thomas Monjalon  wrote:
>>
>>> 30/11/2017 17:54, Stephen Hemminger:
 Loading BPF is a could solution to doing dynamic flow matching.
 It needs to be done differently to be accepted.

 Putting raw machine instructions in source code is as bad as binary
 blobs. You need to provide original source of program and then have
 build instructions to create any needed code.  
>>> The source program is provided in this patch: tap_bpf_program.c
>>> It is pre-compiled to avoid requiring too many dependencies when building 
>>> DPDK.
>> But the "freedom to modify" comes into play here. If a *evil* vendor builds
>> an application based on DPDK and does not provide source. Then user still 
>> deserves
>> the right to modify the eBPF program that it loads as GPL.  The best solution
>> is to make the TAP PMD loader routine load the program from a file.
>> Although I am certainly not a FSF legal scholar, putting GPL'd object code in
>> TAP PMD risks accusations of being a derived or combined work.
> Good point.
> The compiled BPF may be provided as a plugin file.
> So we would be free to not package this GPL file.



Re: [dpdk-dev] [PATCH] bus/vdev: add custom scan hook

2017-12-05 Thread Tan, Jianfeng



On 12/4/2017 5:31 PM, Thomas Monjalon wrote:


The hook is in bus->scan().
I think we should launch a bus scan when there is a new device event.


That's what I'm trying to say. We finally need to execute a handler as 
of a device event to finish the job.


Then why not just keep the vdev plug in that handler, by calling 
rte_dev_hotplug_add()?


Thanks,
Jianfeng



[dpdk-dev] [PATCH v2 0/4] Vhost: fix mq=on but VIRTIO_NET_F_MQ not negotiated

2017-12-05 Thread Maxime Coquelin
Hi,

This second revision only fixes the bit shifting issue reported by
Ladi (patch 1/4).

Having QEMU started with mq=on but guest driver not negotiating
VIRTIO_NET_F_MQ feature ends up in the vhost device to never
start. Indeed, more queues are created in the vhost backend than
configured.

Guest drivers known to not advertise the VIRTIO_NET_F_MQ feature are
iPXE and OVMF Virtio-net drivers.

Queues are created because before starting the guest, QEMU sends
VHOST_USER_SET_VRING_CALL requests for all queues declared in QEMU
command line. Also, once Virtio features negotiated, QEMU sends
VHOST_USER_SET_VRING_ENABLE requests to disable all but the first
queue pair.

This series fixes this by destroying all but first queue pair in
the backend if VIRTIO_NET_F_MQ isn't negotiated. First patches
makes sure that VHOST_USER_SET_FEATURES request doesn't change
Virtio features while the device is running, which should never
happen as per the Virtio spec. This helps to make sure vitqueues
aren't destroyed while being processed, but also protect from
other illegal features changes (e.g. VIRTIO_NET_F_MRG_RXBUF).


Changes since v1:
=
- Patch 1: shift bits in the right direction (Ladi)

Maxime Coquelin (4):
  vhost: prevent features to be changed while device is running
  vhost: propagate VHOST_USER_SET_FEATURES handling error
  vhost: extract virtqueue cleaning and freeing functions
  vhost: destroy unused virtqueues when multiqueue not negotiated

 lib/librte_vhost/vhost.c  | 22 --
 lib/librte_vhost/vhost.h  |  3 +++
 lib/librte_vhost/vhost_user.c | 40 ++--
 3 files changed, 53 insertions(+), 12 deletions(-)

-- 
2.14.3



[dpdk-dev] [PATCH v2 2/4] vhost: propagate VHOST_USER_SET_FEATURES handling error

2017-12-05 Thread Maxime Coquelin
Not propagating VHOST_USER_SET_FEATURES request handling
error may result in unpredictable behavior, as host and
guests features may no more be synchronized.

This patch fixes this by reporting the error to the upper
layer, which would result in the device being destroyed
and the connection with the master to be closed.

Signed-off-by: Maxime Coquelin 
---
 lib/librte_vhost/vhost_user.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index 2d86c0ca8..a5e1f2482 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -1263,7 +1263,9 @@ vhost_user_msg_handler(int vid, int fd)
send_vhost_reply(fd, &msg);
break;
case VHOST_USER_SET_FEATURES:
-   vhost_user_set_features(dev, msg.payload.u64);
+   ret = vhost_user_set_features(dev, msg.payload.u64);
+   if (ret)
+   return -1;
break;
 
case VHOST_USER_GET_PROTOCOL_FEATURES:
-- 
2.14.3



[dpdk-dev] [PATCH v2 1/4] vhost: prevent features to be changed while device is running

2017-12-05 Thread Maxime Coquelin
As section 2.2 of the Virtio spec states about features
negotiation:
"During device initialization, the driver reads this and tells
the device the subset that it accepts. The only way to
renegotiate is to reset the device."

This patch implements a check to prevent illegal features change
while the device is running.

One exception is the VHOST_F_LOG_ALL feature bit, which is enabled
when live-migration is initiated. But this feature is not negotiated
with the Virtio driver, but directly with the Vhost master.

Signed-off-by: Maxime Coquelin 
---
 lib/librte_vhost/vhost_user.c | 17 -
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index f4c7ce462..2d86c0ca8 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -183,7 +183,22 @@ vhost_user_set_features(struct virtio_net *dev, uint64_t 
features)
return -1;
}
 
-   if ((dev->flags & VIRTIO_DEV_RUNNING) && dev->features != features) {
+   if (dev->features == features)
+   return 0;
+
+   if (dev->flags & VIRTIO_DEV_RUNNING) {
+   /*
+* Error out if master tries to change features while device is
+* in running state. The exception being VHOST_F_LOG_ALL, which
+* is enabled when the live-migration starts.
+*/
+   if ((dev->features ^ features) & ~(1ULL << VHOST_F_LOG_ALL)) {
+   RTE_LOG(ERR, VHOST_CONFIG,
+   "(%d) features changed while device is 
running.\n",
+   dev->vid);
+   return -1;
+   }
+
if (dev->notify_ops->features_changed)
dev->notify_ops->features_changed(dev->vid, features);
}
-- 
2.14.3



[dpdk-dev] [PATCH v2 3/4] vhost: extract virtqueue cleaning and freeing functions

2017-12-05 Thread Maxime Coquelin
This patch extracts needed code for vhost_user.c to be able
to clean and free virtqueues unitary.

Signed-off-by: Maxime Coquelin 
---
 lib/librte_vhost/vhost.c | 22 --
 lib/librte_vhost/vhost.h |  3 +++
 2 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index 4f8b73a09..df528a4ea 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -103,7 +103,7 @@ get_device(int vid)
return dev;
 }
 
-static void
+void
 cleanup_vq(struct vhost_virtqueue *vq, int destroy)
 {
if ((vq->callfd >= 0) && (destroy != 0))
@@ -127,6 +127,15 @@ cleanup_device(struct virtio_net *dev, int destroy)
cleanup_vq(dev->virtqueue[i], destroy);
 }
 
+void
+free_vq(struct vhost_virtqueue *vq)
+{
+   rte_free(vq->shadow_used_ring);
+   rte_free(vq->batch_copy_elems);
+   rte_mempool_free(vq->iotlb_pool);
+   rte_free(vq);
+}
+
 /*
  * Release virtqueues and device memory.
  */
@@ -134,16 +143,9 @@ static void
 free_device(struct virtio_net *dev)
 {
uint32_t i;
-   struct vhost_virtqueue *vq;
-
-   for (i = 0; i < dev->nr_vring; i++) {
-   vq = dev->virtqueue[i];
 
-   rte_free(vq->shadow_used_ring);
-   rte_free(vq->batch_copy_elems);
-   rte_mempool_free(vq->iotlb_pool);
-   rte_free(vq);
-   }
+   for (i = 0; i < dev->nr_vring; i++)
+   free_vq(dev->virtqueue[i]);
 
rte_free(dev);
 }
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 1cc81c17c..9cad1bb3c 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -364,6 +364,9 @@ void cleanup_device(struct virtio_net *dev, int destroy);
 void reset_device(struct virtio_net *dev);
 void vhost_destroy_device(int);
 
+void cleanup_vq(struct vhost_virtqueue *vq, int destroy);
+void free_vq(struct vhost_virtqueue *vq);
+
 int alloc_vring_queue(struct virtio_net *dev, uint32_t vring_idx);
 
 void vhost_set_ifname(int, const char *if_name, unsigned int if_len);
-- 
2.14.3



[dpdk-dev] [PATCH v2 4/4] vhost: destroy unused virtqueues when multiqueue not negotiated

2017-12-05 Thread Maxime Coquelin
QEMU sends VHOST_USER_SET_VRING_CALL requests for all queues
declared in QEMU command line before the guest is started.
It has the effect in DPDK vhost-user backend to allocate vrings
for all queues declared by QEMU.

If the first driver being used does not support multiqueue,
the device never changes to VIRTIO_DEV_RUNNING state as only
the first queue pair is initialized. One driver impacted by
this bug is virtio-net's iPXE driver which does not support
VIRTIO_NET_F_MQ feature.

It is safe to destroy unused virtqueues in SET_FEATURES request
handler, as it is ensured the device is not in running state
at this stage, so virtqueues aren't being processed.

Signed-off-by: Maxime Coquelin 
---
 lib/librte_vhost/vhost_user.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index a5e1f2482..b17080215 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -173,6 +173,7 @@ vhost_user_get_features(struct virtio_net *dev)
 static int
 vhost_user_set_features(struct virtio_net *dev, uint64_t features)
 {
+   int i;
uint64_t vhost_features = 0;
 
rte_vhost_driver_get_features(dev->ifname, &vhost_features);
@@ -216,6 +217,24 @@ vhost_user_set_features(struct virtio_net *dev, uint64_t 
features)
(dev->features & (1 << VIRTIO_NET_F_MRG_RXBUF)) ? "on" : "off",
(dev->features & (1ULL << VIRTIO_F_VERSION_1)) ? "on" : "off");
 
+   if (!(dev->features & (1ULL << VIRTIO_NET_F_MQ))) {
+   /*
+* Remove all but first queue pair if MQ hasn't been
+* negotiated. This is safe because the device is not
+* running at this stage.
+*/
+   for (i = dev->nr_vring; i > 1; i--) {
+   struct vhost_virtqueue *vq = dev->virtqueue[i];
+
+   if (!vq)
+   continue;
+
+   cleanup_vq(vq, 1);
+   free_vq(vq);
+   dev->nr_vring--;
+   }
+   }
+
return 0;
 }
 
-- 
2.14.3



[dpdk-dev] DPDK Amazon AWS ENA driver problem with ping

2017-12-05 Thread Jordan Rhody
Hi,

I have been testing the DPDK ENA driver and am hoping that someone can help
in assisting an observation I have.

Using the DPDK ENA driver, I can send a ping request/reply from client A to
client B through an Edge device running DPDK.

However, the ping request that I observe always contains a '0x' for the
sequence id of the icmp request:

Internet Control Message Protocol
Type: 8 (Echo (ping) request)
Code: 0
Checksum: 0xf9a1 incorrect, should be 0xfa8c
[Checksum Status: Bad] <--- checksum is
original and unmodified
Identifier (BE): 2385 (0x0951)
Identifier (LE): 20745 (0x5109)
Sequence number (BE): 65535 (0x)<--- here
Sequence number (LE): 65535 (0x)<--- here
[No response seen]
Timestamp from icmp data: Nov 21, 2017 16:41:20.0 PST
[Timestamp from icmp data (relative): 1.042436000 seconds]
Data (48 bytes)
Data: 612d0f00101112131415161718191a1b1c1d1e1f...
[Length: 48]

I have traced the packet to the entry point of eth_ena_xmit_pkts( ) and it
does *not* have sequence id 0x.  The sequence id seems sane and valid
at that point.

Turning off DPDK and using the Linux Amazon AWS ENA driver yields no
problems at all.  ping request/reply works without a problem.  As an aside,
this same code works for the DPDK igb, ixgbe and virtio driver.

I have patched the DPDK ENA driver with all patches through DPDK 17.05 to
no avail and with the same results.

Question -- the basic: Could there be something 'wrong' with my AWS network
interface with regard to setup, perhaps something that may be obvious to
other folks more familiar with AWS?

If not, has anyone else experienced problems with DPDK AWS ENA driver?

Are there any possible patches out there that have not been commit to the
git repo that may fix this particular issue?

Thanks in advance,

Jordan Rhode


[dpdk-dev] The limitation of VHOST_MEMORY_MAX_NREGIONS in virtio-user with vhost-user backend

2017-12-05 Thread Wang, Wei 5. (NSB - CN/Hangzhou)


Hi, all

In DPDK document, it it described that Virtio in containers Cannot work when 
there are more than VHOST_MEMORY_MAX_NREGIONS(8) hugepages. In another word, do 
not use 2MB hugepage so far. Do you know the reason of this limitation?

In my envirionment, the pdpe1gb is not set in cpu flag, so hugepage can't be 
set to 1GB size. The hugepage number shall be more than 8


So is there any solution or workaround to fix this limitation? Or change dpdk 
code to fix this limitation?



Vivian
seat:  19009
phone: 13738006921





Re: [dpdk-dev] [PATCH 6/7] devtools/test-build: add MRVL NET PMD to test-build

2017-12-05 Thread Tomasz Duszynski
On Mon, Dec 04, 2017 at 04:48:51PM -0800, Ferruh Yigit wrote:
> On 12/1/2017 7:19 AM, Tomasz Duszynski wrote:
> > Add MRVL NET PMD to test build tool.
> >
> > Signed-off-by: Tomasz Duszynski 
> > ---
> >  devtools/test-build.sh | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/devtools/test-build.sh b/devtools/test-build.sh
> > index 092d3a7..78609c9 100755
> > --- a/devtools/test-build.sh
> > +++ b/devtools/test-build.sh
> > @@ -197,6 +197,7 @@ config () #   
> > sed -ri   's,(SCHED_.*=)n,\1y,' $1/.config
> > test -z "$LIBMUSDK_PATH" || \
> > sed -ri's,(PMD_MRVL_CRYPTO=)n,\1y,' $1/.config
> > +   sed -ri   's,(MRVL_PMD=)n,\1y,' $1/.config
>
> Won't MRVL_PMD also require LIBMUSDK_PATH?
> For this case check doesn't cover it and will be enabled always.
>

Good catch. Will be fixed in v3.

> > build_config_hook $1 $2 $3
> >
> > # Explicit enabler/disabler (uppercase)
> >
>

--
- Tomasz Duszyński


[dpdk-dev] [PATCH] net/i40e: exclude LLDP packet count

2017-12-05 Thread Qi Zhang
When use port stats register to calcuate the packet count,  LLDP packets
are counted in statistics which is not expected, the patch exclude this
number from total number.

Fixes: 763de290cbd1 ("net/i40e: fix packet count for PF")
Cc sta...@dpdk.org

Signed-off-by: Qi Zhang 
---
 drivers/net/i40e/i40e_ethdev.c | 58 +++---
 1 file changed, 55 insertions(+), 3 deletions(-)

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index 811cc9f..4cbb259 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -2531,6 +2531,22 @@ i40e_read_stats_registers(struct i40e_pf *pf, struct 
i40e_hw *hw)
pf->offset_loaded,
&pf->internal_stats_offset.rx_broadcast,
&pf->internal_stats.rx_broadcast);
+   /* Get total internal tx packet count */
+   i40e_stat_update_48(hw, I40E_GLV_UPTCH(hw->port),
+   I40E_GLV_UPTCL(hw->port),
+   pf->offset_loaded,
+   &pf->internal_stats_offset.tx_unicast,
+   &pf->internal_stats.tx_unicast);
+   i40e_stat_update_48(hw, I40E_GLV_MPTCH(hw->port),
+   I40E_GLV_MPTCL(hw->port),
+   pf->offset_loaded,
+   &pf->internal_stats_offset.tx_multicast,
+   &pf->internal_stats.tx_multicast);
+   i40e_stat_update_48(hw, I40E_GLV_BPTCH(hw->port),
+   I40E_GLV_BPTCL(hw->port),
+   pf->offset_loaded,
+   &pf->internal_stats_offset.tx_broadcast,
+   &pf->internal_stats.tx_broadcast);
 
/* exclude CRC size */
pf->internal_stats.rx_bytes -= (pf->internal_stats.rx_unicast +
@@ -2560,16 +2576,32 @@ i40e_read_stats_registers(struct i40e_pf *pf, struct 
i40e_hw *hw)
ns->eth.rx_bytes -= (ns->eth.rx_unicast + ns->eth.rx_multicast +
ns->eth.rx_broadcast) * ETHER_CRC_LEN;
 
-   /* Workaround: it is possible I40E_GLV_GORCH[H/L] is updated before
+   /* exlude internal rx bytes
+* Workaround: it is possible I40E_GLV_GORCH[H/L] is updated before
 * I40E_GLPRT_GORCH[H/L], so there is a small window that cause negtive
 * value.
+* same to I40E_GLV_UPRC[H/L], I40E_GLV_MPRC[H/L], I40E_GLV_BPRC[H/L].
 */
if (ns->eth.rx_bytes < pf->internal_stats.rx_bytes)
ns->eth.rx_bytes = 0;
-   /* exlude internal rx bytes */
else
ns->eth.rx_bytes -= pf->internal_stats.rx_bytes;
 
+   if (ns->eth.rx_unicast < pf->internal_stats.rx_unicast)
+   ns->eth.rx_unicast = 0;
+   else
+   ns->eth.rx_unicast -= pf->internal_stats.rx_unicast;
+
+   if (ns->eth.rx_multicast < pf->internal_stats.rx_multicast)
+   ns->eth.rx_multicast = 0;
+   else
+   ns->eth.rx_multicast -= pf->internal_stats.rx_multicast;
+
+   if (ns->eth.rx_broadcast < pf->internal_stats.rx_broadcast)
+   ns->eth.rx_broadcast = 0;
+   else
+   ns->eth.rx_broadcast -= pf->internal_stats.rx_broadcast;
+
i40e_stat_update_32(hw, I40E_GLPRT_RDPC(hw->port),
pf->offset_loaded, &os->eth.rx_discards,
&ns->eth.rx_discards);
@@ -2598,12 +2630,32 @@ i40e_read_stats_registers(struct i40e_pf *pf, struct 
i40e_hw *hw)
ns->eth.tx_bytes -= (ns->eth.tx_unicast + ns->eth.tx_multicast +
ns->eth.tx_broadcast) * ETHER_CRC_LEN;
 
-   /* exclude internal tx bytes */
+   /* exlude internal tx bytes
+* Workaround: it is possible I40E_GLV_GOTCH[H/L] is updated before
+* I40E_GLPRT_GOTCH[H/L], so there is a small window that cause negtive
+* value.
+* same to I40E_GLV_UPTC[H/L], I40E_GLV_MPTC[H/L], I40E_GLV_BPTC[H/L].
+*/
if (ns->eth.tx_bytes < pf->internal_stats.tx_bytes)
ns->eth.tx_bytes = 0;
else
ns->eth.tx_bytes -= pf->internal_stats.tx_bytes;
 
+   if (ns->eth.tx_unicast < pf->internal_stats.tx_unicast)
+   ns->eth.tx_unicast = 0;
+   else
+   ns->eth.tx_unicast -= pf->internal_stats.tx_unicast;
+
+   if (ns->eth.tx_multicast < pf->internal_stats.tx_multicast)
+   ns->eth.tx_multicast = 0;
+   else
+   ns->eth.tx_multicast -= pf->internal_stats.tx_multicast;
+
+   if (ns->eth.tx_broadcast < pf->internal_stats.tx_broadcast)
+   ns->eth.tx_broadcast = 0;
+   else
+   ns->eth.tx_broadcast -= pf->internal_stats.tx_broadcast;
+
/* GLPRT_TEPC not supported */
 
/* additional port specific stats */
-- 
2.7.4



Re: [dpdk-dev] [PATCH] bus/vdev: add custom scan hook

2017-12-05 Thread Thomas Monjalon
05/12/2017 09:27, Tan, Jianfeng:
> 
> On 12/4/2017 5:31 PM, Thomas Monjalon wrote:
> >
> > The hook is in bus->scan().
> > I think we should launch a bus scan when there is a new device event.
> 
> That's what I'm trying to say. We finally need to execute a handler as 
> of a device event to finish the job.

Please be more specific, I am not sure to understand.

> Then why not just keep the vdev plug in that handler, by calling 
> rte_dev_hotplug_add()?

Which handler?


Re: [dpdk-dev] [PATCH v2] net/virtio: fix an incorrect behavior of device stop/start

2017-12-05 Thread Fischetti, Antonio


> -Original Message-
> From: Bie, Tiwei
> Sent: Tuesday, December 5, 2017 3:12 AM
> To: Fischetti, Antonio 
> Cc: dev@dpdk.org; y...@fridaylinux.org; maxime.coque...@redhat.com;
> jfreim...@redhat.com; sta...@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2] net/virtio: fix an incorrect behavior
> of device stop/start
> 
> On Mon, Dec 04, 2017 at 07:46:07PM +0800, Fischetti, Antonio wrote:
> > > -Original Message-
> > > From: Bie, Tiwei
> > > Sent: Monday, December 4, 2017 7:20 AM
> > > To: Fischetti, Antonio 
> > > Cc: dev@dpdk.org; y...@fridaylinux.org; maxime.coque...@redhat.com;
> > > jfreim...@redhat.com; sta...@dpdk.org
> > > Subject: Re: [dpdk-dev] [PATCH v2] net/virtio: fix an incorrect
> behavior
> > > of device stop/start
> > >
> > > On Sat, Dec 02, 2017 at 12:30:33PM +0800, Tiwei Bie wrote:
> > > > Hi Antonio,
> > > >
> > > > On Sat, Dec 02, 2017 at 01:17:58AM +0800, Fischetti, Antonio
> wrote:
> > > > > Hi All,
> > > > > I've got an update on this.
> > > > > I could replicate the same issue by using testpmd + a VM (=
> Virtual
> > > Machine).
> > > > >
> > > > > The test topology I'm using is:
> > > > >
> > > > >
> > > > > [Traffic gen][NIC port #0][testpmd][vhost port #2]--
> --+
> > > > >
> |
> > > > >
> |
> > > > >
> > > [testpmd in
> > > > >
> the
> > > VM]
> > > > >
> |
> > > > >
> |
> > > > > [Traffic gen][NIC port #1][testpmd][vhost port #3]--
> --+
> > > > >
> > > > >
> > > > > So there's no OvS now in the picture: one testpmd running on the
> > > host
> > > > > and one testpmd running on the VM.
> > > > >
> > > > > The issue is that no packet goes through testpmd in the VM.
> > > > > It seems this is happening after this patch was upstreamed.
> > > > >
> > > > > Please note
> > > > > ---
> > > > > To replicate this issue both the next 2 conditions must be met:
> > > > >  - the traffic is already being sent before launching testpmd in
> the
> > > VM
> > > > >  - there are at least 2 forwarding lcores.
> > > > >
> > > >
> > > [...]
> > > >
> > > > Do you see anything I missed? Or can you reproduce the issue with
> the
> > > > setup I'm using?
> > > >
> > >
> > > Hi Antonio,
> > >
> > > Are you using vector Rx in your test? After some further
> >
> > [Antonio] Hi Tiwei, yes I suppose so.
> >
> > Below some more details on my testbench to explain why I'm
> > using this configuration.
> >
> > With this topology I can replicate as much as possible
> > my initial OvS-DPDK testbench. In the ovs case I
> > had 1 single process on the host (OvS) which spawns 2
> > dequeuing threads running on 2 different lcores (please
> > note: if they run on the same lcore there's no issue).
> >
> > So in place of Ovs I'm using 1 testpmd only on the host like:
> >
> > sudo ./x86_64-native-linuxapp-gcc/app/testpmd -l 0-3 -n 4 \
> > --socket-mem=1024,1024 \
> > --vdev 'eth_vhost0,iface=/tmp/sock0,queues=1' \
> > --vdev 'eth_vhost1,iface=/tmp/sock1,queues=1' -- -i
> >
> > so the testpmd on the host is seeing 4 phy ports, 2 are real
> > phy ports and 2 are vdev.
> >
> > Then I set the forwarding cores
> >
> >  testpmd> set corelist 2,3
> >
> > and the port order
> >  testpmd> set portlist 0,2,1,3
> >  testpmd> set fwd mac retry
> >  testpmd> start
> >
> > I then launch the VM and inside of it I launch testpmd:
> >
> > ./testpmd -c 0x3 -n 4 --socket-mem 512 -- --burst=64 -i \
> > --txqflags=0xf00 --disable-hw-vlan
> >
> > on testpmd console:
> >  testpmd> set fwd mac retry
> >  testpmd> start
> >
> > As the traffic is already running at the line-rate for
> > some seconds, for sure all the descs have been consumed.
> >
> > I'm currently using 1 GB hugepage sizes, but I was
> > seeing the same issue with OvS when I was using 2 MB
> > hugepage sizes.
> >
> 
> Hi Antonio,
> 
> Got it! Thank you for the detailed info!
> I will send out a patch to fix this issue ASAP.

[Antonio] That's great Tiwei, thanks! Once it is ready I 
can give it a try as with testpmd as with OvS on the host.


> 
> Best regards,
> Tiwei Bie


[dpdk-dev] [PATCH] virtio_net: kick guest even when virtio queue is full

2017-12-05 Thread Agota Benyhe
From: Patrik Andersson R 

Changing the vring call file descriptor during virtio device enqueue
operation may lead to "kick" on a file descriptor that is closed. As
a consequence the guest might not be notified of new packets in the
enqueue.

The suggested change will add some extra load on DPDK and on the
guest if the queue is frequently full. Any application using DPDK
should avoid attempting retransmissions when the zero packets are
enqueued.

To overcome this problem, the kick operation in virtio_dev_merge_rx()
was excluded from the pkt_idx > 0 condition. A similar change was
done in virtio_dev_rx().

Signed-off-by: Patrik Andersson R 
Signed-off-by: Agota Benyhe 
---
 lib/librte_vhost/virtio_net.c | 23 +--
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index 6fee16e..f5fc8c0 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -59,6 +59,13 @@ is_valid_virt_queue_idx(uint32_t idx, int is_tx, uint32_t 
nr_vring)
 }
 
 static __rte_always_inline void
+kick_guest_if_necessary(const struct vhost_virtqueue *vq) {
+   if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT)
+   && (vq->callfd >= 0))
+   eventfd_write(vq->callfd, (eventfd_t)1);
+}
+
+static __rte_always_inline void
 do_flush_shadow_used_ring(struct virtio_net *dev, struct vhost_virtqueue *vq,
  uint16_t to, uint16_t from, uint16_t size)
 {
@@ -344,8 +351,11 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
free_entries = avail_idx - start_idx;
count = RTE_MIN(count, free_entries);
count = RTE_MIN(count, (uint32_t)MAX_PKT_BURST);
-   if (count == 0)
+
+   if (count == 0) {
+kick_guest_if_necessary(vq);
goto out;
+}
 
LOG_DEBUG(VHOST_DATA, "(%d) start_idx %d | end_idx %d\n",
dev->vid, start_idx, start_idx + count);
@@ -411,10 +421,7 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
/* flush used->idx update before we read avail->flags. */
rte_mb();
 
-   /* Kick the guest if necessary. */
-   if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT)
-   && (vq->callfd >= 0))
-   eventfd_write(vq->callfd, (eventfd_t)1);
+kick_guest_if_necessary(vq);
 out:
if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
vhost_user_iotlb_rd_unlock(vq);
@@ -704,13 +711,9 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t 
queue_id,
 
/* flush used->idx update before we read avail->flags. */
rte_mb();
-
-   /* Kick the guest if necessary. */
-   if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT)
-   && (vq->callfd >= 0))
-   eventfd_write(vq->callfd, (eventfd_t)1);
}
 
+kick_guest_if_necessary(vq);
 out:
if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
vhost_user_iotlb_rd_unlock(vq);
-- 
2.7.4



[dpdk-dev] [PATCH v1] net/mlx5: fix flow type for allmulti rules

2017-12-05 Thread Raslan Darawsheh
Chnaged ibv_flow_attr type for allmulti rule to IBV_FLOW_ATTR_MC_DEFAULT
instead of IBV_FLOW_ATTR_NORMAL, in case allmulti was enabled.

Fixes: 272733b5 ("net/mlx5: use flow to enable unicast traffic")

Signed-off-by: Raslan Darawsheh 
---
 drivers/net/mlx5/mlx5_flow.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 1eda836..eaa53e2 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -429,6 +429,7 @@ static const struct mlx5_flow_items mlx5_flow_items[] = {
 /** Structure to pass to the conversion function. */
 struct mlx5_flow_parse {
uint32_t inner; /**< Set once VXLAN is encountered. */
+   uint8_t allmulti:1; /** Set once allmulti dst MAC is encountered. */
uint32_t create:1;
/**< Whether resources should remain after a validate. */
uint32_t drop:1; /**< Target is a drop queue. */
@@ -1165,6 +1166,17 @@ priv_flow_convert(struct priv *priv,
}
}
}
+   if (parser->allmulti &&
+   parser->layer == HASH_RXQ_ETH) {
+   for (i = 0; i != hash_rxq_init_n; ++i) {
+   if (!parser->queue[i].ibv_attr)
+   continue;
+   if (parser->queue[i].ibv_attr->num_of_specs != 1)
+   break;
+   parser->queue[i].ibv_attr->type =
+   IBV_FLOW_ATTR_MC_DEFAULT;
+   }
+   }
return ret;
 exit_enomem:
for (i = 0; i != hash_rxq_init_n; ++i) {
@@ -1262,6 +1274,7 @@ mlx5_flow_create_eth(const struct rte_flow_item *item,
eth.val.ether_type &= eth.mask.ether_type;
}
mlx5_flow_create_copy(parser, ð, eth_size);
+   parser->allmulti = eth.val.dst_mac[0] & 1;
return 0;
 }
 
-- 
2.7.4



[dpdk-dev] [PATCH v3 1/7] net/mrvl: sync compilation with musdk-17.10

2017-12-05 Thread Tomasz Duszynski
Followig changes are needed to switch to musdk-17.10:

- With a new version of the musdk library it's no longer necessary to
  explicitly define MVCONF_ARCH_DMA_ADDR_T_64BIT and
  CONF_PP2_BPOOL_COOKIE_SIZE.

  Proper defines are autogenerated by ./configure script based on
  passed options and available after mv_autogen_comp_flags.h inclusion.

- API used to set promiscuous mode was renamed. Thus in order to
  compile against the latest library new API must be used.

Signed-off-by: Tomasz Duszynski 
Acked-by: Jianbo Liu 
---
 drivers/net/mrvl/Makefile  | 4 ++--
 drivers/net/mrvl/mrvl_ethdev.c | 8 ++--
 drivers/net/mrvl/mrvl_ethdev.h | 5 +
 3 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/drivers/net/mrvl/Makefile b/drivers/net/mrvl/Makefile
index 815c3ba..f75e53c 100644
--- a/drivers/net/mrvl/Makefile
+++ b/drivers/net/mrvl/Makefile
@@ -51,8 +51,8 @@ EXPORT_MAP := rte_pmd_mrvl_version.map

 # external library dependencies
 CFLAGS += -I$(LIBMUSDK_PATH)/include
-CFLAGS += -DMVCONF_ARCH_DMA_ADDR_T_64BIT
-CFLAGS += -DCONF_PP2_BPOOL_COOKIE_SIZE=32
+CFLAGS += -DMVCONF_TYPES_PUBLIC
+CFLAGS += -DMVCONF_DMA_PHYS_ADDR_T_PUBLIC
 CFLAGS += $(WERROR_FLAGS)
 CFLAGS += -O3
 LDLIBS += -L$(LIBMUSDK_PATH)/lib
diff --git a/drivers/net/mrvl/mrvl_ethdev.c b/drivers/net/mrvl/mrvl_ethdev.c
index 2936165..a1ae2c1 100644
--- a/drivers/net/mrvl/mrvl_ethdev.c
+++ b/drivers/net/mrvl/mrvl_ethdev.c
@@ -47,10 +47,6 @@
 #undef container_of
 #endif

-#include 
-#include 
-#include 
-
 #include 
 #include 
 #include 
@@ -690,7 +686,7 @@ mrvl_promiscuous_enable(struct rte_eth_dev *dev)
struct mrvl_priv *priv = dev->data->dev_private;
int ret;

-   ret = pp2_ppio_set_uc_promisc(priv->ppio, 1);
+   ret = pp2_ppio_set_promisc(priv->ppio, 1);
if (ret)
RTE_LOG(ERR, PMD, "Failed to enable promiscuous mode\n");
 }
@@ -724,7 +720,7 @@ mrvl_promiscuous_disable(struct rte_eth_dev *dev)
struct mrvl_priv *priv = dev->data->dev_private;
int ret;

-   ret = pp2_ppio_set_uc_promisc(priv->ppio, 0);
+   ret = pp2_ppio_set_promisc(priv->ppio, 0);
if (ret)
RTE_LOG(ERR, PMD, "Failed to disable promiscuous mode\n");
 }
diff --git a/drivers/net/mrvl/mrvl_ethdev.h b/drivers/net/mrvl/mrvl_ethdev.h
index 2a4ab5a..8a647a5 100644
--- a/drivers/net/mrvl/mrvl_ethdev.h
+++ b/drivers/net/mrvl/mrvl_ethdev.h
@@ -36,7 +36,12 @@
 #define _MRVL_ETHDEV_H_

 #include 
+
+#include 
+#include 
+#include 
 #include 
+#include 
 #include 

 /** Maximum number of rx queues per port */
--
2.7.4



[dpdk-dev] [PATCH v3 0/7] Sync with MUSDK-17.10

2017-12-05 Thread Tomasz Duszynski
This patchset brings following changes:

o Sync with MUSDK-17.10. Latest version of the library comes with many
  improvements and fixes thus switching to it is beneficial.

o A few code and documentation updates.

Changes since v2:
o Fix NET PMD conditional compilation in test-build.sh

Changes since v1:
o Add extra error log in case setting link up fails.
o Cram all MUSDK related headers into the same the driver header.

Tomasz Duszynski (7):
  net/mrvl: sync compilation with musdk-17.10
  net/mrvl: query link status using library API
  net/mrvl: do not enable port after setting MAC address
  net/mrvl: check if ppio is initialized
  net/mrvl: add extra error logs
  devtools/test-build: add MRVL NET PMD to test-build
  net/mrvl: update MRVL NET PMD documentation

 devtools/test-build.sh |  2 +
 doc/guides/nics/mrvl.rst   | 69 ---
 drivers/net/mrvl/Makefile  |  4 +-
 drivers/net/mrvl/mrvl_ethdev.c | 92 +-
 drivers/net/mrvl/mrvl_ethdev.h |  5 +++
 5 files changed, 118 insertions(+), 54 deletions(-)

--
2.7.4



[dpdk-dev] [PATCH v3 2/7] net/mrvl: query link status using library API

2017-12-05 Thread Tomasz Duszynski
Up to now link status was updated unconditionally during
link_up()/link_down() calls thus one was never sure about
it's true status.

Using dedicated library api makes sure the true link status is set.

Signed-off-by: Tomasz Duszynski 
Acked-by: Jianbo Liu 
---
 drivers/net/mrvl/mrvl_ethdev.c | 16 +---
 1 file changed, 5 insertions(+), 11 deletions(-)

diff --git a/drivers/net/mrvl/mrvl_ethdev.c b/drivers/net/mrvl/mrvl_ethdev.c
index a1ae2c1..47f12b8 100644
--- a/drivers/net/mrvl/mrvl_ethdev.c
+++ b/drivers/net/mrvl/mrvl_ethdev.c
@@ -361,8 +361,6 @@ mrvl_dev_set_link_up(struct rte_eth_dev *dev)
if (ret)
pp2_ppio_disable(priv->ppio);

-   dev->data->dev_link.link_status = ETH_LINK_UP;
-
return ret;
 }

@@ -379,15 +377,8 @@ static int
 mrvl_dev_set_link_down(struct rte_eth_dev *dev)
 {
struct mrvl_priv *priv = dev->data->dev_private;
-   int ret;
-
-   ret = pp2_ppio_disable(priv->ppio);
-   if (ret)
-   return ret;
-
-   dev->data->dev_link.link_status = ETH_LINK_DOWN;

-   return ret;
+   return pp2_ppio_disable(priv->ppio);
 }

 /**
@@ -628,9 +619,10 @@ mrvl_link_update(struct rte_eth_dev *dev, int 
wait_to_complete __rte_unused)
 * TODO
 * once MUSDK provides necessary API use it here
 */
+   struct mrvl_priv *priv = dev->data->dev_private;
struct ethtool_cmd edata;
struct ifreq req;
-   int ret, fd;
+   int ret, fd, link_up;

edata.cmd = ETHTOOL_GSET;

@@ -670,6 +662,8 @@ mrvl_link_update(struct rte_eth_dev *dev, int 
wait_to_complete __rte_unused)
 ETH_LINK_HALF_DUPLEX;
dev->data->dev_link.link_autoneg = edata.autoneg ? ETH_LINK_AUTONEG :
   ETH_LINK_FIXED;
+   pp2_ppio_get_link_state(priv->ppio, &link_up);
+   dev->data->dev_link.link_status = link_up ? ETH_LINK_UP : ETH_LINK_DOWN;

return 0;
 }
--
2.7.4



[dpdk-dev] [PATCH v3 6/7] devtools/test-build: add MRVL NET PMD to test-build

2017-12-05 Thread Tomasz Duszynski
Add MRVL NET PMD to test build tool.

Signed-off-by: Tomasz Duszynski 
Acked-by: Jianbo Liu 
---
 devtools/test-build.sh | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/devtools/test-build.sh b/devtools/test-build.sh
index 092d3a7..b551764 100755
--- a/devtools/test-build.sh
+++ b/devtools/test-build.sh
@@ -197,6 +197,8 @@ config () #   
sed -ri   's,(SCHED_.*=)n,\1y,' $1/.config
test -z "$LIBMUSDK_PATH" || \
sed -ri's,(PMD_MRVL_CRYPTO=)n,\1y,' $1/.config
+   test -z "$LIBMUSDK_PATH" || \
+   sed -ri   's,(MRVL_PMD=)n,\1y,' $1/.config
build_config_hook $1 $2 $3

# Explicit enabler/disabler (uppercase)
--
2.7.4



[dpdk-dev] [PATCH v3 4/7] net/mrvl: check if ppio is initialized

2017-12-05 Thread Tomasz Duszynski
Uninitialized ppio cannot be passed to MUSDK library routines as
application will crash.

Fix this by first checking whether ppio has been initialized.

Signed-off-by: Tomasz Duszynski 
Acked-by: Jianbo Liu 
---
 drivers/net/mrvl/mrvl_ethdev.c | 42 ++
 1 file changed, 42 insertions(+)

diff --git a/drivers/net/mrvl/mrvl_ethdev.c b/drivers/net/mrvl/mrvl_ethdev.c
index c44a2bc..40f2ead 100644
--- a/drivers/net/mrvl/mrvl_ethdev.c
+++ b/drivers/net/mrvl/mrvl_ethdev.c
@@ -324,6 +324,9 @@ mrvl_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
if (mtu < ETHER_MIN_MTU || mru > MRVL_PKT_SIZE_MAX)
return -EINVAL;

+   if (!priv->ppio)
+   return -EPERM;
+
ret = pp2_ppio_set_mru(priv->ppio, mru);
if (ret)
return ret;
@@ -346,6 +349,9 @@ mrvl_dev_set_link_up(struct rte_eth_dev *dev)
struct mrvl_priv *priv = dev->data->dev_private;
int ret;

+   if (!priv->ppio)
+   return -EPERM;
+
ret = pp2_ppio_enable(priv->ppio);
if (ret)
return ret;
@@ -378,6 +384,9 @@ mrvl_dev_set_link_down(struct rte_eth_dev *dev)
 {
struct mrvl_priv *priv = dev->data->dev_private;

+   if (!priv->ppio)
+   return -EPERM;
+
return pp2_ppio_disable(priv->ppio);
 }

@@ -624,6 +633,9 @@ mrvl_link_update(struct rte_eth_dev *dev, int 
wait_to_complete __rte_unused)
struct ifreq req;
int ret, fd, link_up;

+   if (!priv->ppio)
+   return -EPERM;
+
edata.cmd = ETHTOOL_GSET;

strcpy(req.ifr_name, dev->data->name);
@@ -680,6 +692,9 @@ mrvl_promiscuous_enable(struct rte_eth_dev *dev)
struct mrvl_priv *priv = dev->data->dev_private;
int ret;

+   if (!priv->ppio)
+   return;
+
ret = pp2_ppio_set_promisc(priv->ppio, 1);
if (ret)
RTE_LOG(ERR, PMD, "Failed to enable promiscuous mode\n");
@@ -697,6 +712,9 @@ mrvl_allmulticast_enable(struct rte_eth_dev *dev)
struct mrvl_priv *priv = dev->data->dev_private;
int ret;

+   if (!priv->ppio)
+   return;
+
ret = pp2_ppio_set_mc_promisc(priv->ppio, 1);
if (ret)
RTE_LOG(ERR, PMD, "Failed enable all-multicast mode\n");
@@ -714,6 +732,9 @@ mrvl_promiscuous_disable(struct rte_eth_dev *dev)
struct mrvl_priv *priv = dev->data->dev_private;
int ret;

+   if (!priv->ppio)
+   return;
+
ret = pp2_ppio_set_promisc(priv->ppio, 0);
if (ret)
RTE_LOG(ERR, PMD, "Failed to disable promiscuous mode\n");
@@ -731,6 +752,9 @@ mrvl_allmulticast_disable(struct rte_eth_dev *dev)
struct mrvl_priv *priv = dev->data->dev_private;
int ret;

+   if (!priv->ppio)
+   return;
+
ret = pp2_ppio_set_mc_promisc(priv->ppio, 0);
if (ret)
RTE_LOG(ERR, PMD, "Failed to disable all-multicast mode\n");
@@ -751,6 +775,9 @@ mrvl_mac_addr_remove(struct rte_eth_dev *dev, uint32_t 
index)
char buf[ETHER_ADDR_FMT_SIZE];
int ret;

+   if (!priv->ppio)
+   return;
+
ret = pp2_ppio_remove_mac_addr(priv->ppio,
   dev->data->mac_addrs[index].addr_bytes);
if (ret) {
@@ -787,6 +814,9 @@ mrvl_mac_addr_add(struct rte_eth_dev *dev, struct 
ether_addr *mac_addr,
/* For setting index 0, mrvl_mac_addr_set() should be used.*/
return -1;

+   if (!priv->ppio)
+   return -EPERM;
+
/*
 * Maximum number of uc addresses can be tuned via kernel module mvpp2x
 * parameter uc_filter_max. Maximum number of mc addresses is then
@@ -824,6 +854,9 @@ mrvl_mac_addr_set(struct rte_eth_dev *dev, struct 
ether_addr *mac_addr)
struct mrvl_priv *priv = dev->data->dev_private;
int ret;

+   if (!priv->ppio)
+   return;
+
ret = pp2_ppio_set_mac_addr(priv->ppio, mac_addr->addr_bytes);
if (ret) {
char buf[ETHER_ADDR_FMT_SIZE];
@@ -851,6 +884,9 @@ mrvl_stats_get(struct rte_eth_dev *dev, struct 
rte_eth_stats *stats)
uint64_t drop_mac = 0;
unsigned int i, idx, ret;

+   if (!priv->ppio)
+   return -EPERM;
+
for (i = 0; i < dev->data->nb_rx_queues; i++) {
struct mrvl_rxq *rxq = dev->data->rx_queues[i];
struct pp2_ppio_inq_statistics rx_stats;
@@ -943,6 +979,9 @@ mrvl_stats_reset(struct rte_eth_dev *dev)
struct mrvl_priv *priv = dev->data->dev_private;
int i;

+   if (!priv->ppio)
+   return;
+
for (i = 0; i < dev->data->nb_rx_queues; i++) {
struct mrvl_rxq *rxq = dev->data->rx_queues[i];

@@ -1099,6 +1138,9 @@ mrvl_vlan_filter_set(struct rte_eth_dev *dev, uint16_t 
vlan_id, int on)
 {
struct mrvl_priv *priv = dev->data->dev_private;

+   if (!priv->pp

[dpdk-dev] [PATCH v3 5/7] net/mrvl: add extra error logs

2017-12-05 Thread Tomasz Duszynski
Add extra error logs in a few places.

Signed-off-by: Tomasz Duszynski 
Acked-by: Jianbo Liu 
---
 drivers/net/mrvl/mrvl_ethdev.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/net/mrvl/mrvl_ethdev.c b/drivers/net/mrvl/mrvl_ethdev.c
index 40f2ead..127ce44 100644
--- a/drivers/net/mrvl/mrvl_ethdev.c
+++ b/drivers/net/mrvl/mrvl_ethdev.c
@@ -426,8 +426,10 @@ mrvl_dev_start(struct rte_eth_dev *dev)
priv->bpool_min_size = priv->nb_rx_queues * MRVL_BURST_SIZE * 2;

ret = pp2_ppio_init(&priv->ppio_params, &priv->ppio);
-   if (ret)
+   if (ret) {
+   RTE_LOG(ERR, PMD, "Failed to init ppio\n");
return ret;
+   }

/*
 * In case there are some some stale uc/mc mac addresses flush them
@@ -462,17 +464,20 @@ mrvl_dev_start(struct rte_eth_dev *dev)
if (mrvl_qos_cfg) {
ret = mrvl_start_qos_mapping(priv);
if (ret) {
-   pp2_ppio_deinit(priv->ppio);
-   return ret;
+   RTE_LOG(ERR, PMD, "Failed to setup QoS mapping\n");
+   goto out;
}
}

ret = mrvl_dev_set_link_up(dev);
-   if (ret)
+   if (ret) {
+   RTE_LOG(ERR, PMD, "Failed to set link up\n");
goto out;
+   }

return 0;
 out:
+   RTE_LOG(ERR, PMD, "Failed to start device\n");
pp2_ppio_deinit(priv->ppio);
return ret;
 }
--
2.7.4



[dpdk-dev] [PATCH v3 3/7] net/mrvl: do not enable port after setting MAC address

2017-12-05 Thread Tomasz Duszynski
Setting enabled port's mac address caused it to stop receiving
packets. Now as that issue is fixed in library renabling port
is no longer necessary.

Signed-off-by: Tomasz Duszynski 
Acked-by: Jianbo Liu 
---
 drivers/net/mrvl/mrvl_ethdev.c | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/drivers/net/mrvl/mrvl_ethdev.c b/drivers/net/mrvl/mrvl_ethdev.c
index 47f12b8..c44a2bc 100644
--- a/drivers/net/mrvl/mrvl_ethdev.c
+++ b/drivers/net/mrvl/mrvl_ethdev.c
@@ -822,15 +822,14 @@ static void
 mrvl_mac_addr_set(struct rte_eth_dev *dev, struct ether_addr *mac_addr)
 {
struct mrvl_priv *priv = dev->data->dev_private;
+   int ret;

-   pp2_ppio_set_mac_addr(priv->ppio, mac_addr->addr_bytes);
-   /*
-* TODO
-* Port stops sending packets if pp2_ppio_set_mac_addr()
-* was called after pp2_ppio_enable(). As a quick fix issue
-* enable port once again.
-*/
-   pp2_ppio_enable(priv->ppio);
+   ret = pp2_ppio_set_mac_addr(priv->ppio, mac_addr->addr_bytes);
+   if (ret) {
+   char buf[ETHER_ADDR_FMT_SIZE];
+   ether_format_addr(buf, sizeof(buf), mac_addr);
+   RTE_LOG(ERR, PMD, "Failed to set mac to %s\n", buf);
+   }
 }

 /**
--
2.7.4



[dpdk-dev] [PATCH v3 7/7] net/mrvl: update MRVL NET PMD documentation

2017-12-05 Thread Tomasz Duszynski
Update MRVL NET PMD documentation.

Signed-off-by: Tomasz Duszynski 
Acked-by: Jianbo Liu 
---
 doc/guides/nics/mrvl.rst | 69 +++-
 1 file changed, 45 insertions(+), 24 deletions(-)

diff --git a/doc/guides/nics/mrvl.rst b/doc/guides/nics/mrvl.rst
index fbfdf47..67b254c 100644
--- a/doc/guides/nics/mrvl.rst
+++ b/doc/guides/nics/mrvl.rst
@@ -87,31 +87,40 @@ Limitations
 Prerequisites
 -

-- Custom Linux Kernel sources available
-  `here 
`__.
+- Custom Linux Kernel sources

-- Out of tree `mvpp2x_sysfs` kernel module sources available
-  `here 
`__.
+  .. code-block:: console

-- MUSDK (Marvell User-Space SDK) sources available
-  `here 
`__.
+ git clone https://github.com/MarvellEmbeddedProcessors/linux-marvell.git 
-b linux-4.4.52-armada-17.10

-MUSDK is a light-weight library that provides direct access to Marvell's
-PPv2 (Packet Processor v2). Alternatively prebuilt MUSDK library can be
-requested from `Marvell Extranet `_. Once
-approval has been granted, library can be found by typing ``musdk`` in
-the search box.
+- Out of tree `mvpp2x_sysfs` kernel module sources

-MUSDK must be configured with the following features:
+  .. code-block:: console

-.. code-block:: console
+ git clone https://github.com/MarvellEmbeddedProcessors/mvpp2x-marvell.git 
-b mvpp2x-armada-17.10

-   --enable-bpool-dma=64
+- MUSDK (Marvell User-Space SDK) sources
+
+  .. code-block:: console
+
+ git clone https://github.com/MarvellEmbeddedProcessors/musdk-marvell.git 
-b musdk-armada-17.10
+
+  MUSDK is a light-weight library that provides direct access to Marvell's
+  PPv2 (Packet Processor v2). Alternatively prebuilt MUSDK library can be
+  requested from `Marvell Extranet `_. Once
+  approval has been granted, library can be found by typing ``musdk`` in
+  the search box.
+
+  MUSDK must be configured with the following features:
+
+  .. code-block:: console
+
+ --enable-bpool-dma=64

 - DPDK environment

-Follow the DPDK :ref:`Getting Started Guide for Linux ` to setup
-DPDK environment.
+  Follow the DPDK :ref:`Getting Started Guide for Linux ` to setup
+  DPDK environment.


 Config File Options
@@ -123,11 +132,6 @@ The following options can be modified in the ``config`` 
file.

 Toggle compilation of the librte_pmd_mrvl driver.

-- ``CONFIG_RTE_MRVL_MUSDK_DMA_MEMSIZE`` (default ``41943040``)
-
-Size in bytes of the contiguous memory region that MUSDK will allocate
-for run-time DMA-able data buffers.
-

 QoS Configuration
 -
@@ -142,7 +146,7 @@ Configuration syntax

[port  default]
default_tc = 
-   qos_mode = 
+   mapping_priority = 

[port  tc ]
rxq = 
@@ -160,7 +164,7 @@ Where:

 - : Default traffic class (e.g. 0)

-- : QoS priority for mapping (`ip`, `vlan`, `ip/vlan` or 
`vlan/ip`).
+- : QoS priority for mapping (`ip`, `vlan`, `ip/vlan` or 
`vlan/ip`).

 - : Traffic Class to be configured.

@@ -218,9 +222,26 @@ Building DPDK
 Driver needs precompiled MUSDK library during compilation. Please consult
 ``doc/musdk_get_started.txt`` for the detailed build instructions.

+.. code-block:: console
+
+   export CROSS_COMPILE=/bin/aarch64-linux-gnu-
+   ./bootstrap
+   ./configure --enable-bpool-dma=64
+   make install
+
+MUSDK will be installed to `usr/local` under current directory.
+For the detailed build instructions please consult 
``doc/musdk_get_started.txt``.
+
 Before the DPDK build process the environmental variable ``LIBMUSDK_PATH`` with
 the path to the MUSDK installation directory needs to be exported.

+.. code-block:: console
+
+   export LIBMUSDK_PATH=/usr/local
+   export CROSS=aarch64-linux-gnu-
+   make config T=arm64-armv8a-linuxapp-gcc
+   sed -ri 's,(MRVL_PMD=)n,\1y,' build/.config
+   make

 Usage Example
 -
@@ -242,7 +263,7 @@ Additionally interfaces used by DPDK application need to be 
put up:
 .. code-block:: console

ip link set eth0 up
-   ip link set eth1 up
+   ip link set eth2 up

 In order to run testpmd example application following command can be used:

--
2.7.4



Re: [dpdk-dev] [PATCH v1] net/mlx5: fix flow type for allmulti rules

2017-12-05 Thread Nelio Laranjeiro
On Tue, Dec 05, 2017 at 11:37:50AM +0200, Raslan Darawsheh wrote:
> Chnaged ibv_flow_attr type for allmulti rule to IBV_FLOW_ATTR_MC_DEFAULT
> instead of IBV_FLOW_ATTR_NORMAL, in case allmulti was enabled.
> 
> Fixes: 272733b5 ("net/mlx5: use flow to enable unicast traffic")
> 
> Signed-off-by: Raslan Darawsheh 
> ---
>  drivers/net/mlx5/mlx5_flow.c | 13 +
>  1 file changed, 13 insertions(+)
> 
> diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
> index 1eda836..eaa53e2 100644
> --- a/drivers/net/mlx5/mlx5_flow.c
> +++ b/drivers/net/mlx5/mlx5_flow.c
> @@ -429,6 +429,7 @@ static const struct mlx5_flow_items mlx5_flow_items[] = {
>  /** Structure to pass to the conversion function. */
>  struct mlx5_flow_parse {
>   uint32_t inner; /**< Set once VXLAN is encountered. */
> + uint8_t allmulti:1; /** Set once allmulti dst MAC is encountered. */
>   uint32_t create:1;
>   /**< Whether resources should remain after a validate. */
>   uint32_t drop:1; /**< Target is a drop queue. */
> @@ -1165,6 +1166,17 @@ priv_flow_convert(struct priv *priv,
>   }
>   }
>   }
> + if (parser->allmulti &&
> + parser->layer == HASH_RXQ_ETH) {
> + for (i = 0; i != hash_rxq_init_n; ++i) {
> + if (!parser->queue[i].ibv_attr)
> + continue;
> + if (parser->queue[i].ibv_attr->num_of_specs != 1)
> + break;
> + parser->queue[i].ibv_attr->type =
> + IBV_FLOW_ATTR_MC_DEFAULT;
> + }
> + }
>   return ret;
>  exit_enomem:
>   for (i = 0; i != hash_rxq_init_n; ++i) {
> @@ -1262,6 +1274,7 @@ mlx5_flow_create_eth(const struct rte_flow_item *item,
>   eth.val.ether_type &= eth.mask.ether_type;
>   }
>   mlx5_flow_create_copy(parser, ð, eth_size);
> + parser->allmulti = eth.val.dst_mac[0] & 1;
>   return 0;
>  }
>  
> -- 
> 2.7.4

It also concerns DPDK 17.11, adding sta...@dpdk.org

Acked-by: Nelio Laranjeiro 

-- 
Nélio Laranjeiro
6WIND


Re: [dpdk-dev] [PATCH 2/5] ethdev: add port ownership

2017-12-05 Thread Bruce Richardson
On Tue, Dec 05, 2017 at 06:08:35AM +, Matan Azrad wrote:
> Hi Neil
> 
> > -Original Message- From: Neil Horman
> > [mailto:nhor...@tuxdriver.com] Sent: Tuesday, December 5, 2017 12:31
> > AM To: Matan Azrad  Cc: Ananyev, Konstantin
> > ; Gaëtan Rivet
> > ; Thomas Monjalon ; Wu,
> > Jingjing ; dev@dpdk.org Subject: Re:
> > [dpdk-dev] [PATCH 2/5] ethdev: add port ownership
> > 
> > On Mon, Dec 04, 2017 at 06:10:56PM +, Matan Azrad wrote:
> > > Hi Neil
> > >
> > > > -Original Message- From: Neil Horman
> > > > [mailto:nhor...@tuxdriver.com] Sent: Monday, December 4, 2017
> > > > 6:01 PM To: Matan Azrad  Cc: Ananyev,
> > > > Konstantin ; Gaëtan Rivet
> > > > ; Thomas Monjalon
> > ; Wu,
> > > > Jingjing ; dev@dpdk.org Subject: Re:
> > > > [dpdk-dev] [PATCH 2/5] ethdev: add port ownership
> > > >
> > > > On Sun, Dec 03, 2017 at 01:46:49PM +, Matan Azrad wrote:
> > > > > Hi Konstantine
> > > > >
> > > > > > -Original Message- From: Ananyev, Konstantin
> > > > > > [mailto:konstantin.anan...@intel.com] Sent: Sunday, December
> > > > > > 3, 2017 1:10 PM To: Matan Azrad ; Neil
> > > > > > Horman ; Gaëtan Rivet
> > 
> > > > > > Cc: Thomas Monjalon ; Wu, Jingjing
> > > > > > ; dev@dpdk.org Subject: RE:
> > > > > > [dpdk-dev] [PATCH 2/5] ethdev: add port ownership
> > > > > >
> > > > > >
> > > > > >
> > > > > > Hi Matan,
> > > > > >
> > > > > > > -Original Message- From: dev
> > > > > > > [mailto:dev-boun...@dpdk.org] On Behalf Of Matan Azrad
> > > > > > > Sent: Sunday, December 3, 2017 8:05 AM To: Neil Horman
> > > > > > > ; Gaëtan Rivet
> > > > > > >  Cc: Thomas Monjalon
> > > > > > > ; Wu, Jingjing
> > > > > > > ; dev@dpdk.org Subject: Re:
> > > > > > > [dpdk-dev] [PATCH 2/5] ethdev: add port ownership
> > > > > > >
> > > > > > > Hi
> > > > > > >
> > > > > > > > -Original Message- From: Neil Horman
> > > > > > > > [mailto:nhor...@tuxdriver.com] Sent: Friday, December 1,
> > > > > > > > 2017 2:10 PM To: Gaëtan Rivet 
> > > > > > > > Cc: Matan Azrad ; Thomas Monjalon
> > > > > > > > ; Jingjing Wu
> > > > > > > > ; dev@dpdk.org Subject: Re:
> > > > > > > > [dpdk-dev] [PATCH 2/5] ethdev: add port ownership
> > > > > > > >
> > > > > > > > On Thu, Nov 30, 2017 at 02:24:43PM +0100, Gaëtan Rivet
> > > > > > > > wrote:
> > > > > > > > > Hello Matan, Neil,
> > > > > > > > >
> > > > > > > > > I like the port ownership concept. I think it is
> > > > > > > > > needed to clarify some operations and should be useful
> > > > > > > > > to several
> > subsystems.
> > > > > > > > >
> > > > > > > > > This patch could certainly be sub-divided however, and
> > > > > > > > > your current 1/5 should probably come after this one.
> > > > > > > > >
> > > > > > > > > Some comments inline.
> > > > > > > > >
> > > > > > > > > On Thu, Nov 30, 2017 at 07:36:11AM -0500, Neil Horman
> > > > > > > > > wrote:
> > > > > > > > > > On Tue, Nov 28, 2017 at 11:57:58AM +, Matan
> > > > > > > > > > Azrad
> > wrote:
> > > > > > > > > > > The ownership of a port is implicit in DPDK.
> > > > > > > > > > > Making it explicit is better from the next
> > > > > > > > > > > reasons: 1. It may be convenient for multi-process
> > > > > > > > > > > applications to know
> > > > > > which
> > > > > > > > > > >process is in charge of a port.  2. A library
> > > > > > > > > > >could work on top of a port.  3. A port can
> > > > > > > > > > >work on top of another port.
> > > > > > > > > > >
> > > > > > > > > > > Also in the fail-safe case, an issue has been met
> > > > > > > > > > > in testpmd.  We need to check that the user is not
> > > > > > > > > > > trying to use a port which is already managed by
> > > > > > > > > > > fail-safe.
> > > > > > > > > > >
> > > > > > > > > > > Add ownership mechanism to DPDK Ethernet devices
> > > > > > > > > > > to avoid multiple management of a device by
> > > > > > > > > > > different DPDK
> > entities.
> > > > > > > > > > >
> > > > > > > > > > > A port owner is built from owner id(number) and
> > > > > > > > > > > owner name(string) while the owner id must be
> > > > > > > > > > > unique to distinguish between two identical entity
> > > > > > > > > > > instances and the owner name can be
> > > > > > any name.
> > > > > > > > > > > The name helps to logically recognize the owner by
> > > > > > > > > > > different DPDK entities and allows easy debug.
> > > > > > > > > > > Each DPDK entity can allocate an owner unique
> > > > > > > > > > > identifier and can use it and its preferred name
> > > > > > > > > > > to owns valid ethdev
> > > > ports.
> > > > > > > > > > > Each DPDK entity can get any port owner status to
> > > > > > > > > > > decide if it can manage the port or not.
> > > > > > > > > > >
> > > > > > > > > > > The current ethdev internal port management is not
> > > > > > > > > > > affected by this feature.
> > > > > > > > > > >
> > > > > > > > >
> > > > > > > > > The internal port management is not affected, but the
> > > > > > > > > external interface is, how

Re: [dpdk-dev] [PATCH] relicense various bits of the dpdk

2017-12-05 Thread Hemant Agrawal

On 12/4/2017 9:25 PM, Neil Horman wrote:

Received a note the other day from the Linux Foundation governance board
for DPDK indicating that several files I have copyright on need to be
relicensed to be compliant with the DPDK licensing guidelines.  I have
some concerns with some parts of the request, but am not opposed to
other parts.  So, for those pieces that we are in consensus on, I'm
proposing that we change their license from BSD 2 clause to 3 clause.
I'm also updating the files to use the SPDX licensing scheme

Signed-off-by: Neil Horman 
CC: Hemant Agrawal 
CC: Thomas Monjalon 
---
 devtools/validate-abi.sh   | 32 
 lib/librte_compat/rte_compat.h | 24 +---
 2 files changed, 5 insertions(+), 51 deletions(-)


Acked-by: Hemant Agrawal 


Re: [dpdk-dev] [PATCH 1/3] lib/librte_flow_classify: remove table id parameter from apis

2017-12-05 Thread Iremonger, Bernard
Hi Jasvinder,

> -Original Message-
> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Iremonger, Bernard
> Sent: Monday, December 4, 2017 4:45 PM
> To: Singh, Jasvinder ; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 1/3] lib/librte_flow_classify: remove table id
> parameter from apis
> 
> Hi Jasvinder,
> 
> > -Original Message-
> > From: Singh, Jasvinder
> > Sent: Thursday, November 23, 2017 11:32 AM
> > To: dev@dpdk.org
> > Cc: Iremonger, Bernard 
> > Subject: [PATCH 1/3] lib/librte_flow_classify: remove table id
> > parameter from apis
> >
> > This patch removes table id parameter from all the flow classify apis
> > to reduce the complexity and and does some cleanup of the code.
> >
> > The validate api has been exposed as public api to allows user to
> > validate the flow before adding it to the classifier.
> >
> > The sample app and unit tests have been updated to accomodate the apis
> > changes.
> >
> > Signed-off-by: Jasvinder Singh 
> > ---
> >  examples/flow_classify/flow_classify.c |  27 +-
> >  lib/librte_flow_classify/rte_flow_classify.c   | 320 
> > +++--
> >  lib/librte_flow_classify/rte_flow_classify.h   |  74 +++--
> >  lib/librte_flow_classify/rte_flow_classify_parse.c | 118 
> > lib/librte_flow_classify/rte_flow_classify_parse.h |  16 +-
> >  .../rte_flow_classify_version.map  |   1 +
> >  test/test/test_flow_classify.c |  86 +++---
> >  test/test/test_flow_classify.h |  10 +-
> >  8 files changed, 365 insertions(+), 287 deletions(-)
> >
> > diff --git a/examples/flow_classify/flow_classify.c
> > b/examples/flow_classify/flow_classify.c
> > index 766f1dd..37e6904 100644
> > --- a/examples/flow_classify/flow_classify.c
> > +++ b/examples/flow_classify/flow_classify.c
> > @@ -94,7 +94,6 @@ static const struct rte_eth_conf port_conf_default =
> > {
> >
> >  struct flow_classifier {
> > struct rte_flow_classifier *cls;
> > -   uint32_t table_id[RTE_FLOW_CLASSIFY_TABLE_MAX];
> >  };
> >
> >  struct flow_classifier_acl {
> > @@ -195,7 +194,15 @@ static struct rte_flow_item  end_item = {
> > RTE_FLOW_ITEM_TYPE_END,
> >  /* sample actions:
> >   * "actions count / end"
> >   */
> > -static struct rte_flow_action count_action = {
> > RTE_FLOW_ACTION_TYPE_COUNT, 0};
> > +struct rte_flow_query_count count = {
> > +   .reset = 1,
> > +   .hits_set = 1,
> > +   .bytes_set = 1,
> > +   .hits = 0,
> > +   .bytes = 0,
> > +};
> > +static struct rte_flow_action count_action = {
> > RTE_FLOW_ACTION_TYPE_COUNT,
> > +   &count};
> >  static struct rte_flow_action end_action = {
> > RTE_FLOW_ACTION_TYPE_END, 0};  static struct rte_flow_action
> > actions[2];
> >
> > @@ -274,7 +281,7 @@ lcore_main(struct flow_classifier *cls_app)
> > int i = 0;
> >
> > ret = rte_flow_classify_table_entry_delete(cls_app->cls,
> > -   cls_app->table_id[0], rules[7]);
> > +   rules[7]);
> > if (ret)
> > printf("table_entry_delete failed [7] %d\n\n", ret);
> > else
> > @@ -317,7 +324,6 @@ lcore_main(struct flow_classifier *cls_app)
> > if (rules[i]) {
> > ret = rte_flow_classifier_query(
> > cls_app->cls,
> > -   cls_app->table_id[0],
> > bufs, nb_rx, rules[i],
> > &classify_stats);
> > if (ret)
> > @@ -635,8 +641,8 @@ add_classify_rule(struct rte_eth_ntuple_filter
> > *ntuple_filter,
> > actions[1] = end_action;
> >
> > rule = rte_flow_classify_table_entry_add(
> > -   cls_app->cls, cls_app->table_id[0], &key_found,
> > -   &attr, pattern_ipv4_5tuple, actions, &error);
> > +   cls_app->cls, &attr, pattern_ipv4_5tuple,
> > +   actions, &key_found, &error);
> > if (rule == NULL) {
> > printf("table entry add failed ipv4_proto = %u\n",
> > ipv4_proto);
> > @@ -809,7 +815,6 @@ main(int argc, char *argv[])
> >
> > cls_params.name = "flow_classifier";
> > cls_params.socket_id = socket_id;
> > -   cls_params.type = RTE_FLOW_CLASSIFY_TABLE_TYPE_ACL;
> >
> > cls_app->cls = rte_flow_classifier_create(&cls_params);
> > if (cls_app->cls == NULL) {
> > @@ -824,11 +829,11 @@ main(int argc, char *argv[])
> > memcpy(table_acl_params.field_format, ipv4_defs, sizeof(ipv4_defs));
> >
> > /* initialise table create params */
> > -   cls_table_params.ops = &rte_table_acl_ops,
> > -   cls_table_params.arg_create = &table_acl_params,
> > +   cls_table_params.ops = &rte_table_acl_ops;
> > +   cls_table_params.arg_create = &table_acl_params;
> > +   cls_table_params.type =
> > RTE_FLOW_CLASSIFY_TABLE_ACL_IP4_5TUPLE;
> >
> > -   ret = rte_flow_classify_table_create(cls_app->cls

Re: [dpdk-dev] [PATCH 2/3] lib/librte_flow_classy: add run api for flow classification

2017-12-05 Thread Iremonger, Bernard
Hi Jasvinder,

> -Original Message-
> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Iremonger, Bernard
> Sent: Monday, December 4, 2017 4:46 PM
> To: Singh, Jasvinder ; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 2/3] lib/librte_flow_classy: add run api for 
> flow
> classification
> 
> Hi Jasvinder,
> 
> > -Original Message-
> > From: Singh, Jasvinder
> > Sent: Thursday, November 23, 2017 11:32 AM
> > To: dev@dpdk.org
> > Cc: Iremonger, Bernard 
> > Subject: [PATCH 2/3] lib/librte_flow_classy: add run api for flow
> > classification
> >
> > This patch extends the flow classification library by adding run api.
> > This function classifies the packets based on the flow rules stored in
> > the classifier table.
> > During lookup operation, the table entry is identified on lookup hit
> > and based on meta-data stored at table entry, actions are performed on
> > the current packet.
> > The meta-information about the actions stored in the table entry is
> > determined from the actions fields specified in flow rules.
> >
> > Signed-off-by: Jasvinder Singh 
> > ---
> >  lib/librte_flow_classify/rte_flow_classify.c   | 56
> > ++
> >  lib/librte_flow_classify/rte_flow_classify.h   | 24 ++
> >  .../rte_flow_classify_version.map  |  1 +
> >  3 files changed, 81 insertions(+)
> >
> > diff --git a/lib/librte_flow_classify/rte_flow_classify.c
> > b/lib/librte_flow_classify/rte_flow_classify.c
> > index ff1bc86..5433bfe 100644
> > --- a/lib/librte_flow_classify/rte_flow_classify.c
> > +++ b/lib/librte_flow_classify/rte_flow_classify.c
> > @@ -37,6 +37,9 @@
> >  #include 
> >  #include 
> >
> > +#define RTE_PKT_METADATA_PTR(pkt, offset) \
> > +   (&((uint32_t *)(pkt))[offset])
> > +
> >  int librte_flow_classify_logtype;
> >
> >  static uint32_t unique_id = 1;
> > @@ -674,6 +677,59 @@ action_apply(struct rte_flow_classifier *cls,  }
> >
> >  int
> > +rte_flow_classifier_run(struct rte_flow_classifier *cls,
> > +   struct rte_mbuf **pkts,
> > +   const uint16_t nb_pkts,
> > +   uint32_t pkt_offset)
> > +{
> > +   struct rte_flow_action_mark *mark;
> > +   struct classify_action *action;
> > +   uint64_t pkts_mask = RTE_LEN2MASK(nb_pkts, uint64_t);
> > +   uint64_t action_mask;
> > +   uint32_t *ptr, i, j;
> > +   int ret = -EINVAL;
> > +
> > +   if (!cls || !pkts  || nb_pkts == 0)
> > +   return ret;
> > +
> > +   for (i = 0; i < cls->num_tables; i++) {
> > +   if (cls->table_mask & (1LU << i)) {
> > +   struct rte_cls_table *table = &cls->tables[i];
> > +   uint64_t lookup_hit_mask;
> > +
> > +   ret = table->ops.f_lookup(table->h_table,
> > +   pkts, pkts_mask, &lookup_hit_mask,
> > +   (void **)cls->entries);
> > +   if (ret)
> > +   return ret;
> > +
> > +   if (lookup_hit_mask) {
> > +   for (j = 0; j < nb_pkts; j++) {
> > +   uint64_t pkt_mask = 1LLU << j;
> > +
> > +   if ((lookup_hit_mask & pkt_mask) ==
> > 0)
> > +   continue;
> > +   /* Meta-data */
> > +   enum rte_flow_action_type act_type
> > =
> > +
> > RTE_FLOW_ACTION_TYPE_MARK;
> > +   action = &cls->entries[j]->action;
> > +   action_mask = action->action_mask;
> > +
> > +   if (action_mask & (1LLU << act_type))
> > {
> > +   mark = &action->act.mark;
> > +   ptr =
> > RTE_PKT_METADATA_PTR(
> > +   pkts[j], pkt_offset);
> > +   *ptr = mark->id;
> > +   }
> > +   }
> > +   }
> > +   }
> > +   }
> > +
> > +   return 0;
> > +}
> > +
> > +int
> >  rte_flow_classifier_query(struct rte_flow_classifier *cls,
> > struct rte_mbuf **pkts,
> > const uint16_t nb_pkts,
> > diff --git a/lib/librte_flow_classify/rte_flow_classify.h
> > b/lib/librte_flow_classify/rte_flow_classify.h
> > index b9b669f..b74bd11 100644
> > --- a/lib/librte_flow_classify/rte_flow_classify.h
> > +++ b/lib/librte_flow_classify/rte_flow_classify.h
> > @@ -273,6 +273,30 @@ rte_flow_classify_table_entry_delete(struct
> > rte_flow_classifier *cls,
> > struct rte_flow_classify_rule *rule);
> >
> >  /**
> > + * Flow classifier run.
> > + *
> > + * As a result of lookup operation, flow classifer idenfies the
> > + * table entries that are hit and executes the actions on the packets.
> > + *
> > + * @param[in] cls
> > + *   Flow classifier handle
> > + * @param[in] pkts
> > + *   P

Re: [dpdk-dev] [PATCH] [RFC] ether: standardize getting the port by name

2017-12-05 Thread Adrien Mazarguil
Hi Yuanhan,

On Mon, Dec 04, 2017 at 09:55:31PM +0800, Yuanhan Liu wrote:
> On Fri, Dec 01, 2017 at 10:47:50AM +0100, Gaëtan Rivet wrote:
> > On Thu, Nov 30, 2017 at 10:44:58PM +0100, Thomas Monjalon wrote:
> > > 30/11/2017 22:21, Stephen Hemminger:
> > > > On Thu, 30 Nov 2017 18:35:11 +0100
> > > > Thomas Monjalon  wrote:
> > > > 
> > > > > 30/11/2017 18:15, Stephen Hemminger:
> > > > > > Some thoughts.
> > > > > > 1) Not all devices are PCI; look at recent VMBUS  
> > > > > 
> > > > > Yes, we need a syntax which works for every devices.
> > > > > I suggest to use the prefix "pci:" before the PCI id.
> > > > > We need also a prefix and ids for NXP buses.
> > > > > We could use "vmbus:" before VMBUS ids.
> > > > > How VMBUS ids look like?
> > > > > 
> > 
> > rte_devargs are easily accessible, user-readable. Only thing missing
> > would be requiring a 1-1 mapping between an rte_devargs and a port, thus
> > requiring PMDs to have at least one version of a device string that
> > would probe a single port (as is done with port= in mlx4).
> > 
> > Implementing an rte_devargs to rte_device in rte_bus is simple enough,
> > and this would allow implementing an rte_devargs to port_id in rte_eth.
> > 
> > What am I missing?
> 
> rte_devargs is identified by the name (pci id for pci device). It also
> includes other driver specific key-value options. It's not clear for the
> user to know which one (or few) of them should be used together with the
> PCI id to identify a specific port. For example, as you mentioned, in
> mlx4, it's "pci_id,port=x". It could be something else in other drivers.

Just for information, this "port=x" argument in mlx4 is consistent with the
value found in /sys/class/net/ethX/dev_port under Linux. If we choose to use
a port index (instead of a MAC or something else), it would make sense to
standardize it on the same order as given by the host OS for consistency
across all PMDs.

Devices with a single port per PCI address would simply use/allow "0".

> Actually, this patch also proposes a devarg like naming style: "name[,mac]".
> What it does try to do is to define a standard syntax, so that the user
> doesn't have to know any driver specific options.
> 
> However, the mac address is changeable, leaving this naming inconsistent.
> Well, in practice, PCI id is also changeable.
> 
> OTOH, having a consistent naming seems a bit far away from this patch's
> goal: define a standard ethdev naming and leave less harassment to the users.

I'm not a fan of the MAC naming scheme either, a kind of per-device physical
port index seems more robust and doesn't require much initialization to
determine how many ports are supported by the device and whether the index is
known/valid (particularly given the vast majority exposes only one per bus
address).

Would it make sense to have this name usable unmodified as a valid device
(-w) argument, including parameters?

If so, PMDs could append parameters while probing the underlying device, by
appending ",port=x", ",mac=x" followed by other unspecified parameters with
default values. This could uniquely identify the port _and_ its
configuration in a reusable fashion.

Otherwise if all we need is unique names, we can use the opposite and much
simpler approach. Let librte_ether assign them sequentially
(e.g. "rte_eth%d", no need for consistency with OS netdevices), applications
can figure the rest based on data structures if needed.

Thoughts?

> > > > > > 2) The name may have to be set before MAC address is determined on 
> > > > > > boot.  
> > > > > 
> > > > > I don't understand this comment.
> > > > > Do you mean MAC may be unknown when starting DPDK?
> > > > 
> > > > The MAC be known by the hardware, but the device would have to be
> > > > created before using  hardware to read it.
> > > 
> > > Indeed, it is a problem if we want to use this syntax for blacklist.
> > > 
> > > 
> > > > > > 3) The names themselves are not persistent or human friendly. This 
> > > > > > is hard
> > > > > >see the effort udev goes to.  
> > > > > 
> > > > > Yes udev has a syntax to identify devices. It can be inspiring.
> > > > > Qemu may also be inspiring:
> > > > >   
> > > > > https://github.com/qemu/qemu/blob/master/docs/qdev-device-use.txt

-- 
Adrien Mazarguil
6WIND


Re: [dpdk-dev] [PATCH 2/5] ethdev: add port ownership

2017-12-05 Thread Ananyev, Konstantin
Hi Matan,

> -Original Message-
> From: Matan Azrad [mailto:ma...@mellanox.com]
> Sent: Sunday, December 3, 2017 1:47 PM
> To: Ananyev, Konstantin ; Neil Horman 
> ; Gaëtan Rivet
> 
> Cc: Thomas Monjalon ; Wu, Jingjing 
> ; dev@dpdk.org
> Subject: RE: [dpdk-dev] [PATCH 2/5] ethdev: add port ownership
> 
> Hi Konstantine
> 
> > -Original Message-
> > From: Ananyev, Konstantin [mailto:konstantin.anan...@intel.com]
> > Sent: Sunday, December 3, 2017 1:10 PM
> > To: Matan Azrad ; Neil Horman
> > ; Gaëtan Rivet 
> > Cc: Thomas Monjalon ; Wu, Jingjing
> > ; dev@dpdk.org
> > Subject: RE: [dpdk-dev] [PATCH 2/5] ethdev: add port ownership
> >
> >
> >
> > Hi Matan,
> >
> > > -Original Message-
> > > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Matan Azrad
> > > Sent: Sunday, December 3, 2017 8:05 AM
> > > To: Neil Horman ; Gaëtan Rivet
> > > 
> > > Cc: Thomas Monjalon ; Wu, Jingjing
> > > ; dev@dpdk.org
> > > Subject: Re: [dpdk-dev] [PATCH 2/5] ethdev: add port ownership
> > >
> > > Hi
> > >
> > > > -Original Message-
> > > > From: Neil Horman [mailto:nhor...@tuxdriver.com]
> > > > Sent: Friday, December 1, 2017 2:10 PM
> > > > To: Gaëtan Rivet 
> > > > Cc: Matan Azrad ; Thomas Monjalon
> > > > ; Jingjing Wu ;
> > > > dev@dpdk.org
> > > > Subject: Re: [dpdk-dev] [PATCH 2/5] ethdev: add port ownership
> > > >
> > > > On Thu, Nov 30, 2017 at 02:24:43PM +0100, Gaëtan Rivet wrote:
> > > > > Hello Matan, Neil,
> > > > >
> > > > > I like the port ownership concept. I think it is needed to clarify
> > > > > some operations and should be useful to several subsystems.
> > > > >
> > > > > This patch could certainly be sub-divided however, and your
> > > > > current
> > > > > 1/5 should probably come after this one.
> > > > >
> > > > > Some comments inline.
> > > > >
> > > > > On Thu, Nov 30, 2017 at 07:36:11AM -0500, Neil Horman wrote:
> > > > > > On Tue, Nov 28, 2017 at 11:57:58AM +, Matan Azrad wrote:
> > > > > > > The ownership of a port is implicit in DPDK.
> > > > > > > Making it explicit is better from the next reasons:
> > > > > > > 1. It may be convenient for multi-process applications to know
> > which
> > > > > > >process is in charge of a port.
> > > > > > > 2. A library could work on top of a port.
> > > > > > > 3. A port can work on top of another port.
> > > > > > >
> > > > > > > Also in the fail-safe case, an issue has been met in testpmd.
> > > > > > > We need to check that the user is not trying to use a port
> > > > > > > which is already managed by fail-safe.
> > > > > > >
> > > > > > > Add ownership mechanism to DPDK Ethernet devices to avoid
> > > > > > > multiple management of a device by different DPDK entities.
> > > > > > >
> > > > > > > A port owner is built from owner id(number) and owner
> > > > > > > name(string) while the owner id must be unique to distinguish
> > > > > > > between two identical entity instances and the owner name can be
> > any name.
> > > > > > > The name helps to logically recognize the owner by different
> > > > > > > DPDK entities and allows easy debug.
> > > > > > > Each DPDK entity can allocate an owner unique identifier and
> > > > > > > can use it and its preferred name to owns valid ethdev ports.
> > > > > > > Each DPDK entity can get any port owner status to decide if it
> > > > > > > can manage the port or not.
> > > > > > >
> > > > > > > The current ethdev internal port management is not affected by
> > > > > > > this feature.
> > > > > > >
> > > > >
> > > > > The internal port management is not affected, but the external
> > > > > interface is, however. In order to respect port ownership,
> > > > > applications are forced to modify their port iterator, as shown by
> > > > > your
> > > > testpmd patch.
> > > > >
> > > > > I think it would be better to modify the current
> > > > > RTE_ETH_FOREACH_DEV to call RTE_FOREACH_DEV_OWNED_BY, and
> > > > > introduce a default owner that would represent the application
> > > > > itself (probably with the ID 0 and an owner string ""). Only with
> > > > > specific additional configuration should this default subset of 
> > > > > ethdev be
> > divided.
> > > > >
> > > > > This would make this evolution seamless for applications, at no
> > > > > cost to the complexity of the design.
> > > > >
> > > > > > > Signed-off-by: Matan Azrad 
> > > > > >
> > > > > >
> > > > > > This seems fairly racy.  What if one thread attempts to set
> > > > > > ownership on a port, while another is checking it on another cpu
> > > > > > in parallel.  It doesn't seem like it will protect against that at 
> > > > > > all.
> > > > > > It also doesn't protect against the possibility of multiple
> > > > > > threads attempting to poll for rx in parallel, which I think was
> > > > > > part of Thomas's origional statement regarding port ownership
> > > > > > (he noted that the lockless design implied only a single thread
> > > > > > should be allowed to poll
> > > > for receive or make configuration changes at a t

[dpdk-dev] [PATCH v2] virtio_net: kick guest even when virtio queue is full

2017-12-05 Thread Agota Benyhe
From: Patrik Andersson R 

Changing the vring call file descriptor during virtio device enqueue
operation may lead to "kick" on a file descriptor that is closed. As
a consequence the guest might not be notified of new packets in the
enqueue.

The suggested change will add some extra load on DPDK and on the
guest if the queue is frequently full. Any application using DPDK
should avoid attempting retransmissions when the zero packets are
enqueued.

To overcome this problem, the kick operation in virtio_dev_merge_rx()
was excluded from the pkt_idx > 0 condition. A similar change was
done in virtio_dev_rx().

Signed-off-by: Patrik Andersson R 
Signed-off-by: Agota Benyhe 
---
 lib/librte_vhost/virtio_net.c | 23 +--
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index 6fee16e..dcc1879 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -59,6 +59,13 @@ is_valid_virt_queue_idx(uint32_t idx, int is_tx, uint32_t 
nr_vring)
 }
 
 static __rte_always_inline void
+kick_guest_if_necessary(const struct vhost_virtqueue *vq) {
+   if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT)
+   && (vq->callfd >= 0))
+   eventfd_write(vq->callfd, (eventfd_t)1);
+}
+
+static __rte_always_inline void
 do_flush_shadow_used_ring(struct virtio_net *dev, struct vhost_virtqueue *vq,
  uint16_t to, uint16_t from, uint16_t size)
 {
@@ -344,8 +351,11 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
free_entries = avail_idx - start_idx;
count = RTE_MIN(count, free_entries);
count = RTE_MIN(count, (uint32_t)MAX_PKT_BURST);
-   if (count == 0)
+
+   if (count == 0) {
+   kick_guest_if_necessary(vq);
goto out;
+   }
 
LOG_DEBUG(VHOST_DATA, "(%d) start_idx %d | end_idx %d\n",
dev->vid, start_idx, start_idx + count);
@@ -411,10 +421,7 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
/* flush used->idx update before we read avail->flags. */
rte_mb();
 
-   /* Kick the guest if necessary. */
-   if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT)
-   && (vq->callfd >= 0))
-   eventfd_write(vq->callfd, (eventfd_t)1);
+   kick_guest_if_necessary(vq);
 out:
if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
vhost_user_iotlb_rd_unlock(vq);
@@ -704,13 +711,9 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t 
queue_id,
 
/* flush used->idx update before we read avail->flags. */
rte_mb();
-
-   /* Kick the guest if necessary. */
-   if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT)
-   && (vq->callfd >= 0))
-   eventfd_write(vq->callfd, (eventfd_t)1);
}
 
+   kick_guest_if_necessary(vq);
 out:
if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM))
vhost_user_iotlb_rd_unlock(vq);
-- 
2.7.4



Re: [dpdk-dev] [PATCH 2/5] ethdev: add port ownership

2017-12-05 Thread Ananyev, Konstantin


> -Original Message-
> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Ananyev, Konstantin
> Sent: Tuesday, December 5, 2017 11:12 AM
> To: Matan Azrad ; Neil Horman ; 
> Gaëtan Rivet 
> Cc: Thomas Monjalon ; Wu, Jingjing 
> ; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 2/5] ethdev: add port ownership
> 
> Hi Matan,
> 
> > -Original Message-
> > From: Matan Azrad [mailto:ma...@mellanox.com]
> > Sent: Sunday, December 3, 2017 1:47 PM
> > To: Ananyev, Konstantin ; Neil Horman 
> > ; Gaëtan Rivet
> > 
> > Cc: Thomas Monjalon ; Wu, Jingjing 
> > ; dev@dpdk.org
> > Subject: RE: [dpdk-dev] [PATCH 2/5] ethdev: add port ownership
> >
> > Hi Konstantine
> >
> > > -Original Message-
> > > From: Ananyev, Konstantin [mailto:konstantin.anan...@intel.com]
> > > Sent: Sunday, December 3, 2017 1:10 PM
> > > To: Matan Azrad ; Neil Horman
> > > ; Gaëtan Rivet 
> > > Cc: Thomas Monjalon ; Wu, Jingjing
> > > ; dev@dpdk.org
> > > Subject: RE: [dpdk-dev] [PATCH 2/5] ethdev: add port ownership
> > >
> > >
> > >
> > > Hi Matan,
> > >
> > > > -Original Message-
> > > > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Matan Azrad
> > > > Sent: Sunday, December 3, 2017 8:05 AM
> > > > To: Neil Horman ; Gaëtan Rivet
> > > > 
> > > > Cc: Thomas Monjalon ; Wu, Jingjing
> > > > ; dev@dpdk.org
> > > > Subject: Re: [dpdk-dev] [PATCH 2/5] ethdev: add port ownership
> > > >
> > > > Hi
> > > >
> > > > > -Original Message-
> > > > > From: Neil Horman [mailto:nhor...@tuxdriver.com]
> > > > > Sent: Friday, December 1, 2017 2:10 PM
> > > > > To: Gaëtan Rivet 
> > > > > Cc: Matan Azrad ; Thomas Monjalon
> > > > > ; Jingjing Wu ;
> > > > > dev@dpdk.org
> > > > > Subject: Re: [dpdk-dev] [PATCH 2/5] ethdev: add port ownership
> > > > >
> > > > > On Thu, Nov 30, 2017 at 02:24:43PM +0100, Gaëtan Rivet wrote:
> > > > > > Hello Matan, Neil,
> > > > > >
> > > > > > I like the port ownership concept. I think it is needed to clarify
> > > > > > some operations and should be useful to several subsystems.
> > > > > >
> > > > > > This patch could certainly be sub-divided however, and your
> > > > > > current
> > > > > > 1/5 should probably come after this one.
> > > > > >
> > > > > > Some comments inline.
> > > > > >
> > > > > > On Thu, Nov 30, 2017 at 07:36:11AM -0500, Neil Horman wrote:
> > > > > > > On Tue, Nov 28, 2017 at 11:57:58AM +, Matan Azrad wrote:
> > > > > > > > The ownership of a port is implicit in DPDK.
> > > > > > > > Making it explicit is better from the next reasons:
> > > > > > > > 1. It may be convenient for multi-process applications to know
> > > which
> > > > > > > >process is in charge of a port.
> > > > > > > > 2. A library could work on top of a port.
> > > > > > > > 3. A port can work on top of another port.
> > > > > > > >
> > > > > > > > Also in the fail-safe case, an issue has been met in testpmd.
> > > > > > > > We need to check that the user is not trying to use a port
> > > > > > > > which is already managed by fail-safe.
> > > > > > > >
> > > > > > > > Add ownership mechanism to DPDK Ethernet devices to avoid
> > > > > > > > multiple management of a device by different DPDK entities.
> > > > > > > >
> > > > > > > > A port owner is built from owner id(number) and owner
> > > > > > > > name(string) while the owner id must be unique to distinguish
> > > > > > > > between two identical entity instances and the owner name can be
> > > any name.
> > > > > > > > The name helps to logically recognize the owner by different
> > > > > > > > DPDK entities and allows easy debug.
> > > > > > > > Each DPDK entity can allocate an owner unique identifier and
> > > > > > > > can use it and its preferred name to owns valid ethdev ports.
> > > > > > > > Each DPDK entity can get any port owner status to decide if it
> > > > > > > > can manage the port or not.
> > > > > > > >
> > > > > > > > The current ethdev internal port management is not affected by
> > > > > > > > this feature.
> > > > > > > >
> > > > > >
> > > > > > The internal port management is not affected, but the external
> > > > > > interface is, however. In order to respect port ownership,
> > > > > > applications are forced to modify their port iterator, as shown by
> > > > > > your
> > > > > testpmd patch.
> > > > > >
> > > > > > I think it would be better to modify the current
> > > > > > RTE_ETH_FOREACH_DEV to call RTE_FOREACH_DEV_OWNED_BY, and
> > > > > > introduce a default owner that would represent the application
> > > > > > itself (probably with the ID 0 and an owner string ""). Only with
> > > > > > specific additional configuration should this default subset of 
> > > > > > ethdev be
> > > divided.
> > > > > >
> > > > > > This would make this evolution seamless for applications, at no
> > > > > > cost to the complexity of the design.
> > > > > >
> > > > > > > > Signed-off-by: Matan Azrad 
> > > > > > >
> > > > > > >
> > > > > > > This seems fairly racy.  What if one thread attempts to set
> > > > > > > ownershi

Re: [dpdk-dev] [PATCH 2/5] ethdev: add port ownership

2017-12-05 Thread Thomas Monjalon
Hi,

I will give my view on locking and synchronization in a different email.
Let's discuss about the API here.

05/12/2017 12:12, Ananyev, Konstantin:
> From: Matan Azrad [mailto:ma...@mellanox.com]
> > From: Ananyev, Konstantin [mailto:konstantin.anan...@intel.com]

> > > If the goal is just to have an ability to recognize is that device is 
> > > managed by
> > > another device (failsafe, bonding, etc.),  then I think all we need is a 
> > > pointer
> > > to rte_eth_dev_data of the owner (NULL would mean no owner).
> > 
> > I think string is better than a pointer from the next reasons:
> > 1. It is more human friendly than pointers for debug and printing.
> 
> We can have a function that would take an owner pointer and produce nice
> pretty formatted text explanation: "owned by fail-safe device at port X" or 
> so.

I don't think it is possible or convenient to have such function.
Keep in mind that the owner can be an application thread.
If you prefer using a single function pointer (may help for
atomic implementation), we can allocate an owner structure containing
a name as a string to identify the owner in human readable format.
Then we just have to set the pointer of this struct to rte_eth_dev_data.


> What I meant - this api to set/get ownership should be sort of internal to 
> ethdev layer.
> Let say it would be used for failsafe/bonding (any other compound) device 
> that needs
> to own/manage several low-level devices.
> So in normal situation user wouldn't need to use that API directly at all.

Again, the application may use this API to declare its ownership.
And anwyway, it may be interesting from an application point of view
to be able to list every devices and their internal owners.


Re: [dpdk-dev] [PATCH 2/5] ethdev: add port ownership

2017-12-05 Thread Thomas Monjalon
05/12/2017 12:44, Ananyev, Konstantin:
> Just forgot to mention - I don' think it is good idea to disallow secondary 
> process to set  theowner.

I think we all agree on that.
My initial suggestion was to use the ownership in secondary processes.
I think Matan forbid it as a first step because there is no
multi-process synchronization currently.

> Let say  in secondary process I have few tap/ring/pcap devices.
> Why it shouldn't be allowed to unite them under bonding device and make that 
> device to own them?
> That's why I think get/set owner better to be atomic.
> If the owner is just a pointer - in that case get operation will be atomic by 
> nature,
> set could be implemented just by CAS.

It would be perfect.
Can we be sure that the atomic will work perfectly on shared memory?
On every architectures?


Re: [dpdk-dev] [PATCH] maintainers: claim responsibility for virtio PMD

2017-12-05 Thread Yuanhan Liu
On Fri, Dec 01, 2017 at 08:06:04PM +0800, Yuanhan Liu wrote:
> On Fri, Dec 01, 2017 at 07:04:35PM +0800, Tiwei Bie wrote:
> > Add myself as co-maintainer for virtio driver.
> > 
> > Signed-off-by: Tiwei Bie 
> > ---
> > Hi Yuanhan and Maxime,
> > 
> > As there will be more and more virtio related stuffs (e.g. vhost-pci,
> > virtio crypto, vhost crypto) in DPDK, more maintaining efforts will be
> > needed for virtio/vhost. So I volunteer to co-maintain virtio driver.
> > Hope this could help!
> 
> I'm glad that you are offering help! And
> 
> Acked-by: Yuanhan Liu 

It was a mistake; I switched to an older machine, with older configs.
Sorry for that. And here is the right one:

Acked-by: Yuanhan Liu 

--yliu


Re: [dpdk-dev] [PATCH] [RFC] ether: standardize getting the port by name

2017-12-05 Thread Thomas Monjalon
05/12/2017 12:04, Adrien Mazarguil:
> Hi Yuanhan,
> 
> On Mon, Dec 04, 2017 at 09:55:31PM +0800, Yuanhan Liu wrote:
> > rte_devargs is identified by the name (pci id for pci device). It also
> > includes other driver specific key-value options. It's not clear for the
> > user to know which one (or few) of them should be used together with the
> > PCI id to identify a specific port. For example, as you mentioned, in
> > mlx4, it's "pci_id,port=x". It could be something else in other drivers.
> 
> Just for information, this "port=x" argument in mlx4 is consistent with the
> value found in /sys/class/net/ethX/dev_port under Linux. If we choose to use
> a port index (instead of a MAC or something else), it would make sense to
> standardize it on the same order as given by the host OS for consistency
> across all PMDs.

Good idea, thanks.

I think we will MAC in some cases and port number in others.
It is important to have identifiers available even without initializing
the device.

> Devices with a single port per PCI address would simply use/allow "0".
> 
> > Actually, this patch also proposes a devarg like naming style: "name[,mac]".
> > What it does try to do is to define a standard syntax, so that the user
> > doesn't have to know any driver specific options.
> > 
> > However, the mac address is changeable, leaving this naming inconsistent.
> > Well, in practice, PCI id is also changeable.
> > 
> > OTOH, having a consistent naming seems a bit far away from this patch's
> > goal: define a standard ethdev naming and leave less harassment to the 
> > users.
> 
> I'm not a fan of the MAC naming scheme either, a kind of per-device physical
> port index seems more robust and doesn't require much initialization to
> determine how many ports are supported by the device and whether the index is
> known/valid (particularly given the vast majority exposes only one per bus
> address).
> 
> Would it make sense to have this name usable unmodified as a valid device
> (-w) argument, including parameters?

Yes we must provide some new key/value arguments for devargs.
So it can be used anywhere, including -w/-b options in DPDK
or port configuration in OVS.

> If so, PMDs could append parameters while probing the underlying device, by
> appending ",port=x", ",mac=x" followed by other unspecified parameters with
> default values. This could uniquely identify the port _and_ its
> configuration in a reusable fashion.

Question: should we separate device identification and configuration
in the syntax?

> Otherwise if all we need is unique names, we can use the opposite and much
> simpler approach. Let librte_ether assign them sequentially
> (e.g. "rte_eth%d", no need for consistency with OS netdevices), applications
> can figure the rest based on data structures if needed.

No, unique names are not useful / not usable by users.



Re: [dpdk-dev] [PATCH] bus/vdev: add custom scan hook

2017-12-05 Thread Tan, Jianfeng


On 12/5/2017 4:41 PM, Thomas Monjalon wrote:


05/12/2017 09:27, Tan, Jianfeng:

>

> On 12/4/2017 5:31 PM, Thomas Monjalon wrote:

> >

> > The hook is in bus->scan().

> > I think we should launch a bus scan when there is a new device event.

>

> That's what I'm trying to say. We finally need to execute a handler as

> of a device event to finish the job.

Please be more specific, I am not sure to understand.



By the handler, I mean when we monitor the udev by select/poll/epoll and 
device uevents come, the application will execute a handler (or just a 
function) for each of such uevent. Then why not adding the vdev there?


Thanks,
Jianfeng


Re: [dpdk-dev] [PATCH] [RFC] ether: standardize getting the port by name

2017-12-05 Thread Yuanhan Liu
On Tue, Dec 05, 2017 at 02:20:05PM +0100, Thomas Monjalon wrote:
> 05/12/2017 12:04, Adrien Mazarguil:
> 
> > Hi Yuanhan,
> 
> >
> 
> > On Mon, Dec 04, 2017 at 09:55:31PM +0800, Yuanhan Liu wrote:
> > > rte_devargs is identified by the name (pci id for pci device). It also
> > > includes other driver specific key-value options. It's not clear for the
> > > user to know which one (or few) of them should be used together with the
> > > PCI id to identify a specific port. For example, as you mentioned, in
> > > mlx4, it's "pci_id,port=x". It could be something else in other drivers.
> 
> > Just for information, this "port=x" argument in mlx4 is consistent with the
> > value found in /sys/class/net/ethX/dev_port under Linux. If we choose to use
> > a port index (instead of a MAC or something else), it would make sense to
> > standardize it on the same order as given by the host OS for consistency
> > across all PMDs.

Thanks for the info.

But FYI, for most of other PMDs, such sys file won't exist, as the host
driver should have been unbind and bind with something like uio. So I
don't think it applies to all other nics.

> 
> Good idea, thanks.
> 
>  
> 
> I think we will MAC in some cases and port number in others.

Could you list some examples?

> It is important to have identifiers available even without initializing
> the device.

I don't object that. I think that's good for whitelisting/blacklisting
before the device initialization. However, once the initialization is
done, everything could be used to identify a port (say, the mac this patch
mentioned).

> > Devices with a single port per PCI address would simply use/allow "0".

Yes.

> >
> 
> > > Actually, this patch also proposes a devarg like naming style: "name[,mac]
> ".
> > > What it does try to do is to define a standard syntax, so that the user
> > > doesn't have to know any driver specific options.
> > >
> > > However, the mac address is changeable, leaving this naming inconsistent.
> > > Well, in practice, PCI id is also changeable.
> 
> > >
> 
> > > OTOH, having a consistent naming seems a bit far away from this patch's
> > > goal: define a standard ethdev naming and leave less harassment to the
> users.
> 
> >
> 
> > I'm not a fan of the MAC naming scheme either, a kind of per-device physical
> > port index seems more robust and doesn't require much initialization to
> > determine how many ports are supported by the device and whether the index 
> > is
> > known/valid (particularly given the vast majority exposes only one per bus
> > address).

I'm not a fan of the MAC naming, neither. The reason this patch proposes mac
naming is that it's more clear for the user to specify a port. I also agree
that the port index could be another good option.

One thing I really haven't considered yet is how it becomes when the VF
reprenstor comes to play? (more guys are cc'ed).

> > Would it make sense to have this name usable unmodified as a valid device
> > (-w) argument, including parameters?
> 
> Yes we must provide some new key/value arguments for devargs.
> So it can be used anywhere, including -w/-b options in DPDK
> or port configuration in OVS.
>  
> 
> > If so, PMDs could append parameters while probing the underlying device, by
> > appending ",port=x", ",mac=x" followed by other unspecified parameters with
> > default values. This could uniquely identify the port _and_ its
> > configuration in a reusable fashion.
> 
>  
> 
> Question: should we separate device identification and configuration
> in the syntax?

Yes, we should. I think we should parse the identification generically,
and leave the left to the driver.

> > Otherwise if all we need is unique names, we can use the opposite and much
> > simpler approach. Let librte_ether assign them sequentially
> > (e.g. "rte_eth%d", no need for consistency with OS netdevices), applications
> > can figure the rest based on data structures if needed.
> 
> No, unique names are not useful / not usable by users.

Agree. We don't really need another unique name, since the port id is
already unique.

However, I think a consistent naming might be useful: user doesn't have
to worry it even though MAC is changed or the PCI slot is changed.

--yliu


Re: [dpdk-dev] [PATCH 00/11] net/vhostpci: A new vhostpci PMD supporting VM2VM scenario

2017-12-05 Thread Yuanhan Liu
On Tue, Dec 05, 2017 at 06:59:26AM +, Yang, Zhiyong wrote:
> Hi all,
> 
> > -Original Message-
> > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Zhiyong Yang
> > Sent: Thursday, November 30, 2017 5:47 PM
> > To: dev@dpdk.org; y...@fridaylinux.org; maxime.coque...@redhat.com
> > Cc: Wang, Wei W ; Tan, Jianfeng
> > 
> > Subject: [dpdk-dev] [PATCH 00/11] net/vhostpci: A new vhostpci PMD
> > supporting VM2VM scenario
> > 
> > Vhostpci PMD is a new type driver working in guest OS which has ability to 
> > drive
> > the vhostpci modern pci device, which is a new virtio device.
> > 
> > The following linking is about vhostpci design:
> > 
> > An initial device design is presented at KVM Forum'16:
> > http://www.linux-kvm.org/images/5/55/02x07A-Wei_Wang-Design_of-Vhost-
> > pci.pdf
> > The latest device design and implementation will be posted to the QEMU
> > community soon.
> > 
> 
> The latest version vhostpci device code has been posted to QEMU community.
> [PATCH v3 0/7] Vhost-pci for inter-VM communication
> http://lists.nongnu.org/archive/html/qemu-devel/2017-12/msg00494.html

I didn't follow on the qemu ML. Do you know when it's likely it will
get accepted?

--yliu
> 
> [2016] Design of Vhost-pci by Wei Wang - YouTube linking as reference.
> https://www.youtube.com/watch?v=xITj0qsaSJQ
> 
> thanks
> Zhiyong


Re: [dpdk-dev] [PATCH] vhost: fix segfault as handle set_mem_table message

2017-12-05 Thread Yuanhan Liu
On Tue, Nov 28, 2017 at 01:09:29PM +0100, Maxime Coquelin wrote:
> 
> 
> On 11/15/2017 12:41 PM, Jianfeng Tan wrote:
> >In a running VM, operations (like device attach/detach) will
> >trigger the QEMU to resend set_mem_table to vhost-user backend.
> >
> >DPDK vhost-user handles this message rudely by unmap all existing
> >regions and map new ones. This might lead to segfault if there
> >is pmd thread just trying to touch those unmapped memory regions.
> >
> >But for most cases, except VM memory hotplug, QEMU still sends the
> >set_mem_table message even the memory regions are not changed as
> >QEMU vhost-user filters out those not backed by file (fd > 0).
> >
> >To fix this case, we add a check in the handler to see if the
> >memory regions are really changed; if not, we just keep old memory
> >regions.
> >
> >Fixes: 8f972312b8f4 ("vhost: support vhost-user")
> >
> >CC: sta...@dpdk.org
> >
> >CC: Yuanhan Liu 
> >CC: Maxime Coquelin 
> >
> >Reported-by: Yang Zhang 
> >Reported-by: Xin Long 
> >Signed-off-by: Yi Yang 
> >Signed-off-by: Jianfeng Tan 
> >---
> >  lib/librte_vhost/vhost_user.c | 33 +
> >  1 file changed, 33 insertions(+)
> 
> Reviewed-by: Maxime Coquelin 

Applied to dpdk-next-virtio.

Thanks.

--yliu


Re: [dpdk-dev] [PATCH 2/2] net/virtio: support GUEST ANNOUNCE

2017-12-05 Thread Yuanhan Liu
On Thu, Nov 30, 2017 at 02:41:12AM +, Wang, Xiao W wrote:
> 
> 
> > -Original Message-
> > From: Yuanhan Liu [mailto:y...@fridaylinux.org]
> > Sent: Monday, November 27, 2017 8:49 PM
> > To: Wang, Xiao W 
> > Cc: dev@dpdk.org
> > Subject: Re: [PATCH 2/2] net/virtio: support GUEST ANNOUNCE
> > 
> > On Fri, Nov 24, 2017 at 03:04:00AM -0800, Xiao Wang wrote:
> > > When live migration is done, for the backup VM, either the virtio
> > > frontend or the vhost backend needs to send out gratuitous RARP packet
> > > to announce its new network location.
> > >
> > > This patch enables VIRTIO_NET_F_GUEST_ANNOUNCE feature to support
> > live
> > > migration scenario where the vhost backend doesn't have the ability to
> > > generate RARP packet.
> > 
> > Yes, it's a feature good to have.
> > 
> > > +static int
> > > +virtio_dev_pause(struct rte_eth_dev *dev)
> > > +{
> > > + struct virtio_hw *hw = dev->data->dev_private;
> > > +
> > > + if (hw->started == 0)
> > > + return -1;
> > > + hw->started = 0;
> > > + /*
> > > +  * Prevent the worker thread from touching queues to avoid condition,
> > > +  * 1 ms should be enough for the ongoing Tx function to finish.
> > > +  */
> > > + rte_delay_ms(1);
> > > + return 0;
> > > +}
> > > +
> > > +static void
> > > +virtio_dev_resume(struct rte_eth_dev *dev)
> > > +{
> > > + struct virtio_hw *hw = dev->data->dev_private;
> > > +
> > > + hw->started = 1;
> > > +}
> > 
> > However, the implementation (stop first, pause for 1ms, duplicate another
> > Tx function, resume) doesn't seem elegant.
> > 
> > You probably could try something like DPDK vhost does:
> > 
> > - set a flag when S_ANNOUCE is received
> > - inject a pkt when such flag is set in the xmit function
> > 
> > You then should be able to get rid of all of above stuffs.
> > 
> > --yliu
> 
> The difference is that the virtio port may just receive packet, without xmit.

Thanks, I missed that.

However, you really should not add a duplicate function. It adds more
maintain effort. I think you probably could just invoke the tx_pkt_burst
callback directly. You have stopped the device after all. What's the
necessary to duplicate it?

--yliu


Re: [dpdk-dev] [PATCH] vhost: fix segfault as handle set_mem_table message

2017-12-05 Thread Maxime Coquelin



On 12/05/2017 03:19 PM, Yuanhan Liu wrote:

On Tue, Nov 28, 2017 at 01:09:29PM +0100, Maxime Coquelin wrote:



On 11/15/2017 12:41 PM, Jianfeng Tan wrote:

In a running VM, operations (like device attach/detach) will
trigger the QEMU to resend set_mem_table to vhost-user backend.

DPDK vhost-user handles this message rudely by unmap all existing
regions and map new ones. This might lead to segfault if there
is pmd thread just trying to touch those unmapped memory regions.

But for most cases, except VM memory hotplug, 


FYI, Victor is working on implementing a lock-less protection mechanism
to prevent crashes in such cases. It is intended first to protect
log_base in case of multiqueue + live-migration, but would solve thi
issue too.


QEMU still sends the

set_mem_table message even the memory regions are not changed as
QEMU vhost-user filters out those not backed by file (fd > 0).

To fix this case, we add a check in the handler to see if the
memory regions are really changed; if not, we just keep old memory
regions.

Fixes: 8f972312b8f4 ("vhost: support vhost-user")

CC: sta...@dpdk.org

CC: Yuanhan Liu 
CC: Maxime Coquelin 

Reported-by: Yang Zhang 
Reported-by: Xin Long 
Signed-off-by: Yi Yang 
Signed-off-by: Jianfeng Tan 
---
  lib/librte_vhost/vhost_user.c | 33 +
  1 file changed, 33 insertions(+)


Reviewed-by: Maxime Coquelin 


Applied to dpdk-next-virtio.

Thanks.

--yliu



Maxime


Re: [dpdk-dev] [PATCH v2 4/4] vhost: destroy unused virtqueues when multiqueue not negotiated

2017-12-05 Thread Laszlo Ersek
Hi Maxime,

On 12/05/17 09:34, Maxime Coquelin wrote:
> QEMU sends VHOST_USER_SET_VRING_CALL requests for all queues
> declared in QEMU command line before the guest is started.
> It has the effect in DPDK vhost-user backend to allocate vrings
> for all queues declared by QEMU.
> 
> If the first driver being used does not support multiqueue,
> the device never changes to VIRTIO_DEV_RUNNING state as only
> the first queue pair is initialized. One driver impacted by
> this bug is virtio-net's iPXE driver which does not support
> VIRTIO_NET_F_MQ feature.
> 
> It is safe to destroy unused virtqueues in SET_FEATURES request
> handler, as it is ensured the device is not in running state
> at this stage, so virtqueues aren't being processed.
> 
> Signed-off-by: Maxime Coquelin 
> ---
>  lib/librte_vhost/vhost_user.c | 19 +++
>  1 file changed, 19 insertions(+)
> 
> diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
> index a5e1f2482..b17080215 100644
> --- a/lib/librte_vhost/vhost_user.c
> +++ b/lib/librte_vhost/vhost_user.c
> @@ -173,6 +173,7 @@ vhost_user_get_features(struct virtio_net *dev)
>  static int
>  vhost_user_set_features(struct virtio_net *dev, uint64_t features)
>  {
> + int i;
>   uint64_t vhost_features = 0;
>  
>   rte_vhost_driver_get_features(dev->ifname, &vhost_features);
> @@ -216,6 +217,24 @@ vhost_user_set_features(struct virtio_net *dev, uint64_t 
> features)
>   (dev->features & (1 << VIRTIO_NET_F_MRG_RXBUF)) ? "on" : "off",
>   (dev->features & (1ULL << VIRTIO_F_VERSION_1)) ? "on" : "off");
>  
> + if (!(dev->features & (1ULL << VIRTIO_NET_F_MQ))) {
> + /*
> +  * Remove all but first queue pair if MQ hasn't been
> +  * negotiated. This is safe because the device is not
> +  * running at this stage.
> +  */
> + for (i = dev->nr_vring; i > 1; i--) {
> + struct vhost_virtqueue *vq = dev->virtqueue[i];

Sorry, I don't have any experience with dpdk.

But, if "dev->virtqueue" has "dev->nr_vring" elements, then this loop is
off-by one; dev->virtqueue[dev->nr_vring] should never be accessed. For
example, if you have three queues, numbered 0, 1 and 2, this loop will
access/release virtqueue[3] (bad) and virtqueue[2] (good).

Instead, I suggest:

  i = dev->nr_vring;
  while (i > 2) {
struct vhost_virtqueue *vq;

vq = dev->virtqueue[--i];
/* the rest here */
  }

The loop body is entered with "i" standing for "how many queues are left
that should be freed".

Thanks
Laszlo

> +
> + if (!vq)
> + continue;
> +
> + cleanup_vq(vq, 1);
> + free_vq(vq);
> + dev->nr_vring--;
> + }
> + }
> +
>   return 0;
>  }
>  
> 



Re: [dpdk-dev] The limitation of VHOST_MEMORY_MAX_NREGIONS in virtio-user with vhost-user backend

2017-12-05 Thread Yuanhan Liu
On Tue, Dec 05, 2017 at 06:58:53AM +, Wang, Wei 5. (NSB - CN/Hangzhou) 
wrote:
> 
> 
> Hi, all
> 
> In DPDK document, it it described that Virtio in containers Cannot work when 
> there are more than VHOST_MEMORY_MAX_NREGIONS(8) hugepages. In another word, 
> do not use 2MB hugepage so far. Do you know the reason of this limitation?

It comes from the vhost-user spec.

> In my envirionment, the pdpe1gb is not set in cpu flag, so hugepage can't be 
> set to 1GB size. The hugepage number shall be more than 8
> 
> 
> So is there any solution or workaround to fix this limitation? Or change dpdk 
> code to fix this limitation?
> 

AFAIK, there are no workarounds. And I think the DPDK EAL memory code need be
refactored a bit to fix this limitation. Unforunately, seems there are no work
on that recently.

--yliu


Re: [dpdk-dev] [PATCH 2/5] ethdev: add port ownership

2017-12-05 Thread Bruce Richardson
On Tue, Dec 05, 2017 at 12:53:36PM +0100, Thomas Monjalon wrote:
> 05/12/2017 12:44, Ananyev, Konstantin:
> > Just forgot to mention - I don' think it is good idea to disallow secondary 
> > process to set  theowner.
> 
> I think we all agree on that.
> My initial suggestion was to use the ownership in secondary processes.
> I think Matan forbid it as a first step because there is no
> multi-process synchronization currently.
> 
> > Let say  in secondary process I have few tap/ring/pcap devices.
> > Why it shouldn't be allowed to unite them under bonding device and make 
> > that device to own them?
> > That's why I think get/set owner better to be atomic.
> > If the owner is just a pointer - in that case get operation will be atomic 
> > by nature,
> > set could be implemented just by CAS.
> 
> It would be perfect.
> Can we be sure that the atomic will work perfectly on shared memory?

The sharing of memory is an OS-level construct in managing page tables,
more than anything else. For atomic operations, a memory address is a
memory address, whether it is shared or private to a process.

> On every architectures?

All architectures should have an atomic compare-and-set equivalent
operation for it's native pointer size. In the unlikely case we have to
support one that doesn't, we can special-case that in some other way.


Re: [dpdk-dev] [PATCH v2 4/4] vhost: destroy unused virtqueues when multiqueue not negotiated

2017-12-05 Thread Maxime Coquelin

Hi Laszlo,

On 12/05/2017 03:40 PM, Laszlo Ersek wrote:

Hi Maxime,

On 12/05/17 09:34, Maxime Coquelin wrote:

QEMU sends VHOST_USER_SET_VRING_CALL requests for all queues
declared in QEMU command line before the guest is started.
It has the effect in DPDK vhost-user backend to allocate vrings
for all queues declared by QEMU.

If the first driver being used does not support multiqueue,
the device never changes to VIRTIO_DEV_RUNNING state as only
the first queue pair is initialized. One driver impacted by
this bug is virtio-net's iPXE driver which does not support
VIRTIO_NET_F_MQ feature.

It is safe to destroy unused virtqueues in SET_FEATURES request
handler, as it is ensured the device is not in running state
at this stage, so virtqueues aren't being processed.

Signed-off-by: Maxime Coquelin 
---
  lib/librte_vhost/vhost_user.c | 19 +++
  1 file changed, 19 insertions(+)

diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index a5e1f2482..b17080215 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -173,6 +173,7 @@ vhost_user_get_features(struct virtio_net *dev)
  static int
  vhost_user_set_features(struct virtio_net *dev, uint64_t features)
  {
+   int i;
uint64_t vhost_features = 0;
  
  	rte_vhost_driver_get_features(dev->ifname, &vhost_features);

@@ -216,6 +217,24 @@ vhost_user_set_features(struct virtio_net *dev, uint64_t 
features)
(dev->features & (1 << VIRTIO_NET_F_MRG_RXBUF)) ? "on" : "off",
(dev->features & (1ULL << VIRTIO_F_VERSION_1)) ? "on" : "off");
  
+	if (!(dev->features & (1ULL << VIRTIO_NET_F_MQ))) {

+   /*
+* Remove all but first queue pair if MQ hasn't been
+* negotiated. This is safe because the device is not
+* running at this stage.
+*/
+   for (i = dev->nr_vring; i > 1; i--) {
+   struct vhost_virtqueue *vq = dev->virtqueue[i];


Sorry, I don't have any experience with dpdk.

But, if "dev->virtqueue" has "dev->nr_vring" elements, then this loop is
off-by one; dev->virtqueue[dev->nr_vring] should never be accessed. For
example, if you have three queues, numbered 0, 1 and 2, this loop will
access/release virtqueue[3] (bad) and virtqueue[2] (good).


Right, thanks for spotting this.

I didn't noticed my mistake while testing it because of the NULL check
in the loop.


Instead, I suggest:

   i = dev->nr_vring;
   while (i > 2) {
 struct vhost_virtqueue *vq;

 vq = dev->virtqueue[--i];
 /* the rest here */
   }

The loop body is entered with "i" standing for "how many queues are left
that should be freed".


Yes, that sounds cleaner. I think dev->nr_vring can safely be
decremented directly, so "i" could be skipped.

Thanks!
Maxime


Thanks
Laszlo


+
+   if (!vq)
+   continue;
+
+   cleanup_vq(vq, 1);
+   free_vq(vq);
+   dev->nr_vring--;
+   }
+   }
+
return 0;
  }
  





Re: [dpdk-dev] [PATCH 2/5] ethdev: add port ownership

2017-12-05 Thread Ananyev, Konstantin

>> Just forgot to mention - I don' think it is good idea to disallow secondary 
>> process to set theowner.
 
>I think we all agree on that.
>My initial suggestion was to use the ownership in secondary processes.
>I think Matan forbid it as a first step because there is no
>multi-process synchronization currently.
 
>> Let say in secondary process I have few tap/ring/pcap devices.
>> Why it shouldn't be allowed to unite them under bonding device and make that 
>> device to own them?
>> That's why I think get/set owner better to be atomic.
>> If the owner is just a pointer - in that case get operation will be atomic 
>> by nature,
>> set could be implemented just by CAS.
 
>It would be perfect.
>Can we be sure that the atomic will work perfectly on shared memory?
>On every architectures?

I believe - yes, how otherwise rte_ring and rte_mbuf would work for MP? :)
Konstantin


Re: [dpdk-dev] [PATCH] vhost: support UDP Fragmentation Offload

2017-12-05 Thread Maxime Coquelin



On 11/21/2017 07:56 AM, Jiayu Hu wrote:

In virtio, UDP Fragmentation Offload (UFO) includes two parts: host UFO
and guest UFO. Guest UFO means the frontend can receive large UDP packets,
and host UFO means the backend can receive large UDP packets. This patch
supports host UFO and guest UFO for vhost-user.

Signed-off-by: Jiayu Hu
---
  lib/librte_mbuf/rte_mbuf.h|  7 +++
  lib/librte_vhost/vhost.h  |  2 ++
  lib/librte_vhost/virtio_net.c | 10 ++
  3 files changed, 19 insertions(+)


Reviewed-by: Maxime Coquelin 

Thanks,
Maxime


Re: [dpdk-dev] [PATCH 2/5] ethdev: add port ownership

2017-12-05 Thread Ananyev, Konstantin

Hi Thomas,

> Hi,
 
> I will give my view on locking and synchronization in a different email.
> Let's discuss about the API here.
 
> 05/12/2017 12:12, Ananyev, Konstantin:
> >> From: Matan Azrad [mailto:ma...@mellanox.com]
>> > From: Ananyev, Konstantin [mailto:konstantin.anan...@intel.com]
 
> > > > If the goal is just to have an ability to recognize is that device is 
> > > > managed by
> > > > another device (failsafe, bonding, etc.), then I think all we need is a 
> > > > pointer
> > > > to rte_eth_dev_data of the owner (NULL would mean no owner).
> > > 
> > > I think string is better than a pointer from the next reasons:
> > > 1. It is more human friendly than pointers for debug and printing.
> > 
> > We can have a function that would take an owner pointer and produce nice
> > pretty formatted text explanation: "owned by fail-safe device at port X" or 
> > so.
 
> I don't think it is possible or convenient to have such function.

Why do you think it is not possible?

> Keep in mind that the owner can be an application thread.
> If you prefer using a single function pointer (may help for
> atomic implementation), we can allocate an owner structure containing
> a name as a string to identify the owner in human readable format.
> Then we just have to set the pointer of this struct to rte_eth_dev_data.

Basically you'd like to have an ability to set something different then
pointer to rte_eth_dev_data as an owner, right?
I think this is possible too, just not sure it will useful.
 
> > What I meant - this api to set/get ownership should be sort of internal to 
> > ethdev layer.
> > Let say it would be used for failsafe/bonding (any other compound) device 
> > that needs
> > to own/manage several low-level devices.
> > So in normal situation user wouldn't need to use that API directly at all.
 
> Again, the application may use this API to declare its ownership.

Could you explain that a bit: what would mean 'application declares an 
ownership on device'?
Does it mean that no other application will be allowed to do any control op on 
that device
till application will clear its ownership?
I.E. make sure that at each moment only one particular thread can modify device 
configuration?
Or would it be totally informal and second application will be free to ignore 
it?

If it will be the second one - I personally don't see much point in it.
If it the first one - then simplest and most straightforward way would be -
introduce a mutex (either per device or just per whole rte_eth_dev[]) and force
each control op to grab it at entrance release at exit.

> And anwyway, it may be interesting from an application point of view
> to be able to list every devices and their internal owners.

Yes sure application is free to call 'get' to retrieve information etc.
What I am saying for normal operation - application don't have to call that API.
I.E. - we don't need to change testpmd, etc. apps because that API was 
introduced.

Konstantin




Re: [dpdk-dev] [PATCH] bus/vdev: add custom scan hook

2017-12-05 Thread Thomas Monjalon
05/12/2017 14:56, Tan, Jianfeng:
> 
> On 12/5/2017 4:41 PM, Thomas Monjalon wrote:
> >
> > 05/12/2017 09:27, Tan, Jianfeng:
> >
> > >
> >
> > > On 12/4/2017 5:31 PM, Thomas Monjalon wrote:
> >
> > > >
> >
> > > > The hook is in bus->scan().
> >
> > > > I think we should launch a bus scan when there is a new device event.
> >
> > >
> >
> > > That's what I'm trying to say. We finally need to execute a handler as
> >
> > > of a device event to finish the job.
> >
> > Please be more specific, I am not sure to understand.
> >
> 
> By the handler, I mean when we monitor the udev by select/poll/epoll and 
> device uevents come, the application will execute a handler (or just a 
> function) for each of such uevent. Then why not adding the vdev there?

Because it must work for hotplug, and initial scan too.
We can also think to application requiring a manual scan.
The bus scan is the right place to have every scans called. That's simple.



[dpdk-dev] [PATCH] maintainers: claim co maintainership of docs

2017-12-05 Thread Marko Kovacevic
Signed-off-by: Marko Kovacevic 
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index f0baeb4..3ed7eb8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -36,6 +36,7 @@ M: maintain...@dpdk.org
 
 Documentation (with overlaps)
 M: John McNamara 
+M: Marko Kovacevic 
 F: README
 F: doc/
 
-- 
2.9.5



Re: [dpdk-dev] [PATCH] [RFC] ether: standardize getting the port by name

2017-12-05 Thread Thomas Monjalon
05/12/2017 14:58, Yuanhan Liu:
> On Tue, Dec 05, 2017 at 02:20:05PM +0100, Thomas Monjalon wrote:
> > 05/12/2017 12:04, Adrien Mazarguil:
> > 
> > > Hi Yuanhan,
> > 
> > >
> > 
> > > On Mon, Dec 04, 2017 at 09:55:31PM +0800, Yuanhan Liu wrote:
> > > > rte_devargs is identified by the name (pci id for pci device). It also
> > > > includes other driver specific key-value options. It's not clear for the
> > > > user to know which one (or few) of them should be used together with the
> > > > PCI id to identify a specific port. For example, as you mentioned, in
> > > > mlx4, it's "pci_id,port=x". It could be something else in other drivers.
> > 
> > > Just for information, this "port=x" argument in mlx4 is consistent with 
> > > the
> > > value found in /sys/class/net/ethX/dev_port under Linux. If we choose to 
> > > use
> > > a port index (instead of a MAC or something else), it would make sense to
> > > standardize it on the same order as given by the host OS for consistency
> > > across all PMDs.
> 
> Thanks for the info.
> 
> But FYI, for most of other PMDs, such sys file won't exist, as the host
> driver should have been unbind and bind with something like uio. So I
> don't think it applies to all other nics.
> 
> > 
> > Good idea, thanks.
> > 
> >  
> > 
> > I think we will MAC in some cases and port number in others.
> 
> Could you list some examples?

Port number could be used for Mellanox devices.
MAC could be used when devices are already probed in DPDK,
but can it be used for blacklisting? Do we know the MAC address?


> > It is important to have identifiers available even without initializing
> > the device.
> 
> I don't object that. I think that's good for whitelisting/blacklisting
> before the device initialization. However, once the initialization is
> done, everything could be used to identify a port (say, the mac this patch
> mentioned).
> 
> > > Devices with a single port per PCI address would simply use/allow "0".
> 
> Yes.
> 
> > >
> > 
> > > > Actually, this patch also proposes a devarg like naming style: 
> > > > "name[,mac]
> > ".
> > > > What it does try to do is to define a standard syntax, so that the user
> > > > doesn't have to know any driver specific options.
> > > >
> > > > However, the mac address is changeable, leaving this naming 
> > > > inconsistent.
> > > > Well, in practice, PCI id is also changeable.
> > 
> > > >
> > 
> > > > OTOH, having a consistent naming seems a bit far away from this patch's
> > > > goal: define a standard ethdev naming and leave less harassment to the
> > users.
> > 
> > >
> > 
> > > I'm not a fan of the MAC naming scheme either, a kind of per-device 
> > > physical
> > > port index seems more robust and doesn't require much initialization to
> > > determine how many ports are supported by the device and whether the 
> > > index is
> > > known/valid (particularly given the vast majority exposes only one per bus
> > > address).
> 
> I'm not a fan of the MAC naming, neither. The reason this patch proposes mac
> naming is that it's more clear for the user to specify a port. I also agree
> that the port index could be another good option.
> 
> One thing I really haven't considered yet is how it becomes when the VF
> reprenstor comes to play? (more guys are cc'ed).
> 
> > > Would it make sense to have this name usable unmodified as a valid device
> > > (-w) argument, including parameters?
> > 
> > Yes we must provide some new key/value arguments for devargs.
> > So it can be used anywhere, including -w/-b options in DPDK
> > or port configuration in OVS.
> >  
> > 
> > > If so, PMDs could append parameters while probing the underlying device, 
> > > by
> > > appending ",port=x", ",mac=x" followed by other unspecified parameters 
> > > with
> > > default values. This could uniquely identify the port _and_ its
> > > configuration in a reusable fashion.
> > 
> >  
> > 
> > Question: should we separate device identification and configuration
> > in the syntax?
> 
> Yes, we should. I think we should parse the identification generically,
> and leave the left to the driver.

So we must take care of a syntax to separate them.
For instance, MAC can be used for identification or configuration (changing
the MAC by user input).


> > > Otherwise if all we need is unique names, we can use the opposite and much
> > > simpler approach. Let librte_ether assign them sequentially
> > > (e.g. "rte_eth%d", no need for consistency with OS netdevices), 
> > > applications
> > > can figure the rest based on data structures if needed.
> > 
> > No, unique names are not useful / not usable by users.
> 
> Agree. We don't really need another unique name, since the port id is
> already unique.
> 
> However, I think a consistent naming might be useful: user doesn't have
> to worry it even though MAC is changed or the PCI slot is changed.
> 
>   --yliu
> 





Re: [dpdk-dev] [PATCH 2/5] ethdev: add port ownership

2017-12-05 Thread Thomas Monjalon
05/12/2017 16:13, Ananyev, Konstantin:
> 
> Hi Thomas,
> 
> > Hi,
>  
> > I will give my view on locking and synchronization in a different email.
> > Let's discuss about the API here.
>  
> > 05/12/2017 12:12, Ananyev, Konstantin:
> > >> From: Matan Azrad [mailto:ma...@mellanox.com]
> >> > From: Ananyev, Konstantin [mailto:konstantin.anan...@intel.com]
>  
> > > > > If the goal is just to have an ability to recognize is that device is 
> > > > > managed by
> > > > > another device (failsafe, bonding, etc.), then I think all we need is 
> > > > > a pointer
> > > > > to rte_eth_dev_data of the owner (NULL would mean no owner).
> > > > 
> > > > I think string is better than a pointer from the next reasons:
> > > > 1. It is more human friendly than pointers for debug and printing.
> > > 
> > > We can have a function that would take an owner pointer and produce nice
> > > pretty formatted text explanation: "owned by fail-safe device at port X" 
> > > or so.
>  
> > I don't think it is possible or convenient to have such function.
> 
> Why do you think it is not possible?

Because of applications being the owner (discussion below).

> > Keep in mind that the owner can be an application thread.
> > If you prefer using a single function pointer (may help for
> > atomic implementation), we can allocate an owner structure containing
> > a name as a string to identify the owner in human readable format.
> > Then we just have to set the pointer of this struct to rte_eth_dev_data.
> 
> Basically you'd like to have an ability to set something different then
> pointer to rte_eth_dev_data as an owner, right?

No, it can be a pointer, or an id, I don't care.

> I think this is possible too, just not sure it will useful.
>  
> > > What I meant - this api to set/get ownership should be sort of internal 
> > > to ethdev layer.
> > > Let say it would be used for failsafe/bonding (any other compound) device 
> > > that needs
> > > to own/manage several low-level devices.
> > > So in normal situation user wouldn't need to use that API directly at all.
>  
> > Again, the application may use this API to declare its ownership.
> 
> Could you explain that a bit: what would mean 'application declares an 
> ownership on device'?
> Does it mean that no other application will be allowed to do any control op 
> on that device
> till application will clear its ownership?
> I.E. make sure that at each moment only one particular thread can modify 
> device configuration?
> Or would it be totally informal and second application will be free to ignore 
> it?

It is an information.
It will avoid a library taking ownership on a port controlled by the app.

> If it will be the second one - I personally don't see much point in it.
> If it the first one - then simplest and most straightforward way would be -
> introduce a mutex (either per device or just per whole rte_eth_dev[]) and 
> force
> each control op to grab it at entrance release at exit.

No, a mutex does not provide any information.

> > And anwyway, it may be interesting from an application point of view
> > to be able to list every devices and their internal owners.
> 
> Yes sure application is free to call 'get' to retrieve information etc.
> What I am saying for normal operation - application don't have to call that 
> API.
> I.E. - we don't need to change testpmd, etc. apps because that API was 
> introduced.

Yes the ports can stay without any owner if the application does not fill it.


Re: [dpdk-dev] [PATCH] [RFC] ether: standardize getting the port by name

2017-12-05 Thread Adrien Mazarguil
On Tue, Dec 05, 2017 at 04:28:00PM +0100, Thomas Monjalon wrote:
> 05/12/2017 14:58, Yuanhan Liu:
> > On Tue, Dec 05, 2017 at 02:20:05PM +0100, Thomas Monjalon wrote:
> > > 05/12/2017 12:04, Adrien Mazarguil:
> > > 
> > > > Hi Yuanhan,
> > > 
> > > >
> > > 
> > > > On Mon, Dec 04, 2017 at 09:55:31PM +0800, Yuanhan Liu wrote:
> > > > > rte_devargs is identified by the name (pci id for pci device). It also
> > > > > includes other driver specific key-value options. It's not clear for 
> > > > > the
> > > > > user to know which one (or few) of them should be used together with 
> > > > > the
> > > > > PCI id to identify a specific port. For example, as you mentioned, in
> > > > > mlx4, it's "pci_id,port=x". It could be something else in other 
> > > > > drivers.
> > > 
> > > > Just for information, this "port=x" argument in mlx4 is consistent with 
> > > > the
> > > > value found in /sys/class/net/ethX/dev_port under Linux. If we choose 
> > > > to use
> > > > a port index (instead of a MAC or something else), it would make sense 
> > > > to
> > > > standardize it on the same order as given by the host OS for consistency
> > > > across all PMDs.
> > 
> > Thanks for the info.
> > 
> > But FYI, for most of other PMDs, such sys file won't exist, as the host
> > driver should have been unbind and bind with something like uio. So I
> > don't think it applies to all other nics.

Sure, I only meant PMDs must implement the same numbering scheme the kernel
driver they replace would have used (as exposed through dev_port) for
consistency. Note dev_port is always present since Linux 3.15, even with
single port/bus address devices, so applications that want to construct
-w/-b arguments can rely on that before unbinding devices.

> > > Good idea, thanks.
> > > 
> > >  
> > > 
> > > I think we will MAC in some cases and port number in others.
> > 
> > Could you list some examples?
> 
> Port number could be used for Mellanox devices.
> MAC could be used when devices are already probed in DPDK,
> but can it be used for blacklisting? Do we know the MAC address?

Problem as described in a prior message (below) is that once a device is
initialized, its MAC can be modified while the device name cannot. This will
lead to confusion and probably bug reports.

Besides, it conflicts with the tap PMD that already takes a "mac" argument
to assign a default MAC address, not look for a device with the given MAC.

> > > It is important to have identifiers available even without initializing
> > > the device.
> > 
> > I don't object that. I think that's good for whitelisting/blacklisting
> > before the device initialization. However, once the initialization is
> > done, everything could be used to identify a port (say, the mac this patch
> > mentioned).
> > 
> > > > Devices with a single port per PCI address would simply use/allow "0".
> > 
> > Yes.
> > 
> > > >
> > > 
> > > > > Actually, this patch also proposes a devarg like naming style: 
> > > > > "name[,mac]
> > > ".
> > > > > What it does try to do is to define a standard syntax, so that the 
> > > > > user
> > > > > doesn't have to know any driver specific options.
> > > > >
> > > > > However, the mac address is changeable, leaving this naming 
> > > > > inconsistent.
> > > > > Well, in practice, PCI id is also changeable.
> > > 
> > > > >
> > > 
> > > > > OTOH, having a consistent naming seems a bit far away from this 
> > > > > patch's
> > > > > goal: define a standard ethdev naming and leave less harassment to the
> > > users.
> > > 
> > > >
> > > 
> > > > I'm not a fan of the MAC naming scheme either, a kind of per-device 
> > > > physical
> > > > port index seems more robust and doesn't require much initialization to
> > > > determine how many ports are supported by the device and whether the 
> > > > index is
> > > > known/valid (particularly given the vast majority exposes only one per 
> > > > bus
> > > > address).
> > 
> > I'm not a fan of the MAC naming, neither. The reason this patch proposes mac
> > naming is that it's more clear for the user to specify a port. I also agree
> > that the port index could be another good option.
> > 
> > One thing I really haven't considered yet is how it becomes when the VF
> > reprenstor comes to play? (more guys are cc'ed).

Won't VFs come through a separate PCI bus address anyway? Not sure extra
care is needed for those.

> > > > Would it make sense to have this name usable unmodified as a valid 
> > > > device
> > > > (-w) argument, including parameters?
> > > 
> > > Yes we must provide some new key/value arguments for devargs.
> > > So it can be used anywhere, including -w/-b options in DPDK
> > > or port configuration in OVS.
> > >  
> > > 
> > > > If so, PMDs could append parameters while probing the underlying 
> > > > device, by
> > > > appending ",port=x", ",mac=x" followed by other unspecified parameters 
> > > > with
> > > > default values. This could uniquely identify the port _and_ its
> > > > configuration in a reusable fas

Re: [dpdk-dev] [PATCH] maintainers: claim co maintainership of docs

2017-12-05 Thread Mcnamara, John


> -Original Message-
> From: Kovacevic, Marko
> Sent: Tuesday, December 5, 2017 3:22 PM
> To: dev@dpdk.org
> Cc: Mcnamara, John ; Kovacevic, Marko
> 
> Subject: [PATCH] maintainers: claim co maintainership of docs
> 
> Signed-off-by: Marko Kovacevic 

Welcome aboard. :-)

Acked-by: John McNamara 




Re: [dpdk-dev] [PATCH 2/5] ethdev: add port ownership

2017-12-05 Thread Neil Horman
On Tue, Dec 05, 2017 at 06:08:35AM +, Matan Azrad wrote:
>
> Please look at the code again, secondary process cannot take ownership, I 
> don't allow it!
> Actually, I think it must not be like that as no limitation for that in any 
> other ethdev configurations.
> 
Sure you do.  Consider the following situation, two tasks, A and B running
independently on separate CPUS:

TASK A  TASK B

calls rte_eth_dev_owner_new (gets id 2)| calls rte_eth_dev_owner_new (gets id 1)
   |
calls rte_eth_dev_owner_set on port 1  | calls rte_eth_dev_owner_set (port 1)
   |
sets port_owner->id = 2| gets removed from cpu via scheduler
   |
   | returns to continue running on cpu
   |
Gets interrupted immediately before| completes rte_eth_dev_owner_set, 
 return 0 statement|  setting port_owner->id = 1
   |
   | returns 0 from rte_eth_dev_owner_set
is scheduled back on the cpu   |
   |
returns 0 from rte_eth_dev_owner_set   |

in the above scenario, you've allowed two tasks to race through the ownership
set routine, and while your intended semantics indicate that task A should have
taken ownership of the port, task B actually did, and whats worse, both tasks
think they completed successfully.

I get that much of dpdk relies on the fact that the application either handles
all the locking, or architects itself so that a single thread of execution (or
at least only one thread at a time), is responsible for packet processing and
port configuration. If you are assuming the former, you've done a disservice to
the downstream consumer, because the locking is the intricate part of this
operation (i.e. you are requiring that the developer figure out what granularity
of locking is required such that you don't serialize too many operations that
don't need it, while maintaining enough serialization that you don't corrupt the
data that they wanted the api to manage in the first place.  If you assert that
the application should only be using a single task to do these operations to
begin with, then this API isn't overly useful, because theres only one thread
pushing data into the library and, by definition it implicitly owns all the
ports, or at least knows which ports it shouldn't mess with (e.g subordunate
ports in a failsafe device).

> > > Any port configuration (for example port creation and release) is not
> > internally synchronized between the primary to secondary processes so I
> > don't see any reason to synchronize port ownership.
> > Yes, and I've never agreed with that design point either, because the fact 
> > of
> > the matter is that one of two things must be true in relation to port
> > configuration:
> > 
> > 1) Either multiple processes will attempt to read/change port
> > configuration/ownership
> > 
> > or
> > 
> > 2) port ownership is implicitly given to a single task (be it a primary or
> > secondary task), and said ownership is therefore implicitly known by the
> > application
> > 
> > Either situation may be true depending on the details of the application 
> > being
> > built, but regardless, if (2) is true, then this api isn't really needed at 
> > all,
> > because the application implicitly has been designed to have a port be
> > owned by a given task. 
> 
> Here I think you miss something, the port ownership is not mainly for process 
> DPDK entities,
> The entity can be also PMD, library, application in same process.
> For MP it is nice to have, the secondary just can see the primary owners and 
> take decision accordingly (please read my answer to Konstatin above). 
> 
But the former is just a case of the latter (in fact worse).  if you expect
another execution context out of the control of the application to query this
API, then you are in an MP situation, and definately need to provide mutually
exclusive access to your data.  If instead you expect all other execution
contexts to suspend (or otherwise refrain from accessing this API) while a
single task makes changes, then by definition you have already had to create
some syncrnoization between those contexts and are capable of informing them of
of the new ownership scheme

The bottom line is, either you expect multiple access, or you dont.  If you
expect multiple access, and you belive that said access are not completely under
the control of your application, you need to protect those accesses against one
another.  If you don't expect multiple access (or expect your application to
architect itself to enforce single access), then you've created a environment in
which the single accessor already has to know all the infor

Re: [dpdk-dev] [PATCH 2/5] examples/kni: add optional parameter to enable LRO

2017-12-05 Thread Patil, Harish


-Original Message-
From: Ferruh Yigit 
Date: Monday, December 4, 2017 at 3:25 PM
To: "Mody, Rasesh" , "dev@dpdk.org" 
Cc: Harish Patil , Dept-Eng DPDK Dev

Subject: Re: [dpdk-dev] [PATCH 2/5] examples/kni: add optional parameter
to enable LRO

>On 11/24/2017 12:35 PM, Rasesh Mody wrote:
>> From: Harish Patil 
>> 
>> Add an optional cmdline parameter to enable LRO. This is useful to test
>> LRO feature by being able to run linux utils like iperf over KNI
>>interface
>> which generates consistent packet aggregations.
>> 
>> Signed-off-by: Harish Patil 
>
>Acked-by: Ferruh Yigit 
>
>I think this patch has no dependency for rest of the patchset and can be
>separately applied.

That’s right, thanks.
>



Re: [dpdk-dev] [PATCH 2/5] examples/kni: add optional parameter to enable LRO

2017-12-05 Thread Patil, Harish


-Original Message-
From: Shahaf Shuler 
Date: Monday, December 4, 2017 at 10:05 PM
To: Ferruh Yigit , "Mody, Rasesh"
, "dev@dpdk.org" 
Cc: Harish Patil , Dept-Eng DPDK Dev

Subject: RE: [dpdk-dev] [PATCH 2/5] examples/kni: add optional parameter
to enable LRO

>Tuesday, December 5, 2017 1:25 AM, Ferruh Yigit:
>> On 11/24/2017 12:35 PM, Rasesh Mody wrote:
>> > From: Harish Patil 
>> >
>> > Add an optional cmdline parameter to enable LRO. This is useful to
>> > test LRO feature by being able to run linux utils like iperf over KNI
>> > interface which generates consistent packet aggregations.
>> >
>> > Signed-off-by: Harish Patil 
>> 
>> Acked-by: Ferruh Yigit 
>> 
>> I think this patch has no dependency for rest of the patchset and can be
>> separately applied.
>
>Snipped from the patch [1]
>
>Please use the new ethdev offloads API. there is already a series to
>convert the examples including kni[2], I think it is better to work on
>top of it. 
>
>[1]
>@@ -611,6 +618,10 @@ struct kni_interface_stats {
>   /* Initialise device and RX/TX queues */
>   RTE_LOG(INFO, APP, "Initialising port %u ...\n", (unsigned)port);
>   fflush(stdout);
>+
>+  if (enable_lro)
>+  port_conf.rxmode.enable_lro = 1;
>+
>
>[2] http://dpdk.org/dev/patchwork/patch/31558/
>
Okay, I will take a look at it.

Thanks,
Harish


>



[dpdk-dev] [PATCH 2/2] lib: optimize _xstats_by_ids APIs

2017-12-05 Thread Elza Mathew
Introduced a check to detect if the stats IDs being
requested are all basic stats IDs. In that case,
ensured that only the basic stats would be retrieved.
Previously, both basic stats and xstats were being
retrieved even if all the IDs were basic stats IDs.

Signed-off-by: Elza Mathew 
---
 lib/librte_ether/rte_ethdev.c | 38 +++---
 1 file changed, 35 insertions(+), 3 deletions(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index dfe8e65..210600d 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -1688,7 +1688,9 @@ struct rte_eth_dev *
 {
struct rte_eth_xstat_name *xstats_names_copy;
unsigned int no_basic_stat_requested = 1;
+   unsigned int no_ext_stat_requested = 1;
unsigned int expected_entries;
+   unsigned int basic_count;
struct rte_eth_dev *dev;
unsigned int i;
int ret;
@@ -1696,6 +1698,7 @@ struct rte_eth_dev *
RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
dev = &rte_eth_devices[port_id];
 
+   basic_count = get_xstats_basic_count(dev);
ret = get_xstats_count(port_id);
if (ret < 0)
return ret;
@@ -1713,7 +1716,6 @@ struct rte_eth_dev *
return -EINVAL;
 
if (ids && dev->dev_ops->xstats_get_names_by_id != NULL && size > 0) {
-   unsigned int basic_count = get_xstats_basic_count(dev);
uint64_t ids_copy[size];
 
for (i = 0; i < size; i++) {
@@ -1752,8 +1754,22 @@ struct rte_eth_dev *
return -ENOMEM;
}
 
+   if (ids) {
+   for (i = 0; i < size; i++) {
+   if (ids[i] > basic_count) {
+   no_ext_stat_requested = 0;
+   break;
+   }
+   }
+   }
+
/* Fill xstats_names_copy structure */
-   rte_eth_xstats_get_names(port_id, xstats_names_copy, expected_entries);
+   if (ids && no_ext_stat_requested) {
+   rte_eth_basic_stats_get_names(dev, xstats_names_copy);
+   } else {
+   rte_eth_xstats_get_names(port_id, xstats_names_copy,
+   expected_entries);
+   }
 
/* Filter stats */
for (i = 0; i < size; i++) {
@@ -1860,7 +1876,9 @@ struct rte_eth_dev *
 uint64_t *values, unsigned int size)
 {
unsigned int no_basic_stat_requested = 1;
+   unsigned int no_ext_stat_requested = 1;
unsigned int num_xstats_filled;
+   unsigned int basic_count;
uint16_t expected_entries;
struct rte_eth_dev *dev;
unsigned int i;
@@ -1870,6 +1888,7 @@ struct rte_eth_dev *
expected_entries = get_xstats_count(port_id);
struct rte_eth_xstat xstats[expected_entries];
dev = &rte_eth_devices[port_id];
+   basic_count = get_xstats_basic_count(dev);
 
/* Return max number of stats if no ids given */
if (!ids) {
@@ -1904,8 +1923,21 @@ struct rte_eth_dev *
values, size);
}
 
+   if (ids) {
+   for (i = 0; i < size; i++) {
+   if (ids[i] > basic_count) {
+   no_ext_stat_requested = 0;
+   break;
+   }
+   }
+   }
+
/* Fill the xstats structure */
-   ret = rte_eth_xstats_get(port_id, xstats, expected_entries);
+   if (ids && no_ext_stat_requested)
+   ret = rte_eth_basic_stats_get(port_id, xstats);
+   else
+   ret = rte_eth_xstats_get(port_id, xstats, expected_entries);
+
if (ret < 0)
return ret;
num_xstats_filled = (unsigned int)ret;
-- 
1.9.1



[dpdk-dev] [PATCH 1/2] lib: refactor basic stats code

2017-12-05 Thread Elza Mathew
Moved the code to get the basic stats names and values
into static functions.

Signed-off-by: Elza Mathew 
---
 lib/librte_ether/rte_ethdev.c | 156 --
 1 file changed, 91 insertions(+), 65 deletions(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 318af28..dfe8e65 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -1641,6 +1641,45 @@ struct rte_eth_dev *
return -EINVAL;
 }
 
+/* retrieve basic stats names */
+static int
+rte_eth_basic_stats_get_names(struct rte_eth_dev *dev,
+   struct rte_eth_xstat_name *xstats_names)
+{
+   int cnt_used_entries = 0;
+   uint32_t idx, id_queue;
+   uint16_t num_q;
+
+   for (idx = 0; idx < RTE_NB_STATS; idx++) {
+   snprintf(xstats_names[cnt_used_entries].name,
+   sizeof(xstats_names[0].name),
+   "%s", rte_stats_strings[idx].name);
+   cnt_used_entries++;
+   }
+   num_q = RTE_MIN(dev->data->nb_rx_queues, RTE_ETHDEV_QUEUE_STAT_CNTRS);
+   for (id_queue = 0; id_queue < num_q; id_queue++) {
+   for (idx = 0; idx < RTE_NB_RXQ_STATS; idx++) {
+   snprintf(xstats_names[cnt_used_entries].name,
+   sizeof(xstats_names[0].name),
+   "rx_q%u%s",
+   id_queue, rte_rxq_stats_strings[idx].name);
+   cnt_used_entries++;
+   }
+
+   }
+   num_q = RTE_MIN(dev->data->nb_tx_queues, RTE_ETHDEV_QUEUE_STAT_CNTRS);
+   for (id_queue = 0; id_queue < num_q; id_queue++) {
+   for (idx = 0; idx < RTE_NB_TXQ_STATS; idx++) {
+   snprintf(xstats_names[cnt_used_entries].name,
+   sizeof(xstats_names[0].name),
+   "tx_q%u%s",
+   id_queue, rte_txq_stats_strings[idx].name);
+   cnt_used_entries++;
+   }
+   }
+   return cnt_used_entries;
+}
+
 /* retrieve ethdev extended statistics names */
 int
 rte_eth_xstats_get_names_by_id(uint16_t port_id,
@@ -1739,8 +1778,6 @@ struct rte_eth_dev *
int cnt_used_entries;
int cnt_expected_entries;
int cnt_driver_entries;
-   uint32_t idx, id_queue;
-   uint16_t num_q;
 
cnt_expected_entries = get_xstats_count(port_id);
if (xstats_names == NULL || cnt_expected_entries < 0 ||
@@ -1749,35 +1786,9 @@ struct rte_eth_dev *
 
/* port_id checked in get_xstats_count() */
dev = &rte_eth_devices[port_id];
-   cnt_used_entries = 0;
 
-   for (idx = 0; idx < RTE_NB_STATS; idx++) {
-   snprintf(xstats_names[cnt_used_entries].name,
-   sizeof(xstats_names[0].name),
-   "%s", rte_stats_strings[idx].name);
-   cnt_used_entries++;
-   }
-   num_q = RTE_MIN(dev->data->nb_rx_queues, RTE_ETHDEV_QUEUE_STAT_CNTRS);
-   for (id_queue = 0; id_queue < num_q; id_queue++) {
-   for (idx = 0; idx < RTE_NB_RXQ_STATS; idx++) {
-   snprintf(xstats_names[cnt_used_entries].name,
-   sizeof(xstats_names[0].name),
-   "rx_q%u%s",
-   id_queue, rte_rxq_stats_strings[idx].name);
-   cnt_used_entries++;
-   }
-
-   }
-   num_q = RTE_MIN(dev->data->nb_tx_queues, RTE_ETHDEV_QUEUE_STAT_CNTRS);
-   for (id_queue = 0; id_queue < num_q; id_queue++) {
-   for (idx = 0; idx < RTE_NB_TXQ_STATS; idx++) {
-   snprintf(xstats_names[cnt_used_entries].name,
-   sizeof(xstats_names[0].name),
-   "tx_q%u%s",
-   id_queue, rte_txq_stats_strings[idx].name);
-   cnt_used_entries++;
-   }
-   }
+   cnt_used_entries = rte_eth_basic_stats_get_names(
+   dev, xstats_names);
 
if (dev->dev_ops->xstats_get_names != NULL) {
/* If there are any driver-specific xstats, append them
@@ -1795,6 +1806,54 @@ struct rte_eth_dev *
return cnt_used_entries;
 }
 
+
+static int
+rte_eth_basic_stats_get(uint16_t port_id, struct rte_eth_xstat *xstats)
+{
+   struct rte_eth_dev *dev;
+   struct rte_eth_stats eth_stats;
+   unsigned int count = 0, i, q;
+   uint64_t val, *stats_ptr;
+   uint16_t nb_rxqs, nb_txqs;
+
+   rte_eth_stats_get(port_id, ð_stats);
+   dev = &rte_eth_devices[port_id];
+
+   nb_rxqs = RTE_MIN(dev->data->nb_rx_queues, RTE_ETHDEV_QUEUE_STAT_CNTRS);
+   nb_txqs = RTE_MIN(dev->data->nb_tx_queues, RTE_ETHDEV_QUEUE_STAT_CNTRS);
+
+   /* global stats */
+   for (i = 0; i < RTE_NB_STATS; i++) {
+   stats_ptr = RTE_PTR_ADD(ð_stats,
+

Re: [dpdk-dev] [PATCH 2/5] ethdev: add port ownership

2017-12-05 Thread Ananyev, Konstantin



> 
> 
> From: Thomas Monjalon [mailto:tho...@monjalon.net]
> Sent: Tuesday, December 5, 2017 3:50 PM
> To: Ananyev, Konstantin 
> Cc: Matan Azrad ; Neil Horman ; 
> Gaëtan Rivet ; Wu,
> Jingjing ; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 2/5] ethdev: add port ownership
> 
> 05/12/2017 16:13, Ananyev, Konstantin:
> >
> > Hi Thomas,
> >
> > > Hi,
> >
> > > I will give my view on locking and synchronization in a different email.
> > > Let's discuss about the API here.
> >
> > > 05/12/2017 12:12, Ananyev, Konstantin:
> > > >> From: Matan Azrad [mailto:ma...@mellanox.com]
> > >> > From: Ananyev, Konstantin [mailto:konstantin.anan...@intel.com]
> >
> > > > > > If the goal is just to have an ability to recognize is that device 
> > > > > > is managed by
> > > > > > another device (failsafe, bonding, etc.), then I think all we need 
> > > > > > is a pointer
> > > > > > to rte_eth_dev_data of the owner (NULL would mean no owner).
> > > > >
> > > > > I think string is better than a pointer from the next reasons:
> > > > > 1. It is more human friendly than pointers for debug and printing.
> > > >
> > > > We can have a function that would take an owner pointer and produce nice
> > > > pretty formatted text explanation: "owned by fail-safe device at port 
> > > > X" or so.
> >
> > > I don't think it is possible or convenient to have such function.
> >
> > Why do you think it is not possible?
> 
> Because of applications being the owner (discussion below).
> 
> > > Keep in mind that the owner can be an application thread.
> > > If you prefer using a single function pointer (may help for
> > > atomic implementation), we can allocate an owner structure containing
> > > a name as a string to identify the owner in human readable format.
> > > Then we just have to set the pointer of this struct to rte_eth_dev_data.
> >
> > Basically you'd like to have an ability to set something different then
> > pointer to rte_eth_dev_data as an owner, right?
> 
> No, it can be a pointer, or an id, I don't care.

My question was about a pointer to a specific struct: rte_eth_dev_data.
As I understand you want it to be a pointer to something else?
Probably to some new struct rte_eth_dev_owner or so...

> 
> > I think this is possible too, just not sure it will useful.
> >
> > > > What I meant - this api to set/get ownership should be sort of internal 
> > > > to ethdev layer.
> > > > Let say it would be used for failsafe/bonding (any other compound) 
> > > > device that needs
> > > > to own/manage several low-level devices.
> > > > So in normal situation user wouldn't need to use that API directly at 
> > > > all.
> >
> > > Again, the application may use this API to declare its ownership.
> >
> > Could you explain that a bit: what would mean 'application declares an 
> > ownership on device'?
> > Does it mean that no other application will be allowed to do any control op 
> > on that device
> > till application will clear its ownership?
> > I.E. make sure that at each moment only one particular thread can modify 
> > device configuration?
> > Or would it be totally informal and second application will be free to 
> > ignore it?
> 
> It is an information.
> It will avoid a library taking ownership on a port controlled by the app.

Ok, let's step back a bit.
As I can see there are 2 separate issues/design choices inside rte_ethdev :)  :
1) control API is not MT safe - at any given moment 2 (or more) threads 
(/processes)
can try to change config of a given device.
   Right now we leave that sync effort to the upper layer.
2) Even with the same thread 2 (or more) DPDK entities (high level PMD/third 
party lib, etc.)
can try to manage the same device.
   I.E. the device can be managed by bonding device, but application mistakenly
can try to manage it on its own instead of relying on the bonding device to 
do so.
   Again right now we leave it up to the upper layer to keep track which device 
is managed
   by what DPDK entity.

So what problem are you guys trying to solve with your patch?
Is it 1) or 2) below or might be something else?

Konstantin

> 
> > If it will be the second one - I personally don't see much point in it.
> > If it the first one - then simplest and most straightforward way would be -
> > introduce a mutex (either per device or just per whole rte_eth_dev[]) and 
> > force
> > each control op to grab it at entrance release at exit.
> 
> No, a mutex does not provide any information.

> 
> > > And anwyway, it may be interesting from an application point of view
> > > to be able to list every devices and their internal owners.
> >
> > Yes sure application is free to call 'get' to retrieve information etc.
> > What I am saying for normal operation - application don't have to call that 
> > API.
> > I.E. - we don't need to change testpmd, etc. apps because that API was 
> > introduced.
> 
> Yes the ports can stay without any owner if the application does not fill it.

Konstantin


Re: [dpdk-dev] [PATCH] bus/vdev: add custom scan hook

2017-12-05 Thread Tan, Jianfeng

> From: Thomas Monjalon [mailto:tho...@monjalon.net]
> Sent: Tuesday, December 5, 2017 11:21 PM
> To: Tan, Jianfeng
> Cc: dev@dpdk.org
> Subject: Re: [PATCH] bus/vdev: add custom scan hook
>
> 05/12/2017 14:56, Tan, Jianfeng:
> >
> > On 12/5/2017 4:41 PM, Thomas Monjalon wrote:
> > >
> > > 05/12/2017 09:27, Tan, Jianfeng:
> > >
> > > >
> > >
> > > > On 12/4/2017 5:31 PM, Thomas Monjalon wrote:
> > >
> > > > >
> > >
> > > > > The hook is in bus->scan().
> > >
> > > > > I think we should launch a bus scan when there is a new 
device event.

> > >
> > > >
> > >
> > > > That's what I'm trying to say. We finally need to execute a 
handler as

> > >
> > > > of a device event to finish the job.
> > >
> > > Please be more specific, I am not sure to understand.
> > >
> >
> > By the handler, I mean when we monitor the udev by 
select/poll/epoll and

> > device uevents come, the application will execute a handler (or just a
> > function) for each of such uevent. Then why not adding the vdev there?
>
> Because it must work for hotplug, and initial scan too.
> We can also think to application requiring a manual scan.
> The bus scan is the right place to have every scans called. That's 
simple.


Yes, the logic is simpler; anything changes, appication just invokes a 
scan.Then we will have a somewhat complex hook that: (1) scan all 
devices and identify added/removed devices; (2) and add vdev for added 
device; (3) remove vdev for removed device.


Compared to that, we can separate the implementation for hotplug and 
initial scan:

  * For initial scan, iterate all devices to add vdev for each of them.
  * For hotplug, based on which device is added/removed, we just 
add/remove the corresponding vdev for that device.


Anyway, since this patch does provide a new way; and will not affect the 
other way. So I think it's OK to add this.


Thank you for your clarification.

Thanks,
Jianfeng


Re: [dpdk-dev] [PATCH] bus/vdev: add custom scan hook

2017-12-05 Thread Tan, Jianfeng



On 12/1/2017 8:36 AM, Thomas Monjalon wrote:

The scan callback allows to spawn a vdev automatically
given some custom scan rules.
It is especially useful to create a TAP device automatically
connected to a netdevice as remote.

Signed-off-by: Thomas Monjalon 
---
warning: to be tested
---
  drivers/bus/vdev/rte_bus_vdev.h | 29 
  drivers/bus/vdev/vdev.c | 75 +
  2 files changed, 104 insertions(+)

diff --git a/drivers/bus/vdev/rte_bus_vdev.h b/drivers/bus/vdev/rte_bus_vdev.h
index 41762b853..a90382f49 100644
--- a/drivers/bus/vdev/rte_bus_vdev.h
+++ b/drivers/bus/vdev/rte_bus_vdev.h
@@ -124,6 +124,35 @@ RTE_PMD_EXPORT_NAME(nm, __COUNTER__)
  #define RTE_PMD_REGISTER_ALIAS(nm, alias)\
  static const char *vdrvinit_ ## nm ## _alias = RTE_STR(alias)
  
+typedef void (*rte_vdev_scan_callback)(void *user_arg);

+
+/**
+ * Add a callback to be called on vdev scan
+ * before reading the devargs list.
+ *
+ * @param callback
+ *   The function to be called which can update the devargs list.
+ * @param user_arg
+ *   An opaque pointer passed to callback.
+ * @return
+ *   0 on success, -ENOMEM otherwise
+ */
+int
+rte_vdev_add_custom_scan(rte_vdev_scan_callback callback, void *user_arg);
+
+/**
+ * Remove a registered scan callback.
+ *
+ * @param callback
+ *   The registered function to be removed.
+ * @param user_arg
+ *   The associated opaque pointer or (void*)-1 for any.
+ * @return
+ *   0 on success
+ */
+int
+rte_vdev_remove_custom_scan(rte_vdev_scan_callback callback, void *user_arg);
+
  /**
   * Initialize a driver specified by name.
   *
diff --git a/drivers/bus/vdev/vdev.c b/drivers/bus/vdev/vdev.c
index fd7736d63..902f1e128 100644
--- a/drivers/bus/vdev/vdev.c
+++ b/drivers/bus/vdev/vdev.c
@@ -44,6 +44,8 @@
  #include 
  #include 
  #include 
+#include 
+#include 
  #include 
  
  #include "rte_bus_vdev.h"

@@ -62,6 +64,16 @@ static struct vdev_device_list vdev_device_list =
  struct vdev_driver_list vdev_driver_list =
TAILQ_HEAD_INITIALIZER(vdev_driver_list);
  
+struct vdev_custom_scan {

+   TAILQ_ENTRY(vdev_custom_scan) next;
+   rte_vdev_scan_callback callback;
+   void *user_arg;
+};
+TAILQ_HEAD(vdev_custom_scans, vdev_custom_scan);
+static struct vdev_custom_scans vdev_custom_scans =
+   TAILQ_HEAD_INITIALIZER(vdev_custom_scans);
+static rte_spinlock_t vdev_custom_scan_lock = RTE_SPINLOCK_INITIALIZER;
+
  /* register a driver */
  void
  rte_vdev_register(struct rte_vdev_driver *driver)
@@ -76,6 +88,53 @@ rte_vdev_unregister(struct rte_vdev_driver *driver)
TAILQ_REMOVE(&vdev_driver_list, driver, next);
  }
  
+int

+rte_vdev_add_custom_scan(rte_vdev_scan_callback callback, void *user_arg)
+{
+   struct vdev_custom_scan *custom_scan;
+
+   rte_spinlock_lock(&vdev_custom_scan_lock);
+
+   /* check if already registered */
+   TAILQ_FOREACH(custom_scan, &vdev_custom_scans, next) {
+   if (custom_scan->callback == callback &&
+   custom_scan->user_arg == user_arg)
+   break;
+   }
+
+   if (custom_scan == NULL) {
+   custom_scan = malloc(sizeof(struct vdev_custom_scan));
+   if (custom_scan != NULL) {
+   custom_scan->callback = callback;
+   custom_scan->user_arg = user_arg;
+   TAILQ_INSERT_TAIL(&vdev_custom_scans, custom_scan, 
next);
+   }
+   }
+
+   rte_spinlock_unlock(&vdev_custom_scan_lock);
+
+   return (custom_scan == NULL) ? -ENOMEM : 0;
+}
+
+int
+rte_vdev_remove_custom_scan(rte_vdev_scan_callback callback, void *user_arg)
+{
+   struct vdev_custom_scan *custom_scan, *tmp_scan;
+
+   rte_spinlock_lock(&vdev_custom_scan_lock);
+   TAILQ_FOREACH_SAFE(custom_scan, &vdev_custom_scans, next, tmp_scan) {
+   if (custom_scan->callback != callback ||
+   (custom_scan->user_arg != (void *)-1 &&
+   custom_scan->user_arg != user_arg))
+   continue;
+   TAILQ_REMOVE(&vdev_custom_scans, custom_scan, next);
+   free(custom_scan);
+   }
+   rte_spinlock_unlock(&vdev_custom_scan_lock);
+
+   return 0;
+}
+
  static int
  vdev_parse(const char *name, void *addr)
  {
@@ -260,6 +319,22 @@ vdev_scan(void)
  {
struct rte_vdev_device *dev;
struct rte_devargs *devargs;
+   struct vdev_custom_scan *custom_scan;
+
+   /* call custom scan callbacks if any */
+   rte_spinlock_lock(&vdev_custom_scan_lock);
+   TAILQ_FOREACH(custom_scan, &vdev_custom_scans, next) {
+   if (custom_scan->callback != NULL)
+   /*
+* the callback should update devargs list
+* by calling rte_eal_devargs_insert() with
+* devargs.bus = rte_bus_find_by_name("vdev");
+ 

Re: [dpdk-dev] The limitation of VHOST_MEMORY_MAX_NREGIONS in virtio-user with vhost-user backend

2017-12-05 Thread Tan, Jianfeng


> -Original Message-
> From: Yuanhan Liu [mailto:y...@fridaylinux.org]
> Sent: Tuesday, December 5, 2017 10:42 PM
> To: Wang, Wei 5. (NSB - CN/Hangzhou)
> Cc: dev@dpdk.org; Tan, Jianfeng
> Subject: Re: [dpdk-dev] The limitation of VHOST_MEMORY_MAX_NREGIONS
> in virtio-user with vhost-user backend
> 
> On Tue, Dec 05, 2017 at 06:58:53AM +, Wang, Wei 5. (NSB -
> CN/Hangzhou) wrote:
> >
> >
> > Hi, all
> >
> > In DPDK document, it it described that Virtio in containers Cannot work
> when there are more than VHOST_MEMORY_MAX_NREGIONS(8)
> hugepages. In another word, do not use 2MB hugepage so far. Do you know
> the reason of this limitation?
> 
> It comes from the vhost-user spec.
> 
> > In my envirionment, the pdpe1gb is not set in cpu flag, so hugepage can't
> be set to 1GB size. The hugepage number shall be more than 8
> >
> >
> > So is there any solution or workaround to fix this limitation? Or change
> dpdk code to fix this limitation?
> >
> 
> AFAIK, there are no workarounds. And I think the DPDK EAL memory code
> need be
> refactored a bit to fix this limitation. Unforunately, seems there are no work
> on that recently.

For more info, you can refer to: http://dpdk.org/dev/patchwork/patch/10406/

Thanks,
Jianfeng

> 
>   --yliu


Re: [dpdk-dev] [PATCH 00/11] net/vhostpci: A new vhostpci PMD supporting VM2VM scenario

2017-12-05 Thread Wei Wang

On 12/05/2017 10:08 PM, Yuanhan Liu wrote:

On Tue, Dec 05, 2017 at 06:59:26AM +, Yang, Zhiyong wrote:

Hi all,


-Original Message-
From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Zhiyong Yang
Sent: Thursday, November 30, 2017 5:47 PM
To: dev@dpdk.org; y...@fridaylinux.org; maxime.coque...@redhat.com
Cc: Wang, Wei W ; Tan, Jianfeng

Subject: [dpdk-dev] [PATCH 00/11] net/vhostpci: A new vhostpci PMD
supporting VM2VM scenario

Vhostpci PMD is a new type driver working in guest OS which has ability to drive
the vhostpci modern pci device, which is a new virtio device.

The following linking is about vhostpci design:

An initial device design is presented at KVM Forum'16:
http://www.linux-kvm.org/images/5/55/02x07A-Wei_Wang-Design_of-Vhost-
pci.pdf
The latest device design and implementation will be posted to the QEMU
community soon.


The latest version vhostpci device code has been posted to QEMU community.
[PATCH v3 0/7] Vhost-pci for inter-VM communication
http://lists.nongnu.org/archive/html/qemu-devel/2017-12/msg00494.html

I didn't follow on the qemu ML. Do you know when it's likely it will
get accepted?

--yliu


Hi Yuanhan,

There appear to be more people interested in this solution. My 
expectation is to get the fundamental part of device finalized in 3 months.


For the pmd patches, I think it would be great if people can start the 
review from the non-device related part (e.g. the data transmission logic).


Best,
Wei





Re: [dpdk-dev] [PATCH v2 06/10] net/virtio: fix queue setup consistency

2017-12-05 Thread Tiwei Bie
Hi Maxime and Olivier:

On Thu, Sep 07, 2017 at 02:13:43PM +0200, Olivier Matz wrote:
[...]
> diff --git a/drivers/net/virtio/virtio_ethdev.c 
> b/drivers/net/virtio/virtio_ethdev.c
> index 8eee3ff80..c7888f103 100644
> --- a/drivers/net/virtio/virtio_ethdev.c
> +++ b/drivers/net/virtio/virtio_ethdev.c
> @@ -1737,6 +1737,19 @@ virtio_dev_start(struct rte_eth_dev *dev)
>   struct virtnet_rx *rxvq;
>   struct virtnet_tx *txvq __rte_unused;
>   struct virtio_hw *hw = dev->data->dev_private;
> + int ret;
> +
> + /* Finish the initialization of the queues */
> + for (i = 0; i < dev->data->nb_rx_queues; i++) {
> + ret = virtio_dev_rx_queue_setup_finish(dev, i);
> + if (ret < 0)
> + return ret;
> + }

I'm trying to fix an issue [1] reported by Antonio. And during
the debugging, I found that vector Rx of virtio PMD has been
broken (when doing port stop/start) since below two patches were
applied:

25bf7a0b0936 ("vhost: make error handling consistent in Rx path")
   -- needed on the Tx side (testpmd/vhost-pmd in below test)
efc83a1e7fc3 ("net/virtio: fix queue setup consistency")
   -- needed on the Rx side (testpmd/virtio-user in below test)

Below are the steps to reproduce the issue:

#0. Checkout the commit

# 25bf7a0b0936 was applied after efc83a1e7fc3
git checkout 25bf7a0b0936

(There is another vector Rx bug caused by rxq flushing on the
 HEAD. So it's better to checkout the old commit first.)

#1. Apply below patch to disable mergeable Rx, and build DPDK

diff --git a/drivers/net/virtio/virtio_ethdev.h 
b/drivers/net/virtio/virtio_ethdev.h
index 2039bc5..d45ffa5 100644
--- a/drivers/net/virtio/virtio_ethdev.h
+++ b/drivers/net/virtio/virtio_ethdev.h
@@ -65,7 +65,6 @@
 1u << VIRTIO_NET_F_CSUM  | \
 1u << VIRTIO_NET_F_HOST_TSO4 | \
 1u << VIRTIO_NET_F_HOST_TSO6 | \
-1u << VIRTIO_NET_F_MRG_RXBUF | \
 1u << VIRTIO_NET_F_MTU | \
 1u << VIRTIO_RING_F_INDIRECT_DESC |\
 1ULL << VIRTIO_F_VERSION_1   | \

#2. Launch testpmd/vhost-pmd:

./x86_64-native-linuxapp-gcc/app/testpmd -l 1,2 \
--socket-mem 1024,1024 \
--file-prefix=vhost \
--no-pci \
--vdev=net_vhost0,iface=/tmp/socket-0,queues=1 \
-- \
--port-topology=chained \
-i \
--nb-cores=1

#3. Launch testpmd/virtio-user:

./x86_64-native-linuxapp-gcc/app/testpmd -l 5,6 \
--socket-mem 1024,1024 \
--file-prefix=virtio-user \
--no-pci \
--vdev=net_virtio_user0,path=/tmp/socket-0 \
-- \
--port-topology=chained \
-i \
--nb-cores=1 \
--disable-hw-vlan \
--txqflags=0xf01

#4. In testpmd/virtio-user run below commands:

testpmd> set fwd rxonly
testpmd> start

#5. In testpmd/vhost-pmd run below commands:

testpmd> set burst 1
testpmd> set fwd rxonly
testpmd> start tx_first 1
testpmd> stop

#6. In testpmd/virtio-user run below commands:

testpmd> stop
testpmd> port stop all
testpmd> port start all
testpmd> start

#7. In testpmd/vhost-pmd run below commands:

testpmd> set fwd txonly
testpmd> start

#8. In testpmd/virtio-user run below commands:

testpmd> show port stats all

And you will see that there is no traffic any more after
receiving a few hundred packets.

[1] http://dpdk.org/ml/archives/dev/2017-December/082983.html

Best regards,
Tiwei Bie


Re: [dpdk-dev] [PATCH] net/mlx4: revert workaround for broken Verbs

2017-12-05 Thread Shahaf Shuler
Monday, December 4, 2017 6:53 PM, Adrien Mazarguil:
> On Wed, Nov 29, 2017 at 09:21:26AM +, Matan Azrad wrote:
> > This workaround was needed to properly handle device removal with old
> > Mellanox OFED releases that are not supported by this PMD anymore.
> >
> > Starting from rdma-core v16 this removal issue shouldn't happen when
> > setting MLX4_DEVICE_FATAL_CLEANUP environment variable to 1.
> >
> > Set the aforementioned variable to 1.
> >
> > Reverts: 5f4677c ("net/mlx4: workaround verbs error after plug-out")
> >
> > Signed-off-by: Matan Azrad 
> 
> You should expand the commit ID a bit:
> 
> Reverts: 5f4677c6ad5e ("net/mlx4: workaround verbs error after plug-out")
> 
> Other than that,
> 
> Acked-by: Adrien Mazarguil 

Applied the next-net-mlx with the above commit log fix, thanks.

> 
> Thanks.
> 
> --
> Adrien Mazarguil
> 6WIND


Re: [dpdk-dev] [PATCH v1] net/mlx5: fix flow type for allmulti rules

2017-12-05 Thread Shahaf Shuler
Tuesday, December 5, 2017 11:57 AM, Nelio Laranjeiro:
> On Tue, Dec 05, 2017 at 11:37:50AM +0200, Raslan Darawsheh wrote:
> > Chnaged ibv_flow_attr type for allmulti rule to
> > IBV_FLOW_ATTR_MC_DEFAULT instead of IBV_FLOW_ATTR_NORMAL, in
> case allmulti was enabled.
> >
> > Fixes: 272733b5 ("net/mlx5: use flow to enable unicast traffic")
> >
> > Signed-off-by: Raslan Darawsheh 
> > ---
> >  drivers/net/mlx5/mlx5_flow.c | 13 +
> >  1 file changed, 13 insertions(+)
> >
> > diff --git a/drivers/net/mlx5/mlx5_flow.c
> > b/drivers/net/mlx5/mlx5_flow.c index 1eda836..eaa53e2 100644
> > --- a/drivers/net/mlx5/mlx5_flow.c
> > +++ b/drivers/net/mlx5/mlx5_flow.c
> > @@ -429,6 +429,7 @@ static const struct mlx5_flow_items
> > mlx5_flow_items[] = {
> >  /** Structure to pass to the conversion function. */  struct
> > mlx5_flow_parse {
> > uint32_t inner; /**< Set once VXLAN is encountered. */
> > +   uint8_t allmulti:1; /** Set once allmulti dst MAC is encountered. */

Changed comment to /**< Set once allmulti dst MAC is encountered. */ to fit 
doxygen style. 

> > uint32_t create:1;
> > /**< Whether resources should remain after a validate. */
> > uint32_t drop:1; /**< Target is a drop queue. */ @@ -1165,6
> +1166,17
> > @@ priv_flow_convert(struct priv *priv,
> > }
> > }
> > }
> > +   if (parser->allmulti &&
> > +   parser->layer == HASH_RXQ_ETH) {
> > +   for (i = 0; i != hash_rxq_init_n; ++i) {
> > +   if (!parser->queue[i].ibv_attr)
> > +   continue;
> > +   if (parser->queue[i].ibv_attr->num_of_specs != 1)
> > +   break;
> > +   parser->queue[i].ibv_attr->type =
> > +
>   IBV_FLOW_ATTR_MC_DEFAULT;
> > +   }
> > +   }
> > return ret;
> >  exit_enomem:
> > for (i = 0; i != hash_rxq_init_n; ++i) { @@ -1262,6 +1274,7 @@
> > mlx5_flow_create_eth(const struct rte_flow_item *item,
> > eth.val.ether_type &= eth.mask.ether_type;
> > }
> > mlx5_flow_create_copy(parser, ð, eth_size);
> > +   parser->allmulti = eth.val.dst_mac[0] & 1;
> > return 0;
> >  }
> >
> > --
> > 2.7.4
> 
> It also concerns DPDK 17.11, adding sta...@dpdk.org
> 
> Acked-by: Nelio Laranjeiro 

Applied to next-net-mlx with the Cc:sta...@dpdk.org addition, thanks. 

> 
> --
> Nélio Laranjeiro
> 6WIND


Re: [dpdk-dev] [PATCH v3 1/2] lib/security: add support for get metadata

2017-12-05 Thread Anoob

Hi Akhil, Radu,

Please see inline.

Thanks,

Anoob


On 11/24/2017 05:33 PM, Akhil Goyal wrote:

On 11/24/2017 5:29 PM, Radu Nicolau wrote:



On 11/24/2017 11:34 AM, Akhil Goyal wrote:

Hi Radu,
On 11/24/2017 4:47 PM, Radu Nicolau wrote:



On 11/24/2017 10:55 AM, Akhil Goyal wrote:

On 11/24/2017 3:09 PM, Radu Nicolau wrote:

Hi,

Comment inline


On 11/24/2017 8:50 AM, Akhil Goyal wrote:

Hi Anoob, Radu,
On 11/23/2017 4:49 PM, Anoob Joseph wrote:
In case of inline protocol processed ingress traffic, the 
packet may not
have enough information to determine the security parameters 
with which
the packet was processed. In such cases, application could get 
metadata
from the packet which could be used to identify the security 
parameters

with which the packet was processed.

Signed-off-by: Anoob Joseph 
---
v3:
* Replaced 64 bit metadata in conf with (void *)userdata
* The API(rte_security_get_pkt_metadata) would return void * 
instead of

   uint64_t

v2:
* Replaced get_session and get_cookie APIs with 
get_pkt_metadata API


  lib/librte_security/rte_security.c    | 13 +
  lib/librte_security/rte_security.h    | 19 
+++

  lib/librte_security/rte_security_driver.h | 16 
  3 files changed, 48 insertions(+)

diff --git a/lib/librte_security/rte_security.c 
b/lib/librte_security/rte_security.c

index 1227fca..a1d78b6 100644
--- a/lib/librte_security/rte_security.c
+++ b/lib/librte_security/rte_security.c
@@ -108,6 +108,19 @@ rte_security_set_pkt_metadata(struct 
rte_security_ctx *instance,

 sess, m, params);
  }
  +void *
+rte_security_get_pkt_metadata(struct rte_security_ctx *instance,
+  struct rte_mbuf *pkt)
Can we rename pkt with m. Just to make it consistent with the 
set API.

+{
+    void *md = NULL;
+
+ RTE_FUNC_PTR_OR_ERR_RET(*instance->ops->get_pkt_metadata, NULL);
+    if (instance->ops->get_pkt_metadata(instance->device, pkt, 
&md))

+    return NULL;
+
+    return md;
+}


Pkt metadata should be set by user i.e. the application, and the 
driver need not be aware of the format and the values of the 
metadata.
So setting the metadata in the driver and getting it back from 
the driver does not look a good idea.


Is it possible, that the application define the metadata on its 
own and set it in the library itself without the call to the 
driver ops.
I'm not sure I understand here; even in our case (ixgbe) the 
driver sets the metadata and it is aware of the format - that is 
the whole idea. This is why we added the set_metadata API, to 
allow the driver to inject extra information into the mbuf, 
information that is driver specific and derived from the security 
session, so it makes sense to also have a symmetric get_metadata.
Private data is the one that follows those rules, i.e. 
application specific and driver transparent.


As per my understanding of the user metadata, it should be in 
control of the application, and the application shall know the 
format of that. Setting in driver will disallow this.

Please let me know if my understanding is incorrect.

If at all, some information is needed to be set on the basis of 
driver, then application can get that information from the driver 
and then set it in the packet metadata in its own way/format.


The rte_security_set_pkt_metadata() doc defines the metadata as 
"device-specific defined metadata" and also takes a device specific 
params pointer, so the symmetric function is to be expected to work 
in the same way, i.e. return device specific metadata associated 
with the security session and instance and mbuf. How is this 
metadata stored is not specified in the security API, so the PMD 
implementation have the flexibility.
Is rte_security_get_pkt_metadata() expected to return a "device 
specific" pointer? If that's the case, we would need another call 
(something like, rte_security_get_userdata()) to get back the userdata, 
right? Or is it fine, if the application assumes it will get userdata 
(the one passed in conf while creating security session) with 
rte_security_get_pkt_metadata()?




Yes it was defined that way and I did not noticed this one at the 
time of it's implementation.
Here, my point is that the application may be using mbuf udata for 
it's own functionality, it should not be modified in the driver.


However, if we need to do this, then we may need to clarify in the 
documentation that for security, udata shall be set with the 
rte_security_set_pkt_metadata() and not otherwise.
Indeed, we should update the doc stating that the set_metadata may 
change the mbuf userdata field so the application should use only 
private data if needed.


Agreed, but it is dependent on which driver/mode(inline or lookaside), 
it will be used.
Lookaside may not need this API as of now. Other implementations may 
also don't require. So this shall be documented that way.


-Akhil





Re: [dpdk-dev] Your next patch "net/virtio: fix an incorrect behavior..."

2017-12-05 Thread Yuanhan Liu
Thanks for the remind! I have reverted it.

--yliu

On Wed, Dec 06, 2017 at 03:11:48PM +0800, Tiwei Bie wrote:
> Hi Yuanhan,
> 
> Thanks for Antonio's reminder!
> 
> Below patch may break vector Rx in some cases [1].
> 
> http://dpdk.org/browse/dpdk-stable/commit/drivers/net/virtio?h=17.08&id=f4e455b776f61ec7fbdfb5dce6ba34ad9016689b
> 
> And I'm working on a fix for it. But as the deadline of 17.08.1
> is approaching, we don't have enough time to get the fix reviewed
> and tested by the community. I'm not sure, maybe we should remove
> it from the stable branch for 17.08.1 for now?
> 
> [1] http://dpdk.org/ml/archives/dev/2017-December/082983.html
> 
> Best regards,
> Tiwei Bie
> 
> On Tue, Dec 05, 2017 at 09:14:39PM +0800, Fischetti, Antonio wrote:
> > Thanks Tiwei,
> > one more thing. I saw they are going to backport some patches into 17.08.1
> > 
> > http://dev.dpdk.narkive.com/pX0IIwPl/dpdk-dev-17-08-1-patches-review-and-test
> > 
> > is your next fix related to the one reported as "net/virtio: flush Rx 
> > queues on start"?
> > 
> > If that's the case, should we flag this to Yuanhan Liu, just to be aware 
> > there's some work in progress?
> > 
> > 
> > Antonio
> >