date:20161010

On Wed, Sep 28, 2016 at 10:28:48AM +0800, Yuanhan Liu wrote:
> On Tue, Sep 27, 2016 at 10:56:40PM +0300, Michael S. Tsirkin wrote:
> > On Tue, Sep 27, 2016 at 11:11:58AM +0800, Yuanhan Liu wrote:
> > > On Mon, Sep 26, 2016 at 10:24:55PM +0300, Michael S. Tsirkin wrote:
> > > > On Mon, Sep 26, 2016 at 11:01:58AM -0700, Stephen Hemminger wrote:
> > > > > I assume that if using Version 1 that the bit will be ignored
> > > 
> > > Yes, but I will just quote what you just said: what if the guest
> > > virtio device is a legacy device? I also gave my reasons in another
> > > email why I consistently set this flag:
> > > 
> > >   - we have to return all features we support to the guest.
> > >   
> > > We don't know the guest is a modern or legacy device. That means
> > > we should claim we support both: VERSION_1 and ANY_LAYOUT.
> > >   
> > > Assume guest is a legacy device and we just set VERSION_1 (the current
> > > case), ANY_LAYOUT will never be negotiated.
> > >   
> > >   - I'm following the way Linux kernel takes: it also set both features.
> > >   
> > >   Maybe, we could unset ANY_LAYOUT when VERSION_1 is _negotiated_?
> > > 
> > > The unset after negotiation I proposed turned out it won't work: the
> > > feature is already negotiated; unsetting it only in vhost side doesn't
> > > change anything. Besides, it may break the migration as Michael stated
> > > below.
> > 
> > I think the reverse. Teach vhost user that for future machine types
> > only VERSION_1 implies ANY_LAYOUT.

So I guess at this point, we can teach vhost-user in qemu
that version 1 implies any_layout but only for machine types
qemu 2.8 and up. It sets a bad precedent but oh well.

-- 
MST

[dpdk-dev] [PATCH v3 0/5] vhost: optimize enqueue

On Sun, Oct 09, 2016 at 12:09:07PM +, Wang, Zhihong wrote:
> > > > Tested with testpmd, host: txonly, guest: rxonly
> > > > size (bytes) improvement (%)
> > > > 644.12
> > > > 128   6
> > > > 256   2.65
> > > > 512   -1.12
> > > > 1024 -7.02
> > >
> > > There is a difference between Zhihong's code and the old I spotted in
> > > the first time: Zhihong removed the avail_idx prefetch. I understand
> > > the prefetch becomes a bit tricky when mrg-rx code path is considered;
> > > thus, I didn't comment on that.
> > >
> > > That's one of the difference that, IMO, could drop a regression. I then
> > > finally got a chance to add it back.
> > >
> > > A rough test shows it improves the performance of 1400B packet size
> > greatly
> > > in the "txonly in host and rxonly in guest" case: +33% is the number I get
> > > with my test server (Ivybridge).
> > 
> > Thanks Yuanhan! I'll validate this on x86.
> 
> Hi Yuanhan,
> 
> Seems your code doesn't perform correctly. I write a new version
> of avail idx prefetch but didn't see any perf benefit.
> 
> To be honest I doubt the benefit of this idea. The previous mrg_off
> code has this method but doesn't give any benefits.

Good point. I thought of that before, too. But you know that I made it
in rush, that I didn't think further and test more.

I looked the code a bit closer this time, and spotted a bug: the prefetch
actually didn't happen, due to following code piece:

if (vq->next_avail_idx >= NR_AVAIL_IDX_PREFETCH) {
prefetch_avail_idx(vq);
...
}

Since vq->next_avail_idx is set to 0 at the entrance of enqueue path,
prefetch_avail_idx() will be called. The fix is easy though: just put
prefetch_avail_idx before invoking enqueue_packet.

In summary, Zhihong is right, I see no more gains with that fix :(

However, as stated, that's kind of the only difference I found between
yours and the old code, that maybe it's still worthwhile to have a
test on ARM, Jianbo?

--yliu

> Even if this is useful, the benefits should be more significant for
> small packets, it's unlikely this simple idx prefetch could bring
> over 30% perf gain for large packets like 1400B ones.
> 
> But if you really do work it out like that I'll be very glad to see.
> 
> Thanks
> Zhihong
> 
> > 
> > >
> > > I guess this might/would help your case as well. Mind to have a test
> > > and tell me the results?
> > >
> > > BTW, I made it in rush; I haven't tested the mrg-rx code path yet.
> > >
> > > Thanks.
> > >
> > >   --yliu

[dpdk-dev] [PATCH] app/testpmd: fix pf/vf check of flow director

Parameters pf & vf are added into most of flow director
filter CLIs.
But mac-valn and tunnel filters don't have these parameters,
the parameters should not be checked for mac-vlan and tunnel
filters.

Fixes: e6a68c013353 ("app/testpmd: extend commands for flow director in VF")
Signed-off-by: Wenzhuo Lu 
---
 app/test-pmd/cmdline.c | 34 +++---
 1 file changed, 19 insertions(+), 15 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index f90befc..2580f27 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -8502,24 +8502,28 @@ cmd_flow_director_filter_parsed(void *parsed_result,
else
entry.action.behavior = RTE_ETH_FDIR_ACCEPT;

-   if (!strcmp(res->pf_vf, "pf"))
-   entry.input.flow_ext.is_vf = 0;
-   else if (!strncmp(res->pf_vf, "vf", 2)) {
-   struct rte_eth_dev_info dev_info;
-
-   memset(&dev_info, 0, sizeof(dev_info));
-   rte_eth_dev_info_get(res->port_id, &dev_info);
-   errno = 0;
-   vf_id = strtoul(res->pf_vf + 2, &end, 10);
-   if (errno != 0 || *end != '\0' || vf_id >= dev_info.max_vfs) {
+   if (fdir_conf.mode !=  RTE_FDIR_MODE_PERFECT_MAC_VLAN &&
+   fdir_conf.mode !=  RTE_FDIR_MODE_PERFECT_TUNNEL) {
+   if (!strcmp(res->pf_vf, "pf"))
+   entry.input.flow_ext.is_vf = 0;
+   else if (!strncmp(res->pf_vf, "vf", 2)) {
+   struct rte_eth_dev_info dev_info;
+
+   memset(&dev_info, 0, sizeof(dev_info));
+   rte_eth_dev_info_get(res->port_id, &dev_info);
+   errno = 0;
+   vf_id = strtoul(res->pf_vf + 2, &end, 10);
+   if (errno != 0 || *end != '\0' ||
+   vf_id >= dev_info.max_vfs) {
+   printf("invalid parameter %s.\n", res->pf_vf);
+   return;
+   }
+   entry.input.flow_ext.is_vf = 1;
+   entry.input.flow_ext.dst_id = (uint16_t)vf_id;
+   } else {
printf("invalid parameter %s.\n", res->pf_vf);
return;
}
-   entry.input.flow_ext.is_vf = 1;
-   entry.input.flow_ext.dst_id = (uint16_t)vf_id;
-   } else {
-   printf("invalid parameter %s.\n", res->pf_vf);
-   return;
}

/* set to report FD ID by default */
-- 
1.9.3

[dpdk-dev] [PATCH 1/2] vhost: enable any layout feature

On Mon, Oct 10, 2016 at 02:20:22AM +0300, Michael S. Tsirkin wrote:
> On Wed, Sep 28, 2016 at 10:28:48AM +0800, Yuanhan Liu wrote:
> > On Tue, Sep 27, 2016 at 10:56:40PM +0300, Michael S. Tsirkin wrote:
> > > On Tue, Sep 27, 2016 at 11:11:58AM +0800, Yuanhan Liu wrote:
> > > > On Mon, Sep 26, 2016 at 10:24:55PM +0300, Michael S. Tsirkin wrote:
> > > > > On Mon, Sep 26, 2016 at 11:01:58AM -0700, Stephen Hemminger wrote:
> > > > > > I assume that if using Version 1 that the bit will be ignored
> > > > 
> > > > Yes, but I will just quote what you just said: what if the guest
> > > > virtio device is a legacy device? I also gave my reasons in another
> > > > email why I consistently set this flag:
> > > > 
> > > >   - we have to return all features we support to the guest.
> > > >   
> > > > We don't know the guest is a modern or legacy device. That means
> > > > we should claim we support both: VERSION_1 and ANY_LAYOUT.
> > > >   
> > > > Assume guest is a legacy device and we just set VERSION_1 (the 
> > > > current
> > > > case), ANY_LAYOUT will never be negotiated.
> > > >   
> > > >   - I'm following the way Linux kernel takes: it also set both features.
> > > >   
> > > >   Maybe, we could unset ANY_LAYOUT when VERSION_1 is _negotiated_?
> > > > 
> > > > The unset after negotiation I proposed turned out it won't work: the
> > > > feature is already negotiated; unsetting it only in vhost side doesn't
> > > > change anything. Besides, it may break the migration as Michael stated
> > > > below.
> > > 
> > > I think the reverse. Teach vhost user that for future machine types
> > > only VERSION_1 implies ANY_LAYOUT.
> 
> So I guess at this point, we can teach vhost-user in qemu
> that version 1 implies any_layout but only for machine types
> qemu 2.8 and up. It sets a bad precedent but oh well.

It should work.

--yliu

[dpdk-dev] [PATCH 1/2] vhost: enable any layout feature

On Mon, Oct 10, 2016 at 11:03:33AM +0800, Yuanhan Liu wrote:
> On Mon, Oct 10, 2016 at 02:20:22AM +0300, Michael S. Tsirkin wrote:
> > On Wed, Sep 28, 2016 at 10:28:48AM +0800, Yuanhan Liu wrote:
> > > On Tue, Sep 27, 2016 at 10:56:40PM +0300, Michael S. Tsirkin wrote:
> > > > On Tue, Sep 27, 2016 at 11:11:58AM +0800, Yuanhan Liu wrote:
> > > > > On Mon, Sep 26, 2016 at 10:24:55PM +0300, Michael S. Tsirkin wrote:
> > > > > > On Mon, Sep 26, 2016 at 11:01:58AM -0700, Stephen Hemminger wrote:
> > > > > > > I assume that if using Version 1 that the bit will be ignored
> > > > > 
> > > > > Yes, but I will just quote what you just said: what if the guest
> > > > > virtio device is a legacy device? I also gave my reasons in another
> > > > > email why I consistently set this flag:
> > > > > 
> > > > >   - we have to return all features we support to the guest.
> > > > >   
> > > > > We don't know the guest is a modern or legacy device. That means
> > > > > we should claim we support both: VERSION_1 and ANY_LAYOUT.
> > > > >   
> > > > > Assume guest is a legacy device and we just set VERSION_1 (the 
> > > > > current
> > > > > case), ANY_LAYOUT will never be negotiated.
> > > > >   
> > > > >   - I'm following the way Linux kernel takes: it also set both 
> > > > > features.
> > > > >   
> > > > >   Maybe, we could unset ANY_LAYOUT when VERSION_1 is _negotiated_?
> > > > > 
> > > > > The unset after negotiation I proposed turned out it won't work: the
> > > > > feature is already negotiated; unsetting it only in vhost side doesn't
> > > > > change anything. Besides, it may break the migration as Michael stated
> > > > > below.
> > > > 
> > > > I think the reverse. Teach vhost user that for future machine types
> > > > only VERSION_1 implies ANY_LAYOUT.
> > 
> > So I guess at this point, we can teach vhost-user in qemu
> > that version 1 implies any_layout but only for machine types
> > qemu 2.8 and up. It sets a bad precedent but oh well.
> 
> It should work.
> 
>   --yliu

Cool. Want to post a patch?

-- 
MST

[dpdk-dev] [PATCH 1/2] vhost: enable any layout feature

On Mon, Oct 10, 2016 at 06:04:32AM +0300, Michael S. Tsirkin wrote:
> > > So I guess at this point, we can teach vhost-user in qemu
> > > that version 1 implies any_layout but only for machine types
> > > qemu 2.8 and up. It sets a bad precedent but oh well.
> > 
> > It should work.
> > 
> > --yliu
> 
> Cool. Want to post a patch?

I could have a try this week (Well, it's very unlikely though).
If not, it will be postponed for a while: I am traveling next week.

--yliu

[dpdk-dev] [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature

On Thu, Sep 29, 2016 at 11:21:48PM +0300, Michael S. Tsirkin wrote:
> On Thu, Sep 29, 2016 at 10:05:22PM +0200, Maxime Coquelin wrote:
> > 
> > 
> > On 09/29/2016 07:57 PM, Michael S. Tsirkin wrote:
> Yes but two points.
> 
> 1. why is this memset expensive?

I don't have the exact answer, but just some rough thoughts:

It's an external clib function: there is a call stack and the
IP register will bounch back and forth. BTW, It's kind of an
overkill to use that for resetting 14 bytes structure.

Some trick like
*(struct virtio_net_hdr *)hdr = {0, };

Or even 
hdr->xxx = 0;
hdr->yyy = 0;

should behaviour better.

There was an example: the vhost enqueue optmization patchset from
Zhihong [0] uses memset, and it introduces more than 15% drop (IIRC)
on my Ivybridge server: it has no such issue on his server though.

[0]: http://dpdk.org/ml/archives/dev/2016-August/045272.html

--yliu

> Is the test completely skipping looking
>at the packet otherwise?
> 
> 2. As long as we are doing this, see
>   Alignment vs. Networking
>   
> in Documentation/unaligned-memory-access.txt
> 
> 
> > From the micro-benchmarks results, we can expect +10% compared to
> > indirect descriptors, and + 5% compared to using 2 descs in the
> > virtqueue.
> > Also, it should have the same benefits as indirect descriptors for 0%
> > pkt loss (as we can fill 2x more packets in the virtqueue).
> > 
> > What do you think?
> > 
> > Thanks,
> > Maxime

[dpdk-dev] [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature

On Mon, Oct 10, 2016 at 11:37:44AM +0800, Yuanhan Liu wrote:
> On Thu, Sep 29, 2016 at 11:21:48PM +0300, Michael S. Tsirkin wrote:
> > On Thu, Sep 29, 2016 at 10:05:22PM +0200, Maxime Coquelin wrote:
> > > 
> > > 
> > > On 09/29/2016 07:57 PM, Michael S. Tsirkin wrote:
> > Yes but two points.
> > 
> > 1. why is this memset expensive?
> 
> I don't have the exact answer, but just some rough thoughts:
> 
> It's an external clib function: there is a call stack and the
> IP register will bounch back and forth.

for memset 0?  gcc 5.3.1 on fedora happily inlines it.

> BTW, It's kind of an
> overkill to use that for resetting 14 bytes structure.
> 
> Some trick like
> *(struct virtio_net_hdr *)hdr = {0, };
> 
> Or even 
> hdr->xxx = 0;
> hdr->yyy = 0;
> 
> should behaviour better.
> 
> There was an example: the vhost enqueue optmization patchset from
> Zhihong [0] uses memset, and it introduces more than 15% drop (IIRC)
> on my Ivybridge server: it has no such issue on his server though.
> 
> [0]: http://dpdk.org/ml/archives/dev/2016-August/045272.html
> 
>   --yliu

I'd say that's weird. what's your config? any chance you
are using an old compiler?


> > Is the test completely skipping looking
> >at the packet otherwise?
> > 
> > 2. As long as we are doing this, see
> > Alignment vs. Networking
> > 
> > in Documentation/unaligned-memory-access.txt
> > 
> > 
> > > From the micro-benchmarks results, we can expect +10% compared to
> > > indirect descriptors, and + 5% compared to using 2 descs in the
> > > virtqueue.
> > > Also, it should have the same benefits as indirect descriptors for 0%
> > > pkt loss (as we can fill 2x more packets in the virtqueue).
> > > 
> > > What do you think?
> > > 
> > > Thanks,
> > > Maxime

[dpdk-dev] [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature

On Thu, Sep 29, 2016 at 10:05:22PM +0200, Maxime Coquelin wrote:
> >so doing this unconditionally would be a spec violation, but if you see
> >value in this, we can add a feature bit.
> Right it would be a spec violation, so it should be done conditionally.
> If a feature bit is to be added, what about VIRTIO_NET_F_NO_TX_HEADER?
> It would imply VIRTIO_NET_F_CSUM not set, and no GSO features set.
> If negotiated, we wouldn't need to prepend a header.

If we could skip Tx header, I think we could also skip Rx header, in the
case when mrg-rx is aslo turned off?

> From the micro-benchmarks results, we can expect +10% compared to
> indirect descriptors, and + 5% compared to using 2 descs in the
> virtqueue.
> Also, it should have the same benefits as indirect descriptors for 0%
> pkt loss (as we can fill 2x more packets in the virtqueue).
> 
> What do you think?

I would vote for this. It should yield maximum performance for the case
that it's guaranteed that packet size will always fit in a typical MTU
(1500).

--yliu

[dpdk-dev] [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature

On Mon, Oct 10, 2016 at 06:46:44AM +0300, Michael S. Tsirkin wrote:
> On Mon, Oct 10, 2016 at 11:37:44AM +0800, Yuanhan Liu wrote:
> > On Thu, Sep 29, 2016 at 11:21:48PM +0300, Michael S. Tsirkin wrote:
> > > On Thu, Sep 29, 2016 at 10:05:22PM +0200, Maxime Coquelin wrote:
> > > > 
> > > > 
> > > > On 09/29/2016 07:57 PM, Michael S. Tsirkin wrote:
> > > Yes but two points.
> > > 
> > > 1. why is this memset expensive?
> > 
> > I don't have the exact answer, but just some rough thoughts:
> > 
> > It's an external clib function: there is a call stack and the
> > IP register will bounch back and forth.
> 
> for memset 0?  gcc 5.3.1 on fedora happily inlines it.

Good to know!

> > overkill to use that for resetting 14 bytes structure.
> > 
> > Some trick like
> > *(struct virtio_net_hdr *)hdr = {0, };
> > 
> > Or even 
> > hdr->xxx = 0;
> > hdr->yyy = 0;
> > 
> > should behaviour better.
> > 
> > There was an example: the vhost enqueue optmization patchset from
> > Zhihong [0] uses memset, and it introduces more than 15% drop (IIRC)
> > on my Ivybridge server: it has no such issue on his server though.
> > 
> > [0]: http://dpdk.org/ml/archives/dev/2016-August/045272.html
> > 
> > --yliu
> 
> I'd say that's weird. what's your config? any chance you
> are using an old compiler?

Not really, it's gcc 5.3.1. Maybe Zhihong could explain more. IIRC,
he said the memset is not well optimized for Ivybridge server.

--yliu

[dpdk-dev] [PATCH v2] i40: fix the VXLAN TSO issue

NACK.

This fix has been done by a new one which is already merged:  
http://dpdk.org/dev/patchwork/patch/15059/


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Zhe Tao
> Sent: Thursday, July 7, 2016 12:27 PM
> To: dev at dpdk.org
> Cc: zhe.tao at intel.com; Wu, Jingjing 
> Subject: [dpdk-dev] [PATCH v2] i40: fix the VXLAN TSO issue
> 
> Problem:
> When using the TSO + VXLAN feature in i40e, the outer UDP length fields in
> the multiple UDP segments which are TSOed by the i40e will have the wrong
> value.
> 
> Fix this problem by adding the tunnel type field in the i40e descriptor which
> was missed before.
> 
> Fixes: 77b8301733c3 ("i40e: VXLAN Tx checksum offload")
> 
> Signed-off-by: Zhe Tao 
> ---
> V2: Edited some comments for mbuf structure and i40e driver.
> 
>  app/test-pmd/csumonly.c  | 26 +++---
>  drivers/net/i40e/i40e_rxtx.c | 12 +---
>  lib/librte_mbuf/rte_mbuf.h   | 16 +++-
>  3 files changed, 43 insertions(+), 11 deletions(-)
> 
> diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c index
> ac4bd8f..d423c20 100644
> --- a/app/test-pmd/csumonly.c
> +++ b/app/test-pmd/csumonly.c
> @@ -204,7 +204,8 @@ parse_ethernet(struct ether_hdr *eth_hdr, struct
> testpmd_offload_info *info)  static void  parse_vxlan(struct udp_hdr
> *udp_hdr,
>   struct testpmd_offload_info *info,
> - uint32_t pkt_type)
> + uint32_t pkt_type,
> + uint64_t *ol_flags)
>  {
>   struct ether_hdr *eth_hdr;
> 
> @@ -215,6 +216,7 @@ parse_vxlan(struct udp_hdr *udp_hdr,
>   RTE_ETH_IS_TUNNEL_PKT(pkt_type) == 0)
>   return;
> 
> + *ol_flags |= PKT_TX_TUNNEL_VXLAN;
>   info->is_tunnel = 1;
>   info->outer_ethertype = info->ethertype;
>   info->outer_l2_len = info->l2_len;
> @@ -231,7 +233,9 @@ parse_vxlan(struct udp_hdr *udp_hdr,
> 
>  /* Parse a gre header */
>  static void
> -parse_gre(struct simple_gre_hdr *gre_hdr, struct testpmd_offload_info
> *info)
> +parse_gre(struct simple_gre_hdr *gre_hdr,
> +   struct testpmd_offload_info *info,
> +   uint64_t *ol_flags)
>  {
>   struct ether_hdr *eth_hdr;
>   struct ipv4_hdr *ipv4_hdr;
> @@ -242,6 +246,8 @@ parse_gre(struct simple_gre_hdr *gre_hdr, struct
> testpmd_offload_info *info)
>   if ((gre_hdr->flags & _htons(~GRE_SUPPORTED_FIELDS)) != 0)
>   return;
> 
> + *ol_flags |= PKT_TX_TUNNEL_GRE;
> +
>   gre_len += sizeof(struct simple_gre_hdr);
> 
>   if (gre_hdr->flags & _htons(GRE_KEY_PRESENT)) @@ -417,7 +423,7
> @@ process_inner_cksums(void *l3_hdr, const struct testpmd_offload_info
> *info,
>   * packet */
>  static uint64_t
>  process_outer_cksums(void *outer_l3_hdr, struct testpmd_offload_info
> *info,
> - uint16_t testpmd_ol_flags)
> + uint16_t testpmd_ol_flags, uint64_t orig_ol_flags)
>  {
>   struct ipv4_hdr *ipv4_hdr = outer_l3_hdr;
>   struct ipv6_hdr *ipv6_hdr = outer_l3_hdr; @@ -442,6 +448,9 @@
> process_outer_cksums(void *outer_l3_hdr, struct testpmd_offload_info
> *info,
>* hardware supporting it today, and no API for it. */
> 
>   udp_hdr = (struct udp_hdr *)((char *)outer_l3_hdr + info-
> >outer_l3_len);
> + if ((orig_ol_flags & PKT_TX_TCP_SEG) &&
> + ((orig_ol_flags & PKT_TX_TUNNEL_MASK) ==
> PKT_TX_TUNNEL_VXLAN))
> + udp_hdr->dgram_cksum = 0;
>   /* do not recalculate udp cksum if it was 0 */
>   if (udp_hdr->dgram_cksum != 0) {
>   udp_hdr->dgram_cksum = 0;
> @@ -705,15 +714,18 @@ pkt_burst_checksum_forward(struct fwd_stream
> *fs)
>   if (info.l4_proto == IPPROTO_UDP) {
>   struct udp_hdr *udp_hdr;
>   udp_hdr = (struct udp_hdr *)((char *)l3_hdr
> +
> - info.l3_len);
> - parse_vxlan(udp_hdr, &info, m-
> >packet_type);
> +info.l3_len);
> + parse_vxlan(udp_hdr, &info, m-
> >packet_type,
> + &ol_flags);
>   } else if (info.l4_proto == IPPROTO_GRE) {
>   struct simple_gre_hdr *gre_hdr;
>   gre_hdr = (struct simple_gre_hdr *)
>   ((char *)l3_hdr + info.l3_len);
> - parse_gre(gre_hdr, &info);
> + parse_gre(gre_hdr, &info, &ol_flags);
>   } else if (info.l4_proto == IPPROTO_IPIP) {
>   void *encap_ip_hdr;
> +
> + ol_flags |= PKT_TX_TUNNEL_IPIP;
>   encap_ip_hdr = (char *)l3_hdr + info.l3_len;
>   parse_encap_ip(encap_ip_hdr, &info);
>   }
> @@ -745,7 +757,7 @@ pkt_burst_checksum_forward(struct fwd_stream *fs)
>* proces

[dpdk-dev] [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature

On Fri, Sep 30, 2016 at 10:16:43PM +0300, Michael S. Tsirkin wrote:
> > > And the same is done is done in DPDK:
> > > 
> > > static inline int __attribute__((always_inline))
> > > copy_desc_to_mbuf(struct virtio_net *dev, struct vring_desc *descs,
> > >   uint16_t max_desc, struct rte_mbuf *m, uint16_t desc_idx,
> > >   struct rte_mempool *mbuf_pool)
> > > {
> > > ...
> > > /*
> > >  * A virtio driver normally uses at least 2 desc buffers
> > >  * for Tx: the first for storing the header, and others
> > >  * for storing the data.
> > >  */
> > > if (likely((desc->len == dev->vhost_hlen) &&
> > >(desc->flags & VRING_DESC_F_NEXT) != 0)) {
> > > desc = &descs[desc->next];
> > > if (unlikely(desc->flags & VRING_DESC_F_INDIRECT))
> > > return -1;
> > > 
> > > desc_addr = gpa_to_vva(dev, desc->addr);
> > > if (unlikely(!desc_addr))
> > > return -1;
> > > 
> > > rte_prefetch0((void *)(uintptr_t)desc_addr);
> > > 
> > > desc_offset = 0;
> > > desc_avail  = desc->len;
> > > nr_desc+= 1;
> > > 
> > > PRINT_PACKET(dev, (uintptr_t)desc_addr, desc->len, 0);
> > > } else {
> > > desc_avail  = desc->len - dev->vhost_hlen;
> > > desc_offset = dev->vhost_hlen;
> > > }
> > 
> > Actually, the header is parsed in DPDK vhost implementation.
> > But as Virtio PMD provides a zero'ed header, we could just parse
> > the header only if VIRTIO_NET_F_NO_TX_HEADER is not negotiated.
> 
> host can always skip the header parse if it wants to.
> It didn't seem worth it to add branches there but
> if I'm wrong, by all means code it up.

It's added by following commit, which yields about 10% performance
boosts for PVP case (with 64B packet size).

At that time, a packet always use 2 descs. Since indirect desc is
enabled (by default) now, the assumption is not true then. What's
worse, it might even slow things a bit down. That should also be
part of the reason why performance is slightly worse than before.

--yliu

commit 1d41d77cf81c448c1b09e1e859bfd300e2054a98
Author: Yuanhan Liu 
Date:   Mon May 2 17:46:17 2016 -0700

vhost: optimize dequeue for small packets

A virtio driver normally uses at least 2 desc buffers for Tx: the
first for storing the header, and the others for storing the data.

Therefore, we could fetch the first data desc buf before the main
loop, and do the copy first before the check of "are we done yet?".
This could save one check for small packets that just have one data
desc buffer and need one mbuf to store it.

Signed-off-by: Yuanhan Liu 
Acked-by: Huawei Xie 
Tested-by: Rich Lane

[dpdk-dev] [PATCH v2] i40: fix the VXLAN TSO issue

On Mon, Oct 10, 2016 at 03:58:57AM +, Wu, Jingjing wrote:
> NACK.
> 
> This fix has been done by a new one which is already merged:  
> http://dpdk.org/dev/patchwork/patch/15059/

Just want to let you know, that you don't have to NACK an old patchset. In
the process of the patchwork, ideally, the patch author should mark the old
one as "Superseded" once he send out a new version.

Meanwhile, the committers might/could also do the mark while reviewing the
patchwork. I have done that a lot, since I didn't see any one submitted
virtio/vhost patches have done that before :/

--yliu

[dpdk-dev] [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature

2016-10-10 Thread Wang, Zhihong



> -Original Message-
> From: Yuanhan Liu [mailto:yuanhan.liu at linux.intel.com]
> Sent: Monday, October 10, 2016 11:59 AM
> To: Michael S. Tsirkin 
> Cc: Maxime Coquelin ; Stephen Hemminger
> ; dev at dpdk.org; qemu-
> devel at nongnu.org; Wang, Zhihong 
> Subject: Re: [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature
> 
> On Mon, Oct 10, 2016 at 06:46:44AM +0300, Michael S. Tsirkin wrote:
> > On Mon, Oct 10, 2016 at 11:37:44AM +0800, Yuanhan Liu wrote:
> > > On Thu, Sep 29, 2016 at 11:21:48PM +0300, Michael S. Tsirkin wrote:
> > > > On Thu, Sep 29, 2016 at 10:05:22PM +0200, Maxime Coquelin wrote:
> > > > >
> > > > >
> > > > > On 09/29/2016 07:57 PM, Michael S. Tsirkin wrote:
> > > > Yes but two points.
> > > >
> > > > 1. why is this memset expensive?
> > >
> > > I don't have the exact answer, but just some rough thoughts:
> > >
> > > It's an external clib function: there is a call stack and the
> > > IP register will bounch back and forth.
> >
> > for memset 0?  gcc 5.3.1 on fedora happily inlines it.
> 
> Good to know!
> 
> > > overkill to use that for resetting 14 bytes structure.
> > >
> > > Some trick like
> > > *(struct virtio_net_hdr *)hdr = {0, };
> > >
> > > Or even
> > > hdr->xxx = 0;
> > > hdr->yyy = 0;
> > >
> > > should behaviour better.
> > >
> > > There was an example: the vhost enqueue optmization patchset from
> > > Zhihong [0] uses memset, and it introduces more than 15% drop (IIRC)
> > > on my Ivybridge server: it has no such issue on his server though.
> > >
> > > [0]: http://dpdk.org/ml/archives/dev/2016-August/045272.html
> > >
> > >   --yliu
> >
> > I'd say that's weird. what's your config? any chance you
> > are using an old compiler?
> 
> Not really, it's gcc 5.3.1. Maybe Zhihong could explain more. IIRC,
> he said the memset is not well optimized for Ivybridge server.

The dst is remote in that case. It's fine on Haswell but has complication
in Ivy Bridge which (wasn't supposed to but) causes serious frontend issue.

I don't think gcc inlined it there. I'm using fc24 gcc 6.1.1.

> 
>   --yliu

[dpdk-dev] [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature

On Mon, Oct 10, 2016 at 12:05:31PM +0800, Yuanhan Liu wrote:
> On Fri, Sep 30, 2016 at 10:16:43PM +0300, Michael S. Tsirkin wrote:
> > > > And the same is done is done in DPDK:
> > > > 
> > > > static inline int __attribute__((always_inline))
> > > > copy_desc_to_mbuf(struct virtio_net *dev, struct vring_desc *descs,
> > > >   uint16_t max_desc, struct rte_mbuf *m, uint16_t desc_idx,
> > > >   struct rte_mempool *mbuf_pool)
> > > > {
> > > > ...
> > > > /*
> > > >  * A virtio driver normally uses at least 2 desc buffers
> > > >  * for Tx: the first for storing the header, and others
> > > >  * for storing the data.
> > > >  */
> > > > if (likely((desc->len == dev->vhost_hlen) &&
> > > >(desc->flags & VRING_DESC_F_NEXT) != 0)) {
> > > > desc = &descs[desc->next];
> > > > if (unlikely(desc->flags & VRING_DESC_F_INDIRECT))
> > > > return -1;
> > > > 
> > > > desc_addr = gpa_to_vva(dev, desc->addr);
> > > > if (unlikely(!desc_addr))
> > > > return -1;
> > > > 
> > > > rte_prefetch0((void *)(uintptr_t)desc_addr);
> > > > 
> > > > desc_offset = 0;
> > > > desc_avail  = desc->len;
> > > > nr_desc+= 1;
> > > > 
> > > > PRINT_PACKET(dev, (uintptr_t)desc_addr, desc->len, 0);
> > > > } else {
> > > > desc_avail  = desc->len - dev->vhost_hlen;
> > > > desc_offset = dev->vhost_hlen;
> > > > }
> > > 
> > > Actually, the header is parsed in DPDK vhost implementation.
> > > But as Virtio PMD provides a zero'ed header, we could just parse
> > > the header only if VIRTIO_NET_F_NO_TX_HEADER is not negotiated.
> > 
> > host can always skip the header parse if it wants to.
> > It didn't seem worth it to add branches there but
> > if I'm wrong, by all means code it up.
> 
> It's added by following commit, which yields about 10% performance
> boosts for PVP case (with 64B packet size).
> 
> At that time, a packet always use 2 descs. Since indirect desc is
> enabled (by default) now, the assumption is not true then. What's
> worse, it might even slow things a bit down. That should also be
> part of the reason why performance is slightly worse than before.
> 
>   --yliu

I'm not sure I get what you are saying

> commit 1d41d77cf81c448c1b09e1e859bfd300e2054a98
> Author: Yuanhan Liu 
> Date:   Mon May 2 17:46:17 2016 -0700
> 
> vhost: optimize dequeue for small packets
> 
> A virtio driver normally uses at least 2 desc buffers for Tx: the
> first for storing the header, and the others for storing the data.
> 
> Therefore, we could fetch the first data desc buf before the main
> loop, and do the copy first before the check of "are we done yet?".
> This could save one check for small packets that just have one data
> desc buffer and need one mbuf to store it.
> 
> Signed-off-by: Yuanhan Liu 
> Acked-by: Huawei Xie 
> Tested-by: Rich Lane 

This fast-paths the 2-descriptors format but it's not active
for indirect descriptors. Is this what you mean?
Should be a simple matter to apply this optimization for indirect.

[dpdk-dev] [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature

On Mon, Oct 10, 2016 at 07:17:06AM +0300, Michael S. Tsirkin wrote:
> On Mon, Oct 10, 2016 at 12:05:31PM +0800, Yuanhan Liu wrote:
> > On Fri, Sep 30, 2016 at 10:16:43PM +0300, Michael S. Tsirkin wrote:
> > > > > And the same is done is done in DPDK:
> > > > > 
> > > > > static inline int __attribute__((always_inline))
> > > > > copy_desc_to_mbuf(struct virtio_net *dev, struct vring_desc *descs,
> > > > >   uint16_t max_desc, struct rte_mbuf *m, uint16_t desc_idx,
> > > > >   struct rte_mempool *mbuf_pool)
> > > > > {
> > > > > ...
> > > > > /*
> > > > >  * A virtio driver normally uses at least 2 desc buffers
> > > > >  * for Tx: the first for storing the header, and others
> > > > >  * for storing the data.
> > > > >  */
> > > > > if (likely((desc->len == dev->vhost_hlen) &&
> > > > >(desc->flags & VRING_DESC_F_NEXT) != 0)) {
> > > > > desc = &descs[desc->next];
> > > > > if (unlikely(desc->flags & VRING_DESC_F_INDIRECT))
> > > > > return -1;
> > > > > 
> > > > > desc_addr = gpa_to_vva(dev, desc->addr);
> > > > > if (unlikely(!desc_addr))
> > > > > return -1;
> > > > > 
> > > > > rte_prefetch0((void *)(uintptr_t)desc_addr);
> > > > > 
> > > > > desc_offset = 0;
> > > > > desc_avail  = desc->len;
> > > > > nr_desc+= 1;
> > > > > 
> > > > > PRINT_PACKET(dev, (uintptr_t)desc_addr, desc->len, 0);
> > > > > } else {
> > > > > desc_avail  = desc->len - dev->vhost_hlen;
> > > > > desc_offset = dev->vhost_hlen;
> > > > > }
> > > > 
> > > > Actually, the header is parsed in DPDK vhost implementation.
> > > > But as Virtio PMD provides a zero'ed header, we could just parse
> > > > the header only if VIRTIO_NET_F_NO_TX_HEADER is not negotiated.
> > > 
> > > host can always skip the header parse if it wants to.
> > > It didn't seem worth it to add branches there but
> > > if I'm wrong, by all means code it up.
> > 
> > It's added by following commit, which yields about 10% performance
> > boosts for PVP case (with 64B packet size).
> > 
> > At that time, a packet always use 2 descs. Since indirect desc is
> > enabled (by default) now, the assumption is not true then. What's
> > worse, it might even slow things a bit down. That should also be
> > part of the reason why performance is slightly worse than before.
> > 
> > --yliu
> 
> I'm not sure I get what you are saying
> 
> > commit 1d41d77cf81c448c1b09e1e859bfd300e2054a98
> > Author: Yuanhan Liu 
> > Date:   Mon May 2 17:46:17 2016 -0700
> > 
> > vhost: optimize dequeue for small packets
> > 
> > A virtio driver normally uses at least 2 desc buffers for Tx: the
> > first for storing the header, and the others for storing the data.
> > 
> > Therefore, we could fetch the first data desc buf before the main
> > loop, and do the copy first before the check of "are we done yet?".
> > This could save one check for small packets that just have one data
> > desc buffer and need one mbuf to store it.
> > 
> > Signed-off-by: Yuanhan Liu 
> > Acked-by: Huawei Xie 
> > Tested-by: Rich Lane 
> 
> This fast-paths the 2-descriptors format but it's not active
> for indirect descriptors. Is this what you mean?

Yes. It's also not active when ANY_LAYOUT is actually turned on.

> Should be a simple matter to apply this optimization for indirect.

Might be.

--yliu

[dpdk-dev] [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature

On Mon, Oct 10, 2016 at 04:16:19AM +, Wang, Zhihong wrote:
> 
> 
> > -Original Message-
> > From: Yuanhan Liu [mailto:yuanhan.liu at linux.intel.com]
> > Sent: Monday, October 10, 2016 11:59 AM
> > To: Michael S. Tsirkin 
> > Cc: Maxime Coquelin ; Stephen Hemminger
> > ; dev at dpdk.org; qemu-
> > devel at nongnu.org; Wang, Zhihong 
> > Subject: Re: [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature
> > 
> > On Mon, Oct 10, 2016 at 06:46:44AM +0300, Michael S. Tsirkin wrote:
> > > On Mon, Oct 10, 2016 at 11:37:44AM +0800, Yuanhan Liu wrote:
> > > > On Thu, Sep 29, 2016 at 11:21:48PM +0300, Michael S. Tsirkin wrote:
> > > > > On Thu, Sep 29, 2016 at 10:05:22PM +0200, Maxime Coquelin wrote:
> > > > > >
> > > > > >
> > > > > > On 09/29/2016 07:57 PM, Michael S. Tsirkin wrote:
> > > > > Yes but two points.
> > > > >
> > > > > 1. why is this memset expensive?
> > > >
> > > > I don't have the exact answer, but just some rough thoughts:
> > > >
> > > > It's an external clib function: there is a call stack and the
> > > > IP register will bounch back and forth.
> > >
> > > for memset 0?  gcc 5.3.1 on fedora happily inlines it.
> > 
> > Good to know!
> > 
> > > > overkill to use that for resetting 14 bytes structure.
> > > >
> > > > Some trick like
> > > > *(struct virtio_net_hdr *)hdr = {0, };
> > > >
> > > > Or even
> > > > hdr->xxx = 0;
> > > > hdr->yyy = 0;
> > > >
> > > > should behaviour better.
> > > >
> > > > There was an example: the vhost enqueue optmization patchset from
> > > > Zhihong [0] uses memset, and it introduces more than 15% drop (IIRC)
> > > > on my Ivybridge server: it has no such issue on his server though.
> > > >
> > > > [0]: http://dpdk.org/ml/archives/dev/2016-August/045272.html
> > > >
> > > > --yliu
> > >
> > > I'd say that's weird. what's your config? any chance you
> > > are using an old compiler?
> > 
> > Not really, it's gcc 5.3.1. Maybe Zhihong could explain more. IIRC,
> > he said the memset is not well optimized for Ivybridge server.
> 
> The dst is remote in that case. It's fine on Haswell but has complication
> in Ivy Bridge which (wasn't supposed to but) causes serious frontend issue.
> 
> I don't think gcc inlined it there. I'm using fc24 gcc 6.1.1.

I just wrote some test code and compiled with gcc -O2,
it did get inlined.

It's probably this:
uint16_t head_size = vq->hw->vtnet_hdr_size;
...
memset(hdr, 0, head_size);
IOW head_size is not known to compiler.

Try sticking a bool there instead of vtnet_hdr_size, and
 memset(hdr, 0, bigheader ? 10 : 12);


> > 
> > --yliu

[dpdk-dev] [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature

On Mon, Oct 10, 2016 at 12:22:09PM +0800, Yuanhan Liu wrote:
> On Mon, Oct 10, 2016 at 07:17:06AM +0300, Michael S. Tsirkin wrote:
> > On Mon, Oct 10, 2016 at 12:05:31PM +0800, Yuanhan Liu wrote:
> > > On Fri, Sep 30, 2016 at 10:16:43PM +0300, Michael S. Tsirkin wrote:
> > > > > > And the same is done is done in DPDK:
> > > > > > 
> > > > > > static inline int __attribute__((always_inline))
> > > > > > copy_desc_to_mbuf(struct virtio_net *dev, struct vring_desc *descs,
> > > > > >   uint16_t max_desc, struct rte_mbuf *m, uint16_t desc_idx,
> > > > > >   struct rte_mempool *mbuf_pool)
> > > > > > {
> > > > > > ...
> > > > > > /*
> > > > > >  * A virtio driver normally uses at least 2 desc buffers
> > > > > >  * for Tx: the first for storing the header, and others
> > > > > >  * for storing the data.
> > > > > >  */
> > > > > > if (likely((desc->len == dev->vhost_hlen) &&
> > > > > >(desc->flags & VRING_DESC_F_NEXT) != 0)) {
> > > > > > desc = &descs[desc->next];
> > > > > > if (unlikely(desc->flags & VRING_DESC_F_INDIRECT))
> > > > > > return -1;
> > > > > > 
> > > > > > desc_addr = gpa_to_vva(dev, desc->addr);
> > > > > > if (unlikely(!desc_addr))
> > > > > > return -1;
> > > > > > 
> > > > > > rte_prefetch0((void *)(uintptr_t)desc_addr);
> > > > > > 
> > > > > > desc_offset = 0;
> > > > > > desc_avail  = desc->len;
> > > > > > nr_desc+= 1;
> > > > > > 
> > > > > > PRINT_PACKET(dev, (uintptr_t)desc_addr, desc->len, 0);
> > > > > > } else {
> > > > > > desc_avail  = desc->len - dev->vhost_hlen;
> > > > > > desc_offset = dev->vhost_hlen;
> > > > > > }
> > > > > 
> > > > > Actually, the header is parsed in DPDK vhost implementation.
> > > > > But as Virtio PMD provides a zero'ed header, we could just parse
> > > > > the header only if VIRTIO_NET_F_NO_TX_HEADER is not negotiated.
> > > > 
> > > > host can always skip the header parse if it wants to.
> > > > It didn't seem worth it to add branches there but
> > > > if I'm wrong, by all means code it up.
> > > 
> > > It's added by following commit, which yields about 10% performance
> > > boosts for PVP case (with 64B packet size).
> > > 
> > > At that time, a packet always use 2 descs. Since indirect desc is
> > > enabled (by default) now, the assumption is not true then. What's
> > > worse, it might even slow things a bit down. That should also be
> > > part of the reason why performance is slightly worse than before.
> > > 
> > >   --yliu
> > 
> > I'm not sure I get what you are saying
> > 
> > > commit 1d41d77cf81c448c1b09e1e859bfd300e2054a98
> > > Author: Yuanhan Liu 
> > > Date:   Mon May 2 17:46:17 2016 -0700
> > > 
> > > vhost: optimize dequeue for small packets
> > > 
> > > A virtio driver normally uses at least 2 desc buffers for Tx: the
> > > first for storing the header, and the others for storing the data.
> > > 
> > > Therefore, we could fetch the first data desc buf before the main
> > > loop, and do the copy first before the check of "are we done yet?".
> > > This could save one check for small packets that just have one data
> > > desc buffer and need one mbuf to store it.
> > > 
> > > Signed-off-by: Yuanhan Liu 
> > > Acked-by: Huawei Xie 
> > > Tested-by: Rich Lane 
> > 
> > This fast-paths the 2-descriptors format but it's not active
> > for indirect descriptors. Is this what you mean?
> 
> Yes. It's also not active when ANY_LAYOUT is actually turned on.

It's not needed there though - you only use 1 desc, no point in
fetching the next one.


> > Should be a simple matter to apply this optimization for indirect.
> 
> Might be.
> 
>   --yliu

[dpdk-dev] [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature

On Mon, Oct 10, 2016 at 04:16:19AM +, Wang, Zhihong wrote:
> 
> 
> > -Original Message-
> > From: Yuanhan Liu [mailto:yuanhan.liu at linux.intel.com]
> > Sent: Monday, October 10, 2016 11:59 AM
> > To: Michael S. Tsirkin 
> > Cc: Maxime Coquelin ; Stephen Hemminger
> > ; dev at dpdk.org; qemu-
> > devel at nongnu.org; Wang, Zhihong 
> > Subject: Re: [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature
> > 
> > On Mon, Oct 10, 2016 at 06:46:44AM +0300, Michael S. Tsirkin wrote:
> > > On Mon, Oct 10, 2016 at 11:37:44AM +0800, Yuanhan Liu wrote:
> > > > On Thu, Sep 29, 2016 at 11:21:48PM +0300, Michael S. Tsirkin wrote:
> > > > > On Thu, Sep 29, 2016 at 10:05:22PM +0200, Maxime Coquelin wrote:
> > > > > >
> > > > > >
> > > > > > On 09/29/2016 07:57 PM, Michael S. Tsirkin wrote:
> > > > > Yes but two points.
> > > > >
> > > > > 1. why is this memset expensive?
> > > >
> > > > I don't have the exact answer, but just some rough thoughts:
> > > >
> > > > It's an external clib function: there is a call stack and the
> > > > IP register will bounch back and forth.
> > >
> > > for memset 0?  gcc 5.3.1 on fedora happily inlines it.
> > 
> > Good to know!
> > 
> > > > overkill to use that for resetting 14 bytes structure.
> > > >
> > > > Some trick like
> > > > *(struct virtio_net_hdr *)hdr = {0, };
> > > >
> > > > Or even
> > > > hdr->xxx = 0;
> > > > hdr->yyy = 0;
> > > >
> > > > should behaviour better.
> > > >
> > > > There was an example: the vhost enqueue optmization patchset from
> > > > Zhihong [0] uses memset, and it introduces more than 15% drop (IIRC)
> > > > on my Ivybridge server: it has no such issue on his server though.
> > > >
> > > > [0]: http://dpdk.org/ml/archives/dev/2016-August/045272.html
> > > >
> > > > --yliu
> > >
> > > I'd say that's weird. what's your config? any chance you
> > > are using an old compiler?
> > 
> > Not really, it's gcc 5.3.1. Maybe Zhihong could explain more. IIRC,
> > he said the memset is not well optimized for Ivybridge server.
> 
> The dst is remote in that case. It's fine on Haswell but has complication
> in Ivy Bridge which (wasn't supposed to but) causes serious frontend issue.
> 
> I don't think gcc inlined it there. I'm using fc24 gcc 6.1.1.


So try something like this then:

Signed-off-by: Michael S. Tsirkin 


diff --git a/drivers/net/virtio/virtio_pci.h b/drivers/net/virtio/virtio_pci.h
index dd7693f..7a3f88e 100644
--- a/drivers/net/virtio/virtio_pci.h
+++ b/drivers/net/virtio/virtio_pci.h
@@ -292,6 +292,16 @@ vtpci_with_feature(struct virtio_hw *hw, uint64_t bit)
return (hw->guest_features & (1ULL << bit)) != 0;
 }

+static inline int
+vtnet_hdr_size(struct virtio_hw *hw)
+{
+   if (vtpci_with_feature(hw, VIRTIO_NET_F_MRG_RXBUF) ||
+   vtpci_with_feature(hw, VIRTIO_F_VERSION_1))
+   return sizeof(struct virtio_net_hdr_mrg_rxbuf);
+   else
+   return sizeof(struct virtio_net_hdr);
+}
+
 /*
  * Function declaration from virtio_pci.c
  */
diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index a27208e..21a45e1 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -216,7 +216,7 @@ virtqueue_enqueue_xmit(struct virtnet_tx *txvq, struct 
rte_mbuf *cookie,
struct vring_desc *start_dp;
uint16_t seg_num = cookie->nb_segs;
uint16_t head_idx, idx;
-   uint16_t head_size = vq->hw->vtnet_hdr_size;
+   uint16_t head_size = vtnet_hdr_size(vq->hw);
unsigned long offs;

head_idx = vq->vq_desc_head_idx;

Generally pointer chasing in vq->hw->vtnet_hdr_size can't be good
for performance. Move fields used on data path into vq
and use from there to avoid indirections?

[dpdk-dev] [PATCH v2] drivers: prefix driver REGISTER macro with RTE PMD

2016-10-10 Thread Shreyansh Jain

On Monday 10 October 2016 01:20 AM, Thomas Monjalon wrote:
> 2016-10-09 15:12, Shreyansh Jain:
>> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
>>> 2016-10-08 23:35, Shreyansh Jain:
 +PMDINFO_TO_O = if grep -E 'RTE_PMD_REGISTER_PCI\([0-9a-zA-Z,_\.
>>> ]+\)|RTE_PMD_REGISTER_VDEV\([0-9a-zA-Z,_\. ]+\)' $<;\
 +   then \
>>>
>>> I don't understand why you don't simply grep 'RTE_PMD_REGISTER_.*(' ?
>>
>> Because I want to make sure that the grep matches only the DRIVER 
>> registration functions.
>> In case a new macro (or driver type) is added in future, this macro can be 
>> updated. This way we can reduce the probability of a faulty match.
>>
>> Is there a problem with closest possible match?
>
> It is just long and useless. A macro starting with RTE_PMD_REGISTER_ must
> be called from a PMD. What else?

Long, yes. But I don't know why you state it as useless. Reducing 
probability of a false positive is what it does.

Anyways, I will send a v3 with "grep -Eq 'DRIVER_REGISTER_.*\(.*\)'".

-
Shreyansh

[dpdk-dev] [PATCH v2] drivers: prefix driver REGISTER macro with RTE PMD

2016-10-10 Thread Shreyansh Jain

On Monday 10 October 2016 10:41 AM, Shreyansh Jain wrote:
> On Monday 10 October 2016 01:20 AM, Thomas Monjalon wrote:
>> 2016-10-09 15:12, Shreyansh Jain:
>>> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
 2016-10-08 23:35, Shreyansh Jain:
> +PMDINFO_TO_O = if grep -E 'RTE_PMD_REGISTER_PCI\([0-9a-zA-Z,_\.
 ]+\)|RTE_PMD_REGISTER_VDEV\([0-9a-zA-Z,_\. ]+\)' $<;\
> +   then \

 I don't understand why you don't simply grep 'RTE_PMD_REGISTER_.*(' ?
>>>
>>> Because I want to make sure that the grep matches only the DRIVER
>>> registration functions.
>>> In case a new macro (or driver type) is added in future, this macro
>>> can be updated. This way we can reduce the probability of a faulty
>>> match.
>>>
>>> Is there a problem with closest possible match?
>>
>> It is just long and useless. A macro starting with RTE_PMD_REGISTER_ must
>> be called from a PMD. What else?
>
> Long, yes. But I don't know why you state it as useless. Reducing
> probability of a false positive is what it does.
>
> Anyways, I will send a v3 with "grep -Eq 'DRIVER_REGISTER_.*\(.*\)'".

I meant "grep -q 'RTE_PMD_REGISTER_.*(.*)'".
Minimal changes from existing.

-
Shreyansh

[dpdk-dev] [PATCH v2] test_cryptodev_perf: IV and digest should be stored at a DMAeble address

2016-10-10 Thread Akhil Goyal

On 10/8/2016 3:06 AM, De Lara Guarch, Pablo wrote:
> Hi Akhil,
>
>> -Original Message-
>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of
>> akhil.goyal at nxp.com
>> Sent: Friday, October 07, 2016 10:06 AM
>> To: Kusztal, ArkadiuszX; Doherty, Declan
>> Cc: Griffin, John; Trahe, Fiona; Jain, Deepak K; dev at dpdk.org; Akhil Goyal
>> Subject: [dpdk-dev] [PATCH v2] test_cryptodev_perf: IV and digest should be
>> stored at a DMAeble address
>>
>> From: Akhil Goyal 
>>
>> For physical crypto devices, IV and digest are processed by the crypto
>> device which need the contents to be written on some DMA able address.
>>
>> So in order to do that, IV and digest are accomodated in the packet.
>>
>> Signed-off-by: Akhil Goyal 
>> v2: patch rebased
>
> You need to rebase against the HEAD of the dpdk-next-crypto subtree:
> (http://dpdk.org/browse/next/dpdk-next-crypto/).
>
> Thanks!
> Pablo
>
>
>
>
>
Hi Pablo,

I have already rebased this patch to dpdk-next-crypto subtree. Please 
let me know if there is any issue.
Here is my git log
779f1301a382b6b9e2877fd0357bba33d1242d65 test_cryptodev_perf: IV and 
digest should be stored at a DMAeble address
8c9fdf4568768b7765fc2176e400a860dc758020 app/test: remove hard-coding of 
crypto num qps
c1876c1cb90f0882ada0acd9e430be7cf63bc765 app/test: cleanup unnecessary 
ring size setup
136592c3a350ded56438b59cc4921a243f08e1d0 app/test: remove pointless for loop
fca4f966b42adc0c8f3e1d43a94d93ea4fcb crypto/aesni_mb: free ring 
memory on qp release in PMD

Regards

[dpdk-dev] [PATCH v3 0/5] vhost: optimize enqueue

2016-10-10 Thread Jianbo Liu

On 10 October 2016 at 10:44, Yuanhan Liu  wrote:
> On Sun, Oct 09, 2016 at 12:09:07PM +, Wang, Zhihong wrote:
>> > > > Tested with testpmd, host: txonly, guest: rxonly
>> > > > size (bytes) improvement (%)
>> > > > 644.12
>> > > > 128   6
>> > > > 256   2.65
>> > > > 512   -1.12
>> > > > 1024 -7.02
>> > >
>> > > There is a difference between Zhihong's code and the old I spotted in
>> > > the first time: Zhihong removed the avail_idx prefetch. I understand
>> > > the prefetch becomes a bit tricky when mrg-rx code path is considered;
>> > > thus, I didn't comment on that.
>> > >
>> > > That's one of the difference that, IMO, could drop a regression. I then
>> > > finally got a chance to add it back.
>> > >
>> > > A rough test shows it improves the performance of 1400B packet size
>> > greatly
>> > > in the "txonly in host and rxonly in guest" case: +33% is the number I 
>> > > get
>> > > with my test server (Ivybridge).
>> >
>> > Thanks Yuanhan! I'll validate this on x86.
>>
>> Hi Yuanhan,
>>
>> Seems your code doesn't perform correctly. I write a new version
>> of avail idx prefetch but didn't see any perf benefit.
>>
>> To be honest I doubt the benefit of this idea. The previous mrg_off
>> code has this method but doesn't give any benefits.
>
> Good point. I thought of that before, too. But you know that I made it
> in rush, that I didn't think further and test more.
>
> I looked the code a bit closer this time, and spotted a bug: the prefetch
> actually didn't happen, due to following code piece:
>
> if (vq->next_avail_idx >= NR_AVAIL_IDX_PREFETCH) {
> prefetch_avail_idx(vq);
> ...
> }
>
> Since vq->next_avail_idx is set to 0 at the entrance of enqueue path,
> prefetch_avail_idx() will be called. The fix is easy though: just put
> prefetch_avail_idx before invoking enqueue_packet.
>
> In summary, Zhihong is right, I see no more gains with that fix :(
>
> However, as stated, that's kind of the only difference I found between
> yours and the old code, that maybe it's still worthwhile to have a
> test on ARM, Jianbo?
>
I haven't tested it, but I think it could be no improvement for ARM either.

A smalll suggestion for enqueue_packet:

.
+   /* start copy from mbuf to desc */
+   while (mbuf_avail || mbuf->next) {
.

Considering pkt_len is in the first cache line (same as data_len),
while next pointer is in the second cache line,
is it better to check the total packet len, instead of the last mbuf's
next pointer to jump out of while loop and avoid possible cache miss?

[dpdk-dev] [PATCH v3] drivers: prefix driver REGISTER macro with RTE PMD

2016-10-10 Thread Shreyansh Jain

All macros related to driver registeration renamed from DRIVER_*
to RTE_PMD_*

This includes:

 DRIVER_REGISTER_PCI -> RTE_PMD_REGISTER_PCI
 DRIVER_REGISTER_PCI_TABLE -> RTE_PMD_REGISTER_PCI_TABLE
 DRIVER_REGISTER_VDEV -> RTE_PMD_REGISTER_VDEV
 DRIVER_REGISTER_PARAM_STRING -> RTE_PMD_REGISTER_PARAM_STRING
 DRIVER_EXPORT_* -> RTE_PMD_EXPORT_*

Fix PMDINFOGEN tool to look for matches of RTE_PMD_REGISTER_*.

Signed-off-by: Shreyansh Jain 
--
Changes since v2:
 - Rebase over master
 - revert PMDINFO grep pattern

Changes since v1:
 - Fix patch headline

Changes since v0:
 - expand replacement to DRIVER_EXPORT_*
 - merge all changes into single commit
 - checkpatch fixes

---
 doc/guides/prog_guide/dev_kit_build_system.rst |  2 +-
 drivers/crypto/aesni_gcm/aesni_gcm_pmd.c   |  4 ++--
 drivers/crypto/aesni_mb/rte_aesni_mb_pmd.c |  4 ++--
 drivers/crypto/kasumi/rte_kasumi_pmd.c |  4 ++--
 drivers/crypto/libcrypto/rte_libcrypto_pmd.c   |  5 +++--
 drivers/crypto/null/null_crypto_pmd.c  |  4 ++--
 drivers/crypto/qat/rte_qat_cryptodev.c |  4 ++--
 drivers/crypto/snow3g/rte_snow3g_pmd.c |  4 ++--
 drivers/crypto/zuc/rte_zuc_pmd.c   |  4 ++--
 drivers/net/af_packet/rte_eth_af_packet.c  |  4 ++--
 drivers/net/bnx2x/bnx2x_ethdev.c   |  8 
 drivers/net/bnxt/bnxt_ethdev.c |  4 ++--
 drivers/net/bonding/rte_eth_bond_pmd.c |  4 ++--
 drivers/net/cxgbe/cxgbe_ethdev.c   |  4 ++--
 drivers/net/e1000/em_ethdev.c  |  4 ++--
 drivers/net/e1000/igb_ethdev.c |  8 
 drivers/net/ena/ena_ethdev.c   |  4 ++--
 drivers/net/enic/enic_ethdev.c |  4 ++--
 drivers/net/fm10k/fm10k_ethdev.c   |  4 ++--
 drivers/net/i40e/i40e_ethdev.c |  4 ++--
 drivers/net/i40e/i40e_ethdev_vf.c  |  4 ++--
 drivers/net/ixgbe/ixgbe_ethdev.c   |  8 
 drivers/net/mlx4/mlx4.c|  4 ++--
 drivers/net/mlx5/mlx5.c|  4 ++--
 drivers/net/mpipe/mpipe_tilegx.c   |  4 ++--
 drivers/net/nfp/nfp_net.c  |  4 ++--
 drivers/net/null/rte_eth_null.c|  4 ++--
 drivers/net/pcap/rte_eth_pcap.c|  4 ++--
 drivers/net/qede/qede_ethdev.c |  8 
 drivers/net/ring/rte_eth_ring.c|  4 ++--
 drivers/net/szedata2/rte_eth_szedata2.c|  4 ++--
 drivers/net/thunderx/nicvf_ethdev.c|  4 ++--
 drivers/net/vhost/rte_eth_vhost.c  |  4 ++--
 drivers/net/virtio/virtio_ethdev.c |  4 ++--
 drivers/net/virtio/virtio_user_ethdev.c|  4 ++--
 drivers/net/vmxnet3/vmxnet3_ethdev.c   |  4 ++--
 drivers/net/xenvirt/rte_eth_xenvirt.c  |  4 ++--
 lib/librte_eal/common/eal_common_dev.c |  2 +-
 lib/librte_eal/common/include/rte_dev.h| 10 +-
 lib/librte_eal/common/include/rte_pci.h|  4 ++--
 lib/librte_eal/common/include/rte_vdev.h   |  4 ++--
 mk/internal/rte.compile-pre.mk |  2 +-
 42 files changed, 93 insertions(+), 92 deletions(-)

diff --git a/doc/guides/prog_guide/dev_kit_build_system.rst 
b/doc/guides/prog_guide/dev_kit_build_system.rst
index 05358d0..19de156 100644
--- a/doc/guides/prog_guide/dev_kit_build_system.rst
+++ b/doc/guides/prog_guide/dev_kit_build_system.rst
@@ -264,7 +264,7 @@ instance the macro:

 .. code-block:: c

-   DRIVER_REGISTER_PCI(name, drv)
+   RTE_PMD_REGISTER_PCI(name, drv)

 Creates the following symbol:

diff --git a/drivers/crypto/aesni_gcm/aesni_gcm_pmd.c 
b/drivers/crypto/aesni_gcm/aesni_gcm_pmd.c
index 76dce9c..0b3fd09 100644
--- a/drivers/crypto/aesni_gcm/aesni_gcm_pmd.c
+++ b/drivers/crypto/aesni_gcm/aesni_gcm_pmd.c
@@ -528,8 +528,8 @@ static struct rte_vdev_driver aesni_gcm_pmd_drv = {
.remove = aesni_gcm_remove
 };

-DRIVER_REGISTER_VDEV(CRYPTODEV_NAME_AESNI_GCM_PMD, aesni_gcm_pmd_drv);
-DRIVER_REGISTER_PARAM_STRING(CRYPTODEV_NAME_AESNI_GCM_PMD,
+RTE_PMD_REGISTER_VDEV(CRYPTODEV_NAME_AESNI_GCM_PMD, aesni_gcm_pmd_drv);
+RTE_PMD_REGISTER_PARAM_STRING(CRYPTODEV_NAME_AESNI_GCM_PMD,
"max_nb_queue_pairs= "
"max_nb_sessions= "
"socket_id=");
diff --git a/drivers/crypto/aesni_mb/rte_aesni_mb_pmd.c 
b/drivers/crypto/aesni_mb/rte_aesni_mb_pmd.c
index ff2fc25..b936735 100644
--- a/drivers/crypto/aesni_mb/rte_aesni_mb_pmd.c
+++ b/drivers/crypto/aesni_mb/rte_aesni_mb_pmd.c
@@ -719,8 +719,8 @@ static struct rte_vdev_driver cryptodev_aesni_mb_pmd_drv = {
.remove = cryptodev_aesni_mb_remove
 };

-DRIVER_REGISTER_VDEV(CRYPTODEV_NAME_AESNI_MB_PMD, cryptodev_aesni_mb_pmd_drv);
-DRIVER_REGISTER_PARAM_STRING(CRYPTODEV_NAME_AESNI_MB_PMD,
+RTE_PMD_REGISTER_VDEV(CRYPTODEV_NAME_AESNI_MB_PMD, cryptodev_aesni_mb_pmd_drv);
+RTE_PMD_REGISTER_PARAM_STRING(CRYPTODEV_NAME_AESNI_MB_PMD,
"max_nb_queue_pairs= "
"max_nb_sessions= "
"socket_id="

[dpdk-dev] [PATCH v3 0/5] vhost: optimize enqueue

2016-10-10 Thread Wang, Zhihong



> -Original Message-
> From: Jianbo Liu [mailto:jianbo.liu at linaro.org]
> Sent: Monday, October 10, 2016 1:32 PM
> To: Yuanhan Liu 
> Cc: Wang, Zhihong ; Maxime Coquelin
> ; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v3 0/5] vhost: optimize enqueue
> 
> On 10 October 2016 at 10:44, Yuanhan Liu 
> wrote:
> > On Sun, Oct 09, 2016 at 12:09:07PM +, Wang, Zhihong wrote:
> >> > > > Tested with testpmd, host: txonly, guest: rxonly
> >> > > > size (bytes) improvement (%)
> >> > > > 644.12
> >> > > > 128   6
> >> > > > 256   2.65
> >> > > > 512   -1.12
> >> > > > 1024 -7.02
> >> > >
> >> > > There is a difference between Zhihong's code and the old I spotted in
> >> > > the first time: Zhihong removed the avail_idx prefetch. I understand
> >> > > the prefetch becomes a bit tricky when mrg-rx code path is
> considered;
> >> > > thus, I didn't comment on that.
> >> > >
> >> > > That's one of the difference that, IMO, could drop a regression. I then
> >> > > finally got a chance to add it back.
> >> > >
> >> > > A rough test shows it improves the performance of 1400B packet size
> >> > greatly
> >> > > in the "txonly in host and rxonly in guest" case: +33% is the number I
> get
> >> > > with my test server (Ivybridge).
> >> >
> >> > Thanks Yuanhan! I'll validate this on x86.
> >>
> >> Hi Yuanhan,
> >>
> >> Seems your code doesn't perform correctly. I write a new version
> >> of avail idx prefetch but didn't see any perf benefit.
> >>
> >> To be honest I doubt the benefit of this idea. The previous mrg_off
> >> code has this method but doesn't give any benefits.
> >
> > Good point. I thought of that before, too. But you know that I made it
> > in rush, that I didn't think further and test more.
> >
> > I looked the code a bit closer this time, and spotted a bug: the prefetch
> > actually didn't happen, due to following code piece:
> >
> > if (vq->next_avail_idx >= NR_AVAIL_IDX_PREFETCH) {
> > prefetch_avail_idx(vq);
> > ...
> > }
> >
> > Since vq->next_avail_idx is set to 0 at the entrance of enqueue path,
> > prefetch_avail_idx() will be called. The fix is easy though: just put
> > prefetch_avail_idx before invoking enqueue_packet.
> >
> > In summary, Zhihong is right, I see no more gains with that fix :(
> >
> > However, as stated, that's kind of the only difference I found between
> > yours and the old code, that maybe it's still worthwhile to have a
> > test on ARM, Jianbo?
> >
> I haven't tested it, but I think it could be no improvement for ARM either.
> 
> A smalll suggestion for enqueue_packet:
> 
> .
> +   /* start copy from mbuf to desc */
> +   while (mbuf_avail || mbuf->next) {
> .
> 
> Considering pkt_len is in the first cache line (same as data_len),
> while next pointer is in the second cache line,
> is it better to check the total packet len, instead of the last mbuf's
> next pointer to jump out of while loop and avoid possible cache miss?

Jianbo,

Thanks for the reply!

This idea sounds good, but it won't help the general perf in my
opinion, since the 2nd cache line is accessed anyway prior in
virtio_enqueue_offload.

Also this would bring a NULL check when actually access mbuf->next.

BTW, could you please publish the number of:

 1. mrg_rxbuf=on, comparison between original and original + this patch

 2. mrg_rxbuf=off, comparison between original and original + this patch

So we can have a whole picture of how this patch impact on ARM platform.

Thanks
Zhihong

[dpdk-dev] [PATCH v3 0/5] vhost: optimize enqueue

2016-10-10 Thread Jianbo Liu

On 10 October 2016 at 14:22, Wang, Zhihong  wrote:
>
>
>> -Original Message-
>> From: Jianbo Liu [mailto:jianbo.liu at linaro.org]
>> Sent: Monday, October 10, 2016 1:32 PM
>> To: Yuanhan Liu 
>> Cc: Wang, Zhihong ; Maxime Coquelin
>> ; dev at dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH v3 0/5] vhost: optimize enqueue
>>
>> On 10 October 2016 at 10:44, Yuanhan Liu 
>> wrote:
>> > On Sun, Oct 09, 2016 at 12:09:07PM +, Wang, Zhihong wrote:
>> >> > > > Tested with testpmd, host: txonly, guest: rxonly
>> >> > > > size (bytes) improvement (%)
>> >> > > > 644.12
>> >> > > > 128   6
>> >> > > > 256   2.65
>> >> > > > 512   -1.12
>> >> > > > 1024 -7.02
>> >> > >
>> >> > > There is a difference between Zhihong's code and the old I spotted in
>> >> > > the first time: Zhihong removed the avail_idx prefetch. I understand
>> >> > > the prefetch becomes a bit tricky when mrg-rx code path is
>> considered;
>> >> > > thus, I didn't comment on that.
>> >> > >
>> >> > > That's one of the difference that, IMO, could drop a regression. I 
>> >> > > then
>> >> > > finally got a chance to add it back.
>> >> > >
>> >> > > A rough test shows it improves the performance of 1400B packet size
>> >> > greatly
>> >> > > in the "txonly in host and rxonly in guest" case: +33% is the number I
>> get
>> >> > > with my test server (Ivybridge).
>> >> >
>> >> > Thanks Yuanhan! I'll validate this on x86.
>> >>
>> >> Hi Yuanhan,
>> >>
>> >> Seems your code doesn't perform correctly. I write a new version
>> >> of avail idx prefetch but didn't see any perf benefit.
>> >>
>> >> To be honest I doubt the benefit of this idea. The previous mrg_off
>> >> code has this method but doesn't give any benefits.
>> >
>> > Good point. I thought of that before, too. But you know that I made it
>> > in rush, that I didn't think further and test more.
>> >
>> > I looked the code a bit closer this time, and spotted a bug: the prefetch
>> > actually didn't happen, due to following code piece:
>> >
>> > if (vq->next_avail_idx >= NR_AVAIL_IDX_PREFETCH) {
>> > prefetch_avail_idx(vq);
>> > ...
>> > }
>> >
>> > Since vq->next_avail_idx is set to 0 at the entrance of enqueue path,
>> > prefetch_avail_idx() will be called. The fix is easy though: just put
>> > prefetch_avail_idx before invoking enqueue_packet.
>> >
>> > In summary, Zhihong is right, I see no more gains with that fix :(
>> >
>> > However, as stated, that's kind of the only difference I found between
>> > yours and the old code, that maybe it's still worthwhile to have a
>> > test on ARM, Jianbo?
>> >
>> I haven't tested it, but I think it could be no improvement for ARM either.
>>
>> A smalll suggestion for enqueue_packet:
>>
>> .
>> +   /* start copy from mbuf to desc */
>> +   while (mbuf_avail || mbuf->next) {
>> .
>>
>> Considering pkt_len is in the first cache line (same as data_len),
>> while next pointer is in the second cache line,
>> is it better to check the total packet len, instead of the last mbuf's
>> next pointer to jump out of while loop and avoid possible cache miss?
>
> Jianbo,
>
> Thanks for the reply!
>
> This idea sounds good, but it won't help the general perf in my
> opinion, since the 2nd cache line is accessed anyway prior in
> virtio_enqueue_offload.
>
Yes, you are right. I'm thinking of prefetching beforehand.
And if it's a chained mbuf, virtio_enqueue_offload will not be called
in next loop.

> Also this would bring a NULL check when actually access mbuf->next.
>
> BTW, could you please publish the number of:
>
>  1. mrg_rxbuf=on, comparison between original and original + this patch
>
>  2. mrg_rxbuf=off, comparison between original and original + this patch
>
> So we can have a whole picture of how this patch impact on ARM platform.
>
I think you already have got many results in my previous emails.
Sorry I can't test right now and busy with other things.

[dpdk-dev] [PATCH v3 0/5] vhost: optimize enqueue

2016-10-10 Thread Wang, Zhihong



> -Original Message-
> From: Jianbo Liu [mailto:jianbo.liu at linaro.org]
> Sent: Monday, October 10, 2016 2:58 PM
> To: Wang, Zhihong 
> Cc: Yuanhan Liu ; Maxime Coquelin
> ; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v3 0/5] vhost: optimize enqueue
> 
> On 10 October 2016 at 14:22, Wang, Zhihong 
> wrote:
> >
> >
> >> -Original Message-
> >> From: Jianbo Liu [mailto:jianbo.liu at linaro.org]
> >> Sent: Monday, October 10, 2016 1:32 PM
> >> To: Yuanhan Liu 
> >> Cc: Wang, Zhihong ; Maxime Coquelin
> >> ; dev at dpdk.org
> >> Subject: Re: [dpdk-dev] [PATCH v3 0/5] vhost: optimize enqueue
> >>
> >> On 10 October 2016 at 10:44, Yuanhan Liu 
> >> wrote:
> >> > On Sun, Oct 09, 2016 at 12:09:07PM +, Wang, Zhihong wrote:
> >> >> > > > Tested with testpmd, host: txonly, guest: rxonly
> >> >> > > > size (bytes) improvement (%)
> >> >> > > > 644.12
> >> >> > > > 128   6
> >> >> > > > 256   2.65
> >> >> > > > 512   -1.12
> >> >> > > > 1024 -7.02
> >> >> > >
> >> >> > > There is a difference between Zhihong's code and the old I spotted
> in
> >> >> > > the first time: Zhihong removed the avail_idx prefetch. I
> understand
> >> >> > > the prefetch becomes a bit tricky when mrg-rx code path is
> >> considered;
> >> >> > > thus, I didn't comment on that.
> >> >> > >
> >> >> > > That's one of the difference that, IMO, could drop a regression. I
> then
> >> >> > > finally got a chance to add it back.
> >> >> > >
> >> >> > > A rough test shows it improves the performance of 1400B packet
> size
> >> >> > greatly
> >> >> > > in the "txonly in host and rxonly in guest" case: +33% is the number
> I
> >> get
> >> >> > > with my test server (Ivybridge).
> >> >> >
> >> >> > Thanks Yuanhan! I'll validate this on x86.
> >> >>
> >> >> Hi Yuanhan,
> >> >>
> >> >> Seems your code doesn't perform correctly. I write a new version
> >> >> of avail idx prefetch but didn't see any perf benefit.
> >> >>
> >> >> To be honest I doubt the benefit of this idea. The previous mrg_off
> >> >> code has this method but doesn't give any benefits.
> >> >
> >> > Good point. I thought of that before, too. But you know that I made it
> >> > in rush, that I didn't think further and test more.
> >> >
> >> > I looked the code a bit closer this time, and spotted a bug: the prefetch
> >> > actually didn't happen, due to following code piece:
> >> >
> >> > if (vq->next_avail_idx >= NR_AVAIL_IDX_PREFETCH) {
> >> > prefetch_avail_idx(vq);
> >> > ...
> >> > }
> >> >
> >> > Since vq->next_avail_idx is set to 0 at the entrance of enqueue path,
> >> > prefetch_avail_idx() will be called. The fix is easy though: just put
> >> > prefetch_avail_idx before invoking enqueue_packet.
> >> >
> >> > In summary, Zhihong is right, I see no more gains with that fix :(
> >> >
> >> > However, as stated, that's kind of the only difference I found between
> >> > yours and the old code, that maybe it's still worthwhile to have a
> >> > test on ARM, Jianbo?
> >> >
> >> I haven't tested it, but I think it could be no improvement for ARM either.
> >>
> >> A smalll suggestion for enqueue_packet:
> >>
> >> .
> >> +   /* start copy from mbuf to desc */
> >> +   while (mbuf_avail || mbuf->next) {
> >> .
> >>
> >> Considering pkt_len is in the first cache line (same as data_len),
> >> while next pointer is in the second cache line,
> >> is it better to check the total packet len, instead of the last mbuf's
> >> next pointer to jump out of while loop and avoid possible cache miss?
> >
> > Jianbo,
> >
> > Thanks for the reply!
> >
> > This idea sounds good, but it won't help the general perf in my
> > opinion, since the 2nd cache line is accessed anyway prior in
> > virtio_enqueue_offload.
> >
> Yes, you are right. I'm thinking of prefetching beforehand.
> And if it's a chained mbuf, virtio_enqueue_offload will not be called
> in next loop.
> 
> > Also this would bring a NULL check when actually access mbuf->next.
> >
> > BTW, could you please publish the number of:
> >
> >  1. mrg_rxbuf=on, comparison between original and original + this patch
> >
> >  2. mrg_rxbuf=off, comparison between original and original + this patch
> >
> > So we can have a whole picture of how this patch impact on ARM platform.
> >
> I think you already have got many results in my previous emails.
> Sorry I can't test right now and busy with other things.

We're still missing mrg on data.

[dpdk-dev] [PATCH 0/3] fix flow director mask issues

There're 2 issues about flow director mask.
1, MAC address mask is not supported in mac-vlan mode.
2, All the parameter are defined as big endian, but
   they're not treated approriately.
This patch set is used to fix these 2 issues.

Wenzhuo Lu (3):
  ixgbe: fix wrong flow director mask
  app/testpmd: fix flow director endian issue
  app/testpmd: fix wrong flow director mask

 app/test-pmd/cmdline.c  | 11 +++-
 app/test-pmd/config.c   | 43 +
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  3 +-
 drivers/net/ixgbe/ixgbe_fdir.c  | 10 +++
 4 files changed, 35 insertions(+), 32 deletions(-)

-- 
1.9.3

[dpdk-dev] [PATCH 1/3] ixgbe: fix wrong flow director mask

In mac-vlan mode, MAC address mask is not supported by HW.
The MAC address mask should not be set in mac-vlan mode.
Instead, only set it in tunnel mode.

Fixes: 82fb702077f6 ("ixgbe: support new flow director modes for X550")
Signed-off-by: Wenzhuo Lu 
---
 drivers/net/ixgbe/ixgbe_fdir.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_fdir.c b/drivers/net/ixgbe/ixgbe_fdir.c
index 861c7cb..4b81ee3 100644
--- a/drivers/net/ixgbe/ixgbe_fdir.c
+++ b/drivers/net/ixgbe/ixgbe_fdir.c
@@ -432,12 +432,12 @@ fdir_set_input_mask_x550(struct rte_eth_dev *dev,
fdiripv6m |= IXGBE_FDIRIP6M_TUNNEL_TYPE |
IXGBE_FDIRIP6M_TNI_VNI;

-   mac_mask = input_mask->mac_addr_byte_mask;
-   fdiripv6m |= (mac_mask << IXGBE_FDIRIP6M_INNER_MAC_SHIFT)
-   & IXGBE_FDIRIP6M_INNER_MAC;
-   info->mask.mac_addr_byte_mask = input_mask->mac_addr_byte_mask;
-
if (mode == RTE_FDIR_MODE_PERFECT_TUNNEL) {
+   mac_mask = input_mask->mac_addr_byte_mask;
+   fdiripv6m |= (mac_mask << IXGBE_FDIRIP6M_INNER_MAC_SHIFT)
+   & IXGBE_FDIRIP6M_INNER_MAC;
+   info->mask.mac_addr_byte_mask = input_mask->mac_addr_byte_mask;
+
switch (input_mask->tunnel_type_mask) {
case 0:
/* Mask turnnel type */
-- 
1.9.3

[dpdk-dev] [PATCH 2/3] app/testpmd: fix flow director endian issue

The vlan mask and tunnel id mask of flow director
are defined as big endian. So they should be
converted.
When the mask is printed, the parameters are not
converted either.

Fixes: 53b2bb9b7ea7 ("app/testpmd: new flow director commands")
Signed-off-by: Wenzhuo Lu 
---
 app/test-pmd/cmdline.c |  6 +++---
 app/test-pmd/config.c  | 43 +--
 2 files changed, 28 insertions(+), 21 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 6e95ca2..39e6d59 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -9074,7 +9074,7 @@ cmd_flow_director_mask_parsed(void *parsed_result,
return;
}

-   mask->vlan_tci_mask = res->vlan_mask;
+   mask->vlan_tci_mask = rte_cpu_to_be_16(res->vlan_mask);
mask->mac_addr_byte_mask = res->mac_addr_byte_mask;
} else if (fdir_conf.mode ==  RTE_FDIR_MODE_PERFECT_TUNNEL) {
if (strcmp(res->mode_value, "Tunnel")) {
@@ -9082,9 +9082,9 @@ cmd_flow_director_mask_parsed(void *parsed_result,
return;
}

-   mask->vlan_tci_mask = res->vlan_mask;
+   mask->vlan_tci_mask = rte_cpu_to_be_16(res->vlan_mask);
mask->mac_addr_byte_mask = res->mac_addr_byte_mask;
-   mask->tunnel_id_mask = res->tunnel_id_mask;
+   mask->tunnel_id_mask = rte_cpu_to_be_32(res->tunnel_id_mask);
mask->tunnel_type_mask = res->tunnel_type_mask;
} else {
if (strcmp(res->mode_value, "IP")) {
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 83bebfe..742a8d8 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -2045,26 +2045,33 @@ set_qmap(portid_t port_id, uint8_t is_rx, uint16_t 
queue_id, uint8_t map_value)
 static inline void
 print_fdir_mask(struct rte_eth_fdir_masks *mask)
 {
-   printf("\nvlan_tci: 0x%04x, ", mask->vlan_tci_mask);
+   printf("\nvlan_tci: 0x%04x", rte_be_to_cpu_16(mask->vlan_tci_mask));

-   if (fdir_conf.mode == RTE_FDIR_MODE_PERFECT_MAC_VLAN)
-   printf("mac_addr: 0x%02x", mask->mac_addr_byte_mask);
-   else if (fdir_conf.mode == RTE_FDIR_MODE_PERFECT_TUNNEL)
-   printf("mac_addr: 0x%02x, tunnel_type: 0x%01x, tunnel_id: 
0x%08x",
+   if (fdir_conf.mode == RTE_FDIR_MODE_PERFECT_TUNNEL)
+   printf(", mac_addr: 0x%02x, tunnel_type: 0x%01x,"
+   " tunnel_id: 0x%08x",
mask->mac_addr_byte_mask, mask->tunnel_type_mask,
-   mask->tunnel_id_mask);
-   else {
-   printf("src_ipv4: 0x%08x, dst_ipv4: 0x%08x,"
-   " src_port: 0x%04x, dst_port: 0x%04x",
-   mask->ipv4_mask.src_ip, mask->ipv4_mask.dst_ip,
-   mask->src_port_mask, mask->dst_port_mask);
-
-   printf("\nsrc_ipv6: 0x%08x,0x%08x,0x%08x,0x%08x,"
-   " dst_ipv6: 0x%08x,0x%08x,0x%08x,0x%08x",
-   mask->ipv6_mask.src_ip[0], mask->ipv6_mask.src_ip[1],
-   mask->ipv6_mask.src_ip[2], mask->ipv6_mask.src_ip[3],
-   mask->ipv6_mask.dst_ip[0], mask->ipv6_mask.dst_ip[1],
-   mask->ipv6_mask.dst_ip[2], mask->ipv6_mask.dst_ip[3]);
+   rte_be_to_cpu_32(mask->tunnel_id_mask));
+   else if (fdir_conf.mode != RTE_FDIR_MODE_PERFECT_MAC_VLAN) {
+   printf(", src_ipv4: 0x%08x, dst_ipv4: 0x%08x",
+   rte_be_to_cpu_32(mask->ipv4_mask.src_ip),
+   rte_be_to_cpu_32(mask->ipv4_mask.dst_ip));
+
+   printf("\nsrc_port: 0x%04x, dst_port: 0x%04x",
+   rte_be_to_cpu_16(mask->src_port_mask),
+   rte_be_to_cpu_16(mask->dst_port_mask));
+
+   printf("\nsrc_ipv6: 0x%08x,0x%08x,0x%08x,0x%08x",
+   rte_be_to_cpu_32(mask->ipv6_mask.src_ip[0]),
+   rte_be_to_cpu_32(mask->ipv6_mask.src_ip[1]),
+   rte_be_to_cpu_32(mask->ipv6_mask.src_ip[2]),
+   rte_be_to_cpu_32(mask->ipv6_mask.src_ip[3]));
+
+   printf("\ndst_ipv6: 0x%08x,0x%08x,0x%08x,0x%08x",
+   rte_be_to_cpu_32(mask->ipv6_mask.dst_ip[0]),
+   rte_be_to_cpu_32(mask->ipv6_mask.dst_ip[1]),
+   rte_be_to_cpu_32(mask->ipv6_mask.dst_ip[2]),
+   rte_be_to_cpu_32(mask->ipv6_mask.dst_ip[3]));
}

printf("\n");
-- 
1.9.3

[dpdk-dev] [PATCH 3/3] app/testpmd: fix wrong flow director mask

In mac-vlan mode, MAC address mask is not supported by HW.
The MAC address mask should not be set in mac-vlan mode.
Remove this parameter from the CLI.

Fixes: 53b2bb9b7ea7 ("app/testpmd: new flow director commands")
Signed-off-by: Wenzhuo Lu 
---
 app/test-pmd/cmdline.c  | 5 +
 doc/guides/testpmd_app_ug/testpmd_funcs.rst | 3 +--
 2 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 39e6d59..7c5c5e7 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -720,7 +720,7 @@ static void cmd_help_long_parsed(void *parsed_result,
"Set flow director IP mask.\n\n"

"flow_director_mask (port_id) mode MAC-VLAN"
-   " vlan (vlan_value) mac (mac_value)\n"
+   " vlan (vlan_value)\n"
"Set flow director MAC-VLAN mask.\n\n"

"flow_director_mask (port_id) mode Tunnel"
@@ -9075,7 +9075,6 @@ cmd_flow_director_mask_parsed(void *parsed_result,
}

mask->vlan_tci_mask = rte_cpu_to_be_16(res->vlan_mask);
-   mask->mac_addr_byte_mask = res->mac_addr_byte_mask;
} else if (fdir_conf.mode ==  RTE_FDIR_MODE_PERFECT_TUNNEL) {
if (strcmp(res->mode_value, "Tunnel")) {
printf("Please set mode to Tunnel.\n");
@@ -9206,8 +9205,6 @@ cmdline_parse_inst_t cmd_set_flow_director_mac_vlan_mask 
= {
(void *)&cmd_flow_director_mask_mode_mac_vlan,
(void *)&cmd_flow_director_mask_vlan,
(void *)&cmd_flow_director_mask_vlan_value,
-   (void *)&cmd_flow_director_mask_mac,
-   (void *)&cmd_flow_director_mask_mac_value,
NULL,
},
 };
diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst 
b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index f87e0c2..ed04532 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -1844,8 +1844,7 @@ Set flow director's input masks::
   src_mask (ipv4_src) (ipv6_src) (src_port) \
   dst_mask (ipv4_dst) (ipv6_dst) (dst_port)

-   flow_director_mask (port_id) mode MAC-VLAN vlan (vlan_value) \
-  mac (mac_value)
+   flow_director_mask (port_id) mode MAC-VLAN vlan (vlan_value)

flow_director_mask (port_id) mode Tunnel vlan (vlan_value) \
   mac (mac_value) tunnel-type (tunnel_type_value) \
-- 
1.9.3

[dpdk-dev] [PATCH 0/6] vhost: add Tx zero copy support

On Sun, Oct 09, 2016 at 06:46:44PM +0800, linhaifeng wrote:
> ? 2016/8/23 16:10, Yuanhan Liu ??:
> > The basic idea of Tx zero copy is, instead of copying data from the
> > desc buf, here we let the mbuf reference the desc buf addr directly.
> 
> Is there problem when push vlan to the mbuf which reference the desc buf addr 
> directly?

Yes, you can't do that when zero copy is enabled, due to following code
piece:

+   if (unlikely(dev->dequeue_zero_copy && (hpa = 
gpa_to_hpa(dev,
+   desc->addr + desc_offset, 
cpy_len {
+   cur->data_len = cpy_len;
==> +   cur->data_off = 0;
+   cur->buf_addr = (void *)(uintptr_t)desc_addr;
+   cur->buf_physaddr = hpa;

The marked line basically makes the mbuf has no headroom to use.

--yliu

> We know if guest use virtio_net(kernel) maybe skb has no headroom.

[dpdk-dev] [PATCH 1/1] fix the number of operations in libcrypto test

2016-10-10 Thread Marcin Kerlin

This patch reduce the number of total operations from 1M to 10K, because
test is taking too long time now.

Fixes: ffbe3be0d4b5 ("app/test: add libcrypto")

Signed-off-by: Marcin Kerlin 
---
 app/test/test_cryptodev_perf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/app/test/test_cryptodev_perf.c b/app/test/test_cryptodev_perf.c
index 4aee9af..43a7166 100644
--- a/app/test/test_cryptodev_perf.c
+++ b/app/test/test_cryptodev_perf.c
@@ -3420,7 +3420,7 @@ test_perf_snow3G_vary_pkt_size(void)
 static int
 test_perf_libcrypto_vary_pkt_size(void)
 {
-   unsigned int total_operations = 100;
+   unsigned int total_operations = 1;
unsigned int burst_size = { 64 };
unsigned int buf_lengths[] = { 64, 128, 256, 512, 768, 1024, 1280, 1536,
1792, 2048 };
-- 
1.9.1

[dpdk-dev] Project Governance and Linux Foundation

2016-10-10 Thread O'Driscoll, Tim

This email is being sent on behalf of: Cavium, Cisco, Intel, NXP & Red Hat.

Since its creation as an open source project in 2013, DPDK has grown
significantly. The number of DPDK users, contributors, commercial products that
use DPDK and open source projects that depend on it have all increased
consistently over that time. DPDK is now a key ingredient in networking and
NFV, and we need to ensure that the project structure and governance are
appropriate for such a critical project, and that they facilitate the project's
continued growth.

For over a year now we've been discussing moving DPDK to the Linux Foundation.
We believe it's now time to conclude that discussion and make the move. The
benefits of doing this would include:
- The infrastructure for a project like DPDK should not be owned and controlled
by any single company.
- Remove any remaining perception that DPDK is not truly open.
- Allow the project to avail of the infrastructure and services provided by the
Linux Foundation. These include things like: Ability to host infrastructure for
integration and testing (the FD.io CSIT lab is an example of this - see
https://wiki.fd.io/view/CSIT/CSIT_LF_testbed); Support for legal issues
including trademarks and branding, and the ability to sign agreements on behalf
of the project; Ability to pool resources for events and brand promotion; Safe
haven for community IP resources.

We don't propose to debate the details here. Instead, an open discussion
session on DPDK Project Growth has been included in the agenda for the DPDK
Summit Userspace 2016 event in Dublin. We propose using that session to agree
that the DPDK project will move to the Linux Foundation, and then to move on to
discussing the specifics. Things that we'll need to consider include:
- Whether DPDK moves to the Linux Foundation as an independent project or as
part of a larger project like FD.io.
- Creation of a project charter similar to those created for FD.io
(https://fd.io/governance/technical-community-charter) and Open vSwitch (see
http://openvswitch.org/pipermail/discuss/attachments/20160619/5a2df53e/attachment-0001.pdf).
- Agreement on budget, membership levels etc. A draft budget was created by the
LF during previous discussions
(https://docs.google.com/spreadsheets/d/1-3686Xb_jf4FtxdX8Mus9UwIxUb2vI_ppmJV5GnXcLg/edit#gid=302618256),
but it is possible to adopt an even more lightweight model.

We could look at alternatives to the Linux Foundation, but a) we've been
talking to the LF for over a year now, and b) the preponderance of networking
projects in LF, like ODL, FD.io, and OVS, makes it a natural destination for
DPDK.

As highlighted in previous discussions on this topic, it's important to stress
that the intent is not to make significant changes to the technical governance
and decision making of the project. The project has a strong set of maintainers
and a Technical Board in place already. What's required is to supplement that
with an open governance structure taking advantage of the services offered by
the Linux Foundation.

The purpose of this email is to outline what we want to achieve during that
discussion session in Dublin, and to allow people to consider the issue and
prepare in advance. If people want to comment via email on the mailing list,
that's obviously fine, but we believe that an open and frank discussion when
people meet in person in Dublin is the best way to progress this.

For reference, below is a brief history of the previous discussions on this
topic:

September 2015:
- A DPDK community call was held to discuss project growth and possible
improvements. This was the first public discussion on possible governance
changes. The agreed next step was to discuss this in more detail at the 2015
DPDK Summit Userspace event Dublin. Minutes of the call are at:
http://dpdk.org/ml/archives/dev/2015-September/024120.html.

October 2015:
- An open discussion session on project governance was held at the 2015 DPDK
Summit Userspace event. For technical governance, we agreed to investigate
creating a technical steering committee. For non-technical governance
(including things like event planning, legal and trademark issues, hosting of
the website etc.), we agreed to work with the Linux Foundation on a proposal
for a lightweight governance model for DPDK. Minutes of the discussion are at:
http://dpdk.org/ml/archives/dev/2015-October/024825.html.

- The proposal for a technical steering committee was subsequently discussed on
the mailing list (http://dpdk.org/ml/archives/dev/2015-October/026598.html) and
agreed, leading to the creation of the DPDK Technical Board
(http://dpdk.org/dev#board).

December 2015:
- A community call was held to discuss migration to the Linux Foundation. Mike
Dolan (VP of Strategic Programs at The Linux Foundation) gave an overview of
the LF and the services they can provide. We agreed to form a small sub-team
(Dave Neary, Thomas Monjalon, Steph

[dpdk-dev] [RFC v2] Generic flow director/filtering/classification API

2016-10-10 Thread Zhao1, Wei

Hi Adrien Mazarguil,

In your v2 version of rte_flow.txt , there is an action type 
RTE_FLOW_ACTION_TYPE_MARK,  but there is no definition of struct 
rte_flow_action_mark.
And there is  an definition of struct rte_flow_action_id. Is it a typo or other 
usage?

Thank you.

struct rte_flow_action_id {
uint32_t id; /**< 32 bit value to return with packets. */
};

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Adrien Mazarguil
> Sent: Saturday, August 20, 2016 3:33 AM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [RFC v2] Generic flow director/filtering/classification 
> API
> 
> Hi All,
> 
> Thanks to many for the positive and constructive feedback I've received so
> far. Here is the updated specification (v0.7) at last.
> 
> I've attempted to address as many comments as possible but could not
> process them all just yet. A new section "Future evolutions" has been
> added for the remaining topics.
> 
> This series adds rte_flow.h to the DPDK tree. Next time I will attempt to
> convert the specification as a documentation commit part of the patchset
> and actually implement API functions.
> 
> I think including the entire document here makes it easier to annotate on
> the ML, apologies in advance for the resulting traffic.
> 
> Finally I'm off for the next two weeks, do not expect replies from me in
> the meantime.
> 
> Updates are also available online:
> 
> HTML version:
>  https://rawgit.com/6WIND/rte_flow/master/rte_flow.html
> 
> PDF version:
>  https://rawgit.com/6WIND/rte_flow/master/rte_flow.pdf
> 
> Related draft header file (also in the next patch):
>  https://raw.githubusercontent.com/6WIND/rte_flow/master/rte_flow.h
> 
> Git tree:
>  https://github.com/6WIND/rte_flow
> 
> Changes from v1:
> 
>  Specification:
> 
>  - Settled on [generic] "flow interface" / "flow API" as the name of this
>framework, matches the rte_flow prefix better.
>  - Minor wording changes in several places.
>  - Partially added egress (TX) support.
>  - Added "unrecoverable errors" as another consequence of overlapping
>rules.
>  - Described flow rules groups and their interaction with flow rule
>priorities.
>  - Fully described PF and VF meta pattern items so they are not open to
>interpretation anymore.
>  - Removed the SIGNATURE meta pattern item as its description was too
>vague, may be re-added later if necessary.
>  - Added the PORT pattern item to apply rules to non-default physical
>ports.
>  - Entirely redefined the RAW pattern item.
>  - Fixed tag error in the ETH item definition.
>  - Updated protocol definitions (IPV4, IPV6, ICMP, UDP).
>  - Added missing protocols (SCTP, VXLAN).
>  - Converted ID action to MARK and FLAG actions, described interaction
>with the RSS hash result in mbufs.
>  - Updated COUNT query structure to retrieve the number of bytes.
>  - Updated VF action.
>  - Documented negative item and action types, those will be used for
>dynamic types generated at run-time.
>  - Added blurb about IPv4 options and IPv6 extension headers matching.
>  - Updated function definitions.
>  - Documented a flush method to remove all rules on a given port at once.
>  - Documented the verbose error reporting interface.
>  - Documented how the private interface for PMD use will work.
>  - Documented expected behavior between successive port initializations.
>  - Documented expected behavior for ports not under DPDK control.
>  - Updated API migration section.
>  - Added future evolutions section.
> 
>  Header file:
> 
>  - Not a draft anymore and can be used as-is for preliminary
>implementations.
>  - Flow rule attributes (group, priority, etc) now have their own
>structure provided separately to API functions (struct rte_flow_attr).
>  - Group and priority interactions have been documented.
>  - Added PORT item.
>  - Removed SIGNATURE item.
>  - Defined ICMP, SCTP and VXLAN items.
>  - Redefined PF, VF, RAW, IPV4, IPV6, UDP and TCP items.
>  - Fixed tag error in the ETH item definition.
>  - Converted ID action to MARK and FLAG actions.
>hash result in mbufs.
>  - Updated COUNT query structure.
>  - Updated VF action.
>  - Added verbose errors interface.
>  - Updated function prototypes according to the above.
>  - Defined rte_flow_flush().
> 
> 
> 
> ==
> Generic flow interface
> ==
> 
> .. footer::
> 
>v0.7
> 
> .. contents::
> .. sectnum::
> .. raw:: pdf
> 
>PageBreak
> 
> Overview
> 
> 
> DPDK provides several competing interfaces added over time to perform
> packet
> matching and related actions such as filtering and classification.
> 
> They must be extended to implement the features supported by newer
> devices
> in order to expose them to applications, however the current design has
> several drawbacks:
> 
> - Complicated filter combinations which have not been hard-coded cannot be
>   expressed.
> - Prone to API/ABI breakage when new features must be adde

[dpdk-dev] [PATCH 1/2] ethdev: fix hotplug attach

2016-10-07 15:01, David Marchand:
> If a pci probe operation creates a port but, for any reason, fails to
> finish this operation and decides to delete the newly created port, then
> the last created port id can not be trusted anymore and any subsequent
> attach operations will fail.
> 
> This problem was noticed while working on a vm that had a virtio-net
> management interface bound to the virtio-net kernel driver and no port
> whitelisted in the commandline:
[...]
> Two solutions:
> - either update the last created port index to something invalid
>   (when freeing a ethdev port),
> - or rely on the port count, before and after the eal attach.
> 
> The latter solution seems (well not really more robust but at least)
> less fragile than the former.
> We still have some issues with drivers that create multiple ethdev
> ports with a single probe operation, but this was already the case.
> 
> Fixes: b0fb26685570 ("ethdev: convert to EAL hotplug")
> 
> Reported-by: Daniel Mrzyglod 
> Signed-off-by: David Marchand 

Series applied, thanks

[dpdk-dev] [PATCH v2 4/7] vhost: add dequeue zero copy

2016-10-10 Thread Xu, Qian Q

Good to know. I will try v3. BTW, on the master branch, seems vhost PMD is 
broken. When we run --vdev 'eth_vhost0,', then it will report the error 
that the driver is not supported.  V16.07 is OK, but I haven't got time to do 
git bisect. 

-Original Message-
From: Yuanhan Liu [mailto:yuanhan@linux.intel.com] 
Sent: Sunday, October 9, 2016 3:03 AM
To: Xu, Qian Q 
Cc: dev at dpdk.org; Maxime Coquelin 
Subject: Re: [dpdk-dev] [PATCH v2 4/7] vhost: add dequeue zero copy

On Thu, Oct 06, 2016 at 02:37:27PM +, Xu, Qian Q wrote:
> this function copy_desc_to_mbuf has changed on the dpdk-next-virtio repo. 
> Based on current dpdk-next-virtio repo, the commit ID is as below: 
> commit b4f7b43cd9d3b6413f41221051d03a23bc5f5fbe
> Author: Zhiyong Yang 
> Date:   Thu Sep 29 20:35:49 2016 +0800
> 
> Then you will find the parameter "struct vhost_virtqueue *vq" is removed, so 
> if apply your patch on that commit ID, the build will fail, since no vq 
> definition but we used it in the function. 
> Could you check? Thx. 

I knew that: a rebase is needed, and I have done the rebase (locally); just 
haven't sent it out yet.

--yliu

> 
> == Build lib/librte_table
> /home/qxu10/dpdk-zero/lib/librte_vhost/virtio_net.c: In function 
> 'copy_desc_to_mbuf':
> /home/qxu10/dpdk-zero/lib/librte_vhost/virtio_net.c:745:21: error: 'vq' 
> undeclared (first use in this function)
>zmbuf = get_zmbuf(vq);
>  ^
> /home/qxu10/dpdk-zero/lib/librte_vhost/virtio_net.c:745:21: note: each 
> undeclared identifier is reported only once for each function it 
> appears in
> /home/qxu10/dpdk-zero/mk/internal/rte.compile-pre.mk:138: recipe for 
> target 'virtio_net.o' failed
> make[5]: *** [virtio_net.o] Error 1
> /home/qxu10/dpdk-zero/mk/rte.subdir.mk:61: recipe for target 
> 'librte_vhost' failed
> make[4]: *** [librte_vhost] Error 2
> make[4]: *** Waiting for unfinished jobs
>

[dpdk-dev] [PATCH v2 4/7] vhost: add dequeue zero copy

Hi Xu,

On 10/10/2016 12:12 PM, Xu, Qian Q wrote:
> Good to know. I will try v3. BTW, on the master branch, seems vhost PMD is 
> broken. When we run --vdev 'eth_vhost0,', then it will report the error 
> that the driver is not supported.  V16.07 is OK, but I haven't got time to do 
> git bisect.
Name has been chenged to net_vhost0 for consistency with other PMDs.

Maxime

[dpdk-dev] [PATCH v2 14/15] ethdev: Support VFs on the different PCI domains

Hi Kamil,

On 9/30/2016 1:05 PM, Kamil Rytarowski wrote:
> It's possible to have the same numbers for bus, device id and function,
> therefore we need to differentiate on domain.
> 
> This enables DPDK with multiple VFs on ThunderX 2-socket hardware.
> 
> Signed-off-by: Maciej Czekaj 
> Signed-off-by: Kamil Rytarowski 
> Signed-off-by: Zyta Szpak 
> Signed-off-by: Slawomir Rosek 
> Signed-off-by: Radoslaw Biernacki 
> Signed-off-by: Jerin Jacob 
> ---
>  lib/librte_ether/rte_ethdev.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
> index 382c959..01d5fb0 100644
> --- a/lib/librte_ether/rte_ethdev.c
> +++ b/lib/librte_ether/rte_ethdev.c
> @@ -225,7 +225,7 @@ rte_eth_dev_create_unique_device_name(char *name, size_t 
> size,
>  {
>   int ret;
>  
> - ret = snprintf(name, size, "%d:%d.%d",
> + ret = snprintf(name, size, "%d:%d:%d.%d", pci_dev->addr.domain,
>   pci_dev->addr.bus, pci_dev->addr.devid,
>   pci_dev->addr.function);
>   if (ret < 0)
> 

Is it possible to separate this patch from patchset, this is a ethdev
patch and it seems not directly related to the rest of the patchset?

Thanks,
ferruh

[dpdk-dev] [PATCH v2 4/7] vhost: add dequeue zero copy

2016-10-10 Thread Xu, Qian Q

Oh, thx for the info, it's better to have some documentation update in R16.11 
release notes. 

-Original Message-
From: Maxime Coquelin [mailto:maxime.coque...@redhat.com] 
Sent: Monday, October 10, 2016 11:14 AM
To: Xu, Qian Q ; Yuanhan Liu 
Cc: dev at dpdk.org
Subject: Re: [dpdk-dev] [PATCH v2 4/7] vhost: add dequeue zero copy

Hi Xu,

On 10/10/2016 12:12 PM, Xu, Qian Q wrote:
> Good to know. I will try v3. BTW, on the master branch, seems vhost PMD is 
> broken. When we run --vdev 'eth_vhost0,', then it will report the error 
> that the driver is not supported.  V16.07 is OK, but I haven't got time to do 
> git bisect.
Name has been chenged to net_vhost0 for consistency with other PMDs.

Maxime

[dpdk-dev] [PATCH v2 4/7] vhost: add dequeue zero copy

2016-10-10 Thread Xu, Qian Q

I'm a little concerned if it's a correct way to change the name from release to 
release, some users may use eth_vhost for the driver, and found it was not 
working. Do we need also make the consistency b/w releases?  
What's the name difference b/w eth and net? Other PMDs are not virtual devices. 
Virtio_user is also a virtual device, do we need to change the name,too? 

-Original Message-
From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Xu, Qian Q
Sent: Monday, October 10, 2016 11:23 AM
To: Maxime Coquelin ; Yuanhan Liu 
Cc: dev at dpdk.org
Subject: Re: [dpdk-dev] [PATCH v2 4/7] vhost: add dequeue zero copy

Oh, thx for the info, it's better to have some documentation update in R16.11 
release notes. 

-Original Message-
From: Maxime Coquelin [mailto:maxime.coque...@redhat.com] 
Sent: Monday, October 10, 2016 11:14 AM
To: Xu, Qian Q ; Yuanhan Liu 
Cc: dev at dpdk.org
Subject: Re: [dpdk-dev] [PATCH v2 4/7] vhost: add dequeue zero copy

Hi Xu,

On 10/10/2016 12:12 PM, Xu, Qian Q wrote:
> Good to know. I will try v3. BTW, on the master branch, seems vhost PMD is 
> broken. When we run --vdev 'eth_vhost0,', then it will report the error 
> that the driver is not supported.  V16.07 is OK, but I haven't got time to do 
> git bisect.
Name has been chenged to net_vhost0 for consistency with other PMDs.

Maxime

[dpdk-dev] DPDK Community Survey - about to close

2016-10-10 Thread Glynn, Michael J

-Original Message-
From: users [mailto:users-boun...@dpdk.org] On Behalf Of Glynn, Michael J
Sent: Monday, August 29, 2016 6:14 PM
To: users at dpdk.org; dev at dpdk.org; announce at dpdk.org
Subject: [dpdk-users] FW: DPDK Community Survey - about to close

> ... 
>
> Regarding improvement areas, the main feedback was:
> * Release Support (stable releases, LTS)
> * Documentation - certain aspects are outdated and need attention 
> (particularly the Programmers Guide)
> * No specific hotspots but performance bottlenecks seen in certain areas
> * Need for a continuous integration and test environment

Hi all - I just want to draw your attention to day 2 of the DPDK Userspace 
event in Dublin next week in which there will be a discussion on 'Usability 
(including packaging, stable releases, LTS releases etc.)' led by John McNamara 
(Intel), Christian Ehrhardt (Canonical), and Luca Boccassi (Brocade), and a 
presentation on 'Testing and Continuous Integration improvements in DPDK' by 
Qian Xu and Yuanhan Liu (both Intel). In addition, Bruce Richardson (Intel) 
will be giving a presentation on 'Identifying and Fixing Performance 
Bottlenecks in DPDK' using an example from the i40e driver illustrate the 
process.

The full planned agenda is located here https://dpdksummit.com/

[dpdk-dev] [PATCH v7] net/virtio: add set_mtu in virtio

2016-10-10 Thread Kavanagh, Mark B

>
>On Thu, Sep 29, 2016 at 04:31:30PM -0400, Dey wrote:
>>  /*
>>   * dev_ops for virtio, bare necessities for basic operation
>>   */
>> @@ -677,7 +685,6 @@ static const struct eth_dev_ops virtio_eth_dev_ops = {
>>  .allmulticast_enable = virtio_dev_allmulticast_enable,
>>  .allmulticast_disable= virtio_dev_allmulticast_disable,
>> +.mtu_set = virtio_mtu_set,
>>  .dev_infos_get   = virtio_dev_info_get,
>>  .stats_get   = virtio_dev_stats_get,
>>  .xstats_get  = virtio_dev_xstats_get,
>> --
>> 2.9.3.windows.1
>
>Your patch is malformed: I got an error while trying to apply it.
>
>patch:  malformed patch at line 167:   * dev_ops for virtio,
>bare necessities for basic operation
>
>Seems like the way you were generating the patch is wrong.
>
>Anyway, I applied it manually, with the "- frame_size" fix as well
>as few more minor coding style fixes.
>
>So applied to dpdk-next-virtio.

Hi Yuanhan/Souvik,

Given my contributions to this patch (and in particular since comments from 
here - http://dpdk.org/ml/archives/dev/2016-September/047208.html - were copied 
directly into the commit message), I think that I should have been added as 
co-author of the patch? 

Let me know if you think that I am mistaken.

Thanks in advance,
Mark
>
>   --yliu

[dpdk-dev] OpenSSL libcrypto PMD name

Hi,

I would like to raise a naming issue in crypto.

In the crypto side of DPDK, we have a library (similar to ethdev)
for crypto API and device interface:
http://dpdk.org/browse/dpdk/tree/lib/librte_cryptodev
There are also some drivers (which are some libraries):
http://dpdk.org/browse/dpdk/tree/drivers/crypto
Most of them (6/8) are some DPDK wrappers for external libraries.

Recently was introduced the libcrypto PMD which is a wrapper for
the OpenSSL libcrypto.
As we already have a lot of crypto libraries, I'm afraid the name
libcrypto is really confusing. Why not call it openssl PMD?

PS: I know OpenSSL has 2 libraries - ssl and crypto - but I do not
expect any high-level SSL feature in a crypto driver.
So drivers/crypto/openssl should not be confusing.

[dpdk-dev] [PATCH v2 4/7] vhost: add dequeue zero copy



On 10/10/2016 12:22 PM, Xu, Qian Q wrote:
> Oh, thx for the info, it's better to have some documentation update in R16.11 
> release notes.
I'm not the author of this change, just faced the same situation as
yours a a user.
But I agree that if documentation is not aligned, it should be updated.

Maxime

>
> -Original Message-
> From: Maxime Coquelin [mailto:maxime.coquelin at redhat.com]
> Sent: Monday, October 10, 2016 11:14 AM
> To: Xu, Qian Q ; Yuanhan Liu  linux.intel.com>
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2 4/7] vhost: add dequeue zero copy
>
> Hi Xu,
>
> On 10/10/2016 12:12 PM, Xu, Qian Q wrote:
>> Good to know. I will try v3. BTW, on the master branch, seems vhost PMD is 
>> broken. When we run --vdev 'eth_vhost0,', then it will report the error 
>> that the driver is not supported.  V16.07 is OK, but I haven't got time to 
>> do git bisect.
> Name has been chenged to net_vhost0 for consistency with other PMDs.
>
> Maxime
>

[dpdk-dev] [PATCH] examples/ipsec-secgw: Update checksum while decrementing ttl

2016-10-10 Thread Sergio Gonzalez Monroy

On 07/10/2016 21:53, De Lara Guarch, Pablo wrote:
>> -Original Message-
>> From: Akhil Goyal [mailto:akhil.goyal at nxp.com]
>> Sent: Tuesday, October 04, 2016 11:33 PM
>> To: De Lara Guarch, Pablo; Gonzalez Monroy, Sergio; dev at dpdk.org
>> Subject: Re: [PATCH] examples/ipsec-secgw: Update checksum while
>> decrementing ttl
>>
>> On 10/5/2016 6:04 AM, De Lara Guarch, Pablo wrote:
>>>
 -Original Message-
 From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Sergio Gonzalez
 Monroy
 Sent: Monday, September 26, 2016 6:28 AM
 To: akhil.goyal at nxp.com; dev at dpdk.org
 Subject: Re: [dpdk-dev] [PATCH] examples/ipsec-secgw: Update checksum
 while decrementing ttl

 Hi Akhil,

 This application relies on checksum offload in both outbound and
>> inbound
 paths (PKT_TX_IP_CKSUM flag).
>> [Akhil]Agreed that the application relies on checksum offload, but here
>> we are talking about the inner ip header. Inner IP checksum will be
>> updated on the next end point after decryption. This would expect that
>> the next end point must have checksum offload capability. What if we are
>> capturing the encrypted packets on wireshark or say send it to some
>> other machine which does not run DPDK and do not know about checksum
>> offload, then wireshark/other machine will not be able to get the
>> correct the checksum and will show error.

Understood, we need to have a valid inner checksum.
RFC1624 states that the computation would be incorrect in 
corner/boundary case.
I reckon you are basing your incremental update on RFC1141?

Also I think you should take care of endianess and increment the 
checksum with
host_to_be(0x0100) instead of +1.

 Because we assume that we always forward the packet in both paths, we
 decrement the ttl in both inbound and outbound.
 You seem to only increment (recalculate) the checksum of the inner IP
 header in the outbound path but not the inbound path.
>> [Akhil]Correct I missed out the inbound path.
 Also, in the inbound path you have to consider a possible ECN value
>> update.
>> [Akhil]If I take care of the ECN then it would mean I need to calculate
>> the checksum completely, incremental checksum wont give correct results.
>> This would surely impact performance. Any suggestion on how should we
>> take care of ECN update. Should I recalculate the checksum and send the
>> patch for ECN update? Or do we have a better solution.

If I am understanding the RFCs mentioned above correctly, you should be 
able to do
incremental checksum update for any 16bit field/value of the IP header.
I don't see no reason why you couldn't do something like that, except 
that you would
have to follow the full equation instead of just adding 0x0100, which 
would be always
the case when decrementing TTL.

What do you think?

Sergio

>>> Any further comments here, Akhil?
>>>
>>> Thanks,
>>> Pablo
>>>
>> [Akhil] Sorry I missed out the previous reply from Sergio.
> Any more comments, Sergio?
>
> Pablo
>> Thanks,
>> Akhil

[dpdk-dev] [PATCH 00/19] KNI checkpatch cleanup

Hi Pattan,

On 10/5/2016 11:38 AM, Pattan, Reshma wrote:
> Hi Ferruh,
> 
> 
>> -Original Message-
>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Ferruh Yigit
>> Sent: Thursday, September 15, 2016 4:46 PM
>> To: dev at dpdk.org
>> Cc: Stephen Hemminger ; Ferruh Yigit
>> 
>> Subject: [dpdk-dev] [PATCH 00/19] KNI checkpatch cleanup
>>
>> KNI checkpatch cleanup, mostly non-functional but cosmetic modifications.
>> Only functional change is related logging, switched to kernel dynamic logging
>> and compile time KNI debug options removed, some log message levels
>> updated.
>>
>> Ferruh Yigit (19):
>>   kni: move externs to the header file
>>   kni: uninitialize global variables
>>   kni: make static struct const
>>   kni: whitespace, indentation, long line corrections
>>   kni: prefer unsigned int to unsigned
>>   kni: remove useless return
>>   kni: comparisons should place the constant on the right
>>   kni: trailing statements should be on next line
>>   kni: do not use assignment in if condition
>>   kni: macros with complex values should be enclosed in parentheses
>>   kni: prefer min_t to min
>>   kni: prefer ether_addr_copy to memcpy
>>   kni: update kernel logging
>>   kni: remove unnecessary 'out of memory' message
>>   kni: move functions to eliminate function declarations
>>   kni: remove compile time debug configuration
>>   kni: updated log messages
>>   kni: prefer uint32_t to unsigned int
>>   kni: move kernel version ifdefs to compat header
>>
> 
> Patches 4,5,11 and 13-19 are failed to apply.
> 

The patchset v3 has a dependency, mentioned in cover letter, I tested
with that patch and patches applied well for me,
would you mind trying again with that dependent patch applied please?

Thanks,
ferruh

[dpdk-dev] [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature



On 10/10/2016 06:22 AM, Yuanhan Liu wrote:
> On Mon, Oct 10, 2016 at 07:17:06AM +0300, Michael S. Tsirkin wrote:
>> On Mon, Oct 10, 2016 at 12:05:31PM +0800, Yuanhan Liu wrote:
>>> On Fri, Sep 30, 2016 at 10:16:43PM +0300, Michael S. Tsirkin wrote:
>> And the same is done is done in DPDK:
>>
>> static inline int __attribute__((always_inline))
>> copy_desc_to_mbuf(struct virtio_net *dev, struct vring_desc *descs,
>>   uint16_t max_desc, struct rte_mbuf *m, uint16_t desc_idx,
>>   struct rte_mempool *mbuf_pool)
>> {
>> ...
>> /*
>>  * A virtio driver normally uses at least 2 desc buffers
>>  * for Tx: the first for storing the header, and others
>>  * for storing the data.
>>  */
>> if (likely((desc->len == dev->vhost_hlen) &&
>>(desc->flags & VRING_DESC_F_NEXT) != 0)) {
>> desc = &descs[desc->next];
>> if (unlikely(desc->flags & VRING_DESC_F_INDIRECT))
>> return -1;
>>
>> desc_addr = gpa_to_vva(dev, desc->addr);
>> if (unlikely(!desc_addr))
>> return -1;
>>
>> rte_prefetch0((void *)(uintptr_t)desc_addr);
>>
>> desc_offset = 0;
>> desc_avail  = desc->len;
>> nr_desc+= 1;
>>
>> PRINT_PACKET(dev, (uintptr_t)desc_addr, desc->len, 0);
>> } else {
>> desc_avail  = desc->len - dev->vhost_hlen;
>> desc_offset = dev->vhost_hlen;
>> }
>
> Actually, the header is parsed in DPDK vhost implementation.
> But as Virtio PMD provides a zero'ed header, we could just parse
> the header only if VIRTIO_NET_F_NO_TX_HEADER is not negotiated.

 host can always skip the header parse if it wants to.
 It didn't seem worth it to add branches there but
 if I'm wrong, by all means code it up.
>>>
>>> It's added by following commit, which yields about 10% performance
>>> boosts for PVP case (with 64B packet size).
>>>
>>> At that time, a packet always use 2 descs. Since indirect desc is
>>> enabled (by default) now, the assumption is not true then. What's
>>> worse, it might even slow things a bit down. That should also be
>>> part of the reason why performance is slightly worse than before.
>>>
>>> --yliu
>>
>> I'm not sure I get what you are saying
>>
>>> commit 1d41d77cf81c448c1b09e1e859bfd300e2054a98
>>> Author: Yuanhan Liu 
>>> Date:   Mon May 2 17:46:17 2016 -0700
>>>
>>> vhost: optimize dequeue for small packets
>>>
>>> A virtio driver normally uses at least 2 desc buffers for Tx: the
>>> first for storing the header, and the others for storing the data.
>>>
>>> Therefore, we could fetch the first data desc buf before the main
>>> loop, and do the copy first before the check of "are we done yet?".
>>> This could save one check for small packets that just have one data
>>> desc buffer and need one mbuf to store it.
>>>
>>> Signed-off-by: Yuanhan Liu 
>>> Acked-by: Huawei Xie 
>>> Tested-by: Rich Lane 
>>
>> This fast-paths the 2-descriptors format but it's not active
>> for indirect descriptors. Is this what you mean?
>
> Yes. It's also not active when ANY_LAYOUT is actually turned on.
>> Should be a simple matter to apply this optimization for indirect.
>
> Might be.

If I understand the code correctly, indirect descs also benefit from 
this optimization, or am I missing something?

Maxime

[dpdk-dev] [PATCH 1/3] eal/drivers: prefix driver REGISTER macros with EAL

2016-10-10 Thread Neil Horman

On Sat, Oct 08, 2016 at 01:00:59PM +, Shreyansh Jain wrote:
> Hi Thomas,
> 
> > -Original Message-
> > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > Sent: Friday, October 07, 2016 7:22 PM
> > To: Shreyansh Jain 
> > Cc: david.marchand at 6wind.com; dev at dpdk.org
> > Subject: Re: [PATCH 1/3] eal/drivers: prefix driver REGISTER macros with EAL
> > 
> > 2016-10-07 19:11, Shreyansh Jain:
> > > --- a/mk/internal/rte.compile-pre.mk
> > > +++ b/mk/internal/rte.compile-pre.mk
> > > @@ -87,7 +87,7 @@ endif
> > >   PMDINFO_GEN = $(RTE_SDK_BIN)/app/dpdk-pmdinfogen $@ $@.pmd.c
> > >   PMDINFO_CC = $(CC) $(CPPFLAGS) $(CFLAGS) -c -o $@.pmd.o $@.pmd.c
> > >   PMDINFO_LD = $(CROSS)ld $(LDFLAGS) -r -o $@.o $@.pmd.o $@
> > > -PMDINFO_TO_O = if grep -q 'DRIVER_REGISTER_.*(.*)' $<; then \
> > > +PMDINFO_TO_O = if grep 'EAL_REGISTER_.*(.*)' $<; then \
> > >  echo "$(if $V,$(PMDINFO_GEN),  PMDINFO $@.pmd.c)" && \
> > >  $(PMDINFO_GEN) && \
> > >  echo "$(if $V,$(PMDINFO_CC),  CC $@.pmd.o)" && \
> > >
> > > --->8---
> > >CC eal_pci_vfio.o
> > >PMDINFO eal_pci_vfio.o.pmd.c
> > > /bin/sh: 1:
> > > /home/shreyansh/build/DPDK/02_dpdk/x86_64-native-linuxapp-gcc/app/dpdk-
> > pmdinfogen:
> > > not found
> > > /home/shreyansh/build/DPDK/02_dpdk/mk/internal/rte.compile-pre.mk:138:
> > > recipe for target 'eal_pci_vfio.o' failed
> > > --->8---
> > >
> > > I don't think PMDINFO should be running on eal_pci_vfio file. Isn't it?
> > 
> > Every files are scanned for the pattern.
>  
> Sorry, I should have been clearer in my statement.
> I meant, I didn't think eal_pci_vfio.o had anything of interest for the PMD 
> tool and hence the mk files would have skipped over it in absence of a match.
> I understand that PMDINFO would run on all files.
> 
Thats incorrect, the Makefile does a REGEX search for appropriate registration
macros that imply the need for pmdinfo to run.  If its running on an
inappropriate file its because your new macros inadvertently match the current
regex, hence my suggestion that the regex should be tuned to be more specific

Neil

> > 
> > > Is it because EAL_REGISTER_* is matching EAL_REGISTER_TAILQ like macro
> > > as well?
> > 
> > Probably.
> > That's another argument in favor of good prefixes.
> > I think you should use RTE_DRIVER_REGISTER_ or better, RTE_PMD_REGISTER_
>  
> I thought of EAL_PMD_REGISTER_* but dropped it because for PCI_TABLE like 
> macros, the name got really long (EAL_PMD_REGISTER_PCI_TABLE()).
> Anyways, I will use RTE_PMD_REGISTER_* now and send another version.
> 
> > >
> > > I am not very well versed with how PMDINFO is linking with the build
> > > system so any hints/insight into this would be really helpful.
> > >
> > > One I get more clarity on this, I will push a new version of this patch.
> > 
> > Thanks
> 
> -
> Shreyansh
>

[dpdk-dev] [PATCH v2 14/15] ethdev: Support VFs on the different PCI domains

2016-10-10 Thread Kamil Rytarowski



W dniu 10.10.2016 o 12:19, Ferruh Yigit pisze:
> Hi Kamil,
>
> On 9/30/2016 1:05 PM, Kamil Rytarowski wrote:
>> It's possible to have the same numbers for bus, device id and function,
>> therefore we need to differentiate on domain.
>>
>> This enables DPDK with multiple VFs on ThunderX 2-socket hardware.
>>
>> Signed-off-by: Maciej Czekaj 
>> Signed-off-by: Kamil Rytarowski 
>> Signed-off-by: Zyta Szpak 
>> Signed-off-by: Slawomir Rosek 
>> Signed-off-by: Radoslaw Biernacki 
>> Signed-off-by: Jerin Jacob 
>> ---
>>   lib/librte_ether/rte_ethdev.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
>> index 382c959..01d5fb0 100644
>> --- a/lib/librte_ether/rte_ethdev.c
>> +++ b/lib/librte_ether/rte_ethdev.c
>> @@ -225,7 +225,7 @@ rte_eth_dev_create_unique_device_name(char *name, size_t 
>> size,
>>   {
>>  int ret;
>>   
>> -ret = snprintf(name, size, "%d:%d.%d",
>> +ret = snprintf(name, size, "%d:%d:%d.%d", pci_dev->addr.domain,
>>  pci_dev->addr.bus, pci_dev->addr.devid,
>>  pci_dev->addr.function);
>>  if (ret < 0)
>>
> Is it possible to separate this patch from patchset, this is a ethdev
> patch and it seems not directly related to the rest of the patchset?
>
> Thanks,
> ferruh

This patch is directly related with secondary queue set support on 
ThunderX, but it can be skipped in this chain of patches and applied as 
a standalone diff.

Is disabling this one on patch work sufficient? Of course unless there 
are no more comments to produce v3 of the original patch chain "Add 
support for secondary queue set in nicvf thunderx driver".

Should I resubmit it as a new standalone patch?

[dpdk-dev] [RFC v2] Generic flow director/filtering/classification API

2016-10-10 Thread Adrien Mazarguil

Hi Wei,

On Mon, Oct 10, 2016 at 09:42:53AM +, Zhao1, Wei wrote:
> Hi Adrien Mazarguil,
> 
> In your v2 version of rte_flow.txt , there is an action type 
> RTE_FLOW_ACTION_TYPE_MARK,  but there is no definition of struct 
> rte_flow_action_mark.
> And there is  an definition of struct rte_flow_action_id. Is it a typo or 
> other usage?
> 
> Thank you.
> 
> struct rte_flow_action_id {
>   uint32_t id; /**< 32 bit value to return with packets. */
> };

That is indeed a mistake, this struct should be named
"rte_flow_action_mark". I'll fix it for the next update, thanks.

-- 
Adrien Mazarguil
6WIND

[dpdk-dev] [PATCH v3] net/kni: add KNI PMD

Add KNI PMD which wraps librte_kni for ease of use.

KNI PMD can be used as any regular PMD to send / receive packets to the
Linux networking stack.

Signed-off-by: Ferruh Yigit 
---

v3:
* rebase on top of latest master

v2:
* updated driver name eth_kni -> net_kni
---
 config/common_base  |   1 +
 config/common_linuxapp  |   1 +
 drivers/net/Makefile|   1 +
 drivers/net/kni/Makefile|  63 +
 drivers/net/kni/rte_eth_kni.c   | 463 
 drivers/net/kni/rte_pmd_kni_version.map |   4 +
 mk/rte.app.mk   |  10 +-
 7 files changed, 538 insertions(+), 5 deletions(-)
 create mode 100644 drivers/net/kni/Makefile
 create mode 100644 drivers/net/kni/rte_eth_kni.c
 create mode 100644 drivers/net/kni/rte_pmd_kni_version.map

diff --git a/config/common_base b/config/common_base
index f5d2eff..03b93c7 100644
--- a/config/common_base
+++ b/config/common_base
@@ -543,6 +543,7 @@ CONFIG_RTE_PIPELINE_STATS_COLLECT=n
 # Compile librte_kni
 #
 CONFIG_RTE_LIBRTE_KNI=n
+CONFIG_RTE_LIBRTE_PMD_KNI=n
 CONFIG_RTE_KNI_KMOD=n
 CONFIG_RTE_KNI_PREEMPT_DEFAULT=y
 CONFIG_RTE_KNI_KO_DEBUG=n
diff --git a/config/common_linuxapp b/config/common_linuxapp
index 2483dfa..2ecd510 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -39,6 +39,7 @@ CONFIG_RTE_EAL_IGB_UIO=y
 CONFIG_RTE_EAL_VFIO=y
 CONFIG_RTE_KNI_KMOD=y
 CONFIG_RTE_LIBRTE_KNI=y
+CONFIG_RTE_LIBRTE_PMD_KNI=y
 CONFIG_RTE_LIBRTE_VHOST=y
 CONFIG_RTE_LIBRTE_PMD_VHOST=y
 CONFIG_RTE_LIBRTE_PMD_AF_PACKET=y
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index bc93230..c4771cd 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -41,6 +41,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += enic
 DIRS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k
 DIRS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e
 DIRS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD) += ixgbe
+DIRS-$(CONFIG_RTE_LIBRTE_PMD_KNI) += kni
 DIRS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4
 DIRS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5
 DIRS-$(CONFIG_RTE_LIBRTE_MPIPE_PMD) += mpipe
diff --git a/drivers/net/kni/Makefile b/drivers/net/kni/Makefile
new file mode 100644
index 000..0b7cf91
--- /dev/null
+++ b/drivers/net/kni/Makefile
@@ -0,0 +1,63 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2016 Intel Corporation. All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+# * Redistributions of source code must retain the above copyright
+#   notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+#   notice, this list of conditions and the following disclaimer in
+#   the documentation and/or other materials provided with the
+#   distribution.
+# * Neither the name of Intel Corporation nor the names of its
+#   contributors may be used to endorse or promote products derived
+#   from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+#
+# library name
+#
+LIB = librte_pmd_kni.a
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS)
+LDLIBS += -lpthread
+
+EXPORT_MAP := rte_pmd_kni_version.map
+
+LIBABIVER := 1
+
+#
+# all source are stored in SRCS-y
+#
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_KNI) += rte_eth_kni.c
+
+#
+# Export include files
+#
+SYMLINK-y-include +=
+
+# this lib depends upon:
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_KNI) += lib/librte_eal
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_KNI) += lib/librte_ether
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_KNI) += lib/librte_kni
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_KNI) += lib/librte_mbuf
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_KNI) += lib/librte_mempool
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/kni/rte_eth_kni.c b/drivers/net/kni/rte_eth_kni.c
new file mode 100644
index 000..ce9e758
--- /dev/null
+++ b/drivers/net/kni/rte_eth_kni.c
@@ -0,0 +1,463 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and bina

[dpdk-dev] [PATCH] net/i40e: add additional prefetch instructions for bulk rx



> -Original Message-
> From: Yigit, Ferruh
> Sent: Wednesday, September 14, 2016 9:25 PM
> To: Vladyslav Buslov ; Zhang, Helin
> ; Wu, Jingjing 
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] net/i40e: add additional prefetch 
> instructions for bulk rx
> 
> On 7/14/2016 6:27 PM, Vladyslav Buslov wrote:
> > Added prefetch of first packet payload cacheline in i40e_rx_scan_hw_ring
> > Added prefetch of second mbuf cacheline in i40e_rx_alloc_bufs
> >
> > Signed-off-by: Vladyslav Buslov 
> > ---
> >  drivers/net/i40e/i40e_rxtx.c | 7 +--
> >  1 file changed, 5 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
> > index d3cfb98..e493fb4 100644
> > --- a/drivers/net/i40e/i40e_rxtx.c
> > +++ b/drivers/net/i40e/i40e_rxtx.c
> > @@ -1003,6 +1003,7 @@ i40e_rx_scan_hw_ring(struct i40e_rx_queue *rxq)
> > /* Translate descriptor info to mbuf parameters */
> > for (j = 0; j < nb_dd; j++) {
> > mb = rxep[j].mbuf;
> > +   rte_prefetch0(RTE_PTR_ADD(mb->buf_addr,
> RTE_PKTMBUF_HEADROOM));

Why did prefetch here? I think if application need to deal with packet, it is 
more suitable to put it in application.

> > qword1 = rte_le_to_cpu_64(\
> > rxdp[j].wb.qword1.status_error_len);
> > pkt_len = ((qword1 &
> I40E_RXD_QW1_LENGTH_PBUF_MASK) >>
> > @@ -1086,9 +1087,11 @@ i40e_rx_alloc_bufs(struct i40e_rx_queue *rxq)
> >
> > rxdp = &rxq->rx_ring[alloc_idx];
> > for (i = 0; i < rxq->rx_free_thresh; i++) {
> > -   if (likely(i < (rxq->rx_free_thresh - 1)))
> > +   if (likely(i < (rxq->rx_free_thresh - 1))) {
> > /* Prefetch next mbuf */
> > -   rte_prefetch0(rxep[i + 1].mbuf);
> > +   rte_prefetch0(&rxep[i + 1].mbuf->cacheline0);
> > +   rte_prefetch0(&rxep[i + 1].mbuf->cacheline1);
> > +   }
Agree with this change. And when I test it by testpmd with iofwd, no 
performance increase is observed but minor decrease.
Can you share will us when it will benefit the performance in your scenario ? 


Thanks
Jingjing

[dpdk-dev] [PATCH v2 14/15] ethdev: Support VFs on the different PCI domains

On 10/10/2016 2:01 PM, Kamil Rytarowski wrote:
> 
> 
> W dniu 10.10.2016 o 12:19, Ferruh Yigit pisze:
>> Hi Kamil,
>>
>> On 9/30/2016 1:05 PM, Kamil Rytarowski wrote:
>>> It's possible to have the same numbers for bus, device id and function,
>>> therefore we need to differentiate on domain.
>>>
>>> This enables DPDK with multiple VFs on ThunderX 2-socket hardware.
>>>
>>> Signed-off-by: Maciej Czekaj 
>>> Signed-off-by: Kamil Rytarowski 
>>> Signed-off-by: Zyta Szpak 
>>> Signed-off-by: Slawomir Rosek 
>>> Signed-off-by: Radoslaw Biernacki 
>>> Signed-off-by: Jerin Jacob 
>>> ---
>>>   lib/librte_ether/rte_ethdev.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
>>> index 382c959..01d5fb0 100644
>>> --- a/lib/librte_ether/rte_ethdev.c
>>> +++ b/lib/librte_ether/rte_ethdev.c
>>> @@ -225,7 +225,7 @@ rte_eth_dev_create_unique_device_name(char *name, 
>>> size_t size,
>>>   {
>>> int ret;
>>>   
>>> -   ret = snprintf(name, size, "%d:%d.%d",
>>> +   ret = snprintf(name, size, "%d:%d:%d.%d", pci_dev->addr.domain,
>>> pci_dev->addr.bus, pci_dev->addr.devid,
>>> pci_dev->addr.function);
>>> if (ret < 0)
>>>
>> Is it possible to separate this patch from patchset, this is a ethdev
>> patch and it seems not directly related to the rest of the patchset?
>>
>> Thanks,
>> ferruh
> 
> This patch is directly related with secondary queue set support on 
> ThunderX, but it can be skipped in this chain of patches and applied as 
> a standalone diff.
> 
> Is disabling this one on patch work sufficient? Of course unless there 
> are no more comments to produce v3 of the original patch chain "Add 
> support for secondary queue set in nicvf thunderx driver".

I think it is sufficient, at least I don't have any more comment for
rest of the patchset and it looks good to me.

> 
> Should I resubmit it as a new standalone patch?

Can you please resubmit just this one patch, so it can be properly reviewed.

Thanks,
ferruh

[dpdk-dev] [PATCH] net/mlx5: fix init on secondary process

On 9/28/2016 3:24 PM, Olivier Gournet wrote:
> Fixes: 1d88ba171942 ("net/mlx5: refactor Tx data path")
> 
> Signed-off-by: Olivier Gournet 
> ---

Adding maintainer to the thread

[dpdk-dev] [PATCH v4 4/6] i40e: add Tx preparation

>  #include "i40e_logs.h"
>  #include "base/i40e_prototype.h"
> @@ -79,6 +81,17 @@
>   PKT_TX_TCP_SEG | \
>   PKT_TX_OUTER_IP_CKSUM)
> 
> +#define I40E_TX_OFFLOAD_MASK (  \
> + PKT_TX_IP_CKSUM |   \
> + PKT_TX_L4_MASK |\
> + PKT_TX_OUTER_IP_CKSUM | \
> + PKT_TX_TCP_SEG |\
> + PKT_TX_QINQ_PKT |   \
> + PKT_TX_VLAN_PKT)
> +
More TX flags are added for tunneling as below.
/**
 * Bits 45:48 used for the tunnel type.
 * When doing Tx offload like TSO or checksum, the HW needs to configure the
 * tunnel type into the HW descriptors.
 */
#define PKT_TX_TUNNEL_VXLAN   (0x1ULL << 45)
#define PKT_TX_TUNNEL_GRE (0x2ULL << 45)
#define PKT_TX_TUNNEL_IPIP(0x3ULL << 45)
#define PKT_TX_TUNNEL_GENEVE  (0x4ULL << 45)
/* add new TX TUNNEL type here */
#define PKT_TX_TUNNEL_MASK(0xFULL << 45)

Please check:
commit 63c0d74daaa9a807fbca8a3e363bbe41d6fb715f
Author: Jianfeng Tan 
Date:   Mon Aug 1 03:56:53 2016 +

mbuf: add Tx side tunneling type


Thanks
Jingjing

[dpdk-dev] [PATCH v4 1/6] ethdev: add Tx preparation

Hi,

Now that the feature seems to meet a consensus, I've looked at it more
closely before integrating. Sorry if it appears like a late review.

2016-09-30 11:00, Tomasz Kulasek:
> Added API for `rte_eth_tx_prep`

I would love to read the usability and performance considerations here.
No need for something as long as the cover letter. Just few lines
about why it is needed and why it is a good choice for performance.

> uint16_t rte_eth_tx_prep(uint8_t port_id, uint16_t queue_id,
>   struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
> 
> Added fields to the `struct rte_eth_desc_lim`:
> 
>   uint16_t nb_seg_max;
>   /**< Max number of segments per whole packet. */
> 
>   uint16_t nb_mtu_seg_max;
>   /**< Max number of segments per one MTU */
[...]
> +#else
> +
> +static inline uint16_t
> +rte_eth_tx_prep(uint8_t port_id __rte_unused, uint16_t queue_id __rte_unused,
> + struct rte_mbuf **tx_pkts __rte_unused, uint16_t nb_pkts)

Doxygen is failing here.
Have you tried to move __rte_unused before the identifier?

[...]
> +#define PKT_TX_OFFLOAD_MASK (\
> + PKT_TX_IP_CKSUM |\
> + PKT_TX_L4_MASK | \
> + PKT_TX_OUTER_IP_CKSUM |  \
> + PKT_TX_TCP_SEG | \
> + PKT_TX_QINQ_PKT |\
> + PKT_TX_VLAN_PKT)

We should really stop adding some public constants without the proper
RTE prefix.
And by the way, should not we move such flags into rte_net?

[...]
> -SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include := rte_ip.h rte_tcp.h rte_udp.h 
> rte_sctp.h rte_icmp.h rte_arp.h
> +SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include := rte_ip.h rte_tcp.h rte_udp.h 
> rte_sctp.h rte_icmp.h rte_arp.h rte_pkt.h

You can use the += operator on a new line for free :)

No more comments, the rest seems OK. Thanks

[dpdk-dev] [PATCH] net/enic: update enic guide and add warning for invalid conf

2016-10-10 Thread Mcnamara, John

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of John Daley
> Sent: Thursday, September 29, 2016 9:55 PM
> To: Richardson, Bruce 
> Cc: dev at dpdk.org; Nelson Escobar 
> Subject: [dpdk-dev] [PATCH] net/enic: update enic guide and add warning
> for invalid conf
> 
> From: Nelson Escobar 
> 
> Update the enic guide to better explain how to setup vNIC parameters on
> the Cisco VIC since the introduction of rx scatter and print an error
> message for the case of having 1 RQ configured in the vNIC.
> 
> Signed-off-by: Nelson Escobar 

Hi,

It would be better in the RST documentation to use  backticks to
designate function and variable names as fixed width. Also, the documentation
convention is to use Rx/Tx. However, these are minor so the patch is okay as
it is.

Acked-by: John McNamara

[dpdk-dev] [PATCH v2] mk: gcc -march support for intel processors code names

2016-09-28 10:54, Liu, Yong:
> Tested-by: Yong Liu 
> FangFang Wei 
[...]
>Description: Build test on different distributions
>Command / instruction:
>  Verify build pass on listed distributions.
> 
>  OS   GCCKernel
>  Ubuntu 16.04 5.4.0  4.4.0-36-generic
>  Fedora23 5.3.1  4.2.3-300
>  Fedora24 6.1.1  4.6.4-301
>  Ubuntu 12.04 4.6.3  3.8.0-29
>  Ubuntu 12.04 i6864.6.3  3.8.0-29
>  Ubuntu 14.04 4.8.4  3.16.0-30
>  Ubuntu 14.04 i6864.8.4  3.16.0-30
>  Fedora18 4.7.2  3.6.10-4
>  Fedora18 i6864.7.2  3.6.10-4
>  Fedora20 4.8.2  3.15.6-200
>  Fedora20 i6864.8.3  3.11.0
>  Suse11SP24.5.1  3.0.13-0.2
>  Suse12SP34.7.2  3.7.10-1.1
>  RHEL7.0  4.8.2  3.10.0-123
>  RHEL7.2  4.8.5  3.10.0-327
>  CentOS7.04.8.5  3.10.0-327
>  FreeBSD10.0  4.8.4  10.0-RELEASE
>  FreeBSD10.3  4.8.5  10.3-RELEASE

I do not understand this test.
This patch is dropping a lot of optimizations with compilers older than 4.9 !

Why not recommend GCC 4.9 and keep the graceful degradation for older versions,
at least for one more year, even if it is not optimal for newer architectures?

[dpdk-dev] [PATCH v7 2/2] net/ixgbe: add callback to user app on VF to PF mbox msg

2016-10-10 Thread Iremonger, Bernard

Hi Thomas,



> Subject: Re: [dpdk-dev] [PATCH v7 2/2] net/ixgbe: add callback to user app
> on VF to PF mbox msg
> 
> 2016-10-07 17:46, Bernard Iremonger:
> > +struct rte_pmd_ixgbe_mb_event_param {
> > +   uint16_t vfid; /**< Virtual Function number */
> > +   uint16_t msg_type; /**< message type */
> > +   uint16_t retval;   /**< return value */
> > +   void *userdata;/**< pointer to user data */
> 
> Generally speaking, the user data is a pointer passed by the application when
> registering the callback and must be untouched.
> It should be the name of the parameter that you are overriding with this
> structure.
> So this "userdata" pointer could probably be better defined.
> 
> By the way, it is far from trivial to understand how to write the callback.
> I think it deserves more explanations.
> 

I will clarify the description of the struct  rte_pmd_ixgbe_mb_event_param, and 
rename the member userdate to msg in a v8 patchset.

Regards,

Bernard.

[dpdk-dev] [PATCH v8 0/2] modify callback for VF management

2016-10-10 Thread Bernard Iremonger

This patchset modifies the callback function for VF management.

A third parameter has been added to the _rte_eth_dev_callback_process
function. All references to this function have been updated.
Changes have been made to the ixgbe_rcv_msg_from_vf function to
use the new callback parameter.

This patchset depends on the following patch.
http://dpdk.org/dev/patchwork/patch/16430/
[dpdk-dev,v7,1/2] net/ixgbe: add API's for VF management

These patches were part of the RFC PATCH v2 of the above patchset,
but were dropped from the v3 patchset.

Changes in v8:
Rebased to latest master.
Improve description of struct rte_pmd_ixgbe_mb_event_param,
rename  member user_data to msg.

Changes in v7:
Rebased to latest master.
Move references to _rte_eth_dev_callback_process in ixgbe patch
to librte_ether patch.
Revised description of callback parameter.

Changes in v6:
Rebased to latest master.
Squashed patches down to two patches.
Renamed parameter to _rte_eth_dev_callback_process function.
Updated API descriptions.

Changes in v5:
Rebased to latest master.
Added parameter to the _rte_eth_dev_callback_process function
Removed two new callback functions which were added previously.
Updated all calls to the _rte_eth_dev_callback_process function.

Changes in v4:
Rebased to latest master.
Moved the callback parameter structure from the ethdev to the ixgbe PMD.

Changes in v3:
Rebased to latest master.
Submitted as a seperate patchset.
Moved the response enums from the ethdev to the ixgbe PMD.

Changes in v2:
Rebased to latest master.

Bernard Iremonger (2):
  librte_ether: modify internal callback function
  net/ixgbe: add callback to user app on VF to PF mbox msg

 app/test/virtual_pmd.c |  2 +-
 drivers/net/bonding/rte_eth_bond_pmd.c |  6 ++---
 drivers/net/e1000/em_ethdev.c  |  2 +-
 drivers/net/e1000/igb_ethdev.c |  4 ++--
 drivers/net/enic/enic_main.c   |  2 +-
 drivers/net/i40e/i40e_ethdev.c |  4 ++--
 drivers/net/i40e/i40e_ethdev_vf.c  |  2 +-
 drivers/net/ixgbe/ixgbe_ethdev.c   |  4 ++--
 drivers/net/ixgbe/ixgbe_pf.c   | 42 +-
 drivers/net/ixgbe/rte_pmd_ixgbe.h  | 20 
 drivers/net/mlx4/mlx4.c|  4 ++--
 drivers/net/mlx5/mlx5_ethdev.c |  4 ++--
 drivers/net/nfp/nfp_net.c  |  2 +-
 drivers/net/thunderx/nicvf_ethdev.c|  2 +-
 drivers/net/vhost/rte_eth_vhost.c  |  6 ++---
 drivers/net/virtio/virtio_ethdev.c |  2 +-
 lib/librte_ether/rte_ethdev.c  |  5 +++-
 lib/librte_ether/rte_ethdev.h  | 12 +-
 18 files changed, 94 insertions(+), 31 deletions(-)

-- 
2.9.0

[dpdk-dev] [PATCH v8 1/2] librte_ether: modify internal callback function

2016-10-10 Thread Bernard Iremonger

add cb_arg parameter to the _rte_eth_dev_callback_process function.

Adding a parameter to this function allows passing information
to the application when an eth device event occurs such as
a VF to PF message.
This allows the application to decide if a particular function
is permitted.

drivers/net: add parameter to callback process function

add parameter to call of _rte_eth_dev_callback_process function
in the following PMD's:

net/bonding
net/e1000
net/i40e
net/enic
net/ixgbe
net/mlx4
net/mlx5
net/nfp
net/thunderx
net/vhost
net/virtio

app/test: add parameter to callback process function
add parameter to call of _rte_eth_dev_callback_process function

Signed-off-by: Bernard Iremonger 
Signed-off-by: Alex Zelezniak 
---
 app/test/virtual_pmd.c |  2 +-
 drivers/net/bonding/rte_eth_bond_pmd.c |  6 +++---
 drivers/net/e1000/em_ethdev.c  |  2 +-
 drivers/net/e1000/igb_ethdev.c |  4 ++--
 drivers/net/enic/enic_main.c   |  2 +-
 drivers/net/i40e/i40e_ethdev.c |  4 ++--
 drivers/net/i40e/i40e_ethdev_vf.c  |  2 +-
 drivers/net/ixgbe/ixgbe_ethdev.c   |  4 ++--
 drivers/net/mlx4/mlx4.c|  4 ++--
 drivers/net/mlx5/mlx5_ethdev.c |  4 ++--
 drivers/net/nfp/nfp_net.c  |  2 +-
 drivers/net/thunderx/nicvf_ethdev.c|  2 +-
 drivers/net/vhost/rte_eth_vhost.c  |  6 +++---
 drivers/net/virtio/virtio_ethdev.c |  2 +-
 lib/librte_ether/rte_ethdev.c  |  5 -
 lib/librte_ether/rte_ethdev.h  | 12 +++-
 16 files changed, 38 insertions(+), 25 deletions(-)

diff --git a/app/test/virtual_pmd.c b/app/test/virtual_pmd.c
index 4831113..65b44c6 100644
--- a/app/test/virtual_pmd.c
+++ b/app/test/virtual_pmd.c
@@ -485,7 +485,7 @@ virtual_ethdev_simulate_link_status_interrupt(uint8_t 
port_id,

vrtl_eth_dev->data->dev_link.link_status = link_status;

-   _rte_eth_dev_callback_process(vrtl_eth_dev, RTE_ETH_EVENT_INTR_LSC);
+   _rte_eth_dev_callback_process(vrtl_eth_dev, RTE_ETH_EVENT_INTR_LSC, 
NULL);
 }

 int
diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c 
b/drivers/net/bonding/rte_eth_bond_pmd.c
index fb4050c..8d510e3 100644
--- a/drivers/net/bonding/rte_eth_bond_pmd.c
+++ b/drivers/net/bonding/rte_eth_bond_pmd.c
@@ -1958,7 +1958,7 @@ bond_ethdev_delayed_lsc_propagation(void *arg)
return;

_rte_eth_dev_callback_process((struct rte_eth_dev *)arg,
-   RTE_ETH_EVENT_INTR_LSC);
+   RTE_ETH_EVENT_INTR_LSC, NULL);
 }

 void
@@ -2079,7 +2079,7 @@ bond_ethdev_lsc_event_callback(uint8_t port_id, enum 
rte_eth_event_type type,
(void *)bonded_eth_dev);
else
_rte_eth_dev_callback_process(bonded_eth_dev,
-   RTE_ETH_EVENT_INTR_LSC);
+   RTE_ETH_EVENT_INTR_LSC, NULL);

} else {
if (internals->link_down_delay_ms > 0)
@@ -2088,7 +2088,7 @@ bond_ethdev_lsc_event_callback(uint8_t port_id, enum 
rte_eth_event_type type,
(void *)bonded_eth_dev);
else
_rte_eth_dev_callback_process(bonded_eth_dev,
-   RTE_ETH_EVENT_INTR_LSC);
+   RTE_ETH_EVENT_INTR_LSC, NULL);
}
}
 }
diff --git a/drivers/net/e1000/em_ethdev.c b/drivers/net/e1000/em_ethdev.c
index f767e1c..e13e858 100644
--- a/drivers/net/e1000/em_ethdev.c
+++ b/drivers/net/e1000/em_ethdev.c
@@ -1599,7 +1599,7 @@ eth_em_interrupt_handler(__rte_unused struct 
rte_intr_handle *handle,

eth_em_interrupt_get_status(dev);
eth_em_interrupt_action(dev);
-   _rte_eth_dev_callback_process(dev, RTE_ETH_EVENT_INTR_LSC);
+   _rte_eth_dev_callback_process(dev, RTE_ETH_EVENT_INTR_LSC, NULL);
 }

 static int
diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
index 5a1a83e..d9ab2df 100644
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -2683,7 +2683,7 @@ eth_igb_interrupt_action(struct rte_eth_dev *dev)
E1000_WRITE_REG(hw, E1000_TCTL, tctl);
E1000_WRITE_REG(hw, E1000_RCTL, rctl);
E1000_WRITE_FLUSH(hw);
-   _rte_eth_dev_callback_process(dev, RTE_ETH_EVENT_INTR_LSC);
+   _rte_eth_dev_callback_process(dev, RTE_ETH_EVENT_INTR_LSC, 
NULL);
}

return 0;
@@ -2743,7 +2743,7 @@ void igbvf_mbx_process(struct rte_eth_dev *dev)

/* PF reset VF event */
if (in_msg == E1000_PF_CONTROL_MSG)
-   _rte_eth_dev_callback_process(dev, RTE_ETH_EVENT_INTR_RESET);
+   _rte_eth_dev_callback_process(dev, RTE_ETH_EVENT_INTR_RESET, 
NULL);
 }

 static int
diff --git a/drivers/net/enic/enic_main.c b/dr

[dpdk-dev] [PATCH v8 2/2] net/ixgbe: add callback to user app on VF to PF mbox msg

2016-10-10 Thread Bernard Iremonger

call _rte_eth_dev_callback_process from ixgbe_rcv_msg_from_vf function.

The callback asks the user application if it is allowed to perform
the function.

If the cb_param.retval is RTE_PMD_IXGBE_MB_EVENT_PROCEED then continue,
if 0, do nothing and send ACK to VF
if > 1, do nothing and send NAK to VF.

Signed-off-by: Alex Zelezniak 
Signed-off-by: Bernard Iremonger 
---
 drivers/net/ixgbe/ixgbe_pf.c  | 42 +--
 drivers/net/ixgbe/rte_pmd_ixgbe.h | 20 +++
 2 files changed, 56 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_pf.c b/drivers/net/ixgbe/ixgbe_pf.c
index 56393ff..26395e4 100644
--- a/drivers/net/ixgbe/ixgbe_pf.c
+++ b/drivers/net/ixgbe/ixgbe_pf.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -51,6 +51,7 @@

 #include "base/ixgbe_common.h"
 #include "ixgbe_ethdev.h"
+#include "rte_pmd_ixgbe.h"

 #define IXGBE_MAX_VFTA (128)
 #define IXGBE_VF_MSG_SIZE_DEFAULT 1
@@ -660,6 +661,7 @@ ixgbe_rcv_msg_from_vf(struct rte_eth_dev *dev, uint16_t vf)
struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
struct ixgbe_vf_info *vfinfo =
*IXGBE_DEV_PRIVATE_TO_P_VFDATA(dev->data->dev_private);
+   struct rte_pmd_ixgbe_mb_event_param cb_param;

retval = ixgbe_read_mbx(hw, msgbuf, mbx_size, vf);
if (retval) {
@@ -674,27 +676,54 @@ ixgbe_rcv_msg_from_vf(struct rte_eth_dev *dev, uint16_t 
vf)
/* flush the ack before we write any messages back */
IXGBE_WRITE_FLUSH(hw);

+   /**
+* initialise structure to send to user application
+* will return response from user in retval field
+*/
+   cb_param.retval = RTE_PMD_IXGBE_MB_EVENT_PROCEED;
+   cb_param.vfid = vf;
+   cb_param.msg_type = msgbuf[0] & 0x;
+   cb_param.msg = (void *)msgbuf;
+
/* perform VF reset */
if (msgbuf[0] == IXGBE_VF_RESET) {
int ret = ixgbe_vf_reset(dev, vf, msgbuf);

vfinfo[vf].clear_to_send = true;
+
+   /* notify application about VF reset */
+   _rte_eth_dev_callback_process(dev, RTE_ETH_EVENT_VF_MBOX, 
&cb_param);
return ret;
}

+   /**
+* ask user application if we allowed to perform those functions
+* if we get cb_param.retval == RTE_PMD_IXGBE_MB_EVENT_PROCEED
+* then business as usual,
+* if 0, do nothing and send ACK to VF
+* if cb_param.retval > 1, do nothing and send NAK to VF
+*/
+   _rte_eth_dev_callback_process(dev, RTE_ETH_EVENT_VF_MBOX, &cb_param);
+
+   retval = cb_param.retval;
+
/* check & process VF to PF mailbox message */
switch ((msgbuf[0] & 0x)) {
case IXGBE_VF_SET_MAC_ADDR:
-   retval = ixgbe_vf_set_mac_addr(dev, vf, msgbuf);
+   if (retval == RTE_PMD_IXGBE_MB_EVENT_PROCEED)
+   retval = ixgbe_vf_set_mac_addr(dev, vf, msgbuf);
break;
case IXGBE_VF_SET_MULTICAST:
-   retval = ixgbe_vf_set_multicast(dev, vf, msgbuf);
+   if (retval == RTE_PMD_IXGBE_MB_EVENT_PROCEED)
+   retval = ixgbe_vf_set_multicast(dev, vf, msgbuf);
break;
case IXGBE_VF_SET_LPE:
-   retval = ixgbe_set_vf_lpe(dev, vf, msgbuf);
+   if (retval == RTE_PMD_IXGBE_MB_EVENT_PROCEED)
+   retval = ixgbe_set_vf_lpe(dev, vf, msgbuf);
break;
case IXGBE_VF_SET_VLAN:
-   retval = ixgbe_vf_set_vlan(dev, vf, msgbuf);
+   if (retval == RTE_PMD_IXGBE_MB_EVENT_PROCEED)
+   retval = ixgbe_vf_set_vlan(dev, vf, msgbuf);
break;
case IXGBE_VF_API_NEGOTIATE:
retval = ixgbe_negotiate_vf_api(dev, vf, msgbuf);
@@ -704,7 +733,8 @@ ixgbe_rcv_msg_from_vf(struct rte_eth_dev *dev, uint16_t vf)
msg_size = IXGBE_VF_GET_QUEUE_MSG_SIZE;
break;
case IXGBE_VF_UPDATE_XCAST_MODE:
-   retval = ixgbe_set_vf_mc_promisc(dev, vf, msgbuf);
+   if (retval == RTE_PMD_IXGBE_MB_EVENT_PROCEED)
+   retval = ixgbe_set_vf_mc_promisc(dev, vf, msgbuf);
break;
default:
PMD_DRV_LOG(DEBUG, "Unhandled Msg %8.8x", (unsigned)msgbuf[0]);
diff --git a/drivers/net/ixgbe/rte_pmd_ixgbe.h 
b/drivers/net/ixgbe/rte_pmd_ixgbe.h
index 2689668..74b64ad 100644
--- a/drivers/net/ixgbe/rte_pmd_ixgbe.h
+++ b/drivers/net/ixgbe/rte_pmd_ixgbe.h
@@ -179,4 +179,24 @@ int rte_pmd_ixgbe_set_vf_split_drop_en(uint8_t port, 
uint16_t vf, uint8_t on);
  */
 int
 rte_pmd_ixgbe_set_vf_vlan_stripq(uint8_t port, uint16_t vf, u

[dpdk-dev] [PATCH] pdump: fix dir permissions value in mkdir call

2016-10-10 Thread Reshma Pattan

From: Reshma Pattan 

Inside the function pdump_get_socket_path(), pdump socket
directories are created using mkdir() call with permissions 700,
which was assigning wrong permissions to the directories
i.e. "d-w-r-xr-T" instead of drwx---. The reason is mkdir() call
doesn't consider 700 as an octal value until unless 0 is explicitly
added before the value. Because of this, socket creation failure is
observed when DPDK application was ran in non root user mode.
DPDK application running in root user mode never reported the issue.

So 0 is prefixed to the value to create directories with
the correct permissions.

Fixes: e4ffa2d3 ("pdump: fix error handlings")
Fixes: bdd8dcc6 ("pdump: fix default socket path")

Reported-by: Jianfeng Tan 
Signed-off-by: Reshma Pattan 
---
 lib/librte_pdump/rte_pdump.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/librte_pdump/rte_pdump.c b/lib/librte_pdump/rte_pdump.c
index 9b921ce..ea5ccd9 100644
--- a/lib/librte_pdump/rte_pdump.c
+++ b/lib/librte_pdump/rte_pdump.c
@@ -471,12 +471,12 @@ pdump_get_socket_path(char *buffer, int bufsz, enum 
rte_pdump_socktype type)
snprintf(dpdk_dir, sizeof(dpdk_dir), "%s%s",
SOCKET_PATH_VAR_RUN, DPDK_DIR);

-   mkdir(dpdk_dir, 700);
+   mkdir(dpdk_dir, 0700);
snprintf(dir, sizeof(dir), "%s%s",
dpdk_dir, SOCKET_DIR);
}

-   ret =  mkdir(dir, 700);
+   ret =  mkdir(dir, 0700);
/* if user passed socket path is invalid, return immediately */
if (ret < 0 && errno != EEXIST) {
RTE_LOG(ERR, PDUMP,
-- 
2.7.4

[dpdk-dev] [PATCH v2] net/i40e: remove weak symbols



> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Zoltan Kiss
> Sent: Thursday, July 21, 2016 1:11 AM
> To: dev at dpdk.org
> Cc: Matias Elo ; Gonzalez Monroy, Sergio
> ; Yigit, Ferruh  intel.com>;
> damarion at cisco.com; thomas.monjalon at 6wind.com
> Subject: [dpdk-dev] [PATCH v2] net/i40e: remove weak symbols
> 
> Using weak symbols have a few issues with static linking:
> 
> - normally the linker searches the .o files already linked, if your weak
>   one is there, it won't check if there is a strong version
> - unless the symbol is directly referred, but it's not the right thing to
>   rely on
> - or --whole-archive specified in the command line, which pulls in a lot
>   of unwanted stuff
> - whole-archive also makes libtool dropping the library from the command
>   line, which is what should happen with dynamic linking, but not with
>   static (observed on Ubuntu 14.04). This is an important bug if you
>   build a static library depending on DPDK
> 
> This patch merges the two version and make the behaviour rely on the
> config.
> 
> If we can agree in the concept, I can send a series to fix this for the
> other weak functions.
> 
> Signed-off-by: Zoltan Kiss 
> ---

Looks good to remove weak symbols.
Just one concern is vector PMD is one specific part to different 
Micro-Architecture.
It's better to keep independent from normal tx/rx files. Think about 
ixgbe_rxtx_vec_neon.c and ixgbe_rxtx_vec_sse.c


Thanks
Jingjing

[dpdk-dev] [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature

On Mon, Oct 10, 2016 at 02:40:44PM +0200, Maxime Coquelin wrote:
> >>>At that time, a packet always use 2 descs. Since indirect desc is
> >>>enabled (by default) now, the assumption is not true then. What's
> >>>worse, it might even slow things a bit down. That should also be
> >>>part of the reason why performance is slightly worse than before.
> >>>
> >>>   --yliu
> >>
> >>I'm not sure I get what you are saying
> >>
> >>>commit 1d41d77cf81c448c1b09e1e859bfd300e2054a98
> >>>Author: Yuanhan Liu 
> >>>Date:   Mon May 2 17:46:17 2016 -0700
> >>>
> >>>vhost: optimize dequeue for small packets
> >>>
> >>>A virtio driver normally uses at least 2 desc buffers for Tx: the
> >>>first for storing the header, and the others for storing the data.
> >>>
> >>>Therefore, we could fetch the first data desc buf before the main
> >>>loop, and do the copy first before the check of "are we done yet?".
> >>>This could save one check for small packets that just have one data
> >>>desc buffer and need one mbuf to store it.
> >>>
> >>>Signed-off-by: Yuanhan Liu 
> >>>Acked-by: Huawei Xie 
> >>>Tested-by: Rich Lane 
> >>
> >>This fast-paths the 2-descriptors format but it's not active
> >>for indirect descriptors. Is this what you mean?
> >
> >Yes. It's also not active when ANY_LAYOUT is actually turned on.
> >>Should be a simple matter to apply this optimization for indirect.
> >
> >Might be.
> 
> If I understand the code correctly, indirect descs also benefit from this
> optimization, or am I missing something?

Aha..., you are right!

--yliu

[dpdk-dev] [PATCH v3 03/16] mbuf: move packet type definitions in a new file

2016-10-03 10:38, Olivier Matz:
> The file rte_mbuf.h starts to be quite big, and next commits
> will introduce more functions related to packet types. Let's
> move them in a new file.
> 
> Signed-off-by: Olivier Matz 
> ---
>  lib/librte_mbuf/Makefile |   2 +-
>  lib/librte_mbuf/rte_mbuf.h   | 495 +--
>  lib/librte_mbuf/rte_mbuf_ptype.h | 552 
> +++

Why not moving packet type and other packet flags in librte_net?

[dpdk-dev] [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature



On 10/10/2016 04:42 PM, Yuanhan Liu wrote:
> On Mon, Oct 10, 2016 at 02:40:44PM +0200, Maxime Coquelin wrote:
> At that time, a packet always use 2 descs. Since indirect desc is
> enabled (by default) now, the assumption is not true then. What's
> worse, it might even slow things a bit down. That should also be
> part of the reason why performance is slightly worse than before.
>
>   --yliu

 I'm not sure I get what you are saying

> commit 1d41d77cf81c448c1b09e1e859bfd300e2054a98
> Author: Yuanhan Liu 
> Date:   Mon May 2 17:46:17 2016 -0700
>
>vhost: optimize dequeue for small packets
>
>A virtio driver normally uses at least 2 desc buffers for Tx: the
>first for storing the header, and the others for storing the data.
>
>Therefore, we could fetch the first data desc buf before the main
>loop, and do the copy first before the check of "are we done yet?".
>This could save one check for small packets that just have one data
>desc buffer and need one mbuf to store it.
>
>Signed-off-by: Yuanhan Liu 
>Acked-by: Huawei Xie 
>Tested-by: Rich Lane 

 This fast-paths the 2-descriptors format but it's not active
 for indirect descriptors. Is this what you mean?
>>>
>>> Yes. It's also not active when ANY_LAYOUT is actually turned on.
 Should be a simple matter to apply this optimization for indirect.
>>>
>>> Might be.
>>
>> If I understand the code correctly, indirect descs also benefit from this
>> optimization, or am I missing something?
>
> Aha..., you are right!

The interesting thing is that the patch I send on Thursday that removes
header access when no offload has been negotiated[0] seems to reduce
almost to zero the performance seen with indirect descriptors enabled.
I see this with 64 bytes packets using testpmd on both ends.

When I did the patch, I would have expected the same gain with both
modes, whereas I measured +1% for direct and +4% for indirect.

Maxime
[0]: http://dpdk.org/dev/patchwork/patch/16423/

[dpdk-dev] [PATCH 1/4] testpmd: Add support to configure 25G and 50G speeds

Hi Ajit,

On 9/29/2016 6:03 PM, Ajit Khaparde wrote:
> Support to configure 25G and 50G speeds is missing from testpmd.
> This patch also updates the testpmd user guide accordingly.
> 
> Signed-off-by: Ajit Khaparde 

This patch seems not really part of the patchset for bnxt driver, but
standalone testpmd only modification, and can be threaded as single patch.

I am adding testpmd maintainer to cc.

Thanks,
ferruh

[dpdk-dev] [PATCH v2] mk: gcc -march support for intel processors code names

2016-10-10 Thread Pattan, Reshma

Hi Thomas

> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Monday, October 10, 2016 3:26 PM
> To: Liu, Yong ; Pattan, Reshma
> 
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2] mk: gcc -march support for intel processors
> code names
> 
> 2016-09-28 10:54, Liu, Yong:
> > Tested-by: Yong Liu 
> > FangFang Wei 
> [...]
> >Description: Build test on different distributions
> >Command / instruction:
> >  Verify build pass on listed distributions.
> >
> >  OS   GCCKernel
> >  Ubuntu 16.04 5.4.0  4.4.0-36-generic
> >  Fedora23 5.3.1  4.2.3-300
> >  Fedora24 6.1.1  4.6.4-301
> >  Ubuntu 12.04 4.6.3  3.8.0-29
> >  Ubuntu 12.04 i6864.6.3  3.8.0-29
> >  Ubuntu 14.04 4.8.4  3.16.0-30
> >  Ubuntu 14.04 i6864.8.4  3.16.0-30
> >  Fedora18 4.7.2  3.6.10-4
> >  Fedora18 i6864.7.2  3.6.10-4
> >  Fedora20 4.8.2  3.15.6-200
> >  Fedora20 i6864.8.3  3.11.0
> >  Suse11SP24.5.1  3.0.13-0.2
> >  Suse12SP34.7.2  3.7.10-1.1
> >  RHEL7.0  4.8.2  3.10.0-123
> >  RHEL7.2  4.8.5  3.10.0-327
> >  CentOS7.04.8.5  3.10.0-327
> >  FreeBSD10.0  4.8.4  10.0-RELEASE
> >  FreeBSD10.3  4.8.5  10.3-RELEASE
> 
> I do not understand this test.
> This patch is dropping a lot of optimizations with compilers older than 4.9 !
> 
> Why not recommend GCC 4.9 and keep the graceful degradation for older
> versions, at least for one more year, even if it is not optimal for newer
> architectures?

I am ok with the idea, so should I send latest patch by keeping the code in  
mk/toolchain/gcc/rte.toolchain-compat.mk untouched/intact?

Thanks,
Reshma

[dpdk-dev] [PATCH 2/4] bnxt: Fix a segfault encountered during KNI exit

On 9/29/2016 8:06 PM, Ajit Khaparde wrote:
> On Thu, Sep 29, 2016 at 12:41 PM, Ferruh Yigit  >wrote:
> 
> On 9/29/2016 6:03 PM, Ajit Khaparde wrote:
> > The bnxt PMD is running into a segfault when exiting out of KNI
> 
> KNI?
> 
> examples/kni
> 
> ?On closing the kni app, I am seeing a segfault.? because of improper
> cleanup of resources
> in the dev_uninit path.

So this is mainly related to driver exit, KNI app is just a way to
reproduce it.

Are you OK with following update in commit log:

"
net/bnxt: fix a segfault encountered during PMD exit

This patch fixes segfault encountered during dev_uninit/close routine.
KNI sample app can be used to reproduce the issue.
"

[dpdk-dev] [PATCH] app/test: fix crypto mbuf pool size

2016-10-10 Thread Trahe, Fiona

Hi Piotr,

> -Original Message-
> From: Azarewicz, PiotrX T
> Sent: Friday, October 7, 2016 12:35 PM
> To: Trahe, Fiona 
> Cc: dev at dpdk.org
> Subject: [PATCH] app/test: fix crypto mbuf pool size
> 
> This patch fix that created pool for crypto mbufs may be too big in some
> environments.
> To avoid the issue, mbuf pool is reverted to its previous size.
> Moreover, here is added additional small pool with one large mbuf, what is
> needed in large data test scenarios.
> 
> Fixes: 2070f885b76d ("app/test: added tests for libcrypto PMD")
> 
> Signed-off-by: Piotr Azarewicz 
> ---
>  app/test/test_cryptodev.c |   28 +---
>  1 file changed, 25 insertions(+), 3 deletions(-)
> 
> diff --git a/app/test/test_cryptodev.c b/app/test/test_cryptodev.c index
> 2917454..914bb0b 100644
> --- a/app/test/test_cryptodev.c
> +++ b/app/test/test_cryptodev.c
> @@ -60,6 +60,7 @@ static enum rte_cryptodev_type gbl_cryptodev_type;
> 
>  struct crypto_testsuite_params {
>   struct rte_mempool *mbuf_pool;
> + struct rte_mempool *large_mbuf_pool;
>   struct rte_mempool *op_mpool;
>   struct rte_cryptodev_config conf;
>   struct rte_cryptodev_qp_conf qp_conf;
> @@ -169,7 +170,7 @@ testsuite_setup(void)
>   /* Not already created so create */
>   ts_params->mbuf_pool = rte_pktmbuf_pool_create(
>   "CRYPTO_MBUFPOOL",
> - NUM_MBUFS, MBUF_CACHE_SIZE, 0,
> UINT16_MAX,
> + NUM_MBUFS, MBUF_CACHE_SIZE, 0,
> MBUF_SIZE,
>   rte_socket_id());
>   if (ts_params->mbuf_pool == NULL) {
>   RTE_LOG(ERR, USER1, "Can't create
> CRYPTO_MBUFPOOL\n"); @@ -177,6 +178,21 @@ testsuite_setup(void)
>   }
>   }
> 
> + ts_params->large_mbuf_pool = rte_mempool_lookup(
> + "CRYPTO_LARGE_MBUFPOOL");
> + if (ts_params->large_mbuf_pool == NULL) {
> + /* Not already created so create */
> + ts_params->large_mbuf_pool = rte_pktmbuf_pool_create(
> + "CRYPTO_LARGE_MBUFPOOL",
> + 1, 0, 0, UINT16_MAX,
> + rte_socket_id());
> + if (ts_params->large_mbuf_pool == NULL) {
> + RTE_LOG(ERR, USER1,
> + "Can't create
> CRYPTO_LARGE_MBUFPOOL\n");
> + return TEST_FAILED;
> + }
> + }
> +
>   ts_params->op_mpool = rte_crypto_op_pool_create(
>   "MBUF_CRYPTO_SYM_OP_POOL",
>   RTE_CRYPTO_OP_TYPE_SYMMETRIC,
> @@ -5149,7 +5165,10 @@ test_AES_GMAC_authentication(const struct
> gmac_test_data *tdata)
>   if (retval < 0)
>   return retval;
> 
> - ut_params->ibuf = rte_pktmbuf_alloc(ts_params->mbuf_pool);
> + if (tdata->aad.len == GMAC_LARGE_PLAINTEXT_LENGTH)
> + ut_params->ibuf = rte_pktmbuf_alloc(ts_params-
> >large_mbuf_pool);
> + else
> + ut_params->ibuf = rte_pktmbuf_alloc(ts_params-
> >mbuf_pool);
> 
It would be safer to test if aad.len > MBUF_SIZE, here and below.


>   memset(rte_pktmbuf_mtod(ut_params->ibuf, uint8_t *), 0,
>   rte_pktmbuf_tailroom(ut_params->ibuf));
> @@ -5233,7 +5252,10 @@ test_AES_GMAC_authentication_verify(const
> struct gmac_test_data *tdata)
>   if (retval < 0)
>   return retval;
> 
> - ut_params->ibuf = rte_pktmbuf_alloc(ts_params->mbuf_pool);
> + if (tdata->aad.len == GMAC_LARGE_PLAINTEXT_LENGTH)
> + ut_params->ibuf = rte_pktmbuf_alloc(ts_params-
> >large_mbuf_pool);
> + else
> + ut_params->ibuf = rte_pktmbuf_alloc(ts_params-
> >mbuf_pool);
> 
>   memset(rte_pktmbuf_mtod(ut_params->ibuf, uint8_t *), 0,
>   rte_pktmbuf_tailroom(ut_params->ibuf));
> --
> 1.7.9.5

[dpdk-dev] [PATCH 3/4] bnxt: Add support for Async Link Notification

Hi Ajit,

On 9/29/2016 6:03 PM, Ajit Khaparde wrote:
> This patch adds support to get Link notification asynchronously.
> The HW sends Async notifications on default completion ring. The
> PMD processes these notifications and logs a message appropriately.
> 
> Signed-off-by: Ajit Khaparde 
> ---
>  drivers/net/bnxt/Makefile  |   1 +
>  drivers/net/bnxt/bnxt.h|   6 +-
>  drivers/net/bnxt/bnxt_cpr.c|  21 +++---
>  drivers/net/bnxt/bnxt_ethdev.c | 114 ++
>  drivers/net/bnxt/bnxt_hwrm.c   |  93 
>  drivers/net/bnxt/bnxt_irq.c| 156 
> +
>  drivers/net/bnxt/bnxt_irq.h|  51 ++
>  7 files changed, 367 insertions(+), 75 deletions(-)
>  create mode 100644 drivers/net/bnxt/bnxt_irq.c
>  create mode 100644 drivers/net/bnxt/bnxt_irq.h
> 

<...>

> +static inline int
> +rte_bnxt_atomic_read_link_status(struct rte_eth_dev *eth_dev,
> + struct rte_eth_link *link)
> +{
> + struct rte_eth_link *dst = link;
> + struct rte_eth_link *src = ð_dev->data->dev_link;
> +
> + if (rte_atomic64_cmpset((uint64_t *)dst, *(uint64_t *)dst,
> + *(uint64_t *)src) == 0)
> + return 1;
> +
> + return 0;
> +}
> +

This creates a compilation error:
.../drivers/net/bnxt/bnxt_ethdev.c:444:1:
error: unused function 'rte_bnxt_atomic_read_link_status'
[-Werror,-Wunused-function]
rte_bnxt_atomic_read_link_status(struct rte_eth_dev *eth_dev,
^

Since the patches in this patchet really not related, it is OK to send a
new version of just this patch.

Thanks,
ferruh

[dpdk-dev] 17.02 Roadmap

2016-10-10 Thread O'Driscoll, Tim

We published our initial roadmap for 17.02 at the end of August. Since then 
we've been doing more detailed planning and would like to provide an update on 
the features that we plan to submit for this release. This is our current plan, 
which should hopefully remain fairly stable now:

Consistent Filter API: Add support for the Consistent Filter API (see 
http://dpdk.org/ml/archives/dev/2016-September/047924.html) for IGB, IXGBE and 
I40E.

Elastic Flow Distributor: The Elastic Flow Distributor (EFD) is a flow-based 
load balancing library which scales linearly for both lookup and insert with 
the number of threads or cores.  EFD lookup uses a "perfect hashing" scheme 
where only the information needed to compute a key's value (and not the key 
itself) is stored in the lookup table, thus reducing CPU cache storage 
requirements. 

Extended Stats (Latency and Bit Rate Statistics): Enhance the Extended NIC 
Stats (Xstats) implementation to support the collection and reporting of 
latency and bit rate measurements. Latency statistics will include min, max and 
average latency, and jitter. Bit rate statistics will include peak and average 
bit rate aggregated over a user-defined time period. This will be implemented 
for IXGBE and I40E.

Run-Time Configuration of Packet Type (PTYPE) for I40E: At the moment all 
packet types in DPDK are statically defined. This makes impossible to add new 
values without first defining them statically and then recompiling DPDK. The 
ability to configure packet types at run time will be added for I40E.

Packet Distributor Enhancements: Enhancements will be made to the Packet 
Distributor library to improve performance:
1. Introduce burst functionality to allow batches of packets to be sent to 
workers.
2. Improve the performance of the flow/core affinity through the use of SSE/AVX 
instructions.

Add MACsec for IXGBE: MACsec support will be added for IXGBE. Ethdev API 
primitives will be added to create/delete/enable/disable SC/SA, Next_PN etc. 
similar to those used in Linux for the macsec_ops. Sample apps (l3fwd, testpmd, 
etc.) will be updated to support MACsec for the IXGBE. 

Enhance AESNI_GCM PMD: The current AESNI_GCM PMD is limited to AES-128 and does 
not support other features such as "any AAD length value". It will be updated 
to use a newer GCM implementation supporting AES128/192/256 and other features.

Create Crypto Performance Test App: A new app, similar to testpmd, will be 
created to allow crypto performance to be tested using any crypto PMD and any 
supported crypto algorithm.

Enable Cipher-Only and Hash-Only Support in AESNI_MB PMD: Support will be added 
for cipher-only and hash-only operations in the AESNI_MB PMD.

Support Chained Mbufs in Cryptodev: Currently, an application using the 
cryptodev API needs to reserve a continuous block of memory for mbufs. Support 
will be added for chaining of mbufs in both the QAT and SW PMDs supported by 
cryptodev.

Optimize Vhost-User Performance for Large Packets: A new memory copy function 
optimized for core-to-core memory copy which will be added. This will be 
beneficial for virtualization cases involving large packets, but it can be used 
for other core-to-core cases as well.

Support New Device Types in Vhost-User: Support will be added to vhost-user for 
new device types including vhost-scsi and vhost-blk.

Interrupt Mode Support in Virtio PMD: Support for interrupt mode will be added 
to the virtio PMD.

Virtio-User as an Alternative Exception Path: Investigate the use of 
virtio-user and vhost-net as an alternative exception path to KNI that does not 
require out of tree drivers. This work is still at an experimental stage, so it 
may not be included in 17.02.


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of O'Driscoll, Tim
> Sent: Wednesday, August 31, 2016 11:32 AM
> To: dev at dpdk.org
> Subject: [dpdk-dev] 17.02 Roadmap
> 
> Below are the features that we're planning to submit for the 17.02
> release. We'll submit a patch to update the roadmap page with this info.
> 
> Some things will obviously change during planning/development, so we'll
> provide a more detailed update in late September/early October. After
> that, things should hopefully be relatively stable.
> 
> It would be good if others are also willing to share their plans so that
> we can build up a complete picture of what's planned for 17.02 and make
> sure there's no duplication.
> 
> 
> Consistent Filter API phase 2: Extend support for the Consistent Filter
> API that will be first implemented in 16.11 to IGB and FM10K.
> 
> Elastic Flow Distributor: The Elastic Flow Distributor (EFD) is a flow-
> based load balancing library which scales linearly for both lookup and
> insert with the number of threads or cores.  EFD lookup uses a "perfect
> hashing" scheme where only the information needed to compute a key's
> value (and not the key itself) is stored in the lookup table, thus
> reducing CPU cache s

[dpdk-dev] [PATCH v2] test_cryptodev_perf: IV and digest should be stored at a DMAeble address

2016-10-10 Thread De Lara Guarch, Pablo

Hi Akhil,

> -Original Message-

[...]

> Hi Pablo,
> 
> I have already rebased this patch to dpdk-next-crypto subtree. Please
> let me know if there is any issue.
> Here is my git log
> 779f1301a382b6b9e2877fd0357bba33d1242d65 test_cryptodev_perf: IV and
> digest should be stored at a DMAeble address
> 8c9fdf4568768b7765fc2176e400a860dc758020 app/test: remove hard-coding
> of
> crypto num qps
> c1876c1cb90f0882ada0acd9e430be7cf63bc765 app/test: cleanup
> unnecessary
> ring size setup
> 136592c3a350ded56438b59cc4921a243f08e1d0 app/test: remove pointless
> for loop
> fca4f966b42adc0c8f3e1d43a94d93ea4fcb crypto/aesni_mb: free ring
> memory on qp release in PMD
> 
> 
> Regards

The problem is that AES_CBC_CIPHER_IV_LENGTH does not exist anymore, so 
compilation is broken.
Also, you could add this change for another tests, like what Arek suggested 
(for SNOW3G),
or someone can send a new version of this patch, if you prefer that.

Thanks,
Pablo

[dpdk-dev] [PATCH] net/i40e: add additional prefetch instructions for bulk rx

2016-10-10 Thread Vladyslav Buslov

> -Original Message-
> From: Wu, Jingjing [mailto:jingjing.wu at intel.com]
> Sent: Monday, October 10, 2016 4:26 PM
> To: Yigit, Ferruh; Vladyslav Buslov; Zhang, Helin
> Cc: dev at dpdk.org
> Subject: RE: [dpdk-dev] [PATCH] net/i40e: add additional prefetch
> instructions for bulk rx
> 
> 
> 
> > -Original Message-
> > From: Yigit, Ferruh
> > Sent: Wednesday, September 14, 2016 9:25 PM
> > To: Vladyslav Buslov ; Zhang, Helin
> > ; Wu, Jingjing 
> > Cc: dev at dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH] net/i40e: add additional prefetch
> > instructions for bulk rx
> >
> > On 7/14/2016 6:27 PM, Vladyslav Buslov wrote:
> > > Added prefetch of first packet payload cacheline in
> > > i40e_rx_scan_hw_ring Added prefetch of second mbuf cacheline in
> > > i40e_rx_alloc_bufs
> > >
> > > Signed-off-by: Vladyslav Buslov 
> > > ---
> > >  drivers/net/i40e/i40e_rxtx.c | 7 +--
> > >  1 file changed, 5 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/net/i40e/i40e_rxtx.c
> > > b/drivers/net/i40e/i40e_rxtx.c index d3cfb98..e493fb4 100644
> > > --- a/drivers/net/i40e/i40e_rxtx.c
> > > +++ b/drivers/net/i40e/i40e_rxtx.c
> > > @@ -1003,6 +1003,7 @@ i40e_rx_scan_hw_ring(struct i40e_rx_queue
> *rxq)
> > > /* Translate descriptor info to mbuf parameters */
> > > for (j = 0; j < nb_dd; j++) {
> > > mb = rxep[j].mbuf;
> > > +   rte_prefetch0(RTE_PTR_ADD(mb->buf_addr,
> > RTE_PKTMBUF_HEADROOM));
> 
> Why did prefetch here? I think if application need to deal with packet, it is
> more suitable to put it in application.
> 
> > > qword1 = rte_le_to_cpu_64(\
> > > rxdp[j].wb.qword1.status_error_len);
> > > pkt_len = ((qword1 &
> > I40E_RXD_QW1_LENGTH_PBUF_MASK) >>
> > > @@ -1086,9 +1087,11 @@ i40e_rx_alloc_bufs(struct i40e_rx_queue
> *rxq)
> > >
> > > rxdp = &rxq->rx_ring[alloc_idx];
> > > for (i = 0; i < rxq->rx_free_thresh; i++) {
> > > -   if (likely(i < (rxq->rx_free_thresh - 1)))
> > > +   if (likely(i < (rxq->rx_free_thresh - 1))) {
> > > /* Prefetch next mbuf */
> > > -   rte_prefetch0(rxep[i + 1].mbuf);
> > > +   rte_prefetch0(&rxep[i + 1].mbuf->cacheline0);
> > > +   rte_prefetch0(&rxep[i + 1].mbuf->cacheline1);
> > > +   }
> Agree with this change. And when I test it by testpmd with iofwd, no
> performance increase is observed but minor decrease.
> Can you share will us when it will benefit the performance in your scenario ?
> 
> 
> Thanks
> Jingjing

Hello Jingjing,

Thanks for code review.

My use case: We have simple distributor thread that receives packets from port 
and distributes them among worker threads according to VLAN and MAC address 
hash. 

While working on performance optimization we determined that most of CPU usage 
of this thread is in DPDK.
As and optimization we decided to switch to rx burst alloc function, however 
that caused additional performance degradation compared to scatter rx mode.
In profiler two major culprits were:
  1. Access to packet data Eth header in application code. (cache miss)
  2. Setting next packet descriptor field to NULL in DPDK i40e_rx_alloc_bufs 
code. (this field is in second descriptor cache line that was not prefetched)
After applying my fixes performance improved compared to scatter rx mode.

I assumed that prefetch of first cache line of packet data belongs to DPDK 
because it is done in scatter rx mode. (in i40e_recv_scattered_pkts)
It can be moved to application side but IMO it is better to be consistent 
across all rx modes.

Regards,
Vladyslav

[dpdk-dev] [PATCH v2] mk: gcc -march support for intel processors code names

2016-10-10 15:21, Pattan, Reshma:
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > Why not recommend GCC 4.9 and keep the graceful degradation for older
> > versions, at least for one more year, even if it is not optimal for newer
> > architectures?
> 
> I am ok with the idea, so should I send latest patch by keeping the code in
> mk/toolchain/gcc/rte.toolchain-compat.mk untouched/intact?

Yes please.

[dpdk-dev] [PATCH v4 4/6] i40e: add Tx preparation

2016-10-10 Thread Kulasek, TomaszX

Hi Jingjing,

> -Original Message-
> From: Wu, Jingjing
> Sent: Monday, October 10, 2016 16:03
> To: Kulasek, TomaszX ; dev at dpdk.org
> Cc: Ananyev, Konstantin ; Kulasek, TomaszX
> 
> Subject: RE: [dpdk-dev] [PATCH v4 4/6] i40e: add Tx preparation
> 
> >  #include "i40e_logs.h"
> >  #include "base/i40e_prototype.h"
> > @@ -79,6 +81,17 @@
> > PKT_TX_TCP_SEG | \
> > PKT_TX_OUTER_IP_CKSUM)
> >
> > +#define I40E_TX_OFFLOAD_MASK (  \
> > +   PKT_TX_IP_CKSUM |   \
> > +   PKT_TX_L4_MASK |\
> > +   PKT_TX_OUTER_IP_CKSUM | \
> > +   PKT_TX_TCP_SEG |\
> > +   PKT_TX_QINQ_PKT |   \
> > +   PKT_TX_VLAN_PKT)
> > +
> More TX flags are added for tunneling as below.
> /**
>  * Bits 45:48 used for the tunnel type.
>  * When doing Tx offload like TSO or checksum, the HW needs to configure
> the
>  * tunnel type into the HW descriptors.
>  */
> #define PKT_TX_TUNNEL_VXLAN   (0x1ULL << 45)
> #define PKT_TX_TUNNEL_GRE (0x2ULL << 45)
> #define PKT_TX_TUNNEL_IPIP(0x3ULL << 45)
> #define PKT_TX_TUNNEL_GENEVE  (0x4ULL << 45)
> /* add new TX TUNNEL type here */
> #define PKT_TX_TUNNEL_MASK(0xFULL << 45)
> 
> Please check:
> commit 63c0d74daaa9a807fbca8a3e363bbe41d6fb715f
> Author: Jianfeng Tan 
> Date:   Mon Aug 1 03:56:53 2016 +
> 
> mbuf: add Tx side tunneling type
> 
> 
> Thanks
> Jingjing

Thanks for spotting this, I will send updated v5.

Tomasz

[dpdk-dev] [PATCH 1/4] testpmd: Add support to configure 25G and 50G speeds

On Mon, Oct 10, 2016 at 10:01 AM, Ferruh Yigit 
wrote:

> Hi Ajit,
>
> On 9/29/2016 6:03 PM, Ajit Khaparde wrote:
> > Support to configure 25G and 50G speeds is missing from testpmd.
> > This patch also updates the testpmd user guide accordingly.
> >
> > Signed-off-by: Ajit Khaparde 
>
> This patch seems not really part of the patchset for bnxt driver, but
> standalone testpmd only modification,
> ??
> and can be threaded as single patch.
>
?OK. Got it.
I had worked on this change in the middle of other things.
?So it got bundled along with other bnxt patches.


>
> I am adding testpmd maintainer to cc.
>
?Let me know if you want me to send it again as a standalone patch.

Thanks?


>
> Thanks,
> ferruh
>
>

[dpdk-dev] [PATCH 2/4] bnxt: Fix a segfault encountered during KNI exit

On Mon, Oct 10, 2016 at 10:24 AM, Ferruh Yigit 
wrote:

> On 9/29/2016 8:06 PM, Ajit Khaparde wrote:
> > On Thu, Sep 29, 2016 at 12:41 PM, Ferruh Yigit  > >wrote:
> >
> > On 9/29/2016 6:03 PM, Ajit Khaparde wrote:
> > > The bnxt PMD is running into a segfault when exiting out of KNI
> >
> > KNI?
> >
> > examples/kni
> >
> > ?On closing the kni app, I am seeing a segfault.? because of improper
> > cleanup of resources
> > in the dev_uninit path.
>
> So this is mainly related to driver exit, KNI app is just a way to
> reproduce it.
>
> Are you OK with following update in commit log:
>
> "
> net/bnxt: fix a segfault encountered during PMD exit
>
> This patch fixes segfault encountered during dev_uninit/close routine.
> KNI sample app can be used to reproduce the issue.
> "
>
?I am OK with this Ferruh.

Thanks?

[dpdk-dev] [PATCH v2 4/4] net/bnxt: add ULL suffix to constant 1 for bit shift in bnxt_mac_addr_remove_op

On Thu, Sep 29, 2016 at 12:39 PM, John W. Linville 
wrote:

> Some(?) compilers will treat the unmarked constant 1 as a 32-bit
> integer, but the shift operation is in a loop that could run up to
> 63 times -- undefined behavior!
>
> Coverity issue: 127546
> Fixes: 778b759ba10e ("net/bnxt: add MAC address")
>
> Signed-off-by: John W. Linville 
>
?Acked
-by: Ajit Khaparde 


> ---
> v2: fix-up changelog entries based-on comments from Ferruh Yigit
>
>  drivers/net/bnxt/bnxt_ethdev.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/bnxt/bnxt_ethdev.c b/drivers/net/bnxt/bnxt_ethdev
> .c
> index f4eedfd812bb..d7447b15983b 100644
> --- a/drivers/net/bnxt/bnxt_ethdev.c
> +++ b/drivers/net/bnxt/bnxt_ethdev.c
> @@ -463,7 +463,7 @@ static void bnxt_mac_addr_remove_op(struct rte_eth_dev
> *eth_dev,
>  * remove the corresponding MAC addr filter
>  */
> for (i = 0; i < MAX_FF_POOLS; i++) {
> -   if (!(pool_mask & (1 << i)))
> +   if (!(pool_mask & (1ULL << i)))
> continue;
>
> STAILQ_FOREACH(vnic, &bp->ff_pool[i], next) {
> --
> 2.7.4
>
>

[dpdk-dev] 17.02 Roadmap

Thanks Tim for the interesting inputs.
Some of them may require a dedicated thread to continue the discussion
based on some preliminary specifications or drafts.

2016-10-10 16:13, O'Driscoll, Tim:
> Elastic Flow Distributor: The Elastic Flow Distributor (EFD) is a flow-based 
> load balancing library which scales linearly for both lookup and insert with 
> the number of threads or cores.  EFD lookup uses a "perfect hashing" scheme 
> where only the information needed to compute a key's value (and not the key 
> itself) is stored in the lookup table, thus reducing CPU cache storage 
> requirements.

What is the scope of this library? Just apply rte_hash to a flow table?
Or is it also sending the packets in some queues?
Does it depend of librte_distributor?

> Extended Stats (Latency and Bit Rate Statistics): Enhance the Extended NIC 
> Stats (Xstats) implementation to support the collection and reporting of 
> latency and bit rate measurements. Latency statistics will include min, max 
> and average latency, and jitter. Bit rate statistics will include peak and 
> average bit rate aggregated over a user-defined time period. This will be 
> implemented for IXGBE and I40E.

Are they retrieved from hardware or just computed in software?
Could we have some drivers hook to compute them in a generic layer?

> Run-Time Configuration of Packet Type (PTYPE) for I40E: At the moment all 
> packet types in DPDK are statically defined. This makes impossible to add new 
> values without first defining them statically and then recompiling DPDK. The 
> ability to configure packet types at run time will be added for I40E.

The packet types are part of the API.
Although not yet convinced by the packet types, I do not understand how
to benefit from some run-time defined packet types.

> Packet Distributor Enhancements: Enhancements will be made to the Packet 
> Distributor library to improve performance:
> 1. Introduce burst functionality to allow batches of packets to be sent to 
> workers.
> 2. Improve the performance of the flow/core affinity through the use of 
> SSE/AVX instructions.

The packet distributor looks more and more like a DPDK application
at the same level of BESS, VPP, OVS, etc.

> Add MACsec for IXGBE: MACsec support will be added for IXGBE. Ethdev API 
> primitives will be added to create/delete/enable/disable SC/SA, Next_PN etc. 
> similar to those used in Linux for the macsec_ops. Sample apps (l3fwd, 
> testpmd, etc.) will be updated to support MACsec for the IXGBE. 

How the ethdev interface and the cryptodev capabilities will be linked for 
MACsec?

[...]
> Create Crypto Performance Test App: A new app, similar to testpmd, will be 
> created to allow crypto performance to be tested using any crypto PMD and any 
> supported crypto algorithm.

Good idea :)
When I read "testpmd", I tend to think that it could test any PMD,
including crypto. Are you saying that we should read dpdk-netdev-test?
And you would introduce dpdk-cryptodev-test?

[...]
> Optimize Vhost-User Performance for Large Packets: A new memory copy function 
> optimized for core-to-core memory copy which will be added. This will be 
> beneficial for virtualization cases involving large packets, but it can be 
> used for other core-to-core cases as well.

Is it an enhancement of rte_memcpy or something else?

> Support New Device Types in Vhost-User: Support will be added to vhost-user 
> for new device types including vhost-scsi and vhost-blk.

Great!
Is it only related to networking or should we expect some storage-related
code or drivers to come up?

> Interrupt Mode Support in Virtio PMD: Support for interrupt mode will be 
> added to the virtio PMD.

I guess you mean Rx interrupt in virtio PMD to avoid 100% polling in the VM?

> Virtio-User as an Alternative Exception Path: Investigate the use of 
> virtio-user and vhost-net as an alternative exception path to KNI that does 
> not require out of tree drivers. This work is still at an experimental stage, 
> so it may not be included in 17.02.

Interesting. Please share more details of the design you are thinking of.

[dpdk-dev] [PATCH v3] mk: gcc -march support for intel processors code names

2016-10-10 Thread Reshma Pattan

The GCC 4.9 -march option supports the intel code names for processors,
for example -march=silvermont, -march=broadwell.
The RTE_MACHINE config flag can be used to pass code name to
the compiler as -march flag.

Release notes is updated.

Linux and FreeBSD getting started guides are updated with recommended
gcc version as 4.9 and above.

Some of the gmake command examples in sample application guide and driver
guides are updated with gcc version as 4.9.

Signed-off-by: Reshma Pattan 
---
 doc/guides/freebsd_gsg/build_dpdk.rst| 4 ++--
 doc/guides/freebsd_gsg/build_sample_apps.rst | 6 +++---
 doc/guides/linux_gsg/sys_reqs.rst| 6 +++---
 doc/guides/nics/bnx2x.rst| 4 ++--
 doc/guides/nics/qede.rst | 2 +-
 doc/guides/rel_notes/release_16_11.rst   | 5 +
 mk/target/generic/rte.vars.mk| 4 
 7 files changed, 20 insertions(+), 11 deletions(-)

v3:
Reverted changes of mk/toolchain/gcc/rte.toolchain-compat.mk.

v2:
Updated Linux and FreeBSD gsg guides, sample application guide and other driver 
doc
with recommended gcc version as 4.9 and above.

diff --git a/doc/guides/freebsd_gsg/build_dpdk.rst 
b/doc/guides/freebsd_gsg/build_dpdk.rst
index 27f21de..24a9f87 100644
--- a/doc/guides/freebsd_gsg/build_dpdk.rst
+++ b/doc/guides/freebsd_gsg/build_dpdk.rst
@@ -88,7 +88,7 @@ The ports required and their locations are as follows:
 For compiling and using the DPDK with gcc, the compiler must be installed
 from the ports collection:

-* gcc: version 4.8 is recommended ``/usr/ports/lang/gcc48``.
+* gcc: version 4.9 is recommended ``/usr/ports/lang/gcc49``.
   Ensure that ``CPU_OPTS`` is selected (default is OFF).

 When running the make config-recursive command, a dialog may be presented to 
the
@@ -164,7 +164,7 @@ For example to compile for FreeBSD use:
If the compiler binary to be used does not correspond to that given in the
TOOLCHAIN part of the target, the compiler command may need to be explicitly
specified. For example, if compiling for gcc, where the gcc binary is called
-   gcc4.8, the command would need to be ``gmake install T= CC=gcc4.8``.
+   gcc4.9, the command would need to be ``gmake install T= CC=gcc4.9``.

 Browsing the Installed DPDK Environment Target
 --
diff --git a/doc/guides/freebsd_gsg/build_sample_apps.rst 
b/doc/guides/freebsd_gsg/build_sample_apps.rst
index 2662303..fffc4c0 100644
--- a/doc/guides/freebsd_gsg/build_sample_apps.rst
+++ b/doc/guides/freebsd_gsg/build_sample_apps.rst
@@ -54,7 +54,7 @@ the following variables must be exported:

 The following is an example of creating the ``helloworld`` application, which 
runs
 in the DPDK FreeBSD environment. While the example demonstrates compiling
-using gcc version 4.8, compiling with clang will be similar, except that the 
``CC=``
+using gcc version 4.9, compiling with clang will be similar, except that the 
``CC=``
 parameter can probably be omitted. The ``helloworld`` example may be found in 
the
 ``${RTE_SDK}/examples`` directory.

@@ -72,7 +72,7 @@ in the build directory.
 setenv RTE_SDK $HOME/DPDK
 setenv RTE_TARGET x86_64-native-bsdapp-gcc

-gmake CC=gcc48
+gmake CC=gcc49
   CC main.o
   LD helloworld
   INSTALL-APP helloworld
@@ -96,7 +96,7 @@ in the build directory.
 cd my_rte_app/
 setenv RTE_TARGET x86_64-native-bsdapp-gcc

-gmake CC=gcc48
+gmake CC=gcc49
   CC main.o
   LD helloworld
   INSTALL-APP helloworld
diff --git a/doc/guides/linux_gsg/sys_reqs.rst 
b/doc/guides/linux_gsg/sys_reqs.rst
index b321544..3d74342 100644
--- a/doc/guides/linux_gsg/sys_reqs.rst
+++ b/doc/guides/linux_gsg/sys_reqs.rst
@@ -61,8 +61,8 @@ Compilation of the DPDK

 *   coreutils: ``cmp``, ``sed``, ``grep``, ``arch``, etc.

-*   gcc: versions 4.5.x or later is recommended for ``i686/x86_64``. Versions 
4.8.x or later is recommended
-for ``ppc_64`` and ``x86_x32`` ABI. On some distributions, some specific 
compiler flags and linker flags are enabled by
+*   gcc: versions 4.9 or later is recommended for all platforms.
+On some distributions, some specific compiler flags and linker flags are 
enabled by
 default and affect performance (``-fstack-protector``, for example). 
Please refer to the documentation
 of your distribution and to ``gcc -dumpspecs``.

@@ -82,7 +82,7 @@ Compilation of the DPDK
 .. note::

 x86_x32 ABI is currently supported with distribution packages only on 
Ubuntu
-higher than 13.10 or recent Debian distribution. The only supported  
compiler is gcc 4.8+.
+higher than 13.10 or recent Debian distribution. The only supported  
compiler is gcc 4.9+.

 .. note::

diff --git a/doc/guides/nics/bnx2x.rst b/doc/guides/nics/bnx2x.rst
index 6453168..6d1768a 100644
--- a/doc/guides/nics/bnx2x.rst
+++ b/doc/guides/nics/bnx2x.rst
@@ -162,7 +162,7 @@ To compile BNX2X PMD for FreeBSD x86_64 gcc target, run the 
following "gmake"
 comm

[dpdk-dev] [PATCH] pdump: fix dir permissions value in mkdir call

2016-10-10 Thread Remy Horton


On 10/10/2016 15:35, Reshma Pattan wrote:
[..]
> Fixes: e4ffa2d3 ("pdump: fix error handlings")
> Fixes: bdd8dcc6 ("pdump: fix default socket path")
>
> Reported-by: Jianfeng Tan 
> Signed-off-by: Reshma Pattan 
> ---
>  lib/librte_pdump/rte_pdump.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Acked-by: Remy Horton

[dpdk-dev] [PATCH v2] log: respect rte_openlog_stream calls before rte_eal_init

2016-10-10 Thread John Ousterhout

Before this patch, application-specific loggers could not be
installed before rte_eal_init completed (the initialization process
called rte_openlog_stream, overwriting any previously installed
logger). This made it impossible for an application to capture the
initial log messages generated during rte_eal_init. This patch changes
initialization so that information from a previous call to
rte_openlog_stream is not lost. Specifically:
* The default log stream is now maintained separately from an
  application-specific log stream installed with rte_openlog_stream.
* rte_eal_common_log_init has been renamed to rte_eal_log_set_default,
  since this is all it does. It no longer invokes rte_openlog_stream; it
  just updates the default stream. Also, this method now returns void,
  rather than int, since there are no errors.
* The "early log" mechanism (e.g. rte_eal_log_early_init) has been
  removed; all of the desired functionality can be achieved by calling
  rte_eal_log_set_default.

Signed-off-by: John Ousterhout 

v2:
* Removed the early log mechanism, renamed rte_eal_common_log_init.

Note: I see from the code that Linux and BSD set different default streams:
Linux uses stdout, while BSD uses stderr. This patch retains the distinction,
though I'm not sure why it is there.
---
 lib/librte_eal/bsdapp/eal/eal.c |  3 +--
 lib/librte_eal/bsdapp/eal/eal_log.c | 11 +
 lib/librte_eal/common/eal_common_log.c  | 19 +++-
 lib/librte_eal/common/eal_private.h | 16 -
 lib/librte_eal/common/include/rte_log.h |  2 +-
 lib/librte_eal/linuxapp/eal/eal.c   |  3 +--
 lib/librte_eal/linuxapp/eal/eal_log.c   | 40 +
 7 files changed, 17 insertions(+), 77 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index a0c8f8c..a1ef75b 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -501,8 +501,7 @@ rte_eal_init(int argc, char **argv)

thread_id = pthread_self();

-   if (rte_eal_log_early_init() < 0)
-   rte_panic("Cannot init early logs\n");
+   rte_eal_log_set_default(stderr);

eal_log_level_parse(argc, argv);

diff --git a/lib/librte_eal/bsdapp/eal/eal_log.c 
b/lib/librte_eal/bsdapp/eal/eal_log.c
index a425f7a..6b2c6da 100644
--- a/lib/librte_eal/bsdapp/eal/eal_log.c
+++ b/lib/librte_eal/bsdapp/eal/eal_log.c
@@ -44,14 +44,5 @@
 int
 rte_eal_log_init(const char *id __rte_unused, int facility __rte_unused)
 {
-   if (rte_eal_common_log_init(stderr) < 0)
-   return -1;
-   return 0;
-}
-
-int
-rte_eal_log_early_init(void)
-{
-   rte_openlog_stream(stderr);
-   return 0;
+   rte_eal_set_default(stderr);
 }
diff --git a/lib/librte_eal/common/eal_common_log.c 
b/lib/librte_eal/common/eal_common_log.c
index 967991a..2181cfa 100644
--- a/lib/librte_eal/common/eal_common_log.c
+++ b/lib/librte_eal/common/eal_common_log.c
@@ -48,11 +48,12 @@ struct rte_logs rte_logs = {
.file = NULL,
 };

+/* Stream to use for logging if rte_logs.file is NULL */
 static FILE *default_log_stream;

 /**
  * This global structure stores some informations about the message
- * that is currently beeing processed by one lcore
+ * that is currently being processed by one lcore
  */
 struct log_cur_msg {
uint32_t loglevel; /**< log level - see rte_log.h */
@@ -68,10 +69,7 @@ static RTE_DEFINE_PER_LCORE(struct log_cur_msg, log_cur_msg);
 int
 rte_openlog_stream(FILE *f)
 {
-   if (f == NULL)
-   rte_logs.file = default_log_stream;
-   else
-   rte_logs.file = f;
+   rte_logs.file = f;
return 0;
 }

@@ -127,6 +125,8 @@ rte_vlog(uint32_t level, uint32_t logtype, const char 
*format, va_list ap)
 {
int ret;
FILE *f = rte_logs.file;
+   if (f == NULL)
+   f = default_log_stream;

if ((level > rte_logs.level) || !(logtype & rte_logs.type))
return 0;
@@ -158,17 +158,14 @@ rte_log(uint32_t level, uint32_t logtype, const char 
*format, ...)
 }

 /*
- * called by environment-specific log init function
+ * Called by environment-specific initialization functions.
  */
-int
-rte_eal_common_log_init(FILE *default_log)
+void
+rte_eal_log_set_default(FILE *default_log)
 {
default_log_stream = default_log;
-   rte_openlog_stream(default_log);

 #if RTE_LOG_LEVEL >= RTE_LOG_DEBUG
RTE_LOG(NOTICE, EAL, "Debug logs available - lower performance\n");
 #endif
-
-   return 0;
 }
diff --git a/lib/librte_eal/common/eal_private.h 
b/lib/librte_eal/common/eal_private.h
index 19f7535..a037994 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -47,7 +47,9 @@
 int rte_eal_memzone_init(void);

 /**
- * Common log initialization function (private to eal).
+ * Common log initialization function (private to eal).  Determines
+ * where log data is written when no call to eal_openlog_stream is
+

[dpdk-dev] [PATCH 3/4] bnxt: Add support for Async Link Notification