[dpdk-dev] [Question] How pmd virtio works without UIO?

2015-12-23 Thread Thomas Monjalon
2015-12-23 10:09, Yuanhan Liu:
> On Wed, Dec 23, 2015 at 09:55:54AM +0800, Peter Xu wrote:
> > On Tue, Dec 22, 2015 at 04:38:30PM +, Xie, Huawei wrote:
> > > On 12/22/2015 7:39 PM, Peter Xu wrote:
> > > > I tried to unbind one of the virtio net device, I see the PCI entry
> > > > still there.
> > > >
> > > > Before unbind:
> > > >
> > > > [root at vm proc]# lspci -k -s 00:03.0
> > > > 00:03.0 Ethernet controller: Red Hat, Inc Virtio network device
> > > > Subsystem: Red Hat, Inc Device 0001
> > > > Kernel driver in use: virtio-pci
> > > > [root at vm proc]# cat /proc/ioports | grep c060-c07f
> > > >   c060-c07f : :00:03.0
> > > > c060-c07f : virtio-pci
> > > >
> > > > After unbind:
> > > >
> > > > [root at vm proc]# lspci -k -s 00:03.0
> > > > 00:03.0 Ethernet controller: Red Hat, Inc Virtio network device
> > > > Subsystem: Red Hat, Inc Device 0001
> > > > [root at vm proc]# cat /proc/ioports | grep c060-c07f
> > > >   c060-c07f : :00:03.0
> > > >
> > > > So... does this mean that it is an alternative to the blacklist
> > > > solution?
> > > Oh, we could first check whether this port is manipulated by a kernel
> > > driver in virtio_resource_init/eth_virtio_dev_init, as long as it is
> > > not too late.
> 
> Why can't we simply quit at pci_scan_one, once finding that it's not
> bound to UIO (or similar stuff)? That would be generic enough, that we
> don't have to do similar checks for each new PMD driver.
> 
> Or, am I missing something?

UIO is not needed to make virtio work (without interrupt support).
Sometimes avoiding kernel modules may be a requirement.

> > I guess there might be two problems? Which are:
> > 
> > 1. How users avoid DPDK taking over virtio devices that they do not
> >want for IO (choosing which devices to use)
> 
> Isn't that what's the 'binding/unbinding' for?

Binding is, sometimes, required.
But does it mean DPDK should use every available port?
That's the default, and it may be configured with the blacklist/whitelist options.

> > 2. Driver conflict between virtio PMD in DPDK, and virtio-pci in
> >kernel (happens on every virtio device that DPDK uses)
> 
> If you unbind the kernel driver first, which is the suggested (or
> required?) way to use DPDK, that will not happen.



[dpdk-dev] [Question] How pmd virtio works without UIO?

2015-12-23 Thread Thomas Monjalon
2015-12-23 05:13, Xie, Huawei:
> On 12/23/2015 10:57 AM, Yuanhan Liu wrote:
> > On Wed, Dec 23, 2015 at 10:41:57AM +0800, Peter Xu wrote:
> >> On Wed, Dec 23, 2015 at 10:01:35AM +0800, Yuanhan Liu wrote:
> >>> On Tue, Dec 22, 2015 at 05:56:41PM +0800, Peter Xu wrote:
>  On Tue, Dec 22, 2015 at 04:32:46PM +0800, Yuanhan Liu wrote:
> > Actually, you are right. I mentioned in the last email that this is
> > for configuration part. To answer your question in this email, you
> > will not be able to go that further (say initiating virtio pmd) if
> > you don't unbind the origin virtio-net driver, and bind it to igb_uio
> > (or something similar).
> >
> > The start point is rte_eal_pci_scan, where the sub-function
> > pci_scan_one just initiates a DPDK-bound driver.
>  I am not sure whether I do understand your meaning correctly
>  (regarding "you will not be able to go that further"): The problem
>  is that, we _can_ run testpmd without unbinding the ports and bind
>  to UIO or something. What we need to do is boot the guest, reserve
>  huge pages, and run testpmd (keeping its kernel driver as
>  "virtio-pci"). In pci_scan_one():
> 
>   if (!ret) {
>   if (!strcmp(driver, "vfio-pci"))
>   dev->kdrv = RTE_KDRV_VFIO;
>   else if (!strcmp(driver, "igb_uio"))
>   dev->kdrv = RTE_KDRV_IGB_UIO;
>   else if (!strcmp(driver, "uio_pci_generic"))
>   dev->kdrv = RTE_KDRV_UIO_GENERIC;
>   else
>   dev->kdrv = RTE_KDRV_UNKNOWN;
>   } else
>   dev->kdrv = RTE_KDRV_UNKNOWN;
> 
>  I think it should be going to RTE_KDRV_UNKNOWN
>  (driver=="virtio-pci") here.
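For readers following along, the classification Peter quotes can be condensed into a small stand-alone sketch. The enum and function names below mirror the DPDK code for illustration; this is not the real pci_scan_one():

```c
#include <assert.h>
#include <string.h>

/* Simplified view of the driver classification in pci_scan_one():
 * the sysfs driver name decides which kernel-driver kind DPDK records. */
enum rte_kernel_driver {
	RTE_KDRV_UNKNOWN = 0,
	RTE_KDRV_IGB_UIO,
	RTE_KDRV_VFIO,
	RTE_KDRV_UIO_GENERIC,
};

static enum rte_kernel_driver
classify_kdrv(const char *driver)
{
	if (strcmp(driver, "vfio-pci") == 0)
		return RTE_KDRV_VFIO;
	if (strcmp(driver, "igb_uio") == 0)
		return RTE_KDRV_IGB_UIO;
	if (strcmp(driver, "uio_pci_generic") == 0)
		return RTE_KDRV_UIO_GENERIC;
	/* e.g. "virtio-pci" lands here */
	return RTE_KDRV_UNKNOWN;
}
```

A port left bound to virtio-pci therefore falls into RTE_KDRV_UNKNOWN, which is exactly the case being debated.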
> >>> Sorry, I simply overlook that. I was thinking it will quit here for
> >>> the RTE_KDRV_UNKNOWN case.
> >>>
>  I tried to run IO and it could work,
>  but I am not sure whether it is safe, and how.
> >>> I also did a quick test then, however, with the virtio 1.0 patchset
> >>> I sent before, which sets RTE_PCI_DRV_NEED_MAPPING, resulting in a
> >>> pci_map_device() failure, so the virtio pmd is not initiated at all.
> >> Then, will the patch work with ioport way to access virtio devices?
> > Yes.
> >
>  Also, I am not sure whether I need to (at least) unbind the
>  virtio-pci driver, so that there should have no kernel driver
>  running for the virtio device before DPDK using it.
> >>> Why not? That's what the DPDK document asked to do
> >>> (http://dpdk.org/doc/guides/linux_gsg/build_dpdk.html):
> >>>
> >>> 3.6. Binding and Unbinding Network Ports to/from the Kernel Modules
> >>> 
> >>> As of release 1.4, DPDK applications no longer automatically unbind
> >>> all supported network ports from the kernel driver in use. Instead,
> >>> all ports that are to be used by an DPDK application must be bound
> >>> to the uio_pci_generic, igb_uio or vfio-pci module before the
> >>> application is run. Any network ports under Linux* control will be
> >>> ignored by the DPDK poll-mode drivers and cannot be used by the
> >>> application.
> >> This seems obsolete, since it's not covering ioport.
> > I don't think so. The above is about how to run DPDK applications. ioport
> > is just an (optional) way to access PCI resources in a specific PMD.
> >
> > And, the above specification avoids your concern that two drivers try
> > to manipulate the same device concurrently, doesn't it?
> >
> > And, it is saying "any network ports under Linux* control will be
> > ignored by the DPDK poll-mode drivers and cannot be used by the
> > application", so the case you described, where the virtio pmd
> > continues to work without the bind, looks like a bug to me.
> >
> > Can anyone confirm that?
> 
> That document isn't accurate. virtio doesn't require binding to a UIO
> driver if it uses port IO. The port IO commit said this is because UIO
> isn't secure, but avoiding UIO doesn't bring more security, as the virtio
> PMD could still ask the device to DMA into any memory.
> The least we might do is fail in virtio_resource_init if a kernel driver
> is still manipulating this device. This spares users the effort of using
> the blacklist option and avoids the driver conflict.

+1 for checking kernel driver in use


[dpdk-dev] [PATCH v5 1/3] vhost: Add callback and private data for vhost PMD

2015-12-23 Thread Thomas Monjalon
2015-12-23 10:44, Yuanhan Liu:
> On Tue, Dec 22, 2015 at 01:38:29AM -0800, Rich Lane wrote:
> > On Mon, Dec 21, 2015 at 9:47 PM, Yuanhan Liu
> > <yuanhan.liu at linux.intel.com> wrote:
> > 
> > On Mon, Dec 21, 2015 at 08:47:28PM -0800, Rich Lane wrote:
> > > The queue state change callback is the one new API that needs to be
> > > added because
> > > normal NICs don't have this behavior.
> > 
> > Again I'd ask, will vring_state_changed() be enough, when above issues
> > are resolved: vring_state_changed() will be invoked at new_device()/
> > destroy_device(), and of course, ethtool change?
> > 
> > 
> > It would be sufficient. It is not a great API though, because it
> > requires the application to do the conversion from struct virtio_net
> > to a DPDK port number, and from a virtqueue index to a DPDK queue id
> > and direction. Also, the current implementation often makes this
> > callback when the vring state has not actually changed
> > (enabled -> enabled and disabled -> disabled).
> > 
> > If you're asking about using vring_state_changed() _instead_ of the
> > link status event and rte_eth_dev_socket_id(),
> 
> No, I like the idea of the link status event and rte_eth_dev_socket_id();
> I was just wondering why a new API is needed. Both Tetsuya and I
> were thinking to leverage the link status event to represent the
> queue state change (triggered by vring_state_changed()) as well,
> so that we don't need to introduce another eth event. However, I'd
> agree that it's better if we could have a new dedicated event.
> 
> Thomas, here is some background for you. For vhost pmd and linux
> virtio-net combo, the queue can be dynamically changed by ethtool,
> therefore, the application wishes to have another eth event, say
> RTE_ETH_EVENT_QUEUE_STATE_CHANGE, so that the application can
> add/remove corresponding queue to the datapath when that happens.
> What do you think of that?

Yes it is an event. So I don't understand the question.
What may be better than a specific rte_eth_event_type?
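To make the discussion concrete, here is a mocked-up sketch of the callback pattern being debated. All names below are illustrative stand-ins for rte_eth_dev_callback_register() and the proposed RTE_ETH_EVENT_QUEUE_STATE_CHANGE; none of this is the real DPDK API:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mock event types: LSC already exists; QUEUE_STATE is the proposal. */
enum eth_event_type {
	ETH_EVENT_INTR_LSC,
	ETH_EVENT_QUEUE_STATE,	/* queue enabled/disabled, e.g. via ethtool */
};

typedef void (*eth_event_cb)(uint8_t port_id, enum eth_event_type ev,
			     void *arg);

static eth_event_cb registered_cb;
static void *registered_arg;

/* Stand-in for rte_eth_dev_callback_register(): the app supplies both
 * the handler and the argument; the PMD cannot change the argument. */
static void
eth_register_callback(eth_event_cb cb, void *arg)
{
	registered_cb = cb;
	registered_arg = arg;
}

/* The vhost PMD would call this from vring_state_changed(). */
static void
eth_fire_event(uint8_t port_id, enum eth_event_type ev)
{
	if (registered_cb)
		registered_cb(port_id, ev, registered_arg);
}

static int queue_events_seen;

static void
app_cb(uint8_t port_id, enum eth_event_type ev, void *arg)
{
	(void)port_id;
	(void)arg;
	if (ev == ETH_EVENT_QUEUE_STATE)
		queue_events_seen++;	/* re-scan queues in the datapath */
}
```

The open question in the thread is only whether the new enum value is worth adding; the dispatch mechanics stay the same either way.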



[dpdk-dev] [PATCH] mk: Fix examples install path

2015-12-23 Thread Thomas Monjalon
Hi,

2015-12-22 14:13, Christian Ehrhardt:
> Depending on non-doc targets being built before, and the setting of
> DESTDIR, the examples dir could in some cases not end up in the right
> target. The reason is just a mistyped variable reference in the copy
> target.
[...]
> - $(Q)cp -a $(RTE_SDK)/examples $(DESTDIR)$(datadir)
> + $(Q)cp -a $(RTE_SDK)/examples $(DESTDIR)$(docdir)

No, it was not a typo.
Do you really think the examples code should be in the doc dir
(i.e. /usr/share/doc/dpdk) instead of datadir (i.e. /usr/share/dpdk)?


[dpdk-dev] [PATCH v5 1/3] vhost: Add callback and private data for vhost PMD

2015-12-23 Thread Rich Lane
On Wed, Dec 23, 2015 at 7:09 PM, Tetsuya Mukawa  wrote:

> On 2015/12/22 13:47, Rich Lane wrote:
> > On Mon, Dec 21, 2015 at 7:41 PM, Yuanhan Liu <
> yuanhan.liu at linux.intel.com>
> > wrote:
> >
> >> On Fri, Dec 18, 2015 at 10:01:25AM -0800, Rich Lane wrote:
> >>> I'm using the vhost callbacks and struct virtio_net with the vhost PMD
> >> in a few
> >>> ways:
> >> Rich, thanks for the info!
> >>
> >>> 1. new_device/destroy_device: Link state change (will be covered by the
> >> link
> >>> status interrupt).
> >>> 2. new_device: Add first queue to datapath.
> >> I'm wondering why vring_state_changed() is not used, as it will also be
> >> triggered at the beginning, when the default queue (the first queue) is
> >> enabled.
> >>
> > Turns out I'd misread the code and it's already using the
> > vring_state_changed callback for the first queue. Not sure if this is
> > intentional, but vring_state_changed is called for the first queue
> > before new_device.
> >
> >
> >>> 3. vring_state_changed: Add/remove queue to datapath.
> >>> 4. destroy_device: Remove all queues (vring_state_changed is not called
> >> when
> >>> qemu is killed).
> >> I had a plan to invoke vring_state_changed() to disable all vrings
> >> when destroy_device() is called.
> >>
> > That would be good.
> >
> >
> >>> 5. new_device and struct virtio_net: Determine NUMA node of the VM.
> >> You can get the 'struct virtio_net' dev from all above callbacks.
> >
> >
> >> 1. Link status interrupt.
> >>
> >> To vhost pmd, new_device()/destroy_device() equals the link status
> >> interrupt, where new_device() is a link up, and destroy_device() is a
> >> link down.
> >>
> >>
> >>> 2. New queue_state_changed callback. Unlike vring_state_changed this
> >> should
> >>> cover the first queue at new_device and removal of all queues at
> >>> destroy_device.
> >> As stated above, vring_state_changed() should be able to do that, except
> >> the one on destroy_device(), which is not done yet.
> >>
> >>> 3. Per-queue or per-device NUMA node info.
> >> You can query the NUMA node info implicitly by get_mempolicy(); check
> >> numa_realloc() at lib/librte_vhost/virtio-net.c for reference.
> >>
> > Your suggestions are exactly how my application is already working. I was
> > commenting on the
> > proposed changes to the vhost PMD API. I would prefer to
> > use RTE_ETH_EVENT_INTR_LSC
> > and rte_eth_dev_socket_id for consistency with other NIC drivers, instead
> > of these vhost-specific
> > hacks. The queue state change callback is the one new API that needs to
> be
> > added because
> > normal NICs don't have this behavior.
> >
> > You could add another rte_eth_event_type for the queue state change
> > callback, and pass the
> > queue ID, RX/TX direction, and enable bit through cb_arg.
>
> Hi Rich,
>
> So far, EAL provides rte_eth_dev_callback_register() for event handling.
> DPDK app can register callback handler and "callback argument".
> And EAL will call callback handler with the argument.
> Anyway, vhost library and PMD cannot change the argument.
>

You're right, I'd mistakenly thought that the PMD controlled the void *
passed to the callback.

Here's a thought:

struct rte_eth_vhost_queue_event {
uint16_t queue_id;
bool rx;
bool enable;
};

int rte_eth_vhost_get_queue_event(uint8_t port_id, struct
rte_eth_vhost_queue_event *event);

On receiving the ethdev event, the application could repeatedly call
rte_eth_vhost_get_queue_event to find out what happened.
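A hedged sketch of that drain loop, with the proposed rte_eth_vhost_get_queue_event() mocked up so the example is self-contained (the API did not exist at the time of this discussion; the event contents here are made up):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The event struct as proposed above. */
struct rte_eth_vhost_queue_event {
	uint16_t queue_id;
	bool rx;
	bool enable;
};

/* Mock of the proposed API: drain events from a static queue,
 * returning -1 once nothing is pending. */
static struct rte_eth_vhost_queue_event pending[] = {
	{ 0, true,  true  },	/* rx queue 0 enabled */
	{ 0, false, true  },	/* tx queue 0 enabled */
	{ 1, true,  false },	/* rx queue 1 disabled */
};
static int npending = 3;

static int
rte_eth_vhost_get_queue_event(uint8_t port_id,
			      struct rte_eth_vhost_queue_event *event)
{
	(void)port_id;
	if (npending == 0)
		return -1;
	*event = pending[--npending];
	return 0;
}

/* On the ethdev event, the application drains all pending changes. */
static int
drain_queue_events(uint8_t port_id)
{
	struct rte_eth_vhost_queue_event ev;
	int handled = 0;

	while (rte_eth_vhost_get_queue_event(port_id, &ev) == 0)
		handled++;	/* add/remove (ev.queue_id, ev.rx) in datapath */
	return handled;
}
```

The appeal of this shape is that the control-plane thread pulls events at its own pace instead of digging into struct virtio_net from the vhost thread.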

An issue with having the application dig into struct virtio_net is that
it can only be safely accessed from a callback on the vhost thread. A
typical application running its control plane on lcore 0 would need to
copy all the relevant info from struct virtio_net before sending it over.

As you mentioned, queues for a single vhost port could be located on
different NUMA nodes. I think this is an uncommon scenario, but if needed
you could add an API to retrieve the NUMA node for a given port and queue.


[dpdk-dev] [PATCH v3 2/2] vhost: call rte_pktmbuf_alloc_bulk in vhost dequeue

2015-12-23 Thread linhaifeng


>  
> + if (unlikely(alloc_err)) {
> + uint16_t i = entry_success;
> +
> + m->nb_segs = seg_num;
> + for (; i < free_entries; i++)
> + rte_pktmbuf_free(pkts[entry_success]);  ->  rte_pktmbuf_free(pkts[i]);
> + }
> +
>   rte_compiler_barrier();
>   vq->used->idx += entry_success;
>   /* Kick guest if required. */
> 




[dpdk-dev] [PATCH v3 1/2] mbuf: provide rte_pktmbuf_alloc_bulk API

2015-12-23 Thread Ananyev, Konstantin


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Stephen Hemminger
> Sent: Wednesday, December 23, 2015 6:38 PM
> To: Xie, Huawei
> Cc: dev at dpdk.org; dprovan at bivio.net
> Subject: Re: [dpdk-dev] [PATCH v3 1/2] mbuf: provide rte_pktmbuf_alloc_bulk 
> API
> 
> On Wed, 23 Dec 2015 00:17:53 +0800
> Huawei Xie  wrote:
> 
> > +
> > +   rc = rte_mempool_get_bulk(pool, (void **)mbufs, count);
> > +   if (unlikely(rc))
> > +   return rc;
> > +
> > +   switch (count % 4) {
> > +   case 0: while (idx != count) {
> > +   RTE_MBUF_ASSERT(rte_mbuf_refcnt_read(mbufs[idx]) == 0);
> > +   rte_mbuf_refcnt_set(mbufs[idx], 1);
> > +   rte_pktmbuf_reset(mbufs[idx]);
> > +   idx++;
> > +   case 3:
> > +   RTE_MBUF_ASSERT(rte_mbuf_refcnt_read(mbufs[idx]) == 0);
> > +   rte_mbuf_refcnt_set(mbufs[idx], 1);
> > +   rte_pktmbuf_reset(mbufs[idx]);
> > +   idx++;
> > +   case 2:
> > +   RTE_MBUF_ASSERT(rte_mbuf_refcnt_read(mbufs[idx]) == 0);
> > +   rte_mbuf_refcnt_set(mbufs[idx], 1);
> > +   rte_pktmbuf_reset(mbufs[idx]);
> > +   idx++;
> > +   case 1:
> > +   RTE_MBUF_ASSERT(rte_mbuf_refcnt_read(mbufs[idx]) == 0);
> > +   rte_mbuf_refcnt_set(mbufs[idx], 1);
> > +   rte_pktmbuf_reset(mbufs[idx]);
> > +   idx++;
> > +   }
> > +   }
> > +   return 0;
> > +}
> 
> Since the function will not work if count is 0 (otherwise
> rte_mempool_get_bulk will fail),

As I understand, rte_mempool_get_bulk() will work correctly and return 0
if count == 0.
That's why Huawei prefers while() {} instead of do {} while() - to avoid
an extra check for (count != 0) at the start.
Konstantin


> why not:
>   1. Document that assumption
>   2. Use that assumption to speed up code.
> 
> 
> 
>   switch(count % 4) {
>   do {
>   case 0:
>   ...
>   case 1:
>   ...
>   } while (idx != count);
>   }
> 
> Also you really need to add a big block comment about this loop, to explain
> what it does and why.
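For reference, the while-based unrolling from the patch can be reproduced stand-alone; unlike the do {} while() form Stephen sketches, it is a no-op for count == 0, which is Konstantin's point. Here init_one() is an illustrative stand-in for the refcnt-set/reset pair applied to each mbuf:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-mbuf work
 * (rte_mbuf_refcnt_set + rte_pktmbuf_reset in the patch). */
static void
init_one(int *v)
{
	*v = 1;
}

/* 4-way unrolled loop in the switch/while form from the patch:
 * entering the switch at (count % 4) peels off the remainder first,
 * then the while body handles full groups of four. Because the loop
 * condition is checked before the first group, count == 0 does
 * nothing, with no extra up-front test. */
static void
init_bulk(int *vals, size_t count)
{
	size_t idx = 0;

	switch (count % 4) {
	case 0: while (idx != count) {
			init_one(&vals[idx++]);
			/* fall through */
	case 3:		init_one(&vals[idx++]);
			/* fall through */
	case 2:		init_one(&vals[idx++]);
			/* fall through */
	case 1:		init_one(&vals[idx++]);
		}
	}
}
```

The interleaved case labels are the classic Duff's-device trick: legal C, but exactly the kind of construct that deserves the block comment Stephen asks for.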


[dpdk-dev] [RFC v2 2/2] testpmd: add an example to show packet filter flow

2015-12-23 Thread Rahul Lakkireddy
Extend the existing flow_director_filter to add support for packet
filter flow.  Also shows how to pass the extra behavior arguments
to rewrite fields in matched filter rules.

Signed-off-by: Rahul Lakkireddy 
Signed-off-by: Kumar Sanghvi 
---
v2:
1. Added new field filter-type to allow specifying maskfull vs maskless
   filter types.
2. Added new field filter-prio to allow specifying the priority between
   maskfull and maskless filters i.e. if we have a maskfull and a
   maskless filter both of which can match a single traffic pattern
   then, which one takes the priority is determined by filter-prio.
3. Added new field flow-label to be matched for ipv6.
4. Added new mac-swap behavior argument.


 app/test-pmd/cmdline.c | 528 -
 1 file changed, 520 insertions(+), 8 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 73298c9..3402f2c 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -641,7 +641,7 @@ static void cmd_help_long_parsed(void *parsed_result,
" flow (ipv4-other|ipv4-frag|ipv6-other|ipv6-frag)"
" src (src_ip_address) dst (dst_ip_address)"
" vlan (vlan_value) flexbytes (flexbytes_value)"
-   " (drop|fwd) pf|vf(vf_id) queue (queue_id)"
+   " (drop|fwd|switch) pf|vf(vf_id) queue (queue_id)"
" fd_id (fd_id_value)\n"
"Add/Del an IP type flow director filter.\n\n"

@@ -650,7 +650,7 @@ static void cmd_help_long_parsed(void *parsed_result,
" src (src_ip_address) (src_port)"
" dst (dst_ip_address) (dst_port)"
" vlan (vlan_value) flexbytes (flexbytes_value)"
-   " (drop|fwd) pf|vf(vf_id) queue (queue_id)"
+   " (drop|fwd|switch) pf|vf(vf_id) queue (queue_id)"
" fd_id (fd_id_value)\n"
"Add/Del an UDP/TCP type flow director filter.\n\n"

@@ -659,16 +659,41 @@ static void cmd_help_long_parsed(void *parsed_result,
" src (src_ip_address) (src_port)"
" dst (dst_ip_address) (dst_port)"
" tag (verification_tag) vlan (vlan_value)"
-   " flexbytes (flexbytes_value) (drop|fwd)"
+   " flexbytes (flexbytes_value) (drop|fwd|switch)"
" pf|vf(vf_id) queue (queue_id) fd_id (fd_id_value)\n"
"Add/Del a SCTP type flow director filter.\n\n"

"flow_director_filter (port_id) mode IP 
(add|del|update)"
" flow l2_payload ether (ethertype)"
-   " flexbytes (flexbytes_value) (drop|fwd)"
+   " flexbytes (flexbytes_value) (drop|fwd|switch)"
" pf|vf(vf_id) queue (queue_id) fd_id (fd_id_value)\n"
"Add/Del a l2 payload type flow director 
filter.\n\n"

+   "flow_director_filter (port_id) mode IP 
(add|del|update)"
+   " flow (ipv4-tcp-pkt-filter|ipv4-udp-pkt-filter"
+   " ipv6-tcp-pkt-filter|ipv6-udp-pkt-filter)"
+   " filter-type maskfull|maskless"
+   " filter-prio default|maskfull|maskless"
+   " ingress-port (port_id) (port_id_mask)"
+   " ether (ethertype) (ethertype_mask)"
+   " inner-vlan (inner_vlan_value) (inner_vlan_mask)"
+   " outer-vlan (outer_vlan_value) (outer_vlan_mask)"
+   " tos (tos_value) (tos_mask)"
+   " flow-label (flow_label_value) (flow_label_mask)"
+   " proto (proto_value) (proto_mask)"
+   " ttl (ttl_value) (ttl_mask)"
+   " src (src_ip) (src_ip_mask) (src_port) (src_port_mask)"
+   " dst (dst_ip) (dst_ip_mask) (dst_port) (dst_port_mask)"
+   " flexbytes (flexbytes_value) (drop|fwd|switch)"
+   " pf|vf(vf_id) queue (queue_id)"
+   " port-arg none|port-redirect (dst-port-id)"
+   " mac-arg none|mac-rewrite|mac-swap (src-mac) (dst-mac)"
+   " vlan-arg none|vlan-rewrite|vlan-del (vlan_value)"
+   " nat-arg none|nat-rewrite"
+   " src (src_ip) (src_port) dst (dst_ip) (dst_port)"
+   " fd_id (fd_id_value)\n"
+   "Add/Del a packet filter type flow director 
filter.\n\n"
+
"flow_director_filter (port_id) mode MAC-VLAN 
(add|del|update)"
" mac (mac_address) vlan (vlan_value)"
" flexbytes (flexbytes_value) (drop|fwd)"
@@ -7973,14 +7998,44 @@ struct cmd_flow_dir

[dpdk-dev] [RFC v2 1/2] ethdev: add packet filter flow and new behavior switch to fdir

2015-12-23 Thread Rahul Lakkireddy
Add a new packet filter flow that allows filtering a packet based on
matching ingress port, ethertype, vlan, ip, and tcp/udp fields, i.e.
matching based on any or all fields at the same time.

Add the ability to provide masks for fields in flow to allow range of
values. Allow selection of maskfull vs maskless filter types. Provide
mechanism to set priority to maskfull vs maskless filter types when
packet matches several filter types.

Add a new vlan flow containing inner and outer vlan to match. Add tos,
proto, and ttl fields that can be matched for ipv4 flow.  Add tc,
flow_label, next_header, and hop_limit fields that can be matched for
ipv6 flow.

Add a new behavior switch.

Add the ability to provide behavior arguments to allow insert/deletion/
swapping of matched fields in the flow.  Useful when rewriting matched
fields with new values.  Adds arguments for port, mac, vlan, and nat.
E.g., this allows providing new IP and port addresses to rewrite the
fields of packets matching a filter rule before NAT'ing.

Signed-off-by: Rahul Lakkireddy 
Signed-off-by: Kumar Sanghvi 
---
v2:
1. Added ttl to rte_eth_ipv4_flow and tc, flow_label, next_header,
   and hop_limit to rte_eth_ipv6_flow.

2. Added new field type to rte_eth_pkt_filter_flow to differentiate
   between maskfull and maskless filter types.

3. Added new field prio to rte_eth_pkt_filter_flow to allow setting
   priority over maskfull or maskless when packet matches multiple
   filter types.

4. Added new behavior sub op RTE_FDIR_BEHAVIOR_SUB_OP_SWAP to allow
   swapping fields in matched flows. Useful when swapping mac addresses
   in hardware before switching.

 lib/librte_ether/rte_eth_ctrl.h | 127 +++-
 1 file changed, 126 insertions(+), 1 deletion(-)

diff --git a/lib/librte_ether/rte_eth_ctrl.h b/lib/librte_ether/rte_eth_ctrl.h
index ce224ad..5cc22a0 100644
--- a/lib/librte_ether/rte_eth_ctrl.h
+++ b/lib/librte_ether/rte_eth_ctrl.h
@@ -74,7 +74,11 @@ extern "C" {
 #define RTE_ETH_FLOW_IPV6_EX             15
 #define RTE_ETH_FLOW_IPV6_TCP_EX         16
 #define RTE_ETH_FLOW_IPV6_UDP_EX         17
-#define RTE_ETH_FLOW_MAX                 18
+#define RTE_ETH_FLOW_PKT_FILTER_IPV4_TCP 18
+#define RTE_ETH_FLOW_PKT_FILTER_IPV4_UDP 19
+#define RTE_ETH_FLOW_PKT_FILTER_IPV6_TCP 20
+#define RTE_ETH_FLOW_PKT_FILTER_IPV6_UDP 21
+#define RTE_ETH_FLOW_MAX                 22

 /**
  * Feature filter types
@@ -407,6 +411,9 @@ struct rte_eth_l2_flow {
 struct rte_eth_ipv4_flow {
uint32_t src_ip;  /**< IPv4 source address to match. */
uint32_t dst_ip;  /**< IPv4 destination address to match. */
+   uint8_t tos;  /**< IPV4 type of service to match. */
+   uint8_t proto;/**< IPV4 proto to match. */
+   uint8_t ttl;  /**< IPV4 time to live to match. */
 };

 /**
@@ -443,6 +450,10 @@ struct rte_eth_sctpv4_flow {
 struct rte_eth_ipv6_flow {
uint32_t src_ip[4];  /**< IPv6 source address to match. */
uint32_t dst_ip[4];  /**< IPv6 destination address to match. */
+   uint8_t  tc; /**< IPv6 traffic class to match. */
+   uint32_t flow_label; /**< IPv6 flow label to match. */
+   uint8_t  next_header;/**< IPv6 next header to match. */
+   uint8_t  hop_limit;  /**< IPv6 hop limits to match. */
 };

 /**
@@ -500,6 +511,51 @@ struct rte_eth_tunnel_flow {
 };

 /**
+ * A structure used to define the input for vlan flow.
+ */
+struct rte_eth_vlan_flow {
+   uint16_t inner_vlan;  /**< Inner vlan field to match. */
+   uint16_t outer_vlan;  /**< Outer vlan field to match. */
+};
+
+/**
+ * A union used to define the input for N-Tuple flow
+ */
+union rte_eth_ntuple_flow {
+   struct rte_eth_tcpv4_flow  tcp4;
+   struct rte_eth_udpv4_flow  udp4;
+   struct rte_eth_tcpv6_flow  tcp6;
+   struct rte_eth_udpv6_flow  udp6;
+};
+
+/**
+ * A structure used to define the input for packet filter.
+ */
+struct rte_eth_pkt_filter {
+   uint8_t port_id; /**< Port id to match. */
+   struct rte_eth_l2_flowl2_flow;   /**< L2 flow fields to match. */
+   struct rte_eth_vlan_flow  vlan_flow; /**< Vlan flow fields to match. */
+   union rte_eth_ntuple_flow ntuple_flow;
+   /**< N-tuple flow fields to match. */
+};
+
+/**
+ * A structure used to define the input for packet filter flow.
+ */
+enum rte_eth_pkt_filter_type {
+   RTE_ETH_PKT_FILTER_TYPE_MASKLESS = 0, /**< Ignore masks in the flow */
+   RTE_ETH_PKT_FILTER_TYPE_MASKFULL, /**< Consider masks in the flow */
+};
+
+struct rte_eth_pkt_filter_flow {
+   enum rte_eth_pkt_filter_type type;   /**< Type of filter */
+   enum rte_eth_pkt_filter_type prio;
+   /**< Prioritize the filter type when a packet matches several types */
+   struct rte_eth_pkt_filter pkt;  /**< Packet fields to match. */
+   struct rte_eth_pkt_filter mask; /**< Mask for matched fields. */
+};
+
+/**
  * An union conta

[dpdk-dev] [RFC v2 0/2] ethdev: Enhancements to flow director filter

2015-12-23 Thread Rahul Lakkireddy
This RFC series of patches attempt to extend the flow director filter to
add support for Chelsio T5 hardware filtering capabilities.

Chelsio T5 supports carrying out filtering in hardware, with 3 actions
that can be taken on a packet that hits a filter, viz.

1. Action Pass - Packets hitting a filter rule can be directed to a
   particular RXQ.

2. Action Drop - Packets hitting a filter rule are dropped in h/w.

3. Action Switch - Packets hitting a filter rule can be switched in h/w
   from one port to another, without involvement of the host.  The action
   Switch also supports rewrite of src-mac/dst-mac headers as well as
   rewrite of vlan headers.  It also supports rewrite of IP headers and
   thereby supports NAT (Network Address Translation) in h/w.

Also, each filter rule can optionally support specifying a mask value
i.e. it's possible to create a filter rule for an entire subnet of IP
addresses or a range of tcp/udp ports, etc.

Patch 1 does the following:
- Adds an additional flow rte_eth_pkt_filter_flow which encapsulates
  ingress ports, l2 payload, vlan and ntuples.
- Adds an additional mask for the flow to allow range of values to be
  matched.
- Adds an ability to set both filters with masks (Maskfull) and
  without masks (Maskless).  Also allow prioritizing one of these
  filter types over the other when a packet matches several types.
- Adds a new behavior 'switch'.
- Adds behavior arguments that can be passed when a particular behavior
  is taken.  For ex: in case of action 'switch', pass additional 4-tuple
  to allow rewriting src/dst ip and port addresses to support NAT'ing.

Patch 2 shows testpmd command line example to support packet filter
flow.

The patch series has been compile tested on all x86 gcc targets and the
current fdir filter supported drivers seem to return appropriate error
codes when this new flow type and the new action are not supported and
hence are not affected.

Posting this series mainly for discussion on API change. Once this is
agreeable then, I will post the cxgbe PMD changes to use the new API.

---
v2:
1. Added ttl to rte_eth_ipv4_flow and tc, flow_label, next_header,
   and hop_limit to rte_eth_ipv6_flow.

2. Added new field type to rte_eth_pkt_filter_flow to differentiate
   between maskfull and maskless filter types.

3. Added new field prio to rte_eth_pkt_filter_flow to allow setting
   priority over maskfull or maskless when packet matches multiple
   filter types.

4. Added new behavior sub op RTE_FDIR_BEHAVIOR_SUB_OP_SWAP to allow
   swapping fields in matched flows. For ex, useful when swapping mac
   addresses in hardware before switching.

5. Updated the testpmd example to reflect the above new changes.

6. Dropped Patch 3 since the ABI announcement has already been merged.

Rahul Lakkireddy (2):
  ethdev: add packet filter flow and new behavior switch to fdir
  testpmd: add an example to show packet filter flow

 app/test-pmd/cmdline.c  | 528 +++-
 lib/librte_ether/rte_eth_ctrl.h | 127 +-
 2 files changed, 646 insertions(+), 9 deletions(-)

-- 
2.5.3



[dpdk-dev] [PATCH] doc: add Vector FM10K introductions

2015-12-23 Thread Chen Jing D(Mark)
From: "Chen Jing D(Mark)" 

Add documentation on how to enable the vector FM10K Rx/Tx functions, and
the preconditions and assumptions on Rx/Tx configuration parameters.
The new content also lists the limitations of the vector PMD, so the
application/customer can better select the best Rx/Tx functions.

Signed-off-by: Chen Jing D(Mark) 
---
 doc/guides/nics/fm10k.rst |   89 +
 1 files changed, 89 insertions(+), 0 deletions(-)

diff --git a/doc/guides/nics/fm10k.rst b/doc/guides/nics/fm10k.rst
index 4206b7f..54b761c 100644
--- a/doc/guides/nics/fm10k.rst
+++ b/doc/guides/nics/fm10k.rst
@@ -34,6 +34,95 @@ FM10K Poll Mode Driver
 The FM10K poll mode driver library provides support for the Intel FM10000
 (FM10K) family of 40GbE/100GbE adapters.

+Vector PMD for FM10K
+--------------------
+
+Vector PMD uses Intel® SIMD instructions to optimize packet I/O.
+It improves load/store bandwidth efficiency of the L1 data cache by using
+a wider SSE/AVX register (1).
+The wider register gives space to hold multiple packet buffers, so as to
+save the number of instructions when processing a bulk of packets.
+
+There is no change to the PMD API. The RX/TX handlers are the only two
+entry points for vPMD packet I/O. They are transparently registered for
+RX/TX execution at runtime if all condition checks pass.
+
+1.  To date, only an SSE version of the FM10K vPMD is available.
+To ensure that vPMD is in the binary code, make sure the option
+CONFIG_RTE_LIBRTE_FM10K_INC_VECTOR=y is set in the configuration file.
+
+Some constraints apply as pre-conditions for specific optimizations on bulk
+packet transfers. The following sections explain RX and TX constraints in the
+vPMD.
+
+RX Constraints
+~~
+
+Prerequisites and Pre-conditions
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The number of descriptors in the ring must be a power of 2. This is an
+assumption made by vector RX. With this pre-condition, the ring pointer
+can easily wrap back to the head after hitting the tail without a
+conditional check. Besides that, vector RX can use it to do a bit mask
+using ``ring_size - 1``.
+
+Features not Supported by Vector RX PMD
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Some features are not supported when trying to increase the throughput in
+vPMD. They are:
+
+*   IEEE1588
+
+*   FDIR
+
+*   Header split
+
+*   RX checksum offload
+
+Other features are supported using optional MACRO configuration. They include:
+
+*   HW VLAN strip
+
+*   L3/L4 packet type
+
+These are enabled by RX_OLFLAGS (RTE_LIBRTE_FM10K_RX_OLFLAGS_ENABLE=y).
+
+To guarantee the constraint, configuration flags in dev_conf.rxmode will be
+checked:
+
+*   hw_vlan_extend
+
+*   hw_ip_checksum
+
+*   header_split
+
+*   fdir_conf->mode
+
+RX Burst Size
+^^^^^^^^^^^^^
+
+As vPMD is focused on high throughput, it processes 4 packets at a time.
+So it assumes that the RX burst size is greater than 4 packets per burst.
+It returns zero if nb_pkts < 4 in the receive handler. If nb_pkts is not
+a multiple of 4, a floor alignment will be applied.
+
+TX Constraints
+~~~~~~~~~~~~~~
+
+Features not Supported by TX Vector PMD
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+TX vPMD only works when txq_flags is set to FM10K_SIMPLE_TX_FLAG.
+This means that it does not support TX multi-segment, VLAN offload and TX csum
+offload. The following MACROs are used for these three features:
+
+*   ETH_TXQ_FLAGS_NOMULTSEGS
+
+*   ETH_TXQ_FLAGS_NOVLANOFFL
+
+*   ETH_TXQ_FLAGS_NOXSUMSCTP
+
+*   ETH_TXQ_FLAGS_NOXSUMUDP
+
+*   ETH_TXQ_FLAGS_NOXSUMTCP

 Limitations
 ---
-- 
1.7.7.6
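Two of the constraints described in this patch - the power-of-two ring size and the floor alignment of the burst size - reduce to one-line bit tricks. A stand-alone sketch (function names are illustrative, not the FM10K driver's internals):

```c
#include <assert.h>
#include <stdint.h>

/* Power-of-two ring: advance an index with a mask instead of a
 * conditional wrap check; this is what "bit mask using ring_size - 1"
 * refers to above. Assumes ring_size is a power of 2. */
static inline uint16_t
ring_next(uint16_t idx, uint16_t ring_size)
{
	return (uint16_t)((idx + 1) & (ring_size - 1));
}

/* Floor-align a burst size to a multiple of 4, as the vector RX
 * handler does when nb_pkts is not a multiple of 4. */
static inline uint16_t
floor_align4(uint16_t nb_pkts)
{
	return (uint16_t)(nb_pkts & ~3u);
}
```

Masking only works because `ring_size - 1` is all ones in binary for a power-of-two size, which is why the prerequisite exists at all.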



[dpdk-dev] [RFC PATCH 6/6] driver/i40e: tunnel configure in i40e

2015-12-23 Thread Jijiang Liu
Add i40e_udp_tunnel_flow_configure() to implement the configuration of a
flow rule with 'src IP, dst IP, src port, dst port and tunnel ID' using
the flow director.

Signed-off-by: Jijiang Liu 
---
 drivers/net/i40e/i40e_ethdev.c |   41 
 1 files changed, 41 insertions(+), 0 deletions(-)

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index 7e03a1f..7d8c8d7 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -469,6 +469,7 @@ static const struct eth_dev_ops i40e_eth_dev_ops = {
.rss_hash_conf_get= i40e_dev_rss_hash_conf_get,
.udp_tunnel_add   = i40e_dev_udp_tunnel_add,
.udp_tunnel_del   = i40e_dev_udp_tunnel_del,
+   .tunnel_configure = i40e_dev_tunnel_configure,
.filter_ctrl  = i40e_dev_filter_ctrl,
.rxq_info_get = i40e_rxq_info_get,
.txq_info_get = i40e_txq_info_get,
@@ -6029,6 +6030,46 @@ i40e_dev_udp_tunnel_del(struct rte_eth_dev *dev,
return ret;
 }

+static int
+i40e_udp_tunnel_flow_configure(struct i40e_pf *pf,
+   struct rte_eth_tunnel_conf *tunnel_conf)
+{
+   struct i40e_hw *hw = I40E_PF_TO_HW(pf);
+
+   /* Set a flow director rule matching
+    * src IP + dst IP + src port + dst port + tunnel id.
+    * The flow director programming is not implemented yet in this RFC. */
+   RTE_SET_USED(hw);
+   RTE_SET_USED(tunnel_conf);
+
+   return 0;
+}
+
+/* Configure tunnel flow rules for the device */
+static int
+i40e_dev_tunnel_configure(struct rte_eth_dev *dev,
+   struct rte_eth_tunnel_conf *tunnel_conf)
+{
+   int ret = 0;
+   struct i40e_pf *pf = I40E_DEV_PRIVATE_TO_PF(dev->data->dev_private);
+
+   if (tunnel_conf == NULL || tunnel_conf->tunnel_flow == NULL)
+   return -EINVAL;
+
+   switch (tunnel_conf->tunnel_flow->tunnel_type) {
+   case RTE_TUNNEL_TYPE_VXLAN:
+   case RTE_TUNNEL_TYPE_GENEVE:
+   case RTE_TUNNEL_TYPE_TEREDO:
+   ret = i40e_udp_tunnel_flow_configure(pf, tunnel_conf);
+   break;
+
+   default:
+   PMD_DRV_LOG(ERR, "Invalid tunnel type");
+   ret = -1;
+   break;
+   }
+
+   return ret;
+}
+
 /* Calculate the maximum number of contiguous PF queues that are configured */
 static int
 i40e_pf_calc_configured_queues_num(struct i40e_pf *pf)
-- 
1.7.7.6



[dpdk-dev] [RFC PATCH 5/6] rte_ether: implement encap and decap APIs

2015-12-23 Thread Jijiang Liu
Use SIMD instructions to accelerate the encapsulation operation.

Signed-off-by: Jijiang Liu 
---
 lib/librte_ether/libtunnel/rte_vxlan_opt.c |  251 
 1 files changed, 251 insertions(+), 0 deletions(-)
 create mode 100644 lib/librte_ether/libtunnel/rte_vxlan_opt.c

diff --git a/lib/librte_ether/libtunnel/rte_vxlan_opt.c 
b/lib/librte_ether/libtunnel/rte_vxlan_opt.c
new file mode 100644
index 000..e59ed2c
--- /dev/null
+++ b/lib/librte_ether/libtunnel/rte_vxlan_opt.c
@@ -0,0 +1,251 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+#include "vxlan_opt.h"
+
+#ifndef __INTEL_COMPILER
+#pragma GCC diagnostic ignored "-Wcast-qual"
+#endif
+
+#pragma GCC diagnostic ignored "-Wstrict-aliasing"
+
+#define PORT_MIN   49152
+#define PORT_MAX   65535
+#define PORT_RANGE ((PORT_MAX - PORT_MIN) + 1)
+
+#define DUMMY_FOR_TEST
+#define RTE_DEFAULT_VXLAN_PORT 4789
+ 
+#define LOOP   4
+#define MAC_LEN6
+#define PREFIX (ETHER_HDR_LEN + 4)
+#define UDP_PRE_SZ (sizeof(struct udp_hdr) + sizeof(struct vxlan_hdr))
+#define IP_PRE_SZ  (UDP_PRE_SZ + sizeof(struct ipv4_hdr))
+#define VXLAN_PKT_HDR_SIZE   (IP_PRE_SZ + ETHER_HDR_LEN)
+ 
+#define VXLAN_SIZE sizeof(struct vxlan_hdr)
+#define INNER_PRE_SZ   (14 + 20 + 8 + 8)
+#define DECAP_OFFSET   (16 + 8 + 8)
+#define DETECT_OFFSET  12
+
+struct eth_pkt_info {
+   uint8_t l2_len;
+   uint16_t ethertype;
+   uint16_t l3_len;
+   uint16_t l4_proto;
+   uint16_t l4_len;
+};
+
+/* 16Bytes tx meta data */
+struct vxlan_tx_meta {
+   uint32_t sip;
+   uint32_t dip;
+   uint32_t vni;
+   uint16_t sport;
+} __attribute__((__aligned__(16)));
+
+
+/* Parse an IPv4 header to fill l3_len, l4_len, and l4_proto */
+static void
+parse_ipv4(struct ipv4_hdr *ipv4_hdr, struct eth_pkt_info *info)
+{
+   struct tcp_hdr *tcp_hdr;
+
+   info->l3_len = (ipv4_hdr->version_ihl & 0x0f) * 4;
+   info->l4_proto = ipv4_hdr->next_proto_id;
+
+   /* only fill l4_len for TCP, it's useful for TSO */
+   if (info->l4_proto == IPPROTO_TCP) {
+   tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + info->l3_len);
+   info->l4_len = (tcp_hdr->data_off & 0xf0) >> 2;
+   } else
+   info->l4_len = 0;
+}
+
+/* Parse an IPv6 header to fill l3_len, l4_len, and l4_proto */
+static void
+parse_ipv6(struct ipv6_hdr *ipv6_hdr, struct eth_pkt_info *info)
+{
+   struct tcp_hdr *tcp_hdr;
+
+   info->l3_len = sizeof(struct ipv6_hdr);
+   info->l4_proto = ipv6_hdr->proto;
+
+   /* only fill l4_len for TCP, it's useful for TSO */
+   if (info->l4_proto == IPPROTO_TCP) {
+   tcp_hdr = (struct tcp_hdr *)((char *)ipv6_hdr + info->l3_len);
+   info->l4_len = (tcp_hdr->data_off & 0xf0) >> 2;
+   } else
+   info->l4_len = 0;
+}
+
+/*
+ * Parse an ethernet header to fill the ethertype, l2_len, l3_len and
+ * ipproto. This function is able to recognize IPv4/IPv6 with one optional vlan
+ * header. The l4_len argument is only set in case of TCP (useful for TSO).
+ */
+static void
+parse_ethernet(struct ether_hdr *eth_hdr, struct eth_pkt_info *info)
+{

[dpdk-dev] [RFC PATCH 4/6] rte_ether: define rte_eth_vxlan_decap and rte_eth_vxlan_encap

2015-12-23 Thread Jijiang Liu
These functions' parameters should be the same as those of the callback
function types (rte_rx/tx_callback_fn).

But we can redefine some parameters as 'unused'.

Signed-off-by: Jijiang Liu 
---
 lib/librte_ether/libtunnel/rte_vxlan_opt.h |   49 
 1 files changed, 49 insertions(+), 0 deletions(-)
 create mode 100644 lib/librte_ether/libtunnel/rte_vxlan_opt.h

diff --git a/lib/librte_ether/libtunnel/rte_vxlan_opt.h 
b/lib/librte_ether/libtunnel/rte_vxlan_opt.h
new file mode 100644
index 000..d9412fc
--- /dev/null
+++ b/lib/librte_ether/libtunnel/rte_vxlan_opt.h
@@ -0,0 +1,49 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_VXLAN_OPT_H_
+#define _RTE_VXLAN_OPT_H_
+
+extern void rte_vxlan_encap_burst(uint8_t port, uint16_t queue,
+  struct rte_mbuf *pkts[],
+  uint16_t nb_pkts,
+  uint16_t max_pkts,
+  void *user_param);
+
+extern uint16_t rte_vxlan_decap_burst(uint8_t port,
+ uint16_t queue,
+ struct rte_mbuf *pkts[],
+ uint16_t nb_pkts,
+ void *user_param);
+
+#endif /* _RTE_VXLAN_OPT_H_ */
-- 
1.7.7.6



[dpdk-dev] [RFC PATCH 3/6] rte_ether: implement tunnel config API

2015-12-23 Thread Jijiang Liu
Signed-off-by: Jijiang Liu 
---
 lib/librte_ether/rte_ethdev.c |   60 +
 1 files changed, 60 insertions(+), 0 deletions(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index c3eed49..6725398 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -1004,6 +1004,66 @@ rte_eth_dev_configure(uint8_t port_id, uint16_t nb_rx_q, 
uint16_t nb_tx_q,
return 0;
 }

+int
+rte_eth_dev_tunnel_configure(uint8_t port_id,
+struct rte_eth_tunnel_conf *tunnel_conf)
+{
+   struct rte_eth_dev *dev;
+   struct rte_eth_dev_info dev_info;
+   int diag;
+
+   /* This function is only safe when called from the primary process
+    * in a multi-process setup */
+   RTE_PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY);
+
+   RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
+   
+   dev = &rte_eth_devices[port_id];
+
+   RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_infos_get, -ENOTSUP);
+
+   /*
+    * Check that the RX and TX queue indexes are within the number of
+    * RX and TX queues configured for the device.
+    */
+   (*dev->dev_ops->dev_infos_get)(dev, &dev_info);
+   if (tunnel_conf->rx_queue > dev->data->nb_rx_queues - 1) {
+   RTE_PMD_DEBUG_TRACE("ethdev port_id=%d rx_queue=%d > %d\n",
+   port_id, tunnel_conf->rx_queue,
+   dev->data->nb_rx_queues - 1);
+   return -EINVAL;
+   }
+
+   if (tunnel_conf->tx_queue > dev->data->nb_tx_queues - 1) {
+   RTE_PMD_DEBUG_TRACE("ethdev port_id=%d tx_queue=%d > %d\n",
+   port_id, tunnel_conf->tx_queue,
+   dev->data->nb_tx_queues - 1);
+   return -EINVAL;
+   }
+
+   tunnel_conf->tunnel_flow = rte_zmalloc(NULL,
+   sizeof(struct rte_eth_tunnel_flow)
+   * tunnel_conf->nb_flow, 0);
+   if (tunnel_conf->tunnel_flow == NULL)
+   return -ENOMEM;
+
+   /* Copy the dev_conf parameter into the dev structure */
+   memcpy(dev->data->dev_conf.tunnel_conf[tunnel_conf->rx_queue],
+   tunnel_conf, sizeof(struct rte_eth_tunnel_conf));
+
+   rte_eth_add_rx_callback(port_id, tunnel_conf->rx_queue,
+   rte_eth_tunnel_decap, (void *)tunnel_conf);
+
+   rte_eth_add_tx_callback(port_id, tunnel_conf->tx_queue,
+   rte_eth_tunnel_encap, (void *)tunnel_conf);
+
+   diag = (*dev->dev_ops->tunnel_configure)(dev, tunnel_conf);
+   if (diag != 0) {
+   RTE_PMD_DEBUG_TRACE("port%d dev_tunnel_configure = %d\n",
+   port_id, diag);
+   return diag;
+   }
+
+   return 0;
+}
+
 static void
 rte_eth_dev_config_restore(uint8_t port_id)
 {
-- 
1.7.7.6



[dpdk-dev] [RFC PATCH 2/6] rte_ether: define tunnel flow structure and APIs

2015-12-23 Thread Jijiang Liu
Add the struct 'rte_eth_tunnel_conf' and the tunnel configuration API. 

Signed-off-by: Jijiang Liu 
---
 lib/librte_ether/rte_ethdev.h |   28 
 1 files changed, 28 insertions(+), 0 deletions(-)

diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index bada8ad..cb4d9a2 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -630,6 +630,18 @@ struct rte_eth_rxconf {
uint8_t rx_deferred_start; /**< Do not start queue with 
rte_eth_dev_start(). */
 };

+/**
+ * A structure used to configure tunnel flow of an Ethernet port.
+ */
+struct rte_eth_tunnel_conf {
+   uint16_t rx_queue;
+   uint16_t tx_queue;
+   uint16_t udp_tunnel_port;
+   uint16_t nb_flow;
+   uint16_t filter_type;
+   struct rte_eth_tunnel_flow *tunnel_flow;
+};
+
 #define ETH_TXQ_FLAGS_NOMULTSEGS 0x0001 /**< nb_segs=1 for all mbufs */
 #define ETH_TXQ_FLAGS_NOREFCOUNT 0x0002 /**< refcnt can be ignored */
 #define ETH_TXQ_FLAGS_NOMULTMEMP 0x0004 /**< all bufs come from same mempool */
@@ -810,6 +822,7 @@ struct rte_eth_conf {
 #define DEV_RX_OFFLOAD_TCP_CKSUM   0x0008
 #define DEV_RX_OFFLOAD_TCP_LRO 0x0010
 #define DEV_RX_OFFLOAD_QINQ_STRIP  0x0020
+#define DEV_RX_OFFLOAD_TUNNEL_DECAP 0x0040

 /**
  * TX offload capabilities of a device.
@@ -1210,6 +1223,10 @@ typedef int (*eth_udp_tunnel_add_t)(struct rte_eth_dev 
*dev,

 typedef int (*eth_udp_tunnel_del_t)(struct rte_eth_dev *dev,
struct rte_eth_udp_tunnel *tunnel_udp);
+
+typedef int (*eth_tunnel_flow_conf_t)(struct rte_eth_dev *dev,
+ struct rte_eth_tunnel_conf *tunnel_conf);
+
 /**< @internal Delete tunneling UDP info */

 typedef int (*eth_set_mc_addr_list_t)(struct rte_eth_dev *dev,
@@ -1385,6 +1402,7 @@ struct eth_dev_ops {
eth_set_vf_vlan_filter_t   set_vf_vlan_filter;  /**< Set VF VLAN filter 
*/
eth_udp_tunnel_add_t   udp_tunnel_add;
eth_udp_tunnel_del_t   udp_tunnel_del;
+   eth_tunnel_flow_conf_t tunnel_configure;
eth_set_queue_rate_limit_t set_queue_rate_limit;   /**< Set queue rate 
limit */
eth_set_vf_rate_limit_tset_vf_rate_limit;   /**< Set VF rate limit 
*/
/** Update redirection table. */
@@ -1821,6 +1839,16 @@ extern int rte_eth_dev_configure(uint8_t port_id,
 const struct rte_eth_conf *eth_conf);

 /**
+ * Configure an Ethernet device for tunnelling packet.
+ *
+ * @return
+ *   - 0: Success, device configured.
 *   - <0: Error code returned by the driver configuration function.
+ */
+extern int rte_eth_dev_tunnel_configure(uint8_t port_id,
+   struct rte_eth_tunnel_conf 
*tunnel_conf);
+
+/**
  * Allocate and set up a receive queue for an Ethernet device.
  *
  * The function allocates a contiguous block of memory for *nb_rx_desc*
-- 
1.7.7.6



[dpdk-dev] [RFC PATCH 1/6] rte_ether: extend rte_eth_tunnel_flow structure

2015-12-23 Thread Jijiang Liu
The purpose of extending this structure is to support more tunnel filter 
conditions.

Signed-off-by: Jijiang Liu 
---
 lib/librte_ether/rte_eth_ctrl.h |   14 +++---
 1 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/lib/librte_ether/rte_eth_ctrl.h b/lib/librte_ether/rte_eth_ctrl.h
index ce224ad..39f52d9 100644
--- a/lib/librte_ether/rte_eth_ctrl.h
+++ b/lib/librte_ether/rte_eth_ctrl.h
@@ -494,9 +494,17 @@ enum rte_eth_fdir_tunnel_type {
  * NVGRE
  */
 struct rte_eth_tunnel_flow {
-   enum rte_eth_fdir_tunnel_type tunnel_type; /**< Tunnel type to match. */
-   uint32_t tunnel_id;/**< Tunnel ID to match. 
TNI, VNI... */
-   struct ether_addr mac_addr;/**< Mac address to match. */
+   enum rte_eth_tunnel_type tunnel_type;
+   uint64_t tunnel_id;  /**< Tunnel ID to match. TNI, VNI... */
+   struct ether_addr outer_src_mac;  /* for TX */
+   struct ether_addr outer_peer_mac; /* for TX */
+   enum rte_tunnel_iptype outer_ip_type; /**< IP address type. */
+   union {
+   struct rte_eth_ipv4_flow outer_ipv4;
+   struct rte_eth_ipv6_flow outer_ipv6;
+   } outer_ip_addr;
+   uint16_t dst_port;
+   uint16_t src_port;
 };

 /**
-- 
1.7.7.6



[dpdk-dev] [RFC PATCH 0/6] General tunneling APIs

2015-12-23 Thread Jijiang Liu
I want to define a set of general tunneling APIs, which are used to accelerate
tunneling packet processing in DPDK.
In this RFC patch set, I will explain my idea using some code.

1. Using flow director offload to define a tunnel flow in a pair of queues.

flow rule: src IP + dst IP + src port + dst port + tunnel ID (for VXLAN)

For example:
struct rte_eth_tunnel_conf{
.tunnel_type = VXLAN,
.rx_queue = 1,
.tx_queue = 1,
.filter_type = 'src ip + dst ip + src port + dst port + tunnel id' 
.flow_tnl {
.tunnel_type = VXLAN,
.tunnel_id = 100,
.remote_mac = 11.22.33.44.55.66,
 .ip_type = ipv4, 
 .outer_ipv4.src_ip = 192.168.10.1
 .outer_ipv4.dst_ip = 10.239.129.11
 .src_port = 1000,
 .dst_port =2000
};

2. Configure tunnel flow for a device and for a pair of queues.

rte_eth_dev_tunnel_configure(0, &rte_eth_tunnel_conf);

In this API, RX decapsulation and TX encapsulation callback functions will be
registered if the HW doesn't support encap/decap; space will be allocated for
the tunnel configuration, and a pointer to this newly allocated space is stored
as dev->post_rx/tx_burst_cbs[].param.

rte_eth_add_rx_callback(port_id, tunnel_conf.rx_queue,
rte_eth_tunnel_decap, (void *)tunnel_conf);
rte_eth_add_tx_callback(port_id, tunnel_conf.tx_queue,
rte_eth_tunnel_encap, (void *)tunnel_conf)

3. Using rte_vxlan_decap_burst() to do decapsulation of tunneling packet.

4. Using rte_vxlan_encap_burst() to do encapsulation of tunneling packets.
   The 'src ip, dst ip, src port, dst port and tunnel ID' can be obtained from
   the tunnel configuration, and SIMD is used to accelerate the operation.

How to use these APIs, there is a example below:

1)at config phase

dev_config(port, ...);
tunnel_config(port,...);
...
dev_start(port);
...
rx_burst(port, rxq,... );
tx_burst(port, txq,...);


2)at transmitting packet phase
Only the outer src/dst MAC addresses need to be set for the TX tunnel
configuration in dev->post_tx_burst_cbs[].param.

In this patch set, I have not finished all of the code; the purpose of sending
it now is to collect more comments and suggestions on this idea.


Jijiang Liu (6):
  extend rte_eth_tunnel_flow
  define tunnel flow structure and APIs
  implement tunnel flow APIs
  define rte_vxlan_decap/encap
  implement rte_vxlan_decap/encap
  i40e tunnel configure

 drivers/net/i40e/i40e_ethdev.c |   41 +
 lib/librte_ether/libtunnel/rte_vxlan_opt.c |  251 
 lib/librte_ether/libtunnel/rte_vxlan_opt.h |   49 ++
 lib/librte_ether/rte_eth_ctrl.h|   14 ++-
 lib/librte_ether/rte_ethdev.h  |   28 +++
 lib/librte_ether/rte_ethdev.c  |   60 ++
 5 files changed, 440 insertions(+), 3 deletions(-)
 create mode 100644 lib/librte_ether/libtunnel/rte_vxlan_opt.c
 create mode 100644 lib/librte_ether/libtunnel/rte_vxlan_opt.h

-- 
1.7.7.6



[dpdk-dev] [PATCH v5 6/6] l3fwd-power: fix a memory leak for non-ip packet

2015-12-23 Thread Shaopeng He
Previously, l3fwd-power only processed IP and IPv6 packets; the mbufs of
other packets were not released, which caused a memory leak.
This patch fixes this issue.

v4 change:
  - update release note inside the patch

Signed-off-by: Shaopeng He 
---
 doc/guides/rel_notes/release_2_3.rst | 6 ++
 examples/l3fwd-power/main.c  | 3 ++-
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/doc/guides/rel_notes/release_2_3.rst 
b/doc/guides/rel_notes/release_2_3.rst
index 2cb5ebd..fc871ab 100644
--- a/doc/guides/rel_notes/release_2_3.rst
+++ b/doc/guides/rel_notes/release_2_3.rst
@@ -25,6 +25,12 @@ Libraries
 Examples
 ~~~~~~~~

+* **l3fwd-power: Fixed memory leak for non-ip packet.**
+
+  Fixed an issue in l3fwd-power where, on receiving a packet of a type
+  other than IP and IPv6, the mbuf was not released, which caused
+  a memory leak.
+

 Other
 ~~~~~
diff --git a/examples/l3fwd-power/main.c b/examples/l3fwd-power/main.c
index 828c18a..d9cd848 100644
--- a/examples/l3fwd-power/main.c
+++ b/examples/l3fwd-power/main.c
@@ -714,7 +714,8 @@ l3fwd_simple_forward(struct rte_mbuf *m, uint8_t portid,
/* We don't currently handle IPv6 packets in LPM mode. */
rte_pktmbuf_free(m);
 #endif
-   }
+   } else
+   rte_pktmbuf_free(m);

 }

-- 
1.9.3



[dpdk-dev] [PATCH v5 5/6] fm10k: make sure default VID available in dev_init

2015-12-23 Thread Shaopeng He
When the PF establishes a connection with the Switch Manager, it receives
a logic port range from the SM, and registers certain logic ports from
that range; then a default VID will be sent back from the SM. This whole
transaction needs to be finished in dev_init; otherwise, in dev_start
the interrupt setting will be changed according to the RX queue number,
and will probably cause this transaction to fail.

Signed-off-by: Shaopeng He 
---
 drivers/net/fm10k/fm10k_ethdev.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 06bfffd..832a3fe 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -2817,6 +2817,21 @@ eth_fm10k_dev_init(struct rte_eth_dev *dev)

fm10k_mbx_unlock(hw);

+   /* Make sure default VID is ready before going forward. */
+   if (hw->mac.type == fm10k_mac_pf) {
+   for (i = 0; i < MAX_QUERY_SWITCH_STATE_TIMES; i++) {
+   if (hw->mac.default_vid)
+   break;
+   /* Delay some time to acquire async port VLAN info. */
+   rte_delay_us(WAIT_SWITCH_MSG_US);
+   }
+
+   if (!hw->mac.default_vid) {
+   PMD_INIT_LOG(ERR, "default VID is not ready");
+   return -1;
+   }
+   }
+
/* Add default mac address */
fm10k_MAC_filter_set(dev, hw->mac.addr, true,
MAIN_VSI_POOL_NUMBER);
-- 
1.9.3



[dpdk-dev] [PATCH v5 4/6] fm10k: add rx queue interrupt en/dis functions

2015-12-23 Thread Shaopeng He
Interrupt mode framework has enable/disable functions for individual
rx queue, this patch implements these two functions.

v2 changes:
  - split one big patch into three smaller ones

Signed-off-by: Shaopeng He 
---
 drivers/net/fm10k/fm10k_ethdev.c | 33 +
 1 file changed, 33 insertions(+)

diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index da78389..06bfffd 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -2205,6 +2205,37 @@ fm10k_dev_disable_intr_vf(struct rte_eth_dev *dev)
 }

 static int
+fm10k_dev_rx_queue_intr_enable(struct rte_eth_dev *dev, uint16_t queue_id)
+{
+   struct fm10k_hw *hw = FM10K_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+
+   /* Enable ITR */
+   if (hw->mac.type == fm10k_mac_pf)
+   FM10K_WRITE_REG(hw, FM10K_ITR(Q2V(dev, queue_id)),
+   FM10K_ITR_AUTOMASK | FM10K_ITR_MASK_CLEAR);
+   else
+   FM10K_WRITE_REG(hw, FM10K_VFITR(Q2V(dev, queue_id)),
+   FM10K_ITR_AUTOMASK | FM10K_ITR_MASK_CLEAR);
+   rte_intr_enable(&dev->pci_dev->intr_handle);
+   return 0;
+}
+
+static int
+fm10k_dev_rx_queue_intr_disable(struct rte_eth_dev *dev, uint16_t queue_id)
+{
+   struct fm10k_hw *hw = FM10K_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+
+   /* Disable ITR */
+   if (hw->mac.type == fm10k_mac_pf)
+   FM10K_WRITE_REG(hw, FM10K_ITR(Q2V(dev, queue_id)),
+   FM10K_ITR_MASK_SET);
+   else
+   FM10K_WRITE_REG(hw, FM10K_VFITR(Q2V(dev, queue_id)),
+   FM10K_ITR_MASK_SET);
+   return 0;
+}
+
+static int
 fm10k_dev_rxq_interrupt_setup(struct rte_eth_dev *dev)
 {
struct fm10k_hw *hw = FM10K_DEV_PRIVATE_TO_HW(dev->data->dev_private);
@@ -2539,6 +2570,8 @@ static const struct eth_dev_ops fm10k_eth_dev_ops = {
.tx_queue_setup = fm10k_tx_queue_setup,
.tx_queue_release   = fm10k_tx_queue_release,
.rx_descriptor_done = fm10k_dev_rx_descriptor_done,
+   .rx_queue_intr_enable   = fm10k_dev_rx_queue_intr_enable,
+   .rx_queue_intr_disable  = fm10k_dev_rx_queue_intr_disable,
.reta_update= fm10k_reta_update,
.reta_query = fm10k_reta_query,
.rss_hash_update= fm10k_rss_hash_update,
-- 
1.9.3



[dpdk-dev] [PATCH v5 3/6] fm10k: remove rx queue interrupts when dev stops

2015-12-23 Thread Shaopeng He
Previous dev_stop function stops the rx/tx queues. This patch adds logic
to disable rx queue interrupt, clean the datapath event and queue/vec map.

v5 changes:
  - remove one unnecessary NULL check for rte_free

v2 changes:
  - split one big patch into three smaller ones

Signed-off-by: Shaopeng He 
---
 drivers/net/fm10k/fm10k_ethdev.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 583335a..da78389 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -1127,6 +1127,8 @@ fm10k_dev_start(struct rte_eth_dev *dev)
 static void
 fm10k_dev_stop(struct rte_eth_dev *dev)
 {
+   struct fm10k_hw *hw = FM10K_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   struct rte_intr_handle *intr_handle = &dev->pci_dev->intr_handle;
int i;

PMD_INIT_FUNC_TRACE();
@@ -1138,6 +1140,24 @@ fm10k_dev_stop(struct rte_eth_dev *dev)
if (dev->data->rx_queues)
for (i = 0; i < dev->data->nb_rx_queues; i++)
fm10k_dev_rx_queue_stop(dev, i);
+
+   /* Disable datapath event */
+   if (rte_intr_dp_is_en(intr_handle)) {
+   for (i = 0; i < dev->data->nb_rx_queues; i++) {
+   FM10K_WRITE_REG(hw, FM10K_RXINT(i),
+   3 << FM10K_RXINT_TIMER_SHIFT);
+   if (hw->mac.type == fm10k_mac_pf)
+   FM10K_WRITE_REG(hw, FM10K_ITR(Q2V(dev, i)),
+   FM10K_ITR_MASK_SET);
+   else
+   FM10K_WRITE_REG(hw, FM10K_VFITR(Q2V(dev, i)),
+   FM10K_ITR_MASK_SET);
+   }
+   }
+   /* Clean datapath event and queue/vec mapping */
+   rte_intr_efd_disable(intr_handle);
+   rte_free(intr_handle->intr_vec);
+   intr_handle->intr_vec = NULL;
 }

 static void
-- 
1.9.3



[dpdk-dev] [PATCH v5 2/6] fm10k: setup rx queue interrupts for PF and VF

2015-12-23 Thread Shaopeng He
In interrupt mode, each RX queue can have one interrupt to notify the upper
layer application when packets are available in that queue. Some queues
can also share one interrupt.
Currently, fm10k needs one separate interrupt for mailbox. So, only those
drivers which support multiple interrupt vectors e.g. vfio-pci can work
in fm10k interrupt mode.
This patch uses the RXINT/INT_MAP registers to map interrupt causes
(rx queue and other events) to vectors, and enable these interrupts
through kernel drivers like vfio-pci.

v5 changes:
  - add more clean up when memory allocation fails
  - split line over 80 characters to 2 lines
  - update interrupt mode limitation in fm10k.rst

v4 change:
  - update release note inside the patch

v3 change:
  - macro renaming according to the EAL change

v2 changes:
  - split one big patch into three smaller ones
  - reword some comments and commit messages

Signed-off-by: Shaopeng He 
---
 doc/guides/nics/fm10k.rst|   7 +++
 doc/guides/rel_notes/release_2_3.rst |   2 +
 drivers/net/fm10k/fm10k.h|   3 +
 drivers/net/fm10k/fm10k_ethdev.c | 105 +++
 4 files changed, 106 insertions(+), 11 deletions(-)

diff --git a/doc/guides/nics/fm10k.rst b/doc/guides/nics/fm10k.rst
index 4206b7f..dc5cb6e 100644
--- a/doc/guides/nics/fm10k.rst
+++ b/doc/guides/nics/fm10k.rst
@@ -65,3 +65,10 @@ The FM1 family of NICS support a maximum of a 15K jumbo 
frame. The value
 is fixed and cannot be changed. So, even when the ``rxmode.max_rx_pkt_len``
 member of ``struct rte_eth_conf`` is set to a value lower than 15364, frames
 up to 15364 bytes can still reach the host interface.
+
+Interrupt mode
+~~~~~~~~~~~~~~
+
+The FM1 family of NICs needs one separate interrupt for the mailbox. So only
+drivers which support multiple interrupt vectors, e.g. vfio-pci, can work
+in fm10k interrupt mode.
diff --git a/doc/guides/rel_notes/release_2_3.rst 
b/doc/guides/rel_notes/release_2_3.rst
index 99de186..2cb5ebd 100644
--- a/doc/guides/rel_notes/release_2_3.rst
+++ b/doc/guides/rel_notes/release_2_3.rst
@@ -4,6 +4,8 @@ DPDK Release 2.3
 New Features
 ------------

+* **Added fm10k Rx interrupt support.**
+

 Resolved Issues
 ---------------
diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index e2f677a..770d6ba 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -129,6 +129,9 @@
 #define RTE_FM10K_TX_MAX_FREE_BUF_SZ 64
 #define RTE_FM10K_DESCS_PER_LOOP 4

+#define FM10K_MISC_VEC_ID   RTE_INTR_VEC_ZERO_OFFSET
+#define FM10K_RX_VEC_START  RTE_INTR_VEC_RXTX_OFFSET
+
 #define FM10K_SIMPLE_TX_FLAG ((uint32_t)ETH_TXQ_FLAGS_NOMULTSEGS | \
ETH_TXQ_FLAGS_NOOFFLOADS)

diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index d39c33b..583335a 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -54,6 +54,8 @@
 /* Number of chars per uint32 type */
 #define CHARS_PER_UINT32 (sizeof(uint32_t))
 #define BIT_MASK_PER_UINT32 ((1 << CHARS_PER_UINT32) - 1)
+/* default 1:1 map from queue ID to interrupt vector ID */
+#define Q2V(dev, queue_id) (dev->pci_dev->intr_handle.intr_vec[queue_id])

 static void fm10k_close_mbx_service(struct fm10k_hw *hw);
 static void fm10k_dev_promiscuous_enable(struct rte_eth_dev *dev);
@@ -109,6 +111,8 @@ struct fm10k_xstats_name_off fm10k_hw_stats_tx_q_strings[] 
= {

 #define FM10K_NB_XSTATS (FM10K_NB_HW_XSTATS + FM10K_MAX_QUEUES_PF * \
(FM10K_NB_RX_Q_XSTATS + FM10K_NB_TX_Q_XSTATS))
+static int
+fm10k_dev_rxq_interrupt_setup(struct rte_eth_dev *dev);

 static void
 fm10k_mbx_initlock(struct fm10k_hw *hw)
@@ -687,6 +691,7 @@ static int
 fm10k_dev_rx_init(struct rte_eth_dev *dev)
 {
struct fm10k_hw *hw = FM10K_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   struct rte_intr_handle *intr_handle = &dev->pci_dev->intr_handle;
int i, ret;
struct fm10k_rx_queue *rxq;
uint64_t base_addr;
@@ -694,10 +699,25 @@ fm10k_dev_rx_init(struct rte_eth_dev *dev)
uint32_t rxdctl = FM10K_RXDCTL_WRITE_BACK_MIN_DELAY;
uint16_t buf_size;

-   /* Disable RXINT to avoid possible interrupt */
-   for (i = 0; i < hw->mac.max_queues; i++)
+   /* enable RXINT for interrupt mode */
+   i = 0;
+   if (rte_intr_dp_is_en(intr_handle)) {
+   for (; i < dev->data->nb_rx_queues; i++) {
+   FM10K_WRITE_REG(hw, FM10K_RXINT(i), Q2V(dev, i));
+   if (hw->mac.type == fm10k_mac_pf)
+   FM10K_WRITE_REG(hw, FM10K_ITR(Q2V(dev, i)),
+   FM10K_ITR_AUTOMASK |
+   FM10K_ITR_MASK_CLEAR);
+   else
+   FM10K_WRITE_REG(hw, FM10K_VFITR(Q2V(dev, i)),
+   FM10K_ITR_AUTOMASK |
+

[dpdk-dev] [PATCH v5 1/6] fm10k: implement rx_descriptor_done function

2015-12-23 Thread Shaopeng He
rx_descriptor_done is used by the interrupt mode example application
(l3fwd-power) to check the rxd DD bit to decide the RX trend;
l3fwd-power will then adjust the CPU frequency according to
the result.

v5 change:
  - fix a wrong error message

Signed-off-by: Shaopeng He 
---
 drivers/net/fm10k/fm10k.h|  3 +++
 drivers/net/fm10k/fm10k_ethdev.c |  1 +
 drivers/net/fm10k/fm10k_rxtx.c   | 25 +
 3 files changed, 29 insertions(+)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index cd38af2..e2f677a 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -345,6 +345,9 @@ uint16_t fm10k_recv_pkts(void *rx_queue, struct rte_mbuf 
**rx_pkts,
 uint16_t fm10k_recv_scattered_pkts(void *rx_queue,
struct rte_mbuf **rx_pkts, uint16_t nb_pkts);

+int
+fm10k_dev_rx_descriptor_done(void *rx_queue, uint16_t offset);
+
 uint16_t fm10k_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
uint16_t nb_pkts);

diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index e4aed94..d39c33b 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -2435,6 +2435,7 @@ static const struct eth_dev_ops fm10k_eth_dev_ops = {
.rx_queue_release   = fm10k_rx_queue_release,
.tx_queue_setup = fm10k_tx_queue_setup,
.tx_queue_release   = fm10k_tx_queue_release,
+   .rx_descriptor_done = fm10k_dev_rx_descriptor_done,
.reta_update= fm10k_reta_update,
.reta_query = fm10k_reta_query,
.rss_hash_update= fm10k_rss_hash_update,
diff --git a/drivers/net/fm10k/fm10k_rxtx.c b/drivers/net/fm10k/fm10k_rxtx.c
index e958865..0002f09 100644
--- a/drivers/net/fm10k/fm10k_rxtx.c
+++ b/drivers/net/fm10k/fm10k_rxtx.c
@@ -369,6 +369,31 @@ fm10k_recv_scattered_pkts(void *rx_queue, struct rte_mbuf 
**rx_pkts,
return nb_rcv;
 }

+int
+fm10k_dev_rx_descriptor_done(void *rx_queue, uint16_t offset)
+{
+   volatile union fm10k_rx_desc *rxdp;
+   struct fm10k_rx_queue *rxq = rx_queue;
+   uint16_t desc;
+   int ret;
+
+   if (unlikely(offset >= rxq->nb_desc)) {
+   PMD_DRV_LOG(ERR, "Invalid RX descriptor offset %u", offset);
+   return 0;
+   }
+
+   desc = rxq->next_dd + offset;
+   if (desc >= rxq->nb_desc)
+   desc -= rxq->nb_desc;
+
+   rxdp = &rxq->hw_ring[desc];
+
+   ret = !!(rxdp->w.status &
+   rte_cpu_to_le_16(FM10K_RXD_STATUS_DD));
+
+   return ret;
+}
+
 static inline void tx_free_descriptors(struct fm10k_tx_queue *q)
 {
uint16_t next_rs, count = 0;
-- 
1.9.3



[dpdk-dev] [PATCH v5 0/6] interrupt mode for fm10k

2015-12-23 Thread Shaopeng He
This patch series adds interrupt mode support for fm10k,
contains four major parts:

1. implement rx_descriptor_done function in fm10k
2. add rx interrupt support in fm10k PF and VF
3. make sure default VID available in dev_init in fm10k
4. fix a memory leak for non-ip packet in l3fwd-power,
   which happens mostly when testing fm10k interrupt mode.

v5 changes:
  - remove one unnecessary NULL check for rte_free
  - fix a wrong error message
  - add more clean up when memory allocation fails
  - split line over 80 characters to 2 lines
  - update interrupt mode limitation in fm10k.rst

v4 changes:
  - rebase to latest code
  - update release 2.3 note in corresponding patches

v3 changes:
  - rebase to latest code
  - macro renaming according to the EAL change

v2 changes:
  - reword some comments and commit messages
  - split one big patch into three smaller ones

Shaopeng He (6):
  fm10k: implement rx_descriptor_done function
  fm10k: setup rx queue interrupts for PF and VF
  fm10k: remove rx queue interrupts when dev stops
  fm10k: add rx queue interrupt en/dis functions
  fm10k: make sure default VID available in dev_init
  l3fwd-power: fix a memory leak for non-ip packet

 doc/guides/nics/fm10k.rst|   7 ++
 doc/guides/rel_notes/release_2_3.rst |   8 ++
 drivers/net/fm10k/fm10k.h|   6 ++
 drivers/net/fm10k/fm10k_ethdev.c | 174 ---
 drivers/net/fm10k/fm10k_rxtx.c   |  25 +
 examples/l3fwd-power/main.c  |   3 +-
 6 files changed, 211 insertions(+), 12 deletions(-)

-- 
1.9.3



[dpdk-dev] [PATCH 3/3] examples/l3fwd: Handle SIGINT and SIGTERM in l3fwd

2015-12-23 Thread Zhihong Wang
Handle SIGINT and SIGTERM in l3fwd.

Signed-off-by: Zhihong Wang 
---
 examples/l3fwd/main.c | 25 +
 1 file changed, 25 insertions(+)

diff --git a/examples/l3fwd/main.c b/examples/l3fwd/main.c
index 5b0c2dd..aae16d2 100644
--- a/examples/l3fwd/main.c
+++ b/examples/l3fwd/main.c
@@ -41,6 +41,7 @@
 #include 
 #include 
 #include 
+#include 

 #include 
 #include 
@@ -2559,6 +2560,27 @@ check_all_ports_link_status(uint8_t port_num, uint32_t 
port_mask)
}
 }

+/* When we receive a INT signal, close all ports */
+static void
+sigint_handler(__rte_unused int signum)
+{
+   unsigned portid, nb_ports;
+
+   printf("Preparing to exit...\n");
+   nb_ports = rte_eth_dev_count();
+   for (portid = 0; portid < nb_ports; portid++) {
+   if ((enabled_port_mask & (1 << portid)) == 0) {
+   continue;
+   }
+   printf("Stopping port %d...", portid);
+   rte_eth_dev_stop(portid);
+   rte_eth_dev_close(portid);
+   printf(" Done\n");
+   }
+   printf("Bye...\n");
+   exit(0);
+}
+
 int
 main(int argc, char **argv)
 {
@@ -2572,6 +2594,9 @@ main(int argc, char **argv)
uint32_t n_tx_queue, nb_lcores;
uint8_t portid, nb_rx_queue, queue, socketid;

+   signal(SIGINT, sigint_handler);
+   signal(SIGTERM, sigint_handler);
+
/* init EAL */
ret = rte_eal_init(argc, argv);
if (ret < 0)
-- 
2.5.0



[dpdk-dev] [PATCH 2/3] examples/l2fwd: Handle SIGINT and SIGTERM in l2fwd

2015-12-23 Thread Zhihong Wang
Handle SIGINT and SIGTERM in l2fwd.

Signed-off-by: Zhihong Wang 
---
 examples/l2fwd/main.c | 25 +
 1 file changed, 25 insertions(+)

diff --git a/examples/l2fwd/main.c b/examples/l2fwd/main.c
index 720fd5a..0594037 100644
--- a/examples/l2fwd/main.c
+++ b/examples/l2fwd/main.c
@@ -44,6 +44,7 @@
 #include 
 #include 
 #include 
+#include 

 #include 
 #include 
@@ -534,6 +535,27 @@ check_all_ports_link_status(uint8_t port_num, uint32_t 
port_mask)
}
 }

+/* When we receive a INT signal, close all ports */
+static void
+sigint_handler(__rte_unused int signum)
+{
+   unsigned portid, nb_ports;
+
+   printf("Preparing to exit...\n");
+   nb_ports = rte_eth_dev_count();
+   for (portid = 0; portid < nb_ports; portid++) {
+   if ((l2fwd_enabled_port_mask & (1 << portid)) == 0) {
+   continue;
+   }
+   printf("Stopping port %d...", portid);
+   rte_eth_dev_stop(portid);
+   rte_eth_dev_close(portid);
+   printf(" Done\n");
+   }
+   printf("Bye...\n");
+   exit(0);
+}
+
 int
 main(int argc, char **argv)
 {
@@ -546,6 +568,9 @@ main(int argc, char **argv)
unsigned lcore_id, rx_lcore_id;
unsigned nb_ports_in_mask = 0;

+   signal(SIGINT, sigint_handler);
+   signal(SIGTERM, sigint_handler);
+
/* init EAL */
ret = rte_eal_init(argc, argv);
if (ret < 0)
-- 
2.5.0



[dpdk-dev] [PATCH 1/3] app/test-pmd: Handle SIGINT and SIGTERM in testpmd

2015-12-23 Thread Zhihong Wang
Handle SIGINT and SIGTERM in testpmd.

Signed-off-by: Zhihong Wang 
---
 app/test-pmd/testpmd.c | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 98ae46d..c259ba3 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -1573,6 +1573,7 @@ pmd_test_exit(void)
FOREACH_PORT(pt_id, ports) {
printf("Stopping port %d...", pt_id);
fflush(stdout);
+   rte_eth_dev_stop(pt_id);
rte_eth_dev_close(pt_id);
printf("done\n");
}
@@ -1984,12 +1985,34 @@ init_port(void)
ports[pid].enabled = 1;
 }

+/* When we receive a INT signal, close all ports */
+static void
+sigint_handler(__rte_unused int signum)
+{
+   unsigned portid;
+
+   printf("Preparing to exit...\n");
+   FOREACH_PORT(portid, ports) {
+   if (port_id_is_invalid(portid, ENABLED_WARN))
+   continue;
+   printf("Stopping port %d...", portid);
+   rte_eth_dev_stop(portid);
+   rte_eth_dev_close(portid);
+   printf(" Done\n");
+   }
+   printf("Bye...\n");
+   exit(0);
+}
+
 int
 main(int argc, char** argv)
 {
int  diag;
uint8_t port_id;

+   signal(SIGINT, sigint_handler);
+   signal(SIGTERM, sigint_handler);
+
diag = rte_eal_init(argc, argv);
if (diag < 0)
rte_panic("Cannot init EAL\n");
-- 
2.5.0



[dpdk-dev] [PATCH 0/3] Handle SIGINT and SIGTERM in DPDK examples

2015-12-23 Thread Zhihong Wang
This patch set handles SIGINT and SIGTERM in testpmd, l2fwd, and l3fwd, making sure
all ports are properly stopped and closed.
For virtual ports, the stop and close functions may also perform resource cleanup,
such as unlinking socket files.

Zhihong Wang (3):
  app/test-pmd: Handle SIGINT and SIGTERM in testpmd
  examples/l2fwd: Handle SIGINT and SIGTERM in l2fwd
  examples/l3fwd: Handle SIGINT and SIGTERM in l3fwd

 app/test-pmd/testpmd.c | 23 +++
 examples/l2fwd/main.c  | 25 +
 examples/l3fwd/main.c  | 25 +
 3 files changed, 73 insertions(+)

-- 
2.5.0
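The handler pattern shared by the three patches above — install one handler for
both SIGINT and SIGTERM, walk the enabled ports, stop and close each, then exit —
can be sketched without DPDK. Here rte_eth_dev_stop()/rte_eth_dev_close() are
modeled as per-port counters and exit(0) as a flag so the behavior is observable
in-process; all demo_* names are hypothetical:

```c
#include <assert.h>
#include <signal.h>

/* Stand-ins for DPDK state: a port mask and per-port cleanup counters. */
#define DEMO_NB_PORTS 4
static unsigned demo_port_mask = 0x5; /* ports 0 and 2 enabled */
static int demo_closed[DEMO_NB_PORTS];
static volatile sig_atomic_t demo_done;

static void demo_sigint_handler(int signum)
{
	unsigned portid;

	(void)signum;
	for (portid = 0; portid < DEMO_NB_PORTS; portid++) {
		if ((demo_port_mask & (1u << portid)) == 0)
			continue;
		demo_closed[portid]++; /* would be rte_eth_dev_stop() + close() */
	}
	demo_done = 1; /* the real handlers call exit(0) here */
}

/* Install the handler for both signals, then deliver one synchronously. */
static int demo_run(void)
{
	signal(SIGINT, demo_sigint_handler);
	signal(SIGTERM, demo_sigint_handler);
	raise(SIGTERM);
	return demo_done;
}
```

Note the real handlers do more work (printf, driver teardown) than is strictly
async-signal-safe; that is tolerable here because the signals arrive only at
shutdown, but it is a known trade-off of this pattern.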



[dpdk-dev] [PATCH] i40e: fix the issue of port initialization failure

2015-12-23 Thread Helin Zhang
Workaround for the issue that adminq commands cannot be processed during
initialization when a 2x40G or 4x10G port is receiving packets at highest
throughput. Registers 0x002698a8 and 0x002698ac should be cleared first,
and restored to their default values at the end. No more details, as they
are not exposed registers.

Signed-off-by: Helin Zhang 
---
 drivers/net/i40e/i40e_ethdev.c | 39 +++
 1 file changed, 39 insertions(+)

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index bf6220d..149a31e 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -712,6 +712,41 @@ i40e_add_tx_flow_control_drop_filter(struct i40e_pf *pf)
  " frames from VSIs.");
 }

+/* Workaround for the issue of cannot processing adminq commands during
+ * initialization, when 2x40G or 4x10G is receiving packets in highest
+ * throughput. Register 0x002698a8 and 0x002698ac should be cleared at
+ * first, and restored with the default values at the end. No more details,
+ * as they are not exposed registers.
+ */
+static void
+i40e_clear_fdena(struct i40e_hw *hw)
+{
+   uint32_t fdena0, fdena1;
+
+   fdena0 = I40E_READ_REG(hw, 0x002698a8);
+   fdena1 = I40E_READ_REG(hw, 0x002698ac);
+   PMD_INIT_LOG(DEBUG, "[0x002698a8]: 0x%08x, [0x002698ac]: 0x%08x",
+fdena0, fdena1);
+
+   I40E_WRITE_REG(hw, 0x002698a8, 0x0);
+   I40E_WRITE_REG(hw, 0x002698ac, 0x0);
+   I40E_WRITE_FLUSH(hw);
+}
+
+/* Workaround for the issue of cannot processing adminq commands during
+ * initialization, when 2x40G or 4x10G is receiving packets in highest
+ * throughput. Register 0x002698a8 and 0x002698ac should be cleared at
+ * first, and restored with the default values at the end. No more details,
+ * as they are not exposed registers.
+ */
+static void
+i40e_restore_fdena(struct i40e_hw *hw)
+{
+   I40E_WRITE_REG(hw, 0x002698a8, 0xfc00);
+   I40E_WRITE_REG(hw, 0x002698ac, 0x80007fdf);
+   I40E_WRITE_FLUSH(hw);
+}
+
 static int
 eth_i40e_dev_init(struct rte_eth_dev *dev)
 {
@@ -774,6 +809,8 @@ eth_i40e_dev_init(struct rte_eth_dev *dev)
return ret;
}

+   i40e_clear_fdena(hw);
+
/* Initialize the shared code (base driver) */
ret = i40e_init_shared_code(hw);
if (ret) {
@@ -934,6 +971,8 @@ eth_i40e_dev_init(struct rte_eth_dev *dev)
pf->flags &= ~I40E_FLAG_DCB;
}

+   i40e_restore_fdena(hw);
+
return 0;

 err_mac_alloc:
-- 
1.9.3
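The ordering the workaround relies on — clear the two registers, run the
adminq-sensitive init, then write back the defaults — can be modeled with a
plain register array. The demo_* names are hypothetical, and the array stands
in for I40E_READ_REG/I40E_WRITE_REG:

```c
#include <assert.h>
#include <stdint.h>

/* Model of the workaround's bracketing: registers must be zero while the
 * init step runs, and hold their defaults afterwards. */
enum { REG_A, REG_B, NREGS };
#define DEMO_DEFAULT_A 0xfc00u
#define DEMO_DEFAULT_B 0x80007fdfu

static void demo_clear_fdena(uint32_t regs[NREGS])
{
	regs[REG_A] = 0;
	regs[REG_B] = 0;
}

static void demo_restore_fdena(uint32_t regs[NREGS])
{
	/* Note the patch restores fixed defaults, not the values read at
	 * clear time. */
	regs[REG_A] = DEMO_DEFAULT_A;
	regs[REG_B] = DEMO_DEFAULT_B;
}

/* Init sequence: clear, run the adminq-sensitive init, restore. Returns
 * nonzero if the registers were zero during the init step. */
static int demo_dev_init(uint32_t regs[NREGS])
{
	int cleared;

	demo_clear_fdena(regs);
	cleared = (regs[REG_A] == 0 && regs[REG_B] == 0); /* init happens here */
	demo_restore_fdena(regs);
	return cleared;
}
```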



[dpdk-dev] [PATCH] virtio: fix crashes in virtio stats functions

2015-12-23 Thread Ananyev, Konstantin


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Bernard Iremonger
> Sent: Wednesday, December 23, 2015 9:45 AM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH] virtio: fix crashes in virtio stats functions
> 
> This initialisation of nb_rx_queues and nb_tx_queues has been removed
> from eth_virtio_dev_init.
> 
> The nb_rx_queues and nb_tx_queues were being initialised in 
> eth_virtio_dev_init
> before the tx_queues and rx_queues arrays were allocated.
> 
> The arrays are allocated when the ethdev port is configured and the
> nb_tx_queues and nb_rx_queues are initialised.
> 
> If any of the following functions were called before the ethdev
> port was configured there was a segmentation fault because
> rx_queues and tx_queues were NULL:
> 
> rte_eth_stats_get
> rte_eth_stats_reset
> rte_eth_xstats_get
> rte_eth_xstats_reset
> 
> Fixes: 823ad647950a ("virtio: support multiple queues")
> Signed-off-by: Bernard Iremonger 
> ---
>  drivers/net/virtio/virtio_ethdev.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/drivers/net/virtio/virtio_ethdev.c 
> b/drivers/net/virtio/virtio_ethdev.c
> index d928339..5ef0752 100644
> --- a/drivers/net/virtio/virtio_ethdev.c
> +++ b/drivers/net/virtio/virtio_ethdev.c
> @@ -1378,9 +1378,6 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
>   hw->max_tx_queues = 1;
>   }
> 
> - eth_dev->data->nb_rx_queues = hw->max_rx_queues;
> - eth_dev->data->nb_tx_queues = hw->max_tx_queues;
> -
>   PMD_INIT_LOG(DEBUG, "hw->max_rx_queues=%d   hw->max_tx_queues=%d",
>   hw->max_rx_queues, hw->max_tx_queues);
>   PMD_INIT_LOG(DEBUG, "port %d vendorID=0x%x deviceID=0x%x",
> --

Acked-by: Konstantin Ananyev 

> 2.6.3
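The crash mode the patch removes is worth spelling out: setting nb_rx_queues
and nb_tx_queues before the rx_queues/tx_queues arrays exist lets any stats
walk index a NULL array. Keeping the counts at zero until configuration makes
the walk a safe no-op. A sketch with hypothetical demo_* names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal model of ethdev data relevant to the bug. */
struct demo_eth_data {
	void **rx_queues;      /* NULL until the port is configured */
	uint16_t nb_rx_queues; /* must stay 0 while rx_queues is NULL */
};

/* Stats walk in the style of rte_eth_stats_get: iterates nb_rx_queues
 * entries, so it faults if the count is nonzero while rx_queues is NULL. */
static uint64_t demo_stats_get(const struct demo_eth_data *d)
{
	uint64_t pkts = 0;
	uint16_t i;

	for (i = 0; i < d->nb_rx_queues; i++) {
		const uint64_t *q = d->rx_queues[i]; /* faults if rx_queues == NULL */
		pkts += *q;
	}
	return pkts;
}
```

With nb_rx_queues left at zero before configuration (the patch's effect), the
pre-configure stats call returns 0 instead of dereferencing NULL.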



[dpdk-dev] [PATCH] hash: fix CRC32c computation

2015-12-23 Thread Vincent JARDIN
On 23 Dec 2015 at 10:12, "Qiu, Michael"  wrote:
>
> Is it suitable to put so many code in commit log?

It is more explicit than a text comment. I do not think it should be
maintained as code.

>
> Thanks,
> Michael
> On 12/22/2015 5:36 PM, Didier Pallard wrote:
> > As demonstrated by the following code, CRC32c computation is not valid
> > when buffer length is not a multiple of 4 bytes:
> > (Output obtained by code below)
> >
> > CRC of 1 NULL bytes expected: 0x527d5351
> > soft: 527d5351
> > rte accelerated: 48674bc7
> > rte soft: 48674bc7
> > CRC of 2 NULL bytes expected: 0xf16177d2
> > soft: f16177d2
> > rte accelerated: 48674bc7
> > rte soft: 48674bc7
> > CRC of 2x1 NULL bytes expected: 0xf16177d2
> > soft: f16177d2
> > rte accelerated: 8c28b28a
> > rte soft: 8c28b28a
> > CRC of 3 NULL bytes expected: 0x6064a37a
> > soft: 6064a37a
> > rte accelerated: 48674bc7
> > rte soft: 48674bc7
> > CRC of 4 NULL bytes expected: 0x48674bc7
> > soft: 48674bc7
> > rte accelerated: 48674bc7
> > rte soft: 48674bc7
> >
> > Values returned by rte_hash_crc functions does not match the one
> > computed by a trivial crc32c implementation.
> >
> > ARM code is a guess, it is not tested, neither compiled.
> >
> > code showing the problem:
> >
> > uint8_t null_test[32] = {0};
> >
> > static uint32_t crc32c_trivial(uint8_t *buffer, uint32_t length,
uint32_t crc)
> > {
> > uint32_t i, j;
> > for (i = 0; i < length; ++i)
> > {
> > crc = crc ^ buffer[i];
> > for (j = 0; j < 8; j++)
> > > crc = (crc >> 1) ^ 0x80000000 ^ ((~crc & 1) * 0x82f63b78);
> > }
> > return crc;
> > }
> >
> > void hash_test(void);
> > void hash_test(void)
> > {
> >   printf("CRC of 1 nul byte expected: 0x527d5351\n");
> >   printf("soft: %08x\n", crc32c_trivial(null_test, 1, 0));
> >   rte_hash_crc_init_alg();
> >   printf("rte accelerated: %08x\n", ~rte_hash_crc(null_test, 1,
0xffffffff));
> >   rte_hash_crc_set_alg(CRC32_SW);
> >   printf("rte soft: %08x\n", ~rte_hash_crc(null_test, 1,
0xffffffff));
> >
> >   printf("CRC of 2 nul bytes expected: 0xf16177d2\n");
> >   printf("soft: %08x\n", crc32c_trivial(null_test, 2, 0));
> >   rte_hash_crc_init_alg();
> >   printf("rte accelerated: %08x\n", ~rte_hash_crc(null_test, 2,
0xffffffff));
> >   rte_hash_crc_set_alg(CRC32_SW);
> >   printf("rte soft: %08x\n", ~rte_hash_crc(null_test, 2,
0xffffffff));
> >
> >   printf("CRC of 2x1 nul bytes expected: 0xf16177d2\n");
> >   printf("soft: %08x\n", crc32c_trivial(null_test, 1,
crc32c_trivial(null_test, 1, 0)));
> >   rte_hash_crc_init_alg();
> >   printf("rte accelerated: %08x\n", ~rte_hash_crc(null_test, 1,
rte_hash_crc(null_test, 1, 0x)));
> >   rte_hash_crc_set_alg(CRC32_SW);
> >   printf("rte soft: %08x\n", ~rte_hash_crc(null_test, 1,
rte_hash_crc(null_test, 1, 0x)));
> >
> >   printf("CRC of 3 nul bytes expected: 0x6064a37a\n");
> >   printf("soft: %08x\n", crc32c_trivial(null_test, 3, 0));
> >   rte_hash_crc_init_alg();
> >   printf("rte accelerated: %08x\n", ~rte_hash_crc(null_test, 3,
0xffffffff));
> >   rte_hash_crc_set_alg(CRC32_SW);
> >   printf("rte soft: %08x\n", ~rte_hash_crc(null_test, 3,
0xffffffff));
> >
> >   printf("CRC of 4 nul bytes expected: 0x48674bc7\n");
> >   printf("soft: %08x\n", crc32c_trivial(null_test, 4, 0));
> >   rte_hash_crc_init_alg();
> >   printf("rte accelerated: %08x\n", ~rte_hash_crc(null_test, 4,
0xffffffff));
> >   rte_hash_crc_set_alg(CRC32_SW);
> >   printf("rte soft: %08x\n", ~rte_hash_crc(null_test, 4,
0xffffffff));
> > }
> >
> > Signed-off-by: Didier Pallard 
> > Acked-by: David Marchand 
> > ---
> >  lib/librte_hash/rte_crc_arm64.h |  64 
> >  lib/librte_hash/rte_hash_crc.h  | 125
+++-
> >  2 files changed, 162 insertions(+), 27 deletions(-)
> >
> > diff --git a/lib/librte_hash/rte_crc_arm64.h
b/lib/librte_hash/rte_crc_arm64.h
> > index 02e26bc..44ef460 100644
> > --- a/lib/librte_hash/rte_crc_arm64.h
> > +++ b/lib/librte_hash/rte_crc_arm64.h
> > @@ -50,6 +50,28 @@ extern "C" {
> >  #include 
> >
> >  static inline uint32_t
> > +crc32c_arm64_u8(uint8_t data, uint32_t init_val)
> > +{
> > + asm(".arch armv8-a+crc");
> > + __asm__ volatile(
> > + "crc32cb %w[crc], %w[crc], %b[value]"
> > + : [crc] "+r" (init_val)
> > + : [value] "r" (data));
> > + return init_val;
> > +}
> > +
> > +static inline uint32_t
> > +crc32c_arm64_u16(uint16_t data, uint32_t init_val)
> > +{
> > + asm(".arch armv8-a+crc");
> > + __asm__ volatile(
> > + "crc32ch %w[crc], %w[crc], %h[value]"
> > + : [crc] "+r" (init_val)
> > + : [value] "r" (data));
> > + return init_val;
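The crc32c_trivial reference above — with the constants the archive wrapped
restored (0x80000000 in the update step, 0xffffffff as the rte_hash_crc seed) —
can be checked directly against the expected values quoted in the commit log.
It tracks the complement of the usual CRC-32C state, which folds the 0xffffffff
init and the final inversion into the update itself:

```c
#include <assert.h>
#include <stdint.h>

/* The commit log's reference CRC-32C (Castagnoli, reflected polynomial
 * 0x82f63b78), operating on the complemented state: an init of 0 and no
 * final XOR yield the standard CRC-32C value directly. */
static uint32_t crc32c_trivial(const uint8_t *buffer, uint32_t length, uint32_t crc)
{
	uint32_t i, j;

	for (i = 0; i < length; ++i) {
		crc = crc ^ buffer[i];
		for (j = 0; j < 8; j++)
			crc = (crc >> 1) ^ 0x80000000 ^ ((~crc & 1) * 0x82f63b78);
	}
	return crc;
}
```

The expected values below are the ones listed in the commit log for 1–4 NULL
bytes, including the chained 2x1-byte case that exposed the multi-call bug.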

[dpdk-dev] [PATCH 3/3] librte_ether: fix rte_eth_dev_configure

2015-12-23 Thread Reshma Pattan
Users should be able to configure an ethdev with zero Rx or Tx queues, but both
should not be zero.
After this change, rte_eth_dev_tx_queue_config and rte_eth_dev_rx_queue_config
allocate memory for Rx/Tx queues only when the number of Rx/Tx queues is nonzero.

Signed-off-by: Reshma Pattan 
---
 lib/librte_ether/rte_ethdev.c | 36 
 1 file changed, 24 insertions(+), 12 deletions(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 5849102..a7647b6 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -673,7 +673,7 @@ rte_eth_dev_rx_queue_config(struct rte_eth_dev *dev, 
uint16_t nb_queues)
void **rxq;
unsigned i;

-   if (dev->data->rx_queues == NULL) { /* first time configuration */
+   if (dev->data->rx_queues == NULL && nb_queues != 0) { /* first time 
configuration */
dev->data->rx_queues = rte_zmalloc("ethdev->rx_queues",
sizeof(dev->data->rx_queues[0]) * nb_queues,
RTE_CACHE_LINE_SIZE);
@@ -681,7 +681,7 @@ rte_eth_dev_rx_queue_config(struct rte_eth_dev *dev, 
uint16_t nb_queues)
dev->data->nb_rx_queues = 0;
return -(ENOMEM);
}
-   } else { /* re-configure */
+   } else if (dev->data->rx_queues != NULL && nb_queues != 0) { /* 
re-configure */
RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_release, 
-ENOTSUP);

rxq = dev->data->rx_queues;
@@ -701,6 +701,13 @@ rte_eth_dev_rx_queue_config(struct rte_eth_dev *dev, 
uint16_t nb_queues)

dev->data->rx_queues = rxq;

+   } else if (dev->data->rx_queues != NULL && nb_queues == 0) {
+   RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_release, 
-ENOTSUP);
+
+   rxq = dev->data->rx_queues;
+
+   for (i = nb_queues; i < old_nb_queues; i++)
+   (*dev->dev_ops->rx_queue_release)(rxq[i]);
}
dev->data->nb_rx_queues = nb_queues;
return 0;
@@ -817,7 +824,7 @@ rte_eth_dev_tx_queue_config(struct rte_eth_dev *dev, 
uint16_t nb_queues)
void **txq;
unsigned i;

-   if (dev->data->tx_queues == NULL) { /* first time configuration */
+   if (dev->data->tx_queues == NULL && nb_queues != 0) { /* first time 
configuration */
dev->data->tx_queues = rte_zmalloc("ethdev->tx_queues",
   
sizeof(dev->data->tx_queues[0]) * nb_queues,
   RTE_CACHE_LINE_SIZE);
@@ -825,7 +832,7 @@ rte_eth_dev_tx_queue_config(struct rte_eth_dev *dev, 
uint16_t nb_queues)
dev->data->nb_tx_queues = 0;
return -(ENOMEM);
}
-   } else { /* re-configure */
+   } else if (dev->data->tx_queues != NULL && nb_queues != 0) { /* 
re-configure */
RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_release, 
-ENOTSUP);

txq = dev->data->tx_queues;
@@ -845,6 +852,13 @@ rte_eth_dev_tx_queue_config(struct rte_eth_dev *dev, 
uint16_t nb_queues)

dev->data->tx_queues = txq;

+   } else if (dev->data->tx_queues != NULL && nb_queues == 0) {
+   RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_release, 
-ENOTSUP);
+
+   txq = dev->data->tx_queues;
+
+   for (i = nb_queues; i < old_nb_queues; i++)
+   (*dev->dev_ops->tx_queue_release)(txq[i]);
}
dev->data->nb_tx_queues = nb_queues;
return 0;
@@ -891,25 +905,23 @@ rte_eth_dev_configure(uint8_t port_id, uint16_t nb_rx_q, 
uint16_t nb_tx_q,
 * configured device.
 */
(*dev->dev_ops->dev_infos_get)(dev, &dev_info);
+
+   if (nb_rx_q == 0 && nb_tx_q == 0) {
+   RTE_PMD_DEBUG_TRACE("ethdev port_id=%d both rx and tx queue 
cannot be 0\n", port_id);
+   return -EINVAL;
+   }
+
if (nb_rx_q > dev_info.max_rx_queues) {
RTE_PMD_DEBUG_TRACE("ethdev port_id=%d nb_rx_queues=%d > %d\n",
port_id, nb_rx_q, dev_info.max_rx_queues);
return -EINVAL;
}
-   if (nb_rx_q == 0) {
-   RTE_PMD_DEBUG_TRACE("ethdev port_id=%d nb_rx_q == 0\n", 
port_id);
-   return -EINVAL;
-   }

if (nb_tx_q > dev_info.max_tx_queues) {
RTE_PMD_DEBUG_TRACE("ethdev port_id=%d nb_tx_queues=%d > %d\n",
port_id, nb_tx_q, dev_info.max_tx_queues);
return -EINVAL;
}
-   if (nb_tx_q == 0) {
-   RTE_PMD_DEBUG_TRACE("ethdev port_id=%d nb_tx_q == 0\n", 
port_id);
-   return -EINVAL;
-   }

/* Copy the dev_conf parameter into the dev structure */
memcpy(&dev->data->dev_conf, dev_conf, sizeof(dev->data->dev_conf));
-- 
2.5.0
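The patch turns the queue-array handling into three explicit cases: first-time
allocation, re-configuration (release surplus queues, resize the array), and
full release when the new count is zero. That logic can be modeled with plain
malloc/realloc; the demo_* names are hypothetical and a queue is just a
malloc'd blob:

```c
#include <assert.h>
#include <stdlib.h>

struct demo_dev {
	void **rx_queues;
	unsigned nb_rx_queues;
};

static int demo_rx_queue_config(struct demo_dev *dev, unsigned nb_queues)
{
	unsigned i;

	if (dev->rx_queues == NULL && nb_queues != 0) {
		/* first time configuration */
		dev->rx_queues = calloc(nb_queues, sizeof(void *));
		if (dev->rx_queues == NULL)
			return -1;
	} else if (dev->rx_queues != NULL && nb_queues != 0) {
		/* re-configure: release queues beyond the new count, resize */
		for (i = nb_queues; i < dev->nb_rx_queues; i++) {
			free(dev->rx_queues[i]);
			dev->rx_queues[i] = NULL;
		}
		dev->rx_queues = realloc(dev->rx_queues,
					 nb_queues * sizeof(void *));
		if (dev->rx_queues == NULL)
			return -1;
		for (i = dev->nb_rx_queues; i < nb_queues; i++)
			dev->rx_queues[i] = NULL; /* new slots, if growing */
	} else if (dev->rx_queues != NULL && nb_queues == 0) {
		/* shrink to zero: release every queue; like the patch, the
		 * array itself is kept for later re-configuration */
		for (i = 0; i < dev->nb_rx_queues; i++) {
			free(dev->rx_queues[i]);
			dev->rx_queues[i] = NULL;
		}
	}
	dev->nb_rx_queues = nb_queues;
	return 0;
}
```

The fourth combination (NULL array, zero queues) deliberately falls through to
just updating the count, which is why both-zero must be rejected earlier in
rte_eth_dev_configure.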



[dpdk-dev] [PATCH 2/3] librte_cryptodev: remove RTE_PROC_PRIMARY_OR_RET

2015-12-23 Thread Reshma Pattan
The macro RTE_PROC_PRIMARY_OR_ERR_RET blocks the secondary process from using these APIs.
API access should be available to both primary and secondary processes.

Signed-off-by: Reshma Pattan 
---
 lib/librte_cryptodev/rte_cryptodev.c | 42 
 1 file changed, 42 deletions(-)

diff --git a/lib/librte_cryptodev/rte_cryptodev.c 
b/lib/librte_cryptodev/rte_cryptodev.c
index f09f67e..207e92c 100644
--- a/lib/librte_cryptodev/rte_cryptodev.c
+++ b/lib/librte_cryptodev/rte_cryptodev.c
@@ -532,12 +532,6 @@ rte_cryptodev_queue_pair_start(uint8_t dev_id, uint16_t 
queue_pair_id)
 {
struct rte_cryptodev *dev;

-   /*
-* This function is only safe when called from the primary process
-* in a multi-process setup
-*/
-   RTE_PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY);
-
if (!rte_cryptodev_pmd_is_valid_dev(dev_id)) {
CDEV_LOG_ERR("Invalid dev_id=%" PRIu8, dev_id);
return -EINVAL;
@@ -560,12 +554,6 @@ rte_cryptodev_queue_pair_stop(uint8_t dev_id, uint16_t 
queue_pair_id)
 {
struct rte_cryptodev *dev;

-   /*
-* This function is only safe when called from the primary process
-* in a multi-process setup
-*/
-   RTE_PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY);
-
if (!rte_cryptodev_pmd_is_valid_dev(dev_id)) {
CDEV_LOG_ERR("Invalid dev_id=%" PRIu8, dev_id);
return -EINVAL;
@@ -593,12 +581,6 @@ rte_cryptodev_configure(uint8_t dev_id, struct 
rte_cryptodev_config *config)
struct rte_cryptodev *dev;
int diag;

-   /*
-* This function is only safe when called from the primary process
-* in a multi-process setup
-*/
-   RTE_PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY);
-
if (!rte_cryptodev_pmd_is_valid_dev(dev_id)) {
CDEV_LOG_ERR("Invalid dev_id=%" PRIu8, dev_id);
return (-EINVAL);
@@ -635,12 +617,6 @@ rte_cryptodev_start(uint8_t dev_id)

CDEV_LOG_DEBUG("Start dev_id=%" PRIu8, dev_id);

-   /*
-* This function is only safe when called from the primary process
-* in a multi-process setup
-*/
-   RTE_PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY);
-
if (!rte_cryptodev_pmd_is_valid_dev(dev_id)) {
CDEV_LOG_ERR("Invalid dev_id=%" PRIu8, dev_id);
return (-EINVAL);
@@ -670,12 +646,6 @@ rte_cryptodev_stop(uint8_t dev_id)
 {
struct rte_cryptodev *dev;

-   /*
-* This function is only safe when called from the primary process
-* in a multi-process setup
-*/
-   RTE_PROC_PRIMARY_OR_RET();
-
if (!rte_cryptodev_pmd_is_valid_dev(dev_id)) {
CDEV_LOG_ERR("Invalid dev_id=%" PRIu8, dev_id);
return;
@@ -701,12 +671,6 @@ rte_cryptodev_close(uint8_t dev_id)
struct rte_cryptodev *dev;
int retval;

-   /*
-* This function is only safe when called from the primary process
-* in a multi-process setup
-*/
-   RTE_PROC_PRIMARY_OR_ERR_RET(-EINVAL);
-
if (!rte_cryptodev_pmd_is_valid_dev(dev_id)) {
CDEV_LOG_ERR("Invalid dev_id=%" PRIu8, dev_id);
return -1;
@@ -747,12 +711,6 @@ rte_cryptodev_queue_pair_setup(uint8_t dev_id, uint16_t 
queue_pair_id,
 {
struct rte_cryptodev *dev;

-   /*
-* This function is only safe when called from the primary process
-* in a multi-process setup
-*/
-   RTE_PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY);
-
if (!rte_cryptodev_pmd_is_valid_dev(dev_id)) {
CDEV_LOG_ERR("Invalid dev_id=%" PRIu8, dev_id);
return (-EINVAL);
-- 
2.5.0



[dpdk-dev] [PATCH 1/3] librte_ether: remove RTE_PROC_PRIMARY_OR_ERR_RET and RTE_PROC_PRIMARY_OR_RET

2015-12-23 Thread Reshma Pattan
The macros RTE_PROC_PRIMARY_OR_ERR_RET and RTE_PROC_PRIMARY_OR_RET
block the secondary process from using the APIs.
API access should be available to both primary and secondary processes.

Also fix minor checkpatch issues in rte_ethdev.h.

Reported-by: Sean Harte 
Signed-off-by: Reshma Pattan 
---
 lib/librte_ether/rte_ethdev.c | 50 +--
 lib/librte_ether/rte_ethdev.h | 20 -
 2 files changed, 11 insertions(+), 59 deletions(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index ed971b4..5849102 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -711,10 +711,6 @@ rte_eth_dev_rx_queue_start(uint8_t port_id, uint16_t 
rx_queue_id)
 {
struct rte_eth_dev *dev;

-   /* This function is only safe when called from the primary process
-* in a multi-process setup*/
-   RTE_PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY);
-
RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);

dev = &rte_eth_devices[port_id];
@@ -741,10 +737,6 @@ rte_eth_dev_rx_queue_stop(uint8_t port_id, uint16_t 
rx_queue_id)
 {
struct rte_eth_dev *dev;

-   /* This function is only safe when called from the primary process
-* in a multi-process setup*/
-   RTE_PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY);
-
RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);

dev = &rte_eth_devices[port_id];
@@ -771,10 +763,6 @@ rte_eth_dev_tx_queue_start(uint8_t port_id, uint16_t 
tx_queue_id)
 {
struct rte_eth_dev *dev;

-   /* This function is only safe when called from the primary process
-* in a multi-process setup*/
-   RTE_PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY);
-
RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);

dev = &rte_eth_devices[port_id];
@@ -801,10 +789,6 @@ rte_eth_dev_tx_queue_stop(uint8_t port_id, uint16_t 
tx_queue_id)
 {
struct rte_eth_dev *dev;

-   /* This function is only safe when called from the primary process
-* in a multi-process setup*/
-   RTE_PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY);
-
RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);

dev = &rte_eth_devices[port_id];
@@ -874,10 +858,6 @@ rte_eth_dev_configure(uint8_t port_id, uint16_t nb_rx_q, 
uint16_t nb_tx_q,
struct rte_eth_dev_info dev_info;
int diag;

-   /* This function is only safe when called from the primary process
-* in a multi-process setup*/
-   RTE_PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY);
-
RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);

if (nb_rx_q > RTE_MAX_QUEUES_PER_PORT) {
@@ -1059,10 +1039,6 @@ rte_eth_dev_start(uint8_t port_id)
struct rte_eth_dev *dev;
int diag;

-   /* This function is only safe when called from the primary process
-* in a multi-process setup*/
-   RTE_PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY);
-
RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);

dev = &rte_eth_devices[port_id];
@@ -1096,10 +1072,6 @@ rte_eth_dev_stop(uint8_t port_id)
 {
struct rte_eth_dev *dev;

-   /* This function is only safe when called from the primary process
-* in a multi-process setup*/
-   RTE_PROC_PRIMARY_OR_RET();
-
RTE_ETH_VALID_PORTID_OR_RET(port_id);
dev = &rte_eth_devices[port_id];

@@ -1121,10 +1093,6 @@ rte_eth_dev_set_link_up(uint8_t port_id)
 {
struct rte_eth_dev *dev;

-   /* This function is only safe when called from the primary process
-* in a multi-process setup*/
-   RTE_PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY);
-
RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);

dev = &rte_eth_devices[port_id];
@@ -1138,10 +1106,6 @@ rte_eth_dev_set_link_down(uint8_t port_id)
 {
struct rte_eth_dev *dev;

-   /* This function is only safe when called from the primary process
-* in a multi-process setup*/
-   RTE_PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY);
-
RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);

dev = &rte_eth_devices[port_id];
@@ -1155,10 +1119,6 @@ rte_eth_dev_close(uint8_t port_id)
 {
struct rte_eth_dev *dev;

-   /* This function is only safe when called from the primary process
-* in a multi-process setup*/
-   RTE_PROC_PRIMARY_OR_RET();
-
RTE_ETH_VALID_PORTID_OR_RET(port_id);
dev = &rte_eth_devices[port_id];

@@ -1183,10 +1143,6 @@ rte_eth_rx_queue_setup(uint8_t port_id, uint16_t 
rx_queue_id,
struct rte_eth_dev *dev;
struct rte_eth_dev_info dev_info;

-   /* This function is only safe when called from the primary process
-* in a multi-process setup*/
-   RTE_PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY);
-
RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);

dev = &rte_eth_devices[port_id];
@@ -1266,10 +1222,6 @@ rte_eth_tx_queue_setup(uint8_t port_id, uint16_t 
tx_queue_id,
struct rte_eth_d

[dpdk-dev] [PATCH v3 2/2] vhost: call rte_pktmbuf_alloc_bulk in vhost dequeue

2015-12-23 Thread Xie, Huawei
On 12/23/2015 7:25 PM, linhaifeng wrote:
>
>>  
>> +if (unlikely(alloc_err)) {
>> +uint16_t i = entry_success;
>> +
>> +m->nb_segs = seg_num;
>> +for (; i < free_entries; i++)
>> +rte_pktmbuf_free(pkts[entry_success]); -> 
>> rte_pktmbuf_free(pkts[i]);
>> +}
>> +
>>  rte_compiler_barrier();
>>  vq->used->idx += entry_success;
>>  /* Kick guest if required. */
Very sorry for silly typo. Thanks!
>>
>
>
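The one-character fix above matters: freeing pkts[entry_success] on every
iteration releases the same mbuf repeatedly and leaks all the others. A sketch
with rte_pktmbuf_free() modeled as a per-slot counter (demo_* names
hypothetical) makes both behaviors visible:

```c
#include <assert.h>

#define DEMO_N 4
static int freed[DEMO_N]; /* how many times each "mbuf" slot was freed */

static void demo_free(int idx)
{
	freed[idx]++;
}

/* Cleanup loop from the patch; 'buggy' selects the original index bug
 * (always freeing entry_success) versus the fixed per-iteration index. */
static void demo_cleanup(int entry_success, int free_entries, int buggy)
{
	int i;

	for (i = 0; i < DEMO_N; i++)
		freed[i] = 0; /* reset counters for this run */
	for (i = entry_success; i < free_entries; i++)
		demo_free(buggy ? entry_success : i);
}
```

With the fix, each unconsumed slot is freed exactly once; with the bug, slot
entry_success is freed free_entries - entry_success times and the rest leak.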



[dpdk-dev] VFIO no-iommu

2015-12-23 Thread Burakov, Anatoly
Hi Alex,

> I've re-posted the unified patch upstream and it should start showing up in
> the next linux-next build. ?I expect the dpdk code won't be merged until
> after this gets back into a proper kernel, but could we get the dpdk
> modifications posted as rfc for others looking to try it?

I have already posted a patch that should work with No-IOMMU.

http://dpdk.org/dev/patchwork/patch/9619/

Apologies for not CC-ing you. I too would be interested to know if other people 
are having any issues with the patch.

Thanks,
Anatoly


[dpdk-dev] [RFC PATCH 0/6] General tunneling APIs

2015-12-23 Thread Walukiewicz, Miroslaw
Hi Jijang,

I like the idea of a tunnel API very much.

I have a few questions. 

1. I see that you have only i40e support, due to the lack of HW tunneling support in
other NICs.
I don't see how you intend to handle tunneling requests for NICs without
HW offload.

I think that we should have one common function for sending tunneled packets,
but the initialization should check the NIC capabilities and call some
registered function that does the tunneling in SW in case HW support is lacking.

I know that making a tunnel in SW is a very time-consuming process, but it makes
the API more generic. Similarly, only 3 protocols are supported in HW by i40e,
and we can imagine 40 or more different tunnels working with this NIC.

With a SW implementation we could support the missing tunnels even for i40e.

2. I understand that we need the RX HW queue defined in struct rte_eth_tunnel_conf,
but why is tx_queue necessary?
  As far as I know, on i40e HW we can set tunneled packet descriptors in any HW queue
and receive only on one specific queue.

3. I see a similar problem with receiving tunneled packets on a single queue
only. I know that some NICs like fm10k can hash the packets and push the
same tunnel to many queues. Maybe we should support such an RSS-like feature in
the design also. I know that it is not supported by i40e, but it is good to have
a more flexible API design.

4. In your implementation you are assuming that there is one tunnel configured
per DPDK interface:

rte_eth_dev_tunnel_configure(uint8_t port_id,
+struct rte_eth_tunnel_conf *tunnel_conf)

The point of tunnels is the lack of interfaces in the system, because the number
of possible VLANs is too small (4095).
In the DPDK we have only one tunnel per physical port, which is of limited use
even with the big acceleration provided by i40e.

In normal use cases there is a need for 10,000s of tunnels per interface. Even
for VXLAN we have 24 bits for the tunnel ID.

I think that we need a special API for sending, like
rte_eth_dev_tunnel_send_burst, where we will provide a tunnel number
allocated by rte_eth_dev_tunnel_configure, to avoid setting the tunnel-specific
information separately in each descriptor.

Similarly, on RX we should provide in struct rte_eth_tunnel_conf the callback
functions that take some specific action on a received tunnel packet; that could
be pushing the packet to a user ring, setting the tunnel information in the RX
descriptor, or something else.

5. I see that you have implementations for VXLAN, TEREDO, and GENEVE tunnels in
the i40e driver, but I could not find the implementation of VXLAN encap/decap.
Are all files in the patch present?

6. What about QinQ HW tunneling, also supported by i40e HW? I know that the
implementation is present in a different place, but why not include QinQ as an
additional tunnel? It would be a very nice feature to have all tunnel APIs in a
single place.

Regards,

Mirek
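Point 4 can be made concrete: a per-port registry whose configure call returns a
tunnel handle would allow many tunnels — up to the 24-bit VXLAN VNI space — on
one interface, with the handle passed to a send-burst-style API instead of
per-descriptor tunnel data. A sketch under those assumptions (all demo_* names
are hypothetical, not a proposed DPDK API):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_VNI_MAX (1u << 24) /* VXLAN network identifier is 24 bits */

struct demo_tunnel {
	uint32_t vni;      /* VXLAN network identifier */
	uint16_t rx_queue; /* queue the flow director steers this flow to */
};

struct demo_port {
	struct demo_tunnel *tunnels;
	unsigned nb_tunnels, cap;
};

/* Returns a tunnel handle (index) or -1. The handle would then name the
 * tunnel in a rte_eth_dev_tunnel_send_burst()-style call. */
static int demo_tunnel_configure(struct demo_port *p, uint32_t vni, uint16_t rxq)
{
	if (vni >= DEMO_VNI_MAX)
		return -1;
	if (p->nb_tunnels == p->cap) {
		/* grow the registry geometrically */
		unsigned ncap = p->cap ? p->cap * 2 : 16;
		struct demo_tunnel *t = realloc(p->tunnels, ncap * sizeof(*t));

		if (t == NULL)
			return -1;
		p->tunnels = t;
		p->cap = ncap;
	}
	p->tunnels[p->nb_tunnels].vni = vni;
	p->tunnels[p->nb_tunnels].rx_queue = rxq;
	return (int)p->nb_tunnels++;
}
```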




> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Jijiang Liu
> Sent: Wednesday, December 23, 2015 9:50 AM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [RFC PATCH 0/6] General tunneling APIs
> 
> I want to define a set of General tunneling APIs, which are used to
> accelarate tunneling packet processing in DPDK,
> In this RFC patch set, I wll explain my idea using some codes.
> 
> 1. Using flow director offload to define a tunnel flow in a pair of queues.
> 
> flow rule: src IP + dst IP + src port + dst port + tunnel ID (for VXLAN)
> 
> For example:
>   struct rte_eth_tunnel_conf{
>   .tunnel_type = VXLAN,
>   .rx_queue = 1,
>   .tx_queue = 1,
>   .filter_type = 'src ip + dst ip + src port + dst port + tunnel id'
>   .flow_tnl {
>   .tunnel_type = VXLAN,
>   .tunnel_id = 100,
>   .remote_mac = 11.22.33.44.55.66,
>  .ip_type = ipv4,
>  .outer_ipv4.src_ip = 192.168.10.1
>  .outer_ipv4.dst_ip = 10.239.129.11
>  .src_port = 1000,
>  .dst_port =2000
> };
> 
> 2. Configure tunnel flow for a device and for a pair of queues.
> 
> rte_eth_dev_tunnel_configure(0, &rte_eth_tunnel_conf);
> 
> In this API, it will register the RX decapsulation and TX encapsulation
> callback functions if the HW doesn't support encap/decap, and
> space will be allocated for the tunnel configuration; a pointer to this
> newly allocated space is stored as dev->post_rx/tx_burst_cbs[].param.
> 
> rte_eth_add_rx_callback(port_id, tunnel_conf.rx_queue,
> rte_eth_tunnel_decap, (void *)tunnel_conf);
> rte_eth_add_tx_callback(port_id, tunnel_conf.tx_queue,
> rte_eth_tunnel_encap, (void *)tunnel_conf)
> 
> 3. Using rte_vxlan_decap_burst() to do decapsulation of tunneling packet.
> 
> 4. Using rte_vxlan_encap_burst() to do encapsulation of tunneling packet.
>    The 'src ip, dst ip, src port, dst port and tunnel ID' can be obtained
> from the tunnel configuration.
>    And SIMD is used to accelerate the operation.
> 
> How to

[dpdk-dev] [Question] How pmd virtio works without UIO?

2015-12-23 Thread Yuanhan Liu
On Wed, Dec 23, 2015 at 10:41:57AM +0800, Peter Xu wrote:
> On Wed, Dec 23, 2015 at 10:01:35AM +0800, Yuanhan Liu wrote:
> > On Tue, Dec 22, 2015 at 05:56:41PM +0800, Peter Xu wrote:
> > > On Tue, Dec 22, 2015 at 04:32:46PM +0800, Yuanhan Liu wrote:
> > > > Actually, you are right. I mentioned in the last email that this is
> > > > for configuration part. To answer your question in this email, you
> > > > will not be able to go that further (say initiating virtio pmd) if
> > > > you don't unbind the origin virtio-net driver, and bind it to igb_uio
> > > > (or something similar).
> > > > 
> > > > The start point is from rte_eal_pci_scan, where the sub-function
> > > > pci_scan_one just initiates a DPDK bound driver.
> > > 
> > > I am not sure whether I do understand your meaning correctly
> > > (regarding "you will not be able to go that further"): The problem
> > > is that, we _can_ run testpmd without unbinding the ports and bind
> > > to UIO or something. What we need to do is boot the guest, reserve
> > > huge pages, and run testpmd (keeping its kernel driver as
> > > "virtio-pci"). In pci_scan_one():
> > > 
> > >   if (!ret) {
> > >   if (!strcmp(driver, "vfio-pci"))
> > >   dev->kdrv = RTE_KDRV_VFIO;
> > >   else if (!strcmp(driver, "igb_uio"))
> > >   dev->kdrv = RTE_KDRV_IGB_UIO;
> > >   else if (!strcmp(driver, "uio_pci_generic"))
> > >   dev->kdrv = RTE_KDRV_UIO_GENERIC;
> > >   else
> > >   dev->kdrv = RTE_KDRV_UNKNOWN;
> > >   } else
> > >   dev->kdrv = RTE_KDRV_UNKNOWN;
> > > 
> > > I think it should be going to RTE_KDRV_UNKNOWN
> > > (driver=="virtio-pci") here.
> > 
> > Sorry, I simply overlooked that. I was thinking it would quit here for
> > the RTE_KDRV_UNKNOWN case.
> > 
> > > I tried to run IO and it could work,
> > > but I am not sure whether it is safe, and how.
> > 
> > I also did a quick test then, however, with the virtio 1.0 patchset
> > I sent before, which sets RTE_PCI_DRV_NEED_MAPPING, resulting in
> > pci_map_device() failure, so the virtio pmd is not initiated at all.
> 
> Then, will the patch work with ioport way to access virtio devices?

Yes.

> 
> > 
> > > 
> > > Also, I am not sure whether I need to (at least) unbind the
> > > virtio-pci driver, so that there should have no kernel driver
> > > running for the virtio device before DPDK using it.
> > 
> > Why not? That's what the DPDK document asked to do
> > (http://dpdk.org/doc/guides/linux_gsg/build_dpdk.html):
> > 
> > 3.6. Binding and Unbinding Network Ports to/from the Kernel Modules
> > 
> > As of release 1.4, DPDK applications no longer automatically unbind
> > all supported network ports from the kernel driver in use. Instead,
> > all ports that are to be used by an DPDK application must be bound
> > to the uio_pci_generic, igb_uio or vfio-pci module before the
> > application is run. Any network ports under Linux* control will be
> > ignored by the DPDK poll-mode drivers and cannot be used by the
> > application.
> 
> This seems obsolete, since it's not covering ioport?

I don't think so. The above is about how to run DPDK applications; ioport
is just an (optional) way to access PCI resources in a specific PMD.

And the above specification avoids your concern that two drivers might
try to manipulate the same device concurrently, doesn't it?

And it says "any network ports under Linux* control will be
ignored by the DPDK poll-mode drivers and cannot be used by the
application", so the case you described, where the virtio pmd
continues to work without the bind, looks like a bug to me.

Can anyone confirm that?

--yliu


[dpdk-dev] [PATCH 0/8] bonding: fixes and enhancements

2015-12-23 Thread Iremonger, Bernard
Hi Stephen,

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Stephen
> Hemminger
> Sent: Friday, December 4, 2015 5:14 PM
> To: Doherty, Declan 
> Cc: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH 0/8] bonding: fixes and enhancements
> 
> These are bug fixes and some small enhancements to allow bonding to work
> with external control (teamd). Please consider integrating these into DPDK
> 2.2
> 
> Eric Kinzie (8):
>   bond: use existing enslaved device queues
>   bond mode 4: copy entire config structure
>   bond mode 4: do not ignore multicast
>   bond mode 4: allow external state machine
>   bond: active slaves with no primary
>   bond: handle slaves with fewer queues than bonding device
>   bond: per-slave intermediate rx ring
>   bond: do not activate slave twice
> 
>  app/test/test_link_bonding_mode4.c|   7 +-
>  drivers/net/bonding/rte_eth_bond_8023ad.c | 174
> +
>  drivers/net/bonding/rte_eth_bond_8023ad.h |  44 +
>  drivers/net/bonding/rte_eth_bond_8023ad_private.h |   2 +
>  drivers/net/bonding/rte_eth_bond_api.c|  48 -
>  drivers/net/bonding/rte_eth_bond_pmd.c| 217
> ++
>  drivers/net/bonding/rte_eth_bond_private.h|   9 +-
>  drivers/net/bonding/rte_eth_bond_version.map  |   6 +
>  8 files changed, 462 insertions(+), 45 deletions(-)
> 
> --
> 2.1.4

Patches 6 and 7 of this patchset do not apply successfully to DPDK 2.2; a 
rebase is probably needed.
It might be better to split this patchset into a fixes patchset and a 
new-feature patchset.

Regards,

Bernard.



[dpdk-dev] [PATCH v5 1/3] vhost: Add callback and private data for vhost PMD

2015-12-23 Thread Yuanhan Liu
On Tue, Dec 22, 2015 at 01:38:29AM -0800, Rich Lane wrote:
> On Mon, Dec 21, 2015 at 9:47 PM, Yuanhan Liu 
> wrote:
> 
> On Mon, Dec 21, 2015 at 08:47:28PM -0800, Rich Lane wrote:
> > The queue state change callback is the one new API that needs to be
> > added because
> > normal NICs don't have this behavior.
> 
> Again I'd ask, will vring_state_changed() be enough, when above issues
> are resolved: vring_state_changed() will be invoked at new_device()/
> destroy_device(), and of course, ethtool change?
> 
> 
> It would be sufficient. It is not a great API though, because it requires the
> application to do the conversion from struct virtio_net to a DPDK port number,
> and from a virtqueue index to a DPDK queue id and direction. Also, the current
> implementation often makes this callback when the vring state has not actually
> changed (enabled -> enabled and disabled -> disabled).
> 
> If you're asking about using vring_state_changed() _instead_ of the link 
> status
> event and rte_eth_dev_socket_id(),

No, I like the idea of a link status event and rte_eth_dev_socket_id();
I was just wondering why a new API is needed. Both Tetsuya and I
were thinking to leverage the link status event to represent the
queue state change (triggered by vring_state_changed()) as well,
so that we don't need to introduce another eth event. However, I'd
agree that it's better if we could have a new dedicated event.

Thomas, here is some background for you. For vhost pmd and linux
virtio-net combo, the queue can be dynamically changed by ethtool,
therefore, the application wishes to have another eth event, say
RTE_ETH_EVENT_QUEUE_STATE_CHANGE, so that the application can
add/remove corresponding queue to the datapath when that happens.
What do you think of that?

> then yes, it still works. I'd only consider
> that a stopgap until the real ethdev APIs are implemented.
> 
> I'd suggest to add RTE_ETH_EVENT_QUEUE_STATE_CHANGE rather than
> create another callback registration API.
> 
> Perhaps we could merge the basic PMD which I think is pretty solid and then
> continue the API discussion with patches to it.

Perhaps, but let's see how hard it could be for the new eth event
discussion then.

--yliu


[dpdk-dev] [Question] How pmd virtio works without UIO?

2015-12-23 Thread Peter Xu
On Wed, Dec 23, 2015 at 10:01:35AM +0800, Yuanhan Liu wrote:
> On Tue, Dec 22, 2015 at 05:56:41PM +0800, Peter Xu wrote:
> > On Tue, Dec 22, 2015 at 04:32:46PM +0800, Yuanhan Liu wrote:
> > > Actually, you are right. I mentioned in the last email that this is
> > > for configuration part. To answer your question in this email, you
> > > will not be able to go that further (say initiating virtio pmd) if
> > > you don't unbind the origin virtio-net driver, and bind it to igb_uio
> > > (or something similar).
> > > 
> > > The start point is from rte_eal_pci_scan, where the sub-function
> > > pci_scan_one just initiates a DPDK bound driver.
> > 
> > I am not sure whether I do understand your meaning correctly
> > (regarding "you will not be able to go that further"): The problem
> > is that, we _can_ run testpmd without unbinding the ports and bind
> > to UIO or something. What we need to do is boot the guest, reserve
> > huge pages, and run testpmd (keeping its kernel driver as
> > "virtio-pci"). In pci_scan_one():
> > 
> > if (!ret) {
> > if (!strcmp(driver, "vfio-pci"))
> > dev->kdrv = RTE_KDRV_VFIO;
> > else if (!strcmp(driver, "igb_uio"))
> > dev->kdrv = RTE_KDRV_IGB_UIO;
> > else if (!strcmp(driver, "uio_pci_generic"))
> > dev->kdrv = RTE_KDRV_UIO_GENERIC;
> > else
> > dev->kdrv = RTE_KDRV_UNKNOWN;
> > } else
> > dev->kdrv = RTE_KDRV_UNKNOWN;
> > 
> > I think it should be going to RTE_KDRV_UNKNOWN
> > (driver=="virtio-pci") here.
> 
> Sorry, I simply overlooked that. I was thinking it would quit here for
> the RTE_KDRV_UNKNOWN case.
> 
> > I tried to run IO and it could work,
> > but I am not sure whether it is safe, and how.
> 
> I also did a quick test then, however, with the virtio 1.0 patchset
> I sent before, which sets RTE_PCI_DRV_NEED_MAPPING, resulting in
> pci_map_device() failure, so the virtio pmd is not initiated at all.

Then, will the patch work with ioport way to access virtio devices?

> 
> > 
> > Also, I am not sure whether I need to (at least) unbind the
> > virtio-pci driver, so that there should have no kernel driver
> > running for the virtio device before DPDK using it.
> 
> Why not? That's what the DPDK document asked to do
> (http://dpdk.org/doc/guides/linux_gsg/build_dpdk.html):
> 
> 3.6. Binding and Unbinding Network Ports to/from the Kernel Modules
> 
> As of release 1.4, DPDK applications no longer automatically unbind
> all supported network ports from the kernel driver in use. Instead,
> all ports that are to be used by an DPDK application must be bound
> to the uio_pci_generic, igb_uio or vfio-pci module before the
> application is run. Any network ports under Linux* control will be
> ignored by the DPDK poll-mode drivers and cannot be used by the
> application.

This seems obsolete, since it's not covering ioport?

Peter

> 
> 
>   --yliu


[dpdk-dev] [Question] How pmd virtio works without UIO?

2015-12-23 Thread Peter Xu
On Wed, Dec 23, 2015 at 10:09:49AM +0800, Yuanhan Liu wrote:
> Why can't we simply quit at pci_scan_one, once finding that it's not
> bond to uio (or similar stuff)? That would be generic enough, that we
> don't have to do similar checks for each new pmd driver.
> 
> Or, am I missing something?

It seems that the ioport way of accessing virtio devices does not
require any PCI wrapper layer like UIO/VFIO? Please check
virtio_resource_init().

> 
> 
> > I guess there might be two problems? Which are:
> > 
> > 1. How user avoid DPDK taking over virtio devices that they do not
> >want for IO (chooses which device to use)
> 
> Isn't that what's the 'binding/unbinding' for?
> 
> > 2. Driver conflict between virtio PMD in DPDK, and virtio-pci in
> >kernel (happens on every virtio device that DPDK uses)
> 
> If you unbound the kernel driver first, which is the suggested (or
> required?) way to use DPDK, that will not happen.

Yes, maybe we should unbind it first. I am just not sure what will
happen if not.

Peter


[dpdk-dev] [PATCH v3 1/2] mbuf: provide rte_pktmbuf_alloc_bulk API

2015-12-23 Thread Stephen Hemminger
On Wed, 23 Dec 2015 00:17:53 +0800
Huawei Xie  wrote:

> +
> + rc = rte_mempool_get_bulk(pool, (void **)mbufs, count);
> + if (unlikely(rc))
> + return rc;
> +
> + switch (count % 4) {
> + case 0: while (idx != count) {
> + RTE_MBUF_ASSERT(rte_mbuf_refcnt_read(mbufs[idx]) == 0);
> + rte_mbuf_refcnt_set(mbufs[idx], 1);
> + rte_pktmbuf_reset(mbufs[idx]);
> + idx++;
> + case 3:
> + RTE_MBUF_ASSERT(rte_mbuf_refcnt_read(mbufs[idx]) == 0);
> + rte_mbuf_refcnt_set(mbufs[idx], 1);
> + rte_pktmbuf_reset(mbufs[idx]);
> + idx++;
> + case 2:
> + RTE_MBUF_ASSERT(rte_mbuf_refcnt_read(mbufs[idx]) == 0);
> + rte_mbuf_refcnt_set(mbufs[idx], 1);
> + rte_pktmbuf_reset(mbufs[idx]);
> + idx++;
> + case 1:
> + RTE_MBUF_ASSERT(rte_mbuf_refcnt_read(mbufs[idx]) == 0);
> + rte_mbuf_refcnt_set(mbufs[idx], 1);
> + rte_pktmbuf_reset(mbufs[idx]);
> + idx++;
> + }
> + }
> + return 0;
> +}

Since the function will not work if count is 0 (otherwise 
rte_mempool_get_bulk will fail), why not:
1. Document that assumption
2. Use that assumption to speed up the code.



switch(count % 4) {
do {
case 0:
...
case 1:
...
} while (idx != count);
}

Also you really need to add a big block comment about this loop, to explain
what it does and why.


[dpdk-dev] [RFC PATCH 5/6] rte_ether: implement encap and decap APIs

2015-12-23 Thread Stephen Hemminger
On Wed, 23 Dec 2015 16:49:51 +0800
Jijiang Liu  wrote:

> +
> +#ifndef __INTEL_COMPILER
> +#pragma GCC diagnostic ignored "-Wcast-qual"
> +#endif
> +
> +#pragma GCC diagnostic ignored "-Wstrict-aliasing"
> +

Since this is new code, can't you please fix it to be warning safe?


[dpdk-dev] [RFC PATCH 0/6] General tunneling APIs

2015-12-23 Thread Stephen Hemminger
On Wed, 23 Dec 2015 16:49:46 +0800
Jijiang Liu  wrote:

> 1)at config phase
> 
> dev_config(port, ...);
> tunnel_config(port,...);
> ...
> dev_start(port);
> ...
> rx_burst(port, rxq,... );
> tx_burst(port, txq,...);

What about dynamically adding and deleting multiple tunnels after the
device has started? This would be the more common case in a real-world
environment.


[dpdk-dev] [Question] How pmd virtio works without UIO?

2015-12-23 Thread Yuanhan Liu
On Wed, Dec 23, 2015 at 09:55:54AM +0800, Peter Xu wrote:
> On Tue, Dec 22, 2015 at 04:38:30PM +, Xie, Huawei wrote:
> > On 12/22/2015 7:39 PM, Peter Xu wrote:
> > > I tried to unbind one of the virtio net device, I see the PCI entry
> > > still there.
> > >
> > > Before unbind:
> > >
> > > [root at vm proc]# lspci -k -s 00:03.0
> > > 00:03.0 Ethernet controller: Red Hat, Inc Virtio network device
> > > Subsystem: Red Hat, Inc Device 0001
> > > Kernel driver in use: virtio-pci
> > > [root at vm proc]# cat /proc/ioports | grep c060-c07f
> > >   c060-c07f : :00:03.0
> > > c060-c07f : virtio-pci
> > >
> > > After unbind:
> > >
> > > [root at vm proc]# lspci -k -s 00:03.0
> > > 00:03.0 Ethernet controller: Red Hat, Inc Virtio network device
> > > Subsystem: Red Hat, Inc Device 0001
> > > [root at vm proc]# cat /proc/ioports | grep c060-c07f
> > >   c060-c07f : :00:03.0
> > >
> > > So... does this means that it is an alternative to black list
> > > solution?
> > Oh, we could firstly check if this port is manipulated by kernel driver
> > in virtio_resource_init/eth_virtio_dev_init, as long as it is not too late.

Why can't we simply quit at pci_scan_one, once finding that it's not
bond to uio (or similar stuff)? That would be generic enough, that we
don't have to do similar checks for each new pmd driver.

Or, am I missing something?


> I guess there might be two problems? Which are:
> 
> 1. How user avoid DPDK taking over virtio devices that they do not
>want for IO (chooses which device to use)

Isn't that what's the 'binding/unbinding' for?

> 2. Driver conflict between virtio PMD in DPDK, and virtio-pci in
>kernel (happens on every virtio device that DPDK uses)

If you unbound the kernel driver first, which is the suggested (or
required?) way to use DPDK, that will not happen.

--yliu
> 
> For the white/black list solution, I guess it's good enough to solve
> (1) for customers. I am just curious about the 2nd.
> 
> Or rather, even if we blacklisted some virtio devices (or used a white
> list), the virtio devices used by DPDK are still in danger if we
> cannot make sure that virtio-pci will not touch the device any more
> (even if it will not touch it, it feels erroneous not to tell
> virtio-pci to remove it beforehand). E.g., if the virtio-pci interrupt
> is still working, when there are packets from outside to the guest,
> vp_interrupt() might be called? Then the virtio-pci driver might
> read/write the vring as well? If so, that's problematic. Am I wrong?
> 
> Peter


[dpdk-dev] [Question] How pmd virtio works without UIO?

2015-12-23 Thread Yuanhan Liu
On Tue, Dec 22, 2015 at 05:56:41PM +0800, Peter Xu wrote:
> On Tue, Dec 22, 2015 at 04:32:46PM +0800, Yuanhan Liu wrote:
> > Actually, you are right. I mentioned in the last email that this is
> > for configuration part. To answer your question in this email, you
> > will not be able to go that further (say initiating virtio pmd) if
> > you don't unbind the origin virtio-net driver, and bind it to igb_uio
> > (or something similar).
> > 
> > The start point is from rte_eal_pci_scan, where the sub-function
> > pci_scan_one just initiates a DPDK bound driver.
> 
> I am not sure whether I do understand your meaning correctly
> (regarding "you will not be able to go that further"): The problem
> is that, we _can_ run testpmd without unbinding the ports and bind
> to UIO or something. What we need to do is boot the guest, reserve
> huge pages, and run testpmd (keeping its kernel driver as
> "virtio-pci"). In pci_scan_one():
> 
>   if (!ret) {
>   if (!strcmp(driver, "vfio-pci"))
>   dev->kdrv = RTE_KDRV_VFIO;
>   else if (!strcmp(driver, "igb_uio"))
>   dev->kdrv = RTE_KDRV_IGB_UIO;
>   else if (!strcmp(driver, "uio_pci_generic"))
>   dev->kdrv = RTE_KDRV_UIO_GENERIC;
>   else
>   dev->kdrv = RTE_KDRV_UNKNOWN;
>   } else
>   dev->kdrv = RTE_KDRV_UNKNOWN;
> 
> I think it should be going to RTE_KDRV_UNKNOWN
> (driver=="virtio-pci") here.

Sorry, I simply overlooked that. I was thinking it would quit here for
the RTE_KDRV_UNKNOWN case.

> I tried to run IO and it could work,
> but I am not sure whether it is safe, and how.

I also did a quick test then, however, with the virtio 1.0 patchset
I sent before, which sets RTE_PCI_DRV_NEED_MAPPING, resulting in
pci_map_device() failure, so the virtio pmd is not initiated at all.

> 
> Also, I am not sure whether I need to (at least) unbind the
> virtio-pci driver, so that there should have no kernel driver
> running for the virtio device before DPDK using it.

Why not? That's what the DPDK document asked to do
(http://dpdk.org/doc/guides/linux_gsg/build_dpdk.html):

3.6. Binding and Unbinding Network Ports to/from the Kernel Modules

As of release 1.4, DPDK applications no longer automatically unbind
all supported network ports from the kernel driver in use. Instead,
all ports that are to be used by an DPDK application must be bound
to the uio_pci_generic, igb_uio or vfio-pci module before the
application is run. Any network ports under Linux* control will be
ignored by the DPDK poll-mode drivers and cannot be used by the
application.


--yliu


[dpdk-dev] [Question] How pmd virtio works without UIO?

2015-12-23 Thread Peter Xu
On Tue, Dec 22, 2015 at 04:38:30PM +, Xie, Huawei wrote:
> On 12/22/2015 7:39 PM, Peter Xu wrote:
> > I tried to unbind one of the virtio net device, I see the PCI entry
> > still there.
> >
> > Before unbind:
> >
> > [root at vm proc]# lspci -k -s 00:03.0
> > 00:03.0 Ethernet controller: Red Hat, Inc Virtio network device
> > Subsystem: Red Hat, Inc Device 0001
> > Kernel driver in use: virtio-pci
> > [root at vm proc]# cat /proc/ioports | grep c060-c07f
> >   c060-c07f : :00:03.0
> > c060-c07f : virtio-pci
> >
> > After unbind:
> >
> > [root at vm proc]# lspci -k -s 00:03.0
> > 00:03.0 Ethernet controller: Red Hat, Inc Virtio network device
> > Subsystem: Red Hat, Inc Device 0001
> > [root at vm proc]# cat /proc/ioports | grep c060-c07f
> >   c060-c07f : :00:03.0
> >
> > So... does this means that it is an alternative to black list
> > solution?
> Oh, we could firstly check if this port is manipulated by kernel driver
> in virtio_resource_init/eth_virtio_dev_init, as long as it is not too late.

I guess there might be two problems? Which are:

1. How user avoid DPDK taking over virtio devices that they do not
   want for IO (chooses which device to use)

2. Driver conflict between virtio PMD in DPDK, and virtio-pci in
   kernel (happens on every virtio device that DPDK uses)

For the white/black list solution, I guess it's good enough to solve
(1) for customers. I am just curious about the 2nd.

Or rather, even if we blacklisted some virtio devices (or used a white
list), the virtio devices used by DPDK are still in danger if we
cannot make sure that virtio-pci will not touch the device any more
(even if it will not touch it, it feels erroneous not to tell
virtio-pci to remove it beforehand). E.g., if the virtio-pci interrupt
is still working, when there are packets from outside to the guest,
vp_interrupt() might be called? Then the virtio-pci driver might
read/write the vring as well? If so, that's problematic. Am I wrong?

Peter


[dpdk-dev] [PATCH] virtio: fix crashes in virtio stats functions

2015-12-23 Thread Bernard Iremonger
The initialisation of nb_rx_queues and nb_tx_queues has been removed
from eth_virtio_dev_init.

The nb_rx_queues and nb_tx_queues were being initialised in eth_virtio_dev_init
before the tx_queues and rx_queues arrays were allocated.

The arrays are allocated when the ethdev port is configured and the
nb_tx_queues and nb_rx_queues are initialised.

If any of the following functions were called before the ethdev
port was configured there was a segmentation fault because
rx_queues and tx_queues were NULL:

rte_eth_stats_get
rte_eth_stats_reset
rte_eth_xstats_get
rte_eth_xstats_reset

Fixes: 823ad647950a ("virtio: support multiple queues")
Signed-off-by: Bernard Iremonger 
---
 drivers/net/virtio/virtio_ethdev.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c 
b/drivers/net/virtio/virtio_ethdev.c
index d928339..5ef0752 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -1378,9 +1378,6 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
hw->max_tx_queues = 1;
}

-   eth_dev->data->nb_rx_queues = hw->max_rx_queues;
-   eth_dev->data->nb_tx_queues = hw->max_tx_queues;
-
PMD_INIT_LOG(DEBUG, "hw->max_rx_queues=%d   hw->max_tx_queues=%d",
hw->max_rx_queues, hw->max_tx_queues);
PMD_INIT_LOG(DEBUG, "port %d vendorID=0x%x deviceID=0x%x",
-- 
2.6.3



[dpdk-dev] [PATCH] hash: fix CRC32c computation

2015-12-23 Thread Qiu, Michael
Is it suitable to put so much code in the commit log?

Thanks,
Michael
On 12/22/2015 5:36 PM, Didier Pallard wrote:
> As demonstrated by the following code, CRC32c computation is not valid
> when buffer length is not a multiple of 4 bytes:
> (Output obtained by code below)
>
> CRC of 1 NULL bytes expected: 0x527d5351
> soft: 527d5351
> rte accelerated: 48674bc7
> rte soft: 48674bc7
> CRC of 2 NULL bytes expected: 0xf16177d2
> soft: f16177d2
> rte accelerated: 48674bc7
> rte soft: 48674bc7
> CRC of 2x1 NULL bytes expected: 0xf16177d2
> soft: f16177d2
> rte accelerated: 8c28b28a
> rte soft: 8c28b28a
> CRC of 3 NULL bytes expected: 0x6064a37a
> soft: 6064a37a
> rte accelerated: 48674bc7
> rte soft: 48674bc7
> CRC of 4 NULL bytes expected: 0x48674bc7
> soft: 48674bc7
> rte accelerated: 48674bc7
> rte soft: 48674bc7
>
> The values returned by the rte_hash_crc functions do not match the ones
> computed by a trivial crc32c implementation.
>
> The ARM code is a guess; it is neither tested nor compiled.
>
> code showing the problem:
>
> uint8_t null_test[32] = {0};
>
> static uint32_t crc32c_trivial(uint8_t *buffer, uint32_t length, uint32_t crc)
> {
> uint32_t i, j;
> for (i = 0; i < length; ++i)
> {
> crc = crc ^ buffer[i];
> for (j = 0; j < 8; j++)
> crc = (crc >> 1) ^ 0x80000000 ^ ((~crc & 1) * 0x82f63b78);
> }
> return crc;
> }
>
> void hash_test(void);
> void hash_test(void)
> {
>   printf("CRC of 1 nul byte expected: 0x527d5351\n");
>   printf("soft: %08x\n", crc32c_trivial(null_test, 1, 0));
>   rte_hash_crc_init_alg();
>   printf("rte accelerated: %08x\n", ~rte_hash_crc(null_test, 1, 0xffffffff));
>   rte_hash_crc_set_alg(CRC32_SW);
>   printf("rte soft: %08x\n", ~rte_hash_crc(null_test, 1, 0xffffffff));
>
>   printf("CRC of 2 nul bytes expected: 0xf16177d2\n");
>   printf("soft: %08x\n", crc32c_trivial(null_test, 2, 0));
>   rte_hash_crc_init_alg();
>   printf("rte accelerated: %08x\n", ~rte_hash_crc(null_test, 2, 0xffffffff));
>   rte_hash_crc_set_alg(CRC32_SW);
>   printf("rte soft: %08x\n", ~rte_hash_crc(null_test, 2, 0xffffffff));
>
>   printf("CRC of 2x1 nul bytes expected: 0xf16177d2\n");
>   printf("soft: %08x\n", crc32c_trivial(null_test, 1, 
> crc32c_trivial(null_test, 1, 0)));
>   rte_hash_crc_init_alg();
>   printf("rte accelerated: %08x\n", ~rte_hash_crc(null_test, 1, 
> rte_hash_crc(null_test, 1, 0xffffffff)));
>   rte_hash_crc_set_alg(CRC32_SW);
>   printf("rte soft: %08x\n", ~rte_hash_crc(null_test, 1, 
> rte_hash_crc(null_test, 1, 0xffffffff)));
>
>   printf("CRC of 3 nul bytes expected: 0x6064a37a\n");
>   printf("soft: %08x\n", crc32c_trivial(null_test, 3, 0));
>   rte_hash_crc_init_alg();
>   printf("rte accelerated: %08x\n", ~rte_hash_crc(null_test, 3, 0xffffffff));
>   rte_hash_crc_set_alg(CRC32_SW);
>   printf("rte soft: %08x\n", ~rte_hash_crc(null_test, 3, 0xffffffff));
>
>   printf("CRC of 4 nul bytes expected: 0x48674bc7\n");
>   printf("soft: %08x\n", crc32c_trivial(null_test, 4, 0));
>   rte_hash_crc_init_alg();
>   printf("rte accelerated: %08x\n", ~rte_hash_crc(null_test, 4, 0xffffffff));
>   rte_hash_crc_set_alg(CRC32_SW);
>   printf("rte soft: %08x\n", ~rte_hash_crc(null_test, 4, 0xffffffff));
> }
>
> Signed-off-by: Didier Pallard 
> Acked-by: David Marchand 
> ---
>  lib/librte_hash/rte_crc_arm64.h |  64 
>  lib/librte_hash/rte_hash_crc.h  | 125 
> +++-
>  2 files changed, 162 insertions(+), 27 deletions(-)
>
> diff --git a/lib/librte_hash/rte_crc_arm64.h b/lib/librte_hash/rte_crc_arm64.h
> index 02e26bc..44ef460 100644
> --- a/lib/librte_hash/rte_crc_arm64.h
> +++ b/lib/librte_hash/rte_crc_arm64.h
> @@ -50,6 +50,28 @@ extern "C" {
>  #include 
>  
>  static inline uint32_t
> +crc32c_arm64_u8(uint8_t data, uint32_t init_val)
> +{
> + asm(".arch armv8-a+crc");
> + __asm__ volatile(
> + "crc32cb %w[crc], %w[crc], %b[value]"
> + : [crc] "+r" (init_val)
> + : [value] "r" (data));
> + return init_val;
> +}
> +
> +static inline uint32_t
> +crc32c_arm64_u16(uint16_t data, uint32_t init_val)
> +{
> + asm(".arch armv8-a+crc");
> + __asm__ volatile(
> + "crc32ch %w[crc], %w[crc], %h[value]"
> + : [crc] "+r" (init_val)
> + : [value] "r" (data));
> + return init_val;
> +}
> +
> +static inline uint32_t
>  crc32c_arm64_u32(uint32_t data, uint32_t init_val)
>  {
>   asm(".arch armv8-a+crc");
> @@ -103,6 +125,48 @@ rte_hash_crc_init_alg(void)
>  }
>  
>  /**
> + * Use single crc32 instruction to perform a hash on a 1 byte value.
> + * Fall back to software crc32 implementation in case arm64 crc intrinsics is

[dpdk-dev] [PATCH] i40e: fix inverted check for ETH_TXQ_FLAGS_NOREFCOUNT

2015-12-23 Thread Zhang, Helin


> -Original Message-
> From: Rich Lane [mailto:rich.lane at bigswitch.com]
> Sent: Wednesday, December 23, 2015 4:08 PM
> To: dev at dpdk.org
> Cc: Zhang, Helin
> Subject: [PATCH] i40e: fix inverted check for ETH_TXQ_FLAGS_NOREFCOUNT
> 
> The no-refcount path was being taken without the application opting in to it.
> 
> Reported-by: Mike Stolarchuk 
> Signed-off-by: Rich Lane 
Acked-by: Helin Zhang 

Thanks for the good catch!


[dpdk-dev] DPDP crash with sr-iov (with ESXi 5.5 hypervisor)

2015-12-23 Thread Lu, Wenzhuo
Hi Vithal,
The number of VF queues is decided by the PF. I assume you use the kernel 
driver for the PF, so the queue number is decided by the PF kernel driver.
I have an 82599ES, and no matter whether ixgbevf or dpdk igb_uio is used, the 
RX queue number is 2. Frankly, I believe 2 is the expected number, and I am 
surprised that you get 8 when using ixgbevf.
Hope this can help. Thanks.

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Vithal Mohare
> Sent: Wednesday, December 23, 2015 12:32 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] DPDP crash with sr-iov (with ESXi 5.5 hypervisor)
> 
> Hi,
> 
> While initializing pci port (VF) DPDK is crashing while configuring the 
> device.
> Reason/location:
> PMD: rte_eth_dev_configure: ethdev port_id=1 nb_rx_queues=8 >
> 2
> EAL: Error - exiting with code: 1
> 
> System info:
> DPDK version: 2.0
> NIC: 82599EB, sr-iov enabled.
> SR-IOV config at ESXi 5.5 hypervisor host: max_vfs=2 Guest OS: Linux OS
> based.  Driver: ixgbevf.ko
> 
> VM is configured with 3 vCPUs.  Before linking the port to DPDK, I see that
> the pci device (VF) comes up with 8 rx/tx queues (using the native kernel
> driver ixgbevf.ko, /sys/class/net/ethx/queues/*).  But the DPDK code expects
> the max queues for the device to be '2', hence the crash.  Am I missing
> anything here?
> Appreciate for any suggestions/fixes for the issue.
> 
> Thanks,
> -Vithal


[dpdk-dev] [PATCH v4 2/2] vhost: call rte_pktmbuf_alloc_bulk in vhost dequeue

2015-12-23 Thread Huawei Xie
v4 changes:
 fix a silly typo in error handling when rte_pktmbuf_alloc fails,
 reported by Haifeng

pre-allocate a bulk of mbufs instead of allocating one mbuf at a time on demand

Signed-off-by: Gerald Rogers 
Signed-off-by: Huawei Xie 
Acked-by: Konstantin Ananyev 
Acked-by: Yuanhan Liu 
Tested-by: Yuanhan Liu 
---
 lib/librte_vhost/vhost_rxtx.c | 35 ++-
 1 file changed, 22 insertions(+), 13 deletions(-)

diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
index bbf3fac..f10d534 100644
--- a/lib/librte_vhost/vhost_rxtx.c
+++ b/lib/librte_vhost/vhost_rxtx.c
@@ -576,6 +576,8 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t 
queue_id,
uint32_t i;
uint16_t free_entries, entry_success = 0;
uint16_t avail_idx;
+   uint8_t alloc_err = 0;
+   uint8_t seg_num;

if (unlikely(!is_valid_virt_queue_idx(queue_id, 1, dev->virt_qp_nb))) {
RTE_LOG(ERR, VHOST_DATA,
@@ -609,6 +611,14 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t 
queue_id,

LOG_DEBUG(VHOST_DATA, "(%"PRIu64") Buffers available %d\n",
dev->device_fh, free_entries);
+
+   if (unlikely(rte_pktmbuf_alloc_bulk(mbuf_pool,
+   pkts, free_entries) < 0)) {
+   RTE_LOG(ERR, VHOST_DATA,
+   "Failed to bulk allocating %d mbufs\n", free_entries);
+   return 0;
+   }
+
/* Retrieve all of the head indexes first to avoid caching issues. */
for (i = 0; i < free_entries; i++)
head[i] = vq->avail->ring[(vq->last_used_idx + i) & (vq->size - 
1)];
@@ -621,9 +631,9 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t 
queue_id,
uint32_t vb_avail, vb_offset;
uint32_t seg_avail, seg_offset;
uint32_t cpy_len;
-   uint32_t seg_num = 0;
+   seg_num = 0;
struct rte_mbuf *cur;
-   uint8_t alloc_err = 0;
+

desc = &vq->desc[head[entry_success]];

@@ -654,13 +664,7 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t 
queue_id,
vq->used->ring[used_idx].id = head[entry_success];
vq->used->ring[used_idx].len = 0;

-   /* Allocate an mbuf and populate the structure. */
-   m = rte_pktmbuf_alloc(mbuf_pool);
-   if (unlikely(m == NULL)) {
-   RTE_LOG(ERR, VHOST_DATA,
-   "Failed to allocate memory for mbuf.\n");
-   break;
-   }
+   prev = cur = m = pkts[entry_success];
seg_offset = 0;
seg_avail = m->buf_len - RTE_PKTMBUF_HEADROOM;
cpy_len = RTE_MIN(vb_avail, seg_avail);
@@ -668,8 +672,6 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t 
queue_id,
PRINT_PACKET(dev, (uintptr_t)vb_addr, desc->len, 0);

seg_num++;
-   cur = m;
-   prev = m;
while (cpy_len != 0) {
rte_memcpy(rte_pktmbuf_mtod_offset(cur, void *, 
seg_offset),
(void *)((uintptr_t)(vb_addr + vb_offset)),
@@ -761,16 +763,23 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t 
queue_id,
cpy_len = RTE_MIN(vb_avail, seg_avail);
}

-   if (unlikely(alloc_err == 1))
+   if (unlikely(alloc_err))
break;

m->nb_segs = seg_num;

-   pkts[entry_success] = m;
vq->last_used_idx++;
entry_success++;
}

+   if (unlikely(alloc_err)) {
+   uint16_t i = entry_success;
+
+   m->nb_segs = seg_num;
+   for (; i < free_entries; i++)
+   rte_pktmbuf_free(pkts[i]);
+   }
+
rte_compiler_barrier();
vq->used->idx += entry_success;
/* Kick guest if required. */
-- 
1.8.1.4



[dpdk-dev] [PATCH v4 1/2] mbuf: provide rte_pktmbuf_alloc_bulk API

2015-12-23 Thread Huawei Xie
v3 changes:
 move while after case 0
 add context about duff's device and why we use while loop in the commit
message

v2 changes:
 unroll the loop a bit to help the performance

rte_pktmbuf_alloc_bulk allocates a bulk of packet mbufs.

There is a related thread about this bulk API:
http://dpdk.org/dev/patchwork/patch/4718/
Thanks go to Konstantin for the loop unrolling.

Attached is the wiki page about Duff's device. It explains performance
optimization through loop unwinding, and also shows the most dramatic use
of case-label fall-through.
https://en.wikipedia.org/wiki/Duff%27s_device

In our implementation, we use a while() loop rather than a do {} while()
loop because we cannot assume count is strictly positive. Using a while()
loop saves one explicit check for count being zero.
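The same switch/while unrolling pattern can be sketched in self-contained plain C, stripped of the mbuf specifics (`init_one` here is a hypothetical stand-in for the refcnt-set/reset pair, not a DPDK call):

```c
#include <assert.h>

/* Stand-in for the per-mbuf work (refcnt set + pktmbuf reset). */
static void init_one(int *slot)
{
	*slot = 1;
}

/* Duff's-device-style 4x unrolled initialization.
 * A while() loop (not do/while) is used so that count == 0 is safe:
 * count % 4 == 0 enters "case 0" and the while test fails at once.
 * For count % 4 == 3/2/1, execution jumps into the middle of the
 * loop body via case-label fall-through, then iterates normally. */
static void init_bulk(int *slots, unsigned count)
{
	unsigned idx = 0;

	switch (count % 4) {
	case 0: while (idx != count) {
			init_one(&slots[idx++]);
	case 3:
			init_one(&slots[idx++]);
	case 2:
			init_one(&slots[idx++]);
	case 1:
			init_one(&slots[idx++]);
		}
	}
}
```

With count = 7, execution enters at `case 3`, handles three items, then falls back to the `while` test and finishes the remaining four in one full pass.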

Signed-off-by: Gerald Rogers 
Signed-off-by: Huawei Xie 
Acked-by: Konstantin Ananyev 
---
 lib/librte_mbuf/rte_mbuf.h | 49 ++
 1 file changed, 49 insertions(+)

diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index f234ac9..3381c28 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -1336,6 +1336,55 @@ static inline struct rte_mbuf *rte_pktmbuf_alloc(struct 
rte_mempool *mp)
 }

 /**
+ * Allocate a bulk of mbufs, initialize refcnt and reset the fields to default
+ * values.
+ *
+ *  @param pool
+ *The mempool from which mbufs are allocated.
+ *  @param mbufs
+ *Array of pointers to mbufs
+ *  @param count
+ *Array size
+ *  @return
+ *   - 0: Success
+ */
+static inline int rte_pktmbuf_alloc_bulk(struct rte_mempool *pool,
+struct rte_mbuf **mbufs, unsigned count)
+{
+   unsigned idx = 0;
+   int rc;
+
+   rc = rte_mempool_get_bulk(pool, (void **)mbufs, count);
+   if (unlikely(rc))
+   return rc;
+
+   switch (count % 4) {
+   case 0: while (idx != count) {
+   RTE_MBUF_ASSERT(rte_mbuf_refcnt_read(mbufs[idx]) == 0);
+   rte_mbuf_refcnt_set(mbufs[idx], 1);
+   rte_pktmbuf_reset(mbufs[idx]);
+   idx++;
+   case 3:
+   RTE_MBUF_ASSERT(rte_mbuf_refcnt_read(mbufs[idx]) == 0);
+   rte_mbuf_refcnt_set(mbufs[idx], 1);
+   rte_pktmbuf_reset(mbufs[idx]);
+   idx++;
+   case 2:
+   RTE_MBUF_ASSERT(rte_mbuf_refcnt_read(mbufs[idx]) == 0);
+   rte_mbuf_refcnt_set(mbufs[idx], 1);
+   rte_pktmbuf_reset(mbufs[idx]);
+   idx++;
+   case 1:
+   RTE_MBUF_ASSERT(rte_mbuf_refcnt_read(mbufs[idx]) == 0);
+   rte_mbuf_refcnt_set(mbufs[idx], 1);
+   rte_pktmbuf_reset(mbufs[idx]);
+   idx++;
+   }
+   }
+   return 0;
+}
+
+/**
  * Attach packet mbuf to another packet mbuf.
  *
  * After attachment we refer the mbuf we attached as 'indirect',
-- 
1.8.1.4



[dpdk-dev] [PATCH v4 0/2] provide rte_pktmbuf_alloc_bulk API and call it in vhost dequeue

2015-12-23 Thread Huawei Xie
v4 changes:
 fix a silly typo in error handling when rte_pktmbuf_alloc fails

v3 changes:
 move while after case 0
 add context about duff's device and why we use while loop in the commit
message

v2 changes:
 unroll the loop in rte_pktmbuf_alloc_bulk to help the performance

For a symmetric rte_pktmbuf_free_bulk: if the app knows that in its
scenarios the mbufs are all simple mbufs, i.e. they meet the following
requirements:
 * no multiple segments
 * not an indirect mbuf
 * refcnt is 1
 * belong to the same mbuf memory pool
then it could directly call rte_mempool_put to free the bulk of mbufs;
otherwise rte_pktmbuf_free_bulk has to call rte_pktmbuf_free to free the
mbufs one by one.
This patchset does not provide this symmetric implementation.
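The fast-path-or-fallback decision described above can be sketched in isolation; every name and struct below is a hypothetical, simplified stand-in rather than the real DPDK API:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins -- not the real DPDK structures. */
struct mbuf {
	int refcnt;
	int nb_segs;
	int is_indirect;
	int pool_id;       /* identifies which mempool the mbuf came from */
};

static int bulk_puts;      /* counts fast-path bulk returns   */
static int single_frees;   /* counts slow-path per-mbuf frees */

static void mempool_put_bulk(struct mbuf **m, unsigned n) { (void)m; bulk_puts += n; }
static void pktmbuf_free_one(struct mbuf *m) { (void)m; single_frees++; }

/* Free a bulk of mbufs: take the mempool fast path only when every
 * mbuf is "simple" (single segment, direct, refcnt 1, same pool);
 * otherwise fall back to freeing one by one. */
static void pktmbuf_free_bulk(struct mbuf **mbufs, unsigned count)
{
	unsigned i;

	for (i = 0; i < count; i++) {
		if (mbufs[i]->refcnt != 1 || mbufs[i]->nb_segs != 1 ||
		    mbufs[i]->is_indirect ||
		    mbufs[i]->pool_id != mbufs[0]->pool_id)
			break;
	}
	if (i == count)
		mempool_put_bulk(mbufs, count);   /* all simple: one bulk put */
	else
		for (i = 0; i < count; i++)       /* mixed: free one by one */
			pktmbuf_free_one(mbufs[i]);
}
```

The scan adds an O(n) check before the fast path, which is why an app that already knows its mbufs are simple is better off calling the bulk put directly, as the cover letter notes.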

Huawei Xie (2):
  mbuf: provide rte_pktmbuf_alloc_bulk API
  vhost: call rte_pktmbuf_alloc_bulk in vhost dequeue

 lib/librte_mbuf/rte_mbuf.h| 49 +++
 lib/librte_vhost/vhost_rxtx.c | 35 +++
 2 files changed, 71 insertions(+), 13 deletions(-)

-- 
1.8.1.4



[dpdk-dev] [Question] How pmd virtio works without UIO?

2015-12-23 Thread Xie, Huawei
On 12/23/2015 10:57 AM, Yuanhan Liu wrote:
> On Wed, Dec 23, 2015 at 10:41:57AM +0800, Peter Xu wrote:
>> On Wed, Dec 23, 2015 at 10:01:35AM +0800, Yuanhan Liu wrote:
>>> On Tue, Dec 22, 2015 at 05:56:41PM +0800, Peter Xu wrote:
 On Tue, Dec 22, 2015 at 04:32:46PM +0800, Yuanhan Liu wrote:
> Actually, you are right. I mentioned in the last email that this is
> for configuration part. To answer your question in this email, you
> will not be able to go that further (say initiating virtio pmd) if
> you don't unbind the origin virtio-net driver, and bind it to igb_uio
> (or something similar).
>
> The starting point is rte_eal_pci_scan, where the sub-function
> pci_scan_one just initiates a DPDK bond driver.
 I am not sure whether I understand your meaning correctly
 (regarding "you will not be able to go that further"): The problem
 is that, we _can_ run testpmd without unbinding the ports and bind
 to UIO or something. What we need to do is boot the guest, reserve
 huge pages, and run testpmd (keeping its kernel driver as
 "virtio-pci"). In pci_scan_one():

if (!ret) {
if (!strcmp(driver, "vfio-pci"))
dev->kdrv = RTE_KDRV_VFIO;
else if (!strcmp(driver, "igb_uio"))
dev->kdrv = RTE_KDRV_IGB_UIO;
else if (!strcmp(driver, "uio_pci_generic"))
dev->kdrv = RTE_KDRV_UIO_GENERIC;
else
dev->kdrv = RTE_KDRV_UNKNOWN;
} else
dev->kdrv = RTE_KDRV_UNKNOWN;

 I think it should be going to RTE_KDRV_UNKNOWN
 (driver=="virtio-pci") here.
>>> Sorry, I simply overlook that. I was thinking it will quit here for
>>> the RTE_KDRV_UNKNOWN case.
>>>
 I tried to run IO and it could work,
 but I am not sure whether it is safe, and how.
>>> I also did a quick test then, however, with the virtio 1.0 patchset
>>> I sent before, which sets the RTE_PCI_DRV_NEED_MAPPING, resulting to
>>> pci_map_device() failure and virtio pmd is not initiated at all.
>> Then, will the patch work with ioport way to access virtio devices?
> Yes.
>
 Also, I am not sure whether I need to (at least) unbind the
 virtio-pci driver, so that there should have no kernel driver
 running for the virtio device before DPDK using it.
>>> Why not? That's what the DPDK document asked to do
>>> (http://dpdk.org/doc/guides/linux_gsg/build_dpdk.html):
>>>
>>> 3.6. Binding and Unbinding Network Ports to/from the Kernel Modules
>>> 
>>> As of release 1.4, DPDK applications no longer automatically unbind
>>> all supported network ports from the kernel driver in use. Instead,
>>> all ports that are to be used by an DPDK application must be bound
>>> to the uio_pci_generic, igb_uio or vfio-pci module before the
>>> application is run. Any network ports under Linux* control will be
>>> ignored by the DPDK poll-mode drivers and cannot be used by the
>>> application.
>> This seems obsolete? since it's not covering ioport.
> I don't think so. Above is for how to run DPDK applications. ioport
> is just a (optional) way to access PCI resource in a specific PMD.
>
> And, above speicification avoids your concerns, that two drivers try
> to manipulate same device concurrently, doesn't it?
>
> And, it is saying "any network ports under Linux* control will be
> ignored by the DPDK poll-mode drivers and cannot be used by the
> application", so that the case you were saying that virtio pmd
> continues to work without the bind looks like a bug to me.
>
> Can anyone confirm that?

That document isn't accurate. virtio doesn't require binding to a UIO
driver if it uses port IO. The port IO commit said this is because UIO
isn't secure, but avoiding UIO doesn't bring more security, as the virtio
PMD could still ask the device to DMA into any memory.
The least we could do is fail in virtio_resource_init if a kernel driver
is still manipulating this device. This saves users the effort of using
the blacklist option and avoids the driver conflict.

>
>   --yliu
>



[dpdk-dev] DPDP crash with sr-iov (with ESXi 5.5 hypervisor)

2015-12-23 Thread Vithal Mohare
Hi,

While initializing a PCI port (VF), DPDK crashes while configuring the
device.  Reason/location:
PMD: rte_eth_dev_configure: ethdev port_id=1 nb_rx_queues=8 > 2
EAL: Error - exiting with code: 1

System info:
DPDK version: 2.0
NIC: 82599EB, sr-iov enabled.
SR-IOV config at ESXi 5.5 hypervisor host: max_vfs=2
Guest OS: Linux OS based.  Driver: ixgbevf.ko

VM is configured with 3 vCPUs.  Before linking the port to DPDK, I see that
the PCI device (VF) comes up with 8 rx/tx queues (using the native kernel
driver ixgbevf.ko, /sys/class/net/ethx/queues/*).  But the DPDK code expects
the max queues for the device to be 2, hence the crash.  Am I missing
anything here?  I would appreciate any suggestions/fixes for this issue.

Thanks,
-Vithal
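One defensive pattern on the application side is to query the device limits (in DPDK, via rte_eth_dev_info_get()) before calling rte_eth_dev_configure(), rather than hard-coding a queue count. The clamping itself is trivial; the struct below is a stub for the relevant fields only, since the real call needs a probed port:

```c
#include <assert.h>
#include <stdint.h>

/* Stub for the relevant fields of struct rte_eth_dev_info. */
struct dev_info {
	uint16_t max_rx_queues;
	uint16_t max_tx_queues;
};

/* Clamp the requested queue counts to what the device reports.
 * An ixgbe VF that reports max_rx_queues == 2 would then get 2
 * instead of the 8 seen in the report above, avoiding the abort
 * in rte_eth_dev_configure(). */
static void pick_queue_counts(const struct dev_info *info,
			      uint16_t want_rx, uint16_t want_tx,
			      uint16_t *nb_rx, uint16_t *nb_tx)
{
	*nb_rx = want_rx < info->max_rx_queues ? want_rx : info->max_rx_queues;
	*nb_tx = want_tx < info->max_tx_queues ? want_tx : info->max_tx_queues;
}
```

Whether 8 queues visible under the kernel driver but only 2 usable under DPDK is a VF capability-reporting issue is a separate question; the clamp merely keeps the application from exiting.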


[dpdk-dev] [PATCH v4 2/6] fm10k: setup rx queue interrupts for PF and VF

2015-12-23 Thread He, Shaopeng

> -Original Message-
> From: Qiu, Michael
> Sent: Tuesday, December 22, 2015 3:28 PM
> To: He, Shaopeng; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v4 2/6] fm10k: setup rx queue interrupts for
> PF and VF
> 
> On 12/21/2015 6:20 PM, Shaopeng He wrote:
> > In interrupt mode, each rx queue can have one interrupt to notify the
> > upper-layer application when packets are available in that queue;
> > several queues can also share one interrupt.
> > Currently, fm10k needs one separate interrupt for the mailbox, so only
> > drivers that support multiple interrupt vectors, e.g. vfio-pci,
> > can work in fm10k interrupt mode.
> > This patch uses the RXINT/INT_MAP registers to map interrupt causes
> > (rx queue and other events) to vectors, and enable these interrupts
> > through kernel drivers like vfio-pci.
> >
> > Signed-off-by: Shaopeng He 
> > Acked-by: Jing Chen 
> > ---
> >  doc/guides/rel_notes/release_2_3.rst |   2 +
> >  drivers/net/fm10k/fm10k.h|   3 ++
> >  drivers/net/fm10k/fm10k_ethdev.c | 101
> +++
> >  3 files changed, 95 insertions(+), 11 deletions(-)
> >
> > diff --git a/doc/guides/rel_notes/release_2_3.rst
> > b/doc/guides/rel_notes/release_2_3.rst
> > index 99de186..2cb5ebd 100644
> > --- a/doc/guides/rel_notes/release_2_3.rst
> > +++ b/doc/guides/rel_notes/release_2_3.rst
> > @@ -4,6 +4,8 @@ DPDK Release 2.3
> >  New Features
> >  
> >
> > +* **Added fm10k Rx interrupt support.**
> > +
> >
> >  Resolved Issues
> >  ---
> > diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
> > index e2f677a..770d6ba 100644
> > --- a/drivers/net/fm10k/fm10k.h
> > +++ b/drivers/net/fm10k/fm10k.h
> > @@ -129,6 +129,9 @@
> >  #define RTE_FM10K_TX_MAX_FREE_BUF_SZ64
> >  #define RTE_FM10K_DESCS_PER_LOOP4
> >
> > +#define FM10K_MISC_VEC_ID   RTE_INTR_VEC_ZERO_OFFSET
> > +#define FM10K_RX_VEC_START  RTE_INTR_VEC_RXTX_OFFSET
> > +
> >  #define FM10K_SIMPLE_TX_FLAG
> ((uint32_t)ETH_TXQ_FLAGS_NOMULTSEGS | \
> > ETH_TXQ_FLAGS_NOOFFLOADS)
> >
> > diff --git a/drivers/net/fm10k/fm10k_ethdev.c
> > b/drivers/net/fm10k/fm10k_ethdev.c
> > index d39c33b..a34c5e2 100644
> > --- a/drivers/net/fm10k/fm10k_ethdev.c
> > +++ b/drivers/net/fm10k/fm10k_ethdev.c
> > @@ -54,6 +54,8 @@
> >  /* Number of chars per uint32 type */  #define CHARS_PER_UINT32
> > (sizeof(uint32_t))  #define BIT_MASK_PER_UINT32 ((1 <<
> > CHARS_PER_UINT32) - 1)
> > +/* default 1:1 map from queue ID to interrupt vector ID */ #define
> > +Q2V(dev, queue_id) (dev->pci_dev->intr_handle.intr_vec[queue_id])
> >
> >  static void fm10k_close_mbx_service(struct fm10k_hw *hw);  static
> > void fm10k_dev_promiscuous_enable(struct rte_eth_dev *dev); @@ -
> 109,6
> > +111,8 @@ struct fm10k_xstats_name_off fm10k_hw_stats_tx_q_strings[]
> =
> > {
> >
> >  #define FM10K_NB_XSTATS (FM10K_NB_HW_XSTATS +
> FM10K_MAX_QUEUES_PF * \
> > (FM10K_NB_RX_Q_XSTATS + FM10K_NB_TX_Q_XSTATS))
> > +static int
> > +fm10k_dev_rxq_interrupt_setup(struct rte_eth_dev *dev);
> >
> >  static void
> >  fm10k_mbx_initlock(struct fm10k_hw *hw) @@ -687,6 +691,7 @@ static
> > int  fm10k_dev_rx_init(struct rte_eth_dev *dev)  {
> > struct fm10k_hw *hw =
> > FM10K_DEV_PRIVATE_TO_HW(dev->data->dev_private);
> > +   struct rte_intr_handle *intr_handle = &dev->pci_dev->intr_handle;
> > int i, ret;
> > struct fm10k_rx_queue *rxq;
> > uint64_t base_addr;
> > @@ -694,10 +699,23 @@ fm10k_dev_rx_init(struct rte_eth_dev *dev)
> > uint32_t rxdctl = FM10K_RXDCTL_WRITE_BACK_MIN_DELAY;
> > uint16_t buf_size;
> >
> > -   /* Disable RXINT to avoid possible interrupt */
> > -   for (i = 0; i < hw->mac.max_queues; i++)
> > +   /* enable RXINT for interrupt mode */
> > +   i = 0;
> > +   if (rte_intr_dp_is_en(intr_handle)) {
> > +   for (; i < dev->data->nb_rx_queues; i++) {
> > +   FM10K_WRITE_REG(hw, FM10K_RXINT(i), Q2V(dev,
> i));
> > +   if (hw->mac.type == fm10k_mac_pf)
> > +   FM10K_WRITE_REG(hw,
> FM10K_ITR(Q2V(dev, i)),
> > +   FM10K_ITR_AUTOMASK |
> FM10K_ITR_MASK_CLEAR);
> > +   else
> > +   FM10K_WRITE_REG(hw,
> FM10K_VFITR(Q2V(dev, i)),
> > +   FM10K_ITR_AUTOMASK |
> FM10K_ITR_MASK_CLEAR);
> > +   }
> > +   }
> > +   /* Disable other RXINT to avoid possible interrupt */
> > +   for (; i < hw->mac.max_queues; i++)
> > FM10K_WRITE_REG(hw, FM10K_RXINT(i),
> > -   3 << FM10K_RXINT_TIMER_SHIFT);
> > +   3 << FM10K_RXINT_TIMER_SHIFT);
> >
> > /* Setup RX queues */
> > for (i = 0; i < dev->data->nb_rx_queues; ++i) { @@ -1053,6 +1071,9
> > @@ fm10k_dev_start(struct rte_eth_dev *dev)
> > return diag;
> > }
> >
> > +   if (fm10k_dev_rxq_interrupt_setup(dev))
> > +   return -EIO;
>

[dpdk-dev] [PATCH v3 2/2] vhost: call rte_pktmbuf_alloc_bulk in vhost dequeue

2015-12-23 Thread Huawei Xie
pre-allocate a bulk of mbufs instead of allocating one mbuf at a time on demand

Signed-off-by: Gerald Rogers 
Signed-off-by: Huawei Xie 
Acked-by: Konstantin Ananyev 
---
 lib/librte_vhost/vhost_rxtx.c | 35 ++-
 1 file changed, 22 insertions(+), 13 deletions(-)

diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
index bbf3fac..0faae58 100644
--- a/lib/librte_vhost/vhost_rxtx.c
+++ b/lib/librte_vhost/vhost_rxtx.c
@@ -576,6 +576,8 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t 
queue_id,
uint32_t i;
uint16_t free_entries, entry_success = 0;
uint16_t avail_idx;
+   uint8_t alloc_err = 0;
+   uint8_t seg_num;

if (unlikely(!is_valid_virt_queue_idx(queue_id, 1, dev->virt_qp_nb))) {
RTE_LOG(ERR, VHOST_DATA,
@@ -609,6 +611,14 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t 
queue_id,

LOG_DEBUG(VHOST_DATA, "(%"PRIu64") Buffers available %d\n",
dev->device_fh, free_entries);
+
+   if (unlikely(rte_pktmbuf_alloc_bulk(mbuf_pool,
+   pkts, free_entries)) < 0) {
+   RTE_LOG(ERR, VHOST_DATA,
+   "Failed to bulk allocating %d mbufs\n", free_entries);
+   return 0;
+   }
+
/* Retrieve all of the head indexes first to avoid caching issues. */
for (i = 0; i < free_entries; i++)
head[i] = vq->avail->ring[(vq->last_used_idx + i) & (vq->size - 
1)];
@@ -621,9 +631,9 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t 
queue_id,
uint32_t vb_avail, vb_offset;
uint32_t seg_avail, seg_offset;
uint32_t cpy_len;
-   uint32_t seg_num = 0;
+   seg_num = 0;
struct rte_mbuf *cur;
-   uint8_t alloc_err = 0;
+

desc = &vq->desc[head[entry_success]];

@@ -654,13 +664,7 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t 
queue_id,
vq->used->ring[used_idx].id = head[entry_success];
vq->used->ring[used_idx].len = 0;

-   /* Allocate an mbuf and populate the structure. */
-   m = rte_pktmbuf_alloc(mbuf_pool);
-   if (unlikely(m == NULL)) {
-   RTE_LOG(ERR, VHOST_DATA,
-   "Failed to allocate memory for mbuf.\n");
-   break;
-   }
+   prev = cur = m = pkts[entry_success];
seg_offset = 0;
seg_avail = m->buf_len - RTE_PKTMBUF_HEADROOM;
cpy_len = RTE_MIN(vb_avail, seg_avail);
@@ -668,8 +672,6 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t 
queue_id,
PRINT_PACKET(dev, (uintptr_t)vb_addr, desc->len, 0);

seg_num++;
-   cur = m;
-   prev = m;
while (cpy_len != 0) {
rte_memcpy(rte_pktmbuf_mtod_offset(cur, void *, 
seg_offset),
(void *)((uintptr_t)(vb_addr + vb_offset)),
@@ -761,16 +763,23 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t 
queue_id,
cpy_len = RTE_MIN(vb_avail, seg_avail);
}

-   if (unlikely(alloc_err == 1))
+   if (unlikely(alloc_err))
break;

m->nb_segs = seg_num;

-   pkts[entry_success] = m;
vq->last_used_idx++;
entry_success++;
}

+   if (unlikely(alloc_err)) {
+   uint16_t i = entry_success;
+
+   m->nb_segs = seg_num;
+   for (; i < free_entries; i++)
+   rte_pktmbuf_free(pkts[entry_success]);
+   }
+
rte_compiler_barrier();
vq->used->idx += entry_success;
/* Kick guest if required. */
-- 
1.8.1.4



[dpdk-dev] [PATCH v3 1/2] mbuf: provide rte_pktmbuf_alloc_bulk API

2015-12-23 Thread Huawei Xie
v3 changes:
 move while after case 0
 add context about duff's device and why we use while loop in the commit
message

v2 changes:
 unroll the loop a bit to help the performance

rte_pktmbuf_alloc_bulk allocates a bulk of packet mbufs.

There is a related thread about this bulk API:
http://dpdk.org/dev/patchwork/patch/4718/
Thanks go to Konstantin for the loop unrolling.

Attached is the wiki page about Duff's device. It explains performance
optimization through loop unwinding, and also shows the most dramatic use
of case-label fall-through.
https://en.wikipedia.org/wiki/Duff%27s_device

In our implementation, we use a while() loop rather than a do {} while()
loop because we cannot assume count is strictly positive. Using a while()
loop saves one explicit check for count being zero.

Signed-off-by: Gerald Rogers 
Signed-off-by: Huawei Xie 
Acked-by: Konstantin Ananyev 
---
 lib/librte_mbuf/rte_mbuf.h | 49 ++
 1 file changed, 49 insertions(+)

diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index f234ac9..3381c28 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -1336,6 +1336,55 @@ static inline struct rte_mbuf *rte_pktmbuf_alloc(struct 
rte_mempool *mp)
 }

 /**
+ * Allocate a bulk of mbufs, initialize refcnt and reset the fields to default
+ * values.
+ *
+ *  @param pool
+ *The mempool from which mbufs are allocated.
+ *  @param mbufs
+ *Array of pointers to mbufs
+ *  @param count
+ *Array size
+ *  @return
+ *   - 0: Success
+ */
+static inline int rte_pktmbuf_alloc_bulk(struct rte_mempool *pool,
+struct rte_mbuf **mbufs, unsigned count)
+{
+   unsigned idx = 0;
+   int rc;
+
+   rc = rte_mempool_get_bulk(pool, (void **)mbufs, count);
+   if (unlikely(rc))
+   return rc;
+
+   switch (count % 4) {
+   case 0: while (idx != count) {
+   RTE_MBUF_ASSERT(rte_mbuf_refcnt_read(mbufs[idx]) == 0);
+   rte_mbuf_refcnt_set(mbufs[idx], 1);
+   rte_pktmbuf_reset(mbufs[idx]);
+   idx++;
+   case 3:
+   RTE_MBUF_ASSERT(rte_mbuf_refcnt_read(mbufs[idx]) == 0);
+   rte_mbuf_refcnt_set(mbufs[idx], 1);
+   rte_pktmbuf_reset(mbufs[idx]);
+   idx++;
+   case 2:
+   RTE_MBUF_ASSERT(rte_mbuf_refcnt_read(mbufs[idx]) == 0);
+   rte_mbuf_refcnt_set(mbufs[idx], 1);
+   rte_pktmbuf_reset(mbufs[idx]);
+   idx++;
+   case 1:
+   RTE_MBUF_ASSERT(rte_mbuf_refcnt_read(mbufs[idx]) == 0);
+   rte_mbuf_refcnt_set(mbufs[idx], 1);
+   rte_pktmbuf_reset(mbufs[idx]);
+   idx++;
+   }
+   }
+   return 0;
+}
+
+/**
  * Attach packet mbuf to another packet mbuf.
  *
  * After attachment we refer the mbuf we attached as 'indirect',
-- 
1.8.1.4



[dpdk-dev] [PATCH v3 0/2] provide rte_pktmbuf_alloc_bulk API and call it in vhost dequeue

2015-12-23 Thread Huawei Xie
v3 changes:
 move while after case 0
 add context about duff's device and why we use while loop in the commit
message

v2 changes:
 unroll the loop in rte_pktmbuf_alloc_bulk to help the performance

For a symmetric rte_pktmbuf_free_bulk: if the app knows that in its
scenarios the mbufs are all simple mbufs, i.e. they meet the following
requirements:
 * no multiple segments
 * not an indirect mbuf
 * refcnt is 1
 * belong to the same mbuf memory pool
then it could directly call rte_mempool_put to free the bulk of mbufs;
otherwise rte_pktmbuf_free_bulk has to call rte_pktmbuf_free to free the
mbufs one by one.
This patchset does not provide this symmetric implementation.



Huawei Xie (2):
  mbuf: provide rte_pktmbuf_alloc_bulk API
  vhost: call rte_pktmbuf_alloc_bulk in vhost dequeue

 lib/librte_mbuf/rte_mbuf.h| 49 +++
 lib/librte_vhost/vhost_rxtx.c | 35 +++
 2 files changed, 71 insertions(+), 13 deletions(-)

-- 
1.8.1.4



[dpdk-dev] [PATCH] i40e: fix inverted check for ETH_TXQ_FLAGS_NOREFCOUNT

2015-12-23 Thread Rich Lane
The no-refcount path was being taken without the application opting in to it.

Reported-by: Mike Stolarchuk 
Signed-off-by: Rich Lane 
---
 drivers/net/i40e/i40e_rxtx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index 39d94ec..d0bdeb9 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -1762,7 +1762,7 @@ i40e_tx_free_bufs(struct i40e_tx_queue *txq)
for (i = 0; i < txq->tx_rs_thresh; i++)
rte_prefetch0((txep + i)->mbuf);

-   if (!(txq->txq_flags & (uint32_t)ETH_TXQ_FLAGS_NOREFCOUNT)) {
+   if (txq->txq_flags & (uint32_t)ETH_TXQ_FLAGS_NOREFCOUNT) {
for (i = 0; i < txq->tx_rs_thresh; ++i, ++txep) {
rte_mempool_put(txep->mbuf->pool, txep->mbuf);
txep->mbuf = NULL;
-- 
1.9.1
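The inverted test is easy to reproduce in isolation: the mempool fast path must run only when the application has explicitly opted in by setting the flag. A minimal sketch of the corrected predicate (the flag value and function name here are illustrative, not the DPDK definitions):

```c
#include <assert.h>
#include <stdint.h>

#define TXQ_FLAGS_NOREFCOUNT 0x0002u  /* illustrative flag value */

/* Returns 1 when the queue may skip refcount handling and return
 * mbufs straight to the mempool, i.e. the application opted in by
 * setting the no-refcount flag. The original bug negated this test,
 * taking the fast path for every application that had NOT opted in,
 * which frees mbufs whose refcount may still be held elsewhere. */
static int use_fast_free(uint32_t txq_flags)
{
	return (txq_flags & TXQ_FLAGS_NOREFCOUNT) != 0;
}
```

The one-character fix (dropping the `!`) flips exactly this predicate in i40e_tx_free_bufs().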