[dpdk-dev] removing mbuf error flags

2016-04-29 Thread Olivier MATZ
Hi,

On 04/29/2016 10:00 PM, Arnon Warshavsky wrote:
> 
> 
> On Fri, Apr 29, 2016 at 9:24 PM, Jay Rolette wrote:
> 
> On Fri, Apr 29, 2016 at 1:16 PM, Don Provan wrote:
> 
> > >From: Olivier Matz [mailto:olivier.matz at 6wind.com]
> > >Subject: [dpdk-dev] removing mbuf error flags
> > >
> > >My opinion is that invalid packets should not be given to the
> > >application and only a statistic counter should be incremented.
> >
> > The idea of an application that handles bad packets is perfectly valid.
> > Most applications don't want to see them, of course, but, conceptually,
> > some applications would want to ask for bad packets because they are
> > specifically designed to handle various networking problems including
> > those that result in bad packets that the application can look at and
> > report. Furthermore, it makes technical sense for DPDK to support such
> > applications.
> >
> > Having said that, I have no idea if that's why that field was added,
> > and I don't myself care if DPDK provides that feature in the future. I
> > just thought I'd put the idea out there in case it makes any difference
> > to you. If it were me, I'd probably decide it isn't hurting anything and
> > not bother to remove it in case some day someone wants to implement that
> > feature in one driver or another.
> >
> 
> Yep. Pretty much any networking security product needs to see malformed
> packets.
> 
> Jay
> 
> 
> +1 for letting the application see bad packets and decide what to do
> with them.
> We had some zero order insertion issues in the past where the ability to
> let the application capture malformed/unexpected packets was very helpful.

The point is that today it's broken: no application running on top of
DPDK checks these flags because they are set to 0. If we decide to
assign a value to these flags, it will break working applications
because they don't expect to receive invalid packets. Maybe a proper
solution would be to enable these flags on demand in the PMD
configuration, and add a feature flag for this.
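
To make the on-demand idea concrete, here is a minimal sketch in
self-contained C. It uses stand-in types rather than the real rte_mbuf
or PMD configuration, and every name in it (rx_conf_sketch,
deliver_bad_pkts, SKETCH_RX_ERR_MASK, rx_filter) is hypothetical;
nothing like this exists in the API today.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins: not the real rte_mbuf or PMD configuration. */
#define SKETCH_RX_MAC_ERR   (1ULL << 0)
#define SKETCH_RX_OVERSIZE  (1ULL << 1)
#define SKETCH_RX_ERR_MASK  (SKETCH_RX_MAC_ERR | SKETCH_RX_OVERSIZE)

struct pkt_sketch {
    uint64_t ol_flags;          /* error bits as reported by hardware */
};

struct rx_conf_sketch {
    bool deliver_bad_pkts;      /* assumed opt-in knob, off by default */
};

struct rx_stats_sketch {
    uint64_t ierrors;           /* bad packets dropped by the driver */
};

/*
 * Filter a received burst in place and return how many packets the
 * application gets to see. Without the opt-in, bad packets are only
 * counted, so existing applications keep their current behaviour.
 */
size_t
rx_filter(const struct rx_conf_sketch *conf, struct rx_stats_sketch *stats,
          struct pkt_sketch *pkts, size_t n)
{
    size_t i, kept = 0;

    for (i = 0; i < n; i++) {
        bool bad = (pkts[i].ol_flags & SKETCH_RX_ERR_MASK) != 0;

        if (bad && !conf->deliver_bad_pkts) {
            stats->ierrors++;
            continue;
        }
        pkts[kept++] = pkts[i];
    }
    return kept;
}
```

An application that never sets the knob sees only valid packets plus a
counter; one that sets it receives the flagged packets and has to check
the error bits itself.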

I think we should not keep things half-done too long. It's
confusing and useless as-is.

If some applications really need to see these malformed packets,
the API has to define in which conditions these flags are set and
what is expected in the mbuf data when one of these flags is set.
The only documentation we have now is:

  PKT_RX_OVERSIZE: Num of desc of an RX pkt oversize.
  PKT_RX_HBUF_OVERFLOW: Header buffer overflow.
  PKT_RX_RECIP_ERR: Hardware processing error.
  PKT_RX_MAC_ERR: MAC error.

If it's not better defined, I don't know how an application could
use these flags.

Also, the PMDs should not behave differently by default.

If someone commits to working on this in the coming weeks, I'll be
happy to help; otherwise I still think the current state has to be reverted.


Regards,
Olivier


[dpdk-dev] [RFC 0/4] Include resources in tests

2016-04-29 Thread Jan Viktorin
On Fri, 29 Apr 2016 15:42:27 +0100
Bruce Richardson  wrote:

> On Fri, Apr 29, 2016 at 03:11:32PM +0200, Jan Viktorin wrote:
> > Hello,
> > 
> > this patch set introduces a mechanism to include a resource (in general a
> > blob) into the test binary. This makes tests less dependent on the target
> > testing environment. The first use case is testing of PCI bus scan by
> > changing the hard-coded path (/sys/bus/pci/devices) to something different
> > and providing a fake tree of devices with the test. It can help with testing
> > of device-tree parsing as I've proposed in [1] where such a mechanism was
> > missing at that time. I'd like to use such a framework for the SoC infra
> > testing as well.
> > 
> > The patch set introduces a struct resource into the app/test. The resource
> > is generic to include any kind of binary data. The binary data can be
> > created in C or linked as an object file (created by objcopy). I am not sure
> > where to place the objcopy logic and how to perform guessing of the objcopy
> > arguments as they are pretty non-standard.
> > 
> > To include a complex resource (a file hierarchy), the last patch implements
> > an archive extraction logic. So, it is possible to include a tar archive and
> > unpack it before a test starts. Any ideas how to do this in a better way are
> > welcome.
> > 
> > [1] http://comments.gmane.org/gmane.comp.networking.dpdk.devel/36545
> > 
> > Regards
> > Jan Viktorin
> >   
> 
> Hi Jan,
> 
> this looks really interesting, especially since just yesterday I was looking
> at taking the million-entry lpm test routing table out of the C code and into
> a separate resource file, in this case an ini file.

If it is an automatic test case which can show some regressions over
time then it is a good example of the resource usage.

> 
> In terms of a solution, I'm not convinced of the placing of the blobs inside
> the test binary. I think a better solution would be to allow the different
> autotests to take parameters from the commandline, so that the user can
> specify the path to the file to use for the test. What would be your opinion
> of such a scheme?

I think that we are already at this stage. You can read parameters e.g.
from the environment vars (so in fact, no changes to the DPDK testing
code are needed). The thing is that I want to say "run tests" and don't
care about any parameters to receive the result.

It might be possible to pass parameters to the tests (optionally) to
reuse the testing code base for different cases. However, initially,
the test (in my opinion) should run on any architecture on any device
without any configuration and return consistent results.

How can I test probing of PCI devices on my PC when I have a different
network card than the author of the test? I cannot, or I have to pass
parameters which is not what I want as it complicates every single
such test (and I have to understand those tests or understand the
parameters syntax even though the code is unrelated to my work). I just
want to check that the DPDK works as expected - without regressions.

And what about the case when I have no such card (running automatic
tests on a build server)?

I don't like including the resources in binaries but I didn't find
any better way yet. I need to easily install the tests together with
the testing data (resources) to an embedded device and perform the
tests on the target system. When you are testing x86 code on your x86
you don't have to care too much about the location of resources. But I
do. And there is no standard way (at the moment) how to install the
resources together with the testing code.

What I like about this solution is that the DPDK git repository would
contain the testing data as a real file hierarchy (e.g. the
fake /sys/bus/...). After build and install of DPDK, the file hierarchy
is packed (and compressed if needed) together with the tests. So moving
to the target platform works without any other changes to the
testing/install infrastructure. I could potentially run PCI tests on my
ARM board without any PCI available and don't have to ignore the
failures of those tests (as they simply pass).
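
For readers who have not looked at the patches, the following is a rough
sketch of what a named resource with a lookup could look like. The field
layout and the helpers resource_find() and resource_size() are
assumptions made for illustration and may not match the RFC; the blob
here is defined directly in C.

```c
#include <stddef.h>
#include <string.h>

/* Assumed layout; the RFC's real struct resource may differ. */
struct resource {
    const char *name;           /* lookup key used by a test */
    const char *begin;          /* first byte of the blob */
    const char *end;            /* one past the last byte */
};

/*
 * A blob created directly in C. For a blob linked in with
 * "objcopy -I binary", begin/end would instead point at the
 * _binary_<file>_start and _binary_<file>_end symbols that
 * objcopy generates.
 */
static const char fake_pci_ids[] = "0000:00:1f.0\n0000:01:00.0\n";

static const struct resource resources[] = {
    { "fake_pci_ids", fake_pci_ids,
      fake_pci_ids + sizeof(fake_pci_ids) - 1 },
};

/* Size of the blob in bytes. */
size_t
resource_size(const struct resource *r)
{
    return (size_t)(r->end - r->begin);
}

/* Find a resource by name, or return NULL if it is not compiled in. */
const struct resource *
resource_find(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof(resources) / sizeof(resources[0]); i++) {
        if (strcmp(resources[i].name, name) == 0)
            return &resources[i];
    }
    return NULL;
}
```

A test would call resource_find() at start-up and skip or fail cleanly
when the result is NULL, with no dependence on files present on the
target.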

Regards
Jan

> 
> /Bruce



-- 
  Jan Viktorin                E-mail: Viktorin at RehiveTech.com
  System Architect            Web:    www.RehiveTech.com
  RehiveTech
  Brno, Czech Republic


[dpdk-dev] Flow Director Example?

2016-04-29 Thread Alex Forster
Hi guys, apologies if this is the wrong list, but the others look pretty bare.

We have a 32 core server that has two X520-QDA1 NICs with 2x10G ports plugged 
into each. I'm using 2016.1 (latest stable) with ixgbe 4.3.15 (latest stable). 
I'm setting up 8 RX queues per port, and I'd like Flow Director in signature 
mode (?) to place packets into queues based on a hash of destination IPv4 or 
IPv6 address. However, I can't figure out rte_fdir_conf, and despite a good 
amount of trial and error, each of my ports is still only using one of the RX 
queues I set up.

Would anyone be able to point me in the right direction here? Thanks in advance!

Alex Forster


[dpdk-dev] [PATCH v2] examples/performance-thread: fix size of destination port ids in l3fwd-thread

2016-04-29 Thread Tomasz Kulasek
After the IPv4 next hop was extended in the lpm library, the size of the
dst_port array was changed from 16 to 32 bits in the l3fwd-thread example,
without modifying the rest of the path written for 16-bit values.

This patch takes a similar approach to commit 8353a36a9b4b
("examples/l3fwd: fix size of destination port ids"): it restores the
16-bit size for destination port ids and does the necessary conversion
from 32 to 16 bits after lpm_lookupx4.

Fixes: dc81ebbacaeb ("lpm: extend IPv4 next hop field")

Signed-off-by: Tomasz Kulasek 
---
v2:
 - split into two patches

 examples/performance-thread/l3fwd-thread/main.c |   20 
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/examples/performance-thread/l3fwd-thread/main.c b/examples/performance-thread/l3fwd-thread/main.c
index 4ad16ea..c008d6a 100644
--- a/examples/performance-thread/l3fwd-thread/main.c
+++ b/examples/performance-thread/l3fwd-thread/main.c
@@ -1307,7 +1307,7 @@ l3fwd_simple_forward(struct rte_mbuf *m, uint8_t portid)
  * to BAD_PORT value.
  */
 static inline __attribute__((always_inline)) void
-rfc1812_process(struct ipv4_hdr *ipv4_hdr, uint32_t *dp, uint32_t ptype)
+rfc1812_process(struct ipv4_hdr *ipv4_hdr, uint16_t *dp, uint32_t ptype)
 {
uint8_t ihl;

@@ -1326,7 +1326,7 @@ rfc1812_process(struct ipv4_hdr *ipv4_hdr, uint32_t *dp, uint32_t ptype)
 }

 #else
-#define rfc1812_process(mb, dp) do { } while (0)
+#define rfc1812_process(mb, dp, ptype)  do { } while (0)
 #endif /* DO_RFC_1812_CHECKS */
 #endif /* APP_LOOKUP_LPM && ENABLE_MULTI_BUFFER_OPTIMIZE */

@@ -1364,7 +1364,7 @@ get_dst_port(struct rte_mbuf *pkt, uint32_t dst_ipv4, uint8_t portid)
 }

 static inline void
-process_packet(struct rte_mbuf *pkt, uint32_t *dst_port, uint8_t portid)
+process_packet(struct rte_mbuf *pkt, uint16_t *dst_port, uint8_t portid)
 {
struct ether_hdr *eth_hdr;
struct ipv4_hdr *ipv4_hdr;
@@ -1431,9 +1431,9 @@ processx4_step1(struct rte_mbuf *pkt[FWDSTEP],
 static inline void
 processx4_step2(__m128i dip,
uint32_t ipv4_flag,
-   uint32_t portid,
+   uint8_t portid,
struct rte_mbuf *pkt[FWDSTEP],
-   uint32_t dprt[FWDSTEP])
+   uint16_t dprt[FWDSTEP])
 {
rte_xmm_t dst;
const __m128i bswap_mask = _mm_set_epi8(12, 13, 14, 15, 8, 9, 10, 11,
@@ -1445,7 +1445,11 @@ processx4_step2(__m128i dip,
/* if all 4 packets are IPV4. */
if (likely(ipv4_flag)) {
rte_lpm_lookupx4(RTE_PER_LCORE(lcore_conf)->ipv4_lookup_struct, dip,
-   dprt, portid);
+   dst.u32, portid);
+
+   /* get rid of unused upper 16 bit for each dport. */
+   dst.x = _mm_packs_epi32(dst.x, dst.x);
+   *(uint64_t *)dprt = dst.u64[0];
} else {
dst.x = dip;
dprt[0] = get_dst_port(pkt[0], dst.u32[0], portid);
@@ -1460,7 +1464,7 @@ processx4_step2(__m128i dip,
  * Perform RFC1812 checks and updates for IPV4 packets.
  */
 static inline void
-processx4_step3(struct rte_mbuf *pkt[FWDSTEP], uint32_t dst_port[FWDSTEP])
+processx4_step3(struct rte_mbuf *pkt[FWDSTEP], uint16_t dst_port[FWDSTEP])
 {
__m128i te[FWDSTEP];
__m128i ve[FWDSTEP];
@@ -1679,7 +1683,7 @@ process_burst(struct rte_mbuf *pkts_burst[MAX_PKT_BURST], int nb_rx,
int32_t k;
uint16_t dlp;
uint16_t *lp;
-   uint32_t dst_port[MAX_PKT_BURST];
+   uint16_t dst_port[MAX_PKT_BURST];
__m128i dip[MAX_PKT_BURST / FWDSTEP];
uint32_t ipv4_flag[MAX_PKT_BURST / FWDSTEP];
uint16_t pnum[MAX_PKT_BURST + 1];
-- 
1.7.9.5
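
A note on the conversion step in processx4_step2() above:
_mm_packs_epi32() narrows with signed saturation, it does not simply
drop the upper 16 bits. The scalar model below shows what happens to one
32-bit lane; pack_lane() and pack_dst_ports() are illustrative helpers
written for this note and are not part of the patch.

```c
#include <stdint.h>

/*
 * Scalar model of one lane of _mm_packs_epi32(): the 32-bit input is
 * interpreted as a signed value and clamped to the int16_t range. The
 * result is returned as the raw 16-bit pattern.
 */
uint16_t
pack_lane(uint32_t v)
{
    if (v & 0x80000000u)        /* negative when read as int32_t */
        return (v >= 0xFFFF8000u) ? (uint16_t)v : (uint16_t)0x8000u;
    return (v > 0x7FFFu) ? (uint16_t)0x7FFFu : (uint16_t)v;
}

/* Convert four 32-bit lookup results into four 16-bit port ids. */
void
pack_dst_ports(const uint32_t hop[4], uint16_t dprt[4])
{
    int i;

    for (i = 0; i < 4; i++)
        dprt[i] = pack_lane(hop[i]);
}
```

For values that fit in 15 bits the result equals the input, which covers
port ids. A lookup result above 0x7FFF would come out as 0x7FFF rather
than as its low 16 bits.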



[dpdk-dev] ethtool doesnt work on some interface after unbinding dpdk

2016-04-29 Thread Gopakumar Choorakkot Edakkunni
Just to update this thread: with power management for IGB disabled, this
problem is not seen. So for now that's the "workaround".

Rgds,
Gopa.

On Mon, Apr 18, 2016 at 3:08 AM, Remy Horton  wrote:

>
> On 15/04/2016 23:56, Gopakumar Choorakkot Edakkunni wrote:
>
>> This time the problem statement is more narrowed down.
>>
>> 1. dpdk is enabled on the interface, interfaces bound to igb_uio
>> 2. kill the process using dpdk
>> 3. rmmod rte_kni
>> 4. rmmod igb_uio
>> 5. bind interface to igb
>> 6. ethtool, ifconfig up/down etc.. works for approximately 30 seconds,
>> and then stops working
>>
>
> Hmm.. can you try that but with rte_kni left out completely? KNI hooks
> into the Linux network stack and I think it at the least needs eliminating as
> a causal factor. Can you also try using uio_pci_generic rather than igb_uio?
>
> Those aside, I'm suspecting driver issues, so seeing if I can get one of
> the driver test guys to have a look at this..
>
>
> Regards,
>
> ..Remy
>


[dpdk-dev] removing mbuf error flags

2016-04-29 Thread Don Provan
>From: Olivier Matz [mailto:olivier.matz at 6wind.com] 
>Subject: [dpdk-dev] removing mbuf error flags
>
>My opinion is that invalid packets should not be given to the application and 
>only a statistic counter should be incremented.

The idea of an application that handles bad packets is perfectly valid. Most 
applications don't want to see them, of course, but, conceptually, some 
applications would want to ask for bad packets because they are specifically 
designed to handle various networking problems including those that result in 
bad packets that the application can look at and report. Furthermore, it makes 
technical sense for DPDK to support such applications.

Having said that, I have no idea if that's why that field was added, and I 
don't myself care if DPDK provides that feature in the future. I just thought 
I'd put the idea out there in case it makes any difference to you. If it were 
me, I'd probably decide it isn't hurting anything and not bother to remove it 
in case some day someone wants to implement that feature in one driver or 
another.

-don provan
dprovan at bivio.net



[dpdk-dev] [PATCH] ip_frag : Fix double-free of chained mbufs

2016-04-29 Thread Thomas Monjalon
> > If any fragment hole is found in ipv4_frag_reassemble() and
> > ipv6_frag_reassemble(), the whole ip_frag_pkt mbufs are moved to death-row.
> > Any mbufs already chained to another mbuf are freed multiple times as
> > they are still in the ip_frag_pkt array.
> > 
> > Signed-off-by: cychong 
> 
> Acked-by: Konstantin Ananyev 

Applied, thanks


[dpdk-dev] [RFC PATCH] ivshmem ring aliases

2016-04-29 Thread Thomas Monjalon
2016-02-24 11:33, David Verbeiren:
> The goal of this patch is to allow VMs to use standard ring names regardless
> of the names given to the rings by the host environment. It applies to
> configurations using ivshmem.
> 
> With shared memory rings, all VMs share a single namespace for the rings.
> However, a VM will typically expect to find its rings with a pre-determined
> name (e.g. p1_rx, p1_tx) regardless of how it's deployed, inserted in a
> service chain, or of which other VMs are deployed alongside it. Hence, it is
> desirable to introduce a level of indirection where the host can set a
> mapping from the actual ring names (e.g. dpdkr0_rx|tx with OVS) to the names
> that will be visible in the VM. This patch provides a simple implementation
> of such a mapping scheme.
> 
> Since the mapping must be VM specific, the aliases are inserted into the
> IVSHMEM metadata area by the host and the guest side uses those aliases when
> doing rte_ring_lookup().
> 
> A new function, rte_ivshmem_add_ring_alias(), is provided in librte_ivshmem
> to populate alias entries in the host environment when creating the per-VM
> metadata.

I'm still not sure this library is a good idea at all.
This patch continues the tradition of librte_ivshmem by adding more
#ifdef in the code (in rte_ring here).
We could also comment on the compile time values or the checkpatch warnings.
But more importantly, what is the use case of this library and why is
it important to have such support in DPDK?

> --- a/lib/librte_ring/rte_ring.c
> +++ b/lib/librte_ring/rte_ring.c
> @@ -352,6 +352,12 @@ rte_ring_lookup(const char *name)
>   struct rte_ring *r = NULL;
>   struct rte_ring_list *ring_list;
>  
> +#ifdef RTE_LIBRTE_IVSHMEM
> + const char * target_name  = rte_eal_ivshmem_alias_get(name);
> + if (target_name)
> + name = target_name;
> +#endif

This #ifdef should not exist.
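
One common way to keep such conditionals out of callers is to hide them
behind a helper that resolves every name to itself when the feature is
compiled out. The sketch below only illustrates that pattern and is not
a proposal for the actual fix; SKETCH_IVSHMEM, ring_alias_resolve() and
the alias table are hypothetical, and the ring names are the examples
quoted above.

```c
#include <stddef.h>
#include <string.h>

/*
 * Build-time switch standing in for the real option; remove the define
 * to build the variant without alias support.
 */
#define SKETCH_IVSHMEM 1

#ifdef SKETCH_IVSHMEM
struct ring_alias {
    const char *alias;          /* name the guest asks for */
    const char *target;         /* actual ring name on the host */
};

static const struct ring_alias alias_table[] = {
    { "p1_rx", "dpdkr0_rx" },
    { "p1_tx", "dpdkr0_tx" },
};

/* Map an alias to its target; unknown names resolve to themselves. */
const char *
ring_alias_resolve(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof(alias_table) / sizeof(alias_table[0]); i++) {
        if (strcmp(alias_table[i].alias, name) == 0)
            return alias_table[i].target;
    }
    return name;
}
#else
/* Feature disabled: every name resolves to itself. */
const char *
ring_alias_resolve(const char *name)
{
    return name;
}
#endif

/* The lookup path calls the helper unconditionally, with no #ifdef. */
const char *
ring_lookup_name(const char *name)
{
    return ring_alias_resolve(name);
}
```

The conditional still exists, but only where the helper is defined, so
generic code such as a ring lookup does not need to know about ivshmem.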


[dpdk-dev] [PATCH] examples: remove useless checking

2016-04-29 Thread Mauricio Vasquez B
The rte_eth_dev_count() function will never return a value greater
than RTE_MAX_ETHPORTS, so that check is useless.

Signed-off-by: Mauricio Vasquez B 
---
 app/proc_info/main.c | 4 
 app/test/test_pmd_perf.c | 3 ---
 doc/guides/sample_app_ug/l2_forward_job_stats.rst| 3 ---
 doc/guides/sample_app_ug/l2_forward_real_virtual.rst | 3 ---
 doc/guides/sample_app_ug/link_status_intr.rst| 3 ---
 examples/dpdk_qat/main.c | 2 --
 examples/ip_fragmentation/main.c | 4 +---
 examples/ip_reassembly/main.c| 4 +---
 examples/ipsec-secgw/ipsec-secgw.c   | 4 
 examples/l2fwd-crypto/main.c | 3 ---
 examples/l2fwd-ivshmem/host/host.c   | 3 ---
 examples/l2fwd-jobstats/main.c   | 3 ---
 examples/l2fwd-keepalive/main.c  | 3 ---
 examples/l2fwd/main.c| 3 ---
 examples/l3fwd-acl/main.c| 2 --
 examples/l3fwd-power/main.c  | 3 ---
 examples/l3fwd-vf/main.c | 2 --
 examples/l3fwd/main.c| 2 --
 examples/link_status_interrupt/main.c| 3 ---
 examples/multi_process/l2fwd_fork/main.c | 3 ---
 examples/performance-thread/l3fwd-thread/main.c  | 2 --
 examples/tep_termination/main.c  | 2 --
 examples/vhost/main.c| 2 --
 examples/vhost_xen/main.c| 2 --
 examples/vmdq/main.c | 2 --
 examples/vmdq_dcb/main.c | 2 --
 26 files changed, 2 insertions(+), 70 deletions(-)

diff --git a/app/proc_info/main.c b/app/proc_info/main.c
index 341176d..5f83092 100644
--- a/app/proc_info/main.c
+++ b/app/proc_info/main.c
@@ -327,10 +327,6 @@ main(int argc, char **argv)
if (nb_ports == 0)
rte_exit(EXIT_FAILURE, "No Ethernet ports - bye\n");

-
-   if (nb_ports > RTE_MAX_ETHPORTS)
-   nb_ports = RTE_MAX_ETHPORTS;
-
/* If no port mask was specified*/
if (enabled_port_mask == 0)
enabled_port_mask = 0x;
diff --git a/app/test/test_pmd_perf.c b/app/test/test_pmd_perf.c
index 59803f7..3d56cd2 100644
--- a/app/test/test_pmd_perf.c
+++ b/app/test/test_pmd_perf.c
@@ -709,9 +709,6 @@ test_pmd_perf(void)
return -1;
}

-   if (nb_ports > RTE_MAX_ETHPORTS)
-   nb_ports = RTE_MAX_ETHPORTS;
-
nb_lcores = rte_lcore_count();

memset(lcore_conf, 0, sizeof(lcore_conf));
diff --git a/doc/guides/sample_app_ug/l2_forward_job_stats.rst b/doc/guides/sample_app_ug/l2_forward_job_stats.rst
index 03d9977..2444e36 100644
--- a/doc/guides/sample_app_ug/l2_forward_job_stats.rst
+++ b/doc/guides/sample_app_ug/l2_forward_job_stats.rst
@@ -238,9 +238,6 @@ in the *DPDK Programmer's Guide* and the *DPDK API Reference*.
 if (nb_ports == 0)
 rte_exit(EXIT_FAILURE, "No Ethernet ports - bye\n");

-if (nb_ports > RTE_MAX_ETHPORTS)
-nb_ports = RTE_MAX_ETHPORTS;
-
 /* reset l2fwd_dst_ports */

 for (portid = 0; portid < RTE_MAX_ETHPORTS; portid++)
diff --git a/doc/guides/sample_app_ug/l2_forward_real_virtual.rst b/doc/guides/sample_app_ug/l2_forward_real_virtual.rst
index e77d67c..b51b2dc 100644
--- a/doc/guides/sample_app_ug/l2_forward_real_virtual.rst
+++ b/doc/guides/sample_app_ug/l2_forward_real_virtual.rst
@@ -242,9 +242,6 @@ in the *DPDK Programmer's Guide* - Rel 1.4 EAR and the *DPDK API Reference*.
 if (nb_ports == 0)
 rte_exit(EXIT_FAILURE, "No Ethernet ports - bye\n");

-if (nb_ports > RTE_MAX_ETHPORTS)
-nb_ports = RTE_MAX_ETHPORTS;
-
 /* reset l2fwd_dst_ports */

 for (portid = 0; portid < RTE_MAX_ETHPORTS; portid++)
diff --git a/doc/guides/sample_app_ug/link_status_intr.rst b/doc/guides/sample_app_ug/link_status_intr.rst
index a4dbb54..779df97 100644
--- a/doc/guides/sample_app_ug/link_status_intr.rst
+++ b/doc/guides/sample_app_ug/link_status_intr.rst
@@ -145,9 +145,6 @@ To fully understand this code, it is recommended to study the chapters that rela
 if (nb_ports == 0)
 rte_exit(EXIT_FAILURE, "No Ethernet ports - bye\n");

-if (nb_ports > RTE_MAX_ETHPORTS)
-nb_ports = RTE_MAX_ETHPORTS;
-
 /*
  * Each logical core is assigned a dedicated TX queue on each port.
  */
diff --git a/examples/dpdk_qat/main.c b/examples/dpdk_qat/main.c
index dc68989..3c6112d 100644
--- a/examples/dpdk_qat/main.c
+++ b/examples/dpdk_qat/main.c
@@ -661,8 +661,6 @@ main(int argc, char **argv)
return -1;

nb_ports = rte_eth_dev_count();
-   if (nb_ports > RTE_MAX_ETHPORTS)
-   nb_ports = RTE_MAX_ETHPORTS;

if (check_port_config(nb_ports) < 0)

[dpdk-dev] removing mbuf error flags

2016-04-29 Thread John Daley (johndale)
Hi,

> -----Original Message-----
> From: Olivier Matz [mailto:olivier.matz at 6wind.com]
> Sent: Friday, April 29, 2016 5:25 AM
> To: dev at dpdk.org; Zhang, Helin 
> Cc: Ananyev, Konstantin ; John Daley
> (johndale) 
> Subject: removing mbuf error flags
> 
> Hi,
> 
> In rte_mbuf.h, some rx flags have been set to 0 for a long time (nearly 2
> years). It means nobody uses them. They were introduced by the following
> commit:
> 
>   http://dpdk.org/browse/dpdk/commit/?id=c22265f6
> 
> As far as I understand, these flags were introduced to let the application
> know that a received packet is invalid.
> 
> The 2 drivers using them are i40e and enic. But as these flags are 0 today, it
> means that invalid packets are silently given to the application.
> 
> My opinion is that invalid packets should not be given to the application and
> only a statistic counter should be incremented.
> No application checks these flags today (in examples, or testpmd).
> 
> I would like to remove these flags.
> Thoughts?

I agree. Enic needs a little work to increment a counter and update internal 
indexes correctly. If you are in a hurry, feel free to 's/PKT_RX_MAC_ERR/0/' in 
enic for now.

-John


[dpdk-dev] [PATCH 0/2] l2fwd improvements

2016-04-29 Thread De Lara Guarch, Pablo


> -----Original Message-----
> From: Jerin Jacob [mailto:jerin.jacob at caviumnetworks.com]
> Sent: Tuesday, April 19, 2016 10:35 AM
> To: dev at dpdk.org
> Cc: thomas.monjalon at 6wind.com; Richardson, Bruce; De Lara Guarch,
> Pablo; Jerin Jacob
> Subject: [dpdk-dev] [PATCH 0/2] l2fwd improvements
> 
> 
> Jerin Jacob (2):
>   examples/l2fwd: remove number of cycles per second hardcording
>   examples/l2fwd: increase mempool cache size for better performance
> 
>  examples/l2fwd/main.c | 23 ++-
>  1 file changed, 14 insertions(+), 9 deletions(-)
> 
> --
> 2.1.0

Series-acked-by: Pablo de Lara 



[dpdk-dev] [PATCH] bond: inherit maximum rx packet length

2016-04-29 Thread Eric Kinzie
On Tue Apr 26 11:51:53 +0100 2016, Declan Doherty wrote:
> On 14/04/16 18:23, Eric Kinzie wrote:
> >   Instead of a hard-coded maximum receive length, allow the bond interface
> >   to inherit this limit from the first slave added.  This allows
> >   an application that uses jumbo frames to pass realistic values to
> >   rte_eth_dev_configure without causing an error.
> >
> >Signed-off-by: Eric Kinzie 
> >---
> ...
> >
> 
> Hey Eric, just one small thing, I think it probably makes sense to
> return the max rx pktlen for all slaves, so as we add each slave
> just check whether the value of the slave being added is larger than
> the current value.
> 
> @@ -385,6 +389,10 @@ __eth_bond_slave_add_lock_free(uint8_t bonded_port_id, uint8_t slave_port_id)
> internals->tx_offload_capa &= dev_info.tx_offload_capa;
> internals->flow_type_rss_offloads &=
> dev_info.flow_type_rss_offloads;
> 
> +   /* If new slave's max rx packet size is larger than current value then override */
> +   if (dev_info.max_rx_pktlen > internals->max_rx_pktlen)
> +   internals->max_rx_pktlen = dev_info.max_rx_pktlen;
> +
> 
> Declan

Declan, I sent an updated patch but now realize that I misread your
comments.  Is it a good idea to change the value once it's been set?
My patch now refuses to add a slave with a pktlen value that's smaller
than that of the first slave.
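
To put the two behaviours side by side, here is a small self-contained
sketch. struct bond_sketch and both functions are simplified stand-ins
written for this comparison, not the bonding PMD code.

```c
#include <stdint.h>

/* Simplified stand-in for the bond's private data. */
struct bond_sketch {
    uint32_t max_rx_pktlen;     /* limit advertised by the bond */
    unsigned int slave_count;
};

/* Policy from the review comment: keep the largest value seen so far. */
int
slave_add_take_max(struct bond_sketch *b, uint32_t slave_max_rx_pktlen)
{
    if (slave_max_rx_pktlen > b->max_rx_pktlen)
        b->max_rx_pktlen = slave_max_rx_pktlen;
    b->slave_count++;
    return 0;
}

/*
 * Policy from the updated patch: the first slave fixes the value and a
 * later slave with a smaller limit is refused.
 */
int
slave_add_inherit_first(struct bond_sketch *b, uint32_t slave_max_rx_pktlen)
{
    if (b->slave_count == 0)
        b->max_rx_pktlen = slave_max_rx_pktlen;
    else if (slave_max_rx_pktlen < b->max_rx_pktlen)
        return -1;
    b->slave_count++;
    return 0;
}
```

With the first policy the advertised limit can grow after the
application has read it, and it can end up above what a smaller slave
accepts. With the second the limit is stable, at the cost of rejecting
some slaves.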

Eric



[dpdk-dev] supported packet types

2016-04-29 Thread Olivier Matz
Hi,

The following commit introduces a function to list the supported
packet types of a device:

  http://dpdk.org/browse/dpdk/commit/?id=78a38edf66

I would like to know what "supported" precisely means.
Is it:

1/ - if a ptype is marked as supported, the driver MUST set
 this ptype if the packet matches the format described in rte_mbuf.h

   -> if the ptype is not recognized, the application is sure
  that the packet is not one of the supported ptype
   -> but it is difficult to take advantage of this inside an
      application that supports several different port models
      that do not support the same packet types

2/ - if a ptype is marked as supported, the driver CAN set
 the ptype if the packet matches the format described in rte_mbuf.h

   -> if a ptype is not recognized, the application does a software
  fallback
   -> in this case, it would be useless to have get_supported_ptype()

Can you confirm if the PMDs and l3fwd (the only user) expect 1/
or 2/ ?
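
To show how the two readings change application code, here is a small
sketch of the decision an application makes before trusting the
hardware result. The enum, the constants and needs_sw_ipv4_check() are
made up for this mail; they are not RTE_PTYPE_* values or an existing
API.

```c
#include <stdbool.h>
#include <stdint.h>

#define SK_PTYPE_UNKNOWN 0u
#define SK_PTYPE_L3_IPV4 1u     /* illustrative value, not RTE_PTYPE_* */

/* The two candidate meanings of "supported" listed above. */
enum supported_meaning { SUPPORTED_MUST, SUPPORTED_CAN };

/*
 * Decide whether the application has to parse the packet itself to
 * find out if it is IPv4.
 */
bool
needs_sw_ipv4_check(enum supported_meaning meaning, bool ipv4_supported,
                    uint32_t hw_ptype)
{
    if (hw_ptype == SK_PTYPE_L3_IPV4)
        return false;           /* hardware already classified it */
    if (meaning == SUPPORTED_MUST && ipv4_supported)
        return false;           /* a MUST driver would have flagged IPv4 */
    return true;                /* otherwise fall back to software */
}
```

Under 1/ the supported list lets the application skip the software
parse for unrecognized packets; under 2/ the list never changes the
outcome, which is why the API would bring little in that case.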

Can you elaborate on the advantages of having this API?

And a supplementary question: if a ptype is not marked as supported,
is it forbidden for a driver to set this ptype anyway? Because we can
imagine a hardware that can only recognize packets in some conditions
(ex: can recognize IPv4 if there is no vlan).

I think properly defining the meaning of "supported" is mandatory
for an application to be able to use this feature, and to keep
PMDs from behaving differently because the API is unclear (like we've
already seen for other features).


Thanks,
Olivier


[dpdk-dev] [PATCH v2] scripts: add script for generating customised build config

2016-04-29 Thread Bruce Richardson
This patch adds in the configure.py script file. It can be used to
generate custom build-time configurations for DPDK either manually on
the commandline or by calling it from other scripts. It takes as parameters
the base config template to use, and output directory to which the modified
configuration will be written. Other optional parameters are then taken
as y/n values which should be adjusted in the config file, and a special
-m flag is provided to override the default RTE_MACHINE setting in the
config template too.

Example, to create a build configuration with extra non-default PMDs
enabled, and the kernel modules disabled:

  ./scripts/configure.py -b $RTE_TARGET -o $RTE_TARGET PMD_PCAP=y \
 IGB_UIO=n KNI_KMOD=n MLX4_PMD=y MLX5_PMD=y SZEDATA2=y \
 NFP_PMD=y BNX2X_PMD=y

See the included help text in the script for more details.
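
The help text further down in the patch says that overrides match on
partial values, so that "DEBUG=y" turns on all options ending in
"DEBUG". As a rough model of that rule only, the C sketch below applies
one override to one config line. It is not how configure.py is
implemented; apply_override() is a hypothetical helper and the line is
assumed to have no trailing newline.

```c
#include <string.h>

/*
 * Apply one "NAME=y|n" override to a single "KEY=VALUE" config line.
 * The line is changed only when KEY ends with NAME and VALUE is a plain
 * y or n. Returns 1 if the line matched, 0 otherwise.
 */
int
apply_override(char *line, const char *name, char value)
{
    char *eq = strchr(line, '=');
    size_t key_len, name_len = strlen(name);

    if (eq == NULL || (value != 'y' && value != 'n'))
        return 0;
    key_len = (size_t)(eq - line);
    if (name_len == 0 || name_len > key_len)
        return 0;
    /* Suffix match: compare NAME with the tail of KEY. */
    if (memcmp(eq - name_len, name, name_len) != 0)
        return 0;
    /* Only y/n settings are toggled; string options are left alone. */
    if ((eq[1] != 'y' && eq[1] != 'n') || eq[2] != '\0')
        return 0;
    eq[1] = value;
    return 1;
}
```

Running this over every line of the base config for each command-line
override gives the described effect, including leaving string options
such as the machine type untouched.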

Signed-off-by: Bruce Richardson 

---

V2: Add listing configs in help text and on error with invalid cfg.
Renamed from dpdk_config.py to configure.py
---
 scripts/configure.py | 205 +++
 1 file changed, 205 insertions(+)
 create mode 100755 scripts/configure.py

diff --git a/scripts/configure.py b/scripts/configure.py
new file mode 100755
index 000..58b8734
--- /dev/null
+++ b/scripts/configure.py
@@ -0,0 +1,205 @@
+#! /usr/bin/env python
+#
+#   BSD LICENSE
+#
+#   Copyright(c) 2016 Intel Corporation. All rights reserved.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+# * Redistributions of source code must retain the above copyright
+#   notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+#   notice, this list of conditions and the following disclaimer in
+#   the documentation and/or other materials provided with the
+#   distribution.
+# * Neither the name of Intel Corporation nor the names of its
+#   contributors may be used to endorse or promote products derived
+#   from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+#
+
+from __future__ import print_function
+import sys
+import os.path
+import getopt
+import subprocess
+
+out = "build"
+base = None
+machine = None
+
+
+def print_usage(cmd, f):
+    print("""Usage: %s -b  [-o ] [options...] """
+          % os.path.basename(cmd), file=f)
+
+
+def get_valid_configs():
+    c = [f[10:] for f in os.listdir("config") if f.startswith("defconfig_")]
+    c.sort()
+    return c
+
+
+def print_help(cmd):
+    print_usage(cmd, sys.stdout)
+    print("""
+Generates a new DPDK build time configuration in the given output directory,
+using as a starting point the provided base configuration. Once completed
+successfully, DPDK can then be built using that configuration by running the
+command "make -C ".
+
+-b, --base-config=CONFIG
+Use the configuration given by CONFIG as the starting point for the
+build configuration. CONFIG must be a valid DPDK default configuration
+provided by a file "defconfig_CONFIG" in the top level "config"
+directory.
+Valid --base-config values:\n\t\t%s
+
+-o, --output-dir=DIR
+Use DIR as the resulting output directory. If unspecified the default
+value of "build" is used.
+
+-m, --machine-type=TYPE
+Override the machine-type value given in the default configuration by
+setting it to TYPE. Using regular options, regular y/n values can be
+changed, but not string options, so this explicit override for machine
+type exists to allow changing it.
+
+-h, --help
+Print this help text and then exit
+
+Other options passed after these on the commandline should be in the format of
+"name=y|n" and are overrides to be applied to the configuration to toggle the
+y|n settings in the file.
+
+Matches are applied on partial values, so for example "DEBUG=y" will turn on
+all options ending in "DEBUG". This means that a full option need not always be
+specified, for example, "CONFIG_RTE_LIBRTE_" prefixes can generally be omitted.
+

[dpdk-dev] [PATCH v2] mem: fix freeing of memzone used by ivshmem

2016-04-29 Thread Thomas Monjalon
2016-04-21 12:21, Sergio Gonzalez Monroy:
> On 15/04/2016 09:29, Mauricio Vasquez B wrote:
> > Although the previous implementation returned an error when trying to
> > release a memzone assigned to an ivshmem device, it still freed it.
> >
> > Fixes: cd10c42eb5bc ("mem: fix ivshmem freeing")
> >
> > Signed-off-by: Mauricio Vasquez B
> > ---
> > v2:
> > solved compilation problem when ivshmem is disabled
> >   lib/librte_eal/common/eal_common_memzone.c | 10 +++---
> >   1 file changed, 7 insertions(+), 3 deletions(-)
> 
> This time I have waited to see the test-report (which I should have done
> for the v1).
> 
> Acked-by: Sergio Gonzalez Monroy 

Applied, thanks


[dpdk-dev] [PATCH v2] kni: add chained mbufs support

2016-04-29 Thread Thomas Monjalon
> > rx_q fifo may have chained mbufs, merge them into single skb before
> > handing to the network stack.
> > 
> > Signed-off-by: Ferruh Yigit 
> Acked-by: Helin Zhang 

Applied, thanks


[dpdk-dev] [PATCH v3 1/1] cmdline: add any multi string mode to token string

2016-04-29 Thread Piotr Azarewicz
While parsing token string there may be several modes:
- fixed single string
- multi-choice single string
- any single string

This patch adds one more mode - any multi string.
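
As an illustration of the difference between "any single string" and
"any multi string", here is a small stand-alone sketch. It is not the
librte_cmdline code: parse_any_string(), ends_token() and the enum are
hypothetical, and the set of terminating characters is an assumption
chosen to agree with the test vectors added below.

```c
#include <stddef.h>
#include <string.h>

enum str_mode { STR_ANY_SINGLE, STR_ANY_MULTI };

/* Hypothetical helper: does c terminate a token in the given mode? */
static int
ends_token(enum str_mode mode, char c)
{
    if (c == '\0' || c == '#' || c == '\n' || c == '\r')
        return 1;               /* end of input, comment or end of line */
    if (mode == STR_ANY_SINGLE && (c == ' ' || c == '\t'))
        return 1;               /* blanks only end a single string */
    return 0;
}

/*
 * Copy one token from src to dst (capacity cap, including the NUL).
 * Returns the token length, or -1 if it does not fit.
 */
int
parse_any_string(enum str_mode mode, const char *src, char *dst, size_t cap)
{
    size_t n = 0;

    while (!ends_token(mode, src[n]))
        n++;
    if (n + 1 > cap)
        return -1;
    memcpy(dst, src, n);
    dst[n] = '\0';
    return (int)n;
}
```

In this sketch a single string stops at the first blank, while a multi
string runs to the end of the command and keeps embedded and trailing
blanks up to a comment character.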

Signed-off-by: Piotr Azarewicz 
Acked-by: Olivier Matz 
---

v3 changes:
- add a comment above the definition of TOKEN_STRING_MULTI

v2 changes:
- add cmdline_multi_string_t type for the new mode

 app/test/test_cmdline_string.c|   15 ++
 lib/librte_cmdline/cmdline_parse.c|8 ++
 lib/librte_cmdline/cmdline_parse.h|3 ++
 lib/librte_cmdline/cmdline_parse_string.c |   43 +
 lib/librte_cmdline/cmdline_parse_string.h |   15 ++
 5 files changed, 68 insertions(+), 16 deletions(-)

diff --git a/app/test/test_cmdline_string.c b/app/test/test_cmdline_string.c
index 915a7d7..c5bb9c0 100644
--- a/app/test/test_cmdline_string.c
+++ b/app/test/test_cmdline_string.c
@@ -35,6 +35,7 @@
 #include 
 #include 

+#include 
 #include 

 #include 
@@ -65,9 +66,10 @@ struct string_elt_str string_elt_strs[] = {
{"one#two\nwith\nnewlines#three", "two\nwith\nnewlines", 1},
 };

-#if CMDLINE_TEST_BUFSIZE < STR_TOKEN_SIZE
+#if (CMDLINE_TEST_BUFSIZE < STR_TOKEN_SIZE) \
+|| (CMDLINE_TEST_BUFSIZE < STR_MULTI_TOKEN_SIZE)
 #undef CMDLINE_TEST_BUFSIZE
-#define CMDLINE_TEST_BUFSIZE STR_TOKEN_SIZE
+#define CMDLINE_TEST_BUFSIZE RTE_MAX(STR_TOKEN_SIZE, STR_MULTI_TOKEN_SIZE)
 #endif

 struct string_nb_str {
@@ -97,6 +99,11 @@ struct string_parse_str string_parse_strs[] = {
{"two with\rgarbage\tcharacters\n",
"one#two with\rgarbage\tcharacters\n#three",
"two with\rgarbage\tcharacters\n"},
+   {"one two", "one", "one"}, /* fixed string */
+   {"one two", TOKEN_STRING_MULTI, "one two"}, /* multi string */
+   {"one two", NULL, "one"}, /* any string */
+   {"one two #three", TOKEN_STRING_MULTI, "one two "},
+   /* multi string with comment */
 };


@@ -124,7 +131,6 @@ struct string_invalid_str string_invalid_strs[] = {
 "toolong!!!toolong!!!toolong!!!toolong!!!toolong!!!toolong!!!"
 "toolong!!!toolong!!!toolong!!!toolong!!!toolong!!!toolong!!!"
 "toolong!!!" },
-{"invalid", ""},
 {"", "invalid"}
 };

@@ -350,8 +356,7 @@ test_parse_string_valid(void)
string_parse_strs[i].str, help_str);
return -1;
}
-   if (strncmp(buf, string_parse_strs[i].result,
-   sizeof(string_parse_strs[i].result) - 1) != 0) {
+   if (strcmp(buf, string_parse_strs[i].result) != 0) {
printf("Error: result mismatch!\n");
return -1;
}
diff --git a/lib/librte_cmdline/cmdline_parse.c 
b/lib/librte_cmdline/cmdline_parse.c
index 24a6ed6..b496067 100644
--- a/lib/librte_cmdline/cmdline_parse.c
+++ b/lib/librte_cmdline/cmdline_parse.c
@@ -118,6 +118,14 @@ cmdline_isendoftoken(char c)
return 0;
 }

+int
+cmdline_isendofcommand(char c)
+{
+   if (!c || iscomment(c) || isendofline(c))
+   return 1;
+   return 0;
+}
+
 static unsigned int
 nb_common_chars(const char * s1, const char * s2)
 {
diff --git a/lib/librte_cmdline/cmdline_parse.h 
b/lib/librte_cmdline/cmdline_parse.h
index 4b25c45..4ac05d6 100644
--- a/lib/librte_cmdline/cmdline_parse.h
+++ b/lib/librte_cmdline/cmdline_parse.h
@@ -184,6 +184,9 @@ int cmdline_complete(struct cmdline *cl, const char *buf, 
int *state,
  * isendofline(c)) */
 int cmdline_isendoftoken(char c);

+/* return true if(!c || iscomment(c) || isendofline(c)) */
+int cmdline_isendofcommand(char c);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_cmdline/cmdline_parse_string.c 
b/lib/librte_cmdline/cmdline_parse_string.c
index 45883b3..35917a7 100644
--- a/lib/librte_cmdline/cmdline_parse_string.c
+++ b/lib/librte_cmdline/cmdline_parse_string.c
@@ -76,9 +76,10 @@ struct cmdline_token_ops cmdline_token_string_ops = {
.get_help = cmdline_get_help_string,
 };

-#define MULTISTRING_HELP "Mul-choice STRING"
-#define ANYSTRING_HELP   "Any STRING"
-#define FIXEDSTRING_HELP "Fixed STRING"
+#define CHOICESTRING_HELP "Mul-choice STRING"
+#define ANYSTRING_HELP"Any STRING"
+#define ANYSTRINGS_HELP   "Any STRINGS"
+#define FIXEDSTRING_HELP  "Fixed STRING"

 static unsigned int
 get_token_len(const char *s)
@@ -123,8 +124,8 @@ cmdline_parse_string(cmdline_parse_token_hdr_t *tk, const 
char *buf, void *res,

sd = >string_data;

-   /* fixed string */
-   if (sd->str) {
+   /* fixed string (known single token) */
+   if ((sd->str != NULL) && (strcmp(sd->str, TOKEN_STRING_MULTI) != 0)) {
str = sd->str;
do {
token_len = get_token_len(str);
@@ -148,7 +149,21 @@ 

[dpdk-dev] supported packet types

2016-04-29 Thread Ananyev, Konstantin
Hi Olivier,


> Hi,
> 
> The following commit introduces a function to list the supported
> packet types of a device:
> 
>   http://dpdk.org/browse/dpdk/commit/?id=78a38edf66
> 
> I would like to know what "supported" precisely means.
> Is it:
> 
> 1/ - if a ptype is marked as supported, the driver MUST set
>  this ptype if the packet matches the format described in rte_mbuf.h
> 
>-> if the ptype is not recognized, the application is sure
>   that the packet is not one of the supported ptype
>-> but it is difficult to take advantage of this inside an
>   application that supports several different port models
>   that do not support the same packet types
> 
> 2/ - if a ptype is marked as supported, the driver CAN set
>  the ptype if the packet matches the format described in rte_mbuf.h
> 
>-> if a ptype is not recognized, the application does a software
>   fallback
>-> in this case, it would be useless to have the get_supported_ptype()
> 
> Can you confirm if the PMDs and l3fwd (the only user) expect 1/
> or 2/ ?

1) 

> 
> Can you elaborate on the advantages of having this API?

An application can rely on the information provided by the PMD and avoid
parsing the packet manually to recognise its ptype.
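
To make the idea concrete, here is a rough sketch of how an application might decide between trusting the PMD and falling back to software parsing. It assumes interpretation 1/ and that the application has already obtained the list of supported ptypes from the driver; the function name and the array layout are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: `supported` is the ptype list reported by the
 * driver. Returns true when the PMD guarantees to set `ptype`, so the
 * application can skip its own parsing for that packet type. */
bool
ptype_is_supported(const uint32_t *supported, size_t n, uint32_t ptype)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (supported[i] == ptype)
			return true;
	return false;
}
```

An application would typically run this check once per port at init time and install a software classification callback only for the ptypes that are missing.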

> 
> And a supplementary question: if a ptype is not marked as supported,
> is it forbidden for a driver to set this ptype anyway?

I suppose it is not forbidden, but there is no guarantee from PMD that it
would be able to recognise that ptype.

Konstantin

> Because we can
> imagine a hardware that can only recognize packets in some conditions
> (ex: can recognize IPv4 if there is no vlan).
> 
> I think properly defining the meaning of "supported" is mandatory
> for an application to be able to use this feature, and to keep
> PMDs from behaving differently because the API is unclear (like we've
> already seen for other features).
> 
> 
> Thanks,
> Olivier


[dpdk-dev] [PATCH 15/15] vfio: change VFIO init to be extendable

2016-04-29 Thread Jan Viktorin
We can now simply OR each subsystem's result into vfio_enabled, which makes
it possible to add new VFIO subsystems (vfio_platform).
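
The accumulation pattern can be sketched in isolation as follows. The callback type and the two probe functions are hypothetical stand-ins for per-subsystem checks such as pci_vfio_is_enabled(); they are not part of the patch.

```c
#include <stddef.h>

/* Hypothetical per-subsystem check: returns non-zero when enabled. */
typedef int (*vfio_probe_fn)(void);

/* Stand-in probes used only for illustration. */
int probe_off(void) { return 0; }
int probe_on(void) { return 1; }

/* OR the results together: non-zero if any subsystem enabled VFIO. */
int
any_vfio_enabled(const vfio_probe_fn *probes, size_t n)
{
	int vfio_enabled = 0;
	size_t i;

	for (i = 0; i < n; i++)
		vfio_enabled |= probes[i]();
	return vfio_enabled;
}
```

The point of the OR is that the multi-process sync thread has to be set up once if at least one subsystem uses VFIO, regardless of which one.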

Signed-off-by: Jan Viktorin 
---
 lib/librte_eal/linuxapp/eal/eal.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal.c 
b/lib/librte_eal/linuxapp/eal/eal.c
index 92225cf..1549fe5 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -727,12 +727,14 @@ rte_eal_iopl_init(void)
 #ifdef VFIO_PRESENT
 static int rte_eal_vfio_setup(void)
 {
-   if (internal_config.no_pci)
-   return 0;
+   int vfio_enabled = 0;

-   pci_vfio_enable();
+   if (!internal_config.no_pci) {
+   pci_vfio_enable();
+   vfio_enabled |= pci_vfio_is_enabled();
+   }

-   if (pci_vfio_is_enabled()) {
+   if (vfio_enabled) {

/* if we are primary process, create a thread to communicate 
with
 * secondary processes. the thread will use a socket to wait for
-- 
2.8.0



[dpdk-dev] [PATCH 14/15] vfio: initialize vfio out of the PCI subsystem

2016-04-29 Thread Jan Viktorin
VFIO no longer depends on PCI, so it can be initialized outside of
the PCI subsystem.

Signed-off-by: Jan Viktorin 
---
 lib/librte_eal/linuxapp/eal/eal.c  | 31 ++
 lib/librte_eal/linuxapp/eal/eal_pci.c  | 17 +---
 lib/librte_eal/linuxapp/eal/eal_pci_init.h |  3 ---
 lib/librte_eal/linuxapp/eal/eal_vfio.h |  3 +++
 4 files changed, 35 insertions(+), 19 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal.c 
b/lib/librte_eal/linuxapp/eal/eal.c
index 8aafd51..92225cf 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -82,6 +82,7 @@
 #include "eal_filesystem.h"
 #include "eal_hugepages.h"
 #include "eal_options.h"
+#include "eal_vfio.h"

 #define MEMSIZE_IF_NO_HUGE_PAGE (64ULL * 1024ULL * 1024ULL)

@@ -723,6 +724,31 @@ rte_eal_iopl_init(void)
 #endif
 }

+#ifdef VFIO_PRESENT
+static int rte_eal_vfio_setup(void)
+{
+   if (internal_config.no_pci)
+   return 0;
+
+   pci_vfio_enable();
+
+   if (pci_vfio_is_enabled()) {
+
+   /* if we are primary process, create a thread to communicate 
with
+* secondary processes. the thread will use a socket to wait for
+* requests from secondary process to send open file 
descriptors,
+* because VFIO does not allow multiple open descriptors on a 
group or
+* VFIO container.
+*/
+   if (internal_config.process_type == RTE_PROC_PRIMARY &&
+   vfio_mp_sync_setup() < 0)
+   return -1;
+   }
+
+   return 0;
+}
+#endif
+
 /* Launch threads, called at application init(). */
 int
 rte_eal_init(int argc, char **argv)
@@ -788,6 +814,11 @@ rte_eal_init(int argc, char **argv)
if (rte_eal_pci_init() < 0)
rte_panic("Cannot init PCI\n");

+#ifdef VFIO_PRESENT
+   if (rte_eal_vfio_setup() < 0)
+   rte_panic("Cannot init VFIO\n");
+#endif
+
 #ifdef RTE_LIBRTE_IVSHMEM
if (rte_eal_ivshmem_init() < 0)
rte_panic("Cannot init IVSHMEM\n");
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c 
b/lib/librte_eal/linuxapp/eal/eal_pci.c
index 732e21c..1ca4c1f 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -730,21 +730,6 @@ rte_eal_pci_init(void)
RTE_LOG(ERR, EAL, "%s(): Cannot scan PCI bus\n", __func__);
return -1;
}
-#ifdef VFIO_PRESENT
-   pci_vfio_enable();
-
-   if (pci_vfio_is_enabled()) {
-
-   /* if we are primary process, create a thread to communicate 
with
-* secondary processes. the thread will use a socket to wait for
-* requests from secondary process to send open file 
descriptors,
-* because VFIO does not allow multiple open descriptors on a 
group or
-* VFIO container.
-*/
-   if (internal_config.process_type == RTE_PROC_PRIMARY &&
-   vfio_mp_sync_setup() < 0)
-   return -1;
-   }
-#endif
+
return 0;
 }
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_init.h 
b/lib/librte_eal/linuxapp/eal/eal_pci_init.h
index cdfdada..9eb9cb7 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_init.h
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_init.h
@@ -64,9 +64,6 @@ int pci_uio_ioport_unmap(struct rte_pci_ioport *p);

 #ifdef VFIO_PRESENT

-int pci_vfio_enable(void);
-int pci_vfio_is_enabled(void);
-
 /* access config space */
 int pci_vfio_read_config(const struct rte_intr_handle *intr_handle,
 void *buf, size_t len, off_t offs);
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.h 
b/lib/librte_eal/linuxapp/eal/eal_vfio.h
index 471627f..c2c8f80 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.h
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.h
@@ -143,6 +143,9 @@ int vfio_setup_device(const char *sysfs_base, const char 
*dev_addr,
 int vfio_enable(const char *modname);
 int vfio_is_enabled(const char *modname);

+int pci_vfio_enable(void);
+int pci_vfio_is_enabled(void);
+
 #define SOCKET_REQ_CONTAINER 0x100
 #define SOCKET_REQ_GROUP 0x200
 #define SOCKET_OK 0x0
-- 
2.8.0



[dpdk-dev] [PATCH 13/15] vfio: rename and generalize eal_pci_vfio_mp_sync

2016-04-29 Thread Jan Viktorin
The module eal_pci_vfio_mp_sync is quite generic so it shouldn't contain the
"pci" string in its name. The internal functions don't need the pci_* prefix.

Signed-off-by: Jan Viktorin 
---
 lib/librte_eal/linuxapp/eal/Makefile  | 4 ++--
 lib/librte_eal/linuxapp/eal/eal_pci.c | 2 +-
 lib/librte_eal/linuxapp/eal/eal_pci_init.h| 1 -
 lib/librte_eal/linuxapp/eal/eal_vfio.h| 2 ++
 .../linuxapp/eal/{eal_pci_vfio_mp_sync.c => eal_vfio_mp_sync.c}   | 8 
 5 files changed, 9 insertions(+), 8 deletions(-)
 rename lib/librte_eal/linuxapp/eal/{eal_pci_vfio_mp_sync.c => 
eal_vfio_mp_sync.c} (98%)

diff --git a/lib/librte_eal/linuxapp/eal/Makefile 
b/lib/librte_eal/linuxapp/eal/Makefile
index 128eb87..c6f6b53 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -64,10 +64,10 @@ endif
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_thread.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_log.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_vfio.c
+SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_vfio_mp_sync.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_pci.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_pci_uio.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_pci_vfio.c
-SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_pci_vfio_mp_sync.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_debug.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_lcore.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_timer.c
@@ -108,7 +108,7 @@ CFLAGS_eal_common_cpuflags.o := $(CPUFLAGS_LIST)

 CFLAGS_eal.o := -D_GNU_SOURCE
 CFLAGS_eal_interrupts.o := -D_GNU_SOURCE
-CFLAGS_eal_pci_vfio_mp_sync.o := -D_GNU_SOURCE
+CFLAGS_eal_vfio_mp_sync.o := -D_GNU_SOURCE
 CFLAGS_eal_timer.o := -D_GNU_SOURCE
 CFLAGS_eal_lcore.o := -D_GNU_SOURCE
 CFLAGS_eal_thread.o := -D_GNU_SOURCE
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c 
b/lib/librte_eal/linuxapp/eal/eal_pci.c
index bdc08a0..732e21c 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -742,7 +742,7 @@ rte_eal_pci_init(void)
 * VFIO container.
 */
if (internal_config.process_type == RTE_PROC_PRIMARY &&
-   pci_vfio_mp_sync_setup() < 0)
+   vfio_mp_sync_setup() < 0)
return -1;
}
 #endif
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_init.h 
b/lib/librte_eal/linuxapp/eal/eal_pci_init.h
index 41d2dd6..cdfdada 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_init.h
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_init.h
@@ -66,7 +66,6 @@ int pci_uio_ioport_unmap(struct rte_pci_ioport *p);

 int pci_vfio_enable(void);
 int pci_vfio_is_enabled(void);
-int pci_vfio_mp_sync_setup(void);

 /* access config space */
 int pci_vfio_read_config(const struct rte_intr_handle *intr_handle,
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.h 
b/lib/librte_eal/linuxapp/eal/eal_vfio.h
index f8728bd..471627f 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.h
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.h
@@ -149,6 +149,8 @@ int vfio_is_enabled(const char *modname);
 #define SOCKET_NO_FD 0x1
 #define SOCKET_ERR 0xFF

+int vfio_mp_sync_setup(void);
+
 #define VFIO_PRESENT
 #endif /* kernel version */
 #endif /* RTE_EAL_VFIO */
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio_mp_sync.c 
b/lib/librte_eal/linuxapp/eal/eal_vfio_mp_sync.c
similarity index 98%
rename from lib/librte_eal/linuxapp/eal/eal_pci_vfio_mp_sync.c
rename to lib/librte_eal/linuxapp/eal/eal_vfio_mp_sync.c
index b2aa33f..bff6e81 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_vfio_mp_sync.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio_mp_sync.c
@@ -265,7 +265,7 @@ vfio_mp_sync_connect_to_primary(void)
  * socket listening thread for primary process
  */
 static __attribute__((noreturn)) void *
-pci_vfio_mp_sync_thread(void __rte_unused * arg)
+vfio_mp_sync_thread(void __rte_unused * arg)
 {
int ret, fd, vfio_group_no;

@@ -376,7 +376,7 @@ vfio_mp_sync_socket_setup(void)
  * set up a local socket and tell it to listen for incoming connections
  */
 int
-pci_vfio_mp_sync_setup(void)
+vfio_mp_sync_setup(void)
 {
int ret;
char thread_name[RTE_MAX_THREAD_NAME_LEN];
@@ -387,7 +387,7 @@ pci_vfio_mp_sync_setup(void)
}

ret = pthread_create(&socket_thread, NULL,
-   pci_vfio_mp_sync_thread, NULL);
+   vfio_mp_sync_thread, NULL);
if (ret) {
RTE_LOG(ERR, EAL,
"Failed to create thread for communication with 
secondary processes!\n");
@@ -396,7 +396,7 @@ pci_vfio_mp_sync_setup(void)
}

/* Set thread_name for aid in debugging. */
-   snprintf(thread_name, RTE_MAX_THREAD_NAME_LEN, "pci-vfio-sync");
+   snprintf(thread_name, RTE_MAX_THREAD_NAME_LEN, "vfio-sync");
ret = rte_thread_setname(socket_thread, thread_name);
 

[dpdk-dev] [PATCH 12/15] vfio: make vfio_*_dma_map and iommu_types private

2016-04-29 Thread Jan Viktorin
There is no longer any reason to expose these definitions, as no other module uses them.

Signed-off-by: Jan Viktorin 
---
 lib/librte_eal/linuxapp/eal/eal_vfio.c | 15 +--
 lib/librte_eal/linuxapp/eal/eal_vfio.h | 11 ---
 2 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c 
b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index 7dce880..3f03f45 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -47,6 +47,17 @@
 /* per-process VFIO config */
 static struct vfio_config vfio_cfg;

+static int vfio_type1_dma_map(int);
+static int vfio_noiommu_dma_map(int);
+
+/* IOMMU types we support */
+static const struct vfio_iommu_type iommu_types[] = {
+   /* x86 IOMMU, otherwise known as type 1 */
+   { RTE_VFIO_TYPE1, "Type 1", &vfio_type1_dma_map},
+   /* IOMMU-less mode */
+   { RTE_VFIO_NOIOMMU, "No-IOMMU", &vfio_noiommu_dma_map},
+};
+
 int
 vfio_get_group_fd(int iommu_group_no)
 {
@@ -477,7 +488,7 @@ vfio_get_group_no(const char *sysfs_base,
return 1;
 }

-int
+static int
 vfio_type1_dma_map(int vfio_container_fd)
 {
const struct rte_memseg *ms = rte_eal_get_physmem_layout();
@@ -509,7 +520,7 @@ vfio_type1_dma_map(int vfio_container_fd)
return 0;
 }

-int
+static int
 vfio_noiommu_dma_map(int __rte_unused vfio_container_fd)
 {
/* No-IOMMU mode does not need DMA mapping */
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.h 
b/lib/librte_eal/linuxapp/eal/eal_vfio.h
index d4532a5..f8728bd 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.h
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.h
@@ -149,17 +149,6 @@ int vfio_is_enabled(const char *modname);
 #define SOCKET_NO_FD 0x1
 #define SOCKET_ERR 0xFF

-int vfio_type1_dma_map(int);
-int vfio_noiommu_dma_map(int);
-
-/* IOMMU types we support */
-static const struct vfio_iommu_type iommu_types[] = {
-   /* x86 IOMMU, otherwise known as type 1 */
-   { RTE_VFIO_TYPE1, "Type 1", &vfio_type1_dma_map},
-   /* IOMMU-less mode */
-   { RTE_VFIO_NOIOMMU, "No-IOMMU", &vfio_noiommu_dma_map},
-};
-
 #define VFIO_PRESENT
 #endif /* kernel version */
 #endif /* RTE_EAL_VFIO */
-- 
2.8.0



[dpdk-dev] [PATCH 11/15] vfio: move global vfio_cfg to eal_vfio.c

2016-04-29 Thread Jan Viktorin
vfio_cfg is a module-global variable, so the functions that access it
have to be moved together with it:

* pci_vfio_get_group_fd
  - renamed to vfio_get_group_fd
  - pci_* version removed (no other call in EAL)

* pci_vfio_setup_device
  - renamed as vfio_setup_device

* pci_vfio_enable
  - renamed as vfio_enable
  - generalized to check for a specific vfio driver presence
  - pci_* specialization preserved as a wrapper

* pci_vfio_is_enabled
  - renamed as vfio_is_enabled
  - generalized to check for a specific vfio driver presence
to preserve the semantics of VFIO + PCI
  - pci_* specialization preserved as a wrapper

* clear_current_group
  - private function, just moved

To stop GCC complaining about "defined but not used", the private
function pci_vfio_get_group_no has been removed entirely.
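
The "generic function plus pci_* wrapper" arrangement listed above could look roughly like this. It is a sketch, not the code from the patch: the sketch_ prefix marks every name as hypothetical, and the module names "vfio_pci" and "vfio_platform" are assumptions about what would be passed as modname.

```c
#include <string.h>

/* Stand-in for the generic check. For the example we pretend that
 * only the "vfio_pci" kernel module is loaded. */
int
sketch_vfio_is_enabled(const char *modname)
{
	return strcmp(modname, "vfio_pci") == 0;
}

/* PCI specialization preserved as a thin wrapper. */
int
sketch_pci_vfio_is_enabled(void)
{
	return sketch_vfio_is_enabled("vfio_pci");
}

/* Another subsystem would only need its own wrapper. */
int
sketch_platform_vfio_is_enabled(void)
{
	return sketch_vfio_is_enabled("vfio_platform");
}
```

Keeping the wrappers means existing PCI callers do not change, while the module name becomes the only subsystem-specific piece.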

Signed-off-by: Jan Viktorin 
---
 lib/librte_eal/linuxapp/eal/eal_pci_init.h |   1 -
 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 282 +
 lib/librte_eal/linuxapp/eal/eal_pci_vfio_mp_sync.c |   2 +-
 lib/librte_eal/linuxapp/eal/eal_vfio.c | 272 
 lib/librte_eal/linuxapp/eal/eal_vfio.h |  17 ++
 5 files changed, 294 insertions(+), 280 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_init.h 
b/lib/librte_eal/linuxapp/eal/eal_pci_init.h
index 6e7a603..41d2dd6 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_init.h
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_init.h
@@ -84,7 +84,6 @@ int pci_vfio_ioport_unmap(struct rte_pci_ioport *p);

 /* map VFIO resource prototype */
 int pci_vfio_map_resource(struct rte_pci_device *dev);
-int pci_vfio_get_group_fd(int iommu_group_fd);

 #endif

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c 
b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
index cec6ce1..3d3f7dc 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
@@ -69,9 +69,6 @@ static struct rte_tailq_elem rte_vfio_tailq = {
 };
 EAL_REGISTER_TAILQ(rte_vfio_tailq)

-/* per-process VFIO config */
-static struct vfio_config vfio_cfg;
-
 int
 pci_vfio_read_config(const struct rte_intr_handle *intr_handle,
void *buf, size_t len, off_t offs)
@@ -299,240 +296,6 @@ pci_vfio_setup_interrupts(struct rte_pci_device *dev, int 
vfio_dev_fd)
return -1;
 }

-/* open group fd or get an existing one */
-int
-pci_vfio_get_group_fd(int iommu_group_no)
-{
-   int i;
-   int vfio_group_fd;
-   char filename[PATH_MAX];
-
-   /* check if we already have the group descriptor open */
-   for (i = 0; i < vfio_cfg.vfio_group_idx; i++)
-   if (vfio_cfg.vfio_groups[i].group_no == iommu_group_no)
-   return vfio_cfg.vfio_groups[i].fd;
-
-   /* if primary, try to open the group */
-   if (internal_config.process_type == RTE_PROC_PRIMARY) {
-   /* try regular group format */
-   snprintf(filename, sizeof(filename),
-VFIO_GROUP_FMT, iommu_group_no);
-   vfio_group_fd = open(filename, O_RDWR);
-   if (vfio_group_fd < 0) {
-   /* if file not found, it's not an error */
-   if (errno != ENOENT) {
-   RTE_LOG(ERR, EAL, "Cannot open %s: %s\n", 
filename,
-   strerror(errno));
-   return -1;
-   }
-
-   /* special case: try no-IOMMU path as well */
-   snprintf(filename, sizeof(filename),
-   VFIO_NOIOMMU_GROUP_FMT, iommu_group_no);
-   vfio_group_fd = open(filename, O_RDWR);
-   if (vfio_group_fd < 0) {
-   if (errno != ENOENT) {
-   RTE_LOG(ERR, EAL, "Cannot open %s: 
%s\n", filename,
-   strerror(errno));
-   return -1;
-   }
-   return 0;
-   }
-   /* noiommu group found */
-   }
-
-   /* if the fd is valid, create a new group for it */
-   if (vfio_cfg.vfio_group_idx == VFIO_MAX_GROUPS) {
-   RTE_LOG(ERR, EAL, "Maximum number of VFIO groups 
reached!\n");
-   close(vfio_group_fd);
-   return -1;
-   }
-   vfio_cfg.vfio_groups[vfio_cfg.vfio_group_idx].group_no = 
iommu_group_no;
-   vfio_cfg.vfio_groups[vfio_cfg.vfio_group_idx].fd = 
vfio_group_fd;
-   return vfio_group_fd;
-   }
-   /* if we're in a secondary process, request group fd from the primary
-* process via our socket
-*/
-   else {
-   int socket_fd, ret;
-
-   

[dpdk-dev] [PATCH 10/15] vfio: extract setup logic out of pci_vfio_map_resource

2016-04-29 Thread Jan Viktorin
The setup logic accesses the global vfio_cfg variable, which will be moved in
the following commits. We need to separate all accesses to this variable into
generic code.

Signed-off-by: Jan Viktorin 
---
 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 96 +++---
 1 file changed, 47 insertions(+), 49 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c 
b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
index 2b3dd2e..cec6ce1 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
@@ -411,37 +411,22 @@ clear_current_group(void)
vfio_cfg.vfio_groups[vfio_cfg.vfio_group_idx].fd = -1;
 }

-
-/*
- * map the PCI resources of a PCI device in virtual memory (VFIO version).
- * primary and secondary processes follow almost exactly the same path
+/**
+ * Set up vfio_cfg for the device identified by its address. It discovers
+ * the configured I/O MMU groups or sets a new one for the device. If a new
+ * group is assigned, the DMA mapping is performed.
+ * Returns 0 on success, a negative value on failure and a positive value in
+ * case the given device cannot be managed this way.
  */
-int
-pci_vfio_map_resource(struct rte_pci_device *dev)
+static int pci_vfio_setup_device(const char *pci_addr, int *vfio_dev_fd,
+   struct vfio_device_info *device_info)
 {
struct vfio_group_status group_status = {
.argsz = sizeof(group_status)
};
-   struct vfio_device_info device_info = { .argsz = sizeof(device_info) };
-   int vfio_group_fd, vfio_dev_fd;
+   int vfio_group_fd;
int iommu_group_no;
-   char pci_addr[PATH_MAX] = {0};
-   struct rte_pci_addr *loc = &dev->addr;
-   int i, ret, msix_bar;
-   struct mapped_pci_resource *vfio_res = NULL;
-   struct mapped_pci_res_list *vfio_res_list = 
RTE_TAILQ_CAST(rte_vfio_tailq.head, mapped_pci_res_list);
-
-   struct pci_map *maps;
-   uint32_t msix_table_offset = 0;
-   uint32_t msix_table_size = 0;
-   uint32_t ioport_bar;
-
-   dev->intr_handle.fd = -1;
-   dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
-
-   /* store PCI address string */
-   snprintf(pci_addr, sizeof(pci_addr), PCI_PRI_FMT,
-   loc->domain, loc->bus, loc->devid, loc->function);
+   int ret;

/* get group number */
ret = pci_vfio_get_group_no(pci_addr, &iommu_group_no);
@@ -476,26 +461,6 @@ pci_vfio_map_resource(struct rte_pci_device *dev)
}

/*
-* at this point, we know at least one port on this device is bound to 
VFIO,
-* so we can proceed to try and set this particular port up
-*/
-
-   /* check if the group is viable */
-   ret = ioctl(vfio_group_fd, VFIO_GROUP_GET_STATUS, &group_status);
-   if (ret) {
-   RTE_LOG(ERR, EAL, "  %s cannot get group status, "
-   "error %i (%s)\n", pci_addr, errno, 
strerror(errno));
-   close(vfio_group_fd);
-   clear_current_group();
-   return -1;
-   } else if (!(group_status.flags & VFIO_GROUP_FLAGS_VIABLE)) {
-   RTE_LOG(ERR, EAL, "  %s VFIO group is not viable!\n", pci_addr);
-   close(vfio_group_fd);
-   clear_current_group();
-   return -1;
-   }
-
-   /*
 * at this point, we know that this group is viable (meaning, all 
devices
 * are either bound to VFIO or not bound to anything)
 */
@@ -546,8 +511,8 @@ pci_vfio_map_resource(struct rte_pci_device *dev)
}

/* get a file descriptor for the device */
-   vfio_dev_fd = ioctl(vfio_group_fd, VFIO_GROUP_GET_DEVICE_FD, pci_addr);
-   if (vfio_dev_fd < 0) {
+   *vfio_dev_fd = ioctl(vfio_group_fd, VFIO_GROUP_GET_DEVICE_FD, pci_addr);
+   if (*vfio_dev_fd < 0) {
/* if we cannot get a device fd, this simply means that this
 * particular port is not bound to VFIO
 */
@@ -557,14 +522,47 @@ pci_vfio_map_resource(struct rte_pci_device *dev)
}

/* test and setup the device */
-   ret = ioctl(vfio_dev_fd, VFIO_DEVICE_GET_INFO, &device_info);
+   ret = ioctl(*vfio_dev_fd, VFIO_DEVICE_GET_INFO, device_info);
if (ret) {
RTE_LOG(ERR, EAL, "  %s cannot get device info, "
"error %i (%s)\n", pci_addr, errno, 
strerror(errno));
-   close(vfio_dev_fd);
+   close(*vfio_dev_fd);
return -1;
}

+   return 0;
+}
+
+/*
+ * map the PCI resources of a PCI device in virtual memory (VFIO version).
+ * primary and secondary processes follow almost exactly the same path
+ */
+int
+pci_vfio_map_resource(struct rte_pci_device *dev)
+{
+   struct vfio_device_info device_info = { .argsz = sizeof(device_info) };
+   char pci_addr[PATH_MAX] = {0};
+   int vfio_dev_fd;
+   struct rte_pci_addr *loc = 

[dpdk-dev] [PATCH 09/15] vfio: generalize pci_vfio_get_group_no

2016-04-29 Thread Jan Viktorin
Generalize pci_vfio_get_group_no so that it is no longer PCI-specific. Move the
general implementation to eal_vfio.c as vfio_get_group_no and keep the original
pci_vfio_get_group_no as a wrapper around it to avoid compilation
issues. The pci_vfio_get_group_no function will be removed later.
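
For readers unfamiliar with the lookup: the group number is the last path component of the iommu_group symlink target. The sketch below shows only that parsing step, under some assumptions: it uses strrchr() instead of rte_strsplit(), it rejects empty or non-numeric components more strictly than the original code, and the function name is hypothetical.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Parse the trailing component of a symlink target such as
 * "../../../kernel/iommu_groups/26" (example path, not from the patch).
 * Returns 1 and stores the number on success, -1 on a parse error. */
int
parse_iommu_group(const char *link_target, int *group_no)
{
	const char *last = strrchr(link_target, '/');
	char *end;
	long val;

	/* IOMMU group is always the last token */
	last = (last != NULL) ? last + 1 : link_target;

	errno = 0;
	val = strtol(last, &end, 10);
	if (end == last || *end != '\0' || errno != 0)
		return -1;
	if (val < 0 || val > INT_MAX)
		return -1;

	*group_no = (int)val;
	return 1;
}
```

The return convention follows the one documented for vfio_get_group_no in eal_vfio.h (1 on success, -1 on error); the "0 for non-existent group" case belongs to the readlink() step, which is left out here.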

Signed-off-by: Jan Viktorin 
---
 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 38 +-
 lib/librte_eal/linuxapp/eal/eal_vfio.c | 43 ++
 lib/librte_eal/linuxapp/eal/eal_vfio.h |  7 +
 3 files changed, 51 insertions(+), 37 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c 
b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
index 33c11a1..2b3dd2e 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
@@ -401,43 +401,7 @@ pci_vfio_get_group_fd(int iommu_group_no)
 static int
 pci_vfio_get_group_no(const char *pci_addr, int *iommu_group_no)
 {
-   char linkname[PATH_MAX];
-   char filename[PATH_MAX];
-   char *tok[16], *group_tok, *end;
-   int ret;
-
-   memset(linkname, 0, sizeof(linkname));
-   memset(filename, 0, sizeof(filename));
-
-   /* try to find out IOMMU group for this device */
-   snprintf(linkname, sizeof(linkname),
-SYSFS_PCI_DEVICES "/%s/iommu_group", pci_addr);
-
-   ret = readlink(linkname, filename, sizeof(filename));
-
-   /* if the link doesn't exist, no VFIO for us */
-   if (ret < 0)
-   return 0;
-
-   ret = rte_strsplit(filename, sizeof(filename),
-   tok, RTE_DIM(tok), '/');
-
-   if (ret <= 0) {
-   RTE_LOG(ERR, EAL, "  %s cannot get IOMMU group\n", pci_addr);
-   return -1;
-   }
-
-   /* IOMMU group is always the last token */
-   errno = 0;
-   group_tok = tok[ret - 1];
-   end = group_tok;
-   *iommu_group_no = strtol(group_tok, &end, 10);
-   if ((end != group_tok && *end != '\0') || errno != 0) {
-   RTE_LOG(ERR, EAL, "  %s error parsing IOMMU number!\n", 
pci_addr);
-   return -1;
-   }
-
-   return 1;
+   return vfio_get_group_no(SYSFS_PCI_DEVICES, pci_addr, iommu_group_no);
 }

 static void
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c 
b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index 3bcf55c..33d2e11 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -163,6 +163,49 @@ vfio_get_container_fd(void)
 }

 int
+vfio_get_group_no(const char *sysfs_base,
+   const char *dev_addr, int *iommu_group_no)
+{
+   char linkname[PATH_MAX];
+   char filename[PATH_MAX];
+   char *tok[16], *group_tok, *end;
+   int ret;
+
+   memset(linkname, 0, sizeof(linkname));
+   memset(filename, 0, sizeof(filename));
+
+   /* try to find out IOMMU group for this device */
+   snprintf(linkname, sizeof(linkname),
+"%s/%s/iommu_group", sysfs_base, dev_addr);
+
+   ret = readlink(linkname, filename, sizeof(filename));
+
+   /* if the link doesn't exist, no VFIO for us */
+   if (ret < 0)
+   return 0;
+
+   ret = rte_strsplit(filename, sizeof(filename),
+   tok, RTE_DIM(tok), '/');
+
+   if (ret <= 0) {
+   RTE_LOG(ERR, EAL, "  %s cannot get IOMMU group\n", dev_addr);
+   return -1;
+   }
+
+   /* IOMMU group is always the last token */
+   errno = 0;
+   group_tok = tok[ret - 1];
+   end = group_tok;
+   *iommu_group_no = strtol(group_tok, &end, 10);
+   if ((end != group_tok && *end != '\0') || errno != 0) {
+   RTE_LOG(ERR, EAL, "  %s error parsing IOMMU number!\n", 
dev_addr);
+   return -1;
+   }
+
+   return 1;
+}
+
+int
 vfio_type1_dma_map(int vfio_container_fd)
 {
const struct rte_memseg *ms = rte_eal_get_physmem_layout();
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.h 
b/lib/librte_eal/linuxapp/eal/eal_vfio.h
index 0fb05a4..619cd6b 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.h
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.h
@@ -119,6 +119,13 @@ vfio_has_supported_extensions(int vfio_container_fd);
 int
 vfio_get_container_fd(void);

+/* parse IOMMU group number for a device
+ * returns 1 on success, -1 for errors, 0 for non-existent group
+ */
+int
+vfio_get_group_no(const char *sysfs_base,
+   const char *dev_addr, int *iommu_group_no);
+
 #define SOCKET_REQ_CONTAINER 0x100
 #define SOCKET_REQ_GROUP 0x200
 #define SOCKET_OK 0x0
-- 
2.8.0



[dpdk-dev] [PATCH 08/15] vfio: generalize pci_vfio_get_container_fd

2016-04-29 Thread Jan Viktorin
The pci_vfio_get_container_fd is not PCI-specific. Move the implementation to
the eal_vfio.c as vfio_get_container_fd. No other code seems to call this
function.

Signed-off-by: Jan Viktorin 
---
 lib/librte_eal/linuxapp/eal/eal_pci_init.h |  1 -
 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 67 +
 lib/librte_eal/linuxapp/eal/eal_pci_vfio_mp_sync.c |  2 +-
 lib/librte_eal/linuxapp/eal/eal_vfio.c | 68 ++
 lib/librte_eal/linuxapp/eal/eal_vfio.h |  4 ++
 5 files changed, 74 insertions(+), 68 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_init.h 
b/lib/librte_eal/linuxapp/eal/eal_pci_init.h
index fd81f4d..6e7a603 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_init.h
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_init.h
@@ -85,7 +85,6 @@ int pci_vfio_ioport_unmap(struct rte_pci_ioport *p);
 /* map VFIO resource prototype */
 int pci_vfio_map_resource(struct rte_pci_device *dev);
 int pci_vfio_get_group_fd(int iommu_group_fd);
-int pci_vfio_get_container_fd(void);

 #endif

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c 
b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
index 21aded7..33c11a1 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
@@ -299,71 +299,6 @@ pci_vfio_setup_interrupts(struct rte_pci_device *dev, int 
vfio_dev_fd)
return -1;
 }

-/* open container fd or get an existing one */
-int
-pci_vfio_get_container_fd(void)
-{
-   int ret, vfio_container_fd;
-
-   /* if we're in a primary process, try to open the container */
-   if (internal_config.process_type == RTE_PROC_PRIMARY) {
-   vfio_container_fd = open(VFIO_CONTAINER_PATH, O_RDWR);
-   if (vfio_container_fd < 0) {
-   RTE_LOG(ERR, EAL, "  cannot open VFIO container, "
-   "error %i (%s)\n", errno, strerror(errno));
-   return -1;
-   }
-
-   /* check VFIO API version */
-   ret = ioctl(vfio_container_fd, VFIO_GET_API_VERSION);
-   if (ret != VFIO_API_VERSION) {
-   if (ret < 0)
-   RTE_LOG(ERR, EAL, "  could not get VFIO API version, "
-   "error %i (%s)\n", errno, strerror(errno));
-   else
-   RTE_LOG(ERR, EAL, "  unsupported VFIO API version!\n");
-   close(vfio_container_fd);
-   return -1;
-   }
-
-   ret = vfio_has_supported_extensions(vfio_container_fd);
-   if (ret) {
-   RTE_LOG(ERR, EAL, "  no supported IOMMU "
-   "extensions found!\n");
-   return -1;
-   }
-
-   return vfio_container_fd;
-   } else {
-   /*
-* if we're in a secondary process, request container fd from the
-* primary process via our socket
-*/
-   int socket_fd;
-
-   socket_fd = vfio_mp_sync_connect_to_primary();
-   if (socket_fd < 0) {
-   RTE_LOG(ERR, EAL, "  cannot connect to primary process!\n");
-   return -1;
-   }
-   if (vfio_mp_sync_send_request(socket_fd, SOCKET_REQ_CONTAINER) < 0) {
-   RTE_LOG(ERR, EAL, "  cannot request container fd!\n");
-   close(socket_fd);
-   return -1;
-   }
-   vfio_container_fd = vfio_mp_sync_receive_fd(socket_fd);
-   if (vfio_container_fd < 0) {
-   RTE_LOG(ERR, EAL, "  cannot get container fd!\n");
-   close(socket_fd);
-   return -1;
-   }
-   close(socket_fd);
-   return vfio_container_fd;
-   }
-
-   return -1;
-}
-
 /* open group fd or get an existing one */
 int
 pci_vfio_get_group_fd(int iommu_group_no)
@@ -950,7 +885,7 @@ pci_vfio_enable(void)
return 0;
}

-   vfio_cfg.vfio_container_fd = pci_vfio_get_container_fd();
+   vfio_cfg.vfio_container_fd = vfio_get_container_fd();

/* check if we have VFIO driver enabled */
if (vfio_cfg.vfio_container_fd != -1) {
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio_mp_sync.c 
b/lib/librte_eal/linuxapp/eal/eal_pci_vfio_mp_sync.c
index 26d966e..fbfa71d 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_vfio_mp_sync.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio_mp_sync.c
@@ -296,7 +296,7 @@ pci_vfio_mp_sync_thread(void __rte_unused * arg)

switch (ret) {
case SOCKET_REQ_CONTAINER:
-   fd = pci_vfio_get_container_fd();
+   fd = vfio_get_container_fd();
  

[dpdk-dev] [PATCH 07/15] vfio: move vfio-specific SOCKET_* constants

2016-04-29 Thread Jan Viktorin
The constants are not PCI-specific. Move them into the eal_vfio.h.

Signed-off-by: Jan Viktorin 
---
 lib/librte_eal/linuxapp/eal/eal_pci_init.h | 7 ---
 lib/librte_eal/linuxapp/eal/eal_vfio.h | 6 ++
 2 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_init.h 
b/lib/librte_eal/linuxapp/eal/eal_pci_init.h
index b4d7628..fd81f4d 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_init.h
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_init.h
@@ -87,13 +87,6 @@ int pci_vfio_map_resource(struct rte_pci_device *dev);
 int pci_vfio_get_group_fd(int iommu_group_fd);
 int pci_vfio_get_container_fd(void);

-/* socket comm protocol definitions */
-#define SOCKET_REQ_CONTAINER 0x100
-#define SOCKET_REQ_GROUP 0x200
-#define SOCKET_OK 0x0
-#define SOCKET_NO_FD 0x1
-#define SOCKET_ERR 0xFF
-
 #endif

 #endif /* EAL_PCI_INIT_H_ */
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.h 
b/lib/librte_eal/linuxapp/eal/eal_vfio.h
index 8cb0d1d..121df0a 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.h
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.h
@@ -115,6 +115,12 @@ vfio_set_iommu_type(int vfio_container_fd);
 int
 vfio_has_supported_extensions(int vfio_container_fd);

+#define SOCKET_REQ_CONTAINER 0x100
+#define SOCKET_REQ_GROUP 0x200
+#define SOCKET_OK 0x0
+#define SOCKET_NO_FD 0x1
+#define SOCKET_ERR 0xFF
+
 int vfio_type1_dma_map(int);
 int vfio_noiommu_dma_map(int);

-- 
2.8.0



[dpdk-dev] [PATCH 05/15] vfio: generalize pci_vfio_set_iommu_type

2016-04-29 Thread Jan Viktorin
The pci_vfio_set_iommu_type is not PCI-specific and it is a private function
of the eal_pci_vfio.c. We just rename the function and make it available even
for non-PCI devices.

Signed-off-by: Jan Viktorin 
---
 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 25 +
 lib/librte_eal/linuxapp/eal/eal_vfio.c | 22 ++
 lib/librte_eal/linuxapp/eal/eal_vfio.h |  4 
 3 files changed, 27 insertions(+), 24 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c 
b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
index cfa26ee..f82368f 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
@@ -203,29 +203,6 @@ pci_vfio_set_bus_master(int dev_fd)
return 0;
 }

-/* pick IOMMU type. returns a pointer to vfio_iommu_type or NULL for error */
-static const struct vfio_iommu_type *
-pci_vfio_set_iommu_type(int vfio_container_fd) {
-   unsigned idx;
-   for (idx = 0; idx < RTE_DIM(iommu_types); idx++) {
-   const struct vfio_iommu_type *t = &iommu_types[idx];
-
-   int ret = ioctl(vfio_container_fd, VFIO_SET_IOMMU,
-   t->type_id);
-   if (!ret) {
-   RTE_LOG(NOTICE, EAL, "  using IOMMU type %d (%s)\n",
-   t->type_id, t->name);
-   return t;
-   }
-   /* not an error, there may be more supported IOMMU types */
-   RTE_LOG(DEBUG, EAL, "  set IOMMU type %d (%s) failed, "
-   "error %i (%s)\n", t->type_id, t->name, errno,
-   strerror(errno));
-   }
-   /* if we didn't find a suitable IOMMU type, fail */
-   return NULL;
-}
-
 /* check if we have any supported extensions */
 static int
 pci_vfio_has_supported_extensions(int vfio_container_fd) {
@@ -689,7 +666,7 @@ pci_vfio_map_resource(struct rte_pci_device *dev)
vfio_cfg.vfio_container_has_dma == 0) {
/* select an IOMMU type which we will be using */
const struct vfio_iommu_type *t =
-   pci_vfio_set_iommu_type(vfio_cfg.vfio_container_fd);
+   vfio_set_iommu_type(vfio_cfg.vfio_container_fd);
if (!t) {
RTE_LOG(ERR, EAL, "  %s failed to select IOMMU type\n", pci_addr);
return -1;
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c 
b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index d3ffebe..ff85283 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -39,6 +39,28 @@

 #include "eal_vfio.h"

+const struct vfio_iommu_type *
+vfio_set_iommu_type(int vfio_container_fd) {
+   unsigned idx;
+   for (idx = 0; idx < RTE_DIM(iommu_types); idx++) {
+   const struct vfio_iommu_type *t = &iommu_types[idx];
+
+   int ret = ioctl(vfio_container_fd, VFIO_SET_IOMMU,
+   t->type_id);
+   if (!ret) {
+   RTE_LOG(NOTICE, EAL, "  using IOMMU type %d (%s)\n",
+   t->type_id, t->name);
+   return t;
+   }
+   /* not an error, there may be more supported IOMMU types */
+   RTE_LOG(DEBUG, EAL, "  set IOMMU type %d (%s) failed, "
+   "error %i (%s)\n", t->type_id, t->name, errno,
+   strerror(errno));
+   }
+   /* if we didn't find a suitable IOMMU type, fail */
+   return NULL;
+}
+
 int
 vfio_type1_dma_map(int vfio_container_fd)
 {
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.h 
b/lib/librte_eal/linuxapp/eal/eal_vfio.h
index c62f269..afbb98a 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.h
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.h
@@ -107,6 +107,10 @@ struct vfio_iommu_type {
vfio_dma_func_t dma_map_func;
 };

+/* pick IOMMU type. returns a pointer to vfio_iommu_type or NULL for error */
+const struct vfio_iommu_type *
+vfio_set_iommu_type(int vfio_container_fd);
+
 int vfio_type1_dma_map(int);
 int vfio_noiommu_dma_map(int);

-- 
2.8.0



[dpdk-dev] [PATCH 04/15] vfio: move vfio_iommu_type and dma_map functions to eal_vfio

2016-04-29 Thread Jan Viktorin
We make the iommu_types public temporarily here until the dependent code is
refactored. The iommu_types and dma_map functions will be made private
inside the eal_vfio module later.

Signed-off-by: Jan Viktorin 
---
 lib/librte_eal/linuxapp/eal/Makefile   |  1 +
 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 62 ---
 lib/librte_eal/linuxapp/eal/eal_vfio.c | 79 ++
 lib/librte_eal/linuxapp/eal/eal_vfio.h | 23 +
 4 files changed, 103 insertions(+), 62 deletions(-)
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_vfio.c

diff --git a/lib/librte_eal/linuxapp/eal/Makefile 
b/lib/librte_eal/linuxapp/eal/Makefile
index e109361..128eb87 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -63,6 +63,7 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_xen_memory.c
 endif
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_thread.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_log.c
+SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_vfio.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_pci.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_pci_uio.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_pci_vfio.c
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c 
b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
index d29b7f1..cfa26ee 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
@@ -72,68 +72,6 @@ EAL_REGISTER_TAILQ(rte_vfio_tailq)
 /* per-process VFIO config */
 static struct vfio_config vfio_cfg;

-/* DMA mapping function prototype.
- * Takes VFIO container fd as a parameter.
- * Returns 0 on success, -1 on error.
- * */
-typedef int (*vfio_dma_func_t)(int);
-
-struct vfio_iommu_type {
-   int type_id;
-   const char *name;
-   vfio_dma_func_t dma_map_func;
-};
-
-static int vfio_type1_dma_map(int);
-static int vfio_noiommu_dma_map(int);
-
-/* IOMMU types we support */
-static const struct vfio_iommu_type iommu_types[] = {
-   /* x86 IOMMU, otherwise known as type 1 */
-   { RTE_VFIO_TYPE1, "Type 1", &vfio_type1_dma_map},
-   /* IOMMU-less mode */
-   { RTE_VFIO_NOIOMMU, "No-IOMMU", &vfio_noiommu_dma_map},
-};
-
-int
-vfio_type1_dma_map(int vfio_container_fd)
-{
-   const struct rte_memseg *ms = rte_eal_get_physmem_layout();
-   int i, ret;
-
-   /* map all DPDK segments for DMA. use 1:1 PA to IOVA mapping */
-   for (i = 0; i < RTE_MAX_MEMSEG; i++) {
-   struct vfio_iommu_type1_dma_map dma_map;
-
-   if (ms[i].addr == NULL)
-   break;
-
-   memset(&dma_map, 0, sizeof(dma_map));
-   dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map);
-   dma_map.vaddr = ms[i].addr_64;
-   dma_map.size = ms[i].len;
-   dma_map.iova = ms[i].phys_addr;
-   dma_map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
-
-   ret = ioctl(vfio_container_fd, VFIO_IOMMU_MAP_DMA, &dma_map);
-
-   if (ret) {
-   RTE_LOG(ERR, EAL, "  cannot set up DMA remapping, "
-   "error %i (%s)\n", errno, strerror(errno));
-   return -1;
-   }
-   }
-
-   return 0;
-}
-
-int
-vfio_noiommu_dma_map(int __rte_unused vfio_container_fd)
-{
-   /* No-IOMMU mode does not need DMA mapping */
-   return 0;
-}
-
 int
 pci_vfio_read_config(const struct rte_intr_handle *intr_handle,
void *buf, size_t len, off_t offs)
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c 
b/lib/librte_eal/linuxapp/eal/eal_vfio.c
new file mode 100644
index 000..d3ffebe
--- /dev/null
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -0,0 +1,79 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR 

[dpdk-dev] [PATCH 03/15] vfio: move common vfio constants to eal_vfio.h

2016-04-29 Thread Jan Viktorin
Signed-off-by: Jan Viktorin 
---
 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 7 ---
 lib/librte_eal/linuxapp/eal/eal_vfio.h | 7 +++
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c 
b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
index 257162b..d29b7f1 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
@@ -69,13 +69,6 @@ static struct rte_tailq_elem rte_vfio_tailq = {
 };
 EAL_REGISTER_TAILQ(rte_vfio_tailq)

-#define VFIO_DIR "/dev/vfio"
-#define VFIO_CONTAINER_PATH "/dev/vfio/vfio"
-#define VFIO_GROUP_FMT "/dev/vfio/%u"
-#define VFIO_NOIOMMU_GROUP_FMT "/dev/vfio/noiommu-%u"
-#define VFIO_GET_REGION_ADDR(x) ((uint64_t) x << 40ULL)
-#define VFIO_GET_REGION_IDX(x) (x >> 40)
-
 /* per-process VFIO config */
 static struct vfio_config vfio_cfg;

diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.h 
b/lib/librte_eal/linuxapp/eal/eal_vfio.h
index cedbeb0..bcf6860 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.h
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.h
@@ -88,6 +88,13 @@ struct vfio_config {
struct vfio_group vfio_groups[VFIO_MAX_GROUPS];
 };

+#define VFIO_DIR "/dev/vfio"
+#define VFIO_CONTAINER_PATH "/dev/vfio/vfio"
+#define VFIO_GROUP_FMT "/dev/vfio/%u"
+#define VFIO_NOIOMMU_GROUP_FMT "/dev/vfio/noiommu-%u"
+#define VFIO_GET_REGION_ADDR(x) ((uint64_t) x << 40ULL)
+#define VFIO_GET_REGION_IDX(x) (x >> 40)
+
 #define VFIO_PRESENT
 #endif /* kernel version */
 #endif /* RTE_EAL_VFIO */
-- 
2.8.0



[dpdk-dev] [PATCH 02/15] vfio: move VFIO-specific stuff to eal_vfio.h

2016-04-29 Thread Jan Viktorin
The common VFIO definitions should be separated from the PCI-specific parts.

Signed-off-by: Jan Viktorin 
---
 lib/librte_eal/linuxapp/eal/eal_pci_init.h | 28 
 lib/librte_eal/linuxapp/eal/eal_vfio.h | 28 
 2 files changed, 28 insertions(+), 28 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_init.h 
b/lib/librte_eal/linuxapp/eal/eal_pci_init.h
index 7011753..b4d7628 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_init.h
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_init.h
@@ -64,8 +64,6 @@ int pci_uio_ioport_unmap(struct rte_pci_ioport *p);

 #ifdef VFIO_PRESENT

-#define VFIO_MAX_GROUPS 64
-
 int pci_vfio_enable(void);
 int pci_vfio_is_enabled(void);
 int pci_vfio_mp_sync_setup(void);
@@ -89,15 +87,6 @@ int pci_vfio_map_resource(struct rte_pci_device *dev);
 int pci_vfio_get_group_fd(int iommu_group_fd);
 int pci_vfio_get_container_fd(void);

-/*
- * Function prototypes for VFIO multiprocess sync functions
- */
-int vfio_mp_sync_send_request(int socket, int req);
-int vfio_mp_sync_receive_request(int socket);
-int vfio_mp_sync_send_fd(int socket, int fd);
-int vfio_mp_sync_receive_fd(int socket);
-int vfio_mp_sync_connect_to_primary(void);
-
 /* socket comm protocol definitions */
 #define SOCKET_REQ_CONTAINER 0x100
 #define SOCKET_REQ_GROUP 0x200
@@ -105,23 +94,6 @@ int vfio_mp_sync_connect_to_primary(void);
 #define SOCKET_NO_FD 0x1
 #define SOCKET_ERR 0xFF

-/*
- * we don't need to store device fd's anywhere since they can be obtained from
- * the group fd via an ioctl() call.
- */
-struct vfio_group {
-   int group_no;
-   int fd;
-};
-
-struct vfio_config {
-   int vfio_enabled;
-   int vfio_container_fd;
-   int vfio_container_has_dma;
-   int vfio_group_idx;
-   struct vfio_group vfio_groups[VFIO_MAX_GROUPS];
-};
-
 #endif

 #endif /* EAL_PCI_INIT_H_ */
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.h 
b/lib/librte_eal/linuxapp/eal/eal_vfio.h
index f483bf4..cedbeb0 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.h
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.h
@@ -60,6 +60,34 @@
 #define RTE_VFIO_NOIOMMU VFIO_NOIOMMU_IOMMU
 #endif

+#define VFIO_MAX_GROUPS 64
+
+/*
+ * Function prototypes for VFIO multiprocess sync functions
+ */
+int vfio_mp_sync_send_request(int socket, int req);
+int vfio_mp_sync_receive_request(int socket);
+int vfio_mp_sync_send_fd(int socket, int fd);
+int vfio_mp_sync_receive_fd(int socket);
+int vfio_mp_sync_connect_to_primary(void);
+
+/*
+ * we don't need to store device fd's anywhere since they can be obtained from
+ * the group fd via an ioctl() call.
+ */
+struct vfio_group {
+   int group_no;
+   int fd;
+};
+
+struct vfio_config {
+   int vfio_enabled;
+   int vfio_container_fd;
+   int vfio_container_has_dma;
+   int vfio_group_idx;
+   struct vfio_group vfio_groups[VFIO_MAX_GROUPS];
+};
+
 #define VFIO_PRESENT
 #endif /* kernel version */
 #endif /* RTE_EAL_VFIO */
-- 
2.8.0



[dpdk-dev] [PATCH 01/15] vfio: fix include of eal_private.h to be local

2016-04-29 Thread Jan Viktorin
Signed-off-by: Jan Viktorin 
---
 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c 
b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
index 10266f8..257162b 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
@@ -43,11 +43,11 @@
 #include 
 #include 
 #include 
-#include 

 #include "eal_filesystem.h"
 #include "eal_pci_init.h"
 #include "eal_vfio.h"
+#include "eal_private.h"

 /**
  * @file
-- 
2.8.0



[dpdk-dev] [PATCH 00/15] Make VFIO support independent on PCI

2016-04-29 Thread Jan Viktorin
Hello,

here follow several patches extracting the general VFIO code out of the
PCI + VFIO code base. Usually, it's just a move and rename of functions.
The most complicated ones are:

* eal/linux: extract setup logic out of pci_vfio_map_resource

  - separation of some setup code out of the pci_vfio_map_resource
    (which is otherwise quite PCI-specific)
  - it is required by the following one

* eal/linux: move global vfio_cfg to eal_vfio.c

  - moving the vfio_cfg global variable out of the eal_pci_vfio together with
the functions working with this variable

Some patches make only temporary changes to avoid breakage throughout the
patch set (DMA mapping).

I am not sure how exactly the mp_sync code is intended to work. Should there
be just a single socket connection (no matter how many bus systems we support)?
I assume it works this way, so I've generalized eal_pci_vfio_mp_sync.

The vfio initialization is moved out of the rte_eal_pci_init into EAL.

The code is now prepared for adding other infrastructures such as the SoC
infrastructure that I introduced in [1]. I've partially done this in my workspace.

The VFIO code is quite complex and written in a spaghetti style, so more
maintenance would be helpful. I did my best to preserve the semantics so that
they are (I hope) 100 % the same as before.

Important: I have not tested whether it works, as I have no VFIO-enabled machine
at the moment and the SoC infra is not ready yet.

[1] http://comments.gmane.org/gmane.comp.networking.dpdk.devel/30913

Regards
Jan

---

Jan Viktorin (15):
  vfio: fix include of eal_private.h to be local
  vfio: move VFIO-specific stuff to eal_vfio.h
  vfio: move common vfio constants to eal_vfio.h
  vfio: move vfio_iommu_type and dma_map functions to eal_vfio
  vfio: generalize pci_vfio_set_iommu_type
  vfio: generalize pci_vfio_has_supported_extensions
  vfio: move vfio-specific SOCKET_* constants
  vfio: generalize pci_vfio_get_container_fd
  vfio: generalize pci_vfio_get_group_no
  vfio: extract setup logic out of pci_vfio_map_resource
  vfio: move global vfio_cfg to eal_vfio.c
  vfio: make vfio_*_dma_map and iommu_types private
  vfio: rename and generalize eal_pci_vfio_mp_sync
  vfio: initialize vfio out of the PCI subsystem
  vfio: change VFIO init to be extendable

 lib/librte_eal/linuxapp/eal/Makefile   |   5 +-
 lib/librte_eal/linuxapp/eal/eal.c  |  33 ++
 lib/librte_eal/linuxapp/eal/eal_pci.c  |  17 +-
 lib/librte_eal/linuxapp/eal/eal_pci_init.h |  41 --
 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 517 +---
 lib/librte_eal/linuxapp/eal/eal_vfio.c | 528 +
 lib/librte_eal/linuxapp/eal/eal_vfio.h |  94 
 .../{eal_pci_vfio_mp_sync.c => eal_vfio_mp_sync.c} |  12 +-
 8 files changed, 672 insertions(+), 575 deletions(-)
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_vfio.c
 rename lib/librte_eal/linuxapp/eal/{eal_pci_vfio_mp_sync.c => eal_vfio_mp_sync.c} (97%)

-- 
2.8.0



[dpdk-dev] [RFC 0/4] Include resources in tests

2016-04-29 Thread Bruce Richardson
On Fri, Apr 29, 2016 at 03:11:32PM +0200, Jan Viktorin wrote:
> Hello,
> 
> this patch set introduces a mechanism to include a resource (in general a blob)
> into the test binary. This makes tests less dependent on the target
> testing environment. The first use case is testing of PCI bus scan by changing
> the hard-coded path (/sys/bus/pci/devices) to something different and provide
> a fake tree of devices with the test. It can help with testing of device-tree
> parsing as I've proposed in [1] where such mechanism was missing at that time.
> I'd like to use such framework for the SoC infra testing as well.
> 
> The patch set introduces a struct resource into the app/test. The resource is
> generic to include any kind of binary data. The binary data can be created in
> C or linked as an object file (created by objcopy). I am not sure where to
> place the objcopy logic and how to perform guessing of the objcopy arguments
> as they are pretty non-standard.
> 
> To include a complex resource (a file hierarchy), the last patch implements
> an archive extraction logic. So, it is possible to include a tar archive and
> unpack it before a test starts. Any ideas how to do this in a better way are
> welcome.
> 
> [1] http://comments.gmane.org/gmane.comp.networking.dpdk.devel/36545
> 
> Regards
> Jan Viktorin
> 

Hi Jan,

this looks really interesting, especially since just yesterday I was looking at
taking the million-entry lpm test routing table out of the C code and into a
separate resource file, in this case an ini file.

In terms of a solution, I'm not convinced about placing the blobs inside the
test binary. I think a better solution would be to allow the different autotests
to take parameters from the command line, so that the user can specify the path
to the file to use for the test. What would be your opinion of such a scheme?

/Bruce


[dpdk-dev] [PATCH v2] bond: inherit maximum rx packet length

2016-04-29 Thread Eric Kinzie
  Instead of a hard-coded maximum receive length, allow the bond interface
  to inherit this limit from the first slave added.  This allows
  an application that uses jumbo frames to pass realistic values to
  rte_eth_dev_configure without causing an error.

Signed-off-by: Eric Kinzie 
---
 drivers/net/bonding/rte_eth_bond_api.c | 12 +++-
 drivers/net/bonding/rte_eth_bond_pmd.c |  2 +-
 drivers/net/bonding/rte_eth_bond_private.h |  2 ++
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/net/bonding/rte_eth_bond_api.c 
b/drivers/net/bonding/rte_eth_bond_api.c
index e9247b5..acc1c32 100644
--- a/drivers/net/bonding/rte_eth_bond_api.c
+++ b/drivers/net/bonding/rte_eth_bond_api.c
@@ -247,6 +247,7 @@ rte_eth_bond_create(const char *name, uint8_t mode, uint8_t 
socket_id)
internals->active_slave_count = 0;
internals->rx_offload_capa = 0;
internals->tx_offload_capa = 0;
+   internals->max_rx_pktlen = 2048;

/* Initially allow to choose any offload type */
internals->flow_type_rss_offloads = ETH_RSS_PROTO_MASK;
@@ -331,9 +332,15 @@ __eth_bond_slave_add_lock_free(uint8_t bonded_port_id, 
uint8_t slave_port_id)

/* Add slave details to bonded device */
slave_eth_dev->data->dev_flags |= RTE_ETH_DEV_BONDED_SLAVE;
-   slave_add(internals, slave_eth_dev);

rte_eth_dev_info_get(slave_port_id, &dev_info);
+   if (dev_info.max_rx_pktlen < internals->max_rx_pktlen) {
+   RTE_BOND_LOG(ERR, "Slave (port %u) max_rx_pktlen too small",
+slave_port_id);
+   return -1;
+   }
+
+   slave_add(internals, slave_eth_dev);

/* We need to store slaves reta_size to be able to synchronize RETA for all
 * slave devices even if its sizes are different.
@@ -365,6 +372,9 @@ __eth_bond_slave_add_lock_free(uint8_t bonded_port_id, 
uint8_t slave_port_id)
internals->tx_offload_capa = dev_info.tx_offload_capa;
internals->flow_type_rss_offloads = dev_info.flow_type_rss_offloads;

+   /* Inherit first slave's max rx packet size */
+   internals->max_rx_pktlen = dev_info.max_rx_pktlen;
+
} else {
/* Check slave link properties are supported if props are set,
 * all slaves must be the same */
diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c 
b/drivers/net/bonding/rte_eth_bond_pmd.c
index 54788cf..189fb47 100644
--- a/drivers/net/bonding/rte_eth_bond_pmd.c
+++ b/drivers/net/bonding/rte_eth_bond_pmd.c
@@ -1650,7 +1650,7 @@ bond_ethdev_info(struct rte_eth_dev *dev, struct 
rte_eth_dev_info *dev_info)

dev_info->max_mac_addrs = 1;

-   dev_info->max_rx_pktlen = (uint32_t)2048;
+   dev_info->max_rx_pktlen = internals->max_rx_pktlen;

dev_info->max_rx_queues = (uint16_t)128;
dev_info->max_tx_queues = (uint16_t)512;
diff --git a/drivers/net/bonding/rte_eth_bond_private.h 
b/drivers/net/bonding/rte_eth_bond_private.h
index 8312397..79ca69d 100644
--- a/drivers/net/bonding/rte_eth_bond_private.h
+++ b/drivers/net/bonding/rte_eth_bond_private.h
@@ -169,6 +169,8 @@ struct bond_dev_private {

struct rte_kvargs *kvlist;
uint8_t slave_update_idx;
+
+   uint32_t max_rx_pktlen;
 };

 extern const struct eth_dev_ops default_dev_ops;
-- 
2.1.4



[dpdk-dev] [PATCH v2] bond: inherit maximum rx packet length

2016-04-29 Thread Eric Kinzie
v2 changes:
 - remove type cast on constant
 - check max_rx_pktlen when adding a slave to make sure it is >= max
   packet length of existing slave interfaces

Eric Kinzie (1):
  bond: inherit maximum rx packet length

 drivers/net/bonding/rte_eth_bond_api.c | 12 +++-
 drivers/net/bonding/rte_eth_bond_pmd.c |  2 +-
 drivers/net/bonding/rte_eth_bond_private.h |  2 ++
 3 files changed, 14 insertions(+), 2 deletions(-)

-- 
2.1.4



[dpdk-dev] [RFC 4/4] app/test: support resources archived by tar

2016-04-29 Thread Jan Viktorin
When a more complex resource (a file hierarchy) is needed, packing every single
file as a separate resource would be very inefficient. For that purpose, it is
possible to pack the files into a tar archive, extract it from the resource
before the test, and finally clean up all the created files.

This patch introduces functions resource_untar and resource_rm_by_tar to
perform those tasks. An example of using those functions is included as a test.

Signed-off-by: Jan Viktorin 
---
 app/test/Makefile|   4 ++
 app/test/resource.c  | 180 +++
 app/test/resource.h  |  13 
 app/test/test_resource.c |  29 
 4 files changed, 226 insertions(+)

diff --git a/app/test/Makefile b/app/test/Makefile
index a9502f1..90acd63 100644
--- a/app/test/Makefile
+++ b/app/test/Makefile
@@ -77,6 +77,9 @@ SRCS-y += test.c
 SRCS-y += resource.c
 SRCS-y += test_resource.c
 $(eval $(call resource,test_resource_c,resource.c))
+$(eval $(call resource,test_resource_tar,resource.tar))
+resource.tar: test_resource.c
+   tar -C $(dir $<) -cf $@ $(notdir $<)
 SRCS-y += test_pci.c
 SRCS-y += test_prefetch.c
 SRCS-y += test_byteorder.c
@@ -196,6 +199,7 @@ CFLAGS += $(WERROR_FLAGS)
 CFLAGS += -D_GNU_SOURCE

 LDLIBS += -lm
+LDLIBS += -larchive

 # Disable VTA for memcpy test
 ifeq ($(CC), gcc)
diff --git a/app/test/resource.c b/app/test/resource.c
index a11c86e..7732bc7 100644
--- a/app/test/resource.c
+++ b/app/test/resource.c
@@ -33,6 +33,8 @@

 #include 
 #include 
+#include 
+#include 
 #include 
 #include 

@@ -90,6 +92,184 @@ int resource_fwrite_file(const struct resource *r, const 
char *fname)
return ret;
 }

+static int do_copy(struct archive *r, struct archive *w)
+{
+   const void *buf;
+   size_t len;
+   off_t off;
+   int ret;
+
+   while (1) {
+   ret = archive_read_data_block(r, &buf, &len, &off);
+   if (ret == ARCHIVE_RETRY)
+   continue;
+
+   if (ret == ARCHIVE_EOF)
+   return 0;
+
+   if (ret != ARCHIVE_OK)
+   return ret;
+
+   do {
+   ret = archive_write_data_block(w, buf, len, off);
+   if (ret != ARCHIVE_OK && ret != ARCHIVE_RETRY)
+   return ret;
+   } while (ret != ARCHIVE_OK);
+   }
+}
+
+int resource_untar(const struct resource *res)
+{
+   struct archive *r;
+   struct archive *w;
+   struct archive_entry *e;
+   void *p;
+   int flags = 0;
+   int ret;
+
+   p = malloc(resource_size(res));
+   if (p == NULL)
+   rte_panic("Failed to malloc %zu B\n", resource_size(res));
+
+   memcpy(p, res->beg, resource_size(res));
+
+   r = archive_read_new();
+   if (r == NULL) {
+   free(p);
+   return -1;
+   }
+
+   archive_read_support_format_all(r);
+   archive_read_support_filter_all(r);
+
+   w = archive_write_disk_new();
+   if (w == NULL) {
+   archive_read_free(r);
+   free(p);
+   return -1;
+   }
+
+   flags |= ARCHIVE_EXTRACT_PERM;
+   flags |= ARCHIVE_EXTRACT_FFLAGS;
+   archive_write_disk_set_options(w, flags);
+   archive_write_disk_set_standard_lookup(w);
+
+   ret = archive_read_open_memory(r, p, resource_size(res));
+   if (ret != ARCHIVE_OK)
+   goto fail;
+
+   while (1) {
+   ret = archive_read_next_header(r, &e);
+   if (ret == ARCHIVE_EOF)
+   break;
+   if (ret != ARCHIVE_OK)
+   goto fail;
+
+   ret = archive_write_header(w, e);
+   if (ret == ARCHIVE_EOF)
+   break;
+   if (ret != ARCHIVE_OK)
+   goto fail;
+
+   if (archive_entry_size(e) == 0)
+   continue;
+
+   ret = do_copy(r, w);
+   if (ret != ARCHIVE_OK)
+   goto fail;
+
+   ret = archive_write_finish_entry(w);
+   if (ret != ARCHIVE_OK)
+   goto fail;
+   }
+
+   archive_write_free(w);
+   archive_read_free(r);
+   free(p);
+   return 0;
+
+fail:
+   archive_write_free(w);
+   archive_read_free(r);
+   free(p);
+   rte_panic("Failed: %s\n", archive_error_string(r));
+   return -1;
+}
+
+int resource_rm_by_tar(const struct resource *res)
+{
+   struct archive *r;
+   struct archive_entry *e;
+   void *p;
+   int try_again = 1;
+   int ret;
+
+   p = malloc(resource_size(res));
+   if (p == NULL)
+   rte_panic("Failed to malloc %zu B\n", resource_size(res));
+
+   memcpy(p, res->beg, resource_size(res));
+
+   while (try_again) {
+   r = archive_read_new();
+   if (r == NULL) {
+   free(p);
+   

[dpdk-dev] [RFC 3/4] app/test: add functions to create files from resources

2016-04-29 Thread Jan Viktorin
A resource can be written into the target filesystem by calling resource_fwrite
or resource_fwrite_file. Such a file can be created before a test is started and
removed after the test finishes.

Signed-off-by: Jan Viktorin 
---
 app/test/resource.c  | 35 +++
 app/test/resource.h  |  4 
 app/test/test_resource.c | 10 ++
 3 files changed, 49 insertions(+)

diff --git a/app/test/resource.c b/app/test/resource.c
index 50a4510..a11c86e 100644
--- a/app/test/resource.c
+++ b/app/test/resource.c
@@ -31,6 +31,7 @@
  *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */

+#include 
 #include 
 #include 
 #include 
@@ -55,6 +56,40 @@ const struct resource *resource_find(const char *name)
return NULL;
 }

+int resource_fwrite(const struct resource *r, FILE *f)
+{
+   const size_t goal = resource_size(r);
+   size_t total = 0;
+
+   while (total < goal) {
+   size_t wlen = fwrite(r->beg + total, 1, goal - total, f);
+   if (wlen == 0) {
+   perror(__func__);
+   return -1;
+   }
+
+   total += wlen;
+   }
+
+   return 0;
+}
+
+int resource_fwrite_file(const struct resource *r, const char *fname)
+{
+   FILE *f;
+   int ret;
+
+   f = fopen(fname, "w");
+   if (f == NULL) {
+   perror(__func__);
+   return -1;
+   }
+
+   ret = resource_fwrite(r, f);
+   fclose(f);
+   return ret;
+}
+
 void __resource_register(struct resource *r)
 {
TAILQ_INSERT_TAIL(_list, r, next);
diff --git a/app/test/resource.h b/app/test/resource.h
index 7978e77..096633c 100644
--- a/app/test/resource.h
+++ b/app/test/resource.h
@@ -35,6 +35,7 @@
 #define _RESOURCE_H_

 #include 
+#include 
 #include 

 #include 
@@ -57,6 +58,9 @@ static inline size_t resource_size(const struct resource *r)

 const struct resource *resource_find(const char *name);

+int resource_fwrite(const struct resource *r, FILE *f);
+int resource_fwrite_file(const struct resource *r, const char *fname);
+
 void __resource_register(struct resource *r);

 #define REGISTER_LINKED_RESOURCE(_n) \
diff --git a/app/test/test_resource.c b/app/test/test_resource.c
index 7752522..148964e 100644
--- a/app/test/test_resource.c
+++ b/app/test/test_resource.c
@@ -65,6 +65,7 @@ REGISTER_LINKED_RESOURCE(test_resource_c);
 static int test_resource_c(void)
 {
const struct resource *r;
+   FILE *f;

r = resource_find("test_resource_c");
TEST_ASSERT_NOT_NULL(r, "No test_resource_c found");
@@ -72,6 +73,15 @@ static int test_resource_c(void)
"Found resource %s, expected test_resource_c",
r->name);

+   TEST_ASSERT_SUCCESS(resource_fwrite_file(r, "test_resource.c"),
+   "Failed to write file %s", r->name);
+
+   f = fopen("test_resource.c", "r");
+   TEST_ASSERT_NOT_NULL(f,
+   "Missing extracted file resource.c");
+   fclose(f);
+   remove("test_resource.c");
+
return 0;
 }

-- 
2.8.0
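
The write loop in resource_fwrite can be tried outside of DPDK. What follows is a minimal sketch under stated assumptions, not part of the patch: write_all() and roundtrip() are hypothetical helpers, the blob is a plain byte buffer rather than a struct resource, and the file name is arbitrary. It shows the same idea of retrying fwrite() until the whole blob is written, then reading the file back to check it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write len bytes from buf to f, retrying after short writes.
 * Returns 0 on success, -1 if fwrite() makes no progress.
 * (Hypothetical helper that mirrors the loop in resource_fwrite.)
 */
int write_all(const char *buf, size_t len, FILE *f)
{
	size_t total = 0;

	assert(buf != NULL && f != NULL);

	while (total < len) {
		size_t wlen = fwrite(buf + total, 1, len - total, f);

		if (wlen == 0) {
			perror(__func__);
			return -1;
		}
		total += wlen;
	}
	return 0;
}

/*
 * Write blob to fname, read it back, remove the file and compare.
 * Returns 0 when the content read back matches, -1 otherwise.
 */
int roundtrip(const char *fname, const char *blob, size_t len)
{
	char back[256];
	size_t rlen;
	FILE *f;
	int ret;

	if (len > sizeof(back))
		return -1;

	f = fopen(fname, "wb");
	if (f == NULL)
		return -1;
	ret = write_all(blob, len, f);
	if (fclose(f) != 0)
		ret = -1;

	if (ret == 0) {
		f = fopen(fname, "rb");
		if (f == NULL) {
			ret = -1;
		} else {
			rlen = fread(back, 1, sizeof(back), f);
			fclose(f);
			if (rlen != len || memcmp(back, blob, len) != 0)
				ret = -1;
		}
	}

	remove(fname);
	return ret;
}
```

A test would call roundtrip() before it starts and rely on the file being removed afterwards, which is the create-then-remove pattern the commit message describes.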



[dpdk-dev] [RFC 2/4] app/test: support resources externally linked

2016-04-29 Thread Jan Viktorin
To include resources from sources other than the C source code, we can take
advantage of the objcopy behaviour, i.e. packing an arbitrary file as an
object file that is linked to the target program.

A linked object file is always accessible as a pair

extern const char beg_;
extern const char end_;
(extern const char siz_;)

The objcopy however accepts architecture parameters in a very tricky way.
Try to translate as many arguments as possible from the RTE_ARCH into the
objcopy specific names.

We pack the resource.c source file as an example for testing.

*** CAUTION: ***
The objcopy and resource creation is subject to change. Any comments on how
to integrate those are welcome.

Signed-off-by: Jan Viktorin 
---
 app/test/Makefile| 32 
 app/test/resource.h  |  5 +
 app/test/test_resource.c | 18 ++
 3 files changed, 55 insertions(+)

diff --git a/app/test/Makefile b/app/test/Makefile
index 7fbdd18..a9502f1 100644
--- a/app/test/Makefile
+++ b/app/test/Makefile
@@ -33,6 +33,37 @@ include $(RTE_SDK)/mk/rte.vars.mk

 ifeq ($(CONFIG_RTE_APP_TEST),y)

+ifeq ($(RTE_ARCH),arm)
+RTE_OBJCOPY_O = elf32-littlearm
+RTE_OBJCOPY_B = arm
+else ifeq ($(RTE_ARCH),arm64)
+RTE_OBJCOPY_O = elf64-littleaarch64
+RTE_OBJCOPY_B = aarch64
+else ifeq ($(RTE_ARCH),i686)
+RTE_OBJCOPY_O = elf32-i386
+RTE_OBJCOPY_B = i386
+else ifeq ($(RTE_ARCH),x86_64)
+RTE_OBJCOPY_O = elf64-x86-64
+RTE_OBJCOPY_B = i386:x86-64
+else ifeq ($(RTE_ARCH),x86_x32)
+RTE_OBJCOPY_O = elf32-x86-64
+RTE_OBJCOPY_B = i386:x86-64
+else
+$(error Unrecognized RTE_ARCH: $(RTE_ARCH))
+endif
+
+define resource
+SRCS-y += $(1).res.o
+$(1).res.o: $(2)
+   $(OBJCOPY) -I binary -B $(RTE_OBJCOPY_B) -O $(RTE_OBJCOPY_O) \
+   --rename-section \
+   .data=.rodata,alloc,load,data,contents,readonly  \
+   --redefine-sym _binary__dev_stdin_start=beg_$(1) \
+   --redefine-sym _binary__dev_stdin_end=end_$(1)   \
+   --redefine-sym _binary__dev_stdin_size=siz_$(1)  \
+   /dev/stdin $$@ < $$<
+endef
+
 #
 # library name
 #
@@ -45,6 +76,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_CMDLINE) := commands.c
 SRCS-y += test.c
 SRCS-y += resource.c
 SRCS-y += test_resource.c
+$(eval $(call resource,test_resource_c,resource.c))
 SRCS-y += test_pci.c
 SRCS-y += test_prefetch.c
 SRCS-y += test_byteorder.c
diff --git a/app/test/resource.h b/app/test/resource.h
index a50a973..7978e77 100644
--- a/app/test/resource.h
+++ b/app/test/resource.h
@@ -59,6 +59,11 @@ const struct resource *resource_find(const char *name);

 void __resource_register(struct resource *r);

+#define REGISTER_LINKED_RESOURCE(_n) \
+extern const char beg_ ##_n; \
+extern const char end_ ##_n; \
+REGISTER_RESOURCE(_n, &beg_ ##_n, &end_ ##_n); \
+
 #define REGISTER_RESOURCE(_n, _b, _e) \
 static struct resource linkres_ ##_n = {   \
.name = RTE_STR(_n), \
diff --git a/app/test/test_resource.c b/app/test/test_resource.c
index e3d2486..7752522 100644
--- a/app/test/test_resource.c
+++ b/app/test/test_resource.c
@@ -60,11 +60,29 @@ static int test_resource_dpdk(void)
return 0;
 }

+REGISTER_LINKED_RESOURCE(test_resource_c);
+
+static int test_resource_c(void)
+{
+   const struct resource *r;
+
+   r = resource_find("test_resource_c");
+   TEST_ASSERT_NOT_NULL(r, "No test_resource_c found");
+   TEST_ASSERT(!strcmp(r->name, "test_resource_c"),
+   "Found resource %s, expected test_resource_c",
+   r->name);
+
+   return 0;
+}
+
 static int test_resource(void)
 {
if (test_resource_dpdk())
return -1;

+   if (test_resource_c())
+   return -1;
+
return 0;
 }

-- 
2.8.0
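
The beg_/end_ symbol pair can be pictured without running objcopy at all. The sketch below is an illustration under assumptions, not the code from the patch: struct blob, blob_size() and fake_data are hypothetical names, and a static array stands in for the data that objcopy would place in .rodata. In a real build the two pointers would be the addresses of the extern symbols beg_<name> and end_<name>.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor: a name plus pointers to both ends of the data. */
struct blob {
	const char *name;
	const char *beg; /* first byte of the embedded data */
	const char *end; /* one past the last byte */
};

/* The size is simply the distance between the two symbols. */
size_t blob_size(const struct blob *b)
{
	assert(b != NULL && b->end >= b->beg);
	return (size_t)(b->end - b->beg);
}

/*
 * Stand-in for the objcopy output. With a linked object file one would
 * declare the two extern symbols and store their addresses here instead.
 */
static const char fake_data[] = { 'a', 'b', 'c', 'd' };

const struct blob fake_blob = {
	"fake",
	fake_data,
	fake_data + sizeof(fake_data),
};
```

Keeping only the two end pointers means the descriptor does not need the siz_ symbol, which matches the parentheses around it in the commit message.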



[dpdk-dev] [RFC 1/4] app/test: introduce resources for tests

2016-04-29 Thread Jan Viktorin
Certain internal mechanisms of DPDK access different file system structures
(e.g. /sys/bus/pci/devices). It is difficult to test those cases automatically
by a unit test when such a path is not hard-coded and there is no simple way
to distribute fake ones with the current testing environment.

This patch adds the possibility to declare a resource embedded in the test binary
itself. The structure resource covers the generic situation: it provides a name
for lookup and pointers to the embedded data blob. A resource is registered
in a constructor by the macro REGISTER_RESOURCE.

Some initial tests of simple resources are included.

Signed-off-by: Jan Viktorin 
---
 app/test/Makefile|  2 ++
 app/test/resource.c  | 61 ++
 app/test/resource.h  | 77 
 app/test/test_resource.c | 75 ++
 4 files changed, 215 insertions(+)
 create mode 100644 app/test/resource.c
 create mode 100644 app/test/resource.h
 create mode 100644 app/test/test_resource.c

diff --git a/app/test/Makefile b/app/test/Makefile
index a4907d5..7fbdd18 100644
--- a/app/test/Makefile
+++ b/app/test/Makefile
@@ -43,6 +43,8 @@ APP = test
 #
 SRCS-$(CONFIG_RTE_LIBRTE_CMDLINE) := commands.c
 SRCS-y += test.c
+SRCS-y += resource.c
+SRCS-y += test_resource.c
 SRCS-y += test_pci.c
 SRCS-y += test_prefetch.c
 SRCS-y += test_byteorder.c
diff --git a/app/test/resource.c b/app/test/resource.c
new file mode 100644
index 000..50a4510
--- /dev/null
+++ b/app/test/resource.c
@@ -0,0 +1,61 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 RehiveTech. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of RehiveTech nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include 
+#include 
+#include 
+
+#include 
+
+#include "resource.h"
+
+struct resource_list resource_list = TAILQ_HEAD_INITIALIZER(resource_list);
+
+const struct resource *resource_find(const char *name)
+{
+   struct resource *r;
+
+   TAILQ_FOREACH(r, &resource_list, next) {
+   RTE_VERIFY(r->name);
+
+   if (!strcmp(r->name, name))
+   return r;
+   }
+
+   return NULL;
+}
+
+void __resource_register(struct resource *r)
+{
+   TAILQ_INSERT_TAIL(&resource_list, r, next);
+}
diff --git a/app/test/resource.h b/app/test/resource.h
new file mode 100644
index 000..a50a973
--- /dev/null
+++ b/app/test/resource.h
@@ -0,0 +1,77 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 RehiveTech. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of RehiveTech nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED 

[dpdk-dev] [RFC 0/4] Include resources in tests

2016-04-29 Thread Jan Viktorin
Hello,

this patch set introduces a mechanism to include a resource (in general a blob)
into the test binary. This makes tests less dependent on the target
testing environment. The first use case is testing of the PCI bus scan by changing
the hard-coded path (/sys/bus/pci/devices) to something different and providing
a fake tree of devices with the test. It can help with testing of device-tree
parsing as I've proposed in [1], where such a mechanism was missing at that time.
I'd like to use such a framework for the SoC infra testing as well.

The patch set introduces a struct resource into the app/test. The resource is
generic to include any kind of binary data. The binary data can be created in
C or linked as an object file (created by objcopy). I am not sure where to
place the objcopy logic and how to perform guessing of the objcopy arguments
as they are pretty non-standard.

To include a complex resource (a file hierarchy), the last patch implements
an archive extraction logic. So, it is possible to include a tar archive and
unpack it before a test starts. Any ideas how to do this in a better way are
welcome.

[1] http://comments.gmane.org/gmane.comp.networking.dpdk.devel/36545

Regards
Jan Viktorin

---

Jan Viktorin (4):
  app/test: introduce resources for tests
  app/test: support resources externally linked
  app/test: add functions to create files from resources
  app/test: support resources archived by tar

 app/test/Makefile|  38 +++
 app/test/resource.c  | 276 +++
 app/test/resource.h  |  99 +
 app/test/test_resource.c | 132 +++
 4 files changed, 545 insertions(+)
 create mode 100644 app/test/resource.c
 create mode 100644 app/test/resource.h
 create mode 100644 app/test/test_resource.c

-- 
2.8.0
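
The register-in-a-constructor and lookup-by-name idea from this series can be sketched in a standalone file. This is an illustration under assumptions rather than the code from the patches: struct res, res_register(), res_find() and REGISTER_RES are hypothetical names, the list comes from the BSD <sys/queue.h> macros, and the constructor attribute is a GCC/Clang extension.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/queue.h>

/* Hypothetical resource descriptor, linked into a global list. */
struct res {
	const char *name;
	const char *beg;
	const char *end;
	TAILQ_ENTRY(res) next;
};

TAILQ_HEAD(res_list, res);
static struct res_list res_list = TAILQ_HEAD_INITIALIZER(res_list);

/* Append a resource to the global list. */
void res_register(struct res *r)
{
	assert(r != NULL && r->name != NULL);
	TAILQ_INSERT_TAIL(&res_list, r, next);
}

/* Return the first resource with a matching name, or NULL. */
const struct res *res_find(const char *name)
{
	struct res *r;

	TAILQ_FOREACH(r, &res_list, next) {
		if (!strcmp(r->name, name))
			return r;
	}
	return NULL;
}

/* Define a resource and register it before main() runs. */
#define REGISTER_RES(_n, _b, _e)                              \
static struct res res_##_n = {                                \
	.name = #_n, .beg = (_b), .end = (_e),                \
};                                                            \
static void __attribute__((constructor)) res_init_##_n(void) \
{                                                             \
	res_register(&res_##_n);                              \
}

/* A blob created in C; a linked object file would supply beg/end instead. */
static const char hello[] = "hello";
REGISTER_RES(hello_blob, hello, hello + sizeof(hello) - 1)
```

Because registration happens in a constructor, a test only needs the name to get at the data, and adding a resource does not require touching a central table.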



[dpdk-dev] removing mbuf error flags

2016-04-29 Thread Olivier Matz
Hi,

In rte_mbuf.h, some rx flags have been set to 0 for a long time, nearly
2 years. It means nobody uses them. They were introduced by
the following commit:

  http://dpdk.org/browse/dpdk/commit/?id=c22265f6

As far as I understand, these flags were introduced to let the
application know that a received packet is invalid.

The 2 drivers using them are i40e and enic. But as these flags are 0
today, it means that invalid packets are silently given to the
application.

My opinion is that invalid packets should not be given to the
application and only a statistic counter should be incremented.
No application checks these flags today (in examples, or testpmd).

I would like to remove these flags.
Thoughts?

Olivier
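
The "drop and count" behaviour proposed above can be sketched as follows. This is only an illustration under assumptions: struct pkt, PKT_FLAG_RX_ERROR, struct rx_stats and filter_invalid() are hypothetical and do not correspond to rte_mbuf fields or to any driver code. The receive path removes invalid packets from the burst and increments a counter, so the application never sees them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical error bit; not one of the real mbuf flags. */
#define PKT_FLAG_RX_ERROR (1ULL << 0)

struct pkt {
	uint64_t flags;
	uint16_t len;
};

struct rx_stats {
	uint64_t good;            /* packets handed to the application */
	uint64_t dropped_invalid; /* packets dropped because of an error */
};

/*
 * Compact the burst in place so that only valid packets remain.
 * Returns the number of packets kept. A real driver would also free
 * the dropped buffers; that step is omitted here.
 */
size_t filter_invalid(struct pkt **pkts, size_t n, struct rx_stats *st)
{
	size_t i, kept = 0;

	assert(pkts != NULL && st != NULL);

	for (i = 0; i < n; i++) {
		if (pkts[i]->flags & PKT_FLAG_RX_ERROR) {
			st->dropped_invalid++;
			continue;
		}
		pkts[kept++] = pkts[i];
	}
	st->good += kept;
	return kept;
}
```

An application that does want to see bad packets would need the opposite policy: keep the packet in the burst and leave the flag set for the application to check.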



[dpdk-dev] DPDK Community Call - 16.04 Retrospective - Wednesday May 11th

2016-04-29 Thread O'Driscoll, Tim
At the end of each release, we typically hold a retrospective within our 
development team to discuss what went well, what could be improved etc. For 
16.04, we thought it would be a good idea to try doing this with the open 
source community, so that everybody involved in the project can provide their 
input and we can discuss any suggested improvements collectively.

Mike Glynn, who's our Program Manager for DPDK, will facilitate the discussion. 
John McNamara has gathered some stats on things like how many revisions patches 
typically went through etc., which we'll present to help initiate the 
discussion, but the meeting should mostly be an open discussion where people 
should feel free to suggest any improvements they think we should make.

If the approach is successful we can repeat it for future releases.


When:
London (United Kingdom - England)          Wednesday, May 11, 2016 at 4PM      BST   UTC+1 hour
San Jose (USA - California)                Wednesday, May 11, 2016 at 8AM      PDT   UTC-7 hours
Boston (USA - Massachusetts)               Wednesday, May 11, 2016 at 11AM     EDT   UTC-4 hours
Paris (France - Île-de-France)             Wednesday, May 11, 2016 at 5PM      CEST  UTC+2 hours
New Delhi (India - Delhi)                  Wednesday, May 11, 2016 at 8:30PM   IST   UTC+5:30 hours
Shanghai (China - Shanghai Municipality)   Wednesday, May 11, 2016 at 11PM     CST   UTC+8 hours
Tokyo (Japan)                              Midnight between Wednesday, May 11 and Thursday, May 12   JST   UTC+9 hours


GoToMeeting details:
Please join my meeting from your computer, tablet or smartphone.
https://global.gotomeeting.com/join/634498213

You can also dial in using your phone. 
United States : +1 (312) 757-3117

Access Code: 634-498-213

More phone numbers
Australia : +61 2 8355 1039
Austria : +43 7 2088 1033
Belgium : +32 (0) 28 93 7001
Canada : +1 (647) 497-9379
Denmark : +45 69 91 89 33
Finland : +358 (0) 942 41 5770
France : +33 (0) 170 950 585
Germany : +49 (0) 692 5736 7303
Ireland : +353 (0) 19 030 050
Italy : +39 0 693 38 75 50
Netherlands : +31 (0) 208 080 208
New Zealand : +64 9 925 0481
Norway : +47 21 54 82 21
Spain : +34 911 82 9890
Sweden : +46 (0) 853 527 817
Switzerland : +41 (0) 435 0167 65
United Kingdom : +44 (0) 330 221 0099



[dpdk-dev] [RFC PATCH v1 2/3] drivers/net/ixgbe: change xstats to use integers

2016-04-29 Thread David Harton (dharton)

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Remy Horton
> Sent: Friday, April 15, 2016 10:44 AM
> To: dev at dpdk.org; Helin Zhang 
> Subject: [dpdk-dev] [RFC PATCH v1 2/3] drivers/net/ixgbe: change xstats to
> use integers
> 
> Signed-off-by: Remy Horton 
> ---
>  drivers/net/ixgbe/ixgbe_ethdev.c | 87
> +++-
>  1 file changed, 78 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c
> b/drivers/net/ixgbe/ixgbe_ethdev.c
> index 3f1ebc1..4d31fe9 100644
> --- a/drivers/net/ixgbe/ixgbe_ethdev.c
> +++ b/drivers/net/ixgbe/ixgbe_ethdev.c
> @@ -179,6 +179,10 @@ static int ixgbevf_dev_xstats_get(struct rte_eth_dev
> *dev,
> struct rte_eth_xstats *xstats, unsigned n);
>  static void ixgbe_dev_stats_reset(struct rte_eth_dev *dev);
>  static void ixgbe_dev_xstats_reset(struct rte_eth_dev *dev);
> +static int ixgbe_dev_xstats_names(__rte_unused struct rte_eth_dev *dev,
> + struct rte_eth_xstats_name *ptr_names, __rte_unused unsigned limit);
> +static int ixgbevf_dev_xstats_names(__rte_unused struct rte_eth_dev *dev,
> + struct rte_eth_xstats_name *ptr_names, __rte_unused unsigned limit);
>  static int ixgbe_dev_queue_stats_mapping_set(struct rte_eth_dev *eth_dev,
>uint16_t queue_id,
>uint8_t stat_idx,
> @@ -466,6 +470,7 @@ static const struct eth_dev_ops ixgbe_eth_dev_ops = {
>   .xstats_get   = ixgbe_dev_xstats_get,
>   .stats_reset  = ixgbe_dev_stats_reset,
>   .xstats_reset = ixgbe_dev_xstats_reset,
> + .xstats_names = ixgbe_dev_xstats_names,
>   .queue_stats_mapping_set = ixgbe_dev_queue_stats_mapping_set,
>   .dev_infos_get= ixgbe_dev_info_get,
>   .dev_supported_ptypes_get = ixgbe_dev_supported_ptypes_get,
> @@ -555,6 +560,7 @@ static const struct eth_dev_ops ixgbevf_eth_dev_ops =
> {
>   .xstats_get   = ixgbevf_dev_xstats_get,
>   .stats_reset  = ixgbevf_dev_stats_reset,
>   .xstats_reset = ixgbevf_dev_stats_reset,
> + .xstats_names = ixgbevf_dev_xstats_names,
>   .dev_close= ixgbevf_dev_close,
>   .allmulticast_enable  = ixgbevf_dev_allmulticast_enable,
>   .allmulticast_disable = ixgbevf_dev_allmulticast_disable,
> @@ -2698,6 +2704,76 @@ ixgbe_xstats_calc_num(void) {
>   (IXGBE_NB_TXQ_PRIO_STATS * 8);
>  }
> 
> +static int ixgbe_dev_xstats_names(__rte_unused struct rte_eth_dev *dev,
> + struct rte_eth_xstats_name *ptr_names, __rte_unused unsigned limit)
> +{
> + const unsigned cnt_stats = ixgbe_xstats_calc_num();
> + unsigned stat, i, count, offset;
> +
> + if (ptr_names != NULL) {
> + count = 0;
> + offset = 0;
> +
> + /* Note: limit >= cnt_stats checked upstream
> +  * in rte_eth_xstats_names()
> +  */
> +
> + /* Extended stats from ixgbe_hw_stats */
> + for (i = 0; i < IXGBE_NB_HW_STATS; i++) {
> + snprintf(ptr_names[count].name,
> + sizeof(ptr_names[count].name),
> + "%s",
> + rte_ixgbe_stats_strings[i].name);
> + count++;
> + offset += RTE_ETH_XSTATS_NAME_SIZE;
> + }
> +
> + /* RX Priority Stats */
> + for (stat = 0; stat < IXGBE_NB_RXQ_PRIO_STATS; stat++) {
> + for (i = 0; i < 8; i++) {

8 seems magical.  Is there a constant somewhere that can be used?

> + snprintf(ptr_names[count].name,
> + sizeof(ptr_names[count].name),
> + "rx_priority%u_%s", i,
> + rte_ixgbe_rxq_strings[stat].name);
> + count++;
> + offset += RTE_ETH_XSTATS_NAME_SIZE;
> + }
> + }
> +
> + /* TX Priority Stats */
> + for (stat = 0; stat < IXGBE_NB_TXQ_PRIO_STATS; stat++) {
> + for (i = 0; i < 8; i++) {

Same magic number.

> + snprintf(ptr_names[count].name,
> + sizeof(ptr_names[count].name),
> + "tx_priority%u_%s", i,
> + rte_ixgbe_txq_strings[stat].name);
> + count++;
> + offset += RTE_ETH_XSTATS_NAME_SIZE;
> + }
> + }
> + /* FIXME: Debugging check */

Just a reminder to clean up.

> + if (cnt_stats != count)
> + return -EIO;
> + }
> + return cnt_stats;
> +}
> +
> +static int ixgbevf_dev_xstats_names(__rte_unused struct 

[dpdk-dev] [PATCH v3] i40evf: Report error if HW CRC strip is disabled for non-DPDK PF hosts

2016-04-29 Thread Zhang, Helin


> -Original Message-
> From: Topel, Bjorn
> Sent: Friday, April 22, 2016 1:39 PM
> To: dev at dpdk.org
> Cc: Zhang, Helin; Wu, Jingjing; Topel, Bjorn
> Subject: [PATCH v3] i40evf: Report error if HW CRC strip is disabled for non-
> DPDK PF hosts
> 
> On hosts running a non-DPDK PF driver, the VF has no means of changing the
> HW CRC strip setting for a RX queue. It's implicitly enabled.
> 
> This patch checks if the host is running a non-DPDK PF kernel driver, and
> returns an error, if HW CRC stripping was disabled.
> 
> > Signed-off-by: Björn Töpel 
Acked-by: Helin Zhang 


[dpdk-dev] [RFC PATCH v1 1/3] rte: change xstats to use integer keys

2016-04-29 Thread David Harton (dharton)

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Remy Horton
> Sent: Friday, April 15, 2016 10:44 AM
> To: dev at dpdk.org; Thomas Monjalon 
> Subject: [dpdk-dev] [RFC PATCH v1 1/3] rte: change xstats to use integer
> keys
> 
> Signed-off-by: Remy Horton 
> ---
>  lib/librte_ether/rte_ethdev.c | 87
> +++
>  lib/librte_ether/rte_ethdev.h | 38 +++
>  2 files changed, 117 insertions(+), 8 deletions(-)
> 
> diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
> index a31018e..cdd0685 100644
> --- a/lib/librte_ether/rte_ethdev.c
> +++ b/lib/librte_ether/rte_ethdev.c
> @@ -1507,6 +1507,82 @@ rte_eth_stats_reset(uint8_t port_id)
>   dev->data->rx_mbuf_alloc_failed = 0;
>  }
> 
> +static int
> +rte_eth_xstats_count(uint8_t port_id)

Thanks for adding this.  I believe an overt API is much more clear.

> +{
> + struct rte_eth_dev *dev;
> + int count;
> +
> + RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
> + dev = &rte_eth_devices[port_id];
> + RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->xstats_names, -ENOTSUP);
> + count = (*dev->dev_ops->xstats_names)(dev, NULL, 0);
> + if (count >= 0) {
> + count += RTE_NB_STATS;
> + count += dev->data->nb_rx_queues * RTE_NB_RXQ_STATS;
> + count += dev->data->nb_tx_queues * RTE_NB_TXQ_STATS;
> + }
> + return count;
> +}
> +
> +int
> +rte_eth_xstats_names(uint8_t port_id, struct rte_eth_xstats_name
> *ptr_names,
> + unsigned limit)
> +{
> + struct rte_eth_dev *dev;
> + int cnt_used_entries;
> + int cnt_expected_entries;
> + uint32_t idx, id_queue;
> + int offset;
> +
> + cnt_expected_entries = rte_eth_xstats_count(port_id);
> + if (cnt_expected_entries < 0 || ptr_names == NULL)
> + return cnt_expected_entries;

I suggest we don't provide two ways to get the number of stats and 
that users always call rte_eth_xstats_count().
Recommend returning -EINVAL if ptr_names is NULL.

> +
> + if ((int)limit < cnt_expected_entries)
> + return -ERANGE;
> +
> + RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
> + dev = &rte_eth_devices[port_id];
> + RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->xstats_names, -ENOTSUP);

I think this check is too restrictive.  There are drivers that do not 
provide device specific xstats today but the xstats API will still 
return the per q stats.
Recommend skipping the device specific steps that follow.

> + cnt_used_entries = (*dev->dev_ops->xstats_names)(
> + dev, ptr_names, limit);
> +
> + if (cnt_used_entries < 0)
> + return cnt_used_entries;
> +
> + offset = cnt_used_entries * RTE_ETH_XSTATS_NAME_SIZE;
> + for (idx = 0; idx < RTE_NB_STATS; idx++) {
> + snprintf(ptr_names[cnt_used_entries].name,
> + sizeof(ptr_names[0].name),
> + "%s", rte_stats_strings[idx].name);
> + offset += RTE_ETH_XSTATS_NAME_SIZE;
> + cnt_used_entries++;
> + }
> + for (id_queue = 0; id_queue < dev->data->nb_rx_queues; id_queue++) {
> + for (idx = 0; idx < RTE_NB_RXQ_STATS; idx++) {
> + snprintf(ptr_names[cnt_used_entries].name,
> + sizeof(ptr_names[0].name),
> + "rx_q%u%s",
> + id_queue, rte_rxq_stats_strings[idx].name);
> + offset += RTE_ETH_XSTATS_NAME_SIZE;
> + cnt_used_entries++;
> + }
> +
> + }
> + for (id_queue = 0; id_queue < dev->data->nb_tx_queues; id_queue++) {
> + for (idx = 0; idx < RTE_NB_TXQ_STATS; idx++) {
> + snprintf(ptr_names[cnt_used_entries].name,
> + sizeof(ptr_names[0].name),
> + "tx_q%u%s",
> + id_queue, rte_txq_stats_strings[idx].name);
> + offset += RTE_ETH_XSTATS_NAME_SIZE;
> + cnt_used_entries++;
> + }
> + }
> + return cnt_used_entries;
> +}
> +
>  /* retrieve ethdev extended statistics */
>  int
>  rte_eth_xstats_get(uint8_t port_id, struct rte_eth_xstats *xstats,
> @@ -1551,8 +1627,7 @@ rte_eth_xstats_get(uint8_t port_id, struct
> rte_eth_xstats *xstats,
>   stats_ptr = RTE_PTR_ADD(&eth_stats,
>   rte_stats_strings[i].offset);
>   val = *stats_ptr;
> - snprintf(xstats[count].name, sizeof(xstats[count].name),
> - "%s", rte_stats_strings[i].name);
> + xstats[count].key = count + xcount;

Suggest adding:
xstats[count].name[0] = '\0'

until name is removed.

>   xstats[count++].value = val;
>   }
> 
> @@ -1563,9 +1638,7 @@ rte_eth_xstats_get(uint8_t port_id, struct
> rte_eth_xstats *xstats,
> 

[dpdk-dev] [RFC PATCH v1 0/3] Remove string operations from xstats

2016-04-29 Thread David Harton (dharton)

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Remy Horton
> Sent: Friday, April 15, 2016 10:44 AM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [RFC PATCH v1 0/3] Remove string operations from
> xstats
> 
> The current extended ethernet statistics fetching involve doing several
> string operations, which causes performance issues if there are lots of
> statistics and/or network interfaces. This RFC patchset changes the API
> for xstats to use integer identifiers instead of strings and implements
> this new API for the ixgbe driver. Others drivers to follow.
> 
> --
> 
> Since this will involve API & ABI breakage as previously advertised,
> there are several design assumptions that need consideration:
> 
> *) id-name & id-value pairs for both lookup and query
> Permits out-of-order and non-contiguous returning of names/ids/values,
> even though expected implementations would in practice return items in
> sorted order by id. Is this sufficient/desirable future proofing? Idea
> is to allow possibility of drivers returning partial statistics.

I think the key is that the order of the stats must always be honored 
and if that's the case then an id isn't necessary.  However, if others 
want an id, it certainly doesn't hurt.

I don't see drivers autonomously providing a subset of stats, and users 
can filter out stats they don't want in their presentation layers.

> 
> *) Bulk name-id mapping lookup only
> At the moment individual lookup is not supported, as this would impose
> extra overheads on drivers. The assumption is that any end user would
> fetch all this data once on startup and then cache the mappings.

Agreed.  Similarly there is no need to return a partial list of stats 
as the presentation layers can filter.

> 
> *) Replacement or additional API
> This patch replaces the current xstats API, but there is no inherent
> reason beyond maintainability why this functionality could not be in
> addition rather than a replacement. What is consensus on this?

I suggest 3 new functions are added:
- get number of xstats
- get xstats names
- get xstats values

This facilitates:
- parallel development within the release without breaking current usage
- possibility of removing rte_eth_xstats_get() in following release

Thanks for moving this forward,
Dave

> 
> Comments welcome.
> 
> Remy Horton (3):
>   rte: change xstats to use integer keys
>   drivers/net/ixgbe: change xstats to use integer keys
>   examples/ethtool: add xstats display command
> 
>  drivers/net/ixgbe/ixgbe_ethdev.c  | 87
> +++
>  examples/ethtool/ethtool-app/ethapp.c | 57 +++
>  lib/librte_ether/rte_ethdev.c | 87
> +++
>  lib/librte_ether/rte_ethdev.h | 38 +++
>  4 files changed, 252 insertions(+), 17 deletions(-)
> 
> --
> 2.5.5
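
The three-function split suggested above (count, names, values) can be sketched as a standalone model. This is a sketch under assumptions and not the proposed ethdev API: xstats_count(), xstats_names(), xstats_values(), xstats_add(), the name table and the counters array are all hypothetical. It shows names being copied once while values are fetched as plain integers in the same order, with -EINVAL for a NULL array and -ERANGE for an array that is too small.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define XSTAT_NAME_SIZE 64
#define NB_XSTATS 3

struct xstat_name {
	char name[XSTAT_NAME_SIZE];
};

/* Hypothetical statistics of a single device; the index is the id. */
static const char *const stat_strings[NB_XSTATS] = {
	"rx_good_packets", "tx_good_packets", "rx_errors",
};
static uint64_t counters[NB_XSTATS];

/* Simulate traffic by bumping one counter. */
void xstats_add(unsigned int id, uint64_t n)
{
	assert(id < NB_XSTATS);
	counters[id] += n;
}

/* The only way to learn how many statistics exist. */
int xstats_count(void)
{
	return NB_XSTATS;
}

/* Copy all names; meant to be called once and cached by the caller. */
int xstats_names(struct xstat_name *names, unsigned int limit)
{
	unsigned int i;

	if (names == NULL)
		return -EINVAL;
	if (limit < NB_XSTATS)
		return -ERANGE;

	for (i = 0; i < NB_XSTATS; i++)
		snprintf(names[i].name, sizeof(names[i].name),
			 "%s", stat_strings[i]);
	return NB_XSTATS;
}

/* Copy all values in id order; no string handling on this path. */
int xstats_values(uint64_t *values, unsigned int limit)
{
	if (values == NULL)
		return -EINVAL;
	if (limit < NB_XSTATS)
		return -ERANGE;

	memcpy(values, counters, sizeof(counters));
	return NB_XSTATS;
}
```

A caller would size both arrays from xstats_count(), fetch the names once at startup, and then poll xstats_values() only, pairing values[i] with names[i] for display.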


[dpdk-dev] [RFC PATCH v1 0/3] Remove string operations from xstats

2016-04-29 Thread David Harton (dharton)
Happy Friday,

> -Original Message-
> From: Remy Horton [mailto:remy.horton at intel.com]
> Sent: Friday, April 29, 2016 6:22 AM
> To: David Harton (dharton) ; Tahhan, Maryam
> ; dev at dpdk.org
> Cc: Mcnamara, John ; Van Haaren, Harry
> 
> Subject: Re: [dpdk-dev] [RFC PATCH v1 0/3] Remove string operations from
> xstats
> 
> Morning,
> 
> On 28/04/2016 16:58, David Harton (dharton) wrote:
[...]
> 
> > Maybe I misread the patch series or missed one but I don't see where
> > stats can be obtained without copying strings?  This is the real issue
> > I raised originally.
> 
> See http://dpdk.org/dev/patchwork/patch/12096/ where xstats[].key is used
> to lookup string from ptr_names[].name - I didn't delete the name field
> from rte_eth_xstats because doing so would cause a compile error with the
> drivers I've not yet converted.

Ok, this was one of my fundamental hang ups.  I didn't see any gain because
of the way the proposal is being introduced.  I guess you are saying that
not only will drivers be added in future patch series but that you also
intend to continue modifying the external API as well.

I will start a clean thread and reply/review the provided patches and forgo
my work.

Cheers,
Dave

> 
> 
> 
> > I did not add "get the count" because it wasn't provided in the
> > current API and instead followed the convention but I do believe
> > overtly getting the count is the better approach.
> 
> I didn't either for the same reason, but if the API is going to be broken,
> I think it should be added.
> 
> 
> ..Remy


[dpdk-dev] [RFC PATCH v1 0/3] Remove string operations from xstats

2016-04-29 Thread Remy Horton
Morning,

On 28/04/2016 16:58, David Harton (dharton) wrote:
[..]
>>>> *) id-name & id-value pairs for both lookup and query Permits
>>>> out-of-order and non-contiguous returning of names/ids/values, even
>>>> though expected implementations would in practice return items in
>>>> sorted order by id. Is this sufficient/desirable future proofing?
>>>> Idea is to allow possibility of drivers returning partial statistics.
>>>
>>> I believe forcing drivers to match to a common id-space will become
>>> burdensome.  If the stats id-space isn't common then matching strings
>>> is probably just as sufficient as long as drivers don't add/remove
>>> stats ad hoc between the time the device is initialized and removed.
>>
>> I'm not aware of drivers adding/removing the stats ad hoc? The idea is to
>> have a common-id space otherwise it will be a free for all and we won't
>> have alignment across the drivers. I don't see it being any more
>> burdensome than having a common register naming across the board which is
>> what is there today. The advantage being that you don't have to pull the
>> strings every time.

Returning both stats (id,value) and names (id,string) as pairs would 
allow (among other things) common ids but actually having common ids is 
not an intended goal of mine. I think the whole idea of common ids was 
implicitly vetoed when the idea of ENUMs was thrown out.

I opted for both stats and lookup to be provided as pairs because when 
it comes to APIs, I have a slight preference for having that bit of 
extra generality. Not sure it's really worth it, so might change stats to 
just use id-indexed integer arrays (ethtool-like basically) rather than 
a typedef that includes the numeric id.


>>>> *) Bulk name-id mapping lookup only
[..]
>>> I'm not sure I see the value of looking up a single stat from a user
>>> perspective.  I can see where the drivers might say that some stats
>>> are less disruptive/etc but the user doesn't have that knowledge and
>>> wouldn't know how to take advantage.  Usually all stats are grabbed
>>> multiple times and the changes noted during debug sessions.
>>>
>>
>> I believe Remy's change doesn't suggest/support individual lookup. It is
>> just a statement that we don't want to burden drivers with individual
>> stats lookups.

Correct.


>>>> *) Replacement or additional API
[..]
>>> However, if
>>> we want to go forward with cleaning up in order to reduce the support
>>> drivers provide I'm all for it.

Whether we want to do such a cleanup is my open question.


> Maybe I misread the patch series or missed one but I don't see where
> stats can be obtained without copying strings?  This is the real issue I
> raised originally.

See http://dpdk.org/dev/patchwork/patch/12096/ where xstats[].key is 
used to lookup string from ptr_names[].name - I didn't delete the name 
field from rte_eth_xstats because doing so would cause a compile error 
with the drivers I've not yet converted.



> I did not add "get the count" because it wasn't provided in the current API
> and instead followed the convention but I do believe overtly getting the
> count is the better approach.

I didn't either for the same reason, but if the API is going to be 
broken, I think it should be added.


..Remy


[dpdk-dev] [PATCH] mk: add build-time library directory to linker path

2016-04-29 Thread Yuanhan Liu
On Thu, Apr 28, 2016 at 03:43:48PM +0900, Tetsuya Mukawa wrote:
> On 2016/04/27 20:02, Panu Matilainen wrote:
> > This is a pre-requisite for adding DT_NEEDED dependencies
> > between internal libraries.
> >
> > Signed-off-by: Panu Matilainen 
> > ---
> >  mk/rte.lib.mk | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/mk/rte.lib.mk b/mk/rte.lib.mk
> > index 8f7e021..b420280 100644
> > --- a/mk/rte.lib.mk
> > +++ b/mk/rte.lib.mk
> > @@ -86,8 +86,8 @@ O_TO_A_DO = @set -e; \
> > $(O_TO_A) && \
> > echo $(O_TO_A_CMD) > $(call exe2cmd,$(@))
> >  
> > -O_TO_S = $(LD) $(_CPU_LDFLAGS) $(EXTRA_LDFLAGS) -shared $(OBJS-y) $(LDLIBS) \
> > -	-Wl,-soname,$(LIB) -o $(LIB)
> > +O_TO_S = $(LD) -L$(RTE_OUTPUT)/lib $(_CPU_LDFLAGS) $(EXTRA_LDFLAGS) \
> > + -shared $(OBJS-y) $(LDLIBS) -Wl,-soname,$(LIB) -o $(LIB)
> >  O_TO_S_STR = $(subst ','\'',$(O_TO_S)) #'# fix syntax highlight
> >  O_TO_S_DISP = $(if $(V),"$(O_TO_S_STR)","  LD $(@)")
> >  O_TO_S_DO = @set -e; \
> Tested-by: Tetsuya Mukawa 

Applied to dpdk-next-virtio.

Thanks.

--yliu


[dpdk-dev] [PATCH 4/4] eal: add assert macro for debug

2016-04-29 Thread Thomas Monjalon
2016-04-22 15:42, Stephen Hemminger:
> On Fri, 22 Apr 2016 15:08:50 -0700
> Yuanhan Liu  wrote:
> > On Fri, Apr 22, 2016 at 11:14:35PM +0200, Thomas Monjalon wrote:
> > > > rxd = (Vmxnet3_RxDesc *)rxq->cmd_ring[ring_idx].base + idx;
> > > +   RTE_SET_USED(rxd); /* used only for assert when enabled */
> > 
> > How about adding the __rte_unused tag at where we declare it?

It is not really unused.
And adding a SET_USED line allows putting a comment in context,
just below the assignment.

> Why not just kill the useless assert's all together? They really only helped
> during the short time developer is debugging this code.

They also provide some kind of comments and can help when refactoring.
Anyway, removing the assert would deserve another patch.
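
For readers who have not met the pattern, here is a small standalone sketch
of a variable that is only read by an assert. DEMO_SET_USED has the same
shape as RTE_SET_USED() but is redefined here so the example compiles
without DPDK; struct demo_desc and demo_next_idx() are made up for the
example.

```c
#include <assert.h>

/* Same shape as DPDK's RTE_SET_USED(); redefined so the sketch stands alone. */
#define DEMO_SET_USED(x) (void)(x)

/* Hypothetical descriptor with a generation field. */
struct demo_desc {
	unsigned int gen;
};

/* Returns the next ring index after checking the descriptor generation. */
static unsigned int
demo_next_idx(const struct demo_desc *ring, unsigned int idx,
	      unsigned int expected_gen)
{
	const struct demo_desc *rxd = &ring[idx];

	DEMO_SET_USED(rxd);          /* used only by the assert when enabled */
	DEMO_SET_USED(expected_gen); /* same: only read by the assert */
	assert(rxd->gen == expected_gen);

	return idx + 1;
}
```

When built with -DNDEBUG the assert expands to nothing, and the two
DEMO_SET_USED lines are what keep the compiler from warning about rxd and
expected_gen being unused.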


[dpdk-dev] [PATCH] ip_pipeline: add rss support

2016-04-29 Thread Jasvinder Singh
This patch enables rss (receive side scaling) per network interface
through the configuration file. The user can specify the following
parameters in the LINK section to enable the rss feature: rss_qs,
rss_proto_ipv4, rss_proto_ipv6 and rss_proto_l2.

The "rss_qs" parameter is mandatory and indicates the queues to be
used for rss, while the rest of the parameters are optional. When the
optional parameters are not provided in the configuration file, the
default setting (ETH_RSS_IPV4 | ETH_RSS_IPV6) is assumed for the
"rss_hf" field of the rss_conf structure.

For example, the following configuration can be applied to use rss
on port 0 of the network interface:

[PIPELINE0]
type = MASTER
core = 0

[LINK0]
rss_qs = 0 1

[PIPELINE1]
type = PASS-THROUGH
core = 1
pktq_in = RXQ0.0 RXQ0.1 RXQ1.0
pktq_out = TXQ0.0 TXQ1.0 TXQ0.1

Signed-off-by: Jasvinder Singh 
Acked-by: Cristian Dumitrescu 
---
 examples/ip_pipeline/app.h  |  28 ++--
 examples/ip_pipeline/config_check.c |  32 +++-
 examples/ip_pipeline/config_parse.c | 298 +++-
 examples/ip_pipeline/init.c |  70 -
 4 files changed, 408 insertions(+), 20 deletions(-)

diff --git a/examples/ip_pipeline/app.h b/examples/ip_pipeline/app.h
index 55a9841..7534e27 100644
--- a/examples/ip_pipeline/app.h
+++ b/examples/ip_pipeline/app.h
@@ -50,6 +50,15 @@

 #define APP_PARAM_NAME_SIZE  PIPELINE_NAME_SIZE
 #define APP_LINK_PCI_BDF_SIZE 16
+
+#ifndef APP_LINK_MAX_HWQ_IN
+#define APP_LINK_MAX_HWQ_IN  128
+#endif
+
+#ifndef APP_LINK_MAX_HWQ_OUT
+#define APP_LINK_MAX_HWQ_OUT 128
+#endif
+
 struct app_mempool_params {
char *name;
uint32_t parsed;
@@ -69,6 +78,12 @@ struct app_link_params {
uint32_t tcp_local_q; /* 0 = Disabled (pkts go to default queue 0) */
uint32_t udp_local_q; /* 0 = Disabled (pkts go to default queue 0) */
uint32_t sctp_local_q; /* 0 = Disabled (pkts go to default queue 0) */
+   uint32_t rss_qs[APP_LINK_MAX_HWQ_IN];
+   uint32_t n_rss_qs;
+   uint64_t rss_proto_ipv4;
+   uint64_t rss_proto_ipv6;
+   uint64_t rss_proto_l2;
+   uint32_t promisc;
uint32_t state; /* DOWN = 0, UP = 1 */
uint32_t ip; /* 0 = Invalid */
uint32_t depth; /* Valid only when IP is valid */
@@ -76,7 +91,6 @@ struct app_link_params {
char pci_bdf[APP_LINK_PCI_BDF_SIZE];

struct rte_eth_conf conf;
-   uint8_t promisc;
 };

 struct app_pktq_hwq_in_params {
@@ -380,17 +394,9 @@ struct app_eal_params {
 #define APP_MAX_MEMPOOLS 8
 #endif

-#ifndef APP_LINK_MAX_HWQ_IN
-#define APP_LINK_MAX_HWQ_IN  64
-#endif
-
-#ifndef APP_LINK_MAX_HWQ_OUT
-#define APP_LINK_MAX_HWQ_OUT 64
-#endif
-
-#define APP_MAX_HWQ_IN (APP_MAX_LINKS * APP_LINK_MAX_HWQ_IN)
+#define APP_MAX_HWQ_IN  (APP_MAX_LINKS * APP_LINK_MAX_HWQ_IN)

-#define APP_MAX_HWQ_OUT   (APP_MAX_LINKS * APP_LINK_MAX_HWQ_OUT)
+#define APP_MAX_HWQ_OUT (APP_MAX_LINKS * APP_LINK_MAX_HWQ_OUT)

 #ifndef APP_MAX_PKTQ_SWQ
 #define APP_MAX_PKTQ_SWQ 256
diff --git a/examples/ip_pipeline/config_check.c b/examples/ip_pipeline/config_check.c
index fd9ff49..18f57be 100644
--- a/examples/ip_pipeline/config_check.c
+++ b/examples/ip_pipeline/config_check.c
@@ -56,6 +56,26 @@ check_mempools(struct app_params *app)
}
 }

+static inline uint32_t
+link_rxq_used(struct app_link_params *link, uint32_t q_id)
+{
+   uint32_t i;
+
+   if ((link->arp_q == q_id) ||
+   (link->tcp_syn_q == q_id) ||
+   (link->ip_local_q == q_id) ||
+   (link->tcp_local_q == q_id) ||
+   (link->udp_local_q == q_id) ||
+   (link->sctp_local_q == q_id))
+   return 1;
+
+   for (i = 0; i < link->n_rss_qs; i++)
+   if (link->rss_qs[i] == q_id)
+   return 1;
+
+   return 0;
+}
+
 static void
 check_links(struct app_params *app)
 {
@@ -90,14 +110,12 @@ check_links(struct app_params *app)
rxq_max = link->udp_local_q;
if (link->sctp_local_q > rxq_max)
rxq_max = link->sctp_local_q;
+   for (i = 0; i < link->n_rss_qs; i++)
+   if (link->rss_qs[i] > rxq_max)
+   rxq_max = link->rss_qs[i];

for (i = 1; i <= rxq_max; i++)
-   APP_CHECK(((link->arp_q == i) ||
-   (link->tcp_syn_q == i) ||
-   (link->ip_local_q == i) ||
-   (link->tcp_local_q == i) ||
-   (link->udp_local_q == i) ||
-   (link->sctp_local_q == i)),
+   APP_CHECK((link_rxq_used(link, i)),
"%s 

[dpdk-dev] Byte order of vlan_tci of rte_mbuf is different on different source

2016-04-29 Thread Olivier Matz
Hi,

On 04/25/2016 04:35 AM, zhang.xinghua1 at zte.com.cn wrote:
> When using an I350 in SR-IOV mode, we were confused to find that the byte
> order of vlan_tci in the VF receive descriptor differs depending on the
> packet source.
> 
> 1) Packets from VF to VF, the byte order is big-endian. (e.g. 0xF00)
> 2) Packets from PC to VF, the byte order is little-endian. (e.g. 0xF)
> 
> Below is the test network:
> VM0       VM1           PC
> VF0       VF1           |
>   |         |           |
>   +----+----+           |
>        |                |
>        PF               |
>        hypervisor       |
>        SR-IOV NIC       |
>        |                |
>        |VLAN 15         |
>        +-switch---------+
> 
> 
> We set a breakpoint at the following line of eth_igb_recv_pkts and
> observed the vlan_tci value every time.
> 
> uint16_t
> eth_igb_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
>uint16_t nb_pkts)
> 
> /* Only valid if PKT_RX_VLAN_PKT set in pkt_flags */
> rxm->vlan_tci = rte_le_to_cpu_16(rxd.wb.upper.vlan);

In rte_mbuf.h, it is specified that these values (vlan_tci and
vlan_tci_outer) must be stored in CPU order.

It's probably a driver or hardware issue. Note that in linux there is
something that looks similar to your issue:

http://lxr.free-electrons.com/source/drivers/net/ethernet/intel/igb/igb_main.c#L1278

  /* On i350, i354, i210, and i211, loopback VLAN packets
   * have the tag byte-swapped.
   */
  if (adapter->hw.mac.type >= e1000_i350)
          set_bit(IGB_RING_FLAG_RX_LB_VLAN_BSWAP, &ring->flags);

I think you could check if the same thing is done in the
dpdk driver.
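
As a rough illustration of what such a check could look like on the receive
path, here is a standalone sketch. It is not taken from the igb PMD: the
helper names are invented, and how the "looped back" indication is read
from the descriptor is hardware specific and deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

/* Swap the two bytes of a 16-bit value. */
static uint16_t
demo_bswap16(uint16_t v)
{
	return (uint16_t)((v << 8) | (v >> 8));
}

/*
 * tci_cpu:     tag already converted from descriptor (little-endian) order
 *              to CPU order, e.g. by rte_le_to_cpu_16().
 * is_loopback: 1 when the descriptor marks the packet as looped back
 *              (VF to VF), 0 otherwise. Reading that bit is not shown.
 */
static uint16_t
demo_fix_vlan_tci(uint16_t tci_cpu, int is_loopback)
{
	assert(is_loopback == 0 || is_loopback == 1);
	return is_loopback ? demo_bswap16(tci_cpu) : tci_cpu;
}
```

With the values from the report, a VF-to-VF packet on VLAN 15 arrives as
0xF00 and comes out as 0xF after the swap, matching what is seen for
packets coming from the PC.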


> 
> ZTE Information Security Notice: The information contained in this mail (and 
> any attachment transmitted herewith) is privileged and confidential and is 
> intended for the exclusive use of the addressee(s).  If you are not an 
> intended recipient, any disclosure, reproduction, distribution or other 
> dissemination or use of the information contained is strictly prohibited.  If 
> you have received this mail in error, please delete it and notify us 
> immediately.
> 

This notice should be removed in public emails.

Regards,
Olivier


[dpdk-dev] [PATCH v2 1/1] cmdline: add any multi string mode to token string

2016-04-29 Thread Azarewicz, PiotrX T

> But I agree that a comment could be added above the definition of
> TOKEN_STRING_MULTI that would explain what is the behavior in that case.
> 
> Piotr, do you think this is something you can do?

Okay, I will.

Regards,
Piotr


[dpdk-dev] [PATCH v1 3/3] doc: add keepalive enhancement documentation

2016-04-29 Thread Remy Horton
Signed-off-by: Remy Horton 
---
 doc/guides/rel_notes/release_16_07.rst | 5 +
 1 file changed, 5 insertions(+)

diff --git a/doc/guides/rel_notes/release_16_07.rst b/doc/guides/rel_notes/release_16_07.rst
index 83c841b..7309877 100644
--- a/doc/guides/rel_notes/release_16_07.rst
+++ b/doc/guides/rel_notes/release_16_07.rst
@@ -34,6 +34,11 @@ This section should contain new features added in this release. Sample format:

   Refer to the previous release notes for examples.

+* **Added keepalive enhancements.**
+
+   Adds support for reporting LCore liveness to secondary processes and
+   support for idled CPUs.
+

 Resolved Issues
 ---
-- 
2.5.5



[dpdk-dev] [PATCH v1 2/3] examples/l2fwd-keepalive: add IPC liveness reporting

2016-04-29 Thread Remy Horton
Signed-off-by: Remy Horton 
---
 examples/Makefile  |   1 +
 examples/l2fwd-keepalive/Makefile  |   4 +-
 examples/l2fwd-keepalive/ka-agent/Makefile |  51 +++
 examples/l2fwd-keepalive/ka-agent/main.c   | 128 
 examples/l2fwd-keepalive/main.c|  22 -
 examples/l2fwd-keepalive/shm.c | 130 +
 examples/l2fwd-keepalive/shm.h | 102 ++
 7 files changed, 434 insertions(+), 4 deletions(-)
 create mode 100644 examples/l2fwd-keepalive/ka-agent/Makefile
 create mode 100644 examples/l2fwd-keepalive/ka-agent/main.c
 create mode 100644 examples/l2fwd-keepalive/shm.c
 create mode 100644 examples/l2fwd-keepalive/shm.h

diff --git a/examples/Makefile b/examples/Makefile
index b28b30e..bd688b9 100644
--- a/examples/Makefile
+++ b/examples/Makefile
@@ -64,6 +64,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_CRYPTODEV) += l2fwd-crypto
 DIRS-$(CONFIG_RTE_LIBRTE_IVSHMEM) += l2fwd-ivshmem
 DIRS-$(CONFIG_RTE_LIBRTE_JOBSTATS) += l2fwd-jobstats
 DIRS-y += l2fwd-keepalive
+DIRS-y += l2fwd-keepalive/ka-agent
 DIRS-$(CONFIG_RTE_LIBRTE_LPM) += l3fwd
 DIRS-$(CONFIG_RTE_LIBRTE_ACL) += l3fwd-acl
 ifeq ($(CONFIG_RTE_LIBRTE_LPM),y)
diff --git a/examples/l2fwd-keepalive/Makefile b/examples/l2fwd-keepalive/Makefile
index 568edcb..3fcf513 100644
--- a/examples/l2fwd-keepalive/Makefile
+++ b/examples/l2fwd-keepalive/Makefile
@@ -1,6 +1,6 @@
 #   BSD LICENSE
 #
-#   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+#   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
 #   All rights reserved.
 #
 #   Redistribution and use in source and binary forms, with or without
@@ -42,7 +42,7 @@ include $(RTE_SDK)/mk/rte.vars.mk
 APP = l2fwd-keepalive

 # all source are stored in SRCS-y
-SRCS-y := main.c
+SRCS-y := main.c shm.c

 CFLAGS += -O3
 CFLAGS += $(WERROR_FLAGS)
diff --git a/examples/l2fwd-keepalive/ka-agent/Makefile b/examples/l2fwd-keepalive/ka-agent/Makefile
new file mode 100644
index 000..4eaac76
--- /dev/null
+++ b/examples/l2fwd-keepalive/ka-agent/Makefile
@@ -0,0 +1,51 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+# * Redistributions of source code must retain the above copyright
+#   notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+#   notice, this list of conditions and the following disclaimer in
+#   the documentation and/or other materials provided with the
+#   distribution.
+# * Neither the name of Intel Corporation nor the names of its
+#   contributors may be used to endorse or promote products derived
+#   from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+ifeq ($(RTE_SDK),)
+$(error "Please define RTE_SDK environment variable")
+endif
+
+# Default target, can be overridden by command line or environment
+RTE_TARGET ?= x86_64-native-linuxapp-gcc
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# binary name
+APP = ka-agent
+
+# all source are stored in SRCS-y
+SRCS-y := main.c
+
+CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)/../
+
+EXTRA_CFLAGS += -O3 -g -Wfatal-errors
+
+include $(RTE_SDK)/mk/rte.extapp.mk
diff --git a/examples/l2fwd-keepalive/ka-agent/main.c b/examples/l2fwd-keepalive/ka-agent/main.c
new file mode 100644
index 000..f05e3a5
--- /dev/null
+++ b/examples/l2fwd-keepalive/ka-agent/main.c
@@ -0,0 +1,128 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above 

[dpdk-dev] [PATCH v1 1/3] eal: add new keepalive state & callback hook

2016-04-29 Thread Remy Horton
Signed-off-by: Remy Horton 
---
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  7 +
 lib/librte_eal/common/include/rte_keepalive.h   | 40 +
 lib/librte_eal/common/rte_keepalive.c   | 35 --
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |  7 +
 4 files changed, 87 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
index 58c2951..9a33441 100644
--- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
@@ -151,3 +151,10 @@ DPDK_16.04 {
rte_eal_primary_proc_alive;

 } DPDK_2.2;
+
+DPDK_16.7 {
+   global:
+
+   rte_keepalive_register_alive_callback;
+
+} DPDK_16.04;
diff --git a/lib/librte_eal/common/include/rte_keepalive.h b/lib/librte_eal/common/include/rte_keepalive.h
index 10dac2e..3159730 100644
--- a/lib/librte_eal/common/include/rte_keepalive.h
+++ b/lib/librte_eal/common/include/rte_keepalive.h
@@ -59,6 +59,16 @@ typedef void (*rte_keepalive_failure_callback_t)(
const int id_core);

 /**
+ * Keepalive 'alive' callback.
+ *
+ *  Receives a data pointer passed to rte_keepalive_register_alive_callback()
+ *  and the id of the live core.
+ */
+typedef void (*rte_keepalive_alive_callback_t)(
+   void *data,
+   const int id_core);
+
+/**
  * Keepalive state structure.
  * @internal
  */
@@ -105,4 +115,34 @@ void rte_keepalive_register_core(struct rte_keepalive *keepcfg,
 void
 rte_keepalive_mark_alive(struct rte_keepalive *keepcfg);

+/**
+ * Per-core sleep-time indication.
+ * @param *keepcfg
+ *   Keepalive structure pointer
+ *
+ * This function needs to be called from within the main process loop of
+ * the LCore going to sleep.
+ */
+void
+rte_keepalive_mark_sleep(struct rte_keepalive *keepcfg);
+
+/**
+ * Registers a 'live core' callback.
+ *
+ * The complement of the 'dead core' callback. This is called when a
+ * core is known to be alive, and is intended for cases when an app
+ * needs to know 'liveness' beyond just knowing when a core has died.
+ *
+ * @param *keepcfg
+ *   Keepalive structure pointer
+ * @param callback
+ *   Function called upon detection of a live core.
+ * @param data
+ *   Data pointer to be passed to function callback.
+ */
+void
+rte_keepalive_register_alive_callback(struct rte_keepalive *keepcfg,
+   rte_keepalive_alive_callback_t callback,
+   void *data);
+
 #endif /* _KEEPALIVE_H_ */
diff --git a/lib/librte_eal/common/rte_keepalive.c b/lib/librte_eal/common/rte_keepalive.c
index 23363ec..7af3558 100644
--- a/lib/librte_eal/common/rte_keepalive.c
+++ b/lib/librte_eal/common/rte_keepalive.c
@@ -46,7 +46,8 @@ struct rte_keepalive {
ALIVE = 1,
MISSING = 0,
DEAD = 2,
-   GONE = 3
+   GONE = 3,
+   SLEEP = 4
} __rte_cache_aligned state_flags[RTE_KEEPALIVE_MAXCORES];

/** Last-seen-alive timestamps */
@@ -68,6 +69,15 @@ struct rte_keepalive {
void *callback_data;
uint64_t tsc_initial;
uint64_t tsc_mhz;
+
+   /** Live core handler. */
+   rte_keepalive_alive_callback_t alive_callback;
+
+   /**
+* Live core handler app data.
+* Pointer is passed to live core handler.
+*/
+   void *alive_callback_data;
 };

 static void
@@ -95,6 +105,11 @@ rte_keepalive_dispatch_pings(__rte_unused void *ptr_timer,
case ALIVE: /* Alive */
keepcfg->state_flags[idx_core] = MISSING;
keepcfg->last_alive[idx_core] = rte_rdtsc();
+   if (keepcfg->alive_callback)
+   keepcfg->alive_callback(
+   keepcfg->alive_callback_data,
+   idx_core
+   );
break;
case MISSING: /* MIA */
print_trace("Core MIA. ", keepcfg, idx_core);
@@ -111,6 +126,8 @@ rte_keepalive_dispatch_pings(__rte_unused void *ptr_timer,
break;
case GONE: /* Buried */
break;
+   case SLEEP: /* Idled core */
+   break;
}
}
 }
@@ -133,11 +150,19 @@ rte_keepalive_create(rte_keepalive_failure_callback_t callback,
return keepcfg;
 }

+void rte_keepalive_register_alive_callback(struct rte_keepalive *keepcfg,
+   rte_keepalive_alive_callback_t callback,
+   void *data)
+{
+   keepcfg->alive_callback = callback;
+   keepcfg->alive_callback_data = data;
+}
+
 void
 rte_keepalive_register_core(struct rte_keepalive *keepcfg, const int id_core)
 {
if (id_core < RTE_KEEPALIVE_MAXCORES) {
-   keepcfg->active_cores[id_core] = 1;
+   keepcfg->active_cores[id_core] = ALIVE;

[dpdk-dev] [PATCH v1 0/3] Keep-alive enhancements

2016-04-29 Thread Remy Horton
This patchset adds enhancements to the keepalive core monitoring and
reporting sub-system. The first is support for idled (sleeping and
frequency-stepped) CPU cores, and the second is support for applications
to be notified of active as well as faulted cores. The latter is to allow
core state to be relayed to external (secondary) processes, which is
demonstrated by changes to the l2fwd-keepalive example.
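
The intended behaviour can be pictured with the following simplified,
standalone model. It is a sketch and not the rte_keepalive implementation:
all demo_* names are invented, the timer and timestamp handling is left
out, and the only aim is to show where the sleep state and the 'alive'
callback fit into the periodic check.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAXCORES 4

enum demo_state { DEMO_MISSING = 0, DEMO_ALIVE = 1, DEMO_DEAD = 2,
		  DEMO_GONE = 3, DEMO_SLEEP = 4 };

typedef void (*demo_core_cb_t)(void *data, int id_core);

struct demo_keepalive {
	enum demo_state state[DEMO_MAXCORES];
	int active[DEMO_MAXCORES]; /* 1 if the core is being monitored */
	demo_core_cb_t dead_cb;    /* called once when a core is declared dead */
	void *dead_data;
	demo_core_cb_t alive_cb;   /* called on each pass that finds a ping */
	void *alive_data;
};

/* Worker side: called from the core's main loop. */
static void
demo_mark_alive(struct demo_keepalive *ka, int core)
{
	ka->state[core] = DEMO_ALIVE;
}

/* Worker side: called just before the core idles. */
static void
demo_mark_sleep(struct demo_keepalive *ka, int core)
{
	ka->state[core] = DEMO_SLEEP;
}

/* Monitor side: one periodic pass over all monitored cores. */
static void
demo_dispatch_pings(struct demo_keepalive *ka)
{
	int core;

	assert(ka != NULL);
	for (core = 0; core < DEMO_MAXCORES; core++) {
		if (!ka->active[core])
			continue;
		switch (ka->state[core]) {
		case DEMO_ALIVE:   /* pinged since the last pass */
			ka->state[core] = DEMO_MISSING;
			if (ka->alive_cb)
				ka->alive_cb(ka->alive_data, core);
			break;
		case DEMO_MISSING: /* one pass without a ping */
			ka->state[core] = DEMO_DEAD;
			break;
		case DEMO_DEAD:    /* two passes without a ping */
			ka->state[core] = DEMO_GONE;
			if (ka->dead_cb)
				ka->dead_cb(ka->dead_data, core);
			break;
		case DEMO_GONE:    /* already reported */
		case DEMO_SLEEP:   /* idled on purpose: not a fault */
			break;
		}
	}
}

/* Example callback: counts notifications in an int supplied by the app. */
static void
demo_count_cb(void *data, int id_core)
{
	(void)id_core;
	(*(int *)data)++;
}
```

In this model a core that marked itself as sleeping is simply skipped until
it pings again, and an application that wants to relay liveness to another
process would do so from its alive callback.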

Remy Horton (3):
  eal: add new keepalive state & callback hook
  examples/l2fwd-keepalive: add IPC liveness reporting
  doc: add keepalive enhancement documentation

 doc/guides/rel_notes/release_16_07.rst  |   5 +
 examples/Makefile   |   1 +
 examples/l2fwd-keepalive/Makefile   |   4 +-
 examples/l2fwd-keepalive/ka-agent/Makefile  |  51 ++
 examples/l2fwd-keepalive/ka-agent/main.c| 128 +++
 examples/l2fwd-keepalive/main.c |  22 +++-
 examples/l2fwd-keepalive/shm.c  | 130 
 examples/l2fwd-keepalive/shm.h  | 102 +++
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |   7 ++
 lib/librte_eal/common/include/rte_keepalive.h   |  40 
 lib/librte_eal/common/rte_keepalive.c   |  35 ++-
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |   7 ++
 12 files changed, 526 insertions(+), 6 deletions(-)
 create mode 100644 examples/l2fwd-keepalive/ka-agent/Makefile
 create mode 100644 examples/l2fwd-keepalive/ka-agent/main.c
 create mode 100644 examples/l2fwd-keepalive/shm.c
 create mode 100644 examples/l2fwd-keepalive/shm.h

-- 
2.5.5



[dpdk-dev] [PATCH v4 0/8] virtio support for container

2016-04-29 Thread Tan, Jianfeng
Sorry, forgot to mention: this patchset depends on:
  - [PATCH v2] virtio: fix modify drv_flags for specific device
  - [PATCH v3 0/2] virtio: fix memory leak of virtqueue memzones

Thanks,
Jianfeng


> -Original Message-
> From: Tan, Jianfeng
> Sent: Friday, April 29, 2016 9:18 AM
> To: dev at dpdk.org
> Cc: Tan, Jianfeng; Xie, Huawei; rich.lane at bigswitch.com;
> yuanhan.liu at linux.intel.com; mst at redhat.com;
> nakajima.yoshihiro at lab.ntt.co.jp; p.fedin at samsung.com; Qiu, Michael;
> ann.zhuangyanying at huawei.com; mukawa at igel.co.jp;
> nhorman at tuxdriver.com
> Subject: [PATCH v4 0/8] virtio support for container
> 
> v4:
>  - Avoid using dev_type, instead use (eth_dev->pci_device is NULL) to
>judge if it's virtual device or physical device.
>  - Change the added device name to virtio-user.
>  - Split into vhost_user.c, vhost_kernel.c, vhost.c, virtio_user_pci.c,
>virtio_user_dev.c.
>  - Move virtio-user specific data from struct virtio_hw into struct
>virtio_user_hw.
>  - Add support to send reset_owner message.
>  - Change del_queue implementation. (This need more check)
>  - Remove rte_panic(), and superseded with log.
>  - Add reset_owner into virtio_pci_ops.reset.
>  - Merge parameter "rx" and "tx" to "queues" to eliminate confusion.
>  - Move get_features to after set_owner.
>  - Redefine path in virtio_user_hw from char * to char [].
> 
> v3:
>  - Remove --single-file option; do no change at EAL memory.
>  - Remove the added API rte_eal_get_backfile_info(), instead we check all
>opened files with HUGEFILE_FMT to find hugepage files owned by DPDK.
>  - Accordingly, add more restrictions at "Known issue" section.
>  - Rename parameter from queue_num to queue_size for confusion.
>  - Rename vhost_embedded.c to rte_eth_virtio_vdev.c.
>  - Move code related to the newly added vdev to rte_eth_virtio_vdev.c, to
>reuse eth_virtio_dev_init(), remove its static declaration.
>  - Implement dev_uninit() for rte_eth_dev_detach().
>  - WARN -> ERR, in vhost_embedded.c
>  - Add more commit message for clarify the model.
> 
> v2:
>  - Rebase on the patchset of virtio 1.0 support.
>  - Fix cannot create non-hugepage memory.
>  - Fix wrong size of memory region when "single-file" is used.
>  - Fix setting of offset in virtqueue to use virtual address.
>  - Fix setting TUNSETVNETHDRSZ in vhost-user's branch.
>  - Add mac option to specify the mac address of this virtual device.
>  - Update doc.
> 
> This patchset is to provide high performance networking interface (virtio)
> for container-based DPDK applications. The way of starting DPDK apps in
> containers with ownership of NIC devices exclusively is beyond the scope.
> The basic idea here is to present a new virtual device (named virtio-user),
> which can be discovered and initialized by DPDK. To minimize the change,
> we reuse the already-existing virtio PMD code (drivers/net/virtio/).
> 
> Background: Previously, we usually use a virtio device in the context of
> QEMU/VM as below pic shows. Virtio nic is emulated in QEMU, and usually
> presented in VM as a PCI device.
> 
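>   ------------------
>   |  virtio driver |  ----->  VM
>   ------------------
>         |
>         | ----------> (over PCI bus or MMIO or Channel I/O)
>         |
>   ------------------
>   | device emulate |
>   |                |  ----->  QEMU
>   | vhost adapter  |
>   ------------------
>         |
>         | ----------> (vhost-user protocol or vhost-net ioctls)
>         |
>   ------------------
>   | vhost backend  |
>   ------------------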
> 
> Compared to the QEMU/VM case, virtio support for containers requires
> embedding the device framework inside the virtio PMD. So this converged
> driver actually plays three roles:
>   - virtio driver to drive this new kind of virtual device;
>   - device emulation to present this virtual device and respond to the
>     virtio driver, which is originally done by QEMU;
>   - and the role of communicating with the vhost backend, which is also
>     originally done by QEMU.
> 
> The code layout and functionality of each module:
> 
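>   ----------------------
>   | ------------------ |
>   | | virtio driver  | |----> (virtio_user_pci.c)
>   | ------------------ |
>   |         |          |
>   | ------------------ | ------>  virtio-user PMD
>   | | device emulate |-|----> (virtio_user_dev.c)
>   | |                | |
>   | | vhost adapter  |-|----> (vhost_user.c, vhost_kernel.c, vhost.c)
>   | ------------------ |
>   ----------------------
>          |
>          | ------------------> (vhost-user protocol or vhost-net ioctls)
>          |
>    ------------------
>    | vhost backend  |
>    ------------------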
> 
> How to share memory? In VM's case, qemu always shares all physical layout
> to backend. But it's not feasible for a container, as a process, to share
> all virtual memory regions to backend. So only specified virtual memory
> regions (with type of shared) are sent to backend. It's a limitation that
> only addresses in 

[dpdk-dev] [PATCH v3 02/13] ixgbe: move pci device ids to driver

2016-04-29 Thread Wu, Jingjing
Hi, David

For the changes on igb and ixgbe, I saw you created a new header file called
**__pci_dev_ids.h to replace rte_pci_dev_ids.h for each driver.
But for the changes on i40e, you didn't do it that way.
If you look into the base code, you will find that for each Intel NIC the
device ids are already defined there, e.g. in ixgbe_type.h, i40e_devid.h and
e1000_hw.h.

I'd prefer the way you did it in the i40e driver. It's clearer and needs only
minor changes.

Thanks
Jingjing

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of David Marchand
> Sent: Wednesday, April 20, 2016 8:44 PM
> To: dev at dpdk.org
> Cc: thomas.monjalon at 6wind.com; stephen at networkplumber.org;
> Richardson, Bruce; nhorman at tuxdriver.com; pmatilai at redhat.com;
> christian.ehrhardt at canonical.com; Zhang, Helin; Ananyev, Konstantin
> Subject: [dpdk-dev] [PATCH v3 02/13] ixgbe: move pci device ids to driver
> 
> test application and kni still want to know ixgbe pci devices.
> So let's create a header in the driver that will be used by them.
> 
> Same comment as for e1000 driver, we can't reuse base/ headers at the
> moment because of macros redefinitions nightmare.
> 
> Signed-off-by: David Marchand 
> ---
>  app/test-pmd/Makefile   |   2 +
>  app/test-pmd/cmdline.c  |   2 +-
>  app/test/Makefile   |   1 +
>  app/test/test_pci.c |   2 +-
>  drivers/net/ixgbe/ixgbe_ethdev.c|   4 +-
>  drivers/net/ixgbe/ixgbe_pci_dev_ids.h   | 213
> 
>  lib/librte_eal/common/include/rte_pci_dev_ids.h | 154 -
>  lib/librte_eal/linuxapp/kni/Makefile|   1 +
>  lib/librte_eal/linuxapp/kni/kni_misc.c  |   4 +-
>  9 files changed, 223 insertions(+), 160 deletions(-)  create mode 100644
> drivers/net/ixgbe/ixgbe_pci_dev_ids.h
> 
> diff --git a/app/test-pmd/Makefile b/app/test-pmd/Makefile index
> 72426f3..a8899b8 100644


[dpdk-dev] [PATCH v4 8/8] doc: update doc for virtio-user

2016-04-29 Thread Jianfeng Tan
Signed-off-by: Huawei Xie 
Signed-off-by: Jianfeng Tan 
Acked-By: Neil Horman 
---
 doc/guides/nics/overview.rst   | 64 +-
 doc/guides/rel_notes/release_16_07.rst |  4 +++
 2 files changed, 36 insertions(+), 32 deletions(-)

diff --git a/doc/guides/nics/overview.rst b/doc/guides/nics/overview.rst
index f08039e..92e7468 100644
--- a/doc/guides/nics/overview.rst
+++ b/doc/guides/nics/overview.rst
@@ -74,40 +74,40 @@ Most of these differences are summarized below.

 .. table:: Features availability in networking drivers

-    = = = = = = = = = = = = = = = = = = = = = = = = = = = 
= = = = = = = =
-   Feature  a b b b c e e e i i i i i i i i i i f f f f m m m n n 
p r s v v v v x
-f n n o x 1 n n 4 4 4 4 g g x x x x m m m m l l p f u 
c i z h i i m e
-p x x n g 0 a i 0 0 0 0 b b g g g g 1 1 1 1 x x i p l 
a n e o r r x n
-a 2 2 d b 0   c e e e e   v b b b b 0 0 0 0 4 5 p   l 
p g d s t t n v
-c x x i e 0   . v v   f e e e e k k k k e  
   a t i i e i
-k   v n   . f f   . v v   . v v
   t   o o t r
-e   f g   .   .   . f f   . f f
   a . 3 t
-t v   v   v   v   v   v
   2 v
-  e   e   e   e   e   e
 e
-  c   c   c   c   c   c
 c
-    = = = = = = = = = = = = = = = = = = = = = = = = = = = 
= = = = = = = =
+    = = = = = = = = = = = = = = = = = = = = = = = = = = = 
= = = = = = = = =
+   Feature  a b b b c e e e i i i i i i i i i i f f f f m m m n n 
p r s v v v v v x
+f n n o x 1 n n 4 4 4 4 g g x x x x m m m m l l p f u 
c i z h i i i m e
+p x x n g 0 a i 0 0 0 0 b b g g g g 1 1 1 1 x x i p l 
a n e o r r r x n
+a 2 2 d b 0   c e e e e   v b b b b 0 0 0 0 4 5 p   l 
p g d s t t t n v
+c x x i e 0   . v v   f e e e e k k k k e  
   a t i i i e i
+k   v n   . f f   . v v   . v v
   t   o o o t r
+e   f g   .   .   . f f   . f f
   a . u 3 t
+t v   v   v   v   v   v
   2 v s
+  e   e   e   e   e   e
 e e
+  c   c   c   c   c   c
 c r
+    = = = = = = = = = = = = = = = = = = = = = = = = = = = 
= = = = = = = = =
Speed capabilities
-   Link statusY Y   Y Y   Y Y Y Y   Y Y Y Y Y Y
   Y Y Y Y
+   Link statusY Y   Y Y   Y Y Y Y   Y Y Y Y Y Y
   Y Y Y Y Y
Link status event  Y Y Y Y Y Y   Y Y Y Y
 Y
Queue status event  
 Y
Rx interrupt   Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y
-   Queue start/stop Y   Y Y Y Y Y Y Y Y Y Y Y Y Y Y
   Y   Y Y
+   Queue start/stop Y   Y Y Y Y Y Y Y Y Y Y Y Y Y Y
   Y   Y Y Y
MTU update   Y Y Y   Y   Y Y Y Y Y Y
Jumbo frame  Y Y Y Y Y Y Y Y Y   Y Y Y Y Y Y Y Y Y Y   Y
-   Scattered Rx Y Y Y   Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y
   Y   Y
+   Scattered Rx Y Y Y   Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y
   Y   Y   Y
LRO  Y Y Y Y
TSO  Y   Y   Y Y Y Y Y Y Y Y Y Y Y Y Y Y
-   Promiscuous mode   Y Y   Y Y   Y Y Y Y Y Y Y Y Y Y Y Y Y
   Y   Y Y
-   Allmulticast modeY Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y
   Y   Y Y
-   Unicast MAC filter Y Y Y   Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y
   Y Y
-   Multicast MAC filter   Y Y Y Y Y Y Y Y Y Y Y
   Y Y
+   Promiscuous mode   Y Y   Y Y   Y Y Y Y Y Y Y Y Y Y Y Y Y
   Y   Y Y Y
+   Allmulticast modeY Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y
   Y   Y Y Y
+   Unicast MAC filter Y Y Y   Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y
   Y Y Y
+   Multicast MAC filter   Y Y Y Y Y Y Y Y Y Y Y
   Y Y Y
RSS hash Y   Y Y Y Y Y Y Y   Y Y Y Y Y Y Y Y Y Y
RSS key update   Y   Y Y Y Y Y   Y Y Y Y Y Y Y Y   Y
RSS reta update  Y   Y Y Y Y Y   Y Y Y Y Y Y Y Y   Y
VMDq Y Y Y   Y Y

[dpdk-dev] [PATCH v4 7/8] virtio-user: add a new virtual device named virtio-user

2016-04-29 Thread Jianfeng Tan
Add a new virtual device named virtio-user, which can be used just like
eth_ring, eth_null, etc. To reuse the code of the original virtio PMD, we
make some adjustments in virtio_ethdev.c, such as removing the _static_
keyword from eth_virtio_dev_init() so that it can be reused by the virtual
device, and we add some checks to make sure it will not crash.

Configured parameters include:
  - queues (optional, 1 by default), number of rx, multi-queue not
supported for now.
  - cq (optional, 0 by default), not supported for now.
  - mac (optional), random value will be given if not specified.
  - queue_size (optional, 256 by default), size of virtqueues.
  - path (mandatory), path of vhost; the backend depends on the file type:
    vhost-user if the given path points to a unix socket, vhost-net if the
    given path points to a char device.
  - ifname (optional), specify the name of backend tap device; only
valid when backend is vhost-net.
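
The rule for the path parameter could be sketched like this. It is an
illustration under assumptions, not code from the patch: enum demo_backend
and demo_backend_type() are invented names, and only the file-type test
described above is shown.

```c
#define _POSIX_C_SOURCE 200112L

#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>

/* Hypothetical backend kinds, named after the two cases listed above. */
enum demo_backend {
	DEMO_BACKEND_INVALID = 0,
	DEMO_BACKEND_VHOST_USER,  /* path is a unix socket */
	DEMO_BACKEND_VHOST_NET,   /* path is a character device */
};

/* Classify the backend from the file type of path. */
static enum demo_backend
demo_backend_type(const char *path)
{
	struct stat st;

	assert(path != NULL);
	if (stat(path, &st) < 0)
		return DEMO_BACKEND_INVALID;
	if (S_ISSOCK(st.st_mode))
		return DEMO_BACKEND_VHOST_USER;
	if (S_ISCHR(st.st_mode))
		return DEMO_BACKEND_VHOST_NET;
	return DEMO_BACKEND_INVALID;
}
```

Any character device is classified as vhost-net by this sketch, so a real
implementation would still have to verify that the device really is
/dev/vhost-net (or equivalent) before issuing ioctls on it.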

When CONFIG_RTE_VIRTIO_VDEV is enabled (it is by default), the compiled
library can be used in both VM and container environments.

Examples:
path_vhost=/dev/vhost-net # use vhost-net as a backend
path_vhost= # use vhost-user as a backend

sudo ./examples/l2fwd/build/l2fwd -c 0x10 -n 4 \
--socket-mem 0,1024 --no-pci --file-prefix=l2fwd \
--vdev=virtio-user0,mac=00:01:02:03:04:05,path=$path_vhost -- -p 0x1

Known issues:
 - Control queue and multi-queue are not supported yet.
 - Cannot work with --huge-unlink.
 - Cannot work with --no-huge.
 - Cannot work when there are more than VHOST_MEMORY_MAX_NREGIONS(8)
   hugepages.
 - Root privilege is a must (mainly because of sorting hugepages according
   to physical address).
 - Applications should not use file name like HUGEFILE_FMT ("%smap_%d").

Signed-off-by: Huawei Xie 
Signed-off-by: Jianfeng Tan 
Acked-By: Neil Horman 
---
 drivers/net/virtio/virtio_ethdev.c   |  19 +-
 drivers/net/virtio/virtio_ethdev.h   |   2 +
 drivers/net/virtio/virtio_user/virtio_user_dev.c | 307 +++
 3 files changed, 321 insertions(+), 7 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
index 16b324d..54462a3 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -59,7 +59,6 @@
 #include "virtqueue.h"
 #include "virtio_rxtx.h"

-static int eth_virtio_dev_init(struct rte_eth_dev *eth_dev);
 static int eth_virtio_dev_uninit(struct rte_eth_dev *eth_dev);
 static int  virtio_dev_configure(struct rte_eth_dev *dev);
 static int  virtio_dev_start(struct rte_eth_dev *dev);
@@ -1017,7 +1016,7 @@ rx_func_get(struct rte_eth_dev *eth_dev)
  * This function is based on probe() function in virtio_pci.c
  * It returns 0 on success.
  */
-static int
+int
 eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
 {
struct virtio_hw *hw = eth_dev->data->dev_private;
@@ -1048,9 +1047,11 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)

pci_dev = eth_dev->pci_dev;

-   ret = vtpci_init(pci_dev, hw, &dev_flags);
-   if (ret)
-   return ret;
+   if (pci_dev) {
+   ret = vtpci_init(pci_dev, hw, &dev_flags);
+   if (ret)
+   return ret;
+   }

/* Reset the device although not necessary at startup */
vtpci_reset(hw);
@@ -1147,7 +1148,8 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)

PMD_INIT_LOG(DEBUG, "hw->max_rx_queues=%d   hw->max_tx_queues=%d",
hw->max_rx_queues, hw->max_tx_queues);
-   PMD_INIT_LOG(DEBUG, "port %d vendorID=0x%x deviceID=0x%x",
+   if (pci_dev)
+   PMD_INIT_LOG(DEBUG, "port %d vendorID=0x%x deviceID=0x%x",
eth_dev->data->port_id, pci_dev->id.vendor_id,
pci_dev->id.device_id);

@@ -1426,7 +1428,10 @@ virtio_dev_info_get(struct rte_eth_dev *dev, struct 
rte_eth_dev_info *dev_info)
 {
struct virtio_hw *hw = dev->data->dev_private;

-   dev_info->driver_name = dev->driver->pci_drv.name;
+   if (dev->pci_dev)
+   dev_info->driver_name = dev->driver->pci_drv.name;
+   else
+   dev_info->driver_name = "virtio-user PMD";
dev_info->max_rx_queues = (uint16_t)hw->max_rx_queues;
dev_info->max_tx_queues = (uint16_t)hw->max_tx_queues;
dev_info->min_rx_bufsize = VIRTIO_MIN_RX_BUFSIZE;
diff --git a/drivers/net/virtio/virtio_ethdev.h 
b/drivers/net/virtio/virtio_ethdev.h
index 66423a0..284afaa 100644
--- a/drivers/net/virtio/virtio_ethdev.h
+++ b/drivers/net/virtio/virtio_ethdev.h
@@ -113,6 +113,8 @@ uint16_t virtio_recv_pkts_vec(void *rx_queue, struct 
rte_mbuf **rx_pkts,
 uint16_t virtio_xmit_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
uint16_t nb_pkts);

+int eth_virtio_dev_init(struct rte_eth_dev *eth_dev);
+
 /*
  * The VIRTIO_NET_F_GUEST_TSO[46] features permit the host to send us
  * frames larger than 1514 bytes. We do not yet support software LRO
diff --git 

[dpdk-dev] [PATCH v4 6/8] virtio-user: add new virtual pci driver for virtio

2016-04-29 Thread Jianfeng Tan
This patch implements another instance of struct virtio_pci_ops to
drive the virtio-user virtual device. Instead of reading/writing ioports
or PCI configuration space, this virtual pci driver reads/writes the
virtual device struct virtio_user_hw and, when necessary, invokes APIs
provided by the device emulation layer to start/stop the device.

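  ----------------------
  | ------------------ |
  | | virtio driver  | |----> (virtio_user_pci.c)
  | ------------------ |
  |         |          |
  | ------------------ | ------>  virtio-user PMD
  | | device emulate | |
  | |                | |
  | | vhost adapter  | |
  | ------------------ |
  ----------------------
            |
            |
            |
   ------------------
   | vhost backend  |
   ------------------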

Signed-off-by: Huawei Xie 
Signed-off-by: Jianfeng Tan 
Acked-By: Neil Horman 
---
 drivers/net/virtio/Makefile  |   1 +
 drivers/net/virtio/virtio_user/virtio_user_dev.h |   2 +
 drivers/net/virtio/virtio_user/virtio_user_pci.c | 209 +++
 3 files changed, 212 insertions(+)
 create mode 100644 drivers/net/virtio/virtio_user/virtio_user_pci.c

diff --git a/drivers/net/virtio/Makefile b/drivers/net/virtio/Makefile
index 68068bd..13b2b75 100644
--- a/drivers/net/virtio/Makefile
+++ b/drivers/net/virtio/Makefile
@@ -60,6 +60,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_user/vhost.c
 SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_user/vhost_user.c
 SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_user/vhost_kernel.c
 SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_user/virtio_user_dev.c
+SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_user/virtio_user_pci.c
 endif

 # this lib depends upon:
diff --git a/drivers/net/virtio/virtio_user/virtio_user_dev.h 
b/drivers/net/virtio/virtio_user/virtio_user_dev.h
index 76250f0..bc4dc1a 100644
--- a/drivers/net/virtio/virtio_user/virtio_user_dev.h
+++ b/drivers/net/virtio/virtio_user/virtio_user_dev.h
@@ -56,4 +56,6 @@ struct virtio_user_hw {
 int virtio_user_start_device(struct virtio_user_hw *hw);
 int virtio_user_stop_device(struct virtio_user_hw *hw);

+const struct virtio_pci_ops vdev_ops;
+
 #endif
diff --git a/drivers/net/virtio/virtio_user/virtio_user_pci.c 
b/drivers/net/virtio/virtio_user/virtio_user_pci.c
new file mode 100644
index 000..60351d9
--- /dev/null
+++ b/drivers/net/virtio/virtio_user/virtio_user_pci.c
@@ -0,0 +1,209 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include 
+#include 
+#include 
+
+#include "../virtio_logs.h"
+#include "../virtio_pci.h"
+#include "../virtqueue.h"
+#include "virtio_user_dev.h"
+
+static void
+vdev_read_dev_config(struct virtio_hw *hw, uint64_t offset,
+void *dst, int length)
+{
+   int i;
+   struct virtio_user_hw *uhw = (struct virtio_user_hw *)hw->vdev_private;
+
+   if (offset == offsetof(struct virtio_net_config, mac) &&
+   length == ETHER_ADDR_LEN) {
+   for (i = 0; i < ETHER_ADDR_LEN; ++i)
+   ((uint8_t *)dst)[i] = uhw->mac_addr[i];
+   return;
+   }
+
+   if (offset == offsetof(struct virtio_net_config, status))
+   *(uint16_t *)dst = uhw->status;
+
+   if (offset == offsetof(struct virtio_net_config, max_virtqueue_pairs))
+   *(uint16_t *)dst = 

[dpdk-dev] [PATCH v4 5/8] virtio-user: add device emulation layer APIs

2016-04-29 Thread Jianfeng Tan
Two device emulation layer APIs are added for virtio driver to call:
  - virtio_user_start_device()
  - virtio_user_stop_device()

These APIs are called by the virtio driver, and they in turn call vhost
adapter layer APIs to implement the functionality. Besides, this patch
defines a struct named virtio_user_hw to hold the data that represents
this kind of virtual device.
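
When starting the device, the patch creates a callfd and a kickfd per
virtqueue with eventfd(). A minimal Linux-only sketch of that step
follows; make_vq_eventfds() is a hypothetical helper, not a function from
the patch. It uses the EFD_* flag names documented for eventfd(); the
patch itself passes O_CLOEXEC | O_NONBLOCK, which on Linux carry the same
values.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Create the two event fds for one virtqueue.
 * Returns 0 on success; on failure returns -1 and leaves no fd open.
 */
static int
make_vq_eventfds(int *callfd, int *kickfd)
{
	assert(callfd != NULL && kickfd != NULL);

	*callfd = eventfd(0, EFD_CLOEXEC | EFD_NONBLOCK);
	if (*callfd < 0)
		return -1;

	*kickfd = eventfd(0, EFD_CLOEXEC | EFD_NONBLOCK);
	if (*kickfd < 0) {
		close(*callfd);
		*callfd = -1;
		return -1;
	}

	return 0;
}
```

Closing callfd when the second eventfd() fails avoids leaking a
descriptor on the error path.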

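  ----------------------
  | ------------------ |
  | | virtio driver  | |
  | ------------------ |
  |         |          |
  | ------------------ | ------>  virtio-user PMD
  | | device emulate |-|----> (virtio_user_dev.c, virtio_user_dev.h)
  | |                | |
  | | vhost adapter  | |
  | ------------------ |
  ----------------------
            |
            |
            |
   ------------------
   | vhost backend  |
   ------------------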

Signed-off-by: Huawei Xie 
Signed-off-by: Jianfeng Tan 
Acked-By: Neil Horman 
---
 drivers/net/virtio/Makefile  |   1 +
 drivers/net/virtio/virtio_user/virtio_user_dev.c | 168 +++
 drivers/net/virtio/virtio_user/virtio_user_dev.h |  59 
 3 files changed, 228 insertions(+)
 create mode 100644 drivers/net/virtio/virtio_user/virtio_user_dev.c
 create mode 100644 drivers/net/virtio/virtio_user/virtio_user_dev.h

diff --git a/drivers/net/virtio/Makefile b/drivers/net/virtio/Makefile
index c9f2bc0..68068bd 100644
--- a/drivers/net/virtio/Makefile
+++ b/drivers/net/virtio/Makefile
@@ -59,6 +59,7 @@ ifeq ($(CONFIG_RTE_VIRTIO_VDEV),y)
 SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_user/vhost.c
 SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_user/vhost_user.c
 SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_user/vhost_kernel.c
+SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_user/virtio_user_dev.c
 endif

 # this lib depends upon:
diff --git a/drivers/net/virtio/virtio_user/virtio_user_dev.c 
b/drivers/net/virtio/virtio_user/virtio_user_dev.c
new file mode 100644
index 000..81f7f03
--- /dev/null
+++ b/drivers/net/virtio/virtio_user/virtio_user_dev.c
@@ -0,0 +1,168 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+
+#include "vhost.h"
+#include "virtio_user_dev.h"
+#include "../virtio_ethdev.h"
+
+static int
+kick_one_vq(struct virtio_user_hw *hw, struct virtqueue *vq,
+   unsigned queue_sel)
+{
+   int callfd, kickfd;
+   struct vhost_vring_file file;
+   struct vhost_vring_state state;
+   struct vhost_vring_addr addr = {
+   .index = queue_sel,
+   .desc_user_addr = (uint64_t)(uintptr_t)vq->vq_ring.desc,
+   .avail_user_addr = (uint64_t)(uintptr_t)vq->vq_ring.avail,
+   .used_user_addr = (uint64_t)(uintptr_t)vq->vq_ring.used,
+   .log_guest_addr = 0,
+   .flags = 0, /* disable log */
+   };
+
+   /* May use invalid flag, but some backend leverages kickfd and callfd as
+* criteria to judge if dev is alive. so finally we use real event_fd.
+*/
+   callfd = eventfd(0, O_CLOEXEC | O_NONBLOCK);
+   if (callfd < 0) {
+   PMD_DRV_LOG(ERR, "callfd error, %s\n", strerror(errno));
+   return -1;
+   }
+   kickfd = eventfd(0, O_CLOEXEC | 

[dpdk-dev] [PATCH v4 4/8] virtio-user: add vhost adapter layer

2016-04-29 Thread Jianfeng Tan
This patch is to provide vhost adapter layer implementations. Instead
of relying on a hypervisor to translate between device emulation and
vhost backend, here we directly talk with vhost backend through the
vhost file. Depending on the type of vhost file,
  - vhost-user is used if the given path points to a unix socket;
  - vhost-kernel is used if the given path points to a char device.

Here three main APIs are provided to upper layer (device emulation):
  - vhost_user_setup(), to set up env to talk to a vhost user backend;
  - vhost_kernel_setup(), to set up env to talk to a vhost kernel backend.
  - vhost_call(), to provide a unified interface to communicate with
vhost backend.

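  ----------------------
  | ------------------ |
  | | virtio driver  | |
  | ------------------ |
  |         |          |
  | ------------------ | ------>  virtio-user PMD
  | | device emulate | |
  | |                | |
  | | vhost adapter  |-|----> (vhost_user.c, vhost_kernel.c, vhost.c)
  | ------------------ |
  ----------------------
            |
            | -------------- --> (vhost-user protocol or vhost-net ioctls)
            |
   ------------------
   | vhost backend  |
   ------------------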

Signed-off-by: Huawei Xie 
Signed-off-by: Jianfeng Tan 
Acked-By: Neil Horman 
---
 config/common_linuxapp|   3 +
 drivers/net/virtio/Makefile   |   6 +
 drivers/net/virtio/virtio_pci.h   |   1 +
 drivers/net/virtio/virtio_user/vhost.c| 105 
 drivers/net/virtio/virtio_user/vhost.h| 221 +++
 drivers/net/virtio/virtio_user/vhost_kernel.c | 254 +
 drivers/net/virtio/virtio_user/vhost_user.c   | 375 ++
 7 files changed, 965 insertions(+)
 create mode 100644 drivers/net/virtio/virtio_user/vhost.c
 create mode 100644 drivers/net/virtio/virtio_user/vhost.h
 create mode 100644 drivers/net/virtio/virtio_user/vhost_kernel.c
 create mode 100644 drivers/net/virtio/virtio_user/vhost_user.c

diff --git a/config/common_linuxapp b/config/common_linuxapp
index 7e698e2..946a6d4 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -43,3 +43,6 @@ CONFIG_RTE_LIBRTE_VHOST=y
 CONFIG_RTE_LIBRTE_PMD_VHOST=y
 CONFIG_RTE_LIBRTE_PMD_AF_PACKET=y
 CONFIG_RTE_LIBRTE_POWER=y
+
+# Enable virtio-user
+CONFIG_RTE_VIRTIO_VDEV=y
diff --git a/drivers/net/virtio/Makefile b/drivers/net/virtio/Makefile
index ef84f60..c9f2bc0 100644
--- a/drivers/net/virtio/Makefile
+++ b/drivers/net/virtio/Makefile
@@ -55,6 +55,12 @@ ifeq ($(findstring 
RTE_MACHINE_CPUFLAG_SSSE3,$(CFLAGS)),RTE_MACHINE_CPUFLAG_SSSE
 SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_rxtx_simple.c
 endif

+ifeq ($(CONFIG_RTE_VIRTIO_VDEV),y)
+SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_user/vhost.c
+SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_user/vhost_user.c
+SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_user/vhost_kernel.c
+endif
+
 # this lib depends upon:
 DEPDIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += lib/librte_eal lib/librte_ether
 DEPDIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += lib/librte_mempool lib/librte_mbuf
diff --git a/drivers/net/virtio/virtio_pci.h b/drivers/net/virtio/virtio_pci.h
index a76daf7..b9f1ee5 100644
--- a/drivers/net/virtio/virtio_pci.h
+++ b/drivers/net/virtio/virtio_pci.h
@@ -260,6 +260,7 @@ struct virtio_hw {
struct virtio_pci_common_cfg *common_cfg;
struct virtio_net_config *dev_cfg;
const struct virtio_pci_ops *vtpci_ops;
+   void *vdev_private;
 };

 /*
diff --git a/drivers/net/virtio/virtio_user/vhost.c 
b/drivers/net/virtio/virtio_user/vhost.c
new file mode 100644
index 000..ff76658
--- /dev/null
+++ b/drivers/net/virtio/virtio_user/vhost.c
@@ -0,0 +1,105 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, 

[dpdk-dev] [PATCH v4 3/8] virtio: enable use virtual address to fill desc

2016-04-29 Thread Jianfeng Tan
This patch is related to how to calculate relative address for vhost
backend.

The principle is as follows: based on one or more shared memory segments,
vhost maintains a reference system with the base address and length
of each segment, so that an address coming from the VM (usually a GPA,
Guest Physical Address) can be translated into a vhost-recognizable
address (named VVA, Vhost Virtual Address). In the VM case, GPA is always
locally contiguous. In other cases, such as virtio-user, a virtual
address can be used instead.

It basically means:
  a. when set_base_addr, VA address is used;
  b. when preparing RX's descriptors, VA address is used;
  c. when transmitting packets, VA is filled in TX's descriptors;
  d. in TX and CQ's header, VA is used.
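
The translation described above can be sketched as a lookup in a region
table. struct mem_region and to_vva() are illustrative names only, and
returning 0 for "not found" is a simplification chosen for this sketch;
it is not taken from any vhost implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One shared memory segment as the backend would record it. */
struct mem_region {
	uint64_t front_addr; /* base as seen by the virtio side (GPA or VA) */
	uint64_t size;       /* length of the segment in bytes */
	uint64_t vhost_addr; /* base of the same memory as mapped by vhost */
};

/*
 * Translate [addr, addr + len) into a vhost virtual address.
 * Returns 0 if the range does not lie entirely inside one region.
 */
static uint64_t
to_vva(const struct mem_region *regs, size_t n, uint64_t addr, uint64_t len)
{
	size_t i;

	assert(regs != NULL || n == 0);

	for (i = 0; i < n; i++) {
		const struct mem_region *r = &regs[i];

		/* written to avoid overflow in addr + len */
		if (addr >= r->front_addr && len <= r->size &&
		    addr - r->front_addr <= r->size - len)
			return r->vhost_addr + (addr - r->front_addr);
	}

	return 0;
}
```

With virtio-user, front_addr would simply be the process virtual address
of each shared segment, which is why descriptors can carry VAs.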

Signed-off-by: Huawei Xie 
Signed-off-by: Jianfeng Tan 
Acked-By: Neil Horman 
---
 drivers/net/virtio/virtio_ethdev.c  | 11 ---
 drivers/net/virtio/virtio_rxtx.c|  5 ++---
 drivers/net/virtio/virtio_rxtx_simple.c | 13 +++--
 drivers/net/virtio/virtqueue.h  | 13 -
 4 files changed, 29 insertions(+), 13 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c 
b/drivers/net/virtio/virtio_ethdev.c
index 0c20fb9..16b324d 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -167,14 +167,14 @@ virtio_send_command(struct virtqueue *vq, struct 
virtio_pmd_ctrl *ctrl,
 * One RX packet for ACK.
 */
vq->vq_ring.desc[head].flags = VRING_DESC_F_NEXT;
-   vq->vq_ring.desc[head].addr = vq->virtio_net_hdr_mz->phys_addr;
+   vq->vq_ring.desc[head].addr = vq->virtio_net_hdr_mem;
vq->vq_ring.desc[head].len = sizeof(struct virtio_net_ctrl_hdr);
vq->vq_free_cnt--;
i = vq->vq_ring.desc[head].next;

for (k = 0; k < pkt_num; k++) {
vq->vq_ring.desc[i].flags = VRING_DESC_F_NEXT;
-   vq->vq_ring.desc[i].addr = vq->virtio_net_hdr_mz->phys_addr
+   vq->vq_ring.desc[i].addr = vq->virtio_net_hdr_mem
+ sizeof(struct virtio_net_ctrl_hdr)
+ sizeof(ctrl->status) + sizeof(uint8_t)*sum;
vq->vq_ring.desc[i].len = dlen[k];
@@ -184,7 +184,7 @@ virtio_send_command(struct virtqueue *vq, struct 
virtio_pmd_ctrl *ctrl,
}

vq->vq_ring.desc[i].flags = VRING_DESC_F_WRITE;
-   vq->vq_ring.desc[i].addr = vq->virtio_net_hdr_mz->phys_addr
+   vq->vq_ring.desc[i].addr = vq->virtio_net_hdr_mem
+ sizeof(struct virtio_net_ctrl_hdr);
vq->vq_ring.desc[i].len = sizeof(ctrl->status);
vq->vq_free_cnt--;
@@ -426,6 +426,11 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,
memset(vq->virtio_net_hdr_mz->addr, 0, PAGE_SIZE);
}

+   /* Use physical address to fill desc.addr by default,
+* and will be changed to use virtual address for vdev.
+*/
+   vq->offset = offsetof(struct rte_mbuf, buf_physaddr);
+
if (hw->vtpci_ops->setup_queue(hw, vq) < 0) {
PMD_INIT_LOG(ERR, "setup_queue failed");
virtio_dev_queue_release(vq);
diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index ef21d8e..9d7e537 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -193,8 +193,7 @@ virtqueue_enqueue_recv_refill(struct virtqueue *vq, struct 
rte_mbuf *cookie)

start_dp = vq->vq_ring.desc;
start_dp[idx].addr =
-   (uint64_t)(cookie->buf_physaddr + RTE_PKTMBUF_HEADROOM
-   - hw->vtnet_hdr_size);
+   MBUF_DATA_DMA_ADDR(cookie, vq->offset) - hw->vtnet_hdr_size;
start_dp[idx].len =
cookie->buf_len - RTE_PKTMBUF_HEADROOM + hw->vtnet_hdr_size;
start_dp[idx].flags =  VRING_DESC_F_WRITE;
@@ -265,7 +264,7 @@ virtqueue_enqueue_xmit(struct virtqueue *txvq, struct 
rte_mbuf *cookie,
}

do {
-   start_dp[idx].addr  = rte_mbuf_data_dma_addr(cookie);
+   start_dp[idx].addr  = MBUF_DATA_DMA_ADDR(cookie, txvq->offset);
start_dp[idx].len   = cookie->data_len;
start_dp[idx].flags = cookie->next ? VRING_DESC_F_NEXT : 0;
idx = start_dp[idx].next;
diff --git a/drivers/net/virtio/virtio_rxtx_simple.c 
b/drivers/net/virtio/virtio_rxtx_simple.c
index 8f5293d..83a794e 100644
--- a/drivers/net/virtio/virtio_rxtx_simple.c
+++ b/drivers/net/virtio/virtio_rxtx_simple.c
@@ -80,8 +80,8 @@ virtqueue_enqueue_recv_refill_simple(struct virtqueue *vq,
vq->sw_ring[desc_idx] = cookie;

start_dp = vq->vq_ring.desc;
-   start_dp[desc_idx].addr = (uint64_t)((uintptr_t)cookie->buf_physaddr +
-   RTE_PKTMBUF_HEADROOM - vq->hw->vtnet_hdr_size);
+   start_dp[desc_idx].addr = MBUF_DATA_DMA_ADDR(cookie, vq->offset) -
+ vq->hw->vtnet_hdr_size;
start_dp[desc_idx].len = cookie->buf_len -

[dpdk-dev] [PATCH v4 2/8] virtio: abstract vring hdr desc init as a method

2016-04-29 Thread Jianfeng Tan
To make it reusable, here we abstract the initialization of vring
header into a method.
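
The address arithmetic that the new method performs for each entry can
be sketched in isolation. The toy_* types below are stand-ins with
made-up sizes, not the real struct virtio_tx_region layout; only the
formula (base + i * region size + offset of the header) follows the
patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_desc {
	uint64_t addr;
	uint32_t len;
	uint16_t flags;
	uint16_t next;
};

/* Stand-in for one per-packet TX region: header plus indirect table. */
struct toy_tx_region {
	uint8_t tx_hdr[12];          /* placeholder for the virtio-net header */
	struct toy_desc tx_indir[8]; /* placeholder for the indirect descs */
};

/* Address of the TX header belonging to ring entry i. */
static uint64_t
toy_tx_hdr_addr(uint64_t hdr_mem, unsigned i, unsigned nentries)
{
	assert(i < nentries);

	return hdr_mem + (uint64_t)i * sizeof(struct toy_tx_region) +
	       offsetof(struct toy_tx_region, tx_hdr);
}
```

Because hdr_mem is just a base value, the same formula works whether the
base is a physical or a virtual address.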

Signed-off-by: Huawei Xie 
Signed-off-by: Jianfeng Tan 
Acked-By: Neil Horman 
---
 drivers/net/virtio/virtio_ethdev.c | 22 --
 drivers/net/virtio/virtqueue.h | 20 
 2 files changed, 24 insertions(+), 18 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c 
b/drivers/net/virtio/virtio_ethdev.c
index 534f0e6..0c20fb9 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -380,8 +380,7 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,

if (queue_type == VTNET_TQ) {
const struct rte_memzone *hdr_mz;
-   struct virtio_tx_region *txr;
-   unsigned int i;
+   size_t hdr_mz_sz = vq_size * sizeof(struct virtio_tx_region);

/*
 * For each xmit packet, allocate a virtio_net_hdr
@@ -390,7 +389,7 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,
snprintf(vq_name, sizeof(vq_name), "port%d_tvq%d_hdrzone",
 dev->data->port_id, queue_idx);
hdr_mz = rte_memzone_reserve_aligned(vq_name,
-vq_size * sizeof(*txr),
+hdr_mz_sz,
 socket_id, 0,
 RTE_CACHE_LINE_SIZE);
if (hdr_mz == NULL) {
@@ -404,21 +403,8 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,
vq->virtio_net_hdr_mz = hdr_mz;
vq->virtio_net_hdr_mem = hdr_mz->phys_addr;

-   txr = hdr_mz->addr;
-   memset(txr, 0, vq_size * sizeof(*txr));
-   for (i = 0; i < vq_size; i++) {
-   struct vring_desc *start_dp = txr[i].tx_indir;
-
-   vring_desc_init(start_dp, RTE_DIM(txr[i].tx_indir));
-
-   /* first indirect descriptor is always the tx header */
-   start_dp->addr = vq->virtio_net_hdr_mem
-   + i * sizeof(*txr)
-   + offsetof(struct virtio_tx_region, tx_hdr);
-
-   start_dp->len = vq->hw->vtnet_hdr_size;
-   start_dp->flags = VRING_DESC_F_NEXT;
-   }
+   memset(hdr_mz->addr, 0, hdr_mz_sz);
+   vring_hdr_desc_init(vq);

} else if (queue_type == VTNET_CQ) {
/* Allocate a page for control vq command, data and status */
diff --git a/drivers/net/virtio/virtqueue.h b/drivers/net/virtio/virtqueue.h
index 83d89ca..3b19fd1 100644
--- a/drivers/net/virtio/virtqueue.h
+++ b/drivers/net/virtio/virtqueue.h
@@ -264,6 +264,26 @@ vring_desc_init(struct vring_desc *dp, uint16_t n)
dp[i].next = VQ_RING_DESC_CHAIN_END;
 }

+static inline void
+vring_hdr_desc_init(struct virtqueue *vq)
+{
+   int i;
+   struct virtio_tx_region *txr = vq->virtio_net_hdr_mz->addr;
+
+   for (i = 0; i < vq->vq_nentries; i++) {
+   struct vring_desc *start_dp = txr[i].tx_indir;
+
+   vring_desc_init(start_dp, RTE_DIM(txr[i].tx_indir));
+
+   /* first indirect descriptor is always the tx header */
+   start_dp->addr = vq->virtio_net_hdr_mem + i * sizeof(*txr) +
+offsetof(struct virtio_tx_region, tx_hdr);
+
+   start_dp->len = vq->hw->vtnet_hdr_size;
+   start_dp->flags = VRING_DESC_F_NEXT;
+   }
+}
+
 /**
  * Tell the backend not to interrupt us.
  */
-- 
2.1.4



[dpdk-dev] [PATCH v4 1/8] virtio: hide phys addr check inside pci ops

2016-04-29 Thread Jianfeng Tan
This patch moves the phys addr check from virtio_dev_queue_setup
to pci ops. To make that happen, virtio_ops.setup_queue now returns
a result that tells the caller whether the check passed.
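
The 16TB limit follows from a 32-bit page frame number combined with a
4 KiB page shift: 2^32 * 2^12 = 2^44 bytes. A worked sketch of the check,
where QUEUE_ADDR_SHIFT is assumed to be 12 (matching the 16TB figure in
the patch comment) and ring_fits_legacy_pfn() is a hypothetical name:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value of VIRTIO_PCI_QUEUE_ADDR_SHIFT: 4 KiB pages. */
#define QUEUE_ADDR_SHIFT 12

/*
 * Return 1 if the last byte of the ring still has a page frame number
 * that fits in 32 bits, 0 otherwise.
 */
static int
ring_fits_legacy_pfn(uint64_t ring_phys, uint64_t ring_size)
{
	assert(ring_size > 0);

	return ((ring_phys + ring_size - 1) >> (QUEUE_ADDR_SHIFT + 32)) == 0;
}
```

A ring ending exactly at the 16TB boundary still passes, because the
check looks at the last byte (start + size - 1), not one past the end.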

Signed-off-by: Huawei Xie 
Signed-off-by: Jianfeng Tan 
Acked-By: Neil Horman 
---
 drivers/net/virtio/virtio_ethdev.c | 17 +
 drivers/net/virtio/virtio_pci.c| 30 --
 drivers/net/virtio/virtio_pci.h|  2 +-
 3 files changed, 34 insertions(+), 15 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c 
b/drivers/net/virtio/virtio_ethdev.c
index bd990ff..534f0e6 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -369,17 +369,6 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,
}
}

-   /*
-* Virtio PCI device VIRTIO_PCI_QUEUE_PF register is 32bit,
-* and only accepts 32 bit page frame number.
-* Check if the allocated physical memory exceeds 16TB.
-*/
-   if ((mz->phys_addr + vq->vq_ring_size - 1) >> 
(VIRTIO_PCI_QUEUE_ADDR_SHIFT + 32)) {
-   PMD_INIT_LOG(ERR, "vring address shouldn't be above 16TB!");
-   virtio_dev_queue_release(vq);
-   return -ENOMEM;
-   }
-
memset(mz->addr, 0, sizeof(mz->len));
vq->mz = mz;
vq->vq_ring_mem = mz->phys_addr;
@@ -451,7 +440,11 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,
memset(vq->virtio_net_hdr_mz->addr, 0, PAGE_SIZE);
}

-   hw->vtpci_ops->setup_queue(hw, vq);
+   if (hw->vtpci_ops->setup_queue(hw, vq) < 0) {
+   PMD_INIT_LOG(ERR, "setup_queue failed");
+   virtio_dev_queue_release(vq);
+   return -EINVAL;
+   }

vq->started = 1;
*pvq = vq;
diff --git a/drivers/net/virtio/virtio_pci.c b/drivers/net/virtio/virtio_pci.c
index 9cdca06..6bd239c 100644
--- a/drivers/net/virtio/virtio_pci.c
+++ b/drivers/net/virtio/virtio_pci.c
@@ -55,6 +55,22 @@
  */
 #define VIRTIO_PCI_CONFIG(hw) (((hw)->use_msix) ? 24 : 20)

+static inline int
+check_vq_phys_addr_ok(struct virtqueue *vq)
+{
+   /* Virtio PCI device VIRTIO_PCI_QUEUE_PF register is 32bit,
+* and only accepts 32 bit page frame number.
+* Check if the allocated physical memory exceeds 16TB.
+*/
+   if ((vq->vq_ring_mem + vq->vq_ring_size - 1) >>
+   (VIRTIO_PCI_QUEUE_ADDR_SHIFT + 32)) {
+   PMD_INIT_LOG(ERR, "vring address shouldn't be above 16TB!");
+   return 0;
+   }
+
+   return 1;
+}
+
 static void
 legacy_read_dev_config(struct virtio_hw *hw, size_t offset,
   void *dst, int length)
@@ -143,15 +159,20 @@ legacy_get_queue_num(struct virtio_hw *hw, uint16_t 
queue_id)
return dst;
 }

-static void
+static int
 legacy_setup_queue(struct virtio_hw *hw, struct virtqueue *vq)
 {
uint32_t src;

+   if (!check_vq_phys_addr_ok(vq))
+   return -1;
+
rte_eal_pci_ioport_write(&hw->io, &vq->vq_queue_index, 2,
 VIRTIO_PCI_QUEUE_SEL);
src = vq->mz->phys_addr >> VIRTIO_PCI_QUEUE_ADDR_SHIFT;
rte_eal_pci_ioport_write(&hw->io, &src, 4, VIRTIO_PCI_QUEUE_PFN);
+
+   return 0;
 }

 static void
@@ -367,12 +388,15 @@ modern_get_queue_num(struct virtio_hw *hw, uint16_t 
queue_id)
return io_read16(&hw->common_cfg->queue_size);
 }

-static void
+static int
 modern_setup_queue(struct virtio_hw *hw, struct virtqueue *vq)
 {
uint64_t desc_addr, avail_addr, used_addr;
uint16_t notify_off;

+   if (!check_vq_phys_addr_ok(vq))
+   return -1;
+
desc_addr = vq->mz->phys_addr;
avail_addr = desc_addr + vq->vq_nentries * sizeof(struct vring_desc);
used_addr = RTE_ALIGN_CEIL(avail_addr + offsetof(struct vring_avail,
@@ -400,6 +424,8 @@ modern_setup_queue(struct virtio_hw *hw, struct virtqueue 
*vq)
PMD_INIT_LOG(DEBUG, "\t used_addr: %" PRIx64, used_addr);
PMD_INIT_LOG(DEBUG, "\t notify addr: %p (notify offset: %u)",
vq->notify_addr, notify_off);
+
+   return 0;
 }

 static void
diff --git a/drivers/net/virtio/virtio_pci.h b/drivers/net/virtio/virtio_pci.h
index 554efea..a76daf7 100644
--- a/drivers/net/virtio/virtio_pci.h
+++ b/drivers/net/virtio/virtio_pci.h
@@ -234,7 +234,7 @@ struct virtio_pci_ops {
uint16_t (*set_config_irq)(struct virtio_hw *hw, uint16_t vec);

uint16_t (*get_queue_num)(struct virtio_hw *hw, uint16_t queue_id);
-   void (*setup_queue)(struct virtio_hw *hw, struct virtqueue *vq);
+   int (*setup_queue)(struct virtio_hw *hw, struct virtqueue *vq);
void (*del_queue)(struct virtio_hw *hw, struct virtqueue *vq);
void (*notify_queue)(struct virtio_hw *hw, struct virtqueue *vq);
 };
-- 
2.1.4



[dpdk-dev] [PATCH v4 0/8] virtio support for container

2016-04-29 Thread Jianfeng Tan
v4:
 - Avoid using dev_type, instead use (eth_dev->pci_device is NULL) to
   judge if it's virtual device or physical device.
 - Change the added device name to virtio-user.
 - Split into vhost_user.c, vhost_kernel.c, vhost.c, virtio_user_pci.c,
   virtio_user_dev.c.
 - Move virtio-user specific data from struct virtio_hw into struct
   virtio_user_hw.
 - Add support to send reset_owner message.
 - Change del_queue implementation. (This need more check)
 - Remove rte_panic(), and superseded with log.
 - Add reset_owner into virtio_pci_ops.reset.
 - Merge parameter "rx" and "tx" to "queues" to eliminate confusion.
 - Move get_features to after set_owner.
 - Redefine path in virtio_user_hw from char * to char [].

v3:
 - Remove --single-file option; do no change at EAL memory.
 - Remove the added API rte_eal_get_backfile_info(), instead we check all
   opened files with HUGEFILE_FMT to find hugepage files owned by DPDK.
 - Accordingly, add more restrictions at "Known issue" section.
 - Rename parameter from queue_num to queue_size to avoid confusion.
 - Rename vhost_embedded.c to rte_eth_virtio_vdev.c.
 - Move code related to the newly added vdev to rte_eth_virtio_vdev.c, to
   reuse eth_virtio_dev_init(), remove its static declaration.
 - Implement dev_uninit() for rte_eth_dev_detach().
 - WARN -> ERR, in vhost_embedded.c
 - Add more commit message text to clarify the model.

v2:
 - Rebase on the patchset of virtio 1.0 support.
 - Fix cannot create non-hugepage memory.
 - Fix wrong size of memory region when "single-file" is used.
 - Fix setting of offset in virtqueue to use virtual address.
 - Fix setting TUNSETVNETHDRSZ in vhost-user's branch.
 - Add mac option to specify the mac address of this virtual device.
 - Update doc.

This patchset provides a high-performance networking interface (virtio)
for container-based DPDK applications. How to start DPDK apps in
containers with exclusive ownership of NIC devices is beyond its scope.
The basic idea here is to present a new virtual device (named virtio-user),
which can be discovered and initialized by DPDK. To minimize the change,
we reuse the already-existing virtio PMD code (drivers/net/virtio/).

Background: Previously, a virtio device was usually used in the context
of QEMU/VM, as the picture below shows. The virtio nic is emulated in
QEMU, and usually presented in the VM as a PCI device.

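  ------------------
  |  virtio driver |  ----->  VM
  ------------------
          |
          | ----------> (over PCI bus or MMIO or Channel I/O)
          |
  ------------------
  | device emulate |
  |                |  ----->  QEMU
  | vhost adapter  |
  ------------------
          |
          | ----------> (vhost-user protocol or vhost-net ioctls)
          |
  ------------------
  | vhost backend  |
  ------------------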

Compared to the QEMU/VM case, virtio support for containers requires
embedding the device framework inside the virtio PMD. So this converged
driver actually plays three roles:
  - virtio driver to drive this new kind of virtual device;
  - device emulation to present this virtual device and respond to the
virtio driver, which is originally done by QEMU;
  - and the role to communicate with the vhost backend, which is also
originally done by QEMU.

The code layout and functionality of each module:

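  ----------------------
  | ------------------ |
  | | virtio driver  | |----> (virtio_user_pci.c)
  | ------------------ |
  |         |          |
  | ------------------ | ------>  virtio-user PMD
  | | device emulate |-|----> (virtio_user_dev.c)
  | |                | |
  | | vhost adapter  |-|----> (vhost_user.c, vhost_kernel.c, vhost.c)
  | ------------------ |
  ----------------------
            |
            | -------------- --> (vhost-user protocol or vhost-net ioctls)
            |
   ------------------
   | vhost backend  |
   ------------------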

How to share memory? In the VM case, QEMU always shares the whole physical
memory layout with the backend. But it's not feasible for a container, as
a process, to share all virtual memory regions with the backend. So only
specified virtual memory regions (of shared type) are sent to the backend.
The limitation is that only addresses in these areas can be used to
transmit or receive packets.

Known issues:
 - Control queue and multi-queue are not supported yet.
 - Cannot work with --huge-unlink.
 - Cannot work with --no-huge.
 - Cannot work when there are more than VHOST_MEMORY_MAX_NREGIONS(8)
   hugepages.
 - Root privilege is a must (mainly because of sorting hugepages according
   to physical address).
 - Applications should not use file name like HUGEFILE_FMT ("%smap_%d").

How to use?

a. Apply this patchset.

b. To compile container apps:
$: make config RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc
$: make install RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc
$: make -C examples/l2fwd RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc
$: make -C examples/vhost RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc

c. To build a docker image using Dockerfile below.
$: cat ./Dockerfile
FROM 

[dpdk-dev] [PATCH v3 2/2] virtio: fix memory leak of virtqueue memzones

2016-04-29 Thread Jianfeng Tan
Issue: When virtio was proposed in DPDK, there was no API to free memzones.
But this has changed since rte_memzone_free() has been implemented by
commit ff909fe21f0a ("mem: introduce memzone freeing").

This patch is to make sure memzones in struct virtqueue, like mz and
virtio_net_hdr_mz, are freed when queue is released or setup fails.

Signed-off-by: Jianfeng Tan 
---
 drivers/net/virtio/virtio_ethdev.c | 21 ++---
 drivers/net/virtio/virtqueue.h |  2 ++
 2 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
index b3f4158..bd990ff 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -260,12 +260,18 @@ virtio_set_multiple_queues(struct rte_eth_dev *dev, uint16_t nb_queues)
 }

 void
-virtio_dev_queue_release(struct virtqueue *vq) {
+virtio_dev_queue_release(struct virtqueue *vq)
+{
struct virtio_hw *hw;

if (vq) {
hw = vq->hw;
-   hw->vtpci_ops->del_queue(hw, vq);
+   if (vq->started)
+   hw->vtpci_ops->del_queue(hw, vq);
+
+   rte_memzone_free(vq->mz);
+   if (vq->virtio_net_hdr_mz)
+   rte_memzone_free(vq->virtio_net_hdr_mz);

rte_free(vq->sw_ring);
rte_free(vq);
@@ -330,7 +336,7 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,
 socket_id);
if (!vq->sw_ring) {
PMD_INIT_LOG(ERR, "Can not allocate RX soft ring");
-   rte_free(vq);
+   virtio_dev_queue_release(vq);
return -ENOMEM;
}
}
@@ -358,7 +364,7 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,
if (rte_errno == EEXIST)
mz = rte_memzone_lookup(vq_name);
if (mz == NULL) {
-   rte_free(vq);
+   virtio_dev_queue_release(vq);
return -ENOMEM;
}
}
@@ -370,7 +376,7 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,
 */
if ((mz->phys_addr + vq->vq_ring_size - 1) >> (VIRTIO_PCI_QUEUE_ADDR_SHIFT + 32)) {
PMD_INIT_LOG(ERR, "vring address shouldn't be above 16TB!");
-   rte_free(vq);
+   virtio_dev_queue_release(vq);
return -ENOMEM;
}

@@ -402,7 +408,7 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,
if (rte_errno == EEXIST)
hdr_mz = rte_memzone_lookup(vq_name);
if (hdr_mz == NULL) {
-   rte_free(vq);
+   virtio_dev_queue_release(vq);
return -ENOMEM;
}
}
@@ -436,7 +442,7 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,
vq->virtio_net_hdr_mz =
rte_memzone_lookup(vq_name);
if (vq->virtio_net_hdr_mz == NULL) {
-   rte_free(vq);
+   virtio_dev_queue_release(vq);
return -ENOMEM;
}
}
@@ -447,6 +453,7 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,

hw->vtpci_ops->setup_queue(hw, vq);

+   vq->started = 1;
*pvq = vq;
return 0;
 }
diff --git a/drivers/net/virtio/virtqueue.h b/drivers/net/virtio/virtqueue.h
index 4e9239e..83d89ca 100644
--- a/drivers/net/virtio/virtqueue.h
+++ b/drivers/net/virtio/virtqueue.h
@@ -201,6 +201,8 @@ struct virtqueue {

uint16_t*notify_addr;

+   int started;
+
struct vq_desc_extra {
void  *cookie;
uint16_t  ndescs;
-- 
2.1.4
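
The cleanup pattern this patch moves to (one release function that is safe
to call on a partially set-up queue, plus a "started" flag guarding the
device-side teardown) can be sketched in plain C. This is a stand-alone
illustration, not the driver code: malloc()/free() stand in for the
memzone and rte_zmalloc calls, a counter stands in for del_queue(), the
fail_at_step parameter is a hypothetical test hook, and the goto-based
error path is a stylistic variation on the direct calls in the diff.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for struct virtqueue; field names loosely follow the patch. */
struct sketch_vq {
	void *mz;
	void *hdr_mz;
	void *sw_ring;
	int started;
};

static int sketch_del_queue_calls;	/* how often device teardown ran */

/* Single release path, safe on a partially set-up queue. */
static void sketch_queue_release(struct sketch_vq *vq)
{
	if (vq == NULL)
		return;
	if (vq->started)
		sketch_del_queue_calls++;	/* stands in for del_queue() */
	free(vq->mz);		/* free(NULL) is a no-op */
	free(vq->hdr_mz);
	free(vq->sw_ring);
	free(vq);
}

/* Setup that can fail at a chosen step; every failure goes via release. */
static struct sketch_vq *sketch_queue_setup(int fail_at_step)
{
	struct sketch_vq *vq;

	assert(fail_at_step >= 0 && fail_at_step <= 3);
	vq = calloc(1, sizeof(*vq));
	if (vq == NULL)
		return NULL;

	vq->sw_ring = (fail_at_step == 1) ? NULL : malloc(64);
	if (vq->sw_ring == NULL)
		goto fail;

	vq->mz = (fail_at_step == 2) ? NULL : malloc(64);
	if (vq->mz == NULL)
		goto fail;

	vq->hdr_mz = (fail_at_step == 3) ? NULL : malloc(64);
	if (vq->hdr_mz == NULL)
		goto fail;

	vq->started = 1;	/* only now does the device know the queue */
	return vq;

fail:
	sketch_queue_release(vq);
	return NULL;
}
```

Because the queue structure is zero-initialized, the release function can
free whatever happens to be set at the point of failure, and it skips the
device-side teardown for a queue that was never handed to the device.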



[dpdk-dev] [PATCH v3 1/2] virtio: cleanup virtio_dev_queue_setup()

2016-04-29 Thread Jianfeng Tan
Signed-off-by: Jianfeng Tan 
---
 drivers/net/virtio/virtio_ethdev.c | 47 +++---
 1 file changed, 24 insertions(+), 23 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
index 1fe90ae..b3f4158 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -285,6 +285,7 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,
unsigned int vq_size, size;
struct virtio_hw *hw = dev->data->dev_private;
struct virtqueue *vq = NULL;
+   const char *queue_names[] = {"rvq", "txq", "cvq"};

PMD_INIT_LOG(DEBUG, "setting up queue: %u", vtpci_queue_idx);

@@ -304,34 +305,34 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,
return -EINVAL;
}

-   if (queue_type == VTNET_RQ) {
-   snprintf(vq_name, sizeof(vq_name), "port%d_rvq%d",
-   dev->data->port_id, queue_idx);
-   vq = rte_zmalloc(vq_name, sizeof(struct virtqueue) +
-   vq_size * sizeof(struct vq_desc_extra), RTE_CACHE_LINE_SIZE);
-   vq->sw_ring = rte_zmalloc_socket("rxq->sw_ring",
-   (RTE_PMD_VIRTIO_RX_MAX_BURST + vq_size) *
-   sizeof(vq->sw_ring[0]), RTE_CACHE_LINE_SIZE, socket_id);
-   } else if (queue_type == VTNET_TQ) {
-   snprintf(vq_name, sizeof(vq_name), "port%d_tvq%d",
-   dev->data->port_id, queue_idx);
-   vq = rte_zmalloc(vq_name, sizeof(struct virtqueue) +
-   vq_size * sizeof(struct vq_desc_extra), RTE_CACHE_LINE_SIZE);
-   } else if (queue_type == VTNET_CQ) {
-   snprintf(vq_name, sizeof(vq_name), "port%d_cvq",
-   dev->data->port_id);
-   vq = rte_zmalloc(vq_name, sizeof(struct virtqueue) +
-   vq_size * sizeof(struct vq_desc_extra),
-   RTE_CACHE_LINE_SIZE);
+   if (queue_type < VTNET_RQ || queue_type > VTNET_CQ) {
+   PMD_INIT_LOG(ERR, "invalid queue type: %d", queue_type);
+   return -EINVAL;
}
+
+   snprintf(vq_name, sizeof(vq_name), "port%d_%s%d",
+dev->data->port_id, queue_names[queue_type], queue_idx);
+   vq = rte_zmalloc(vq_name, sizeof(struct virtqueue) +
+vq_size * sizeof(struct vq_desc_extra),
+RTE_CACHE_LINE_SIZE);
if (vq == NULL) {
PMD_INIT_LOG(ERR, "Can not allocate virtqueue");
return -ENOMEM;
}
-   if (queue_type == VTNET_RQ && vq->sw_ring == NULL) {
-   PMD_INIT_LOG(ERR, "Can not allocate RX soft ring");
-   rte_free(vq);
-   return -ENOMEM;
+
+   if (queue_type == VTNET_RQ) {
+   size_t sz_sw;
+
+   sz_sw = (RTE_PMD_VIRTIO_RX_MAX_BURST + vq_size) *
+   sizeof(vq->sw_ring[0]);
+   vq->sw_ring = rte_zmalloc_socket("rxq->sw_ring", sz_sw,
+RTE_CACHE_LINE_SIZE,
+socket_id);
+   if (!vq->sw_ring) {
+   PMD_INIT_LOG(ERR, "Can not allocate RX soft ring");
+   rte_free(vq);
+   return -ENOMEM;
+   }
}

vq->hw = hw;
-- 
2.1.4
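
The table-driven naming that replaces the three if/else branches can be
tried out in isolation with the sketch below. It is a stand-alone
approximation, not the driver code: the enum values are assumed to start
at 0 and be contiguous (the array lookup in the patch relies on the same
property of the VTNET_* constants), and sketch_vq_name() is a hypothetical
helper that adds a truncation check the patch does not have.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed contiguous from 0, which the table lookup below relies on. */
enum sketch_queue_type { SKETCH_RQ = 0, SKETCH_TQ = 1, SKETCH_CQ = 2 };

/* Build "port<P>_<kind><idx>" into buf; return 0 on success, -1 on error. */
static int sketch_vq_name(char *buf, size_t len, int port_id,
			  int queue_type, int queue_idx)
{
	static const char *const queue_names[] = { "rvq", "txq", "cvq" };
	int n;

	assert(buf != NULL);
	/* Range check first, so the array is never indexed out of bounds. */
	if (queue_type < SKETCH_RQ || queue_type > SKETCH_CQ)
		return -1;

	n = snprintf(buf, len, "port%d_%s%d", port_id,
		     queue_names[queue_type], queue_idx);
	if (n < 0 || (size_t)n >= len)
		return -1;	/* output error or truncated name */
	return 0;
}
```

Note that, as written in the diff, the shared format string gives the
control queue an index suffix and names the TX queue "txq", whereas the
removed branches used "port%d_cvq" and "tvq".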



[dpdk-dev] [PATCH v3 0/2] virtio: fix memory leak of virtqueue memzones

2016-04-29 Thread Jianfeng Tan
Patch 1: Do some cleanup in virtio_dev_queue_setup();
Patch 2: Fix the memory leak bug.

Jianfeng Tan (2):
  v3: Fix a typo in the queue_type check.
  v2: split cleanup and fix into two patches.

  virtio: cleanup virtio_dev_queue_setup()
  virtio: fix memory leak of virtqueue memzones

 drivers/net/virtio/virtio_ethdev.c | 66 +-
 drivers/net/virtio/virtqueue.h |  2 ++
 2 files changed, 39 insertions(+), 29 deletions(-)

-- 
2.1.4