[dpdk-dev] [RFC] kernel parameters like DPDK CLI options

2016-06-01 Thread Yuanhan Liu
On Wed, Jun 01, 2016 at 02:39:28PM +0200, Thomas Monjalon wrote:
> I was thinking to implement the library options parsing in DPDK.
> But if the application implements its own options parsing without using
> the DPDK one, yes the option parsing is obviously done in the application.
> 
> > I'd say, that would work, but I see inflexibility and some drawbacks:
> > 
> > - I would assume "--pciopt" has the input style of
> > 
> >   "domain:bus:devid:func,option1,option2,..."
> > 
> >   It then looks hard to me to use it: I need to figure out the
> >   PCI ID first.
> 
> What do you suggest instead of PCI id?

That might depend on what you need: if you want to configure a specific
device, yes, PCI is needed (or even a must). In such a case, the --pciopt
option or the side effect of --pci-whitelist could help. I confess this would
be helpful in some cases.

I guess there is another need: we might want to pass an option
to a driver, say ixgbe, that would apply to all devices operated by it.
In such a case, we could use the driver name (see the example below).

> > - For the libraries, we have to write code to add new options for
> >   each application. With the generic option, users just need to use it;
> >   they don't need to write a single line of code, which could save them
> >   some effort. It also gives users a unified interface.
> 
> Applications can leverage the DPDK options parsing (without writing
> any new line of code).
> 
> >   And to make it clear, as stated, I don't object to having an API.
> >   I mean, the generic option gives us a chance to do the right
> >   configuration at startup time: it would still invoke the right
> >   API to do the right thing in the end.
> 
> I agree. I just want to split your --extra-option proposal into a few
> options with a bit more context.

Firstly, the name "--extra-option" might not be well chosen :(
I just wanted to show the idea first.

Secondly, splitting it into more options would result in quite a few
new options being introduced; it is also hard to list all of them. Users
tend to get lost when so many options are provided. It also introduces
more chaos, especially when we are going to add one option for each
library.

For the context issue, I guess we could address it by adding a prefix.
Such as,

--extra-options (or --whatever) "vhost.owner=kvm:kvm virtio.force_legacy
 mempool.handler=xxx"

Based on that, we could introduce other semantics, to let it be:

driver.opt | driver.pci_id.opt

Where,

- driver.opt
  applies to all devices operated by this driver

- driver.pci_id.opt
  applies only to a specific device with the same pci ID.

This could save us from changing the --pci-whitelist semantics, and it
also saves us from introducing a new option (--pciopt).
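
Just to make the idea concrete, here is a rough sketch (not an existing
EAL API; the helper name and parsing rules are invented) of how a driver
could look up such a prefixed option. It copies the value (or "1" for a
bare flag) into val and returns it, or returns NULL if the key is absent:

#include <stdio.h>
#include <string.h>

static const char *extra_options;	/* filled in by EAL option parsing */

static const char *
extra_opt_get(const char *prefix, const char *opt, char *val, size_t len)
{
	char key[256];
	const char *p;

	snprintf(key, sizeof(key), "%s.%s", prefix, opt);
	for (p = extra_options; p != NULL && *p != '\0'; ) {
		size_t klen = strcspn(p, "= ,");	/* key part of this token */

		if (klen == strlen(key) && strncmp(p, key, klen) == 0) {
			if (p[klen] != '=')		/* bare flag, e.g. force_legacy */
				snprintf(val, len, "1");
			else
				snprintf(val, len, "%.*s",
					(int)strcspn(p + klen + 1, " ,"),
					p + klen + 1);
			return val;
		}
		p += klen + strcspn(p + klen, " ,");	/* skip "=value" */
		p += strspn(p, " ,");			/* skip delimiters */
	}
	return NULL;
}

With something like that, the virtio PMD would just call
extra_opt_get("virtio", "force_legacy", buf, sizeof(buf)) at init time,
without any per-option EAL code.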

What do you think of it?

--yliu


[dpdk-dev] [RFC] Yet another option for DPDK options

2016-06-01 Thread Thomas Monjalon
Hi Keith,

I'll try to bring more context to this discussion below.

2016-06-01 15:00, Wiles, Keith:
> Started from the link below, but did not want to highjack the thread.
> http://dpdk.org/ml/archives/dev/2016-June/040021.html
> 
> I was thinking about this problem from a user perspective and command
> line options are very difficult to manage specifically when you have
> a large number of options as we have in dpdk.

The user uses an application.
It is up to the application to let users do some configuration.

> I see all of these options as a type of database of information for
> the DPDK and the application, because the application command line
> options are also getting very complex as well.

DPDK is a collection of libraries.
There are no command line options in a library.
So we should not be talking about such an issue. But...

... configuration of the DPDK libraries must be improved.
We need some clean API to let the application configure a lot of things
at runtime (during initialization or after).
Ideally the API should not use an argc/argv format.

We also have a lot of applications for tests or examples which use a
common configuration scheme based on command line options.
It is only for test and demonstration purposes. So it is not so important
and must not be complex to maintain.
I also think that we should avoid having to modify a configuration file
for test applications. I like launching a freshly built testpmd with a
copy-pasted command line without having to create a temporary
configuration file.

Instead of wrapping a messy configuration interface, we should proceed
with these steps (in this order):
- implement clean configuration API
- move command line options parsing in a separate library
- implement an alternative to the options parsing library, as an example
- remove the options parsing library if the alternative is better



[dpdk-dev] [vpp-dev] VLAN packets dropped... ?

2016-06-01 Thread John Lo (loj)
We will be adding a patch to the i40e driver that would check the packet for 
the presence of a VLAN tag and set the required PKT_RX_VLAN_PKT flag. I hope to 
get it done by the end of this week.   -John

From: Chandrasekar Kannan [mailto:ckan...@console.to]
Sent: Wednesday, June 01, 2016 2:45 PM
To: John Daley (johndale)
Cc: John Lo (loj); Ananyev, Konstantin; Wiles, Keith; vpp-dev; Zhang, Helin; 
dev at dpdk.org
Subject: Re: [vpp-dev] VLAN packets dropped... ?

Just checking back on this thread to see where things are.  Are we saying this 
is a bug in dpdk or vpp? That part wasn't quite clear to me.

-Chandra



On Thu, May 26, 2016 at 3:56 PM, John Daley (johndale) <johndale at cisco.com> wrote:
John,
As just discussed, what you suggest was considered, but there are other apps 
depending on a different behavior of the flag, so we thought the best thing to do 
is deprecate it. That is part of what Olivier's patch discussed in 
http://www.dpdk.org/ml/archives/dev/2016-May/038786.html does.  Adding a new 
ptype for VLAN tagged packets after the patch is accepted would allow VPP to 
check if the ptype is supported by the PMD and, if so, use it to determine if 
the delivered packet actually has a VLAN tag in it. No need to know if 
stripping is enabled/disabled. If the ptype is not supported, the app would 
have to check the packet in SW.
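
For reference, that software check is tiny; a minimal sketch, assuming the
pre-18.xx rte_ether.h names (struct ether_hdr, ETHER_TYPE_VLAN) and writing
the 802.1ad TPID as a literal in case no define is available:

#include <rte_byteorder.h>
#include <rte_ether.h>
#include <rte_mbuf.h>

/* Return 1 if the outer ethertype of the (unstripped) frame is a VLAN TPID. */
static inline int
pkt_has_vlan(struct rte_mbuf *m)
{
	struct ether_hdr *eh = rte_pktmbuf_mtod(m, struct ether_hdr *);
	uint16_t tpid = rte_be_to_cpu_16(eh->ether_type);

	return tpid == ETHER_TYPE_VLAN || tpid == 0x88A8;	/* dot1q / dot1ad */
}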

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On 
> Behalf Of John Lo (loj)
> Sent: Thursday, May 26, 2016 11:11 AM
> To: Ananyev, Konstantin <konstantin.ananyev at intel.com>; Wiles, Keith
> <keith.wiles at intel.com>; Chandrasekar Kannan <ckannan at console.to>
> Cc: vpp-dev <vpp-dev at lists.fd.io>; Zhang, Helin <helin.zhang at intel.com>;
> dev at dpdk.org
> Subject: Re: [dpdk-dev] [vpp-dev] VLAN packets dropped... ?
>
> Hi Konstantin,
>
> Thanks for the link to DPDK discussion wrt this VLAN offload flag
> PKT_RX_VLAN_PKT.  It seems the proposal was to deprecate
> PKT_RX_VLAN_PKT  and introduce two new flags PKT_RX_VLAN_STRIPPED
> and PKT_RX_QINQ_STRIPPED.
>
> It would be a really good idea to keep PKT_RX_VLAN_PKT to indicate at least
> one VLAN tag is present on the packet.  For the case of i40e where its HW
> cannot detect a VLAN tag if stripping is not enabled, it should be reasonable for
> the i40e DPDK driver software to make a check and set this flag. I would think the
> overhead of checking the 1st ethertype field in the MAC header against dot1q
> or dot1ad values in software would be pretty minimal.
>
> Regards,
> John
>
>
> -Original Message-
> From: Ananyev, Konstantin [mailto:konstantin.ananyev at intel.com]
> Sent: Wednesday, May 25, 2016 3:23 PM
> To: John Lo (loj); Wiles, Keith; Chandrasekar Kannan
> Cc: vpp-dev; Zhang, Helin; dev at dpdk.org
> Subject: RE: [vpp-dev] VLAN packets dropped... ?
>
>
> > I suppose this has to do with what is the expected usage of the
> > PKT_RX_VLAN_PKT offload flag. Is it set only for VLAN packets with the
> > VLAN stripped, or should it always be set if a VLAN is/was present in the
> > received packet? It seems that different DPDK drivers are behaving
> > differently, which will make it really hard for VPP to take advantage of NIC
> > and driver offload capability to provide better performance.
>
> Yes, that's true: ixgbe/igb on one side and i40e on the other raise PKT_RX_VLAN_PKT
> in different manners.
> There is an attempt to make it unified across all supported devices:
>  http://dpdk.org/dev/patchwork/patch/12938/
>
> Though, I am not sure it will help you with your issue.
> As I understand, for you the desired behaviour is:
> If it is a vlan packet, keep the packet intact (don't strip the vlan) and
> raise PKT_RX_VLAN_PKT inside mbuf->ol_flags.
> That's what ixgbe is doing with rte_eth_conf.rxmode.hw_vlan_strip==0.
> Correct?
> As far as I know, i40e HW doesn't provide such an ability.
> The i40e receive HW descriptor can only flag whether the VLAN tag was stripped
> from the packet or not, but if stripping is disabled it wouldn't indicate in
> any way whether a VLAN tag is present inside the packet or not.
> I am CC-ing it to dpdk.org in case I am missing something 
> here.
> Probably someone knows a way to make it work in that way.
> Konstantin
>
> >
> > If VPP cannot rely on offload flags for VLAN so packets always have to go
> through ethernet-input node, there is a performance cost.
> > For the 10GE case, before the inverse patch of the ixgbe driver, 10GE
> > Rx-vector path removed support of offload flag with DPDK 16.04 and so
> > ethernet-input node is always used. The 10GE IPv4 throughput rate
> > dropped from 6.17MPPSx2 to 4.92MPPSx2 for bidirectional traffic (2
> > streams each with a single IP address as destination) on a single core
> > on my server. Konstantin suggested at that time to use scalar mode which
> does support offload flags properly. 

[dpdk-dev] [RFC] kernel parameters like DPDK CLI options

2016-06-01 Thread Yuanhan Liu
On Wed, Jun 01, 2016 at 12:17:50PM +0200, Thomas Monjalon wrote:
> Hi,
> 
> 2016-06-01 14:04, Yuanhan Liu:
> > Hi all,
> > 
> > I guess we (maybe just me :) have stated few times something like
> > "hey, this kind of stuff is good to have, but you are trying to
> > add an EAL CLI option for a specific subsystem/driver, which is
> > wrong".
> 
> Yes
> 
> > One recent example that is still fresh in my mind is the one from
> > Christian [0], that he made a proposal to introduce two new EAL
> > options, --vhost-owner and --vhost-perm, to configure the vhost
> > user socket file permission.
> > 
> > [0]: http://dpdk.org/ml/archives/dev/2016-April/037948.html
> > 
> > Another example is the one I met while enabling virtio 1.0 support.
> > QEMU has the ability to support both virtio 0.95 (legacy) and 1.0
> > (modern) at the same time for one virtio device, therefore, we
> > could either use legacy driver or modern driver to operate the
> > device. However, the current logic is we try with modern driver
> > first, and then legacy driver if it failed. In above case, we will
> > never hit the legacy driver. But sometimes, it's nice to let it
> > force back to the legacy driver, say, for debug or compare purpose.
> > 
> > Apparently, adding a new EAL option like "--force-legacy" looks
> > wrong.
> > 
> > The generic yet elegant solution I just thought of while having
> > lunch is to add a new EAL option, say, --extra-options, where we
> > could specify driver/subsystem specific options. As you see, it's
> > nothing big deal, it just looks like Linux kernel parameters.
> > 
> > Take above two cases as example, it could be:
> > 
> > --extra-options "vhost-owner=kvm:kvm force-legacy"
> 
> I think it's better to have CLI options per device.
> Currently we can pass devargs
>   - to PCI device via --pci-whitelist

Isn't it just for whitelisting a specific PCI device?

>   - to virtual device via --vdev

Yes, --vdev works great here. However, as its name states, it's
just for virtual devices. Say, it will not work for the virtio PMD case,
i.e. the force-legacy option mentioned above.

> I think we just need to refactor these options to have a generic
> --device or keep the options in --vdev and add a new --pciopt
> or something like that.

--pciopt should be able to allow us to pass more options to a specific
driver. But what about a library, say vhost?

> And more importantly, these devargs must be set via a new EAL API
> to allow applications do these configurations without building/faking
> some command line arguments.
> 
> To make it clear, applications use API and users use CLI (which call API).

I would agree with that. But that basically works for libraries; it does
not quite make sense to me to introduce a new API for a driver
option, such as the force-legacy option for the virtio PMD.

So, let me make a summary from reading your email, to make sure I get
you right: for drivers (virtual or physical), we could use --vdev or
--pciopt for passing args, respectively. For the libraries, we should
add a new API, and let the application introduce some options to
invoke it, to pass the options.

I'd say, that would work, but I see inflexibility and some drawbacks:

- I would assume "--pciopt" has the input style of

  "domain:bus:devid:func,option1,option2,..."

  It then looks hard to me to use it: I need to figure out the
  PCI ID first.

- For the libraries, we have to write code to add new options for
  each application. With the generic option, users just need to use it;
  they don't need to write a single line of code, which could save them
  some effort. It also gives users a unified interface.

  And to make it clear, as stated, I don't object to having an API.
  I mean, the generic option gives us a chance to do the right
  configuration at startup time: it would still invoke the right
  API to do the right thing in the end.

> > Note that those options could also be delimited by comma.
> > 
> > DPDK EAL then will provide some generic helper functions to get
> > and parse those options, and let the specific driver/subsystem
> > to invoke them to do the actual parse and do the proper action
> > when some option is specified, say, virtio PMD driver will force
> > back to legacy driver when "force-legacy" is given.
> > 
> > Comments? Makes sense to you guys, or something nice to have?
> 
> Thanks for starting the discussion.

Thanks for making comments :)

--yliu


[dpdk-dev] [RFC] Yet another option for DPDK options

2016-06-01 Thread Arnon Warshavsky
On Wed, Jun 1, 2016 at 7:18 PM, Bruce Richardson  wrote:

> On Wed, Jun 01, 2016 at 10:58:41AM -0500, Jay Rolette wrote:
> > On Wed, Jun 1, 2016 at 10:00 AM, Wiles, Keith  wrote:
> >
> > > Started from the link below, but did not want to highjack the thread.
> > > http://dpdk.org/ml/archives/dev/2016-June/040021.html
> > >
> > > I was thinking about this problem from a user perspective and command line
> > > options are very difficult to manage specifically when you have a large
> > > number of options as we have in dpdk. I see all of these options as a type
> > > of database of information for the DPDK and the application, because the
> > > application command line options are also getting very complex as well.
> > >
> > > I have been looking at a number of different options here and the
> > > direction I was thinking was using a file for the options and
> > > configurations with the data in a clean format. It could have been an INI
> > > file, JSON, or XML, but they all seem to have some problems I do not like.
> > > The INI file is too flat and I wanted a hierarchy in the data, the JSON
> > > data is similar and XML is just hard to read. I wanted to be able to manage
> > > multiple applications and possibly the system the DPDK/app runs on. The
> > > problem with the above formats is they are just data and not easy to make
> > > decisions about the system and applications at runtime.
> > >
> >
> > INI format is simplest for users to read, but if you really need hierarchy,
> > JSON will do that just fine. Not sure what you mean by "JSON data is
> > similar"...
> >
> >
> I'd be quite concerned if we start needing lots of hierarchies for
> configuration.
>
> I'd really just like to see ini file format used for this because:
> * it's a well understood, simple format
> * very easily human readable and editable
> * lots of support for it in lots of languages
> * hierarchies are possible in it too - just not as easy as in other formats
>   though. [In a previous life I worked with ini files which had address
>   hierarchies 6-levels deep in them. It wasn't hard to work with]
> * it works well with grep since you must have one value per-line
> * it allows comments
> * we already have a DPDK library for parsing them
>
> However, for me the biggest advantage of using something like ini is that it
> would force us to keep things simple!
>
> I'd stay away from formats like json or XML that are designed for serializing
> entire objects or structures, and look for something that allows us to just
> specify configuration values.
>
> Regards,
> /Bruce
>
>

 +1 for a single cfg file parameter
 +1 for ini simplicity
/Arnon


[dpdk-dev] Suggestions for the dpdk stable tree

2016-06-01 Thread Mcnamara, John
> -Original Message-
> From: Yuanhan Liu [mailto:yuanhan.liu at linux.intel.com]
> Sent: Monday, May 23, 2016 3:22 AM
> To: Mcnamara, John 
> Cc: Christian Ehrhardt ; dev
> ; Stephen Hemminger ; Thomas
> Monjalon 
> Subject: Re: [dpdk-dev] Suggestions for the dpdk stable tree
> 
> > We have been looking at identifying a maintainer and validation engineer
> > internally to support the effort but haven't been able to finalize that.
> > Once we do we will come back to the mailing list with a proposal and a
> > request for comments.
> 
> I would nominate myself as the LTS tree maintainer, if it makes sense to
> have one.

Hi Yuanhan,

Thanks for putting your name forward. I think your experience as the
dpdk-next-virtio maintainer will help with this.


> > We would probably be looking at 16.04 or even 16.07 as the basis for the
> > LTS at this stage.
> 
> Just one opinion from the view of vhost: since 16.07 is a vhost ABI/API
> refactoring release, I'd suggest basing it on 16.07, and then we would have
> fewer conflicts when applying later bug fix patches.

Agreed. At this stage 16.07 makes more sense.

I'll start a separate discussion thread about how the LTS process would work
to see if we can get some consensus from interested parties.

John.
-- 



[dpdk-dev] [RFC] Yet another option for DPDK options

2016-06-01 Thread Wiles, Keith

On 6/1/16, 11:18 AM, "Richardson, Bruce"  wrote:

>On Wed, Jun 01, 2016 at 10:58:41AM -0500, Jay Rolette wrote:
>> On Wed, Jun 1, 2016 at 10:00 AM, Wiles, Keith  
>> wrote:
>> 
>> > Started from the link below, but did not want to highjack the thread.
>> > http://dpdk.org/ml/archives/dev/2016-June/040021.html
>> >
>> > I was thinking about this problem from a user perspective and command line
>> > options are very difficult to manage specifically when you have a large
>> > number of options as we have in dpdk. I see all of these options as a type
>> > of database of information for the DPDK and the application, because the
>> > application command line options are also getting very complex as well.
>> >
>> > I have been looking at a number of different options here and the
>> > direction I was thinking was using a file for the options and
>> > configurations with the data in a clean format. It could have been an INI
>> > file, JSON, or XML, but they all seem to have some problems I do not like.
>> > The INI file is too flat and I wanted a hierarchy in the data, the JSON
>> > data is similar and XML is just hard to read. I wanted to be able to manage
>> > multiple applications and possibly the system the DPDK/app runs on. The problem
>> > with the above formats is they are just data and not easy to make decisions
>> > about the system and applications at runtime.
>> >
>> 
>> INI format is simplest for users to read, but if you really need hierarchy,
>> JSON will do that just fine. Not sure what you mean by "JSON data is
>> similar"...
>> 
>> 
>I'd be quite concerned if we start needing lots of hierarchies for 
>configuration.
>
>I'd really just like to see ini file format used for this because:
>* it's a well understood, simple format
>* very easily human readable and editable
>* lots of support for it in lots of languages
>* hierarchies are possible in it too - just not as easy as in other formats
>  though. [In a previous life I worked with ini files which had address
>  hierarchies 6-levels deep in them. It wasn't hard to work with]

Maybe INI will work for hierarchies, but I bet it was not super obvious without 
a lot of comments.

>* it works well with grep since you must have one value per-line
>* it allows comments

We can have comments in any format; that's not really a deciding factor IMHO.

>* we already have a DPDK library for parsing them
>
>However, for me the biggest advantage of using something like ini is that it
>would force us to keep things simple!

Simple is good, and with any of these formats you can be simple or complex; it 
just depends on the usage.

If all I wanted was to run a few examples, then I would say INI is just fine. But I 
would like to have a configuration file that can help me understand the system 
and pick the right options: a two-socket system or one socket, a system with 
4 cores or 16, running in a VM/container or not, FreeBSD or Linux, and so on; you 
get the picture, all with a single image (maybe not with the FreeBSD/Linux).

In a static data format you do not get these easily (maybe if you added a bunch 
of different options and then made the code figure out which ones to use); in an 
interpreted language you do get them for free.

This is why I do not want just a database of options.
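
For what it's worth, part of that decision making is already available at
runtime from EAL once rte_eal_init() has run; a rough sketch in C (the
profile names and thresholds below are invented purely for illustration):

#include <rte_lcore.h>

/* Highest socket id used by an enabled lcore, plus one. */
static unsigned
enabled_socket_count(void)
{
	unsigned lcore, n = 0;

	RTE_LCORE_FOREACH(lcore)
		if (rte_lcore_to_socket_id(lcore) + 1 > n)
			n = rte_lcore_to_socket_id(lcore) + 1;
	return n;
}

/* Pick a configuration profile name from the detected topology. */
static const char *
pick_profile(void)
{
	if (enabled_socket_count() > 1)
		return "two-socket";
	return rte_lcore_count() >= 16 ? "one-socket-big" : "one-socket-small";
}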

>
>I'd stay away from formats like json or XML that are designed for serializing
>entire objects or structures, and look for something that allows us to just
>specify configuration values.
>
>Regards,
>/Bruce
>
>





[dpdk-dev] [PATCH v6 1/5] mempool: support external handler

2016-06-01 Thread Hunt, David


On 6/1/2016 5:19 PM, David Hunt wrote:
> Until now, the objects stored in a mempool were internally stored in a
> ring. This patch introduces the possibility to register external handlers
> replacing the ring.
>
> The default behavior remains unchanged, but calling the new function
> rte_mempool_set_handler() right after rte_mempool_create_empty() allows
> the user to change the handler that will be used when populating
> the mempool.
>
> v7 changes:
>* Moved the flags handling from rte_mempool_create_empty to
>  rte_mempool_create, as it's only there for backward compatibility
>* Various comment additions and cleanup
>* Renamed rte_mempool_handler to rte_mempool_ops
>* Added a union for *pool and u64 pool_id in struct rte_mempool

These v7 changes should be merged with the v6 changes below, as this is a 
v6 patch.
Or removed altogether, as they are in the cover letter.

> v6 changes:
>* split the original patch into a few parts for easier review.
>* rename functions with _ext_ to _ops_.
>* addressed some review comments
>* renamed put and get functions to enqueue and dequeue
>* renamed rte_mempool_handler struct to rte_mempool_handler_ops
>* changed occurrences of rte_mempool_handler_ops to const, as they
>  contain function pointers (security)
>* added some extra comments
>
>

[...]




[dpdk-dev] [PATCH v6 5/5] mbuf: get default mempool handler from configuration

2016-06-01 Thread David Hunt
By default, the mempool handler used for mbuf allocations is a multi
producer and multi consumer ring. We could imagine a target (maybe some
network processors?) that provides a hardware-assisted pool
mechanism. In this case, the default configuration for this architecture
would contain a different value for RTE_MBUF_DEFAULT_MEMPOOL_HANDLER.

Signed-off-by: Olivier Matz 
Signed-off-by: David Hunt 
---
 config/common_base |  1 +
 lib/librte_mbuf/rte_mbuf.c | 26 ++
 2 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/config/common_base b/config/common_base
index 47c26f6..cd04f54 100644
--- a/config/common_base
+++ b/config/common_base
@@ -394,6 +394,7 @@ CONFIG_RTE_LIBRTE_MEMPOOL_DEBUG=n
 #
 CONFIG_RTE_LIBRTE_MBUF=y
 CONFIG_RTE_LIBRTE_MBUF_DEBUG=n
+CONFIG_RTE_MBUF_DEFAULT_MEMPOOL_HANDLER="ring_mp_mc"
 CONFIG_RTE_MBUF_REFCNT_ATOMIC=y
 CONFIG_RTE_PKTMBUF_HEADROOM=128

diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
index eec1456..7d855f0 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -153,6 +153,7 @@ rte_pktmbuf_pool_create(const char *name, unsigned n,
unsigned cache_size, uint16_t priv_size, uint16_t data_room_size,
int socket_id)
 {
+   struct rte_mempool *mp;
struct rte_pktmbuf_pool_private mbp_priv;
unsigned elt_size;

@@ -167,10 +168,27 @@ rte_pktmbuf_pool_create(const char *name, unsigned n,
mbp_priv.mbuf_data_room_size = data_room_size;
mbp_priv.mbuf_priv_size = priv_size;

-   return rte_mempool_create(name, n, elt_size,
-   cache_size, sizeof(struct rte_pktmbuf_pool_private),
-   rte_pktmbuf_pool_init, &mbp_priv, rte_pktmbuf_init, NULL,
-   socket_id, 0);
+   mp = rte_mempool_create_empty(name, n, elt_size, cache_size,
+sizeof(struct rte_pktmbuf_pool_private), socket_id, 0);
+   if (mp == NULL)
+   return NULL;
+
+   rte_errno = rte_mempool_set_handler(mp,
+   RTE_MBUF_DEFAULT_MEMPOOL_HANDLER);
+   if (rte_errno != 0) {
+   RTE_LOG(ERR, MBUF, "error setting mempool handler\n");
+   return NULL;
+   }
+   rte_pktmbuf_pool_init(mp, &mbp_priv);
+
+   if (rte_mempool_populate_default(mp) < 0) {
+   rte_mempool_free(mp);
+   return NULL;
+   }
+
+   rte_mempool_obj_iter(mp, rte_pktmbuf_init, NULL);
+
+   return mp;
 }

 /* do some sanity checks on a mbuf: panic if it fails */
-- 
2.5.5



[dpdk-dev] [PATCH v6 4/5] app/test: test external mempool handler

2016-06-01 Thread David Hunt
Use a minimal custom mempool external handler and check that it also
passes basic mempool autotests.

Signed-off-by: Olivier Matz 
Signed-off-by: David Hunt 
---
 app/test/test_mempool.c | 114 
 1 file changed, 114 insertions(+)

diff --git a/app/test/test_mempool.c b/app/test/test_mempool.c
index b586249..518461f 100644
--- a/app/test/test_mempool.c
+++ b/app/test/test_mempool.c
@@ -83,6 +83,97 @@
 static rte_atomic32_t synchro;

 /*
+ * Simple example of custom mempool structure. Holds pointers to all the
+ * elements which are simply malloc'd in this example.
+ */
+struct custom_mempool {
+   rte_spinlock_t lock;
+   unsigned count;
+   unsigned size;
+   void *elts[];
+};
+
+/*
+ * Loop through all the element pointers and allocate a chunk of memory, then
+ * insert that memory into the ring.
+ */
+static void *
+custom_mempool_alloc(struct rte_mempool *mp)
+{
+   struct custom_mempool *cm;
+
+   cm = rte_zmalloc("custom_mempool",
+   sizeof(struct custom_mempool) + mp->size * sizeof(void *), 0);
+   if (cm == NULL)
+   return NULL;
+
+   rte_spinlock_init(&cm->lock);
+   cm->count = 0;
+   cm->size = mp->size;
+   return cm;
+}
+
+static void
+custom_mempool_free(void *p)
+{
+   rte_free(p);
+}
+
+static int
+custom_mempool_put(void *p, void * const *obj_table, unsigned n)
+{
+   struct custom_mempool *cm = (struct custom_mempool *)p;
+   int ret = 0;
+
+   rte_spinlock_lock(&cm->lock);
+   if (cm->count + n > cm->size) {
+   ret = -ENOBUFS;
+   } else {
+   memcpy(&cm->elts[cm->count], obj_table, sizeof(void *) * n);
+   cm->count += n;
+   }
+   rte_spinlock_unlock(&cm->lock);
+   return ret;
+}
+
+
+static int
+custom_mempool_get(void *p, void **obj_table, unsigned n)
+{
+   struct custom_mempool *cm = (struct custom_mempool *)p;
+   int ret = 0;
+
+   rte_spinlock_lock(&cm->lock);
+   if (n > cm->count) {
+   ret = -ENOENT;
+   } else {
+   cm->count -= n;
+   memcpy(obj_table, &cm->elts[cm->count], sizeof(void *) * n);
+   }
+   rte_spinlock_unlock(&cm->lock);
+   return ret;
+}
+
+static unsigned
+custom_mempool_get_count(void *p)
+{
+   struct custom_mempool *cm = (struct custom_mempool *)p;
+
+   return cm->count;
+}
+
+static struct rte_mempool_ops mempool_handler_custom = {
+   .name = "custom_handler",
+   .alloc = custom_mempool_alloc,
+   .free = custom_mempool_free,
+   .put = custom_mempool_put,
+   .get = custom_mempool_get,
+   .get_count = custom_mempool_get_count,
+};
+
+MEMPOOL_REGISTER_HANDLER(mempool_handler_custom);
+
+/*
  * save the object number in the first 4 bytes of object data. All
  * other bytes are set to 0.
  */
@@ -477,6 +568,7 @@ test_mempool(void)
 {
struct rte_mempool *mp_cache = NULL;
struct rte_mempool *mp_nocache = NULL;
+   struct rte_mempool *mp_ext = NULL;

rte_atomic32_init(&synchro);

@@ -505,6 +597,27 @@ test_mempool(void)
goto err;
}

+   /* create a mempool with an external handler */
+   mp_ext = rte_mempool_create_empty("test_ext",
+   MEMPOOL_SIZE,
+   MEMPOOL_ELT_SIZE,
+   RTE_MEMPOOL_CACHE_MAX_SIZE, 0,
+   SOCKET_ID_ANY, 0);
+
+   if (mp_ext == NULL) {
+   printf("cannot allocate mp_ext mempool\n");
+   goto err;
+   }
+   if (rte_mempool_set_handler(mp_ext, "custom_handler") < 0) {
+   printf("cannot set custom handler\n");
+   goto err;
+   }
+   if (rte_mempool_populate_default(mp_ext) < 0) {
+   printf("cannot populate mp_ext mempool\n");
+   goto err;
+   }
+   rte_mempool_obj_iter(mp_ext, my_obj_init, NULL);
+
/* retrieve the mempool from its name */
if (rte_mempool_lookup("test_nocache") != mp_nocache) {
printf("Cannot lookup mempool from its name\n");
@@ -545,6 +658,7 @@ test_mempool(void)
 err:
rte_mempool_free(mp_nocache);
rte_mempool_free(mp_cache);
+   rte_mempool_free(mp_ext);
return -1;
 }

-- 
2.5.5



[dpdk-dev] [PATCH v6 3/5] mempool: add default external mempool handler

2016-06-01 Thread David Hunt
The first patch in this series added the framework for an external
mempool manager. This patch in the series adds a set of default
handlers based on rte_ring.

v6 changes: split out into a separate patch for easier review.

Signed-off-by: David Hunt 
Signed-off-by: Olivier Matz 
---
 lib/librte_mempool/Makefile  |   1 +
 lib/librte_mempool/rte_mempool.h |   2 +
 lib/librte_mempool/rte_mempool_default.c | 153 +++
 3 files changed, 156 insertions(+)
 create mode 100644 lib/librte_mempool/rte_mempool_default.c

diff --git a/lib/librte_mempool/Makefile b/lib/librte_mempool/Makefile
index 793745f..f19366e 100644
--- a/lib/librte_mempool/Makefile
+++ b/lib/librte_mempool/Makefile
@@ -43,6 +43,7 @@ LIBABIVER := 2
 # all source are stored in SRCS-y
 SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_mempool.c
 SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_mempool_handler.c
+SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_mempool_default.c
 # install includes
 SYMLINK-$(CONFIG_RTE_LIBRTE_MEMPOOL)-include := rte_mempool.h

diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index 00eb467..17f1e6f 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -410,6 +410,8 @@ extern struct rte_mempool_handler_table 
rte_mempool_handler_table;
 static inline struct rte_mempool_ops *
 rte_mempool_handler_get(int handler_idx)
 {
+   RTE_VERIFY(handler_idx < RTE_MEMPOOL_MAX_HANDLER_IDX);
+
return &rte_mempool_handler_table.handler_ops[handler_idx];
 }

diff --git a/lib/librte_mempool/rte_mempool_default.c 
b/lib/librte_mempool/rte_mempool_default.c
new file mode 100644
index 000..a4c0d9d
--- /dev/null
+++ b/lib/librte_mempool/rte_mempool_default.c
@@ -0,0 +1,153 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdio.h>
+#include <string.h>
+
+#include <rte_errno.h>
+#include <rte_ring.h>
+#include <rte_mempool.h>
+
+static int
+common_ring_mp_put(void *p, void * const *obj_table, unsigned n)
+{
+   return rte_ring_mp_enqueue_bulk((struct rte_ring *)p, obj_table, n);
+}
+
+static int
+common_ring_sp_put(void *p, void * const *obj_table, unsigned n)
+{
+   return rte_ring_sp_enqueue_bulk((struct rte_ring *)p, obj_table, n);
+}
+
+static int
+common_ring_mc_get(void *p, void **obj_table, unsigned n)
+{
+   return rte_ring_mc_dequeue_bulk((struct rte_ring *)p, obj_table, n);
+}
+
+static int
+common_ring_sc_get(void *p, void **obj_table, unsigned n)
+{
+   return rte_ring_sc_dequeue_bulk((struct rte_ring *)p, obj_table, n);
+}
+
+static unsigned
+common_ring_get_count(void *p)
+{
+   return rte_ring_count((struct rte_ring *)p);
+}
+
+
+static void *
+common_ring_alloc(struct rte_mempool *mp)
+{
+   int rg_flags = 0, ret;
+   char rg_name[RTE_RING_NAMESIZE];
+   struct rte_ring *r;
+
+   ret = snprintf(rg_name, sizeof(rg_name),
+   RTE_MEMPOOL_MZ_FORMAT, mp->name);
+   if (ret < 0 || ret >= (int)sizeof(rg_name)) {
+   rte_errno = ENAMETOOLONG;
+   return NULL;
+   }
+
+   /* ring flags */
+   if (mp->flags & MEMPOOL_F_SP_PUT)
+   rg_flags |= RING_F_SP_ENQ;
+   if (mp->flags & MEMPOOL_F_SC_GET)
+   rg_flags |= RING_F_SC_DEQ;
+
+   /* Allocate the ring that will be used to store objects.
+* Ring functions 

[dpdk-dev] [PATCH v6 2/5] mempool: remove rte_ring from rte_mempool struct

2016-06-01 Thread David Hunt
Now that we're moving to an external mempoool handler, which
uses a void *pool as a pointer to the pool data, remove the
unneeded ring pointer from the mempool struct.

Signed-off-by: David Hunt 
---
 app/test/test_mempool_perf.c | 1 -
 lib/librte_mempool/rte_mempool.h | 1 -
 2 files changed, 2 deletions(-)

diff --git a/app/test/test_mempool_perf.c b/app/test/test_mempool_perf.c
index cdc02a0..091c1df 100644
--- a/app/test/test_mempool_perf.c
+++ b/app/test/test_mempool_perf.c
@@ -161,7 +161,6 @@ per_lcore_mempool_test(__attribute__((unused)) void *arg)
   n_get_bulk);
if (unlikely(ret < 0)) {
rte_mempool_dump(stdout, mp);
-   rte_ring_dump(stdout, mp->ring);
/* in this case, objects are lost... */
return -1;
}
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index b659565..00eb467 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -204,7 +204,6 @@ struct rte_mempool_memhdr {
  */
 struct rte_mempool {
char name[RTE_MEMPOOL_NAMESIZE]; /**< Name of mempool. */
-   struct rte_ring *ring;   /**< Ring to store objects. */
union {
void *pool;  /**< Ring or pool to store objects */
uint64_t *pool_id;   /**< External mempool identifier */
-- 
2.5.5



[dpdk-dev] [PATCH v6 1/5] mempool: support external handler

2016-06-01 Thread David Hunt
Until now, the objects stored in a mempool were internally stored in a
ring. This patch introduces the possibility to register external handlers
replacing the ring.

The default behavior remains unchanged, but calling the new function
rte_mempool_set_handler() right after rte_mempool_create_empty() allows
the user to change the handler that will be used when populating
the mempool.
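
For illustration, the resulting usage sequence with the API added by this
series would look roughly like this (pool name, sizes and handler name are
examples only, and error handling is trimmed):

#include <rte_mempool.h>

static struct rte_mempool *
create_pool_with_handler(void)
{
	struct rte_mempool *mp;

	/* 1. create empty, 2. choose the handler, 3. populate */
	mp = rte_mempool_create_empty("example_pool", 8192, 2048,
				      256, 0, SOCKET_ID_ANY, 0);
	if (mp == NULL)
		return NULL;
	if (rte_mempool_set_handler(mp, "custom_handler") < 0 ||
	    rte_mempool_populate_default(mp) < 0) {
		rte_mempool_free(mp);
		return NULL;
	}
	return mp;
}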

v7 changes:
  * Moved the flags handling from rte_mempool_create_empty to
rte_mempool_create, as it's only there for backward compatibility
  * Various comment additions and cleanup
  * Renamed rte_mempool_handler to rte_mempool_ops
  * Added a union for *pool and u64 pool_id in struct rte_mempool

v6 changes:
  * split the original patch into a few parts for easier review.
  * rename functions with _ext_ to _ops_.
  * addressed some review comments
  * renamed put and get functions to enqueue and dequeue
  * renamed rte_mempool_handler struct to rte_mempool_handler_ops
  * changed occurrences of rte_mempool_handler_ops to const, as they
contain function pointers (security)
  * added some extra comments

v5 changes: rebasing on top of 35 patch set mempool work.

Signed-off-by: Olivier Matz 
Signed-off-by: David Hunt 
---
 lib/librte_mempool/Makefile  |   1 +
 lib/librte_mempool/rte_mempool.c |  71 --
 lib/librte_mempool/rte_mempool.h | 235 ---
 lib/librte_mempool/rte_mempool_handler.c | 141 +++
 4 files changed, 384 insertions(+), 64 deletions(-)
 create mode 100644 lib/librte_mempool/rte_mempool_handler.c

diff --git a/lib/librte_mempool/Makefile b/lib/librte_mempool/Makefile
index 43423e0..793745f 100644
--- a/lib/librte_mempool/Makefile
+++ b/lib/librte_mempool/Makefile
@@ -42,6 +42,7 @@ LIBABIVER := 2

 # all source are stored in SRCS-y
 SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_mempool.c
+SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_mempool_handler.c
 # install includes
 SYMLINK-$(CONFIG_RTE_LIBRTE_MEMPOOL)-include := rte_mempool.h

diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index b54de43..9f34d30 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -148,7 +148,7 @@ mempool_add_elem(struct rte_mempool *mp, void *obj, 
phys_addr_t physaddr)
 #endif

/* enqueue in ring */
-   rte_ring_sp_enqueue(mp->ring, obj);
+   rte_mempool_ops_enqueue_bulk(mp, &obj, 1);
 }

 /* call obj_cb() for each mempool element */
@@ -303,40 +303,6 @@ rte_mempool_xmem_usage(__rte_unused void *vaddr, uint32_t 
elt_num,
return (size_t)paddr_idx << pg_shift;
 }

-/* create the internal ring */
-static int
-rte_mempool_ring_create(struct rte_mempool *mp)
-{
-   int rg_flags = 0, ret;
-   char rg_name[RTE_RING_NAMESIZE];
-   struct rte_ring *r;
-
-   ret = snprintf(rg_name, sizeof(rg_name),
-   RTE_MEMPOOL_MZ_FORMAT, mp->name);
-   if (ret < 0 || ret >= (int)sizeof(rg_name))
-   return -ENAMETOOLONG;
-
-   /* ring flags */
-   if (mp->flags & MEMPOOL_F_SP_PUT)
-   rg_flags |= RING_F_SP_ENQ;
-   if (mp->flags & MEMPOOL_F_SC_GET)
-   rg_flags |= RING_F_SC_DEQ;
-
-   /* Allocate the ring that will be used to store objects.
-* Ring functions will return appropriate errors if we are
-* running as a secondary process etc., so no checks made
-* in this function for that condition.
-*/
-   r = rte_ring_create(rg_name, rte_align32pow2(mp->size + 1),
-   mp->socket_id, rg_flags);
-   if (r == NULL)
-   return -rte_errno;
-
-   mp->ring = r;
-   mp->flags |= MEMPOOL_F_RING_CREATED;
-   return 0;
-}
-
 /* free a memchunk allocated with rte_memzone_reserve() */
 static void
 rte_mempool_memchunk_mz_free(__rte_unused struct rte_mempool_memhdr *memhdr,
@@ -354,7 +320,7 @@ rte_mempool_free_memchunks(struct rte_mempool *mp)
void *elt;

while (!STAILQ_EMPTY(&mp->elt_list)) {
-   rte_ring_sc_dequeue(mp->ring, &elt);
+   rte_mempool_ops_dequeue_bulk(mp, &elt, 1);
(void)elt;
STAILQ_REMOVE_HEAD(&mp->elt_list, next);
mp->populated_size--;
@@ -383,13 +349,16 @@ rte_mempool_populate_phys(struct rte_mempool *mp, char 
*vaddr,
unsigned i = 0;
size_t off;
struct rte_mempool_memhdr *memhdr;
-   int ret;

/* create the internal ring if not already done */
if ((mp->flags & MEMPOOL_F_RING_CREATED) == 0) {
-   ret = rte_mempool_ring_create(mp);
-   if (ret < 0)
-   return ret;
+   rte_errno = 0;
+   mp->pool = rte_mempool_ops_alloc(mp);
+   if (mp->pool == NULL) {
+   if (rte_errno == 0)
+   return -EINVAL;
+   return -rte_errno;
+   }
}

/* mempool is already 

[dpdk-dev] [PATCH v6 0/5] mempool: add external mempool manager

2016-06-01 Thread David Hunt
Here's the latest version of the External Mempool Manager patchset.
It's re-based on top of the latest head as of 1st June 2016, including
Olivier's 35-part patch series on mempool re-org [1]

[1] http://dpdk.org/ml/archives/dev/2016-May/039229.html

Note: After applying the last patch, run "make config ..." before
compiling. It introduces a config file change. 

Note: Hopefully I've addressed all the extensive comments over the
last week. If I've missed any, please let me know, as it would
not have been intentional. I hope I've responded to all comments
via email on the mailing list. 

v6 changes:

 * Moved the flags handling from rte_mempool_create_empty to
   rte_mempool_create, as it's only there for backward compatibility
 * Various comment additions and cleanup
 * Renamed rte_mempool_handler to rte_mempool_ops
 * Added a union for *pool and u64 pool_id in struct rte_mempool
 * split the original patch into a few parts for easier review.
 * rename functions with _ext_ to _ops_.
 * addressed review comments
 * renamed put and get functions to enqueue and dequeue
 * changed occurrences of rte_mempool_ops to const, as they
   contain function pointers (security)
 * split out the default external mempool handler into a separate
   patch for easier review

v5 changes:
 * rebasing, as it is dependent on another patch series [1]

v4 changes (Olivier Matz):
 * remove the rte_mempool_create_ext() function. To change the handler, the
   user has to do the following:
   - mp = rte_mempool_create_empty()
   - rte_mempool_set_handler(mp, "my_handler")
   - rte_mempool_populate_default(mp)
   This avoids to add another function with more than 10 arguments, duplicating
   the doxygen comments
 * change the api of rte_mempool_alloc_t: only the mempool pointer is required
   as all information is available in it
 * change the api of rte_mempool_free_t: remove return value
 * move inline wrapper functions from the .c to the .h (else they won't be
   inlined). This implies to have one header file (rte_mempool.h), or it
   would have generate cross dependencies issues.
 * remove now unused MEMPOOL_F_INT_HANDLER (note: it was misused anyway due
   to the use of && instead of &)
 * fix build in debug mode (__MEMPOOL_STAT_ADD(mp, put_pool, n) remaining)
 * fix build with shared libraries (global handler has to be declared in
   the .map file)
 * rationalize #include order
 * remove unused function rte_mempool_get_handler_name()
 * rename some structures, fields, functions
 * remove the static in front of rte_tailq_elem rte_mempool_tailq (comment
   from Yuanhan)
 * test the ext mempool handler in the same file than standard mempool tests,
   avoiding to duplicate the code
 * rework the custom handler in mempool_test
 * rework a bit the patch selecting default mbuf pool handler
 * fix some doxygen comments

v3 changes:
 * simplified the file layout, renamed to rte_mempool_handler.[hc]
 * moved the default handlers into rte_mempool_default.c
 * moved the example handler out into app/test/test_ext_mempool.c
 * removed is_mc/is_mp change, slight perf degradation on sp cached operation
 * removed stack handler, may re-introduce at a later date
 * Changes out of code reviews

v2 changes:
 * There was a lot of duplicate code between rte_mempool_xmem_create and
   rte_mempool_create_ext. This has now been refactored and is now
   hopefully cleaner.
 * The RTE_NEXT_ABI define is now used to allow building of the library
   in a format that is compatible with binaries built against previous
   versions of DPDK.
 * Changes out of code reviews. Hopefully I've got most of them included.

The External Mempool Manager is an extension to the mempool API that allows
users to add and use an external mempool manager, which allows external memory
subsystems such as external hardware memory management systems and software
based memory allocators to be used with DPDK.

The existing API to the internal DPDK mempool manager will remain unchanged
and will be backward compatible. However, there will be an ABI breakage, as
the mempool struct is changing. These changes are all contained within
RTE_NEXT_ABI defs, and the current or next code can be selected with
the CONFIG_RTE_NEXT_ABI config setting.

There are two aspects to external mempool manager.
  1. Adding the code for your new mempool handler. This is achieved by adding a
 new mempool handler source file into the librte_mempool library, and
 using the REGISTER_MEMPOOL_HANDLER macro.
  2. Using the new API to call rte_mempool_create_empty and
 rte_mempool_set_handler to create a new mempool
 using the name parameter to identify which handler to use.

New API calls added
 1. A new rte_mempool_create_empty() function
 2. rte_mempool_set_handler() which sets the mempool's handler
 3. rte_mempool_populate_default() and rte_mempool_populate_anon() functions,
which populate the mempool using the relevant handler

Several external mempool managers may be used in the same 

[dpdk-dev] [RFC] Yet another option for DPDK options

2016-06-01 Thread Bruce Richardson
On Wed, Jun 01, 2016 at 10:58:41AM -0500, Jay Rolette wrote:
> On Wed, Jun 1, 2016 at 10:00 AM, Wiles, Keith  
> wrote:
> 
> > Started from the link below, but did not want to highjack the thread.
> > http://dpdk.org/ml/archives/dev/2016-June/040021.html
> >
> > I was thinking about this problem from a user perspective and command line
> > options are very difficult to manage specifically when you have a large
> > number of options as we have in dpdk. I see all of these options as a type
> > of database of information for the DPDK and the application, because the
> > application command line options are also getting very complex as well.
> >
> > I have been looking at a number of different options here and the
> > direction I was thinking was using a file for the options and
> > configurations with the data in a clean format. It could have been an INI
> > file, JSON, or XML, but they all seem to have some problems I do not like.
> > The INI file is too flat and I wanted a hierarchy in the data, the JSON
> > data is similar and XML is just hard to read. I wanted to be able to manage
> > multiple applications and possibly the system the DPDK/app runs on. The problem
> > with the above formats is they are just data and not easy to make decisions
> > about the system and applications at runtime.
> >
> 
> INI format is simplest for users to read, but if you really need hierarchy,
> JSON will do that just fine. Not sure what you mean by "JSON data is
> similar"...
> 
> 
I'd be quite concerned if we start needing lots of hierarchies for 
configuration.

I'd really just like to see ini file format used for this because:
* it's a well understood, simple format
* very easily human readable and editable
* lots of support for it in lots of languages
* hierarchies are possible in it too - just not as easy as in other formats
  though. [In a previous life I worked with ini files which had address
  hierarchies 6-levels deep in them. It wasn't hard to work with]
* it works well with grep since you must have one value per-line
* it allows comments
* we already have a DPDK library for parsing them
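
As a point of reference, reading such a file with that existing parser
(librte_cfgfile) is only a few lines; the file name, section and key below
are just examples:

#include <stdio.h>
#include <rte_cfgfile.h>

static int
dump_one_entry(void)
{
	struct rte_cfgfile *cfg = rte_cfgfile_load("dpdk.ini", 0);
	const char *val;

	if (cfg == NULL)
		return -1;
	/* e.g. an [eal] section containing "lcores=0-3" */
	val = rte_cfgfile_get_entry(cfg, "eal", "lcores");
	if (val != NULL)
		printf("eal.lcores = %s\n", val);
	rte_cfgfile_close(cfg);
	return 0;
}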

However, for me the biggest advantage of using something like ini is that it
would force us to keep things simple!

I'd stay away from formats like json or XML that are designed for serializing
entire objects or structures, and look for something that allows us to just
specify configuration values.

Regards,
/Bruce



[dpdk-dev] [PATCH v5 1/3] mempool: support external handler

2016-06-01 Thread Jerin Jacob
On Wed, Jun 01, 2016 at 11:46:20AM +0100, Hunt, David wrote:
> 
> 
> On 5/31/2016 10:11 PM, Jerin Jacob wrote:
> > On Tue, May 31, 2016 at 10:41:00PM +0200, Olivier MATZ wrote:
> > > Hi,
> > > 
> > > On 05/31/2016 06:03 PM, Jerin Jacob wrote:
> > > > On Tue, May 31, 2016 at 04:37:02PM +0100, Hunt, David wrote:
> > > > > 
> > > > > On 5/31/2016 9:53 AM, Jerin Jacob wrote:
> > > > > > On Mon, May 30, 2016 at 12:27:26PM +0100, Hunt, David wrote:
> > > > > > > New mempool handlers will use rte_mempool_create_empty(),
> > > > > > > rte_mempool_set_handler(),
> > > > > > > then rte_mempool_populate_*(). These three functions are new to 
> > > > > > > this
> > > > > > > release, to no problem
> > > > > > Having separate APIs for external pool-manager create is worrisome 
> > > > > > in
> > > > > > application perspective. Is it possible to have 
> > > > > > rte_mempool_[xmem]_create
> > > > > > for the both external and existing SW pool manager and make
> > > > > > rte_mempool_create_empty and rte_mempool_populate_*  internal 
> > > > > > functions.
> > > > > > 
> > > > > > IMO, We can do that by selecting  specific rte_mempool_set_handler()
> > > > > > based on _flags_ encoding, something like below
> > > > > > 
> > > > > > bit 0 - 16   // generic bits uses across all the pool managers
> > > > > > bit 16 - 23  // pool handler specific flags bits
> > > > > > bit 24 - 31  // to select the specific pool manager(Up to 256 
> > > > > > different flavors of
> > > > > > pool managers, For backward compatibility, make '0'(in 24-31) to 
> > > > > > select
> > > > > > existing SW pool manager.
> > > > > > 
> > > > > > and applications can choose the handlers by selecting the flag in
> > > > > > rte_mempool_[xmem]_create, That way it will be easy in testpmd or 
> > > > > > any other
> > > > > > applications to choose the pool handler from command line etc in 
> > > > > > future.
> > > > > There might be issues with the 8-bit handler number, as we'd have to 
> > > > > add an
> > > > > api call to
> > > > > first get the index of a given hander by name, then OR it into the 
> > > > > flags.
> > > > > That would mean
> > > > > also extra API calls for the non-default external handlers. I do 
> > > > > agree with
> > > > > the handler-specific
> > > > > bits though.
> > > > That would be an internal API(upper 8 bits to handler name). Right ?
> > > > Seems to be OK for me.
> > > > 
> > > > > Having the _empty and _set_handler  APIs seems to me to be OK for the
> > > > > moment. Maybe Olivier could comment?
> > > > > 
> > > > But need 3 APIs. Right? _empty , _set_handler and _populate ? I believe
> > > > it is better reduce the public API in spec where ever possible ?
> > > > 
> > > > Maybe Olivier could comment ?
> > > Well, I think having 3 different functions is not a problem if the API
> > > is clearer.
> > > 
> > > In my opinion, the following:
> > >   rte_mempool_create_empty()
> > >   rte_mempool_set_handler()
> > >   rte_mempool_populate()
> > > 
> > > is clearer than:
> > >   rte_mempool_create(15 args)
> > But proposed scheme is not adding any new arguments to
> > rte_mempool_create. It just extending the existing flag.
> > 
> > rte_mempool_create(15 args) is still their as API for internal pool
> > creation.
> > 
> > > Splitting the flags into 3 groups, with one not beeing flags but a
> > > pool handler number looks overcomplicated from a user perspective.
> > I am concerned with seamless integration with existing applications.
> > IMO, it's not worth having separate functions for external vs internal
> > pool creation for applications (now each and every application has to add this
> > logic everywhere for no good reason), just my 2 cents.
> 
> I think that there is always going to be some extra code in the applications
> that want to use an external mempool. The _set_handler approach does
> create, set_handler, populate. The flags method queries the handler list to
> get the index, sets the flags bits, then calls create. Both methods will
> work.

I was suggesting flags like TXQ in ethdev, where the application just
selects the mode. Not sure why the application has to get the index first.

some thing like,
#define ETH_TXQ_FLAGS_NOMULTSEGS 0x0001 /**< nb_segs=1 for all mbufs */
#define ETH_TXQ_FLAGS_NOREFCOUNT 0x0002 /**< refcnt can be ignored */
#define ETH_TXQ_FLAGS_NOMULTMEMP 0x0004 /**< all bufs come from same mempool */ 

Anyway, it looks like no one else is much bothered about the external pool
manager creation API being different. So, I give up. No objections from my side 
:-)

> 
> But I think the _set_handler approach is more user friendly, therefore that
> is the method I would lean towards.
> 
> > > > > > and we can remove "mbuf: get default mempool handler from 
> > > > > > configuration"
> > > > > > change-set OR just add if RTE_MBUF_DEFAULT_MEMPOOL_HANDLER is 
> > > > > > defined then set
> > > > > > the same with rte_mempool_set_handler in rte_mempool_[xmem]_create.
> > > > > > 
> > > > > > What do you think?
> > > > 

[dpdk-dev] [PATCH v5 8/8] doc: update doc for virtio-user

2016-06-01 Thread Yuanhan Liu
On Mon, May 30, 2016 at 10:55:39AM +, Jianfeng Tan wrote:
> Signed-off-by: Huawei Xie 
> Signed-off-by: Jianfeng Tan 
> ---
>  doc/guides/rel_notes/release_16_07.rst | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/doc/guides/rel_notes/release_16_07.rst 
> b/doc/guides/rel_notes/release_16_07.rst
> index f6d543c..b1054b6 100644
> --- a/doc/guides/rel_notes/release_16_07.rst
> +++ b/doc/guides/rel_notes/release_16_07.rst
> @@ -34,6 +34,10 @@ This section should contain new features added in this 
> release. Sample format:
>  
>Refer to the previous release notes for examples.
>  
> +* **Virtio support for containers.**
> +
> +  Add a new virtual device, named virtio-user, to support virtio for 
> containers.

Note that Thomas has stated quite a few times that we should keep the
release note in the same patch that actually enables the feature.

Besides that, I guess we should state the current limitations here. It
might be even better if we could make a howto doc, say in the vhost
example doc.

--yliu


[dpdk-dev] [PATCH v5 7/8] virtio-user: add a new vdev named virtio-user

2016-06-01 Thread Yuanhan Liu
On Mon, May 30, 2016 at 10:55:38AM +, Jianfeng Tan wrote:
> diff --git a/drivers/net/virtio/virtio_user/virtio_user_dev.c 
> b/drivers/net/virtio/virtio_user/virtio_user_dev.c
> index 41d8ad1..5e4f60b 100644
> --- a/drivers/net/virtio/virtio_user/virtio_user_dev.c
> +++ b/drivers/net/virtio/virtio_user/virtio_user_dev.c
> @@ -166,3 +166,312 @@ int virtio_user_stop_device(struct virtio_user_dev *dev)
>   return vhost_call(dev->vhostfd, dev->type, VHOST_MSG_RESET_OWNER, NULL);
>  }
>  
> +static inline void parse_mac(struct virtio_user_dev *dev, const char *mac)

Note that this is a slight coding style violation.

> +{
> + int i, r;
> + uint32_t tmp[ETHER_ADDR_LEN];
> +
> + if (!mac)
> + return;
> +
> + r = sscanf(mac, "%x:%x:%x:%x:%x:%x", &tmp[0],
> + &tmp[1], &tmp[2], &tmp[3], &tmp[4], &tmp[5]);
> + if (r == ETHER_ADDR_LEN) {
> + for (i = 0; i < ETHER_ADDR_LEN; ++i)
> + dev->mac_addr[i] = (uint8_t)tmp[i];
> + dev->mac_specified = 1;
> + } else {
> + /* ignore the wrong mac, use random mac */
> + PMD_DRV_LOG(ERR, "wrong format of mac: %s", mac);
> + }
> +}
> +
> +static int
> +virtio_user_dev_init(struct virtio_user_dev *dev, char *path, int queues,
> +  int queue_size, const char *mac, char *ifname)

As stated in the last email, we should move all the others (except the above 2
functions) to the driver layer, where they belong.

--yliu


[dpdk-dev] [RFC] kernel parameters like DPDK CLI options

2016-06-01 Thread Yerden Zhumabekov
I recently felt tired enough of specifying various options for EAL, so I 
came up with an ini-based configuration. EAL parameters from a dedicated 
section of the ini file are parsed into an argv array which is subsequently fed 
to rte_eal_init(). Quite handy, but maybe a little overkill.
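
A rough sketch of that glue code, leaving the actual ini reading out (the
parameter string, array size and argv[0] value are arbitrary):

#include <string.h>
#include <rte_eal.h>

static int
eal_init_from_string(char *params)	/* e.g. "-c 0xf -n 4 --socket-mem 512" */
{
	char *argv[32];
	char *tok, *sp = NULL;
	int argc = 0;

	argv[argc++] = (char *)"app";	/* fake argv[0] (program name) */
	for (tok = strtok_r(params, " ", &sp);
	     tok != NULL && argc < 31;
	     tok = strtok_r(NULL, " ", &sp))
		argv[argc++] = tok;
	argv[argc] = NULL;

	return rte_eal_init(argc, argv);
}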


On 01.06.2016 12:04, Yuanhan Liu wrote:
> Hi all,
>
> I guess we (maybe just me :) have stated few times something like
> "hey, this kind of stuff is good to have, but you are trying to
> add an EAL CLI option for a specific subsystem/driver, which is
> wrong".
>
> One recent example that is still fresh in my mind is the one from
> Christian [0], that he made a proposal to introduce two new EAL
> options, --vhost-owner and --vhost-perm, to configure the vhost
> user socket file permission.
>
>  [0]: http://dpdk.org/ml/archives/dev/2016-April/037948.html
>
> Another example is the one I met while enabling virtio 1.0 support.
> QEMU has the ability to support both virtio 0.95 (legacy) and 1.0
> (modern) at the same time for one virtio device, therefore, we
> could either use legacy driver or modern driver to operate the
> device. However, the current logic is we try with modern driver
> first, and then legacy driver if it failed. In above case, we will
> never hit the legacy driver. But sometimes, it's nice to let it
> force back to the legacy driver, say, for debug or compare purpose.
>
> Apparently, adding a new EAL option like "--force-legacy" looks
> wrong.
>
> The generic yet elegant solution I just thought of while having
> lunch is to add a new EAL option, say, --extra-options, where we
> could specify driver/subsystem specific options. As you see, it's
> nothing big deal, it just looks like Linux kernel parameters.
>
> Take above two cases as example, it could be:
>
>  --extra-options "vhost-owner=kvm:kvm force-legacy"
>
> Note that those options could also be delimited by comma.
>
> DPDK EAL would then provide some generic helper functions to get
> and parse those options, and let the specific driver/subsystem
> invoke them to do the actual parsing and take the proper action
> when some option is specified; say, the virtio PMD will force
> back to the legacy driver when "force-legacy" is given.
>
> Comments? Makes sense to you guys, or something nice to have?
>
>   --yliu
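
To make the idea above a bit more concrete, a rough sketch of what such a
generic helper could look like (purely hypothetical: no function below exists
in EAL, and the real parsing would live behind a proper API rather than in
the application):

#include <string.h>
#include <stdio.h>

/* Return 1 and copy the value (if any) when "<prefix>.<name>[=value]" is
 * present in the extra-options string, 0 otherwise. */
static int
extra_opt_lookup(const char *extra, const char *prefix, const char *name,
		 char *value, size_t len)
{
	char key[128];
	const char *p;

	snprintf(key, sizeof(key), "%s.%s", prefix, name);
	p = strstr(extra, key);
	if (p == NULL)
		return 0;
	p += strlen(key);
	if (*p == '=' && value != NULL)
		snprintf(value, len, "%.*s", (int)strcspn(p + 1, " ,"), p + 1);
	return 1;
}

int main(void)
{
	const char *extra = "vhost.owner=kvm:kvm virtio.force_legacy";
	char owner[64] = "";

	if (extra_opt_lookup(extra, "virtio", "force_legacy", NULL, 0))
		printf("virtio PMD would fall back to the legacy driver\n");
	if (extra_opt_lookup(extra, "vhost", "owner", owner, sizeof(owner)))
		printf("vhost socket owner: %s\n", owner);
	return 0;
}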



[dpdk-dev] [PATCH v5 6/8] virtio-user: add new virtual pci driver for virtio

2016-06-01 Thread Yuanhan Liu
On Mon, May 30, 2016 at 10:55:37AM +, Jianfeng Tan wrote:
> This patch implements another new instance of struct virtio_pci_ops to
> drive the virtio-user virtual device. Instead of rd/wr ioport or PCI
> configuration space, this virtual pci driver will rd/wr the virtual
> device struct virtio_user_hw, and when necessary, invokes APIs provided
> by device emulation later to start/stop the device.
> 
>     ------------------------
>     |  ------------------  |
>     |  | virtio driver  |  |----> (virtio_user_pci.c)
>     |  ------------------  |
>     |          |           |
>     |  ------------------  |  ---->  virtio-user PMD
>     |  | device emulate |  |
>     |  |                |  |
>     |  | vhost adapter  |  |
>     |  ------------------  |
>     ------------------------
>                |
>                |
>                |
>         ------------------
>         |  vhost backend |
>         ------------------
> 
> Signed-off-by: Huawei Xie 
> Signed-off-by: Jianfeng Tan 
> Acked-by: Neil Horman 
> ---
>  drivers/net/virtio/Makefile  |   1 +
>  drivers/net/virtio/virtio_pci.h  |   1 +
>  drivers/net/virtio/virtio_user/virtio_user_dev.h |   2 +
>  drivers/net/virtio/virtio_user/virtio_user_pci.c | 218 
> +++

Jianfeng, this file (virtio_user_pci.c) belongs to virtio driver, but
not virtio-user device.

I thought about it for a while, and I found it's still better to introduce
a helper function from the virtio-user device for each ops, just like the
vdev_reset() below. Sorry for the noise.

And I'd suggest renaming it to virtio_user.c, so that we can also move
virtio_user_pmd_devinit/devuninit there in the next patch. They belong
to the driver side, not the device side.

> +static void
> +vdev_read_dev_config(struct virtio_hw *hw, uint64_t offset,
> +  void *dst, int length)

Let's use "virtio_user" consistently here, instead of vdev?

--yliu


[dpdk-dev] Can't build DPDK-16.04 on CentOS 6.8

2016-06-01 Thread Martinx - ジェームズ
Guys,

 I'm trying to build DPDK-16.04 on CentOS 6.8, but it is failing, here is
the error:

---
...
== Build lib/librte_eal/linuxapp
== Build lib/librte_eal/linuxapp/eal
== Build lib/librte_eal/linuxapp/igb_uio
  CC eal.o
  CC eal_hugepage_info.o
  CC eal_memory.o
  LD
 
/root/rpmbuild/BUILD/dpdk-16.04/x86_64-default-linuxapp-gcc/build/lib/librte_eal/linuxapp/igb_uio/built-in.o
  CC [M]
 
/root/rpmbuild/BUILD/dpdk-16.04/x86_64-default-linuxapp-gcc/build/lib/librte_eal/linuxapp/igb_uio/igb_uio.o
  CC eal_thread.o
/root/rpmbuild/BUILD/dpdk-16.04/x86_64-default-linuxapp-gcc/build/lib/librte_eal/linuxapp/igb_uio/igb_uio.c:
In function 'igbuio_msix_mask_irq':
/root/rpmbuild/BUILD/dpdk-16.04/x86_64-default-linuxapp-gcc/build/lib/librte_eal/linuxapp/igb_uio/igb_uio.c:157:
error: 'PCI_MSIX_ENTRY_CTRL_MASKBIT' undeclared (first use in this function)
/root/rpmbuild/BUILD/dpdk-16.04/x86_64-default-linuxapp-gcc/build/lib/librte_eal/linuxapp/igb_uio/igb_uio.c:157:
error: (Each undeclared identifier is reported only once
/root/rpmbuild/BUILD/dpdk-16.04/x86_64-default-linuxapp-gcc/build/lib/librte_eal/linuxapp/igb_uio/igb_uio.c:157:
error: for each function it appears in.)
make[8]: ***
[/root/rpmbuild/BUILD/dpdk-16.04/x86_64-default-linuxapp-gcc/build/lib/librte_eal/linuxapp/igb_uio/igb_uio.o]
Error 1
make[7]: ***
[_module_/root/rpmbuild/BUILD/dpdk-16.04/x86_64-default-linuxapp-gcc/build/lib/librte_eal/linuxapp/igb_uio]
Error 2
make[6]: *** [sub-make] Error 2
make[5]: *** [igb_uio.ko] Error 2
make[4]: *** [igb_uio] Error 2
make[4]: *** Waiting for unfinished jobs
  CC eal_log.o
  CC eal_pci.o
  CC eal_pci_uio.o
  CC eal_pci_vfio.o
  CC eal_pci_vfio_mp_sync.o
...
---

 Any clue?

 I'm trying to build it by running:

--
rpmbuild --ba /root/rpmbuild/SPECS/dpdk.spec
--

 I removed the "doc" and the need for Xen out of it... I can take this spec
and the dpdk-16.04.tar.gz and build it on CentOS 7.

Thanks!
Thiago
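
One possible cause worth checking (an assumption based on the error text, not
something confirmed in this thread): the CentOS 6 kernel headers predate the
PCI_MSIX_ENTRY_CTRL_MASKBIT definition, so a compat definition could be
supplied before building igb_uio, for example:

/* Hypothetical workaround: the value matches newer linux/pci_regs.h;
 * verify against your kernel headers before using it. */
#ifndef PCI_MSIX_ENTRY_CTRL_MASKBIT
#define PCI_MSIX_ENTRY_CTRL_MASKBIT	1
#endif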


[dpdk-dev] [PATCH 1/3] vhost: pre update used ring for Tx and Rx

2016-06-01 Thread Michael S. Tsirkin
On Wed, Jun 01, 2016 at 06:40:41AM +, Xie, Huawei wrote:
> On 5/3/2016 8:42 AM, Yuanhan Liu wrote:
> > Pre update and update used ring in batch for Tx and Rx at the stage
> > while fetching all avail desc idx. This would reduce some cache misses
> > and hence, increase the performance a bit.
> >
> > Pre update would be feasible as guest driver will not start processing
> > those entries as far as we don't update "used->idx". (I'm not 100%
> > certain I don't miss anything, though).
> >
> > Cc: Michael S. Tsirkin 
> > Signed-off-by: Yuanhan Liu 
> > ---
> >  lib/librte_vhost/vhost_rxtx.c | 58 
> > +--
> >  1 file changed, 28 insertions(+), 30 deletions(-)
> >
> > diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
> > index c9cd1c5..2c3b810 100644
> > --- a/lib/librte_vhost/vhost_rxtx.c
> > +++ b/lib/librte_vhost/vhost_rxtx.c
> > @@ -137,7 +137,7 @@ copy_virtio_net_hdr(struct virtio_net *dev, uint64_t 
> > desc_addr,
> >  
> >  static inline int __attribute__((always_inline))
> >  copy_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
> > - struct rte_mbuf *m, uint16_t desc_idx, uint32_t *copied)
> > + struct rte_mbuf *m, uint16_t desc_idx)
> >  {
> > uint32_t desc_avail, desc_offset;
> > uint32_t mbuf_avail, mbuf_offset;
> > @@ -161,7 +161,6 @@ copy_mbuf_to_desc(struct virtio_net *dev, struct 
> > vhost_virtqueue *vq,
> > desc_offset = dev->vhost_hlen;
> > desc_avail  = desc->len - dev->vhost_hlen;
> >  
> > -   *copied = rte_pktmbuf_pkt_len(m);
> > mbuf_avail  = rte_pktmbuf_data_len(m);
> > mbuf_offset = 0;
> > while (mbuf_avail != 0 || m->next != NULL) {
> > @@ -262,6 +261,7 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
> > struct vhost_virtqueue *vq;
> > uint16_t res_start_idx, res_end_idx;
> > uint16_t desc_indexes[MAX_PKT_BURST];
> > +   uint16_t used_idx;
> > uint32_t i;
> >  
> > LOG_DEBUG(VHOST_DATA, "(%d) %s\n", dev->vid, __func__);
> > @@ -285,27 +285,29 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t 
> > queue_id,
> > /* Retrieve all of the desc indexes first to avoid caching issues. */
> > rte_prefetch0(>avail->ring[res_start_idx & (vq->size - 1)]);
> > for (i = 0; i < count; i++) {
> > -   desc_indexes[i] = vq->avail->ring[(res_start_idx + i) &
> > - (vq->size - 1)];
> > +   used_idx = (res_start_idx + i) & (vq->size - 1);
> > +   desc_indexes[i] = vq->avail->ring[used_idx];
> > +   vq->used->ring[used_idx].id = desc_indexes[i];
> > +   vq->used->ring[used_idx].len = pkts[i]->pkt_len +
> > +  dev->vhost_hlen;
> > +   vhost_log_used_vring(dev, vq,
> > +   offsetof(struct vring_used, ring[used_idx]),
> > +   sizeof(vq->used->ring[used_idx]));
> > }
> >  
> > rte_prefetch0(>desc[desc_indexes[0]]);
> > for (i = 0; i < count; i++) {
> > uint16_t desc_idx = desc_indexes[i];
> > -   uint16_t used_idx = (res_start_idx + i) & (vq->size - 1);
> > -   uint32_t copied;
> > int err;
> >  
> > -   err = copy_mbuf_to_desc(dev, vq, pkts[i], desc_idx, );
> > -
> > -   vq->used->ring[used_idx].id = desc_idx;
> > -   if (unlikely(err))
> > +   err = copy_mbuf_to_desc(dev, vq, pkts[i], desc_idx);
> > +   if (unlikely(err)) {
> > +   used_idx = (res_start_idx + i) & (vq->size - 1);
> > vq->used->ring[used_idx].len = dev->vhost_hlen;
> > -   else
> > -   vq->used->ring[used_idx].len = copied + dev->vhost_hlen;
> > -   vhost_log_used_vring(dev, vq,
> > -   offsetof(struct vring_used, ring[used_idx]),
> > -   sizeof(vq->used->ring[used_idx]));
> > +   vhost_log_used_vring(dev, vq,
> > +   offsetof(struct vring_used, ring[used_idx]),
> > +   sizeof(vq->used->ring[used_idx]));
> > +   }
> >  
> > if (i + 1 < count)
> > rte_prefetch0(>desc[desc_indexes[i+1]]);
> > @@ -879,6 +881,7 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
> > /* Prefetch available ring to retrieve head indexes. */
> > used_idx = vq->last_used_idx & (vq->size - 1);
> > rte_prefetch0(>avail->ring[used_idx]);
> > +   rte_prefetch0(>used->ring[used_idx]);
> >  
> > count = RTE_MIN(count, MAX_PKT_BURST);
> > count = RTE_MIN(count, free_entries);
> > @@ -887,22 +890,23 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
> >  
> > /* Retrieve all of the head indexes first to avoid caching issues. */
> > for (i = 0; i < count; i++) {
> > -   desc_indexes[i] = vq->avail->ring[(vq->last_used_idx + i) &
> > -   (vq->size - 1)];
> > +   

[dpdk-dev] [PATCH v5 3/8] virtio: enable use virtual address to fill desc

2016-06-01 Thread Yuanhan Liu
On Mon, May 30, 2016 at 10:55:34AM +, Jianfeng Tan wrote:
> This patch is related to how to calculate relative address for vhost
> backend.
> 
> The principle is that: based on one or multiple shared memory regions,
> vhost maintains a reference system with the frontend start address,
> backend start address, and length for each segment, so that each
> frontend address (GPA, Guest Physical Address) can be translated into
> vhost-recognizable backend address. To make the address translation
> efficient, we need to maintain as few regions as possible. In the case
> of VM, GPA is always locally continuous. But for some other case, like
> virtio-user, we use virtual address here.
> 
> It basically means:
>   a. when set_base_addr, VA address is used;
>   b. when preparing RX's descriptors, VA address is used;
>   c. when transmitting packets, VA is filled in TX's descriptors;
>   d. in TX and CQ's header, VA is used.
> 
> Signed-off-by: Huawei Xie 
> Signed-off-by: Jianfeng Tan 
> Acked-by: Neil Horman 
> ---
>  drivers/net/virtio/virtio_ethdev.c  | 21 -
>  drivers/net/virtio/virtio_rxtx.c|  5 ++---
>  drivers/net/virtio/virtio_rxtx_simple.c | 13 +++--
>  drivers/net/virtio/virtqueue.h  | 13 -
>  4 files changed, 37 insertions(+), 15 deletions(-)
> 
> @@ -419,8 +419,6 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,
>   vq->virtio_net_hdr_mem = hdr_mz->phys_addr;
>  
>   memset(hdr_mz->addr, 0, hdr_mz_sz);
> - vring_hdr_desc_init(vq);
> -
>   } else if (queue_type == VTNET_CQ) {
>   /* Allocate a page for control vq command, data and status */
>   snprintf(vq_name, sizeof(vq_name), "port%d_cvq_hdrzone",
> @@ -441,6 +439,19 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,
>   memset(vq->virtio_net_hdr_mz->addr, 0, PAGE_SIZE);
>   }
>  
> + if (dev->pci_dev)
> + vq->offset = offsetof(struct rte_mbuf, buf_physaddr);
> + else {
> + vq->vq_ring_mem = (phys_addr_t)vq->mz->addr;
> + vq->offset = offsetof(struct rte_mbuf, buf_addr);
> + if (vq->virtio_net_hdr_mz)
> + vq->virtio_net_hdr_mem =
> + (phys_addr_t)vq->virtio_net_hdr_mz->addr;
> + }

I guess this piece of code deserves some comments. Say, for the virtio-user
case (that is, when dev->pci_dev is NULL), we use the virtual address,
because blah, blah ...
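
One possible wording for such a comment (illustrative only, not the text that
was eventually merged), placed above the hunk quoted above:

	/*
	 * VM case (dev->pci_dev != NULL): the vring and buffers are handed
	 * to the backend by guest physical address, which vhost translates
	 * through the memory regions shared by QEMU.
	 *
	 * virtio-user case (dev->pci_dev == NULL): there is no QEMU and no
	 * GPA; the vhost backend maps the process memory directly, so the
	 * ring, the mbuf data and the virtio-net headers are all referenced
	 * by virtual address instead.
	 */
	if (dev->pci_dev)
		vq->offset = offsetof(struct rte_mbuf, buf_physaddr);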


> @@ -165,6 +173,7 @@ struct virtqueue {
>   void*vq_ring_virt_mem;/**< linear address of vring*/
>   unsigned int vq_ring_size;
>   phys_addr_t vq_ring_mem;  /**< physical address of vring */
> +   /**< use virtual address for vdev. */

Replacing vdev with "virtio-user" would be better here?

--yliu


[dpdk-dev] [RFC] kernel paramters like DPDK CLI options

2016-06-01 Thread Thomas Monjalon
2016-06-01 21:19, Yuanhan Liu:
> On Wed, Jun 01, 2016 at 02:39:28PM +0200, Thomas Monjalon wrote:
> > I was thinking to implement the library options parsing in DPDK.
> > But if the application implements its own options parsing without using
> > the DPDK one, yes the option parsing is obviously done in the application.
> > 
> > > I'd say, that would work, but I see inflexibility and some drawbacks:
> > > 
> > > - I would assume "--pciopt" has the input style of
> > > 
> > >   "domain:bus:devid:func,option1,option2,..."
> > > 
> > >   It then looks hard to me to use it: I need figure out the
> > >   pci id first.
> > 
> > What do you suggest instead of PCI id?
> 
> That might depend on what you need: if you want to configure a specific
> device, yes, PCI is needed (or even, a must). In such case, the --pciopt
> or the side effect of --pci-whitelist could help. I confess this would
> be helpful in some cases.
> 
> I guess there is another need is that we might want to pass an option
> to a driver, say ixgbe, that would work for all devices operated by it.
> In such case, we could use driver name (see the example below).
> 
> > > - For the libraries, we have to write code to add new options for
> > >   each applictions. With the generic option, user just need use it;
> > >   don't need write a single line of code, which could save user's
> > >   effort. It also gives user an united interface.
> > 
> > Applications can leverage the DPDK options parsing (without writing
> > any new line of code).
> > 
> > >   And to make it clear, as stated, I don't object to having an API.
> > >   I mean, the generic option gives us a chance to do the right
> > >   configuration at startup time: it would still invoke the right
> > >   API to do the right thing in the end.
> > 
> > I agree. I just want to split your --extra-option proposal in few options
> > with a bit more context.
> 
> Firstly, the name "--extra-option" might not be well taken :(
> I just want to show the idea first.
> 
> Secondly, splitting it to more options would result to quite many
> options introduced; it's also hard to list all of them. User intend
> to get lost if so many options provided. It also introduces more
> chaos, especially when we are going to add one option for each
> library.
> 
> For the context issue, I guess we could address it by adding a prefix.
> Such as,
> 
> --extra-options (or --whatever) "vhost.owner=kvm:kvm virtio.force_legacy
>  mempool.handler=xxx"
> 
> Based on that, we could introduce other sematics, to let it be:
> 
> driver.opt | driver.pci_id.opt
> 
> Where,
> 
> - driver.opt
>   applies to all devices operated by this driver
> 
> - driver.pci_id.opt
>   applies only to a specific device with the same pci ID.
> 
> This could save us changing the --pci-whitelist sematic, yet it saves
> us introducing a new option (--pciopt).
> 
> What do you think of it?

I like the idea :)
One important benefit of having only one option is that it is easy for
applications to rename it (e.g. to --dpdk-options) and pass the string to the
DPDK parsing function.
I think we must allow several occurrences of this new option on the CLI.

At the end, the main issue is to find a shiny name for this option ;)



[dpdk-dev] [PATCH v5 2/8] virtio: clean up virtio_dev_queue_setup

2016-06-01 Thread Yuanhan Liu
On Wed, Jun 01, 2016 at 07:44:33AM +, Tan, Jianfeng wrote:
> Hi Yuanhan,
> 
> > -Original Message-
> > From: Yuanhan Liu [mailto:yuanhan.liu at linux.intel.com]
> > Sent: Wednesday, June 1, 2016 3:38 PM
> > To: Tan, Jianfeng
> > Cc: dev at dpdk.org; Xie, Huawei; rich.lane at bigswitch.com; mst at 
> > redhat.com;
> > nakajima.yoshihiro at lab.ntt.co.jp; p.fedin at samsung.com;
> > ann.zhuangyanying at huawei.com; mukawa at igel.co.jp;
> > nhorman at tuxdriver.com
> > Subject: Re: [PATCH v5 2/8] virtio: clean up virtio_dev_queue_setup
> > 
> > On Mon, May 30, 2016 at 10:55:33AM +, Jianfeng Tan wrote:
> > > Abstract vring hdr desc init as an inline method.
> > 
> > What's this patch for then? In your last version, it will be invoked
> > twice, but it turned out to be wrong. So why keep this change?
> > I don't see that it improves anything.
> > 
> 
> Yes, for now only one version is kept, because the position where this
> function is called has changed. And I think this segment of code serves a
> specific purpose, which can be abstracted into a function; does that make sense?

Yeah, maybe. But ideally, we should move it to the tx_queue_setup() function.
Let's see what we might do after applying Huawei's rx/tx split patch: that
will take a while (I saw bugs).

--yliu


[dpdk-dev] [PATCH] ivshmem: document a potential segmentation fault in rte_ring

2016-06-01 Thread Anatoly Burakov
Commit 4768c475 added a pointer to the memzone in rte_ring. However,
all memzones reside in the local mem_config, therefore accessing
the memzone pointer inside the guest in an IVSHMEM-shared rte_ring
will cause a segmentation fault. This issue is unlikely to ever get
fixed, as this would require lots of changes for very little benefit,
therefore we're documenting this limitation instead.

Signed-off-by: Anatoly Burakov 
---
 doc/guides/prog_guide/ivshmem_lib.rst | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/doc/guides/prog_guide/ivshmem_lib.rst 
b/doc/guides/prog_guide/ivshmem_lib.rst
index 9401ccf..b8a32e4 100644
--- a/doc/guides/prog_guide/ivshmem_lib.rst
+++ b/doc/guides/prog_guide/ivshmem_lib.rst
@@ -79,6 +79,8 @@ The following is a simple guide to using the IVSHMEM Library 
API:
 Only data structures fully residing in DPDK hugepage memory work correctly.
 Supported data structures created by malloc(), mmap()
 or otherwise using non-DPDK memory cause undefined behavior and even a 
segmentation fault.
+Specifically, because the memzone field in an rte_ring refers to a memzone 
structure residing in local memory,
+accessing the memzone field in a shared rte_ring will cause an immediate 
segmentation fault.

 IVSHMEM Environment Configuration
 -
-- 
2.5.0
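
To make the documented limitation concrete, a small sketch of the access
pattern that faults (rte_ring_lookup() and the struct fields are real DPDK
APIs of this period; the ring name is made up):

#include <stdio.h>
#include <rte_ring.h>

/* Inside the IVSHMEM guest: the ring itself lives in shared hugepage
 * memory, but r->memzone points into the host's local mem_config, which
 * is not mapped in the guest. */
static void
show_shared_ring(void)
{
	const struct rte_ring *r = rte_ring_lookup("shared_ring");

	if (r == NULL)
		return;
	printf("ring size: %u\n", r->size);		/* OK: shared memory */
	printf("mz length: %zu\n", r->memzone->len);	/* segfaults in the guest */
}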



[dpdk-dev] [PATCH v5 2/8] virtio: clean up virtio_dev_queue_setup

2016-06-01 Thread Yuanhan Liu
On Mon, May 30, 2016 at 10:55:33AM +, Jianfeng Tan wrote:
> Abstract vring hdr desc init as an inline method.

What's this patch for then? In your last version, it will be invoked
twice, but it turned out to be wrong. So why keep this change?
I don't see that it improves anything.

--yliu


[dpdk-dev] [PATCH v3] virtio: split virtio rx/tx queue

2016-06-01 Thread Yuanhan Liu
On Mon, May 30, 2016 at 05:06:20PM +0800, Huawei Xie wrote:
> We keep a common vq structure, containing only vq related fields,
> and then split others into RX, TX and control queue respectively.
> 
> Signed-off-by: Huawei Xie 
> ---
> v2:
> - don't split virtio_dev_rx/tx_queue_setup
> v3:
> - fix some 80 char warnings
> - fix other newer version checkpatch warnings
> - remove '\n' in PMD_RX_LOG

Weird, I still saw them.

Besides that, I found a crash issue with this patch applied. You could
easily reproduce it by:

testpmd> start tx_first
testpmd> quit

[82774.055297] testpmd[9661]: segfault at 94 ip 004f7ef0 sp 7ffcb1fa66c0 error 4 in testpmd[40+25b000]
./t.pmd: line 11:  9661 Segmentation fault   (core dumped) $RTE_SDK/$RTE_TARGET/app/testpmd -c 0x1f -n 4 -- --nb-cores=4 -i --disable-hw-vlan --txqflags 0xf00 --rxq=$nr_queues --txq=$nr_queues --rxd=256 --txd=256

--yliu


[dpdk-dev] [PATCH] ivshmem: add all memzones of mempool to metada

2016-06-01 Thread Burakov, Anatoly
> From: Yigit, Ferruh
> Sent: Wednesday, June 1, 2016 2:18 PM
> To: dev at dpdk.org
> Cc: Burakov, Anatoly ; Olivier Matz
> ; Yigit, Ferruh 
> Subject: [PATCH] ivshmem: add all memzones of mempool to metada
> 
> A mempool consists of multiple memzones, at least two of them.
> ivshmem assumes the mempool and its elements are all in the same memzone.
> 
> Update the code to add all memzones when a mempool is added.
> 
> Fixes: d1d914ebbc25 ("mempool: allocate in several memory chunks by
> default")
> 
> Signed-off-by: Ferruh Yigit 
> ---
>  lib/librte_ivshmem/rte_ivshmem.c | 30 ++
>  1 file changed, 22 insertions(+), 8 deletions(-)
> 
> diff --git a/lib/librte_ivshmem/rte_ivshmem.c
> b/lib/librte_ivshmem/rte_ivshmem.c
> index c8b332c..5c83920 100644
> --- a/lib/librte_ivshmem/rte_ivshmem.c
> +++ b/lib/librte_ivshmem/rte_ivshmem.c
> @@ -548,25 +548,39 @@ add_ring_to_metadata(const struct rte_ring * r,
>  }
> 
>  static int
> -add_mempool_to_metadata(const struct rte_mempool * mp,
> - struct ivshmem_config * config)
> +add_mempool_memzone_to_metadata(const void *addr,
> + struct ivshmem_config *config)
>  {
> - struct rte_memzone * mz;
> - int ret;
> + struct rte_memzone *mz;
> 
> - mz = get_memzone_by_addr(mp);
> - ret = 0;
> + mz = get_memzone_by_addr(addr);
> 
>   if (!mz) {
>   RTE_LOG(ERR, EAL, "Cannot find memzone for mempool!\n");
>   return -1;
>   }
> 
> - /* mempool consists of memzone and ring */
> - ret = add_memzone_to_metadata(mz, config);
> + return add_memzone_to_metadata(mz, config);
> +}
> +
> +static int
> +add_mempool_to_metadata(const struct rte_mempool *mp,
> + struct ivshmem_config *config)
> +{
> + struct rte_mempool_memhdr *memhdr;
> + int ret;
> +
> + ret = add_mempool_memzone_to_metadata(mp, config);
>   if (ret < 0)
>   return -1;
> 
> + STAILQ_FOREACH(memhdr, >mem_list, next) {
> + ret = add_mempool_memzone_to_metadata(memhdr->addr, config);
> + if (ret < 0)
> + return -1;
> + }
> +
> + /* mempool consists of memzone and ring */
>   return add_ring_to_metadata(mp->ring, config);
>  }
> 
> --
> 2.5.5

Acked-by: Anatoly  Burakov 


[dpdk-dev] [RFC] kernel paramters like DPDK CLI options

2016-06-01 Thread Wiles, Keith
Not to hijack this thread, I created another one; please have a look, thanks.
http://dpdk.org/ml/archives/dev/2016-June/040079.html

Regards,
Keith

-Original Message-
From: dev  on behalf of Thomas Monjalon 

Date: Wednesday, June 1, 2016 at 9:03 AM
To: Yuanhan Liu 
Cc: "dev at dpdk.org" , Bruce Richardson , "Tan, Jianfeng" , Stephen Hemminger 
, Christian Ehrhardt , Panu Matilainen , Olivier Matz 

Subject: Re: [dpdk-dev] [RFC] kernel paramters like DPDK CLI options

>2016-06-01 21:19, Yuanhan Liu:
>> On Wed, Jun 01, 2016 at 02:39:28PM +0200, Thomas Monjalon wrote:
>> > I was thinking to implement the library options parsing in DPDK.
>> > But if the application implements its own options parsing without using
>> > the DPDK one, yes the option parsing is obviously done in the application.
>> > 
>> > > I'd say, that would work, but I see inflexibility and some drawbacks:
>> > > 
>> > > - I would assume "--pciopt" has the input style of
>> > > 
>> > >   "domain:bus:devid:func,option1,option2,..."
>> > > 
>> > >   It then looks hard to me to use it: I need figure out the
>> > >   pci id first.
>> > 
>> > What do you suggest instead of PCI id?
>> 




[dpdk-dev] [PATCH v2 6/6] ip_pipeline: update release notes

2016-06-01 Thread Jasvinder Singh
Signed-off-by: Jasvinder Singh 
Acked-by: Cristian Dumitrescu 
---
 doc/guides/rel_notes/release_16_07.rst | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/doc/guides/rel_notes/release_16_07.rst 
b/doc/guides/rel_notes/release_16_07.rst
index 30e78d4..d8215bc 100644
--- a/doc/guides/rel_notes/release_16_07.rst
+++ b/doc/guides/rel_notes/release_16_07.rst
@@ -47,6 +47,16 @@ New Features
   * Dropped specific Xen Dom0 code.
   * Dropped specific anonymous mempool code in testpmd.

+* **IP Pipeline Application.**
+
+  The following features have been added to ip_pipeline application;
+
+  * Configure the MAC address in the routing pipeline and automatic routes
+updates with change in link state.
+  * Enable RSS per network interface through the ip pipeline application
+configuration file.
+  * Streamline the CLI code of the IP pipeline application.
+

 Resolved Issues
 ---
-- 
2.5.5



[dpdk-dev] [PATCH v2 5/6] ip_pipeline: sample config file on adding various network layer components

2016-06-01 Thread Jasvinder Singh
The sample configuration file demonstrates that network layer components such
as TCP, UDP, ICMP, etc. can be easily integrated into the ip_pipeline
infrastructure. Similarly, various other functionalities, such as IP Reassembly
for input traffic with a local destination and IP Fragmentation to enforce the
MTU for the routed output traffic, can be added using SWQs enabled with the
reassembly and fragmentation features.

Signed-off-by: Jasvinder Singh 
Acked-by: Cristian Dumitrescu 
---
 examples/ip_pipeline/config/network_layers.cfg | 223 +
 examples/ip_pipeline/config/network_layers.sh  |  79 +
 2 files changed, 302 insertions(+)
 create mode 100644 examples/ip_pipeline/config/network_layers.cfg
 create mode 100644 examples/ip_pipeline/config/network_layers.sh

diff --git a/examples/ip_pipeline/config/network_layers.cfg 
b/examples/ip_pipeline/config/network_layers.cfg
new file mode 100644
index 000..8054d9f
--- /dev/null
+++ b/examples/ip_pipeline/config/network_layers.cfg
@@ -0,0 +1,223 @@
+;   BSD LICENSE
+;
+;   Copyright(c) 2016 Intel Corporation. All rights reserved.
+;   All rights reserved.
+;
+;   Redistribution and use in source and binary forms, with or without
+;   modification, are permitted provided that the following conditions
+;   are met:
+;
+; * Redistributions of source code must retain the above copyright
+;   notice, this list of conditions and the following disclaimer.
+; * Redistributions in binary form must reproduce the above copyright
+;   notice, this list of conditions and the following disclaimer in
+;   the documentation and/or other materials provided with the
+;   distribution.
+; * Neither the name of Intel Corporation nor the names of its
+;   contributors may be used to endorse or promote products derived
+;   from this software without specific prior written permission.
+;
+;   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+;   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+;   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+;   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+;   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+;   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+;   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+;   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+;   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+;   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+;   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+; The diagram below shows how additional protocol components can be plugged 
into
+; the IP layer implemented by the ip_pipeline application. Pick your favorite
+; open source components for dynamic ARP, ICMP, UDP or TCP termination, etc and
+; connect them through SWQs to the IP infrastructure.
+;
+; The input packets with local destination are sent to the UDP/TCP applications
+; while the input packets with remote destination are routed back to the
+; network. Additional features can easily be added to this setup:
+;  * IP Reassembly: add SWQs with IP reassembly enabled (typically required for
+;the input traffic with local destination);
+;  * IP Fragmentation: add SWQs with IP fragmentation enabled (typically
+;required to enforce the MTU for the routed output traffic);
+;  * Traffic Metering: add Flow Action pipeline instances (e.g. for metering 
the
+;TCP connections or ICMP input traffic);
+;  * Traffic Management: add TMs for the required output LINKs;
+;  * Protocol encapsulations (QinQ, MPLS) for the output packets: part of the
+;routing pipeline configuration.
+;
+; [ASCII diagram garbled and truncated in this archive. The recoverable part
+;  showed the UDP App and TCP App boxes on top, connected to the UDP and TCP
+;  pipelines below them through SWQ0/SWQ1 (UDP/TCP TX) and SWQ2/SWQ3
+;  (UDP/TCP RX); the remainder of the diagram is cut off.]

[dpdk-dev] [PATCH v2 4/6] ip_pipeline: automatic routes update with the change in nic ports state

2016-06-01 Thread Jasvinder Singh
The routing pipeline registers a callback function with the NIC ports, and
this function is invoked to update the routing table entries (corresponding
to the local host and directly attached network) whenever the NIC ports
change their state (up/down).

Signed-off-by: Jasvinder Singh 
Acked-by: Cristian Dumitrescu 
---
 examples/ip_pipeline/app.h |  21 ++-
 examples/ip_pipeline/pipeline/pipeline_common_fe.c |  52 
 examples/ip_pipeline/pipeline/pipeline_common_fe.h |   7 +
 examples/ip_pipeline/pipeline/pipeline_routing.c   | 146 -
 .../ip_pipeline/pipeline/pipeline_routing_be.c |  19 +++
 .../ip_pipeline/pipeline/pipeline_routing_be.h |   1 +
 6 files changed, 241 insertions(+), 5 deletions(-)

diff --git a/examples/ip_pipeline/app.h b/examples/ip_pipeline/app.h
index 93e96d0..848244a 100644
--- a/examples/ip_pipeline/app.h
+++ b/examples/ip_pipeline/app.h
@@ -241,6 +241,22 @@ struct app_pipeline_params {
uint32_t n_args;
 };

+struct app_params;
+
+typedef void (*app_link_op)(struct app_params *app,
+   uint32_t link_id,
+   uint32_t up,
+   void *arg);
+
+#ifndef APP_MAX_PIPELINES
+#define APP_MAX_PIPELINES64
+#endif
+
+struct app_link_data {
+   app_link_op f_link[APP_MAX_PIPELINES];
+   void *arg[APP_MAX_PIPELINES];
+};
+
 struct app_pipeline_data {
void *be;
void *fe;
@@ -416,10 +432,6 @@ struct app_eal_params {
 #define APP_MAX_MSGQ 256
 #endif

-#ifndef APP_MAX_PIPELINES
-#define APP_MAX_PIPELINES64
-#endif
-
 #ifndef APP_EAL_ARGC
 #define APP_EAL_ARGC 64
 #endif
@@ -480,6 +492,7 @@ struct app_params {
struct cpu_core_map *core_map;
uint64_t core_mask;
struct rte_mempool *mempool[APP_MAX_MEMPOOLS];
+   struct app_link_data link_data[APP_MAX_LINKS];
struct rte_ring *swq[APP_MAX_PKTQ_SWQ];
struct rte_sched_port *tm[APP_MAX_PKTQ_TM];
struct rte_ring *msgq[APP_MAX_MSGQ];
diff --git a/examples/ip_pipeline/pipeline/pipeline_common_fe.c 
b/examples/ip_pipeline/pipeline/pipeline_common_fe.c
index fa0a482..70c57e4 100644
--- a/examples/ip_pipeline/pipeline/pipeline_common_fe.c
+++ b/examples/ip_pipeline/pipeline/pipeline_common_fe.c
@@ -419,6 +419,40 @@ app_pipeline_port_in_disable(struct app_params *app,
 }

 int
+app_link_set_op(struct app_params *app,
+   uint32_t link_id,
+   uint32_t pipeline_id,
+   app_link_op op,
+   void *arg)
+{
+   struct app_pipeline_params *pp;
+   struct app_link_params *lp;
+   struct app_link_data *ld;
+   uint32_t ppos, lpos;
+
+   /* Check input arguments */
+   if ((app == NULL) ||
+   (op == NULL))
+   return -1;
+
+   APP_PARAM_FIND_BY_ID(app->link_params, "LINK", link_id, lp);
+   if (lp == NULL)
+   return -1;
+   lpos = lp - app->link_params;
+   ld = >link_data[lpos];
+
+   APP_PARAM_FIND_BY_ID(app->pipeline_params, "PIPELINE", pipeline_id, pp);
+   if (pp == NULL)
+   return -1;
+   ppos = pp - app->pipeline_params;
+
+   ld->f_link[ppos] = op;
+   ld->arg[ppos] = arg;
+
+   return 0;
+}
+
+int
 app_link_config(struct app_params *app,
uint32_t link_id,
uint32_t ip,
@@ -489,6 +523,8 @@ app_link_up(struct app_params *app,
uint32_t link_id)
 {
struct app_link_params *p;
+   struct app_link_data *d;
+   int i;

/* Check input arguments */
if (app == NULL)
@@ -501,6 +537,8 @@ app_link_up(struct app_params *app,
return -1;
}

+   d = >link_data[p - app->link_params];
+
/* Check link state */
if (p->state) {
APP_LOG(app, HIGH, "%s is already UP", p->name);
@@ -515,6 +553,11 @@ app_link_up(struct app_params *app,

app_link_up_internal(app, p);

+   /* Callbacks */
+   for (i = 0; i < APP_MAX_PIPELINES; i++)
+   if (d->f_link[i])
+   d->f_link[i](app, link_id, 1, d->arg[i]);
+
return 0;
 }

@@ -523,6 +566,8 @@ app_link_down(struct app_params *app,
uint32_t link_id)
 {
struct app_link_params *p;
+   struct app_link_data *d;
+   uint32_t i;

/* Check input arguments */
if (app == NULL)
@@ -535,6 +580,8 @@ app_link_down(struct app_params *app,
return -1;
}

+   d = >link_data[p - app->link_params];
+
/* Check link state */
if (p->state == 0) {
APP_LOG(app, HIGH, "%s is already DOWN", p->name);
@@ -543,6 +590,11 @@ app_link_down(struct app_params *app,

app_link_down_internal(app, p);

+   /* Callbacks */
+   for (i = 0; i < APP_MAX_PIPELINES; i++)
+   if (d->f_link[i])
+   d->f_link[i](app, link_id, 0, d->arg[i]);
+
return 0;
 }

diff --git 

[dpdk-dev] [PATCH v2 3/6] ip_pipeline: assign nic ports mac address to the routing pipeline outports

2016-06-01 Thread Jasvinder Singh
As a result of tracking, the output ports of routing pipelines are linked with
physical NIC ports (potentially through other pipeline instances). Thus, the
MAC addresses of the NIC ports are assigned to the routing pipeline output
ports connected to them, and are further used in routing table entries instead
of hardcoded default values.

Signed-off-by: Jasvinder Singh 
Acked-by: Cristian Dumitrescu 
---
 examples/ip_pipeline/pipeline/pipeline_routing.c   | 89 --
 examples/ip_pipeline/pipeline/pipeline_routing.h   |  7 ++
 .../ip_pipeline/pipeline/pipeline_routing_be.c | 61 ++-
 .../ip_pipeline/pipeline/pipeline_routing_be.h | 15 
 4 files changed, 149 insertions(+), 23 deletions(-)

diff --git a/examples/ip_pipeline/pipeline/pipeline_routing.c 
b/examples/ip_pipeline/pipeline/pipeline_routing.c
index 4d074c1..80f029d 100644
--- a/examples/ip_pipeline/pipeline/pipeline_routing.c
+++ b/examples/ip_pipeline/pipeline/pipeline_routing.c
@@ -58,8 +58,11 @@ struct app_pipeline_routing_arp_entry {

 struct pipeline_routing {
/* Parameters */
+   struct app_params *app;
+   uint32_t pipeline_id;
uint32_t n_ports_in;
uint32_t n_ports_out;
+   struct pipeline_routing_params rp;

/* Routes */
TAILQ_HEAD(, app_pipeline_routing_route) routes;
@@ -79,11 +82,13 @@ struct pipeline_routing {
 };

 static void *
-pipeline_routing_init(struct pipeline_params *params,
-   __rte_unused void *arg)
+app_pipeline_routing_init(struct pipeline_params *params,
+   void *arg)
 {
+   struct app_params *app = (struct app_params *) arg;
struct pipeline_routing *p;
-   uint32_t size;
+   uint32_t pipeline_id, size;
+   int status;

/* Check input arguments */
if ((params == NULL) ||
@@ -91,6 +96,8 @@ pipeline_routing_init(struct pipeline_params *params,
(params->n_ports_out == 0))
return NULL;

+   APP_PARAM_GET_ID(params, "PIPELINE", pipeline_id);
+
/* Memory allocation */
size = RTE_CACHE_LINE_ROUNDUP(sizeof(struct pipeline_routing));
p = rte_zmalloc(NULL, size, RTE_CACHE_LINE_SIZE);
@@ -98,9 +105,16 @@ pipeline_routing_init(struct pipeline_params *params,
return NULL;

/* Initialization */
+   p->app = app;
+   p->pipeline_id = pipeline_id;
p->n_ports_in = params->n_ports_in;
p->n_ports_out = params->n_ports_out;

+   status = pipeline_routing_parse_args(>rp, params);
+   if (status) {
+   rte_free(p);
+   return NULL;
+   }
TAILQ_INIT(>routes);
p->n_routes = 0;

@@ -111,6 +125,18 @@ pipeline_routing_init(struct pipeline_params *params,
 }

 static int
+app_pipeline_routing_post_init(void *pipeline)
+{
+   struct pipeline_routing *p = pipeline;
+
+   /* Check input arguments */
+   if (p == NULL)
+   return -1;
+
+   return app_pipeline_routing_set_macaddr(p->app, p->pipeline_id);
+}
+
+static int
 app_pipeline_routing_free(void *pipeline)
 {
struct pipeline_routing *p = pipeline;
@@ -848,6 +874,59 @@ app_pipeline_routing_delete_default_arp_entry(struct 
app_params *app,
return 0;
 }

+int
+app_pipeline_routing_set_macaddr(struct app_params *app,
+   uint32_t pipeline_id)
+{
+   struct app_pipeline_params *p;
+   struct pipeline_routing_set_macaddr_msg_req *req;
+   struct pipeline_routing_set_macaddr_msg_rsp *rsp;
+   uint32_t port_id;
+
+   /* Check input arguments */
+   if (app == NULL)
+   return -EINVAL;
+
+   APP_PARAM_FIND_BY_ID(app->pipeline_params, "PIPELINE", pipeline_id, p);
+   if (p == NULL)
+   return -EINVAL;
+
+   /* Allocate and write request */
+   req = app_msg_alloc(app);
+   if (req == NULL)
+   return -ENOMEM;
+
+   req->type = PIPELINE_MSG_REQ_CUSTOM;
+   req->subtype = PIPELINE_ROUTING_MSG_REQ_SET_MACADDR;
+
+   memset(req->macaddr, 0, sizeof(req->macaddr));
+   for (port_id = 0; port_id < p->n_pktq_out; port_id++) {
+   struct app_link_params *link;
+
+   link = app_pipeline_track_pktq_out_to_link(app,
+   pipeline_id,
+   port_id);
+   if (link)
+   req->macaddr[port_id] = link->mac_addr;
+   }
+
+   /* Send request and wait for response */
+   rsp = app_msg_send_recv(app, pipeline_id, req, MSG_TIMEOUT_DEFAULT);
+   if (rsp == NULL)
+   return -ETIMEDOUT;
+
+   /* Read response and write entry */
+   if (rsp->status) {
+   app_msg_free(app, rsp);
+   return rsp->status;
+   }
+
+   /* Free response */
+   app_msg_free(app, rsp);
+
+   return 0;
+}
+
 /*
  * route
  *
@@ -1385,8 +1464,8 @@ static cmdline_parse_ctx_t pipeline_cmds[] = {
 };

 static struct pipeline_fe_ops pipeline_routing_fe_ops = {
- 

[dpdk-dev] [PATCH v2 2/6] ip_pipeline: linking routing pipeline output ports with NIC ports

2016-06-01 Thread Jasvinder Singh
This commit implements a tracking mechanism for linking routing pipeline
output ports to physical NIC ports.

Once all the pipelines of the application are initialised, the mechanism is
invoked during the post-initialisation phase to relate the routing pipeline
outputs with NIC ports by navigating through the intermediate pipelines,
if present.

The tracking functions of the pipelines, which help in navigating through
the intermediate pipelines, are moved from pipeline_<type>_be.c
to pipeline_<type>.c. All pipelines except the passthrough pipelines
use the default tracking function (pipeline/pipeline_common_fe.c).

Signed-off-by: Jasvinder Singh 
Acked-by: Cristian Dumitrescu 
---
 examples/ip_pipeline/app.h | 151 -
 examples/ip_pipeline/init.c|  32 -
 examples/ip_pipeline/pipeline.h|  11 +-
 examples/ip_pipeline/pipeline/pipeline_common_fe.c | 109 +++
 examples/ip_pipeline/pipeline/pipeline_common_fe.h |  10 ++
 examples/ip_pipeline/pipeline/pipeline_firewall.c  |   2 +
 .../ip_pipeline/pipeline/pipeline_firewall_be.c|  22 ---
 .../ip_pipeline/pipeline/pipeline_flow_actions.c   |   2 +
 .../pipeline/pipeline_flow_actions_be.c|  22 ---
 .../pipeline/pipeline_flow_classification.c|   2 +
 .../pipeline/pipeline_flow_classification_be.c |  22 ---
 examples/ip_pipeline/pipeline/pipeline_master.c|   2 +
 examples/ip_pipeline/pipeline/pipeline_master_be.c |  13 +-
 .../ip_pipeline/pipeline/pipeline_passthrough.c|  27 
 .../ip_pipeline/pipeline/pipeline_passthrough_be.c |  39 +++---
 examples/ip_pipeline/pipeline/pipeline_routing.c   |   6 +-
 .../ip_pipeline/pipeline/pipeline_routing_be.c |  22 ---
 examples/ip_pipeline/pipeline_be.h |  10 +-
 18 files changed, 380 insertions(+), 124 deletions(-)

diff --git a/examples/ip_pipeline/app.h b/examples/ip_pipeline/app.h
index dfdfb51..93e96d0 100644
--- a/examples/ip_pipeline/app.h
+++ b/examples/ip_pipeline/app.h
@@ -205,9 +205,7 @@ struct app_pktq_out_params {
uint32_t id; /* Position in the appropriate app array */
 };

-#ifndef APP_PIPELINE_TYPE_SIZE
-#define APP_PIPELINE_TYPE_SIZE   64
-#endif
+#define APP_PIPELINE_TYPE_SIZE   PIPELINE_TYPE_SIZE

 #define APP_MAX_PIPELINE_PKTQ_IN PIPELINE_MAX_PORT_IN
 #define APP_MAX_PIPELINE_PKTQ_OUTPIPELINE_MAX_PORT_OUT
@@ -651,6 +649,41 @@ app_swq_get_readers(struct app_params *app, struct 
app_pktq_swq_params *swq)
return n_readers;
 }

+static inline struct app_pipeline_params *
+app_swq_get_reader(struct app_params *app,
+   struct app_pktq_swq_params *swq,
+   uint32_t *pktq_in_id)
+{
+   struct app_pipeline_params *reader;
+   uint32_t pos = swq - app->swq_params;
+   uint32_t n_pipelines = RTE_MIN(app->n_pipelines,
+   RTE_DIM(app->pipeline_params));
+   uint32_t n_readers = 0, id, i;
+
+   for (i = 0; i < n_pipelines; i++) {
+   struct app_pipeline_params *p = >pipeline_params[i];
+   uint32_t n_pktq_in = RTE_MIN(p->n_pktq_in, RTE_DIM(p->pktq_in));
+   uint32_t j;
+
+   for (j = 0; j < n_pktq_in; j++) {
+   struct app_pktq_in_params *pktq = >pktq_in[j];
+
+   if ((pktq->type == APP_PKTQ_IN_SWQ) &&
+   (pktq->id == pos)) {
+   n_readers++;
+   reader = p;
+   id = j;
+   }
+   }
+   }
+
+   if (n_readers != 1)
+   return NULL;
+
+   *pktq_in_id = id;
+   return reader;
+}
+
 static inline uint32_t
 app_tm_get_readers(struct app_params *app, struct app_pktq_tm_params *tm)
 {
@@ -676,6 +709,41 @@ app_tm_get_readers(struct app_params *app, struct 
app_pktq_tm_params *tm)
return n_readers;
 }

+static inline struct app_pipeline_params *
+app_tm_get_reader(struct app_params *app,
+   struct app_pktq_tm_params *tm,
+   uint32_t *pktq_in_id)
+{
+   struct app_pipeline_params *reader;
+   uint32_t pos = tm - app->tm_params;
+   uint32_t n_pipelines = RTE_MIN(app->n_pipelines,
+   RTE_DIM(app->pipeline_params));
+   uint32_t n_readers = 0, id, i;
+
+   for (i = 0; i < n_pipelines; i++) {
+   struct app_pipeline_params *p = >pipeline_params[i];
+   uint32_t n_pktq_in = RTE_MIN(p->n_pktq_in, RTE_DIM(p->pktq_in));
+   uint32_t j;
+
+   for (j = 0; j < n_pktq_in; j++) {
+   struct app_pktq_in_params *pktq = >pktq_in[j];
+
+   if ((pktq->type == APP_PKTQ_IN_TM) &&
+   (pktq->id == pos)) {
+   n_readers++;
+   reader = p;
+   id = j;
+   }
+   }
+   }
+
+ 

[dpdk-dev] [PATCH v2 1/6] ip_pipeline: increase macros values

2016-06-01 Thread Jasvinder Singh
To allow more queues, pipeline types, threads, source/sink ports, etc. in
the ip_pipeline application, larger macro values are set.

Signed-off-by: Jasvinder Singh 
Acked-by: Cristian Dumitrescu 
---
 examples/ip_pipeline/app.h | 12 ++--
 examples/ip_pipeline/pipeline_be.h |  8 
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/examples/ip_pipeline/app.h b/examples/ip_pipeline/app.h
index 976fbd0..dfdfb51 100644
--- a/examples/ip_pipeline/app.h
+++ b/examples/ip_pipeline/app.h
@@ -52,11 +52,11 @@
 #define APP_LINK_PCI_BDF_SIZE16

 #ifndef APP_LINK_MAX_HWQ_IN
-#define APP_LINK_MAX_HWQ_IN  64
+#define APP_LINK_MAX_HWQ_IN  128
 #endif

 #ifndef APP_LINK_MAX_HWQ_OUT
-#define APP_LINK_MAX_HWQ_OUT 64
+#define APP_LINK_MAX_HWQ_OUT 128
 #endif

 struct app_mempool_params {
@@ -261,7 +261,7 @@ struct app_thread_pipeline_data {
 };

 #ifndef APP_MAX_THREAD_PIPELINES
-#define APP_MAX_THREAD_PIPELINES 16
+#define APP_MAX_THREAD_PIPELINES 64
 #endif

 #ifndef APP_THREAD_TIMER_PERIOD
@@ -407,15 +407,15 @@ struct app_eal_params {
 #define APP_MAX_PKTQ_TM  APP_MAX_LINKS

 #ifndef APP_MAX_PKTQ_SOURCE
-#define APP_MAX_PKTQ_SOURCE  16
+#define APP_MAX_PKTQ_SOURCE  64
 #endif

 #ifndef APP_MAX_PKTQ_SINK
-#define APP_MAX_PKTQ_SINK16
+#define APP_MAX_PKTQ_SINK64
 #endif

 #ifndef APP_MAX_MSGQ
-#define APP_MAX_MSGQ 64
+#define APP_MAX_MSGQ 256
 #endif

 #ifndef APP_MAX_PIPELINES
diff --git a/examples/ip_pipeline/pipeline_be.h 
b/examples/ip_pipeline/pipeline_be.h
index f4ff262..8f858ed 100644
--- a/examples/ip_pipeline/pipeline_be.h
+++ b/examples/ip_pipeline/pipeline_be.h
@@ -200,15 +200,15 @@ pipeline_port_out_params_get_ops(struct 
pipeline_port_out_params  *p)
 }

 #ifndef PIPELINE_NAME_SIZE
-#define PIPELINE_NAME_SIZE   32
+#define PIPELINE_NAME_SIZE   64
 #endif

 #ifndef PIPELINE_MAX_PORT_IN
-#define PIPELINE_MAX_PORT_IN 16
+#define PIPELINE_MAX_PORT_IN 64
 #endif

 #ifndef PIPELINE_MAX_PORT_OUT
-#define PIPELINE_MAX_PORT_OUT16
+#define PIPELINE_MAX_PORT_OUT64
 #endif

 #ifndef PIPELINE_MAX_TABLES
@@ -224,7 +224,7 @@ pipeline_port_out_params_get_ops(struct 
pipeline_port_out_params  *p)
 #endif

 #ifndef PIPELINE_MAX_ARGS
-#define PIPELINE_MAX_ARGS32
+#define PIPELINE_MAX_ARGS64
 #endif

 struct pipeline_params {
-- 
2.5.5



[dpdk-dev] [PATCH v2 0/6] ip_pipeline: routing pipeline improvements

2016-06-01 Thread Jasvinder Singh
This patch set adds the following features to the routing pipeline:

1. Implements the tracking mechanism for the routing pipeline, identifying
the physical NIC port to which a specific output port of the routing pipeline
is eventually connected. Depending upon the application, tracking could involve
traversing other intermediate pipelines.

Pipelines such as pass-through allow tracking to be performed through
them because of the straightforward connections between their input and output
ports, while pipelines like flow classification and firewall fail the tracking
because of their dependency upon the table rule set. As a result of the tracking
mechanism, the routing pipeline uses the real MAC addresses of the network
interfaces instead of hardcoded default values.

2. Adds support for automatic route updates when physical NIC ports
change their state (up/down) or configuration. Every time a physical port
goes up/down, a callback function that the specific pipeline type
(e.g. routing) registered with the NIC ports at init time will simply add/delete
a route associated with that output port.
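
To make point 2 concrete, a minimal sketch of hooking the link-state callback
interface introduced in patch 4/6 (the app_link_op typedef and
app_link_set_op() come from that patch; the callback body and this framing are
illustrative only):

#include <stdio.h>
#include "app.h"                    /* struct app_params, app_link_op */
#include "pipeline_common_fe.h"     /* app_link_set_op() */

/* Invoked by app_link_up()/app_link_down() for every registered pipeline. */
static void
routing_link_op(struct app_params *app, uint32_t link_id, uint32_t up,
		void *arg)
{
	/* arg is whatever persistent context the pipeline registered,
	 * e.g. its front-end struct; unused in this sketch. */
	(void)app;
	(void)arg;

	/* A routing front end would add the implicit routes (local host and
	 * directly attached network) on link up and delete them on link
	 * down; a log line stands in for that logic here. */
	printf("LINK%u is %s\n", link_id, up ? "up" : "down");
}

/* Registration, typically done once from the pipeline's post-init hook,
 * passing its persistent front-end context as the last argument:
 *
 *	app_link_set_op(app, link_id, pipeline_id, routing_link_op, p);
 */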

v2
- rebased on top of "ip_pipeline: add rss support" patch
  http://dpdk.org/dev/patchwork/patch/13110/
- split the single patch into multiple patches of the patchset. 
- add post init function for tracking and linking physical nic with 
  routing pipeline output port
- create default tracking function "app_pipeline_track_default()" in
  pipeline_common_fe.c for all the pipelines except pass-through
- fix port in/out connections in passthrough pipeline tracking function
  and move that function to pipeline/pipeline_passthrough.c
- fix mac address endianess issue in routing pipeline table
- add/remove implicit routes (for host and local network) when corresponding
  link is brought up/down
- update the release notes

Acked-by: Cristian Dumitrescu 

Jasvinder Singh (6):
  ip_pipeline: increase macros values
  ip_pipeline: linking routing pipeline output ports with NIC ports
  ip_pipeline: assign nic ports mac address to the routing pipeline
outports
  ip_pipeline: automatic routes update with the change in nic ports
state
  ip_pipeline: sample config file on adding various network layer
components
  ip_pipeline: update release notes

 doc/guides/rel_notes/release_16_07.rst |  10 +
 examples/ip_pipeline/app.h | 184 ++--
 examples/ip_pipeline/config/network_layers.cfg | 223 +++
 examples/ip_pipeline/config/network_layers.sh  |  79 +++
 examples/ip_pipeline/init.c|  32 ++-
 examples/ip_pipeline/pipeline.h|  11 +-
 examples/ip_pipeline/pipeline/pipeline_common_fe.c | 161 ++
 examples/ip_pipeline/pipeline/pipeline_common_fe.h |  17 ++
 examples/ip_pipeline/pipeline/pipeline_firewall.c  |   2 +
 .../ip_pipeline/pipeline/pipeline_firewall_be.c|  22 --
 .../ip_pipeline/pipeline/pipeline_flow_actions.c   |   2 +
 .../pipeline/pipeline_flow_actions_be.c|  22 --
 .../pipeline/pipeline_flow_classification.c|   2 +
 .../pipeline/pipeline_flow_classification_be.c |  22 --
 examples/ip_pipeline/pipeline/pipeline_master.c|   2 +
 examples/ip_pipeline/pipeline/pipeline_master_be.c |  13 +-
 .../ip_pipeline/pipeline/pipeline_passthrough.c|  27 +++
 .../ip_pipeline/pipeline/pipeline_passthrough_be.c |  39 ++--
 examples/ip_pipeline/pipeline/pipeline_routing.c   | 239 -
 examples/ip_pipeline/pipeline/pipeline_routing.h   |   7 +
 .../ip_pipeline/pipeline/pipeline_routing_be.c | 102 +
 .../ip_pipeline/pipeline/pipeline_routing_be.h |  16 ++
 examples/ip_pipeline/pipeline_be.h |  18 +-
 23 files changed, 1091 insertions(+), 161 deletions(-)
 create mode 100644 examples/ip_pipeline/config/network_layers.cfg
 create mode 100644 examples/ip_pipeline/config/network_layers.sh

-- 
2.5.5



[dpdk-dev] [RFC] Yet another option for DPDK options

2016-06-01 Thread Wiles, Keith
Started from the link below, but did not want to highjack the thread.
http://dpdk.org/ml/archives/dev/2016-June/040021.html

I was thinking about this problem from a user perspective and command line 
options are very difficult to manage specifically when you have a large number 
of options as we have in dpdk. I see all of these options as a type of database 
of information for the DPDK and the application, because the application 
command line options are also getting very complex as well.

I have been looking at a number of different options here, and the direction I
was thinking of was using a file for the options and configurations, with the
data in a clean format. It could have been an INI file or JSON or XML, but they
all seem to have some problems I do not like. The INI file is too flat and I
wanted a hierarchy in the data, the JSON data is similar, and XML is just hard
to read. I wanted to be able to manage multiple applications and possibly the
system the DPDK/app runs on. The problem with the above formats is that they
are just data, and it is not easy to make decisions about the system and
applications at runtime.

If the "database" of information could be queried by the EAL, drivers and
application, then we do not need to try and create a complex command line. It
would be nice to execute a DPDK application like this:

./some_dpdk_app --config-file dpdk-config-filename

The dpdk-config-filename could contain a lot of information and be able to
start up multiple different applications. The dpdk-config-file could also
include other config files to complete the configuration. The format of the
data in the config file needs to be readable, but allow the user to put in new
options; it needs to be hierarchical in nature and have some simple functions
to execute if required.

The solution I was thinking of is that the file content is really just a
fragment of a scripting language, and the DPDK application contains the
scripting language interpreter. I was looking at using Lua (lua.org) as the
scripting language interpreter; it is small and easy to understand. Python and
the others are very big and require a lot of resources, while Lua requires very
few system resources. Also, I did not want to have to write a parser
(lex/yacc). The other nice feature of Lua is that you can create a sandbox for
the code to run in and limit the type of system resources and APIs that can be
accessed by the application and configuration. Lua can be trimmed down to a
fairly small size and builds on just about any system, or we can just install
Lua on the system without changes from an rpm or deb.

I use Lua in pktgen at this time, and the interface between "C" and Lua is very
simple and easy. Currently I include Lua in Pktgen, but I could have just used
a system library.

The data in the config file can be data statements along with some limited code
to make some data changes at run time without having to modify the real
application. Here is a simple config file I wrote. Some of the options do not
make sense to be in the file at the same time, but I wanted to see all of the
options. The mk_lcore_list() and mk_coremap() are just Lua functions we can
preload to help convert the simple strings into real data, in this case tables
of information. The application section could be something like
pktgen = { map = { ... }, more_options = 1, }; this allows the same file to
possibly contain many application configurations. Needs a bit more work.

dpdk_default = {
lcore_mask = 0xFF00,
lcore_list = mk_lcore_list("0-7", 10, "14-16"),
coremap = mk_coremap("(0-7)@0,10,(14-16)@1"),
master_lcore = 1,
log_level = 7,
ranks = 4,
channels = 2,
memory = 512,
socket_mem = { 512, 512 },
huge_dir = "/mnt/huge",
base_virtaddr = 0,
create_uio_dev = true,
vfio_intr = "legacy",
xen_dom0 = false,
proc_type = "auto",
pci_blacklist = {
"08:00.0",
"08:00.1",
"09:00.0",
"09:00.1",
"83:00.1",
"87:00.0",
"87:00.1",
"89:00.0",
"89:00.1"
},
pci_whitelist = {
},
vdev = {
eth_pcap0 = { iface = "eth2" },
eth_pcap1 = { iface = "eth3" },
},
driver = { },
syslog = true,
vmware_tsc_map = false,
file_prefix = "pg",
huge_unlink = true,
no_huge = false,
no_pci = false,
no_hpet = false,
no_shconf = false,
}

pktgen_data = {
   map = { ... },
   more-data = 1,
}

The EAL, driver, application, etc. would query an API to access the data, and
the application can change its options quickly without modifying the code.
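
A minimal sketch of what the C side of this could look like (illustration
only: the file name "dpdk-config.lua" is an assumption, plain luaL_dofile() is
used instead of a sandbox, and mk_lcore_list()/mk_coremap() would have to be
registered before loading the file; only the standard Lua C API is used):

#include <stdio.h>
#include <lua.h>
#include <lauxlib.h>
#include <lualib.h>

int main(void)
{
	lua_State *L = luaL_newstate();

	luaL_openlibs(L);                       /* or a restricted sandbox */
	if (luaL_dofile(L, "dpdk-config.lua")) {
		fprintf(stderr, "config error: %s\n", lua_tostring(L, -1));
		return 1;
	}

	lua_getglobal(L, "dpdk_default");       /* push the config table */
	lua_getfield(L, -1, "master_lcore");
	lua_getfield(L, -2, "huge_dir");

	printf("master_lcore=%lld huge_dir=%s\n",
	       (long long)lua_tointeger(L, -2), lua_tostring(L, -1));

	/* an application would now build EAL arguments or call setter APIs
	 * from these values instead of parsing a long command line */
	lua_close(L);
	return 0;
}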

Anyway comments are welcome.

Regards,
Keith







[dpdk-dev] [PATCH 1/3] vhost: pre update used ring for Tx and Rx

2016-06-01 Thread Yuanhan Liu
On Wed, Jun 01, 2016 at 06:40:41AM +, Xie, Huawei wrote:
> > /* Retrieve all of the head indexes first to avoid caching issues. */
> > for (i = 0; i < count; i++) {
> > -   desc_indexes[i] = vq->avail->ring[(vq->last_used_idx + i) &
> > -   (vq->size - 1)];
> > +   used_idx = (vq->last_used_idx + i) & (vq->size - 1);
> > +   desc_indexes[i] = vq->avail->ring[used_idx];
> > +
> > +   vq->used->ring[used_idx].id  = desc_indexes[i];
> > +   vq->used->ring[used_idx].len = 0;
> > +   vhost_log_used_vring(dev, vq,
> > +   offsetof(struct vring_used, ring[used_idx]),
> > +   sizeof(vq->used->ring[used_idx]));
> > }
> >  
> > /* Prefetch descriptor index. */
> > rte_prefetch0(>desc[desc_indexes[0]]);
> > -   rte_prefetch0(>used->ring[vq->last_used_idx & (vq->size - 1)]);
> > -
> > for (i = 0; i < count; i++) {
> > int err;
> >  
> > -   if (likely(i + 1 < count)) {
> > +   if (likely(i + 1 < count))
> > rte_prefetch0(>desc[desc_indexes[i + 1]]);
> > -   rte_prefetch0(>used->ring[(used_idx + 1) &
> > - (vq->size - 1)]);
> > -   }
> >  
> > pkts[i] = rte_pktmbuf_alloc(mbuf_pool);
> > if (unlikely(pkts[i] == NULL)) {
> > @@ -916,18 +920,12 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
> > rte_pktmbuf_free(pkts[i]);
> > break;
> > }
> > -
> > -   used_idx = vq->last_used_idx++ & (vq->size - 1);
> > -   vq->used->ring[used_idx].id  = desc_indexes[i];
> > -   vq->used->ring[used_idx].len = 0;
> > -   vhost_log_used_vring(dev, vq,
> > -   offsetof(struct vring_used, ring[used_idx]),
> > -   sizeof(vq->used->ring[used_idx]));
> > }
> 
> Had tried post-updating used ring in batch,  but forget the perf change.

I would assume pre-updating gives better performance gain, as we are
fiddling with avail and used ring together, which would be more cache
friendly.

> One optimization would be on vhost_log_used_ring.
> I have two ideas,
> a) In QEMU side, we always assume use ring will be changed. so that we
> don't need to log used ring in VHOST.
> 
> Michael: feasible in QEMU? comments on this?
> 
> b) We could always mark the total used ring modified rather than entry
> by entry.

I doubt it's worthwhile. One fact is that vhost_log_used_ring is
a non operation in most time: it will take action only in the short
gap of during live migration.

And FYI, I even tried with all vhost_log_xxx being removed, it showed
no performance boost at all. Therefore, it's not a factor that will
impact performance.

--yliu


[dpdk-dev] [PATCH v2 02/20] thunderx/nicvf: add pmd skeleton

2016-06-01 Thread Jerin Jacob
On Tue, May 31, 2016 at 09:53:55AM -0700, Stephen Hemminger wrote:
> On Sun, 29 May 2016 22:16:46 +0530
> Jerin Jacob  wrote:
> 
> > +
> > +static struct itimerspec alarm_time = {
> > +   .it_interval = {
> > +   .tv_sec = 0,
> > +   .tv_nsec = NICVF_INTR_POLL_INTERVAL_MS * 100,
> > +   },
> > +   .it_value = {
> > +   .tv_sec = 0,
> > +   .tv_nsec = NICVF_INTR_POLL_INTERVAL_MS * 100,
> > +   },
> > +};
> > +
> > +static void
> > +nicvf_interrupt(struct rte_intr_handle *hdl __rte_unused, void *arg)
> > +{
> > +   struct nicvf *nic = (struct nicvf *)arg;
> > +
> > +   nicvf_reg_poll_interrupts(nic);
> > +}
> > +
> > +static int
> > +nicvf_periodic_alarm_start(struct nicvf *nic)
> > +{
> > +   int ret = -EBUSY;
> > +
> > +   nic->intr_handle.type = RTE_INTR_HANDLE_ALARM;
> > +   nic->intr_handle.fd = timerfd_create(CLOCK_MONOTONIC, TFD_NONBLOCK);
> > +   if (nic->intr_handle.fd == -1)
> > +   goto error;
> > +   ret = rte_intr_callback_register(>intr_handle,
> > +   nicvf_interrupt, nic);
> > +   ret |= timerfd_settime(nic->intr_handle.fd, 0, _time, NULL);
> > +error:
> > +   return ret;
> > +}
> > +
> > +static int
> > +nicvf_periodic_alarm_stop(struct nicvf *nic)
> > +{
> > +   int ret;
> > +
> > +   ret = rte_intr_callback_unregister(>intr_handle,
> > +   nicvf_interrupt, nic);
> > +   ret |= close(nic->intr_handle.fd);
> > +   return ret;
> > +}
> 
> It would be good to have real link status interrupts or just report that
> device does not support Link State interrupt and let application poll.  
> Having another
> thing going on (timerfd callback) seems like it would add more complexity to
> the environment of an already complex thread structure with DPDK.

But we would still need some polling infrastructure for the 'async'
mbox events and error events from the PF. So, I think I can change to the
rte_eal_alarm* infrastructure, like the bond PMD driver. Will fix it in V3.

> 
> Also, timerfd() doesn't exist on all OS's.
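
A rough sketch of the rte_eal_alarm-based polling mentioned above, which also
sidesteps the timerfd portability concern (modelled on how the bonding PMD
re-arms its alarm; the interval and the callback body are only indicative, and
the nicvf-specific calls from the patch are referenced only in comments):

#include <rte_alarm.h>

#define POLL_INTERVAL_US (2 * 1000)	/* assumed 2 ms polling period */

/* Re-arming callback: rte_eal_alarm_set() fires once, so the handler
 * re-registers itself to get periodic behaviour without a timerfd. */
static void
poll_cb(void *arg)
{
	/* a driver would poll its mailbox/error registers here, e.g. the
	 * nicvf_reg_poll_interrupts() call from the patch above */
	rte_eal_alarm_set(POLL_INTERVAL_US, poll_cb, arg);
}

static int
periodic_poll_start(void *ctx)
{
	return rte_eal_alarm_set(POLL_INTERVAL_US, poll_cb, ctx);
}

static int
periodic_poll_stop(void *ctx)
{
	/* returns the number of cancelled alarms, or a negative errno */
	return rte_eal_alarm_cancel(poll_cb, ctx);
}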


[dpdk-dev] [PATCH 2/3] vhost: optimize dequeue for small packets

2016-06-01 Thread Yuanhan Liu
On Wed, Jun 01, 2016 at 06:24:18AM +, Xie, Huawei wrote:
> On 5/3/2016 8:42 AM, Yuanhan Liu wrote:
> > Both current kernel virtio driver and DPDK virtio driver use at least
> > 2 desc buffer for Tx: the first for storing the header, and the others
> > for storing the data.
> 
> Tx could prepend some space for virtio net header whenever possible, so
> that it could use only one descriptor.

In such case, it will work as well: it will goto the "else" code path
then.

> Another thing is this doesn't reduce the check because you also add a check.

Actually, yes, it does save check. Before this patch, we have:

while (not done yet) {
        if (desc_is_drain)
                ...;

        if (mbuf_is_full)
                ...;

        COPY();
}

Note that the "while" check will be done twice, therefore, it's 4
checks in total. After this patch, it would be:

if (virtio_net_hdr_takes_one_desc)
        ...

while (1) {
        COPY();

        if (desc_is_drain) {
                break if done;
                ...;
        }

        if (mbuf_is_full) {
                /* small packets will bypass this check */
                ;
        }
}

So, for small packets, it takes 2 checks only, which actually saves
2 checks.

--yliu


[dpdk-dev] [RFC] kernel paramters like DPDK CLI options

2016-06-01 Thread Thomas Monjalon
2016-06-01 19:40, Yuanhan Liu:
> On Wed, Jun 01, 2016 at 12:17:50PM +0200, Thomas Monjalon wrote:
> > 2016-06-01 14:04, Yuanhan Liu:
> > > Apparently, adding a new EAL option like "--force-legacy" looks
> > > wrong.
> > > 
> > > The generic yet elegant solution I just thought of while having
> > > lunch is to add a new EAL option, say, --extra-options, where we
> > > could specify driver/subsystem specific options. As you see, it's
> > > nothing big deal, it just looks like Linux kernel parameters.
> > > 
> > > Take above two cases as example, it could be:
> > > 
> > > --extra-options "vhost-owner=kvm:kvm force-legacy"
> > 
> > I think it's better to have CLI options per device.

Sorry I was just talking about per-device options, not libraries.
We may also talk about driver options. So we end up with 4 kinds of options:
- device options
- driver options
- other EAL options
- library options
It is not clear what the scope of the --extra-options would be.
We already have EAL options (without specific name or prefix)
and vdev options.
I suggest adding
- PCI options
- driver options (if it makes sense)
- library options

> > Currently we can pass devargs
> > - to PCI device via --pci-whitelist
> 
> Isn't it just for whitelisting a specific PCI device?

Yes it is. As a side effect, it allows to pass some devargs.

> > - to virtual device via --vdev
> 
> Yes, --vdev works great here. However, as its name states, it's
> just for virtual devices. Say, it will not work for virtio PMD,
> the force-legacy option mentioned above.
> 
> > I think we just need to refactor these options to have a generic
> > --device or keep the options in --vdev and add a new --pciopt
> > or something like that.
> 
> --pciopt should be able to allow us pass more options to a specific
> driver. But what about a library, say vhost?

For libraries we can have some specific options.
Should it be prefixed, like --mempool:handler=fifo, or without a prefix
(like EAL options): --mempool-handler=fifo?
The first choice makes it easier to parse every mempool option inside
the library.

> > And more importantly, these devargs must be set via a new EAL API
> > to allow applications do these configurations without building/faking
> > some command line arguments.
> > 
> > To make it clear, applications use API and users use CLI (which call API).
> 
> I would agree with that. But that basically works for library; it does
> not quite make sense to me to introduce a new API for some a driver
> option, such as the force-legacy option for virtio PMD.

For drivers, the API must be something like:
devargs_set(port_id, char*)
or
devargs_add(port_id, char*)
So it is generic for all driver-specific options.
There is already rte_eal_devargs_add() but it probably needs to be reworked.
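
As an illustration of what exists today, devargs can already be attached to
a PCI device through rte_eal_devargs_add(); the PCI address and the
"force-legacy=1" key below are made-up examples, not an existing virtio
option.

#include <rte_devargs.h>

/* Sketch only: white-list a PCI device and attach driver-specific
 * key/value arguments to it, as --pci-whitelist does on the command line. */
static int
whitelist_with_args(void)
{
        return rte_eal_devargs_add(RTE_DEVTYPE_WHITELISTED_PCI,
                                   "0000:02:00.0,force-legacy=1");
}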

> So, let me make a summary from reading your email, to make sure I get
> you right: for drivers (virtual or physical), we could use --vdev or
> --pciopt for passing args, respectively. For the libraries, we should
> add a new API, and let the application to introduce some options to
> invoke it, to pass the options.

I was thinking to implement the library options parsing in DPDK.
But if the application implements its own options parsing without using
the DPDK one, yes the option parsing is obviously done in the application.

> I'd say, that would work, but I see inflexibility and some drawbacks:
> 
> - I would assume "--pciopt" has the input style of
> 
>   "domain:bus:devid:func,option1,option2,..."
> 
>   It then looks hard to me to use it: I need figure out the
>   pci id first.

What do you suggest instead of PCI id?

> - For the libraries, we have to write code to add new options for
>   each applictions. With the generic option, user just need use it;
>   don't need write a single line of code, which could save user's
>   effort. It also gives user an united interface.

Applications can leverage the DPDK options parsing (without writing
any new line of code).

>   And to make it clear, as stated, I don't object to having an API.
>   I mean, the generic option gives us a chance to do the right
>   configuration at startup time: it would still invoke the right
>   API to do the right thing in the end.

I agree. I just want to split your --extra-option proposal in few options
with a bit more context.


[dpdk-dev] [PATCH 2/2] qede: return LAN stats to MFW

2016-06-01 Thread Harish Patil
>

>
>
>> -Original Message-
>> From: Harish Patil [mailto:harish.patil at qlogic.com]
>> Sent: Wednesday, June 1, 2016 3:16 PM
>> To: Thomas Monjalon 
>> Cc: dev at dpdk.org; Richardson, Bruce ; 
>> Rasesh
>> Mody ; Dept-Eng DPDK Dev > EngDPDKDev at qlogic.com>
>> Subject: Re: [dpdk-dev] [PATCH 2/2] qede: return LAN stats to MFW
>> 
>> >
>> >2016-05-31 19:21, Harish Patil:
>> >>
>> >> >On Fri, May 06, 2016 at 09:21:31PM -0700, Rasesh Mody wrote:
>> >> >> From: Harish Patil 
>> >> >>
>> >> >> Under certain scenarios, MFW periodically polls the driver for LAN
>> >> >> statistics. This patch implements the osal hook to fill in the
>> >> >> stats.
>> >> >>
>> >> >> Fixes: ffa002d318d36 ("qede: add base driver")
>> >> >>
>> >> >What is MFW?
>> >> >
>> >> >/Bruce
>> >> >
>> >>
>> >> MFW - Management FirmWare running on the card.
>> >
>> >So MFW can probably be replaced by firmware in the title.
>> >
>> 
>> Reason I didn't use "firmware" in the first place is because there are
>> two
>> different firmware running on the card:
>> 1) MFW (management firmware - which is flashed)
>> 2) Firmware (datapath  firmware - loaded by driver by reading FW file)
>> 
>> So, I can replace it as management firmware explicitly.
>> 
>> 
>> Thanks,
>> Harish
>> 
>How about firmware in the title, and then you can clarify it as
>management firmware in the message itself?
>
>Regards,
>/Bruce
>

Sure, will do.



[dpdk-dev] [PATCH 2/2] qede: return LAN stats to MFW

2016-06-01 Thread Richardson, Bruce


> -Original Message-
> From: Harish Patil [mailto:harish.patil at qlogic.com]
> Sent: Wednesday, June 1, 2016 3:16 PM
> To: Thomas Monjalon 
> Cc: dev at dpdk.org; Richardson, Bruce ; Rasesh
> Mody ; Dept-Eng DPDK Dev  EngDPDKDev at qlogic.com>
> Subject: Re: [dpdk-dev] [PATCH 2/2] qede: return LAN stats to MFW
> 
> >
> >2016-05-31 19:21, Harish Patil:
> >>
> >> >On Fri, May 06, 2016 at 09:21:31PM -0700, Rasesh Mody wrote:
> >> >> From: Harish Patil 
> >> >>
> >> >> Under certain scenarios, MFW periodically polls the driver for LAN
> >> >> statistics. This patch implements the osal hook to fill in the
> >> >> stats.
> >> >>
> >> >> Fixes: ffa002d318d36 ("qede: add base driver")
> >> >>
> >> >What is MFW?
> >> >
> >> >/Bruce
> >> >
> >>
> >> MFW - Management FirmWare running on the card.
> >
> >So MFW can probably be replaced by firmware in the title.
> >
> 
> Reason I didn't use "firmware" in the first place is because there are two
> different firmware running on the card:
> 1) MFW (management firmware - which is flashed)
> 2) Firmware (datapath  firmware - loaded by driver by reading FW file)
> 
> So, I can replace it as management firmware explicitly.
> 
> 
> Thanks,
> Harish
> 
How about firmware in the title, and then you can clarify it as management 
firmware in the message itself?

Regards,
/Bruce


[dpdk-dev] [dpdk-dev,v5,1/3] mempool: support external handler

2016-06-01 Thread Jan Viktorin
On Tue, 31 May 2016 22:40:59 +0200
Olivier MATZ  wrote:

> Hi,
> 
> On 05/31/2016 03:47 PM, Hunt, David wrote:
> > On 5/31/2016 1:06 PM, Jan Viktorin wrote:  
> >> On Tue, 31 May 2016 10:09:42 +0100
> >> "Hunt, David"  wrote:
> >>  
> >>> The *p pointer is the opaque data for a given mempool handler (ring,
> >>> array, linked list, etc)  
> >> Again, doc comments...
> >>
> >> I don't like the obj_table representation to be an array of void *. I
> >> could see
> >> it already in DPDK for defining Ethernet driver queues, so, it's
> >> probably not
> >> an issue. I just say, I would prefer some basic type safety like
> >>
> >> struct rte_mempool_obj {
> >> void *p;
> >> };
> >>
> >> Is there somebody with different opinions?
> >>
> >> [...]  
> > 
> > Comments added. I've left as a void* for the moment.  
> 
> Jan, could you please detail why you think having a
> rte_mempool_obj structure brings more safety?

First, void * does not say anything about the purpose of the argument.
So, anybody who is not familiar with the mempool internals would be
lost for a while (as I was when studying the Ethernet queue API of DPDK).

The type safety (in C, LoL) here is about mixing up the order of
arguments.
When there are several void * args, you can accidentally pass something wrong.
DPDK has quite strict compiler settings; however, I don't consider it to be
a good practice in general.

When you have a named struct or a typedef, its definition usually contains doc
comments describing its purpose. Nobody usually writes good doc comments for
void * arguments of functions.

> 
> For now, I'm in favor of keeping the array of void *, because
> that's what we use in other mempool or ring functions.

It was just a suggestion... I don't consider this to be an issue (as stated
earlier).

Jan

> 
> 
> > +/** Structure defining a mempool handler. */  
>  Later in the text, I suggested to rename rte_mempool_handler to
>  rte_mempool_ops.
>  I believe that it explains the purpose of this struct better. It
>  would improve
>  consistency in function names (the *_ext_* mark is very strange and
>  inconsistent).  
> >>> I agree. I've gone through all the code and renamed to
> >>> rte_mempool_handler_ops.  
> >> Ok. I meant rte_mempool_ops because I find the word "handler" to be
> >> redundant.  
> > 
> > I prefer the use of the word handler, unless others also have opinions
> > either way?  
> 
> Well, I think rte_mempool_ops is clear enough, and shorter,
> so I'd vote for it.
> 
> 
> Regards,
> Olivier



-- 
   Jan Viktorin  E-mail: Viktorin at RehiveTech.com
   System Architect  Web:www.RehiveTech.com
   RehiveTech
   Brno, Czech Republic


[dpdk-dev] [PATCH] ivshmem: add all memzones of mempool to metadata

2016-06-01 Thread Ferruh Yigit
A mempool consists of multiple memzones, at least two of them.
ivshmem assumes the mempool and its elements are all in the same memzone.

Update the code to add all memzones when a mempool is added.

Fixes: d1d914ebbc25 ("mempool: allocate in several memory chunks by
default")

Signed-off-by: Ferruh Yigit 
---
 lib/librte_ivshmem/rte_ivshmem.c | 30 ++
 1 file changed, 22 insertions(+), 8 deletions(-)

diff --git a/lib/librte_ivshmem/rte_ivshmem.c b/lib/librte_ivshmem/rte_ivshmem.c
index c8b332c..5c83920 100644
--- a/lib/librte_ivshmem/rte_ivshmem.c
+++ b/lib/librte_ivshmem/rte_ivshmem.c
@@ -548,25 +548,39 @@ add_ring_to_metadata(const struct rte_ring * r,
 }

 static int
-add_mempool_to_metadata(const struct rte_mempool * mp,
-   struct ivshmem_config * config)
+add_mempool_memzone_to_metadata(const void *addr,
+   struct ivshmem_config *config)
 {
-   struct rte_memzone * mz;
-   int ret;
+   struct rte_memzone *mz;

-   mz = get_memzone_by_addr(mp);
-   ret = 0;
+   mz = get_memzone_by_addr(addr);

if (!mz) {
RTE_LOG(ERR, EAL, "Cannot find memzone for mempool!\n");
return -1;
}

-   /* mempool consists of memzone and ring */
-   ret = add_memzone_to_metadata(mz, config);
+   return add_memzone_to_metadata(mz, config);
+}
+
+static int
+add_mempool_to_metadata(const struct rte_mempool *mp,
+   struct ivshmem_config *config)
+{
+   struct rte_mempool_memhdr *memhdr;
+   int ret;
+
+   ret = add_mempool_memzone_to_metadata(mp, config);
if (ret < 0)
return -1;

+   STAILQ_FOREACH(memhdr, &mp->mem_list, next) {
+   ret = add_mempool_memzone_to_metadata(memhdr->addr, config);
+   if (ret < 0)
+   return -1;
+   }
+
+   /* mempool consists of memzone and ring */
return add_ring_to_metadata(mp->ring, config);
 }

-- 
2.5.5



[dpdk-dev] [PATCH 2/2] qede: return LAN stats to MFW

2016-06-01 Thread Harish Patil
>
>2016-05-31 19:21, Harish Patil:
>> 
>> >On Fri, May 06, 2016 at 09:21:31PM -0700, Rasesh Mody wrote:
>> >> From: Harish Patil 
>> >> 
>> >> Under certain scenarios, MFW periodically polls the driver
>> >> for LAN statistics. This patch implements the osal hook to
>> >> fill in the stats.
>> >> 
>> >> Fixes: ffa002d318d36 ("qede: add base driver")
>> >> 
>> >What is MFW?
>> >
>> >/Bruce
>> >
>> 
>> MFW - Management FirmWare running on the card.
>
>So MFW can probably be replaced by firmware in the title.
>

Reason I didn?t use ?firmware? in the first place is because there are two
different firmware running on the card:
1) MFW (management firmware - which is flashed)
2) Firmware (datapath  firmware - loaded by driver by reading FW file)

So, I can replace it as management firmware explicitly.


Thanks,
Harish




[dpdk-dev] [PATCH] virtio: use volatile to get used->idx in the loop

2016-06-01 Thread Yuanhan Liu
On Wed, Jun 01, 2016 at 05:40:08AM +, Xie, Huawei wrote:
> On 5/30/2016 4:20 PM, Yuanhan Liu wrote:
> > On Wed, May 25, 2016 at 12:16:41AM +0800, Huawei Xie wrote:
> >> There is no external function call or any barrier in the loop,
> >> the used->idx would only be retrieved once.
> >>
> >> Signed-off-by: Huawei Xie 
> >> ---
> >>  drivers/net/virtio/virtio_ethdev.c | 3 ++-
> >>  1 file changed, 2 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/net/virtio/virtio_ethdev.c 
> >> b/drivers/net/virtio/virtio_ethdev.c
> >> index c3fb628..f6d6305 100644
> >> --- a/drivers/net/virtio/virtio_ethdev.c
> >> +++ b/drivers/net/virtio/virtio_ethdev.c
> >> @@ -204,7 +204,8 @@ virtio_send_command(struct virtqueue *vq, struct 
> >> virtio_pmd_ctrl *ctrl,
> >>usleep(100);
> >>}
> >>  
> >> -  while (vq->vq_used_cons_idx != vq->vq_ring.used->idx) {
> >> +  while (vq->vq_used_cons_idx !=
> >> + *((volatile uint16_t *)(&vq->vq_ring.used->idx))) {
> > I'm wondering maybe we could fix VIRTQUEUE_NUSED (which has no such
> > qualifier) and use this macro here?
> >
> > If you check the reference of that macro, you might find similar
> > issues, say, it is also used inside the while-loop of
> > virtio_recv_mergeable_pkts().
> >
> > --yliu
> >  
> >
> 
> Yes, seems it has same issue though haven't confirmed with asm code.

So, move the "volatile" qualifier to VIRTQUEUE_NUSED?

--yliu
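
One possible shape of that change, as a sketch only (the committed fix may
differ): force a fresh read of used->idx on every evaluation of the macro.

/* drivers/net/virtio/virtqueue.h, sketch of the suggested change */
#define VIRTQUEUE_NUSED(vq) \
        ((uint16_t)(*(volatile uint16_t *)&(vq)->vq_ring.used->idx - \
                    (vq)->vq_used_cons_idx))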


[dpdk-dev] [RFC] kernel paramters like DPDK CLI options

2016-06-01 Thread Yuanhan Liu
Hi all,

I guess we (maybe just me :) have stated few times something like
"hey, this kind of stuff is good to have, but you are trying to
add an EAL CLI option for a specific subsystem/driver, which is
wrong".

One recent example that is still fresh in my mind is the one from
Christian [0], that he made a proposal to introduce two new EAL
options, --vhost-owner and --vhost-perm, to configure the vhost
user socket file permission.

[0]: http://dpdk.org/ml/archives/dev/2016-April/037948.html

Another example is the one I met while enabling virtio 1.0 support.
QEMU has the ability to support both virtio 0.95 (legacy) and 1.0
(modern) at the same time for one virtio device, therefore, we
could either use legacy driver or modern driver to operate the
device. However, the current logic is we try with modern driver
first, and then legacy driver if it failed. In above case, we will
never hit the legacy driver. But sometimes, it's nice to let it
force back to the legacy driver, say, for debug or compare purpose.

Apparently, adding a new EAL option like "--force-legacy" looks
wrong.

The generic yet elegant solution I just thought of while having
lunch is to add a new EAL option, say, --extra-options, where we
could specify driver/subsystem specific options. As you see, it's
nothing big deal, it just looks like Linux kernel parameters.

Take above two cases as example, it could be:

--extra-options "vhost-owner=kvm:kvm force-legacy"

Note that those options could also be delimited by comma.

DPDK EAL then will provide some generic helper functions to get
and parse those options, and let the specific driver/subsystem
invoke them to do the actual parsing and take the proper action
when some option is specified; say, the virtio PMD driver will force
back to the legacy driver when "force-legacy" is given.

Comments? Makes sense to you guys, or something nice to have?

--yliu
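
To make the idea concrete, here is a purely hypothetical helper: the function
and variable names are invented for illustration, this is not an existing EAL
API, and the substring matching is deliberately naive.

#include <stdio.h>
#include <string.h>

static const char *extra_options; /* e.g. "vhost-owner=kvm:kvm force-legacy" */

/* Look up "name" in the --extra-options string; if it carries a value
 * ("name=value"), copy the value out. Returns 0 if present, -1 otherwise. */
static int
extra_option_get(const char *name, char *value, size_t value_len)
{
        const char *p = strstr(extra_options ? extra_options : "", name);

        if (p == NULL)
                return -1;                      /* option not given */

        p += strlen(name);
        if (*p == '=') {                        /* "name=value" form */
                size_t n = strcspn(++p, " ,");  /* value ends at space or comma */
                if (value != NULL)
                        snprintf(value, value_len, "%.*s", (int)n, p);
        } else if (value != NULL && value_len > 0) {
                value[0] = '\0';                /* boolean option, e.g. "force-legacy" */
        }
        return 0;
}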


[dpdk-dev] [PATCH 1/5] pci: fix access to PCI config space in bsd

2016-06-01 Thread Rahul Lakkireddy
Hi Bruce,

On Tuesday, May 05/31/16, 2016 at 09:20:13 -0700, Bruce Richardson wrote:
> On Fri, May 06, 2016 at 01:13:15PM +0530, Rahul Lakkireddy wrote:
> > PCIOCREAD and PCIOCWRITE ioctls to read/write PCI config space fail
> > with EPERM due to missing write permission.  Fix by opening /dev/pci/
> > with O_RDWR instead.
> > 
> > Fixes: 632b2d1deeed ("eal: provide functions to access PCI config")
> > 
> > Signed-off-by: Rahul Lakkireddy 
> > Signed-off-by: Kumar Sanghvi 
> > ---
> >  lib/librte_eal/bsdapp/eal/eal_pci.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c 
> > b/lib/librte_eal/bsdapp/eal/eal_pci.c
> > index 2d16d78..82330be 100644
> > --- a/lib/librte_eal/bsdapp/eal/eal_pci.c
> > +++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
> > @@ -422,7 +422,7 @@ int rte_eal_pci_read_config(const struct rte_pci_device 
> > *dev,
> > goto error;
> > }
> >  
> > -   fd = open("/dev/pci", O_RDONLY);
> > +   fd = open("/dev/pci", O_RDWR);
> > if (fd < 0) {
> > RTE_LOG(ERR, EAL, "%s(): error opening /dev/pci\n", __func__);
> > goto error;
> > @@ -466,7 +466,7 @@ int rte_eal_pci_write_config(const struct 
> > rte_pci_device *dev,
> >  
> > memcpy(&pi.pi_data, buf, len);
> >  
> > -   fd = open("/dev/pci", O_RDONLY);
> > +   fd = open("/dev/pci", O_RDWR);
> > if (fd < 0) {
> > RTE_LOG(ERR, EAL, "%s(): error opening /dev/pci\n", __func__);
> > goto error;
> > -- 
> Does the read function as well as the write one need O_RDWR permissions? There
> is also an ioctl in rte_eal_pci_scan which operates on a RDONLY file 
> descriptor.
> Does that need to be modified also?

Yes, both PCIOCREAD and PCIOCWRITE ioctls seem to require write
permission.  Otherwise, pci_ioctl [1] seems to return EPERM.  On the
other hand, the PCIOCGETCONF ioctl used in rte_eal_pci_scan doesn't
seem to require a write permission.  So, it should be fine to leave it
as RDONLY.

[1] 
https://svnweb.freebsd.org/base/release/10.3.0/sys/dev/pci/pci_user.c?revision=297553&view=markup#l493

Thanks,
Rahul


[dpdk-dev] [vpp-dev] VLAN packets dropped... ?

2016-06-01 Thread Chandrasekar Kannan
Awesome! Thanks for the quick response, John!

-Chandra



On Wed, Jun 1, 2016 at 1:02 PM, John Lo (loj)  wrote:

> We will be adding a patch to the i40e driver that would check the packet
> for presence of a VLAN tag and set the required PKT_RX_VLAN_PKT flag. I hope to
> get it done by the end of this week.   -John
>
>
>
> *From:* Chandrasekar Kannan [mailto:ckannan at console.to]
> *Sent:* Wednesday, June 01, 2016 2:45 PM
> *To:* John Daley (johndale)
> *Cc:* John Lo (loj); Ananyev, Konstantin; Wiles, Keith; vpp-dev; Zhang,
> Helin; dev at dpdk.org
> *Subject:* Re: [vpp-dev] VLAN packets dropped... ?
>
>
>
> Just checking back on this thread to see where things are.  Are we saying
> this is a bug in dpdk or vpp ?. That part wasn't quite clear to me.
>
>
>
> -Chandra
>
>
>
>
>
>
>
> On Thu, May 26, 2016 at 3:56 PM, John Daley (johndale) 
> wrote:
>
> John,
> As just discussed, what you suggest was considered but there are other
> apps depending a different behavior of the flag, so we thought the best
> thing to do is deprecate it. That is part of what Olivier's patch discussed
> in http://www.dpdk.org/ml/archives/dev/2016-May/038786.html does.  Adding
> a new ptype for VLAN tagged packets after the patch is accepted would allow
> VPP to check if the ptype is supported by the PMD and if so, use it to
> determine if the delivered packet actually has a VLAN tag in it. No need to
> know if stripping is enabled/disabled. If the ptype is not supported,  the
> app would have check the packet in SW.
>
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of John Lo (loj)
> > Sent: Thursday, May 26, 2016 11:11 AM
> > To: Ananyev, Konstantin ; Wiles, Keith
> > ; Chandrasekar Kannan 
> > Cc: vpp-dev ; Zhang, Helin  > intel.com>;
> > dev at dpdk.org
> > Subject: Re: [dpdk-dev] [vpp-dev] VLAN packets dropped... ?
> >
> > Hi Konstantin,
> >
> > Thanks for the link to DPDK discussion wrt this VLAN offload flag
> > PKT_RX_VLAN_PKT.  It seems the proposal was to deprecate
> > PKT_RX_VLAN_PKT  and introduce two new flags PKT_RX_VLAN_STRIPPED
> > and PKT_RX_QINQ_STRIPPED.
> >
> > It would be a really good idea to keep PKT_RX_VLAN_PKT to indicate at
> least
> > one VLAN tag is present on the packet.  For the case of i40e where its HW
> > cannot detect VLAN tag if strip is not enabled, it should be reasonable
> for the
> > i40e DPDK driver software to make a check and set this flag. I would
> think the
> > overhead to check the 1st ethertype field in the MAC header against dot1q
> > or dot1ad values in software be pretty minimal.
> >
> > Regards,
> > John
> >
> >
> > -Original Message-
> > From: Ananyev, Konstantin [mailto:konstantin.ananyev at intel.com]
> > Sent: Wednesday, May 25, 2016 3:23 PM
> > To: John Lo (loj); Wiles, Keith; Chandrasekar Kannan
> > Cc: vpp-dev; Zhang, Helin; dev at dpdk.org
> > Subject: RE: [vpp-dev] VLAN packets dropped... ?
> >
> >
> > > I suppose this has to do with what is expected usage of the
> > > PKT_RX_VLAN_PKT offload flag. Is it set only for VLAN packets with the
> > > VLAN stripped or should always be set if VLAN is/was present in the
> > > received packet. It seems that different DPDK drivers are behaving
> > differently which will make it really hard for VPP to take advantage of
> NIC
> > and driver offload capability to provide better performance.
> >
> > Yes, that's true ixgbe/igb from one side and i40e do raise
> PKT_RX_VLAN_PKT
> > in a different manner.
> > There is an attempt to make it unified across all supported devices:
> >  http://dpdk.org/dev/patchwork/patch/12938/
> >
> > Though, I am not sure it will help you with your issue.
> > As I understand, for you the desired behaviour is:
> > If it is a vlan packet, keep the packet intact (don't strip the vlan)
> and raise
> > PKT_RX_VLAN_PK inside mbuf->ol_flags.
> > That's what ixgbe is doing with rte_eth_conf.rxmode.hw_vlan_strip==0.
> > Correct?
> > As far as I know, i40e HW doesn't provide such ability.
> > i40e Receive HW Descriptor can only flag was VLAN tag stripped from the
> > packet or not, but if stripping is disabled it wouldn't indicate in any
> way is
> > VLAN tag is present inside the packet or not.
> > I am CC-ing it to dpdk.org in case I am missing something here.
> > Probably someone knows a way to make it work in that way.
> > Konstantin
> >
> > >
> > > If VPP cannot rely on offload flags for VLAN so packets always have to
> go
> > through ethernet-input node, there is a performance cost.
> > > For the 10GE case, before the inverse patch of the ixgbe driver, 10GE
> > > Rx-vector path removed support of offload flag with DPDK 16.04 and so
> > > ethernet-input node is always used. The 10GE IPv4 throughput rate
> > > dropped from 6.17MPPSx2 to 4.92MPPSx2 for bidirectional traffic (2
> > > streams each with a single IP address as destination) on a single core
> > > on my server. Konstantin suggested at that time to use scalar mode
> which
> > does support offload flags properly. 

[dpdk-dev] dpdk compilation issue on cumuluslinux

2016-06-01 Thread Raja Jayapal


Hi All,

I am working on installing dpdk on cumuluslinux(2.5.7v).
I have followed the steps mentioned in the dpdk quick start guide for
installation, but I am facing issues during compilation.

Steps followed:
>git clone git://dpdk.org/dpdk
>make config T=x86_64-native-linuxapp-gcc
>sed -ri 's,(PMD_PCAP=).*,\1y,' build/.config
>make

Error:
---
  SYMLINK-FILE include/exec-env/rte_interrupts.h
  SYMLINK-FILE include/exec-env/rte_kni_common.h
  SYMLINK-FILE include/exec-env/rte_dom0_common.h
  INSTALL-LIB librte_eal.a
== Build lib/librte_eal/linuxapp/igb_uio
make: *** /lib/modules/3.2.68-6/build: No such file or directory.  Stop.
make[5]: *** [igb_uio.ko] Error 2
make[4]: *** [igb_uio] Error 2
make[3]: *** [linuxapp] Error 2
make[2]: *** [librte_eal] Error 2
make[1]: *** [lib] Error 2
make: *** [all] Error 2
root at cumulus:/home/cumulus/dpdk#

I am able to install on the Ubuntu machine without any errors.
Please guide how to resolve this issue.


Thanks,
Raja




[dpdk-dev] [PATCH v2] mempool: replace c memcpy code semantics with optimized rte_memcpy

2016-06-01 Thread Jerin Jacob
On Tue, May 31, 2016 at 11:05:30PM +0200, Olivier MATZ wrote:
> Hi Jerin,

Hi Olivier,

> 
> >>>   /* Add elements back into the cache */
> >>> - for (index = 0; index < n; ++index, obj_table++)
> >>> - cache_objs[index] = *obj_table;
> >>> + rte_memcpy(&cache_objs[0], obj_table, sizeof(void *) * n);
> >>>  
> >>>   cache->len += n;
> >>>  
> >>>
> >>
> >> I also checked in the get_bulk() function, which looks like that:
> >>
> >>/* Now fill in the response ... */
> >>for (index = 0, len = cache->len - 1;
> >>index < n;
> >>++index, len--, obj_table++)
> >>*obj_table = cache_objs[len];
> >>
> >> I think we could replace it by something like:
> >>
> >>rte_memcpy(obj_table, &cache_objs[len - n], sizeof(void *) * n);
> >>
> >> The only difference is that it won't reverse the pointers in the
> >> table, but I don't see any problem with that.
> >>
> >> What do you think?
> > 
> > In true sense, it will _not_ be LIFO. Not sure about cache usage 
> > implications
> > on the specific use cases.
> 
> Today, the objects pointers are reversed only in the get(). It means
> that this code:
> 
>   rte_mempool_get_bulk(mp, table, 4);
>   for (i = 0; i < 4; i++)
>   printf("obj = %p\n", t[i]);
>   rte_mempool_put_bulk(mp, table, 4);
> 
> 
>   printf("-\n");
>   rte_mempool_get_bulk(mp, table, 4);
>   for (i = 0; i < 4; i++)
>   printf("obj = %p\n", t[i]);
>   rte_mempool_put_bulk(mp, table, 4);
> 
> prints:
> 
>   addr1
>   addr2
>   addr3
>   addr4
>   -
>   addr4
>   addr3
>   addr2
>   addr1
> 
> Which is quite strange.

IMO, it is the expected LIFO behavior, right?

What is not expected is the following, which is the case after the change. Or am I
missing something here?

addr1
addr2
addr3
addr4
-
addr1
addr2
addr3
addr4

> 
> I don't think it would be an issue to replace the loop by a
> rte_memcpy(), it may increase the copy speed and it will be
> more coherent with the put().
> 
> 
> Olivier


[dpdk-dev] [PATCH] port: add kni interface support

2016-06-01 Thread Ethan
Hi Cristian,

No problem. Enjoy your travel.
:-)

B.R.
Ethan

2016-05-30 22:40 GMT+08:00 Dumitrescu, Cristian <
cristian.dumitrescu at intel.com>:

> Hi Wei Jie,
>
> Thank you for submitting this patch. I am currently travelling, I will
> review your patch and come back to you hopefully later this week or early
> next week.
>
> Regards,
> Cristian
>
> > -Original Message-
> > From: WeiJie Zhuang [mailto:zhuangwj at gmail.com]
> > Sent: Saturday, May 28, 2016 12:26 PM
> > To: Dumitrescu, Cristian 
> > Cc: dev at dpdk.org; WeiJie Zhuang 
> > Subject: [PATCH] port: add kni interface support
> >
> > 1. add KNI port type to the packet framework
> > 2. add KNI support to the IP Pipeline sample Application
> >
> > Signed-off-by: WeiJie Zhuang 
> > ---
> > v2:
> > * Fix check patch error.
> > ---
> >  doc/api/doxy-api-index.md   |   1 +
> >  examples/ip_pipeline/Makefile   |   6 +-
> >  examples/ip_pipeline/app.h  |  74 +
> >  examples/ip_pipeline/config/kni.cfg |  12 ++
> >  examples/ip_pipeline/config_check.c |  34 
> >  examples/ip_pipeline/config_parse.c | 130 +++
> >  examples/ip_pipeline/init.c |  79 +
> >  examples/ip_pipeline/kni/kni.c  |  80 +
> >  examples/ip_pipeline/kni/kni.h  |  16 ++
> >  examples/ip_pipeline/pipeline_be.h  |  13 ++
> >  examples/ip_pipeline/thread.c   |   9 +
> >  lib/librte_port/Makefile|   7 +
> >  lib/librte_port/rte_port_kni.c  | 316
> > 
> >  lib/librte_port/rte_port_kni.h  |  81 +
> >  14 files changed, 856 insertions(+), 2 deletions(-)
> >  create mode 100644 examples/ip_pipeline/config/kni.cfg
> >  create mode 100644 examples/ip_pipeline/kni/kni.c
> >  create mode 100644 examples/ip_pipeline/kni/kni.h
> >  create mode 100644 lib/librte_port/rte_port_kni.c
> >  create mode 100644 lib/librte_port/rte_port_kni.h
> >
> > diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
> > index f626386..e38a959 100644
> > --- a/doc/api/doxy-api-index.md
> > +++ b/doc/api/doxy-api-index.md
> > @@ -119,6 +119,7 @@ There are many libraries, so their headers may be
> > grouped by topics:
> >  [reass](@ref rte_port_ras.h),
> >  [sched](@ref rte_port_sched.h),
> >  [src/sink] (@ref rte_port_source_sink.h)
> > +[kni]  (@ref rte_port_kni.h)
> >* [table](@ref rte_table.h):
> >  [lpm IPv4] (@ref rte_table_lpm.h),
> >  [lpm IPv6] (@ref rte_table_lpm_ipv6.h),
> > diff --git a/examples/ip_pipeline/Makefile
> b/examples/ip_pipeline/Makefile
> > index 10fe1ba..848c2aa 100644
> > --- a/examples/ip_pipeline/Makefile
> > +++ b/examples/ip_pipeline/Makefile
> > @@ -43,9 +43,10 @@ include $(RTE_SDK)/mk/rte.vars.mk
> >  # binary name
> >  APP = ip_pipeline
> >
> > +VPATH += $(SRCDIR)/kni
> >  VPATH += $(SRCDIR)/pipeline
> >
> > -INC += $(wildcard *.h) $(wildcard pipeline/*.h)
> > +INC += $(wildcard *.h) $(wildcard pipeline/*.h) $(wildcard kni/*.h)
> >
> >  # all source are stored in SRCS-y
> >  SRCS-$(CONFIG_RTE_LIBRTE_PIPELINE) := main.c
> > @@ -56,6 +57,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_PIPELINE) += init.c
> >  SRCS-$(CONFIG_RTE_LIBRTE_PIPELINE) += thread.c
> >  SRCS-$(CONFIG_RTE_LIBRTE_PIPELINE) += thread_fe.c
> >  SRCS-$(CONFIG_RTE_LIBRTE_PIPELINE) += cpu_core_map.c
> > +SRCS-$(CONFIG_RTE_LIBRTE_PIPELINE) += kni.c
> >
> >  SRCS-$(CONFIG_RTE_LIBRTE_PIPELINE) += pipeline_common_be.c
> >  SRCS-$(CONFIG_RTE_LIBRTE_PIPELINE) += pipeline_common_fe.c
> > @@ -72,7 +74,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_PIPELINE) +=
> > pipeline_flow_actions.c
> >  SRCS-$(CONFIG_RTE_LIBRTE_PIPELINE) += pipeline_routing_be.c
> >  SRCS-$(CONFIG_RTE_LIBRTE_PIPELINE) += pipeline_routing.c
> >
> > -CFLAGS += -I$(SRCDIR) -I$(SRCDIR)/pipeline
> > +CFLAGS += -I$(SRCDIR) -I$(SRCDIR)/pipeline -I$(SRCDIR)/kni
> >  CFLAGS += -O3
> >  CFLAGS += $(WERROR_FLAGS) -Wno-error=unused-function -Wno-
> > error=unused-variable
> >
> > diff --git a/examples/ip_pipeline/app.h b/examples/ip_pipeline/app.h
> > index e775024..a86ce57 100644
> > --- a/examples/ip_pipeline/app.h
> > +++ b/examples/ip_pipeline/app.h
> > @@ -44,7 +44,9 @@
> >  #include 
> >
> >  #include 
> > +#include 
> >
> > +#include "kni.h"
> >  #include "cpu_core_map.h"
> >  #include "pipeline.h"
> >
> > @@ -99,6 +101,18 @@ struct app_pktq_hwq_out_params {
> >   struct rte_eth_txconf conf;
> >  };
> >
> > +struct app_kni_params {
> > + char *name;
> > + uint32_t parsed;
> > +
> > + uint32_t socket_id;
> > + uint32_t core_id;
> > + uint32_t hyper_th_id;
> > +
> > + uint32_t mempool_id;
> > + uint32_t burst;
> > +};
> > +
> >  struct app_pktq_swq_params {
> >   char *name;
> >   uint32_t parsed;
> > @@ -172,6 +186,7 @@ enum app_pktq_in_type {
> >   APP_PKTQ_IN_SWQ,
> >   APP_PKTQ_IN_TM,
> >   APP_PKTQ_IN_SOURCE,
> > + APP_PKTQ_IN_KNI,
> >  };
> >
> >  struct app_pktq_in_params {
> > 

[dpdk-dev] [RFC] kernel paramters like DPDK CLI options

2016-06-01 Thread Thomas Monjalon
Hi,

2016-06-01 14:04, Yuanhan Liu:
> Hi all,
> 
> I guess we (maybe just me :) have stated few times something like
> "hey, this kind of stuff is good to have, but you are trying to
> add an EAL CLI option for a specific subsystem/driver, which is
> wrong".

Yes

> One recent example that is still fresh in my mind is the one from
> Christian [0], that he made a proposal to introduce two new EAL
> options, --vhost-owner and --vhost-perm, to configure the vhost
> user socket file permission.
> 
> [0]: http://dpdk.org/ml/archives/dev/2016-April/037948.html
> 
> Another example is the one I met while enabling virtio 1.0 support.
> QEMU has the ability to support both virtio 0.95 (legacy) and 1.0
> (modern) at the same time for one virtio device, therefore, we
> could either use legacy driver or modern driver to operate the
> device. However, the current logic is we try with modern driver
> first, and then legacy driver if it failed. In above case, we will
> never hit the legacy driver. But sometimes, it's nice to let it
> force back to the legacy driver, say, for debug or compare purpose.
> 
> Apparently, adding a new EAL option like "--force-legacy" looks
> wrong.
> 
> The generic yet elegant solution I just thought of while having
> lunch is to add a new EAL option, say, --extra-options, where we
> could specify driver/subsystem specific options. As you see, it's
> nothing big deal, it just looks like Linux kernel parameters.
> 
> Take above two cases as example, it could be:
> 
> --extra-options "vhost-owner=kvm:kvm force-legacy"

I think it's better to have CLI options per device.
Currently we can pass devargs
- to PCI device via --pci-whitelist
- to virtual device via --vdev
I think we just need to refactor these options to have a generic
--device or keep the options in --vdev and add a new --pciopt
or something like that.

And more importantly, these devargs must be set via a new EAL API
to allow applications do these configurations without building/faking
some command line arguments.

To make it clear, applications use API and users use CLI (which call API).

> Note that those options could also be delimited by comma.
> 
> DPDK EAL then will provide some generic helper functions to get
> and parse those options, and let the specific driver/subsystem
> to invoke them to do the actual parse and do the proper action
> when some option is specified, say, virtio PMD driver will force
> back to legacy driver when "force-legacy" is given.
> 
> Comments? Makes sense to you guys, or something nice to have?

Thanks for starting the discussion.


[dpdk-dev] dpdk compilation issue on cumuluslinux

2016-06-01 Thread Ferruh Yigit
On 6/1/2016 8:37 AM, Raja Jayapal wrote:
> 
> 
> Hi All,
> 
> I am working on installing dpdk on cumuluslinux(2.5.7v).
> I have followed the steps as mentioned in the dpdk quick start guide for 
> installation, but the facing issues during compilation.
> 
> Steps followed:
>> git clone git://dpdk.org/dpdk
>> make config T=x86_64-native-linuxapp-gcc
>> sed -ri 's,(PMD_PCAP=).*,\1y,' build/.config
>> make
> 
> Error:
> ---
>   SYMLINK-FILE include/exec-env/rte_interrupts.h
>   SYMLINK-FILE include/exec-env/rte_kni_common.h
>   SYMLINK-FILE include/exec-env/rte_dom0_common.h
>   INSTALL-LIB librte_eal.a
> == Build lib/librte_eal/linuxapp/igb_uio
> make: *** /lib/modules/3.2.68-6/build: No such file or directory.  Stop.

This is the default location for the kernel source; it is possible to
override it with RTE_KERNELDIR:
make RTE_KERNELDIR=



[dpdk-dev] [PATCH v2] qat: fix phys address of content descriptor

2016-06-01 Thread Arek Kusztal
From: Arkadiusz Kusztal 

Fix an error in the computation of the physical address of the
content descriptor in the symmetric operations session

Fixes: 1703e94ac5ce ("qat: add driver for QuickAssist devices")

Signed-off-by: Arkadiusz Kusztal 
---
v2: Added fixes line to commit message

 drivers/crypto/qat/qat_crypto.c  | 9 ++---
 lib/librte_cryptodev/rte_cryptodev.c | 2 +-
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/crypto/qat/qat_crypto.c b/drivers/crypto/qat/qat_crypto.c
index 495ea1c..abe0511 100644
--- a/drivers/crypto/qat/qat_crypto.c
+++ b/drivers/crypto/qat/qat_crypto.c
@@ -807,12 +807,15 @@ static inline uint32_t adf_modulo(uint32_t data, uint32_t 
shift)
return data - mult;
 }

-void qat_crypto_sym_session_init(struct rte_mempool *mp, void *priv_sess)
+void qat_crypto_sym_session_init(struct rte_mempool *mp, void *sym_sess)
 {
-   struct qat_session *s = priv_sess;
+   struct rte_cryptodev_sym_session *sess = sym_sess;
+   struct qat_session *s = (void *)sess->_private;

PMD_INIT_FUNC_TRACE();
-   s->cd_paddr = rte_mempool_virt2phy(mp, &s->cd);
+   s->cd_paddr = rte_mempool_virt2phy(mp, sess) +
+   offsetof(struct qat_session, cd) +
+   offsetof(struct rte_cryptodev_sym_session, _private);
 }

 int qat_dev_config(__rte_unused struct rte_cryptodev *dev)
diff --git a/lib/librte_cryptodev/rte_cryptodev.c 
b/lib/librte_cryptodev/rte_cryptodev.c
index aa4ea42..960e2d5 100644
--- a/lib/librte_cryptodev/rte_cryptodev.c
+++ b/lib/librte_cryptodev/rte_cryptodev.c
@@ -956,7 +956,7 @@ rte_cryptodev_sym_session_init(struct rte_mempool *mp,
sess->mp = mp;

if (dev->dev_ops->session_initialize)
-   (*dev->dev_ops->session_initialize)(mp, sess->_private);
+   (*dev->dev_ops->session_initialize)(mp, sess);
 }

 static int
-- 
2.1.0
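
For readers of the diff, the address computation in the fix can be written
out as follows (illustration only, using the structures referenced in the
patch):

/*
 * The qat_session is embedded in the generic session's _private[] area,
 * and rte_mempool_virt2phy() must be given the mempool object itself
 * (sess), not a pointer into it, hence:
 *
 *   phys(&s->cd) = rte_mempool_virt2phy(mp, sess)
 *                + offsetof(struct rte_cryptodev_sym_session, _private)
 *                + offsetof(struct qat_session, cd)
 */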



[dpdk-dev] [REQUEST] New repository request for project SPP

2016-06-01 Thread Thomas Monjalon
2016-05-31 19:11, Ferruh Yigit:
> On 5/27/2016 2:59 PM, Thomas Monjalon wrote:
> > 2016-05-26 17:48, Ferruh Yigit:
> >> SPP: Soft Patch Panel, a new project on top of DPDK library.
> >>
> >>
> >> Code is close to its first release, and we would like to open source the
> >> project and host it under dpdk.org.
> >>
> >> Can you please create a new repository under dpdk.org named "spp",
> >> and add me as the maintainer of the repo?
> > 
> > Done
> 
> Thank you, it is working fine. I did push initial code.

Why have you pushed the DPDK code in this repository?
It is supposed to be an independent application, not an example.
http://dpdk.org/browse/apps/spp/commit/?id=3d71facfefbb


[dpdk-dev] [PATCH v5 1/3] mempool: support external handler

2016-06-01 Thread Hunt, David


On 5/31/2016 10:11 PM, Jerin Jacob wrote:
> On Tue, May 31, 2016 at 10:41:00PM +0200, Olivier MATZ wrote:
>> Hi,
>>
>> On 05/31/2016 06:03 PM, Jerin Jacob wrote:
>>> On Tue, May 31, 2016 at 04:37:02PM +0100, Hunt, David wrote:

 On 5/31/2016 9:53 AM, Jerin Jacob wrote:
> On Mon, May 30, 2016 at 12:27:26PM +0100, Hunt, David wrote:
>> New mempool handlers will use rte_mempool_create_empty(),
>> rte_mempool_set_handler(),
>> then rte_mempool_populate_*(). These three functions are new to this
>> release, to no problem
> Having separate APIs for external pool-manager create is worrisome in
> application perspective. Is it possible to have rte_mempool_[xmem]_create
> for the both external and existing SW pool manager and make
> rte_mempool_create_empty and rte_mempool_populate_*  internal functions.
>
> IMO, We can do that by selecting  specific rte_mempool_set_handler()
> based on _flags_ encoding, something like below
>
> bit 0 - 16   // generic bits uses across all the pool managers
> bit 16 - 23  // pool handler specific flags bits
> bit 24 - 31  // to select the specific pool manager(Up to 256 different 
> flavors of
> pool managers, For backward compatibility, make '0'(in 24-31) to select
> existing SW pool manager.
>
> and applications can choose the handlers by selecting the flag in
> rte_mempool_[xmem]_create, That way it will be easy in testpmd or any 
> other
> applications to choose the pool handler from command line etc in future.
 There might be issues with the 8-bit handler number, as we'd have to add an
 api call to
 first get the index of a given hander by name, then OR it into the flags.
 That would mean
 also extra API calls for the non-default external handlers. I do agree with
 the handler-specific
 bits though.
>>> That would be an internal API(upper 8 bits to handler name). Right ?
>>> Seems to be OK for me.
>>>
 Having the _empty and _set_handler  APIs seems to me to be OK for the
 moment. Maybe Olivier could comment?

>>> But need 3 APIs. Right? _empty , _set_handler and _populate ? I believe
>>> it is better reduce the public API in spec where ever possible ?
>>>
>>> Maybe Olivier could comment ?
>> Well, I think having 3 different functions is not a problem if the API
>> is clearer.
>>
>> In my opinion, the following:
>>  rte_mempool_create_empty()
>>  rte_mempool_set_handler()
>>  rte_mempool_populate()
>>
>> is clearer than:
>>  rte_mempool_create(15 args)
> But proposed scheme is not adding any new arguments to
> rte_mempool_create. It just extending the existing flag.
>
> rte_mempool_create(15 args) is still their as API for internal pool
> creation.
>
>> Splitting the flags into 3 groups, with one not beeing flags but a
>> pool handler number looks overcomplicated from a user perspective.
> I am concerned with seem less integration with existing applications,
> IMO, Its not worth having separate functions for external vs internal
> pool creation for application(now each every applications has to added this
> logic every where for no good reason), just my 2 cents.

I think that there is always going to be some extra code in the applications
that want to use an external mempool. The _set_handler approach does
create, set_handler, populate. The flags method queries the handler list to
get the index, sets the flag bits, then calls create. Both methods will
work.

But I think the _set_handler approach is more user friendly, therefore that
is the method I would lean towards.

> and we can remove "mbuf: get default mempool handler from configuration"
> change-set OR just add if RTE_MBUF_DEFAULT_MEMPOOL_HANDLER is defined 
> then set
> the same with rte_mempool_set_handler in rte_mempool_[xmem]_create.
>
> What do you think?
 The "configuration" patch is to allow users to quickly change the mempool
 handler
 by changing RTE_MBUF_DEFAULT_MEMPOOL_HANDLER to another string of a known
 handler. It could just as easily be left out and use the 
 rte_mempool_create.

>>> Yes, I understand, but I am trying to avoid build time constant. IMO, It
>>> would be better by default RTE_MBUF_DEFAULT_MEMPOOL_HANDLER is not
>>> defined in config. and for quick change developers can introduce the build
>>> with RTE_MBUF_DEFAULT_MEMPOOL_HANDLER="specific handler"
>> My understanding of the compile-time configuration option was
>> to allow a specific architecture to define a specific hw-assisted
>> handler by default.
>>
>> Indeed, if there is no such need for now, we may remove it. But
>> we need a way to select another handler, at least in test-pmd
>> (in command line arguments?).
> like txflags in testpmd, IMO, mempool flags will help to select the handlers
> seamlessly as suggest above.
>
> If we are _not_ taking the flags based selection scheme then it makes to
> keep 
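
For reference, a minimal sketch of the flag-encoding idea quoted above
(purely illustrative macros, not part of any DPDK release):

/* bits 24-31 of the mempool flags select the pool manager; 0 keeps the
 * existing SW ring manager for backward compatibility. */
#define MEMPOOL_F_POOL_MGR_SHIFT  24
#define MEMPOOL_F_POOL_MGR_MASK   (0xffu << MEMPOOL_F_POOL_MGR_SHIFT)
#define MEMPOOL_F_POOL_MGR(idx) \
        (((unsigned int)(idx) << MEMPOOL_F_POOL_MGR_SHIFT) & \
         MEMPOOL_F_POOL_MGR_MASK)

/* inside mempool creation, the index is recovered from the flags:
 *   mgr_idx = (flags & MEMPOOL_F_POOL_MGR_MASK) >> MEMPOOL_F_POOL_MGR_SHIFT;
 */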

[dpdk-dev] [vpp-dev] VLAN packets dropped... ?

2016-06-01 Thread Chandrasekar Kannan
Just checking back on this thread to see where things are.  Are we saying
this is a bug in dpdk or vpp? That part wasn't quite clear to me.

-Chandra



On Thu, May 26, 2016 at 3:56 PM, John Daley (johndale) 
wrote:

> John,
> As just discussed, what you suggest was considered but there are other
> apps depending a different behavior of the flag, so we thought the best
> thing to do is deprecate it. That is part of what Olivier's patch discussed
> in http://www.dpdk.org/ml/archives/dev/2016-May/038786.html does.  Adding
> a new ptype for VLAN tagged packets after the patch is accepted would allow
> VPP to check if the ptype is supported by the PMD and if so, use it to
> determine if the delivered packet actually has a VLAN tag in it. No need to
> know if stripping is enabled/disabled. If the ptype is not supported,  the
> app would have check the packet in SW.
>
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of John Lo (loj)
> > Sent: Thursday, May 26, 2016 11:11 AM
> > To: Ananyev, Konstantin ; Wiles, Keith
> > ; Chandrasekar Kannan 
> > Cc: vpp-dev ; Zhang, Helin  > intel.com>;
> > dev at dpdk.org
> > Subject: Re: [dpdk-dev] [vpp-dev] VLAN packets dropped... ?
> >
> > Hi Konstantin,
> >
> > Thanks for the link to DPDK discussion wrt this VLAN offload flag
> > PKT_RX_VLAN_PKT.  It seems the proposal was to deprecate
> > PKT_RX_VLAN_PKT  and introduce two new flags PKT_RX_VLAN_STRIPPED
> > and PKT_RX_QINQ_STRIPPED.
> >
> > It would be a really good idea to keep PKT_RX_VLAN_PKT to indicate at
> least
> > one VLAN tag is present on the packet.  For the case of i40e where its HW
> > cannot detect VLAN tag if strip is not enabled, it should be reasonable
> for the
> > i40e DPDK driver software to make a check and set this flag. I would
> think the
> > overhead to check the 1st ethertype field in the MAC header against dot1q
> > or dot1ad values in software be pretty minimal.
> >
> > Regards,
> > John
> >
> >
> > -Original Message-
> > From: Ananyev, Konstantin [mailto:konstantin.ananyev at intel.com]
> > Sent: Wednesday, May 25, 2016 3:23 PM
> > To: John Lo (loj); Wiles, Keith; Chandrasekar Kannan
> > Cc: vpp-dev; Zhang, Helin; dev at dpdk.org
> > Subject: RE: [vpp-dev] VLAN packets dropped... ?
> >
> >
> > > I suppose this has to do with what is expected usage of the
> > > PKT_RX_VLAN_PKT offload flag. Is it set only for VLAN packets with the
> > > VLAN stripped or should always be set if VLAN is/was present in the
> > > received packet. It seems that different DPDK drivers are behaving
> > differently which will make it really hard for VPP to take advantage of
> NIC
> > and driver offload capability to provide better performance.
> >
> > Yes, that's true ixgbe/igb from one side and i40e do raise
> PKT_RX_VLAN_PKT
> > in a different manner.
> > There is an attempt to make it unified across all supported devices:
> >  http://dpdk.org/dev/patchwork/patch/12938/
> >
> > Though, I am not sure it will help you with your issue.
> > As I understand, for you the desired behaviour is:
> > If it is a vlan packet, keep the packet intact (don't strip the vlan)
> and raise
> > PKT_RX_VLAN_PK inside mbuf->ol_flags.
> > That's what ixgbe is doing with rte_eth_conf.rxmode.hw_vlan_strip==0.
> > Correct?
> > As far as I know, i40e HW doesn't provide such ability.
> > i40e Receive HW Descriptor can only flag was VLAN tag stripped from the
> > packet or not, but if stripping is disabled it wouldn't indicate in any
> way is
> > VLAN tag is present inside the packet or not.
> > I am CC-ing it to dpdk.org in case I am missing something here.
> > Probably someone knows a way to make it work in that way.
> > Konstantin
> >
> > >
> > > If VPP cannot rely on offload flags for VLAN so packets always have to
> go
> > through ethernet-input node, there is a performance cost.
> > > For the 10GE case, before the inverse patch of the ixgbe driver, 10GE
> > > Rx-vector path removed support of offload flag with DPDK 16.04 and so
> > > ethernet-input node is always used. The 10GE IPv4 throughput rate
> > > dropped from 6.17MPPSx2 to 4.92MPPSx2 for bidirectional traffic (2
> > > streams each with a single IP address as destination) on a single core
> > > on my server. Konstantin suggested at that time to use scalar mode
> which
> > does support offload flags properly. The scalar mode did by-pass
> ethernet-
> > input and provided throughput of 5.72MPPS. We ended up inverse patched
> > the ixgbe driver to restore vector mode offload flag support as the
> original
> > restriction (the reason offload flag support was removed) would not
> affect
> > VPP.
> > >
> > > I think for 40GE driver to provide offload flag support in vector mode
> > > but not give indication of presence of VLAN tag is just wrong. This
> make the
> > offload flag support useless for VPP.
> > >
> > > Regards,
> > > John
> > >
> > > -Original Message-
> > > From: Ananyev, Konstantin [mailto:konstantin.ananyev at 

[dpdk-dev] [PATCH 2/2] qede: return LAN stats to MFW

2016-06-01 Thread Thomas Monjalon
2016-05-31 19:21, Harish Patil:
> 
> >On Fri, May 06, 2016 at 09:21:31PM -0700, Rasesh Mody wrote:
> >> From: Harish Patil 
> >> 
> >> Under certain scenarios, MFW periodically polls the driver
> >> for LAN statistics. This patch implements the osal hook to
> >> fill in the stats.
> >> 
> >> Fixes: ffa002d318d36 ("qede: add base driver")
> >> 
> >What is MFW?
> >
> >/Bruce
> >
> 
> MFW - Management FirmWare running on the card.

So MFW can probably be replaced by firmware in the title.


[dpdk-dev] [RFC] Yet another option for DPDK options

2016-06-01 Thread Stephen Hemminger
On Wed, 1 Jun 2016 17:18:26 +0100
Bruce Richardson  wrote:

> On Wed, Jun 01, 2016 at 10:58:41AM -0500, Jay Rolette wrote:
> > On Wed, Jun 1, 2016 at 10:00 AM, Wiles, Keith  
> > wrote:
> > 
> > > Started from the link below, but did not want to highjack the thread.
> > > http://dpdk.org/ml/archives/dev/2016-June/040021.html
> > >
> > > I was thinking about this problem from a user perspective and command line
> > > options are very difficult to manage specifically when you have a large
> > > number of options as we have in dpdk. I see all of these options as a type
> > > of database of information for the DPDK and the application, because the
> > > application command line options are also getting very complex as well.
> > >
> > > I have been looking at a number of different options here and the
> > > direction I was thinking was using a file for the options and
> > > configurations with the data in a clean format. It could have been a INI
> > > file or JSON or XML, but they all seem to have some problems I do not 
> > > like.
> > > The INI file is too flat and I wanted a hierarchy in the data, the JSON
> > > data is similar and XML is just hard to read. I wanted to be able to 
> > > manage
> > > multiple applications and possible system the DPDK/app runs. The problem
> > > with the above formats is they are just data and not easy to make 
> > > decisions
> > > about the system and applications at runtime.
> > >
> > 
> > INI format is simplest for users to read, but if you really need hierarchy,
> > JSON will do that just fine. Not sure what you mean by "JSON data is
> > similar"...
> > 
> > 
> I'd be quite concerned if we start needing lots of hierarchies for 
> configuration.
> 
> I'd really just like to see ini file format used for this because:
> * it's a well understood, simple format
> * very easily human readable and editable
> * lots of support for it in lots of languages
> * hierarchies are possible in it too - just not as easy as in other formats
>   though. [In a previous life I worked with ini files which had address
>   hierarchies 6-levels deep in them. It wasn't hard to work with]
> * it works well with grep since you must have one value per-line
> * it allows comments
> * we already have a DPDK library for parsing them
> 
> However, for me the biggest advantage of using something like ini is that it
> would force us to keep things simple!

Agreed, INI is much easier to deal with than JSON.

Also, let's still keep the configuration options to a minimum. And the defaults
must still work.
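
For what it's worth, the existing DPDK ini parser mentioned above
(librte_cfgfile) already covers the basic need; a minimal sketch, with a
made-up section and entry name:

#include <stdio.h>
#include <rte_cfgfile.h>

/* Load an ini-style config and read one driver/library option from it. */
static int
load_app_config(const char *path)
{
        struct rte_cfgfile *cfg = rte_cfgfile_load(path, 0);
        const char *handler;

        if (cfg == NULL)
                return -1;

        handler = rte_cfgfile_get_entry(cfg, "mempool", "handler");
        if (handler != NULL)
                printf("mempool handler: %s\n", handler);

        rte_cfgfile_close(cfg);
        return 0;
}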



[dpdk-dev] [PATCH] mbuf: extend rte_mbuf_prefetch_part* to support more prefetching methods

2016-06-01 Thread Jerin Jacob
On Wed, Jun 01, 2016 at 11:29:47AM +0800, Jianbo Liu wrote:
> On 1 June 2016 at 03:28, Olivier MATZ  wrote:
> > Hi Jianbo,
> >
> > On 05/31/2016 05:06 AM, Jianbo Liu wrote:
> >> Change the inline function to macro with parameters
> >>
> >> Signed-off-by: Jianbo Liu 
> >>
> >> [...]
> >> --- a/lib/librte_mbuf/rte_mbuf.h
> >> +++ b/lib/librte_mbuf/rte_mbuf.h
> >> @@ -849,14 +849,15 @@ struct rte_mbuf {
> >>   * in the receive path. If the cache line of the architecture is higher 
> >> than
> >>   * 64B, the second part will also be prefetched.
> >>   *
> >> + * @param method
> >> + *   The prefetch method: prefetch0, prefetch1, prefetch2 or
> >> + *prefetch_non_temporal.
> >> + *
> >>   * @param m
> >>   *   The pointer to the mbuf.
> >>   */
> >> -static inline void
> >> -rte_mbuf_prefetch_part1(struct rte_mbuf *m)
> >> -{
> >> - rte_prefetch0(&m->cacheline0);
> >> -}
> >> +#define RTE_MBUF_PREFETCH_PART1(method, m)   \
> >> + rte_##method(&(m)->cacheline0)
> >
> > I'm not very fan of this macro, because it allows to
> > really do everything):
> >
> >   RTE_MBUF_PREFETCH_PART1(pktmbuf_free, m)
> >
> > would expand as:
> >
> >   rte_pktmbuf_free(m)
> >
> >
> > I'd prefer to have a switch case like this, almost similar
> > to what Keith proposed in the initial discussion for my
> > patch:
> >
> > enum rte_mbuf_prefetch_type {
> > PREFETCH0,
> > PREFETCH1,
> > ...
> > };
> >
> > static inline void
> > rte_mbuf_prefetch_part1(enum rte_mbuf_prefetch_type type,
> > struct rte_mbuf *m)
> > {
> > switch (type) {
> > case PREFETCH0:
> > rte_prefetch0(&m->cacheline0);
> > break;
> > case PREFETCH1:
> > rte_prefetch1(&m->cacheline0);
> > break;
> > ...
> > }
> >
> How about adding these to forbid the illegal use of this macro?
> enum rte_mbuf_prefetch_type {
>  ENUM_prefetch0,
>  ENUM_prefetch1,
>  ...
> };
> 
> #define RTE_MBUF_PREFETCH_PART1(type, m) \
> if (ENUM_##type == ENUM_prefetch0) \
> rte_prefetch0(&(m)->cacheline0);   \
> else if (ENUM_##type == ENUM_prefetch1) \
> rte_prefetch1(&(m)->cacheline0); \
> 
> 
> >
> > Some questions: could you give some details about the use
> > of non-temporal prefetch in ixgbe_vec_neon? What are the
> > pros and cons, and would it be useful in other drivers?
> > Currently all drivers are doing prefetch0 when they prefetch
> > the mbuf structure. Some drivers use prefetch1 for data.
> >
> It's for performance consideration, and only on armv8a platform.

Strictly speaking it is not armv8 specific; IA also implemented this API with
the _MM_HINT_NTA hint.

Do we really need a non-temporal/transient version of prefetch for ixgbe?
If so, it makes sense to keep it for x86 as well, right?

The primary use case for the transient version would be pipeline
mode, where the same CPU won't consume the packet.

/**
 * Prefetch a cache line into all cache levels (non-temporal/transient
 * version)
 *
 * The non-temporal prefetch is intended as a prefetch hint that
 * processor will
 * use the prefetched data only once or short period, unlike the
 * rte_prefetch0() function which imply that prefetched data to use
 * repeatedly.
 *
 * @param p
 *   Address to prefetch
 */
static inline void rte_prefetch_non_temporal(const volatile void *p); 

> 
> >
> > By the way, I did not try to apply the patch, but it looks
> > it's on top of dpdk-next-net/rel_16_07, right?
> >
> Yes


[dpdk-dev] [PATCH] mbuf: extend rte_mbuf_prefetch_part* to support more prefetching methods

2016-06-01 Thread Jianbo Liu
On 1 June 2016 at 03:28, Olivier MATZ  wrote:
> Hi Jianbo,
>
> On 05/31/2016 05:06 AM, Jianbo Liu wrote:
>> Change the inline function to macro with parameters
>>
>> Signed-off-by: Jianbo Liu 
>>
>> [...]
>> --- a/lib/librte_mbuf/rte_mbuf.h
>> +++ b/lib/librte_mbuf/rte_mbuf.h
>> @@ -849,14 +849,15 @@ struct rte_mbuf {
>>   * in the receive path. If the cache line of the architecture is higher than
>>   * 64B, the second part will also be prefetched.
>>   *
>> + * @param method
>> + *   The prefetch method: prefetch0, prefetch1, prefetch2 or
>> + *prefetch_non_temporal.
>> + *
>>   * @param m
>>   *   The pointer to the mbuf.
>>   */
>> -static inline void
>> -rte_mbuf_prefetch_part1(struct rte_mbuf *m)
>> -{
>> - rte_prefetch0(&m->cacheline0);
>> -}
>> +#define RTE_MBUF_PREFETCH_PART1(method, m)   \
>> + rte_##method(&(m)->cacheline0)
>
> I'm not very fan of this macro, because it allows to
> really do everything):
>
>   RTE_MBUF_PREFETCH_PART1(pktmbuf_free, m)
>
> would expand as:
>
>   rte_pktmbuf_free(m)
>
>
> I'd prefer to have a switch case like this, almost similar
> to what Keith proposed in the initial discussion for my
> patch:
>
> enum rte_mbuf_prefetch_type {
> PREFETCH0,
> PREFETCH1,
> ...
> };
>
> static inline void
> rte_mbuf_prefetch_part1(enum rte_mbuf_prefetch_type type,
> struct rte_mbuf *m)
> {
> switch (type) {
> case PREFETCH0:
> rte_prefetch0(&m->cacheline0);
> break;
> case PREFETCH1:
> rte_prefetch1(&m->cacheline0);
> break;
> ...
> }
>
How about adding these to forbid the illegal use of this macro?
enum rte_mbuf_prefetch_type {
 ENUM_prefetch0,
 ENUM_prefetch1,
 ...
};

#define RTE_MBUF_PREFETCH_PART1(type, m) \
if (ENUM_##type == ENUM_prefetch0) \
rte_prefetch0(&(m)->cacheline0);   \
else if (ENUM_##type == ENUM_prefetch1) \
rte_prefetch1(&(m)->cacheline0); \


>
> Some questions: could you give some details about the use
> of non-temporal prefetch in ixgbe_vec_neon? What are the
> pros and cons, and would it be useful in other drivers?
> Currently all drivers are doing prefetch0 when they prefetch
> the mbuf structure. Some drivers use prefetch1 for data.
>
It's for performance consideration, and only on armv8a platform.

>
> By the way, I did not try to apply the patch, but it looks
> it's on top of dpdk-next-net/rel_16_07, right?
>
Yes


[dpdk-dev] SR-IOV/DPDK/VPP with vfio-pci

2016-06-01 Thread Chandrasekar Kannan
I have been attempting to run a VPP (DPDK) application with SR-IOV enabled,
using the vfio-pci driver / i40e / XL710 NICs.
I'm encountering a SEGV on the i40e code path.

Has anyone else seen this behaviour?

full mail thread discussion with vpp-dev is here -
https://lists.fd.io/pipermail/vpp-dev/2016-June/001348.html



here's a backtrace during vpp crash...

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f2b811fd700 (LWP 8725)]
0x00455539 in rx_recv_pkts ()
(gdb) backtrace
#0  0x00455539 in rx_recv_pkts ()
#1  0x00455e76 in i40e_recv_pkts_bulk_alloc ()
#2  0x7f2be43d8bf7 in rte_eth_rx_burst (nb_pkts=256,
rx_pkts=0x7f2bb1e66a80, queue_id=1, port_id=)
at
/w/workspace/vpp-merge-master-centos7/build-root/install-vpp-native/dpdk/include/rte_ethdev.h:2641
#3  dpdk_rx_burst (dm=0x7f2be4673980 , queue_id=1,
xd=0x7f2bb1898540)
at
/w/workspace/vpp-merge-master-centos7/build-data/../vnet/vnet/devices/dpdk/dpdk_priv.h:64
#4  dpdk_device_input (dm=0x7f2be4673980 , queue_id=1,
cpu_index=, node=0x7f2bb1ef59fc, xd=)
at
/w/workspace/vpp-merge-master-centos7/build-data/../vnet/vnet/devices/dpdk/node.c:508
#5  dpdk_input_rss (vm=, node=0x7f2bb1ef59fc, f=)
at
/w/workspace/vpp-merge-master-centos7/build-data/../vnet/vnet/devices/dpdk/node.c:821
#6  0x7f2be47eb77a in dispatch_node (vm=vm at entry=0x7f2bb1f02cd4,
node=node at entry=0x7f2bb1ef59fc, type=type at entry=VLIB_NODE_TYPE_INPUT,
dispatch_state=dispatch_state at entry=VLIB_NODE_STATE_POLLING,
frame=frame at entry=0x0, last_time_stamp=1482097661494035)
at
/w/workspace/vpp-merge-master-centos7/build-data/../vlib/vlib/main.c:996
#7  0x7f2be43db6d8 in dpdk_worker_thread_internal (have_io_threads=0,
callback=0x0, vm=0x7f2bb1f02cd4)
at
/w/workspace/vpp-merge-master-centos7/build-data/../vnet/vnet/devices/dpdk/threads.c:209
#8  dpdk_worker_thread (w=, io_name=,
callback=0x0)
at
/w/workspace/vpp-merge-master-centos7/build-data/../vnet/vnet/devices/dpdk/threads.c:265
#9  0x7f2be3791530 in clib_calljmp () at
/w/workspace/vpp-merge-master-centos7/build-data/../vppinfra/vppinfra/longjmp.S:110
#10 0x7f2b811fcc40 in ?? ()
#11 0x004eb9c7 in eal_thread_loop ()
---Type <return> to continue, or q <return> to quit---
#12 0x in ?? ()
(gdb)
(gdb) thread apply all bt

Thread 10 (Thread 0x7f2b8c685700 (LWP 8722)):
#0  0x7f2be32584ad in accept () at ../sysdeps/unix/syscall-template.S:81
#1  0x004ef178 in pci_vfio_mp_sync_thread ()
#2  0x7f2be3251dc5 in start_thread (arg=0x7f2b8c685700) at
pthread_create.c:308
#3  0x7f2be2992ced in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 9 (Thread 0x7f2b821ff700 (LWP 8723)):
#0  0x7f2be29932c3 in epoll_wait () at
../sysdeps/unix/syscall-template.S:81
#1  0x004efe04 in eal_intr_thread_main ()
#2  0x7f2be3251dc5 in start_thread (arg=0x7f2b821ff700) at
pthread_create.c:308
#3  0x7f2be2992ced in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 8 (Thread 0x7f2b819fe700 (LWP 8724)):
#0  0x0045526b in rx_recv_pkts ()
#1  0x00455e76 in i40e_recv_pkts_bulk_alloc ()
#2  0x7f2be43d8bf7 in rte_eth_rx_burst (nb_pkts=256,
rx_pkts=0x7f2bb1e75580, queue_id=0, port_id=)
at
/w/workspace/vpp-merge-master-centos7/build-root/install-vpp-native/dpdk/include/rte_ethdev.h:2641
#3  dpdk_rx_burst (dm=0x7f2be4673980 , queue_id=0,
xd=0x7f2bb1a1fd40)
at
/w/workspace/vpp-merge-master-centos7/build-data/../vnet/vnet/devices/dpdk/dpdk_priv.h:64
---Type <return> to continue, or q <return> to quit---
#4  dpdk_device_input (dm=0x7f2be4673980 , queue_id=0,
cpu_index=, node=0x7f2bb1ea18cc, xd=)
at
/w/workspace/vpp-merge-master-centos7/build-data/../vnet/vnet/devices/dpdk/node.c:508
#5  dpdk_input_rss (vm=, node=0x7f2bb1ea18cc, f=)
at
/w/workspace/vpp-merge-master-centos7/build-data/../vnet/vnet/devices/dpdk/node.c:821
#6  0x7f2be47eb77a in dispatch_node (vm=vm at entry=0x7f2bb1f237d4,
node=node at entry=0x7f2bb1ea18cc, type=type at entry=VLIB_NODE_TYPE_INPUT,
dispatch_state=dispatch_state at entry=VLIB_NODE_STATE_POLLING,
frame=frame at entry=0x0, last_time_stamp=148209754800)
at
/w/workspace/vpp-merge-master-centos7/build-data/../vlib/vlib/main.c:996
#7  0x7f2be43db6d8 in dpdk_worker_thread_internal (have_io_threads=0,
callback=0x0, vm=0x7f2bb1f237d4)
at
/w/workspace/vpp-merge-master-centos7/build-data/../vnet/vnet/devices/dpdk/threads.c:209
#8  dpdk_worker_thread (w=, io_name=,
callback=0x0)
at
/w/workspace/vpp-merge-master-centos7/build-data/../vnet/vnet/devices/dpdk/threads.c:265
#9  0x7f2be3791530 in clib_calljmp () at
/w/workspace/vpp-merge-master-centos7/build-data/../vppinfra/vppinfra/longjmp.S:110
#10 0x7f2b819fdc40 in ?? ()
#11 0x004eb9c7 in eal_thread_loop ()
#12 0x in ?? ()

Thread 7 (Thread 0x7f2b811fd700 (LWP 8725)):
#0  0x00455539 in rx_recv_pkts ()
#1  0x00455e76 in 

[dpdk-dev] dpdk pipeline multi app

2016-06-01 Thread Ramin Najjarbashi
Thanks,
but I want to run other processes like rxtx_callbacks or a customized script.

On 31 May 2016 at 17:42, Singh, Jasvinder  wrote:

> Hi Ramin,
>
>
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Ramin Najjarbashi
> > Sent: Tuesday, May 31, 2016 8:19 AM
> > To: dev at dpdk.org
> > Subject: [dpdk-dev] dpdk pipeline multi app
> >
> > Hi,
> > how can we arrive at this architecture?
> > Every client is different, like IPS, firewall, DPI and others.
> >
> >
> > http://stackoverflow.com/questions/37539194/dpdk-pipeline-multi-app
>
> Have a look at example/ip_pipeline sample app. It contains various packet
> processing modules such as flow-classification firewall, flow-actions,
> routing, etc., developed using similar template.
>
> Jasvinder
>


[dpdk-dev] [REQUEST] New repository request for project SPP

2016-06-01 Thread Ferruh Yigit
On 6/1/2016 10:48 AM, Thomas Monjalon wrote:
> 2016-05-31 19:11, Ferruh Yigit:
>> On 5/27/2016 2:59 PM, Thomas Monjalon wrote:
>>> 2016-05-26 17:48, Ferruh Yigit:
 SPP: Soft Patch Panel, a new project on top of DPDK library.


 Code is close to its first release, and we would like to open source the
 project and host it under dpdk.org.

 Can you please create a new repository under dpdk.org named "spp",
 and add me as the maintainer of the repo?
>>>
>>> Done
>>
>> Thank you, it is working fine. I did push initial code.
> 
> Why have you pushed the DPDK code in this repository?
> It is supposed to be an independent application, not an example.
>   http://dpdk.org/browse/apps/spp/commit/?id=3d71facfefbb
> 

The target is to make it an independent application, not an example.
Currently it is sitting in the examples folder, because that is how it
started.

The plan is to separate it completely when it becomes more mature.



[dpdk-dev] [RFC] Yet another option for DPDK options

2016-06-01 Thread Jay Rolette
On Wed, Jun 1, 2016 at 10:00 AM, Wiles, Keith  wrote:

> Started from the link below, but did not want to highjack the thread.
> http://dpdk.org/ml/archives/dev/2016-June/040021.html
>
> I was thinking about this problem from a user perspective and command line
> options are very difficult to manage specifically when you have a large
> number of options as we have in dpdk. I see all of these options as a type
> of database of information for the DPDK and the application, because the
> application command line options are also getting very complex as well.
>
> I have been looking at a number of different options here and the
> direction I was thinking was using a file for the options and
> configurations with the data in a clean format. It could have been a INI
> file or JSON or XML, but they all seem to have some problems I do not like.
> The INI file is too flat and I wanted a hierarchy in the data, the JSON
> data is similar and XML is just hard to read. I wanted to be able to manage
> multiple applications and possibly the system the DPDK/app runs on. The problem
> with the above formats is they are just data and not easy to make decisions
> about the system and applications at runtime.
>

INI format is simplest for users to read, but if you really need hierarchy,
JSON will do that just fine. Not sure what you mean by "JSON data is
similar"...


> If the "database" of information could be queried by the EAL, drivers and
> application then we do not need to try and create a complex command line.
> It would be nice to execute a DPDK applications like this:
>
> ./some_dpdk_app --config-file dpdk-config-filename
>

+1 much nicer than the mess that is EAL command line args today.


> The dpdk-config-filename could contain a lot of information and be able to
> startup multiple different applications. The dpdk-config-file could also
> include other config files to complete the configuration. The format of the
> data in the config file needs to be readable, but allow the user to put in
> new options, needs to be hierarchical in nature and have some simple
> functions to execute if required.
>
> The solution I was thinking is the file information is really just a
> fragment of a scripting language, which the DPDK application contains this
> scripting language interpreter. I was looking at using Lua lua.org as the
> scripting language interpreter it is small and easy to understand. Python
> and others are very big and require a lot of resources and Lua requires
> very few system resources. Also I did not want to have to write a parser
> (lex/yacc). The other nice feature of Lua is you can create a sandbox for
> the code to run in and limit the type of system resources and APIs that can
> be accessed by the application and configuration. Lua can be trimmed down
> to a fairly small size and builds on just about any system or we can just
> install Lua on the system without changes from a rpm or deb.
>

There are JSON and INI file parser libraries for pretty much any language
you care to use. That shouldn't be a factor in choosing file format.

The argument about "Python and others are very big and require a lot of
resources" doesn't end up mattering much since it is already required by a
couple of the DPDK tools (in particular, dpdk_nic_bind.py).


> I use Lua in pktgen at this time and the interface between "C" and Lua is
> very simple and easy. Currently I include Lua in Pktgen, but I could have
> just used a system library.
>
> The data in the config file can be data statements along with some limited
> code to make some data changes at run time without having to modify the
> real application. Here is a simple config file I wrote: Some of the options
> do not make sense to be in the file at the same time, but wanted to see all
> of the options. The mk_lcore_list() and mk_coremap() are just Lua functions
> we can preload to help convert the simple strings into real data in this
> case tables of information. The application could be something like pktgen
> = { map = { ... }, more_options = 1, } this allows the same file to possibly
> contain many application configurations. Needs a bit more work.
>
> dpdk_default = {
>


> }
>
> The EAL, driver, application, ... would query an API to access the data and
> the application can change his options quickly without modifying the code.
>
> Anyway comments are welcome.
>
> Regards,
> Keith
>

I like the concept overall. I'd suggest separating out the Lua thing. Lua's
fine for scripting, but nothing here really requires it or saves a lot of
development work.

Jay
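
To make the embedding cost concrete: a minimal sketch, assuming the standard
Lua 5.x C API and a config fragment that defines a global table named
dpdk_default with a plain string field lcore_list (both names come from the
example above; the field type is an assumption):

#include <stdio.h>
#include <lua.h>
#include <lauxlib.h>
#include <lualib.h>

/* Load a config fragment and print dpdk_default.lcore_list. */
static int load_dpdk_config(const char *path)
{
	lua_State *L = luaL_newstate();

	luaL_openlibs(L);		/* a real app may want a sandbox instead */
	if (luaL_dofile(L, path) != 0) {
		fprintf(stderr, "config error: %s\n", lua_tostring(L, -1));
		lua_close(L);
		return -1;
	}
	lua_getglobal(L, "dpdk_default");
	lua_getfield(L, -1, "lcore_list");
	printf("lcore_list = %s\n", lua_tostring(L, -1));
	lua_close(L);
	return 0;
}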


[dpdk-dev] [dpdk-dev,v5,1/3] mempool: support external handler

2016-06-01 Thread Hunt, David


On 5/31/2016 9:40 PM, Olivier MATZ wrote:

[...]

>> +/** Structure defining a mempool handler. */
> Later in the text, I suggested to rename rte_mempool_handler to
> rte_mempool_ops.
> I believe that it explains the purpose of this struct better. It
> would improve
> consistency in function names (the *_ext_* mark is very strange and
> inconsistent).
 I agree. I've gone through all the code and renamed to
 rte_mempool_handler_ops.
>>> Ok. I meant rte_mempool_ops because I find the word "handler" to be
>>> redundant.
>> I prefer the use of the word handler, unless others also have opinions
>> either way?
> Well, I think rte_mempool_ops is clear enough, and shorter,
> so I'd vote for it.
>

OK, I've just changed it. It will be in next revision. :)

Regards,
Dave.



[dpdk-dev] [PATCH v3 2/2] examples/ethtool: get reg width to allocate memory

2016-06-01 Thread z...@semihalf.com
From: Zyta Szpak 

Version 2 of fixing the fixed register width assumption.
Not every device uses 32-bit wide register. The app was allocating too
little space for 64-bit registers which resulted in memory corruption.
This commit resolves this by getting the size of a register in bytes for
a specific device. If the device does not implement this function, it
falls back to sizeof(uint32_t).

Signed-off-by: Zyta Szpak 
---
 examples/ethtool/lib/rte_ethtool.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/examples/ethtool/lib/rte_ethtool.c 
b/examples/ethtool/lib/rte_ethtool.c
index 42e05f1..59191ca 100644
--- a/examples/ethtool/lib/rte_ethtool.c
+++ b/examples/ethtool/lib/rte_ethtool.c
@@ -88,10 +88,14 @@ int
 rte_ethtool_get_regs_len(uint8_t port_id)
 {
int count_regs;
+   int reg_width;

count_regs = rte_eth_dev_get_reg_length(port_id);
+   reg_width = rte_eth_dev_get_reg_width(port_id);
+   if (reg_width < 0)
+   reg_width = sizeof(uint32_t);
if (count_regs > 0)
-   return count_regs * sizeof(uint32_t);
+   return count_regs * reg_width;
return count_regs;
 }

-- 
1.9.1



[dpdk-dev] [PATCH v3 1/2] ethdev: add callback to get register size in bytes

2016-06-01 Thread z...@semihalf.com
From: Zyta Szpak 

Version 2 of fixing the fixed register width assumption.
rte_eth_dev_get_reg_length and rte_eth_dev_get_reg callbacks
do not provide the register size to the app in any way. It is
needed to allocate the proper number of bytes before retrieving
the registers' content with rte_eth_dev_get_reg.

Signed-off-by: Zyta Szpak 
---
 lib/librte_ether/rte_ethdev.c  | 12 
 lib/librte_ether/rte_ethdev.h  | 18 ++
 lib/librte_ether/rte_ether_version.map |  7 +++
 3 files changed, 37 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index a31018e..e0765f8 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -3231,6 +3231,18 @@ rte_eth_dev_get_reg_length(uint8_t port_id)
 }

 int
+rte_eth_dev_get_reg_width(uint8_t port_id)
+{
+   struct rte_eth_dev *dev;
+
+   RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+
+   dev = &rte_eth_devices[port_id];
+   RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->get_reg_width, -ENOTSUP);
+   return (*dev->dev_ops->get_reg_width)(dev);
+}
+
+int
 rte_eth_dev_get_reg_info(uint8_t port_id, struct rte_dev_reg_info *info)
 {
struct rte_eth_dev *dev;
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 2757510..552eaed 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1292,6 +1292,9 @@ typedef int (*eth_timesync_write_time)(struct rte_eth_dev 
*dev,
 typedef int (*eth_get_reg_length_t)(struct rte_eth_dev *dev);
 /**< @internal Retrieve device register count  */

+typedef int (*eth_get_reg_width_t)(struct rte_eth_dev *dev);
+/**< @internal Retrieve device register byte number */
+
 typedef int (*eth_get_reg_t)(struct rte_eth_dev *dev,
struct rte_dev_reg_info *info);
 /**< @internal Retrieve registers  */
@@ -1455,6 +1458,8 @@ struct eth_dev_ops {

eth_get_reg_length_t get_reg_length;
/**< Get # of registers */
+   eth_get_reg_width_t get_reg_width;
+   /**< Get # of bytes in register */
eth_get_reg_t get_reg;
/**< Get registers */
eth_get_eeprom_length_t get_eeprom_length;
@@ -3971,6 +3976,19 @@ int rte_eth_tx_queue_info_get(uint8_t port_id, uint16_t 
queue_id,
  */
 int rte_eth_dev_get_reg_length(uint8_t port_id);

+/*
+ * Retrieve the number of bytes in register for a specific device
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @return
+ *   - (>=0) number of registers if successful.
+ *   - (-ENOTSUP) if hardware doesn't support.
+ *   - (-ENODEV) if *port_id* invalid.
+ *   - others depends on the specific operations implementation.
+ */
+int rte_eth_dev_get_reg_width(uint8_t port_id);
+
 /**
  * Retrieve device registers and register attributes
  *
diff --git a/lib/librte_ether/rte_ether_version.map 
b/lib/librte_ether/rte_ether_version.map
index 214ecc7..568509c 100644
--- a/lib/librte_ether/rte_ether_version.map
+++ b/lib/librte_ether/rte_ether_version.map
@@ -132,3 +132,10 @@ DPDK_16.04 {
rte_eth_tx_buffer_set_err_callback;

 } DPDK_2.2;
+
+DPDK_16.07 {
+   global:
+
+   rte_eth_dev_get_reg_width;
+
+} DPDK_16.04;
\ No newline at end of file
-- 
1.9.1
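
A minimal usage sketch from an application's point of view, combining the new
width callback with the existing length/info calls; error handling is
abbreviated and the exact rte_dev_reg_info field usage is an assumption:

#include <stdlib.h>
#include <string.h>
#include <rte_ethdev.h>

/* Allocate count * width bytes and fetch the register dump. */
static void *dump_regs(uint8_t port_id, struct rte_dev_reg_info *info)
{
	int count = rte_eth_dev_get_reg_length(port_id);
	int width = rte_eth_dev_get_reg_width(port_id);
	void *buf;

	if (count <= 0)
		return NULL;
	if (width < 0)		/* driver has no get_reg_width callback */
		width = sizeof(uint32_t);

	buf = calloc(count, width);
	if (buf == NULL)
		return NULL;

	memset(info, 0, sizeof(*info));
	info->data = buf;
	info->length = count;
	if (rte_eth_dev_get_reg_info(port_id, info) != 0) {
		free(buf);
		return NULL;
	}
	return buf;
}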



[dpdk-dev] [PATCH v2 2/2] igb: VF supports mailbox interruption for PF link up/down

2016-06-01 Thread Wenzhuo Lu
In this scenario, kernel PF + DPDK VF, when PF finds the link
state changes, up -> down or down -> up, it will send a
message to VF by mailbox. This link state change may be
triggered by PHY disconnection/reconnection, configuration
like *ifconfig down/up* or interface parameter, like MTU,
change.
This patch enables the support of the mailbox interruption,
so VF can receive the message of link up/down.
After VF receives this message, the VF port needs to be reset to
recover. So the handler of this message registers a reset
callback to let APP reset the VF port.

Signed-off-by: Wenzhuo Lu 
---
 doc/guides/rel_notes/release_16_07.rst |   2 +-
 drivers/net/e1000/igb_ethdev.c | 159 +
 2 files changed, 160 insertions(+), 1 deletion(-)

diff --git a/doc/guides/rel_notes/release_16_07.rst 
b/doc/guides/rel_notes/release_16_07.rst
index 990bd46..a761e3c 100644
--- a/doc/guides/rel_notes/release_16_07.rst
+++ b/doc/guides/rel_notes/release_16_07.rst
@@ -47,7 +47,7 @@ New Features
   * Dropped specific Xen Dom0 code.
   * Dropped specific anonymous mempool code in testpmd.

-* **Added mailbox interruption support for ixgbe VF.**
+* **Added mailbox interruption support for ixgbe/igb VF.**

   When the link becomes down or up, PF will use mailbox message to notice
   VF. To handle this link up/down event, add the mailbox interruption
diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
index f0921ee..b0e5e6a 100644
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -86,6 +86,12 @@
 #define E1000_INCVALUE_82576 (16 << IGB_82576_TSYNC_SHIFT)
 #define E1000_TSAUXC_DISABLE_SYSTIME 0x8000

+#define E1000_VTIVAR_MISC0x01740
+#define E1000_VTIVAR_MISC_MASK   0xFF
+#define E1000_VTIVAR_VALID   0x80
+#define E1000_VTIVAR_MISC_MAILBOX0
+#define E1000_VTIVAR_MISC_INTR_MASK  0x3
+
 static int  eth_igb_configure(struct rte_eth_dev *dev);
 static int  eth_igb_start(struct rte_eth_dev *dev);
 static void eth_igb_stop(struct rte_eth_dev *dev);
@@ -259,6 +265,9 @@ static void eth_igb_assign_msix_vector(struct e1000_hw *hw, 
int8_t direction,
 static void eth_igb_write_ivar(struct e1000_hw *hw, uint8_t msix_vector,
   uint8_t index, uint8_t offset);
 static void eth_igb_configure_msix_intr(struct rte_eth_dev *dev);
+static void eth_igbvf_interrupt_handler(struct rte_intr_handle *handle,
+   void *param);
+static void igbvf_mbx_process(struct rte_eth_dev *dev);

 /*
  * Define VF Stats MACRO for Non "cleared on read" register
@@ -554,6 +563,41 @@ igb_intr_disable(struct e1000_hw *hw)
E1000_WRITE_FLUSH(hw);
 }

+static inline void
+igbvf_intr_enable(struct rte_eth_dev *dev)
+{
+   struct e1000_hw *hw =
+   E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+
+   /* only for mailbox */
+   E1000_WRITE_REG(hw, E1000_EIAM, 1 << E1000_VTIVAR_MISC_MAILBOX);
+   E1000_WRITE_REG(hw, E1000_EIAC, 1 << E1000_VTIVAR_MISC_MAILBOX);
+   E1000_WRITE_REG(hw, E1000_EIMS, 1 << E1000_VTIVAR_MISC_MAILBOX);
+   E1000_WRITE_FLUSH(hw);
+}
+
+/* only for mailbox now. If RX/TX needed, should extend this function.  */
+static void
+igbvf_set_ivar_map(struct e1000_hw *hw, uint8_t msix_vector)
+{
+   uint32_t tmp = 0;
+
+   /* mailbox */
+   tmp |= (msix_vector & E1000_VTIVAR_MISC_INTR_MASK);
+   tmp |= E1000_VTIVAR_VALID;
+   E1000_WRITE_REG(hw, E1000_VTIVAR_MISC, tmp);
+}
+
+static void
+eth_igbvf_configure_msix_intr(struct rte_eth_dev *dev)
+{
+   struct e1000_hw *hw =
+   E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+
+   /* Configure VF other cause ivar */
+   igbvf_set_ivar_map(hw, E1000_VTIVAR_MISC_MAILBOX);
+}
+
 static inline int32_t
 igb_pf_reset_hw(struct e1000_hw *hw)
 {
@@ -942,6 +986,10 @@ eth_igbvf_dev_init(struct rte_eth_dev *eth_dev)
 eth_dev->data->port_id, pci_dev->id.vendor_id,
 pci_dev->id.device_id, "igb_mac_82576_vf");

+   rte_intr_callback_register(&pci_dev->intr_handle,
+  eth_igbvf_interrupt_handler,
+  (void *)eth_dev);
+
return 0;
 }

@@ -950,6 +998,7 @@ eth_igbvf_dev_uninit(struct rte_eth_dev *eth_dev)
 {
struct e1000_adapter *adapter =
E1000_DEV_PRIVATE(eth_dev->data->dev_private);
+   struct rte_pci_device *pci_dev = eth_dev->pci_dev;

PMD_INIT_FUNC_TRACE();

@@ -966,6 +1015,12 @@ eth_igbvf_dev_uninit(struct rte_eth_dev *eth_dev)
rte_free(eth_dev->data->mac_addrs);
eth_dev->data->mac_addrs = NULL;

+   /* disable uio intr before callback unregister */
+   rte_intr_disable(&pci_dev->intr_handle);
+   rte_intr_callback_unregister(&pci_dev->intr_handle,
+eth_igbvf_interrupt_handler,
+(void *)eth_dev);
+
   

[dpdk-dev] [PATCH v2 1/2] ixgbe: VF supports mailbox interruption for PF link up/down

2016-06-01 Thread Wenzhuo Lu
In this scenario, kernel PF + DPDK VF, when PF finds the link
state changes, up -> down or down -> up, it will send a
message to VF by mailbox. This link state change may be
triggered by PHY disconnection/reconnection, configuration
like *ifconfig down/up* or interface parameter, like MTU,
change.
This patch enables the support of the mailbox interruption,
so VF can receive the message of link up/down.
After VF receives this message, the VF port needs to be reset to
recover. So the handler of this message registers a reset
callback to let APP reset the VF port.

Signed-off-by: Wenzhuo Lu 
---
 doc/guides/rel_notes/release_16_07.rst |  6 +++
 drivers/net/ixgbe/ixgbe_ethdev.c   | 84 --
 2 files changed, 87 insertions(+), 3 deletions(-)

diff --git a/doc/guides/rel_notes/release_16_07.rst 
b/doc/guides/rel_notes/release_16_07.rst
index 30e78d4..990bd46 100644
--- a/doc/guides/rel_notes/release_16_07.rst
+++ b/doc/guides/rel_notes/release_16_07.rst
@@ -47,6 +47,12 @@ New Features
   * Dropped specific Xen Dom0 code.
   * Dropped specific anonymous mempool code in testpmd.

+* **Added mailbox interruption support for ixgbe VF.**
+
+  When the link becomes down or up, PF will use mailbox message to notice
+  VF. To handle this link up/down event, add the mailbox interruption
+  support to receive the message.
+

 Resolved Issues
 ---
diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index a2b170b..05f4f29 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -150,6 +150,7 @@
 #define IXGBE_VMVIR_TAGA_ETAG_INSERT   0x0800
 #define IXGBE_VMTIR(_i) (0x00017000 + ((_i) * 4)) /* 64 of these (0-63) */
 #define IXGBE_QDE_STRIP_TAG0x0004
+#define IXGBE_VTEICR_MASK  0x07

 enum ixgbevf_xcast_modes {
IXGBEVF_XCAST_MODE_NONE = 0,
@@ -361,6 +362,8 @@ static int ixgbe_timesync_read_time(struct rte_eth_dev *dev,
   struct timespec *timestamp);
 static int ixgbe_timesync_write_time(struct rte_eth_dev *dev,
   const struct timespec *timestamp);
+static void ixgbevf_dev_interrupt_handler(struct rte_intr_handle *handle,
+ void *param);

 static int ixgbe_dev_l2_tunnel_eth_type_conf
(struct rte_eth_dev *dev, struct rte_eth_l2_tunnel_conf *l2_tunnel);
@@ -1442,6 +1445,12 @@ eth_ixgbevf_dev_init(struct rte_eth_dev *eth_dev)
return -EIO;
}

+   rte_intr_callback_register(&pci_dev->intr_handle,
+  ixgbevf_dev_interrupt_handler,
+  (void *)eth_dev);
+   rte_intr_enable(&pci_dev->intr_handle);
+   ixgbevf_intr_enable(hw);
+
PMD_INIT_LOG(DEBUG, "port %d vendorID=0x%x deviceID=0x%x mac.type=%s",
 eth_dev->data->port_id, pci_dev->id.vendor_id,
 pci_dev->id.device_id, "ixgbe_mac_82599_vf");
@@ -1455,6 +1464,7 @@ static int
 eth_ixgbevf_dev_uninit(struct rte_eth_dev *eth_dev)
 {
struct ixgbe_hw *hw;
+   struct rte_pci_device *pci_dev = eth_dev->pci_dev;

PMD_INIT_FUNC_TRACE();

@@ -1476,6 +1486,11 @@ eth_ixgbevf_dev_uninit(struct rte_eth_dev *eth_dev)
rte_free(eth_dev->data->mac_addrs);
eth_dev->data->mac_addrs = NULL;

+   rte_intr_disable(&pci_dev->intr_handle);
+   rte_intr_callback_unregister(&pci_dev->intr_handle,
+ixgbevf_dev_interrupt_handler,
+(void *)eth_dev);
+
return 0;
 }

@@ -4074,6 +4089,8 @@ ixgbevf_dev_stop(struct rte_eth_dev *dev)

PMD_INIT_FUNC_TRACE();

+   ixgbevf_intr_disable(hw);
+
hw->adapter_stopped = 1;
ixgbe_stop_adapter(hw);

@@ -4818,6 +4835,9 @@ ixgbevf_configure_msix(struct rte_eth_dev *dev)
uint32_t q_idx;
uint32_t vector_idx = IXGBE_MISC_VEC_ID;

+   /* Configure VF other cause ivar */
+   ixgbevf_set_ivar_map(hw, -1, 1, vector_idx);
+
/* won't configure msix register if no mapping is done
 * between intr vector and event fd.
 */
@@ -4832,9 +4852,6 @@ ixgbevf_configure_msix(struct rte_eth_dev *dev)
ixgbevf_set_ivar_map(hw, 0, q_idx, vector_idx);
intr_handle->intr_vec[q_idx] = vector_idx;
}
-
-   /* Configure VF other cause ivar */
-   ixgbevf_set_ivar_map(hw, -1, 1, vector_idx);
 }

 /**
@@ -7154,6 +7171,67 @@ ixgbevf_dev_allmulticast_disable(struct rte_eth_dev *dev)
ixgbevf_update_xcast_mode(hw, IXGBEVF_XCAST_MODE_NONE);
 }

+static void ixgbevf_mbx_process(struct rte_eth_dev *dev)
+{
+   struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   u32 in_msg = 0;
+
+   if (ixgbe_read_mbx(hw, &in_msg, 1, 0))
+   return;
+
+   /* PF reset VF event */
+   if (in_msg == IXGBE_PF_CONTROL_MSG)
+   

[dpdk-dev] [PATCH v2 0/2] support mailbox interruption on ixgbe/igb VF

2016-06-01 Thread Wenzhuo Lu
This patch set adds support for the mailbox interruption on the VF,
so the VF can receive the messages for physical link down/up and
register a reset callback in the handler.

PS: This patch set is split from a previous patch set, *automatic
link recovery on ixgbe/igb VF*.

v2:
- Reword the commit log and add more details.
- Rebase the patches on the latest code.

Wenzhuo Lu (2):
  ixgbe: VF supports mailbox interruption for PF link up/down
  igb: VF supports mailbox interruption for PF link up/down

 doc/guides/rel_notes/release_16_07.rst |   6 ++
 drivers/net/e1000/igb_ethdev.c | 159 +
 drivers/net/ixgbe/ixgbe_ethdev.c   |  84 -
 3 files changed, 246 insertions(+), 3 deletions(-)

-- 
1.9.3
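
As context for the "reset callback" mentioned above, a rough application-side
sketch; it assumes the RTE_ETH_EVENT_INTR_RESET event type used by the VF
reset flow, and the handler body and variable names are made up:

#include <rte_ethdev.h>

/* Invoked from the interrupt thread, so only flag the port for reset and
 * let the main loop perform the actual stop/configure/start sequence. */
static void
vf_reset_event_cb(uint8_t port_id, enum rte_eth_event_type event, void *arg)
{
	volatile int *need_reset = arg;

	if (event == RTE_ETH_EVENT_INTR_RESET)
		*need_reset = 1;
	(void)port_id;
}

/* Registration, e.g. right after rte_eth_dev_configure():
 *   static volatile int need_reset;
 *   rte_eth_dev_callback_register(port_id, RTE_ETH_EVENT_INTR_RESET,
 *                                 vf_reset_event_cb, (void *)&need_reset);
 */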



[dpdk-dev] about rx checksum flags

2016-06-01 Thread Ananyev, Konstantin


> -Original Message-
> From: Stephen Hemminger [mailto:stephen at networkplumber.org]
> Sent: Tuesday, May 31, 2016 11:03 PM
> To: Olivier MATZ
> Cc: Yuanhan Liu; dev at dpdk.org; Ananyev, Konstantin; Richardson, Bruce; 
> Adrien Mazarguil; Tan, Jianfeng
> Subject: Re: [dpdk-dev] about rx checksum flags
> 
> On Tue, 31 May 2016 22:58:57 +0200
> Olivier MATZ  wrote:
> 
> > Hi Stephen,
> >
> > On 05/31/2016 10:28 PM, Stephen Hemminger wrote:
> > > On Tue, 31 May 2016 21:11:59 +0200
> > > Olivier MATZ  wrote:
> > >
> > >>
> > >>
> > >> On 05/31/2016 10:09 AM, Yuanhan Liu wrote:
> > >>> On Mon, May 30, 2016 at 05:26:21PM +0200, Olivier Matz wrote:
> >   PKT_RX_L4_CKSUM_NONE: the L4 checksum is not correct in the packet
> >   data, but the integrity of the L4 header is verified.
> >    -> the application can process the packet but must not verify the
> >   checksum by sw. It has to take care to recalculate the cksum
> >   if the packet is transmitted (either by sw or using tx offload)
> > >>>
> > >>> I like the explanation you made at [1] better :)
> > >>>
> > >>> So in general, I think this proposal is good to have.
> > >>
> > >> Thanks everyone for your feedback.
> > >>
> > >> I'll try to send a first patch proposition soon.
> > >>
> > >> Regards,
> > >> Olivier
> > >
> > > I think it is time to ditch the old definitions of Rx checksum and instead
> > > use something more compatible with virtio (and Linux), i.e. have three 
> > > values
> > >   1) checksum is know good for packet contents
> > >   2) checksum value one's complement for packet contents
> > >   3) checksum is undetermined
> > > The original definition seems to be Intel HW centric and applies to a 
> > > limited
> > > range of devices making it unusable by general application.
> > >
> > > Break the ABI, and ditch the old values (ok mark PKT_RX_L4_CKSUM_BAD as 
> > > __deprecated
> > > and remove all usage).
> > >
> >
> > Don't you think knowing that a checksum is bad could be useful?
> 
> Not really. They should be mark as undetermined, then software can recheck
> for the possibly buggy hardware.

Hmm, I don't see the point here.
If the HW clearly reports that checksum is invalid (not unknown),
why does SW have to assume it is 'undetermined' and recheck it?
To me that just means wasted cycles.
In general, it sounds like a really strange approach to me:
writing your SW with the assumption that the HW you are going to use
will not work correctly.

> 
> > In that case the application can drop/log the packet without any
> > additional cpu cost.
> >
> > What do you mean by beeing unusable by general application?
> 
> Right now application can only see "known bad" or "indeterminate"
> there is no way to know which packets are good. Since good is the 
> desired/expected
> case, software has to checksum every packet.
> 
> >
> > I think the "2)" also requires a csum_start + csum_offset in
> > mbuf structure, right?
> 
> Not really, it would mean having a way to get the raw one's complement sum
> out of the hardware. This is a good way to support the tunnel protocol du jour
> without having to have firmware support. Unfortunately, most hardware vendors
> don't believe in doing it that way.

It might be a good feature to have, but if most HW vendors don't support it
why bother?

> 
> 
> > Do you also suggest to drop IP checksum flags?
> 
> IP checksum offload is mostly useless. If application needs to look
> at IP, it can do whole checksum in very few instructions, the whole header
> is in the same cache line as src/dst so the HW offload is really no help.
> 
> >
> > Will it be possible to manage tunnel checksums?
> >
> > I think this would be a pretty big change. If there is no additional
> > argument than beeing more compatible with virtio/linux, I'm wondering
> > if it's worth breaking the API. Let's wait for other opinions.

I think that what Olivier proposed is good enough and
definitely a step forward from what we have right now.

Konstantin

> >
> > Thanks for your feedback.
> > Olivier
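
As an aside to the point above that an application can checksum the IP header
itself in very few instructions: a minimal sketch of the standard RFC 1071
one's-complement sum over a 20-byte IPv4 header without options, independent
of any offload flag:

#include <stdint.h>

/* Returns the header checksum, computed over a header whose checksum
 * field is zero. When run over a header with a correct checksum already
 * filled in, the result is 0, which is a cheap way to verify it. */
static uint16_t ipv4_hdr_cksum(const void *hdr)
{
	const uint16_t *p = hdr;
	uint32_t sum = 0;
	int i;

	for (i = 0; i < 10; i++)	/* 20 bytes = 10 16-bit words */
		sum += p[i];
	sum = (sum & 0xffff) + (sum >> 16);	/* fold the carries */
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}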



[dpdk-dev] [RFC] Yet another option for DPDK options

2016-06-01 Thread Matthew Hall
On Wed, Jun 01, 2016 at 03:00:11PM +, Wiles, Keith wrote:
> The INI file is too flat and I wanted a hierarchy in the data, the JSON data 
> is similar and XML is just hard to read.

I don't think it's fair to say JSON lacks hierarchy. Personally it is working 
great in my current application. The main "bug" is that the spec designers 
intentionally and idiotically left out the ability to make comments. But there 
are some smarter JSON parsers such as json-c and the Perl JSON parser which 
will allow them using either "#" or "//".

You can also build JSON in memory pretty nicely using json-c. It has a simple 
DOM-like API for this.
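
For illustration, a rough sketch of that DOM-style access with json-c; the
file layout and key names here are made up, not from any DPDK API:

#include <json-c/json.h>

/* Read a {"dpdk": {"coremask": "0xf"}} style file and return the coremask. */
static const char *get_coremask(const char *path)
{
	struct json_object *root, *dpdk, *mask;

	root = json_object_from_file(path);
	if (root == NULL)
		return NULL;
	if (!json_object_object_get_ex(root, "dpdk", &dpdk) ||
	    !json_object_object_get_ex(dpdk, "coremask", &mask))
		return NULL;
	/* The string is owned by root; it stays valid as long as root is
	 * not released with json_object_put(). */
	return json_object_get_string(mask);
}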

I am using it in the config file for my app right now, and passing a fake argc 
and argv to DPDK using wordexp() to prevent it from munging the argc and argv 
of my application.
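
A rough sketch of that wordexp() approach, assuming the EAL arguments are kept
as a single string in the application's own config; the argument string in the
comment is only an example:

#include <wordexp.h>
#include <rte_eal.h>

/* Build a synthetic argv for rte_eal_init() so the application's own
 * argc/argv are never touched, e.g.:
 *   eal_init_from_string("myapp -c 0xf -n 4");
 * One-shot helper: the expansion is kept alive because EAL may keep
 * pointers into argv. */
static int eal_init_from_string(const char *eal_args)
{
	static wordexp_t we;

	if (wordexp(eal_args, &we, 0) != 0)
		return -1;

	/* rte_eal_init() may permute this copy; the caller's argv is safe. */
	return rte_eal_init((int)we.we_wordc, we.we_wordv);
}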

> It would be nice to execute a DPDK applications like this:
> 
> ./some_dpdk_app ???config-file dpdk-config-filename

FYI, I think you used Outlook with some of MS's bad defaults and it mangled 
all your special characters...

> The dpdk-config-filename could contain a lot of information and be able to 
> startup multiple different applications. The dpdk-config-file could also 
> include other config files to complete the configuration. The format of the 
> data in the config file needs to be readable, but allow the user to put in 
> new options, needs to be hierarchical in nature and have some simple 
> functions to execute if required.

To me, this is way too complicated and includes a lot of features I'm not 
convinced we actually need or want. I'd really prefer if we just have one file 
per app. I don't want a super complicated way to configure it replacing an 
already super complicated way to configure it.

> The solution I was thinking is the file information is really just a 
> fragment of a scripting language, which the DPDK application contains this 
> scripting language interpreter. I was looking at using Lua lua.org as the 
> scripting language interpreter it is small and easy to understand.

If we're stuck doing this Lua is the best option but I'd still rather avoid 
it. I like the fact that DPDK is a lot of clean C code, this is why I find it 
so much easier to read and code than the awful kernel network stacks.

> lcore_list = mk_lcore_list("0-7", 10, "14-16"),
> coremap = mk_coremap("(0-7)@0,10,(14-16)@1"),

These magical functions feel weird compared to just having some simple 
functions that take them as JSON strings and validate them. Which is what I'm 
doing in my app right now with minimal pain.

> The EAL, driver, application, ??? would query an API to access the data and 
> the application can change his options quickly without modifying the code.

I don't want to have to use somebody else's API to get to the config of my app 
if I can avoid it. I like the approach of json-c where I can lay it out how I 
want, and pass the parts I want DPDK to have over to DPDK. I don't necessarily 
want to have to go through DPDK to get to my own config stuff. Which is what I 
am stuck doing if we put a weird proprietary DPDK specific file format or 
scripting environment in these files.

> Keith

Matthew.


[dpdk-dev] [PATCH v5 2/8] virtio: clean up virtio_dev_queue_setup

2016-06-01 Thread Tan, Jianfeng
Hi Yuanhan,

> -Original Message-
> From: Yuanhan Liu [mailto:yuanhan.liu at linux.intel.com]
> Sent: Wednesday, June 1, 2016 3:38 PM
> To: Tan, Jianfeng
> Cc: dev at dpdk.org; Xie, Huawei; rich.lane at bigswitch.com; mst at 
> redhat.com;
> nakajima.yoshihiro at lab.ntt.co.jp; p.fedin at samsung.com;
> ann.zhuangyanying at huawei.com; mukawa at igel.co.jp;
> nhorman at tuxdriver.com
> Subject: Re: [PATCH v5 2/8] virtio: clean up virtio_dev_queue_setup
> 
> On Mon, May 30, 2016 at 10:55:33AM +, Jianfeng Tan wrote:
> > Abstract vring hdr desc init as an inline method.
> 
> What's this patch for then? In your last version, it will be invoked
> twice, but it turned out to be wrong. So, why keeping this change?
> I didn't see it improves anything.
> 

Yes, for now only one version is kept, because the position where this
function is called has changed. And I think this segment of code serves a
specific purpose, so it can be abstracted into a function; does that make sense?

Thanks,
Jianfeng


[dpdk-dev] [PATCH 3/3] vhost: arrange virtio_net fields for better cache sharing

2016-06-01 Thread Xie, Huawei
On 5/3/2016 8:42 AM, Yuanhan Liu wrote:
> The ifname[] field takes so much space that it separates some frequently
> used fields into different cache lines, say, features and broadcast_rarp.
>
> This patch move all those fields that will be accessed frequently in Rx/Tx
> together (before the ifname[] field) to let them share one cache line.
>
> Signed-off-by: Yuanhan Liu 

Acked-by: Huawei Xie 



[dpdk-dev] [PATCH 1/3] vhost: pre update used ring for Tx and Rx

2016-06-01 Thread Xie, Huawei
On 5/3/2016 8:42 AM, Yuanhan Liu wrote:
> Pre update and update used ring in batch for Tx and Rx at the stage
> while fetching all avail desc idx. This would reduce some cache misses
> and hence, increase the performance a bit.
>
> Pre update would be feasible as guest driver will not start processing
> those entries as long as we don't update "used->idx". (I'm not 100%
> certain I don't miss anything, though).
>
> Cc: Michael S. Tsirkin 
> Signed-off-by: Yuanhan Liu 
> ---
>  lib/librte_vhost/vhost_rxtx.c | 58 
> +--
>  1 file changed, 28 insertions(+), 30 deletions(-)
>
> diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
> index c9cd1c5..2c3b810 100644
> --- a/lib/librte_vhost/vhost_rxtx.c
> +++ b/lib/librte_vhost/vhost_rxtx.c
> @@ -137,7 +137,7 @@ copy_virtio_net_hdr(struct virtio_net *dev, uint64_t 
> desc_addr,
>  
>  static inline int __attribute__((always_inline))
>  copy_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
> -   struct rte_mbuf *m, uint16_t desc_idx, uint32_t *copied)
> +   struct rte_mbuf *m, uint16_t desc_idx)
>  {
>   uint32_t desc_avail, desc_offset;
>   uint32_t mbuf_avail, mbuf_offset;
> @@ -161,7 +161,6 @@ copy_mbuf_to_desc(struct virtio_net *dev, struct 
> vhost_virtqueue *vq,
>   desc_offset = dev->vhost_hlen;
>   desc_avail  = desc->len - dev->vhost_hlen;
>  
> - *copied = rte_pktmbuf_pkt_len(m);
>   mbuf_avail  = rte_pktmbuf_data_len(m);
>   mbuf_offset = 0;
>   while (mbuf_avail != 0 || m->next != NULL) {
> @@ -262,6 +261,7 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
>   struct vhost_virtqueue *vq;
>   uint16_t res_start_idx, res_end_idx;
>   uint16_t desc_indexes[MAX_PKT_BURST];
> + uint16_t used_idx;
>   uint32_t i;
>  
>   LOG_DEBUG(VHOST_DATA, "(%d) %s\n", dev->vid, __func__);
> @@ -285,27 +285,29 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
>   /* Retrieve all of the desc indexes first to avoid caching issues. */
>   rte_prefetch0(&vq->avail->ring[res_start_idx & (vq->size - 1)]);
>   for (i = 0; i < count; i++) {
> - desc_indexes[i] = vq->avail->ring[(res_start_idx + i) &
> -   (vq->size - 1)];
> + used_idx = (res_start_idx + i) & (vq->size - 1);
> + desc_indexes[i] = vq->avail->ring[used_idx];
> + vq->used->ring[used_idx].id = desc_indexes[i];
> + vq->used->ring[used_idx].len = pkts[i]->pkt_len +
> +dev->vhost_hlen;
> + vhost_log_used_vring(dev, vq,
> + offsetof(struct vring_used, ring[used_idx]),
> + sizeof(vq->used->ring[used_idx]));
>   }
>  
>   rte_prefetch0(&vq->desc[desc_indexes[0]]);
>   for (i = 0; i < count; i++) {
>   uint16_t desc_idx = desc_indexes[i];
> - uint16_t used_idx = (res_start_idx + i) & (vq->size - 1);
> - uint32_t copied;
>   int err;
>  
> - err = copy_mbuf_to_desc(dev, vq, pkts[i], desc_idx, &copied);
> -
> - vq->used->ring[used_idx].id = desc_idx;
> - if (unlikely(err))
> + err = copy_mbuf_to_desc(dev, vq, pkts[i], desc_idx);
> + if (unlikely(err)) {
> + used_idx = (res_start_idx + i) & (vq->size - 1);
>   vq->used->ring[used_idx].len = dev->vhost_hlen;
> - else
> - vq->used->ring[used_idx].len = copied + dev->vhost_hlen;
> - vhost_log_used_vring(dev, vq,
> - offsetof(struct vring_used, ring[used_idx]),
> - sizeof(vq->used->ring[used_idx]));
> + vhost_log_used_vring(dev, vq,
> + offsetof(struct vring_used, ring[used_idx]),
> + sizeof(vq->used->ring[used_idx]));
> + }
>  
>   if (i + 1 < count)
>   rte_prefetch0(&vq->desc[desc_indexes[i+1]]);
> @@ -879,6 +881,7 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
>   /* Prefetch available ring to retrieve head indexes. */
>   used_idx = vq->last_used_idx & (vq->size - 1);
>   rte_prefetch0(&vq->avail->ring[used_idx]);
> + rte_prefetch0(&vq->used->ring[used_idx]);
>  
>   count = RTE_MIN(count, MAX_PKT_BURST);
>   count = RTE_MIN(count, free_entries);
> @@ -887,22 +890,23 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
>  
>   /* Retrieve all of the head indexes first to avoid caching issues. */
>   for (i = 0; i < count; i++) {
> - desc_indexes[i] = vq->avail->ring[(vq->last_used_idx + i) &
> - (vq->size - 1)];
> + used_idx = (vq->last_used_idx + i) & (vq->size - 1);
> + desc_indexes[i] = vq->avail->ring[used_idx];
> +
> + 

[dpdk-dev] [PATCH 2/3] vhost: optimize dequeue for small packets

2016-06-01 Thread Xie, Huawei
On 5/3/2016 8:42 AM, Yuanhan Liu wrote:
> Both current kernel virtio driver and DPDK virtio driver use at least
> 2 desc buffer for Tx: the first for storing the header, and the others
> for storing the data.

Tx could prepend some space for virtio net header whenever possible, so
that it could use only one descriptor.

Another thing: this doesn't reduce the number of checks, because you also add a check.


>
> Therefore, we could fetch the first data desc buf before the main loop,
> and do the copy first before the check of "are we done yet?". This
> could save one check for small packets, that just have one data desc
> buffer and need one mbuf to store it.
>
> Signed-off-by: Yuanhan Liu 
> ---
>  lib/librte_vhost/vhost_rxtx.c | 52 
> ++-
>  1 file changed, 36 insertions(+), 16 deletions(-)
>
> diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
> index 2c3b810..34d6ed1 100644
> --- a/lib/librte_vhost/vhost_rxtx.c
> +++ b/lib/librte_vhost/vhost_rxtx.c
> @@ -753,18 +753,48 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct 
> vhost_virtqueue *vq,
>   return -1;
>  
>   desc_addr = gpa_to_vva(dev, desc->addr);
> - rte_prefetch0((void *)(uintptr_t)desc_addr);
> -
> - /* Retrieve virtio net header */
>   hdr = (struct virtio_net_hdr *)((uintptr_t)desc_addr);
> - desc_avail  = desc->len - dev->vhost_hlen;
> - desc_offset = dev->vhost_hlen;
> + rte_prefetch0(hdr);
> +
> + /*
> +  * Both current kernel virtio driver and DPDK virtio driver
> +  * use at least 2 desc buffers for Tx: the first for storing
> +  * the header, and others for storing the data.
> +  */
> + if (likely(desc->len == dev->vhost_hlen)) {
> + desc = &vq->desc[desc->next];
> +
> + desc_addr = gpa_to_vva(dev, desc->addr);
> + rte_prefetch0((void *)(uintptr_t)desc_addr);
> +
> + desc_offset = 0;
> + desc_avail  = desc->len;
> + nr_desc+= 1;
> +
> + PRINT_PACKET(dev, (uintptr_t)desc_addr, desc->len, 0);
> + } else {
> + desc_avail  = desc->len - dev->vhost_hlen;
> + desc_offset = dev->vhost_hlen;
> + }
>  
>   mbuf_offset = 0;
>   mbuf_avail  = m->buf_len - RTE_PKTMBUF_HEADROOM;
> - while (desc_avail != 0 || (desc->flags & VRING_DESC_F_NEXT) != 0) {
> + while (1) {
> + cpy_len = RTE_MIN(desc_avail, mbuf_avail);
> + rte_memcpy(rte_pktmbuf_mtod_offset(cur, void *, mbuf_offset),
> + (void *)((uintptr_t)(desc_addr + desc_offset)),
> + cpy_len);
> +
> + mbuf_avail  -= cpy_len;
> + mbuf_offset += cpy_len;
> + desc_avail  -= cpy_len;
> + desc_offset += cpy_len;
> +
>   /* This desc reaches to its end, get the next one */
>   if (desc_avail == 0) {
> + if ((desc->flags & VRING_DESC_F_NEXT) == 0)
> + break;
> +
>   if (unlikely(desc->next >= vq->size ||
>++nr_desc >= vq->size))
>   return -1;
> @@ -800,16 +830,6 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct 
> vhost_virtqueue *vq,
>   mbuf_offset = 0;
>   mbuf_avail  = cur->buf_len - RTE_PKTMBUF_HEADROOM;
>   }
> -
> - cpy_len = RTE_MIN(desc_avail, mbuf_avail);
> - rte_memcpy(rte_pktmbuf_mtod_offset(cur, void *, mbuf_offset),
> - (void *)((uintptr_t)(desc_addr + desc_offset)),
> - cpy_len);
> -
> - mbuf_avail  -= cpy_len;
> - mbuf_offset += cpy_len;
> - desc_avail  -= cpy_len;
> - desc_offset += cpy_len;
>   }
>  
>   prev->data_len = mbuf_offset;



[dpdk-dev] [PATCH] virtio: check if devargs is NULL before checking its value

2016-06-01 Thread Xie, Huawei
On 5/27/2016 10:08 AM, Yuanhan Liu wrote:
> On Wed, May 25, 2016 at 12:47:30PM +0200, Thomas Monjalon wrote:
>>> -   dev->devargs->type != RTE_DEVTYPE_WHITELISTED_PCI) {
>>> +   (!dev->devargs ||
>>> +dev->devargs->type != RTE_DEVTYPE_WHITELISTED_PCI)) {
>> Should the title be something like "fix crash ..."?
>>
>> I would also add
>> Reported-by: Vincent Li 
> Huawei, the two are good comments (Thomas, thanks for the review, BTW :).

np.

>
> So, mind to send v2? BTW, I think this patch deserves some explanation,
> say, why dev->devargs could be NULL.
>
>
>
>   --yliu
>



[dpdk-dev] [PATCH] virtio: use volatile to get used->idx in the loop

2016-06-01 Thread Xie, Huawei
On 5/30/2016 4:20 PM, Yuanhan Liu wrote:
> On Wed, May 25, 2016 at 12:16:41AM +0800, Huawei Xie wrote:
>> There is no external function call or any barrier in the loop,
>> the used->idx would only be retrieved once.
>>
>> Signed-off-by: Huawei Xie 
>> ---
>>  drivers/net/virtio/virtio_ethdev.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/virtio/virtio_ethdev.c 
>> b/drivers/net/virtio/virtio_ethdev.c
>> index c3fb628..f6d6305 100644
>> --- a/drivers/net/virtio/virtio_ethdev.c
>> +++ b/drivers/net/virtio/virtio_ethdev.c
>> @@ -204,7 +204,8 @@ virtio_send_command(struct virtqueue *vq, struct 
>> virtio_pmd_ctrl *ctrl,
>>  usleep(100);
>>  }
>>  
>> -while (vq->vq_used_cons_idx != vq->vq_ring.used->idx) {
>> +while (vq->vq_used_cons_idx !=
>> +   *((volatile uint16_t *)(&vq->vq_ring.used->idx))) {
> I'm wondering maybe we could fix VIRTQUEUE_NUSED (which has no such
> qualifier) and use this macro here?
>
> If you check the reference of that macro, you might find similar
> issues, say, it is also used inside the while-loop of
> virtio_recv_mergeable_pkts().
>
>   --yliu
>  
>

Yes, it seems it has the same issue, though I haven't confirmed with the asm code.
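
A sketch of what fixing the macro itself could look like; the definition below
is paraphrased, the only point being the volatile qualifier that forces
used->idx to be re-read on every loop iteration:

/* Number of used entries the device has returned that we have not yet
 * consumed. The volatile read prevents the compiler from hoisting the
 * load of used->idx out of a polling loop. */
#define VIRTQUEUE_NUSED(vq) \
	((uint16_t)(*(volatile uint16_t *)&(vq)->vq_ring.used->idx - \
		    (vq)->vq_used_cons_idx))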


[dpdk-dev] [PATCH v3 20/35] mempool: allocate in several memory chunks by default

2016-06-01 Thread Ferruh Yigit
On 5/18/2016 12:04 PM, Olivier Matz wrote:
> Introduce rte_mempool_populate_default() which allocates
> mempool objects in several memzones.
> 
> The mempool header is now always allocated in a specific memzone
> (not with its objects). Thanks to this modification, we can remove
> many specific behavior that was required when hugepages are not
> enabled in case we are using rte_mempool_xmem_create().
> 
> This change requires to update how kni and mellanox drivers lookup for
> mbuf memory. For now, this will only work if there is only one memory
> chunk (like today), but we could make use of rte_mempool_mem_iter() to
> support more memory chunks.
> 
> We can also remove RTE_MEMPOOL_OBJ_NAME that is not required anymore for
> the lookup, as memory chunks are referenced by the mempool.
> 
> Note that rte_mempool_create() is still broken (it was the case before)
> when there is no hugepages support (rte_mempool_create_xmem() has to be
> used). This is fixed in next commit.
> 
> Signed-off-by: Olivier Matz 
> ---



> - if (vaddr == NULL) {
> - /* calculate address of the first elt for continuous mempool. */
> - obj = (char *)mp + MEMPOOL_HEADER_SIZE(mp, cache_size) +
> - private_data_size;
> - obj = RTE_PTR_ALIGN_CEIL(obj, RTE_MEMPOOL_ALIGN);
> -
> - ret = rte_mempool_populate_phys(mp, obj,
> - mp->phys_addr + ((char *)obj - (char *)mp),
> - objsz.total_size * n, NULL, NULL);
> - if (ret != (int)mp->size)
> - goto exit_unlock;
> - } else {
> + if (vaddr == NULL)
> + ret = rte_mempool_populate_default(mp);

This breaks current ivshmem code, since now mp has multiple mz.
I will send a patch for ivshmem.




[dpdk-dev] [PATCH] app/test: fix bond device name too long

2016-06-01 Thread Xu, HuilongX
Test case: link_bonding_autotest
Package: dpdk.org master branch newest commit + this patch
Test cmdline: ./x86_64-native-linuxapp-gcc/app/test -c  -n 1
  Exec link_bonding_autotest cmdline
Test environment:
OS: dpdk-rhel72 3.10.0-327.el7.x86_64
Gcc: gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC)
Hugepage: 4096*2M

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Michal Jastrzebski
> Sent: Friday, May 27, 2016 11:21 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH] app/test: fix bond device name too long
> 
> Bond device name was too long (grather than 32 signs) that
> cause mempool allocation to fail.
> 
> Fixes: 92073ef961ee ("bond: unit tests")
> 
> Signed-off-by: Michal Jastrzebski 
  Tested-by: huilong xu 
> ---
>  app/test/test_link_bonding.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/app/test/test_link_bonding.c b/app/test/test_link_bonding.c
> index 7cbc289..eeb1395 100644
> --- a/app/test/test_link_bonding.c
> +++ b/app/test/test_link_bonding.c
> @@ -83,7 +83,7 @@
>  #define MAX_PKT_BURST(512)
>  #define DEF_PKT_BURST(16)
> 
> -#define BONDED_DEV_NAME  ("unit_test_bonded_device")
> +#define BONDED_DEV_NAME  ("unit_test_bond_dev")
> 
>  #define INVALID_SOCKET_ID(-1)
>  #define INVALID_PORT_ID  (-1)
> --
> 1.7.9.5



[dpdk-dev] [PATCH v5 1/3] mempool: support external handler

2016-06-01 Thread Jerin Jacob
On Tue, May 31, 2016 at 10:41:00PM +0200, Olivier MATZ wrote:
> Hi,
> 
> On 05/31/2016 06:03 PM, Jerin Jacob wrote:
> > On Tue, May 31, 2016 at 04:37:02PM +0100, Hunt, David wrote:
> >>
> >>
> >> On 5/31/2016 9:53 AM, Jerin Jacob wrote:
> >>> On Mon, May 30, 2016 at 12:27:26PM +0100, Hunt, David wrote:
>  New mempool handlers will use rte_mempool_create_empty(),
>  rte_mempool_set_handler(),
>  then rte_mempool_populate_*(). These three functions are new to this
>  release, to no problem
> >>> Having separate APIs for external pool-manager create is worrisome in
> >>> application perspective. Is it possible to have rte_mempool_[xmem]_create
> >>> for the both external and existing SW pool manager and make
> >>> rte_mempool_create_empty and rte_mempool_populate_*  internal functions.
> >>>
> >>> IMO, We can do that by selecting  specific rte_mempool_set_handler()
> >>> based on _flags_ encoding, something like below
> >>>
> >>> bit 0 - 16   // generic bits uses across all the pool managers
> >>> bit 16 - 23  // pool handler specific flags bits
> >>> bit 24 - 31  // to select the specific pool manager(Up to 256 different 
> >>> flavors of
> >>> pool managers, For backward compatibility, make '0'(in 24-31) to select
> >>> existing SW pool manager.
> >>>
> >>> and applications can choose the handlers by selecting the flag in
> >>> rte_mempool_[xmem]_create, That way it will be easy in testpmd or any 
> >>> other
> >>> applications to choose the pool handler from command line etc in future.
> >>
> >> There might be issues with the 8-bit handler number, as we'd have to add an
> >> api call to
> >> first get the index of a given hander by name, then OR it into the flags.
> >> That would mean
> >> also extra API calls for the non-default external handlers. I do agree with
> >> the handler-specific
> >> bits though.
> > 
> > That would be an internal API(upper 8 bits to handler name). Right ?
> > Seems to be OK for me.
> > 
> >>
> >> Having the _empty and _set_handler  APIs seems to me to be OK for the
> >> moment. Maybe Olivier could comment?
> >>
> > 
> > But need 3 APIs. Right? _empty , _set_handler and _populate ? I believe
> > it is better reduce the public API in spec where ever possible ?
> > 
> > Maybe Olivier could comment ?
> 
> Well, I think having 3 different functions is not a problem if the API
> is clearer.
> 
> In my opinion, the following:
>   rte_mempool_create_empty()
>   rte_mempool_set_handler()
>   rte_mempool_populate()
> 
> is clearer than:
>   rte_mempool_create(15 args)

But the proposed scheme does not add any new arguments to
rte_mempool_create. It just extends the existing flags.

rte_mempool_create(15 args) is still there as the API for internal pool
creation.

> 
> Splitting the flags into 3 groups, with one not beeing flags but a
> pool handler number looks overcomplicated from a user perspective.

I am concerned with seamless integration with existing applications.
IMO, it's not worth having separate functions for external vs internal
pool creation from the application's point of view (now each and every
application has to add this logic everywhere for no good reason), just my 2 cents.

> 
> >>> and we can remove "mbuf: get default mempool handler from configuration"
> >>> change-set OR just add if RTE_MBUF_DEFAULT_MEMPOOL_HANDLER is defined 
> >>> then set
> >>> the same with rte_mempool_set_handler in rte_mempool_[xmem]_create.
> >>>
> >>> What do you think?
> >>
> >> The "configuration" patch is to allow users to quickly change the mempool
> >> handler
> >> by changing RTE_MBUF_DEFAULT_MEMPOOL_HANDLER to another string of a known
> >> handler. It could just as easily be left out and use the 
> >> rte_mempool_create.
> >>
> > 
> > Yes, I understand, but I am trying to avoid build time constant. IMO, It
> > would be better by default RTE_MBUF_DEFAULT_MEMPOOL_HANDLER is not
> > defined in config. and for quick change developers can introduce the build 
> > with RTE_MBUF_DEFAULT_MEMPOOL_HANDLER="specific handler"
> 
> My understanding of the compile-time configuration option was
> to allow a specific architecture to define a specific hw-assisted
> handler by default.
> 
> Indeed, if there is no such need for now, we may remove it. But
> we need a way to select another handler, at least in test-pmd
> (in command line arguments?).

Like txflags in testpmd, IMO, mempool flags will help to select the handlers
seamlessly, as suggested above.

If we are _not_ taking the flags-based selection scheme, then it makes
sense to keep RTE_MBUF_DEFAULT_MEMPOOL_HANDLER.
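
To make the flag-encoding idea concrete, a hypothetical sketch; all of the
macro names and the 24-31 bit split are assumptions taken from this thread,
not an agreed API:

/* Upper 8 bits of the mempool flags select the pool handler;
 * index 0 keeps the existing SW ring-based handler for compatibility. */
#define MEMPOOL_HANDLER_SHIFT	24
#define MEMPOOL_HANDLER_MASK	(0xffu << MEMPOOL_HANDLER_SHIFT)

#define MEMPOOL_F_HANDLER(idx) \
	(((unsigned)(idx) << MEMPOOL_HANDLER_SHIFT) & MEMPOOL_HANDLER_MASK)

#define MEMPOOL_HANDLER_FROM_FLAGS(flags) \
	(((flags) & MEMPOOL_HANDLER_MASK) >> MEMPOOL_HANDLER_SHIFT)

/* Inside rte_mempool_create(), something like:
 *   idx = MEMPOOL_HANDLER_FROM_FLAGS(flags);
 *   rte_mempool_set_handler(mp, handler_name_from_index(idx));
 * where handler_name_from_index() is a hypothetical internal lookup. */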

> 
> 
>  to add a parameter to one of them for the config data. Also since we're
>  adding some new
>  elements to the mempool structure, how about we add a new pointer for a 
>  void
>  pointer to a
>  config data structure, as defined by the handler.
> 
>  So, new element in rte_mempool struct alongside the *pool
>   void *pool;
>   void *pool_config;
> 
>  Then add a param to the 

[dpdk-dev] [PATCH] app/test: fix bond device name too long

2016-06-01 Thread Xu, HuilongX
Tested-by: huilong xu
Test case: link_bonding_autotest
Package: dpdk.org master branch newest commit + this patch
Test cmdline: ./x86_64-native-linuxapp-gcc/app/test -c  -n 1
  Exec link_bonding_autotest cmdline
Test environment:
OS: dpdk-rhel72 3.10.0-327.el7.x86_64
Gcc: gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC)
Hugepage: 4096*2M

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Michal Jastrzebski
> Sent: Friday, May 27, 2016 11:21 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH] app/test: fix bond device name too long
> 
> Bond device name was too long (grather than 32 signs) that
> cause mempool allocation to fail.
> 
> Fixes: 92073ef961ee ("bond: unit tests")
> 
> Signed-off-by: Michal Jastrzebski 
> ---
>  app/test/test_link_bonding.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/app/test/test_link_bonding.c b/app/test/test_link_bonding.c
> index 7cbc289..eeb1395 100644
> --- a/app/test/test_link_bonding.c
> +++ b/app/test/test_link_bonding.c
> @@ -83,7 +83,7 @@
>  #define MAX_PKT_BURST(512)
>  #define DEF_PKT_BURST(16)
> 
> -#define BONDED_DEV_NAME  ("unit_test_bonded_device")
> +#define BONDED_DEV_NAME  ("unit_test_bond_dev")
> 
>  #define INVALID_SOCKET_ID(-1)
>  #define INVALID_PORT_ID  (-1)
> --
> 1.7.9.5



[dpdk-dev] about memzone name size issue

2016-06-01 Thread Xu, HuilongX
Hi,
I will test this patch, thanks a lot.

> -Original Message-
> From: Iremonger, Bernard
> Sent: Tuesday, May 31, 2016 9:59 PM
> To: Wiles, Keith; Xu, HuilongX; dev at dpdk.org
> Subject: RE: [dpdk-dev] about memzone name size issue
> 
> Hi Huilong,
> 
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Wiles, Keith
> > Sent: Tuesday, May 31, 2016 2:18 PM
> > To: Xu, HuilongX ; dev at dpdk.org
> > Subject: Re: [dpdk-dev] about memzone name size issue
> >
> > >Hi all,
> > >I found an issue in the link_bonding unit test case.
> > >
> > >When I run the mode 6 test case, it generates a core dump error.
> > >I debugged it and found the error code in this function:
> > >rte_mempool_create_empty(const char *name, unsigned n, unsigned
> > elt_size,
> > >unsigned cache_size, unsigned private_data_size,
> > >int socket_id, unsigned flags) {
> > >  ...
> > >ret = snprintf(mz_name, sizeof(mz_name),
> > RTE_MEMPOOL_MZ_FORMAT, name);
> > >if (ret < 0 || ret >= (int)sizeof(mz_name)) {
> > >rte_errno = ENAMETOOLONG;
> > >goto exit_unlock;
> > >}
> > >   .
> > >}
> > >The memzone name size only 32 bytes, but the mz_name in link_bonding
> is
> > bigger 32 bytes. Could we set memzone name size to 64 bytes ?
> > >Thanks  a lot
> >
> > Having a name of 32 bytes is pretty big IMO, what would be a better
> reason
> > for changing a internal structure as it will take two releases to make
> that
> > change?
> > How big is the name string you are passing into the routine?
> > >
> >
> >
> The following patches were submitted last Friday to fix this issue with
> the link bonding tests.
> 
> http://dpdk.org/dev/patchwork/patch/13052/
> http://dpdk.org/dev/patchwork/patch/13053/
> 
> Regards,
> 
> Bernard.
> 
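
To make the length budget above concrete, here is a small standalone sketch
of the quoted check; the macro values mirror what rte_memzone.h and
rte_mempool.h used at the time, but treat them (and the example name) as
assumptions for illustration rather than a copy of the library code:

#include <stdio.h>
#include <errno.h>

#define MEMZONE_NAMESIZE 32        /* RTE_MEMZONE_NAMESIZE */
#define MEMPOOL_MZ_FORMAT "MP_%s"  /* prefix the mempool library adds */

/* Return 0 if "name" still fits once the mempool prefix is added,
 * otherwise -ENAMETOOLONG, mirroring rte_mempool_create_empty(). */
static int
check_mempool_name(const char *name)
{
	char mz_name[MEMZONE_NAMESIZE];
	int ret = snprintf(mz_name, sizeof(mz_name), MEMPOOL_MZ_FORMAT, name);

	if (ret < 0 || ret >= (int)sizeof(mz_name))
		return -ENAMETOOLONG;
	return 0;
}

int main(void)
{
	/* "MP_" + a 29-character name needs 33 bytes including the NUL,
	 * one more than the 32-byte buffer, so the check fails. */
	printf("%d\n", check_mempool_name("unit_test_bonded_device_pool1"));
	return 0;
}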



[dpdk-dev] [PATCH v2 5/5] testpmd: show topology at forwarding start

2016-06-01 Thread Zhihong Wang
This patch shows the topology at forwarding start.

"show config fwd" also does this, but showing it directly can reduce the
possibility of misconfiguration.


Signed-off-by: Zhihong Wang 
---
 app/test-pmd/cmdline.c | 2 +-
 app/test-pmd/config.c  | 4 ++--
 app/test-pmd/testpmd.c | 2 +-
 app/test-pmd/testpmd.h | 3 +--
 4 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index ef66d4e..bc800f8 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -5445,7 +5445,7 @@ static void cmd_showcfg_parsed(void *parsed_result,
else if (!strcmp(res->what, "cores"))
fwd_lcores_config_display();
else if (!strcmp(res->what, "fwd"))
-   fwd_config_display();
+   fwd_config_setup_display();
else if (!strcmp(res->what, "txpkts"))
show_tx_pkt_segments();
 }
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index cfdacd8..c70f308 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -1383,7 +1383,7 @@ icmp_echo_config_setup(void)
}
 }

-void
+static void
 fwd_config_setup(void)
 {
cur_fwd_config.fwd_eng = cur_fwd_eng;
@@ -1443,7 +1443,7 @@ pkt_fwd_config_display(struct fwd_config *cfg)


 void
-fwd_config_display(void)
+fwd_config_setup_display(void)
 {
fwd_config_setup();
pkt_fwd_config_display(&cur_fwd_config);
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 9b1d99c..b946034 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -1009,7 +1009,7 @@ start_packet_forwarding(int with_tx_first)
if(!no_flush_rx)
flush_fwd_rx_queues();

-   fwd_config_setup();
+   fwd_config_setup_display();
rxtx_config_display();

for (i = 0; i < cur_fwd_config.nb_fwd_ports; i++) {
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 62ec055..5fd08e8 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -480,9 +480,8 @@ void port_infos_display(portid_t port_id);
 void rx_queue_infos_display(portid_t port_idi, uint16_t queue_id);
 void tx_queue_infos_display(portid_t port_idi, uint16_t queue_id);
 void fwd_lcores_config_display(void);
-void fwd_config_display(void);
+void fwd_config_setup_display(void);
 void rxtx_config_display(void);
-void fwd_config_setup(void);
 void set_def_fwd_config(void);
 void reconfig(portid_t new_port_id, unsigned socket_id);
 int init_fwd_streams(void);
-- 
2.5.0



[dpdk-dev] [PATCH v2 4/5] testpmd: handle all rxqs in rss setup

2016-06-01 Thread Zhihong Wang
This patch removes constraints in rxq handling when multiqueue is enabled,
so that all the rxqs are handled.

Currently testpmd forces a dedicated core for each rxq; some rxqs may be
ignored when the core count is less than the rxq count, which causes
confusion and inconvenience.


Signed-off-by: Zhihong Wang 
---
 app/test-pmd/config.c | 8 +---
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index f487b87..cfdacd8 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -1196,19 +1196,13 @@ rss_fwd_config_setup(void)
cur_fwd_config.nb_fwd_ports = nb_fwd_ports;
cur_fwd_config.nb_fwd_streams =
(streamid_t) (nb_q * cur_fwd_config.nb_fwd_ports);
-   if (cur_fwd_config.nb_fwd_streams > cur_fwd_config.nb_fwd_lcores)
-   cur_fwd_config.nb_fwd_streams =
-   (streamid_t)cur_fwd_config.nb_fwd_lcores;
-   else
-   cur_fwd_config.nb_fwd_lcores =
-   (lcoreid_t)cur_fwd_config.nb_fwd_streams;

/* reinitialize forwarding streams */
init_fwd_streams();

setup_fwd_config_of_each_lcore(&cur_fwd_config);
rxp = 0; rxq = 0;
-   for (lc_id = 0; lc_id < cur_fwd_config.nb_fwd_lcores; lc_id++) {
+   for (lc_id = 0; lc_id < cur_fwd_config.nb_fwd_streams; lc_id++) {
struct fwd_stream *fs;

fs = fwd_streams[lc_id];
-- 
2.5.0
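
As a rough illustration of the stream layout this patch produces (one
forwarding stream per port/queue pair, independent of the number of
forwarding cores), here is a small standalone sketch; the names mimic
testpmd's variables but this is not the actual rss_fwd_config_setup() code:

#include <stdio.h>

int main(void)
{
	const unsigned nb_fwd_ports = 2;   /* example values only */
	const unsigned nb_q = 4;           /* rx queues per port  */
	const unsigned nb_fwd_streams = nb_q * nb_fwd_ports;
	unsigned sm_id, rxp = 0, rxq = 0;

	/* One stream per (port, rxq) pair, no longer clamped to the
	 * number of forwarding lcores. */
	for (sm_id = 0; sm_id < nb_fwd_streams; sm_id++) {
		printf("stream %u polls port %u queue %u\n", sm_id, rxp, rxq);
		if (++rxq == nb_q) {
			rxq = 0;
			rxp++;
		}
	}
	return 0;
}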



[dpdk-dev] [PATCH v2 3/5] testpmd: show throughput in port stats

2016-06-01 Thread Zhihong Wang
This patch adds throughput numbers (measured over the period since the last
use of this command) to the port statistics display for "show port stats
(port_id|all)".


Signed-off-by: Zhihong Wang 
---
 app/test-pmd/config.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index c611649..f487b87 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -92,6 +92,7 @@
 #include 
 #include 
 #include 
+#include <rte_cycles.h>

 #include "testpmd.h"

@@ -150,6 +151,10 @@ print_ethaddr(const char *name, struct ether_addr *eth_addr)
 void
 nic_stats_display(portid_t port_id)
 {
+   static uint64_t sum_rx[RTE_MAX_ETHPORTS];
+   static uint64_t sum_tx[RTE_MAX_ETHPORTS];
+   static uint64_t cycles[RTE_MAX_ETHPORTS];
+   uint64_t pkt_rx, pkt_tx, cycle;
struct rte_eth_stats stats;
struct rte_port *port = &ports[port_id];
uint8_t i;
@@ -209,6 +214,21 @@ nic_stats_display(portid_t port_id)
}
}

+   cycle = cycles[port_id];
+   cycles[port_id] = rte_rdtsc();
+   if (cycle > 0)
+   cycle = cycles[port_id] - cycle;
+
+   pkt_rx = stats.ipackets - sum_rx[port_id];
+   pkt_tx = stats.opackets - sum_tx[port_id];
+   sum_rx[port_id] = stats.ipackets;
+   sum_tx[port_id] = stats.opackets;
+   printf("\n  Throughput (since last show)\n");
+   printf("  RX-pps: %12"PRIu64"\n"
+   "  TX-pps: %12"PRIu64"\n",
+   cycle > 0 ? pkt_rx * rte_get_tsc_hz() / cycle : 0,
+   cycle > 0 ? pkt_tx * rte_get_tsc_hz() / cycle : 0);
+
printf("  %s%s\n",
   nic_stats_border, nic_stats_border);
 }
-- 
2.5.0
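
The throughput numbers above are simply packet-count deltas scaled by the
TSC frequency over the elapsed TSC cycles. A minimal sketch of that
arithmetic (the helper name is made up for illustration; rte_get_tsc_hz()
is the real DPDK call used here):

#include <stdint.h>
#include <rte_cycles.h>

/* Packets per second over one sampling interval: "pkt_delta" packets were
 * counted while "cycle_delta" TSC cycles elapsed. Not part of testpmd. */
static uint64_t
interval_pps(uint64_t pkt_delta, uint64_t cycle_delta)
{
	if (cycle_delta == 0)
		return 0;
	return pkt_delta * rte_get_tsc_hz() / cycle_delta;
}

On the very first call after startup the stored cycle count is still zero,
which is why the patch guards with "cycle > 0" and reports 0 pps for that
sample.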



[dpdk-dev] [PATCH v2 2/5] testpmd: configurable tx_first burst number

2016-06-01 Thread Zhihong Wang
This patch enables a configurable tx_first burst number.

Use "start tx_first (burst_num)" to specify how many bursts of packets to
send before forwarding starts, or "start tx_first" as before to send the
default single burst.


Signed-off-by: Zhihong Wang 
---
 app/test-pmd/cmdline.c  | 41 +
 app/test-pmd/testpmd.c  |  7 +++--
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  6 +++--
 3 files changed, 50 insertions(+), 4 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 0af3f05..ef66d4e 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -5309,6 +5309,46 @@ cmdline_parse_inst_t cmd_start_tx_first = {
},
 };

+/* *** START FORWARDING WITH N TX BURST FIRST *** */
+struct cmd_start_tx_first_n_result {
+   cmdline_fixed_string_t start;
+   cmdline_fixed_string_t tx_first;
+   uint32_t tx_num;
+};
+
+static void
+cmd_start_tx_first_n_parsed(void *parsed_result,
+ __attribute__((unused)) struct cmdline *cl,
+ __attribute__((unused)) void *data)
+{
+   struct cmd_start_tx_first_n_result *res = parsed_result;
+
+   start_packet_forwarding(res->tx_num);
+}
+
+cmdline_parse_token_string_t cmd_start_tx_first_n_start =
+   TOKEN_STRING_INITIALIZER(struct cmd_start_tx_first_n_result,
+   start, "start");
+cmdline_parse_token_string_t cmd_start_tx_first_n_tx_first =
+   TOKEN_STRING_INITIALIZER(struct cmd_start_tx_first_n_result,
+   tx_first, "tx_first");
+cmdline_parse_token_num_t cmd_start_tx_first_n_tx_num =
+   TOKEN_NUM_INITIALIZER(struct cmd_start_tx_first_n_result,
+   tx_num, UINT32);
+
+cmdline_parse_inst_t cmd_start_tx_first_n = {
+   .f = cmd_start_tx_first_n_parsed,
+   .data = NULL,
+   .help_str = "start packet forwarding, after sending <num> "
+   "bursts of packets",
+   .tokens = {
+   (void *)&cmd_start_tx_first_n_start,
+   (void *)&cmd_start_tx_first_n_tx_first,
+   (void *)&cmd_start_tx_first_n_tx_num,
+   NULL,
+   },
+};
+
 /* *** SET LINK UP *** */
 struct cmd_set_link_up_result {
cmdline_fixed_string_t set;
@@ -10468,6 +10508,7 @@ cmdline_parse_ctx_t main_ctx[] = {
(cmdline_parse_inst_t *)&cmd_showcfg,
(cmdline_parse_inst_t *)&cmd_start,
(cmdline_parse_inst_t *)&cmd_start_tx_first,
+   (cmdline_parse_inst_t *)&cmd_start_tx_first_n,
(cmdline_parse_inst_t *)&cmd_set_link_up,
(cmdline_parse_inst_t *)&cmd_set_link_down,
(cmdline_parse_inst_t *)&cmd_reset,
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 7ab67b8..9b1d99c 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -1043,8 +1043,11 @@ start_packet_forwarding(int with_tx_first)
for (i = 0; i < cur_fwd_config.nb_fwd_ports; i++)
(*port_fwd_begin)(fwd_ports_ids[i]);
}
-   launch_packet_forwarding(run_one_txonly_burst_on_core);
-   rte_eal_mp_wait_lcore();
+   while (with_tx_first--) {
+   launch_packet_forwarding(
+   run_one_txonly_burst_on_core);
+   rte_eal_mp_wait_lcore();
+   }
port_fwd_end = tx_only_engine.port_fwd_end;
if (port_fwd_end != NULL) {
for (i = 0; i < cur_fwd_config.nb_fwd_ports; i++)
diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index 03412db..ff94593 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -98,9 +98,11 @@ Start packet forwarding with current configuration::
 start tx_first
 ~~

-Start packet forwarding with current configuration after sending one burst of packets::
+Start packet forwarding with current configuration after sending specified number of bursts of packets::

-   testpmd> start tx_first
+   testpmd> start tx_first (""|burst_num)
+
+The default burst number is 1 when ``burst_num`` not presented.

 stop
 
-- 
2.5.0



[dpdk-dev] [PATCH v2 1/5] testpmd: add retry option

2016-06-01 Thread Zhihong Wang
This patch adds a retry option in testpmd to prevent most packet losses.
It can be enabled by "set fwd <mode> retry". All modes except rxonly
support this option.

Adding a retry mechanism expands test case coverage to support scenarios
where packet loss affects test results.


Signed-off-by: Zhihong Wang 
---
 app/test-pmd/Makefile   |   1 -
 app/test-pmd/cmdline.c  |  75 -
 app/test-pmd/config.c   |  47 +++-
 app/test-pmd/csumonly.c |  12 ++
 app/test-pmd/flowgen.c  |  12 ++
 app/test-pmd/icmpecho.c |  15 +++
 app/test-pmd/iofwd.c|  22 +++-
 app/test-pmd/macfwd-retry.c | 164 
 app/test-pmd/macfwd.c   |  13 +++
 app/test-pmd/macswap.c  |  12 ++
 app/test-pmd/testpmd.c  |   4 +-
 app/test-pmd/testpmd.h  |  11 +-
 app/test-pmd/txonly.c   |  12 ++
 doc/guides/testpmd_app_ug/run_app.rst   |   1 -
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  10 +-
 15 files changed, 227 insertions(+), 184 deletions(-)
 delete mode 100644 app/test-pmd/macfwd-retry.c

diff --git a/app/test-pmd/Makefile b/app/test-pmd/Makefile
index 40039a1..2a0b5a5 100644
--- a/app/test-pmd/Makefile
+++ b/app/test-pmd/Makefile
@@ -50,7 +50,6 @@ SRCS-$(CONFIG_RTE_LIBRTE_CMDLINE) += cmdline.c
 SRCS-y += config.c
 SRCS-y += iofwd.c
 SRCS-y += macfwd.c
-SRCS-y += macfwd-retry.c
 SRCS-y += macswap.c
 SRCS-y += flowgen.c
 SRCS-y += rxonly.c
diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index c5b9479..0af3f05 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -246,8 +246,8 @@ static void cmd_help_long_parsed(void *parsed_result,
"Set number of packets per burst.\n\n"

"set burst tx delay (microseconds) retry (num)\n"
-   "Set the transmit delay time and number of retries"
-   " in mac_retry forwarding mode.\n\n"
+   "Set the transmit delay time and number of retries,"
+   " effective when retry is enabled.\n\n"

"set txpkts (x[,y]*)\n"
"Set the length of each segment of TXONLY"
@@ -4480,6 +4480,7 @@ static void cmd_set_fwd_mode_parsed(void *parsed_result,
 {
struct cmd_set_fwd_mode_result *res = parsed_result;

+   retry_enabled = 0;
set_pkt_forwarding_mode(res->mode);
 }

@@ -4525,6 +4526,74 @@ static void cmd_set_fwd_mode_init(void)
token_struct->string_data.str = token;
 }

+/* *** SET RETRY FORWARDING MODE *** */
+struct cmd_set_fwd_retry_mode_result {
+   cmdline_fixed_string_t set;
+   cmdline_fixed_string_t fwd;
+   cmdline_fixed_string_t mode;
+   cmdline_fixed_string_t retry;
+};
+
+static void cmd_set_fwd_retry_mode_parsed(void *parsed_result,
+   __attribute__((unused)) struct cmdline *cl,
+   __attribute__((unused)) void *data)
+{
+   struct cmd_set_fwd_retry_mode_result *res = parsed_result;
+
+   retry_enabled = 1;
+   set_pkt_forwarding_mode(res->mode);
+}
+
+cmdline_parse_token_string_t cmd_setfwd_retry_set =
+   TOKEN_STRING_INITIALIZER(struct cmd_set_fwd_retry_mode_result,
+   set, "set");
+cmdline_parse_token_string_t cmd_setfwd_retry_fwd =
+   TOKEN_STRING_INITIALIZER(struct cmd_set_fwd_retry_mode_result,
+   fwd, "fwd");
+cmdline_parse_token_string_t cmd_setfwd_retry_mode =
+   TOKEN_STRING_INITIALIZER(struct cmd_set_fwd_retry_mode_result,
+   mode,
+   "" /* defined at init */);
+cmdline_parse_token_string_t cmd_setfwd_retry_retry =
+   TOKEN_STRING_INITIALIZER(struct cmd_set_fwd_retry_mode_result,
+   retry, "retry");
+
+cmdline_parse_inst_t cmd_set_fwd_retry_mode = {
+   .f = cmd_set_fwd_retry_mode_parsed,
+   .data = NULL,
+   .help_str = NULL, /* defined at init */
+   .tokens = {
+   (void *)&cmd_setfwd_retry_set,
+   (void *)&cmd_setfwd_retry_fwd,
+   (void *)&cmd_setfwd_retry_mode,
+   (void *)&cmd_setfwd_retry_retry,
+   NULL,
+   },
+};
+
+static void cmd_set_fwd_retry_mode_init(void)
+{
+   char *modes, *c;
+   static char token[128];
+   static char help[256];
+   cmdline_parse_token_string_t *token_struct;
+
+   modes = list_pkt_forwarding_retry_modes();
+   snprintf(help, sizeof(help), "set fwd %s retry - "
+   "set packet forwarding mode with retry", modes);
+   cmd_set_fwd_retry_mode.help_str = help;
+
+   /* string token separator is # */
+   for (c = token; *modes != '\0'; modes++)
+   if (*modes == '|')
+   *c++ = '#';
+   else
+
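
The retry flag added above boils down to: when rte_eth_tx_burst() sends
fewer packets than requested, wait the configured delay and try again up to
the configured number of times. A hedged sketch of that pattern follows;
the helper and its parameters are assumptions, not testpmd's actual globals,
and only rte_eth_tx_burst() and rte_delay_us() are real DPDK calls:

#include <rte_ethdev.h>
#include <rte_cycles.h>

/* Illustrative only: resend the unsent tail of a burst up to "retry_num"
 * times, pausing "delay_us" microseconds between attempts. */
static uint16_t
tx_burst_with_retry(uint8_t port_id, uint16_t queue_id,
		    struct rte_mbuf **pkts, uint16_t nb_pkts,
		    uint32_t retry_num, uint32_t delay_us)
{
	uint16_t sent = rte_eth_tx_burst(port_id, queue_id, pkts, nb_pkts);

	while (sent < nb_pkts && retry_num-- > 0) {
		rte_delay_us(delay_us);
		sent += rte_eth_tx_burst(port_id, queue_id,
					 &pkts[sent], nb_pkts - sent);
	}
	return sent;
}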

[dpdk-dev] [PATCH v2 0/5] vhost/virtio performance loopback utility

2016-06-01 Thread Zhihong Wang
This patch enables vhost/virtio pmd performance loopback test in testpmd.
All the features are for general usage.

The loopback test focuses on the maximum full-path packet forwarding
performance between host and guest; it runs the vhost/virtio pmds only,
without introducing extra overhead.

Therefore, the main requirement is traffic generation, since there is no
external packet generator like IXIA to help.

In current testpmd, iofwd is the best candidate to perform this loopback
test because it's the fastest possible forwarding engine: Start testpmd
iofwd in host with 1 vhost port, and start testpmd iofwd in the connected
guest with 1 corresponding virtio port, and these 2 ports form a forwarding
loop: Host vhost tx -> Guest virtio rx -> Guest virtio tx -> Host vhost rx.

As to traffic generation, "start tx_first" injects a burst of packets into
the loop.

However 2 issues remain:

   1. If only 1 burst of packets is injected into the loop, there will
  definitely be empty rx operations, e.g. when the guest virtio port sends
  a burst to the host and then starts rx immediately, the packets are
  likely still being forwarded by the host vhost port and haven't reached
  the guest yet.

  We need to fill up the ring to keep all pmds busy.

   2. iofwd doesn't provide a retry mechanism, so if packet loss occurs,
  there won't be a full burst in the loop.

To address these issues, this patch:

   1. Add retry option in testpmd to prevent most packet losses.

   2. Add parameter to enable configurable tx_first burst number.

Other related improvements include:

   1. Handle all rxqs when multiqueue is enabled: current testpmd forces a
  single core for each rxq, which causes inconvenience and confusion.

  This change doesn't break anything, we can still force a single core
  for each rxq, by giving the same number of cores with the number of
  rxqs.

  One example: a Red Hat engineer was doing a multiqueue test with 2 ports
  in the guest, each with 4 queues, and testpmd was used as the forwarding
  engine in the guest. As usual he used 1 core for forwarding, and as a
  result he only saw traffic from port 0 queue 0 to port 1 queue 0. A lot
  of emails and quite some time were spent to root cause it, and of course
  it was caused by this unreasonable testpmd behavior.

  Moreover, even if we understand this behavior, if we want to test the
  above case we still need 8 cores for a single guest to poll all the
  rxqs, which is obviously too expensive.

  We have met quite a lot of cases like this.

   2. Show topology at forwarding start: "show config fwd" also does this,
  but showing it directly can reduce the possibility of misconfiguration.

  Like the case above, if testpmd shows the topology at forwarding start,
  then probably all that debugging effort could be saved.

   3. Add throughput information in port statistics display for "show port
  stats (port_id|all)".

Finally there's documentation update.

Example of how to enable the vhost/virtio performance loopback test:

   1. Start testpmd in host with 1 vhost port only.

   2. Start testpmd in guest with only 1 virtio port connected to the
  corresponding vhost port.

   3. "set fwd io retry" in testpmds in both host and guest.

   4. "start" in testpmd in guest.

   5. "start tx_first 16" in testpmd in host.

Then use "show port stats all" to monitor the performance.

--
Changes in v2:

   1. Add retry as an option for existing forwarding engines except rxonly.

   2. Minor code adjustment and more detailed patch description.


Zhihong Wang (5):
  testpmd: add retry option
  testpmd: configurable tx_first burst number
  testpmd: show throughput in port stats
  testpmd: handle all rxqs in rss setup
  testpmd: show topology at forwarding start

 app/test-pmd/Makefile   |   1 -
 app/test-pmd/cmdline.c  | 118 +++-
 app/test-pmd/config.c   |  79 +++---
 app/test-pmd/csumonly.c |  12 ++
 app/test-pmd/flowgen.c  |  12 ++
 app/test-pmd/icmpecho.c |  15 +++
 app/test-pmd/iofwd.c|  22 +++-
 app/test-pmd/macfwd-retry.c | 164 
 app/test-pmd/macfwd.c   |  13 +++
 app/test-pmd/macswap.c  |  12 ++
 app/test-pmd/testpmd.c  |  13 ++-
 app/test-pmd/testpmd.h  |  14 ++-
 app/test-pmd/txonly.c   |  12 ++
 doc/guides/testpmd_app_ug/run_app.rst   |   1 -
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  16 +--
 15 files changed, 303 insertions(+), 201 deletions(-)
 delete mode 100644 app/test-pmd/macfwd-retry.c

-- 
2.5.0



[dpdk-dev] [PATCH v2] mempool: replace c memcpy code semantics with optimized rte_memcpy

2016-06-01 Thread Olivier MATZ
Hi Jerin,

>>> /* Add elements back into the cache */
>>> -   for (index = 0; index < n; ++index, obj_table++)
>>> -   cache_objs[index] = *obj_table;
>>> +   rte_memcpy(&cache_objs[0], obj_table, sizeof(void *) * n);
>>>  
>>> cache->len += n;
>>>  
>>>
>>
>> I also checked in the get_bulk() function, which looks like that:
>>
>>  /* Now fill in the response ... */
>>  for (index = 0, len = cache->len - 1;
>>  index < n;
>>  ++index, len--, obj_table++)
>>  *obj_table = cache_objs[len];
>>
>> I think we could replace it by something like:
>>
>>  rte_memcpy(obj_table, &cache_objs[len - n], sizeof(void *) * n);
>>
>> The only difference is that it won't reverse the pointers in the
>> table, but I don't see any problem with that.
>>
>> What do you think?
> 
> In a true sense, it will _not_ be LIFO. Not sure about the cache usage
> implications for specific use cases.

Today, the object pointers are reversed only in get(). It means
that this code:

rte_mempool_get_bulk(mp, table, 4);
for (i = 0; i < 4; i++)
	printf("obj = %p\n", table[i]);
rte_mempool_put_bulk(mp, table, 4);


printf("-\n");
rte_mempool_get_bulk(mp, table, 4);
for (i = 0; i < 4; i++)
	printf("obj = %p\n", table[i]);
rte_mempool_put_bulk(mp, table, 4);

prints:

addr1
addr2
addr3
addr4
-
addr4
addr3
addr2
addr1

Which is quite strange.

I don't think it would be an issue to replace the loop with
rte_memcpy(); it may increase the copy speed and it will be
more coherent with the put().


Olivier
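
A tiny standalone illustration of the ordering difference described above,
using plain local arrays and memcpy() rather than the mempool cache itself:

#include <stdio.h>
#include <string.h>

int main(void)
{
	void *cache_objs[4] = { (void *)0x1, (void *)0x2, (void *)0x3, (void *)0x4 };
	void *reversed[4], *straight[4];
	const int n = 4;
	int index, len;

	/* Current get(): walks the cache backwards, so callers see the
	 * objects in reverse (LIFO) order. */
	for (index = 0, len = n - 1; index < n; index++, len--)
		reversed[index] = cache_objs[len];

	/* Proposed get(): one straight copy, cache order preserved. */
	memcpy(straight, &cache_objs[0], sizeof(void *) * n);

	for (index = 0; index < n; index++)
		printf("reversed: %p   straight: %p\n",
		       reversed[index], straight[index]);
	return 0;
}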